Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete mov_i2xmm and mov_xmm2i. #47843

Merged
merged 4 commits into from
Feb 9, 2021
Merged

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Feb 4, 2021

The issue was in the xmm2i definition that was accepting the source argument on the first position, so each time we returned it from ins_Copy we had a potential bad code because the natural order for the Jit and for Intel is dst first.
The strange order was implemented because of intel encoding for this instruction, but even in intel disasm and manual the instruction is shown naturally with dst on the first place.

So the fix is to make the right level of abstraction and hide this logic in the instruction encoding, then delete all places where we passed source xmm as the first argument.

It makes our disasm looking like Intel and prevents issues with emitIns_R_R(ins_Copy(src, destType), size, dst, src).

The issue was tricky because jit-diff does not show such diffs (because it was using JitDisasm that printed wrong order).

I have checked that there are no stress failures and that all possible pairs can be encoded.

A simple test
    for (unsigned i = REG_EAX; i <= REG_EDI; ++i)
    {
        for (unsigned j = REG_XMM0; j <= REG_XMM7; ++j)
        {
            regNumber dstReg = regNumber(i);
            regNumber srcReg = regNumber(j);
            GetEmitter()->emitIns_R_R(INS_movd, EA_4BYTE, dstReg, srcReg);
        }
    }

    for (unsigned i = REG_EAX; i <= REG_EDI; ++i)
    {
        for (unsigned j = REG_XMM0; j <= REG_XMM7; ++j)
        {
            regNumber dstReg = regNumber(j);
            regNumber srcReg = regNumber(i);
            GetEmitter()->emitIns_R_R(INS_movd, EA_4BYTE, dstReg, srcReg);
        }
    }
    
JitDisAsm:
IN001e: 00007E vmovd    eax, xmm0
IN001f: 000082 vmovd    eax, xmm1
IN0020: 000086 vmovd    eax, xmm2
IN0021: 00008A vmovd    eax, xmm3
IN0022: 00008E vmovd    eax, xmm4
IN0023: 000092 vmovd    eax, xmm5
IN0024: 000096 vmovd    eax, xmm6
IN0025: 00009A vmovd    eax, xmm7
IN0026: 00009E vmovd    ecx, xmm0
IN0027: 0000A2 vmovd    ecx, xmm1
IN0028: 0000A6 vmovd    ecx, xmm2
IN0029: 0000AA vmovd    ecx, xmm3
IN002a: 0000AE vmovd    ecx, xmm4
IN002b: 0000B2 vmovd    ecx, xmm5
IN002c: 0000B6 vmovd    ecx, xmm6
IN002d: 0000BA vmovd    ecx, xmm7
IN002e: 0000BE vmovd    edx, xmm0
IN002f: 0000C2 vmovd    edx, xmm1
IN0030: 0000C6 vmovd    edx, xmm2
IN0031: 0000CA vmovd    edx, xmm3
IN0032: 0000CE vmovd    edx, xmm4
IN0033: 0000D2 vmovd    edx, xmm5
IN0034: 0000D6 vmovd    edx, xmm6
IN0035: 0000DA vmovd    edx, xmm7
IN0036: 0000DE vmovd    ebx, xmm0
IN0037: 0000E2 vmovd    ebx, xmm1
IN0038: 0000E6 vmovd    ebx, xmm2
IN0039: 0000EA vmovd    ebx, xmm3
IN003a: 0000EE vmovd    ebx, xmm4
IN003b: 0000F2 vmovd    ebx, xmm5
IN003c: 0000F6 vmovd    ebx, xmm6
IN003d: 0000FA vmovd    ebx, xmm7
IN003e: 0000FE vmovd    esp, xmm0
IN003f: 000102 vmovd    esp, xmm1
IN0040: 000106 vmovd    esp, xmm2
IN0041: 00010A vmovd    esp, xmm3
IN0042: 00010E vmovd    esp, xmm4
IN0043: 000112 vmovd    esp, xmm5
IN0044: 000116 vmovd    esp, xmm6
IN0045: 00011A vmovd    esp, xmm7
IN0046: 00011E vmovd    ebp, xmm0
IN0047: 000122 vmovd    ebp, xmm1
IN0048: 000126 vmovd    ebp, xmm2
IN0049: 00012A vmovd    ebp, xmm3
IN004a: 00012E vmovd    ebp, xmm4
IN004b: 000132 vmovd    ebp, xmm5
IN004c: 000136 vmovd    ebp, xmm6
IN004d: 00013A vmovd    ebp, xmm7
IN004e: 00013E vmovd    esi, xmm0
IN004f: 000142 vmovd    esi, xmm1
IN0050: 000146 vmovd    esi, xmm2
IN0051: 00014A vmovd    esi, xmm3
IN0052: 00014E vmovd    esi, xmm4
IN0053: 000152 vmovd    esi, xmm5
IN0054: 000156 vmovd    esi, xmm6
IN0055: 00015A vmovd    esi, xmm7
IN0056: 00015E vmovd    edi, xmm0
IN0057: 000162 vmovd    edi, xmm1
IN0058: 000166 vmovd    edi, xmm2
IN0059: 00016A vmovd    edi, xmm3
IN005a: 00016E vmovd    edi, xmm4
IN005b: 000172 vmovd    edi, xmm5
IN005c: 000176 vmovd    edi, xmm6
IN005d: 00017A vmovd    edi, xmm7
IN005e: 00017E vmovd    xmm0, eax
IN005f: 000182 vmovd    xmm1, eax
IN0060: 000186 vmovd    xmm2, eax
IN0061: 00018A vmovd    xmm3, eax
IN0062: 00018E vmovd    xmm4, eax
IN0063: 000192 vmovd    xmm5, eax
IN0064: 000196 vmovd    xmm6, eax
IN0065: 00019A vmovd    xmm7, eax
IN0066: 00019E vmovd    xmm0, ecx
IN0067: 0001A2 vmovd    xmm1, ecx
IN0068: 0001A6 vmovd    xmm2, ecx
IN0069: 0001AA vmovd    xmm3, ecx

G_M60085_IG07:        ; offs=0001AEH, size=00DAH, bbWeight=1    PerfScore 108.00, isz, extend

IN006a: 0001AE vmovd    xmm4, ecx
IN006b: 0001B2 vmovd    xmm5, ecx
IN006c: 0001B6 vmovd    xmm6, ecx
IN006d: 0001BA vmovd    xmm7, ecx
IN006e: 0001BE vmovd    xmm0, edx
IN006f: 0001C2 vmovd    xmm1, edx
IN0070: 0001C6 vmovd    xmm2, edx
IN0071: 0001CA vmovd    xmm3, edx
IN0072: 0001CE vmovd    xmm4, edx
IN0073: 0001D2 vmovd    xmm5, edx
IN0074: 0001D6 vmovd    xmm6, edx
IN0075: 0001DA vmovd    xmm7, edx
IN0076: 0001DE vmovd    xmm0, ebx
IN0077: 0001E2 vmovd    xmm1, ebx
IN0078: 0001E6 vmovd    xmm2, ebx
IN0079: 0001EA vmovd    xmm3, ebx
IN007a: 0001EE vmovd    xmm4, ebx
IN007b: 0001F2 vmovd    xmm5, ebx
IN007c: 0001F6 vmovd    xmm6, ebx
IN007d: 0001FA vmovd    xmm7, ebx
IN007e: 0001FE vmovd    xmm0, esp
IN007f: 000202 vmovd    xmm1, esp
IN0080: 000206 vmovd    xmm2, esp
IN0081: 00020A vmovd    xmm3, esp
IN0082: 00020E vmovd    xmm4, esp
IN0083: 000212 vmovd    xmm5, esp
IN0084: 000216 vmovd    xmm6, esp
IN0085: 00021A vmovd    xmm7, esp
IN0086: 00021E vmovd    xmm0, ebp
IN0087: 000222 vmovd    xmm1, ebp
IN0088: 000226 vmovd    xmm2, ebp
IN0089: 00022A vmovd    xmm3, ebp
IN008a: 00022E vmovd    xmm4, ebp
IN008b: 000232 vmovd    xmm5, ebp
IN008c: 000236 vmovd    xmm6, ebp
IN008d: 00023A vmovd    xmm7, ebp
IN008e: 00023E vmovd    xmm0, esi
IN008f: 000242 vmovd    xmm1, esi
IN0090: 000246 vmovd    xmm2, esi
IN0091: 00024A vmovd    xmm3, esi
IN0092: 00024E vmovd    xmm4, esi
IN0093: 000252 vmovd    xmm5, esi
IN0094: 000256 vmovd    xmm6, esi
IN0095: 00025A vmovd    xmm7, esi
IN0096: 00025E vmovd    xmm0, edi
IN0097: 000262 vmovd    xmm1, edi
IN0098: 000266 vmovd    xmm2, edi
IN0099: 00026A vmovd    xmm3, edi
IN009a: 00026E vmovd    xmm4, edi
IN009b: 000272 vmovd    xmm5, edi
IN009c: 000276 vmovd    xmm6, edi
IN009d: 00027A vmovd    xmm7, edi


Intel disasm (under a debugger):
08FAC7E6  vmovd       eax,xmm0  
08FAC7EA  vmovd       eax,xmm1  
08FAC7EE  vmovd       eax,xmm2  
08FAC7F2  vmovd       eax,xmm3  
08FAC7F6  vmovd       eax,xmm4  
08FAC7FA  vmovd       eax,xmm5  
08FAC7FE  vmovd       eax,xmm6  
08FAC802  vmovd       eax,xmm7  
08FAC806  vmovd       ecx,xmm0  
08FAC80A  vmovd       ecx,xmm1  
08FAC80E  vmovd       ecx,xmm2  
08FAC812  vmovd       ecx,xmm3  
08FAC816  vmovd       ecx,xmm4  
08FAC81A  vmovd       ecx,xmm5  
08FAC81E  vmovd       ecx,xmm6  
08FAC822  vmovd       ecx,xmm7  
08FAC826  vmovd       edx,xmm0  
08FAC82A  vmovd       edx,xmm1  
08FAC82E  vmovd       edx,xmm2  
08FAC832  vmovd       edx,xmm3  
08FAC836  vmovd       edx,xmm4  
08FAC83A  vmovd       edx,xmm5  
08FAC83E  vmovd       edx,xmm6
08FAC842  vmovd       edx,xmm7  
08FAC846  vmovd       ebx,xmm0  
08FAC84A  vmovd       ebx,xmm1  
08FAC84E  vmovd       ebx,xmm2  
08FAC852  vmovd       ebx,xmm3  
08FAC856  vmovd       ebx,xmm4  
08FAC85A  vmovd       ebx,xmm5  
08FAC85E  vmovd       ebx,xmm6  
08FAC862  vmovd       ebx,xmm7  
08FAC866  vmovd       esp,xmm0  
08FAC86A  vmovd       esp,xmm1  
08FAC86E  vmovd       esp,xmm2  
08FAC872  vmovd       esp,xmm3  
08FAC876  vmovd       esp,xmm4  
08FAC87A  vmovd       esp,xmm5  
08FAC87E  vmovd       esp,xmm6  
08FAC882  vmovd       esp,xmm7  
08FAC886  vmovd       ebp,xmm0  
08FAC88A  vmovd       ebp,xmm1  
08FAC88E  vmovd       ebp,xmm2  
08FAC892  vmovd       ebp,xmm3  
08FAC896  vmovd       ebp,xmm4  
08FAC89A  vmovd       ebp,xmm5  
08FAC89E  vmovd       ebp,xmm6  
08FAC8A2  vmovd       ebp,xmm7  
08FAC8A6  vmovd       esi,xmm0  
08FAC8AA  vmovd       esi,xmm1  
08FAC8AE  vmovd       esi,xmm2  
08FAC8B2  vmovd       esi,xmm3  
08FAC8B6  vmovd       esi,xmm4  
08FAC8BA  vmovd       esi,xmm5  
08FAC8BE  vmovd       esi,xmm6  
08FAC8C2  vmovd       esi,xmm7  
08FAC8C6  vmovd       edi,xmm0  
08FAC8CA  vmovd       edi,xmm1  
08FAC8CE  vmovd       edi,xmm2  
08FAC8D2  vmovd       edi,xmm3  
08FAC8D6  vmovd       edi,xmm4  
08FAC8DA  vmovd       edi,xmm5  
08FAC8DE  vmovd       edi,xmm6  
08FAC8E2  vmovd       edi,xmm7  
08FAC8E6  vmovd       xmm0,eax  
08FAC8EA  vmovd       xmm1,eax  
08FAC8EE  vmovd       xmm2,eax  
08FAC8F2  vmovd       xmm3,eax  
08FAC8F6  vmovd       xmm4,eax  
08FAC8FA  vmovd       xmm5,eax  
08FAC8FE  vmovd       xmm6,eax  
08FAC902  vmovd       xmm7,eax  
08FAC906  vmovd       xmm0,ecx  
08FAC90A  vmovd       xmm1,ecx  
08FAC90E  vmovd       xmm2,ecx  
08FAC912  vmovd       xmm3,ecx  
08FAC916  vmovd       xmm4,ecx  
08FAC91A  vmovd       xmm5,ecx  
08FAC91E  vmovd       xmm6,ecx  
08FAC922  vmovd       xmm7,ecx  
08FAC926  vmovd       xmm0,edx  
08FAC92A  vmovd       xmm1,edx  
08FAC92E  vmovd       xmm2,edx  
08FAC932  vmovd       xmm3,edx  
08FAC936  vmovd       xmm4,edx  
08FAC93A  vmovd       xmm5,edx  
08FAC93E  vmovd       xmm6,edx  
08FAC942  vmovd       xmm7,edx  
08FAC946  vmovd       xmm0,ebx  
08FAC94A  vmovd       xmm1,ebx  
08FAC94E  vmovd       xmm2,ebx  
08FAC952  vmovd       xmm3,ebx  
08FAC956  vmovd       xmm4,ebx  
08FAC95A  vmovd       xmm5,ebx  
08FAC95E  vmovd       xmm6,ebx  
08FAC962  vmovd       xmm7,ebx  
08FAC966  vmovd       xmm0,esp  
08FAC96A  vmovd       xmm1,esp  
08FAC96E  vmovd       xmm2,esp  
08FAC972  vmovd       xmm3,esp  
08FAC976  vmovd       xmm4,esp  
08FAC97A  vmovd       xmm5,esp  
08FAC97E  vmovd       xmm6,esp  
08FAC982  vmovd       xmm7,esp  
08FAC986  vmovd       xmm0,ebp  
08FAC98A  vmovd       xmm1,ebp  
08FAC98E  vmovd       xmm2,ebp  
08FAC992  vmovd       xmm3,ebp  
08FAC996  vmovd       xmm4,ebp  
08FAC99A  vmovd       xmm5,ebp  
08FAC99E  vmovd       xmm6,ebp  
08FAC9A2  vmovd       xmm7,ebp  
08FAC9A6  vmovd       xmm0,esi  
08FAC9AA  vmovd       xmm1,esi  
08FAC9AE  vmovd       xmm2,esi  
08FAC9B2  vmovd       xmm3,esi  
08FAC9B6  vmovd       xmm4,esi  
08FAC9BA  vmovd       xmm5,esi  
08FAC9BE  vmovd       xmm6,esi  
08FAC9C2  vmovd       xmm7,esi  
08FAC9C6  vmovd       xmm0,edi  
08FAC9CA  vmovd       xmm1,edi  
08FAC9CE  vmovd       xmm2,edi  
08FAC9D2  vmovd       xmm3,edi  
08FAC9D6  vmovd       xmm4,edi  
08FAC9DA  vmovd       xmm5,edi  
08FAC9DE  vmovd       xmm6,edi  
08FAC9E2  vmovd       xmm7,edi 

Thanks to @echesakovMSFT for helping with the fix.

Fixes #47259.

There is a follow-up issue to separate mod/movq for general reg/movq for xmm reg #47943

Comment on lines 11442 to 11459
if (ins == INS_movd)
{
assert(isFloatReg(reg1) != isFloatReg(reg2));
if (isFloatReg(reg1))
{
code = insCodeRM(ins);
}
else
{
code = insCodeMR(ins);
}
}
else
{
code = insCodeRM(ins);
}
code = AddVexPrefixIfNeeded(ins, code, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify to:

assert((ins != INS_movd) || (isFloatReg(reg1) != isFloatReg(reg2)));

if ((ins != INS_movd) || isFloatReg(reg1))
{
    code = insCodeRM(ins);
}
else
{
    code = insCodeMR(ins);
}

I'd expect ins != INS_movd is the common case for codegen and it helps reduce nesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do before the merge, thanks for the suggestion.

@@ -14681,18 +14698,6 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
result.insThroughput = PERFSCORE_THROUGHPUT_25C;
break;

case INS_mov_xmm2i:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these perfscores and the ones used for INS_movd are tracked slightly differently?

@sandreenko sandreenko marked this pull request as draft February 5, 2021 19:06
@sandreenko
Copy link
Contributor Author

Sorry, @tannergooding I have not marked as draft, could you please review it later after I fix the failures and add a description?

@sandreenko
Copy link
Contributor Author

/azp list

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@sandreenko
Copy link
Contributor Author

The failure is unrelated (#47096).

@sandreenko sandreenko marked this pull request as ready for review February 6, 2021 23:33
@sandreenko
Copy link
Contributor Author

PTAL @dotnet/jit-contrib , the PR fixes a recent regression and fixes wrong abstraction for mov_xmm2i.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. look like a nice cleanup.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM with few questions.

@@ -11461,6 +11466,9 @@ BYTE* emitter::emitOutputRR(BYTE* dst, instrDesc* id)
}
else if ((ins == INS_movsx) || (ins == INS_movzx) || (insIsCMOV(ins)))
{
assert(hasCodeRM(ins) && !hasCodeMI(ins) && !hasCodeMR(ins));
code = insCodeRM(ins);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm why we added insCodeRM() and AddVexPrefixIfNeeded() here and below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it missing currently?

Copy link
Contributor Author

@sandreenko sandreenko Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had logic like:

    // Get the 'base' opcode	
    code = insCodeRM(ins);	
    code = AddVexPrefixIfNeeded(ins, code, size);
    if ()
    { 
        code = insEncodeRMreg(ins, code); // use the existing value.
    } 
    else if ()
    {
       code = insCodeMR(ins);	//rewrite the existing value without reading
    }

so it was hard to follow, so I moved this

    code = insCodeRM(ins);	
    code = AddVexPrefixIfNeeded(ins, code, size);

under conditions where the result code is used.

see 11451 in the base.

@@ -1451,7 +1451,7 @@ void CodeGen::genSSE2Intrinsic(GenTreeHWIntrinsic* node)
{
assert(baseType == TYP_INT || baseType == TYP_UINT || baseType == TYP_LONG || baseType == TYP_ULONG);
op1Reg = op1->GetRegNum();
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), op1Reg, targetReg);
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), targetReg, op1Reg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this an existing bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the instruction that we were generating here was xmm2i and for it we were passing dst as reg2. After this PR we pass it as reg1 as we do for all other instructions (dst first, source second).

@@ -1688,7 +1688,7 @@ void CodeGen::genAvxOrAvx2Intrinsic(GenTreeHWIntrinsic* node)
assert(numArgs == 1);
assert((baseType == TYP_INT) || (baseType == TYP_UINT));
instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, baseType);
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), op1Reg, targetReg);
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), targetReg, op1Reg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised this we never noticed any failure because of this.

instruction ins = ins_CopyFloatToInt(TYP_FLOAT, TYP_INT);
// (Note that for mov_xmm2i, the int register is always in the reg2 position.
inst_RV_RV(ins, op2Reg, tmpReg, baseType);
// Another error, op2Reg is our xmm source but it was passed second.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you want to delete this comment before merging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I thought I did, thanks.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sandreenko
Copy link
Contributor Author

the pr was updated with the review feedback, once tests pass I will merge it.

@echesakovMSFT asked me to run spmi diffs and they showed some improvements because of:

 -- suppressing emission as previous instruction already properly extends.

becomes

IN0004: 00000C movd     eax, xmm1

because AreUpper32BitsZero now sees movd eax, xmm1 instead of movd xmm1, eax and this check passes:

// Bail if not writing to the right register
if (id->idReg1() != reg)
{
return false;
}

@sandreenko sandreenko merged commit 5b0005a into dotnet:master Feb 9, 2021
@sandreenko sandreenko deleted the fixINS_mov_xmm2i branch February 9, 2021 05:27
@ghost ghost locked as resolved and limited conversation to collaborators Mar 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Test failure : Interop\\PInvoke\\Vector2_3_4\\Vector2_3_4\\Vector2_3_4.cmd
6 participants