-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Optimize FindFirstCharToEncode for JavaScriptEncoder.Default using Ssse3 intrinsics #42073
Optimize FindFirstCharToEncode for JavaScriptEncoder.Default using Ssse3 intrinsics #42073
Conversation
…eEscapingCheck
…ith CreateEscapingMaskSse2 (renamed from CreateEscapingMask)
Somehow this reference was left there, it's not needed anymore.
Messed something up while copying from one vm (local build) to the repo. Will fix that build failure. |
System\Text\Encodings\Web\Ssse3Helper.cs(101,13): error SA1115: (NETCORE_ENGINEERING_TELEMETRY=Build) The parameter should begin on the line after the previous parameter.
@ahsonkhan please help me figure out what causes the build-failure
It's based on your latest changes in this project (which builds), a local build runs fine ( |
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Show resolved
Hide resolved
Could you compare against master and this PR instead, specifically for |
…into ImproveEscapingCheck
Now I got a trace why the numbers for small inputs are so bad, compared to current master. (Which I couldn't understand so far, as the asm for the sequential path looks quite similar, this PR produces even less instructions therefore, so a difference of 30% felt definitely too much.)
// setup...not shown
[Benchmark(Baseline = true)]
public unsafe int Master()
{
fixed (char* ptr = TestStringData.JsonString)
{
return DefaultJavaScriptEncoderBasicLatin_master.s_singleton.FindFirstCharacterToEncode(ptr, TestStringData.JsonString.Length);
}
}
[Benchmark]
public int PR_42073()
{
fixed (char* ptr = TestStringData.JsonString)
{
return DefaultJavaScriptEncoderBasicLatin.s_singleton.FindFirstCharacterToEncode(ptr, TestStringData.JsonString.Length);
}
} where
When I look at the Jit-dumps for a simple program like using System;
using System.Text.Encodings.Web;
namespace ConsoleApp1
{
class Program
{
unsafe static int Main()
{
string text = "vabvxgyvhoqqfhea¢";
fixed (char* ptr = text)
{
int idx = JavaScriptEncoder.Default.FindFirstCharacterToEncode(ptr, text.Length);
return idx;
}
}
}
} I can see that
but not inlined
(which is ok, as the code to inline is huge). For the With AggressiveInlining
Without AggressiveInlining
As said before Aside: when |
These methods won't inline, and we don't want to inline them, as they're rather huge.
I merged the branch from #42064 into this PR (#42064 (comment)), so these changes are also shown right now in the diff. |
return mask; | ||
} | ||
|
||
private static readonly Vector128<sbyte> s_nibbleMaskSByte = Vector128.Create((sbyte)0xF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better name for the mask 0xF
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been using the exact unicode character name, but maybe a context/usage specific name would be better:
https://www.fileformat.info/info/unicode/char/000f/index.htm
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Outdated
Show resolved
Hide resolved
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Outdated
Show resolved
Hide resolved
Current benchmark-numbers are updated in the top-post. |
I'll also run benchmarks with SSSE3 disabled, to see if the SSE2 code-path didn't regress. Edit: results Test_Escaping with SSSE3 disabled
Discussion
Thoughts (especially about the last point)? |
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Outdated
Show resolved
Hide resolved
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Outdated
Show resolved
Hide resolved
|
||
Vector128<sbyte> mask = Sse2.And(bitPositions, bitMask); | ||
|
||
Vector128<sbyte> ne0Gt = Sse2.CompareGreaterThan(mask, s_nullMaskSByte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're looking for the equivalent of "not equal to zero", I believe the following will produce more efficient codegen:
return Sse2.CompareEqual(Sse2.CompareEqual(mask, default), default);
(And it'd allow getting rid of the static field.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 👍
It safes the oring of the tmp-comparisons.
So far I didn't know that the JIT treats default
here the same way as Vector128<sbyte>.Zero
.
(And it'd allow getting rid of the static field.)
The static field is intentional, as the JIT hoists it outside the loop.
w/o static ro field
; ...
C4E25900DB vpshufb xmm3, xmm4, xmm3
C4E25100E7 vpshufb xmm4, xmm5, xmm7
C5D9DBDB vpand xmm3, xmm4, xmm3
C5D857E4 vxorps xmm4, xmm4
C5D057ED vxorps xmm5, xmm5
C5D174DB vpcmpeqb xmm3, xmm5, xmm3
C5D974DB vpcmpeqb xmm3, xmm4, xmm3
C5F9D7C3 vpmovmskb eax, xmm3
; ...
with static ro field
; ...
C5F9105808 vmovupd xmm3, xmmword ptr [rax+8]
G_M60692_IG11: ;; bbWeight=4 ; loop start
; ...
C4E25100E4 vpshufb xmm4, xmm5, xmm4
C4C24900E9 vpshufb xmm5, xmm6, xmm9
C5D1DBE4 vpand xmm4, xmm5, xmm4
C5B974E4 vpcmpeqb xmm4, xmm8, xmm4
C5B974E4 vpcmpeqb xmm4, xmm8, xmm4
C5F9D7C4 vpmovmskb eax, xmm4
; ...
Resolved the merge-conflict with master (src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/Sse2Helper.cs). |
Test failure is unrelated. I can merge if you have no further changes you wish to make. |
I have no further changes to this PR. |
It's in! Thanks so much for the contribution. :D |
…se3 intrinsics (dotnet/corefx#42073) Commit migrated from dotnet/corefx@ba320d4
Description
Follow up to #41933
For the vectorized char-path 2*8 chars are combined to one byte-vector (16 elements), then the byte-path for finding the first element, that needs to be escaped is taken. If there are less than 2*8 chars, a dummy char-vector is used, that is known to be be valid, i.e. it must not be escaped (here
A
is chosen)(cf. Use Sse2 instrinsics to make NeedsEscaping check faster for large JSON strings #41845 (comment))
The "escaping mask" is created via bit-masks. For an outline of the algorithm see Use Sse2 instrinsics to make NeedsEscaping check faster for large JSON strings #41845 (comment) (and / or comments in the code).
For the shuffling SSSE3 is needed (for a comment on SSSE3 see also Use Sse2 instrinsics to make NeedsEscaping check faster for large JSON strings #41845 (comment)).
These optimization are applied to
JavaScriptEncoder.Default
(=DefaultJavaScriptEncoderBasicLatin
), as suggested by @ahsonkhan in #41845 (comment).For the other two mentioned encoders, the bit-mask won't work (in a performant way), as some values above
0x7F
are valid. So to handle them properly, lots of overhead have to be introduced, that's why I left them out (in this PR, maybe later when I can think more about it).Benchmarks
Benchmarks are based on the ones from #41845
Machine info
A linux-vm was used.
Test_EscapingBenchmark
Perf.Basic
Master
PR_42073 (this one)
EscapingWriter
Master
PR_42073 (this one)
WriteJson
Master
PR_42073 (this one)
SerializingNuget
Master
PR_42073 (this one)