Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prep CompareInfo for spanification #32385

Merged

Conversation

GrabYourPitchforks
Copy link
Member

@GrabYourPitchforks GrabYourPitchforks commented Feb 16, 2020

This removes many (not all, see #8890) of the string-based p/invoke wrappers in CompareInfo and replaces them with span-based wrappers.

There's a little more error checking around the actual p/invokes themselves, but otherwise the intent of this PR is that there's no behavioral change to the existing surface area, and there's no new API being exposed.

This PR serves as the shared foundation for upcoming work such as #25428, #27912, and #27935.

This PR also reverts a small optimization we did to SortKey in #31779, since I was already touching SortKey-related code paths here.

@GrabYourPitchforks
Copy link
Member Author

Over the next few days I'll get the perf measurements for the change with SetLastError = true. Been heads-down with other work.

I don't expect this to have significant impact on typical API calls to string since we already special-case the common code paths of ordinal comparisons and all-ASCII data. These changes would primarily impact non-ASCII and culture-aware text processing code paths.

Would it be a strange behavioral difference to check last error only in debug builds? If we went this route I imagine we'd want to Debug.Assert rather than throw an exception, since that would give the smallest behavioral difference between debug and release. Do we follow this pattern elsewhere in the framework?

@jkotas
Copy link
Member

jkotas commented Feb 21, 2020

If we went this route I imagine we'd want to Debug.Assert rather than throw an exception

Yep.

@GrabYourPitchforks
Copy link
Member Author

Aside from the GetLastError in debug part is there any other feedback for this?

@tarekgh
Copy link
Member

tarekgh commented Mar 13, 2020

I added minor comments. LGTM otherwise. Thanks for your effort with it.

@GrabYourPitchforks
Copy link
Member Author

I believe all the feedback is now addressed. Waiting for CI to go green.

internal static extern int FindStringOrdinal(
uint dwFindStringOrdinalFlags,
char* lpStringSource,
int cchSource,
char* lpStringValue,
int cchValue,
int bIgnoreCase);
bool bIgnoreCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add marshaling frame. Does it have effect on performance for small strings?

You can use upper-case BOOL to get bool that is both zero-overhead and self-describing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean apply [MarshalAs(BOOL)] to avoid the marshaling frame?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we have an actual internal BOOL enum. Got it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to define BOOL.FromBoolean(bool value) => Unsafe.As<bool, BOOL>(ref value) to avoid the branching at the call sites. But that's probably an issue for a different PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it does not work. bool and BOOL have different sizes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, Unsafe.As<bool, byte>(ref value). The same thing we tried back in dotnet/coreclr#16138 but was killed.

@@ -211,11 +200,22 @@ private unsafe int CompareString(ReadOnlySpan<char> string1, string string2, Com

string? localeName = _sortHandle != IntPtr.Zero ? null : _sortName;

// CompareStringEx may try to dereference the first character of its input, even if an explicit
// length of 0 is specified. To work around potential AVs we'll always ensure zero-length inputs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? What is the input that it happens for?

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a new C++ application and drop in this code. It will AV the process.

	auto result = CompareStringEx(
		LOCALE_NAME_INVARIANT,
		NORM_LINGUISTIC_CASING,
		L"x",
		1,
		(LPCWCH)0xdeadbeef,
		0,
		nullptr,
		nullptr,
		0);

Edit: I reported it to the Windows team and it's being fixed in a future release. But we pessimistically have to work around it for now.

@@ -211,11 +200,22 @@ private unsafe int CompareString(ReadOnlySpan<char> string1, string string2, Com

string? localeName = _sortHandle != IntPtr.Zero ? null : _sortName;

// CompareStringEx may try to dereference the first character of its input, even if an explicit
// length of 0 is specified. To work around potential AVs we'll always ensure zero-length inputs
// are normalized to a null-terminated empty string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have test coverage for all these empty / null string corner cases?

Copy link
Member Author

@GrabYourPitchforks GrabYourPitchforks Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The unit tests that are part of #1514 (from which this PR was broken out as a more manageable chunk) cover this scenario. In fact, those unit tests are where the problem was originally discovered. :)

Edit: Looks like https://github.com/dotnet/runtime/blob/master/src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.Compare.cs also would cover this scenario when the larger "don't early-exit inappropriately" PR comes online.

@GrabYourPitchforks
Copy link
Member Author

I'm a little confused as to why CI is failing here. There are some things that look suspicious. From the Helix logs:

/private/tmp/helix/working/A96E0923/w/AB700940/e /private/tmp/helix/working/A96E0923/w/AB700940/e
  Discovering: System.Memory.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Memory.Tests (found 2523 of 2547 test cases)
  Starting:    System.Memory.Tests (parallel test collections = on, max threads = 4)
Process terminated. Assertion failed.
   at System.Globalization.CompareInfo.IndexOfOrdinalCore(ReadOnlySpan`1 source, ReadOnlySpan`1 value, Boolean ignoreCase, Boolean fromBeginning) in /_/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.Unix.cs:line 48
   at System.Globalization.CompareInfo.IndexOfOrdinal(String source, String value, Int32 startIndex, Int32 count, Boolean ignoreCase) in /_/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs:line 1115
   at System.Tests.StringTests.IndexOf_AllSubstrings(String s, String value, Int32 startIndex, StringComparison comparison) in /_/src/libraries/Common/tests/Tests/System/StringTests.cs:line 2801

Since the failure is at CompareInfo.Unix.cs, line 48, the assertion that's failing is the !GlobalizationMode.Invariant check below.

https://github.com/dotnet/runtime/blob/da2eb1dbdcfab8ebfc37db726f23c2c0936d12c9/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.Unix.cs#L46-L49

According to the stack trace the caller is CompareInfo.cs, line 1115, but there's no relevant code at that line. It's just another Debug.Assert statement. (Presumably that assertion succeeded, else control flow wouldn't have continued.)

https://github.com/dotnet/runtime/blob/da2eb1dbdcfab8ebfc37db726f23c2c0936d12c9/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs#L1113-L1116

There are only three places where we call IndexOfOrdinalCore. All three are guarded with either the same Debug.Assert(!GlobalizationMode.Invariant) (which clearly didn't fail) or with an explicit if check to route control flow appropriately.

https://github.com/dotnet/runtime/blob/da2eb1dbdcfab8ebfc37db726f23c2c0936d12c9/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs#L1130-L1137

https://github.com/dotnet/runtime/blob/da2eb1dbdcfab8ebfc37db726f23c2c0936d12c9/src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs#L969-L984

@jkotas, @tarekgh, any thoughts as to why CI might report such an error? And why the error would occur only on OSX and not any other Unix platform?

@jkotas
Copy link
Member

jkotas commented Mar 13, 2020

I do not see anything obvious. It can be a JIT bug or data corruption - lots of unsafe code in globalization.

@tarekgh
Copy link
Member

tarekgh commented Mar 13, 2020

maybe try to add some messages to the asserts in the involved files which may help give some clue.

@tarekgh
Copy link
Member

tarekgh commented Mar 13, 2020

The other idea could be trying repro it on OSX machine

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Mar 13, 2020

I'll investigate a little more with the assumption that the stack trace is lying. :)
Edit: Yes, turned out that the stack trace was indeed reporting incorrect line numbers. I've put a new commit to work around the edge cases that were hitting this assert.

@GrabYourPitchforks GrabYourPitchforks merged commit faf08f7 into dotnet:master Mar 14, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the spanify_compareinfo branch March 14, 2020 02:43
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants