normalize search string to NFC before comparison #1272
Merged
Description
Normalize the search string to NFC, since all data in LF is stored on disk in NFC. This lets both exact-match and ignore-diacritics queries work regardless of the query's normalization form or language (for example, Korean).
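As an illustration of the idea only (not the code changed in this PR), here is a minimal Python sketch of comparing a query against NFC-stored data. The function names `to_nfc`, `strip_diacritics`, `matches_exact`, and `matches_ignoring_diacritics` are hypothetical, and the sketch assumes stored values are already in NFC.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the NFC (composed) form of text."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Decompose, drop combining marks, then recompose to NFC.

    Hangul jamo have combining class 0, so they are kept and
    recomposed into syllables rather than being stripped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.combining(ch) == 0
    )
    return unicodedata.normalize("NFC", without_marks)


def matches_exact(query: str, stored: str) -> bool:
    """Exact match: normalize the query to NFC (assumption: stored is NFC)."""
    return to_nfc(query) == stored


def matches_ignoring_diacritics(query: str, stored: str) -> bool:
    """Diacritic-insensitive match: strip marks on both sides first."""
    return strip_diacritics(query) == strip_diacritics(stored)
```

With this sketch, a decomposed Korean query such as `"\u1100\u1161\u11b7"` matches the stored composed syllable `"\uac10"` in both modes, while `"cafe"` matches `"caf\u00e9"` only when diacritics are ignored.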
A note about this fix:
Fixes #1244
Type of Change
Tests and Test Data
Consider the single Korean character below:
감=감
To the left of the equals sign is the single composed NFC character (1 code point). To the right is the decomposed NFD form (3 code points). The two are canonically equivalent and should display identically wherever Korean is properly supported. Interestingly, my Windows machine doesn't render the NFD form correctly, as seen in the character identifier screenshot below. My web browser displays it just fine, as you can see in this PR description.
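To make the difference between the two forms concrete, the following Python sketch lists the code points of each. The helper `code_points` is a hypothetical name introduced for illustration; the normalization calls are from Python's standard `unicodedata` module.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List each code point in text as a U+XXXX string."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The composed (NFC) syllable and its canonical decomposition (NFD).
composed = "\uac10"
decomposed = unicodedata.normalize("NFD", composed)

print(code_points(composed))    # ['U+AC10']
print(code_points(decomposed))  # ['U+1100', 'U+1161', 'U+11B7']

# The raw strings differ, but they are equal once both are in NFC.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A plain string comparison treats the two forms as different, which is why an un-normalized NFD query would miss NFC data.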
Test 1 - Query match with "match diacritics"
Steps:
Test 2 - Query match with "ignore diacritics" (default behavior)
Steps:
Screencast demo from this branch
Checklist: