I don't understand Case- and diacritics sensitivity #518

Closed
fishfree opened this issue Apr 22, 2024 · 1 comment

@fishfree

The doc here says:

- sensitive or s: case+diacritics sensitive only
- insensitive or i: case+diacritics insensitive only
- sensitive_insensitive or si: case+diacritics sensitive and insensitive
- all: all four combinations of case-sensitivity and diacritics-sensitivity

My understanding is as below:

  • sensitive or s: case and diacritics are both sensitive
  • insensitive or i: case and diacritics are both insensitive
  • sensitive_insensitive or si: case is sensitive and diacritics are insensitive
  • all: I don't understand this one at all :-(

Am I right?

@jan-niestadt
Member

sensitive_insensitive means two versions of your input docs will be indexed:

  • one that's both case-sensitive AND diacritics-sensitive
  • one that's both case-insensitive AND diacritics-insensitive

all would index all 4 combinations, i.e. CD, cd, Cd, cD (where C means case-sensitive, c means case-insensitive, D means diacritics-sensitive and d means diacritics-insensitive). That would allow you to, for example, search case-insensitively but diacritics-sensitively.
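
To make the four combinations concrete, here is a minimal Python sketch of which forms of a token each setting would store. It is only an illustration, not BlackLab's actual code: the function names `strip_diacritics` and `indexed_forms` are hypothetical, and it assumes "insensitive" means lowercasing and/or removing combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks, then recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def indexed_forms(token: str, setting: str) -> dict[str, str]:
    """Return the forms of `token` stored under a (hypothetical) sensitivity setting.

    Keys: C/c = case-sensitive/insensitive, D/d = diacritics-sensitive/insensitive.
    """
    forms = {
        "CD": token,                            # stored unchanged
        "cd": strip_diacritics(token).lower(),  # lowercased, marks removed
        "Cd": strip_diacritics(token),          # case kept, marks removed
        "cD": token.lower(),                    # lowercased, marks kept
    }
    wanted = {
        "sensitive": ["CD"],
        "insensitive": ["cd"],
        "sensitive_insensitive": ["CD", "cd"],
        "all": ["CD", "cd", "Cd", "cD"],
    }[setting]
    return {key: forms[key] for key in wanted}


print(indexed_forms("\u00c9lan", "sensitive_insensitive"))
# {'CD': 'Élan', 'cd': 'elan'}
```

With `sensitive_insensitive` there is no cD form, so a search that ignores case but still distinguishes "élan" from "elan" has nothing to match against. Only `all` stores that form.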

all is not well supported by BlackLab yet, however. We don't currently use it, and I'm not sure if it works at all. Pull requests are welcome, and I'd be happy to give you some pointers on where to start if you want.
