Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for deprecations when analyzers are built #50908

Merged
merged 6 commits into from
Jan 14, 2020

Conversation

romseygeek
Copy link
Contributor

Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
are now emitted when indexes are created or opened.

Fixes #42349

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Analysis)

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romseygeek I took a first look and was wondering how version-dependent logic like deprecation from V7.6 on and throwing errors starting with a later version of the analysis components would work. Can you add a small example for this maybe in some tests?

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a short question, rest looks good to me.

analyze =
TransportAnalyzeAction.analyze(req, registry, mockIndexService(), maxTokenCount);
assertEquals(1, analyze.getTokens().size());
assertWarnings("Using deprecated token filter [deprecated]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small observation: I thought this checks that the normalize method in DeprecatedTokenFilterFactory, so I tried changing it but the test didn't fail, do you know why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the default implementation of normalize delegates to create, so we get the warning from there instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that sounds like the "normalize()" method isn't actualy tested here? I saw we cover that in some other test though, so fine with whatever you decide doing here, I was just curious if this could be checked here through the "_analyze" API as well, but no problem it its too tricky I.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's tested, but in order to make it fail you can't just remove the method (because that then delegates to create()), you have to have an implementation that doesn't emit a warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've just realised that we're talking about the AnalyzeAction - this always uses create() rather than normalize(), and we should probably change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably change that

This can most likely be done in a follow up PR though, I just wanted to understand whats going on here and which of the two methods in the DeprecatedTokenFilterFactory in this test is supposed to fire the warning, currently they emit the same text so its hard to tell which code path is checked.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two minor comments that would be nice to address, LGTM otherwise.


// Deprecation warnings are emitted when actual TokenStreams are built; this is usually
// too late to be useful, so we build an empty tokenstream at construction time and
// use it, to ensure that warnings are emitted immediately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also throw exceptions in some cases so it's not only about deprecations ?

@@ -264,4 +271,90 @@ public void testEnsureCloseInvocationProperlyDelegated() throws IOException {
registry.close();
verify(mock).close();
}

public void testDeprecations() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test that throws an exception ?

@romseygeek
Copy link
Contributor Author

Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes #42349

@romseygeek romseygeek merged commit 736ed47 into elastic:master Jan 14, 2020
@romseygeek romseygeek deleted the analysis/deprecations-check branch January 14, 2020 13:12
romseygeek added a commit that referenced this pull request Jan 14, 2020
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes #42349
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes elastic#42349
romseygeek added a commit that referenced this pull request Apr 21, 2021
The lucene Ukrainian analyzer has a bug where a large in-memory
dictionary is loaded and stored on a thread local for every tokenstream
generated in a new thread (for more details see
https://issues.apache.org/jira/browse/LUCENE-9930). Due to checks
added in #50908, we create a tokenstream for every registered
analyzer in every shard, which means that any node with the ukrainian
plugin installed will leak one copy of this dictionary per shard,
whether or not the ukrainian analyzer is actually being used.

This commit makes the plugin use a fixed version of the
UkrainianMorfologikAnalyzer, until we merge a version of lucene that
contains the upstream fix.
romseygeek added a commit that referenced this pull request Apr 21, 2021
The lucene Ukrainian analyzer has a bug where a large in-memory
dictionary is loaded and stored on a thread local for every tokenstream
generated in a new thread (for more details see
https://issues.apache.org/jira/browse/LUCENE-9930). Due to checks
added in #50908, we create a tokenstream for every registered
analyzer in every shard, which means that any node with the ukrainian
plugin installed will leak one copy of this dictionary per shard,
whether or not the ukrainian analyzer is actually being used.

This commit makes the plugin use a fixed version of the
UkrainianMorfologikAnalyzer, until we merge a version of lucene that
contains the upstream fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it easier to deprecate analysis components
5 participants