Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure Joni warning are logged at debug #57302

Merged

Conversation

jakelandis
Copy link
Contributor

@jakelandis jakelandis commented May 28, 2020

When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that Grok filter. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. A warning will be emitted on processor creation at WARN
level. Generally these warning indicate a possible issue with
the regular expression. Since Grok is an abstraction on top of the regex,
these warning may not provide much value outside of a troubleshooting
scenario.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.

===

Notes:

This error is pretty easily reproduced:

for input

[foo]

and pattern

.*\[.*%{SPACE}*\].*

before this commit result in the following warning for every document run through the pattern.

[2020-05-28T11:21:31,050][WARN ][stderr                   ] [runTask-0] regular expression has redundant nested repeat operator * /.*\[.*(?:\s*)*\].*/

Prior to 7.7 this message was incorrectly missed in our logs due to stderr not getting routed properly... that fix: #51569 now allows this error (as shown above) to persist in the ES server log possibly causing a lot of noise to the ES server log. This fix quiets that noise.

When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level.  When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that Grok filter. Fortunately, Joni provides a call back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.grok.Grok)
at debug level. Generally these warning indicate a possible issue with the regular expression. Since Grok is an abstraction on top of the regex, these warning may not provide much value outside of a troubleshooting scenario.

Additionally, the documentation is updated with instructions for how to set the logger to debug level.
@jakelandis jakelandis added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.0.0 v7.7.1 v7.8.1 v7.9.0 labels May 28, 2020
@jakelandis jakelandis requested a review from rjernst May 28, 2020 17:15
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label May 28, 2020
@jakelandis
Copy link
Contributor Author

@rjernst - I requested your review mostly for introduction of log4j from a lib. I think that is OK ?, but also had to configure the thridparty audit.

@@ -101,7 +106,8 @@ private Grok(Map<String, String> patternBank, String grokPattern, boolean namedC

String expression = toRegex(grokPattern);
byte[] expressionBytes = expression.getBytes(StandardCharsets.UTF_8);
this.compiledExpression = new Regex(expressionBytes, 0, expressionBytes.length, Option.DEFAULT, UTF8Encoding.INSTANCE);
this.compiledExpression = new Regex(expressionBytes, 0, expressionBytes.length, Option.DEFAULT, UTF8Encoding.INSTANCE,
message -> logger.log(Level.DEBUG, message));
Copy link
Contributor Author

@jakelandis jakelandis May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ this line is only substantive change here. (it implements the callback so that it does not default to Std.err)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jake! I'm happy that this is an easy change.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add Log4j to Grok, I think consumers of Grok should push down a logger. This makes more sense to me anyway, that the logging control should occur through the feature that is using Grok (ingest, vs. ML). This means that the Grok pattern regex needs to become non-static, which I think is fine.

@jakelandis
Copy link
Contributor Author

jakelandis commented May 28, 2020

I don't think we should add Log4j to Grok

Yeah, I was unsure if we should do that. Are the lib projects subject to semver w.r.t the java API ?

I think consumers of Grok should push down a logger

Do think SLF4J (and we bind it to Log4j upstream) or pass in something custom like what Joni does ? (or possibly just expose the Joni callback)

@jasontedor
Copy link
Member

jasontedor commented May 28, 2020

Yeah, I was unsure if we should do that. Are the lib projects subject to semver w.r.t the java API ?

No, they are for internal use.

Do think SLF4J (and we bind it to Log4j upstream) or pass in something custom like what Joni does ? (or possibly just expose the Joni callback)

Neither, how about just expecting a Consumer<String> when constructing Grok?

@rjernst
Copy link
Member

rjernst commented May 28, 2020

I agree with Jason's assessment; we should not depend on log4j in a lib.

Separately, are these warnings only emitted when running Grok? Can we not get them when we compile the pattern in the first place? It seems like these should be returned to a user, not hidden at debug level. They can indicate an error, like in your example maybe a + should have been there after space instead of a * (I realize that doesn't make sense in this case, but image space is something else meaningful).

@droberts195 droberts195 added v7.7.2 and removed v7.7.1 labels May 28, 2020
@jakelandis
Copy link
Contributor Author

jakelandis commented May 28, 2020

Separately, are these warnings only emitted when running Grok?

Yes, as far as i can tell, the warnings are emitted at the runtime execution, not on the pattern compile.

Can we not get them when we compile the pattern in the first place?

Not that I am aware.

It seems like these should be returned to a user, not hidden at debug level

Agreed, however without some kind throttled logger (i.e. only log this at most once every 10m) I am not sure how that could be done. This is why i included the doc to help out with manually increasing the log level. Groks is an abstraction over regex, and the resultant regex (even for basic patterns) can get so large they become almost human un-readable. I think at best with real world patterns it is a signal that something might be off, but not likely something concrete actionable from the warning.

EDIT: I think I found a way to do this : #57302 (comment)

this.traceMatch = traceMatch;
this.ignoreMissing = ignoreMissing;
// Joni warnings are only emitted on an attempt to match, and the warning emitted for every call to match which is too verbose
// so here we emit a warning (if there is one) to the logfile at warn level on construction / processor creation.
new Grok(patternBank, combinePatterns(matchPatterns, traceMatch), matcherWatchdog, logger::warn).match("___nomatch___");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kinda weird, but hopefully the comment explains why. This allows Joni warnings to be emitted at WARN level on PUT of the pipeline, but DEBUG for normal running.

If we put a WARN for normal running we can flood the log (and what this PR seeks to address)

@jakelandis
Copy link
Contributor Author

@rjernst @jasontedor - ready for another review. I updated to use a callback for the log message and removed the log4j dependency in the lib (I had to intro it the module). I also added a call to match on the constructor to get it to emit the warning on construction.. its kinda hacky, but the only way i can see to get the warning in the log file without flooding the log file.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. One more main question

modules/ingest-common/build.gradle Outdated Show resolved Hide resolved
modules/ingest-common/license/log4j-api-LICENSE.txt Outdated Show resolved Hide resolved
@jakelandis
Copy link
Contributor Author

@rjernst - mind taking another look? I fixed the license and dependency on https://github.com/elastic/elasticsearch/pull/57302/commits/2b2f269f49fe8e8c0d0f83392517f462543cb65b

@jakelandis
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakelandis
Copy link
Contributor Author

@rjernst thanks for the review
@jasontedor - merging - your requested change (removing the log4j dependency) has been addressed.

@jakelandis jakelandis merged commit f591066 into elastic:master Jun 9, 2020
@jakelandis jakelandis deleted the route_joni_warnings_to_debug_log branch June 9, 2020 18:33
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation which
is a much lower frequency then every document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation which
is a much lower frequency then every document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
jakelandis added a commit that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
jakelandis added a commit that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation which
is a much lower frequency then every document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
jakelandis added a commit that referenced this pull request Jun 9, 2020
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that Grok filter. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. A warning will be emitted on processor creation at WARN
level. Generally these warning indicate a possible issue with
the regular expression. Since Grok is an abstraction on top of the regex,
these warning may not provide much value outside of a troubleshooting
scenario.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v7.7.2 v7.8.1 v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants