-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure Joni warning are logged at debug #57302
Ensure Joni warning are logged at debug #57302
Conversation
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that Grok filter. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.grok.Grok) at debug level. Generally these warning indicate a possible issue with the regular expression. Since Grok is an abstraction on top of the regex, these warning may not provide much value outside of a troubleshooting scenario. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
Pinging @elastic/es-core-features (:Core/Features/Ingest) |
@rjernst - I requested your review mostly for introduction of log4j from a lib. I think that is OK ?, but also had to configure the thridparty audit. |
@@ -101,7 +106,8 @@ private Grok(Map<String, String> patternBank, String grokPattern, boolean namedC | |||
|
|||
String expression = toRegex(grokPattern); | |||
byte[] expressionBytes = expression.getBytes(StandardCharsets.UTF_8); | |||
this.compiledExpression = new Regex(expressionBytes, 0, expressionBytes.length, Option.DEFAULT, UTF8Encoding.INSTANCE); | |||
this.compiledExpression = new Regex(expressionBytes, 0, expressionBytes.length, Option.DEFAULT, UTF8Encoding.INSTANCE, | |||
message -> logger.log(Level.DEBUG, message)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^ this line is only substantive change here. (it implements the callback so that it does not default to Std.err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Jake! I'm happy that this is an easy change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should add Log4j to Grok, I think consumers of Grok should push down a logger. This makes more sense to me anyway, that the logging control should occur through the feature that is using Grok (ingest, vs. ML). This means that the Grok pattern regex needs to become non-static, which I think is fine.
Yeah, I was unsure if we should do that. Are the lib projects subject to semver w.r.t the java API ?
Do think SLF4J (and we bind it to Log4j upstream) or pass in something custom like what Joni does ? (or possibly just expose the Joni callback) |
No, they are for internal use.
Neither, how about just expecting a |
I agree with Jason's assessment; we should not depend on log4j in a lib. Separately, are these warnings only emitted when running Grok? Can we not get them when we compile the pattern in the first place? It seems like these should be returned to a user, not hidden at debug level. They can indicate an error, like in your example maybe a |
Yes, as far as i can tell, the warnings are emitted at the runtime execution, not on the pattern compile.
Agreed, EDIT: I think I found a way to do this : #57302 (comment) |
this.traceMatch = traceMatch; | ||
this.ignoreMissing = ignoreMissing; | ||
// Joni warnings are only emitted on an attempt to match, and the warning emitted for every call to match which is too verbose | ||
// so here we emit a warning (if there is one) to the logfile at warn level on construction / processor creation. | ||
new Grok(patternBank, combinePatterns(matchPatterns, traceMatch), matcherWatchdog, logger::warn).match("___nomatch___"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kinda weird, but hopefully the comment explains why. This allows Joni warnings to be emitted at WARN level on PUT of the pipeline, but DEBUG for normal running.
If we put a WARN for normal running we can flood the log (and what this PR seeks to address)
@rjernst @jasontedor - ready for another review. I updated to use a callback for the log message and removed the log4j dependency in the lib (I had to intro it the module). I also added a call to match on the constructor to get it to emit the warning on construction.. its kinda hacky, but the only way i can see to get the warning in the log file without flooding the log file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. One more main question
@rjernst - mind taking another look? I fixed the license and dependency on https://github.com/elastic/elasticsearch/pull/57302/commits/2b2f269f49fe8e8c0d0f83392517f462543cb65b |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@rjernst thanks for the review |
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warning indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency then every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warning indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency then every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warning indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency then every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warning indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency then every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that through Grok. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warning indicate a possible issue with the regular expression and upon creation of the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation which is a much lower frequency then every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution again that pattern. For ingest node that means for every document that is run that Grok filter. Fortunately, Joni provides a call back hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warning to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. A warning will be emitted on processor creation at WARN level. Generally these warning indicate a possible issue with the regular expression. Since Grok is an abstraction on top of the regex, these warning may not provide much value outside of a troubleshooting scenario. Additionally, the documentation is updated with instructions for how to set the logger to debug level.
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that Grok filter. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.
This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. A warning will be emitted on processor creation at WARN
level. Generally these warning indicate a possible issue with
the regular expression. Since Grok is an abstraction on top of the regex,
these warning may not provide much value outside of a troubleshooting
scenario.
Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
===
Notes:
This error is pretty easily reproduced:
for input
and pattern
before this commit result in the following warning for every document run through the pattern.
Prior to 7.7 this message was incorrectly missed in our logs due to stderr not getting routed properly... that fix: #51569 now allows this error (as shown above) to persist in the ES server log possibly causing a lot of noise to the ES server log. This fix quiets that noise.