[ML] AIOps: Adds dip support to log rate analysis in ML AIOps Labs #163100

Merged
merged 21 commits into elastic:main from 161832-ml-aiops-detect-spike-or-dip
Aug 9, 2023

Conversation

walterra
Contributor

@walterra walterra commented Aug 3, 2023

Summary

Part of #161832.

This updates log rate analysis to auto-detect whether the selected deviation is a spike or a dip compared to the baseline time range. To achieve this, we compare the median bucket size of the two selections. If a dip is detected, the analysis switches the window parameters sent to the API endpoint before running the analysis.
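
The detection step described above could look roughly like the following TypeScript sketch. It is not the code from this PR: the names `HistogramBucket`, `detectAnalysisType`, and `toRequestWindow`, the `WindowParameters` field names, and the tie-breaking rule (equal medians count as a spike) are assumptions made for illustration, and "switching" the window parameters is read here as swapping baseline and deviation.

```typescript
// Illustrative sketch only; names and shapes are assumptions, not the PR's code.
type LogRateAnalysisType = 'spike' | 'dip';

interface HistogramBucket {
  time: number; // bucket start, epoch milliseconds
  value: number; // doc count in that bucket
}

interface WindowParameters {
  baselineMin: number;
  baselineMax: number;
  deviationMin: number;
  deviationMax: number;
}

// Median of a list of numbers; returns 0 for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Doc counts of all buckets whose start time falls inside [min, max].
function countsInRange(buckets: HistogramBucket[], min: number, max: number): number[] {
  return buckets.filter((b) => b.time >= min && b.time <= max).map((b) => b.value);
}

// Compare the median bucket size of the two selections.
// Assumption: equal medians are treated as a spike.
function detectAnalysisType(
  buckets: HistogramBucket[],
  wp: WindowParameters
): LogRateAnalysisType {
  const baselineMedian = median(countsInRange(buckets, wp.baselineMin, wp.baselineMax));
  const deviationMedian = median(countsInRange(buckets, wp.deviationMin, wp.deviationMax));
  return deviationMedian >= baselineMedian ? 'spike' : 'dip';
}

// For a dip, swap baseline and deviation before sending the request,
// so the results describe what was present in the baseline.
function toRequestWindow(type: LogRateAnalysisType, wp: WindowParameters): WindowParameters {
  if (type === 'spike') return wp;
  return {
    baselineMin: wp.deviationMin,
    baselineMax: wp.deviationMax,
    deviationMin: wp.baselineMin,
    deviationMax: wp.baselineMax,
  };
}
```

Using the median rather than the mean keeps a single outlier bucket from flipping the detected type.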

An info callout points out the auto-selected analysis type and explains which time range the analysis results refer to. This makes it clear that for dip analysis the significant terms and their doc counts refer to the baseline time range, and for spike analysis to the deviation time range.

Log rate spike

image

Log rate dip

image

Functional tests

The artificial logs dataset generator for functional tests was updated to be able to also produce a dataset with a dip. Functional tests have been added to make use of that:

image
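
As a rough illustration of what "a dataset with a dip" means for the tests, the sketch below generates per-bucket doc counts with a lower rate inside a deviation window. It is a hypothetical stand-in, not the generator used by the functional tests; `generateBucketCounts` and all of its options are invented for this example.

```typescript
// Hypothetical helper for illustration; not the functional test generator from this PR.
interface GeneratorOptions {
  bucketCount: number; // total number of time buckets to produce
  startTime: number; // epoch milliseconds of the first bucket
  intervalMs: number; // bucket width in milliseconds
  baselineRate: number; // doc count per bucket outside the deviation window
  deviationRate: number; // doc count per bucket inside the deviation window
  deviationStart: number; // index of the first deviation bucket (inclusive)
  deviationEnd: number; // index just past the last deviation bucket (exclusive)
}

interface GeneratedBucket {
  time: number;
  docCount: number;
}

// A deviationRate below baselineRate yields a dip, above it a spike.
function generateBucketCounts(o: GeneratorOptions): GeneratedBucket[] {
  return Array.from({ length: o.bucketCount }, (_, i) => ({
    time: o.startTime + i * o.intervalMs,
    docCount: i >= o.deviationStart && i < o.deviationEnd ? o.deviationRate : o.baselineRate,
  }));
}

const dipDataset = generateBucketCounts({
  bucketCount: 6,
  startTime: 0,
  intervalMs: 60_000,
  baselineRate: 100,
  deviationRate: 10,
  deviationStart: 2,
  deviationEnd: 4,
});
```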

Observability Alert Details Page

In Observability, since we now auto-detect the analysis type, we no longer need to pass the analysis type from the alert context on the alert details page. Instead, the analysis type is part of the onAnalysisComplete() callback. The prompt for the AI assistant was updated to include the spike/dip information that is also shown to users in the callout.

image image
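
A minimal sketch of how a consumer might use the analysis type delivered through the callback to build context for the AI assistant prompt. The parameter shape follows the `logRateAnalysisType` / `significantFieldValues` pair visible in the diff further down this page; the fields of `SignificantFieldValue`, the `buildPromptContext` helper, and the prompt wording are assumptions for illustration only.

```typescript
type LogRateAnalysisType = 'spike' | 'dip';

// Assumed minimal shape for illustration; the real type may differ.
interface SignificantFieldValue {
  field: string;
  value: string | number;
  docCount: number;
}

interface LogRateAnalysisParams {
  logRateAnalysisType: LogRateAnalysisType;
  significantFieldValues: SignificantFieldValue[];
}

type OnAnalysisComplete = (params: LogRateAnalysisParams) => void;

// Hypothetical prompt builder; the wording is illustrative, not the PR's prompt.
// For a dip the results refer to the baseline range, for a spike to the deviation range.
function buildPromptContext(params: LogRateAnalysisParams): string {
  const { logRateAnalysisType, significantFieldValues } = params;
  const referenceRange = logRateAnalysisType === 'dip' ? 'baseline' : 'deviation';
  const header =
    `A log rate ${logRateAnalysisType} was detected. ` +
    `The field values and doc counts below refer to the ${referenceRange} time range.`;
  const rows = significantFieldValues.map(
    (item) => `${item.field}=${item.value} (doc count: ${item.docCount})`
  );
  return [header, ...rows].join('\n');
}

// The caller no longer supplies the analysis type; it arrives with the results.
const prompts: string[] = [];
const onAnalysisComplete: OnAnalysisComplete = (params) => {
  prompts.push(buildPromptContext(params));
};

onAnalysisComplete({
  logRateAnalysisType: 'dip',
  significantFieldValues: [{ field: 'response_code', value: 500, docCount: 42 }],
});
```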

@walterra walterra self-assigned this Aug 3, 2023
@walterra walterra force-pushed the 161832-ml-aiops-detect-spike-or-dip branch 3 times, most recently from 34691f9 to c7d80fe Compare August 3, 2023 20:15
@walterra walterra force-pushed the 161832-ml-aiops-detect-spike-or-dip branch from 3297e7e to 2223676 Compare August 3, 2023 20:37
@walterra walterra added the release_note:enhancement, :ml, Feature:ML/AIOps (ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis), and v8.10.0 labels Aug 4, 2023
@walterra walterra marked this pull request as ready for review August 4, 2023 14:03
@walterra walterra requested review from a team as code owners August 4, 2023 14:03
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@walterra walterra requested a review from qn895 August 4, 2023 14:04
Contributor

@peteharverson peteharverson left a comment

Overall looks good - just tested inside the ML AIOps Labs page. All the examples I ran correctly detected if it was a dip or a spike.

Left a few comments, mostly related to the text.

})
: i18n.translate('xpack.aiops.analysis.analysisTypeDipCallOutContent', {
defaultMessage:
'The median log rate in the selected deviation time range is lower than the baseline. Therefore, the analysis results table shows statistically significant items within the baseline time range that are less present or missing within the deviation time range. The "doc count" column refers to the amount of documents in the baseline time range.',
Contributor

Is "less in number" better than "less present"? Any thoughts, @szabosteve?

Contributor Author

Updated in 4f8c71a.

Contributor

Sorry, I'm late to the party! Yes, less in number is a clearer solution. Thanks for updating!

*/
time: number | string;
/**
* Number of doc count for that time bucket
Contributor

Are we able to calculate how the doc count (per bucket) for the deviation compares to the baseline? I end up wanting to know how the counts compare, rather than just a single count number for the baseline / deviation.

Contributor Author

Yes, I thought about that too. It would be good to have both a baseline and a deviation column in the table. To make the numbers comparable, we could show the median per bucket (the same measure we use to decide whether it's a spike or a dip). I added an item to the meta issue and would like to add that in a separate PR: #160247

Member

@weltenwort weltenwort left a comment

deferring to the @elastic/actionable-observability team for review of the alerting-related changes

Contributor

@peteharverson peteharverson left a comment

Latest edits LGTM

@qn895
Member

qn895 commented Aug 8, 2023

LGTM 🎉

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #53 / Machine Learning modules get_module lists all modules
  • [job] [logs] FTR Configs #56 / spaces api with security resolve copy to spaces conflicts rbac user with all globally from the default space single-namespace types "before each" hook for "should return 200 when not overwriting, with references"

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
aiops 439 441 +2
dataVisualizer 538 540 +2
infra 1385 1386 +1
total +5

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/aiops-components 6 0 -6

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
aiops 540.3KB 542.3KB +2.0KB
dataVisualizer 604.6KB 605.0KB +394.0B
infra 2.0MB 2.0MB +691.0B
total +3.1KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
@kbn/aiops-components 1 0 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
aiops 6.2KB 6.0KB -206.0B

Unknown metric groups

API count

id before after diff
@kbn/aiops-components 30 33 +3
@kbn/aiops-utils 12 20 +8
aiops 60 57 -3
total +8

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @walterra

Contributor

@benakansara benakansara left a comment

LGTM

@@ -60,11 +58,9 @@ export const LogRateAnalysis: FC<AlertDetailsLogRateAnalysisSectionProps> = ({ r
   const [dataView, setDataView] = useState<DataView | undefined>();
   const [esSearchQuery, setEsSearchQuery] = useState<QueryDslQueryContainer | undefined>();
   const [logRateAnalysisParams, setLogRateAnalysisParams] = useState<
-    { significantFieldValues: SignificantFieldValue[] } | undefined
+    | { logRateAnalysisType: LogRateAnalysisType; significantFieldValues: SignificantFieldValue[] }
Contributor

nit: There is an extra | here.

Contributor Author

That's caused by our linting rules because the first item starts at a new line.
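
For context on the leading `|`: TypeScript accepts an optional leading pipe in a union type, and formatters such as Prettier commonly emit one when the first member of a multi-line union starts on its own line. The sketch below shows the two forms side by side; the `SignificantFieldValue` shape and the type alias names are assumptions for illustration.

```typescript
type LogRateAnalysisType = 'spike' | 'dip';

// Assumed minimal shape for illustration; the real type has more fields.
interface SignificantFieldValue {
  field: string;
  value: string | number;
}

// Short union that fits on one line: no leading pipe.
type ParamsSingleLine = { significantFieldValues: SignificantFieldValue[] } | undefined;

// Once the first member moves to its own line, a leading `|` is added.
// It is purely cosmetic and does not add a union member.
type ParamsMultiLine =
  | { logRateAnalysisType: LogRateAnalysisType; significantFieldValues: SignificantFieldValue[] }
  | undefined;

// The same union written with a leading pipe is assignable in both directions.
type WithLeadingPipe =
  | 'spike'
  | 'dip';

const withoutPipe: LogRateAnalysisType = 'dip';
const withPipe: WithLeadingPipe = withoutPipe;
const roundTrip: LogRateAnalysisType = withPipe;

const legacyParams: ParamsSingleLine = { significantFieldValues: [] };
const params: ParamsMultiLine = { logRateAnalysisType: withPipe, significantFieldValues: [] };
```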

@walterra walterra merged commit da0fb1d into elastic:main Aug 9, 2023
@walterra walterra deleted the 161832-ml-aiops-detect-spike-or-dip branch August 9, 2023 06:05
bryce-b pushed a commit to bryce-b/kibana that referenced this pull request Aug 9, 2023
elastic#163100)

@peteharverson peteharverson changed the title [ML] AIOps: Auto-detect if spike or dip selected in log rate analysis. [ML] AIOps: Adds dip support log rate analysis in ML AIOps Labs Aug 22, 2023
@peteharverson peteharverson changed the title [ML] AIOps: Adds dip support log rate analysis in ML AIOps Labs [ML] AIOps: Adds dip support to log rate analysis in ML AIOps Labs Aug 22, 2023
Labels
Feature:ML/AIOps (ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis), :ml, release_note:enhancement, v8.10.0

8 participants