[Lens] add rare terms support #59430

Closed
timroes opened this issue Mar 5, 2020 · 11 comments
Labels
enhancement New value added to drive a business result Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@timroes
Contributor

timroes commented Mar 5, 2020

Include rare terms as a separate option. Allow configuration of:

  • max_doc_count
  • size

Include fix-it warnings and actions for terms sorted on count ascending.
Include fix-it warnings and actions for rare terms sorted on count descending.

Original issue

We could include the Terms, Significant Terms and Rare Terms into one aggregation in Lens. The three aggregations are mainly different in the way the terms are "sorted", so from a user perspective this could rather be options to the same aggregation.

Edit: due to some concerns about the change, the ES team recommended keeping these options separate.

@timroes timroes added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure Feature:Lens labels Mar 5, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@ghudgins

The minimum we could do is a transparent UI/UX:

  • user sorts Top Values by count
  • user selects "Ascending" (instead of descending)
    Result: use rare terms under the hood for this case

Downside: this might be a breaking change, but I'm not sure whether there's a real impact. The results will most likely be better.

@flash1293
Contributor

Chatted with @nik9000 and he recommends offering rare terms separately from ascending count. We could add a new entry to the ranking dropdown: "Rare terms".

@ghudgins ghudgins changed the title from "[Lens] Combine Terms/Significant Terms/Rare Terms" to "[Lens] add rare terms support" Jun 30, 2021
@nik9000
Member

nik9000 commented Jun 30, 2021

I had to reread some things but I think I can summarize the difference between sorting terms with count ascending and rare terms:

  • rare terms doesn't have a size - instead you control how many terms are returned by modifying the rarity. That's weird, but it's sort of required for that agg's definition of "correct".
  • rare terms might incorrectly leave out rare terms but it won't incorrectly report a term as rare when it isn't. terms with doc count sorted ascending will incorrectly report a term as rare when it isn't. And it might leave out rare terms too.

There are other differences, but that's the gist of it.
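
To make the distinction concrete, here is a minimal sketch of the two Elasticsearch request bodies being compared. The helper functions, the aggregation names (`bottom_values`, `rare_values`) and the field `host.name` are placeholders chosen for illustration; none of this is Kibana or Lens code.

```typescript
// Hypothetical helpers for illustration only; not Kibana/Lens code.

// Terms aggregation sorted by ascending doc count: it accepts a `size`,
// but (per the comment above) it may report a term as rare when it isn't.
function termsAscending(field: string, size: number) {
  return {
    size: 0, // return aggregations only, no hits
    aggs: {
      bottom_values: {
        terms: { field, size, order: { _count: "asc" } },
      },
    },
  };
}

// Rare terms aggregation: there is no `size`; `max_doc_count` is the
// rarity threshold that decides which terms are returned.
function rareTerms(field: string, maxDocCount: number) {
  return {
    size: 0,
    aggs: {
      rare_values: {
        rare_terms: { field, max_doc_count: maxDocCount },
      },
    },
  };
}

console.log(JSON.stringify(termsAscending("host.name", 5), null, 2));
console.log(JSON.stringify(rareTerms("host.name", 1), null, 2));
```

The only knob on the second request is the rarity threshold, which is why the number of returned terms cannot be fixed up front.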

@flash1293
Contributor

Good point @nik9000 - we would have to enforce the size on Lens side (or simply omit it for rare terms). Also I guess it makes sense to expose the max doc count to the user.
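
Enforcing the size on the client could look roughly like the sketch below. The `Bucket` shape mirrors the `key`/`doc_count` pairs Elasticsearch returns for bucket aggregations; the `limitRareBuckets` helper and the sample data are hypothetical and only illustrate the idea, not how Lens does it.

```typescript
// Hypothetical bucket shape and helper; not the actual Lens implementation.
interface Bucket {
  key: string;
  doc_count: number;
}

// Keep at most `size` buckets, rarest first. Ties are broken by key
// so the result is deterministic.
function limitRareBuckets(buckets: Bucket[], size: number): Bucket[] {
  return [...buckets]
    .sort((a, b) => a.doc_count - b.doc_count || a.key.localeCompare(b.key))
    .slice(0, Math.max(0, size));
}

// Made-up sample data for illustration.
const sample: Bucket[] = [
  { key: "svchost.exe", doc_count: 2 },
  { key: "winword.exe", doc_count: 1 },
  { key: "explorer.exe", doc_count: 2 },
];
console.log(limitRareBuckets(sample, 2).map((b) => b.key));
```

Truncating after the fact means the response may carry more buckets than are displayed, which is one reason to also expose the max doc count to the user.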

@nik9000
Member

nik9000 commented Jun 30, 2021

> Good point @nik9000 - we would have to enforce the size on Lens side (or simply omit it for rare terms). Also I guess it makes sense to expose the max doc count to the user.

It's really a strange thing to try and integrate, I buy that.

@ghudgins

ghudgins commented Jul 1, 2021

design ask on here: in addition to having the new mode we need to come up with a way to suggest chart configuration changes after we have more data (like field metadata or an imprecise result from elasticsearch)

design ask part II: need to display a max bucket size as definition of "rare"

@ghudgins

ghudgins commented Jul 7, 2021

heard this workflow today:

As a security researcher I want to find suspicious events.
I want to filter my Windows event logs for an important process.name such as rundll32.exe.
Then I want to look for a suspicious process.parent.name which should not use rundll32.exe.
I can do this by looking for the bottom N / Rare parent processes with a Lens aggregation that is filtered by my search query.
Then I can use Discover to look at the process and the suspicious parent process at the same time (AND condition).
The results of these need to be investigated further

I may or may not be using Rare Terms correctly; it's more like bottom N ... which are rare to me
we are doing Sort on Ascending
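
As a rough sketch, the workflow above could map to a request like the following if rare terms were used instead of an ascending sort. The `rareParentProcesses` helper and the `rare_parents` aggregation name are hypothetical, and the sketch assumes `process.name` and `process.parent.name` are mapped as keyword fields.

```typescript
// Hypothetical helper; assumes both fields are keyword-mapped.
function rareParentProcesses(processName: string, maxDocCount: number) {
  return {
    size: 0, // return aggregations only, no hits
    query: {
      bool: {
        // Restrict to events for the process of interest.
        filter: [{ term: { "process.name": processName } }],
      },
    },
    aggs: {
      rare_parents: {
        // Parent processes seen in at most `maxDocCount` matching events.
        rare_terms: { field: "process.parent.name", max_doc_count: maxDocCount },
      },
    },
  };
}

console.log(JSON.stringify(rareParentProcesses("rundll32.exe", 2), null, 2));
```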

@flash1293
Contributor

Discussed this offline and this is the plan: Expose it as a separate option for sorting. If selected, the user can specify both size and max doc count.

There will be a fix-it action in the accuracy warning to turn a count-ascending top values configuration into rare terms.
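
A fix-it action of this kind could be sketched as a pure function over the column configuration. The `Ranking` and `TopValuesColumn` types and the `toRareTerms` helper below are invented for illustration and do not reflect the real Lens types; the default `maxDocCount` of 1 is likewise an assumption.

```typescript
// Hypothetical column configuration; the real Lens types differ.
type Ranking =
  | { mode: "count"; direction: "asc" | "desc" }
  | { mode: "rare"; maxDocCount: number };

interface TopValuesColumn {
  field: string;
  size: number;
  ranking: Ranking;
}

// Returns a rare-terms version of a count-ascending column, or
// undefined when the fix does not apply. The input is not mutated.
function toRareTerms(
  col: TopValuesColumn,
  maxDocCount: number = 1
): TopValuesColumn | undefined {
  const { ranking } = col;
  if (ranking.mode !== "count" || ranking.direction !== "asc") {
    return undefined;
  }
  return { ...col, ranking: { mode: "rare", maxDocCount } };
}

const ascending: TopValuesColumn = {
  field: "process.parent.name",
  size: 5,
  ranking: { mode: "count", direction: "asc" },
};
console.log(toRareTerms(ascending));
```

Keeping `size` on the converted column matches the plan that the user can specify both size and max doc count.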

@ghudgins

ghudgins commented Jul 8, 2021

edited @timroes' original description since this issue has evolved

@ghudgins

implemented via #121500


5 participants