Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use query in cardinality check #49939

Conversation

benwtrent
Copy link
Member

When checking the cardinality of a field, the query should be take into account. The user might know about some bad data in their index and want to filter down to the target_field values they care about.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@benwtrent
Copy link
Member Author

run elasticsearch-ci/2

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Nice one!

protected static DataFrameAnalyticsConfig buildAnalytics(String id, String sourceIndex, String destIndex,
@Nullable String resultsField, DataFrameAnalysis analysis,
QueryBuilder queryBuilder) throws Exception {
return new DataFrameAnalyticsConfig.Builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other method called buildAnalytics (starting in line 165) should call this one, as the new method is more generic.

@@ -248,6 +250,27 @@ public void testDependentVariableCardinalityTooHighError() {
assertThat(e.getMessage(), equalTo("Field [keyword-field] must have at most [2] distinct values but there were at least [3]"));
}

public void testDependentVariableCardinalityTooHighErrorButWithQuery() throws Exception {
initialize("cardinality_too_high");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the names assigned to jobs unique (currently this one and the one in line 236 are the same).

@benwtrent
Copy link
Member Author

@elasticmachine update branch

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit dd66fae into elastic:master Dec 9, 2019
@benwtrent benwtrent deleted the feature/ml-dataframe-use-query-for-cardinality-check branch December 9, 2019 13:50
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Dec 9, 2019
benwtrent added a commit that referenced this pull request Dec 9, 2019
When checking the cardinality of a field, the query should be take into account. The user might know about some bad data in their index and want to filter down to the target_field values they care about.
@jpountz jpountz changed the title [ML] Use query in cardinality check Use query in cardinality check Dec 18, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
When checking the cardinality of a field, the query should be take into account. The user might know about some bad data in their index and want to filter down to the target_field values they care about.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants