-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "all field" execution mode to query_string query #20925
Conversation
I am opening this PR so we can discuss the direction we'd like the "all" query For more back story, see: #19784 (comment) I'd like to get feedback about whether this is the way we want to build this |
Other features off the top of my head that would be good to eventually incorporate:
|
I did not read the code but only the description of the PR and it looks great. I like it! Something like |
The naming is totally up for grabs! I named it "all" because I was planning on punting to get feedback, so other suggestions are totally welcome! |
Thanks for the work on this @dakrone Working from your description alone I have a couple of points: I think the use of "cross_fields" is irrelevant here if you're just going to "constant_score" everything. at the end. In this scenario the only thing cross_fields is doing for you is adding a pointless limitation that all fields must share a common analyzer (I think it silently reverts to a "normal" query if this is the case rather than erroring) I can see we might want to ignore IDF in scoring but I still think Would be nice for us to know more about which of the query string operators (phrase, field, AND/OR) tend to get used by folks in practice. |
I wonder if we can avoid the new query path and provides this functionality in the existing parser.
... and apply your logic on the "foo bar" part only. I am saying this because it would be difficult to handle query_string (and simple_query_string) without a default field which seems to provide a nice OOTB.
I am not sure about the numerics for numerics fields and ip for ip fields restriction. I would prefer another approach, something like create a query on this field if the input is compatible and ignore the part that are not parseable by this field. In such case 200 would also be added to the text-like fields (and the ip as well).
Since you can group each field depending on the analyzer I think that you could use the analyzer directly for the text fields instead of relying on the whitespace analyzer upfront. For numerics and others you could split on whitespace but it's really counter intuitive to have this split before the analysis takes place (see #20841)
The |
Thanks for the feedback @markharwood & @jimferenczi!
That's a great idea, I am hoping some Kibana users (and others!) chime in here with any information about use cases!
I think that is the eventual goal of this, however, I think right now it's more complex, and it'd be good to nail down what we want the "all-ish" query to be before changing the way
This is what it already does, I apologize, I simplified in my explanation, but it will still search text/keyword fields for "200" and "127.0.0.1", not only numeric and IP fields. It only ignores fields that it cannot parse the input text as.
I experimented with doing this at first, but ran into the problem where someone sends the query
I think re-evaluating this (using |
I really like how this PR reused the MultiMatchQuery to create the inner queries. Regarding scoring I don't think this query should produce a constant score query. Since it uses the The code looks good to me, it's simple and well contained. Tough in order to be able to push a first version of this I think we should discuss the following points:
|
To reiterate the conclusions in Prague: If a user wants to do real full text search against a single field which contains the contents of multiple other fields, they should use This leaves the "discovery" use case: find these search elements regardless of which fields they're in, or what type the field is. For this use case, we spoke about adding a new The main use case for this That said, the change in LUCENE-2605 is not without problems. It works well when there is a single default field, but breaks when there are multiple The gist of all of the above is that these same issues are shared by the query string query and by the proposed There are three types of queries to deal with:
Honestly, the more I write here, the less I like the idea of removing the I'm leaning towards going back to the simpler change of excluding numerics from the |
@clintongormley something that concerns me about this is that the
I believe this can be handled by Jim's suggestion to have two versions of the query text, an non-analyzed version that's used for all the string-ish fields, and a version split on whitespace to be used for all the numeric-ish fields.
Do we think that's the intended use case for this query? To me this is more of a "match-like" query in that it is for matching, not doing more complicated things. In that case, a dedicated field and potentially using |
Regarding the three different types you mentioned @clintongormley, those make sense (I was only thinking about two, but splitting on whitespace for |
But that's essentially what we're talking about doing at the moment.
I'm not getting how we would support the query string syntax with the The _all query feels like the wrong answer here. My fear is it is going to become a big complex mess, while the existing alternative (the |
This does not solve these issues (and which is why we have discussed this so much!):
And there are probably some more things I've already forgotten. We started this discussion because of all the problems with Regarding some points made above:
|
@rjernst what gets exposed to the user? A simple field whose value gets passed to just the I could imagine changing the query string to auto-expand |
I think we're conflating two different use cases. I would argue that for the users that are advanced enough to use prefixes, wildcards, regexp, fuzzy, etc that the "all" query is not for them. These users can use the To me, the "all" query is for the users that have no idea what that search box in Kibana is backed by, and simply want easy-to-use filtering functionality that works without knowing anything about their data, the mappings, and without fear of I have pushed a change that separates the query text into an analyzed and non-analyzed version (as people have recommended) and uses the appropriate version depending on the field type. This allows a |
If we get to this point, maybe |
Think about Kibana users. Sure, some use the search bar just for filtering. Others use it with at least ranges, wildcards, AND|OR|NOT etc. Others use a lot more. Either way, there's only one search field exposed in Kibana. So either we're exposing the query string query, or we're coming up with a new query parser syntax that can be backed by this _all query. |
I am confused here, Kibana exposes the
I agree with Clinton, why can't we just change the logic when _all is the default field for a In fact this simple query:
... is really close to what the proposed _all query is doing. The only difference I see is that the |
I agree with @jimferenczi and @clintongormley:
So no need for another query type if we can get a nice OOTB without the need of generating a |
I wish it did, however, if what @clintongormley said about users using |
(By the way I'm not against re-using the |
Found it. The unintended behavior is related to bools having an implicit |
Okay, I have made some changes to the
It works great so far! What do we think of this as a solution? This would allow Kibana to retain almost all of its behavior without any changes other than disabling the |
Should we try to consolidate the |
This commit introduces a new execution mode for the `simple_query_string` query, which is intended down the road to be a replacement for the current _all field. It now does auto-field-expansion and auto-leniency when the following criteria are ALL met: The _all field is disabled No default_field has been set in the index settings No fields are specified in the request Additionally, a user can force the "all-like" execution by setting the all_fields parameter to true. When executing in all field mode, the `simple_query_string` query will look at all the fields in the mapping that are not metafields and can be searched, and automatically expand the list of fields that are going to be queried. Relates to elastic#20925, which is the `query_string` version of this work. This is basically the same behavior, but for the `simple_query_string` query. Relates to elastic#19784
This commit introduces a new execution mode for the `simple_query_string` query, which is intended down the road to be a replacement for the current _all field. It now does auto-field-expansion and auto-leniency when the following criteria are ALL met: The _all field is disabled No default_field has been set in the index settings No fields are specified in the request Additionally, a user can force the "all-like" execution by setting the all_fields parameter to true. When executing in all field mode, the `simple_query_string` query will look at all the fields in the mapping that are not metafields and can be searched, and automatically expand the list of fields that are going to be queried. Relates to #20925, which is the `query_string` version of this work. This is basically the same behavior, but for the `simple_query_string` query. Relates to #19784
As part of elastic#20925 and elastic#21341 we added an "all-fields" mode to the `query_string` and `simple_query_string`. This would expand the query to all fields and automatically set `lenient` to true. However, we should still allow a user to override the `lenient` flag to whichever value they desire, should they add it in the request. This commit does that.
This change disables the _all meta field by default. Now that we have the "all-fields" method of query execution, we can save both indexing time and disk space by disabling it. _all can no longer be configured for indices created after 6.0. Relates to elastic#20925 and elastic#21341 Resolves elastic#19784
Add `quote_field_suffix` to simple_query_string query Deprecate `lowercase_expanded_terms` and `locale`: elastic/elasticsearch#20208 Add "all field" execution mode to query_string and simple_query_string query: elastic/elasticsearch#20925 Fixes #2792
Add `quote_field_suffix` to simple_query_string query Deprecate `lowercase_expanded_terms` and `locale`: elastic/elasticsearch#20208 Add "all field" execution mode to query_string and simple_query_string query: elastic/elasticsearch#20925 Fixes #2792
* Add missing fields on simple query string query Add `quote_field_suffix` to simple_query_string query Deprecate `lowercase_expanded_terms` and `locale`: elastic/elasticsearch#20208 Add "all field" execution mode to query_string and simple_query_string query: elastic/elasticsearch#20925 Fixes #2792 * Convert to property expressions
* Add missing fields on simple query string query Add `quote_field_suffix` to simple_query_string query Deprecate `lowercase_expanded_terms` and `locale`: elastic/elasticsearch#20208 Add "all field" execution mode to query_string and simple_query_string query: elastic/elasticsearch#20925 Fixes #2792 * Convert to property expressions (cherry picked from commit bef4b9d)
* Add missing fields on simple query string query Add `quote_field_suffix` to simple_query_string query Deprecate `lowercase_expanded_terms` and `locale`: elastic/elasticsearch#20208 Add "all field" execution mode to query_string and simple_query_string query: elastic/elasticsearch#20925 Fixes elastic#2792 * Convert to property expressions (cherry picked from commit bef4b9d)
I've just upgraded a test cluster from ES and Kibana from 2.4 to 5.6.8 recently, and I am grateful for this change in Log Analysis use case. In ES 2.4, to provide end users with an easy Discovery mode when searching syslog events, I set The biggest field for syslog events in my case is the
With the discontinue of
My question is that is number 2 a good approach and is there any other ways? |
Hi @anhlqn, this sounds like a good questions for the discuss forums at https://discuss.elastic.co as we reserve Github issues for feature requests and bug reports. If you want to ask your question there it is a much better place to discuss things like this. |
This commit introduces a new execution mode for the
query_string
query, whichis intended down the road to be a replacement for the current
_all
field.It now does auto-field-expansion and auto-leniency when the following criteria
are ALL met:
_all
field is disableddefault_field
has been set in the index settingsdefault_field
has been set in the requestfields
are specified in the requestAdditionally, a user can force the "all-like" execution by setting the
all_fields
parameter totrue
.When executing in all field mode, the
query_string
query will look at all thefields in the mapping that are not metafields and can be searched, and
automatically expand the list of fields that are going to be queried.
Relates to #19784