Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto expanding fields in Discovery/Query-String performance issue #12097

Closed
gmoskovicz opened this issue May 31, 2017 · 17 comments
Closed

Auto expanding fields in Discovery/Query-String performance issue #12097

gmoskovicz opened this issue May 31, 2017 · 17 comments
Assignees
Labels
blocker Feature:Query Bar Querying and query bar features PR sent

Comments

@gmoskovicz
Copy link
Contributor

Kibana version: 5.4.x

Elasticsearch version: 5.4.x

Server OS version: ANY

Browser version: ANY

Browser OS version: ANY

Original install method (e.g. download page, yum, from source, etc.): ZIP file

Description of the problem including expected versus actual behavior:

Since elastic/elasticsearch#20925 when executing a query string query in an index (index pattern) that had _all disabled, we will look at all the fields in the mapping that are not metafields and can be searched, and automatically expand the list of fields that are going to be queried.

Basically, we will expand the query string for each specific field that you have in the index. Saved searches and discovery usually use the following query, along with other filters:

  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*"
    }
  },

This means that for index patterns that have lots of fields, we will see the query expanded with many ConstantScore(_field_names:<FIELD_NAME_HERE> and this can be a performance impact compared with previous Elasticsearch/Kibana versions. This has been introduced in ES/Kibana 5.1.1.

This can be easily tested by grabbing the network call from a discovery page. This two steps are a quick repro:

PUT test
{
  "mappings": {
    "test_type": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "prop1": {
          "type": "text"
        },
        "prop2": {
          "type": "text"
        },
        "prop3": {
          "type": "text"
        },
        "prop4": {
          "type": "text"
        }
      }
    }
  }
}

POST test/_search
{
  "profile": true,
  "version": true,
  "size": 500,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*"
    }
  },
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [],
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {
        "highlight_query": {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*",
            "all_fields": true
          }
        }
      }
    },
    "fragment_size": 2147483647
  }
}

Note that i added the profile: true to be able to see the lucene expressions being used.

My proposal is to, when _all is disabled and no fields or default_field is used, add a warning that the query will auto expand to all the fields available in the index patter, and that will have an extra cost.

@gmoskovicz gmoskovicz added Feature:Query Bar Querying and query bar features release_note:enhancement labels May 31, 2017
@jimczi
Copy link

jimczi commented May 31, 2017

It's just that * is ambiguous. This is not an easy fix to rewrite every * query to a match all query since you also want field:* to match only documents that have a value for the field field. We fixed the _all case (when _all is activated and is the default search field) but the all_fields mode is tricky.
A better fix here would be to make sure that *:* is used rather than * for any default query that Kibana uses.

@jpountz
Copy link

jpountz commented May 31, 2017

+1 to making *:* the default Kibana query

@gmoskovicz
Copy link
Contributor Author

Yes, confirmed that MatchAllDocsQuery is used when we just have * in Kibana. As a workaround, if we want we can just rewrite * in kibana to *:* in Elasticsearch to hide this.

As a quick test, this is the performance win for just doing that change:

#With *

{
  "took": 4216,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 241280,
    "max_score": 1,

vs

#With *:*
{
  "took": 713,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 241408,
    "max_score": 1,

@epixa
Copy link
Contributor

epixa commented May 31, 2017

Under what real world circumstances would a person want the behavior of * instead of *:*?

@jpountz
Copy link

jpountz commented May 31, 2017

I suspect none.

@Bargs
Copy link
Contributor

Bargs commented May 31, 2017

If the user has a default search field set (say foo), * will be equivalent to foo:* correct? If so, I can't think of a scenario where this would be much more useful than *:*, but we'd still want to do it in 6.0 I think since it'll be a breaking change.

Is there any reason we should prefer *:* over simply sending a match_all query by default?

cc @lukasolson

@jimczi
Copy link

jimczi commented May 31, 2017

well that's just the convention in query_string. If you have an explicit default field, * search is restricted to the document with a value for the field. We special cased the _all field some time ago but it is harder to do the same for the all_fields mode which works differently.
I agree with @Bargs, a match_all query, or even no query at all

@gmoskovicz
Copy link
Contributor Author

I can't think of a scenario where a user wants to expand a * query across all the fields, unless you want to get the documents that those field exists, but for that i would recommend explicit query string with field names, and i haven't seen that case in Kibana use cases.

A match_all query will be much better than * but it'll break the convention from query_string. I don't think that people would like to see a match_all in the query bar at all. Hence why i was saying that i would, in the background, change a * in the query string for either the Match All query, or just *:*.

Whatever we do, it needs to be also done in 5.x in my opinion. I have seen:

  1. An increase in Kibana timeouts with huge msearch operations and fields.
  2. Performance drops, and queries taking 3 to 4 times longer.
  3. And of course, this is hitting worst in cases where you have multiple people at the same time using kibana with dashboards.

While we do not recommend huge/very_large mappings, some users have use cases which have been working fine in 4.x series, but after upgrading this started to fail. There is an easy fix which is configuring a default field, but it's a bad experience, so i would try with a native fix in Kibana. This issue was introduced in 5.1.1 but it's going to live in 6.0 as well. So by the time we GA Kibana, the more people that upgrade, the more people that will hit this.

@jimczi
Copy link

jimczi commented Jun 1, 2017

For very large mappings, changing * to *:* would only solve the default search and would delay the problem. The keyword search would also be affected since searching on 1000s fields will surely be slow (or worst, subject to max_boolean_clause). I think the issue is on ES side, the all_fields mode should fail with a clear error message when the number of fields to search on is too big (we'll have to define what's the maximum size allowed but more than 100 is already a lot imo).

@Bargs
Copy link
Contributor

Bargs commented Jun 1, 2017

I don't think that people would like to see a match_all in the query bar at all.

To clarify, I'm just proposing we change the default query to a match_all behind the scenes. The user would never see it in the query bar. Since #10280 was merged the query bar is empty by default (instead of showing *). When the query bar is empty we send a query string query for * by default. Instead we'll just send a match_all when the query bar is empty. We actually talked about this in the PR but waved off the idea because the implementation was a bit messy and we didn't see any real benefit at the time. There's an obvious benefit now, and I think I can see how to implement it in a more simple manner.

Whatever we do, it needs to be also done in 5.x in my opinion.

I'll leave this decision up to @epixa. If a user has a default search field set and has saved visualizations or dashboards with empty searches, this change could alter the results of their queries so it's theoretically a breaking change. Perhaps that demographic is small enough and the benefit of the change is great enough that it's worth it.

@jpountz
Copy link

jpountz commented Jun 7, 2017

If a user has a default search field set

Could we only use a match_all if the default search field is not set?

@dakrone
Copy link
Member

dakrone commented Jun 7, 2017

the all_fields mode should fail with a clear error message when the number of fields to search on is too big

I opened an issue elastic/elasticsearch#25105 so we can discuss this and decide on a good limit

@Bargs
Copy link
Contributor

Bargs commented Jun 7, 2017

Could we only use a match_all if the default search field is not set?

I think we'd have to get and parse the settings for every matching index to get that info wouldn't we?

@lukasolson lukasolson self-assigned this Jun 21, 2017
@Bargs
Copy link
Contributor

Bargs commented Jun 21, 2017

BTW this issue seems to be exacerbated by highlighting, so a workaround for currently affected users would be to set doc_table:highlight:all_fields and/or doc_table:highlight to false in Kibana's advanced settings.

@LeeDr
Copy link
Contributor

LeeDr commented Jul 11, 2017

This has gotten to the point of making Discover unusable for metricbeat with it's 993 fields.
Depending on the host running Elasticsearch (I'm running the integration-test on CentOS VM right now) the Discover default query can timeout at 30 seconds on metricbeat-* index.

If I run this in dev console (which is the query part from the Discover request);

GET metricbeat-*/_validate/query?rewrite&explain=true
{
  
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1499781072993,
              "lte": 1499781972993,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  }
}

I get this (so long I cut most of it out);

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "metricbeat-6.0.0-alpha3-2017.07.11",
      "valid": true,
      "explanation": "+(ConstantScore(_field_names:system.process.cgroup.memory.mem.usage.max.bytes) 
| ConstantScore(_field_names:kubernetes.event.involved_object.name) | 
ConstantScore(_field_names:couchbase.node.get_hits) | 
ConstantScore(_field_names:system.socket.local.port) | 
ConstantScore(_field_names:haproxy.stat.check.health.last) | 
ConstantScore(_field_names:system.socket.remote.port) | 
ConstantScore(_field_names:mongodb.status.wired_tiger.cache.pages.write) | 
ConstantScore(_field_names:system.socket.remote.etld_plus_one) | 
ConstantScore(_field_names:haproxy.stat.queue.time.avg) | 
ConstantScore(_field_names:haproxy.info.memory.max.bytes) | 
ConstantScore(_field_names:couchbase.node.couch.views.disk_size.bytes) | 
ConstantScore(_field_names:mongodb.status.extra_info.page_faults) | 
ConstantScore(_field_names:redis.info.stats.commands_processed) | 

...

d) | ConstantScore(_field_names:system.process.cgroup.memory.stats.page_faults)) +@timestamp:[1499781072993 TO 1499781972993]"
    }
  ]
}

If I change the query from * to *:* I get this;

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "metricbeat-6.0.0-alpha3-2017.07.11",
      "valid": true,
      "explanation": "+*:* +@timestamp:[1499781072993 TO 1499781972993]"
    }
  ]
}

@LeeDr
Copy link
Contributor

LeeDr commented Jul 11, 2017

Screenshot from master showing a 26 second query time for default * query;

image

@lukasolson
Copy link
Member

Fixed by #13047.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker Feature:Query Bar Querying and query bar features PR sent
Projects
None yet
Development

No branches or pull requests

8 participants