Auto expanding fields in Discovery/Query-String performance issue #12097

gmoskovicz · 2017-05-31T14:48:56Z

Kibana version: 5.4.x

Elasticsearch version: 5.4.x

Server OS version: ANY

Browser version: ANY

Browser OS version: ANY

Original install method (e.g. download page, yum, from source, etc.): ZIP file

Description of the problem including expected versus actual behavior:

Since elastic/elasticsearch#20925 when executing a query string query in an index (index pattern) that had _all disabled, we will look at all the fields in the mapping that are not metafields and can be searched, and automatically expand the list of fields that are going to be queried.

Basically, we will expand the query string for each specific field that you have in the index. Saved searches and discovery usually use the following query, along with other filters:

  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*"
    }
  },

This means that for index patterns that have lots of fields, we will see the query expanded with many ConstantScore(_field_names:<FIELD_NAME_HERE> and this can be a performance impact compared with previous Elasticsearch/Kibana versions. This has been introduced in ES/Kibana 5.1.1.

This can be easily tested by grabbing the network call from a discovery page. This two steps are a quick repro:

PUT test
{
  "mappings": {
    "test_type": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "prop1": {
          "type": "text"
        },
        "prop2": {
          "type": "text"
        },
        "prop3": {
          "type": "text"
        },
        "prop4": {
          "type": "text"
        }
      }
    }
  }
}

POST test/_search
{
  "profile": true,
  "version": true,
  "size": 500,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "query_string": {
      "analyze_wildcard": true,
      "query": "*"
    }
  },
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [],
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {
        "highlight_query": {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*",
            "all_fields": true
          }
        }
      }
    },
    "fragment_size": 2147483647
  }
}

Note that i added the profile: true to be able to see the lucene expressions being used.

My proposal is to, when _all is disabled and no fields or default_field is used, add a warning that the query will auto expand to all the fields available in the index patter, and that will have an extra cost.

The text was updated successfully, but these errors were encountered:

jimczi · 2017-05-31T16:04:01Z

It's just that * is ambiguous. This is not an easy fix to rewrite every * query to a match all query since you also want field:* to match only documents that have a value for the field field. We fixed the _all case (when _all is activated and is the default search field) but the all_fields mode is tricky.
A better fix here would be to make sure that *:* is used rather than * for any default query that Kibana uses.

jpountz · 2017-05-31T20:57:25Z

+1 to making *:* the default Kibana query

gmoskovicz · 2017-05-31T21:01:02Z

Yes, confirmed that MatchAllDocsQuery is used when we just have * in Kibana. As a workaround, if we want we can just rewrite * in kibana to *:* in Elasticsearch to hide this.

As a quick test, this is the performance win for just doing that change:

#With *

{
  "took": 4216,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 241280,
    "max_score": 1,

vs

#With *:*
{
  "took": 713,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 241408,
    "max_score": 1,

epixa · 2017-05-31T21:25:20Z

Under what real world circumstances would a person want the behavior of * instead of *:*?

jpountz · 2017-05-31T21:36:00Z

I suspect none.

Bargs · 2017-05-31T21:53:07Z

If the user has a default search field set (say foo), * will be equivalent to foo:* correct? If so, I can't think of a scenario where this would be much more useful than *:*, but we'd still want to do it in 6.0 I think since it'll be a breaking change.

Is there any reason we should prefer *:* over simply sending a match_all query by default?

cc @lukasolson

jimczi · 2017-05-31T21:54:16Z

well that's just the convention in query_string. If you have an explicit default field, * search is restricted to the document with a value for the field. We special cased the _all field some time ago but it is harder to do the same for the all_fields mode which works differently.
I agree with @Bargs, a match_all query, or even no query at all

gmoskovicz · 2017-06-01T12:14:37Z

I can't think of a scenario where a user wants to expand a * query across all the fields, unless you want to get the documents that those field exists, but for that i would recommend explicit query string with field names, and i haven't seen that case in Kibana use cases.

A match_all query will be much better than * but it'll break the convention from query_string. I don't think that people would like to see a match_all in the query bar at all. Hence why i was saying that i would, in the background, change a * in the query string for either the Match All query, or just *:*.

Whatever we do, it needs to be also done in 5.x in my opinion. I have seen:

An increase in Kibana timeouts with huge msearch operations and fields.
Performance drops, and queries taking 3 to 4 times longer.
And of course, this is hitting worst in cases where you have multiple people at the same time using kibana with dashboards.

While we do not recommend huge/very_large mappings, some users have use cases which have been working fine in 4.x series, but after upgrading this started to fail. There is an easy fix which is configuring a default field, but it's a bad experience, so i would try with a native fix in Kibana. This issue was introduced in 5.1.1 but it's going to live in 6.0 as well. So by the time we GA Kibana, the more people that upgrade, the more people that will hit this.

jimczi · 2017-06-01T13:28:19Z

For very large mappings, changing * to *:* would only solve the default search and would delay the problem. The keyword search would also be affected since searching on 1000s fields will surely be slow (or worst, subject to max_boolean_clause). I think the issue is on ES side, the all_fields mode should fail with a clear error message when the number of fields to search on is too big (we'll have to define what's the maximum size allowed but more than 100 is already a lot imo).

Bargs · 2017-06-01T14:01:28Z

I don't think that people would like to see a match_all in the query bar at all.

To clarify, I'm just proposing we change the default query to a match_all behind the scenes. The user would never see it in the query bar. Since #10280 was merged the query bar is empty by default (instead of showing *). When the query bar is empty we send a query string query for * by default. Instead we'll just send a match_all when the query bar is empty. We actually talked about this in the PR but waved off the idea because the implementation was a bit messy and we didn't see any real benefit at the time. There's an obvious benefit now, and I think I can see how to implement it in a more simple manner.

Whatever we do, it needs to be also done in 5.x in my opinion.

I'll leave this decision up to @epixa. If a user has a default search field set and has saved visualizations or dashboards with empty searches, this change could alter the results of their queries so it's theoretically a breaking change. Perhaps that demographic is small enough and the benefit of the change is great enough that it's worth it.

jpountz · 2017-06-07T07:32:37Z

If a user has a default search field set

Could we only use a match_all if the default search field is not set?

dakrone · 2017-06-07T14:16:13Z

the all_fields mode should fail with a clear error message when the number of fields to search on is too big

I opened an issue elastic/elasticsearch#25105 so we can discuss this and decide on a good limit

Bargs · 2017-06-07T16:42:44Z

Could we only use a match_all if the default search field is not set?

I think we'd have to get and parse the settings for every matching index to get that info wouldn't we?

Bargs · 2017-06-21T16:41:48Z

BTW this issue seems to be exacerbated by highlighting, so a workaround for currently affected users would be to set doc_table:highlight:all_fields and/or doc_table:highlight to false in Kibana's advanced settings.

LeeDr · 2017-07-11T15:35:58Z

This has gotten to the point of making Discover unusable for metricbeat with it's 993 fields.
Depending on the host running Elasticsearch (I'm running the integration-test on CentOS VM right now) the Discover default query can timeout at 30 seconds on metricbeat-* index.

If I run this in dev console (which is the query part from the Discover request);

GET metricbeat-*/_validate/query?rewrite&explain=true
{
  
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1499781072993,
              "lte": 1499781972993,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  }
}

I get this (so long I cut most of it out);

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "metricbeat-6.0.0-alpha3-2017.07.11",
      "valid": true,
      "explanation": "+(ConstantScore(_field_names:system.process.cgroup.memory.mem.usage.max.bytes) 
| ConstantScore(_field_names:kubernetes.event.involved_object.name) | 
ConstantScore(_field_names:couchbase.node.get_hits) | 
ConstantScore(_field_names:system.socket.local.port) | 
ConstantScore(_field_names:haproxy.stat.check.health.last) | 
ConstantScore(_field_names:system.socket.remote.port) | 
ConstantScore(_field_names:mongodb.status.wired_tiger.cache.pages.write) | 
ConstantScore(_field_names:system.socket.remote.etld_plus_one) | 
ConstantScore(_field_names:haproxy.stat.queue.time.avg) | 
ConstantScore(_field_names:haproxy.info.memory.max.bytes) | 
ConstantScore(_field_names:couchbase.node.couch.views.disk_size.bytes) | 
ConstantScore(_field_names:mongodb.status.extra_info.page_faults) | 
ConstantScore(_field_names:redis.info.stats.commands_processed) | 

...

d) | ConstantScore(_field_names:system.process.cgroup.memory.stats.page_faults)) +@timestamp:[1499781072993 TO 1499781972993]"
    }
  ]
}

If I change the query from * to *:* I get this;

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "metricbeat-6.0.0-alpha3-2017.07.11",
      "valid": true,
      "explanation": "+*:* +@timestamp:[1499781072993 TO 1499781972993]"
    }
  ]
}

LeeDr · 2017-07-11T15:40:30Z

Screenshot from master showing a 26 second query time for default * query;

lukasolson · 2017-07-24T21:29:50Z

Fixed by #13047.

gmoskovicz added Feature:Query Bar Querying and query bar features release_note:enhancement labels May 31, 2017

epixa added the :Discovery label May 31, 2017

Bargs mentioned this issue Jun 7, 2017

[Discover] Discover times out when there's a lot of fields (b/c of highlight) #12131

Closed

dakrone mentioned this issue Jun 7, 2017

Throw error when all_fields mode queries over a certain number of fields elastic/elasticsearch#25105

Closed

lukasolson self-assigned this Jun 21, 2017

lukasolson mentioned this issue Jun 29, 2017

Default query to match_all instead of query_string #12583

Closed

lukasolson added the PR sent label Jun 29, 2017

dakrone mentioned this issue Jul 5, 2017

Performance degradation for '*' query with '_all' field disabled elastic/elasticsearch#25556

Closed

LeeDr added blocker v6.0.0-beta1 labels Jul 11, 2017

Bargs removed the PR sent label Jul 11, 2017

lukasolson mentioned this issue Jul 12, 2017

Default to match_all instead of query_string #12785

Closed

epixa added the PR sent label Jul 12, 2017

lukasolson removed release_note:enhancement v6.0.0-beta1 labels Jul 13, 2017

lukasolson mentioned this issue Jul 21, 2017

Change default query to match_all #13047

Merged

lukasolson closed this as completed Jul 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto expanding fields in Discovery/Query-String performance issue #12097

Auto expanding fields in Discovery/Query-String performance issue #12097

gmoskovicz commented May 31, 2017

jimczi commented May 31, 2017

jpountz commented May 31, 2017

gmoskovicz commented May 31, 2017

epixa commented May 31, 2017

jpountz commented May 31, 2017

Bargs commented May 31, 2017 •

edited

Loading

jimczi commented May 31, 2017

gmoskovicz commented Jun 1, 2017

jimczi commented Jun 1, 2017

Bargs commented Jun 1, 2017

jpountz commented Jun 7, 2017

dakrone commented Jun 7, 2017

Bargs commented Jun 7, 2017

Bargs commented Jun 21, 2017

LeeDr commented Jul 11, 2017

LeeDr commented Jul 11, 2017

lukasolson commented Jul 24, 2017

Auto expanding fields in Discovery/Query-String performance issue #12097

Auto expanding fields in Discovery/Query-String performance issue #12097

Comments

gmoskovicz commented May 31, 2017

jimczi commented May 31, 2017

jpountz commented May 31, 2017

gmoskovicz commented May 31, 2017

epixa commented May 31, 2017

jpountz commented May 31, 2017

Bargs commented May 31, 2017 • edited Loading

jimczi commented May 31, 2017

gmoskovicz commented Jun 1, 2017

jimczi commented Jun 1, 2017

Bargs commented Jun 1, 2017

jpountz commented Jun 7, 2017

dakrone commented Jun 7, 2017

Bargs commented Jun 7, 2017

Bargs commented Jun 21, 2017

LeeDr commented Jul 11, 2017

LeeDr commented Jul 11, 2017

lukasolson commented Jul 24, 2017

Bargs commented May 31, 2017 •

edited

Loading