diff --git a/docs/reference/mapping/types/parent-join.asciidoc b/docs/reference/mapping/types/parent-join.asciidoc
index ad33205650d5b..56396ce7584f1 100644
--- a/docs/reference/mapping/types/parent-join.asciidoc
+++ b/docs/reference/mapping/types/parent-join.asciidoc
@@ -114,6 +114,17 @@ PUT my_index/doc/4?routing=1&refresh
 <2> `answer` is the name of the join for this document
 <3> The parent id of this child document
 
+==== Parent-join and performance
+
+The join field shouldn't be used like joins in a relational database. In Elasticsearch the key to good performance
+is to de-normalize your data into documents. Each join field, and each `has_child` or `has_parent` query, adds a
+significant cost to your query performance.
+
+The only case where the join field makes sense is if your data contains a one-to-many relationship where
+one entity significantly outnumbers the other entity. An example of such a case is a use case with products
+and offers for these products. If the offers significantly outnumber the products, then
+it makes sense to model the product as the parent document and the offer as the child document.
+
 ==== Parent-join restrictions
 
 * Only one `join` field mapping is allowed per index.
@@ -338,7 +349,7 @@ GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
 // CONSOLE
 // TEST[continued]
 
-==== Multiple levels of parent join
+==== Multiple children per parent
 
 It is also possible to define multiple children for a single parent:
 
@@ -363,62 +374,3 @@ PUT my_index
 // CONSOLE
 
 <1> `question` is parent of `answer` and `comment`.
-
-And multiple levels of parent/child:
-
-[source,js]
---------------------------------------------------
-PUT my_index
-{
-  "mappings": {
-    "doc": {
-      "properties": {
-        "my_join_field": {
-          "type": "join",
-          "relations": {
-            "question": ["answer", "comment"], <1>
-            "answer": "vote" <2>
-          }
-        }
-      }
-    }
-  }
-}
---------------------------------------------------
-// CONSOLE
-
-<1> `question` is parent of `answer` and `comment`
-<2> `answer` is parent of `vote`
-
-The mapping above represents the following tree:
-
-   question
-    /    \
-   /      \
- comment  answer
-           |
-           |
-          vote
-
-Indexing a grand child document requires a `routing` value equals
-to the grand-parent (the greater parent of the lineage):
-
-
-[source,js]
---------------------------------------------------
-PUT my_index/doc/3?routing=1&refresh <1>
-{
-  "text": "This is a vote",
-  "my_join_field": {
-    "name": "vote",
-    "parent": "2" <2>
-  }
-}
---------------------------------------------------
-// CONSOLE
-// TEST[continued]
-
-<1> This child document must be on the same shard than its grandparent and parent
-<2> The parent id of this document (must points to an `answer` document)
-
-
diff --git a/docs/reference/query-dsl/has-child-query.asciidoc b/docs/reference/query-dsl/has-child-query.asciidoc
index bfe7eff4c2f73..d13ae326fb7fe 100644
--- a/docs/reference/query-dsl/has-child-query.asciidoc
+++ b/docs/reference/query-dsl/has-child-query.asciidoc
@@ -23,6 +23,14 @@ GET /_search
 --------------------------------------------------
 // CONSOLE
 
+Note that the `has_child` query is slow compared to other queries in the
+query DSL because it performs a join. Performance degrades
+as the number of matching child documents pointing to unique parent documents
+increases. If you care about query performance you should not use this query.
+If you do need this query, use it as sparingly as possible. Each
+`has_child` query that gets added to a search request can increase query time
+significantly.
+
 [float]
 ==== Scoring capabilities
 
diff --git a/docs/reference/query-dsl/has-parent-query.asciidoc b/docs/reference/query-dsl/has-parent-query.asciidoc
index a1dcf605ddd4a..4065a9d99fe2e 100644
--- a/docs/reference/query-dsl/has-parent-query.asciidoc
+++ b/docs/reference/query-dsl/has-parent-query.asciidoc
@@ -25,6 +25,13 @@ GET /_search
 --------------------------------------------------
 // CONSOLE
 
+Note that the `has_parent` query is slow compared to other queries in the
+query DSL because it performs a join. Performance degrades
+as the number of matching parent documents increases. If you care about query
+performance you should not use this query. If you do need
+this query, use it as sparingly as possible. Each `has_parent` query that gets
+added to a search request can increase query time significantly.
+
 [float]
 ==== Scoring capabilities
 