docs: describe parent/child performances
martijnvg committed Oct 26, 2017
1 parent 8bf3324 commit f1e944a
Showing 3 changed files with 27 additions and 60 deletions.
72 changes: 12 additions & 60 deletions docs/reference/mapping/types/parent-join.asciidoc
@@ -114,6 +114,17 @@ PUT my_index/doc/4?routing=1&refresh
<2> `answer` is the name of the join for this document
<3> The parent id of this child document

==== Parent-join and performance

The join field shouldn't be used like joins in a relational database. In Elasticsearch the key to good performance
is to de-normalize your data into documents. Each join field, `has_child` query, or `has_parent` query adds a
significant tax to your query performance.

The only case where the join field makes sense is if your data contains a one-to-many relationship where
one entity significantly outnumbers the other entity. An example of such a case is products
and offers for these products. If offers significantly outnumber products, then
it makes sense to model the product as the parent document and the offer as the child document.
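
A minimal sketch of such a mapping, following the mapping format used elsewhere on this page. The relation names `product` and `offer`, the `price` field, and the document ids are illustrative assumptions, not part of the original text:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "product": "offer"
          }
        }
      }
    }
  }
}

PUT my_index/doc/2?routing=1&refresh
{
  "price": 9.99,
  "my_join_field": {
    "name": "offer",
    "parent": "1"
  }
}
--------------------------------------------------

The offer is routed to the shard of its parent product (`routing=1` here), since a child document must be on the same shard as its parent.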

==== Parent-join restrictions

* Only one `join` field mapping is allowed per index.
@@ -338,7 +349,7 @@ GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
// CONSOLE
// TEST[continued]

==== Multiple levels of parent join
==== Multiple children per parent

It is also possible to define multiple children for a single parent:

@@ -363,62 +374,3 @@ PUT my_index
// CONSOLE

<1> `question` is parent of `answer` and `comment`.

And multiple levels of parent/child:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],  <1>
            "answer": "vote" <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> `question` is parent of `answer` and `comment`
<2> `answer` is parent of `vote`

The mapping above represents the following tree:

       question
        /    \
       /      \
   comment  answer
              |
              |
             vote

Indexing a grandchild document requires a `routing` value equal
to that of the grandparent (the topmost parent of the lineage):


[source,js]
--------------------------------------------------
PUT my_index/doc/3?routing=1&refresh <1>
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2" <2>
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

<1> This child document must be on the same shard as its grandparent and parent
<2> The parent id of this document (must point to an `answer` document)


8 changes: 8 additions & 0 deletions docs/reference/query-dsl/has-child-query.asciidoc
@@ -23,6 +23,14 @@ GET /_search
--------------------------------------------------
// CONSOLE

Note that `has_child` is a slow query compared to other queries in the
query DSL because it performs a join. Performance degrades
as the number of matching child documents pointing to unique parent documents
increases. If you care about query performance, you should not use this query.
If you do use it, use it as sparingly as possible. Each
`has_child` query added to a search request can increase query time
significantly.

[float]
==== Scoring capabilities

7 changes: 7 additions & 0 deletions docs/reference/query-dsl/has-parent-query.asciidoc
@@ -25,6 +25,13 @@ GET /_search
--------------------------------------------------
// CONSOLE

Note that `has_parent` is a slow query compared to other queries in the
query DSL because it performs a join. Performance degrades
as the number of matching parent documents increases. If you care about query
performance, you should not use this query. If you do use it,
use it as sparingly as possible. Each `has_parent` query
added to a search request can increase query time significantly.

[float]
==== Scoring capabilities
