Return matching nested inner objects per hit #3022
Comments
+1 |
+1 |
I'm curious about the intended behaviour of this feature:
The answers to these questions will have implications in how we proceed in implementing our current application. |
Sorting on nested documents has been supported since the 0.90 release: #2662. Nested queries always return the parent, so I am assuming the behavior will remain the same. Hopefully this feature will have many settings, similar to most other elasticsearch features. And I hate sounding like a broken record, but can we please stop with the +1s? The elasticsearch team is not influenced by them and they only create noise. |
By "global sort", I mean without regard to the parent-nested relationship. That is, it is possible to return sorted children which may not be contiguous with respect to their parent. For example:
Notice how different parents are interleaved.
It would be nice to have flexibility here as you describe.
Message received, sorry about that. |
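The "global sort" described in the comment above can be sketched in plain Python. This is only an illustration of the idea, not Elasticsearch behaviour: the `parents` data, the `rank` field, and the `global_child_sort` helper are all hypothetical.

```python
# Hypothetical data: two parent documents, each holding nested children.
parents = [
    {"id": "A", "children": [{"name": "a1", "rank": 1}, {"name": "a2", "rank": 3}]},
    {"id": "B", "children": [{"name": "b1", "rank": 2}, {"name": "b2", "rank": 4}]},
]


def global_child_sort(parents, key):
    """Flatten all nested children and sort them, ignoring parent grouping."""
    flat = [
        {"parent": parent["id"], **child}
        for parent in parents
        for child in parent["children"]
    ]
    return sorted(flat, key=lambda child: child[key])


hits = global_child_sort(parents, "rank")
# Children of A and B alternate in the result instead of staying grouped.
parent_order = [hit["parent"] for hit in hits]
print(parent_order)
```

Because the children are ordered purely by their own field, hits from different parents end up interleaved, which is the behaviour being asked about.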
IMHO, your use case is better suited for parent/child documents and not nested documents. The way I see things is that inner/nested documents always form a single document with the outer/parent document. The inner/nested documents never appear separately. This feature breaks that model slightly by not returning certain nested documents, but the parent is always the same. Of course, I do not work for elasticsearch so my views and thoughts have no bearing on the issue. :) BTW, there was nothing wrong with your comment. Adding discussion to an issue via a concrete use case provides value and is the type of comment we should be seeing. A comment with nothing but +1 does not provide value. Perhaps I should just create an email filter that ignores github messages with only +1. |
Parent-child has the problem of using a LOT of memory for the joins. Since there might be lots of tags per photo, I want to get just the relevant tags (I don't really care about getting the parent, though I'd rather not). Parent-child just can't handle this: with 7GB of memory, the machine takes forever to do the joins and sometimes crashes. Also, I did not know the +1 was a bother. I thought it helped you guys prioritize features. |
I never said parent-child was efficient, just that its functionality is better suited to your use case. :) Even if nested documents eventually supported your use case, the overhead of sorting would also make it grossly inefficient. Each parent document would need to be scored several times. As far as +1 goes, there has been some discussion about them. There are a few issues that are 2-3 years old that have hundreds of +1s. You can judge for yourself whether they are effective or not. I am not on the elasticsearch team, so everyone should follow their advice on proper github etiquette and not mine. :) |
This may be true given what lucene currently supports for
|
@btiernay @brusic The idea is that the nested inner objects hits are included in the root doc hit. Something like this:
In the above case it should be possible to specify a global sort and a sort inside the root document, and what to show per nested hit (the complete inner object based on the source, or just some fields). In addition, supporting highlighting and other per-hit features makes a lot of sense as well. @eranid The memory usage of parent/child has been reduced in the new |
@martijnvg, so the full source will still be returned? The nested hits are a great idea in terms of flexibility and make more sense than editing the source (which I referred to above in "breaking the model"); I just hope that it is efficient. I have some convoluted logic to deal with filtering nested documents on the client side, and the serialization/deserialization using Jackson is a bit of a performance hit. Can scoring be avoided on the nested hits results? My use case calls for scoring using the fields in the parent document, but only filtering the nested documents. Not sure if you thought of this scenario, but a flexible scoring model would be a great feature. |
@brusic The full source can optionally be returned if requested, but it isn't necessary. The source of the nested inner object will be returned separately, but is based on the source in the root document. The source can also be disabled and individual fields can be set to stored in the mapping; these individual fields can then be requested instead of the source. The overhead of fetching inner nested objects should be small. This should be done in the fetch phase (so only for the competitive root docs) by re-executing the inner query of the nested query only on the nested docs of the root docs to be retrieved (a big filter). Not sure what you mean by avoiding the scoring on nested hits. Just use a field from the parent for scoring via sorting by script? |
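The two-phase approach described in the comment above can be illustrated with a small simulation. This is a sketch of the idea only, not Elasticsearch internals: the in-memory `index`, the `inner_query` predicate, and the scoring rule (number of matching nested docs) are assumptions made for the example.

```python
# Hypothetical in-memory "index": each root doc carries its nested docs.
index = [
    {"id": 1, "nested": [{"color": "red"}, {"color": "blue"}]},
    {"id": 2, "nested": [{"color": "blue"}]},
    {"id": 3, "nested": [{"color": "green"}, {"color": "red"}]},
]


def inner_query(nested_doc):
    """Stand-in for the inner query of a nested query (assumed predicate)."""
    return nested_doc["color"] == "red"


def query_phase(index, size):
    """Pick the competitive root docs; score = count of matching nested docs."""
    scored = []
    for doc in index:
        score = sum(1 for nested in doc["nested"] if inner_query(nested))
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first, ties broken by id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:size]]


def fetch_phase(index, top_ids):
    """Re-run the inner query, but only on nested docs of the selected roots."""
    by_id = {doc["id"]: doc for doc in index}
    hits = []
    for doc_id in top_ids:
        inner = [
            {"offset": offset, "source": nested}
            for offset, nested in enumerate(by_id[doc_id]["nested"])
            if inner_query(nested)
        ]
        hits.append({"id": doc_id, "inner_hits": inner})
    return hits


top_ids = query_phase(index, size=1)
hits = fetch_phase(index, top_ids)
print(hits)
```

The point of the split is that the second pass touches only the nested docs of the root docs that made it into the result page, so its cost scales with the page size rather than with the whole index.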
@martijnvg: Very nice proposal. A couple of clarifications:
When you say "global sort" do you mean global with respect to the root document, or with respect to nested documents? I could see how you might be implying the ability to do either.
I assume you mean "source" not "sort"? |
@brusic: With respect to:
I think this really depends on the size and structure of your documents. We have some very large documents (deep and wide) for which the ability to return the nested documents without "editing" the source would be much more efficient. |
@eranid to add to what @martijnvg said: up until 0.90.1, parent-child relationships required the parent IDs and child IDs to be held in memory. From 0.90.1 onwards, only the parent IDs need to be held in memory. This is a massive saving and should make parent-child much lighter. |
@btiernay The global sort is with respect to the root document. You could use nested sorting as global sorting which will base the ordering of root docs based on aggregate sort values from the nested inner objects.
Yes, I meant source. |
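Ordering root docs by an aggregate of their nested values, as described in the comment above, can be sketched as follows. The `products` data, the `offers` path, the `price` field, and the `sort_roots` helper are hypothetical; the sketch also assumes every root doc has at least one nested object.

```python
# Hypothetical root documents, each with nested "offers".
products = [
    {"id": "p1", "offers": [{"price": 30}, {"price": 10}]},
    {"id": "p2", "offers": [{"price": 20}]},
    {"id": "p3", "offers": [{"price": 5}, {"price": 50}]},
]

# Aggregations that reduce many nested values to one sort value per root.
MODES = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def sort_roots(docs, path, field, mode="min", reverse=False):
    """Sort root docs by an aggregate of a field in their nested objects."""
    aggregate = MODES[mode]
    return sorted(
        docs,
        key=lambda doc: aggregate([nested[field] for nested in doc[path]]),
        reverse=reverse,
    )


by_min = [doc["id"] for doc in sort_roots(products, "offers", "price", "min")]
by_max = [doc["id"] for doc in sort_roots(products, "offers", "price", "max")]
print(by_min, by_max)
```

The result is still a list of root docs; the nested values only influence where each root lands in the ordering.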
We definitely want to get this feature in, but in order to get it in right, a refactoring is needed in the fetch phase. |
@martijnvg To be clear, I suppose there would be no way of inverting the relationship to sort globally based on nested docs (effectively ignoring the root-nested grouping)? If so, is this due to a Lucene-imposed limitation? |
@btiernay You can sort globally based on the nested docs with the current nested sorting support. The global nested sorting won't change when inner hits are added; inner hits will allow sorting nested hits per root / main document hit. Makes sense? |
@martijnvg: Sorry for being so dense here, but it is still unclear if I can return nested docs as the root document using this approach. Then, I would be able to sort by the nested doc, without regard to parents, very similar to how parent-child relationships work. |
@btiernay No, with this approach the nested inner objects can't be a root document on its own. Nested inner objects are always part of the root document. |
@martijnvg: Thanks again for the clarification. Much appreciated. I realize your answer / solution is consistent with the other aspects of nested docs (e.g whole part relationships). However, I'm very curious if my proposal is technically feasible since I think it could be very powerful and more performant than the alternative parent-child approach. |
@btiernay I think your idea is technically possible. Right now the inner nested objects don't have a unique identifier like regular root documents have. In theory we could use the path + the offset in the nested array as additional data to the root document's unique identifier for the inner nested object's unique key. Also, inner objects are tightly coupled to the lifecycle of the root document. If a root document is removed, all the nested inner objects (which are stored as separate Lucene documents) are removed as well. Updating or adding individual nested inner objects isn't possible without reindexing the root document and all other nested inner objects (Lucene document block). If nested inner objects were exposed as independent hits in the search result, I guess the fact that these hits have limitations would be confusing. |
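The identifier scheme floated in the comment above (root id + path + offset) might look something like this. The `nested_key` format and both helper names are invented for illustration; nothing here reflects an actual Elasticsearch identifier.

```python
def nested_key(root_id, path, offset):
    """Build a hypothetical unique key for a nested inner object."""
    return f"{root_id}#{path}[{offset}]"


def enumerate_nested(root_id, source, path):
    """Pair every nested object under `path` with its derived key."""
    return [
        (nested_key(root_id, path, offset), nested)
        for offset, nested in enumerate(source.get(path, []))
    ]


# Hypothetical root document with two nested tags.
source = {"title": "photo", "tags": [{"name": "beach"}, {"name": "sunset"}]}
keys = [key for key, _ in enumerate_nested("42", source, "tags")]
print(keys)
```

A key built this way depends on the array position, so it is only meaningful for one indexed version of the root document, which fits the lifecycle coupling mentioned in the comment.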
That gives me hope then :)
That's an interesting idea. I hadn't thought about the
Perhaps, but consider "write once" applications in which the documents rarely, (if ever) change. Given the potential speedup / memory improvements that can be achieved using block documents (especially for deeply nested or wide documents), it would be a shame to not expose this functionality. |
any progress on this one? cause i'd love to see this. |
I would also like to know if there is any progress on that feature. Any way we could help out? |
+1 |
Since #7164 has been merged, where does that leave this issue? |
@brusic It is getting close. Work is being done on a PR that adds inner_hits for including nested inner objects / children hits in regular search hits. |
+1 |
+1 |
@martijnvg this feature would be super useful for my use case. We have products that contain an array of material subdocuments with attributes attached to those materials (price, title, color... etc). We need the ability to be able to see results on both the product and material level. Any word on a timeline for this "inner_hits" feature? For now I am contemplating having two product types a rolled_up product and a material type. Search now entails two queries one for the matching style. Then one for the material that has a style code matching the first query. |
In our case we have a CMS with (like most other such systems) a model of Content -< Location, and we would like to be able to search on content as well as locations without having to index twice. Potentially tricky thing is how this feature would work when searching for the nested documents (Locations) and getting hits for several of them. Ideally in our case we would prefer several search hits (Content) with corresponding inner object hits (Location), so sorting is correct from elastic search side. |
@martijnvg Thanks for all the hard work. What is the current PR that is being worked on? I would like to try out some development branches. I'm hoping to see a 1.5 tag someday. :) |
Thank you @martijnvg would love to see this PR land, as it would be perfect for our use case, and I'm sure, many others' as well. |
+1 |
Well if @s1monw +1ed the issue, then it must be important. Nevermind my constant pestering. :) |
+1 |
+1 |
+1 |
Inner hits allows to embed nested inner objects, children documents or the parent document that contributed to the matching of the returned search hit as inner hits, which would otherwise be hidden. Closes elastic#8153 Closes elastic#3022 Closes elastic#3152
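For readers landing here, a request body using the feature from this commit might be built as in the sketch below. The `nested` query with an `inner_hits` element follows the search API as released in Elasticsearch 1.5; the `comments` path, the `comments.message` field, and the helper function are hypothetical, and the example only constructs the JSON without sending it to a cluster.

```python
import json


def nested_query_with_inner_hits(path, field, text):
    """Build a search body that asks for the matching nested objects per hit."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {field: text}},
                # An empty object requests inner hits with default options.
                "inner_hits": {},
            }
        }
    }


# Hypothetical nested path and field names.
body = nested_query_with_inner_hits("comments", "comments.message", "elasticsearch")
print(json.dumps(body, indent=2))
```

In the response, each root hit then carries an `inner_hits` section alongside its source, containing only the nested objects that matched.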
+1 |
Ricardo, this feature is already live. I believe in version 1.5
|
sorry, the topic was huge and I couldn't read it all. You mean that I can make a query and return the nested documents, instead of the main doc? So far I have a workaround, I use _source to help myself and I plug some python to the mix.... |
Are you sure? I get the following error: nested: QueryParsingException[[crawler_2015-04-14] [nested] filter does not support [inner_hits]]; }]", |
Which version are you using? The feature was added in 1.5
|
In my case I am using version 1.4.4... We were using 0.9, now we have just migrated to 1.4, and then you tell me that this feature is available in a new release? :( |
+1 |
Add support for including the matching nested inner objects per hit element.