Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reusable HistogramValue object #49799

Merged
merged 7 commits into from
Dec 4, 2019

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Dec 3, 2019

In #49683 a new field mapper was introduced which supports percentile aggregations via binary doc values. Those are complex values that are interface to the user via HistogramValue interface.

This field mapper generates the doc values and it currently creates an object per doc value of type HistogramValue. This PR adds a new class InternalHistogramValue that implements HistogramValue which can be reused so we create one object per segment instead of per document.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)


/** reset the value for the histogram */
void reset(BytesRef bytesRef) {
streamInput = new ByteBufferStreamInput(ByteBuffer.wrap(bytesRef.bytes, bytesRef.offset, bytesRef.length));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is little value in reusing the histogram if you still create new inputs here. You might want to have a look at ByteArrayDataInput#reset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was a bit annoyed with that. Still now I am encoding doubles as longs and using ByteArrayDataInput for deserialising. I found a bit weird I am using different family of Input/Output classes to read / write. Is that ok / safe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I have a slight preference for updating the write logic to be symmetric, but could be convinced either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit a catch 22.

The ByteArrayDataOutput needs to be provided an array before hand so you need to know the size of the serialise object before hand. ByteBufferStreamOutput abstract out all that complexity.

The ByteBufferStreamInput is not reusable, ByteArrayDataInput is.

Maybe I am missing something but it seems something is missing. I am seeing this pattern where we are using more complex binary doc values and it seems logic to have a strategy to be reusing those wrappers, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of using ByteBuffersDataOutput.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jpountz

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment, LGTM otherwise. Feel free to push without further reviews.

streamOutput.writeVInt(count);
streamOutput.writeDouble(values.get(i));
dataOutput.writeVInt(count);
dataOutput.writeLong(NumericUtils.doubleToSortableLong(values.get(i)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't need ordering, let's just do Double.doubleToRawLongBits, which is cheaper.

public boolean next() {
if (streamInput.eof() == false) {
count = streamInput.readVInt();
value = NumericUtils.sortableLongToDouble(streamInput.readLong());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and use Double.longBitsToDouble here.

@iverase iverase merged commit d8b9b02 into elastic:master Dec 4, 2019
@iverase iverase deleted the InternalHistogramValue branch December 4, 2019 10:52
iverase added a commit that referenced this pull request Dec 4, 2019
Adds a reusable implementation of HistogramValue so we do not create
an object per document.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Adds a reusable implementation of HistogramValue so we do not create
an object per document.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants