-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow IncrementalIndex to store Long/Float dimensions #2263
Conversation
@@ -74,6 +76,80 @@ | |||
{ | |||
private volatile DateTime maxIngestedEventTime; | |||
|
|||
private static final Map<String, ValueType> typeMap = ImmutableMap.<String, ValueType>builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static variable name in UPPERCASE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nishantmonu51 addressed
94c8b13
to
7de6bdc
Compare
@@ -74,6 +77,76 @@ | |||
{ | |||
private volatile DateTime maxIngestedEventTime; | |||
|
|||
private static final Map<String, ValueType> TYPE_MAP = ImmutableMap.<String, ValueType>builder() | |||
.put("Long[]", ValueType.LONG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, do we have any parser in Druid currently that returns []
type for multi-valued dimension ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pjain1 Do you mean if the multi-value array can have subitems with an array[] type? If so, I don't think that will occur, the values would be single numerics or Strings for the JSON parsers and all single strings for the delimited parsers, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why there is a need for []
types here as TYPE_MAP
is used like TYPE_MAP.get(singleVal.getClass().getSimpleName())
where singleValue as you said "would be single numerics or Strings for the JSON parsers and all single strings for the delimited parsers, I believe."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pjain1 It's not strictly necessary now, I think I had it in there from an older implementation where I used the TYPE_MAP to get a ValueType from a Comparable[].
I kept it there in case, but I can remove it if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jon-wei I was just curious to know why it was there. No need to remove if it does not hurt anything.
closed for now to rebase to master, will reopen |
dont' force-push your branch until the PR is open or it will render this unable to re-open |
@drcrallen yeah, forgot about that yesterday =\ I was able to re-open it though by doing a "git reset --hard" to the SHA at the time the PR was closed |
@jon-wei we need to rebase this |
6850331
to
7a92273
Compare
Updated this to rebase on top of changes introduced by: The PR above changed TimeAndDims to use a dictionary encoded int[] array instead of String[]; this patch uses dictionary encoding for Long/Float values as well (unnecessary). I plan to change TimeAndDims to use Comparable[] instead of int[] in a later PR, still WIP in my fork: Strings will store a dictionary encoding in TimeAndDims as in PR #2085, numerics will store their values directly in TimeAndDims. |
f2aa86d
to
8e51849
Compare
@@ -0,0 +1,178 @@ | |||
package io.druid.benchmark; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjy added license header
👍 |
dims[dimIndex][i] = dimLookups[dimIndex].idToIndex(dimValues[dimIndex][i]); | ||
dims[dimIndex][i] = sortedDimLookups[dimIndex].getSortedIdFromUnsortedId(dimValues[dimIndex][i]); | ||
//TODO: in later PR, Rowboat will use Comparable[][] instead of int[][] | ||
// Can remove dictionary encoding for numeric dims then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please file an issue for this todo.
What is the behavior of null for Long/Float dimensions ? can we also add tests for null values in Long/Float dimension to make sure it behaves as expected. |
Also need to add docs for the added functionality. |
RE: null handling, the nulls are being converted to 0. It's not in this PR, but a follow on PR that I am finishing up: That PR reuses the existing single-value long/float column formats for the numeric dims, representing null without changes to the storage format (need some mechanism to distinguish null from normal values) or using dictionary encoding (have a dictionary entry for null) seems problematic. Throwing an exception sounds okay to me too (maybe let user configure? choice of "throw exception" or "convert null to 0") RE: docs and tests, this open PR doesn't actually enable the use of long/float dimensions, it's just preparing IncrementalIndex for later changes:
I address those points and have tests in the follow-on patch (not ready yet), I think it would make more sense to include docs in the later PR when the numeric dims are actually usable |
@fjy @nishantmonu51 will file issues for TODOs |
+1, I think this is ready to be merged. docs can be added in the follow up PR. |
@jon-wei do you mind resolving merge conflicts? |
@fjy rebased |
Allow IncrementalIndex to store Long/Float dimensions
This PR updates IncrementalIndex such that it can ingest and store Long/Float dimensions in addition to Strings. It is intended as a step towards the larger goals being discussed in the following topic:
https://groups.google.com/forum/#!topic/druid-development/obtfNJnXPDg/discussion
I ran the row-adding benchmark with type discovery enabled for testing purposes and the numbers are shown below:
With patch, non-String dims:
Without patch, String dims only: