Add support for Long and Float numeric dimensions (without indexing) #2442

Closed
wants to merge 2 commits into from

Contributor

@jon-wei jon-wei commented Feb 11, 2016

This PR adds support for Long and Float numeric dimensions.

It depends on the following two PRs:

Implementation notes:

  • Dictionary encoding is not used for Long and Float dimensions. No index structures are currently created for numeric dimensions. Numeric dims will use the existing long/float column formats on disk.
  • The DimensionSelector interface now has additional functions for retrieving rows from numeric dimension columns without dictionary encoding.
  • A new DimensionSelector, UnencodedDimensionSelector, has been added for use with numeric dims.
  • Because numeric dims have no bitmap indexes, any filtering logic that uses filter.getBitmapIndex() will perform a full scan of the numeric column.
  • To support this, ColumnSelectorBitmapIndexSelector can now generate a bitmap from the result of a column scan.
  • SelectorFilter and BoundFilter can do direct numeric comparisons on Long/Float columns; no string conversion is needed.
  • Long and Float values will be converted to Strings for any filters or extraction functions that require String inputs.
  • Null values for Long and Float dims will be converted to 0L or 0.0F, respectively.
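
To illustrate the scan-based filtering described in the notes above, here is a minimal sketch, not the code in this PR. It assumes a long column held as a plain array and uses java.util.BitSet as a stand-in for Druid's bitmap types; the class and method names (NumericDimScanSketch, readColumn, scanToBitmap, selector, bound) are hypothetical.

```java
import java.util.BitSet;
import java.util.function.LongPredicate;

// Hypothetical sketch: filtering an unencoded long dimension by full scan.
final class NumericDimScanSketch {

    // Stand-in for reading a long dimension column; null rows become 0L,
    // mirroring the null handling described in the notes.
    static long[] readColumn(Long[] rawRows) {
        long[] out = new long[rawRows.length];
        for (int i = 0; i < rawRows.length; i++) {
            out[i] = rawRows[i] == null ? 0L : rawRows[i];
        }
        return out;
    }

    // With no bitmap index available, scan every row and set the bit
    // for each row whose value matches. BitSet is an assumption here.
    static BitSet scanToBitmap(long[] column, LongPredicate matcher) {
        BitSet bitmap = new BitSet(column.length);
        for (int row = 0; row < column.length; row++) {
            if (matcher.test(column[row])) {
                bitmap.set(row);
            }
        }
        return bitmap;
    }

    // Selector-style match: direct numeric equality, no string conversion.
    static LongPredicate selector(long value) {
        return v -> v == value;
    }

    // Bound-style match: inclusive numeric range, no string conversion.
    static LongPredicate bound(long lower, long upper) {
        return v -> v >= lower && v <= upper;
    }

    public static void main(String[] args) {
        long[] column = readColumn(new Long[] {3L, null, 7L, 3L, 12L});
        System.out.println(scanToBitmap(column, selector(3L)));  // {0, 3}
        System.out.println(scanToBitmap(column, bound(0L, 7L))); // {0, 1, 2, 3}
    }
}
```

Note that in this sketch the null row (row 1) matches the bound 0 to 7 because it was converted to 0L, which is one visible consequence of the null handling above.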

Unit test notes:

  • The druid.sample.tsv file has been modified to include Long and Float dimension columns (market_long, market_float, quality_long, quality_float). The original druid.sample.tsv file has been kept in the repo.
  • The numeric values for these new dimensions have a 1-to-1 mapping with the values in 'market' and 'quality'.
  • For larger test suites like GroupByQueryRunnerTest and TopN, I have parameterized the use of the "quality" and "market" dimensions; in different test iterations, the tests will use the long or float versions of these dimension columns.
  • Smaller test suites have separate functions for testing numeric dimensions.
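
The parameterization described above can be sketched roughly as follows. This is an illustration only: the numeric ids assigned to each market value are made up (the actual values in druid.sample.tsv may differ), and the names NumericDimTestSketch, variantsOf, and expectedValue are hypothetical rather than taken from the test code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: running one test body against the string, long,
// and float versions of a dimension that share a 1-to-1 value mapping.
final class NumericDimTestSketch {

    // Assumed 1-to-1 mapping from 'market' strings to numeric ids.
    // These ids are invented for illustration.
    static final Map<String, Long> MARKET_IDS = new LinkedHashMap<>();
    static {
        MARKET_IDS.put("spot", 1L);
        MARKET_IDS.put("total_market", 2L);
        MARKET_IDS.put("upfront", 3L);
    }

    // The dimension names a parameterized test would iterate over.
    static List<String> variantsOf(String base) {
        return Arrays.asList(base, base + "_long", base + "_float");
    }

    // Translate the expected string value into the type of the variant.
    static Object expectedValue(String dimension, String stringValue) {
        long id = MARKET_IDS.get(stringValue);
        if (dimension.endsWith("_long")) {
            return id;
        }
        if (dimension.endsWith("_float")) {
            return (float) id;
        }
        return stringValue;
    }

    public static void main(String[] args) {
        // Each iteration stands in for one parameterized test run.
        for (String dim : variantsOf("market")) {
            System.out.println(dim + " -> " + expectedValue(dim, "upfront"));
        }
    }
}
```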

@fjy
Contributor

fjy commented Feb 11, 2016

@jon-wei this is failing UT

@jon-wei
Contributor Author

jon-wei commented Feb 11, 2016

@fjy This PR won't be able to pass UT as-is because it depends on a change to DimensionsSpec in druid-api:
druid-io/druid-api#74

I'm opening it now to get the review process started.

@jon-wei
Contributor Author

jon-wei commented Feb 13, 2016

Based on a discussion with @gianm this afternoon, I'm looking into abstracting out a generic/extensible "dimension handling" interface across the IncrementalIndex/IndexMerger/querying paths.

Will update this PR as I make progress.

@jon-wei jon-wei closed this Mar 8, 2016
@jon-wei jon-wei deleted the flex_dims_feb16 branch October 6, 2017 22:21