Parquet small file reading optimization #595

Merged: 88 commits, Aug 26, 2020

Commits on Jul 30, 2020

  1. 484c781
  2. 1d3dd3f
  3. c168214
  4. b2c2959
  5. ccdf32d

Commits on Jul 31, 2020

  1. Fix databricks package name

    tgravescs committed Jul 31, 2020
    40c41e2

Commits on Aug 3, 2020

  1. 5afddf0
  2. 566520e
  3. 5054c8a
  4. 5c0cee4

Commits on Aug 5, 2020

  1. debug

    tgravescs committed Aug 5, 2020
    048b4ff
  2. Fix order issue

    tgravescs committed Aug 5, 2020
    3117550
  3. 4857510
  4. cleanup

    tgravescs committed Aug 5, 2020
    64981ce
  5. 560cc81
  6. 84dc48e
  7. refactor

    tgravescs committed Aug 5, 2020
    bbade25

Commits on Aug 6, 2020

  1. disable mergeschema

    tgravescs committed Aug 6, 2020
    13fbd4d
  2. add check for mergeSchema

    tgravescs committed Aug 6, 2020
    73a212a
  3. a87af11
  4. c5b8a6e
  5. d2ac90a
  6. remove extra values reader

    tgravescs committed Aug 6, 2020
    3e014a8
  7. Fixes

    tgravescs committed Aug 6, 2020
    c13d8f3

Commits on Aug 10, 2020

  1. Debug

    tgravescs committed Aug 10, 2020
    6c53c45
  2. 8965b81
  3. 1c05dc4
  4. ed7d7fb
  5. Finding InputFileName works

    tgravescs committed Aug 10, 2020
    95abe40
  6. finding input file working

    tgravescs committed Aug 10, 2020
    c75df6c

Commits on Aug 11, 2020

  1. 2efbfe8
  2. b14cbe8
  3. Add more tests

    tgravescs committed Aug 11, 2020
    4ae5fff
  4. Add GPU metrics to GpuFileSourceScanExec

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 11, 2020
    a457dd8
  5. remove log messages

    tgravescs committed Aug 11, 2020
    4767453
  6. Docs

    tgravescs committed Aug 11, 2020
    8b942a8
  7. cleanup

    tgravescs committed Aug 11, 2020
    c4ed0bc
  8. 8f5480c

Commits on Aug 12, 2020

  1. Add test for bucketing

    tgravescs committed Aug 12, 2020
    9875969
  2. a95c964
  3. 4f7d77a
  4. dbd62ef
  5. 3acf602
  6. Commonize some code

    tgravescs committed Aug 12, 2020
    8dfb0e4
  7. Cleanup

    tgravescs committed Aug 12, 2020
    e7309e9
  8. fixes

    tgravescs committed Aug 12, 2020
    8f6f8df
  9. Extract GpuFileSourceScanExec from shims

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 12, 2020
    341c400
  10. Add more tests

    tgravescs committed Aug 12, 2020
    7f78c7f
  11. comments

    tgravescs committed Aug 12, 2020
    9f74fc4

Commits on Aug 13, 2020

  1. update test

    tgravescs committed Aug 13, 2020
    3a5e08f
  2. Pass metrics via GPU file format rather than custom options map

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 13, 2020
    b0fd541
  3. working

    tgravescs committed Aug 13, 2020
    8172861
  4. pass schema around properly

    tgravescs committed Aug 13, 2020
    dba69fb
  5. fix value from tuple

    tgravescs committed Aug 13, 2020
    e21e68b
  6. Rename case class

    tgravescs committed Aug 13, 2020
    b2aa3bd
  7. Update tests

    tgravescs committed Aug 13, 2020
    cf05611
  8. Update code checking for DataSourceScanExec

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 13, 2020
    ee8c0b5
  9. c98f818
  10. Fix scaladoc warning and unused imports

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 13, 2020
    3474fcf
  11. 03ca8e4
  12. refactor memory checks

    tgravescs committed Aug 13, 2020
    9a88e7a
  13. Fix copyright

    Signed-off-by: Jason Lowe <jlowe@nvidia.com>
    jlowe committed Aug 13, 2020
    e2fef62
  14. 5cb7ac2
  15. a9439fb
  16. b7d42ef
  17. Cleanup

    tgravescs committed Aug 13, 2020
    ee09ba5

Commits on Aug 14, 2020

  1. remove bucket test for now

    tgravescs committed Aug 14, 2020
    b85400c
  2. a676ada
  3. formatting

    tgravescs committed Aug 14, 2020
    c97ab17
  4. Fixes

    tgravescs committed Aug 14, 2020
    0200dd9
  5. Add more tests

    tgravescs committed Aug 14, 2020
    f4d155d

Commits on Aug 19, 2020

  1. fd0545f
  2. Merge conflict

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 19, 2020
    63312fb
  3. 24f43c7
  4. 5157cce

Commits on Aug 20, 2020

  1. Fix merge conflict

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 20, 2020
    ddc9e54
  2. enable parquet bucket tests and change warning

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 20, 2020
    6670cfe
  3. cleanup

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 20, 2020
    513c8ed
  4. remove debug logs

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 20, 2020
    b1b658b
  5. Move FilePartition creation to shim

    Signed-off-by: Thomas Graves <tgraves@apache.org>
    tgravescs committed Aug 20, 2020
    a4f9571
  6. Add better message for mergeSchema

    Signed-off-by: Thomas Graves <tgraves@apache.org>
    tgravescs committed Aug 20, 2020
    0a3a586

Commits on Aug 24, 2020

  1. Address review comments. Add in withResources and closeOnExcept and minor things.

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 24, 2020
    7b19a0c
  2. 97913e2
  3. Fix spacing

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 24, 2020
    8370651
  4. Fix databricks support and passing arguments

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 24, 2020
    ef4aa7e

Commits on Aug 25, 2020

  1. fix typo in db

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 25, 2020
    7557d71
  2. Update config description

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 25, 2020
    fd942a3

Commits on Aug 26, 2020

  1. Rework

    Signed-off-by: Thomas Graves <tgraves@nvidia.com>
    tgravescs committed Aug 26, 2020
    8a29098