Generated on 2020-12-14
#444 | [FEA] Pluggable Cache
#1158 | [FEA] Better documentation on type support
#57 | [FEA] Support INT96 for parquet reads and writes |
#1003 | [FEA] Reduce overlap between RapidsHostColumnVector and RapidsHostColumnVectorCore |
#913 | [FEA] In Pluggable Cache Support CalendarInterval while creating CachedBatches |
#1092 | [FEA] In Pluggable Cache handle nested types having CalendarIntervalType and NullType |
#670 | [FEA] Support NullType |
#50 | [FEA] support spark.sql.legacy.timeParserPolicy |
#1144 | [FEA] Remove Databricks 3.0.0 shim layer |
#1096 | [FEA] Implement parquet CreateDataSourceTableAsSelectCommand |
#688 | [FEA] udf compiler should be auto-appended to spark.sql.extensions |
#502 | [FEA] Support Databricks 7.3 LTS Runtime |
#764 | [FEA] Sanity checks for cudf jar mismatch |
#1018 | [FEA] Log details related to GPU memory fragmentation on GPU OOM |
#619 | [FEA] log whether libcudf and libcudfjni were built for PTDS |
#905 | [FEA] create AWS EMR 3.0.1 shim |
#838 | [FEA] Support window count for a column |
#864 | [FEA] config option to enable RMM arena memory resource |
#430 | [FEA] Audit: Parquet Writer support for TIMESTAMP_MILLIS |
#818 | [FEA] Create shim layer for AWS EMR |
#608 | [FEA] Parquet small file optimization: improve merge schema handling
#446 | [FEA] Test jucx in 1.9.x branch |
#1038 | [FEA] Accelerate the data transfer for plan WindowInPandasExec |
#533 | [FEA] Improve PTDS performance |
#849 | [FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances |
#784 | [FEA] Allow Host Spilling to be more dynamic |
#627 | [FEA] Further parquet reading small file improvements |
#5 | [FEA] Support Adaptive Execution |
#1279 | [BUG] TPC-DS query 2 failing with NPE |
#1280 | [BUG] TPC-DS query 93 failing with UnsupportedOperationException |
#1308 | [BUG] TPC-DS query 14a runs much slower on 0.3 |
#1284 | [BUG] TPC-DS query 77 at scale=1TB fails with maxResultSize exceeded error |
#1061 | [BUG] orc_test.py is failing |
#1197 | [BUG] java.lang.NullPointerException when exporting delta table |
#685 | [BUG] In ParquetCachedBatchSerializer, serializing parquet buffers might blow up in certain cases
#1269 | [BUG] GpuSubstring is not expected to be a part of a SortOrder |
#1246 | [BUG] Many TPC-DS benchmarks fail when writing to Parquet |
#961 | [BUG] ORC predicate pushdown should work with case-insensitive analysis |
#962 | [BUG] Loading columns from an ORC file without column names returns no data |
#1245 | [BUG] Code adding buffers to the spillable store should synchronize |
#570 | [BUG] Continue debugging OOM after ensuring device store is empty |
#972 | [BUG] total time metric is redundant with scan time |
#1039 | [BUG] UNBOUNDED window ranges on null timestamp columns produces incorrect results. |
#1195 | [BUG] AcceleratedColumnarToRowIterator queue empty |
#1177 | [BUG] leaks possible in the rapids shuffle if batches are received after the task completes |
#1216 | [BUG] Failure to recognize ORC file format when loaded via Hive |
#898 | [BUG] count reductions are failing on Databricks due to lack of Complete mode support
#1184 | [BUG] test_window_aggregate_udf_array_from_python fails on databricks 3.0.1 |
#1151 | [BUG] Add Databricks 3.0.1 shim layer for GpuWindowInPandasExec.
#1199 | [BUG] No data size in Input column in Stages page from Spark UI when using Parquet as file source |
#1031 | [BUG] dependency info properties file contains error messages |
#1149 | [BUG] Scaladoc warnings in GpuDataSource |
#1185 | [BUG] test_hash_multiple_mode_query failing |
#724 | [BUG] PySpark test_broadcast_nested_loop_join_special_case intermittent failure |
#1164 | [BUG] ansi_cast tests are failing in 3.1.0 |
#1110 | [BUG] Special date "now" has wrong value on GPU |
#1139 | [BUG] Host columnar to GPU can be very slow |
#1094 | [BUG] unix_timestamp on GPU returns invalid data for special dates |
#1098 | [BUG] unix_timestamp on GPU returns invalid data for bad input |
#1082 | [BUG] string to timestamp conversion fails with split |
#1140 | [BUG] ConcurrentModificationException error after scala test suite completes |
#1073 | [BUG] java.lang.RuntimeException: BinaryExpressions must override either eval or nullSafeEval |
#975 | [BUG] BroadcastExchangeExec fails to fall back to CPU on driver node on GCP Dataproc |
#773 | [BUG] Investigate high task deserialization |
#1035 | [BUG] TPC-DS query 90 with AQE enabled fails with doExecuteBroadcast exception |
#825 | [BUG] test_window_aggs_for_ranges intermittently fails |
#1008 | [BUG] limit function produces inconsistent results when type is Byte, Long, Boolean, or Timestamp
#996 | [BUG] TPC-DS benchmark via spark-submit does not provide option to disable appending .dat to path |
#1006 | [BUG] Spark 3.1.0 change to BasicWriteTaskStats breaks BasicColumnarWriteTaskStatsTracker
#985 | [BUG] missing metric dataSize |
#881 | [BUG] cannot disable Sort by itself |
#812 | [BUG] Test failures for 0.2 when run with multiple executors |
#925 | [BUG] Range window functions with non-timestamp order-by expressions not falling back to CPU
#852 | [BUG] BenchUtils.compareResults cannot compare partitioned files when ignoreOrdering=false |
#868 | [BUG] Rounding error when casting timestamp to string for timestamps before 1970 |
#880 | [BUG] doing a window operation with an orderby for a single constant crashes |
#776 | [BUG] Integration test fails on spark 3.1.0-SNAPSHOT |
#874 | [BUG] RapidsConf.scala has an inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead
#860 | [BUG] we need to mark columns from received shuffle buffers as GpuColumnVectorFromBuffer |
#122 | [BUG] CSV timestamp parsing is broken for TS < 1902 and TS > 2038
#810 | [BUG] UDF Integration tests fail if pandas is not installed |
#746 | [BUG] cudf_udf_test.py is flaky
#811 | [BUG] 0.3 nightly is timing out |
#574 | [BUG] Fix GpuTimeSub for Spark 3.1.0 |
#1376 | MetaUtils.getBatchFromMeta should return batches with GpuColumnVectorFromBuffer |
#1358 | auto-merge: instant merge after creation [skip ci] |
#1359 | Use SortOrder from shims. |
#1343 | Do not run UDFs when the partition is empty. |
#1342 | Fix and edit docs for standalone mode |
#1350 | fix GpuRangePartitioning canonicalization |
#1281 | Documentation added for testing |
#1336 | Fix missing post-shuffle coalesce with AQE |
#1318 | Fix copying GpuFileSourceScanExec node |
#1337 | Use UTC instead of GMT |
#1307 | Fallback to cpu when reading Delta log files for stats |
#1310 | Fix canonicalization of GpuFileSourceScanExec, GpuShuffleCoalesceExec |
#1302 | Add GpuSubstring handling to SortOrder canonicalization |
#1265 | Chunking input before writing a ParquetCachedBatch |
#1278 | Add a config to disable decimal types by default |
#1272 | Add Alias to shims |
#1268 | Adds in support docs for 0.3 release |
#1235 | Trigger reading and handling control data. |
#1266 | Updating Databricks getting started for 0.3 release |
#1291 | Increase pre-merge resource requests [skip ci] |
#1275 | Temporarily disable more CAST tests for Spark 3.1.0 |
#1264 | Fix race condition in batch creation |
#1260 | Update UCX license info in NOTIFY-binary for 1.9 and RAPIDS plugin copyright dates |
#1247 | Ensure column names are valid when writing benchmark query results to file |
#1240 | Fix loading from ORC file with no column names |
#1242 | Remove compatibility documentation about unsupported INT96 |
#1192 | [REVIEW] Support GpuFilter and GpuCoalesceBatches for decimal data |
#1170 | Add nested type support to MetaUtils |
#1194 | Drop redundant total time metric from scan |
#1248 | At BatchedTableCompressor.finish synchronize to allow for "right-size… |
#1169 | Use CUDF's "UNBOUNDED" window boundaries for time-range queries. |
#1204 | Avoid empty batches on columnar to row conversion |
#1133 | Refactor batch coalesce to be based solely on batch data size |
#1237 | In transport, limit pending transfer requests to fit within a bounce buffer
#1232 | Move SortOrder creation to shims |
#1068 | Write int96 to parquet |
#1193 | Verify shuffle of decimal columns |
#1180 | Remove batches if they are received after the iterator detects that t… |
#1173 | Support relational operators for decimal type |
#1220 | Support replacing ORC format when Hive is configured |
#1219 | Upgrade to jucx 1.9.0 |
#1081 | Add option to upload benchmark summary JSON file |
#1217 | Aggregate reductions in Complete mode should use updateExpressions |
#1218 | Remove obsolete HiveStringType usage |
#1214 | changelog update 2020-11-30. Trigger automerge check [skip ci] |
#1210 | Support auto-merge for branch-0.4 [skip ci] |
#1202 | Fix a bug with the support for java.lang.StringBuilder.append. |
#1213 | Skip casting StringType to TimestampType for Spark 310 |
#1201 | Replace only window expressions on databricks. |
#1208 | [BUG] Fix GHSL2020-239 [skip ci] |
#1205 | Fix missing input bytes read metric for Parquet |
#1206 | Update Spark 3.1 shim for ShuffleOrigin shuffle parameter |
#1196 | Rename ShuffleCoalesceExec to GpuShuffleCoalesceExec |
#1191 | Skip window array tests for databricks. |
#1183 | Support for CalendarIntervalType and NullType |
#1150 | udf spec |
#1188 | Add in tests for parquet nested pruning support |
#1189 | Enable NullType for First and Last in 3.0.1+ |
#1181 | Fix resource leaks in unit tests |
#1186 | Fix compilation and scaladoc warnings |
#1187 | Updated documentation for distinct count compatibility |
#1182 | Close buffer catalog on device manager shutdown |
#1137 | Let GpuWindowInPandas declare ArrayType supported. |
#1176 | Add in support for null type |
#1174 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1175 | Fix leaks seen in shuffle tests |
#1138 | [REVIEW] Support decimal type for GpuProjectExec |
#1162 | Set job descriptions in benchmark runner |
#1172 | Revert "Fix race condition (#1165)" |
#1060 | Show partition metrics for custom shuffler reader |
#1152 | Add spark301db shim layer for WindowInPandas. |
#1167 | Nulls out the dataframe if --gc-between-runs is set |
#1165 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1163 | Add in support for GetStructField |
#1166 | Fix the cast tests for 3.1.0+ |
#1159 | fix bug where 'now' had same value as 'today' for timestamps |
#1161 | Fix nightly build pipeline failure. |
#1160 | Fix some performance problems with columnar to columnar conversion |
#1105 | [REVIEW] Change ColumnViewAccess usage to work with ColumnView |
#1148 | Add in tests for Maps and extend map support where possible |
#1154 | Mark test as xfail until we can get a fix in |
#1113 | Support unix_timestamp on GPU for subset of formats |
#1156 | Fix warning introduced in iterator suite |
#1095 | Dependency info |
#1145 | Remove support for databricks 7.0 runtime - shim spark300db |
#1147 | Change the assert to require for handling TIMESTAMP_MILLIS in isDateTimeRebaseNeeded |
#1132 | Add in basic support to read structs from parquet |
#1121 | Shuffle/better error handling |
#1134 | Support saveAsTable for writing orc and parquet |
#1124 | Add shim layers for GpuWindowInPandasExec. |
#1131 | Add in some basic support for Structs |
#1127 | Add in basic support for reading lists from parquet |
#1129 | Fix resource leaks with new shuffle optimization |
#1116 | Optimize normal shuffle by coalescing smaller batches on host |
#1102 | Auto-register UDF extension when main plugin is set
#1108 | Remove integration test pipelines on NGCC |
#1123 | Mark Pandas udf over window tests as xfail on databricks until they can be fixed |
#1120 | Add in support for filtering ArrayType |
#1080 | Support for CalendarIntervalType and NullType for ParquetCachedSerializer |
#994 | Packs bounce buffers for highly partitioned shuffles |
#1112 | Remove bad config from pytest setup |
#1107 | closeOnExcept -> withResources in MetaUtils |
#1104 | Support lists to/from the GPU |
#1106 | Improve mechanism for expected exceptions in tests |
#1069 | Accelerate the data transfer between JVM and Python for the plan 'GpuWindowInPandasExec' |
#1099 | Update how we deal with type checking |
#1077 | Improve AQE transitions for shuffle and coalesce batches |
#1097 | Cleanup some instances of excess closure serialization |
#1090 | Fix the integration build |
#1086 | Speed up test performance using pytest-xdist |
#1084 | Avoid issues where more scalars than expected show up in an expression
#1076 | [FEA] Support Databricks 7.3 LTS Runtime |
#1083 | Revert "Get cudf/spark dependency from the correct .m2 dir" |
#1062 | Get cudf/spark dependency from the correct .m2 dir |
#1078 | Another round of fixes for mapping of DataType to DType |
#1066 | More fixes for conversion to ColumnarBatch |
#1029 | BenchmarkRunner should produce JSON summary file even when queries fail |
#1055 | Fix build warnings |
#1064 | Use array instead of List for from(Table, DataType) |
#1057 | Fix empty table broadcast requiring a GPU on driver node |
#1047 | Sanity checks for cudf jar mismatch |
#1044 | Accelerated row to columnar and columnar to row transitions |
#1056 | Add query number to Spark app name when running benchmarks |
#1054 | Log total RMM allocated on GPU OOM |
#1053 | Remove isGpuBroadcastNestedLoopJoin from shims |
#1052 | Allow for GPUCoalesceBatch to deal with Map |
#1051 | Add simple retry for URM dependencies [skip ci] |
#1046 | Fix broken links |
#1017 | Log whether PTDS is enabled |
#1040 | Update to cudf 0.17-SNAPSHOT and fix tests |
#1042 | Fix inconsistencies in AQE support for broadcast joins |
#1037 | Add in support for the SQL functions Least and Greatest |
#1036 | Increase number of retries when waiting for databricks cluster |
#1034 | [BUG] Honor spark.rapids.memory.gpu.pool=NONE
#854 | Arbitrary function call in UDF |
#1028 | Update to cudf-0.16 |
#1023 | Add --gc-between-run flag for TPC* benchmarks. |
#1001 | ColumnarBatch to CachedBatch and back |
#990 | Parquet coalesce file reader for local filesystems |
#1014 | Add --append-dat flag for TPC-DS benchmark |
#991 | Updated GCP Dataproc Mortgage-ETL-GPU.ipynb |
#886 | Spark BinaryType and cast to BinaryType |
#1016 | Change Hash Aggregate to allow pass-through on MapType |
#984 | Add support for MapType in selected operators |
#1012 | Update for new position parameter in Spark 3.1.0 RegExpReplace |
#995 | Add shim for EMR 3.0.1 and EMR 3.0.1-SNAPSHOT |
#998 | Update benchmark automation script |
#1000 | Always use RAPIDS shuffle when running TPCH and Mortgage tests |
#981 | Change databricks build to dynamically create a cluster |
#986 | Fix missing dataSize metric when using RAPIDS shuffle |
#914 | Write InternalRow to CachedBatch |
#934 | Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle |
#992 | Skip post-clean if aborted before the image build stage in pre-merge [skip ci] |
#988 | Change in Spark caused the 3.1.0 CI to fail |
#983 | clean jenkins file for premerge on NGCC |
#964 | Refactor TPC benchmarks to reduce duplicate code |
#978 | Enable scalastyle checks for udf-compiler module |
#949 | Fix GpuWindowExec to work with a CPU SortExec |
#973 | Stop reporting totalTime metric for GpuShuffleExchangeExec |
#968 | XFail pos_explode tests until final fix can be put in |
#970 | Add legacy config to clear active Spark 3.1.0 session in tests |
#918 | Benchmark runner script |
#915 | Add option to control number of partitions when converting from CSV to Parquet |
#944 | Fix some issues with non-determinism |
#935 | Add in support/tests for a window count on a column |
#940 | Fix closeOnExcept suppressed exception handling |
#942 | fix github action env setup [skip ci] |
#933 | Update first/last tests to avoid non-determinism and ordering differences
#931 | Fix checking for nullable columns in window range query |
#924 | Benchmark guide update for command-line interface / spark-submit |
#926 | Move pandas_udf functions into the tests functions |
#929 | Pick a default tableId to use that is non-zero so that flatbuffers allow…
#928 | Fix RapidsBufferStore NPE when no spillable buffers are available |
#820 | Benchmarking guide |
#859 | Compare partitioned files in order |
#916 | create new sparkContext explicitly in CPU notebook |
#917 | create new SparkContext in GPU notebook explicitly. |
#919 | Add label benchmark to performance subsection in changelog |
#850 | Add in basic support for lead/lag |
#843 | [REVIEW] Cache plugin to handle reading CachedBatch to an InternalRow |
#904 | Add command-line argument for benchmark result filename |
#909 | GCP preview version image name update |
#903 | update getting-started-gcp.md with new component list |
#900 | Turn off CollectLimitExec replacement by default |
#907 | remove configs from databricks that shouldn't be used by default |
#893 | Fix rounding error when casting timestamp to string for timestamps before 1970 |
#899 | Mark reduction corner case tests as xfail on databricks until they can be fixed |
#894 | Replace whole-buffer slicing with direct refcounting |
#891 | Add config to dump heap on GPU OOM |
#890 | Clean up CoalesceBatch to use withResource |
#892 | Only manifest the current batch in cached block shuffle read iterator |
#871 | Add support for using the arena allocator |
#889 | Fix crash on scalar only orderby |
#879 | Update SpillableColumnarBatch to remove buffer from catalog on close |
#888 | Shrink detect scope to compile only [skip ci] |
#885 | [BUG] fix IT dockerfile arguments [skip ci] |
#883 | [BUG] fix IT dockerfile args ordering [skip ci] |
#875 | Fix the inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead in RapidsConf.scala
#862 | Migrate nightly & integration pipelines to blossom [skip ci]
#872 | Ensure that receive-side batches use GpuColumnVectorFromBuffer to avoid |
#833 | Add nvcomp LZ4 codec support |
#870 | Cleaned up tests and documentation for csv timestamp parsing |
#823 | Add command-line interface for TPC-* for use with spark-submit |
#856 | Move GpuWindowInPandasExec in shims layers |
#756 | Add stream-time metric |
#832 | Skip pandas tests if pandas cannot be found |
#841 | Fix a hanging issue when processing empty data. |
#840 | [REVIEW] Fixed failing cache tests |
#848 | Update task memory and disk spill metrics when buffer store spills |
#851 | Use contiguous table when deserializing columnar batch |
#857 | fix pvc scheduling issue |
#853 | Remove nodeAffinity from premerge pipeline |
#796 | Record spark plan SQL metrics to JSON when running benchmarks |
#781 | Add AQE unit tests |
#824 | Skip cudf_udf test by default |
#839 | First/Last reduction and cleanup of agg APIs |
#827 | Add Spark 3.0 EMR Shim layer |
#816 | [BUG] fix nightly timing out
#782 | Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences |
#813 | Revert "Enable tests in udf_cudf_test.py" |
#788 | [FEA] Persist workspace data on PVC for premerge |
#805 | [FEA] nightly build triggers IT on both spark300 and spark301
#797 | Allow host spill store to fit a buffer larger than configured max size |
#807 | Deploy integration-tests javadoc and sources |
#777 | Enable tests in udf_cudf_test.py |
#790 | CI: Update cudf python to 0.16 nightly |
#772 | Add support for empty array construction. |
#783 | Improved GpuArrowEvalPythonExec |
#771 | Various improvements to benchmarks |
#763 | [REVIEW] Allow CoalesceBatch to spill data that is not in active use |
#727 | Update cudf dependency to 0.16-SNAPSHOT |
#726 | parquet writer support for TIMESTAMP_MILLIS |
#674 | Unit test for GPU exchange re-use with AQE |
#723 | Update code coverage to find source files in new places |
#766 | Update the integration Dockerfile to reduce the image size |
#762 | Fixing conflicts in branch-0.3 |
#738 | [auto-merge] branch-0.2 to branch-0.3 - resolve conflict |
#722 | Initial code changes to support spilling outside of shuffle |
#693 | Update jenkins files for 0.3 |
#692 | Merge shims dependency to spark-3.0.1 into branch-0.3 |
#690 | Update the version to 0.3.0-SNAPSHOT |
#696 | [FEA] run integration tests against SPARK-3.0.1 |
#455 | [FEA] Support UCX shuffle with optimized AQE |
#510 | [FEA] Investigate libcudf features needed to support struct schema pruning during loads |
#541 | [FEA] Scala UDF: Support for null value operands
#542 | [FEA] Scala UDF: Support for Date and Time |
#499 | [FEA] disable any kind of warnings about ExecutedCommandExec not being on the GPU |
#540 | [FEA] Scala UDF: Support for String replaceFirst() |
#340 | [FEA] widen the rendered Jekyll pages |
#602 | [FEA] don't release with any -SNAPSHOT dependencies |
#579 | [FEA] Auto-merge between branches |
#515 | [FEA] Write tests for AQE skewed join optimization |
#452 | [FEA] Update HashSortOptimizerSuite to work with AQE |
#454 | [FEA] Update GpuCoalesceBatchesSuite to work with AQE enabled |
#354 | [FEA] Spark 3.1 FileSourceScanExec adds parameter optionalNumCoalescedBuckets
#566 | [FEA] Add support for StringSplit with an array index. |
#524 | [FEA] Add GPU specific metrics to GpuFileSourceScanExec |
#494 | [FEA] Add some AQE-specific tests to the PySpark test suite |
#146 | [FEA] Python tests should support running with Adaptive Query Execution enabled |
#465 | [FEA] Audit: Update script to audit multiple versions of Spark |
#488 | [FEA] Ability to limit total GPU memory used |
#70 | [FEA] Support StringSplit |
#403 | [FEA] Add in support for GetArrayItem |
#493 | [FEA] Implement shuffle optimization when AQE is enabled |
#500 | [FEA] Add maven profiles for testing with AQE on or off |
#471 | [FEA] create a formal process for updating the github-pages branch |
#233 | [FEA] Audit DataWritingCommandExec |
#240 | [FEA] Audit API validation script follow-on: Optimize StringToTypeTag
#388 | [FEA] Audit WindowExec |
#425 | [FEA] Add tests for configs in BatchScan Readers |
#453 | [FEA] Update HashAggregatesSuite to work with AQE |
#184 | [FEA] Enable NoScalaDoc scalastyle rule |
#438 | [FEA] Enable StringLPad |
#232 | [FEA] Audit SortExec |
#236 | [FEA] Audit ShuffleExchangeExec |
#355 | [FEA] Support Multiple Spark versions in the same jar |
#385 | [FEA] Support RangeExec on the GPU |
#317 | [FEA] Write test wrapper to run SQL queries via pyspark |
#235 | [FEA] Audit BroadcastExchangeExec |
#234 | [FEA] Audit BatchScanExec |
#238 | [FEA] Audit ShuffledHashJoinExec |
#237 | [FEA] Audit BroadcastHashJoinExec |
#316 | [FEA] Add some basic Dataframe tests for CoalesceExec |
#145 | [FEA] Scala tests should support running with Adaptive Query Execution enabled |
#231 | [FEA] Audit ProjectExec |
#229 | [FEA] Audit FileSourceScanExec |
#326 | [DISCUSS] Shuffle read-side error handling |
#601 | [FEA] Optimize unnecessary sorts when replacing SortAggregate |
#333 | [FEA] Better handling of reading lots of small Parquet files |
#511 | [FEA] Connect shuffle table compression to shuffle exec metrics |
#15 | [FEA] Multiple threads sharing the same GPU |
#272 | [DOC] Getting started guide for UCX shuffle |
#780 | [BUG] Inner Join dropping data with bucketed Table input |
#569 | [BUG] left_semi_join operation is abnormal and seriously time-consuming
#744 | [BUG] TPC-DS query 6 now produces incorrect results. |
#718 | [BUG] GpuBroadcastHashJoinExec ArrayIndexOutOfBoundsException |
#698 | [BUG] batch coalesce can fail to appear between columnar shuffle and subsequent columnar operation |
#658 | [BUG] GpuCoalesceBatches collectTime metric can be underreported |
#59 | [BUG] enable tests for string literals in a select |
#486 | [BUG] GpuWindowExec does not implement requiredChildOrdering |
#631 | [BUG] Rows are dropped when AQE is enabled in some cases |
#671 | [BUG] Databricks hash_aggregate_test fails trying to canonicalize a WrappedAggFunction |
#218 | [BUG] Window function COUNT(x) includes null-values, when it shouldn't |
#153 | [BUG] Incorrect output from partial-only hash aggregates with multiple distincts and non-distinct functions |
#656 | [BUG] integration tests produce hive metadata files |
#607 | [BUG] Fix misleading "cannot run on GPU" warnings when AQE is enabled |
#630 | [BUG] GpuCustomShuffleReader metrics always show zero rows/batches output |
#643 | [BUG] race condition while registering a buffer and spilling at the same time |
#606 | [BUG] Multiple scans for same data source with TPC-DS query59 with delta format |
#626 | [BUG] parquet_test showing leaked memory buffer |
#155 | [BUG] Incorrect output from averages with filters in partial only mode |
#277 | [BUG] HashAggregateSuite failure when AQE is enabled |
#276 | [BUG] GpuCoalesceBatchSuite failure when AQE is enabled |
#598 | [BUG] Non-deterministic output from MapOutputTracker.getStatistics() with AQE on GPU |
#192 | [BUG] test_read_merge_schema fails on Databricks |
#341 | [BUG] Document compression formats for readers/writers |
#587 | [BUG] Spark 3.1 changed FileScan, which means our GpuScans need to be added to the shim layer
#362 | [BUG] Implement getReaderForRange in the RapidsShuffleManager |
#528 | [BUG] HashAggregateSuite "Avg Distinct with filter" no longer valid when testing against Spark 3.1.0 |
#416 | [BUG] Fix Spark 3.1.0 integration tests |
#556 | [BUG] NPE when removing shuffle |
#553 | [BUG] GpuColumnVector build warnings from raw type access |
#492 | [BUG] Re-enable AQE integration tests |
#275 | [BUG] TpchLike query 2 fails when AQE is enabled |
#508 | [BUG] GpuUnion publishes metrics on the UI that are all 0 |
#269 | Needed to add --conf spark.driver.extraClassPath= |
#473 | [BUG] PartMerge:countDistinct:sum fails sporadically |
#531 | [BUG] Temporary RMM workaround needs to be removed |
#532 | [BUG] NPE when enabling shuffle manager |
#525 | [BUG] GpuFilterExec reports incorrect nullability of output in some cases |
#483 | [BUG] Multiple scans for the same parquet data source |
#382 | [BUG] Spark 3.1 StringFallbackSuite regexp_replace null CPU fallback test fails.
#489 | [FEA] Fix Spark 3.1 GpuHashJoin since it now requires CodegenSupport |
#441 | [BUG] test_broadcast_nested_loop_join_special_case fails on databricks |
#347 | [BUG] Failed to read Parquet file generated by GPU-enabled Spark. |
#433 | InSet operator produces an error for Strings |
#144 | [BUG] spark.sql.legacy.parquet.datetimeRebaseModeInWrite is ignored |
#323 | [BUG] GpuBroadcastNestedLoopJoinExec can fail if there are no columns |
#356 | [BUG] Integration cache test for BroadcastNestedLoopJoin failure |
#280 | [BUG] Full Outer Join does not work on nullable keys |
#149 | [BUG] Spark driver fails to load native libs when running on node without CUDA |
#826 | Fix link to cudf-0.15-cuda11.jar |
#815 | Update documentation for Scala UDFs in 0.2 since you need two things |
#802 | Update 0.2 CHANGELOG |
#793 | Update Jenkins scripts for release |
#798 | Fix shims provider override config not being seen by executors |
#785 | Make shuffle run on CPU if we do a join where we read from bucketed table |
#765 | Add config to override shims provider class |
#759 | Add CHANGELOG for release 0.2 |
#758 | Skip the UDF test that fails periodically.
#752 | Fix snapshot plugin jar version in docs |
#751 | Correct the channel for cudf installation |
#754 | Filter nulls from joins where possible to improve performance |
#732 | Add a timeout for RapidsShuffleIterator to prevent jobs to hang infin… |
#637 | Documentation changes for 0.2 release |
#747 | Disable udf tests that fail periodically |
#745 | Revert Null Join Filter |
#741 | Fix issue with parquet partitioned reads |
#733 | Remove GPU Types from github |
#720 | Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled |
#729 | Fix collect time metric in CoalesceBatches |
#640 | Support running Pandas UDFs on GPUs in Python processes. |
#721 | Add some more checks to databricks build scripts |
#714 | Move spark 3.0.1-shims out of snapshot-shims |
#711 | fix blossom checkout repo |
#709 | [BUG] fix unexpected indentation issue in blossom yml |
#642 | Init workflow for blossom-ci |
#705 | Enable configuration check for cast string to timestamp |
#702 | Update slack channel for Jenkins builds |
#701 | fix checkout-ref for automerge |
#695 | Fix spark-3.0.1 shim to be released |
#668 | refactor automerge to support merge for protected branch |
#687 | Include the UDF compiler in the dist jar |
#689 | Change shims dependency to spark-3.0.1 |
#677 | Use multi-threaded parquet read with small files |
#638 | Add Parquet-based cache serializer |
#613 | Enable UCX + AQE |
#684 | Enable test for literal string values in a select |
#686 | Remove sorts when replacing sort aggregate if possible |
#675 | Added TimeAdd |
#645 | [window] Add GpuWindowExec requiredChildOrdering |
#676 | fixUpJoinConsistency rule now works when AQE is enabled |
#683 | Fix issues with canonicalization of WrappedAggFunction
#682 | Fix path to start-slave.sh script in docs |
#673 | Increase build timeouts on nightly and premerge builds |
#648 | add signoff-check use github actions |
#593 | Add support for isNaN and datetime related instructions in UDF compiler |
#666 | [window] Disable GPU for COUNT(exp) queries |
#655 | Implement AQE unit test for InsertAdaptiveSparkPlan |
#614 | Fix for aggregation with multiple distinct and non distinct functions |
#657 | Fix verify build after integration tests are run |
#660 | Add in neverReplaceExec and several rules for it |
#639 | BooleanType test shouldn't xfail |
#652 | Mark UVM config as internal until supported |
#653 | Move to the cudf-0.15 release |
#647 | Improve warnings about AQE nodes not supported on GPU |
#646 | Stop reporting zero metrics for GpuCustomShuffleReader |
#644 | Small fix for race in catalog where a buffer could get spilled while … |
#623 | Fix issues with canonicalization |
#599 | [FEA] changelog generator |
#563 | cudf and spark version info in artifacts |
#633 | Fix leak if RebaseHelper throws during Parquet read |
#632 | Copy function isSearchableType from Spark because its signature changed in 3.0.1
#583 | Add udf compiler unit tests |
#617 | Documentation updates for branch 0.2 |
#616 | Add config to reserve GPU memory |
#612 | [REVIEW] Fix incorrect output from averages with filters in partial only mode |
#609 | fix minor issues with instructions for building ucx |
#611 | Added in profile to enable shims for SNAPSHOT releases |
#595 | Parquet small file reading optimization |
#582 | fix #579 Auto-merge between branches |
#536 | Add test for skewed join optimization when AQE is enabled |
#603 | Fix data size metric always 0 when using RAPIDS shuffle |
#600 | Fix calculation of string data for compressed batches |
#597 | Remove the xfail for parquet test_read_merge_schema on Databricks |
#591 | Add ucx license in NOTICE-binary |
#596 | Add Spark 3.0.2 to Shim layer |
#594 | Filter nulls from joins where possible to improve performance. |
#590 | Move GpuParquetScan/GpuOrcScan into Shim |
#588 | xfail the tpch spark 3.1.0 tests that fail |
#572 | Update buffer store to return compressed batches directly, add compression NVTX ranges |
#558 | Fix unit tests when AQE is enabled |
#580 | xfail the Spark 3.1.0 integration tests that fail |
#565 | Minor improvements to TPC-DS benchmarking code |
#567 | Explicitly disable AQE in one test |
#571 | Fix Databricks shim layer for GpuFileSourceScanExec and GpuBroadcastExchangeExec |
#564 | Add GPU decode time metric to scans |
#562 | getCatalog can be called from the driver, and can return null |
#555 | Fix build warnings for ColumnViewAccess |
#560 | Fix databricks build for AQE support |
#557 | Fix tests failing on Spark 3.1 |
#547 | Add GPU metrics to GpuFileSourceScanExec |
#462 | Implement optimized AQE support so that exchanges run on GPU where possible |
#550 | Document Parquet and ORC compression support |
#539 | Update script to audit multiple Spark versions |
#543 | Add metrics to GpuUnion operator |
#549 | Move spark shim properties to top level pom |
#497 | Add UDF compiler implementations |
#487 | Add framework for batch compression of shuffle partitions |
#544 | Add in driverExtraClassPath for standalone mode docs |
#546 | Fix Spark 3.1.0 shim build error in GpuHashJoin |
#537 | Use fresh SparkSession when capturing to avoid late capture of previous query |
#538 | Revert "Temporary workaround for RMM initial pool size bug (#530)" |
#517 | Add config to limit maximum RMM pool size |
#527 | Add support for split and getArrayIndex |
#534 | Fixes bugs around GpuShuffleEnv initialization |
#529 | [BUG] Degenerate table metas were not getting copied to the heap |
#530 | Temporary workaround for RMM initial pool size bug |
#526 | Fix bug with nullability reporting in GpuFilterExec |
#521 | Fix typo with databricks shim classname SparkShimServiceProvider |
#522 | Use SQLConf instead of SparkConf when looking up SQL configs |
#518 | Fix init order issue in GpuShuffleEnv when RAPIDS shuffle configured |
#514 | Added clarification of RegExpReplace, DateDiff, made descriptive text consistent |
#506 | Add in basic support for running tpcds like queries |
#504 | Add ability to ignore tests depending on spark shim version |
#503 | Remove unused async buffer spill support |
#501 | disable codegen in 3.1 shim for hash join |
#466 | Optimize and fix Api validation script |
#481 | Codeowners |
#439 | Check a PR has been committed using git signoff |
#319 | Update partitioning logic in ShuffledBatchRDD |
#491 | Temporarily ignore AQE integration tests |
#490 | Fix Spark 3.1.0 build for HashJoin changes |
#482 | Prevent bad practice in python tests |
#485 | Show plan in assertion message if test fails |
#480 | Fix link from README to getting-started.md |
#448 | Preliminary support for keeping broadcast exchanges on GPU when AQE is enabled |
#478 | Fall back to CPU for binary as string in parquet |
#477 | Fix special case joins in broadcast nested loop join |
#469 | Update HashAggregateSuite to work with AQE |
#475 | Udf compiler pom followup |
#434 | Add UDF compiler skeleton |
#474 | Re-enable noscaladoc check |
#461 | Fix comment style to pass scala style check |
#468 | fix broken link |
#456 | Add closeOnExcept to clean up code that closes resources only on exceptions |
#464 | Turn off noscaladoc rule until codebase is fixed |
#449 | Enforce NoScalaDoc rule in scalastyle checks |
#450 | Enable scalastyle for shuffle plugin |
#451 | Databricks remove unneeded files and fix build to not fail on rm when file missing |
#442 | Shim layer support for Spark 3.0.0 Databricks |
#447 | Add scalastyle plugin to shim module |
#426 | Update BufferMeta to support multiple codec buffers per table |
#440 | Run mortgage test both with AQE on and off |
#445 | Added in StringRPad and StringLPad |
#422 | Documentation updates |
#437 | Fix bug with InSet and Strings |
#435 | Add in checks for Parquet LEGACY date/time rebase |
#432 | Fix batch use-after-close in partitioning, shuffle env init |
#423 | Fix duplicate includes in assembly jar |
#418 | CI Add unit tests running for Spark 3.0.1 |
#421 | Make it easier to run TPCxBB benchmarks from spark shell |
#413 | Fix download link |
#414 | Shim Layer to support multiple Spark versions |
#406 | Update cast handling to deal with new libcudf casting limitations |
#405 | Change slave->worker |
#395 | Databricks doc updates |
#401 | Extended the FAQ |
#398 | Add tests for GpuPartition |
#352 | Change spark tgz package name |
#397 | Fix small bug in ShuffleBufferCatalog.hasActiveShuffle |
#286 | [REVIEW] Updated join tests for cache |
#393 | Contributor license agreement |
#389 | Added in support for RangeExec |
#390 | Ucx getting started |
#391 | Hide slack channel in Jenkins scripts |
#387 | Remove the term whitelist |
#365 | [REVIEW] Timesub tests |
#383 | Test utility to compare SQL query results between CPU and GPU |
#380 | Fix databricks notebook link |
#378 | Added in FAQ and fixed spelling |
#377 | Update heading in configs.md |
#373 | Modifying branch name to conform with rapidsai branch name change |
#376 | Add our session extension correctly if there are other extensions configured |
#374 | Fix rat issue for notebooks |
#364 | Update Databricks patch for changes to GpuSortMergeJoin |
#371 | fix typo and use regional bucket per GCP's update |
#359 | Karthik changes |
#353 | Fix broadcast nested loop join for the no column case |
#313 | Additional tests for broadcast hash join |
#342 | Implement build-side rules for shuffle hash join |
#349 | Updated join code to treat null equality properly |
#335 | Integration tests on spark 3.0.1-SNAPSHOT & 3.1.0-SNAPSHOT |
#346 | Update the Title Header for Fine Tuning |
#344 | Fix small typo in readme |
#331 | Adds iterator and client unit tests, and prepares for more fetch failure handling |
#337 | Fix Scala compile phase to allow Java classes referencing Scala classes |
#332 | Match GPU overwritten functions with SQL functions from FunctionRegistry |
#339 | Fix databricks build |
#338 | Move GpuPartitioning to a separate file |
#310 | Update release Jenkinsfile for Databricks |
#330 | Hide private info in Jenkins scripts |
#324 | Add in basic support for GpuCartesianProductExec |
#328 | Enable slack notification for Databricks build |
#321 | update databricks patch for GpuBroadcastNestedLoopJoinExec |
#322 | Add oss.sonatype.org to download the cudf jar |
#320 | Don't mount passwd/group to the container |
#258 | Enable running TPCH tests with AQE enabled |
#318 | Build docker image with Dockerfile |
#309 | Update databricks patch to latest changes |
#312 | Trigger branch-0.2 integration test |
#307 | [Jenkins] Update the release script and Jenkinsfile |
#304 | [DOC][Minor] Fix typo in spark config name. |
#303 | Update compatibility doc for -0.0 issues |
#301 | Add info about branches in README.md |
#296 | Added in basic support for broadcast nested loop join |
#297 | Databricks CI improvements and support runtime env parameter to xfail certain tests |
#292 | Move artifacts version in version-def.sh |
#254 | Cleanup QA tests |
#289 | Clean up GpuCollectLimitMeta and add in metrics |
#287 | Add in support for right join and fix issues building right |
#273 | Added releases to the README.md |
#285 | modify run_pyspark_from_build.sh to be bash 3 friendly |
#281 | Add in support for Full Outer Join on non-null keys |
#274 | Add RapidsDiskStore tests |
#259 | Add RapidsHostMemoryStore tests |
#282 | Update Databricks patch for 0.2 branch |
#261 | Add conditional xfail test for DISTINCT aggregates with NaN |
#263 | More time ops |
#256 | Remove special cases for contains, startsWith, and endsWith |
#253 | Remove GpuAttributeReference and GpuSortOrder |
#271 | Update the versions for 0.2.0 properly for the databricks build |
#162 | Integration tests for corner cases in window functions. |
#264 | Add a local mvn repo for nightly pipeline |
#262 | Refer to branch-0.2 |
#255 | Revert change to make dependencies of shaded jar optional |
#257 | Fix link to RAPIDS cudf in index.md |
#252 | Update to 0.2.0-SNAPSHOT and cudf-0.15-SNAPSHOT |
#74 | [FEA] Support ToUnixTimestamp |
#21 | [FEA] NormalizeNansAndZeros |
#105 | [FEA] integration tests for equi-joins |
#116 | [BUG] calling replace with a NULL throws an exception |
#168 | [BUG] GpuUnitTests Date tests leak column vectors |
#209 | [BUG] Developers section in pom need to be updated |
#204 | [BUG] Code coverage docs are out of date |
#154 | [BUG] Incorrect output from partial-only averages with nulls |
#61 | [BUG] Cannot disable Parquet, ORC, CSV reading when using FileSourceScanExec |
#249 | Compatability -> Compatibility |
#247 | Add index.md for default doc page, fix table formatting for configs |
#241 | Let default branch to master per the release rule |
#177 | Fixed leaks in unit test and use ColumnarBatch for testing |
#243 | Jenkins file for Databricks release |
#225 | Make internal project dependencies optional for shaded artifact |
#242 | Add site pages |
#221 | Databricks Build Support |
#215 | Remove CudfColumnVector |
#213 | Add RapidsDeviceMemoryStore tests |
#214 | [REVIEW] Test failure to pass Attribute as GpuAttribute |
#211 | Add project leads to pom developer list |
#210 | Updated coverage docs |
#195 | Support public release for plugin jar |
#208 | Remove unneeded comment from pom.xml |
#191 | WindowExec handle different spark distributions |
#181 | Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized |
#196 | Update Spark dependency to the released 3.0.0 artifacts |
#206 | Change groupID to 'com.nvidia' in IT scripts |
#202 | Fixed issue for contains when searching for an empty string |
#201 | Fix name of scan |
#200 | Fix issue with GpuAttributeReference not overriding references |
#197 | Fix metrics for writes |
#186 | Fixed issue with nullability on concat |
#193 | Add RapidsBufferCatalog tests |
#188 | rebrand to com.nvidia instead of ai.rapids |
#189 | Handle AggregateExpression having resultIds parameter instead of a single resultId |
#190 | FileSourceScanExec can have logicalRelation parameter on some distributions |
#185 | Update type of parameter of GpuExpandExec to make it consistent |
#172 | Merge qa test to integration test |
#180 | Add MetaUtils unit tests |
#171 | Cleanup scaladoc warnings about missing links |
#176 | Updated join tests to cover more data. |
#169 | Remove dependency on shaded Spark artifact |
#174 | Added in fallback tests |
#165 | Move input metadata tests to pyspark |
#173 | Fix setting local mode for tests |
#160 | Integration tests for normalizing NaN/zeroes. |
#163 | Ignore the order locally for repartition tests |
#157 | Add partial and final only hash aggregate tests and fix nulls corner case for Average |
#159 | Add integration tests for joins |
#158 | Orc merge schema fallback and FileScan format configs |
#164 | Fix compiler warnings |
#152 | Moved cudf to 0.14 for CI |
#151 | Switch CICD pipelines to Github |