Change log

Generated on 2024-01-31

Release 23.12

Features


#6832	[FEA] Convert Timestamp/Timezone tests/checks to be per operator instead of generic
#9805	[FEA] Support `current_date` expression function with CST (UTC + 8) timezone support
#9515	[FEA] Support temporal types in to_json
#9872	[FEA][JSON] Support Decimal type in `to_json`
#9802	[FEA] Support FromUTCTimestamp on the GPU with a non-UTC time zone
#6831	[FEA] Support timestamp transitions to and from UTC for single time zones with no repeating rules
#9590	[FEA][JSON] Support temporal types in `from_json`
#9804	[FEA] Support CPU path for from_utc_timestamp function with timezone
#9461	[FEA] Validate nvcomp-3.0 with spark rapids plugin
#8832	[FEA] rewrite join conditions where only part of it can fit on the AST
#9059	[FEA] Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY
#9037	[FEA] Support spark.sql.parquet.int96RebaseModeInWrite= LEGACY
#9632	[FEA] Take into account `org.apache.spark.timeZone` in Parquet/Avro from Spark 3.2
#8770	[FEA] add more metrics to Eventlogs or Executor logs
#9597	[FEA][JSON] Support boolean type in `from_json`
#9516	[FEA] Add support for JSON data source option `ignoreNullFields=false` in `to_json`
#9520	[FEA] Add support for `LAST()` as running window function
#9518	[FEA] Add support for relevant JSON data source options in `to_json`
#9218	[FEA] Support stack function
#9532	[FEA] Support Delta Lake 2.3.0
#1525	[FEA] Support Scala 2.13
#7279	[FEA] Support OverwriteByExpressionExecV1 for Delta Lake
#9326	[FEA] Specify `recover_with_null` when reading JSON files
#8780	[FEA] Support to_json function
#7278	[FEA] Support AppendDataExecV1 for Delta Lake
#6266	[FEA] Support Percentile
#7277	[FEA] Support AtomicReplaceTableAsSelect for Delta Lake
#7276	[FEA] Support AtomicCreateTableAsSelect for Delta Lake

Performance


#8137	[FEA] Upgrade to UCX 1.15
#8157	[FEA] Add string comparison to AST expressions
#9398	[FEA] Compress/encrypt spill to disk

Bugs Fixed


#9687	[BUG] `test_in_set` fails when DATAGEN_SEED=1698940723
#9659	[BUG] executor crash intermittantly in scala2.13-built spark332 integration tests
#9923	[BUG] Failed case about `test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test`
#9982	[BUG] test "convert large InternalRow iterator to cached batch single col" failed with arena pool
#9683	[BUG] test_map_scalars_supported_key_types fails with DATAGEN_SEED=1698940723
#9976	[BUG] test_part_write_round_trip[Float] Failed on -0.0 partition
#9948	[BUG] parquet reader data corruption in nested schema after rapidsai/cudf#13302
#9867	[BUG] Unable to use Spark Rapids with Spark Thrift Server
#9934	[BUG] test_delta_multi_part_write_round_trip_unmanaged and test_delta_part_write_round_trip_unmanaged failed DATA_SEED=1701608331
#9933	[BUG] collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1701553915, INJECT_OOM]
#9837	[BUG] test_part_write_round_trip failed
#9932	[BUG] Failed test_multi_tier_ast[DATAGEN_SEED=1701445668] on CI
#9829	[BUG] Java OOM when testing non-UTC time zone with lots of cases fallback.
#9403	[BUG] test_cogroup_apply_udf[Short(not_null)] failed with pandas 2.1.X
#9684	[BUG] test_coalesce fails with DATAGEN_SEED=1698940723
#9685	[BUG] test_case_when fails with DATAGEN_SEED=1698940723
#9776	[BUG] fastparquet compatibility tests fail with data mismatch if TZ is not set and system timezone is not UTC
#9733	[BUG] Complex AST expressions can crash with non-matching operand type error
#9877	[BUG] Fix resource leak in to_json
#9722	[BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407
#9846	[BUG] test_ceil_scale_zero may fail with different datagen_seed
#9781	[BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017
#9714	Scala Map class not found when executing the benchmark on Spark 3.5.0 with Scala 2.13
#9856	collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist
#9397	[BUG] RapidsShuffleManager MULTITHREADED on Databricks, we see loss of executors due to Rpc issues
#9738	[BUG] `test_delta_part_write_round_trip_unmanaged` and `test_delta_multi_part_write_round_trip_unmanaged` fail with `DATAGEN_SEED=1700105176`
#9771	[BUG] ast_test.py::test_X[(String, True)][DATAGEN_SEED=1700205785] failed
#9782	[BUG] Error messages appear in a clean build
#9798	[BUG] GpuCheckOverflowInTableInsert should be added to databricks shim
#9820	[BUG] test_parquet_write_roundtrip_datetime_with_legacy_rebase fails with "year 0 is out of range"
#9817	[BUG] FAILED dpp_test.py::test_dpp_reuse_broadcast_exchange[false-0-parquet][DATAGEN_SEED=1700572856, IGNORE_ORDER]
#9768	[BUG] `cast decimal to string` ScalaTest relies on a side effects
#9711	[BUG] test_lte fails with DATAGEN_SEED=1699987762
#9751	[BUG] cmp_test test_gte failed with DATAGEN_SEED=1700149611
#9469	[BUG] [main] ERROR com.nvidia.spark.rapids.GpuOverrideUtil - Encountered an exception applying GPU overrides java.lang.IllegalStateException: the broadcast must be on the GPU too
#9648	[BUG] Existence default values in schema are not being honored
#9676	Fix Delta Lake Integration tests; `test_delta_atomic_create_table_as_select` and `test_delta_atomic_replace_table_as_select`
#9701	[BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317
#9691	[BUG] Repeated Maven invocations w/o changes recompile too many Scala sources despite recompileMode=incremental
#9547	Update buildall and doc to generate bloop projects for test debugging
#9697	[BUG] Iceberg multiple file readers can not read files if the file paths contain encoded URL unsafe chars
#9681	Databricks Build Failing For 330db+
#9521	[BUG] Multi Threaded Shuffle Writer needs flow control
#9675	Failing Delta Lake Tests for Databricks 13.3 Due to WriteIntoDeltaCommand
#9669	[BUG] Rebase exception states not in UTC but timezone is Etc/UTC
#7940	[BUG] UCX peer connection issue in multi-nic single node cluster
#9650	[BUG] Github workflow for missing scala2.13 updates fails to detect when pom is new
#9621	[BUG] Scala 2.13 with-classifier profile is picking up Scala2.12 spark.version
#9636	[BUG] All parquet integration tests failed "Part of the plan is not columnar class" in databricks runtimes
#9108	[BUG] nullability on some decimal operations is wrong
#9625	[BUG] Typo in github Maven check install-modules
#9603	[BUG] fastparquet_compatibility_test fails on dataproc
#8729	[BUG] nightly integration test failed OOM kill in JDK11 ENV
#9589	[BUG] Scala 2.13 build hard-codes Java 8 target
#9581	Delta Lake 2.4 missing equals/hashCode override for file format and some metrics for merge
#9507	[BUG] Spark 3.2+/ParquetFilterSuite/Parquet filter pushdown - timestamp/ FAILED
#9540	[BUG] Job failed with SparkUpgradeException no matter which value are set for spark.sql.parquet.datetimeRebaseModeInRead
#9545	[BUG] Dataproc 2.0 test_reading_file_rewritten_with_fastparquet tests failing
#9552	[BUG] Inconsistent CDH dependency overrides across submodules
#9571	[BUG] non-deterministic compiled SQLExecPlugin.class with scala 2.13 deployment
#9569	[BUG] test_window_running failed in 3.1.2+3.1.3
#9480	[BUG] mapInPandas doesn't invoke udf on empty partitions
#8644	[BUG] Parquet file with malformed dictionary does not error when loaded
#9310	[BUG] Improve support for reading JSON files with malformed rows
#9457	[BUG] CDH 332 unit tests failing
#9404	[BUG] Spark reports a decimal error when create lit scalar when generate Decimal(34, -5) data.
#9110	[BUG] GPU Reader fails due to partition column creating column larger then cudf column size limit
#8631	[BUG] Parquet load failure on repeated_no_annotation.parquet
#9364	[BUG] CUDA illegal access error is triggering split and retry logic

PRs


#10340	Copyright to 2024 [skip ci]
#10323	Upgrade version to 23.12.2-SNAPSHOT
#10274	PythonRunner Changes
#10124	Update changelog for v23.12.1 [skip ci]
#10123	Change version to v23.12.1 [skip ci]
#10122	Init changelog for v23.12.1 [skip ci]
#10121	[DOC] update download page for db hot fix [skip ci]
#10116	Upgrade to 23.12.1-SNAPSHOT
#9935	Init 23.12 changelog [skip ci]
#9943	[DOC] Update docs for 23.12.0 release [skip ci]
#10014	Add documentation for how to run tests with a fixed datagen seed [skip ci]
#9954	Update private and JNI version to released 23.12.0
#10009	Using fix seed to unblock 23.12 release; Move the blocked issues to 24.02
#10007	Fix Java OOM in non-UTC case with lots of xfail (#9944)
#9985	Avoid allocating GPU memory out of RMM managed pool in test
#9970	Avoid leading and trailing zeros in test_timestamp_seconds_rounding_necessary
#9978	Avoid using floating point values as partition values in tests
#9979	Add compatibility notes for writing ORC with lost Gregorian days [skip ci]
#9949	Override the seed for `test_map_scalars_supported_key_types` for version of Spark before 3.4.0 [Databricks]
#9961	Avoid using floating point for partition values in Delta Lake tests
#9960	Fix LongGen accidentally using special cases when none are desired
#9950	Avoid generating NaNs as partition values in test_part_write_round_trip
#9940	Fix 'year 0 is out of range' by setting a fix seed
#9946	Fix test_multi_tier_ast to ignore ordering of output rows
#9928	Test `inset` with `NaN` only for Spark from 3.1.3
#9906	Fix test_initcap to use the intended limited character set
#9831	Skip fastparquet timestamp tests when plugin cannot read/write timestamps
#9893	Add multiple expression tier regression test for AST
#9873	Add support for decimal in `to_json`
#9890	Remove Databricks 13.3 from release 23.12
#9874	Fix zero-scale floor and ceil tests
#9879	Fix resource leak in to_json
#9600	Add date and timestamp support to to_json
#9871	Fix test_cast_string_date_valid_format generating year 0
#9885	Preparation for non-UTC nightly CI [skip ci]
#9810	Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST)
#9865	Fix problems with nulls in sequence tests
#9864	Add compatibility documentation with respect to decimal overflow detection [skip ci]
#9860	Fixing FAQ deadlink in plugin code [skip ci]
#9840	Avoid using NaNs as Delta Lake partition values
#9773	xfail all the impacted cases when using non-UTC time zone
#9849	Instantly Delete pre-merge content of stage workspace if success
#9848	Force datagen_seed for test_ceil_scale_zero and test_decimal_round
#9677	Enable build for Databricks 13.3
#9809	Re-enable AST string integration cases
#9835	Avoid pre-Gregorian dates in schema_evolution_test
#9786	Check paths for existence to prevent ignorable error messages during build
#9824	UCX 1.15 upgrade
#9800	Add GpuCheckOverflowInTableInsert to Databricks 11.3+
#9821	Update timestamp gens to avoid "year 0 is out of range" errors
#9826	Set seed to 0 for test_hash_reduction_sum
#9720	Support timestamp in `from_json`
#9818	Specify nullable=False when generating filter values in dpp tests
#9689	Support CPU path for from_utc_timestamp function with timezone
#9769	Use withGpuSparkSession to customize SparkConf
#9780	Fix NaN handling in GpuLessThanOrEqual and GpuGreaterThanOrEqual
#9795	xfail AST string tests
#9666	Add support for parsing strings as dates in `from_json`
#9673	Fix the broadcast joins issues caused by InputFileBlockRule
#9785	Force datagen_seed for 9781 and 9784 [skip ci]
#9765	Let GPU scans fall back when default values exist in schema
#9729	Fix Delta Lake atomic table operations on spark341db
#9770	[BUG] Fix the doc for Maven and Scala 2.13 test example [skip ci]
#9761	Fix bug in tagging of JsonToStructs
#9758	Remove forced seed from Delta Lake part_write_round_trip_unmanaged tests
#9652	Add time zone config to set non-UTC
#9736	Fix `TimestampGen` to generate value not too close to the minimum allowed timestamp
#9698	Speed up build: unnecessary invalidation in the incremental recompile mode
#9748	Fix Delta Lake part_write_round_trip_unmanaged tests with floating point
#9702	Support split BroadcastNestedLoopJoin condition for AST and non-AST
#9746	Force test_hypot to be single seed for now
#9745	Avoid generating null filter values in test_delta_dfp_reuse_broadcast_exchange
#9741	Set seed=0 for the delta lake part roundtrip tests
#9660	Fully support date/time legacy rebase for nested input
#9672	Support String type for AST
#9732	Temporarily force `datagen_seed=0` for `test_re_replace_all` to unblock CI
#9726	Fix leak in BatchWithPartitionData
#9717	Encode the file path from Iceberg when converting to a PartitionedFile
#9441	Add a random seed specific to datagen cases
#9649	Support `spark.sql.parquet.datetimeRebaseModeInRead=LEGACY` and `spark.sql.parquet.int96RebaseModeInRead=LEGACY`
#9612	Escape quotes and newlines when converting strings to json format in to_json
#9644	Add Partial Delta Lake Support for Databricks 13.3
#9690	Changed `extractExecutedPlan` to consider ResultQueryStageExec for Databricks 13.3
#9686	Removed Maven Profiles From `tests/pom.xml`
#9509	Fine-grained spill metrics
#9658	Support `spark.sql.parquet.int96RebaseModeInWrite=LEGACY`
#9695	Revert "Support split non-AST-able join condition for BroadcastNested…
#9693	Enable automerge from 23.12 to 24.02 [skip ci]
#9679	[Doc] update the dead link in download page [skip ci]
#9678	Add flow control for multithreaded shuffle writer
#9635	Support split non-AST-able join condition for BroadcastNestedLoopJoin
#9646	Fix Integration Test Failures for Databricks 13.3 Support
#9670	Normalize file timezone and handle missing file timezone in datetimeRebaseUtils
#9657	Update verify check to handle new pom files [skip ci]
#9663	Making User Guide info in bold and adding it as top right link in github.io [skip ci]
#9609	Add valid retry solution to mvn-verify [skip ci]
#9655	Document problem with handling of invalid characters in CSV reader
#9620	Add support for parsing boolean values in `from_json`
#9615	Bloop updates - require JDK11 in buildall + docs, build bloop for all targets.
#9631	Refactor Parquet readers
#9637	Added Support For Various Execs for Databricks 13.3
#9640	Add support for `ignoreNullFields=false` in `to_json`
#9623	Running window optimization for `LAST()`
#9641	Revert "Support rebase checking for nested dates and timestamps (#9617)"
#9423	Re-enable `from_json` / `JsonToStructs`
#9624	Add jenkins-level retry for pre-merge build in databricks runtimes
#9608	Fix nullability issues for some decimal operations
#9617	Support rebase checking for nested dates and timestamps
#9611	Move simple classes after refactoring to sql-plugin-api
#9618	Remove unused dataTypes argument from HostShuffleCoalesceIterator
#9626	Fix ENV typo in pre-merge github actions [skip ci]
#9593	PythonRunner and RapidsErrorUtils Changes For Databricks 13.3
#9607	Integration tests: Install specific fastparquet version.
#9610	Propagate local properties to broadcast execs
#9544	Support batching for `RANGE` running window aggregations. Including on
#9601	Remove usage of deprecated scala.Proxy
#9591	Enable implicit JDK profile activation
#9586	Merge metrics and file format fixes to Delta 2.4 support
#9594	Revert "Ignore failing Parquet filter test to unblock CI (#9519)"
#9454	Support encryption and compression in disk store
#9439	Support stack function
#9583	Fix fastparquet tests to work with HDFS
#9508	Consolidate deps switching in an intermediate pom
#9562	Delta Lake 2.3.0 support
#9576	Move Stack classes to wrapper classes to fix non-deterministic build issue
#9572	Add retry for CrossJoinIterator and ConditionalNestedLoopJoinIterator
#9575	Fix `test_window_running*()` for `NTH_VALUE IGNORE NULLS`.
#9574	Fix broken #endif scala comments [skip ci]
#9568	Enforce Apache 3.3.0+ for Scala 2.13
#9557	Support launching Map Pandas UDF on empty partitions
#9489	Batching support for ROW-based `FIRST()` window function
#9510	Add Databricks 13.3 shim boilerplate code and refactor Databricks 12.2 shim
#9554	Fix fastparquet installation for
#9536	Add CPU POC of TimeZoneDB; Test some time zones by comparing CPU POC and Spark
#9558	Support integration test against scala2.13 spark binaries[skip ci]
#8592	Scala 2.13 Support
#9551	Enable malformed Parquet failure test
#9546	Support OverwriteByExpressionExecV1 for Delta Lake tables
#9527	Support Split And Retry for GpuProjectAstExec
#9541	Move simple classes to API
#9548	Append new authorized user to blossom-ci whitelist [skip ci]
#9418	Fix STRUCT comparison between Pandas and Spark dataframes in fastparquet tests
#9468	Add SplitAndRetry to GpuRunningWindowIterator
#9486	Add partial support for `to_json`
#9538	Fix tiered project breaking higher order functions
#9539	Add delta-24x to delta-lake/README.md [skip ci]
#9534	Add pyarrow tests for Databricks runtime
#9444	Remove redundant pass-through shuffle manager classes
#9531	Fix relative path for spark-shell nightly test [skip ci]
#9525	Follow-up to dbdeps consolidation
#9506	Move ProxyShuffleInternalManagerBase to api
#9504	Add a spark-shell smoke test to premerge and nightly
#9519	Ignore failing Parquet filter test to unblock CI
#9478	Support AppendDataExecV1 for Delta Lake tables
#9366	Add tests to check compatibility with `fastparquet`
#9419	Add retry to RoundRobin Partitioner and Range Partitioner
#9502	Install Dependencies Needed For Databricks 13.3
#9296	Implement `percentile` aggregation
#9488	Add Shim JSON Headers for Databricks 13.3
#9443	Add AtomicReplaceTableAsSelectExec support for Delta Lake
#9476	Refactor common Delta Lake test code
#9463	Fix Cloudera 3.3.2 shim for handling CheckOverflowInTableInsert and orc zstd support
#9460	Update links in old release notes to new doc locations [skip ci]
#9405	Wrap scalar generation into spark session in integration test
#9459	Fix 332cdh build [skip ci]
#9425	Add support for AtomicCreateTableAsSelect with Delta Lake
#9434	Add retry support to `HostToGpuCoalesceIterator.concatAllAndPutOnGPU`
#9453	Update codeowner and blossom-ci ACL [skip ci]
#9396	Add support for Cloudera CDS-3.3.2
#9380	Fix parsing of Parquet legacy list-of-struct format
#9438	Fix auto merge conflict 9437 [skip ci]
#9424	Refactor aggregate functions
#9414	Add retry to GpuHashJoin.filterNulls
#9388	Add developer documentation about working with data sources [skip ci]
#9369	Improve JSON empty row fix to use less memory
#9373	Fix auto merge conflict 9372
#9308	Initiate arm64 CI support [skip ci]
#9292	Init project version 23.12.0-SNAPSHOT
#9291	Automerge from 23.10 to 23.12 [skip ci]

Release 23.10

Features


#9220	[FEA] Add GPU support for converting binary data to a hex string in REPL
#9171	[FEA] Add GPU version of ToPrettyString
#5314	[FEA] Support window.rowsBetween(Window.unboundedPreceding, -1)
#9057	[FEA] Add unbounded to unbounded fixers for min and max
#8121	[FEA] Add Spark 3.5.0 shim layer
#9224	[FEA] Allow } and }} to be transpiled to static strings
#8596	[FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY
#8767	[AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction
#9055	[FEA] Support Spark 3.3.3 official release
#8672	[FEA] Make GPU readers easier to debug on failure (any failure including OOM)
#8965	[FEA] Enable Bloom filter join acceleration by default
#8625	[FEA] Support outputTimestampType being INT96

Performance


#9512	[DOC] Multi-Threaded shuffle documentation is not accurate on the read side
#7803	[FEA] Accelerate Bloom filtered joins

Bugs Fixed


#8662	[BUG] Dataproc spark-rapids.sh fails due to cuda driver version issue
#9428	[Audit] SPARK-44448 Wrong results for dense_rank() <= k
#9485	[BUG] GpuSemaphore can deadlock if there are multiple threads per task
#9498	[BUG] spark 3.5.0 shim spark-shell is broken in spark-rapids 23.10 and 23.12
#9060	[BUG] OOM error in split and retry with multifile coalesce reader with parquet data
#8916	[BUG] Databricks - move init scripts off DBFS
#9416	[BUG] CDH build failed due to missing dependencies
#9357	[BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined"
#9271	[BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters
#9309	[BUG] bround and round do not return the correct result for some decimal values.
#9153	[BUG] netty OOM with MULTITHREADED shuffle
#9311	[BUG] test_hash_groupby_collect_list fails
#9180	[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered
#9290	[BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version"
#9255	[BUG] Unable to read DeltaTable with columnMapping.mode = name
#9261	[BUG] Leaks and Double Frees in Unit Tests
#9246	[BUG] `test_predefined_character_classes` failed with seed 4
#9208	[BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64
#9106	[BUG] Configuring GDS breaks new host spillable buffers and batches
#9131	[BUG] ConcurrentModificationException in ScalableTaskCompletion
#9263	[BUG] Unit test logging is not captured when running against Spark 3.5.0
#9168	[BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working
#8776	[BUG] FileCacheIntegrationSuite intermittent failure
#9223	[BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64
#9116	[BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released
#8984	[BUG] Check that keys are not null when creating a map
#9233	[BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE <class 'Exception'> in databricks runtime 12.2
#9142	[BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge
#9214	[BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim
#9204	[BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64
#9213	[BUG] Missing revision info in databricks shims failed nightly build
#9206	[BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes
#9165	[BUG] Data gen for key groups produces type-mismatch columns
#9129	[BUG] Writing Parquet map(map) column can not set the outer key as non-null.
#9194	[BUG] missing sql-plugin-api databricks artifacts in the nightly CI
#9167	[BUG] Ensure no udf-compiler internal nodes escape
#9092	[BUG] NDS query 64 falls back to CPU only for a shuffle
#9071	[BUG] `test_numeric_running_sum_window_no_part_unbounded` failed in MT tests
#9154	[BUG] Spark 3.5.0 nightly build failures (test_parquet_testing_error_files)
#9149	[BUG] compile failed in databricks runtimes due to new added TestReport
#9041	[BUG] Fix regression in Python UDAF support when running against Spark 3.5.0
#9064	[BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available
#9065	[BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available
#9119	[BUG] Predicate pushdown doesn't work for parquet files written by GPU
#9103	[BUG] test_select_complex_field fails in MT tests
#9086	[BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin
#8939	[BUG] q95 odd task failure in query95 at 30TB
#9082	[BUG] Race condition while spilling and aliasing a RapidsBuffer (regression)
#9069	[BUG] ParquetFormatScanSuite does not pass locally
#8980	[BUG] invalid escape sequences in pytests
#7807	[BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported
#8482	[BUG] Potential leak on SplitAndRetry when iterator not fully drained
#8942	[BUG] NDS query 14 parts 1 and 2 both fail at SF100K
#8778	[BUG] GPU Parquet output for TIMESTAMP_MICROS is misinteterpreted by fastparquet as nanos

PRs


#9304	Specify recoverWithNull when reading JSON files
#9474	Improve configuration handling in BatchWithPartitionData
#9289	Add tests to check compatibility with pyarrow
#9522	Update 23.10 changelog [skip ci]
#9501	Fix GpuSemaphore to support multiple threads per task
#9500	Fix Spark 3.5.0 shell classloader issue with the plugin
#9230	Fix reading partition value columns larger than cudf column size limit
#9427	[DOC] Update docs for 23.10.0 release [skip ci]
#9421	Init changelog of 23.10 [skip ci]
#9445	Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1
#9420	Update private and jni dep version to released 23.10.0
#9415	[BUG] fix docker modified check in premerge [skip ci]
#9407	[Doc]Update docs for 23.08.2 version[skip ci]
#9392	Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1
#9401	Remove using mamba before they fix the incompatibility issue [skip ci]
#9381	Change the executor core calculation to take into account the cluster manager
#9351	Put back in full decimal support for format_number
#9374	GpuCoalesceBatches should throw SplitAndRetyOOM on GPU OOM error
#9238	Simplified handling of GPU core dumps
#9362	[DOC] Removing User Guide pages that will be source of truth on docs.nvidia…
#9365	Update DataWriteCommandExec docs to reflect ORC support for nested types
#9277	[Doc]Remove CUDA related requirement from download page.[Skip CI]
#9352	Refine rules for skipping `test_csv_infer_schema_timestamp_ntz_*` tests
#9334	Add NaNs to Data Generators In Floating-Point Testing
#9344	Update MULTITHREADED shuffle maxBytesInFlight default to 128MB
#9330	Add Hao to blossom-ci whitelist
#9328	Building different Cuda versions section profile does not take effect [skip ci]
#9329	Add kuhushukla to blossom ci yml
#9281	Support `format_number`
#9335	Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz*
#9318	Update authorized user in blossom-ci whitelist [skip ci]
#9221	Add GPU version of ToPrettyString
#9321	[DOC] Fix some incorrect config links in doc [skip ci]
#9314	Fix RMM crash in FileCacheIntegrationSuite with ARENA memory allocator
#9287	Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject
#9146	Improve some CSV integration tests
#9159	Update tests and documentation for `spark.sql.timestampType` when reading CSV/JSON
#9313	Sort results of collect_list test before comparing since it is not guaranteed
#9286	[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered
#9229	Support negative preceding/following for ROW-based window functions
#9297	Append new authorized user to blossom-ci whitelist [skip ci]
#9294	Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x
#9285	Add CastOptions to make GpuCast extendible to handle more options
#9279	Fix file format checks to be exact and handle Delta Lake column mapping
#9283	Refactor ExternalSource to move some APIs to converted GPU format or scan
#9264	Fix leak in test and double free in corner case
#9280	Fix some issues found with different seeds in integration tests
#9257	Have host spill use the new HostAlloc API
#9253	Enforce Scala method syntax over deprecated procedure syntax
#9273	Add arm64 profile to build arm artifacts
#9270	Remove GDS spilling
#9267	Roll our own BufferedIterator so we can close cleanly
#9266	Specify correct dependency versions for 350 build
#9262	Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x
#9256	Test Parquet double column stat without NaN
#9254	[Doc]update the emr getting started doc for emr-6130 release[skip ci]
#9228	Add in unbounded to unbounded optimization for min/max
#9252	Add Spark 3.5.0 to list of supported Spark versions [skip ci]
#9251	Enable a couple of retry asserts in internal row to cudf row iterator suite
#9239	Handle escaping the dangling right ] and right } in the regexp transpiler
#9090	Add test cases for Parquet statistics
#9240	Fix flaky ORC filecache test
#9053	[DOC] update the turning guide document issues [skip ci]
#9211	Allow skipping host spill for a direct device->disk spill
#9234	Enable Spark 350 builds
#9237	Check for null keys when creating map
#9235	xfail fixed_length_byte_array.parquet test due to rapidsai/cudf#14104
#9231	Use conda libmamba solver to resolve intermittent libarchive issue [skip ci]
#8404	Add in support for FIXED_LEN_BYTE_ARRAY as binary
#9225	Add in a HostAlloc API for high priority and add in spilling
#9207	Support SplitAndRetry for GpuRangeExec
#9217	Fix leak in aggregate when there are retries
#9200	Fix a few minor things with scale test
#9222	Deploy classified aggregator for Databricks [skip ci]
#9209	Fix tests for datetime rebase in Databricks
#9181	[DOC] address document issues [skip ci]
#9132	Support `spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY`
#9196	Fix host memory leak for R2C
#9192	Throw overflow exception when interval seconds are outside of range [0, 59]
#9150	add error section in report and the rest queries
#9189	Expose host store spill
#9147	Make map column non-nullable when it's a key in another map.
#9193	Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec
#9183	Add test to verify UDT fallback for parquet
#9195	Deploy sql-plugin-api artifact in DBR CI pipelines [skip ci]
#9170	Add in new HostAlloc API
#9182	Consolidate Spark vendor shim dependency management
#9190	Prevent returning internal compiler expressions when compiling UDFs
#9164	Support Retry for GpuTopN and GpuSortEachBatchIterator
#9134	Fix shuffle fallback due to AQE on AWS EMR
#9188	Fix flaky tests in FileCacheIntegrationSuite
#9148	Add minimum Maven module eventually containing all non-shimmable source code
#9169	Add retry-without-split in InternalRowToColumnarBatchIterator
#9172	Remove doSetSpillable in favor of setSpillable
#9152	Add test cases for testing Parquet compression types
#9157	XFAIL parquet `lz4_raw` tests for Spark 3.5.0 or later
#9128	Test parquet predicate pushdown for basic types and fields having dots in names
#9158	Add json4s dependencies for Databricks integration_tests build
#9102	Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput
#9089	Add application to run Scale Test
#9143	[DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci]
#8476	Use retry with split in GpuCachedDoublePassWindowIterator
#9141	Removed resultDecimalType in GpuIntegralDecimalDivide
#9099	Spark 3.5.0 follow-on work (rc2 support + Python UDAF)
#9140	Bump Jython to 2.7.3
#9136	Moving row column conversion code from cudf to jni
#9133	Add 350 tag to InSubqueryShims
#9124	Import `scala.collection` intead of `collection`
#9122	Fall back to CPU if `spark.sql.execution.arrow.useLargeVarTypes` is true
#9115	[DOC] updates documentation related to java compatibility [skip ci]
#9098	Add SpillableHostColumnarBatch
#9091	GPU support for DynamicPruningExpression and InSubqueryExec
#9117	Temply disable spark 350 shim build in nightly [skip ci]
#9113	Instantiate execution plan capture callback via shim loader
#8969	Initial support for Spark 3.5.0-rc1
#9100	Support broadcast nested loop existence joins with no condition
#8925	Add GpuConv operator for the `conv` 10<->16 expression
#9109	[DOC] adding java 11 to download docs [skip ci]
#9085	Retry with smaller split on `CudfColumnSizeOverflowException`
#8961	Save Databricks init scripts in the workspace
#9088	Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator
#9095	Support released spark 3.3.3
#9084	Fix race when a rapids buffer is aliased while it is spilled
#9093	Update ParquetFormatScanSuite to not call CUDF directly
#9068	Test ORC predicate pushdown (PPD) with timestamps decimals booleans
#9054	Initial entry point to data generation for scale test
#9070	Spillable host buffer
#9066	Add retry support to RowToColumnarIterator
#9073	Stop using invalid escape sequences
#9018	Add test for selecting a single complex field array and its parent struct array
#9067	Add array support for round robin partition; Refactor pluginSupportedOrderableSig
#9072	Revert "Implement SumUnboundedToUnboundedFixer (#8934)"
#9056	Add in configs for host memory limits
#9061	Fix import order
#8934	Implement SumUnboundedToUnboundedFixer
#9051	Use number of threads on executor instead of driver to set core count
#9040	Fix issues from 23.08 merge in join_test
#9045	Fix auto merge conflict 9043 [skip ci]
#9009	Add in a layer of indirection for task completion callbacks
#9013	Create a two-shim jar by default on Databricks
#8995	Add test case for ORC statistics test
#8970	Add ability to debug dump input data only on errors
#9003	Fix auto merge conflict 9002 [skip ci]
#8989	Mark lazy spillables as allowSpillable in during gatherer construction
#8988	Move big data generator to a separate module
#8987	Fix host memory buffer leaks in SerializationSuite
#8968	Enable GPU acceleration of Bloom filter join expressions by default
#8947	Add ArrowUtilsShims in preparation for Spark 3.5.0
#8946	[Spark 3.5.0] Shim access to StructType.fromAttributes
#8824	Drop the in-range check at INT96 output path
#8924	Deprecate and delegate GpuCV.debug to cudf TableDebug
#8915	Move LegacyBehaviorPolicy references to shim layer
#8918	Output unified diff when GPU output deviates
#8857	Remove the pageable pool
#8854	Fix auto merge conflict 8853 [skip ci]
#8805	Bump up dep versions to 23.10.0-SNAPSHOT
#8796	Init version 23.10.0-SNAPSHOT

Older Releases

Changelog of older releases can be found at docs/archives

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CHANGELOG.md

CHANGELOG.md

Change log

Release 23.12

Features

Performance

Bugs Fixed

PRs

Release 23.10

Features

Performance

Bugs Fixed

PRs

Older Releases

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Change log

Release 23.12

Features

Performance

Bugs Fixed

PRs

Release 23.10

Features

Performance

Bugs Fixed

PRs

Older Releases