Change log

Generated on 2024-01-31

Release 23.12

Features

#6832 [FEA] Convert Timestamp/Timezone tests/checks to be per operator instead of generic
#9805 [FEA] Support current_date expression function with CST (UTC + 8) timezone support
#9515 [FEA] Support temporal types in to_json
#9872 [FEA][JSON] Support Decimal type in to_json
#9802 [FEA] Support FromUTCTimestamp on the GPU with a non-UTC time zone
#6831 [FEA] Support timestamp transitions to and from UTC for single time zones with no repeating rules
#9590 [FEA][JSON] Support temporal types in from_json
#9804 [FEA] Support CPU path for from_utc_timestamp function with timezone
#9461 [FEA] Validate nvcomp-3.0 with spark rapids plugin
#8832 [FEA] rewrite join conditions where only part of it can fit on the AST
#9059 [FEA] Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY
#9037 [FEA] Support spark.sql.parquet.int96RebaseModeInWrite=LEGACY
#9632 [FEA] Take into account org.apache.spark.timeZone in Parquet/Avro from Spark 3.2
#8770 [FEA] add more metrics to Eventlogs or Executor logs
#9597 [FEA][JSON] Support boolean type in from_json
#9516 [FEA] Add support for JSON data source option ignoreNullFields=false in to_json
#9520 [FEA] Add support for LAST() as running window function
#9518 [FEA] Add support for relevant JSON data source options in to_json
#9218 [FEA] Support stack function
#9532 [FEA] Support Delta Lake 2.3.0
#1525 [FEA] Support Scala 2.13
#7279 [FEA] Support OverwriteByExpressionExecV1 for Delta Lake
#9326 [FEA] Specify recover_with_null when reading JSON files
#8780 [FEA] Support to_json function
#7278 [FEA] Support AppendDataExecV1 for Delta Lake
#6266 [FEA] Support Percentile
#7277 [FEA] Support AtomicReplaceTableAsSelect for Delta Lake
#7276 [FEA] Support AtomicCreateTableAsSelect for Delta Lake

Performance

#8137 [FEA] Upgrade to UCX 1.15
#8157 [FEA] Add string comparison to AST expressions
#9398 [FEA] Compress/encrypt spill to disk

Bugs Fixed

#9687 [BUG] test_in_set fails when DATAGEN_SEED=1698940723
#9659 [BUG] executor crash intermittently in scala2.13-built spark332 integration tests
#9923 [BUG] Failed case about test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test
#9982 [BUG] test "convert large InternalRow iterator to cached batch single col" failed with arena pool
#9683 [BUG] test_map_scalars_supported_key_types fails with DATAGEN_SEED=1698940723
#9976 [BUG] test_part_write_round_trip[Float] Failed on -0.0 partition
#9948 [BUG] parquet reader data corruption in nested schema after rapidsai/cudf#13302
#9867 [BUG] Unable to use Spark Rapids with Spark Thrift Server
#9934 [BUG] test_delta_multi_part_write_round_trip_unmanaged and test_delta_part_write_round_trip_unmanaged failed DATA_SEED=1701608331
#9933 [BUG] collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1701553915, INJECT_OOM]
#9837 [BUG] test_part_write_round_trip failed
#9932 [BUG] Failed test_multi_tier_ast[DATAGEN_SEED=1701445668] on CI
#9829 [BUG] Java OOM when testing non-UTC time zone with lots of cases fallback.
#9403 [BUG] test_cogroup_apply_udf[Short(not_null)] failed with pandas 2.1.X
#9684 [BUG] test_coalesce fails with DATAGEN_SEED=1698940723
#9685 [BUG] test_case_when fails with DATAGEN_SEED=1698940723
#9776 [BUG] fastparquet compatibility tests fail with data mismatch if TZ is not set and system timezone is not UTC
#9733 [BUG] Complex AST expressions can crash with non-matching operand type error
#9877 [BUG] Fix resource leak in to_json
#9722 [BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407
#9846 [BUG] test_ceil_scale_zero may fail with different datagen_seed
#9781 [BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017
#9714 Scala Map class not found when executing the benchmark on Spark 3.5.0 with Scala 2.13
#9856 collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist
#9397 [BUG] RapidsShuffleManager MULTITHREADED on Databricks, we see loss of executors due to Rpc issues
#9738 [BUG] test_delta_part_write_round_trip_unmanaged and test_delta_multi_part_write_round_trip_unmanaged fail with DATAGEN_SEED=1700105176
#9771 [BUG] ast_test.py::test_X[(String, True)][DATAGEN_SEED=1700205785] failed
#9782 [BUG] Error messages appear in a clean build
#9798 [BUG] GpuCheckOverflowInTableInsert should be added to databricks shim
#9820 [BUG] test_parquet_write_roundtrip_datetime_with_legacy_rebase fails with "year 0 is out of range"
#9817 [BUG] FAILED dpp_test.py::test_dpp_reuse_broadcast_exchange[false-0-parquet][DATAGEN_SEED=1700572856, IGNORE_ORDER]
#9768 [BUG] cast decimal to string ScalaTest relies on side effects
#9711 [BUG] test_lte fails with DATAGEN_SEED=1699987762
#9751 [BUG] cmp_test test_gte failed with DATAGEN_SEED=1700149611
#9469 [BUG] [main] ERROR com.nvidia.spark.rapids.GpuOverrideUtil - Encountered an exception applying GPU overrides java.lang.IllegalStateException: the broadcast must be on the GPU too
#9648 [BUG] Existence default values in schema are not being honored
#9676 Fix Delta Lake Integration tests; test_delta_atomic_create_table_as_select and test_delta_atomic_replace_table_as_select
#9701 [BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317
#9691 [BUG] Repeated Maven invocations w/o changes recompile too many Scala sources despite recompileMode=incremental
#9547 Update buildall and doc to generate bloop projects for test debugging
#9697 [BUG] Iceberg multiple file readers can not read files if the file paths contain encoded URL unsafe chars
#9681 Databricks Build Failing For 330db+
#9521 [BUG] Multi Threaded Shuffle Writer needs flow control
#9675 Failing Delta Lake Tests for Databricks 13.3 Due to WriteIntoDeltaCommand
#9669 [BUG] Rebase exception states not in UTC but timezone is Etc/UTC
#7940 [BUG] UCX peer connection issue in multi-nic single node cluster
#9650 [BUG] Github workflow for missing scala2.13 updates fails to detect when pom is new
#9621 [BUG] Scala 2.13 with-classifier profile is picking up Scala2.12 spark.version
#9636 [BUG] All parquet integration tests failed "Part of the plan is not columnar class" in databricks runtimes
#9108 [BUG] nullability on some decimal operations is wrong
#9625 [BUG] Typo in github Maven check install-modules
#9603 [BUG] fastparquet_compatibility_test fails on dataproc
#8729 [BUG] nightly integration test failed OOM kill in JDK11 ENV
#9589 [BUG] Scala 2.13 build hard-codes Java 8 target
#9581 Delta Lake 2.4 missing equals/hashCode override for file format and some metrics for merge
#9507 [BUG] Spark 3.2+/ParquetFilterSuite/Parquet filter pushdown - timestamp/ FAILED
#9540 [BUG] Job failed with SparkUpgradeException no matter which value is set for spark.sql.parquet.datetimeRebaseModeInRead
#9545 [BUG] Dataproc 2.0 test_reading_file_rewritten_with_fastparquet tests failing
#9552 [BUG] Inconsistent CDH dependency overrides across submodules
#9571 [BUG] non-deterministic compiled SQLExecPlugin.class with scala 2.13 deployment
#9569 [BUG] test_window_running failed in 3.1.2+3.1.3
#9480 [BUG] mapInPandas doesn't invoke udf on empty partitions
#8644 [BUG] Parquet file with malformed dictionary does not error when loaded
#9310 [BUG] Improve support for reading JSON files with malformed rows
#9457 [BUG] CDH 332 unit tests failing
#9404 [BUG] Spark reports a decimal error when creating a lit scalar while generating Decimal(34, -5) data.
#9110 [BUG] GPU Reader fails due to partition column creating column larger than cudf column size limit
#8631 [BUG] Parquet load failure on repeated_no_annotation.parquet
#9364 [BUG] CUDA illegal access error is triggering split and retry logic

PRs

#10340 Copyright to 2024 [skip ci]
#10323 Upgrade version to 23.12.2-SNAPSHOT
#10274 PythonRunner Changes
#10124 Update changelog for v23.12.1 [skip ci]
#10123 Change version to v23.12.1 [skip ci]
#10122 Init changelog for v23.12.1 [skip ci]
#10121 [DOC] update download page for db hot fix [skip ci]
#10116 Upgrade to 23.12.1-SNAPSHOT
#9935 Init 23.12 changelog [skip ci]
#9943 [DOC] Update docs for 23.12.0 release [skip ci]
#10014 Add documentation for how to run tests with a fixed datagen seed [skip ci]
#9954 Update private and JNI version to released 23.12.0
#10009 Using fixed seed to unblock 23.12 release; Move the blocked issues to 24.02
#10007 Fix Java OOM in non-UTC case with lots of xfail (#9944)
#9985 Avoid allocating GPU memory out of RMM managed pool in test
#9970 Avoid leading and trailing zeros in test_timestamp_seconds_rounding_necessary
#9978 Avoid using floating point values as partition values in tests
#9979 Add compatibility notes for writing ORC with lost Gregorian days [skip ci]
#9949 Override the seed for test_map_scalars_supported_key_types for version of Spark before 3.4.0 [Databricks]
#9961 Avoid using floating point for partition values in Delta Lake tests
#9960 Fix LongGen accidentally using special cases when none are desired
#9950 Avoid generating NaNs as partition values in test_part_write_round_trip
#9940 Fix 'year 0 is out of range' by setting a fixed seed
#9946 Fix test_multi_tier_ast to ignore ordering of output rows
#9928 Test inset with NaN only for Spark from 3.1.3
#9906 Fix test_initcap to use the intended limited character set
#9831 Skip fastparquet timestamp tests when plugin cannot read/write timestamps
#9893 Add multiple expression tier regression test for AST
#9873 Add support for decimal in to_json
#9890 Remove Databricks 13.3 from release 23.12
#9874 Fix zero-scale floor and ceil tests
#9879 Fix resource leak in to_json
#9600 Add date and timestamp support to to_json
#9871 Fix test_cast_string_date_valid_format generating year 0
#9885 Preparation for non-UTC nightly CI [skip ci]
#9810 Support from_utc_timestamp on the GPU for non-UTC timezones (non-DST)
#9865 Fix problems with nulls in sequence tests
#9864 Add compatibility documentation with respect to decimal overflow detection [skip ci]
#9860 Fixing FAQ deadlink in plugin code [skip ci]
#9840 Avoid using NaNs as Delta Lake partition values
#9773 xfail all the impacted cases when using non-UTC time zone
#9849 Instantly Delete pre-merge content of stage workspace if success
#9848 Force datagen_seed for test_ceil_scale_zero and test_decimal_round
#9677 Enable build for Databricks 13.3
#9809 Re-enable AST string integration cases
#9835 Avoid pre-Gregorian dates in schema_evolution_test
#9786 Check paths for existence to prevent ignorable error messages during build
#9824 UCX 1.15 upgrade
#9800 Add GpuCheckOverflowInTableInsert to Databricks 11.3+
#9821 Update timestamp gens to avoid "year 0 is out of range" errors
#9826 Set seed to 0 for test_hash_reduction_sum
#9720 Support timestamp in from_json
#9818 Specify nullable=False when generating filter values in dpp tests
#9689 Support CPU path for from_utc_timestamp function with timezone
#9769 Use withGpuSparkSession to customize SparkConf
#9780 Fix NaN handling in GpuLessThanOrEqual and GpuGreaterThanOrEqual
#9795 xfail AST string tests
#9666 Add support for parsing strings as dates in from_json
#9673 Fix the broadcast joins issues caused by InputFileBlockRule
#9785 Force datagen_seed for 9781 and 9784 [skip ci]
#9765 Let GPU scans fall back when default values exist in schema
#9729 Fix Delta Lake atomic table operations on spark341db
#9770 [BUG] Fix the doc for Maven and Scala 2.13 test example [skip ci]
#9761 Fix bug in tagging of JsonToStructs
#9758 Remove forced seed from Delta Lake part_write_round_trip_unmanaged tests
#9652 Add time zone config to set non-UTC
#9736 Fix TimestampGen to generate value not too close to the minimum allowed timestamp
#9698 Speed up build: unnecessary invalidation in the incremental recompile mode
#9748 Fix Delta Lake part_write_round_trip_unmanaged tests with floating point
#9702 Support split BroadcastNestedLoopJoin condition for AST and non-AST
#9746 Force test_hypot to be single seed for now
#9745 Avoid generating null filter values in test_delta_dfp_reuse_broadcast_exchange
#9741 Set seed=0 for the delta lake part roundtrip tests
#9660 Fully support date/time legacy rebase for nested input
#9672 Support String type for AST
#9732 Temporarily force datagen_seed=0 for test_re_replace_all to unblock CI
#9726 Fix leak in BatchWithPartitionData
#9717 Encode the file path from Iceberg when converting to a PartitionedFile
#9441 Add a random seed specific to datagen cases
#9649 Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY and spark.sql.parquet.int96RebaseModeInRead=LEGACY
#9612 Escape quotes and newlines when converting strings to json format in to_json
#9644 Add Partial Delta Lake Support for Databricks 13.3
#9690 Changed extractExecutedPlan to consider ResultQueryStageExec for Databricks 13.3
#9686 Removed Maven Profiles From tests/pom.xml
#9509 Fine-grained spill metrics
#9658 Support spark.sql.parquet.int96RebaseModeInWrite=LEGACY
#9695 Revert "Support split non-AST-able join condition for BroadcastNested…
#9693 Enable automerge from 23.12 to 24.02 [skip ci]
#9679 [Doc] update the dead link in download page [skip ci]
#9678 Add flow control for multithreaded shuffle writer
#9635 Support split non-AST-able join condition for BroadcastNestedLoopJoin
#9646 Fix Integration Test Failures for Databricks 13.3 Support
#9670 Normalize file timezone and handle missing file timezone in datetimeRebaseUtils
#9657 Update verify check to handle new pom files [skip ci]
#9663 Making User Guide info in bold and adding it as top right link in github.io [skip ci]
#9609 Add valid retry solution to mvn-verify [skip ci]
#9655 Document problem with handling of invalid characters in CSV reader
#9620 Add support for parsing boolean values in from_json
#9615 Bloop updates - require JDK11 in buildall + docs, build bloop for all targets.
#9631 Refactor Parquet readers
#9637 Added Support For Various Execs for Databricks 13.3
#9640 Add support for ignoreNullFields=false in to_json
#9623 Running window optimization for LAST()
#9641 Revert "Support rebase checking for nested dates and timestamps (#9617)"
#9423 Re-enable from_json / JsonToStructs
#9624 Add jenkins-level retry for pre-merge build in databricks runtimes
#9608 Fix nullability issues for some decimal operations
#9617 Support rebase checking for nested dates and timestamps
#9611 Move simple classes after refactoring to sql-plugin-api
#9618 Remove unused dataTypes argument from HostShuffleCoalesceIterator
#9626 Fix ENV typo in pre-merge github actions [skip ci]
#9593 PythonRunner and RapidsErrorUtils Changes For Databricks 13.3
#9607 Integration tests: Install specific fastparquet version.
#9610 Propagate local properties to broadcast execs
#9544 Support batching for RANGE running window aggregations. Including on
#9601 Remove usage of deprecated scala.Proxy
#9591 Enable implicit JDK profile activation
#9586 Merge metrics and file format fixes to Delta 2.4 support
#9594 Revert "Ignore failing Parquet filter test to unblock CI (#9519)"
#9454 Support encryption and compression in disk store
#9439 Support stack function
#9583 Fix fastparquet tests to work with HDFS
#9508 Consolidate deps switching in an intermediate pom
#9562 Delta Lake 2.3.0 support
#9576 Move Stack classes to wrapper classes to fix non-deterministic build issue
#9572 Add retry for CrossJoinIterator and ConditionalNestedLoopJoinIterator
#9575 Fix test_window_running*() for NTH_VALUE IGNORE NULLS.
#9574 Fix broken #endif scala comments [skip ci]
#9568 Enforce Apache 3.3.0+ for Scala 2.13
#9557 Support launching Map Pandas UDF on empty partitions
#9489 Batching support for ROW-based FIRST() window function
#9510 Add Databricks 13.3 shim boilerplate code and refactor Databricks 12.2 shim
#9554 Fix fastparquet installation for
#9536 Add CPU POC of TimeZoneDB; Test some time zones by comparing CPU POC and Spark
#9558 Support integration test against scala2.13 spark binaries [skip ci]
#8592 Scala 2.13 Support
#9551 Enable malformed Parquet failure test
#9546 Support OverwriteByExpressionExecV1 for Delta Lake tables
#9527 Support Split And Retry for GpuProjectAstExec
#9541 Move simple classes to API
#9548 Append new authorized user to blossom-ci whitelist [skip ci]
#9418 Fix STRUCT comparison between Pandas and Spark dataframes in fastparquet tests
#9468 Add SplitAndRetry to GpuRunningWindowIterator
#9486 Add partial support for to_json
#9538 Fix tiered project breaking higher order functions
#9539 Add delta-24x to delta-lake/README.md [skip ci]
#9534 Add pyarrow tests for Databricks runtime
#9444 Remove redundant pass-through shuffle manager classes
#9531 Fix relative path for spark-shell nightly test [skip ci]
#9525 Follow-up to dbdeps consolidation
#9506 Move ProxyShuffleInternalManagerBase to api
#9504 Add a spark-shell smoke test to premerge and nightly
#9519 Ignore failing Parquet filter test to unblock CI
#9478 Support AppendDataExecV1 for Delta Lake tables
#9366 Add tests to check compatibility with fastparquet
#9419 Add retry to RoundRobin Partitioner and Range Partitioner
#9502 Install Dependencies Needed For Databricks 13.3
#9296 Implement percentile aggregation
#9488 Add Shim JSON Headers for Databricks 13.3
#9443 Add AtomicReplaceTableAsSelectExec support for Delta Lake
#9476 Refactor common Delta Lake test code
#9463 Fix Cloudera 3.3.2 shim for handling CheckOverflowInTableInsert and orc zstd support
#9460 Update links in old release notes to new doc locations [skip ci]
#9405 Wrap scalar generation into spark session in integration test
#9459 Fix 332cdh build [skip ci]
#9425 Add support for AtomicCreateTableAsSelect with Delta Lake
#9434 Add retry support to HostToGpuCoalesceIterator.concatAllAndPutOnGPU
#9453 Update codeowner and blossom-ci ACL [skip ci]
#9396 Add support for Cloudera CDS-3.3.2
#9380 Fix parsing of Parquet legacy list-of-struct format
#9438 Fix auto merge conflict 9437 [skip ci]
#9424 Refactor aggregate functions
#9414 Add retry to GpuHashJoin.filterNulls
#9388 Add developer documentation about working with data sources [skip ci]
#9369 Improve JSON empty row fix to use less memory
#9373 Fix auto merge conflict 9372
#9308 Initiate arm64 CI support [skip ci]
#9292 Init project version 23.12.0-SNAPSHOT
#9291 Automerge from 23.10 to 23.12 [skip ci]

Release 23.10

Features

#9220 [FEA] Add GPU support for converting binary data to a hex string in REPL
#9171 [FEA] Add GPU version of ToPrettyString
#5314 [FEA] Support window.rowsBetween(Window.unboundedPreceding, -1)
#9057 [FEA] Add unbounded to unbounded fixers for min and max
#8121 [FEA] Add Spark 3.5.0 shim layer
#9224 [FEA] Allow } and }} to be transpiled to static strings
#8596 [FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY
#8767 [AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction
#9055 [FEA] Support Spark 3.3.3 official release
#8672 [FEA] Make GPU readers easier to debug on failure (any failure including OOM)
#8965 [FEA] Enable Bloom filter join acceleration by default
#8625 [FEA] Support outputTimestampType being INT96

Performance

#9512 [DOC] Multi-Threaded shuffle documentation is not accurate on the read side
#7803 [FEA] Accelerate Bloom filtered joins

Bugs Fixed

#8662 [BUG] Dataproc spark-rapids.sh fails due to cuda driver version issue
#9428 [Audit] SPARK-44448 Wrong results for dense_rank() <= k
#9485 [BUG] GpuSemaphore can deadlock if there are multiple threads per task
#9498 [BUG] spark 3.5.0 shim spark-shell is broken in spark-rapids 23.10 and 23.12
#9060 [BUG] OOM error in split and retry with multifile coalesce reader with parquet data
#8916 [BUG] Databricks - move init scripts off DBFS
#9416 [BUG] CDH build failed due to missing dependencies
#9357 [BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined"
#9271 [BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters
#9309 [BUG] bround and round do not return the correct result for some decimal values.
#9153 [BUG] netty OOM with MULTITHREADED shuffle
#9311 [BUG] test_hash_groupby_collect_list fails
#9180 [FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered
#9290 [BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version"
#9255 [BUG] Unable to read DeltaTable with columnMapping.mode = name
#9261 [BUG] Leaks and Double Frees in Unit Tests
#9246 [BUG] test_predefined_character_classes failed with seed 4
#9208 [BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64
#9106 [BUG] Configuring GDS breaks new host spillable buffers and batches
#9131 [BUG] ConcurrentModificationException in ScalableTaskCompletion
#9263 [BUG] Unit test logging is not captured when running against Spark 3.5.0
#9168 [BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working
#8776 [BUG] FileCacheIntegrationSuite intermittent failure
#9223 [BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64
#9116 [BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released
#8984 [BUG] Check that keys are not null when creating a map
#9233 [BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE <class 'Exception'> in databricks runtime 12.2
#9142 [BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge
#9214 [BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim
#9204 [BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64
#9213 [BUG] Missing revision info in databricks shims failed nightly build
#9206 [BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes
#9165 [BUG] Data gen for key groups produces type-mismatch columns
#9129 [BUG] Writing Parquet map(map) column can not set the outer key as non-null.
#9194 [BUG] missing sql-plugin-api databricks artifacts in the nightly CI
#9167 [BUG] Ensure no udf-compiler internal nodes escape
#9092 [BUG] NDS query 64 falls back to CPU only for a shuffle
#9071 [BUG] test_numeric_running_sum_window_no_part_unbounded failed in MT tests
#9154 [BUG] Spark 3.5.0 nightly build failures (test_parquet_testing_error_files)
#9149 [BUG] compile failed in databricks runtimes due to new added TestReport
#9041 [BUG] Fix regression in Python UDAF support when running against Spark 3.5.0
#9064 [BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available
#9065 [BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available
#9119 [BUG] Predicate pushdown doesn't work for parquet files written by GPU
#9103 [BUG] test_select_complex_field fails in MT tests
#9086 [BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin
#8939 [BUG] q95 odd task failure in query95 at 30TB
#9082 [BUG] Race condition while spilling and aliasing a RapidsBuffer (regression)
#9069 [BUG] ParquetFormatScanSuite does not pass locally
#8980 [BUG] invalid escape sequences in pytests
#7807 [BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported
#8482 [BUG] Potential leak on SplitAndRetry when iterator not fully drained
#8942 [BUG] NDS query 14 parts 1 and 2 both fail at SF100K
#8778 [BUG] GPU Parquet output for TIMESTAMP_MICROS is misinterpreted by fastparquet as nanos

PRs

#9304 Specify recoverWithNull when reading JSON files
#9474 Improve configuration handling in BatchWithPartitionData
#9289 Add tests to check compatibility with pyarrow
#9522 Update 23.10 changelog [skip ci]
#9501 Fix GpuSemaphore to support multiple threads per task
#9500 Fix Spark 3.5.0 shell classloader issue with the plugin
#9230 Fix reading partition value columns larger than cudf column size limit
#9427 [DOC] Update docs for 23.10.0 release [skip ci]
#9421 Init changelog of 23.10 [skip ci]
#9445 Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1
#9420 Update private and jni dep version to released 23.10.0
#9415 [BUG] fix docker modified check in premerge [skip ci]
#9407 [Doc] Update docs for 23.08.2 version [skip ci]
#9392 Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1
#9401 Remove using mamba before they fix the incompatibility issue [skip ci]
#9381 Change the executor core calculation to take into account the cluster manager
#9351 Put back in full decimal support for format_number
#9374 GpuCoalesceBatches should throw SplitAndRetryOOM on GPU OOM error
#9238 Simplified handling of GPU core dumps
#9362 [DOC] Removing User Guide pages that will be source of truth on docs.nvidia…
#9365 Update DataWriteCommandExec docs to reflect ORC support for nested types
#9277 [Doc] Remove CUDA related requirement from download page. [Skip CI]
#9352 Refine rules for skipping test_csv_infer_schema_timestamp_ntz_* tests
#9334 Add NaNs to Data Generators In Floating-Point Testing
#9344 Update MULTITHREADED shuffle maxBytesInFlight default to 128MB
#9330 Add Hao to blossom-ci whitelist
#9328 Building different Cuda versions section profile does not take effect [skip ci]
#9329 Add kuhushukla to blossom ci yml
#9281 Support format_number
#9335 Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz*
#9318 Update authorized user in blossom-ci whitelist [skip ci]
#9221 Add GPU version of ToPrettyString
#9321 [DOC] Fix some incorrect config links in doc [skip ci]
#9314 Fix RMM crash in FileCacheIntegrationSuite with ARENA memory allocator
#9287 Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject
#9146 Improve some CSV integration tests
#9159 Update tests and documentation for spark.sql.timestampType when reading CSV/JSON
#9313 Sort results of collect_list test before comparing since it is not guaranteed
#9286 [FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered
#9229 Support negative preceding/following for ROW-based window functions
#9297 Append new authorized user to blossom-ci whitelist [skip ci]
#9294 Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x
#9285 Add CastOptions to make GpuCast extendible to handle more options
#9279 Fix file format checks to be exact and handle Delta Lake column mapping
#9283 Refactor ExternalSource to move some APIs to converted GPU format or scan
#9264 Fix leak in test and double free in corner case
#9280 Fix some issues found with different seeds in integration tests
#9257 Have host spill use the new HostAlloc API
#9253 Enforce Scala method syntax over deprecated procedure syntax
#9273 Add arm64 profile to build arm artifacts
#9270 Remove GDS spilling
#9267 Roll our own BufferedIterator so we can close cleanly
#9266 Specify correct dependency versions for 350 build
#9262 Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x
#9256 Test Parquet double column stat without NaN
#9254 [Doc] update the emr getting started doc for emr-6130 release [skip ci]
#9228 Add in unbounded to unbounded optimization for min/max
#9252 Add Spark 3.5.0 to list of supported Spark versions [skip ci]
#9251 Enable a couple of retry asserts in internal row to cudf row iterator suite
#9239 Handle escaping the dangling right ] and right } in the regexp transpiler
#9090 Add test cases for Parquet statistics
#9240 Fix flaky ORC filecache test
#9053 [DOC] update the tuning guide document issues [skip ci]
#9211 Allow skipping host spill for a direct device->disk spill
#9234 Enable Spark 350 builds
#9237 Check for null keys when creating map
#9235 xfail fixed_length_byte_array.parquet test due to rapidsai/cudf#14104
#9231 Use conda libmamba solver to resolve intermittent libarchive issue [skip ci]
#8404 Add in support for FIXED_LEN_BYTE_ARRAY as binary
#9225 Add in a HostAlloc API for high priority and add in spilling
#9207 Support SplitAndRetry for GpuRangeExec
#9217 Fix leak in aggregate when there are retries
#9200 Fix a few minor things with scale test
#9222 Deploy classified aggregator for Databricks [skip ci]
#9209 Fix tests for datetime rebase in Databricks
#9181 [DOC] address document issues [skip ci]
#9132 Support spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY
#9196 Fix host memory leak for R2C
#9192 Throw overflow exception when interval seconds are outside of range [0, 59]
#9150 add error section in report and the rest queries
#9189 Expose host store spill
#9147 Make map column non-nullable when it's a key in another map.
#9193 Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec
#9183 Add test to verify UDT fallback for parquet
#9195 Deploy sql-plugin-api artifact in DBR CI pipelines [skip ci]
#9170 Add in new HostAlloc API
#9182 Consolidate Spark vendor shim dependency management
#9190 Prevent returning internal compiler expressions when compiling UDFs
#9164 Support Retry for GpuTopN and GpuSortEachBatchIterator
#9134 Fix shuffle fallback due to AQE on AWS EMR
#9188 Fix flaky tests in FileCacheIntegrationSuite
#9148 Add minimum Maven module eventually containing all non-shimmable source code
#9169 Add retry-without-split in InternalRowToColumnarBatchIterator
#9172 Remove doSetSpillable in favor of setSpillable
#9152 Add test cases for testing Parquet compression types
#9157 XFAIL parquet lz4_raw tests for Spark 3.5.0 or later
#9128 Test parquet predicate pushdown for basic types and fields having dots in names
#9158 Add json4s dependencies for Databricks integration_tests build
#9102 Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput
#9089 Add application to run Scale Test
#9143 [DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci]
#8476 Use retry with split in GpuCachedDoublePassWindowIterator
#9141 Removed resultDecimalType in GpuIntegralDecimalDivide
#9099 Spark 3.5.0 follow-on work (rc2 support + Python UDAF)
#9140 Bump Jython to 2.7.3
#9136 Moving row column conversion code from cudf to jni
#9133 Add 350 tag to InSubqueryShims
#9124 Import scala.collection instead of collection
#9122 Fall back to CPU if spark.sql.execution.arrow.useLargeVarTypes is true
#9115 [DOC] updates documentation related to java compatibility [skip ci]
#9098 Add SpillableHostColumnarBatch
#9091 GPU support for DynamicPruningExpression and InSubqueryExec
#9117 Temporarily disable spark 350 shim build in nightly [skip ci]
#9113 Instantiate execution plan capture callback via shim loader
#8969 Initial support for Spark 3.5.0-rc1
#9100 Support broadcast nested loop existence joins with no condition
#8925 Add GpuConv operator for the conv 10<->16 expression
#9109 [DOC] adding java 11 to download docs [skip ci]
#9085 Retry with smaller split on CudfColumnSizeOverflowException
#8961 Save Databricks init scripts in the workspace
#9088 Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator
#9095 Support released spark 3.3.3
#9084 Fix race when a rapids buffer is aliased while it is spilled
#9093 Update ParquetFormatScanSuite to not call CUDF directly
#9068 Test ORC predicate pushdown (PPD) with timestamps decimals booleans
#9054 Initial entry point to data generation for scale test
#9070 Spillable host buffer
#9066 Add retry support to RowToColumnarIterator
#9073 Stop using invalid escape sequences
#9018 Add test for selecting a single complex field array and its parent struct array
#9067 Add array support for round robin partition; Refactor pluginSupportedOrderableSig
#9072 Revert "Implement SumUnboundedToUnboundedFixer (#8934)"
#9056 Add in configs for host memory limits
#9061 Fix import order
#8934 Implement SumUnboundedToUnboundedFixer
#9051 Use number of threads on executor instead of driver to set core count
#9040 Fix issues from 23.08 merge in join_test
#9045 Fix auto merge conflict 9043 [skip ci]
#9009 Add in a layer of indirection for task completion callbacks
#9013 Create a two-shim jar by default on Databricks
#8995 Add test case for ORC statistics test
#8970 Add ability to debug dump input data only on errors
#9003 Fix auto merge conflict 9002 [skip ci]
#8989 Mark lazy spillables as allowSpillable during gatherer construction
#8988 Move big data generator to a separate module
#8987 Fix host memory buffer leaks in SerializationSuite
#8968 Enable GPU acceleration of Bloom filter join expressions by default
#8947 Add ArrowUtilsShims in preparation for Spark 3.5.0
#8946 [Spark 3.5.0] Shim access to StructType.fromAttributes
#8824 Drop the in-range check at INT96 output path
#8924 Deprecate and delegate GpuCV.debug to cudf TableDebug
#8915 Move LegacyBehaviorPolicy references to shim layer
#8918 Output unified diff when GPU output deviates
#8857 Remove the pageable pool
#8854 Fix auto merge conflict 8853 [skip ci]
#8805 Bump up dep versions to 23.10.0-SNAPSHOT
#8796 Init version 23.10.0-SNAPSHOT

Older Releases

Changelogs of older releases can be found at docs/archives.