Releases · pola-rs/polars

01 Oct 19:54

github-actions

py-1.9.0

be5a4b4

Python Polars 1.9.0 Latest

🚀 Performance improvements

Use List's TotalEqKernel (#18984)

✨ Enhancements

Bitwise operations / aggregations (#18994)
Allow insert_column to take expressions (#19024)
Improved error message DSL -> IR resolving (#19032)
Add strict param to eager/lazy frame "rename" (#19017)
Support schema arg in read/scan_parquet() (#19013)
Add include_file_paths parameter to read_parquet (#19008)
Add allow_missing_columns option to read/scan_parquet (#18922)
Drop Python 3.8 support (#18965)
Use FFI to extract Series from different Polars binaries (#18964)
Allow for zero-width fixed size lists (#18940)
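Bitwise aggregations (#18994) fold an integer column into a single value with AND, OR or XOR. The plain-Python sketch below only illustrates those semantics and is not the Polars implementation. The Polars method names are not reproduced here. Skipping nulls (None) and returning None for an all-null column are assumptions made for illustration.

```python
import operator
from functools import reduce
from typing import Callable, Optional, Sequence


def bitwise_agg(
    values: Sequence[Optional[int]], op: Callable[[int, int], int]
) -> Optional[int]:
    """Fold the non-null integers of a column with a bitwise operator."""
    present = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not present:
        return None  # assumption: an all-null column aggregates to null
    return reduce(op, present)


column = [0b1100, 0b1010, None, 0b1110]
print(bitwise_agg(column, operator.and_))  # 8  (0b1000)
print(bitwise_agg(column, operator.or_))   # 14 (0b1110)
print(bitwise_agg(column, operator.xor))   # 8  (0b1000)
```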

🐞 Bug fixes

Remove failing temporal lit tests (#19056)
Divide-by-zero in OOC sort (#19048)
Ensure must_flush flag is not reset (#19046)
Error node should be on top (#19045)
Force nested struct missing equality (#19031)
Fix invalid alias udf (#19021)
Raise invalid predicate join_where (#19020)
Fix nested flag of functions with multiple arguments (#19016)
Fix projection pushdown bug in IEJOINS (#19015)
Separate temporal tests (#19012)
Return the truth values of ne_missing and eq_missing operations for struct instead of null (#18930)
Fix list to numpy conversion (#19009)
Fix struct broadcasting comparisons (#19003)
Wrong result on when().then().otherwise() on struct when both result are broadcast (#19000)
Improve literals for temporal subclasses (#18998)
Ensure same fmt in Series/AnyValue to string cast (#18982)
Return correct value for when().then().otherwise() on structs when using first()/last() (#18969)
IPC don't write variadic_buffer_counts in blocks, but only dictionaries (#18980)
Respect allow_threading in TernaryExpr (#18977)
Make join test order-agnostic (#18975)
Fix lit().shrink_dtype() broadcasting (#18958)
Parallel evaluation of cumulative_eval (#18959)
Properly implement AnyValue::Binary into_py (#18960)
Fix Expr.over with order_by not taking effect if group keys were sorted (#18947)
Properly fetch type of full None List Series (#18916)
Incorrect mode for sorted input (#18945)
Properly choose inner physical type for Array (#18942)
Disable very old date in timezone test for CI (#18935)
Infer reshape dims when determining schema (#18923)
Incorrect broadcasting on list-of-string set ops (#18918)
Adding with_row_index() to previously collected lazy scan does not take effect (#18913)
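#18930 concerns eq_missing and ne_missing. Unlike eq and ne, these methods treat null as a comparable value instead of propagating it. The plain-Python sketch below shows those semantics for scalars, with None standing in for null. Modelling a struct as a tuple compared field by field is an illustrative assumption, not a description of the Polars code.

```python
from typing import Any, Optional


def eq(a: Any, b: Any) -> Optional[bool]:
    """Ordinary equality: a null operand propagates to a null result."""
    if a is None or b is None:
        return None
    return a == b


def eq_missing(a: Any, b: Any) -> bool:
    """Null-aware equality: null equals null, and null never equals a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def ne_missing(a: Any, b: Any) -> bool:
    return not eq_missing(a, b)


def struct_eq_missing(a: tuple, b: tuple) -> bool:
    # Hypothetical struct model: a tuple of fields compared field by field.
    return all(eq_missing(x, y) for x, y in zip(a, b))


print(eq(None, None), eq_missing(None, None))     # None True
print(eq_missing(1, None), ne_missing(1, None))   # False True
print(struct_eq_missing((1, None), (1, None)))    # True
```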

📖 Documentation

Fix example of lazy schema verification (#19059)
Rewrite 'Getting started' page (#19028)
Fix is_not_nan description (#18985)
Recommend targetDir for rust-analyzer (#18973)
Fix LazyFrame fetch method references (#18033)

📦 Build system

Bump Rust toolchain to nightly-2024-09-29 (#19006)
Bump simd-json to 0.14 (#18999)

🛠️ Other improvements

Remove built info (#19057)
Mark schema arg in read/scan_parquet as unstable (#19018)
Fix new-streaming test_lazy_parquet::test_row_index (#19019)
Preserve scalar in more places (#18898)
Mention allow_missing_columns in error message when column not found (parquet) (#18972)
Disable CSE-specific test on new streaming engine (#18971)
Add FixedSizeList equality broadcasting (#18967)
Divide ChunkCompare into Eq and Ineq variants (#18963)
Another set of new-stream test skip/fixes (#18952)
Fix/skip variety of new-streaming tests, cont (#18928)
Fix/skip variety of new-streaming tests (#18924)

Thank you to all our contributors for making this release possible!
@LukasFolwarczny, @Plutone11011, @aleexharris, @alexander-beedie, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @edwinvehmaanpera, @kgv, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @stinodego and @xhiroga

24 Sep 20:10

github-actions

py-1.8.2

f235240

Python Polars 1.8.2

🚀 Performance improvements

Improve rename performance for Lazy API (#18890)
Collapse cross-joins to faster joins (#18633)
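#18633 collapses cross joins into faster joins. The equality case is the simplest to show, and the plain-Python sketch below demonstrates why such a rewrite is valid. Filtering the cross product on left.key == right.key yields the same rows as a hash-based inner join, and the hash join never materializes all len(left) * len(right) pairs. The row layout and function names are illustrative assumptions, not the Polars optimizer.

```python
from collections import defaultdict
from itertools import product

left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (3, "y"), (1, "z")]


def cross_then_filter(l, r):
    # Builds every pair, then keeps those with equal keys: O(len(l) * len(r)) work.
    return [(lk, lv, rv) for (lk, lv), (rk, rv) in product(l, r) if lk == rk]


def hash_inner_join(l, r):
    # Builds a hash table on the right side, then probes it once per left row.
    table = defaultdict(list)
    for rk, rv in r:
        table[rk].append(rv)
    return [(lk, lv, rv) for lk, lv in l for rv in table.get(lk, [])]


print(cross_then_filter(left, right) == hash_inner_join(left, right))  # True
```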

✨ Enhancements

Improve scalar strict message (#18904)

🐞 Bug fixes

Properly zip struct validities (#18886)
Out-of-bounds gather in categorical->int cast (#18897)
AnyValue Series from Categorical/Enum (#18893)
Properly cast AnyValue string (#18888)
Fix SO in json inference (#18887)
Use proper thread pool in cumulative_eval (#18885)

📖 Documentation

Fix broken user-guide API links (#18872)

Thank you to all our contributors for making this release possible!
@coastalwhite, @npielawski, @orlp, @ritchie46 and @siddharth-vi

23 Sep 19:35

github-actions

py-1.8.1

3d296a6

Python Polars 1.8.1

🚀 Performance improvements

Cache register plugin function (#18860)

🐞 Bug fixes

Properly calculate duration units (#18869)
Check values in strict cast Int to Time (#18854)
Fix typo in DuplicateError error message (#18855)
Properly merge live- and dead columns in prefiltered (#18862)
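#18854 adds value checks to a strict integer-to-Time cast. Polars stores Time as nanoseconds since midnight, so only integers in [0, 86_400 * 10**9) describe a valid time of day. The plain-Python sketch below implements such a check, using datetime.time as the target type. Mapping invalid values to None in non-strict mode mirrors the usual null-on-failure behaviour of non-strict casts. This is an illustration and not the Polars code path.

```python
from datetime import time
from typing import Optional

NS_PER_DAY = 86_400 * 10**9


def int_to_time(ns: int, strict: bool = True) -> Optional[time]:
    """Convert nanoseconds since midnight to a time of day."""
    if not 0 <= ns < NS_PER_DAY:
        if strict:
            raise ValueError(f"{ns} is out of range for a time of day")
        return None  # non-strict: invalid values become null
    seconds, rem_ns = divmod(ns, 10**9)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    # datetime.time has microsecond precision, so sub-microsecond digits are dropped.
    return time(hours, minutes, secs, rem_ns // 1000)


print(int_to_time(3_661 * 10**9))  # 01:01:01
```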

📖 Documentation

Fix minor rogue apostrophes (#18865)

🛠️ Other improvements

Make with_column_unchecked take Column (#18863)
Keep scalar in more places (#18775)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @coastalwhite, @mcrumiller, @ritchie46 and @rodrigogiraoserrao

23 Sep 12:11

github-actions

py-1.8.0

3be3b86

Python Polars 1.8.0

✨ Enhancements

Support arithmetic between Series with dtype list (#17823)
Relaxed schema alignment for parquet file list read (#18803)
Always preserve sorted flag for .dt.date (#18692)
Enable additional ruff lint rule sets (#18721)
Implement single inequality joins for join_where (#18727)
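#18727 implements join_where for a single inequality predicate. One standard way to evaluate such a predicate without comparing every pair is to sort one side on its key and binary-search it for each row of the other side. The plain-Python sketch below does this for left.a < right.b. The release note does not say whether Polars uses exactly this strategy, so treat the sketch as an illustration of the technique only.

```python
from bisect import bisect_right


def inequality_join_lt(left, right):
    """Return all (i, j) index pairs with left[i] < right[j], via sort + binary search."""
    order = sorted(range(len(right)), key=lambda j: right[j])
    sorted_vals = [right[j] for j in order]
    pairs = []
    for i, a in enumerate(left):
        # Every right value strictly greater than `a` lies at or after this position.
        start = bisect_right(sorted_vals, a)
        pairs.extend((i, order[k]) for k in range(start, len(order)))
    return pairs


left = [3, 1, 5]
right = [2, 4, 6]
print(sorted(inequality_join_lt(left, right)))
# [(0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 2)]
```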

🐞 Bug fixes

DataFrame plot was raising when some extra keywords were passed to encodings (e.g. x=alt.X(a, axis=alt.Axis(labelAngle=30))) (#18836)
Respect strictness in list constructor (#18853)
Properly broadcast array arithmetic (#18851)
Throw error for comparison of unequal length series (#18816)
Raise when parquet file has extra columns and no select() was done (#18843)
Pass missing user params in write_csv (#18845)
Improve join argument checks (#18847)
Struct filter by index (#18778)
Proper dtype casting for struct embedded categoricals in chunked categoricals (#18815)
Fixed some error/assertion types (#18811)
Remove panic in arr.to_struct (#18804)
Allow empty sort by columns (#18774)
Broadcast zip_with for structs (#18770)
Dropped/shifted rows in parquet scan with streaming=True (#18766)
Fix cum_max using exception text of cum_min for invalid dtype (#18780)
Fix accidental raise on shape 1 (#18748)
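#18816 makes a comparison between Series of unequal length raise an error. The plain-Python sketch below models the usual broadcasting rule for elementwise operations:

- equal lengths compare pairwise;
- a length-1 side is broadcast;
- any other mismatch is an error.

The function name and the choice of ValueError are assumptions, and Polars raises its own error type.

```python
import operator
from typing import Callable, List, Sequence


def compare(lhs: Sequence, rhs: Sequence, op: Callable = operator.eq) -> List[bool]:
    """Elementwise comparison with length-1 broadcasting."""
    if len(lhs) == len(rhs):
        return [op(a, b) for a, b in zip(lhs, rhs)]
    if len(rhs) == 1:
        return [op(a, rhs[0]) for a in lhs]
    if len(lhs) == 1:
        return [op(lhs[0], b) for b in rhs]
    raise ValueError(f"cannot compare lengths {len(lhs)} and {len(rhs)}")


print(compare([1, 2, 3], [2]))                     # [False, True, False]
print(compare([1, 2, 3], [3, 2, 1], operator.lt))  # [True, False, False]
```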

📖 Documentation

Fix link to issue tracker and code snippet format in GPU docs (#18850)
Clarify documentation for schema in read_csv function (#18759)
Fix literal type mapping example in lit docstrings (#18756)
Refactor docs directory hierarchy (#18773)
Minor improvements to contributing guide (#18777)
Improve over docs, add example with order_by (#18796)
Add documentation for beta gpu support (#18762)

🛠️ Other improvements

Re-export PyO3 in polars-python crate (#18835)
Make NodeTraverser struct public (#18822)
Add panic to unchecked DataFrame constructors in debug mode (#18807)
Fix parquet file metadata is dropped after first DSL->IR conversion (#18789)
Remove extra hashmap construction in new-streaming parquet (#18792)
Remove TODO comment regarding NumPy pinning (#18776)
Remove unused methods (#18744)
Make DataFrame a Vec of Column instead of Series (#18664)
Run benchmark on PR labeled 'needs-bench' (#18737)
Enable additional ruff lint rule sets (#18721)

Thank you to all our contributors for making this release possible!
@3ok, @Manishearth, @MarcoGorelli, @adamreeve, @alexander-beedie, @barak1412, @beckernick, @bradfordlynch, @coastalwhite, @deanm0000, @eitsupi, @i64, @itamarst, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46, @rodrigogiraoserrao, @squnit, @stinodego and @t-ded

12 Sep 14:01

github-actions

rs-0.43.1

54218e7

Rust Polars 0.43.1

🐞 Bug fixes

Revert automatically turning on Parquet prefiltered (#18720)
Parquet prefiltered with projection pushdown (#18714)
Correctly display multilevel nested Arrays (#18687)
Fix scalar literals (#18707)
Missing activation for serde for PlSmallStr from some crates (#18702)
Add missing PhantomDatas to BackingStorage (#18699)
Fix use of undeclared crate or module error (#18701)
Refactor decompression checks and add support for decompressing JSON (#18536)
Qcut all nulls panics (#18667)
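#18536 refactors decompression checks and adds support for decompressing JSON input. A common way to decide whether input needs decompressing is to inspect its leading magic bytes. The plain-Python sketch below recognizes the gzip (1f 8b) and zstd (28 b5 2f fd) signatures and decompresses gzip with the standard library. It is an assumption-based illustration, not the Polars code. To stay within the standard library, it leaves zstd as a detected but unhandled case.

```python
import gzip
import json
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def detect_compression(data: bytes) -> Optional[str]:
    """Guess the codec from the first bytes of the payload."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ZSTD_MAGIC):
        return "zstd"
    return None


def maybe_decompress(data: bytes) -> bytes:
    codec = detect_compression(data)
    if codec == "gzip":
        return gzip.decompress(data)
    if codec == "zstd":
        raise NotImplementedError("zstd needs a third-party decoder in this sketch")
    return data  # not compressed: pass through unchanged


raw = json.dumps([{"a": 1}, {"a": 2}]).encode()
print(json.loads(maybe_decompress(gzip.compress(raw))))  # [{'a': 1}, {'a': 2}]
```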

🛠️ Other improvements

Remove IR info from DSL (#18712)
Refactor code into functions in new parquet source (#18711)
Remove unused feature flags from polars-mem-engine (#18679)
Remove hive_parts from DSL source (#18694)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @ankane, @attila-lin, @coastalwhite, @eitsupi, @nameexhaustion, @ohanf, @orlp, @ritchie46 and @yarimiz

12 Sep 15:49

github-actions

py-1.7.1

54218e7

Python Polars 1.7.1

🐞 Bug fixes

Revert automatically turning on Parquet prefiltered (#18720)
Parquet prefiltered with projection pushdown (#18714)
Fix scalar literals (#18707)

🛠️ Other improvements

Remove IR info from DSL (#18712)
Remove unused feature flags from polars-mem-engine (#18679)
Remove hive_parts from DSL source (#18694)

Thank you to all our contributors for making this release possible!
@ankane, @attila-lin, @coastalwhite, @eitsupi, @nameexhaustion, @orlp and @ritchie46

11 Sep 10:26

github-actions

rs-0.43.0

f25ca0c

Rust Polars 0.43.0

🏆 Highlights

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
Don't traverse file list twice for extension validation (#18620)
Remove cloning of ColumnChunkMetadata (#18615)
Add upfront partitioning in ColumnChunkMetadata (#18584)
Enable Parquet parallel=prefiltered for auto (#18514)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)
Parquet do not copy uncompressed pages (#18441)
Several large parquet optimizations (#18437)
Batch Plain Parquet UTF-8 verification (#18397)
Partition metadata for parquet statistic loading (#18343)
Fix accidental quadratic parquet metadata (#18327)
Lazy decompress Parquet pages (#18326)
Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
Slice pushdown for SimpleProjection (#18296)
Use direct path for time/timedelta literals (#18223)
Speedup ndjson reader ~40% (#18197)
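#18359 adds optimizer rules that answer expressions such as is_null().all() from null_count() instead of materializing a boolean mask. The equivalences behind such rules can be checked in plain Python, with None as null. The sketch below shows the reasoning only. Its rule set is an illustration and may not match the optimizer's exact list.

```python
from typing import Optional, Sequence


def null_count(col: Sequence[Optional[object]]) -> int:
    return sum(v is None for v in col)


# Each entry: (naive evaluation over a boolean mask, rewrite using only null_count and len).
RULES = {
    "is_null().all()": (lambda c: all(v is None for v in c), lambda c: null_count(c) == len(c)),
    "is_null().any()": (lambda c: any(v is None for v in c), lambda c: null_count(c) > 0),
    "is_not_null().all()": (lambda c: all(v is not None for v in c), lambda c: null_count(c) == 0),
    "is_not_null().any()": (lambda c: any(v is not None for v in c), lambda c: null_count(c) < len(c)),
}

for column in ([], [None], [1, None], [1, 2]):
    for name, (naive, rewritten) in RULES.items():
        assert naive(column) == rewritten(column), (name, column)
print("all rewrites agree")
```

The empty column is included on purpose: all() of nothing is True and any() of nothing is False, and the null_count forms must reproduce both.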

✨ Enhancements

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
Make expressions containing Python UDFs serializable (#18135)
Support Serde for IRPlan (#18433)
Respect input time zone if input is pandas Timestamp (#18346)
Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
IR serde (#18298)
Improve decimal_comma error message (#18269)
Support pre-signed URLs for cloud scan (#18274)
Support empty structs (#18249)
Allow float in interpolate_by by column (#18015)
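#18015 allows a float `by` column in interpolate_by. This method fills nulls by linear interpolation, using another column as the x axis instead of the row position. The plain-Python sketch below implements that idea. It assumes `by` is strictly increasing and does not cover how the real method handles unsorted or duplicate `by` values.

```python
from typing import List, Optional, Sequence


def interpolate_by(
    values: Sequence[Optional[float]], by: Sequence[float]
) -> List[Optional[float]]:
    """Fill nulls between two known values by linear interpolation along `by`.

    Assumes `by` is strictly increasing; leading and trailing nulls stay null.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        x0, x1 = by[left], by[right]
        y0, y1 = values[left], values[right]
        for i in range(left + 1, right):
            # Weight by where by[i] sits between the two known x positions.
            t = (by[i] - x0) / (x1 - x0)
            out[i] = y0 + t * (y1 - y0)
    return out


print(interpolate_by([1.0, None, None, 5.0], [0.0, 0.25, 0.5, 1.0]))  # [1.0, 2.0, 3.0, 5.0]
```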

🐞 Bug fixes

Scalar checks (#18627)
Scanning hive partitioned files where hive columns are partially included in the file (#18626)
Enable "polars-json/timezones" feature from "polars-io" (#18635)
Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
Properly slice validity mask on pl.Object series (#18631)
Indicative error in list.gather when wrong indices type is supplied (#18611)
Fix group first value after group-by slice (#18603)
Functions for streaming require streaming feature (#18602)
Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
Fix UnitVec inline clone and with_capacity (#18586)
Ensure result name of pow matches schema in grouped context (#18533)
Decimal mean agg dtype was incorrect in IR (#18577)
Fix output type for list.eval in certain cases (#18570)
Fix map_elements for List return dtypes (#18567)
Do not remove double-sort if maintain_order=True (#18561)
Empty any_horizontal should be false, not true (#18545)
Fix type inference error in map_elements for List types (#18542)
Added proper handling of file.write for large remote csv files (#18424)
Handle Parquet projection pushdown with only row index (#18520)
Properly raise on invalid selector expressions (#18511)
Wrong output column name in or and xor operations (#18512)
Various schema corrections (#18474)
Don't drop objects on empty buffers (#18469)
Add missing chunk align in pipe sink (#18457)
Expr.sign should preserve dtype (#18446)
Enable CSE in eager if struct are expanded (#18426)
Treat explode as gather (#18431)
Fencepost error in debug assertion in splitfields (#18423)
Unsoundness in CSV SplitFields (#18413)
Parquet nested values that span several pages (#18407)
Support reading empty parquet files (#18392)
Recurse on map field during type conversion (#15075)
Allow search_sorted on boolean series (#18387)
Mark Expr.(lower|upper)_bound as returning scalar (#18383)
Fix broken feature gate for ParquetReader (#18376)
Fix compressed ndjson row count (#18371)
Use correct column names when there are no value columns in unpivot (#18340)
Parquet several smaller issues (#18325)
Fix group-by slice on all keys (#18324)
Compute joint null mask before calling rolling corr/cov stats (#18246)
Several scan_parquet(parallel='prefiltered') problems (#18278)
Json feature flag missing imports (#18305)
Check groups in group-by filter (#18300)
Make json readers ignore BOM character (#18240)
Parquet delta encoding for 0-bitwidth miniblocks (#18289)
Arguments for upsample only have to be sorted within groups (#18264)
Use appropriate bins in hist when bin_count specified (#16942)
Raise suitable error on unsupported SQL set op syntax (#18205)
Fix invalid state due to cached IR (#18262)
Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
Fix panic streaming parquet scan from cloud with slice (#18202)
Consistently round half-way points down in dt.round (#18245)
Fix duplicate column output and panic for include_file_paths (#18255)
Fix unit null rank (#18252)
Use physical for row-encoding (#18251)
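#18245 makes dt.round consistently round half-way points down. The plain-Python sketch below covers integer timestamps and a fixed-size interval, such as seconds rounded to a whole minute. Calendar intervals like months are out of scope. Reading "down" as "toward the earlier instant" for negative timestamps as well is an assumption.

```python
def round_half_down(t: int, every: int) -> int:
    """Round integer timestamp `t` to a multiple of `every`; exact ties go to the lower multiple."""
    lower = (t // every) * every  # floor division, so negative t also rounds toward the past
    remainder = t - lower         # always in [0, every)
    return lower + every if 2 * remainder > every else lower


print(round_half_down(90, 60))  # 60: exactly half-way, so it rounds down
print(round_half_down(91, 60))  # 120
```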

📖 Documentation

Fix multiprocessing docs regarding fork method check (#18563)
Pre-compute plugin_path before defining plugin (#18503)
Fix BinViewChunkedBuilder arguments (#17277) (#18439)
Add date_range and datetime_ranges examples without eager=True (#18379)
Document POLARS_BACKTRACE_IN_ERR env var (#18354)
Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
Improve decimal_comma error message (#18269)
Clarify coalesce behaviour in join_asof (#18273)
Add note to Expr.shuffle differentiating from df method (#18266)

📦 Build system

Remove extension-module from polars-python (#18554)
Bump Rust toolchain to nightly-2024-08-26 (#18370)

🛠️ Other improvements

Push down max row group height calc to file metadata (#18674)
Re-use already decoded metadata for first path (new-parquet-source) (#18656)
Remove duplicate byte range calc from new parquet source (#18655)
Fix a bunch of tests for new-streaming (#18659)
Rename MemSlice::from_slice -> MemSlice::from_static (#18657)
Don't raise on multiple same names in ie_join (#18658)
Split parquet_source.rs in new-streaming (#18649)
Check predicates in join_where (#18648)
Feature gate iejoin (#18646)
Scan from BytesIO in new-streaming parquet source (#18643)
Rename MetaData -> Metadata (#18644)
Change join_where semantics (#18640)
Fix unimplemented panics to give todo!s for AUTO_NEW_STREAMING (#18628)
Remove extra schema traits (#18616)
One simplify expression module and keep utility local (#18621)
Check number of binary comparisons in join_where predicates (#18608)
Raise on suffixed predicate in join_where (#18607)
Fix Python docs build (#18605)
Fix nan-ignoring max/min in new-streaming (#18593)
Correctly support more types in new-streaming sum (#18580)
Bump NodeTraverser major version (#18576)
Fix mean reduction in new-streaming (#18572)
Rename data_type -> dtype (#18566)
Refactor ArrowSchema to use polars_schema::Schema<D> (#18564)
Remove NotifyReceiver from new-streaming parquet source (#18540)
Refactor Schema to use generic struct from new polars-schema crate (#18539)
Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
Fix and extend AnyValue comparison (#18534)
Remove top-level metadata from ArrowSchema (#18527)
Add FromIterator impls for PlSmallStr (#18509)
Update PlSmallStr comment (#18518)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Make expressions containing Python UDFs serializable (#18135)
Allow polars to pass cargo check on windows (#18498)
Remove From<&&str> for PlSmallStr (#18507)
Change naming to new benchmark setup (#18473)
More refactor for PlSmallStr (#18456)
Split Reduction into it plus ReductionState (#18460)
Remove a string allocation in Parquet (#18466)
Unify internal string type (#18425)
Remove network call in hf docs (#18454)
Remove old streaming flag if we're going into new streaming (#18438)
Address spurious hypothesis test failure (#18434)
Add pl.length() reduction and small new-streaming fixes (#18429)
Fencepost error in debug assertion in splitfields (#18423)
Group arguments in conversion in a Context (#18418)
Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
Recursively evaluate is_elementwise for function expressions (#18385)
Various small fixes for the new streaming engine (#18384)
Temporarily add ability to disable parquet source node (#18378)
Improve dot formatting of new-streaming parquet source (#18367)
Fix the required version of rust in README.md (#18357)
Only instantiate used portion of graph (#18337)
Fix new_streaming parameter (#18342)
Add parquet source node to new streaming engine (#18152)
Disable common sub-expr elim for new streaming engine (#18330)
Remove unused Parquet indexes (#18329)
Lower arbitrary expressions in the new streaming engine (#18315)
Expose many more function expressions to python IR (#18317)
Add Graphviz physical plan visualization for new streaming engine (#18307)
Add DataFrame::new_with_broadcast and simplify column uniqueness checks (#18285)
Add output_schema to all PhysNodes (#18272)
Change fn schema to fn collect_schema (#18261)
Add multiplexer node to new streaming engine (#18241)
Add feature gates for polars-python crate (#18232)
Split py-polars crate (#18204)
Update the required version of rust in README.md (#18203)
Add itertools in utils (#18213)
Use or_else for raising (#18206)
Remove unused Parquet source files (#18193)

Thank you to all our contributors for making this release possible...

Contributors

orlp, philss, and 34 other contributors

11 Sep 13:33

github-actions

py-1.7.0

d8acacf

Python Polars 1.7.0

🏆 Highlights

Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)

🚀 Performance improvements

Back arrow arrays with SharedStorage which can have non-refcounted static slices (#18666)
Don't traverse file list twice for extension validation (#18620)
Remove cloning of ColumnChunkMetadata (#18615)
Add upfront partitioning in ColumnChunkMetadata (#18584)
Enable Parquet parallel=prefiltered for auto (#18514)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Added optimizer rules for is_null().all() and similar expressions to use null_count() (#18359)

✨ Enhancements

Update BytecodeParser for upcoming Python 3.13 (#18677)
Add tooltip by default to charts (#18625)
Add support for IO[bytes] and bytes in scan_{...} functions (#18532)
Support shortcut eval of common boolean filters in SQL interface "WHERE" clause (#18571)
Add IEJoin algorithm for non-equi joins and support Full non-equi joins (#18365)
Make expressions containing Python UDFs serializable (#18135)

🐞 Bug fixes

Use IO[bytes] instead of BytesIO in DataFrame.write_parquet() (#18652)
Scalar checks (#18627)
Scanning hive partitioned files where hive columns are partially included in the file (#18626)
Enable "polars-json/timezones" feature from "polars-io" (#18635)
Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18637)
Properly slice validity mask on pl.Object series (#18631)
Raise if single argument form in replace/replace_strict is not a mapping (#18492)
Fix group first value after group-by slice (#18603)
Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun) in pl.lit (#18497)
Fix output type for list.eval in certain cases (#18570)
Fix map_elements for List return dtypes (#18567)
Check for duplicate column names in read_database cursor result, raising DuplicateError if found (#18548)
Do not remove double-sort if maintain_order=True (#18561)
Empty any_horizontal should be false, not true (#18545)
Fix type inference error in map_elements for List types (#18542)
Address incorrect align_frames result when the alignment column contains NULL values (#18521)
Fix advertised version in source builds (#18523)
Handle Parquet projection pushdown with only row index (#18520)
DataFrame write_database not passing down "engine_options" when using ADBC (#18451)
Properly raise on invalid selector expressions (#18511)
Wrong output column name in or and xor operations (#18512)
Normalize by default in Series.entropy like Expr.entropy does (#18493)
Various schema corrections (#18474)
Don't drop objects on empty buffers (#18469)
Expr.sign should preserve dtype (#18446)
Ensure assert_frame_not_equal and assert_series_not_equal raise on mismatched input types (#18402)
Fixed Worksheet definition in write_excel type annotations (#18452)
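#18493 makes Series.entropy normalize its input by default, as Expr.entropy already does. The computed quantity is -sum(p * log(p)). With normalization, the values are first divided by their sum so that they form probabilities. The plain-Python sketch below shows why the default matters. Treating 0 * log(0) as 0 is a convention of this sketch, not a statement about Polars.

```python
import math
from typing import Sequence


def entropy(values: Sequence[float], base: float = math.e, normalize: bool = True) -> float:
    """Shannon entropy -sum(p * log(p)), optionally normalizing values to sum to 1."""
    total = sum(values) if normalize else 1.0
    probs = [v / total for v in values]
    # Sketch convention: zero-probability terms contribute nothing.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


print(entropy([1, 1], base=2))                   # 1.0 bit after normalizing to [0.5, 0.5]
print(entropy([1, 1], base=2, normalize=False))  # zero: log(1) is 0 for each raw value
```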

📖 Documentation

Update join_where docs to clarify behaviour (#18670)
Fix multiprocessing docs regarding fork method check (#18563)
Various docstring improvements to testing.assert_* functions (#18494)
Fix formula in ewm_mean_by (#18506)
Pre-compute plugin_path before defining plugin (#18503)
Add Expr.null_count to aggregations (#18459)

🛠️ Other improvements

Fix a bunch of tests for new-streaming (#18659)
Don't raise on multiple same names in ie_join (#18658)
Check predicates in join_where (#18648)
Change join_where semantics (#18640)
Add benchmark tests for join_where with inequalities (#18614)
Check number of binary comparisons in join_where predicates (#18608)
Raise on suffixed predicate in join_where (#18607)
Fix Python docs build (#18605)
Use streaming argument in test_parquet_slice_pushdown_non_zero_offset (#18529)
Fix delta test merge (#18601)
Alter/skip some tests for new streaming (#18574)
Add lower-bound pin for numba (#18555)
Temporarily pin NumPy in CI to address dependency resolving issue (#18544)
Change PlSmallStr impl from Arc<str> to compact_str (#18508)
Make expressions containing Python UDFs serializable (#18135)
Change naming to new benchmark setup (#18473)
Ensure physical arguments to np ufuncs are rechunked (#18471)
Remove a string allocation in Parquet (#18466)
Remove network call in hf docs (#18454)

Thank you to all our contributors for making this release possible!
@0xbe7a, @MarcoGorelli, @WbaN314, @adamreeve, @alexander-beedie, @alonme, @barak1412, @coastalwhite, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @ion-elgreco, @krasnobaev, @megaserg, @nameexhaustion, @ohanf, @orlp, @philss, @r-brink, @ritchie46, @skellys, @squnit, @stinodego, @wence- and @yarimiz

28 Aug 18:57

github-actions

py-1.6.0

6ff1c70

Python Polars 1.6.0

💥 Unstable Breaking changes

These APIs were marked unstable and are allowed to change.

Use Altair in DataFrame.plot (#17995)

🚀 Performance improvements

Parquet do not copy uncompressed pages (#18441)
Several large parquet optimizations (#18437)
Batch Plain Parquet UTF-8 verification (#18397)
Partition metadata for parquet statistic loading (#18343)
Fix accidental quadratic parquet metadata (#18327)
Lazy decompress Parquet pages (#18326)
Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
Slice pushdown for SimpleProjection (#18296)
Use direct path for time/timedelta literals (#18223)
Speedup ndjson reader ~40% (#18197)
Skip parquet page when unneeded (#18192)
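#18299 batches DELTA_LENGTH_BYTE_ARRAY decoding. In this Parquet encoding, the lengths of all values are stored first (themselves DELTA_BINARY_PACKED), followed by the concatenated bytes of every value. The plain-Python sketch below assumes the lengths have already been decoded and shows the batch step. A prefix sum turns the lengths into offsets, and each value is one slice of the data buffer. It illustrates the format, not Polars' decoder.

```python
from itertools import accumulate
from typing import List, Sequence


def split_by_lengths(lengths: Sequence[int], data: bytes) -> List[bytes]:
    """Cut a concatenated byte buffer into values, given each value's length."""
    offsets = [0, *accumulate(lengths)]  # value i spans offsets[i]..offsets[i + 1]
    if offsets[-1] != len(data):
        raise ValueError("lengths do not cover the data buffer exactly")
    return [data[start:end] for start, end in zip(offsets, offsets[1:])]


print(split_by_lengths([5, 0, 6], b"helloPolars"))  # [b'hello', b'', b'Polars']
```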

✨ Enhancements

Use Altair in DataFrame.plot (#17995)
Allow mapping as syntactic sugar in str.replace_many (#18214)
Respect input time zone if input is pandas Timestamp (#18346)
Improve Schema and DataType interop with Python types (#18308)
Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
IR serde (#18298)
Improve decimal_comma error message (#18269)
Support pre-signed URLs for cloud scan (#18274)
Support the most recent version of "duckdb_engine" connections via read_database (#18277)
Support empty structs (#18249)
Allow float in interpolate_by by column (#18015)
Make show_versions more responsive (#18208)
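#18214 lets str.replace_many accept a mapping as syntactic sugar for separate pattern and replacement lists. The plain-Python sketch below normalizes both call forms and then replaces all patterns in one pass, so replaced text is never matched again. This sketch has three limits:

- it requires exactly one replacement per pattern;
- it resolves overlapping patterns with Python's regex alternation, where the first listed pattern wins;
- it does not claim that Polars behaves the same way in either respect.

```python
import re
from collections.abc import Mapping, Sequence
from typing import List, Optional, Tuple, Union


def normalize_args(
    patterns: Union[Sequence[str], Mapping[str, str]],
    replace_with: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Accept either (patterns, replace_with) or a single {pattern: replacement} mapping."""
    if isinstance(patterns, Mapping):
        if replace_with is not None:
            raise TypeError("pass either a mapping or replace_with, not both")
        return list(patterns.keys()), list(patterns.values())
    if replace_with is None or len(replace_with) != len(patterns):
        raise ValueError("replace_with must give one replacement per pattern")
    return list(patterns), list(replace_with)


def replace_many(text: str, patterns, replace_with=None) -> str:
    pats, reps = normalize_args(patterns, replace_with)
    if not pats:
        return text
    lookup = dict(zip(pats, reps))
    # One left-to-right pass: inserted replacements are never re-scanned.
    regex = re.compile("|".join(re.escape(p) for p in pats))
    return regex.sub(lambda m: lookup[m.group(0)], text)


print(replace_many("cat and dog", {"cat": "dog", "dog": "cat"}))  # dog and cat
```

Replacing the patterns one after another with str.replace would turn "cat and dog" into "cat and cat" or "dog and dog" here, which is why the sketch uses a single pass.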

🐞 Bug fixes

Enable CSE in eager if struct are expanded (#18426)
Treat explode as gather (#18431)
Parquet nested values that span several pages (#18407)
Support reading empty parquet files (#18392)
Recurse on map field during type conversion (#15075)
Allow search_sorted on boolean series (#18387)
Mark Expr.(lower|upper)_bound as returning scalar (#18383)
Fix compressed ndjson row count (#18371)
Use correct column names when there are no value columns in unpivot (#18340)
Parquet several smaller issues (#18325)
Fix group-by slice on all keys (#18324)
Compute joint null mask before calling rolling corr/cov stats (#18246)
Several scan_parquet(parallel='prefiltered') problems (#18278)
Json feature flag missing imports (#18305)
Check groups in group-by filter (#18300)
Parquet delta encoding for 0-bitwidth miniblocks (#18289)
Arguments for upsample only have to be sorted within groups (#18264)
Use appropriate bins in hist when bin_count specified (#16942)
Raise suitable error on unsupported SQL set op syntax (#18205)
Fix invalid state due to cached IR (#18262)
Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
Fix panic streaming parquet scan from cloud with slice (#18202)
Consistently round half-way points down in dt.round (#18245)
Fix duplicate column output and panic for include_file_paths (#18255)
Fix unit null rank (#18252)
Use physical for row-encoding (#18251)
Convert date and datetime in literal construction (#16018)
Fix gather str as lit (#18207)
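#18246 computes a joint null mask before the rolling corr/cov statistics. With a joint mask, a row contributes only when both inputs are non-null, so the two sides are always aligned. The plain-Python sketch below applies that rule to a Pearson correlation over each trailing window. The helper names, the trailing-window layout and returning None for undefined results are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence


def pearson_joint_mask(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]]
) -> Optional[float]:
    """Pearson correlation over positions where both x and y are non-null."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return None  # sketch convention: undefined correlation becomes null
    return cov / math.sqrt(vx * vy)


def rolling_corr(x, y, window: int) -> List[Optional[float]]:
    """Correlation over each trailing window (None until the window is full)."""
    return [
        pearson_joint_mask(x[i - window + 1 : i + 1], y[i - window + 1 : i + 1])
        if i + 1 >= window
        else None
        for i in range(len(x))
    ]


print(rolling_corr([1.0, 2.0, None, 4.0], [2.0, 4.0, 5.0, 8.0], 3))  # [None, None, 1.0, 1.0]
```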

📖 Documentation

Add date_range and datetime_ranges examples without eager=True (#18379)
Fix incorrect comments in group_by_dynamic (#18415)
Alphabetise methods in Python API reference (#18380)
Document POLARS_BACKTRACE_IN_ERR env var (#18354)
Add missing aggregation entries (#18334) (#18341)
Add missing Series methods to API reference (#18312)
Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
Fix typos and add see also links to struct name expressions (#18282)
Improve decimal_comma error message (#18269)
Clarify coalesce behaviour in join_asof (#18273)
Add note to Expr.shuffle differentiating from df method (#18266)
Improve formatting and consistency of various docstrings (#18237)
Add missing "Parameters" section to bin.size expr docstring (#18222)
Fix column name output in example of DataFrame.map_rows (#18227)

📦 Build system

Bump Rust toolchain to nightly-2024-08-26 (#18370)

🛠️ Other improvements

Address spurious hypothesis test failure (#18434)
Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
Fix the required version of rust in README.md (#18357)
Remove unused Parquet indexes (#18329)
Deprecate serialize json for LazyFrame (#18283)
Don't add sink node to cloud query (#18280)
Split py-polars crate (#18204)
Fix test for new deltalake release (#18211)
Update the required version of rust in README.md (#18203)
Fix version bifurcation for test_read_database_cx_credentials (#18220)
Use or_else for raising (#18206)
Remove unused Parquet source files (#18193)

Thank you to all our contributors for making this release possible!
@BartSchuurmans, @ChayimFriedman2, @MarcoGorelli, @StepfenShawn, @agossard, @alexander-beedie, @cgbur, @coastalwhite, @corwinjoy, @deanm0000, @henryharbeck, @ion-elgreco, @jqnatividad, @krasnobaev, @liufeimath, @markxwang, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego, @sunadase, @thomascamminady and @wence-

14 Aug 14:59

github-actions

rs-0.42.0

7686025

Rust Polars 0.42.0

💥 Breaking changes

Reject literal input in sort_by_exprs() (#17606)

🚀 Performance improvements

Skip parquet page when unneeded (#18192)
Improve binview extend/ifthenelse (#18164)
Start on better Parquet delta decoding (#18049)
Tune jemalloc to not create muzzy pages (#18148)
Reduce default async thread count (#18142)
Use single threaded algorithms if only 1 core given (#18101)
Use Arc<Vec<_>> instead of Arc<[_]> for paths and hive partitions (#18066)
SIMD View from FixedSizeBinary (#18059)
Use bitmask to filter Parquet predicate-pushdown items (#17993)
Zerocopy buffers for FixedSizeBinary to BinaryView cast (#18043)
Integer fast path Parquet dict encoding (#18030)
Speedup writing of Parquet primitive values (#18020)
Remove temporary allocations in Parquet (#18013)
Delay selection expansion (#18011)
Optimize strings slices (#17996)
Make .dt.weekday 20x faster (#17992)
Shrink MemSliceInner enum (#17991)
Push down slice with non-zero offset to Parquet (#17972)
Reduce copy in MemSlice (#17983)
Ensure metadata flags are maintained on vertical parallelization (#17804)
Ensure only nodes that are not changed are cached in collapse optimizer (#17791)
Use bitflags for OptState (#17788)
Remove async directory auto-detection (#17779)
Fix accidental quadratic horizontal concat (#17783)
Batch parquet integer decoding (#17734)
Use mmap-ed memory if possible in Parquet reader (#17725)
Use bitflags for function options (#17723)
Introduce MemReader to file buffer in Parquet reader (#17712)
Better GC and push_view for binviews (#17627)
Fix pathological perf issue in window-order-by (#17650)
Cache path resolving of scan functions (#17616)
Add ArrayChunks to optimize codegen of BatchDecoder (#17632)
Rechunk before we go into grouped gathers (#17623)
Cache schema resolve back to DSL (#17610)
Add fastpath for when rounding by single constant durations (#17580)
Improve parallelism in writing hive parquet (#17512)
Support datetime in predicate during hive partition pruning (#17545)
Batch nested embed parquet decoding (#17549)
Batch nested Parquet decoding (#17542)
Collect Parquet dictionary binary as view (#17475)
Keep more parallelism when CSE plan cache hits (#17463)
Batch parquet primitive decoding (#17462)
Respect allow_threading in some more operators (#17450)
Parallelize parquet metadata deserialization (#17399)
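Two entries above, "Skip parquet page when unneeded" (#18192) and "Push down slice with non-zero offset to Parquet" (#17972), both rely on knowing which pages a row slice touches. The plain-Python sketch below assumes the row count of every page is known up front, for example from a Parquet page index. It returns the pages that overlap the slice, together with the local row range to keep in each. Pages outside that range never need to be decompressed. This is an illustration, not the Polars reader.

```python
from typing import List, Sequence, Tuple


def pages_for_slice(
    page_rows: Sequence[int], offset: int, length: int
) -> List[Tuple[int, int, int]]:
    """Return (page_index, start_in_page, end_in_page) for each page overlapping the slice."""
    if length <= 0:
        return []
    wanted_start, wanted_end = offset, offset + length
    result = []
    page_start = 0
    for idx, rows in enumerate(page_rows):
        page_end = page_start + rows
        if page_end > wanted_start and page_start < wanted_end:
            # Translate the global row range into a range local to this page.
            result.append(
                (idx, max(wanted_start, page_start) - page_start,
                 min(wanted_end, page_end) - page_start)
            )
        if page_end >= wanted_end:
            break  # every later page starts past the slice, so it is never read
        page_start = page_end
    return result


print(pages_for_slice([100, 100, 100, 100], offset=150, length=100))
# [(1, 50, 100), (2, 0, 50)]
```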

✨ Enhancements

Create literals for datetime/date expressions (#18184)
Create literals in 'datetime' expression (#18182)
Add missing impl for Series (#18166)
Raise on invalid 'is_between' and improve error message quality (#18147)
Add boolean Parquet HybridRle encoding (#18022)
Add nested SQL join support (#18006)
Push down slice with non-zero offset to Parquet (#17972)
Add support for binary size method to Expr and Series "bin" namespace (#17924)
Add SQL interface support for PostgreSQL dollar-quoted string literals (#17940)
Allow for parsing parquet file where the time zone is stored as lowercase "utc" (#17925)
Expose binary_elementwise_into_string_amortized for plugin authors, recommend apply_into_string_amortized instead of apply_to_buffer (#17903)
Decompress in CSV / NDJSON scan (#17841)
Ensure unique names in HConcat (#17884)
Support authentication with HuggingFace login (#17881)
Support "BY NAME" qualifier for SQL "INTERSECT" and "EXCEPT" set ops (#17835)
Raise informative error instead of panicking when passing invalid directives to to_string for Date dtype (#17670)
Implement forward/backward fill for all types (#17861)
Implement is_in operation on decimal type (#17832)
Support hf:// in read_(csv|ipc|ndjson) functions (#17785)
Allow literals in sort (#17780)
Cloud support for NDJSON (#17717)
Support API token for scanning hf:// (#17682)
Raise error instead of panic in unsupported serde (#17679)
Include file path option for NDJSON (#17681)
Hugging Face path expansion (#17665)
Add DSL validation for cloud eligible check (#17287)
Raise informative error message if non-IntoExpr is passed by name in *Frame.group_by (#17654)
Change API for writing partitioned Parquet to reduce code duplication (#17586)
Cache schema resolve back to DSL (#17610)
Expose returns_scalar to map_elements (#17613)
Add option to include file path for Parquet, IPC, CSV scans (#17563)
Support describe on decimal (#15092)
Support datetime in predicate during hive partition pruning (#17545)
Raise more informative error message for directories containing files with mixed extensions (#17480)
Exclude empty files from directory/glob expansion (#17478)
Add "future" versioning (#17421)
Apply slice pushdown immediately to in-memory frames (#17459)
Support writing hive partitioned parquet (#17324)
Add right join support (#17441)
Support hive partitioning in scan_ipc (#17434)
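
Several entries above concern hive-partitioned datasets (#17324, #17434, #17545), where each partition column becomes a key=value directory segment. A rough Python sketch of that layout and of pruning partitions with a date predicate follows. The helpers hive_path, parse_hive_path and prune are hypothetical, not Polars API, and the sketch skips the percent-escaping that real writers apply to special characters.

```python
from datetime import date


def hive_path(root, partition_values):
    """Build a hive-style path: <root>/key1=value1/key2=value2."""
    # No escaping: real writers percent-encode characters such as '/' or '='.
    segments = [f"{key}={value}" for key, value in partition_values.items()]
    return "/".join([root, *segments])


def parse_hive_path(path):
    """Collect {key: raw string value} from the key=value segments of a path."""
    values = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only segments containing '=' are partition segments
            values[key] = value
    return values


def prune(paths, column, predicate, parse=str):
    """Keep paths whose partition value for `column` satisfies `predicate`.

    Files are skipped based on their path alone, before any data is read.
    """
    return [p for p in paths if predicate(parse(parse_hive_path(p)[column]))]


# Three daily partitions; str(date) renders as ISO 8601, e.g. 2024-07-01.
paths = [
    hive_path("data", {"day": date(2024, 7, d), "region": "eu"}) + "/0.parquet"
    for d in (1, 2, 3)
]
kept = prune(paths, "day", lambda day: day >= date(2024, 7, 2), parse=date.fromisoformat)
print(kept)
```

Parsing the raw segment back into a typed value (here with date.fromisoformat) is what allows a temporal predicate to be evaluated against partitions at all.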

🐞 Bug fixes

Fix struct shift and list builder (#18189)
Don't load Parquet nested metadata (#18183)
Throw bigidx error for Parquet row-count (#18154)
Fix unpivot on empty df (#18179)
Don't vertically parallelize cse contexts (#18177)
Properly handle empty Parquet row groups with no dictionary (#18161)
Struct outer nullability (#18156)
Fix pyarrow predicate pushdown regression (#18145)
Prevent unwanted supertype cast in 'search_sorted' (#18143)
Parquet with filter=None (#18139)
Don't raise when converting from pandas if index contains duplicate names when include_index=False (the default) (#18133)
Don't remove leading whitespace in read_csv (#18131)
Py-polars compilation with no features (#18129)
String transform to_titlecase was too narrowly defined (#18122)
Reading Parquet with Null dictionary page (#18112)
Incorrect lazy CSV select(len()) for compressed files (#18067)
Fix sink_ipc_cloud panicking with runtime error (#18091)
Properly write Parquet for sliced lists (#18073)
Panic reading multiple CSV files from cloud (#18056)
Fix CloudWriter to use buffer before making requests (#18027)
Fix typos and remove trailing whitespace (#18024)
Handle cfg(feature) for shrink_dtype (#18038)
Subtraction with overflow on negative slice offset in Parquet (#18036)
Add nested SQL join support (#18006)
Allow read_csv schema to take unparsable types (#17765)
Multi-output column expressions in frame sort method (#17947)
Fix Asof join by schema (#17988)
Fix glob resolution for Hugging Face (#17958)
Several parquet reader/writer regressions (#17941)
Incorrect filter on categorical columns from parquet files (#17950)
SQL COUNT(DISTINCT x) should not include NULL values (#17930)
Scanning '%' from cloud (#17890)
Respect glob=False for cloud reads (#17860)
Properly write nest-nulled values in Parquet (#17845)
Allow full-null Object series to be built (#17870)
Fix from_arrow for struct type (#17839)
Infer decimal scales on mixed scale input (#17840)
Raise on unsupported fill strategy dtype (#17837)
Properly write nested NullArray in Parquet (#17807)
Check input type on list.to_struct (#17834)
Fix right join schema (#17833)
Non-compliant Parquet list element name (#17803)
Correctly set should_broadcast flag in HStack CSE rewrite (#17784)
Fix projection pushdown of literals without names (#17778)
Don't expand HTTP paths (#17774)
Check function input len at expansion (#17763)
Don't panic in invalid agg_groups (#17762)
Raise empty struct (#17736)
Fix GC logic in write_ipc (#17752)
Panic in pl.concat_list and list.concat on empty inputs (#17742)
Fix out nullability for structs coming from arrow (#17738)
Percent encode for Hugging Face paths (#17718)
Use bytemuck in slice reinterpret for Parquet ArrayChunks (#17700)
Propagate struct outer nullability eagerly (#17697)
Use ETag for HTTP file cache invalidation (#17684)
Fix type inference failure caused by double transpose (#17663)
Interpret %y consistently with Chrono in to_date/to_datetime/strptime (#17661)
Fix explode invalid check (#17651)
Tighten up error checking on join keys (#17517)
Expand brackets in async glob expansion (#17630)
Fix row index disappearing after projection pushdown in NDJSON (#17631)
Fix struct -> enum is_in (#17622)
Don't needlessly unwrap in pivot_schema (#17611)
Reject literal input in sort_by_exprs() (#17606)
Bitmap collect into safety (#17588)
Method dt.truncate was sometimes returning incorrect results for pre-1970 datetimes (#17582)
Defer path expansion until collect in file scan methods (#17532)
Correct logic for descending sort of BooleanChunked (#17558)
Don't unwrap send attempt to oneshot channel (#17566)
Fix scanning from HTTP cloud paths (#17571)
Properly implement struct (#17522)
Add missing commas in python IR interchange (#17518)
Fix predicate pushdown for .list.(get|gather) (#17511)
Turn panic into error when serializing Object types (#17353)
Fix struct expansion and raise on exclude (#17489)
Fix decimal dyn float supertype (#17464)
Don't rechunk on phys_repr (#17461)
Harden alchemy session for old sqlalchemy versions (#17366)
Fix swapping rename schema (#17458)
Raise on oob decimal precision (#17445)
Don't allow json inference method to be chunked/streaming (#17396)
Avoid panic when projecting a solitary count into an empty frame (#17393)
Set literal nesting to 0 (#17392)
Fix scanning cloud paths with spaces (#17379)
Fix slice length no longer allowing None (#17372)
Cull row index in scan if projection pushdown removes it (#17363)
Fix typo in SchemaError exception message (#17350)
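
The COUNT(DISTINCT x) fix (#17930) follows standard SQL semantics: COUNT over a column ignores NULLs, while SELECT DISTINCT still returns NULL as its own row. This sketch uses Python's built-in sqlite3 module as a reference engine (not Polars) to show the expected behavior; the helper name count_distinct_vs_count is hypothetical.

```python
import sqlite3


def count_distinct_vs_count(values):
    """Return (COUNT(DISTINCT x), number of rows produced by SELECT DISTINCT x)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
        # COUNT(DISTINCT x) skips NULLs entirely.
        (n_distinct,) = conn.execute("SELECT COUNT(DISTINCT x) FROM t").fetchone()
        # SELECT DISTINCT keeps a single NULL row, so it yields one more row here.
        (n_rows,) = conn.execute(
            "SELECT COUNT(*) FROM (SELECT DISTINCT x FROM t)"
        ).fetchone()
        return n_distinct, n_rows
    finally:
        conn.close()


print(count_distinct_vs_count([1, 1, 2, None, None]))  # (2, 3)
```

Here COUNT(DISTINCT x) returns 2, while SELECT DISTINCT yields three rows (1, 2 and NULL).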