fix: Reading CSV with low_memory gave no data #16231
Codecov Report: all modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #16231 +/- ##
==========================================
- Coverage 80.92% 80.83% -0.10%
==========================================
Files 1393 1394 +1
Lines 179568 179948 +380
Branches 2909 2909
==========================================
+ Hits 145321 145454 +133
- Misses 33742 33989 +247
Partials 505 505
batches += reader.next_batches(5)  # type: ignore[operator]
I have adjusted this because the mmap-based batched reader does not strictly respect the batch size:
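import os
import polars as pl

os.makedirs(".env", exist_ok=True)
pl.select(a=pl.repeat(1, 100)).write_csv(".env/data.csv")  # 100 rows
next_batches = pl.read_csv_batched(".env/data.csv", batch_size=5).next_batches
lst = []
while b := next_batches(5):  # returns None once the reader is exhausted
    lst.extend(b)
print([x.height for x in lst])
# [11, 11, 11, 11, 11, 11, 11, 11, 11, 1]  (observed: batches of 11 rows, not 5)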
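One way to keep a test robust to this is to drain the reader until it is exhausted and check the total row count, rather than assuming every call yields batches of exactly `batch_size` rows. Below is a minimal pure-Python sketch of that pattern, not the actual test in this PR. `drain_batches` and `make_fake_reader` are hypothetical helpers, and the fake reader only mimics the uneven batch heights reported above. With polars, `pl.read_csv_batched(...).next_batches` could be passed in as `next_batches`.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def drain_batches(
    next_batches: Callable[[int], Optional[Sequence[T]]], n: int = 5
) -> List[T]:
    """Collect every batch from a batched reader (hypothetical helper).

    Calls next_batches(n) until it returns None or an empty sequence, so the
    result does not depend on how many rows each individual batch holds.
    """
    batches: List[T] = []
    while chunk := next_batches(n):
        batches.extend(chunk)
    return batches


def make_fake_reader(heights: List[int]) -> Callable[[int], Optional[List[int]]]:
    """Hypothetical stand-in for a batched CSV reader.

    Each "batch" is represented only by its row count; the heights need not
    match any requested batch size, mimicking the mmap reader's behaviour.
    """
    remaining = list(heights)

    def next_batches(n: int) -> Optional[List[int]]:
        if not remaining:
            return None  # reader exhausted
        out = remaining[:n]
        del remaining[:n]  # consume the batches we just handed out
        return out

    return next_batches


# Assert on the total number of rows, not on the size of each batch.
heights = drain_batches(make_fake_reader([11] * 9 + [1]))
assert sum(heights) == 100
```

With real polars batches, the equivalent check would be `sum(df.height for df in batches) == 100`.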
Fixes #16010
I've left the code in place but removed the calls to it for now, since I plan to do a proper refactor later.