Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix read_csv to respect the order specified by the columns argument #13240

Closed
wants to merge 2 commits into from

Conversation

romanovacca
Copy link
Contributor

@romanovacca romanovacca commented Dec 24, 2023

fix #13066

in fn parse_csv(), we use get_projection(), which uses the projection that contains the correct columns and order. That function mentions:
we also need to sort the projection to have predictable output, which happens via sort_unstable(), which leads to a different order of the projection/columns.

In my fix, we now take the original projection and use that to fetch the matching column names, and when all the chunks of data are processed, I reorder the dataframe (if column_names was specified), into the order that was initially passed.

I would expect this reordering to be present somewhere already, but I couldn't find it

@romanovacca romanovacca changed the title #13066 order column fix(rust): fix specified column order for read_csv Dec 24, 2023
@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Dec 24, 2023
@romanovacca romanovacca marked this pull request as draft December 24, 2023 15:36
@stinodego stinodego changed the title fix(rust): fix specified column order for read_csv fix: Fix read_csv to respect the order specified by the columns argument Feb 14, 2024
@github-actions github-actions bot added the python Related to Python Polars label Feb 14, 2024
@stinodego
Copy link
Member

stinodego commented Feb 14, 2024

I did a rebase. However, I don't think this is the right fix. The offending code seems to be here:

if let Some(cols) = columns {
let mut prj = Vec::with_capacity(cols.len());
for col in cols {
let i = schema.try_index_of(&col)?;
prj.push(i);
}
// update null values with projection
if let Some(nv) = null_values.as_mut() {
nv.apply_projection(&prj);
}
projection = Some(prj);
}

After this, the columns information is lost. So we should make sure the projection is correct at that point, rather than try to fix it later.

Since this PR is pretty old and has been on draft for a while, and it doesn't seem (close to) ready, I'll close it for now. Feel free to open a fresh PR and continue the work!

@stinodego stinodego closed this Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv does not return columns in the order specified by columns parameter
2 participants