Pass schema as schema into CsvReader (pola-rs#15254)
When the schema is passed in as `dtypes`, schema inference is not skipped.
As a side effect, only the first `n` columns of the passed-in schema are
eventually used (see the `infer_file_schema_inner` method in the
`polars-io/src/csv/utils.rs` file).
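
The difference between the two code paths can be illustrated with a toy model. This is a simplified sketch and not the Polars implementation: the function names `resolve_with_dtypes` and `resolve_with_schema` are hypothetical, dtypes are represented as plain strings, and it assumes that overrides are matched against the columns found in the file header.

```python
def resolve_with_dtypes(header, overrides):
    """Toy model (assumption): inference decides which columns exist,
    and the overrides only change the dtype of those columns."""
    return {name: overrides.get(name, "inferred") for name in header}


def resolve_with_schema(header, schema):
    """Toy model (assumption): the supplied schema is used as-is,
    so the header does not limit the set of columns."""
    return dict(schema)


header = ["A", "B", "C"]                   # columns present in the file header
schema = dict.fromkeys("ABCDE", "String")  # user-supplied schema, two extra columns

print(resolve_with_dtypes(header, schema))  # only A, B, C survive
print(resolve_with_schema(header, schema))  # A through E are all kept
```

In this model the extra columns `D` and `E` are dropped on the `dtypes` path because they never appear in the inferred column set, which matches the symptom described above.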

With this change, `scan_csv` behaves the same as `read_csv` when given
a schema that has more columns than the file header:

```python
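import tempfile

import polars as pl


def check():
    with tempfile.NamedTemporaryFile() as f:
        # The second data row is ragged: five fields under a three-column header.
        f.write(b"""
A,B,C
1,2,3
4,5,6,7,8
9,10,11
""".strip())
        f.seek(0)  # flushes the buffered write so the file can be read by name
        schema = dict.fromkeys("ABCDE", pl.String)
        # Eager read: all five schema columns are returned.
        df = pl.read_csv(f.name, schema=schema, truncate_ragged_lines=True)
        print(df)
        # Lazy scan: with this change it returns the same five columns.
        lf = pl.scan_csv(f.name, schema=schema, truncate_ragged_lines=True).collect()
        print(lf)

>>> check()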
shape: (3, 5)
┌─────┬─────┬─────┬──────┬──────┐
│ A   ┆ B   ┆ C   ┆ D    ┆ E    │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ str ┆ str ┆ str ┆ str  ┆ str  │
╞═════╪═════╪═════╪══════╪══════╡
│ 1   ┆ 2   ┆ 3   ┆ null ┆ null │
│ 4   ┆ 5   ┆ 6   ┆ 7    ┆ 8    │
│ 9   ┆ 10  ┆ 11  ┆ null ┆ null │
└─────┴─────┴─────┴──────┴──────┘
shape: (3, 5)
┌─────┬─────┬─────┬──────┬──────┐
│ A   ┆ B   ┆ C   ┆ D    ┆ E    │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ str ┆ str ┆ str ┆ str  ┆ str  │
╞═════╪═════╪═════╪══════╪══════╡
│ 1   ┆ 2   ┆ 3   ┆ null ┆ null │
│ 4   ┆ 5   ┆ 6   ┆ 7    ┆ 8    │
│ 9   ┆ 10  ┆ 11  ┆ null ┆ null │
└─────┴─────┴─────┴──────┴──────┘
```
filabrazilska committed Mar 26, 2024
1 parent 9f1b08f commit 6d8727b
Showing 1 changed file with 1 addition and 1 deletion.
crates/polars-lazy/src/physical_plan/executors/scan/csv.rs
@@ -26,7 +26,7 @@ impl CsvExec {
         CsvReader::from_path(&self.path)
             .unwrap()
             .has_header(self.options.has_header)
-            .with_dtypes(Some(self.schema.clone()))
+            .with_schema(Some(self.schema.clone()))
             .with_separator(self.options.separator)
             .with_ignore_errors(self.options.ignore_errors)
             .with_skip_rows(self.options.skip_rows)
