Commit
Pass schema as schema into CsvReader (pola-rs#15254)
When the schema is passed in as `dtypes`, schema inference is not skipped. As a side effect, only the first `n` columns of the passed-in schema are eventually used, where `n` is the number of columns found during inference (see the `infer_file_schema_inner` method in the `polars-io/src/csv/utils.rs` file).

With this change, `scan_csv` behaves the same as `read_csv` when used with a schema that has more columns than the file header:

```python
import tempfile

import polars as pl

with tempfile.NamedTemporaryFile() as f:
    # Header has 3 columns; the second data row has 5 fields.
    f.write(b"""
A,B,C
1,2,3
4,5,6,7,8
9,10,11
""".strip())
    f.seek(0)

    # Eager read with a 5-column schema.
    df = pl.read_csv(f.name, schema=dict.fromkeys("ABCDE", pl.String), truncate_ragged_lines=True)
    print(df)

    # Lazy scan with the same schema; should now match the eager result.
    lf = pl.scan_csv(f.name, schema=dict.fromkeys("ABCDE", pl.String), truncate_ragged_lines=True).collect()
    print(lf)
...
>>> check()
shape: (3, 5)
┌─────┬─────┬─────┬──────┬──────┐
│ A   ┆ B   ┆ C   ┆ D    ┆ E    │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ str ┆ str ┆ str ┆ str  ┆ str  │
╞═════╪═════╪═════╪══════╪══════╡
│ 1   ┆ 2   ┆ 3   ┆ null ┆ null │
│ 4   ┆ 5   ┆ 6   ┆ 7    ┆ 8    │
│ 9   ┆ 10  ┆ 11  ┆ null ┆ null │
└─────┴─────┴─────┴──────┴──────┘
shape: (3, 5)
┌─────┬─────┬─────┬──────┬──────┐
│ A   ┆ B   ┆ C   ┆ D    ┆ E    │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ str ┆ str ┆ str ┆ str  ┆ str  │
╞═════╪═════╪═════╪══════╪══════╡
│ 1   ┆ 2   ┆ 3   ┆ null ┆ null │
│ 4   ┆ 5   ┆ 6   ┆ 7    ┆ 8    │
│ 9   ┆ 10  ┆ 11  ┆ null ┆ null │
└─────┴─────┴─────┴──────┴──────┘
```
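The difference between the two code paths can be pictured with a small pure-Python toy model. This is only a sketch of the behaviour described above, not the Polars implementation: the helper names `fit_row`, `read_as_dtypes` and `read_as_schema` are hypothetical, and it assumes ragged lines are truncated and short lines are padded with `None`.

```python
import csv
import io


def fit_row(row, width):
    """Pad a row with None or truncate it so it has exactly `width` fields."""
    return (row + [None] * width)[:width]


def read_as_dtypes(text, schema):
    """Toy model of the old path: the header decides the column count n,
    so only the first n names of the passed-in schema are used."""
    rows = list(csv.reader(io.StringIO(text)))
    n = len(rows[0])  # column count inferred from the header
    names = list(schema)[:n]
    return names, [fit_row(r, n) for r in rows[1:]]


def read_as_schema(text, schema):
    """Toy model of the new path: the schema is taken as-is,
    so every schema column appears in the result."""
    rows = list(csv.reader(io.StringIO(text)))
    names = list(schema)
    return names, [fit_row(r, len(names)) for r in rows[1:]]


TEXT = "A,B,C\n1,2,3\n4,5,6,7,8\n9,10,11"
SCHEMA = dict.fromkeys("ABCDE", str)

old_names, old_rows = read_as_dtypes(TEXT, SCHEMA)
new_names, new_rows = read_as_schema(TEXT, SCHEMA)
print(old_names, old_rows)
print(new_names, new_rows)
```

In this model the old path drops columns `D` and `E` entirely, while the new path keeps all five columns and fills the missing fields with `None`, which mirrors the `null` cells in the output above.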