Problem with empty dataframe being returned when querying with only hive partition column. #11796

Closed
2 tasks done
allinux opened this issue Oct 17, 2023 · 1 comment · Fixed by #11803
Labels
bug Something isn't working rust Related to Rust Polars

Comments

allinux commented Oct 17, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

// Returns an empty DataFrame.
LazyFrame::scan_parquet("s3://.../DATE_PARTITION=20221010/*", args)?
    .group_by([col("DATE_PARTITION")])
    .agg([count()])
    .collect();

Log output

No response

Issue description

#11690

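When the query selects only hive partition columns, the `columns` variable passed to `DataFrame::new_no_checks` contains no columns read from the file, so `df` may be empty. `materialize_hive_partitions` then takes the number of rows from `df.height()`, which may be 0, and the partition columns are created with zero rows.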

fn rg_to_dfs_optionally_par_over_columns(
...
    // `columns` is empty when only hive partition columns are selected.
    let mut df = DataFrame::new_no_checks(columns);
    if let Some(rc) = &row_count {
        df.with_row_count_mut(&rc.name, Some(*previous_row_count + rc.offset));
    }
    materialize_hive_partitions(&mut df, hive_partition_columns);
...
}

fn materialize_hive_partitions(df: &mut DataFrame, hive_partition_columns: Option<&[Series]>) {
    if let Some(hive_columns) = hive_partition_columns {
        let num_rows = df.height(); // 0 when `df` has no columns

        for s in hive_columns {
            // The partition value is repeated `num_rows` times, i.e. zero times here.
            unsafe { df.with_column_unchecked(s.new_from_index(0, num_rows)) };
        }
    }
}
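
To make the failure mode concrete, here is a minimal toy model in plain Rust. It is only a sketch: `ToyFrame`, `Column`, `materialize_from_height`, and `materialize_with_row_count` are hypothetical names, not Polars types or functions, and the explicit `num_rows` argument is an assumption about where a correct row count could come from (for example, the row group's metadata). It does not claim to be what the actual fix does.

```rust
// Toy model only: none of these types or functions exist in Polars.

/// A named column holding string values.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    values: Vec<String>,
}

/// A minimal frame whose height is derived from its first column.
#[derive(Debug, Default)]
struct ToyFrame {
    columns: Vec<Column>,
}

impl ToyFrame {
    /// Returns 0 when the frame has no columns at all.
    fn height(&self) -> usize {
        self.columns.first().map_or(0, |c| c.values.len())
    }
}

/// Mirrors the logic quoted in the issue: the broadcast length is taken
/// from the frame itself, so an empty projection yields zero-length columns.
fn materialize_from_height(df: &mut ToyFrame, hive: &[(&str, &str)]) {
    let num_rows = df.height(); // 0 when no file column was projected
    for (name, value) in hive {
        df.columns.push(Column {
            name: name.to_string(),
            values: vec![value.to_string(); num_rows],
        });
    }
}

/// Hypothetical alternative: the caller passes the row count explicitly
/// (assumed here to be known independently of the projected columns).
fn materialize_with_row_count(df: &mut ToyFrame, hive: &[(&str, &str)], num_rows: usize) {
    for (name, value) in hive {
        df.columns.push(Column {
            name: name.to_string(),
            values: vec![value.to_string(); num_rows],
        });
    }
}
```

Calling `materialize_from_height` on a `ToyFrame::default()` leaves the height at 0, which is the behaviour reported here, while `materialize_with_row_count` with `num_rows = 3` produces a frame of height 3.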

Expected behavior

.

Installed versions

rustc 1.74.0-beta.2 (9326de8fa 2023-10-13)

[dependencies]
polars = { git = "https://github.com/pola-rs/polars.git", rev = "f40eea6952846f3570382c5b1145e13dd3310949", features = ["async", "lazy", "async", "aws", "parquet"] }

allinux changed the title from "Problem with empty dataframe being returned when querying with hive partition column." to "Problem with empty dataframe being returned when querying with only hive partition column." on Oct 17, 2023
nameexhaustion (Collaborator) commented

Thanks for the report - will take a look.
