Could not create or read partition table #862

Closed
smallzhongfeng opened this issue Aug 8, 2023 · 9 comments · Fixed by apache/datafusion#9126
Labels
bug Something isn't working

Comments

smallzhongfeng commented Aug 8, 2023

Describe the bug
A partitioned table cannot be read correctly after it is created.

To Reproduce

```shell
mkdir -p tmp/year=2022 tmp/year=2021
echo "1,2" > tmp/year=2022/data.csv
echo "3,4" > tmp/year=2021/data.csv
```

Then run the following in ballista-cli:
```
❯ CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV PARTITIONED BY (year) LOCATION 'tmp';
ArrowError(SchemaError("Unable to get field named \"year\". Valid fields: [\"a\", \"b\"]"))
```

I deployed it in standalone mode.
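The error text above has the same shape as the message Arrow's `Schema::field_with_name` returns for a missing field, which suggests the `PARTITIONED BY` name is being looked up among the declared columns `a` and `b`. The sketch below models that lookup in plain `std` Rust under that assumption; `resolve_partition_cols` is a hypothetical helper, not DataFusion or Ballista code. Under the same assumption, it also shows that listing `year` among the declared columns would let the lookup pass. Whether Ballista then reads the table correctly is a separate question.

```rust
/// Hypothetical model of resolving `PARTITIONED BY` names against the
/// declared column list. Not DataFusion's code: it only mirrors the shape
/// of Arrow's "Unable to get field named ..." error message.
fn resolve_partition_cols<'a>(
    declared: &[&'a str],
    partition_cols: &[&str],
) -> Result<Vec<&'a str>, String> {
    partition_cols
        .iter()
        .map(|name| {
            declared
                .iter()
                .copied()
                .find(|col| col == name)
                .ok_or_else(|| {
                    // `{:?}` quotes the names, e.g. "year" and ["a", "b"].
                    format!(
                        "Unable to get field named {:?}. Valid fields: {:?}",
                        name, declared
                    )
                })
        })
        .collect()
}

fn main() {
    // Column list and PARTITIONED BY clause from the failing statement.
    match resolve_partition_cols(&["a", "b"], &["year"]) {
        Ok(cols) => println!("partition columns: {:?}", cols),
        Err(e) => println!("error: {}", e),
    }
    // Assumption: declaring `year` next to `a` and `b` satisfies the lookup.
    println!("{:?}", resolve_partition_cols(&["a", "b", "year"], &["year"]));
}
```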

Expected behavior
The `CREATE EXTERNAL TABLE` statement should succeed, and the table should be readable with `year` available as a partition column.


smallzhongfeng added the `bug` label on Aug 8, 2023
smallzhongfeng reopened this on Aug 8, 2023
smallzhongfeng (Author) commented:

I deployed the latest released version, and the client is also on the latest version, 0.11.0.

smallzhongfeng (Author) commented Aug 8, 2023

@thinkharderdev @yahoNanJing @Dandandan Have you run into similar problems? Could you give me some advice?

smallzhongfeng (Author) commented:

A similar issue: #747

smallzhongfeng (Author) commented:

```rust
use ballista::prelude::{BallistaConfig, BallistaContext, Result};
use datafusion::arrow::datatypes::DataType;
use datafusion::datasource::file_format::parquet::DEFAULT_PARQUET_EXTENSION;
use datafusion::prelude::ParquetReadOptions;

#[tokio::main]
async fn main() -> Result<()> {
    // Use a single shuffle partition to keep the example small.
    let config = BallistaConfig::builder()
        .set("ballista.shuffle.partitions", "1")
        .build()?;

    // Start an in-process scheduler and executor with 2 concurrent tasks.
    let ctx = BallistaContext::standalone(&config, 2).await?;

    // Declare `date` as a Hive-style partition column (e.g. tmp/date=.../*.parquet).
    let options = ParquetReadOptions {
        file_extension: DEFAULT_PARQUET_EXTENSION,
        table_partition_cols: vec![("date".to_string(), DataType::Utf8)],
        parquet_pruning: Some(false),
        skip_metadata: Some(true),
    };
    let path = "tmp";

    let df = ctx.read_parquet(path, options).await?;
    // The schema should list `date` after the columns stored in the files.
    println!("{}", df.schema());

    // Project a data column and the partition column, then print the rows.
    let selected = df.select_columns(&["String", "date"])?;
    selected.show().await?;
    Ok(())
}
```

This case also fails. Is creating a partitioned table currently unsupported?

smallzhongfeng changed the title from "Support for reading partition table" to "Could not create or read partition table" on Aug 9, 2023

yahoNanJing (Contributor) commented:

Hi @smallzhongfeng, I'll take a look at this issue this week.

smallzhongfeng (Author) commented:

Thanks for your reply, @yahoNanJing. My current guess is that the partition column is treated as an ordinary column, so matching the schema fails.
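The guess above can be made concrete. In Hive-style partitioning, a partition column's value comes from the directory name (`year=2022`), so it should not be expected inside each file; the reproduction's `data.csv` holds only `1,2`. The sketch below uses hypothetical `split_table_schema` and `field_count` helpers in plain `std` Rust (not Ballista or DataFusion code) to separate a declared schema into file columns and partition columns, assuming `year` is declared alongside `a` and `b`.

```rust
/// Hypothetical helper: split a declared table schema into the columns
/// expected inside each file and the columns derived from directory names.
fn split_table_schema<'a>(
    declared: &[&'a str],
    partition_cols: &[&str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    declared
        .iter()
        .copied()
        // `true` goes to the first Vec (file columns), `false` to the second.
        .partition(|col| !partition_cols.iter().any(|p| p == col))
}

/// Number of comma-separated fields in one CSV record (no quoting support).
fn field_count(record: &str) -> usize {
    record.split(',').count()
}

fn main() {
    let (file_cols, part_cols) = split_table_schema(&["a", "b", "year"], &["year"]);
    // Contents of tmp/year=2022/data.csv from the reproduction steps.
    let record = "1,2";
    // Only the file columns need to line up with the record; treating `year`
    // as an ordinary column would instead expect 3 fields per record.
    println!(
        "fields in record: {}, file columns: {:?}, partition columns: {:?}",
        field_count(record),
        file_cols,
        part_cols
    );
}
```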

smallzhongfeng (Author) commented:

Any update?

andreclaudino commented:

It looks like the partitions are ignored and the files inside them are not loaded. Is there any update on how to deal with that?
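For reference, Hive-style partition discovery derives column values from `key=value` directory names between the table root and each file. The sketch below, a hypothetical `partition_values` helper in plain `std` Rust (not the listing code Ballista or DataFusion actually use), shows what the files from the reproduction steps should map to. A real implementation would typically also need to handle URL-encoded values, casting to the declared column type, and files that do not follow the layout.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: extract Hive-style `key=value` pairs from the
/// directory components of a path relative to the table root.
fn partition_values(relative_path: &str) -> BTreeMap<String, String> {
    let mut values = BTreeMap::new();
    let mut components: Vec<&str> = relative_path.split('/').collect();
    // The last component is the file name, not a partition directory.
    components.pop();
    for component in components {
        // Directories without `=` are skipped rather than treated as errors.
        if let Some((key, value)) = component.split_once('=') {
            values.insert(key.to_string(), value.to_string());
        }
    }
    values
}

fn main() {
    // Files from the reproduction steps, relative to the table root `tmp`.
    for path in ["year=2022/data.csv", "year=2021/data.csv"] {
        println!("{path} -> {:?}", partition_values(path));
    }
}
```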

bcmcmill commented:

Any update?

4 participants