
Spark: Add read/write support for UUIDs from bytes #10635

Open
raphaelauv opened this issue Jul 5, 2024 · 3 comments
Labels
bug Something isn't working

Comments


raphaelauv commented Jul 5, 2024

Apache Iceberg version

1.5.2 (latest release)

Query engine

Spark

Please describe the bug 🐞

I can insert a string column into an Iceberg UUID column thanks to #7399

df = df.withColumn("id", lit(str(uuid.uuid4())))

but I can't insert a bytes column into an Iceberg UUID column

df = df.withColumn("id", lit(uuid.uuid4().bytes))

thanks all
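Until native byte support lands, one driver-side option (a sketch, not from the thread; the `raw` value is a hypothetical stand-in for `uuid.uuid4().bytes`) is to reconstruct the UUID from its 16 bytes with the standard `uuid` module and pass the canonical string form that #7399 already accepts:

```python
import uuid

# Hypothetical raw bytes value, standing in for uuid.uuid4().bytes
raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes

# uuid.UUID(bytes=...) rebuilds the UUID from 16 raw bytes; str()
# yields the canonical hyphenated form supported by #7399
as_string = str(uuid.UUID(bytes=raw))
print(as_string)  # → 12345678-1234-5678-1234-567812345678
```

This only works where the bytes are available in Python before the column is built (e.g. in a `lit(...)` call), not for an existing Spark bytes column.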

raphaelauv added the bug label Jul 5, 2024

nastra commented Jul 5, 2024

@raphaelauv would you be interested in contributing a fix for this?


raphaelauv commented Jul 6, 2024

hey @nastra, I do not have the time to contribute this feature right now, thanks for the offer 👍

until then, I'm sharing a hacky bypass 😅:

from pyspark.sql import functions as F

# hex() renders the 16 raw bytes as 32 hex characters; the regex then
# regroups them into the canonical 8-4-4-4-12 hyphenated UUID string
df = df.withColumn(
    "id",
    F.regexp_replace(
        F.lower(F.hex("id")),
        "(.{8})(.{4})(.{4})(.{4})(.{12})",
        "$1-$2-$3-$4-$5"
    )
)
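The regex above performs the same 8-4-4-4-12 regrouping that Python's own canonical UUID formatting uses. A plain-Python check (my own sketch, not part of the thread) of the pattern, using `re.sub` in place of Spark's `regexp_replace`:

```python
import re
import uuid

u = uuid.uuid4()
# bytes.hex() gives 32 lowercase hex chars, matching F.lower(F.hex("id"))
hex_form = u.bytes.hex()

# Same 8-4-4-4-12 grouping as the Spark regexp_replace above
hyphenated = re.sub(
    r"(.{8})(.{4})(.{4})(.{4})(.{12})",
    r"\1-\2-\3-\4-\5",
    hex_form,
)

assert hyphenated == str(u)  # matches the canonical string form
```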

@anuragmantri

I can give this a shot, @nastra, although I need to read the UUID PR first.
