
Spark: Add read/write support for UUIDs from bytes #10635

Open
raphaelauv opened this issue Jul 5, 2024 · 3 comments
Labels
bug Something isn't working

Comments


raphaelauv commented Jul 5, 2024

Apache Iceberg version

1.5.2 (latest release)

Query engine

Spark

Please describe the bug 🐞

I can insert a string column into an Iceberg UUID column thanks to #7399

df = df.withColumn("id", lit(str(uuid.uuid4())))

but I can't insert a bytes column into an Iceberg UUID column

df = df.withColumn("id", lit(uuid.uuid4().bytes))

thanks all
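Until native byte support lands, one driver-side option (a sketch, not from the thread; the `raw` value is a hypothetical stand-in for `uuid.uuid4().bytes`) is to reconstruct the UUID from its 16 bytes with the standard `uuid` module and pass the canonical string form that #7399 already accepts:

```python
import uuid

# Hypothetical raw bytes value, standing in for uuid.uuid4().bytes
raw = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes

# uuid.UUID(bytes=...) rebuilds the UUID from 16 raw bytes; str()
# yields the canonical hyphenated form supported by #7399
as_string = str(uuid.UUID(bytes=raw))
print(as_string)  # → 12345678-1234-5678-1234-567812345678
```

This only works where the bytes are available in Python before the column is built (e.g. in a `lit(...)` call), not for an existing Spark bytes column.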

raphaelauv added the bug label Jul 5, 2024

nastra commented Jul 5, 2024

@raphaelauv would you be interested in contributing a fix for this?


raphaelauv commented Jul 6, 2024

hey @nastra, I do not have the time to contribute this feature right now, thanks for the offer 👍

until then, I'm sharing a hacky bypass 😅:

from pyspark.sql import functions as F

# hex() renders the 16 raw bytes as 32 hex characters; the regex then
# regroups them into the canonical 8-4-4-4-12 hyphenated UUID string
df = df.withColumn(
    "id",
    F.regexp_replace(
        F.lower(F.hex("id")),
        "(.{8})(.{4})(.{4})(.{4})(.{12})",
        "$1-$2-$3-$4-$5"
    )
)
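The regex above performs the same 8-4-4-4-12 regrouping that Python's own canonical UUID formatting uses. A plain-Python check (my own sketch, not part of the thread) of the pattern, using `re.sub` in place of Spark's `regexp_replace`:

```python
import re
import uuid

u = uuid.uuid4()
# bytes.hex() gives 32 lowercase hex chars, matching F.lower(F.hex("id"))
hex_form = u.bytes.hex()

# Same 8-4-4-4-12 grouping as the Spark regexp_replace above
hyphenated = re.sub(
    r"(.{8})(.{4})(.{4})(.{4})(.{12})",
    r"\1-\2-\3-\4-\5",
    hex_form,
)

assert hyphenated == str(u)  # matches the canonical string form
```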

@anuragmantri

I can give this a shot, @nastra, although I need to read the UUID PR first.
