Spark rewrite Files Action OOM #10054

Open
Zhanxiao-Ma opened this issue Mar 28, 2024 · 11 comments
Labels
question Further information is requested

Comments

@Zhanxiao-Ma

Query engine

Spark

Question

V2 tables support equality deletes for row-level deletes. When I use the Java API to write a large number of delete records and then run the Spark rewrite files action, I get an OOM error. Is it not allowed to delete too many records? How can I resolve this issue?
Does the community have plans to improve this?
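
For context, the rewrite being described can be invoked roughly like this; a minimal sketch assuming an existing SparkSession `spark`, a loaded Iceberg `Table` named `table`, and a placeholder target file size:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.actions.SparkActions;

// Minimal sketch of the Spark rewrite-data-files action; `spark` and `table`
// are assumed to exist, and the target file size is an illustrative placeholder.
SparkActions.get(spark)
    .rewriteDataFiles(table)
    .option("target-file-size-bytes", String.valueOf(512L * 1024 * 1024))
    .execute();
```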

@Zhanxiao-Ma Zhanxiao-Ma added the question Further information is requested label Mar 28, 2024
@Zhanxiao-Ma Zhanxiao-Ma changed the title Rewrite Files Action OOM Spark rewrite Files Action OOM Mar 28, 2024
@manuzhang
Contributor

It's not forbidden to delete too many records, but doing so can increase the memory required in the driver. If you are using position deletes, there's the rewrite_position_delete_files procedure. As for equality deletes, there was #2364 to rewrite equality deletes as position deletes, but it was never merged.
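
That procedure can be invoked via Spark SQL; a minimal sketch, assuming a SparkSession `spark` and placeholder catalog/table names:

```java
// Sketch of calling the rewrite_position_delete_files procedure;
// `my_catalog` and `db.sample` are placeholder names.
spark.sql("CALL my_catalog.system.rewrite_position_delete_files(table => 'db.sample')");
```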

@RussellSpitzer
Member

There really isn't enough information here to dig into the issue. How many records are there? What were the Spark settings? Did it OOM before there were deletes? Did it OOM during the shuffle? Did the executors OOM? Was an unreasonable amount of memory being consumed?

As a general statement, we are interested in improving performance, but OOMs can happen for many different reasons, so it's not something that can be universally fixed.

@Zhanxiao-Ma
Author

There really isn't enough information here to dig into the issue. How many records are there? What were the Spark settings? Did it OOM before there were deletes? Did it OOM during the shuffle? Did the executors OOM? Was an unreasonable amount of memory being consumed?

As a general statement, we are interested in improving performance, but OOMs can happen for many different reasons, so it's not something that can be universally fixed.

OK. Actually, this line of code causes the OOM error because it loads all equality-delete records into memory, so as the number of records to be deleted increases, the memory requirement also increases.

StructLikeSet deleteSet = deleteLoader().loadEqualityDeletes(deletes, deleteSchema);
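
In effect, the pattern is equivalent to materializing every equality-delete record into an in-memory set and filtering data rows by membership; a simplified illustration (not the actual Iceberg internals; the reader and projection helpers are hypothetical):

```java
import org.apache.iceberg.StructLike;
import org.apache.iceberg.util.StructLikeSet;

// Every equality-delete record is added to an in-memory set, so memory grows
// linearly with the number of delete records.
StructLikeSet deleteSet = StructLikeSet.create(deleteSchema.asStruct());
for (StructLike deleteRecord : readAllEqualityDeletes(deletes)) { // hypothetical reader
  deleteSet.add(deleteRecord);
}

// Each data row is then dropped if its equality-delete columns are in the set.
boolean isDeleted = deleteSet.contains(projectToDeleteSchema(dataRow)); // hypothetical projection
```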

@nk1506
Contributor

nk1506 commented Apr 5, 2024

@RussellSpitzer / @manuzhang, are we planning to make any fix for this? OOM has been observed with RewriteFiles too.
If we use this API to rewrite a large number of small files into new large files, it causes OOM.
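
For clarity, the core RewriteFiles usage in question looks roughly like this; a sketch assuming a loaded `table` and pre-built sets of the small files to remove and the compacted files to add:

```java
// Sketch of swapping many small data files for a few large ones in one commit.
// `table` (Table), `smallFiles` and `compactedFiles` (Set<DataFile>) are assumed.
table.newRewrite()
    .rewriteFiles(smallFiles, compactedFiles)
    .commit();
```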

@manuzhang
Contributor

@nk1506 Echoing Russell's comments: how many small files are there in your OOM case? How much memory did you set up?

@Zhanxiao-Ma
Author

@nk1506 Echoing Russell's comments: how many small files are there in your OOM case? How much memory did you set up?

@RussellSpitzer I believe increasing memory is not a good solution for handling a large number of deletes, because it is impossible to predict how much memory would be appropriate.

@Zhanxiao-Ma
Author

@RussellSpitzer I have implemented a disk-based map to solve this problem. Is this what Iceberg expects? If so, I will submit the code.

@nk1506
Contributor

nk1506 commented Apr 7, 2024

@nk1506 Echoing Russell's comments: how many small files are there in your OOM case? How much memory did you set up?

I didn't use the Spark engine for compaction; I was using the Java client API, so my queries might distract from the original problem. My requirement is to compact very large datasets (say, 10K data files) in a single commit, and using RewriteFiles for that can always cause OOM. So I am looking for something that can help manage manifest files more intelligently. I think I will start a different thread to discuss that other problem.

@manuzhang
Contributor

manuzhang commented May 8, 2024

I have implemented a disk-based map to solve this problem. Is this what Iceberg expects? If so, I will submit the code.

@Zhanxiao-Ma I think it will be valuable to the community. Please open a PR.

@pdames

pdames commented May 21, 2024

Any updates here @Zhanxiao-Ma? Would love to take a look at what you've implemented if you've got a pending PR to link back to this issue, and see if there's an opportunity to work together to improve the state of affairs here!

@manuzhang
Contributor

manuzhang commented Jul 9, 2024

I've created a draft PR which stores equality deletes in RocksDB. It's been verified in our environment, but it requires more work to integrate with the existing API, caching mechanism, etc.
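
As a rough sketch of that idea (not the actual PR code), the equality-delete keys can be spilled to a local RocksDB instance and probed per row, so driver memory stays bounded; `serializeKey(...)` is a hypothetical serializer for the equality-delete columns:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

// Spill delete keys to disk instead of holding them in a StructLikeSet.
// `deleteRecords` and `dataRowKey` are assumed inputs; exception handling elided.
RocksDB.loadLibrary();
try (Options options = new Options().setCreateIfMissing(true);
     RocksDB db = RocksDB.open(options, "/tmp/equality-deletes")) {
  byte[] emptyValue = new byte[0];
  for (StructLike deleteRecord : deleteRecords) {    // assumed iterable of delete rows
    db.put(serializeKey(deleteRecord), emptyValue);  // store keys only; values unused
  }
  // Per data row: a hit in RocksDB means the row has been deleted.
  boolean deleted = db.get(serializeKey(dataRowKey)) != null;
}
```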
