Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does MERGE INTO operations support hidden partition on timestamp columns? #2765

Closed
canhduong28 opened this issue Jun 30, 2021 · 6 comments
Closed
Labels

Comments

@canhduong28
Copy link

canhduong28 commented Jun 30, 2021

Hello,

We have noticed that transforming expressions on timestamp column in MERGE INTO queries don't get pushed down to table scan filter.

Query (published column is timestamp)

MERGE INTO iceberg.books.books_changeset_base_schema T USING books_changeset_base_schema595503 U ON
date(T.published) IN (date '1937-01-01', date '1925-01-01', date '1851-01-01') AND T.title = U.title
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

Physical Plan

21/06/29 12:52:08 INFO BaseTableScan: Scanning table iceberg.books.books_changeset_base_schema snapshot 6883139844713745910 created at 2021-06-29 12:52:06.246 with filter true
21/06/29 12:52:10 INFO SparkWrite: Committing overwrite of 1 data files with 3 new data files, scanSnapshotId: 6883139844713745910, conflictDetectionFilter: true to table iceberg.books.books_changeset_base_schema

But the same query works when the published column is date

MERGE INTO iceberg.books.books_changeset_base_schema T USING books_changeset_base_schema595503 U ON
date(T.published) IN (date '1937-01-01', date '1925-01-01', date '1851-01-01') AND T.title = U.title
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

Physical Plan

21/06/29 12:52:08 INFO BaseTableScan: Scanning table iceberg.books.books_changeset_base_schema snapshot 4766385065641761074 created at 2021-06-29 13:12:06.246 with filter ref(name="published") in (-12053, -16436, -43464)
21/06/29 12:52:10 INFO SparkWrite: Committing overwrite of 1 data files with 3 new data files, scanSnapshotId: 4766385065641761074, conflictDetectionFilter: ref(name="published") in (-12053, -16436, -43464) to table iceberg.books.books_changeset_base_schema

Is this expected behaviour from Iceberg?

@kbendick
Copy link
Contributor

kbendick commented Jul 2, 2021

Can you please provide the Spark and Iceberg versions you're using, as well as the table settings (such as file format)?

I think this might be due to a way that Spark pushes down filters (and not always correctly handling typed conversions), but without version info it's hard to say much.

If you could also please provide the full query plan / explain, that would be great.

@canhduong28
Copy link
Author

@kbendick

We are using Spark 3.0.1 and Iceberg 0.11.1 and Parquet files.

Full query plain

== Physical Plan ==
ReplaceData IcebergBatchWrite(table=iceberg.perks.affiliate_network_transactions_v3, format=PARQUET), org.apache.spark.sql.execution.datasources.v2.ExtendedDataSourceV2Strategy$$Lambda$3778/0x000000084186dc40@31c9d4a7
+- AdaptiveSparkPlan isFinalPlan=true
   +- Sort [icebergdaytransform(created_date#299) ASC NULLS FIRST], false, 0
      +- MergeInto MergeIntoParams(isnull(_row_from_source_#314),isnull(_row_from_target_#339),ArrayBuffer(true),ArrayBuffer(Some(List(record_id#8, an_transaction_datetime#87, created_date#63, is_payable#81, affiliate_network#65, cashback_amount#73, reason_for_status#85, etl_job_id#83, base_currency#61, updated_date#71, an_correlator#67, paid_to_revolut#89, sale_amount#75, an_status#59, country#69, an_merchant_name#77, status#79))),ArrayBuffer(true),ArrayBuffer(Some(List(record_id#8, an_transaction_datetime#87, created_date#63, is_payable#81, affiliate_network#65, cashback_amount#73, reason_for_status#85, etl_job_id#83, base_currency#61, updated_date#71, an_correlator#67, paid_to_revolut#89, sale_amount#75, an_status#59, country#69, an_merchant_name#77, status#79))),List(record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313),List(record_id#8, sale_amount#75, an_status#59, created_date#63, updated_date#71, affiliate_network#65, status#79, base_currency#61, cashback_amount#73, reason_for_status#85, is_payable#81, etl_job_id#83, country#69, an_transaction_datetime#87, an_correlator#67, an_merchant_name#77, paid_to_revolut#89, _row_from_source_#314, record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L, _row_from_target_#339)), [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313]
         +- SortMergeJoin [record_id#8], [record_id#297], FullOuter, ((cast(created_date#299 as date) >= 18801) AND (cast(created_date#299 as date) <= 18802))
            :- *(10) Project [record_id#8, sale_amount#75, an_status#59, created_date#63, updated_date#71, affiliate_network#65, status#79, base_currency#61, cashback_amount#73, reason_for_status#85, is_payable#81, etl_job_id#83, country#69, an_transaction_datetime#87, an_correlator#67, an_merchant_name#77, paid_to_revolut#89, true AS _row_from_source_#314]
            :  +- SortAggregate(key=[record_id#8], functions=[last(sale_amount#14, false), last(an_status#15, false), last(created_date#22, false), last(updated_date#23, false), last(affiliate_network#11, false), last(status#16, false), last(base_currency#12, false), last(cashback_amount#13, false), last(reason_for_status#17, false), last(is_payable#20, false), last(etl_job_id#7, false), last(country#18, false), last(an_transaction_datetime#21, false), last(an_correlator#9, false), last(an_merchant_name#10, false), last(paid_to_revolut#19, false)])
            :     +- *(9) Sort [record_id#8 ASC NULLS FIRST], false, 0
            :        +- CustomShuffleReader coalesced
            :           +- ShuffleQueryStage 0
            :              +- Exchange hashpartitioning(record_id#8, 200), true, [id=#303]
            :                 +- SortAggregate(key=[record_id#8], functions=[partial_last(sale_amount#14, false), partial_last(an_status#15, false), partial_last(created_date#22, false), partial_last(updated_date#23, false), partial_last(affiliate_network#11, false), partial_last(status#16, false), partial_last(base_currency#12, false), partial_last(cashback_amount#13, false), partial_last(reason_for_status#17, false), partial_last(is_payable#20, false), partial_last(etl_job_id#7, false), partial_last(country#18, false), partial_last(an_transaction_datetime#21, false), partial_last(an_correlator#9, false), partial_last(an_merchant_name#10, false), partial_last(paid_to_revolut#19, false)])
            :                    +- *(1) Sort [record_id#8 ASC NULLS FIRST], false, 0
            :                       +- FileScan csv [etl_job_id#7,record_id#8,an_correlator#9,an_merchant_name#10,affiliate_network#11,base_currency#12,cashback_amount#13,sale_amount#14,an_status#15,status#16,reason_for_status#17,country#18,paid_to_revolut#19,is_payable#20,an_transaction_datetime#21,created_date#22,updated_date#23] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[gs://iceberg/affiliate_network_transactions_v3/2021/06..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<etl_job_id:timestamp,record_id:string,an_correlator:string,an_merchant_name:string,affilia...
            +- *(11) Sort [record_id#297 ASC NULLS FIRST], false, 0
               +- CustomShuffleReader coalesced
                  +- ShuffleQueryStage 4
                     +- Exchange hashpartitioning(record_id#297, 200), true, [id=#683]
                        +- *(8) Project [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L, true AS _row_from_target_#339]
                           +- DynamicFileFilterExec[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L]
                              :- *(6) Project [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L]
                              :  +- ExtendedBatchScan[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L] iceberg.perks.affiliate_network_transactions_v3 [filters=]
                              +- *(7) Filter (isnotnull(_sum_#338L) AND (_sum_#338L > 1))
                                 +- *(7) HashAggregate(keys=[_file#334, _pos#335L], functions=[sum(_affectedFiles_#336)])
                                    +- ShuffleQueryStage 3
                                       +- Exchange hashpartitioning(_file#334, _pos#335L, 200), true, [id=#568]
                                          +- *(5) HashAggregate(keys=[_file#334, _pos#335L], functions=[partial_sum(_affectedFiles_#336)])
                                             +- Project [_file#334, _pos#335L, AccumulateFiles(SetAccumulator(id: 571, name: Some(internal.metrics.merge.affectedFiles), value: [gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00022-298-8beea94b-2b6f-4256-8269-35a9dc5d9024-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00031-323-8bebe5ce-afa7-4864-83e5-3d2c96535903-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00010-302-c3aa1128-dfc4-457e-bb81-0672d64bf730-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-313-ab9551e7-3ce5-420b-b0f2-4028ad2a444c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00019-295-700431cb-c7c2-4cbe-8506-d9c5edfd6f19-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00028-320-2a2f7876-e7aa-45f7-bc78-d27101d067db-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00038-334-c9b508de-110a-4951-bac1-6ad71dfb08ad-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00005-281-50e70176-eb21-4bbe-9942-da6d23ed04d7-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00010-286-6c5fbed6-7f12-49c1-b24c-ff6a807a9fe4-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00014-290-12d33502-3ef5-4694-8bc5-0f8b5b742d1c-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00026-318-5ccdc550-171c-46be-91f1-dc6f1d7e9c06-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00008-284-c887f4d4-80bf-4d1d-89b2-66f6b8a94757-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00017-293-8a6bb2b8-542e-4e0c-b3f4-bd51f80da087-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00019-319-4481ebef-cc5e-4f75-bb58-1036fe45af3f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00040-335-0d6c8444-810e-485a-96e7-d4586436aa2e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00012-299-a836204a-042a-48de-bd48-2c07aeaf9c93-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00009-285-1a527a5f-6313-4c77-b5cc-1df15ce8b78b-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00015-307-344dfb30-35c0-45a5-86b1-202d1717f52b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00013-289-5e8c7dc9-4e6c-402d-a9a7-5404bcef0b3b-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00000-276-32e02e0d-671a-42df-93aa-21e4d3c3fee5-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00017-309-6e13f650-a92e-4d8b-b592-538a6bff242b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00012-304-f1aa8572-e5af-4cc5-984e-e366c787ba63-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00009-311-98f4a5b4-91e7-4779-a65c-69dff24e58c8-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00032-329-bb70ce71-c176-4416-9a7f-d18b99582024-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00027-319-4c8403b8-84ca-4762-a153-6e86bca52ef4-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-317-39256fae-6dfc-4049-8fcf-cae14bd1bc85-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00020-296-e328e835-3848-439f-b753-d6848d58f9c2-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00008-300-99162aec-9e2d-4af4-bdc2-22dede713a5c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00021-297-3c4e8614-2351-4bf2-8b0e-590c80cb19f0-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00020-312-65c68bce-83fd-41fd-bd49-8efcab10ec8e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00006-298-36dbe91d-61d8-46d4-8536-e0b8719b4368-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00001-277-ba89d8ce-f3fd-4a98-b442-6908160b2506-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00023-305-951c9552-99a3-4a07-a1a2-60758b298735-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00002-301-7627007c-79a5-4e1c-934f-18b591d0a45b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00026-323-5adf7f29-7ab3-4628-9a13-02f63df35718-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00015-316-6f3c2735-0f5e-4bbc-b05c-739815ab55d0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00034-331-4471c635-9280-40db-88fa-de2a4f645715-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00033-330-d6e3bb29-2eb3-4409-9508-4519b5d97234-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00028-304-08fe25c0-1a76-4428-b898-b7b3e8fe3241-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00024-300-773682b0-5595-4ace-85e3-93744b2e82e6-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-293-c87efacc-ca87-446e-a9cb-45da8fc17f1f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00028-325-c1e082ad-b106-4cbe-8bc7-8f2b60e523d9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00002-278-2763a538-084c-4052-87f1-7332517be728-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00017-318-73f2d8d8-5f82-4a8c-9679-3ea9f5768de3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00003-279-a9d7a583-ac33-4755-9532-610805ae998e-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-297-3c4e8614-2351-4bf2-8b0e-590c80cb19f0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00025-301-96b9943e-db5e-473b-b628-771800716ffe-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-277-ba89d8ce-f3fd-4a98-b442-6908160b2506-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00005-297-cf720f22-38e6-4d5d-982e-a737b6bc494f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00030-322-a5873853-a6bd-4845-a880-f6f285ccb5f9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00003-295-79599b3e-9073-4654-8bca-f7d5f68eb9b9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00029-326-90fdcb7b-94db-4f96-b6d2-f92fb6833b15-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00006-309-9451babb-27c5-4e90-9cb2-b3ff545c7acc-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00039-307-a57a4cdd-5ae6-4102-a102-27cbb4bc1467-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00027-303-718e14fc-0a8c-4561-a172-92d6c936c834-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00003-295-53f71419-c334-4140-aa80-a9e95af8dde0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-298-999d0420-d411-4ad3-8c41-3962bd9e1dcc-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00030-327-670e354b-9d15-40a5-bcf0-662a35851011-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00007-310-ef1c7000-b615-4f4a-bc20-9d7012281f37-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00011-287-ae55c24f-2892-4e8d-a74b-c06df5cb1f03-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-313-b79fbc8b-0d62-4bb9-8d3d-8533c1549541-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00025-317-c27ae109-7cb7-4dd5-b99e-eaab07942260-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00023-299-a727dac8-1ebb-424a-b39b-806844ff4426-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00016-292-90a46de0-213f-4fb8-8db9-3e33b6eb79da-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00035-306-1edd8918-d46b-48be-88f8-2e3178e9578e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00000-297-34121ef6-ac84-4eb9-9467-aba72e4af0ae-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-303-77fd9635-5937-464d-9d10-e2aff181eaaa-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00018-300-f6c69f58-d785-4c83-9ffd-22069a7716f7-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00004-302-ccae989f-7c4f-4836-b1d9-d9717eabb775-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00010-312-e99d93a6-ec64-4057-a066-669cbe2e72c3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-287-ae55c24f-2892-4e8d-a74b-c06df5cb1f03-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00008-296-cd001e53-20d9-4658-af4a-eb06e22e4b0f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00036-332-ed3c2123-601d-48c2-bd30-e368670397e3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00020-320-212b7926-edc4-4063-9453-94f1cd8a3eb1-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00004-296-f2f2236c-7230-4ec3-a0ed-546b8eea1a6b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00014-306-bb0741ed-92cb-42b9-b644-43ee2e4aa71c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00018-294-0b0a1177-b997-4c25-9bb8-484a2c87b397-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00029-321-567e24e6-cbba-43dc-9df6-8c40d4ccc586-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00007-283-64ccb9a5-9c14-4db7-8d0e-e02b70815fd5-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-292-90a46de0-213f-4fb8-8db9-3e33b6eb79da-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00006-282-af65d2fe-49be-49f1-b322-c13e6cdd9d05-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00013-314-5618ce45-38fe-4e28-8d85-921b08a0aa71-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00026-302-731561c7-abdb-4cba-bae2-ba5ca86288d2-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00015-291-99735a58-3243-4777-819f-1112f3f51d85-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00029-305-80b36bd3-ace8-41c1-8dee-32b7385c3566-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00000-292-cb077c19-cce1-4ea6-912e-239bf4bd10eb-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00024-321-adf8f153-77e4-4df9-8a5a-765c8738bf64-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00004-280-9351fd11-44f3-44ce-a2b5-68a822d0c1d7-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00002-294-6278ebd0-b136-4398-8f3d-981cb6092583-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00037-333-e3e8204f-2049-4ebf-a492-fe2319d850a7-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00007-299-1f1669ee-2dd6-4c38-b025-cc18f0200c4e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00022-314-f6873031-6f93-4ef6-a1e7-57516b9349e0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00025-322-0e55f71d-8f55-4d99-89d5-8e0d8d40cc27-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00014-315-27acba9c-3b1d-40b7-9ffd-54c36eeb7862-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00012-288-41351ab1-2e94-4b72-ad90-573b761ca5ad-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00018-310-ff3aca87-8f60-4829-bb65-d1d08ccf94ab-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-303-ea8bb29f-9d55-4f1a-bddf-d6a0bfcdc70a-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00022-304-fc70387d-e440-4096-bd6d-86c8d777f26d-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-308-e3285d30-e0a2-4f63-adc3-e59861c8f5e3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00005-308-07b08dff-41cb-4b8b-aef2-952d6d838102-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00027-324-05e59b10-123a-4594-9bab-e6dfffb44ed3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00031-328-f0aa2d5d-85ad-4e52-90b0-a83961be9e56-00001.parquet]), _file#334) AS _affectedFiles_#336]
                                                +- *(4) BroadcastHashJoin [record_id#8], [record_id#297], Inner, BuildLeft
                                                   :- BroadcastQueryStage 2
                                                   :  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true])), [id=#464]
                                                   :     +- *(3) HashAggregate(keys=[record_id#8], functions=[])
                                                   :        +- CustomShuffleReader coalesced
                                                   :           +- ShuffleQueryStage 1
                                                   :              +- Exchange hashpartitioning(record_id#8, 200), true, [id=#348]
                                                   :                 +- *(2) HashAggregate(keys=[record_id#8], functions=[])
                                                   :                    +- *(2) Project [record_id#8]
                                                   :                       +- *(2) Filter isnotnull(record_id#8)
                                                   :                          +- FileScan csv [record_id#8] Batched: false, DataFilters: [isnotnull(record_id#8)], Format: CSV, Location: InMemoryFileIndex[gs://iceberg/affiliate_network_transactions_v3/2021/06..., PartitionFilters: [], PushedFilters: [IsNotNull(record_id)], ReadSchema: struct<record_id:string>
                                                   +- *(4) Project [record_id#297, _file#334, _pos#335L]
                                                      +- *(4) Filter (((isnotnull(created_date#299) AND (cast(created_date#299 as date) >= 18801)) AND (cast(created_date#299 as date) <= 18802)) AND isnotnull(record_id#297))
                                                         +- ExtendedBatchScan[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L] iceberg.perks.affiliate_network_transactions_v3 [filters=]

@kbendick
Copy link
Contributor

Hi @nautilus28. Sorry to leave this hanging for so long.

There is a bug in Spark (that has been patched but I don't think it's been released) where queries without any unresolved fields (such as named columns in the query) don't properly parse. There's also a bug where MERGE INTO queries are resolved by ordinal position (and not by field name) in Spark, which has also been patched but I don't believe we'll see those patches upstream until Spark 3.2. Though I don't think either of those are what you're hitting.

Is this just a concern about predicate pushdown? I do seem to recall that (at least in older versions), predicates aren't always pushed down if the type is different (I think especially coming from subqueries). I'd have to look into it again.

I'll see if I can recall (or find somebody who knows more than I do) about the conditions in which predicate pushdown does or does not occur.

If you can upgrade to Spark 3.1, that would be ideal. Now that we support Spark 3.1, I will admit that I am always a bit suspicious of a lot of behavior with Spark 3.0.x (though I cannot confirm that that's the issue here).

@smaspe
Copy link

smaspe commented Oct 1, 2021

Hi @kbendick,
We've upgraded to Spark 3.1.2, with no changes.
The concern is that updating rows in multiple partitions in a large table is extremely slow. We've mitigated so far by splitting queries into chunks that target smaller numbers of partitions, but it's far from perfect.

What we don't understand is: how can we tell Iceberg what (hidden) partitions to target specifically, so that it doesn't need to scan the whole table?

What Canh tried (ON date(T.published) IN (date '1937-01-01', ...)) still doesn't work in Spark 3.1.2. We can use things like ON T.published >= date '1937-01-01' and T.published < date '1937-01-02', but that doesn't seem very practical for dozens of partitions (and, if those partitions are sparse, we can't join then in a single larger range)

We suspect that it might have to do with https://issues.apache.org/jira/browse/SPARK-35245

This is because filtering side do not has selective predicate

However I fail to see how we can make the predicate selective, if it isn't at the moment.

Thanks in advance for any light you can shed on this!

Copy link

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Jun 19, 2024
Copy link

github-actions bot commented Jul 3, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants