Does MERGE INTO operations support hidden partition on timestamp columns? #2765

canhduong28 · 2021-06-30T11:55:49Z

Hello,

We have noticed that transforming expressions on timestamp column in MERGE INTO queries don't get pushed down to table scan filter.

Query (published column is timestamp)

MERGE INTO iceberg.books.books_changeset_base_schema T USING books_changeset_base_schema595503 U ON
date(T.published) IN (date '1937-01-01', date '1925-01-01', date '1851-01-01') AND T.title = U.title
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

Physical Plan

21/06/29 12:52:08 INFO BaseTableScan: Scanning table iceberg.books.books_changeset_base_schema snapshot 6883139844713745910 created at 2021-06-29 12:52:06.246 with filter true
21/06/29 12:52:10 INFO SparkWrite: Committing overwrite of 1 data files with 3 new data files, scanSnapshotId: 6883139844713745910, conflictDetectionFilter: true to table iceberg.books.books_changeset_base_schema

But the same query works when the published column is date

MERGE INTO iceberg.books.books_changeset_base_schema T USING books_changeset_base_schema595503 U ON
date(T.published) IN (date '1937-01-01', date '1925-01-01', date '1851-01-01') AND T.title = U.title
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

Physical Plan

21/06/29 12:52:08 INFO BaseTableScan: Scanning table iceberg.books.books_changeset_base_schema snapshot 4766385065641761074 created at 2021-06-29 13:12:06.246 with filter ref(name="published") in (-12053, -16436, -43464)
21/06/29 12:52:10 INFO SparkWrite: Committing overwrite of 1 data files with 3 new data files, scanSnapshotId: 4766385065641761074, conflictDetectionFilter: ref(name="published") in (-12053, -16436, -43464) to table iceberg.books.books_changeset_base_schema

Is this expected behaviour from Iceberg?

The text was updated successfully, but these errors were encountered:

kbendick · 2021-07-02T23:31:26Z

Can you please provide the Spark and Iceberg versions you're using, as well as the table settings (such as file format)?

I think this might be due to a way that Spark pushes down filters (and not always correctly handling typed conversions), but without version info it's hard to say much.

If you could also please provide the full query plan / explain, that would be great.

canhduong28 · 2021-07-04T13:09:20Z

@kbendick

We are using Spark 3.0.1 and Iceberg 0.11.1 and Parquet files.

Full query plain

== Physical Plan ==
ReplaceData IcebergBatchWrite(table=iceberg.perks.affiliate_network_transactions_v3, format=PARQUET), org.apache.spark.sql.execution.datasources.v2.ExtendedDataSourceV2Strategy$$Lambda$3778/0x000000084186dc40@31c9d4a7
+- AdaptiveSparkPlan isFinalPlan=true
   +- Sort [icebergdaytransform(created_date#299) ASC NULLS FIRST], false, 0
      +- MergeInto MergeIntoParams(isnull(_row_from_source_#314),isnull(_row_from_target_#339),ArrayBuffer(true),ArrayBuffer(Some(List(record_id#8, an_transaction_datetime#87, created_date#63, is_payable#81, affiliate_network#65, cashback_amount#73, reason_for_status#85, etl_job_id#83, base_currency#61, updated_date#71, an_correlator#67, paid_to_revolut#89, sale_amount#75, an_status#59, country#69, an_merchant_name#77, status#79))),ArrayBuffer(true),ArrayBuffer(Some(List(record_id#8, an_transaction_datetime#87, created_date#63, is_payable#81, affiliate_network#65, cashback_amount#73, reason_for_status#85, etl_job_id#83, base_currency#61, updated_date#71, an_correlator#67, paid_to_revolut#89, sale_amount#75, an_status#59, country#69, an_merchant_name#77, status#79))),List(record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313),List(record_id#8, sale_amount#75, an_status#59, created_date#63, updated_date#71, affiliate_network#65, status#79, base_currency#61, cashback_amount#73, reason_for_status#85, is_payable#81, etl_job_id#83, country#69, an_transaction_datetime#87, an_correlator#67, an_merchant_name#77, paid_to_revolut#89, _row_from_source_#314, record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L, _row_from_target_#339)), [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313]
         +- SortMergeJoin [record_id#8], [record_id#297], FullOuter, ((cast(created_date#299 as date) >= 18801) AND (cast(created_date#299 as date) <= 18802))
            :- *(10) Project [record_id#8, sale_amount#75, an_status#59, created_date#63, updated_date#71, affiliate_network#65, status#79, base_currency#61, cashback_amount#73, reason_for_status#85, is_payable#81, etl_job_id#83, country#69, an_transaction_datetime#87, an_correlator#67, an_merchant_name#77, paid_to_revolut#89, true AS _row_from_source_#314]
            :  +- SortAggregate(key=[record_id#8], functions=[last(sale_amount#14, false), last(an_status#15, false), last(created_date#22, false), last(updated_date#23, false), last(affiliate_network#11, false), last(status#16, false), last(base_currency#12, false), last(cashback_amount#13, false), last(reason_for_status#17, false), last(is_payable#20, false), last(etl_job_id#7, false), last(country#18, false), last(an_transaction_datetime#21, false), last(an_correlator#9, false), last(an_merchant_name#10, false), last(paid_to_revolut#19, false)])
            :     +- *(9) Sort [record_id#8 ASC NULLS FIRST], false, 0
            :        +- CustomShuffleReader coalesced
            :           +- ShuffleQueryStage 0
            :              +- Exchange hashpartitioning(record_id#8, 200), true, [id=#303]
            :                 +- SortAggregate(key=[record_id#8], functions=[partial_last(sale_amount#14, false), partial_last(an_status#15, false), partial_last(created_date#22, false), partial_last(updated_date#23, false), partial_last(affiliate_network#11, false), partial_last(status#16, false), partial_last(base_currency#12, false), partial_last(cashback_amount#13, false), partial_last(reason_for_status#17, false), partial_last(is_payable#20, false), partial_last(etl_job_id#7, false), partial_last(country#18, false), partial_last(an_transaction_datetime#21, false), partial_last(an_correlator#9, false), partial_last(an_merchant_name#10, false), partial_last(paid_to_revolut#19, false)])
            :                    +- *(1) Sort [record_id#8 ASC NULLS FIRST], false, 0
            :                       +- FileScan csv [etl_job_id#7,record_id#8,an_correlator#9,an_merchant_name#10,affiliate_network#11,base_currency#12,cashback_amount#13,sale_amount#14,an_status#15,status#16,reason_for_status#17,country#18,paid_to_revolut#19,is_payable#20,an_transaction_datetime#21,created_date#22,updated_date#23] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[gs://iceberg/affiliate_network_transactions_v3/2021/06..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<etl_job_id:timestamp,record_id:string,an_correlator:string,an_merchant_name:string,affilia...
            +- *(11) Sort [record_id#297 ASC NULLS FIRST], false, 0
               +- CustomShuffleReader coalesced
                  +- ShuffleQueryStage 4
                     +- Exchange hashpartitioning(record_id#297, 200), true, [id=#683]
                        +- *(8) Project [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L, true AS _row_from_target_#339]
                           +- DynamicFileFilterExec[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L]
                              :- *(6) Project [record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L]
                              :  +- ExtendedBatchScan[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L] iceberg.perks.affiliate_network_transactions_v3 [filters=]
                              +- *(7) Filter (isnotnull(_sum_#338L) AND (_sum_#338L > 1))
                                 +- *(7) HashAggregate(keys=[_file#334, _pos#335L], functions=[sum(_affectedFiles_#336)])
                                    +- ShuffleQueryStage 3
                                       +- Exchange hashpartitioning(_file#334, _pos#335L, 200), true, [id=#568]
                                          +- *(5) HashAggregate(keys=[_file#334, _pos#335L], functions=[partial_sum(_affectedFiles_#336)])
                                             +- Project [_file#334, _pos#335L, AccumulateFiles(SetAccumulator(id: 571, name: Some(internal.metrics.merge.affectedFiles), value: [gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00022-298-8beea94b-2b6f-4256-8269-35a9dc5d9024-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00031-323-8bebe5ce-afa7-4864-83e5-3d2c96535903-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00010-302-c3aa1128-dfc4-457e-bb81-0672d64bf730-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-313-ab9551e7-3ce5-420b-b0f2-4028ad2a444c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00019-295-700431cb-c7c2-4cbe-8506-d9c5edfd6f19-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00028-320-2a2f7876-e7aa-45f7-bc78-d27101d067db-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00038-334-c9b508de-110a-4951-bac1-6ad71dfb08ad-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00005-281-50e70176-eb21-4bbe-9942-da6d23ed04d7-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00010-286-6c5fbed6-7f12-49c1-b24c-ff6a807a9fe4-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00014-290-12d33502-3ef5-4694-8bc5-0f8b5b742d1c-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00026-318-5ccdc550-171c-46be-91f1-dc6f1d7e9c06-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00008-284-c887f4d4-80bf-4d1d-89b2-66f6b8a94757-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00017-293-8a6bb2b8-542e-4e0c-b3f4-bd51f80da087-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00019-319-4481ebef-cc5e-4f75-bb58-1036fe45af3f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00040-335-0d6c8444-810e-485a-96e7-d4586436aa2e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00012-299-a836204a-042a-48de-bd48-2c07aeaf9c93-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00009-285-1a527a5f-6313-4c77-b5cc-1df15ce8b78b-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00015-307-344dfb30-35c0-45a5-86b1-202d1717f52b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00013-289-5e8c7dc9-4e6c-402d-a9a7-5404bcef0b3b-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00000-276-32e02e0d-671a-42df-93aa-21e4d3c3fee5-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00017-309-6e13f650-a92e-4d8b-b592-538a6bff242b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00012-304-f1aa8572-e5af-4cc5-984e-e366c787ba63-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00009-311-98f4a5b4-91e7-4779-a65c-69dff24e58c8-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00032-329-bb70ce71-c176-4416-9a7f-d18b99582024-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00027-319-4c8403b8-84ca-4762-a153-6e86bca52ef4-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-317-39256fae-6dfc-4049-8fcf-cae14bd1bc85-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00020-296-e328e835-3848-439f-b753-d6848d58f9c2-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00008-300-99162aec-9e2d-4af4-bdc2-22dede713a5c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00021-297-3c4e8614-2351-4bf2-8b0e-590c80cb19f0-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00020-312-65c68bce-83fd-41fd-bd49-8efcab10ec8e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00006-298-36dbe91d-61d8-46d4-8536-e0b8719b4368-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00001-277-ba89d8ce-f3fd-4a98-b442-6908160b2506-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00023-305-951c9552-99a3-4a07-a1a2-60758b298735-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00002-301-7627007c-79a5-4e1c-934f-18b591d0a45b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00026-323-5adf7f29-7ab3-4628-9a13-02f63df35718-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00015-316-6f3c2735-0f5e-4bbc-b05c-739815ab55d0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00034-331-4471c635-9280-40db-88fa-de2a4f645715-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00033-330-d6e3bb29-2eb3-4409-9508-4519b5d97234-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00028-304-08fe25c0-1a76-4428-b898-b7b3e8fe3241-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00024-300-773682b0-5595-4ace-85e3-93744b2e82e6-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-293-c87efacc-ca87-446e-a9cb-45da8fc17f1f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00028-325-c1e082ad-b106-4cbe-8bc7-8f2b60e523d9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00002-278-2763a538-084c-4052-87f1-7332517be728-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00017-318-73f2d8d8-5f82-4a8c-9679-3ea9f5768de3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00003-279-a9d7a583-ac33-4755-9532-610805ae998e-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-297-3c4e8614-2351-4bf2-8b0e-590c80cb19f0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00025-301-96b9943e-db5e-473b-b628-771800716ffe-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-277-ba89d8ce-f3fd-4a98-b442-6908160b2506-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00005-297-cf720f22-38e6-4d5d-982e-a737b6bc494f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00030-322-a5873853-a6bd-4845-a880-f6f285ccb5f9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00003-295-79599b3e-9073-4654-8bca-f7d5f68eb9b9-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00029-326-90fdcb7b-94db-4f96-b6d2-f92fb6833b15-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00006-309-9451babb-27c5-4e90-9cb2-b3ff545c7acc-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00039-307-a57a4cdd-5ae6-4102-a102-27cbb4bc1467-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00027-303-718e14fc-0a8c-4561-a172-92d6c936c834-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00003-295-53f71419-c334-4140-aa80-a9e95af8dde0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00001-298-999d0420-d411-4ad3-8c41-3962bd9e1dcc-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00030-327-670e354b-9d15-40a5-bcf0-662a35851011-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00007-310-ef1c7000-b615-4f4a-bc20-9d7012281f37-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00011-287-ae55c24f-2892-4e8d-a74b-c06df5cb1f03-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-313-b79fbc8b-0d62-4bb9-8d3d-8533c1549541-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00025-317-c27ae109-7cb7-4dd5-b99e-eaab07942260-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00023-299-a727dac8-1ebb-424a-b39b-806844ff4426-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00016-292-90a46de0-213f-4fb8-8db9-3e33b6eb79da-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00035-306-1edd8918-d46b-48be-88f8-2e3178e9578e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00000-297-34121ef6-ac84-4eb9-9467-aba72e4af0ae-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00021-303-77fd9635-5937-464d-9d10-e2aff181eaaa-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00018-300-f6c69f58-d785-4c83-9ffd-22069a7716f7-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00004-302-ccae989f-7c4f-4836-b1d9-d9717eabb775-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00010-312-e99d93a6-ec64-4057-a066-669cbe2e72c3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-287-ae55c24f-2892-4e8d-a74b-c06df5cb1f03-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00008-296-cd001e53-20d9-4658-af4a-eb06e22e4b0f-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00036-332-ed3c2123-601d-48c2-bd30-e368670397e3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00020-320-212b7926-edc4-4063-9453-94f1cd8a3eb1-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00004-296-f2f2236c-7230-4ec3-a0ed-546b8eea1a6b-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00014-306-bb0741ed-92cb-42b9-b644-43ee2e4aa71c-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00018-294-0b0a1177-b997-4c25-9bb8-484a2c87b397-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00029-321-567e24e6-cbba-43dc-9df6-8c40d4ccc586-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00007-283-64ccb9a5-9c14-4db7-8d0e-e02b70815fd5-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-292-90a46de0-213f-4fb8-8db9-3e33b6eb79da-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00006-282-af65d2fe-49be-49f1-b322-c13e6cdd9d05-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00013-314-5618ce45-38fe-4e28-8d85-921b08a0aa71-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00026-302-731561c7-abdb-4cba-bae2-ba5ca86288d2-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00015-291-99735a58-3243-4777-819f-1112f3f51d85-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00029-305-80b36bd3-ace8-41c1-8dee-32b7385c3566-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00000-292-cb077c19-cce1-4ea6-912e-239bf4bd10eb-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00024-321-adf8f153-77e4-4df9-8a5a-765c8738bf64-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00004-280-9351fd11-44f3-44ce-a2b5-68a822d0c1d7-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00002-294-6278ebd0-b136-4398-8f3d-981cb6092583-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00037-333-e3e8204f-2049-4ebf-a492-fe2319d850a7-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00007-299-1f1669ee-2dd6-4c38-b025-cc18f0200c4e-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00022-314-f6873031-6f93-4ef6-a1e7-57516b9349e0-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00025-322-0e55f71d-8f55-4d99-89d5-8e0d8d40cc27-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00014-315-27acba9c-3b1d-40b7-9ffd-54c36eeb7862-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-24/00012-288-41351ab1-2e94-4b72-ad90-573b761ca5ad-00002.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00018-310-ff3aca87-8f60-4829-bb65-d1d08ccf94ab-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00011-303-ea8bb29f-9d55-4f1a-bddf-d6a0bfcdc70a-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00022-304-fc70387d-e440-4096-bd6d-86c8d777f26d-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00016-308-e3285d30-e0a2-4f63-adc3-e59861c8f5e3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00005-308-07b08dff-41cb-4b8b-aef2-952d6d838102-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00027-324-05e59b10-123a-4594-9bab-e6dfffb44ed3-00001.parquet, gs://iceberg/affiliate_network_transactions_v3/data/created_date_day=2021-06-23/00031-328-f0aa2d5d-85ad-4e52-90b0-a83961be9e56-00001.parquet]), _file#334) AS _affectedFiles_#336]
                                                +- *(4) BroadcastHashJoin [record_id#8], [record_id#297], Inner, BuildLeft
                                                   :- BroadcastQueryStage 2
                                                   :  +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true])), [id=#464]
                                                   :     +- *(3) HashAggregate(keys=[record_id#8], functions=[])
                                                   :        +- CustomShuffleReader coalesced
                                                   :           +- ShuffleQueryStage 1
                                                   :              +- Exchange hashpartitioning(record_id#8, 200), true, [id=#348]
                                                   :                 +- *(2) HashAggregate(keys=[record_id#8], functions=[])
                                                   :                    +- *(2) Project [record_id#8]
                                                   :                       +- *(2) Filter isnotnull(record_id#8)
                                                   :                          +- FileScan csv [record_id#8] Batched: false, DataFilters: [isnotnull(record_id#8)], Format: CSV, Location: InMemoryFileIndex[gs://iceberg/affiliate_network_transactions_v3/2021/06..., PartitionFilters: [], PushedFilters: [IsNotNull(record_id)], ReadSchema: struct<record_id:string>
                                                   +- *(4) Project [record_id#297, _file#334, _pos#335L]
                                                      +- *(4) Filter (((isnotnull(created_date#299) AND (cast(created_date#299 as date) >= 18801)) AND (cast(created_date#299 as date) <= 18802)) AND isnotnull(record_id#297))
                                                         +- ExtendedBatchScan[record_id#297, an_transaction_datetime#298, created_date#299, is_payable#300, affiliate_network#301, cashback_amount#302, reason_for_status#303, etl_job_id#304, base_currency#305, updated_date#306, an_correlator#307, paid_to_revolut#308, sale_amount#309, an_status#310, country#311, an_merchant_name#312, status#313, _file#334, _pos#335L] iceberg.perks.affiliate_network_transactions_v3 [filters=]

kbendick · 2021-08-27T22:04:54Z

Hi @nautilus28. Sorry to leave this hanging for so long.

There is a bug in Spark (that has been patched but I don't think it's been released) where queries without any unresolved fields (such as named columns in the query) don't properly parse. There's also a bug where MERGE INTO queries are resolved by ordinal position (and not by field name) in Spark, which has also been patched but I don't believe we'll see those patches upstream until Spark 3.2. Though I don't think either of those are what you're hitting.

Is this just a concern about predicate pushdown? I do seem to recall that (at least in older versions), predicates aren't always pushed down if the type is different (I think especially coming from subqueries). I'd have to look into it again.

I'll see if I can recall (or find somebody who knows more than I do) about the conditions in which predicate pushdown does or does not occur.

If you can upgrade to Spark 3.1, that would be ideal. Now that we support Spark 3.1, I will admit that I am always a bit suspicious of a lot of behavior with Spark 3.0.x (though I cannot confirm that that's the issue here).

smaspe · 2021-10-01T16:39:09Z

Hi @kbendick,
We've upgraded to Spark 3.1.2, with no changes.
The concern is that updating rows in multiple partitions in a large table is extremely slow. We've mitigated so far by splitting queries into chunks that target smaller numbers of partitions, but it's far from perfect.

What we don't understand is: how can we tell Iceberg what (hidden) partitions to target specifically, so that it doesn't need to scan the whole table?

What Canh tried (ON date(T.published) IN (date '1937-01-01', ...)) still doesn't work in Spark 3.1.2. We can use things like ON T.published >= date '1937-01-01' and T.published < date '1937-01-02', but that doesn't seem very practical for dozens of partitions (and, if those partitions are sparse, we can't join then in a single larger range)

We suspect that it might have to do with https://issues.apache.org/jira/browse/SPARK-35245

This is because filtering side do not has selective predicate

However I fail to see how we can make the predicate selective, if it isn't at the moment.

Thanks in advance for any light you can shed on this!

github-actions · 2024-06-19T00:12:26Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions · 2024-07-03T00:12:43Z

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions bot added the stale label Jun 19, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does MERGE INTO operations support hidden partition on timestamp columns? #2765

Does MERGE INTO operations support hidden partition on timestamp columns? #2765

canhduong28 commented Jun 30, 2021 •

edited

Loading

kbendick commented Jul 2, 2021 •

edited

Loading

canhduong28 commented Jul 4, 2021

kbendick commented Aug 27, 2021

smaspe commented Oct 1, 2021

github-actions bot commented Jun 19, 2024

github-actions bot commented Jul 3, 2024

Does MERGE INTO operations support hidden partition on timestamp columns? #2765

Does MERGE INTO operations support hidden partition on timestamp columns? #2765

Comments

canhduong28 commented Jun 30, 2021 • edited Loading

kbendick commented Jul 2, 2021 • edited Loading

canhduong28 commented Jul 4, 2021

kbendick commented Aug 27, 2021

smaspe commented Oct 1, 2021

github-actions bot commented Jun 19, 2024

github-actions bot commented Jul 3, 2024

canhduong28 commented Jun 30, 2021 •

edited

Loading

kbendick commented Jul 2, 2021 •

edited

Loading