-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does MERGE INTO operations support hidden partition on timestamp columns? #2765
Comments
Can you please provide the Spark and Iceberg versions you're using, as well as the table settings (such as file format)? I think this might be due to a way that Spark pushes down filters (and not always correctly handling typed conversions), but without version info it's hard to say much. If you could also please provide the full query plan / explain, that would be great. |
We are using Spark 3.0.1 and Iceberg 0.11.1 and Parquet files. Full query plain
|
Hi @nautilus28. Sorry to leave this hanging for so long. There is a bug in Spark (that has been patched but I don't think it's been released) where queries without any unresolved fields (such as named columns in the query) don't properly parse. There's also a bug where MERGE INTO queries are resolved by ordinal position (and not by field name) in Spark, which has also been patched but I don't believe we'll see those patches upstream until Spark 3.2. Though I don't think either of those are what you're hitting. Is this just a concern about predicate pushdown? I do seem to recall that (at least in older versions), predicates aren't always pushed down if the type is different (I think especially coming from subqueries). I'd have to look into it again. I'll see if I can recall (or find somebody who knows more than I do) about the conditions in which predicate pushdown does or does not occur. If you can upgrade to Spark 3.1, that would be ideal. Now that we support Spark 3.1, I will admit that I am always a bit suspicious of a lot of behavior with Spark 3.0.x (though I cannot confirm that that's the issue here). |
Hi @kbendick, What we don't understand is: how can we tell Iceberg what (hidden) partitions to target specifically, so that it doesn't need to scan the whole table? What Canh tried ( We suspect that it might have to do with https://issues.apache.org/jira/browse/SPARK-35245
However I fail to see how we can make the predicate selective, if it isn't at the moment. Thanks in advance for any light you can shed on this! |
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible. |
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' |
Hello,
We have noticed that transforming expressions on
timestamp
column inMERGE INTO
queries don't get pushed down to table scan filter.Query (
published
column istimestamp
)Physical Plan
But the same query works when the
published
column isdate
Physical Plan
Is this expected behaviour from Iceberg?
The text was updated successfully, but these errors were encountered: