Old Avro file not found breaks Athena iceberg table #10560

Open
ZMarouani opened this issue Jun 24, 2024 · 1 comment
Labels
AWS bug Something isn't working

Comments

ZMarouani commented Jun 24, 2024

Apache Iceberg version

None

Query engine

Athena

Please describe the bug 🐞

The very first metadata.json, Avro, and snapshot Avro files created for my Iceberg table (Athena with the Glue catalog) were deleted because I have a TTL on my S3 bucket. I still have all the more recent files in the metadata/ and data/ folders, but I can no longer use the table in any way. This is surprising: I would expect Iceberg to either maintain a single metadata file, use the most recently generated metadata, or at least let me refresh the metadata files. Losing only the very oldest files should not break the whole table. More details below.
When I run an Athena query on the Iceberg table, the error I currently get is:
GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: VW53B9PKC8FVD3G6; S3 Extended Request ID: C0aljha+rUMGEYZQ/oA5QVF3/Ggsg17YTuEDQOFUabWcJGxjEXb0vZ9zMcqNwml/GOy7Ka8D4UDwU5lrqBDKTg==; Proxy: null), S3 Extended Request ID: C0aljha+rUMGEYZQ/oA5QVF3/Ggsg17YTuEDQOFUabWcJGxjEXb0vZ9zMcqNwml/GOy7Ka8D4UDwU5lrqBDKTg== (Bucket: athena-xxx-output-stage, Key: my_athena_path_xxxxxx/metadata/b6f6cbf8-774e-4161-8568-6b3e43ac6920-m0.avro) This query ran against the ‘xxxx’ database, unless qualified by the query. Please post the error message on our [forum ](https://forums.aws.amazon.com/forum.jspa?forumID=242&start=0) or contact [customer support ](https://eu-west-1.console.aws.amazon.com/support/home?#/case/create?issueType=technical&serviceCode=amazon-athena&categoryCode=query-related-issue) with Query ID: 53761fa2-b802-4417-b0c0-983a49686816

The missing file b6f6cbf8-774e-4161-8568-6b3e43ac6920-m0.avro was deleted by the S3 TTL and is very old. I still have more recent Avro files, but it seems the latest metadata.json still points to it through the manifest-list of the oldest snapshot (sequence-number = 1) in the snapshots list, see below:
"snapshots" : [ {
  "sequence-number" : 1,
  "snapshot-id" : 95661809200085951,
  "timestamp-ms" : 1713142581530,
  "summary" : {
    "operation" : "append",
    "trino_query_id" : "20240415_005553_00056_gf7ew",
    "added-data-files" : "27",
    "added-records" : "5049",
    "added-files-size" : "330610",
    "changed-partition-count" : "1",
    "total-records" : "5049",
    "total-files-size" : "330610",
    "total-data-files" : "27",
    "total-delete-files" : "0",
    "total-position-deletes" : "0",
    "total-equality-deletes" : "0"
  },
  "manifest-list" : "s3://athena-xxxx-output-stage/athena_output/config_xxxx/metadata/snap-95661809200085951-1-b6f6cbf8-774e-4161-8568-6b3e43ac6920.avro",
  "schema-id" : 0

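To make the check above repeatable, here is a minimal Python sketch that lists the manifest-list path of every snapshot in a parsed metadata.json and reports the ones that are no longer present. It assumes the metadata has already been loaded into a dict (for example with `json.load`) and that you have a collection of the object paths still in the bucket; the function names, the second snapshot, and the `example-bucket` path are hypothetical and only there for illustration. It does not look inside the manifest lists themselves, so it will not catch a missing manifest file such as the `-m0.avro` one.

```python
from typing import Iterable


def snapshot_manifest_lists(metadata: dict) -> list[tuple[int, str]]:
    """Return (sequence-number, manifest-list path) for every snapshot."""
    return [
        (snap.get("sequence-number", 0), snap["manifest-list"])
        for snap in metadata.get("snapshots", [])
        if "manifest-list" in snap
    ]


def missing_manifest_lists(
    metadata: dict, existing_paths: Iterable[str]
) -> list[tuple[int, str]]:
    """Return the snapshots whose manifest-list path is not in existing_paths."""
    existing = set(existing_paths)
    return [
        (seq, path)
        for seq, path in snapshot_manifest_lists(metadata)
        if path not in existing
    ]


# The first snapshot uses the path from the metadata excerpt above;
# the second snapshot and its bucket are hypothetical.
sample_metadata = {
    "snapshots": [
        {
            "sequence-number": 1,
            "snapshot-id": 95661809200085951,
            "manifest-list": "s3://athena-xxxx-output-stage/athena_output/config_xxxx/metadata/snap-95661809200085951-1-b6f6cbf8-774e-4161-8568-6b3e43ac6920.avro",
        },
        {
            "sequence-number": 2,
            "snapshot-id": 2,
            "manifest-list": "s3://example-bucket/metadata/snap-2.avro",
        },
    ]
}

# Hypothetical listing of what is still in the bucket after the TTL ran.
still_in_s3 = {"s3://example-bucket/metadata/snap-2.avro"}

for seq, path in missing_manifest_lists(sample_metadata, still_in_s3):
    print(f"snapshot with sequence-number {seq} references a missing file: {path}")
```

In practice `existing_paths` would come from an S3 listing of the metadata/ prefix; that part is left out here so the sketch stays runnable without AWS credentials.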
I tried setting the vacuum properties this way:

`ALTER TABLE iceberg_table SET TBLPROPERTIES (
'vacuum_max_snapshot_age_seconds'='xxxxx'
)

VACUUM iceberg_table`

That removed the reference from the newest metadata file, but I think this new file still references the oldest metadata.json file in this list:
"metadata-log" : [ {
  "timestamp-ms" : 1713142581530,
  "metadata-file" : "s3://athena-......"
}, {
  "timestamp-ms" : 1713148787530,
  "metadata-file" : "s3://athena-......"
}, {
  "timestamp-ms" : 1713187881530,
  "metadata-file" : "s3://athena-......"
}, {

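As a rough way to see which metadata-log entries a bucket TTL has probably already removed, the sketch below compares each entry's `timestamp-ms` against a cutoff. It assumes the key is spelled `metadata-file` as in the Iceberg table spec, that the TTL is counted in days from roughly the time the metadata file was written, and that a 30-day TTL applies; the TTL value, the function name, the `now_ms` value, and the `example-bucket` paths are all hypothetical. Only the three timestamps are taken from the list above.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def log_entries_older_than(metadata: dict, now_ms: int, ttl_days: int) -> list[str]:
    """Return metadata-log files whose timestamp is older than the TTL cutoff."""
    cutoff_ms = now_ms - ttl_days * MS_PER_DAY
    return [
        entry["metadata-file"]
        for entry in metadata.get("metadata-log", [])
        if entry["timestamp-ms"] < cutoff_ms
    ]


# Timestamps come from the metadata-log excerpt above; paths are hypothetical.
sample_metadata = {
    "metadata-log": [
        {
            "timestamp-ms": 1713142581530,
            "metadata-file": "s3://example-bucket/metadata/00000.metadata.json",
        },
        {
            "timestamp-ms": 1713148787530,
            "metadata-file": "s3://example-bucket/metadata/00001.metadata.json",
        },
        {
            "timestamp-ms": 1713187881530,
            "metadata-file": "s3://example-bucket/metadata/00002.metadata.json",
        },
    ]
}

# Hypothetical "current time" and TTL, chosen so the cutoff falls between
# the second and third log entries.
now_ms = 1715742000000
at_risk = log_entries_older_than(sample_metadata, now_ms, ttl_days=30)

for path in at_risk:
    print(f"likely expired by the bucket TTL: {path}")
```

Any file this flags is only a candidate; whether it is actually gone still has to be confirmed against the bucket.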
I also tried a REFRESH operation, even through the Iceberg API, but I still get the same initial error about the missing Avro file.

I would really appreciate any help here, if possible.

I hit this problem in production, and now the team is considering dropping Iceberg and going back to standard Athena tables.

The S3 bucket has a TTL for GDPR reasons.

### PS: I managed to reproduce the error on another table by simply deleting the oldest metadata.json, .avro, and snapshot .avro files.

@ZMarouani ZMarouani added the bug Something isn't working label Jun 24, 2024
@nastra nastra added the AWS label Jun 26, 2024
@ZMarouani
Author

No answers for this?
