Throws error if subfolder not found or empty and aborts the table merge #20

Open
premkaliap opened this issue Oct 10, 2022 · 0 comments


I get the error below when the client encounters a missing path, and the download for that table then aborts.

[ForkJoinPool-1-worker-3] WARN com.guidewire.cda.TableReader - Copy Job FAILED for 'cc_claim' for fingerprint '4e588b71e9a149148b623a22da443314': org.apache.spark.sql.AnalysisException: Path does not exist: s3a://tenant-xxx/cc/4e588b71e9a149148b623a22da443314/1664585730000/*.parquet;

GW told me it is normal for the .cda/batch-metrics.json file to reference a timestamp folder that is empty, so the actual Parquet path does not exist. How should this scenario be handled?

"The failures are related to timestamp folders that don’t contain any Parquet files in them. This is standard behavior where, if all records processed in a batch for a given table were deemed as duplicates (previously seen), CDA would correctly not write them out to S3. But it would still write out the reconciliation stats to the .cda/batch-metrics.json file. It’s the presence of this file that’s causing the folder to show up in S3, even if there are no Parquet files inside of it."
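
One possible way to handle this is to check each timestamp folder for Parquet files before asking Spark to read it, and to skip the folders that have none instead of failing the whole table. The sketch below only illustrates that check: it is written in Python against a local directory as a stand-in for the S3 prefix, and the function name `readable_timestamp_folders`, the demo folder names, and the file name `part-0.parquet` are hypothetical, not part of the client. In the client itself the same test would have to be made against S3 before the `*.parquet` glob is passed to the reader.

```python
from pathlib import Path
import tempfile


def readable_timestamp_folders(fingerprint_dir, timestamps):
    """Return the timestamp folders that hold at least one .parquet file.

    fingerprint_dir: local stand-in for the <table>/<fingerprint>/ prefix.
    timestamps: folder names to check (hypothetical input for this sketch).
    """
    readable = []
    for ts in timestamps:
        folder = Path(fingerprint_dir) / ts
        # Skip folders that are missing or that contain no Parquet files,
        # for example ones holding only .cda/batch-metrics.json.
        if folder.is_dir() and any(folder.glob("*.parquet")):
            readable.append(folder)
    return readable


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical folder that contains data.
        (root / "1664585700000").mkdir()
        (root / "1664585700000" / "part-0.parquet").write_bytes(b"")
        # Folder from the log above, holding only the metrics file.
        (root / "1664585730000" / ".cda").mkdir(parents=True)
        (root / "1664585730000" / ".cda" / "batch-metrics.json").write_text("{}")

        wanted = ["1664585700000", "1664585730000", "1664585760000"]
        for folder in readable_timestamp_folders(root, wanted):
            print("would read:", folder.name)
```

With a filter like this, a batch in which every record was a duplicate would be treated as "nothing to copy" rather than as a failed copy job. Whether skipping is the right behavior for the merge is a decision for the maintainers.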
