problem ingesting delimited file to druid #15489

Open
tayunyang opened this issue Dec 5, 2023 · 2 comments
@tayunyang

Hi

I would like to ingest a delimited file into Druid. However, nothing is ingested because my data is not parsed, and I am not sure what is incorrect.

My tsv file looks like this:

1, 0
2.8736, 8.29
7.10, 8.83

My task spec is below
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "test_append",
      "timestampSpec" : {
        "column" : "timestamp",
        "format" : "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          { "name" : "requested", "type" : "long" },
          { "name" : "bfee", "type" : "long" }
        ]
      }
    },
    "ioConfig" : {
      "type" : "index",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "/tmp/test",
        "filter" : "test_append.csv"
      },
      "inputFormat" : {
        "type" : "tsv",
        "columns" : [ "requested", "bfee" ]
      },
      "appendToExisting" : true,
      "dropExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}

The log shows the data is not parsed, so nothing can be published:

2023-12-04T21:40:43,048 INFO [[index_test_append_bnilolmo_2023-12-04T21:40:38.315Z]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Preparing to push (stats): processed rows: [0], sinks: [0], fireHydrants (across sinks): [0]
2023-12-04T21:40:43,049 INFO [[index_test_append_bnilolmo_2023-12-04T21:40:38.315Z]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Push complete...
2023-12-04T21:40:43,057 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Nothing to publish, skipping publish step.
2023-12-04T21:40:43,058 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Processed[0] events, unparseable[4], thrownAway[0].

@SamWheating
Contributor

SamWheating commented Dec 13, 2023

Based on the input file name and contents, it looks like you're passing in a .csv file (comma-separated). However, under inputFormat you've specified the type as tsv (tab-separated), which is likely causing parse exceptions.
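For example, a minimal sketch of an inputFormat block matching a comma-separated file (column names carried over from the spec above; findColumnsFromHeader is assumed to be false here since the sample rows have no header line):

"inputFormat" : {
  "type" : "csv",
  "findColumnsFromHeader" : false,
  "columns" : [ "requested", "bfee" ]
}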

Also - you can set logParseExceptions: true under tuningConfig in order to include the parsing exceptions in your ingestion logs.
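A sketch of that change, assuming the rest of the tuningConfig stays as in the spec above:

"tuningConfig" : {
  "type" : "index_parallel",
  "maxRowsPerSegment" : 5000000,
  "maxRowsInMemory" : 25000,
  "logParseExceptions" : true
}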


This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@github-actions bot added the stale label Sep 19, 2024