Index jobs infrequently fail on a small cluster (1x master, 2x data, and 1x query server).
Note: logs are slightly sanitised.
Affected Version
26.0.0
Description
Ingestion is via Kafka. The cluster setup is fairly vanilla, following the documentation example, and runs in AWS. The ingestion rate is around 400-500 messages per minute, and the servers are lightly loaded.
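For context, the supervisor spec follows the documented Kafka example. A minimal sketch reconstructed from the logs below (the datasource, topic, and dimension names are taken from the log lines; the broker address, timestamp column, and tuning values are placeholders, not the actual config):

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "topic": "ProdTelegrafTopic",
      "consumerProperties": { "bootstrap.servers": "PLACEHOLDER:9092" },
      "taskDuration": "PT1H"
    },
    "dataSchema": {
      "dataSource": "ProdTelegraf",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": {
        "dimensions": ["value", "metric", "metric_group", "resource_index", "resource", "source"]
      },
      "granularitySpec": { "segmentGranularity": "DAY", "queryGranularity": "NONE" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```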
Relevant section from the MiddleManager log on the data server:
2023-07-08T07:15:17,847 INFO [forking-task-runner-3] org.apache.druid.indexing.overlord.ForkingTaskRunner - Logging task index_kafka_ProdTelegraf_fd4ef3fcd347e9c_nikkmmpb output to: var/druid/task/index_kafka_ProdTelegraf_fd4ef3fcd347e9c_nikkmmpb/log
2023-07-08T07:16:19,169 INFO [forking-task-runner-2] org.apache.druid.storage.s3.S3TaskLogs - Pushing task log var/druid/task/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/log to: druid/indexing-logs/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/log
2023-07-08T07:16:19,170 INFO [forking-task-runner-2] org.apache.druid.storage.s3.S3Utils - Pushing [var/druid/task/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/log] to bucket[druid-deepstorage-prod] and key[druid/indexing-logs/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/log].
2023-07-08T07:16:19,306 INFO [forking-task-runner-2] org.apache.druid.storage.s3.S3TaskLogs - Pushing task reports /home/ec2-user/apache-druid-26.0.0/var/druid/task/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/attempt/1/report.json to: druid/indexing-logs/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/report.json
2023-07-08T07:16:19,306 INFO [forking-task-runner-2] org.apache.druid.storage.s3.S3Utils - Pushing [/home/ec2-user/apache-druid-26.0.0/var/druid/task/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/attempt/1/report.json] to bucket[druid-deepstorage-prod] and key[druid/indexing-logs/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj/report.json].
2023-07-08T07:16:19,350 INFO [forking-task-runner-2] org.apache.druid.indexing.overlord.ForkingTaskRunner - Process exited successfully for task: index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj
2023-07-08T07:16:19,355 INFO [forking-task-runner-2] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj
2023-07-08T07:16:19,360 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj] completed with status [FAILED].
Log from the failed indexing task:
2023-07-08T07:15:38,512 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Segment[ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7] of 34,451,733 bytes built from 6 incremental persist(s) in 10,808ms; pushed to deep storage in 7,597ms. Load spec is: {"type":"s3_zip","bucket":"druid-deepstorage-prod","key":"druid/segments/ProdTelegraf/2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z/2023-07-08T00:00:04.131Z/7/228257f7-4460-4e66-af74-b76af28292ab/index.zip","S3Schema":"s3n"}
2023-07-08T07:15:38,513 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-merge] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Push complete...
2023-07-08T07:15:38,613 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Published [1] segments with commit metadata [{nextPartitions=SeekableStreamStartSequenceNumbers{stream='ProdTelegrafTopic', partitionSequenceNumberMap={0=11782162747}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='ProdTelegrafTopic', partitionSequenceNumberMap={0=11782162747}}}]
2023-07-08T07:15:38,616 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-publish] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Published segments: [ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7]
2023-07-08T07:15:38,618 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published 1 segments for sequence [index_kafka_ProdTelegraf_711e77b3a232659_0] with metadata [AppenderatorDriverMetadata{segments={index_kafka_ProdTelegraf_711e77b3a232659_0=[SegmentWithState{segmentIdentifier=ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7, state=APPENDING}]}, lastSegmentIds={index_kafka_ProdTelegraf_711e77b3a232659_0=ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7}, callerMetadata={nextPartitions=SeekableStreamStartSequenceNumbers{stream='ProdTelegrafTopic', partitionSequenceNumberMap={0=11782162747}, exclusivePartitions=[]}, publishPartitions=SeekableStreamEndSequenceNumbers{stream='ProdTelegrafTopic', partitionSequenceNumberMap={0=11782162747}}}}].
2023-07-08T07:15:38,619 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Published segments: [ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7]
2023-07-08T07:15:38,620 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-publish] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Saved sequence metadata to disk: []
2023-07-08T07:16:18,413 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-persist] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7] at path[/druid/segments/10.9.5.136:8100/10.9.5.136:8100_indexer-executor__default_tier_2023-07-08T06:15:21.452Z_51f2f62c63814b38a2ff8cd9bd0847d80]
2023-07-08T07:16:18,414 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-persist] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/segments/10.9.5.136:8100/10.9.5.136:8100_indexer-executor__default_tier_2023-07-08T06:15:21.452Z_51f2f62c63814b38a2ff8cd9bd0847d80]
2023-07-08T07:16:18,486 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Dropped segment[ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7].
2023-07-08T07:16:18,487 INFO [[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver - Successfully handed off [1] segments.
2023-07-08T07:16:18,493 INFO [qtp1622464820-132] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping gracefully (status: [PUBLISHING])
2023-07-08T07:16:18,496 INFO [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Handoff complete for segments: DataSegment{binaryVersion=9, id=ProdTelegraf_2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z_2023-07-08T00:00:04.131Z_7, loadSpec={type=>s3_zip, bucket=>druid-deepstorage-prod, key=>druid/segments/ProdTelegraf/2023-07-08T00:00:00.000Z_2023-07-09T00:00:00.000Z/2023-07-08T00:00:04.131Z/7/228257f7-4460-4e66-af74-b76af28292ab/index.zip, S3Schema=>s3n}, dimensions=[value, metric, metric_group, resource_index, resource, source], metrics=[], shardSpec=NumberedShardSpec{partitionNum=7, partitions=0}, lastCompactionState=null, size=34451733}
2023-07-08T07:16:18,500 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Interrupted during close()
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343) ~[?:?]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285) ~[guava-16.0.1.jar:?]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.close(StreamAppenderator.java:943) ~[druid-server-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:856) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-07-08T07:16:18,514 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics scheduler closed
2023-07-08T07:16:18,515 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Closing reporter org.apache.kafka.common.metrics.JmxReporter
2023-07-08T07:16:18,516 INFO [task-runner-0-priority-0] org.apache.kafka.common.metrics.Metrics - Metrics reporters closed
2023-07-08T07:16:18,532 INFO [task-runner-0-priority-0] org.apache.kafka.common.utils.AppInfoParser - App info kafka.consumer for consumer-kafka-supervisor-opjjiojf-1 unregistered
2023-07-08T07:16:18,534 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/internal-discovery/PEON/10.9.5.136:8100]
2023-07-08T07:16:18,535 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
org.apache.druid.java.util.common.ISE: Failed to shutdown executors during close()
at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.close(StreamAppenderator.java:973) ~[druid-server-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:856) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Suppressed: java.lang.RuntimeException: java.lang.InterruptedException
at org.apache.druid.curator.announcement.Announcer.unannounce(Announcer.java:406) ~[druid-server-26.0.0.jar:26.0.0]
at org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer.unannounce(CuratorDruidNodeAnnouncer.java:86) ~[druid-server-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:908) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:266) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:151) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:173) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:477) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:449) ~[druid-indexing-service-26.0.0.jar:26.0.0]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method) ~[?:?]
at java.lang.Object.wait(Object.java:328) ~[?:?]
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1556) ~[zookeeper-3.5.10.jar:3.5.10]
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1539) ~[zookeeper-3.5.10.jar:3.5.10]
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1909) ~[zookeeper-3.5.10.jar:3.5.10]
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1830) ~[zookeeper-3.5.10.jar:3.5.10]
at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:197) ~[curator-framework-5.4.0.jar:5.4.0]
at org.apache.curator.framework.imps.CuratorTransactionImpl.access$000(CuratorTransactionImpl.java:37) ~[curator-framework-5.4.0.jar:5.4.0]
at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:130) ~[curator-framework-5.4.0.jar:5.4.0]
at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:126) ~[curator-framework-5.4.0.jar:5.4.0]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[curator-client-5.4.0.jar:5.4.0]
at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:123) ~[curator-framework-5.4.0.jar:5.4.0]
at org.apache.druid.curator.announcement.Announcer.unannounce(Announcer.java:400) ~[druid-server-26.0.0.jar:26.0.0]
... 11 more
2023-07-08T07:16:18,552 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj",
"status" : "FAILED",
"duration" : 3660999,
"errorMsg" : "org.apache.druid.java.util.common.ISE: Failed to shutdown executors during close()\n\tat org.apache.dr...",
"location" : {
"host" : null,
"port" : -1,
"tlsPort" : -1
}
}
2023-07-08T07:16:18,560 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2023-07-08T07:16:18,580 INFO [main] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/announcements/10.9.5.136:8100]
2023-07-08T07:16:18,583 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2023-07-08T07:16:18,593 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@54816d7a{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2023-07-08T07:16:18,593 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2023-07-08T07:16:18,594 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@5ed463b7{/,null,STOPPED}
2023-07-08T07:16:18,608 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2023-07-08T07:16:18,608 INFO [main] org.apache.druid.server.coordination.ZkCoordinator - Stopping ZkCoordinator for [DruidServerMetadata{name='10.9.5.136:8100', hostAndPort='10.9.5.136:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}]
2023-07-08T07:16:18,610 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopping...
2023-07-08T07:16:18,610 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Stopped.
2023-07-08T07:16:18,612 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_kafka_ProdTelegraf_711e77b3a232659_jeemgebj].
2023-07-08T07:16:18,612 INFO [main] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Stopping forcefully (status: [PUBLISHING])
2023-07-08T07:16:18,638 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
2023-07-08T07:16:18,641 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2023-07-08T07:16:18,748 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x100000190ec003c closed
2023-07-08T07:16:18,748 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000190ec003c
2023-07-08T07:16:18,757 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
Finished peon task
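One incidental observation from the status block above: the reported duration is almost exactly one hour, which lines up with the default Kafka supervisor taskDuration of PT1H. In other words, the task appears to fail at the normal end-of-duration publish/handoff, not mid-run. A quick check, assuming the duration field is in milliseconds as Druid reports it:

```python
# "duration" from the FAILED task status above, assumed to be milliseconds.
duration_ms = 3_660_999

minutes = duration_ms / 60_000
print(f"{minutes:.1f} minutes")  # ~61.0 minutes: taskDuration (1h) plus publish/handoff time
```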
This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.