
Kafka/zookeeper fatal error when disk runs out #3133

Open
sposs opened this issue Jun 17, 2024 · 3 comments

sposs commented Jun 17, 2024

Self-Hosted Version

24.500

CPU Architecture

x86_64

Docker Version

26.1.4

Docker Compose Version

2.27.1

Steps to Reproduce

Install self-hosted. Run out of disk. Kafka/Zookeeper will fail and it is impossible to recover (see logs); my installation is doomed.

Expected Result

The service should not break to the point that it cannot be recovered. Maybe it should check the disk and kill itself. I'd rather lose a bunch of transactions than lose everything.
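
For illustration, a rough sketch of the kind of preflight check I mean (hypothetical, not part of the current images): refuse to launch Kafka when the data directory is nearly full.

# hypothetical preflight check, not part of the official entrypoint:
# refuse to start when /var/lib/kafka/data has less than 1 GB free
FREE_KB=$(df --output=avail /var/lib/kafka/data | tail -n 1)
if [ "$FREE_KB" -lt 1048576 ]; then
  echo "Refusing to start Kafka: less than 1 GB free on /var/lib/kafka/data" >&2
  exit 1
fi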

Actual Result

===> Launching kafka ... 
[2024-06-17 04:55:46,504] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2024-06-17 04:55:47,568] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2024-06-17 04:55:47,915] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2024-06-17 04:55:47,936] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1001] Created data-plane acceptor and processors for endpoint : ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2024-06-17 04:55:48,020] INFO Creating /brokers/ids/1001 (is it secure? false) (kafka.zk.KafkaZkClient)
[2024-06-17 04:55:48,033] INFO Stat of the created znode at /brokers/ids/1001 is: 1478,1478,1718600148028,1718600148028,1,0,0,72130214439944228,194,0,1478
 (kafka.zk.KafkaZkClient)
[2024-06-17 04:55:48,034] INFO Registered broker 1001 at path /brokers/ids/1001 with addresses: PLAINTEXT://kafka:9092, czxid (broker epoch): 1478 (kafka.zk.KafkaZkClient)
[2024-06-17 04:55:48,242] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2024-06-17 04:55:48,259] WARN [Controller id=1001, targetBrokerId=1001] Connection to node 1001 (kafka/172.19.0.13:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2024-06-17 04:55:48,260] WARN [RequestSendThread controllerId=1001] Controller 1001's connection to broker kafka:9092 (id: 1001 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to kafka:9092 (id: 1001 rack: null) failed.
	at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:70)
	at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:298)
	at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:251)
	at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
[2024-06-17 04:55:48,341] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1001] Enabling request processing. (kafka.network.SocketServer)
[2024-06-17 04:55:48,344] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.DataPlaneAcceptor)
[2024-06-17 04:56:20,444] ERROR Error while appending records to ingest-transactions-0 in dir /var/lib/kafka/data (org.apache.kafka.storage.internals.log.LogDirFailureChannel)
java.io.IOException: No space left on device
	at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
	at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
	at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
	at org.apache.kafka.common.record.MemoryRecords.writeFullyTo(MemoryRecords.java:90)
	at org.apache.kafka.common.record.FileRecords.append(FileRecords.java:188)
	at kafka.log.LogSegment.append(LogSegment.scala:160)
	at kafka.log.LocalLog.append(LocalLog.scala:439)
	at kafka.log.UnifiedLog.append(UnifiedLog.scala:911)
	at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719)
	at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313)
	at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301)
	at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1277)
	at scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
	at scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
	at scala.collection.mutable.HashMap.map(HashMap.scala:35)
	at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1265)
	at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:868)
	at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:686)
	at kafka.server.KafkaApis.handle(KafkaApis.scala:180)
	at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:153)
	at java.base/java.lang.Thread.run(Thread.java:829)
[2024-06-17 04:56:20,445] WARN [ReplicaManager broker=1001] Stopping serving replicas in dir /var/lib/kafka/data (kafka.server.ReplicaManager)
[2024-06-17 04:56:20,464] WARN [ReplicaManager broker=1001] Broker 1001 stopped fetcher for partitions snuba-queries-0,outcomes-0,scheduled-subscriptions-transactions-0,events-0,cdc-0,profiles-call-tree-0,snuba-generic-metrics-sets-commit-log-0,__consumer_offsets-0,scheduled-subscriptions-events-0,outcomes-billing-0,ingest-performance-metrics-0,events-subscription-results-0,snuba-dead-letter-generic-events-0,transactions-0,snuba-dead-letter-replays-0,processed-profiles-0,snuba-dead-letter-metrics-0,snuba-attribution-0,scheduled-subscriptions-generic-metrics-distributions-0,snuba-generic-metrics-counters-commit-log-0,ingest-events-0,metrics-subscription-results-0,snuba-generic-metrics-gauges-commit-log-0,profiles-0,scheduled-subscriptions-generic-metrics-counters-0,scheduled-subscriptions-generic-metrics-sets-0,scheduled-subscriptions-generic-metrics-gauges-0,generic-metrics-subscription-results-0,snuba-transactions-commit-log-0,snuba-spans-0,ingest-replay-events-0,ingest-sessions-0,ingest-transactions-0,ingest-attachments-0,snuba-metrics-0,monitors-clock-tick-0,snuba-metrics-summaries-0,snuba-dead-letter-group-attributes-0,shared-resources-usage-0,ingest-monitors-0,ingest-occurrences-0,transactions-subscription-results-0,generic-events-0,snuba-dead-letter-generic-metrics-0,snuba-metrics-commit-log-0,ingest-metrics-0,group-attributes-0,snuba-generic-metrics-0,event-replacements-0,snuba-dead-letter-querylog-0,snuba-commit-log-0,snuba-generic-metrics-distributions-commit-log-0,ingest-replay-recordings-0,snuba-generic-events-commit-log-0,scheduled-subscriptions-metrics-0 and stopped moving logs for partitions  because they are in the failed log directory /var/lib/kafka/data. (kafka.server.ReplicaManager)
[2024-06-17 04:56:20,464] WARN Stopping serving logs in dir /var/lib/kafka/data (kafka.log.LogManager)
[2024-06-17 04:56:20,466] ERROR Shutdown broker because all log dirs in /var/lib/kafka/data have failed (kafka.log.LogManager)
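
For context, a quick way to confirm the disk is actually full (assuming the default Docker data root of /var/lib/docker; adjust the path if yours differs):

df -h /var/lib/docker     # free space on the filesystem holding Docker's data
docker system df          # breakdown of space used by images, containers and volumes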

And Zookeeper's logs:

Using log4j config /etc/kafka/log4j.properties
===> User
uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)
===> Configuring ...
Running in Zookeeper mode...
===> Running preflight checks ... 
===> Check if /var/lib/kafka/data is writable ...
===> Check if Zookeeper is healthy ...
[2024-06-17 06:00:49,813] ERROR Unable to resolve address: zookeeper:2181 (org.apache.zookeeper.client.StaticHostProvider)
java.net.UnknownHostException: zookeeper: Name or service not known
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:930)
	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1543)
	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1386)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1307)
	at org.apache.zookeeper.client.StaticHostProvider$1.getAllByName(StaticHostProvider.java:88)
	at org.apache.zookeeper.client.StaticHostProvider.resolve(StaticHostProvider.java:141)
	at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:368)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1204)
[2024-06-17 06:00:49,818] WARN Session 0x0 for server zookeeper:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. (org.apache.zookeeper.ClientCnxn)

Shutting down and restarting fails with dependency failed to start: container sentry-self-hosted-zookeeper-1 is unhealthy.
Reinstalling fails with:

dependency failed to start: container sentry-self-hosted-zookeeper-1 is unhealthy
Error in install/bootstrap-snuba.sh:3.
'$dcr snuba-api bootstrap --no-migrate --force' exited with status 1
-> ./install.sh:main:36
--> install/bootstrap-snuba.sh:source:3

Tried to follow the troubleshooting guide

sentry@workhorse:~/self-hosted$ docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
[+] Creating 1/0
 ✔ Container sentry-self-hosted-zookeeper-1  Created  0.0s
[+] Running 1/1
 ✔ Container sentry-self-hosted-zookeeper-1  Started  0.4s
dependency failed to start: container sentry-self-hosted-zookeeper-1 is unhealthy
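
Side note in case it helps someone whose broker is otherwise healthy: docker compose run has a --no-deps flag that skips starting linked services, so the unhealthy zookeeper container wouldn't block the one-off command. It didn't help here because Kafka itself was already broken, but:

docker compose run --rm --no-deps kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list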

Tried the nuclear option

sentry@workhorse:~/self-hosted$ docker compose down --volumes
[+] Running 13/13
 ✔ Container sentry-self-hosted-kafka-1             Removed  0.0s
 ✔ Container sentry-self-hosted-clickhouse-1        Removed  0.0s
 ✔ Container sentry-self-hosted-redis-1             Removed  0.0s
 ✔ Container sentry-self-hosted-zookeeper-1         Removed  0.1s
 ✔ Volume sentry-self-hosted_sentry-clickhouse-log  Removed  0.0s
 ✔ Volume sentry-self-hosted_sentry-vroom           Removed  0.4s
 ✔ Volume sentry-self-hosted_sentry-secrets         Removed  0.0s
 ✔ Volume sentry-self-hosted_sentry-kafka-log       Removed  0.4s
 ✔ Volume sentry-self-hosted_sentry-smtp            Removed  0.4s
 ✔ Volume sentry-self-hosted_sentry-smtp-log        Removed  0.4s
 ✔ Volume sentry-self-hosted_sentry-nginx-cache     Removed  0.4s
 ✔ Volume sentry-self-hosted_sentry-zookeeper-log   Removed  0.4s
 ✔ Network sentry-self-hosted_default               Removed  0.1s
sentry@workhorse:~/self-hosted$ docker volume rm sentry-kafka
sentry-kafka
sentry@workhorse:~/self-hosted$ docker volume rm sentry-zookeeper
sentry-zookeeper

But then reinstalling fails with:

 Volume "sentry-self-hosted_sentry-nginx-cache"  Created
external volume "sentry-zookeeper" not found
Error in install/upgrade-clickhouse.sh:15.
'$dc up -d clickhouse' exited with status 1
-> ./install.sh:main:25
--> install/upgrade-clickhouse.sh:source:15

Event ID

No response


sposs commented Jun 17, 2024

Worst thing: I've removed everything (docker system prune -a), but now install always fails due to the missing volume.


sposs commented Jun 17, 2024

Apparently, docker system prune -a does not clean up the volumes if their location is not standard, and that's why reinstalling fails.
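
For anyone else cleaning up: listing the named volumes and removing the Sentry ones explicitly is more reliable than relying on prune:

docker volume ls | grep sentry                    # see what's actually left
docker volume rm sentry-kafka sentry-zookeeper    # remove the broken ones explicitly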


djoeycl commented Jun 17, 2024

To get it working again, you need to run:

docker volume create sentry-zookeeper
docker volume create sentry-kafka
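
After that, verifying the volumes exist and rerunning the installer should get past the external volume "sentry-zookeeper" not found error:

docker volume ls | grep -E 'sentry-(kafka|zookeeper)'
./install.sh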
