active masternode crash caused by datanode Input/output error #76480

hanbj · 2021-08-13T06:52:18Z

Problem description：
A few days ago, an active masternode process of the cluster hung up. Looking at the log, it was found that a datanode disk was damaged. The log is as follows:

[2021-08-09T09:17:36,992][WARN ][o.e.g.G.InternalPrimaryShardAllocator][node1] [index_2021-08-07][3]: failed to list shard for shard_started on node [Jyd9FdobRayRSGEdQla3Ww]
org.elasticsearch.action.FailedNodeException: Failed node [Jyd9FdobRayRSGEdQla3Ww]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:221) [elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node1][0.0.0.0:9300][internal:gateway/local/started_shards[n]]
Caused by: org.elasticsearch.ElasticsearchException: failed to load started shards
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:168) ~[elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.lang.IllegalStateException: environment is not locked
at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1044) ~[elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.io.IOException: Input/output error
at sun.nio.ch.FileDispatcherImpl.size0(Native Method) ~[?:?]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]

[2021-08-09T09:17:37,291][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler][node1] fatal error in thread [elasticsearch[node1][masterService#updateTask][T#436]], exiting
java.lang.StackOverflowError: null
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
...
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]

Here is the code for the UnmodifiableCollection class from Java.
As you can see, when you call iterator to an immutable collection, it creates an instance on the anonymous class. The constructor of this class calls C. iterator ()... Where is the class wrapped in C. However, a stack trace means that it C itself is an immutable collection.

So I can think of a reasonable reason:

If your application is wrapping unmodifiable collections in unmodifiable collections to N levels, then creating an iterator will result in N * 2 levels of stack frames. For large enough N, that would lead to a stack overflow.

static class UnmodifiableCollection<E> implements Collection<E>, Serializable {
        final Collection<? extends E> c;
        UnmodifiableCollection(Collection<? extends E> c) {
            if (c==null)
                throw new NullPointerException();
            this.c = c;
        }
        ...
        public Iterator<E> iterator() {
            return new Iterator<E>() {
                private final Iterator<? extends E> i = c.iterator();

                public boolean hasNext() {return i.hasNext();}
                public E next()          {return i.next();}
                public void remove() {
                    throw new UnsupportedOperationException();
                }
                @Override
                public void forEachRemaining(Consumer<? super E> action) {
                    // Use backing collection version
                    i.forEachRemaining(action);
                }
            };
        }
}

Cause of problem：

Use Arthas to observe the calling path of the iterator method. The command is as follows:

options unsafe true
stack java.util.Collections$UnmodifiableCollection iterator

The following results will appear only when the process has just started and the above command is executed immediately using Arthas.

ts=2021-08-13 10:10:53;thread_name=elasticsearch[node1][masterService#updateTask][T#1];id=22;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@277050dc
@java.util.Collections$UnmodifiableCollection.iterator()
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1044)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
...
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1044)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
at org.elasticsearch.common.io.stream.StreamOutput.writeCollection(StreamOutput.java:1112)
at org.elasticsearch.cluster.routing.**UnassignedInfo.writeTo(**UnassignedInfo.java:297)
at org.elasticsearch.common.io.stream.StreamOutput.writeOptionalWriteable(StreamOutput.java:897)
at org.elasticsearch.cluster.routing.ShardRouting.writeToThin(ShardRouting.java:299)
at org.elasticsearch.cluster.routing.IndexShardRoutingTable$Builder.writeToThin(IndexShardRoutingTable.java:742)
at org.elasticsearch.cluster.routing.IndexRoutingTable.writeTo(IndexRoutingTable.java:321)
at org.elasticsearch.cluster.AbstractDiffable$CompleteDiff.writeTo(AbstractDiffable.java:81)
at org.elasticsearch.cluster.DiffableUtils$DiffableValueSerializer.writeDiff(DiffableUtils.java:647)
...
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:175)
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:253)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633)

Read the source code. Because the datanode disk is damaged, in active masternode BaseGatewayShardAllocator#makeAllocationDecision() method will return AllocateunassignedDecision. no (...), and this shard will be removeAndIgnore. When changing the reason why the shard is not allocated, call UnassignedInfo#getFailedNodeIds() to obtain an immutable collection, In the construction method of UnassignedInfo, a layer of immutable collection failedNodeIds will be wrapped, which cannot be changed.

--GatewayAllocator
------GatewayAllocator#allocateUnassigned()
----------GatewayAllocator#innerAllocatedUnassigned()
--------------BaseGatewayShardAllocator#allocateUnassigned()
------------------UnassignedIterator#removeAndIgnore()
----------------------UnassignedShards#ignoreShard() #currInfo.getFailedNodeIds() will get an immutable collection
--------------------------new UnassignedInfo() #this.failedNodeIds = Collections.unmodifiableSet(failedNodeIds)

Therefore, when the disk is damaged and does not come out for a period of time, it will cause stack overflow.

elasticmachine · 2021-08-14T18:40:22Z

Pinging @elastic/es-distributed (Team:Distributed)

original-brownbear

Thanks for the contribution @hanbj. I looked into this a little an we do indeed even during normal operation wrap this set a couple of times. Just a quick suggestion on making this shorter.

original-brownbear · 2021-08-14T18:41:00Z

server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java

@@ -241,7 +242,7 @@ public UnassignedInfo(Reason reason, @Nullable String message, @Nullable Excepti
        this.failure = failure;
        this.failedAllocations = failedAllocations;
        this.lastAllocationStatus = Objects.requireNonNull(lastAllocationStatus);
-        this.failedNodeIds = Collections.unmodifiableSet(failedNodeIds);
+        this.failedNodeIds = Collections.unmodifiableSet(new HashSet<>(failedNodeIds));


I think we can just use Set.copyOf( here to be shorter and have a singleton for the empty case.

@original-brownbear Thank you very much for your suggestion. I see that the Set.copyOf() method is not public, and I use jdk14 to test the method, which also reports an error.

Not sure what you mean. java.util.Set#copyOf is available from JDK 10 and public. In master we do not support any JDK version older than 10 so it's not a problem there and for the 7.x backport we do have a wrapper in place that we can use. We use that utility all over the codebase and I don't see a reason why it wouldn't work here?

@original-brownbear Yes, you're right. Thank you very much. You're great. I've solved it.

original-brownbear

LGTM thanks for finding and tracking this down @hanbj ! I'll merge and backport this once green

original-brownbear · 2021-08-15T17:06:41Z

Jenkins test this

original-brownbear · 2021-08-15T17:19:56Z

Jenkins run elasticsearch-ci/rest-compatibility

original-brownbear · 2021-08-15T17:27:38Z

@hanbj seems like we have some backwards compatibility issues here from recent changes, could you merge latest master into your branch again to fix them so we get a clean CI build? Thanks!

have a singleton instance when collection is not change

delete import HashSet

hanbj · 2021-08-16T00:57:19Z

@original-brownbear OK, Thanks, I've handled it. Please try again.

original-brownbear · 2021-08-16T03:37:34Z

Jenkins test this

original-brownbear · 2021-08-16T04:34:04Z

np + Thanks again @hanbj !

We kept wrapping the collection over and over again which in extreme corner cases could lead to a SOE. Closes elastic#76490

We kept wrapping the collection over and over again which in extreme corner cases could lead to a SOE. Closes #76490 Co-authored-by: hanbj <hanbj0707@163.com>

* master: (868 commits) Query API key - Rest spec and yaml tests (elastic#76238) Delay shard reassignment from nodes which are known to be restarting (elastic#75606) Reenable bwc tests for elastic#76475 (elastic#76576) Set version to 7.15 in BWC code (elastic#76577) Don't remove warning headers on all failure (elastic#76434) Disable bwc tests for elastic#76475 (elastic#76541) Re-enable bwc tests (elastic#76567) Keep track of data recovered from snapshots in RecoveryState (elastic#76499) [Transform] Align transform checkpoint range with date_histogram interval for better performance (elastic#74004) EQL: Remove "wildcard" function (elastic#76099) Fix 'accept' and 'content_type' fields for search_mvt API Add persistent licensed feature tracking (elastic#76476) Add system data streams to feature state snapshots (elastic#75902) fix the error message for instance methods that don't exist (elastic#76512) ILM: Add validation of the number_of_shards parameter in Shrink Action of ILM (elastic#74219) remove dashboard only reserved role (elastic#76507) Fix Stack Overflow in UnassignedInfo in Corner Case (elastic#76480) Add (Extended)KeyUsage KeyUsage, CipherSuite & Protocol to SSL diagnostics (elastic#65634) Add recovery from snapshot to tests (elastic#76535) Reenable BwC Tests after elastic#76532 (elastic#76534) ...

elasticsearchmachine added v8.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 13, 2021

hanbj mentioned this pull request Aug 13, 2021

active masternode crash caused by datanode Input/output error #76490

Closed

original-brownbear added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v7.15.0 and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 14, 2021

elasticmachine added the Team:Distributed Meta label for distributed team label Aug 14, 2021

original-brownbear added the >bug label Aug 14, 2021

original-brownbear self-requested a review August 14, 2021 18:40

original-brownbear reviewed Aug 14, 2021

View reviewed changes

original-brownbear approved these changes Aug 15, 2021

View reviewed changes

hanbj and others added 3 commits August 16, 2021 08:47

active masternode crash caused by datanode Input/output error

b9226ca

Update UnassignedInfo.java

3ba4948

have a singleton instance when collection is not change

Update UnassignedInfo.java

8f84121

delete import HashSet

hanbj force-pushed the StackOverflowError branch from 7be86d3 to 8f84121 Compare August 16, 2021 00:53

original-brownbear merged commit 1d8bad6 into elastic:master Aug 16, 2021

original-brownbear mentioned this pull request Aug 16, 2021

Fix Stack Overflow in UnassignedInfo in Corner Case (#76480) #76546

Merged

jakelandis added v8.0.0-alpha2 and removed v8.0.0 labels Sep 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

active masternode crash caused by datanode Input/output error #76480

active masternode crash caused by datanode Input/output error #76480

hanbj commented Aug 13, 2021 •

edited

Loading

elasticmachine commented Aug 14, 2021

original-brownbear left a comment

original-brownbear Aug 14, 2021

hanbj Aug 15, 2021 •

edited

Loading

original-brownbear Aug 15, 2021

hanbj Aug 15, 2021

original-brownbear left a comment

original-brownbear commented Aug 15, 2021

original-brownbear commented Aug 15, 2021

original-brownbear commented Aug 15, 2021

hanbj commented Aug 16, 2021

original-brownbear commented Aug 16, 2021

original-brownbear commented Aug 16, 2021

active masternode crash caused by datanode Input/output error #76480

active masternode crash caused by datanode Input/output error #76480

Conversation

hanbj commented Aug 13, 2021 • edited Loading

elasticmachine commented Aug 14, 2021

original-brownbear left a comment

Choose a reason for hiding this comment

original-brownbear Aug 14, 2021

Choose a reason for hiding this comment

hanbj Aug 15, 2021 • edited Loading

Choose a reason for hiding this comment

original-brownbear Aug 15, 2021

Choose a reason for hiding this comment

hanbj Aug 15, 2021

Choose a reason for hiding this comment

original-brownbear left a comment

Choose a reason for hiding this comment

original-brownbear commented Aug 15, 2021

original-brownbear commented Aug 15, 2021

original-brownbear commented Aug 15, 2021

hanbj commented Aug 16, 2021

original-brownbear commented Aug 16, 2021

original-brownbear commented Aug 16, 2021

hanbj commented Aug 13, 2021 •

edited

Loading

hanbj Aug 15, 2021 •

edited

Loading