Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

active masternode crash caused by datanode Input/output error #76480

Merged
merged 3 commits into from
Aug 16, 2021

Conversation

hanbj
Copy link
Contributor

@hanbj hanbj commented Aug 13, 2021

Problem description:
A few days ago, an active masternode process of the cluster hung up. Looking at the log, it was found that a datanode disk was damaged. The log is as follows:

[2021-08-09T09:17:36,992][WARN ][o.e.g.G.InternalPrimaryShardAllocator][node1] [index_2021-08-07][3]: failed to list shard for shard_started on node [Jyd9FdobRayRSGEdQla3Ww]
org.elasticsearch.action.FailedNodeException: Failed node [Jyd9FdobRayRSGEdQla3Ww]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:221) [elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node1][0.0.0.0:9300][internal:gateway/local/started_shards[n]]
Caused by: org.elasticsearch.ElasticsearchException: failed to load started shards
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:168) ~[elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.lang.IllegalStateException: environment is not locked
at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1044) ~[elasticsearch-7.6.0.jar:7.6.0]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.io.IOException: Input/output error
at sun.nio.ch.FileDispatcherImpl.size0(Native Method) ~[?:?]
...
at java.lang.Thread.run(Thread.java:834) ~[?:?]

[2021-08-09T09:17:37,291][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler][node1] fatal error in thread [elasticsearch[node1][masterService#updateTask][T#436]], exiting
java.lang.StackOverflowError: null
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
...
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1042) ~[?:?]
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1041) ~[?:?]

Here is the code for the UnmodifiableCollection class from Java.
As you can see, when you call iterator to an immutable collection, it creates an instance on the anonymous class. The constructor of this class calls C. iterator ()... Where is the class wrapped in C. However, a stack trace means that it C itself is an immutable collection.

So I can think of a reasonable reason:

If your application is wrapping unmodifiable collections in unmodifiable collections to N levels, then creating an iterator will result in N * 2 levels of stack frames. For large enough N, that would lead to a stack overflow.

static class UnmodifiableCollection<E> implements Collection<E>, Serializable {
        final Collection<? extends E> c;
        UnmodifiableCollection(Collection<? extends E> c) {
            if (c==null)
                throw new NullPointerException();
            this.c = c;
        }
        ...
        public Iterator<E> iterator() {
            return new Iterator<E>() {
                private final Iterator<? extends E> i = c.iterator();

                public boolean hasNext() {return i.hasNext();}
                public E next()          {return i.next();}
                public void remove() {
                    throw new UnsupportedOperationException();
                }
                @Override
                public void forEachRemaining(Consumer<? super E> action) {
                    // Use backing collection version
                    i.forEachRemaining(action);
                }
            };
        }
}

Cause of problem:

Use Arthas to observe the calling path of the iterator method. The command is as follows:

  1. options unsafe true
  2. stack java.util.Collections$UnmodifiableCollection iterator

The following results will appear only when the process has just started and the above command is executed immediately using Arthas.

ts=2021-08-13 10:10:53;thread_name=elasticsearch[node1][masterService#updateTask][T#1];id=22;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@277050dc
@java.util.Collections$UnmodifiableCollection.iterator()
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1044)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
...
at java.util.Collections$UnmodifiableCollection$1.(Collections.java:1044)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
at org.elasticsearch.common.io.stream.StreamOutput.writeCollection(StreamOutput.java:1112)
at org.elasticsearch.cluster.routing.**UnassignedInfo.writeTo(**UnassignedInfo.java:297)
at org.elasticsearch.common.io.stream.StreamOutput.writeOptionalWriteable(StreamOutput.java:897)
at org.elasticsearch.cluster.routing.ShardRouting.writeToThin(ShardRouting.java:299)
at org.elasticsearch.cluster.routing.IndexShardRoutingTable$Builder.writeToThin(IndexShardRoutingTable.java:742)
at org.elasticsearch.cluster.routing.IndexRoutingTable.writeTo(IndexRoutingTable.java:321)
at org.elasticsearch.cluster.AbstractDiffable$CompleteDiff.writeTo(AbstractDiffable.java:81)
at org.elasticsearch.cluster.DiffableUtils$DiffableValueSerializer.writeDiff(DiffableUtils.java:647)
...
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:175)
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:253)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633)

Read the source code. Because the datanode disk is damaged, in active masternode BaseGatewayShardAllocator#makeAllocationDecision() method will return AllocateunassignedDecision. no (...), and this shard will be removeAndIgnore. When changing the reason why the shard is not allocated, call UnassignedInfo#getFailedNodeIds() to obtain an immutable collection, In the construction method of UnassignedInfo, a layer of immutable collection failedNodeIds will be wrapped, which cannot be changed.

--GatewayAllocator
------GatewayAllocator#allocateUnassigned()
----------GatewayAllocator#innerAllocatedUnassigned()
--------------BaseGatewayShardAllocator#allocateUnassigned()
------------------UnassignedIterator#removeAndIgnore()
----------------------UnassignedShards#ignoreShard() #currInfo.getFailedNodeIds() will get an immutable collection
--------------------------new UnassignedInfo() #this.failedNodeIds = Collections.unmodifiableSet(failedNodeIds)

Therefore, when the disk is damaged and does not come out for a period of time, it will cause stack overflow.

@elasticsearchmachine elasticsearchmachine added v8.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 13, 2021
@original-brownbear original-brownbear added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v7.15.0 and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Aug 14, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Aug 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @hanbj. I looked into this a little an we do indeed even during normal operation wrap this set a couple of times. Just a quick suggestion on making this shorter.

@@ -241,7 +242,7 @@ public UnassignedInfo(Reason reason, @Nullable String message, @Nullable Excepti
this.failure = failure;
this.failedAllocations = failedAllocations;
this.lastAllocationStatus = Objects.requireNonNull(lastAllocationStatus);
this.failedNodeIds = Collections.unmodifiableSet(failedNodeIds);
this.failedNodeIds = Collections.unmodifiableSet(new HashSet<>(failedNodeIds));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just use Set.copyOf( here to be shorter and have a singleton for the empty case.

Copy link
Contributor Author

@hanbj hanbj Aug 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear Thank you very much for your suggestion. I see that the Set.copyOf() method is not public, and I use jdk14 to test the method, which also reports an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. java.util.Set#copyOf is available from JDK 10 and public. In master we do not support any JDK version older than 10 so it's not a problem there and for the 7.x backport we do have a wrapper in place that we can use. We use that utility all over the codebase and I don't see a reason why it wouldn't work here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear Yes, you're right. Thank you very much. You're great. I've solved it.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for finding and tracking this down @hanbj ! I'll merge and backport this once green

@original-brownbear
Copy link
Member

Jenkins test this

@original-brownbear
Copy link
Member

Jenkins run elasticsearch-ci/rest-compatibility

@original-brownbear
Copy link
Member

@hanbj seems like we have some backwards compatibility issues here from recent changes, could you merge latest master into your branch again to fix them so we get a clean CI build? Thanks!

hanbj and others added 3 commits August 16, 2021 08:47
have a singleton instance when collection is not change
delete import HashSet
@hanbj
Copy link
Contributor Author

hanbj commented Aug 16, 2021

@original-brownbear OK, Thanks, I've handled it. Please try again.

@original-brownbear
Copy link
Member

Jenkins test this

@original-brownbear
Copy link
Member

np + Thanks again @hanbj !

@original-brownbear original-brownbear merged commit 1d8bad6 into elastic:master Aug 16, 2021
original-brownbear pushed a commit to original-brownbear/elasticsearch that referenced this pull request Aug 16, 2021
We kept wrapping the collection over and over again which in extreme corner cases could lead to a SOE.

Closes elastic#76490
original-brownbear added a commit that referenced this pull request Aug 16, 2021
We kept wrapping the collection over and over again which in extreme corner cases could lead to a SOE.

Closes #76490

Co-authored-by: hanbj <hanbj0707@163.com>
wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Aug 17, 2021
* master: (868 commits)
  Query API key - Rest spec and yaml tests (elastic#76238)
  Delay shard reassignment from nodes which are known to be restarting (elastic#75606)
  Reenable bwc tests for elastic#76475 (elastic#76576)
  Set version to 7.15 in BWC code (elastic#76577)
  Don't remove warning headers on all failure (elastic#76434)
  Disable bwc tests for elastic#76475 (elastic#76541)
  Re-enable bwc tests (elastic#76567)
  Keep track of data recovered from snapshots in RecoveryState (elastic#76499)
  [Transform] Align transform checkpoint range with date_histogram interval for better performance (elastic#74004)
  EQL: Remove "wildcard" function (elastic#76099)
  Fix 'accept' and 'content_type' fields for search_mvt API
  Add persistent licensed feature tracking (elastic#76476)
  Add system data streams to feature state snapshots (elastic#75902)
  fix the error message for instance methods that don't exist (elastic#76512)
  ILM: Add validation of the number_of_shards parameter in Shrink Action of ILM (elastic#74219)
  remove dashboard only reserved role (elastic#76507)
  Fix Stack Overflow in UnassignedInfo in Corner Case (elastic#76480)
  Add (Extended)KeyUsage KeyUsage, CipherSuite & Protocol to SSL diagnostics (elastic#65634)
  Add recovery from snapshot to tests (elastic#76535)
  Reenable BwC Tests after elastic#76532 (elastic#76534)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Meta label for distributed team v7.15.0 v8.0.0-alpha2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants