
StackOverflowError in TransportClientNodesService #1930

Closed · jprante opened this issue May 9, 2012 · 7 comments

@jprante
Contributor

jprante commented May 9, 2012

I recently observed a StackOverflowError:

Exception in thread "elasticsearch[generic]-pool-1-thread-24" java.lang.StackOverflowError
        at java.net.Inet4Address.getHostAddress(Inet4Address.java:322)
        at java.net.InetAddress.toString(InetAddress.java:663)
        at java.net.InetSocketAddress.toString(InetSocketAddress.java:276)
        at java.lang.String.valueOf(String.java:2902)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.elasticsearch.common.transport.InetSocketTransportAddress.toString(InetSocketTransportAddress.java:150)
        at java.lang.String.valueOf(String.java:2902)
        at java.lang.StringBuilder.append(StringBuilder.java:128)
        at org.elasticsearch.transport.ActionTransportException.buildMessage(ActionTransportException.java:71)
        at org.elasticsearch.transport.ActionTransportException.<init>(ActionTransportException.java:46)
        at org.elasticsearch.transport.ConnectTransportException.<init>(ConnectTransportException.java:44)
        at org.elasticsearch.transport.ConnectTransportException.<init>(ConnectTransportException.java:32)
        at org.elasticsearch.transport.NodeNotConnectedException.<init>(NodeNotConnectedException.java:32)
        at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:637)
        at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:445)
        at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:185)
        at org.elasticsearch.action.TransportActionNodeProxy.execute(TransportActionNodeProxy.java:63)
        at org.elasticsearch.client.transport.support.InternalTransportClient$2.doWithNode(InternalTransportClient.java:100)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:217)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onFailure(TransportClientNodesService.java:220)
        [...endlessly repeated... ]
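
The repeated RetryListener.onFailure frames suggest a failure callback that synchronously retries the request on the next node, so every failed attempt adds more frames to the same call stack. A purely illustrative Java sketch of that pattern (not the actual Elasticsearch code; all names are made up):

    import java.util.Collections;
    import java.util.List;

    // Illustrative only: a retry listener whose onFailure() synchronously
    // re-dispatches the request to the next node. If every attempt fails
    // immediately (e.g. all nodes are disconnected), each retry adds another
    // doWithNode()/onFailure() pair to the call stack, and with enough nodes
    // or retries this ends in a StackOverflowError.
    class RetryListenerSketch {

        private final List<String> nodes;
        private int attempts = 0;

        RetryListenerSketch(List<String> nodes) {
            this.nodes = nodes;
        }

        void doWithNode(String node) {
            // Simulate a NodeNotConnectedException thrown synchronously,
            // before the request ever leaves the client.
            onFailure(new RuntimeException("node not connected: " + node));
        }

        void onFailure(Throwable t) {
            attempts++;
            if (attempts < nodes.size()) {
                // Recursion: this call runs while all previous
                // doWithNode()/onFailure() frames are still on the stack.
                doWithNode(nodes.get(attempts));
            }
        }

        public static void main(String[] args) {
            // The stack grows linearly with the number of failed attempts.
            List<String> nodes = Collections.nCopies(100000, "node");
            new RetryListenerSketch(nodes).doWithNode(nodes.get(0));
        }
    }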
@jprante
Contributor Author

jprante commented May 10, 2012

Situation where this can happen:

  • many threads in parallel
  • TransportClient shared between threads
  • TransportClient in sniff mode
  • ES cluster on more than one node
  • heavy bulk transfer started, with lots of outstanding bulk responses
  • StackOverflowError might show up after bulk transfer is stopped with ctrl-c

A mockup client is here: https://gist.github.com/2652024

Bulk traffic needs to be simulated.
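
The linked gist is the actual mockup; the following is only a rough sketch of the setup described above (cluster name, host names, index name and thread count are placeholders), using the 0.19-era Java API:

    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class SharedClientBulkSketch {

        public static void main(String[] args) {
            // One TransportClient in sniff mode, shared by all threads.
            final TransportClient client = new TransportClient(ImmutableSettings.settingsBuilder()
                    .put("cluster.name", "mycluster")           // placeholder
                    .put("client.transport.sniff", true)
                    .build());
            client.addTransportAddress(new InetSocketTransportAddress("host1", 9300));
            client.addTransportAddress(new InetSocketTransportAddress("host2", 9300));

            // Many threads firing bulk requests against the shared client;
            // interrupt the process (ctrl-c) while responses are outstanding.
            for (int t = 0; t < 24; t++) {
                new Thread(new Runnable() {
                    public void run() {
                        while (true) {
                            BulkRequestBuilder bulk = client.prepareBulk();
                            for (int i = 0; i < 1000; i++) {
                                bulk.add(client.prepareIndex("test", "doc")
                                        .setSource("{\"field\":\"value\"}"));
                            }
                            bulk.execute().actionGet();
                        }
                    }
                }).start();
            }
        }
    }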

@jprante
Contributor Author

jprante commented May 12, 2012

A few minutes ago I needed to interrupt my high-volume indexer with kill (which causes InterruptedExceptions in all threads), and the StackOverflowError happened again, this time mixed with additional messages.

See this gist for the messages:

https://gist.github.com/2666459

I'm not sure what is going on; I think it's my fault, because I treat the TransportClient badly - I just let it go.

How can I interrupt a TransportClient with many pending bulk requests so that it can shut down cleanly?

Is it possible to wait until outstanding bulk requests have been processed (i.e. to wait for the BulkResponses)?
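
What I have in mind is something like the following counting wrapper (a hypothetical helper, not an existing Elasticsearch API): every bulk ActionListener is wrapped so the caller can block until all outstanding BulkResponses have arrived before shutting the client down.

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    import org.elasticsearch.action.ActionListener;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;

    // Hypothetical helper: counts outstanding bulk requests so the caller
    // can wait for all BulkResponses before closing the client.
    public class OutstandingBulkTracker {

        private final AtomicLong outstanding = new AtomicLong();

        public void bulk(Client client, BulkRequest request,
                         final ActionListener<BulkResponse> delegate) {
            outstanding.incrementAndGet();
            client.bulk(request, new ActionListener<BulkResponse>() {
                public void onResponse(BulkResponse response) {
                    try {
                        delegate.onResponse(response);
                    } finally {
                        decrement();
                    }
                }
                public void onFailure(Throwable t) {
                    try {
                        delegate.onFailure(t);
                    } finally {
                        decrement();
                    }
                }
            });
        }

        private synchronized void decrement() {
            if (outstanding.decrementAndGet() == 0) {
                notifyAll();
            }
        }

        // Block until every submitted bulk request has been answered,
        // or the timeout expires.
        public synchronized void awaitQuiescence(long timeout, TimeUnit unit)
                throws InterruptedException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            while (outstanding.get() > 0) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return;
                }
                wait(remainingMillis);
            }
        }
    }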

@jprante
Contributor Author

jprante commented May 12, 2012

Just noticed the StackOverflowError happens while handling something IPv6-related, because I added IPv6 addresses to the TransportClient. But cluster-wide IPv6 addressing is not possible because IPv6 is not supported by the network administration team. Hmmm. I should quit using IPv6 in the TransportClient.

@kimchy
Member

kimchy commented May 25, 2012

Heya @jprante, did you manage to recreate it in the end? (Btw, there is no way to wait for pending bulk requests; sorry for the late answer.)

@jprante
Contributor Author

jprante commented May 29, 2012

Sorry for the lag. I still need some time to write an error-provoking bulk indexing test case and see how far it goes. As a workaround, I'm thinking about something like a "SafeTransportClient" that can wait for outstanding ActionListeners when InterruptedExceptions come in.

@ghost ghost assigned spinscale Oct 30, 2013
@spinscale
Contributor

hey,

anything we can do here to help? Can you still reproduce this? Should we close this?

@jprante
Contributor Author

jprante commented Oct 30, 2013

I'm quite sure the Mac OS X JVM version at that time was playing tricks on me, with Java exceptions around opening/closing sockets/file descriptors and tight system resource limits. I have not seen the error since the workaround. I also never encountered this situation on Linux or Solaris.

Actually I do not intend to dig deeper; besides being hard to reproduce, it would involve tracing at the OS level, and maybe OS or JVM developers would have to be bothered in that case. I will also soon move to OS X Mavericks in the hope of new exciting bugs :)

So it's OK that the issue was closed; I'm closing it again.
