p2p/simulations: fix a deadlock and clean up adapters #17891

fjl · 2018-10-10T22:29:13Z

This fixes a rare deadlock with the inproc adapter:

- A node is stopped, which acquires Network.lock.
- The protocol code being simulated (swarm/network in my case) waits for its goroutines to shut down.
- One of those goroutines calls into the simulation to add a peer, which waits for Network.lock.

The fix for the deadlock is really simple: release the lock before stopping the
simulation node.
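
To make the lock ordering concrete, here is a minimal sketch of the pattern in Go. It is not the code from this PR: the `network` and `node` types, `didConnect`, and `stopNode` are hypothetical stand-ins for the simulation types. The sketch only shows the shape of the fix, where state is updated under the lock and the lock is released before the blocking stop call.

```go
package main

import (
	"fmt"
	"sync"
)

// network is a hypothetical stand-in for the simulation Network type.
type network struct {
	lock  sync.Mutex
	up    map[string]bool
	conns []string // connections reported by protocol goroutines
}

// node is a hypothetical simulated node whose stop waits for its goroutines.
type node struct {
	id   string
	quit chan struct{}
	wg   sync.WaitGroup
}

// didConnect is called from protocol goroutines and needs the network lock.
func (n *network) didConnect(a, b string) {
	n.lock.Lock()
	defer n.lock.Unlock()
	n.conns = append(n.conns, a+"-"+b)
}

// start launches a goroutine that calls back into the network during shutdown.
func (nd *node) start(n *network) {
	nd.wg.Add(1)
	go func() {
		defer nd.wg.Done()
		<-nd.quit
		n.didConnect(nd.id, "peer")
	}()
}

// stop signals shutdown and blocks until all goroutines have exited.
func (nd *node) stop() {
	close(nd.quit)
	nd.wg.Wait()
}

// stopNode updates bookkeeping under the lock, then releases the lock
// before the blocking stop call. Holding the lock across nd.stop() would
// deadlock, because the goroutine above waits for the same lock.
func (n *network) stopNode(nd *node) {
	n.lock.Lock()
	n.up[nd.id] = false
	n.lock.Unlock()

	nd.stop()
}

func main() {
	n := &network{up: map[string]bool{"a": true}}
	nd := &node{id: "a", quit: make(chan struct{})}
	nd.start(n)
	n.stopNode(nd)
	fmt.Println("up:", n.up["a"], "connections:", len(n.conns))
}
```

If `n.lock.Unlock()` were deferred instead, `stopNode` would hang in `nd.wg.Wait()` while the goroutine blocks in `didConnect`, which is the cycle described in the list above.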

Other changes in this PR clean up the exec adapter so it reports
node startup errors better and remove the docker adapter because it just adds
overhead.

This avoids log parsing and allows reporting startup errors to the simulation host. A small change in package node was needed because simulation nodes use port zero. Node.{HTTP,WS}Endpoint now return the live endpoints after startup by checking the TCP listener.

fjl · 2018-10-10T22:30:36Z

@justelad the adapters code would be a lot simpler if we removed the docker adapter. Why does it exist? I don't see the benefit of running the simulation binary through docker vs. just executing it directly.

nonsense · 2018-10-11T07:23:24Z

cc @lmars as I believe he has written some, if not most, of this code.

acud · 2018-10-11T08:01:09Z

@lmars do we really need to be running docker builds in a test run?

fjl · 2018-10-11T08:52:42Z

Docker adapter is now removed after short discussion with @nonsense.

lmars · 2018-10-11T09:48:03Z

FYI the DockerAdapter was added to make it easy to run simulations remotely (either on a single Docker host or on a Docker Swarm cluster), and was useful for running the overlay simulation in Docker and visualising the simulation with the visualisation demo.

The vision for the simulations code was that it was more about spinning up simulation clusters with a particular configuration, triggering actions and then observing the result. We were planning to add a generic "remote" adapter that could be pointed at say AWS to simulate more realistic scenarios (mass node churn, net splits, potential attacks etc.) and see what happens. This is why we added the p2psim CLI as a more interactive way to manage a simulation cluster.

I feel like removing the Docker adapter is a bit of a step back from that vision, narrowing the scope to just be about running local clusters (which aren't really realistic). If we're happy with this narrowing of scope (or feel like running more realistic simulations should be done elsewhere), I agree the Docker adapter adds unnecessary complexity.

fjl · 2018-10-11T09:52:46Z

What we discussed is that the docker adapter is useless in its current form. My hunch is that a 'remote' adapter would require significant changes anyway. We'll see what will be needed if/when anyone finds the time to actually implement a remote adapter.

lmars · 2018-10-11T09:57:38Z

I agree. At least you now know I had good intentions 😄

fjl added 3 commits October 11, 2018 00:13

p2p/simulations: improve log messages

dd67514

fjl requested a review from zsfelfoldi as a code owner October 10, 2018 22:29

fjl changed the title ~~2p/simulations: avoid holding Network lock while stopping node~~ p2p/simulations: avoid holding Network lock while stopping node Oct 10, 2018

fjl changed the title ~~p2p/simulations: avoid holding Network lock while stopping node~~ p2p/simulations: fix a rare deadlock with the inproc adapter Oct 10, 2018

fjl requested a review from acud October 10, 2018 22:30

fjl force-pushed the simulations-deadlock branch from df4efc5 to 093afaf Compare October 11, 2018 00:17

p2p/simulations/adapters: fix docker adapter

d8c86d7

fjl force-pushed the simulations-deadlock branch from 093afaf to d8c86d7 Compare October 11, 2018 00:28

fjl added 3 commits October 11, 2018 10:50

swarm/network/simulations/discovery: remove docker adapter simulation

a2e2018

p2p/simulations/examples: remove docker adapter

f44533c

p2p/simulations/adapters: remove docker adapter

0bc880a

fjl requested a review from zelig as a code owner October 11, 2018 08:52

fjl changed the title ~~p2p/simulations: fix a rare deadlock with the inproc adapter~~ p2p/simulations: fix a deadlock and clean up adapters Oct 11, 2018

nonsense self-requested a review October 11, 2018 13:53

nonsense approved these changes Oct 11, 2018

fjl merged commit dcae0d3 into ethereum:master Oct 11, 2018

karalabe added this to the 1.8.18 milestone Nov 7, 2018
