Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests fail to start running due to ntp setup failure #339

Open
danielmai opened this issue Mar 15, 2019 · 4 comments
Open

Tests fail to start running due to ntp setup failure #339

danielmai opened this issue Mar 15, 2019 · 4 comments

Comments

@danielmai
Copy link
Contributor

I'm trying to set up Jepsen testing in Dgraph's TeamCity CI infrastructure. The tests run OK when I spin up a new CI agent machine (Ubuntu machines on Google Cloud), SSH into the box and run tests in an interactive shell. i.e., after setting up the Jepsen cluster via ./up.sh I can run tests with this command.

docker exec --workdir=/jepsen/dgraph jepsen-control lein run test -w bank --nemesis none --time-limit 300 --local-binary /gobin/dgraph --rebalance-interval 10h

But when this gets run via a triggered CI build, they always fail with NTP clock synchronization issues.

docker exec --workdir=/jepsen/dgraph jepsen-control lein run test -w bank --nemesis none --time-limit 300 --local-binary /gobin/dgraph --rebalance-interval 10h
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in [jar:file:/root/.m2/repository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in [jar:file:/root/.m2/repository/org/slf4j/slf4j-simple/1.7.21/slf4j-simple-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
 SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
 23:26:47.804 [main] INFO  jepsen.cli - Test options:
  {:final-recovery-time 10,
  :concurrency 5,
  :skew :small,
  :nemesis-interval 10,
  :ssh
  {:username "root",
   :password "root",
   :strict-host-key-checking false,
   :private-key-path nil},
  :retry-db-setup false,
  :local-binary "/gobin/dgraph",
  :defer-db-teardown false,
  :nemesis {:interval 10},
  :nodes ["n1" "n2" "n3" "n4" "n5"],
  :test-count 1,
  :replicas 3,
  :force-download false,
  :time-limit 300,
  :version "1.0.3",
  :upsert-schema true,
  :workload :bank,
  :rebalance-interval "10h"}
 
 INFO [2019-03-15 23:26:48,172] jepsen test runner - jepsen.core Running test:
  {:final-recovery-time 10,
  :concurrency 5,
  :db
  #object[jepsen.dgraph.support$db$reify__3595 0x7c79f2cf "jepsen.dgraph.support$db$reify__3595@7c79f2cf"],
  :name "dgraph v1.0.12-rc3-225-g207e1349 bank upsert nemesis=",
  :skew :small,
  :start-time
  #object[org.joda.time.DateTime 0x6aa792 "2019-03-15T23:26:48.000Z"],
  :nemesis-interval 10,
  :net
  #object[jepsen.net$reify__5331 0x6f36e806 "jepsen.net$reify__5331@6f36e806"],
  :client {:conn nil},
  :barrier
  #object[java.util.concurrent.CyclicBarrier 0x7aef21bb "java.util.concurrent.CyclicBarrier@7aef21bb"],
  :ssh
  {:username "root",
   :password "root",
   :strict-host-key-checking false,
   :private-key-path nil},
  :retry-db-setup false,
  :tracing {:endpoint nil, :config true, :exporter nil},
  :local-binary "/gobin/dgraph",
  :checker
  #object[jepsen.checker$compose$reify__4889 0x68f75a35 "jepsen.checker$compose$reify__4889@68f75a35"],
  :defer-db-teardown false,
  :nemesis
  #object[jepsen.nemesis$compose$reify__5459 0xa6204e4 "jepsen.nemesis$compose$reify__5459@a6204e4"],
  :active-histories #<Atom@45375bdf: #{}>,
  :nodes ["n1" "n2" "n3" "n4" "n5"],
  :test-count 1,
  :replicas 3,
  :force-download false,
  :total-amount 100,
  :generator
  (gen/timelimit 300 (gen/concat [(gen/on #{:nemesis} (gen/delayfn #object[clojure.core$partial$fn__5561 0x55315a00 "clojure.core$partial$fn__5561@55315a00"] (gen/gvoid))) (gen/on #object[clojure.core$complement$fn__5391 0x669daa93 "clojure.core$complement$fn__5391@669daa93"] (gen/mix [(gen/filter #object[jepsen.tests.bank$fn__1508 0x670ae31f "jepsen.tests.bank$fn__1508@670ae31f"] #object[jepsen.tests.bank$transfer 0x129aaac1 "jepsen.tests.bank$transfer@129aaac1"]) #object[jepsen.tests.bank$read 0x3d2b13f "jepsen.tests.bank$read@3d2b13f"]]))])),
  :max-transfer 5,
  :os
  #object[jepsen.os.debian.Debian 0x58c36104 "jepsen.os.debian.Debian@58c36104"],
  :time-limit 300,
  :version "v1.0.12-rc3-225-g207e1349",
  :upsert-schema true,
  :workload :bank,
  :accounts [0 1 2 3 4 5 6 7],
  :rebalance-interval "10h"}
 
 INFO [2019-03-15 23:26:49,224] jepsen node n1 - jepsen.os.debian n1 setting up debian
 INFO [2019-03-15 23:26:49,224] jepsen node n2 - jepsen.os.debian n2 setting up debian
 INFO [2019-03-15 23:26:49,224] jepsen node n3 - jepsen.os.debian n3 setting up debian
 INFO [2019-03-15 23:26:49,225] jepsen node n4 - jepsen.os.debian n4 setting up debian
 INFO [2019-03-15 23:26:49,225] jepsen node n5 - jepsen.os.debian n5 setting up debian
 INFO [2019-03-15 23:26:50,143] jepsen node n4 - jepsen.os.debian Installing #{apt-transport-https}
 INFO [2019-03-15 23:26:50,143] jepsen node n3 - jepsen.os.debian Installing #{apt-transport-https}
 INFO [2019-03-15 23:26:50,143] jepsen node n1 - jepsen.os.debian Installing #{apt-transport-https}
 INFO [2019-03-15 23:26:50,143] jepsen node n5 - jepsen.os.debian Installing #{apt-transport-https}
 INFO [2019-03-15 23:26:50,143] jepsen node n2 - jepsen.os.debian Installing #{apt-transport-https}
 INFO [2019-03-15 23:27:13,875] jepsen test runner - jepsen.db Tearing down DB
 INFO [2019-03-15 23:27:14,288] jepsen test runner - jepsen.db Setting up DB
 INFO [2019-03-15 23:27:14,391] jepsen node n4 - jepsen.dgraph.support Uploading /gobin/dgraph ...
 INFO [2019-03-15 23:27:14,391] jepsen node n2 - jepsen.dgraph.support Uploading /gobin/dgraph ...
 INFO [2019-03-15 23:27:14,391] jepsen node n3 - jepsen.dgraph.support Uploading /gobin/dgraph ...
 INFO [2019-03-15 23:27:14,391] jepsen node n1 - jepsen.dgraph.support Uploading /gobin/dgraph ...
 INFO [2019-03-15 23:27:14,392] jepsen node n5 - jepsen.dgraph.support Uploading /gobin/dgraph ...
 INFO [2019-03-15 23:27:16,658] jepsen node n3 - jepsen.os.debian Installing #{build-essential}
 INFO [2019-03-15 23:27:16,670] jepsen node n5 - jepsen.os.debian Installing #{build-essential}
 INFO [2019-03-15 23:27:16,673] jepsen node n1 - jepsen.os.debian Installing #{build-essential}
 INFO [2019-03-15 23:27:16,680] jepsen node n4 - jepsen.os.debian Installing #{build-essential}
 INFO [2019-03-15 23:27:16,680] jepsen node n2 - jepsen.os.debian Installing #{build-essential}
 INFO [2019-03-15 23:28:05,758] jepsen node n1 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:15,962] jepsen node n4 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:15,962] jepsen node n3 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:15,962] jepsen node n5 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:15,962] jepsen node n2 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,369] jepsen node n1 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,369] jepsen node n4 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,370] jepsen node n2 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,370] jepsen node n5 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,370] jepsen node n3 - jepsen.control.util starting dgraph
 INFO [2019-03-15 23:28:16,801] jepsen node n1 - jepsen.dgraph.support Cluster converged
 INFO [2019-03-15 23:28:18,943] jepsen node n4 - jepsen.dgraph.support GRPC ready
 INFO [2019-03-15 23:28:18,943] jepsen node n3 - jepsen.dgraph.support GRPC ready
 INFO [2019-03-15 23:28:18,943] jepsen node n5 - jepsen.dgraph.support GRPC ready
 INFO [2019-03-15 23:28:18,943] jepsen node n2 - jepsen.dgraph.support GRPC ready
 INFO [2019-03-15 23:28:18,945] jepsen node n1 - jepsen.dgraph.support GRPC ready
 INFO [2019-03-15 23:28:18,954] jepsen nemesis - jepsen.core Setting up nemesis
 INFO [2019-03-15 23:28:18,955] jepsen worker 0 - jepsen.core Setting up worker 0
 INFO [2019-03-15 23:28:18,955] jepsen worker 1 - jepsen.core Setting up worker 1
 INFO [2019-03-15 23:28:18,955] jepsen worker 2 - jepsen.core Setting up worker 2
 INFO [2019-03-15 23:28:18,956] jepsen worker 3 - jepsen.core Setting up worker 3
 INFO [2019-03-15 23:28:18,956] jepsen worker 4 - jepsen.core Setting up worker 4
 INFO [2019-03-15 23:28:19,607] jepsen worker 0 - jepsen.dgraph.bank Schema altered
 INFO [2019-03-15 23:28:19,609] jepsen worker 0 - jepsen.dgraph.bank Upserting {:key_0 0, :type_0 account, :amount_0 100}
 INFO [2019-03-15 23:28:19,695] jepsen worker 2 - jepsen.dgraph.bank Schema altered
 INFO [2019-03-15 23:28:19,696] jepsen worker 2 - jepsen.dgraph.bank Upserting {:key_0 0, :type_0 account, :amount_0 100}
 INFO [2019-03-15 23:28:19,706] jepsen worker 3 - jepsen.dgraph.bank Schema altered
 INFO [2019-03-15 23:28:19,707] jepsen worker 3 - jepsen.dgraph.bank Upserting {:key_0 0, :type_0 account, :amount_0 100}
 INFO [2019-03-15 23:28:19,718] jepsen worker 1 - jepsen.dgraph.bank Schema altered
 INFO [2019-03-15 23:28:19,719] jepsen worker 1 - jepsen.dgraph.bank Upserting {:key_0 0, :type_0 account, :amount_0 100}
 INFO [2019-03-15 23:28:19,730] jepsen worker 4 - jepsen.dgraph.bank Schema altered
 INFO [2019-03-15 23:28:19,731] jepsen worker 4 - jepsen.dgraph.bank Upserting {:key_0 0, :type_0 account, :amount_0 100}
 WARN [2019-03-15 23:28:28,236] jepsen nemesis - jepsen.dgraph.nemesis Error resetting clock with NTP, retrying...
 WARN [2019-03-15 23:28:37,973] jepsen nemesis - jepsen.dgraph.nemesis Error resetting clock with NTP, retrying...
 WARN [2019-03-15 23:28:47,537] jepsen nemesis - jepsen.dgraph.nemesis Error resetting clock with NTP, retrying...
 INFO [2019-03-15 23:28:57,221] jepsen nemesis - jepsen.core Tearing down nemesis
 INFO [2019-03-15 23:28:57,221] jepsen worker 0 - jepsen.core Tearing down worker 0
 INFO [2019-03-15 23:28:57,221] jepsen worker 4 - jepsen.core Tearing down worker 4
 INFO [2019-03-15 23:28:57,221] jepsen worker 1 - jepsen.core Tearing down worker 1
 INFO [2019-03-15 23:28:57,221] jepsen worker 2 - jepsen.core Tearing down worker 2
 INFO [2019-03-15 23:28:57,221] jepsen worker 3 - jepsen.core Tearing down worker 3
 INFO [2019-03-15 23:28:57,231] jepsen test runner - jepsen.core Snarfing log files
 INFO [2019-03-15 23:28:57,942] jepsen node n3 - jepsen.core downloading /opt/dgraph/alpha.log to alpha.log
 INFO [2019-03-15 23:28:57,942] jepsen node n1 - jepsen.core downloading /opt/dgraph/alpha.log to alpha.log
 INFO [2019-03-15 23:28:57,942] jepsen node n2 - jepsen.core downloading /opt/dgraph/alpha.log to alpha.log
 INFO [2019-03-15 23:28:57,942] jepsen node n5 - jepsen.core downloading /opt/dgraph/alpha.log to alpha.log
 INFO [2019-03-15 23:28:57,943] jepsen node n4 - jepsen.core downloading /opt/dgraph/alpha.log to alpha.log
 INFO [2019-03-15 23:28:57,959] jepsen node n3 - jepsen.core downloading /opt/dgraph/zero.log to zero.log
 INFO [2019-03-15 23:28:57,959] jepsen node n4 - jepsen.core downloading /opt/dgraph/zero.log to zero.log
 INFO [2019-03-15 23:28:57,959] jepsen node n1 - jepsen.core downloading /opt/dgraph/zero.log to zero.log
 INFO [2019-03-15 23:28:57,960] jepsen node n5 - jepsen.core downloading /opt/dgraph/zero.log to zero.log
 INFO [2019-03-15 23:28:57,960] jepsen node n2 - jepsen.core downloading /opt/dgraph/zero.log to zero.log
 INFO [2019-03-15 23:28:57,974] jepsen node n5 - jepsen.core downloading /opt/dgraph/data.tar.bz2 to data.tar.bz2
 INFO [2019-03-15 23:28:57,974] jepsen node n4 - jepsen.core downloading /opt/dgraph/data.tar.bz2 to data.tar.bz2
 INFO [2019-03-15 23:28:57,975] jepsen node n3 - jepsen.core downloading /opt/dgraph/data.tar.bz2 to data.tar.bz2
 INFO [2019-03-15 23:28:57,974] jepsen node n1 - jepsen.core downloading /opt/dgraph/data.tar.bz2 to data.tar.bz2
 INFO [2019-03-15 23:28:57,976] jepsen node n2 - jepsen.core downloading /opt/dgraph/data.tar.bz2 to data.tar.bz2
 INFO [2019-03-15 23:28:58,190] jepsen node n5 - jepsen.control.util Stopping /opt/dgraph/alpha.pid
 INFO [2019-03-15 23:28:58,191] jepsen node n2 - jepsen.control.util Stopping /opt/dgraph/alpha.pid
 INFO [2019-03-15 23:28:58,191] jepsen node n3 - jepsen.control.util Stopping /opt/dgraph/alpha.pid
 INFO [2019-03-15 23:28:58,191] jepsen node n1 - jepsen.control.util Stopping /opt/dgraph/alpha.pid
 INFO [2019-03-15 23:28:58,193] jepsen node n4 - jepsen.control.util Stopping /opt/dgraph/alpha.pid
 INFO [2019-03-15 23:28:58,596] jepsen node n5 - jepsen.control.util Stopping /opt/dgraph/zero.pid
 INFO [2019-03-15 23:28:58,597] jepsen node n4 - jepsen.control.util Stopping /opt/dgraph/zero.pid
 INFO [2019-03-15 23:28:58,601] jepsen node n1 - jepsen.control.util Stopping /opt/dgraph/zero.pid
 INFO [2019-03-15 23:28:58,601] jepsen node n3 - jepsen.control.util Stopping /opt/dgraph/zero.pid
 INFO [2019-03-15 23:28:58,601] jepsen node n2 - jepsen.control.util Stopping /opt/dgraph/zero.pid
 WARN [2019-03-15 23:28:59,022] main - jepsen.core Test crashed!
 java.util.concurrent.ExecutionException: clojure.lang.ExceptionInfo: throw+: {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}} {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}}
 	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_191]
 	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_191]
 	at clojure.core$deref_future.invokeStatic(core.clj:2292) ~[clojure-1.9.0.jar:na]
 	at clojure.core$future_call$reify__8097.deref(core.clj:6894) ~[clojure-1.9.0.jar:na]
 	at clojure.core$deref.invokeStatic(core.clj:2312) ~[clojure-1.9.0.jar:na]
 	at clojure.core$deref.invoke(core.clj:2298) ~[clojure-1.9.0.jar:na]
 	at clojure.core$map$fn__5587.invoke(core.clj:2745) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RT.seq(RT.java:528) ~[clojure-1.9.0.jar:na]
 	at clojure.core$seq__5124.invokeStatic(core.clj:137) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7835.invoke(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13) ~[clojure-1.9.0.jar:na]
 	at clojure.core$reduce.invokeStatic(core.clj:6748) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invokeStatic(core.clj:6815) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invoke(core.clj:6807) ~[clojure-1.9.0.jar:na]
 	at jepsen.control$on_nodes.invokeStatic(control.clj:385) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invoke(control.clj:369) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invokeStatic(control.clj:374) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invoke(control.clj:369) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invokeStatic(time.clj:75) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invoke(time.clj:71) ~[na:na]
 	at jepsen.dgraph.nemesis$bump_time$reify__3656$fn__3661.invoke(nemesis.clj:95) ~[na:na]
 	at jepsen.dgraph.nemesis$bump_time$reify__3656.setup_BANG_(nemesis.clj:95) ~[na:na]
 	at jepsen.nemesis$setup_compat_BANG_.invokeStatic(nemesis.clj:30) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invoke(nemesis.clj:23) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$compose$reify__5459$fn__5460.invoke(nemesis.clj:198) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals$fn__2202.invoke(util.clj:509) ~[jepsen-0.1.12.jar:na]
 	at clojure.core.reducers$map$fn__554$fn__555.invoke(reducers.clj:167) ~[knossos-0.3.4.jar:na]
 	at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7839.invokeStatic(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7839.invoke(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13) ~[clojure-1.9.0.jar:na]
 	at clojure.core.reducers$folder$reify__537.coll_reduce(reducers.clj:126) ~[knossos-0.3.4.jar:na]
 	at clojure.core$reduce.invokeStatic(core.clj:6748) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invokeStatic(core.clj:6815) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invoke(core.clj:6807) ~[clojure-1.9.0.jar:na]
 	at jepsen.util$map_kv.invokeStatic(util.clj:499) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_kv.invoke(util.clj:495) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals.invokeStatic(util.clj:509) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals.invoke(util.clj:506) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$compose$reify__5459.setup_BANG_(nemesis.clj:198) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invokeStatic(nemesis.clj:30) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invoke(nemesis.clj:23) ~[jepsen-0.1.12.jar:na]
 	at jepsen.core.NemesisWorker.setup_worker_BANG_(core.clj:378) ~[jepsen-0.1.12.jar:na]
 	at jepsen.core$run_workers_BANG_$setup__5635.invoke(core.clj:180) ~[jepsen-0.1.12.jar:na]
 	at dom_top.core$real_pmap$build_thread__214$fn__215.invoke(core.clj:166) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
 	at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.9.0.jar:na]
 	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
 Caused by: clojure.lang.ExceptionInfo: throw+: {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}}
 	at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.4.jar:na]
 	at jepsen.control$throw_on_nonzero_exit.invokeStatic(control.clj:127) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$throw_on_nonzero_exit.invoke(control.clj:122) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec_STAR_.invokeStatic(control.clj:163) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec_STAR_.doInvoke(control.clj:160) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invoke(core.clj:652) ~[clojure-1.9.0.jar:na]
 	at jepsen.control$exec.invokeStatic(control.clj:179) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec.doInvoke(control.clj:173) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:436) ~[clojure-1.9.0.jar:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invokeStatic(time.clj:74) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invoke(time.clj:71) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_$fn__2149.invoke(time.clj:75) ~[na:na]
 	at jepsen.control$on_nodes$fn__2893.invoke(control.clj:384) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
 	at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.9.0.jar:na]
 	at jepsen.util$real_pmap$launcher__2044$fn__2045.invoke(util.clj:50) ~[jepsen-0.1.12.jar:na]
 	at clojure.core$binding_conveyor_fn$fn__5476.invoke(core.clj:2022) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.call(AFn.java:18) ~[clojure-1.9.0.jar:na]
 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191]
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191]
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191]
 	... 1 common frames omitted
 ERROR [2019-03-15 23:28:59,029] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
 java.util.concurrent.ExecutionException: clojure.lang.ExceptionInfo: throw+: {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}} {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}}
 	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_191]
 	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_191]
 	at clojure.core$deref_future.invokeStatic(core.clj:2292) ~[clojure-1.9.0.jar:na]
 	at clojure.core$future_call$reify__8097.deref(core.clj:6894) ~[clojure-1.9.0.jar:na]
 	at clojure.core$deref.invokeStatic(core.clj:2312) ~[clojure-1.9.0.jar:na]
 	at clojure.core$deref.invoke(core.clj:2298) ~[clojure-1.9.0.jar:na]
 	at clojure.core$map$fn__5587.invoke(core.clj:2745) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.LazySeq.seq(LazySeq.java:49) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RT.seq(RT.java:528) ~[clojure-1.9.0.jar:na]
 	at clojure.core$seq__5124.invokeStatic(core.clj:137) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7835.invoke(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13) ~[clojure-1.9.0.jar:na]
 	at clojure.core$reduce.invokeStatic(core.clj:6748) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invokeStatic(core.clj:6815) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invoke(core.clj:6807) ~[clojure-1.9.0.jar:na]
 	at jepsen.control$on_nodes.invokeStatic(control.clj:385) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invoke(control.clj:369) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invokeStatic(control.clj:374) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$on_nodes.invoke(control.clj:369) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invokeStatic(time.clj:75) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invoke(time.clj:71) ~[na:na]
 	at jepsen.dgraph.nemesis$bump_time$reify__3656$fn__3661.invoke(nemesis.clj:95) ~[na:na]
 	at jepsen.dgraph.nemesis$bump_time$reify__3656.setup_BANG_(nemesis.clj:95) ~[na:na]
 	at jepsen.nemesis$setup_compat_BANG_.invokeStatic(nemesis.clj:30) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invoke(nemesis.clj:23) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$compose$reify__5459$fn__5460.invoke(nemesis.clj:198) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals$fn__2202.invoke(util.clj:509) ~[jepsen-0.1.12.jar:na]
 	at clojure.core.reducers$map$fn__554$fn__555.invoke(reducers.clj:167) ~[knossos-0.3.4.jar:na]
 	at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7839.invokeStatic(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7839.invoke(protocols.clj:75) ~[clojure-1.9.0.jar:na]
 	at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13) ~[clojure-1.9.0.jar:na]
 	at clojure.core.reducers$folder$reify__537.coll_reduce(reducers.clj:126) ~[knossos-0.3.4.jar:na]
 	at clojure.core$reduce.invokeStatic(core.clj:6748) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invokeStatic(core.clj:6815) ~[clojure-1.9.0.jar:na]
 	at clojure.core$into.invoke(core.clj:6807) ~[clojure-1.9.0.jar:na]
 	at jepsen.util$map_kv.invokeStatic(util.clj:499) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_kv.invoke(util.clj:495) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals.invokeStatic(util.clj:509) ~[jepsen-0.1.12.jar:na]
 	at jepsen.util$map_vals.invoke(util.clj:506) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$compose$reify__5459.setup_BANG_(nemesis.clj:198) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invokeStatic(nemesis.clj:30) ~[jepsen-0.1.12.jar:na]
 	at jepsen.nemesis$setup_compat_BANG_.invoke(nemesis.clj:23) ~[jepsen-0.1.12.jar:na]
 	at jepsen.core.NemesisWorker.setup_worker_BANG_(core.clj:378) ~[jepsen-0.1.12.jar:na]
 	at jepsen.core$run_workers_BANG_$setup__5635.invoke(core.clj:180) ~[jepsen-0.1.12.jar:na]
 	at dom_top.core$real_pmap$build_thread__214$fn__215.invoke(core.clj:166) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
 	at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.9.0.jar:na]
 	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_191]
 Caused by: clojure.lang.ExceptionInfo: throw+: {:type :jepsen.control/nonzero-exit, :cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :exit 1, :out "", :err "15 Mar 23:28:57 ntpdate[6305]: no server suitable for synchronization found\n", :host "n4", :action {:cmd "sudo -S -u root bash -c \"cd /; ntpdate -b pool.ntp.org\"", :in "root\n"}}
 	at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.4.jar:na]
 	at jepsen.control$throw_on_nonzero_exit.invokeStatic(control.clj:127) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$throw_on_nonzero_exit.invoke(control.clj:122) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec_STAR_.invokeStatic(control.clj:163) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec_STAR_.doInvoke(control.clj:160) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invoke(core.clj:652) ~[clojure-1.9.0.jar:na]
 	at jepsen.control$exec.invokeStatic(control.clj:179) ~[jepsen-0.1.12.jar:na]
 	at jepsen.control$exec.doInvoke(control.clj:173) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:436) ~[clojure-1.9.0.jar:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invokeStatic(time.clj:74) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_.invoke(time.clj:71) ~[na:na]
 	at jepsen.nemesis.time$reset_time_BANG_$fn__2149.invoke(time.clj:75) ~[na:na]
 	at jepsen.control$on_nodes$fn__2893.invoke(control.clj:384) ~[jepsen-0.1.12.jar:na]
 	at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:657) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1965) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.9.0.jar:na]
 	at clojure.core$apply.invokeStatic(core.clj:661) ~[clojure-1.9.0.jar:na]
 	at clojure.core$bound_fn_STAR_$fn__5471.doInvoke(core.clj:1995) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.9.0.jar:na]
 	at jepsen.util$real_pmap$launcher__2044$fn__2045.invoke(util.clj:50) ~[jepsen-0.1.12.jar:na]
 	at clojure.core$binding_conveyor_fn$fn__5476.invoke(core.clj:2022) ~[clojure-1.9.0.jar:na]
 	at clojure.lang.AFn.call(AFn.java:18) ~[clojure-1.9.0.jar:na]
 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_191]
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_191]
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_191]
 	... 1 common frames omitted
@danielmai
Copy link
Contributor Author

danielmai commented Mar 16, 2019

ntpdate runs OK in each Jepsen node outside of the test:

$ docker exec jepsen-n1 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:08 ntpdate[6386]: step time server 45.79.187.10 offset -0.014471 sec
$ docker exec jepsen-n2 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:15 ntpdate[6365]: step time server 45.79.187.10 offset -0.000358 sec
$ docker exec jepsen-n3 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:22 ntpdate[6365]: step time server 45.79.187.10 offset -0.000179 sec
$ docker exec jepsen-n4 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:29 ntpdate[6360]: step time server 45.79.187.10 offset -0.000703 sec
$ docker exec jepsen-n5 sudo -S -u root bash -c 'cd /; ntpdate -b pool.ntp.org'
16 Mar 00:24:36 ntpdate[6364]: step time server 45.79.187.10 offset 0.000188 sec

Actually, if these commands are run beforehand then the tests start running as expected.

@aphyr
Copy link
Collaborator

aphyr commented Mar 16, 2019

That's a weird one. I don't know if this is a docker thing, but it looooks like whatever went wrong was some sort of issue with ntpdate: ntpdate[6305]: no server suitable for synchronization found suggests that maybe the server it was trying to hit was unreachable or not provided or something?

I mean, ntpdate isn't going to do anything anyway, because you can't change clocks in docker containers, right? Either way, this is an odd one. You say it's working now?

@danielmai
Copy link
Contributor Author

It works only when ntpdate is run on all the nodes before running lein run test. That’s the workaround we’re using now.

Otherwise, ntpdate reliably complains about no suitable synchronization server.

because you can't change clocks in docker containers, right?

Err... given that, does that mean skew clock tests don’t work with the Docker setup?

@aphyr
Copy link
Collaborator

aphyr commented Mar 16, 2019

It works only when ntpdate is run on all the nodes before running lein run test. That’s the workaround we’re using now.

That is really strange! I'm not sure why ntpdate would say "no server suitable" when run by Jepsen, but be fine running it by SSH, and then be fine when Jepsen runs it afterwards.

Err... given that, does that mean skew clock tests don’t work with the Docker setup?

Yeah, I assume so. It might be different on your platform, but at least on Debian's LXC, clocks aren't namespaced and can't be updated in containers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants