Port linearizable tests to testsuite (bloomberg#328)
* Commit this so I can pull from AWS
* Test fixes
* Fix register test for testsuite
* Fix horrible jdbc bug, add jepsen tests
* Disable debug trace
* Jepsen tests to normal testsuite
* Add kill-cluster test
* Fix killcluster test, add sigstopcluster test
* Disable debug trace in cinsert test
* Redirect stdin for ssh commands in clustered tests
* Fix cnonce checking
* Changes to clustered tests
* Piggy back to reduce the number of durable lsn requests
* Static lsns and gen are not pointers, fix setup merge bug
* Piggy backing doesn't work: add comment explaining why
* Add testloop
* Enable wait-for-seqnum trace for all linearizable tests, fix wait-for-seqnum trace bugs.
* Clang formatting
* Enable genid48 for linearizability tests
* Don't allow log_puts if we are a candidate
* Fix tabs
* Modify makefile so that we can debug setup
* Jepsen: use CLUSTER env var for cluster nodes, when available
* gitignore: don't track temporary Jepsen files
* jepsen register nemesis test: don't use directio

  This breaks tests within containers; we're disabling it here. I don't think these tests put that much stress on the IO subsystem, and even if they do, Jepsen *likes* slow disks. Helps us catch things. :)

* Jepsen tests: fix typos in cluster-nodes
* Jepsen: basic cleanup, formatting, notes for later

  No semantic changes here; just doing some basic formatting, docstrings, and making notes about things I don't understand or would like to fix later.

* Jepsen tests: connect to specific (e.g. all) nodes instead of arbitrary ones.
* Jepsen: remove copy of knossos source

  There was a local copy of the knossos source here, which existed so that we could have a modified version of model.clj with a custom model for the comdb2 tests. This commit moves that custom model into the Jepsen test, returning to using knossos as a library.

* Jepsen: remove duplicate copy of jepsen source

  We had a copy of the Jepsen source tree checked in locally, overriding the library version. I don't... think we actually need it here. If anyone would like to make changes, they can open namespaces and redefine specific functions as needed, without copying the full namespace.

* Jepsen: upgrade from 0.1.4-SNAPSHOT to 0.1.6
* Jepsen: stateful conn wrapper for register test

  There's an issue with jdbc conn reuse that could allow a crashed operation's transactional scope to be recovered by a subsequent invocation. We now close the JDBC conn whenever throwing an error, and retry with a fresh connection. We also politely recover from transactions where a connection fails in the initial setup phase of the txn, and introduce a five-second delay to avoid spamming down nodes with requests.

* Jepsen: simplify and refactor linearizable client

  No major changes to semantics, but we've rewritten the register client to have unified error handling across all 3 branches, collapsed nested let statements, removed superfluous logging, added UIDs to completion operations, and removed code that replaced keys in completion operations with the constant 1 instead of the requested key, which I think was a mistake.

* Jepsen: upgrade to clj-jdbc 0.6.1, add timeouts everywhere

  A lack of timeouts allowed tests to block indefinitely when a node went unresponsive. We now have a default 5 second timeout on all SQL queries, which significantly improves test responsiveness.

* Verify we are setting sqlreponse.error_code to a valid enum
* Jepsen: generalize register test to multiple keys

  In general, this lets us test more things faster for indefinite time periods, without crashing. This allows us to drop the custom register model and simplify the generators: we use jepsen.independent now to generalize those for us. Also fixed a client bug that wrote to all rows instead of just the given one. Also fixed a bug in the generator which made the test take longer than it needed to, and only emit 10 seconds of operations instead of ops over the whole test.

* Jepsen: linearizable client's UIDs are now sequential instead of random
* Fix startup election race
* Jepsen: remove spurious transactions from register test

  The register client opened transactions and performed reads prior to writes. We no longer read before writing, which gives the database fewer chances to establish locks or visibility barriers. It also simplifies the code significantly.

* Setup: log to /tmp/comdb.log, so Jepsen can provide log bundles
* Jepsen: extensive refactoring of all tests

  Focused primarily on making the register test more rigorous, but also refactored the bank, set, and dirty-reads tests as well. Things are... still a bit of a mess but mostly functional. There's a lagging issue in the register test causing it to hang at the end of the run, which I haven't tracked down yet.

* Kyle changes
* Recheck master at failed sync-up
* Various fixes, create breakloop to nemesize the network outside of a test.
* Use msghd.seqnum: netinfo_ptr->seqnum can change after we release the mutex (bloomberg#445)
* Check for master on every iteration of the final sync-up loop
* Add min_retries flag to cdb2sql & move min & max retry calls to before open
* Add logic to cdb2sql for kyle_branch
* Change add-record wait-for-seqnum to adaptive version
* Fix broken merge in final-syncup, flush breakloop output at nl
* This isn't working: commit & pull from the nodes to debug in gdb
* Aha- this should allow it.
* Additional lrl options
* Commit to pull on nodes
* Commit to push to nodes
* Commit to move to all nodes
* Commit
* Perform a master-request if a replicant gets a REP_LOG during election.
* Short-circuit dummy_add and commit if lock is desired
* Log coherency lease changes on a tunable.
* We go through weird periods where replication to one node is slow- see if it goes away by disabling udp-acks
* Slow-replicant check breaks us for Kyle's tests
* A combination of bug-fixes and tuning has gotten us consistently in the 2 - 15 second recovery range.
* Allow tester to set timeout from the environment
* Port test to cloud machine.
* color-code nemesis
* Fix dumb print bug
* Look at test-file before fixing partition. Print average wait time.
* Add iteration count to status line
* Jepsen tests: fix nemesis/non-nemesis scheduling.
* Bank test: reorganize a bit
* tests/tools/get_tests_inorder.sh: fix missing quotes in test expression
* Jepsen: G2 test
* Jepsen: sketch out an A6 test
* Jepsen: A6 tuning
* Jepsen: experiment with a6 queries by primary key
* Jepsen: clean up unused read path in g2 client
* Add NOTSERIAL as a valid protobuf error_code
* Jepsen: break out a6 test into its own namespace.
* .gitignore: don't ignore all comdb2 directories; just the root binary
* Jepsen: bank client reconnects during initial account creation
* Jepsen: add atomic writes test
* Jepsen: remove superfluous try in bank test
* Jepsen: atomic test makefiles
* Break out bank test, debug agonizing insertion issue
* Jepsen: break out g2 test
* Jepsen: clean up a6 test invocation
* Jepsen: break out sets test
* Jepsen: break out dirty reads test
* Jepsen: break out register test
* Jepsen: get rid of spurious test selectors
* Jepsen: command line runner

  You can now run `lein run serve` in the jepsen directory to launch a web server for browsing test results, constructing zipfiles with analyses, graphs, and logs, and so on. Running tests no longer requires a test selector and leiningen test in `jepsen/tests`; instead you add a small map to `workloads` in cli.clj, telling Jepsen what the name of that workload is and how to run it. Then run `lein run test --workload register` to run, for instance, the register test. You can also configure the nemesis, concurrency, and runtime for any test:

  `lein run test --workload g2 --concurrency 5n --time-limit 120`

  The makefile test targets have been updated to invoke Jepsen tests using this style, instead of the old `lein test` command.

* Jepsen: allllll kinds of fixes for the dirty reads test; actually measures stuff now
* fix the connection bug that kyle & I were seeing
* Allow set-able cursordebug from an sql session
* Make jepsen-tests setup-able
* Small script changes
* Uhh.. somehow this WORKS now.. ? Maybe replaceable params were only part of the issue ..
* Fix java api bug (that looked like a bound parameter issue)
* Let's be more explicit about this
* This was a bug- you can't set hasql inside of a transaction. We seem to pass the G2 test now.
* Tweak test-script
* Fix atomic writes test (can't set hasql within a transaction)
* Re-order a few more hasqls
* Fix jepsen_atomic_writes
* Make statement level effects the default behavior for jdbc
* We are now passing ALL of Kyle's tests
* Formatting
* Make init_with_genid48 default
* Send correct vote-type if elect_highest_committed_gen is enabled
* Undo non-changes
* Revert this
* Revert another NON-change
* Remove unneeded formatting
* Remove nSetsQuerySent, don't run jepsen tests as part of the normal testsuite.
* Use correct string size
* Allow a user to override test timeouts. Allow testsuite to core the database on timeout if CORE_ON_TIMEOUT is set.
* Remove unused variable
* Port ctest executables to tools
* This code has been dead for 5 or 6 years .. don't rely on a dumb define, remove it completely
* Can't tell what's going on with the core - leave the database up & running
* Run cinsert_linearizable and register_linearizable from testloop
* quick cleanup for script
* Commit
* Make get_tests_inorder.sh aware of configurable TEST_TIMEOUT
* clang formatting
* Fix tunables test
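
The "stateful conn wrapper" entry in this commit message describes closing the JDBC connection whenever an error is thrown and continuing on a fresh one after a delay. A minimal Java sketch of that pattern is below. It is not the Jepsen client's code, which is Clojure. `Conn`, `StatefulConn`, and `query` are hypothetical names, `Conn` is a stand-in for a real JDBC connection, and the delay is a parameter (the message mentions five seconds). The message does not say whether the real client retries the same operation, so the retry loop here is an assumption made for brevity.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a JDBC connection; not part of comdb2 or Jepsen. */
interface Conn extends AutoCloseable {
    String query(String sql) throws Exception;

    @Override
    void close();
}

/** Holds at most one live connection and discards it after any error. */
final class StatefulConn {
    private final Callable<Conn> factory;  // opens a fresh connection
    private final long retryDelayMillis;   // pause before trying again
    private Conn current;                  // null when there is no live connection

    StatefulConn(Callable<Conn> factory, long retryDelayMillis) {
        this.factory = factory;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Runs the query, never reusing a connection that has seen an error. */
    String query(String sql, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) {
                    current = factory.call();  // may itself fail during setup
                }
                return current.query(sql);
            } catch (Exception e) {
                last = e;
                discard();  // a failed connection may carry leftover transaction state
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMillis);  // avoid spamming a down node
                }
            }
        }
        throw last;
    }

    /** Closes and forgets the current connection, if any. */
    private void discard() {
        if (current != null) {
            try {
                current.close();
            } catch (RuntimeException ignored) {
                // best effort: the connection is being thrown away anyway
            }
            current = null;
        }
    }
}
```

Discarding the connection on every error, instead of only on known-fatal ones, trades some reconnect overhead for the guarantee that a later call cannot inherit a crashed call's transactional scope.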