Port linearizable tests to testsuite (bloomberg#328)

* Commit this so I can pull from AWS * Test fixes * Fix register test for testsuite * Fix horrible jdbc bug, add jepsen tests * Disable debug trace * Jepsen tests to normal testsuite * Add kill-cluster test * Fix killcluster test, add sigstopcluster test * Disable debug trace in cinsert test * Redirect stdin for ssh commands in clustered tests * Fix cnonce checking * Changes to clustered tests * Piggy back to reduce the number of durable lsn requests * Static lsns and gen are not pointers, fix setup merge bug * Piggy backing doesn't work: add comment explaining why * Add testloop * Enable wait-for-seqnum trace for all linearizable tests, fix wait-for-seqnum trace bugs. * Clang formatting * Enable genid48 for linearizability tests * Don't allow log_puts if we are a candidate * Fix tabs * Modify makefile so that we can debug setup * Jepsen: use CLUSTER env var for cluster nodes, when available * gitignore: don't track temporary Jepsen files * jepsen register nemesis test: don't use directio This breaks tests within containers; we're disabling it here. I don't think these tests put that much stress on the IO subsystem, and even if they do, Jepsen *likes* slow disks. Helps us catch things. :) * Jepsen tests: fix typos in cluster-nodes * Jepsen: basic cleanup, formatting, notes for later No semantic changes here; just doing some basic formatting, docstrings, and making notes about things I don't understand or would like to fix later. * Jepsen tests: connect to specific (e.g. all) nodes instead of arbitrary ones. * Jepsen: remove copy of knossos source There was a local copy of the knossos source here, which existed so that we could have a modified version of model.clj with a custom model for the comdb2 tests. This commit moves that custom model into the Jepsen test, returning to using knossos as a library. * Jepsen: remove duplicate copy of jepsen source We had a copy of the Jepsen source tree checked in locally, overriding the library version. I don't... think we actually need it here. If anyone would like to make changes, they can open namespaces and redefine specific functions as needed, without copying the full namespace. * Jepsen: upgrade from 0.1.4-SNAPSHOT to 0.1.6 * Jepsen: stateful conn wrapper for register test There's an issue with jdbc conn reuse that could allow a crashed operation's transactional scope to be recovered by a subsequent invocation. We now close the JDBC conn whenever throwing an error, and retry with a fresh connection. We also politely recover from transactions where a connection fails in the initial setup phase of the txn, and introduce a five-second delay to avoid spamming down nodes with requests. * Jepsen: simplify and refactor linearizable client No major changes to semantics, but we've rewritten the register client to have unified error handling across all 3 branches, collapsed nested let statements, removed superfluous logging, added UIDs to completion operations, and removed code that replaced keys in completion operations with the constant 1 instead of the requested key, which I think was a mistake. * Jepsen: upgrade to clj-jdbc 0.6.1, add timeouts everywhere A lack of timeouts allowed tests to block indefinitely when a node went unresponsive. We now have a default 5 second timeout on all SQL queries, which significantly improves test responsiveness. * Verify we are setting sqlreponse.error_code to a valid enum * Jepsen: generalize register test to multiple keys In general, this lets us test more things faster for indefinite time periods, without crashing. This allows us to drop the custom register model and simplify the generators: we use jepsen.independent now to generalize those for us. Also fixed a client bug that wrote to all rows instead of just the given one. Also fixed a bug in the generator which made the test take longer than it needed to, and only emit 10 seconds of operations instead of ops over the whole test. * Jepsen: linearizable client's UIDs are now sequential instead of random * Fix startup election race * Jepsen: remove spurious transactions from register test The register client opened transactions and performed reads prior to writes. We no longer read before writing, which gives the database fewer chances to establish locks or visibility barriers. It also simplifies the code significantly. * Setup: log to /tmp/comdb.log, so Jepsen can provide log bundles * Jepsen: extensive refactoring of all tests Focused primarily on making the register test more rigorous, but also refactored the bank, set, and dirty-reads tests as well. Things are... still a bit of a mess but mostly functional. There's a lagging issue in the register test causing it to hang at the end of the run, which I haven't tracked down yet. * Kyle changes * Recheck master at failed sync-up * Various fixes, create breakloop to nemesize the network outside of a test. * Use msghd.seqnum: netinfo_ptr->seqnum can change after we release the mutex (bloomberg#445) * Check for master on every iteration of the final sync-up loop * Add min_retries flag to cdb2sql & move min & max retry calls to before open * Add logic to cdb2sql for kyle_branch * Change add-record wait-for-seqnum to adaptive version * Fix broken merge in final-syncup, flush breakloop output at nl * This isn't working: commit & pull from the nodes to debug in gdb * Aha- this should allow it. * Additional lrl options * Commit to pull on nodes * Commit to push to nodes * Commit to move to all nodes * Commit * Perform a master-request if a replicant gets a REP_LOG during election. * Short-circuit dummy_add and commit if lock is desired * Log coherency lease changes on a tunable. * We go through weird periods where replication to one node is slow- see if it goes away by disabling udp-acks * Slow-replicant check breaks us for Kyle's tests * A combination of bug-fixes and tuning has gotten us consistently in the 2 - 15 second recovery range. * Allow tester to set timeout from the environment * Port test to cloud machine. * color-code nemesis * Fix dumb print bug * Look at test-file before fixing partition. Print average wait time. * Add iteration count to status line * Jepsen tests: fix nemesis/non-nemesis scheduling. * Bank test: reorganize a bit * tests/tools/get_tests_inorder.sh: fix missing quotes in test expression * Jepsen: G2 test * Jepsen: sketch out an A6 test * Jepsen: A6 tuning * Jepsen: experiment with a6 queries by primary key * Jepsen: clean up unused read path in g2 client * Add NOTSERIAL as a valid protobuf error_code * Jepsen: break out a6 test into its own namespace. * .gitignore: don't ignore all comdb2 directories; just the root binary * Jepsen: bank client reconnects during initial account creation * Jepsen: add atomic writes test * Jepsen: remove superfluous try in bank test * Jepsen: atomic test makefiles * Break out bank test, debug agonizing insertion issue * Jepsen: break out g2 test * Jepsen: clean up a6 test invocation * Jepsen: break out sets test * Jepsen: break out dirty reads test * Jepsen: break out register test * Jepsen: get rid of spurious test selectors * Jepsen: command line runner You can now run `lein run serve` in the jepsen directory to launch a web server for browsing test results, constructing zipfiles with analyses, graphs, and logs, and so on. Running tests no longer requires a test selector and leiningen test in `jepsen/tests`; instead you add a small map to `workloads` in cli.clj, telling Jepsen what the name of that workload is and how to run it. Then run `lein run test --workload register` to run, for instance, the register test. You can also configure the nemesis, concurrency, and runtime for any test: `lein run test --workload g2 --concurrency 5n --time-limit 120` The makefile test targets have been updated to invoke Jepsen tests using this style, instead of the old `lein test` command. * Jepsen: allllll kinds of fixes for the dirty reads test; actually measures stuff now * fix the connection bug that kyle & I were seeing * Allow set-able cursordebug from an sql session * Make jepsen-tests setup-able * Small script changes * Uhh.. somehow this WORKS now.. ? Maybe replaceable params were only part of the issue .. * Fix java api bug (that looked like a bound parameter issue) * Let's be more explicit about this * This was a bug- you can't set hasql inside of a transaction. We seem to pass the G2 test now. * Tweak test-script * Fix atomic writes test (can't set hasql within a transaction) * Re-order a few more hasqls * Fix jepsen_atomic_writes * Make statement level effects the default behavior for jdbc * We are now passing ALL of Kyle's tests * Formatting * Make init_with_genid48 default * Send correct vote-type if elect_highest_committed_gen is enabled * Undo non-changes * Revert this * Revert another NON-change * Remove unneeded formatting * Remove nSetsQuerySent, don't run jepsen tests as part of the normal testsuite. * Use correct string size * Allow a user to override test timeouts. Allow testsuite to core the database on timeout if CORE_ON_TIMEOUT is set. * Remove unused variable * Port ctest executables to tools * This code has been dead for 5 or 6 years .. don't rely on a dumb define, remove it completely * Can't tell what's going on with the core - leave the database up & runnning * Run cinsert_linearizable and register_linearizable from testloop * quick cleanup for script * Commit * Make get_tests_inorder.sh aware of configurable TEST_TIMEOUT * clang formatting * Fix tunables test
Raafey-Khan · Nov 1, 2017 · 2fe8c21 · 2fe8c21
1 parent 3a446fd
commit 2fe8c21
Show file tree

Hide file tree

Showing 304 changed files with 6,844 additions and 9,717 deletions.
diff --git a/Makefile b/Makefile
@@ -3,7 +3,7 @@
 #######################################################
 .PHONY: test_tools
 test_tools: compat_install
-	@cd build && $(MAKE) -s -j blob bound cdb2_client cdb2api_caller cdb2bind comdb2_blobtest comdb2_sqltest crle hatest insert_lots_mt leakcheck localrep overflow_blobtest ptrantest recom selectv serial sicountbug sirace simple_ssl stepper utf8
+	@cd build && $(MAKE) -s -j blob bound cdb2_client cdb2api_caller cdb2bind comdb2_blobtest comdb2_sqltest crle hatest insert_lots_mt leakcheck localrep overflow_blobtest ptrantest recom selectv serial sicountbug sirace simple_ssl stepper utf8 insert register breakloop
 	@ln -f build/tests/tools/blob tests/bloballoc.test/blob
 	@ln -f build/tests/tools/bound tests/tools/bound
 	@ln -f build/tests/tools/cdb2_client tests/cdb2api_so.test/cdb2_client
@@ -26,6 +26,9 @@ test_tools: compat_install
 	@ln -f build/tests/tools/simple_ssl tests/simple_ssl.test/simple_ssl
 	@ln -f build/tests/tools/stepper tests/tools/stepper
 	@ln -f build/tests/tools/utf8 tests/tools/utf8
+	@ln -f build/tests/tools/insert tests/tools/insert
+	@ln -f build/tests/tools/register tests/tools/register
+	@ln -f build/tests/tools/breakloop tests/tools/breakloop
 
 .PHONY: compat_install
 compat_install: all

diff --git a/bdb/bdb_net.c b/bdb/bdb_net.c
@@ -748,6 +748,8 @@ int send_myseqnum_to_master_udp(bdb_state_type *bdb_state)
     return rc;
 }
 
+int gbl_verbose_send_coherency_lease;
+
 void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
                            int *inc_wait)
 {
@@ -793,7 +795,8 @@ void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
         /* Assume disconnected node(s) are incoherent */
         *inc_wait = 1;
 
-        if (last_count != count || (now = time(NULL)) - lastpr) {
+        if (gbl_verbose_send_coherency_lease &&
+            (last_count != count || (now = time(NULL)) - lastpr)) {
             char *machs = (char *)malloc(1);
             int machs_len = 0;
             machs[0] = '\0';
@@ -863,9 +866,10 @@ void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
         } else {
             static time_t lastpr = 0;
             time_t now;
-            if ((now = time(NULL)) - lastpr) {
-                logmsg(LOGMSG_INFO, "%s: not sending to %s\n", __func__,
-                        hostlist[i]);
+            if (gbl_verbose_send_coherency_lease &&
+                (now = time(NULL)) - lastpr) {
+                logmsg(LOGMSG_ERROR, "%s: not sending to %s\n", __func__,
+                       hostlist[i]);
                 lastpr = now;
             }
         }

diff --git a/bdb/cursor.c b/bdb/cursor.c
@@ -7686,6 +7686,12 @@ int bdb_osql_trak(char *sql, unsigned int *status)
         *status |= SQL_DBG_SHADOW;
         return 0;
     }
+
+    if (strncasecmp(sql, "OFF", 3) == 0) {
+        *status = 0;
+        return 0;
+    }
+
     return -1;
 }
 

diff --git a/bdb/file.c b/bdb/file.c
@@ -3169,21 +3169,24 @@ if (!is_real_netinfo(bdb_state->repinfo->netinfo))
         }
     }
 
+    int tmpnode;
+    int attempts = bdb_state->attr->startup_sync_attempts;
+    uint8_t *p_buf = (uint8_t *)&tmpnode;
+    uint8_t *p_buf_end = ((uint8_t *)&tmpnode + sizeof(int));
+
     /*
       PHASE 4:
       finally now that we believe we are caught up and are no longer lying
       about our LSN to the master, lets ask the master to force the LSN
       forward and wait for us to reach the same LSN.  when we pass this
       phase, we are truly cache coherent.
       */
+
+again:
+    buf_put(&(bdb_state->repinfo->master_host), sizeof(int), p_buf, p_buf_end);
+
     if (bdb_state->repinfo->master_host != myhost) {
-        int tmpnode, attempts = bdb_state->attr->startup_sync_attempts;
-        uint8_t *p_buf = (uint8_t *)&tmpnode;
-        uint8_t *p_buf_end = ((uint8_t *)&tmpnode + sizeof(int));
 
-    again:
-        buf_put(&(bdb_state->repinfo->master_host), sizeof(int), p_buf,
-                p_buf_end);
         /* now we have the master checkpoint and WAIT for us to ack the seqnum,
            thus making sure we are actually LIVE */
         rc = net_send_message(
@@ -4719,13 +4722,10 @@ static int bdb_downgrade_int(bdb_state_type *bdb_state, int noelect,
         *downgraded = 0;
 
     retries = 0;
-    while (!bdb_state->repinfo->upgrade_allowed) {
-        if (++retries > 100) {
-            logmsg(LOGMSG_DEBUG, "bdb_downgrade: not allowed (bdb_open has not "
-                            "completed yet)\n");
-            return 0;
-        }
-        poll(NULL, 0, 100);
+    if (!bdb_state->repinfo->upgrade_allowed) {
+        logmsg(LOGMSG_DEBUG, "bdb_downgrade: not allowed (bdb_open has not "
+                             "completed yet)\n");
+        return 0;
     }
 
     /* if we were passed a child, find his parent */

diff --git a/bdb/llmeta.c b/bdb/llmeta.c
@@ -2655,6 +2655,11 @@ int bdb_add_dummy_llmeta(void)
     }
 
 retry:
+    if (bdb_lock_desired(llmeta_bdb_state->parent)) {
+        logmsg(LOGMSG_ERROR, "%s short-circuiting because bdb_lock_desired\n",
+               __func__);
+        return -1;
+    }
     tran = bdb_tran_begin(llmeta_bdb_state, NULL, &bdberr);
     if (tran == NULL)
         goto fail;
@@ -2680,8 +2685,9 @@ int bdb_add_dummy_llmeta(void)
                                           &bdberr);
 
     if (rc == 0) {
-        tran = NULL;
-        rc = bdb_wait_for_seqnum_from_all(llmeta_bdb_state, &ss);
+        int timeoutms;
+        rc = bdb_wait_for_seqnum_from_all_adaptive_newcoh(llmeta_bdb_state, &ss,
+                                                          0, &timeoutms);
     }
     // rc = bdb_tran_commit(llmeta_bdb_state, tran, &bdberr);
     if (rc && bdberr != BDBERR_NOERROR) {

diff --git a/bdb/rep.c b/bdb/rep.c
@@ -1724,6 +1724,8 @@ void net_newnode_rtn(netinfo_type *netinfo_ptr, char *hostname, int portnum)
 
 /* Timestamp of when our coherency lease expires on replicant */
 uint64_t coherency_timestamp = 0;
+int gbl_dump_zero_coherency_timestamp;
+
 char coherency_master[128] = {0};
 
 /* Don't let anything commit on the master until after this */
@@ -1772,6 +1774,8 @@ typedef struct {
     char *host;
 } hostdown_type;
 
+int gbl_reset_on_unelectable_cluster = 1;
+
 void *hostdown_thread(void *arg)
 {
     bdb_state_type *bdb_state;
@@ -1812,7 +1816,7 @@ void *hostdown_thread(void *arg)
     print(bdb_state, "master is %s we are %s\n", master_host,
           bdb_state->repinfo->myhost);
 
-    {
+    if (gbl_reset_on_unelectable_cluster) {
         int num_up, num_connected, electable;
 
         print(bdb_state, "xxx master is %d we are %d\n", master_host,
@@ -3090,6 +3094,13 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
             rc = bdb_wait_for_seqnum_from_node_int(bdb_state, seqnum,
                                                    nodelist[i], 1000, __LINE__);
 
+            if (bdb_lock_desired(bdb_state)) {
+                logmsg(LOGMSG_ERROR,
+                       "%s line %d early exit because lock-is-desired\n",
+                       __func__, __LINE__);
+                return (durable_lsns ? BDBERR_NOT_DURABLE : -1);
+            }
+
             if (rc == 0) {
                 base_node = nodelist[i];
                 num_successfully_acked++;
@@ -3165,6 +3176,14 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
         rc = bdb_wait_for_seqnum_from_node_int(bdb_state, seqnum, nodelist[i],
                                                waitms, __LINE__);
 
+        if (bdb_lock_desired(bdb_state)) {
+            logmsg(LOGMSG_ERROR,
+                   "%s line %d early exit because lock-is-desired\n", __func__,
+                   __LINE__);
+
+            return (durable_lsns ? BDBERR_NOT_DURABLE : -1);
+        }
+
         if (rc == -999) {
             logmsg(LOGMSG_WARN, "replication timeout to node %s (%d ms), base node "
                             "was %s with %d ms\n",
@@ -3321,8 +3340,11 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
              * seqnums can race against each other.  If we got a majority of 
              * these during the commit we are okay */
             if (was_durable && log_compare(&calc_lsn, &seqnum->lsn) < 0) {
-                logmsg(LOGMSG_ERROR, "ERROR: calculate_durable_lsn trails seqnum, "
-                                "but this is durable?\n");
+                logmsg(LOGMSG_ERROR,
+                       "ERROR: calculate_durable_lsn trails seqnum, "
+                       "but this is durable (%d:%d vs %d:%d)?\n",
+                       calc_lsn.file, calc_lsn.offset, seqnum->lsn.file,
+                       seqnum->lsn.offset);
             }
             logmsg(LOGMSG_USER, 
                 "Last txn was %s, tot_connected=%d tot_acked=%d, "
@@ -3769,6 +3791,10 @@ static int process_berkdb(bdb_state_type *bdb_state, char *host, DBT *control,
         } else
             bdb_setmaster(bdb_state, host);
 
+        if (gbl_dump_zero_coherency_timestamp) {
+            logmsg(LOGMSG_ERROR, "%s line %d zero'ing coherency timestamp\n",
+                   __func__, __LINE__);
+        }
         coherency_timestamp = 0;
         break;
 
@@ -3968,7 +3994,25 @@ uint8_t *colease_type_put(const colease_t *p_colease_type, uint8_t *p_buf,
     return p_buf;
 }
 
-uint64_t get_coherency_timestamp(void) { return coherency_timestamp; }
+uint64_t get_coherency_timestamp(void)
+{
+    uint64_t x = coherency_timestamp;
+    if (x == 0) {
+        static uint32_t lastpr;
+        static uint32_t zero_ts_count = 0;
+        uint32_t now;
+
+        zero_ts_count++;
+        if (gbl_dump_zero_coherency_timestamp &&
+            ((now = time(NULL)) - lastpr)) {
+            logmsg(LOGMSG_ERROR,
+                   "%s returning 0 coherency_timestamp, count=%u\n", __func__,
+                   zero_ts_count);
+            lastpr = now;
+        }
+    }
+    return x;
+}
 
 typedef struct start_lsn_response {
     uint32_t gen;
@@ -5446,6 +5490,20 @@ int bdb_debug_logreq(bdb_state_type *bdb_state, int file, int offset)
     return 0;
 }
 
+// Piggy-backing durable LSN requests doesn't work: each thread must make a
+// SEPARATE request for a durable LSN.  Here's the counter-example:
+//
+// 1. Thread A makes a request for a durable LSN from the master - it goes to
+//    the master and is stalled on it's way back to the replicant
+// 2. Thread B writes a record durably
+// 3. Thread C make a request for a durable LSN- instead of going to the master
+//    directly, it gloms onto the already outstanding durable LSN request, and
+//    retrieves the previous durable LSN
+//
+// .. Because Thread C started AFTER Thread B, it should see a durable LSN
+//    corresponding to B's writes
+//
+
 int request_durable_lsn_from_master(bdb_state_type *bdb_state, 
         uint32_t *durable_file, uint32_t *durable_offset, 
         uint32_t *durable_gen) {

diff --git a/berkdb/rep/rep_method.c b/berkdb/rep/rep_method.c
@@ -904,7 +904,8 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
 	rep = db_rep->region;
 	dblp = dbenv->lg_handle;
 
-	if (use_committed_gen == dbenv->attr.elect_highest_committed_gen) {
+	/* This sets 'use_committed_gen' and feature-tests simultaneously */
+	if ((use_committed_gen = dbenv->attr.elect_highest_committed_gen)) {
 		MUTEX_LOCK(dbenv, db_rep->rep_mutexp);
 		lsn = rep->committed_lsn;
 		MUTEX_UNLOCK(dbenv, db_rep->rep_mutexp);
@@ -991,12 +992,17 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
 	committed_gen = rep->committed_gen;
 	MUTEX_UNLOCK(dbenv, db_rep->rep_mutexp);
 
-	if (use_committed_gen)
+	if (use_committed_gen) {
+		logmsg(LOGMSG_USER, "%s line %d sending REP_GEN_VOTE1 from %s with committed-gen=%d\n",
+			__func__, __LINE__, *eidp, committed_gen);
 		__rep_send_gen_vote(dbenv, &lsn, nsites, priority, tiebreaker,
-		    egen, committed_gen, db_eid_broadcast, REP_GEN_VOTE1);
-	else
+			egen, committed_gen, db_eid_broadcast, REP_GEN_VOTE1);
+	} else {
+		logmsg(LOGMSG_USER, "%s line %d sending REP_VOTE1 from %s (committed-gen=0)\n",
+			__func__, __LINE__, *eidp);
 		__rep_send_vote(dbenv, &lsn, nsites, priority, tiebreaker, egen,
-		    db_eid_broadcast, REP_VOTE1);
+			db_eid_broadcast, REP_VOTE1);
+	}
 
 	ret = __rep_wait(dbenv, timeout, eidp, REP_F_EPHASE1);
 	switch (ret) {
@@ -1081,14 +1087,18 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
 			    send_vote != rep->eid)
 				__db_err(dbenv, "Sending vote");
 #endif
-			if (use_committed_gen)
+			if (use_committed_gen) {
+				logmsg(LOGMSG_USER, "%s line %d sending REP_GEN_VOTE2 from %s "
+						"with committed-gen=%d\n", __func__, __LINE__, *eidp,
+						committed_gen);
 				__rep_send_gen_vote(dbenv, NULL, 0, 0, 0, egen,
-				    committed_gen, send_vote, REP_GEN_VOTE2);
-
-			else
+					committed_gen, send_vote, REP_GEN_VOTE2);
+			} else {
+				logmsg(LOGMSG_USER, "%s line %d sending REP_VOTE2 from %s "
+						"(committed-gen=0)\n", __func__, __LINE__, *eidp);
 				__rep_send_vote(dbenv, NULL, 0, 0, 0, egen,
-				    send_vote, REP_VOTE2);
-
+					send_vote, REP_VOTE2);
+			}
 		}
 phase2:
 		ret = __rep_wait(dbenv, timeout, eidp, REP_F_EPHASE2);