Skip to content

Commit

Permalink
Port linearizable tests to testsuite (bloomberg#328)
Browse files Browse the repository at this point in the history
* Commit this so I can pull from AWS

* Test fixes

* Fix register test for testsuite

* Fix horrible jdbc bug, add jepsen tests

* Disable debug trace

* Jepsen tests to normal testsuite

* Add kill-cluster test

* Fix killcluster test, add sigstopcluster test

* Disable debug trace in cinsert test

* Redirect stdin for ssh commands in clustered tests

* Fix cnonce checking

* Changes to clustered tests

* Piggy back to reduce the number of durable lsn requests

* Static lsns and gen are not pointers, fix setup merge bug

* Piggy backing doesn't work: add comment explaining why

* Add testloop

* Enable wait-for-seqnum trace for all linearizable tests, fix wait-for-seqnum trace bugs.

* Clang formatting

* Enable genid48 for linearizability tests

* Don't allow log_puts if we are a candidate

* Fix tabs

* Modify makefile so that we can debug setup

* Jepsen: use CLUSTER env var for cluster nodes, when available

* gitignore: don't track temporary Jepsen files

* jepsen register nemesis test: don't use directio

This breaks tests within containers; we're disabling it here. I don't
think these tests put that much stress on the IO subsystem, and even if
they do, Jepsen *likes* slow disks. Helps us catch things. :)

* Jepsen tests: fix typos in cluster-nodes

* Jepsen: basic cleanup, formatting, notes for later

No semantic changes here; just doing some basic formatting, docstrings,
and making notes about things I don't understand or would like to fix
later.

* Jepsen tests: connect to specific (e.g. all) nodes instead of arbitrary ones.

* Jepsen: remove copy of knossos source

There was a local copy of the knossos source here, which existed so that
we could have a modified version of model.clj with a custom model for
the comdb2 tests. This commit moves that custom model into the Jepsen
test, returning to using knossos as a library.

* Jepsen: remove duplicate copy of jepsen source

We had a copy of the Jepsen source tree checked in locally,
overriding the library version. I don't... think we actually need it
here. If anyone would like to make changes, they can open namespaces and
redefine specific functions as needed, without copying the full
namespace.

* Jepsen: upgrade from 0.1.4-SNAPSHOT to 0.1.6

* Jepsen: stateful conn wrapper for register test

There's an issue with jdbc conn reuse that could allow a crashed
operation's transactional scope to be recovered by a subsequent
invocation. We now close the JDBC conn whenever throwing an error, and
retry with a fresh connection.

We also politely recover from transactions where a connection fails in
the initial setup phase of the txn, and introduce a five-second delay to
avoid spamming down nodes with requests.

* Jepsen: simplify and refactor linearizable client

No major changes to semantics, but we've rewritten the register client
to have unified error handling across all 3 branches, collapsed nested
let statements, removed superfluous logging, added UIDs to completion
operations, and removed code that replaced keys in completion operations
with the constant 1 instead of the requested key, which I think was a
mistake.

* Jepsen: upgrade to clj-jdbc 0.6.1, add timeouts everywhere

A lack of timeouts allowed tests to block indefinitely when a node went
unresponsive. We now have a default 5 second timeout on all SQL queries,
which significantly improves test responsiveness.

* Verify we are setting sqlreponse.error_code to a valid enum

* Jepsen: generalize register test to multiple keys

In general, this lets us test more things faster for indefinite time
periods, without crashing. This allows us to drop the custom register
model and simplify the generators: we use jepsen.independent now to
generalize those for us.  Also fixed a client bug that wrote to all rows
instead of just the given one. Also fixed a bug in the generator which
made the test take longer than it needed to, and only emit 10 seconds of
operations instead of ops over the whole test.

* Jepsen: linearizable client's UIDs are now sequential instead of random

* Fix startup election race

* Jepsen: remove spurious transactions from register test

The register client opened transactions and performed reads prior to
writes. We no longer read before writing, which gives the database fewer
chances to establish locks or visibility barriers. It also simplifies
the code significantly.

* Setup: log to /tmp/comdb.log, so Jepsen can provide log bundles

* Jepsen: extensive refactoring of all tests

Focused primarily on making the register test more rigorous, but also
refactored the bank, set, and dirty-reads tests as well. Things are...
still a bit of a mess but mostly functional. There's a lagging issue in
the register test causing it to hang at the end of the run, which I
haven't tracked down yet.

* Kyle changes

* Recheck master at failed sync-up

* Various fixes, create breakloop to nemesize the network outside of a test.

* Use msghd.seqnum: netinfo_ptr->seqnum can change after we release the mutex (bloomberg#445)

* Check for master on every iteration of the final sync-up loop

* Add min_retries flag to cdb2sql & move min & max retry calls to before open

* Add logic to cdb2sql for kyle_branch

* Change add-record wait-for-seqnum to adaptive version

* Fix broken merge in final-syncup, flush breakloop output at nl

* This isn't working: commit & pull from the nodes to debug in gdb

* Aha- this should allow it.

* Additional lrl options

* Commit to pull on nodes

* Commit to push to nodes

* Commit to move to all nodes

* Commit

* Perform a master-request if a replicant gets a REP_LOG during election.

* Short-circuit dummy_add and commit if lock is desired

* Log coherency lease changes on a tunable.

* We go through weird periods where replication to one node is slow- see if it goes away by disabling udp-acks

* Slow-replicant check breaks us for Kyle's tests

* A combination of bug-fixes and tuning has gotten us consistently in the 2 - 15 second recovery range.

* Allow tester to set timeout from the environment

* Port test to cloud machine.

* color-code nemesis

* Fix dumb print bug

* Look at test-file before fixing partition.  Print average wait time.

* Add iteration count to status line

* Jepsen tests: fix nemesis/non-nemesis scheduling.

* Bank test: reorganize a bit

* tests/tools/get_tests_inorder.sh: fix missing quotes in test expression

* Jepsen: G2 test

* Jepsen: sketch out an A6 test

* Jepsen: A6 tuning

* Jepsen: experiment with a6 queries by primary key

* Jepsen: clean up unused read path in g2 client

* Add NOTSERIAL as a valid protobuf error_code

* Jepsen: break out a6 test into its own namespace.

* .gitignore: don't ignore all comdb2 directories; just the root binary

* Jepsen: bank client reconnects during initial account creation

* Jepsen: add atomic writes test

* Jepsen: remove superfluous try in bank test

* Jepsen: atomic test makefiles

* Break out bank test, debug agonizing insertion issue

* Jepsen: break out g2 test

* Jepsen: clean up a6 test invocation

* Jepsen: break out sets test

* Jepsen: break out dirty reads test

* Jepsen: break out register test

* Jepsen: get rid of spurious test selectors

* Jepsen: command line runner

You can now run

`lein run serve`

in the jepsen directory to launch a web server for browsing test
results, constructing zipfiles with analyses, graphs, and logs, and so
on.

Running tests no longer requires a test selector and leiningen test in
`jepsen/tests`; instead you add a small map to `workloads` in cli.clj,
telling Jepsen what the name of that workload is and how to run it. Then
run

`lein run test --workload register`

to run, for instance, the register test. You can also configure the
nemesis, concurrency, and runtime for any test:

`lein run test --workload g2 --concurrency 5n --time-limit 120`

The makefile test targets have been updated to invoke Jepsen tests using
this style, instead of the old `lein test` command.

* Jepsen: allllll kinds of fixes for the dirty reads test; actually measures stuff now

* fix the connection bug that kyle & I were seeing

* Allow set-able cursordebug from an sql session

* Make jepsen-tests setup-able

* Small script changes

* Uhh.. somehow this WORKS now.. ?  Maybe replaceable params were only part of the issue ..

* Fix java api bug (that looked like a bound parameter issue)

* Let's be more explicit about this

* This was a bug- you can't set hasql inside of a transaction.  We seem to pass the G2 test now.

* Tweak test-script

* Fix atomic writes test (can't set hasql within a transaction)

* Re-order a few more hasqls

* Fix jepsen_atomic_writes

* Make statement level effects the default behavior for jdbc

* We are now passing ALL of Kyle's tests

* Formatting

* Make init_with_genid48 default

* Send correct vote-type if elect_highest_committed_gen is enabled

* Undo non-changes

* Revert this

* Revert another NON-change

* Remove unneeded formatting

* Remove nSetsQuerySent, don't run jepsen tests as part of the normal testsuite.

* Use correct string size

* Allow a user to override test timeouts.  Allow testsuite to core the database on timeout if CORE_ON_TIMEOUT is set.

* Remove unused variable

* Port ctest executables to tools

* This code has been dead for 5 or 6 years .. don't rely on a dumb define, remove it completely

* Can't tell what's going on with the core - leave the database up & runnning

* Run cinsert_linearizable and register_linearizable from testloop

* quick cleanup for script

* Commit

* Make get_tests_inorder.sh aware of configurable TEST_TIMEOUT

* clang formatting

* Fix tunables test
  • Loading branch information
markhannum authored Nov 1, 2017
1 parent 3a446fd commit 2fe8c21
Show file tree
Hide file tree
Showing 304 changed files with 6,844 additions and 9,717 deletions.
5 changes: 4 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
#######################################################
.PHONY: test_tools
test_tools: compat_install
@cd build && $(MAKE) -s -j blob bound cdb2_client cdb2api_caller cdb2bind comdb2_blobtest comdb2_sqltest crle hatest insert_lots_mt leakcheck localrep overflow_blobtest ptrantest recom selectv serial sicountbug sirace simple_ssl stepper utf8
@cd build && $(MAKE) -s -j blob bound cdb2_client cdb2api_caller cdb2bind comdb2_blobtest comdb2_sqltest crle hatest insert_lots_mt leakcheck localrep overflow_blobtest ptrantest recom selectv serial sicountbug sirace simple_ssl stepper utf8 insert register breakloop
@ln -f build/tests/tools/blob tests/bloballoc.test/blob
@ln -f build/tests/tools/bound tests/tools/bound
@ln -f build/tests/tools/cdb2_client tests/cdb2api_so.test/cdb2_client
Expand All @@ -26,6 +26,9 @@ test_tools: compat_install
@ln -f build/tests/tools/simple_ssl tests/simple_ssl.test/simple_ssl
@ln -f build/tests/tools/stepper tests/tools/stepper
@ln -f build/tests/tools/utf8 tests/tools/utf8
@ln -f build/tests/tools/insert tests/tools/insert
@ln -f build/tests/tools/register tests/tools/register
@ln -f build/tests/tools/breakloop tests/tools/breakloop

.PHONY: compat_install
compat_install: all
Expand Down
12 changes: 8 additions & 4 deletions bdb/bdb_net.c
Original file line number Diff line number Diff line change
Expand Up @@ -748,6 +748,8 @@ int send_myseqnum_to_master_udp(bdb_state_type *bdb_state)
return rc;
}

int gbl_verbose_send_coherency_lease;

void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
int *inc_wait)
{
Expand Down Expand Up @@ -793,7 +795,8 @@ void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
/* Assume disconnected node(s) are incoherent */
*inc_wait = 1;

if (last_count != count || (now = time(NULL)) - lastpr) {
if (gbl_verbose_send_coherency_lease &&
(last_count != count || (now = time(NULL)) - lastpr)) {
char *machs = (char *)malloc(1);
int machs_len = 0;
machs[0] = '\0';
Expand Down Expand Up @@ -863,9 +866,10 @@ void send_coherency_leases(bdb_state_type *bdb_state, int lease_time,
} else {
static time_t lastpr = 0;
time_t now;
if ((now = time(NULL)) - lastpr) {
logmsg(LOGMSG_INFO, "%s: not sending to %s\n", __func__,
hostlist[i]);
if (gbl_verbose_send_coherency_lease &&
(now = time(NULL)) - lastpr) {
logmsg(LOGMSG_ERROR, "%s: not sending to %s\n", __func__,
hostlist[i]);
lastpr = now;
}
}
Expand Down
6 changes: 6 additions & 0 deletions bdb/cursor.c
Original file line number Diff line number Diff line change
Expand Up @@ -7686,6 +7686,12 @@ int bdb_osql_trak(char *sql, unsigned int *status)
*status |= SQL_DBG_SHADOW;
return 0;
}

if (strncasecmp(sql, "OFF", 3) == 0) {
*status = 0;
return 0;
}

return -1;
}

Expand Down
26 changes: 13 additions & 13 deletions bdb/file.c
Original file line number Diff line number Diff line change
Expand Up @@ -3169,21 +3169,24 @@ if (!is_real_netinfo(bdb_state->repinfo->netinfo))
}
}

int tmpnode;
int attempts = bdb_state->attr->startup_sync_attempts;
uint8_t *p_buf = (uint8_t *)&tmpnode;
uint8_t *p_buf_end = ((uint8_t *)&tmpnode + sizeof(int));

/*
PHASE 4:
finally now that we believe we are caught up and are no longer lying
about our LSN to the master, lets ask the master to force the LSN
forward and wait for us to reach the same LSN. when we pass this
phase, we are truly cache coherent.
*/

again:
buf_put(&(bdb_state->repinfo->master_host), sizeof(int), p_buf, p_buf_end);

if (bdb_state->repinfo->master_host != myhost) {
int tmpnode, attempts = bdb_state->attr->startup_sync_attempts;
uint8_t *p_buf = (uint8_t *)&tmpnode;
uint8_t *p_buf_end = ((uint8_t *)&tmpnode + sizeof(int));

again:
buf_put(&(bdb_state->repinfo->master_host), sizeof(int), p_buf,
p_buf_end);
/* now we have the master checkpoint and WAIT for us to ack the seqnum,
thus making sure we are actually LIVE */
rc = net_send_message(
Expand Down Expand Up @@ -4719,13 +4722,10 @@ static int bdb_downgrade_int(bdb_state_type *bdb_state, int noelect,
*downgraded = 0;

retries = 0;
while (!bdb_state->repinfo->upgrade_allowed) {
if (++retries > 100) {
logmsg(LOGMSG_DEBUG, "bdb_downgrade: not allowed (bdb_open has not "
"completed yet)\n");
return 0;
}
poll(NULL, 0, 100);
if (!bdb_state->repinfo->upgrade_allowed) {
logmsg(LOGMSG_DEBUG, "bdb_downgrade: not allowed (bdb_open has not "
"completed yet)\n");
return 0;
}

/* if we were passed a child, find his parent */
Expand Down
10 changes: 8 additions & 2 deletions bdb/llmeta.c
Original file line number Diff line number Diff line change
Expand Up @@ -2655,6 +2655,11 @@ int bdb_add_dummy_llmeta(void)
}

retry:
if (bdb_lock_desired(llmeta_bdb_state->parent)) {
logmsg(LOGMSG_ERROR, "%s short-circuiting because bdb_lock_desired\n",
__func__);
return -1;
}
tran = bdb_tran_begin(llmeta_bdb_state, NULL, &bdberr);
if (tran == NULL)
goto fail;
Expand All @@ -2680,8 +2685,9 @@ int bdb_add_dummy_llmeta(void)
&bdberr);

if (rc == 0) {
tran = NULL;
rc = bdb_wait_for_seqnum_from_all(llmeta_bdb_state, &ss);
int timeoutms;
rc = bdb_wait_for_seqnum_from_all_adaptive_newcoh(llmeta_bdb_state, &ss,
0, &timeoutms);
}
// rc = bdb_tran_commit(llmeta_bdb_state, tran, &bdberr);
if (rc && bdberr != BDBERR_NOERROR) {
Expand Down
66 changes: 62 additions & 4 deletions bdb/rep.c
Original file line number Diff line number Diff line change
Expand Up @@ -1724,6 +1724,8 @@ void net_newnode_rtn(netinfo_type *netinfo_ptr, char *hostname, int portnum)

/* Timestamp of when our coherency lease expires on replicant */
uint64_t coherency_timestamp = 0;
int gbl_dump_zero_coherency_timestamp;

char coherency_master[128] = {0};

/* Don't let anything commit on the master until after this */
Expand Down Expand Up @@ -1772,6 +1774,8 @@ typedef struct {
char *host;
} hostdown_type;

int gbl_reset_on_unelectable_cluster = 1;

void *hostdown_thread(void *arg)
{
bdb_state_type *bdb_state;
Expand Down Expand Up @@ -1812,7 +1816,7 @@ void *hostdown_thread(void *arg)
print(bdb_state, "master is %s we are %s\n", master_host,
bdb_state->repinfo->myhost);

{
if (gbl_reset_on_unelectable_cluster) {
int num_up, num_connected, electable;

print(bdb_state, "xxx master is %d we are %d\n", master_host,
Expand Down Expand Up @@ -3090,6 +3094,13 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
rc = bdb_wait_for_seqnum_from_node_int(bdb_state, seqnum,
nodelist[i], 1000, __LINE__);

if (bdb_lock_desired(bdb_state)) {
logmsg(LOGMSG_ERROR,
"%s line %d early exit because lock-is-desired\n",
__func__, __LINE__);
return (durable_lsns ? BDBERR_NOT_DURABLE : -1);
}

if (rc == 0) {
base_node = nodelist[i];
num_successfully_acked++;
Expand Down Expand Up @@ -3165,6 +3176,14 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
rc = bdb_wait_for_seqnum_from_node_int(bdb_state, seqnum, nodelist[i],
waitms, __LINE__);

if (bdb_lock_desired(bdb_state)) {
logmsg(LOGMSG_ERROR,
"%s line %d early exit because lock-is-desired\n", __func__,
__LINE__);

return (durable_lsns ? BDBERR_NOT_DURABLE : -1);
}

if (rc == -999) {
logmsg(LOGMSG_WARN, "replication timeout to node %s (%d ms), base node "
"was %s with %d ms\n",
Expand Down Expand Up @@ -3321,8 +3340,11 @@ static int bdb_wait_for_seqnum_from_all_int(bdb_state_type *bdb_state,
* seqnums can race against each other. If we got a majority of
* these during the commit we are okay */
if (was_durable && log_compare(&calc_lsn, &seqnum->lsn) < 0) {
logmsg(LOGMSG_ERROR, "ERROR: calculate_durable_lsn trails seqnum, "
"but this is durable?\n");
logmsg(LOGMSG_ERROR,
"ERROR: calculate_durable_lsn trails seqnum, "
"but this is durable (%d:%d vs %d:%d)?\n",
calc_lsn.file, calc_lsn.offset, seqnum->lsn.file,
seqnum->lsn.offset);
}
logmsg(LOGMSG_USER,
"Last txn was %s, tot_connected=%d tot_acked=%d, "
Expand Down Expand Up @@ -3769,6 +3791,10 @@ static int process_berkdb(bdb_state_type *bdb_state, char *host, DBT *control,
} else
bdb_setmaster(bdb_state, host);

if (gbl_dump_zero_coherency_timestamp) {
logmsg(LOGMSG_ERROR, "%s line %d zero'ing coherency timestamp\n",
__func__, __LINE__);
}
coherency_timestamp = 0;
break;

Expand Down Expand Up @@ -3968,7 +3994,25 @@ uint8_t *colease_type_put(const colease_t *p_colease_type, uint8_t *p_buf,
return p_buf;
}

uint64_t get_coherency_timestamp(void) { return coherency_timestamp; }
uint64_t get_coherency_timestamp(void)
{
uint64_t x = coherency_timestamp;
if (x == 0) {
static uint32_t lastpr;
static uint32_t zero_ts_count = 0;
uint32_t now;

zero_ts_count++;
if (gbl_dump_zero_coherency_timestamp &&
((now = time(NULL)) - lastpr)) {
logmsg(LOGMSG_ERROR,
"%s returning 0 coherency_timestamp, count=%u\n", __func__,
zero_ts_count);
lastpr = now;
}
}
return x;
}

typedef struct start_lsn_response {
uint32_t gen;
Expand Down Expand Up @@ -5446,6 +5490,20 @@ int bdb_debug_logreq(bdb_state_type *bdb_state, int file, int offset)
return 0;
}

// Piggy-backing durable LSN requests doesn't work: each thread must make a
// SEPARATE request for a durable LSN. Here's the counter-example:
//
// 1. Thread A makes a request for a durable LSN from the master - it goes to
// the master and is stalled on it's way back to the replicant
// 2. Thread B writes a record durably
// 3. Thread C make a request for a durable LSN- instead of going to the master
// directly, it gloms onto the already outstanding durable LSN request, and
// retrieves the previous durable LSN
//
// .. Because Thread C started AFTER Thread B, it should see a durable LSN
// corresponding to B's writes
//

int request_durable_lsn_from_master(bdb_state_type *bdb_state,
uint32_t *durable_file, uint32_t *durable_offset,
uint32_t *durable_gen) {
Expand Down
32 changes: 21 additions & 11 deletions berkdb/rep/rep_method.c
Original file line number Diff line number Diff line change
Expand Up @@ -904,7 +904,8 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
rep = db_rep->region;
dblp = dbenv->lg_handle;

if (use_committed_gen == dbenv->attr.elect_highest_committed_gen) {
/* This sets 'use_committed_gen' and feature-tests simultaneously */
if ((use_committed_gen = dbenv->attr.elect_highest_committed_gen)) {
MUTEX_LOCK(dbenv, db_rep->rep_mutexp);
lsn = rep->committed_lsn;
MUTEX_UNLOCK(dbenv, db_rep->rep_mutexp);
Expand Down Expand Up @@ -991,12 +992,17 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
committed_gen = rep->committed_gen;
MUTEX_UNLOCK(dbenv, db_rep->rep_mutexp);

if (use_committed_gen)
if (use_committed_gen) {
logmsg(LOGMSG_USER, "%s line %d sending REP_GEN_VOTE1 from %s with committed-gen=%d\n",
__func__, __LINE__, *eidp, committed_gen);
__rep_send_gen_vote(dbenv, &lsn, nsites, priority, tiebreaker,
egen, committed_gen, db_eid_broadcast, REP_GEN_VOTE1);
else
egen, committed_gen, db_eid_broadcast, REP_GEN_VOTE1);
} else {
logmsg(LOGMSG_USER, "%s line %d sending REP_VOTE1 from %s (committed-gen=0)\n",
__func__, __LINE__, *eidp);
__rep_send_vote(dbenv, &lsn, nsites, priority, tiebreaker, egen,
db_eid_broadcast, REP_VOTE1);
db_eid_broadcast, REP_VOTE1);
}

ret = __rep_wait(dbenv, timeout, eidp, REP_F_EPHASE1);
switch (ret) {
Expand Down Expand Up @@ -1081,14 +1087,18 @@ __rep_elect(dbenv, nsites, priority, timeout, eidp)
send_vote != rep->eid)
__db_err(dbenv, "Sending vote");
#endif
if (use_committed_gen)
if (use_committed_gen) {
logmsg(LOGMSG_USER, "%s line %d sending REP_GEN_VOTE2 from %s "
"with committed-gen=%d\n", __func__, __LINE__, *eidp,
committed_gen);
__rep_send_gen_vote(dbenv, NULL, 0, 0, 0, egen,
committed_gen, send_vote, REP_GEN_VOTE2);

else
committed_gen, send_vote, REP_GEN_VOTE2);
} else {
logmsg(LOGMSG_USER, "%s line %d sending REP_VOTE2 from %s "
"(committed-gen=0)\n", __func__, __LINE__, *eidp);
__rep_send_vote(dbenv, NULL, 0, 0, 0, egen,
send_vote, REP_VOTE2);

send_vote, REP_VOTE2);
}
}
phase2:
ret = __rep_wait(dbenv, timeout, eidp, REP_F_EPHASE2);
Expand Down
Loading

0 comments on commit 2fe8c21

Please sign in to comment.