Stale read with loss of un-fsynced writes #14140

Open
aphyr opened this issue Jun 21, 2022 · 6 comments
Labels
help wanted, priority/important-soon, release/v3.5, stage/tracked, type/bug

Comments

aphyr commented Jun 21, 2022

What happened?

In this Jepsen test, which involved the loss of un-fsynced writes (via lazyfs) in a five-node etcd 3.5.3 cluster, we observed a stale read (a linearizability violation) on a single key, despite not using serializable reads.

[Screenshot from 2022-06-21 11-26-07: fragment of the Jepsen history showing transactions T1–T4]

Numbering transactions top-to-bottom as T1, T2, T3, and T4: T3 used an etcd transaction, guarded by an equality comparison on revision, to append the value 16 to key 180. T4 observed T3's write: it read key 180 and saw a list ending in [... 15 16]. However, T2, which began executing almost five seconds after T4 completed (denoted by the rt edges), read key 180 and saw a previous value, ending in [... 15].
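For reference, a guarded append like T3's maps onto etcd's transaction API: compare the key's mod revision against the revision observed by a prior read, and write only if it still matches. A minimal etcdctl sketch of the shape of that transaction (the revision 1234 and the value are hypothetical; the test itself issues the equivalent gRPC transaction through Jepsen's etcd client):

# compares, success requests, and failure requests, separated by blank lines
etcdctl txn <<'EOF'
mod("180") = "1234"

put 180 "[... 15 16]"

EOF

If another write has bumped the revision in the meantime, the compare fails, the (empty) failure branch runs, and the client re-reads and retries.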

Again, this relies on lazyfs, which still has bugs we're sanding off. I'm not entirely confident in this finding, but I would like to register it in case y'all can reproduce it using your own tests. :-)

What did you expect to happen?

I expected reads to be linearizable, even with the loss of un-fsynced writes.

How can we reproduce it (as minimally and precisely as possible)?

With jepsen-io/etcd@181656b (https://github.com/jepsen-io/etcd), run:

lein run test-all -w append --rate 1000 --concurrency 4n --lazyfs --nemesis kill --time-limit 300 --test-count 10 --nemesis-interval 20

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.3
Git SHA: 0452feec7
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5

Etcd configuration (command line flags or environment variables)

etcd --enable-v2 --log-outputs stderr --logger zap --name n1 --listen-peer-urls http://192.168.122.101:2380/ --listen-client-urls http://192.168.122.101:2379/ --advertise-client-urls http://192.168.122.101:2379/ --initial-cluster-state new --initial-advertise-peer-urls http://192.168.122.101:2380/ --initial-cluster n1=http://192.168.122.101:2380,n2=http://192.168.122.102:2380,n3=http://192.168.122.103:2380,n4=http://192.168.122.104:2380,n5=http://192.168.122.105:2380 --snapshot-count 100 --experimental-initial-corrupt-check

Etcd debug information (please run commands below; feel free to obfuscate the IP address or FQDN in the output)


admin@ip-172-31-9-53:/opt/etcd$ ./etcdctl --endpoints http://172.31.9.53:2379  member list -w table
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |                    NAME                    |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+
| 6f0318b69646b10a | started |  ec2-54-234-45-183.compute-1.amazonaws.com | http://172.31.13.117:2380 | http://172.31.13.117:2379 |      false |
| 851bb95a115a0651 | started |   ec2-54-90-123-74.compute-1.amazonaws.com |   http://172.31.9.53:2380 |   http://172.31.9.53:2379 |      false |
| 8ce655fee64e8888 | started |    ec2-3-88-142-40.compute-1.amazonaws.com |  http://172.31.7.132:2380 |  http://172.31.7.132:2379 |      false |
| 9a228ee3e4899e1e | started | ec2-54-167-119-185.compute-1.amazonaws.com |   http://172.31.9.60:2380 |   http://172.31.9.60:2379 |      false |
| dfed32f696db96d2 | started |  ec2-54-204-125-79.compute-1.amazonaws.com | http://172.31.12.135:2380 | http://172.31.12.135:2379 |      false |
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+

admin@ip-172-31-9-53:/opt/etcd$ {"level":"warn","ts":"2022-06-13T14:48:00.119Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004fa1c0/n1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint http://n4:2379 (context deadline exceeded)
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://n1:2379 | 851bb95a115a0651 |   3.5.3 |  164 kB |     false |      false |         2 |       2302 |               2299 |        |
| http://n2:2379 | 6f0318b69646b10a |   3.5.3 |  164 kB |     false |      false |         3 |       2306 |               2306 |        |
| http://n5:2379 | 8ce655fee64e8888 |   3.5.3 |  168 kB |     false |      false |         3 |       4966 |               4966 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
{"level":"warn","ts":"2022-06-13T14:47:38.147Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002fee00/n1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint http://n4:2379 (context deadline exceeded)


Relevant log output

No response

aphyr added the type/bug label Jun 21, 2022
serathius added the priority/important-soon label Jun 22, 2022
@serathius (Member)

Based on my discussion with @endocrimes, this is also reproducible on v3.4.

@serathius (Member)

I have reproduced the loss of un-fsynced writes in my own testing using lazyfs, just running a single-node cluster with lazyfs underneath. Killing etcd and then issuing "lazyfs::clear-cache" results in a large rollback of etcd state (tens of transactions).
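A minimal sketch of that reproduction, assuming lazyfs is mounted over the etcd data directory (the control FIFO path comes from the lazyfs config; /tmp/faults.fifo and /var/lib/etcd here are placeholders):

# kill etcd without giving it a chance to flush
pkill -9 etcd

# tell lazyfs to drop its cache, discarding every write that was never fsynced
echo "lazyfs::clear-cache" > /tmp/faults.fifo

# restart etcd on the rolled-back data dir and compare its state/revision
etcd --data-dir /var/lib/etcd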

@ramses @aphyr How much confidence do we have in the lazyfs implementation? Has lazyfs been successfully used to find similar issues in other databases?

@endocrimes (Contributor)

@serathius FWIW, the recent single-node data-loss issues are a result of the same bug: we accept writes before they are actually flushed to disk.

@serathius (Member)

@endocrimes I don't understand what you mean. Do you attribute this issue to #14370?
In my tests I verified that the issue exists on the main branch and on the v3.5.5 release, both of which should have #14370 fixed.

@serathius (Member)

Please disregard my comment above; I found an issue in my setup. I'm still working on reproducing this one.

ahrtr (Member) commented Aug 26, 2024

The data-loss issue #14370 affected only single-member clusters (and was already fixed in 3.5.5), whereas this issue was reproduced in a multi-member cluster, per the description above. So they are unrelated.

I think we can close this ticket:

  • We haven't reproduced this issue at any point in the past couple of years.
  • Even the reporter wasn't fully confident in lazyfs.

Also, it's hard to debug this "issue" without the transaction details (i.e. #14890 (comment)) and a data bundle (i.e. #14890 (comment)).
