Stale read with loss of un-fsynced writes #14140

Open
aphyr opened this issue Jun 21, 2022 · 6 comments
Labels
help wanted, priority/important-soon, release/v3.5, stage/tracked, type/bug

Comments

aphyr commented Jun 21, 2022

What happened?

In this Jepsen test, which involved the loss of un-fsynced writes (via lazyfs) in a five-node etcd 3.5.3 cluster, we observed a stale read (a linearizability violation) on a single key, despite not using serializable reads.

[Screenshot from 2022-06-21 11-26-07: fragment of the Jepsen history showing transactions T1–T4]

Numbering transactions top-to-bottom as T1, T2, T3, and T4: T3 used an etcd transaction, guarded by an equality comparison on revision, to append the value 16 to key 180. T4 observed T3's write: it read key 180 and saw a list ending in [... 15 16]. However, T2, which began executing almost five seconds after T4 completed (denoted by the rt edges), read key 180 and saw a previous value, ending in [... 15].
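For reference, a guarded append like T3's maps onto etcd's transaction API: compare the key's mod revision against the revision observed by a prior read, and write only if it still matches. A minimal etcdctl sketch of the shape of that transaction (the revision 1234 and the value are hypothetical; the test itself issues the equivalent gRPC transaction through Jepsen's etcd client):

# compares, success requests, and failure requests, separated by blank lines
etcdctl txn <<'EOF'
mod("180") = "1234"

put 180 "[... 15 16]"

EOF

If another write has bumped the revision in the meantime, the compare fails, the (empty) failure branch runs, and the client re-reads and retries.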

Again, this relies on lazyfs, which still has bugs we're sanding off. I'm not entirely confident in this finding, but I would like to register it in case y'all can reproduce it using your own tests. :-)

What did you expect to happen?

I expected reads to be linearizable, even with the loss of un-fsynced writes.

How can we reproduce it (as minimally and precisely as possible)?

With jepsen-io/etcd@181656b (https://github.com/jepsen-io/etcd), run:

lein run test-all -w append --rate 1000 --concurrency 4n --lazyfs --nemesis kill --time-limit 300 --test-count 10 --nemesis-interval 20

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.3
Git SHA: 0452feec7
Go Version: go1.16.15
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5

Etcd configuration (command line flags or environment variables)

etcd --enable-v2 --log-outputs stderr --logger zap --name n1 --listen-peer-urls http://192.168.122.101:2380/ --listen-client-urls http://192.168.122.101:2379/ --advertise-client-urls http://192.168.122.101:2379/ --initial-cluster-state new --initial-advertise-peer-urls http://192.168.122.101:2380/ --initial-cluster n1=http://192.168.122.101:2380,n2=http://192.168.122.102:2380,n3=http://192.168.122.103:2380,n4=http://192.168.122.104:2380,n5=http://192.168.122.105:2380 --snapshot-count 100 --experimental-initial-corrupt-check

Etcd debug information (please run commands below; feel free to obfuscate the IP address or FQDN in the output)


admin@ip-172-31-9-53:/opt/etcd$ ./etcdctl --endpoints http://172.31.9.53:2379  member list -w table
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |                    NAME                    |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+
| 6f0318b69646b10a | started |  ec2-54-234-45-183.compute-1.amazonaws.com | http://172.31.13.117:2380 | http://172.31.13.117:2379 |      false |
| 851bb95a115a0651 | started |   ec2-54-90-123-74.compute-1.amazonaws.com |   http://172.31.9.53:2380 |   http://172.31.9.53:2379 |      false |
| 8ce655fee64e8888 | started |    ec2-3-88-142-40.compute-1.amazonaws.com |  http://172.31.7.132:2380 |  http://172.31.7.132:2379 |      false |
| 9a228ee3e4899e1e | started | ec2-54-167-119-185.compute-1.amazonaws.com |   http://172.31.9.60:2380 |   http://172.31.9.60:2379 |      false |
| dfed32f696db96d2 | started |  ec2-54-204-125-79.compute-1.amazonaws.com | http://172.31.12.135:2380 | http://172.31.12.135:2379 |      false |
+------------------+---------+--------------------------------------------+---------------------------+---------------------------+------------+

admin@ip-172-31-9-53:/opt/etcd$ {"level":"warn","ts":"2022-06-13T14:48:00.119Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004fa1c0/n1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint http://n4:2379 (context deadline exceeded)
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://n1:2379 | 851bb95a115a0651 |   3.5.3 |  164 kB |     false |      false |         2 |       2302 |               2299 |        |
| http://n2:2379 | 6f0318b69646b10a |   3.5.3 |  164 kB |     false |      false |         3 |       2306 |               2306 |        |
| http://n5:2379 | 8ce655fee64e8888 |   3.5.3 |  168 kB |     false |      false |         3 |       4966 |               4966 |        |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
{"level":"warn","ts":"2022-06-13T14:47:38.147Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002fee00/n1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to get the status of endpoint http://n4:2379 (context deadline exceeded)


Relevant log output

No response

aphyr added the type/bug label Jun 21, 2022
serathius added the priority/important-soon label Jun 22, 2022
@serathius (Member)

Based on my discussion with @endocrimes, this is also reproducible on v3.4.

@serathius (Member)

I have reproduced the loss of un-fsynced writes in my own testing using lazyfs, just running a single-node cluster with lazyfs underneath. Killing etcd and then issuing "lazyfs::clear-cache" results in a large rollback of etcd state (tens of transactions).
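A minimal sketch of that reproduction, assuming lazyfs is mounted over the etcd data directory (the control FIFO path comes from the lazyfs config; /tmp/faults.fifo and /var/lib/etcd here are placeholders):

# kill etcd without giving it a chance to flush
pkill -9 etcd

# tell lazyfs to drop its cache, discarding every write that was never fsynced
echo "lazyfs::clear-cache" > /tmp/faults.fifo

# restart etcd on the rolled-back data dir and compare its state/revision
etcd --data-dir /var/lib/etcd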

@ramses @aphyr How much confidence do we have in the lazyfs implementation? Has lazyfs been successfully used to find similar issues in other databases?

@endocrimes (Contributor)

@serathius FWIW, the recent single-node data-loss issues are a result of the same bug: we accept writes before they are actually flushed to disk.

@serathius (Member)

@endocrimes I don't understand what you mean. Do you attribute this issue to #14370?
In my tests I verified that the issue exists on the main branch and on the v3.5.5 release, both of which should have #14370 fixed.

@serathius (Member)

Please disregard my comment above; I found an issue in my setup. I'm still working on reproducing this one.

ahrtr (Member) commented Aug 26, 2024

The data-loss issue #14370 affected only single-member clusters (and was already fixed in 3.5.5), whereas this issue was reproduced in a multi-member cluster, per the description above. So they are unrelated.

I think we can close this ticket:

  • We haven't reproduced this issue at any point in the past couple of years.
  • Even the reporter wasn't fully confident in lazyfs.

Also, it's hard to debug this "issue" without the transaction details (i.e. #14890 (comment)) and a data bundle (i.e. #14890 (comment)).
