
Can't start second etcd cluster #12623

Closed

vladloskut opened this issue Jan 14, 2021 · 11 comments
@vladloskut

Hello, I am trying to create a second etcd cluster. Here is my infrastructure:

ETCD cluster 1:

Node #1 PG + patroni + etcd
Node #2 PG + patroni + etcd
Node #3 etcd only

ETCD cluster 2:

Node #4 PG + patroni + etcd
Node #5 PG + patroni + etcd
Node #3 etcd only

So, as you can see, the second cluster uses node #3 to form a quorum.

ETCD cluster 1 runs with no problems, but when I try to launch the second cluster I get the following error on nodes #4 and #5:

request cluster ID mismatch (got A want B)

I did a Google search but couldn't find anything on how to run two ETCD clusters.

For ETCD cluster 2 I changed the ports on node #3 and also created a separate systemd service.
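For reference, the separate service is just a second systemd unit pointing at its own environment file; a minimal sketch (the unit name and file paths here are illustrative, not my exact ones):

[Unit]
Description=etcd instance for cluster 2
After=network-online.target

[Service]
Type=notify
# ETCD_* variables for cluster 2 (its own ports, data dir, and cluster token)
EnvironmentFile=/etc/etcd/etcd2.conf
ExecStart=/usr/local/bin/etcd
Restart=on-failure

[Install]
WantedBy=multi-user.target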

Please help me

@ptabor (Contributor) commented Jan 14, 2021

Please paste here the exact (possibly obfuscated, but representative) command lines you use to run the etcd instances, and the exact error message.

@vladloskut (Author)

ETCD_NAME="pg_node_1"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.135:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.135:2379"
ETCD_LISTEN_PEER_URLS="http://10.105.241.135:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.135:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_1"
ETCD_INITIAL_CLUSTER="pg_node_1=http://10.105.241.135:2380,pg_node_2=http://10.105.241.137:2380,etcd_node_only=http://10.105.241.142:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

ETCD_NAME="pg_node_2"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.137:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.137:2379"
ETCD_LISTEN_PEER_URLS="http://10.105.241.137:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.137:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_1"
ETCD_INITIAL_CLUSTER="pg_node_1=http://10.105.241.135:2380,pg_node_2=http://10.105.241.137:2380,etcd_node_only=http://10.105.241.142:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

Service #1 on node #3:

ETCD_NAME="etcd_node_only"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.142:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.142:2379"
ETCD_LISTEN_PEER_URLS="http://10.105.241.142:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.142:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_1"
ETCD_INITIAL_CLUSTER="pg_node_1=http://10.105.241.135:2380,pg_node_2=http://10.105.241.137:2380,etcd_node_only=http://10.105.241.142:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

Service #2 on node #3:

ETCD_NAME="etcd_node_only_2"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.142:2378,http://127.0.0.1:2378"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.142:2378"
ETCD_LISTEN_PEER_URLS="http://10.105.241.142:2381"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.142:2381"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_2"
ETCD_INITIAL_CLUSTER="pg_node_1=http://10.105.241.135:2380,pg_node_2=http://10.105.241.137:2380,etcd_node_only_2=http://10.105.241.142:2381"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd_2"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

ETCD_NAME="db_node_3"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.119:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.119:2379"
ETCD_LISTEN_PEER_URLS="http://10.105.241.119:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.119:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_2"
ETCD_INITIAL_CLUSTER="db_node_3=http://10.105.241.119:2380,db_node_4=http://10.105.241.120:2380,etcd_node_only_2=http://10.105.241.142:2381"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/another_etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

ETCD_NAME="db_node_4"
ETCD_LISTEN_CLIENT_URLS="http://10.105.241.119:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.105.241.119:2379"
ETCD_LISTEN_PEER_URLS="http://10.105.241.119:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.105.241.119:2380"
ETCD_INITIAL_CLUSTER_TOKEN="cluster_2"
ETCD_INITIAL_CLUSTER="db_node_3=http://10.105.241.119:2380,db_node_4=http://10.105.241.120:2380,etcd_node_only_2=http://10.105.241.142:2381"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/another_etcd"
ETCD_ELECTION_TIMEOUT="5000"
ETCD_HEARTBEAT_INTERVAL="1000"

@vladloskut (Author)

> Please paste here the exact (possibly obfuscated, but representative) command lines you use to run the etcd instances, and the exact error message.

Posted above. I tried to build it again. Every time I use a clean ETCD_DATA_DIR and a unique ETCD_INITIAL_CLUSTER_TOKEN.

@ptabor (Contributor) commented Jan 19, 2021

Why does node "http://10.105.241.135:2380" span both services?

It needs to belong to one cluster or the other, but not to both.

@vladloskut (Author)

> Why does node "http://10.105.241.135:2380" span both services?
>
> It needs to belong to one cluster or the other, but not to both.

My bad.

Here is the service #2 config:

ETCD_INITIAL_CLUSTER="db_node_3=http://10.105.241.119:2380,db_node_4=http://10.105.241.120:2380,etcd_node=http://10.105.241.142:2381"

@ptabor (Contributor) commented Jan 19, 2021

ETCD_LISTEN_PEER_URLS="http://10.105.241.119:2380" seems to be shared between nodes #4 and #5.

Either way, the warning is printed when one Raft node receives a Raft message targeted at another node.
So most likely you still have some cross-cluster IP:port mismatch in the cluster configuration.
You need to fully separate the address spaces of the two clusters.
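For example, a fully disjoint layout could look like this (the port pair for cluster 2 is only an example; strictly speaking only node #3, which hosts a member of each cluster, needs a second port pair, but using a distinct pair everywhere rules out cross-cluster mixups):

# Cluster 1 (client :2379, peer :2380)
#   pg_node_1         10.105.241.135
#   pg_node_2         10.105.241.137
#   etcd_node_only    10.105.241.142
# Cluster 2 (client :2378, peer :2381)
#   db_node_3         10.105.241.119
#   db_node_4         10.105.241.120
#   etcd_node_only_2  10.105.241.142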

@vladloskut (Author)

ETCD_INITIAL_CLUSTER is the same for nodes #1, #2 and service #1 on node #3.
ETCD_INITIAL_CLUSTER is the same for nodes #4, #5 and service #2 on node #3.

The data dirs were empty before start.

ETCD_INITIAL_CLUSTER_TOKEN is shared by nodes #1, #2 and service #1 on node #3, and unique to that cluster.
ETCD_INITIAL_CLUSTER_TOKEN is shared by nodes #4, #5 and service #2 on node #3, and unique to that cluster.

> ETCD_LISTEN_PEER_URLS="http://10.105.241.119:2380" seems to be shared between nodes #4 and #5.

My bad too, sorry, I'm a bit tired of this thing...

@vladloskut (Author)

/usr/local/bin/another_etcdctl member list --write-out=table

+------------------+---------+----------------+----------------------------+----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+----------------+----------------------------+----------------------------+------------+
| 1a4ab720ca2c1494 | started | etcd_node_only | http://10.105.241.142:2380 | http://10.105.241.142:2379 | false |
| 52ae9c81e8329fa7 | started | pg_node_2 | http://10.105.241.137:2380 | http://10.105.241.137:2379 | false |
| f8ed7525a14a4fb4 | started | pg_node_1 | http://10.105.241.135:2380 | http://10.105.241.135:2379 | false |
+------------------+---------+----------------+----------------------------+----------------------------+------------+

This is the output from node #3.

How do I query the other cluster?
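(Answering my own question: passing the second cluster's client endpoints explicitly seems to work; a sketch, with endpoints taken from the configs above:)

ETCDCTL_API=3 /usr/local/bin/another_etcdctl member list --write-out=table \
  --endpoints=http://10.105.241.119:2379,http://10.105.241.120:2379,http://10.105.241.142:2378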

/usr/local/bin/another_etcdctl endpoint status --write-out=table --endpoints=10.105.241.119:2379,10.105.241.120:2379,10.105.241.142:2378

+---------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 10.105.241.120:2379 | 923207833d9ad5e1 | 3.4.14 | 20 kB | false | false | 123 | 2973 | 2973 | |
| 10.105.241.142:2378 | 53f80f7fb22dbc34 | 3.4.14 | 20 kB | true | false | 123 | 2973 | 2973 | |
+---------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

@vladloskut (Author)

Hmm... I think I found a way to resolve this issue.

I used ports 2378 and 2381 on nodes #4 and #5 plus service #2 on node #3, and all nodes connected immediately.

One more question: how do I check that the two ETCD clusters are really healthy? Help please :)

@ptabor (Contributor) commented Jan 19, 2021

If you can write, and all nodes have the same RAFT APPLIED INDEX afterwards, that is a good indicator that all of them are connected.
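A minimal way to run that check, assuming the cluster-2 client port 2378 from your final setup (the key name is arbitrary):

# write through one member of cluster 2
ETCDCTL_API=3 etcdctl --endpoints=http://10.105.241.119:2378 put healthcheck ok
# then compare RAFT APPLIED INDEX across all members
ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=http://10.105.241.119:2378,http://10.105.241.120:2378,http://10.105.241.142:2378
# endpoint health additionally reports per-endpoint reachability
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=http://10.105.241.119:2378,http://10.105.241.120:2378,http://10.105.241.142:2378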

@ptabor (Contributor) commented Jan 19, 2021

Assuming the issue is solved.

ptabor closed this as completed on Jan 19, 2021.