
Resource status StandAlone inconsistency #213

Open

tumurzakov opened this issue Jan 18, 2021 · 8 comments

@tumurzakov commented Jan 18, 2021

One node (kube-node-7) went down due to a power issue. Before it was recovered, I ran linstor n lost kube-node-7, and the node was deleted from LINSTOR.

After the node was recovered, I ran:

linstor node create kube-node-7 192.168.11.246
linstor sp create lvmthin kube-node-7 linstor-pool ubuntu-vg/linstor-pool

After that, the linstor r list command started returning several resources in StandAlone status (which, according to the documentation, is used for a connection when a split brain has been detected):

# linstor r l
...
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-1   ┊ 7004 ┊ Unused ┊ Ok                      ┊ UpToDate ┊ 2021-01-18 11:48:00 ┊
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-2   ┊ 7004 ┊ InUse  ┊ StandAlone(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:54 ┊
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-6   ┊ 7004 ┊ Unused ┊ StandAlone(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:52 ┊

But drbdtop shows everything is fine on those nodes.

kube-node-2

│Resource: pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: (Overall danger score: 0)                                                                                                        │
│ Local Disc(Primary):                                                                                                                                                                │
│  volume 0 (/dev/drbd1004): UpToDate(normal disk state)                                                                                                                              │
│                                                                                                                                                                                     │
│ Connection to kube-node-1(Secondary): Connected(connected to kube-node-1)                                                                                                           │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)                                                                                                                                                       │
│                                                                                                                                                                                     │
│ Connection to kube-node-6(Secondary): Connected(connected to kube-node-6)                                                                                                           │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)

kube-node-6

│Resource: pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: (Overall danger score: 0)                                                                                                        │
│ Local Disc(Secondary):                                                                                                                                                              │
│  volume 0 (/dev/drbd1004): UpToDate(normal disk state)                                                                                                                              │
│                                                                                                                                                                                     │
│ Connection to kube-node-1(Secondary): Connected(connected to kube-node-1)                                                                                                           │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)                                                                                                                                                       │
│                                                                                                                                                                                     │
│ Connection to kube-node-2(Primary): Connected(connected to kube-node-2)                                                                                                             │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)

kube-node-1

│Resource: pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: (Overall danger score: 0)                                                                                                        │
│ Local Disc(Secondary):                                                                                                                                                              │
│  volume 0 (/dev/drbd1004): UpToDate(normal disk state)                                                                                                                              │
│                                                                                                                                                                                     │
│ Connection to kube-node-2(Primary): Connected(connected to kube-node-2)                                                                                                             │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)                                                                                                                                                       │
│                                                                                                                                                                                     │
│ Connection to kube-node-6(Secondary): Connected(connected to kube-node-6)                                                                                                           │
│  volume 0:                                                                                                                                                                          │
│   UpToDate(normal disk state)   

Why does linstor r list show those resources in StandAlone mode?

@ghernadi (Contributor)

Hello,

could you please (a rough command sketch follows after this list):

  1. restart the two LINSTOR satellites and see if this fixes the resource list issue
  2. attach the drbdadm status output from both / all DRBD peers here
  3. attach the dmesg output for the specific DRBD resource here, ideally also from all DRBD peers
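
For reference, a sketch of those steps, assuming the satellites run as the usual linstor-satellite systemd unit (in a containerized setup, restart the satellite pod/container instead) and using the resource name from above; adjust names to your environment:

# on kube-node-2 and kube-node-6: restart the satellite
systemctl restart linstor-satellite

# then collect the DRBD view and the kernel log from each peer
drbdadm status pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71
dmesg | grep pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71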

@tumurzakov (Author)

  1. After restarting, the connection statuses became Ok (closing the issue, thanks :)
  2. drbdadm status after satellite restart
pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 role:Primary
  disk:UpToDate
  kube-node-1 role:Secondary
    peer-disk:UpToDate
  kube-node-6 role:Secondary
    peer-disk:UpToDate

drbdadm status before satellite restart on the other node

pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 role:Secondary
  disk:UpToDate
  kube-node-1 role:Secondary
    peer-disk:UpToDate
  kube-node-2 role:Primary
    peer-disk:UpToDate
  3. dmesg

kube-node-2

Jan 18 11:20:01 kube-node-2 kernel: [8311959.489859] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Preparing remote state change 838443789 
Jan 18 11:20:01 kube-node-2 kernel: [8311959.490095] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Committing remote state change 838443789 (primary_nodes=1)
Jan 18 11:20:01 kube-node-2 kernel: [8311959.490105] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
Jan 18 11:20:01 kube-node-2 kernel: [8311959.490108] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( UpToDate -> Outdated ) repl( Established -> Off )
Jan 18 11:20:01 kube-node-2 kernel: [8311959.490135] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: ack_receiver terminated
Jan 18 11:20:01 kube-node-2 kernel: [8311959.490137] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Terminating ack_recv thread
Jan 18 11:20:01 kube-node-2 kernel: [8311959.522606] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Restarting sender thread
Jan 18 11:20:01 kube-node-2 kernel: [8311959.532357] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Connection closed
Jan 18 11:20:01 kube-node-2 kernel: [8311959.532370] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: helper command: /sbin/drbdadm disconnected
Jan 18 11:20:01 kube-node-2 kernel: [8311959.536009] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: helper command: /sbin/drbdadm disconnected exit code 0
Jan 18 11:20:01 kube-node-2 kernel: [8311959.536036] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: conn( TearDown -> Unconnected )
Jan 18 11:20:01 kube-node-2 kernel: [8311959.536047] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Restarting receiver thread
Jan 18 11:20:01 kube-node-2 kernel: [8311959.536053] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: conn( Unconnected -> Connecting )
Jan 18 11:20:01 kube-node-2 kernel: [8311959.545231] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Preparing remote state change 1387051894 
Jan 18 11:20:01 kube-node-2 kernel: [8311959.545666] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Committing remote state change 1387051894 (primary_nodes=1)
Jan 18 11:20:01 kube-node-2 kernel: [8311960.035735] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: new current UUID: C7E46E77C301DB63 weak: FFFFFFFFFFFFFFF6
Jan 18 11:20:09 kube-node-2 kernel: [8311967.354610] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Handshake to peer 1 successful: Agreed network protocol version 117
Jan 18 11:20:09 kube-node-2 kernel: [8311967.354612] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jan 18 11:20:09 kube-node-2 kernel: [8311967.355426] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Peer authenticated using 20 bytes HMAC
Jan 18 11:20:09 kube-node-2 kernel: [8311967.355430] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Starting ack_recv thread (from drbd_r_pvc-1ed5 [24015])
Jan 18 11:20:09 kube-node-2 kernel: [8311967.390805] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: drbd_sync_handshake:
Jan 18 11:20:09 kube-node-2 kernel: [8311967.390806] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: self C7E46E77C301DB63:BD104F7322908293:0000000000000000:0000000000000000 bits:58 flags:120
Jan 18 11:20:09 kube-node-2 kernel: [8311967.390807] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: peer BD104F7322908292:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
Jan 18 11:20:09 kube-node-2 kernel: [8311967.390808] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: uuid_compare()=source-use-bitmap by rule 70
Jan 18 11:20:09 kube-node-2 kernel: [8311967.406520] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Preparing cluster-wide state change 772351151 (0->1 499/145)
Jan 18 11:20:09 kube-node-2 kernel: [8311967.410808] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change 772351151: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF4
Jan 18 11:20:09 kube-node-2 kernel: [8311967.410810] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Committing cluster-wide state change 772351151 (4ms)
Jan 18 11:20:09 kube-node-2 kernel: [8311967.410836] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jan 18 11:20:09 kube-node-2 kernel: [8311967.410839] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: repl( Off -> WFBitMapS )
Jan 18 11:20:09 kube-node-2 kernel: [8311967.431606] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 36(1), total 36; compression: 100.0%
Jan 18 11:20:09 kube-node-2 kernel: [8311967.466951] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 36(1), total 36; compression: 100.0%
Jan 18 11:20:09 kube-node-2 kernel: [8311967.466957] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: helper command: /sbin/drbdadm before-resync-source
Jan 18 11:20:09 kube-node-2 kernel: [8311967.471673] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: helper command: /sbin/drbdadm before-resync-source exit code 0
Jan 18 11:20:09 kube-node-2 kernel: [8311967.471695] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( Outdated -> Inconsistent ) repl( WFBitMapS -> SyncSource )
Jan 18 11:20:09 kube-node-2 kernel: [8311967.471747] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: Began resync as SyncSource (will sync 244 KB [61 bits set]).
Jan 18 11:20:09 kube-node-2 kernel: [8311967.871565] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: updated UUIDs C7E46E77C301DB63:0000000000000000:0000000000000000:0000000000000000
Jan 18 11:20:09 kube-node-2 kernel: [8311967.909770] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: Resync done (total 1 sec; paused 0 sec; 244 K/sec)
Jan 18 11:20:09 kube-node-2 kernel: [8311967.909781] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: expected n_oos:1 to be equal to rs_failed:0
Jan 18 11:20:09 kube-node-2 kernel: [8311967.909787] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( Inconsistent -> UpToDate ) repl( SyncSource -> Established )
Jan 18 11:20:11 kube-node-2 kernel: [8311969.499414] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Preparing remote state change 3933973722 
Jan 18 11:20:11 kube-node-2 kernel: [8311969.499899] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-6: Committing remote state change 3933973722 (primary_nodes=1)
Jan 18 11:20:11 kube-node-2 kernel: [8311969.500040] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( UpToDate -> Outdated )
Jan 18 11:20:11 kube-node-2 kernel: [8311969.707919] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( Outdated -> Inconsistent ) resync-susp( no -> peer )
Jan 18 11:20:11 kube-node-2 kernel: [8311970.030691] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-6: pdsk( Inconsistent -> UpToDate ) resync-susp( peer -> no )
Jan 18 11:23:49 kube-node-2 kernel: [8312188.046983] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: using verify-alg: "crc32c"
Jan 18 16:27:06 kube-node-2 kernel: [8330384.740269] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: rs_discard_granularity changed to 262144

kube-node-6

Jan 18 11:16:16 kube-node-6 kernel: [8228770.739461] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: rs_discard_granularity changed to 262144
Jan 18 11:20:01 kube-node-6 kernel: [8228995.548808] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Preparing cluster-wide state change 838443789 (1->0 8176/3088)
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549069] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change 838443789: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF6
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549071] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Committing cluster-wide state change 838443789 (0ms)
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549099] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: conn( Connected -> Disconnecting ) peer( Primary -> Unknown )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549101] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( UpToDate -> Outdated )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549103] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549144] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: ack_receiver terminated
Jan 18 11:20:01 kube-node-6 kernel: [8228995.549145] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Terminating ack_recv thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.590457] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Restarting sender thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.595722] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Connection closed
Jan 18 11:20:01 kube-node-6 kernel: [8228995.595734] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: helper command: /sbin/drbdadm disconnected
Jan 18 11:20:01 kube-node-6 kernel: [8228995.599020] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: helper command: /sbin/drbdadm disconnected exit code 0
Jan 18 11:20:01 kube-node-6 kernel: [8228995.599071] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: conn( Disconnecting -> StandAlone )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.599077] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Terminating receiver thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.599127] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Terminating sender thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604039] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Preparing cluster-wide state change 1387051894 (1->3 496/16)
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604537] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change 1387051894: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF6
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604538] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Cluster is now split
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604539] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Committing cluster-wide state change 1387051894 (0ms)
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604576] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604577] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: quorum( yes -> no )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604579] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604603] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: ack_receiver terminated
Jan 18 11:20:01 kube-node-6 kernel: [8228995.604604] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Terminating ack_recv thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.650446] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Restarting sender thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.654025] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Connection closed
Jan 18 11:20:01 kube-node-6 kernel: [8228995.654035] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: helper command: /sbin/drbdadm disconnected
Jan 18 11:20:01 kube-node-6 kernel: [8228995.656801] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: helper command: /sbin/drbdadm disconnected exit code 0
Jan 18 11:20:01 kube-node-6 kernel: [8228995.656825] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: conn( Disconnecting -> StandAlone )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.656835] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Terminating receiver thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.656891] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Terminating sender thread
Jan 18 11:20:01 kube-node-6 kernel: [8228995.656959] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Outdated -> Detaching )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.657224] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Detaching -> Diskless )
Jan 18 11:20:01 kube-node-6 kernel: [8228995.657759] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: drbd_bm_resize called with capacity == 0
Jan 18 11:20:01 kube-node-6 kernel: [8228995.678497] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Terminating worker thread
Jan 18 11:20:05 kube-node-6 kernel: [8228999.990585] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Starting worker thread (from drbdsetup [8523])
Jan 18 11:20:05 kube-node-6 kernel: [8229000.014503] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Starting sender thread (from drbdsetup [8589])
Jan 18 11:20:05 kube-node-6 kernel: [8229000.017699] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Starting sender thread (from drbdsetup [8598])
Jan 18 11:20:06 kube-node-6 kernel: [8229000.427671] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: meta-data IO uses: blk-bio
Jan 18 11:20:06 kube-node-6 kernel: [8229000.443411] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: rs_discard_granularity changed to 262144
Jan 18 11:20:06 kube-node-6 kernel: [8229000.443426] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Diskless -> Attaching )
Jan 18 11:20:06 kube-node-6 kernel: [8229000.443431] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: Maximum number of peer devices = 7
Jan 18 11:20:06 kube-node-6 kernel: [8229000.443521] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Method to ensure write ordering: flush
Jan 18 11:20:06 kube-node-6 kernel: [8229000.443524] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: drbd_bm_resize called with capacity == 62917424
Jan 18 11:20:06 kube-node-6 kernel: [8229000.450643] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: resync bitmap: bits=7864678 words=860202 pages=1681
Jan 18 11:20:06 kube-node-6 kernel: [8229000.450646] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: size = 30 GB (31458712 KB)
Jan 18 11:20:06 kube-node-6 kernel: [8229000.450647] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: size = 30 GB (31458712 KB)
Jan 18 11:20:06 kube-node-6 kernel: [8229000.543729] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: recounting of set bits took additional 8ms
Jan 18 11:20:06 kube-node-6 kernel: [8229000.543741] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Attaching -> Outdated )
Jan 18 11:20:06 kube-node-6 kernel: [8229000.543743] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: attached to current UUID: BD104F7322908292
Jan 18 11:20:08 kube-node-6 kernel: [8229002.899318] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: conn( StandAlone -> Unconnected )
Jan 18 11:20:08 kube-node-6 kernel: [8229002.899347] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Starting receiver thread (from drbd_w_pvc-1ed5 [8524])
Jan 18 11:20:08 kube-node-6 kernel: [8229002.899433] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: conn( Unconnected -> Connecting )
Jan 18 11:20:08 kube-node-6 kernel: [8229002.900066] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: conn( StandAlone -> Unconnected )
Jan 18 11:20:08 kube-node-6 kernel: [8229002.900142] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Starting receiver thread (from drbd_w_pvc-1ed5 [8524])
Jan 18 11:20:08 kube-node-6 kernel: [8229002.901776] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: conn( Unconnected -> Connecting )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.414475] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Handshake to peer 0 successful: Agreed network protocol version 117
Jan 18 11:20:09 kube-node-6 kernel: [8229003.414476] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jan 18 11:20:09 kube-node-6 kernel: [8229003.416166] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Peer authenticated using 20 bytes HMAC
Jan 18 11:20:09 kube-node-6 kernel: [8229003.416189] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Starting ack_recv thread (from drbd_r_pvc-1ed5 [9032])
Jan 18 11:20:09 kube-node-6 kernel: [8229003.428506] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Handshake to peer 3 successful: Agreed network protocol version 117
Jan 18 11:20:09 kube-node-6 kernel: [8229003.428508] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jan 18 11:20:09 kube-node-6 kernel: [8229003.428728] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Peer authenticated using 20 bytes HMAC
Jan 18 11:20:09 kube-node-6 kernel: [8229003.428738] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: Starting ack_recv thread (from drbd_r_pvc-1ed5 [9030])
Jan 18 11:20:09 kube-node-6 kernel: [8229003.450460] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: drbd_sync_handshake:
Jan 18 11:20:09 kube-node-6 kernel: [8229003.450462] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: self BD104F7322908292:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
Jan 18 11:20:09 kube-node-6 kernel: [8229003.450464] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: peer C7E46E77C301DB63:BD104F7322908293:0000000000000000:0000000000000000 bits:58 flags:120
Jan 18 11:20:09 kube-node-6 kernel: [8229003.450466] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: uuid_compare()=target-use-bitmap by rule 50
Jan 18 11:20:09 kube-node-6 kernel: [8229003.465779] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Preparing remote state change 772351151 
Jan 18 11:20:09 kube-node-6 kernel: [8229003.466495] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: drbd_sync_handshake:
Jan 18 11:20:09 kube-node-6 kernel: [8229003.466498] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: self BD104F7322908292:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
Jan 18 11:20:09 kube-node-6 kernel: [8229003.466500] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: peer C7E46E77C301DB62:BD104F7322908292:0000000000000000:0000000000000000 bits:58 flags:100
Jan 18 11:20:09 kube-node-6 kernel: [8229003.466502] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: uuid_compare()=target-use-bitmap by rule 50
Jan 18 11:20:09 kube-node-6 kernel: [8229003.471113] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: Committing remote state change 772351151 (primary_nodes=1)
Jan 18 11:20:09 kube-node-6 kernel: [8229003.471117] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-2: conn( Connecting -> Connected ) peer( Unknown -> Primary )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.471118] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: quorum( no -> yes )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.471120] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.523708] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 36(1), total 36; compression: 100.0%
Jan 18 11:20:09 kube-node-6 kernel: [8229003.526049] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 36(1), total 36; compression: 100.0%
Jan 18 11:20:09 kube-node-6 kernel: [8229003.526060] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: helper command: /sbin/drbdadm before-resync-target
Jan 18 11:20:09 kube-node-6 kernel: [8229003.531436] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: helper command: /sbin/drbdadm before-resync-target exit code 0
Jan 18 11:20:09 kube-node-6 kernel: [8229003.531477] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Outdated -> Inconsistent )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.531478] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: repl( WFBitMapT -> SyncTarget )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.531479] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: resync-susp( no -> connection dependency )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.531529] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: Began resync as SyncTarget (will sync 240 KB [60 bits set]).
Jan 18 11:20:09 kube-node-6 kernel: [8229003.835600] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: Resync done (total 1 sec; paused 0 sec; 240 K/sec)
Jan 18 11:20:09 kube-node-6 kernel: [8229003.835606] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: updated UUIDs C7E46E77C301DB62:0000000000000000:0000000000000000:0000000000000000
Jan 18 11:20:09 kube-node-6 kernel: [8229003.835614] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Inconsistent -> UpToDate )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.835616] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: repl( SyncTarget -> Established )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.835618] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: resync-susp( connection dependency -> no )
Jan 18 11:20:09 kube-node-6 kernel: [8229003.924794] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: helper command: /sbin/drbdadm after-resync-target
Jan 18 11:20:09 kube-node-6 kernel: [8229003.929956] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: helper command: /sbin/drbdadm after-resync-target exit code 0
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558460] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Preparing cluster-wide state change 3933973722 (1->3 499/146)
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558961] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change 3933973722: primary_nodes=1, weak_nodes=FFFFFFFFFFFFFFF4
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558963] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: Committing cluster-wide state change 3933973722 (0ms)
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558978] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 kube-node-1: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558979] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( UpToDate -> Outdated )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.558980] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.757947] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 32(1), total 32; compression: 100.0%
Jan 18 11:20:11 kube-node-6 kernel: [8229005.762071] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 32(1), total 32; compression: 100.0%
Jan 18 11:20:11 kube-node-6 kernel: [8229005.762088] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: helper command: /sbin/drbdadm before-resync-target
Jan 18 11:20:11 kube-node-6 kernel: [8229005.766882] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: helper command: /sbin/drbdadm before-resync-target exit code 0
Jan 18 11:20:11 kube-node-6 kernel: [8229005.766894] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Outdated -> Inconsistent )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.766895] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: resync-susp( no -> connection dependency )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.766896] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: repl( WFBitMapT -> SyncTarget )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.766944] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: Began resync as SyncTarget (will sync 12 KB [3 bits set]).
Jan 18 11:20:11 kube-node-6 kernel: [8229005.890811] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: Resync done (total 1 sec; paused 0 sec; 12 K/sec)
Jan 18 11:20:11 kube-node-6 kernel: [8229005.890816] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: updated UUIDs C7E46E77C301DB62:0000000000000000:0000000000000000:0000000000000000
Jan 18 11:20:11 kube-node-6 kernel: [8229005.890823] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004: disk( Inconsistent -> UpToDate )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.890825] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-2: resync-susp( connection dependency -> no )
Jan 18 11:20:11 kube-node-6 kernel: [8229005.890826] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: repl( SyncTarget -> Established )
Jan 18 11:20:11 kube-node-6 kernel: [8229006.082283] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: helper command: /sbin/drbdadm after-resync-target
Jan 18 11:20:11 kube-node-6 kernel: [8229006.089354] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: helper command: /sbin/drbdadm after-resync-target exit code 0
Jan 18 11:21:00 kube-node-6 kernel: [8229054.390264] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change failed: Need a verify algorithm to start online verify
Jan 18 11:21:00 kube-node-6 kernel: [8229054.391539] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: Failed: repl( Established -> VerifyS )
Jan 18 11:21:05 kube-node-6 kernel: [8229060.036061] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71: State change failed: Need a verify algorithm to start online verify
Jan 18 11:21:05 kube-node-6 kernel: [8229060.037602] drbd pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71/0 drbd1004 kube-node-1: Failed: repl( Established -> VerifyS )

@ghernadi (Contributor)

Thank you. Can you please also provide the versions of the DRBD utils as well as the kmod?
drbdadm --version (includes both versions)
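
For convenience, a small loop like the following collects that output from all three peers, assuming SSH access to the nodes (node names as used in this issue):

for n in kube-node-1 kube-node-2 kube-node-6; do
    echo "== $n =="
    ssh "$n" drbdadm --version
done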

@tumurzakov (Author) commented Jan 19, 2021

root@kube-node-2# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fa9b9d3823b6e1792919e711fcf6164cac629290\ build\ by\ buildd@lgw01-amd64-011\,\ 2020-11-05\ 11:51:01
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090019
DRBD_KERNEL_VERSION=9.0.25
DRBDADM_VERSION_CODE=0x090f01
DRBDADM_VERSION=9.15.1
root@kube-node-6# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fa9b9d3823b6e1792919e711fcf6164cac629290\ build\ by\ buildd@lgw01-amd64-011\,\ 2020-11-05\ 11:51:01
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090019
DRBD_KERNEL_VERSION=9.0.25
DRBDADM_VERSION_CODE=0x090f01
DRBDADM_VERSION=9.15.1
root@kube-node-1# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fa9b9d3823b6e1792919e711fcf6164cac629290\ build\ by\ buildd@lgw01-amd64-011\,\ 2020-11-05\ 11:51:01
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090019
DRBD_KERNEL_VERSION=9.0.25
DRBDADM_VERSION_CODE=0x090f01
DRBDADM_VERSION=9.15.1

Thank you for your awesome project!

@tumurzakov (Author) commented Jan 19, 2021

  1. kube-node-7 down
  2. linstor n lost kube-node-7
  3. kube-node-7 up
  4. linstor node create kube-node-7 192.168.11.246
  5. linstor sp create lvmthin kube-node-7 linstor-pool ubuntu-vg/linstor-pool
  6. After that command, the statuses were:
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-2   ┊ 7004 ┊ InUse  ┊ Connecting(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:54 ┊
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-6   ┊ 7004 ┊ Unused ┊ Connecting(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:52 ┊

Almost all resources were in Connecting(kube-node-7) status; after restarting, only 3 of them became StandAlone(kube-node-7), the others Ok. Those resources had only two of three replicas and did not replicate automatically.
7. linstor rd ap pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 --storage-pool=linstor-pool --place-count=3. A replica on kube-node-1 was created. Replicas on kube-node-2 and kube-node-6 remained in StandAlone status:

┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-2   ┊ 7004 ┊ InUse  ┊ StandAlone(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:54 ┊
┊ pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 ┊ kube-node-6   ┊ 7004 ┊ Unused ┊ StandAlone(kube-node-7) ┊ UpToDate ┊ 2020-10-26 05:45:52 ┊
  8. The satellites were restarted and everything became Ok.

I have the terminal output from the whole process on the controller; if you are interested, I could send it to you by email confidentially.

@ghernadi (Contributor)

I am trying to reproduce this issue, but without success so far.

Starting from an empty Linstor-cluster:

linstor c sp DrbdOptions/auto-add-quorum-tiebreaker False
linstor n c bravo
linstor n c charlie
linstor n c delta
ssh root@bravo lvcreate --size 9G -T scratch/thin
ssh root@charlie lvcreate --size 9G -T scratch/thin
ssh root@delta lvcreate --size 9G -T scratch/thin
linstor sp c lvmthin bravo lvmthinpool scratch/thin
linstor sp c lvmthin charlie lvmthinpool scratch/thin
linstor sp c lvmthin delta lvmthinpool scratch/thin
linstor rd c rsc
linstor vd c rsc 1G
linstor r c bravo charlie delta rsc -s lvmthinpool
ssh root@bravo drbdadm primary rsc
sleep 10.0s
# shutting down delta(Satellite)
ssh root@delta drbdsetup down all
linstor n lo delta
linstor --no-utf8 --no-color r l -a
+---------------------------------------------------------------------------------+
| ResourceName | Node    | Port | Usage  | Conns |    State | CreatedOn           |
|=================================================================================|
| rsc          | bravo   | 7000 | InUse  | Ok    | UpToDate | 2021-01-19 07:22:53 |
| rsc          | charlie | 7000 | Unused | Ok    | UpToDate | 2021-01-19 07:22:54 |
+---------------------------------------------------------------------------------+

# starting delta(Satellite)
linstor n c delta
linstor sp c lvmthin delta lvmthinpool scratch/thin
linstor --no-utf8 --no-color r l -a
+---------------------------------------------------------------------------------+
| ResourceName | Node    | Port | Usage  | Conns |    State | CreatedOn           |
|=================================================================================|
| rsc          | bravo   | 7000 | InUse  | Ok    | UpToDate | 2021-01-19 07:22:53 |
| rsc          | charlie | 7000 | Unused | Ok    | UpToDate | 2021-01-19 07:22:54 |
+---------------------------------------------------------------------------------+

linstor --no-utf8 --no-color sp l
+-----------------------------------------------------------------------------------------------------------------------------+
| StoragePool          | Node    | Driver   | PoolName     | FreeCapacity | TotalCapacity | CanSnapshots | State | SharedName |
|=============================================================================================================================|
| DfltDisklessStorPool | bravo   | DISKLESS |              |              |               | False        | Ok    |            |
| DfltDisklessStorPool | charlie | DISKLESS |              |              |               | False        | Ok    |            |
| DfltDisklessStorPool | delta   | DISKLESS |              |              |               | False        | Ok    |            |
| lvmthinpool          | bravo   | LVM_THIN | scratch/thin |     9.00 GiB |         9 GiB | True         | Ok    |            |
| lvmthinpool          | charlie | LVM_THIN | scratch/thin |     9.00 GiB |         9 GiB | True         | Ok    |            |
| lvmthinpool          | delta   | LVM_THIN | scratch/thin |     9.00 GiB |         9 GiB | True         | Ok    |            |
+-----------------------------------------------------------------------------------------------------------------------------+

linstor rd ap rsc -s lvmthinpool --place-count 3
linstor --no-utf8 --no-color r l -a
+---------------------------------------------------------------------------------+
| ResourceName | Node    | Port | Usage  | Conns |    State | CreatedOn           |
|=================================================================================|
| rsc          | bravo   | 7000 | InUse  | Ok    | UpToDate | 2021-01-19 07:22:53 |
| rsc          | charlie | 7000 | Unused | Ok    | UpToDate | 2021-01-19 07:22:54 |
| rsc          | delta   | 7000 | Unused | Ok    | UpToDate | 2021-01-19 07:23:12 |
+---------------------------------------------------------------------------------+

Do you see something missing from my test?

> I have the terminal output from the whole process on the controller; if you are interested, I could send it to you by email confidentially.

Yes, please email it to me (the mail address is in my profile); maybe I can also find some difference...

I am reopening this issue for now, as I think restarting the satellite only fixes the symptom; we should still address the actual issue.

ghernadi reopened this Jan 19, 2021
@tumurzakov (Author)

I think the error occurred because I took unexpected actions:

Good scenario

  1. Host goes down.
  2. Wait, because nothing bad happens; once the host is back up, replication will continue.
  3. Host comes back up, replication catches up to the actual state.

My scenario

  1. Host goes down.
  2. Panic.
  3. Host comes back up.
  4. linstor n lost kube-node-7
  5. linstor node create kube-node-7 192.168.11.246 (errors occurred while adding the node, ignored them...)
  6. linstor sp create lvmthin kube-node-7 linstor-pool ubuntu-vg/linstor-pool
  7. Many resources try to connect to kube-node-7.
  8. Restart the controller. Huh, better, only 3 of them are now in the unexplained StandAlone status.
  9. Those resources now have only 2 replicas, so add one more manually: linstor rd ap pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71 --storage-pool=linstor-pool --place-count=3
  10. Better, but that status...
  11. Check DRBD on the nodes, everything is fine, create an issue on GitHub.
  12. Restart the satellites; wow, now everything is as it was before (a drbdadm sketch follows after this list).
  13. On kube-node-7 there are many unregistered LVM volumes left behind; huh, delete them...
  14. Profit.
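
As a side note on step 12: in plain DRBD, a StandAlone connection can often be brought back without restarting anything, e.g. with drbdadm connect or drbdadm adjust. Whether that would have cleared the stale status LINSTOR reported in this case is not established; a minimal sketch, assuming the resource name from this issue and run on the node whose connection is StandAlone:

# try to reconnect the StandAlone peer connection
drbdadm connect pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71
# or re-apply the full configuration, which also restores connections
drbdadm adjust pvc-1ed58414-0a3d-415d-8b20-80d4fae9cf71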

@ghernadi (Contributor)

Maybe I can explain a bit here: the errors you got in step 5 were triggered by the auto-tiebreaker, which tried to add, well, tiebreaker resources on the newly created kube-node-7.
Note to myself: we should actually also trigger the auto-tiebreaker on the node lost command... that should make sense, but I'll investigate that later.
However, that auto-tiebreaker should have worked, but you somehow stumbled across my currently most hated bug, called "duplicated node-id". I have been trying to reproduce it for a few months now and thought we had fixed it in the 1.9 or 1.10 release, but as you are using 1.11, I was apparently wrong.
Currently I strongly suspect there has to be a situation like "it looks broken, but I can fix it", where after that fix everything looks fine but internally it is not, which in the end triggers this duplicated node-id bug on the next try / error / whatever... But as I said, I could not reliably reproduce this issue.
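
For anyone hitting the same errors in step 5: the tiebreaker behaviour is controlled by a controller-level property, the same one set at the start of the reproduction attempt above, and tiebreaker resources show up when listing all resources. A short sketch using the commands already seen in this thread:

# list all resources, including diskless tiebreakers
linstor r l -a
# disable automatic tiebreaker placement cluster-wide
linstor c sp DrbdOptions/auto-add-quorum-tiebreaker False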

We will try to look into the issue of the "Connecting(kube-node-7)" status persisting although kube-node-7 no longer had the resource.

Regarding step 13: that is expected, as the node lost command only removes the node and all of its resources from the Controller's database (and updates the still-online satellites); but as kube-node-7 was offline at that time, there was no way for LINSTOR to also clean up those resources on it. And since the controller got rid of all the resources, there was no information left about which LVs should be cleaned up.
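
A rough sketch of the manual cleanup from step 13, assuming the volume group used in this issue (ubuntu-vg); the leftover LVs have to be identified by hand first, and the LV name below is only a placeholder:

# on kube-node-7: list logical volumes left behind in the thin pool
lvs ubuntu-vg
# remove an orphaned volume that no longer exists in LINSTOR
lvremove ubuntu-vg/<orphaned_lv_name>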
