Make local cluster start idempotent #1031

Open
tjtelan opened this issue Apr 28, 2021 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Good issue for community involvement no-stale Opt-out of closing issue due to no activity technical debt

Comments

@tjtelan
Contributor

tjtelan commented Apr 28, 2021

If we try to start a cluster twice, we don't always get useful error messages.

$ fluvio cluster start --local
Performing pre-flight checks
✅ ok: Supported helm version is installed
✅ ok: Supported kubernetes version is installed
✅ ok: Kubernetes config is loadable
✅ ok: Fluvio system charts are installed
Successfully installed Fluvio!

$ fluvio cluster start --local
Performing pre-flight checks
✅ ok: Supported helm version is installed
✅ ok: Supported kubernetes version is installed
✅ ok: Kubernetes config is loadable
✅ ok: Fluvio system charts are installed
Error:
   0: Fluvio cluster error
   1: Fluvio cluster error
   2: Failed to install Fluvio locally
   3: Kubernetes client error
   4: client error: 409 Conflict

This error comes up often during development. We might forget that we already have a local cluster started, or we used to have one running locally but have since killed the process.

In my opinion, there are two different levels at which to address this.

  1. Behave similarly to fluvio cluster start (Kubernetes).
    If a cluster appears to be running, we just say so. Otherwise, we suggest running fluvio cluster delete --local to clean up metadata.

Example of successful Kubernetes startup idempotency:

$ fluvio cluster start
✅ ok: Kubernetes config is loadable
✅ ok: Supported helm version is installed
✅ ok: Load balancer is up
✅ ok: Fluvio system charts are installed
💙 note: Fluvio is already running
Successfully installed Fluvio!
  2. We attempt to clean up however we can, then try to start a local cluster.

In my mind, this is effectively running fluvio cluster delete --local && fluvio cluster start --local on behalf of the user. We can inform them while we are cleaning up.

We could make option 1 the default behavior and provide option 2 behind a --force flag. The goal is a scriptable path to starting a local cluster without needing to handle errors.
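To make the proposal concrete, here is a minimal sketch of how the installer could branch on an existing cluster. The helper names (`local_cluster_is_running`, `delete_local_cluster`, `install_local_cluster`) are hypothetical stand-ins for the real install/delete paths, not actual Fluvio APIs.

```rust
use std::io;

// Hypothetical helpers; Fluvio's real installer code is structured
// differently, and these names exist only for illustration.
fn local_cluster_is_running() -> io::Result<bool> { Ok(false) } // e.g. probe the SC
fn delete_local_cluster() -> io::Result<()> { Ok(()) }          // `fluvio cluster delete --local`
fn install_local_cluster() -> io::Result<()> { Ok(()) }         // existing install path

fn start_local(force: bool) -> io::Result<()> {
    if local_cluster_is_running()? {
        if force {
            // Option 2: clean up on behalf of the user, then reinstall.
            println!("note: existing local cluster found, cleaning up before reinstalling");
            delete_local_cluster()?;
        } else {
            // Option 1: treat an already-running cluster as success and exit 0.
            println!("💙 note: Fluvio is already running");
            return Ok(());
        }
    }
    install_local_cluster()
}
```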

@tjtelan tjtelan added enhancement New feature or request technical debt labels Apr 28, 2021
@nicholastmosher
Contributor

One possible solution is to have the SC create a pid lockfile when it starts up, something like /usr/local/var/fluvio/sc.pid, containing the process id of the currently-running SC. Before starting up, the cluster installer can then check for the lockfile and for a process with the same pid. If either the lockfile or the process does not exist, we can preemptively clean out any old metadata from the store so that we do not encounter the 409 Conflict.
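A minimal sketch of that lockfile check, assuming the path above and Linux's /proc for the liveness test; none of these helpers exist in Fluvio today.

```rust
use std::{fs, path::Path};

// Path suggested above; the real location would likely vary per platform.
const SC_PID_FILE: &str = "/usr/local/var/fluvio/sc.pid";

/// Returns true if the pid lockfile exists and the recorded process is still alive.
fn sc_appears_running() -> bool {
    let pid = match fs::read_to_string(SC_PID_FILE)
        .ok()
        .and_then(|s| s.trim().parse::<u32>().ok())
    {
        Some(pid) => pid,
        None => return false, // no lockfile, or its contents are not a pid
    };
    // On Linux an existing /proc/<pid> directory means the process is alive;
    // a portable implementation would use a crate such as `sysinfo` instead.
    Path::new(&format!("/proc/{pid}")).exists()
}

fn preflight_cleanup() {
    if !sc_appears_running() {
        // Stale or missing lockfile: safe to clear old metadata so the
        // installer does not hit the 409 Conflict later.
        // clear_stale_metadata();  // hypothetical cleanup hook
        let _ = fs::remove_file(SC_PID_FILE);
    }
}
```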

@sehz
Contributor

sehz commented Apr 28, 2021

This needs to work across different OSes and cluster types (local, k8). We certainly need a fluvio cluster status API/command, similar to https://minikube.sigs.k8s.io/docs/commands/status/.

Key points

  • Users may have multiple clusters, which may or may not be running locally
  • They need to switch between different clusters.

Note that this is admin management of clusters, which is different from the user's perspective, where they are just a consumer/producer.
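For illustration only, here is one possible shape for the data such a fluvio cluster status command could report. These types do not exist in Fluvio; they are merely a sketch of the admin-level view described above.

```rust
/// Which kind of installation a configured cluster (profile) points at.
enum InstallationType {
    Local,
    Kubernetes,
}

/// Coarse state of one cluster from the admin's point of view.
enum ClusterState {
    Running,
    Stopped,
    NotInstalled,
}

/// One entry in a hypothetical `fluvio cluster status` report.
struct ClusterStatus {
    profile: String,                // name used to switch between clusters
    installation: InstallationType, // local process vs. k8 deployment
    sc: ClusterState,               // streaming controller state
    spus_online: usize,             // SPUs currently reporting in
}
```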

@github-actions

Stale issue message

@github-actions

Stale issue message

@tjtelan tjtelan added help wanted Good issue for community involvement good first issue Good for newcomers and removed no-issue-activity labels Nov 23, 2021
@github-actions

Stale issue message

@tjtelan tjtelan added the no-stale Opt-out of closing issue due to no activity label Apr 1, 2022
@BatmanAoD
Contributor

@sehz Since #2044 added a check for a running local cluster, is a status API still necessary to resolve this? Or is it sufficient to simply make the nonrecoverable AlreadyInstalled and ExistingLocalCluster errors (in check/mod.rs) cause the CLI to exit with a 0 exit code (rather than an error code) and a message stating that the cluster is already running?

@sehz
Contributor

sehz commented Aug 28, 2022

Yes, it's OK to return a non-recoverable error.

@BatmanAoD
Contributor

@sehz Sorry, I think I wasn't clear; those are errors that already exist in the code, and are considered "nonrecoverable" (they're part of the UnrecoverableCheckStatus enum). My question is whether changing the behavior when those are encountered, so that the process exits with a "success" status (0), would be sufficient to satisfy option 1 for this issue: "If a cluster appears to be running, we just say so."
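A rough sketch of the change being discussed, with the enum paraphrased from the description of check/mod.rs above: only the two variant names come from this thread, and the real UnrecoverableCheckStatus has more variants and different data.

```rust
use std::process::ExitCode;

// Paraphrased for illustration; the real enum in check/mod.rs differs.
enum UnrecoverableCheckStatus {
    AlreadyInstalled,
    ExistingLocalCluster,
    Other(String),
}

fn exit_code_for(status: UnrecoverableCheckStatus) -> ExitCode {
    match status {
        // Option 1 from this issue: an already-running cluster is not an error.
        UnrecoverableCheckStatus::AlreadyInstalled
        | UnrecoverableCheckStatus::ExistingLocalCluster => {
            println!("💙 note: Fluvio is already running");
            ExitCode::SUCCESS
        }
        UnrecoverableCheckStatus::Other(msg) => {
            eprintln!("Error: {msg}");
            ExitCode::FAILURE
        }
    }
}
```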
