Gracefully stop tidb pod #2597

Closed
tennix opened this issue Jun 1, 2020 · 5 comments · Fixed by #2810
Labels: enhancement (New feature or request), priority:P2, status/help-wanted (Extra attention is needed)
tennix (Member) commented Jun 1, 2020

Feature Request

When a tidb pod is stopped, kubelet first sends a TERM signal to tidb-server. tidb-server waits 15s, then closes all SQL connections and performs the rest of its cleanup. If the cleanup does not finish within the grace period, kubelet sends a KILL signal and tidb-server exits immediately.

During that window, tidb-server fails to respond to clients, so clients time out or fail, which increases latency.

We might consider using a readiness probe to let the load balancer (internal or external) remove the backend before the server stops responding to clients. This would help reduce latency during tidb pod shutdown.

The key point is letting the readiness probe fail while the server is still able to receive requests.

ref: https://github.com/pingcap/tidb/blob/v4.0.1/server/server.go#L553-L557
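
A minimal sketch of what such a probe could look like, assuming it targets tidb-server's status port (10080) and that tidb-server starts failing the probe during graceful shutdown while the SQL port keeps serving in-flight queries (both assumptions, not confirmed behavior):

readinessProbe:
  httpGet:
    path: /status        # assumed endpoint; a dedicated health path may be better
    port: 10080
  periodSeconds: 2       # probe often so the LB reacts quickly
  failureThreshold: 1    # drop the backend after one failed probe

With settings like these, the backend would be removed from the Service within a few seconds of the probe failing, before connections start being refused.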

tennix added the enhancement label Jun 1, 2020
zjj2wry (Contributor) commented Jun 1, 2020

Maybe we can add a preStop lifecycle hook to guarantee tidb-server has enough time to close all SQL connections gracefully:

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 15"]

cofyc added the status/help-wanted label Jun 8, 2020
DanielZhangQD added this to the v1.1.2 milestone Jun 16, 2020
cofyc (Contributor) commented Jun 16, 2020

Maybe we can add a preStop lifecycle hook to guarantee tidb-server has enough time to close all SQL connections gracefully:

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 15"]

This seems like a good solution. In addition to the built-in 15-second wait time, users can configure extra wait time in the preStop hook.
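
For context, a sketch of where such a hook would sit in a plain Kubernetes container spec; the 30-second sleep and the image tag are illustrative only:

spec:
  # Must cover the preStop sleep plus tidb-server's own shutdown time.
  terminationGracePeriodSeconds: 60
  containers:
  - name: tidb
    image: pingcap/tidb:v4.0.1
    lifecycle:
      preStop:
        exec:
          command: ["/bin/bash", "-c", "sleep 30"]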

The key point is letting the readiness probe fail while the server is still able to receive requests.

When a pod is about to be deleted (DeletionTimestamp != nil), the endpoints controller ignores its IP and removes it from the endpoints entirely.
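
This can be checked directly; a sketch assuming a Service (and thus an Endpoints object) named basic-tidb, which is an illustrative name:

# Before deletion: the pod IP is listed as a ready address.
$ kubectl get endpoints basic-tidb
NAME         ENDPOINTS             AGE
basic-tidb   10.233.123.121:4000   1d

# After the pod's DeletionTimestamp is set (e.g. by kubectl delete pod),
# the IP disappears from ENDPOINTS even while the container is still
# running out its termination grace period.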

weekface (Contributor) commented Jun 22, 2020

When we start to update TiDB, the StatefulSet controller issues a command to delete the Pod. The Pod is removed from the Service's endpoints list and is no longer considered part of the set of running Pods, so the LB (internal or external) can remove the backend.

Next, we should wait for the number of connections to reach 0 and then stop the pod:

  • Set the TiDB Pod's spec.terminationGracePeriodSeconds large enough (for example, one hour, or make it configurable);
  • Configure the preStop hook script:
    1. Send a SIGQUIT signal to the TiDB process; it then shuts down gracefully, waiting for all clients to close their connections;
    2. Poll the current connection count from TiDB's /status API and wait for it to reach 0 (a sketch of such a script follows the /status example below).

TiDB's /status API reports the current connection count:

$ curl 10.233.123.121:10080/status
{"connections":1,"version":"5.7.25-TiDB-v4.0.0-rc-141-g7267747ae","git_hash":"7267747ae0ec624dffc3fdedb00f1ed36e10284b"}

cofyc (Contributor) commented Jun 22, 2020

Maybe we can just allow users to specify container lifecycle hooks for our components.

We can provide an example (and docs, etc.) for this scenario, but we shouldn't limit the possibilities: if an application tolerates connection failures, it may not need extra wait time, or may not need to wait for all connections to be closed.
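
If the operator simply passed a user-specified hook through to the generated container spec, a TidbCluster manifest might look like this; the lifecycle field under spec.tidb is hypothetical here, not an existing API, and /scripts/graceful-stop.sh is a placeholder:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  tidb:
    replicas: 3
    lifecycle:           # hypothetical passthrough to the tidb container
      preStop:
        exec:
          command: ["/bin/bash", "-c", "/scripts/graceful-stop.sh"]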

weekface self-assigned this Jun 22, 2020
DanielZhangQD (Contributor) commented

container lifecycle hooks

Agree.

As for the SIGQUIT signal, TiDB should handle SIGTERM the same way; pingcap/tidb#13369 was created a long time ago, but there seems to have been no action from TiDB.

BTW, we cannot set spec.terminationGracePeriodSeconds too long, otherwise it may block the upgrade for too much time.
