Add a special scheduler for evict leader with timeout #2782

Open · nolouch opened this issue Aug 18, 2020 · 7 comments
Labels
component/scheduler Scheduler logic. status/discussion-wanted The issue needs to be discussed. type/enhancement The issue or PR belongs to an enhancement.

Comments

nolouch (Contributor) commented Aug 18, 2020

Feature Request

Describe your feature request related problem

When upgrading a TiKV cluster, we use evict-leader-scheduler to ensure the TiKV node being restarted holds no leaders. However, we have repeatedly hit the problem that the evict-leader-scheduler was not deleted during the rolling upgrade process. To solve this more robustly, we could provide a special evict-leader scheduler with a timeout for the deployment tool.

Describe the feature you'd like

Add a special scheduler for evict leader with timeout

Describe alternatives you've considered

A default timeout should be suitable in most cases.
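For illustration, the interface could mirror the existing evict-leader-scheduler command with an extra timeout argument; the `--timeout` flag below is hypothetical, not an existing pd-ctl option:

```
>> scheduler add evict-leader-scheduler 1               # existing usage: evict all leaders from store 1
>> scheduler add evict-leader-scheduler 1 --timeout=30m # hypothetical: scheduler removes itself after 30m
```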

Teachability, Documentation, Adoption, Migration Strategy

BusyJay (Member) commented Aug 18, 2020

A timeout is unpredictable; no one knows what an appropriate value is. I suggest using a streaming gRPC call or a long-lived HTTP connection instead. The evict-leader-scheduler is added once the call/connection is established, and removed once the call/connection is aborted or finished.
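A minimal sketch of the server side of this idea, assuming a hypothetical streaming RPC (PD's pdpb has no such call today, so the stream interface is sketched by hand):

```go
// evictLeaderStream models a hypothetical client stream; Recv blocks until
// the client sends a message or the connection ends.
type evictLeaderStream interface {
	Recv() (storeID uint64, err error)
}

type server struct{ /* scheduling internals elided */ }

// Both methods below are stand-ins for PD's scheduler management, not real APIs.
func (s *server) addEvictLeaderScheduler(storeID uint64) error { return nil }
func (s *server) removeEvictLeaderScheduler(storeID uint64)    {}

// EvictLeader ties the scheduler's lifetime to the stream's lifetime.
func (s *server) EvictLeader(stream evictLeaderStream) error {
	// The first message names the store whose leaders should be evicted.
	storeID, err := stream.Recv()
	if err != nil {
		return err
	}
	if err := s.addEvictLeaderScheduler(storeID); err != nil {
		return err
	}
	// Remove the scheduler once the call ends, whether the client finished
	// normally, crashed, or the connection was aborted.
	defer s.removeEvictLeaderScheduler(storeID)

	// Block until the client closes the stream or the connection drops.
	for {
		if _, err := stream.Recv(); err != nil {
			return nil
		}
	}
}
```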

disksing (Contributor) commented Aug 18, 2020

I'd like to propose another approach.
Instead of depending on a special scheduler, we can introduce a new store state; let's call it UNLOAD. With this state, suppose we want to upgrade a TiKV node (a sketch of the transitions follows the list):

  1. We use pd-ctl or the PD API to set the node's state to UNLOAD.
  2. PD creates operators to transfer all leaders out of the store (just like evict-leader, but without creating the scheduler).
  3. After the leader count becomes 0, PD changes the node's state to UNLOADED.
  4. The user restarts / upgrades the TiKV node.
  5. When PD receives a putStore command from the TiKV node, it updates the node's state back to Up.
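A minimal sketch of those transitions, assuming hypothetical StateUnload and StateUnloaded values added alongside PD's existing store states:

```go
// StateUnload and StateUnloaded are hypothetical additions; only the
// transition logic from the steps above is sketched here.
type StoreState int

const (
	StateUp StoreState = iota
	StateUnload   // step 1: user asked PD to move all leaders off the store
	StateUnloaded // step 3: leader count reached 0, safe to restart
)

type Store struct {
	state StoreState
}

// onLeaderCountChanged would run whenever PD refreshes store statistics.
func (s *Store) onLeaderCountChanged(leaderCount int) {
	if s.state == StateUnload && leaderCount == 0 {
		s.state = StateUnloaded
	}
}

// onPutStore would run when the restarted TiKV re-registers with PD (step 5).
func (s *Store) onPutStore() {
	if s.state == StateUnloaded {
		s.state = StateUp
	}
}
```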

BusyJay (Member) commented Aug 18, 2020

@disksing I suggested a similar solution in chat. However, this doesn't handle the case where the user aborts the operation, in which case TiKV doesn't have to be restarted.

The solution I suggested above doesn't require a new scheduler and also works in all known cases.

3pointer (Contributor) commented
BR also hits the same problem. BR temporarily removes balance-region-scheduler, balance-leader-scheduler, etc. to speed up restoration, and finally adds these schedulers back. But if BR is killed during restoration, these schedulers are lost. So BR needs PD to provide the ability to temporarily remove schedulers, such as a remove-scheduler TTL option.
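A sketch of what such a TTL option could mean on the PD side, assuming a hypothetical wrapper over scheduler management (the interface below is modeled loosely on PD's coordinator, not its real API):

```go
import "time"

// schedulerController abstracts the few operations the sketch needs;
// the method set is an assumption, not PD's actual coordinator API.
type schedulerController interface {
	AddScheduler(name string) error
	RemoveScheduler(name string) error
	HasScheduler(name string) bool
}

// RemoveSchedulerWithTTL removes a scheduler and restores it automatically
// after ttl if the caller (e.g. BR) dies without adding it back.
func RemoveSchedulerWithTTL(c schedulerController, name string, ttl time.Duration) error {
	if err := c.RemoveScheduler(name); err != nil {
		return err
	}
	time.AfterFunc(ttl, func() {
		if !c.HasScheduler(name) {
			_ = c.AddScheduler(name)
		}
	})
	return nil
}
```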

kennytm commented Sep 25, 2020

Currently PD supports the following TTL-based API:

  • Service GC Safepoint (a.k.a. GC-TTL)

This issue requests a TTL-based API to:

  • Add a scheduler (evict-leader-scheduler)

BR requests 3 TTL-based APIs to:

  • Remove schedulers (balance-* and shuffle-*)
  • Decrease config value to 0 (max-merge-region-{keys,size})
  • Increase config value to 40 ({leader,region}-schedule-limit and max-snapshot-count)

Question: what should we do when multiple services require conflicting settings? In GC-TTL the conflict resolution is simple: just set the safepoint to the minimum over all alive services. But for the new APIs, suppose service A registers to remove scheduler X and service B registers to add the same scheduler X: how should this be resolved?

I see two solutions for now:

  1. Select only some specific schedulers and configs, with a clear direction of resolution, e.g. evict-leader-scheduler can only be registered to be added, not removed; the balance-* schedulers can only be removed, not added; max-merge-region-size can only be decreased, not increased, etc.

  2. First-come-first-serve: while a service TTL for a particular scheduler/config is alive, no other services can register TTL to the same scheduler/config.

We also need to consider the interaction with existing dynamic (permanent) changes. For instance, if a service has registered to set max-snapshot-count to 40, what effect do we get if we run

  • pd-ctl config set max-snapshot-count 2 ?
  • pd-ctl config set max-snapshot-count 80 ?

3pointer (Contributor) commented
I think the first-come-first-served solution is better, for two reasons:

  1. Not all configs have a clear direction, e.g. {leader,region}-schedule-limit.
  2. The TTL logic is simple, and we can make the TTL based on the service rather than on each config: if service A has registered with PD within the TTL, PD will deny all other services' requests (see the sketch below).
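A minimal sketch of such first-come-first-served lease tracking; the whole type is hypothetical, not a PD API:

```go
import (
	"fmt"
	"sync"
	"time"
)

type lease struct {
	service string
	expiry  time.Time
}

// ttlRegistry maps a scheduler/config key to the service currently holding it.
type ttlRegistry struct {
	mu     sync.Mutex
	leases map[string]lease
}

// Register grants the key to svc if the key is free, already held by svc, or
// held under an expired lease; any other live holder causes a denial (FCFS).
func (r *ttlRegistry) Register(svc, key string, ttl time.Duration) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.leases[key]; ok && l.service != svc && time.Now().Before(l.expiry) {
		return fmt.Errorf("%s is held by %s until %s", key, l.service, l.expiry)
	}
	if r.leases == nil {
		r.leases = make(map[string]lease)
	}
	r.leases[key] = lease{service: svc, expiry: time.Now().Add(ttl)}
	return nil
}
```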

kennytm commented Oct 6, 2020

For removing schedulers we could use the "Pause" API (#1831), which is available on 3.1 and 4.0.
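A usage sketch, assuming the pd-ctl pause/resume syntax from the 4.0 docs (the delay argument is in seconds):

```
>> scheduler pause balance-region-scheduler 600   # pause the scheduler for 600 seconds
>> scheduler resume balance-region-scheduler      # resume it before the delay expires
```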
