Add a special scheduler for evict leader with timeout #2782

Open · nolouch opened this issue Aug 18, 2020 · 7 comments
Labels
component/scheduler Scheduler logic. status/discussion-wanted The issue needs to be discussed. type/enhancement The issue or PR belongs to an enhancement.

Comments

nolouch (Contributor) commented Aug 18, 2020

Feature Request

Describe your feature request related problem

When upgrading a TiKV cluster, we use evict-leader-scheduler to ensure the TiKV node being restarted holds no leaders. However, we have repeatedly hit the problem that the evict-leader-scheduler was not deleted during the rolling upgrade process. To solve this more robustly, we could provide a special evict-leader scheduler with a timeout for the deployment tool.

Describe the feature you'd like

Add a special scheduler for evict leader with timeout

Describe alternatives you've considered

A default timeout should be suitable in most cases.
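For illustration, the interface could mirror the existing evict-leader-scheduler command with an extra timeout argument; the `--timeout` flag below is hypothetical, not an existing pd-ctl option:

```
>> scheduler add evict-leader-scheduler 1               # existing usage: evict all leaders from store 1
>> scheduler add evict-leader-scheduler 1 --timeout=30m # hypothetical: scheduler removes itself after 30m
```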

Teachability, Documentation, Adoption, Migration Strategy

BusyJay (Member) commented Aug 18, 2020

A timeout is unpredictable; no one knows what an appropriate value is. I suggest using a streaming gRPC call or a long-lived HTTP connection instead. The evict-leader-scheduler is added once the call/connection is established, and removed once the call/connection is aborted or finished.
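A minimal sketch of the server side of this idea, assuming a hypothetical streaming RPC (PD's pdpb has no such call today, so the stream interface is sketched by hand):

```go
// evictLeaderStream models a hypothetical client stream; Recv blocks until
// the client sends a message or the connection ends.
type evictLeaderStream interface {
	Recv() (storeID uint64, err error)
}

type server struct{ /* scheduling internals elided */ }

// Both methods below are stand-ins for PD's scheduler management, not real APIs.
func (s *server) addEvictLeaderScheduler(storeID uint64) error { return nil }
func (s *server) removeEvictLeaderScheduler(storeID uint64)    {}

// EvictLeader ties the scheduler's lifetime to the stream's lifetime.
func (s *server) EvictLeader(stream evictLeaderStream) error {
	// The first message names the store whose leaders should be evicted.
	storeID, err := stream.Recv()
	if err != nil {
		return err
	}
	if err := s.addEvictLeaderScheduler(storeID); err != nil {
		return err
	}
	// Remove the scheduler once the call ends, whether the client finished
	// normally, crashed, or the connection was aborted.
	defer s.removeEvictLeaderScheduler(storeID)

	// Block until the client closes the stream or the connection drops.
	for {
		if _, err := stream.Recv(); err != nil {
			return nil
		}
	}
}
```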

disksing (Contributor) commented Aug 18, 2020

I'd like to propose another approach.
Instead of depending on a special scheduler, we can introduce a new store state; let's call it UNLOAD. With this state, suppose we want to upgrade a TiKV node (a sketch of the transitions follows the list):

  1. We use pd-ctl or the PD API to set the node's state to UNLOAD.
  2. PD creates operators to transfer all leaders out of the store (just like evict-leader, but without creating the scheduler).
  3. After the leader count becomes 0, PD changes the node's state to UNLOADED.
  4. The user restarts / upgrades the TiKV node.
  5. When PD receives a putStore command from the TiKV node, it updates the node's state back to Up.
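A minimal sketch of those transitions, assuming hypothetical StateUnload and StateUnloaded values added alongside PD's existing store states:

```go
// StateUnload and StateUnloaded are hypothetical additions; only the
// transition logic from the steps above is sketched here.
type StoreState int

const (
	StateUp StoreState = iota
	StateUnload   // step 1: user asked PD to move all leaders off the store
	StateUnloaded // step 3: leader count reached 0, safe to restart
)

type Store struct {
	state StoreState
}

// onLeaderCountChanged would run whenever PD refreshes store statistics.
func (s *Store) onLeaderCountChanged(leaderCount int) {
	if s.state == StateUnload && leaderCount == 0 {
		s.state = StateUnloaded
	}
}

// onPutStore would run when the restarted TiKV re-registers with PD (step 5).
func (s *Store) onPutStore() {
	if s.state == StateUnloaded {
		s.state = StateUp
	}
}
```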

BusyJay (Member) commented Aug 18, 2020

@disksing I suggested a similar solution in chat. However, this doesn't handle the case where the user aborts the operation, in which case TiKV doesn't have to be restarted.

The solution I suggested above doesn't require a new scheduler and also works in all known cases.

3pointer (Contributor) commented
BR also hits the same problem. BR temporarily removes balance-region-scheduler, balance-leader-scheduler, etc. to speed up restoration, and finally adds these schedulers back. But if BR is killed during restoration, these schedulers are lost. So BR needs PD to provide the ability to temporarily remove schedulers, such as a remove-scheduler TTL option.
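A sketch of what such a TTL option could mean on the PD side, assuming a hypothetical wrapper over scheduler management (the interface below is modeled loosely on PD's coordinator, not its real API):

```go
import "time"

// schedulerController abstracts the few operations the sketch needs;
// the method set is an assumption, not PD's actual coordinator API.
type schedulerController interface {
	AddScheduler(name string) error
	RemoveScheduler(name string) error
	HasScheduler(name string) bool
}

// RemoveSchedulerWithTTL removes a scheduler and restores it automatically
// after ttl if the caller (e.g. BR) dies without adding it back.
func RemoveSchedulerWithTTL(c schedulerController, name string, ttl time.Duration) error {
	if err := c.RemoveScheduler(name); err != nil {
		return err
	}
	time.AfterFunc(ttl, func() {
		if !c.HasScheduler(name) {
			_ = c.AddScheduler(name)
		}
	})
	return nil
}
```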

kennytm commented Sep 25, 2020

Currently PD supports the following TTL-based API:

  • Service GC Safepoint (a.k.a. GC-TTL)

This issue requests a TTL-based API to:

  • Add a scheduler (evict-leader-scheduler)

BR requests 3 TTL-based APIs to:

  • Remove schedulers (balance-* and shuffle-*)
  • Decrease config value to 0 (max-merge-region-{keys,size})
  • Increase config value to 40 ({leader,region}-schedule-limit and max-snapshot-count)

Question: what should we do when multiple services require conflicting settings? In GC-TTL the conflict resolution is simple: just set the safepoint to the minimum over all alive services. But for the new APIs, suppose service A registers to remove scheduler X and service B registers to add the same scheduler X: how should this be resolved?

I see two solutions for now:

  1. Select only some specific schedulers and configs, with a clear direction of resolution, e.g. evict-leader-scheduler can only be registered to be added, not removed; the balance-* schedulers can only be removed, not added; max-merge-region-size can only be decreased, not increased, etc.

  2. First-come-first-serve: while a service TTL for a particular scheduler/config is alive, no other services can register TTL to the same scheduler/config.

We also need to consider the interaction with existing dynamic (permanent) changes. For instance, if a service has registered to set max-snapshot-count to 40, what effect do we get if we run

  • pd-ctl config set max-snapshot-count 2 ?
  • pd-ctl config set max-snapshot-count 80 ?

3pointer (Contributor) commented
I think the first-come-first-served solution is better, for two reasons:

  1. Not all configs have a clear direction, e.g. {leader,region}-schedule-limit.
  2. The TTL logic is simple, and we can make the TTL based on the service rather than on each config: if service A has registered with PD within the TTL, PD will deny all other services' requests (see the sketch below).
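A minimal sketch of such first-come-first-served lease tracking; the whole type is hypothetical, not a PD API:

```go
import (
	"fmt"
	"sync"
	"time"
)

type lease struct {
	service string
	expiry  time.Time
}

// ttlRegistry maps a scheduler/config key to the service currently holding it.
type ttlRegistry struct {
	mu     sync.Mutex
	leases map[string]lease
}

// Register grants the key to svc if the key is free, already held by svc, or
// held under an expired lease; any other live holder causes a denial (FCFS).
func (r *ttlRegistry) Register(svc, key string, ttl time.Duration) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.leases[key]; ok && l.service != svc && time.Now().Before(l.expiry) {
		return fmt.Errorf("%s is held by %s until %s", key, l.service, l.expiry)
	}
	if r.leases == nil {
		r.leases = make(map[string]lease)
	}
	r.leases[key] = lease{service: svc, expiry: time.Now().Add(ttl)}
	return nil
}
```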

kennytm commented Oct 6, 2020

For removing schedulers we could use the "Pause" API (#1831), which is available on 3.1 and 4.0.
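A usage sketch, assuming the pd-ctl pause/resume syntax from the 4.0 docs (the delay argument is in seconds):

```
>> scheduler pause balance-region-scheduler 600   # pause the scheduler for 600 seconds
>> scheduler resume balance-region-scheduler      # resume it before the delay expires
```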
