proposal: dynamic configuration change #13660

Closed

@rleungx rleungx commented Nov 21, 2019

Summary

This proposal describes a unified way to manage the configuration options of TiDB, TiKV, and PD: store them in PD and support changing them dynamically through the same mechanism across all components, which can greatly improve usability.

Motivation

Here are some reasons why we need to do it:

  • Currently, each component in a TiDB cluster has its own configuration file, which is hard to manage. We need a unified way to manage the configuration options of all components.
  • Although some configuration options support dynamic modification, operators need to learn a lot to use them properly because there are multiple entry points, e.g., pd-ctl, tikv-ctl, and SQL, resulting in poor usability. For better usability, we should provide a unified way to modify them dynamically.

@rleungx rleungx changed the title proposal: dynamic configuration change proposal proposal: dynamic configuration change Nov 21, 2019
@codecov

codecov bot commented Nov 21, 2019

Codecov Report

Merging #13660 into master will decrease coverage by 0.1506%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##             master     #13660        +/-   ##
================================================
- Coverage   80.0635%   79.9129%   -0.1507%     
================================================
  Files           473        473                
  Lines        116440     115801       -639     
================================================
- Hits          93226      92540       -686     
- Misses        15924      15973        +49     
+ Partials       7290       7288         -2

Signed-off-by: Ryan Leung <rleungx@gmail.com>
@SunRunAway
Contributor

SunRunAway commented Nov 21, 2019

There are also several scenarios to consider. I recommend adding them to the workflow:

  1. Let's say I want to upgrade the binary and change the configuration at the same time. What should I do in this scenario?
  2. A certain TiDB instance needs to debug a configuration parameter.
  3. Grayscale upgrade scenario
  4. Users have their own etcd to do configuration management
  5. Should an instance keep its local copy of a configuration file? When the remote configuration server is down, the instance can still start up.

In addition, it would be best to describe how the workflow works both on ordinary machines and in a Kubernetes environment.

- New cluster. Both TiDB and TiKV use the default configuration and send it to
PD to complete the registration. The registration needs to establish the mapping
relationship between the component ID, version and local configuration. For
customized requirements, such as modifying the size of block cache. It needs
Member

When does the customization happen? I think it should happen before TiKV and TiDB register themselves, because some configurations won't take effect after restart.

Global global = 2;
}
string name = 3;
string value = 4;
Member

Are all values strings?

files of those components can be removed and don't need to learn how those
tools works. It reduces administrative costs. For configuration options that
cannot be modified dynamically, we still can change it using this unified way,
but we need to wait for the next restart after modification to take effect.
Member

We need a mechanism to tell users whether they need to restart the cluster for a modification to take effect.
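
One possible shape for such a mechanism, as a minimal sketch only: each option carries a flag saying whether it can be changed at runtime, and the update call reports back whether a restart is still needed. The `Option` type, the `Apply` function, and the option names below are hypothetical and are not part of the proposal; which real options are dynamic is not implied.

```go
package main

import "fmt"

// Option is a hypothetical configuration entry. Dynamic marks whether a
// change can take effect without restarting the component.
type Option struct {
	Value   string
	Dynamic bool
}

// Apply stores the new value and reports whether a restart is required
// before the change takes effect.
func Apply(opts map[string]*Option, name, value string) (restartRequired bool, err error) {
	opt, ok := opts[name]
	if !ok {
		return false, fmt.Errorf("unknown option %q", name)
	}
	opt.Value = value
	return !opt.Dynamic, nil
}

func main() {
	// Illustrative option names only.
	opts := map[string]*Option{
		"example-dynamic-option": {Value: "info", Dynamic: true},
		"example-static-option":  {Value: "/data", Dynamic: false},
	}
	for _, name := range []string{"example-dynamic-option", "example-static-option"} {
		restart, err := Apply(opts, name, "new-value")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s updated, restart required: %v\n", name, restart)
	}
}
```

The caller (for example a CLI or SQL front end) could surface the returned flag directly to the user after every modification.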

docs/design/2019-11-21-dynamic-configuration-change.md (outdated)
The functions of each interface are as follows:

*Create* is used to register a configuration to PD when the components start.
*Get* is used to get the complete configuration of the component periodically

Would a Watch API be helpful? A client could watch the configuration version and call Get only when the version changes.

Member Author

Can we directly use the version to decide whether we need to return the configuration?
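
The version-based alternative discussed in this thread can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not PD's implementation: `ConfigStore` and its `Get` signature are hypothetical names, and a real service would sit behind gRPC.

```go
package main

import "fmt"

// ConfigStore is a hypothetical in-memory stand-in for the configuration
// that PD would hold for one component.
type ConfigStore struct {
	version uint64
	entries map[string]string
}

// Get returns the stored version. The entries are returned only when the
// caller's version is behind the stored one; otherwise changed is false
// and no configuration payload is sent.
func (s *ConfigStore) Get(clientVersion uint64) (version uint64, entries map[string]string, changed bool) {
	if clientVersion >= s.version {
		return s.version, nil, false
	}
	out := make(map[string]string, len(s.entries))
	for k, v := range s.entries {
		out[k] = v
	}
	return s.version, out, true
}

func main() {
	store := &ConfigStore{version: 3, entries: map[string]string{"log-level": "info"}}

	// Client already at version 3: nothing is returned.
	_, _, changed := store.Get(3)
	fmt.Println(changed)

	// Client at version 2: the full configuration comes back.
	v, cfg, changed := store.Get(2)
	fmt.Println(v, cfg["log-level"], changed)
}
```

With this shape the client still polls periodically, but the response stays small when nothing has changed; a Watch API would remove the polling as well.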


*Create* is used to register a configuration to PD when the components start.
*Get* is used to get the complete configuration of the component periodically
from PD and decide if the component configuration need to update. *Update* is

How does Update perform concurrency control?
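
The proposal text quoted here does not answer this. One common option is optimistic concurrency control, where each Update carries the version it was based on and is rejected if the stored version has moved on. The sketch below only illustrates that idea; `VersionedConfig`, `Update`, and `ErrVersionConflict` are hypothetical names, not the proposal's API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrVersionConflict is returned when the caller's expected version is stale.
var ErrVersionConflict = errors.New("config version conflict")

// VersionedConfig is a hypothetical versioned key-value configuration.
type VersionedConfig struct {
	mu      sync.Mutex
	version uint64
	entries map[string]string
}

// Update applies the change only if expected matches the current version,
// then bumps the version. On conflict it returns the current version so the
// caller can re-read and retry.
func (c *VersionedConfig) Update(expected uint64, name, value string) (uint64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if expected != c.version {
		return c.version, ErrVersionConflict
	}
	c.entries[name] = value
	c.version++
	return c.version, nil
}

func main() {
	cfg := &VersionedConfig{version: 1, entries: map[string]string{}}

	// First writer based on version 1 succeeds and moves the version to 2.
	v, err := cfg.Update(1, "log-level", "debug")
	fmt.Println(v, err)

	// Second writer, still based on version 1, is rejected.
	_, err = cfg.Update(1, "log-level", "warn")
	fmt.Println(errors.Is(err, ErrVersionConflict))
}
```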

These two types are used to distinguish whether the configuration is shared by
components. For example, the label configuration of TiKV is individual for each
TiKV instance. So the type should be local. Each instance here is uniquely identified
by the *component_id*, which can be obtained by hashing *IP: port*.

I know little about the behavior convention of components, but the IP of a logical instance would change in many scenarios, wouldn't it?

Member

I think this is the registered address. In k8s, the instance IP may change on upgrade, but we can register a persistent address for each component.
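
To make the point concrete, here is a small sketch of deriving a component ID from the registered address rather than the current pod IP. The choice of FNV-1a and the example addresses are assumptions for illustration; the proposal only says the ID "can be obtained by hashing IP: port".

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ComponentID hashes a registered address (for example a stable DNS name
// plus port) into a 64-bit identifier. FNV-1a is an arbitrary choice here.
func ComponentID(registeredAddr string) uint64 {
	h := fnv.New64a()
	_, _ = h.Write([]byte(registeredAddr)) // hash.Hash.Write never returns an error
	return h.Sum64()
}

func main() {
	// Hypothetical stable addresses; the pod IP behind them may change.
	a := ComponentID("tikv-0.tikv-peer:20160")
	b := ComponentID("tikv-0.tikv-peer:20160")
	c := ComponentID("tikv-1.tikv-peer:20160")
	fmt.Println(a == b, a == c)
}
```

As long as the registered address is stable across restarts and upgrades, the derived ID is stable too, regardless of IP changes.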

registers or queries the component ID. By comparing the version carried by the
request with the version stored in PD, it determines whether to return the
configuration of the component in the response. After receiving the reply, TiDB
or TiKV decides whether to update the configuration or not after comparing with

Suppose PD lost the configurations in a disaster. Can the operator make configuration changes after recovery? It seems the changes would be ignored by the component, because the version in PD is lower than the version in the component.

Member

IIRC, the PD recover tool recovers the cluster-id and also picks a large enough alloc-id, which needs to be bigger than the currently allocated ID. So I guess that when storing configuration in PD, the corresponding config version needs to be handled the same way.

or TiKV decides whether to update the configuration or not after comparing with
the version stored in the component.

- Delete the node. PD can directly delete the corresponding component ID, the

What is the API of deleting a node?

Member Author

I think there are two ways to delete a node. One is to add a delete API; the other is to use a TTL mechanism.
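
The TTL option could look roughly like this: components refresh a last-seen timestamp, and PD drops any component ID that has not been refreshed within the TTL. This is a sketch under assumptions; `Registry`, `Heartbeat`, and `Expire` are hypothetical names, and the clock is passed in explicitly to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Registry is a hypothetical record of when each component was last seen.
type Registry struct {
	ttl      time.Duration
	lastSeen map[string]time.Time
}

// NewRegistry creates an empty registry with the given TTL.
func NewRegistry(ttl time.Duration) *Registry {
	return &Registry{ttl: ttl, lastSeen: make(map[string]time.Time)}
}

// Heartbeat records that the component was alive at time now.
func (r *Registry) Heartbeat(id string, now time.Time) {
	r.lastSeen[id] = now
}

// Expire removes and returns every component whose last heartbeat is older
// than the TTL relative to now.
func (r *Registry) Expire(now time.Time) []string {
	var removed []string
	for id, seen := range r.lastSeen {
		if now.Sub(seen) > r.ttl {
			delete(r.lastSeen, id) // deleting during range is safe in Go
			removed = append(removed, id)
		}
	}
	return removed
}

func main() {
	start := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	reg := NewRegistry(30 * time.Second)
	reg.Heartbeat("tikv-1", start)
	reg.Heartbeat("tikv-2", start.Add(25*time.Second))

	// 40s after start, only tikv-1 has been silent for longer than the TTL.
	fmt.Println(reg.Expire(start.Add(40 * time.Second)))
}
```

Compared with an explicit delete API, the TTL approach also cleans up after components that crash without deregistering, at the cost of a delay before the entry disappears.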


- Add a new component or restart the component. The initialization of
configuration calls the *Create* method. After receiving the request, PD first
registers or queries the component ID. By comparing the version carried by the

How does a new component or a restarted component determine which version to send in the request?

Member Author

I think each component also needs to persist its own version.

Signed-off-by: Ryan Leung <rleungx@gmail.com>
@sre-bot
Contributor

sre-bot commented Feb 7, 2020

@tennix, @aylei, PTAL.

@AndreMouche
Contributor

/label test

@ti-srebot
Contributor

These labels are not found test.

@sre-bot
Contributor

sre-bot commented Jun 19, 2020

@AndreMouche
Contributor

/label test,wip

@ti-srebot
Contributor

These labels are not found test,wip.

@ti-srebot
Contributor

@tennix, @aylei, PTAL.

@zz-jason
Member

zz-jason commented Feb 9, 2021

I'm going to close this PR since it hasn't been updated for a long time. Feel free to reopen it if you plan to continue this PR in the future. Thanks for your contribution.

@zz-jason zz-jason closed this Feb 9, 2021