Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

able to scale in/out with delete slots feature #1361

Merged
merged 20 commits into from
Dec 27, 2019
Merged

Conversation

cofyc
Copy link
Contributor

@cofyc cofyc commented Dec 18, 2019

What problem does this PR solve?

xref: #1348

What is changed and how does it work?

  • use advanced-statefulset v0.2.1
  • able to scale in/out with delete slots feature

Check List

Tests

  • Unit test
  • E2E test

Does this PR introduce a user-facing change?:

able to scale in/out with delete slots feature of advanced statefulset

go.mod Outdated
@@ -142,3 +142,5 @@ replace k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.0.0-20
replace k8s.io/node-api => k8s.io/node-api v0.0.0-20190918163711-2299658ad911

replace github.com/uber-go/atomic => go.uber.org/atomic v1.5.0

replace github.com/pingcap/advanced-statefulset => /data/go/src/github.com/pingcap/advanced-statefulset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably don't want to commit this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for local development only, I'll remove this

@cofyc cofyc force-pushed the fix1348 branch 2 times, most recently from 7c30943 to 30169a5 Compare December 19, 2019 13:15
@cofyc
Copy link
Contributor Author

cofyc commented Dec 19, 2019

/run-e2e-test

@cofyc cofyc force-pushed the fix1348 branch 6 times, most recently from 1a007b1 to cd53f6f Compare December 23, 2019 02:51
scope: Namespaced
names:
plural: statefulsets
singular: statefulset
kind: StatefulSet
shortNames:
- asts
versions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are already two versions?

@cofyc cofyc force-pushed the fix1348 branch 3 times, most recently from 08477b9 to 4dae600 Compare December 23, 2019 06:31
@cofyc cofyc changed the title WIP: able to scale in/out with delete slots feature able to scale in/out with delete slots feature Dec 23, 2019
@cofyc
Copy link
Contributor Author

cofyc commented Dec 23, 2019

/run-e2e-test

tests/actions.go Outdated
@@ -393,15 +397,19 @@ func (oi *OperatorConfig) OperatorHelmSetString(m map[string]string) string {
"scheduler.kubeSchedulerImageName": oi.SchedulerImage,
"controllerManager.logLevel": oi.LogLevel,
"scheduler.logLevel": "4",
"controllerManager.replicas": "2",
"scheduler.replicas": "2",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it configurable, in some tests, we can use 1 replica to consumes fewer resources

tests/actions.go Outdated
name := fmt.Sprintf("%s-%d", memberName, 0)
pod, err := kubeCli.CoreV1().Pods(namespace).Get(name, metav1.GetOptions{})
// getMemberContainer gets member container
func getMemberContainer(kubeCli kubernetes.Interface, namespace, tcName, component string) (*corev1.Container, bool) {
Copy link
Contributor Author

@cofyc cofyc Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of fetching the -0 pod, we list the pods and use the first one.

tests/dt.go Outdated
@@ -63,7 +63,7 @@ func (oa *operatorActions) LabelNodes() error {
}

for i, node := range nodes.Items {
err := wait.Poll(3*time.Second, time.Minute, func() (bool, error) {
err := wait.PollImmediate(3*time.Second, time.Minute, func() (bool, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to wait at the first try, this reduces time

@cofyc cofyc force-pushed the fix1348 branch 4 times, most recently from 9e60e30 to 20c79ba Compare December 23, 2019 08:03
@cofyc
Copy link
Contributor Author

cofyc commented Dec 23, 2019

/run-e2e-test

@@ -41,7 +41,7 @@ type DBInfo struct {
// TiDBControlInterface is the interface that knows how to manage tidb peers
type TiDBControlInterface interface {
// GetHealth returns tidb's health info
GetHealth(tc *v1alpha1.TidbCluster) map[string]bool
GetHealth(tc *v1alpha1.TidbCluster, ordinal int32) (bool, error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tidb member controller will decide which member we need to request the health status

return nil
}
// Following cases are not supported yet.
// TODO support it or validate in validation phase
Copy link
Contributor Author

@cofyc cofyc Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some desired states are not allowed because the controller has to delete and create pods. I decided to implement in the future. There are two ways to solve this issue:

  1. avoid in validation phase
  2. support it in multiple scaling operations (e.g. by scaling out first and then scaling in)
    -replicas 5, delete slots [0] is changed to replicas 4, delete slots[1], we need to delete pod 1 and 5 and create pod 4.

// from a string like feature1=true,feature2=false,...
Set(value string) error
// SetFromMap stores flag gates for enabled features from a map[string]bool
SetFromMap(m map[string]bool)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

able to change features settings in unit tests

if err != nil {
return err
}
resetReplicas(newSet, oldSet)
Copy link
Contributor Author

@cofyc cofyc Dec 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the scaling logic like this:

  • get desired replicas and delete slot by comparing the old and new statefulsets
  • reset the new statefulset
  • if our scaling logic succeeds, set desired replicas and delete slots, otherwise, return an error

}

// scaleOne scales actual set to desired set by deleting or creating a replica
func scaleOne(actual *apps.StatefulSet, desired *apps.StatefulSet) (int32, int32, sets.Int, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function calculates desired replicas and delete slots from actual/desired stateful sets by allowing one pod to be deleted or created

func (psd *pdScaler) ScaleOut(tc *v1alpha1.TidbCluster, oldSet *apps.StatefulSet, newSet *apps.StatefulSet) error {
_, ordinal, replicas, deleteSlots := scaleOne(oldSet, newSet)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recalculating again because of less code needs to be changed (otherwise ScaleOut/ScaleIn function signatures need to change and a lot of tests must be updated)

func (tsd *tikvScaler) ScaleOut(tc *v1alpha1.TidbCluster, oldSet *apps.StatefulSet, newSet *apps.StatefulSet) error {
_, ordinal, replicas, deleteSlots := scaleOne(oldSet, newSet)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recalculating again because of less code needs to be changed (otherwise ScaleOut/ScaleIn function signatures need to change and a lot of tests must be updated)

@cofyc cofyc requested a review from Yisaer December 24, 2019 09:06
@aylei
Copy link
Contributor

aylei commented Dec 24, 2019

/run-e2e-test

@cofyc
Copy link
Contributor Author

cofyc commented Dec 25, 2019

/run-e2e-test

Copy link
Contributor

@aylei aylei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM, the scaleOne() method is excellent

Comment on lines +120 to +123
// - scaling:
// - 0: no scaling required
// - 1: scaling out
// - -1: scaling in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we define enums for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary because it's used internally and I actually uses > 0 and < 0 to decide whether to scale in or scale out.

@@ -158,3 +163,188 @@ func newFakeGeneralScaler() (*generalScaler, cache.Indexer, *controller.FakePVCC
return &generalScaler{pvcLister: pvcInformer.Lister(), pvcControl: pvcControl},
pvcInformer.Informer().GetIndexer(), pvcControl
}

func TestScaleOne(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a case for "scale tidb to 0"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed some corner cases and add more tests in c8eb405

Comment on lines 129 to 130
additions := desiredDesiredPodOrdinals.Difference(actualDesiredPodOrdinals)
deletions := actualDesiredPodOrdinals.Difference(desiredDesiredPodOrdinals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order of the diff result guaranteed? or random?

Copy link
Contributor Author

@cofyc cofyc Dec 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal struct of sets is a map, List() will sort the set members in ascending order

@cofyc cofyc force-pushed the fix1348 branch 2 times, most recently from 29e6c30 to 4792854 Compare December 25, 2019 11:05
@cofyc
Copy link
Contributor Author

cofyc commented Dec 25, 2019

/run-e2e-test

Copy link
Contributor

@aylei aylei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aylei aylei added this to the v1.1.0 milestone Dec 27, 2019
Copy link
Contributor

@weekface weekface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done.

LGTM

@cofyc
Copy link
Contributor Author

cofyc commented Dec 27, 2019

/merge

@sre-bot
Copy link
Contributor

sre-bot commented Dec 27, 2019

/run-all-tests

@sre-bot sre-bot merged commit c9b1d8f into pingcap:master Dec 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants