Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Webhook for PD upgrader #1040

Closed
wants to merge 20 commits into from
Closed

Conversation

Yisaer
Copy link
Contributor

@Yisaer Yisaer commented Oct 23, 2019

What problem does this PR solve?

#992

What is changed and how does it work?

Tidb-Operator will only need to edit statefulset spec during pd upgrading and to edit replicas during pd scale-in. The webhook server will ensure that each pd pod will be safely deleted.

And I also add one more component as an init job to auto-create CA cert and csr for webhook configuration which would be much user-friendly.

Currently I package tidb-operator-initializer in the same docker image as others and install openssl and kubectl in tidb-operator image which initializer use these in the script to manage CA cert.

Packaging all necessary process into 2 images would need us to change some e2e test script which would have huge influence for other e2e test.

After we moved our e2e CI script into code repo instead of jenkins server, I would pakcage tidb-operator-initializer in a different image so that other e2e test won't be influenced.

Check List

Tests

  • Unit test
  • E2E test
  • Manual test (add detailed scripts or steps below)

Code changes

  • Has Helm charts change
  • Has Go code change

Side effects
N/A

Related changes

Kubernetes 1.9.0 or above with the admissionregistration.k8s.io/v1beta1 API enabled. 

@Yisaer

This comment has been minimized.

@Yisaer Yisaer added the enhancement New feature or request label Oct 25, 2019
@Yisaer Yisaer force-pushed the pd_delete_pods_webhook branch 2 times, most recently from 790c92f to 563054c Compare October 29, 2019 10:48
@Yisaer
Copy link
Contributor Author

Yisaer commented Oct 29, 2019

/run-e2e-test

@Yisaer
Copy link
Contributor Author

Yisaer commented Oct 31, 2019

/run-e2e-in-kind

cpu: 80m
memory: 50Mi
## Interval Hour to Refresh CA Cert
certRefreshIntervalHour: 336
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AdmissionController will refresh CA cert before certRefreshIntervalHour hour on its expireDate.

}

/*
* currently we only use one replica for admission-controller-webhook
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we just set replica as 1 for admissioncontroller so we don't have lease lock from apiServer as we are using Lister here.

return hex.EncodeToString(out)
}

func GenerateRSAKeyPair(namespace, serviceName string) ([]byte, []byte, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CertUtil help us generate CA Cert for a certain service with ten years default validity period

}
}

func (pc *PodAdmissionControl) AdmitPods(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AdmitPods will only check AdmissionReview with Delete Operation.


if l.IsPD() {
pdClient := pc.pdControl.GetPDClient(pdapi.Namespace(namespace), tcName, tc.Spec.EnableTLSCluster)
return pc.admitDeletePdPods(pod, ownerStatefulSet, tc, pdClient)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only check Delete AdmissionReview For PD Pod.

@@ -89,9 +89,14 @@ func serve(w http.ResponseWriter, r *http.Request, admit admitFunc) {
responseAdmissionReview.Response.UID = requestedAdmissionReview.Request.UID

marshalAndWrite(responseAdmissionReview, w)
webhookHandler.RefreshCertPEMExpirationHandler(RefreshJobConfig)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebHook Server will check whether CA cert would be expired at each serving request.

@@ -133,7 +128,6 @@ func main() {
oa.CheckTidbClusterStatusOrDie(cluster)

// upgrade
oa.RegisterWebHookAndServiceOrDie(certCtx, ocfg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove e2e webhook.

pkg/initializer/initializer.go Outdated Show resolved Hide resolved
@Yisaer
Copy link
Contributor Author

Yisaer commented Nov 1, 2019

@tennix @weekface @cofyc @aylei PTAL.

@cofyc
Copy link
Contributor

cofyc commented Nov 1, 2019

Can we divide the whole process into several steps?

implement the webhook controller and can use the webhook in tidb-operator optionally

  1. we can use hardcoded self-signed certs in e2e tests
  2. add a feature switch to disable the logic in controller-manager to work with webhook admission controller

in this step, we focus on webhook logic

deploy/configure webhook controller

request/sign/configure/renew cert for webhook controller

in this step, we focus on webhook deployment and management

use the webhook controller by default

when it's stable, we can switch to use webhook controller by default but have an option to switch back

production-ready

when it's production-ready, we can remove the old logic in the controller and require the webhook to exist in our deployment

In each step, we introduce small changes to our code. It's easier to review and we can have enough tests before enabling the new feature by default.

@Yisaer
Copy link
Contributor Author

Yisaer commented Nov 1, 2019

/run-e2e-in-kind

spec:
groups:
- system:authenticated
request: $(cat ${tmpdir}/server.csr | base64 | tr -d '\n')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base64 | tr -d '\n' can be base64 -w0

kubectl get csr ${csrName}
if [ "$?" -eq 0 ]; then
break
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to sleep a while here?

charts/tidb-operator/templates/scripts/_create_cert.sh.tpl Outdated Show resolved Hide resolved
}

server.Spec.Template.Annotations = map[string]string{
"checksum/config": certUtils.Checksum(secret.Data["cert.pem"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"checksum/config": certUtils.Checksum(secret.Data["cert.pem"]),
"checksum/cert": certUtils.Checksum(secret.Data["cert.pem"]),

BTW, no need to preserve other annotations?

pkg/webhook/pod/pd_deleter.go Show resolved Hide resolved
Co-Authored-By: DanielZhangQD <36026334+DanielZhangQD@users.noreply.github.com>
@tennix
Copy link
Member

tennix commented Nov 13, 2019

Is this PR still needed after #1101 is merged?

@Yisaer
Copy link
Contributor Author

Yisaer commented Dec 18, 2019

This request is no more needed now.

@Yisaer Yisaer closed this Dec 18, 2019
@Yisaer Yisaer deleted the pd_delete_pods_webhook branch February 26, 2020 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request type/refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants