
Patch PVC when storage request is increased #3004

Closed
mightyguava opened this issue Jul 22, 2020 · 12 comments · Fixed by #3096
Labels: priority:P1, status/help-wanted
Milestone: v1.1.4
mightyguava (Contributor) commented Jul 22, 2020

Feature Request

Is your feature request related to a problem? Please describe:
Increasing storage for tidb, pd, pump, or monitor requires the user to manually patch the PVCs. The storage settings in the CRDs only take effect when the cluster is initially created.

Describe the feature you'd like:
When the storage request for any component is increased and the underlying storage class supports expansion (allowVolumeExpansion: true), the operator should automatically patch the PVC to increase the volume size.
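As a quick sanity check, whether a given storage class supports expansion can be read straight off the StorageClass object; a minimal example, assuming a storage class named ebs-gp2 (the one used later in this thread):

```
# Prints "true" when the storage class allows PVC expansion
kubectl get storageclass ebs-gp2 -o jsonpath='{.allowVolumeExpansion}'
```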

@DanielZhangQD DanielZhangQD added the status/help-wanted Extra attention is needed label Jul 23, 2020
@DanielZhangQD DanielZhangQD added this to the v1.1.4 milestone Jul 23, 2020
DanielZhangQD (Contributor):
I think we can patch the PVC automatically, and users have to make sure that the underlying StorageClass supports expansion.
The operator would just patch the PVC according to the spec; if the StorageClass does not support expansion, the patch simply does not take effect. WDYT @cofyc

cofyc (Contributor) commented Jul 23, 2020

SGTM

cofyc (Contributor) commented Jul 30, 2020

The StatefulSet "web" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

The current StatefulSet implementation does not allow volumeClaimTemplates to be changed, so there are two possible workarounds:

recreate the StatefulSet, then patch the PVCs (sketched below)

patch the PVCs only
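A rough sketch of the first workaround, assuming the StatefulSet can be deleted without cascading to its pods and is then recreated with the new volumeClaimTemplates (names and size are placeholders):

```
# Delete the StatefulSet object but orphan its pods and PVCs
kubectl -n <namespace> delete statefulset <sts-name> --cascade=false

# Patch each PVC's storage request; the StatefulSet is then recreated
# (here, by the operator) with the updated volumeClaimTemplates
kubectl -n <namespace> patch pvc <pvc-name> --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```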

mightyguava (Contributor, Author):
My intent with the feature request is to just patch the PVCs. I think that is sufficient.

cofyc (Contributor) commented Aug 12, 2020

Hi @mightyguava,

What volume plugin are you using? Does it support online filesystem volume expansion?
Is it OK to patch all PVCs at the same time?

mightyguava (Contributor, Author) commented Aug 12, 2020

We are using the in-tree AWS EBS driver. I believe it requires the pod to be restarted for the resize operation to run.

It's OK to patch all PVCs at the same time, but the StatefulSet still needs a rolling restart.

cofyc (Contributor) commented Aug 12, 2020

Thanks!

I didn't test with EKS, but according to this blog, for AWS EBS the pod does not need to be restarted. The ExpandInUsePersistentVolumes feature gate has been enabled by default since Kubernetes 1.15.

If ExpandInUsePersistentVolumes is not enabled or the volume plugin does not support it, the pod referencing the volume must be deleted and recreated after the FileSystemResizePending condition becomes true.
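A hedged way to watch for that condition on a PVC (namespace and PVC name are placeholders):

```
# Prints "True" while the node-side filesystem resize is still pending
kubectl -n <namespace> get pvc <pvc-name> \
  -o jsonpath='{.status.conditions[?(@.type=="FileSystemResizePending")].status}'
```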

mightyguava (Contributor, Author):
We just upgraded to Kubernetes 1.15 last week, so I guess it's on by default now! I will test it today.

mightyguava (Contributor, Author) commented Aug 12, 2020

Just tried it on our cluster; it didn't resize online.

PVC
Name:          tikv-tidb-tikv-0
Namespace:     yunchi-test-tidb
StorageClass:  ebs-gp2
Status:        Bound
Volume:        pvc-23dd3c9d-cc45-11ea-9f00-0a20ab0494e4
Labels:        app.kubernetes.io/component=tikv
               app.kubernetes.io/instance=tidb
               app.kubernetes.io/managed-by=tidb-operator
               app.kubernetes.io/name=tidb-cluster
               tidb.pingcap.com/cluster-id=6852365978574287644
               tidb.pingcap.com/pod-name=tidb-tikv-0
               tidb.pingcap.com/store-id=1
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               tidb.pingcap.com/pod-name: tidb-tikv-0
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
               volume.kubernetes.io/selected-node: ip-10-136-65-1.us-west-2.compute.internal
               volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    tidb-tikv-0
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 12 Aug 2020 15:57:16 -0400           Waiting for user to (re-)start a pod to finish file system resize of volume on node.
Events:                     <none>
storage class:

```
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"ebs-gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","reclaimPolicy":"Retain","volumeBindingMode":"WaitForFirstConsumer"}
  creationTimestamp: "2019-11-07T16:48:26Z"
  name: ebs-gp2
  resourceVersion: "536533776"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ebs-gp2
  uid: 6b3226b1-017e-11ea-a332-0695703b250c
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```

I've confirmed that we are on Kubernetes 1.15, and this is the only feature gate set on the kubelet:

--feature-gates=RotateKubeletServerCertificate=true \

mightyguava (Contributor, Author) commented Aug 12, 2020

Huh, never mind, it did end up finishing the resize automatically without a pod restart. I tried it manually by patching all 3 TiKV PVCs. They all resized the volume immediately via the AWS API, and then went into the status "Waiting for user to (re-)start a pod to finish file system resize of volume on node." But after a few minutes, the resize finished and no pod restarted. I guess that status line is a lie. I exec'ed into the TiKV pods too and double-checked that the filesystems had been resized.

> kubectl -n yunchi-test-tidb get event --watch
LAST SEEN   TYPE     REASON                       OBJECT                                       MESSAGE
7m51s       Normal   FileSystemResizeSuccessful   pod/tidb-tikv-0                              MountVolume.NodeExpandVolume succeeded for volume "pvc-23dd3c9d-cc45-11ea-9f00-0a20ab0494e4"
0s          Normal   FileSystemResizeSuccessful   pod/tidb-tikv-1                              MountVolume.NodeExpandVolume succeeded for volume "pvc-23df9860-cc45-11ea-9f00-0a20ab0494e4"
2s          Normal   SuccessfulDelete             backup/backup-schedule-2020-07-27t10-00-00   delete Backup yunchi-test-tidb/backup-schedule-2020-07-27t10-00-00 for backupSchedule/backup-schedule successful
3s          Normal   SuccessfulDelete             backup/backup-schedule-2020-07-26t10-00-00   delete Backup yunchi-test-tidb/backup-schedule-2020-07-26t10-00-00 for backupSchedule/backup-schedule successful
3s          Normal   SuccessfulDelete             backup/backup-schedule-2020-07-25t10-00-00   delete Backup yunchi-test-tidb/backup-schedule-2020-07-25t10-00-00 for backupSchedule/backup-schedule successful
3s          Normal   SuccessfulDelete             backup/backup-schedule-2020-07-24t10-00-00   delete Backup yunchi-test-tidb/backup-schedule-2020-07-24t10-00-00 for backupSchedule/backup-schedule successful
0s          Normal   FileSystemResizeSuccessful   pod/tidb-tikv-2                              MountVolume.NodeExpandVolume succeeded for volume "pvc-23e1e639-cc45-11ea-9f00-0a20ab0494e4"
3s          Normal   SuccessfulDelete             backup/backup-schedule-2020-07-23t10-00-00   delete Backup yunchi-test-tidb/backup-schedule-2020-07-23t10-00-00 for backupSchedule/backup-schedule successful
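As a sketch of the in-pod check mentioned above (the container name tikv and the data directory /var/lib/tikv are assumptions and may differ per deployment):

```
# Shows the mounted filesystem size inside the TiKV pod
kubectl -n yunchi-test-tidb exec tidb-tikv-0 -c tikv -- df -h /var/lib/tikv
```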

cofyc (Contributor) commented Aug 13, 2020

Great!
