
add method to delete namespaced PVC in spawner base class #475

Merged
merged 5 commits into jupyterhub:master from feature/shahn3_pvcDeletion
May 12, 2021

Conversation

nsshah1288 (Contributor) commented Jan 19, 2021

PR Summary by Erik

This PR implements a delete_forever hook for KubeSpawner. The delete_forever hook is respected by JupyterHub 1.4.1 and later and was implemented in jupyterhub/jupyterhub#3337. The idea of this hook is to let Spawners clean up a user's associated resources at the time a JupyterHub user is deleted.

As KubeSpawner can create PVCs for users, this KubeSpawner implementation checks for a PVC that it could have created for a user and tries to delete it. This behavior should be documented properly in the CHANGELOG.md for the next KubeSpawner release. I'm not sure it should be labelled a "breaking" change, but it is a change in user experience that is quite important to be aware of.

This PR closes #446.
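As background, the delete_forever contract can be illustrated with a minimal, self-contained sketch. The class and function names below (BaseSpawner, PVCSpawner, delete_user, and the set standing in for cluster state) are illustrative stand-ins, not JupyterHub's or KubeSpawner's actual code:

```python
import asyncio

# Minimal sketch of the delete_forever contract. BaseSpawner, PVCSpawner,
# and delete_user are hypothetical stand-ins for JupyterHub's real classes.
class BaseSpawner:
    async def delete_forever(self):
        """Called when a user (or named server) is deleted permanently."""
        pass  # default: no external resources to clean up


class PVCSpawner(BaseSpawner):
    def __init__(self, pvc_name, cluster):
        self.pvc_name = pvc_name
        self.cluster = cluster  # a set standing in for k8s cluster state

    async def delete_forever(self):
        # remove the user's PVC if it exists; a missing PVC is fine too
        self.cluster.discard(self.pvc_name)


async def delete_user(spawner):
    # the hub awaits each spawner's hook during user deletion
    await spawner.delete_forever()


cluster = {"claim-alice", "claim-bob"}
asyncio.run(delete_user(PVCSpawner("claim-alice", cluster)))
# cluster now only contains "claim-bob"
```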

Original PR message

have tested with sqlite-pvc

welcome bot commented Jan 19, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

nsshah1288 (Contributor, Author)

@consideRatio @minrk If either of you have time, could you please review this PR?
The PR related to this has been merged in the jupyterhub repo: jupyterhub/jupyterhub#3337

Thanks!

manics (Member) left a comment


Do you think it's practical to add a test for this new feature, especially as it may not be tested routinely?
There's an existing test which checks that secrets are deleted

# stop the pod
await spawner.stop()
# verify pod is gone
pods = kube_client.list_namespaced_pod(kube_ns).items
pod_names = [p.metadata.name for p in pods]
assert "jupyter-%s" % spawner.user.name not in pod_names
# verify service and secret are gone
# it may take a little while for them to get cleaned up
for i in range(5):
    secrets = kube_client.list_namespaced_secret(kube_ns).items
    secret_names = {s.metadata.name for s in secrets}
    services = kube_client.list_namespaced_service(kube_ns).items
    service_names = {s.metadata.name for s in services}
    if secret_name in secret_names or service_name in service_names:
        time.sleep(1)
    else:
        break
assert secret_name not in secret_names
assert service_name not in service_names

so maybe that structure could be copied?
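A hedged sketch of what that copied structure could look like for PVCs. FakeClient and wait_for_pvc_deleted are illustrative test scaffolding (a real test would use the kube_client fixture and sleep(1) between polls):

```python
import time

class FakeClient:
    """Stand-in for the real Kubernetes client: the PVC 'disappears'
    after a couple of list calls, mimicking asynchronous cleanup."""
    def __init__(self, pvc_name, polls_until_gone=2):
        self.pvc_name = pvc_name
        self.polls = 0
        self.polls_until_gone = polls_until_gone

    def list_pvc_names(self, ns):
        self.polls += 1
        if self.polls > self.polls_until_gone:
            return set()
        return {self.pvc_name}


def wait_for_pvc_deleted(client, ns, pvc_name, attempts=5, delay=0):
    # it may take a little while for the PVC to get cleaned up,
    # so poll a few times like the secret/service test above
    for _ in range(attempts):
        pvc_names = client.list_pvc_names(ns)
        if pvc_name in pvc_names:
            time.sleep(delay)  # a real test would sleep(1)
        else:
            break
    return pvc_name not in pvc_names


assert wait_for_pvc_deleted(FakeClient("claim-user"), "test-ns", "claim-user")
```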

kubespawner/spawner.py (review thread, outdated, resolved)
@nsshah1288

This comment has been minimized.

nsshah1288 (Contributor, Author)

Hi @manics @consideRatio I have fixed the test case. Can you please review it?
I apologize for the delay on this. Thanks!

consideRatio (Member) left a comment


Thank you @nsshah1288 so much for working to make this supported in JupyterHub itself, which was the proper way to implement this in KubeSpawner!

Content-wise, this PR looks good to me as it is!

Could you as a final change also squash/fixup the 36 commits into one or two commits so it becomes less complicated to inspect the history of changes later?

nsshah1288 (Contributor, Author)

Hi @consideRatio, thank you very much for the review!
I apologize, but I couldn't for the life of me figure out how to squash my commits, so I made a new PR here with the same content: #504

If this is not sufficient, please let me know and I am willing to try whatever you need. Thank you!

consideRatio (Member) commented May 4, 2021

I gave it a try as I was eager to learn more about this. It becomes complicated whenever there are multiple merges to various stages of the master.

I tried various things that failed but then I got the following to work as I wanted. Want to give it a try? Below is a sequence of commands you can use.

# Bonus detail:
# this makes your commit be associated with your username on GitHub
# but you could also add the regn.net email to your user account
git config --global user.email "nsshah1288@users.noreply.github.com"

git remote add upstream https://github.com/jupyterhub/kubespawner
git fetch upstream
git reset --hard upstream/master
git merge --squash 0836260 # this is the latest commit in this PR
git commit -m "add delete_forever hook to cleanup user's PVC on user deletion"

# verify it seems right
git log
git diff HEAD~1

# update this branch
git push --force <name of fork remote> feature/shahn3_pvcDeletion

nsshah1288 (Contributor, Author) commented May 5, 2021

Amazing, @consideRatio! Thank you so much.
I followed your steps, and then I had to resolve a merge conflict, so that created a second commit. Is this fine? Thanks again!

@consideRatio consideRatio force-pushed the feature/shahn3_pvcDeletion branch 2 times, most recently from 8d7e1b2 to 7357b7e Compare May 5, 2021 03:43
Comment on lines 2792 to 2857
try:
    self.api.read_namespaced_persistent_volume_claim(
        self.pvc_name, self.namespace
    )
except ApiException as e:
    if e.status == 404:
        self.log.warning(
            "Could not delete %s. This PVC does not exist.", self.pvc_name
        )
    else:
        raise
else:
    self.api.delete_namespaced_persistent_volume_claim(
        self.pvc_name, self.namespace
    )
Member

@minrk do you have any suggested changes to this section?

I'm a bit uncertain about the use of async without any await statement, and about the fact that we don't have any timeouts defined, for example.

For reference, this is the async stop() method:

    async def stop(self, now=False):
        delete_options = client.V1DeleteOptions()

        if now:
            grace_seconds = 0
        else:
            grace_seconds = self.delete_grace_period

        delete_options.grace_period_seconds = grace_seconds

        ref_key = "{}/{}".format(self.namespace, self.pod_name)
        await exponential_backoff(
            partial(
                self._make_delete_pod_request,
                self.pod_name,
                delete_options,
                grace_seconds,
                self.k8s_api_request_timeout,
            ),
            f'Could not delete pod {ref_key}',
            timeout=self.k8s_api_request_retry_timeout,
        )

        try:
            await exponential_backoff(
                lambda: self.pod_reflector.pods.get(ref_key, None) is None,
                'pod %s did not disappear in %s seconds!'
                % (ref_key, self.start_timeout),
                timeout=self.start_timeout,
            )
        except TimeoutError:
            self.log.error(
                "Pod %s did not disappear, restarting pod reflector", ref_key
            )
            self._start_watching_pods(replace=True)
            raise

minrk (Member) May 5, 2021


I'm a bit uncertain about the use of async without any await

This in principle is fine, and correct if you are implementing a generally async API but don't have anything to wait for. In this case, however, all k8s API requests should be async via self.asynchronize, in which case there will be awaits. Following the timeout/conflict/error handling pattern in our other API requests makes sense.

consideRatio (Member) May 5, 2021


Thanks @minrk! Okay let's follow the pattern used already for timeout/conflict/error.

Suggested changes

Thank you for your work on this @nsshah1288! For reliability and performance of this feature it would be good if we can do the following as well:

  • I think we should stop making a GET request for the PVC, and instead just try to delete it straight away, using a 404 response as an indicator that it is already gone.
  • Mimic the pattern for deleting a pod above using exponential_backoff, with the same timeouts.
  • Skip the part about verifying the PVC was removed (an X reflector monitors and caches information about X in the k8s cluster, and we don't have a PVC reflector).
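Putting those points together, here is a minimal sketch of the suggested pattern. The ApiError class and the simple retry loop are toy stand-ins for kubernetes.client.rest.ApiException and jupyterhub.utils.exponential_backoff, and delete_pvc/make_delete_request are hypothetical names, not KubeSpawner API:

```python
import time

class ApiError(Exception):
    """Toy stand-in for kubernetes.client.rest.ApiException."""
    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


def delete_pvc(make_delete_request, pvc_name, retries=3, delay=0.0):
    """Issue the DELETE directly; a 404 means the PVC is already gone."""
    for attempt in range(retries):
        try:
            make_delete_request(pvc_name)
            return True  # deleted
        except ApiError as e:
            if e.status == 404:
                return True  # nothing to delete: that's fine too
            if attempt == retries - 1:
                raise  # give up after the final retry
            time.sleep(delay)
            delay = delay * 2  # exponential backoff (0 keeps this sketch fast)
    return False  # unreachable with retries >= 1
```

The merged implementation follows the exponential_backoff pattern shown in the stop() excerpt above, using the same request timeouts as the pod deletion request.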

nsshah1288 (Contributor, Author)

Hi @consideRatio , thanks to you and @minrk for the suggestions! I tried to implement what you outlined above. I tested the changes again with sqlite-pvc and it worked well. Please let me know if any other changes are required.

nsshah1288 (Contributor, Author)

Hi @consideRatio , do the changes look OK?

consideRatio (Member)

@nsshah1288 thanks for your work on this! 🎉

If the tests pass and the following changes below LGTY I think we could be ready to merge this.

  • I cleaned up the merge commit (reset everything to the current master branch, then manually cherry-picked all but the merge commit and resolved some trivial merge conflicts).

  • By doing the above I --force pushed to this PR's branch, backing up its previous state in https://github.com/consideRatio/kubespawner/tree/feature/shahn3_pvcDeletion-backup

  • I added ea12ea7 because the function didn't have the correct signature matching JupyterHub's Spawner base class

  • I opted not to default to delete_grace_period for the PVC deletion request, because it is explicitly described as being used for pod deletion requests.

        delete_grace_period = Integer(
            1,
            config=True,
            help="""
            Time in seconds for the pod to be in `terminating` state before is forcefully killed.
    
            Increase this if you need more time to execute a `preStop` lifecycle hook.
    
            See https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods for
            more information on how pod termination works.
    
            Defaults to `1`.
            """,
        )

nsshah1288 (Contributor, Author)

Thank you so much for your detailed help on this @consideRatio ! The changes look good to me.

@consideRatio consideRatio requested a review from minrk May 11, 2021 00:19
minrk and others added 2 commits May 12, 2021 10:23
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
to opt-out of new pvc deletion logic

and avoid deleting shared PVCs when deleting named servers
minrk (Member) commented May 12, 2021

I added a condition and clarified a docstring:

  • KubeSpawner.delete_pvc = False allows opt-out of deletion for anyone who wants to keep the older behavior
  • check if a named server's PVC has {servername} in it, because we don't want deleting named servers to delete the default server's PVC
  • The docstring inaccurately said that only the default Spawner is deleted. All Spawners can be deleted, and all of them trigger this delete_forever hook.
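The named-server guard described above can be sketched as a pure function. should_delete_pvc is a hypothetical helper name, not KubeSpawner API; the real logic lives inside the spawner, but the decision it makes is this:

```python
def should_delete_pvc(delete_pvc, servername, pvc_name_template):
    """Decide whether deleting this server should also delete its PVC."""
    if not delete_pvc:
        return False  # KubeSpawner.delete_pvc = False opts out entirely
    if servername and "{servername}" not in pvc_name_template:
        # a named server whose PVC template lacks {servername} shares
        # the default server's PVC, so leave that PVC alone
        return False
    return True


# default server: its own PVC may be deleted
assert should_delete_pvc(True, "", "claim-{username}")
# named server sharing the default server's PVC: keep it
assert not should_delete_pvc(True, "analysis", "claim-{username}")
```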

@consideRatio consideRatio merged commit fdad3d4 into jupyterhub:master May 12, 2021
welcome bot commented May 12, 2021

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

"No pvc %s to delete. Assuming already deleted.",
pvc_name,
)
# If there isn't a PVC to delete, that's ok too!
Contributor

I've got a question about this and the warning a couple of lines above. In our deployment we're currently using kubespawner 0.15 and jupyterhub 1.3, and I'm looking to upgrade to z2jh 1.2.0, jupyterhub 1.5.0, and kubespawner 1.1.2. We use a single PVC backed by cloud object storage, so all of the singleuser-server pods share the same PVC. We leave the default of storage_pvc_ensure=False, which from looking at the code means kubespawner won't actually create a PVC named self.pvc_name in _start. So shouldn't delete_forever also be checking storage_pvc_ensure? Maybe not, if it's assumed that someone could pre-create the PVC using the same pvc_name_template; in that case we just try to delete the PVC, and if it doesn't exist we get a 404 and log the warning.

I guess in our case the point is that I don't want to see the warnings when users get culled (which happens after 5 days of inactivity for us), so to avoid that we just need to set delete_pvc=False.
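For reference, the opt-out mentioned above is a one-line config change (a sketch of a jupyterhub_config.py fragment; with z2jh this would go under hub.config or extraConfig):

```python
# jupyterhub_config.py
# keep user PVCs when users are deleted; restores the pre-1.0 behavior
c.KubeSpawner.delete_pvc = False
```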

consideRatio (Member) Nov 8, 2021

I extracted this to #543 to avoid losing track of this comment! Please follow up there rather than here, so we don't disperse the discussion across two places.


Successfully merging this pull request may close these issues.

Deleting named servers should delete the corresponding PVC
6 participants