Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow configurable child shutdown timeouts #565

Merged
merged 12 commits into from
Feb 22, 2022

Conversation

raddessi
Copy link
Contributor

@raddessi raddessi commented Feb 1, 2022

Thank you for the work done in #393! It did fix some odd issues but brought to light one more. We have a few tests that spin up docker containers and then clean them up after the pytest session exits but these take upwards of 10-15 seconds to fully clean up. I think that making the timeout configurable may be acceptable although I'm not sure if this is the best approach. Let me know what you think.

The goal of this change is to allow you to set the timeout on the CLI, in the noxfile.py, or individually on each session.run invocation. If this is acceptable I'll work on updating docs and tests. Thanks!

Copy link
Contributor Author

@raddessi raddessi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions

nox/_options.py Outdated
group=options.groups["execution"],
help=(
"The number of seconds to wait for children to shut down cleanly after "
"receiving a SIGTERM signal before sending them a SIGKILL signal."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I include the default value in the help here?

nox/_options.py Outdated
Comment on lines 153 to 159
try:
return float(command_args.child_shutdown_timeout)
except (ValueError, TypeError):
try:
return float(noxfile_args.child_shutdown_timeout)
except (ValueError, TypeError):
return None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better method to validate/cast object types?

nox/popen.py Outdated
with contextlib.suppress(subprocess.TimeoutExpired):
return proc.communicate(timeout=0.3)
return proc.communicate(timeout=child_timeout or 0.3)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this timeout (the first one) be the one controlled by the variable or should it be the second? I'm on the fence.

Copy link
Collaborator

@cjolowicz cjolowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting together this PR!

Generally, it's preferred to open an issue first. This gives us an opportunity to talk about the problem you're trying to solve, and investigate different approaches before getting into implementation details of the specific approach you favor.

I have a few questions:

  • Have you considered using pytest fixtures for your Docker containers? pytest should provide all the flexibility you need, and it's a more general solution as it does not require Nox. Also, have you considered modifying the Docker containers to improve their shutdown time? While you can always argue for more configurability, every little knob we add increases maintainance effort permanently, so I'd like to first consider alternative solutions.

  • Have you considered passing the timeout as an argument to session.run and friends? This is a more fine-grained approach. The shutdown timeout is specified at the point where the expensive process is spawned. That also makes this approach clearer in my mind. Finally, it means that you don't need to pass the timeout with every Nox invocation, but specify it once in the Noxfile. Edit: Just saw that your PR already allows passing the timeout to session.run.

  • The option help says that it's for the delay between SIGTERM and SIGKILL. But the implementation is for the delay between SIGINT and SIGTERM. Possible confusion?

  • There are two timeouts involved in the existing shutdown process, the delay before SIGTERM and the delay before SIGKILL. In tox 3, both are configurable, and have names that make their purpose clear (interrupt_timeout and terminate_timeout). You chose the generic name shutdown_timeout, and wired that to the interrupt timeout. Why did you select this one of the two timeouts, and how should we handle future feature requests for configuring the other timeout?

I'm not 100% against making this configurable, but it would be good to talk these points over.

@raddessi
Copy link
Contributor Author

raddessi commented Feb 2, 2022

Thanks for the feedback!

Generally, it's preferred to open an issue first.

I'm sorry, I would if I had known. I don't think there is a PR template on this project is there? I'll maybe open an issue about adding one of those with some guidelines then.

Have you considered using pytest fixtures for your Docker containers?

Yes, actually that is what is going on :) Pytest spins up and down containers, but when it tries to shut them down after receiving SIGINT noxs end up SIGKILLing pytest and we get containers left around on out CI hosts. This has become apparent after I switched the project in question over to a matrix strategy build on github so after the first test failure a SIGINT gets sent to the other running matrix tests as I'm sure you know.

While you can always argue for more configurability, every little knob we add increases maintainance effort permanently, so I'd like to first consider alternative solutions.

I 100% understand and agree. If there is a workaround I'm willing to implement it, but I think this will probably effect more than a few other users of nox unless I happen to be doing things in a strange way (which I hope I'm not)

Also, have you considered modifying the Docker containers to improve their shutdown time?

I'm not really sure what all I could speed up, I think it's just a problem of combined IO load on the CI hosts at some points. It can be as fast as 1-2 seconds to clean it up sometimes but it's always longer than 0.3 seconds. Docker-compose with databases always seems a little sluggish to me on cleanup for some reason.

The option help says that it's for the delay between SIGTERM and SIGKILL. But the implementation is for the delay between SIGINT and SIGTERM. Possible confusion?

Oh you're totally right, I think in this case the configurable timeout should be moved to the second one? I was reading the code wrong.

There are two timeouts involved in the existing shutdown process, the delay before SIGTERM and the delay before SIGKILL. In tox 3, both are configurable, and have names that make their purpose clear (interrupt_timeout and terminate_timeout). You chose the generic name shutdown_timeout, and wired that to the interrupt timeout. Why did you select this one of the two timeouts, and how should we handle future feature requests for configuring the other timeout?

Good to know about tox, I should have looked there for naming standards to follow. In this case because tox has the functionality already then I would personally want to match tox's naming and probably provide both as options to the user. If you would only want one, this should probably be named terminate_timeout and have control over the timeout before the SIGKILL do you agree?


I should have set this to draft mode to begin with, I didn't mean for you to accept this without a discussion :) Thanks again

@raddessi
Copy link
Contributor Author

raddessi commented Feb 2, 2022

I do see you have a CONTRIBUTING file however, I apologize I should have looked for one.

@raddessi
Copy link
Contributor Author

raddessi commented Feb 2, 2022

Should I start a related Issue for this?

@raddessi raddessi marked this pull request as draft February 2, 2022 17:29
@cjolowicz
Copy link
Collaborator

cjolowicz commented Feb 3, 2022

Thanks for explaining your use case further. And no worries about opening an issue now, I think we can take it from here.

What are your thoughts about adding parameters interrupt_timeout and terminate_timeout to session.runpopen, and omitting the CLI options? I feel that the command-line interface isn't achieving much here, in addition to the parameters.

We'll also need docs and a test. For the docs, you can update the docstrings of session.run and session.run_always.

@cjolowicz
Copy link
Collaborator

I don't think there is a PR template on this project is there? I'll maybe open an issue about adding one of those with some guidelines then.

I agree, that would be helpful.

@raddessi
Copy link
Contributor Author

raddessi commented Feb 3, 2022

What are your thoughts about adding parameters interrupt_timeout and terminate_timeout to session.runpopen, and omitting the CLI options? I feel that the command-line interface isn't achieving much here, in addition to the parameters.

That actually makes a lot more sense, CLi options aren't probably needed for this and having separate variables that match tox naming also makes sense. Should a default also be read from the config options so it could be set in the noxfile as nox.options.<option_name> or should it only be allowed as a kwarg to session.run that then gets passed to popen?

We'll also need docs and a test. For the docs, you can update the docstrings of session.run and session.run_always.

For sure! Thank you for keeping the test coverage high, nox has been incredibly stable for me.

@cjolowicz
Copy link
Collaborator

Should a default also be read from the config options so it could be set in the noxfile as nox.options.<option_name> or should it only be allowed as a kwarg to session.run that then gets passed to popen?

IMO it should be passed explicitly as a keyword argument. I don't think we need a configurable global default, and nox.options wouldn't be a good place for that as its attributes should mirror command-line options.

@raddessi
Copy link
Contributor Author

raddessi commented Feb 9, 2022

Still need to check test coverage and update docs

nox/popen.py Outdated
Comment on lines 26 to 27
interrupt_timeout: float,
terminate_timeout: float,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's allow waiting without timeout as well.

Suggested change
interrupt_timeout: float,
terminate_timeout: float,
interrupt_timeout: float | None,
terminate_timeout: float | None,

nox/popen.py Outdated
Comment on lines 66 to 67
interrupt_timeout: float = 0.3,
terminate_timeout: float = 0.2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Suggested change
interrupt_timeout: float = 0.3,
terminate_timeout: float = 0.2,
interrupt_timeout: float | None = 0.3,
terminate_timeout: float | None = 0.2,

"terminate_timeout_expected",
),
[
(None, None, 0.3, 0.2),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If None is passed, None is the expected value (see above).

For the defaults, pass a different sentinel instead of None and check for that further down. For example, the string "default" would work.

@raddessi raddessi marked this pull request as ready for review February 21, 2022 18:43
@raddessi
Copy link
Contributor Author

Sorry for the delay

Copy link
Collaborator

@cjolowicz cjolowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This still needs to be documented. Other than that, looking great.

For the docs, you can update the docstrings of session.run and session.run_always.

@raddessi
Copy link
Contributor Author

Nice! Yep, will do. Thanks!

Copy link
Collaborator

@cjolowicz cjolowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

@cjolowicz cjolowicz merged commit bcee2dc into wntrblm:main Feb 22, 2022
@cjolowicz
Copy link
Collaborator

FTR, a small noxfile.py to test both timeouts:

"""
Usage: nox -- [interrupt_timeout] [terminate_timeout]
Examples:

    $ nox -- 1
    $ nox -- 1 2
    $ nox -- None None

"""
import ast
import nox

bash_program = """
trap '' SIGINT SIGTERM
sleep inf
"""

@nox.session
def test(session):
    timeouts = dict(
        zip(
            ["interrupt_timeout", "terminate_timeout"],
            map(ast.literal_eval, session.posargs),
        )
    )
    session.run("bash", "-c", bash_program, external=True, **timeouts)

@raddessi raddessi deleted the main.configurable-shutdown-timeouts branch February 22, 2022 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants