systemd: improve housekeeping drain message
Problem: systemd sometimes doesn't set $EXIT_CODE or $EXIT_STATUS,
so nodes that fail housekeeping are drained with the message
"housekeeping code= status=".

Modify the unit file to generate a message that
- includes the jobid
- drops the code/status labels and just prints the values, if set
- includes the $SERVICE_RESULT, or "failure" if unset

Examples:

When the housekeeping script exits with a nonzero code:
  housekeeping@f2PsCLp3gf9 exit-code: exited 1

When flux housekeeping kill is used:
  housekeeping@f2PxL1tmvUX signal: killed TERM

If no env vars are available:
  housekeeping@fuzzybunny failure

Fixes flux-framework#6176
garlick committed Oct 1, 2024
1 parent f7793b5 commit 9d741a4
Showing 1 changed file with 4 additions and 7 deletions.
11 changes: 4 additions & 7 deletions etc/flux-housekeeping@.service.in
@@ -9,13 +9,10 @@ ExecStart=@X_SYSCONFDIR@/flux/system/housekeeping
 ExecStopPost=-rm -f @X_RUNSTATEDIR@/flux-housekeeping@%I.env
 ExecStopPost=-sh -c '\
     if test "$SERVICE_RESULT" != "success"; then \
-        if test "$EXIT_CODE" = "killed" -o "$EXIT_CODE" = "dumped"; then \
-            message="killed by SIG${EXIT_STATUS}"; \
-        elif test "$EXIT_CODE" = "exited"; then \
-            message="exited with exit code $EXIT_CODE"; \
-        else \
-            message="code=$EXIT_CODE status=$EXIT_STATUS"; \
+        message="housekeeping@%I ${SERVICE_RESULT:-failure}"; \
+        if test "${EXIT_CODE}${EXIT_STATUS}"; then \
+            message="$message: $EXIT_CODE $EXIT_STATUS"; \
         fi; \
-        flux resource drain $(flux getattr rank) "housekeeping $message"; \
+        flux resource drain $(flux getattr rank) $message; \
     fi \
 '
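
The new message logic can be tried outside systemd. The sketch below is not part of the commit: `drain_message` is a hypothetical helper whose four arguments stand in for `%I`, `$SERVICE_RESULT`, `$EXIT_CODE` and `$EXIT_STATUS`, and it only prints the message instead of calling `flux resource drain`. The sample values are taken from the examples in the commit message.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): mirrors the message construction
# in the ExecStopPost snippet. Arguments stand in for the values systemd
# would normally provide:
#   $1 = instance name (%I)     $2 = SERVICE_RESULT
#   $3 = EXIT_CODE              $4 = EXIT_STATUS
drain_message() {
    instance=$1
    service_result=$2
    exit_code=$3
    exit_status=$4

    # Fall back to "failure" when SERVICE_RESULT is unset or empty.
    message="housekeeping@${instance} ${service_result:-failure}"

    # Append the raw code/status values only if at least one is set.
    if test -n "${exit_code}${exit_status}"; then
        message="$message: $exit_code $exit_status"
    fi

    printf '%s\n' "$message"
}

# The three cases from the commit message.
drain_message f2PsCLp3gf9 exit-code exited 1
drain_message f2PxL1tmvUX signal killed TERM
drain_message fuzzybunny "" "" ""
```

Passing empty strings for the last three arguments simulates the case where systemd sets none of the variables, which is the situation that previously produced "housekeeping code= status=".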
