job-exec.info logs contain garbage #6366

Open
garlick opened this issue Oct 11, 2024 · 1 comment · May be fixed by #6368

Comments

@garlick
Member

garlick commented Oct 11, 2024

Problem: on 0.66, we appear to be logging garbage to job-exec.info. Some snippets:

Oct 11 14:27:25 elcap1 flux[28728]: job-exec.info[0]: elcapN (rank M): imp kill: flux-imp: Fatal: kill: failed to initialize pid info: No such file or directory
Oct 11 14:27:25 elcap1 flux[28728]: job-exec.info[0]: }
Oct 11 14:27:30 elcap1 flux[28728]: job-manager.err[0]: fBzYmV7oGnb: epilog: elcapN (rank M): No route to host

Oct 11 12:47:14 elcap1 flux[28728]: job-exec.info[0]: elcapN (rank M): imp kill: flux-imp: Fatal: kill: failed to initialize pid info: No such file or directory
Oct 11 12:47:14 elcap1 flux[28728]: job-exec.info[0]: `
Oct 11 12:54:10 elcap1 flux[28728]: job-exec.info[0]: elcapN (rank M): imp kill: flux-imp: Fatal: kill: failed to initialize pid info: No such file or directory
Oct 11 12:54:10 elcap1 flux[28728]: job-exec.info[0]: }

Sep 25 19:56:19 elcap1 flux[28728]: job-exec.info[0]: elcapN (rank M): imp kill: flux-imp: Fatal: kill: failed to initialize pid info: No such file or directory
Sep 25 19:56:19 elcap1 flux[28728]: [21B blob data]
Sep 25 19:56:19 elcap1 flux[28728]: job-exec.info[0]: elcapN (rank M): imp kill: flux-imp: Fatal: kill: failed to initialize pid info: No such file or directory
Sep 25 19:56:19 elcap1 flux[28728]: [21B blob data]

This seems familiar, and I thought we might have fixed it in the latest release, but so far I haven't found any evidence of a fix.

@garlick
Member Author

garlick commented Oct 11, 2024

With journalctl -a, the [21B blob data] comes out as:

Sep 25 19:56:19 elcap1 flux[28728]: job-exec.info[0]: %kÒÿù^?
Sep 25 19:56:19 elcap1 flux[28728]: job-exec.info[0]: h^AÂpý^?

Eek!

garlick added a commit to garlick/flux-core that referenced this issue Oct 12, 2024
Problem: garbage appears in the logs when the bulkexec imp-kill
generates output.

Since bulk-exec always uses unbuffered reads for performance, the
output buffers returned by flux_subprocess_read() are not guaranteed
to be NUL-terminated.

Use "%.*s" instead of "%s" in the log format string for:
- the imp kill on_output() handler
- the fallback in case a user doesn't define an on_output() handler

Fixes flux-framework#6366
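
For illustration, the "%.*s" approach described in the commit message can be sketched in plain C. This is a minimal standalone sketch, not the flux-core code: format_output_line() and its parameters are hypothetical names, and snprintf() stands in for the logging call. It assumes the caller has a buffer pointer and a byte count, as an unbuffered read would provide.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of flux-core): format one log line from
 * a buffer that is NOT guaranteed to be NUL-terminated.
 *
 * With "%s", printf-style functions keep reading until they find a NUL
 * byte, which may lie past the end of the data and yield garbage.
 * With "%.*s", the int precision argument caps the number of bytes
 * read, so nothing beyond buf[len - 1] is touched.
 */
int format_output_line (char *dst, size_t dstsz,
                        const char *prefix,
                        const char *buf, int len)
{
    assert (dst != NULL && prefix != NULL && buf != NULL);
    if (len < 0)    /* a negative precision would mean "no limit" */
        len = 0;
    return snprintf (dst, dstsz, "%s: %.*s", prefix, len, buf);
}
```

Calling format_output_line (line, sizeof (line), "imp kill", buf, len) with the length reported by the read copies only len bytes of buf into the line, whether or not a NUL byte follows them.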
@garlick garlick linked a pull request Oct 12, 2024 that will close this issue