Exception: "failed to update jobspec with bank name" on Tuolumne #498

Open
jameshcorbett opened this issue Oct 7, 2024 · 4 comments

@jameshcorbett
Member

[devcich1@tuolumne1001:MY_TEST_DIR]$ flux job info fBKZ2XwhCX9 eventlog | grep -i fail
{"timestamp":1728333451.1065736,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}
@jameshcorbett
Member Author

They seem to be happening regularly:

flux job info fBKZNkS3D2P eventlog
{"timestamp":1728333175.2028327,"name":"submit","context":{"userid":54987,"urgency":16,"flags":0,"version":1}}
{"timestamp":1728333175.2335021,"name":"validate"}
{"timestamp":1728333175.2560935,"name":"dependency-add","context":{"description":"dws-create"}}
{"timestamp":1728333175.8794999,"name":"memo","context":{"rabbit_workflow":"fluxjob-76653206309962752"}}
{"timestamp":1728333179.8968751,"name":"dependency-remove","context":{"description":"dws-create"}}
{"timestamp":1728333179.8969295,"name":"depend"}
{"timestamp":1728333179.897059,"name":"priority","context":{"priority":16}}
{"timestamp":1728333317.7042291,"name":"alloc","context":{"annotations":{"user":{"rabbit_workflow":"fluxjob-76653206309962752"}}}}
{"timestamp":1728333317.7044308,"name":"prolog-start","context":{"description":"job-manager.prolog"}}
{"timestamp":1728333317.704457,"name":"prolog-start","context":{"description":"cray-pals-port-distributor"}}
{"timestamp":1728333317.7044675,"name":"prolog-start","context":{"description":"dws-setup"}}
{"timestamp":1728333317.7085178,"name":"prolog-finish","context":{"description":"cray-pals-port-distributor","status":0}}
{"timestamp":1728333317.7138379,"name":"memo","context":{"rabbits":"tuolumne267"}}
{"timestamp":1728333399.8657088,"name":"dws_environment","context":{"variables":{"DW_JOB_ioioio":"/mnt/nnf/f52b9826-5db6-40c4-9f18-272437d6f807-0","DW_WORKFLOW_NAME":"fluxjob-76653206309962752","DW_WORKFLOW_NAMESPACE":"default"},"rabbits":{"tuolumne267":"tuolumne[2057-2072]"},"copy_offload":false}}
{"timestamp":1728333399.8658042,"name":"prolog-finish","context":{"description":"dws-setup","status":0}}
{"timestamp":1728333400.2696013,"name":"prolog-finish","context":{"description":"job-manager.prolog","status":0}}
{"timestamp":1728333400.310169,"name":"start"}
{"timestamp":1728333400.7326884,"name":"finish","context":{"status":0}}
{"timestamp":1728333400.7330658,"name":"epilog-start","context":{"description":"job-manager.epilog"}}
{"timestamp":1728333400.7331221,"name":"epilog-start","context":{"description":"dws-epilog"}}
{"timestamp":1728333400.7672787,"name":"release","context":{"ranks":"all","final":true}}
{"timestamp":1728333400.925935,"name":"epilog-finish","context":{"description":"job-manager.epilog","status":0}}
{"timestamp":1728333451.1015525,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}
{"timestamp":1728333451.1015964,"name":"jobspec-update","context":{"attributes.system.bank":"DNE"}}
{"timestamp":1728333451.1016669,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"job.update: bank info is missing","userid":767}}
{"timestamp":1728333454.9056153,"name":"epilog-finish","context":{"description":"dws-epilog","status":0}}
{"timestamp":1728333454.9060705,"name":"free"}
{"timestamp":1728333454.9061046,"name":"clean"}

@jameshcorbett
Member Author

Sounds like this has been resolved offline, @cmoussa1? If so, feel free to close.

@grondo
Contributor

grondo commented Oct 8, 2024

We should probably open a separate issue (?) on the mf_priority plugin trying to update the jobspec for jobs that are past the SCHED state.
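
One way to spot affected jobs is to scan an eventlog for plugin activity logged after the job has been allocated. This is a rough sketch, not part of flux-accounting: it assumes the `alloc` event marks the job leaving the SCHED state (an assumption about the Flux job state machine, not something stated in this thread), and the helper name `late_plugin_events` is hypothetical. The sample lines are copied from the eventlog earlier in this issue.

```python
import json

def late_plugin_events(eventlog_text):
    """Return (name, detail) for jobspec-update events and mf_priority
    exceptions that appear after the 'alloc' event in an eventlog."""
    allocated = False
    late = []
    for line in eventlog_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        name = event.get("name")
        context = event.get("context", {})
        if name == "alloc":
            # Assumption: from here on the job is past the SCHED state.
            allocated = True
            continue
        if not allocated:
            continue
        if name == "jobspec-update":
            late.append((name, context))
        elif name == "exception" and context.get("type") == "mf_priority":
            late.append((name, context.get("note")))
    return late

# Sample events copied from the eventlog in this issue.
SAMPLE = """\
{"timestamp":1728333179.897059,"name":"priority","context":{"priority":16}}
{"timestamp":1728333317.7042291,"name":"alloc","context":{"annotations":{"user":{"rabbit_workflow":"fluxjob-76653206309962752"}}}}
{"timestamp":1728333400.7326884,"name":"finish","context":{"status":0}}
{"timestamp":1728333451.1015525,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"failed to update jobspec with bank name","userid":767}}
{"timestamp":1728333451.1015964,"name":"jobspec-update","context":{"attributes.system.bank":"DNE"}}
{"timestamp":1728333451.1016669,"name":"exception","context":{"type":"mf_priority","severity":0,"note":"job.update: bank info is missing","userid":767}}
"""

for name, detail in late_plugin_events(SAMPLE):
    print(name, detail)
```

For the job above, all three plugin-related events land after `finish`, while the job is still waiting on the `dws-epilog`.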

@cmoussa1
Member

cmoussa1 commented Oct 8, 2024

Yup, I was planning on seeing if I could reproduce this behavior in a controlled environment today, so I'll leave this one open (or can just open a separate issue).
