Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cancel_job REST API #340

Merged
merged 2 commits into from
Oct 12, 2022
Merged

Add cancel_job REST API #340

merged 2 commits into from
Oct 12, 2022

Conversation

tfeda
Copy link
Contributor

@tfeda tfeda commented Oct 11, 2022

Hi all! Hope to get more involved with the project over the next few weeks. Thought id start with some easier items.

Which issue does this PR close?

part of #265

Rationale for this change

What changes are included in this PR?

Moved grpc cancellation logic to SchedulerState so that it could be reused by the REST interface. Kept error handling behavior the same, but added a 404 return when canceling a job that doesn't exist.

Are there any user-facing changes?

For any long running tasks, users can execute a PATCH request on the job. for example:

curl -X PATCH -v -H 'Accept: application/json' http://localhost:50050/api/job/25UPl4w | jq

NOTE

When testing out this addition, I found that cancelling the job causes a few errors:

  1. The scheduler always fails its grpc call to the executor to cancel_tasks(), and returns with an Unipmlemented code. From looking at the executor logs, it doesn't seem to even reach them.

  2. The scheduler goes into an infinite loop

image

I think the bugs are unrelated. I'm happy to help debug, but I thought I'd post what I had since the changes are useful for debugging with curl

@andygrove
Copy link
Member

This is looking great. Thanks @tfeda. Could you also update the REST API documentation at docs/source/user-guide/scheduler.md ?

I also see that infinite loop issue when jobs fail. I just filed #341 to track this.

@andygrove andygrove merged commit 5ac161c into apache:master Oct 12, 2022
@mingmwang
Copy link
Contributor

Ah, this is a known issue, actually I added the failed job check in the pop_next_task() loop.
Without such check, the scheduler loop will try to schedule the pending tasks from failed job which would be worse !!

@yahoNanJing
Please take a look and have a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants