Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Handle commit hash on incoming webhooks and send commit status on all builds #10320

Closed

Conversation

benjaoming
Copy link
Contributor

@benjaoming benjaoming commented May 15, 2023

Fixes #7084

Notes for reviewers

I hope you're okay about starting the review a bit high-level:

  • I wrote my notes for the implementation, starting here: Provide Github (etc) commit status for builds on branches / tags #7084 (comment)

  • Please make sure to read this comment about issues that I suggest to handle separately Build: Handle commit hash on incoming webhooks and send commit status on all builds #10320 (comment)

  • The implementation should maintain fallback support for previous behavior, i.e. "no commit ID supplied": If there for some reason exists a scenario where the commit ID isn't provided in the webhook event (tag pushes or we don't detect it correctly?), then we will just log the event on warning level but continue as before. I prefer that we go this way, since it's hard to guarantee that we have understood all potential events that are pushed to the webhook.

  • Noting that the risks of out-of-order webhook events and builds are very low and already exist to some extend, I prefer an approach where we will discuss and solve this separately.

Work log

  • Pre-analysis to figure out the correct solution
  • Implementation and local Q&A with GitHub OAuth App
  • Updates to tests w/ some new scenarios
  • Understand a bit more potential cases we need to investigate.
    • Out of order webhook events
    • Force pushes and deleted commits
  • Understand if auto-cancel feature works on existing PR push events where commit IDs were already used and how/if that worked.

image

image

Test with a build failure:

image

Test with build success:

image

@benjaoming
Copy link
Contributor Author

benjaoming commented May 16, 2023

Some (not so) new theoretical issues

All of these are theoretically possible, but they require that either webhook events are received out of order or concurrent builds finish out of order.

Update: Looking into how current PR builds work, they would have had the same issues with out-of-order events as shown below.

Out-of-order webhook events

Our current build process in doc_builder.director will checkout the commit that's stored on the build, and now we have started setting commit ids on builds received from webhook events so this behavior will be the new default. It's necessary because it's the same commit that we are going to send a status back to, and we need to send a "pending" status already before the build is triggered.

Race conditions in builds

If we start several builds, an older process can succeed in checking out an older commit and finish before a newer build. This is mostly averted by correctly canceling old builds (see next section).

Canceling old builds

In #9549, we started to cancel currently running builds when a new build is received based on the RTD version. That's okay since these builds weren't sensitive to the commit they were building - they were unaware, so any new build process would just checkout the latest branch/tag identifier.

The previous approach probably only had one shortcoming: Assuming that concurrent builds won't end up in race conditions. So the new behavior isn't too different, it's just a bit more sensitive to webhook event order or in case our event queue processes events out of order.

Force pushes and deleted commits

If a commit disappears between the time that a webhook event is received and a build is launched, the build process should fail since the commit doesn't exist. That's fine. It will just deliver an invalid status and fail.

We could anticipate that a new webhook event would show up. It ideally won't be canceled since it should arrive after the previously failed webhook event.

Potential solutions

There are two solutions that I think are important to address the above. I think they can both be solved separately from this feature, since they are already well-known issues, and to some degree also theoretically possible in our current logic.

Feature 1: Discard webhook push events based on timestamp

Most webhook events have timestamps from the Git provider so we don't need to look at commit history. Instead, we can store the timestamp, check if we have processed a newer event and discard the event in that case.

Feature 2: Auto-cancel current builds on the same version: Use commit hash, too

When we auto-cancel currently running builds, we should check if the commit hashes of all active builds are part of the new build's commit history.

Currently, active builds are canceled in the prepare_build task.

Instead, a build process could potentially receive the commit ids of all concurrently running builds with the same version so it can emit a cancel request if they are included in the new build's git history. This is more accurate.

@benjaoming benjaoming marked this pull request as ready for review May 18, 2023 13:04
@benjaoming benjaoming requested a review from a team as a code owner May 18, 2023 13:04
@benjaoming benjaoming requested a review from stsewd May 18, 2023 13:04
@benjaoming benjaoming requested a review from humitos May 18, 2023 13:04
@humitos
Copy link
Member

humitos commented May 22, 2023

I just saw that I was requested a review on this PR and tried to quickly review it. However, I saw it touches multiple parts of the build process to pass the commit around and there are some potential issues and solutions written down. I'm a little concern regarding the amount of code we are adding here together with these potential issues compared with the value this feature adds to the platform at this moment. I think it's a "nice to have" at this point and I find it hard to prioritize currently.

@benjaoming
Copy link
Contributor Author

@humitos there's a lot of work that went into preparing this change. I'll be happy to go over it with you on our call tomorrow, if we can extend it a bit. But please read the PR description and related comments.

I'm especially interested in going over this part with you:

Understand if auto-cancel feature works on existing PR push events where commit IDs were already used and how/if that worked.

@benjaoming benjaoming changed the title Handle commit hash on incoming webhooks and send commit status on all builds Build: Handle commit hash on incoming webhooks and send commit status on all builds May 23, 2023
@benjaoming
Copy link
Contributor Author

Understand if auto-cancel feature works on existing PR push events where commit IDs were already used and how/if that worked.

I looked into this now, and we should have exactly the same functionality preserved, including the out-of-order issues :)

Auto-canceling will continue to work pr version. So if a new push event adds a commit to the main branch, then any existing builds for latest will be cancelled. This seems fine since the result of the newer commit will always overwrite (upload) the latest version, so there's no need to finish the running build.

@benjaoming benjaoming added the Needed: design decision A core team decision is required label May 25, 2023
@benjaoming
Copy link
Contributor Author

While discussing this, I think we came to a conclusion that we need some more factors to prioritize this change:

  • Reasons to track the commit id on builds (perhaps integrity/correctness)
  • Additional features that need the commit id

@benjaoming
Copy link
Contributor Author

Reasons to track commit ids for builds

I'll try to gather ideas that can help prioritize this change here.

  • We can simplify logic that previously was written with if not commit. Here is an example:

    # NOTE: why we wouldn't have `self.data.build_commit` here?
    # This attribute is set when we get it after cloning the repository
    #
    # Oh, I think this is to differentiate a task triggered with
    # `Build.commit` than a one triggered just with the `Version` to build
    # the _latest_ commit of it

@humitos
Copy link
Member

humitos commented Aug 9, 2023

This idea is great, but considering the amount of work required here and the complexity added to the code base, I think we won't be able to make it for now. I'm closing this PR, but we can recover this work once we can prioritize it.

@humitos humitos closed this Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needed: design decision A core team decision is required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide Github (etc) commit status for builds on branches / tags
2 participants