
Scrub response errors #320

Merged — merged 12 commits into main from feat/scrub-response-errors, Dec 17, 2021

Conversation

@hannahhoward (Collaborator) commented Dec 15, 2021

Goals

resolve #314

Implementation

Oy, so, this is tricky.

  • When a message errors, we want to go through all the responses contained in that message, then look at all the pending messages and clear out any record of those responses too
    • this is fairly straightforward -- see the new method on the builder, ScrubResponse
    • and the method scrubResponses on the messageQueue, which goes through all the pending messages and calls ScrubResponse
  • But it gets worse. We want to prevent any additional messages from getting queued for this request.
    • this is the hard part -- we essentially want to set a block that prevents any more messages from getting queued
    • right now, we deal with this by allowing you to attach an io.Closer to the message builder -- should the message fail to send, all registered io.Closers close
    • at the moment, the closer itself is set up at the level of the responseAssembler, which adds a layer of indirection by creating newStreams
    • the concept of a stream opens us up to a really cool improvement: removing the need to add notifications manually all over the response code
    • I also refactored all the message notifications to remove one of the single most annoying and confusing concepts in graphsync: the pubsub notification concept of the "TopicDataSubscriber". Essentially, it allowed you to register a subscriber on topic A that would rebroadcast on a set of topics of your choosing, remapping the "Topic" parameter to some set of data. This made our subscribers extremely confusing, because they would get several notifications for each message -- one for each block plus one for the final status codes. Now, the subscriber gets one notification per message with all the relevant data, does what it needs to, and moves on. Major improvement.
    • we assemble all the relevant metadata for a notification directly in the builder, but, importantly, distinguish between the graphsync message builder, which just builds graphsync messages, and the message queue builder, which lets us attach metadata to a message. This avoids adding a lot of extraneous methods to the message-building code.
    • This does mean the PR is slightly larger than originally intended, but I think it's overall a big conceptual reduction in complexity, aside from the stream closing.
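The scrubbing described above can be sketched roughly as follows. This is a simplified model, not the actual go-graphsync code: `Builder`, `MessageQueue`, and the per-request response map are all hypothetical stand-ins for the real types; only the shape of "scrub one request out of every pending builder" is taken from the PR description.

```go
package main

import "fmt"

// RequestID identifies a graphsync request (hypothetical stand-in type).
type RequestID int

// Builder is a stand-in for the message-queue builder: it holds queued
// per-request content for one pending outgoing message.
type Builder struct {
	responses map[RequestID][]string // hypothetical payload per request
}

// ScrubResponses drops all queued content for the given request IDs,
// mirroring the new ScrubResponse method on the builder.
func (b *Builder) ScrubResponses(ids []RequestID) {
	for _, id := range ids {
		delete(b.responses, id)
	}
}

// MessageQueue holds the pending builders; scrubResponses walks all of
// them, like the scrubResponses method on the messageQueue.
type MessageQueue struct {
	builders []*Builder
}

func (mq *MessageQueue) scrubResponses(ids []RequestID) {
	for _, b := range mq.builders {
		b.ScrubResponses(ids)
	}
}

func main() {
	b := &Builder{responses: map[RequestID][]string{1: {"blk"}, 2: {"blk"}}}
	mq := &MessageQueue{builders: []*Builder{b}}
	mq.scrubResponses([]RequestID{1})
	fmt.Println(len(b.responses)) // request 1 scrubbed, request 2 remains
}
```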

@hannahhoward hannahhoward marked this pull request as draft December 15, 2021 19:46
impl/graphsync_test.go — review comment (outdated, resolved)
@hannahhoward hannahhoward marked this pull request as ready for review December 16, 2021 02:02
Comment on lines +122 to +131
_, isPaused := err.(hooks.ErrPaused)
if isPaused {
return err
}

if err == ErrNetworkError || ipldutil.IsContextCancelErr(err) {
rt.ResponseStream.ClearRequest()
return err
}

Member

What are the implications of moving these outside of a "transaction"? There's a similar change in responsemanager/server.go that moves the initial error & state checking outside of the Transaction() call. Are we losing anything by doing this?

Collaborator Author

not really. The whole idea of Transaction is to make sure a series of operations go out in the same libp2p message over the wire (for example, a block + a pause message that got generated running hooks on the block). ClearRequest doesn't actually send anything over the wire. I have no idea why it was in transaction in the first place :)
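The point about Transaction can be sketched with a toy model, under the simplifying assumption that a "message" is just the batch of operations queued during one callback. `ResponseStream`, `Operation`, and `Queue` here are hypothetical stand-ins, not the real go-graphsync API:

```go
package main

import "fmt"

// Operation is a queued action destined for an outgoing libp2p message.
type Operation string

// ResponseStream batches operations. Transaction guarantees that
// everything queued inside the callback goes out in the same message.
type ResponseStream struct {
	pending []Operation
	sent    [][]Operation // each element models one wire message
}

func (rs *ResponseStream) Queue(op Operation) {
	rs.pending = append(rs.pending, op)
}

// Transaction runs fn, then flushes whatever fn queued as one message.
func (rs *ResponseStream) Transaction(fn func()) {
	fn()
	rs.sent = append(rs.sent, rs.pending)
	rs.pending = nil
}

func main() {
	rs := &ResponseStream{}
	rs.Transaction(func() {
		rs.Queue("block")
		rs.Queue("pause") // e.g. generated by a hook running on the block
	})
	// One message went out, carrying both operations.
	fmt.Println(len(rs.sent), len(rs.sent[0]))
}
```

An operation like ClearRequest, which sends nothing over the wire, gains nothing from running inside such a batch — which is the argument for moving it out.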

Comment on lines 149 to 151
if rs.isClosed() {
return
}
Member

So I think this (and the isClosed() 4 lines down) are the key to the "don't keep on queuing messages on an errored connection" part of the problem, is that correct? Basically making each new Transaction() on one of these errored connections a noop.

Collaborator Author

Yep! That's the whole idea :)
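A rough sketch of that noop-on-closed idea, with hypothetical names (`stream`, `Transaction`, `isClosed` are modeled loosely on the discussion, not copied from the codebase):

```go
package main

import (
	"fmt"
	"sync"
)

// stream guards an errored connection: once closed, further
// transactions become noops instead of queuing more messages.
type stream struct {
	mu     sync.Mutex
	closed bool
	queued int // count of transactions that actually ran
}

func (s *stream) isClosed() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.closed
}

// Close is what an io.Closer registered on the builder would trigger
// when the message fails to send.
func (s *stream) Close() {
	s.mu.Lock()
	s.closed = true
	s.mu.Unlock()
}

func (s *stream) Transaction(fn func()) {
	if s.isClosed() {
		return // errored connection: drop the work entirely
	}
	fn()
	s.queued++
}

func main() {
	s := &stream{}
	s.Transaction(func() {}) // runs
	s.Close()
	s.Transaction(func() {}) // noop after close
	fmt.Println(s.queued)
}
```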

Comment on lines 145 to 148
size := uint64(0)
for _, op := range operations {
size += op.size()
}
Member

this block could be moved down below the isClosed() block

Collaborator Author

you're right. :)
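The suggested reordering might look like the sketch below. `operation`, `responseStream`, and `execute` are hypothetical stand-ins; the only point is checking `isClosed()` before paying for the size accounting:

```go
package main

import "fmt"

type operation struct{ n uint64 }

func (op operation) size() uint64 { return op.n }

type responseStream struct{ closed bool }

func (rs *responseStream) isClosed() bool { return rs.closed }

// execute bails out on a closed stream before summing operation sizes,
// so a closed stream skips the loop entirely.
func (rs *responseStream) execute(operations []operation) uint64 {
	if rs.isClosed() {
		return 0
	}
	size := uint64(0)
	for _, op := range operations {
		size += op.size()
	}
	return size
}

func main() {
	ops := []operation{{3}, {4}}
	fmt.Println((&responseStream{}).execute(ops))             // 7
	fmt.Println((&responseStream{closed: true}).execute(ops)) // 0
}
```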

-	requestID: requestID,
-	linkTracker: ra.GetProcess(p).(*peerLinkTracker),
+	requestID: rs.requestID,
+	linkTracker: rs.linkTrackers.GetProcess(rs.p).(*peerLinkTracker),
 }
 err := transaction(rb)
Member

Do we need to run this if rs.isClosed() == true? Could we also be short-circuiting in this method as well?

Collaborator Author

possibly... but I'm a little worried we still have logic unrelated to what's going over the wire happening inside the transaction, and I don't want to short-circuit that.

@rvagg (Member) left a comment

Unfortunately there's too much complexity and too many changes here to give much more than a rubber stamp. I don't see obvious problems, although the move of the error-case checking outside of the Transaction() calls seems like a not-insignificant change, as mentioned.
Worth trying out live to see if it solves this particular problem.

@hannahhoward hannahhoward merged commit c5de746 into main Dec 17, 2021
@mvdan mvdan deleted the feat/scrub-response-errors branch January 10, 2022 15:14
Successfully merging this pull request may close these issues.

When response has a network send error, and is cancelled, remove all queued packets for this request