-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
process: allow outgoing channel not found for failed htlcs #99
process: allow outgoing channel not found for failed htlcs #99
Conversation
1c88eba
to
1023488
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also didn't think about this case. Did the bug already go out on umbrel?
name: "outgoing not found, settled", | ||
settled: true, | ||
outgoingFound: false, | ||
err: errChannelNotFound, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that there is still a race condition in which this can happen? The channel closes right at the moment the resolution is read from the interceptor stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we still don't account for pending channels because the pending API doesn't surface all the info we need.
This has always existed for the incoming channel and it doesn't seem like anybody has run into issues? But we could downgrade to just log whenever channel is not found to be cautious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was also what I thought. But agreed that it seems far off the hot path. Whatever you think is best.
} | ||
|
||
exit := make(chan error) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is quite a bit of duplication of setup code across the various tests. Non-blocking.
process.go
Outdated
// expect to find the channel). We still enforce channel lookup | ||
// for successful HTLCs, because then we know that the channel | ||
// does exist and should be found. | ||
var outgoingPeer route.Vertex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of an all-zeroes key in the database, a NULL value would be cleaner I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy for me to add a migration to relax the not-null restriction on outgoing_peer
to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would need to be relaxed. Hard to say if it is worth the effort, but as it is something that can be expressed natively in the db, it may be better to avoid the magic number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to just not record anything in the fwd log at all in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to just not record anything in the fwd log at all in these cases?
I think that it's pretty useful info to track because it's pretty odd behavior - either somebody's code is very broken or they're spamming you. Will look into the DB change 👍
Sadly yes, I'll tag a new release + update once this is in. |
1023488
to
40241e6
Compare
Sorry for the delay on this - running out of capacity with the end of year approaching. Decided to just log + skip recording the HTLC for the sake of getting a fix out before the holidays. Can always come back and do the DB update at a later stage with minimal code churn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with postponing the db migration.
peer_controller.go
Outdated
// | ||
// Log and skip this HTLC for now to avoid adding empty values to the DB. | ||
if resolution.outgoingPeer == nil { | ||
log.Debugf("HTLC: %v has no known outgoing peer, not storing resolution", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there always two log lines now for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes there are, cutting this one.
Update error handling for outgoing channel not found to catch the case where an outgoing channel was not found for a failed HTLC. Unlike incoming HTLCs, where the HTLC arrived on the channel so we know it exists, we have not yet performed any existence validation on the outgoing channel (because interception happens before we check that it exists). Since we only need the outgoing channel for record keeping, we just log the case where a HTLC was failed back and we don't know the channel (since this is just a bogus channel). We don't store this HTLC in the DB, as it will be instantly failed back.
40241e6
to
7d140d5
Compare
Fixes #97
Update process to log channel not found errors for failed lookup of outgoing HTLCs for payments that failed. For incoming links, we know that the channel exists because something was forwarded to us over an existing channel, but this level of validation has not yet occurred for the outgoing link (as we intercept it before checking that the channel exists) so it's possible that we can't lookup the bogus channel.
Change: