Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process: allow outgoing channel not found for failed htlcs #99

Merged
merged 1 commit into from
Dec 15, 2023

Conversation

carlaKC
Copy link
Collaborator

@carlaKC carlaKC commented Nov 27, 2023

Fixes #97

Update process to log channel not found errors for failed lookup of outgoing HTLCs for payments that failed. For incoming links, we know that the channel exists because something was forwarded to us over an existing channel, but this level of validation has not yet occurred for the outgoing link (as we intercept it before checking that the channel exists) so it's possible that we can't lookup the bogus channel.

Change:

  • Just log failed + outgoing channel not found htlcs errors
    • This will store an empty pubkey for the outgoing peer, I think that's worth storing because it can easily be filtered
  • Still enforce channel lookup for successful HTLCs, as we know the channel exists

@carlaKC carlaKC marked this pull request as ready for review November 27, 2023 11:21
Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also didn't think about this case. Did the bug already go out on umbrel?

name: "outgoing not found, settled",
settled: true,
outgoingFound: false,
err: errChannelNotFound,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that there is still a race condition in which this can happen? The channel closes right at the moment the resolution is read from the interceptor stream?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we still don't account for pending channels because the pending API doesn't surface all the info we need.

This has always existed for the incoming channel and it doesn't seem like anybody has run into issues? But we could downgrade to just log whenever channel is not found to be cautious?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was also what I thought. But agreed that it seems far off the hot path. Whatever you think is best.

}

exit := make(chan error)
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of duplication of setup code across the various tests. Non-blocking.

process.go Outdated
// expect to find the channel). We still enforce channel lookup
// for successful HTLCs, because then we know that the channel
// does exist and should be found.
var outgoingPeer route.Vertex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of an all-zeroes key in the database, a NULL value would be cleaner I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for me to add a migration to relax the not-null restriction on outgoing_peer to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would need to be relaxed. Hard to say if it is worth the effort, but as it is something that can be expressed natively in the db, it may be better to avoid the magic number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to just not record anything in the fwd log at all in these cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to just not record anything in the fwd log at all in these cases?

I think that it's pretty useful info to track because it's pretty odd behavior - either somebody's code is very broken or they're spamming you. Will look into the DB change 👍

@carlaKC
Copy link
Collaborator Author

carlaKC commented Nov 28, 2023

Did the bug already go out on umbrel?

Sadly yes, I'll tag a new release + update once this is in.

@carlaKC
Copy link
Collaborator Author

carlaKC commented Dec 12, 2023

Sorry for the delay on this - running out of capacity with the end of year approaching. Decided to just log + skip recording the HTLC for the sake of getting a fix out before the holidays. Can always come back and do the DB update at a later stage with minimal code churn.

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with postponing the db migration.

//
// Log and skip this HTLC for now to avoid adding empty values to the DB.
if resolution.outgoingPeer == nil {
log.Debugf("HTLC: %v has no known outgoing peer, not storing resolution",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there always two log lines now for this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes there are, cutting this one.

Update error handling for outgoing channel not found to catch the case
where an outgoing channel was not found for a failed HTLC. Unlike
incoming HTLCs, where the HTLC arrived on the channel so we know it
exists, we have not yet performed any existence validation on the
outgoing channel (because interception happens before we check that it
exists).

Since we only need the outgoing channel for record keeping, we just
log the case where a HTLC was failed back and we don't know the channel
(since this is just a bogus channel). We don't store this HTLC in the
DB, as it will be instantly failed back.
@carlaKC carlaKC merged commit 60b70d9 into lightningequipment:master Dec 15, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crash with "channel not found" for "UnknownNextPeer" routing failures
2 participants