Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to attributable errors #2256

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joostjager
Copy link

@joostjager joostjager commented May 2, 2023

Implements lightning/bolts#1044

This PR serves to:

Not included:

  • Support for legacy failures, this PR forces attributable errors always.
  • Timing data in the intermediate and final failure payload field.
  • Handling of failure cases where a node corrupts the failure message.
  • Working tests.
  • Clean up of various debug log/print statements.

@joostjager
Copy link
Author

Single hop interop testing passes.

Multi hop interop testing seems to be blocked by gossip propagation issues between lnd and rust-lightning.

@TheBlueMatt
Copy link
Collaborator

Nice! That's not too bad, thanks for working on this. I'll dig into the crypto in a day or two. Have a number of comments on the code itself and structure, but I assume its not really worth digging into until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

Multi hop interop testing seems to be blocked by gossip propagation issues between lnd and rust-lightning.

Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

@joostjager
Copy link
Author

Have a number of comments on the code itself and structure, but I assume its not really worth digging into until we support both old and new errors? I'm happy to give more code-level feedback now if you prefer.

This doesn't surprise me. This is my first venture into Rust land. It is a bit of a switch from golang, and I need to get used to how things are done. The language is not bad though, I can see why people fall in love with it! But yes, can hold off on code-level comments for now.

Oh? Is this some known issue on the lnd end? I'm not aware of any gossip errors.

Not sure if it is a known issue. I've had nagging gossip issues before when I tried to do a pathfinding benchmark. For this PR, I did apply a patch to rust-lightning somewhere to force-send node announcement always and then it worked better. Will try to come up with a reasonable repro scenario.

@TheBlueMatt
Copy link
Collaborator

What are you using to do the testing? I assume ldk-sample of some form? If you change the timer at https://github.com/lightningdevkit/ldk-sample/blob/main/src/main.rs#L791 it will rebroadcast a fresh node announcement more aggressively.

@joostjager
Copy link
Author

Yes, ldk-sample. It was doing the timer alright, but somehow it got filtered out in the timer handler.

@TheBlueMatt
Copy link
Collaborator

Would be happy to take a look at logs. At the TRACE level we should basically be writing everything that is going out or coming in on the wire.

@joostjager
Copy link
Author

Yes, so I did see that the announcement wasn't going out. Will continue tomorrow and get back with more data.

@TheBlueMatt
Copy link
Collaborator

Errr, right, so to avoid dumping tons of gossip crap that message is only really logged at a high level (Broadcasting NodeAnnouncement after passing it to our own RoutingMessageHandler.), with the remaining logs on a per-peer level being at the GOSSIP level (disabled by default cause its verbose) - Sending message to all peers except {:?} or the announced node: {:?} and Skipping broadcast message to {:?} as its outbound buffer is full

@joostjager
Copy link
Author

Looks like timer is at 60s. I definitely waited much longer than that, so doesn't seem to be the problem. Enabled gossip logging and saved log files. Continuing in #2259

@joostjager
Copy link
Author

With gossip fixed with the hint in #2259, I was able to run through a few of the multi-hop inter-op scenarios:
LND -> LDK -> LDK
LDK -> LDK -> LND
LND -> LDK (intermediate failure)

Obviously there are more, but I think this is a good enough sanity check for now.

As mentioned above, attention should go to the crypto part of this feature first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants