Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renepay simpler routetracking #7357

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

Lagrang3
Copy link
Collaborator

While fixing the bug #7337 I have realized that the mechanism for tracking pending routes internally in renepay is way too complicated, prone to bugs.
With this PR I wan to reduce that complexity by relying in the internal state as little as possible and using
lightningd as oracle. The tradeoff is that now there is an extra rpc call listsendpays for every route computation cycle;
this IO communication is way more expensive that checking the internal database. However the bottleneck of the payment
is the time spent waiting for onion messages from peers, not RPC communication with ligthningd and neither the computation
of the routes.

When renepay starts, one of the first operations it does is to check for
pending sendpays for the same invoice. Those are added up to an internal
database to keep track of. Also the amount they deliver to the
destination is computed so that the current payment rpc call would try
to complete the payment for whatever amount remains to pay.
For an incomplete payment after calling the RPC renepay I found this in
the logs:

2024-05-22T19:44:19.853Z DEBUG   plugin-cln-renepay: There are pending sendpays to this invoice. groupid = 6 delivering = 0msat, last_partid = 10

Where delivering should be the sum of the amounts delivered by pending routes.
Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
- remove payment pointer from routetracker, fetch payment if necessary
  from payment_hash;
- "have results" condition as a function call to routetracker.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Use a single global map of computed routes instead of one for each
payment.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
It feels unsafe to rely on the internal state of the plugin's database
to tell how many pending sendpays there are for the current payment.
The safest way is to assume lightningd knows and thus use listsendpay
before computing routes.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
There were some dummy rpc calls to waitblockheight in the payment workflow
to allow the function stack to clear. But it is better if some steps in
the payment are executed "atomically" to avoid strange race conditions.
For example: all steps between getting pending sendpays, computing new
routes and sending those routes.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Simply move the "computed routes" array from the payment to the
routetracker. It makes sense to put all temporary stages of routing into
a single data structure: the routetracker.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Use success_data_from_listsendpays to check if there are "complete"
sendpays instead of imposing the presence of "complete" sendpays as a
precondition for success_data_from_listsendpays.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Add missing uncertainty update after payment success.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
struct route_map *pending_routes;

/* Routes that have concluded (either SENDPAY_FAILED or
* SENDPAY_COMPLETE). */
struct route **finalized_routes;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: make sure all this extra data gets cleaned after the payment session concludes.

Add remove_htlc in the route destructor.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
We rather have a payment hash table to look up payments based on the
payment hash than having a list.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
@Lagrang3
Copy link
Collaborator Author

Lagrang3 commented Jun 7, 2024

The current payment workflow can be described as this:

while payment is still pending:
    call rpc listsendpays to know which sendpays are pending
    build routes
    send routes
    wait until all routes have been sent
    wait until at least one route has either succeed or failed
    process results and decide whether to finish the payment with "FAILED", "SUCCESS" or continue trying.

There is an internal state of the payment called routetracker that handles all pending sendpays and their result when the plugin gets a sendpay success/fail notification. I wanted to simplify that logic in a way that we become more relying on RPC calls to lightningd and less dependent on the internal state. But I couldn't find a way around that.
Still this PR polishes some corners.

Changelog-Fixed: renepay: finalized routes have to be processed and
determine the payment status even after the payment goes into the
background (no current command active). Not doing so leads to finalized
routes getting stuck in the payment internal data structure and their
associated HTLCs in the uncertainty network don't get released.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Remove __PRETTY_FUNCTION__ in favor of __func__

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
@Lagrang3 Lagrang3 marked this pull request as ready for review June 7, 2024 09:14
routefail object allocation was linked to route,
we had a crash of the plugin with the following error:

0x561f424fc07a send_backtrace
	common/daemon.c:33
0x561f424fc102 crashdump
	common/daemon.c:75
0x7f5b0e7dc04f ???
	???:0
0x7f5b0e82ae2c ???
	???:0
0x7f5b0e7dbfb1 ???
	???:0
0x7f5b0e7c6471 ???
	???:0
0x561f4252581f call_error
	ccan/ccan/tal/tal.c:95
0x561f425258c8 check_bounds
	ccan/ccan/tal/tal.c:169
0x561f425258f9 to_tal_hdr
	ccan/ccan/tal/tal.c:179
0x561f42526283 tal_free
	ccan/ccan/tal/tal.c:525
0x561f424e5379 routefail_end
	plugins/renepay/routefail.c:52
0x561f424e557b handle_failure
	plugins/renepay/routefail.c:431

apparently there was a race condition for which the route was first
freed before we arrived to routefail_end where we manually free
routefail. I don't see how this could have happened, but anyways
this subtle bug can be avoided by linking the routefail to the payment.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
Often I find this in the logs:
DEBUG   plugin-cln-renepay: gossmap ignored 94692259263600 channel updates

Apparently gossmap_refresh does not initialize the input variable num_channel_updates_rejected.
Fix this by explicitely initializing it before calling gossmap_refresh.

Signed-off-by: Lagrang3 <lagrang3@protonmail.com>
@Lagrang3 Lagrang3 added this to the v24.08 milestone Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant