Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(drand): StateGetBeaconEntry uses chain beacons for historical epochs #12428

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Sep 3, 2024

Fixes: #12414

Previously StateGetBeaconEntry would always try and use a drand beacon to get the appropriate round. But as drand has shut down old beacons and we've removed client details from Lotus, it has stopped working for historical beacons.
This fix restores historical beacon entries by using the on-chain lookup, however it now follows the rules used by StateGetRandomnessFromBeacon and the get_beacon_randomness syscall which has some quirks with null rounds prior to nv14. See #12414 (comment) for specifics.

StateGetBeaconEntry still blocks for future epochs and uses live drand beacon clients to wait for and fetch rounds as they are available.


Notes for review: the existing logic in chain/rand/rand.go should not change but I've had to do some refactoring to be able to return the beacon entries as well and not just the randomness. So the file looks a bit different. Please pay special attention to that because it's used for the get_beacon_randomness syscall and the StateGetRandomnessFromBeacon API. Aside from that, this should only impact StateGetBeaconEntry.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg rvagg changed the title fix(drand): StateGetBeaconEntry uses chain beacons for historical epochs fix(drand): StateGetBeaconEntry uses chain beacons for historical epochs Sep 3, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjan90
Copy link
Contributor

rjan90 commented Sep 9, 2024

I added a release/backport to this PR, and will spin up a Calibration-node with this patch, to confirm that I can get the beacon that Hubert encountered issues with: https://filecoinproject.slack.com/archives/CPFTWMY7N/p1724405387736799?thread_ts=1724401761.517049&cid=CPFTWMY7N

Copy link
Contributor

@AnomalRoil AnomalRoil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this should significantly reduce the quantity of nodes trying to query the old defunct incentinet network we lost 👍🏻

@@ -647,6 +647,11 @@ func (fr *fakeRand) GetChainRandomness(ctx context.Context, randEpoch abi.ChainE
return *(*[32]byte)(out), nil
}

func (fr *fakeRand) GetBeaconEntry(ctx context.Context, randEpoch abi.ChainEpoch) (*types.BeaconEntry, error) {
r, _ := fr.GetChainRandomness(ctx, randEpoch)
return &types.BeaconEntry{Round: 10, Data: r[:]}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that the fakeRand returning always the same round value, 10, rather than the expected one for the given randEpoch isn't breaking anything.

Comment on lines +155 to +169
for i := 0; i < 20; i++ {
cbe := randTs.Blocks()[0].BeaconEntries
for _, v := range cbe {
if v.Round == round {
return &v, nil
}
}

next, err := sr.cs.LoadTipSet(ctx, randTs.Parents())
if err != nil {
return nil, xerrors.Errorf("failed to load parents when searching back for beacon entry: %w", err)
}

randTs = next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the need for this loop here is because they might be null rounds?
Why 20? Is there any protocol guarantee that there won't be more than 20 null ones?

Furthermore I thought a given tipset was a set of blocks with the same "epoch"?
The drand beacons for a given epoch are univocally determined by the epoch timestamp.
But then loading the tipset's parents and trying to fetch their beacon entries should yield a beacon meant for the previous epoch, no? e.g. if my epoch is 42 and the corresponding drand epoch is 66 and no beacon entries match the round I've queried, it doesn't make sense to go see at the past 20 epochs since these epochs should have beacon rounds that are strictly inferiors to 66.

Let say I have tipsets for epochs:
40 -> 41 -> 42 ->43

and let's say the corresponding drand rounds would be:
64 -> 65 -> 66 -> 67

It seems to me that querying getBeaconEntryV3 on epoch 43 might go and look at these past epochs to check if they contains the right drand round 67 in case the first block of the tipset 43 doesn't contains round 67 in its beacons entries.
But I don't understand how a block on epoch 40 could have drand round 67 in it, since round 67 shouldn't have been produced yet by the drand network when epoch 40 occurred, it only could have known the rounds up to round 64 at the time, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is old code you moved around, but it still looks weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is weird, and I was tempted to rip out the loop and return an error if it's not found in the next non-null epoch. It came in #7376 and I suspect it was just a rough copy of the logic in GetLatestBeaconEntry which searches 20 tipsets for one with a beacon in it and fails after it goes through 20 without finding a single beacon.

With the current logic, even that search should fail, but historically maybe that was not unreasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the interests of scope, and the fact that this check probably never iterates more than once, I decided to leave it as is and cleanup can be done later so as not to complicated this PR and get it over the line

wait: true,
},
{
// After v14, if the tipset corresponding to round X is null, we still fetch the randomness for X (from the next non-null tipset) but can get the exact round
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next or the previous?
It seems from the above code you're getting it from its parent, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, that's the 20 iterations thing that needs cleaning up:

  1. call getBeaconEntryV3 with the filecoin epoch we want
  2. call GetBeaconRandomnessTipset on that epoch with lookback set to false, so it'll skip ahead to the next non-null tipset
  3. look in the beacon entries in that tipset for the precise drand round number that we want
  4. if we don't find it, walk back up the chain ... but we're not going to find it up the chain because we're matching the exact drand round number so it'll always end in an error, just after a 20 tipset walkback

@rvagg
Copy link
Member Author

rvagg commented Sep 12, 2024

Opened #12452 to capture some thoughts about the 20 tipset walkback(s) in the code that I think are unnecessary. I just don't want to adjust the current get-beacon-from-chain logic in this PR so I left it alone.

@rjan90 rjan90 mentioned this pull request Sep 16, 2024
48 tasks
@rvagg
Copy link
Member Author

rvagg commented Sep 17, 2024

Calling for objections before I land this. It would be great to get some more review eyes but I understand it's niche.

…pochs

Fixes: #12414

Previously StateGetBeaconEntry would always try and use a drand beacon to get
the appropriate round. But as drand has shut down old beacons and we've
removed client details from Lotus, it has stopped working for historical
beacons.
This fix restores historical beacon entries by using the on-chain lookup,
however it now follows the rules used by StateGetRandomnessFromBeacon and the
get_beacon_randomness syscall which has some quirks with null rounds prior to
nv14. See #12414 (comment)
for specifics.

StateGetBeaconEntry still blocks for future epochs and uses live drand beacon
clients to wait for and fetch rounds as they are available.
@BigLep
Copy link
Member

BigLep commented Sep 17, 2024

Per 2024-09-17 FilOz meeting, @jennijuju asked that @Stebalien or @Kubuxu take this to get this landed.

@Stebalien @Kubuxu : see @rvagg 's offer for a sync discussion in https://filecoinproject.slack.com/archives/CP50PPW2X/p1726550218788619 if that is helpful

Also, this one is needed for #12464 which we want to start this week.

@BigLep
Copy link
Member

BigLep commented Sep 18, 2024

Notes from 2024-09-18 Lotus standup conversation: even though this has been opened for a couple of weeks, it affects pre-quicknet, and it was desired to get in as part of Lotus v1.29.2 this week, we won't merge it until it gets reviewed properly. We know @Stebalien and @Kubuxu are more tied up this week with a colo, but we assume (and we'll followup) that they'll engage next week.

@Kubuxu
Copy link
Contributor

Kubuxu commented Sep 19, 2024

I will try to fetch in the context. I looked at it few weeks ago but I could not reach a conclusion at that time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🔎 Awaiting review
Development

Successfully merging this pull request may close these issues.

StateGetBeaconEntry uses drand client to look up round instead of using tipset data
5 participants