Fix bugs, refactor, document and land the weighted consistent hashing branch #26
Conversation
// trigger early refreshes when pool size drops below this low watermark
PoolLowWatermark int
MaxConcurrency int
This is never used.
PoolFailureDownvoteDebounce time.Duration
PoolMaxSize int
This is never used.
m.Unlock()
nm := NewMember(m.url)
nm.replication = updateWeightF(m.replication)
nm.lastUpdate = time.Now()
We need to update the lastUpdate time on this newly constructed member, not on the old member: the pool will now be updated with this member, and we want to retain the last-updated time across weight updates.
i think change it on both - if a couple of requests time out simultaneously against the previous member, we want only one of them to apply the re-weighting
// it owns and runs. We should probably just forget about the Saturn endpoints that were
// previously in the pool but are no longer being returned by the orchestrator. It's highly
// likely that the Orchestrator has deemed them to be non-functional/malicious.
// Let's just override the old pool with the new endpoints returned here.
The orchestrator periodically tests the L1s to maintain a list of responsive L1s.
I've asked for access to https://github.com/filecoin-saturn/orchestrator to see what exactly is "checked".
I don't think we can trust Orchestrator to give us useful L1s; we need to verify ourselves.
The tests Orchestrator performs are not end-to-end, use hardcoded CIDs, and if the first try fails, they allow a cached result (so nodes that lost retrieval capability, but have the test CIDs in cache, are never penalized):
- https://github.com/filecoin-saturn/orchestrator/blob/04b1f712ef612b340142f3e49e89a995e7e3154e/cron/health-check.js#L27
- https://github.com/filecoin-saturn/orchestrator/blob/04b1f712ef612b340142f3e49e89a995e7e3154e/cron/random-health-check.js#L20-L32
Orchestrator's opinion would be useful only if it ran a real IPFS node, added random data to IPFS, announced it on the DHT, and then attempted to retrieve it via a specific L1.
This means "nearby" has little value.
We need to fetch as many L1s as we can, and then let the weighting logic find the most useful ones.
Currently, we only get 25, but with https://github.com/filecoin-saturn/orchestrator/issues/83 we will be able to ask for more.
@aarshkshah1992 update: we can now use https://orchestrator.strn.pl/nodes/nearby?count=9999 to grab as many L1s as we need.
@lidel This makes sense to me. I think you'll have to make this change in Bifrost GW as caboose uses whatever orchestrator API URL you pass to it. The caboose Config param you'll have to change is called OrchestratorEndpoint.
i think mostly this was: what happens if we get an empty list from the orchestrator because it has a bug / loses its DB?
it doesn't seem to hurt much to keep using known, working nodes.
maybe we ask for some way for the orchestrator to more forcefully tell us to replace?
We could be asking for ?count=9999 and see all of them, and decrease the weight of ones that were removed.
This way, if Orchestrator returns an empty list, or only 4 for some reason, we are not left with a broken client – weights will be applied to all, so no net impact from orchestrator hiccups/misconfigurations.
pool.go
fb = time.Now()
code = resp.StatusCode
proto = resp.Proto
respReq = resp.Request
defer resp.Body.Close()

// TODO: What if the Saturn node is malicious? We should have an upper bound on how many bytes we read here.
rb, err := io.ReadAll(resp.Body)
@willscott @lidel Is this a concern?
@aarshkshah1992 yes, for a block response we don't expect blocks bigger than 4MiB (the bitswap limit). Use io.LimitedReader to cap the amount of data read here, just to be safe (and error if they send anything bigger).
While at it, there is no point in reading the body or calculating a hash for responses other than HTTP 200.
Check if resp.StatusCode != http.StatusOK early, and fail fast, saving CPU and time.
Returning a different error for a timeout than for resp.StatusCode != http.StatusOK will allow us to apply different penalties (a timeout means the node is trying to find content; an HTTP error is just a hard fail).
PR updated with both these changes.
cmd/caboose/main.go
@@ -38,7 +38,7 @@ func main1() int {
 	}
 	out := args.Get(1)

-	oe, _ := url.Parse("https://orchestrator.strn.pl/nodes/nearby")
+	oe, _ := url.Parse("https://orchestrator.strn.pl/nodes/nearby?count=100")
This parameter should be moved inside of caboose as a const, appended to the endpoint.
You don't want to ask bifrost-gateway to change the URL every time you want to adjust the strategy for refreshing L1s :)
I've made the entire OrchestratorEndpoint param an optional value where the default will be this URL with 1000 nodes. So, you can totally skip passing this param from Bifrost.
if received > 0 && received <= maxBlockSize {
	buf = buf[:received]
} else {
	return nil, ErrBackendFailed
This should be a separate error; otherwise we will bang our heads over why things do not work N years from now, if ever the block size policy changes (but fine to do it in #28).
Yeah, will do it in #28.
Co-authored-by: Marcin Rataj <lidel@lidel.org>
@willscott Test is failing because we moved the Saturn client to use https and looks like it does NOT trust the cert given by the httptest TLS server. I'll take a look, but a test against the Saturn network runs fine.
@aarshkshah1992 I think this is why we have https://github.com/ipfs/bifrost-gateway/blob/main/blockstore.go#L82-L84. If you move this hack into caboose, it should fix your cmd and help us clean up the Saturn-related hack from bifrost-gateway.
@aarshkshah1992 I've fixed
233eb8a was able to retrieve bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi, I am switching bifrost-gateway to this commit: ipfs-inactive/bifrost-gateway#39
@willscott ok to merge?
re-find idx in lock
This PR aims to land #19 and make some changes on top of it.
The main changes include:
- Takes care of the "slowly re-fill the reputation of down-weighted backends as they successfully return content" TODO in Use a more advanced weighting / fail-over strategy #19. The way we do this is by bumping up the weights of downvoted nodes that successfully return content by 20 percent on every success (with a debounce) until they hit the default weight of 20 that we assign to all nodes.
- Addresses @lidel's comment at Use a more advanced weighting / fail-over strategy #19 (comment) by ensuring we time out requests to the Saturn L1s with a reasonable default.
- I want to make some changes around how we refresh the pool after getting back a list of Saturn nodes from the Orchestrator and how we read the blocks in memory. Please take a look at Fix bugs, refactor, document and land the weighted consistent hashing branch #26 (comment) and Fix bugs, refactor, document and land the weighted consistent hashing branch #26 (comment) and let me know what you think.
- I think I found and fixed a bug around how we were updating the weights. Please see Fix bugs, refactor, document and land the weighted consistent hashing branch #26 (comment).
- Some refactoring to remove boilerplate, plus added documentation for people to grok what's happening in the code.
- I'll add some comprehensive unit tests in follow-up PRs as there's a lot of code here that needs testing.
Closes #27 Closes #25