Skip to content
This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Consider error codes in downvoting and build test harness for pool #39

Merged
merged 8 commits into from
Feb 22, 2023

Conversation

aarshkshah1992
Copy link
Contributor

@aarshkshah1992 aarshkshah1992 commented Feb 21, 2023

fixes #28 with tests.

Introduces a test harness for the pool.

@aarshkshah1992 aarshkshah1992 removed the request for review from willscott February 21, 2023 06:21
@aarshkshah1992 aarshkshah1992 changed the title Consider error codes in downvoting [WIP] Consider error codes in downvoting Feb 21, 2023
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Consider error codes in downvoting Consider error codes in downvoting and build test harness for pool Feb 21, 2023
pool.go Outdated
@@ -297,27 +306,36 @@ func (p *pool) fetchAndUpdate(ctx context.Context, node string, c cid.Cid, attem
idx, nm = p.updateWeightUnlocked(node, false)
p.lk.RUnlock()

if idx != -1 && nm != nil && p.endpoints[idx].url == nm.url {
if idx != -1 && nm != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endpoints was being read outside a lock.

Copy link
Contributor

@willscott willscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally good

Comment on lines +134 to +137
func (p *pool) Start() {
go p.refreshPool()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we get by factoring this out as a separate API to call versus having it implicit in newPool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott The flexibility to start the refresh loop whenever we want in tests instead of always having to do it on creation of the pool.

pool.go Outdated
Comment on lines 322 to 343
func (p *pool) upvote(node string) {
p.lk.RLock()
idx, nm = p.updateWeightUnlocked(node, true)
idx, nm := p.updateWeightUnlocked(node, false)
p.lk.RUnlock()

if idx != -1 && nm != nil {
p.replaceNodeToHaveWeight(nm)
}
}

func (p *pool) downvote(node string) {
p.lk.RLock()
idx, nm := p.updateWeightUnlocked(node, true)
p.lk.RUnlock()

// we weren't able to downvote the node.
if idx == -1 || nm == nil {
return
}

// we need to take the lock as we're updating the list of pool endpoint members below.
p.replaceNodeToHaveWeight(nm)
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can upvote and downvote be one method with direction bool since it seems like they basically just differ in the updateWeightUnlocked bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now done.

pool.go Outdated
// Saturn fetch worked, we return the block.
return
}

// If this is a NOT found or Timeout error, park the downvoting for now and see if other members are able to give us this content.
if errors.Is(err, ErrContentNotFound) || errors.Is(err, ErrSaturnTimeout) {
transientErrs[node] = struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of struct{}{} maybe worth tracking the err itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pool.go Outdated
Comment on lines 255 to 257
for node := range transientErrs {
p.downvote(node)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much complexity would it introduce to have this batch of operations occur within one lock acquisition? (i'm okay for this to be a subsequent PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, have done this. Please take a look.

lidel added a commit to ipfs-inactive/bifrost-gateway that referenced this pull request Feb 22, 2023
caboose.go Outdated

var ErrNotImplemented error = errors.New("not implemented")
var ErrNoBackend error = errors.New("no available strn backend")
var ErrBackendFailed error = errors.New("strn backend failed")
var ErrContentNotFound error = errors.New("content not found")
Copy link
Contributor

@lidel lidel Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Lassie is doing (returning 404) is semantically incorrect, will cause us headache when we will do GETs for graphs and paths.
Let's make this clear that it is failure to find providers, and not failure to find path in requested path:

Suggested change
var ErrContentNotFound error = errors.New("content not found")
var ErrContentProviderNotFound error = errors.New("strn failed to find content providers")

I know this looks like a cosmetic difference, but trust me, one day, this will save someone hours in debugging :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if idx == -1 || nm == nil {
continue
}
p.updatePoolWithNewWeightUnlocked(nm, idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the UpdateWithWeights call ends up being expensive - it re-builds the full consistent hash table.

we can do it in a followup PR, but we should see if we can end up making 1 call to UpdateWithWeights' after applying all our weight changes to endpoints`.

@willscott willscott merged commit 7ed8037 into lock-cleanup Feb 22, 2023
@willscott willscott deleted the feat/downvote-based-return-codes branch February 22, 2023 17:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants