-
Notifications
You must be signed in to change notification settings - Fork 2
Consider error codes in downvoting and build test harness for pool #39
Conversation
pool.go
Outdated
@@ -297,27 +306,36 @@ func (p *pool) fetchAndUpdate(ctx context.Context, node string, c cid.Cid, attem | |||
idx, nm = p.updateWeightUnlocked(node, false) | |||
p.lk.RUnlock() | |||
|
|||
if idx != -1 && nm != nil && p.endpoints[idx].url == nm.url { | |||
if idx != -1 && nm != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endpoints was being read outside a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks generally good
func (p *pool) Start() { | ||
go p.refreshPool() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do we get by factoring this out as a separate API to call versus having it implicit in newPool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott The flexibility to start the refresh loop whenever we want in tests instead of always having to do it on creation of the pool.
pool.go
Outdated
func (p *pool) upvote(node string) { | ||
p.lk.RLock() | ||
idx, nm = p.updateWeightUnlocked(node, true) | ||
idx, nm := p.updateWeightUnlocked(node, false) | ||
p.lk.RUnlock() | ||
|
||
if idx != -1 && nm != nil { | ||
p.replaceNodeToHaveWeight(nm) | ||
} | ||
} | ||
|
||
func (p *pool) downvote(node string) { | ||
p.lk.RLock() | ||
idx, nm := p.updateWeightUnlocked(node, true) | ||
p.lk.RUnlock() | ||
|
||
// we weren't able to downvote the node. | ||
if idx == -1 || nm == nil { | ||
return | ||
} | ||
|
||
// we need to take the lock as we're updating the list of pool endpoint members below. | ||
p.replaceNodeToHaveWeight(nm) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can upvote
and downvote
be one method with direction bool
since it seems like they basically just differ in the updateWeightUnlocked
bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now done.
pool.go
Outdated
// Saturn fetch worked, we return the block. | ||
return | ||
} | ||
|
||
// If this is a NOT found or Timeout error, park the downvoting for now and see if other members are able to give us this content. | ||
if errors.Is(err, ErrContentNotFound) || errors.Is(err, ErrSaturnTimeout) { | ||
transientErrs[node] = struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of struct{}{}
maybe worth tracking the err
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pool.go
Outdated
for node := range transientErrs { | ||
p.downvote(node) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how much complexity would it introduce to have this batch of operations occur within one lock acquisition? (i'm okay for this to be a subsequent PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, have done this. Please take a look.
filecoin-saturn/caboose#37 Not very useful atm, but might be once filecoin-saturn/caboose#39 is done.
caboose.go
Outdated
|
||
var ErrNotImplemented error = errors.New("not implemented") | ||
var ErrNoBackend error = errors.New("no available strn backend") | ||
var ErrBackendFailed error = errors.New("strn backend failed") | ||
var ErrContentNotFound error = errors.New("content not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Lassie is doing (returning 404) is semantically incorrect, will cause us headache when we will do GETs for graphs and paths.
Let's make this clear that it is failure to find providers, and not failure to find path in requested path:
var ErrContentNotFound error = errors.New("content not found") | |
var ErrContentProviderNotFound error = errors.New("strn failed to find content providers") |
I know this looks like a cosmetic difference, but trust me, one day, this will save someone hours in debugging :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if idx == -1 || nm == nil { | ||
continue | ||
} | ||
p.updatePoolWithNewWeightUnlocked(nm, idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the UpdateWithWeights
call ends up being expensive - it re-builds the full consistent hash table.
we can do it in a followup PR, but we should see if we can end up making 1 call to UpdateWithWeights' after applying all our weight changes to
endpoints`.
fixes #28 with tests.
Introduces a test harness for the pool.