-
Notifications
You must be signed in to change notification settings - Fork 2
Cool down cid failures #52
Cool down cid failures #52
Conversation
@@ -66,20 +75,38 @@ const DefaultSaturnGlobalBlockFetchTimeout = 60 * time.Second | |||
const maxBlockSize = 4194305 // 4 Mib + 1 byte | |||
const DefaultOrchestratorEndpoint = "https://orchestrator.strn.pl/nodes/nearby?count=1000" | |||
const DefaultPoolRefreshInterval = 5 * time.Minute | |||
const DefaultMaxCidFailures = 3 | |||
const DefaultCidCoolDownDuration = 10 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lidel may have thoughts on what our negative cache should be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the half of the pool seems to fail to fetch some CID(s) every ~10-15m and gets punished for it (from bifrost-gw-staging-metrics):
Not sure what is the right number, but probably way shorter than 10m. Maybe play it safe and cooldown for 1m for now?
We could increase it up to 5min, but having these spikes and cooldown longer than DefaultPoolRefreshInterval feels like a recipe for running out of useful ones before refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue this discussion in #59
@@ -50,6 +50,10 @@ type pool struct { | |||
refresh chan struct{} // refresh is used to signal the need for doing a refresh of the Saturn endpoints pool. | |||
done chan struct{} // done is used to signal that we're shutting down the Saturn endpoints pool and don't need to refresh it anymore. | |||
|
|||
cidLk sync.RWMutex | |||
cidFailureCache *cache.Cache // guarded by cidLk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go-cache says:
Its major advantage is that, being essentially a thread-safe map[string]interface{}
can we use it's internal thread safety rather than needing an explicit lock for accesses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott I want both cidFailureCache
and cidCoolDownCache
to be updated atomically.
Let's continue this discussion in #59. |
fixes #44