Swarm MRUs: Adaptive frequency / Predictable lookups / API simplification #17559
swarm/storage/mru/lookup/lookup.go
Outdated
// but limited to not return a level that is smaller than the last-1
func GetNextLevel(last Epoch, now uint64) uint8 {
// First XOR the last epoch base time with the current clock.
// This will set all the common most significant bits will to zero.
bits will to zero -> bits to zero
fixed thanks.
swarm/storage/mru/lookup/lookup.go
Outdated
return value, nil
}
hint = epoch
} else {
put a continue here rather than an else clause
changed. Thanks
swarm/storage/mru/request_test.go
Outdated
t.Fatalf("Expected epoch to be '%s', was '%s'", epoch.String(), checkUpdate.Epoch.String()) | ||
} | ||
if !bytes.Equal(data, checkUpdate.data) { | ||
t.Fatalf("Expectedn data '%x', was '%x'", data, checkUpdate.data) |
Expectedn -> Expected
fixed, thanks.
swarm/storage/mru/request_test.go
Outdated
}

// mess with the lookup key to make sure Verify fails:
recoveredRequest.Time = 77999
not trivial that changing the time here changes the epoch and therefore the address
added an extra comment, though line 185 explained it.
swarm/storage/mru/request_test.go
Outdated
r.data = []byte("Al bien hacer jamás le falta premio") // put some arbitrary length data
_, err = r.toChunk()
if err == nil {
t.Fatal("expected request.toChunk to fail when there is no signature", err)
err is nil, not needed as arg
fixed. thanks
const (
hasherCount = 8
resourceHashAlgorithm = storage.SHA3Hash
defaultRetrieveTimeout = 100 * time.Millisecond
This is very, very low. In a network scenario this will take longer, so the MRU resolver will be fooled.
This number has been as-is since the first version. What would be a good timeout?
Take into account that the best-case scenario will take 2-3 lookups.
FTR, our convo in orange lounge
lash
@nolash
Sep 17 09:25
maybe the initial lookup should be async and batched at least, if the timeout increases? And then cancel if the newest comes back?
Javier Peletier
@jpeletier
Sep 17 09:35
Can you elaborate? Each epoch lookup fail/success implies the algorithm takes a different path for the next epoch lookup. I thought about this, but we would need to run it in parallel, forking on every lookup while we wait for the timeout or success.
So this would allow us to increase the timeout without sacrificing total lookup time, but would produce more lookups.
Viktor Trón
@zelig
Sep 17 09:59
This change is not for this PR for sure. As long as syncing and retrieval work, it's ok. Nonetheless, 100 millisecs is just too short for network retrieval. I worry that on the main cluster this would result in outdated resolutions because some chunks just won't arrive that fast.
An alternative to parallelism is to go ahead after a short time but backtrack if a chunk that we thought was not found arrives later. That's probably easier to implement.
Another alternative or additional measure is we have redundancy (probably needed anyway for root chunks) and retrieve alternatives parallelly. That might guarantee some upper limit on successful retrieval latency and therefore makes the 'not found' decision reliable
swarm/storage/mru/handler.go
Outdated
log.Warn("Handler.get with invalid rootAddr")
// Retrieves the resource cache value for the given nameHash
func (h *Handler) get(view *View) *cacheEntry {
if view == nil {
do we need these checks on get/set?
Trying to avoid a null-pointer panic in case some code screwed up
unnecessary; do not screw up code 😄
ok, removed.
swarm/storage/mru/lookup: fixed getBaseTime Added NewEpoch constructor
swarm/api/client: better error handling in GetResource()
swarm/storage/mru: Renamed structures. Renamed ResourceMetadata to ResourceID. Renamed ResourceID.Name to ResourceID.Topic
swarm/storage/mru: Added binarySerializer interface and test tools
swarm/storage/mru/lookup: Changed base time to time and + marshallers
swarm/storage/mru: Added ResourceID (former resourceMetadata)
swarm/storage/mru: Added ResourceViewId and serialization tests
swarm/storage/mru/lookup: fixed epoch unmarshaller. Added Epoch Equals
swarm/storage/mru: Fixes as per review comments
cmd/swarm: reworded resource create/update help text regarding topic
swarm/storage/mru: Added UpdateLookup and serializer tests
swarm/storage/mru: Added UpdateHeader, serializers and tests
swarm/storage/mru: changed UpdateAddr / epoch to Base()
swarm/storage/mru: Added resourceUpdate serializer and tests
swarm/storage/mru: Added SignedResourceUpdate tests and serializers
swarm/storage/mru/lookup: fixed GetFirstEpoch bug
swarm/storage/mru: refactor, comments, cleanup Also added tests for Topic
swarm/storage/mru: handler tests pass
swarm/storage/mru: all resource package tests pass
swarm/storage/mru: resource test pass after adding timestamp checking support
swarm/storage/mru: Added JSON serializers to ResourceIDView structures
swarm/storage/mru: Sever, client, API test pass
swarm/storage/mru: server test pass
swarm/storage/mru: Added topic length check
swarm/storage/mru: removed some literals, improved "previous lookup" test case
swarm/storage/mru: some fixes and comments as per review
swarm/storage/mru: first working version without metadata chunk
swarm/storage/mru: Various fixes as per review
swarm/storage/mru: client test pass
swarm/storage/mru: resource query strings and manifest-less queries
swarm/storage/mru: simplify naming
swarm/storage/mru: first autofreq working version
swarm/storage/mru: renamed ToValues to AppendValues
swarm/resource/mru: Added ToValues / FromValues for URL query strings
swarm/storage/mru: Changed POST resource to work with query strings. No more JSON.
swarm/storage/mru: removed resourceid
swarm/storage/mru: Opened up structures
swarm/storage/mru: Merged Request and SignedResourceUpdate
swarm/storage/mru: removed initial data from CLI resource create
swarm/storage/mru: Refactor Topic as a direct fixed-length array
swarm/storage/mru/lookup: Comprehensive GetNextLevel tests
swarm/storage/mru: Added comments Added length checks in Topic
swarm/storage/mru: fixes in tests and some code comments
swarm/storage/mru/lookup: new optimized lookup algorithm
swarm/api: moved getResourceView to api out of server
swarm/storage/mru: Lookup algorithm working
swarm/storage/mru: comments and renamed NewLookupParams Deleted commented code
swarm/storage/mru/lookup: renamed Epoch.LaterThan to After
swarm/storage/mru/lookup: Comments and tidying naming
swarm/storage/mru: fix lookup algorithm
swarm/storage/mru: exposed lookup hint removed updateheader
swarm/storage/mru/lookup: changed GetNextEpoch for initial values
swarm/storage/mru: resource tests pass
swarm/storage/mru: valueSerializer interface and tests
swarm/storage/mru/lookup: Comments, improvements, fixes, more tests
swarm/storage/mru: renamed UpdateLookup to ID, LookupParams to Query
swarm/storage/mru: renamed query receiver var
swarm/cmd: MRU CLI tests
cmd/swarm: remove rogue fmt
swarm/storage/mru: Add version / header for future use
Note: Discussion and review of this PR happened in 3 steps here:
Issue for roadmap tracking: ethersphere/swarm#910
Abstract
The current MRU implementation requires users to agree upon a predefined frequency and start time before publishing updates about a certain topic. This causes lots of problems if that update frequency is not honored, and it requires users to know other users' update frequencies / start times in order to look up their updates on common topics.
This PR removes this limitation via a novel adaptive frequency resource lookup algorithm. This algorithm automatically adjusts to the publisher's actual update frequency and converges quickly whether an update is found or not.
Users "following" a publisher automatically "tune" to the perceived frequency and can easily guess where the next update ought to be, meaning that subsequent lookups for a newer update run faster or can be prefetched. This also makes it easy to monitor a resource.
The Swarm team is working on a paper to describe this approach. In the meantime, the algorithm is described below.
As a result, interactions with Swarm's MRUs are greatly simplified: the user doesn't have to come up with a start time and frequency upfront, but can simply start publishing updates about the topic they want.
In addition, this PR brings to Swarm:
multihash flag in MRUs: the bzz: scheme will detect multihashes automatically.
Terminology
Topic: 32-byte arbitrary byte array specifying what the MRU contains information about, or acting as a "meeting point".
UserAddress: Public address of any user.
Epoch: A time span at a specific frequency level. See the Adaptive frequency algorithm below.
View: Concatenation of (Topic, UserAddress). This represents a particular user's updates (point of view) about a specific topic.
UpdateAddr: Hash of (View, Epoch). This allows looking up a user's series of updates over time.
What applications does this PR enable?
Data feeds, microblogging applications, metadata feeds about specific content...
API changes
HTTP API
To publish an update:
1.- Get resource metainformation
GET /bzz-resource:/?topic=<TOPIC>&user=<USER>&meta=1
GET /bzz-resource:/<MANIFEST OR ENS NAME>/?meta=1
Where:
You will receive a JSON like the below:
2.- Post the update
Extract the fields out of the JSON and build a query string as below:
POST /bzz-resource:/?topic=<TOPIC>&user=<USER>&level=<LEVEL>&time=<TIME>&signature=<SIGNATURE>
body: binary stream with the update data.
(more information about each of these fields can be found in the adaptive frequency algorithm section below)
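For illustration, here is a minimal Go sketch of this two-step flow using plain net/http. It assumes a local node at http://localhost:8500; the topic, user, level, time and signature values are placeholders, and parsing the meta JSON plus computing the signature are deliberately left out, since their exact formats are defined by the MRU package rather than by this sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
)

const (
	node  = "http://localhost:8500" // assumed local Swarm node endpoint
	topic = "0x<32-byte-topic>"     // placeholder
	user  = "0x<user-address>"      // placeholder
)

func main() {
	// 1.- Get resource metainformation (tells us what level/time to use next).
	metaURL := fmt.Sprintf("%s/bzz-resource:/?topic=%s&user=%s&meta=1", node, topic, user)
	resp, err := http.Get(metaURL)
	if err != nil {
		panic(err)
	}
	meta, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Printf("meta JSON: %s\n", meta)

	// Extract level and time from the meta JSON and sign the update digest with
	// the publisher's private key. Both steps are omitted here (placeholders).
	level, updateTime, signature := "25", "1534092715", "0x<signature>"

	// 2.- Post the update: metadata goes in the query string, data in the body.
	updateURL := fmt.Sprintf("%s/bzz-resource:/?topic=%s&user=%s&level=%s&time=%s&signature=%s",
		node, topic, user, level, updateTime, signature)
	resp, err = http.Post(updateURL, "application/octet-stream",
		bytes.NewReader([]byte("hello world")))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("update posted, status:", resp.Status)
}
```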
To get the last update:
GET /bzz-resource:/?topic=<TOPIC>&user=<USER>
GET /bzz-resource:/<MANIFEST OR ENS NAME>
To get a previous update:
Add an additional time parameter. The last update before that time will be looked up.
GET /bzz-resource:/?topic=<TOPIC>&user=<USER>&time=<T>
GET /bzz-resource:/<MANIFEST OR ENS NAME>?time=<T>
Advanced search:
If you have an idea of when the last update happened, you can also hint the lookup algorithm by adding the following extra parameters:
hint.time: Time at which you think the last update happened.
hint.level: Integer. The approximate period you think the updates were happening at, expressed as log2(T) rounded up. For example, for a resource updating every 300 seconds, level should be set to 9, since log2(300) = 8.22. See the Adaptive Frequency algorithm below for details on this.
Note that this would only affect first lookups. Your Swarm node will keep track of last updates and automatically use the last seen update as a hint. Using these parameters would override that automatic hint.
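As a sketch of how the hints fit into a lookup, here is a hedged Go example doing a hinted GET against an assumed local node at http://localhost:8500 (topic and user are placeholders):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	const (
		node  = "http://localhost:8500" // assumed local Swarm node endpoint
		topic = "0x<32-byte-topic>"     // placeholder
		user  = "0x<user-address>"      // placeholder
	)
	// We believe the last update happened around t=1534093015, with a period of
	// roughly 300 seconds, i.e. hint.level = 9.
	url := fmt.Sprintf("%s/bzz-resource:/?topic=%s&user=%s&hint.time=%d&hint.level=%d",
		node, topic, user, 1534093015, 9)
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, _ := ioutil.ReadAll(resp.Body)
	fmt.Printf("latest update (%s): %x\n", resp.Status, data)
}
```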
To publish a manifest:
POST /bzz-resource:/?topic=<TOPIC>&user=<USER>&manifest=1 with an empty body.
Note: this functionality could be moved to the client and removed from the node, since it just creates a JSON manifest and publishes it to bzz-raw, so the client could actually create this itself and call client.UploadRaw(). Don't expect this call to be available in future releases.
CLI
Creating a resource manifest:
swarm resource create is redefined as a command to create and publish an MRU manifest only.
swarm resource create [command options]
Update a resource
swarm resource update [command options] <0x Hex data>
Quick and dirty test:
In the example, the user wants to publish his/her profile picture so it can be found by anyone who knows his/her Ethereum address.
Adaptive frequency lookup algorithm
At the core of this PR is a new lookup algorithm with the following properties:
Revamping the resource frequency concept
In this new implementation, period lengths are expressed as powers of 2. The highest frequency (shortest period, an update every second) is expressed as 2⁰ = 1 second. The lowest update frequency is currently set to 2²⁵ = 33554432 seconds, which is roughly one year.
Therefore, the frequency can be encoded as just the exponent. We call this exponent the frequency level, or level for short. A higher level means a longer period and thus a lower frequency.
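As a quick illustration (a hypothetical helper, not part of the package's API), converting a desired update period in seconds into its level is just a rounded-up base-2 logarithm, capped at 25:

```go
package main

import (
	"fmt"
	"math"
)

// periodToLevel returns ceil(log2(seconds)), capped at 25 (the lowest frequency).
func periodToLevel(seconds float64) uint8 {
	level := uint8(math.Ceil(math.Log2(seconds)))
	if level > 25 {
		level = 25
	}
	return level
}

func main() {
	fmt.Println(periodToLevel(1))   // 0: update every second
	fmt.Println(periodToLevel(300)) // 9: log2(300) = 8.22, rounded up
}
```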
Introducing Epochs
Now that we have determined a set of finite possible frequencies, we can divide time into a grid of epochs. One epoch is a concrete time range at a specific frequency level, starting at a specific point in time called the epoch base time. Level 0 epochs have a maximum length of 2⁰ = 1 second. Level 3 epochs have a maximum length of 2³ = 8 seconds, etc.
To refer to a specific epoch (its epoch ID), we need to know the epoch base time and the epoch level.
We will use this epoch addressing scheme to derive a chunk address in which to store a particular update.
Epoch base time
To calculate the epoch base time of any given instant in time at a particular level, we use the simple formula:
baseTime(t, level) = t & ( 0xFFFFFFFFFFFFFFFF << level )
In other words, we are dropping the level least significant bits of t.
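A minimal Go sketch of this calculation (the real implementation lives in swarm/storage/mru/lookup; this is just the formula above, demonstrated with the worked example from the "First updates" section below):

```go
package main

import "fmt"

// baseTime drops the `level` least significant bits of t.
func baseTime(t uint64, level uint8) uint64 {
	return t & (uint64(0xFFFFFFFFFFFFFFFF) << level)
}

func main() {
	// Worked example from the "First updates" section:
	fmt.Println(baseTime(1534092715, 25)) // 1509949440
}
```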
Seeding algorithm
The seeding algorithm describes the process followed by the update publisher to determine in what epoch to "plant" the content so it can be found (harvested) by users. The algorithm works as follows:
First updates
Any first resource update will have a level of 25.
Thus, if as of writing this it is August 12th 2018 at 16:51 UTC, Unix Time is 1534092715. Therefore, the epoch base time is 1534092715 & 0xFFFFFFFFFE000000 = 1509949440.
The epoch ID for a first update now is therefore (1509949440, 25).
Subsequent updates
To determine the epoch in which to store a subsequent update, the publisher needs to know where they stored the previous update. This should be straightforward. However, if the publisher can't or does not want to keep track of this, it can always use the harvesting algorithm (see below) to find their last update.
The selected epoch for a subsequent update must be the epoch with the highest possible level that is not already occupied by a previous update.
Let's say that we want to update our resource 5 minutes later. The Unix Time is now 1534093015.
We calculate getBaseTime(1534093015, 25) = 1509949440.
This results in the same epoch as before, (1509949440, 25). Therefore, we decrease the level and calculate again: getBaseTime(1534093015, 24) = 1526726656.
Thus, the next update will be located at (1526726656, 24).
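Here is a hedged Go sketch of this step for the example above: starting from the highest level (25), if the candidate epoch collides with the epoch already used by the previous update (and would therefore produce the same UpdateAddr), we decrease the level and recalculate. The in-tree GetNextEpoch uses a smarter starting guess based on the XOR rule described in the harvesting section, but the outcome for this example is the same.

```go
package main

import "fmt"

func baseTime(t uint64, level uint8) uint64 {
	return t & (uint64(0xFFFFFFFFFFFFFFFF) << level)
}

func main() {
	prevBase, prevLevel := uint64(1509949440), uint8(25) // epoch of the first update
	now := uint64(1534093015)                            // 5 minutes later

	// Pick the highest level whose epoch at `now` is not already occupied by
	// the previous update (same epoch ID means the same chunk address).
	level := uint8(25)
	for level > 0 && level == prevLevel && baseTime(now, level) == prevBase {
		level--
	}
	fmt.Println(baseTime(now, level), level) // 1526726656 24
}
```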
If the publisher keeps updating the resource exactly every 5 minutes, the epoch grid will look like this:
If the publisher keeps updating every 5 minutes (300s), we can expect the updates to stay around level 8-9 (2⁸ = 256 seconds, 2⁹ = 512 seconds). The publisher can, however, vary this update frequency at any time or just update randomly. This does not affect the algorithm.
Here is a closer look at the converging levels further down:
Harvesting algorithm
The harvesting algorithm describes how to find the latest update of a resource. This involves looking up an epoch and walking back in time until we find the last update.
Start Epoch
To select the best starting epoch to walk our grid, we have to assume the worst case, which is that the resource was never updated after we last saw it.
If we don't know when the resource was last updated, we assume 0 as the "last time" it was updated.
We can guess a start level as the position of the first nonzero bit of XOR(last base time, now), counting from the left. The bigger the difference between the two times (last update time and now), the higher the level will be, since the update frequency we are estimating is lower.
If the resulting level is higher than 25, we use 25.
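A minimal sketch of this start-level guess in Go, assuming the cap of 25 (the in-tree GetNextLevel additionally never returns a level smaller than the hint's level minus one):

```go
package main

import (
	"fmt"
	"math/bits"
)

const highestLevel = 25 // lowest supported frequency, 2^25 seconds

// startLevel guesses the level of the first epoch to look up, given the base
// time of the last known update and the current time.
func startLevel(lastBase, now uint64) uint8 {
	mix := lastBase ^ now
	if mix == 0 {
		return 0 // same base time; start at the lowest level
	}
	level := uint8(bits.Len64(mix) - 1) // position of the highest nonzero bit
	if level > highestLevel {
		level = highestLevel
	}
	return level
}

func main() {
	// If we have never seen an update, we assume 0 as the last update time.
	// For now = 1534092715 the raw guess is level 30, which gets capped at 25.
	fmt.Println(startLevel(0, 1534092715)) // 25
}
```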
Walking the grid - a simple algorithm
Consider the following grid. In it we have marked in yellow where the updates have happened in the past.
All the above is unknown to the harvester. All we know is that the last seen update happened at (20,2), marked in light orange. We call this the "hint". The algorithm will consider this hint but will discard it if it turns out it really did not contain an update. An invalid hint can potentially slow down the algorithm but won't stop it from finding the last update.
Now it is t=26 and we want to look for the last update. Our guess at a start level is: XOR(20, 26) = 14 = 1110b, in which the first nonzero bit counting from the left is bit #3. Thus, our first lookup will happen at (baseTime(26,3), 3) = (24,3), shown in dark blue below.
If a lookup at (24,3) fails, we consider that there are no updates at lower levels either, since the seeding algorithm would have filled (24,3) before going down. This means there are no updates on or after t=24. Thus, our search area is reduced to the area directly below (20,2) (green area). We restart the algorithm as if now was 23.
If, however, a lookup at (24,3) succeeds, then we know the last update is either the one at (24,3) itself, or there may be a later one in the epochs below (blue area). At this point we consider that the update is in 24 <= t <= 26. We restart the algorithm with the hint set to (24,3) instead of the original one. If the lookup then fails, the last update was indeed in (24,3).
This is how the algorithm would play out if now t=26 and the last update is in (22,1).
Lookups are, in this order:
Locking on / following a resource
Once we have found the last update of a resource, we can easily calculate in what epoch the next update will appear, if the publisher actually makes one.
In figure 9 above, if the last update was found at (22,1) and now it is t=26, the next update must happen exactly at (24,3). This holds true until t=32. Beyond that point, the next update can be expected at (32,4), until t=48.
Therefore, the node following the resource could sample the expected epoch, keeping in sync with the publisher.
Final notes:
Please let me know your feedback, questions and test issues. I hope you like this feature. I am available on Gitter (@jpeletier) in the #orange-lounge channel. Enjoy!!