Thoughts on keys #1676

Merged
merged 4 commits into from
May 3, 2023
Add storage keys docs and pull in some functions
webmaster128 committed May 2, 2023
commit 6c415f9869901f652bafd7a247b0e25c2729289f
83 changes: 83 additions & 0 deletions docs/STORAGE_KEYS.md
@@ -0,0 +1,83 @@
# Storage keys

CosmWasm provides a generic key value store to contract developers via the
`Storage` trait. This is powerful but the nature of low level byte operations
makes it hard to use for high level storage types. In this document we discuss
the foundations of storage key composition all the way up to cw-storage-plus.

In a simple world, all you need is a `&[u8]` key which you can get e.g. using
`&17u64.to_be_bytes()`. This is an 8-byte key with an encoded integer. But if
you have multiple data types in your contract, you want to prefix those keys in
order to avoid collisions. A simple concatenation is not sufficient because you
want to avoid collisions when part of the prefixes and part of the key overlap.
E.g. `b"keya" | b"x"` and `b"key" | b"ax"` (`|` denotes concatenation) must not
have the same binary representation.
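
The collision described above can be reproduced in a few lines. This is a minimal sketch; `naive_key` is a hypothetical helper that is not part of any CosmWasm package and only stands in for the plain `|` concatenation.

```rust
/// Hypothetical naive namespacing: prefix and key are simply concatenated.
fn naive_key(prefix: &[u8], key: &[u8]) -> Vec<u8> {
    [prefix, key].concat()
}

fn main() {
    let a = naive_key(b"keya", b"x");
    let b = naive_key(b"key", b"ax");
    // Both compose to b"keyax", so two unrelated entries would share one storage slot.
    assert_eq!(a, b);
    assert_eq!(a, b"keyax");
}
```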

In the early days, multiple approaches of key namespacing were discussed and
were documented here: https://github.com/webmaster128/key-namespacing. The "0x00
separated ASCIIHEX" approach was never used but "Length-prefixed keys" is used.

To recap, Length-prefixed keys have the following layout:

```
len(namespace_1) | namespace_1
| len(namespace_2) | namespace_2
| len(namespace_3) | namespace_3
| ...
| len(namespace_m) | namespace_m
| key
```
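
As a worked example of this layout, the sketch below composes a key byte by byte. It assumes a 2-byte big-endian length, which is what `encode_length` in this PR uses; `compose_key` and the sample namespace values are hypothetical and only serve as illustration.

```rust
/// Hypothetical helper that follows the layout above.
/// Assumes each length is encoded as 2 big-endian bytes.
fn compose_key(namespaces: &[&[u8]], key: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for &namespace in namespaces {
        assert!(namespace.len() <= 0xFFFF, "namespace too long for a 2-byte length");
        out.extend_from_slice(&(namespace.len() as u16).to_be_bytes());
        out.extend_from_slice(namespace);
    }
    // The key itself is appended without a length prefix.
    out.extend_from_slice(key);
    out
}

fn main() {
    let key = compose_key(&[b"balances", b"alice"], b"uatom");
    assert_eq!(key, b"\x00\x08balances\x00\x05aliceuatom");

    // The inputs that collide under plain concatenation now differ.
    assert_ne!(compose_key(&[b"keya"], b"x"), compose_key(&[b"key"], b"ax"));
}
```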

In this repo (package `cosmwasm-storage`), the following functions were
implemented:

```rust
pub fn to_length_prefixed(namespace: &[u8]) -> Vec<u8>

pub fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8>

fn concat(namespace: &[u8], key: &[u8]) -> Vec<u8>
```

With the emerging cw-storage-plus we see two additions to that approach:

1. Manually creating the namespace and concatenating it with `concat` makes no
sense anymore. Instead `namespace` and `key` are always provided and a
composed database key is created.
2. Using a multi component namespace becomes the norm.

This led to the following addition in cw-storage-plus:

```rust
/// This is equivalent to concat(to_length_prefixed_nested(namespaces), key)
/// But more efficient when the intermediate namespaces often must be recalculated
pub(crate) fn namespaces_with_key(namespaces: &[&[u8]], key: &[u8]) -> Vec<u8> {
```

In contrast to `concat(to_length_prefixed_nested(namespaces), key)`, this direct
implementation saves one vector allocation since the final length can be
pre-computed and reserved. It is also shorter to use.
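
To make the difference concrete, here is a sketch of both paths side by side. The bodies of `to_length_prefixed_nested` and `concat` follow the code added in this PR; the body of `namespaces_with_key` is an assumption written from the description above, since only its signature is quoted here.

```rust
// Mirrors `encode_length` from this PR: 2-byte big-endian length.
fn encode_length(namespace: &[u8]) -> [u8; 2] {
    assert!(namespace.len() <= 0xFFFF, "only supports namespaces up to length 0xFFFF");
    (namespace.len() as u16).to_be_bytes()
}

// Two-step path, first vector: the intermediate prefix.
fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8> {
    let size: usize = namespaces.iter().map(|ns| ns.len() + 2).sum();
    let mut out = Vec::with_capacity(size);
    for &namespace in namespaces {
        out.extend_from_slice(&encode_length(namespace));
        out.extend_from_slice(namespace);
    }
    out
}

// Two-step path, second vector: a copy of the prefix that then grows by the key.
fn concat(namespace: &[u8], key: &[u8]) -> Vec<u8> {
    let mut k = namespace.to_vec();
    k.extend_from_slice(key);
    k
}

// Direct path (assumed body): reserve the final length up front and fill one vector.
fn namespaces_with_key(namespaces: &[&[u8]], key: &[u8]) -> Vec<u8> {
    let size = namespaces.iter().map(|ns| ns.len() + 2).sum::<usize>() + key.len();
    let mut out = Vec::with_capacity(size);
    for &namespace in namespaces {
        out.extend_from_slice(&encode_length(namespace));
        out.extend_from_slice(namespace);
    }
    out.extend_from_slice(key);
    out
}

fn main() {
    let namespaces: [&[u8]; 2] = [b"balances", b"alice"];
    let direct = namespaces_with_key(&namespaces, b"uatom");
    let two_step = concat(&to_length_prefixed_nested(&namespaces), b"uatom");
    // Both paths must produce byte-identical database keys.
    assert_eq!(direct, two_step);
}
```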

Also since `to_length_prefixed` returns the same result as
`to_length_prefixed_nested` when called with one namespace element, there is no
good reason to preserve the single component version.

## 2023 updates

With the deprecation of cosmwasm-storage and the adoption of the system in
cw-storage-plus, it is time to make a few changes to the Length-prefixed keys
standard, without breaking existing users.

1. Remove the single component `to_length_prefixed` implementation and fully
commit to the multi-component version. This shifts focus from the recursive
implementation to the compatible iterative implementation.
2. Rename "namespaces" to just "namespace" and let one namespace have multiple
components.
3. Adopt the combined namespace + key encoder `namespaces_with_key` from
cw-storage-plus.
4. Add a decomposition implementation.

Given the importance of Length-prefixed keys for the entire CosmWasm ecosystem,
those implementations should be maintained in cosmwasm-std. The generic approach
allows building all sorts of storage solutions on top of it and it allows
indexers to parse storage keys for all of them.
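
A decomposition implementation is not part of this commit. The sketch below shows one way it could look, assuming 2-byte big-endian lengths and assuming the caller knows how many namespace components to expect (the trailing key carries no length prefix, so this count cannot be derived from the bytes alone). The name `decompose` and its signature are hypothetical.

```rust
/// Hypothetical decomposition helper (not part of this PR): splits a composed
/// key into `components` namespace components and the remaining key bytes.
/// Returns `None` if the input is too short for the requested components.
fn decompose(mut data: &[u8], components: usize) -> Option<(Vec<&[u8]>, &[u8])> {
    let mut namespace = Vec::new();
    for _ in 0..components {
        if data.len() < 2 {
            return None;
        }
        // Read the 2-byte big-endian length, then take that many bytes.
        let len = u16::from_be_bytes([data[0], data[1]]) as usize;
        let rest = &data[2..];
        if rest.len() < len {
            return None;
        }
        let (component, tail) = rest.split_at(len);
        namespace.push(component);
        data = tail;
    }
    // Whatever remains after the namespace components is the key.
    Some((namespace, data))
}

fn main() {
    let raw = b"\x00\x08balances\x00\x05aliceuatom";
    let (namespace, key) = decompose(raw, 2).expect("well-formed key");
    assert_eq!(namespace, vec![&b"balances"[..], &b"alice"[..]]);
    assert_eq!(key, &b"uatom"[..]);

    // A declared length that exceeds the available bytes is rejected.
    assert!(decompose(b"\x00\x05ab", 1).is_none());
}
```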

👍🏼

1 change: 1 addition & 0 deletions packages/std/src/lib.rs
@@ -24,6 +24,7 @@ mod results;
mod sections;
mod serde;
mod storage;
mod storage_keys;
mod timestamp;
mod traits;
mod types;
195 changes: 195 additions & 0 deletions packages/std/src/storage_keys/length_prefixed.rs
@@ -0,0 +1,195 @@
//! This module is an implementation of a namespacing scheme described
//! in https://github.com/webmaster128/key-namespacing#length-prefixed-keys
//!
//! Everything in this file is only responsible for building such keys
//! and is in no way specific to any kind of storage.

/// Calculates the raw key prefix for a given namespace as documented
/// in https://github.com/webmaster128/key-namespacing#length-prefixed-keys
#[allow(unused)]
pub fn to_length_prefixed(namespace: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(namespace.len() + 2);
    out.extend_from_slice(&encode_length(namespace));
    out.extend_from_slice(namespace);
    out
}

/// Calculates the raw key prefix for a given nested namespace
/// as documented in https://github.com/webmaster128/key-namespacing#nesting
#[allow(unused)]
pub fn to_length_prefixed_nested(namespaces: &[&[u8]]) -> Vec<u8> {
    let mut size = 0;
    for &namespace in namespaces {
        size += namespace.len() + 2;
    }

    let mut out = Vec::with_capacity(size);
    for &namespace in namespaces {
        out.extend_from_slice(&encode_length(namespace));
        out.extend_from_slice(namespace);
    }
    out
}

/// Encodes the length of a given namespace as a 2 byte big endian encoded integer
fn encode_length(namespace: &[u8]) -> [u8; 2] {
    if namespace.len() > 0xFFFF {
        panic!("only supports namespaces up to length 0xFFFF")
    }
    let length_bytes = (namespace.len() as u32).to_be_bytes();
    [length_bytes[2], length_bytes[3]]
}

#[inline]
#[allow(unused)]
fn concat(namespace: &[u8], key: &[u8]) -> Vec<u8> {
    let mut k = namespace.to_vec();
    k.extend_from_slice(key);
    k
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn to_length_prefixed_works() {
        assert_eq!(to_length_prefixed(b""), b"\x00\x00");
        assert_eq!(to_length_prefixed(b"a"), b"\x00\x01a");
        assert_eq!(to_length_prefixed(b"ab"), b"\x00\x02ab");
        assert_eq!(to_length_prefixed(b"abc"), b"\x00\x03abc");
    }

    #[test]
    fn to_length_prefixed_works_for_long_prefix() {
        let long_namespace1 = vec![0; 256];
        let prefix1 = to_length_prefixed(&long_namespace1);
        assert_eq!(prefix1.len(), 256 + 2);
        assert_eq!(&prefix1[0..2], b"\x01\x00");

        let long_namespace2 = vec![0; 30000];
        let prefix2 = to_length_prefixed(&long_namespace2);
        assert_eq!(prefix2.len(), 30000 + 2);
        assert_eq!(&prefix2[0..2], b"\x75\x30");

        let long_namespace3 = vec![0; 0xFFFF];
        let prefix3 = to_length_prefixed(&long_namespace3);
        assert_eq!(prefix3.len(), 0xFFFF + 2);
        assert_eq!(&prefix3[0..2], b"\xFF\xFF");
    }

    #[test]
    #[should_panic(expected = "only supports namespaces up to length 0xFFFF")]
    fn to_length_prefixed_panics_for_too_long_prefix() {
        let limit = 0xFFFF;
        let long_namespace = vec![0; limit + 1];
        to_length_prefixed(&long_namespace);
    }

    #[test]
    fn to_length_prefixed_calculates_capacity_correctly() {
        // Those tests cannot guarantee the required capacity was calculated correctly before
        // the vector allocation but increase the likelihood of a proper implementation.

        let key = to_length_prefixed(b"");
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed(b"h");
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed(b"hij");
        assert_eq!(key.capacity(), key.len());
    }

    #[test]
    fn to_length_prefixed_nested_works() {
        assert_eq!(to_length_prefixed_nested(&[]), b"");
        assert_eq!(to_length_prefixed_nested(&[b""]), b"\x00\x00");
        assert_eq!(to_length_prefixed_nested(&[b"", b""]), b"\x00\x00\x00\x00");

        assert_eq!(to_length_prefixed_nested(&[b"a"]), b"\x00\x01a");
        assert_eq!(
            to_length_prefixed_nested(&[b"a", b"ab"]),
            b"\x00\x01a\x00\x02ab"
        );
        assert_eq!(
            to_length_prefixed_nested(&[b"a", b"ab", b"abc"]),
            b"\x00\x01a\x00\x02ab\x00\x03abc"
        );
    }

    #[test]
    fn to_length_prefixed_nested_returns_the_same_as_to_length_prefixed_for_one_element() {
        let tests = [b"" as &[u8], b"x" as &[u8], b"abababab" as &[u8]];

        for test in tests {
            assert_eq!(to_length_prefixed_nested(&[test]), to_length_prefixed(test));
        }
    }

    #[test]
    fn to_length_prefixed_nested_allows_many_long_namespaces() {
        // The 0xFFFF limit is for each namespace, not for the combination of them

        let long_namespace1 = vec![0xaa; 0xFFFD];
        let long_namespace2 = vec![0xbb; 0xFFFE];
        let long_namespace3 = vec![0xcc; 0xFFFF];

        let prefix =
            to_length_prefixed_nested(&[&long_namespace1, &long_namespace2, &long_namespace3]);
        assert_eq!(&prefix[0..2], b"\xFF\xFD");
        assert_eq!(&prefix[2..(2 + 0xFFFD)], long_namespace1.as_slice());
        assert_eq!(&prefix[(2 + 0xFFFD)..(2 + 0xFFFD + 2)], b"\xFF\xFe");
        assert_eq!(
            &prefix[(2 + 0xFFFD + 2)..(2 + 0xFFFD + 2 + 0xFFFE)],
            long_namespace2.as_slice()
        );
        assert_eq!(
            &prefix[(2 + 0xFFFD + 2 + 0xFFFE)..(2 + 0xFFFD + 2 + 0xFFFE + 2)],
            b"\xFF\xFf"
        );
        assert_eq!(
            &prefix[(2 + 0xFFFD + 2 + 0xFFFE + 2)..(2 + 0xFFFD + 2 + 0xFFFE + 2 + 0xFFFF)],
            long_namespace3.as_slice()
        );
    }

    #[test]
    fn to_length_prefixed_nested_calculates_capacity_correctly() {
        // Those tests cannot guarantee the required capacity was calculated correctly before
        // the vector allocation but increase the likelihood of a proper implementation.

        let key = to_length_prefixed_nested(&[]);
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed_nested(&[b""]);
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed_nested(&[b"a"]);
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed_nested(&[b"a", b"bc"]);
        assert_eq!(key.capacity(), key.len());

        let key = to_length_prefixed_nested(&[b"a", b"bc", b"def"]);
        assert_eq!(key.capacity(), key.len());
    }

    #[test]
    fn encode_length_works() {
        assert_eq!(encode_length(b""), *b"\x00\x00");
        assert_eq!(encode_length(b"a"), *b"\x00\x01");
        assert_eq!(encode_length(b"aa"), *b"\x00\x02");
        assert_eq!(encode_length(b"aaa"), *b"\x00\x03");
        assert_eq!(encode_length(&vec![1; 255]), *b"\x00\xff");
        assert_eq!(encode_length(&vec![1; 256]), *b"\x01\x00");
        assert_eq!(encode_length(&vec![1; 12345]), *b"\x30\x39");
        assert_eq!(encode_length(&vec![1; 65535]), *b"\xff\xff");
    }

    #[test]
    #[should_panic(expected = "only supports namespaces up to length 0xFFFF")]
    fn encode_length_panics_for_large_values() {
        encode_length(&vec![1; 65536]);
    }
}
3 changes: 3 additions & 0 deletions packages/std/src/storage_keys/mod.rs
@@ -0,0 +1,3 @@
mod length_prefixed;

pub use length_prefixed::{to_length_prefixed, to_length_prefixed_nested};