update docs and readme
sinkinben committed Oct 24, 2021
1 parent f529c8b commit c67b60f
Showing 3 changed files with 75 additions and 9 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -118,7 +118,8 @@ And it also support some meta commands (for debugging):

## Test

```text
```bash
$ mv gemfile Gemfile # rename Gemfile
$ bundle install
$ bundle exec rspec
```
16 changes: 9 additions & 7 deletions btree.h
@@ -13,12 +13,14 @@ typedef enum

/**
* a leaf node <==> a page on disk
- * the beginning of each page needs to store some meta information
- * - node_type
- * - is_root
- * - parent_pointer
- * - num_cells: how many rows(cells) in our table
- * - cells: {key1, value1}, {key2, value2}, where key is actually "id" here
+ * at the beginning of a page, we need to store some meta-data;
+ * after the meta-data come the actual <key, value> pairs
+ * The structure of a leaf node:
+ * - node_type
+ * - is_root
+ * - parent_pointer
+ * - num_cells: how many rows(cells) in this page
+ * - cells: {key1, value1}, {key2, value2}, where key is actually "id" here
**/

// Common Node Header Layout
@@ -764,4 +766,4 @@ void internal_node_split(table_t *table, uint32_t old_page_num)
}
}

-#endif
+#endif
65 changes: 64 additions & 1 deletion docs/b-tree.md
@@ -23,11 +23,74 @@ B-tree is an extension of the binary search tree (BST),
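 - Our B+tree only deletes from leaf nodes (because the data being deleted always lives in a leaf node).
 - If the max key of a leaf node is deleted, `parent.cell[i].key` must be adjusted recursively.
 - If deleting some rows leaves a leaf node (page) empty, page reclamation is not handled for now.
-  - This means that although the page is empty (it holds no row data), it still occupies disk space, and **it can never be reused** (a very dangerous behavior).
+  - This means that although the page is empty (it holds no row data), it still occupies disk space.
+  - But the next time the same key is inserted, it can still be placed back on this page (effectively a pseudo-delete).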



## B+Tree on Disk

If we want to store a binary tree, how should each of its nodes be laid out on disk?

```text
+--------+--------+--------+--------+
| page | value | left | right |
+--------+--------+--------+--------+
| 0 | 123 | 1 | 2 |
+--------+--------+--------+--------+
| 1 | 101 | nil | nil |
+--------+--------+--------+--------+
| 2 | 789 | nil | 3 |
+--------+--------+--------+--------+
| 3 | 999 | nil | nil |
+--------+--------+--------+--------+
```

The table above actually represents the following tree:

```text
    123
   /   \
 101   789
          \
          999
```
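
The record table above can be sketched in C as an array of fixed-size records indexed by page number, where children are page numbers rather than pointers. This is only an illustration: `bst_record_t`, `NIL_PAGE`, `example_pages` and `bst_contains` are hypothetical names that do not appear in the repository, and the lookup assumes the tree is ordered as a binary search tree (which the example values satisfy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for "nil" in the table above. */
#define NIL_PAGE UINT32_MAX

/* One fixed-size record per node; children are page numbers, not pointers. */
typedef struct
{
    uint32_t value;
    uint32_t left;
    uint32_t right;
} bst_record_t;

/* The four records from the table, indexed by page number. */
const bst_record_t example_pages[4] = {
    {123, 1, 2},
    {101, NIL_PAGE, NIL_PAGE},
    {789, NIL_PAGE, 3},
    {999, NIL_PAGE, NIL_PAGE},
};

/* Follow page numbers from the root (page 0) until the value is found
 * or a nil link is reached. */
bool bst_contains(const bst_record_t *pages, uint32_t num_pages, uint32_t value)
{
    if (num_pages == 0)
        return false;

    uint32_t page = 0;
    while (page != NIL_PAGE)
    {
        assert(page < num_pages); /* a child link must name an existing page */
        const bst_record_t *node = &pages[page];
        if (value == node->value)
            return true;
        page = (value < node->value) ? node->left : node->right;
    }
    return false;
}
```

Calling `bst_contains(example_pages, 4, 999)` walks pages 0, 2 and 3. On disk, each step of that walk would be one page read, which is why a node's children are identified by page number.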

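In a B+Tree, however, we need to distinguish between Internal Nodes and Leaf Nodes, because the two kinds of node have different structures. One thing is the same for both: a node is always stored in exactly one Page (4KB).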



**Leaf Node Structure**

| Offset | Member | Size(bytes) | Description |
| :----: | :----------------------------: | :---------: | :----------------------------------------------------------: |
| 0 | `node_type` | 1 | Internal node or leaf node |
| 1 | `is_root` | 1 | True or false |
| 2 | `parent` | 4 | Parent's page number |
| 6 | `num_cells` | 4 | How many cells in this node, each cell is a k-v pair |
| 10 | `next_leaf` | 4 | The sibling's page number (all leaf nodes in b+tree will form a linked list) |
| 14 | `cell[0] = <key[0], value[0]>` | 4 + X | `key` is the primary key of the table, and the size of `value` depends on the definition of row |
| ... | ... | ... | More cells here |
|  ...   |          `cell[n-1]`           |    4 + X    |                          Last cell                           |
|  ...   |          unused bytes          |     ...     | Trailing space that is too small to hold another `k-v` pair (i.e. fewer than `4+X` bytes). |
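
The offsets in the leaf node table can be written down as constants, and the page capacity follows from them. The sketch below is a minimal illustration, not the code in `btree.h`: every identifier is hypothetical, `value_size` stands for the `X` in the table, and multi-byte fields are assumed to be stored in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets copied from the leaf node table; all names are hypothetical. */
enum
{
    NODE_PAGE_SIZE = 4096,
    LEAF_NODE_TYPE_OFFSET = 0,
    LEAF_IS_ROOT_OFFSET = 1,
    LEAF_PARENT_OFFSET = 2,
    LEAF_NUM_CELLS_OFFSET = 6,
    LEAF_NEXT_LEAF_OFFSET = 10,
    LEAF_HEADER_SIZE = 14,
    LEAF_KEY_SIZE = 4
};

/* Size of one <key, value> cell: 4 + X, where X is value_size. */
uint32_t leaf_cell_size(uint32_t value_size)
{
    return LEAF_KEY_SIZE + value_size;
}

/* Byte offset of cell[i] inside the page. */
uint32_t leaf_cell_offset(uint32_t i, uint32_t value_size)
{
    return LEAF_HEADER_SIZE + i * leaf_cell_size(value_size);
}

/* How many whole cells fit after the 14-byte header. */
uint32_t leaf_max_cells(uint32_t value_size)
{
    return (NODE_PAGE_SIZE - LEAF_HEADER_SIZE) / leaf_cell_size(value_size);
}

/* Bytes left over at the end of the page; always less than one cell. */
uint32_t leaf_unused_bytes(uint32_t value_size)
{
    return (NODE_PAGE_SIZE - LEAF_HEADER_SIZE) % leaf_cell_size(value_size);
}

/* The 4-byte fields sit at offsets 2, 6 and 10, which are not 4-byte
 * aligned, so copy the bytes out instead of casting the pointer. */
uint32_t leaf_read_u32(const uint8_t *page, uint32_t offset)
{
    uint32_t v;
    assert((size_t)offset + sizeof v <= (size_t)NODE_PAGE_SIZE);
    memcpy(&v, page + offset, sizeof v);
    return v;
}
```

For example, with a hypothetical 100-byte row, each cell takes 104 bytes, so a page holds 39 cells and leaves 26 bytes unused. Reading `num_cells` from a page buffer would be `leaf_read_u32(page, LEAF_NUM_CELLS_OFFSET)`.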



**Internal Node Structure**

| Offset | Member | Size(bytes) | Description |
| :----: | :----------------------------: | :---------: | :----------------------------------------------------------: |
| 0 | `node_type` | 1 | Internal node or leaf node |
| 1 | `is_root` | 1 | True or false |
| 2 | `parent` | 4 | Parent's page number |
| 6 | `num_cells` | 4 | How many cells in this node, each cell is a `<child, key>` pair |
|   10   |       `rightmost_child`        |      4      | Page number of the rightmost child of this internal node |
| 14 | `cell[0] = <child[0], key[0]>` | 8 | `child` is the child's page number |
| 22 | ... | ... | more cells here |
| ... | `cell[n-1]` | 8 | |
|  ...   |          unused bytes          |     ...     |      Strictly fewer than 8 bytes remain unused.      |
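
The internal node table implies a fixed capacity: (4096 - 14) / 8 = 510 cells per page, with 2 bytes left over. The sketch below records that arithmetic and shows one plausible way to pick a child for a key. It rests on assumptions that this document does not confirm: `key[i]` is taken to be the largest key under `child[i]`, fields are in host byte order, and every identifier (including the `internal_demo_lookup` helper and its example page) is hypothetical rather than taken from `btree.h`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets copied from the internal node table; all names are hypothetical. */
enum
{
    NODE_PAGE_SIZE = 4096,
    INTERNAL_NUM_CELLS_OFFSET = 6,
    INTERNAL_RIGHTMOST_CHILD_OFFSET = 10,
    INTERNAL_HEADER_SIZE = 14,
    INTERNAL_CELL_SIZE = 8, /* 4-byte child page number + 4-byte key */
    INTERNAL_MAX_CELLS = (NODE_PAGE_SIZE - INTERNAL_HEADER_SIZE) / INTERNAL_CELL_SIZE,
    INTERNAL_UNUSED_BYTES = (NODE_PAGE_SIZE - INTERNAL_HEADER_SIZE) % INTERNAL_CELL_SIZE
};

uint32_t node_read_u32(const uint8_t *page, uint32_t offset)
{
    uint32_t v;
    assert((size_t)offset + sizeof v <= (size_t)NODE_PAGE_SIZE);
    memcpy(&v, page + offset, sizeof v); /* fields are not 4-byte aligned */
    return v;
}

void node_write_u32(uint8_t *page, uint32_t offset, uint32_t v)
{
    assert((size_t)offset + sizeof v <= (size_t)NODE_PAGE_SIZE);
    memcpy(page + offset, &v, sizeof v);
}

/* Byte offset of cell[i]; the child is at +0 and the key at +4. */
uint32_t internal_cell_offset(uint32_t i)
{
    return INTERNAL_HEADER_SIZE + i * INTERNAL_CELL_SIZE;
}

/* Binary search for the first cell whose key is >= the search key
 * (assumption: key[i] is the max key under child[i]); fall back to
 * rightmost_child when every cell key is smaller. */
uint32_t internal_find_child(const uint8_t *page, uint32_t key)
{
    uint32_t num_cells = node_read_u32(page, INTERNAL_NUM_CELLS_OFFSET);
    uint32_t lo = 0, hi = num_cells;
    while (lo < hi)
    {
        uint32_t mid = lo + (hi - lo) / 2;
        if (key <= node_read_u32(page, internal_cell_offset(mid) + 4))
            hi = mid;
        else
            lo = mid + 1;
    }
    if (lo == num_cells)
        return node_read_u32(page, INTERNAL_RIGHTMOST_CHILD_OFFSET);
    return node_read_u32(page, internal_cell_offset(lo));
}

/* Made-up example page: cells <child 1, key 10>, <child 2, key 20>,
 * rightmost_child = 3. Returns the child page chosen for `key`. */
uint32_t internal_demo_lookup(uint32_t key)
{
    uint8_t page[NODE_PAGE_SIZE] = {0};
    node_write_u32(page, INTERNAL_NUM_CELLS_OFFSET, 2);
    node_write_u32(page, INTERNAL_RIGHTMOST_CHILD_OFFSET, 3);
    node_write_u32(page, internal_cell_offset(0), 1);
    node_write_u32(page, internal_cell_offset(0) + 4, 10);
    node_write_u32(page, internal_cell_offset(1), 2);
    node_write_u32(page, internal_cell_offset(1) + 4, 20);
    return internal_find_child(page, key);
}
```

On the example page, keys up to 10 go to page 1, keys 11 through 20 go to page 2, and anything larger falls through to `rightmost_child` (page 3).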



## Refs