Add data migration guide to database.md #3175

Merged
merged 4 commits into from
Jan 20, 2022
70 changes: 69 additions & 1 deletion docs/database.md
# Database

## Indexes

The table below documents the database indexes with the usage in APIs / services.

| Table | Index Columns | Service | Endpoint | Usage |
| --- | --- | --- | --- | --- |
| nft_transfer | consensus_timestamp | Rosetta API | `/account/balance` | Used to calculate an account's nft token balance including serial numbers at a block |
| nft_transfer | consensus_timestamp | Rosetta API | `/block` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |
| nft_transfer | consensus_timestamp | Rosetta API | `/block/transaction` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |

## Upgrade

Data must be migrated for a PostgreSQL major release upgrade. This section documents the steps to dump the existing
data, configure the new PostgreSQL instance, and restore the data.

### Prerequisites

- The importer for the old PostgreSQL database instance is stopped
- The new PostgreSQL database instance is up and reachable
- An Ubuntu virtual machine with fast network connections to both PostgreSQL database instances and enough free disk
  space for the database dump
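A minimal pre-flight sketch for the virtual machine, assuming `pg_isready` from the PostgreSQL client tools is installed and the `$OLD_POSTGRESQL_DB_IP` / `$NEW_POSTGRESQL_DB_IP` variables are set as in the commands below:

```shell
# Check free disk space on the filesystem that will hold the dump
mkdir -p data_dump
df -h data_dump

# Confirm both database instances accept connections
pg_isready -h "$OLD_POSTGRESQL_DB_IP" -p 5432
pg_isready -h "$NEW_POSTGRESQL_DB_IP" -p 5432
```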

### Backup

To dump data from the old PostgreSQL database instance, run the following commands:

```shell
mkdir -p data_dump
pg_dump -h $OLD_POSTGRESQL_DB_IP -U mirror_node \
--format=directory \
--no-owner \
--no-acl \
-j 6 \
-f data_dump \
mirror_node
```

The `-j` flag sets the number of parallel dump jobs. The value should be at least the number of CPU cores of the
PostgreSQL server; 1.5 times the core count is recommended.
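As a sketch, the job count can be derived from the server's core count (integer arithmetic approximates the 1.5x recommendation; note `nproc` must be run on the database host, not the jump box):

```shell
CORES=$(nproc)             # CPU cores of the PostgreSQL server
JOBS=$(( CORES * 3 / 2 ))  # roughly 1.5x the core count
echo "use: pg_dump -j $JOBS"
```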

Because each parallel job dumps one table at a time, the total dump time usually depends on the size of the largest
table.

### New PostgreSQL Database Instance Configuration

Run [init.sh](/hedera-mirror-importer/src/main/resources/db/scripts/init.sh) or the equivalent SQL statements to create
required database objects including the `mirror_node` database, the roles, the schema, and access privileges.

Apply the following configuration to the database instance to improve write speed during the restore.

```
checkpoint_timeout = 30m
max_wal_size = 512GB
temp_file_limit = 2147483647kB
```

Note: once the data is restored, revert these values for normal operation.
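One way to apply and later revert these settings without editing `postgresql.conf` is `ALTER SYSTEM` (a sketch assuming superuser access; all three parameters can be reloaded without a server restart):

```shell
# Apply the restore-time settings:
psql -h "$NEW_POSTGRESQL_DB_IP" -U postgres -d mirror_node <<'SQL'
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET max_wal_size = '512GB';
ALTER SYSTEM SET temp_file_limit = '2147483647kB';
SELECT pg_reload_conf();
SQL

# Revert the overrides once the restore completes:
psql -h "$NEW_POSTGRESQL_DB_IP" -U postgres -d mirror_node <<'SQL'
ALTER SYSTEM RESET checkpoint_timeout;
ALTER SYSTEM RESET max_wal_size;
ALTER SYSTEM RESET temp_file_limit;
SELECT pg_reload_conf();
SQL
```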

### Restore

Before restoring the data, take a database snapshot.

Use the following command to restore the data dump to the new PostgreSQL database instance:

```shell
pg_restore -h $NEW_POSTGRESQL_DB_IP -U mirror_node \
--exit-on-error \
--format=directory \
--no-owner \
--no-acl \
-j 6 \
-d mirror_node \
data_dump
```

Note: `-j` works the same way as it does for `pg_dump`. Single-transaction mode can't be combined with parallel mode,
so if the command is interrupted, the database will contain partial data and must be restored from the saved snapshot
before retrying.
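After a successful run, one quick spot check (a sketch; the `transaction` table is just one example, chosen from the index table above) is to compare row counts between the two instances:

```shell
# Row count on the old instance
old_count=$(psql -h "$OLD_POSTGRESQL_DB_IP" -U mirror_node -d mirror_node -tAc \
  'select count(*) from transaction')
# Row count on the new instance
new_count=$(psql -h "$NEW_POSTGRESQL_DB_IP" -U mirror_node -d mirror_node -tAc \
  'select count(*) from transaction')

if [ "$old_count" = "$new_count" ]; then
  echo "transaction: row counts match ($old_count)"
else
  echo "transaction: mismatch (old=$old_count new=$new_count)" >&2
fi
```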