Add data migration guide to database.md (#3175)
Signed-off-by: Xin Li <xin.li@hedera.com>
Signed-off-by: Matheus DallRosa <matheus.dallrosa@swirlds.com>
xin-hedera authored and matheus-dallrosa committed Feb 21, 2022
1 parent c295ba6 commit 70f26db
Showing 1 changed file with 69 additions and 1 deletion.
docs/database.md: 69 additions & 1 deletion

# Database

## Indexes

The table below documents the database indexes and their usage in the APIs / services.

| nft_transfer | consensus_timestamp | Rosetta API | `/account/balance` | Used to calculate an account's nft token balance including serial numbers at a block |
| nft_transfer | consensus_timestamp | Rosetta API | `/block` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |
| nft_transfer | consensus_timestamp | Rosetta API | `/block/transaction` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |
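
As a quick way to cross-check this documentation against a running database, the sketch below lists the index
definitions for the tables referenced above (the host variable is hypothetical; it assumes `psql` access as the
`mirror_node` user):

```shell
# List index definitions for the tables documented above (illustrative sketch)
psql -h $POSTGRESQL_DB_IP -U mirror_node -d mirror_node -c \
  "select tablename, indexname, indexdef
     from pg_indexes
    where tablename in ('nft_transfer', 'transaction')
    order by tablename, indexname;"
```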

## Upgrade

Data needs to be migrated for a PostgreSQL major release upgrade. This section documents the steps to dump the
existing data, configure the new PostgreSQL instance, and restore the data.

### Prerequisites

- The importer for the old PostgreSQL database instance is stopped
- The new PostgreSQL database instance is provisioned
- An Ubuntu virtual machine with fast network connections to both PostgreSQL database instances. The instance should
  also have enough free disk space for the database dump (see the sketch after this list for one way to estimate the
  required space)
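
To gauge how much disk space the dump needs, one option is to compare the source database size with the free space on
the dump volume; a sketch, assuming `psql` is installed on the virtual machine (the directory format compresses the
dump, so the database size is a conservative upper bound):

```shell
# Size of the source database -- a conservative upper bound for the dump size
psql -h $OLD_POSTGRESQL_DB_IP -U mirror_node -d mirror_node -Atc \
  "select pg_size_pretty(pg_database_size('mirror_node'));"

# Free space on the volume that will hold the dump directory
df -h .
```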

### Backup

To dump data from the old PostgreSQL database instance, run the following commands:

```shell
mkdir -p data_dump
pg_dump -h $OLD_POSTGRESQL_DB_IP -U mirror_node \
--format=directory \
--no-owner \
--no-acl \
-j 6 \
-f data_dump \
mirror_node
```

The `-j` flag sets the number of parallel dump jobs. It should be at least the number of CPU cores of the
PostgreSQL server; the recommended value is 1.5 times that.
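
For example, a minimal sketch of that sizing rule (run `nproc` on the PostgreSQL server rather than on the dump
machine; the 1.5x multiplier follows the recommendation above):

```shell
# Derive the parallel job count from the server's CPU cores (1.5x rule, integer math)
CORES=$(nproc)
JOBS=$(( CORES * 3 / 2 ))
echo "Use -j ${JOBS} for pg_dump and pg_restore"
```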

The time to dump the whole database is usually dominated by the size of the largest table, since each table is dumped
by a single job.

### New PostgreSQL Database Instance Configuration

Run [init.sh](/hedera-mirror-importer/src/main/resources/db/scripts/init.sh) or the equivalent SQL statements to create
the required database objects, including the `mirror_node` database, the roles, the schema, and access privileges.
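
For illustration only, a minimal sketch of the kind of objects involved; the password placeholder and the exact set of
roles, schemas, and grants are defined by `init.sh`, which remains the source of truth:

```shell
# Hypothetical sketch -- run init.sh for the real, complete set of objects
psql -h $NEW_POSTGRESQL_DB_IP -U postgres <<'SQL'
create role mirror_node with login password 'change_me';  -- placeholder password
create database mirror_node with owner mirror_node;
SQL
```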

The following configuration should be applied to the database instance to improve write speed during the restore:

```
checkpoint_timeout = 30m
max_wal_size = 512GB
temp_file_limit = 2147483647kB
```

Note: once the data is restored, revert these values to their previous settings for normal operation.
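
One way to apply and later revert these settings on a self-managed instance, assuming superuser access (managed cloud
offerings typically expose the same parameters through their own configuration interface):

```shell
# Apply the temporary settings; all three are reloadable, so no restart is needed
psql -h $NEW_POSTGRESQL_DB_IP -U postgres -d mirror_node <<'SQL'
alter system set checkpoint_timeout = '30min';
alter system set max_wal_size = '512GB';
alter system set temp_file_limit = '2147483647kB';
select pg_reload_conf();
SQL

# After the restore completes, drop the overrides to fall back to postgresql.conf
psql -h $NEW_POSTGRESQL_DB_IP -U postgres -d mirror_node <<'SQL'
alter system reset checkpoint_timeout;
alter system reset max_wal_size;
alter system reset temp_file_limit;
select pg_reload_conf();
SQL
```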

### Restore

Before restoring the data, take a snapshot of the new database instance so that a failed restore can be rolled back.

Use the following command to restore the data dump to the new PostgreSQL database instance:

```shell
pg_restore -h $NEW_POSTGRESQL_DB_IP -U mirror_node \
--exit-on-error \
--format=directory \
--no-owner \
--no-acl \
-j 6 \
-d mirror_node \
data_dump
```

Note: `-j` works the same way as for `pg_dump`. Single-transaction mode can't be used together with parallel mode, so
if the command is interrupted the database will contain partial data and must be rolled back to the saved snapshot
before retrying.
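
After a successful restore it is worth spot-checking that both instances agree. A small sketch comparing a high-water
mark on the `transaction` table (any table from the schema works equally well):

```shell
# Compare the latest consensus timestamp on the old and new instances
for host in "$OLD_POSTGRESQL_DB_IP" "$NEW_POSTGRESQL_DB_IP"; do
  echo -n "$host: "
  psql -h "$host" -U mirror_node -d mirror_node -Atc \
    "select max(consensus_timestamp) from transaction;"
done
```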
