Add data migration guide to database.md #3175

Merged (4 commits) on Jan 20, 2022
68 changes: 67 additions & 1 deletion docs/database.md
@@ -1,6 +1,6 @@
# Database

# Indexes
## Indexes

The table below documents the database indexes with the usage in APIs / services.

@@ -11,3 +11,69 @@ The table below documents the database indexes with the usage in APIs / services
| nft_transfer | consensus_timestamp | Rosetta API | `/account/balance` | Used to calculate an account's nft token balance including serial numbers at a block |
| nft_transfer | consensus_timestamp | Rosetta API | `/block` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |
| nft_transfer | consensus_timestamp | Rosetta API | `/block/transaction` | Used to join `nft_transfer` and `transaction` on `consensus_timestamp` equality |

## Data Migration Between PostgreSQL Major Releases

Upgrading PostgreSQL to a new major release requires migrating the data. This section documents the steps to dump the
existing data, configure the new PostgreSQL instance, and restore the data.

### Prerequisites

- The importer for the old PostgreSQL database instance is stopped
- The new PostgreSQL database instance is at the same schema version as the old one
- The new PostgreSQL database instance contains no data. If unsure, run the
  [cleanup script](/hedera-mirror-importer/src/main/resources/db/scripts/cleanup.sql) to clear the data
- An Ubuntu virtual machine with a fast network connection to both PostgreSQL database instances. The virtual machine
  should also have enough free disk space for the database dump
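
The disk space prerequisite can be checked before starting. The following is a minimal sketch, not part of the repository: `has_enough_space` is a hypothetical helper, and using the database size as the required size is an assumption (a data-only, directory-format dump is usually smaller, so this errs on the safe side).

```shell
# Hypothetical helper (not part of the mirror node repository): succeeds only
# when the free space is strictly larger than the estimated dump size.
has_enough_space() {
  required_bytes="$1"
  free_bytes="$2"
  [ "$free_bytes" -gt "$required_bytes" ]
}

# Assumed way to gather the inputs on the virtual machine (uncomment to use):
# required_bytes=$(psql -h "$OLD_POSTGRESQL_DB_IP" -U mirror_node -d mirror_node \
#   -At -c "select pg_database_size('mirror_node')")
# free_bytes=$(( $(df -Pk . | awk 'NR==2 {print $4}') * 1024 ))
# has_enough_space "$required_bytes" "$free_bytes" || echo "not enough disk space" >&2
```

The commented lines assume the dump is written to the current directory and that `psql` can reach the old instance with the same credentials used for `pg_dump`.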

### Dump Data

To dump data from the old PostgreSQL database instance, run the following commands:

```shell
mkdir -p data_dump
pg_dump -h "$OLD_POSTGRESQL_DB_IP" -U mirror_node \
  --format=directory \
  --no-owner \
  --no-acl \
  -j 6 \
  -a \
  -f data_dump \
  -T 'flyway*' \
  mirror_node
```

The flag `-j` sets the number of parallel dump jobs. The value should be at least the number of CPU cores of the
PostgreSQL server; the recommended value is 1.5 times that number.
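
As an illustration of the 1.5x recommendation, the sketch below computes a `-j` value from a core count. `recommended_jobs` is a hypothetical helper, and rounding up for odd core counts is an assumption; the core count itself has to come from the PostgreSQL server, not from the machine running `pg_dump`.

```shell
# Hypothetical helper: 1.5 times the given CPU core count, rounded up,
# using integer arithmetic only.
recommended_jobs() {
  echo $(( ($1 * 3 + 1) / 2 ))
}

recommended_jobs 4   # prints 6
```

The result can then be passed to both commands, for example `-j "$(recommended_jobs 4)"`.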

The tables specified by the flag `-T` will be excluded from the dump. Adjust the table patterns if needed.

The time to dump the whole database usually depends on the size of the largest table, since each table is dumped by a single job.

### New PostgreSQL Database Instance Configuration

The following configuration needs to be applied to the new PostgreSQL database instance to improve the write speed.

```
checkpoint_timeout = 30min
max_wal_size = 512GB
temp_file_limit = 2147483647kB
```

### Restore Data

Use the following command to restore the data dump to the new PostgreSQL database instance:

```shell
pg_restore -h "$NEW_POSTGRESQL_DB_IP" -U mirror_node \
  --exit-on-error \
  --format=directory \
  --no-owner \
  --no-acl \
  -j 6 \
  -d mirror_node \
  data_dump
```

Note that `-j` works the same way as it does for `pg_dump`. Single transaction mode (`--single-transaction`) can't be
combined with parallel mode, so an interrupted restore leaves partial data behind. If the command is interrupted, clear
the partially restored data before retrying.
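
One way to make the cleanup step harder to forget is to wrap the restore so that a failed run triggers the cleanup automatically. This is a minimal sketch under stated assumptions: `run_with_cleanup` is a hypothetical helper, and `cleanup.sql` in the usage comment stands for a local copy of the cleanup script mentioned in the prerequisites. It only covers a restore that exits with a non-zero status; if the shell itself is killed, the cleanup still has to be run by hand.

```shell
# Hypothetical helper: run the restore command given as $1; if it fails,
# run the cleanup command given as $2 so that a retry starts from an empty
# database, then report the failure to the caller.
run_with_cleanup() {
  restore_cmd="$1"
  cleanup_cmd="$2"
  if sh -c "$restore_cmd"; then
    echo "restore completed"
  else
    echo "restore failed, clearing partially restored data" >&2
    sh -c "$cleanup_cmd"
    return 1
  fi
}

# Assumed usage (uncomment and adjust; cleanup.sql is a local copy of the
# cleanup script, which is an assumption of this sketch):
# run_with_cleanup \
#   'pg_restore -h "$NEW_POSTGRESQL_DB_IP" -U mirror_node --exit-on-error --format=directory --no-owner --no-acl -j 6 -d mirror_node data_dump' \
#   'psql -h "$NEW_POSTGRESQL_DB_IP" -U mirror_node -d mirror_node -f cleanup.sql'
```

For the inner commands to see `NEW_POSTGRESQL_DB_IP`, the variable needs to be exported in the calling shell.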