Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow record parsing with high number of created entities #1926

Closed
steven-sheehy opened this issue May 4, 2021 · 1 comment · Fixed by #2006
Closed

Slow record parsing with high number of created entities #1926

steven-sheehy opened this issue May 4, 2021 · 1 comment · Fixed by #2006
Assignees
Labels
enhancement Type: New feature P1 parser Area: File parsing performance
Milestone

Comments

@steven-sheehy
Copy link
Member

steven-sheehy commented May 4, 2021

Detailed Description
Performance testing of high account and topic creation shows that it can easily slow down the importer to a crawl. Thread dumps indicate the majority of time is spent inside EntityRepository methods like insertEntityId and findById.

Actual Behavior
Steps to reproduce the behavior:

  1. Create accounts at 2K TPS and topics at 2K TPS
  2. Wait until at least 2M entities have been created
  3. Record processing time begins exceeding 2m

Expected Behavior
Record processing time doesn't take longer than the 2s close interval

Environment:

  • Java: OpenJDK 11.0.10
  • OS: Ubuntu 18.04
  • Version: 0.32.0

Additional Context

May 04 15:47:53 java[16335]: 2021-05-04T09:47:53.483-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 5958 rows to transaction table in 112.8 ms
May 04 15:47:53 java[16335]: 2021-05-04T09:47:53.644-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 20862 rows to crypto_transfer table in 160.8 ms
May 04 15:47:53 java[16335]: 2021-05-04T09:47:53.664-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 141 rows to topic_message table in 20.05 ms
May 04 15:47:53 java[16335]: 2021-05-04T09:47:53.673-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 548 rows to token_transfer table in 8.679 ms
May 04 15:49:25 java[16335]: 2021-05-04T09:49:25.637-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Inserted 5432 entities in 1.533 min
May 04 15:49:25 java[16335]: 2021-05-04T09:49:25.637-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 1.538 min
May 04 15:49:25 java[16335]: 2021-05-04T09:49:25.639-0600 INFO scheduling-6 c.h.m.i.p.r.RecordFileParser Finished parsing 5958 transactions from record file 2021-05-03T23_16_40.056853000Z.rcd in 1.718 min. Success: true
May 04 15:49:38 java[16335]: 2021-05-04T09:49:38.390-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 6857 rows to transaction table in 177.4 ms
May 04 15:49:38 java[16335]: 2021-05-04T09:49:38.597-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 24011 rows to crypto_transfer table in 207.2 ms
May 04 15:49:38 java[16335]: 2021-05-04T09:49:38.628-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 144 rows to topic_message table in 30.12 ms
May 04 15:49:38 java[16335]: 2021-05-04T09:49:38.637-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 636 rows to token_transfer table in 9.452 ms
May 04 15:51:43 java[16335]: 2021-05-04T09:51:43.527-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6270 entities in 2.081 min
May 04 15:51:43 java[16335]: 2021-05-04T09:51:43.527-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 2.089 min
May 04 15:51:43 java[16335]: 2021-05-04T09:51:43.529-0600 INFO scheduling-6 c.h.m.i.p.r.RecordFileParser Finished parsing 6857 transactions from record file 2021-05-03T23_16_42.039556000Z.rcd in 2.296 min. Success: true
May 04 15:52:19 java[16335]: 2021-05-04T09:52:19.813-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 6910 rows to transaction table in 124.5 ms
May 04 15:52:20 java[16335]: 2021-05-04T09:52:20.006-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 24216 rows to crypto_transfer table in 192.5 ms
May 04 15:52:20 java[16335]: 2021-05-04T09:52:20.018-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 163 rows to topic_message table in 12.82 ms
May 04 15:52:20 java[16335]: 2021-05-04T09:52:20.031-0600 INFO scheduling-6 c.h.m.i.p.PgCopy Copied 640 rows to token_transfer table in 12.51 ms
May 04 15:54:21 java[16335]: 2021-05-04T09:54:21.213-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6274 entities in 2.020 min
May 04 15:54:21 java[16335]: 2021-05-04T09:54:21.213-0600 INFO scheduling-6 c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 2.025 min
May 04 15:54:21 java[16335]: 2021-05-04T09:54:21.215-0600 INFO scheduling-6 c.h.m.i.p.r.RecordFileParser Finished parsing 6910 transactions from record file 2021-05-03T23_16_44.001309000Z.rcd in 2.626 min. Success: true
@steven-sheehy steven-sheehy added bug Type: Something isn't working performance parser Area: File parsing labels May 4, 2021
@steven-sheehy
Copy link
Member Author

steven-sheehy commented May 16, 2021

See also related #1440. The solution in that ticket is a bit unclear though. I would propose a different solution:

  • CREATE TEMPORARY TABLE ON COMMIT DROP AS SELECT * FROM WITH NO DATA for each table that can be updated
  • Use COPY to insert into temp tables
  • Use INSERT INTO entity SELECT FROM entity_temp ON CONFLICT DO UPDATE

This allows us to avoid any client side caches or update statements and use the fast pgcopy for all operations. At the end, a server executed command will upsert from temp table into regular table.

@Nana-EC Nana-EC self-assigned this May 17, 2021
@steven-sheehy steven-sheehy added enhancement Type: New feature and removed bug Type: Something isn't working labels May 26, 2021
@steven-sheehy steven-sheehy added this to the Mirror 0.36.0 milestone Jun 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Type: New feature P1 parser Area: File parsing performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants