Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix slow domain object ingestion #2006

Merged
merged 35 commits into from
Jun 16, 2021
Merged

Fix slow domain object ingestion #2006

merged 35 commits into from
Jun 16, 2021

Conversation

Nana-EC
Copy link
Contributor

@Nana-EC Nana-EC commented May 22, 2021

Detailed description:
Slow entity ingestions flows exists. This is mainly because we do multiple db reference calls to check for existing entities to get the up to date info.
Insert only flows are fast with PgCopy. Taking inspiration from this update scenarios could utilize PgCopy to copy domain objects over to db server, then follow up with an upsert query to manage the insert vs update scenarios.

  • Remove repository saves from EntityRecordItemListener
  • Update EntityRecordItemListener to pass all entities, tokens and schedules domain objects with available info to EntityListener onX() equivalents
  • Replace onEntityId() with onEntityId()
  • Create a UpsertPgCopy class that creates a temp table for an entity, copies into this, and then runs a domain specific insertSql and updateSql queries to manage upsert to main table
  • Update Transaction Handlers to ensure created timestamp and deleted are appropriately set on Entity updates
  • Centralize EntityIdSerializer usage in PgCopy and remove domain field annotation usage
  • Update domain fields of updatable domains to ensure custom types have a @JsonSerialize annotation to correctly serialize in PgCopy flow
  • Remove getNewAccountFreezeStatus() and getNewAccountKycStatus() and move logic into insert sql command
  • Add NullableStringSerializer to support differentiation of String intention to set null vs unset, since PgCipy serializes empty string and null to be null
  • Update updatable domain objects with serialization support for domain field types, and serializers
  • Add hibernate-jpamodelgen dependency to automatically generate Models based on domain classes for type safety
  • Move embeddedIds out to separate files to ensure generator picks them up
  • Add UpsertQueryGenerator interface with implementations for updatable domains that automatically generate sql queries for temp table creation, insert and update to final table
  • Add cache logic to SqlEntityListener to flatten the domains that are inserted and or updated multiple times in a record file. This will ensure a domain object appears once for inserts and updates .
  • Update transactionHandlers to ensure all entity updates and defaults are set e.g. deleted is false in create case and true in delete case
  • Remove NEVER_EXPIRE_LARGE cache creation and usage
  • Update Github importer workflow to exclude v2 tests from v1 group
  • Add migration to drop default and not null from entity.deleted
  • Update broken tests

Which issue(s) this PR fixes:
Fixes #1926
Fixes #1440

Special notes for your reviewer:

multiEntityUpdate() test outputs
Upsert
Inserting 6000 new entities took 78 s previously now 223.7 ms
Inserting 6000 entities with 3000 being updates took 34 s now takes 400 ms

Insert & Update
Inserting 6000 new entities took 78 s previously now ~160.7 ms
Inserting 6000 entities with 3000 being updates took 34 s now takes ~180 ms

Logs outputs
Summary:
Ingestion times improved 97%, new time is 3% of previous time
Insert only: before ~21s, after ~600ms
Insert & Update: before: ~22s, after ~500 ms

Extended:
Before (current master code)
6k entities insert only (Run 1)

c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 112.0 ms 
c.h.m.i.p.PgCopy Copied 30000 rows to crypto_transfer table in 140.6 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6004 entities in 20.64 s 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 20.89 s  

6k entities insert only (Run 2)

c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 142.9 ms 
c.h.m.i.p.PgCopy Copied 30000 rows to crypto_transfer table in 147.1 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6004 entities in 21.26 s 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 21.56 s 

6k entities insert with 3k being updates (Run 1)

c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 69.86 ms 
c.h.m.i.p.PgCopy Copied 24000 rows to crypto_transfer table in 98.75 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6005 entities in 17.24 s 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 17.41 s 

6k entities insert with 3k being updates (Run 2)

c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 66.01 ms 
c.h.m.i.p.PgCopy Copied 24000 rows to crypto_transfer table in 124.9 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6005 entities in 24.87 s 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 25.06 s 

After (PR code)
6k entities insert only (Run 1)

c.h.m.i.p.PgCopy Copied 30000 rows to crypto_transfer table in 169.8 ms 
c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 98.68 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6000 Entity's in 240.5 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6004 EntityId's in 47.88 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 558.9 ms 

6k entities insert only (Run 2)

c.h.m.i.p.PgCopy Copied 30000 rows to crypto_transfer table in 165.7 ms 
c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 87.10 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6000 Entity's in 259.6 ms  
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6004 EntityId's in 43.08 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 557.5 ms 

6k entities insert with 3k being updates (Run 1)

c.h.m.i.p.PgCopy Copied 24000 rows to crypto_transfer table in 106.0 ms 
c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 78.66 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6000 Entity's in 214.6 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6005 EntityId's in 27.86 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 427.7 ms 
c.h.m.i.p.r.e.p.EntityRecordItemListenerPerformanceCryptoTest Inserting 6000 entities with 3000 updates took 483 ms

6k entities insert with 3k being updates (Run 2)

c.h.m.i.p.PgCopy Copied 24000 rows to crypto_transfer table in 119.1 ms 
c.h.m.i.p.PgCopy Copied 6000 rows to transaction table in 75.42 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6000 Entity's in 223.7 ms  
c.h.m.i.p.r.e.s.SqlEntityListener Inserted 6005 EntityId's in 40.19 ms 
c.h.m.i.p.r.e.s.SqlEntityListener Completed batch inserts in 459.0 ms 
c.h.m.i.p.r.e.p.EntityRecordItemListenerPerformanceCryptoTest Inserting 6000 entities with 3000 updates took 521 ms 

Still need to

  • Add micro benchmark tests
  • Fix broken tests
  • Resolve transaction test affects
  • Verify intricate upsert sql queries
  • Explore clearing memo case. Currently hard to differentiate missing memo vs setting to empty string
  • Consider whether expanding OnX(domain) of entityListener to OnX(domain, update) has value. This would allow it to know update vs new and would allow for insert vs update path
  • Consider breaking out from Upsert to Insert & Update since Upsert is complicated and there are lots of edge cases

Checklist

  • Documentation added
  • Tests updated

@Nana-EC Nana-EC added bug Type: Something isn't working P1 parser Area: File parsing labels May 22, 2021
@Nana-EC Nana-EC added this to the Mirror 0.35.0 milestone May 22, 2021
@Nana-EC Nana-EC requested a review from a team May 22, 2021 01:53
@Nana-EC Nana-EC self-assigned this May 22, 2021
@codecov
Copy link

codecov bot commented May 22, 2021

Codecov Report

Merging #2006 (c17f1b8) into master (0000d7b) will increase coverage by 22.03%.
The diff coverage is n/a.

❗ Current head c17f1b8 differs from pull request most recent head e63613b. Consider uploading reports for the commit e63613b to get more accurate results
Impacted file tree graph

@@              Coverage Diff              @@
##             master    #2006       +/-   ##
=============================================
+ Coverage     57.92%   79.96%   +22.03%     
- Complexity        0     2065     +2065     
=============================================
  Files            56      387      +331     
  Lines          2491     9619     +7128     
  Branches        325      892      +567     
=============================================
+ Hits           1443     7692     +6249     
- Misses          903     1660      +757     
- Partials        145      267      +122     
Impacted Files Coverage Δ
...n/java/com/hedera/datagenerator/DataGenerator.java 0.00% <ø> (ø)
...dera/datagenerator/DataGeneratorConfiguration.java 0.00% <ø> (ø)
.../hedera/datagenerator/DataGeneratorProperties.java 0.00% <ø> (ø)
...agenerator/common/CryptoTransactionProperties.java 0.00% <ø> (ø)
...com/hedera/datagenerator/common/EntityManager.java 0.00% <ø> (ø)
...atagenerator/common/FileTransactionProperties.java 0.00% <ø> (ø)
...tagenerator/common/TopicTransactionProperties.java 0.00% <ø> (ø)
...era/datagenerator/common/TransactionGenerator.java 0.00% <ø> (ø)
...nerator/common/TransactionGeneratorProperties.java 0.00% <ø> (ø)
.../java/com/hedera/datagenerator/common/Utility.java 86.95% <ø> (ø)
... and 481 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7bc254...e63613b. Read the comment docs.

@steven-sheehy steven-sheehy added performance enhancement Type: New feature and removed bug Type: Something isn't working labels May 26, 2021
@steven-sheehy steven-sheehy removed this from the Mirror 0.35.0 milestone Jun 2, 2021
@Nana-EC Nana-EC force-pushed the fix-slow-entity-ingestion branch from 96042de to 6b6a9af Compare June 7, 2021 18:45
@Nana-EC Nana-EC marked this pull request as ready for review June 7, 2021 23:31
@steven-sheehy steven-sheehy added this to the Mirror 0.36.0 milestone Jun 8, 2021
Copy link
Contributor

@ijungmann ijungmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some really good work here, just a few things.

Copy link
Member

@steven-sheehy steven-sheehy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to look at the sql generation more

Copy link
Contributor

@ijungmann ijungmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will give another pass, other than the mentioned unused code I haven't seen anything.

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
steven-sheehy
steven-sheehy previously approved these changes Jun 15, 2021
Copy link
Member

@steven-sheehy steven-sheehy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Nana-EC <nana.essilfie-conduah@hedera.com>
@sonarcloud
Copy link

sonarcloud bot commented Jun 15, 2021

Copy link
Collaborator

@xin-hedera xin-hedera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Nana-EC Nana-EC merged commit 419ed6e into master Jun 16, 2021
@Nana-EC Nana-EC deleted the fix-slow-entity-ingestion branch June 16, 2021 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Type: New feature P1 parser Area: File parsing performance
Projects
None yet
4 participants