Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize importer memory footprint #2103

Merged
merged 10 commits into from
Jun 15, 2021
Merged

Optimize importer memory footprint #2103

merged 10 commits into from
Jun 15, 2021

Conversation

xin-hedera
Copy link
Collaborator

@xin-hedera xin-hedera commented Jun 10, 2021

Detailed description:

  • Move persistBytes to downloader
  • Move keepFiles to downloader
  • Change saved stream files & signature files layout to match cloud bucket
  • Clear StreamFile bytes and items in post parse
  • Use DirectChannel when queueCapacity <= 0
  • Use javaconfig to conditionally set integration service endpoint's poller

Which issue(s) this PR fixes:
Fixes #2087

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated

- clear bytes and items if stream file is skipped or successfully parsed

Signed-off-by: Xin Li <xin.li@hedera.com>
@xin-hedera xin-hedera added bug Type: Something isn't working P2 performance parser Area: File parsing labels Jun 10, 2021
@xin-hedera xin-hedera added this to the Mirror 0.36.0 milestone Jun 10, 2021
@xin-hedera xin-hedera requested a review from a team June 10, 2021 01:40
@xin-hedera xin-hedera self-assigned this Jun 10, 2021
Signed-off-by: Xin Li <xin.li@hedera.com>
@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #2103 (b1e5c94) into master (126751e) will decrease coverage by 13.17%.
The diff coverage is 78.26%.

❗ Current head b1e5c94 differs from pull request most recent head 708b918. Consider uploading reports for the commit 708b918 to get more accurate results
Impacted file tree graph

@@              Coverage Diff              @@
##             master    #2103       +/-   ##
=============================================
- Coverage     87.05%   73.87%   -13.18%     
+ Complexity     1744      242     -1502     
=============================================
  Files           315      152      -163     
  Lines          7731     4368     -3363     
  Branches        740      439      -301     
=============================================
- Hits           6730     3227     -3503     
- Misses          772     1085      +313     
+ Partials        229       56      -173     
Impacted Files Coverage Δ
...ensus/ConsensusCreateTopicTransactionSupplier.java 0.00% <ø> (ø)
...sus/ConsensusSubmitMessageTransactionSupplier.java 0.00% <0.00%> (ø)
...ensus/ConsensusUpdateTopicTransactionSupplier.java 0.00% <ø> (ø)
...er/schedule/ScheduleCreateTransactionSupplier.java 0.00% <ø> (ø)
...supplier/token/TokenCreateTransactionSupplier.java 0.00% <ø> (ø)
...supplier/token/TokenUpdateTransactionSupplier.java 0.00% <ø> (ø)
hedera-mirror-rosetta/app/domain/types/account.go 100.00% <ø> (ø)
...edera-mirror-rosetta/app/domain/types/operation.go 100.00% <ø> (ø)
...-mirror-rosetta/app/persistence/account/account.go 93.33% <ø> (+5.09%) ⬆️
...rosetta/app/persistence/addressbook/entry/entry.go 100.00% <ø> (ø)
... and 272 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26df5ba...708b918. Read the comment docs.

Signed-off-by: Xin Li <xin.li@hedera.com>
@steven-sheehy
Copy link
Member

DirectChannel doesn't fit the requirement:
no concept of queueCapacity
subscriber of a DirectChannel is called in the publisher's thread

But that's exactly what I stated in the call and it's what we want for balances to reduce memory usage by 66% or more. My suggestion is this:

    @Bean(CHANNEL_BALANCE)
    MessageChannel channelBalance(BalanceParserProperties properties) {
        if (properties.getQueueCapacity() <= 0) {
            return MessageChannels.direct().get();
        else {
            return MessageChannels.queue(properties.getQueueCapacity()).get();
        }
    }

@xin-hedera
Copy link
Collaborator Author

DirectChannel doesn't fit the requirement:
no concept of queueCapacity
subscriber of a DirectChannel is called in the publisher's thread

But that's exactly what I stated in the call and it's what we want for balances to reduce memory usage by 66% or more. My suggestion is this:

    @Bean(CHANNEL_BALANCE)
    MessageChannel channelBalance(BalanceParserProperties properties) {
        if (properties.getQueueCapacity() <= 0) {
            return MessageChannels.direct().get();
        else {
            return MessageChannels.queue(properties.getQueueCapacity()).get();
        }
    }

Then RendezvousChannel (https://docs.spring.io/spring-integration/api/org/springframework/integration/channel/RendezvousChannel.html) is a better fit.

@steven-sheehy
Copy link
Member

steven-sheehy commented Jun 10, 2021

Sure, RendezvousChannel might do the trick. Does it still allow 2 total files in memory?

@xin-hedera
Copy link
Collaborator Author

Yes, 2 is the best we can get. One held in downloader and one is being processed in parser.

Copy link
Contributor

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good asides MessageChannel discussion. One suggestion and thinking on if the test coverage can be expanded

@@ -225,6 +213,19 @@ private void assertFilesArchived(String... fileNames) throws Exception {
.contains(fileNames);
}

private void assertPostParseState(RecordFile recordFile, boolean success,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move this method into an abstract class for the 3 test classes to share since they are exactly the same

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do. potentially can combine if not all at least most test cases for event and record. note the balance parser test is an integration test while the other two are mocked tests. perhaps I should unify the three.

@steven-sheehy steven-sheehy added the breaking Contains a breaking change that warrants mention in the release notes label Jun 10, 2021
Signed-off-by: Xin Li <xin.li@hedera.com>
- refactor parser test classes

Signed-off-by: Xin Li <xin.li@hedera.com>
Comment on lines 56 to 84
@Bean(CHANNEL_BALANCE)
MessageChannel channelBalance(BalanceParserProperties properties) {
return MessageChannels.queue(properties.getQueueCapacity()).get();
return channel(properties);
}

@Bean(CHANNEL_EVENT)
MessageChannel channelEvent(EventParserProperties properties) {
return MessageChannels.queue(properties.getQueueCapacity()).get();
return channel(properties);
}

@Bean(CHANNEL_RECORD)
MessageChannel channelRecord(RecordParserProperties properties) {
return MessageChannels.queue(properties.getQueueCapacity()).get();
return channel(properties);
}

@Bean(INTEGRATION_FLOW_BALANCE)
IntegrationFlow integrationFlowBalance(AccountBalanceFileParser parser) {
return integrationFlow(parser);
}

@Bean(INTEGRATION_FLOW_EVENT)
IntegrationFlow integrationFlowEvent(EventFileParser parser) {
return integrationFlow(parser);
}

@Bean(INTEGRATION_FLOW_RECORD)
@ConditionalOnRecordParser
IntegrationFlow integrationFlowRecord(RecordFileParser parser) {
return integrationFlow(parser);
Copy link
Collaborator Author

@xin-hedera xin-hedera Jun 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to repeat these for different beans and different names.

didn't want to move the channel bean and integration flow bean to StreamFileParser classes because I prefer separating configuration from those classes as much as I can.

}

@Bean(INTEGRATION_FLOW_RECORD)
@ConditionalOnRecordParser
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we only have ConditionalOnRecordParser on RecordFileParser but not the other two parsers. anyway to avoid autowire error when parser.record.enabled is false, I also added the annotation here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @ConditionalOnRecordParser was added more so to help out the pubs flow which should only check whether to run when record parser is enabled.
Balance and event parser don't have a sub flow so it didn't apply.
Although I do believe we wanted to check thee annotations at some point for easier coordination.

Signed-off-by: Xin Li <xin.li@hedera.com>
Nana-EC
Nana-EC previously approved these changes Jun 12, 2021
Copy link
Contributor

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
A debug log recommendation if anything

Signed-off-by: Xin Li <xin.li@hedera.com>
Signed-off-by: Xin Li <xin.li@hedera.com>
Signed-off-by: Xin Li <xin.li@hedera.com>
steven-sheehy
steven-sheehy previously approved these changes Jun 14, 2021
Copy link
Member

@steven-sheehy steven-sheehy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ijungmann
ijungmann previously approved these changes Jun 14, 2021
Copy link
Contributor

@ijungmann ijungmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question, but LGTM

.get();
}

private MessageChannel channel(ParserProperties properties) {
if (properties.getQueueCapacity() <= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have validation to ensure the minimum is 0, do we need to check for <? Could just be ==

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we should write code defensively in case that other class changes in the future and it's the same number of characters either way.

Nana-EC
Nana-EC previously approved these changes Jun 14, 2021
Copy link
Contributor

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Xin Li <xin.li@hedera.com>
@sonarcloud
Copy link

sonarcloud bot commented Jun 14, 2021

Copy link
Contributor

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xin-hedera xin-hedera merged commit 8b0e277 into master Jun 15, 2021
@xin-hedera xin-hedera deleted the gke-importer-oom branch June 15, 2021 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Contains a breaking change that warrants mention in the release notes bug Type: Something isn't working P2 parser Area: File parsing performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOMKilled in performance Kubernetes
4 participants