Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: parse chain-id from big genesis file could be slow #18204

Merged
merged 28 commits into from
Oct 26, 2023

Conversation

yihuang
Copy link
Collaborator

@yihuang yihuang commented Oct 23, 2023

Description

Currently, to parse the chain_id from genesis file, the whole json file is parsed into memory, sometimes the app_state is huge which makes the operation very costly, the solution here is to parse until the chain_id field and abort early, so as long as the chain_id field is put on the top of the genesis json file, it's much more efficient.

// ParseChainIDFromGenesis parses the `chain_id` from the genesis json file and abort early,
// it still parses the values before the `chain_id` field, particularly if the `app_state` field is
// before the `chain_id` field, it will parse the `app_state` value, user must make sure the `chain_id`
// is put before `app_state` or other big entries to enjoy the efficiency.
$ go test -v -run=^$ -bench=BenchmarkParseChainID -benchmem ./types/chain_id*
$ benchstat old new
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
                    │ /tmp/before.txt │           /tmp/after.txt           │
                    │     sec/op      │   sec/op     vs base               │
ParseChainID/new-12    210.395µ ± 12%   3.612µ ± 4%  -98.28% (p=0.002 n=6)

                    │ /tmp/before.txt │           /tmp/after.txt            │
                    │      B/op       │     B/op      vs base               │
ParseChainID/new-12     88.353Ki ± 0%   1.414Ki ± 0%  -98.40% (p=0.002 n=6)

                    │ /tmp/before.txt │          /tmp/after.txt           │
                    │    allocs/op    │ allocs/op   vs base               │
ParseChainID/new-12       168.00 ± 0%   36.00 ± 0%  -78.57% (p=0.002 n=6)

Benchmark result is on a small typical genesis file, on a big file, the difference should be huge.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • run make lint and make test
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

Summary by CodeRabbit

  • New Feature: Introduced a more efficient method to parse the chain ID from the genesis file, enhancing the flexibility and error handling during the process.
  • Test: Added comprehensive tests and benchmarks for the new parsing function, ensuring its reliability and performance.
  • Refactor: Updated the logic for retrieving the chain ID in the application, improving the overall robustness of the system.

These changes aim to improve the system's performance and reliability, providing a more user-friendly experience.

Solution:
- use streaming json parsing for that
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2023

Walkthrough

This pull request introduces a new feature to parse the chain ID from a genesis JSON file using a streaming JSON parser. The changes include a new function ParseChainIDFromGenesis, a helper function validateChainID, and corresponding tests and benchmarks. The server logic has been updated to use this new function, improving flexibility and error handling.

Changes

File Summary
CHANGELOG.md Documented the new feature of using a streaming JSON parser to parse the chain ID from the genesis file.
server/util.go Updated the server logic to use the new ParseChainIDFromGenesis function for retrieving the chain ID.
types/chain_id.go Added a new function ParseChainIDFromGenesis that parses the chain ID from a JSON file using a streaming decoder. Also added a helper function validateChainID for validating the chain ID.
types/chain_id_test.go Added tests and benchmarks for the new ParseChainIDFromGenesis function.

"In the land of code, where the shadows lie,
A rabbit hopped forth, with a twinkle in its eye.
With a parser in hand, and a JSON file to read,
It hopped through the lines, with incredible speed.
The chain ID was found, with no memory to spare,
A feat of coding, beyond compare! 🐇💻🎉"


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

CHANGELOG.md Outdated Show resolved Hide resolved
server/util.go Dismissed Show dismissed Hide dismissed
types/chain_id.go Outdated Show resolved Hide resolved
@julienrbrt julienrbrt changed the title Problem: parse chain-id from big genesis file could be slow perf: parse chain-id from big genesis file could be slow Oct 23, 2023
@yihuang yihuang marked this pull request as ready for review October 23, 2023 07:38
@yihuang yihuang requested a review from a team as a code owner October 23, 2023 07:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2
Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 2fbc547 and 55be3ec.
Files ignored due to filter (2)
  • go.mod
  • go.sum
Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • server/util.go (2 hunks)
  • types/chain_id.go (1 hunks)
  • types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (2)
  • CHANGELOG.md
  • types/chain_id_test.go
Additional comments (Suppressed): 2
server/util.go (2)
  • 42-42: The import genutiltypes "github.com/cosmos/cosmos-sdk/x/genutil/types" has been removed. Ensure that this does not affect other parts of the code that might be using it.

  • 487-496: The new code introduces a more efficient way of parsing the chain ID from the genesis file. It uses a streaming JSON parsing approach instead of loading the entire file into memory. This is a good performance improvement, especially for large genesis files. However, the error handling could be improved. Instead of using panic, consider returning the error to the caller function. This would allow the caller to decide how to handle the error and would make the code more robust and easier to debug.

- 			panic(err)
+ 			return nil, err
- 			panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+ 			return nil, fmt.Errorf("failed to parse chain-id from genesis file: %w", err)

Comment on lines 14 to 24
decoder := jstream.NewDecoder(reader, 1).EmitKV()
var chain_id string
err := decoder.Decode(func(mv *jstream.MetaValue) bool {
if kv, ok := mv.Value.(jstream.KV); ok {
if kv.Key == ChainIDFieldName {
chain_id, _ = kv.Value.(string)
return false
}
}
return true
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function ParseChainIDFromGenesis uses the jstream library to parse the JSON genesis file. It's a good approach to use a streaming JSON parser for large files as it uses constant memory. However, there's a potential issue with error handling. In line 19, the error from the type assertion kv.Value.(string) is ignored. If kv.Value is not a string, this will not cause a panic due to the use of the "comma ok" idiom, but chain_id will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.

- chain_id, _ = kv.Value.(string)
+ chain_id, ok = kv.Value.(string)
+ if !ok {
+     return errors.New("chain-id is not a string")
+ }

Comment on lines 25 to 31
if len(chain_id) > 0 {
return chain_id, nil
}
if err == nil {
return "", errors.New("chain-id not found in genesis file")
}
return "", err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling here could be improved for clarity. If chain_id is empty and err is not nil, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checking err first and returning it if it's not nil, before checking if chain_id is empty.

- if len(chain_id) > 0 {
-     return chain_id, nil
- }
- if err == nil {
-     return "", errors.New("chain-id not found in genesis file")
- }
- return "", err
+ if err != nil {
+     return "", err
+ }
+ if len(chain_id) == 0 {
+     return "", errors.New("chain-id not found in genesis file")
+ }
+ return chain_id, nil

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 55be3ec and 9c220f5.
Files selected for processing (1)
  • types/chain_id.go (1 hunks)
Additional comments (Suppressed): 2
types/chain_id.go (2)
  • 16-26: The error handling issue identified in the previous review has not been addressed. If kv.Value is not a string, chain_id will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.
- chain_id, _ = kv.Value.(string)
+ chain_id, ok = kv.Value.(string)
+ if !ok {
+     return errors.New("chain-id is not a string")
+ }
  • 27-33: The error handling here could be improved for clarity. If chain_id is empty and err is not nil, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checking err first and returning it if it's not nil, before checking if chain_id is empty.
- if len(chain_id) > 0 {
-     return chain_id, nil
- }
- if err == nil {
-     return "", errors.New("chain-id not found in genesis file")
- }
- return "", err
+ if err != nil {
+     return "", err
+ }
+ if len(chain_id) == 0 {
+     return "", errors.New("chain-id not found in genesis file")
+ }
+ return chain_id, nil

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 9c220f5 and c74ba3b.
Files selected for processing (1)
  • types/chain_id.go (1 hunks)
Additional comments (Suppressed): 4
types/chain_id.go (4)
  • 16-40: The function ParseChainIDFromGenesis uses a streaming JSON parser to extract the chain-id from the genesis file. This approach is efficient for large JSON files as it uses constant memory. However, the function assumes that the chain-id is a string. If the chain-id is not a string or not found in the genesis file, an error is returned. This could potentially break the application if the genesis file structure changes or if the chain-id is not a string. Therefore, it's important to ensure that the genesis file always contains a string chain-id and that the application can handle the error cases appropriately.

  • 16-40: The function ParseChainIDFromGenesis uses the jstream library to parse the JSON stream. While this library is efficient for large JSON files, it's important to ensure that it's maintained and reliable. If the library is not well-maintained, it could introduce bugs or security vulnerabilities into the application. Therefore, it's recommended to evaluate the library's reliability and maintenance status before using it.

  • 16-40: The function ParseChainIDFromGenesis uses a callback function in the Decode method to process the JSON stream. This approach is efficient as it allows the function to stop processing as soon as the chain-id is found. However, the callback function returns a boolean value to indicate whether to continue processing the stream. This could be confusing as it's not immediately clear what the return value represents. It's recommended to add a comment explaining the purpose of the return value to improve code readability.

	err := decoder.Decode(func(mv *jstream.MetaValue) bool {
		if kv, ok := mv.Value.(jstream.KV); ok {
			if kv.Key == ChainIDFieldName {
				chain_id, ok = kv.Value.(string)
				// Return false to stop processing the stream
				return false
			}
		}
		// Return true to continue processing the stream
		return true
	})
  • 33-39: The error handling in the function ParseChainIDFromGenesis could be improved. Currently, the function checks if the chain-id is a string and if it's found in the genesis file. If not, it returns an error. However, these checks are done after the Decode method is called. If the Decode method returns an error, these checks are unnecessary. It's recommended to check for errors immediately after the Decode method is called to improve performance and code readability.
	err := decoder.Decode(func(mv *jstream.MetaValue) bool {
		if kv, ok := mv.Value.(jstream.KV); ok {
			if kv.Key == ChainIDFieldName {
				chain_id, ok = kv.Value.(string)
				return false
			}
		}
		return true
	})
+   if err != nil {
+       return "", err
+   }
	if len(chain_id) > 0 {
		return chain_id, nil
	}
	if !ok {
		return "", errors.New("chain-id is not a string")
	}
-   if err == nil {
-       return "", errors.New("chain-id not found in genesis file")
-   }
-   return "", err
+   return "", errors.New("chain-id not found in genesis file")

types/chain_id.go Fixed Show fixed Hide fixed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between c74ba3b and 7680cdd.
Files selected for processing (1)
  • types/chain_id.go (1 hunks)
Additional comments (Suppressed): 4
types/chain_id.go (4)
  • 16-16: The jstream.NewDecoder(reader, 1).EmitKV() is used to create a new decoder with a buffer size of 1. This is a good practice for memory management, especially when dealing with large files. However, it's worth noting that a smaller buffer size might lead to more frequent I/O operations, which could potentially impact performance. If performance becomes an issue, consider increasing the buffer size.

  • 22-26: The code checks if the current value is a key-value pair and if the key is ChainIDFieldName. If both conditions are met, it attempts to cast the value to a string and stops the decoding. This is a good practice as it stops the decoding as soon as the required data is found, saving unnecessary computation. However, it's worth noting that if the chain_id is not found, the decoding will continue until the end of the file, which could be time-consuming for large files.

  • 30-39: The error handling here is comprehensive and provides clear error messages for different scenarios: when the chain_id is not a string, when it's not found in the genesis file, and when there's an error during decoding. This is a good practice as it helps with debugging and error tracking.

  • 14-14: The TODO comment suggests considering the use of encoding/json/v2 when it's ready. This is a good practice as it shows forward-thinking and planning for future improvements. However, it's important to keep track of these TODO comments and address them in a timely manner to avoid accumulating technical debt.

Copy link
Member

@facundomedica facundomedica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch!

types/chain_id.go Outdated Show resolved Hide resolved
@julienrbrt
Copy link
Member

Should we wait for upstream to merge your changes or you are going to maintain this fork?

Copy link
Collaborator

@odeke-em odeke-em left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work and thanks @yihuang for the improvement to use streaming JSON. I've added some suggestions, please take a look


func BenchmarkParseChainID(b *testing.B) {
// an arbitrary genesis file from a cronos devnet.
json := `
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit this JSON to a file under a directory "testdata" so example "testdata/streaming_chain_id.json" and then read it in by file then kindly run b.ResetTimer() even before each sub-benchmark.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

b.Run("new", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, err := ParseChainIDFromGenesis(strings.NewReader(json))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assert that the parsed chainID is not empty

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"vesting": {}
}
}
`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add: b.ReportAllocs()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

server/util.go Show resolved Hide resolved
package types

import (
strings "strings"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 7680cdd and 1035e32.
Files ignored due to filter (1)
  • types/testdata/streaming_chain_id.json
Files selected for processing (2)
  • types/chain_id.go (1 hunks)
  • types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • types/chain_id_test.go
Additional comments: 1
types/chain_id.go (1)
  • 14-57: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parsing approach which is more efficient in terms of memory usage and speed when dealing with large JSON files. The function also handles errors properly and validates the chain_id before returning it. However, the function's comment mentions that the user must ensure the chain_id is put before app_state or other big entries to enjoy the efficiency. This could be a potential issue if the user is not aware of this requirement. It would be better to handle this within the function itself or at least make it clear in the function's documentation.

types/chain_id.go Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 63b604c and a4cc48a.
Files selected for processing (1)
  • types/chain_id.go (1 hunks)
Additional comments: 3
types/chain_id.go (3)
  • 15-19: The function comment has been updated to include the information that the function will return an error if the chain_id field is not found in the JSON file. This is a good improvement as it provides more clarity on the function's behavior.

  • 20-60: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parser to parse the JSON file until the chain_id field is found, then aborts early. This approach is efficient, especially for large files. The function also handles errors properly and validates the chain_id using the validateChainID function.

  • 62-71: The function validateChainID checks if the chain_id is not empty and if its length does not exceed the maximum allowed length. This is a good practice to ensure the validity of the chain_id.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between a4cc48a and 7a58af1.
Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
Additional comments: 1
CHANGELOG.md (1)
  • 70-70: The change log entry correctly summarizes the changes made in the PR. It provides a clear and concise description of the change and links to the relevant PR for more details.

Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Can you fix linting?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 7a58af1 and 5147d1c.
Files selected for processing (1)
  • types/chain_id_test.go (1 hunks)
Additional comments: 4
types/chain_id_test.go (4)
  • 1-11: The imports look fine and are relevant to the tests being performed. The use of the embed package to include a JSON file for benchmarking is a good practice.

  • 13-14: The go:embed directive is used correctly to include the parse_chain_id.json file for benchmarking. The variable BenchmarkGenesis is appropriately named and will hold the contents of the JSON file.

  • 16-108: The TestParseChainIDFromGenesis function is well-structured and covers a wide range of test cases, including success, nested, non-existent, invalid type, invalid JSON, empty chain_id, whitespace chain_id, and chain_id too long. The use of table-driven tests is a good practice in Go, and the test cases are comprehensive and cover all possible edge cases. The error messages are checked to ensure they match the expected error messages, which is a good practice for testing error handling.

  • 110-130: The BenchmarkParseChainID function is well-structured and benchmarks the new ParseChainIDFromGenesis function against the old AppGenesisFromReader function. The use of b.ReportAllocs() is a good practice to measure memory allocation. The b.ResetTimer() function is used correctly to exclude setup time from the benchmark. The benchmark function is correctly structured to run the function b.N times, which is the standard way to write benchmarks in Go.

@odeke-em
Copy link
Collaborator

@yihuang an effective way to communicate the strength of your performance changes is by running benchmarks more than 5 times on a quiet machine with the -count=N flag say -count=10, then using benchstat to compare results https://pkg.go.dev/golang.org/x/perf/cmd/benchstat then splitting before and after into separate files and with the same name of benchmark result functions.
Otherwise in the current form, it requires one to eyeball and manually compare on their own, whereas after a bench stat comparison, anyone can trivially see the massive percentage improvements

but that means we need to add the same benchmark to the repo prior to this PR, right?

You already made the benchmark to compare both in the /old and /new. You'll just need to paste the contents of /old into old.txt then those of /new into new.txt, delete the differences in names then run

benchstat old.txt bench.txt

and paste the results into your commit message and this PR at the top.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 5147d1c and ba767d9.
Files selected for processing (1)
  • types/chain_id_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • types/chain_id_test.go

@yihuang
Copy link
Collaborator Author

yihuang commented Oct 25, 2023

@yihuang an effective way to communicate the strength of your performance changes is by running benchmarks more than 5 times on a quiet machine with the -count=N flag say -count=10, then using benchstat to compare results https://pkg.go.dev/golang.org/x/perf/cmd/benchstat then splitting before and after into separate files and with the same name of benchmark result functions.
Otherwise in the current form, it requires one to eyeball and manually compare on their own, whereas after a bench stat comparison, anyone can trivially see the massive percentage improvements

but that means we need to add the same benchmark to the repo prior to this PR, right?

You already made the benchmark to compare both in the /old and /new. You'll just need to paste the contents of /old into old.txt then those of /new into new.txt, delete the differences in names then run

benchstat old.txt bench.txt

and paste the results into your commit message and this PR at the top.

sounds good, updated PR description, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move this to genutil? Imho this makes more sense to have it there as its genesis related than in types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between ba767d9 and 2d3ba07.
Files ignored due to filter (1)
  • x/genutil/types/testdata/parse_chain_id.json
Files selected for processing (3)
  • server/util.go (1 hunks)
  • x/genutil/types/chain_id.go (1 hunks)
  • x/genutil/types/chain_id_test.go (1 hunks)
Additional comments: 4
server/util.go (1)
  • 488-497: The new code introduces a more efficient way of parsing the chain_id from the genesis file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse the chain_id and aborts early. This approach significantly reduces memory usage and processing time. However, it's important to note that this approach assumes that the chain_id field is placed before any large entries in the JSON file for maximum efficiency.

While the use of panic is acceptable in this context as it's used to halt the program execution when the genesis file cannot be opened or parsed, it's generally recommended to handle errors gracefully and provide informative error messages to the user. In this case, the error messages are informative and provide context about the error.

-		appGenesis, err := genutiltypes.AppGenesisFromFile(filepath.Join(homeDir, "config", "genesis.json"))
-		if err != nil {
-			panic(err)
-		}
-		chainID = appGenesis.ChainID
+		reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
+		if err != nil {
+			panic(err)
+		}
+		defer reader.Close()
+		chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
+		if err != nil {
+			panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+		}
x/genutil/types/chain_id.go (2)
  • 21-60: The function ParseChainIDFromGenesis is well-written and handles errors properly. It uses a streaming JSON parser to parse the chain_id and aborts early, which is a good practice for performance optimization. The function also validates the chain_id using the validateChainID function, which is a good practice for data validation.

  • 62-71: The function validateChainID is well-written and handles errors properly. It checks if the chain_id is empty or exceeds the maximum length, which are good practices for data validation.

x/genutil/types/chain_id_test.go (1)
  • 110-130: The benchmark tests for ParseChainIDFromGenesis and AppGenesisFromReader are well-structured. They measure the time and memory allocations for each function, which is useful for comparing their performance. However, the BenchmarkGenesis string is not shown in this hunk, so it's unclear what kind of data is being used for the benchmark. It would be helpful to include a comment describing the structure and size of the BenchmarkGenesis string for context.

Here's a suggestion to improve the error message checking in the test cases:

- require.Contains(t, err.Error(), tc.expError)
+ require.Equal(t, tc.expError, err.Error())
Committable suggestion (Beta)
func TestParseChainIDFromGenesis(t *testing.T) {
	testCases := []struct {
		name       string
		json       string
		expChainID string
		expError   string
	}{
		{
			"success",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  }
				},
				"chain_id": "test-chain-id"
			}`,
			"test-chain-id",
			"",
		},
		{
			"nested",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  },
				  "chain_id": "test-chain-id"
				}
			}`,
			"",
			"missing chain-id in genesis file",
		},
		{
			"not exist",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  }
				},
				"chain-id": "test-chain-id"
			}`,
			"",
			"missing chain-id in genesis file",
		},
		{
			"invalid type",
			`{
				"chain-id": 1,
			}`,
			"",
			"invalid character '}' looking for beginning of object key string",
		},
		{
			"invalid json",
			`[ " ': }`,
			"",
			"expected {, got [",
		},
		{
			"empty chain_id",
			`{"chain_id": ""}`,
			"",
			"genesis doc must include non-empty chain_id",
		},
		{
			"whitespace chain_id",
			`{"chain_id": "   "}`,
			"",
			"genesis doc must include non-empty chain_id",
		},
		{
			"chain_id too long",
			`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`,
			"",
			"chain_id in genesis doc is too long",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json))
			if tc.expChainID == "" {
				require.Error(t, err)
				require.Equal(t, tc.expError, err.Error())
			} else {
				require.NoError(t, err)
				require.Equal(t, tc.expChainID, chain_id)
			}
		})
	}
}

x/genutil/types/chain_id.go Outdated Show resolved Hide resolved
Comment on lines +16 to +108
func TestParseChainIDFromGenesis(t *testing.T) {
testCases := []struct {
name string
json string
expChainID string
expError string
}{
{
"success",
`{
"state": {
"accounts": {
"a": {}
}
},
"chain_id": "test-chain-id"
}`,
"test-chain-id",
"",
},
{
"nested",
`{
"state": {
"accounts": {
"a": {}
},
"chain_id": "test-chain-id"
}
}`,
"",
"missing chain-id in genesis file",
},
{
"not exist",
`{
"state": {
"accounts": {
"a": {}
}
},
"chain-id": "test-chain-id"
}`,
"",
"missing chain-id in genesis file",
},
{
"invalid type",
`{
"chain-id": 1,
}`,
"",
"invalid character '}' looking for beginning of object key string",
},
{
"invalid json",
`[ " ': }`,
"",
"expected {, got [",
},
{
"empty chain_id",
`{"chain_id": ""}`,
"",
"genesis doc must include non-empty chain_id",
},
{
"whitespace chain_id",
`{"chain_id": " "}`,
"",
"genesis doc must include non-empty chain_id",
},
{
"chain_id too long",
`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`,
"",
"chain_id in genesis doc is too long",
},
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json))
if tc.expChainID == "" {
require.Error(t, err)
require.Contains(t, err.Error(), tc.expError)
} else {
require.NoError(t, err)
require.Equal(t, tc.expChainID, chain_id)
}
})
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases for TestParseChainIDFromGenesis are well-structured and cover a variety of scenarios, including successful parsing, missing chain_id, invalid JSON, and chain_id that is too long, empty, or contains only whitespace. This is a good practice as it ensures that the function behaves as expected in different situations. However, there is a minor issue with the error message checking. The test cases are checking for a substring of the error message, which might lead to false positives if the error messages are changed in the future. It would be better to compare the entire error message to ensure accuracy.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 2d3ba07 and 310b580.
Files selected for processing (1)
  • CHANGELOG.md (1 hunks)
Additional comments: 1
CHANGELOG.md (1)
  • 70-70: The changelog entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the chain_id from the genesis file, which is the main change in this PR.

yihuang and others added 2 commits October 26, 2023 10:07
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 310b580 and 1bb078f.
Files selected for processing (3)
  • CHANGELOG.md (1} hunks)
  • server/util.go (1} hunks)
  • x/genutil/types/chain_id.go (1} hunks)
Additional comments: 5
CHANGELOG.md (1)
  • 70-70: The change log entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the chain_id from the genesis file, which aligns with the PR summary.
x/genutil/types/chain_id.go (4)
  • 1-11: The import statements look fine. No unused imports detected.

  • 13-14: The constant ChainIDFieldName is well defined and can be reused in other parts of the codebase.

  • 15-58: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parser to parse the genesis JSON file until the chain_id field is found, which is a good approach for handling large files. The function also handles errors appropriately and validates the chain_id using the validateChainID function.

  • 60-69: The function validateChainID is well implemented. It checks if the chain_id is not empty and if its length does not exceed the maximum allowed length. The function also handles errors appropriately.

Comment on lines +488 to +497
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
if err != nil {
panic(err)
}
defer reader.Close()

chainID = appGenesis.ChainID
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
if err != nil {
panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code introduces a more efficient way of parsing the chain_id from the genesis JSON file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse the file until the chain_id field is found. This approach significantly reduces memory usage and processing time. However, the use of panic for error handling is not recommended. It would be better to return the error to the caller and let it decide how to handle the error. This would make the code more robust and easier to debug.

- panic(err)
+ return "", fmt.Errorf("failed to open genesis file: %w", err)
- panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+ return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err)
Committable suggestion (Beta)
Suggested change
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
if err != nil {
panic(err)
}
defer reader.Close()
chainID = appGenesis.ChainID
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
if err != nil {
panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
}
chainID := cast.ToString(appOpts.Get(flags.FlagChainID))
if chainID == "" {
// fallback to genesis chain-id
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
if err != nil {
return "", fmt.Errorf("failed to open genesis file: %w", err)
}
defer reader.Close()
chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
if err != nil {
return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err)
}
}
snapshotStore, err := GetSnapshotStore(appOpts)

@julienrbrt julienrbrt added this pull request to the merge queue Oct 26, 2023
Merged via the queue into cosmos:main with commit 793ca9a Oct 26, 2023
47 of 54 checks passed
mergify bot pushed a commit that referenced this pull request Oct 26, 2023
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Julien Robert <julien@rbrt.fr>
(cherry picked from commit 793ca9a)

# Conflicts:
#	CHANGELOG.md
mmsqe pushed a commit to mmsqe/cosmos-sdk that referenced this pull request Oct 26, 2023
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Julien Robert <julien@rbrt.fr>
@yihuang yihuang deleted the parse-chain-id branch October 26, 2023 09:19
tac0turtle added a commit that referenced this pull request Oct 26, 2023
…8204) (#18267)

Co-authored-by: yihuang <huang@crypto.com>
Co-authored-by: marbar3778 <marbar3778@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/v0.50.x PR scheduled for inclusion in the v0.50's next stable release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants