perf: parse chain-id from big genesis file could be slow #18204

yihuang · 2023-10-23T03:30:12Z

Description

Currently, to parse the chain_id from genesis file, the whole json file is parsed into memory, sometimes the app_state is huge which makes the operation very costly, the solution here is to parse until the chain_id field and abort early, so as long as the chain_id field is put on the top of the genesis json file, it's much more efficient.

// ParseChainIDFromGenesis parses the `chain_id` from the genesis json file and abort early,
// it still parses the values before the `chain_id` field, particularly if the `app_state` field is
// before the `chain_id` field, it will parse the `app_state` value, user must make sure the `chain_id`
// is put before `app_state` or other big entries to enjoy the efficiency.

$ go test -v -run=^$ -bench=BenchmarkParseChainID -benchmem ./types/chain_id*
$ benchstat old new
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
                    │ /tmp/before.txt │           /tmp/after.txt           │
                    │     sec/op      │   sec/op     vs base               │
ParseChainID/new-12    210.395µ ± 12%   3.612µ ± 4%  -98.28% (p=0.002 n=6)

                    │ /tmp/before.txt │           /tmp/after.txt            │
                    │      B/op       │     B/op      vs base               │
ParseChainID/new-12     88.353Ki ± 0%   1.414Ki ± 0%  -98.40% (p=0.002 n=6)

                    │ /tmp/before.txt │          /tmp/after.txt           │
                    │    allocs/op    │ allocs/op   vs base               │
ParseChainID/new-12       168.00 ± 0%   36.00 ± 0%  -78.57% (p=0.002 n=6)

Benchmark result is on a small typical genesis file, on a big file, the difference should be huge.

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

confirmed the correct type prefix in the PR title
confirmed ! in the type prefix if API or client breaking change
confirmed all author checklist items have been addressed
reviewed state machine logic
reviewed API design and naming
reviewed documentation is accurate
reviewed tests and test coverage
manually tested (if applicable)

Summary by CodeRabbit

New Feature: Introduced a more efficient method to parse the chain ID from the genesis file, enhancing the flexibility and error handling during the process.
Test: Added comprehensive tests and benchmarks for the new parsing function, ensuring its reliability and performance.
Refactor: Updated the logic for retrieving the chain ID in the application, improving the overall robustness of the system.

These changes aim to improve the system's performance and reliability, providing a more user-friendly experience.

Solution: - use streaming json parsing for that

coderabbitai · 2023-10-23T03:30:20Z

Walkthrough

This pull request introduces a new feature to parse the chain ID from a genesis JSON file using a streaming JSON parser. The changes include a new function ParseChainIDFromGenesis, a helper function validateChainID, and corresponding tests and benchmarks. The server logic has been updated to use this new function, improving flexibility and error handling.

Changes

File	Summary
`CHANGELOG.md`	Documented the new feature of using a streaming JSON parser to parse the chain ID from the genesis file.
`server/util.go`	Updated the server logic to use the new `ParseChainIDFromGenesis` function for retrieving the chain ID.
`types/chain_id.go`	Added a new function `ParseChainIDFromGenesis` that parses the chain ID from a JSON file using a streaming decoder. Also added a helper function `validateChainID` for validating the chain ID.
`types/chain_id_test.go`	Added tests and benchmarks for the new `ParseChainIDFromGenesis` function.

"In the land of code, where the shadows lie,
A rabbit hopped forth, with a twinkle in its eye.
With a parser in hand, and a JSON file to read,
It hopped through the lines, with incredible speed.
The chain ID was found, with no memory to spare,
A feat of coding, beyond compare! 🐇💻🎉"

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

If you reply to a review comment from CodeRabbit, the bot will automatically respond.
To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
Note: Review comments are made on code diffs or files, not on the PR overview.
Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

CHANGELOG.md

server/util.go

types/chain_id.go

coderabbitai

Review Status

Actionable comments generated: 2
Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 2fbc547 and 55be3ec.

Files ignored due to filter (2)

go.mod
go.sum

Files selected for processing (4)

CHANGELOG.md (1 hunks)
server/util.go (2 hunks)
types/chain_id.go (1 hunks)
types/chain_id_test.go (1 hunks)

Files skipped from review due to trivial changes (2)

CHANGELOG.md
types/chain_id_test.go

Additional comments (Suppressed): 2

server/util.go (2)
42-42: The import genutiltypes "github.com/cosmos/cosmos-sdk/x/genutil/types" has been removed. Ensure that this does not affect other parts of the code that might be using it.

487-496: The new code introduces a more efficient way of parsing the chain ID from the genesis file. It uses a streaming JSON parsing approach instead of loading the entire file into memory. This is a good performance improvement, especially for large genesis files. However, the error handling could be improved. Instead of using panic, consider returning the error to the caller function. This would allow the caller to decide how to handle the error and would make the code more robust and easier to debug.
- 			panic(err)
+ 			return nil, err
- 			panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+ 			return nil, fmt.Errorf("failed to parse chain-id from genesis file: %w", err)

coderabbitai · 2023-10-23T07:41:42Z

types/chain_id.go

+	decoder := jstream.NewDecoder(reader, 1).EmitKV()
+	var chain_id string
+	err := decoder.Decode(func(mv *jstream.MetaValue) bool {
+		if kv, ok := mv.Value.(jstream.KV); ok {
+			if kv.Key == ChainIDFieldName {
+				chain_id, _ = kv.Value.(string)
+				return false
+			}
+		}
+		return true
+	})


The function ParseChainIDFromGenesis uses the jstream library to parse the JSON genesis file. It's a good approach to use a streaming JSON parser for large files as it uses constant memory. However, there's a potential issue with error handling. In line 19, the error from the type assertion kv.Value.(string) is ignored. If kv.Value is not a string, this will not cause a panic due to the use of the "comma ok" idiom, but chain_id will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.

- chain_id, _ = kv.Value.(string) + chain_id, ok = kv.Value.(string) + if !ok { + return errors.New("chain-id is not a string") + }

coderabbitai · 2023-10-23T07:41:42Z

types/chain_id.go

+	if len(chain_id) > 0 {
+		return chain_id, nil
+	}
+	if err == nil {
+		return "", errors.New("chain-id not found in genesis file")
+	}
+	return "", err


The error handling here could be improved for clarity. If chain_id is empty and err is not nil, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checking err first and returning it if it's not nil, before checking if chain_id is empty.

- if len(chain_id) > 0 { - return chain_id, nil - } - if err == nil { - return "", errors.New("chain-id not found in genesis file") - } - return "", err + if err != nil { + return "", err + } + if len(chain_id) == 0 { + return "", errors.New("chain-id not found in genesis file") + } + return chain_id, nil

coderabbitai

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 55be3ec and 9c220f5.

Files selected for processing (1)

types/chain_id.go (1 hunks)

Additional comments (Suppressed): 2

types/chain_id.go (2)
16-26: The error handling issue identified in the previous review has not been addressed. If kv.Value is not a string, chain_id will be an empty string and no error will be reported at this stage. This could lead to misleading errors, as the function will later return an error stating that the "chain-id was not found in the genesis file", even though it was found but was not a string. Consider handling this error to provide more accurate feedback.
- chain_id, _ = kv.Value.(string)
+ chain_id, ok = kv.Value.(string)
+ if !ok {
+     return errors.New("chain-id is not a string")
+ }
27-33: The error handling here could be improved for clarity. If chain_id is empty and err is not nil, the function will return an error stating that the "chain-id was not found in the genesis file", even though the error might be unrelated to the presence of the chain-id. Consider checking err first and returning it if it's not nil, before checking if chain_id is empty.
- if len(chain_id) > 0 {
-     return chain_id, nil
- }
- if err == nil {
-     return "", errors.New("chain-id not found in genesis file")
- }
- return "", err
+ if err != nil {
+     return "", err
+ }
+ if len(chain_id) == 0 {
+     return "", errors.New("chain-id not found in genesis file")
+ }
+ return chain_id, nil

coderabbitai

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 9c220f5 and c74ba3b.

Files selected for processing (1)

types/chain_id.go (1 hunks)

Additional comments (Suppressed): 4

types/chain_id.go (4)
16-40: The function ParseChainIDFromGenesis uses a streaming JSON parser to extract the chain-id from the genesis file. This approach is efficient for large JSON files as it uses constant memory. However, the function assumes that the chain-id is a string. If the chain-id is not a string or not found in the genesis file, an error is returned. This could potentially break the application if the genesis file structure changes or if the chain-id is not a string. Therefore, it's important to ensure that the genesis file always contains a string chain-id and that the application can handle the error cases appropriately.

16-40: The function ParseChainIDFromGenesis uses the jstream library to parse the JSON stream. While this library is efficient for large JSON files, it's important to ensure that it's maintained and reliable. If the library is not well-maintained, it could introduce bugs or security vulnerabilities into the application. Therefore, it's recommended to evaluate the library's reliability and maintenance status before using it.

16-40: The function ParseChainIDFromGenesis uses a callback function in the Decode method to process the JSON stream. This approach is efficient as it allows the function to stop processing as soon as the chain-id is found. However, the callback function returns a boolean value to indicate whether to continue processing the stream. This could be confusing as it's not immediately clear what the return value represents. It's recommended to add a comment explaining the purpose of the return value to improve code readability.
	err := decoder.Decode(func(mv *jstream.MetaValue) bool {
		if kv, ok := mv.Value.(jstream.KV); ok {
			if kv.Key == ChainIDFieldName {
				chain_id, ok = kv.Value.(string)
				// Return false to stop processing the stream
				return false
			}
		}
		// Return true to continue processing the stream
		return true
	})
33-39: The error handling in the function ParseChainIDFromGenesis could be improved. Currently, the function checks if the chain-id is a string and if it's found in the genesis file. If not, it returns an error. However, these checks are done after the Decode method is called. If the Decode method returns an error, these checks are unnecessary. It's recommended to check for errors immediately after the Decode method is called to improve performance and code readability.
	err := decoder.Decode(func(mv *jstream.MetaValue) bool {
		if kv, ok := mv.Value.(jstream.KV); ok {
			if kv.Key == ChainIDFieldName {
				chain_id, ok = kv.Value.(string)
				return false
			}
		}
		return true
	})
+   if err != nil {
+       return "", err
+   }
	if len(chain_id) > 0 {
		return chain_id, nil
	}
	if !ok {
		return "", errors.New("chain-id is not a string")
	}
-   if err == nil {
-       return "", errors.New("chain-id not found in genesis file")
-   }
-   return "", err
+   return "", errors.New("chain-id not found in genesis file")

types/chain_id.go

coderabbitai

Review Status

Actionable comments generated: 0
Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between c74ba3b and 7680cdd.

Files selected for processing (1)

types/chain_id.go (1 hunks)

Additional comments (Suppressed): 4

types/chain_id.go (4)

16-16: The jstream.NewDecoder(reader, 1).EmitKV() is used to create a new decoder with a buffer size of 1. This is a good practice for memory management, especially when dealing with large files. However, it's worth noting that a smaller buffer size might lead to more frequent I/O operations, which could potentially impact performance. If performance becomes an issue, consider increasing the buffer size.

22-26: The code checks if the current value is a key-value pair and if the key is ChainIDFieldName. If both conditions are met, it attempts to cast the value to a string and stops the decoding. This is a good practice as it stops the decoding as soon as the required data is found, saving unnecessary computation. However, it's worth noting that if the chain_id is not found, the decoding will continue until the end of the file, which could be time-consuming for large files.

30-39: The error handling here is comprehensive and provides clear error messages for different scenarios: when the chain_id is not a string, when it's not found in the genesis file, and when there's an error during decoding. This is a good practice as it helps with debugging and error tracking.

14-14: The TODO comment suggests considering the use of encoding/json/v2 when it's ready. This is a good practice as it shows forward-thinking and planning for future improvements. However, it's important to keep track of these TODO comments and address them in a timely manner to avoid accumulating technical debt.

facundomedica

LGTM, good catch!

types/chain_id.go

julienrbrt · 2023-10-23T19:47:01Z

Should we wait for upstream to merge your changes or you are going to maintain this fork?

odeke-em

Nice work and thanks @yihuang for the improvement to use streaming JSON. I've added some suggestions, please take a look

odeke-em · 2023-10-23T22:50:05Z

types/chain_id_test.go

+
+func BenchmarkParseChainID(b *testing.B) {
+	// an arbitrary genesis file from a cronos devnet.
+	json := `


Please commit this JSON to a file under a directory "testdata" so example "testdata/streaming_chain_id.json" and then read it in by file then kindly run b.ResetTimer() even before each sub-benchmark.

odeke-em · 2023-10-23T22:50:52Z

types/chain_id_test.go

+	b.Run("new", func(b *testing.B) {
+		b.ResetTimer()
+		for i := 0; i < b.N; i++ {
+			_, err := ParseChainIDFromGenesis(strings.NewReader(json))


Please assert that the parsed chainID is not empty

odeke-em · 2023-10-23T22:51:24Z

types/chain_id_test.go

+    "vesting": {}
+  }
+}
+`


Please also add: b.ReportAllocs()

server/util.go

odeke-em · 2023-10-23T22:58:00Z

types/chain_id_test.go

+package types
+
+import (
+	strings "strings"


Unnecessary

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 7680cdd and 1035e32.

Files ignored due to filter (1)

types/testdata/streaming_chain_id.json

Files selected for processing (2)

types/chain_id.go (1 hunks)
types/chain_id_test.go (1 hunks)

Files skipped from review due to trivial changes (1)

types/chain_id_test.go

Additional comments: 1

types/chain_id.go (1)

14-57: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parsing approach which is more efficient in terms of memory usage and speed when dealing with large JSON files. The function also handles errors properly and validates the chain_id before returning it. However, the function's comment mentions that the user must ensure the chain_id is put before app_state or other big entries to enjoy the efficiency. This could be a potential issue if the user is not aware of this requirement. It would be better to handle this within the function itself or at least make it clear in the function's documentation.

types/chain_id.go

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 63b604c and a4cc48a.

Files selected for processing (1)

types/chain_id.go (1 hunks)

Additional comments: 3

types/chain_id.go (3)

15-19: The function comment has been updated to include the information that the function will return an error if the chain_id field is not found in the JSON file. This is a good improvement as it provides more clarity on the function's behavior.

20-60: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parser to parse the JSON file until the chain_id field is found, then aborts early. This approach is efficient, especially for large files. The function also handles errors properly and validates the chain_id using the validateChainID function.

62-71: The function validateChainID checks if the chain_id is not empty and if its length does not exceed the maximum allowed length. This is a good practice to ensure the validity of the chain_id.

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between a4cc48a and 7a58af1.

Files selected for processing (1)

CHANGELOG.md (1 hunks)

Additional comments: 1

CHANGELOG.md (1)

70-70: The change log entry correctly summarizes the changes made in the PR. It provides a clear and concise description of the change and links to the relevant PR for more details.

julienrbrt

lgtm! Can you fix linting?

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 7a58af1 and 5147d1c.

Files selected for processing (1)

types/chain_id_test.go (1 hunks)

Additional comments: 4

types/chain_id_test.go (4)

1-11: The imports look fine and are relevant to the tests being performed. The use of the embed package to include a JSON file for benchmarking is a good practice.

13-14: The go:embed directive is used correctly to include the parse_chain_id.json file for benchmarking. The variable BenchmarkGenesis is appropriately named and will hold the contents of the JSON file.

16-108: The TestParseChainIDFromGenesis function is well-structured and covers a wide range of test cases, including success, nested, non-existent, invalid type, invalid JSON, empty chain_id, whitespace chain_id, and chain_id too long. The use of table-driven tests is a good practice in Go, and the test cases are comprehensive and cover all possible edge cases. The error messages are checked to ensure they match the expected error messages, which is a good practice for testing error handling.

110-130: The BenchmarkParseChainID function is well-structured and benchmarks the new ParseChainIDFromGenesis function against the old AppGenesisFromReader function. The use of b.ReportAllocs() is a good practice to measure memory allocation. The b.ResetTimer() function is used correctly to exclude setup time from the benchmark. The benchmark function is correctly structured to run the function b.N times, which is the standard way to write benchmarks in Go.

odeke-em · 2023-10-25T08:45:04Z

@yihuang an effective way to communicate the strength of your performance changes is by running benchmarks more than 5 times on a quiet machine with the -count=N flag say -count=10, then using benchstat to compare results https://pkg.go.dev/golang.org/x/perf/cmd/benchstat then splitting before and after into separate files and with the same name of benchmark result functions.
Otherwise in the current form, it requires one to eyeball and manually compare on their own, whereas after a bench stat comparison, anyone can trivially see the massive percentage improvements

but that means we need to add the same benchmark to the repo prior to this PR, right?

You already made the benchmark to compare both in the /old and /new. You'll just need to paste the contents of /old into old.txt then those of /new into new.txt, delete the differences in names then run

benchstat old.txt bench.txt

and paste the results into your commit message and this PR at the top.

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 5147d1c and ba767d9.

Files selected for processing (1)

types/chain_id_test.go (1 hunks)

Files skipped from review due to trivial changes (1)

types/chain_id_test.go

yihuang · 2023-10-25T14:58:16Z

@yihuang an effective way to communicate the strength of your performance changes is by running benchmarks more than 5 times on a quiet machine with the -count=N flag say -count=10, then using benchstat to compare results https://pkg.go.dev/golang.org/x/perf/cmd/benchstat then splitting before and after into separate files and with the same name of benchmark result functions.
Otherwise in the current form, it requires one to eyeball and manually compare on their own, whereas after a bench stat comparison, anyone can trivially see the massive percentage improvements

but that means we need to add the same benchmark to the repo prior to this PR, right?

You already made the benchmark to compare both in the /old and /new. You'll just need to paste the contents of /old into old.txt then those of /new into new.txt, delete the differences in names then run
benchstat old.txt bench.txt
and paste the results into your commit message and this PR at the top.

sounds good, updated PR description, thanks.

julienrbrt · 2023-10-25T18:17:28Z

types/chain_id.go

Can we maybe move this to genutil? Imho this makes more sense to have it there as its genesis related than in types.

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between ba767d9 and 2d3ba07.

Files ignored due to filter (1)

x/genutil/types/testdata/parse_chain_id.json

Files selected for processing (3)

server/util.go (1 hunks)
x/genutil/types/chain_id.go (1 hunks)
x/genutil/types/chain_id_test.go (1 hunks)

Additional comments: 4

server/util.go (1)
488-497: The new code introduces a more efficient way of parsing the chain_id from the genesis file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse the chain_id and aborts early. This approach significantly reduces memory usage and processing time. However, it's important to note that this approach assumes that the chain_id field is placed before any large entries in the JSON file for maximum efficiency.

While the use of panic is acceptable in this context as it's used to halt the program execution when the genesis file cannot be opened or parsed, it's generally recommended to handle errors gracefully and provide informative error messages to the user. In this case, the error messages are informative and provide context about the error.
-		appGenesis, err := genutiltypes.AppGenesisFromFile(filepath.Join(homeDir, "config", "genesis.json"))
-		if err != nil {
-			panic(err)
-		}
-		chainID = appGenesis.ChainID
+		reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
+		if err != nil {
+			panic(err)
+		}
+		defer reader.Close()
+		chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
+		if err != nil {
+			panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+		}
x/genutil/types/chain_id.go (2)

21-60: The function ParseChainIDFromGenesis is well-written and handles errors properly. It uses a streaming JSON parser to parse the chain_id and aborts early, which is a good practice for performance optimization. The function also validates the chain_id using the validateChainID function, which is a good practice for data validation.

62-71: The function validateChainID is well-written and handles errors properly. It checks if the chain_id is empty or exceeds the maximum length, which are good practices for data validation.
x/genutil/types/chain_id_test.go (1)
110-130: The benchmark tests for ParseChainIDFromGenesis and AppGenesisFromReader are well-structured. They measure the time and memory allocations for each function, which is useful for comparing their performance. However, the BenchmarkGenesis string is not shown in this hunk, so it's unclear what kind of data is being used for the benchmark. It would be helpful to include a comment describing the structure and size of the BenchmarkGenesis string for context.

Here's a suggestion to improve the error message checking in the test cases:
- require.Contains(t, err.Error(), tc.expError)
+ require.Equal(t, tc.expError, err.Error())
Committable suggestion (Beta)
func TestParseChainIDFromGenesis(t *testing.T) {
	testCases := []struct {
		name       string
		json       string
		expChainID string
		expError   string
	}{
		{
			"success",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  }
				},
				"chain_id": "test-chain-id"
			}`,
			"test-chain-id",
			"",
		},
		{
			"nested",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  },
				  "chain_id": "test-chain-id"
				}
			}`,
			"",
			"missing chain-id in genesis file",
		},
		{
			"not exist",
			`{
				"state": {
				  "accounts": {
					"a": {}
				  }
				},
				"chain-id": "test-chain-id"
			}`,
			"",
			"missing chain-id in genesis file",
		},
		{
			"invalid type",
			`{
				"chain-id": 1,
			}`,
			"",
			"invalid character '}' looking for beginning of object key string",
		},
		{
			"invalid json",
			`[ " ': }`,
			"",
			"expected {, got [",
		},
		{
			"empty chain_id",
			`{"chain_id": ""}`,
			"",
			"genesis doc must include non-empty chain_id",
		},
		{
			"whitespace chain_id",
			`{"chain_id": "   "}`,
			"",
			"genesis doc must include non-empty chain_id",
		},
		{
			"chain_id too long",
			`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`,
			"",
			"chain_id in genesis doc is too long",
		},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json))
			if tc.expChainID == "" {
				require.Error(t, err)
				require.Equal(t, tc.expError, err.Error())
			} else {
				require.NoError(t, err)
				require.Equal(t, tc.expChainID, chain_id)
			}
		})
	}
}

x/genutil/types/chain_id.go

coderabbitai · 2023-10-26T01:54:42Z

x/genutil/types/chain_id_test.go

+func TestParseChainIDFromGenesis(t *testing.T) {
+	testCases := []struct {
+		name       string
+		json       string
+		expChainID string
+		expError   string
+	}{
+		{
+			"success",
+			`{
+				"state": {
+				  "accounts": {
+					"a": {}
+				  }
+				},
+				"chain_id": "test-chain-id"
+			}`,
+			"test-chain-id",
+			"",
+		},
+		{
+			"nested",
+			`{
+				"state": {
+				  "accounts": {
+					"a": {}
+				  },
+				  "chain_id": "test-chain-id"
+				}
+			}`,
+			"",
+			"missing chain-id in genesis file",
+		},
+		{
+			"not exist",
+			`{
+				"state": {
+				  "accounts": {
+					"a": {}
+				  }
+				},
+				"chain-id": "test-chain-id"
+			}`,
+			"",
+			"missing chain-id in genesis file",
+		},
+		{
+			"invalid type",
+			`{
+				"chain-id": 1,
+			}`,
+			"",
+			"invalid character '}' looking for beginning of object key string",
+		},
+		{
+			"invalid json",
+			`[ " ': }`,
+			"",
+			"expected {, got [",
+		},
+		{
+			"empty chain_id",
+			`{"chain_id": ""}`,
+			"",
+			"genesis doc must include non-empty chain_id",
+		},
+		{
+			"whitespace chain_id",
+			`{"chain_id": "   "}`,
+			"",
+			"genesis doc must include non-empty chain_id",
+		},
+		{
+			"chain_id too long",
+			`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`,
+			"",
+			"chain_id in genesis doc is too long",
+		},
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			chain_id, err := types.ParseChainIDFromGenesis(strings.NewReader(tc.json))
+			if tc.expChainID == "" {
+				require.Error(t, err)
+				require.Contains(t, err.Error(), tc.expError)
+			} else {
+				require.NoError(t, err)
+				require.Equal(t, tc.expChainID, chain_id)
+			}
+		})
+	}
+}


The test cases for TestParseChainIDFromGenesis are well-structured and cover a variety of scenarios, including successful parsing, missing chain_id, invalid JSON, and chain_id that is too long, empty, or contains only whitespace. This is a good practice as it ensures that the function behaves as expected in different situations. However, there is a minor issue with the error message checking. The test cases are checking for a substring of the error message, which might lead to false positives if the error messages are changed in the future. It would be better to compare the entire error message to ensure accuracy.

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 2d3ba07 and 310b580.

Files selected for processing (1)

CHANGELOG.md (1 hunks)

Additional comments: 1

CHANGELOG.md (1)

70-70: The changelog entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the chain_id from the genesis file, which is the main change in this PR.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 310b580 and 1bb078f.

Files selected for processing (3)

CHANGELOG.md (1} hunks)
server/util.go (1} hunks)
x/genutil/types/chain_id.go (1} hunks)

Additional comments: 5

CHANGELOG.md (1)

70-70: The change log entry correctly summarizes the changes made in the PR. It mentions the use of a streaming JSON parser to parse the chain_id from the genesis file, which aligns with the PR summary.

x/genutil/types/chain_id.go (4)

1-11: The import statements look fine. No unused imports detected.

13-14: The constant ChainIDFieldName is well defined and can be reused in other parts of the codebase.

15-58: The function ParseChainIDFromGenesis is well implemented. It uses a streaming JSON parser to parse the genesis JSON file until the chain_id field is found, which is a good approach for handling large files. The function also handles errors appropriately and validates the chain_id using the validateChainID function.

60-69: The function validateChainID is well implemented. It checks if the chain_id is not empty and if its length does not exceed the maximum allowed length. The function also handles errors appropriately.

coderabbitai · 2023-10-26T08:08:57Z

server/util.go

+		reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
 		if err != nil {
 			panic(err)
 		}
+		defer reader.Close()

-		chainID = appGenesis.ChainID
+		chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)
+		if err != nil {
+			panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
+		}


The new code introduces a more efficient way of parsing the chain_id from the genesis JSON file. Instead of loading the entire file into memory, it uses a streaming JSON parser to parse the file until the chain_id field is found. This approach significantly reduces memory usage and processing time. However, the use of panic for error handling is not recommended. It would be better to return the error to the caller and let it decide how to handle the error. This would make the code more robust and easier to debug.

- panic(err) + return "", fmt.Errorf("failed to open genesis file: %w", err)

- panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err)) + return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err)

Committable suggestion (Beta)

Suggested change

reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))

if err != nil {

panic(err)

}

defer reader.Close()

chainID = appGenesis.ChainID

chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)

if err != nil {

panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))

}

chainID := cast.ToString(appOpts.Get(flags.FlagChainID))

if chainID == "" {

// fallback to genesis chain-id

reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))

if err != nil {

return "", fmt.Errorf("failed to open genesis file: %w", err)

}

defer reader.Close()

chainID, err = genutiltypes.ParseChainIDFromGenesis(reader)

if err != nil {

return "", fmt.Errorf("failed to parse chain-id from genesis file: %w", err)

}

}

snapshotStore, err := GetSnapshotStore(appOpts)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Julien Robert <julien@rbrt.fr> (cherry picked from commit 793ca9a) # Conflicts: # CHANGELOG.md

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Julien Robert <julien@rbrt.fr>

…8204) (#18267) Co-authored-by: yihuang <huang@crypto.com> Co-authored-by: marbar3778 <marbar3778@yahoo.com>

Problem: parse chain-id from big genesis file could be slow

192261b

Solution: - use streaming json parsing for that

yihuang commented Oct 23, 2023

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Update CHANGELOG.md

7c59903

github-advanced-security bot found potential problems Oct 23, 2023

View reviewed changes

server/util.go Dismissed Show dismissed Hide dismissed

fix unit test

ee9b00a

julienrbrt reviewed Oct 23, 2023

View reviewed changes

types/chain_id.go Outdated Show resolved Hide resolved

julienrbrt changed the title ~~Problem: parse chain-id from big genesis file could be slow~~ perf: parse chain-id from big genesis file could be slow Oct 23, 2023

yihuang added 2 commits October 23, 2023 15:27

patch the library to support early abort

567ee2d

add benchmarking

55be3ec

yihuang marked this pull request as ready for review October 23, 2023 07:38

yihuang requested a review from a team as a code owner October 23, 2023 07:38

yihuang requested a review from julienrbrt October 23, 2023 07:40

mention encoding/json/v2

9c220f5

coderabbitai bot reviewed Oct 23, 2023

View reviewed changes

fix error handling

c74ba3b

coderabbitai bot reviewed Oct 23, 2023

View reviewed changes

github-advanced-security bot found potential problems Oct 23, 2023

View reviewed changes

types/chain_id.go Fixed Show fixed Hide fixed

fix shadow var

7680cdd

coderabbitai bot reviewed Oct 23, 2023

View reviewed changes

tac0turtle assigned kocubinski, julienrbrt and likhita-809 Oct 23, 2023

facundomedica approved these changes Oct 23, 2023

View reviewed changes

alexanderbez reviewed Oct 23, 2023

View reviewed changes

types/chain_id.go Outdated Show resolved Hide resolved

odeke-em requested changes Oct 23, 2023

View reviewed changes

use simpler solution with different trade off

1035e32

coderabbitai bot reviewed Oct 24, 2023

View reviewed changes

types/chain_id.go Outdated Show resolved Hide resolved

Merge branch 'main' into parse-chain-id

7a58af1

coderabbitai bot reviewed Oct 25, 2023

View reviewed changes

julienrbrt approved these changes Oct 25, 2023

View reviewed changes

yihuang added 2 commits October 25, 2023 16:07

fix cyclic import

5147d1c

Merge branch 'main' into parse-chain-id

5d99b23

coderabbitai bot reviewed Oct 25, 2023

View reviewed changes

fix lint

ba767d9

coderabbitai bot reviewed Oct 25, 2023

View reviewed changes

likhita-809 approved these changes Oct 25, 2023

View reviewed changes

julienrbrt reviewed Oct 25, 2023

View reviewed changes

yihuang added 2 commits October 26, 2023 09:51

move directory

2d3ba07

Merge branch 'main' into parse-chain-id

310b580

coderabbitai bot reviewed Oct 26, 2023

View reviewed changes

yihuang and others added 2 commits October 26, 2023 10:07

Update x/genutil/types/chain_id.go

5145bac

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Merge branch 'main' into parse-chain-id

1bb078f

julienrbrt enabled auto-merge October 26, 2023 08:02

coderabbitai bot reviewed Oct 26, 2023

View reviewed changes

julienrbrt added this pull request to the merge queue Oct 26, 2023

Merged via the queue into cosmos:main with commit 793ca9a Oct 26, 2023
47 of 54 checks passed

mergify bot mentioned this pull request Oct 26, 2023

perf: parse chain-id from big genesis file could be slow (backport #18204) #18267

Merged

yihuang deleted the parse-chain-id branch October 26, 2023 09:19

tac0turtle added a commit that referenced this pull request Oct 26, 2023

perf: parse chain-id from big genesis file could be slow (backport #1…

d895753

…8204) (#18267) Co-authored-by: yihuang <huang@crypto.com> Co-authored-by: marbar3778 <marbar3778@yahoo.com>

coderabbitai bot mentioned this pull request Dec 4, 2023

feat: secp256k1 public key constant time #18026

Merged

20 tasks

perf: parse chain-id from big genesis file could be slow #18204

perf: parse chain-id from big genesis file could be slow #18204

Conversation

yihuang commented Oct 23, 2023 • edited by coderabbitai bot Loading

Description

Author Checklist

Reviewers Checklist

Summary by CodeRabbit

coderabbitai bot commented Oct 23, 2023 • edited Loading

Walkthrough

Changes

Chat with CodeRabbit Bot (@coderabbitai)

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 23, 2023

Choose a reason for hiding this comment

coderabbitai bot Oct 23, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

facundomedica left a comment

Choose a reason for hiding this comment

julienrbrt commented Oct 23, 2023

odeke-em left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

julienrbrt left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

odeke-em commented Oct 25, 2023

coderabbitai bot left a comment

Choose a reason for hiding this comment

yihuang commented Oct 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 26, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 26, 2023

Choose a reason for hiding this comment

yihuang commented Oct 23, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 23, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

CodeRabbit Configration File (`.coderabbit.yaml`)