add benchmarks to compare protocompile perf and memory usage against other compilers #64

Merged
jhump merged 5 commits into main from jh/add-benchmark on Oct 12, 2022

Conversation

jhump (Member) commented on Oct 11, 2022:

This includes some tweaks to reduce memory usage in protocompile, too.

Perf Summary:

  • protocompile is about 20% faster than protoparse.
  • protocompile allocates about 14% less memory than protoparse (* caveats below).
  • When skipping source code info, protocompile's advantage in speed and memory is more pronounced: 50% faster and about 35% less allocation.
  • Due to parallelism, both are faster than protoc (protocompile natively, and protoparse using "chunks" that are run concurrently, as the Buf CLI currently does). Protoc is the clear winner in single-threaded mode, though (not unexpected, since it's optimized C++).
  • (Ignore the memory stats for the protoc benchmark. Since it just shells out to an external executable, no memory is really used inside the Go process.)

Caveats:

  • The actual descriptor result for protocompile is about 26-28% larger than protoparse.
    • This could be the use of many smaller maps instead of fewer larger ones (protoparse does linking in batch, with large tables; protocompile does it file-at-a-time with tables per file).
    • At first I suspected it's the way source code info is modeled in the v2 API (which protocompile implements; protoparse does not). But even when not generating source code info, the size difference persists.
  • The AST representation in protocompile, on the other hand, is less than half as big as protoparse.
  • Also, protocompile can release the AST as it goes (as soon as a file is complete and the AST is no longer needed), so the total memory pressure for a large compilation is just the sum of the descriptor sizes. In protoparse, by contrast, the total memory pressure for a large compilation is the descriptor sizes plus the AST sizes. With that in mind, the total memory pressure for protocompile is a little less than half that of protoparse (see the worked numbers after this list).
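
Rough arithmetic from the memory tests below: protocompile retains about 330 MB of descriptors (with source info), while protoparse retains about 258 MB of descriptors plus about 428 MB of AST, roughly 687 MB in total; 330 / 687 ≈ 0.48, i.e. a little less than half.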

This includes my own implementation of code to measure the memory usage of an object. It tries to avoid double-counting memory by using a range tree to track the regions of memory that have already been counted. I couldn't find an existing library that does this with some Google-searching... but maybe I missed it? Is there something like this that already exists? The tests also run runtime.GC() and print the total heap usage, as a sanity check that the measured size is reasonable (it is close to the reported total heap usage).
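
Roughly the idea, as a minimal sketch (not the actual code in internal/benchmarks, which also handles maps, strings, interfaces, and more):

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// region is a half-open address range [start, end) of memory already counted.
type region struct{ start, end uintptr }

// tape accumulates the size of an object graph while tracking which address
// ranges have already been counted, so shared memory is only counted once.
type tape struct {
	regions []region // sorted by start, non-overlapping
	total   uintptr
}

// insert records [start, end) and reports whether it was not already counted.
// Partial overlaps are treated as already counted, which keeps the sketch simple.
func (t *tape) insert(start, end uintptr) bool {
	i := sort.Search(len(t.regions), func(i int) bool { return t.regions[i].end > start })
	if i < len(t.regions) && t.regions[i].start < end {
		return false
	}
	t.regions = append(t.regions, region{})
	copy(t.regions[i+1:], t.regions[i:])
	t.regions[i] = region{start, end}
	t.total += end - start
	return true
}

// measure walks v with reflection, adding the size of each newly seen block.
// Maps, interfaces, and string backing arrays are omitted to keep this short.
func (t *tape) measure(v reflect.Value) {
	switch v.Kind() {
	case reflect.Ptr:
		if v.IsNil() {
			return
		}
		elem := v.Elem()
		start := v.Pointer()
		if t.insert(start, start+elem.Type().Size()) {
			t.measure(elem)
		}
	case reflect.Struct:
		for i := 0; i < v.NumField(); i++ {
			t.measure(v.Field(i))
		}
	case reflect.Slice:
		if v.Cap() > 0 {
			start := v.Pointer()
			if t.insert(start, start+uintptr(v.Cap())*v.Type().Elem().Size()) {
				for i := 0; i < v.Len(); i++ {
					t.measure(v.Index(i))
				}
			}
		}
	}
}

func main() {
	type node struct {
		next    *node
		payload []byte
	}
	shared := &node{payload: make([]byte, 1024)}
	roots := []*node{{next: shared}, {next: shared}} // shared must be counted once
	var t tape
	t.measure(reflect.ValueOf(roots))
	fmt.Println("approx bytes:", t.total)
}
```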

I'm using googleapis since I thought a large batch of files would be a more interesting performance test than a micro-benchmark (which wouldn't exercise the same variety and mix of call paths). The test setup downloads the googleapis sources at a pinned commit (much like similar tests in the buf repo).
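
The download/expand step works roughly like the sketch below (illustrative only; downloadAndExpand and the destination path are placeholders, not the benchmark package's actual helpers):

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

const archiveURL = "https://github.com/googleapis/googleapis/archive/cb6fbe8784479b22af38c09a5039d8983e894566.tar.gz"

// downloadAndExpand fetches the pinned googleapis archive and writes only the
// .proto sources (the inputs the compilers under test need) to destDir.
func downloadAndExpand(destDir string) error {
	resp, err := http.Get(archiveURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	gz, err := gzip.NewReader(resp.Body)
	if err != nil {
		return err
	}
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil // archive fully processed
		}
		if err != nil {
			return err
		}
		if hdr.Typeflag != tar.TypeReg || !strings.HasSuffix(hdr.Name, ".proto") {
			continue
		}
		path := filepath.Join(destDir, hdr.Name)
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return err
		}
		if err := os.WriteFile(path, data, 0o644); err != nil {
			return err
		}
	}
}

func main() {
	if err := downloadAndExpand("testdata/googleapis"); err != nil {
		log.Fatal(err)
	}
}
```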


Here's the output of running tests and benchmarks in the new package:

> go test . -bench . -benchmem -v
Downloaded https://github.com/googleapis/googleapis/archive/cb6fbe8784479b22af38c09a5039d8983e894566.tar.gz; 6409073 bytes.
Expanded archive into 5362 files.
3944 total source files found in googleapis (29762935 bytes).
=== RUN   TestGoogleapis_Protocompile_Memory
    benchmark_test.go:515: (heap used: 377788896 bytes)
    benchmark_test.go:520: memory used: 330460518 bytes
--- PASS: TestGoogleapis_Protocompile_Memory (3.28s)
=== RUN   TestGoogleapis_Protocompile_Memory_NoSourceInfo
    benchmark_test.go:515: (heap used: 124096232 bytes)
    benchmark_test.go:520: memory used: 99613298 bytes
--- PASS: TestGoogleapis_Protocompile_Memory_NoSourceInfo (2.00s)
=== RUN   TestGoogleapis_Protocompile_ASTMemory
    benchmark_test.go:515: (heap used: 220590128 bytes)
    benchmark_test.go:520: memory used: 210632961 bytes
--- PASS: TestGoogleapis_Protocompile_ASTMemory (3.05s)
=== RUN   TestGoogleapis_Protoparse_Memory
    benchmark_test.go:515: (heap used: 299199496 bytes)
    benchmark_test.go:520: memory used: 258438616 bytes
--- PASS: TestGoogleapis_Protoparse_Memory (5.88s)
=== RUN   TestGoogleapis_Protoparse_Memory_NoSourceInfo
    benchmark_test.go:515: (heap used: 95685520 bytes)
    benchmark_test.go:520: memory used: 81485001 bytes
--- PASS: TestGoogleapis_Protoparse_Memory_NoSourceInfo (4.14s)
=== RUN   TestGoogleapis_Protoparse_ASTMemory
    benchmark_test.go:515: (heap used: 481703504 bytes)
    benchmark_test.go:520: memory used: 428383786 bytes
--- PASS: TestGoogleapis_Protoparse_ASTMemory (5.44s)
=== RUN   TestMeasuringTapeInsert
--- PASS: TestMeasuringTapeInsert (0.00s)
=== RUN   TestMeasuringTapeMeasure
--- PASS: TestMeasuringTapeMeasure (0.02s)
=== RUN   TestNumBuckets
--- PASS: TestNumBuckets (0.00s)
goos: darwin
goarch: arm64
pkg: github.com/jhump/protocompile/internal/benchmarks
BenchmarkGoogleapis_Protocompile
BenchmarkGoogleapis_Protocompile-10                   	       2	 935789396 ns/op	1936122840 B/op	29299976 allocs/op
BenchmarkGoogleapis_Protocompile_Canonical
BenchmarkGoogleapis_Protocompile_Canonical-10         	       2	 941203666 ns/op	2027678888 B/op	29467291 allocs/op
BenchmarkGoogleapis_Protocompile_NoSourceInfo
BenchmarkGoogleapis_Protocompile_NoSourceInfo-10      	       2	 669197125 ns/op	1232668852 B/op	18323988 allocs/op
BenchmarkGoogleapis_Protoparse
BenchmarkGoogleapis_Protoparse-10                     	       1	1153496875 ns/op	2298790208 B/op	40921638 allocs/op
BenchmarkGoogleapis_Protoparse_NoSourceInfo
BenchmarkGoogleapis_Protoparse_NoSourceInfo-10        	       2	1021327542 ns/op	1893843760 B/op	35178044 allocs/op
BenchmarkGoogleapis_Protoc
BenchmarkGoogleapis_Protoc-10                         	       1	1622085958 ns/op	  661088 B/op	      74 allocs/op
BenchmarkGoogleapis_Protoc_NoSourceInfo
BenchmarkGoogleapis_Protoc_NoSourceInfo-10            	       1	1316672500 ns/op	  661072 B/op	      73 allocs/op
BenchmarkGoogleapis_Protocompile_SingleThreaded
BenchmarkGoogleapis_Protocompile_SingleThreaded-10    	       1	3618983167 ns/op	1935788240 B/op	29296333 allocs/op
BenchmarkGoogleapis_Protoparse_SingleThreaded
BenchmarkGoogleapis_Protoparse_SingleThreaded-10      	       1	3141516333 ns/op	2038783304 B/op	36412126 allocs/op
PASS
ok  	github.com/jhump/protocompile/internal/benchmarks	219.629s

Fixes TCN-545

Comment on lines +529 to +531
if !t.e.c.RetainASTs {
file.RemoveAST()
}
jhump (Member, Author):
This is one significant improvement for memory usage in protocompile. It allows the AST to be dropped early in the process, so a large compilation doesn't accumulate a large number of ASTs pinned on the heap.
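
A hypothetical illustration of the idea (not the actual protocompile internals): once a file is fully linked, its AST is no longer needed, so dropping the reference lets the GC reclaim it while the rest of the compilation proceeds.

```go
package example

import (
	"github.com/jhump/protocompile/ast"
	"google.golang.org/protobuf/types/descriptorpb"
)

type compiledFile struct {
	descriptor *descriptorpb.FileDescriptorProto // what callers ultimately need
	fileAST    *ast.FileNode                     // large; only needed until linking completes
}

// removeAST drops the AST so it can be garbage-collected early, unless the
// caller asked to retain ASTs (mirroring the RetainASTs option in the diff above).
func (f *compiledFile) removeAST(retainASTs bool) {
	if !retainASTs {
		f.fileAST = nil
	}
}
```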

@@ -0,0 +1,21 @@
module github.com/jhump/protocompile/internal/benchmarks
jhump (Member, Author):

I made this a separate module so that go test ./... in the repo root would exclude these benchmarks.

Also, I didn't want to pull in github.com/jhump/protoreflect as a dependency in the root go.mod.

@@ -373,7 +378,7 @@ type srcLocs struct {
  protoreflect.SourceLocations
  file *result
  locs []protoreflect.SourceLocation
- index map[interface{}]protoreflect.SourceLocation
+ index map[interface{}]int
jhump (Member, Author):
This was another big win for reducing memory usage in protocompile: the map values are now just indexes into the locs slice, so each entry stores a single int instead of duplicating a full SourceLocation value.
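
A sketch of what lookups then do (byKey is a hypothetical helper name, not the actual method):

```go
// The index map stores positions into the locs slice; the SourceLocation
// structs themselves are stored exactly once, in locs.
func (s *srcLocs) byKey(key interface{}) (protoreflect.SourceLocation, bool) {
	idx, ok := s.index[key]
	if !ok {
		return protoreflect.SourceLocation{}, false
	}
	return s.locs[idx], true
}
```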

@@ -43,7 +43,12 @@ type packageSymbols struct {
  children map[protoreflect.FullName]*packageSymbols
  files map[protoreflect.FileDescriptor]struct{}
  symbols map[protoreflect.FullName]symbolEntry
- exts map[protoreflect.FullName]map[protoreflect.FieldNumber]ast.SourcePos
+ exts map[extNumber]ast.SourcePos
jhump (Member, Author):
This looked like low-hanging fruit -- nested maps are often less dense/more wasteful than a single map. But files typically don't have enough extensions for this to be a material source of memory usage. So this wasn't really a noticeable improvement. I left the change in anyway since it made the insertion logic cleaner and the tests a little less verbose.
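
A plausible shape for the flattened key (the field names here are assumptions, not the actual definition):

```go
// Combining the extendee name and field number into one comparable struct
// replaces the nested map-of-maps with a single map, so recording an
// extension is one write: s.exts[extNumber{extendee, number}] = pos
type extNumber struct {
	extendee protoreflect.FullName
	number   protoreflect.FieldNumber
}
```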

pkwarren (Member) left a comment:

Very nice!

use (
.
./internal/benchmarks
./internal/tools
pkwarren (Member):
Nice!

(Five additional review comments on internal/benchmarks/benchmark_test.go were marked outdated and resolved.)
jhump enabled auto-merge (squash) on October 12, 2022 at 20:40
jhump merged commit 36825b1 into main on Oct 12, 2022
jhump deleted the jh/add-benchmark branch on October 12, 2022 at 20:44