Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the OuterRun class mutable #2417

Merged
merged 1 commit into from
May 28, 2021

Conversation

diesalbla
Copy link
Contributor

Currently, stream compilation would create one new instance of an OuterRun for each Chunk that is emitted up to the top level, even if we are just folding them or merely ignoring them (draning them). Perhaps a little bit of mutability would help.

Not sure if this mutable variable would add much risk, since compilation proceeds in sequence, and separate compilations of a same source string would still use separate OuterRun instances.

@mpilquist
Copy link
Member

I like the idea though I think we need to understand performance improvements gained by these optimizations since they add complexity.

@diesalbla
Copy link
Contributor Author

diesalbla commented May 22, 2021

So, for a first approximation, I run some of the benchmarks we have in StreamBenchmark, using the command:

 jmh:run  -i 8 -wi 2 -f1 -t2 -gc true -jvmArgs "-XX:MaxRecursiveInlineLevel=3 -XX:MaxInlineSize=50" .*StreamBenchmark -prof gc

This was run on the Graal Java Virtual Machine, Graal VM 21.1.0.r16 (for Java 16). These are the results that I got comparing the measures gc.alloc.rate.norm and gc.churn.G1_Eden_Space.norm between the mail branch (old) and this branch (new)

.rate.norm (B/op) N Measure New Error New Measure Old Error Old New / old
leftAssocConcat:· 100 31604.383 0.751 34812.001 0.263 90.79
leftAssocConcat:· 1000 301797.049 69.894 333769.964 1.591 90.42
leftAssocConcat:· 10000 3007982.467 37.518 3327989.057 41.607 90.38
leftAssocFlatMap:· 100 64686.748 0.432 69407.35 0.783 93.2
leftAssocFlatMap:· 1000 674472.355 44.366 626439.517 4.008 107.67
leftAssocFlatMap:· 10000 6805196.417 44.999 6325217.453 66.376 107.59
rightAssocConcat:· 100 31643.933 0.503 34844.707 0.583 90.81
rightAssocConcat:· 1000 301769.522 1.397 333843.155 1.114 90.39
rightAssocConcat:· 10000 3007975.984 32.146 3327988.397 38.057 90.38
rightAssocFlatMap:· 100 68887.403 0.444 68919.581 0.444 99.95
rightAssocFlatMap:· 1000 667119.701 2.007 667112.877 1.767 100
rightAssocFlatMap:· 10000 6655600.623 40.765 6655576.533 89.728 100
Eden Space Churn norm N Measure New Error New Measure Old Error Old New / old
leftAssocConcat:· 100 31890.628 98.464 35121.478 94.564 90.8
leftAssocConcat:· 1000 303446.839 821.084 336063.58 1199.537 90.29
leftAssocConcat:· 10000 3016541.114 15849.264 3334790.278 18096.712 90.46
leftAssocFlatMap:· 100 65183.918 148.892 69975.284 164.421 93.15
leftAssocFlatMap:· 1000 679004.059 1239.28 631370.602 1845.723 107.54
leftAssocFlatMap:· 10000 6809133.462 38275.629 6339112.342 32213.896 107.41
rightAssocConcat:· 100 31947.028 70.083 35192.446 94.051 90.78
rightAssocConcat:· 1000 303451.834 467.402 336040.344 962.363 90.3
rightAssocConcat:· 10000 3014998.595 14921.215 3335184.284 22846.967 90.4
rightAssocFlatMap:· 100 69406.101 110.444 69519.644 131.493 99.84
rightAssocFlatMap:· 1000 671955.604 1671.515 671544.489 810.217 100.06
rightAssocFlatMap:· 10000 6662174.372 41800.489 6653812.956 44921.598 100.13

@diesalbla
Copy link
Contributor Author

diesalbla commented May 23, 2021

Here is a second measure of those benchmarks, using a different Java Virtual Machine. This ones were obtained running with the AdoptOpenJDK Java 16.0.1 virtual machine. Also, since the errors were low, I run them with 4 iterations rather than 8.

·gc.alloc.rate.norm            
leftAssocConcat: 100 31716.881 2.234 34917.027 2.791 90.83
leftAssocConcat: 1000 301748.907 6.983 333734.715 3.9 90.42
leftAssocConcat: 10000 3103367.87 476744.821 3406855.145 424965.523 91.09
leftAssocFlatMap: 100 64592.28 1.669 64640.331 2.41 99.93
leftAssocFlatMap: 1000 626348.91 3.639 626396.924 4.673 99.99
leftAssocFlatMap: 10000 6325960.119 10465.984 6348622.021 303277.025 99.64
rightAssocConcat: 100 31644.794 2.357 34917.246 1.7 90.63
rightAssocConcat: 1000 301744.348 33.515 333734.998 4.018 90.41
rightAssocConcat: 10000 3077759.365 523817.332 3411297.725 428605.64 90.22
rightAssocFlatMap: 100 64016.366 1.605 64088.275 1.225 99.89
rightAssocFlatMap: 1000 619099.768 5.208 619118.39 10.492 100
rightAssocFlatMap: 10000 6203741.645 365083.039 6218750.335 327520.668 99.76

In this case, the "optimisation" seems to have no impact for some benchmarks, bur for others it seems to reduce the allocation rate by a 10%. Of course, these may be extreme cases. Is there any other "more realistic" benchmark worth trying out?

@diesalbla diesalbla requested a review from mpilquist May 28, 2021 01:51
@mpilquist mpilquist merged commit 0c0ce2f into typelevel:main May 28, 2021
@diesalbla diesalbla deleted the make_outerrun_mutable branch July 31, 2021 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants