Make the OuterRun class mutable #2417

diesalbla · 2021-05-21T23:52:15Z

Currently, stream compilation would create one new instance of an OuterRun for each Chunk that is emitted up to the top level, even if we are just folding them or merely ignoring them (draning them). Perhaps a little bit of mutability would help.

Not sure if this mutable variable would add much risk, since compilation proceeds in sequence, and separate compilations of a same source string would still use separate OuterRun instances.

mpilquist · 2021-05-22T02:01:14Z

I like the idea though I think we need to understand performance improvements gained by these optimizations since they add complexity.

diesalbla · 2021-05-22T16:59:24Z

So, for a first approximation, I run some of the benchmarks we have in StreamBenchmark, using the command:

 jmh:run  -i 8 -wi 2 -f1 -t2 -gc true -jvmArgs "-XX:MaxRecursiveInlineLevel=3 -XX:MaxInlineSize=50" .*StreamBenchmark -prof gc

This was run on the Graal Java Virtual Machine, Graal VM 21.1.0.r16 (for Java 16). These are the results that I got comparing the measures gc.alloc.rate.norm and gc.churn.G1_Eden_Space.norm between the mail branch (old) and this branch (new)

.rate.norm (B/op)	N	Measure New	Error New	Measure Old	Error Old	New / old
leftAssocConcat:·	100	31604.383	0.751	34812.001	0.263	90.79
leftAssocConcat:·	1000	301797.049	69.894	333769.964	1.591	90.42
leftAssocConcat:·	10000	3007982.467	37.518	3327989.057	41.607	90.38
leftAssocFlatMap:·	100	64686.748	0.432	69407.35	0.783	93.2
leftAssocFlatMap:·	1000	674472.355	44.366	626439.517	4.008	107.67
leftAssocFlatMap:·	10000	6805196.417	44.999	6325217.453	66.376	107.59
rightAssocConcat:·	100	31643.933	0.503	34844.707	0.583	90.81
rightAssocConcat:·	1000	301769.522	1.397	333843.155	1.114	90.39
rightAssocConcat:·	10000	3007975.984	32.146	3327988.397	38.057	90.38
rightAssocFlatMap:·	100	68887.403	0.444	68919.581	0.444	99.95
rightAssocFlatMap:·	1000	667119.701	2.007	667112.877	1.767	100
rightAssocFlatMap:·	10000	6655600.623	40.765	6655576.533	89.728	100

Eden Space Churn norm	N	Measure New	Error New	Measure Old	Error Old	New / old
leftAssocConcat:·	100	31890.628	98.464	35121.478	94.564	90.8
leftAssocConcat:·	1000	303446.839	821.084	336063.58	1199.537	90.29
leftAssocConcat:·	10000	3016541.114	15849.264	3334790.278	18096.712	90.46
leftAssocFlatMap:·	100	65183.918	148.892	69975.284	164.421	93.15
leftAssocFlatMap:·	1000	679004.059	1239.28	631370.602	1845.723	107.54
leftAssocFlatMap:·	10000	6809133.462	38275.629	6339112.342	32213.896	107.41
rightAssocConcat:·	100	31947.028	70.083	35192.446	94.051	90.78
rightAssocConcat:·	1000	303451.834	467.402	336040.344	962.363	90.3
rightAssocConcat:·	10000	3014998.595	14921.215	3335184.284	22846.967	90.4
rightAssocFlatMap:·	100	69406.101	110.444	69519.644	131.493	99.84
rightAssocFlatMap:·	1000	671955.604	1671.515	671544.489	810.217	100.06
rightAssocFlatMap:·	10000	6662174.372	41800.489	6653812.956	44921.598	100.13

diesalbla · 2021-05-23T05:52:08Z

Here is a second measure of those benchmarks, using a different Java Virtual Machine. This ones were obtained running with the AdoptOpenJDK Java 16.0.1 virtual machine. Also, since the errors were low, I run them with 4 iterations rather than 8.

·gc.alloc.rate.norm
leftAssocConcat:	100	31716.881	2.234	34917.027	2.791	90.83
leftAssocConcat:	1000	301748.907	6.983	333734.715	3.9	90.42
leftAssocConcat:	10000	3103367.87	476744.821	3406855.145	424965.523	91.09
leftAssocFlatMap:	100	64592.28	1.669	64640.331	2.41	99.93
leftAssocFlatMap:	1000	626348.91	3.639	626396.924	4.673	99.99
leftAssocFlatMap:	10000	6325960.119	10465.984	6348622.021	303277.025	99.64
rightAssocConcat:	100	31644.794	2.357	34917.246	1.7	90.63
rightAssocConcat:	1000	301744.348	33.515	333734.998	4.018	90.41
rightAssocConcat:	10000	3077759.365	523817.332	3411297.725	428605.64	90.22
rightAssocFlatMap:	100	64016.366	1.605	64088.275	1.225	99.89
rightAssocFlatMap:	1000	619099.768	5.208	619118.39	10.492	100
rightAssocFlatMap:	10000	6203741.645	365083.039	6218750.335	327520.668	99.76

In this case, the "optimisation" seems to have no impact for some benchmarks, bur for others it seems to reduce the allocation rate by a 10%. Of course, these may be extreme cases. Is there any other "more realistic" benchmark worth trying out?

Make the OuterRun class mutable

51082b2

diesalbla added the help wanted label May 21, 2021

diesalbla requested a review from mpilquist May 28, 2021 01:51

diesalbla added Performance and removed help wanted labels May 28, 2021

mpilquist merged commit 0c0ce2f into typelevel:main May 28, 2021

diesalbla deleted the make_outerrun_mutable branch July 31, 2021 20:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make the OuterRun class mutable #2417

Make the OuterRun class mutable #2417

diesalbla commented May 21, 2021

mpilquist commented May 22, 2021

diesalbla commented May 22, 2021 •

edited

Loading

diesalbla commented May 23, 2021 •

edited

Loading

Make the OuterRun class mutable #2417

Make the OuterRun class mutable #2417

Conversation

diesalbla commented May 21, 2021

mpilquist commented May 22, 2021

diesalbla commented May 22, 2021 • edited Loading

diesalbla commented May 23, 2021 • edited Loading

diesalbla commented May 22, 2021 •

edited

Loading

diesalbla commented May 23, 2021 •

edited

Loading