built-in accurate and cross platform Memory Diagnoser #284

adamsitnik · 2016-10-16T16:48:29Z

Fixed issues:
#186 - don't include allocations from setup and cleanup
#200 - be accurate about allocated bytes/op; this was possible mostly thanks to #277
#208 - tests now pass on AppVeyor without using ETW
#133 - scale the results, make them stable

Extras:

cross platform Memory Diagnoser
Memory Diagnoser as part of BenchmarkDotNet.Core, enabled by default

Todos:

Recently MS added GC.GetAllocatedBytesForCurrentThread to the public API surface. We can't use it yet, because this version of the runtime is not yet available on nuget.org. As soon as it is released, it should be relatively easy to adopt. This will give us bytes allocated per operation for .NET Core as well. Done
~~When we benchmark a Task-returning method we call .GetAwaiter, which most probably allocates memory. This should be verified; if it's true, the cost should be excluded from the final results.~~ verified: the awaiter is a struct, so there are no extra allocations ;)
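The following is a minimal sketch of how a per-thread allocation counter can yield bytes allocated per operation. It keeps setup and cleanup outside the measured window. `AllocationProbe` and `BytesPerOperation` are hypothetical names for illustration, not BenchmarkDotNet's implementation. The sketch assumes a target framework that exposes `GC.GetAllocatedBytesForCurrentThread()` publicly, such as current .NET versions.

```csharp
using System;

public static class AllocationProbe
{
    // Hypothetical helper: average bytes allocated on the current thread per call of `operation`.
    // Setup and cleanup should run outside this method so they are not counted.
    public static long BytesPerOperation(Action operation, long operationsCount)
    {
        operation(); // warm-up call, so first-call JIT and static constructor work is excluded

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (long i = 0; i < operationsCount; i++)
        {
            operation();
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        // integer division: the result is whole bytes per operation
        return (after - before) / operationsCount;
    }
}
```

The caller creates the delegate before the first counter read, so the measured loop only invokes it and does not allocate by itself.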

#200

adamsitnik · 2016-10-18T14:02:06Z

@mattwarren @AndreyAkinshin any comments on the PR?

AndreyAkinshin · 2016-10-18T15:14:08Z

@adamsitnik, the source code looks good to me. Give me some time; I will check how it works on Linux.

src/BenchmarkDotNet.Core/Engines/GcStats.cs

+            // AppDomain. The number is accurate as of the last garbage collection." - CLR via C#
+            // so we enforce GC.Collect here just to make sure we get accurate results
+            GC.Collect();
+            return AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;


src/BenchmarkDotNet.Core/Diagnosers/MemoryDiagnoser.cs

+                if (results.ContainsKey(benchmark))
+                {
+                    var result = results[benchmark];
+                    // TODO scale this based on the minimum value in the column, i.e. use B/KB/MB as appropriate


mattwarren · 2016-11-02T15:37:46Z

@adamsitnik Finally had a chance to play with this, it's really nice, great job!!

I've tested out scenarios similar to #186 and #200 and I agree that they're fixed.

I also tested #133 (reported by @xoofx). To do this, I ran the IntroGC benchmark 3 times with a modified build that prints out the total # of ops per benchmark, and got the following results:

| Method | GcServer | GcForce | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|---|
| 'stackalloc byte[10KB]' | False | True | 1,006,632,960 | 1,006,632,960 | 1,006,632,960 |
| 'stackalloc byte[10KB]' | False | False | 1,006,632,960 | 1,006,632,960 | 1,006,632,960 |
| 'stackalloc byte[10KB]' | True | True | 503,316,480 | 503,316,480 | 503,316,480 |
| 'stackalloc byte[10KB]' | True | False | 503,316,480 | 503,316,480 | 503,316,480 |
| 'new byte[10KB]' | False | True | 7,864,320 | 7,864,320 | 7,864,320 |
| 'new byte[10KB]' | False | False | 7,864,320 | 7,864,320 | 7,864,320 |
| 'new byte[10KB]' | True | True | 7,864,320 | 7,864,320 | 7,864,320 |
| 'new byte[10KB]' | True | False | 7,864,320 | 7,864,320 | 7,864,320 |

As you can also see in the outputs below, the # of Gen 0 collections and Bytes Allocated/Op are stable across the runs (full output: Run 1, Run 2, Run 3).

Run 1

| Method | GcServer | GcForce | Mean | StdErr | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| 'stackalloc byte[10KB]' | False | True | 1.0580 ns | 0.1378 ns | 1.0671 ns | 1.0284 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | False | False | 1.0666 ns | 0.1389 ns | 1.0758 ns | 1.0266 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | True | 1.0732 ns | 0.1399 ns | 1.0833 ns | 1.0236 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | False | 1.0801 ns | 0.1408 ns | 1.0903 ns | 1.0410 ns | - | - | - | 0.00 |
| 'new byte[10KB]' | False | True | 108.7057 ns | 14.2016 ns | 110.0048 ns | 102.1161 ns | 24,835.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | False | False | 110.6094 ns | 14.4250 ns | 111.7358 ns | 106.6167 ns | 25,045.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | True | True | 155.6753 ns | 20.5542 ns | 159.2124 ns | 139.0744 ns | 1,131.00 | - | - | 10,042.00 |
| 'new byte[10KB]' | True | False | 168.1659 ns | 22.5059 ns | 174.3302 ns | 137.8017 ns | 1,330.00 | - | - | 10,042.00 |

Run 2

| Method | GcServer | GcForce | Mean | StdErr | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| 'stackalloc byte[10KB]' | False | True | 1.0694 ns | 0.1392 ns | 1.0786 ns | 1.0444 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | False | False | 1.0722 ns | 0.1396 ns | 1.0814 ns | 1.0501 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | False | 1.0765 ns | 0.1402 ns | 1.0861 ns | 1.0419 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | True | 1.0790 ns | 0.1405 ns | 1.0883 ns | 1.0459 ns | - | - | - | 0.00 |
| 'new byte[10KB]' | False | True | 111.6729 ns | 14.6014 ns | 113.1016 ns | 102.3328 ns | 24,831.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | False | False | 112.5393 ns | 14.6706 ns | 113.6381 ns | 106.2042 ns | 25,045.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | True | False | 158.9486 ns | 21.4685 ns | 166.2941 ns | 128.0673 ns | 1,331.00 | - | - | 10,042.00 |
| 'new byte[10KB]' | True | True | 161.1846 ns | 21.2168 ns | 164.3446 ns | 137.8021 ns | 1,132.00 | - | - | 10,042.00 |

Run 3

| Method | GcServer | GcForce | Mean | StdErr | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| 'stackalloc byte[10KB]' | False | False | 1.0690 ns | 0.1392 ns | 1.0783 ns | 1.0454 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | True | 1.0748 ns | 0.1400 ns | 1.0842 ns | 1.0400 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | True | False | 1.0772 ns | 0.1403 ns | 1.0866 ns | 1.0403 ns | - | - | - | 0.00 |
| 'stackalloc byte[10KB]' | False | True | 1.0872 ns | 0.1423 ns | 1.1023 ns | 1.0260 ns | - | - | - | 0.00 |
| 'new byte[10KB]' | False | True | 108.5197 ns | 14.1317 ns | 109.4635 ns | 105.7219 ns | 24,839.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | False | False | 109.8449 ns | 14.3067 ns | 110.8190 ns | 106.2894 ns | 25,045.00 | - | - | 10,048.00 |
| 'new byte[10KB]' | True | True | 154.6476 ns | 20.1560 ns | 156.1277 ns | 148.9501 ns | 1,114.00 | - | - | 10,042.00 |
| 'new byte[10KB]' | True | False | 158.9589 ns | 20.7968 ns | 161.0914 ns | 142.1361 ns | 1,326.00 | - | - | 10,042.00 |

adamsitnik · 2016-11-02T19:01:05Z

@mattwarren Thanks for the review and feedback!

There is one thing that I am not sure how to solve. Please consider the following scenario:

A user wants to compare two different methods in terms of CPU and memory. For time and bytes allocated/op, the results we give are comparable, because each is computed as sum / operationsCount.

However, this is not true for the Gen 0, 1 & 2 stats. Two benchmarks might be executed a different number of times, but we don't scale the GC collection counts. Let's say the first benchmark is executed 10k times and has 100 Gen 0 collections, and the other is executed 20k times and also has 100 Gen 0 collections. If the user looks at the GC column, he or she might think that both have the same GC pressure, but the one executed 10k times is actually two times worse.

@mattwarren @AndreyAkinshin We should most probably scale the results. What do you think? What should we call such a column?
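The normalisation discussed here is a single division. Below is a sketch using the numbers from the scenario above. `GcScaling` is a hypothetical name, not the PR's code.

```csharp
public static class GcScaling
{
    // Divides a raw GC collection count by the number of executed operations,
    // so benchmarks that ran a different number of times become comparable.
    public static double CollectionsPerOperation(int collectionCount, long operationsCount)
        => (double)collectionCount / operationsCount;
}
```

With 100 Gen 0 collections each, the benchmark executed 10k times gets 0.01 collections/op and the one executed 20k times gets 0.005. This exposes the 2x difference that the raw counts hide.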

xoofx · 2016-11-03T07:55:01Z

Yes, a total count for Gen0/Gen1/Gen2 doesn't make sense, because you can't compare the counts when the op count changes (typically issue #133); they should be converted to gcCount/op instead.


adamsitnik · 2016-11-03T15:46:07Z

OK, I have updated the code. It's very stable now and fixes #133. Sample output:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median | Gen 0/op | Gen 1/op | Gen 2/op | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |--------- |--------- |--------- |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.2132 ns |  0.1581 ns |   1.2243 ns |   1.1721 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |    False |   False |   1.2206 ns |  0.1591 ns |   1.2323 ns |   1.1408 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.2342 ns |  0.1608 ns |   1.2459 ns |   1.1820 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.2348 ns |  0.1616 ns |   1.2517 ns |   1.1148 ns |        - |        - |        - |                  0 |
        'new byte[10KB]' |    False |    True | 131.1965 ns | 17.2203 ns | 133.3878 ns | 118.7028 ns | 0.003157 |        - |        - |             10,048 |
        'new byte[10KB]' |    False |   False | 131.3826 ns | 17.1396 ns | 132.7630 ns | 125.7887 ns | 0.003185 |        - |        - |             10,048 |
        'new byte[10KB]' |     True |   False | 234.2503 ns | 32.6858 ns | 253.1828 ns | 159.1990 ns | 0.000168 |        - |        - |             10,042 |
        'new byte[10KB]' |     True |    True | 250.3041 ns | 35.5450 ns | 275.3304 ns | 150.8811 ns | 0.000118 |        - |        - |             10,042 |

xoofx · 2016-11-03T15:52:13Z

@adamsitnik amazing! I'm wondering if we should scale GenX/op per 1k ops... we will most likely get a GC only after a certain number of ops, and rarely within a single op (though this could happen with a for loop doing lots of allocations...)

mattwarren · 2016-11-03T16:12:15Z

Wondering if we should scale Genx/op by 1k op

Agreed, I was going to propose something similar. Scaling GenX per op gives pretty small values. I think that 'GenX/1k op' should be okay for most or all scenarios.
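Scaling per 1,000 operations only changes the multiplier. The sketch below uses a hypothetical helper name and illustrates the proposal rather than the final column implementation.

```csharp
public static class GcScalingPerThousand
{
    // Collections per 1,000 operations: a per-op value such as 0.003158
    // becomes the easier-to-read 3.158.
    public static double CollectionsPerThousandOperations(int collectionCount, long operationsCount)
        => collectionCount * 1000.0 / operationsCount;
}
```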

adamsitnik · 2016-11-04T10:04:42Z

@xoofx @mattwarren Thanks for the ideas!

Initially I started with per 1k op:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median | Gen 0/1k op | Gen 1/1k op | Gen 2/1k op | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |------------ |------------ |------------ |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.1611 ns |  0.1513 ns |   1.1719 ns |   1.1014 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.1684 ns |  0.1522 ns |   1.1791 ns |   1.0974 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |    False |   False |   1.1904 ns |  0.1554 ns |   1.2034 ns |   1.0478 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.1938 ns |  0.1560 ns |   1.2085 ns |   1.0838 ns |           - |           - |           - |                  0 |
        'new byte[10KB]' |    False |   False | 118.1717 ns | 15.3887 ns | 119.2000 ns | 114.3850 ns |    3.184636 |           - |           - |             10,048 |
        'new byte[10KB]' |    False |    True | 119.3012 ns | 15.5924 ns | 120.7781 ns | 111.0194 ns |    3.158442 |           - |           - |             10,048 |
        'new byte[10KB]' |     True |    True | 193.8840 ns | 25.3369 ns | 196.2585 ns | 173.9980 ns |    0.140254 |           - |           - |             10,042 |
        'new byte[10KB]' |     True |   False | 238.4089 ns | 77.1975 ns | 597.9696 ns | 105.6934 ns |    0.167592 |           - |           - |             10,042 |

but then I decided to try the per mille placeholder:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median |  Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |------- |------ |------ |------------------- |
 'stackalloc byte[10KB]' |    False |   False |   1.1704 ns |  0.1525 ns |   1.1811 ns |   1.0855 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |    False |    True |   1.1802 ns |  0.1543 ns |   1.1954 ns |   1.0718 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.1890 ns |  0.1550 ns |   1.2007 ns |   1.0957 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.1931 ns |  0.1555 ns |   1.2045 ns |   1.1226 ns |      - |     - |     - |                  0 |
        'new byte[10KB]' |    False |    True | 116.5610 ns | 15.1830 ns | 117.6073 ns | 113.6389 ns | 3.158‰ |     - |     - |             10,048 |
        'new byte[10KB]' |    False |   False | 122.1617 ns | 15.9805 ns | 123.7846 ns | 114.1776 ns | 3.185‰ |     - |     - |             10,048 |
        'new byte[10KB]' |     True |   False | 198.6071 ns | 27.4427 ns | 212.5701 ns | 162.8640 ns | 0.168‰ |     - |     - |             10,042 |
        'new byte[10KB]' |     True |    True | 271.9210 ns | 54.2128 ns | 419.9302 ns | 101.6848 ns | 0.119‰ |     - |     - |             10,042 |

What do you think? Which option is better? Personally I like the ‰ approach, but I am afraid that people reading the console output might confuse ‰ with %.
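In .NET custom numeric format strings, `‰` is the per mille placeholder. It multiplies the value by 1,000 and emits the culture's per mille symbol. That is why the same per-op value can be rendered in either style without extra arithmetic. The sketch below uses hypothetical names and the invariant culture.

```csharp
using System.Globalization;

public static class PerMilleFormatting
{
    // "0.000‰" multiplies the value by 1000 before formatting and appends the per mille symbol.
    public static string AsPerMille(double collectionsPerOperation)
        => collectionsPerOperation.ToString("0.000‰", CultureInfo.InvariantCulture);

    // Plain six-decimal formatting, as in the "Gen X/op" column.
    public static string AsPerOperation(double collectionsPerOperation)
        => collectionsPerOperation.ToString("0.000000", CultureInfo.InvariantCulture);
}
```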


mattwarren · 2016-11-04T11:41:19Z

I find the ‰ character really hard to read; I had to set my browser zoom to 200% before I could make it out! I don't know what to suggest as a good unit of measure, though.

Either way, I think it would be good to add a note in the "Diagnostic Output" section explaining how the calculations are done, something like:

        'new byte[10KB]' |     True |    True | 193.8840 ns | 25.3369 ns | 196.2585 ns | 173.9980 ns |    0.140254 |           - |           - |             10,042 |
        'new byte[10KB]' |     True |   False | 238.4089 ns | 77.1975 ns | 597.9696 ns | 105.6934 ns |    0.167592 |           - |           - |             10,042 |

// * Diagnostic Output - MemoryDiagnoser *
Note: the Gen 0/1/2/ Measurements are per ???? Operations

Conflicts: samples/BenchmarkDotNet.Samples/Intro/IntroGcMode.cs src/BenchmarkDotNet.Core/Engines/Engine.cs src/BenchmarkDotNet.Core/Engines/RunResults.cs tests/BenchmarkDotNet.IntegrationTests/CustomEngineTests.cs tests/BenchmarkDotNet.IntegrationTests/MemoryDiagnoserTests.cs


src/BenchmarkDotNet.Core/Engines/GcStats.cs

+        private static Func<long> GetAllocatedBytesForCurrentThread()
+        {
+            // for some versions of .NET Core this method is internal, 
+            // for some public and for others public and exposed ;)



adamsitnik · 2016-11-18T10:23:02Z

@AndreyAkinshin @mattwarren I have finished working on this PR. Could you do a code review?

@AndreyAkinshin We no longer have a beta package dependency, so we can release 0.10.1 to nuget.org. Due to the update to netcoreapp1.1, a new SDK needs to be installed; that's why the tests on AppVeyor may fail until they install it.

AndreyAkinshin · 2016-11-18T10:26:44Z

@adamsitnik, nice job! I will do a review this weekend.

AndreyAkinshin · 2016-11-23T08:32:34Z

@adamsitnik, everything looks great! It prints correct results even on Linux. However, I have some additional minor requests:

What should happen when you are working with MemoryDiagnoser+MonoJob? For now, IntroGcMode throws a strange exception when we try to run it against Mono. Could we print a friendly message in this case?
I'm not sure about hardcoded 1k-scaling for GCCollectionColumn. We could have a method with 0.0001 collections per operation and a method with 1000 collections per operation. I think we need adaptive scaling logic here (the same as we have in TimeUnit, where we choose ns/us/ms/s based on the obtained measurements).
Why do we always print bytes in the AllocationColumn? What if a method allocates KB or MB (a common situation in macrobenchmarking)? Could we also use the adaptive approach here? (Check out the implementation of TimeUnit/TimeInterval and FrequencyUnit/Frequency; we could probably do the same for Memory.)
Could you also update the documentation?
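For the adaptive memory formatting suggested in the list above, here is a sketch in the spirit of TimeUnit picking ns/us/ms/s. It assumes binary units (1 KB = 1,024 B) and uses hypothetical names; the actual column may choose different units or rounding.

```csharp
using System;
using System.Globalization;

public static class SizeFormatter
{
    private static readonly string[] Units = { "B", "KB", "MB", "GB" };

    // Picks the largest unit in which the value is still at least 1 (binary units assumed).
    public static string Format(long bytes)
    {
        double value = bytes;
        int unit = 0;
        while (Math.Abs(value) >= 1024 && unit < Units.Length - 1)
        {
            value /= 1024;
            unit++;
        }
        return value.ToString("0.##", CultureInfo.InvariantCulture) + " " + Units[unit];
    }
}
```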

AndreyAkinshin · 2016-11-23T08:58:38Z

@adamsitnik: additional thoughts about columns/scaling/hints. We have the following problem: it's hard to explain the meaning of some columns in a small number of characters. Maybe each column could provide a Legend value which would be printed after the table (when it's not empty). What do you think?

adamsitnik · 2016-11-23T10:31:29Z

@AndreyAkinshin thanks for the review! I'll address your comments.

I agree that we need some explanation, especially for the Gen 0/1/2 columns, where the values no longer mean what they used to. I'll see what I can do.


adamsitnik · 2016-11-23T16:57:41Z

@AndreyAkinshin I updated the docs and added a smarter way of formatting allocated memory. But I am still not sure about scaling the Gen collection counts. Here the problem is that we have no unit :/

I tried to reproduce the Mono issue and it worked. Could you provide some more details? OS, etc.

           Method |        Mean |     StdErr |      StdDev |      Median |  Gen 0 | Allocated |
----------------- |------------ |----------- |------------ |------------ |------- |---------- |
 'new byte[10KB]' | 884.4896 ns | 46.3528 ns | 245.2762 ns | 776.4237 ns | 0.1183 |     10 kB |

mattwarren · 2016-11-23T17:07:22Z

BTW, this is what JMH does, I'm not saying we should do this, but it might prompt some ideas!

AndreyAkinshin · 2016-11-23T19:06:43Z

Here the problem is that we have no unit :/

Yes, it's a problem. =(

I tried to reproduce the Mono issue and it worked.

Ok, I will debug it myself.

BTW, this is what JMH does

@mattwarren, thanks for the input. Will think about it.

AndreyAkinshin · 2016-11-24T21:25:20Z

Guys, I don't know the best way to do it. Let's merge the PR, release a new version, and try to use it.
@mattwarren, are you happy with the new MemoryDiagnoser?

adamsitnik · 2016-11-25T08:38:29Z

@AndreyAkinshin I really like this idea!

mattwarren · 2016-11-25T10:56:18Z

@mattwarren, are you happy with the new MemoryDiagnoser?

I'll have a play around with the latest version this weekend, but I've been using the code from this branch for the last week or so and it looks good to me!

benaadams · 2016-11-25T10:59:58Z

⌚️ Should this also resolve #301?

adamsitnik · 2016-11-25T12:01:43Z

@benaadams Yes, exactly

AndreyAkinshin · 2016-11-25T12:14:41Z

OK, it's merged. If everything is fine, I will release v0.10.1 next week.

mattwarren · 2016-12-13T12:17:45Z

Okay, I just noticed some interesting behaviour when using MonitoringTotalAllocatedMemorySize. If you run code like this:

using System;
using System.Collections.Generic;
using System.Threading;

public static class MonitoringTests
{
    // AppDomain.MonitoringIsEnabled / MonitoringTotalAllocatedMemorySize come from the .NET Framework AppDomain API
    public static void TestMonitoringTotalAllocatedMemorySize()
    {
        AppDomain.MonitoringIsEnabled = true;

        // provoke JIT, static ctors etc (was allocating 1740 bytes with first call)
        var list = new List<string> { "stringA", "stringB" };
        list.Sort();
        var temp = new HashSet<string>();

        var currentDomain = AppDomain.CurrentDomain;

        // read once before the loop, so "Total" below is cumulative across all counters
        var hashSetBefore = currentDomain.MonitoringTotalAllocatedMemorySize;
        var countersToUse = new[] { 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 100000, 250000, 500000 };
        foreach (var counter in countersToUse)
        {
            var loopCounter = 0;
            for (int i = 0; i < counter; i++)
            {
                var test = new HashSet<string>();
                loopCounter += test.Count;
            }

            Thread.Sleep(10);
            var hashSetAfter = currentDomain.MonitoringTotalAllocatedMemorySize;
            var totalAlloc = hashSetAfter - hashSetBefore;
            Console.WriteLine(
                "HashSet<string>() = {0,8:N2} bytes (Total = {1,12:N0} bytes, Counter = {2,8:N0})",
                (double)totalAlloc / counter, totalAlloc, counter);
        }
    }
}

You get the following output:

HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =        1)
HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =        5)
HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =       10)
HashSet<string>() =   327.68 bytes (Total =        8,192 bytes, Counter =       25)
HashSet<string>() =   163.84 bytes (Total =        8,192 bytes, Counter =       50)
HashSet<string>() =   163.84 bytes (Total =       16,384 bytes, Counter =      100)
HashSet<string>() =   131.07 bytes (Total =       32,768 bytes, Counter =      250)
HashSet<string>() =   131.07 bytes (Total =       65,536 bytes, Counter =      500)
HashSet<string>() =   131.07 bytes (Total =      131,072 bytes, Counter =    1,000)
HashSet<string>() =   117.96 bytes (Total =      294,912 bytes, Counter =    2,500)
HashSet<string>() =   122.88 bytes (Total =      614,400 bytes, Counter =    5,000)
HashSet<string>() =   125.34 bytes (Total =    1,253,376 bytes, Counter =   10,000)
HashSet<string>() =    76.56 bytes (Total =    7,655,696 bytes, Counter =  100,000)
HashSet<string>() =    94.62 bytes (Total =   23,655,112 bytes, Counter =  250,000)
HashSet<string>() =   111.32 bytes (Total =   55,662,136 bytes, Counter =  500,000)

Note the variance in the number of bytes in the first column, which depends on the value of Counter. It seems like MonitoringTotalAllocatedMemorySize often (but not always) measures allocations in 8,192-byte (8K) increments (see Total =), which is suspiciously close to the allocation quantum the GC uses, from Design of Allocator:

The Allocation quantum is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The allocation is typically 8k and the average size of managed objects are around 35 bytes, enabling a single allocation quantum to be used for many object allocations.

Now I know that we generally do 1000's of iterations, so it might not be a problem, but I just wanted to raise it and see what others thought?

For instance, in the test above we don't get a consistent value for how many bytes are allocated when you create an empty HashSet<T>.
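If the counter only advances in whole allocation quanta, the error in bytes/op is bounded by the quantum divided by the number of operations. The first non-zero row above fits that pattern: 8,192 / 25 = 327.68. The back-of-the-envelope helper below assumes the "typically 8k" quantum quoted from the GC design notes and uses hypothetical names.

```csharp
public static class AllocationQuantumError
{
    // Assumption: the "typically 8k" allocation quantum quoted above.
    public const long AllocationQuantumBytes = 8 * 1024;

    // Worst-case distortion of bytes/op when the allocated-bytes counter
    // moves in whole quanta rather than per object.
    public static double MaxErrorPerOperation(long operationsCount)
        => (double)AllocationQuantumBytes / operationsCount;
}
```

At 1,000 operations the worst case is about 8 bytes/op, and at 1,000,000 operations it is below 0.01 bytes/op. That is why running many iterations hides the effect.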

adamsitnik · 2016-12-13T13:44:43Z

I encountered a similar problem when I was implementing this, and found this interesting note:

"This instance Int64 property returns the number of bytes that have been allocated by a specific AppDomain. The number is accurate as of the last garbage collection." - CLR via C#

@mattwarren Could you try forcing GC.Collect first, like we do here?

mattwarren · 2016-12-13T16:14:41Z

I tried adding GC.Collect() and it didn't seem to make any difference; the current code is in this gist.

adamsitnik · 2016-12-28T09:20:01Z

@mattwarren sorry for the late response, finally I had some time to try it.

Here is my code for both .NET Framework (desktop) and .NET Core.

Results for dotnet run -c Release -f net45:

HashSet<string>() = 8.192,00 bytes (Total =        8.192 bytes, Counter =        1), Current =   44.816 bytes
HashSet<string>() = 1.638,40 bytes (Total =        8.192 bytes, Counter =        5), Current =   57.536 bytes
HashSet<string>() =   819,20 bytes (Total =        8.192 bytes, Counter =       10), Current =   57.536 bytes
HashSet<string>() =   327,68 bytes (Total =        8.192 bytes, Counter =       25), Current =   57.536 bytes
HashSet<string>() =   163,84 bytes (Total =        8.192 bytes, Counter =       50), Current =   57.536 bytes
HashSet<string>() =    81,92 bytes (Total =        8.192 bytes, Counter =      100), Current =   57.536 bytes
HashSet<string>() =    65,54 bytes (Total =       16.384 bytes, Counter =      250), Current =   57.536 bytes
HashSet<string>() =    65,54 bytes (Total =       32.768 bytes, Counter =      500), Current =   57.536 bytes
HashSet<string>() =    66,42 bytes (Total =       66.416 bytes, Counter =    1.000), Current =   57.536 bytes
HashSet<string>() =    65,89 bytes (Total =      164.720 bytes, Counter =    2.500), Current =   57.536 bytes
HashSet<string>() =    64,07 bytes (Total =      320.368 bytes, Counter =    5.000), Current =   57.536 bytes
HashSet<string>() =    64,80 bytes (Total =      648.048 bytes, Counter =   10.000), Current =   57.536 bytes
HashSet<string>() =    64,06 bytes (Total =    6.406.320 bytes, Counter =  100.000), Current =   57.536 bytes
HashSet<string>() =    64,03 bytes (Total =   16.007.608 bytes, Counter =  250.000), Current =   57.536 bytes
HashSet<string>() =    64,01 bytes (Total =   32.007.024 bytes, Counter =  500.000), Current =   57.536 bytes

Results for dotnet run -c Release -f netcoreapp1.1:

HashSet<string>() = 6.560,00 bytes (Total =        6.560 bytes, Counter =        1), Current =   54.016 bytes
HashSet<string>() = 1.633,60 bytes (Total =        8.168 bytes, Counter =        5), Current =   58.480 bytes
HashSet<string>() =   816,80 bytes (Total =        8.168 bytes, Counter =       10), Current =   58.480 bytes
HashSet<string>() =   326,72 bytes (Total =        8.168 bytes, Counter =       25), Current =   58.480 bytes
HashSet<string>() =   163,36 bytes (Total =        8.168 bytes, Counter =       50), Current =   58.480 bytes
HashSet<string>() =    81,68 bytes (Total =        8.168 bytes, Counter =      100), Current =   58.480 bytes
HashSet<string>() =    65,34 bytes (Total =       16.336 bytes, Counter =      250), Current =   58.480 bytes
HashSet<string>() =    61,18 bytes (Total =       30.592 bytes, Counter =      500), Current =   58.480 bytes
HashSet<string>() =    63,26 bytes (Total =       63.264 bytes, Counter =    1.000), Current =   58.480 bytes
HashSet<string>() =    57,98 bytes (Total =      144.944 bytes, Counter =    2.500), Current =   58.480 bytes
HashSet<string>() =    56,76 bytes (Total =      283.800 bytes, Counter =    5.000), Current =   58.480 bytes
HashSet<string>() =    56,15 bytes (Total =      561.512 bytes, Counter =   10.000), Current =   58.480 bytes
HashSet<string>() =    55,87 bytes (Total =    5.586.944 bytes, Counter =  100.000), Current =   58.480 bytes
HashSet<string>() =    55,84 bytes (Total =   13.959.336 bytes, Counter =  250.000), Current =   58.480 bytes
HashSet<string>() =    55,84 bytes (Total =   27.918.672 bytes, Counter =  500.000), Current =   58.480 bytes

What helps us in the new MemoryDiagnoser is the large number of runs, the absence of extra allocations, and the fact that the result is a long, not a double. So we cut off the small fixed overhead when doing long = long / long.
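The truncation effect can be seen with the net45 row for 500,000 operations above. There the double result is 64.01 bytes, while long division yields exactly 64. The helper name below is hypothetical.

```csharp
public static class BytesPerOperationMath
{
    // long / long truncates, so any leftover smaller than one byte per operation
    // (for example 7,024 extra bytes spread over 500,000 operations) is dropped.
    public static long BytesPerOperation(long totalBytes, long operationsCount)
        => totalBytes / operationsCount;
}
```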

built-in accurate and cross platform Memory Diagnoser, fixes #186, fixes #200

23f3b29

Therzok reviewed Oct 24, 2016


adamsitnik mentioned this pull request Oct 29, 2016

Support for Beta versions #292

Closed

mattwarren reviewed Nov 2, 2016


adamsitnik added 2 commits November 3, 2016 16:38

don't try to use AppDomain's Monitoring in Mono since it's not implemented there

4cabc20

scale GC collections count / op, makes MemoryDiagnoser output stable for benchmarks with different # of runs, fixes #133

99c21e8

use per mille to make the Memory Diagnoser output more human-friendly + reduce the column's name length (everything is per operation now)

e91255e

AndreyAkinshin added this to the v0.10.1 milestone Nov 6, 2016

AndreyAkinshin added enhancement Area:Diagnosers labels Nov 6, 2016

adamsitnik added 3 commits November 6, 2016 16:34

preallocate results list in more safe, but still ugly way

ade1bea

closed the ugly code in separate class

1022827

adamsitnik mentioned this pull request Nov 8, 2016

[Discussion] Changes on Engine, diagnosers and result reporting? #297

Closed

adamsitnik added 2 commits November 13, 2016 18:14

update to netcoreapp1.1 in order to get universal cross platform memory diagnoser

1e2d381

don't show Gen 1 and Gen 2 columns if empty for all benchmarks

e69e80b

mattwarren reviewed Nov 14, 2016


adamsitnik added 3 commits November 18, 2016 09:27

update to .NET Core 1.1, fixes #301

2a529ab

always show Gen 0 column, display Gen 0/1/2 per 1k op

e6ccee6

Merge branch 'master' into universalMemoryDiagnoser

3bcc598

Conflicts: tests/BenchmarkDotNet.IntegrationTests/MemoryDiagnoserTests.cs

adamsitnik mentioned this pull request Nov 18, 2016

netcoreapp1.1 support #301

Closed

adamsitnik added 2 commits November 23, 2016 17:39

added documentation and smarter bytes formatting

eae2cd5

Merge branch 'master' into universalMemoryDiagnoser

1208c33

Conflicts: src/BenchmarkDotNet.Core/Engines/RunResults.cs src/BenchmarkDotNet.Core/Reports/BenchmarkReport.cs

AndreyAkinshin merged commit f1f2317 into master Nov 25, 2016

adamsitnik deleted the universalMemoryDiagnoser branch December 2, 2016 20:08

adamsitnik mentioned this pull request Dec 27, 2016

Support netcoreapp1.0 as well as 1.1 #334

Closed
