Stop passing CompAllocator pointers around #15025
Conversation
@mikedn This is marked [WIP]. Is this ready for review?
AFAIR the main remaining work here was extending the PR description to explain some of these changes. I'll have to do that and figure out if there's anything else...
This will probably need to be squashed, but for review it may be easier to look at the individual commits (though GitHub displays them in a weird order on the conversation page). The most sensitive change is around allocator pooling; the rest of the PR consists of many trivial changes. I tried to make a diff of old and new mem stats dumps but discovered that in the old version CompMemKind tagged allocations weren't always recorded correctly (e.g. some allocations were made before initializing mem stats), so the diff is messy.
LGTM.... @dotnet/jit-contrib anyone else want to take a look?
Not sure what a good place for coding guidelines is today - I'll use this to point out that using the
@BruceForstall any advice? Wonder if we can make the
I think it's technically possible to do something like
which could later be used as. Then we can restrict what types of callers are allowed to use
But I guess this would add too much complexity...
Re: coding conventions - we still have https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md, although the formatting parts have been obsoleted/overridden by jit-format. The non-formatting parts are still relevant.
@mikedn Did you run various asm diff combos on this to verify no diffs?
I have the changes built locally, so I'll run some diffs.
Ok, I ran both x86 and x64 Checked PMI diffs on frameworks and saw no diffs. (Did this before I saw your message, Andy)
@mikedn want to squash and add something to the coding conventions doc? Then I think we're set to merge.
Yep, but it sounds a bit too much in this case. At the end of the day
Thanks! (for doing my work while I was sleeping :)
Sure, will do that this evening (morning for you, I suppose). Side note - I realized that the change from
9f87094
to
bed12e2
Compare
Whoops, I updated coding conventions but now a CoreFX build failed with an assert in the allocator. Looks like a shutdown race...
@dotnet/jit-contrib any last-minute reservations about me merging this...?
LGTM
@mikedn maybe you can squash and reword the one surviving commit description? There are only fragments per commit so I didn't feel comfortable just stitching one up from those.
Passing CompAllocator objects by value is advantageous because the allocator no longer needs to be dynamically allocated and cached. CompAllocator instances can now be freely created, copied and stored, which makes adding new CompMemKind values easier. Together with other cleanup, this also improves memory allocation performance by removing some extra levels of indirection that were previously required - jitstd::allocator had a pointer to CompAllocator, CompAllocator had a pointer to Compiler and Compiler finally had a pointer to ArenaAllocator.

Without MEASURE_MEM_ALLOC enabled, both jitstd::allocator and CompAllocator now just contain a pointer to ArenaAllocator. When MEASURE_MEM_ALLOC is enabled, CompAllocator also contains a pointer, but to a MemStatsAllocator object that holds the relevant memory kind. This way CompAllocator is always pointer sized, so that enabling MEASURE_MEM_ALLOC does not result in increased memory usage due to objects that store a CompAllocator instance.

In order to implement this, 2 additional significant changes have been made:
* MemStats has been moved to ArenaAllocator; it's, after all, the allocator's job to maintain statistics. This also fixes some issues related to memory statistics, such as not tracking the memory allocated by the inlinee compiler (since that one used its own MemStats instance).
* The arena page pooling logic has been extracted out of the allocator. It doesn't make sense to pool an allocator: it has very little state that can actually be reused, and everything else (including MemStats) needs to be reset on reuse. What really needs to be pooled is just a page of memory.

Since this was touching allocation code, the opportunity has been used to perform additional cleanup:
* Remove unnecessary LSRA ListElementAllocator
* Remove compGetMem and compGetMemArray
* Make CompAllocator and HostAllocator more like the std allocator
* Update HashTable to use CompAllocator
* Update ArrayStack to use CompAllocator
* Move CompAllocator & friends to alloc.h
@AndyAyersMS Squashed and reworded. Hopefully CI won't fail again :) Thanks everyone for the review & feedback!
Beautiful, thanks!
Commit migrated from dotnet/coreclr@c2baf04
Currently `CompAllocator` is passed around by pointer even though (when `MEASURE_MEM_ALLOC` is not defined) it is a pointer sized object, and thus it would make more sense to pass it around by value. This is problematic because:

* The allocator object has to live somewhere. It can be created on the stack (e.g. `optVnCopyProp`), but this needs care to avoid creating objects that depend on the allocator but have a longer lifetime. Now, dynamic allocation wouldn't be so bad in itself, but if you do allocate dynamically you need to cache the allocator as well (e.g. `LinearScan::getAllocator`). In all, it's a bit of a hassle to use `CompAllocator`, and I suspect that this is one reason why quite a bit of JIT memory ends up in `CMK_Generic`/`Unknown` or other "general purpose" buckets.
* `jitstd::list` contains a pointer to a `CompAllocator`, which contains a pointer to a `Compiler`, which contains a pointer to an `ArenaAllocator`. Out of 8 instructions required for an allocation, 2 are wasted following through indirections.

One problem with passing the existing `CompAllocator` around by value is that when `MEASURE_MEM_ALLOC` is enabled the size of `CompAllocator` increases, and with it the size of any objects that contain an allocator object (e.g. `jitstd::vector`). This will end up skewing memory statistics somewhat.

The solution is to always make `CompAllocator` pointer sized. When `MEASURE_MEM_ALLOC` is not enabled it should contain a pointer to `ArenaAllocator`. When `MEASURE_MEM_ALLOC` is enabled it should contain a pointer to an `ArenaAllocator` wrapper object that holds a pointer to the real arena and the memory kind value. There would be one such wrapper object per memory kind value.

Initially I tried to handle this in a more surgical manner but things got a bit ugly - `CompAllocator` relied on `Compiler::compGetMem`, and it was `compGetMem` that took care of tracking memory stats when it's `ArenaAllocator` that should be doing that. So I ended up moving `MemStats` to `ArenaAllocator`.

And then moving mem stats to the arena got complicated too, due to `PooledAllocator`. Its existence meant that memory stats had to be reset every time the pooled allocator was reused. But the whole idea of a pooled allocator doesn't really make sense: you can't, and don't need to, pool the allocator because pretty much its entire state needs to be reset anyway. What needs to be pooled is just a 64k memory page. So no more `PooledAllocator`; an `ArenaAllocator` is always created when a method is compiled, and on its first allocation request it will attempt to grab the pooled page if available.

Since changes around `CompAllocator` already affected various allocation sites, I used the opportunity to do a bit of cleanup:

* `CompAllocator`'s allocation interface was a leftover from the days of `IAllocator` and it's rather clumsy. I changed it to look more like the `std::allocator` interface - a single `allocate` function that takes a `count` parameter indicating how many objects to allocate. This avoids the need for `sizeof(X)` and a cast from `void*` to `X*`. It's not exactly like `std::allocator` - that one is a template and I see no reason to mirror that.
* Removed `compGetMem` and `compGetMemArray`. There weren't too many uses of these functions and they're just redundant.
* Updated `SmallHashTable` to use `CompAllocator` like other JIT collection classes do. It was using a special, global wrapper of `Compiler::compGetMem`.

There's one small change related to `compGetMemArray` that's worth mentioning - the integer overflow check. The old code did `numElem > (MAX_MEMORY_PER_ALLOCATION / elemSize)`, where `MAX_MEMORY_PER_ALLOCATION` is 512 megabytes; this not only checked for integer overflow but also capped the allocation size to 512MB. The new code uses `SIZE_MAX`; it's faster on 64 bit targets because in many cases the number of elements is `unsigned` and the C++ compiler eliminates the entire check. But this version does not limit the allocation size; if that is considered useful (e.g. it may stop the JIT before consuming all available memory), then such a check should be added to `ArenaAllocator::allocateNewPage`, where it has minimal perf impact.

PIN shows a 0.15% improvement in instructions retired: https://1drv.ms/x/s!Av4baJYSo5pjgr1zpvRR2DPBIP_RSA