
[FEA] Interface support to reserve an address space without actually allocating it #6

Open
teju85 opened this issue May 6, 2020 · 14 comments


@teju85
Member

teju85 commented May 6, 2020

Describe the solution you'd like
@seunghwak has a very good point here about supporting memory reservation (aka over-subscription) using CUDA's virtual memory management APIs. I believe this would be a good improvement to our existing Allocator, device_buffer and host_buffer interfaces. Thus, filing this issue so that this feature request is not lost.

Additional context
Ref: https://devblogs.nvidia.com/introducing-low-level-gpu-virtual-memory-management/
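
For illustration, here is a minimal sketch of the reservation flow those driver APIs provide (the helper name `reserve_and_map` is hypothetical and error handling is omitted; this assumes the CUDA driver VMM APIs described in the blog post above):

```cpp
// Sketch only: reserve a large virtual address range up front, then back just a
// prefix of it with physical memory. No physical allocation happens at reserve time.
#include <cuda.h>

CUdeviceptr reserve_and_map(size_t reserved_bytes, size_t initial_bytes, int device)
{
  CUmemAllocationProp prop{};
  prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  prop.location.id   = device;

  // Sizes passed to the VMM APIs must be multiples of the allocation granularity.
  size_t granularity{};
  cuMemGetAllocationGranularity(&granularity, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
  auto round_up  = [&](size_t n) { return ((n + granularity - 1) / granularity) * granularity; };
  reserved_bytes = round_up(reserved_bytes);
  initial_bytes  = round_up(initial_bytes);

  // 1) Reserve the address space.
  CUdeviceptr base{};
  cuMemAddressReserve(&base, reserved_bytes, 0, 0, 0);

  // 2) Allocate physical memory for only the part needed right now.
  CUmemGenericAllocationHandle handle{};
  cuMemCreate(&handle, initial_bytes, &prop, 0);

  // 3) Map it into the reserved range and enable read/write access from this device.
  cuMemMap(base, initial_bytes, 0, handle, 0);
  CUmemAccessDesc access{};
  access.location = prop.location;
  access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
  cuMemSetAccess(base, initial_bytes, &access, 1);

  return base;
}
```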

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@divyegala
Member

@teju85 @seunghwak is this something that will be solved by using RMM directly?

@teju85
Member Author

teju85 commented Jul 19, 2021

Yes, I think we should be able to use RMM's managed_memory_resource for this purpose. But I'd like to hear from @seunghwak if this was what he had in mind or something else.
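
For illustration, that approach might look roughly like this (a minimal sketch assuming RMM's `managed_memory_resource` and `set_current_device_resource` APIs, not code from this issue):

```cpp
// Sketch only: make managed (unified) memory the default resource on the current
// device, so device_buffer / device_uvector allocations can oversubscribe GPU memory.
#include <rmm/mr/device/managed_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>

int main()
{
  rmm::mr::managed_memory_resource managed_mr;
  rmm::mr::set_current_device_resource(&managed_mr);  // subsequent RMM allocations use cudaMallocManaged
  // ... allocate device_buffer / device_uvector as usual ...
  return 0;
}
```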

@seunghwak
Contributor

My understanding is that this issue is about reserving an address space (https://developer.nvidia.com/blog/introducing-low-level-gpu-virtual-memory-management/) rather than about using managed memory.

This address-space reservation feature is mainly to avoid reallocation. When we resize a vector today, 1) it first allocates a memory block of the new size, 2) copies the old data to the new block, and 3) deallocates the old block. Address reservation lets us replace 1), 2), and 3) with just allocating a (new size - old size) block and mapping it into the reserved address space.
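
For illustration, the grow step under that scheme might look roughly like this (a sketch only, not an existing RAFT or RMM API; it assumes the buffer already lives inside a reserved range and that sizes are granularity-aligned):

```cpp
// Sketch only: grow a buffer inside a previously reserved address range by mapping
// just the delta. The existing data is never copied, and the peak footprint never
// reaches old_size + new_size.
#include <cuda.h>

void grow_in_place(CUdeviceptr base, size_t old_size, size_t new_size,
                   const CUmemAllocationProp& prop, const CUmemAccessDesc& access)
{
  size_t delta = new_size - old_size;  // assumed to be granularity-aligned

  CUmemGenericAllocationHandle handle{};
  cuMemCreate(&handle, delta, &prop, 0);           // allocate only the new part
  cuMemMap(base + old_size, delta, 0, handle, 0);  // map it right after the old data
  cuMemSetAccess(base + old_size, delta, &access, 1);
  // Data in [base, base + old_size) stays mapped and untouched.
}
```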

Managed memory, in contrast, allows host memory to serve as a lower-level buffer (larger in size but slower in speed) for device memory.

So these two are different things, and AFAIK there was some brief discussion about supporting the address-space reservation feature in RMM, but it has not been implemented there yet.

@teju85
Member Author

teju85 commented Jul 19, 2021

I now remember this discussion. This needs support for CUDA's address-reservation API, which I don't see in RMM. @harrism and/or @jrhemstad, is this being planned for RMM?

@jrhemstad

Use of the CUDA VMM APIs would be an implementation detail of a memory resource implementation. There aren't any plans to expose explicit RMM APIs for CUDA VMM.

@teju85
Member Author

teju85 commented Jul 20, 2021

@jrhemstad if there's a use-case like the one Seunghwa described above, is there a chance of getting this feature added to the RMM roadmap?

@harrism
Member

harrism commented Jul 20, 2021

OK, so the specific use case is for resizing buffers, and the reasoning is to avoid copies.

The trouble with this is that RMM implements the memory_resource interface, which originates from std. There is no reallocate in memory_resource (or anywhere in C++, for that matter; only C has realloc). So adding a feature like this would have to live downstream of the MR interface, which is not attractive. For this reason it has only been discussed for RMM; no decision has been made on whether or not to implement it.
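
For context, a minimal sketch of the hooks that interface exposes (standard C++17 std::pmr::memory_resource, shown here only to illustrate that there is no reallocate hook to override):

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>

// Sketch only: a custom resource can override allocate/deallocate/is_equal,
// but the interface has no notion of reallocation.
class my_resource : public std::pmr::memory_resource {
  void* do_allocate(std::size_t bytes, std::size_t alignment) override
  {
    return ::operator new(bytes, std::align_val_t{alignment});
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override
  {
    ::operator delete(p, bytes, std::align_val_t{alignment});
  }
  bool do_is_equal(std::pmr::memory_resource const& other) const noexcept override
  {
    return this == &other;
  }
};
```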

So let me ask: is reallocation definitely a bottleneck?

@teju85
Member Author

teju85 commented Jul 20, 2021

Fair enough @harrism. So far I haven't seen this be a bottleneck; maybe @seunghwak had a use-case in mind when he suggested this feature?

@seunghwak
Contributor

So, the biggest benefit of this is memory footprint. We can clearly live without it, but having it would allow more memory footprint optimization on our side (and we could handle bigger graphs within the GPU memory limit).

To explain this in more detail,

Without this feature, a resize needs memory of size old_size + new_size, while with address-space reservation we need only max(old_size, new_size). Say we're filtering multiple blocks; we do something like the following.

```cpp
rmm::device_uvector<int> filtered_elements(0, stream);
size_t num_inserted = 0;
for (size_t i = 0; i < num_blocks; ++i) {
  // grow to hold this block's worst-case output
  filtered_elements.resize(num_inserted + block_sizes[i], stream);
  filter(...);  // num_inserted gets updated...
}
// shrink to the number of elements that actually passed the filter
filtered_elements.resize(num_inserted, stream);
```

So, with address reservation we only need to reserve address space for sum_i block_sizes[i]. Without it, in the worst case (if 100% of the elements pass filtering), this code requires actual allocations of sum (i = 0 to num_blocks - 2) block_sizes[i] + sum_i block_sizes[i] at its peak, because the last grow allocates the full-size block while the previous, nearly full block is still alive. For example, with 4 blocks of 1 GiB each, the peak is 3 GiB + 4 GiB = 7 GiB without reservation versus 4 GiB with it.

Avoiding the copy is a secondary benefit (less important, since the copy itself is pretty fast).

@harrism
Member

harrism commented Jul 20, 2021

Address reservation can be used (as Jake pointed out) as an implementation detail of the vector to reduce memory overhead of resizing. This does not require an external interface for address reservation.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot bot pushed a commit that referenced this issue Feb 15, 2024
Demangle the error stack trace provided by GCC.
Example output:
```bash
RAFT failure at file=/workspace/raft/cpp/bench/ann/src/raft/raft_ann_bench_utils.h line=127: Ooops!
Obtained 16 stack frames
#1 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5e [0x7fb20acce45e]
#2 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::configured_raft_resources::stream_wait(CUstream_st*) const +0x2e3 [0x7fb20acd0ac3]
#3 in /workspace/raft/cpp/build/libraft_ivf_pq_ann_bench.so: raft::bench::ann::RaftIvfPQ<float, long>::search(float const*, int, int, unsigned long*, float*, CUstream_st*) const +0x63e [0x7fb20acd44fe]
#4 in ./cpp/build/ANN_BENCH: void raft::bench::ann::bench_search<float>(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective) +0xf76 [0x55853859f586]
#5 in ./cpp/build/ANN_BENCH: benchmark::internal::LambdaBenchmark<benchmark::RegisterBenchmark<void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (&)(benchmark::State&, raft::bench::ann::Configuration::Index, unsigned long, std::shared_ptr<raft::bench::ann::Dataset<float> const>, raft::bench::ann::Objective), raft::bench::ann::Configuration::Index&, unsigned long&, std::shared_ptr<raft::bench::ann::Dataset<float> const>&, raft::bench::ann::Objective&)::{lambda(benchmark::State&)#1}>::Run(benchmark::State&) +0x84 [0x558538548f14]
#6 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkInstance::Run(long, int, benchmark::internal::ThreadTimer*, benchmark::internal::ThreadManager*, benchmark::internal::PerfCountersMeasurement*) const +0x168 [0x5585385d6498]
#7 in ./cpp/build/ANN_BENCH(+0x149108) [0x5585385b7108]
#8 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoNIterations() +0x34f [0x5585385b8c7f]
#9 in ./cpp/build/ANN_BENCH: benchmark::internal::BenchmarkRunner::DoOneRepetition() +0x119 [0x5585385b99b9]
#10 in ./cpp/build/ANN_BENCH(+0x13afdd) [0x5585385a8fdd]
#11 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks(benchmark::BenchmarkReporter*, benchmark::BenchmarkReporter*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) +0x58e [0x5585385aa8fe]
#12 in ./cpp/build/ANN_BENCH: benchmark::RunSpecifiedBenchmarks() +0x6a [0x5585385aaada]
#13 in ./cpp/build/ANN_BENCH: raft::bench::ann::run_main(int, char**) +0x11ed [0x5585385980cd]
#14 in /lib/x86_64-linux-gnu/libc.so.6(+0x28150) [0x7fb213e28150]
#15 in /lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x89 [0x7fb213e28209]
#16 in ./cpp/build/ANN_BENCH(+0xbfcef) [0x55853852dcef]


```

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #2188