
'CUDA out of memory' in CUDA 11.0 #290

Closed
ybbbbt opened this issue Jan 7, 2021 · 3 comments

Comments

@ybbbbt

ybbbbt commented Jan 7, 2021

Describe the bug
Dear Authors,
Thanks for the great work. I have run into an 'out of memory' error when using MinkowskiEngine (mainly for FCGF) with CUDA 11.0, although the same code works fine with CUDA 10.2.
The output is shown below:

RuntimeError: CUDA out of memory. Tried to allocate 23.81 GiB (GPU 0; 10.76 GiB total capacity; 6.08 MiB already allocated; 5.03 GiB free; 22.00 MiB reserved in total by PyTorch)

To Reproduce

import torch
import numpy as np
import MinkowskiEngine as ME

xyz = np.random.uniform(-10, 10, (2000, 3))  # [N, 3]
feats = []
feats.append(np.ones((len(xyz), 1)))
feats = np.hstack(feats)

voxel_size = 0.025
# Voxelize xyz and feats
coords = np.floor(xyz / voxel_size)
_, unique_map, inverse_map = ME.utils.sparse_quantize(coords, return_index=True, return_inverse=True)
inds = unique_map
coords = coords[inds]
return_coords = xyz[inds]
coords = ME.utils.batched_coordinates([coords])

feats = feats[inds]

feats = torch.tensor(feats, dtype=torch.float32)
coords = coords.to(dtype=torch.int32)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

stensor = ME.SparseTensor(feats, coordinates=coords, device=device)

xx = torch.ones((4, 480, 640)).cuda()
bs, ys, xs = torch.where(xx > 0)

Desktop (please complete the following information):

  • CUDA version: 11.0
  • NVIDIA Driver version: 460.27.04
  • OS: Ubuntu 18.04
  • Minkowski Engine version 0.5.0

Additional context
When I comment out stensor = ME.SparseTensor(feats, coordinates=coords, device=device), the line bs, ys, xs = torch.where(xx > 0) works without running out of memory.

@ybbbbt ybbbbt changed the title CUDA out of memory in CUDA 11.0 'CUDA out of memory' in CUDA 11.0 Jan 7, 2021
@chrischoy
Contributor

chrischoy commented Jan 7, 2021

This bug is not related to Minkowski Engine. You are trying to allocate 23.81 GiB of memory with torch.where.
The bug seems to be fixed in the latest versions of torch (>= 1.7.0).

@ybbbbt
Author

ybbbbt commented Jan 7, 2021

Hi, thanks for your speedy reply.

I use PyTorch 1.7.0, installed with conda install pytorch==1.7.0 torchvision cudatoolkit=11.0 -c pytorch.

With CUDA 10.2, the code above consumes no more than 1 GB of GPU memory.
With CUDA 11.0, even if I reduce the variable xx to a tiny size (e.g. 1×4×6, see the code below), the out-of-memory issue persists.
But when I remove the ME.SparseTensor(*) call, torch.where does not allocate such a large amount of memory.

xx = torch.ones((1, 4, 6)).cuda()
bs, ys, xs = torch.where(xx > 0)
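For reference, torch.where with a single condition argument is documented to be equivalent to condition.nonzero(as_tuple=True), so comparing the two forms can help confirm whether the failing path is the underlying nonzero kernel rather than torch.where itself (a minimal sketch on CPU tensors; add .cuda() to exercise the GPU kernel):

```python
import torch

# torch.where(cond) with one argument is documented as equivalent to
# cond.nonzero(as_tuple=True); if both forms fail the same way on GPU,
# the suspect is the shared nonzero kernel.
xx = torch.ones((1, 4, 6))
bs1, ys1, xs1 = torch.where(xx > 0)
bs2, ys2, xs2 = (xx > 0).nonzero(as_tuple=True)

same = all(torch.equal(u, v) for u, v in [(bs1, bs2), (ys1, ys2), (xs1, xs2)])
print(same)  # True
```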

Besides, I also hit the same out-of-memory issue when writing code like a[a > th] = 0.
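For what it's worth, boolean-index assignment like a[a > th] = 0 also goes through an index/nonzero kernel on CUDA; the same effect can be expressed with masked_fill_, which avoids materializing the index tensor (a hedged sketch on CPU, with th chosen arbitrarily):

```python
import torch

th = 0.5
a = torch.tensor([0.1, 0.7, 0.4, 0.9])
b = a.clone()

a[a > th] = 0              # boolean-index assignment (uses an index kernel)
b.masked_fill_(b > th, 0)  # in-place masked fill, same result

print(torch.equal(a, b))  # True
```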

Even torch BCELoss behaves strangely. (I am sure the input size and target size are identical.)

  File "/home/aaa/anaconda3/envs/torch_1.7_cuda_11.0_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/aaa/anaconda3/envs/torch_1.7_cuda_11.0_py37/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 530, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/home/aaa/anaconda3/envs/torch_1.7_cuda_11.0_py37/lib/python3.7/site-packages/torch/nn/functional.py", line 2519, in binary_cross_entropy
    "Please ensure they have the same size.".format(target.size(), input.size()))
ValueError: Using a target size (torch.Size([0])) that is different to the input size (torch.Size([1])) is deprecated. Please ensure they have the same size

This does not happen when I remove ME.SparseTensor(*) or downgrade to CUDA 10.2.
So I guess this may be a compatibility issue with CUDA 11.0?

Anyway, thank you very much for taking the time to look into this issue. I will also try PyTorch 1.7.1 as soon as possible.
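When juggling toolkits like this, it can also help to confirm which CUDA build PyTorch is actually using, since the CUDA version the driver supports and the toolkit PyTorch was compiled against can differ (a small diagnostic sketch):

```python
import torch

# torch.version.cuda is the CUDA toolkit PyTorch was built against
# (None on CPU-only builds); the installed driver may support a newer
# CUDA version than this.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```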

@chrischoy
Contributor

chrischoy commented Jan 7, 2021

I see. This was indeed a CUDA error, and I was able to reproduce it on 11.0. Fortunately, it appears to be fixed in 11.1.

Please go to https://developer.nvidia.com/cuda-11.1.1-download-archive to download 11.1.

wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda_11.1.1_455.32.00_linux.run
sudo sh cuda_11.1.1_455.32.00_linux.run --toolkit --silent --override

# Install MinkowskiEngine with CUDA 11.1
export CUDA_HOME=/usr/local/cuda-11.1; pip install MinkowskiEngine -v --no-deps 

chrischoy added a commit that referenced this issue Jan 7, 2021
Tanazzah pushed a commit to Tanazzah/MinkowskiEngine that referenced this issue Feb 9, 2024