
out of memory: GPU memory keeps growing during training #252

Open
songhat opened this issue Feb 23, 2023 · 4 comments

Comments

@songhat

songhat commented Feb 23, 2023

Things I have already tried:
loss.item() is fine.
The data loaded by the dataloader is not growing either.
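
In case it helps to narrow this down, here is a minimal sketch of logging the allocated GPU memory every N iterations so you can see which part of the loop the growth comes from. The `dataloader`, `model`, and `train_step` names are placeholders, not the repo's exact train.py code:

    import torch

    # Placeholder training loop, only to show where to log GPU memory;
    # `dataloader`, `model`, and `train_step` stand in for the project's objects.
    for ii, batch in enumerate(dataloader):
        train_step(model, batch)
        if ii % 100 == 0:
            alloc = torch.cuda.memory_allocated() / 1024 ** 2
            peak = torch.cuda.max_memory_allocated() / 1024 ** 2
            print(f'iter {ii}: allocated {alloc:.1f} MiB, peak {peak:.1f} MiB')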

@deepxzy

deepxzy commented Mar 21, 2023

Someone mentioned this later: at line 76 of train.py, having these two calls in the wrong order seems to cause a GPU memory leak.
Change

    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):

to

    for ii, (img, bbox_, label_, scale) in enumerate(tqdm(dataloader)):

@songhat
Author

songhat commented Mar 21, 2023

@deepxzy Hi! Thanks for your answer. I tried your suggestion, but it doesn't work!

@fatejzz

fatejzz commented Aug 10, 2023

I have a similar problem of memory growing continuously during training. After debugging, I found that memory usage keeps increasing during the eval stage.
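
If the growth really does come from the eval stage, one common cause is running the eval forward passes with autograd enabled, so the activation graph of every eval batch is kept around. A minimal sketch of wrapping the eval loop in torch.no_grad(); `eval_dataloader` and the forward call are placeholders, not the repo's actual eval() implementation:

    import torch

    # Sketch only, assuming a PyTorch model and an eval dataloader.
    # Running the forward passes under torch.no_grad() stops autograd
    # from holding on to the activations of every eval batch.
    model.eval()
    with torch.no_grad():
        for img, bbox_, label_, scale in eval_dataloader:
            pred = model(img.cuda().float())  # placeholder forward call
            # ... accumulate detections / metrics here ...
    model.train()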

@hungphandinh92it

I train with the NVIDIA PyTorch Docker image and also have this problem. Not using pin_memory resolved it for me. In train.py:

    test_dataloader = data_.DataLoader(testset, batch_size=1, num_workers=opt.test_num_workers, shuffle=False, pin_memory=False)
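
For context, pin_memory=True makes the DataLoader copy each batch into page-locked (pinned) host memory so host-to-GPU transfers can run faster; turning it off trades some transfer speed for lower pinned host-memory usage.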
