
out of memory: GPU memory keeps growing during training #252

Open
songhat opened this issue Feb 23, 2023 · 4 comments

Comments

@songhat

songhat commented Feb 23, 2023

Things I have already tried:
loss.item() is fine.
The data loaded by the dataloader is not growing either.
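
In case it helps to narrow this down, here is a minimal sketch of logging the allocated GPU memory every N iterations so you can see which part of the loop the growth comes from. The `dataloader`, `model`, and `train_step` names are placeholders, not the repo's exact train.py code:

    import torch

    # Placeholder training loop, only to show where to log GPU memory;
    # `dataloader`, `model`, and `train_step` stand in for the project's objects.
    for ii, batch in enumerate(dataloader):
        train_step(model, batch)
        if ii % 100 == 0:
            alloc = torch.cuda.memory_allocated() / 1024 ** 2
            peak = torch.cuda.max_memory_allocated() / 1024 ** 2
            print(f'iter {ii}: allocated {alloc:.1f} MiB, peak {peak:.1f} MiB')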

@deepxzy

deepxzy commented Mar 21, 2023

Someone mentioned this later: at line 76 of train.py, having these two calls in the wrong order seems to cause a GPU memory leak.
Change

    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):

to

    for ii, (img, bbox_, label_, scale) in enumerate(tqdm(dataloader)):

@songhat
Author

songhat commented Mar 21, 2023

@deepxzy Hi! Thanks for your answer. I tried your suggestion, but it doesn't work!

@fatejzz

fatejzz commented Aug 10, 2023

I have a similar problem of memory growing continuously during training. After debugging, I found that memory usage keeps increasing during the eval stage.
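
If the growth really does come from the eval stage, one common cause is running the eval forward passes with autograd enabled, so the activation graph of every eval batch is kept around. A minimal sketch of wrapping the eval loop in torch.no_grad(); `eval_dataloader` and the forward call are placeholders, not the repo's actual eval() implementation:

    import torch

    # Sketch only, assuming a PyTorch model and an eval dataloader.
    # Running the forward passes under torch.no_grad() stops autograd
    # from holding on to the activations of every eval batch.
    model.eval()
    with torch.no_grad():
        for img, bbox_, label_, scale in eval_dataloader:
            pred = model(img.cuda().float())  # placeholder forward call
            # ... accumulate detections / metrics here ...
    model.train()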

@hungphandinh92it

I train with the NVIDIA PyTorch Docker image and also have this problem. Not using pin_memory resolved it for me. In train.py:

    test_dataloader = data_.DataLoader(testset, batch_size=1, num_workers=opt.test_num_workers, shuffle=False, pin_memory=False)
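
For context, pin_memory=True makes the DataLoader copy each batch into page-locked (pinned) host memory so host-to-GPU transfers can run faster; turning it off trades some transfer speed for lower pinned host-memory usage.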
