This repository has been archived by the owner on Oct 30, 2019. It is now read-only.
The code runs without errors on my own dataset with a single GPU and with 4 GPUs, but with 8 GPUs it fails with the error below. I tried many times and hit the same error each time. Why is gradOutput reduced? Is something wrong with the data?
| Epoch: [1][87/92] Time 6.657 Data 0.000 Err 1.2427 top1 100.000 top5 96.904
| Epoch: [1][88/92] Time 6.647 Data 0.000 Err 1.2477 top1 100.000 top5 97.309
| Epoch: [1][89/92] Time 6.410 Data 0.000 Err 1.1976 top1 100.000 top5 97.244
| Epoch: [1][90/92] Time 5.759 Data 0.000 Err 1.1808 top1 100.000 top5 97.732
| Epoch: [1][91/92] Time 6.149 Data 0.000 Err 1.1948 top1 100.000 top5 97.493
/home/scs4850/torch/install/bin/luajit: .../scs4850/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 1 callback] /home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:67:
In 19 module of nn.Sequential:
/home/scs4850/torch/install/share/lua/5.1/nn/THNN.lua:110: input and gradOutput have different number of elements: input[135000 x 26] has 3510000 elements, while gradOutput[121500 x 26] has 3159000 elements at /tmp/luarocks_cunn-scm-1-6007/cunn/lib/THCUNN/generic/Threshold.cu:44
stack traceback:
[C]: in function 'v'
/home/scs4850/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'Threshold_updateGradInput'
/home/scs4850/torch/install/share/lua/5.1/nn/Threshold.lua:32: in function 'updateGradInput'
/home/scs4850/torch/install/share/lua/5.1/nn/Module.lua:31: in function </home/scs4850/torch/install/share/lua/5.1/nn/Module.lua:29>
[C]: in function 'xpcall'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:78>
[C]: in function 'xpcall'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/scs4850/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:84: in function </home/scs4850/torch/install/share/lua/5.1/nn/Sequential.lua:78>
[C]: in function 'xpcall'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:65: in function </home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
/home/scs4850/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
.../scs4850/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:717: in function 'exec'
...0/torch/install/share/lua/5.1/cunn/DataParallelTable.lua:229: in function 'backward'
./train.lua:77: in function 'train'
main.lua:50: in main chunk
[C]: in function 'dofile'
...4850/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
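To make the question concrete, the shapes in the Threshold error can be checked directly. The numbers below are copied from the log; the interpretation (that the row counts are per-GPU batch shards) is my own guess:

```python
# Arithmetic on the shapes reported by Threshold.cu:
#   input[135000 x 26]      has 3,510,000 elements (forward pass)
#   gradOutput[121500 x 26] has 3,159,000 elements (backward pass)
input_rows, grad_rows, cols = 135000, 121500, 26

input_elems = input_rows * cols  # elements the module saw going forward
grad_elems = grad_rows * cols    # elements it received going backward

print(input_elems, grad_elems)  # 3510000 3159000
print(grad_rows / input_rows)   # 0.9 -> gradOutput is exactly 10% smaller
```

So gradOutput is not corrupted arbitrarily; it is exactly 90% of the input, which looks like one replica receiving a differently sized batch shard rather than bad data.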
Thanks
@colesbury Yes, I adapted the code to my own project, but I only changed the dimensions of the related variables. The strange part is that it runs without error on a single GPU and on 4 GPUs, as mentioned above.
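One thing worth ruling out (an assumption on my part, not something confirmed above): DataParallelTable splits each mini-batch across the GPUs, so a batch size that is not a multiple of 8 can produce shards of different sizes, which would explain why 4 GPUs work but 8 fail. A quick sketch of that split, with illustrative numbers:

```python
# Hypothetical per-GPU shard sizes for a given batch size.
# This mimics an even split with the remainder spread over the first GPUs;
# the exact splitting policy in DataParallelTable may differ.
def shard_sizes(batch_size, n_gpu):
    base = batch_size // n_gpu
    rem = batch_size % n_gpu
    # the first `rem` GPUs each get one extra sample
    return [base + (1 if i < rem else 0) for i in range(n_gpu)]

print(shard_sizes(32, 4))  # [8, 8, 8, 8] -> even split, shapes agree
print(shard_sizes(36, 8))  # [5, 5, 5, 5, 4, 4, 4, 4] -> uneven shards
```

Checking that the effective batch size is divisible by the GPU count (and that the last batch of an epoch is not smaller) would confirm or eliminate this.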