
Model trained on multiple GPUs fails when tested on a single GPU #94

Open
wqt2019 opened this issue Oct 11, 2018 · 6 comments

Comments

@wqt2019

wqt2019 commented Oct 11, 2018

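I trained a DenseNet model on 8 GPUs. When I test it on a single GPU, I get the error: You are trying to load a weight file containing 1 layers into a model with 55 layers. If I change the loading call to basemodel.load_weights(modelPath, by_name=True), I do get output, but the results are wildly wrong. I could not find a solution online either.
The training code includes model = multi_gpu_model(model, 8).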
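
To illustrate why both symptoms appear, here is a toy sketch in plain Python. It does not call Keras. It only mimics two checks that Keras 2.x appears to perform when loading HDF5 weights: comparing the number of weighted layers when by_name is off, and matching layer names when by_name=True. The layer and weight names below are hypothetical. The assumption is that the multi-GPU wrapper stores the whole original network as one nested layer, so the saved file has a single top-level layer that owns weights.

```python
# Toy stand-in for a weight file saved from the multi-GPU wrapper.
# Keys are top-level layer names, values are the weight names stored for each.
# All names here are hypothetical.
wrapper_file = {
    "the_input": [],
    "lambda_1": [],
    "lambda_2": [],
    "model_1": ["conv1/kernel", "conv1/bias", "out/kernel", "out/bias"],
    "concatenate_1": [],
}

# Names of the weighted layers in the plain single-GPU model (hypothetical).
single_model_layers = ["conv1", "out"]


def layers_with_weights(weight_file):
    """Return the names of saved layers that actually store weights."""
    return [name for name, weights in weight_file.items() if weights]


def check_topological_load(weight_file, model_layers):
    """Mimic the layer-count check done when loading without by_name."""
    saved = layers_with_weights(weight_file)
    if len(saved) != len(model_layers):
        return ("You are trying to load a weight file containing %d layers "
                "into a model with %d layers" % (len(saved), len(model_layers)))
    return None


def layers_loaded_by_name(weight_file, model_layers):
    """Mimic by_name=True: only layers whose names match receive weights."""
    saved = set(layers_with_weights(weight_file))
    return [name for name in model_layers if name in saved]


print(check_topological_load(wrapper_file, single_model_layers))
print(layers_loaded_by_name(wrapper_file, single_model_layers))
```

In this sketch the by_name path matches no layer at all, which is consistent with the report above: the call succeeds, the single-GPU model keeps its initial weights, and the predictions look badly wrong.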

@zylo117

zylo117 commented Oct 16, 2018

When saving weights, save from the single-GPU model.
Write it like this: multi_gpu_model = multi_gpu_model(single_gpu_model, 8).
Use the multi-GPU model to compile and train, and use the single-GPU model to save the weights or the model.
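
If only the wrapped model is at hand (for example inside a callback), the single-GPU model can usually be recovered from it. The helper below is a minimal sketch, not part of Keras: the function names are hypothetical, and it assumes the Keras 2.x multi_gpu_model wrapper exposes the original model as exactly one nested layer that has its own layers attribute. It is duck-typed, so it only relies on the layers attribute and a save_weights(path) method.

```python
def find_template_model(parallel_model):
    """Return the nested single-GPU model inside a multi-GPU wrapper.

    Assumes the wrapper exposes `.layers` and that exactly one of those
    layers is itself a model (it has its own `.layers` attribute).
    """
    nested = [layer for layer in parallel_model.layers
              if hasattr(layer, "layers")]
    if len(nested) != 1:
        raise ValueError(
            "expected exactly one nested model, found %d" % len(nested))
    return nested[0]


def save_template_weights(parallel_model, path):
    """Save only the nested model's weights and return that model."""
    template = find_template_model(parallel_model)
    template.save_weights(path)
    return template
```

After training, something like save_template_weights(parallel_model, 'single_gpu_weights.h5') (the file name is only an example) should produce a file that a freshly built single-GPU model can read with load_weights. Keeping your own reference to the single-GPU model, as suggested above, remains the simpler option when you control the training script.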

@lianqingsong

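How did you set this up? I am using the repo author's source code without any changes, and training does not seem to use the GPU at all. Do you know how to fix this? @evanfly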

@wqt2019
Author

wqt2019 commented Jan 14, 2019

Solved by following this: http://www.codeleading.com/article/231257812/

@WHQCHINA

@evanfly I followed the link you gave, but multi-GPU training still fails with this error:
-----------Start training-----------
Traceback (most recent call last):
File "train.py", line 189, in
callbacks = [checkpoint, changelr, tensorboard])
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1415, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_generator.py", line 93, in fit_generator
callbacks.set_model(callback_model)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 52, in set_model
callback.set_model(model)
File "/usr/local/lib/python2.7/dist-packages/keras/callbacks.py", line 746, in set_model
self.sess = K.get_session()
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 197, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 887, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1110, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1286, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'replica_0/lambda_3/Slice': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Can you see what the problem is?

@wqt2019
Author

wqt2019 commented Mar 1, 2019

Just add model = multi_gpu_model(model, 8) right after model = Model(inputs=[input, labels, input_length, label_length], outputs=loss_out). Train and save the model on multiple GPUs, then write a separate script that loads the weights and saves them again:
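# -*- coding: utf-8 -*-
# Convert weights saved from the 8-GPU wrapper into a single-GPU model file.
# Written for Python 2, where reload() is a builtin.
import os

import keys
import densenet
from keras.layers import Input
from keras.models import Model
from keras.utils import multi_gpu_model

reload(densenet)
characters = keys.alphabet[:]
characters = characters[1:] + u'卍'
nclass = len(characters)

# Build the single-GPU model first and keep a reference to it.
input = Input(shape=(32, None, 1), name='the_input')
y_pred = densenet.dense_cnn(input, nclass)
single_basemodel = Model(inputs=input, outputs=y_pred)

# Wrap it the same way as in training so the multi-GPU weight file fits.
modelPath = os.path.join(os.getcwd(), 'densenet-06-2.76.h5')
basemodel = multi_gpu_model(single_basemodel, gpus=8)
basemodel.load_weights(modelPath)

# The wrapper shares its weights with single_basemodel, so saving the
# single-GPU model writes a file that loads on one GPU.
single_basemodel.save('densenet_single_gpu-7_06-2.76.h5')  # trained on multiple GPUs, saved as single-GPU
print('done...')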

@WHQCHINA

WHQCHINA commented Mar 1, 2019

@evanfly Thanks, it is solved now.


4 participants