Detail on WeightDrop class _setup() cuDNN RNN weight compacting issue & register_parameter()
#51
Comments
I share @esvhd's request: I'd love to have more details on this issue.
I can explain part 2 of the question, but would love an explanation of part 1. Essentially in a forward pass of the network with WeightDrop, there needs to be two separate copies of each weight parameter. The first copy is During backpropagation, the gradient is propagated through
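To make the two-copies idea concrete, here is a minimal sketch. It uses a plain `nn.Linear` rather than the repository's RNN, and the `_raw` suffix, the dropout rate, and the variable names are assumptions made for illustration, not the repository's exact code. It shows the registered raw weight receiving the gradient while the forward pass reads an unregistered, dropped-out copy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

layer = nn.Linear(4, 3, bias=False)
name = "weight"  # illustrative; the issue is about RNN weights such as weight_hh

# Setup: move the registered parameter to "<name>_raw" (suffix is an assumption).
raw = getattr(layer, name)
del layer._parameters[name]
layer.register_parameter(name + "_raw", nn.Parameter(raw.data))

# Before each forward pass: build the dropped copy from the raw parameter
# and attach it as a plain (unregistered) tensor attribute.
dropped = F.dropout(layer.weight_raw, p=0.5, training=True)
setattr(layer, name, dropped)

# The forward pass uses the dropped copy; backward reaches the raw parameter.
x = torch.randn(2, 4)
loss = layer(x).sum()
loss.backward()

param_names = [n for n, _ in layer.named_parameters()]
print(param_names)                  # only the raw weight is registered
print(layer.weight_raw.grad.shape)  # gradient lands on the raw weight
```

Because only the raw weight is registered, an optimizer built from `layer.parameters()` updates the raw weight, and the dropped copy is rebuilt from it on the next forward pass.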
@zplizzi thanks for the clarification. Let me make sure I got this right. There are two separate copies of each weight parameter: the first, un-registered, is needed by the forward pass, and the second, registered, is used in training the model for applying dropout and computing gradients? During training, the registered weights are updated, then copied to the un-registered version for eval later? So for the section of Perhaps what I need to understand better is registered vs un-registered weights in Thanks.
I think you mostly got it right. In the I'm not exactly sure the significance of
Thanks @zplizzi, very helpful.
Hi @zplizzi, thanks for the great explanation. I just tried the code in the new version of PyTorch (1.0). Sadly, this code no longer works, as there is a new parameter check in the RNN internal calculation I'm thinking to change the code by moving the parts where
@zplizzi I do get the point that we are using the dropped version in the forward pass, but when backpropagating we are updating the raw weights. However, in the forward propagation, I believe PyTorch searches for the weights named "weight_hh" among the parameters it has, and in that case it cannot find them because the name has changed. So how does it perform the forward prop on the dropout mask?
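One way to look at the name-lookup question is with a toy module, sketched below. `Toy` is a hypothetical class written for this illustration, and the all-zero "dropped" weight is just a stand-in for a real dropout mask. The sketch assumes the forward code reads the weight through ordinary attribute access (`self.weight_hh`), which finds a plain tensor attribute even when no parameter is registered under that name. Built-in cuDNN-backed RNN modules may handle their weights differently, as the PyTorch 1.0 comment above suggests, so this illustrates the mechanism rather than describing every PyTorch version.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    """Hypothetical module whose forward reads self.weight_hh by attribute."""
    def __init__(self):
        super().__init__()
        self.weight_hh = nn.Parameter(torch.ones(2, 2))

    def forward(self, x):
        return x @ self.weight_hh

toy = Toy()

# Rename the registered parameter, as the setup step under discussion does.
raw = toy.weight_hh
del toy._parameters["weight_hh"]
toy.register_parameter("weight_hh_raw", nn.Parameter(raw.data))

# At this point the original name is gone entirely.
missing_after_rename = not hasattr(toy, "weight_hh")

# Re-attach a tensor under the original name (zeros stand in for dropout).
toy.weight_hh = toy.weight_hh_raw * 0.0

# The name now resolves through the instance __dict__, not _parameters.
in_instance_dict = "weight_hh" in toy.__dict__
in_parameters = "weight_hh" in toy._parameters

out = toy(torch.ones(1, 2))
print(missing_after_rename, in_instance_dict, in_parameters)
print(out)
```

In this sketch the forward pass never consults the parameter registry by name; it only needs some tensor to be reachable as `self.weight_hh` at call time.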
Hi there,
cc @Smerity
Thanks for sharing the code first of all. I've been diving into the details and would really appreciate it if you could share some insight into the WeightDrop class's self._setup() method.
I have 2 questions.
1. Regarding the comment on the cuDNN RNN weight compacting issue (code here): could anyone expand on what exactly this issue is?
2. Why does the code delete parameters and register them again by calling register_parameter()? (code here)
Thanks.