Question about backward #2

Closed · louieworth opened this issue Apr 17, 2021 · 11 comments

@louieworth commented Apr 17, 2021

Hi Phil,

I have watched your video on YouTube, but I still have a question about critic_loss.backward(retain_graph=True). In your solution you simply downgraded PyTorch from 1.8.1 to 1.4; I think the missing error is actually a bug in version 1.4, which is why the code appears to run bug-free there.

I have looked through a lot of material but still don't know how to solve it, so I am turning to you.
Here is my traceback:

File "main.py", line 101, in <module>
    maddpg_agent.learn(memory)
  File "maddpg.py", line 99, in learn
    critic_loss.backward(retain_graph=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 8]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
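The hint at the end of the error can be followed directly. A minimal sketch (placement in main.py is an assumption, not code from the repo): enable anomaly detection once before training so the re-raised error points at the forward operation whose saved tensor was later modified in place.

```python
import torch

# Assumed to run once before the training loop; it slows training, so debug use only.
# With this enabled, the RuntimeError above is re-raised together with a traceback
# of the forward op that produced the tensor later modified in place.
torch.autograd.set_detect_anomaly(True)
```
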
@msclar commented Jun 2, 2021

Were you able to solve it? According to Stack Overflow, the detection of this error was broken in 1.4.0 and fixed in 1.5.0, so it is possible that the gradient computation in 1.4.0 is silently incorrect.
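To make the failure mode concrete, here is a minimal standalone sketch (not taken from this repo): optimizer.step() modifies a weight in place after a backward pass that retained the graph, so the next backward() through that same graph finds the saved weight at a newer version and raises exactly this error on PyTorch >= 1.5.

```python
import torch

net = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

# requires_grad on the input forces backward to reuse the saved weight
x = torch.randn(8, 4, requires_grad=True)
loss = net(x).sum()

loss.backward(retain_graph=True)  # first backward succeeds
opt.step()                        # in-place weight update bumps the version counter
loss.backward()                   # RuntimeError: "... modified by an inplace operation"
```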

@LoveDLWujing commented Sep 3, 2021

Hi, I just fixed it. You only need to modify lines 72-89 in the maddpg.py file.

Before:

```python
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:,0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()
    target = rewards[:,agent_idx] + agent.gamma*critic_value_
    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

    agent.update_network_parameters()
```

After (all backward passes run before any actor optimizer steps, so no agent's weights are modified in place while the shared graph through mu is still needed):

```python
# zero every actor's gradients first
for agent_idx, agent in enumerate(self.agents):
    agent.actor.optimizer.zero_grad()

# run every critic update and every actor backward pass
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:,0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()

    target = rewards[:,agent_idx] + agent.gamma*critic_value_
    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    actor_loss.backward(retain_graph=True)

# only step the actor optimizers after all backward passes are done
for agent_idx, agent in enumerate(self.agents):
    agent.actor.optimizer.step()
    agent.update_network_parameters()
```

Hope this helps.

@GEYOUR commented Dec 4, 2021

I don't know how, but it actually works!

@MoMingQimio

It works!!! Thanks a lot!!!

@guanjiayi

It works, I'm extremely grateful!

@DyHAN-1 commented Mar 1, 2022

After the fix, I want to know which version of PyTorch you used. Is the PyTorch version still 1.4.0?

@Vishwanath1999

The score does not converge with the above solution.

@spinachAn

> The score does not converge with the above solution.

Hello, I also ran into this problem when testing the code: my loss does not converge and gradually increases. Do you know how to solve it?

@Emmanuel-Naive

I used another way to make the code run, and the same convergence problem appears, so I do not think this is the correct way to compute the gradients.

```python
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:, 0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    critic_loss = F.mse_loss(critic_value, target)
    agent.critic.optimizer.zero_grad()
    # detaching the loss silences the error, but it also severs the graph,
    # so backward() no longer produces gradients for the critic parameters
    critic_loss = critic_loss.clone().detach().requires_grad_(True)
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    # same problem here: the detached actor loss cannot update the actor
    actor_loss = actor_loss.clone().detach().requires_grad_(True)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

    agent.update_network_parameters()
```

> Were you able to solve it? According to Stack Overflow, the detection of this error was broken in 1.4.0 and fixed in 1.5.0, so it is possible that the gradient computation is incorrect.
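As a quick standalone illustration (not code from the repo) of why the detach workaround cannot learn: backward on a detached loss never reaches the network parameters.

```python
import torch

lin = torch.nn.Linear(2, 1)
loss = lin(torch.randn(4, 2)).mean()

# clone().detach() creates a new leaf tensor with no path back to lin
detached = loss.clone().detach().requires_grad_(True)
detached.backward()

print(lin.weight.grad)  # None: no gradient ever reaches the parameters
```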

@Vishwanath1999 commented Aug 7, 2022

I solved it!! At last! There were some logical errors in the implementation, but they are corrected in the snippet below:

```python
states = T.tensor(states, dtype=T.float).to(device)
actions = T.tensor(actions, dtype=T.float).to(device)
rewards = T.tensor(rewards, dtype=T.float).to(device)
states_ = T.tensor(states_, dtype=T.float).to(device)
dones = T.tensor(dones).to(device)

all_agents_new_actions = []
old_agents_actions = []

for agent_idx, agent in enumerate(self.agents):
    new_states = T.tensor(actor_new_states[agent_idx],
                          dtype=T.float).to(device)
    new_pi = agent.target_actor.forward(new_states)
    all_agents_new_actions.append(new_pi)
    old_agents_actions.append(actions[agent_idx])

new_actions = T.cat([acts for acts in all_agents_new_actions], dim=1)
old_actions = T.cat([acts for acts in old_agents_actions], dim=1)

for agent_idx, agent in enumerate(self.agents):
    # build the TD target under no_grad, so no graph through the target critic is kept
    with T.no_grad():
        critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
        target = rewards[:, agent_idx] + (1 - dones[:, 0].int()) * agent.gamma * critic_value_

    critic_value = agent.critic.forward(states, old_actions).flatten()

    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    # replace only this agent's slot of the joint action with its current policy output,
    # so the actor loss backpropagates through this agent's actor alone
    mu_states = T.tensor(actor_states[agent_idx], dtype=T.float).to(device)
    oa = old_actions.clone()
    oa[:, agent_idx*self.n_actions:agent_idx*self.n_actions+self.n_actions] = agent.actor.forward(mu_states)
    actor_loss = -T.mean(agent.critic.forward(states, oa).flatten())
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

for agent in self.agents:
    agent.update_network_parameters()
```
