Question about backward #2

Closed · louieworth opened this issue Apr 17, 2021 · 11 comments

@louieworth commented Apr 17, 2021

Hi Phil,

I have watched your video on YouTube, but I still have a question about critic_loss.backward(retain_graph=True). In your solution you simply downgraded PyTorch from 1.8.1 to 1.4; I think the missing error is actually a bug in version 1.4, which is why the code appears to run bug-free there.

I have looked through a lot of material but still don't know how to solve it, so I am turning to you.
Here is my traceback:

File "main.py", line 101, in <module>
    maddpg_agent.learn(memory)
  File "maddpg.py", line 99, in learn
    critic_loss.backward(retain_graph=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 8]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
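The hint at the end of the error can be followed directly. A minimal sketch (placement in main.py is an assumption, not code from the repo): enable anomaly detection once before training so the re-raised error points at the forward operation whose saved tensor was later modified in place.

```python
import torch

# Assumed to run once before the training loop; it slows training, so debug use only.
# With this enabled, the RuntimeError above is re-raised together with a traceback
# of the forward op that produced the tensor later modified in place.
torch.autograd.set_detect_anomaly(True)
```
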
@msclar commented Jun 2, 2021

Were you able to solve it? According to Stack Overflow, the detection of this error was broken in 1.4.0 and fixed in 1.5.0, so it is possible that the gradient computation in 1.4.0 is silently incorrect.
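To make the failure mode concrete, here is a minimal standalone sketch (not taken from this repo): optimizer.step() modifies a weight in place after a backward pass that retained the graph, so the next backward() through that same graph finds the saved weight at a newer version and raises exactly this error on PyTorch >= 1.5.

```python
import torch

net = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

# requires_grad on the input forces backward to reuse the saved weight
x = torch.randn(8, 4, requires_grad=True)
loss = net(x).sum()

loss.backward(retain_graph=True)  # first backward succeeds
opt.step()                        # in-place weight update bumps the version counter
loss.backward()                   # RuntimeError: "... modified by an inplace operation"
```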

@LoveDLWujing commented Sep 3, 2021

Hi, I just fixed it. You only need to modify lines 72-89 in the maddpg.py file.

Before:

```python
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:,0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()
    target = rewards[:,agent_idx] + agent.gamma*critic_value_
    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

    agent.update_network_parameters()
```

After (all backward passes run before any actor optimizer steps, so no agent's weights are modified in place while the shared graph through mu is still needed):

```python
# zero every actor's gradients first
for agent_idx, agent in enumerate(self.agents):
    agent.actor.optimizer.zero_grad()

# run every critic update and every actor backward pass
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:,0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()

    target = rewards[:,agent_idx] + agent.gamma*critic_value_
    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    actor_loss.backward(retain_graph=True)

# only step the actor optimizers after all backward passes are done
for agent_idx, agent in enumerate(self.agents):
    agent.actor.optimizer.step()
    agent.update_network_parameters()
```

Hope this helps.

@GEYOUR commented Dec 4, 2021

I don't know how, but it actually works!

@MoMingQimio

It works!!! Thanks a lot!!!

@guanjiayi

It works, I'm extremely grateful!

@DyHAN-1 commented Mar 1, 2022

After the fix, I want to know which version of PyTorch you used. Is the PyTorch version still 1.4.0?

@Vishwanath1999

The score does not converge with the above solution.

@spinachAn

> The score does not converge with the above solution.

Hello, I also ran into this problem when testing the code: my loss does not converge and gradually increases. Do you know how to solve it?

@Emmanuel-Naive

I used another way to make the code run, and the same convergence problem appears, so I do not think this is the correct way to compute the gradients.

```python
for agent_idx, agent in enumerate(self.agents):
    critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
    critic_value_[dones[:, 0]] = 0.0
    critic_value = agent.critic.forward(states, old_actions).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    critic_loss = F.mse_loss(critic_value, target)
    agent.critic.optimizer.zero_grad()
    # detaching the loss silences the error, but it also severs the graph,
    # so backward() no longer produces gradients for the critic parameters
    critic_loss = critic_loss.clone().detach().requires_grad_(True)
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    actor_loss = agent.critic.forward(states, mu).flatten()
    actor_loss = -T.mean(actor_loss)
    # same problem here: the detached actor loss cannot update the actor
    actor_loss = actor_loss.clone().detach().requires_grad_(True)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

    agent.update_network_parameters()
```

> Were you able to solve it? According to Stack Overflow, the detection of this error was broken in 1.4.0 and fixed in 1.5.0, so it is possible that the gradient computation is incorrect.
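As a quick standalone illustration (not code from the repo) of why the detach workaround cannot learn: backward on a detached loss never reaches the network parameters.

```python
import torch

lin = torch.nn.Linear(2, 1)
loss = lin(torch.randn(4, 2)).mean()

# clone().detach() creates a new leaf tensor with no path back to lin
detached = loss.clone().detach().requires_grad_(True)
detached.backward()

print(lin.weight.grad)  # None: no gradient ever reaches the parameters
```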

@Vishwanath1999 commented Aug 7, 2022

I solved it!! At last! There were some logical errors in the implementation, but they are corrected in the snippet below:

```python
states = T.tensor(states, dtype=T.float).to(device)
actions = T.tensor(actions, dtype=T.float).to(device)
rewards = T.tensor(rewards, dtype=T.float).to(device)
states_ = T.tensor(states_, dtype=T.float).to(device)
dones = T.tensor(dones).to(device)

all_agents_new_actions = []
old_agents_actions = []

for agent_idx, agent in enumerate(self.agents):
    new_states = T.tensor(actor_new_states[agent_idx],
                          dtype=T.float).to(device)
    new_pi = agent.target_actor.forward(new_states)
    all_agents_new_actions.append(new_pi)
    old_agents_actions.append(actions[agent_idx])

new_actions = T.cat([acts for acts in all_agents_new_actions], dim=1)
old_actions = T.cat([acts for acts in old_agents_actions], dim=1)

for agent_idx, agent in enumerate(self.agents):
    # build the TD target under no_grad, so no graph through the target critic is kept
    with T.no_grad():
        critic_value_ = agent.target_critic.forward(states_, new_actions).flatten()
        target = rewards[:, agent_idx] + (1 - dones[:, 0].int()) * agent.gamma * critic_value_

    critic_value = agent.critic.forward(states, old_actions).flatten()

    critic_loss = F.mse_loss(target, critic_value)
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    agent.critic.optimizer.step()

    # replace only this agent's slot of the joint action with its current policy output,
    # so the actor loss backpropagates through this agent's actor alone
    mu_states = T.tensor(actor_states[agent_idx], dtype=T.float).to(device)
    oa = old_actions.clone()
    oa[:, agent_idx*self.n_actions:agent_idx*self.n_actions+self.n_actions] = agent.actor.forward(mu_states)
    actor_loss = -T.mean(agent.critic.forward(states, oa).flatten())
    agent.actor.optimizer.zero_grad()
    actor_loss.backward(retain_graph=True)
    agent.actor.optimizer.step()

for agent in self.agents:
    agent.update_network_parameters()
```
