W&B Bug Report: Information missing from 'Overview' Panel #2741

Closed
glenn-jocher opened this issue Apr 8, 2021 · 30 comments
Labels
bug (Something isn't working), Stale

Comments

@glenn-jocher
Member

glenn-jocher commented Apr 8, 2021

🐛 Bug

Default W&B logging is not recording the same information to the 'Overview' screen as before, in particular 'Command', which is required to reproduce, as well as System Hardware, Python Version, and others.

To Reproduce (REQUIRED)

In Colab Notebook: https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb

  1. Run Setup cell
  2. Run W&B login cell
  3. Run Train cell

View bug at W&B console: https://wandb.ai/glenn-jocher/YOLOv5

Earlier Overview (GOOD):

https://wandb.ai/glenn-jocher/batch_size/runs/1hogmvhh/overview?workspace=user-glenn-jocher
Screenshot 2021-04-08 at 23 49 17

Current Overview (BAD):

https://wandb.ai/glenn-jocher/YOLOv5/runs/23u3uq0x/overview?workspace=user-glenn-jocher
Screenshot 2021-04-08 at 23 49 20

Screenshot 2021-04-08 at 23 42 24
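While the Overview panel is incomplete, the fields reported missing above (the command, the Python version, basic system information) can be captured by hand so a run stays reproducible. The following is only a sketch: `collect_run_metadata` is a hypothetical helper, not part of YOLOv5 or W&B, and the choice of fields is an assumption based on the bug description.

```python
import platform
import sys


def collect_run_metadata(argv=None):
    """Gather the launch command and basic environment details for a run.

    Hypothetical helper: the field names are illustrative, not a W&B schema.
    """
    argv = sys.argv if argv is None else argv
    return {
        "command": " ".join(argv),                     # how the script was launched
        "python_version": platform.python_version(),   # e.g. "3.8.10"
        "os": platform.platform(),                     # OS name and version string
        "hostname": platform.node(),                   # machine name, may be empty
    }


if __name__ == "__main__":
    for key, value in collect_run_metadata().items():
        print(f"{key}: {value}")
```

The resulting dictionary could then be stored with the run, for example through `wandb.config.update(...)`, so the command remains recoverable even if the Overview panel does not show it.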

glenn-jocher added the bug label Apr 8, 2021
@glenn-jocher
Member Author

@AyushExel this second bug appeared at the same time as #2740 and may be related. Can you please look into this? I'm interested in launching a new activation study but I can't start this currently as the trainings won't be logged correctly in W&B.

@AyushExel
Contributor

@glenn-jocher I think this isn't related to integration. Probably someone introduced a front-end bug which doesn't display all the info. This will be resolved. I'll raise a ticket. Looking into the other issue now. That too, I think isn't related to the integration. Looking.

@AyushExel
Contributor

@glenn-jocher I cannot reproduce this. I just ran this run from the official yolo colab -> https://wandb.ai/cayush/yoloV5/runs/3fgnniz5?workspace=user-cayush
I see all the info in the overview section and all the images are visible too. Are you able to reproduce these on your end?

@glenn-jocher
Member Author

@AyushExel that's strange. I restarted my computer and retrained again just now with the same problem. Checked on Chrome and Safari, and on my mobile device but no images are showing. Two new training runs are https://wandb.ai/glenn-jocher/YOLOv5

I see your results are showing up well though, https://wandb.ai/cayush/yoloV5/runs/3fgnniz5?workspace=user-cayush looks good.

@AyushExel
Contributor

@glenn-jocher That's weird. I think this isn't a problem originating from the repo as the same colab is working fine for me and no one else has reported this. I'll get someone to look into this issue today.
Meanwhile, can you try a few things to see if the problem still persists:

  • Delete the YOLOv5 project from your account and then re-run the example.
  • Try to log into a new project by setting --project as something other than YOLOv5.

I'm looking into this and I'll reach out later today as soon as I know more.
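The second suggestion above, logging to a differently named project, comes down to changing one argument. Here is a minimal sketch that builds such a command. Only `--project` comes from this thread; `build_train_command` is a hypothetical helper and the other flag values are assumed typical YOLOv5 training arguments.

```python
import shlex


def build_train_command(project, epochs=3):
    """Build a train.py invocation that logs to the given project name.

    Assumption: every flag except --project is a placeholder value.
    """
    return [
        "python", "train.py",
        "--img", "640",
        "--batch", "16",
        "--epochs", str(epochs),
        "--data", "coco128.yaml",
        "--weights", "yolov5s.pt",
        "--project", project,  # anything other than the default project name
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command for a test project named "bug".
    print(shlex.join(build_train_command("bug")))
```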

@AyushExel
Contributor

@glenn-jocher Let me know as soon as you try these out. It will definitely help in debugging.

Also, can you move the description of Issue #2740 here as well, so that it's easier to tend to both issues in one place?

@glenn-jocher
Member Author

@AyushExel I deleted the YOLOv5 project yesterday when it first happened, but I'll try again today, and also try logging to --project bug. Oh, this is new: now some images appear and others don't. The Overview panel looks better. I checked my storage to see if it was full, but it says 46G/100G so I should be fine there.

https://wandb.ai/glenn-jocher/bug/runs/2he5mx9f
Screenshot 2021-04-09 at 16 18 55

@AyushExel
Contributor

@glenn-jocher The above run loads perfectly for me. And all the images are present in the files tab.

@glenn-jocher
Member Author

glenn-jocher commented Apr 9, 2021

@AyushExel I tried again just now at the same link and now all images appear correctly (!). I tried a new training run to the default project though, and got the same missing images: https://wandb.ai/glenn-jocher/YOLOv5/runs/17tt5p8w

EDIT: I'll try to delete both bug and YOLOv5 projects and rerun

@AyushExel
Contributor

@glenn-jocher Awesome, this means that there's some edge case with that project in particular. This is helpful. Thanks.
Also, don't worry about the storage. The limit isn't enforced yet.

@glenn-jocher
Member Author

@AyushExel ok I've deleted both and rerun one training in both projects:

The result is that the bug project works well and YOLOv5 does not. Hmm, yes, the problem seems to be mainly in the default project!

@AyushExel
Contributor

@glenn-jocher yes that is helpful. I'll get back to you as soon as I have more info about this

@glenn-jocher
Member Author

glenn-jocher commented Apr 9, 2021

@AyushExel ok thanks! This should be enough to get our activations study launched today at least :)

@AyushExel
Contributor

AyushExel commented Apr 9, 2021

@glenn-jocher I have no idea what the activation study is though, or maybe I forgot if we discussed it 😂

@glenn-jocher
Member Author

@AyushExel it's just an idea I had to test the effect of different activations on training, i.e.:

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # self.act = nn.Identity() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Tanh() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Sigmoid() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.LeakyReLU(0.1) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.act = Mish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))
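To give a feel for how close some of these candidates are, the activations named in the commented-out lines can be written as plain scalar functions using their standard definitions. This is an illustrative sketch only, not the study code: it uses no PyTorch, and the sample points are arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    # log(1 + e^x); fine for the small inputs used here, overflows for very large x
    return math.log1p(math.exp(x))


def silu(x):
    # SiLU (swish): x * sigmoid(x)
    return x * sigmoid(x)


def mish(x):
    # Mish: x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


def hardswish(x):
    # Hardswish: x * ReLU6(x + 3) / 6
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def leaky_relu(x, slope=0.1):
    # LeakyReLU with the 0.1 negative slope used in the snippet above
    return x if x >= 0 else slope * x


ACTIVATIONS = {
    "SiLU": silu,
    "Mish": mish,
    "Hardswish": hardswish,
    "LeakyReLU(0.1)": leaky_relu,
}

if __name__ == "__main__":
    # Print each activation at a few sample inputs for a side-by-side look.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        row = ", ".join(f"{name}={fn(x):+.4f}" for name, fn in ACTIVATIONS.items())
        print(f"x={x:+.1f}: {row}")
```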

@AyushExel
Contributor

Ohh awesome. Btw, fun fact: Diganta Misra, who wrote the Mish activation function paper, works on my team :)

@glenn-jocher
Member Author

glenn-jocher commented Apr 9, 2021

@AyushExel wow interesting! I've had a few conversations with him before when he first published his function. Is he part of W&B now? We haven't officially adopted Mish() due to the increased resource requirements and the relative similarity with SiLU() results in our earlier experiments. Let's see what this activation study shows, should be ready in a few days if I can start it today.

It was first delayed by this issue and then today by my own bug, which I fixed in #2750. It took me a while to isolate the source.

@glenn-jocher
Member Author

@AyushExel activation study is live here! Looks like we have 3 close candidates at the top, and 1 clear loser so far. Many epochs to go still though.
https://wandb.ai/glenn-jocher/activations

@AyushExel
Contributor

@glenn-jocher awesome :)

@ghost

ghost commented May 4, 2021

@glenn-jocher
@AyushExel

Today I had the same problem as @glenn-jocher had above.

During the first training sessions this morning, the images were displayed. See picture 1

Unfortunately not tonight. See picture 2.

These are both specially created projects.

[Image: picture 1]

[Image: picture 2]

@AyushExel
Contributor

@cyberFoxi This can happen if the images had not finished uploading by the time you checked. Can you re-run, wait for the run to sync completely, and see if the problem still exists? Also, have you tried changing the project name using the --project argument? If none of these work, can you please share a link to the run so I can have someone take a look at this?

@ghost

ghost commented May 4, 2021

@AyushExel Thank you for your very quick reply.
Ok, that's really funny. That's exactly what I did with --project ... and restarted a yolov5 training session.

Now the pictures are there.

Then there was probably a problem with the project

Thank you.

@github-actions
Contributor

github-actions bot commented Jun 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@andimarafioti

I had this issue and traced it to the names I gave the keys of my log dictionary, i.e. wandb.log({"/val/image": wandb.Image(my_image)}) doesn't work, but wandb.log({"val_image": wandb.Image(my_image)}) does.

@AyushExel
Contributor

@andimarafioti are you facing this issue in the current yolov5 repository?

@andimarafioti

Hi @AyushExel, no. I found this issue while trying to solve a bug I was having with W&B; I actually thought this issue was from a wandb repository. It's probably not related to the issue you had here or to your repository.

@AyushExel
Contributor

@andimarafioti okay thanks. I think this has to do with how Windows paths are handled. You can't use '/' in the names. The same works on Linux. Are you on Windows?

@andimarafioti

I'm viewing the records on Windows, but the code is running on Linux.

@andimarafioti

Also, using '/' in the names works for the rest of the things I'm logging, just not for media.
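Based on the observation above that '/' in a key breaks only media logging, one possible workaround is to rewrite media keys before logging them. This is a sketch under that assumption: `sanitize_media_key` and `sanitize_media_keys` are hypothetical helpers, not part of the wandb API, and the underscore replacement simply mirrors the "val_image" key reported to work.

```python
def sanitize_media_key(key, replacement="_"):
    """Return a log key without '/' characters, e.g. '/val/image' -> 'val_image'."""
    # Drop empty segments so leading, trailing, or doubled slashes do not
    # produce stray separators in the result.
    parts = [part for part in key.split("/") if part]
    return replacement.join(parts)


def sanitize_media_keys(log_dict, replacement="_"):
    """Apply sanitize_media_key to every key of a dictionary of media entries."""
    return {sanitize_media_key(k, replacement): v for k, v in log_dict.items()}


if __name__ == "__main__":
    print(sanitize_media_key("/val/image"))
```

With these helpers, the failing call from the earlier comment would become `wandb.log(sanitize_media_keys({"/val/image": wandb.Image(my_image)}))`. Note that two different keys can collapse to the same name after sanitizing, so it is worth applying this only to media entries.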

@AyushExel
Contributor

@andimarafioti thanks for confirming.
