
Fix redundant outputs via Logging in DDP training #500

Merged · 15 commits · Aug 11, 2020

Conversation

NanoCode012
Contributor

@NanoCode012 commented Jul 24, 2020

This PR fixes the second point of #463.
Tested on coco128:

python -m torch.distributed.launch --nproc_per_node 2 train.py --weights yolov5s.pt --cfg yolov5s.yaml --epochs 1 --img 320
Old output (current master)
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
                device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
                device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Namespace(batch_size=8, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='data/coco128.yaml', device='0,1', epochs=1, evolve=False, hyp='', img_size=[320, 320], local_rank=1, multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=2)
Hyperparameters {'optimizer': 'SGD', 'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(batch_size=8, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='data/coco128.yaml', device='0,1', epochs=1, evolve=False, hyp='', img_size=[320, 320], local_rank=0, multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=2)
Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Hyperparameters {'optimizer': 'SGD', 'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Transferred 368/370 items from yolov5s.pt
Transferred 368/370 items from yolov5s.pt
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty

Analyzing anchors... Best Possible Recall (BPR) = 0.9591. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 929 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 927 points...
thr=0.25: 0.9731 best possible recall, 3.74 anchors past thr
n=9, img_size=320, metric_all=0.261/0.654-mean/best, past_thr=0.471-mean: 9,12,  32,19,  27,47,  73,43,  53,91,  77,161,  161,107,  174,237,  299,195
Evolving anchors with Genetic Algorithm: fitness = 0.6627: 100%|█| 1000/1000 [00
thr=0.25: 0.9957 best possible recall, 3.79 anchors past thr
n=9, img_size=320, metric_all=0.262/0.662-mean/best, past_thr=0.473-mean: 7,8,  17,12,  24,31,  58,39,  50,86,  71,146,  148,116,  144,240,  293,213
New anchors saved to model. Update model *.yaml to use these anchors in the future.

Image sizes 320 train, 320 test
Using 8 dataloader workers
Starting training for 1 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       0/0    0.721G   0.08034    0.1729   0.03942    0.2927        55       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.172        0.65        0.44        0.24
Optimizer stripped from runs/exp1/weights/last.pt, 15.1MB
1 epochs completed in 0.011 hours.

New output
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************

Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
                device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Namespace(batch_size=8, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='data/coco128.yaml', device='0,1', epochs=1, evolve=False, hyp='', img_size=[320, 320], local_rank=0, multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=2)
Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/

Hyperparameter {'optimizer': 'SGD', 'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Transferred 368/370 items from yolov5s.pt
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty

Analyzing anchors... Best Possible Recall (BPR) = 0.9591. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 929 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 927 points...
thr=0.25: 0.9731 best possible recall, 3.74 anchors past thr
n=9, img_size=320, metric_all=0.261/0.654-mean/best, past_thr=0.471-mean: 9,12,  32,19,  27,47,  73,43,  53,91,  77,161,  161,107,  174,237,  299,195
Evolving anchors with Genetic Algorithm: fitness = 0.6627: 100%|█| 1000/1000 [00
thr=0.25: 0.9957 best possible recall, 3.79 anchors past thr
n=9, img_size=320, metric_all=0.262/0.662-mean/best, past_thr=0.473-mean: 7,8,  17,12,  24,31,  58,39,  50,86,  71,146,  148,116,  144,240,  293,213
New anchors saved to model. Update model *.yaml to use these anchors in the future.

Image sizes 320 train, 320 test
Using 8 dataloader workers
Starting training for 1 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       0/0    0.721G   0.08035    0.1729   0.03942    0.2926        55       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.173       0.649       0.441       0.241
Optimizer stripped from runs/exp15/weights/last.pt, 15.1MB
1 epochs completed in 0.010 hours.

The old output still looks somewhat clean because it is only coming from 2 devices; 4 or 8 devices create a mess.

I'm not sure whether I should use logging everywhere for consistency, or only in the places affected by multi-GPU training.

  • Decide whether to switch to logging everywhere or not. (Decision: only multi-GPU areas)

  • Wait for test/detect refactor for multi-gpu

  • Wait for merge multi-node ddp support

  • Fix logging for multi-node

Edit: There is a repeating "Scanning labels..." line, but it comes from tqdm, not print. Not sure how to handle it right now.
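The duplicate-output problem can be sketched as follows: under torch.distributed.launch, every process runs the same script, so each unguarded print fires once per GPU. A minimal illustration of the gating pattern (the names here are illustrative, not the exact code in this PR):

```python
import os

def is_master(rank: int) -> bool:
    """rank -1 means single-process (no DDP); rank 0 is the DDP master."""
    return rank in (-1, 0)

# torch.distributed.launch sets the RANK environment variable for each
# spawned process; in a plain single-GPU run it is absent, so default to -1.
rank = int(os.environ.get("RANK", -1))

if is_master(rank):
    print("printed once, no matter how many processes are launched")
```

With --nproc_per_node 2, the guarded message appears once instead of twice.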

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced logging information for better tracking of model training processes.

📊 Key Changes

  • Added logging module imports to various files.
  • Replaced print statements with logger.info to standardize logging.
  • Added set_logging function to configure logging based on rank (necessary for distributed training setups).
  • Modified function signatures to use a shared rank variable for consistency.
  • Introduced logic to centralize the setting of DDP (Distributed Data Parallel) variables at the start of train.py.
  • Updated create_dataloader methods across train.py and datasets.py to ensure correct data processing in distributed environments.
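The set_logging idea above can be sketched roughly like this (a minimal sketch of the approach, not the exact code in the PR):

```python
import logging

def set_logging(rank: int = -1) -> None:
    # Ranks -1 (no DDP) and 0 (DDP master) log at INFO level; all other
    # ranks are raised to WARNING, which filters out their logger.info
    # calls and removes the duplicated console output.
    logging.basicConfig(
        format="%(message)s",
        level=logging.INFO if rank in (-1, 0) else logging.WARNING,
    )
```

Each process would call set_logging(rank) once at startup; every subsequent logger.info on a non-master rank is then suppressed, while warnings still surface from all ranks.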

🎯 Purpose & Impact

  • Purpose: The updates ensure that information during model training and dataset processing is logged efficiently. They streamline logging for better readability and management, especially when training models in distributed systems.
  • Impact: Users will benefit from clearer and more consistent logs, which are especially valuable for debugging and tracking the training of machine learning models at scale. The changes also promote clean coding practices and the efficient operation of distributed training sessions. 🧑‍💻🔍🌐

@glenn-jocher
Member

@NanoCode012 I think changing only in multi-gpu affected regions makes sense.

Should we wait for the test/detect refactor before this, or do this one first? This PR does not affect test.py or detect.py.

"Scanning labels" messages should show up twice (the train and val datasets are treated as independent, even if they both point to the same images). I see it show up 3 times, but this is not a huge problem. Also, depending on your console, tqdm messages may sometimes repeat by themselves even on a single GPU.
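For the tqdm side of this, the progress bar itself can be silenced on non-master ranks via its disable argument (an illustrative sketch, assuming a rank variable as in the PR):

```python
from tqdm import tqdm

rank = 1  # e.g. a non-master DDP worker

# disable=True turns tqdm into a transparent pass-through: the iterator
# still yields every item, but nothing is written to the console.
total = 0
for i in tqdm(range(100), disable=rank not in (-1, 0)):
    total += i
```

On rank -1 or 0 the bar renders normally; elsewhere the loop runs quietly.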

@NanoCode012
Contributor Author

NanoCode012 commented Jul 24, 2020

tqdm messages may accidentally repeat themselves

Okay I see!

Wait for test/detector PR merge?

I think so; then we can encapsulate it under one PR. However, you did mention that you want PRs in small chunks...

Edit: I can create a new PR later to deal with the upcoming changes if you want.

I will set the unit tests to run tomorrow in case I broke something.

Edit 2: Unit tests passed.

@glenn-jocher
Member

@NanoCode012 just tried some multi-GPU training myself and saw the redundant output phenomenon. Let's see if we can resolve these conflicts and get this merged. Can you rebase against origin/master on your side?

@glenn-jocher
Member

@NanoCode012 BTW, actual DDP ops seem to be working flawlessly. My use case is a 2x V100 training of v5x. I'll report the time difference once I get a few epochs completed.

@glenn-jocher
Member

glenn-jocher commented Aug 6, 2020

nvidia-smi from epoch 0 and later, to test the high device-0 memory usage issue.

Epoch 0 train:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 17:50:26 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0   210W / 300W |  15020MiB / 16130MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   54C    P0   223W / 300W |  14708MiB / 16130MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      15009MiB |
|    1      1807      C   /opt/conda/bin/python                      14697MiB |
+-----------------------------------------------------------------------------+

epoch 0 test:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 18:19:45 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P0    84W / 300W |  14172MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   40C    P0    56W / 300W |  14006MiB / 16130MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      14159MiB |
|    1      1807      C   /opt/conda/bin/python                      13995MiB |
+-----------------------------------------------------------------------------+

epoch 1 train:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 18:21:59 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0   277W / 300W |  14996MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   55C    P0   263W / 300W |  14600MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      14981MiB |
|    1      1807      C   /opt/conda/bin/python                      14589MiB |
+-----------------------------------------------------------------------------+

@glenn-jocher
Member

Just re-ran the CI tests. Error on non-persistent buffers; since this PR predates the 1.6 update, this seems logical. This PR will need some work to properly rebase with origin/master. Perhaps the simplest step is to merge #504 first (which just re-passed all CI tests), and then tackle this one. @NanoCode012 sound good?

@NanoCode012
Contributor Author

NanoCode012 commented Aug 6, 2020

Hi @glenn-jocher, the reason this PR is not ready is that if we merge the "multi-node" PR later, I will have to do another "logging" PR to address logging under multi-node.

Edit:

Just re-ran the CI tests. Error on non-persistent buffers; since this PR predates the 1.6 update, this seems logical. This PR will need some work to properly rebase with origin/master. Perhaps the simplest step is to merge #504 first (which just re-passed all CI tests), and then tackle this one. @NanoCode012 sound good?

Yep! Sounds good!

@NanoCode012
Contributor Author

Thanks @glenn-jocher. With multi-node merged, I will get to work on this PR to bring it up to speed and add multi-node logging. It should be done by tomorrow or the day after, as it is late here.

@glenn-jocher
Member

@NanoCode012 great, no rush. Glad to see we are making steady progress :)

@NanoCode012
Contributor Author

NanoCode012 commented Aug 6, 2020

Hi @glenn-jocher, for the GPU memory issue mentioned in #610, tkianai and I ran tests and found that it spiked after the epoch 1 test (#610 (comment) and #610 (comment)). Specifically, the issue lies in test, as --notest removes it.

I am not sure why it does not happen for you. Maybe it is because your maximum memory is 16 GB and training already sits at 14 GB (near the maximum)?

I will run my own test with 2 GPUs as a comparison. I think this should go in a separate issue.

Edit: Add table

Command

python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 64 --data coco.yaml --cfg yolov5s.yaml --weights ''
GPU      Train 1    Train 3    Test 5
GPU 0    6145MiB    6483MiB    6483MiB
GPU 1    5965MiB    6295MiB    6295MiB

Edit 2: Due to losing track of time, I did not track down the in-betweens; however, we see that it doesn't spike as badly. I am confused why it spiked before. Was it because of 8 GPUs?

Edit 3: May take a bit longer for the rebase now. Got some other work to handle.

@glenn-jocher
Member

DDP time difference:

50min/epoch 1x V100 --batch 16
29min/epoch 2x V100 --batch 32

2x GPUs take 58% of the per-epoch time of 1x GPU. Awesome, DDP seems to be working great!! 👍
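The arithmetic behind that 58%, spelled out:

```python
t1 = 50  # min/epoch, 1x V100, --batch 16
t2 = 29  # min/epoch, 2x V100, --batch 32

ratio = t2 / t1           # 0.58 -> 2x GPU takes 58% of the 1x epoch time
speedup = t1 / t2         # ~1.72x wall-clock speedup
efficiency = speedup / 2  # ~0.86 of ideal linear scaling
```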

@NanoCode012
Contributor Author

NanoCode012 commented Aug 11, 2020

Hi @glenn-jocher, I've done the rebase and fixed some leftovers. Sorry it took a while; I was a bit occupied with other work.

Logging output for 4 GPU DDP
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device2 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device3 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='0,1,2,3', epochs=3, evolve=False, global_rank=0, hyp='data/hyp.finetune.yaml', img_size=[320, 320], local_rank=0, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=4)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
2020-08-11 13:11:57.526967: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Hyperparameter {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Downloading https://drive.google.com/uc?export=download&id=1R5T6rIyy3lLwgFXNms8whc-387H0tMQO as yolov5s.pt...   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   279    0   279    0     0   1013      0 --:--:-- --:--:-- --:--:--  1014
100   408    0   408    0     0    341      0 --:--:--  0:00:01 --:--:--   553
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
100 14.4M    0 14.4M    0     0  5344k      0 --:--:--  0:00:02 --:--:-- 41.2M
Done (6.2s)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]



Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Transferred 370/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 9581.68it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 16054.27it/s]

Analyzing anchors... anchors/target = 3.98, Best Possible Recall (BPR) = 0.9623. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 929 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 927 points...
thr=0.25: 0.9731 best possible recall, 3.74 anchors past thr
n=9, img_size=320, metric_all=0.261/0.654-mean/best, past_thr=0.471-mean: 9,12,  32,19,  27,47,  73,43,  53,91,  77,161,  161,107,  174,237,  299,195
Evolving anchors with Genetic Algorithm: fitness = 0.6577: 100%|█| 1000/1000 [00
thr=0.25: 0.9828 best possible recall, 3.79 anchors past thr
n=9, img_size=320, metric_all=0.263/0.660-mean/best, past_thr=0.473-mean: 9,10,  23,14,  30,42,  72,39,  53,87,  69,161,  146,128,  179,206,  292,225
New anchors saved to model. Update model *.yaml to use these anchors in the future.

Image sizes 320 train, 320 test
Using 4 dataloader workers
Starting training for 3 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       0/2    0.535G   0.07882    0.1792   0.03896     0.297        38       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.154       0.627       0.399       0.192

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       1/2     1.71G   0.06736    0.1551   0.03594    0.2584        27       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.172        0.69       0.484       0.268

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       2/2     1.71G   0.05921    0.1718   0.03886    0.2699        82       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.129       0.706       0.474       0.232
Optimizer stripped from runs/exp0/weights/last.pt, 15.2MB
Optimizer stripped from runs/exp0/weights/best.pt, 15.2MB
3 epochs completed in 0.005 hours.
Current master output, 4-GPU DDP:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device2 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device3 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device2 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device3 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device2 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device3 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device1 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device2 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
           device3 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='0,1,2,3', epochs=3, evolve=False, global_rank=3, hyp='data/hyp.finetune.yaml', img_size=[320, 320], local_rank=3, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=4)
Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='0,1,2,3', epochs=3, evolve=False, global_rank=2, hyp='data/hyp.finetune.yaml', img_size=[320, 320], local_rank=2, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=4)
Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='0,1,2,3', epochs=3, evolve=False, global_rank=1, hyp='data/hyp.finetune.yaml', img_size=[320, 320], local_rank=1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=4)
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}Namespace(adam=False, batch_size=4, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='0,1,2,3', epochs=3, evolve=False, global_rank=0, hyp='data/hyp.finetune.yaml', img_size=[320, 320], local_rank=0, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', world_size=4)

Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
2020-08-11 16:01:30.689269: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.5, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0}
Downloading https://drive.google.com/uc?export=download&id=1R5T6rIyy3lLwgFXNms8whc-387H0tMQO as yolov5s.pt...   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   279    0   279    0     0    985      0 --:--:-- --:--:-- --:--:--   982
100   408    0   408    0     0    396      0 --:--:--  0:00:01 --:--:--   396
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
100 14.4M    0 14.4M    0     0  5309k      0 --:--:--  0:00:02 --:--:-- 28.9M
Done (7.3s)

                 from  n    params  module                                  arguments                     

                 from  n    params  module                                  arguments                     

                 from  n    params  module                                  arguments                     

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    

  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']           13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          

 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]               17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          

 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Transferred 370/370 items from yolov5s.pt
Transferred 370/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Transferred 370/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Transferred 370/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty

Analyzing anchors... anchors/target = 3.97, Best Possible Recall (BPR) = 0.9569. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 929 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 927 points...
thr=0.25: 0.9623 best possible recall, 3.54 anchors past thr
n=9, img_size=320, metric_all=0.251/0.635-mean/best, past_thr=0.475-mean: 11,11,  30,34,  74,43,  46,87,  77,162,  135,100,  204,158,  158,280,  304,203
Evolving anchors with Genetic Algorithm: fitness = 0.6819: 100%|█| 1000/1000 [00
thr=0.25: 0.9914 best possible recall, 3.77 anchors past thr
n=9, img_size=320, metric_all=0.262/0.682-mean/best, past_thr=0.474-mean: 7,6,  9,16,  26,15,  30,41,  72,46,  56,110,  136,132,  183,195,  316,205
New anchors saved to model. Update model *.yaml to use these anchors in the future.

Image sizes 320 train, 320 test
Using 4 dataloader workers
Starting training for 3 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       0/2    0.537G   0.07383    0.1703   0.04088     0.285        38       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.228       0.618       0.514       0.278

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       1/2     1.71G   0.06364     0.148    0.0377    0.2493        27       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.191       0.668        0.54       0.321

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
       2/2     1.71G   0.05998    0.1575   0.03848     0.256        82       320
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.162       0.689       0.543       0.319
Optimizer stripped from runs/exp0/weights/last.pt, 15.2MB
Optimizer stripped from runs/exp0/weights/best.pt, 15.2MB
3 epochs completed in 0.005 hours.
Logging output on non-master machine for multi-node training
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************

Edit: Sorry for the multiple small commits; I kept finding small inconsistencies while reviewing the diff on the site and correcting them. Please squash when merging.

Unit tests passed. This PR is finally ready. Please tell me if you want anything changed or explained.

@NanoCode012 NanoCode012 marked this pull request as ready for review August 11, 2020 12:55
@glenn-jocher
Member

glenn-jocher commented Aug 11, 2020

@NanoCode012 thanks! I had an interesting idea that might make this change a little easier on the eyes. If we redefine print to logger.info at the beginning of the functions that use logger.info, would this cause any problems? I thought this might improve readability, i.e.:

def function():
    print = logger.info
    
    # logger.info('hello')
    print('hello')

Could this work, or am I forgetting something?
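For context, a minimal runnable sketch of this idea (the function name and message are illustrative, not from the PR): shadowing the built-in print with logger.info inside a function makes existing print calls route through logging, at the cost of the concerns raised below.

```python
import logging

logger = logging.getLogger("train")
logger.setLevel(logging.INFO)

def train_step():
    # Shadow the built-in print with logger.info inside this function only,
    # so existing print(...) calls route through logging.
    # Caveats: print-only keyword args (end='', sep=...) stop working, and a
    # reader can easily miss that output now goes through the logger.
    print = logger.info
    print('hello from rank 0')
```

Because the assignment is function-local, code outside the function still gets the real built-in print.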

@NanoCode012
Contributor Author

I think it may be possible, but I'm not sure it's good for code readability later on. Someone else reading the code may miss the logging reassignment and not realize that output now goes through the logger.

@glenn-jocher
Member

@NanoCode012 yes, maybe you are right. I've never used the logging package before, but replacing it with print will obscure the command. Ok I'll go ahead and merge!

@glenn-jocher glenn-jocher merged commit 4949401 into ultralytics:master Aug 11, 2020
@glenn-jocher glenn-jocher removed the TODO label Aug 11, 2020
@NanoCode012 NanoCode012 deleted the logging branch August 11, 2020 18:20
@glenn-jocher
Member

glenn-jocher commented Aug 11, 2020

@NanoCode012 it looks like the update works error-free, but some of the screen printing is different. For example, Colab isn't showing the CUDA information anymore, and Fusing ... should be followed by an updated model.info() showing the new parameter count. I'll take a look.

Before:

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.4, device='', img_size=640, iou_thres=0.5, output='inference/output', save_txt=False, source='inference/images', update=False, view_img=False, weights='yolov5s.pt')
Using CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5s.pt to yolov5s.pt...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   623  100   623    0     0   1238      0 --:--:-- --:--:-- --:--:--  1238
100 14.4M  100 14.4M    0     0  1207k      0  0:00:12  0:00:12 --:--:-- 1195k

Fusing layers... Model Summary: 140 layers, 7.45958e+06 parameters, 6.61683e+06 gradients
image 1/2 /Users/glennjocher/PycharmProjects/yolov5/inference/images/bus.jpg: 640x512 4 persons, 1 buss, Done. (0.283s)
image 2/2 /Users/glennjocher/PycharmProjects/yolov5/inference/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.205s)
Results saved to inference/output
Done. (0.555s)

After:

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.4, device='', img_size=640, iou_thres=0.5, output='inference/output', save_txt=False, source='inference/images', update=False, view_img=False, weights='yolov5s.pt')
Downloading https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5s.pt to yolov5s.pt...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   623  100   623    0     0   3385      0 --:--:-- --:--:-- --:--:--  3404
100 14.4M  100 14.4M    0     0  8964k      0  0:00:01  0:00:01 --:--:-- 18.8M

Fusing layers... image 1/2 /Users/glennjocher/PycharmProjects/yolov5/inference/images/bus.jpg: 640x512 4 persons, 1 buss, Done. (0.284s)
image 2/2 /Users/glennjocher/PycharmProjects/yolov5/inference/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.212s)
Results saved to inference/output
Done. (0.566s)

@NanoCode012
Contributor Author

Hi Glenn, I'll take a look. It seems we have to call set_logging in the test/detect scripts as well; we may have missed other scripts too. I'm not sure what the best choice is. Maybe call set_logging inside the select_device function, since the two go hand in hand.
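As a rough illustration of the rank-aware setup being discussed (this is a sketch, not the exact helper in the repo): configuring the root logger's level from the process rank means only the master process emits INFO messages, while the other DDP workers are raised to WARNING and stay quiet.

```python
import logging

def set_logging(rank=-1):
    # Only the master process logs at INFO (rank -1 = single-process run,
    # rank 0 = DDP master); every other rank is raised to WARNING, which
    # suppresses the duplicated per-process output shown above.
    logging.basicConfig(
        format="%(message)s",
        level=logging.INFO if rank in (-1, 0) else logging.WARNING,
    )
```

Calling such a helper from select_device would give every entry point (train.py, test.py, detect.py) rank-aware logging in one place.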

@NanoCode012
Contributor Author

Hi @glenn-jocher .

New fix master...NanoCode012:logging-fix

I could not keep Fusing layers... on the same line as the model.info() output because logging does not support suppressing the trailing newline.
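To spell out the limitation: logger.info() always appends the handler's terminator (newline by default), so there is no direct analogue of print('Fusing layers... ', end=''). A sketch of the two usual workarounds (the summary string here is an illustrative placeholder):

```python
import logging

logger = logging.getLogger("detect")
logger.setLevel(logging.INFO)

# Workaround A: build the whole line first and log it once.
summary = "Model Summary: 140 layers, ..."  # illustrative placeholder
logger.info("Fusing layers... " + summary)

# Workaround B: clear the terminator on a specific StreamHandler. This
# affects every record that handler emits, so it is rarely worth it.
handler = logging.StreamHandler()
handler.terminator = ""
```

Workaround B changes behavior globally for that handler, which is why splitting the message across two lines (as this fix does) is the simpler choice.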

detect.py
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.4, device='0', img_size=640, iou_thres=0.5, output='inference/output', save_txt=False, source='./inference/images/', update=False, view_img=False, weights=['yolov5m.pt'])
Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5m.pt to yolov5m.pt...
100%|██████████████████████████████████████| 41.9M/41.9M [00:06<00:00, 6.69MB/s]

Fusing layers... 
Model Summary: 188 layers, 2.17879e+07 parameters, 2.00672e+07 gradients
image 1/2 /yolov5/inference/images/bus.jpg: 640x512 4 persons, 1 buss, Done. (0.017s)
image 2/2 /yolov5/inference/images/zidane.jpg: 384x640 2 persons, 1 ties, Done. (0.016s)
Results saved to inference/output
Done. (0.146s)
test.py
Namespace(augment=False, batch_size=32, conf_thres=0.001, data='data/coco128.yaml', device='0', img_size=320, iou_thres=0.65, merge=False, save_json=False, save_txt=False, single_cls=False, task='val', verbose=False, weights=['yolov5s.pt'])
Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5s.pt to yolov5s.pt...
100%|███████████████████████████████████████| 14.5M/14.5M [00:56<00:00, 271kB/s]

Fusing layers... 
Model Summary: 140 layers, 7.45958e+06 parameters, 6.61683e+06 gradients
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 15030.82it/s]
               Class      Images     Targets           P           R      mAP@.5
                 all         128         929       0.346       0.621       0.565       0.356
Speed: 0.5/1.3/1.8 ms inference/NMS/total per 320x320 image at batch-size 32
yolo.py
Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)


                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients
export.py
Namespace(batch_size=1, img_size=[640, 640], weights='yolov5s.pt')
Downloading https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5s.pt to yolov5s.pt...
100%|██████████████████████████████████████| 14.5M/14.5M [00:14<00:00, 1.08MB/s]


Starting TorchScript export with torch 1.6.0...
/.conda/envs/py37/lib/python3.7/site-packages/torch/jit/__init__.py:1109: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, strict, _force_outplace)
TorchScript export success, saved as yolov5s.torchscript.pt

Starting ONNX export with onnx 1.6.0...
Fusing layers... 
Model Summary: 140 layers, 7.45958e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %449[FLOAT, 4]
  %454[FLOAT, 4]
  %455[INT64, 1]
  %456[INT64, 1]
  %457[INT64, 1]
  %458[INT64, 1]
  %459[INT64, 1]
  %460[INT64, 1]
  %model.0.conv.conv.bias[FLOAT, 32]
  %model.0.conv.conv.weight[FLOAT, 32x12x3x3]
  %model.1.conv.bias[FLOAT, 64]
  %model.1.conv.weight[FLOAT, 64x32x3x3]
  %model.10.conv.bias[FLOAT, 256]
  %model.10.conv.weight[FLOAT, 256x512x1x1]
  %model.13.bn.bias[FLOAT, 256]
  %model.13.bn.running_mean[FLOAT, 256]
  %model.13.bn.running_var[FLOAT, 256]
  %model.13.bn.weight[FLOAT, 256]
  %model.13.cv1.conv.bias[FLOAT, 128]
  %model.13.cv1.conv.weight[FLOAT, 128x512x1x1]
  %model.13.cv2.weight[FLOAT, 128x512x1x1]
  %model.13.cv3.weight[FLOAT, 128x128x1x1]
  %model.13.cv4.conv.bias[FLOAT, 256]
  %model.13.cv4.conv.weight[FLOAT, 256x256x1x1]
  %model.13.m.0.cv1.conv.bias[FLOAT, 128]
  %model.13.m.0.cv1.conv.weight[FLOAT, 128x128x1x1]
  %model.13.m.0.cv2.conv.bias[FLOAT, 128]
  %model.13.m.0.cv2.conv.weight[FLOAT, 128x128x3x3]
  %model.14.conv.bias[FLOAT, 128]
  %model.14.conv.weight[FLOAT, 128x256x1x1]
  %model.17.bn.bias[FLOAT, 128]
  %model.17.bn.running_mean[FLOAT, 128]
  %model.17.bn.running_var[FLOAT, 128]
  %model.17.bn.weight[FLOAT, 128]
  %model.17.cv1.conv.bias[FLOAT, 64]
  %model.17.cv1.conv.weight[FLOAT, 64x256x1x1]
  %model.17.cv2.weight[FLOAT, 64x256x1x1]
  %model.17.cv3.weight[FLOAT, 64x64x1x1]
  %model.17.cv4.conv.bias[FLOAT, 128]
  %model.17.cv4.conv.weight[FLOAT, 128x128x1x1]
  %model.17.m.0.cv1.conv.bias[FLOAT, 64]
  %model.17.m.0.cv1.conv.weight[FLOAT, 64x64x1x1]
  %model.17.m.0.cv2.conv.bias[FLOAT, 64]
  %model.17.m.0.cv2.conv.weight[FLOAT, 64x64x3x3]
  %model.18.conv.bias[FLOAT, 128]
  %model.18.conv.weight[FLOAT, 128x128x3x3]
  %model.2.bn.bias[FLOAT, 64]
  %model.2.bn.running_mean[FLOAT, 64]
  %model.2.bn.running_var[FLOAT, 64]
  %model.2.bn.weight[FLOAT, 64]
  %model.2.cv1.conv.bias[FLOAT, 32]
  %model.2.cv1.conv.weight[FLOAT, 32x64x1x1]
  %model.2.cv2.weight[FLOAT, 32x64x1x1]
  %model.2.cv3.weight[FLOAT, 32x32x1x1]
  %model.2.cv4.conv.bias[FLOAT, 64]
  %model.2.cv4.conv.weight[FLOAT, 64x64x1x1]
  %model.2.m.0.cv1.conv.bias[FLOAT, 32]
  %model.2.m.0.cv1.conv.weight[FLOAT, 32x32x1x1]
  %model.2.m.0.cv2.conv.bias[FLOAT, 32]
  %model.2.m.0.cv2.conv.weight[FLOAT, 32x32x3x3]
  %model.20.bn.bias[FLOAT, 256]
  %model.20.bn.running_mean[FLOAT, 256]
  %model.20.bn.running_var[FLOAT, 256]
  %model.20.bn.weight[FLOAT, 256]
  %model.20.cv1.conv.bias[FLOAT, 128]
  %model.20.cv1.conv.weight[FLOAT, 128x256x1x1]
  %model.20.cv2.weight[FLOAT, 128x256x1x1]
  %model.20.cv3.weight[FLOAT, 128x128x1x1]
  %model.20.cv4.conv.bias[FLOAT, 256]
  %model.20.cv4.conv.weight[FLOAT, 256x256x1x1]
  %model.20.m.0.cv1.conv.bias[FLOAT, 128]
  %model.20.m.0.cv1.conv.weight[FLOAT, 128x128x1x1]
  %model.20.m.0.cv2.conv.bias[FLOAT, 128]
  %model.20.m.0.cv2.conv.weight[FLOAT, 128x128x3x3]
  %model.21.conv.bias[FLOAT, 256]
  %model.21.conv.weight[FLOAT, 256x256x3x3]
  %model.23.bn.bias[FLOAT, 512]
  %model.23.bn.running_mean[FLOAT, 512]
  %model.23.bn.running_var[FLOAT, 512]
  %model.23.bn.weight[FLOAT, 512]
  %model.23.cv1.conv.bias[FLOAT, 256]
  %model.23.cv1.conv.weight[FLOAT, 256x512x1x1]
  %model.23.cv2.weight[FLOAT, 256x512x1x1]
  %model.23.cv3.weight[FLOAT, 256x256x1x1]
  %model.23.cv4.conv.bias[FLOAT, 512]
  %model.23.cv4.conv.weight[FLOAT, 512x512x1x1]
  %model.23.m.0.cv1.conv.bias[FLOAT, 256]
  %model.23.m.0.cv1.conv.weight[FLOAT, 256x256x1x1]
  %model.23.m.0.cv2.conv.bias[FLOAT, 256]
  %model.23.m.0.cv2.conv.weight[FLOAT, 256x256x3x3]
  %model.24.m.0.bias[FLOAT, 255]
  %model.24.m.0.weight[FLOAT, 255x128x1x1]
  %model.24.m.1.bias[FLOAT, 255]
  %model.24.m.1.weight[FLOAT, 255x256x1x1]
  %model.24.m.2.bias[FLOAT, 255]
  %model.24.m.2.weight[FLOAT, 255x512x1x1]
  %model.3.conv.bias[FLOAT, 128]
  %model.3.conv.weight[FLOAT, 128x64x3x3]
  %model.4.bn.bias[FLOAT, 128]
  %model.4.bn.running_mean[FLOAT, 128]
  %model.4.bn.running_var[FLOAT, 128]
  %model.4.bn.weight[FLOAT, 128]
  %model.4.cv1.conv.bias[FLOAT, 64]
  %model.4.cv1.conv.weight[FLOAT, 64x128x1x1]
  %model.4.cv2.weight[FLOAT, 64x128x1x1]
  %model.4.cv3.weight[FLOAT, 64x64x1x1]
  %model.4.cv4.conv.bias[FLOAT, 128]
  %model.4.cv4.conv.weight[FLOAT, 128x128x1x1]
  %model.4.m.0.cv1.conv.bias[FLOAT, 64]
  %model.4.m.0.cv1.conv.weight[FLOAT, 64x64x1x1]
  %model.4.m.0.cv2.conv.bias[FLOAT, 64]
  %model.4.m.0.cv2.conv.weight[FLOAT, 64x64x3x3]
  %model.4.m.1.cv1.conv.bias[FLOAT, 64]
  %model.4.m.1.cv1.conv.weight[FLOAT, 64x64x1x1]
  %model.4.m.1.cv2.conv.bias[FLOAT, 64]
  %model.4.m.1.cv2.conv.weight[FLOAT, 64x64x3x3]
  %model.4.m.2.cv1.conv.bias[FLOAT, 64]
  %model.4.m.2.cv1.conv.weight[FLOAT, 64x64x1x1]
  %model.4.m.2.cv2.conv.bias[FLOAT, 64]
  %model.4.m.2.cv2.conv.weight[FLOAT, 64x64x3x3]
  %model.5.conv.bias[FLOAT, 256]
  %model.5.conv.weight[FLOAT, 256x128x3x3]
  %model.6.bn.bias[FLOAT, 256]
  %model.6.bn.running_mean[FLOAT, 256]
  %model.6.bn.running_var[FLOAT, 256]
  %model.6.bn.weight[FLOAT, 256]
  %model.6.cv1.conv.bias[FLOAT, 128]
  %model.6.cv1.conv.weight[FLOAT, 128x256x1x1]
  %model.6.cv2.weight[FLOAT, 128x256x1x1]
  %model.6.cv3.weight[FLOAT, 128x128x1x1]
  %model.6.cv4.conv.bias[FLOAT, 256]
  %model.6.cv4.conv.weight[FLOAT, 256x256x1x1]
  %model.6.m.0.cv1.conv.bias[FLOAT, 128]
  %model.6.m.0.cv1.conv.weight[FLOAT, 128x128x1x1]
  %model.6.m.0.cv2.conv.bias[FLOAT, 128]
  %model.6.m.0.cv2.conv.weight[FLOAT, 128x128x3x3]
  %model.6.m.1.cv1.conv.bias[FLOAT, 128]
  %model.6.m.1.cv1.conv.weight[FLOAT, 128x128x1x1]
  %model.6.m.1.cv2.conv.bias[FLOAT, 128]
  %model.6.m.1.cv2.conv.weight[FLOAT, 128x128x3x3]
  %model.6.m.2.cv1.conv.bias[FLOAT, 128]
  %model.6.m.2.cv1.conv.weight[FLOAT, 128x128x1x1]
  %model.6.m.2.cv2.conv.bias[FLOAT, 128]
  %model.6.m.2.cv2.conv.weight[FLOAT, 128x128x3x3]
  %model.7.conv.bias[FLOAT, 512]
  %model.7.conv.weight[FLOAT, 512x256x3x3]
  %model.8.cv1.conv.bias[FLOAT, 256]
  %model.8.cv1.conv.weight[FLOAT, 256x512x1x1]
  %model.8.cv2.conv.bias[FLOAT, 512]
  %model.8.cv2.conv.weight[FLOAT, 512x1024x1x1]
  %model.9.bn.bias[FLOAT, 512]
  %model.9.bn.running_mean[FLOAT, 512]
  %model.9.bn.running_var[FLOAT, 512]
  %model.9.bn.weight[FLOAT, 512]
  %model.9.cv1.conv.bias[FLOAT, 256]
  %model.9.cv1.conv.weight[FLOAT, 256x512x1x1]
  %model.9.cv2.weight[FLOAT, 256x512x1x1]
  %model.9.cv3.weight[FLOAT, 256x256x1x1]
  %model.9.cv4.conv.bias[FLOAT, 512]
  %model.9.cv4.conv.weight[FLOAT, 512x512x1x1]
  %model.9.m.0.cv1.conv.bias[FLOAT, 256]
  %model.9.m.0.cv1.conv.weight[FLOAT, 256x256x1x1]
  %model.9.m.0.cv2.conv.bias[FLOAT, 256]
  %model.9.m.0.cv2.conv.weight[FLOAT, 256x256x3x3]
) {
  %167 = Constant[value = <Tensor>]()
  %168 = Constant[value = <Tensor>]()
  %169 = Constant[value = <Tensor>]()
  %170 = Constant[value = <Tensor>]()
  %171 = Slice(%images, %168, %169, %167, %170)
  %172 = Constant[value = <Tensor>]()
  %173 = Constant[value = <Tensor>]()
  %174 = Constant[value = <Tensor>]()
  %175 = Constant[value = <Tensor>]()
  %176 = Slice(%171, %173, %174, %172, %175)
  %177 = Constant[value = <Tensor>]()
  %178 = Constant[value = <Tensor>]()
  %179 = Constant[value = <Tensor>]()
  %180 = Constant[value = <Tensor>]()
  %181 = Slice(%images, %178, %179, %177, %180)
  %182 = Constant[value = <Tensor>]()
  %183 = Constant[value = <Tensor>]()
  %184 = Constant[value = <Tensor>]()
  %185 = Constant[value = <Tensor>]()
  %186 = Slice(%181, %183, %184, %182, %185)
  %187 = Constant[value = <Tensor>]()
  %188 = Constant[value = <Tensor>]()
  %189 = Constant[value = <Tensor>]()
  %190 = Constant[value = <Tensor>]()
  %191 = Slice(%images, %188, %189, %187, %190)
  %192 = Constant[value = <Tensor>]()
  %193 = Constant[value = <Tensor>]()
  %194 = Constant[value = <Tensor>]()
  %195 = Constant[value = <Tensor>]()
  %196 = Slice(%191, %193, %194, %192, %195)
  %197 = Constant[value = <Tensor>]()
  %198 = Constant[value = <Tensor>]()
  %199 = Constant[value = <Tensor>]()
  %200 = Constant[value = <Tensor>]()
  %201 = Slice(%images, %198, %199, %197, %200)
  %202 = Constant[value = <Tensor>]()
  %203 = Constant[value = <Tensor>]()
  %204 = Constant[value = <Tensor>]()
  %205 = Constant[value = <Tensor>]()
  %206 = Slice(%201, %203, %204, %202, %205)
  %207 = Concat[axis = 1](%176, %186, %196, %206)
  %208 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%207, %model.0.conv.conv.weight, %model.0.conv.conv.bias)
  %209 = LeakyRelu[alpha = 0.100000001490116](%208)
  %210 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%209, %model.1.conv.weight, %model.1.conv.bias)
  %211 = LeakyRelu[alpha = 0.100000001490116](%210)
  %212 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%211, %model.2.cv1.conv.weight, %model.2.cv1.conv.bias)
  %213 = LeakyRelu[alpha = 0.100000001490116](%212)
  %214 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%213, %model.2.m.0.cv1.conv.weight, %model.2.m.0.cv1.conv.bias)
  %215 = LeakyRelu[alpha = 0.100000001490116](%214)
  %216 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%215, %model.2.m.0.cv2.conv.weight, %model.2.m.0.cv2.conv.bias)
  %217 = LeakyRelu[alpha = 0.100000001490116](%216)
  %218 = Add(%213, %217)
  %219 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%218, %model.2.cv3.weight)
  %220 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%211, %model.2.cv2.weight)
  %221 = Concat[axis = 1](%219, %220)
  %222 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%221, %model.2.bn.weight, %model.2.bn.bias, %model.2.bn.running_mean, %model.2.bn.running_var)
  %223 = LeakyRelu[alpha = 0.100000001490116](%222)
  %224 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%223, %model.2.cv4.conv.weight, %model.2.cv4.conv.bias)
  %225 = LeakyRelu[alpha = 0.100000001490116](%224)
  %226 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%225, %model.3.conv.weight, %model.3.conv.bias)
  %227 = LeakyRelu[alpha = 0.100000001490116](%226)
  %228 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%227, %model.4.cv1.conv.weight, %model.4.cv1.conv.bias)
  %229 = LeakyRelu[alpha = 0.100000001490116](%228)
  %230 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%229, %model.4.m.0.cv1.conv.weight, %model.4.m.0.cv1.conv.bias)
  %231 = LeakyRelu[alpha = 0.100000001490116](%230)
  %232 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%231, %model.4.m.0.cv2.conv.weight, %model.4.m.0.cv2.conv.bias)
  %233 = LeakyRelu[alpha = 0.100000001490116](%232)
  %234 = Add(%229, %233)
  %235 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%234, %model.4.m.1.cv1.conv.weight, %model.4.m.1.cv1.conv.bias)
  %236 = LeakyRelu[alpha = 0.100000001490116](%235)
  %237 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%236, %model.4.m.1.cv2.conv.weight, %model.4.m.1.cv2.conv.bias)
  %238 = LeakyRelu[alpha = 0.100000001490116](%237)
  %239 = Add(%234, %238)
  %240 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%239, %model.4.m.2.cv1.conv.weight, %model.4.m.2.cv1.conv.bias)
  %241 = LeakyRelu[alpha = 0.100000001490116](%240)
  %242 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%241, %model.4.m.2.cv2.conv.weight, %model.4.m.2.cv2.conv.bias)
  %243 = LeakyRelu[alpha = 0.100000001490116](%242)
  %244 = Add(%239, %243)
  %245 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%244, %model.4.cv3.weight)
  %246 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%227, %model.4.cv2.weight)
  %247 = Concat[axis = 1](%245, %246)
  %248 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%247, %model.4.bn.weight, %model.4.bn.bias, %model.4.bn.running_mean, %model.4.bn.running_var)
  %249 = LeakyRelu[alpha = 0.100000001490116](%248)
  %250 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%249, %model.4.cv4.conv.weight, %model.4.cv4.conv.bias)
  %251 = LeakyRelu[alpha = 0.100000001490116](%250)
  %252 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%251, %model.5.conv.weight, %model.5.conv.bias)
  %253 = LeakyRelu[alpha = 0.100000001490116](%252)
  %254 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%253, %model.6.cv1.conv.weight, %model.6.cv1.conv.bias)
  %255 = LeakyRelu[alpha = 0.100000001490116](%254)
  %256 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%255, %model.6.m.0.cv1.conv.weight, %model.6.m.0.cv1.conv.bias)
  %257 = LeakyRelu[alpha = 0.100000001490116](%256)
  %258 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%257, %model.6.m.0.cv2.conv.weight, %model.6.m.0.cv2.conv.bias)
  %259 = LeakyRelu[alpha = 0.100000001490116](%258)
  %260 = Add(%255, %259)
  %261 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%260, %model.6.m.1.cv1.conv.weight, %model.6.m.1.cv1.conv.bias)
  %262 = LeakyRelu[alpha = 0.100000001490116](%261)
  %263 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%262, %model.6.m.1.cv2.conv.weight, %model.6.m.1.cv2.conv.bias)
  %264 = LeakyRelu[alpha = 0.100000001490116](%263)
  %265 = Add(%260, %264)
  %266 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%265, %model.6.m.2.cv1.conv.weight, %model.6.m.2.cv1.conv.bias)
  %267 = LeakyRelu[alpha = 0.100000001490116](%266)
  %268 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%267, %model.6.m.2.cv2.conv.weight, %model.6.m.2.cv2.conv.bias)
  %269 = LeakyRelu[alpha = 0.100000001490116](%268)
  %270 = Add(%265, %269)
  %271 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%270, %model.6.cv3.weight)
  %272 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%253, %model.6.cv2.weight)
  %273 = Concat[axis = 1](%271, %272)
  %274 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%273, %model.6.bn.weight, %model.6.bn.bias, %model.6.bn.running_mean, %model.6.bn.running_var)
  %275 = LeakyRelu[alpha = 0.100000001490116](%274)
  %276 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%275, %model.6.cv4.conv.weight, %model.6.cv4.conv.bias)
  %277 = LeakyRelu[alpha = 0.100000001490116](%276)
  %278 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%277, %model.7.conv.weight, %model.7.conv.bias)
  %279 = LeakyRelu[alpha = 0.100000001490116](%278)
  %280 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%279, %model.8.cv1.conv.weight, %model.8.cv1.conv.bias)
  %281 = LeakyRelu[alpha = 0.100000001490116](%280)
  %282 = MaxPool[ceil_mode = 0, kernel_shape = [5, 5], pads = [2, 2, 2, 2], strides = [1, 1]](%281)
  %283 = MaxPool[ceil_mode = 0, kernel_shape = [9, 9], pads = [4, 4, 4, 4], strides = [1, 1]](%281)
  %284 = MaxPool[ceil_mode = 0, kernel_shape = [13, 13], pads = [6, 6, 6, 6], strides = [1, 1]](%281)
  %285 = Concat[axis = 1](%281, %282, %283, %284)
  %286 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%285, %model.8.cv2.conv.weight, %model.8.cv2.conv.bias)
  %287 = LeakyRelu[alpha = 0.100000001490116](%286)
  %288 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%287, %model.9.cv1.conv.weight, %model.9.cv1.conv.bias)
  %289 = LeakyRelu[alpha = 0.100000001490116](%288)
  %290 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%289, %model.9.m.0.cv1.conv.weight, %model.9.m.0.cv1.conv.bias)
  %291 = LeakyRelu[alpha = 0.100000001490116](%290)
  %292 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%291, %model.9.m.0.cv2.conv.weight, %model.9.m.0.cv2.conv.bias)
  %293 = LeakyRelu[alpha = 0.100000001490116](%292)
  %294 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%293, %model.9.cv3.weight)
  %295 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%287, %model.9.cv2.weight)
  %296 = Concat[axis = 1](%294, %295)
  %297 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%296, %model.9.bn.weight, %model.9.bn.bias, %model.9.bn.running_mean, %model.9.bn.running_var)
  %298 = LeakyRelu[alpha = 0.100000001490116](%297)
  %299 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%298, %model.9.cv4.conv.weight, %model.9.cv4.conv.bias)
  %300 = LeakyRelu[alpha = 0.100000001490116](%299)
  %301 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%300, %model.10.conv.weight, %model.10.conv.bias)
  %302 = LeakyRelu[alpha = 0.100000001490116](%301)
  %311 = Constant[value = <Tensor>]()
  %312 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%302, %311, %449)
  %313 = Concat[axis = 1](%312, %277)
  %314 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%313, %model.13.cv1.conv.weight, %model.13.cv1.conv.bias)
  %315 = LeakyRelu[alpha = 0.100000001490116](%314)
  %316 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%315, %model.13.m.0.cv1.conv.weight, %model.13.m.0.cv1.conv.bias)
  %317 = LeakyRelu[alpha = 0.100000001490116](%316)
  %318 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%317, %model.13.m.0.cv2.conv.weight, %model.13.m.0.cv2.conv.bias)
  %319 = LeakyRelu[alpha = 0.100000001490116](%318)
  %320 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%319, %model.13.cv3.weight)
  %321 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%313, %model.13.cv2.weight)
  %322 = Concat[axis = 1](%320, %321)
  %323 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%322, %model.13.bn.weight, %model.13.bn.bias, %model.13.bn.running_mean, %model.13.bn.running_var)
  %324 = LeakyRelu[alpha = 0.100000001490116](%323)
  %325 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%324, %model.13.cv4.conv.weight, %model.13.cv4.conv.bias)
  %326 = LeakyRelu[alpha = 0.100000001490116](%325)
  %327 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%326, %model.14.conv.weight, %model.14.conv.bias)
  %328 = LeakyRelu[alpha = 0.100000001490116](%327)
  %337 = Constant[value = <Tensor>]()
  %338 = Resize[coordinate_transformation_mode = 'asymmetric', cubic_coeff_a = -0.75, mode = 'nearest', nearest_mode = 'floor'](%328, %337, %454)
  %339 = Concat[axis = 1](%338, %251)
  %340 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%339, %model.17.cv1.conv.weight, %model.17.cv1.conv.bias)
  %341 = LeakyRelu[alpha = 0.100000001490116](%340)
  %342 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%341, %model.17.m.0.cv1.conv.weight, %model.17.m.0.cv1.conv.bias)
  %343 = LeakyRelu[alpha = 0.100000001490116](%342)
  %344 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%343, %model.17.m.0.cv2.conv.weight, %model.17.m.0.cv2.conv.bias)
  %345 = LeakyRelu[alpha = 0.100000001490116](%344)
  %346 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%345, %model.17.cv3.weight)
  %347 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%339, %model.17.cv2.weight)
  %348 = Concat[axis = 1](%346, %347)
  %349 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%348, %model.17.bn.weight, %model.17.bn.bias, %model.17.bn.running_mean, %model.17.bn.running_var)
  %350 = LeakyRelu[alpha = 0.100000001490116](%349)
  %351 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%350, %model.17.cv4.conv.weight, %model.17.cv4.conv.bias)
  %352 = LeakyRelu[alpha = 0.100000001490116](%351)
  %353 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%352, %model.18.conv.weight, %model.18.conv.bias)
  %354 = LeakyRelu[alpha = 0.100000001490116](%353)
  %355 = Concat[axis = 1](%354, %328)
  %356 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%355, %model.20.cv1.conv.weight, %model.20.cv1.conv.bias)
  %357 = LeakyRelu[alpha = 0.100000001490116](%356)
  %358 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%357, %model.20.m.0.cv1.conv.weight, %model.20.m.0.cv1.conv.bias)
  %359 = LeakyRelu[alpha = 0.100000001490116](%358)
  %360 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%359, %model.20.m.0.cv2.conv.weight, %model.20.m.0.cv2.conv.bias)
  %361 = LeakyRelu[alpha = 0.100000001490116](%360)
  %362 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%361, %model.20.cv3.weight)
  %363 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%355, %model.20.cv2.weight)
  %364 = Concat[axis = 1](%362, %363)
  %365 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%364, %model.20.bn.weight, %model.20.bn.bias, %model.20.bn.running_mean, %model.20.bn.running_var)
  %366 = LeakyRelu[alpha = 0.100000001490116](%365)
  %367 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%366, %model.20.cv4.conv.weight, %model.20.cv4.conv.bias)
  %368 = LeakyRelu[alpha = 0.100000001490116](%367)
  %369 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [2, 2]](%368, %model.21.conv.weight, %model.21.conv.bias)
  %370 = LeakyRelu[alpha = 0.100000001490116](%369)
  %371 = Concat[axis = 1](%370, %302)
  %372 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%371, %model.23.cv1.conv.weight, %model.23.cv1.conv.bias)
  %373 = LeakyRelu[alpha = 0.100000001490116](%372)
  %374 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%373, %model.23.m.0.cv1.conv.weight, %model.23.m.0.cv1.conv.bias)
  %375 = LeakyRelu[alpha = 0.100000001490116](%374)
  %376 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [1, 1, 1, 1], strides = [1, 1]](%375, %model.23.m.0.cv2.conv.weight, %model.23.m.0.cv2.conv.bias)
  %377 = LeakyRelu[alpha = 0.100000001490116](%376)
  %378 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%377, %model.23.cv3.weight)
  %379 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%371, %model.23.cv2.weight)
  %380 = Concat[axis = 1](%378, %379)
  %381 = BatchNormalization[epsilon = 0.00100000004749745, momentum = 0.990000009536743](%380, %model.23.bn.weight, %model.23.bn.bias, %model.23.bn.running_mean, %model.23.bn.running_var)
  %382 = LeakyRelu[alpha = 0.100000001490116](%381)
  %383 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%382, %model.23.cv4.conv.weight, %model.23.cv4.conv.bias)
  %384 = LeakyRelu[alpha = 0.100000001490116](%383)
  %385 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%352, %model.24.m.0.weight, %model.24.m.0.bias)
  %386 = Shape(%385)
  %387 = Constant[value = <Scalar Tensor []>]()
  %388 = Gather[axis = 0](%386, %387)
  %389 = Shape(%385)
  %390 = Constant[value = <Scalar Tensor []>]()
  %391 = Gather[axis = 0](%389, %390)
  %392 = Shape(%385)
  %393 = Constant[value = <Scalar Tensor []>]()
  %394 = Gather[axis = 0](%392, %393)
  %397 = Unsqueeze[axes = [0]](%388)
  %400 = Unsqueeze[axes = [0]](%391)
  %401 = Unsqueeze[axes = [0]](%394)
  %402 = Concat[axis = 0](%397, %455, %456, %400, %401)
  %403 = Reshape(%385, %402)
  %output = Transpose[perm = [0, 1, 3, 4, 2]](%403)
  %405 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%368, %model.24.m.1.weight, %model.24.m.1.bias)
  %406 = Shape(%405)
  %407 = Constant[value = <Scalar Tensor []>]()
  %408 = Gather[axis = 0](%406, %407)
  %409 = Shape(%405)
  %410 = Constant[value = <Scalar Tensor []>]()
  %411 = Gather[axis = 0](%409, %410)
  %412 = Shape(%405)
  %413 = Constant[value = <Scalar Tensor []>]()
  %414 = Gather[axis = 0](%412, %413)
  %417 = Unsqueeze[axes = [0]](%408)
  %420 = Unsqueeze[axes = [0]](%411)
  %421 = Unsqueeze[axes = [0]](%414)
  %422 = Concat[axis = 0](%417, %457, %458, %420, %421)
  %423 = Reshape(%405, %422)
  %424 = Transpose[perm = [0, 1, 3, 4, 2]](%423)
  %425 = Conv[dilations = [1, 1], group = 1, kernel_shape = [1, 1], pads = [0, 0, 0, 0], strides = [1, 1]](%384, %model.24.m.2.weight, %model.24.m.2.bias)
  %426 = Shape(%425)
  %427 = Constant[value = <Scalar Tensor []>]()
  %428 = Gather[axis = 0](%426, %427)
  %429 = Shape(%425)
  %430 = Constant[value = <Scalar Tensor []>]()
  %431 = Gather[axis = 0](%429, %430)
  %432 = Shape(%425)
  %433 = Constant[value = <Scalar Tensor []>]()
  %434 = Gather[axis = 0](%432, %433)
  %437 = Unsqueeze[axes = [0]](%428)
  %440 = Unsqueeze[axes = [0]](%431)
  %441 = Unsqueeze[axes = [0]](%434)
  %442 = Concat[axis = 0](%437, %459, %460, %440, %441)
  %443 = Reshape(%425, %442)
  %444 = Transpose[perm = [0, 1, 3, 4, 2]](%443)
  return %output, %424, %444
}
ONNX export success, saved as yolov5s.onnx
2020-08-12 10:43:20.966009: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
TensorFlow version 2.3.0 detected. Last version known to be fully compatible is 1.14.0 .

Starting CoreML export with coremltools 3.4...
CoreML export failure: module 'coremltools' has no attribute 'convert'

Export complete. Visualize with https://github.com/lutzroeder/netron.

The solution is not the cleanest. It would be better to call set_logging from a function that every script already runs, instead of adding set_logging manually to each script. The closest candidate I can find is select_device, but export.py does not use it.
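A minimal sketch of the kind of set_logging helper described above (the rank argument and level choice are assumptions for illustration, not the exact code merged in this PR):

```python
import logging

def set_logging(rank: int = -1) -> None:
    # Sketch: only rank -1 (no DDP) or rank 0 (the DDP master) logs at
    # INFO level; all other ranks are limited to WARNING, which suppresses
    # the duplicated console output across processes.
    logging.basicConfig(
        format="%(message)s",
        level=logging.INFO if rank in (-1, 0) else logging.WARNING,
    )
```

Each script would call this once at startup with its local rank, so verbose output is printed exactly once per job.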

@glenn-jocher
Member

@NanoCode012 looks good! Can you submit a PR? Fusing is ok, I understand.

@NanoCode012 NanoCode012 mentioned this pull request Aug 12, 2020
burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this pull request Jan 1, 2021
* Change print to logging

* Clean function set_logging

* Add line spacing

* Change leftover prints to log

* Fix scanning labels output

* Fix rank naming

* Change leftover print to logging

* Reorganized DDP variables

* Fix type error

* Make quotes consistent

* Fix spelling

* Clean function call

* Add line spacing

* Update datasets.py

* Update train.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
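The pattern behind the commit list above ("Change print to logging", "Fix rank naming") can be sketched as a small rank-aware logger factory; the helper name and logger naming scheme here are assumptions, not the merged code:

```python
import logging

def setup_rank_logger(rank: int) -> logging.Logger:
    # Hypothetical helper: bare print() calls become logger calls, and
    # each DDP process gets a level derived from its rank, so only
    # rank -1 (single-process) or rank 0 (the master) emits INFO lines,
    # while warnings still surface on every rank.
    logger = logging.getLogger(f"yolov5.rank{rank}")
    logger.setLevel(logging.INFO if rank in (-1, 0) else logging.WARNING)
    return logger
```

Gating by level rather than wrapping every call in `if rank in (-1, 0):` keeps the call sites unchanged while silencing the redundant processes.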
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
Successfully merging this pull request may close these issues.

Improvement of DDP is needed!