Ema patch fix #279 #295

NanoCode012 · 2020-07-04T09:01:51Z

Do not copy the keys ["module", "process_group", "reducer"] onto the EMA model's attributes (fixes #279).
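The fix above filters which attributes are copied from the (possibly DDP-wrapped) model onto the EMA copy. The sketch below is illustrative and assumes the following: update_attr and EMA_EXCLUDED_KEYS are hypothetical names, and skipping underscore-prefixed attributes is an added choice. It uses plain Python objects instead of PyTorch modules so the filtering logic stands on its own. Leaving out the DDP handles presumably keeps them out of the saved checkpoint, which fits the later "fix save error for multi-gpu" commit.

```python
from types import SimpleNamespace

# Keys named in this PR; hypothetical constant name for illustration.
EMA_EXCLUDED_KEYS = ("module", "process_group", "reducer")


def update_attr(ema_model: object, model: object, exclude=EMA_EXCLUDED_KEYS) -> None:
    """Copy public instance attributes from model to ema_model, skipping excluded keys."""
    for key, value in vars(model).items():
        if key.startswith("_") or key in exclude:
            continue  # private attributes (assumption) and DDP-specific handles are not copied
        setattr(ema_model, key, value)


# Usage with stand-in objects (no PyTorch needed for the illustration).
model = SimpleNamespace(nc=80, names=["person"], process_group=object(), reducer=object())
ema = SimpleNamespace()
update_attr(ema, model)
print(sorted(vars(ema)))  # ['names', 'nc']
```

Passing exclude as a parameter with a default keeps the call site short and still lets a caller widen the list if another wrapper adds handles of its own.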

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved multi-GPU support and cleaner code for YOLOv5.

📊 Key Changes

🖥️ Introduced multi-processing and distributed training enhancements to better support multi-GPU setups.
🛠️ Allowed the training device to be set more flexibly, with automatic device selection when none is specified.
✂️ Removed redundant print statements to declutter console output during training.
👾 Refactored the create_dataloader function to support distributed training via a split flag and rank-aware samplers.
🧽 Cleaned up the training loop in train.py to ensure correct device assignment.
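A rank-aware sampler gives each training process its own slice of the dataset. The pure-Python sketch below mimics how PyTorch's torch.utils.data.distributed.DistributedSampler partitions indices when shuffle=False and drop_last=False. It assumes that behavior as the model; shard_indices is a hypothetical helper and not the code in this PR.

```python
import math
from typing import List


def shard_indices(num_items: int, rank: int, world_size: int) -> List[int]:
    """Return the dataset indices that process `rank` would load (no shuffling)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    indices = list(range(num_items))
    num_samples = math.ceil(num_items / world_size)  # per-rank sample count
    total_size = num_samples * world_size
    # Pad by repeating leading indices so every rank gets the same number of samples.
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]
    # Interleaved split: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]


for rank in range(4):
    print(rank, shard_indices(10, rank, 4))
```

Padding keeps the per-rank batch counts equal. Otherwise one process could finish an epoch early while the others wait on it in a collective operation.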

🎯 Purpose & Impact

📈 These changes enable more efficient and scalable training on multiple GPUs, shortening training and experimentation time.
🔍 The modifications and refactoring improve code maintainability, making it easier for other developers to work with and contribute to the codebase.
✨ The cleaner console output and code structure enhance the user experience by making it easier to monitor and understand training progress.


NanoCode012 · 2020-07-04T09:14:30Z

Trained on 1, 2, and 4 GPUs using python train.py --weights yolov5s.pt --epochs 4 --img 320 --device 0.. and tested with python test.py --weights weights/last.pt.
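The runs above change the GPU count through --device. The sketch below shows how a device string could be mapped to the number of training processes. It assumes a comma-separated list of GPU ids, treats "cpu" as CPU-only, and treats an empty string as automatic selection of all visible GPUs. parse_device and the available_gpus parameter are hypothetical; this is not YOLOv5's actual device-selection code.

```python
from typing import List


def parse_device(device: str, available_gpus: int) -> List[int]:
    """Turn a --device style string into a list of CUDA device indices (hypothetical helper).

    "cpu" -> [] (CPU only); "" -> all available GPUs (automatic selection);
    "0,1,2,3" -> [0, 1, 2, 3].
    """
    device = device.strip().lower()
    if device == "cpu":
        return []
    if device == "":
        return list(range(available_gpus))
    ids = [int(part) for part in device.split(",") if part.strip()]
    for i in ids:
        if not 0 <= i < available_gpus:
            raise ValueError(f"GPU {i} requested, but only {available_gpus} available")
    return ids


# One process per listed GPU, so world_size is the length of the list.
for spec in ("0", "0,1", "0,1,2,3"):
    print(spec, "-> world_size", len(parse_device(spec, available_gpus=4)))
```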

NanoCode012 · 2020-07-04T09:44:41Z

Sorry, I made a mistake when merging my branches. I will reopen this in a new branch.

NanoCode012 added 30 commits July 1, 2020 18:37

First test on DDP

58a6768

Fixed nprocs type

c9381d8

Attempt to fix opt cannot be found in other process

2335232

Changed from device to gpu

92fc7c7

Add check so only 1 GPU downloads model

935a8e0

Moved setup to top to init process group earlier

69e132e

Added type to world_size when parse

bed4e20

Add parameters

e569a2b

Attempt to fix map_location error

ba535b5

Add device parameter for cpu compatibility

3ab5c97

Disable tensorboard because tb_writer not found

546762d

Disabled tensorboard

75ebe0b

Moved global code into startup func

d354686

Commented printing of hyp

92ce60f

Merge branch 'ddp' of https://github.com/NanoCode012/yolov5 into ddp

22ccc5d

Removed old function

29c3472

Attempt to fix map_location error

8701adb

Attempt to fix map_location

90b0f32

Fix map_location

ce2e905

Only let rank=0 move txt file

ed582fb

Add check for rank=0 to only move file

dc2b261

Add more rank=0 only operations

53d43aa

Add blocking while waiting for file saving

26164c4

Attempt to fix map_location error

1d3bce8

Let only rank=0 save model

5c27305

Attempt to clean output

8167e04

Changed to split parameter for dataloader

499e5d0

Add output_device parameter to DDP

71ac8e9

Moved DDP initialize after model params are set

28b43a4

Add rank to ema

22d0954
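Several commits above restrict side effects to a single process: downloading the model, moving the txt file, and saving the model. A minimal sketch of that guard follows. It assumes the convention that rank -1 means a single-process run and rank 0 is the DDP master; is_main_process and run_on_main are hypothetical names. In real DDP code, a call such as torch.distributed.barrier() is typically added so that the other ranks wait until the main process has written its file, which matches the "blocking while waiting for file saving" commit.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def is_main_process(rank: int) -> bool:
    # Assumed convention: rank -1 = single-process (non-DDP) run, rank 0 = DDP master.
    return rank in (-1, 0)


def run_on_main(rank: int, fn: Callable[[], T]) -> Optional[T]:
    """Call fn only on the main process; every other rank skips it and gets None."""
    return fn() if is_main_process(rank) else None


# Simulate four DDP ranks each reaching the checkpoint-saving step.
saved_by = []
for rank in range(4):
    run_on_main(rank, lambda: saved_by.append(rank))
print(saved_by)  # [0]
```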

NanoCode012 added 24 commits July 3, 2020 10:47

Add shuffle parameter for train's dataloader

aaba468

Merge pull request #1 from ultralytics/master

10d45aa

Updated to latest commit

Merge branch 'ddp' into patch

a38be2c

Merge pull request #2 from NanoCode012/patch

3e3f0d6

Update following latest commit for fixing checkpoint saving

Merge branch 'ddp' of https://github.com/NanoCode012/yolov5 into ddp

ae5b8d1

Fix condition to only let rank=0 calc map

e260473

Removed shuffle

7f3b9d8

Allow all gpu to calc mAP

4721de9

Attempt change lr0

704e774

Added print statement to test

90a930f

Test move variables to local scope

0213074

Moved apex import to train function

59ce9b4

Fixed missing mixed_precision

b1b707b

Made it easier to see the print log

6c1f2d2

Fix import

cc3c69d

Revert "Added print statement to test"

bce256b

This reverts commit 90a930f.

Add non_blocking to img and label

b4da6ea

Add device argument to test function

0cd1ebb

Test undo passing ema.ema.module for model ckpt

81ab07c

Revert "Test undo passing ema.ema.module for model ckpt"

91143ef

This reverts commit 81ab07c.

Divide # of worker by gpu

5191189

Changed num of workers for validloader too

89d195e

Merge pull request #3 from ultralytics/master

bf96134

Update

Update fix save error for multi-gpu

7c658ae

NanoCode012 changed the title from "Ema patch #279" to "Ema patch fix #279" Jul 4, 2020

Merge branch 'ddp' into ema-patch

e8fb876

NanoCode012 closed this Jul 4, 2020

NanoCode012 deleted the ema-patch branch July 4, 2020 09:47
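The commits "Divide # of worker by gpu" and "Changed num of workers for validloader too" share CPU cores among the per-GPU processes. The sketch below shows one plausible heuristic. It assumes an even split of cores per process, no worker processes for a batch size of 1, and a user-set cap. workers_per_process and its parameters are hypothetical and not the exact code in this PR.

```python
import os
from typing import Optional


def workers_per_process(world_size: int, batch_size: int, max_workers: int = 8,
                        cpu_count: Optional[int] = None) -> int:
    """Split the CPU cores among DDP processes (assumed heuristic, hypothetical helper)."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    per_process = cpus // max(world_size, 1)  # each GPU process gets an equal share
    # No worker processes for batch size 1; never exceed the user-set cap.
    return min(per_process, batch_size if batch_size > 1 else 0, max_workers)


print(workers_per_process(world_size=4, batch_size=16, cpu_count=16))
```

Dividing by world_size matters because each DDP process builds its own dataloader. Without the division, the total worker count would grow with the number of GPUs.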
