Ema patch fix #279 #295

Closed
wants to merge 61 commits into from

@NanoCode012 (Contributor) commented Jul 4, 2020

Skip the keys ["module", "process_group", "reducer"] when saving EMA attributes, to fix #279.
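A minimal sketch of the idea, not the code from this PR: copy public attributes from a wrapped model onto the EMA copy while skipping the three keys listed above. The names `copy_attr`, `FakeWrappedModel`, and `FakeEMA` are hypothetical, and `threading.Lock` is used only as a stand-in for an unpicklable handle such as the process group, so the example runs without torch or multiple GPUs.

```python
import pickle
import threading

# Keys named in this PR; assumed here to hold distributed-training internals.
EXCLUDED_KEYS = ("module", "process_group", "reducer")


def copy_attr(dst, src, exclude=EXCLUDED_KEYS):
    """Copy public attributes from src to dst, skipping excluded keys."""
    for key, value in vars(src).items():
        if key.startswith("_") or key in exclude:
            continue
        setattr(dst, key, value)


class FakeWrappedModel:
    """Hypothetical stand-in for a distributed-wrapped model."""

    def __init__(self):
        self.nc = 80
        self.names = ["person", "car"]
        self.module = object()                 # placeholder for the inner model
        self.process_group = threading.Lock()  # unpicklable stand-in
        self.reducer = threading.Lock()        # unpicklable stand-in


class FakeEMA:
    """Hypothetical stand-in for the EMA model copy."""


ema = FakeEMA()
copy_attr(ema, FakeWrappedModel())

# Pickling the copied attributes works because the handles were skipped.
payload = pickle.dumps(vars(ema))
print(sorted(vars(ema)))
```

If "process_group" or "reducer" were copied as well, the `pickle.dumps` call in this sketch would raise a TypeError, which mirrors the kind of failure reported in #279.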

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved multi-GPU support and cleaner code for YOLOv5.

📊 Key Changes

  • 🖥️ Introduced multi-processing and distributed training enhancements to better support multi-GPU setups.
  • 🛠️ Made modifications to allow devices to be set more flexibly when initializing training, including automatic device selection if not specified.
  • ✂️ Removed redundant print statements to declutter console output during training.
  • 👾 Refactored the create_dataloader function to support distributed training with the split flag and rank-aware samplers.
  • 🧽 Cleaned up code within the training loop of the train.py script to ensure proper device assignment.

🎯 Purpose & Impact

  • πŸ“ˆ The changes pave the way for more efficient and scalable training on multiple GPUs, leading to faster model training and experimentation times.
  • πŸ” The modifications and refactoring improve code maintainability, making it easier for other developers to work with and contribute to the codebase.
  • ✨ The cleaner console output and code structure enhance the user experience by making it easier to monitor and understand training progress.

@NanoCode012 changed the title from "Ema patch #279" to "Ema patch fix #279" on Jul 4, 2020
@NanoCode012
Contributor Author

Trained on 1, 2, and 4 GPUs using python train.py --weights yolov5s.pt --epochs 4 --img 320 --device 0.. and then tested with python test.py --weights weights/last.pt.

@NanoCode012
Contributor Author

Sorry, I made a mistake when merging my branches. I will reopen this in a new branch.

@NanoCode012 closed this Jul 4, 2020
@NanoCode012 deleted the ema-patch branch July 4, 2020 09:47

Successfully merging this pull request may close these issues.

TypeError: can't pickle torch.distributed.ProcessGroupNCCL objects