
Refactor train function into Trainer class #515

Closed · wants to merge 7 commits

Conversation

@NanoCode012 (Contributor) commented Jul 25, 2020

My idea is to break the train function down into smaller functions and put them into one class called Trainer.

Now, the train function only needs to call:

trainer = Trainer(hyp, opt, device)
results = trainer.fit()
# destroy process group
# clear cache

In the Trainer class, I created functions and helper functions for each section of the original train code.

I believe this is better because it separates different pieces of code into different functions, so the code is easier to read and understand instead of depending solely on comments.

The only files modified are train.py and trainer.py. The changes to the others are most likely due to commit history from when I merged with master.

I plan to move trainer.py into the utils folder after I fix the import issue.

The biggest part left to do is the fit function, where the training happens. I'm creating this draft PR first to see whether Glenn likes this idea before I proceed further.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Refactors the YOLOv5 training flow by introducing a Trainer class for better code organization and maintainability.

📊 Key Changes

  • Dockerfile: Removal of deprecated create_pretrained function in favor of strip_optimizer function.
  • detect.py: create_pretrained replaced with strip_optimizer indicating a move towards finalizing model training.
  • models/yolo.py: Enhancement of image augmentation with flip and scale adjustments.
  • train.py: Major refactor encapsulating training process in a new Trainer class and updates in how the optimizer and scheduler are managed.
  • trainer.py: New file introducing the Trainer class responsible for organizing the entire training process.
  • utils/datasets.py: Minor cleanup and code improvements.
  • utils/torch_utils.py: Optimization of image scaling, skipping unnecessary work when the scale ratio is 1.0 (see the sketch after this list).
  • utils/utils.py: Removal of the create_pretrained function, with strip_optimizer updated to strip optimizer state after training completes.

🎯 Purpose & Impact

  • 🔍 Better Code Organization: Encapsulating the training process within a Trainer class makes the code more modular and easier to maintain.
  • 🚀 Improved Training Workflow: Changes in arguments of various functions and removal of redundant code streamline the training process.
  • 🛠️ Maintenance and Future Development: The refactoring lays groundwork for easier updates and maintenance of the training functionality in the future.
  • 🍃 Leaner Models: The modified strip_optimizer function helps produce slimmer models by removing non-essential information after training (a sketch follows below).

NanoCode012 and others added 7 commits July 3, 2020 11:05
* Update datasets.py (ultralytics#494)

* update yolo.py TTA flexibility and extensibility (ultralytics#506)

* update yolo.py TTA flexibility and extensibility

* Update scale_img()

* Update utils.py strip_optimizer() (ultralytics#509)

* Update utils.py strip_optimizer() (ultralytics#509)

Follow-on update that I missed adding into PR 509.

* Update utils.py strip_optimizer() (ultralytics#509)

Follow-on update that I missed adding into PR 509.

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@NanoCode012 (Contributor, Author)

@MagicFrogSJTU, could you tell me if I'm on the right path?

@MagicFrogSJTU (Contributor) commented Jul 25, 2020

> @MagicFrogSJTU, could you tell me if I'm on the right path?

For train.py, I think we need a deep survey and a design first. Simply splitting the code is not all we want. The most important goal is to separate academic logic from engineering routines, as in pytorch_lightning.
I am not familiar with it and don't know which way is the right one, so I cannot give you any advice on this issue. I suggest we pause and read other people's code first.

@NanoCode012 (Contributor, Author)

I see. I followed the example from huggingface/transformers. I noticed that their code is split into smaller functions. The only difference is that the transformers Trainer accepts different datasets/models etc., whereas mine is static.

I will take another look at pytorch_lightning's, then.

@NanoCode012 (Contributor, Author)

After reading the Lightning code, I think it's too complex. I will have to spend some time thinking about whether it's even possible for me to do, or worth doing, as it does not seem to be an urgent issue. I will close this PR for now.

@MagicFrogSJTU (Contributor)

> After reading the Lightning code, I think it's too complex. I will have to spend some time thinking about whether it's even possible for me to do, or worth doing, as it does not seem to be an urgent issue. I will close this PR for now.

@NanoCode012 My advice to imitate pytorch_lightning was just a suggestion, since I don't have a clear vision myself right now. So do what you think is right! Don't let me stop you!

@NanoCode012 (Contributor, Author)

No, you are right about the splitting part. I actually felt I made things harder to understand with so many functions and helper functions. Right now, I'm inclined to make one Trainer class that is inherited by a new Train class. The Trainer class is where the "necessary" parts go: the loop over batches, moving imgs to the device, setting the sampler. The Train class would then implement the "forward" function, "backward", and create_optimiser.

This way, the main implementation lives only in the Train class.

@MagicFrogSJTU (Contributor)

> No, you are right about the splitting part. I actually felt I made things harder to understand with so many functions and helper functions. Right now, I'm inclined to make one Trainer class that is inherited by a new Train class. The Trainer class is where the "necessary" parts go: the loop over batches, moving imgs to the device, setting the sampler. The Train class would then implement the "forward" function, "backward", and create_optimiser.
>
> This way, the main implementation lives only in the Train class.

Sounds reasonable!

@NanoCode012 (Contributor, Author)

Hmm, would it be unreasonable to use the Lightning module? Since they have already implemented and tested the hard work, wouldn't it be better to use it? Or are there certain reasons we can't (license, etc.)?

@MagicFrogSJTU (Contributor)

In my own opinion, pytorch_lightning is too heavy. I think a quick, easy, smooth transition is more acceptable to Glenn.

@NanoCode012 (Contributor, Author)

Ok. I see!
