
Refactor train function into Trainer class #515

Closed · wants to merge 7 commits

Conversation

@NanoCode012 (Contributor) commented Jul 25, 2020

My idea is to break the train function down into smaller functions and put them into one class called Trainer.

Now, the train function only needs to call:

trainer = Trainer(hyp, opt, device)
results = trainer.fit()
# destroy process group
# clear cache

In the Trainer class, I created functions and helper functions for each section of the original train code.

I believe this is better because it separates different pieces of code into different functions, so the code is easier to read and understand instead of depending solely on comments.

The only files modified are train.py and trainer.py. The changes to the others are most likely due to commit history from when I merged with master.

I plan to move trainer.py into the utils folder after I fix the import issue.

The biggest part left to do is the fit function, where the training happens. I'm creating this draft PR first to see whether Glenn likes this idea before I proceed further.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Refactors the YOLOv5 training flow by introducing a Trainer class for better code organization and maintainability.

📊 Key Changes

  • Dockerfile: Removal of deprecated create_pretrained function in favor of strip_optimizer function.
  • detect.py: create_pretrained replaced with strip_optimizer indicating a move towards finalizing model training.
  • models/yolo.py: Enhancement of image augmentation with flip and scale adjustments.
  • train.py: Major refactor encapsulating training process in a new Trainer class and updates in how the optimizer and scheduler are managed.
  • trainer.py: New file introducing the Trainer class responsible for organizing the entire training process.
  • utils/datasets.py: Minor cleanup and code improvements.
  • utils/torch_utils.py: Optimization of image scaling, skipping unnecessary work when the scale ratio is 1.0 (see the sketch after this list).
  • utils/utils.py: Removal of the create_pretrained function, with strip_optimizer updated to strip optimizer state after training completes.

🎯 Purpose & Impact

  • 🔍 Better Code Organization: Encapsulating the training process within a Trainer class makes the code more modular and easier to maintain.
  • 🚀 Improved Training Workflow: Changes in arguments of various functions and removal of redundant code streamline the training process.
  • 🛠️ Maintenance and Future Development: The refactoring lays groundwork for easier updates and maintenance of the training functionality in the future.
  • 🍃 Leaner Models: The modified strip_optimizer function helps produce slimmer models by removing non-essential information after training (a sketch follows below).

NanoCode012 and others added 7 commits July 3, 2020 11:05
* Update datasets.py (ultralytics#494)

* update yolo.py TTA flexibility and extensibility (ultralytics#506)

* update yolo.py TTA flexibility and extensibility

* Update scale_img()

* Update utils.py strip_optimizer() (ultralytics#509)

* Update utils.py strip_optimizer() (ultralytics#509)

Follow-on update that I missed adding into PR 509.

* Update utils.py strip_optimizer() (ultralytics#509)

Follow-on update that I missed adding into PR 509.

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@NanoCode012 (Contributor, Author)

@MagicFrogSJTU, could you tell me if I'm on the right path?

@MagicFrogSJTU (Contributor) commented Jul 25, 2020

> @MagicFrogSJTU, could you tell me if I'm on the right path?

For train.py, I think we need a deep survey and a design first. Simply splitting the code is not all we want. The most important goal is to separate academic logic from engineering routines, as in pytorch_lightning.
I am not familiar with it and don't know which way is the right one, so I cannot give you any advice on this issue. I suggest we pause and read other people's code first.

@NanoCode012 (Contributor, Author)

I see. I followed the example from huggingface/transformers. I noticed that their code is split into smaller functions. The only difference is that the transformers Trainer accepts different datasets/models etc., whereas mine is static.

I will take another look at pytorch_lightning's, then.

@NanoCode012 (Contributor, Author)

After reading the Lightning code, I think it's too complex. I will have to spend some time thinking about whether it's even possible for me to do, or worth doing, as it does not seem to be an urgent issue. I will close this PR for now.

@MagicFrogSJTU (Contributor)

> After reading the Lightning code, I think it's too complex. I will have to spend some time thinking about whether it's even possible for me to do, or worth doing, as it does not seem to be an urgent issue. I will close this PR for now.

@NanoCode012 My advice to imitate pytorch_lightning was just a suggestion, since I don't have a clear vision myself right now. So do what you think is right! Don't let me stop you!

@NanoCode012 (Contributor, Author)

No, you are right about the splitting part. I actually felt I made things harder to understand with so many functions and helper functions. Right now, I'm inclined to make one Trainer class that is inherited by a new Train class. The Trainer class is where the "necessary" parts go: the loop over batches, moving imgs to the device, setting the sampler. The Train class would then implement the "forward" function, "backward", and create_optimiser.

This way, the main implementation lives only in the Train class.

@MagicFrogSJTU (Contributor)

> No, you are right about the splitting part. I actually felt I made things harder to understand with so many functions and helper functions. Right now, I'm inclined to make one Trainer class that is inherited by a new Train class. The Trainer class is where the "necessary" parts go: the loop over batches, moving imgs to the device, setting the sampler. The Train class would then implement the "forward" function, "backward", and create_optimiser.
>
> This way, the main implementation lives only in the Train class.

Sounds reasonable!

@NanoCode012 (Contributor, Author)

Hmm, would it be unreasonable to use the Lightning module? Since they have already implemented and tested the hard work, wouldn't it be better to use it? Or are there certain reasons we can't (license, etc.)?

@MagicFrogSJTU (Contributor)

In my own opinion, pytorch_lightning is too heavy. I think a quick, easy, smooth transition is more acceptable to Glenn.

@NanoCode012 (Contributor, Author)

Ok. I see!
