After training YOLOv8l, I get an error that the best.pt file does not exist. #414
@jhp21c-dev the error message you're seeing typically means that model training did not complete, which matches your description of the Colab runtime being interrupted. In your case, the interruption halted training prematurely, so no `best.pt` file was produced.

In the future, try to make sure training runs uninterrupted. If your Colab session disconnects frequently, consider saving intermediate checkpoint files periodically during training so that a disconnection doesn't cost you all of your progress. Also make sure you have enough disk space: if the disk fills up, the model-saving step at the end of training can fail, which might also contribute to the missing `best.pt` file.

Keep up the good work with training. Let me know if you run into any other problems!
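If you want to rule out the two causes mentioned above (an interrupted run and a full disk) around a long Colab session, a small check like the following may help. This is a minimal sketch using only the Python standard library; the function names, the 2 GiB threshold, and the assumption that your trainer also writes a `last.pt` next to `best.pt` are all hypothetical, so adjust them to your setup.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence


def has_enough_disk_space(path: str = ".", min_free_gb: float = 2.0) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_gb` GiB free.

    The 2 GiB default is an arbitrary, hypothetical threshold.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_free_gb * 1024**3


def find_weights(
    weights_dir: str,
    candidates: Sequence[str] = ("best.pt", "last.pt"),  # "last.pt" is an assumed file name
) -> Optional[Path]:
    """Return the first weights file found in `weights_dir`, or None if none exist.

    A None result suggests training stopped before any weights were written.
    """
    weights = Path(weights_dir)
    for name in candidates:
        candidate = weights / name
        if candidate.is_file():
            return candidate
    return None
```

For example, you could call `has_enough_disk_space("/content")` at the top of a Colab notebook (where `/content` is the default working directory), and call `find_weights(...)` on your run's weights folder before trying to load the model.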
@UltralyticsAssistant Thanks for the answer.
@jhp21c-dev yes, if your Colab runtime is interrupted and you run your notebook again, training will generally start from epoch 0 by default. As you pointed out, to avoid repeating epochs after an interruption you can use checkpoints. A checkpoint saves the state of your model at regular intervals, for example after each epoch. This means saving not only the model architecture but also the parameter values the model has learned so far, so that after an unexpected interruption you can reload the model from the last checkpoint and continue from there.

You can modify your training script to save the model state after each epoch, or at whatever fixed interval suits your needs. Since you asked not to include code examples, I won't go into implementation specifics, but I highly recommend searching for "saving PyTorch model checkpoints" to get started. You'll find many useful examples that you can adapt to your needs.

If you have any more questions or need further clarification, let us know. We're here to help!
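For other readers who do want a starting point, here is a minimal sketch of the save-and-resume pattern described above. It is a toy example using only the Python standard library: `pickle` stands in for `torch.save`/`torch.load`, and the `params` dictionary stands in for what you would normally store from `model.state_dict()` (and usually `optimizer.state_dict()`). The function names and checkpoint layout are hypothetical and not part of any Ultralytics API.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write `state` to a temp file, then atomically rename it over `path`.

    If the runtime dies mid-write, the previous checkpoint stays intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(num_epochs: int, checkpoint_path: str) -> Dict[str, Any]:
    """Hypothetical toy training loop that resumes from the last completed epoch."""
    state = load_checkpoint(checkpoint_path)
    if state is None:
        state = {"epoch": 0, "params": {"w": 0.0}}  # fresh start at epoch 0
    for epoch in range(state["epoch"], num_epochs):
        state["params"]["w"] += 1.0  # stand-in for one epoch of real training
        state["epoch"] = epoch + 1  # number of completed epochs
        save_checkpoint(checkpoint_path, state)  # checkpoint after every epoch
    return state
```

If a run is cut off after, say, epoch 3, calling `train` again with the same checkpoint path skips the completed epochs instead of starting over. Only unpickle checkpoint files you created yourself, since `pickle.load` can execute arbitrary code.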