Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix list paths #721

Merged
merged 3 commits into from
Aug 13, 2020
Merged

Fix list paths #721

merged 3 commits into from
Aug 13, 2020

Conversation

Hatovix
Copy link
Contributor

@Hatovix Hatovix commented Aug 12, 2020

I noticed earlier today that check_dataset function was only expecting single string from dict['train'] and dict['val'], generating errors while providing a list of paths.

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Enhanced dataset auto-download and improved platform compatibility in general utilities.

πŸ“Š Key Changes

  • πŸ› οΈ Changed the way the Python platform module is imported and used.
  • πŸ› Fixed the check_git_status function to use platform.system() for better OS detection.
  • 🧹 Cleaned up the comment wording for the check_file function.
  • πŸ’Ύ Updated the check_dataset function, adding support for MacOS curl and refining the auto-download script logic.

🎯 Purpose & Impact

  • πŸš€ Ensure that the utility functions are more reliable across different operating systems.
  • πŸ“ˆ Improve automatic dataset downloading, making it easier for users to start projects without manual data setup.
  • 🍏 Provide a specific solution for MacOS users to avoid certificate errors during the auto-download process.
  • πŸ‘₯ These updates aim to streamline user experience and lower entry barriers for new users trying to utilize YOLOv5 for their projects.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Hatovix, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • Verify your PR is up-to-date with origin/master. If your PR is behind origin/master update by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • Verify all Continuous Integration (CI) checks are passing.
  • Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@glenn-jocher
Copy link
Member

@Hatovix ah, thank you, good catch! I will look through this.

@glenn-jocher
Copy link
Member

@Hatovix maybe we should just flip the if statements so it only checks datasets with download keys.
if 'download' in dict:

I assume only the official datasets would have download keys, and they are all single-source datasets (i.e. train and val strings).

@Hatovix
Copy link
Contributor Author

Hatovix commented Aug 12, 2020

You are probably right about the download keys, provided by public datasets.

Another think about this function (check_dataset), in my case of use:

  • custom dataset
  • list of paths
  • no download key

A nonexistant path displays the associated warning, but the training only stop later, after loading previous existing paths from the list, and fails on the first problematic one, (just noticed that Exception('Dataset autodownload unavailable.') probably needs raise statement).

I think it is better to check that everything is ready to be loaded, since loading can be time consuming.

@glenn-jocher
Copy link
Member

Just realized if newcomers use the coco128.yaml template for their own dataset, they may leave the download key there with an empty string maybe download: '', or just blank download: , without knowing they can also just delete it.

Perhaps this function may need a bit of a broader rethink.

@glenn-jocher
Copy link
Member

At the end of the day the desired outcome is simply that the official datasets autodownload if not found locally, the details of the function aren't too important, but I wanted to create a structure that allows people to implement on their own custom datasets a s well, should they choose to do so.

@glenn-jocher
Copy link
Member

This should be more robust than before. I removed checking of the train paths, as some use cases involve running test.py on coco val without actually requiring you to download coco train. Also added an OS check, as I'm getting a certificate error on MacOS locally with torch download.

@glenn-jocher
Copy link
Member

CI passing, merging. Thank you for your contributionS @Hatovix.

@glenn-jocher glenn-jocher merged commit 56c2c34 into ultralytics:master Aug 13, 2020
burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this pull request Jan 1, 2021
* Add list paths on check_dataset

* missing raise statement

* Update general.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
* Add list paths on check_dataset

* missing raise statement

* Update general.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* Add list paths on check_dataset

* missing raise statement

* Update general.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants