Add MetaShift dataset #3813

Closed
osanseviero opened this issue Mar 3, 2022 · 7 comments
Labels: dataset request (Requesting to add a new dataset), vision (Vision datasets)

Comments

@osanseviero
Member

Adding a Dataset

Instructions to add a new dataset can be found here.

@osanseviero osanseviero added the dataset request Requesting to add a new dataset label Mar 3, 2022
@mariosasko mariosasko added the vision Vision datasets label Mar 3, 2022
@dnaveenr
Contributor

dnaveenr commented Mar 8, 2022

I would like to take this up and give it a shot. Are there any image-specific dataset guidelines to keep in mind? Thank you.

@dnaveenr
Contributor

dnaveenr commented Mar 8, 2022

#self-assign

@dnaveenr
Contributor

dnaveenr commented Mar 9, 2022

I've started working on adding this dataset. I need some input on the following:

Ref for the initial draft here

  1. The dataset does not have a typical train/test/val split. What should we do in the _split_generators() function?
  2. This dataset builds on the Visual Genome dataset, using a metadata file. It is generated with the generate_full_MetaShift.py script. By default, the authors generate the dataset only for a set of SELECTED_CLASSES. The following script is used:
    Code : https://github.com/Weixin-Liang/MetaShift/blob/main/dataset/generate_full_MetaShift.py
    Info : https://metashift.readthedocs.io/en/latest/sub_pages/download_MetaShift.html#generate-the-full-metashift-dataset
    Can I just copy the required functions into metashift.py to generate the dataset?
  3. How do we complete _generate_examples for this dataset?

The user has the ability to use default selected classes, get the complete dataset or add more specific additional classes. I think config would be a good option here.

Any input or suggestions would be helpful. Thank you.

@osanseviero
Member Author

I think @mariosasko and @lhoestq should be able to help here 😄

@lhoestq
Member

lhoestq commented Mar 10, 2022

Hi ! Thanks for adding this dataset :) Let me answer your questions:

  1. In this case you can put everything in the "train" split.
  2. Yes, you can copy the script (provided you also include the MIT license of the code, in the file header for example). That said, we ideally try not to create new directories or files when generating a dataset, so if possible this script should be adapted to not create the file structure they mentioned, but instead yield the images one by one in _generate_examples. Let me know if you think this is feasible.
  3. See point 2 haha
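
To illustrate points 1 and 2, here is a minimal sketch, not the actual MetaShift script. It assumes a hypothetical metadata layout (class name -> context name -> image IDs) and hypothetical field names, and it only shows the shape of the idea: yield examples one by one rather than copying images into a folder tree, and expose everything under a single "train" split. In a real loading script this logic would live in _split_generators and _generate_examples.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical metadata layout: class name -> context name -> image IDs.
Metadata = Dict[str, Dict[str, List[str]]]


def generate_examples(metadata: Metadata) -> Iterator[Tuple[int, dict]]:
    """Yield (key, example) pairs one by one; nothing is written to disk."""
    key = 0
    for label in sorted(metadata):
        for context in sorted(metadata[label]):
            for image_id in metadata[label][context]:
                # Field names here are illustrative assumptions.
                yield key, {"image_id": image_id, "label": label, "context": context}
                key += 1


def split_examples(metadata: Metadata) -> Dict[str, Iterator[Tuple[int, dict]]]:
    """With no official train/test/val split, expose everything as "train"."""
    return {"train": generate_examples(metadata)}
```

Because the examples come from a generator, no intermediate directory structure is needed; each image is looked up only when it is requested.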

The user has the ability to use default selected classes, get the complete dataset or add more specific additional classes. I think config would be a good option here.

Yup! We can also define a selected_classes parameter so that users can do:

load_dataset("metashift", selected_classes=["cat", "dog", ...])
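
On the script side, one way such a parameter could be handled is sketched below. This is only an illustration under assumptions: MetaShiftConfigSketch, resolve_classes, and DEFAULT_CLASSES are hypothetical names, the default list is a placeholder rather than the authors' SELECTED_CLASSES, and a plain dataclass stands in for what would be a datasets.BuilderConfig subclass in the real script.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Placeholder defaults for illustration; not the authors' SELECTED_CLASSES list.
DEFAULT_CLASSES = ["cat", "dog"]


@dataclass
class MetaShiftConfigSketch:
    """Stand-in for a builder config carrying a selected_classes option."""

    selected_classes: Optional[List[str]] = None  # None -> use the defaults


def resolve_classes(config: MetaShiftConfigSketch, available: Sequence[str]) -> List[str]:
    """Return the classes to generate, rejecting names missing from the metadata."""
    wanted = DEFAULT_CLASSES if config.selected_classes is None else config.selected_classes
    unknown = [name for name in wanted if name not in available]
    if unknown:
        raise ValueError(f"Unknown classes: {unknown}")
    return list(wanted)
```

Failing early on an unknown class name gives the user a clear error instead of a silently empty dataset.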

@dnaveenr
Contributor

Great. This is helpful. Thanks @lhoestq.
Regarding point 2, I'll try using yield instead of creating the directories and see if it's feasible. The selected_classes config sounds good.

@dnaveenr dnaveenr mentioned this issue Mar 12, 2022
3 tasks
@mariosasko
Collaborator

Closed via #3900
