Add MetaShift dataset #3900
@lhoestq Could you please review this when you get time? Thank you.
Hi! Thanks for working on this!
Beyond the generated MetaShift dataset, the original preprocess script also generates the meta-graphs for each class. I have currently not included this part. [ Ref : Link ]
Maybe we can add the generated meta-graphs to the card as images (with attributions)?
There is a Bonus section that the authors share. I have currently not included this part. [ Ref : Link ]
Would be cool if we could have them as additional configs. Also, maybe we could have configs that expose image metadata from the https://nlp.stanford.edu/data/gqa/sceneGraphs.zip file (this file is downloaded in the script but not used).
I couldn't get the dummy dataset. Need some inputs here.
I suggest you try to generate the dataset_infos.json file first, and then I can help with the dummy data.
- **Leaderboard:** [More Information Needed]
- **Point of Contact:** [More Information Needed]

### Dataset Summary
This dataset was used to investigate the modality gap phenomenon, so maybe we can mention/explain that here?
Rename card name. Co-authored-by: Mario Šaško <mario@huggingface.co>
Naming for links and add point of contact info. Co-authored-by: Mario Šaško <mario@huggingface.co>
Fix extra whitespace. Co-authored-by: Mario Šaško <mario@huggingface.co>
Extra full stop removed. Co-authored-by: Mario Šaško <mario@huggingface.co>
Add bibtex tag. Co-authored-by: Mario Šaško <mario@huggingface.co>
Cleaner code changes. Co-authored-by: Mario Šaško <mario@huggingface.co>
Use os.path.join instead. Co-authored-by: Mario Šaško <mario@huggingface.co>
Use staticmethod, remove print statements. Co-authored-by: Mario Šaško <mario@huggingface.co>
Add task template. Co-authored-by: Mario Šaško <mario@huggingface.co>
add static method. Co-authored-by: Mario Šaško <mario@huggingface.co>
…atasets into add_metashift_dataset
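Two of the review suggestions above, "Use os.path.join instead" and "Use staticmethod", can be illustrated roughly as follows. This is only a sketch: the class name `PathHelper`, the method `image_path`, and the `.jpg` extension are hypothetical and are not taken from the PR's loading script.

```python
import os


class PathHelper:
    """Hypothetical helper; not the class used in the PR."""

    @staticmethod
    def image_path(images_dir: str, image_id: str) -> str:
        # A static method fits here because nothing on `self` is needed.
        # os.path.join inserts the platform's separator, so the caller does
        # not have to care whether images_dir ends with one.
        return os.path.join(images_dir, image_id + ".jpg")


# Callable on the class itself, without creating an instance.
path = PathHelper.image_path("images", "2407890")
```

Compared with concatenating strings and a hard-coded "/", this keeps the script portable across operating systems.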
Thanks a lot for your inputs, @mariosasko.
Yes. We can do this for the default set of classes. Will add this.
I'll try adding the bonus section as an additional config.
Oh, I forgot to mention that. Let's add a |
Okay. Got it. Will add these and constants as config parameters. The image metadata from scene graphs looks like this:
{
"2407890": {
"width": 640,
"height": 480,
"location": "living room",
"weather": none,
"objects": {
"271881": {
"name": "chair",
"x": 220,
"y": 310,
"w": 50,
"h": 80,
"attributes": ["brown", "wooden", "small"],
"relations": {
"32452": {
"name": "on",
"object": "275312"
},
"32452": {
"name": "near",
"object": "279472"
}
}
}
}
}
}
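As a rough idea of how a config could expose this metadata, the nested structure can be flattened into one record per annotated object. The sketch below is an assumption-laden illustration, not the code from the PR: the function name `iter_objects` and the output field names are hypothetical, `none` is written as Python's `None`, and the second relation key is changed to `"32453"` because a Python dict cannot hold two identical keys.

```python
from typing import Any, Dict, Iterator

# Sample mirroring the snippet above, with the two adjustments noted in the
# lead-in (None for `none`, and a hypothetical "32453" for the repeated key).
SCENE_GRAPHS: Dict[str, Dict[str, Any]] = {
    "2407890": {
        "width": 640,
        "height": 480,
        "location": "living room",
        "weather": None,
        "objects": {
            "271881": {
                "name": "chair",
                "x": 220, "y": 310, "w": 50, "h": 80,
                "attributes": ["brown", "wooden", "small"],
                "relations": {
                    "32452": {"name": "on", "object": "275312"},
                    "32453": {"name": "near", "object": "279472"},
                },
            }
        },
    }
}


def iter_objects(scene_graphs: Dict[str, Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per annotated object (hypothetical helper)."""
    for image_id, graph in scene_graphs.items():
        for object_id, obj in graph.get("objects", {}).items():
            yield {
                "image_id": image_id,
                "object_id": object_id,
                "name": obj["name"],
                "bbox": (obj["x"], obj["y"], obj["w"], obj["h"]),
                "attributes": list(obj.get("attributes", [])),
                # Each relation becomes a (relation name, target object id) pair.
                "relations": [
                    (rel["name"], rel["object"])
                    for rel in obj.get("relations", {}).values()
                ],
            }


records = list(iter_objects(SCENE_GRAPHS))
```

Flat records like these map more directly onto a fixed set of dataset features than the nested, id-keyed dictionaries do.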
Thanks, here are a few suggestions to fix the CI :)
CI fixes. Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Correct task categories. Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Add encoding. Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
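The "Add encoding" fix presumably refers to passing an explicit `encoding` when opening text files; the thread does not show the diff, so the sketch below only illustrates that assumption. The file name and payload are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical payload containing a non-ASCII character.
payload = {"location": "café"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")

    # Write and read with an explicit encoding so the result does not depend
    # on the locale settings of the machine running the script.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

Being explicit here tends to avoid CI failures that appear only on machines whose default text encoding is not UTF-8.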
I've left a few comments, but other than that looks great.
Add paperswithcode id. Co-authored-by: Mario Šaško <mario@huggingface.co>
Correct sentence. Co-authored-by: Mario Šaško <mario@huggingface.co>
Co-authored-by: Mario Šaško <mario@huggingface.co>
add default classes info. Co-authored-by: Mario Šaško <mario@huggingface.co>
Thanks a lot for your suggestions, Mario. What I learnt from the review is that I need to write clearer sentences. I will keep this in mind. :)
Not an easy dataset to add, but you did a great job! And it can even be streamed!
Pinging @lhoestq for the final review
It all looks good, thank you! I fixed minor issues with the tags and the license.
Super impressed by your work on this, congrats :)
Thanks a lot for your support, @mariosasko and @lhoestq.
It's my first dataset contribution to the 🤗 Datasets library, and I'm super excited. Thank you. :) Also, I think we can close the request issue now: #3813
This PR adds the MetaShift dataset.
Dataset request: Add MetaShift dataset #3813
@lhoestq As discussed,
For real data, I performed the following test:
Error as follows:
To-Do:
Need your help and suggestions for improvement. Thank you.