Add a few datasets of reference in the documentation #892

Merged (2 commits) on Nov 27, 2020

lhoestq (Member) commented Nov 26, 2020

I started a small list of reference datasets in the documentation.
Since many datasets have a lot in common, I think it's good to have a list of dataset scripts to draw inspiration from.

Let me know what you think, and whether you have ideas for other datasets we could add to this list.

thomwolf (Member) commented

Looks good to me. Do we also support TSV in this helper (it should explain whether TSV is treated as text or CSV) and in the dummy-data creator?

lhoestq (Member, Author) commented Nov 26, 2020

snli is based on TSV files (named with a .txt extension), and it is in the list of reference datasets.
The dummy data creator supports TSV.
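
For illustration only (this is not taken from the snli script), here is a minimal sketch of reading tab-separated data regardless of the file extension, using Python's standard `csv` module. The `read_tsv` helper and the sample columns are hypothetical.

```python
import csv
import io


def read_tsv(file_obj):
    """Yield one dict per row from a tab-separated file object.

    Hypothetical helper: the extension (.txt, .tsv) does not matter,
    only the tab delimiter does.
    """
    reader = csv.DictReader(file_obj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Made-up sample standing in for a ".txt" file that actually holds TSV.
sample = io.StringIO(
    "premise\thypothesis\tlabel\n"
    "A man sleeps.\tA person rests.\tentailment\n"
)
rows = list(read_tsv(sample))
```

With a real file, the same helper could be called as `read_tsv(open(path, encoding="utf-8", newline=""))`, where `path` points to the .txt file.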

lhoestq (Member, Author) commented Nov 27, 2020

Merging this one.
If you think of other reference datasets to add, we can still add them later.

lhoestq merged commit bbef850 into master on Nov 27, 2020
lhoestq deleted the add-datasets-of-reference-in-docs branch on November 27, 2020 18:08