Add the Multilingual Amazon Reviews Corpus #928

Merged
joeddav merged 8 commits into huggingface:master from amazon-reviews-multi on Dec 1, 2020

Conversation

joeddav
Contributor

@joeddav joeddav commented Nov 30, 2020

  • Name: Multilingual Amazon Reviews Corpus (amazon_reviews_multi)
  • Description: A collection of Amazon reviews in English, Japanese, German, French, Spanish and Chinese.
  • Paper: https://arxiv.org/abs/2010.02573

Checkbox

  • Create the dataset script /datasets/my_dataset/my_dataset.py using the template
  • Fill the _DESCRIPTION and _CITATION variables
  • Implement _info(), _split_generators() and _generate_examples()
  • Make sure that the BUILDER_CONFIGS class attribute is filled with the different configurations of the dataset and that the BUILDER_CONFIG_CLASS is specified if there is a custom config class.
  • Generate the metadata file dataset_infos.json for all configurations
  • Generate the dummy data dummy_data.zip files so the dataset script can be tested, and make sure they are small (<50KB)
  • Add the dataset card README.md using the template: fill in the tags and the various paragraphs
  • Make sure the tests pass for both the real data and the dummy data.
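
For readers unfamiliar with the checklist, the `_generate_examples()` step boils down to yielding `(key, example)` pairs, one per record. The sketch below illustrates that pattern with the standard library only. It is not the script from this PR: the JSON-lines input format, the `FIELDS` tuple, and the `generate_examples` helper are assumptions made for illustration.

```python
import json

# Hypothetical field names, chosen for illustration only; the real
# dataset script defines its own feature schema.
FIELDS = ("review_id", "stars", "review_body", "language")


def generate_examples(lines):
    """Yield (key, example) pairs from an iterable of JSON-lines strings.

    Mirrors the shape of a `_generate_examples()` method: each yielded
    key is unique and each example is a plain dict of features.
    """
    for idx, line in enumerate(lines):
        line = line.strip()
        if not line:
            # Skip blank lines rather than failing on them.
            continue
        record = json.loads(line)
        yield idx, {name: record[name] for name in FIELDS}


# Tiny in-memory stand-in for a downloaded data file (assumed format).
sample = [
    '{"review_id": "r1", "stars": 5, "review_body": "Great", "language": "en"}',
    "",
    '{"review_id": "r2", "stars": 1, "review_body": "Schlecht", "language": "de"}',
]
examples = list(generate_examples(sample))
```

Using the line index as the key keeps keys unique even when blank lines are skipped; a real script might prefer a stable identifier from the data itself.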

Member

@lhoestq lhoestq left a comment

Awesome! Good job.

Feel free to merge if there are no more changes to be added.

datasets/amazon_reviews_multi/README.md (review thread outdated, resolved)
datasets/amazon_reviews_multi/amazon_reviews_multi.py (review thread outdated, resolved)
@joeddav joeddav merged commit 0ecda52 into huggingface:master Dec 1, 2020
@joeddav joeddav deleted the amazon-reviews-multi branch December 1, 2020 16:04
sileod pushed a commit to sileod/datasets that referenced this pull request Dec 7, 2020
* add amazon reviews multilingual

* readme typos

* add custom config class name

* style

* flesh out dataset card

* simplify download extract

* update license tag