Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metrec: arabic poetry dataset #893

Merged
merged 9 commits into from
Dec 1, 2020

Conversation

zaidalyafeai
Copy link
Contributor

No description provided.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks awesome good job !

For the sprint that is going to begin soon, we're now asking to add one dataset card per dataset.
You can se more information in those PRs : #896 and #894

Could you create a dataset card for Metrec as well ? The goal of a dataset card is to detail the characteristics of a dataset. The content of the dataset card is going to be displayed in the huggingface.co dataset search tool :)

datasets/metrec/metrec.py Outdated Show resolved Hide resolved
@zaidalyafeai
Copy link
Contributor Author

@lhoestq removed prints and added the dataset card.

datasets/metrec/metrec.py Outdated Show resolved Hide resolved
@zaidalyafeai
Copy link
Contributor Author

@lhoestq, I want to add other datasets as well. I am not sure if it is possible to do so with the same branch.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice good job :)

To add another dataset, please use a create a new branch from master and create another PR

datasets/metrec/README.md Outdated Show resolved Hide resolved
datasets/metrec/README.md Outdated Show resolved Hide resolved
@yjernite
Copy link
Member

Hi @zaidalyafeai, really excited to get more Arabic coverage in the lib, thanks for your contribution!

Couple of last comments:

  • this PR seems to modify some files that are unrelated to your dataset. Could you rebase from master? It should take care of that.
  • The dataset card is a good start! Can you describe the task in a few words and add more information in the Data Structure part, including listing and describing the fields? Also, if you don't know how to fill out a paragraph, or if you have some information but think more would be beneficial, please leave [More Information Needed] instead of [N/A]

@zaidalyafeai
Copy link
Contributor Author

zaidalyafeai commented Nov 30, 2020

Hi @zaidalyafeai, really excited to get more Arabic coverage in the lib, thanks for your contribution!

Couple of last comments:

  • this PR seems to modify some files that are unrelated to your dataset. Could you rebase from master? It should take care of that.
  • The dataset card is a good start! Can you describe the task in a few words and add more information in the Data Structure part, including listing and describing the fields? Also, if you don't know how to fill out a paragraph, or if you have some information but think more would be beneficial, please leave [More Information Needed] instead of [N/A]

I have no idea how some other files changed. I tried to rebase and push but this created some errors. I had to run the command
git push -u --force origin add-metrec-dataset which might cause some problems.

@lhoestq
Copy link
Member

lhoestq commented Dec 1, 2020

Feel free to create another branch/another PR without all the other changes

@zaidalyafeai
Copy link
Contributor Author

@yjernite can you explain which other files are changed because of the PR ? https://github.com/huggingface/datasets/pull/893/files only shows files related to the dataset.

@lhoestq
Copy link
Member

lhoestq commented Dec 1, 2020

Right ! github is nice with us today :)

datasets/metrec/README.md Outdated Show resolved Hide resolved
@lhoestq
Copy link
Member

lhoestq commented Dec 1, 2020

Looks like this one is ready to merge, thanks @zaidalyafeai !

@lhoestq lhoestq merged commit 132c8bf into huggingface:master Dec 1, 2020
@zaidalyafeai
Copy link
Contributor Author

@lhoestq thanks for the merge. I am not a GitHub geek. I already have another dataset to add. I'm not sure how to add another given my forked repo. Do I follow the same steps with a different checkout name ?

@lhoestq
Copy link
Member

lhoestq commented Dec 1, 2020

If you've followed the instructions in here : https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md#start-by-preparing-your-environment

(especially point 2. and the command git remote add upstream ....)

Then you can try

git checkout master
git fetch upstream
git rebase upstream/master
git checkout -b add-<my-new-dataset-name>

sileod pushed a commit to sileod/datasets that referenced this pull request Dec 7, 2020
* add metrec arabic poetry dataset

* add metrec arabic poetry dataset

* remove prints and add dataset card

* fix conflicts

* fix conflicts

* Update datasets/metrec/README.md

* Update datasets/metrec/README.md

* Update datasets/metrec/README.md

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants