Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concat only unique fields in DatasetInfo.from_merge #2163

Merged
merged 2 commits into from
Apr 6, 2021

Conversation

mariosasko
Copy link
Collaborator

I thought someone from the community with less experience would be interested in fixing this issue, but that wasn't the case.

Fixes #2103

@bhavitvyamalik
Copy link
Contributor

Hi @mariosasko,
Just came across this PR and I was wondering if we can use
description = "\n\n".join(OrderedDict.fromkeys([info.description for info in dataset_infos]))

This will obviate the need for unique and is almost as fast as set. We could have used dict inplace of OrderedDict but it's available 3.7+ onwards

@mariosasko
Copy link
Collaborator Author

mariosasko commented Apr 4, 2021

Hi,

let's see what @lhoestq thinks. Although my approach adds more code, it's more readable IMO.

@bhavitvyamalik
Copy link
Contributor

Yeah, that's true. Your approach is more readable.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

Small preference on this definition on unique since it's way more readable that using OrderedDict.

@lhoestq lhoestq merged commit c9c8c7c into huggingface:master Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

citation, homepage, and license fields of dataset_info.json are duplicated many times
3 participants