-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add summarization template #2529
Add summarization template #2529
Conversation
tests/test_arrow_dataset.py
Outdated
"dummy": ["123456"], | ||
} | ||
# Test we can load from task name | ||
with Dataset.from_dict(data, info=info) as dset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that whatever magic @lhoestq did with windows now allows me to avoid using the annoying context manager that we have in many unit tests:
with tempfile.TemporaryDirectory() as tmp_dir, Dataset.from_dict(data) as dset:
with self._to(in_memory, tmp_dir, dset) as dset:
# do operations on `dset`
i never really understood why we had to do this so please correct me if it's actually needed 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The with
statement are used to properly close open arrow files on windows (it really doesn't like keeping things open).
What you mention though is the call to self._to
call that moves a dataset on disk or in memory, which can be useful in some tests. Here you don't need to do this indeed.
Since the in_memory
parameter is not used in this test, you can move this test outside of the BaseDatasetTest
class and have it as a regular pytest test case.
This way it won't be run twice (once for in_memory=True
and once for in_memory=False
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thanks !
Could you just move the test outside of the BaseDatasetTest class please ? Otherwise it will unnecessarily be run twice.
tests/test_arrow_dataset.py
Outdated
"dummy": ["123456"], | ||
} | ||
# Test we can load from task name | ||
with Dataset.from_dict(data, info=info) as dset: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The with
statement are used to properly close open arrow files on windows (it really doesn't like keeping things open).
What you mention though is the call to self._to
call that moves a dataset on disk or in memory, which can be useful in some tests. Here you don't need to do this indeed.
Since the in_memory
parameter is not used in this test, you can move this test outside of the BaseDatasetTest
class and have it as a regular pytest test case.
This way it won't be run twice (once for in_memory=True
and once for in_memory=False
)
sure, on it! thanks for the explanations about the |
@lhoestq i've moved all the task template tests outside of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks all good, thanks :)
This PR adds a task template for text summarization. As far as I can tell, we do not need to distinguish between "extractive" or "abstractive" summarization - both can be handled with this template.
Usage: