[METRICS, breaking] Refactor caching behavior, pickle/cloudpickle metrics and dataset, add tests on metrics #518
Conversation
(test failure is unrelated)
Awesome !
Bonus test you could add: check that it's not possible to use two concurrent metric instances
Cool ! I think I'll play with it a little bit. I wonder if we can have user experience issues because of timeouts
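The suggested bonus test could be sketched with stdlib-only locking, independent of the actual Metric API (`pick_cache_file` and the naming scheme here are hypothetical stand-ins for the real cache-file search):

```python
import os
import tempfile

def pick_cache_file(data_dir, experiment_id="exp"):
    """Search for a free cache file name, bumping a suffix whenever the
    lock for a candidate name is already taken -- a hypothetical sketch of
    'use a new experiment_id if we can't lock the cache file'."""
    for attempt in range(100):
        name = f"{experiment_id}-{attempt}.arrow"
        lock_path = os.path.join(data_dir, name + ".lock")
        try:
            # O_EXCL: creation fails if another instance holds this slot
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return os.path.join(data_dir, name)
        except FileExistsError:
            continue
    raise RuntimeError("no free cache file found")

# Two concurrent instances sharing a cache dir must get distinct files:
cache_dir = tempfile.mkdtemp()
first = pick_cache_file(cache_dir)
second = pick_cache_file(cache_dir)
assert first != second
```

A real test would additionally hold the lock from a second process rather than leaving the lock files behind, but the collision-avoidance assertion is the same.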
-        if self.hash:
-            builder_data_dir = os.path.join(builder_data_dir, self.hash)
+        builder_data_dir = self._data_dir_root
+        builder_data_dir = os.path.join(builder_data_dir, self.name, self.config_name)
The hash is missing here ?
I removed it. It's not used anymore now that we create hashes on the fly.
Ok ! Maybe you can update the docstring that mentions self.hash then
        else:
            filelocks.append(filelock)

        return file_paths, filelocks

    def finalize(self, timeout=120):
It doesn't match the default 100 sec timeout in _get_all_cache_files.
Maybe add more info about the timeout in the docstring ?
Yes, this is on purpose. When you open a new file you want to quickly find an open file to use.
When you gather all the distributed files in a distributed setup you want to give the nodes time to finish their work before failing.
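The two timeouts serve different purposes, as explained above. A stdlib-only sketch of the underlying pattern (`try_acquire` is a hypothetical stand-in for the filelock calls, not the library's API):

```python
import os
import tempfile
import time

def try_acquire(lock_path, timeout, poll=0.05):
    """Create the lock file exclusively, retrying until `timeout` seconds.
    A short timeout suits 'find me a free cache file quickly'; a long one
    suits 'wait for all distributed nodes to finish writing'."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists
            return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {lock_path!r}")
            time.sleep(poll)

lock = os.path.join(tempfile.mkdtemp(), "cache-0.arrow.lock")
fd = try_acquire(lock, timeout=1)  # lock is free: acquired immediately
os.close(fd)
```

With this shape, the writer path would call `try_acquire(..., timeout=short)` and move on to another candidate file on failure, while `finalize` would call it with the long timeout and only fail after the nodes had a real chance to finish.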
I'll add some info in the docstrings
Oh I read too quickly, indeed 100 and 120 don't match. I'll update that.
            file_paths = [self.cache_file_name]
        else:
            file_paths = [
                os.path.join(self.data_dir, f"{self.experiment_id}-{self.num_process}-{process_id}.arrow")
What about the files with file_uuid, like this?

    file_path = os.path.join(
        self.data_dir, f"{self.experiment_id}-{file_uuid}-{self.num_process}-{self.process_id}.arrow"
    )
They are only used for single-process metrics (the previous branches of the if-else).
I'll add a comment.
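The distributed naming pattern discussed here can be illustrated with a small sketch (`distributed_cache_files` is a hypothetical helper mirroring the f-string in the snippet above, not a function from the library):

```python
import os

def distributed_cache_files(data_dir, experiment_id, num_process):
    """Cache file expected from each node of a distributed run, following
    the f"{experiment_id}-{num_process}-{process_id}.arrow" pattern shown
    in the diff. (Per the reply, single-process metrics use uuid-suffixed
    names instead, so no uuid appears in this list.)"""
    return [
        os.path.join(data_dir, f"{experiment_id}-{num_process}-{process_id}.arrow")
        for process_id in range(num_process)
    ]

paths = distributed_cache_files("cache", "exp1", 2)
```

Because every node can derive the same list from `experiment_id` and `num_process` alone, the final sync knows exactly which files to wait for.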
As discussed with @thomwolf, merging since the hyperparameter-search has been merged in transformers.
Move the acquisition of the filelock to a later stage during metrics processing so that a metric can be pickled/cloudpickled after instantiation.
Also add tests on pickling, on concurrent but separate metric instances, and on concurrent and distributed metric instances.
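Why deferring the lock makes pickling work can be shown with a minimal stdlib sketch (the two classes are hypothetical illustrations, not the library's Metric; a `threading.Lock` stands in for a held filelock):

```python
import pickle
import threading

class EagerLockMetric:
    """Holds a live lock object from instantiation onward --
    such an instance cannot be pickled."""
    def __init__(self):
        self.filelock = threading.Lock()  # stand-in for a lock acquired in __init__

class LazyLockMetric:
    """Stores only the lock *path*; the lock itself is acquired later,
    at compute time, so the instance stays picklable/cloudpicklable."""
    def __init__(self, lock_path="metric.arrow.lock"):
        self.lock_path = lock_path

try:
    pickle.dumps(EagerLockMetric())
    eager_ok = True
except TypeError:  # "cannot pickle '_thread.lock' object"
    eager_ok = False

lazy_ok = isinstance(pickle.dumps(LazyLockMetric()), bytes)
print(eager_ok, lazy_ok)  # -> False True
```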
Changes significantly the caching behavior for the metrics:
- Use a new experiment_id if we can't lock the cache file; this allows to use several instances of the same metric in parallel.
- Distributed metrics don't change the experiment_id if we can't lock the cache file (because all the nodes need to have related cache file names for the final sync).

Breaking: Some arguments for Metrics initialization have been removed for simplicity (version, ...) and some have been renamed for consistency with the rest of the library (in_memory => keep_in_memory).

Also remove the _has_transformers detection in utils to avoid importing transformers every time during loading.