
Remove dumps_task #8067

Merged · 5 commits merged into dask:main on Aug 11, 2023
Conversation

@fjetter (Member) commented on Aug 3, 2023

This is a tangent to #8049.

I noticed that dumps_task is a surprisingly expensive operation (about 12% of runtime in #7998).

It is also a significant driver of complexity, and I believe it is no longer necessary now that pickle is used on the scheduler.

This PR explores what actually relies on this behavior and how much complexity we can remove by removing dumps_task.
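For readers without context, the shape of the change is roughly the following. This is an illustrative sketch, not the PR's code; dumps_task_like and to_wire are hypothetical stand-ins:

```python
import pickle

# Before (sketch): the scheduler eagerly serialized each task spec into
# per-field bytes via dumps_task before shipping it to workers.
def dumps_task_like(func, args, kwargs):
    return {
        "function": pickle.dumps(func),
        "args": pickle.dumps(args),
        "kwargs": pickle.dumps(kwargs),
    }

# After (sketch): the raw spec is handed to the comm layer, which pickles
# the whole message once on write, so no separate dumps_task pass is needed.
def to_wire(func, args, kwargs):
    return pickle.dumps({"function": func, "args": args, "kwargs": kwargs})
```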

github-actions bot (Contributor) commented on Aug 3, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

20 files ±0 · 20 suites ±0 · 11h 35m 47s ⏱️ +39m 57s
3 757 tests +5: 3 645 ✔️ ±0 · 106 💤 +2 · 5 failed +3 · 1 🔥 ±0
36 343 runs −134: 34 585 ✔️ −223 · 1 750 💤 +85 · 7 failed +4 · 1 🔥 ±0

For more details on these failures and errors, see this check.

Results for commit 43b261f. ± Comparison against base commit ef6d4bf.

This pull request removes 12 and adds 17 tests. Note that renamed tests count towards both.
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-False-deserialize_task]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-False-execute]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-True-deserialize_task]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-True-execute]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-False-deserialize_task]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-False-execute]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-True-deserialize_task]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-True-execute]
distributed.tests.test_scheduler ‑ test_dumps_task
distributed.tests.test_worker ‑ test_gather_missing_workers_replicated[True]
…
distributed.protocol.tests.test_numpy
distributed.shuffle.tests.test_rechunk
distributed.shuffle.tests.test_shuffle ‑ test_restarting_does_not_deadlock
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-False]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[executing-True]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-False]
distributed.tests.test_cancelled_state ‑ test_execute_preamble_early_cancel[resumed-True]
distributed.tests.test_client ‑ test_gather_race_vs_AMM[False]
distributed.tests.test_client ‑ test_gather_race_vs_AMM[True]
distributed.tests.test_utils_comm ‑ test_gather_from_workers_busy
…

♻️ This comment has been updated with the latest results.

Comment on lines -362 to -377:

```python
class Refcount:
    "Track how many instances of this class exist; logs the count at creation and deletion"

    count = 0
    lock = dask.utils.SerializableLock()
    log = []

    def __init__(self):
        with self.lock:
            type(self).count += 1
            self.log.append(self.count)

    def __del__(self):
        with self.lock:
            self.log.append(self.count)
            type(self).count -= 1
```
@fjetter (Member, Author) commented:

This test is interesting. This PR does not change anything in terms of scheduling, ordering, etc., but this test still fails quite reliably. It seems as if Refcount relies on explicit garbage collection. This is something I want to look into a little more, since we've been seeing a lot of GC warnings recently. However, for the sake of this PR I rewrote it to count keys in data instead of relying on GC. Eventually, I think both tests would make sense.
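A minimal sketch of the GC-independent style of assertion described above, assuming the usual distributed test utilities; the test name and body are hypothetical, not the PR's rewritten test:

```python
from distributed.utils_test import gen_cluster


@gen_cluster(client=True)
async def test_count_keys_in_data(c, s, a, b):
    # Count what the workers actually hold instead of inferring liveness
    # from __del__, which depends on garbage-collection timing.
    futures = c.map(lambda x: x + 1, range(10))
    await c.gather(futures)
    assert len(a.data) + len(b.data) == len(futures)
```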

@fjetter (Member, Author) commented:

This is a really weird case and is somehow connected to how this object is defined in a local context.
I looked pretty closely but I cannot find any cyclic references. In fact, I see fewer objects tracked by the GC than this counter suggests. I know that CPython guarantees `__del__` is called at most once, but I believe there are caveats about when (and whether) it is called.

@fjetter fjetter marked this pull request as ready for review August 9, 2023 15:06
@fjetter fjetter changed the title from "WIP Remove dumps_task" to "Remove dumps_task" on Aug 9, 2023
@fjetter (Member, Author) commented on Aug 9, 2023

Tests look good, and considering the large reduction in complexity, I suggest moving forward unless benchmarking raises a red flag (A/B tests are currently running; manual tests haven't shown any anomalies).

@fjetter (Member, Author) commented on Aug 10, 2023

Well, benchmarks are happy, mostly

https://github.com/coiled/benchmarks/suites/14979719727/artifacts/854911855

Some rather common operations are 20-30% faster! Some tests (primarily the parquet tests) are slightly negatively impacted. I suspect this is because we're no longer caching parts of the deserialization, but I haven't verified this.

Wall clock:
[image: wall-clock benchmark comparison]

Average memory also looks good (this is interesting...)

[image: average memory benchmark comparison]

We do see a couple of jumps in peak memory usage

[image: peak memory benchmark comparison]

I suspect that the memory changes are more or less an artifact of subtle timing changes but I haven't verified.

The very large outlier in wall time is test_single_future, which I strongly suspect suffers from the removed cache. The absolute change is minimal, but the relative one is large.

@fjetter (Member, Author) commented on Aug 10, 2023

Thinking about these results for a moment, I suspect the improved runtime is mostly from the removal of the deserialization step.

```python
try:
    function, args, kwargs = await self._maybe_deserialize_task(ts)
except Exception as exc:
    logger.error("Could not deserialize task %s", key, exc_info=True)
    return ExecuteFailureEvent.from_exception(
        exc,
        key=key,
        run_id=run_id,
        stimulus_id=f"run-spec-deserialize-failed-{time()}",
    )
```

While deserializing tasks, we were basically already blocking a task slot on the state machine even though the thread pool was idling. (I was recently also thinking about "oversubscribing" the state machine, i.e. state_machine.nthreads > TPE.max_workers, to create some pressure and keep the TPE busy; a very different ticket, of course.)
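The structure being described can be sketched as follows; this is illustrative only, and the names and signatures are assumptions rather than distributed's actual API:

```python
import asyncio
import pickle
from concurrent.futures import ThreadPoolExecutor

tpe = ThreadPoolExecutor(max_workers=2)


async def execute(spec_bytes: bytes):
    # From here on, this coroutine occupies one of the state machine's task
    # slots, yet the thread pool receives no work until deserialization is
    # done. Admitting more such coroutines than max_workers (the
    # "oversubscription" idea above) would let one task's deserialization
    # overlap another task's execution.
    func, args = pickle.loads(spec_bytes)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(tpe, func, *args)
```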

@hendrikmakait hendrikmakait self-requested a review August 10, 2023 12:53
@fjetter (Member, Author) commented on Aug 10, 2023

cc @madsbk this may also interest you? Not directly related to #8083 but kind of

@hendrikmakait (Member) left a comment:

Thanks, @fjetter! I <3 the reduction in complexity.

```python
assert not function and not args and not kwargs
function = execute_task
args = (task,)


def _normalize_task(task: Any) -> T_runspec:
```
@hendrikmakait (Member) commented:

nit: It feels like this should live in utils (and maybe be public?) instead of worker given that we use it on the scheduler as well.

@fjetter (Member, Author) replied:

I don't want this to be public. This is merely a translation layer between the garbage outside and the clean representation within.

Eventually this should become dask/dask#9969
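A rough sketch of such a translation layer, under the assumption that it unpacks dask's tuple tasks into a single (function, args, kwargs) form and otherwise wraps the task for execute_task as in the snippet above; this is not the PR's actual implementation:

```python
from dask.core import istask


def execute_task(task):
    # Evaluate nested (func, arg, ...) task tuples recursively.
    if istask(task):
        func, args = task[0], task[1:]
        return func(*map(execute_task, args))
    return task


def normalize_task(task):
    # Plain (func, arg, ...) tuple with only simple arguments: unpack it.
    if istask(task) and not any(istask(a) for a in task[1:]):
        return task[0], task[1:], {}
    # Anything else (nested tasks, raw data): wrap for recursive evaluation.
    return execute_task, (task,), {}
```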

@hendrikmakait (Member) replied:

Makes sense. I guess it being public in a private module might be preferable, but that's nit-picking.

```python
elif not isinstance(self.run_spec, SerializedTask):
    self.run_spec = SerializedTask(task=self.run_spec)
if isinstance(self.run_spec, ToPickle):
    # FIXME Sometimes the protocol is not unpacking this
```
@hendrikmakait (Member) commented:

Should we create an issue for this?

@fjetter (Member, Author) replied:

We can, but this is very low priority.
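For context, my reading of the FIXME branch above (an assumption, not something confirmed in the thread): the comm protocol is supposed to unwrap ToPickle on receipt, so the state machine unwraps it defensively when the wrapper leaks through:

```python
from distributed.protocol.serialize import ToPickle

run_spec = ToPickle(("inc", 1))  # wrapper the comm layer should have stripped
if isinstance(run_spec, ToPickle):
    run_spec = run_spec.data  # defensive manual unwrap
```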

distributed/worker_state_machine.py (review thread resolved)
Co-authored-by: Hendrik Makait <hendrik.makait@gmail.com>
@hendrikmakait hendrikmakait merged commit 4f30abc into dask:main Aug 11, 2023
19 of 25 checks passed
@fjetter fjetter deleted the remove_dumps_task branch August 12, 2023 13:04
```python
cache_loads: LRU[bytes, Callable[..., Any]] = LRU(maxsize=100)


def loads_function(bytes_object):
```
A project member commented:

cc @madsbk - It looks like we were using this function in dask-cuda (rapidsai/dask-cuda#1219)

A contributor replied:

Yes, we use it for its caching feature, but I don't think it is needed.
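The caching feature in question, sketched from the snippet above; the body is reconstructed from memory of the pattern and may differ from the removed code (a plain dict stands in for the LRU):

```python
import pickle

cache_loads: dict = {}  # stand-in for the LRU cache shown above


def loads_function(bytes_object):
    """Deserialize a function, memoizing on the pickled bytes."""
    try:
        return cache_loads[bytes_object]
    except KeyError:
        result = pickle.loads(bytes_object)
        cache_loads[bytes_object] = result
        return result
```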

wence- added a commit to wence-/dask-cuda that referenced this pull request Sep 27, 2023
In versions of distributed after dask/distributed#8067 but before
dask/distributed#8216, we must patch protocol.loads to include the
same decompression fix.
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Sep 27, 2023
In versions of distributed after dask/distributed#8067 but before dask/distributed#8216, we must patch protocol.loads to include the same decompression fix.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1247
Labels: none yet · Projects: none yet · 4 participants