Profiling data of Scheduler.update_graph for very large graph #7998

Open
fjetter opened this issue Jul 12, 2023 · 2 comments
Labels: discussion (Discussing a topic with no specific actions yet), performance

fjetter commented Jul 12, 2023

I recently had the pleasure of seeing how the scheduler reacts to a very large graph. Not too well.

I submitted a graph with a couple of million tasks. Locally it looks like 2.5MM tasks, but the scheduler later reports fewer. Anyhow, it's seven digits. update_graph ran for about 5 min, blocking the event loop for that entire time (#7980).

What is eating up the most time:

Function | value
--- | ---
particularly this check for __main__ in dumps result | 5%
stringify | 12%
key_split | 12%
unpack_remotedata | 12%
generate_taskstate | 20%
dask.order | 12%
transitioning all tasks | 10%
other foo (e.g. walking the graph for deps and such) | 17%

It also looks like the TaskState objects and all the foo attached to them are taking up about 65% of the memory, which in this case is about 82 GiB. Assuming we're at 2MM tasks, that's roundabout 40 KB per TaskState. That's quite a lot.
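For context, here is a hedged sketch (hypothetical shapes and chunk sizes, not the actual workload behind this profile) of a dask.array expression whose low-level graph reaches a few million tasks:

# Hypothetical reproduction sketch, not the workload profiled above.
import dask.array as da
from distributed import Client

client = Client()  # assumes a scheduler is already running

# (40_000 / 25) ** 2 = 2_560_000 chunks for the random array alone; the
# transpose, add and reduction push the total task count well past that.
x = da.random.random((40_000, 40_000), chunks=(25, 25))
result = (x + x.T).mean().compute()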

scheduler-profile.zip

Nothing to do here, this is purely informational.

fjetter added the performance and discussion labels Jul 12, 2023
fjetter closed this as not planned Aug 3, 2023

fjetter commented Aug 14, 2023

I did some memory profiling of the scheduler [1] based on 145c13a

I submitted a large array workload with about 1.5MM tasks. The scheduler requires about 6GB of RAM to hold the computation state in memory. The peak is a bit larger since there is some intermediate state required (mostly for dask.order).

[figure: scheduler memory usage over time, plateauing at roughly 6 GB]

Once the plateau of this graph is reached, computation starts and the memory usage breaks down roughly as

5836 MiB Total
├── 290 MiB Raw graph -> (This is only part of the raw deserialized graph. The original one is about 680MiB)
├── 1945 MiB Materialized Graph
│   ├── 584 MiB Stringification of values of dsk [2]
│   ├── 132 MiB Stringification of keys [3]
│   ├── 624 MiB Stringification of dependencies [4]
│   ├── 475 MiB dumps_task (removed on main)
│   └── 130 MiB Other
├── 3379 MiB generate_taskstates
│   ├── 377 MiB key_split (group keys)
│   ├── 80 MiB actual tasks dict
│   └── 2970 MiB TaskState object
│       ├── 347 MiB slots (actual object space)
│       ├── 112 MiB weakref instances on self
│       ├── 110 MiB key_split_group
│       └── 2355 MiB various sets in TaskState (profiler points to waiting_on)
└── 222 MiB Other
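(As an aside, a generic way to get a rough allocation attribution like the above is to run the standard-library tracemalloc on the scheduler via Client.run_on_scheduler. This is only a sketch and not the profiler used for [1]; the array shape below is illustrative.)

# Generic sketch: trace allocations on the scheduler while a large graph is
# submitted, then fetch the top allocation sites.
import dask.array as da
from distributed import Client, wait

client = Client()  # assumes a running scheduler


def start_trace(dask_scheduler=None):
    import tracemalloc

    tracemalloc.start(25)  # keep up to 25 frames per allocation


def top_allocations(dask_scheduler=None):
    import tracemalloc

    snapshot = tracemalloc.take_snapshot()
    return [str(stat) for stat in snapshot.statistics("lineno")[:10]]


client.run_on_scheduler(start_trace)

x = da.random.random((30_000, 30_000), chunks=(50, 50))  # 360_000 chunks
fut = client.compute((x + x.T).sum())
wait(fut)

print("\n".join(client.run_on_scheduler(top_allocations)))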

The two big contributions worth discussing are the TaskState objects, which allocate more than 2 GiB, and the graph materialization.

The tracing for the TaskState object is a little fuzzy (possibly because it is using slots?) but it largely points to the usage of sets in TaskState. Indeed, empty sets allocate a relatively large amount of memory: with 9 sets and 3 dictionaries we're already at a lower bound of 2.09 KiB per TaskState

# Python 3.10.11
import sys

from dask.utils import format_bytes

# Lower bound for the empty containers on a single TaskState: 9 sets + 3 dicts
format_bytes(9 * sys.getsizeof(set()) + 3 * sys.getsizeof(dict()))
# Out: '2.09 kiB'

which adds up to almost 3 GiB for 1.5MM tasks from the empty containers alone. The actual memory use is even lower than this lower-bound calculation suggests (not sure what went wrong here...).

The other large contribution is the stringification of keys. Stringify does not cache/deduplicate str values, nor is the Python interpreter able to intern our keys (afaik, only possible w/ ASCII chars), so every call to stringify effectively allocates new memory.
While the actual stringified keys only take 132 MiB in this example, the lack of deduplication blows this up to much more.
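One direction for the deduplication part (a hedged sketch; plain str() stands in for the real stringify, and stringify_cached is a hypothetical helper) is to memoize the results so that a key referenced by many tasks is retained only once:

# Sketch only: memoize stringified keys so that a dependency referenced by
# thousands of tasks is retained once instead of once per reference.
_interned: dict[str, str] = {}


def stringify_cached(key) -> str:
    s = str(key)  # stand-in for the real stringify()
    # setdefault returns the first str seen for this value; later duplicates
    # become garbage instead of being retained in the graph structures.
    return _interned.setdefault(s, s)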

This suggests that we should either remove or rework stringification and possibly consider a slimmer representation of our TaskState object.
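On the TaskState side, one illustrative direction (these classes are hypothetical, not distributed's TaskState) is to allocate the per-task sets lazily, so that a set which is never touched costs only a pointer in its slot:

import sys


class EagerTaskState:
    # Every instance eagerly owns its (initially empty) sets.
    __slots__ = ("waiting_on", "waiters", "dependents")

    def __init__(self):
        self.waiting_on = set()
        self.waiters = set()
        self.dependents = set()


class LazyTaskState:
    # Sets are created on first access; an untouched slot holds only None.
    __slots__ = ("_waiting_on", "_waiters", "_dependents")

    def __init__(self):
        self._waiting_on = None
        self._waiters = None
        self._dependents = None

    @property
    def waiting_on(self) -> set:
        if self._waiting_on is None:
            self._waiting_on = set()
        return self._waiting_on


def shallow_size(obj) -> int:
    # Object header plus any non-None slot values (rough, shallow estimate).
    return sys.getsizeof(obj) + sum(
        sys.getsizeof(getattr(obj, slot))
        for slot in obj.__slots__
        if getattr(obj, slot) is not None
    )


print(shallow_size(EagerTaskState()))  # several hundred bytes: three empty sets
print(shallow_size(LazyTaskState()))   # just the slotted object itself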


[1] scheduler_memory_profile.html.zip
[2]

new_dsk[new_k] = stringify(v, exclusive=exclusive)

[3]
new_k = stringify(k)

[4]
stringify(k): {stringify(dep) for dep in deps}

fjetter reopened this Aug 14, 2023

fjetter commented Aug 14, 2023

Note that the above graph wasn't using any annotations. Annotations will add one more stringification for annotated keys.
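For illustration, a hedged sketch of an annotated submission (the annotation values are arbitrary):

# Layers created inside dask.annotate() carry annotations; per the comment
# above, each annotated key goes through one extra stringification.
import dask
import dask.array as da

with dask.annotate(priority=10, retries=2):
    x = da.random.random((10_000, 10_000), chunks=(100, 100))

result = (x + 1).sum()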
