performance: about 5x speedup for inserting into the cache
It seems that the `.values()` clause is pretty slow compared to passing the rows directly to the `.execute` method instead.

Didn't dig too much into why exactly, but the docs give a clue here:

https://docs.sqlalchemy.org/en/14/core/dml.html#sqlalchemy.sql.expression.Insert.values

> It is essential to note that passing multiple values is NOT the same as using traditional executemany() form. The above syntax is a special syntax not typically used.
> To emit an INSERT statement against multiple rows, the normal method is to pass a multiple values list to the Connection.execute() method, which is supported by all database backends and is generally more efficient for a very large number of parameters.
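
To make the contrast concrete, here's a minimal standalone sketch (not from this commit -- the toy table, SQLite URL and row count are made up) of the two insert styles the docs are contrasting, using SQLAlchemy 1.4 Core:

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()
cache = Table('cache', metadata, Column('a', Integer), Column('b', Integer))
metadata.create_all(engine)

rows = [{'a': i, 'b': i * 2} for i in range(300)]

with engine.begin() as conn:
    # slow path: .values() builds a single giant multi-VALUES INSERT with one
    # bound parameter per cell, so statement construction grows with the row count
    conn.execute(cache.insert().values(rows))

with engine.begin() as conn:
    # fast path: keep the statement fixed and pass the parameter list separately,
    # which goes through the driver's executemany() path
    conn.execute(cache.insert(), rows)
```

Both forms insert the same rows; per the docs quoted above, only the second one is expected to hold up for a very large number of parameters.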

Benchmark:

running `taskset -c 7 pytest -rap --durations=0 -s src/cachew/tests/test_cachew.py -k 'test_many[500000-False]'`

on @karlicoss's desktop computer. Using taskset to pin the process to a single core just in case, although that shouldn't matter (note that there is still some jitter, not sure what causes it, might be page faults or something).

baseline -- running without cache at all (@Cachew decorator commented out):

  0.86s
  0.82s
  0.83s
  0.82s
  0.84s

before the change:

  23.29s
  24.44s
  25.25s
  24.41s
  24.46s

after the change:

  5.90s
  5.83s
  5.75s
  5.72s
  5.87s
karlicoss committed Apr 11, 2022
1 parent df69da6 commit c54e1ed
11 changes: 9 additions & 2 deletions src/cachew/__init__.py
@@ -985,13 +985,20 @@ def cachew_wrapper(
             # at this point we're guaranteed to have an exclusive write transaction

             datas = func(*args, **kwargs)
+            column_names = [c.name for c in table_cache_tmp.columns]
+            insert_into_table_cache_tmp = table_cache_tmp.insert()

             chunk: List[Any] = []
             def flush() -> None:
                 nonlocal chunk
                 if len(chunk) > 0:
-                    # pylint: disable=no-value-for-parameter
-                    conn.execute(table_cache_tmp.insert().values(chunk))
+                    # TODO hmm, it really doesn't work unless you zip into a dict first
+                    # maybe should return dicts from binder instead then?
+                    chunk_dict = [
+                        dict(zip(column_names, row))
+                        for row in chunk
+                    ]
+                    conn.execute(insert_into_table_cache_tmp, chunk_dict)
                 chunk = []

             for d in datas:
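
For context on the zip-into-dict step: the rows buffered in `chunk` are plain sequences of column values, while the executemany form of `Connection.execute()` expects a list of dicts keyed by column name, hence the conversion right before executing. A toy illustration (the column names and rows are made up, not from the repo):

```python
# hypothetical column names and rows, purely for illustration
column_names = ['id', 'name', 'value']
chunk = [
    (1, 'foo', 0.5),
    (2, 'bar', 1.5),
]

# each tuple row gets zipped with the column names into a dict,
# which is the parameter format the executemany-style execute() accepts
chunk_dict = [dict(zip(column_names, row)) for row in chunk]
assert chunk_dict[0] == {'id': 1, 'name': 'foo', 'value': 0.5}
```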
