core: use raw sqlalchemy row iterator, speeds up reading from cachew by about 25%

before

```
src/cachew/tests/test_cachew.py::test_many[gc_off-1000000] [INFO    2023-09-14 22:04:14,721 cachew __init__.py:796 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote   1000000 objects to   cachew (sqlite /tmp/pytest-of-karlicos/pytest-90/test_many_gc_off_1000000_0/test_many)
test_many: initial write to cache took 3.2s
test_many: cache size is 72.904704Mb
[INFO    2023-09-14 22:04:15,020 cachew __init__.py:660 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading 1000000 objects from cachew (sqlite /tmp/pytest-of-karlicos/pytest-90/test_many_gc_off_1000000_0/test_many)
test_many: reading from cache took 2.8s
```

after

```
src/cachew/tests/test_cachew.py::test_many[gc_off-1000000] [INFO    2023-09-14 22:04:36,065 cachew __init__.py:796 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote   1000000 objects to   cachew (sqlite /tmp/pytest-of-karlicos/pytest-91/test_many_gc_off_1000000_0/test_many)
test_many: initial write to cache took 3.3s
test_many: cache size is 72.904704Mb
[INFO    2023-09-14 22:04:36,427 cachew __init__.py:660 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading 1000000 objects from cachew (sqlite /tmp/pytest-of-karlicos/pytest-91/test_many_gc_off_1000000_0/test_many)
test_many: reading from cache took 2.2s
```
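
For context, the overhead being shaved off here is SQLAlchemy wrapping every result row in a `Row` object. A rough standalone micro-benchmark sketch of that effect — this is not cachew's `test_many`, and the table name `t` and row contents are made up; it assumes SQLAlchemy 1.4+, where `CursorResult` has the private `_raw_row_iterator` method:

```python
import time

import sqlalchemy

# in-memory sqlite, populated with 1M single-column rows
engine = sqlalchemy.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sqlalchemy.text('CREATE TABLE t (blob TEXT)'))
    conn.execute(
        sqlalchemy.text('INSERT INTO t VALUES (:b)'),
        [{'b': 'x' * 50} for _ in range(1_000_000)],
    )

with engine.connect() as conn:
    start = time.time()
    for (blob,) in conn.execute(sqlalchemy.text('SELECT blob FROM t')):
        pass  # each iteration constructs a sqlalchemy Row object
    print(f'Row objects     : {time.time() - start:.2f}s')

    result = conn.execute(sqlalchemy.text('SELECT blob FROM t'))
    start = time.time()
    for (blob,) in result._raw_row_iterator():  # bare dbapi tuples
        pass
    print(f'raw row iterator: {time.time() - start:.2f}s')
```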
karlicoss committed Sep 14, 2023
1 parent 4537d39 commit ad99308
Showing 1 changed file with 21 additions and 3 deletions.
src/cachew/__init__.py
```diff
@@ -618,8 +618,8 @@ def cachew_wrapper(
         else:
             old_hashes = cursor.fetchall()
 
-
         assert len(old_hashes) <= 1, old_hashes  # shouldn't happen
+
         old_hash: Optional[SourceHash]
         if len(old_hashes) == 0:
             old_hash = None
```
```diff
@@ -628,10 +628,28 @@ def cachew_wrapper(
 
         logger.debug('old hash: %s', old_hash)
 
-
         def cached_items():
             rows = conn.execute(table_cache.select())
-            for (blob,) in rows:
+
+            # by default, sqlalchemy wraps all results into a Row object
+            # this can cause quite a lot of overhead if you're reading many rows
+            # it seems that in principle, sqlalchemy supports returning the bare underlying tuple from the dbapi,
+            # but from browsing the code it doesn't seem like this functionality is exposed
+            # if you're looking for cues, see
+            # - ._source_supports_scalars
+            # - ._generate_rows
+            # - ._row_getter
+            # by using this raw iterator we speed up reading the cache quite a bit
+            raw_row_iterator = getattr(rows, '_raw_row_iterator', None)
+            if raw_row_iterator is None:
+                warnings.warn(
+                    "CursorResult._raw_row_iterator method isn't found. This could lead to degraded cache reading performance."
+                )
+                row_iterator = rows
+            else:
+                row_iterator = raw_row_iterator()
+
+            for (blob,) in row_iterator:
                 j = orjson_loads(blob)
                 obj = marshall.load(j)
                 yield obj
```
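
The `getattr` guard above matters because `_raw_row_iterator` is a private SQLAlchemy API that could be renamed or removed in a future release, in which case the code degrades to the old (slower but correct) `Row`-based iteration. A minimal sketch of the same fallback pattern outside of cachew — the `cache` table and its contents here are made up for illustration:

```python
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sqlalchemy.text('CREATE TABLE cache (blob TEXT)'))
    conn.execute(sqlalchemy.text("INSERT INTO cache VALUES ('a'), ('b')"))

with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text('SELECT blob FROM cache'))
    # prefer bare dbapi tuples; fall back to Row objects if the private API is gone
    raw_row_iterator = getattr(rows, '_raw_row_iterator', None)
    row_iterator = rows if raw_row_iterator is None else raw_row_iterator()
    for (blob,) in row_iterator:
        print(blob)  # single-column unpacking works for both Row objects and raw tuples
```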
