core: use raw sqlalchemy row iterator, speeds up reading from cachew by about 25%

before

```
src/cachew/tests/test_cachew.py::test_many[gc_off-1000000] [INFO    2023-09-14 22:04:14,721 cachew __init__.py:796 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote   1000000 objects to   cachew (sqlite /tmp/pytest-of-karlicos/pytest-90/test_many_gc_off_1000000_0/test_many)
test_many: initial write to cache took 3.2s
test_many: cache size is 72.904704Mb
[INFO    2023-09-14 22:04:15,020 cachew __init__.py:660 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading 1000000 objects from cachew (sqlite /tmp/pytest-of-karlicos/pytest-90/test_many_gc_off_1000000_0/test_many)
test_many: reading from cache took 2.8s
```

after

```
src/cachew/tests/test_cachew.py::test_many[gc_off-1000000] [INFO    2023-09-14 22:04:36,065 cachew __init__.py:796 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: wrote   1000000 objects to   cachew (sqlite /tmp/pytest-of-karlicos/pytest-91/test_many_gc_off_1000000_0/test_many)
test_many: initial write to cache took 3.3s
test_many: cache size is 72.904704Mb
[INFO    2023-09-14 22:04:36,427 cachew __init__.py:660 ] cachew.tests.test_cachew:test_many.<locals>.iter_data: loading 1000000 objects from cachew (sqlite /tmp/pytest-of-karlicos/pytest-91/test_many_gc_off_1000000_0/test_many)
test_many: reading from cache took 2.2s
```
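
For context, the overhead being shaved off here is SQLAlchemy wrapping every result row in a `Row` object. A rough standalone micro-benchmark sketch of that effect — this is not cachew's `test_many`, and the table name `t` and row contents are made up; it assumes SQLAlchemy 1.4+, where `CursorResult` has the private `_raw_row_iterator` method:

```python
import time

import sqlalchemy

# in-memory sqlite, populated with 1M single-column rows
engine = sqlalchemy.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sqlalchemy.text('CREATE TABLE t (blob TEXT)'))
    conn.execute(
        sqlalchemy.text('INSERT INTO t VALUES (:b)'),
        [{'b': 'x' * 50} for _ in range(1_000_000)],
    )

with engine.connect() as conn:
    start = time.time()
    for (blob,) in conn.execute(sqlalchemy.text('SELECT blob FROM t')):
        pass  # each iteration constructs a sqlalchemy Row object
    print(f'Row objects     : {time.time() - start:.2f}s')

    result = conn.execute(sqlalchemy.text('SELECT blob FROM t'))
    start = time.time()
    for (blob,) in result._raw_row_iterator():  # bare dbapi tuples
        pass
    print(f'raw row iterator: {time.time() - start:.2f}s')
```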
karlicoss committed Sep 14, 2023
1 parent 4537d39 commit ad99308
Showing 1 changed file with 21 additions and 3 deletions.
src/cachew/__init__.py
```diff
@@ -618,8 +618,8 @@ def cachew_wrapper(
         else:
             old_hashes = cursor.fetchall()
 
-
         assert len(old_hashes) <= 1, old_hashes  # shouldn't happen
+
         old_hash: Optional[SourceHash]
         if len(old_hashes) == 0:
             old_hash = None
```
```diff
@@ -628,10 +628,28 @@ def cachew_wrapper(
 
         logger.debug('old hash: %s', old_hash)
 
-
         def cached_items():
             rows = conn.execute(table_cache.select())
-            for (blob,) in rows:
+
+            # by default, sqlalchemy wraps all results into a Row object
+            # this can cause quite a lot of overhead if you're reading many rows
+            # it seems that in principle, sqlalchemy supports returning the bare underlying tuple from the dbapi,
+            # but from browsing the code it doesn't seem like this functionality is exposed
+            # if you're looking for cues, see
+            # - ._source_supports_scalars
+            # - ._generate_rows
+            # - ._row_getter
+            # by using this raw iterator we speed up reading the cache quite a bit
+            raw_row_iterator = getattr(rows, '_raw_row_iterator', None)
+            if raw_row_iterator is None:
+                warnings.warn(
+                    "CursorResult._raw_row_iterator method isn't found. This could lead to degraded cache reading performance."
+                )
+                row_iterator = rows
+            else:
+                row_iterator = raw_row_iterator()
+
+            for (blob,) in row_iterator:
                 j = orjson_loads(blob)
                 obj = marshall.load(j)
                 yield obj
```
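
The `getattr` guard above matters because `_raw_row_iterator` is a private SQLAlchemy API that could be renamed or removed in a future release, in which case the code degrades to the old (slower but correct) `Row`-based iteration. A minimal sketch of the same fallback pattern outside of cachew — the `cache` table and its contents here are made up for illustration:

```python
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(sqlalchemy.text('CREATE TABLE cache (blob TEXT)'))
    conn.execute(sqlalchemy.text("INSERT INTO cache VALUES ('a'), ('b')"))

with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text('SELECT blob FROM cache'))
    # prefer bare dbapi tuples; fall back to Row objects if the private API is gone
    raw_row_iterator = getattr(rows, '_raw_row_iterator', None)
    row_iterator = rows if raw_row_iterator is None else raw_row_iterator()
    for (blob,) in row_iterator:
        print(blob)  # single-column unpacking works for both Row objects and raw tuples
```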
