Improve Realtime Continuous Aggregate performance #5261

fabriziomello · 2023-01-31T20:58:54Z

When calling the `cagg_watermark` function to get the watermark of a
Continuous Aggregate, we execute a `SELECT MAX(time_dimension)` query
on the underlying materialization hypertable.

The problem is that a `SELECT MAX(time_dimension)` query can be
expensive because it scans all hypertable chunks, which increases the
planning time for Realtime Continuous Aggregates.

This PR improves that by creating a new catalog table that serves as a
cache for the current Continuous Aggregate watermark. The cached value
is stored in the following situations:

- Create CAgg: store the minimum value of the hypertable time dimension
  data type;
- Refresh CAgg: store the last value of the time dimension materialized
  in the underlying materialization hypertable (or the minimum value of
  the materialization hypertable time dimension data type if there's no
  data materialized);
- Drop CAgg Chunks: the same as Refresh CAgg.
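
The three update points listed above can be sketched as a toy model. This is only an illustration in Python, not the C implementation in this PR: the class `WatermarkCache`, its method names, the integer timestamps and the `TIME_MIN` constant are all hypothetical, and the real catalog table layout is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Assumption: stands in for "the minimum value of the time dimension data type".
TIME_MIN = -(2**63)


@dataclass
class WatermarkCache:
    """Toy stand-in for a catalog table: materialization hypertable id -> watermark."""

    rows: Dict[int, int] = field(default_factory=dict)

    def on_create(self, mat_ht_id: int) -> None:
        # Create CAgg: nothing is materialized yet, so store the type minimum.
        self.rows[mat_ht_id] = TIME_MIN

    def on_refresh(self, mat_ht_id: int, materialized_times: Iterable[int]) -> None:
        # Refresh CAgg: store the last materialized time value,
        # or the type minimum when no data is materialized.
        self.rows[mat_ht_id] = max(materialized_times, default=TIME_MIN)

    def on_drop_chunks(self, mat_ht_id: int, remaining_times: Iterable[int]) -> None:
        # Drop CAgg chunks: same rule as refresh, applied to what is left.
        self.on_refresh(mat_ht_id, remaining_times)

    def get(self, mat_ht_id: int) -> int:
        # Reading the watermark is a single lookup instead of a MAX() scan.
        return self.rows[mat_ht_id]


if __name__ == "__main__":
    cache = WatermarkCache()
    cache.on_create(1)
    cache.on_refresh(1, [10, 30, 20])
    print(cache.get(1))
```

The point of the design is that the expensive scan happens only when the materialized data changes, while reads at planning time become a cheap lookup.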

Closes #4699, #5307

fabriziomello · 2023-03-01T22:10:16Z

Some preliminary results:

1. Last TimescaleDB release: 2.10.0

tsbench on  main [?] via tsbench took 12s 
➜ python -m src.tsbench --with-connection pgsql://fabrizio@localhost:5432/2.10.0 --benchmarks cagg_watermark              
*** Processing benchmarks on connection: pgsql://fabrizio@localhost:5432/2.10.0
*** Executing benchmark 'cagg_watermark'
*** All benchmarks are executed - done
============================================
Report for benchmark suite 'cagg_watermark'
+--------------------------------------------------------------------------------------+------------------------------------------+
| Query                                                                                | 8b549b08e28d121946eccbc7452bc476c7754dc8 |
+--------------------------------------------------------------------------------------+------------------------------------------+
| SELECT bucket, a, value FROM agg_1m  WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                     2.03 |
| SELECT bucket, a, value FROM agg_5m  WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                     8.89 |
| SELECT bucket, a, value FROM agg_15m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                   158.18 |
+--------------------------------------------------------------------------------------+------------------------------------------+

Flamegraph:

2. This PR

tsbench on  main [?] via tsbench took 11s 
➜ python -m src.tsbench --with-connection pgsql://fabrizio@localhost:5432/fabrizio --benchmarks cagg_watermark --no-teardown
*** Processing benchmarks on connection: pgsql://fabrizio@localhost:5432/fabrizio
*** Executing benchmark 'cagg_watermark'
*** All benchmarks are executed - done
============================================
Report for benchmark suite 'cagg_watermark'
+--------------------------------------------------------------------------------------+------------------------------------------+
| Query                                                                                | b7ad720d8243fe17a14a257434beff7430bc97c7 |
+--------------------------------------------------------------------------------------+------------------------------------------+
| SELECT bucket, a, value FROM agg_1m  WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                     1.76 |
| SELECT bucket, a, value FROM agg_5m  WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                     3.17 |
| SELECT bucket, a, value FROM agg_15m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; |                                    28.63 |
+--------------------------------------------------------------------------------------+------------------------------------------+
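
For a quick comparison of the two reports, the per-query ratio can be computed from the numbers above. This is a small sketch only: the dictionary keys and the `speedups` helper are names made up for this example, and the timings are copied as reported without assuming a unit.

```python
# Timings copied from the two reports above (unit as reported by tsbench).
baseline = {"agg_1m": 2.03, "agg_5m": 8.89, "agg_15m": 158.18}  # 2.10.0
patched = {"agg_1m": 1.76, "agg_5m": 3.17, "agg_15m": 28.63}    # this PR


def speedups(before, after):
    """Return before/after ratio per query; values above 1 mean the PR is faster."""
    return {name: before[name] / after[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedups(baseline, patched).items():
        print(f"{name}: {ratio:.2f}x")
```

With these numbers the ratios come out to about 1.15x for `agg_1m`, 2.80x for `agg_5m` and 5.52x for `agg_15m`.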

Flamegraph:

codecov · 2023-03-07T00:08:17Z

Codecov Report

❗ No coverage uploaded for pull request base (main@699fcf4).
The diff coverage is 94.63%.

❗ Current head de52e20 differs from pull request most recent head db35aa2. Consider uploading reports for the commit db35aa2 to get more accurate results

@@           Coverage Diff           @@
##             main    #5261   +/-   ##
=======================================
  Coverage        ?   90.69%           
=======================================
  Files           ?      229           
  Lines           ?    53235           
  Branches        ?        0           
=======================================
  Hits            ?    48283           
  Misses          ?     4952           
  Partials        ?        0

Impacted Files	Coverage Δ
src/ts_catalog/catalog.c	`89.37% <ø> (ø)`
src/ts_catalog/catalog.h	`100.00% <ø> (ø)`
src/chunk.c	`93.71% <85.71%> (ø)`
tsl/src/continuous_aggs/materialize.c	`73.85% <88.23%> (ø)`
src/ts_catalog/continuous_aggs_watermark.c	`95.83% <95.83%> (ø)`
src/hypertable.c	`87.59% <100.00%> (ø)`
src/ts_catalog/continuous_agg.c	`96.47% <100.00%> (ø)`
tsl/src/continuous_aggs/create.c	`89.16% <100.00%> (ø)`
tsl/src/continuous_aggs/invalidation_threshold.c	`62.24% <100.00%> (ø)`
tsl/src/continuous_aggs/refresh.c	`97.68% <100.00%> (ø)`

mkindahl

In general it looks good. A few minor things that I think need to be fixed. There is a bigger question about why some of the errors would be raised, since they would rather indicate that we are doing something wrong with the locking. Raising errors here is safe, since they would normally not fire, but I'm not sure if we want some additional tests for these cases.

src/chunk.c

src/hypertable.c

src/ts_catalog/continuous_aggs_watermark.c

tsl/src/continuous_aggs/create.c

tsl/src/continuous_aggs/refresh.c

tsl/src/continuous_aggs/materialize.c

src/ts_catalog/continuous_aggs_watermark.c

sql/pre_install/tables.sql

src/ts_catalog/continuous_aggs_watermark.c

mkindahl

We've had problems with VACUUM previously, so I think we should be careful about releasing locks too early, but otherwise it looks good.

src/ts_catalog/continuous_aggs_watermark.c

timescale-automation · 2023-03-22T19:36:59Z

Automated backport to 2.10.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.10.x
You are currently cherry-picking commit 38fcd1b7.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   CHANGELOG.md
	modified:   sql/pre_install/tables.sql
	modified:   sql/updates/post-update.sql
	modified:   sql/util_time.sql
	modified:   src/chunk.c
	modified:   src/hypertable.c
	modified:   src/hypertable.h
	modified:   src/ts_catalog/CMakeLists.txt
	modified:   src/ts_catalog/catalog.c
	modified:   src/ts_catalog/catalog.h
	modified:   src/ts_catalog/continuous_agg.c
	new file:   src/ts_catalog/continuous_aggs_watermark.c
	new file:   src/ts_catalog/continuous_aggs_watermark.h
	modified:   test/expected/drop_rename_hypertable.out
	modified:   tsl/src/continuous_aggs/create.c
	modified:   tsl/src/continuous_aggs/invalidation_threshold.c
	modified:   tsl/src/continuous_aggs/materialize.c
	modified:   tsl/src/continuous_aggs/materialize.h
	modified:   tsl/src/continuous_aggs/refresh.c
	modified:   tsl/test/expected/cagg_refresh.out
	modified:   tsl/test/expected/cagg_union_view-12.out
	modified:   tsl/test/expected/cagg_union_view-13.out
	modified:   tsl/test/expected/cagg_union_view-14.out
	modified:   tsl/test/expected/cagg_union_view-15.out
	modified:   tsl/test/shared/expected/extension.out
	modified:   tsl/test/sql/cagg_union_view.sql.in

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   sql/updates/latest-dev.sql
	both modified:   sql/updates/reverse-dev.sql

In timescale#5261 we cached the Continuous Aggregate watermark value in a metadata table to improve performance by avoiding a `SELECT max(primary_dimension)` execution at query time. Manual DML operations on a CAgg are not recommended; the user should use the `refresh_continuous_aggregate` procedure instead. However, we handle `TRUNCATE` over CAggs by generating the necessary invalidation logs, so it makes sense to also update the watermark.

github-actions bot assigned fabriziomello Jan 31, 2023

fabriziomello force-pushed the improve_cagg_watermark_performance branch from b531a06 to 1ec9003 Compare January 31, 2023 20:59

fabriziomello force-pushed the improve_cagg_watermark_performance branch 4 times, most recently from 4b50c2a to a81f847 Compare March 1, 2023 13:07

shhnwz mentioned this pull request Mar 1, 2023

[Bug]: performance issues with nested hierarchical aggregations #5307

Closed

fabriziomello force-pushed the improve_cagg_watermark_performance branch 7 times, most recently from d7397f5 to 4ad547f Compare March 6, 2023 23:54

fabriziomello force-pushed the improve_cagg_watermark_performance branch 14 times, most recently from 0dea5d8 to 9deb2e7 Compare March 8, 2023 23:14

fabriziomello force-pushed the improve_cagg_watermark_performance branch 10 times, most recently from f0e7ec7 to d5f59bf Compare March 20, 2023 20:29

mkindahl reviewed Mar 21, 2023

erimatnor reviewed Mar 21, 2023

fabriziomello force-pushed the improve_cagg_watermark_performance branch from d5f59bf to 3faeca8 Compare March 22, 2023 14:05

mkindahl reviewed Mar 22, 2023

erimatnor approved these changes Mar 22, 2023

mkindahl approved these changes Mar 22, 2023

fabriziomello force-pushed the improve_cagg_watermark_performance branch from de52e20 to 2e7a04b Compare March 22, 2023 18:24

fabriziomello force-pushed the improve_cagg_watermark_performance branch from 2e7a04b to db35aa2 Compare March 22, 2023 18:44

fabriziomello merged commit 38fcd1b into timescale:main Mar 22, 2023

timescale-automation added the auto-backport-not-done Automated backport of this PR has failed non-retriably (e.g. conflicts) label Mar 22, 2023

fabriziomello mentioned this pull request Jun 2, 2023

[Bug]: The record that was inserted into the materialization hypertable of a continuous aggregate does no longer show up in the continuous aggregate from 2.11.0 #5743

Open

alejandrodnm mentioned this pull request Jun 8, 2023

[Bug]: CAGGs queries fails after migration to Cloud because watermark is not restored and can't be set #5763

Closed

fabriziomello mentioned this pull request Apr 26, 2024

Update the watermark when truncating a CAgg #6865

Merged
