Improve Realtime Continuous Aggregate performance #5261
Force-pushed b531a06 to 1ec9003
Force-pushed 4b50c2a to a81f847
Some preliminary results:

1. Last TimescaleDB release: 2.10.0
➜ python -m src.tsbench --with-connection pgsql://fabrizio@localhost:5432/2.10.0 --benchmarks cagg_watermark
*** Processing benchmarks on connection: pgsql://fabrizio@localhost:5432/2.10.0
*** Executing benchmark 'cagg_watermark'
*** All benchmarks are executed - done
============================================
Report for benchmark suite 'cagg_watermark'
+--------------------------------------------------------------------------------------+------------------------------------------+
| Query | 8b549b08e28d121946eccbc7452bc476c7754dc8 |
+--------------------------------------------------------------------------------------+------------------------------------------+
| SELECT bucket, a, value FROM agg_1m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 2.03 |
| SELECT bucket, a, value FROM agg_5m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 8.89 |
| SELECT bucket, a, value FROM agg_15m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 158.18 |
+--------------------------------------------------------------------------------------+------------------------------------------+

2. This PR
➜ python -m src.tsbench --with-connection pgsql://fabrizio@localhost:5432/fabrizio --benchmarks cagg_watermark --no-teardown
*** Processing benchmarks on connection: pgsql://fabrizio@localhost:5432/fabrizio
*** Executing benchmark 'cagg_watermark'
*** All benchmarks are executed - done
============================================
Report for benchmark suite 'cagg_watermark'
+--------------------------------------------------------------------------------------+------------------------------------------+
| Query | b7ad720d8243fe17a14a257434beff7430bc97c7 |
+--------------------------------------------------------------------------------------+------------------------------------------+
| SELECT bucket, a, value FROM agg_1m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 1.76 |
| SELECT bucket, a, value FROM agg_5m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 3.17 |
| SELECT bucket, a, value FROM agg_15m WHERE a = 1 AND bucket > '2023-01-01 00:00:00'; | 28.63 |
+--------------------------------------------------------------------------------------+------------------------------------------+
d7397f5
to
4ad547f
Compare
Codecov Report
@@ Coverage Diff @@
## main #5261 +/- ##
=======================================
Coverage ? 90.69%
=======================================
Files ? 229
Lines ? 53235
Branches ? 0
=======================================
Hits ? 48283
Misses ? 4952
Partials ? 0
Force-pushed 0dea5d8 to 9deb2e7
Force-pushed f0e7ec7 to d5f59bf
In general it looks good. A few minor things that I think need to be fixed. There is a bigger question on why some of the errors would be raised, since these would rather be indications of us doing something wrong with the locking. Raising errors here is safe, since they would normally not fire, but I'm not sure if we want to have some additional tests for these cases.
Force-pushed d5f59bf to 3faeca8
We've had problems with VACUUM previously, so I think we should be careful about releasing locks too early, but otherwise it looks good.
Force-pushed de52e20 to 2e7a04b
When calling the `cagg_watermark` function to get the watermark of a Continuous Aggregate, we execute a `SELECT MAX(time_dimension)` query against the underlying materialization hypertable. The problem is that a `SELECT MAX(time_dimension)` query can be expensive because it scans all hypertable chunks, increasing the planning time for Realtime Continuous Aggregates.

Improve it by creating a new catalog table to serve as a cache storing the current Continuous Aggregate watermark, updated in the following situations:

- Create CAgg: store the minimum value of the hypertable time dimension's data type;
- Refresh CAgg: store the last value of the time dimension materialized in the underlying materialization hypertable (or the minimum value of the materialization hypertable time dimension's data type if there's no data materialized);
- Drop CAgg chunks: the same as refresh CAgg.

Closes timescale#4699, timescale#5307
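The caching scheme above can be sketched as a small in-memory model. This is a hypothetical Python illustration, not TimescaleDB's actual C implementation: the class and names are assumptions chosen to mirror the three situations listed in the commit message. The point is that the watermark is written once per refresh and read back in O(1), instead of recomputing `SELECT MAX(time_dimension)` over every chunk at planning time.

```python
from datetime import datetime, timezone

# Stand-in for the minimum value of the time dimension's data type.
MIN_WATERMARK = datetime.min.replace(tzinfo=timezone.utc)


class CaggWatermarkCache:
    """Hypothetical model of one row of the new watermark catalog table."""

    def __init__(self):
        # Create CAgg: store the minimum value of the time dimension type.
        self.cached_watermark = MIN_WATERMARK

    def refresh(self, materialized_buckets):
        # Refresh CAgg: store the last time-dimension value materialized,
        # or fall back to the minimum when no data is materialized.
        if materialized_buckets:
            self.cached_watermark = max(materialized_buckets)
        else:
            self.cached_watermark = MIN_WATERMARK

    def watermark(self):
        # cagg_watermark() now reads the cached value; no chunk scan needed.
        return self.cached_watermark
```

Dropping CAgg chunks would follow the same path as `refresh`: recompute from what remains materialized and store the result.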
Force-pushed 2e7a04b to db35aa2
Automated backport to 2.10.x not done: cherry-pick failed.
In timescale#5261 we cached the Continuous Aggregate watermark value in a metadata table to improve performance by avoiding a `SELECT max(primary_dimension)` execution that computed the watermark at planning time. Manual DML operations on a CAgg are not recommended; instead the user should use the `refresh_continuous_aggregate` procedure. But we handle `TRUNCATE` over CAggs by generating the necessary invalidation logs, so it makes sense to also update the watermark.
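The `TRUNCATE` handling can be sketched the same way. This is a hypothetical Python model (the dictionary and function names are illustrative assumptions, not TimescaleDB internals): after `TRUNCATE` empties the CAgg's materialization hypertable, the cached watermark falls back to the time dimension's minimum value, exactly as if no data had ever been materialized.

```python
from datetime import datetime, timezone

# Stand-in for the minimum value of the time dimension's data type.
TIME_DIMENSION_MIN = datetime.min.replace(tzinfo=timezone.utc)

# Hypothetical watermark metadata table: one entry per CAgg.
watermark_cache = {"agg_1m": datetime(2023, 1, 2, tzinfo=timezone.utc)}


def truncate_cagg(cagg_name):
    # 1. The necessary invalidation logs would be generated here (not modeled).
    # 2. Reset the cached watermark to the minimum, since nothing remains
    #    materialized after TRUNCATE.
    watermark_cache[cagg_name] = TIME_DIMENSION_MIN
```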