Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new heartbeat_agg #615

Merged
merged 1 commit into from
Nov 29, 2022
Merged

Adding new heartbeat_agg #615

merged 1 commit into from
Nov 29, 2022

Conversation

WireBaron
Copy link
Contributor

This adds a new postgres aggregate to our toolkit that can be use to measure the liveness of a system given regular heartbeats.

Fixes #578

Copy link
Contributor

@davidkohn88 davidkohn88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

#[pg_extern]
pub fn dead_ranges(
agg: HeartbeatAgg<'static>,
) -> TableIterator<'static, (name!(start, TimestampTz), name!(end, TimestampTz))> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we should just return a tstzrange? Mostly because this leaves it unclear whether it's inclusive on both sides or exactly how that works. We should be clear about the behavior either way, will need to document...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that time ranges (really any ranges over continuous data) are almost always left inclusive right exclusive, and really only feel a need to explain if the behavior varies from this. We can be explicit about this in the documentation if you feel it's important, but honestly I don't think any other approach would make sense.

}

#[pg_extern]
pub fn duration_dead(agg: HeartbeatAgg<'static>) -> Interval {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to change these names to downtime / uptime
Though it does mean our nomenclature changes. I think live_at / dead_at are slightly better than up_at / down_at and live_ranges / dead_ranges are much better than up_ranges / down_ranges

If we stick with this formulation though, I think we should definitely go with dead_duration / live_duration as it feels like switching the ordering of words ie leading with duration rather than the state is worse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just changing these accessors to uptime/downtime is alright. Having both uptime and live_ranges together doesn't strike me as too confusing.


const BUFFER_SIZE: usize = 1000; // How many values to absorb before consolidating

// Given the lack of a good range map class, or efficient predecessor operation on btrees,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.rs/rangetree/latest/rangetree/ doesn't work? or just not good enough / just another dependency we don't need? Anything we can learn from it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at several different options here, none of them actually behaved as I needed. The example you linked actually uses a balanced tree, so avoids the most common issue I had, but it also assumes a space of discrete values which can be taken one at a time rather than letting a caller take continuous subranges of the overall space.

let ptr =
pg_sys::palloc(std::mem::size_of::<pg_sys::Interval>()) as *mut pg_sys::Interval;
*ptr = interval;
Interval(pgx::Datum::from(ptr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pgx::Datum won't work in pgx 0.6.0 (see #618)

Suggested change
Interval(pgx::Datum::from(ptr))
Interval(pg_sys::Datum::from(ptr))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

interval_len: i64,
buffer: Vec<i64>,
liveness: Vec<(i64, i64)>, // sorted array of non-overlapping (start_time, end_time)
}

impl HeartbeatTransState {
pub fn new(start: i64, end: i64, interval: i64) -> Self {
assert!(end - start > interval);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be triggered by user input? If so, is the error message comprehensible and can we test it?

Same for another assertion below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a more explicit error message here and to insert that should make it clear to users what went wrong in this case.

This adds a new postgres aggregate to our toolkit that can be use to measure the liveness of a system given regular heartbeats.
@WireBaron
Copy link
Contributor Author

bors r+

@bors
Copy link
Contributor

bors bot commented Nov 29, 2022

Build succeeded:

@bors bors bot merged commit a6c9f07 into main Nov 29, 2022
@bors bors bot deleted the br/heartbeat_agg branch November 29, 2022 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Signal Heartbeat Claculation
4 participants