Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Fix temporal function tag application bug #2231

Merged
merged 9 commits into from
Apr 2, 2020

Conversation

arnikola
Copy link
Collaborator

Fixes #2214

Fixes an issue where batched processing of temporal functions would sometimes cause series tag mismatches.

TODO: add regression test data for this case

@fingon
Copy link
Contributor

fingon commented Mar 30, 2020

Does this commit also assume some other dependencies? I cherry-picked the commit on top of 0.15rc3 and result still seems similar as bug #2214 (= spurious blips on promql, when using e.g.)

max by (process_name)(max_over_time(procstat_memory_rss{service_id=~'$service_id'}[$__interval]))

in Grafana, which turns to this promql on API ( unquoted for your convenience ):

/api/v1/query_range?query=max by (process_name)(max_over_time(procstat_memory_rss{service_id=~'1003'}[15s])) &start=1585547985&end=1585551585&step=15

For example for one of the things matching, there seems to be (still) bleed-over from other time series (from Grafana query inspector, too lazy to run promql query by hand + search for it from json):

71:Array[1585551375,52244480]
72:Array[1585551390,50855936]
73:Array[1585551405,52244480]
74:Array[1585551420,3901157376]
75:Array[1585551435,3902169088]
76:Array[1585551450,52961280]
77:Array[1585551465,53080064]
78:Array[1585551480,52961280]
79:Array[1585551495,53080064]
80:Array[1585551510,3903795200]
81:Array[1585551525,55300096]
82:Array[1585551540,3904131072]
83:Array[1585551555,55300096]
84:Array[1585551570,52961280]
85:Array[1585551585,55300096]
86:Array[1585551600,52961280]
87:Array[1585551615,55300096]
88:Array[1585551630,52961280]

Test setup: 1 coordinator, 6 db nodes, rf=3, all with this patch + 0.15rc3.

@codecov
Copy link

codecov bot commented Mar 30, 2020

Codecov Report

Merging #2231 into master will increase coverage by 0.1%.
The diff coverage is 77.5%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #2231     +/-   ##
========================================
+ Coverage    72.4%   72.5%   +0.1%     
========================================
  Files        1024    1024             
  Lines       89129   89079     -50     
========================================
+ Hits        64541   64639     +98     
+ Misses      20241   20097    -144     
+ Partials     4347    4343      -4
Flag Coverage Δ
#aggregator 82.1% <ø> (ø) ⬆️
#cluster 85.2% <ø> (ø) ⬆️
#collector 82.8% <ø> (ø) ⬆️
#dbnode 79.1% <100%> (ø) ⬆️
#m3em 74.4% <ø> (ø) ⬆️
#m3ninx 74.4% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.5% <ø> (ø) ⬆️
#msg 74.9% <ø> (+0.1%) ⬆️
#query 69.3% <71%> (+0.5%) ⬆️
#x 83.2% <ø> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de15419...e93b792. Read the comment docs.

ctx := models.NewQueryContext(context.Background(),
tally.NoopScope, cost.NoopChainedEnforcer(),
models.QueryContextOptions{})
var ctx = models.NewQueryContext(context.Background(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: Maybe make this a method? In case contexts conflict across tests?

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with super minor comments

@fingon
Copy link
Contributor

fingon commented Apr 1, 2020

+1 from me, after this iteration I don't see #2214 in my test cluster anymore (or #2191, but that seemed to happen more on staging cluster which I upgrade only to rc's)

result := ts.Datapoint{Timestamp: enc.tsEncoderState.PrevTime}
result := ts.Datapoint{
Timestamp: enc.tsEncoderState.PrevTime,
TimestampNanos: xtime.ToUnixNano(enc.tsEncoderState.PrevTime),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should we move to add a method called NewDatapoint(t time.Time, v float64) and then always add the fields ourselves?

Otherwise I think other callsites might miss this as well.

We can change the Timestamp and Value to unexported and add methods to access them (which will likely get inlined by Go). (i.e. Timestamp() time.Time and TimestampNanos() int64 and Value() float64

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking that... I think there are a lot of places this will touch though; ok to do it as a follow up?

meta = meta.CombineMetadata(insertResult.meta)
res := <-result.ResultChan()
if res.Err != nil {
return emptyResult, res.Err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should we add back the defer func() { for range resultChan {} }() before returning here? Otherwise could see it possible to leak goroutines that are blocked tryring to publish to resultCh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel can only ever have a single value, so that's not actually necessary here

aggDuration: xtime.UnixNano(c.op.duration),
stepSize: xtime.UnixNano(bounds.StepSize),
steps: steps,
steps: bounds.Steps(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, any reason we're not excluding name part any more? (as per prior comment // rename series to exclude their __name__ tag as part of function processing.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still around, just moved some logic around; will readd this comment around the relevant paths 👍

)

if stats.Enabled {
decodeDuration += stats.DecodeDuration
}

seriesMeta.Tags = seriesMeta.Tags.WithoutName()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add // rename series to exclude their __name__ tag as part of function processing. above this line for consistency with other code path and explaining why without name is called?

@@ -237,6 +237,8 @@ func (cb ColumnBlockBuilder) PopulateColumns(size int) {
for i := range cb.block.columns {
cb.block.columns[i] = column{Values: cols[size*i : size*(i+1)]}
}

cb.block.seriesMeta = make([]SeriesMeta, size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add test coverage that this is always size of cols now? Seems like was just missing before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, luckily the new test fails without this statement 👍

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arnikola arnikola merged commit 7b073b4 into master Apr 2, 2020
@arnikola arnikola deleted the arnikola/temporal-tag-fix branch April 2, 2020 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

graphs via PromQL query on Grafana change a lot between refreshes
3 participants