Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add linear regression functions #1063

Merged
merged 11 commits into from
Oct 13, 2018
Merged

Add linear regression functions #1063

merged 11 commits into from
Oct 13, 2018

Conversation

benraskin92
Copy link
Collaborator

No description provided.

}

case DerivType:
if len(args) != 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you forgot to assign duration here. Bug ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good- duration is not needed for deriv

@codecov
Copy link

codecov bot commented Oct 10, 2018

Codecov Report

Merging #1063 into master will decrease coverage by 0.01%.
The diff coverage is 88.4%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1063      +/-   ##
==========================================
- Coverage    77.4%   77.39%   -0.02%     
==========================================
  Files         545      546       +1     
  Lines       46068    46132      +64     
==========================================
+ Hits        35659    35702      +43     
- Misses       8156     8168      +12     
- Partials     2253     2262       +9
Flag Coverage Δ
#aggregator 81.59% <ø> (ø) ⬆️
#collector 59.23% <ø> (ø) ⬆️
#dbnode 81.29% <ø> (-0.05%) ⬇️
#m3em 73.21% <ø> (ø) ⬆️
#m3ninx 75.25% <ø> (ø) ⬆️
#m3nsch 51.19% <ø> (ø) ⬆️
#query 63.67% <88.4%> (+0.18%) ⬆️
#x 75.1% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a10b8e...281f999. Read the comment docs.

@@ -299,6 +299,7 @@ func (c *baseNode) processSingleRequest(request processRequest) error {
values = append(values, val)
newVal := math.NaN()
alignedTime, _ := bounds.TimeForIndex(i)
// fmt.Println("align: ", alignedTime.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; delete

return emptyOp, fmt.Errorf("invalid number of args for %s: %d", PredictLinearType, len(args))
}

duration, ok = args[1].(float64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's arg[0]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duration for the actual query.


var slope, intercept float64

if l.op.operatorType == PredictLinearType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a little weird to me, since you're splitting processing amongst two functions and doing related calculations in both.

Instead, can you pull the code from linearRegression() into this Process function, and let it do the switching on its own?

Alternatively, if you want to avoid dealing with operator type inside the regression function, you can pass a bool that signifies wether you're calculating regression or deriv into there, and have it just return a float, something like linearRegression(isDeriv bool, dps ts.Datapoints, interceptTime time.Time) float64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like separate linearRegression since it makes it easy to test. My above comment about having a separate func instead duration should fix this problem.

func linearRegression(dps ts.Datapoints, interceptTime time.Time) (float64, float64) {
var (
n float64
sumX, sumY float64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are X and Y?

sumX, sumY float64
sumXY, sumX2 float64

nonNaNCount int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe valueCount?

nonNaNCount++

// convert to milliseconds
x := float64(dp.Timestamp.Sub(interceptTime).Nanoseconds()/1000000) / 1e3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this converts it to seconds, not millis. Can you just do float64(dp.Timestamp.Sub(interceptTime).Seconds())?

@@ -360,7 +361,7 @@ func (c *baseNode) sweep(processedKeys []bool, maxBlocks int) {

// Processor is implemented by the underlying transforms
type Processor interface {
Process(values ts.Datapoints) float64
Process(values ts.Datapoints, alignedTime time.Time) float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call it evaluationTime ?

)

type linearRegressionProcessor struct {
duration float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, store a function here which takes in a slope and intercept, returns a float. This way you don't have to compare Op in Process.


var slope, intercept float64

if l.op.operatorType == PredictLinearType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like separate linearRegression since it makes it easy to test. My above comment about having a separate func instead duration should fix this problem.

var (
n float64
sumX, sumY float64
sumXY, sumX2 float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better name ?

n float64
sumX, sumY float64
sumXY, sumX2 float64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove line ?

}

valueCount++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove line ?

}

slope, intercept := linearRegression(dps, evaluationTime, l.isDeriv)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove line ?

@benraskin92 benraskin92 merged commit 03b0242 into master Oct 13, 2018
@benraskin92 benraskin92 deleted the braskin/linear_regression branch October 13, 2018 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants