Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetRef returns 0 for existing series #8861

Closed
krasi-georgiev opened this issue May 24, 2021 · 15 comments
Closed

GetRef returns 0 for existing series #8861

krasi-georgiev opened this issue May 24, 2021 · 15 comments

Comments

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented May 24, 2021

The following call to GetRef always returns 0 in the first iteration even when the series exists.
The strange thing is that when is commented out labels.Label{Name: "domain", Value: "b"}, it returns the existing reference for the series as expected.

This is with the current commit on the main branch #03b354d4d9386e4b3bfbcd45da4bb58b182051a5

I am not sure if this is a bug or the implementation of using GetRef is incorrect.

cc @codesome , @tomwilkie , @bboreham

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/pkg/timestamp"
	"github.com/prometheus/prometheus/storage"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	tsdbOptions := tsdb.DefaultOptions()
	dbPath := "db"
	os.RemoveAll(dbPath)

	if err := os.MkdirAll(dbPath, 0777); err != nil {
		log.Fatal(err, "creating tsdb DB folder")
	}
	tsDB, err := tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	defer func() {
		if err := tsDB.Close(); err != nil {
			log.Fatal("msg", "closing the tsdb", "err", err)
		}
	}()

	lbls2 := labels.Labels{
		labels.Label{Name: "__name__", Value: "name"},
		labels.Label{Name: "source", Value: "a"},
	}

	lbls3 := labels.Labels{
		labels.Label{Name: "__name__", Value: "name"},
		labels.Label{Name: "source", Value: "a"},
		labels.Label{Name: "domain", Value: "b"},
	}

	fmt.Println("adding 3 labels first append")
	append(tsDB, lbls3)

	fmt.Println("adding 2 labels first append")
	append(tsDB, lbls2)

	if err := tsDB.Close(); err != nil {
		log.Fatal("msg", "closing the tsdb", "err", err)
	}
	fmt.Println("reopening db")
	tsDB, err = tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	fmt.Println("adding 3 labels second append")
	append(tsDB, lbls3)
	fmt.Println("adding 2 labels second append")
	append(tsDB, lbls2)

}

func append(tsDB *tsdb.DB, lbls labels.Labels) {
	for i := 0; i < 3; i++ {
		appender := tsDB.Appender(context.Background())

		var (
			ref          uint64
			copiedLabels labels.Labels
		)
		g, ok := appender.(storage.GetRef)
		if !ok {
			log.Fatal("msg", "storage not ref getter")

		}

		ref, copiedLabels = g.GetRef(lbls)

		if ref != 0 {
			lbls = copiedLabels
		}
		log.Println("msg", "adding value to db", "ref", ref)
		if _, err := appender.Append(ref,
			lbls,
			timestamp.FromTime(time.Now()),
			1,
		); err != nil {
			log.Fatal("msg", "append values to the DB", "err", err)
		}

		if err := appender.Commit(); err != nil {
			log.Fatal("msg", "adding values to the DB", "err", err)
		}

		time.Sleep(time.Second)
	}
}

output:

adding 3 labels first append
2021/05/27 15:04:38 msg adding value to db ref 0
2021/05/27 15:04:39 msg adding value to db ref 1
2021/05/27 15:04:40 msg adding value to db ref 1
adding 2 labels first append
2021/05/27 15:04:41 msg adding value to db ref 0
2021/05/27 15:04:42 msg adding value to db ref 2
2021/05/27 15:04:43 msg adding value to db ref 2

reopening db
adding 3 labels second append
2021/05/27 15:04:44 msg adding value to db ref 0<-first problem, why GetRef returns 0 for existing timeseries?
2021/05/27 15:04:45 msg adding value to db ref 3<-second problem why Append created a new reference for the same time series?
2021/05/27 15:04:46 msg adding value to db ref 3
adding 2 labels second append
2021/05/27 15:04:47 msg adding value to db ref 2<-correctly returns the of the time series
2021/05/27 15:04:48 msg adding value to db ref 2
2021/05/27 15:04:49 msg adding value to db ref 2
@krasi-georgiev
Copy link
Contributor Author

cc @bwplotka

@bwplotka
Copy link
Member

From an interface standpoint, it's correct, it's up to the implementation to drop series any time, so getRef can get 0 when labels are not preserved.

But based on our implementation it indeed sounds weird, maybe it's some race? Still, it's not dangerous, callers should expect any of this.

@krasi-georgiev
Copy link
Contributor Author

krasi-georgiev commented May 27, 2021

ok I refactored the example a bit to make it more clear where I am confused.

Main thing I can't figure out is how would I avoid creating the same timeseries with a diff ref ID which is what is happening here?

I was expecting that GetRef would always give me the ID of given series when they exist but since here it returns 0 for some reason Append decides to give this series a new ref 3.

@krasi-georgiev
Copy link
Contributor Author

krasi-georgiev commented May 27, 2021

also I see few places in Prometheus itself where the Append is called with 0 for the reference so this means that the same timeseries will be created under a new reference every time.

@bboreham
Copy link
Member

I don’t think the implementation is compelled to add something new every time you pass zero to Append.
Merely if you pass non-zero it can skip hashing and looking up the labels.

(I didn’t check this in the code today. Speaking from memory.)

@krasi-georgiev
Copy link
Contributor Author

krasi-georgiev commented May 27, 2021

@bboreham yes this is what I thought, but did you look at my example?
In my example when you pass 0 to Append it creates a new reference every time and internally it indeed creates new timeseries for the exact same labels.

@krasi-georgiev
Copy link
Contributor Author

Maybe this behaviour changed with the recent Append refactoring.

@krasi-georgiev
Copy link
Contributor Author

ping @codesome , @bwplotka

@bboreham
Copy link
Member

bboreham commented Jun 7, 2021

It does the expected thing if you swap "source" and "domain" so that the label names are in alphabetical order.
I understand this to be a requirement of the Prometheus internals, however I cannot immediately point you to a reference.

@bboreham
Copy link
Member

bboreham commented Jun 7, 2021

It's on the definition of Labels:

// Labels is a sorted set of labels. Order has to be guaranteed upon
// instantiation.
type Labels []Label

@bwplotka
Copy link
Member

bwplotka commented Jun 7, 2021

Nice finding!

Ordering is a must - hash is totally different depending on the order. So is there something we should change @krasi-georgiev or does it solve your problem?

@krasi-georgiev
Copy link
Contributor Author

@bboreham no labels are swapped, the code is using exactly the same labels for all appends.
I added append with 2 labels and 3 labels just to showcase that with 2 labels append returns existing ref and with 3 labels it always creates a new ref.

here is the same code but only with append for 3 labels.

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/pkg/timestamp"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	tsdbOptions := tsdb.DefaultOptions()
	dbPath := "db"
	os.RemoveAll(dbPath)

	if err := os.MkdirAll(dbPath, 0777); err != nil {
		log.Fatal(err, "creating tsdb DB folder")
	}
	tsDB, err := tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	defer func() {
		if err := tsDB.Close(); err != nil {
			log.Fatal("msg", "closing the tsdb", "err", err)
		}
	}()

	lbls3 := labels.Labels{
		labels.Label{Name: "__name__", Value: "name"},
		labels.Label{Name: "source", Value: "a"},
		labels.Label{Name: "domain", Value: "b"},
	}

	fmt.Println("adding 3 labels first append")
	append(tsDB, lbls3)

	

	if err := tsDB.Close(); err != nil {
		log.Fatal("msg", "closing the tsdb", "err", err)
	}
	fmt.Println("reopening db")
	tsDB, err = tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	fmt.Println("adding 3 labels second append")
	append(tsDB, lbls3)
}

func append(tsDB *tsdb.DB, lbls labels.Labels) {
	var (
		err error
		ref uint64
	)
	for i := 0; i < 3; i++ {
		appender := tsDB.Appender(context.Background())
		ref, err = appender.Add(
			lbls,
			timestamp.FromTime(time.Now()),
			1,
		)
		if err != nil {
			log.Fatal("msg", "append values to the DB", "err", err)
		}
		log.Println("msg", "adding value to db", "ref", ref)

		if err := appender.Commit(); err != nil {
			log.Fatal("msg", "adding values to the DB", "err", err)
		}

		time.Sleep(time.Second)
	}
}

output

adding 3 labels first append
2021/06/08 11:31:04 msg adding value to db ref 1
2021/06/08 11:31:05 msg adding value to db ref 1
2021/06/08 11:31:06 msg adding value to db ref 1
reopening db
adding 3 labels second append
2021/06/08 11:31:07 msg adding value to db ref 2
2021/06/08 11:31:08 msg adding value to db ref 2
2021/06/08 11:31:09 msg adding value to db ref 2

@krasi-georgiev
Copy link
Contributor Author

oooh, now I understand what are you saying, the labels need to be sorted before the append.

@roidelapluie
Copy link
Member

I have run your code with sorted labels, indeed it works as expected:

$ go run .
adding 3 labels first append
2021/06/08 10:42:07 msg adding value to db ref 1
2021/06/08 10:42:08 msg adding value to db ref 1
2021/06/08 10:42:09 msg adding value to db ref 1
reopening db
adding 3 labels second append
2021/06/08 10:42:10 msg adding value to db ref 1
2021/06/08 10:42:11 msg adding value to db ref 1
2021/06/08 10:42:12 msg adding value to db ref 1

thanks @bboreham for the investigation.

@krasi-georgiev
Copy link
Contributor Author

yep here is the updated code that works

package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/pkg/timestamp"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	tsdbOptions := tsdb.DefaultOptions()
	dbPath := "db"
	os.RemoveAll(dbPath)

	if err := os.MkdirAll(dbPath, 0777); err != nil {
		log.Fatal(err, "creating tsdb DB folder")
	}
	tsDB, err := tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	defer func() {
		if err := tsDB.Close(); err != nil {
			log.Fatal("msg", "closing the tsdb", "err", err)
		}
	}()

	lbls3 := labels.Labels{
		labels.Label{Name: "__name__", Value: "name"},
		labels.Label{Name: "source", Value: "a"},
		labels.Label{Name: "domain", Value: "b"},
	}
	sort.Sort(lbls3) // <-------------- sorting the labels

	fmt.Println("adding 3 labels first append")
	append(tsDB, lbls3)

	

	if err := tsDB.Close(); err != nil {
		log.Fatal("msg", "closing the tsdb", "err", err)
	}
	fmt.Println("reopening db")
	tsDB, err = tsdb.Open(dbPath, nil, nil, tsdbOptions)
	if err != nil {
		log.Fatal(err, "creating tsdb DB")
	}

	fmt.Println("adding 3 labels second append")
	append(tsDB, lbls3)
}

func append(tsDB *tsdb.DB, lbls labels.Labels) {
	var (
		err error
		ref uint64
	)
	for i := 0; i < 3; i++ {
		appender := tsDB.Appender(context.Background())
		ref, err = appender.Add(
			lbls,
			timestamp.FromTime(time.Now()),
			1,
		)
		if err != nil {
			log.Fatal("msg", "append values to the DB", "err", err)
		}
		log.Println("msg", "adding value to db", "ref", ref)

		if err := appender.Commit(); err != nil {
			log.Fatal("msg", "adding values to the DB", "err", err)
		}

		time.Sleep(time.Second)
	}
}

output

adding 3 labels first append
2021/06/08 11:45:27 msg adding value to db ref 1
2021/06/08 11:45:28 msg adding value to db ref 1
2021/06/08 11:45:29 msg adding value to db ref 1
reopening db
adding 3 labels second append
2021/06/08 11:45:30 msg adding value to db ref 1
2021/06/08 11:45:31 msg adding value to db ref 1
2021/06/08 11:45:32 msg adding value to db ref 1

@prometheus prometheus locked as resolved and limited conversation to collaborators Dec 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants