receive: use s2 when decompressing snappy data, reduce CPU usage for incoming remote write requests by ~5-7% #4354

Merged: bwplotka merged 2 commits into thanos-io:main from feature/use_s2 on Jun 22, 2021

GiedriusS (Member) commented Jun 17, 2021

Use https://github.com/klauspost/compress/tree/master/s2#s2-compression
for decompressing Snappy data. I think if this pays off then we could
also try using this in Prometheus for encoding?

Benchmarks show these numbers:

benchmark                                                                                       old ns/op     new ns/op     delta
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-16                         1139713       1071864       -5.95%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-16            1294286       1207937       -6.67%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-16                        11613568      10849901      -6.58%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-16           13100752      12478380      -4.75%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-16                  96128949      89515761      -6.88%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-16     96769948      90741611      -6.23%

benchmark                                                                                       old allocs     new allocs     delta
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-16                         4578           4578           +0.00%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-16            7618           7617           -0.01%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-16                        45325          45324          -0.00%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-16           75133          75133          +0.00%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-16                  81             82             +1.23%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-16     203            202            -0.49%

benchmark                                                                                       old bytes     new bytes     delta
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-16                         1254474       1254280       -0.02%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-16            1427791       1427618       -0.01%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-16                        13826832      13806752      -0.15%
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-16           15412336      15412232      -0.00%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-16                  120041647     120018331     -0.02%
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-16     119704515     119701011     -0.00%

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

GiedriusS (Member, Author) commented Jun 18, 2021

The assembly code seems to roughly follow the Go code, so it is easier to compare https://github.com/klauspost/compress/blob/master/s2/decode_other.go with https://github.com/golang/snappy/blob/master/decode_other.go. diff -u ... makes it clear where time is saved. (: I am not an expert on compression, but it seems that time is saved by decoding more data in each loop iteration, i.e. by skipping over some "pointless" work.

The code seems to be quite similar, and copied/pasted in some places, which means that it had been battle-tested even before S2. The tweaks are at the beginning and at the end of the decoding routine.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS GiedriusS marked this pull request as ready for review June 18, 2021 08:29
bwplotka (Member) left a comment

Love it, LGTM (:

@bwplotka bwplotka merged commit 7a90505 into thanos-io:main Jun 22, 2021
@GiedriusS GiedriusS deleted the feature/use_s2 branch June 22, 2021 11:59