perf: crypto/types: use native index-producing range loop in (*CompactBitArray).NumTrueBitsBefore #14000

Merged: 1 commit, Nov 24, 2022

odeke-em (Collaborator) commented:

While examining a number of profiles, I noticed that the for loop inside (*CompactBitArray).NumTrueBitsBefore consumed an unnecessarily large share of the time:

     7.55s      9.88s (flat, cum) 93.38% of Total
     240ms      250ms     88:func (bA *CompactBitArray) NumTrueBitsBefore(index int) int {
         .          .     89:	onesCount := 0
      70ms      340ms     90:	max := bA.Count()
      70ms       70ms     91:	if index > max {
         .          .     92:		index = max
         .          .     93:	}
         .          .     94:	// below we iterate over the bytes then over bits (in low endian) and count bits set to 1
     2.54s      2.76s     95:	for elem := 0; elem < len(bA.Elems); elem++ {

However, we can use Go's native range loop, which produces indices while iterating over a slice. Simply changing the loop form yields an improvement (2.54s down to 1.49s flat on line 95):

     7.50s      9.95s (flat, cum) 94.94% of Total
     240ms      320ms     88:func (bA *CompactBitArray) NumTrueBitsBefore(index int) int {
         .          .     89:	onesCount := 0
     170ms      420ms     90:	max := bA.Count()
      90ms      100ms     91:	if index > max {
         .          .     92:		index = max
         .          .     93:	}
         .          .     94:	// below we iterate over the bytes then over bits (in low endian) and count bits set to 1
     1.49s      1.62s     95:	for elem := range bA.Elems {
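
The two loop forms compared in the profiles above can be sketched side by side. This is a minimal illustration, not the SDK's implementation: `onesInPrefix`, `countIndexed`, and `countRange` are hypothetical names, the bit layout (bit 0 is the most significant bit of the first byte) is an assumption, and `math/bits` is used for brevity rather than because the original code does so.

```go
package main

import (
	"fmt"
	"math/bits"
)

// onesInPrefix counts the set bits among the first n bits of b,
// assuming bit 0 is the most significant bit of the byte.
func onesInPrefix(b byte, n int) int {
	if n >= 8 {
		return bits.OnesCount8(b)
	}
	if n <= 0 {
		return 0
	}
	// Keep only the top n bits (shift count is in 1..7 here).
	return bits.OnesCount8(b >> (8 - n))
}

// countIndexed uses the classic three-clause loop from the "before" profile.
func countIndexed(elems []byte, index int) int {
	ones := 0
	for i := 0; i < len(elems); i++ {
		ones += onesInPrefix(elems[i], index-8*i)
	}
	return ones
}

// countRange uses the range form from the "after" profile.
func countRange(elems []byte, index int) int {
	ones := 0
	for i := range elems {
		ones += onesInPrefix(elems[i], index-8*i)
	}
	return ones
}

func main() {
	elems := []byte{0b10110000, 0b11111111, 0b00000001}
	fmt.Println(countIndexed(elems, 12), countRange(elems, 12)) // prints "7 7"
}
```

Both functions return the same counts; only the loop header differs, which is the whole of the change described here.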

The benchmark also shows an improvement in CPU time:

$ benchstat before.txt after.txt
name                     old time/op    new time/op    delta
NumTrueBitsBefore/new-8    13.3ns ± 1%    12.5ns ± 1%  -6.07%  (p=0.000 n=10+10)

name                     old alloc/op   new alloc/op   delta
NumTrueBitsBefore/new-8     0.00B          0.00B         ~     (all equal)

name                     old allocs/op  new allocs/op  delta
NumTrueBitsBefore/new-8      0.00           0.00         ~     (all equal)
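
A comparison of this kind can be reproduced in miniature with `testing.Benchmark`, which runs a benchmark function outside of `go test`. The sketch below is an assumption-laden stand-in: `sumIndexed` and `sumRange` are hypothetical functions, not the PR's benchmark, and the measured difference depends on the Go version and hardware (the compiler may well generate identical code for both forms).

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from optimizing the benchmarked calls away.
var sink int

// sumIndexed iterates with a three-clause loop.
func sumIndexed(xs []byte) int {
	total := 0
	for i := 0; i < len(xs); i++ {
		total += int(xs[i])
	}
	return total
}

// sumRange iterates with the index-producing range form.
func sumRange(xs []byte) int {
	total := 0
	for i := range xs {
		total += int(xs[i])
	}
	return total
}

func main() {
	xs := make([]byte, 1024)
	for i := range xs {
		xs[i] = byte(i)
	}

	indexed := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumIndexed(xs)
		}
	})
	ranged := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumRange(xs)
		}
	})

	// Each result prints the iteration count and ns/op.
	fmt.Println("indexed:", indexed)
	fmt.Println("range:  ", ranged)
}
```

For output that benchstat can compare, the usual route is a `Benchmark...` function in a `_test.go` file, run as `go test -bench=. -count=10` once before and once after the change, with each run redirected to its own file.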

Fixes #13999

@odeke-em odeke-em requested a review from a team as a code owner November 24, 2022 10:24
@tac0turtle tac0turtle enabled auto-merge (squash) November 24, 2022 10:38
@tac0turtle tac0turtle merged commit 00ad3ec into main Nov 24, 2022
@tac0turtle tac0turtle deleted the crypto-types-NumTrueBits-iterate-with-native-loop branch November 24, 2022 10:46