Use faster copy when not overlapping #48

Merged: 1 commit merged into golang:master on Sep 4, 2019



@klauspost klauspost commented Jul 10, 2019

Use the built-in copy function when the source doesn't overlap the destination.

Again, the benchmark results are polarized, since the gain depends on how often a copy is non-overlapping, but this should be a solid improvement for all non-amd64 users.

Benchmarks were measured on AMD64, but with -tags=noasm:

>benchstat old.txt new.txt
name        old time/op    new time/op    delta
_UFlat0-8      194µs ± 3%     130µs ± 2%   -33.14%  (p=0.000 n=10+10)
_UFlat1-8     1.62ms ± 1%    1.42ms ± 1%   -11.98%    (p=0.000 n=9+9)
_UFlat2-8     8.91µs ± 4%    8.73µs ± 1%      ~      (p=0.182 n=10+9)
_UFlat3-8      222ns ± 2%     219ns ± 6%    -1.36%   (p=0.022 n=10+9)
_UFlat4-8     28.4µs ± 2%    11.5µs ± 1%   -59.57%  (p=0.000 n=10+10)
_UFlat5-8      797µs ± 5%     536µs ± 1%   -32.77%  (p=0.000 n=10+10)
_UFlat6-8      565µs ± 1%     571µs ± 1%    +1.04%   (p=0.007 n=8+10)
_UFlat7-8      494µs ± 4%     496µs ± 3%      ~     (p=0.986 n=10+10)
_UFlat8-8     1.55ms ± 4%    1.53ms ± 3%      ~     (p=0.280 n=10+10)
_UFlat9-8     1.93ms ± 1%    1.98ms ± 3%    +2.57%  (p=0.000 n=10+10)
_UFlat10-8     186µs ± 2%     102µs ± 2%   -45.14%  (p=0.000 n=10+10)
_UFlat11-8     524µs ± 2%     510µs ± 1%    -2.56%   (p=0.000 n=10+8)

name        old speed      new speed      delta
_UFlat0-8    528MB/s ± 3%   790MB/s ± 1%   +49.54%  (p=0.000 n=10+10)
_UFlat1-8    434MB/s ± 1%   493MB/s ± 1%   +13.61%    (p=0.000 n=9+9)
_UFlat2-8   13.8GB/s ± 4%  14.1GB/s ± 2%      ~      (p=0.182 n=10+9)
_UFlat3-8    901MB/s ± 1%   912MB/s ± 6%    +1.18%    (p=0.026 n=9+9)
_UFlat4-8   3.60GB/s ± 2%  8.91GB/s ± 1%  +147.32%  (p=0.000 n=10+10)
_UFlat5-8    514MB/s ± 5%   764MB/s ± 2%   +48.59%  (p=0.000 n=10+10)
_UFlat6-8    269MB/s ± 1%   266MB/s ± 1%    -1.03%   (p=0.009 n=8+10)
_UFlat7-8    253MB/s ± 4%   252MB/s ± 3%      ~     (p=0.985 n=10+10)
_UFlat8-8    276MB/s ± 4%   279MB/s ± 3%      ~     (p=0.288 n=10+10)
_UFlat9-8    249MB/s ± 1%   243MB/s ± 3%    -2.51%  (p=0.000 n=10+10)
_UFlat10-8   637MB/s ± 2%  1162MB/s ± 2%   +82.29%  (p=0.000 n=10+10)
_UFlat11-8   352MB/s ± 2%   361MB/s ± 1%    +2.62%   (p=0.000 n=10+8)
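
For readers who have not seen the diff, the idea can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper name `copyBack` and its signature are hypothetical, and it assumes the caller has already checked that `0 < offset <= d` and `d+length <= len(dst)`. When the back-reference distance is at least the copy length, the source and destination ranges cannot overlap, so the built-in `copy` is safe. Otherwise the copy has to run forwards one byte at a time so that freshly written bytes are re-read, which repeats the pattern.

```go
package main

import "fmt"

// copyBack is a hypothetical helper (not taken from the PR). It copies
// length bytes inside dst from position d-offset to position d and
// returns the new write position. The caller is assumed to have checked
// that 0 < offset <= d and d+length <= len(dst).
func copyBack(dst []byte, d, offset, length int) int {
	if offset >= length {
		// Source [d-offset, d-offset+length) ends at or before d, so it
		// does not overlap the destination: use the built-in copy.
		copy(dst[d:d+length], dst[d-offset:])
		return d + length
	}
	// Overlapping ranges: copy forwards byte by byte so that bytes
	// written earlier in this loop are read again, repeating the pattern.
	for end := d + length; d < end; d++ {
		dst[d] = dst[d-offset]
	}
	return d
}

func main() {
	buf := make([]byte, 16)
	d := copy(buf, "abcd")
	d = copyBack(buf, d, 4, 4) // non-overlapping: appends "abcd"
	d = copyBack(buf, d, 2, 6) // overlapping: appends "cd" three times
	fmt.Printf("%s\n", buf[:d]) // abcdabcdcdcdcd
}
```

How much this helps depends on the input: data with many long, distant back-references takes the fast path often, while data dominated by short repeating runs mostly stays on the byte-by-byte path, which would be consistent with the polarized numbers above.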


@nigeltao nigeltao left a comment


If you're giving benchmark numbers in the commit description, can you also say which GOARCH you measured on, not just "non-amd64"? For example, arm, arm64, or 386?
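
One way to capture that information is sketched below, assuming nothing about how the numbers in this PR were actually gathered: `benchEnv` is a hypothetical helper that reports the platform via the standard `runtime` package. The output of `go test -bench` also starts with `goos:` and `goarch:` lines, so pasting that header along with the results would likely serve the same purpose.

```go
package main

import (
	"fmt"
	"runtime"
)

// benchEnv is a hypothetical helper that describes the platform the
// binary was built for, suitable for pasting next to benchmark numbers.
func benchEnv() string {
	return fmt.Sprintf("goos: %s, goarch: %s, compiler: %s",
		runtime.GOOS, runtime.GOARCH, runtime.Compiler)
}

func main() {
	fmt.Println(benchEnv())
}
```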

Review comment on decode_other.go (outdated, resolved)
@klauspost

This is measured on AMD64 but with -tags=noasm. I will add it to the description.

@klauspost klauspost force-pushed the use-copy-for-non-overlapping branch 2 times, most recently from 4612910 to efb0d86 on September 1, 2019 17:52
Use the built-in copy function when the source doesn't overlap the destination.

Co-Authored-By: Nigel Tao <nigeltao@golang.org>
@klauspost

@nigeltao updated

@nigeltao nigeltao merged commit c9879f9 into golang:master Sep 4, 2019