Skip to content

Snappy compression implemented in Rust (including the Snappy frame format).

License

Notifications You must be signed in to change notification settings

BurntSushi/rust-snappy

Repository files navigation

snap

A pure Rust implementation of the Snappy compression algorithm. Includes streaming compression and decompression using the Snappy frame format. This implementation is ported from both the reference C++ implementation and the Go implementation.

Build status

Licensed under the BSD 3-Clause.

Documentation

https://docs.rs/snap

Usage

Add this to your Cargo.toml:

[dependencies]
snap = "0.2"

Example: compress data on stdin

This program reads data from stdin, compresses it and emits it to stdout. This example can be found in examples/compress.rs:

use std::io;

fn main() {
    let stdin = io::stdin();
    let stdout = io::stdout();

    let mut rdr = stdin.lock();
    // Wrap the stdout writer in a Snappy writer.
    let mut wtr = snap::Writer::new(stdout.lock());
    io::copy(&mut rdr, &mut wtr).expect("I/O operation failed");
}

Example: decompress data on stdin

This program reads data from stdin, decompresses it and emits it to stdout. This example can be found in examples/decompress.rs:

use std::io;

fn main() {
    let stdin = io::stdin();
    let stdout = io::stdout();

    // Wrap the stdin reader in a Snappy reader.
    let mut rdr = snap::read::FrameDecoder::new(stdin.lock());
    let mut wtr = stdout.lock();
    io::copy(&mut rdr, &mut wtr).expect("I/O operation failed");
}

Example: the szip tool

szip is a tool with similar behavior as gzip, except it uses Snappy compression. It can be installed with Cargo:

$ cargo install szip

To compress a file, run szip file. To decompress a file, run szip -d file.sz. See szip --help for more details.

Testing

This crate is tested against the reference C++ implementation of Snappy. Currently, compression is byte-for-byte equivalent with the C++ implementation. This seems like a reasonable starting point, although it is not necessarily a goal to always maintain byte-for-byte equivalence.

Tests against the reference C++ implementation can be run with cargo test --features cpp. Note that you will need to have the C++ Snappy library in your LD_LIBRARY_PATH (or equivalent).

To run tests, you'll need to explicitly run the test crate:

$ cargo test --manifest-path test/Cargo.toml

To test that this library matches the output of the reference C++ library, use:

$ cargo test --manifest-path test/Cargo.toml --features cpp

Tests are in a separate crate because of the dependency on the C++ reference library. Namely, Cargo does not yet permit optional dev dependencies.

Performance

The performance of this implementation should roughly match the performance of the C++ implementation on x86_64. Below are the results of the microbenchmarks (as defined in the C++ library):

name             cpp ns/iter           rust ns/iter         diff ns/iter   diff %
uflat00_html     45,097 (2270 MB/s)    44,372 (2307 MB/s)           -725   -1.61%
uflat01_urls     496,988 (1412 MB/s)   475,693 (1475 MB/s)       -21,295   -4.28%
uflat02_jpg      4,800 (25644 MB/s)    4,935 (24942 MB/s)            135    2.81%
uflat03_jpg_200  144 (1388 MB/s)       127 (1574 MB/s)               -17  -11.81%
uflat04_pdf      6,699 (15285 MB/s)    6,586 (15548 MB/s)           -113   -1.69%
uflat05_html4    187,082 (2189 MB/s)   184,941 (2214 MB/s)        -2,141   -1.14%
uflat06_txt1     152,245 (998 MB/s)    152,185 (999 MB/s)            -60   -0.04%
uflat07_txt2     134,235 (932 MB/s)    135,057 (926 MB/s)            822    0.61%
uflat08_txt3     407,234 (1047 MB/s)   418,990 (1018 MB/s)        11,756    2.89%
uflat09_txt4     563,671 (854 MB/s)    580,281 (830 MB/s)         16,610    2.95%
uflat10_pb       42,207 (2809 MB/s)    41,624 (2849 MB/s)           -583   -1.38%
uflat11_gaviota  159,276 (1157 MB/s)   153,006 (1204 MB/s)        -6,270   -3.94%
zflat00_html     108,043 (947 MB/s)    104,306 (981 MB/s)         -3,737   -3.46%
zflat01_urls     1,416,005 (495 MB/s)  1,305,846 (537 MB/s)     -110,159   -7.78%
zflat02_jpg      8,260 (14902 MB/s)    8,372 (14702 MB/s)            112    1.36%
zflat03_jpg_200  329 (607 MB/s)        247 (809 MB/s)                -82  -24.92%
zflat04_pdf      12,279 (8339 MB/s)    11,351 (9021 MB/s)           -928   -7.56%
zflat05_html4    465,677 (879 MB/s)    448,619 (913 MB/s)        -17,058   -3.66%
zflat06_txt1     461,344 (329 MB/s)    442,385 (343 MB/s)        -18,959   -4.11%
zflat07_txt2     409,416 (305 MB/s)    393,293 (318 MB/s)        -16,123   -3.94%
zflat08_txt3     1,194,880 (357 MB/s)  1,178,756 (362 MB/s)      -16,124   -1.35%
zflat09_txt4     1,638,914 (294 MB/s)  1,614,618 (298 MB/s)      -24,296   -1.48%
zflat10_pb       100,514 (1179 MB/s)   97,523 (1216 MB/s)         -2,991   -2.98%
zflat11_gaviota  358,002 (514 MB/s)    326,086 (565 MB/s)        -31,916   -8.92%

Notes: These benchmarks were run with Snappy/C++ on commit 32d6d7 with debug assertions disabled. Both the C++ and Rust benchmarks were run with the same benchmark harness. Benchmarks were run on an Intel i7-6900K.

For reference, here are the benchmarks run on the same machine from the Go implementation of Snappy (which has a hand rolled implementation in Assembly). Note that these were run using Go's microbenchmark tool.

Benchmark_UFlat0           20000             50325 ns/op        2034.75 MB/s
Benchmark_UFlat1            3000            518867 ns/op        1353.11 MB/s
Benchmark_UFlat2          300000              5934 ns/op        20741.56 MB/s
Benchmark_UFlat3        20000000               113 ns/op        1766.48 MB/s
Benchmark_UFlat4          200000              7124 ns/op        14372.85 MB/s
Benchmark_UFlat5           10000            218680 ns/op        1873.05 MB/s
Benchmark_UFlat6           10000            193376 ns/op         786.49 MB/s
Benchmark_UFlat7           10000            165456 ns/op         756.57 MB/s
Benchmark_UFlat8            3000            505216 ns/op         844.69 MB/s
Benchmark_UFlat9            2000            678399 ns/op         710.29 MB/s
Benchmark_UFlat10          30000             42303 ns/op        2803.29 MB/s
Benchmark_UFlat11          10000            186899 ns/op         986.20 MB/s
Benchmark_ZFlat0           10000            102311 ns/op        1000.86 MB/s
Benchmark_ZFlat1            1000           1336789 ns/op         525.20 MB/s
Benchmark_ZFlat2          200000              8480 ns/op        14515.18 MB/s
Benchmark_ZFlat3         5000000               267 ns/op         746.44 MB/s
Benchmark_ZFlat4          200000             11749 ns/op        8715.03 MB/s
Benchmark_ZFlat5            3000            436820 ns/op         937.68 MB/s
Benchmark_ZFlat6            3000            422042 ns/op         360.36 MB/s
Benchmark_ZFlat7            5000            376019 ns/op         332.91 MB/s
Benchmark_ZFlat8            2000           1133338 ns/op         376.55 MB/s
Benchmark_ZFlat9            1000           1559530 ns/op         308.98 MB/s
Benchmark_ZFlat10          20000             91263 ns/op        1299.41 MB/s
Benchmark_ZFlat11           5000            323804 ns/op         569.23 MB/s

Comparison with other Snappy crates

  • snappy - These are bindings to the C++ library. No support for the Snappy frame format.
  • snappy_framed - Implements the Snappy frame format on top of the snappy crate.
  • rsnappy - Written in pure Rust, but lacks documentation and the Snappy frame format. Performance is unclear and tests appear incomplete.
  • snzip - Was created and immediately yanked from crates.io.

About

Snappy compression implemented in Rust (including the Snappy frame format).

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

No packages published

Languages