
Comparing changes

base repository: spiraldb/fsst
base: v0.2.3
head repository: spiraldb/fsst
compare: v0.3.0
  • 4 commits
  • 17 files changed
  • 2 contributors

Commits on Aug 28, 2024

  1. centering (#26)

    a10y committed Aug 28, 2024
    38017d0

Commits on Sep 3, 2024

  1. feat: port in more from the C++ code (#24)

    This PR ports in some more functionality based on the MIT-licensed C++
    code from CWI.
    
    In particular, it implements the following:
    
    * The `makeSample` function from C++ to build a sample of ~16KB from the
    input data
    * The `suffix limit` optimization and the corresponding `finalize`
    method needed when building the symbol table, including changes to our
    `compress_word` function so that it corresponds more directly to
    `compressVariant` in the C++ code
    * The `byteCodes` table from C++, implemented here as `codes_one_byte`.
    Note that before this PR, one-byte codes were not found unless the
    byte occurred at the end of the plaintext string.
    * Separates the `Compressor` build state into a new `CompressorBuilder`
    struct, which has all methods that take `&mut self`. This also means
    that we can in theory construct a `Compressor` now from a symbol table,
    though that logic is not implemented.
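To illustrate what building "a sample of ~16KB" involves, here is a minimal, dependency-free sketch. It assumes a 16 KiB target, samples random 512-byte chunks from random input strings, and uses a tiny xorshift PRNG; the names `make_sample`, `SAMPLE_TARGET`, `SAMPLE_LINE` and `XorShift` are hypothetical, and the actual `makeSample` in the CWI code (and its fsst-rs port) may differ in RNG and chunking details.

```rust
/// Assumed sample budget: ~16KB, taken here as exactly 16 KiB.
const SAMPLE_TARGET: usize = 1 << 14;
/// Assumed maximum chunk length taken from one string per draw.
const SAMPLE_LINE: usize = 512;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical sampler: small inputs are used whole; larger inputs are
/// sampled as random chunks until exactly SAMPLE_TARGET bytes are collected.
fn make_sample<'a>(inputs: &[&'a [u8]]) -> Vec<&'a [u8]> {
    let total: usize = inputs.iter().map(|s| s.len()).sum();
    if total <= SAMPLE_TARGET {
        // Everything fits in the budget, so no sampling is needed.
        return inputs.to_vec();
    }
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15); // fixed non-zero seed
    let mut sample = Vec::new();
    let mut size = 0;
    while size < SAMPLE_TARGET {
        // Pick a random input string; skip empty ones.
        let s = inputs[(rng.next() % inputs.len() as u64) as usize];
        if s.is_empty() {
            continue;
        }
        // Pick a random chunk-aligned offset inside that string.
        let chunks = (s.len() + SAMPLE_LINE - 1) / SAMPLE_LINE;
        let start = (rng.next() % chunks as u64) as usize * SAMPLE_LINE;
        // Clamp the chunk to the string end and to the remaining budget.
        let end = (start + SAMPLE_LINE)
            .min(s.len())
            .min(start + (SAMPLE_TARGET - size));
        let piece = &s[start..end];
        size += piece.len();
        sample.push(piece);
    }
    sample
}
```

Sampling keeps symbol-table construction cost roughly constant regardless of input size, at the price of the table only reflecting what the sample happened to contain.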
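The point about one-byte codes can be shown with a simplified greedy compressor: a 256-entry table indexed by the byte value, consulted at every position after longer symbols fail to match, so a single-byte symbol is found anywhere in the string, not only at its end. This is a sketch under stated assumptions: multi-byte symbols live in a plain `HashMap` instead of FSST's hash table, code 255 is reserved as the escape code (as in the FSST paper), and `SymbolTable`, `multi` and `compress` are hypothetical names; only `codes_one_byte` comes from the PR.

```rust
use std::collections::HashMap;

/// Escape code: the next output byte is a literal (FSST reserves 255).
const ESCAPE: u8 = 255;
/// FSST symbols are at most 8 bytes long.
const MAX_SYMBOL_LEN: usize = 8;

struct SymbolTable {
    /// Hypothetical store for symbols of length 2..=8, mapped to their code.
    multi: HashMap<Vec<u8>, u8>,
    /// codes_one_byte[b] is Some(code) if the single byte `b` is a symbol.
    codes_one_byte: [Option<u8>; 256],
}

impl SymbolTable {
    /// Assign codes 0, 1, 2, ... to `symbols` in order.
    fn new(symbols: &[&[u8]]) -> Self {
        assert!(symbols.len() < 255, "code 255 is reserved for escapes");
        let mut multi = HashMap::new();
        let mut codes_one_byte = [None; 256];
        for (code, sym) in symbols.iter().enumerate() {
            assert!(!sym.is_empty() && sym.len() <= MAX_SYMBOL_LEN);
            if sym.len() == 1 {
                codes_one_byte[sym[0] as usize] = Some(code as u8);
            } else {
                multi.insert(sym.to_vec(), code as u8);
            }
        }
        SymbolTable { multi, codes_one_byte }
    }

    /// Greedy longest-match compression with a one-byte fallback at every position.
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        'outer: while pos < input.len() {
            // Try the longest multi-byte symbol first.
            let max = MAX_SYMBOL_LEN.min(input.len() - pos);
            for len in (2..=max).rev() {
                if let Some(&code) = self.multi.get(&input[pos..pos + len]) {
                    out.push(code);
                    pos += len;
                    continue 'outer;
                }
            }
            // Fall back to the one-byte table, then to an escaped literal.
            let b = input[pos];
            match self.codes_one_byte[b as usize] {
                Some(code) => out.push(code),
                None => {
                    out.push(ESCAPE);
                    out.push(b);
                }
            }
            pos += 1;
        }
        out
    }
}
```

With symbols `"ab"`, `"a"` and `"c"` (codes 0, 1, 2), compressing `"abcax"` emits `0, 2, 1, 255, b'x'`: the `'a'` in the middle of the string hits `codes_one_byte`, and only the unknown `'x'` is escaped.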
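The `CompressorBuilder` / `Compressor` split follows the usual Rust builder pattern: mutation happens on the builder, and the finished compressor exposes only `&self` methods. The sketch below is a hypothetical shape of that split; apart from the two type names, every field and method (`insert`, `build`, `from_symbols`, `symbol`) is an assumption for illustration, and `from_symbols` stands in for the "construct a `Compressor` from a symbol table" path that the PR notes is not implemented.

```rust
/// FSST codes are one byte with 255 reserved for escapes, so at most 255 symbols.
const MAX_SYMBOLS: usize = 255;

/// Mutable build state: every method that needs `&mut self` lives here.
#[derive(Default)]
struct CompressorBuilder {
    symbols: Vec<Vec<u8>>,
}

impl CompressorBuilder {
    /// Hypothetical: add a symbol; returns false if it is empty, longer than
    /// 8 bytes, or the table is already full.
    fn insert(&mut self, symbol: &[u8]) -> bool {
        if symbol.is_empty() || symbol.len() > 8 || self.symbols.len() >= MAX_SYMBOLS {
            return false;
        }
        self.symbols.push(symbol.to_vec());
        true
    }

    /// Consume the builder and freeze its table into an immutable `Compressor`.
    fn build(self) -> Compressor {
        Compressor::from_symbols(self.symbols)
    }
}

/// Read-only compressor: only `&self` methods, so once built it can be
/// shared across threads without locking.
struct Compressor {
    symbols: Vec<Vec<u8>>,
}

impl Compressor {
    /// Hypothetical constructor from an existing symbol table.
    fn from_symbols(symbols: Vec<Vec<u8>>) -> Self {
        assert!(symbols.len() <= MAX_SYMBOLS);
        Compressor { symbols }
    }

    /// Look up the symbol for a code, if any.
    fn symbol(&self, code: u8) -> Option<&[u8]> {
        self.symbols.get(code as usize).map(|s| s.as_slice())
    }
}
```

Because `build` takes `self` by value, the type system guarantees no further mutation after the table is frozen.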
    
    Additional things in this PR:
    
    * Added a micro benchmark for the `compress_word` method comparing the
    relative speeds of both code paths; see
    #24 (comment)
    * Removed many of the old small-data benchmarks and added several of
    the `dbtext` compression benchmarks from the CWI paper. Here's a table
    of the compression factors:
    
    dbtext | C++ compression factor | fsst-rs compression factor
    -------|-----|-------
    l_comment | 2.73 | 2.69
    urls | 2.33 | 2.27
    wikipedia | 1.81 | 1.75
    
    I'll follow up to figure out how to close the remaining gaps, which
    are roughly 1.5-3.3% relative to the C++ compression factors.
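To make the size of the gap concrete, the relative shortfall can be computed directly from the table as (C++ factor - fsst-rs factor) / C++ factor. This sketch assumes "compression factor" means uncompressed size divided by compressed size (higher is better); `relative_gap_pct` and `DBTEXT_RESULTS` are hypothetical names, and the numbers are copied from the table.

```rust
/// (dataset, C++ compression factor, fsst-rs compression factor), from the table.
const DBTEXT_RESULTS: [(&str, f64, f64); 3] = [
    ("l_comment", 2.73, 2.69),
    ("urls", 2.33, 2.27),
    ("wikipedia", 1.81, 1.75),
];

/// How far fsst-rs falls short of C++, as a percentage of the C++ factor.
fn relative_gap_pct(cpp: f64, rs: f64) -> f64 {
    (cpp - rs) / cpp * 100.0
}

/// Print one line per dataset with its relative gap.
fn print_gaps() {
    for (name, cpp, rs) in DBTEXT_RESULTS {
        println!("{name}: fsst-rs is {:.2}% below C++", relative_gap_pct(cpp, rs));
    }
}
```

For these three datasets the gap works out to roughly 1.5%, 2.6% and 3.3% respectively.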
    a10y committed Sep 3, 2024
    c944de6
  2. chore: release v0.3.0 (#27)

    ## 🤖 New release
    * `fsst-rs`: 0.2.3 -> 0.3.0
    
    <details><summary><i><b>Changelog</b></i></summary><p>
    
    <blockquote>
    
    ## [0.3.0](v0.2.3...v0.3.0) - 2024-09-03
    
    ### Added
    - port in more from the C++ code ([#24](#24))
    
    ### Other
    - centering ([#26](#26))
    </blockquote>
    
    
    </p></details>
    
    ---
    This PR was generated with
    [release-plz](https://github.com/MarcoIeni/release-plz/).
    
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    github-actions[bot] committed Sep 3, 2024
    5ac1e10
  3. bca81cb