Skip to content

Commit

Permalink
searcher: use naive line counting on big-endian
Browse files Browse the repository at this point in the history
This patches out bytecount's "fast" vectorized algorithm on big-endian
machines, where it has been observed to fail. Going forward, bytecount
should probably fix this on their end, but for now, we take a small
performance hit on big-endian machines.

Fixes #1144
  • Loading branch information
BurntSushi committed Feb 9, 2019
1 parent f99b991 commit a4868b8
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions grep-searcher/src/lines.rs
Original file line number Diff line number Diff line change
Expand Up @@ -109,10 +109,17 @@ impl LineStep {
}

/// Count the number of occurrences of `line_term` in `bytes`.
#[cfg(target_endian = "little")]
pub fn count(bytes: &[u8], line_term: u8) -> u64 {
bytecount::count(bytes, line_term) as u64
}

/// Count the number of occurrences of `line_term` in `bytes`.
#[cfg(target_endian = "big")]
pub fn count(bytes: &[u8], line_term: u8) -> u64 {
bytecount::naive_count(bytes, line_term) as u64
}

/// Given a line that possibly ends with a terminator, return that line without
/// the terminator.
#[inline(always)]
Expand Down

6 comments on commit a4868b8

@sylvestre
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running ripgrep 0.10.0 testsuite using grep-searcher 0.1.2 (to address this bug - it required to update encoding-rs-io too) is causing the two test failures:


failures:

---- feature::f416_crlf_only_matching stdout ----
thread 'feature::f416_crlf_only_matching' panicked at '
printed outputs differ!

expected:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sherlock


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

got:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sherlock



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
', tests/feature.rs:455:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: std::panicking::begin_panic_fmt
   7: integration::feature::f416_crlf_only_matching::{{closure}}
             at tests/feature.rs:455
   8: integration::feature::f416_crlf_only_matching
             at tests/macros.rs:7
   9: integration::feature::f416_crlf_only_matching::{{closure}}
             at tests/macros.rs:5
  10: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.32.0/src/libcore/ops/function.rs:238
  11: <F as alloc::boxed::FnBox<A>>::call_box
  12: __rust_maybe_catch_panic

---- json::crlf stdout ----
thread 'json::crlf' panicked at 'assertion failed: `(left == right)`
  left: `SubMatch { m: Text { text: "Sherlock\r" }, start: 56, end: 65 }`,
 right: `SubMatch { m: Text { text: "Sherlock" }, start: 56, end: 64 }`', tests/json.rs:298:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: std::panicking::begin_panic_fmt
   7: integration::json::crlf::{{closure}}
             at tests/json.rs:298
   8: integration::json::crlf
             at tests/macros.rs:7
   9: integration::json::crlf::{{closure}}
             at tests/macros.rs:5
  10: core::ops::function::FnOnce::call_once
             at /usr/src/rustc-1.32.0/src/libcore/ops/function.rs:238
  11: <F as alloc::boxed::FnBox<A>>::call_box
  12: __rust_maybe_catch_panic


failures:
    feature::f416_crlf_only_matching
    json::crlf

test result: FAILED. 185 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out


Rings a bell? Or I report an issue or should I cherry-pick a fix from ripgrep?

@BurntSushi
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Yeah, you'll probably want to cherry pick. The grep-searcher crate has a crlf related bug fix that I think required updating ripgrep's integration tests, and also appears to require an update in grep-regex as well. I'm beyond the point of being able to do a patch release of ripgrep though, so you might need to manually apply the big endian patch? Alternatively, it looks like a fix in bytecount is coming soon.

This was the commit that fixed the crlf related bugs: 9d70311

@sylvestre
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, maybe it wil be easier to wait for a new version of ripgrep & bytecount

@BurntSushi
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytecount 0.5.1 is out: https://crates.io/crates/bytecount

I confirmed that it fixes the issue, and master has been updated to use it. I think if you just bump bytecount on your end, then you should be good.

@sylvestre
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BurntSushi do you have an ETA for the new ripgrep various crates?
9d70311 will be hard to backport as it is touching the different crates
thanks

@BurntSushi
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do a release of the various crates, but I don't have an ETA on the next proper release of ripgrep though. Those are a lot of work.

Also, I don't think you need to backport 9d70311 unless you're specifically looking to patch the CRLF fix, which is really mostly only relevant on Windows. At this point, you should just be able to bump bytecount and the big endian bug should be fixed. What am I missing?

Please sign in to comment.