Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obvious match missing from results when using group: a(.*c) does not match abc #1064

Closed
phiresky opened this issue Sep 24, 2018 · 3 comments · Fixed by #1065
Closed

Obvious match missing from results when using group: a(.*c) does not match abc #1064

phiresky opened this issue Sep 24, 2018 · 3 comments · Fixed by #1065
Labels
bug A bug.

Comments

@phiresky
Copy link
Contributor

What version of ripgrep are you using?

ripgrep 0.10.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

Bug

Example:

echo abc | rg --debug 'a(.*c)'

DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "ac"
DEBUG|globset|globset/src/lib.rs:429: built glob set; 0 literals, 0 basenames, 8 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

doesn't find any results. The following all work:

echo ac | rg 'a(.*c)'

echo abc | rg 'a.*c'

echo abc | ag 'a(.*c)'

echo abc | grep 'a\(.*c\)'

@lespea
Copy link

lespea commented Sep 24, 2018

Interestingly enough this also fails: echo abc | rg --debug 'a(?:.*c)' but this works: echo abc | rg --debug 'a(.*)c'

@BurntSushi
Copy link
Owner

Ug. This looks like a bug in ripgrep's inner literal extractor. It thinks ac is a literal that must appear in every match, but this is obviously wrong. The regex engine prefix/suffix detection is correct:

[andrew@Cheetah ~]$ regex-debug prefixes 'a(.*c)'
Cut(a)
[andrew@Cheetah ~]$ regex-debug suffixes 'a(.*c)'
Cut(c)

I diagnosed this from the --debug output. Cutting out the irrelevant stuff, you can see the problem here:

[andrew@Cheetah ~]$ echo abc | rg 'a(.*c)' --debug
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "ac"
[andrew@Cheetah ~]$ echo abc | rg 'a.*c' --debug
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:110: required literal found: "a"

I would have thought this was a regression since I touched this code as part of libripgrep work, but 0.9.0 exhibits the same bug.

@BurntSushi BurntSushi added the bug A bug. label Sep 24, 2018
@phiresky
Copy link
Contributor Author

checking out ripgrep 0.0.1 (with regex = "0.1.75") has the same bug

BurntSushi added a commit that referenced this issue Sep 25, 2018
It seems the inner literal detector fails spectacularly in cases of
concatenations that involve groups. The issue here is that if the prefix
of a group inside a concatenation can match the empty string, then any
literals generated to that point in the concatenation need to be cut
such that they are never extended. The detector isn't really built to
handle this case, so we just act conservative cut literals whenever we
see a sub-group. This may make some regexes slower, but the inner
literal detector already misses plenty of cases.

Literal detection (including in the regex engine) is a key component
that needs to be completely rethought at some point.

Fixes #1064
BurntSushi added a commit that referenced this issue Sep 25, 2018
It seems the inner literal detector fails spectacularly in cases of
concatenations that involve groups. The issue here is that if the prefix
of a group inside a concatenation can match the empty string, then any
literals generated to that point in the concatenation need to be cut
such that they are never extended. The detector isn't really built to
handle this case, so we just act conservative cut literals whenever we
see a sub-group. This may make some regexes slower, but the inner
literal detector already misses plenty of cases.

Literal detection (including in the regex engine) is a key component
that needs to be completely rethought at some point.

Fixes #1064
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants