
Possible bad literal extraction #93

Closed
brookst opened this issue Sep 25, 2016 · 1 comment
Labels
bug A bug.

brookst commented Sep 25, 2016

I was trying to search for IP addresses and ran into an odd problem. I tried the expression `(\d{1,3}\.){3}\d{1,3}` and found no matches. When I changed the fixed repetition of the subgroup to one-or-more repetition, `(\d{1,3}\.)+\d{1,3}`, I got matches. Using the `--debug` flag, I see that the first case looks for the wrong literal:

```
DEBUG:grep::literals: required literal found: "..."
```

The second case looks for only a single `.`, as expected.

Is this a bug in the regex parsing or am I doing something wrong?
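A minimal sketch of why the bogus literal produces no matches, assuming (as the debug line suggests) that the required literal is used as a prefilter that skips any line not containing it. The functions `is_dotted_quad` and `prefilter_passes` are hypothetical stand-ins, not ripgrep code; `is_dotted_quad` hand-checks the issue's pattern against a whole string using ASCII digits only, so no regex crate is needed.

```rust
// Hypothetical hand-written check for (\d{1,3}\.){3}\d{1,3}.
// Simplifications (assumptions): the pattern must cover the whole string,
// and only ASCII digits count as \d.
fn is_dotted_quad(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() == 4
        && parts
            .iter()
            .all(|p| (1..=3).contains(&p.len()) && p.bytes().all(|b| b.is_ascii_digit()))
}

// Hypothetical required-literal prefilter: a line is only handed to the
// regex if it contains `lit` somewhere.
fn prefilter_passes(line: &str, lit: &str) -> bool {
    line.contains(lit)
}

fn main() {
    let line = "192.168.0.1";
    println!("pattern matches: {}", is_dotted_quad(line)); // true
    // The literal from the debug log rejects a line the pattern matches.
    println!("prefilter \"...\": {}", prefilter_passes(line, "...")); // false
    // A single "." is a sound required literal for this pattern.
    println!("prefilter \".\": {}", prefilter_passes(line, ".")); // true
}
```

An IP address on its own has its dots separated by digits, so it never contains the adjacent run `...`, and lines holding only IPs are discarded before the regex ever runs.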

BurntSushi added the bug label Sep 26, 2016
@BurntSushi
Owner

It's a pretty sweet bug. This is indeed a bug in ripgrep's inner literal extraction. (Overaggressive literal extraction is one of the largest sources of bugs in the regex engine, too.)

amsharma91 added a commit to amsharma91/ripgrep that referenced this issue Sep 27, 2016
If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`,
which is wrong. This does prevent us from extracting `foofoofoo` from
`(foo){3}`, which is unfortunate, but we miss plenty of other stuff too.
Literal extraction needs a good rethink (all the way down into the regex
engine).

Fixes BurntSushi#93
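The bug and the fix described in the commit message can be sketched on a toy syntax tree. This is a hypothetical model (`Node`, `Info`, `extract`, and `ipv4` are illustrative names, not the actual types in ripgrep or the regex engine), and it collapses `\d{1,3}` and `\w` into a single non-literal `Class` node. With `naive_repeat` set, a repetition `{n}` concatenates n copies of its body's inner literal, which is only sound when the copies are adjacent; without it, a repetition keeps a single copy, as the commit describes.

```rust
// Toy regex syntax tree (hypothetical; just enough for the repetition case).
enum Node {
    Lit(char),               // a literal character, e.g. the `.` in `\.`
    Class,                   // a class like `\d` or `\w`: never a literal
    Concat(Vec<Node>),
    Repeat(Box<Node>, usize), // exact repetition `{n}`
}

// What every match of a node is known to contain.
struct Info {
    exact: Option<String>, // Some(s) if the node only ever matches `s`
    inner: String,         // longest literal known to occur in every match
}

fn longer(a: String, b: String) -> String {
    if b.len() > a.len() { b } else { a }
}

fn extract(node: &Node, naive_repeat: bool) -> Info {
    match node {
        Node::Lit(c) => Info { exact: Some(c.to_string()), inner: c.to_string() },
        Node::Class => Info { exact: None, inner: String::new() },
        Node::Concat(children) => {
            let mut best = String::new();
            let mut run = String::new(); // literal built from adjacent exact children
            let mut all_exact = true;
            for child in children {
                let info = extract(child, naive_repeat);
                match info.exact {
                    Some(s) => run.push_str(&s),
                    None => {
                        // A non-literal child breaks adjacency: close the run.
                        all_exact = false;
                        best = longer(best, std::mem::take(&mut run));
                        best = longer(best, info.inner);
                    }
                }
            }
            let exact = if all_exact { Some(run.clone()) } else { None };
            Info { exact, inner: longer(best, run) }
        }
        Node::Repeat(body, n) => {
            let info = extract(body, naive_repeat);
            if naive_repeat {
                // Buggy rule: assumes the n inner literals sit next to each other.
                Info {
                    exact: info.exact.map(|s| s.repeat(*n)),
                    inner: info.inner.repeat(*n),
                }
            } else {
                // Fixed rule: keep one copy, even when the body is a pure literal.
                Info { exact: None, inner: info.inner }
            }
        }
    }
}

// (\d{1,3}\.){3}\d{1,3}, with each \d{1,3} collapsed to one Class node.
fn ipv4() -> Node {
    Node::Concat(vec![
        Node::Repeat(Box::new(Node::Concat(vec![Node::Class, Node::Lit('.')])), 3),
        Node::Class,
    ])
}

fn main() {
    let ip = ipv4();
    println!("naive: {:?}", extract(&ip, true).inner); // naive: "..."
    println!("fixed: {:?}", extract(&ip, false).inner); // fixed: "."

    // (\wfoo){3}
    let word_foo = Node::Repeat(
        Box::new(Node::Concat(vec![Node::Class, Node::Lit('f'), Node::Lit('o'), Node::Lit('o')])),
        3,
    );
    println!("naive: {:?}", extract(&word_foo, true).inner); // naive: "foofoofoo"
    println!("fixed: {:?}", extract(&word_foo, false).inner); // fixed: "foo"
}
```

The naive rule turns `(\d{1,3}\.){3}\d{1,3}` into `...` and `(\wfoo){3}` into `foofoofoo`, neither of which occurs in matching strings such as `192.168.0.1` or `afoobfoocfoo`. The fixed rule is conservative: it also reduces `(foo){3}` to `foo`, which is the loss the commit message accepts.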