Documentation for --ignore #1084

Open
ddickstein opened this issue May 17, 2017 · 6 comments

Comments

@ddickstein

Please provide documentation for how to use --ignore. It just says "PATTERN" but gives no information about what kind of pattern it's looking for, nor any example usages. A quick Google search reveals dozens of different formats for the pattern, most of which do not work. My goal was to follow symlinks but exclude a number of directories, and after trying at least 10 different variations, none of them seemed to work.

@canvural

It's any valid regex pattern.

@ddickstein
Author

Wasn't working for me. And it's not clear from just PATTERN if it's referring to a regex or a shell pattern, which are different.

@okdana
Contributor

okdana commented Jun 13, 2017

I have long had trouble with ignore patterns myself — they never seem to behave quite the way i expect. So i looked into it, and i've found that the behaviour is actually quite complex. I'm not an expert in C but i think this is mostly correct:

Notes

  • Ignore patterns are never applied to file names provided directly on the command line — e.g., if you do ag x foo.c (where foo.c is a regular file), foo.c will never be ignored, regardless of any other options you might supply.

  • Ignore patterns are never applied if you use -u without -p. I feel like this is a bug. You can work around it like this: ag -up /dev/null --ignore y x foo/

  • Contrary to what canvural said, under no circumstances are ignore patterns ever treated as regex(7)- or PCRE-style regular expressions.
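To illustrate the last point, here is a minimal sketch of the difference between glob-style and regex-style matching. It uses Python's `fnmatch` module as a stand-in for the C `fnmatch()` call (an assumption: the two are similar but not identical), and the file names are made up for illustration.

```python
import fnmatch
import re

# Hypothetical file names, for illustration only.
names = ["libfoo.so", "libfoo.so.1", "notes.txt"]

# What someone might write after being told "any valid regex pattern".
regex_style = r".*\.so(\.\d+)*$"
# The glob-style way of saying roughly the same thing.
glob_style = "*.so*"

# Interpreted as a regular expression, the first pattern works as intended.
regex_hits = [n for n in names if re.match(regex_style, n)]

# Interpreted as a glob, the same text matches nothing: '.', '(' and '$'
# are literal characters there, so the name would have to contain them.
glob_hits_for_regex_text = [n for n in names if fnmatch.fnmatchcase(n, regex_style)]

# The glob-style pattern, interpreted as a glob.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, glob_style)]

print(regex_hits)
print(glob_hits_for_regex_text)
print(glob_hits)
```

If the description above is right, this would explain why regex-looking ignore patterns silently match nothing.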

fnmatch patterns

If the ignore pattern contains one of a handful of meta-characters (including !, *, and ?), it's treated as an fnmatch pattern (i'm calling them that because they pass the is_fnmatch() check). There are several different variations on this type of pattern, which are handled as follows:

File-extension patterns

If the pattern begins with *. and the following characters contain a . and don't contain any other meta-characters, it's treated as a file-extension pattern. I think there are two issues here:

  • The requirement to contain a second . doesn't make sense to me. Why should *.min.js be treated as a file extension, but *.js not? Is the logic inverted here?

  • Even then, the extension-matching behaviour is surprising — it treats everything after the first dot in a file name as the extension. In other words, in the file name foo.bar.min.js, the file extension is bar.min.js. This wouldn't be an issue if the ignored extension was matched against the right side of the file extension, but it's actually just a strcmp(). So if i tell ag to ignore *.min.js, it will not in fact ignore foo.bar.min.js, since bar.min.js doesn't equal min.js.
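The two points above can be sketched as a small model. This is not ag's code, only a Python rendering of the behaviour as described. The function names are hypothetical, and the meta-character set is limited to the three characters named in this comment (the full set is not listed here).

```python
# Only the meta-characters named in the comment; the real set may be larger.
META = set("!*?")


def is_extension_pattern(pattern: str) -> bool:
    """Starts with '*.', the rest contains a '.', and no other meta-characters."""
    if not pattern.startswith("*."):
        return False
    rest = pattern[2:]
    return "." in rest and not any(c in META for c in rest)


def file_extension(name: str):
    """Everything after the FIRST dot in the name, or None if there is no dot."""
    _, dot, ext = name.partition(".")
    return ext if dot else None


def ignored_by_extension(name: str, pattern: str) -> bool:
    """Exact string comparison, like strcmp(): no suffix matching."""
    return is_extension_pattern(pattern) and file_extension(name) == pattern[2:]


print(is_extension_pattern("*.min.js"))                    # has a second dot
print(is_extension_pattern("*.js"))                        # no second dot
print(file_extension("foo.bar.min.js"))
print(ignored_by_extension("foo.bar.min.js", "*.min.js"))
print(ignored_by_extension("foo.min.js", "*.min.js"))
```

Under this model, `*.min.js` ignores `foo.min.js` but not `foo.bar.min.js`, which is the surprising case described above.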

It's worth noting also that file-extension patterns are matched against directories. I think this is deliberate, but it does seem a little surprising for a special-case file-extension-matching feature.

Anyway, i assume that these patterns are special-cased for performance reasons — ignoring file extensions is a very common use case, and it's faster to do a strcmp() against a set of fixed strings than to perform actual pattern-matching on each file. IMO, though, the fact that it's a very common use case means that this particular ignore functionality, above all others, should Just Work.

Slash-regex patterns

If the pattern begins with a /, it's treated as a slash-regex pattern.

Slash-regex patterns are anchored to the beginning of the search path supplied on the command line. So, for example, if you run ag --ignore '/foo*' x foobar/baz/ baz/foobar/, ag will ignore every file under foobar/baz, but not any files under baz/foobar.

After some minor normalisation (stripping the leading slash, &c.), the slash-regex pattern is passed directly to fnmatch() to be compared against the complete file path (relative to and including the search path).
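As a rough sketch of the anchoring, the following Python model strips the leading slash and matches the remainder against the whole path. The function name and the file `x.c` are hypothetical. One loud assumption: Python's `fnmatch` always lets `*` match across `/`, whereas the C `fnmatch()` behaviour depends on flags that this comment does not state, so treat this as an approximation of the outcome described above rather than of the implementation.

```python
import fnmatch


def slash_pattern_ignores(pattern: str, path: str) -> bool:
    """Model of a slash-regex pattern: anchored at the start of the search path."""
    if not pattern.startswith("/"):
        raise ValueError("slash-regex patterns begin with '/'")
    glob = pattern[1:]  # the "minor normalisation": strip the leading slash
    # Matched against the complete path, including the search path itself.
    return fnmatch.fnmatchcase(path, glob)


# Mirrors: ag --ignore '/foo*' x foobar/baz/ baz/foobar/
print(slash_pattern_ignores("/foo*", "foobar/baz/x.c"))
print(slash_pattern_ignores("/foo*", "baz/foobar/x.c"))
```

The first path is ignored because it starts with `foo`. The second is not, because the match is anchored to the beginning of the path.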

Invert-regex patterns

If the pattern begins with a !, it's treated as an invert-regex pattern.

Invert-regex patterns are used to 'white-list' files that would otherwise match a standard regex pattern (described below). They can NOT be combined with the slash-regex behaviour, and ignores from slash-regex patterns take precedence over 'un-ignores' from invert-regex patterns. For example:

# Slash-regex pattern '/foo*' wins, will ignore all files
% ag --ignore '/foo*' --ignore '!foo*' x foobar/baz

# Invert-regex pattern '!foo*' wins, no files will be ignored
% ag --ignore 'foo*' --ignore '!foo*' x foobar/baz

Invert-regex patterns are matched against each individual segment of the path. As a result, any invert-regex pattern containing a / is effectively dropped. In other words, you can't use !a*/d* to match a path like /foo/bar/abc/def.

And again, despite the name, these are not actually regex patterns, they're passed to fnmatch().
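The precedence rules can be summarised in a small model. This is only a sketch of the behaviour as described in this comment, not a transcription of ag's code. `is_ignored` and the file `x.c` are hypothetical, Python's `fnmatch` stands in for the C `fnmatch()`, and the traversal is flattened into a single check per path.

```python
import fnmatch


def is_ignored(path: str, patterns: list) -> bool:
    """Model: slash-anchored ignores win, then '!' un-ignores, then plain ignores."""
    slash = [p[1:] for p in patterns if p.startswith("/")]
    invert = [p[1:] for p in patterns if p.startswith("!")]
    plain = [p for p in patterns if not p.startswith(("/", "!"))]
    segments = [s for s in path.split("/") if s]

    # 1. Slash patterns are matched against the whole path and take precedence.
    if any(fnmatch.fnmatchcase(path, g) for g in slash):
        return True
    # 2. Invert patterns are matched per segment. A segment never contains '/',
    #    so an invert pattern containing '/' can never match here.
    if any(fnmatch.fnmatchcase(s, g) for s in segments for g in invert):
        return False
    # 3. Plain patterns are also matched per segment.
    return any(fnmatch.fnmatchcase(s, g) for s in segments for g in plain)


print(is_ignored("foobar/baz/x.c", ["/foo*", "!foo*"]))     # slash pattern wins
print(is_ignored("foobar/baz/x.c", ["foo*", "!foo*"]))      # invert pattern wins
print(is_ignored("foo/bar/abc/def", ["a*", "!a*/d*"]))      # invert with '/' is inert
```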

Regex patterns

Any other non-literal pattern is a standard regex pattern.

Like invert-regex patterns, these are matched against each individual segment of the path, and again they are not actually regex patterns, they're passed to fnmatch().
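Putting the four variations together, a classifier along these lines reproduces the taxonomy described so far. It is a sketch built from this description only. `classify` is a hypothetical name, and the meta-character set contains just the three characters mentioned above, which may not be the complete set.

```python
# Only the meta-characters named in the comment; the real set may be larger.
META = set("!*?")


def has_meta(text: str) -> bool:
    return any(c in META for c in text)


def classify(pattern: str) -> str:
    """Return the pattern category, following the description above."""
    if not has_meta(pattern):
        return "static"
    rest = pattern[2:]
    if pattern.startswith("*.") and "." in rest and not has_meta(rest):
        return "file-extension"
    if pattern.startswith("/"):
        return "slash-regex"
    if pattern.startswith("!"):
        return "invert-regex"
    return "regex"


for p in ["*.min.js", "*.js", "/foo*", "!foo*", "*.so*", "foo"]:
    print(p, "->", classify(p))
```

Note that `*.js` lands in the generic "regex" category rather than "file-extension", which is the oddity raised earlier.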

Static patterns

Patterns that don't contain meta-characters are treated as static patterns. There are two variations on these:

Slash-static patterns

Like slash-regex patterns, slash-static patterns are prefixed with a / and are anchored to the beginning of the search path provided on the command line. So ag --ignore '/foo/bar' x foo/bar/ baz/foo/bar/ would ignore all of the files under foo/bar but not any under baz/foo/bar.

Name-static patterns

Anything else is treated as a standard or name-static pattern. Like regex patterns, these are matched against each individual segment of the path (so bar will match the file foo/bar/baz.c).

Unlike every other kind of pattern, name-static patterns are also matched across path segments — so for example b/c will match a/b/c/d/foo.c (but not a/b/ccc/d/foo.c).
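The two static variations can be modelled with plain string operations. Again, this is a sketch of the described behaviour rather than ag's implementation, and the function names and file names are hypothetical.

```python
def slash_static_matches(pattern: str, path: str) -> bool:
    """'/foo/bar' matches only at the start of the path, on whole segments."""
    anchored = pattern.lstrip("/")
    return path == anchored or path.startswith(anchored + "/")


def name_static_matches(pattern: str, path: str) -> bool:
    """'b/c' matches anywhere in the path, but only on whole segments."""
    return f"/{pattern}/" in f"/{path}/"


# Mirrors: ag --ignore '/foo/bar' x foo/bar/ baz/foo/bar/
print(slash_static_matches("/foo/bar", "foo/bar/x.c"))
print(slash_static_matches("/foo/bar", "baz/foo/bar/x.c"))

# Name-static examples from the description.
print(name_static_matches("bar", "foo/bar/baz.c"))
print(name_static_matches("b/c", "a/b/c/d/foo.c"))
print(name_static_matches("b/c", "a/b/ccc/d/foo.c"))
```

Wrapping both strings in slashes is a simple way to force matches to start and end on segment boundaries.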


I hope it's not mean of me to say so but i would call this design sub-optimal. There are so many special cases and exceptions, it's quite difficult to even document the behaviour (as you can see), let alone expect users to remember it.

Given that ag at least nominally supports reading from VCS ignore files, i think the expectation that most users would have is that patterns supplied to --ignore are treated identically to patterns found in .ignore or .gitignore, and those patterns are treated more or less the same way git treats them (described here).

I understand that there are performance considerations and that git has some special features that would have to be re-implemented (like its handling of ** and trailing /), but that's my gut reaction anyway.

@alecbz

alecbz commented Sep 10, 2017

Oh interesting, I didn't realize this flag existed when I filed #1138. But given the apparently strange behavior of this flag, maybe --invert-file-search-regex is exactly what's desired here? As implemented in #1150, it's exactly the inverse of the -G/--file-search-regex flag.

@abitrolly

With this documentation in mind, I still don't see how to ignore all .so files, including versioned.

/usr/lib64/firefox/libmozsqlite3.so
/usr/lib64/vtk/libvtksqlite.so.1
/usr/lib64/libgdal.so.20.4.2

@dosentmatter

> With this documentation in mind, I still don't see how to ignore all .so files, including versioned.
>
> /usr/lib64/firefox/libmozsqlite3.so
> /usr/lib64/vtk/libvtksqlite.so.1
> /usr/lib64/libgdal.so.20.4.2

ag -gl --ignore '*.so*'
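As a sanity check of why `*.so*` should cover the versioned files, here is a short sketch that matches the pattern against the last segment of each path from the question. It assumes, per the description earlier in this thread, that such a pattern is handed to `fnmatch()` one path segment at a time. Python's `fnmatch` stands in for the C function.

```python
import fnmatch
import posixpath

paths = [
    "/usr/lib64/firefox/libmozsqlite3.so",
    "/usr/lib64/vtk/libvtksqlite.so.1",
    "/usr/lib64/libgdal.so.20.4.2",
]

# '*.so*' allows anything after '.so', so version suffixes are covered.
with_trailing_star = [
    p for p in paths if fnmatch.fnmatchcase(posixpath.basename(p), "*.so*")
]

# '*.so' requires the name to END in '.so', so versioned files slip through.
without_trailing_star = [
    p for p in paths if fnmatch.fnmatchcase(posixpath.basename(p), "*.so")
]

print(with_trailing_star)
print(without_trailing_star)
```

The trade-off is that `*.so*` also matches unrelated names that merely contain `.so`, such as a hypothetical `notes.solution.txt`.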
