find files that don't match

I have a list of directories where most are in a format like ./[foobar]/. However, some are formatted like ./[foo] bar/.
I would like to use find (or some other utility my shell offers) to find the directories that don't match the first pattern (i.e. that have text outside the square brackets). So far, I have been unable to find a way to invert my pattern.
Is there any way to do this?

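You could combine find with grep and its -v option, which prints only the lines that do not match. Note that a bare "[foobar]" is a bracket expression (any one of the letters f, o, b, a, r), so the brackets must be escaped and the pattern anchored: find . -type d | grep -v '^\./\[[^]]*]$'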

find supports negation by means of ! and -not. The latter is a GNU extension and not POSIX-compliant. To keep ! safe from history expansion in an interactive shell, you can precede it with a backslash or put it inside single quotes.
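A minimal sketch of the negation approach, assuming GNU or BSD find (for -mindepth/-maxdepth) and that only the immediate subdirectories matter. The helper name list_unbracketed_dirs and the demo directories are hypothetical. In a find glob, [[] is a bracket expression containing a literal [, and a ] outside brackets is literal, so '[[]*]' means "starts with [ and ends with ]":

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the immediate subdirectories of $1 whose names
# are NOT of the form [something]. '[[]*]' = literal '[', anything, literal ']'.
list_unbracketed_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d ! -name '[[]*]' -print
}

# Demo in a throwaway directory, removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
mkdir -- "$demo/[foobar]" "$demo/[foo] bar"
list_unbracketed_dirs "$demo"    # prints only .../[foo] bar
```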

A simple regular glob will work in this particular example:
$ ls
[a]b [ab]
$ echo \[*\]
[ab]
For more complex patterns you can enable extglob in bash (shopt -s extglob):
!(pattern-list)
Matches anything except one of the given patterns
(and similar globs)
Or using find:
find dir ! -name ...
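The extglob route can be sketched like this, assuming bash (extglob is a bash option) and a throwaway demo directory holding the two example names. shopt -s extglob has to run before the line that uses !(...) is parsed, which is why it sits on its own line at the top level:

```shell
#!/usr/bin/env bash
# Must be enabled before any line using !(...) is parsed.
shopt -s extglob

demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
cd -- "$demo" || exit 1
mkdir -- '[ab]' '[a]b'

# !(\[*\]) matches every name EXCEPT those starting with '[' and ending with ']'.
matches=( !(\[*\]) )
printf '%s\n' "${matches[@]}"    # prints: [a]b
```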

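find . -type d -name '*\]?*'
This matches any directory whose name has at least one character after a ], which is enough unless you also insist on checking the opening bracket.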
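A quick demonstration of that pattern in a throwaway directory, using the two example names from the question (the temporary directory is only a stand-in for the real one):

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
mkdir -- "$demo/[foobar]" "$demo/[foo] bar"

# '*\]?*' = anything, a literal ']', at least one more character, anything.
# The opening bracket is not checked.
trailing=$(find "$demo" -type d -name '*\]?*')
printf '%s\n' "$trailing"    # prints only .../[foo] bar
```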

Related

Optional character in bash regex

Suppose I have two files: ac and abc. I want a regex that matches both files. I would expect the following to work, but it never does:
find ./ -name ab?c
I have tried both escaping and not escaping the question mark; neither works. In the regex documentation I have found, ? means "the previous character repeated 0 or 1 times", but find doesn't seem to understand this.
I have tried this on two different find versions: GNU find version 4.2.31 and
find (GNU findutils) 4.6.0
PS: this works with *, but I specifically would like to match just one optional character.
find ./ -name a*c
gives
./ac
./abc
The expression passed to -name is not a regex; it is a glob expression. A single glob expression can't handle your use case, but you can use regular expressions with -regex:
find -regex '.*/ab?c'
By the way, the default regular expression syntax is Emacs style, as explained at https://www.emacswiki.org/emacs/RegularExpression. You can change the syntax with -regextype.
To match with exactly one optional character, try the -or operator:
touch abc ac abbc
find . -name "a?c" -or -name "ac"
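This gives you only abc and ac. Note that ? matches any single character, so a file named axc would match as well.
In general, you can build fairly complex find queries by combining tests with -or (-o) and -and (-a).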
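A sketch of the -or approach in a scratch directory, assuming GNU find; the extra file axc is hypothetical and only there to show that ? matches any single character, which is why two exact names are the safer pair of tests:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
cd -- "$demo" || exit 1
touch abc ac abbc axc

# 'a?c' matches any three-character name a_c (so axc too).
loose=$(find . -name 'a?c' -o -name 'ac' | LC_ALL=C sort)
# Exact names: only "b" is treated as optional.
exact=$(find . -name 'abc' -o -name 'ac' | LC_ALL=C sort)
printf '%s\n' "$loose"    # ./abc ./ac ./axc
printf '%s\n' "$exact"    # ./abc ./ac
```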
The find -name option uses a glob pattern, which is not the same as a regex. For globs, ? means any single character. If you want a character to be optional, you need to use two patterns:
find ./ -name abc -o -name ac
The other answers already give a working solution, but it is essential to know that find's -regex option matches the whole path, so a partial match like this never succeeds:
-regex 'ab?c'
You have to use one or two dot-stars:
-regex '.*ab?c.*'
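The whole-path behaviour can be checked like this, assuming GNU find with its default Emacs regex syntax (where ? is an operator) and a scratch directory:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
cd -- "$demo" || exit 1
touch ac abc abbc

# -regex is tested against the whole path as find prints it (./ac, ./abc, ...),
# so a pattern without a leading .* never matches.
none=$(find . -regex 'ab?c')
both=$(find . -regex '.*/ab?c' | LC_ALL=C sort)
printf 'partial: [%s]\n' "$none"    # partial: []
printf '%s\n' "$both"               # ./abc ./ac
```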
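You can also filter the names with grep instead. Note that grep matches anywhere in the line unless the pattern is anchored, and \? is a GNU extension in basic regular expressions (grep -E '^ab?c$' is the portable equivalent):
ls . | grep '^ab\?c$'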


Check if file exists in a directory specified by regex Linux

I'm trying to use a regex inside a file test operator to look for a file where one subdirectory is specified by a regex, but I wonder whether that's even possible. I think I have tried every combination of quotes and brackets. Am I missing something here, or does a file operator require a literal path?
I'm not entirely sure. Could somebody please clarify?
What I want to achieve is something like this (it obviously doesn't work, because it takes the regex part as the name of the subdirectory):
if [[ -r '/agent/[0-9 .]*/bin/run.sh' ]]
find has a -regex option:
-regex pattern
File name matches regular expression pattern. This is a match on the whole path, not a search. For example, to match a file named `./fubar3', you can use the regular expression `.*bar.' or `.*b.*3', but not `f.*r3'. The regular expressions understood by find are by default Emacs Regular Expressions, but this can be changed with the -regextype option.
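Because the match is on the whole path as find prints it, start the search at /agent rather than at . so the paths begin with /agent. Also, find exits with status 0 even when nothing matches, so test its output instead of its exit status (-quit, a GNU extension, stops after the first match):
if [ -n "$(find /agent -regex '/agent/[0-9 .]*/bin/run.sh' -print -quit)" ]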
According to the bash manual, only the =~ operator of [[ ]] does regex matching; file test operators such as -r take a literal path.
There's another way, though:
RUNFILES=$( find /agent -name run.sh | grep -e '^/agent/[0-9 .]*/bin/run.sh$' )
if [ -n "$RUNFILES" ]
then
[...]
fi
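A bash-only alternative, sketched under the assumption that bash's extglob and nullglob options are acceptable: +([0-9\ .]) plays the role of the regex [0-9 .]+, and nullglob makes a non-matching pattern expand to nothing, so the loop simply does not run. The helper has_runfile and its root argument are hypothetical; the root only exists so the demo can run under a temporary directory (pass "" to search the real /agent):

```shell
#!/usr/bin/env bash
# extglob enables +(...) ("one or more of"); nullglob makes an unmatched
# pattern expand to nothing instead of to itself.
shopt -s extglob nullglob

# Hypothetical helper: succeed if a readable run.sh exists under
# <root>/agent/<digits, spaces or dots>/bin/.
has_runfile() {
    local root=$1 f
    for f in "$root"/agent/+([0-9\ .])/bin/run.sh; do
        [[ -r $f ]] && return 0
    done
    return 1
}

demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
mkdir -p -- "$demo/agent/1.2 3/bin" "$demo/agent/beta/bin"
touch -- "$demo/agent/beta/bin/run.sh"
has_runfile "$demo" && echo found || echo missing    # missing: 'beta' is not digits
touch -- "$demo/agent/1.2 3/bin/run.sh"
has_runfile "$demo" && echo found || echo missing    # found
```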
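You can simply use the ls command with a glob pattern (not a regular expression: [0-9\ .]* means one digit, space or dot followed by anything). With default shell options, a pattern that matches nothing is passed to ls literally and ls fails, so its exit status can serve as the test:
if ls /agent/[0-9\ .]*/bin/run.sh >/dev/null 2>&1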

finding directories with regex

I'm trying to find directories whose names begin with '6g', followed by a two-letter state code and a four-digit number. An example directory would be /home/6gAL0533/.
Typically I copy to all directories starting with 6g using
"find 6g* -maxdepth 0 -type d"
but I am now at a point where I need to copy files to particular directories based on their four-digit number (for instance 0300 - 0500), and I can't get the find command to work for me. I think I need to use a regex with the single-character wildcard ".", like "find -type d -regex '6g..0[3-5]...' ", but that returns no results. I am probably using the regex syntax incorrectly, but I haven't found much information on using regex to find directories. Any help would be appreciated. Thanks!
You don't need find for this at all, and you also don't need regular expressions; glob patterns are quite adequate (which means you can be compatible with POSIX find, rather than requiring a GNU-extended version):
shopt -s nullglob
dirs=( /home/6g[A-Za-z][A-Za-z]0[3-5][0-9][0-9] ) # four digits: 0, [3-5], two more
printf '%q\n' "${dirs[@]}" # print results
cp -R -- "${dirs[@]}" /to/destination # copy results somewhere else
...or...
dirs=( /home/6g??0[3-5]?? ) # use wildcards, as in your proposed regex
...or...
find /home -maxdepth 1 -type d -name '6g??0[3-5]??'
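A sketch of the glob approach against a made-up stand-in for /home, assuming bash; it shows the four-digit form 0[3-5][0-9][0-9] selecting 0300 to 0599 while skipping other numbers and unrelated names:

```shell
#!/usr/bin/env bash
shopt -s nullglob
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
# Hypothetical directories standing in for /home.
mkdir -- "$demo/6gAL0533" "$demo/6gTX0299" "$demo/6gCA0412" "$demo/other"

# 6g + two letters + 0 + [3-5] + two digits
dirs=( "$demo"/6g[A-Za-z][A-Za-z]0[3-5][0-9][0-9] )
printf '%s\n' "${dirs[@]}"    # .../6gAL0533 and .../6gCA0412
```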
If you want to use find regexes (as Charles Duffy points out, there are a variety of simpler solutions), you need to remember that -regex looks for matches to the entire pathname.
As documented in man find:
-regex pattern
File name matches regular expression pattern. This is a match
on the whole path, not a search. For example, to match a file
named `./fubar3', you can use the regular expression `.*bar.' or
`.*b.*3', but not `f.*r3'. The regular expressions understood by
find are by default Emacs Regular Expressions, but this can be
changed with the -regextype option.
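Consequently, you need a leading .*/ so that the pattern can match the directory part of the path. Note also that a four-digit number is 0, [3-5] and two more characters, so the original pattern had one . too many:
find . -type d -regex '.*/6g..0[3-5]..'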
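The regex form can be checked the same way, assuming GNU find; the directory names are made up, including one with five digits to show that the whole-path match rejects it:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
mkdir -- "$demo/6gAL0533" "$demo/6gTX0299" "$demo/6gAL05331"

# Whole-path match: .*/ covers the leading directories, ".." is the state
# code, then 0, [3-5] and exactly two more characters.
found=$(find "$demo" -type d -regex '.*/6g..0[3-5]..')
printf '%s\n' "$found"    # only .../6gAL0533
```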

regex match either string in linux "find" command

I'm trying the following to recursively look for files ending in either .py or .py.server:
$ find -name "stub*.py(|\.server)"
However this does not work.
I've tried variations like:
$ find -name "stub*.(py|py\.server)"
They do not work either.
A simple find -name "*.py" does work, so why doesn't this regex?
Say:
find . \( -name "*.py" -o -name "*.py.server" \)
This matches file names ending in either .py or .py.server.
From man find:
expr1 -o expr2
Or; expr2 is not evaluated if expr1 is true.
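A sketch of the grouped -o form in a scratch directory, assuming a POSIX-style find; the file names are hypothetical, and the stub prefix from the question is included. The escaped parentheses group the two -name tests so that -type f applies to both:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
cd -- "$demo" || exit 1
mkdir sub
touch stub1.py sub/stub2.py.server stub3.pyc notes.txt

# \( \) are escaped so the shell passes them to find literally.
hits=$(find . -type f \( -name 'stub*.py' -o -name 'stub*.py.server' \) | LC_ALL=C sort)
printf '%s\n' "$hits"    # ./stub1.py ./sub/stub2.py.server
```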
EDIT: If you want to specify a regex, use the -regex option:
find . -type f -regex ".*\.\(py\|py\.server\)"
Find can take a regular expression pattern:
$ find . -regextype posix-extended -regex '.*[.]py([.]server)?$' -print
Options:
-regex pattern
File name matches regular expression pattern. This is a match on the whole path, not a search. For example, to match a file named ./fubar3, you can use the regular expression .*bar. or .*b.*3, but not f.*r3. The regular expressions understood by find are by default Emacs Regular Expressions, but this can be changed with the -regextype option.
-print
True; print the full file name on the standard output, followed by a newline. If you are piping the output of find into another program and there is the faintest possibility that the files which you are searching for might contain a newline, then you should seriously consider using the -print0 option instead of -print. See the UNUSUAL FILENAMES section for information about how unusual characters in filenames are handled.
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on the command line. Currently-implemented types are emacs (this is the default), posix-awk, posix-basic, posix-egrep and posix-extended.
That is a clearer description of the options. Don't forget that all the information can be found by reading man find or info find.
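For comparison, the posix-extended variant can be exercised like this (GNU find only, since -regextype is a GNU extension); the file names are hypothetical:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
cd -- "$demo" || exit 1
touch a.py b.py.server c.pyc d.server

# In posix-extended syntax, ( ) and ? work unescaped; [.] is a literal dot
# and ([.]server)? makes the suffix optional.
hits=$(find . -regextype posix-extended -regex '.*[.]py([.]server)?$' | LC_ALL=C sort)
printf '%s\n' "$hits"    # ./a.py ./b.py.server
```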
find -name does not use regular expressions. Here's an extract from the man page on Ubuntu 12.04:
-name pattern
Base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters (`*', `?', and `[]') match a `.' at the start of the base name (this is a change in findutils-4.2.2; see section STANDARDS CONFORMANCE below). To ignore a directory and the files under it, use -prune; see an example in the description of -path. Braces are not recognised as being special, despite the fact that some shells including Bash imbue braces with a special meaning in shell patterns. The filename matching is performed with the use of the fnmatch(3) library function. Don't forget to enclose the pattern in quotes in order to protect it from expansion by the shell.
So the pattern that -name takes is more like a shell glob and not at all like a regexp.
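If I wanted to find by regexp, I'd filter the output with grep -E (egrep is deprecated). Note that stub(\.py|\.server) would only match stub.py or stub.server, so the pattern has to allow other characters after stub:
find . -type f -print | grep -E 'stub[^/]*\.py(\.server)?$'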