find with regex. Using string literals from variable - regex

I know that shell doesn't really have arrays, but I know that I can do this with a list of values:
dir_array=("quarantine" "720" "low" "high" "DVD" "error" "keep")
for d in "${dir_array[@]}"
do
…
done
I also know that I can exclude these directories from find using -regex and -prune:
find -E . \
-type d -regex './(DVD|quarantine|720|high|low|error|keep)' -prune -o \
-type f -iregex '.*\.(avi|wmv|mp4|m4v|mov|mkv)' -print
So, finally, here's my question:
How can I use my original $dir_array in the (first) regex in the find instead of repeating myself?

You could convert the array into a string variable and then use that variable in the find command, like this:
str=`echo "./(${dir_array[@]})" | sed "s/ /\|/g;"`
echo "$str"
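As a sketch of how the pieces fit together, the array can also be joined with "|" inside bash itself, without echo and sed. This assumes bash and directory names that contain no regex metacharacters; the names alternation, dir_regex and find_videos are made up for illustration, and the find call uses the BSD/macOS syntax from the question.

```shell
# Same list as in the question (bash array syntax).
dir_array=("quarantine" "720" "low" "high" "DVD" "error" "keep")

# Join the elements with "|": "${dir_array[*]}" uses the first character
# of IFS as the separator, and the subshell keeps the IFS change local.
alternation=$(IFS='|'; printf '%s' "${dir_array[*]}")
dir_regex="./(${alternation})"
printf '%s\n' "$dir_regex"

# Hypothetical wrapper around the find command from the question
# (BSD/macOS syntax). On GNU find, drop -E and put
# -regextype posix-extended after the path instead.
find_videos() {
    find -E . \
        -type d -regex "$dir_regex" -prune -o \
        -type f -iregex '.*\.(avi|wmv|mp4|m4v|mov|mkv)' -print
}
```

Calling find_videos from the top of the tree should then behave like the hand-written command, while the directory list lives in one place only.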

Related

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?
With any POSIX-compliant find
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after paths to make this work.
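To see the three -name tests working together, here is a small throwaway demo. The file names and the temporary directory are invented for illustration, and mktemp -d is assumed to be available (it is not strictly POSIX, but most systems have it).

```shell
# Build a scratch tree with a mix of matching and non-matching names.
demo=$(mktemp -d)
mkdir "$demo/x"
touch "$demo/A1234567" "$demo/12345ABC" "$demo/12345678" \
      "$demo/x/ABCDABCD" "$demo/x/09876543" \
      "$demo/ABCDEFG1" "$demo/ABC"

# 1st -name: exactly 8 characters
# 2nd -name (negated): no character outside the hex digits
# 3rd -name: at least one character that is not a decimal digit
matches=$(cd "$demo" && find . -type f \
    -name '????????' \
    ! -name '*[![:xdigit:]]*' \
    -name '*[![:digit:]]*' | LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Note that [:xdigit:] also accepts lowercase a-f, so this is slightly more permissive than the [A-F0-9] class in the question.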
It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

Regex: Find files not ending with numeral suffix

I need a command that returns all files without a numeric suffix (*.0, *.123, ...).
For example, given these three files:
gg.p qqq.449 rtr55
I want to find only these:
./rtr55
./gg.p
I tried to filter them with grep, but the filter had no effect:
find -type f | grep -v '\.[0-9]+$'
(This command returned:)
./qqq.449
./rtr55
./gg.p
So there is probably some error in the regex. Do you know how to fix it?
The + operator belongs to extended regular expressions; grep uses basic regular expressions by default, where an unescaped + is a literal character. There are several workarounds:
find -type f | grep -v '\.[0-9]\+$'
find -type f | egrep -v '\.[0-9]+$'
find -type f | grep -E -v '\.[0-9]+$'
find -type f | grep -v '\.[0-9][0-9]*$'
Why would you use grep at all?
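find -regex '.*\.[0-9][0-9]*' -prune -o -type f -print
If your expressions are simple enough (or your find doesn't support -regex), you could use -name instead of -regex, but a glob wildcard can't match an arbitrary number of digits after the dot. Here's a version that handles one or two digits:
find -name '*.[0-9]' -prune -o -name '*.[0-9][0-9]' -prune -o -type f -print
The explicit -print matters: without it, find applies an implicit -print to the whole expression, and the pruned names would be printed as well.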
Notice that this isn't purely an efficiency question; grep would simply not do the right thing if you ever come across file names with newlines in them.
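A quick way to check the find-only approach is to run it against the three example names from the question. This is only a sketch: the temporary directory and the extra file a.0 are made up, and it uses a negated -regex rather than -prune, which selects the same regular files. -regex itself is an extension that GNU and BSD find both provide.

```shell
# Scratch directory with the example files plus one more numeric suffix.
demo=$(mktemp -d)
touch "$demo/gg.p" "$demo/qqq.449" "$demo/rtr55" "$demo/a.0"

# -regex must match the whole path, so the pattern needs the leading ".*".
# The pattern avoids "+" so it works with the default regex syntax.
kept=$(cd "$demo" && find . -type f ! -regex '.*\.[0-9][0-9]*' | LC_ALL=C sort)
printf '%s\n' "$kept"

rm -rf "$demo"
```

The sort is only there to make the demo output stable; in real use you would attach -exec or -print0 to find directly so that unusual file names stay intact.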

find file with numeric values greater than a specified number

When I run the following command, I get a list of files
find . -type f -name '*_duplicate_[0-9]*.txt'
./prefix_duplicate_001.txt
./prefix_duplicate_002.txt
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt
Now I'm only interested in files which have the numbers greater than or equal to 003. How can I get this done?
Thank you in advance.
Using find's -regex option, you can write a regex that matches every file whose number after _duplicate_ is 3 or higher, allowing for leading zeroes:
find . -regextype posix-extended -type f \
-regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
On OSX use this find:
find -E . -type f -regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt
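With a regex engine that supports lookaheads (PCRE, for example; find's -regex does not), use this pattern, which assumes exactly three digits:
.*_duplicate_(?!00[0-2])\d{3}\.txt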
As much as I like to use single commands when possible, I think maybe this is what you need here:
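find . -type f -name '*_duplicate_[0-9]*.txt' | awk -F '[_.]' '$4 >= 3 { print $0 }'
There are variations on that (note that $4 is the number only when the files sit directly under . and the prefix contains no other underscores or dots) - for example, this:
find . -type f -name "*.txt" | awk -F '[_.]' '$0 ~ /_duplicate_[0-9]+\.txt$/ && $4 >= 3 { print $0 }'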
But I'm not sure it really makes a difference from an efficiency standpoint...
00[3-9]|(([1-9]\\d\\d)|(\\d[1-9]\\d))
only for the number part.
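If the regex feels fragile, another option is to pull the number out of each name and compare it numerically. This is a different technique from the answers above, sketched under the assumption that names end in _duplicate_<digits>.txt and contain no newlines; filter_min is a hypothetical helper name.

```shell
# filter_min MIN: read paths on stdin, print those whose number is >= MIN.
filter_min() {
    min=$1
    while IFS= read -r path; do
        num=${path##*_duplicate_}   # strip everything up to the number
        num=${num%.txt}             # strip the extension
        case $num in
            ''|*[!0-9]*) continue ;;  # skip anything that is not all digits
        esac
        # test(1) compares in decimal, so leading zeroes are harmless here.
        if [ "$num" -ge "$min" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Demo on a scratch directory with made-up files.
demo=$(mktemp -d)
for n in 001 002 003 004 005 010; do
    touch "$demo/prefix_duplicate_$n.txt"
done
result=$(cd "$demo" && find . -type f -name '*_duplicate_[0-9]*.txt' |
    filter_min 3 | LC_ALL=C sort)
printf '%s\n' "$result"
rm -rf "$demo"
```

The comparison deliberately uses [ ... -ge ... ] rather than $(( )), because shell arithmetic would read a value like 008 as an invalid octal number.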

How to recursively change files in directories whose name matches a string in Perl?

I have many directories for different projects. Under some project directories, there are subdirectories named "matlab_programs". In only subdirectories named matlab_programs, I would like to replace the string 'red' with 'blue' in files ending with *.m.
The following perl code will recursively replace the strings in all *.m files, regardless of what subdirectories the files are in.
find . -name "*.m" | xargs perl -p -i -e "s/red/blue/g"
And to find the full paths of all directories called matlab_programs,
find . -type d -name "matlab_programs"
How can I combine these so I only replace strings if the files are in a subdirectory called matlab_programs?
Perl has the excellent File::Find module, that lets you specify a callback to be called on each file.
So you can specify complex compound criteria, like this:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
sub find_files {
return unless m/\.m\z/; # skip any files that don't end in .m ('next' would exit the sub with a warning)
if ( $File::Find::dir =~ m{/matlab_programs\z} ) { # file sits directly in a matlab_programs directory
print $File::Find::name, " found\n";
}
}
find( \&find_files, "." );
And then you can do whatever you wish with the files you find - like opening/text replacing and closing.
You want to find all directories named matlab_programs using
find . -type d -name "matlab_programs"
and then execute
find $f -name "*.m" | xargs perl -p -i -e "s/red/blue/g"
on all results $f. Judging by your use of xargs, there are no special characters such as spaces in your file names, so the following should work:
find `find . -type d -name "matlab_programs"` -name "*.m" |
xargs perl -p -i -e "s/red/blue/g"
or
find . -type d -name "matlab_programs" |
while read -r f
do
find "$f" -name "*.m"
done |
xargs perl -p -i -e "s/red/blue/g"
Incidentally, I'd use single quotes here; I always use them whenever the quoted string is to be taken literally.
Do you have bash? The $(...) syntax works like backticks (the way both the shell and Perl use them) but they can be nested.
perl -pi -e s/red/blue/g $(find $(find . -type d -name matlab_programs) -type f -name \*.m)
Many flavors of find also support a -path pattern test, so you can just combine your filename conditions into that argument
perl -pi -e s/red/blue/g $(find . -type f -path \*/matlab_programs/\*.m)

regextype with find command

I am trying to use the find command with -regextype, but I cannot get it to work properly.
I am trying to find all .c and .h files, pipe them on, and grep for the name func_foo inside those files. What am I missing?
$ find ./ -regextype sed -regex ".*\[c|h]" | xargs grep -n --color func_foo
Also in a similar aspect I tried the following command but it gives me an error like paths must precede expression:
$ find ./ -type f "*.c" | xargs grep -n --color func_foo
The accepted answer contains some inaccuracies.
On my system, GNU find's manpage says to run find -regextype help to see the list of supported regex types.
# find -regextype help
find: Unknown regular expression type 'help'; valid types are 'findutils-default', 'awk', 'egrep', 'ed', 'emacs', 'gnu-awk', 'grep', 'posix-awk', 'posix-basic', 'posix-egrep', 'posix-extended', 'posix-minimal-basic', 'sed'.
E.g. find . -regextype egrep -regex '.*\.(c|h)' finds .c and .h files.
Your regexp syntax was wrong, you had square brackets instead of parentheses. With square brackets, it would be [ch].
You can just use the default regexp type as well: find . -regex '.*\.\(c\|h\)$' also works. Notice that you have to escape (, |, ) characters in this case (with sed regextype as well). You don't have to escape them when using egrep, posix-egrep, posix-extended.
Why not just do:
find ./ -name "*.[ch]" | xargs grep -n --color func_foo
and
find ./ -type f -name "*.c" | xargs grep -n --color func_foo
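For what it's worth, the same search can be written without any regex type and without xargs, which sidesteps both the -regextype differences and file names with spaces. A minimal sketch on made-up files:

```shell
# Scratch tree with two source files and one unrelated text file.
demo=$(mktemp -d)
mkdir "$demo/inc"
printf 'int func_foo(void) { return 0; }\n' > "$demo/a.c"
printf 'int func_foo(void);\n' > "$demo/inc/b.h"
printf 'func_foo is documented here\n' > "$demo/notes.txt"

# Group the two -name tests with \( ... \) so -exec applies to both,
# and let find pass the files to grep in one batch.
hits=$(cd "$demo" && find . -type f \( -name '*.c' -o -name '*.h' \) \
    -exec grep -n func_foo {} + | LC_ALL=C sort)
printf '%s\n' "$hits"

rm -rf "$demo"
```

The parentheses are the important part: without them, -o binds more loosely than the implicit "and", and the -exec would apply only to the *.h branch.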
Regarding the valid parameters to find's option -regextype, this comes verbatim from man find:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on
the command line. Currently-implemented types are emacs (this is the default),
posix-awk, posix-basic, posix-egrep and posix-extended
This version of the man page lists no type sed, although the find -regextype help output shown above does include it.