Find files with numeric values greater than a specified number - regex

When I run the following command, I get a list of files:
find . -type f -name '*_duplicate_[0-9]*.txt'
./prefix_duplicate_001.txt
./prefix_duplicate_002.txt
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt
Now I'm only interested in the files whose number is greater than or equal to 003. How can I get this done?
Thank you in advance.

Using the -regex option of find, you can write a regex that matches every file whose number after _duplicate_ is 3 or higher, allowing for leading zeroes:
find . -regextype posix-extended -type f \
-regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
On OS X (BSD find), use this instead:
find -E . -type f -regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt
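
If the threshold changes often, rewriting the regex each time is easy to get wrong. As a minimal sketch of an alternative, assuming a POSIX shell, find can select the candidate names and the shell can compare the number numerically. The name list_duplicates_from is a hypothetical helper, not part of find, and the loop assumes file names contain no newlines.

```shell
# list_duplicates_from MIN [DIR]: print *_duplicate_<digits>.txt files
# whose number is >= MIN. Hypothetical helper; assumes no newlines in names.
list_duplicates_from() (
    min=$1
    dir=${2:-.}
    find "$dir" -type f -name '*_duplicate_[0-9]*.txt' | sort |
    while IFS= read -r path; do
        num=${path##*_duplicate_}    # drop everything up to the number
        num=${num%.txt}              # drop the extension
        case $num in
            ''|*[!0-9]*) continue ;; # skip names such as _duplicate_1a.txt
        esac
        # test(1) compares decimal integers, so leading zeros are harmless
        if [ "$num" -ge "$min" ]; then
            printf '%s\n' "$path"
        fi
    done
)
```

Calling list_duplicates_from 3 in the directory from the question should list the 003, 004 and 005 files; the threshold is an ordinary argument, so no pattern has to be edited.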

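Use this pattern (it needs a regex engine with lookahead support, such as PCRE; find's -regex does not support lookahead, and the pattern assumes exactly three digits):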
.*_duplicate_(?!00[1-2])\d{3}\.txt

As much as I like to use single commands when possible, I think maybe this is what you need here:
find . -type f -name '*_duplicate_[0-9]*.txt' | awk -F '[_.]' '$4 >= 3 { print $0 }'
There are variations on that - for example, this:
find . -type f -name "*.txt" | awk -F '[_.]' '$0 ~ /_duplicate_[0-9]+\.txt$/ && $4 >= 3 { print $0 }'
But I'm not sure it really makes a difference from an efficiency standpoint...

00[3-9]|(([1-9]\\d\\d)|(\\d[1-9]\\d))
This pattern covers only the number part (three digits, 003 or higher).

Related

Deleting files not containing double digit number and pattern in grep

The pattern below is supposed to delete all files that don't start with 1_, but instead it matches all files whose prefix doesn't contain a 1.
For example, it won't match 11_xxx.sql.bz2 or 1_xxx.sql.bz2, but it matches all the others correctly.
How can I ensure the pattern only matches the exact number and not any number that contains it?
For example, I would like the script below to skip only 1_xxx.sql.bz2:
ls | grep -P "^[^1]+_([^_]+).+$" | xargs -d"\n" rm
I also need to keep items without a number at the start.
I suggest using find like this to match all files in current directory excluding those that start with 1_:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -delete
If your find doesn't support -delete then use:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -exec rm {} +
Use grep -v to invert the match, so that files matching the pattern are excluded:
grep -v '^1_'
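
The difference between "starts with 1_" and "contains a 1" can be sketched with shell patterns, which always match against the whole name. The function name should_delete is hypothetical; it encodes the rules stated in the question (delete names with a numeric prefix other than 1_, keep everything else) so that a deletion can be previewed before anything is removed.

```shell
# should_delete NAME: succeed if NAME starts with a digit but not with "1_".
# Hypothetical helper for a dry run; it never removes anything itself.
should_delete() {
    case $1 in
        1_*)    return 1 ;;  # exactly the prefix "1_": keep
        [0-9]*) return 0 ;;  # any other name starting with a digit: delete
        *)      return 1 ;;  # no leading number: keep
    esac
}

# Dry run: print what would be removed instead of calling rm.
for f in 1_xxx.sql.bz2 11_xxx.sql.bz2 2_xxx.sql.bz2 notes.txt; do
    if should_delete "$f"; then
        printf 'would delete: %s\n' "$f"
    fi
done
```

Only after the printed list looks right would the printf be swapped for rm.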

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?
With any POSIX-compliant find:
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go:
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after paths to make this work.
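
The three -name tests above can be read as three independent checks. As a rough sketch, the same logic is written here as a shell function so it can be tried on single names without touching the file system. The name is_hex8_with_letter is hypothetical, and like [:xdigit:] in the find command it also accepts lowercase a-f, which the question did not ask for.

```shell
# is_hex8_with_letter PATH: succeed if the base name is exactly 8 hex
# digits and at least one of them is a letter. Hypothetical helper.
is_hex8_with_letter() {
    name=${1##*/}                       # keep only the file name
    case $name in
        *[![:xdigit:]]*) return 1 ;;    # contains a non-hex character
    esac
    case $name in
        ????????) ;;                    # exactly 8 characters
        *) return 1 ;;
    esac
    case $name in
        *[![:digit:]]*) return 0 ;;     # at least one non-decimal digit
        *) return 1 ;;
    esac
}
```

With the examples from the question, A1234567 and 12345ABC pass while 12345678 and 09876543 fail, because the last check requires one character outside 0-9.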
It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

Regex in Bash: not wanting to include directories

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions, and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks
It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"
If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
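
If neither -printf nor -execdir is available, the directory part can also be stripped by the shell. This is a minimal sketch using only POSIX find features (-exec ... {} +) and the ${p##*/} parameter expansion; print_image_names is a hypothetical helper name.

```shell
# print_image_names [DIR]: print only the file name of each matching image.
# Hypothetical helper; relies on POSIX -exec ... {} + and ${p##*/}.
print_image_names() {
    find "${1:-.}" -type f -name '*MG_[0-9][0-9][0-9][0-9].JPG' \
        -exec sh -c 'for p do printf "%s\n" "${p##*/}"; done' sh {} +
}
```

The inner sh receives the matched paths as arguments, and ${p##*/} removes everything up to the last slash, so each file is printed by its base name only.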

Regex: Find files not ending with numeral suffix

I need a command that returns all files without a numeric suffix (*.0, *.123, ...).
For example, given these three files:
gg.p qqq.449 rtr55
I want to find only these:
./rtr55
./gg.p
I tried to filter them with grep, but the filter had no effect:
find -type f | grep -v '\.[0-9]+$'
(This command returned:)
./qqq.449
./rtr55
./gg.p
So there is probably an error in the regex. Do you know how to fix it?
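The + operator belongs to extended regular expressions; in grep's default basic syntax an unescaped + is a literal character. There are several ways around it (the \+ form is a GNU extension):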
find -type f | grep -v '\.[0-9]\+$'
find -type f | egrep -v '\.[0-9]+$'
find -type f | grep -E -v '\.[0-9]+$'
find -type f | grep -v '\.[0-9][0-9]*$'
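
The effect of the two syntaxes can be checked without any files. The sketch below feeds the three names from the question through both forms of the pattern; the variable names are only illustrative.

```shell
names=$(printf '%s\n' gg.p qqq.449 rtr55)

# Basic syntax: the unescaped + is an ordinary character, so nothing
# matches and -v lets every name through.
bre=$(printf '%s\n' "$names" | grep -v '\.[0-9]+$')

# Extended syntax: + means "one or more", so qqq.449 is filtered out.
ere=$(printf '%s\n' "$names" | grep -Ev '\.[0-9]+$')

printf 'basic:    %s\n' "$(printf '%s' "$bre" | tr '\n' ' ')"
printf 'extended: %s\n' "$(printf '%s' "$ere" | tr '\n' ' ')"
```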
Why would you use grep at all?
find . -regex '.*\.[0-9][0-9]*' -prune -o -type f -print
If your expressions are simple enough (or your find doesn't support -regex), you could use -name instead of -regex, but a glob wildcard can't match an arbitrary number of digits after the dot. Here is a version that handles suffixes of one or two digits:
find . -name '*.[0-9]' -prune -o -name '*.[0-9][0-9]' -prune -o -type f -print
Notice that this isn't purely an efficiency question; grep would simply not do the right thing if you ever come across file names with newlines in them.
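
Another way to keep the filtering inside find is to negate the regex instead of pruning. This is only a sketch: -regex is not in POSIX (GNU and BSD find both provide it), and list_without_numeric_suffix is a hypothetical helper name.

```shell
# list_without_numeric_suffix [DIR]: print regular files whose path does
# not end in a dot followed by digits. Hypothetical helper; -regex is a
# common extension (GNU and BSD find), not a POSIX feature.
list_without_numeric_suffix() {
    find "${1:-.}" -type f ! -regex '.*\.[0-9][0-9]*' -print
}
```

Since the test runs inside find, each path is judged as a whole and a newline in a file name cannot split it into two records; replacing -print with -print0 would extend that guarantee to a consumer that reads NUL-separated input.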

find with regex. Using string literals from variable

I know that shell doesn't really have arrays, but I know that I can do this with a list of values:
dir_array=("quarantine" "720" "low" "high" "DVD" "error" "keep")
for d in "${dir_array[@]}"
do
…
done
I also know that I can exclude these directories from find using -regex and -prune:
find -E . \
-type d -regex './(DVD|quarantine|720|high|low|error|keep)' -prune -o \
-type f -iregex '.*\.(avi|wmv|mp4|m4v|mov|mkv)' -print
So, finally, here's my question:
How can I use my original $dir_array in the (first) regex in the find instead of repeating myself?
You could convert the array into a string variable and then use that variable in the find command, like this:
str=`echo "./(${dir_array[@]})" | sed "s/ /\|/g;"`
echo "$str"
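
The same string can be built without sed by letting bash join the array itself. This is a sketch, assuming bash; join_by is a hypothetical helper, and it relies on "$*" joining the arguments with the first character of IFS.

```shell
dir_array=("quarantine" "720" "low" "high" "DVD" "error" "keep")

# join_by SEP ITEM...: join the items with a one-character separator.
# Hypothetical helper; "$*" joins on the first character of IFS.
join_by() {
    local IFS=$1
    shift
    printf '%s' "$*"
}

pattern="./($(join_by '|' "${dir_array[@]}"))"
printf '%s\n' "$pattern"
```

The resulting pattern can be passed as -regex "$pattern" in the find command from the question. Unlike the sed version, an element containing a space stays intact, but elements containing regex metacharacters would still need escaping.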