Deleting files not containing double digit number and pattern in grep

The pattern below is supposed to delete all files that don't start with 1_, but instead it matches all files that don't contain a 1.
For example, it won't match 11_xxx.sql.bz2 or 1_xxx.sql.bz2, but it matches all the others correctly.
How can I ensure the pattern matches only the exact number, and not any number that contains it?
For example, I would like the script below to skip only 1_xxx.sql.bz2:
ls | grep -P "^[^1]+_([^_]+).+$" | xargs -d"\n" rm

I will need to keep items without a number at the start.
I suggest using find like this to match all files in the current directory that start with a digit, excluding those that start with 1_:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -delete
If your find doesn't support -delete then use:
find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' -exec rm {} +
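Before deleting anything, the same tests can be tried in a scratch directory with the action left out, so that find only lists the candidates. This is a minimal sketch, not a definitive recipe: the file names and the scratch directory are made up for illustration, and it assumes a find that supports -maxdepth and -not.

```shell
# Scratch directory with hypothetical file names
tmp=$(mktemp -d)
touch "$tmp/1_a.sql.bz2" "$tmp/11_a.sql.bz2" "$tmp/2_a.sql.bz2" "$tmp/notes.txt"

# Dry run: list names that start with a digit, except those starting with 1_
would_delete=$(cd "$tmp" && find . -maxdepth 1 -type f -name '[0-9]*' -not -name '1_*' | LC_ALL=C sort)
printf '%s\n' "$would_delete"

# Clean up the scratch directory
rm -r "$tmp"
```

Here 1_a.sql.bz2 and notes.txt are left out of the list, so they would survive the real command.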

Use grep -v to invert the match, so that you exclude files that match the pattern:
grep -v '^1_'

Related

Using regex OR with find to list and delete files

I have a folder with these files:
sample.jpg
sample.ods
sample.txt
sample.xlsx
Now, I need to find and remove files that end with either .ods or .xlsx.
To fish them out I initially use:
ls | grep -E "*.ods|*.xlsx"
This gives me:
sample.ods
sample.xlsx
Now, I don't want to parse ls so I use find:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | wc -l
But that gives me the output of 1 while I expect to have 2 files before I extend the command to:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | xargs -d"\n" rm
This works, but it removes only the .ods file and not the .xlsx one.
What am I missing here?
I'm on Ubuntu 18.04 and my find version is find (GNU findutils) 4.7.0-git.

You don't need a regex here; just use -name with -o, grouped in escaped parentheses so that -type f and -delete apply to both patterns:
find . -type f \( -name "*.ods" -o -name "*.xlsx" \) -delete
This finds files ending in either .ods or .xlsx and deletes them.
If you really want to use a regex, you could use the following:
find . -maxdepth 1 -regextype posix-extended -regex "(.*\.ods)|(.*\.xlsx)" -delete
Make sure that each alternative is enclosed in parentheses.
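One thing to watch with -o: find's implicit "and" binds more tightly than "or", so an action placed after an ungrouped -o applies only to the last alternative. The sketch below shows the grouped form on the sample file names from the question. It runs in a scratch directory (an assumption for illustration) and prints instead of deleting.

```shell
# Scratch directory holding the four sample files from the question
tmp=$(mktemp -d)
touch "$tmp/sample.jpg" "$tmp/sample.ods" "$tmp/sample.txt" "$tmp/sample.xlsx"

# The escaped parentheses make -type f (and any action added later)
# apply to both name tests, not just the last one
matched=$(cd "$tmp" && find . -type f \( -name '*.ods' -o -name '*.xlsx' \) | LC_ALL=C sort)
printf '%s\n' "$matched"

# Clean up the scratch directory
rm -r "$tmp"
```

Once the list looks right, appending -delete after the closing parenthesis removes exactly those files.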

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?

With any POSIX-compliant find:
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go:
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after the paths to make this work.
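To see how the three -name tests combine, here is a small sketch run against a scratch directory. The directory and the extra file names ABCDEFGH and ABC123 are made up for illustration; the other names come from the question.

```shell
# Scratch directory with a mix of matching and non-matching names
tmp=$(mktemp -d)
mkdir "$tmp/x"
touch "$tmp/A1234567" "$tmp/12345678" "$tmp/ABCDEFGH" "$tmp/ABC123" \
      "$tmp/x/ABCDABCD" "$tmp/x/09876543"

# 1) exactly eight characters
# 2) no character that is not a hex digit
# 3) at least one character that is not a decimal digit
hex_names=$(cd "$tmp" && find . -type f \
    -name '????????' \
    ! -name '*[![:xdigit:]]*' \
    -name '*[![:digit:]]*' | LC_ALL=C sort)
printf '%s\n' "$hex_names"

# Clean up the scratch directory
rm -r "$tmp"
```

Only ./A1234567 and ./x/ABCDABCD are listed. Note that [:xdigit:] also accepts lowercase a-f, which the [A-F0-9] class in the question does not.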

It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2

My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

Regex in Bash: not wanting to include directories

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks

It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"

If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
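Another option, relying only on features that POSIX specifies, is to hand each match to basename. This is a sketch under assumptions: it uses a scratch directory with a few of the paths from the question plus one made-up non-matching file (notes.JPG), and it starts one basename process per file, which is fine for small trees.

```shell
# Scratch tree mirroring part of the layout in the question
tmp=$(mktemp -d)
mkdir -p "$tmp/DCIM/103canon"
touch "$tmp/IMG_0085.JPG" "$tmp/DCIM/IMG_0042.JPG" \
      "$tmp/DCIM/103canon/IMG_0039.JPG" "$tmp/DCIM/notes.JPG"

# basename strips the directory part of each path that find passes in
names=$(cd "$tmp" && find . -type f -name '*MG_[0-9][0-9][0-9][0-9].JPG' \
    -exec basename {} \; | LC_ALL=C sort)
printf '%s\n' "$names"

# Clean up the scratch directory
rm -r "$tmp"
```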

Regex: Find files not ending with numeral suffix

I need to make a command which returns all files without a numeric suffix (*.0, *.123, ...).
For example, given three files:
gg.p qqq.449 rtr55
I want to find only these:
./rtr55
./gg.p
I tried to filter them using grep, but the filter had no effect:
find -type f | grep -v '\.[0-9]+$'
(This command returned:)
./qqq.449
./rtr55
./gg.p
So there is probably an error in the regex. Do you know how to fix it?

The + operator belongs to extended regular expressions; in a basic regular expression, which grep uses by default, an unescaped + is a literal character. There are several workarounds:
find -type f | grep -v '\.[0-9]\+$'
find -type f | egrep -v '\.[0-9]+$'
find -type f | grep -E -v '\.[0-9]+$'
find -type f | grep -v '\.[0-9][0-9]*$'
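The difference is easy to reproduce without touching the file system. The sketch below feeds the three paths from the question to grep through a variable (the variable names are only for illustration) and compares the basic and extended forms.

```shell
# The three paths from the question, one per line
files='./qqq.449
./rtr55
./gg.p'

# Basic regex: + is a literal plus sign, so no line matches and -v keeps all three
bre=$(printf '%s\n' "$files" | grep -v '\.[0-9]+$')

# Extended regex: + means "one or more", so ./qqq.449 matches and is dropped
ere=$(printf '%s\n' "$files" | grep -E -v '\.[0-9]+$')

printf 'basic:\n%s\nextended:\n%s\n' "$bre" "$ere"
```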

Why would you use grep at all?
find -regex '.*\.[0-9][0-9]*' -prune -o -type f -print
If your expressions are simple enough (or your find doesn't support -regex), you could use -name instead of -regex, but a glob wildcard can't match an arbitrary number of digits after the dot. Here's a version for one or two digits:
find -name '*.[0-9]' -prune -o -name '*.[0-9][0-9]' -prune -o -type f -print
Notice that this isn't purely an efficiency question; grep would simply not do the right thing if you ever come across file names with newlines in them.
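One detail worth checking with -prune: when the expression contains no action other than -prune, find applies -print to the whole expression, so the pruned names are listed as well. An explicit -print on the right-hand side of -o avoids that. The sketch below compares the two forms in a scratch directory holding the three file names from the question. It assumes a find that supports -regex.

```shell
# Scratch directory with the three files from the question
tmp=$(mktemp -d)
touch "$tmp/gg.p" "$tmp/qqq.449" "$tmp/rtr55"

# No explicit action: the default -print covers the pruned branch too
implicit=$(cd "$tmp" && find . -regex '.*\.[0-9][0-9]*' -prune -o -type f | LC_ALL=C sort)

# Explicit -print: only the right-hand branch prints
explicit=$(cd "$tmp" && find . -regex '.*\.[0-9][0-9]*' -prune -o -type f -print | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"

# Clean up the scratch directory
rm -r "$tmp"
```

The first list still contains ./qqq.449; the second contains only ./gg.p and ./rtr55.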

find file with numeric values greater than a specified number

When I run the following command, I get a list of files
find . -type f -name '*_duplicate_[0-9]*.txt'
./prefix_duplicate_001.txt
./prefix_duplicate_002.txt
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt
Now I'm only interested in files which have the numbers greater than or equal to 003. How can I get this done?
Thank you in advance.

Using the -regex option of GNU find, you can write a regex that matches all files whose number after _duplicate_ is 3 or higher, allowing for leading zeros:
find . -regextype posix-extended -type f \
-regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
On macOS, use this form instead:
find -E . -type f -regex '.*_duplicate_0*([3-9]|[1-9][0-9])[0-9]*\.txt'
./prefix_duplicate_003.txt
./prefix_duplicate_004.txt
./prefix_duplicate_005.txt

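Use this pattern. It relies on a negative lookahead, so it needs a PCRE-capable tool such as grep -P rather than find -regex:
.*_duplicate_(?!00[1-2])\d{3}\.txt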

As much as I like to use single commands when possible, I think maybe this is what you need here:
find . -type f -name '*_duplicate_[0-9]*.txt' | awk -F '[_.]' '$4 >= 3 { print $0 }'
Here $4 is the number only because a path like ./prefix_duplicate_003.txt splits into an empty first field, /prefix, duplicate, 003 and txt; any other underscore or dot in the path shifts the field index.
There are variations on that, for example:
find . -type f -name "*.txt" | awk -F '[_.]' '$0 ~ /_duplicate_[0-9]*\.txt$/ && $4 >= 3 { print $0 }'
But I'm not sure it really makes a difference from an efficiency standpoint...
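If neither a regex nor awk field counting appeals, the number can also be cut out with shell parameter expansion and compared as an integer. This is a sketch under assumptions: file names follow the prefix_duplicate_NNN.txt form from the question, all files sit in one directory, and the threshold variable min and the scratch directory are made up for illustration.

```shell
# Scratch directory with hypothetical numbered files
tmp=$(mktemp -d)
for n in 001 002 003 004 005 010; do
    touch "$tmp/prefix_duplicate_$n.txt"
done

min=3   # hypothetical threshold: keep numbers greater than or equal to this

selected=$(
    cd "$tmp" || exit 1
    for f in ./*_duplicate_[0-9]*.txt; do
        num=${f##*_duplicate_}   # drop everything up to and including _duplicate_
        num=${num%.txt}          # drop the extension, leaving e.g. 003
        # [ -ge ] compares decimal integers, so leading zeros are harmless here
        if [ "$num" -ge "$min" ]; then
            printf '%s\n' "$f"
        fi
    done
)
printf '%s\n' "$selected"

# Clean up the scratch directory
rm -r "$tmp"
```

Unlike the find and awk variants, this loop does not descend into subdirectories.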

00[3-9]|(([1-9]\\d\\d)|(\\d[1-9]\\d))
This covers only the three-digit number part.
