Using regex OR with find to list and delete files

I have a folder with these files:
sample.jpg
sample.ods
sample.txt
sample.xlsx
Now, I need to find and remove files that end with either .ods or .xlsx.
To fish them out I initially use:
ls | grep -E "*.ods|*.xlsx"
This gives me:
sample.ods
sample.xlsx
Now, I don't want to parse ls so I use find:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | wc -l
But that gives me the output of 1 while I expect to have 2 files before I extend the command to:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | xargs -d"\n" rm
Which works but removes only the .ods file but not the .xlsx one.
What am I missing here?
I'm on ubuntu 18.04 and my find version is find (GNU findutils) 4.7.0-git.

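You don't need a regex here; -name tests combined with -o are enough. Group them with escaped parentheses so that -type f and -delete apply to both patterns:
find . -type f \( -name "*.ods" -o -name "*.xlsx" \) -delete
This finds files ending in either .ods or .xlsx and deletes them. Without the parentheses, the implicit AND binds tighter than -o, so -delete would apply only to the .xlsx branch.
If you really want to use a regex, the following works:
find . -maxdepth 1 -regextype posix-extended -regex "(.*\.ods)|(.*\.xlsx)" -delete
-regex has to match the whole path, so each alternative needs its own leading .* (your second alternative, *.xlsx, lacked it, which is why only the .ods file matched). The parentheses around each alternative are optional with posix-extended, but they make the grouping explicit.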
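Before running -delete for real, it can help to rehearse the command in a scratch directory. Here is a minimal sketch, assuming a find that supports -delete (GNU and BSD both do) and a throwaway directory created with mktemp; the file names mirror the ones in the question.

```shell
#!/bin/sh
# Rehearse the grouped -name test in a throwaway directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch sample.jpg sample.ods sample.txt sample.xlsx

# Dry run: -print instead of -delete shows what would be removed.
matched=$(find . -type f \( -name '*.ods' -o -name '*.xlsx' \) -print | LC_ALL=C sort)
echo "$matched"

# The escaped parentheses make -type f and -delete apply to both patterns.
find . -type f \( -name '*.ods' -o -name '*.xlsx' \) -delete

# Whatever is left should be the files that did not match.
remaining=$(find . -type f | LC_ALL=C sort)
echo "$remaining"
```

Swapping -print for -delete only after the dry run looks right is a cheap safeguard, since -delete cannot be undone.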

Related

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?
With any POSIX-compliant find
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after paths to make this work.
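To see how the three -name tests combine, here is a small sketch that runs them in a scratch directory. The file names are made up for illustration. Note that [:xdigit:] also accepts lowercase a-f, which is slightly broader than the A-F range in the question.

```shell
#!/bin/sh
# Scratch directory with a mix of matching and non-matching names.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir x
touch 01234567 ABCDEF01 ABCDEFAF ABCDEFG1 ABCDEF012 x/12345ABC x/09876543

# 1st -name: exactly eight characters.
# 2nd -name (negated): no character outside the hex digits.
# 3rd -name: at least one character that is not a decimal digit.
hits=$(find . -type f \
    -name '????????' \
    ! -name '*[![:xdigit:]]*' \
    -name '*[![:digit:]]*' | LC_ALL=C sort)
echo "$hits"
```

Here the all-decimal names, the nine-character name and the name containing G should all be filtered out.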
It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

Regex in Bash: not wanting to include directories

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions, and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks
It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"
If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
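If neither -printf nor -execdir is available, another option (not from the answer above, just a portable fallback) is to let basename strip the directory part. A minimal sketch in a scratch directory, with a few made-up paths modelled on the listing in the question:

```shell
#!/bin/sh
# Scratch tree loosely modelled on the question's layout.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir -p DCIM/103canon images
touch DCIM/103canon/IMG_0039.JPG images/IMG_0058.JPG IMG_0085.JPG notes.txt

# -exec basename {} \; runs basename once per match and prints only the
# final path component; both -exec and basename are specified by POSIX.
names=$(find . -type f -name '*MG_[0-9][0-9][0-9][0-9].JPG' \
    -exec basename {} \; | LC_ALL=C sort)
echo "$names"
```

This spawns one process per file, so it is slower than -printf on large trees, but it should behave the same on GNU and BSD find.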

Why does -name work but I cannot use -regex in Mac terminal?

On my Mac running Yosemite I have a directory of files on my Desktop:
29_foo10.bar
29_foo2.bar
29_foo20.bar
29_foo3.bar
I want to target the files with a single digit after foo. When I use find -name I can target the selection of files with:
USERNAME=darthvader
DIRECTORY="/Users/$USERNAME/desktop/test"
for theProblem in $(find $DIRECTORY -type f -name '29_foo[0-9].bar'); do
cd $DIRECTORY
echo $theProblem
done
and I'm returned the two files in the terminal (29_foo2.bar & 29_foo3.bar) but when I try using -regex it returns nothing, code:
for theProblem in $(find $DIRECTORY -type f -regex '29_foo[0-9].bar'); do
cd $DIRECTORY
echo $theProblem
done
I did some research and found OS X Find in bash with regex digits \d not producing expected results
so I modified my code to:
for theProblem in $(find -E $DIRECTORY -iregex '29_foo[0-9].bar'); do
and I'm returned:
No such file or directory
so I tried:
for theProblem in $(find -E $DIRECTORY -type f -regex '29_foo[0-9].bar'); do
but I still get:
No such file or directory
So I did some further research and found bash: recursively find all files that match a certain pattern and tested:
for theProblem in $(find $DIRECTORY -regex '29_foo[0-9].bar'); do
and I'm returned:
No such file or directory
Further down the rabbit hole I found How to use regex in file find so I tried:
for theProblem in $(find $DIRECTORY -regextype posix-extended -regex '29_foo[0-9].bar'); do
and terminal tells me:
find: -regextype: unknown primary or operator
Why in Yosemite am I not able to target the files with -regex? When I do man regex I am returned the manual, my bash version is 3.2.57. So why does -name work when -regex will not?
You should use this regex in find:
find "$DIRECTORY" -type f -regex '.*/29_foo[0-9]\.bar$'
Changes are:
.*/ is required at the start because -regex is matched against the whole path, so there will be either ./ or some directory path before each filename.
The $ anchor at the end makes it explicit that 29_foo3.barn must not match, should such a file exist (find already requires the regex to match the entire path, so the anchor is harmless but redundant).
The dot needs to be escaped in -regex; otherwise it matches any character.
The find command above works with OS X find as well as with GNU find.
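The difference between -name and -regex is easy to check in a scratch directory. A small sketch, using the file names from the question and assuming only that find supports -regex (GNU and BSD both do):

```shell
#!/bin/sh
# Recreate the four files from the question in a throwaway directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 29_foo10.bar 29_foo2.bar 29_foo20.bar 29_foo3.bar

# -name is tested against the last path component only.
by_name=$(find . -type f -name '29_foo[0-9].bar' | LC_ALL=C sort)

# -regex is tested against the whole path (e.g. ./29_foo2.bar),
# so a pattern without a leading .*/ matches nothing.
bare=$(find . -type f -regex '29_foo[0-9]\.bar' | wc -l | tr -d ' ')

# With .*/ in front, the directory part is consumed first.
by_regex=$(find . -type f -regex '.*/29_foo[0-9]\.bar' | LC_ALL=C sort)

echo "bare regex matched $bare file(s)"
echo "$by_regex"
```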

How to find files with regex and list them?

I am new to the whole command-line thing and trying to figure out how to search the current directory and its sub directories for files with a specific filename via regex. Then I want to have the files listed in my command-line.
The regex should match files like:
B2ctes_UCUAAwF-K-large-123x322-132x423.jpg
this_is-a-123-file_name-3124x2445-4235x32.jpeg
file-32x32-64x64.png
The important part is the -[number]x[number]-[number]x[number]
My attempt looks like this:
find . -type f -regex ".+?-\d+x\d+-\d+x\d+\.\w{3,4}" -ls;
There are two problems with this:
-ls shows a lot of information. I just want the filenames.
The regex doesn’t work. I have tried to use .+, but even that does not return anything.
You can use this find with regex:
find . -regextype posix-extended -type f -regex ".*-[[:digit:]]+x[[:digit:]]+-[[:digit:]]+x[[:digit:]]+\.[[:alnum:]]{3,4}"
Or on OSX:
find -E . -type f -regex ".*-[[:digit:]]+x[[:digit:]]+-[[:digit:]]+x[[:digit:]]+\.[[:alnum:]]{3,4}"
And without regex:
find . -type f -name "*-[[:digit:]]*x[[:digit:]]*-[[:digit:]]*x[[:digit:]]*.[[:alnum:]]*"
What about simply:
find . -type f -name '*-[0-9]*x[0-9]*-[0-9]*x[0-9]*'
or
find . -type f -regextype posix-egrep -regex '.*-[0-9]+x[0-9]+-[0-9]+x[0-9]+.*'
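Since -regextype is GNU-only and -E is BSD-only, a variant that avoids both is to let find list the files and have grep -E apply the pattern. This is a different technique from the answers above, sketched here with made-up file names; it assumes no file name contains a newline.

```shell
#!/bin/sh
# Scratch tree with two matching and two non-matching names.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir sub
touch file-32x32-64x64.png \
      sub/this_is-a-123-file_name-3124x2445-4235x32.jpeg \
      plain.jpg file-32x32.png

# grep sees one path per line; the trailing $ anchors the extension.
# The -- stops grep from reading the leading "-" of the pattern as an option.
found=$(find . -type f \
    | grep -E -- '-[0-9]+x[0-9]+-[0-9]+x[0-9]+\.[[:alnum:]]{3,4}$' \
    | LC_ALL=C sort)
echo "$found"
```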

find all files except e.g. *.xml files in shell

Using bash, how to find files in a directory structure except for *.xml files?
I'm just trying to use
find . -regex ....
regexe:
'.*^((?!xml).)*$'
but without expected results...
or is there another way to achieve this, i.e. without a regexp matching?
find . ! -name "*.xml" -type f
find . -not -name '*.xml'
Should do the trick.
Sloppier than the find solutions above, and it does more work than it needs to, but you could do
find . | grep -v '\.xml$'
Also, is this a tree of source code? Maybe you have all your source code and some XML in a tree, but you want to only get the source code? If you were using ack, you could do:
ack -f --noxml
With bash (4.0 or later, which globstar requires):
shopt -s extglob globstar nullglob
for f in **/!(*.xml); do
[[ -d $f ]] && continue
# do stuff with $f
done
You can also do it with or-ring as follows:
find . -type f -name "*.xml" -o -type f -print
Try something like this for a regex solution:
find . -regextype posix-extended -not -regex '^.*\.xml$'
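For a quick sanity check of the negated -name form, here is a sketch against a made-up source tree in a scratch directory. It uses only POSIX find features, so it should not depend on GNU extensions.

```shell
#!/bin/sh
# Scratch tree mixing XML and non-XML files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
mkdir -p src/conf
touch src/main.c src/conf/app.xml src/conf/app.ini build.xml

# ! negates the -name test; -type f keeps directories out of the list.
others=$(find . -type f ! -name '*.xml' | LC_ALL=C sort)
echo "$others"
```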