Find directory with regular expression: white spaces, command find - regex

I had to write a bash script that finds the directories in the current directory whose names start with a letter of the alphabet ([A-Za-z]). In the shell I wrote:
find . -maxdepth 1 -name '[[:alpha:]]*' -type d
and it's ok. But in the script I wrote:
#! /bin/bash
files=$(find . -maxdepth 1 -name '[[:alpha:]]*' -type d)
for FILE in $files; do echo 'you are in', $FILE; done;
But when it finds a directory with whitespace in its name (e.g. ./New Directory), the output is
./New
Directory
as if they were two different directories. Why? How can I resolve this problem?

find . -maxdepth 1 -type d -regextype posix-awk -regex ".*/[[:alpha:]].*" -exec echo "you are in" {} \;

This may work for you:
find . -maxdepth 1 -name '[[:alpha:]]*' -type d | sed 's/^/You are in /'
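The loop in the question breaks because the unquoted $files expansion is split on whitespace, so ./New Directory becomes two words. A minimal sketch of one way to avoid that, by letting find pass each path to a small inline script as a single argument. The function name list_alpha_dirs is made up for illustration, and -mindepth/-maxdepth are assumed to be available (they are in GNU and BSD find):

```shell
#!/bin/bash
# Hypothetical helper: print "you are in <dir>" for every directory directly
# under the directory given as $1 whose name starts with a letter.
list_alpha_dirs() {
    # find hands each path to the inline script as one argument,
    # so names containing spaces are never split into words.
    # -mindepth 1 keeps the starting directory itself out of the result.
    find "$1" -mindepth 1 -maxdepth 1 -type d -name '[[:alpha:]]*' \
        -exec sh -c 'for dir do echo "you are in $dir"; done' sh {} +
}
```

Calling list_alpha_dirs . in a directory that contains New Directory prints "you are in ./New Directory" on a single line.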

Related

RegEx search on linux filesystem based on filenames

I tried the following command, find . | egrep -v '.*/[A-Z]{3}-[0-9]{8}-.', to recursively search for files (not folders) whose names do not follow the pattern. It also displays folders! What am I missing?
You can use find directly with the -not option:
find . -type f -regextype posix-egrep -not -regex '.*/[A-Z]{3}-[0-9]{8}-[^/]*$' -exec basename {} \;
With GNU find, you can use -printf instead of calling basename:
find . -type f -regextype posix-egrep -not -regex '.*/[A-Z]{3}-[0-9]{8}-[^/]*$' -printf "%f\n"
Details:
-type f - returns only regular files, so folders are skipped
-regextype posix-egrep sets the regex flavor to POSIX ERE
-not reverses the regex result
.*/[A-Z]{3}-[0-9]{8}-[^/]*$ - matches paths where file names start with three uppercase letters, -, eight digits, - and then can have any text other than / till the end of the string
-exec basename {} \; / -printf "%f\n" only prints the file names without folders (see Have Find print just the filenames, not full paths)
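To see the effect of these options together, here is a small sketch that builds a throwaway directory and runs the GNU variant on it. It assumes GNU find (-regextype and -printf are GNU extensions), and the file names are invented for the demonstration:

```shell
# Throwaway directory with one matching file, one matching folder and two
# files whose names do not follow the AAA-NNNNNNNN- pattern.
demo=$(mktemp -d)
mkdir "$demo/ABC-20200101-dir"
touch "$demo/ABC-20200101-report.txt" "$demo/notes.txt" \
      "$demo/ABC-20200101-dir/readme.md"

# List regular files whose own name does not follow the pattern.
# [^/]* cannot cross a slash, so only the last path component is tested.
nonmatching=$(find "$demo" -type f -regextype posix-egrep \
    -not -regex '.*/[A-Z]{3}-[0-9]{8}-[^/]*$' -printf '%f\n' | sort)
echo "$nonmatching"
rm -rf "$demo"
```

The folder is skipped by -type f, and readme.md is listed even though its parent folder follows the pattern, because the pattern is only allowed to match the final path component.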

Using regex OR with find to list and delete files

I have a folder with these files:
sample.jpg
sample.ods
sample.txt
sample.xlsx
Now, I need to find and remove files that end with either .ods or .xlsx.
To fish them out I initially use:
ls | grep -E "*.ods|*.xlsx"
This gives me:
sample.ods
sample.xlsx
Now, I don't want to parse ls so I use find:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | wc -l
But that outputs 1, while I expect 2 files. I then extend the command to:
find . -type f -regextype grep -regex '.*/*.ods\|*.xlsx' | xargs -d"\n" rm
This works, but it removes only the .ods file and not the .xlsx one.
What am I missing here?
I'm on ubuntu 18.04 and my find version is find (GNU findutils) 4.7.0-git.
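You don't need a regex here. Use -name with -or, and group the alternatives in escaped parentheses so that -type f and -delete apply to both patterns:
find . -type f \( -name "*.ods" -or -name "*.xlsx" \) -delete
This finds the files ending in either .ods or .xlsx and deletes them.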
If you really wanted to use regex, you could use the following:
find . -maxdepth 1 -regextype posix-extended -regex "(.*\.ods)|(.*\.xlsx)" -delete
Make sure that each alternative is enclosed in parentheses.
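Two details are easy to miss here. In find, the implicit AND between tests binds more tightly than -o, so an action placed after the last -name belongs to that branch only, unless the alternatives are grouped with \( \). And -regex has to match the whole path, which is why the second alternative in the question, *.xlsx, could not match ./sample.xlsx. A minimal sketch of the precedence issue, using -print instead of -delete so that nothing is removed; the temporary directory exists only for the demonstration:

```shell
demo=$(mktemp -d)
touch "$demo/sample.jpg" "$demo/sample.ods" "$demo/sample.txt" "$demo/sample.xlsx"

# Ungrouped: parsed as (-type f -name '*.ods') -o (-name '*.xlsx' -print),
# so -print is attached to the .xlsx branch only.
ungrouped=$(find "$demo" -type f -name '*.ods' -o -name '*.xlsx' -print | sort)

# Grouped: -type f and -print apply to both patterns.
grouped=$(find "$demo" -type f \( -name '*.ods' -o -name '*.xlsx' \) -print | sort)

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"
rm -rf "$demo"
```

Running the command with -print first is also a cheap way to check what -delete would remove.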

Regex in Bash: not wanting to include directories

I have a list of images, collected using the following line:
# find . -mindepth 1 -type f -name "*.JPG" | grep "MG_[0-9][0-9][0-9][0-9].JPG"
output:
./DCIM/103canon/IMG_0039.JPG
./DCIM/103canon/IMG_0097.JPG
./DCIM/103canon/IMG_1600.JPG
./DCIM/103canon/IMG_2317.JPG
./DCIM/IMG_0042.JPG
./DCIM/IMG_1152.JPG
./DCIM/IMG_1810.JPG
./DCIM/IMG_2564.JPG
./images/IMG_0058.JPG
./images/IMG_0079.JPG
./images/IMG_1233.JPG
./images/IMG_1959.JPG
./images/IMG_2012/favs/IMG_0039.JPG
./images/IMG_2012/favs/IMG_1060.JPG
./images/IMG_2012/favs/IMG_1729.JPG
./images/IMG_2012/favs/IMG_2013.JPG
./images/IMG_2012/favs/IMG_2317.JPG
./images/IMG_2012/IMG_0079.JPG
./images/IMG_2012/IMG_1403.JPG
./images/IMG_2012/IMG_2102.JPG
./images/IMG_2013/IMG_0060.JPG
./images/IMG_2013/IMG_1311.JPG
./images/IMG_2013/IMG_1729.JPG
./images/IMG_2013/IMG_2013.JPG
./IMG_0085.JPG
./IMG_1597.JPG
./IMG_2288.JPG
However, I only want the very last portion, the IMG_\d\d\d\d.JPG. I have tried hundreds of regular expressions and this is the one that gives me the best result. Is there a way to print only the filename without the directory tree before it, or is it solely down to the regex?
Thanks
It should be
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -printf "%f\n"
If the -printf option is not available with your implementation of find (as in current versions of Mac OS X),
then you can use -execdir echo {} \; instead (if that's available):
find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -execdir echo {} \;
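If neither -printf nor -execdir is available, the directory part can also be stripped in the shell with the ${var##*/} parameter expansion. A minimal sketch, assuming a POSIX shell; the function name print_basenames is hypothetical, and the pattern is written as IMG_ followed by four digits rather than the looser *MG_ used above:

```shell
# Hypothetical helper: print only the file name of every IMG_NNNN.JPG
# found anywhere below the directory given as $1.
print_basenames() {
    # ${path##*/} removes the longest prefix ending in a slash,
    # which leaves just the last path component.
    find "$1" -type f -name 'IMG_[0-9][0-9][0-9][0-9].JPG' \
        -exec sh -c 'for path do printf "%s\n" "${path##*/}"; done' sh {} +
}
```

For example, print_basenames . would print IMG_0039.JPG for ./DCIM/103canon/IMG_0039.JPG.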

Why does -name work but I cannot use -regex in Mac terminal?

On my Mac running Yosemite I have a directory of files on my Desktop:
29_foo10.bar
29_foo2.bar
29_foo20.bar
29_foo3.bar
I want to target the files with a single digit after foo. When I use find -name I can target the selection of files with:
USERNAME=darthvader
DIRECTORY="/Users/$USERNAME/desktop/test"
for theProblem in $(find $DIRECTORY -type f -name '29_foo[0-9].bar'); do
cd $DIRECTORY
echo $theProblem
done
and I get the two files in the terminal (29_foo2.bar & 29_foo3.bar), but when I try -regex it returns nothing. Code:
for theProblem in $(find $DIRECTORY -type f -regex '29_foo[0-9].bar'); do
cd $DIRECTORY
echo $theProblem
done
I did some research and found OS X Find in bash with regex digits \d not producing expected results
so I modified my code to:
for theProblem in $(find -E $DIRECTORY -iregex '29_foo[0-9].bar'); do
and I get:
No such file or directory
so I tried:
for theProblem in $(find -E $DIRECTORY -type f -regex '29_foo[0-9].bar'); do
but I still get:
No such file or directory
So I did some further research and found bash: recursively find all files that match a certain pattern and tested:
for theProblem in $(find $DIRECTORY -regex '29_foo[0-9].bar'); do
and I get:
No such file or directory
Further down the rabbit hole I found How to use regex in file find so I tried:
for theProblem in $(find $DIRECTORY -regextype posix-extended -regex '29_foo[0-9].bar'); do
and terminal tells me:
find: -regextype: unknown primary or operator
Why am I not able to target the files with -regex in Yosemite? man regex does show me the manual, and my bash version is 3.2.57. So why does -name work when -regex does not?
You should use this regex in find:
find "$DIRECTORY" -type f -regex '.*/29_foo[0-9]\.bar$'
The changes are:
.*/ is required at the start because -regex is matched against the whole path, so each filename is preceded by either a dot or some directory path.
The $ anchor at the end makes the intent explicit. It is optional, because -regex has to match the entire path, so a name such as 29_foo3.barn would not match either way.
The dot needs to be escaped in -regex; otherwise it matches any character.
The find command above works with the OS X find as well as with GNU find.
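The whole-path behaviour can be checked with a small experiment. This is only a sketch: the temporary directory and the extra file 29_foo3.barn are made up for the demonstration, and the default regex flavour of find is assumed.

```shell
demo=$(mktemp -d)
touch "$demo/29_foo2.bar" "$demo/29_foo3.bar" "$demo/29_foo10.bar" \
      "$demo/29_foo20.bar" "$demo/29_foo3.barn"

# No match: the string find tests is "$demo/29_foo2.bar", not "29_foo2.bar".
bare=$(find "$demo" -type f -regex '29_foo[0-9]\.bar')

# Match: .*/ absorbs the directory part of the path. 29_foo3.barn is still
# excluded, because the regex has to cover the path up to its last character.
anchored=$(find "$demo" -type f -regex '.*/29_foo[0-9]\.bar' | sort)

echo "bare: ${bare:-<nothing>}"
echo "anchored:"; echo "$anchored"
rm -rf "$demo"
```

By contrast, -name is compared with the last path component only, which is why the glob in the question worked without any prefix.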

Recursively go through directories and files in bash + use wc

I need to go recursively through directories. The first argument must be the directory to start from; the second argument is a regex that describes the file name.
e.g. ./myscript.sh directory "regex"
While the script recursively goes through directories and files, it must use wc -l to count the lines in the files whose names match the regex.
How can I use find with -exec to do that? Or is there some other way to do it? Please help.
Thanks
Yes, you can use find:
$ find DIR -iname "regex" -type f -exec wc -l '{}' \;
Or, if you want to count the total number of lines, in all files:
$ find DIR -iname "regex" -type f -exec wc -l '{}' \; | awk '{ SUM += $1 } END { print SUM }'
Your script would then look like:
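#!/bin/bash
# $1 - name of the directory - first argument
# $2 - file name pattern - second argument (-iname takes a shell glob, matched case-insensitively)
if [ $# -lt 2 ]; then
    # Print the usage message to stderr and signal failure to the caller
    echo 'Usage: ./myscript.sh DIR "REGEX"' >&2
    exit 1
fi
find "$1" -iname "$2" -type f -exec wc -l '{}' \;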
Edit: -iname takes a shell glob pattern, not a regular expression. If you need real regular expressions, use -regextype posix-extended and -regex instead of -iname, as noted by @sudo_O in his answer.
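A sketch of what that regex-based variant could look like, written as a function for easier testing. It assumes GNU find (for -regextype); the name count_lines and the way the pattern is wrapped in .*/( ) are choices made for this example, not part of the original answer:

```shell
#!/bin/bash
# Hypothetical variant of myscript.sh: $1 = start directory, $2 = extended
# regex intended to describe the file name. Prints the total number of
# lines in all matching files.
count_lines() {
    # -regex tests the whole path, so .*/ covers the directory part and
    # the parentheses keep any alternation in $2 inside the file name part.
    find "$1" -type f -regextype posix-extended -regex ".*/($2)" \
        -exec cat {} + | wc -l
}
```

For instance, count_lines DIR 'log[0-9]+\.txt' would add up the lines of log1.txt, log22.txt and so on below DIR. Replacing cat {} + with wc -l {} + would give per-file counts instead.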