find works in terminal but fails in bash script - regex

So I'm writing a bash script that counts the number of files in a directory and outputs a number. The function takes a directory argument as well as an optional file-type extension argument.
I am using the following lines to set the dir variable to the directory and ext variable to a regular expression that will represent all the file types to count.
dir=$1
[[ $# -eq 2 ]] && ext="*.$2" || ext="*"
The problem I am encountering occurs when I attempt to run the following line:
echo $(find $dir -maxdepth 1 -type f -name $ext | wc -l)
Running the script from the terminal works when I provide the second file-type argument but fails when I don't.
harrison@Luminous:~$ bash Documents/howmany.sh Documents/ sh
3
harrison@Luminous:~$ bash Documents/howmany.sh Documents/
find: paths must precede expression: Desktop
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
0
I have searched for this error and I know it's an issue with the shell expanding my wildcard as explained here. I've tried experimenting with single quotes, double quotes, and backslashes to escape the asterisk but nothing seems to work. What's particularly interesting is that when I try running this directly through the terminal, it works perfectly fine.
harrison@Luminous:~$ echo $(find Documents/ -maxdepth 1 -type f -name "*" | wc -l)
6

Simplified:
dir=${1:-.}        # if $1 is not set, use .
name=${2+*.$2}     # if $2 is set, use *.$2 for name
name=${name:-*}    # if name is still empty, use *
find "$dir" -name "$name" -print    # quote both expansions
or
name=${2+*.$2} #if $2 is set use *.$2 for name
find "${1:-.}" -name "${name:-*}" -print #use quotes
Also, as @John Kugelman says, you could use:
name=${2+*.$2}
find "${1:-.}" ${name:+-name "$name"} -print
find . -name "*" -print is the same as find . -print, so if $name isn't set, there's no need to specify -name "*".
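Putting these pieces together, a minimal sketch of the whole counter might look like the following. The function name howmany and the array name_test are illustrative choices, not part of the original script; the array only carries the optional -name test, so the pattern reaches find without being expanded by the shell.

```shell
#!/usr/bin/env bash
# Sketch only: count regular files directly inside a directory,
# optionally restricted to one extension. Names are illustrative.
howmany() {
    local dir=${1:-.}          # default to the current directory
    local -a name_test=()      # stays empty when no extension is given
    if [[ -n ${2-} ]]; then
        name_test=(-name "*.$2")   # quoted, so the shell never expands the *
    fi
    # As in the question, this counts output lines, so a file name that
    # contains a newline would be counted more than once.
    find "$dir" -maxdepth 1 -type f "${name_test[@]}" | wc -l
}
```

It would be called as howmany Documents sh or just howmany Documents; with no extension the -name test is omitted entirely, which matches the observation that -name "*" is redundant.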

Try this:
dir="$1"
[[ $# -eq 2 ]] && ext="*.$2" || ext='*'
find "$dir" -maxdepth 1 -type f -name "$ext" | wc -l    # quote "$ext" so find receives the pattern unexpanded
If that doesn't work, you can switch to an if statement that uses the -name pattern in one branch and omits it in the other.
A couple more points:
Those are not regular expressions, but rather shell patterns.
echo $(command) is essentially equivalent to plain command, so the echo and the command substitution can be dropped.

Related

Getting specific text from filenames in Bash script

I'm trying to write a bash script that extracts a specific piece of text from the filenames in a directory. For example,
I have these files in my folder:
.\file1.txt
.\file2.hex
.\DSConf-11.22.33[4444].pkg
.\file3.cpp
What I need to do is extract the number 4444 from the filename above. I wrote a script, but it doesn't work and I can't find the mistake:
#!/usr/bin/bash
FILENAME=$(find . -maxdepth 1 -name "DSConf-*.*.*\[*\].pkg")
PATTERN='^DSConf-(\d+).(\d+).(\d+)\[([^)]+)\]'
[[ $FILENAME =~ $PATTERN ]]
echo "${BASH_REMATCH[4]}"
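for f in ./*; do echo "$f" | grep -Eo 'DSConf-[0-9]+\.[0-9]+\.[0-9]+\[[^]]+\]'; done
will print the matching part of each such filename, here DSConf-11.22.33[4444] (grep -E does not understand \d, so [0-9] is used instead).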
You may look for files with names starting with DSConf- and ending with .pkg using find -name "DSConf-*.pkg", and then get the digits between the square brackets using sed:
find . -maxdepth 1 -name "DSConf-*.pkg" | sed -n 's/.*\[\([0-9]*\)].*/\1/p'
The sed -n 's/.*\[\([0-9]*\)].*/\1/p' command suppresses sed line output using -n, then s/.*\[\([0-9]*\)].*/\1/p substitutes the whole string with just the 0 or more digits between [ and ] and p prints that number.
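The original script can also be made to work without external tools. Bash's =~ operator uses POSIX extended regular expressions, which have no \d, so the digits are written as [0-9], the dots are escaped, and the bracket expression [^]]+ stops at the closing square bracket. The function name dsconf_id below is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the text between [ and ] in names such as
# DSConf-11.22.33[4444].pkg. dsconf_id is a hypothetical helper name.
dsconf_id() {
    local name=${1##*/}    # strip any leading directory part
    # Kept in a variable and used unquoted so bash treats it as a regex.
    local pattern='^DSConf-[0-9]+\.[0-9]+\.[0-9]+\[([^]]+)\]\.pkg$'
    if [[ $name =~ $pattern ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1           # name does not have the expected shape
    fi
}
```

Calling dsconf_id './DSConf-11.22.33[4444].pkg' prints 4444, while a name such as file1.txt prints nothing and returns a non-zero status.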

bash compare regex expression and not the variable

I want to remove every file whose name contains a given substring and ignore the files that do not contain it, so I use a regular expression:
str=9009
patt=*v[0-9]{3,}*.txt
for i in "${patt}"; do echo "$i"
if ! [[ "$i" =~ $str ]]; then rm "$i" ; fi
done
but I got an error :
*v[0-9]{3,}*.txt
rm: cannot remove '*v[0-9]{3,}*.txt': No such file or directory
The file names look like this: mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
bash filename expansion does not use regular expressions. See https://www.gnu.org/software/bash/manual/bash.html#Filename-Expansion
To find files with "v followed by 3 or more digits followed by .txt" you'll have to use bash's extended pattern matching.
A demonstration:
$ shopt -s extglob
$ touch mari_v9009.txt femme_v9009.txt mari_v9010.txt femme_v9010.txt
$ touch foo_v12.txt
$ for f in *v[0-9][0-9]+([0-9]).txt; do echo "$f"; done
femme_v9009.txt
femme_v9010.txt
mari_v9009.txt
mari_v9010.txt
What you have with this pattern for i in *v[0-9]{3,}*.txt is:
first, bash performs brace expansion which results in
for i in *v[0-9]3*.txt *v[0-9]*.txt
then, the first word *v[0-9]3*.txt results in no matches, and the default behaviour of bash is to leave the pattern as a plain string. rm tries to delete the file named literally "*v[0-9]3*.txt" and that gives you the "file not found error"
next, the second word *v[0-9]*.txt gets expanded, but the expansion will include files you don't want to delete.
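The "leave the pattern as a plain string" behaviour described above can be switched off with the nullglob shell option, so that a glob with no matches expands to nothing at all. A small sketch, assuming the simple pattern *v[0-9][0-9][0-9]*.txt stands in for "v followed by at least three digits"; list_versioned is a hypothetical helper name, and its body runs in a subshell so the option change does not leak into the caller.

```shell
#!/usr/bin/env bash
# Sketch: list files matching a glob without ever yielding the literal pattern.
# The ( ... ) function body is a subshell, so nullglob is only set inside it.
list_versioned() (
    shopt -s nullglob    # an unmatched glob now expands to zero words
    for f in *v[0-9][0-9][0-9]*.txt; do
        printf '%s\n' "$f"
    done
)
```

In a directory with no matching files the loop body simply never runs, so a command such as rm is never handed the unexpanded pattern.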
I missed the not from the question.
Try this: within [[ ... ]], the == and != operators are pattern-matching operators, and extended globbing is enabled by default.
keep_pattern='*v[0-9][0-9]+([0-9]).txt'
for file in *; do
if [[ $file != $keep_pattern ]]; then
echo rm "$file"
fi
done
But find would be preferable here, if it's OK to descend into subdirectories:
find . -regextype posix-extended '!' -regex '.*v[0-9]{3,}\.txt' -print
# ...............................^^^
If that returns the files you expect to delete, change -print to -delete
You need to remove the quotes in the for loop. Then the filename globs will be interpreted:
for i in ${patt}; do echo "$i"
I assume that you are using Python.
I have tested your regex code, and found the * character unnecessary.
The following seems to work fine: v[0-9]{3,}.txt
Can you please elaborate some more on the issue?
Thanks,
Bren.
I just piped the error message to /dev/null. This worked for me:
#!/bin/bash
str=9009
patt=*v[0-9]{3,}*.txt
rm $(eval ls $patt 2> /dev/null | grep $str)
This is not regex, this is globbing. Take a look what gets expanded:
# echo *v[0-9]{3,}*.txt
*v[0-9]3*.txt femme_v9009.txt femme_v9010.txt mari_v9009.txt mari_v9010.txt
*v[0-9]3*.txt obviously doesn't exist. Can you clarify which files you are trying to match with {3,}? Otherwise leave it out and the pattern will match the kind of filenames you have specified.
http://tldp.org/LDP/abs/html/globbingref.html

append epoch date at the beginning of a file in bash

I have a list of 20 files; 10 of them already have 1970-01-01- at the beginning of the name and 10 do not (the remaining ones all start with a lower-case letter).
So my task was to rename the files that lack the epoch date so that they start with it too. Using bash, the code below works, but I could not solve it with a regular expression, for example using rename. I had to extract the basename and then run mv. An elegant solution would use one pipe instead of two.
Works
find ./ -regex './[a-z].*' | xargs -I {} basename {} | xargs -I {} mv {} 1970-01-01-{}
Hence I am looking for a solution with just one xargs or -exec.
You can just use a single rename command:
rename -n 's/^([a-z])/1970-01-01-$1/' *
Assuming you're operating on all the files present in current directory.
Note that -n flag (dry run) will only show intended actions by rename command but won't really rename any files.
If you want to combine with find then use:
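find . -maxdepth 1 -type f -name '[a-z]*.txt' -execdir rename -n 's|^((?:\./)?)|${1}1970-01-01-|' {} +
(GNU find's -execdir passes each name as ./name, so the substitution keeps an optional leading ./ in front of the date.)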
I always prefer readable code over short code.
r() {
base=$(basename "$1")
dir=$(dirname "$1")
if [[ "$base" =~ ^1970-01-01- ]]
then
: "ignore, already has correct prefix"
else
echo mv "$1" "$dir/1970-01-01-$base"
fi
}
export -f r
find . -type f -exec bash -c 'r "$1"' _ {} \;
This also just prints what would be done (for testing). Remove the echo before the mv to do the real thing.
Mind that mv will overwrite existing files (for example if both ./a/b/c and ./a/b/1970-01-01-c exist). Pass -i to mv to be safe from this.
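For the "just one -exec" form the question asks about, a sketch that needs neither rename nor xargs could look like this. It assumes the files sit directly in one directory and that a small sh loop inside -exec is acceptable; prefix_epoch is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: prefix file names that start with a lower-case letter with
# 1970-01-01-, using a single find -exec. prefix_epoch is a hypothetical name.
prefix_epoch() {
    local dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -name '[a-z]*' -exec sh -c '
        for f do
            # ${f%/*} is the directory part, ${f##*/} the bare file name
            mv -i -- "$f" "${f%/*}/1970-01-01-${f##*/}"
        done' sh {} +
}
```

Because find passes the paths as arguments to sh, names containing spaces are handled without extra quoting, and mv -i asks before overwriting an existing target.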

Regular expression in bash not working

Is there any way in bash to match a pattern like this?
[0-9]{8}.*.jpg
I wrote the above for the following pattern match:
"The first 8 characters should be digits, the rest can be anything, and the name should end with .jpg"
but it does not work. If I write it in the following way, it works:
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*.jpg
Now suppose I want the first 20 characters to be digits: should I repeat [0-9] 20 times? I think there is a better solution that I don't know about.
If anyone knows, please help.
You can use the regex in find:
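find test -regextype posix-extended -regex ".*/[0-9]{8}.*\.jpg"
Note that -regex is matched against the whole path, which is why the expression starts with .*/ rather than ^.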
Test
$ touch test/12345678aaa.jpg
$ touch test/1234567aaa.jpg
$ find test -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
And if it is related to the previous question, you can use:
for file in $(find test -regextype posix-extended -regex ".*/[0-9]{8}.*")
do
echo "my file is $file"
done
If you create directories and files in them, more matchings can appear:
$ mkdir test/123456789.dir
$ touch test/123456789.dir/1234567890.jpg
You can filter by -type f, so that you just get files:
$ find test -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
test/123456789.dir/1234567890.jpg
And/or specify the depth of the find, so that it does not contain subdirectories:
$ find test -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
It looks like you're trying to generate a list of filenames from a regular expression. You can do that, but not directly from Bash as far as I know. Instead, use find:
find -E . -regex '.*/[0-9]{8}.*\.jpg' -depth 1
Something like that works on my Mac OS X system; on Linux the . for current directory is optional, or you can specify a different directory to search in. I added -depth 1 to avoid descending into subdirectories.
A bit late answer.
Bash's filename expansion patterns (called globbing) have their own rules. They exist in two forms:
simple globbing
extended globbing (if you have enabled it with shopt -s extglob)
You can read about both sets of rules, for example, here. (3.5.8.1 Pattern Matching)
Remember that the globbing rules aren't traditional regular expressions (as you probably know them from grep or sed), and they certainly aren't Perl's (extended) regular expressions.
So, if you want to use filename expansion (aka globbing), you're stuck with the above two (simple/extended) sets of pattern rules. Of course, bash knows regular expressions, but not for filename expansion (globbing).
So you can, for example, do the following:
shopt -s globstar #if you haven't already enabled - for the ** expansion
regex="[0-9]{8}.*\.jpg"
for file in ./**/*.jpg #will match all *.jpg recursively (globstar)
do
#try the regex matching
[[ $file =~ $regex ]] || continue #didn't match
#matched! - do something with the file
echo "the $file has at least 8 digits"
done
Or you can use the find command with its built-in regex matching rules (see the other answers), or grep with Perl-like regexes, such as:
find somewhere -maxdepth 1 -type f -name \*.jpg -print0 | grep -zP '/\d{8}.*\.jpg'
On speed: for large trees, find is faster. At least on my notebook, where:
while IFS= read -d $'\0' -r file
do
echo "$file"
done < <(find ~/Pictures -name \*.JPG -print0 | grep -zP 'P\d{4}.*\.JPG')
runs in real 0m1.593s, while
regex="P[0-9]{4}.*\.JPG"
for file in ~/Pictures/**/*.JPG
do
[[ $file =~ $regex ]] || continue #didn't match
echo "$file"
done
runs in real 0m3.628s.
On small trees, IMHO it is better to use the built-in bash regexes. (Maybe I prefer them because I like the ./**/*.ext expansion, which gets all filenames correctly into the variable, regardless of spaces and the like, without having to care about -print0, read -d $'\0' and such.)
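As for the original worry about typing [0-9] twenty times: if a glob is preferred over a regex, the repeated part can be generated instead of written out. A minimal sketch, where digit_glob and starts_with_digits are hypothetical helper names introduced only for this example:

```shell
#!/usr/bin/env bash
# Sketch: build a glob made of n copies of [0-9] instead of typing them out.
digit_glob() {
    local n=$1 glob='' i
    for (( i = 0; i < n; i++ )); do
        glob+='[0-9]'
    done
    printf '%s\n' "$glob"
}

# Succeeds if a name starts with n digits and ends in .jpg.
# Inside [[ ]], an unquoted right-hand side of == is matched as a glob.
starts_with_digits() {
    local name=$1 n=$2 pattern
    pattern="$(digit_glob "$n")*.jpg"
    [[ $name == $pattern ]]
}
```

For example, starts_with_digits 12345678aaa.jpg 8 succeeds and starts_with_digits 1234567aaa.jpg 8 fails. The generated pattern can also be used unquoted for filename expansion, as in for f in $(digit_glob 20)*.jpg.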

How can I exclude directories matching certain patterns from the output of the Linux 'find' command?

I want to use regex's with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
grep's -Z option (also known as --null) only changes how file names are terminated in grep's own output (for example with -l): each is followed by a null character instead of a newline. What you wanted was -z (also known as --null-data), which makes grep treat both its input and its output as records terminated by a null character instead of a newline. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
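fails because you are trying to use POSIX extended regular expressions, which don't support lookahead or lookbehind. https://superuser.com/a/596499/658319
GNU find does not offer a PCRE regextype (its -regextype values include emacs, posix-basic, posix-egrep and posix-extended), so the lookahead cannot be used in -regex. Express the exclusion with separate tests such as -not -path instead, or filter the -print0 output with grep -zP.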
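For a very large tree, another option is to stop find from descending into the unwanted directories at all by using -prune, rather than visiting every file and filtering afterwards. The sketch below assumes the directories to skip are named exactly generated and deploy at any depth; list_sources is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list .c, .cpp and .h files, skipping whole generated/ and deploy/
# subtrees. Output is NUL-delimited so it can be fed to xargs -0.
list_sources() {
    local root=${1:-.}
    find "$root" \
        -type d \( -name generated -o -name deploy \) -prune \
        -o -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -print0
}
```

It could be used as list_sources . | xargs -0 -n 1 ls. Because the match is on the directory name itself, a source file called deployment.cpp elsewhere in the tree is not excluded.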