Regular expression in bash not working - regex

Is there any way in bash to match a pattern like this?
[0-9]{8}.*.jpg
I wrote the above to match the following:
"The first 8 characters should be digits, the rest can be anything, and the name should end with .jpg"
but it does not work. Written in the following way, it works:
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*.jpg
Now suppose I want the first 20 characters to be digits. Should I repeat [0-9] 20 times? I think there is a better solution that I don't know about.
If anyone knows, please help.

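You can use the regex in find. Note that -regex is matched against the whole path, so the pattern has to start with .*/ rather than ^:
find test -regextype posix-extended -regex ".*/[0-9]{8}.*\.jpg"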
Test
$ touch test/12345678aaa.jpg
$ touch test/1234567aaa.jpg
$ find test -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
And if it is related to the previous question, you can use:
for file in $(find test -regextype posix-extended -regex ".*/[0-9]{8}.*")
do
echo "my file is $file"
done
If you create directories and files in them, more matches can appear:
$ mkdir test/123456789.dir
$ touch test/123456789.dir/1234567890.jpg
You can filter by -type f, so that you just get files:
$ find test -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
test/123456789.dir/1234567890.jpg
And/or specify the depth of the find, so that it does not contain subdirectories:
$ find test -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
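The `for file in $(find ...)` loop shown earlier splits its input on whitespace, so a name containing a space would be cut into pieces. A minimal sketch of a safer variant follows, assuming GNU find, GNU sort and bash. The function name `list_digit_jpgs` is made up for illustration, and the pattern uses `[^/]*` so that the 8 digits must be at the start of the file name itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print regular files directly under "$1" whose names
# start with 8 digits and end in .jpg. Assumes GNU find (-regextype).
list_digit_jpgs() {
    local dir=$1 file
    # -print0 and read -d '' keep names with spaces or newlines intact.
    while IFS= read -r -d '' file; do
        printf '%s\n' "$file"
    done < <(find "$dir" -maxdepth 1 -type f -regextype posix-extended \
                  -regex '.*/[0-9]{8}[^/]*\.jpg' -print0 | sort -z)
}
```

Calling `list_digit_jpgs test` would list the matching files in the test directory, one per line.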

It looks like you're trying to generate a list of filenames from a regular expression. You can do that, but not directly from Bash as far as I know. Instead, use find:
find -E . -regex '.*/[0-9]{8}.*\.jpg' -depth 1
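Something like that works on my Mac OS X system. On Linux the . for the current directory is optional (or you can specify a different directory to search in), but GNU find has no -E option and its -depth takes no argument, so use -regextype posix-extended and -maxdepth 1 there instead. I added -depth 1 to avoid descending into subdirectories.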

A bit of a late answer.
Bash's filename expansion patterns (called globbing) have their own rules. They exist in two forms:
simple globbing
extended globbing (if you have enabled it with shopt -s extglob)
You can read about both sets of rules in the Bash Reference Manual (3.5.8.1 Pattern Matching).
Remember that the globbing rules aren't traditional regular expressions (as you probably know them from grep, sed and such), and they are certainly not Perl's (extended) regular expressions.
So, if you want to use filename expansion (aka globbing), you're stuck with the above two (simple/extended) pattern rules. Of course, bash knows regular expressions, but not for filename expansion (globbing).
So, you can for example do this:
shopt -s globstar #if you haven't already enabled it - for the ** expansion
regex='/[0-9]{8}[^/]*\.jpg$'
for file in ./**/*.jpg #will match all *.jpg recursively (globstar)
do
#try the regex matching
[[ $file =~ $regex ]] || continue #didn't match
#matched! - do something with the file
echo "the name of $file starts with 8 digits"
done
or you can use the find command with its built-in regex matching rules (see the other answers), or grep with Perl-like regexes, such as:
find somewhere -maxdepth 1 -type f -name '*.jpg' -print0 | grep -zP '/\d{8}.*\.jpg$'
Speed: for large trees, find is faster. At least on my notebook, where:
while IFS= read -d $'\0' -r file
do
echo "$file"
done < <(find ~/Pictures -name \*.JPG -print0 | grep -zP 'P\d{4}.*\.JPG')
runs in real 0m1.593s, and
regex="P[0-9]{4}.*\.JPG"
for file in ~/Pictures/**/*.JPG
do
[[ $file =~ $regex ]] || continue #didn't match
echo "$file"
done
runs in real 0m3.628s.
On small trees, IMHO it is better to use the built-in bash regexes. (Maybe I prefer it because I like the ./**/*.ext expansion, which gets every filename into the variable correctly, regardless of spaces and the like, without having to care about -print0, read -d $'\0' and such.)
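To make the difference between glob patterns and regular expressions concrete, here is a small sketch that tests the same name both ways inside `[[ ]]`. The function names are hypothetical, and the only assumption is bash 3.2 or later, where an unquoted right-hand side of `=~` is treated as an extended regular expression.

```shell
#!/usr/bin/env bash
# Glob test: {8} is not a repetition count in a shell pattern, so each
# digit has to be spelled out.
glob_match() {
    [[ $1 == [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*.jpg ]]
}

# Regex test: =~ uses POSIX extended regular expressions, so {8} works.
# Keeping the regex in a variable avoids quoting surprises.
regex_match() {
    local re='^[0-9]{8}.*\.jpg$'
    [[ $1 =~ $re ]]
}
```

Both functions return success for a name such as 12345678aaa.jpg and failure for 1234567aaa.jpg; only the regex version scales to 20 digits by changing a single number.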

Related

using regex to iterate over files that matches a certain pattern in bash scripts

I have a regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*, a matching string would be 11.1.1.1_to_21.1.1.1. I want to discover all files under a directory whose names match the above pattern.
However, I am not able to get it right using the code below. I tried to escape ( and ) by adding \ before them, but that did not work.
dir=$SCRIPT_PATH/oaa_partition/upgrade/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*.sql
for FILE in $dir; do
echo $FILE
done
I was only able to get something like this to work:
dir=$SCRIPT_PATH/oaa_partition/upgrade/[0-9]*_to_*.sql
for FILE in $dir; do
echo $FILE
done
Need some help on how to use the full regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]* here.
Your regex is simple enough to be replaced with a bash extglob:
#!/bin/bash
shopt -s extglob
glob='+(*([0-9]).)*([0-9])_to_+(*([0-9]).)*([0-9]).sql'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/$glob
do
printf '%q\n' "$file"
done
If the regex is too complex to translate into extended globs, then you can filter the files using a bash regex inside the for loop:
#!/bin/bash
regex='([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/*_to_*.sql
do
[[ $file =~ /$regex\.sql$ ]] || continue
printf '%q\n' "$file"
done
BTW, as it stands, your regex could match a lot of unwanted things, for example: 0._to_..sql.
If this is enough to tell the targeted files apart from the others, then you can probably just use the basic glob
[0-9]*_to_[0-9]*.sql
To fix the regex you would want to match at least one digit before each dot and, if you go with the regex, a literal dot before sql:
([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql
https://regex101.com/r/5xB3Bt/1
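The effect of that change can be checked directly in bash. The sketch below anchors both patterns with ^ and $ (an addition for the test, not part of the original patterns) and runs them against the sample names mentioned above. The `check` helper and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Original pattern (digits before a dot are optional) and the fixed one
# (at least one digit before each dot), both anchored for a full-name test.
loose='^([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql$'
fixed='^([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql$'

# Hypothetical helper: report whether name $1 matches regex $2.
check() {
    if [[ $1 =~ $2 ]]; then echo "match"; else echo "no match"; fi
}

for name in '11.1.1.1_to_21.1.1.1.sql' '0._to_..sql'; do
    printf '%-28s loose: %-9s fixed: %s\n' \
        "$name" "$(check "$name" "$loose")" "$(check "$name" "$fixed")"
done
```

The well-formed name matches both patterns, while 0._to_..sql is accepted only by the loose one.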
You cannot use a regular expression in a for loop. It only supports glob patterns, which are not as expressive as a regex.
You will have to use your regex in a GNU find command as:
find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql'
To loop these entries:
while IFS= read -rd '' file; do
echo "$file"
done < <(find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql' -print0)

pattern to match multiple filenames with find utility

How to find multiple filenames with the bash find command?
$ find /path/* -type f -name pattern
The pattern should match a list of file names:
fname1.jpg
fname2.png
myfile.css
example.gif
I tried with
https://alvinalexander.com/linux-unix/linux-find-multiple-filenames-patterns-command-example
find multiple filenames command: finding three filename extensions
find . -type f \( -name "*cache" -o -name "*xml" -o -name "*html" \)
and it works.
Anyway, I think it would be cleaner with a single -name pattern rather than with a list of -name tests.
from
$ man find
-name pattern
I'm searching for something like: -name '[fname2.png|myfile.css|example.gif ]'
A -regex alternative would look as follows:
find . -type f -regextype posix-egrep -regex ".+\.(jpg|png|css)$"
As for -name option:
-name pattern - Base of file name (the path with the leading
directories removed) matches shell pattern.
Shell pattern is not a full-fledged regex pattern.
Just mix them:
find -name "aoc*" -regextype awk -regex ".*[0-9]\.(class|scala)"
This searches for files that match the shell pattern aoc* and whose names end in a digit followed by .class or .scala.
For your example:
find -name "fname*" -regextype awk -regex ".*[0-9]\.(png|jpg|css)"
The available types are listed with:
find -regextype help
However, I first tried "-regextype sed", which is available, but sed itself has options that change its regex style. The patterns I usually use with sed didn't work, but since the pattern works with awk, that's sufficient for me.
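If the goal is an exact list of file names rather than a pattern, the chain of -name tests can be generated instead of typed out. This is only a sketch: the function name `find_names` is hypothetical, at least one name must be given, and because -name takes shell patterns, a name matches literally only if it contains no wildcard characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find regular files under directory $1 whose base
# name equals any of the remaining arguments.
find_names() {
    local dir=$1; shift
    local -a expr=()
    local name
    for name in "$@"; do
        # Join the tests with -o (logical OR); the first one needs no -o.
        if (( ${#expr[@]} > 0 )); then
            expr+=(-o)
        fi
        expr+=(-name "$name")
    done
    find "$dir" -type f \( "${expr[@]}" \)
}
```

For the list in the question this would be called as `find_names /path fname1.jpg fname2.png myfile.css example.gif`.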

Unix regular expression

I need to use a Unix regular expression in the Unix find command:
find "/home/user/somePath/" -maxdepth 1 \
! -regex "/home/user/somePath/someUnwantedPath" \
! -regex "/home/user/somePath/someMoreUnwantedPath"
This works, but I need to combine the regexes into a single one because there are more than just a few unwanted paths.
I suppose you can do it with alternation.
/home/user/somePath/(someUnwantedPath|someMoreUnwantedPath)
find "/home/user/somePath/" -maxdepth 1 ! -regex "/home/user/somePath/(someUnwantedPath|someMoreUnwantedPath)"
Just add more paths at the end of the parenthesized group, each starting with a new | as the alternation delimiter, e.g. |AnotherUnwantedPath.
Edit
I'm a "Windows dude", so I'm not that familiar with Unix, but I wanted to try it out on BUW, and it appears you have to escape the regex metacharacters. So I guess the correct answer should be
/home/user/somePath/\(someUnwantedPath\|someMoreUnwantedPath\)
find "/home/user/somePath/" -maxdepth 1 ! -regex "/home/user/somePath/\(someUnwantedPath\|someMoreUnwantedPath\)"
You can use grep -v -f instead of a regex to keep it clean. The alternation (|) operator is not available in every regex flavour on every Unix system. Your exclude file should list all the paths to be excluded (including subdirectories, if any), one per line. The -F and -x options make grep compare each line as a fixed string against the whole path, so someUnwantedPath1 does not also exclude someUnwantedPath10.
cat excl_files.txt
/home/user/somePath/someUnwantedPath1
/home/user/somePath/someUnwantedPath2
/home/user/somePath/someUnwantedPath3
..
/home/user/somePath/someUnwantedPathn
find "/home/user/somePath/" -maxdepth 1 | grep -v -x -F -f excl_files.txt
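When the list of unwanted names lives in a script rather than a file, the alternation can also be built from the arguments. The following is a rough sketch under several assumptions: GNU find (for -regextype), names and base path free of regex metacharacters, and a hypothetical function name `list_except`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the entries directly under base directory $1,
# skipping those whose names are given as the remaining arguments.
# Assumes GNU find and names without regex metacharacters.
list_except() {
    local base=${1%/}; shift
    local IFS='|'
    # "$*" joins the names with the first character of IFS, giving a|b|c.
    local alternation="$*"
    find "$base" -mindepth 1 -maxdepth 1 -regextype posix-extended \
        ! -regex "$base/($alternation)"
}
```

With posix-extended syntax the parentheses and the | need no backslashes, which keeps the generated expression readable.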

how to loop through files that match a regular expression in a unix shell script

I want to be able to loop through a list of files that match a particular pattern. I can get unix to list these files using ls and egrep with a regular expression, but I cannot find a way to turn this into an iterative process. I suspect that using ls is not the answer. Any help would be gratefully received.
My current ls command looks as follows:
ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat'
I would expect the above to match:
MYFILE160418.dat
myFILE170312.DAT
MyFiLe160416.DaT
but not:
MYOTHERFILE150202.DAT
Myfile.dat
myfile.csv
Thanks,
Paul.
Based on the link Andy K provided I have used the following to loop based on my matching criteria:
for i in $(ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat' ); do
echo item: $i;
done
You can use (GNU) find with the regex search option instead of parsing ls.
find . -regextype "egrep" \
-iregex '.*/MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat' \
-exec [[whatever you want to do]] {} \;
Where [[whatever you want to do]] is the command you want to run on each matching file.
From the man page
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests
which occur later on the command line. Currently-implemented types are
emacs (this is the default),posix-awk, posix-basic, posix-egrep and
posix-extended.
-regex pattern
File name matches regular expression pattern. This is a match on the whole
path, not a search. For example, to match a file named `./fubar3', you can
use the regular expression
`.*bar.' or `.*b.*3', but not `f.*r3'. The regular expressions understood by
find are by default Emacs Regular Expressions, but this can be changed with
the -regextype option.
-iregex pattern
Like -regex, but the match is case insensitive.
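If find is not available or a plain loop is preferred, the same filter can be sketched in bash itself with a glob plus a `=~` test. This assumes bash with the nocasematch option; the function name `matching_files` is hypothetical, and the function switches nocasematch off again when it finishes, even if the caller had it enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names in directory $1 that look like
# MYFILE + YYMMDD + .dat, ignoring case.
matching_files() {
    local dir=$1 path name
    local re='^MYFILE[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat$'
    shopt -s nocasematch          # make =~ case-insensitive
    for path in "$dir"/*; do
        name=${path##*/}          # strip the directory part
        if [[ $name =~ $re ]]; then
            printf '%s\n' "$name"
        fi
    done
    shopt -u nocasematch
}
```

Because the loop iterates over a glob rather than the output of ls, names containing spaces arrive intact in `$path`.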

regextype with find command

I am trying to use the find command with -regextype, but I could not get it to work properly.
I am trying to find all .c and .h files, pipe them to grep, and search for the name func_foo inside those files. What am I missing?
$ find ./ -regextype sed -regex ".*\[c|h]" | xargs grep -n --color func_foo
In a similar vein, I tried the following command, but it gives me an error like "paths must precede expression":
$ find ./ -type f "*.c" | xargs grep -n --color func_foo
The accepted answer contains some inaccuracies.
On my system, GNU find's manpage says to run find -regextype help to see the list of supported regex types.
# find -regextype help
find: Unknown regular expression type 'help'; valid types are 'findutils-default', 'awk', 'egrep', 'ed', 'emacs', 'gnu-awk', 'grep', 'posix-awk', 'posix-basic', 'posix-egrep', 'posix-extended', 'posix-minimal-basic', 'sed'.
E.g. find . -regextype egrep -regex '.*\.(c|h)' finds .c and .h files.
Your regexp syntax was wrong, you had square brackets instead of parentheses. With square brackets, it would be [ch].
You can just use the default regexp type as well: find . -regex '.*\.\(c\|h\)$' also works. Notice that you have to escape (, |, ) characters in this case (with sed regextype as well). You don't have to escape them when using egrep, posix-egrep, posix-extended.
Why not just do:
find ./ -name "*.[ch]" | xargs grep -n --color func_foo
and
find ./ -type f -name "*.c" | xargs grep -n --color func_foo
Regarding the valid parameters to find's option -regextype, this comes verbatim from man find:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on
the command line. Currently-implemented types are emacs (this is the default),
posix-awk, posix-basic, posix-egrep and posix-extended
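That list does not include a sed type, although newer GNU find versions do accept it, as the output of find -regextype help quoted above shows.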
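The find-plus-grep pipeline can also be written without xargs, which sidesteps problems with spaces in file names. A minimal sketch, assuming a grep that supports -H (both GNU and BSD grep do); the function name `grep_sources` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grep for identifier $2 in every .c and .h file under $1.
grep_sources() {
    local dir=$1 symbol=$2
    # -name takes a shell pattern: [ch] is a bracket expression meaning c or h.
    # -exec ... {} + batches file names like xargs, but is safe with spaces.
    # -H prints the file name even when only one file is passed to grep.
    find "$dir" -type f -name '*.[ch]' -exec grep -nH -- "$symbol" {} +
}
```

For the question's case this would be invoked as `grep_sources ./ func_foo`.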