Unix regular expression - regex

I need to use a regular expression in a Unix find command:
find "/home/user/somePath/" -maxdepth 1 \
  ! -regex "/home/user/somePath/someUnwantedPath" \
  ! -regex "/home/user/somePath/someMoreUnwantedPath"
This works, but I need to combine the regexes into a single one because there are many more unwanted paths than these two.

I suppose you can do it with alternation.
/home/user/somePath/(someUnwantedPath|someMoreUnwantedPath)
find "/home/user/somePath/" -maxdepth 1 ! -regex "/home/user/somePath/(someUnwantedPath|someMoreUnwantedPath)"
Just add more paths at the end of the parenthesized group, each preceded by a | as the alternation delimiter, e.g. |AnotherUnwantedPath.
Edit
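I'm a "Windows dude", so I'm not that familiar with Unix, but I wanted to try it out on BUW, and it appears you have to escape the grouping and alternation metacharacters in find's default regex syntax. Because -regex has to match the whole path, the trailing part is optional here, so that both the unwanted directories themselves and anything below them are covered. So I guess the correct answer should be
/home/user/somePath/\(someUnwantedPath\|someMoreUnwantedPath\)\(/.*\)?
find "/home/user/somePath/" -maxdepth 1 ! -regex "/home/user/somePath/\(someUnwantedPath\|someMoreUnwantedPath\)\(/.*\)?"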
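If GNU find is available, switching the regex flavour avoids the backslashes altogether. The following is only a sketch under that assumption: the temporary directory and the names keep, someUnwantedPath and someMoreUnwantedPath are illustrative, and -regextype is a GNU extension that other find implementations may not have.

```shell
# Throwaway directory tree; every name below is illustrative only.
base=$(mktemp -d)
mkdir "$base/keep" "$base/someUnwantedPath" "$base/someMoreUnwantedPath"

# -regex has to match the whole path that find prints.
# -regextype posix-extended (GNU find) allows unescaped ( and |.
kept=$(find "$base" -mindepth 1 -maxdepth 1 -regextype posix-extended \
    ! -regex "$base/(someUnwantedPath|someMoreUnwantedPath)")
echo "$kept"

rm -rf "$base"
```

Only the keep directory should be left in the output; more exclusions go inside the parentheses, separated by |.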

You can use grep -v -f instead of a regex to keep it clean. The alternation operator (|) is not supported by the default regex syntax of every find implementation. Your exclude file should list all the files to be excluded (including subdirectories, if any).
cat excl_files.txt
/home/user/somePath/someUnwantedPath1
/home/user/somePath/someUnwantedPath2
/home/user/somePath/someUnwantedPath3
..
/home/user/somePath/someUnwantedPathn
find "/home/user/somePath/" -maxdepth 1 | grep -v -f excl_files.txt
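One caveat with plain grep -v -f: each line of the exclude file is treated as a regular expression and matched anywhere in the line, so an entry ending in someUnwantedPath1 would also hide someUnwantedPath10. A possible variant, sketched here with made-up directory names in a temporary directory, adds -F (fixed strings) and -x (whole-line match):

```shell
# Illustrative layout: "unwanted1" is excluded, "unwanted10" must survive.
base=$(mktemp -d)
mkdir "$base/keep" "$base/unwanted1" "$base/unwanted10"
printf '%s\n' "$base/unwanted1" > "$base/excl_files.txt"

# -v invert, -x whole line, -F fixed strings, -f read patterns from a file.
remaining=$(find "$base" -mindepth 1 -maxdepth 1 -type d |
    grep -vxF -f "$base/excl_files.txt" | LC_ALL=C sort)
echo "$remaining"

rm -rf "$base"
```

With -x the listed paths have to be written exactly as find prints them, so this fits best when the exclude file is generated from the same base path.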

Related

using regex to iterate over files that matches a certain pattern in bash scripts

I have a regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*, a matching string would be 11.1.1.1_to_21.1.1.1. I want to find all files under a directory whose names match the above pattern.
However I am not able to get it correctly using the code below. I tried to escape ( and ) by adding \ before them, but that did not work.
dir=$SCRIPT_PATH/oaa_partition/upgrade/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*.sql
for FILE in $dir; do
echo $FILE
done
I was only able to get something like this to work:
dir=$SCRIPT_PATH/oaa_partition/upgrade/[0-9]*_to_*.sql
for FILE in $dir; do
echo $FILE
done
I need some help on how to use the full regex pattern ([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]* here.
Your regex is simple enough to be replaced with a bash extglob:
#!/bin/bash
shopt -s extglob
glob='+(*([0-9]).)*([0-9])_to_+(*([0-9]).)*([0-9]).sql'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/$glob
do
printf '%q\n' "$file"
done
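To see what that extglob accepts without touching the filesystem, the same pattern can be tried against a few strings with [[ ... == ... ]]. This is just a sketch; the three sample names are made up for the test.

```shell
shopt -s extglob
glob='+(*([0-9]).)*([0-9])_to_+(*([0-9]).)*([0-9]).sql'

matched=()
for name in 11.1.1.1_to_21.1.1.1.sql 0._to_..sql notes_to_self.sql; do
    # Unquoted right-hand side: treated as a (extended) glob pattern.
    if [[ $name == $glob ]]; then
        matched+=("$name")
    fi
done
printf 'matches: %s\n' "${matched[@]}"
```

The extglob is a faithful translation of the regex, so it shares its looseness: the degenerate name 0._to_..sql is accepted as well.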
If the regex is too complex for translating it to extended globs then you can filter the files using a bash regex inside the for loop:
#!/bin/bash
regex='([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*'
for file in "$SCRIPT_PATH"/oaa_partition/upgrade/*_to_*.sql
do
[[ $file =~ /$regex\.sql$ ]] || continue
printf '%q\n' "$file"
done
BTW, as it is, your regex could match a lot of unwanted things, for example: 0._to_..sql.
If this is enough for differentiating the targeted files from the others then you can probably just use the basic glob
[0-9]*_to_[0-9]*.sql
To fix the regex, require at least one digit before each dot and, if you keep the extension in the pattern, a literal dot before sql:
([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql
https://regex101.com/r/5xB3Bt/1
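The difference between the two patterns can be checked directly in bash. A small sketch, where verdict is a hypothetical helper and the anchors ^ and $ are added so the whole name has to match:

```shell
loose='^([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql$'
tight='^([0-9]+\.)+[0-9]*_to_([0-9]+\.)+[0-9]*\.sql$'

# Hypothetical helper: prints "match" or "no match" for a name and a regex.
verdict() {
    if [[ $1 =~ $2 ]]; then echo "match"; else echo "no match"; fi
}

verdict '0._to_..sql' "$loose"
verdict '0._to_..sql' "$tight"
verdict '11.1.1.1_to_21.1.1.1.sql' "$tight"
```

The first call reports a match, which is the unwanted case mentioned above; the tightened pattern rejects that name and still accepts the real one.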
You cannot use a regular expression in a for loop. It only supports glob patterns, which are not as expressive as a regex.
You will have to use your regex in a GNU find command:
find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql'
To loop these entries:
while IFS= read -rd '' file; do
echo "$file"
done < <(find . -mindepth 1 -maxdepth 1 -regextype egrep -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql' -print0)
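A self-contained way to try this loop is to run it against a scratch directory. The sketch below assumes GNU find and bash; the file names are invented. Note that read -d '' expects NUL-terminated names, so find has to emit them with -print0.

```shell
# Scratch directory with one matching and two non-matching names (all made up).
dir=$(mktemp -d)
touch "$dir/11.1.1.1_to_21.1.1.1.sql" "$dir/readme_to_nothing.sql" "$dir/notes.txt"

found=()
while IFS= read -r -d '' file; do
    found+=("${file##*/}")      # keep only the base name
done < <(find "$dir" -mindepth 1 -maxdepth 1 -regextype posix-egrep \
    -regex '.*/([0-9]*\.)+[0-9]*_to_([0-9]*\.)+[0-9]*\.sql' -print0)

printf 'found: %s\n' "${found[@]}"
rm -rf "$dir"
```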

how to loop through files that match a regular expression in a unix shell script

I want to be able to loop through a list of files that match a particular pattern. I can get unix to list these files using ls and egrep with a regular expression, but I cannot find a way to turn this into an iterative process. I suspect that using ls is not the answer. Any help would be gratefully received.
My current ls command looks as follows:
ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat'
I would expect the above to match:
MYFILE160418.dat
myFILE170312.DAT
MyFiLe160416.DaT
but not:
MYOTHERFILE150202.DAT
Myfile.dat
myfile.csv
Thanks,
Paul.
Based on the link Andy K provided I have used the following to loop based on my matching criteria:
for i in $(ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat' ); do
echo item: $i;
done
You can use (GNU) find with the regex search option instead of parsing ls.
find . -regextype "egrep" \
-iregex '.*/MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat' \
-exec [[whatever you want to do]] {} \;
Where [[whatever you want to do]] is the command you want to perform on the names of the files.
From the man page:
-regextype type
    Changes the regular expression syntax understood by -regex and -iregex tests
    which occur later on the command line. Currently-implemented types are
    emacs (this is the default), posix-awk, posix-basic, posix-egrep and
    posix-extended.
-regex pattern
    File name matches regular expression pattern. This is a match on the whole
    path, not a search. For example, to match a file named `./fubar3', you can
    use the regular expression `.*bar.' or `.*b.*3', but not `f.*r3'. The
    regular expressions understood by find are by default Emacs Regular
    Expressions, but this can be changed with the -regextype option.
-iregex pattern
    Like -regex, but the match is case insensitive.
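If GNU find is not an option, a plain bash loop can do the filtering itself. This is a sketch only: it assumes bash, uses nocasematch to mimic egrep -i, anchors the pattern and escapes the dot before dat, and runs against a scratch directory holding the sample names from the question.

```shell
dir=$(mktemp -d)
touch "$dir/MYFILE160418.dat" "$dir/myFILE170312.DAT" "$dir/MyFiLe160416.DaT" \
      "$dir/MYOTHERFILE150202.DAT" "$dir/Myfile.dat" "$dir/myfile.csv"

shopt -s nocasematch    # make [[ =~ ]] case-insensitive, like egrep -i
regex='^MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat$'

matched=()
for path in "$dir"/*; do
    name=${path##*/}
    [[ $name =~ $regex ]] || continue
    matched+=("$name")
done
shopt -u nocasematch

printf 'item: %s\n' "${matched[@]}"
rm -rf "$dir"
```

Because the loop iterates over a glob rather than over the output of ls, names containing spaces stay intact.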

Unix - File Search with Regular expression

find . -iname "abc_v?_test.txt" -print
Which finds all the files
abc_v1_test.txt, abc_v2_test.txt, ..., abc_v9_test.txt
But how can I additionally get abc_v10_test.txt, abc_v11_test.txt, ...?
You can use -regex option as well:
find . -regextype posix-egrep -iregex ".*abc_v[0-9]{1,2}_test\.txt$"
You still have the option to get all files and pass them to grep, ack or ag:
find . | ag 'abc_v\d+_test\.txt'
Note that you can replace ag with egrep, but then write [0-9]+ instead of \d+, because egrep does not support \d.
In the end, I implemented it this way:
find . -iname "abc_v*_test.txt" -print
Is there any regular expression that accepts only 1 or 2 digits after the v?
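For exactly one or two digits, a regex is not strictly needed: two -iname globs joined with -o should do. A sketch against a scratch directory, with file names invented for the test:

```shell
dir=$(mktemp -d)
touch "$dir/abc_v1_test.txt" "$dir/abc_v10_test.txt" \
      "$dir/abc_v100_test.txt" "$dir/abc_vX_test.txt"

# One glob per allowed digit count; \( ... \) groups the two alternatives.
hits=$(find "$dir" -type f \( -iname 'abc_v[0-9]_test.txt' \
                              -o -iname 'abc_v[0-9][0-9]_test.txt' \) | LC_ALL=C sort)
echo "$hits"

rm -rf "$dir"
```

This does not scale to wide ranges, where the {1,2} quantifier from the -iregex answer is the shorter form.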

Regular expression in bash not working

Is there any way in bash to match a pattern like this?
[0-9]{8}.*.jpg
I wrote the above for the following requirement:
"The first 8 characters should be digits, the rest can be anything, and the name should end with .jpg"
but it does not work. If I write it as below, it works:
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*.jpg
Now suppose I want the first 20 characters to be digits. Should I repeat [0-9] 20 times? I think there is a better solution that I don't know about.
If anyone knows, please help.
You can use the regex in find:
find test -regextype posix-extended -regex ".*/[0-9]{8}.*\.jpg"
Test
$ touch test/12345678aaa.jpg
$ touch test/1234567aaa.jpg
$ find test -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
And if it is related to the previous question, you can use:
for file in $(find test -regextype posix-extended -regex ".*/[0-9]{8}.*")
do
echo "my file is $file"
done
If you create directories and files in them, more matches can appear:
$ mkdir test/123456789.dir
$ touch test/123456789.dir/1234567890.jpg
You can filter by -type f, so that you just get files:
$ find test -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
test/123456789.dir/1234567890.jpg
And/or limit the depth of the find, so that it does not descend into subdirectories:
$ find test -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
It looks like you're trying to generate a list of filenames from a regular expression. You can do that, but not directly from Bash as far as I know. Instead, use find:
find -E . -regex '.*/[0-9]{8}.*\.jpg' -depth 1
Something like that works on my Mac OS X system. GNU find on Linux has no -E option; there, use -regextype posix-extended and -maxdepth 1 instead. On Linux the . for the current directory is optional, or you can specify a different directory to search in. I added -depth 1 to avoid descending into subdirectories.
A somewhat late answer.
Bash's filename expansion patterns (called globbing) have their own rules. They exist in two forms:
simple globbing
extended globbing (if you have enabled it with shopt -s extglob)
You can read about both sets of rules in the Bash manual (3.5.8.1 Pattern Matching).
Remember that globbing rules are not traditional regular expressions (as you probably know them from grep or sed), and they are certainly not Perl's (extended) regular expressions.
So, if you want to use filename expansion (aka globbing), you're stuck with the above two (simple/extended) pattern rules. Of course, bash knows regular expressions, but not for filename expansion.
So you can, for example, do the following:
shopt -s globstar   # enable the ** expansion, if not already enabled
regex="[0-9]{8}.*\.jpg"
for file in ./**/*.jpg   # matches all *.jpg recursively (globstar)
do
    # try the regex match; skip the file if it does not match
    [[ $file =~ $regex ]] || continue
    # matched: do something with the file
    echo "the $file has at least 8 digits"
done
Or you can use the find command with its built-in regex matching (see the other answers), or grep with Perl-like regexes, such as:
find somewhere -maxdepth 1 -type f -name \*.jpg -print0 | grep -zP '/\d{8}.*\.jpg'
Speed: for large trees, find is faster. At least on my notebook, where:
while IFS= read -d $'\0' -r file
do
echo "$file"
done < <(find ~/Pictures -name \*.JPG -print0 | grep -zP 'P\d{4}.*\.JPG')
runs in real 0m1.593s, and
regex="P[0-9]{4}.*\.JPG"
for file in ~/Pictures/**/*.JPG
do
[[ $file =~ $regex ]] || continue #didn't match
echo "$file"
done
runs in real 0m3.628s.
On small trees, IMHO it is better to use the built-in bash regexes. (Maybe I prefer them because I like the ./**/*.ext expansion, which gets all filenames into the variable correctly, regardless of spaces and the like, without having to care about -print0, read -d $'\0' and such.)
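For the original requirement (the name starts with 8 digits and ends in .jpg), it may be safer to match the regex against the base name and anchor it, so that digits elsewhere in the path do not count. A minimal sketch with a scratch directory and invented file names, one of them containing a space:

```shell
dir=$(mktemp -d)
touch "$dir/12345678aaa.jpg" "$dir/1234567aaa.jpg" "$dir/12345678 with space.jpg"

regex='^[0-9]{8}.*\.jpg$'
count=0
for file in "$dir"/*.jpg; do
    name=${file##*/}                  # strip the directory part
    [[ $name =~ $regex ]] || continue
    count=$((count + 1))
    printf 'matched: %s\n' "$name"
done

rm -rf "$dir"
```

For 20 leading digits only the quantifier changes, to {20}.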

Regex to match logfiles 1 to 11

I would like to simply fetch logfiles 1 to 11 out of 500 with one regex:
log4j-cnode1.log.11
log4j-cnode1.log.10
log4j-cnode1.log.9
log4j-cnode1.log.8
log4j-cnode1.log.7
log4j-cnode1.log.6
log4j-cnode1.log.5
log4j-cnode1.log.4
log4j-cnode1.log.3
log4j-cnode1.log.2
log4j-cnode1.log.1
so I do not want to fetch log4j-cnode1.log.12, log4j-cnode1.log.13, ... , log4j-cnode1.log.500
I was trying this command:
find . -iname "log4j-cnode1*\.log\.(1[0-1]|[1-9])"
why does this not work?
1 to 9 works fine with this:
find . -iname "log4j-cnode1*\.log\.[1-9]"
Because -iname doesn't accept regular expressions, and even if it did, your 1* would probably not do what you want. Use -iregex:
find -regextype posix-extended -iregex '(.*/)?log4j-cnode1.*\.log\.(1[0-1]|[1-9])'
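find . -regextype posix-extended -iregex ".*log4j-cnode1.*\.log\.(1[01]|[1-9])"
Your regex says: 1 followed by 0 or 1, or else a single digit from 1 to 9. That part is fine as a regular expression, but -iname only takes a glob, so the pattern has to go to -iregex.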
$ find -name 'log4j-cnode1*\.log\.[0-9]*'
./log4j-cnode1.log.1
./log4j-cnode1.log.10
./log4j-cnode1.log.11
./log4j-cnode1.log.2
./log4j-cnode1.log.3
./log4j-cnode1.log.4
./log4j-cnode1.log.5
./log4j-cnode1.log.6
./log4j-cnode1.log.7
./log4j-cnode1.log.8
./log4j-cnode1.log.9
You got it almost right.
But, instead of -iname, use -iregex with -regextype egrep (or awk), like this:
find . -regextype egrep \
-iregex ".*log4j-cnode1.*\.log\.(1[0-1]|[1-9])"
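A quick way to convince yourself that the alternation stops at 11 is to run it against a scratch directory. This sketch assumes GNU find; the set of suffixes is an arbitrary sample and includes 12, 100 and 500 as negatives.

```shell
dir=$(mktemp -d)
for n in 1 2 9 10 11 12 100 500; do
    touch "$dir/log4j-cnode1.log.$n"
done

# The regex must match the whole path, hence the leading .*/
count=$(find "$dir" -type f -regextype posix-extended \
    -iregex '.*/log4j-cnode1\.log\.(1[01]|[1-9])' | wc -l)
echo "matched $count of 8 files"

rm -rf "$dir"
```

Only the suffixes 1, 2, 9, 10 and 11 from the sample should be counted.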