How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
# remove any previous versions of this plugin
if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through, and check to see if any files matches this regular expression?

In bash, the regex on the right-hand side of =~ must be unquoted, so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
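A minimal sketch of the variable-based form, assuming bash 3.2 or later. The function name plugin_version is hypothetical; it also shows that the text captured by (.+) ends up in BASH_REMATCH[1].

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the version part of a tmo-custom-rules jar name.
# Returns non-zero if the name does not match.
plugin_version() {
    local re='tmo-custom-rules-(.+)\.jar$'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

plugin_version "tmo-custom-rules-1.0.jar"        # prints 1.0
plugin_version "README.txt" || echo "no match"   # prints no match
```

Keeping the pattern in a single-quoted variable means the shell never interprets the backslash or parentheses before the regex engine sees them.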
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
# remove any previous versions of this plugin
if [[ $f == *tmo-custom-rules-*.jar ]]
then
echo "found $f. will remove"
else
echo "$f doesn't match"
fi
done
Note that this also avoids parsing the output of ls, which is not reliable in scripts.
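As a sketch of the glob-based loop, the function below lists matching jars in a directory. The name list_old_plugins and the scratch directory are assumptions for illustration; nullglob is enabled so that an empty directory yields no iterations instead of the literal pattern.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list previous versions of the plugin in directory $1.
list_old_plugins() {
    local dir=$1 f
    shopt -s nullglob             # an unmatched glob expands to nothing
    for f in "$dir"/*tmo-custom-rules-*.jar; do
        printf '%s\n' "${f##*/}"  # strip the directory part
    done
    shopt -u nullglob             # note: this turns nullglob off unconditionally
}

# Demo in a scratch directory (file names taken from the question).
tmp=$(mktemp -d)
touch "$tmp/README.txt" "$tmp/tmo-custom-rules-1.0.jar" "$tmp/sonar-javascript-plugin-2.11.jar"
list_old_plugins "$tmp"   # prints tmo-custom-rules-1.0.jar
rm -rf "$tmp"
```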

You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
echo matches
fi
Note that regular expressions in this context are implicitly anchored to the start of the string, which is why the pattern begins with .* here.
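The anchoring can be seen in a small sketch. The wrapper name matches_from_start and the sample file name are hypothetical.

```shell
#!/usr/bin/env bash

# expr's ":" operator matches a basic regular expression anchored at the
# start of the string, so a pattern without a leading ".*" only matches
# when the string begins with it.
matches_from_start() {
    expr "$1" : "$2" > /dev/null
}

f="sonar-tmo-custom-rules-1.0.jar"   # hypothetical file name
matches_from_start "$f" 'tmo-custom-rules-.*\.jar'   && echo "anchored: match" || echo "anchored: no match"
matches_from_start "$f" '.*tmo-custom-rules-.*\.jar' && echo "with .*: match"  || echo "with .*: no match"
```

The first call reports no match because the name starts with sonar-, the second one matches.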

Related

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
however this does not evaluate correctly, no matter what I put in the lines the program terminates.
If you want the line numbers of the mismatches, you can use grep -vn. With a correctly written regular expression, that gives:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed is also over the top. If you need some explanation in the output, you can start with echo "Linenr:Invalid line"
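The grep and sed pipeline can be tried on a scratch file as sketched below. The function name report_invalid and the sample data are assumptions; sed -E is used instead of -r because both GNU and BSD sed accept it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print one message per line of file $1 that does not
# have the form 9 chars "+" 3 chars "+" 1 char "+" 2 chars.
report_invalid() {
    grep -Evn '^.{9}[+].{3}[+].[+].{2}$' "$1" |
        sed -E 's/^([0-9]+):(.*)$/Invalid number (\2) check line number \1./'
}

# Demo with a scratch file; the second line is deliberately malformed.
tmp=$(mktemp)
printf '%s\n' '000000000+000+0+00' '00001+000+0+00' '000000002+000+0+00' > "$tmp"
report_invalid "$tmp"   # prints: Invalid number (00001+000+0+00) check line number 2.
rm -f "$tmp"
```

Unlike the loop in the question, this reports every invalid line rather than stopping at the first one.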
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
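For comparison, here is a sketch of the same optional-space pattern under bash rather than ksh. That switch is an assumption made for illustration: ksh93 understands ?(...) natively, while bash provides it through the extglob option. The helper name valid_line is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: bash instead of ksh; extglob enables the ?(...) syntax.
shopt -s extglob

# Hypothetical helper: succeed if the line has the form
# 9 chars "+" 3 chars "+" optional space, 1 char "+" 2 chars.
valid_line() {
    local pattern='?????????+???+?( )?+??'
    [[ $1 == $pattern ]]   # unquoted right-hand side, so it is treated as a pattern
}

valid_line '000000000+000+0+00'  && echo "valid"
valid_line '000000000+000+ 0+00' && echo "valid (optional space)"
valid_line '0000+000+0+00'       || echo "invalid"
```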
Your regex looks bad - using sites like https://regex101.com/ is very helpful. From your description, I suspect it should look more like one of these;
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
# exit when the line does NOT match the expected format
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}";
done < "${file}"

Regex not matching name in filepath

I have a folder with ipa files. I need to identify them by whether the filename contains appstore or enterprise.
mles:drive-ios-swift mles$ ls build
com.project.drive-appstore.ipa
com.project.test.swift.dev-enterprise.ipa
com.project.drive_v2.6.0._20170728_1156.ipa
I've tried:
#!/bin/bash -veE
fileNameRegex="**appstore**"
for appFile in build-test/*{.ipa,.apk}; do
if [[ $appFile =~ $fileNameRegex ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
However nothing matches:
mles:drive-ios-swift mles$ ./test.sh
build-test/com.project.drive-appstore.ipa Does not match
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
build-test/*.apk Does not match
What would the correct script look like to match build-test/com.project.drive-appstore.ipa?
You are confusing glob matching with regex matching. For a simple glob match with * you can just use the == operator inside [[ ]]:
#!/usr/bin/env bash
fileNameGlob='*appstore*'
# ^^^^^^^^^^^^ Single-quote the glob pattern
for appFile in build-test/*{.ipa,.apk}; do
# To skip non-existent files
[[ -e $appFile ]] || continue
if [[ $appFile == $fileNameGlob ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
produces a result
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.drive-appstore.ipa Matches
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
Or, with a regex, use .* for the greedy match:
fileNameRegex='.*appstore.*'
if [[ $appFile =~ ${fileNameRegex} ]]; then
# rest of the code
That said, to meet your original requirement of matching either enterprise or appstore in the file name, use an extended glob in bash.
Using glob:
shopt -s nullglob
shopt -s extglob
fileExtGlob='*+(enterprise|appstore)*'
if [[ $appFile == ${fileExtGlob} ]]; then
# rest of the code
and with regex,
fileNameRegex2='enterprise|appstore'
if [[ $appFile =~ ${fileNameRegex2} ]]; then
# rest of the code
You can use the following regex to match appstore and enterprise in a filename:
for i in build-test/*; do if [[ $i =~ appstore|enterprise ]]; then echo "$i"; fi; done
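If the script also needs to know which of the two words matched, a capture group can report it. This is a sketch only; the helper name build_kind is hypothetical and the file names are the ones from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "appstore" or "enterprise" depending on which
# word appears in the file name; return non-zero if neither does.
build_kind() {
    local re='(appstore|enterprise)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

for name in com.project.drive-appstore.ipa \
            com.project.test.swift.dev-enterprise.ipa \
            com.project.drive_v2.6.0._20170728_1156.ipa; do
    if kind=$(build_kind "$name"); then
        echo "$name: $kind"
    else
        echo "$name: no match"
    fi
done
```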

Native bash regexp [[ $f =~ "^[^\.]+$" ]] never matching

I'm currently trying to loop through all files in a certain directory using bash. If the file matches the following regular expression, it outputs the filename. If it doesn't, it outputs 'not' and then the filename. The regular expression is supposed to filter out any files that have a '.' in them.
for f in * ; do
if [[ $f =~ "^[^\.]+$" ]]; then
echo "$f"
else
echo "not $f"
fi
done
It correctly loops through all the files, but for a reason that has stumped me for quite a while, I cannot get it to only exclude files with a '.' in them. For example, in a directory with the following files:
bashrc
gitconfig
install.sh
README.md
vimrc
the output of the script is such:
not bashrc
not gitconfig
not install.sh
not README.md
not vimrc
I validated the regular expression here. Any thoughts?
Don't quote the right-hand side of your expression.
if [[ $f =~ ^[^.]+$ ]]; then
Quotes make the string a literal substring, rather than a regular expression.
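The difference can be checked directly. The sketch below assumes bash 3.2 or later; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes bash 3.2 or later, where a quoted right-hand side is matched literally.

quoted_match()   { [[ $1 =~ "^[^.]+$" ]]; }   # literal substring search
unquoted_match() { [[ $1 =~ ^[^.]+$ ]]; }     # real regular expression

f='bashrc'
quoted_match "$f"   && echo "quoted: match"   || echo "quoted: no match"
unquoted_match "$f" && echo "unquoted: match" || echo "unquoted: no match"

# The quoted form only succeeds when the string contains those exact characters.
quoted_match 'x^[^.]+$x' && echo "quoted: literal substring found"
```

With f set to bashrc, the quoted test fails and the unquoted one succeeds, which is the behavior seen in the question.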
For better portability across bash versions, put your regex in a variable (single-quoted, so the shell passes every character through unchanged):
re='^[^.]+$'
if [[ $f =~ $re ]]; then
That said, you could do this with an extglob as well:
shopt -s extglob # enable extended globs
for f in +([!.]); do
printf 'Matched %q\n' "$f"
done
...or with a general-purpose pattern match:
for f in *; do
if [[ $f = *.* ]]; then
printf '%q contains a dot\n' "$f"
else
printf '%q does not contain a dot\n' "$f"
fi
done

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use the regex without quotes:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This outputs absolute. because the regex needs to be unquoted in newer bash (after bash v3.1).
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi
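Wrapped in a function, the regex check might look like the sketch below. The name is_absolute_url is hypothetical, and accepting https as well as http is an assumption that goes beyond the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if $1 starts with http:// or https://.
# Accepting https is an assumption; the question only mentions http.
is_absolute_url() {
    local re='^https?://'
    [[ $1 =~ $re ]]
}

for url in "http://test" "https://test/page" "images/logo.png" "/about"; do
    if is_absolute_url "$url"; then
        echo "$url: absolute"
    else
        echo "$url: relative"
    fi
done
```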

Regular expression for extracting date

I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
The only problem was your first pattern [1-3]. It should be [0-3].
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it on a loop, you can have something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
[[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want the pattern to match the whole string exactly, add ^ and $ at the beginning and the end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract matches on the line use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
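Since =~ is not anchored, the surrounding .* can be dropped and the whole match read from BASH_REMATCH[0]. The following is a sketch under that assumption; the function name extract_date and the sample log line are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first date of the form 01/Jan/2000:23:59:59
# found in $1; return non-zero if there is none.
extract_date() {
    local re='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

# Hypothetical log line for illustration.
line='127.0.0.1 - - [01/Jan/2000:23:59:59 +0000] "GET / HTTP/1.1" 200'
extract_date "$line"   # prints 01/Jan/2000:23:59:59
```

BASH_REMATCH[0] holds the entire matched text, while BASH_REMATCH[1] here would hold only the century captured by (19|20).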