Regular expression does not match, no idea why - regex

I moved my server to a new one (cheaper). Both have the same Linux (CentOS) in different versions (5.0 and 6.5). I have a shell script that filters a line out of a log:
if [ -f $URLFILE ]
then
echo "File found, getting userinfo..."
while read line;
do
if [[ $line =~ ".Userlist: .*" ]]
then
echo "Found user information."
echo $line > /home/....net.txt;
...
So, if the line that was read matches the regex, it should be echoed into the file. This works fine on the old system, but the regex does not match on the new system (without any other changes). The regex is correct as far as an online regex tester tells me.

There are 2 problems:
sh doesn't support regex matching using =~ like bash
Regex should NOT be quoted and your variable should preferably be quoted.
You can use this in BASH:
if [[ "$line" =~ Userlist:[[:blank:]] ]]
OR just avoid using regex altogether and use glob matching:
if [[ "$line" == *"Userlist: "* ]]
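The difference between a quoted and an unquoted regex can be sketched as follows, assuming bash 3.2 or later. The function names and the sample log line are made up for illustration; they are not from the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the line contains "Userlist:" plus a blank.
has_userlist() {
  local line=$1
  # Unquoted regex: metacharacters keep their special meaning.
  [[ $line =~ Userlist:[[:blank:]] ]]
}

# Hypothetical helper using the quoted form from the question.
# In bash 3.2+ the quoted text is matched literally, dot and star included.
quoted_match() {
  local line=$1
  [[ $line =~ ".Userlist: .*" ]]
}

sample='[INFO] Userlist: alice, bob'   # made-up log line
if has_userlist "$sample"; then echo "unquoted regex: match"; fi
if ! quoted_match "$sample"; then echo "quoted regex: no match"; fi
```

The quoted form only succeeds on a line that literally contains the characters `.Userlist: .*`, which is probably why the script stopped matching after the move to a newer bash.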

Related

Is it possible to do an OR in a bash regular expression?

I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need a regex operator to do an alternate match. The [[ extended test operator supports extended pattern matching, so you can write the test below. +(pattern-list) matches one or more occurrences of the given patterns, which are separated by |.
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
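A small sketch contrasting two extended-glob operators: +(...) accepts one or more repetitions, while @(...) accepts exactly one of the alternatives. The helper names are hypothetical, and extglob is switched on explicitly here so that the sketch does not rely on the automatic behavior inside [[ ]].

```bash
#!/usr/bin/env bash
shopt -s extglob   # set explicitly; must come before the patterns are parsed

# Hypothetical helpers, one per operator.
one_or_more() { [[ $1 == +(foo|bar) ]]; }   # foo/bar repeated one or more times
exactly_one() { [[ $1 == @(foo|bar) ]]; }   # exactly one of foo or bar

for word in bar foobar baz; do
  if one_or_more "$word"; then echo "$word: +() matches"; fi
  if exactly_one "$word"; then echo "$word: @() matches"; fi
done
```

If only a single alternative should be accepted, @(foo|bar) is the closer equivalent of the regex ^(foo|bar)$.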
As for the regex part, with any command that supports ERE, alternation is written with the | construct:
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
Your quoted regex does not work because regex parsing in bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap the regex pattern in quotes; since 3.2, quoted parts of the pattern are matched literally, so the regex should be left unquoted.
You should protect any special characters by escaping them with a backslash. The most portable approach is to put the regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior very well.
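The variable approach can be sketched like this, assuming bash 3.2 or later. The function names are hypothetical and exist only to show the unquoted and quoted expansions side by side.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the whole word is "foo" or "bar".
is_foo_or_bar() {
  local re='^(foo|bar)$'   # single quotes keep the pattern intact
  [[ $1 =~ $re ]]          # unquoted expansion: treated as a regex
}

# Same pattern, but the expansion is quoted, so bash looks for the
# literal text ^(foo|bar)$ inside the argument instead.
literal_match() {
  local re='^(foo|bar)$'
  [[ $1 =~ "$re" ]]
}

if is_foo_or_bar bar; then echo "unquoted: match"; fi
if ! literal_match bar; then echo "quoted: no match"; fi
```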

Regex not matching name in filepath

I have a folder with ipa files. I need to identify them by whether the filename contains appstore or enterprise.
mles:drive-ios-swift mles$ ls build
com.project.drive-appstore.ipa
com.project.test.swift.dev-enterprise.ipa
com.project.drive_v2.6.0._20170728_1156.ipa
I've tried:
#!/bin/bash -veE
fileNameRegex="**appstore**"
for appFile in build-test/*{.ipa,.apk}; do
if [[ $appFile =~ $fileNameRegex ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
However nothing matches:
mles:drive-ios-swift mles$ ./test.sh
build-test/com.project.drive-appstore.ipa Does not match
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
build-test/*.apk Does not match
How would the correct script look like to match build-test/com.project.drive-appstore.ipa?
You are confusing a glob match with a regex match. For a glob match with * you can just use the test operator with ==:
#!/usr/bin/env bash
fileNameGlob='*appstore*'
# ^^^^^^^^^^^^ Single quote the glob string
for appFile in build-test/*{.ipa,.apk}; do
# To skip non-existent files
[[ -e $appFile ]] || continue
if [[ $appFile == *${fileNameGlob}* ]]; then
echo "$appFile Matches"
else
echo "$appFile Does not match"
fi
done
produces a result
build-test/com.project.drive_v2.6.0._20170728_1156.ipa Does not match
build-test/com.project.drive-appstore.ipa Matches
build-test/com.project.test.swift.dev-enterprise.ipa Does not match
(or) with a regex use greedy match .* as
fileNameRegex='.*appstore.*'
if [[ $appFile =~ ${fileNameRegex} ]]; then
# rest of the code
That said, to meet your original requirement of matching either enterprise or appstore in the file name, use extended glob matching in bash.
Using glob:
shopt -s nullglob
shopt -s extglob
fileExtGlob='*+(enterprise|appstore)*'
if [[ $appFile == ${fileExtGlob} ]]; then
# rest of the code
and with regex,
fileNameRegex2='enterprise|appstore'
if [[ $appFile =~ ${fileNameRegex2} ]]; then
# rest of the code
You can use the following regex to match appstore or enterprise in a filename:
for i in build-test/*; do if [[ $i =~ appstore|enterprise ]]; then echo "$i"; fi; done
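If the script also needs to know which of the two keywords was found, BASH_REMATCH can report it. This is only a sketch: the function name build_kind and the fallback value "other" are assumptions, while the file names come from the question's listing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints which keyword a file name contains, if any.
build_kind() {
  local re='appstore|enterprise'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[0]}"   # the part of the name that matched
  else
    echo "other"                # assumed label for everything else
  fi
}

for f in com.project.drive-appstore.ipa \
         com.project.test.swift.dev-enterprise.ipa \
         com.project.drive_v2.6.0._20170728_1156.ipa; do
  printf '%s -> %s\n' "$f" "$(build_kind "$f")"
done
```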

Regex in a bash script

I've got the following text file which contains:
12.3-456, test
test test test
If the line contains xx.x-xxx, then I want to print the line out. (X's are numbers)
I think I have the correct regex and have tested it here:
http://regexr.com/3clu3
I have then used this in a bash script but the line containing the text is not printed out.
What have I messed up?
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ /\d\d.\d-\d\d\d,/g ]]; then
echo $line
fi
done < input.txt
You need to use [0-9] instead of a \d in Bash regex. No regex delimiters are necessary, and the global flag is not necessary either. Also, you can contract it a bit using limiting quantifiers (like {3} that will match 3 occurrences of the pattern next to it). Besides, a dot matches any character in regex, so you need to escape it if you want to match a literal dot symbol.
Use
regex="[0-9]{2}\.[0-9]-[0-9]{3},"
if [[ $line =~ $regex ]]
...
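Put together, the loop from the question might look like the sketch below. The helper name has_code is hypothetical, and the two sample lines are fed through a here-document instead of input.txt so that the example is self-contained.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the line contains NN.N-NNN followed by a comma.
has_code() {
  local re='[0-9]{2}\.[0-9]-[0-9]{3},'
  [[ $1 =~ $re ]]
}

# Sample lines from the question.
while IFS='' read -r line || [[ -n "$line" ]]; do
  if has_code "$line"; then
    echo "$line"
  fi
done <<'EOF'
12.3-456, test
test test test
EOF
```

The pattern is not anchored, so it also matches when extra digits come before the number; add ^ at the start if the number must begin the line.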
This works:
#!/bin/bash
#regex="/\d\d.\d-\d\d\d,/g"
regex="[0-9\.\-]+\, [A-Za-z]+"
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
if [[ $line =~ $regex ]]; then
echo "match"
fi
done
The regex is one or more of [0-9, '.', '-'], followed by ', ' and then one or more letters. It could be refined in a number of ways, e.g. by fixing the number of digits before and after the '-'.
Testing indicates:
$ ./sqltrace2.sh < input.txt
12.3-456, test
match
123.3-456, test
match
12.3-456,
test test test
test test test

Regular Expression won't work in bash, works in other tools

I have the following string:
Started GET "/stuff/search?search_string=Actin&organism_id=9&advanced_design=false&user_ip=172.16.0.1&filter=" for 172.16.0.4 at 2015-06-30 13:58:26 +0200
Parameters: {"search_string"=>"Actin", "organism_id"=>"9", "advanced_design"=>"false", "user_ip"=>"172.16.0.1", "filter"=>""}
Started GET "/stuff/search?search_string=NM_001101&organism_id=9&advanced_design=false&user_ip=172.16.0.1&filter=" for 172.16.0.4 at 2015-06-30 14:00:39 +0200
Parameters: {"search_string"=>"NM_001101", "organism_id"=>"9", "advanced_design"=>"false", "user_ip"=>"172.16.0.1", "filter"=>""}
Started GET "/stuff/search?search_string=ENST00000331789&organism_id=9&advanced_design=false&user_ip=172.16.0.1&filter=" for 172.16.0.4 at 2015-06-30 14:00:49 +0200
Parameters: {"search_string"=>"ENST00000331789", "organism_id"=>"9", "advanced_design"=>"false", "user_ip"=>"172.16.0.1", "filter"=>""}
and I want to extract the value of the "search_string" key. I need to do this in a bash script. For this I have come up with the following regular expression:
"\{(\"search_string\"\=\>\")([a-zA-Z0-9.\-_]+)(.*?)\}"
I have tested this on multiple online regular expression testers, like rubular or regex101.com and it works fine there. However, in bash, the regex does not match the text.
Here is my script (I have shortened the text for this question; normally the text is in a file that I am grepping):
#!/bin/bash
regex="\{(\"search_string\"\=\>\")([a-zA-Z0-9.\-_]+)(.*?)\}"
string='{"search_string"=>"NM_001101"}'
if [[ $string =~ $regex ]]
then
echo "OK"
else
echo "not OK"
fi
filename="/some/path/search.txt"
if [ -f "$filename" ]
then
result=$(grep -F "$regex" "$filename")
echo "$result"
else
echo "$filename is not a file or it does not exist"
fi
In this case, the script returns "not OK".
Obviously, the script is not ready yet as I am stuck with this regular expression. What am I doing wrong ?
Thanks!
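Remove the backslashes before = and > (with the GNU regex library, \> is an end-of-word anchor, not a literal >) and put the - last in the bracket expression so that it is taken literally. Inside a double-quoted string, \\{ and \{ both produce the regex \{.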
regex="\\{\"search_string\"=>\"[a-zA-Z0-9._-]+(.*?)\\}"
string='{"search_string"=>"NM_001101"}'
echo $regex
if [[ $string =~ $regex ]]
then
echo "OK"
else
echo "not OK"
fi
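Since the goal is to extract the value rather than just test for it, a capture group together with BASH_REMATCH may be enough. This is a sketch under assumptions: the function name search_string_value is made up, and the character class for the value is the one from the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the value of the "search_string" key, if present.
search_string_value() {
  # Single quotes: the double quotes inside are literal regex characters.
  local re='"search_string"=>"([A-Za-z0-9._-]+)"'
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"   # text captured by the first group
  else
    return 1
  fi
}

# Shortened version of a Parameters line from the question.
line='Parameters: {"search_string"=>"NM_001101", "organism_id"=>"9"}'
search_string_value "$line"
```

To process a whole log file, the function could be called on each line inside a while read loop.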
This regex works in awk, so you could modify your script to use awk for the matching. By default awk reads lines from stdin or from each file it is given; a regex is enclosed in /.../ and commands in {...}. Here I echoed your example, piped it to awk and used the command print "ok" to test whether the regex matched. I think you can adapt this piece of code to make your script work the way you want in bash.
~$ echo '{"search_string"=>"NM_001101"}' | awk '/\{(\"search_string\"\=\>\")([a-zA-Z0-9.\-_]+)(.*?)\}/{print "ok"}'
ok

Regex and if in shell script

My program starts some services and stores the output in the variable tmp. I want to check whether the variable's content starts with the keyword FATAL; if it does, I will print "Port in use" with echo.
For example if tmp contains FATAL: Exception in startup, exiting.
I can do it by sed: echo $tmp | sed 's/^FATAL.*/"Port in use"/'
but I want to use the builtin if to match the pattern.
How can I use the shell built in features to match REGEX?
POSIX shell doesn't have a regular expression operator for UNIX ERE or PCRE. But it does have the case keyword:
case "$tmp" in
FATAL*) doSomethingDrastic;;
*) doSomethingNormal;;
esac
You didn't tag the question bash, but if you do have that shell you can do some other kinds of pattern matching or even ERE:
if [[ "$tmp" = FATAL* ]]; then
…
fi
or
if [[ $tmp =~ ^FATAL ]]; then
…
fi
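Wrapped in a function, the case approach could look like the sketch below. The function name report and the "Started" message are assumptions; only "Port in use" and the FATAL line come from the question. The function body uses nothing beyond POSIX shell syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps a service's output to a status message.
report() {
  case $1 in
    FATAL*) echo "Port in use" ;;   # output begins with FATAL
    *)      echo "Started" ;;       # assumed message for any other output
  esac
}

report "FATAL: Exception in startup, exiting."
report "Service started"   # made-up non-fatal output
```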
# %% removes the longest suffix matching FATAL*, so a later FATAL in the text does not matter
if [ -z "${tmp%%FATAL*}" ]
then echo "Starts with"
else
echo "Does not start with"
fi
This works in ksh and bash on AIX; I think it also works on Linux.
It's not a real regex but the limited pattern syntax the shell uses for filename matching (it is built into the shell, unlike sed/grep/..., which have their own regex engines), so * and ? can be used.
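A related variant strips the prefix instead of the suffix, which also gives the expected answer for an empty string. This is a sketch, not the answer's own method; the function name starts_with_fatal is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the string begins with FATAL.
starts_with_fatal() {
  # ${1#FATAL} strips a leading FATAL. If nothing was stripped, the result
  # equals the original, so the string did not start with FATAL.
  [ "${1#FATAL}" != "$1" ]
}

if starts_with_fatal "FATAL: Exception in startup, exiting."; then
  echo "Port in use"
fi
```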