Multiline Bash Regex Match - regex

I've been reading through tons of threads about this, but none have helped me yet.
This is a sample of my text:
[userId:#"1"
userName:#""
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]];
[userId:#"2"
userName:#""
userPos:#[#"412520084C",
#"7200851",
#"54720021",
]
userDisp:#[#"230035FD6",
#"3213456432C0035FD6",
#"1F200538D5",
]];
I'm trying to capture this:
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]]
and
userPos:#[#"412520084C",
#"7200851",
#"54720021",
]
userDisp:#[#"230035FD6",
#"3213456432C0035FD6",
#"1F200538D5",
]]
(All matches from the text) using regex: userPos:#\[((?:.\r?\n?)*)\]
Trying it in bash using:
for string in $file # text has been read into this variable
do
    [[ $word =~ $regex ]]
    if [[ ${BASH_REMATCH[0]} ]]
    then
        string="${x:+x }${BASH_REMATCH[0]}"
        userlist+=("$string")
        echo "$string"
    fi
done
To append them to a list.
But this doesn't work, since the regex matches nothing at all. I know there are different kinds of regex engines, and I've tried many different regexes in bash, but I can't get any of them to work.
Anyone who could help me capture what I want in bash?

The regex you're looking for is userPos:([^;]*)].
regex="userPos:([^;]*)]"
while [[ $text =~ $regex ]]
do
    string="${x:+x }${BASH_REMATCH[0]}"
    userlist+=("$string")
    echo "$string"
    # strip everything up to and including this match, so the next
    # iteration finds the following one
    text=${text#*"${BASH_REMATCH[0]}"}
done
$text is your text.
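For reference, here is a self-contained sketch of the loop above, run against a shortened copy of the sample data. The trimmed input and the assumption that every record ends in ; are made for illustration only, and the ${x:+x } prefix is left out because x is never set in the snippet.

```shell
#!/usr/bin/env bash
# Shortened sample data (assumption: each record is terminated by ';').
text='[userId:#"1"
userName:#""
userPos:#[#"11006321C", ]
userDisp:#[#"4200012FD6", ]];
[userId:#"2"
userName:#""
userPos:#[#"412520084C",
#"7200851",
]
userDisp:#[#"230035FD6",
]];'

# [^;]* also matches newlines, so one match can span several lines.
regex='userPos:([^;]*)]'
userlist=()

while [[ $text =~ $regex ]]; do
    userlist+=("${BASH_REMATCH[0]}")
    # Drop everything up to and including this match before searching again.
    text=${text#*"${BASH_REMATCH[0]}"}
done

printf 'captured %d blocks\n' "${#userlist[@]}"
```

Because the regex is kept in a variable and expanded unquoted, the brackets and the backslash-free pattern reach the regex engine unchanged.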

Related

Is it possible to do an OR in a bash regular expression?

I know I can use grep, awk etc, but I have a large set of bash scripts that have some conditional statements using =~ like this:
#works
if [[ "bar" =~ "bar" ]]; then echo "match"; fi
If I try and get it to do a logical OR, I can't get it to match:
#doesn't work
if [[ "bar" =~ "foo|bar" ]]; then echo "match"; fi
or perhaps this...
#doesn't work
if [[ "bar" =~ "foo\|bar" ]]; then echo "match"; fi
Is it possible to get a logical OR using =~ or should I switch to grep?
You don't need the regex operator for an alternate match. The [[ extended test operator supports extended pattern matching, so you can do the following. +(pattern-list) matches one or more occurrences of the patterns separated by |:
[[ bar == +(foo|bar) ]] && echo match
The extended glob rules are automatically applied when the [[ keyword is used with the == operator.
As for the regex part, bash's =~ uses POSIX extended regular expressions (ERE), so alternation is written with a plain |:
[[ bar =~ foo|bar ]] && echo ok
[[ bar =~ ^(foo|bar)$ ]] && echo ok
Your quoted regex doesn't work because regex handling in bash changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap the regex pattern in quotes; from 3.2 on, any quoted part of the pattern is matched as a literal string, so the regex should be left unquoted.
You should protect any special characters by escaping them with a backslash. The best way to stay compatible across versions is to put your regex in a variable and expand that variable in [[ without quotes. Also see Chet Ramey's Bash FAQ, section E14, which explains this quoting behavior very well.
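A minimal sketch that puts both approaches side by side. The helper names check_regex and check_glob and the test words are made up for illustration; the shopt line should not be needed inside [[ on recent bash and is only there as a precaution for older releases.

```shell
#!/usr/bin/env bash
shopt -s extglob   # precaution; recent bash enables extglob inside [[ ... == ... ]] by itself

re='^(foo|bar)$'   # regex kept in a variable and expanded unquoted below

# Hypothetical helpers, named only for this example.
check_regex() { [[ $1 =~ $re ]]; }
check_glob()  { [[ $1 == +(foo|bar) ]]; }

for word in foo bar foobar baz; do
    if check_regex "$word"; then
        echo "$word: regex match"
    else
        echo "$word: no regex match"
    fi
    if check_glob "$word"; then
        echo "$word: glob match"
    else
        echo "$word: no glob match"
    fi
done
```

Note the difference on foobar: the anchored regex rejects it, while +(foo|bar) accepts it because + allows the alternatives to repeat. Use @(foo|bar) if exactly one alternative should match.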

How can I run a regex against a filename?

In a list of files:
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
README.txt
sonar-build-breaker-plugin-2.0.jar
sonar-javascript-plugin-2.11.jar
tmo-custom-rules-1.0.jar
I am attempting to match these filenames by regex.
My Script
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in $(ls -1 $install_location)
do
    # remove any previous versions of this plugin
    if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]]
    then
        echo "found $f. will remove"
    else
        echo "$f doesn't match"
    fi
done
I've tried if [[ "$f" =~ ".*tmo-custom-rules-(.+)\.jar" ]] and if [[ "$f" == *"tmo-custom-rules" ]] to no avail.
I'm getting
javascript-custom-rules-plugin-1.0-SNAPSHOT.jar doesn't match
README.txt doesn't match
sonar-build-breaker-plugin-2.0.jar doesn't match
sonar-javascript-plugin-2.11.jar doesn't match
tmo-custom-rules-1.0.jar doesn't match
when I expect found tmo-custom-rules-1.0.jar. will remove
I've run my regular expression through many regular expression testers with the data above, and they all return the correct matches, but I can't get it to work here in my script.
How can I loop through the files and check whether any of them matches this regular expression?
In Bash the regex must be unquoted (since 3.2, quoted parts are matched literally), so this should work:
[[ $f =~ .*tmo-custom-rules-(.+)\.jar ]]
Or better:
re=".*tmo-custom-rules-(.+)\.jar"
[[ $f =~ $re ]]
However you don't even need regex and can use shell glob matching:
#!/usr/bin/env bash
install_location=/usr/local/sonar/extensions/plugins
for f in "$install_location"/*
do
    # remove any previous versions of this plugin
    if [[ $f == *tmo-custom-rules-*.jar ]]
    then
        echo "found $f. will remove"
    else
        echo "$f doesn't match"
    fi
done
Note that this also avoids parsing the output of ls, which is not reliable in scripts.
You can do this with expr using the colon operator:
if expr "$f" : '.*tmo-custom-rules-.*\.jar' > /dev/null; then
    echo matches
fi
Note that expr implicitly anchors the regular expression to the start of the string, which is why the pattern begins with .*.
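If the version number is needed as well, the capture group can be read from BASH_REMATCH. A small sketch, assuming the file names from the question are held in an array rather than read from the plugin directory; the variable names matched and version are made up for the example.

```shell
#!/usr/bin/env bash
# File names copied from the question; a real script would glob the directory instead.
files=(
    javascript-custom-rules-plugin-1.0-SNAPSHOT.jar
    README.txt
    sonar-build-breaker-plugin-2.0.jar
    sonar-javascript-plugin-2.11.jar
    tmo-custom-rules-1.0.jar
)

re='tmo-custom-rules-(.+)\.jar$'   # kept in a variable, expanded unquoted
matched=()
version=''

for f in "${files[@]}"; do
    if [[ $f =~ $re ]]; then
        matched+=("$f")
        version=${BASH_REMATCH[1]}   # text captured by (.+)
        echo "found $f (version $version). will remove"
    else
        echo "$f doesn't match"
    fi
done
```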

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to leave the regex unquoted:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
    echo "absolute."
fi
This outputs absolute. because the regex must be unquoted in newer Bash (3.2 and later).
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
    echo "absolute."
fi
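Wrapped up as a reusable check, this might look like the sketch below. The function name is_absolute and the sample values are invented for the example, and also accepting https:// is an assumption that goes beyond the question.

```shell
#!/usr/bin/env bash
# is_absolute is a hypothetical helper: it succeeds when its argument
# starts with http:// or https:// (the https part is an assumption).
is_absolute() {
    local re='^https?://'
    [[ $1 =~ $re ]]
}

for url in "http://test" "https://example.org/a" "images/logo.png" "/about"; do
    if is_absolute "$url"; then
        echo "$url: absolute"
    else
        echo "$url: relative"
    fi
done
```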

Perl Regex not working in Bash script?

I have a regular expression:
^(.+?)(\.[^.]+$|$)
which separates a file name and the file extension (if there is one)
http://movingtofreedom.org/2008/04/01/regex-match-filename-base-and-extension/
Works perfectly fine in Perl
Say $FILE ='.myfile.form.txt'
$1 is '.myfile.form' and
$2 is '.txt', as they should be
I know Bash regex and Perl regex aren't the same, but I've never had a problem with Bash Rematching until now
But when I try to use it in a Bash script as, say...
FILE='.myfile.form.txt'
[[ $FILE =~ ^(.+?)(\.[^.]+$|$) ]]
${BASH_REMATCH[1]} will just have the entire file name (.myfile.form.txt), and nothing in ${BASH_REMATCH[2]}
I'm wondering what's wrong/going on here
Thanks for any help!
regex(7), which is referenced by regex(3), which in turn is referenced by bash(1), makes no mention of greediness modifiers. The non-greedy +? in your pattern therefore cannot be expressed in a bash regex.
This doesn't mean you can't achieve what you want, though.
[[ $FILE =~ ^(.+)(\.[^.]*)$ ]] || [[ $FILE =~ ^(.*)()$ ]]
file="${BASH_REMATCH[1]}"
ext="${BASH_REMATCH[2]}"
Or something more straightforward like
if [[ $FILE =~ ^(.+)(\.[^.]*)$ ]]; then
    file="${BASH_REMATCH[1]}"
    ext="${BASH_REMATCH[2]}"
else
    file="$FILE"
    ext=""
fi
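The same logic as a function, tried on a few names to show how the greedy (.+) behaves. This is only a sketch: split_name and the globals base and ext are names made up for the example.

```shell
#!/usr/bin/env bash
# split_name is a hypothetical helper; it stores its result in the
# globals base and ext.
split_name() {
    local re='^(.+)(\.[^.]*)$'
    if [[ $1 =~ $re ]]; then
        # Greedy .+ takes as much as it can, so ext is the last .suffix.
        base=${BASH_REMATCH[1]}
        ext=${BASH_REMATCH[2]}
    else
        # No dot after the first character: treat the whole name as the base.
        base=$1
        ext=''
    fi
}

for name in .myfile.form.txt README .bashrc; do
    split_name "$name"
    printf '%s -> base=%s ext=%s\n' "$name" "$base" "$ext"
done
```

A leading dot alone, as in .bashrc, is not treated as an extension because (.+) has to consume at least one character before the final dot.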

Regular expression for extracting date

I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
The only problem was your first pattern [1-3]. It should be [0-3].
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it on a loop, you can have something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
    [[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want to match the whole string exactly, add ^ and $ at the beginning and the end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract matches on the line use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
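Putting the pieces together, here is a sketch that pulls the timestamp out of a longer line. The log line is made up for the example, and the variable names line and stamp are not from the question. Since =~ matches anywhere in the string, the surrounding .* can be dropped when the pattern is not anchored.

```shell
#!/usr/bin/env bash
# Outer parentheses capture the whole date; (19|20) becomes group 2.
re='([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9])'

# Made-up sample line in the style of a web server access log.
line='127.0.0.1 - - [01/Jan/2000:23:59:59 +0000] "GET / HTTP/1.1" 200 512'

stamp=''
if [[ $line =~ $re ]]; then
    stamp=${BASH_REMATCH[1]}
    echo "date: $stamp"
else
    echo "no date found"
fi
```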