in a bash script, I have :
mkv="xxxx E05 xxxx"
if [[ $mkv =~ E[0-9]{2} ]] ; then echo FOUND; fi
Good. This tells me whether $mkv matches E[0-9]{2}, but that is not what I want.
I want to get the matching string (i.e. 05 in my example).
I put a capture group () in my regexp, hoping I'd be able to refer to it later, but I could not.
I tried:
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND $1; fi
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND \1; fi
etc... but all of them failed
thanks !
You can use the BASH_REMATCH array to get the parts that matched:
if [[ $mkv =~ E([0-9]{2}) ]] ; then echo FOUND ${BASH_REMATCH[1]} ; fi
${BASH_REMATCH[0]} will contain the whole/full match (Exx), ${BASH_REMATCH[1]} the first captured group (only the digits here).
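As a minimal sketch of the above, the match can be wrapped in a small function. The name extract_episode is made up for illustration and is not part of the original question:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two digits that follow "E" in a file name.
extract_episode() {
    local name=$1
    if [[ $name =~ E([0-9]{2}) ]]; then
        # BASH_REMATCH[0] holds the whole match (e.g. E05),
        # BASH_REMATCH[1] holds the first parenthesised group (e.g. 05).
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1    # no episode marker found
    fi
}

extract_episode "xxxx E05 xxxx"    # prints 05
```

The function returns a non-zero status when nothing matches, so it can be used directly in an if.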
Related
For some reason, the following regular expression match doesn't seem to be working.
string="#Hello world";
[[ "$string" =~ 'ello' ]] && echo "matches";
[[ "$string" =~ 'el.o' ]] && echo "matches";
The first command succeeds (as expected), but the second one does not.
Shouldn't that period be treated by the regular expression as a single character?
Quoting the period causes it to be treated as a literal character, not a regular-expression metacharacter. Best practice if you want to quote the entire regular expression is to do so in a variable, where regular expression matching rules aren't in effect, then expand the parameter unquoted (which is safe to do inside [[ ... ]]).
regex='el.o'
[[ "$string" =~ $regex ]] && echo "matches"
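A small sketch that makes the difference visible, assuming bash 3.2 or later. The variable names unquoted and quoted are only there for illustration:

```shell
#!/usr/bin/env bash
string="#Hello world"
regex='el.o'    # the single quotes only protect the assignment

# Unquoted expansion: the dot is a regex metacharacter and matches the second "l".
if [[ $string =~ $regex ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted expansion: the dot has to match a literal ".", which the string lacks.
if [[ $string =~ "$regex" ]]; then quoted=match; else quoted=nomatch; fi

echo "unquoted: $unquoted, quoted: $quoted"
```

So the quoting that matters is the quoting on the right-hand side of =~, not the quoting used when the variable is assigned.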
string="#Hello world";
[[ "$string" =~ ello ]] && echo "matches";
[[ "$string" =~ el.o ]] && echo "matches";
Test
$ string="hh elxo fj"
$ [[ "$string" =~ el.o ]] && echo "matches";
matches
I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use the regex without quotes:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This outputs absolute. because the regex must be left unquoted in newer versions of bash (after version 3.1).
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi
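If the check is needed in several places, it could be wrapped in a function along these lines. The name is_absolute_url is hypothetical, and accepting https:// as well as http:// is an assumption that goes beyond the original question:

```shell
#!/usr/bin/env bash
# Hypothetical helper; treating https:// as absolute too is an assumption.
is_absolute_url() {
    local re='^https?://'    # kept in a variable so it can be expanded unquoted
    [[ $1 =~ $re ]]
}

for url in "http://test" "https://test/a" "images/logo.png"; do
    if is_absolute_url "$url"; then
        echo "absolute: $url"
    else
        echo "relative: $url"
    fi
done
```

Storing the pattern in a variable sidesteps the quoting question entirely, because the expansion on the right of =~ is left unquoted.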
I need a regular expression to match a date of the form 01/Jan/2000:23:59:59. I managed to match the pattern using Notepad++'s regex interpreter, using the following regex:
[1-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]
Unfortunately, I need to do this with bash. AWK is not an option right now, I'm afraid. So, I tried to convert the above regex into something that bash would interpret in the same way. Thus far, I've come up with this:
[1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]
The full command I'm using is
expr "$line" : '\([1-3][0-9]/[A-Z][a-z]\{2\}/(19|20)[0-9]\{2\}:[0-2][0-9]:[0-5][0-9]:[0-5][0-9]\)'
where $line contains the string out of which I need to extract the date. Unfortunately my bash version of the regex doesn't work. I have tried different things, like escaping / and :, but I can't seem to get it to work. What am I doing wrong?
The only problem was your first pattern [1-3]. It should be [0-3].
[[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]]
Also, on some earlier versions of Bash you have to store it in a variable:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
[[ $DATE =~ $RE ]]
Example:
> DATE='01/Jan/2000:23:59:59'
> [[ $DATE =~ [0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9] ]] && echo Match.
Match.
Bash 3.0:
> echo "$BASH_VERSION"
3.00.0(1)-release
> DATE='01/Jan/2000:23:59:59'
> RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
> [[ $DATE =~ $RE ]] && echo Match.
Match.
If you want to apply it on a loop, you can have something like this:
RE='[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]'
while read -r LINE; do
[[ $LINE =~ $RE ]] && echo "Match: $LINE"
done < date_list.txt
By the way, if you want the pattern to match the whole string and nothing else, add ^ at the beginning and $ at the end of the pattern:
[[ $DATE =~ ^[0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]$ ]]
To extract matches on the line use () and BASH_REMATCH:
[[ $DATE =~ .*([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9]).* ]] && echo "${BASH_REMATCH[1]}"
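Since =~ matches anywhere in the string unless the pattern is anchored, the surrounding .* is probably not needed for extraction. Here is a sketch under that assumption. The sample line is invented, and the variable names date_found and century are only illustrative:

```shell
#!/usr/bin/env bash
# Same pattern as above, wrapped in an outer group and stored in a variable.
re='([0-3][0-9]/[A-Z][a-z]{2}/(19|20)[0-9]{2}:[0-9]{2}:[0-5][0-9]:[0-5][0-9])'

# Hypothetical input line; the surrounding log format is an assumption.
line='127.0.0.1 - - [01/Jan/2000:23:59:59 +0000] "GET / HTTP/1.0" 200'

if [[ $line =~ $re ]]; then
    date_found=${BASH_REMATCH[1]}    # outer group: the full date
    century=${BASH_REMATCH[2]}       # inner group: 19 or 20
    echo "$date_found"
fi
```

Groups are numbered by their opening parenthesis, so the outer group is index 1 and the (19|20) alternation becomes index 2.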
What is the correct way to escape a dollar sign in a bash regex? I am trying to test whether a string begins with a dollar sign. Here is my code, in which I double escape the dollar within my double quotes expression:
echo -e "AB1\nAB2\n\$EXTERNAL_REF\nAB3" | while read value;
do
if [[ ! $value =~ "^\\$" ]];
then
echo $value
else
echo "Variable found: $value"
fi
done
This does what I want for one box which has:
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
And the verbose output shows
+ [[ ! $EXTERNAL_REF =~ ^\$ ]]
+ echo 'Variable found: $EXTERNAL_REF'
However, on another box which uses
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The comparison is expanded as follows
+ [[ ! $EXTERNAL_REF =~ \^\\\$ ]]
+ echo '$EXTERNAL_REF'
Is there a standard/better way to do this that will work across all implementations?
Many thanks
Why do you use a regular expression here? A glob is enough:
#!/bin/bash
while read value; do
if [[ "$value" != \$* ]]; then
echo "$value"
else
echo "Variable found: $value"
fi
done < <(printf "%s\n" "AB1" "AB2" '$EXTERNAL_REF' "AB3")
Works here with shopt -s compat32.
The regex doesn't need any quotes at all. This should work:
if [[ ! $value =~ ^\$ ]];
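I would replace the double quotes with single quotes and drop one backslash, so that
$value =~ "^\\$"
can also be written as
$value =~ '^\$'
Note that bash 3.2 and later match a quoted pattern as a literal string, so both forms rely on the older quoting behavior (for example, with shopt -s compat31).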
I never found the solution either, but for my purposes, I settled on the following workaround:
if [[ "$value" =~ ^(.)[[:alpha:]_][[:alnum:]_]+\\b && ${BASH_REMATCH[1]} == '$' ]]; then
echo "Variable found: $value"
else
echo "$value"
fi
Rather than trying to "quote" the dollar-sign, I instead match everything around it and I capture the character where the dollar-sign should be to do a direct-string comparison on. A bit of a kludge, but it works.
Alternatively, I've taken to using variables, but just for the backslash character (I don't like storing the entire regex in a variable because I find it confusing for the regex to not appear in the context where it's used):
bs="\\"
string="test\$test"
if [[ "$string" =~ $bs$ ]]; then
echo "output \"$BASH_REMATCH\""
fi
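For comparison, here is a sketch of the other variable-based approach, where the whole pattern lives in a variable. The names dollar_re and starts_with_dollar are made up for illustration. It should behave the same on bash 3.2 and 4.x, since an unquoted expansion is treated as a regex in both, though I have not tested every release:

```shell
#!/usr/bin/env bash
# Single quotes keep the backslash and the dollar sign exactly as typed.
dollar_re='^\$'

# Hypothetical helper: succeeds when the argument starts with a dollar sign.
starts_with_dollar() {
    [[ $1 =~ $dollar_re ]]
}

while IFS= read -r value; do
    if starts_with_dollar "$value"; then
        echo "Variable found: $value"
    else
        echo "$value"
    fi
done < <(printf '%s\n' 'AB1' 'AB2' '$EXTERNAL_REF' 'AB3')
```

Giving the variable a descriptive name goes some way toward the readability concern, because the intent stays visible at the point of use.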
I can run a while loop with regex successfully
$ cat while.sh
#!/bin/bash
arr=(a1c a2c a3c b4c)
i=0
while [[ ${arr[i]} =~ a(.)c ]]
do
echo ${BASH_REMATCH[1]}
((i++))
done
$ ./while.sh
1
2
3
A for loop causes this error
$ cat for.sh
#!/bin/bash
arr=(a1c a2c a3c b4c)
for ((i=0; [[ ${arr[i]} =~ a(.)c ]]; i++))
do
echo ${BASH_REMATCH[1]}
done
$ ./for.sh
./for.sh: line 3: ((: [[ a1c =~ a(.)c ]]: syntax error: operand expected (error
token is "[[ a1c =~ a(.)c ]]")
To follow up on my comment above: if you want to keep your formatting, more or less, this might do what you expected. The arithmetic for loop needs its condition to be an arithmetic expression, so the result of the [[ ... ]] test has to be turned into a number first.
for ((i=0; `[[ ${arr[i]} =~ a(.)c ]] && echo -n 1`; i++)); do
# do whatever
done
Ugly, but worked for me, and should explain your error for you. The back-ticks evaluate the expression then output either a '1' for true or nothing for false. This leaves you the valid conditional for the loop in the middle.
I'm not sure your for loop construct is legal with a regex. Double parentheses are for arithmetic expressions, and that includes the C-style for loop. Regex matching is not arithmetic. If you were really set on using for, I think you would have to do something like:
arr=(a1c a2c a3c b4c)
for val in "${arr[@]}"; do
if [[ $val =~ a(.)c ]]; then
echo ${BASH_REMATCH[1]}
fi
done
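If the index-based form is preferred, one option is to leave only the index in the arithmetic header and move the regex test into the loop body. This is a sketch, not the only way to do it. The matched array is an addition for illustration, and the break is there to mimic the original while loop, which stops at the first element that does not match:

```shell
#!/usr/bin/env bash
arr=(a1c a2c a3c b4c)
matched=()    # illustrative: collects the captured characters

# The arithmetic header handles only the index; the regex test lives in the body.
for ((i = 0; i < ${#arr[@]}; i++)); do
    [[ ${arr[i]} =~ a(.)c ]] || break    # stop at the first non-match
    matched+=("${BASH_REMATCH[1]}")
    echo "${BASH_REMATCH[1]}"
done
```

Replacing break with continue would instead skip non-matching elements and carry on, which is what the for val in loop above does.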