Regular Expression in Bash Script - regex

Hello awesome community,
I'm a complete dope when it comes to regex. I've put off learning it.. and now my laziness has caught up with me.
What I'm trying to do:
Check if a string matches this format:
10_06_13
ie. Todays date, or a similar date with "2digits_2digits_2digits"
What I've done:
regex='([0-9][0-9][_][0-9][0-9][_][0-9][0-9])'
if [[ "$incoming_string" =~ $regex ]]
then
# Do awesome stuff here
fi
This works to a certain extent. But when the incoming string equals 011_100_131 ... it still passes the regex check.
I'd be grateful if anyone could help point me in the right direction.
Cheers

=~ succeeds if the string on the left contains a match for the regex on the right. If you want to know if the string matches the regex, you need to "anchor" the regex on both sides, like this:
regex='^[0-9][0-9][_][0-9][0-9][_][0-9][0-9]$'
if [[ $incoming_string =~ $regex ]]
then
# Do awesome stuff here
fi
The ^ only succeeds at the beginning of the string, and the $ only succeeds at the end.
Notes:
I removed the unnecessary () from the regex and "" from the [[ ... ]].
The bash manual is poorly worded, since it says that =~ succeeds if the string matches.

Related

Why does bash "=~" operator ignore the last part of the pattern specified?

I am trying to do compare a string in bash to a regex pattern and have found something odd. For starters I am using GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu). This is within WSL.
For example here is sample program demonstrating the problem:
#!/bin/env bash
name="John"
if [[ "${name}" =~ "John"* ]]; then
echo "found"
else
echo "not found"
fi
exit
As expected this will echo found since the name "John" matches the regex pattern described. Now what I find odd is if I drop the n in John, it still echos found. Imo "Joh" does match the pattern of "John"*.
If you drop the "hn" and just set $name to "Jo" then it echos not found. It seems to only affect the last character in the Regex pattern (aside from the wildcard).
I am converting an old csh script to bash and this behavior is not happening in csh. What is causing bash to do this?
You're mixing up syntax for shell patterns and regular expressions. Your regular expression, after stripping the quoting, is John*: Joh followed by any number of n, including 0. Matches Joh, John, Johnn, Johnnn, ...
It's not anchored, so it also matches any string containing one of the matches above.
Since it's not anchored, depending on what you want, you could do any of these:
Any string containing John should match:
Regex: [[ $name =~ John ]]
Shell pattern: [[ $name == *John* ]]
Any string that begins with John should match:
Regex: [[ $name =~ ^John ]]
Shell pattern: [[ $name == John* ]]
Notice that shell patterns, unlike the regular expressions, must match the entire string.
A note on quoting: within [[ ... ]], the left-hand side doesn't have to be quoted; on the right-hand side, quoted parts are interpreted literally. For regular expressions, it's a good practice to define it in a separate variable:
re='^John'
if [[ $name =~ $re ]]; then
This avoids a few edge cases with special characters in the regex.
The =~ operator compares using regular expression syntax, not glob syntax. The * isn't a shell wildcard, it means, "the previous character, 0 or more times".
The string Joh matches the regular expression John* because it contains Joh followed by zero n characters.

Bash Regex to extract everything between the last occurrence of a string (release-) and some characters (--)

I have multiple strings, where I want to extract everything between the last occurrence of a string (release-) and some characters (--). More specifically, for a sting like the following:
inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE
I want to have the following output:
PI_4.1-Sprint-3.1a
I created a regex online, which you can find here. There regex is the following:
.*release-(.*)--.*
However, when I am trying to use this script into a bash script, it wont work. Here is an example.
artifactoryVersion="inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE"
[[ "$artifactoryVersion" =~ (.*release-(.*)--.*) ]]
echo $BASH_REMATCH[0]
echo $BASH_REMATCH[1]
Will return:
inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE[0]
inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE[1]
Do you have any ideas about how can I accomplish my goal in bash?
You may use:
s='inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE'
rx='.*-release-(.*)--'
[[ $s =~ $rx ]] && echo "${BASH_REMATCH[1]}"
PI_4.1-Sprint-3.1a
Code Demo
Your regex appears correct but make sure to use "${BASH_REMATCH[1]}" to extract first capture group in the result.
You need to use the following:
#!/bin/bash
artifactoryVersion="inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE"
if [[ "$artifactoryVersion" =~ .*release-(.*)-- ]]; then
echo ${BASH_REMATCH[1]};
fi
See the online demo
Output:
PI_4.1-Sprint-3.1a
With your shown samples please try following BASH code with regex. I have also mentioned comments before executing each statement to understand each statement here.
##Shell variable named var being created here.
var="inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE"
##Mentioning regex which needs to be checked on later in program.
regex="(.*release-release)-(.*)--"
##Check condition on var variable with regex if match found then print 2nd capturing group value.
[[ $var =~ $regex ]] && echo "${BASH_REMATCH[2]}"
Explanation of regex: Following is the detailed explanation for used regex.
regex="(.*release-release)-(.*)--": Creating shell variable named regex in which putting regular expression (.*release-release)-(.*)--.
Where regex is creating 2 capturing groups.
First matching everything till release-release(with greedy match), which is followed by a -(not captured anywhere).
Which is followed by a greedy match, which will basically match everything before -- to get the exactly needed value.
You can also do it with shell parameter expansions (it's slower than a bash regex but it's standard):
artifactoryVersion='inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE'
result=${artifactoryVersion##*-release-}
result=${result%%--*}
printf %s\\n "$result"
PI_4.1-Sprint-3.1a
Or directly with a bash parameter expansion and extended globing:
#!/bin/bash
shopt -s extglob
artifactoryVersion='inte_integration-abc-abcde-abcdefg-release-release-PI_4.1-Sprint-3.1a--1.0.2-RELEASE'
echo "${artifactoryVersion//#(*-release-|--*)}"
PI_4.1-Sprint-3.1a

Bash regular expression with quotes

I am writing a script and I want to check a variable for a format. This is the function I use :
check_non_numeric() {
#re='^\".*\"$'
re='\[^\]*\'
if ! [[ $1 =~ $re ]] ; then
echo "'$1' is not a valid format - \"[name]\" "
exit 1
fi
}
I want the regular expression to match a string with anything but quotation mark inside and quotation marks around it ("a" or "string" or "dsfo!^$**#"). The problem is that these regular expressions that I came up with dont work for me. I have used a very similar function to check if a variable is an integer or float and it worked there. Could you please tell me what the regular expression in question should be ?
Thank you very much
I'm assuming you meant you want to match anything that is not a string surrounded by quotes. It's easier to match use your regex to match, and the bash-test to "not" match it-- if that's not clear, use !. Here's a couple of ways to do it.
if [[ ! $(expr "$string" : '\".*\"') -gt 0 ]]; then echo "expr good"; fi
if [[ ! "$string" =~ \".*\" ]]; then echo "test good"; fi
Make sure you quote your variable you are testing with expr (which is there for edification purposes only).
As you want to match anything except string with quotation marks, you just target the quotation mark:
re='["]'
if [[ ! $1 =~ $re ]] ; then
Actually you don't need regex for this. Globbing will be enough:
if [[ ! $1 = *\"* ]]; then
...
fi
Your regex is very, very far off. \[ matches a literal left square bracket, and ^ (outside a character class) matches beginning of line.
Something like '^"[^"]*"' should work, if that's really what you want.
However, I kind of doubt that. In order to pass a value in literal double quotes, you would need something like
yourprogram '"value"'
or
yourprogram "\"value\""
which I would certainly want to avoid if I were you.

Using Bash regex match to test membership

Below, I use ALLOWED as container to test a token.
I am using a Bash regex match syntax =~ where the right hand side should be an extended regular expression.
In Bash's Regular Expression Matching. Using the operator =~, the left hand side operand is matched against the extended regular expression (ERE) on the right hand side. Check a related question on using date regex.
But I can't see str1 as a regex and I don't know why ALLOWED matches a string which is present inside it. Even as this works in this case, having regex (str1) as the test string leaves it open for tricky bugs in future.
export ALLOWED="str0 str1 strn"
export STR1="str1"
export STR2="str2"
if [[ $ALLOWED =~ ${STR1} ]]; then
echo "how does it this work?"
fi
if [[ $ALLOWED =~ ${STR2} ]]; then
:
else
echo "does not work."
fi
Questions:
Why/ How does this work?
What's a better to do test for an element in a list in bash?
The syntax is content =~ regex, for example think about how this simple phone number validation works
$ phone="555-443-2321"; if [[ $phone =~ [0-9-]+ ]]; then echo PASS; fi
as in your example, the right hand side is the regular expression and left hand side is the content.
Your regex can be a string literal, then the check will be whether content contains that substring
$ phone="555-443-2321"; if [[ $phone =~ "555" ]]; then echo PASS; fi
if it makes it easier for you think that as a regex for .*555.*
If I understand right, the confusion is because $a =~ $b checks whether there's a match for $b in $a, not whether $a as a whole matches. [[ "str0 str1 strn" =~ str1 ]] succeeds because there's a match for the (trivial) regex str1 somewhere in "str0 str1 strn".
If you want to check for a match to the entire string, you need to anchor the regex with a ^ at the beginning, and $ at the end: [[ $ALLOWED =~ ^${STR1}$ ]]

Shell regex to end of line

I have a file like this little example:
# ...
# mode=dev
# ...
Somewhere in this file there is a "variable" within a comment. And i would like to get the value with regex in a Shell script.
My code so far:
#!/bin/bash
conf=$(<"/etc/test.conf") # Get the file content
regex='mode=(.*)$' # Set a regex
if [[ $conf =~ $regex ]]; then # Search for the regex in the file
# We found it, so ...
echo "${BASH_REMATCH[1]}" # ... here is the value
fi
My big problem is, that it will not find the value :(
I tried a lot of different regex expressions and tested them with https://regex101.com/ , but it seems, that the Shell regex interprator is different from pcre and python.
My best solution was to find the mode= and everything after it. So is there a way to get only the value? The start is easy ... find mode=. But how do I say the shell regex to get everything behind mode= until the next linebreak? and not beyond this linebreak?
Something with \n (unix linebreak) and $ (end of string) did not work for me :(
Thanks for the help,
greetings
You can use this regex to get your match:
conf=$(<"/etc/test.conf")
regex=$'mode=([^\n]*)'
[[ $conf =~ $regex ]] && echo "${BASH_REMATCH[1]}"
Output:
dev
Regex $'mode=([^\n]*)' will match literal text mode= followed by 0 or more of any character that is not \n.