So I have this code
function test(){
    local output="ASD[test]"
    if [[ "$output" =~ ASD\[(.*?)\] ]]; then
        echo "success";
    else
        echo "fail"
    fi;
}
And as you can see it's supposed to echo success since the string matches that regular expression. However this ends up returning fail. What did I do wrong?
The ? in ASD\[(.*?)\] doesn't belong there. It looks like you're trying to apply a non-greedy modifier to the *, which is *? in Perl-compatible syntax, but Bash doesn't support that. (See the guide here.) In fact, if you examine $? after the test, you'll see that it's not 1 (the normal "string didn't match" result) but 2, which indicates a syntax error in the regular expression.
If you use the simpler pattern ASD\[(.*)\], then the match will succeed. However, if you use that regex on a string which might have later instances of brackets, too much will get captured by the parentheses. For example:
output=ASD[test1],ASD[test2]
[[ $output =~ ASD\[(.*)\] ]] && echo "first subscript is '${BASH_REMATCH[1]}'"
#=> first subscript is 'test1],ASD[test2'
In languages that support the *? syntax, it makes the matching "non-greedy" so that it will match the smallest string it can that makes the overall match succeed; without the ?, such expressions always match the longest possible instead. Since Bash doesn't have non-greediness, your best bet is probably to use a character class that matches everything except a close bracket, making it impossible for the match to move past the first one:
[[ $output =~ ASD\[([^]]*)\] ]] && echo "first subscript is '${BASH_REMATCH[1]}'"
#=> first subscript is 'test1'
Note that this breaks if there are any nested layers of bracket pairs within the subscript brackets - but then, so does the *? version.
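As a minimal sketch, the character-class pattern can be wrapped in a function that walks through every subscript in the string rather than only the first. The function name all_subscripts and the prefix-trimming loop are illustrative assumptions, not part of the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each ASD[...] subscript on its own line.
all_subscripts() {
    local rest=$1
    local re='ASD\[([^]]*)\]'
    while [[ $rest =~ $re ]]; do
        printf '%s\n' "${BASH_REMATCH[1]}"
        # Drop everything up to and including the match just reported.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

all_subscripts 'ASD[test1],ASD[test2]'
```

Because [^]]* cannot cross a closing bracket, each iteration captures exactly one subscript; the same nested-bracket caveat applies.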
I am trying to compare a string in bash to a regex pattern and have found something odd. For starters I am using GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu). This is within WSL.
For example here is sample program demonstrating the problem:
#!/bin/env bash
name="John"
if [[ "${name}" =~ "John"* ]]; then
    echo "found"
else
    echo "not found"
fi
exit
As expected this will echo found since the name "John" matches the regex pattern described. Now what I find odd is if I drop the n in John, it still echoes found. Imo "Joh" does not match the pattern of "John"*.
If you drop the "hn" and just set $name to "Jo" then it echoes not found. It seems to only affect the last character in the regex pattern (aside from the wildcard).
I am converting an old csh script to bash and this behavior is not happening in csh. What is causing bash to do this?
You're mixing up syntax for shell patterns and regular expressions. Your regular expression, after stripping the quoting, is John*: Joh followed by any number of n, including 0. Matches Joh, John, Johnn, Johnnn, ...
It's not anchored, so it also matches any string containing one of the matches above.
Since it's not anchored, depending on what you want, you could do any of these:
Any string containing John should match:
Regex: [[ $name =~ John ]]
Shell pattern: [[ $name == *John* ]]
Any string that begins with John should match:
Regex: [[ $name =~ ^John ]]
Shell pattern: [[ $name == John* ]]
Notice that shell patterns, unlike the regular expressions, must match the entire string.
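The two "begins with John" forms can be put side by side as a rough sketch. The helper names starts_with_john_regex and starts_with_john_glob are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: two equivalent "starts with John" checks.
starts_with_john_regex() { [[ $1 =~ ^John ]]; }   # regex, anchored at the start
starts_with_john_glob()  { [[ $1 == John* ]]; }   # shell pattern, must cover the whole string

for name in John Johnny Joh "Big John"; do
    if starts_with_john_regex "$name"; then r=yes; else r=no; fi
    if starts_with_john_glob  "$name"; then g=yes; else g=no; fi
    printf '%-10s regex=%s glob=%s\n' "$name" "$r" "$g"
done
```

Both checks should agree on every input; only the syntax differs.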
A note on quoting: within [[ ... ]], the left-hand side doesn't have to be quoted; on the right-hand side, quoted parts are interpreted literally. For regular expressions, it's a good practice to define it in a separate variable:
re='^John'
if [[ $name =~ $re ]]; then
This avoids a few edge cases with special characters in the regex.
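One such edge case is a pattern that contains a space or parentheses, which is awkward to write inline. A small sketch, assuming a made-up pattern and a hypothetical helper is_doe:

```shell
#!/usr/bin/env bash
# Hypothetical example: a pattern containing a space and parentheses.
re='^(John|Jane) Doe$'

# $re is left unquoted so that it is interpreted as a regex.
is_doe() { [[ $1 =~ $re ]]; }

is_doe "John Doe" && echo "John Doe matches"
is_doe "John  Doe" || echo "a double space does not match"
```

Single quotes in the assignment keep the shell from touching the pattern, and the unquoted $re on the right-hand side keeps it a regex.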
The =~ operator compares using regular expression syntax, not glob syntax. The * isn't a shell wildcard, it means, "the previous character, 0 or more times".
The string Joh matches the regular expression John* because it contains Joh followed by zero n characters.
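To see which part of the string John* actually consumed, BASH_REMATCH[0] can be printed. This is only a sketch; the helper name matched_part is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of $1 that John* matched, or "no match".
matched_part() {
    local re='John*'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[0]}"
    else
        echo "no match"
    fi
}

matched_part "Joh"      # Joh followed by zero n characters
matched_part "Johnnn"   # Joh followed by three n characters
matched_part "Jo"       # the h is mandatory, so this fails
```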
Inside of my $foo variable I have this data (please pay close attention to the .s and ,s):
,example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,
I am trying to write an if statement in bash that basically works like this:
if [[ "${foo}" =~ *','[a-z0-9]','* || "${foo}" =~ *','[a-z0-9]'.,'* ]]; then
    echo "Invalid input detected"
else
    echo "OK"
fi
It would echo Invalid input detected since reddit and amazon. are in $foo.
If I change the contents of $foo to be:
,example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,
Then it would echo OK.
I am using bash 3.2.57(1)-release on OS X 10.11.6 El Capitan.
Try:
if [[ $foo =~ ,[a-z0-9]*, || $foo =~ ,[a-z0-9]*\., ]]; then
    echo "Invalid input detected"
else
    echo "OK"
fi
Notes:
=~ is a regular expression operator. The right-hand-side needs to be a regular expression, not a glob.
, is not a shell-active character. Thus, it does not need any special quoting.
[a-z0-9] matches exactly one alphanumeric. Since we want to allow any number of them, use [a-z0-9]*
In regular expressions, ','* matches zero or more commas. This is not what you want. One might write ,.* which, because . is a wildcard, matches a comma followed by zero or more of anything. Since the regex is not anchored to the end, adding a final .* makes no difference.
Inside of [[...]] there is no word splitting. So shell variables do not need the double-quoting that they need elsewhere.
Note that, in [a-z0-9], the exact characters that match a-z or 0-9 depend on the collation order in the locale.
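The two tests can also be folded into one pattern with an optional trailing dot and kept in a variable. This is a sketch under that assumption; the function name has_invalid_entry is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if some comma-delimited entry consists only of
# [a-z0-9] characters, optionally followed by a single trailing dot.
has_invalid_entry() {
    local re=',[a-z0-9]*\.?,'   # both tests folded into one pattern
    [[ $1 =~ $re ]]
}

foo=',example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,'
if has_invalid_entry "$foo"; then
    echo "Invalid input detected"
else
    echo "OK"
fi
```

Keeping the pattern in a variable avoids having to escape it inline, which matters on older releases such as bash 3.2.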
I am writing a script and I want to check a variable for a format. This is the function I use :
check_non_numeric() {
    #re='^\".*\"$'
    re='\[^\]*\'
    if ! [[ $1 =~ $re ]] ; then
        echo "'$1' is not a valid format - \"[name]\" "
        exit 1
    fi
}
I want the regular expression to match a string with anything but a quotation mark inside and quotation marks around it ("a" or "string" or "dsfo!^$**#"). The problem is that the regular expressions I came up with don't work for me. I have used a very similar function to check if a variable is an integer or float and it worked there. Could you please tell me what the regular expression in question should be?
Thank you very much
I'm assuming you meant you want to match anything that is not a string surrounded by quotes. It's easier to write the regex for the quoted string itself and have the bash test negate the match with !. Here are a couple of ways to do it.
if [[ ! $(expr "$string" : '\".*\"') -gt 0 ]]; then echo "expr good"; fi
if [[ ! "$string" =~ \".*\" ]]; then echo "test good"; fi
Make sure you quote your variable you are testing with expr (which is there for edification purposes only).
As you want to match anything except string with quotation marks, you just target the quotation mark:
re='["]'
if [[ ! $1 =~ $re ]] ; then
Actually you don't need regex for this. Globbing will be enough:
if [[ ! $1 = *\"* ]]; then
    ...
fi
Your regex is very, very far off. \[ matches a literal left square bracket, and ^ (outside a character class) matches beginning of line.
Something like '^"[^"]*"$' should work, if that's really what you want. (The trailing $ is needed so that nothing may follow the closing quote.)
However, I kind of doubt that. In order to pass a value in literal double quotes, you would need something like
yourprogram '"value"'
or
yourprogram "\"value\""
which I would certainly want to avoid if I were you.
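For reference, a sketch of the check with the pattern held in a variable and anchored at both ends, so the whole argument has to be one double-quoted string. The function name is_double_quoted and the sample arguments are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is "..." with no quote inside.
is_double_quoted() {
    local re='^"[^"]*"$'
    [[ $1 =~ $re ]]
}

for arg in '"string"' '"dsfo!^$**#"' 'string' '"a"b"'; do
    if is_double_quoted "$arg"; then
        echo "valid:   $arg"
    else
        echo "invalid: $arg"
    fi
done
```

Note that the caller still has to pass the literal quotes, as in the single-quoted arguments above.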
I have some regex that is behaving oddly in my shell script. I have variables, and I have tried every which way to get them to behave, but they don't seem to do any regex matching, and I know my regex quite well thanks to regex101. Here is what a sample looks like:
fname="direcheck"
FIND="*"
if [[ $fname =~ $FIND ]]; then
    echo "no quotes"
fi
if [[ "$fname" =~ "$FIND" ]]; then
    echo "with quotes"
fi
right now it will display nothing
if i change find to
FIND="[9]*"
then it prints no quotes
if i say
FIND="[a-z]*"
then it prints no quotes
if i say
FIND="dircheck"
then nothing prints
if i say
FIND="*ck"
then nothing prints
I don't get how this regex is working
how do i use these variables, and what is the proper syntax?
* and *ck are invalid regular expressions. They would work (with no quotes) if you were comparing with ==, not =~. If you want the same behaviour that == gives you, the equivalent regexps are .* and .*ck$ (a regex is not anchored by default, so the $ is what requires the string to end in ck).
[9]* is any number (including zero) of characters that are 9. There are zero 9 characters in your direcheck, so it matches.
dircheck is not found in direcheck, so not printing anything is hardly surprising.
[a-z]* is any number of characters that are between a and z (i.e. any number of lowercase letters). This will match, assuming it's not quoted.
I finally figured it out, and why it was working so oddly
[a-z]* and [9]* and [anythinghere]* they all match because it matches zero or more times. so "direcheck" has [9] zero or more times.
so
if [[ "$fname" =~ $FIND ]]; then
or
if [[ $fname =~ $FIND ]]; then
are both correct, and
if [[ "$fname" =~ "$FIND" ]]; then
matches only when the string contains the value of $FIND verbatim, because a quoted $FIND is matched as a literal string, not as a regex
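The difference between the quoted and unquoted forms can be checked with a short sketch. The helper names match_as_regex and match_as_literal are hypothetical, and the behaviour shown assumes bash 3.2 or later.

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting the two forms.
match_as_regex()   { [[ $1 =~ $2 ]]; }     # $2 is interpreted as a regex
match_as_literal() { [[ $1 =~ "$2" ]]; }   # $2 is matched character for character

match_as_regex   direcheck '[a-z]*' && echo "regex: match"
match_as_literal direcheck '[a-z]*' || echo "literal: no match"
match_as_literal direcheck 'check'  && echo "literal substring: match"
```

The last call succeeds because a quoted right-hand side is still unanchored: it only has to appear somewhere in the string.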
On Bash 4.1 machine,
I'm trying to use "double bracket" [[ expression ]] to do REGEX comparison using "NEGATIVE LOOKAHEAD".
I did "set +H" to disable BASH variable'!' expansion to command history search.
I want to match to "any string" except "arm-trusted-firmware".
set +H
if [[ alsa =~ ^(?!arm-trusted-firmware).* ]]; then echo MATCH; else echo "NOT MATCH"; fi
I expect this to print "MATCH" back,
but it prints "NOT MATCH".
After looking into the return code of "double bracket",
it returns "2":
set +H
[[ alsa =~ ^(?!arm-trusted-firmware).* ]]
echo $?
According to bash manual,
the return value '2' means "the regular expression is syntactically incorrect":
An additional binary operator, =~, is available,
with the same precedence as == and !=.
When it is used,
the string to the right of the operator is considered
an extended regular expression and matched accordingly (as in regex(3)).
The return value is 0 if the string matches the pattern, and 1 otherwise.
If the regular expression is syntactically incorrect,
the conditional expression's return value is 2.
What did I do wrong?
In my original script,
I'm comparing against to a list of STRINGs.
When it matches, I trigger some function calls;
when it doesn't match, I skip my actions.
So, YES, from this example,
I'm comparing literally the STRING between 'alsa' and 'arm-trusted-firmware'.
Bash's =~ operator uses POSIX extended regular expressions, which don't support PCRE features such as lookahead. (source: Wiki Bash Hackers)
As a workaround, you can enable extglob. This will enable some extended globbing patterns:
$ shopt -s extglob
Check Wooledge Wiki for reading more about extglob.
Then you'll be able to use patterns like that:
?(pattern-list) Matches zero or one occurrence of the given patterns
*(pattern-list) Matches zero or more occurrences of the given patterns
+(pattern-list) Matches one or more occurrences of the given patterns
@(pattern-list) Matches one of the given patterns
!(pattern-list) Matches anything except one of the given patterns
More about extended BASH globbing at Wiki Bash Hackers and LinuxJournal.
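Applied to the question, a minimal sketch using the !(pattern-list) form with == inside [[ ]] might look like this. The function name is_other_package is an assumption; note that this is pattern matching against the whole string, not a regex.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: succeed for any name other than arm-trusted-firmware.
is_other_package() {
    [[ $1 == !(arm-trusted-firmware) ]]
}

for pkg in alsa arm-trusted-firmware; do
    if is_other_package "$pkg"; then
        echo "$pkg: MATCH"
    else
        echo "$pkg: NOT MATCH"
    fi
done
```

More names can be excluded by separating them with | inside the parentheses.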
Thanks for the answer from @Barmar
BASH doesn't support "lookaround" (lookahead and lookbehind)
bash doesn't use PCRE, and doesn't support lookarounds.
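Without lookarounds, the usual substitute is to match the thing to be excluded and negate the test. A sketch, assuming the intent of ^(?!arm-trusted-firmware).* is "does not start with arm-trusted-firmware"; the function name not_atf_prefix is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed unless $1 starts with arm-trusted-firmware.
not_atf_prefix() {
    local re='^arm-trusted-firmware'
    ! [[ $1 =~ $re ]]
}

for pkg in alsa arm-trusted-firmware; do
    if not_atf_prefix "$pkg"; then
        echo "$pkg: MATCH"
    else
        echo "$pkg: NOT MATCH"
    fi
done
```

Appending $ to the pattern would restrict the exclusion to the exact string instead of the prefix.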
Respectfully, aren't you over-complicating things?
if [ "$alsa" = arm-trusted-firmware ]
then
    echo 'MATCH'
else
    echo 'NOT MATCH'
fi
If you have a good reason for wanting to use the Bashism [[, it would serve
you better to provide an example that justifies it.