How to write conditional code depending on Bash version / features? - regex

I'm using the =~ operator in one of my Bash scripts. However, I need to make the script compatible with Bash versions that do not support that operator, in particular the version of Bash that ships with msysgit, which is not built against libregex. I have a work-around that uses expr match instead, but that again does not work on Mac OS X for some reason.
Thus, I want to use the expr match only if [ -n "$MSYSTEM" -a ${BASH_VERSINFO[0]} -lt 4 ] and use =~ otherwise. The problem is that Bash always seems to parse the whole script, and even if msysgit Bash would execute the work-around at runtime, it still stumbles upon the =~ when parsing the script.
Is it possible to conditionally execute code in the same script depending on the Bash version, or should I look into another way to address this issue?

In your case, you can replace the regular expression with an equivalent pattern match.
[[ $foo = \[+([0-9])\][[:space:]]* ]]
Some explanations:
Patterns are matched against the entire string. The following regexes and patterns are equivalent:
^foo$ and foo
^foo and foo*
foo$ and *foo
foo and *foo*
+(...) matches one or more occurrences of the enclosed pattern, which in this case is [0-9]. That is, if $pattern and $regex match the same string, then so do +($pattern) and ($regex)+.
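As a rough illustration of the points above, the following sketch runs the pattern against a few sample strings. The helper name starts_with_bracketed_number and the sample inputs are made up for this example, and shopt -s extglob is included on the assumption that some Bash versions need it before +(...) is accepted.

```shell
#!/usr/bin/env bash
shopt -s extglob   # assumption: needed on some Bash versions for +(...)

# Hypothetical helper: succeeds if $1 is "[digits]" followed by one
# whitespace character and then anything at all.
starts_with_bracketed_number() {
    [[ $1 = \[+([0-9])\][[:space:]]* ]]
}

for s in '[123] rest' '[12]' '[] x' 'x [1] y'; do
    if starts_with_bracketed_number "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Only the first sample should match: the second has no whitespace after the bracket, the third has no digits, and the fourth does not start with a bracket, because a pattern is matched against the entire string.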

My current solution is to use grep -q on all platforms instead. This avoids any conditionals or complicated code constructs.
Probably using eval to parse the code containing =~ only at runtime would have worked, too, but then again that would have made the code more complicated to read.
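For comparison, here is a minimal sketch of what the eval idea could look like when combined with a grep fallback. The function name regex_match is hypothetical, and the condition is simply the one from the question; this is an untested assumption about how such a Bash build behaves, not a verified fix for msysgit.

```shell
#!/usr/bin/env bash

# Hypothetical helper: regex_match STRING ERE
regex_match() {
    if [ -n "${MSYSTEM:-}" ] && [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
        # Fallback for a Bash assumed to lack =~ support.
        printf '%s\n' "$1" | grep -Eq -- "$2"
    else
        # The single quotes keep =~ away from the parser until eval runs.
        eval '[[ $1 =~ $2 ]]'
    fi
}

if regex_match 'abc123' '^[a-z]+[0-9]+$'; then
    echo "matched"
fi
```

Note that grep works line by line, so the two branches can disagree on strings that contain newlines.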

For this particular pattern, an equivalent but portable case statement can be written. It needs a fairly substantial number of glob patterns to enumerate all the corner cases, though.
case $foo in
[![]* | \[[!0-9]* | *[!][0-9[:space:]]* | *[!0-9[:space:]] | \
*\]*[![:space:]] | *[!0-9]\]* | \[*\[* | *\]*\]* )
return 1;; # false
\[*[0-9]*\]* )
return 0;; # true
*)
return 1;; # false
esac

Related

Bash: shell script if statement using multiple conditions including regex

I am currently studying shell scripting and am having a syntax issue.
What I am trying to do is make the 'if' statement catch any user input containing letters, except the "giveup" line.
Here is the code that I built:
if [ $usrGuess =~ *[:alpha:]* && $usrGuess != "giveup" ]
Once I run the code, it gives this error message:
[: missing `]'
If you guys have any solution to this, I will be happy to hear your advice :)
Thanks!
The test ([) builtin of any shell (and the external command of the same name) does not support conditional operators such as && and ||, or command separators such as ;, inside the brackets.
Also, [ does not support regex matching with =~. By the way, your regex is not correct; it looks more like a glob pattern (and a glob suffices in this case).
Both are supported by the [[ keyword of bash, but not all shells support [[.
So, you can do:
if [[ $usrGuess = *[[:alpha:]]* && $usrGuess != "giveup" ]]
Here, I have switched to [[ and used the glob match $usrGuess = *[[:alpha:]]* (dropping the regex matching).
Use double brackets, as your condition is composite:
if [[ $usrGuess =~ [[:alpha:]] && $usrGuess != "giveup" ]]
A slightly different approach using grep command would also work.
if grep -v '^giveup$' <<<"$usrGuess" | grep -iq '^[a-z]*$'
In this example, we use the exit code of the grep command to make an if-else decision. Also note the '-q' option to the second grep command. This ensures that grep matches the pattern silently.
Pros: Less complicated if() clause.
Con: There are two grep processes executed.
If you did want to retain POSIX compatibility, use the expr command to perform the regular expression match.
if expr "$usrGuess" : '.*[[:alpha:]]' > /dev/null && [ "$usrGuess" != "giveup" ]
Either way, I'd opt to check against "giveup" first; if that check fails, you avoid the more expensive regular-expression check altogether.
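One way to sketch that ordering without [[ or an external process is a POSIX case statement, which tries its branches from top to bottom. The function name classify_guess and the labels it prints are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: prints "giveup", "letters", or "other" for a guess.
classify_guess() {
    case $1 in
        giveup)        echo "giveup" ;;   # cheap literal check comes first
        *[[:alpha:]]*) echo "letters" ;;  # contains at least one letter
        *)             echo "other" ;;
    esac
}

classify_guess "giveup"
classify_guess "abc"
classify_guess "12345"
```

This relies only on glob patterns, so it sidesteps the regex question entirely.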

bash regular expression different formats

I have used regular expression in my code like this: .*[^0-9].*
But recently I have seen some functions implemented like this: *[!0-9]* for the same purpose as the first example, that is, detecting non-integer values.
So I am confused about which is the true form of a regex and what the difference between them is.
Can anybody help me with this issue?
There is only one regular expression - the first one. The second one is a glob pattern.
See regex(7) for the description of POSIX extended regular expressions supported by Bash:
http://man7.org/linux/man-pages/man7/regex.7.html
See Bash manual for the description of glob patterns: http://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
Bash uses regular expressions only in the [[…]] command (with the =~ operator): http://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html
Bash uses glob patterns for everything else.
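To make the distinction concrete, here is a small sketch that applies both forms from the question inside [[ ]]. The function names and sample values are invented for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helpers; both succeed when $1 contains a non-digit character.
has_non_digit_regex() {
    local re='.*[^0-9].*'   # regular expression, kept in a variable
    [[ $1 =~ $re ]]
}

has_non_digit_glob() {
    [[ $1 == *[!0-9]* ]]    # glob pattern
}

for s in 123 12a3 3.14; do
    has_non_digit_regex "$s" && r=yes || r=no
    has_non_digit_glob  "$s" && g=yes || g=no
    echo "$s regex=$r glob=$g"
done
```

Both tests should agree on every input: "no" for 123 and "yes" for the other two samples.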
POSIX defines:
1) two types of regular expressions: BREs and EREs. These are used by utilities / built-ins.
BREs are more restricted and exist for backwards compatibility and for typing less in an interactive session. Avoid them if possible and use EREs instead, which are more flexible and Perl-like.
Some utilities allow you to choose between both types of regular expressions.
For example, grep matches BREs by default (backwards compatibility...), but you can make it match EREs with -E.
You usually must quote those before passing them to utilities, or the shell will try to filename-expand them.
.*[^0-9].* could be either a BRE or an ERE. In both cases it means the same as the Perl regex, and it matches the same strings as the glob *[!0-9]*.
The main difference between BRE and ERE is that EREs add more useful Perl-like special characters such as (a|b), a{m,n}, a+, a?. Examples:
echo a | grep '(a|b)'
# output:
echo a | grep -E '(a|b)'
# output: a
echo a | grep 'a{1,2}'
# output:
echo a | grep -E 'a{1,2}'
# output: a
2) Patterns used for filename expansion, also known as globs (used by the POSIX glob C function). These are usually expanded by the shell before going to the utilities and expand to the matching filenames. If you quote them, they are not expanded anymore.
*[!0-9]* must be a glob, since BREs and EREs use ^ instead of ! for negation inside brackets.
echo *[!0-9]*
# output: filenames that contain at least one non-digit character
echo '*[!0-9]*'
# output: *[!0-9]*

GREP: variable in regular expression

If I want to check whether a string is alphanumeric and no longer than a certain length, say 10, I would do it like this (in Bash + grep):
if grep '^[0-9a-zA-Z]\{1,10\}$' <<<$1 ; then ...
(BTW: I'm checking for $1, i.e. the first argument)
What if I want the value 10 to be stored in a variable, e.g.
UUID_LEN=10
if grep '^[0-9a-zA-Z]\{1,$UUID_LEN\}$' <<<$1 ; then ...
I tried all sorts of escapes, braces and so on, but could not avoid the error message
grep: Invalid content of \{\}
After googling and reading bash and grep tutorials I'm pretty convinced it can't be done. Am I wrong? Any way to go around this?
You need to use double quotes so that the shell expands the parameter before passing the resulting argument to grep:
if grep "^[0-9a-zA-Z]\{1,$UUID_LEN\}$" <<<$1 ; then ...
bash can perform regular expression matching itself, without having to start another process to run grep:
if [[ $1 =~ ^[0-9a-zA-Z]{1,$UUID_LEN}$ ]]; then
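A self-contained sketch of the Bash-only variant might look like this. The function name is_short_alnum is hypothetical, and the regex is built in a variable first, which is a common way to avoid quoting surprises with =~.

```shell
#!/usr/bin/env bash

UUID_LEN=10

# Hypothetical helper: succeeds if $1 is 1..UUID_LEN alphanumeric characters.
is_short_alnum() {
    # Double quotes let the shell expand UUID_LEN while building the regex;
    # \$ keeps the end-of-string anchor literal.
    local re="^[0-9a-zA-Z]{1,${UUID_LEN}}\$"
    [[ $1 =~ $re ]]
}

is_short_alnum "abc123"      && echo "abc123: ok"
is_short_alnum "abcdefghijk" || echo "abcdefghijk: rejected (11 characters)"
is_short_alnum "ab-c"        || echo "ab-c: rejected (not alphanumeric)"
```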

Bash string replacement with regex repetition

I have a file: filename_20130214_suffix.csv
I'd like to replace the yyyymmdd part in bash. Here is what I intend to do:
file=`ls -t /path/filename_* | head -1`
file2=${file/20130214/20130215}
#this will not work
#file2=${file/[0-9]{8}/20130215/}
The problem is that parameter expansion does not use regular expressions, but patterns or globs (compare the regular expression "filename_.*.csv" with the glob "filename_*.csv"). Globs have no repetition operator for matching a fixed number of occurrences of a sub-pattern.
However, you can enable extended patterns in bash, which should be close enough to what you want.
shopt -s extglob # Turn on extended pattern support
file2=${file/+([0-9])/20130215}
You can't match exactly 8 digits this way, but +(...) lets you match one or more occurrences of the pattern inside the parentheses, which should be sufficient for your use case.
Since all you want to do in this case is replace everything between the _ characters, you could also simply use
file2=${file/_*_/_20130215_}
[[ $file =~ ^([^_]+_)[0-9]{8}(_.*) ]] && file2="${BASH_REMATCH[1]}20130215${BASH_REMATCH[2]}"
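The regex approach can be tried out with the sketch below, which uses the file name from the question directly instead of the ls call. The second variant is an additional idea rather than something from the answers: writing [0-9] eight times in a plain glob stands in for the missing {8}. The variable names new_date, file2 and file3 are only for illustration.

```shell
#!/usr/bin/env bash

file=/path/filename_20130214_suffix.csv
new_date=20130215   # hypothetical replacement value

# Variant 1: regex with capture groups; BASH_REMATCH holds the pieces.
re='^([^_]+_)[0-9]{8}(_.*)'
if [[ $file =~ $re ]]; then
    file2="${BASH_REMATCH[1]}${new_date}${BASH_REMATCH[2]}"
fi

# Variant 2: plain glob; eight bracket expressions stand in for {8}.
file3=${file/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/$new_date}

echo "$file2"
echo "$file3"
```

Both variants should print /path/filename_20130215_suffix.csv for this input. Variant 2 replaces the first run of eight digits wherever it occurs, so it is less strict than the anchored regex.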

Getting the index of the substring on solaris

How can I find the index of a substring which matches a regular expression on solaris10?
Assuming that what you want is to find the location of the first match of a wildcard in a string using bash, the following bash function returns just that, or empty if the wildcard doesn't match:
function match_index()
{
local pattern=$1
local string=$2
local result=${string/${pattern}*/}
[ ${#result} = ${#string} ] || echo ${#result}
}
For example:
$ echo $(match_index "a[0-9][0-9]" "This is a a123 test")
10
If you want to allow full-blown regular expressions instead of just wildcards, replace the "local result=" line with
local result=$(echo "$string" | sed 's/'"$pattern"'.*$//')
but then you're exposed to the usual shell quoting issues.
The go-to options for me are bash, awk and perl. I'm not sure what you're trying to do, but any of the three would likely work well. For example:
f=somestring
string=$(expr match "$f" '.*\(expression\).*')
echo $string
You tagged the question as bash, so I'm going to assume you're asking how to do this in a bash script. Unfortunately, the built-in regular expression matching doesn't save string indices. However, if you're asking this in order to extract the match substring, you're in luck:
# The regex must be unquoted; a quoted right-hand side is matched literally.
if [[ $var =~ $regex ]]; then
n=${#BASH_REMATCH[@]}
i=0
while [[ $i -lt $n ]]
do
echo "capture[$i]: ${BASH_REMATCH[$i]}"
i=$((i + 1))
done
fi
This snippet will output in turn all of the submatches. The first one (index 0) will be the entire match.
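Although =~ does not report an offset, one can often be derived from the matched text. The sketch below is only an approximation under a stated assumption: the regex has no anchors or other context-dependent constructs, so the first literal occurrence of the matched text is where the match begins. The function name regex_index is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: regex_index STRING ERE
# Prints the 0-based offset of the match, or -1 if there is none.
# Assumes an unanchored regex (see the note above).
regex_index() {
    local string=$1 re=$2 prefix
    if [[ $string =~ $re ]]; then
        # Drop everything from the first occurrence of the matched text onward.
        prefix=${string%%"${BASH_REMATCH[0]}"*}
        echo "${#prefix}"
    else
        echo -1
    fi
}

regex_index "This is a a123 test" 'a[0-9][0-9]'
regex_index "no digits here" '[0-9]+'
```

For these two calls the output should be 10 and -1.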
You might like your awk options better, though. There's a function match which gives you the index you want; it is described in the awk documentation. It'll also store the length of the match in RLENGTH, if you need that. To implement this in a bash script, you could do something like:
match_index=$(echo "$var_to_search" | \
awk '{
where = match($0, '"$regex_to_find"')
if (where)
print where
else
print -1
}')
There are a lot of ways to deal with passing the variables in to awk. This combination of piping output and directly embedding one into the awk one-liner is fairly common. You can also give awk variable values with the -v option (see man awk).
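As a sketch of the -v route, the regex can be handed to awk as a variable, where match treats it as a dynamic regex. The function name awk_match_index is made up, and the helper only looks at the first line of input.

```shell
#!/usr/bin/env bash

# Hypothetical helper: awk_match_index STRING ERE
# Prints the 1-based position of the first match on the first line, or -1.
awk_match_index() {
    printf '%s\n' "$1" | awk -v re="$2" '{
        where = match($0, re)      # dynamic regex taken from the variable
        print (where ? where : -1)
        exit                       # only look at the first line
    }'
}

awk_match_index "This is a a123 test" 'a[0-9][0-9]'
awk_match_index "no digits here" '[0-9]+'
```

These calls should print 11 and -1; awk counts positions from 1. Be aware that awk processes backslash escapes in values passed with -v, so regexes containing backslashes need extra care.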
Obviously you can modify this to get the length, the match string, whatever it is you need. You can capture multiple things into an array variable if necessary:
match_data=($( ... awk '{ ... print where,RLENGTH,match_string ... }'))
If you use bash 4.x, you can source oobash, a string library written in bash in an OO style:
http://sourceforge.net/projects/oobash/
String is the constructor function:
String a abcda
a.indexOf a
0
a.lastIndexOf a
4
a.indexOf da
3
There are many more "methods" for working with strings in your scripts:
-base64Decode -base64Encode -capitalize -center
-charAt -concat -contains -count
-endsWith -equals -equalsIgnoreCase -reverse
-hashCode -indexOf -isAlnum -isAlpha
-isAscii -isDigit -isEmpty -isHexDigit
-isLowerCase -isSpace -isPrintable -isUpperCase
-isVisible -lastIndexOf -length -matches
-replaceAll -replaceFirst -startsWith -substring
-swapCase -toLowerCase -toString -toUpperCase
-trim -zfill