Bash: shell script if statement using multiple conditions including regex - regex

I am currently studying the shell script and having some syntax issue.
what I am tyring is to make the 'if' statement to catch any user-input with alphabet, except the "giveup" line
here is the code that I built:
if [ $usrGuess =~ *[:alpha:]* && $usrGuess != "giveup" ]
once I run the code, it gives out the error message saying that:
[: missing `]'
If you guys have any solution to this, I will be happy to hear your advice :)
Thanks!

test ([) builtin of any shell (or the external one) does not support putting conditional construct e.g. &&, || or multiple command separator e.g. ; inside it.
Also, [ does not support Regex matching with =~. BTW your Regex pattern is not correct, it seems more like a glob pattern (and that should suffice in this case).
Both of the above are supported by the [[ keyword of bash and not all shells support these.
So, you can do:
if [[ $usrGuess = *[[:alpha:]]* && $usrGuess != "giveup" ]]
Here, I have moved for [[ and used the Glob match $usrGuess = *[:alpha:]* (dropped Regex matching).

Use double brackets, as your condition is composite:
if [[ $usrGuess =~ *[:alpha:]* && $usrGuess != "giveup" ]]

A slightly different approach using grep command would also work.
if grep -v '^giveup$' <<<$userGuess | grep -iq '^[a-z]*$'
In this example, we use exit code of grep command to make a if-else decision. Also note the '-q' option to second grep command. This ensures that the grep command matches the pattern silently.
Pros: Less complicated if() clause.
Con: There are two grep processes executed.

If you did want to retain POSIX compatibility, use the expr command to perform the regular expression match.
if expr "$usrGuess" : '[[:alpha:]]*' > /dev/null && [ "$usrGuess" != "giveup" ]
Either way, I'd opt to check against "giveup" first; if that check fails, you avoid the more expensive regular-expression check altogether.

Related

Regex to match custom key pair not working in linux [duplicate]

I'm trying to do a tiny bash script that'll clean up the file and folder names of downloaded episodes of some tv shows I like. They often look like "[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE", and I basically just want to strip out that speedcd advertising bit.
It's easy enough to remove www.Speed.Cd, spaces, and dashes using regexp matching in BASH, but for the life of me, I cannot figure out how to include the brackets in a list of characters to be matched against. [- [] doesn't work, neither does [- \[], [- \\[], [- \\\[], or any number of escape characters preceding the bracket I want to remove.
Here's what I've got so far:
[[ "$newfile" =~ ^(.*)([- \[]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[- \]]*)(.*)$ ]] &&
newfile="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"
But it breaks on the brackets.
Any ideas?
TIA,
Daniel :)
EDIT: I should probably note that I'm using "shopt -s nocasematch" to ensure case insensitive matching, just in case you're wondering :)
EDIT 2: Thanks to all who contributed. I'm not 100% sure which answer was to be the "correct" one, as I had several problems with my statement. Actually, the most accurate answer was just a comment to my question posted by jw013, but I didn't get it at the time because I hadn't understood yet that spaces should be escaped. I've opted for aefxx's as that one basically says the same, but with explanations :) Would've liked to put a correct answer mark on ormaaj's answer, too, as he spotted more grave issues with my expression.
Anyway, the approach I was using above, trying to match and extract the parts to keep and leave behind the unwanted ones is really not very elegant, and won't catch all cases, not even something really simple like "Some.Show.S07E14.720p.HDTV.X264-SOMEONE - [ www.Speed.Cd ]". I've instead rewritten it to match and extract just the unwanted parts and then do string replacement of those on the original string, like so (loop is in case there's multiple brandings):
# Remove common torrent site brandings, including surrounding spaces, brackets, etc.:
while [[ "$newfile" =~ ([[\ {\(-]*(www\.)?(torrentday\.com|torrenting\.com|spastikustv|speed\.cd|moviesp2p\.com|publichd\.org|publichd|scenetime\.com|kingdom-release)[]\ }\)-]*) ]]; do
newfile=${newfile//"${BASH_REMATCH[1]}"/}
done
Ok, this is the first time I've heard of the =~ operator but nevertheless here's what I found by trial and error:
if [[ $newfile =~ ^(.*)([-[:space:][]*(what|ever)[][:space:]-]*)(.*)$ ]]
^^^^^^^^^^ ^^^^^^^^^^
Looks strange but actually does work (just tested it).
EDIT
Quote from the Linux man pages regex(7):
To include a literal ] in the list, make it the first character (following a possible ^). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal aq-aq as the first endpoint of a range, enclose it in "[." and ".]" to make it a collating element (see below). With the exception of these and some combinations using aq[aq (see next paragraphs), all other special characters, including aq\aq, lose their special significance within a bracket expression.
Whenever you're doing a regex it's most compatible between Bash versions to put regexes in a variable even if you do manage to dodge all the pitfalls of putting them directly in a test expression. http://mywiki.wooledge.org/BashPitfalls#if_.5B.5B_.24foo_.3D.2BAH4_.27some_RE.27_.5D.5D
Your current regex looks like you're trying to optionally match anything preceding the opening bracket. I'd guess you're actually trying to save for example 3 and 4 from something like this:
$ shopt -s nocasematch
$ newfile='[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE'
$ re='^.*[-[:space:][]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[][:space:]-]*(.*)$'
$ [[ $newfile =~ $re ]]
$ declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE" [1]="www.Speed.Cd" [2]="Some.Show.S07E14.720p.HDTV.X264-SOMEONE")'
The basic issue is quite simple, if not obvious.
A BASH REGEX is totally unprotected (from the shell), and cannot be protected by "​double quotes​". This means that every literal space (and tab,etc) must be protected by a baskslash \ ... end of story. The rest is just a case of getting you regex to suit your needs.
One other thing; use [\ [] and []\ ] to match [ and ] respectively, within the range square-bracket construct (in this case along with a space).
example:
newfile="[ ]"
[[ "$newfile" =~ ^[\ []\ []\ ]$ ]] &&
echo YES ||
echo NO
You can try something like this (though you weren't 100% clear on what cases you are trying to filter:
newfile="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE"
if [[ $newfile =~ ^(.*)([^a-zA-Z0-9.]*\[.*\][^a-zA-Z0-9.]*)(.*)$ ]]; then
newfile="${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
fi
echo $newfile
# Some.Show.S07E14.720p.HDTV.X264-SOMEONE
Its just stripping any non-alnum (and dot) characters outside the [], and anything within []

How to write conditional code depending on Bash version / features?

I'm using the =~ in one of my Bash scripts. However, I need to make the script compatible with Bash versions that do not support that operator, in particular the version of Bash that ships with msysgit which is not build against libregex. I have a work-around that uses expr match instead, but that again does not work on Mac OS X for some reason.
Thus, I want to use the expr match only if [ -n "$MSYSTEM" -a ${BASH_VERSINFO[0]} -lt 4 ] and use =~ otherwise. The problem is that Bash always seems to parse the whole script, and even if msysgit Bash would execute the work-around at runtime, it still stumbles upon the =~ when parsing the script.
Is it possible to conditionally execute code in the same script depending on the Bash version, or should I look into another way to address this issue?
In your case, you can replace the regular expression with an equivalent pattern match.
[[ $foo = \[+([0-9])\][[:space:]]* ]]
Some explanations:
Patterns are matched against the entire string. The following regexes and patterns are equivalent:
^foo$ and foo
^foo and foo*
foo$ and *foo
foo and *foo*
+(...) matches one or more occurrences of the enclosed pattern, which in this case is [0-9]. That is, if $pattern and $regex match the same string, then so do +($pattern) and ($regex)+.
My current solution is to use grep -q on all platforms instead. This avoids any conditionals or complicated code constructs.
Probably using eval to parse the code containing =~ only at runtime would have worked, too, but then again that would have made the code more complicated to read.
For this particular pattern, an equivalent but portable case statement can be articulated. It needs to have a fairly substantial number of different glob patterns to enumerate all the corner cases, though.
case $foo in
[![]* | \[[!0-9]* | *[!][0-9[:space:]]* | *[!0-9[:space:]] | \
*\]*[![:space:]] | *[!0-9]\]* | \[*\[* | *\]*\]* )
return 1;; # false
\[*[0-9]*\]* )
return 0;; # true
*)
return 1;; # false
esac

what is the Regular Expressions to use to find if a line contain a word?

I would like to check in my script if in a line i have the word device i tried this regex
if[$SERIAL =~ /device/] but the execution result commande unkown. this is my script i try to install apk in only devices that present the stat device so not ofline, you find my script below
for SERIAL in $(adb devices | tail -n +2 | cut -sf 1);
do
if [$SERIAL =~ /device/]
#if [$SERIAL = "/^.*/device\b.*$/m"]
then
cd $1
for APKLIST in $(ls *.apk);
do
echo "Installation de $APKLIST on $SERIAL"
adb -s $SERIAL install -r $1/$APKLIST &
#adb bugreport > bug.txt
done
fi
done
try to change your if line into:
if [[ $SERIAL =~ "device" ]]
After the | cut -sf 1 your $SERIAL variable is going to contain just the serial number. So why are you trying to match it against device? It is never going to be true. Why won't you just use
for SERIAL in $(adb devices | grep device$ | cut -sf 1);
instead of your if?
You must always put a space between the square brackets in your if statement and what you're testing. You must also understand there are two types of square brackets, the [ ... ] standard and the expanded [[ ... ]].
The [ is actually a Unix command which is aliased to the test1 command:
$ ls -li /bin/test /bin/[
54008404 -rwxr-xr-x 2 root wheel 18576 Jul 25 2012 /bin/[
54008404 -rwxr-xr-x 2 root wheel 18576 Jul 25 2012 /bin/test
You can see the basic tests by looking at the test manpage.
Both Kornshell (which started it) and BASH give you the expanded test using double square brackets ([[ ... ]]). The [[ ... ]] is mainly used for pattern matching. Pattern matching extends the globbing that the shell can do on the command line -- sort of a poor man's regular expression. Newer versions of BASH actually allow you to use real regular expressions rather than pattern matching.
The important thing to remember is that you must use the double square brackets whenever you are doing a pattern match testing (or regular expression testing in BASH) rather than the single square brackets which can only test strings.
Also remember that regular expressions can match anywhere on a line:
if [[ $SERIAL =~ "device" ]]
could match devices. What you want to do is to add \b to make sure you're matching the word device. The \b is a regular expression word boundary:
if [[ $SERIAL =~ "\bdevice\b" ]]
1. Yes, I know the [ is a shell builtin in BASH and Kornshell, and [ doesn't execute /bin/[. However, the test command is also a shell builtin too.

BASH regexp matching - including brackets in a bracketed list of characters to match against?

I'm trying to do a tiny bash script that'll clean up the file and folder names of downloaded episodes of some tv shows I like. They often look like "[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE", and I basically just want to strip out that speedcd advertising bit.
It's easy enough to remove www.Speed.Cd, spaces, and dashes using regexp matching in BASH, but for the life of me, I cannot figure out how to include the brackets in a list of characters to be matched against. [- [] doesn't work, neither does [- \[], [- \\[], [- \\\[], or any number of escape characters preceding the bracket I want to remove.
Here's what I've got so far:
[[ "$newfile" =~ ^(.*)([- \[]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[- \]]*)(.*)$ ]] &&
newfile="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"
But it breaks on the brackets.
Any ideas?
TIA,
Daniel :)
EDIT: I should probably note that I'm using "shopt -s nocasematch" to ensure case insensitive matching, just in case you're wondering :)
EDIT 2: Thanks to all who contributed. I'm not 100% sure which answer was to be the "correct" one, as I had several problems with my statement. Actually, the most accurate answer was just a comment to my question posted by jw013, but I didn't get it at the time because I hadn't understood yet that spaces should be escaped. I've opted for aefxx's as that one basically says the same, but with explanations :) Would've liked to put a correct answer mark on ormaaj's answer, too, as he spotted more grave issues with my expression.
Anyway, the approach I was using above, trying to match and extract the parts to keep and leave behind the unwanted ones is really not very elegant, and won't catch all cases, not even something really simple like "Some.Show.S07E14.720p.HDTV.X264-SOMEONE - [ www.Speed.Cd ]". I've instead rewritten it to match and extract just the unwanted parts and then do string replacement of those on the original string, like so (loop is in case there's multiple brandings):
# Remove common torrent site brandings, including surrounding spaces, brackets, etc.:
while [[ "$newfile" =~ ([[\ {\(-]*(www\.)?(torrentday\.com|torrenting\.com|spastikustv|speed\.cd|moviesp2p\.com|publichd\.org|publichd|scenetime\.com|kingdom-release)[]\ }\)-]*) ]]; do
newfile=${newfile//"${BASH_REMATCH[1]}"/}
done
Ok, this is the first time I've heard of the =~ operator but nevertheless here's what I found by trial and error:
if [[ $newfile =~ ^(.*)([-[:space:][]*(what|ever)[][:space:]-]*)(.*)$ ]]
^^^^^^^^^^ ^^^^^^^^^^
Looks strange but actually does work (just tested it).
EDIT
Quote from the Linux man pages regex(7):
To include a literal ] in the list, make it the first character (following a possible ^). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal aq-aq as the first endpoint of a range, enclose it in "[." and ".]" to make it a collating element (see below). With the exception of these and some combinations using aq[aq (see next paragraphs), all other special characters, including aq\aq, lose their special significance within a bracket expression.
Whenever you're doing a regex it's most compatible between Bash versions to put regexes in a variable even if you do manage to dodge all the pitfalls of putting them directly in a test expression. http://mywiki.wooledge.org/BashPitfalls#if_.5B.5B_.24foo_.3D.2BAH4_.27some_RE.27_.5D.5D
Your current regex looks like you're trying to optionally match anything preceding the opening bracket. I'd guess you're actually trying to save for example 3 and 4 from something like this:
$ shopt -s nocasematch
$ newfile='[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE'
$ re='^.*[-[:space:][]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[][:space:]-]*(.*)$'
$ [[ $newfile =~ $re ]]
$ declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE" [1]="www.Speed.Cd" [2]="Some.Show.S07E14.720p.HDTV.X264-SOMEONE")'
The basic issue is quite simple, if not obvious.
A BASH REGEX is totally unprotected (from the shell), and cannot be protected by "​double quotes​". This means that every literal space (and tab,etc) must be protected by a baskslash \ ... end of story. The rest is just a case of getting you regex to suit your needs.
One other thing; use [\ [] and []\ ] to match [ and ] respectively, within the range square-bracket construct (in this case along with a space).
example:
newfile="[ ]"
[[ "$newfile" =~ ^[\ []\ []\ ]$ ]] &&
echo YES ||
echo NO
You can try something like this (though you weren't 100% clear on what cases you are trying to filter:
newfile="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE"
if [[ $newfile =~ ^(.*)([^a-zA-Z0-9.]*\[.*\][^a-zA-Z0-9.]*)(.*)$ ]]; then
newfile="${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
fi
echo $newfile
# Some.Show.S07E14.720p.HDTV.X264-SOMEONE
Its just stripping any non-alnum (and dot) characters outside the [], and anything within []

Checking a string to see if it contains numeric character in UNIX

I'm new to UNIX, having only started it at work today, but experienced with Java, and have the following code:
#/bin/bash
echo "Please enter a word:"
read word
grep -i $word $1 | cut -d',' -f1,2 | tr "," "-"> output
This works fine, but what I now need to do is to check when word is read, that it contains nothing but letters and if it has numeric characters in print "Invalid input!" message and ask them to enter it again. I assumed regular expressions with an if statement would be the easy way to do this but I cannot get my head around how to use them in UNIX as I am used to the Java application of them. Any help with this would be greatly appreciated, as I couldn't find help when searching as all the solutions with regular expressions in linux I found only dealt with if it was either all numeric or not.
Yet another approach. Grep exits with 0 if a match is found, so you can test the exit code:
echo "${word}" | grep -q '[0-9]'
if [ $? = 0 ]; then
echo 'Invalid input'
fi
This is /bin/sh compatible.
Incorporating Daenyth and John's suggestions, this becomes
if echo "${word}" | grep '[0-9]' >/dev/null; then
echo 'Invalid input'
fi
The double bracket operator is an extended version of the test command which supports regexes via the =~ operator:
#!/bin/bash
while true; do
read -p "Please enter a word: " word
if [[ $word =~ [0-9] ]]; then
echo 'Invalid input!' >&2
else
break
fi
done
This is a bash-specific feature. Bash is a newer shell that is not available on all flavors of UNIX--though by "newer" I mean "only recently developed in the post-vacuum tube era" and by "not all flavors of UNIX" I mean relics like old versions of Solaris and HP-UX.
In my opinion this is the simplest option and bash is plenty portable these days, but if being portable to old UNIXes is in fact important then you'll need to use the other posters' sh-compatible answers. sh is the most common and most widely supported shell, but the price you pay for portability is losing things like =~.
If you're trying to write portable shell code, your options for string manipulation are limited. You can use shell globbing patterns (which are a lot less expressive than regexps) in the case construct:
export LC_COLLATE=C
read word
while
case "$word" in
*[!A-Za-z]*) echo >&2 "Invalid input, please enter letters only"; true;;
*) false;;
esac
do
read word
done
EDIT: setting LC_COLLATE is necessary because in most non-C locales, character ranges like A-Z don't have the “obvious” meaning. I assume you want only ASCII letters; if you also want letters with diacritics, don't change LC_COLLATE, and replace A-Za-z by [:alpha:] (so the whole pattern becomes *[![:alpha:]]*).
For full regexps, see the expr command. EDIT: Note that expr, like several other basic shell tools, has pitfalls with some special strings; the z characters below prevent $word from being interpreted as reserved words by expr.
export LC_COLLATE=C
read word
while expr "z$word" : 'z[A-Za-z]*$' >/dev/null; then
echo >&2 "Invalid input, please enter letters only"
read word
fi
If you only target recent enough versions of bash, there are other options, such as the =~ operator of [[ ... ]] conditional commands.
Note that your last line has a bug, the first command should be
grep -i "$word" "$1"
The quotes are because somewhat counter-intuitively, "$foo" means “the value of the variable called foo” whereas plain $foo means “take the value of foo, split it into separate words where it contains whitespace, and treat each word as a globbing pattern and try to expand it”. (In fact if you've already checked that $word contains only letters, leaving the quotes won't do any harm, but it takes more time to think of these special cases than to just put the quotes every times.)
Yet another (quite) portable way to do it ...
if test "$word" != "`printf "%s" "$word" | tr -dc '[[:alpha:]]'`"; then
echo invalid
fi
One portable (assuming bash >= 3) way to do this is to remove all numbers and test for length:
#!/bin/bash
read -p "Enter a number" var
if [[ -n ${var//[0-9]} ]]; then
echo "Contains non-numbers!"
else
echo "ok!"
fi
Coming from Java, it's important to note that bash has no real concept of objects or data types. Everything is a string, and complex data structures are painful at best.
For more info on what I did, and other related functions, google for bash string manipulation.
Playing around with Bash parameter expansion and character classes:
# cf. http://wiki.bash-hackers.org/syntax/pe
word="abc1def"
word="abc,def"
word=$'abc\177def'
# cf. http://mywiki.wooledge.org/BashFAQ/058 (no NUL byte in Bash variable)
word=$'abc\000def'
word="abcdef"
(
set -xv
[[ "${word}" != "${word/[[:digit:]]/}" ]] && echo invalid || echo valid
[[ -n "${word//[[:alpha:]]/}" ]] && echo invalid || echo valid
)
Everyone's answers seem to be based on the fact that the only invalid characters are numbers. The initial questions states that they need to check that the string contains "nothing but letters".
I think the best way to do it is
nonalpha=$(echo "$word" | sed 's/[[:alpha:]]//g')
if [[ ${#nonalpha} -gt 0 ]]; then
echo "Invalid character(s): $nonalpha"
fi
If you found this page looking for a way to detect non-numeric characters in your string (like I did!) replace [[:alpha:]] with [[:digit:]].