How to write and match regular expressions in /bin/sh script?

I am writing a shell script for a limited Unix-based microkernel which doesn't have bash! The /bin/sh can't run the following lines for some reason.
if [[ `uname` =~ (QNX|qnx) ]]; then
read -p "what is the dev prefix to use? " dev_prefix
if [[ $dev_prefix =~ ^[a-z0-9_-]+#[a-z0-9_-"."]+:.*$ ]]; then
For the 1st and 3rd lines it complains about a missing expression operator, and for the 2nd line it says no coprocess! Can anyone shed light on the differences between /bin/bash and /bin/sh scripts?

You can use this equivalent script in /bin/sh:
if uname | grep -Eq '(QNX|qnx)'; then
printf "what is the dev prefix to use? "
read dev_prefix
if echo "$dev_prefix" | grep -Eq '^[a-z0-9_-]+#[a-z0-9_.-]+:'; then
...
fi
fi

You can use shellcheck to detect non-POSIX features in a script.
Copy and paste this into https://www.shellcheck.net/:
#!/bin/sh
if [[ `1uname` =~ (QNX|qnx) ]]; then
read -p "what is the dev prefix to use? " dev_prefix
if [[ $dev_prefix =~ ^[a-z0-9_-]+#[a-z0-9_-"."]+:.*$ ]]; then
: nothing
fi
fi
Or install shellcheck locally and run shellcheck ./check.sh;
it will highlight the non-POSIX features:
In ./check.sh line 2:
if [[ `1uname` =~ (QNX|qnx) ]]; then
^-- SC2039: In POSIX sh, [[ ]] is not supported.
^-- SC2006: Use $(..) instead of deprecated `..`
In ./check.sh line 4:
if [[ $dev_prefix =~ ^[a-z0-9_-]+#[a-z0-9_-"."]+:.*$ ]]; then
^-- SC2039: In POSIX sh, [[ ]] is not supported.
You either have to rewrite the expressions as globs (not realistic), or use external commands (grep/awk), as explained by @anubhava.
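Both routes can be combined: a glob for the simple test and grep for the complex one. A minimal sketch, assuming only POSIX sh and grep -E are available. The function names is_qnx and valid_dev_prefix are hypothetical, and the second bracket expression is written as [a-z0-9_.-] on the assumption that a literal dot and hyphen were intended:

```shell
# Hypothetical helpers, not part of the original script.
is_qnx() {
    # $1: output of uname. Glob alternation replaces the (QNX|qnx) regex.
    case $1 in
        *QNX*|*qnx*) return 0 ;;
        *) return 1 ;;
    esac
}

valid_dev_prefix() {
    # $1: candidate prefix. "." and "-" are listed literally, with "-" last
    # so that it cannot be read as a range operator.
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9_-]+#[a-z0-9_.-]+:'
}
```

Usage would look like: if is_qnx "$(uname)"; then ...; fi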

Related

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
To get the line numbers of the mismatches, you can use grep -vn. With a correct regular expression, that gives you
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed is also over the top. If you need some explanation, you can start by piping echo "Linenr:Invalid line" into it.
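The grep and sed pipeline above can be wrapped in a function that also reports success or failure. This is only a sketch: report_bad_lines is a hypothetical name, and it uses a basic-regex sed expression so that the GNU-specific -r flag is not needed:

```shell
# report_bad_lines is a hypothetical name; the message layout follows the question.
report_bad_lines() {
    # $1: file to check. grep -v selects non-matching lines, -n prefixes "N:".
    bad=$(grep -Evn '^.{9}[+].{3}[+].[+].{2}$' "$1")
    # Nothing selected means every line matched the layout.
    [ -n "$bad" ] || return 0
    # Turn "N:text" into the requested message.
    printf '%s\n' "$bad" |
        sed 's/^\([0-9]*\):\(.*\)$/Invalid number (\2) check line \1/'
    return 1
}
```

Unlike the loop in the question, this reports every bad line rather than stopping at the first one. Note that an unreadable file also yields an empty result here, so a real script would want to check for that separately.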
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
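The ?(sub-pattern) form is a ksh extension. Where only a POSIX sh is available, the same check can be approximated with case by listing both variants explicitly. A sketch under that assumption; matches_layout is a hypothetical name:

```shell
# Hypothetical helper: POSIX case has no ?(...), so both variants are spelled out.
matches_layout() {
    case $1 in
        # without the optional space | with the optional space
        ?????????+???+?+?? | ?????????+???+' '?+??) return 0 ;;
        *) return 1 ;;
    esac
}
```

As with the ksh glob, ? matches any character, so this checks the layout only, not that the fields are digits.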
Your regex looks bad; sites like https://regex101.com/ are very helpful for checking one. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}";
done < "${file}"
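Putting the grep test and the read loop together gives a validator that stops at the first bad line. This is a sketch in plain POSIX sh (a pipe instead of the <<< here-string). The name first_bad_line is hypothetical, and the digits-only regex is one of the candidates listed above, chosen here as an assumption about the intended format:

```shell
# Hypothetical helper. POSIX sh has no "local", so n and line are globals.
first_bad_line() {
    # $1: file to check. Prints the number of the first non-matching line.
    n=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if ! printf '%s\n' "$line" |
            grep -qE '^[0-9]{9}\+[0-9]{3}\+[0-9]\+[0-9]{2}$'; then
            echo "$n"
            return 1
        fi
    done < "$1"
    return 0
}
```

Running grep once per line is slow on large files; for those, a single grep -vn over the whole file is the better fit.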

Regex - validate IPv6 shell script

I am able to validate IPv6 addresses using java with following regex:
([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}
But I need to do this in shell script to which I am new.
This regex doesn't seem to work in shell. Have tried some other combinations also but nothing helped.
#!/bin/bash
regex="([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}"
var="$1"
if [[ "$var" =~ "$regex" ]]
then
echo "matches"
else
echo "doesn't match!"
fi
It gives output doesn't match! for 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
How can I write this in shell script?
The Java regex shown in the question works in bash as well, but make sure not to quote the regex variable. If the variable or string on the right-hand side of the =~ operator is quoted, it is treated as a string literal instead of a regex.
I also recommend using anchors in the regex. Otherwise it will print matches for invalid input such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334:foo:bar:baz.
The following script should work for you:
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
var="$1"
if [[ $var =~ $regex ]]; then
echo "matches"
else
echo "doesn't match!"
fi
[[ and =~ won't work with sh, whereas awk works almost everywhere.
Here is what I did
saved as ./check-ipv6.sh, chmod +x ./check-ipv6.sh
#!/bin/sh
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
printf '%s\n' "$1" | awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}'
Or, if you prefer bash to sh:
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}' <<< "$1"
Test
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
not an ipv6=>2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
~$ echo $?
1
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
~$ echo $?
0
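The same anchored pattern can also be tested with grep -E, which avoids splicing the regex into an awk program. A minimal sketch; is_ipv6_shape is a hypothetical name, chosen because this pattern only checks the general shape (it also accepts a lone ":", for example) and is not a full IPv6 validator:

```shell
# Hypothetical helper: a loose shape check using the regex from the answers above.
is_ipv6_shape() {
    # $1: candidate address. Exit status 0 means the anchored pattern matched.
    printf '%s\n' "$1" |
        grep -Eq '^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
}
```

It can be used directly in a condition: if is_ipv6_shape "$1"; then echo "matches"; fi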

bash if [[ =~ regex compare not working?

I have a value in a variable that may be absolute or relative url, and I need to check which one it is.
I have found that there's a =~ operator in [[, but I can't get it to work. What am I doing wrong?
url="http://test"
if [[ "$url" =~ "^http://" ]];
then echo "absolute.";
fi;
You need to use the regex without quotes:
url="http://test"
if [[ "$url" =~ ^http:// ]]; then
echo "absolute."
fi
This prints "absolute." because newer bash (after v3.1) requires the regex to be unquoted.
Or avoid regex and use glob matching:
if [[ "$url" == "http://"* ]]; then
echo "absolute."
fi
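The glob approach also carries over to shells without [[ ]], using case. A sketch; is_absolute_url is a hypothetical name, and treating https:// as absolute too is an assumption that goes beyond the question:

```shell
# Hypothetical helper for shells that lack [[ ]].
is_absolute_url() {
    case $1 in
        # Assumption: https URLs count as absolute as well.
        http://*|https://*) return 0 ;;
        *) return 1 ;;
    esac
}
```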

How to test if string matches a regex in POSIX shell? (not bash)

I'm using the Ubuntu system shell, not bash, and I found that the usual way does not work:
#!/bin/sh
string='My string';
if [[ $string =~ .*My.* ]]
then
echo "It's there!"
fi
error [[: not found!
What can I do to solve this problem?
The [[ ... ]] are a bash-ism. You can make your test shell-agnostic by just using grep with a normal if:
if echo "$string" | grep -q "My"; then
echo "It's there!"
fi
Using grep for such a simple pattern can be considered wasteful. Avoid that unnecessary fork by using the shell's built-in glob-matching engine (NOTE: this does not support regex):
case "$value" in
*XXX*) echo OK ;;
*) echo fail ;;
esac
It is POSIX compliant. Bash has a simplified syntax for this:
if [[ "$value" == *XXX* ]]; then :; fi
and even regex:
[[ abcd =~ b.*d ]] && echo ok
You could use expr:
if expr "$string" : "My" 1>/dev/null; then
echo "It's there";
fi
This would work with both sh and bash.
As a handy function:
exprq() {
local value
test "$2" = ":" && value="$3" || value="$2"
expr "$1" : "$value" 1>/dev/null
}
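# expr anchors the match at the start of the string, so ".*" is needed to find "body" inside "somebody".
# Or `exprq "somebody" ".*body"` if you'd rather ditch the ':'
if exprq "somebody" : ".*body"; then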
echo "once told me"
fi
Quoting from man expr:
STRING : REGEXP
anchored pattern match of REGEXP in STRING
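Because the match is anchored at the start, a "contains" test needs a leading .* in the pattern. A sketch of a wrapper that adds it; str_contains_re is a hypothetical name, and the X prefix is a common precaution so that a string such as -n or ( is not mistaken for an expr operator:

```shell
# Hypothetical helper: true if the basic regex $2 matches anywhere in $1.
str_contains_re() {
    # "X" guards both operands; ".*" lets the anchored match start anywhere.
    expr "X$1" : "X.*$2" >/dev/null
}
```

For example, str_contains_re "somebody" "body" succeeds. Note that expr uses basic regular expressions, and that a ^ in the pattern would not act as an anchor here, since it no longer comes first.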

shell test operator regular expressions

#!/bin/bash
# This file will fix the cygwin vs linux paths and load programmer's notepad under windows.
# mail : <sandundhammikaperera#gmail.com>
# invokes the GNU GPL, all rights are granted.
# check first parameter is non empty.
# if empty then give a error message and exit.
file=${1:?"Usage: pn filename"};
if [[ "$file" == /*/* ]] ;then
#if long directory name.
# :FAILTHROUGH:
echo "$0: Executing pn.exe $file"
else
file="$(pwd)/$file";
fi
#check whether the filename starts with / if so replace it with appropriate prefix #
prefix="C:/cygwin/";
#check for the partterns starting with "/" #
echo $var | grep "^/*$"
if [[ "$?" -eq "0" ]] ;then
# check again whether parttern starts with /cygdrive/[a-z]/ parttern #
if [[ $file == /cygdrive/[a-z]/* ]] ; then
file=${file#/cygdrive/[a-z]/};
file="C:/"$file;
else
file="$prefix""$file";
fi
fi
#check for the appropriate file permissions #
# :TODO:
echo $file
exec "/cygdrive/c/Program Files (x86)/Programmer's Notepad/pn.exe" $file
The program above converts path names between cygwin and Windows and loads
pn.exe [Programmer's Notepad on Windows]. So my questions are:
There is built-in regex support for the "[[" or 'test' operator (and I have
used it in the program above). But why doesn't it work here if I change
echo $var | grep "^/*$"
if [[ "$?" -eq "0" ]] ;then
to this,
if [[ "$file" == ^/*$ ]] ;then
What is the reason for that? Is there any workaround?
I have already tried the second method, [[ "$file" == ^/*$ ]], but it didn't work.
Then some simple googling brought me here: http://unix.com/shell-programming
How can I find all the documentation about the [[ operator or the 'test' command? I have tried
man test but :(. Which document specifies its limitations on regex usage, if there are any?
First, grep "^/*$" will only match paths containing only slashes, like "/", "///", "////". You can use grep "^/" to match paths starting with a slash. If you want to use bash regexes:
var="/some"
#echo $var | grep "^/"
if [[ "$var" =~ ^/ ]] ;then
echo "yes"
fi
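As for why [[ "$file" == ^/*$ ]] fails: == inside [[ ]] compares against a glob pattern, not a regex, so ^ and $ are literal characters there and "starts with a slash" is simply /*. The same glob rules apply to case, which also works in plain sh. A sketch of the three cases the script distinguishes; classify_path is a hypothetical name:

```shell
# Hypothetical helper: sorts a path into the categories used by the script.
classify_path() {
    case $1 in
        /cygdrive/[a-z]/*) echo cygdrive ;;   # e.g. /cygdrive/c/Users
        /*)                echo absolute ;;   # any other path starting with /
        *)                 echo relative ;;
    esac
}
```

The order of the branches matters, because the first matching pattern wins and /* would otherwise also catch the /cygdrive paths.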