Check if a string contains valid pattern in Bash

I have a file a.txt that contains a string like:
Axxx-Bxxxx
The rules for a valid string are:
the length is 10 characters;
each x is a digit.
I tried to check it with:
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
echo $msg;
if[ -f $file];then
tmp=$(cat $file);
if[[${#tmp} != $exp_len ]];then
msg="invalid length";
elif [[ $tmp =~ ^[A[0-9]{3}-B[0-9]{4}]$]];then
msg="valid";
else
msg="invalid";
fi
else
msg="file not exist";
fi
echo $msg;
But it doesn't work, even for valid input.
Can someone help me correct it?
Thanks :)

Apart from the regex fix, your code has syntax errors and can also be simplified. Consider this code:
file="a.txt"
msg="checking string"
tmp="File not exist"
echo "$msg"
if [[ -f $file ]]; then
s="$(<"$file")"
if [[ $s =~ ^A[0-9]{3}-B[0-9]{4}$ ]]; then
msg="valid"
else
msg="invalid"
fi
else
msg="file not exist"
fi
echo "$msg"
The changes are:
Removed the unnecessary cat.
Use [[ ... ]] when writing for bash.
Spaces after [[ and before ]] are required (your code was missing them); the same applies to [ ... ].
The separate length check is not needed, because the anchored regex only matches strings of exactly 10 characters.
As mentioned in the comments, the correct regex is ^A[0-9]{3}-B[0-9]{4}$ or ^A[[:digit:]]{3}-B[[:digit:]]{4}$
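As a minimal sketch, the check can be wrapped in a small function so that it is easy to try against several inputs. The function name is_valid_code and the sample strings are illustrative and not part of the original post.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if the argument looks like Axxx-Bxxxx.
is_valid_code() {
  # Keep the regex in a variable and expand it unquoted on the right of =~.
  local re='^A[0-9]{3}-B[0-9]{4}$'
  [[ $1 =~ $re ]]
}

for s in 'A123-B1234' 'A12-B1234' 'A123-B12345' 'a123-b1234'; do
  if is_valid_code "$s"; then
    echo "$s: valid"
  else
    echo "$s: invalid"
  fi
done
```

Only the first sample passes; the others are too short, too long, or use lowercase letters.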

Note that a regex like ^[A[0-9]{3}-B[0-9]{4}]$ matches
^ - start of string
[A[0-9]{3} - three occurrences of A, [ or a digit
-B - a -B string
[0-9]{4} - four digits
] - a ] char
$ - end of string.
So it matches strings like [A[-B1234] or [[[-B1939], but not A123-B1234.
Your regex check should look like this:
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
Demo:
#!/bin/bash
tmp="A123-B1234";
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
msg="valid";
else
msg="invalid";
fi
echo $msg;
Output:
valid
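The difference between the two patterns can be checked directly. This is only a sketch: the helper names matches_bad and matches_good are made up for illustration, and both patterns are stored in variables so that bash passes them to the regex engine unchanged.

```bash
#!/usr/bin/env bash

bad='^[A[0-9]{3}-B[0-9]{4}]$'   # pattern from the question
good='^A[0-9]{3}-B[0-9]{4}$'    # corrected pattern

# Hypothetical helpers: test one string against each pattern.
matches_bad()  { [[ $1 =~ $bad ]]; }
matches_good() { [[ $1 =~ $good ]]; }

for s in 'A123-B1234' '[A[-B1234]'; do
  matches_bad  "$s" && b=yes || b=no
  matches_good "$s" && g=yes || g=no
  echo "$s: question pattern=$b corrected pattern=$g"
done
```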

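Using just grep might be easier (anchor the pattern with ^ and $ so that the whole line has to match; -P requires a grep built with PCRE support, such as GNU grep):
$ echo A123-B1234 > valid.txt
$ echo 123 > invalid.txt
$ grep -Pq '^A\d{3}-B\d{4}$' valid.txt && echo valid || echo invalid
valid
$ grep -Pq '^A\d{3}-B\d{4}$' invalid.txt && echo valid || echo invalid
invalid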
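Where grep -P is not available, the same idea can be sketched with grep -E and an anchored pattern. The helper name check_code is hypothetical. The second call shows why the anchors matter: without ^ and $, a longer string that merely contains the pattern would be accepted.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed if the string is exactly Axxx-Bxxxx.
check_code() {
  printf '%s\n' "$1" | grep -Eq '^A[0-9]{3}-B[0-9]{4}$'
}

check_code 'A123-B1234'   && echo valid || echo invalid
check_code 'xA123-B1234x' && echo valid || echo invalid
```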

Based on your samples and your attempt, you could also try the following awk-based code.
#!/bin/bash
file=a.txt
if [[ -f "$file" ]]
then
echo "File $file exists."
awk '/^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "File $file does not exist; exiting." >&2
exit 1
fi
Or, if each line must also be exactly 10 characters long (as suggested by the exp_len variable in your attempt), try the following code, which adds a length condition to the awk program:
#!/bin/bash
exp_len=10
file=a.txt
if [[ -f "$file" ]]
then
echo "File $file exists."
awk -v len="$exp_len" 'length($0) == len && /^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "File $file does not exist; exiting." >&2
exit 1
fi
NOTE: I am using the -f test here to check whether the file exists. You can change it to -s (e.g. -s "$file") if you want to check that the file exists and is not empty.

Related

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
so far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.

To get the line numbers of the mismatches, you can use grep -vn. With a correct regular expression, this becomes:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed step is arguably over the top. To see what it does, you can start by piping echo "Linenr:Invalid line" into it.
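The pipeline can be tried on a throwaway file. This is a sketch under a few assumptions: the helper name report_invalid and the sample lines are made up, and sed -E is used in place of sed -r because both GNU and BSD sed accept it.

```bash
#!/usr/bin/env bash

# Hypothetical helper: list the lines of a file that do not fit the format.
report_invalid() {
  grep -Evn '^.{9}[+].{3}[+].[+].{2}$' "$1" |
    sed -E 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
}

sample=$(mktemp)
printf '%s\n' '000000000+000+0+00' 'bad' '000000002+000+0+00' > "$sample"
report_invalid "$sample"   # reports only the second line
rm -f "$sample"
```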

I get odd results when I put the regex directly in the condition:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
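For comparison, here is a sketch of the same loop in bash rather than ksh. It reads from standard input, so it can be tested without a file. The function name first_invalid_line is hypothetical, and the loop stops at the first bad line, as the original script does.

```bash
#!/usr/bin/env bash

# Regex kept in a variable and expanded unquoted, as in the ksh version.
re='^.{9}\+.{3}\+.\+.{2}$'

# Hypothetical helper: report the first line on stdin that breaks the format.
first_invalid_line() {
  local line n=0
  while IFS= read -r line; do
    n=$((n + 1))
    if [[ ! $line =~ $re ]]; then
      printf 'Invalid number (%s) check line %d\n' "$line" "$n"
      return 1
    fi
  done
  return 0
}

first_invalid_line <<'EOF' || echo 'validation failed'
000000000+000+0+00
00000001+000+0+00
000000002+000+0+00
EOF
```

The second sample line has only eight characters before the first +, so it is the one reported.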

Your regex looks wrong; sites like https://regex101.com/ are very helpful for testing. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or, to exit when a line does not match:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer a construct like the one below for reading files line by line:
while IFS= read -r line; do
echo "${line}";
done < "${file}"

Regex - validate IPv6 shell script

I am able to validate IPv6 addresses using java with following regex:
([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}
But I need to do this in shell script to which I am new.
This regex doesn't seem to work in shell. Have tried some other combinations also but nothing helped.
#!/bin/bash
regex="([0-9a-fA-F]{0,4}:){1,7}([0-9a-fA-F]){0,4}"
var="$1"
if [[ "$var" =~ "$regex" ]]
then
echo "matches"
else
echo "doesn't match!"
fi
It gives output doesn't match! for 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
How can I write this in shell script?

The Java regex shown in the question works in bash as well, but make sure not to quote the regex variable. If the variable or string on the right-hand side of the =~ operator is quoted, it is treated as a string literal instead of a regex.
I also recommend anchoring the regex. Otherwise it will report a match for invalid input such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334:foo:bar:baz.
The following script should work for you:
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
var="$1"
if [[ $var =~ $regex ]]; then
echo "matches"
else
echo "doesn't match!"
fi
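The effect of quoting can be seen side by side in the sketch below. The helper names are made up for illustration. The last line is a reminder that this pattern is deliberately loose and also accepts strings that are not real IPv6 addresses.

```bash
#!/usr/bin/env bash

regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'

# Unquoted: $regex is interpreted as an extended regular expression.
matches_as_regex()   { [[ $1 =~ $regex ]]; }
# Quoted: "$regex" is compared as a literal substring, not as a pattern.
matches_as_literal() { [[ $1 =~ "$regex" ]]; }

addr='2001:0Db8:85a3:0000:0000:8a2e:0370:7334'
matches_as_regex   "$addr" && echo 'unquoted: matches'
matches_as_literal "$addr" || echo 'quoted: no match'
matches_as_regex   ':::'   && echo 'note: ":::" also matches this loose pattern'
```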

[[ and =~ won't work with sh, but awk works almost everywhere.
Here is what I did.
Saved as ./check-ipv6.sh, then chmod +x ./check-ipv6.sh:
#!/bin/sh
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
printf '%s\n' "$1" | awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}'
Or, if you prefer bash to sh:
#!/bin/bash
regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'
awk '$0 !~ /'"$regex"'/{print "not an ipv6=>"$0;exit 1}' <<< "$1"
Test
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
not an ipv6=>2001:0Db8:85a3:0000:0000:8a2e:0370:7334x
~$ echo $?
1
~$ ./check-ipv6.sh 2001:0Db8:85a3:0000:0000:8a2e:0370:7334
~$ echo $?
0
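Another option that avoids bash-only syntax is grep -E, sketched below. This swaps awk for grep and is not the answer's own method. The function name is_ipv6_like is hypothetical, and the pattern is the same loose one used above.

```sh
#!/bin/sh

regex='^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$'

# Hypothetical helper: exit status 0 if the argument matches the pattern.
is_ipv6_like() {
  printf '%s\n' "$1" | grep -Eq "$regex"
}

for a in '2001:0Db8:85a3:0000:0000:8a2e:0370:7334' \
         '2001:0Db8:85a3:0000:0000:8a2e:0370:7334x'; do
  if is_ipv6_like "$a"; then
    echo "matches: $a"
  else
    echo "not an ipv6=>$a"
  fi
done
```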

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the ways I have tried, and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This has incorrect syntax.
The next two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to only print lines with that format? PS: I have tested, and printing all lines works just fine.

You don't need a regular expression. Glob (filename) patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y

For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)

<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file

I don't understand why you are reading the file line by line. I ran the following command at the bash prompt and it works fine (note the -E: without it, grep treats + as a literal character):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>
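Whether grep runs with -E matters for this pattern. In basic regular expressions, which grep uses by default, + is an ordinary character, while in extended regular expressions it means "one or more". The sketch below shows the difference on three made-up sample lines.

```bash
#!/usr/bin/env bash

# Three sample lines: empty marker, one character, and a literal plus sign.
sample() { printf '%s\n' '<<<>>>' '<<<a>>>' '<<<a+>>>'; }

echo 'basic (no -E):'
sample | grep    '<<<.+>>>'   # + is a literal plus: only <<<a+>>> is printed
echo 'extended (-E):'
sample | grep -E '<<<.+>>>'   # + is a quantifier: <<<a>>> and <<<a+>>> are printed
```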

bash and grep: passing of regex parameter

I'm trying to write a bash script that helps solve crosswords. For example, the question is "Alcoholic Drink in German". I already have a 'B' in the first place, an 'R' in the last place and two gaps in between. So a regex would be ^B..R$
Since I live in Switzerland, I'd like to use the ngerman dictionary (DICT=/usr/share/dict/ngerman).
Here's how I'd do it directly on the shell:
grep -i '^B...$' /usr/share/dict/ngerman
That works perfectly, and the word 'Bier' appears among three others. Since this syntax is cumbersome, I'd like to write a little bash script that allows me to enter it like this:
crosswords 'B..R'
Here's my approach:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename $0)
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="'^$1$'"
cmd="grep -i $regex $DICT"
echo $regex
echo $cmd
$($cmd) | while read word; do
echo "$word"
done
But nothing appears, it doesn't work. I also output the $regex and the $cmd variable for debugging reasons. Here's what comes out:
'^B..R$'
grep -i '^B..R$' /usr/share/dict/ngerman
That's exactly what I need. If I copy and paste the command above, it works perfectly. But if I call it with $($cmd), it fails.
What is wrong?

You do not need to put quotes around the regex variable string, and $($cmd) should be changed to $cmd.
So the correct code is:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename $0)
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="^$1$"
cmd="grep -i $regex $DICT"
echo $regex
echo $cmd
$cmd | while read word; do
echo "$word"
done

Change regex="'^$1$'" to regex="^$1$" and $($cmd) to $cmd.
Here is a fixed version:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename "$0")
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="^$1$"
cmd="grep -i $regex $DICT"
echo "$regex"
echo "$cmd"
$cmd | while read -r word; do
echo "$word"
done
But this script has potential problems. For example, try running it as ./script 'asdads * '. Because $cmd is expanded unquoted, the * is expanded to all files in the current directory, and all of them are passed to grep.
Here is a slightly improved version of your code with correct quoting and, as a bonus, input validation:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename "$0")
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
if ! [[ $1 =~ ^[a-zA-Z\.]+$ ]]; then
echo 'Wrong word. Please use only a-zA-Z characters and dots for unknown letters'
exit 1
fi
grep -i "^$1$" "$DICT" | while read -r word; do
echo "$word"
done

Oh, now I get it. When I type the command manually, the shell removes the single quotes before passing the argument! Here's my test program in C (param-test.c):
#include <stdio.h>
int main(int argc, char *argv[]) {
puts(argv[1]);
return 0;
}
Then I call:
param-test 'foo'
And I see:
foo
That's the problem! grep doesn't really get 'B..R', but just B..R.
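If the command really needs to be stored before it is run, a bash array keeps each argument intact without embedding quote characters in the regex. The sketch below is one way to do that. The function name lookup and the four-word temporary file are stand-ins for illustration; the real script would pass /usr/share/dict/ngerman.

```bash
#!/usr/bin/env bash

# Hypothetical helper: look up a crossword pattern in a word list.
lookup() {
  local pattern=$1 dict=$2
  # Each array element is passed to grep as exactly one argument,
  # so no extra quote characters are needed inside the regex.
  local cmd=(grep -i -- "^${pattern}\$" "$dict")
  "${cmd[@]}"
}

words=$(mktemp)   # small stand-in for the dictionary file
printf '%s\n' Bier Boot Bar Brot > "$words"
lookup 'B..R' "$words"   # prints Bier
rm -f "$words"
```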

shell scripting and regular expression

#!bin/bash
echo enter your password :
read password
passlength=$(echo ${#password})
if [ $passlength -le 8 ];
then
echo you entered correct password
else
echo entered password is incorrect
fi
if [[$password == [a-z]*[0-9][a-z]*]];
then
echo match found
else
echo match not found
fi
I am not getting what's wrong with this code. If I enter any string as a password, let's say hello123, it gives me an error:
hello123 : command not found
What is wrong with my script?

You can do the following to make it work across platforms with any Bourne-shell-based (/bin/sh) shell, without bash-specific primitives:
echo "$password" | grep -q "[a-z]*[0-9][a-z]*"
if [ $? -eq 0 ] ;then
echo "match found"
else
echo "match not found"
fi
Also feel free to use quotes around the variable names. It will save you hours and hours worth of useless debugging. :)

Technically it should give you an error like [[hello123: command not found.
The issue is that [[$password is not expanded how you think it is. Bash will first resolve the $password variable to what you entered (i.e. hello123). This will yield the string [[hello123, which bash will then try to invoke (and fail, as there is no command with that name).
Simply add a space after [[ (and before ]]) and bash will recognise [[ as the start of a conditional expression (it is a shell keyword, not an external command).
if [[ "$password" == [a-z]*[0-9][a-z]* ]]
then
...

The corrected script is below. The errors were:
The shebang should be #!/bin/bash, not #!bin/bash.
To read the password length, just use passlength=${#password}, not passlength=$(echo ${#password}).
Always put a space after [ or [[, and before ] or ]].
#!/bin/bash
echo "enter your password :"
read password
passlength=${#password}
if [[ $passlength -le 8 ]]
then
echo "you entered correct password"
else
echo "entered password is incorrect"
fi
if [[ $password == [a-z]*[0-9][a-z]* ]]
then
echo "match found"
else
echo "match not found"
fi

In the bash [[ construct, the == operator will match glob-style patterns, and =~ will match regular expressions. See the documentation.
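The same text, [a-z]*[0-9][a-z]*, means different things to the two operators, which the sketch below illustrates with made-up helper names. As a glob it must match the whole string: a lowercase letter first and, somewhere later, a digit immediately followed by a lowercase letter. As an unanchored regex it matches any string that contains a digit.

```bash
#!/usr/bin/env bash

# Hypothetical helpers: the same pattern text used as a glob and as a regex.
glob_match()  { [[ $1 == [a-z]*[0-9][a-z]* ]]; }
regex_match() { local re='[a-z]*[0-9][a-z]*'; [[ $1 =~ $re ]]; }

for s in hello123 h1x hello; do
  glob_match  "$s" && g=yes || g=no
  regex_match "$s" && r=yes || r=no
  echo "$s: glob=$g regex=$r"
done
```

Note that hello123 fails the glob test because none of its digits is followed by a letter, so the choice of operator changes the result for the example input from the question.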

#!/bin/bash
read -s -p "Enter Password: " password
password_length=${#password}
if [ $password_length -lt 8 -o $password_length -gt 20 ] ;then
echo -e "Invalid password - should be between 8 and 20 characters in length.";
echo ;
else
# Check for invalid characters
case $password in
*[^a-zA-Z0-9]* )
echo -e "Password contains invalid characters.";
echo ;
;;
* )
echo "Password accepted.";
echo ;
;;
esac
fi
The script above is a more refined example.
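The same checks can be sketched as a function that reports the result through its exit status, which makes it easier to reuse and to test. The name validate_password is hypothetical. The length limits and the allowed character set are taken from the script above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: 8 to 20 characters, letters and digits only.
validate_password() {
  local pw=$1
  if (( ${#pw} < 8 || ${#pw} > 20 )); then
    echo 'Invalid password - should be between 8 and 20 characters in length.'
    return 1
  fi
  case $pw in
    *[!a-zA-Z0-9]*)
      echo 'Password contains invalid characters.'
      return 1
      ;;
  esac
  echo 'Password accepted.'
}

for pw in 'short' 'with space 123' 'hello12345'; do
  validate_password "$pw" || echo "(rejected: $pw)"
done
```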

Try replacing the line
if [[$password == [a-z]*[0-9][a-z]*]];
with the following:
if echo "$password" | grep -qs '[a-z]*[0-9][a-z]*'
HTH
