bash and grep: passing of regex parameter - regex

I'm trying to write a bash script that helps solve crosswords. For example, the clue is "Alcoholic Drink in German". I already have a 'B' in the first position, an 'R' in the last position and two gaps in between. So a regex would be ^B..R$
Since I live in Switzerland, I'd like to use the ngerman dictionary (DICT=/usr/share/dict/ngerman).
Here's how I'd do it directly on the shell:
grep -i '^B..R$' /usr/share/dict/ngerman
That works perfectly, and the word 'Bier' appears among three others. Since this syntax is cumbersome, I'd like to write a little script that allows me to enter it like this:
crosswords 'B..R'
Here's my approach:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename $0)
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="'^$1$'"
cmd="grep -i $regex $DICT"
echo $regex
echo $cmd
$($cmd) | while read word; do
echo "$word"
done
But nothing appears, it doesn't work. I also output the $regex and the $cmd variable for debugging reasons. Here's what comes out:
'^B..R$'
grep -i '^B..R$' /usr/share/dict/ngerman
That's exactly what I need. If I copy/paste the command above, it works perfectly. But if I call it with $($cmd), it fails.
What is wrong?

You do not need to put quotes around the regex inside the variable's string, and $($cmd) should be changed to $cmd.
So the correct code is:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename $0)
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="^$1$"
cmd="grep -i $regex $DICT"
echo $regex
echo $cmd
$cmd | while read word; do
echo "$word"
done

Change regex="'^$1$'" to regex="^$1$" and $($cmd) to $cmd
Here is a fixed version:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename "$0")
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
regex="^$1$"
cmd="grep -i $regex $DICT"
echo "$regex"
echo "$cmd"
$cmd | while read -r word; do
echo "$word"
done
But this script has potential problems. For example, try running it as ./script 'asdads * '. The * will expand to all files in the current directory, and all of them will be passed to grep.
Here is a slightly improved version of your code with correct quoting and, as a bonus, input validation:
#!/bin/bash
DICT=/usr/share/dict/ngerman
usage () {
progname=$(basename "$0")
echo "usage: $progname regex"
}
if [ $# -le 0 ]; then
usage
exit 1
fi
if ! [[ $1 =~ ^[a-zA-Z\.]+$ ]]; then
echo 'Wrong word. Please use only a-zA-Z characters and dots for unknown letters'
exit 1
fi
grep -i "^$1$" "$DICT" | while read -r word; do
echo "$word"
done

Oh, now I get it. When I run it manually, the shell removes the '' quotes before grep sees them! Here's my test program in C (param-test.c):
#include <stdio.h>
int main(int argc, char *argv[]) {
puts(argv[1]);
return 0;
}
Then I call:
param-test 'foo'
And I see:
foo
That's the problem! grep doesn't really get 'B..R', but just B..R.
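The effect described above can be reproduced without the real dictionary. The following is a minimal sketch, assuming a small temporary word list as a hypothetical stand-in for /usr/share/dict/ngerman. It contrasts a command stored in a string with the same command stored in a bash array, which is one common way to keep each argument intact.

```shell
#!/bin/bash
# Hypothetical stand-in for /usr/share/dict/ngerman so the sketch runs anywhere.
dict=$(mktemp)
printf '%s\n' Bier Bar Boxer Baer Wein > "$dict"

pattern='B..R'

# Quotes stored inside a variable are ordinary characters:
# after word splitting, grep receives them as part of the regex.
regex="'^${pattern}\$'"
cmd="grep -i $regex $dict"
broken=$($cmd)    # expected to match nothing: no word contains a quote

# An array keeps every argument separate, so no embedded quotes are needed.
cmd_array=(grep -i "^${pattern}\$" "$dict")
fixed=$("${cmd_array[@]}")

printf 'string command: [%s]\n' "$broken"
printf 'array command:  [%s]\n' "$fixed"
rm -f "$dict"
```

With this word list, the string variant should come back empty, while the array variant should find Bier and Baer.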

Related

Check if a string contains valid pattern in Bash

I have a file a.txt contains a string like:
Axxx-Bxxxx
Rules for checking if it is valid or not include:
length is 10 characters.
x here is digits only.
Then, I try to check with:
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
echo $msg;
if[ -f $file];then
tmp=$(cat $file);
if[[${#tmp} != $exp_len ]];then
msg="invalid length";
elif [[ $tmp =~ ^[A[0-9]{3}-B[0-9]{4}]$]];then
msg="valid";
else
msg="invalid";
fi
else
msg="file not exist";
fi
echo $msg;
But it doesn't work, even for a valid string...
Could someone help me correct it?
Thanks :)
Besides the regex fix, your code has syntax issues and can be refactored. Consider this code:
file="a.txt"
msg="checking string"
tmp="File not exist"
echo "$msg"
if [[ -f $file ]]; then
s="$(<"$file")"
if [[ $s =~ ^A[0-9]{3}-B[0-9]{4}$ ]]; then
msg="valid"
else
msg="invalid"
fi
else
msg="file not exist"
fi
echo "$msg"
Changes are:
Remove unnecessary cat
Use [[ ... ]] when using bash
Spaces inside [[ ... ]] are required (your code was missing them)
There is no need to check for a length of 10 separately, as the anchored regex only matches strings of exactly that length
As mentioned in comments earlier correct regex should be ^A[0-9]{3}-B[0-9]{4}$ or ^A[[:digit:]]{3}-B[[:digit:]]{4}$
Note that a regex like ^[A[0-9]{3}-B[0-9]{4}]$ matches
^ - start of string
[A[0-9]{3} - three occurrences of A, [ or a digit
-B - a -B string
[0-9]{4} - four digits
] - a ] char
$ - end of string.
So, it matches strings like [A[-B1234], [[[-B1939], etc.
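The difference between the two patterns can be checked directly. This is a small sketch, where try_regex is a hypothetical helper name introduced only for illustration. It runs both regexes against an intended input and against one of the odd strings listed above.

```shell
#!/bin/bash
bad='^[A[0-9]{3}-B[0-9]{4}]$'     # the pattern from the question
good='^A[0-9]{3}-B[0-9]{4}$'      # the corrected pattern

# try_regex REGEX STRING: hypothetical helper that reports whether STRING matches.
try_regex() {
  if [[ $2 =~ $1 ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

for s in 'A123-B1234' '[A[-B1234]'; do
  printf '%-12s bad: %-9s good: %s\n' \
    "$s" "$(try_regex "$bad" "$s")" "$(try_regex "$good" "$s")"
done
```

Keeping the regex in a variable and expanding it unquoted on the right-hand side of =~ avoids quoting surprises inside [[ ... ]].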
Your regex checking line must look like
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
See the online demo:
#!/bin/bash
tmp="A123-B1234";
if [[ $tmp =~ ^A[0-9]{3}-B[0-9]{4}$ ]];then
msg="valid";
else
msg="invalid";
fi
echo $msg;
Output:
valid
Using just grep might be easier:
$ echo A123-B1234 > valid.txt
$ echo 123 > invalid.txt
$ grep -Pq '^A\d{3}-B\d{4}$' valid.txt && echo valid || echo invalid
valid
$ grep -Pq '^A\d{3}-B\d{4}$' invalid.txt && echo valid || echo invalid
invalid
Based on your samples and attempts, you could also try the following code.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk '/^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
Or, in case you also want to check that the line in your input file is 10 characters long (judging by the exp_len shell variable in the OP's attempt), try the following code, which adds an extra condition to the awk command.
#!/bin/bash
exp_len=10;
file=a.txt;
msg="checking string";
tmp="File not exist";
if [[ -f "$file" ]]
then
echo "File named $file is existing.."
awk -v len="$exp_len" 'length($0) == len && /^A[0-9]{3}-B[0-9]{4}$/{print "valid";next} {print "invalid"}' "$file"
else
echo "Please do check File named $file is not existing, exiting from script now..."
exit 1;
fi
NOTE: I am using the -f flag here to test whether the file exists. You can change it to -s (e.g. -s "$file") if you want to check that the file exists and is not empty.
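To make the difference between -f and -s concrete, here is a minimal sketch. The helper name file_state and the temporary file names are assumptions made for this example only.

```shell
#!/bin/bash
# file_state PATH: hypothetical helper that classifies a path using -f and -s.
file_state() {
  if [[ ! -f $1 ]]; then
    echo "missing"        # not a regular file (or does not exist)
  elif [[ -s $1 ]]; then
    echo "non-empty"      # regular file with a size greater than zero
  else
    echo "empty"          # regular file with a size of zero
  fi
}

tmpdir=$(mktemp -d)
: > "$tmpdir/empty.txt"                  # create an empty file
echo 'A123-B1234' > "$tmpdir/full.txt"   # create a file with content

file_state "$tmpdir/empty.txt"
file_state "$tmpdir/full.txt"
file_state "$tmpdir/absent.txt"
rm -rf "$tmpdir"
```

Testing -f first matters because -s alone is also true for a non-empty directory.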

How to test if string matches a regex in POSIX shell? (not bash)

I'm using the Ubuntu system shell, not bash, and I found that the usual approach does not work:
#!/bin/sh
string='My string';
if [[ $string =~ .*My.* ]]
then
echo "It's there!"
fi
error [[: not found!
What can I do to solve this problem?
The [[ ... ]] are a bash-ism. You can make your test shell-agnostic by just using grep with a normal if:
if echo "$string" | grep -q "My"; then
echo "It's there!"
fi
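If a real regex is needed rather than a fixed substring, the same idea can be wrapped in a small function. This is only a sketch: matches is a hypothetical helper name, and it uses printf instead of echo because echo may treat some strings (for example ones starting with a dash or containing backslashes) specially.

```shell
#!/bin/sh
# matches STRING REGEX: hypothetical helper that succeeds if STRING
# matches the extended regular expression REGEX.
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

string='My string'
if matches "$string" '^My [a-z]+$'; then
  echo "It's there!"
fi
```

Both -E and -q are specified by POSIX for grep, so this should not depend on bash.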
Using grep for such a simple pattern can be considered wasteful. Avoid that unnecessary fork, by using the Sh built-in Glob-matching engine (NOTE: This does not support regex):
case "$value" in
*XXX*) echo OK ;;
*) echo fail ;;
esac
It is POSIX compliant. Bash has a simpler syntax for this:
if [[ "$value" == *XXX* ]]; then :; fi
and even regex:
[[ abcd =~ b.*d ]] && echo ok
You could use expr:
if expr "$string" : "My" 1>/dev/null; then
echo "It's there";
fi
This would work with both sh and bash.
As a handy function:
exprq() {
local value
test "$2" = ":" && value="$3" || value="$2"
expr "$1" : "$value" 1>/dev/null
}
# expr anchors the regex at the start of the string, hence the leading '.*'.
# Or `exprq "somebody" ".*body"` if you'd rather ditch the ':'
if exprq "somebody" : ".*body"; then
echo "once told me"
fi
Quoting from man expr:
STRING : REGEXP
anchored pattern match of REGEXP in STRING
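Because the match is anchored at the start of the string, a "contains" test needs a leading .* in the regex. The sketch below illustrates this with two hypothetical helpers, starts_with and contains. The "x" prefix is a common precaution so that expr does not mistake the string for one of its own operators.

```shell
#!/bin/sh
# Both helper names are hypothetical and exist only for this illustration.
# starts_with STRING REGEX: REGEX must match at the beginning of STRING.
starts_with() { expr "x$1" : "x$2" >/dev/null; }
# contains STRING REGEX: REGEX may match anywhere in STRING.
contains()    { expr "x$1" : "x.*$2" >/dev/null; }

starts_with "somebody" "some" && echo "starts with some"
contains "somebody" "body" && echo "contains body"
starts_with "somebody" "body" || echo "does not start with body"
```

Note that expr uses basic regular expressions, so operators such as + or | behave differently than in grep -E.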

How to check if string contains characters in regex pattern in shell?

How do I check if a variable contains characters (regex) other than 0-9a-z and - in pure bash?
I need a conditional check. If the string contains characters other than the accepted characters above simply exit 1.
One way of doing it is using the grep command, like this:
grep -qv "[^0-9a-z-]" <<< "$STRING"
Then you ask for the grep returned value with the following:
if [ ! $? -eq 0 ]; then
echo "Wrong string"
exit 1
fi
As @mpapis pointed out, you can simplify the above expression to:
grep -qv "[^0-9a-z-]" <<< "$STRING" || exit 1
Also you can use the bash =~ operator, like this:
if [[ ! "$STRING" =~ [^0-9a-z-] ]] ; then
echo "Valid";
else
echo "Not valid";
fi
case has support for matching:
shopt -s extglob
case "$string" in
(+([0-9a-z-])) true ;;
(*) exit 1 ;;
esac
The format is not a pure regexp, but it works faster than a separate grep process, which is important if you have multiple checks.
Using Bash's substitution engine to test if $foo contains $bar
bar='[^0-9a-z-]'
if [ -n "$foo" -a -z "${foo/*$bar*}" ] ; then
echo exit 1
fi
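The approaches above can be wrapped in a reusable function. This is a sketch only: is_valid is a hypothetical name, and rejecting the empty string is an assumption that the question does not state.

```shell
#!/bin/bash
# is_valid STRING: hypothetical helper that succeeds only for a non-empty
# string made up entirely of 0-9, a-z and '-'.
# Assumption: an empty string counts as invalid.
is_valid() {
  [[ -n $1 && ! $1 =~ [^0-9a-z-] ]]
}

for candidate in 'abc-123' 'abc_123' ''; do
  if is_valid "$candidate"; then
    echo "'$candidate' is valid"
  else
    echo "'$candidate' is not valid"
  fi
done
```

In a script, the conditional check from the question then becomes is_valid "$STRING" || exit 1.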

In bash, how can I check a string for partials in an array?

If I have a string:
s='path/to/my/foo.txt'
and an array
declare -a include_files=('foo.txt' 'bar.txt');
how can I check the string for matches in my array efficiently?
You could loop through the array and use a bash substring check
for file in "${include_files[@]}"
do
if [[ $s = *${file} ]]; then
printf "%s\n" "$file"
fi
done
Alternatively, if you want to avoid the loop and only care whether a file name matches or not, you could use the @ form of bash extended globbing. The following example assumes that the array's file names do not contain |.
shopt -s extglob
declare -a include_files=('foo.txt' 'bar.txt');
s='path/to/my/foo.txt'
printf -v pat "%s|" "${include_files[@]}"
pat="${pat%|}"
printf "%s\n" "${pat}"
#prints foo.txt|bar.txt
if [[ ${s##*/} = @(${pat}) ]]; then echo yes; fi
For an exact match to the file name:
#!/bin/bash
s="path/to/my/foo.txt";
ARR=('foo.txt' 'bar.txt');
for str in "${ARR[@]}";
do
# if [ $(echo "$s" | awk -F"/" '{print $NF}') == "$str" ]; then
if [ "$(basename "$s")" = "$str" ]; then # A better option than awk for sure...
echo "match";
else
echo "no match";
fi;
done
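The loop-based answers can also be packaged as a function that stops at the first hit. The following is a minimal sketch; matches_any is a hypothetical name, and it compares only the last path component, which is an assumption about what "match" should mean here.

```shell
#!/bin/bash
# matches_any PATH NAME...: hypothetical helper that succeeds if the file name
# part of PATH is exactly equal to one of the given names.
matches_any() {
  local path=$1 name
  shift
  for name in "$@"; do
    # ${path##*/} strips everything up to the last slash (no external basename).
    # Quoting "$name" makes the comparison literal instead of a glob match.
    [[ ${path##*/} == "$name" ]] && return 0
  done
  return 1
}

include_files=('foo.txt' 'bar.txt')
s='path/to/my/foo.txt'

if matches_any "$s" "${include_files[@]}"; then
  echo "match"
else
  echo "no match"
fi
```

Returning early avoids scanning the rest of the array, which is about as efficient as a plain bash loop gets.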

Delete everything except all surrounded by ()

Let's say I have a file like this
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this,
using bash?
A couple of solutions come to mind. Some of them handle empty lines correctly, others do not. Those are trivial to remove, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
pure bash
#!/bin/bash
IFS="()"
while read a b; do
if [ -z "$b" ]; then
continue
fi
echo $b
done < input
and finally, using tr
cat input | tr -d '[a-z()]'
while read line; do
if [ -z "$line" ]; then
continue
fi
line=${line#*(}
line=${line%)*}
echo $line
done < file
Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456
Another one:
while read line ; do
[[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file
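The BASH_REMATCH approach can be turned into a filter that reads from standard input. This is a sketch under a few assumptions: extract_numbers is a hypothetical name, only the first parenthesised number on each line is printed, and lines without one are skipped.

```shell
#!/bin/bash
# extract_numbers: hypothetical filter that prints the first run of digits
# enclosed in parentheses on each input line.
extract_numbers() {
  local line
  local re='\(([0-9]+)\)'   # literal parentheses around a captured digit run
  # The "|| [[ -n $line ]]" part also handles a last line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
    fi
  done
}

printf '%s\n' 'adsf(2)' 'af(3)' 'g5a(65)' '' 'aafg(1245)' 'a(3)df' | extract_numbers
```

For a file, it can be called as extract_numbers < file.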