multi-lines pattern matching - regex

I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are all letters, which would be "file1" in the samples above.
I'm merging the 3 lines into one and comparing the result to my regex [A-Z], but I could not get it to match for some reason.
My script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
echo "$file"
fi
I ran it with bash -x; this is the output:
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+exit

What you missed:
You can use grep to check that the input consists only of [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out).
You can use the pipeline directly in the if statement, without [[ ... ]].
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
echo "$file"
fi

To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -3 $file|tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
echo "$file"
fi
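A minimal sketch of the same check wrapped in a function, assuming "letters" means uppercase ASCII letters as in the samples. The function name first_three_are_letters is hypothetical, and + is used instead of * so that an empty file does not count as a match.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the first three lines of "$1"
# consist only of uppercase letters (CR and LF are removed first).
first_three_are_letters() {
    local joined
    joined=$(head -n 3 "$1" | tr -d '\n\r')
    # =~ does regex matching; the anchors force the whole string to match.
    [[ $joined =~ ^[A-Z]+$ ]]
}
```

It could be called as: first_three_are_letters "$file" && echo "$file"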

You can use built-ins and character classes for this problem:
#!/bin/bash
file="file1"
C=0
flag=0
while IFS= read -r line
do
(( ++C ))
[ "$C" -eq 4 ] && break
# The regex must stay unquoted: a quoted pattern is matched as a literal string.
[[ $line =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ "$flag" -eq 0 ] && echo "$file"

Related

Unable to compare regular expression with string properly

Basically, I'm trying to add up all the numbers in a file called numbers.txt. It contains non-number strings as well.
Here is my shell script:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [ $i = re ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
Here is the Output
abc
hellow
123
1
2
3
hello67
39
0
Below is txt file
abc hellow 123
1 2 3
hello67 39
The output should have been 168 instead of zero.
I corrected your script a little; to compare against a regex I use =~ inside [[ ]]:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [[ $i =~ $re ]]
then
sum=$((sum + i))
fi
done
echo $sum
You are literally comparing the strings, not matching a regex.
You can use grep for regex matching, for example:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
echo $i | grep -oP "${re}" &> /dev/null
if [ $? == "0" ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
echo $i | grep -oP "${re}" pipes the text into grep. If it matches the regex, grep exits with status 0, which is available in the special variable $?. So if that is 0, you know you have a number and can add it to the sum. That is the reason for if [ $? == "0" ].
By the way, I first wrote that = only assigns a value and that you need == to compare. Inside [ ... ], = does compare strings, my bad.
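A sketch of the same idea without cat or expr, assuming the tokens of interest are non-negative integers separated by whitespace. The function name sum_numbers is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: add up every whitespace-separated token of "$1"
# that consists only of digits.
sum_numbers() {
    local re='^[0-9]+$' sum=0 token
    local -a tokens
    # The || part also handles a last line without a trailing newline.
    while read -ra tokens || (( ${#tokens[@]} )); do
        for token in "${tokens[@]}"; do
            # 10# forces base 10, so a token such as 08 is not read as octal.
            [[ $token =~ $re ]] && sum=$(( sum + 10#$token ))
        done
    done < "$1"
    echo "$sum"
}
```

With the numbers.txt shown in the question, sum_numbers numbers.txt should print 168.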

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the ways I have tried, and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This has incorrect syntax.
These two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to only print lines with that format? PS: I have tested, and printing all lines works just fine.
You don't need a regular expression. Filename patterns (globs) will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file
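For the case where the file really has to be processed in bash, a minimal sketch that keeps the regex in a variable and reads the file line by line. The function name print_marked_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of "$1" that contain
# "<<<", at least one character, then ">>>".
print_marked_lines() {
    local re='<<<.+>>>' line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || part also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}
```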
I don't even understand why you are reading the file line by line. I have just run the following command at the bash prompt and it works fine (-E is needed so that + means "one or more" rather than a literal plus sign):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>

bash + how to verify if string/word have some specific charecter

In my bash script I use the grep command to verify whether a value contains the "-" character,
as follows:
echo a-b-c-d-f | grep "-"
a-b-c-d-f
or
echo version-1-APP-stef-10-1 | grep "-"
version-1-APP-stef-10-1
and in my bash script:
[[ ` echo version-1-APP-stef-10-1 | grep -c "-" ` -ne 0 ]] && echo "yes its contain"
But this is a very ugly way!
What is the alternative in bash to verify whether a string / word contains a specific character such as "-"?
You don't need grep here; a glob match will do the job:
[[ "version-1-APP-stef-10-1" == *"-"* ]] && echo "hyphen is present"
hyphen is present
Use a glob
str=a-b-c-d-f
[[ $str == *-* ]] && echo 'yes'
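The same glob also works in a case statement, which is plain POSIX sh and so does not depend on [[ ]]. A small sketch, with the hypothetical function name contains_hyphen:

```shell
#!/bin/sh
# Hypothetical helper: succeed if "$1" contains a hyphen.
contains_hyphen() {
    case $1 in
        *-*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

It could be used as: contains_hyphen "$str" && echo 'yes'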

Bash regex match spanning multiple lines

I'm trying to create a bash script that validates files. One of the requirements is that there has to be exactly one "2" in the file.
Here's my code at the moment:
regex1="[0-9b]*2[0-9b]*2[0-9b]*"
# This regex will match if there are at least two 2's in the file
if [[ ( $(cat "$file") =~ $regex1 ) ]]; then
# stuff to do when there's more than 1 "2"
fi
#...
regex2="^[013456789b]*$"
# This regex will match if there are no 2's in the file
if [[ ( $(cat "$file") =~ $regex2 ) ]]; then
# stuff to do when there are no 2's
fi
What I'm trying to do is match the following pieces:
654654654654
254654845845
845462888888
(because there are 2 2's in there, it should be matched)
987886546548
546546546848
654684546548
(because there are no 2's in there, it should be matched)
Any idea how I make it search all lines with the =~ operator?
Try using grep
#!/bin/bash
file='input.txt'
n=$(grep -o '2' "$file" | wc -l)
# echo $n
if [[ $n -eq 1 ]]; then
echo 'Valid'
else
echo 'Invalid'
fi
How about this:
# tr reads standard input only, so the file is redirected in
twocount=$(tr -dc '2' < input.txt | wc -c)
if (( twocount != 1 ))
then
: # there was either no 2, or more than one 2
else
: # exactly one 2
fi
Using anchors as you've been, match a string of non-2s, a 2, and another string of non-2s.
^[^2]*2[^2]*$
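Applied to the whole file in bash, that pattern can be sketched as follows. This assumes the file is small enough to hold in a variable; the function name has_exactly_one_two is hypothetical. In bash's =~ the class [^2] also matches newlines, and ^ and $ anchor only at the start and end of the whole string, so one regex covers every line.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file "$1" contains exactly one "2".
has_exactly_one_two() {
    local content re='^[^2]*2[^2]*$'
    content=$(< "$1")   # read the whole file; trailing newlines are dropped
    [[ $content =~ $re ]]
}
```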
A multi-line regex match is indeed possible using awk with an empty record separator.
Consider the code below:
awk '$0 ~ /^.*2.*2/ || $0 ~ /^[013456789]*$/' RS= file
654654654654
254654845845
845462888888
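Take note of RS= (an empty record separator), which puts awk in paragraph mode: the lines of each block, up to the next blank line, are read as a single record $0. The newlines stay inside the record, so a bracket expression such as [013456789] has to include \n as well if it is meant to span a multi-line block.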

Delete everything except all surrounded by ()

Let's say I have a file like this:
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this,
using Bash?
A couple of solutions come to mind. Some of them handle the empty lines correctly, others do not. Those are trivial to remove, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
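pure bash
#!/bin/bash
# Split each line on "(" and ")"; the number is the second field.
# IFS is set for read only, and the third field absorbs anything after ")".
while IFS='()' read -r _ b _; do
if [ -z "$b" ]; then
continue
fi
echo "$b"
done < input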
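and finally, using tr (this only deletes letters and parentheses, so a digit outside the parentheses, such as the 5 in g5a(65), is kept):
tr -d 'a-z()' < input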
while read line; do
if [ -z "$line" ]; then
continue
fi
line=${line#*(}
line=${line%)*}
echo $line
done < file
Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456
Another one:
while read line ; do
[[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file