Unable to compare regular expression with string properly


Basically, I'm trying to add up all the numbers in a file called numbers.txt. It contains non-numeric strings as well.
Here is my shell script
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [ $i = re ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
Here is the Output
abc
hellow
123
1
2
3
hello67
39
0
Below is txt file
abc hellow 123
1 2 3
hello67 39
The final line of the output should have been 168 instead of 0.

I corrected your script a little. To match against a regex, use =~ inside [[ ... ]]:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [[ $i =~ $re ]]
then
sum=$((sum + i))
fi
done
echo $sum

You are literally comparing the strings, not matching a regex.
You can use grep for regex matching, for example:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
echo $i | grep -oP "${re}" &> /dev/null
if [ $? == "0" ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
echo $i | grep -oP "${re}" will pipe the text into grep. If it matches the regex, grep returns 0 which will be written into the special variable $?. So if that is 0, you know you have a number and can sum it up. That is the reason for if [ $? == "0" ].
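Btw: inside [ ... ], = does compare strings (outside of a test it assigns a value to a variable), so the operator was not the problem, my bad. The problem is that [ $i = re ] compares $i with the literal string re.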
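As a hedged sketch of the exit-status idea described above: if can test a command's exit status directly, so $? does not have to be inspected separately. The function names is_number and sum_numbers are made up for this illustration, and the 10# prefix is added on the assumption that tokens with leading zeros (such as 08) should be read as decimal rather than octal.

```bash
#!/bin/bash

# Hypothetical helper: succeeds (exit status 0) when $1 consists only of digits.
is_number() {
    grep -qE '^[0-9]+$' <<< "$1"
}

# Hypothetical helper: reads whitespace-separated tokens from stdin and prints
# the sum of the tokens that are plain non-negative integers.
sum_numbers() {
    local sum=0 word
    local -a words
    # The || part handles a last line that has no trailing newline.
    while read -r -a words || (( ${#words[@]} )); do
        for word in "${words[@]}"; do
            # "if" looks at the exit status of is_number directly.
            if is_number "$word"; then
                sum=$(( sum + 10#$word ))   # 10# forces base 10
            fi
        done
    done
    printf '%s\n' "$sum"
}

# Same content as numbers.txt in the question.
printf 'abc hellow 123\n1 2 3\nhello67 39\n' | sum_numbers   # prints 168
```

With the real file, the call would be sum_numbers < numbers.txt.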

Related

multi-lines pattern matching

I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are letters, or "file1" in the samples above.
I'm merging the 3 lines into one and comparing the result to my regex [A-Z], but I could not get it to match for some reason.
my script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
echo "$file"
fi
I ran it with bash -x, this is the output
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+exit

What you missed:
You can use grep to check that the input matches only [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out)
You can use the pipeline directly in the if statement, without [[ ... ]]
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
echo "$file"
fi

To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -3 $file|tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
echo "$file"
fi
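To make the difference between == and =~ concrete, here is a small sketch. The function name is_all_upper is hypothetical, and it uses + instead of * on the assumption that an empty string should not count as "all letters".

```bash
#!/bin/bash

# Hypothetical helper: succeeds when $1 is one or more characters from A-Z.
is_all_upper() {
    [[ $1 =~ ^[A-Z]+$ ]]
}

# With ==, the right-hand side is a glob pattern, and [A-Z] stands for
# exactly one character.
[[ A == [A-Z] ]] && echo "glob: A matches"
[[ ABC == [A-Z] ]] || echo "glob: ABC does not match"

# With =~, the right-hand side is a regular expression that can be anchored
# and repeated.
is_all_upper ABC && echo "regex: ABC matches"
is_all_upper ABC123 || echo "regex: ABC123 does not match"
```

The string to test would come from the same head/tr pipeline as in the answers above.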

You can use built-ins and character classes for this problem:
#!/bin/bash
file="file1"
C=0
flag=0
while IFS= read -r line
do
(( ++C ))
[ $C -eq 4 ] && break
# The regex must be unquoted: a quoted right-hand side of =~ is matched literally.
[[ $line =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ $flag -eq 0 ] && echo "$file"

Comparing a word in a string with another in another string

I have a file with strings, like below:
ABCEF
RFGTH
ABCEF_ABCT
DRFRF_ABCT
LOIKH
LOIKH_DEFT
I need to extract the lines which have words matching even if they have _ABCT at the end.
while IFS= read -r line
do
if [ $line == $line ];
then
echo "$line"
fi
done < "$file"
The output I want is:
ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT
I know I have a mistake in the IF branch but I just got out of options now and I don't know how to get the outcome I need.

I would use awk to solve this problem:
awk -F_ '{ ++count[$1]; line[NR] = $0 }
END { for (i = 1; i <= NR; ++i) { split(line[i], a); if (count[a[1]] > 1) print line[i] } }' file
A count is kept of the first field of each line. Each line is saved to an array. Once the file is processed, any lines whose first part has a count greater than one are printed.
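The same two-pass idea (count first, print afterwards) can also be sketched in Bash. This is an illustration only: the function name print_shared_prefix_lines is made up, it reads from stdin, it needs Bash 4 or later for associative arrays, and it assumes that lines with an empty prefix can be skipped.

```bash
#!/bin/bash

# Hypothetical helper: prints every stdin line whose text before the first "_"
# occurs as the prefix of more than one line.
print_shared_prefix_lines() {
    local -A count=()
    local -a lines=()
    local line key

    # Pass 1: remember every line and count its prefix.
    while IFS= read -r line || [[ -n $line ]]; do
        key=${line%%_*}                 # text before the first underscore
        [[ -n $key ]] || continue       # assumption: skip empty prefixes
        lines+=("$line")
        count[$key]=$(( ${count[$key]:-0} + 1 ))
    done

    # Pass 2: print the lines whose prefix was seen more than once.
    for line in "${lines[@]}"; do
        key=${line%%_*}
        if (( ${count[$key]} > 1 )); then
            printf '%s\n' "$line"
        fi
    done
}

printf 'ABCEF\nRFGTH\nABCEF_ABCT\nDRFRF_ABCT\nLOIKH\nLOIKH_DEFT\n' |
    print_shared_prefix_lines
```

For the sample data this prints ABCEF, ABCEF_ABCT, LOIKH and LOIKH_DEFT, one per line.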

for w in $(for wrd in $(grep -o "^[A-Z]*" abc.dat)
do
n=$(grep -c $wrd abc.dat)
if (( $n > 1 ))
then
echo $wrd
fi
done | uniq)
do
grep $w abc.dat
done
With grep -o extract tokens "^[A-Z]*" from beginning of line (^) only matching A-Z (not _). These tokens are searched again in the same file and counted (grep -c) and if > 1 collected. With uniq they are only taken once and then again we search for them in the file to find all matches, but only once.

Here's a pure Bash solution using arrays and associative arrays:
#!/bin/bash
IFS=_
declare -A seen
while read -r -a tokens
do
# ${tokens[0]} contains the first word before the underscore.
word="${tokens[0]}"
if [[ "${seen[$word]}" ]]
then
[[ "${seen[$word]}" -eq 1 ]] && echo "$word"
echo "${tokens[*]}"
(( seen["$word"]++ ))
else
seen["$word"]=1
fi
done < "$file"
Output:
ABCEF
ABCEF_ABCT
LOIKH
LOIKH_DEFT

One more answer using sed
#!/bin/bash
#set -x
counter=1;
while read line ; do
((counter=counter+1))
var=$(sed -n -e "$counter,\$ s/$line/$line/p" file.txt)
if [ -n "$var" ]
then
echo $line
echo $var
fi
done < file.txt

How to check if string contains characters in regex pattern in shell?

How do I check if a variable contains characters (regex) other than 0-9a-z and - in pure bash?
I need a conditional check. If the string contains characters other than the accepted characters above simply exit 1.

One way of doing it is using the grep command, like this:
grep -qv "[^0-9a-z-]" <<< $STRING
Then you ask for the grep returned value with the following:
if [ ! $? -eq 0 ]; then
echo "Wrong string"
exit 1
fi
As @mpapis pointed out, you can simplify the above to:
grep -qv "[^0-9a-z-]" <<< $STRING || exit 1
Also you can use the bash =~ operator, like this:
if [[ ! "$STRING" =~ [^0-9a-z-] ]] ; then
echo "Valid";
else
echo "Not valid";
fi

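case has support for pattern matching (the +( ... ) form needs extglob):
shopt -s extglob
case "$string" in
# [[:alnum:]] also accepts uppercase letters; use +([0-9a-z-]) for only the characters in the question.
(+([[:alnum:]-])) true ;;
(*) exit 1 ;;
esac
The format is not a pure regexp, but it is faster than spawning a separate grep process, which matters if you run many checks.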

Using Bash's substitution engine to test if $foo contains $bar
bar='[^0-9a-z-]'
if [ -n "$foo" -a -z "${foo/*$bar*}" ] ; then
echo exit 1
fi
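The =~ check from the answers above can be wrapped in a function so that callers only look at its exit status. This is a sketch under assumptions: the name is_valid_name is hypothetical, LC_ALL=C is set so that a-z means ASCII lowercase only, and an empty string is accepted because it contains no forbidden character (which is also how the grep version behaves).

```bash
#!/bin/bash

# Hypothetical helper: succeeds when $1 contains no character outside 0-9, a-z and "-".
is_valid_name() {
    local LC_ALL=C                 # assumption: plain ASCII ranges are wanted
    local invalid='[^0-9a-z-]'     # any single forbidden character
    [[ ! $1 =~ $invalid ]]
}

for candidate in "abc-123" "abc_123" "a b"; do
    if is_valid_name "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

In a script that must stop on bad input, the call would be is_valid_name "$STRING" || exit 1.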

Get the multilevel basename of a Path

I am trying to write a program that is sort of similar to UNIX basename, except I can control the level of its base.
For example, the program would perform tasks like the following:
$PROGRAM /PATH/TO/THE/FILE.txt 1
FILE.txt # returns the first level basename
$PROGRAM /PATH/TO/THE/FILE.txt 2
THE/FILE.txt #returns the second level basename
$PROGRAM /PATH/TO/THE/FILE.txt 3
TO/THE/FILE.txt #returns the third level base name
I was trying to write this in perl, and to quickly test my idea, I used the following command line script to obtain the second level basename, to no avail:
$ echo "/PATH/TO/THE/FILE.txt" | perl -ne '$rev=reverse $_; $rev=~s:((.*?/){2}).*:$2:; print scalar reverse $rev'
/THE
As you can see, it's only printing out the directory name and not the rest.
I feel this has to do with nongreedy matching with quantifier or what not, but my knowledge lacks in that area.
If there is more efficient way to do this in bash, please advise

You will find that your own solution works fine if you use $1 in the substitution instead of $2. The captures are numbered in the order that their opening parentheses appear within the regex, and you want to retain the outermost capture. However the code is less than elegant.
The File::Spec module is ideal for this purpose. It has been a core module with every release of Perl v5 and so shouldn't need installing.
use strict;
use warnings;
use File::Spec;
my @path = File::Spec->splitdir($ARGV[0]);
print File::Spec->catdir(splice @path, -$ARGV[1]), "\n";
output
E:\Perl\source>bnamen.pl /PATH/TO/THE/FILE.txt 1
FILE.txt
E:\Perl\source>bnamen.pl /PATH/TO/THE/FILE.txt 2
THE\FILE.txt
E:\Perl\source>bnamen.pl /PATH/TO/THE/FILE.txt 3
TO\THE\FILE.txt

A pure bash solution (with no checking of the number of arguments and all that):
#!/bin/bash
IFS=/ read -a a <<< "$1"
IFS=/ scratch="${a[*]:${#a[@]}-$2}"
echo "$scratch"
Done.
Works like this:
$ ./program /PATH/TO/THE/FILE.txt 1
FILE.txt
$ ./program /PATH/TO/THE/FILE.txt 2
THE/FILE.txt
$ ./program /PATH/TO/THE/FILE.txt 3
TO/THE/FILE.txt
$ ./program /PATH/TO/THE/FILE.txt 4
PATH/TO/THE/FILE.txt
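For readers who want to see the steps of the pure Bash approach spelled out, here is a commented sketch of the same split-and-rejoin idea. The function name basename_levels is hypothetical, and clamping the level to the number of components is an added assumption, not part of the script above.

```bash
#!/bin/bash

# Hypothetical helper: prints the last $2 components of the path in $1.
basename_levels() {
    local path=$1 levels=$2
    local -a parts

    # Split the path on "/" into an array; a leading "/" yields an empty
    # first element.
    IFS=/ read -r -a parts <<< "$path"

    local count=${#parts[@]}
    # Assumption: asking for more levels than exist returns the whole path.
    (( levels > count )) && levels=$count

    # "${parts[*]}" joins the elements with the first character of IFS;
    # the :offset slice keeps only the last $levels elements.
    local IFS=/
    printf '%s\n' "${parts[*]:count-levels}"
}

basename_levels /PATH/TO/THE/FILE.txt 2   # prints THE/FILE.txt
```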

#!/bin/bash
[ $# -ne 2 ] && exit
input=$1
rdepth=$2
delim=/
[ $rdepth -lt 1 ] && echo "depth must be greater than zero" && exit
parts=$(echo -n $input | sed "s,[^$delim],,g" | wc -m)
[ $parts -lt 1 ] && echo "invalid path" && exit
[ $rdepth -gt $parts ] && echo "input has only $parts part(s)" && exit
depth=$((parts-rdepth+2))
echo $input | cut -d "$delim" -f$depth-
Usage:
$ ./level.sh /tmp/foo/bar 2
foo/bar

Here's a bash script to do it with awk:
#!/bin/bash
level=$1
awk -v lvl=$level 'BEGIN{FS=OFS="/"}
{count=NF-lvl+1;
if (count < 1) {
count=1;
}
while (count <= NF) {
if (count > NF-lvl+1 ) {
printf "%s", OFS;
}
printf "%s", $(count);
count+=1;
}
printf "\n";
}'
To use it, do:
$ ./script_name level < input_file
For example, if file input contains the line "/PATH/TO/THE/FILE.txt"
$ ./get_lvl_name 2 < input
THE/FILE.txt
$

As @tripleee said, split on the path delimiter ("/" for Unix-like) and then paste back together. For example:
echo "/PATH/TO/THE/FILE.txt" | perl -ne 'BEGIN{$n=shift} @p = split /\//; $start=($#p-$n+1<0?0:$#p-$n+1); print join("/",@p[$start..$#p])' 1
FILE.txt
echo "/PATH/TO/THE/FILE.txt" | perl -ne 'BEGIN{$n=shift} @p = split /\//; $start=($#p-$n+1<0?0:$#p-$n+1); print join("/",@p[$start..$#p])' 3
TO/THE/FILE.txt
Just for fun, here's one that will work on Unix and Windows (and any other) path types, if you provide the delimiter as the second argument:
# Unix-like
echo "PATH/TO/THE/FILE.txt" | perl -ne 'BEGIN{$n=shift;$d=shift} @p = split /\Q$d\E/; $start=($#p-$n+1<0?0:$#p-$n+1); print join($d,@p[$start..$#p])' 3 /
TO/THE/FILE.txt
# Wrong delimiter
echo "PATH/TO/THE/FILE.txt" | perl -ne 'BEGIN{$n=shift;$d=shift} @p = split /\Q$d\E/; $start=($#p-$n+1<0?0:$#p-$n+1); print join($d,@p[$start..$#p])' 3 \\
PATH/TO/THE/FILE.txt
# Windows
echo "C:\Users\Name\Documents\document.doc" | perl -ne 'BEGIN{$n=shift;$d=shift} @p = split /\Q$d\E/; $start=($#p-$n+1<0?0:$#p-$n+1); print join($d,@p[$start..$#p])' 3 \\
Name\Documents\document.doc
# Wrong delimiter
echo "C:\Users\Name\Documents\document.doc" | perl -ne 'BEGIN{$n=shift;$d=shift} @p = split /\Q$d\E/; $start=($#p-$n+1<0?0:$#p-$n+1); print join($d,@p[$start..$#p])' 3 /
C:\Users\Name\Documents\document.doc

Delete everything except all surrounded by ()

Let's say I have a file like this:
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this, using Bash?

A couple of solutions come to mind. Some of them handle the empty lines correctly, others do not. The empty lines are trivial to remove, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
pure bash
#!/bin/bash
# Split each line on "(" and ")"; the third variable absorbs any text after ")".
while IFS="()" read -r a b c; do
if [ -z "$b" ]; then
continue
fi
echo "$b"
done < input
and finally, using tr (this only works when no digits appear outside the parentheses; g5a(65) becomes 565):
cat input | tr -d '[a-z()]'

while read line; do
if [ -z "$line" ]; then
continue
fi
line=${line#*(}
line=${line%)*}
echo $line
done < file

Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456

Another one:
while read line ; do
[[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file
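The BASH_REMATCH approach above prints at most one number per line. If a line may contain several parenthesized numbers, a loop over the remainder of the line is one way to collect them all. This is a sketch only: the function name extract_parenthesized_numbers is hypothetical, and input with more than one group per line is an assumption that goes beyond the sample file.

```bash
#!/bin/bash

# Hypothetical helper: prints every run of digits that is directly enclosed
# in parentheses, reading lines from stdin.
extract_parenthesized_numbers() {
    local line rest
    # Group 1: the digits inside ( ); group 2: whatever follows the ")".
    local re='\(([0-9]+)\)(.*)'
    while IFS= read -r line || [[ -n $line ]]; do
        rest=$line
        while [[ $rest =~ $re ]]; do
            printf '%s\n' "${BASH_REMATCH[1]}"
            rest=${BASH_REMATCH[2]}     # continue after the match
        done
    done
}

printf 'adsf(2)\n\ng5a(65)\na(3)df\n' | extract_parenthesized_numbers
```

With the real file, the call would be extract_parenthesized_numbers < file.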
