Search curl output for string with Bash - regex

I'm trying to fetch a webpage as a lowercase string, then search that output for a substring.
My attempt:
1 #!/usr/bin/env bash
2
3 URL="https://somesite.com"
4 MOVIES_SOURCE="movies.txt"
5 PAGE=`curl "$URL"` | tr '[:upper:]' '[:lower:]'
6
7 while IFS= read -r movie
8 do
9 FOUND="$($PAGE =~ "$movie")"
10 echo $FOUND
11 if [[ $FOUND ]]; then
12 echo "$movie found"
13 fi
14 done < $MOVIES_SOURCE
15
When I run this, I'm receiving line 9: =~: command not found
The $movie variable is valid and contains each line from movies.txt, but I'm struggling to figure out this one!

=~ is only recognized as an operator inside [[ ... ]]. Inside $(...) the shell tries to run it as a command, hence the error. To use regex matching in bash:
if [[ $PAGE =~ $movie ]]; then
    echo "$movie found"
fi
Example:
PAGE="text blah Avengers more text"
movie="Avengers"
if [[ $PAGE =~ $movie ]]; then
    echo "$movie found"
fi
gives:
Avengers found
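Also, to capture the output of the whole pipeline, including the lowercase conversion:
PAGE=$(curl "$URL" | tr '[:upper:]' '[:lower:]')
The command substitution has to wrap the entire pipeline. In your version only curl was inside the backticks, and because the assignment was part of a pipeline it ran in a subshell, so $PAGE was left empty in the rest of the script.
Prefer $() over backticks: it nests cleanly and is easier to read.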
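One more thing to watch: an unquoted right-hand side of =~ is treated as a regular expression, so a title containing characters such as ., ( or + would be interpreted as a pattern. Below is a minimal sketch of a literal, case-insensitive substring test. The function name page_contains and the sample page text are made up for illustration, and the ${var,,} expansion assumes bash 4 or later.

```shell
#!/usr/bin/env bash

# page_contains PAGE TITLE
# Succeeds if TITLE occurs literally in PAGE, ignoring case.
# (Hypothetical helper; needs bash 4+ for the ${var,,} expansion.)
page_contains() {
    local page="${1,,}" title="${2,,}"
    # Quoting the inner expansion makes it a literal string, not a glob.
    [[ $page == *"$title"* ]]
}

# Sample text standing in for the output of: curl "$URL"
PAGE="Now showing: The Avengers (2012) and Up"

for movie in "the avengers (2012)" "Up" "Alien"; do
    if page_contains "$PAGE" "$movie"; then
        echo "$movie found"
    fi
done
```

Because both sides are lowercased inside the function, the entries in movies.txt would not need to be lowercase already.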

Related

Unable to compare regular expression with string properly

Basically, I'm trying to add up all the numbers in a file called numbers.txt. It contains non-number strings as well.
Here is my shell script
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [ $i = re ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
Here is the Output
abc
hellow
123
1
2
3
hello67
39
0
Below is txt file
abc hellow 123
1 2 3
hello67 39
The output should have been 168 instead of zero.
I corrected your script a little; to compare against a regex, use =~ inside [[ ]]:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
    echo $i
    if [[ $i =~ $re ]]
    then
        sum=$((sum + i))
    fi
done
echo $sum
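As a variation on the same idea, here is a sketch that splits each line with read -a instead of relying on the unquoted $(cat ...) expansion, which would also glob-expand a word such as *. The function name sum_integers is hypothetical, and it reads from standard input so it can be tried without numbers.txt.

```shell
#!/bin/bash

# sum_integers: read whitespace-separated words from stdin and print the
# sum of the words that consist only of decimal digits. (Hypothetical helper.)
sum_integers() {
    local re='^[0-9]+$' sum=0 word
    local -a words
    # The || part keeps a final line that lacks a trailing newline.
    while read -r -a words || (( ${#words[@]} )); do
        for word in "${words[@]}"; do
            if [[ $word =~ $re ]]; then
                # 10# forces base 10 so that 08 or 09 is not read as octal.
                sum=$(( sum + 10#$word ))
            fi
        done
    done
    echo "$sum"
}

sum_integers <<'EOF'
abc hellow 123
1 2 3
hello67 39
EOF
```

With the sample data from the question this prints 168. For a real file, the call would be sum_integers < numbers.txt.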
You are literally comparing the strings, not matching a regex.
You can use grep for regex matching, for example:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
    echo $i
    echo $i | grep -oP "${re}" &> /dev/null
    if [ $? == "0" ]
    then
        sum=`expr $sum + $i`
    fi
done
echo $sum
echo $i | grep -oP "${re}" pipes the text into grep. If it matches the regex, grep exits with status 0, which the shell stores in the special parameter $?. So if $? is 0, the word is a number and can be added to the sum. That is the reason for if [ $? == "0" ].
Note that inside [ ], = is a string comparison (bash also accepts ==), not an assignment. The = in the original script was therefore valid; the problem was that it compared $i with the literal string re instead of matching it against the regex.
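Since if tests a command's exit status, the grep call can also serve as the condition directly, without inspecting $?. A brief sketch of that idea follows; is_number is a hypothetical helper name, and -E (extended regex) is used in place of -P because this pattern needs no Perl features.

```shell
#!/bin/bash

# is_number WORD: succeed if WORD consists only of decimal digits.
# (Hypothetical helper.)
is_number() {
    # -q suppresses output; only the exit status is used.
    printf '%s\n' "$1" | grep -qE '^[0-9]+$'
}

sum=0
for word in abc hellow 123 1 2 3 hello67 39; do
    if is_number "$word"; then
        sum=$(( sum + word ))
    fi
done
echo "$sum"
```

Starting one grep process per word is slow for large inputs, so the built-in =~ test is usually the better choice inside a loop.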

Bash: extract 2nd integer from test string

I have a string which can contain 2 or more integers. I am trying to extract only the 2nd integer, but the following code prints all occurrences.
#!/bin/bash
TEST_STRING=$(echo "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff")
ERROR_COUNT=$(echo $TEST_STRING | grep -o -E '[0-9]+')
echo $ERROR_COUNT
The output is:
207 11 0 0
Basically I would like ERROR_COUNT to be 11 for the given TEST_STRING.
Using bash's =~ operator:
$ test_string="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
$ [[ $test_string =~ [0-9]+[^0-9]+([0-9]+) ]] && [[ ! -z ${BASH_REMATCH[1]} ]] && echo ${BASH_REMATCH[1]}
11
Explained:
[[ $test_string =~ [0-9]+[^0-9]+([0-9]+) ]]: if $test_string contains the sequence integer, non-integer, integer, the second integer is stored in ${BASH_REMATCH[1]}
&&: and
[[ ! -z ${BASH_REMATCH[1]} ]]: something was actually captured in that variable
&&: "then"
echo ${BASH_REMATCH[1]}: output the variable
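The same capture-group mechanism can be generalized to pick any integer by position. This is only a sketch: nth_integer is a hypothetical helper, and it treats every run of digits as an integer (so the 0 in 0xffffffff counts, just as in the grep output from the question).

```shell
#!/bin/bash

# nth_integer STRING N: print the Nth run of digits in STRING.
# Returns non-zero if STRING has fewer than N runs. (Hypothetical helper.)
nth_integer() {
    local rest=$1 n=$2 i
    local re='([0-9]+)(.*)'
    (( n >= 1 )) || return 1
    for (( i = 1; i <= n; i++ )); do
        [[ $rest =~ $re ]] || return 1
        # Group 2 holds everything after the digits just matched.
        rest=${BASH_REMATCH[2]}
    done
    echo "${BASH_REMATCH[1]}"
}

test_string="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
nth_integer "$test_string" 2
```

For the test string above, asking for position 2 prints 11.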
Using awk to print the third whitespace-separated field:
#!/bin/bash
TEST_STRING=$(echo "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff")
ERROR_COUNT="$( echo "${TEST_STRING}" | awk '{print $3}' )"
echo "${ERROR_COUNT}"
The output is:
11
Here is my take on it, using read to parse the separate variables:
TEST_STRING="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
read num sep ERROR_COUNT rest <<<"$TEST_STRING"
echo ${ERROR_COUNT}
11
Using an auxiliary variable:
TEST_STRING="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
# Remove everything until the space in front of the 11
temp=${TEST_STRING#*- }
# Remove what comes afterwards
ERROR_COUNT=${temp%% *}
This assumes that the error count is preceded by - followed by one space, and is followed by a space.

bash regex not working - works with online editors

The regex works in online editors but not in a bash script. I tried a couple of different ways:
#!/bin/bash
echo -n "Your string> "
read String
regex='(?<!NOT.)TEST_34_TEST'
if [[ "$String" =~ ^(\?\<\!NOT\.)TEST_34_TEST ]]; then
echo Match
else
echo Non-Match
fi
if [[ "$String" =~ $regex ]]; then
echo Match
else
echo Non-Match
fi
I want to match each TEST_34_TEST that does not have NOT prefixed to it:
TEST_34_TEST,TEST_34_TEST,TEST_34_TEST -> should match all 3
TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST -> should match 2 values
NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST -> should match 2 values
Thanks in advance.
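Bash's =~ uses POSIX extended regular expressions, which have no lookbehind, so (?<!NOT.) cannot work there; the online editors most likely use a PCRE-style engine. If you only want to know the number of matches (and not do anything with them), you can use GNU grep with -P:
for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
    grep -noP '((?<!NOT.)TEST_34_TEST)' <<< "$s" | wc -l
done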
and will print
3
2
2
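If grep -P is not available, a similar count can be obtained in plain bash without lookbehind. This is a rough sketch, not an equivalent of the regex: it assumes (as in the examples) that the input is a comma-separated list whose items may be padded with whitespace, and it counts items that are exactly TEST_34_TEST. The name count_exact is hypothetical.

```shell
#!/bin/bash

# count_exact LIST: print how many comma-separated items in LIST are
# exactly TEST_34_TEST after trimming whitespace. (Hypothetical helper.)
count_exact() {
    local item count=0
    local -a items
    IFS=',' read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        # Trim leading, then trailing, whitespace using parameter expansion.
        item="${item#"${item%%[![:space:]]*}"}"
        item="${item%"${item##*[![:space:]]}"}"
        if [[ $item == "TEST_34_TEST" ]]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

for s in "TEST_34_TEST,TEST_34_TEST,TEST_34_TEST" \
         "TEST_34_TEST, NOT_TEST_34_TEST, TEST_34_TEST" \
         "NOT_TEST_34_TEST, TEST_34_TEST, TEST_34_TEST"; do
    count_exact "$s"
done
```

For the three sample strings this gives the same counts as the grep version.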

multi-lines pattern matching

I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are letters, or "file1" in the samples above.
I'm merging the 3 lines into one and comparing the result to my regex [A-Z], but could not get it to match for some reason.
my script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
echo "$file"
fi
I ran it with bash -x, this is the output
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+exit
What you missed:
You can use grep to check that the input consists only of [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out).
You can use the pipeline directly in the if statement, without [[ ... ]].
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
    echo "$file"
fi
To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -3 $file|tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
    echo "$file"
fi
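To make the difference between the unanchored and the anchored pattern concrete, here is a small sketch. The helper name all_upper is hypothetical, and it uses + rather than * so that an empty string does not count as a match. In some locales [A-Z] may also match lowercase letters; [[:upper:]] avoids that.

```shell
#!/bin/bash

# all_upper STRING: succeed only if STRING is one or more letters A-Z.
# (Hypothetical helper.)
all_upper() {
    [[ $1 =~ ^[A-Z]+$ ]]
}

for s in "AAABBBCCC" "AAABBB123"; do
    # Unanchored: true as soon as one letter occurs anywhere.
    if [[ $s =~ [A-Z] ]]; then
        echo "$s: contains a letter"
    fi
    # Anchored: true only if every character is a letter.
    if all_upper "$s"; then
        echo "$s: letters only"
    fi
done
```

Both strings pass the unanchored test, but only the first passes the anchored one.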
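You can use built-ins and character classes for this problem:
#!/bin/bash
file="file1"
C=0
flag=0
while IFS= read -r line
do
    (( ++C ))
    [ "$C" -eq 4 ] && break
    # Drop a trailing carriage return in case the file has DOS line endings
    line=${line%$'\r'}
    # The regex must be unquoted: a quoted pattern is matched as a literal string
    [[ $line =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ "$flag" -eq 0 ] && echo "$file"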

Bash regex string variable match

I have the following script I wrote in Perl that works just fine, but I am trying to achieve the same thing using bash.
#!/usr/bin/perl
use 5.010;
use strict;
INIT {
my $string = 'Seconds_Behind_Master: 1';
my ($s) = ($string =~ /Seconds_Behind_Master: ([\d]+)/);
if ($s > 10) {
print "Too long... ${s}";
} else {
print "It's ok";
}
}
__END__
How can I achieve this using a bash script? Basically, I want to be able to read and match the value at the end of the string "Seconds_Behind_Master: N" where N can be any value.
You can use regular expressions in bash, much as in Perl.
#!/bin/bash
STRING="Seconds_Behind_Master: "
REGEX="Seconds_Behind_Master: ([0-9]+)"
RANGE=$( seq 8 12 )
for i in $RANGE; do
    NEW_STRING="${STRING}${i}"
    echo "$NEW_STRING"
    # Test the match itself, and avoid the name SECONDS:
    # bash reserves it for its own elapsed-time counter.
    if [[ $NEW_STRING =~ $REGEX ]]; then
        lag="${BASH_REMATCH[1]}"
        if [[ "$lag" -gt 10 ]]; then
            echo "Too Long...$lag"
        else
            echo "OK"
        fi
    else
        echo "ERROR: Failed to match '$NEW_STRING' with REGEX '$REGEX'"
    fi
done
Output
Seconds_Behind_Master: 8
OK
Seconds_Behind_Master: 9
OK
Seconds_Behind_Master: 10
OK
Seconds_Behind_Master: 11
Too Long...11
Seconds_Behind_Master: 12
Too Long...12
For details, see the BASH_REMATCH entry in man bash.
You can use a tool for it, e.g. sed, if you want to stay with regexps:
#!/bin/sh
string="Seconds_Behind_Master: 1"
# -E enables extended regular expressions (GNU sed also accepts -r)
s=$(echo "$string" | sed -E 's/Seconds_Behind_Master: ([0-9]+)/\1/g')
if [ "$s" -gt 10 ]
then
    echo "Too long... $s"
else
    echo "It's OK"
fi
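The specific case of "more than a single digit" is particularly easy with just a pattern match (note that this also reports exactly 10 as too long, whereas the Perl script tests for greater than 10):
case $string in
    *"Seconds_Behind_Master: "[1-9][0-9]*) echo Too long;;
    *) echo OK;;
esac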
To emulate what your Perl code is doing more closely, you can extract the number with simple string substitutions.
s=${string##*Seconds_Behind_Master: }
s=${s%%[!0-9]*}
[ "$s" -gt 10 ] && echo "Too long: $s" || echo OK.
These are glob patterns, not regular expressions; * matches any string, [!0-9] matches a single character which is not a digit. All of this is Bourne-compatible, i.e. not strictly Bash only (you can use /bin/sh instead of /bin/bash).
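Putting the glob-based pieces together, here is a sketch of a small function that also handles input without a number. The name lag_status and the sample inputs are made up for illustration; the threshold of 10 is taken from the Perl script. It sticks to POSIX sh features, so s is a global variable.

```shell
#!/bin/sh

# lag_status LINE: classify the lag reported in LINE.
# Returns non-zero if LINE has no number after the label.
# (Hypothetical helper; POSIX sh has no local, so s is global.)
lag_status() {
    case $1 in
        *"Seconds_Behind_Master: "[0-9]*) ;;
        *) echo "no match"; return 1 ;;
    esac
    s=${1##*Seconds_Behind_Master: }   # drop everything up to the number
    s=${s%%[!0-9]*}                    # drop everything after the digits
    if [ "$s" -gt 10 ]; then
        echo "Too long... $s"
    else
        echo "It's ok"
    fi
}

lag_status "Seconds_Behind_Master: 1"
lag_status "Seconds_Behind_Master: 42 (replica-2)"
```

The case guard runs first so that a value such as NULL is reported as "no match" instead of reaching the numeric test with an empty string.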