Bash: extract 2nd integer from test string

I have a string which can contain 2 or more integers. I am trying to extract only the 2nd integer, but the following code prints all occurrences.
#!/bin/bash
TEST_STRING=$(echo "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff")
ERROR_COUNT=$(echo $TEST_STRING | grep -o -E '[0-9]+')
echo $ERROR_COUNT
The output is:
207 11 0 0
Basically I would like ERROR_COUNT to be 11 for the given TEST_STRING.

Using bash's =~ operator:
$ test_string="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
$ [[ $test_string =~ [0-9]+[^0-9]+([0-9]+) ]] && [[ ! -z ${BASH_REMATCH[1]} ]] && echo ${BASH_REMATCH[1]}
11
Explained:
[[ $test_string =~ [0-9]+[^0-9]+([0-9]+) ]]: if $test_string contains the sequence integer, non-integer, integer, the second integer is stored in ${BASH_REMATCH[1]}
&&: and
[[ ! -z ${BASH_REMATCH[1]} ]]: something was actually captured into the variable
&&: "then"
echo ${BASH_REMATCH[1]}: output the variable
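The same match can be wrapped in a small function that fails cleanly when the string holds fewer than two integers. This is only a minimal sketch: the function name second_int is made up for illustration, and the pattern is kept in a variable so that it reaches =~ unquoted.

```shell
#!/usr/bin/env bash
# second_int is a hypothetical helper name, not part of the original answer.
second_int() {
    local re='[0-9]+[^0-9]+([0-9]+)'
    # On a successful match, BASH_REMATCH[1] holds the first capture group.
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

second_int "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
second_int "only 207 here" || echo "no second integer"
```

The first call prints 11; the second string has only one integer, so the function returns a non-zero status and the fallback message is printed.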

#!/bin/bash
TEST_STRING=$(echo "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff")
ERROR_COUNT="$( echo "${TEST_STRING}" | awk '{print $3}' )"
echo "${ERROR_COUNT}"
The output is:
11

Here is my take on it, using read to parse the separate variables:
TEST_STRING="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
read num sep ERROR_COUNT rest <<<"$TEST_STRING"
echo ${ERROR_COUNT}
11

Using an auxiliary variable:
TEST_STRING="207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
# Remove everything until the space in front of the 11
temp=${TEST_STRING#*- }
# Remove what comes afterwards
ERROR_COUNT=${temp%% *}
This assumes that the error count is preceded by - followed by one space, and is followed by a space.
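Because the parameter expansions silently return the wrong text when that assumption does not hold, it may be worth checking for the delimiter first. A minimal sketch, assuming the "- " delimiter described above; the function name count_after_dash is invented here.

```shell
#!/usr/bin/env bash
# count_after_dash is a hypothetical helper name used for illustration.
count_after_dash() {
    local s=$1 temp
    # Bail out if the "- " delimiter is missing; otherwise ${s#*- } would
    # leave the string unchanged and the first word would be returned.
    [[ $s == *"- "* ]] || return 1
    temp=${s#*- }                 # drop everything up to and including "- "
    printf '%s\n' "${temp%% *}"   # keep only the text before the next space
}

count_after_dash "207 - 11 (INTERRUPT_NAME) 0xffffffff:0xffffffff"
count_after_dash "207 11" || echo "delimiter not found"
```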

Related

Unable to compare regular expression with string properly

Basically I'm trying to add up all the numbers in a file called numbers.txt. It contains non-number strings as well.
Here is my shell script
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [ $i = re ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
Here is the Output
abc
hellow
123
1
2
3
hello67
39
0
Below is txt file
abc hellow 123
1 2 3
hello67 39
The output should have been 168 instead of 0.

I corrected your script a little; to compare against a regex I use =~:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
if [[ $i =~ $re ]]
then
sum=$((sum + i))
fi
done
echo $sum
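The corrected loop could also be written as a function that reads the file directly instead of going through $(cat ...). This is a sketch under a few assumptions: the name sum_integers is hypothetical, the file is passed as the first argument, and numbers are forced to base 10 so that a token such as 08 is not rejected as invalid octal.

```shell
#!/usr/bin/env bash
# sum_integers is a hypothetical helper; $1 is the file to scan.
sum_integers() {
    local re='^[0-9]+$' sum=0 word
    local -a words
    # read -a splits each line into words; the || clause keeps a final
    # line that lacks a trailing newline.
    while read -r -a words || (( ${#words[@]} )); do
        for word in "${words[@]}"; do
            if [[ $word =~ $re ]]; then
                sum=$(( sum + 10#$word ))   # 10# forces base 10
            fi
        done
    done < "$1"
    printf '%s\n' "$sum"
}

tmp=$(mktemp)
printf '%s\n' 'abc hellow 123' '1 2 3' 'hello67 39' > "$tmp"
sum_integers "$tmp"
rm -f "$tmp"
```

With the sample data from the question this prints 168.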

You are literally comparing the strings, not matching a regex.
You can use grep for regex matching, for example:
#!/bin/bash
sum=0
x=$(cat numbers.txt)
re='^[0-9]+$'
for i in $x
do
echo $i
echo $i | grep -oP "${re}" &> /dev/null
if [ $? == "0" ]
then
sum=`expr $sum + $i`
fi
done
echo $sum
echo $i | grep -oP "${re}" will pipe the text into grep. If it matches the regex, grep returns 0 which will be written into the special variable $?. So if that is 0, you know you have a number and can sum it up. That is the reason for if [ $? == "0" ].
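Btw: inside [ ], = compares strings; it is not an assignment (I first claimed you need == there, my bad). The test in the question fails because $i is compared with the literal string re instead of being matched against the pattern stored in $re.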
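The difference between the question's string test and a regex match can be seen side by side. A small sketch; literal_test and regex_test are names invented for this comparison.

```shell
#!/usr/bin/env bash
re='^[0-9]+$'

# Both function names are hypothetical, chosen for this comparison only.
literal_test() { [ "$1" = re ]; }     # the question's test: is $1 the string "re"?
regex_test()   { [[ $1 =~ $re ]]; }   # is $1 made up of digits only?

for i in 123 hello67 re; do
    if literal_test "$i"; then l=yes; else l=no; fi
    if regex_test "$i"; then r=yes; else r=no; fi
    echo "$i: literal=$l regex=$r"
done
```

Only the token re passes the literal test, and only 123 passes the regex test, which is why the original sum stayed at 0.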

Parsing string with two captures in bash

I'm trying to parse a string with regex. A valid string is of the following format:
https://github.com/xyz/abc/a_123/project_14.git
The valid string should contain github.com and xyz or zyx. If the string is valid I want to capture abc/a_123 into $A and project_14 into $B.
What I did:
if [[ "$x" == *"github.com"* ]]; then
if [[ "$x" == *"xyz"* ]]; then
# (1)
elif [[ "$x" == *"zyx"* ]]; then
# (2)
else
return 1 # Invalid
fi
return 0 # Valid
fi
return 1 # Invalid
In both (1) and (2) I want to set $A and $B with the values (same behavior on different cases).
Also, I think that this solution is not good because it will enter the if-else in the case of https://github.com/bla/abc/a_123/xyz.git so I guess we need to change it to be "github.com/xyz". Also, how can I get rid of .git (if exists)?
Another example:
https://github.com/zyx/asdasdas/lalal/asdas/nu.git
# $A = asdasdas/lalal/asdas
# $B = nu
What is the proper way to achieve this goal?

Here is a way using regex:
url='https://github.com/xyz/abc/a_123/project_14.git'
if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]]
then
A=${BASH_REMATCH[2]}
B=${BASH_REMATCH[3]%.git}
fi
And here is a small proof of concept:
url='https://github.com/xyz/abc/a_123/project_14.git'
if [[ $url =~ http[s]?:[/]{2}(github.com)[/]([[:alpha:]]+)(/.*)$ ]]
then
echo ${BASH_REMATCH[2]} ${BASH_REMATCH[3]%.git}
fi
Resulting in:
xyz /abc/a_123/project_14

I think this does what you want :
#!/bin/bash
repo="https://github.com/xyz/abc/a_123/project_14.git"
[[ ! "$repo" =~ https:\/\/github.com\/[a-z]+\/[a-z]+\/[a-z]_[0-9]+\/.*.git ]] && exit
A=$( echo "$repo" | sed -E "s/(https:\/\/github.com\/[a-z]+)(\/[a-z]+\/[a-z]_[0-9]+\/)(.*.git)/\2/g" )
B=$( echo "$repo" | sed -E "s/(https:\/\/github.com\/[a-z]+)(\/[a-z]+\/[a-z]_[0-9]+\/)(.*.git)/\3/g" )
echo "$A"
echo "${B%%.git}"
Let me know if it helps

Would you please try the following:
strchk() {
local x=$1
if [[ $x =~ github.com/(xyz|zyx)/(.+)/(.+) ]]; then
A="${BASH_REMATCH[2]}"
B="${BASH_REMATCH[3]%.*}"
return 0
else
return 1
fi
}
Results:
strchk "https://github.com/xyz/abc/a_123/project_14.git" && echo "A=$A, B=$B"
=> A=abc/a_123, B=project_14
strchk "https://github.com/bla/abc/a_123/xyz.git" && echo "A=$A, B=$B"
=> <empty>
strchk "https://github.com/zyx/asdasdas/lalal/asdas/nu.git" && echo "A=$A, B=$B"
=> A=asdasdas/lalal/asdas, B=nu
Explanations:
The pattern github.com/(xyz|zyx)/ matches a string which contains github.com/ followed by xyz/ or zyx/.
The next pattern (.+)/ is greedy: it matches everything after xyz/ or zyx/ up to the rightmost slash, and the substring captured by the parentheses is stored in ${BASH_REMATCH[2]}.
The last pattern (.+) captures the remaining substring into ${BASH_REMATCH[3]}.
The parameter expansion ${BASH_REMATCH[3]%.*} removes the extension after the last dot, if there is one.
Hope this helps.
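A slightly stricter variant is sketched below. It is not the answer's own code: the name parse_repo is hypothetical, the pattern is anchored at both ends and escapes the dot in github.com, and only a trailing .git is stripped, so a project name such as v1.2 would keep its dot.

```shell
#!/usr/bin/env bash
# parse_repo is a hypothetical name; on success it sets the globals A and B.
parse_repo() {
    local re='^https?://github\.com/(xyz|zyx)/(.+)/([^/]+)$'
    [[ $1 =~ $re ]] || return 1
    A=${BASH_REMATCH[2]}
    B=${BASH_REMATCH[3]%.git}   # strips a trailing .git only, if present
}

for url in \
    "https://github.com/xyz/abc/a_123/project_14.git" \
    "https://github.com/bla/abc/a_123/xyz.git" \
    "https://github.com/zyx/asdasdas/lalal/asdas/nu.git"
do
    if parse_repo "$url"; then
        echo "A=$A, B=$B"
    else
        echo "invalid: $url"
    fi
done
```

Keeping the regex in a single-quoted variable avoids having to think about how the shell parses the backslash and parentheses inside [[ ]].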

Search curl output for string with Bash

Trying to fetch a webpage as a lowercase string, then search the string output for a substring
My attempt:
1 #!/usr/bin/env bash
2
3 URL="https://somesite.com"
4 MOVIES_SOURCE="movies.txt"
5 PAGE=`curl "$URL"` | tr '[:upper:]' '[:lower:]'
6
7 while IFS= read -r movie
8 do
9 FOUND="$($PAGE =~ "$movie")"
10 echo $FOUND
11 if [[ $FOUND ]]; then
12 echo "$movie found"
13 fi
14 done < $MOVIES_SOURCE
15
When I run this, I'm receiving line 9: =~: command not found
The $movie variable is valid and contains each line from movies.txt, but I'm struggling to figure out this one!

If you want to use regex matching in bash:
if [[ $PAGE =~ $movie ]]; then
echo "$movie found"
fi
example:
PAGE="text blah Avengers more text"
movie="Avengers"
if [[ $PAGE =~ $movie ]]; then
echo "$movie found"
fi
gives:
Avengers found
Also: to capture the output of the whole curl command:
PAGE=$(curl "$URL" | tr '[:upper:]' '[:lower:]')
Always prefer $() over backticks.
The whole pipeline has to be inside the command substitution; otherwise $PAGE does not contain the lowercased output.
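Putting the pieces together, the loop from the question might look like the sketch below. Several things here are assumptions rather than part of the answer: the function name find_movies is hypothetical, the movie titles are lowercased as well so that they can match the lowercased page, and a quoted glob is used for a literal substring test so that characters such as . or ( in a title are not treated as regex syntax.

```shell
#!/usr/bin/env bash
# find_movies is a hypothetical helper: $1 is the page text, $2 the list file.
find_movies() {
    local page movie
    page=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    while IFS= read -r movie || [[ -n $movie ]]; do
        [[ -n $movie ]] || continue   # skip blank lines
        movie=$(printf '%s' "$movie" | tr '[:upper:]' '[:lower:]')
        # Quoting "$movie" makes this a literal substring test, not a pattern.
        if [[ $page == *"$movie"* ]]; then
            echo "$movie found"
        fi
    done < "$2"
}

# With a real page one would call: find_movies "$(curl -s "$URL")" movies.txt
list=$(mktemp)
printf '%s\n' 'Avengers' 'Inception' > "$list"
find_movies "text blah AVENGERS more text" "$list"
rm -f "$list"
```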

multi-lines pattern matching

I have some files with content like this:
file1:
AAA
BBB
CCC
123
file2:
AAA
BBB
123
I want to echo the filename only if the first 3 lines are letters, i.e. "file1" in the samples above.
I'm merging the 3 lines into one and comparing it to my regex [A-Z], but could not get it to match for some reason.
My script:
file=file1
if [[ $(head -3 $file|tr -d '\n'|sed 's/\r//g') == [A-Z] ]]; then
echo "$file"
fi
I ran it with bash -x, this is the output
+ file=file1
++ head -3 file1
++ tr -d '\n'
++ sed 's/\r//g'
+ [[ ASMUTCEDD == [A-Z] ]]
+exit

What you missed:
You can use grep to check that the input matches only [A-Z] characters (or indeed Bash's built-in regex matching, as @Barmar pointed out)
You can use the pipeline directly in the if statement, without [[ ... ]]
Like this:
file=file1
if head -n 3 "$file" | tr -d '\n\r' | grep -qE '^[A-Z]+$'; then
echo "$file"
fi

To do regular expression matching you have to use =~, not ==. And the regular expression should be ^[A-Z]*$. Your regular expression matches if there's a letter anywhere in the string, not just if the string is entirely letters.
if [[ $(head -3 $file|tr -d '\n\r') =~ ^[A-Z]*$ ]]; then
echo "$file"
fi

You can use built-ins and character classes for this problem:-
#!/bin/bash
file="file1"
C=0
flag=0
while read line
do
(( ++C ))
[ $C -eq 4 ] && break;
# The pattern must be unquoted; a quoted pattern is matched as a literal string.
[[ "$line" =~ [^[:alpha:]] ]] && flag=1
done < "$file"
[ $flag -eq 0 ] && echo "$file"
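A per-line check can also be packaged as a function that stops at the first non-matching line. This is a sketch only: the name first_lines_are_letters is invented here, and it additionally treats a file with fewer than three lines as a non-match, which the question does not specify.

```shell
#!/usr/bin/env bash
# first_lines_are_letters is a hypothetical helper.
# $1 is the file, $2 the number of leading lines to check (default 3).
first_lines_are_letters() {
    local file=$1 want=${2:-3} seen=0 line
    while (( seen < want )) && IFS= read -r line; do
        line=${line%$'\r'}                     # tolerate CRLF line endings
        [[ $line =~ ^[A-Z]+$ ]] || return 1    # stop at the first bad line
        seen=$(( seen + 1 ))
    done < "$file"
    # Fail if the file ended before enough lines were checked.
    (( seen == want ))
}

dir=$(mktemp -d)
printf '%s\n' AAA BBB CCC 123 > "$dir/file1"
printf '%s\n' AAA BBB 123 > "$dir/file2"
for f in "$dir"/file1 "$dir"/file2; do
    if first_lines_are_letters "$f"; then
        echo "${f##*/}"
    fi
done
rm -rf "$dir"
```

With the two sample files from the question, only file1 is printed.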

How to match this string in bash?

I'm reading a file in bash, line by line. I need to print lines that have the following format:
don't care <<< at least one character >>> don't care.
These are all the way which I have tried and none of them work:
if [[ $line =~ .*<<<.+>>>.* ]]; then
echo "$line"
fi
This has incorrect syntax.
These two have correct syntax but don't work:
if [[ $line =~ '.*<<<.+>>>.*' ]]; then
echo "$line"
fi
And this:
if [[ $line == '*<<<*>>>*' ]]; then
echo "$line"
fi
So how do I tell bash to only print lines with that format? P.S.: I have tested, and printing all lines works just fine.

You don't need a regular expression; filename (glob) patterns will work just fine:
if [[ $line == *"<<<"?*">>>"* ]]; then ...
* - match zero or more characters
? - match exactly one character
"<<<" and ">>>" - literal strings: The angle brackets need to be quoted so bash does not interpret them as a here-string redirection.
$ line=foobar
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
n
$ line='foo<<<x>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y
$ line='foo<<<xyz>>>bar'
$ [[ $line == *"<<<"?*">>>"* ]] && echo y || echo n
y

For maximum compatibility, it's always a good idea to define your regex pattern as a separate variable in single quotes, then use it unquoted. This works for me:
re='<<<.+>>>'
if [[ $line =~ $re ]]; then
echo "$line"
fi
I got rid of the redundant leading/trailing .*, by the way.
Of course, I'm assuming that you have a valid reason to process the file in native bash (if not, just use grep -E '<<<.+>>>' file)
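For completeness, the whole read loop with the pattern held in a variable might look like this. A minimal sketch; the function name print_marked_lines is made up, and the file name is passed as the first argument.

```shell
#!/usr/bin/env bash
# print_marked_lines is a hypothetical helper; $1 is the file to scan.
print_marked_lines() {
    local re='<<<.+>>>' line
    # The || clause keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

tmp=$(mktemp)
printf '%s\n' 'foobar' 'foo<<<>>>bar' 'foo<<<x>>>bar' 'foo<<<xyz>>>bar' > "$tmp"
print_marked_lines "$tmp"
rm -f "$tmp"
```

Only the last two sample lines are printed, since the pattern requires at least one character between <<< and >>>.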

<, <<, <<<, >, and >> are special in the shell and need quoting:
[[ $line =~ '<<<'.+'>>>' ]]
. and + shouldn't be quoted, though, to keep their special meaning.
You don't need the leading and trailing .* in =~ matching, but you need them (or their equivalents) in patterns:
[[ $line == *'<<<'?*'>>>'* ]]
It's faster to use grep to extract lines:
grep -E '<<<.+>>>' input-file

I don't even understand why you are reading the file line by line. I just ran the following command at the bash prompt and it works fine (note the -E, so that + is treated as a repetition operator rather than a literal character):
grep -E "<<<<.+>>>>" test.txt
where test.txt contains following data:
<<<<>>>>
<<<<a>>>>
<<<<aa>>>>
The result of the command was:
<<<<a>>>>
<<<<aa>>>>
