Trying to write a regex in bash - regex

I am new to regex and I am trying to write one in a bash script.
I am trying to match a line against a regex that should return the second word in the line.
regex = "commit\s+(.*)"
line = "commit 5456eee"
if [$line =~ $regex]
then
echo $2
else
echo "No match"
fi
When I run this I get the following error:
man.sh: line 1: regex: command not found
man.sh: line 2: line: command not found
I am new to bash scripting.
Can anyone please help me fix this?
I just want to write a regex to capture the word that follows commit.

You don't want a regex, you want parameter expansion/substring extraction:
line="commit 5456eee"
first="${line% *}"
regex="${line#* }"
if [[ $line =~ $regex ]]
then
echo "$regex"
else
echo "No match"
fi
$first == 'commit', $regex == '5456eee'. Bash provides all the tools you need.
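As a minimal sketch of the parameter-expansion approach, the second word can be isolated without any regex. The function name second_word is hypothetical, and the sketch assumes the words are separated by single spaces with no leading space; if the line contains no space at all, it simply prints the whole line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the second space-separated word of a string.
# Assumes single-space separators and no leading space.
second_word() {
  local line=$1
  local rest=${line#* }          # drop everything up to and including the first space
  printf '%s\n' "${rest%% *}"    # drop everything from the next space onward
}

second_word "commit 5456eee"     # prints 5456eee
```

The two expansions only strip text, so no subshell or external command is started.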

If you really only need the second word you could also do it with awk:
line="commit 5456eee"
echo "$line" | awk '{ print $2 }'
or if you have a file:
awk '{ print $2 }' filename
Although this is not a bash-only solution, awk should be present on most Linux systems.

You should remove the spaces around the equals sign, otherwise bash thinks you want to execute the regex command using = and "commit\s+(.*)" as arguments.
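Then use [[ ... ]] for the test, because =~ is only available there. Keep the spaces around the brackets and the operator, and leave $regex unquoted so that it is treated as a pattern:
$ regex="commit\s+(.*)"
$ line="commit 5456eee"
$ if [[ $line =~ $regex ]]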
> then
> echo "Match"
> else
> echo "No match"
> fi
Match

Maybe you didn't start your script with
#!/bin/sh
or
#!/bin/bash
to define the interpreter you're using?
It must be the first line. Since =~ is a bash feature, use #!/bin/bash here.
Then be careful: spaces are significant in bash. In your "if" statement, it should be:
if [[ $line =~ $regex ]]
Check this and tell us more about the errors you get.

If you save this script to a file such as test.sh
and execute it like this:
test.sh commit aaa bbb ccc
$0 $1 $2 $3 $4
you can easily get the arguments with $0, $1, and so on.

A simple way to get the capture group that was matched (if there is one) is to use BASH_REMATCH, which holds the match results in its own array:
regex="commit (.*)"
line="commit 5456eee"
if [[ $line =~ $regex ]]
then
match=${BASH_REMATCH[1]}
echo $match
else
echo "No match"
fi
Since you have only one capture group, it is stored in the array as BASH_REMATCH[1]. In the above example I've assigned BASH_REMATCH[1] to the variable match, which prints:
5456eee
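A slightly more defensive variant of the same idea, as a sketch only: the function name commit_id is hypothetical, and the pattern uses the POSIX class [[:space:]] instead of \s, on the assumption that the script should not depend on a regex library that understands \s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the word that follows "commit",
# or return a non-zero status if the line does not match.
commit_id() {
  local line=$1
  # Keep the pattern in a variable and use it unquoted on the right of =~.
  local re='^commit[[:space:]]+([^[:space:]]+)'
  if [[ $line =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

commit_id "commit 5456eee" || echo "No match"   # prints 5456eee
```

Because the group is [^[:space:]]+ rather than .*, only the single word after commit is captured, even when more text follows on the line.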

Related

Bash regex =~ doesn’t support multiline mode?

I am using the =~ operator to match the output of a command and grab a group from it. The code is as follows:
Commandout=$(cmd)
Match='^hello world'
if [[ $Commandout =~ $Match ]]; then
echo something
fi
Commandout contains:
Something
Hello world
But the if statement is failing.
Does bash regex support multiline search, with ^ and $ matching at the start and end of every line?
No, the =~ operator doesn't perform a multiline search. A newline must be matched literally:
string=$(cmd)
regexp='(^|'$'\n'')hello world'
if [[ $string =~ $regexp ]]; then
echo matches
fi
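The same trick can be wrapped in a small function, sketched here under the assumption that the caller passes an extended regular expression. The name line_starts_with is hypothetical; the pattern is wrapped in its own group so that an alternation such as a|b still applies only after the line start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if any line of the multi-line string $1
# starts with a match for the ERE $2.
line_starts_with() {
  local text=$1 pattern=$2
  local nl=$'\n'
  # A "line start" is either the start of the string or a literal newline.
  local re="(^|${nl})(${pattern})"
  [[ $text =~ $re ]]
}

output=$'Something\nhello world'
if line_starts_with "$output" 'hello world'; then
  echo "matches"
fi
```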
=~ treats a multi-line string as one string, so ^ and $ anchor only at its very start and end:
if [[ $(echo -e "abc\nd") =~ ^a.*d$ ]]; then
echo "find a string '$(echo -e "abc\nd")' that starts with a and ends with d"
fi
Output:
find a string 'abc
d' that starts with a and ends with d
P.S.
When processing multiple lines, it is common to use grep or read with either a redirect or a pipeline.
For a grep and pipeline example:
# find the lines that start with either a or e
echo -e "abc\nd\ne" | grep -E "^[ae]"
Output:
abc
e
For a read and redirect example:
while IFS= read -r line; do
if [[ $line =~ ^a ]]; then
echo "find a line '${line}' start with a"
fi
done <<< "$(echo -e "abc\nd\ne")"
Output:
find a line 'abc' start with a
P.S.
The -e option of echo enables interpretation of backslash escapes such as \n. The -E option of grep selects extended regular expressions.
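The read-based approach can be sketched as a reusable filter. The function name lines_matching is hypothetical; it assumes the input arrives on standard input and that the pattern is an extended regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin line by line and print the lines
# that match the ERE given as the first argument.
lines_matching() {
  local re=$1 line
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "$line"
    fi
  done
}

printf 'abc\nd\ne\n' | lines_matching '^[ae]'
```

Here each line is tested on its own, so ^ and $ behave as line anchors, which is what the multi-line =~ test cannot do.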

How to store each occurrence of multiline string in array using bash regex

Given a text file test.txt with contents:
hello
someline1
someline2
...
world1
line that shouldn't match
hello
someline1
someline2
...
world2
How can I store both of these multiline matches in separate array indexes?
I'm currently trying to use regex="hello.*world[12]"
Unfortunately I can only use native Bash, so Perl etc is off the table. Thanks
Since bash regex has no equivalent of Python's findall(), we need to capture the matched substrings one by one in a loop.
Please try the following:
#!/bin/bash
str=$(<test.txt)
regex="hello.world[12]"
while [[ $str =~ ($regex)(.*) ]]; do
ary+=( "${BASH_REMATCH[1]}" ) # store the match into an array
str="${BASH_REMATCH[2]}" # remaining substring
done
for i in "${!ary[@]}"; do # see the result
echo "[$i] ${ary[$i]}"
done
Output:
[0] hello
world1
[1] hello
world2
[Edit]
If there are lines between "hello" and "world", we need to change the approach, because bash regex does not support non-greedy (shortest) matching. Then how about:
regex1="hello"
regex2="world"
while IFS= read -r line; do
if [[ $line =~ $regex1 ]]; then
str="$line"$'\n'
f=1
elif (( f )); then
str+="$line"$'\n'
if [[ $line =~ $regex2 ]]; then
ary+=("$str")
f=0
fi
fi
done < test.txt
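The flag-based loop can be sketched as a function that also shows the collected result. The name collect_blocks and the global array blocks are hypothetical, and the sample data is written to a temporary file rather than test.txt so the sketch is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect every block that runs from a line matching
# start_re to the next line matching end_re. Fills the global array "blocks".
collect_blocks() {
  local file=$1 start_re=$2 end_re=$3
  local line current="" inside=0
  blocks=()
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $start_re ]]; then
      current=$line$'\n'          # a start line opens a new block
      inside=1
    elif (( inside )); then
      current+=$line$'\n'
      if [[ $line =~ $end_re ]]; then
        blocks+=("$current")      # an end line closes and stores the block
        inside=0
      fi
    fi
  done < "$file"
}

# Example run on a temporary file holding a shortened version of the sample.
sample=$(mktemp)
printf '%s\n' hello someline1 world1 "line that shouldn't match" \
  hello someline2 world2 > "$sample"
collect_blocks "$sample" '^hello$' '^world[12]$'
for i in "${!blocks[@]}"; do
  printf '[%s] %s' "$i" "${blocks[$i]}"
done
rm -f "$sample"
```

Lines outside a block are skipped because the flag is only set by a start line and cleared by an end line.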
I would use awk and mapfile (the -d option needs bash version >= 4.4):
#!/bin/bash
mapfile -d '' arr < <(
awk '/hello/{f=1} f; /world[12]/ && f {f=0; printf "\000"}' test.txt
)
arr=([0]=$'hello\nsomeline1\nsomeline2\n...\nworld1\n' [1]=$'hello\nsomeline1\nsomeline2\n...\nworld2\n')
notes:
awk '/hello/{f=1} f; /world[12]/ && f{f=0; printf "\000"}'
. when encountering hello, set the flag to true
. for each line, print it if the flag is true
. when encountering world[12] and the flag is true, set the flag to false and print a null-byte delimiter
mapfile -d '' arr
. split the input into an array whose elements are delimited by a null byte (instead of \n)
Version for older bash:
#!/bin/bash
arr=()
while IFS='' read -r -d '' block
do
arr+=( "$block" )
done < <(
awk '/hello/{f=1} f; /world[12]/ && f{f=0; printf "\000"}' test.txt
)

How to match fields regex in bash

I made a regex that matches the field I want to assign to my variable in bash.
The regex is:
(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)
and the substring I am interested in is $3 (group 3).
Could anyone please give me a command line that assigns the substring to my variable?
Example:
MYVARIABLE=$(echo $FULLSTRING | grep -oP '(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)'
But this example obviously did not work
Thanks a lot
You may extract the Group 3 value using Bash regex matching:
text="1.23.23.45 This is what I want"
rx='(,? ?(\.?[0-9]{1,3}){4})+ (.*)'
if [[ $text =~ $rx ]]; then
echo "${BASH_REMATCH[3]}"
else
echo "No match!"
fi
This prints This is what I want.
If there is a regex match (if [[ $text =~ $rx ]]), the contents of Group 3 are in "${BASH_REMATCH[3]}".
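Group numbers follow the position of each opening parenthesis, which is why nested groups shift the index of the group you actually want. A minimal sketch with a simpler, made-up pattern (the function name capture_group and the sample string are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print capture group number $3 of the ERE $2
# matched against the string $1, or fail if there is no match.
capture_group() {
  local text=$1 re=$2 n=$3
  [[ $text =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[$n]}"
}

# Groups are numbered by their opening parenthesis:
#   1 = whole "N-N" part, 2 = first number, 3 = second number, 4 = rest
re='(([0-9]+)-([0-9]+)) (.*)'
capture_group "12-34 some text" "$re" 1   # prints 12-34
capture_group "12-34 some text" "$re" 3   # prints 34
capture_group "12-34 some text" "$re" 4   # prints some text
```

BASH_REMATCH[0] holds the whole match, so index 0 can be passed to the same helper.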
If you have Perl installed, then you can match against your regex and print the field you want:
MYVARIABLE=$(echo $FULLSTRING | perl -nE '/(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)/;say $3')
Example:
FULLSTRING=', .123.4.5.6 matchthis'
MYVARIABLE=$(echo $FULLSTRING | perl -nE '/(\,?[ ]?(\.?\d{1,3}){4})+\ (.*)/;say $3')
echo $MYVARIABLE
Outputs: matchthis

Using regular expressions in a ksh Script

I have a file (file.txt) that contains some text like:
000000000+000+0+00
000000001+000+0+00
000000002+000+0+00
and I am trying to check each line to make sure that it follows the format:
character*9, "+", character*3, "+", etc
So far I have:
#!/bin/ksh
file=file.txt
line_number=1
for line in $(cat $file)
do
if [[ "$line" != "[[.]]{9}+[[.]]{3}+[[.]]{1}+[[.]]{2} ]" ]]
then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done
However, this does not evaluate correctly: no matter what I put in the lines, the program terminates.
If you want the line numbers of the mismatches, you can use grep -vn. With a correct regular expression, that gives:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt
This is not in the layout that you want, so change the layout with sed:
grep -Evn "^.{9}[+].{3}[+].[+].{2}$" file.txt |
sed -r 's/([^:]*):(.*)/Invalid number (\2) check line number \1./'
EDIT:
I changed .{1} into a single dot (.).
The sed is also over the top. If you need some explanation of the sed command, you can start by feeding it echo "Linenr:Invalid line".
I'm having funny results putting the regex in the condition directly:
$ line='000000000+000+0+00'
$ [[ $line =~ ^.{9}\+.{3}\+.\+..$ ]] && echo ok
ksh: syntax error: `~(E)^.{9}\+.{3}\+.\+..$ ]] && echo ok
' unexpected
But if I save the regex in a variable:
$ re="^.{9}\+.{3}\+.\+..$"
$ [[ $line =~ $re ]] && echo ok
ok
So you can do
#!/bin/ksh
file=file.txt
line_number=1
re="^.{9}\+.{3}\+.\+..$"
while IFS= read -r line; do
if [[ ! $line =~ $re ]]; then
echo "Invalid number ($line) check line $line_number"
exit 1
fi
let "line_number++"
done < "$file"
You can also use a plain glob pattern:
if [[ $line != ?????????+???+?+?? ]]; then echo error; fi
ksh glob patterns have some regex-like syntax. If there's an optional space in there, you can handle that with the ?(sub-pattern) syntax
pattern="?????????+???+?( )?+??"
line1="000000000+000+0+00"
line2="000000000+000+ 0+00"
[[ $line1 == $pattern ]] && echo match || echo no match # => match
[[ $line2 == $pattern ]] && echo match || echo no match # => match
Read the "File Name Generation" section of the ksh man page.
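The plain glob check can also be sketched with a case statement, which uses the same pattern syntax and should behave the same way in ksh and bash. The function name is_valid_record is hypothetical. Note that ? matches any single character, not only digits; [0-9] could be used in its place if digits are required.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the record has the layout
# 9 chars, "+", 3 chars, "+", 1 char, "+", 2 chars.
is_valid_record() {
  case $1 in
    ?????????+???+?+??) return 0 ;;
    *) return 1 ;;
  esac
}

for rec in "000000000+000+0+00" "00000000+000+0+00"; do
  if is_valid_record "$rec"; then
    echo "$rec: ok"
  else
    echo "$rec: invalid"
  fi
done
```

The second sample record is one character short, so it falls through to the default branch.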
Your regex looks wrong; sites like https://regex101.com/ are very helpful. From your description, I suspect it should look more like one of these:
^.{9}\+.{3}\+.{1}\+.{2}$
^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$
^[0-9]{9}\+[0-9]{3}\+[0-9]{1}\+[0-9]{2}$
From the ksh manpage section on [[ - you would probably want to be using =~.
string =~ ere
True if string matches the pattern ~(E)ere where ere is an extended regular expression.
Note: As far as I know, ksh regex doesn't follow the normal syntax
You may have better luck with using grep:
# X="000000000+000+0+00"
# grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${X}" && echo true
true
Or:
if ! grep -qE "^[^\+]{9}\+[^\+]{3}\+[^\+]{1}\+[^\+]{2}$" <<<"${line}"
then
exit 1
fi
You may also prefer to use a construct like below for handling files:
while IFS= read -r line; do
echo "${line}";
done < "${file}"

Regex in a bash script

I've got the following text file which contains:
12.3-456, test
test test test
If the line contains xx.x-xxx, then I want to print the line out. (The x's are digits.)
I think I have the correct regex and have tested it here:
http://regexr.com/3clu3
I have then used this in a bash script but the line containing the text is not printed out.
What have I messed up?
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ /\d\d.\d-\d\d\d,/g ]]; then
echo $line
fi
done < input.txt
You need to use [0-9] instead of a \d in Bash regex. No regex delimiters are necessary, and the global flag is not necessary either. Also, you can contract it a bit using limiting quantifiers (like {3} that will match 3 occurrences of the pattern next to it). Besides, a dot matches any character in regex, so you need to escape it if you want to match a literal dot symbol.
Use
regex="[0-9]{2}\.[0-9]-[0-9]{3},"
if [[ $line =~ $regex ]]
...
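Put together, the corrected test might look like the sketch below. The function name has_code is hypothetical, and the input is supplied through a here-document instead of input.txt so the example is self-contained; the third sample line is added only to show that the escaped dot no longer matches an arbitrary character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the line contains a token shaped like
# two digits, a literal dot, one digit, a dash, three digits, and a comma.
has_code() {
  local re='[0-9]{2}\.[0-9]-[0-9]{3},'
  [[ $1 =~ $re ]]
}

while IFS= read -r line || [[ -n $line ]]; do
  if has_code "$line"; then
    printf '%s\n' "$line"
  fi
done <<'EOF'
12.3-456, test
test test test
12x3-456, test
EOF
```

Only the first line is printed.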
This works:
#!/bin/bash
#regex="/\d\d.\d-\d\d\d,/g"
regex="[0-9\.\-]+\, [A-Za-z]+"
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line
if [[ $line =~ $regex ]]; then
echo "match"
fi
done
regex is [any of 0-9, '.', '-'] followed by ',' followed by alphachars. This could be refined in a number of ways - e.g. explicit places before/ after '-'.
Testing indicates:
$ ./sqltrace2.sh < input.txt
12.3-456, test
match
123.3-456, test
match
12.3-456,
test test test
test test test