Match two consecutive lines using Regex and Bash features only


What Regular Expression(s) can you use to match two consecutive lines?
The aim is not to use any packages like awk or sed but only use pure RegExp inside a shell script.
For example, I would like to ensure the word "hello" is immediately followed by "world" on the next line.
Acceptance criteria:
"hello" must not have any spaces before it.
"world" must have at least one space before it.
#!/bin/bash
file=./myfile.txt
regex='^hello'
[[ $(cat "$file") =~ $regex ]] && echo "yes" || echo "no"
myfile.txt
abc is def
hello
world
cde is efg

Here is a pure bash way:
file='./myfile.txt'
[[ $(<$file) =~ hello$'\n'[[:blank:]]*world ]] && echo "yes" || echo "no"
yes
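Here $'\n' matches a newline and [[:blank:]]* matches zero or more tabs or spaces. To require at least one space before "world", as the question asks, use [[:blank:]]+ instead.
If you want to be more precise and anchor both words to whole lines, use:
[[ $(<$file) =~ (^|$'\n')hello$'\n'[[:blank:]]*world($'\n'|$) ]] && echo "yes" || echo "no"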
However grep or awk are much better tools for this job.
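
As a minimal sketch of the stricter check, assuming bash and a hypothetical helper name has_hello_world (not part of the answer above), the pattern can be kept in a variable using $'...' quoting so that \n becomes a real newline, with [[:blank:]]+ enforcing the "at least one space" criterion:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if one line is exactly "hello" (no leading
# spaces) and the very next line is "world" preceded by at least one
# space or tab.
has_hello_world() {
    local text=$1
    # $'...' turns \n into a literal newline inside the regex.
    local re=$'(^|\n)hello\n[[:blank:]]+world(\n|$)'
    [[ $text =~ $re ]]
}

# Usage with an in-memory sample instead of a file:
sample=$'abc is def\nhello\n  world\ncde is efg'
if has_hello_world "$sample"; then echo "yes"; else echo "no"; fi
```

To test a file instead, the call could be has_hello_world "$(<"$file")".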

Related

Check if string contains embedded string in order

I want to check if some string is embedded within another string. For example pineapple and apple match as well as aepprestlse and apple.
This is a simple task if I know the word I want to test against, for example:
if [[ $e == *"a"*"p"*"p"*"l"*"e"* ]]
then
echo "match"
fi
However I do not know the length or contents of what will replace my "apple" variable when I run the script. How can I perform this check with variable sizes/contents?

Here is how you can generate a glob pattern to match:
data='bcdaeppr?estlse'
search='app?le'
# generate a glob pattern using sed, i.e. *\a*\p*\p*\?*\l*\e*
patt="*$(sed 's/./\\&*/g' <<< "$search")"
# now match it
[[ $data == $patt ]] && echo "matched" || echo "nope"
matched
# not matching example
data='bcdaepprestlse'
[[ $data == $patt ]] && echo "matched" || echo "nope"
nope
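
The same check can also be sketched without sed, using only bash parameter expansion. This is one possible approach rather than the answer's method, and the function name is_subsequence is hypothetical. It walks through the search string one character at a time and discards the part of the data that has already been consumed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: is_subsequence DATA SEARCH
# Succeeds if the characters of SEARCH occur in DATA in the same order.
is_subsequence() {
    local rest=$1 search=$2 i ch
    for ((i = 0; i < ${#search}; i++)); do
        ch=${search:i:1}
        # Fail as soon as a character is missing from the remaining data.
        [[ $rest == *"$ch"* ]] || return 1
        # Drop everything up to and including its first occurrence.
        rest=${rest#*"$ch"}
    done
    return 0
}

is_subsequence 'aepprestlse' 'apple' && echo "matched" || echo "nope"
```

Quoting "$ch" inside the patterns keeps characters such as ? and * literal, so no escaping step is needed.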

awk to the rescue!
$ awk -v s='pineapple' -v r='apple' '
BEGIN{for(i=1;i<=length(r);i++)
{s=substr(s,k+1);  # resume after the previous match so a repeated letter needs its own occurrence
k=index(s,substr(r,i,1));
if(k==0) exit 1}
exit 0}'; echo $?

Regex in a bash script

I've got the following text file which contains:
12.3-456, test
test test test
If the line contains xx.x-xxx (where each x is a digit), then I want to print the line out.
I think I have the correct regex and have tested it here:
http://regexr.com/3clu3
I have then used this in a bash script but the line containing the text is not printed out.
What have I messed up?
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ /\d\d.\d-\d\d\d,/g ]]; then
echo $line
fi
done < input.txt

You need to use [0-9] instead of \d in a Bash regex. No regex delimiters are necessary, and the global flag is not necessary either. You can also shorten the pattern with limiting quantifiers (such as {3}, which matches three occurrences of the preceding pattern). Finally, a dot matches any character in a regex, so you need to escape it if you want to match a literal dot.
Use
regex="[0-9]{2}\.[0-9]-[0-9]{3},"
if [[ $line =~ $regex ]]
...
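
Put together, a complete loop might look like the sketch below. The helper name has_code is hypothetical, and a here-document stands in for input.txt so the example is self-contained:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the argument contains dd.d-ddd,
# (two digits, a literal dot, one digit, a hyphen, three digits, a comma).
has_code() {
    local regex='[0-9]{2}\.[0-9]-[0-9]{3},'
    [[ $1 =~ $regex ]]
}

# Print only the matching lines.
while IFS='' read -r line || [[ -n "$line" ]]; do
    if has_code "$line"; then
        printf '%s\n' "$line"
    fi
done <<'EOF'
12.3-456, test
test test test
EOF
```

Because the pattern is not anchored, a line such as 123.3-456, also matches (on the substring 23.3-456,). Prefixing the regex with ^ would require the pattern at the start of the line.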

This works:
#!/bin/bash
#regex="/\d\d.\d-\d\d\d,/g"
regex="[0-9.-]+, [A-Za-z]+"
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
if [[ $line =~ $regex ]]; then
echo "match"
fi
done
The regex is: one or more of the characters 0-9, '.', or '-', followed by a comma and a space, followed by one or more letters. This could be refined in a number of ways, e.g. by specifying the exact number of digits before and after the '-'.
Testing indicates:
$ ./sqltrace2.sh < input.txt
12.3-456, test
match
123.3-456, test
match
12.3-456,
test test test
test test test

match leading dots in bash if using regex

Say I want to match the leading dot in a string ".a"
So I type
[[ ".a" =~ ^\. ]] && echo "ha"
ha
[[ "a" =~ ^\. ]] && echo "ha"
ha
Why am I getting the same result here?

You need to escape the dot: it has a meaning beyond just a period, because it is a metacharacter in regex.
[[ "a" =~ ^\. ]] && echo "ha"
Make the change in the other example as well.
Check your bash version - you need 4.0 or higher I believe.

There are some compatibility issues with =~ between Bash versions after 3.0. The safest way to use =~ in Bash is to put the RE pattern in a variable:
$ pat='^\.foo'
$ [[ .foo =~ $pat ]] && echo yes || echo no
yes
$ [[ foo =~ $pat ]] && echo yes || echo no
no
$
For more details, see E14 on the Bash FAQ page.
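
A short sketch of why the variable form is safer, assuming bash 3.2 or newer, where quoting the right-hand side of =~ forces a literal string comparison. The function names here are hypothetical:

```shell
#!/usr/bin/env bash
pat='^\.'

# Unquoted expansion: the value is used as a regular expression.
starts_with_dot() { [[ $1 =~ $pat ]]; }

# Quoted expansion: the value is matched as a literal string, so this only
# succeeds if the argument contains the characters ^ \ . in sequence.
contains_pattern_text() { [[ $1 =~ "$pat" ]]; }

for s in '.a' 'a' 'x^\.y'; do
    if starts_with_dot "$s"; then
        echo "$s: starts with a dot"
    fi
    if contains_pattern_text "$s"; then
        echo "$s: contains the literal pattern text"
    fi
done
```

Keeping the pattern in a single-quoted variable and expanding it unquoted avoids having to reason about how the shell treats backslashes written directly inside [[ ... ]].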

Probably it's because bash tries to treat "\." as an escape sequence, like \n, \r, etc.
To make bash treat \ and . as two separate characters, try
[[ "a" =~ ^\\. ]] && echo ha

How to check whether a string has at least one alphabetic character?

I want to check whether a string has at least one alphabetic character.
A regex for this could be:
"^.*[a-zA-Z].*$"
However, I want to use it as a condition, like this:
if [ it contains at least one alphabetic character];then
...
else
...
fi
But I'm at a loss as to how to use the regex.
I tried:
if [ "$x"=~[a-zA-Z]+ ];then echo "yes"; else echo "no" ;fi
or
if [ "$x"=~"^.*[a-zA-Z].*$" ];then echo "yes"; else echo "no" ;fi
Testing with x="1234", both of the above scripts output "yes", so they are wrong.
How can I achieve my goal? Thanks!

Try this:
#!/bin/bash
x="1234"
y="a1234"
if [[ "$x" =~ [A-Za-z] ]]; then
echo "X is $x and has at least one alphabetic character"
fi
if [[ "$y" =~ [A-Za-z] ]]; then
echo "Y is $y and has at least one alphabetic character"
fi

If you want to be portable, I'd call /usr/bin/grep with [A-Za-z].

Use the [:alpha:] character class that respects your locale, with a regular expression
[[ $str =~ [[:alpha:]] ]] && echo has alphabetic char
or a glob-style pattern
[[ $str == *[[:alpha:]]* ]] && echo has alphabetic char

It's quite common in sh scripts to use grep in an if clause. You can find many such examples in /etc/rc.d/.
if echo "$theinputstring" | grep -q '[a-zA-Z]' ; then
echo yes
else
echo no
fi
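
For plain sh without an external command, a case statement is another option. The sketch below assumes a POSIX shell whose pattern matching supports the [[:alpha:]] class. The helper name has_alpha is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument contains at least one
# alphabetic character (as defined by the current locale).
has_alpha() {
    case $1 in
        *[[:alpha:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

for s in "1234" "a1234"; do
    if has_alpha "$s"; then
        echo "$s: yes"
    else
        echo "$s: no"
    fi
done
```

This uses glob-style patterns rather than a regex, which is enough for a simple "contains a letter" test.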

regex to match strings not preceded by a bang

In bash, I am trying to match valid attributes that are present in an array. Attributes may be 'disabled' by preceding them with a bang (exclamation mark, !), in which case they must not be matched. I have this:
[[ ${TESTS[@]} =~ [^\!]match ]]
which will return true if the word 'match' is in TESTS and not preceded by a !.
It works, except when the word match is in the first position in the array. The problem is the regexp is saying 'match preceded by something that isn't a !'. When it's the first item it is preceded by nothing and therefore does not match.
How do I modify the above to say 'match not preceded by !' ?
From reading answers to other questions I have tried (?<!!)match but this does not work.

Use this re:
([^\!]|^)match
Example of usage:
$ [[ match =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ xmatch =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
matches
$ [[ '!match' =~ (^|[^\!])match ]] && echo matches || echo "doesn't match"
doesn't match
In general, a lookbehind assertion would also be correct here, but bash uses POSIX regular expressions, which do not support assertions. With grep (GNU grep), perl, or anything else that supports PCRE you can do it:
$ echo match | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo xmatch | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
matches
$ echo '!match' | grep -qP '(?<!!)match' && echo matches || echo "doesn't match"
doesn't match
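
If each attribute is a separate array element and should match as a whole word, an alternative is to compare the elements one by one instead of flattening the array into a single string. This is a sketch under that assumption, with a hypothetical helper name has_attr. Unlike the regex above, it does not match substrings such as xmatch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: has_attr NAME ELEMENT...
# Succeeds if NAME appears as an element exactly; "!NAME" is a different
# string, so disabled attributes never match.
has_attr() {
    local name=$1 item
    shift
    for item in "$@"; do
        [[ $item == "$name" ]] && return 0
    done
    return 1
}

TESTS=(match '!other' third)
if has_attr match "${TESTS[@]}"; then
    echo matches
else
    echo "doesn't match"
fi
```

Because every element is compared on its own, the position of the attribute in the array makes no difference.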
