RegEx for matching characters in square brackets in bash

I need to write a regexp that matches a few character sequences inside square brackets:
1) [+] for this
2) [*] and for this
3) [+/*] and for this
4) [*/+] and for this
I tried this in bash:
[root#testmachine5 ~]# echo "[+]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[+]
[root#testmachine5 ~]# echo "[*]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[*]
[root#testmachine5 ~]# echo "[+/*]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[+/*]
[root#testmachine5 ~]# echo "[*/+]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[root#testmachine5 ~]#
As you can see, it works for the first, second, and third variants, but not for the last one.
How do I solve this problem?

For the last case, you have to add another alternative to the alternation so that the reversed version matches. You can also combine the first two alternatives into a character class:
echo "[*/+]" | egrep -o "^\[([+*]|\+/\*|\*\/\+)\]"
                             ^^^^       ^^^^^^
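A quick way to check the corrected alternation against all four inputs, plus two strings that should be rejected. This is only a sketch: the `classify` helper and the extra test strings are made up for illustration, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Pattern from the answer above, written for grep -E.
pattern='^\[([+*]|\+/\*|\*/\+)\]'

# Print "accepted" or "rejected" for one candidate string (hypothetical helper).
classify() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "accepted"
  else
    echo "rejected"
  fi
}

for s in '[+]' '[*]' '[+/*]' '[*/+]' '[/]' '[+/+]'; do
  printf '%s %s\n' "$s" "$(classify "$s")"
done
```

The first four strings should be reported as accepted and the last two as rejected.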

Between square brackets match a + optionally followed by /*, or * optionally followed by /+.
$ egrep -o '\[(\+(/\*)?|\*(/\+)?)\]' <<EOF
[+]
[/]
[*]
[+/*]
[*/*]
[/+*]
[*/+]
EOF
[+]
[*]
[+/*]
[*/+]

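A more generic (and more permissive) regex, which accepts any one to three of the characters *, / and + between the brackets: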
lynx#bionic:~/$ echo "[*]..[*]..[+/*]..[*/+]" | egrep -o "\[([*/+]{1,3})\]"
[*]
[*]
[+/*]
[*/+]
Or, to print only the content between the brackets, use lookarounds with grep -P:
lynx#bionic:~$ echo "[*/+]" | grep -Po "(?<=\[)([*/+]{1,3})(?=\])"
*/+
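Note that the generic pattern accepts more than the four forms listed in the question. The comparison below is a rough sketch; the sample strings and the `count_matches` helper are assumptions added for illustration, and the stricter pattern is the one from the earlier answer.

```shell
loose='\[[*/+]{1,3}\]'
strict='\[(\+(/\*)?|\*(/\+)?)\]'

# Print how many of the sample lines the given pattern accepts (hypothetical helper).
count_matches() {
  printf '%s\n' '[+]' '[*]' '[+/*]' '[*/+]' '[/]' '[+++]' '[/+*]' | grep -Ec "$1"
}

loose_count=$(count_matches "$loose")
strict_count=$(count_matches "$strict")
echo "loose accepts $loose_count, strict accepts $strict_count"
```

If inputs such as [/] or [+++] must be rejected, the stricter alternation is the safer choice.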

Related

grep within nested brackets

How do I grep strings in between nested brackets using bash? Is it possible without the use of loops? For example, if I have a string like:
[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]
I wish to grep only the two target strings inside the [[]]:
TargetString1
TargetString2
I tried the following command which cannot get TargetString2
grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1
With GNU grep's -P option:
grep -oP "(?<=\[\[)[\w\s]+"
The regex matches a sequence of word or whitespace characters ([\w\s]+) when it is preceded by two opening brackets ([[). This works for your sample string, but will not work for more complicated constructs like:
[[[[TargetString1]]TargetString2:SomethingIDontWantAfterColon[[TargetString3]]]]
where only TargetString1 and TargetString3 are matched.
To extract the innermost string from nested [[]] brackets, you can use sed:
#!/bin/bash
str="[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]"
echo "$str" | grep -o -P '(?<=\[\[).*(?=\]\])' | cut -d ':' -f1
echo "$str" | sed 's/.*\[\([^]]*\)\].*/\1/' # works only if a string exists between [ and ]
Output:
TargetString1
TargetString2
You can do this with grep -Eo '\[\[\w+' followed by sed 's/\[\[//g' to strip the brackets:
[root#localhost ~]# echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -Eo '\[\[\w+' | sed 's/\[\[//g'
TargetString1
TargetString2
[root#localhost ~]#
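Since \w is a GNU extension, the same idea can be sketched with a POSIX character class instead. This assumes the target strings consist only of letters, digits, and underscores; the `targets` variable name is just for illustration.

```shell
str='[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]'

# Match "[[" followed by word characters, then strip the leading "[[".
targets=$(printf '%s\n' "$str" | grep -Eo '\[\[[[:alnum:]_]+' | sed 's/^\[\[//')
printf '%s\n' "$targets"
```

Quoting the string and using printf avoids the shell treating the brackets as a glob pattern.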

Print matching regex group in grep

I have this text https://bitbucket.com/user/repo.git and I want to print repo, the content between / and .git, without including delimiters. I have this:
echo https://bitbucket.com/user/repo.git | grep -E -o '\/(.*?)\.git'
But it prints /repo.git. How can I print just repo?
Use the [^/]+(?=\.git$) pattern with -P option:
echo https://bitbucket.com/user/repo.git | grep -P -o '[^/]+(?=\.git$)'
The [^/]+(?=\.git$) pattern matches one or more characters other than / that are followed by .git at the end of the string.
You can use sed to do that, capturing the part between the last / and the trailing .git:
echo https://bitbucket.com/user/repo.git | sed -e 's/^.*\/\(.*\)\.git$/\1/'
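grep cannot print a capture group on its own, so a substitution is the usual workaround. The sketch below shows two options side by side: sed with extended syntax (the -E flag is assumed to be available, as it is in GNU and BSD sed) and plain shell parameter expansion for the case where the URL is already in a variable. The variable names are illustrative.

```shell
url='https://bitbucket.com/user/repo.git'

# sed: capture everything after the last "/" and before a trailing ".git".
repo_sed=$(printf '%s\n' "$url" | sed -E 's#^.*/([^/]+)\.git$#\1#')

# Parameter expansion: drop everything through the last "/", then the ".git" suffix.
repo_exp=${url##*/}
repo_exp=${repo_exp%.git}

echo "$repo_sed $repo_exp"
```

Using # as the sed delimiter avoids having to escape the slash in the pattern.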

Why [^\d\w\s,] matches "leonardo,davinci"?

I can't understand why the regexp:
[^\d\s\w,]
Matches the string:
"leonardo,davinci"
That is my test:
$ echo "leonardo,davinci" | egrep '[^\d\w\s,]'
leonardo,davinci
While this works as expected:
$ echo "leonardo,davinci" | egrep '[\S\W\D]'
$
Thanks very much
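It's because egrep does not treat \d, \w, and \s as predefined sets inside a bracket expression. A backslash there is a literal character, so [^\d\w\s,] means "any character other than \, d, w, s, or ,", and letters such as l, e, and o match. (GNU grep does understand \w and \s outside brackets, but it has no \d.) Spelling the classes out,
echo "leonardo,davinci" | egrep '[^a-zA-Z0-9 ,]'
will indeed not match.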
If you have it installed, you can use pcregrep instead:
echo "leonardo,davinci" | pcregrep '[^\w\s,]'
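If pcregrep is not available, POSIX character classes give the intended meaning with plain grep -E. The following sketch compares the two bracket expressions; it assumes a POSIX-style grep in which a backslash inside brackets is literal, and the variable names are only for illustration.

```shell
line='leonardo,davinci'

# Backslash is literal inside brackets: this class excludes only \ d w s and ,
if printf '%s\n' "$line" | grep -Eq '[^\d\w\s,]'; then
  literal=match
else
  literal=nomatch
fi

# POSIX classes: "not alphanumeric, not underscore, not whitespace, not comma".
if printf '%s\n' "$line" | grep -Eq '[^[:alnum:]_[:space:],]'; then
  posix=match
else
  posix=nomatch
fi

echo "literal=$literal posix=$posix"
```

The first test succeeds because of ordinary letters such as l, while the second finds nothing outside the allowed set.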

grep/egrep the star operator not matching all occurrences

Let's take the string AaAa. I want to match the as:
$ echo AaAa | grep -o a
a
a
So it is printing every match and not just the first one. When I add a star after the a I get the following
$ echo AaAa | grep -o 'a*'
$
Why did grep not output every match this time? I know it matched because if we remove the -o option it prints the whole line:
$ echo AaAa | grep 'a*'
AaAa
To see how many matches it should have matched I used sed:
$ echo AaAa | sed 's/a*/x/g'
xAxAx
The strings that were substituted for x should have been what grep -o printed. So the matches are as follows:
The null string in the beginning for matching a zero times
The first a
The second a
Why didn't it print the following?
$ echo AaAa | grep -o 'a*'
a
a
$
EDIT
The above was done with GNU grep 2.5.1
The following was done with GNU grep 2.6.3
$ echo AaAa | grep -o 'a*'
a
a
$
Notice that it still didn't print the first null string on its own line. It seems the bug was partially fixed in this newer release. Shouldn't there be a null string matched as well, like the sed example above?
Let's start with this:
$ echo AaAa | grep -o 'a*'
$
You mentioned this was run on version 2.5.1. This appears to be a bug in grep that seems to have been fixed in 2.5.3.
Here's a quote from GNU grep development:
2.5.3
=====
Fix the combinations:
* -i -o
* --colour -i
* -o -b
* -o and zero-width matches
Go through the bug list im my mailbox and fix fixable.
Fix bugs reported with 2.5.2.
-o and zero-width matches is the bug we seem to be dealing with here. A zero-width match consumes no characters from the string, but it is still a match. In this case, a* matches the character a zero times, that is, the empty string.
On to the next part:
$ echo AaAa | grep -o 'a*'
a
a
$
I think the reason you don't get a blank line here is that the -o flag prints only non-empty matches, so zero-width matches are skipped.
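The difference between "the line matched" and "what -o prints" can be seen in a small sketch. It assumes a grep with the fixed behaviour described above (GNU grep 2.5.3 or later); the second input line, BBBB, is an extra sample added for illustration.

```shell
# 'a*' can match the empty string, so every line matches, even one with no "a".
lines_matched=$(printf '%s\n' 'AaAa' 'BBBB' | grep -c 'a*')

# -o prints only the non-empty matches, one per line.
printed=$(printf '%s\n' 'AaAa' 'BBBB' | grep -o 'a*')

echo "lines matched: $lines_matched"
printf '%s\n' "$printed"
```

Both lines count as matching, yet only the two non-empty a matches from the first line are printed.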
You can eliminate the duplicates using awk:
$ echo AaAa | grep -o a|awk '!x[$0]++'
a

Regular expression does not match `]` symbol

My goal is to display only the text after the last ] symbol.
echo MY_TEXT | grep -o "[^\]]*$"
The output is just the last symbol.
If I change "]" symbol to any letter, it works as expected.
Examples:
$ echo Hello World | grep -o '[^o]*$'
rld # and this is correct!
$ echo He]ll]o Wo]rld | grep -o "[^\]]*$"
d # but expected: rld
Why is the behavior different for the symbols o and ]?
Thank you in advance.
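You do not need to escape the ]. Inside a bracket expression a backslash is a literal character, so [^\] is a complete class meaning "any character except a backslash", and the ]* that follows matches zero or more closing brackets. That is why only the last character is printed. Put the ] first in the class instead: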
echo 'He]ll]o Wo]rld' | grep -o "[^]]*$"
Produces: rld
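The two patterns can be compared directly. This is a minimal sketch using the sample text from the question; the variable names are illustrative.

```shell
text='He]ll]o Wo]rld'

# [^\] is "one character that is not a backslash"; the ]* after it allows
# zero or more closing brackets before the end of the line.
escaped=$(printf '%s\n' "$text" | grep -o '[^\]]*$')

# With ] placed first in the class it is a literal member: "no ] until the end".
unescaped=$(printf '%s\n' "$text" | grep -o '[^]]*$')

echo "escaped=$escaped unescaped=$unescaped"
```

The escaped form yields only the final d, while the unescaped form yields rld.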
A couple of things: ] doesn't need escaping inside a character class when it is the first character, you can use --color to see that only the letters are matched, and notice that -o prints each match on its own line:
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*$'
rld
$ echo "He]ll]o Wo]rld" | grep --color '[^]]*'
He]ll]o Wo]rld
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*'
He
ll
o Wo
rld
You can strip the ] character using tr among many other ways:
$ echo "He]ll]o Wo]rld" | tr -d ']'
Hello World
Removed the escape character and it worked for me.
$ echo He]ll]o Wo]rld | grep -o "[^]]*$"
> rld