Regular expression does not match `]` symbol - regex

My goal is to display only the text after the last ] symbol.
echo MY_TEXT | grep -o "[^\]]*$"
The output is just the last symbol.
If I change "]" symbol to any letter, it works as expected.
Examples:
$ echo Hello World | grep -o '[^o]*$'
rld # and this is correct!
$ echo He]ll]o Wo]rld | grep -o "[^\]]*$"
d # but expected: rld
Why is the behavior different for the symbols o and ]?
Thank you in advance.

You do not need to escape the ]:
echo 'He]ll]o Wo]rld' | grep -o "[^]]*$"
Produces: rld
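To see why the escaped version prints only the last character, it may help to look at how the pattern is parsed. In a POSIX bracket expression the backslash is an ordinary character, so [^\]]*$ reads as [^\] (one character that is not a backslash), then ]* and then $. The sketch below illustrates this; the input xa]] and the variable name s are made up for the illustration.

```shell
# Inside a bracket expression a backslash is literal, so '[^\]]*$' means:
# one non-backslash character, then zero or more ']', then end of line.
echo 'xa]]' | grep -o '[^\]]*$'            # prints: a]]

# Putting ']' first (right after '^') makes it a member of the set.
echo 'He]ll]o Wo]rld' | grep -o '[^]]*$'   # prints: rld

# Alternative without grep (a different technique): parameter expansion
# removes the longest prefix that ends in ']'.
s='He]ll]o Wo]rld'
printf '%s\n' "${s##*]}"                   # prints: rld
```

The parameter expansion form avoids a subprocess, which can matter when this runs in a loop.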

A couple of things: ] doesn't need escaping when it is the first character inside a character class; you can use --color to see that only the characters other than ] are matched; and notice that -o prints each match on its own line:
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*$'
rld
$ echo "He]ll]o Wo]rld" | grep --color '[^]]*'
He]ll]o Wo]rld
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*'
He
ll
o Wo
rld
You can strip the ] character using tr among many other ways:
$ echo "He]ll]o Wo]rld" | tr -d ']'
Hello World

Removed the escape character and it worked for me.
$ echo He]ll]o Wo]rld | grep -o "[^]]*$"
rld

Related

RegEx for matching characters in square brackets in bash

I need to implement my regexp for some characters with square brackets:
1) [+] for this
2) [*] and for this
3) [+/*] and for this
4) [*/+] and for this
I made this on bash:
[root@testmachine5 ~]# echo "[+]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[+]
[root@testmachine5 ~]# echo "[*]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[*]
[root@testmachine5 ~]# echo "[+/*]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[+/*]
[root@testmachine5 ~]# echo "[*/+]" | egrep -o "^\[(\+|\*|\+/\*)\]"
[root@testmachine5 ~]#
As you can see, it works for the first, second, and third variants, but not for the last one.
How do I solve this problem?
For the last statement you have to add another option to the alternation for the reversed version to match. You can combine the first 2 options to a character class:
echo "[*/+]" | egrep -o "^\[([+*]|\+/\*|\*\/\+)\]"
                             ^^^^       ^^^^^^
Between square brackets match a + optionally followed by /*, or * optionally followed by /+.
$ egrep -o '\[(\+(/\*)?|\*(/\+)?)\]' <<EOF
[+]
[/]
[*]
[+/*]
[*/*]
[/+*]
[*/+]
EOF
[+]
[*]
[+/*]
[*/+]
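The pattern above can be wrapped in a small check that accepts or rejects a whole token. This is only a sketch: is_op_token is a hypothetical helper name, and -x is used so the pattern has to cover the entire line rather than a substring.

```shell
# is_op_token (hypothetical name): succeeds only if the whole argument is
# [+], [*], [+/*] or [*/+].
is_op_token() {
    printf '%s\n' "$1" | grep -Eqx '\[(\+(/\*)?|\*(/\+)?)\]'
}

# The tokens are quoted so the shell does not treat them as glob patterns.
for token in '[+]' '[*]' '[+/*]' '[*/+]' '[/]' '[*/*]' '[/+*]'; do
    if is_op_token "$token"; then
        echo "$token accepted"
    else
        echo "$token rejected"
    fi
done
```

The first four tokens are accepted and the last three are rejected, matching the transcript above.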
More generic regex:
lynx@bionic:~/$ echo "[*]..[*]..[+/*]..[*/+]" | egrep -o "\[([*/+]{1,3})\]"
[*]
[*]
[+/*]
[*/+]
Or this way, which prints only the text between the brackets (it needs a grep built with PCRE support for -P):
lynx@bionic:~$ echo "[*/+]" | grep -Po "(?<=\[)([*/+]{1,3})(?=\])"
*/+
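One thing to keep in mind with the more generic pattern is that it accepts any run of one to three of the characters *, / and +, not only the four requested forms. A quick comparison against the stricter alternation from the earlier answer is sketched below; the variable names loose and strict and the input [//] are chosen only for this illustration.

```shell
loose='\[([*/+]{1,3})\]'
strict='\[(\+(/\*)?|\*(/\+)?)\]'

# The loose pattern accepts a token that was never asked for.
echo '[//]' | grep -Eo "$loose"                        # prints: [//]

# The strict pattern rejects it, so grep exits non-zero.
echo '[//]' | grep -Eo "$strict" || echo 'no match'    # prints: no match
```

Which one is preferable depends on whether malformed tokens can appear in the input at all.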

Pulling Single digit out of string --bash

Example
./test.sh R19
echo "$1" > test.txt
cat test.txt | grep -o ^[A-Z] > model.txt
cat test.txt | grep -o [0-9] > num1.txt
cat test.txt | grep -o [0-9]$ > num2.txt
echo "$(cat model.txt)00$(cat num1.txt)00$(cat num2.txt)"
I'm expecting to see R001009; however, what I get is:
R001
9009
So how can I make sure num1.txt only receives the middle digit and not both?
That's because grep -o '[0-9]' is returning all the digits on separate lines.
The painful way would be cat test.txt | grep -o [0-9] | head -1 > num1.txt
But don't do that: you're doing way too much file I/O. Use a regex in bash:
if [[ $1 =~ ^([A-Z])([0-9])([0-9])$ ]]; then
    printf "%s00%d00%d\n" "${BASH_REMATCH[@]:1}"
fi
Make sure you're using #!/bin/bash as your shebang line.
$ set -- R19
$ if [[ $1 =~ ^([A-Z])([0-9])([0-9])$ ]]; then
>     printf "%s00%d00%d\n" "${BASH_REMATCH[@]:1}"
> fi
R001009
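If the script has to run under a plain POSIX sh rather than bash, the same result can be sketched without [[ =~ ]] by using a case pattern for validation and parameter expansion for the split. This is a different technique from the bash regex above, and format_code is a hypothetical function name. Note that [A-Z] in a case pattern can be locale-dependent.

```shell
# format_code (hypothetical name): expects one letter followed by two digits.
format_code() {
    case $1 in
        [A-Z][0-9][0-9]) ;;     # the pattern must match the whole argument
        *) return 1 ;;
    esac
    rest=${1#?}                 # drop the first character:  "19"
    model=${1%"$rest"}          # drop that suffix again:    "R"
    num1=${rest%?}              # first digit:               "1"
    num2=${rest#?}              # second digit:              "9"
    printf '%s00%s00%s\n' "$model" "$num1" "$num2"
}

format_code R19                 # prints: R001009
```

No temporary files or external commands are involved, so it also avoids the file I/O mentioned above.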

How did [a-z] match é?

Wow, this actually matched an é. What happened here? I would like it not to match anything other than plain lowercase letters.
$ echo "frappé"|egrep -E "^[a-z]+$"
frappé
egrep (GNU grep) 2.16 on Ubuntu 14.04
Your locale setting tells egrep/grep -E how to collate the [a-z] character range.
$ export LC_COLLATE=C
$ echo "frappé" | egrep '^[a-z]+$'
# no match
$ export LC_COLLATE=en_US.utf8
$ echo "frappé" | egrep '^[a-z]+$'
frappé
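Named character classes follow LC_CTYPE rather than LC_COLLATE, so they still match characters with diacritics when only the collation is set to C (this assumes LC_CTYPE is still a UTF-8 locale):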
$ export LC_COLLATE=C
$ echo "frappé" | egrep '^[[:lower:]]+$'
frappé
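If the goal is to accept only the 26 unaccented lowercase letters regardless of the user's environment, one option is to pin the locale for that single grep call instead of exporting it. The sketch below does this with LC_ALL=C, which overrides both LC_COLLATE and LC_CTYPE; is_ascii_lower is a hypothetical helper name.

```shell
# is_ascii_lower (hypothetical name): succeeds only for strings made
# entirely of the ASCII letters a to z.
is_ascii_lower() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]+$'
}

is_ascii_lower 'frappe' && echo 'frappe: match'
is_ascii_lower 'frappé' || echo 'frappé: no match'
```

Setting the variable as a prefix keeps the change local to grep, so the rest of the script keeps the user's locale.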

grep/egrep the star operator not matching all occurrences

Let's take the string AaAa. I want to match the lowercase a characters:
$ echo AaAa | grep -o a
a
a
So it is printing every match and not just the first one. When I add a star after the a I get the following
$ echo AaAa | grep -o 'a*'
$
Why did grep not output every match this time? I know it matched because if we remove the -o option it prints the whole line:
$ echo AaAa | grep 'a*'
AaAa
To see how many matches it should have matched I used sed:
$ echo AaAa | sed 's/a*/x/g'
xAxAx
The strings that were substituted for x should have been what grep -o printed. So the matches are as follows:
The null string in the beginning for matching a zero times
The first a
The second a
Why didn't it print the following?
$ echo AaAa | grep -o 'a*'
a
a
$
EDIT
The above was done with GNU grep 2.5.1
The following was done with GNU grep 2.6.3
$ echo AaAa | grep -o 'a*'
a
a
$
Notice that it still didn't print the first null string on its own line. It seems the bug was partially fixed in this newer release. Shouldn't there be a null string matched as well, like the sed example above?
Let's start with this:
$ echo AaAa | grep -o 'a*'
$
You mentioned this was run on version 2.5.1. This appears to be a bug in grep that seems to have been fixed in 2.5.3.
Here's a quote from GNU grep development:
2.5.3
=====
Fix the combinations:
* -i -o
* --colour -i
* -o -b
* -o and zero-width matches
Go through the bug list im my mailbox and fix fixable.
Fix bugs reported with 2.5.2.
The combination of -o and zero-width matches is the bug we seem to be dealing with here. A zero-width match doesn't consume any characters of the string, but it is still a match. In this case, the zero-width match comes from a* matching the character a zero times.
On to the next part:
$ echo AaAa | grep -o 'a*'
a
a
$
I think the reason you don't get a blank line here is simply that the -o flag doesn't print anything for zero-width matches. The GNU grep documentation describes -o as printing only the non-empty parts of a matching line.
You can eliminate the duplicates using awk:
$ echo AaAa | grep -o a|awk '!x[$0]++'
a
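One way to sidestep the zero-width question entirely is to use a pattern that cannot match the empty string. The following sketch, which is not part of the original answer, requires at least one a:

```shell
# 'aa*' (basic regex) and 'a+' (extended regex) both need at least one 'a',
# so there is no empty match for -o to handle.
echo AaAa | grep -o 'aa*'     # prints "a" on two separate lines
echo AaAa | grep -Eo 'a+'     # same output
```

Both forms should behave the same across grep versions, since neither relies on how empty matches are reported.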

Counting regex pattern matches in one line using sed or grep?

I want to count the number of matches there are on one single line (or on all lines, as there will always be only one line).
I want to count not just one match per line as in
echo "123 123 123" | grep -c -E "123" # Result: 1
Better example:
echo "1 1 2 2 2 5" | grep -c -E '([^ ])( \1){1}' # Result: 1, expected: 2 or 3
You could use grep -o then pipe through wc -l:
$ echo "123 123 123" | grep -o 123 | wc -l
3
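The grep -o approach can be wrapped in a small function so the pattern and input are passed as arguments. This is a sketch only: count_matches is a hypothetical name, and it counts non-overlapping matches, because that is what grep -o reports.

```shell
# count_matches (hypothetical name)
#   $1: extended regular expression
#   $2: input string
count_matches() {
    # -e guards against patterns that start with a dash
    printf '%s\n' "$2" | grep -oE -e "$1" | wc -l
}

count_matches '123' '123 123 123'    # prints 3 (some wc versions pad with spaces)
```

When nothing matches, grep produces no output and wc -l prints 0.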
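Maybe something like this (it assumes GNU sed, which turns \n in the replacement into a newline, and matches that are separated by spaces):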
echo "123 123 123" | sed "s/123 /123\n/g" | wc -l
( maybe ugly, but my bash fu is not that great )
Maybe you should convert spaces to newlines first:
$ echo "1 1 2 2 2 5" | tr ' ' $'\n' | grep -c 2
3
Why not use awk?
You could use awk '{print gsub(your_regex,"&")}'
to print the number of matches on each line, or
awk '{c+=gsub(your_regex,"&")}END{print c}'
to print the total number of matches. Note that relative speed may vary depending on which awk implementation is used, and which input is given.
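As a concrete illustration of the two awk forms, here they are with the pattern filled in. gsub returns the number of substitutions it made, and replacing each match with "&" (the matched text itself) leaves the line unchanged. The sample inputs are made up for this sketch.

```shell
# Matches per line
echo '123 123 123' | awk '{ print gsub(/123/, "&") }'                      # prints: 3

# Total across all lines: 2 on the first line plus 3 on the second
printf '1 2 1\n1 1 1\n' | awk '{ c += gsub(/1/, "&") } END { print c }'    # prints: 5
```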
This might work for you:
sed -n -e ':a' -e 's/123//p' -e 'ta' file | sed -n '$='
With GNU sed this could be written as:
sed -n ':;s/123//p;t' file | sed -n '$='