grep/egrep the star operator not matching all occurrences - regex

Take the string AaAa. I want to match each a:
$ echo AaAa | grep -o a
a
a
So it prints every match, not just the first one. When I add a star after the a, I get the following:
$ echo AaAa | grep -o 'a*'
$
Why did grep not output every match this time? I know it matched because if we remove the -o option it prints the whole line:
$ echo AaAa | grep 'a*'
AaAa
To see how many matches there should have been, I used sed:
$ echo AaAa | sed 's/a*/x/g'
xAxAx
The strings that sed replaced with x are what grep -o should have printed. So the matches are as follows:
The null string in the beginning for matching a zero times
The first a
The second a
Why didn't it print the following?
$ echo AaAa | grep -o 'a*'
a
a
$
EDIT
The above was done with GNU grep 2.5.1
The following was done with GNU grep 2.6.3
$ echo AaAa | grep -o 'a*'
a
a
$
Notice that it still didn't print the first null string on its own line. It seems the bug was partially fixed in this newer release. Shouldn't there be a null string matched as well, like the sed example above?

Let's start with this:
$ echo AaAa | grep -o 'a*'
$
You mentioned this was run on version 2.5.1. This appears to be a bug in grep that seems to have been fixed in 2.5.3.
Here's a quote from GNU grep development:
2.5.3
=====
Fix the combinations:
* -i -o
* --colour -i
* -o -b
* -o and zero-width matches
Go through the bug list im my mailbox and fix fixable.
Fix bugs reported with 2.5.2.
-o and zero-width matches is the bug we seem to be dealing with here. A zero-width match consumes no characters from the string, but it is still a match. In this case, a* matches the character a zero times at the start of the line.
On to the next part:
$ echo AaAa | grep -o 'a*'
a
a
$
I think the reason you don't get a blank line here is simply that -o prints nothing for zero-width matches. The GNU grep manual describes -o as printing only the non-empty parts of a matching line.
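A quick way to see the empty match that sed finds and grep -o skips is to wrap every match in brackets. This is only a sketch: the helper names show_matches and count_printed are made up for illustration, and the count assumes a grep that already has the 2.5.3 fix.

```shell
# Wrap every match of a* in brackets so empty matches become visible.
show_matches() {
    printf '%s\n' "$1" | sed 's/a*/[&]/g'
}

# Count the lines grep -o prints for the same pattern.
# Empty matches are not printed, so they are not counted.
count_printed() {
    printf '%s\n' "$1" | grep -o 'a*' | wc -l | tr -d ' '
}

show_matches AaAa     # []A[a]A[a]
count_printed AaAa    # 2
```

The [] at the start is the zero-width match in front of the first A. It is one of the three substitutions sed made, and it is the one grep -o leaves out.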

You can eliminate the duplicates using awk:
$ echo AaAa | grep -o a|awk '!x[$0]++'
a

Related

Get the following character which match a string

I'm trying to retrieve a specific value from the output of a command. Here is my command line:
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0
Which gives this result:
IF-MIB::ifDescr.4 = STRING: tun0
From this result I want to retrieve the 4. I thought of using a regex, but maybe there is an easier way to fetch it.
Regex I tried :
\ifDescr.\s+\K\S+ https://regex101.com/r/9X04MD/1
[\n\r].*ifDescr.\s*([^\n\r]*) https://regex101.com/r/9X04MD/2
I would like to fetch it in a single command line like
snmpwalk -v2c -c community localhost 1.3.6.1.2.1.2 | grep tun0 | ?
There are so many options that don't involve using GNU grep's experimental -P option. For example given just your sample input to work off, here's one way with any sed:
$ echo "$out" | sed 's/.*\.\([0-9]*\).*tun0/\1/'
4
or any awk:
$ echo "$out" | awk -F'[. ]' '/tun0/{print $2}'
4
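If the interface name varies, the sed approach can be wrapped in a small function. This is a sketch only: if_index is a hypothetical helper, the lo and ifDescr.12 lines are invented sample data rather than real snmpwalk output, and the interface name is assumed to contain no regex metacharacters.

```shell
# Hypothetical helper: read snmpwalk-style lines on stdin and print the
# ifDescr index of the interface named in $1.
if_index() {
    # [0-9][0-9]* accepts indexes with more than one digit.
    sed -n "s/.*ifDescr\.\([0-9][0-9]*\) = STRING: $1\$/\1/p"
}

# Invented sample lines standing in for the snmpwalk output.
printf '%s\n' \
    'IF-MIB::ifDescr.1 = STRING: lo' \
    'IF-MIB::ifDescr.12 = STRING: tun0' | if_index tun0    # 12
```

Because sed -n with the p flag prints only lines where the substitution succeeded, the separate grep tun0 step is not needed here.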
I'd recommend pattern (?<=ifDescr\.)[^ =]+
Explanation:
(?<=ifDescr\.) - positive lookbehind, asserts that what precedes is ifDescr.
[^ =]+ match one or more characters other than space or equal sign =

Matching the First Character on Each Line (UNIX egrep)

I'm looking to match and return just the first character from each line in a plain-text UTF-8 encoded file, using egrep in a UNIX terminal. I presumed that the following egrep command with a simple RegEx would produce the desired result:
egrep -o "^." FILE.txt
However, the output appears to be matching and returning every character in the file; that is, it is behaving as if the command were:
egrep -o "." FILE.txt
Similar results occur with the following command,
egrep -o "^[a-z]" FILE.txt
namely, the results act as if the RegEx "[a-z]" were supplied (i.e., every lowercase ASCII character in the range a-z is matched).
Commands in which just one specific alphanumeric character is supplied seem, as expected, to return every line that begins with that character, e.g.,
egrep -o "^1" FILE.txt
or
egrep -o "^T" FILE.txt
return all lines beginning with "1" or "T", respectively.
I have tried pasting the entirety of the file into a RegEx tester, such as at https://regexr.com/, and the expression "^." indeed behaves as expected, so I don't think that my file has any further whitespace characters that could be interfering.
Is there some other behavior of the line-beginning metacharacter "^" with egrep that could be causing this problem?
This is a known bug in BSD grep and GNU grep 2.5.1-FreeBSD.
In -o mode, the ^ anchor isn't handled properly (the bug has been reported and patched):
$ echo abc | bsdgrep -o "^."
a
b
c
GNU grep on Linux behaves as expected:
$ echo abc | grep -o "^."
a
As for what you are trying to achieve here (printing the first character of every line), grep is overkill. A simple cut would suffice:
$ echo abc | cut -c1
a
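If cut is not an option, sed can do the same without relying on how grep -o treats the ^ anchor. A minimal sketch, where first_chars is a made-up helper name and the input is ASCII sample text:

```shell
# Hypothetical helper: keep only the first character of every line.
# Empty lines stay empty. File arguments are passed through to sed,
# and stdin is used when none are given.
first_chars() {
    sed 's/^\(.\).*$/\1/' "$@"
}

printf 'abc\nTuv\n1xy\n' | first_chars
# a
# T
# 1
```

For non-ASCII UTF-8 input, whether "one character" means one byte or one multibyte character depends on the tool and the locale, so it is worth checking the result on a line that starts with such a character.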

Indent line ranges with sed?

I'm trying to convert old-fashioned phpBB code blocks to Markdown using sed.
Please consider following data sample:
cat sed.txt
[code]xxxx-YYY-xxxx[/code]
Some text
[code]yyyy-ZZZ-yyyy[/code]
More text
Bogus code block[/code]
[code]zzzz-XXX-zzzz[/code]
After long trial and error I've ended up with the following strategy:
sed -ne '
/\[code\].*\[\/code\]/ {
s#\[/*code\]##g
s#^#\n\n    #
s#$#\n\n#p
}' sed.txt | cat -Av
$
$
    xxxx-YYY-xxxx$
$
$
$
$
    yyyy-ZZZ-yyyy$
$
$
$
$
    zzzz-XXX-zzzz$
$
$
This works great, however I find it would be easier and seem more natural to do it this way:
sed -ne '
/\[code\].*\[\/code\]/ {
s#\[/*code\]#\n\n#g
s#^#    #p
}' sed.txt | cat -Av
    $
$
xxxx-YYY-xxxx$
$
$
    $
$
yyyy-ZZZ-yyyy$
$
$
    $
$
zzzz-XXX-zzzz$
$
$
But that does not work as expected. Any suggestions as to why, and how to get around it?
Thank you
sed '/\[code\].*\[\/code\]/ {
s#\[code]#&    #g
s#\[/*code\]#\
\
#g
}' sed.txt
The order of the substitutions matters, and it differs between your two samples.
I also changed the behavior slightly: -n and p are not needed for this sample (though they may be if this is part of a larger script).
(Tested on AIX, so with a POSIX sed.)
This might work for you (GNU sed):
sed -nr 's/^\[(code\])(.*)\[\/\1$/\n\n    \2\n\n/p' file | sed -n l
N.B. In your second script you prepend 2 newlines to the pattern space and then prepend the 4 spaces, so the indentation is added in front of the first newline, not in front of the text.
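For a sed without \n support in the replacement, the same single-substitution idea can be written with backslash-newline pairs. This is a sketch under the assumption that each code block occupies a whole line, as in the sample; convert_code_blocks is a hypothetical name.

```shell
# Hypothetical helper: turn whole-line [code]...[/code] blocks into text
# indented by 4 spaces, with two newlines before and after it.
# Lines that are not a complete code block are dropped by -n.
convert_code_blocks() {
    sed -n 's#^\[code]\(.*\)\[/code]$#\
\
    \1\
\
#p' "$@"
}

printf '%s\n' '[code]xxxx-YYY-xxxx[/code]' 'Some text' |
    convert_code_blocks | sed -n l
# $
# $
#     xxxx-YYY-xxxx$
# $
# $
```

Capturing the text between the tags and rebuilding the line in one substitution avoids the ordering problem entirely, because the indentation is written right next to the captured text.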

Regular expression does not match `]` symbol

My goal is to display only the text after the last ] symbol.
echo MY_TEXT | grep -o "[^\]]*$"
The output is just the last symbol.
If I change "]" symbol to any letter, it works as expected.
Examples:
$ echo Hello World | grep -o '[^o]*$'
rld # and this is correct!
$ echo He]ll]o Wo]rld | grep -o "[^\]]*$"
d # but expected: rld
Why is the behavior different for o and ]?
Thank you in advance.
You do not need to escape the ]:
echo 'He]ll]o Wo]rld' | grep -o "[^]]*$"
Produces: rld
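The reason is that a backslash is an ordinary character inside a POSIX bracket expression, so [^\] is already complete ("any character except a backslash") and the following ]*$ means "zero or more ] and then end of line". That is why only the final d was printed. To include ] in a bracket expression, put it first. A small sketch, with after_last_bracket as a made-up helper name:

```shell
# Hypothetical helper: print the text after the last ] in the argument.
# The ] comes first inside [^...], so it is taken literally.
after_last_bracket() {
    printf '%s\n' "$1" | grep -o '[^]]*$'
}

after_last_bracket 'He]ll]o Wo]rld'    # rld

# Shell-only alternative: strip the longest prefix that ends in ].
s='He]ll]o Wo]rld'
printf '%s\n' "${s##*]}"               # rld
```

The parameter expansion avoids starting grep at all, which can matter when this runs inside a loop.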
A couple of things: ] doesn't need escaping inside a character class when it comes first, you can use --color to see that only the letters are matched, and note that -o prints each match on its own line:
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*$'
rld
$ echo "He]ll]o Wo]rld" | grep --color '[^]]*'
He]ll]o Wo]rld
$ echo "He]ll]o Wo]rld" | grep -o '[^]]*'
He
ll
o Wo
rld
You can strip the ] character using tr among many other ways:
$ echo "He]ll]o Wo]rld" | tr -d ']'
Hello World
Removing the escape character made it work for me:
$ echo He]ll]o Wo]rld | grep -o "[^]]*$"
rld

Get total number of matches for a regex via standard unix command

Let's say that I want to count the number of "o" characters in the text
oooasdfa
oasoasgo
My first thought was to do grep -c o, but this returns 2, because grep returns the number of matching lines, not the total number of matches. Is there a flag I can use with grep to change this? Or perhaps I should be using awk, or some other command?
This will print the number of matches:
echo "oooasdfa
oasoasgo" | grep -o o | wc -l
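The difference between the two counts is easy to check on the sample text. A minimal sketch, where count_matches is a hypothetical wrapper around the pipeline above:

```shell
# Hypothetical helper: count every match of a pattern, not matching lines.
# $1 is the pattern. Any further arguments are files, and stdin is read
# when none are given.
count_matches() {
    pattern=$1
    shift
    # -o prints each match on its own line, so wc -l counts matches.
    grep -o -e "$pattern" "$@" | wc -l | tr -d ' '
}

printf 'oooasdfa\noasoasgo\n' | grep -c o          # 2 (matching lines)
printf 'oooasdfa\noasoasgo\n' | count_matches o    # 6 (total matches)
```

The tr -d ' ' only strips the padding that some wc implementations put in front of the number.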
You can also do it in the shell (bash):
$ var=$(<file)
$ echo $var
oooasdfa oasoasgo
$ o="${var//[^o]/}"
$ echo ${#o}
6
Or with awk:
$ awk '{m=gsub("o","");sum+=m}END{print sum}' file
6