Grep and regex: inconsistent behaviour [duplicate] - regex

I was working on Linux box A and ran this:
grep '^\S*\s-' access_log
That displayed some lines, as expected.
Then I moved to machine B and ran exactly the same command, but this time it didn't work.
I had to run this to get what I needed:
grep '^[^ ]* -' access_log
Before succeeding, I tried all of these, with no luck:
grep '^\S* -' access_log
grep '^\S*\s-' access_log
grep -e '^\S* -' access_log
grep -E '^\S* ' access_log
It looks like machine B doesn't understand the metacharacters \S and \s.
Both boxes were running grep 2.5.1 and bash 3.2.25.
How is that possible?
Cheers,
Dan

Judging from the grep man page, it seems that if you can use things like \s, you are using Perl regular expressions, which are enabled by passing the -P option to grep. So it may be that this option is set automatically on machine A and not on machine B, for example through an alias or through GREP_OPTIONS.
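If the goal is a pattern that behaves the same on both machines, one option is to avoid \s and \S altogether and use POSIX bracket expressions, which any POSIX grep should accept. A minimal sketch follows; the sample log line is made up for illustration, and the two commented commands are only a quick way to look for an alias or a GREP_OPTIONS setting.

```shell
# Hypothetical access_log line, used only for this demonstration.
line='192.0.2.10 - frank [10/Oct/2000:13:55:36] "GET / HTTP/1.0" 200'

# [^[:space:]] stands in for \S and [[:space:]] stands in for \s.
match=$(printf '%s\n' "$line" | grep '^[^[:space:]]*[[:space:]]-')
printf '%s\n' "$match"

# To see whether grep is aliased or picks up default options on a given box:
#   type grep
#   printf '%s\n' "$GREP_OPTIONS"
```

The bracket-expression form is longer to type, but it does not depend on which regex extensions a particular grep build happens to support.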

Related

Match X or Y in grep regular expression

I'm trying to run a fairly simple regular expression to clear out some home directories. For background: I'm trying to ask users on my system to clear out their unnecessary files to clear up space on their home directories, so I want to inform users with scripts such as Anaconda / Miniconda installation scripts that they can clear that out.
To generate a list of users who might need such an email, I'm trying to run a simple regular expression to list all homedirs that contain such an installation script. So my assumption would be that the following should suffice:
for d in $(ls -d /home/); do
if $(ls $d | grep -q "(Ana|Mini)conda[23].*\.sh"); then
echo $d;
fi;
done;
But running this produced nothing at all, sadly. After looking into it for a while, I noticed that grep does not interpret regular expressions the way I would expect. The following:
echo "Lorem ipsum dolor sit amet" | grep "(Lorem|Ipsum) ipsum"
results in no matches at all, which would explain why the for loop above doesn't work either.
My question then is: is it possible to match the specified regular expression (Ana|Mini)conda[23].*\.sh, in the same way it matches strings in https://regex101.com/r/yxN61p/1? Or is there some other way to find all users who have such a file in their homedir using a simple for-loop in bash?
Short answer: grep defaults to Basic Regular Expressions (BRE), but unescaped () and | are part of Extended Regular Expressions (ERE). GNU grep, as an extension, supports alternation in BRE (which isn't technically part of BRE), but you have to escape the parentheses and the pipe with a backslash:
grep -q "\(Ana\|Mini\)conda[23].*\.sh"
Or you can indicate that you want to use ERE:
grep -Eq "(Ana|Mini)conda[23].*\.sh"
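The difference can be checked without touching any home directory. The sketch below feeds a few made-up file names to grep; the names are assumptions chosen for illustration. It compares the ERE form, a version that uses several -e patterns instead of alternation, and the original unescaped pattern in default (BRE) mode.

```shell
# Hypothetical file names, one per line.
files='Anaconda3-2020.02-Linux-x86_64.sh
Miniconda2-latest.sh
notes.txt'

# ERE: ( ) and | are operators.
ere_hits=$(printf '%s\n' "$files" | grep -E '(Ana|Mini)conda[23].*\.sh')

# Alternative without alternation: several -e patterns, any of which may match.
multi_hits=$(printf '%s\n' "$files" | grep -e 'Anaconda[23].*\.sh' -e 'Miniconda[23].*\.sh')

# BRE: the unescaped ( | ) are literal characters, so nothing matches.
# grep -c prints 0 and exits non-zero, hence the "|| true".
bre_count=$(printf '%s\n' "$files" | grep -c '(Ana|Mini)conda[23].*\.sh' || true)

printf 'ERE:\n%s\nmultiple -e:\n%s\nBRE count: %s\n' "$ere_hits" "$multi_hits" "$bre_count"
```

The multiple -e form avoids relying on the GNU-specific \| escape when ERE is not wanted.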
Longer answer: this all being said, you don't need grep, and parsing the output of ls comes with a lot of pitfalls. Instead, you can use globs:
printf '%s\n' /home/*/*{Ana,Mini}conda[23]*.sh
should do it, if I understand the intention correctly.
This uses the fact that printf just repeats its formatting string if supplied with more parameters than formatting directives, printing each file on a separate line.
/home/*/*{Ana,Mini}conda[23]*.sh uses brace expansion, i.e., it first expands to
/home/*/*Anaconda[23]*.sh /home/*/*Miniconda[23]*.sh
and each of those is then expanded with filename expansion. [23] works the same way as in a regular expression; * is "zero or more of any character except /".
If you don't know how deep in the directory tree the files you're looking for are, you could use globstar and **:
shopt -s globstar
printf '%s\n' /home/**/*{Ana,Mini}conda[23]*.sh
** matches all files and zero or more subdirectories.
Finally, if you want to handle the case where nothing matches, you could set either shopt -s nullglob (expand to nothing if nothing matches) or shopt -s failglob (error if nothing matches).
Shell patterns are described in the Bash manual, in the section on pattern matching.
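To try the glob approach safely, it can be run against a throwaway directory first. This is a rough sketch that assumes bash and mktemp are available; the directory layout, the user names alice and bob, and the variable names are all invented for the demonstration.

```shell
#!/usr/bin/env bash
shopt -s nullglob   # patterns that match nothing expand to nothing

# Build a fake /home with two users (hypothetical names).
demo_home=$(mktemp -d)
mkdir -p "$demo_home/alice" "$demo_home/bob"
touch "$demo_home/alice/Anaconda3-2020.02-Linux-x86_64.sh" "$demo_home/bob/notes.txt"

# Brace expansion yields two patterns; each is then glob-expanded.
installers=( "$demo_home"/*/*{Ana,Mini}conda[23]*.sh )

# Collect the name of the home directory that holds each installer.
owners=()
for f in "${installers[@]}"; do
    dir=${f%/*}              # strip the file name
    owners+=( "${dir##*/}" ) # keep the last path component
done
printf '%s\n' "${owners[@]}"

rm -rf "$demo_home"
```

Storing the matches in an array keeps file names with spaces intact, which is one of the pitfalls of parsing ls output.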
You don't need ls or grep at all for this:
shopt -s extglob
for f in /home/*/@(Ana|Mini)conda[23]*.sh; do
    echo "$f"
done
With extglob enabled, @(Ana|Mini) matches either Ana or Mini.
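An extglob alternation of the form @(a|b) can also be tested against plain strings with [[ ... == pattern ]], which makes it easy to check a pattern before pointing it at real directories. The sketch below assumes bash; the candidate names are made up.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable before the pattern below is parsed

matched=()
for name in Anaconda3-2020.02-Linux-x86_64.sh Miniconda2-latest.sh conda3.sh notes.txt; do
    # Unquoted right-hand side of == is treated as a shell pattern.
    if [[ $name == @(Ana|Mini)conda[23]*.sh ]]; then
        matched+=( "$name" )
    fi
done
printf '%s\n' "${matched[@]}"
```

Only the two names that start with Ana or Mini pass the test; conda3.sh has neither prefix.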

grep expression for a whole word [duplicate]

I'm trying to parse the firewall log file and only take the lines that don't contain the router's address as source. The router's address is the obvious 192.168.2.1 and the computer's address is 192.168.2.110.
If I write grep -v 192.168.2.1 then I don't get the destination 192.168.2.110, because it starts with 192.168.2.1. If I don't use anything, then I get the lines from the router, which I would like to filter out. I have searched and tried different regexes, but no matter what I do, I either get both addresses or none.
grep -w forces PATTERN to match only whole words:
grep -v -w 192.168.2.1 file
192.168.2.110
Or enclose your pattern in \< and \>:
grep -v '\<192.168.2.1\>' file
192.168.2.110
You can try \b, which matches at word boundaries:
grep -vP '\b192.168.2.1\b'
or better yet
grep -vP '\b192\.168\.2\.1\b'
The -P option selects Perl-compatible regular expressions, where \b is a word boundary. GNU grep also accepts \b without -P.
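The effect of -w is easy to see on two sample lines. The log format below is invented for the demonstration (real firewall logs will look different); the point is only that the unanchored pattern also removes the 192.168.2.110 line, while the whole-word version keeps it.

```shell
# Hypothetical firewall log lines.
log='SRC=192.168.2.1 DST=203.0.113.5
SRC=192.168.2.110 DST=203.0.113.5'

# Unanchored: 192.168.2.1 is a prefix of 192.168.2.110, so both lines are dropped.
loose=$(printf '%s\n' "$log" | grep -v '192\.168\.2\.1' || true)

# Whole word: the match must not be followed by another word character.
kept=$(printf '%s\n' "$log" | grep -vw '192\.168\.2\.1')

printf 'loose: [%s]\nkept:  [%s]\n' "$loose" "$kept"
```

Escaping the dots keeps them from matching arbitrary characters.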

Grep regex treated as path

I have a script that reads strings from a text file, assigns each one to the $snmp_cred variable, and then tries to extract the IP address from the string with grep into another variable ($snmp_ip):
while read snmp_cred; do
echo appliance $ADDM_address and $snmp_cred
snmp_ip=$(echo $snmp_cred | grep "/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g")
echo IP for snmp community is $snmp_ip
done </tmp/input.txt
Content of input.txt file is:
a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public
a10networks/generic/1.3.6.1.4.1.22610.1.3.23_thunder_series1030s/172.17.48.24#public
a10networks/generic/1.3.6.1.4.1.22610.1.3.16_ax3200_12/10.251.1.101#public
The regex works in online regex editor, but fails into bash script. Bash output is:
++ echo $'a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public\r'
++ grep '/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g'
+ snmp_ip=
+ echo IP for snmp community is
IP for snmp community is
Can anyone point out what I am doing wrong?
Since you are not extracting only the matched text, you do not really need the lookahead, which POSIX regex does not support. Note that \d is not supported by the POSIX regex standard either, and that a grep pattern should not be wrapped in regex delimiters such as /.../g.
If you still need your pattern (say, to grab only the matches), pass the -oP options:
grep -oP "((25[0-5]|2[0-4]\d|[01]?[1-9]\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?[1-9]\d?)\d(?=#)"
In this statement:
snmp_ip=$($snmp_cred | grep "/((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\d(?=#)/g")
you are just expanding the variable, without passing it to grep.
You need to either give grep a file to read (as an argument or via a redirection) or send the text to grep's STDIN.
This worked for me:
#!/bin/bash
while IFS= read -r snmp_cred; do
    #echo appliance $ADDM_address and $snmp_cred
    # -E selects ERE; -o prints only the matched part (the address plus the trailing #)
    snmp_ip=$(grep -E -o "((25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[1-9][0-9]?)#" <<< "$snmp_cred")
    echo IP for snmp community is $snmp_ip
done <input.txt
output:
IP for snmp community is 10.72.168.33#
IP for snmp community is 172.17.48.24#
IP for snmp community is
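The last line of that output is empty because the octet pattern does not accept 101. If the address always sits between the last slash and the # (an assumption based on the three sample lines, not something the question guarantees), plain parameter expansion avoids the regex entirely. The sketch below also strips the trailing carriage return that shows up as \r in the trace output; extract_ip is a hypothetical helper name, and it does not validate that the result is a well-formed address.

```shell
# Hypothetical helper: print the text between the last "/" and the first "#".
extract_ip() {
    record=$1
    cr=$(printf '\r')
    record=${record%"$cr"}          # drop a trailing carriage return, if any
    record=${record##*/}            # keep what follows the last slash
    printf '%s\n' "${record%%#*}"   # drop "#" and the community string
}

# First sample record with a trailing \r, third sample record without one.
ip1=$(extract_ip "$(printf 'a10networks/generic/1.3.6.1.4.1.22610.1.3.27_thunder_series4430s/10.72.168.33#public\r')")
ip3=$(extract_ip 'a10networks/generic/1.3.6.1.4.1.22610.1.3.16_ax3200_12/10.251.1.101#public')
printf '%s\n%s\n' "$ip1" "$ip3"
```

Inside the original loop this would be called as snmp_ip=$(extract_ip "$snmp_cred").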

Regular expressions: [a]bc vs abc

I am not a regular expressions expert, but I thought I understood the basics. I was reading a tutorial that mentioned using this syntax:
$ ps -ewwo pid,args | grep [s]sh
to determine if SSHD is running or not.
I do not understand why the first s is in brackets. I would think that ssh and [s]sh would yield the same results, but I actually get different results.
$ ps -ewwo pid,args | grep [s]sh
1258 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
2988 /usr/sbin/sshd -D
$ ps -ewwo pid,args | grep ssh
1258 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu
2988 /usr/sbin/sshd -D
3082 grep --color=auto ssh
So why does it find the 3rd result in the second example?
Thanks!
The regular expressions [a]bc and abc match exactly the same set of strings, but they're being applied to different data, because the command-line arguments to grep appear in the output of the ps command.
Using [a]bc causes the literal string "[a]bc" to appear in the output of ps -- and this isn't matched by the regular expression [a]bc.
The idea is to avoid matching the line for the grep command itself.
The brackets are a character class but it doesn't really make sense to have a character class with one character and no repeat specified.
You get different results because the pattern ssh matches its own appearance in grep's arguments in the process list, whereas [s]sh does not match itself.
When you pipe ps into grep, you'll often find the running grep process in the results, because the search term appears on grep's own command line.
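The effect can be reproduced without a live process list. In this sketch the two ps listings are hard-coded strings modelled on the output in the question, so the counts are deterministic. The pattern is quoted so that the shell cannot expand [s]sh as a glob if a file named ssh happens to exist in the current directory.

```shell
# Simulated ps output when running: grep ssh
ps_plain='2988 /usr/sbin/sshd -D
3082 grep --color=auto ssh'

# Simulated ps output when running: grep '[s]sh'
ps_bracket='2988 /usr/sbin/sshd -D
3082 grep --color=auto [s]sh'

# The text "ssh" occurs in both lines of the first listing.
plain_hits=$(printf '%s\n' "$ps_plain" | grep -c 'ssh')

# The regex [s]sh means "ssh", but the text "[s]sh" does not contain "ssh".
bracket_hits=$(printf '%s\n' "$ps_bracket" | grep -c '[s]sh')

printf 'plain: %s, bracket: %s\n' "$plain_hits" "$bracket_hits"
```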

Mac OSX, Bash, awk, and negative look behind

I want to find a particular process using awk:
ps aux|awk '/plugin-container.*Flash.*/'
Now it finds the process, but it includes itself in the results, because ps results include them as well. To prevent that, I am trying to use negative look behind as follows:
ps aux|awk '/(\?<!awk).*plugin-container.*Flash.*/'
But it does not work. Does awk support look behind? What am I doing wrong? Thanks
The common trick is to use
ps aux | grep '[p]lugin-container.*Flash.*'
The character class [p] prevents grep itself from being matched.
I don't know whether awk supports lookbehind, but I usually solve this problem with grep -v:
aix@aix:~$ ps aux | awk '/plugin-container.*Flash/' | grep -v awk
(Also, I'd normally use grep for the awk command above.)
I don't know if awk supports lookbehinds, but if it does, the question mark at the start should not be escaped: a negative lookbehind starts with (?<!
A question mark right after the opening parenthesis signals that the group is not a plain capturing group, i.e. it has some special meaning.
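POSIX awk patterns are extended regular expressions, which have no lookaround, so the filtering has to be expressed differently. Two possibilities, sketched on hard-coded sample lines (the process lines are invented for illustration): combine a positive and a negated match in one awk condition, or reuse the bracket trick inside the awk pattern.

```shell
# Simulated ps output when running: awk '/plugin-container.*Flash/'
ps_sample='dan 4242 /usr/lib/firefox/plugin-container /usr/lib/libflashplayer.so Flash
dan 4300 awk /plugin-container.*Flash/'

# Keep lines that match the target and do not mention awk.
filtered=$(printf '%s\n' "$ps_sample" | awk '/plugin-container.*Flash/ && !/awk/')

# Simulated ps output when running: awk '/[p]lugin-container.*Flash/'
ps_sample_bracket='dan 4242 /usr/lib/firefox/plugin-container /usr/lib/libflashplayer.so Flash
dan 4300 awk /[p]lugin-container.*Flash/'

# The bracketed pattern no longer matches its own command line.
bracketed=$(printf '%s\n' "$ps_sample_bracket" | awk '/[p]lugin-container.*Flash/')

printf '%s\n%s\n' "$filtered" "$bracketed"
```

The first form has the same caveat as grep -v awk: it also drops any other process whose command line happens to contain awk.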