Making a mistake with | (or) in regular expressions - regex

I'm sure I'm doing something really obviously wrong here, but I can't figure out what. Using grep from a bash shell, I have a file test2.txt:
ABC123
ABC456
ABC789
DEF123
DEF456
DEF789
Now at the command line:
$ grep ABC test2.txt
ABC123
ABC456
ABC789
$ grep DEF test2.txt
DEF123
DEF456
DEF789
So those work great. Now, I expect the following command to print the whole file, but:
$ grep ABC\|DEF test2.txt
$ grep (ABC)\|(DEF) test2.txt
-bash: syntax error near unexpected token `ABC'
$ grep \(ABC\)\|\(DEF\) test2.txt
$ grep 'ABC|DEF' test2.txt
What am I doing wrong?

Turn on the extended regex with -E:
grep -E "ABC|DEF" test2.txt
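A minimal sketch of why the attempts in the question print nothing, assuming a POSIX shell and a grep that accepts -E and -c. In grep ABC\|DEF the shell strips the backslash before grep runs, so grep receives ABC|DEF; the third attempt likewise delivers (ABC)|(DEF). In basic syntax an unescaped |, ( or ) is an ordinary character, so grep searches for those literal strings. The temporary file and the variable names are only for illustration.

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' ABC123 ABC456 ABC789 DEF123 DEF456 DEF789 > "$tmp"

# Extended regex: | is the alternation operator, no backslash needed.
ere_count=$(grep -E -c 'ABC|DEF' "$tmp")

# Alternative without any | at all: several -e patterns are ORed together.
multi_count=$(grep -c -e ABC -e DEF "$tmp")

# Basic regex: the | is a literal pipe character, so no line matches.
# grep exits non-zero when nothing matches, hence the "|| true".
bre_count=$(grep -c 'ABC|DEF' "$tmp" || true)

echo "ERE: $ere_count, two -e patterns: $multi_count, BRE: $bre_count"
rm -f "$tmp"
```

Quoting the pattern in single quotes keeps the shell from touching it, which makes it easier to see exactly what grep receives.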

As others have pointed out, grep's default (basic) regular expression syntax does not treat a plain | as "or". Unfortunately, from there, things are a mishmash.
Some systems have an egrep that does offer the or syntax.
Some systems can use a -E or -P (for Perl) flag to extend grep syntax.
Some systems have both -E and egrep, and they do the same thing. There are also systems out there where grep -E and egrep are not the same (sad but true).
Some systems now use the extended regular expressions in their standard grep command. Apparently, your system doesn't.
Read your manpages to see what your system does support. Some systems have a manpage for re_format that will explain what they support and don't support in extended format.
Then again, you could always just use a Perl one-liner:
$ perl -ne "print if /(ABC)|(DEF)/" test2.txt
At least then you know exactly which syntax is supported.
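Besides reading the manpages, the local grep can be probed directly. The following is only a sketch: grep_supports is a hypothetical helper name, and it assumes that a grep which does not know a flag exits with a non-zero status.

```shell
# grep_supports FLAG PATTERN TEXT
# Hypothetical helper: succeeds only if the local grep accepts FLAG and
# PATTERN matches TEXT under that flag. Error messages are discarded.
grep_supports() {
    printf '%s\n' "$3" | grep "$1" -q -e "$2" 2>/dev/null
}

for flag in -E -P; do
    if grep_supports "$flag" '(ABC|DEF)[0-9]+' 'DEF123'; then
        echo "grep $flag handles (ABC|DEF)[0-9]+"
    else
        echo "grep $flag is unavailable or rejected the pattern"
    fi
done
```

A check like this can sit at the top of a script that has to run on several systems, so it fails early instead of silently matching nothing.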

I don't think the standard syntax supports it. You could use the -P switch if available:
grep -P "(ABC|DEF)" test2.txt

Use egrep instead, which is the same as using grep -E:
egrep 'ABC|DEF' test2.txt

Related

Cross platform regex substring match for git

Sorry for yet another pattern matching question, but I'm struggling to find a tool that will do a regex match in a git hook. It needs to work on Windows, Mac and Linux.
This GNU grep command works on Windows and Linux, but not on Mac (because its grep is the BSD version):
echo "feature/EOPP-234-foo" | grep -Po -e '[A-Z]{4}-\d{1,5}'
This works on Mac and Linux, but not on Windows (because <git>\usr\bin\egrep doesn't seem to work):
echo "feature/EOPP-234-foo" | egrep -o '[A-Z]{4}-\d{1,5}'
sed might be the most common tool, but stuffed if I can get it to match:
echo "feature/EOPP-234-foo" | sed -n 's/^.*\([A-Z]{4}\-\d{1,5}\).*$/\1/p'
I've even tried bash matching with no luck
[[ "feature/EOPP-234-foo" =~ ([A-Z]{4}-\d{1,5}) ]] && echo ${BASH_REMATCH[1]}
Any ideas?
When you need to make POSIX tools run on Windows, remember to use double quotation marks around your patterns, not single quotes.
Also, you can use a regex that is POSIX ERE compliant across all these environments. This means \d must be replaced with [0-9] or [[:digit:]], as \d is a PCRE-only construct.
You can use
grep -Eo "[A-Z]{4}-[0-9]{1,5}"
grep -Eo "[A-Z]{4}-[[:digit:]]{1,5}"
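As a sketch of how this might look inside a hook, assuming the hook runs under a POSIX-style shell: the variable names are made up, and the sed line is an assumed fallback (not part of the answer above) that rewrites the question's attempt in basic syntax, where interval braces and groups need backslashes.

```shell
branch="feature/EOPP-234-foo"

# POSIX ERE with grep -o (a common extension found in both GNU and BSD grep).
ticket=$(printf '%s\n' "$branch" | grep -Eo "[A-Z]{4}-[0-9]{1,5}")

# sed fallback in basic regex: \{ \} for intervals, \( \) for the group,
# and [0-9] in place of \d.
ticket_sed=$(printf '%s\n' "$branch" | sed -n "s/^.*\([A-Z]\{4\}-[0-9]\{1,5\}\).*$/\1/p")

echo "grep: $ticket  sed: $ticket_sed"
```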

regular expression with grep not working?

I am running cygwin, grep 2.21 on Windows 7.
I am trying to get all tcp connections from netstat, so I run the following:
netstat | grep -i "^(TCP|UDP)"
Now, it returns nothing, but when I run netstat it clearly returns a bunch of tcp connections. I also tried :
netstat | egrep -i "^(TCP|UDP)"
Also returns nothing. What am I missing here? I thought that the caret means "begins with". Thanks.
For me, netstat | grep -P '(tcp|udp)' worked.
You may want to use the i flag to ignore the case if necessary.
netstat | grep -Pi '(TcP|UDp)'
About the other answer here, using egrep or grep -E gives the same result. Based on this.
The -P flag was inspired by this post.
According to man grep, the -P option makes grep interpret the pattern as a Perl regular expression. Not sure why it did not work without -P, though.
netstat outputs the protocol using lower case characters. Try either of the following:
netstat | grep '^\(udp\|tcp\)'
or
netstat | egrep '^(udp|tcp)'
The difference between the two is that egrep supports an extended regular expression syntax in which (, ) and | should not be escaped. As Reuel Ramos Ribeiro noted,
egrep is equivalent to using grep -E (uppercase), so alternatively one could use netstat | grep -E '^(udp|tcp)'.
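The two flags are easy to mix up: -E (uppercase) selects extended syntax, while -e (lowercase) only introduces a pattern and leaves the syntax basic. A small sketch of the difference follows; the sample text is invented to resemble netstat output and is not real data.

```shell
# Invented lines in the rough shape of netstat output (not real data).
sample='Proto Local-Address Foreign-Address State
tcp 127.0.0.1:5432 0.0.0.0:* LISTEN
udp 0.0.0.0:68 0.0.0.0:*
unix 2 STREAM LISTENING'

# -E selects extended syntax, so ( ) and | act as operators.
with_E=$(printf '%s\n' "$sample" | grep -E -c '^(udp|tcp)')

# -e only introduces a pattern; the syntax stays basic, so this looks for
# the literal text "(udp|tcp)" at the start of a line and finds nothing.
with_e=$(printf '%s\n' "$sample" | grep -c -e '^(udp|tcp)' || true)

# Several -e patterns are ORed together, which avoids | entirely.
two_e=$(printf '%s\n' "$sample" | grep -c -e '^udp' -e '^tcp')

echo "-E: $with_E  single -e: $with_e  two -e: $two_e"
```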

pattern matching while using ls command in bash script

In a sh script, I am trying to loop over all files that match the following pattern:
abc.123, that is, abc. followed only by digits; the number after the . can be of any length.
Using
$ shopt -s extglob
$ ls abc.+([0-9])
does the job in the terminal, but not from the script. How can I get only the files that match the pattern?
If I understood you right, the pattern could be translated into this regex:
^abc\.[0-9]+$
So you could either keep using ls and grep the output, for example:
ls *.*|xargs -n1|grep -E '^abc\.[0-9]+$'
or use find, which has a -regex option on many systems.
If you're using sh and not bash, and presumably you also want to be POSIX compliant, you can use:
for f in ./*
do
    # Skip anything that is not ./abc.<digits>; note the escaped dot after abc
    echo "$f" | grep -Eq '^\./abc\.[0-9]+$' || continue
    echo "Something with $f here"
done
It will work fine with filenames with spaces, quotes and such, but may match some filenames with line feeds in them that it shouldn't.
If you tagged your question bash because you're using bash, then just use extglob like you described.
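For a plain sh script, the same filter can also be written without grep or extglob by testing the part after abc. with a case pattern. This is a swapped-in technique rather than what the answers above use, and the directory and file names below are made up for the demonstration.

```shell
# Work in a throwaway directory with a few hypothetical file names.
dir=$(mktemp -d)
touch "$dir/abc.123" "$dir/abc.7" "$dir/abc.12x" "$dir/abc.txt" "$dir/xyz.123"

matches=""
for f in "$dir"/abc.*; do
    suffix=${f##*/abc.}            # everything after "abc."
    case $suffix in
        ''|*[!0-9]*) continue ;;   # empty, or contains a non-digit: skip
    esac
    matches="$matches ${f##*/}"
done

echo "matched:$matches"
rm -rf "$dir"
```

Because the test runs inside the shell, file names with spaces or newlines are handled without piping them through another program.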

grep with regexp: whitespace doesn't match unless I add an assertion

GNU grep 2.5.4 on bash 4.1.5(1) on Ubuntu 10.04
This matches
$ echo "this is a line" | grep 'a[[:space:]]\+line'
this is a line
But this doesn't
$ echo "this is a line" | grep 'a\s\+line'
But this matches too
$ echo "this is a line" | grep 'a\s\+\bline'
this is a line
I don't understand why #2 does not match (whereas #1 does) and #3 also shows a match. What's the difference here?
Take a look at your grep manpage. Perl added a lot of regular expression extensions that weren't in the original specification. However, because they proved so useful, many programs adopted them.
Unfortunately, grep is sometimes stuck in the past because you want to make sure your grep command remains compatible with older versions of grep.
Some systems have egrep with some extensions. Others allow you to use grep -E to get them. Still others have a grep -P that allows you to use Perl extensions. I believe the grep command on Linux systems supports -P, which is not available on most Unix systems unless someone has replaced grep with the GNU version. On Mac OS X, support for the -P switch depends on the OS version, so check the manpage there.
grep doesn't support the complete set of regular expressions, so try using -P to enable Perl regular expressions. With -P you don't need to escape the +, i.e.:
echo "this is a line" | grep -P 'a\s+line'
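Where -P is not available, the POSIX character class from the first command in the question can be combined with -E. A minimal sketch, assuming only a grep that supports -E and -q:

```shell
line="this is a line"

# [[:space:]] is a POSIX character class; with -E the + needs no backslash.
if printf '%s\n' "$line" | grep -Eq 'a[[:space:]]+line'; then
    result="match"
else
    result="no match"
fi

echo "$result"
```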

Why does this `grep -o` fail, and how should I work around it?

Given the input
echo abc123def | grep -o '[0-9]*'
On one computer (with GNU grep 2.5.4), this returns 123, and on another (with GNU grep 2.5.1) it returns the empty string. Is there some explanation for why grep 2.5.1 fails here, or is it just a bug? I'm using grep -o in this way in a bash script that I'd like to be able to run on different computers (which may have different versions of grep). Is there a "right way" to get consistent behavior?
Yes, 2.5.1's -o handling was buggy:
http://www.mail-archive.com/bug-grep@gnu.org/msg00993.html
Grep is probably not the right tool for this; sed or tr or even perl might be better depending on what the actual task is.
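As a sketch of the tr route for this particular input, with a grep variant alongside it for comparison: the variable names are illustrative, and the grep line assumes a grep whose -o handling works.

```shell
input=abc123def

# tr -cd deletes every character that is NOT in the set, leaving the digits.
digits_tr=$(printf '%s' "$input" | tr -cd '0-9')

# With grep -o, requiring at least one digit ([0-9]+ under -E) rules out
# the empty matches that [0-9]* allows.
digits_grep=$(printf '%s\n' "$input" | grep -Eo '[0-9]+')

echo "tr: $digits_tr  grep: $digits_grep"
```

The tr variant does not depend on grep -o at all, but it joins every digit in the input into one string, so it only fits when a single number is expected.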
You can use the shell; it's faster:
$ str=abc123def
$ echo ${str//[a-z]/}
123
I had the same issue and found that egrep was installed on that machine. A quick solution was using
echo abc123def | egrep -o '[0-9]*'
This will give similar results:
echo abc123def | sed -n 's/[^0-9]*\([0-9]\+\).*/\1/p'
Your question is a near-duplicate of this one.
Because you are using a regex, you must use either:
grep -E
egrep (like Sebastian posted).
Good luck!