regex grep/egrep problems


Based on searches on Stack Overflow, I found out the difference between grep and egrep, but I still can't determine why this doesn't work. I've even checked it at https://regex101.com/ and it shows the pattern matching correctly.
Regex:
.*ping[\/] or ping\D
searching against (text.txt):
path=/bin/ping6
path=/bin/ping
I'm trying to skip the first and only find the 2nd.
If I would do grep ping text.txt it finds both which isn't what I want.
grep -e ".*ping[\/]" text.txt [doesn't work]
egrep ".*ping[\/]" text.txt [doesn't work]
grep -P ".*ping[\/]" text.txt [doesn't work]
I did get this to work but not sure why:
grep -P "ping\D" text.txt [worked]
grep -e "ping\D" text.txt [doesn't work]
What am I failing to understand with grep -e/egrep/grep -P/regex?
When I looked up -P in man grep, it said it's highly experimental and not to use it.

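In basic and extended regular expressions, \D is not a character class; GNU grep treats it as the letter "D". In Perl regexes (grep -P), \D is a non-digit and \d is a digit. Your first pattern, .*ping[\/], requires a slash (or backslash) right after "ping", and neither sample line has one, so it cannot match with any of the three options.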
To search for "ping" followed optionally by a digit followed by a slash, you want:
grep 'ping[[:digit:]]\?/'
grep -E 'ping[[:digit:]]?/'
To search for "ping" not followed by a digit:
grep -E 'ping($|[^[:digit:]])' # ping then end-of-line or non-digit
grep -P 'ping(?!\d)'
GNU extended regular expressions are documented in the GNU grep manual.
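As a quick check of the ERE suggestion above, the sketch below recreates the two sample lines from the question in a shell variable. The variable names are illustrative and not from the original post. It also counts how many lines have a slash directly after "ping", which shows why the original pattern finds nothing.

```shell
# Recreate the two sample lines from the question.
sample=$(printf '%s\n' 'path=/bin/ping6' 'path=/bin/ping')

# "ping" followed by end-of-line or a non-digit: only the second line qualifies.
no_digit=$(printf '%s\n' "$sample" | grep -E 'ping($|[^[:digit:]])')
printf '%s\n' "$no_digit"

# Count lines where a slash directly follows "ping": there are none here.
# grep exits non-zero when nothing matches, hence the "|| true".
slash_hits=$(printf '%s\n' "$sample" | grep -c 'ping/' || true)
printf 'lines containing "ping/": %s\n' "$slash_hits"
```

The first command uses only POSIX ERE features, so it should not depend on the experimental -P option.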

Related

sed from constant regex

I tried to remove the unwanted symbols
%H1256
*+E1111
*;E2311
+-'E3211
{E4511
DE4513
so I tried by using this command
sed 's/+E[0-9]/E/g'
but it won't remove the blank spaces, and the digits need to be preserved.
expected:
H1256
E1111
E2311
E3211
E4511
E4513
EDIT
Special thanks to https://stackoverflow.com/users/3832970/wiktor-stribiżew my days have been saved by him
sed -n 's/.*\([A-Z][0-9]*\).*/\1/p' file or grep -oE '[A-Z][0-9]+' file

You may use either sed:
sed -n 's/.*\([[:upper:]][[:digit:]]*\).*/\1/p' file
or grep:
grep -oE '[[:upper:]][[:digit:]]+' file
Basically, the patterns match an uppercase letter ([[:upper:]]) followed by digits ([[:digit:]]* matches 0 or more digits in the POSIX BRE sed solution, and [[:digit:]]+ matches 1 or more digits in the POSIX ERE grep solution).
While the sed solution extracts a single value (the last one) from each line, grep extracts every value it finds on every line.
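To make the difference between the two commands concrete, here is a small sketch that runs both on the sample data from the question and then on a made-up line holding two values. The line 'A1 B22' and the variable names are assumptions for illustration. Note that grep -o is a common extension (GNU and BSD grep) rather than a POSIX option.

```shell
# Sample lines from the question.
input=$(printf '%s\n' '%H1256' '*+E1111' '*;E2311' "+-'E3211" '{E4511' 'DE4513')

# sed keeps the last "uppercase letter + digits" token on each line.
from_sed=$(printf '%s\n' "$input" | sed -n 's/.*\([[:upper:]][[:digit:]]*\).*/\1/p')

# grep -o prints every "uppercase letter + one or more digits" token.
from_grep=$(printf '%s\n' "$input" | grep -oE '[[:upper:]][[:digit:]]+')

printf '%s\n' "$from_sed"

# Hypothetical line with two tokens: sed keeps the last, grep -o prints both.
multi_sed=$(printf '%s\n' 'A1 B22' | sed -n 's/.*\([[:upper:]][[:digit:]]*\).*/\1/p')
multi_grep=$(printf '%s\n' 'A1 B22' | grep -oE '[[:upper:]][[:digit:]]+')
```

On the question's data both commands give the same six values; they only diverge when a line contains more than one token.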

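This strips any leading non-alphanumeric characters (note that DE4513 is left unchanged, because D is alphanumeric):
sed -E 's/^[^[:alnum:]]+//' file
Or, if you only need the last 5 characters of each line:
sed -E 's/.*(.{5})$/\1/' file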
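The two sed commands behave differently on the DE4513 line from the question. A short sketch, using two of the sample lines and illustrative variable names:

```shell
# Two of the sample lines from the question.
input=$(printf '%s\n' '*+E1111' 'DE4513')

# Strip leading non-alphanumeric characters: the leading D survives.
stripped=$(printf '%s\n' "$input" | sed -E 's/^[^[:alnum:]]+//')
printf '%s\n' "$stripped"

# Keep only the last five characters of each line.
last5=$(printf '%s\n' "$input" | sed -E 's/.*(.{5})$/\1/')
printf '%s\n' "$last5"
```

The second form only fits if every wanted value is exactly five characters long, as it is in the expected output.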

Why can't I use ^\s with grep?

Both of the regexes below work in my case.
grep \s
grep ^[[:space:]]
However, all of those below fail. I tried in both Git Bash and PuTTY.
grep ^\s
grep ^\s*
grep -E ^\s
grep -P ^\s
grep ^[\s]
grep ^(\s)
The last one even produces a syntax error.
If I try ^\s in debuggex it works.
How do I find lines starting with whitespace characters with grep? Do I have to use [[:space:]]?

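grep \s appears to work because your input contains the letter s. The pattern is unquoted, so the shell removes the backslash and grep receives a plain s, which is not a whitespace escape. With grep ^\\s, the shell turns \\ into a single literal \, so grep receives ^\s and matches lines that start with whitespace. The last attempt, grep ^(\s), is a syntax error because the shell itself interprets the unquoted parentheses.
A better idea is to quote the pattern so the shell leaves it alone, optionally enabling ERE syntax with -E (\s is a GNU extension; [[:space:]] is the portable POSIX form):
grep -E '^\s' <<< "$s"
For example: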
s=' word'
grep ^\\s <<< "$s"
# => word
grep -E '^\s' <<< "$s"
# => word
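The quoting effect can be observed without grep at all, by printing what the shell actually hands to a command. The sketch below assumes a Bourne-style shell such as bash; the variable names are illustrative. The last command uses the portable [[:space:]] class instead of the GNU-specific \s.

```shell
# What does the command receive in each case?
unquoted=$(printf '%s' ^\s)     # shell drops the backslash: ^s
escaped=$(printf '%s' ^\\s)     # shell reduces \\ to \:      ^\s
quoted=$(printf '%s' '^\s')     # single quotes keep it as-is: ^\s
printf '%s | %s | %s\n' "$unquoted" "$escaped" "$quoted"

# Portable way to select lines that start with whitespace.
match=$(printf '%s\n' ' word' 'word' | grep '^[[:space:]]')
printf '[%s]\n' "$match"
```

Quoting the pattern in single quotes is the simplest habit, since it behaves the same for every regex flavour grep supports.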

sed regex with alternative on Solaris doesn't work

Currently I'm trying to use sed with regex on Solaris but it doesn't work.
I need to show only the lines matching my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the regex above:
grtad
a_pitr
baman
12353
ai345
Is there another way to use alternative? Is it possible in perl?
Thanks for any solutions.

With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ zero or one time, so it is optional: it may or may not be there.
The (a_) also captures the match, which is not needed, so you can use (?:a_)? instead. The ?: makes () only group what is inside (so ? applies to the whole thing) without remembering it.

with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x matches the whole line
-i ignores case
-E enables extended regex; if it is not available, use grep -xi '\(a_\)\?[a-z0-9]*' (\? is a GNU extension to basic regex)
(a_)? matches a_ zero or one time
[a-z0-9]* matches zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt
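If neither sed -E nor grep -E is available, the alternation can be avoided altogether. The sketch below shows two options that stay within basic regular expressions: a \{0,1\} interval in sed, and two separate -e patterns in grep. It assumes the local tools support BRE intervals and multiple -e options, which has not been verified on Solaris here; the variable names are illustrative.

```shell
# Input lines from the question.
input=$(printf '%s\n' grtad a_pitr _aupa a__as baman 12353 ai345 ki_ag \
  '-MXx2' '!!!23' '+_)#*')

# BRE only: \(a_\)\{0,1\} makes the "a_" prefix optional without -E.
bre=$(printf '%s\n' "$input" | sed -n '/^\(a_\)\{0,1\}[a-zA-Z0-9]*$/p')

# Two patterns instead of one alternation: a line is kept if either matches.
two=$(printf '%s\n' "$input" | grep -e '^[a-zA-Z0-9]*$' -e '^a_[a-zA-Z0-9]*$')

printf '%s\n' "$bre"
```

Both commands should print the same five lines the question asks for.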

Finding a repeated string with grep and "bounds"

I have a file test-matching.txt that looks like this:
ba
bababa
baba
babadooba
According to the grep man page, I should be able to get all but the first line using the expression
grep "ba{2,}" test-matching.txt
This should match all the lines containing instances of a string with 2 or more "ba's". However, when I run it, I get no output.
First I tried grep "ba" test-matching.txt just to make sure it was working at all, and it gave me all four lines as output.
I've also tried the following, each with no output:
With the -e option: grep -e "ba{2,}" test-matching.txt
With the -e option and single quotes: grep -e 'ba{2,}' test-matching.txt
With the -e option and escaped braces: grep -e "ba\{2,\}" test-matching.txt
Without the -e option and single quotes: grep 'ba{2,}' test-matching.txt
Without the -e option and escaped braces: grep "ba\{2,\}" test-matching.txt
With {2} instead of {2,}: grep -e 'ba{2}' test-matching.txt
With {2} instead of {2,} and the -e option: grep -e 'ba{2}' test-matching.txt
etc.
What is the correct way to match all the lines with "ba" concatenated 2 or more times?

Use egrep or grep -E (not grep -e) if you want to use Extended regular expression syntax. If you want to use basic regular expression syntax, you need to backslash-escape the braces. Finally, if you want to repeat ba, you need to group: egrep '(ba){2,}', or grep '\(ba\)\{2,\}' if you prefer using basic regular expressions.

In ba{2,} the quantifier applies only to the a, so it matches:
baa
baaa
baaaa
etc
You need (ba){2,} to make the quantifier apply to the whole group.
Try:
egrep "(ba){2,}" file
or
grep "\(ba\)\{2,\}" file
bababa
baba
babadooba
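The effect of grouping can be checked side by side. This sketch rebuilds the four-line file from the question in a variable and adds one extra line, baa, purely as an assumption to show what the ungrouped pattern matches.

```shell
# Contents of test-matching.txt from the question.
input=$(printf '%s\n' ba bababa baba babadooba)

# ERE with a group: "ba" repeated two or more times.
grouped=$(printf '%s\n' "$input" | grep -E '(ba){2,}')

# Same pattern in BRE: groups and braces are backslash-escaped.
bre=$(printf '%s\n' "$input" | grep '\(ba\)\{2,\}')

# Without the group, {2,} applies to "a" only, so just the added "baa" matches.
ungrouped=$(printf '%s\n' "$input" baa | grep -E 'ba{2,}')

printf '%s\n' "$grouped"
```

The grouped ERE and BRE forms select the same three lines, while the ungrouped form selects none of the original four.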

Bash (grep) regex performing unexpectedly

I have a text file which contains a date in the form dd/mm/yyyy (e.g. 20/12/2012).
I am trying to use grep to parse the date and show it in the terminal, and it works until I hit a certain case.
These are my test cases:
grep -E "\d*" returns 20/12/2012
grep -E "\d*/" returns 20/12/2012
grep -E "\d*/\d*" returns 20/12/2012
grep -E "\d*/\d*/" returns nothing
grep -E "\d+" also returns nothing
Could someone explain to me why I get this unexpected behavior?
EDIT: I get the same behavior if I substitute the " (weak quotes) for ' (strong quotes).

The syntax you used (\d) is not recognised by grep's extended regular expressions (the pattern is interpreted by grep, not by Bash).
Use grep -P instead, which uses Perl-compatible regex (PCRE). For example:
grep -P "\d+/\d+/\d+" input.txt
grep -P "\d{2}/\d{2}/\d{4}" input.txt # more restrictive
Or, to stick with extended regex, use [0-9] in place of \d:
grep -E "[0-9]+/[0-9]+/[0-9]+" input.txt
grep -E "[0-9]{2}/[0-9]{2}/[0-9]{4}" input.txt # more restrictive

You could also use -P instead of -E, which makes grep use PCRE syntax:
grep -P "\d+/\d+" file
works too.

grep and egrep/grep -E don't recognize \d as a digit class. Your first three patterns only appear to work because the asterisk makes \d optional: \d* can match the empty string, so the line is selected even though \d itself never matched a digit.
Use [0-9] or [[:digit:]] instead.
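The "optional" effect of the asterisk can be reproduced with any token that never occurs in the input. In this sketch the letter x stands in for the unrecognised \d (an assumption made to avoid relying on how a given grep treats stray backslashes), and the two test lines are illustrative.

```shell
# Two lines, neither of which contains the letter x.
lines=$(printf '%s\n' 'abc' '20/12/2012')

# x* may match the empty string, so every line is selected.
star_count=$(printf '%s\n' "$lines" | grep -Ec 'x*')

# x+ needs at least one real x, so nothing is selected.
# grep exits non-zero on no match, hence the "|| true".
plus_count=$(printf '%s\n' "$lines" | grep -Ec 'x+' || true)

printf 'x* selects %s lines, x+ selects %s lines\n' "$star_count" "$plus_count"
```

This mirrors the pattern in the question: the starred forms select the line regardless, and the + form does not.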

To help troubleshoot cases like this, the -o flag can be helpful as it shows only the matched portion of the line. With your original expressions:
grep -Eo "\d*" returns nothing - a clue that \d isn't doing what you thought it was.
grep -Eo "\d*/" returns / (twice) - confirmation that \d isn't matching while the slashes are.
As noted by others, the -P flag solves the issue by recognizing "\d", but to clarify Explosion Pills' answer, you could also use -E as follows:
grep -Eo "[[:digit:]]*/[[:digit:]]*/" returns 20/12/
EDIT: Per a comment by #shawn-chin (thanks!), --color can be used similarly to highlight the portions of the line that are matched while still showing the entire line:
grep -E --color "[[:digit:]]*/[[:digit:]]*/" returns 20/12/2012 (can't do color here, but the bold "20/12/" portion would be in color)
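Since the original goal was to pull the date out of a file, -o combined with a stricter ERE can extract it from a longer line. The sample sentence below is made up for illustration; only the date format comes from the question.

```shell
# Hypothetical line containing a dd/mm/yyyy date.
line='Report generated on 20/12/2012 at noon'

# -o prints only the matched text; intervals pin the field widths.
date_found=$(printf '%s\n' "$line" | grep -Eo '[0-9]{2}/[0-9]{2}/[0-9]{4}')
printf '%s\n' "$date_found"
```

The pattern checks only the shape of the date, not whether the day and month values are valid.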
