Regex doesn't work in grep

I'm trying to search for filenames in file.txt with this regexp: ([\w.-]+)[.]\w+/gm
On regexr.com it works well, but when I try to find them with grep using this command I get nothing:
grep -E "([\w.-]+)[.]\w+/gm" file.txt
What am I doing wrong?
Input:
hello.py fasdfasdf
fadsfsdf
f
file.docx fsdfasdf
fadsfsdf.fds
FILE.mp3
Output:
hello.py
file.docx
fadsfsdf.fds
FILE.mp3

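The trailing /gm is regexr's delimiter and flags, not part of the regex, so grep searches for a literal /gm and finds nothing. Also, \w is a Perl-style extension that POSIX bracket expressions do not understand: inside [\w.-] the backslash is taken literally. Either use the -P option with grep (if supported), or use a standard regular expression instead, with -o to print only the matching part:
grep -oE '[[:alnum:]_.-]+[.][[:alnum:]_]+' file.txt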
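As a rough illustration of the standard-regex route, here is a minimal sketch run against the sample input from the question. The function name extract_filenames is made up for this example, [[:alnum:]_] is used as the POSIX spelling of \w, and -o is assumed to be available (GNU and BSD grep have it).

```shell
# Hypothetical helper: print every "name.ext" token found on stdin.
# [[:alnum:]_] stands in for \w; note there is no trailing /gm.
extract_filenames() {
    grep -oE '[[:alnum:]_.-]+[.][[:alnum:]_]+'
}

# Sample input from the question.
printf '%s\n' 'hello.py fasdfasdf' 'fadsfsdf' 'f' \
    'file.docx fsdfasdf' 'fadsfsdf.fds' 'FILE.mp3' | extract_filenames
```

On that input this should print hello.py, file.docx, fadsfsdf.fds and FILE.mp3, one per line.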

Related

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). Note that grep -o is not specified by POSIX, although GNU and BSD grep both support it.
To me this looks like some sort of serialized Python data. I would try to find out the origin of that data and parse it properly.
However, while hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
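The question asks for both counters on one line. A minimal sketch of that, assuming current_count always appears before total_count on the line and that sed accepts -E (GNU and BSD sed do). The function name extract_counts is made up for this example.

```shell
# Hypothetical helper: print both counters from each matching line on stdin.
# -n suppresses default output; the p flag prints only lines where s/// matched.
# Assumes 'current_count' comes before 'total_count' on the line.
extract_counts() {
    sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p"
}

printf '%s\n' "foo" \
    "abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
    extract_counts
```

For the sample line this should print 'current_count': u'2', 'total_count': u'3', and the non-matching line foo is dropped.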

Regex to match an IP address within a colon and a slash with grep

The lines in the file I want to search look like this:
log:192.1.1.128/50098
log:192.1.1.11/22
...
I tried the following regexes, but none of them worked:
grep -oE "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" file
grep -oE "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b"
grep -oE "\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b"
You can do this without regex using awk (on this simple example):
awk -F":|/" '{print $2}' file
192.1.1.128
192.1.1.11
To check that the IP has four dot-separated parts (that is, three dots):
awk -F":|/" '{n=split($2,a,".");if (n==4) print $2}' file
192.1.1.128
192.1.1.11
You could use grep also.
$ grep -oP '.*?:\K[^/]*(?=/)' file
192.1.1.128
192.1.1.11
grep's extended regex option -E does not support \d; use [0-9] instead.
$ grep -oE "\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b" file
192.1.1.128
192.1.1.11
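The [0-9]{1,3} pattern also accepts values such as 999. If the octet range matters, a minimal sketch combining the awk field split with a numeric check could look like this. The function name valid_ips and the third sample line are made up for this example.

```shell
# Hypothetical helper: print the field between ":" and "/" when it is a
# dotted quad with every octet in the range 0..255.
valid_ips() {
    awk -F'[:/]' '{
        n = split($2, octet, ".")
        if (n != 4) next
        for (i = 1; i <= 4; i++)
            if (octet[i] !~ /^[0-9]+$/ || octet[i] + 0 > 255) next
        print $2
    }'
}

# The first two lines are from the question; the third is an invented bad address.
printf '%s\n' 'log:192.1.1.128/50098' 'log:192.1.1.11/22' 'log:999.1.1.1/80' | valid_ips
```

This should print the two addresses from the question and skip the 999.1.1.1 line.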

Finding a repeated string with grep and "bounds"

I have a file test-matching.txt that looks like this:
ba
bababa
baba
babadooba
According to the grep man page, I should be able to get all but the first line using the expression
grep "ba{2,}" test-matching.txt
This should match all the lines containing instances of a string with 2 or more "ba's". However, when I run it, I get no output.
First I tried grep "ba" test-matching.txt just to make sure it was working at all, and it gave me all four lines as output.
I've also tried the following, each with no output:
With the -e option: grep -e "ba{2,}" test-matching.txt
With the -e option and single quotes: grep -e 'ba{2,}' test-matching.txt
With the -e option and escaped braces: grep -e "ba\{2,\}" test-matching.txt
Without the -e option and single quotes: grep 'ba{2,}' test-matching.txt
Without the -e option and escaped braces: grep "ba\{2,\}" test-matching.txt
With {2} instead of {2,}: grep 'ba{2}' test-matching.txt
With {2} instead of {2,} and the -e option: grep -e 'ba{2}' test-matching.txt
etc.
What is the correct way to match all the lines with "ba" concatenated 2 or more times?
Use egrep or grep -E (not grep -e) if you want to use Extended regular expression syntax. If you want to use basic regular expression syntax, you need to backslash-escape the braces. Finally, if you want to repeat ba, you need to group: egrep '(ba){2,}', or grep '\(ba\)\{2,\}' if you prefer using basic regular expressions.
ba{2,} applies the repetition only to the a, so it matches:
baa
baaa
baaaa
etc.
You need (ba){2,} to make it work on the group.
Try:
egrep "(ba){2,}" file
or
grep "\(ba\)\{2,\}" file
bababa
baba
babadooba
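To see the difference side by side, here is a small sketch that feeds the lines from the question, plus an extra line baa added for illustration, through each pattern. The helper name sample is made up for this example.

```shell
# Hypothetical helper: the four lines from the question plus "baa".
sample() { printf '%s\n' ba bababa baba babadooba baa; }

# The bound applies to the single preceding atom: b followed by 2+ a's.
sample | grep -E 'ba{2,}'

# Grouping makes the whole "ba" repeat (extended syntax).
sample | grep -E '(ba){2,}'

# The same pattern in basic syntax: parentheses and braces are escaped.
sample | grep '\(ba\)\{2,\}'
```

The first command should print only baa; the other two should print bababa, baba and babadooba.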

Regular expression in Unix using or operator

I'm trying to print, using grep, the lines which contain vasile or the lines which contain ion. This is my command, but it doesn't work:
grep (vasile|ion) test.txt
I don't need this:
grep vasile test.txt | grep ion test.txt
Try:
terminal$ grep -e vasile -e ion test.txt
Another way, using the OR operator | in grep:
terminal$ grep 'vasile\|ion' test.txt
If you use awk, you can do:
awk '/vasile|ion/' test.txt
awk '/vasile/ || /ion/' test.txt
Try alternation with Grep's extended regex option:
grep -E 'vasile|ion' file
This should work with all POSIX greps. \| is a GNU extension to BRE.
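A quick sketch of the two portable forms on invented sample lines (the names below are not from the question, and the helper name sample is made up). The quotes matter: without them the shell tries to interpret the parentheses and the pipe itself.

```shell
# Hypothetical helper: made-up test data, one name per line.
sample() { printf '%s\n' 'vasile popescu' 'ion ionescu' 'maria'; }

# Several -e patterns: a line is printed if any one of them matches.
sample | grep -e vasile -e ion

# Extended regex alternation; the pattern is quoted so the shell leaves | alone.
sample | grep -E 'vasile|ion'
```

Both commands should print the first two lines and skip maria. Note that ion is matched as a substring, so a word like ionescu on its own would also count.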

Is this a grep bug?

I expect
egrep -i "((\w)\2){4,}" /usr/share/dict/words
to match the word 'subbookkeeper', but it does not.
Thoughts?
Apparently egrep doesn't support {m,n} repeat syntax:
$ egrep -i '((\w)\2)((\w)\4)((\w)\6)' words
bookkeeper
bookkeeping
subbookkeeper
$ egrep -i '((\w)\2)((\w)\4)((\w)\6)((\w)\8)' words
subbookkeeper
If you spell out the groups, it works.
This is on my Mac.
The problem seems to be that egrep is not resetting captured groups on repeats. Not sure if this is a bug or just ambiguity in what the notation implies. If you manually repeat then it should work:
egrep -i "(\w)\1(\w)\2(\w)\3(\w)\4" /usr/share/dict/words
However, it is strange that this does not work. This does work in perl:
perl -lne "print if /((\w)\2){3}/" /usr/share/dict/words
BTW, egrep does support {m,n} syntax. This proves that:
egrep -i "a{2}" /usr/share/dict/words
Your regex is correct and there is not a bug. /usr/share/dict/words does not contain the word "subbookkeeper".
On my FreeBSD system it did find a match:
[vaibhavc#freebsd-vai ~]$ cat acb
subbookkeeper
[vaibhavc#freebsd-vai ~]$ egrep "((\w)\2){4,}" -i acb
subbookkeeper
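Since results differ between systems, one way to sidestep both the word-list question and the repeated-group behaviour is to test on known input with the groups spelled out. A minimal sketch, assuming a grep that supports back-references with -E (GNU and BSD grep do; POSIX does not specify them for extended regexes). The function name four_doubles is made up, and [[:alpha:]] is used instead of \w for portability.

```shell
# Hypothetical helper: keep words containing four consecutive doubled letters.
# Each group is written out, so nothing depends on how a repeated group
# re-captures on each iteration.
four_doubles() {
    grep -Ei '([[:alpha:]])\1([[:alpha:]])\2([[:alpha:]])\3([[:alpha:]])\4'
}

printf '%s\n' bookkeeper bookkeeping subbookkeeper | four_doubles
```

Only subbookkeeper has four doubled letters in a row (bb, oo, kk, ee), so it should be the only line printed.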