grep to select strings that contain certain words - regex

I have a list:
/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
How can I grep so that only the strings ending in "Field" followed by digits, or simply in NRepeatLeft, are returned (in my example, the last three strings)?
Expected output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft

Try doing this :
grep -E "(Field[0-9]*|NRepeatLeft$)" file.txt
      |  |           |           ||
      |  |          OR   end_line |
      | opening_choice closing_choice
extended_grep
If you don't have the -E switch (ERE stands for Extended Regular Expression):
grep "\(Field[0-9]*\|NRepeatLeft$\)" file.txt
OUTPUT
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
That will grep for lines containing Field followed by zero or more digits, or lines ending in NRepeatLeft. Is that what you expect?

I am not sure how to use grep for your purpose. You might prefer Perl for this:
perl -lne 'print if /Field\d+$/ or /NRepeatLeft$/' your_file

$ grep -E '(Field[0-9]*|NRepeatLeft)$' file.txt
Output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Explanation:
Field # Match the literal word
[0-9]* # Followed by any number of digits
| # Or
NRepeatLeft # Match the literal word
$ # Match the end of the string
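The position of $ is the only difference between the two grep answers: in (Field[0-9]*|NRepeatLeft$) the anchor applies to NRepeatLeft alone, while in (Field[0-9]*|NRepeatLeft)$ it applies to both alternatives. A small sketch of that difference, assuming one extra made-up line (/device1/Field9/Other, which is not in the original list):

```shell
# Sample input; the last line is a hypothetical path added for illustration.
input='/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/NRepeatLeft
/device1/Field9/Other'

# $ inside the group: only NRepeatLeft is anchored, so Field9/Other matches too.
loose=$(printf '%s\n' "$input" | grep -E '(Field[0-9]*|NRepeatLeft$)')

# $ outside the group: both alternatives must end the line.
strict=$(printf '%s\n' "$input" | grep -E '(Field[0-9]*|NRepeatLeft)$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

For the list in the question both patterns give the same three lines; the anchored-group form is only stricter when Field can appear in the middle of a path.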

Related

non matching groups in grep regex not working

I would like to extract 1, 10, and 100 from:
1 one -args 123
10 ten -args 123
100 one hundred -args 123
However, this regex returns only 100:
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^(?=[ ]*)\d+(?=.*)'
100
Not ignoring the preceding spaces returns the numbers (but of course with undesired spaces):
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*\d+(?=.*)'
1
10
100
Have I misunderstood non capturing regex groups in grep / Perl (grep version 2.2, Perl as the -P flag should use its regex) or is this a bug? I notice the release notes for 2.6 says "This release fixes an unexpectedly large number of flaws, from outright bugs (surprisingly many, considering this is "grep")".
If someone with 2.6 could try these examples that would be valuable to determine if this is a bug (in 2.2) or intended behaviour.
The issue is what grep considers a 'match'. Unless you tell grep that part of the overall match should be left out, it prints everything from the start to the end of the match, regardless of groups.
Given:
$ echo "$txt"
1 one -args 123
10 ten -args 123
100 one hundred -args 123
You can get just the first column of digits without leading spaces several ways.
With GNU grep:
$ echo "$txt" | grep -Po '^[ ]*\K\d+'
1
10
100
Here \K works like a variable-length lookbehind: it resets the start of the reported match to the current position. The part to the left of \K must still match, but it is not included in the text that grep prints.
awk:
$ echo "$txt" | awk '/^[ ]*[0-9]+/{print $1}'
sed:
$ echo "$txt" | sed 's/^[ ]*\([0-9]*\).*/\1/'
Perl:
$ echo "$txt" | perl -lne 'print $1 if /^[ ]*\K(\d+)/'
And then if you want the matches on a single line, run through xargs:
$ echo "$txt" | grep -Po '^[ ]*\K(\d+)' | xargs
1 10 100
Or, if you are using awk or Perl, just change the way it is printed so that no newline is added.
You can delete the unwanted spaces this way :
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '^[ ]*(\d+)' | tr -d ' '
As for why it is not working: it is not a bug. It works as intended; you just misinterpreted how it should work.
If we focus on this ^(?=[ ]*)\d+:
The (?=[ ]*) part is a lookahead assertion. So it means that the regex engine tries to check if the ^ is followed by zero or more spaces. But the assertion itself is not part of the match, so in reality this code means :
- Match a ^ that is followed by 0 or more spaces
- After this ^, match one or more digits
So your pattern only matches when a digit is the first character of the line. The lookahead does not help in your use case.
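The difference between the zero-width lookahead and actually consuming the spaces can be checked side by side. This is a sketch that assumes GNU grep built with PCRE support (the -P option); the two leading spaces on the first input line are an assumption about the original data.

```shell
# Assumes GNU grep with -P (PCRE) available.
txt='  1 one
 10 ten
100 one hundred'

# The lookahead consumes nothing, so \d+ must start right at ^.
lookahead=$(printf '%s\n' "$txt" | grep -Po '^(?=[ ]*)\d+')

# Consume the spaces, then drop them from the reported match with \K.
keep_out=$(printf '%s\n' "$txt" | grep -Po '^[ ]*\K\d+')

printf 'lookahead: %s\n' "$lookahead"
printf 'with \\K:   %s\n' "$(printf '%s' "$keep_out" | tr '\n' ' ')"
```

Only the line that starts with a digit survives the lookahead version, while the \K version returns the first number of every line.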
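I think the anchor is what breaks the lookahead. A lookbehind would be the natural choice, but lookbehinds can't be variable-length (I always run into that one), so (?<=[ ]*) is not allowed. Without the anchor, the following works for the echo example (it matches every run of digits, so lines that contain other numbers, such as -args 123, would print those too):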
echo -e " 1 one\n 10 ten\n100 one hundred" | grep -Po '(?=[ ]*)\d+(?=.*)'
As for a better tool, I would use awk as it is suited to any column driven data. So if you were running it off of ps you could do something like:
ps | awk '/stuff you want to look for here/{print $1}'
awk will take care of all the white space by default

Regex behaviour with angle brackets

Please explain to me why the following expression doesn't output anything:
echo "<firstname.lastname#domain.com>" | egrep "<lastname#domain.com>"
but the following does:
echo "<firstname.lastname#domain.com>" | egrep "\<lastname#domain.com>"
The behaviour of the first is as expected but the second should not output. Is the "\<" being ignored within the regex or causing some other special behaviour?
As @hwnd said, \< matches the beginning of a word: there must be a word boundary (\b) at that position, and the character that follows \< in the input must be a word character.
In your example,
echo "<firstname.lastname#domain.com>" | egrep "<lastname#domain.com>"
Here egrep looks for a literal < character immediately before the lastname string. There isn't one (the preceding character is a .), so it prints nothing.
$ echo "<firstname.lastname#domain.com>" | egrep "\<lastname#domain.com>"
<firstname.**lastname#domain.com>**
But in this example, a word boundary \b exists before lastname string so it prints the matched characters.
Some more examples:
$ echo "namelastname#domain.com" | egrep "\<e#domain.com"
$ echo "namelastname#domain.com" | egrep "\<lastname#domain.com"
$ echo "namelastname#domain.com" | egrep "\<com"
namelastname#domain.**com**
$ echo "<firstname.lastname#domain.com>" | egrep "\<#domain.com>"
$ echo "n-ame-lastname#domain.com" | egrep "\<ame-lastname#domain.com"
n-**ame-lastname#domain.com**
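To see exactly which characters each pattern matches, grep -o can print only the matched part. A minimal sketch, assuming a grep that supports the \< start-of-word extension (GNU and BSD grep do) and using grep -E in place of egrep:

```shell
line='<firstname.lastname#domain.com>'

# Literal "<" before lastname: the input has "." there, so nothing matches.
literal=$(printf '%s\n' "$line" | grep -oE '<lastname#domain.com>' || true)

# \< is a zero-width start-of-word anchor: it matches between "." and "l".
anchored=$(printf '%s\n' "$line" | grep -oE '\<lastname#domain.com>' || true)

printf 'literal:  [%s]\nanchored: [%s]\n' "$literal" "$anchored"
```

The anchored pattern matches lastname#domain.com> because \< consumes no character; the . before lastname is not part of the match.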

regular expression (regex) of end of the string

I want to add the symbols related to the end of the string in my regexp
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3"
I tried the following symbols but they do not work
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3\Z"
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3$/"
How I can add end of string symbol to my regexp?
Update
question 2)
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3"
# ^
# |
# What symbols do I have to add here to say that I'm expecting the end of the string or anything except a digit [^0-9]?
Use echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3$"
Answer 2:
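Use echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3\([^0-9]\|$\)"
[^0-9]* on its own is not enough: it can match zero characters, so a following digit (ccc=35) would still match. The group requires either a non-digit or the end of the line.
Refer to Understanding Regular Expressions for more details.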
You can use
\($\|[^0-9]\)
to match either the end of input or a non-digit character.
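A quick way to check the "end of input or non-digit" rule is to run it against a few strings. The sketch below uses the ERE spelling of the same group (grep -E, so no backslashes on the parentheses and |); the second and third test strings are made up for illustration.

```shell
# ERE form of the same idea: after "ccc=3" require end of line or a non-digit.
pattern='aaa\.[^.]+\.ccc=3($|[^0-9])'

matched=""
for s in 'aaa.bbb.ccc=3' 'aaa.bbb.ccc=3;x' 'aaa.bbb.ccc=35'; do
    if printf '%s\n' "$s" | grep -Eq "$pattern"; then
        matched="$matched$s "
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

The string ending in 35 is rejected because the character after 3 is a digit and the line has not ended.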

Regex: replacing a string with prefix capture except for a given prefix

I want to replace a string, keeping the prefix, except when it contains a specific prefix.
For instance, any string like "(*)-bar" must be replaced with "(*)-blah" except when "(*)" matches "baz":
foo-bar => should return foo-blah
baz-bar => should remain baz-bar
The best I have so far trims the last letter of the prefix when replacing:
echo "foo-bar" | sed s/"[^(baz)]-bar"/$1-blah/
Use negative lookbehind:
s/(?<!baz)-bar/-blah/
Most sed implementations don't have this advanced regexp feature, but it should work in more modern languages, such as perl.
With sed :
$ echo "foo-bar" | sed '/^baz-/!s/-bar$/-blah/'
foo-blah
$ echo "baz-bar" | sed '/^baz-/!s/-bar$/-blah/'
baz-bar
If I decompose :
echo "baz-bar" | sed '/^baz-/!s/-bar$/-blah/'
                      |     |||            |
                      +regex+|+substitution+
                             |
                     negation of regex
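The negated-address technique can be applied directly to the two inputs from the question. This is a sketch under the assumption that the excluded prefix is exactly baz at the start of the line; replace_bar is a hypothetical helper name used only for this example.

```shell
# /^baz-/! restricts the substitution to lines that do NOT start with "baz-".
# replace_bar is a hypothetical helper, not part of sed.
replace_bar() {
    printf '%s\n' "$1" | sed '/^baz-/!s/-bar$/-blah/'
}

replace_bar 'foo-bar'
replace_bar 'baz-bar'
```

Because the prefix is never part of the substituted text, there is no need to capture it and put it back.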

grep for X or Y in unix?

how can I capture all lines from a text file that begin with the character "X" or contain the word "foo"?
This works:
cat text | grep '^#' # begins with #
but I tried:
cat text | grep '^#|[foo]'
and variations but cannot find the right syntax anywhere. how can this be done?
thanks.
Plain grep treats | as a literal character. Use grep -E for alternation or, if your grep doesn't support the POSIX -E option, egrep:
egrep '^#|foo' text
cat text | grep -E '^#|foo'
does this (without -E, grep would look for a literal | character). [foo] matches one character that's either an f or an o.
If you don't want to match parts of words like the foo in foobar, use word boundary anchors:
cat text | grep -E '^#|\bfoo\b'
The pattern for "contains the word foo" is (.*foo.*), so your regex would become (the .* on either side is optional, since grep matches anywhere in the line):
cat yourFilePath | grep -E '^#|(.*foo.*)'
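The variants above can be compared on a small sample. The input below is made up for illustration, and \b is a GNU-style extension that may not exist in every grep.

```shell
# Hypothetical sample file contents, for illustration only.
text='# a comment
plain line
some foo here
foobar only
# foo in a comment'

# ERE alternation: lines starting with "#" or containing "foo".
either=$(printf '%s\n' "$text" | grep -E '^#|foo')

# Same result with plain grep and two -e patterns (no alternation needed).
two_e=$(printf '%s\n' "$text" | grep -e '^#' -e 'foo')

# Whole-word variant: "foobar only" no longer matches.
whole=$(printf '%s\n' "$text" | grep -E '^#|\bfoo\b')

printf '%s\n---\n%s\n' "$either" "$whole"
```

Passing several -e patterns is a portable alternative when you would rather not rely on alternation syntax at all.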