Regular expression (regex) for the end of the string

I want to add an end-of-string symbol to my regexp:
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3"
I tried the following symbols, but they do not work:
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3\Z"
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3$/"
How can I add an end-of-string symbol to my regexp?
Update, question 2:
echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3"
# What do I have to add at the end of this pattern to say that I'm expecting either the end of the string or anything except a digit [^0-9]?

Answer 1:
Use echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3$"
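A minimal sketch for checking the anchored pattern, assuming GNU grep (\+ in a basic regex is a GNU extension). The helper name ends_with_ccc3 and the two extra sample strings are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the string matches the anchored pattern.
# $ pins the match to the end of the line; \+ assumes GNU grep.
ends_with_ccc3() {
  printf '%s\n' "$1" | grep -q 'aaa\.[^.]\+\.ccc=3$'
}

for s in 'aaa.bbb.ccc=3' 'aaa.bbb.ccc=34' 'aaa.bbb.ccc=3x'; do
  if ends_with_ccc3 "$s"; then
    printf 'match:    %s\n' "$s"
  else
    printf 'no match: %s\n' "$s"
  fi
done
```

With GNU grep this should report a match only for aaa.bbb.ccc=3, since $ allows nothing after the 3.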
Answer 2:
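Use echo aaa.bbb.ccc=3 | grep "aaa\.[^.]\+\.ccc=3[^0-9]*"
[^0-9]* can match zero characters, so it is also satisfied at the end of the string. For the same reason, though, it does not reject a following digit: aaa.bbb.ccc=34 matches this pattern as well.
Refer to Understanding Regular Expressions for more details.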

You can use
\($\|[^0-9]\)
to match either the end of input or a non-digit character.
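A sketch comparing the [^0-9]* answer with the alternation above, assuming GNU grep (\+ and \| in a basic regex are GNU extensions). The helper names loose and strict are hypothetical. Because [^0-9]* may match zero characters, the loose pattern still accepts a digit after the 3:

```shell
#!/bin/sh
# loose: the [^0-9]* pattern; it can match the empty string,
# so a trailing digit does not prevent a match.
loose() {
  printf '%s\n' "$1" | grep -q 'aaa\.[^.]\+\.ccc=3[^0-9]*'
}
# strict: after the 3, require end of line OR one non-digit character.
strict() {
  printf '%s\n' "$1" | grep -q 'aaa\.[^.]\+\.ccc=3\($\|[^0-9]\)'
}

for s in 'aaa.bbb.ccc=3' 'aaa.bbb.ccc=3 ' 'aaa.bbb.ccc=34'; do
  loose "$s" && l=yes || l=no
  strict "$s" && t=yes || t=no
  printf '%-16s loose=%s strict=%s\n' "'$s'" "$l" "$t"
done
```

Both patterns accept the first two strings, but only loose accepts aaa.bbb.ccc=34.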

Related

Regex for uppercase matches with exclusions

I'm trying to come up with a regex for the following case. Using grep, I need to find the paths that satisfy these rules:
Include all paths that contain uppercase letters.
Example:
com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar
Notice the capital B in Bar.
Exclude paths whose only uppercase letters are the ones in SNAPSHOT.
Example:
com/foo/bar/1.2.3-SNAPSHOT/bar-1.2.3-SNAPSHOT.jar
Is this possible with grep?
Something like this might do:
grep -vE '^([^[:upper:]]*(SNAPSHOT)?)*$'
Breakdown:
-v inverts the match (shows all non-matching lines); -E enables Extended Regular Expressions.
^ # Start of line
( )* # Capturing group repeated zero or more times
[^[:upper:]]* # Match all but uppercase zero or more times
(SNAPSHOT)? # Followed by literal SNAPSHOT zero or one time
$ # End of line
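A minimal sketch running this command on the two example paths from the question; the function name keep_uppercase_paths is hypothetical:

```shell
#!/bin/sh
# -v prints the lines that do NOT match, i.e. paths that have an
# uppercase letter somewhere outside the literal SNAPSHOT.
keep_uppercase_paths() {
  grep -vE '^([^[:upper:]]*(SNAPSHOT)?)*$'
}

printf '%s\n' \
  'com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar' \
  'com/foo/bar/1.2.3-SNAPSHOT/bar-1.2.3-SNAPSHOT.jar' |
  keep_uppercase_paths
```

Only the Bar path should be printed: the bar path matches the pattern from start to end, so -v drops it.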
Just use awk:
$ cat file
com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar
com/foo/bar/1.2.3-SNAPSHOT/bar-1.2.3-SNAPSHOT.jar
With GNU awk or mawk for gensub():
$ awk 'gensub(/SNAPSHOT/,"","g")~/[[:upper:]]/' file
com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar
With other awks:
$ awk '{r=$0; gsub(/SNAPSHOT/,"",r)} r~/[[:upper:]]/' file
com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar
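A sketch wrapping the portable awk version in a shell function (the name filter_paths is hypothetical) and feeding it the two example paths on standard input instead of a file. It assumes an awk that supports POSIX bracket classes such as [[:upper:]]:

```shell
#!/bin/sh
# Copy the line into r, delete every SNAPSHOT from the copy, and print
# the original line if the copy still contains an uppercase letter.
# With no file arguments, awk reads standard input.
filter_paths() {
  awk '{ r = $0; gsub(/SNAPSHOT/, "", r) } r ~ /[[:upper:]]/' "$@"
}

printf '%s\n' \
  'com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar' \
  'com/foo/bar/1.2.3-SNAPSHOT/bar-1.2.3-SNAPSHOT.jar' |
  filter_paths
```

Working on a copy keeps the printed line intact, which is what the gensub() variant achieves in a single expression.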
Well, you need find to list all paths. Then you can do it with two grep runs: the first keeps every path containing a capital letter, the second excludes the paths that contain no capitals except SNAPSHOT:
find . | grep '[A-Z]' | grep -v '.*\/[^A-Z]*SNAPSHOT[^A-Z]*$'
I think only the last grep needs some explanation:
grep -v excludes the matching lines
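.*\/ matches everything up to a slash. Since .* is greedy it prefers the last slash, but any slash after which the rest of the pattern fits will do. There'll always be a slash because of find . Note that only the text after that slash is checked, so a capital in an earlier directory name (as in ./Foo/bar-SNAPSHOT.jar) does not keep the path.
[^A-Z]* matches any run of characters that are not capital letters. It is applied before and after the SNAPSHOT literal, up to the end of the string.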
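A sketch of the same two-pass filter reading paths from printf instead of find, so it runs without a directory tree. The function name two_pass and the third path are hypothetical; that path shows that a capital in an earlier directory does not keep a path. The backslash before / in the original is not needed in grep (recent GNU grep releases may warn about it as a stray backslash), so it is dropped here:

```shell
#!/bin/sh
# Pass 1 keeps paths containing any capital letter.
# Pass 2 drops paths where, after some slash, the remainder has no
# capitals other than a single SNAPSHOT.
two_pass() {
  grep '[A-Z]' | grep -v '.*/[^A-Z]*SNAPSHOT[^A-Z]*$'
}

printf '%s\n' \
  './com/foo/Bar/1.2.3-SNAPSHOT/Bar-1.2.3-SNAPSHOT.jar' \
  './com/foo/bar/1.2.3-SNAPSHOT/bar-1.2.3-SNAPSHOT.jar' \
  './Foo/bar-SNAPSHOT.jar' |
  two_pass
```

Only the Bar path is expected to survive; ./Foo/bar-SNAPSHOT.jar is dropped because its last component alone satisfies the exclusion pattern.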
If you only want the matching files, I'd do it like this:
find . -type f -regex '.*[A-Z].*' | while read -r line; do echo "$line" | sed 's/SNAPSHOT//g' | grep -q '.*[A-Z].*' && echo "$line"; done

Regex behaviour with angle brackets

Please explain to me why the following expression doesn't output anything:
echo "<firstname.lastname#domain.com>" | egrep "<lastname#domain.com>"
but the following does:
echo "<firstname.lastname#domain.com>" | egrep "\<lastname#domain.com>"
The first behaves as expected, but the second should not output anything. Is the "\<" being ignored within the regex, or is it causing some other special behaviour?
As hwnd said, \< matches the beginning of a word: a word boundary \b must exist before the starting word character (the character after \< in the input must be a word character).
In your example,
echo "<firstname.lastname#domain.com>" | egrep "<lastname#domain.com>"
Here egrep looks for a literal < character immediately before the lastname string. There isn't one (lastname is preceded by a .), so nothing is printed.
$ echo "<firstname.lastname#domain.com>" | egrep "\<lastname#domain.com>"
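<firstname.lastname#domain.com>
But in this example, there is a word boundary before the lastname string (a . precedes it), so the line matches and is printed; with --color, the matched part lastname#domain.com> is highlighted.
Some more examples (a command with no output below it did not match):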
$ echo "namelastname#domain.com" | egrep "\<e#domain.com"
$ echo "namelastname#domain.com" | egrep "\<lastname#domain.com"
$ echo "namelastname#domain.com" | egrep "\<com"
namelastname#domain.com
$ echo "<firstname.lastname#domain.com>" | egrep "\<#domain.com>"
$ echo "n-ame-lastname#domain.com" | egrep "\<ame-lastname#domain.com"
n-ame-lastname#domain.com
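A sketch reproducing a few of these cases, assuming GNU grep (\< is a GNU extension, and other greps may treat it differently). It uses grep -E in place of egrep, which recent GNU grep releases flag as obsolescent; the helper show_match is hypothetical:

```shell
#!/bin/sh
# \< means "start of a word": the next input character must be a word
# character and the previous one must not be (or be the line start).
# The trailing > in these patterns is an ordinary literal character.
show_match() {
  if printf '%s\n' "$1" | grep -Eq "$2"; then
    printf 'match:    %s\n' "$2"
  else
    printf 'no match: %s\n' "$2"
  fi
}

show_match '<firstname.lastname#domain.com>' '<lastname#domain.com>'
show_match '<firstname.lastname#domain.com>' '\<lastname#domain.com>'
show_match 'namelastname#domain.com' '\<lastname#domain.com'
show_match 'n-ame-lastname#domain.com' '\<ame-lastname#domain.com'
```

The expected results are no match, match, no match, match: a . or - before the word start satisfies \<, a preceding letter does not.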

grep to select strings that contain certain words

I have a list:
/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
How can I grep so that only the strings ending with "Field" followed by digits, or with NRepeatLeft, are returned (in my example, the last three strings)?
Expected output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Try this:
grep -E "(Field[0-9]*|NRepeatLeft$)" file.txt
-E # Extended grep
( # Opening of the choice
| # OR
$ # End of line
) # Closing of the choice
If you don't have the -E switch (ERE: Extended Regular Expressions), use the GNU basic-regex form:
grep "\(Field[0-9]*\|NRepeatLeft$\)" file.txt
OUTPUT
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
That will grep for lines containing Field (optionally followed by digits) or lines ending with NRepeatLeft. Is that what you expect?
I'm not sure how to do this with grep; you might prefer perl:
perl -ne 'print if /Field\d+/ || /NRepeatLeft/' your_file
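A sketch checking that the -E form and the GNU basic-regex form select the same lines from the question's list (GNU grep assumed for \| in a basic regex):

```shell
#!/bin/sh
# The list from the question.
list='/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft'

# ERE form (-E) versus the GNU basic-regex form with \( \| \).
ere=$(printf '%s\n' "$list" | grep -E '(Field[0-9]*|NRepeatLeft$)')
bre=$(printf '%s\n' "$list" | grep '\(Field[0-9]*\|NRepeatLeft$\)')

if [ "$ere" = "$bre" ]; then
  printf 'same result:\n%s\n' "$ere"
else
  echo 'results differ'
fi
```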
$ grep -E '(Field[0-9]*|NRepeatLeft)$' file.txt
Output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Explanation:
Field # Match the literal word
[0-9]* # Followed by any number of digits
| # Or
NRepeatLeft # Match the literal word
$ # Match the end of the string
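A sketch contrasting this anchored pattern with the unanchored one from the first answer. The last sample line, /device1/element1/Field2/Direction, is hypothetical and is there to show the effect of the final $:

```shell
#!/bin/sh
# Three lines from the question plus one hypothetical line.
list='/device1/element1/CmdDiscovery
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/NRepeatLeft
/device1/element1/Field2/Direction'

# Anchored: the whole alternation must end the line.
anchored=$(printf '%s\n' "$list" | grep -E '(Field[0-9]*|NRepeatLeft)$')
# Unanchored Field branch: Field may appear anywhere in the line.
unanchored=$(printf '%s\n' "$list" | grep -E '(Field[0-9]*|NRepeatLeft$)')

printf 'anchored:\n%s\nunanchored:\n%s\n' "$anchored" "$unanchored"
```

The anchored pattern should select only the Field2 and NRepeatLeft lines from the question, while the unanchored one also picks up the hypothetical line, because Field2 appears in the middle of it.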

Regex: replacing a string with prefix capture except for a given prefix

I want to replace a string, keeping the prefix, except when it contains a specific prefix.
For instance, any string like "(*)-bar" must be replaced with "(*)-blah" except when "(*)" matches "baz":
foo-bar => should return foo-blah
baz-bar => should remain baz-bar
The best I have so far trims the last letter of the prefix when replacing:
echo "foo-bar" | sed s/"[^(baz)]-bar"/$1-blah/
Use negative lookbehind:
s/(?<!baz)-bar/-blah/
Most sed implementations don't support this advanced regex feature, but Perl and other languages with Perl-compatible regexes do.
With sed:
$ echo "foo-bar" | sed '/^foo-baz/!s/^foo-.*$/foo-blah/'
foo-blah
$ echo "foo-baz" | sed '/^foo-baz/!s/^foo-.*$/foo-blah/'
foo-baz
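Breaking it down:
echo "foo-baz" | sed '/^foo-baz/!s/^foo-.*$/foo-blah/'
/^foo-baz/ # regex (address): selects lines starting with foo-baz
! # negation of the regex: run the command on all other lines
s/^foo-.*$/foo-blah/ # substitution part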
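This answer keys on foo-baz, whereas the question's exception is the prefix baz. A sketch of the same address-negation idea adapted to that case, assuming the prefix is everything before the first - and that only a trailing -bar should be replaced:

```shell
#!/bin/sh
# /^baz-/! selects every line that does NOT start with "baz-";
# only on those lines is a trailing -bar replaced with -blah.
for s in foo-bar baz-bar; do
  printf '%s\n' "$s" | sed '/^baz-/!s/-bar$/-blah/'
done
```

This should print foo-blah followed by baz-bar, and it needs no lookbehind, so it works with any POSIX sed.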

Why does grep match all lines for the pattern "\'"

In this SO question there is something that I cannot explain:
grep "\'" input_file
matches all lines in the given file. Does \' have a special meaning for grep?
grep regex GNU extension: ‘\'’ matches the end of the whole input
I didn't know about this regex feature, but regular-expressions.info lists it as the end-of-string anchor.
It does not exist in all regex implementations, only in GNU Basic and Extended Regular Expressions; see this compatibility chart for more info.
That is really strange behaviour for grep, and I can't fully explain it, but note that \' doesn't match any character. It seems to have the same meaning as $:
$ echo x | grep "x\'"
x
$ echo xy | grep "x\'"
$ echo x | grep "\'x"
Update 1
As stated in http://www.gnu.org/software/findutils/manual/html_node/find_html/grep-regular-expression-syntax.html (thanks to Richard Sitze for the link), it really has the same meaning as $. However, I've since noticed a difference between \' and $:
$ echo x | grep 'x$'
x
$ echo x | grep 'x$$'
$ echo x | grep "x\'"
x
$ echo x | grep "x\'\'"
x
$ echo x | grep "x\'\'\'"
x
You can repeat \' as many times as you like, but not $: in a basic regex, a $ that is not at the end of the pattern (or of a group) is taken as a literal character.
Another important remark. The manual says:
‘\'’ matches the end of the whole input
But strictly speaking that's not true, because \' matches not only the end of the whole input but also the end of every single line:
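A sketch reproducing this comparison, assuming GNU grep (\' is a GNU extension). The helper check is hypothetical; the extra x$ input shows that in x$$ the first $ is matched literally:

```shell
#!/bin/sh
# Reports whether the basic regex $2 matches the string $1.
check() {
  if printf '%s\n' "$1" | grep -q "$2"; then
    printf 'match:    %s ~ %s\n' "$1" "$2"
  else
    printf 'no match: %s ~ %s\n' "$1" "$2"
  fi
}

check 'x'  'x$'           # $ as end anchor
check 'x'  'x$$'          # first $ is literal, so this needs "x$"
check 'x$' 'x$$'          # ...which this input provides
check 'x'  "x\'\'\'"      # repeated \' anchors still match
```

Expected: match, no match, match, match.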
$ (echo x; echo y) | grep "\'"
x
y
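Exactly as $ does. This is because grep matches each input line separately, so the "whole input" the anchor sees is a single line.
\ is an escape character. This means the ' should be considered as text to search for, and not as a control character.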
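A quick check, assuming GNU grep, that \' and $ both select every line of a two-line input; grep -c prints the number of matching lines:

```shell
#!/bin/sh
# grep applies the pattern to one line at a time, so both end anchors
# are satisfied once per line.
printf 'x\ny\n' | grep -c "\'"
printf 'x\ny\n' | grep -c '$'
```

Both commands should print 2.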