I want a regular expression that matches a string only when it contains an exact number of occurrences of '3', '2', and '1'.
For instance, given the string "(((3x2x2+1)x2x2+1)x2+1)", I want a regex that matches exactly one occurrence of '3', five occurrences of '2', and three occurrences of '1'. If there are more or fewer '3's, '2's, or '1's, the regex shouldn't match.
I have a solution that uses positive and negative lookaheads to do it in one regex. However, I must warn you that it can become very messy very quickly. For this problem I would advise simply counting the occurrences of each digit in your string without regexes.
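A minimal sketch of the counting approach, written in Python. The function name has_exact_counts and the dictionary of required counts are illustrative assumptions, not something from the question.

```python
from collections import Counter

def has_exact_counts(text, required):
    """Return True if every character in `required` occurs exactly that many times."""
    counts = Counter(text)  # characters that never occur count as 0
    return all(counts[ch] == n for ch, n in required.items())

required = {"3": 1, "2": 5, "1": 3}
print(has_exact_counts("(((3x2x2+1)x2x2+1)x2+1)", required))    # True
print(has_exact_counts("(((3x2x2+1)x2x2+1)x2+1+1)", required))  # False: four '1's
```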
That being said,
let's consider the following regex :
^(?=.*(?:1[^1]*){2})(?!.*(?:1[^1]*){3}).*$
^ and .*$ are there to say we want to match the whole string
?: just means we don't want to capture the group
?= is a positive lookahead, which means that our match must satisfy the condition .*(?:1[^1]*){2}
?! is a negative lookahead, which means that our match must NOT satisfy the condition .*(?:1[^1]*){3}
To summarize, the whole string is matched only if the positive lookahead condition holds (the digit 1 is present at least 2 times) and the negative lookahead condition does not (the digit 1 is present 3 or more times), which together mean exactly 2 times.
So in the example above 1*5*1 is matched, 11 is matched but 1*1*1 is not
So now, let's say you want your string to have exactly 3 '1', 1 '2', and 2 '3',
it will look like this
^(?=.*(?:1[^1]*){3})(?!.*(1[^1]*){4})(?=.*(?:2[^2]*){1})(?!.*(2[^2]*){2})(?=.*(?:3[^3]*){2})(?!.*(3[^3]*){3}).*$
It will match (1*2*1*3*1*3) but not (1*2*1*3*1*3*1) or (1*2*1*3*1)
Note that it matches (1*2*1*3*1*37) because there is a 3 in 37
Then again, I would advise against using this solution if you have many digits to check, as you need to write a positive and a negative lookahead for each one.
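Since one lookahead pair is needed per character, the pattern can be generated rather than typed by hand. The following is only a sketch in Python; the helper name exact_count_pattern is hypothetical, and it assumes every key is a single character.

```python
import re

def exact_count_pattern(required):
    """Build ^(?=...)(?!...)...$ with one lookahead pair per single character."""
    parts = []
    for ch, n in required.items():
        c = re.escape(ch)
        unit = f"(?:{c}[^{c}]*)"                  # one occurrence plus what follows it
        parts.append(f"(?=.*{unit}{{{n}}})")      # at least n occurrences
        parts.append(f"(?!.*{unit}{{{n + 1}}})")  # but not n + 1
    return re.compile("^" + "".join(parts) + ".*$")

pattern = exact_count_pattern({"1": 3, "2": 1, "3": 2})
for s in ["(1*2*1*3*1*3)", "(1*2*1*3*1*3*1)", "(1*2*1*3*1)"]:
    print(s, bool(pattern.match(s)))  # True, False, False
```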
I'd do it in 3 steps, as follows:
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
(((3x2x2+1)x2x2+1)x2+1)
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+0)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+1+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x0+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3x2x2+1)x2x2+1)x2+2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((0x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((3X3x2x2+1)x2x2+1)x2+1)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
Mac_3.2.57$echo "(((2x2+1)x2x2+1)x2+1+3)" | egrep '^([^3]*3[^3]*){1}$' | egrep '^([^2]*2[^2]*){5}$' | egrep '^([^1]*1[^1]*){3}$'
(((2x2+1)x2x2+1)x2+1+3)
Mac_3.2.57$
You can check for each one separately and then combine the results.
\A[^3]*(3[^3]*){1}\z
\A[^2]*(2[^2]*){5}\z
\A[^1]*(1[^1]*){3}\z
Depending on your regular expressions engine, you may need to use ^ and $ instead of \A and \z.
I initially thought that it may be possible to combine them, but that would still match if there is one '3' and one '2', for example. You'll probably need code similar to this:
match = input.match?(/[123]/)
match = match && input.match?(/\A[^3]*(3[^3]*){1}\z/) if input.include?('3')
match = match && input.match?(/\A[^2]*(2[^2]*){5}\z/) if input.include?('2')
match = match && input.match?(/\A[^1]*(1[^1]*){3}\z/) if input.include?('1')
Note that the code above assumes that it's ok to have some but not all of these characters as long as the existing ones match the requirements.
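The two readings of the requirement (all three digits required, or only the digits that happen to be present) can be sketched in Python as follows. The names PATTERNS, matches_all and matches_present are assumptions made for this illustration.

```python
import re

# \Z is Python's end-of-string anchor (the counterpart of \z in the patterns above).
PATTERNS = {
    "3": re.compile(r"\A[^3]*(?:3[^3]*){1}\Z"),
    "2": re.compile(r"\A[^2]*(?:2[^2]*){5}\Z"),
    "1": re.compile(r"\A[^1]*(?:1[^1]*){3}\Z"),
}

def matches_all(text):
    """Strict variant: every digit must occur its exact number of times."""
    return all(p.search(text) for p in PATTERNS.values())

def matches_present(text):
    """Lenient variant: only the digits that are present are checked."""
    present = [d for d in PATTERNS if d in text]
    return bool(present) and all(PATTERNS[d].search(text) for d in present)

print(matches_all("(((3x2x2+1)x2x2+1)x2+1)"))  # True
print(matches_all("1x1x1"))                    # False: no '3' and no '2'
print(matches_present("1x1x1"))                # True: the only digit present is fine
```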
It just needs a series of lookaheads to verify an exact number of specific characters.
This is a short version.
^(?=[^3]*3[^3]*$)(?=[^2]*(?:2[^2]*){5}$)(?=[^1]*(?:1[^1]*){3}$).+
https://regex101.com/r/2CJ2U6/1
^
(?= [^3]* 3 [^3]* $ ) # 1 of three
(?= # 5 of two
[^2]*
(?: 2 [^2]* ){5}
$
)
(?= # 3 of one
[^1]*
(?: 1 [^1]* ){3}
$
)
.+
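As a quick check, the same expanded pattern can be run with Python's re.VERBOSE flag against some of the test strings used earlier in this thread. This is only a sketch; the name EXACT is an assumption.

```python
import re

EXACT = re.compile(r"""
    ^
    (?= [^3]* 3 [^3]* $ )            # exactly one '3'
    (?= [^2]* (?: 2 [^2]* ){5} $ )   # exactly five '2's
    (?= [^1]* (?: 1 [^1]* ){3} $ )   # exactly three '1's
    .+
""", re.VERBOSE)

tests = [
    "(((3x2x2+1)x2x2+1)x2+1)",    # matches
    "(((3x2x2+1)x2x2+1)x2+0)",    # only two '1's
    "(((3X3x2x2+1)x2x2+1)x2+1)",  # two '3's
    "(((2x2+1)x2x2+1)x2+1+3)",    # matches
]
for s in tests:
    print(s, bool(EXACT.match(s)))
```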
Related
I want it to match
https://www.google.com/search?q=abcW&oq=abcW&ie=UTF-8
The other regex should match
https://www.google.com/search?q=abc4&oq=abc4&ie=UTF-8
Mac_3.2.57$echo "https://www.google.com/search?q=abcW&oq=abcW&ie=UTF-8" | grep 'search?q=.*W&'
https://www.google.com/search?q=abcW&oq=abcW&ie=UTF-8
Mac_3.2.57$echo "https://www.google.com/search?q=abc4&oq=abc4&ie=UTF-8" | grep 'search?q=.*[0-9]&'
https://www.google.com/search?q=abc4&oq=abc4&ie=UTF-8
Mac_3.2.57$
I want to match numbers without their signs or operators, so I came up with this regex:
echo "-123 +1234" | grep -Po '(?<=-)123 (?<=\+)1234'
but it does not match the string. Why are the two lookbehinds not working? If I do
echo "-123 +1234" | grep -Po '(?<=-)123'
I get the correct result 123, but when I do grep -Po '(?<=-)123 (?<=\+)1234' the second part does not match.
My desired result:
123 1234
In regex patterns, ab means a followed by b, which is another way of saying b preceded by a. Don't forget that (?<=...) matches zero characters from the perspective of the surrounding pattern, so it's as if it wasn't there from the point of view of the surrounding pattern. This means that (?<=-)123 (?<=\+)1234 will match a subset of what 123 1234 matches. It's of particular interest that the pattern will only match if 1234 is preceded by a space.
The subset of strings that match are those where 123 is preceded by a - (thanks to (?<=-)) and 1234 is preceded by a + (thanks to (?<=\+)). It's of particular interest that the pattern will only match if 1234 is preceded by a +.
Since (?<=-)123 (?<=\+)1234 will only match if 1234 is preceded by a space and preceded by a +, it will never match.
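The contradiction can be reproduced with Python's re module, which supports the same lookbehind syntax. This is only an illustrative sketch; the two alternative patterns are assumptions about what might be wanted.

```python
import re

text = "-123 +1234"

# Original pattern: 1234 would have to be preceded by a space AND by a '+'.
print(re.search(r"(?<=-)123 (?<=\+)1234", text))      # None

# Consuming the '+' as an ordinary character removes the contradiction.
print(re.search(r"(?<=-)123 \+1234", text).group())   # 123 +1234

# One lookbehind applied per number yields the digits only.
print(re.findall(r"(?<=[+-])\d+", text))              # ['123', '1234']
```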
It's not clear what you want.
Maybe you want this?
$ echo "-123 +1234" | grep -Po '\d+'
123
1234
Maybe you want this?
$ echo "-123 +1234" | perl -nle'@m=/\d+/g; print "@m" if @m'
123 1234
Maybe you want this?
$ echo "-123 +1234" | perl -nle'print "$1 $2" if /-(\d+) \+(\d+)/'
123 1234
It may be that you want to just match numbers
echo "-123 +1234" | perl -wnE'@m = /([0-9]+)/g; say for @m'
unless you'd like to match numbers only if they come with signs, in which case
echo "-123 +1234" | perl -wnE'@m = /(?<=[+-])([0-9]+)/g; say for @m'
or just
echo "-123 +1234" | perl -wnE'@m = /[+-]([0-9]+)/g; say for @m'
in which case the + or - are consumed.
If you really want to extract the numbers that are preceded by a - or a +, then you can use:
echo '-123 +1234 456' | grep -oP '(?<=-|\+)\d+'
123
1234
If you just want to extract any sequence of digits that is a word by itself, then grep's Perl-compatible mode (-P) is not required and you can simply use:
echo '-123 +1234 456' | grep -Eo '\b[0-9]+\b'
echo "xxabc jkl" | grep -onP '\w+(?!abc\b)'
1:xxabc
1:jkl
Why is the result not as below?
echo "xxabc jkl" | grep -onP '\w+(?!abc\b)'
1:jkl
The first word is xxabc, which ends with abc.
I want to extract all words that do not end with abc, so why is xxabc matched?
How can I fix it, that is to say, get only 1:jkl as output?
Why doesn't '\w+(?!abc\b)' work?
The \w+(?!abc\b) pattern matches xxabc because \w+ matches 1 or more word chars greedily, and thus grabs xxabc at once. Then, the negative lookahead (?!abc\b) makes sure there is no abc with a trailing word boundary immediately to the right of the current location. Since after xxabc there is no abc with a trailing word boundary, the match succeeds.
To match all words that do not end with abc using a PCRE regex, you may use
echo "xxabc jkl" | grep -onP '\b\w+\b(?<!abc)'
See the online demo
Details
\b - a leading word boundary
\w+ - 1 or more word chars
\b - a trailing word boundary
(?<!abc) - a negative lookbehind that fails the match if the 3 letters immediately to the left of the current location are abc.
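The difference between the two patterns can be checked with Python's re module, which handles these constructs the same way for this input. This is a sketch for illustration only.

```python
import re

text = "xxabc jkl"

# Lookahead version: \w+ swallows "xxabc" first, then nothing named "abc" follows.
print(re.findall(r"\w+(?!abc\b)", text))      # ['xxabc', 'jkl']

# Lookbehind version: after the whole word, check the 3 chars just consumed.
print(re.findall(r"\b\w+\b(?<!abc)", text))   # ['jkl']
```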
Without PCRE features, you can do it by piping through sed first:
echo "xxabc jkl" | sed 's/[a-zA-Z]*abc//g' | grep -onE '[a-zA-Z]+'
or with awk:
echo "xxabc jkl" | awk -F'[^a-zA-Z]+' '{for(i=1;i<=NF;i++){ if ($i!~/abc$/) printf "%s: %s\n",NR,$i }}'
other approach:
echo "xxabc jkl" | awk -F'([^a-zA-Z]|[a-zA-Z]*abc\\>)+' '{OFS="\n"NR": ";if ($1) printf OFS;$1=$1}1'
I need a regex able to match:
a) All combinations of lower-/upper-cases of a certain word
b) Except a couple of certain case-combinations.
I must search, from bash, through thousands of source-code files for occurrences of misspelled variables.
Specifically, the word I'm searching for is FrontEnd which in our coding-style guide can be written exactly in 2 ways depending on the context:
FrontEnd (F and E upper)
frontend (all lower)
So I need to "catch" any occurrences that do not follow our coding standards, such as:
frontEnd
FRONTEND
fRonTenD
I have been reading many regex tutorials for this specific case, and I cannot find a way to say "match this pattern BUT do not match if it is exactly this one or this other one".
I guess it would be similar to trying to match "any number between 000000 and 999999, except exactly the number 555555 or the number 123456"; I suppose the logic is similar (of course I don't know how to do this either :) )
Thanks
Additional comment:
I cannot use grep piped to grep -v because I could miss lines; for example if I do:
grep -i frontend | grep -v FrontEnd | grep -v frontend
would miss a line like this:
if( frontEnd.name == 'hello' || FrontEnd.value == 3 )
because the second occurrence would hide the whole line. Therefore I'm searching for a regex to use with egrep that can do the exact match I need.
You won't be able to do this easily with egrep because it doesn't support lookaheads. It's probably easiest to do this with perl.
perl -ne 'print if /(?!frontend|FrontEnd)(?i)frontend/;'
To use just pipe the text through stdin
How this works:
perl -ne 'print if /(?!frontend|FrontEnd)(?i)frontend/;'
perl : Perl command, love it or hate it.
-n : Wrap the code in while (<>) { }, which takes each line from stdin and puts it in $_.
-e : Evaluate Perl code from the command line.
print : Print the content of $_, which is filled in by the -n flag.
if : Begin the if statement; this expression uses Perl's reverse if semantics (expression1 if expression2;).
/ : Begin the regular expression match; returns true if it matches.
(?! : Negative lookahead; ensures that the good stuff won't be matched.
frontend|FrontEnd : The pattern that matches the correct versions.
(?i) : This switch turns on case-insensitive matching for the rest of the regular expression (use (?-i) to turn it off) (Perl specific).
frontend : The pattern that matches both the correct and incorrect versions.
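The same idea (a case-sensitive negative lookahead in front of a case-insensitive match) can be sketched in Python. Python does not accept a bare (?i) in the middle of a pattern, so the scoped form (?i:...) is used here instead; the name BAD_SPELLING is an assumption for this illustration.

```python
import re

# Case-sensitive lookahead rejects the two approved spellings,
# then the scoped (?i:...) group matches any casing of "frontend".
BAD_SPELLING = re.compile(r"(?!frontend|FrontEnd)(?i:frontend)")

line = "if( frontEnd.name == 'hello' || FrontEnd.value == 3 )"
print(BAD_SPELLING.findall(line))                      # ['frontEnd']
print(BAD_SPELLING.findall("FRONTEND fRonTenD frontend"))  # ['FRONTEND', 'fRonTenD']
```

Because the match is per occurrence rather than per line, the approved FrontEnd on the same line does not hide the misspelled frontEnd.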
This really should be a comment, but is there any reason you cannot use sed? I'm thinking something like
sed 's/frontend/FrontEnd/ig' input.txt
That is, of course, assuming you want to correct the deviant versions...
I have a list:
/device1/element1/CmdDiscovery
/device1/element1/CmdReaction
/device1/element1/Direction
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
How can I grep so that only the strings ending in "Field" followed by digits, or ending in NRepeatLeft, are returned (in my example, the last three strings)?
Expected output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Try doing this :
grep -E "(Field[0-9]*|NRepeatLeft$)" file.txt
-E : extended grep
( : opens the group of alternatives
| : OR
$ : end of line
) : closes the group of alternatives
If you don't have the -E switch (it stands for ERE: Extended Regular Expression):
grep "\(Field[0-9]*\|NRepeatLeft$\)" file.txt
OUTPUT
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
That will grep for lines containing Field followed by zero or more digits, or lines with NRepeatLeft at the end. Is that what you expect?
I am not sure how to use grep for your purpose. You might prefer perl for this:
perl -lne 'if(/Field[\d]+/ or /NRepeatLeft/){print}' your_file
$ grep -E '(Field[0-9]*|NRepeatLeft)$' file.txt
Output:
/device1/element1/MS-E2E003-COM14/Field2
/device1/element1/MS-E2E003-COM14/Field3
/device1/element1/MS-E2E003-COM14/NRepeatLeft
Explanation:
Field # Match the literal word
[0-9]* # Followed by any number of digits
| # Or
NRepeatLeft # Match the literal word
$ # Match the end of the string
You can see how this works with your example here.
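The anchored alternation can also be tried outside grep. The sketch below uses Python and, as an assumption that goes beyond the answers above, requires at least one digit after Field ([0-9]+ instead of [0-9]*), since the question asks for "Field" followed by digits.

```python
import re

paths = [
    "/device1/element1/CmdDiscovery",
    "/device1/element1/CmdReaction",
    "/device1/element1/Direction",
    "/device1/element1/MS-E2E003-COM14/Field2",
    "/device1/element1/MS-E2E003-COM14/Field3",
    "/device1/element1/MS-E2E003-COM14/NRepeatLeft",
]

# The $ sits outside the group, so it applies to both alternatives.
WANTED = re.compile(r"(Field[0-9]+|NRepeatLeft)$")

selected = [p for p in paths if WANTED.search(p)]
print("\n".join(selected))
```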