egrep matching patterns containing 3 repeated digits, not consecutive - regex

Use egrep to match lines containing the same digit three times, not necessarily consecutively, e.g. "3 33", "55 5", "666" or "a6b6c6d". Here is my initial attempt.
I tried:
egrep '1[^1]*1[^1]*1' test
This recognizes strings like 1abd1df31.
However, I don't want to enumerate every digit from 0 to 9. How can I generalize this using a back-reference?
Thanks in advance!
NOTE: the three digits must be identical, i.e. 3aa2aa1aa should not match.

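This will do it (it uses lookahead and \d, so it needs a PCRE-capable engine such as grep -P rather than plain egrep):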
/(?=.*?(\d))(?:(?:.*?\1){3})/
EXPLANATION:
(?=.*?(\d))(?:(?:.*?\1){3})
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*?(\d))»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference number 1 «(\d)»
Match a single digit 0..9 «\d»
Match the regular expression below «(?:(?:.*?\1){3})»
Match the regular expression below «(?:.*?\1){3}»
Exactly 3 times «{3}»
Match any single character that is not a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the same text as most recently matched by capturing group number 1 «\1»
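To try the pattern above outside a regex tester, here is a minimal Python sketch. It assumes Python's re module, which supports lookahead and \d much like PCRE; the name has_triple_digit and the sample strings are only for illustration.

```python
import re

# Pattern from the answer above: the lookahead captures a digit,
# then that same digit must be found three times from the match start.
TRIPLE_DIGIT = re.compile(r"(?=.*?(\d))(?:(?:.*?\1){3})")

def has_triple_digit(line: str) -> bool:
    """Return True if some digit occurs at least three times in the line."""
    return TRIPLE_DIGIT.search(line) is not None

for sample in ["3 33", "55 5", "666", "a6b6c6d", "3aa2aa1aa"]:
    print(repr(sample), has_triple_digit(sample))
```

The last sample contains three different digits, so it is the only one expected to report False.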

This works for simple cases:
egrep '^[^0-9]*([0-9])[^0-9]*\1[^0-9]*\1[^0-9]*$'
Explanation:
[^0-9]* zero or more non-digits
([0-9]) one digit captured with parens
\1 back-reference to the captured digit
[^0-9]* zero or more non-digits
^ and $ beginning and end of line
Caveat programmor:
It matches 3 foo 3 bar 3 but fails for 3 4 3 baz 3. In other words, no other digits are allowed in the line, just the 3 you're looking for.
Try this Perl one-liner to match the tricky cases with multiple digit types.
perl -ne '$i=$_;%a=();$a{$_}++for(split//,$i);for(0..9){if($a{$_}==3){print $i;last}}'
For each line $i, it builds a hash %a keyed by each character of the line, storing occurrence counts. It then checks each digit for an occurrence count of exactly 3; if one is found, line $i is printed.
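The same counting idea can be sketched in Python for readers who prefer it to the Perl one-liner. This is a loose translation, not the answerer's code: digit_counts_match is a made-up name, exact=True mirrors the one-liner's == 3 test, and exact=False accepts three or more occurrences, which may be closer to what the question asks.

```python
from collections import Counter

def digit_counts_match(line: str, times: int = 3, exact: bool = True) -> bool:
    """Check whether any digit 0-9 occurs `times` times in the line."""
    # Count only the ASCII digits; every other character is ignored.
    counts = Counter(ch for ch in line if ch in "0123456789")
    if exact:
        return any(n == times for n in counts.values())
    return any(n >= times for n in counts.values())

for sample in ["3 foo 3 bar 3", "3 4 3 baz 3", "3aa2aa1aa"]:
    print(repr(sample), digit_counts_match(sample))
```

Because the counts are kept per digit, other digits in the line do not get in the way, which is the case the anchored egrep pattern could not handle.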

Related

Regex to match string of limited length and if number digit are present it can't be more than 5 digit

I'm looking for a regex that matches strings with the requirements below:
The string must be 5 to 15 characters long.
It must be alphanumeric. It may be fully alphabetic; if digits are present, there must be no more than 5 of them, and they can appear anywhere in the string.
Examples of accepted input:
helloworld
123helloworld56
1h2e3l4l5oworld
12345
If there are more than 5 digits, the input must be rejected. Examples of rejected input:
123456
123hello4567
So far I have tried the following, pieced together from online searches and some tweaking, but none of them work as expected.
^(?=.*\d?.*\d?.*\d?.*\d?.*\d?).{0,15}$
^(?=[a-zA-Z1-9]{5,15}$)[a-zA-Z]{1,15}[1-9]{0,5}$
^(?=.*\d){0,5}.{0,15}$
I have been stuck on this for some time now; any help is appreciated!
If there cannot be more than 5 digits in total, that means you should not be able to match 6 digits.
You can use a negative lookahead to assert that what is on the right cannot match 6 digits.
^(?!(?:[^\d\r\n]*\d){6})[a-zA-Z0-9]{5,15}$
Explanation
^ Start of string
(?! Negative lookahead, assert what is at the right is not
(?:[^\d\r\n]*\d){6} Match 6 times any char except a newline or a digit, then match a digit
) Close lookahead
[a-zA-Z0-9]{5,15} Match 5-15 times any of the listed in the character class
$ End of string
Note that [1-9] in a character class does not match 0, whereas \d does.
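A quick way to check the pattern against the examples from the question is a small Python sketch like the one below. The pattern is copied unchanged from the answer; the name is_valid is hypothetical.

```python
import re

# Pattern from the answer above, unchanged.
LIMITED_DIGITS = re.compile(r"^(?!(?:[^\d\r\n]*\d){6})[a-zA-Z0-9]{5,15}$")

def is_valid(text: str) -> bool:
    """True for 5-15 alphanumeric characters with at most 5 digits."""
    return LIMITED_DIGITS.match(text) is not None

accepted = ["helloworld", "123helloworld56", "1h2e3l4l5oworld", "12345"]
rejected = ["123456", "123hello4567"]

for text in accepted + rejected:
    print(repr(text), is_valid(text))
```

The four accepted examples should print True and the two rejected ones False.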
About the patterns in the question
^(?=.*\d?.*\d?.*\d?.*\d?.*\d?).{0,15}$
Here, the lookahead will always be true as all the parts in it are optional. It could also match an empty string as the quantifier {0,15} starts at 0, which makes it optional.
^(?=[a-zA-Z1-9]{5,15}$)[a-zA-Z]{1,15}[1-9]{0,5}$
The lookahead asserts that the string consists of 5-15 characters from the character class. But the match itself must start with 1-15 characters a-zA-Z, followed by 0-5 digits at the end of the string, so digits are only accepted at the end.
^(?=.*\d){0,5}.{0,15}$
The lookahead is quantified with {0,5}, which makes it optional, so the assertion always succeeds. Then the pattern matches 0-15 times any char.
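The first two observations can be reproduced with a short Python sketch. The variable names are made up, and the third pattern is left out because support for a quantified lookahead varies between engines.

```python
import re

# First pattern from the question: every part of the lookahead is optional.
always_true = re.compile(r"^(?=.*\d?.*\d?.*\d?.*\d?.*\d?).{0,15}$")
# Second pattern from the question: letters first, digits only at the end.
digits_last = re.compile(r"^(?=[a-zA-Z1-9]{5,15}$)[a-zA-Z]{1,15}[1-9]{0,5}$")

print(bool(always_true.match("")))                 # empty string is accepted
print(bool(always_true.match("123456")))           # six digits are accepted
print(bool(digits_last.match("123helloworld56")))  # leading digits are refused
print(bool(digits_last.match("hello12345")))       # trailing digits are fine
```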

how to match a list of fixed length words separated by space or comma?

Each word must be 2 or 6-10 characters long, and words are separated by a space or a comma. Words contain only letters and are not case sensitive.
Here is a group of words that should match:
RE,re,rereRE
Groups that should not match:
RE,rere,rel
RE,RERE
Here is the pattern that I have tried
((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|\s+)?)
Unfortunately, this pattern also matches strings like RE,RERE.
It looks like the word boundary has not been set.
You could match chars a-z either 2 or 6-10 times using an alternation,
then repeat that pattern 0+ times, each time preceded by a comma or a space [ ,].
^(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*$
Explanation
^ Start of string
(?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match chars a-z 6-10 or 2 times
(?: Non capturing group
[, ](?:[A-Za-z]{6,10}|[A-Za-z]{2}) Match comma or space and repeat previous pattern
)* Close non capturing group and repeat 0+ times
$ End of string
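Since the word alternation appears twice in the pattern, one way to keep it readable is to build it from a single piece. A minimal Python sketch, assuming re.fullmatch stands in for the ^ and $ anchors; WORD, LIST_RE and is_valid_list are names invented for this example:

```python
import re

# One word: 6-10 letters or exactly 2 letters.
WORD = r"(?:[A-Za-z]{6,10}|[A-Za-z]{2})"
# A word, then any number of "separator + word" repetitions.
LIST_RE = re.compile(rf"{WORD}(?:[, ]{WORD})*")

def is_valid_list(text: str) -> bool:
    # fullmatch requires the whole string to match, like ^...$ in the answer.
    return LIST_RE.fullmatch(text) is not None

for text in ["RE,re,rereRE", "RE,rere,rel", "RE,RERE"]:
    print(repr(text), is_valid_list(text))
```

Only the first string should be accepted: the other two contain a 4-letter or 3-letter word.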
If lookarounds are supported, you might also assert what is directly on the left and on the right is not a non whitespace character \S.
(?<!\S)(?:[A-Za-z]{6,10}|[A-Za-z]{2})(?:[ ,](?:[A-Za-z]{6,10}|[A-Za-z]{2}))*(?!\S)
([a-zA-Z]{2}(,|\s)|[a-zA-Z]{6,10}|(,|\s))
This one will get only the words that have 2 letters, or between 6 and 10
\b,?([a-zA-Z]{6,10}|[a-zA-Z]{2}),?\b
You can use this
^(?!.*\b[a-z]{4}\b)(?:(?:[a-z]{2}|[a-z]{6,10})(?:,|[ ]+)?)+$
This regex will match your first case, but neither of your two other cases:
^((([a-zA-Z]{2})|([a-zA-Z]{6,10}))(,|[ ]+|$))+$
I'm making the assumption here that each line should be a single match.

Regex to require space after comma in list

I want to require a space after every comma in a list. I've got this, which works pretty well for my lists that have 5 to 7 digits, separated by commas.
^([^,]{5,7},)*[^,][^ ]{5,7}$
The problem is it allows 12345,12345. I don't want that to pass. 12345, 12345 should pass. I also need just 12345 to pass, so the comma and space is not required if it's just one 5-7 digit number.
Your regex does not match 12345,12345 because this part ([^,]{5,7},)* will match from the start including the comma.
Then it matches not a comma [^,] which will match the second 1 and then it has to match not a whitespace [^ ]{5,7} but there are only 4 characters left to match which are 2345 and it can not match.
If the first part fails it tries to match [^,][^ ]{5,7} which in total matches 6-8 characters.
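The walk-through above can be checked with a few lines of Python. This is only a sketch to observe the behaviour of the pattern from the question; the test strings other than 12345,12345 and 12345, 12345 are my own additions.

```python
import re

# Pattern from the question, unchanged.
original = re.compile(r"^([^,]{5,7},)*[^,][^ ]{5,7}$")

samples = ["12345,12345", "12345,123456", "12345, 12345", "12345", "123456"]
for text in samples:
    print(repr(text), bool(original.match(text)))
```

Because the tail [^,][^ ]{5,7} needs 6-8 characters, a lone 12345 is rejected while 123456 passes, and 12345,123456 passes without any space after the comma.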
You might use:
^[^,\s]{5,7}(?:, [^,\s]{5,7})*$
^ Start of the string
[^,\s]{5,7} Match any character other than whitespace or a comma, 5-7 times
(?: Non capturing group
, [^,\s]{5,7} Match a comma, space and not a comma or a whitespace character 5-7 times
)* Close non capturing group and repeat 0+ times
$ End of the string
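As a rough check of the suggested pattern, the following Python sketch runs it over the cases mentioned in the question. The name is_spaced_list is hypothetical.

```python
import re

# Pattern from the answer above, unchanged.
SPACED_LIST = re.compile(r"^[^,\s]{5,7}(?:, [^,\s]{5,7})*$")

def is_spaced_list(text: str) -> bool:
    """True if items of 5-7 characters are separated by a comma and one space."""
    return SPACED_LIST.match(text) is not None

for text in ["12345", "12345, 12345", "12345,12345"]:
    print(repr(text), is_spaced_list(text))
```

The first two strings should pass and the last one should fail, since its comma is not followed by a space.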
I didn't understand your regex, but something as simple as this should work:
^(?:\d{5,7}, )*\d{5,7}$
Or if you didn't intend to allow digit-only,
^(?:[^, ]{5,7}, )*[^, ]{5,7}$

Regex Pattern where group may not exist

I have a RegEx pattern that needs to match on any of the following lines:
10-10-15 15:16:41.1 Some Text here
10-10-15 15:16:41.12 Some Text here
10-10-15 15:16:41.123 Some Text here
10-10-15 15:16:41 Some Text here
I can match the first 3 with the pattern below:
(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})\.(?<milli>\d{0,3})))\s(?<Line>.*)
How do I match this line (10-10-15 15:16:41 Some Text here), which has no milliseconds, and still get the group back in my result, either with a blank value or with 0 as the value?
Thanks
As I said, each of the lines below will match:
10-10-15 15:16:41.123 Some text Here
10-10-15 15:16:41.12 Some Text here
10-10-15 15:16:41.1 Some Text here
10-10-15 15:16:41. Some Text here
The groups look like so:
date [0-18] `10-10-15 15:16:41.`
day [0-2] `10`
month [3-5] `10`
year [6-8] `15`
time [9-18] `15:16:41.`
hour [9-11] `15`
minutes [12-14] `16`
seconds [15-17] `41`
milli [18-18] ``
Line [19-34] `Some Text here `
You can use the following (slightly modified version of your regex):
(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.\d{0,3})?))\s(?<logEntry>.*)
Explanation:
Make the whole <milli> part optional, including the dot, rather than making only the . optional. Otherwise the pattern would also match strings like 10-10-15 15:16:41123 Some Text here.
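For comparison, here is how the same idea might look in Python. This is a sketch under a few assumptions rather than the answer's exact pattern: Python spells named groups (?P<name>...), the outer date and time groups are dropped for brevity, the dot is kept outside the milli group so only the digits are captured, and parse_line is a made-up helper. groupdict(default="0") supplies the 0 the question asks for when the group did not take part in the match.

```python
import re

LINE_RE = re.compile(
    r"(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>\d{4}|\d{2})\s"
    r"(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?:\.(?P<milli>\d{0,3}))?"  # the whole ".123" part is optional
    r"\s(?P<logEntry>.*)"
)

def parse_line(line: str):
    """Return a dict of named groups, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # Groups that did not participate in the match get the default "0".
    return match.groupdict(default="0")

print(parse_line("10-10-15 15:16:41.12 Some Text here"))
print(parse_line("10-10-15 15:16:41 Some Text here"))
```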
Worked it out. I needed the following pattern:
(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.?\d{0,3})))\s(?<logEntry>.*)
^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)
Note the (\d*) which will return the group even if empty.
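The difference between (\d*) and an optional (\d+)? is easy to see in Python, where a group that did not take part in the match comes back as None. The sketch below uses the pattern above plus a variant of my own for contrast; other engines may report the unmatched group differently.

```python
import re

line = "10-10-15 15:16:41 Some Text here"

# Pattern from the answer above: (\d*) always participates in the match.
star = re.match(
    r"^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)", line
)
# Variant for contrast (not from the answer): the group itself is optional.
optional = re.match(
    r"^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d+)?([a-zA-Z\s]+)", line
)

print(repr(star.group(7)))      # empty string
print(repr(optional.group(7)))  # None
```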
Make the milliseconds optional with ?
/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/
Example:
<?php
$strings = <<< LOL
10-10-15 15:16:41.1 Some Text here
10-10-15 15:16:41.12 Some Text here
10-10-15 15:16:41.123 Some Text here
10-10-15 15:16:41 Some Text here
LOL;
preg_match_all('/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/m', $strings , $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); $i++) {
    $day = $matches[1][$i];
    $month = $matches[2][$i];
    $year = $matches[3][$i];
    $hours = $matches[4][$i];
    $minutes = $matches[5][$i];
    $seconds = $matches[6][$i];
    $ms = $matches[7][$i];
    $text = $matches[8][$i];
    echo "$day $month $year $hours $minutes $seconds $ms $text \n";
}
Regex Demo:
https://regex101.com/r/aF9wN6/1
PHP Demo:
http://ideone.com/1aEt2E
Regex Explanation:
^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$
Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) «^»
Match the regex below and capture its match into backreference number 1 «([\d]{2})»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 2 «([\d]{2})»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 3 «([\d]{2}|[\d]{4})»
Match this alternative (attempting the next alternative only if this one fails) «[\d]{2}»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Or match this alternative (the entire group fails if this one fails to match) «[\d]{4}»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{4}»
Exactly 4 times «{4}»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 4 «([\d]{2})»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 5 «([\d]{2})»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 6 «([\d]{2})»
Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
Exactly 2 times «{2}»
Match the character “.” literally «\.?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regex below and capture its match into backreference number 7 «(\d+)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 8 «(.*?)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line (at the end of the string or before a line break character) (line feed) «$»

Regex to find repeating numbers

Can anyone help me or direct me to build a regex to validate repeating numbers,
e.g. 11111111, 2222, 99999999999, etc.?
It should validate for any length.
\b(\d)\1+\b
Explanation:
\b # match word boundary
(\d) # match digit remember it
\1+ # match one or more instances of the previously matched digit
\b # match word boundary
If 1 should also be a valid match (zero repetitions), use a * instead of the +.
If you also want to allow longer repeats (123123123) use
\b(\d+)\1+\b
If the regex should be applied to the entire string (as opposed to finding "repeat-numbers" in a longer string), use start- and end-of-line anchors instead of \b:
^(\d)\1+$
Edit: How to match the exact opposite, i.e. a number where not all digits are the same (except if the entire number is simply a single digit):
^(\d)(?!\1+$)\d*$
^ # Start of string
(\d) # Match a digit
(?! # Assert that the following doesn't match:
\1+ # one or more repetitions of the previously matched digit
$ # until the end of the string
) # End of lookahead assertion
\d* # Match zero or more digits
$ # until the end of the string
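Both whole-string patterns can be tried out with a small Python sketch. It assumes re.fullmatch in place of the ^ and $ anchors; the function names are invented for the example.

```python
import re

def all_same_digit(text: str) -> bool:
    """True if text is one digit repeated two or more times."""
    return re.fullmatch(r"(\d)\1+", text) is not None

def not_all_same_digit(text: str) -> bool:
    """True if text is a number whose digits are not all identical
    (a single digit also counts, as in the answer)."""
    return re.fullmatch(r"(\d)(?!\1+$)\d*", text) is not None

for text in ["11111111", "2222", "1123", "5"]:
    print(repr(text), all_same_digit(text), not_all_same_digit(text))
```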
To match a number of repetitions of a single digit, you can write ([0-9])\1*.
This matches [0-9] into a group, then matches 0 or more repetitions (\1) of that group.
You can write \1+ to match one or more repetitions.
Use a backreference:
(\d)\1+
Probably you want to use some sort of anchors ^(\d)\1+$ or \b(\d)\1+\b
I used this expression to give me all phone numbers that are all the same digit.
Basically, it asks for 9 repetitions of the first captured digit, which results in 10 of the same digit in a row.
([0-9])\1{9}
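The same trick generalizes to any run length: capture one digit, then require length - 1 repetitions of it. A minimal Python sketch, where same_digit_run is a hypothetical helper:

```python
import re

def same_digit_run(length: int):
    """Compile a pattern matching `length` copies of the same digit in a row."""
    if length < 2:
        raise ValueError("length must be at least 2")
    # One captured digit plus (length - 1) repetitions of it.
    return re.compile(r"([0-9])\1{%d}" % (length - 1))

ten_same = same_digit_run(10)
print(ten_same.pattern)
print(bool(ten_same.fullmatch("5555555555")))
print(bool(ten_same.fullmatch("5555555556")))
```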
(\d)\1+? matches any repeated digit (the lazy +? stops after the first repetition)
You can find repeated text or numbers easily with a backreference. Take a look at the following example:
This simply means that whatever the pattern inside the parentheses ([inside pattern]) matched, \1 then looks for that same text immediately after it.