I'm working on a regular expression for SSN with the rules below. I have successfully applied all matching rules except #7. Can someone help alter this expression to include the last rule, #7:
^((?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$|(?!000|666)[0-8][0-9]{2}(?!00)[0-9]{2}(?!0000)[0-9]{4}$)
Hyphens should be optional (this is handled above by using 2 expressions with an OR)
Cannot begin with 000
Cannot begin with 666
Cannot begin with 900-999
Middle digits cannot be 00
Last four digits cannot be 0000
Cannot be all the same digit, e.g. 111-11-1111 or 111111111
Add the following negative look ahead anchored to start:
^(?!(.)(\1|-)+$)
This captures the first character then asserts the rest of the input is not made of that captured char or hyphen.
The whole regex can be shortened to:
^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$
The main trick to avoid repeating the regex both with and without the hyphens is to capture the optional hyphen (as group 3), then use a backreference \3 to that capture at the next hyphen position, so the hyphens are either both present or both absent.
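A quick way to sanity-check the shortened pattern is to run it over a handful of inputs. This is only a sketch: the sample numbers are made up for illustration, and JavaScript is used merely because it is convenient (the pattern itself relies only on lookaheads and backreferences).

```javascript
// Shortened SSN pattern quoted above; group 3 holds the optional hyphen.
const ssn = /^(?!(.)(\1|-)+$)(?!000|666|9..)(?!...-?00)(?!.*0000$)\d{3}(-?)\d\d\3\d{4}$/;

// Hypothetical sample inputs, one or two per rule.
const samples = [
  "123-45-6789", // hyphenated, should pass
  "123456789",   // no hyphens, should pass
  "123-456789",  // only one hyphen, rejected by the \3 backreference
  "000-45-6789", // begins with 000
  "666-45-6789", // begins with 666
  "900-45-6789", // begins with 900-999
  "123-00-6789", // middle digits are 00
  "123-45-0000", // last four digits are 0000
  "111-11-1111", // all the same digit, hyphenated
  "111111111",   // all the same digit, no hyphens
];

for (const s of samples) {
  console.log(s.padEnd(12), ssn.test(s));
}
```

Only the first two samples should print true.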
First, let's shorten the pattern, as it contains two nearly identical alternatives, one matching the SSN with hyphens and the other matching it without. Instead of a ^(x-y-z$|xyz$) pattern, you can use a ^x(-?)y\1z$ pattern, so your regex can be reduced to ^(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\1(?!0000)[0-9]{4}$.
To make a pattern never match a string that contains only identical digits, you may add the following negative lookahead right after ^:
(?!\D*(\d)(?:\D*\1)*\D*$)
It fails the match if there are
\D* - zero or more non-digits
(\d) - a digit (captured in Group 1)
(?:\D*\1)* - zero or more occurrences of zero or more non-digits followed by the same digit as in Group 1, and then
\D*$ - zero or more non-digits till the end of string.
Now, since I suggested shortening the regex to the pattern with backreference(s), you will have to adjust the backreferences after adding this lookahead.
So, your solution looks like
^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$
^(?![^0-9]*([0-9])(?:[^0-9]*\1)*[^0-9]*$)(?!000|666)[0-8][0-9]{2}(-?)(?!00)[0-9]{2}\2(?!0000)[0-9]{4}$
Note the \1 in the pattern without the lookahead turned into \2 as (-?) became Group 2.
Note also that in some regex flavors \d is not equal to [0-9].
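To see the group renumbering in practice, here is a small sketch that runs the \d version of the final pattern and inspects group 2. The input strings are invented for illustration, and the choice of JavaScript is an assumption about your environment.

```javascript
// Final pattern from above: group 1 lives inside the negative lookahead,
// so the optional hyphen is captured as group 2 and referenced with \2.
const ssnStrict = /^(?!\D*(\d)(?:\D*\1)*\D*$)(?!000|666)[0-8]\d{2}(-?)(?!00)\d{2}\2(?!0000)\d{4}$/;

const hyphenated = ssnStrict.exec("123-45-6789");
const plainDigits = ssnStrict.exec("123456789");

console.log(JSON.stringify(hyphenated[2]));  // the captured hyphen
console.log(JSON.stringify(plainDigits[2])); // empty string when there are no hyphens

// Strings made of a single repeated digit are rejected by the first lookahead.
console.log(ssnStrict.test("111-11-1111"));
console.log(ssnStrict.test("222222222"));
```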
I am testing the following regex:
(?<=\d{3}).+(?!',')
Test string:
187 SURNAME First Names 7 Every Street, Welltown Racing Driver
The sequence I require is:
Begin after 3 digit numeral
Read all characters
Don't read the comma
In other words:
SURNAME First Names 7 Every Street
But as the demo shows, the negative lookahead for the comma has no bearing on the result. I can't see anything wrong with my lookarounds.
You could match the 3 digits, and make use of a capture group capturing any character except a comma.
\b\d{3}\b\s*([^,]+)
Explanation
\b\d{3}\b Match 3 digits between word boundaries to prevent partial word matches
\s* Match optional whitespace chars
([^,]+) Capture group 1, match 1+ chars other than a comma
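.+ is greedy and consumes everything up to the end of the line.
So (?!,) is only tested at the end of the line, where nothing follows, and it is guaranteed to be true.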
I'm not sure if using quotes is correct for whichever flavour of regex you are using. Bare comma seems more correct.
Try:
(?<=\d{3})[^,]+
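The difference between the three patterns in this thread is easy to see side by side. The following is a rough sketch in JavaScript (an assumption; the question does not name a flavour), using the test string from the question.

```javascript
const line = "187 SURNAME First Names 7 Every Street, Welltown Racing Driver";

// Original attempt: the greedy .+ runs to the end of the line,
// so the trailing negative lookahead has nothing left to reject.
const greedyResult = line.match(/(?<=\d{3}).+(?!',')/)[0];

// Negated character class: stops at the first comma, but keeps the
// space that follows the three digits.
const lookbehindResult = line.match(/(?<=\d{3})[^,]+/)[0];

// Capture-group variant: \s* eats the space, so group 1 starts at SURNAME.
const capturedResult = line.match(/\b\d{3}\b\s*([^,]+)/)[1];

console.log(JSON.stringify(greedyResult));
console.log(JSON.stringify(lookbehindResult));
console.log(JSON.stringify(capturedResult));
```

Note that the lookbehind version returns the text with a leading space, so you may want to trim it or use the capture-group version.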
I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great
You can use
(?<=^|_)\d{4}\s[^_]+
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
In your second pattern, (?<=^|_)\d{4}\s.+(?=_\d{1}$|$), the .+ is greedy, and at the end of the string the second alternative, $, matches, so you keep matching the whole line.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
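As a rough check, the sketch below runs the greedy attempt and both suggested patterns against the two strings from the question. It assumes a JavaScript engine with lookbehind support; the variable names are only illustrative.

```javascript
const inputs = ["Some_thing_here_1234 Match Me 1", "1234 Match Me 1_1"];

// Greedy attempt from the question: .+ swallows the trailing _1.
const greedyAttempt = /(?<=^|_)\d{4}\s.+(?=_\d{1}$|$)/;
// Negated character class: stop at the next underscore.
const negated = /(?<=^|_)\d{4}\s[^_]+/;
// Lazy match with an optional _\d before the end of the string.
const lazy = /(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)/;

for (const s of inputs) {
  console.log({
    input: s,
    greedy: s.match(greedyAttempt)[0],
    negated: s.match(negated)[0],
    lazy: s.match(lazy)[0],
  });
}
```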
I have this kind of text:
other text opt1 opt2 opt3 I_want_only_this_text because_of_this
And am using this regex:
(?<=opt1|opt2|opt3).*?(?=because_of_this)
Which returns me:
opt2 opt3 I_want_only_this_text
However, I want to match only "I_want_only_this_text".
What is the best way to achieve this?
I don't know in what order the opt's will appear and they are only examples. Actual words will be different and there will be more of them.
Actual data:
regex
(?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
text
こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります
what I want to get:
冷凍で
You can use
(?<=※|を|備考|町|品は|。)(?:(?!※|を|備考|町|品は|。).)*?(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
The scheme is the same as in (?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this).
The tempered greedy token solution allows you to match multiple occurrences of the same pattern in a longer string.
Details
(?<=※|を|備考|町|品は|。) - a positive lookbehind that matches a location that is immediately preceded with one of the alternatives listed in the lookbehind
(?:(?!※|を|備考|町|品は|。).)*? - any char other than a line break char, zero or more but as few as possible occurrences, that is not a starting point of any of the alternative patterns in the negative lookahead
(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) - a positive lookahead that requires one of the alternative patterns to appear immediately to the right of the current location.
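To illustrate the tempered greedy token on the simplified opt example, here is a minimal sketch in JavaScript (an assumed engine; any flavour with alternation inside lookbehinds should behave similarly). The match includes the spaces around the wanted text, so the sketch trims it.

```javascript
const text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this";

// Pattern from the question: matching starts right after the FIRST opt.
const plain = /(?<=opt1|opt2|opt3).*?(?=because_of_this)/;

// Tempered version: each consumed character must not start another opt,
// so a match that begins after opt1 or opt2 cannot reach the right-hand side.
const tempered = /(?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this)/;

const plainMatch = text.match(plain)[0].trim();
const temperedMatch = text.match(tempered)[0].trim();

console.log(plainMatch);
console.log(temperedMatch);
```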
You could add a negative lookahead (?!\s*opt\d) to assert that there is no opt and a digit to the right. You can use a character class to list the digits 1, 2 and 3 instead of using the alternation with |.
(?<=\bopt[123]\s(?!\s*opt\d)).*?(?=\s*\bbecause_of_this\b)
It might be a bit more efficient to use a match with a capture group:
\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b
What about:
.*\bopt[123]\b\s*(.*?)\s*because_of_this\b
.* - A greedy match of any character other than newline up to the last occurrence of:
\bopt[123]\b - A word boundary followed by literally "opt" with a trailing number 1, 2 or 3 and another word boundary.
\s* - 0+ whitespace characters.
(.*?) - A 1st capture group with a lazy match of 0+ characters up to:
\s* - 0+ whitespace characters.
because_of_this\b - Literally "because_of_this" followed by a word-boundary.
If you need to have this written out in alternations:
.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b
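Both capture-group suggestions in this thread can be compared with a few lines of code. This is a sketch only, assuming JavaScript and the sample text from the question; in each case the wanted text is read from group 1 rather than from the whole match.

```javascript
const sample = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this";

// Variant 1: a lookahead rejects any opt that is followed by another opt.
const viaLookahead = sample.match(/\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b/);

// Variant 2: a greedy .* prefix pushes the match to the LAST opt.
const viaGreedyPrefix = sample.match(/.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b/);

console.log(viaLookahead[1]);
console.log(viaGreedyPrefix[1]);
```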
I need to extract 1234567 from below URLs
http://www.test.in/some--wonders-1234567---2
http://www.test.in/some--wonders-1234567
I tried with .*\-([0-9]+)(?:-{2,}2)?.
but for the first URL it returned 2, even though that part is in a non-capturing group.
Please suggest a solution. I have been stuck on this for a long time and have run out of ideas.
Try this one:
.*?\-([0-9]+)(?:-{2,}2|$)
It makes the first .* lazy. You can also remove it altogether with the same effect:
\-([0-9]+)(?:-{2,}2|$)
If your regex engine supports variable-length negative lookbehinds (many do not), you can do it this way:
(?<!\d+-+)\d+
It gives you any non-empty digit string that is not preceded by digits followed by hyphens.
The big advantage is that you don't have to use groups here: the regex match itself is what you want.
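JavaScript is one engine where such a lookbehind is accepted, so the idea can be tried there. The snippet below is a sketch under that assumption, using the two URLs from the question and a global search to show that the trailing 2 is skipped.

```javascript
const urls = [
  "http://www.test.in/some--wonders-1234567---2",
  "http://www.test.in/some--wonders-1234567",
];

// Digits that are not preceded by "digits followed by hyphens".
const notAfterDigitsAndHyphens = /(?<!\d+-+)\d+/g;

for (const url of urls) {
  // With the g flag, match() returns every match as an array of strings.
  console.log(url.match(notAfterDigitsAndHyphens));
}
```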
You could match a - followed by one or more digits which you could capture in a group ([0-9]+). This group will contain the value you want to extract.
Then an optional part (?:-{2,}[0-9]+)? that would match ---2 followed by asserting the end of the line $.
-(\d+)(?:-{2,}\d+)?$
Explanation
- Match a hyphen literally
(\d+) Capture one or more digits in a group
(?: Non capturing group
-{2,} Match 2 or more times -
\d+ Match one or more digits
)? close non capturing group and make it optional
$ Assert position at the end of the line
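For comparison, this small sketch runs the attempt from the question and the anchored pattern over both URLs. JavaScript is assumed here, and the variable names are only illustrative.

```javascript
const testUrls = [
  "http://www.test.in/some--wonders-1234567---2",
  "http://www.test.in/some--wonders-1234567",
];

// Attempt from the question: the greedy .* backtracks only as far as the
// LAST hyphen that is followed by digits, so group 1 ends up as "2".
const attempt = /.*\-([0-9]+)(?:-{2,}2)?/;

// Anchored pattern: the optional suffix must run to the end of the string.
const anchored = /-(\d+)(?:-{2,}\d+)?$/;

for (const url of testUrls) {
  console.log(url.match(attempt)[1], url.match(anchored)[1]);
}
```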
My regular expression = '(\d+)\1+'
My aim is to capture repeating patterns such as 2323, 1212, 345345, which have different digits. The current regex also captures 11, 22, 11111, which I need to exclude.
Example -
For the input = 44556841335158684945454545
Matches are
44
55
45454545
Matches should be -
45454545
How do I write a regex which excludes 44 and 55 and gives results which have different digits?
Here is the regex I believe you want:
(\d)((?!\1)\d)
A bit of explanation:
(\d) - Group 1: matches a digit (equal to [0-9])
((?!\1)\d) - Group 2:
(?!\1) - a negative lookahead asserting that the text most recently matched by Group 1 does not follow
\d - matches a digit (equal to [0-9])
Here is a quick JS demo:
// Find every pair of adjacent digits in which the second digit differs from the first
const s = "44556841335158684945454545";
console.log(s.match(/(\d)((?!\1)\d)/g));
To say "two different numbers repeated" you can try
((\d)(?!\2)\d)\1
Capturing parentheses are numbered from the left; so \1 matches the entire outer pair of parentheses, and (?!\2) refers to the inner parentheses around the first digit, constraining the second digit so that it cannot be identical to the first.
Demo: https://regex101.com/r/5f2CEf/1
Add a + at the end, giving ((\d)(?!\2)\d)\1+, to cover all adjacent repetitions of the match.
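Here is a short sketch of that pattern with the trailing + applied to the input from the question. It assumes JavaScript, as in the demo earlier in this thread.

```javascript
const digits = "44556841335158684945454545";

// Pattern from the question: also matches runs of a single repeated digit.
const anyRepeat = digits.match(/(\d+)\1+/g);

// Two different digits, repeated one or more times.
const differentDigits = digits.match(/((\d)(?!\2)\d)\1+/g);

console.log(anyRepeat);
console.log(differentDigits); // ["45454545"]
```

Note that this pattern only covers two-digit units; a three-digit unit such as 345345 is not matched by it.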