I have a huge dataset, where I am trying to extract a group of 4 digits. The problem is, sometimes there will be a preceding group of 4 digits that I don't want. These 2 groups will never be the same as each other.
Example:
String String 7777 Some more string
String 1234 7777 Some more string
In both cases, I want to extract ONLY 7777 (or whatever digit combination replaces it). There is no pattern to distinguish which number group will be in which position - any number from 0000 to 9999 can be in either first or second position.
If this were possible, I think it'd do what I want?
\b\d{4}{0,1}\s{0,1}(\d{4})\b
Optional 4 digits, optional space, capture 4 digits. I've tried it, and some variations of it, but I can't get it to work!
A look-ahead seems like a possible candidate, but I don't understand how to construct the pattern.
You can use a negative look-ahead to check if there is no subsequent 4-digit number after it:
\b\d{4}\b(?!\s?\d{4}\b)
EDIT:
To capture a 4-digit number that is not followed by any text and then another 4-digit number, use:
\b\d{4}\b(?!.+\b\d{4}\b)
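As a quick check, here is a minimal Python sketch that runs the pattern above against the two example lines from the question. The names LAST_GROUP, samples and results are only illustrative, and it assumes each record is a single line (the dot does not cross newlines by default).

```python
import re

# Pattern from the answer above: a standalone 4-digit group that is not
# followed, later on the same line, by another standalone 4-digit group.
LAST_GROUP = re.compile(r"\b\d{4}\b(?!.+\b\d{4}\b)")

samples = [
    "String String 7777 Some more string",
    "String 1234 7777 Some more string",
]

# search() returns the first position where the whole pattern succeeds,
# so the leading 1234 is skipped because 7777 follows it.
results = [LAST_GROUP.search(line).group(0) for line in samples]
print(results)  # ['7777', '7777']
```

If a line might contain no 4-digit group at all, search() returns None, so check the result before calling group().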
You can use this expression that matches the four digit group not followed by any other four digit groups:
\d{4}(?!.+\d{4}.+)
Related
I want to match only numbers that fit the rules below.
Rules:
the number must be 16 digits long
the first 11 digits can be any digits
the next 3 digits must all be zero
the last two digits can be any digits
This is what I have so far:
([0-9]{11}[0]{3}[0-9]{2})
number example:
1234567890100012
Now I want to get the number even if there are letters at the beginning or end of the string, as in " abc1234567890100012abc".
My output should be just the number, "1234567890100012".
When I add [a-zA-Z]*, the match contains the whole string.
Another point: if there are extra digits at the beginning or end of the string, as in "999912345678901000129999", the program shouldn't accept it; it should return None or nothing. How can I write this with a regex?
You can use look around to exclude the cases where there are more digits before/after:
(?<!\d)\d{11}000\d\d(?!\d)
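A small Python sketch of how the lookaround version could be used, assuming the input is an ordinary str. The names NUMBER_16 and extract are made up for the example.

```python
import re

# 11 digits, then 000, then 2 digits, with no digit directly before or after.
NUMBER_16 = re.compile(r"(?<!\d)\d{11}000\d\d(?!\d)")

def extract(text):
    """Return the 16-digit number if one is present, otherwise None."""
    m = NUMBER_16.search(text)
    return m.group(0) if m else None

print(extract("1234567890100012"))          # 1234567890100012
print(extract(" abc1234567890100012abc"))   # 1234567890100012
print(extract("999912345678901000129999"))  # None
```

Because the lookarounds are zero-width, the surrounding letters never become part of the match, so no capture group is needed.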
You can use a capture group, and match optional chars a-zA-Z before and after the group.
To prevent a partial match, you can use word boundaries \b or if the string should match from the start and end of the line you can use anchors ^ and $
\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b
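With this variant the whole match includes the surrounding letters, so the number has to be read from group 1. A minimal sketch in Python (WRAPPED and extract_wrapped are hypothetical names):

```python
import re

# Optional letters, the captured 16-digit number, optional letters,
# all enclosed in word boundaries.
WRAPPED = re.compile(r"\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b")

def extract_wrapped(text):
    """Return only the captured number (group 1), or None if nothing matches."""
    m = WRAPPED.search(text)
    return m.group(1) if m else None

m = WRAPPED.search(" abc1234567890100012abc")
print(m.group(0))  # abc1234567890100012abc (the full match)
print(m.group(1))  # 1234567890100012 (the capture group)
```

Extra digits on either side prevent a match, because the word boundaries cannot fall between two digits.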
String to be evaluated will be either be a 10 digit number or a 4 digit number.
5551119900 (10 Digit)
9999 (4 Digit)
Need regex to test for specific list of 10 digit numbers or 4 digit numbers. I have the following Regex that almost works
55511199(00|01|02|10|20|30)|(0000|9901|9902|9903|9999)
Above is checking for
5551119900
5551119901
5551119902
5551119910
5551119920
5551119930
0000
9901
9902
9903
9999
ISSUE:
(1) Need match to be exactly 10 digits or 4 digits only.
(2) Pattern match (see link below) is showing an exact match and also a "Group 1". I'm not sure what the group match means or if that is a good thing.
Sample: https://regex101.com/r/BbplFG/1/
Try this version of your regex:
^(?:55511199(?:00|01|02|10|20|30)|(?:0000|9901|9902|9903|9999))$
I have made several changes here:
Used ?: inside terms in parentheses, to turn off group capturing
Placed the entire pattern inside parentheses
Added starting (^) and ending ($) anchors around the entire pattern
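To see the effect of these changes, here is a short Python sketch. The question does not say which regex flavor is in use, so Python is an assumption here, and ALLOWED and is_allowed are illustrative names.

```python
import re

# Anchored, non-capturing version of the pattern from the answer above.
ALLOWED = re.compile(
    r"^(?:55511199(?:00|01|02|10|20|30)|(?:0000|9901|9902|9903|9999))$"
)

def is_allowed(value):
    """True only if the whole value is one of the listed numbers."""
    return ALLOWED.match(value) is not None

print(is_allowed("5551119900"))   # True
print(is_allowed("9999"))         # True
print(is_allowed("55511199001"))  # False, one digit too many
print(ALLOWED.groups)             # 0, no capture groups are left
```

One Python-specific caveat: $ also matches just before a trailing newline, so ALLOWED.fullmatch(value) is the stricter choice if the input may end in "\n".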
Say I give a pattern 123* or 1234*; I would like to match any 10-digit number that starts with that pattern. It should have exactly 10 digits.
Example:
Pattern : 123 should match 1234567890 but not 12345678
I tried this regex: (^(123)(\d{0,10}))(?(1)\d{10}), but obviously it didn't work. I tried to group the pattern and the remaining digits as two different groups, but it matches 10 digits after the captured group (tested on https://regex101.com/). How do I check that the captured group is exactly 10 digits? Or is there a better trick for this? Please guide me.
Sounds like a case for the positive lookahead:
(?=123)\d{10}
This will match a sequence of 10 digits, but only if it starts with 123.
Similarly for prefix 1234:
(?=1234)\d{10}
Of course, if you know the prefix length upfront, you can use 123\d{7}, but then you'll have to change range limits with each prefix change (for example: 1234\d{6}).
Additionally, to ensure only isolated groups of 10 digits are captured, you might want to anchor the above expression with a (zero-length) word boundary \b:
\b(?=123)\d{10}\b
or, if your sequence can appear inside of a word, you might want to use negative lookbehind and lookahead on \d (as suggested in comments by @Wiktor):
(?<!\d)(?=123)\d{10}(?!\d)
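Since the lookahead keeps the prefix separate from the \d{10} part, the pattern can be built for any prefix without recalculating the remaining length. A minimal Python sketch (the helper names ten_digit_pattern and find_numbers are hypothetical):

```python
import re

def ten_digit_pattern(prefix):
    """Build the lookaround pattern from the answer above for a given prefix."""
    # re.escape is only a precaution in case the prefix ever contains
    # regex metacharacters; for plain digits it changes nothing.
    return re.compile(r"(?<!\d)(?=" + re.escape(prefix) + r")\d{10}(?!\d)")

def find_numbers(prefix, text):
    """Return every isolated 10-digit number in text that starts with prefix."""
    return ten_digit_pattern(prefix).findall(text)

print(find_numbers("123", "1234567890"))   # ['1234567890']
print(find_numbers("123", "12345678"))     # [] (only 8 digits)
print(find_numbers("123", "12345678901"))  # [] (11 digits)
```

The (?<!\d) and (?!\d) guards are what reject the 11-digit input; without them the first 10 digits would still match.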
I would keep it simple:
import re
text = "1234567890"
# Raw string so that \d reaches the regex engine unchanged
match = re.search(r"^123\d{7}$|^1111\d{6}$", text)
if match:
    print("matched")
Just throw your 2 patterns in as such and it should be good to go! Note that 123* would catch 1234* so I'm using 1111\d{6} as an example
I need a regular expression to find the last occurrence of 5 consecutive digits in a string. This is what I have right now:
([0-9]{5})[a-zA-Z]*$
This only matches some of my test strings.
In a live environment the numbers will change, but for testing I expect to capture the substring '12345' in each of the test strings below:
D012345
D012345AS
D012345RM-67
D12345D
12345D67
TEST-Str12345ing-rm6
Updated
Works with the global flag and passes all tests. No capture group required:
[0-9]{5}(?![0-9])(?!.*[0-9]{5})
Live example:
http://www.regexr.com/3c875
Let's break this down:
Match any instance of five digits
[0-9]{5}
But the instance cannot be immediately followed by another digit - this way,
we always get the last five in any group of consecutive numbers.
(?![0-9])
Lastly, make sure no further run of five or more consecutive digits exists
later in the string.
(?!.*[0-9]{5})
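The breakdown above can be checked against the test strings from the question. This is a sketch in Python, which is an assumption since the question does not name a language; LAST_FIVE, tests and found are illustrative names.

```python
import re

# Five digits, not followed by a digit, and with no other run of
# five digits later in the string.
LAST_FIVE = re.compile(r"[0-9]{5}(?![0-9])(?!.*[0-9]{5})")

tests = [
    "D012345",
    "D012345AS",
    "D012345RM-67",
    "D12345D",
    "12345D67",
    "TEST-Str12345ing-rm6",
]

found = [LAST_FIVE.search(s).group(0) for s in tests]
print(found)  # '12345' for every test string
```

In "D012345" the candidate "01234" is rejected by the first look-ahead because a digit follows it, which is why the match lands on "12345".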
You can use this regex:
.*([0-9]{5})
to make sure you're matching last 5 continuous digits in the input. Matched 5 digits are available in captured group #1.
.* (greedy match) at start makes sure that we match very last 5 digits only.
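A short Python sketch of the greedy approach, for comparison. Here the digits have to be read from group 1 rather than from the full match; GREEDY_LAST_FIVE and last_five are hypothetical names.

```python
import re

# The greedy .* consumes as much as possible and then backtracks just far
# enough for five digits to fit, so group 1 holds the last such run.
GREEDY_LAST_FIVE = re.compile(r".*([0-9]{5})")

def last_five(text):
    """Return the last five consecutive digits in text, or None."""
    m = GREEDY_LAST_FIVE.match(text)
    return m.group(1) if m else None

print(last_five("D012345RM-67"))  # 12345
print(last_five("11111-22222"))   # 22222
```

Note that . does not match newlines by default, so for multi-line input the re.DOTALL flag would be needed.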
Can I use
\d\d\d\d[^\d]
to check for four consecutive numbers?
For example,
411112 OK
455553 OK
1200003 OK
f44443 OK
g55553 OK
3333 OK
f4442 No
45553 No
f4444g4444 No
f44444444 No
If you want to find any series of 4 digits in a string, /\d\d\d\d/ or /\d{4}/ will do. If you want to find a series of exactly 4 digits, use /[^\d]\d{4}[^\d]/ (note that this needs a non-digit character on both sides). If the string should consist of nothing but 4 digits, use /^\d{4}$/.
Edit: I think you want to find 4 of the same digits, you need a backreference for that. /(\d)\1{3}/ is probably what you're looking for.
Edit 2: /(^|(.)(?!\2))(\d)\3{3}(?!\3)/ will only match strings with exactly 4 of the same consecutive digits.
The first group matches either the start of the string or any single character (captured as group 2). The negative look-ahead after it uses group 2 to ensure that the next character differs from that one. The third group matches any digit, which is then repeated 3 times with a backreference to group 3. Finally, a negative look-ahead ensures that the character after the run is not the same digit again.
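The same expression also works in Python's re module, which makes it easy to try against some of the examples from the question. This is only a sketch; FOUR_SAME and has_run_of_four are illustrative names.

```python
import re

# Exactly four identical consecutive digits: the character before the run
# (if any) must differ from the run's digit, and so must the one after it.
FOUR_SAME = re.compile(r"(^|(.)(?!\2))(\d)\3{3}(?!\3)")

def has_run_of_four(text):
    """True if text contains a run of exactly four identical digits."""
    return FOUR_SAME.search(text) is not None

for s in ["411112", "3333", "f4442", "45553", "f44444444"]:
    print(s, has_run_of_four(s))
```

The first two strings match and the last three do not. A string containing two separate runs, such as "f4444g4444", does match this expression, so rejecting it would need an additional check.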
This sort of thing is difficult to do in JavaScript because you don't have things like forward references and look-behind (look-behind has since been added to the language in ES2018).
Should the numbers be part of a longer string, or do you want only the four digits? In the latter case, the regexp should be ^\d{4}$. The ^ marks the beginning of the string, $ the end. That makes sure that only four digits are valid, with nothing before or after them.
That should match four digits (\d\d\d\d) followed by a non-digit character ([^\d]). If you just want to match any four digits, you should use \d\d\d\d or \d{4}. If you want to make sure that the string contains just four consecutive digits, use ^\d{4}$. The ^ anchors the match at the beginning of the string, while the $ anchors it at the end.