Regex to match 10 digit exactly with specific pattern - regex

Say I give a pattern such as 123* or 1234*. I would like to match any 10-digit number that starts with that pattern. It should have exactly 10 digits.
Example:
Pattern : 123 should match 1234567890 but not 12345678
I tried this regex: (^(123)(\d{0,10}))(?(1)\d{10}) but it didn't work. I tried to capture the pattern and the remaining digits as two separate groups, but it matches 10 digits after the captured group (tested on https://regex101.com/). How do I check that the captured group is exactly 10 digits long? Or is there a better trick for this? Please guide me.

Sounds like a case for the positive lookahead:
(?=123)\d{10}
This matches a run of 10 digits, but only if that run starts with 123. Test it here.
Similarly for prefix 1234:
(?=1234)\d{10}
Of course, if you know the prefix length upfront, you can use 123\d{7}, but then you'll have to change range limits with each prefix change (for example: 1234\d{6}).
Additionally, to ensure only isolated groups of 10 digits are captured, you might want to anchor the above expression with a (zero-length) word boundary \b:
\b(?=123)\d{10}\b
or, if your sequence can appear inside a word, you might want to use a negative lookbehind and lookahead on \d (as suggested in the comments by @Wiktor):
(?<!\d)(?=123)\d{10}(?!\d)
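As a rough illustration of that last pattern in Python (the helper name ten_digits_with_prefix and the sample text are made up for this sketch), the prefix can be escaped and dropped into the lookahead, so the \d{10} part never has to change:

```python
import re

def ten_digits_with_prefix(prefix: str) -> re.Pattern:
    """Build a pattern for an isolated 10-digit run that starts with `prefix`."""
    # (?<!\d) and (?!\d) reject runs that are part of a longer number;
    # (?=prefix) checks the start without consuming, so \d{10} stays fixed.
    return re.compile(rf"(?<!\d)(?={re.escape(prefix)})\d{{10}}(?!\d)")

# Hypothetical sample: a valid number, a short one, an 11-digit one, one glued to letters
sample = "1234567890 12345678 91234567890 x1239999999y"

print(ten_digits_with_prefix("123").findall(sample))
print(ten_digits_with_prefix("1234").findall(sample))
```

The 8-digit and 11-digit runs are skipped, while the run glued to letters is still found because only neighbouring digits are excluded.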

I would keep it simple:
import re
text = "1234567890"
# Raw string so that \d reaches the regex engine unchanged
match = re.search(r"^123\d{7}$|^1111\d{6}$", text)
if match:
    print("matched")
Just put your two patterns in as alternatives like this and it should be good to go! Note that 123* would also catch 1234*, so I'm using 1111\d{6} as the second example.

Related

Regex how can i get only exact part in a string

I need to catch only the numbers that fit these rules.
Rules:
It should be 16 digits long.
The first 11 digits can be any digits.
The next 3 digits must all be zero.
The last two digits can be any digits.
I did it this way:
([0-9]{11}[0]{3}[0-9]{2})
number example:
1234567890100012
Now I want to get the number even if there are letters at the beginning or end of the string, like " abc1234567890100012abc".
My output should be just the number, like "1234567890100012".
When I add [a-zA-Z]* it returns the whole string.
Another point: if there are extra digits at the beginning or end of the string, like "999912345678901000129999", the program shouldn't accept it; it should return None or nothing. How can I write this with a regex?
You can use lookarounds to exclude the cases where there are more digits before or after:
(?<!\d)\d{11}000\d\d(?!\d)
On regex101
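A minimal Python sketch of the lookaround pattern, using the two strings from the question (the function name find_code is only illustrative):

```python
import re

# 11 digits, then 000, then 2 digits, with no digit on either side
CODE = re.compile(r"(?<!\d)\d{11}000\d\d(?!\d)")

def find_code(text: str):
    """Return the 16-digit number, or None if it is absent or part of a longer number."""
    m = CODE.search(text)
    return m.group(0) if m else None

print(find_code(" abc1234567890100012abc"))    # letters around the number are ignored
print(find_code("999912345678901000129999"))   # extra digits: no match
```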
You can use a capture group, and match optional chars a-zA-Z before and after the group.
To prevent a partial match, you can use word boundaries \b or, if the string should match from the start to the end of the line, the anchors ^ and $:
\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b
Regex demo
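The capture-group variant could be used from Python roughly as follows; the point of the sketch is that the whole match still includes the letters, so the number has to be read from group 1 (the name extract is hypothetical):

```python
import re

WRAPPED = re.compile(r"\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b")

def extract(text: str):
    """Return only the captured digits, or None when the pattern does not match."""
    m = WRAPPED.search(text)
    return m.group(1) if m else None

m = WRAPPED.search(" abc1234567890100012abc")
print(m.group(0))  # whole match, letters included
print(m.group(1))  # just the 16 digits
print(extract("999912345678901000129999"))
```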

Regex : extract the biggest number from x to y figures

I have a URL formatted as follows: https://www.mywebsite.com/subdomain/123456789.htm. I know that the webpage number is built with exactly 9 or 10 digits. I would like to extract this number using a regex.
The Regex I use to perform this operation is :
^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$
The problem is that when the number is 10 digits long, I get a match which is good but only the last 9 digits are captured. For example : https://www.mywebsite.com/subdomain/1234567890.htm captures 234567890 only.
I could easily create two regexes (one with 9 digits and one with 10) and take the longest number if both match, but is there an elegant way to solve this problem with a single regex?
EDIT
Following the remarks made below, there is actually a mistake in my original regex: the first character group matches the first digit of the 10 and leaves only the other 9 for the capturing group. Adding a forward slash to the regex before the capturing group solved the issue, thanks!
As per @TheFourthBird, you are missing a match on the forward slash. Maybe a slightly different approach to yours would be a non-capturing group (with the literal dots escaped):
^https://www\.mywebsite\.com/(?:[^/]+/)+(\d{9,10})\.htm$
The character class [A-Za-z0-9_.-~/]+ is greedy and first matches all the characters that follow, up to the end of the line.
The engine then backtracks until ([0-9]{9,10}). can match the remaining digits, which first succeeds with 9 digits, so those 9 end up in the capturing group.
Note that you should either escape the hyphen \- or place it at the start or end of the character class; otherwise it denotes a range (here .-~, which includes digits and letters).
One option is to use a word boundary \b before matching the digits:
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$
Regex demo
Another way could be matching the / right before the digits.
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$
Regex demo
If there can also be characters a-zA-Z or an underscore before the digits and lookbehind is supported, you could also assert that there is no digit before the capture with (?<!\d):
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$
Regex demo
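To see the backtracking effect and the fixes side by side, here is a small Python sketch using the URL from the question (the variable and function names are only illustrative):

```python
import re

url = "https://www.mywebsite.com/subdomain/1234567890.htm"

# Pattern from the question: the greedy class keeps the first digit
original = r"^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$"
# Fixes from the answer: require a slash, or forbid a digit before the group
with_slash = r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$"
with_lookbehind = r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$"

def page_number(pattern: str, text: str):
    """Return the captured page number, or None if the URL does not match."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(page_number(original, url))         # only the last 9 digits
print(page_number(with_slash, url))       # all 10 digits
print(page_number(with_lookbehind, url))  # all 10 digits
```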
One more approach: this captures the run of digits immediately before .htm
(\d+)(?=\.htm)
RegexDemo

Regular expression to get everything between two characters/strings

I have been trying to use regular expression to extract data from the following strings
LTE_LTE_FSD9167__P_Airport1
I want to extract the 7 digit sitecode(FSD9167) from the above string.
RUR1251__S_KhooNaiWala
I want to extract 7 digit sitecode(RUR1251) from above string.
For the LTE_LTE case I wrote LTE_LTE_([^_;]+).* but it matches the whole string rather than only the required text.
The pattern I see is three letters followed by four digits, so (note that \w also matches digits and underscores; use [A-Za-z] to allow letters only):
\w{3}\d{4}
Use () to capture the pattern:
(\w{3}\d{4})
PHP:
$re = '/(\w{3}\d{4})/m';
JavaScript:
const regex = /(\w{3}\d{4})/gm;
Use https://regex101.com/ to see an explanation of the pattern.
You can use something like this:
^(?:LTE_LTE_)?(\S{7})\S*$ /gm
This captures the seven non-whitespace characters either at the beginning (case 2) or just after LTE_LTE_
Demo
You did not provide any rule about what the code could look like. I noticed that both codes in your examples have 3 letters followed by 4 digits. I made the rule more generic: at least 2 letters followed by at least 3 digits.
The regex is:
[a-zA-Z]{2,}\d{3,}
Test here.
As you want to match only these 2 strings, use:
(?<![A-Z0-9])[A-Z0-9]{7}(?![A-Z0-9])
Explanation:
(?<![A-Z0-9]) # negative lookbehind, make sure there is no alphanumeric before
[A-Z0-9]{7} # 7 alphanumerics
(?![A-Z0-9]) # negative lookahead, make sure there is no alphanumeric after
Demo
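For comparison, the approaches above can be run against both sample strings in Python. This is only a sketch: [A-Za-z]{3} is swapped in for \w{3} to allow letters only, and the names are made up.

```python
import re

names = ["LTE_LTE_FSD9167__P_Airport1", "RUR1251__S_KhooNaiWala"]

patterns = {
    # three letters then four digits (letters-only variant of \w{3}\d{4})
    "letters_digits": re.compile(r"([A-Za-z]{3}\d{4})"),
    # optional LTE_LTE_ prefix, then seven non-whitespace characters
    "optional_prefix": re.compile(r"^(?:LTE_LTE_)?(\S{7})\S*$"),
    # exactly seven uppercase alphanumerics, not part of a longer run
    "seven_alnum": re.compile(r"(?<![A-Z0-9])([A-Z0-9]{7})(?![A-Z0-9])"),
}

def site_codes(name: str) -> dict:
    """Return the site code each pattern extracts from `name` (None if no match)."""
    result = {}
    for label, pattern in patterns.items():
        m = pattern.search(name)
        result[label] = m.group(1) if m else None
    return result

for name in names:
    print(name, site_codes(name))
```

All three agree on these two inputs; they differ on strings where the code is not in uppercase or not followed by an underscore.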

Regex lookahead part of group accepted

I'm using regex in powershell 5.1.
I need it to detect groups of numbers, but ignore groups followed or preceded by /, so from this it should detect only 9876.
[regex]::matches('9876 1234/56','(?<!/)([0-9]{1,}(?!(\/[0-9])))').value
As it is now, the result is:
9876
123
6
More examples: "13 17 10/20" should only match 13 and 17.
Tried using something like (?!(\/([0-9]{1,}))), but it did not help.
You may use
\b(?<!/)[0-9]+\b(?!/[0-9])
See the regex demo
Alternatively, if the numbers can be glued to text:
(?<![/0-9])[0-9]+(?!/?[0-9])
See this regex demo.
The first pattern is based on word boundaries \b that make sure there are no letters, digits and _ right before and after an expected match. The second one just makes sure there are no digits and / on both ends of the match.
Details
(?<![/0-9]) - a negative lookbehind making sure there is no digit or / immediately to the left of the current location
[0-9]+ - one or more digits
(?!/?[0-9]) - a negative lookahead making sure there is no optional / followed by a digit immediately to the right of the current location.
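The question uses .NET regex from PowerShell; as a quick cross-check, the same two patterns can be tried with Python's re module, which treats them the same way for these ASCII inputs (the variable names below are illustrative):

```python
import re

# Word-boundary version: numbers must stand alone as words
bounded = re.compile(r"\b(?<!/)[0-9]+\b(?!/[0-9])")
# Lookaround-only version: numbers may be glued to letters
glued = re.compile(r"(?<![/0-9])[0-9]+(?!/?[0-9])")

for text in ["9876 1234/56", "13 17 10/20", "abc9876def 12/34"]:
    print(repr(text), bounded.findall(text), glued.findall(text))
```

The third string shows the difference: only the second pattern picks up 9876 when it is embedded in text.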

Reg Exp: match specific number of characters or digits

My RegExp is very rusty! I have two questions, related to the following RegExp
Question Part 1
I'm trying to get the following RegExp to work
^.*\d{1}\.{1}\d{1}[A-Z]{5}.*$
What I'm trying to pass is x1.1SMITHx or x1.1JONESx
where x can be anything of any length, but the SMITH or JONES part of the input string must be exactly 5 uppercase characters.
So:
some preamble 1.1SMITH some more characters 123
xyz1.1JONES some more characters 123
both pass
But
another bit of string1.1SMITHABC some more characters 123
xyz1.1ME some more characters 123
Should not pass because SMITH now contains 3 additional characters, ABC, and ME is only 2 characters.
I only pass if after 1.1 there are 5 characters only
Question Part 2
How do I match on specific number of digits ?
Not bothered what they are, it's the number of them that I can't get working
if I use ^\d{1}$ I'd have thought it'll only pass if one digit is present
It will pass 5 but it also passes 67
It should fail 67 as it's two digits in length.
The RegExp should pass only if 1 digit is present.
For the first one, check out this regex:
^.*\d\.\d[A-Z]{5}[^A-Z]*$
Before solving the problem, I made it easier to read by removing all of the {1}. This quantifier is unnecessary, since a regex matches each token exactly once by default (/abc/ matches abc, not aaabbbccc).
To fix the issue, we just need to replace your final .*, which says "match 0 or more characters of anything". If we make this more specific (i.e. [^A-Z]*), you won't match SMITHABC. Note that this also rejects any uppercase letter later in the string; if that is too strict, [A-Z]{5}(?![A-Z]).* only looks at the character right after the five letters.
I came up with a number of solutions, but I like these the most. If your regex engine supports negative look-ahead and negative look-behind, you can use this:
Part 1: (?<![A-Z])[A-Z]{5}(?![A-Z])
Part 2: (?<!\d)\d(?!\d)
Both have a pattern of (?<!expr)expr(?!expr).
(?<!...) is a negative look-behind, meaning the match isn't preceded by the expression in the bracket.
(?!...) is a negative look-ahead, meaning the match isn't followed by the expression in the bracket.
So: for the first pattern, it means "find 5 uppercase characters that are neither preceded nor followed by another uppercase character". In other words, match exactly 5 uppercase characters.
The second pattern works the same way: find a digit that is not preceded or followed by another digit.
You can try it on Regex 101.
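A short Python sketch against the strings from the question. Prefixing the look-ahead pattern with \d\.\d is a combination assumed for this example, not something taken verbatim from either answer.

```python
import re

# Part 1: digit, dot, digit, then exactly five uppercase letters
five_upper = re.compile(r"\d\.\d[A-Z]{5}(?![A-Z])")
# Part 2: a digit with no digit on either side
single_digit = re.compile(r"(?<!\d)\d(?!\d)")

should_pass = [
    "some preamble 1.1SMITH some more characters 123",
    "xyz1.1JONES some more characters 123",
]
should_fail = [
    "another bit of string1.1SMITHABC some more characters 123",
    "xyz1.1ME some more characters 123",
]

for text in should_pass + should_fail:
    print(bool(five_upper.search(text)), text)

print(bool(single_digit.search("5")), bool(single_digit.search("67")))
```

If the whole input must be a single digit rather than merely contain one, re.fullmatch(r"\d", text) is a simpler check in Python.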