REGEX : Extract group of number where digits are more than 3 - regex

Hi, I have a question regarding regex.
This sounds very simple, and I remember doing it before, but somehow it got deleted and I am finding it hard to get it back.
I want to extract a group of digits from one line.
If the count of digits > 3 - select that.
EG:
ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk
This line can be different every time, but there will be only 1 group of digits with more than 2 digits.
OUTPUT: 540063
Thank you in advance

You can use \d{3,}, where 3 is the minimum number of digits. You can take a look at the following Python code:
import re
var = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
# \d{3,} matches any run of three or more consecutive digits
pattern = re.compile(r'\d{3,}')
for match in pattern.findall(var):
    print(match)

Related

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build two of the following pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
lst_phone_numbers = re.findall('[0-9+()-]{6,20}',your_text)
And then filtering out the ones that do not comply with statement 5, using whatever programming language you're most comfortable with.
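The two-step idea above can be sketched roughly as follows. This is a minimal sketch, not the answerer's exact code: the helper name `find_phone_numbers` and the sample text are hypothetical, the character class is extended with `.` to cover the symbol list from the question, and "not all the same" is read as "at least two distinct digits".

```python
import re

# Candidate runs of 6 to 20 allowed characters: digits and the symbols ( ) + - .
CANDIDATE = re.compile(r'[0-9+().-]{6,20}')

def find_phone_numbers(text):
    """Return candidates whose digits are not all identical."""
    results = []
    for candidate in CANDIDATE.findall(text):
        digits = re.sub(r'\D', '', candidate)
        # Keep only candidates that contain at least two distinct digits
        if len(set(digits)) > 1:
            results.append(candidate)
    return results

sample = "call +32(0)123-456 or 111111 or 555-1234"  # hypothetical input
print(find_phone_numbers(sample))
```

Doing the "all digits identical" check in ordinary code keeps the regex short and makes that rule easy to change later.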
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
1. (?: creates a non-capturing group
2. ([\d()+-]) creates a capturing group that matches a digit, a parenthesis, + or -
3. (?!\1+$) rejects the match if the character captured in #2 is repeated one or more times up to the end of the string
4. {6,20} requires 6-20 repetitions of the non-capturing group from #1
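To see how this pattern behaves, here is a small Python check. It is only a sketch: anchoring the pattern with `re.fullmatch`, the helper name `is_valid` and the sample strings are assumptions, not part of the answer.

```python
import re

# The pattern from the answer; fullmatch applies it to the whole string
pattern = re.compile(r'(?:([\d()+-])(?!\1+$)){6,20}')

def is_valid(candidate):
    return pattern.fullmatch(candidate) is not None

for candidate in ["123456", "111111", "+32(0)123-456", "12345"]:
    print(candidate, is_valid(candidate))
```

Because the lookahead is checked after every character, an input that merely ends in a repeated character (such as 123455) is also rejected under this anchoring, which may or may not be what you want.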
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
The (?!\2{5}) part limits how often the same character may repeat in a row: a character may not be followed by five more copies of itself, so a run such as 222222 is rejected. I used 5 as an example; you can change it as you like.

Obtaining geographic decimal coordinates from proprietary text format using regex

Using only Notepad++ with regex support I would like to extract some data from a txt file, representing geographic coordinates and organize the output like that:
-123456789 becomes -123.456789
123456789 becomes 123.456789
-23456789 becomes -23.456789
56789 becomes 0.056789
-89 becomes -0.000089
Tried this: (-?)([0-9]*)([0-9]{6}) but fails when input is less than 6 digits long
You will need 2 steps in notepad++ to do this. First, let's take a look at the regex:
(?<sign>-?)(?<first>\d+(?=\d{6}))?(?<last>\d+)
captures the necessary parts in groups.
Explanation: (you can lose the named grouping if you want)
(?<sign>-?) # read the '-' sign
(?<first>\d+(?=\d{6}))? # read as many digits as possible,
# leaving 6 digits at the end.
(?<last>\d+) # read the remaining digits.
see regex101.com
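The capture logic can also be tried outside Notepad++. The sketch below uses Python, which spells named groups `(?P<name>...)` instead of `(?<name>...)`; the helper name `split_coordinate` is hypothetical and the inputs are taken from the question.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax
COORD = re.compile(r'(?P<sign>-?)(?P<first>\d+(?=\d{6}))?(?P<last>\d+)')

def split_coordinate(text):
    """Return (sign, first, last); first is None when there are fewer than 7 digits."""
    m = COORD.fullmatch(text)
    return m.group('sign'), m.group('first'), m.group('last')

for value in ["-123456789", "56789", "-89"]:
    print(value, split_coordinate(value))
```

The optional `first` group is what the conditional replacement later tests for: when it did not take part in the match, a leading `0.` has to be supplied instead.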
How to use this in Notepad++? With a two-step search and replace:
(-?)(\d+(?=\d{6}))?(\d+)
replace with:
\1(?2\2.:0.)000000\3 # copy sign, if group 2 contains any
# values, copy them, followed by '.'.
# If not show a '0.'
# Print 6 zeros, followed by group 3.
Next, replace the superfluous zeros.
\.(0+(?=\d{6}\b))(\d{6}) # Replace the maximum number of zeros
# leaving 6 digits at the end.
replace with:
.\2
You can do it in three steps:
Step 1: replace (-?)\b(\d{1,6})\b with \10000000\2
Step 2: replace (-?)(\d{0,})(\d{6}) with \1\2.\3
Step 3: replace 0{2,}\. with 0.
The idea is simple:
In the first step, pad every number of 6 digits or fewer with six leading zeros, to ensure it is longer than 6 digits.
In the second step, put the dot before the last six digits.
In the third step, replace the run of zeros before the dot with a single zero.
The final output:
-123.456789
123.456789
-23.456789
0.056789
-0.000089
You could use a Python Script plugin available for notepad++:
editor.rereplace('(\d+)', lambda m: ('%f' % (float(m.group(1))/1000000)))
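For comparison, the pad-and-split idea shared by the answers above can be written as a single substitution in standalone Python. This is a sketch under assumptions: it runs outside Notepad++, and the function names `to_decimal` and `convert` are hypothetical.

```python
import re

def to_decimal(match):
    sign, digits = match.group(1), match.group(2)
    # Pad to at least 7 digits so one digit always precedes the decimal point
    digits = digits.zfill(7)
    return f"{sign}{digits[:-6]}.{digits[-6:]}"

def convert(text):
    """Insert a decimal point six digits from the right of every integer."""
    return re.sub(r'(-?)(\d+)', to_decimal, text)

sample = "-123456789\n123456789\n-23456789\n56789\n-89"
print(convert(sample))
```

Working on the digit string rather than converting to a float avoids any rounding and keeps the sign handling explicit.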

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
Begins with a number (max 4 digits)
Can have letters after it (max 2 characters)
Can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that as letters and second part of numbers are optional, it allows to enter numbers with up to 8 digits, which is not optimal.
1234 (first group)(no letters in the second group) 5678 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} The first group matches a number with at most 4 digits
[a-zA-Z]{0,2} The second group matches at most 2 letters
\d{0,3} The third group matches a number with at most 3 digits
You have to keep the last two groups together, not allowing the last one to be present, if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
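A quick way to convince yourself that the grouping fixes the 8-digit problem is to run the pattern against a few inputs. The Python sketch below is only an illustration; the helper name `is_street_number` and the last two test values are hypothetical.

```python
import re

# Letters are optional, but trailing digits are only allowed after letters
STREET_NUMBER = re.compile(r'^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$')

def is_street_number(value):
    return STREET_NUMBER.match(value) is not None

for value in ["1234", "1234a", "1234a12", "12345678", "12ab1234"]:
    print(value, is_street_number(value))
```

The three examples from the question are accepted, while 12345678 is rejected because the second run of digits can no longer appear without letters in front of it.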
Thank you so much for your answers! I tried Sebastian's solution:
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the ":" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: at the start of a group makes it non-capturing: the () still groups the pattern, but the text it matches is not stored as a captured substring.
Here is the regex with tests for it.
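The effect of ?: is easiest to see by comparing the groups reported for the same input. This is a small Python sketch; the variable names `capturing` and `non_capturing` are hypothetical.

```python
import re

capturing = re.compile(r'^[1-9]\d{0,3}([a-zA-Z]{1,2}\d{0,3})?$')
non_capturing = re.compile(r'^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$')

# Both patterns accept the same strings; only the reported groups differ
print(capturing.match("1234a12").groups())      # one captured substring
print(non_capturing.match("1234a12").groups())  # no captured substrings
```

For pure validation the captured text is never used, which is why the non-capturing form is a reasonable default here.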

Regex and numeric value to capture between two different tags

I'm trying to write a script that helps me get new books from a website.
I'm working with preg_match_all. I have 7 pieces of information to get: title, author, editor...
I'm having trouble creating my preg_match mask. For example, I need the product code from here. There are between 3 and 10 product codes to get on each page:
<li><label>Réf : </label>21608</li>
At first I tried this:
$mask ="/Réf :(.*)<\/li>/Us";
It works, but I want only the numbers. I've searched regex guides on the web, but I don't understand how to use the syntax for my goal, because this product code is not between two tags like <open>...</open>. The product code has 4 or 5 digits.
Thanks for any help !
Try following regular expression:
/Réf :\D*(\d+)<\/li>/
\D: non-digit
\d: digit
Let's try step by step to match those digits:
We have Réf, so let's make it /réf/i and use the i modifier to match case-insensitively.
Next comes space, colon, space. Let's make that flexible with \s*, which matches zero or more whitespace characters: /réf\s*:\s*/i
Then comes text with no digits at all (the </label> tag), so we can use \D*, which matches any run of non-digit characters: /réf\s*:\s*\D*/i
We know that there are 4 or 5 digits, so we use \d{4,5}, which matches a digit 4 or 5 times: /réf\s*:\s*\D*\d{4,5}/i
We need only the digits, so let's put them into a group: /réf\s*:\s*\D*(\d{4,5})/i
PHP code
$string = '<li><label>Réf : </label>21608</li>';
preg_match_all('/réf\s*:\s*\D*(\d{4,5})/i', $string, $m);
print_r($m[1]);
Output
Array
(
[0] => 21608
)
Try this...
/>\s*(\d{3,10})\s*</
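To illustrate what this last pattern picks up, here is a short Python sketch. The page fragment (in particular the second entry) is hypothetical sample data, and Python's `re.findall` stands in for PHP's preg_match_all.

```python
import re

# Digits that sit directly between a closing '>' and an opening '<'
CODE = re.compile(r'>\s*(\d{3,10})\s*<')

# Hypothetical page fragment with two product entries
html = ('<li><label>Réf : </label>21608</li>\n'
        '<li><label>Réf : </label>4711</li>')
print(CODE.findall(html))
```

Note that this pattern does not look for "Réf" at all, so it would also pick up any other 3 to 10 digit number that happens to sit between two tags on the page.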

What is wrong with this Regular Expression?

I am a beginner and have some problems with regex.
Input text is : something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but in the output I get idUser=123654, and I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
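The group behaviour described above can be reproduced with a short sketch. It is written in Python rather than .NET, so it only illustrates the pattern, not the Match.Groups API; the variable names are hypothetical.

```python
import re

PATTERN = re.compile(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"')

text = 'something idUser=123654; nick="Tom" something'
for m in PATTERN.finditer(text):
    # Only one of the two groups takes part in each match; the other is None
    print(m.group(0), m.groups())
```

The whole match (group 0) still contains the `idUser=` or `nick=` prefix; the bare values are only available through groups 1 and 2.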
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))
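With look-around the whole match is already the bare number, so no group lookup is needed. A minimal Python sketch of the first pattern follows; the name `ID_ONLY` is hypothetical, and the same pattern syntax is assumed to carry over to .NET.

```python
import re

# Lookbehind and lookahead assert the context without including it in the match
ID_ONLY = re.compile(r'(?<=idUser=)\d{1,8}(?=(;|$))')

text = 'something idUser=123654; nick="Tom" something'
m = ID_ONLY.search(text)
print(m.group(0))
```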