How to extract group of numbers from a phrase using regex? - regex

I have multiple sentences which look like the following sentence -
069054 my name is black fox, $1234. phone number:1234567
I need to extract the first word (or number; in this example it's 069054).
The conditions that need to be met are:
It should consist of exactly 6 digits.
It should be the first thing in the sentence.
If it has more or fewer digits, I should ignore it.
It should consist only of digits; no other characters are allowed.
Here is what I have, but it's not working for me:
^([\d]{6})$

This is the regex you are looking for:
^(\d{6})(?!\d)
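Just remove the $ from the end and replace it with (?!\d). With the $, the pattern only matches when the whole string is exactly six digits. The negative lookahead (?!\d) instead only requires that the six digits are not followed by another digit.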
If you wish to avoid picking digits from input like 123456xyz then use this one:
^(\d{6})(?![\da-zA-Z])
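A quick way to try both patterns, sketched in Python's re module (the question doesn't name a language, so the language and the helper name leading_code are assumptions for illustration only):

```python
import re

# Six leading digits that are not followed by a seventh digit.
SIX_DIGITS = re.compile(r"^(\d{6})(?!\d)")
# Stricter variant: also rejects a letter directly after the six digits.
SIX_DIGITS_STRICT = re.compile(r"^(\d{6})(?![\da-zA-Z])")


def leading_code(sentence, pattern=SIX_DIGITS):
    """Return the six-digit prefix, or None if the sentence doesn't start with one."""
    match = pattern.match(sentence)
    return match.group(1) if match else None


print(leading_code("069054 my name is black fox, $1234. phone number:1234567"))  # 069054
print(leading_code("0690541 has seven digits"))     # None
print(leading_code("123456xyz"))                    # 123456
print(leading_code("123456xyz", SIX_DIGITS_STRICT)) # None
```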

Related

Different regex conditions on same string

I am trying to implement a regex for phone numbers, based on our business logic.
What the customer wants is that the phone number must contain between 8 and 15 digits, and may also contain spaces and dots anywhere, which don't count toward that limit. So, theoretically this should be valid:
3 .... 44444444
Because it contains 9 digits.
I can't really get any further than
~[0-9\.\ ]{8,15}$
but obviously it counts dots and spaces to the limit too.
Is it even possible to implement it via regex?
A Regex attempt:
^(?:[ .]*\d){8,15}[ .]*$
This will match 8 to 15 digits, with any number of spaces or dots anywhere in between.
The non-capturing group, (?:[ .]*\d), matches one digit preceded by any number of dots or spaces, and {8,15} enforces the range on the number of digits.
[ .]*$ matches any number of dots or spaces at the end.
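For illustration, here is the pattern checked against a few inputs in Python (the language and the function name is_valid_phone are my assumptions; the question doesn't say which regex flavor is in use):

```python
import re

# Each repetition consumes optional spaces/dots plus exactly one digit,
# so {8,15} counts digits only.
PHONE = re.compile(r"^(?:[ .]*\d){8,15}[ .]*$")


def is_valid_phone(text):
    return PHONE.match(text) is not None


print(is_valid_phone("3 .... 44444444"))      # True  (9 digits)
print(is_valid_phone("12.34.56.78 "))         # True  (8 digits)
print(is_valid_phone("1234567"))              # False (7 digits)
print(is_valid_phone("1234 5678 9012 3456"))  # False (16 digits)
```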
As far as I know, regular expressions cannot validate this. However you could maybe globally remove all whitespace and dots and then try to match a regex that is ^[[:digit:]]{8,15}$
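The strip-then-match idea could look roughly like this in Python. Note the assumptions: Python's re module has no POSIX classes such as [[:digit:]], so [0-9] stands in for it here, and the function name is made up for the example.

```python
import re


def is_valid_phone_stripped(text):
    # Drop every whitespace character and dot, then require 8 to 15 digits.
    digits_only = re.sub(r"[\s.]", "", text)
    return re.fullmatch(r"[0-9]{8,15}", digits_only) is not None


print(is_valid_phone_stripped("3 .... 44444444"))  # True
print(is_valid_phone_stripped("12 34 56"))         # False (6 digits)
print(is_valid_phone_stripped("1234a5678"))        # False (contains a letter)
```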

Pattern grouping doesn't work in HTML5

I'm trying to use the RegExp pattern="([a-zA-Z]+[0-9]*){4,}", which is meant to say:
Always start with a letter, optionally followed by numbers, with a minimum total length of 4. aaaa is validated, but aaa4 is not.
The workaround I used is [a-zA-Z]+[a-zA-Z0-9]{4,}, which forces the first character to be a letter, followed by at least 4 alphanumeric characters.
If you want the value to always start with letters and end with one or more optional digits (with a minimum of 4 characters in total), your regexp should be
pattern="([a-zA-Z]{4,}\d*|[a-zA-Z]{3,}\d+)"
Alternatively, you may want this solution
[a-zA-Z][a-zA-Z0-9]{3,}
In this case you get one letter followed by three or more letters or digits.
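The two suggestions behave differently on mixed input. A rough comparison in Python follows; re.fullmatch is used as a stand-in for the HTML pattern attribute, which has to match the whole field value. The browser uses JavaScript's regex engine, so treat this as an approximation, and the helper name accepts is hypothetical.

```python
import re

# Letters first, digits only at the end, at least 4 characters in total.
LETTERS_THEN_DIGITS = re.compile(r"([a-zA-Z]{4,}\d*|[a-zA-Z]{3,}\d+)")
# One leading letter, then any mix of 3 or more letters and digits.
LETTER_THEN_ALNUM = re.compile(r"[a-zA-Z][a-zA-Z0-9]{3,}")


def accepts(pattern, value):
    return pattern.fullmatch(value) is not None


for value in ["aaaa", "aaa4", "a4b5", "aa4", "4aaa"]:
    print(value, accepts(LETTERS_THEN_DIGITS, value), accepts(LETTER_THEN_ALNUM, value))
# aaaa True True
# aaa4 True True
# a4b5 False True
# aa4 False False
# 4aaa False False
```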
What your current pattern actually does is look for one or more letters followed by 0 or more numbers, and that entire pattern has to repeat a minimum of 4 times.
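To see that repetition in action, here is the original pattern run through Python's re.fullmatch (again only an approximation of how the browser applies the pattern attribute; the helper name is made up):

```python
import re

# The whole "letters then optional digits" unit must occur at least 4 times,
# and every occurrence needs at least one letter of its own.
ORIGINAL = re.compile(r"([a-zA-Z]+[0-9]*){4,}")


def original_accepts(value):
    return ORIGINAL.fullmatch(value) is not None


print(original_accepts("aaaa"))      # True:  a | a | a | a
print(original_accepts("aaa4"))      # False: only three letters, so at most 3 units
print(original_accepts("a1b2c3d4"))  # True:  a1 | b2 | c3 | d4
```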
To get what you want you would need to use a pipe and specify the two different patterns and a negative lookahead:
[a-zA-Z]+[0-9]{4,}|[a-zA-Z]+(?![0-9])
Which means look for a-z one or more times followed by 4 or more digits or look for a-z one or more times and not followed by a digit.
Enter text: <input type="text" pattern="^[a-zA-Z]+[0-9]{4,}|[a-zA-Z]+(?![0-9])">

Regular Expressions for specific number patterns

I have an invoice in readable form. I need to extract PO number from the invoice. The PO numbers come in a particular format (26123456, 26234567). It starts with 26 and has 6 numbers following it. I am trying to extract it using regular expressions.
I have passed this as my pattern:
[26]\d{6,6}
I have also tried ^[26]\d{6,6}
However, the problem I am facing is:
If the PO number is 26454545 and other numbers appear before it in the invoice, such as telephone numbers that contain a 2 or a 6, those are extracted as well. For example, part of 12345678987 is extracted too, since it contains a 2 and a 6.
Remove the character class and add word boundaries.
\b26\d{6}\b
[26] matches a single character from the given list, either 2 or 6. To match the number 26, just use the number as it is.
Adding \b at the start and at the end helps to match a complete number, since \b matches between a word character and a non-word character. You could also use assertions here, like (?<!\d)26\d{6}(?!\d) .
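A small Python sketch of the difference, using a made-up invoice line (the sample text is an assumption, not taken from a real invoice):

```python
import re

invoice_text = "Tel 12345678987, PO number 26454545"

# Character class: a single 2 or 6, then six digits, anywhere in the text.
loose = re.findall(r"[26]\d{6}", invoice_text)
# Literal 26, six digits, and a word boundary on both sides.
strict = re.findall(r"\b26\d{6}\b", invoice_text)
# Same idea with explicit digit lookarounds instead of \b.
lookaround = re.findall(r"(?<!\d)26\d{6}(?!\d)", invoice_text)

print(loose)       # ['2345678', '2645454']
print(strict)      # ['26454545']
print(lookaround)  # ['26454545']
```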
There is another pattern that I want to extract: 12300012345. After the first three digits there are always 3 zeros, followed by 5 digits.
\b\d{3}000\d{5}\b
If you want to combine both, then you need to use the regex alternation operator |
\b26\d{6}\b|\b\d{3}000\d{5}\b
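Here is the combined pattern applied to some sample text in Python (the sample numbers other than the two formats from the question are invented for the example, as is the function name):

```python
import re

PO_NUMBER = re.compile(r"\b26\d{6}\b|\b\d{3}000\d{5}\b")


def extract_po_numbers(text):
    """Return every PO number in either of the two formats, in order of appearance."""
    return PO_NUMBER.findall(text)


text = "PO 26123456, ref 12300012345, tel 12345678987, bad PO 261234567"
print(extract_po_numbers(text))  # ['26123456', '12300012345']
```

The telephone number and the nine-digit value are skipped because \b forces each match to cover a complete run of digits.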

Regex to check for 4 consecutive numbers

Can I use
\d\d\d\d[^\d]
to check for four consecutive numbers?
For example,
411112 OK
455553 OK
1200003 OK
f44443 OK
g55553 OK
3333 OK
f4442 No
45553 No
f4444g4444 No
f44444444 No
If you want to find any series of 4 digits in a string, /\d\d\d\d/ or /\d{4}/ will do. If you want to find a series of exactly 4 digits, use /[^\d]\d{4}[^\d]/ (note that this requires a non-digit character on both sides). If the string should consist of nothing but 4 digits, use /^\d{4}$/.
Edit: I think you want to find 4 of the same digits, you need a backreference for that. /(\d)\1{3}/ is probably what you're looking for.
Edit 2: /(^|(.)(?!\2))(\d)\3{3}(?!\3)/ will only match strings with exactly 4 of the same consecutive digits.
The first group matches either the start of the string or any single character, which is captured as group 2. Then there's a negative look-ahead that uses group 2 to ensure that the next character doesn't match that preceding character, if any. The third group matches any digit, which is then repeated 3 more times with a backreference to group 3. Finally there's a negative look-ahead that ensures that the following character doesn't match the repeated digit.
This sort of thing is difficult to do in JavaScript because you don't have things like forward references and look-behind (look-behind was only added to the language in ES2018).
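For anyone who wants to experiment, here are both patterns checked in Python against some of the strings from the question (Python is my choice for the sketch; the original answer targets JavaScript, and the function names are invented):

```python
import re

# Four of the same digit in a row, possibly inside a longer run.
ANY_RUN = re.compile(r"(\d)\1{3}")
# A run of exactly four: the character before and after must differ from the digit.
EXACT_RUN = re.compile(r"(^|(.)(?!\2))(\d)\3{3}(?!\3)")


def has_any_run(text):
    return ANY_RUN.search(text) is not None


def has_exact_run(text):
    return EXACT_RUN.search(text) is not None


for text in ["411112", "1200003", "3333", "f4442", "45553", "f44444444"]:
    print(text, has_any_run(text), has_exact_run(text))
# 411112 True True
# 1200003 True True
# 3333 True True
# f4442 False False
# 45553 False False
# f44444444 True False
```

Note that f4444g4444 is accepted by the exact-run pattern, since each of its runs is exactly four digits long, even though the question lists it as "No".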
Should the numbers be part of a string, or do you want only the four numbers? In the latter case, the regexp should be ^\d{4}$. The ^ marks the beginning of the string, $ the end. That makes sure that only four digits are valid, with nothing before or after them.
That should match four digits (\d\d\d\d) followed by a non-digit character ([^\d]). If you just want to match any four digits, you should use \d\d\d\d or \d{4}. If you want to make sure that the string contains just four consecutive digits, use ^\d{4}$. The ^ will instruct the regex engine to start matching at the beginning of the string while the $ will instruct the regex engine to stop matching at the end of the string.
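The difference between these variants is easy to check. A minimal Python sketch (the helper name check is just for this example):

```python
import re


def check(pattern, text):
    return re.search(pattern, text) is not None


print(check(r"\d\d\d\d[^\d]", "1234x"))  # True
print(check(r"\d\d\d\d[^\d]", "3333"))   # False: nothing follows the four digits
print(check(r"\d{4}", "ab12345"))        # True:  any four digits in a row
print(check(r"^\d{4}$", "3333"))         # True
print(check(r"^\d{4}$", "12345"))        # False: five digits
```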

tweak regex to accept several of the pattern

I consider my input valid if it passes both of the regexes below.
How can I tweak both regexes so that they accept a list (like input2 and input3)? Right now my regexes only work on input1.
2 or higher:
^\d{2}\d*$
non 0:
^[1-9]\d*$
input1: 123
input2: 123, 456
input3: 123, 456, 789
First of all, the pattern you posted is equivalent to ^\d{2,}$, which requires the number to have two or more digits. The regex for an integer greater than or equal to 2 is more like ^[+-]?0*([2-9]|[1-9]\d+)$. From your description, it's not clear which of these you intended.
Either way, what you want to use is something like this:
^(<pattern>(,\s|$))+$
So for your scenario, it would be something like:
^(\d{2,}(,\s|$))+$ #2 or more digits
^(0*([2-9]|[1-9]\d+)(,\s|$))+$ #positive integers >= 2
^(0*[1-9]\d*(,\s|$))+$ #positive integers > 0
I'm not sure what flavor of regex you're using, but if your engine balks at the redundant use of $ in the patterns above, you could try something like
^(<pattern>,\s)*<pattern>$
instead. Example:
^(\d{2,},\s)*\d{2,}$ #2 or more digits, simplified
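A quick check of the ^(<pattern>,\s)*<pattern>$ form with \d{2,} as the pattern, run against the three inputs from the question. Python is assumed here since the regex flavor isn't known, and the function name is made up:

```python
import re

# Zero or more "number, " prefixes, then one final number with no separator after it.
TWO_PLUS_DIGITS_LIST = re.compile(r"^(\d{2,},\s)*\d{2,}$")


def is_two_plus_digits_list(text):
    return TWO_PLUS_DIGITS_LIST.match(text) is not None


print(is_two_plus_digits_list("123"))            # True
print(is_two_plus_digits_list("123, 456"))       # True
print(is_two_plus_digits_list("123, 456, 789"))  # True
print(is_two_plus_digits_list("123, 4"))         # False: last item has one digit
print(is_two_plus_digits_list("123, "))          # False: trailing separator
```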
Bear in mind that a better way to do this is usually to split the string on the comma + whitespace separator, which will give you an array of strings you can try to parse as integers.
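The split-and-parse approach might look something like this in Python. The function name, the minimum parameter, and the choice to return None on bad input are all assumptions made for the sketch:

```python
def parse_number_list(text, minimum=1):
    """Split on ', ' and return the integers, or None if any item is invalid."""
    numbers = []
    for part in text.split(", "):
        # Require plain ASCII digits so that int() cannot be surprised.
        if not (part.isascii() and part.isdigit()):
            return None
        value = int(part)
        if value < minimum:
            return None
        numbers.append(value)
    return numbers


print(parse_number_list("123, 456, 789"))    # [123, 456, 789]
print(parse_number_list("123, abc"))         # None
print(parse_number_list("1, 5", minimum=2))  # None
```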
First of all, here's what I'd recommend for your base regexes:
Two or more digits:
\d{2,}
One or more digits:
\d+
Now if you want either of them to match a comma-and-space separated list, you could use:
(?:\d{2,}(?:\s*,\s*)?)+
and
(?:\d+(?:\s*,\s*)?)+
respectively
Try this:
^[1-9]\d*(?:\s*,\s*[1-9]\d*)*$
[1-9]\d* matches one or more digits, the first of which cannot be zero. If there are any more characters after the first number, they must comprise a comma optionally surrounded by whitespace -- \s*,\s* -- followed by another number. And that repeats as many times as necessary.
You can be more strict about the format if you like. For example, if the comma must follow immediately after the number and there must be exactly one space after the comma, you can use this:
^[1-9]\d*(?:, [1-9]\d*)*$
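Here are both variants tried on the inputs from the question plus a few edge cases, sketched in Python (the language and the helper name matches are assumptions):

```python
import re

# Comma optionally surrounded by whitespace.
LENIENT = re.compile(r"^[1-9]\d*(?:\s*,\s*[1-9]\d*)*$")
# Comma directly after the number, then exactly one space.
STRICT = re.compile(r"^[1-9]\d*(?:, [1-9]\d*)*$")


def matches(pattern, text):
    return pattern.match(text) is not None


for text in ["123", "123, 456", "123, 456, 789", "123 ,456", "0123", "123, 456,"]:
    print(repr(text), matches(LENIENT, text), matches(STRICT, text))
# '123' True True
# '123, 456' True True
# '123, 456, 789' True True
# '123 ,456' True False
# '0123' False False
# '123, 456,' False False
```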