Regular Expressions for specific number patterns - regex

I have an invoice in readable form and need to extract the PO number from it. The PO numbers come in a particular format (26123456, 26234567): each starts with 26 and has 6 digits following it. I am trying to extract it using regular expressions.
These are the patterns I have tried:
[26]\d{6,6} and also ^[26]\d{6,6}
However, the problem I am facing is:
If the PO number is 26454545 and the invoice contains other numbers before it, such as telephone numbers that contain a 2 or a 6, those are extracted as well. For example, 12345678987 is picked up as well, since a 2 and a 6 are present in it.

Remove the character class and add word boundaries.
\b26\d{6}\b
[26] is a character class that matches a single character from the given list, either 2 or 6. To match the number 26, just write it literally.
Adding \b at the start and at the end helps to match a complete number, since \b matches between a word character and a non-word character. You could also use lookaround assertions here, like (?<!\d)26\d{6}(?!\d) .
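A minimal Python sketch of the difference between the character class and the literal 26 with boundaries. The sample text is made up for illustration, and Python's re module is only an assumption here, since the question does not say which regex engine is in use.

```python
import re

# Hypothetical invoice text: a phone number, a real PO number, and a longer
# number that merely contains a PO-like substring.
text = "Tel: 12345678987, PO: 26454545, ref 126123456"

# Original attempt: [26] matches ONE character, either '2' or '6',
# so fragments of unrelated numbers are picked up.
loose = re.findall(r"[26]\d{6}", text)

# Literal 26 plus word boundaries: only standalone 8-digit numbers starting with 26.
strict = re.findall(r"\b26\d{6}\b", text)

# The same idea with digit lookarounds instead of \b.
lookaround = re.findall(r"(?<!\d)26\d{6}(?!\d)", text)

print(loose)       # fragments such as '2345678' taken from the phone number
print(strict)      # ['26454545']
print(lookaround)  # ['26454545']
```

The lookaround form only cares about neighbouring digits, so it would also find a PO number glued to letters or an underscore, where \b would not.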
There is another pattern that I want to extract: 12300012345. After the first three digits there are always 3 zeros followed by 5 digits.
\b\d{3}000\d{5}\b
If you want to combine both, use the regex alternation operator |
\b26\d{6}\b|\b\d{3}000\d{5}\b
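As a rough illustration, the combined pattern can be tried out in Python like this. The sample string and the name PO_OR_REF are invented for the example.

```python
import re

# Either an 8-digit number starting with 26, or 3 digits + 000 + 5 digits.
PO_OR_REF = re.compile(r"\b26\d{6}\b|\b\d{3}000\d{5}\b")

# Hypothetical input: two wanted numbers, a phone number, and a too-short number.
text = "PO 26123456, ref 12300012345, phone 12345678987, short 2612345"

matches = PO_OR_REF.findall(text)
print(matches)  # ['26123456', '12300012345']
```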

Related

Validating a phone number with a custom rule possibly using nested groups

I am trying to write a regex for validating phone numbers.
We have custom rules, i.e. the phone number must meet the following pattern:
+ or 00 as a prefix
One to three digits
An optional space or hyphen
Then one to n digits (n is still constrained by the rule for total character count below)
The total number of characters must not exceed 28.
Here is the regex I have come up with:
/^((\+|00)(\d{1,3})[\s-]?)(\d{1,23}){1,28}$/
I am sure it can be simplified. Can someone please help?
This part of your pattern, (\d{1,23}){1,28}, matches 1-23 digits and then repeats that 1-28 times, so it allows up to 28×23=644 digits (thank you @Toto)
You could check that the string consists of 1-28 of the listed characters using a positive lookahead, (?=[+\d -]{1,28}$)
The last part is currently \d{1,}, but you could specify a minimum length if you don't want to match +1 1
Note that \s could also match a newline, so the pattern uses a literal space instead.
^(?=[+\d -]{1,28}$)(?:\+|00)\d{1,3}[ -]?\d{1,}$
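A small Python sketch, under the assumption that the rules are exactly the ones listed in the question, comparing the original pattern with the lookahead version. The function name is_valid_phone and the test numbers are made up.

```python
import re

# Pattern from the question (without the /.../ delimiters).
ORIGINAL = re.compile(r"^((\+|00)(\d{1,3})[\s-]?)(\d{1,23}){1,28}$")

# Pattern from the answer: the lookahead caps the total length at 28.
WITH_LOOKAHEAD = re.compile(r"^(?=[+\d -]{1,28}$)(?:\+|00)\d{1,3}[ -]?\d{1,}$")

def is_valid_phone(number: str) -> bool:
    """Return True if the number satisfies the lookahead-based pattern."""
    return WITH_LOOKAHEAD.match(number) is not None

too_long = "+1" + "2" * 100   # 102 characters, far beyond the limit of 28

print(ORIGINAL.match(too_long) is not None)  # True: the length rule is not enforced
print(is_valid_phone(too_long))              # False
print(is_valid_phone("+49 123456"))          # True
print(is_valid_phone("0049-123456"))         # True
print(is_valid_phone("49 123456"))           # False: missing + or 00 prefix
```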

Regex Match string having exact number of a char

I have some strings:
1:2:3:4:5
2:3:4:5
5:3:2
6:7:8:9:0
How can I find strings that have an exact number of colons?
For example, I need to find the strings that contain 4 colons.
Result:
1:2:3:4:5 and 6:7:8:9:0
Edit:
The text between the colons does not matter; it may look like this:
qwe:::qwe:
:998:qwe:3ee3:00
I have to specify the number of colons, and I need to do it using regexp_matches.
It is something like a filter to search for broken strings.
Thanks.
With N being the number you search for:
"^([^:]*:){N}[^:]*$"
Here is a test:
for s in ":::foo:bar" "foo:bar:::" "fo:o:ba::r" ; do echo "$s" | egrep "^([^:]*:){4}[^:]*$" ; done
Change 4 to 3 or 5 to see it stop matching.
PostgreSQL may need specific flags or escaping for some elements.
"^([^:]*:){N}[^:]*$"
"^ $"
# matching the whole String/Line, not just a part
"([^:]*:){N}[^:]*"
"( ){N}[^:]*"
# N repetitions of something, followed by an arbitrary number of non-colons (maybe zero)
"([^:]*:)"
# non-colons in arbitrary number (including zero), followed by a colon
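The same pattern can be sketched in Python, using the sample strings from the question. Python stands in for regexp_matches here purely as an assumption for illustration, and has_exactly_n_colons is a made-up helper name.

```python
import re

def has_exactly_n_colons(s: str, n: int) -> bool:
    """True if s contains exactly n colons, using the anchored pattern above."""
    # n groups of "non-colons followed by a colon", then only non-colons to the end.
    pattern = r"^([^:]*:){%d}[^:]*$" % n
    return re.search(pattern, s) is not None

samples = [
    "1:2:3:4:5",
    "2:3:4:5",
    "5:3:2",
    "6:7:8:9:0",
    "qwe:::qwe:",
    ":998:qwe:3ee3:00",
]

matching = [s for s in samples if has_exactly_n_colons(s, 4)]
print(matching)
# ['1:2:3:4:5', '6:7:8:9:0', 'qwe:::qwe:', ':998:qwe:3ee3:00']
```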
You want to use the quantifier syntax {4}. The quantifier indicates that the preceding group must occur exactly that many times for the match to succeed. To find the pattern of five digits separated by colons, something like the following would work.
((\d\:){4}\d)
I am assuming you may want any digit or word character but not whitespace or punctuation. In that case use the word character (\w).
(((\w)[\:]){4}(\w))
But depending on what you would like to do with that pattern you may need a different regular expression. If you wanted to capture and replace all the colons while leaving the digits intact your pattern would need to use string replacement or more advanced grouping.
Any number of any characters, including at least 4 colons (this also matches strings with more than 4):
((.*:){4}.*)
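A quick Python check of how the unanchored patterns from these two answers behave on a string with five colons. The variable names and the sample string are invented for the example.

```python
import re

digits_only = re.compile(r"((\d\:){4}\d)")
any_chars = re.compile(r"((.*:){4}.*)")

five_colons = "1:2:3:4:5:6"   # hypothetical input with FIVE colons

# Without anchors, the digit pattern still finds a 4-colon piece inside it.
partial = digits_only.search(five_colons)
print(partial.group(1))                              # 1:2:3:4:5

# .* can itself swallow colons, so 4 is only a lower bound here.
print(any_chars.fullmatch(five_colons) is not None)  # True
print(any_chars.fullmatch("1:2:3") is not None)      # False: only two colons
```

For an exact count, the colons have to be excluded from what sits between them, for example with [^:]* and anchors at both ends.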

How to make numbers backwards in notepad++

So, I have a lot of numbers in lines like so
rocket123
firefly1000
attack577
Is there any regex to make the numbers reversed?
rocket321
firefly0001
attack775
This is feasible with a little trick.
Step 1. Add a marker for the not-yet-inverted digits.
Find:
\b(\w+?)(\d+)\b
Replace:
$1§$2
You can choose another marker instead of §.
Step 2. Run Replace All repeatedly with these settings until nothing more is replaced:
Find:
\b(\w+)§(\d*)(\d)\b
Replace:
$1$3§$2
Step 3. Delete all markers.
Find:
\b(\w+\d)§
Replace:
$1
Hope this helps.
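The three steps can be simulated outside Notepad++ to see how the marker moves. This is only a sketch in Python, whose regex flavour differs from Notepad++'s, and reverse_trailing_digits is a made-up helper name.

```python
import re

MARK = "§"  # any character that does not occur in the text would do

def reverse_trailing_digits(text: str) -> str:
    # Step 1: put a marker in front of the not-yet-reversed digits.
    text = re.sub(r"\b(\w+?)(\d+)\b", r"\1" + MARK + r"\2", text)

    # Step 2: move the last digit in front of the marker, repeating
    # (like pressing Replace All again) until nothing changes.
    while True:
        text, count = re.subn(
            r"\b(\w+)" + MARK + r"(\d*)(\d)\b",
            r"\1\3" + MARK + r"\2",
            text,
        )
        if count == 0:
            break

    # Step 3: delete the markers.
    return re.sub(r"\b(\w+\d)" + MARK, r"\1", text)

lines = "rocket123\nfirefly1000\nattack577"
result = reverse_trailing_digits(lines)
print(result)
# rocket321
# firefly0001
# attack775
```

Each pass of step 2 moves one digit per line, so the number of passes needed equals the length of the longest digit run.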
If the maximum number of digits to be reversed is known and not too large then a single Notepad++ regular expression search and replace can be used. Suppose the maximum number of digits is 12 then the expressions are:
Search for:
(\d)(\d)(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?
Replace with:
(?{12}${12})(?{11}${11})(?{10}${10})(?9$9)(?8$8)(?7$7)(?6$6)(?5$5)(?4$4)(?3$3)$2$1
Explanation:
Any number to be reversed must have at least two digits, so the initial (\d)(\d) in the search gets two digits and the final $2$1 in the replace puts them in reverse order at the end of the output. (The first two digits are the easy part.) The search string then repeats the pattern (\d)? as many times as needed for the maximum number of digits. These match the remaining digits, if any. Each of these (\d)? patterns has a corresponding item in the replace string, of the form (?N$N), where each N is the number of the capture group. Single-digit captures are like (?4$4) for number 4. For captures 10 and above the number is wrapped in curly braces, such as (?{12}${12}) for number 12. Each replacement item tests whether its capture group captured anything and, if it did, inserts the captured text.
Variations
Add or remove additional search and replacement items as needed for longer or shorter maximum numbers of digits.
If the number of digits might be larger than expected then adding an extra (\d)? to the search string and (?{13}__Some suitable error message__) to the end of the replacement will output the error message on overlong groups of digits. Of course the 13 needs to be altered to match the number of items in search and replacement.
Tested with Notepad++ version 7.5.6.
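For comparison, the same capture-group idea can be sketched in Python. Python's re.sub has no (?N...) conditional in replacement strings, so a callback that skips unmatched groups plays that role here. MAX_DIGITS, reverse_groups and reverse_digit_runs are names invented for the example.

```python
import re

MAX_DIGITS = 12  # assumed upper bound, as in the answer

# Two mandatory digit groups followed by optional single-digit groups.
SEARCH = re.compile(r"(\d)(\d)" + r"(\d)?" * (MAX_DIGITS - 2))

def reverse_groups(match: re.Match) -> str:
    # Emit the groups from last to first; groups that did not take part in
    # the match are None and are skipped, like the (?N$N) conditionals.
    return "".join(g for g in reversed(match.groups()) if g is not None)

def reverse_digit_runs(text: str) -> str:
    return SEARCH.sub(reverse_groups, text)

print(reverse_digit_runs("rocket123"))    # rocket321
print(reverse_digit_runs("firefly1000"))  # firefly0001
print(reverse_digit_runs("attack577"))    # attack775
```

As with the Notepad++ version, runs longer than the assumed maximum are not reversed as a whole.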

Different regex conditions on same string

I am trying to implement a regex for phone numbers, based on our business logic.
What the customer wants is that the phone number must contain between 8 and 15 digits, and it can also contain any number of spaces and dots anywhere, which do not count toward the number of digits. So, theoretically this should be valid:
3 .... 44444444
Because it contains 9 digits.
I can't get any further than
~[0-9\.\ ]{8,15}$
but obviously it counts dots and spaces to the limit too.
Is it even possible to implement it via regex?
A Regex attempt:
^(?:[ .]*\d){8,15}[ .]*$
This will match 8 to 15 digits, with any number of spaces or dots anywhere in between.
The non-capturing group, (?:[ .]*\d), matches a digit preceded by any number of dots or spaces, and {8,15} enforces the allowed number of digits.
[ .]*$ matches any number of dots or spaces at the end.
As far as I know, regular expressions cannot validate this. However you could maybe globally remove all whitespace and dots and then try to match a regex that is ^[[:digit:]]{8,15}$
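Both suggestions can be tried side by side. This is a sketch in Python, which is an assumption since the question does not name a language. Python's re has no [[:digit:]] class, so \d is swapped in for the strip-then-match variant, and the function names are made up.

```python
import re

# Single-regex approach: 8 to 15 digits, each optionally preceded by spaces/dots.
DIRECT = re.compile(r"^(?:[ .]*\d){8,15}[ .]*$")

def valid_direct(phone: str) -> bool:
    return DIRECT.match(phone) is not None

def valid_strip_then_match(phone: str) -> bool:
    # Remove whitespace and dots first, then require 8 to 15 digits.
    stripped = re.sub(r"[\s.]", "", phone)
    return re.fullmatch(r"\d{8,15}", stripped) is not None

samples = ["3 .... 44444444", "1234567", "1234 5678", "1" * 16, "12345678a"]

for phone in samples:
    print(repr(phone), valid_direct(phone), valid_strip_then_match(phone))
```

On these samples the two functions give the same answers. They can differ on inputs containing tabs or other whitespace, which only the strip-then-match variant removes.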

How to extract group of numbers from a phrase using regex?

I have multiple sentences which look like the following sentence -
069054 my name is black fox, $1234. phone number:1234567
I need to extract the first word (or number; in this example it's 069054).
The conditions that need to be met are:
It should consist of exactly 6 digits.
It should be the first thing in the sentence.
If it has more or fewer digits, I should ignore it.
It should consist only of digits; no other characters are allowed.
Here is what I have, but it's not working out for me.
^([\d]{6})$
This is the regex you are looking for:
^(\d{6})(?!\d)
Just remove the $ from the end and replace it with (?!\d). The pattern then matches six digits at the start that are not followed by another digit.
If you wish to avoid picking digits from input like 123456xyz then use this one:
^(\d{6})(?![\da-zA-Z])
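A short Python sketch of both patterns, using the sentence from the question plus a few made-up variations. The helper name leading_code is hypothetical.

```python
import re

SIX_DIGITS = re.compile(r"^(\d{6})(?!\d)")
SIX_DIGITS_STRICT = re.compile(r"^(\d{6})(?![\da-zA-Z])")

def leading_code(sentence: str, pattern: re.Pattern = SIX_DIGITS):
    """Return the leading 6-digit code, or None if the sentence has none."""
    m = pattern.match(sentence)
    return m.group(1) if m else None

print(leading_code("069054 my name is black fox, $1234. phone number:1234567"))  # 069054
print(leading_code("0690541 my name is black fox"))     # None: seven digits
print(leading_code("06905 my name is black fox"))       # None: five digits
print(leading_code("123456xyz"))                        # 123456
print(leading_code("123456xyz", SIX_DIGITS_STRICT))     # None: letters follow
```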