Detect Multiple Account Numbers Using Regex Groups - regex

My SVP wants me to update our regular expression rules on our email system to better detect US bank account numbers. The issue is that bank account numbers in the US are not standardized; they can be between 6 and 17 digits long.
We currently use qualifying terms to detect specific strings that we have identified as needing to be blocked. Our current rules are variations of this:
(?i)bank\saccount\s[0-9]{6,17}
The problem I need to solve is detecting the numbers whether they are preceded or followed by "bank account". I know I can find a single example with this:
(?=.*?(bank\saccount))(?=.*?(\d{6,17}))
But my SVP also wants to be able to detect the number of account numbers in a particular message. I've tried adding a third capture group with a greedy quantifier so that it grabs a different number than the second:
(?=.*?(bank\saccount))(?=.*?(\d{6,17}))(?=.*(\d{6,17}))
Here is a sandbox with a couple of examples:
https://regex101.com/r/hqIEaR/3
I am new to regex. Is there a way to set up this expression to return a number of matches equal to the instances of 6-17 digit numbers in a message where the string "bank account" is present?

Maybe simpler is better:
(?<=\D|^)\d{6,17}(?=\D|$)
The idea is that you find all numbers with 6..17 digits. They are probably account numbers.
The problem is that looking for "bank account" is useless. Your statement is:
The issue that I need to solve is the need to detect the numbers even if they are not prepended by "bank account ".
So if that string may or may not be there, just ignore it completely.
How can you differentiate between an account number and a SSN? That is the topic for another question.
If "bank account" AND the numbers must be found, but with no clear relationship between them (considering their location in the text), I would actually use two searches:
a search for bank account;
If the first search succeeds, a second search for the numbers.
I expect (no proof) that this will even be faster than doing it entirely in one regex, since the second search is skipped for every message that does not contain "bank account".
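A minimal sketch of the two-search idea in Python, in case it helps. The names QUALIFIER, ACCOUNT_NUMBER and find_account_numbers are made up for illustration, and the sample message is invented. One assumption to flag: Python's re module does not accept the variable-width lookbehind (?<=\D|^), so the digit boundaries are written as negative lookarounds instead, which should behave the same way for this purpose.

```python
import re

# First search: the qualifying term. \s+ tolerates any whitespace between the words.
QUALIFIER = re.compile(r"bank\s+account", re.IGNORECASE)

# Second search: runs of 6 to 17 digits that are not part of a longer digit run.
# (?<!\d) and (?!\d) stand in for (?<=\D|^) and (?=\D|$).
ACCOUNT_NUMBER = re.compile(r"(?<!\d)\d{6,17}(?!\d)")


def find_account_numbers(message):
    """Return every 6-17 digit run, but only if the qualifier is present."""
    if not QUALIFIER.search(message):
        return []  # the second search is never run
    return ACCOUNT_NUMBER.findall(message)


# Invented sample: two candidate numbers and one 5-digit reference.
sample = "Wire from bank account 123456789 to 00123456789012; ref 12345."
numbers = find_account_numbers(sample)
print(len(numbers), numbers)  # 2 ['123456789', '00123456789012']
```

The length of the returned list is the count the SVP is asking for; a message without the phrase yields an empty list.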

Since you are using a PCRE-compatible engine, you may use a regex like
(?is)(?:\G(?!\A)|\A(?=.*\bbank\saccount\b)).*?\K\b\d{6,17}\b
(?is) - case insensitive and singleline/dotall modes on
(?:\G(?!\A)|\A(?=.*\bbank\saccount\b)) - either the end of the previous match or the start of the string (\A) that has bank account as a whole word anywhere to the right of the current location (a (?=.*\bbank\saccount\b) positive lookahead)
.*? - any 0+ chars, as few as possible
\K - match reset operator that discards the text matched so far from the overall match memory buffer
\b\d{6,17}\b - 6 to 17 digits matched as a whole word (no other letters, digits or _ chars can appear on either end).

Related

Money amount regex currency name after the amount

I am trying to create a regular expression that matches money amounts, with various currency symbols either in front of or after the amount. The decimals are separated by a dot or by a comma.
This is what I've got so far:
\$[0-9.,]+|\£[0-9.,]+|\€[0-9.,]+
However, if I put currencies in the square brackets together with the other signs, it does not work as I expect it to (it still doesn't match 20,000$, only $20,000 and I want it to match both).
Can you tell me how I can modify my regex so that it also matches the amounts with the currency after the digits?
Also, is the only way to include more than one currency in the regex to separate them with a pipe and rewrite the same regular expression over and over again?
Updated:
This regex should match numbers with decimal group separators (zero or more) and a decimal point (zero or one):
(?:\d{1,3},)*\d{1,3}(?:\.\d+)?
For your use-case you should be happy with this regex:
[\$£€](?:\d{1,3},)*\d{1,3}(?:\.\d{1,2})?|(?:\d{1,3},)*\d{1,3}(?:\.\d{1,2})?[\$£€]
Legacy answer
There is no way in regular expressions (at least that I know of) that would allow you to swap the order of two groups of characters, thus you'll have to specify it like "AB or BA".
Hope this one works for you:
[\$\£\€]\d+(?:[.,]\d+)?|\d+(?:[.,]\d+)?[\$\£\€]
The \d+(?:[.,]\d+)? part could be simplified back to [\d.,]+. The simplest form of the regex (with a lot of information lost) is this:
[\$£€]?[\d.,]+[\$£€]?
... but that allows a lot of erroneous inputs, like 20.$ or $.,€ or simply 5.
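On the "rewrite the same expression over and over" part: the regex itself still needs both orders spelled out, but the host language can do the repetition for you. Here is a rough Python sketch (the names AMOUNT, SYMBOL and MONEY and the sample sentence are my own, not from the question) that builds the "symbol before or after" alternation from two fragments:

```python
import re

# Amount: optional comma-separated thousands groups, then up to two decimals.
AMOUNT = r"(?:\d{1,3},)*\d{1,3}(?:\.\d{1,2})?"
# All supported currency symbols live in one character class.
SYMBOL = r"[$£€]"

# "AB or BA": symbol then amount, or amount then symbol.
MONEY = re.compile(rf"{SYMBOL}{AMOUNT}|{AMOUNT}{SYMBOL}")

text = "Paid $20,000 up front, then 20,000$ later, plus €1,234.56 in fees."
matches = MONEY.findall(text)
print(matches)  # ['$20,000', '20,000$', '€1,234.56']
```

Adding a currency then means adding one character to SYMBOL. Note that this amount pattern expects grouped digits, so an ungrouped amount such as $20000 would only be matched partially.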

Using Regex to find repeating groups in phone numbers

I'm looking for a way to use regex to search for obviously false phone numbers that have the same digit repeating. The numbers are all formatted and stored as follows:
(111)111-1111
I'm not able to alter the text in any way.
I've tried modifying a few of the regex lines I've seen such as:
^([0-9])\1{2}.\1{3}.\1{4}$
which was for finding repeating digits with a period in between the numbers. However, I haven't figured out how to get around the first character as a parenthesis.
Any help would be appreciated!
You misunderstand the purpose of the . (dot) operator. It does not match a period specifically; it matches any character. In that (quite bad) regex, it serves only to skip the -, and because it matches anything, it will also match something like 111211131111.
Use this regexp instead:
^\(?([0-9])\1{2}\)?\1{3}-?\1{4}$
This allows optional parentheses around the first group, so it still works without them, and allows an optional literal dash between the second and third groups of digits.
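For what it's worth, here is a quick way to try that pattern from Python. The sample numbers are invented and the names REPEATED and flagged are just for illustration:

```python
import re

# ([0-9]) captures the first digit; every \1 must repeat that same digit.
REPEATED = re.compile(r"^\(?([0-9])\1{2}\)?\1{3}-?\1{4}$")

samples = ["(111)111-1111", "(555)555-5555", "(111)211-1111", "(212)555-0100"]

# Keep only the numbers made of a single repeated digit.
flagged = [s for s in samples if REPEATED.match(s)]
print(flagged)  # ['(111)111-1111', '(555)555-5555']
```

Since nothing in the stored text is modified, this only reads the values and reports the suspicious ones.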

Verifying that a string starts with a number (easy) OR exactly 3 letters?

I'm trying to make a RegEx expression to verify that a field starts with either the number 3 - the easy part - or starts with three letters, then continues to be numbers
My expression so far is
^((3)[\d])|([a-zA-Z]{3}[\d])$
The expression stops you from doing anything BELOW 3, but it still lets you go over...
I've done some searching and can't find a topic that relates to the issue of having an exact amount of characters
And I'm having trouble with limiting it to exactly 3 letter characters. Unfortunately what I'm working with, it HAS to be RegEx and not another language.
^(?:3|[a-zA-Z]{3})\d+$
This verifies that your string starts with either 3 or exactly three letters and is then followed only by digits (at least one) until the end of the string.
See https://regex101.com/r/tD2nK4/3 for some positive and negative examples
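If it helps to see the difference, here is a small Python sketch comparing the pattern from the question with the grouped one. In the question's pattern the ^ applies only to the first alternative and the $ only to the second, which is why extra letters slip through. The names ORIGINAL, PATTERN and is_valid are mine, and Python is used only to exercise the regex:

```python
import re

# Pattern from the question: "^3\d" OR "three letters + digit at the end".
ORIGINAL = re.compile(r"^((3)[\d])|([a-zA-Z]{3}[\d])$")

# Grouped alternation: both anchors now apply to both alternatives.
PATTERN = re.compile(r"^(?:3|[a-zA-Z]{3})\d+$")


def is_valid(field):
    """True if field is '3' or exactly three letters, followed by 1+ digits."""
    return PATTERN.match(field) is not None


for field in ["3123", "abc123", "abcd123", "ab123", "3", "4123"]:
    print(field, is_valid(field))

# The question's pattern accepts seven letters because only the tail is checked.
print(ORIGINAL.search("abcdefg1") is not None)  # True
```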
This regex should do exactly what you want:
^((3)[\d])|([a-zA-Z]{3}[^a-zA-Z])
Please note that this regex can only cope with the ASCII alphabet.

Regex Lookahead character restrictions?

I am trying to learn some things about Regex. I am starting off by trying to hide some matches for a nine digit number, such as a SSN, but let through all nine digit numbers that have the word "order" or "routing number" but it seems that only strings that have the same length will work. Is there any way around this without creating multiple lines? Thanks!
(?<!(Order:\s|Routing\snumber:\s))
(?!000|666)([0-6]\d\d|7[01256]\d|73[0123]|77[012])
([-]?)
([1-9]{2})
\3
([1-9]{4})
(?!([\w&/%"-]))
For blocking out SSNs, this one seems to work
^(?!000)(?!666)(?!9)\d{3}([- ]?)(?!00)\d{2}\1(?!0000)\d{4}$
but I want it to not block out any 9 digit numbers that have the words "order" or "routing number" in front of them.
Many regular expression engines require lookbehind to be of fixed length, and will refuse to execute a variable-length lookbehind; if this is the case for yours, you should see a warning. If you're not seeing a warning, chances are the problem is that your regexp simply doesn't work the way you think it does.
However, instead of using a lookbehind, it is usually possible to simply match the text you would have put in the lookbehind, then discard or ignore it when you're inspecting the captures or match object.
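A rough sketch of that "match it, then ignore it" approach in Python. To keep it short the number part is simplified to any nine digits with optional dashes, not the full SSN validity rules from the question, and the names CANDIDATE and find_unlabelled are hypothetical:

```python
import re

# The exempt label is matched as ordinary optional text instead of being
# asserted in a lookbehind, so its alternatives may differ in length.
CANDIDATE = re.compile(
    r"(?P<label>(?:Order|Routing\snumber):\s)?"
    r"(?<!\d)(?P<number>\d{3}-?\d{2}-?\d{4})(?!\d)"
)


def find_unlabelled(text):
    """Return nine-digit numbers that are NOT preceded by an exempt label."""
    return [
        m.group("number")
        for m in CANDIDATE.finditer(text)
        if m.group("label") is None  # labelled matches are simply discarded
    ]


text = "Order: 123456789, Routing number: 987654321, SSN 123-45-6789"
print(find_unlabelled(text))  # ['123-45-6789']
```

Whether your filtering product lets you inspect groups like this is something you would have to check; if it only supports a single pattern per rule, this may not apply directly.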

Preparing number using abbreviations

RegEx for BMHT in a sequence is my previous post.
I'm looking to build a number using abbreviations, and of course using regex.
Now I know how to validate a number with BMTH abbreviations.
Now my next and final target is to build a number using the abbreviations.
e.g. -2T2H22.55 should be displayed as -2,222.55
-2M2H22.63 should be displayed as -2,000,222.63
Help appreciated.
Flex's scripting language, ActionScript, is an ECMAScript implementation like JavaScript, so regex literals have to be delimited with slashes, for example: /^(?:\d+B)?(?:\d{1,3}M)?(?:\d{1,3}T)?(?:\d{1}H)?(\.[0-9]*)?/.
But that regex still has some problems. For one thing, you don't account for the minus sign or the two digits after the hundreds place. And, while the decimal point may be optional, if it is present you should require it to be followed by at least one digit (so +, not * in that last group).
Finally, you'll need to capture the various components so you can use them to construct the number. Here's my result:
/^(-?)(?:(\d+)B)?(?:(\d{1,3})M)?(?:(\d{1,3})T)?(?:(\d)H)?(\d{0,2})(\.\d+)?$/
The minus sign, if present, will be captured in group $1. The rest of the components will be in groups $2 through $7. You can use them in a callback function to construct the number. Also, notice that everything in this regex is optional; it will match an empty string or just a hyphen, so you'll need to check for that.
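To make the "construct the number from the groups" step concrete, here is a sketch of the idea. It is written in Python rather than ActionScript, so treat it as an outline to port; the function name expand is made up, and it assumes B, M, T and H stand for billions, millions, thousands and hundreds, as the examples in the question suggest:

```python
import re

ABBREVIATED = re.compile(
    r"^(-?)(?:(\d+)B)?(?:(\d{1,3})M)?(?:(\d{1,3})T)?(?:(\d)H)?(\d{0,2})(\.\d+)?$"
)


def expand(text):
    """Turn e.g. '-2T2H22.55' into '-2,222.55'."""
    m = ABBREVIATED.match(text)
    # Everything in the regex is optional, so reject "" and a bare "-".
    if m is None or not any(m.groups()[1:]):
        raise ValueError(f"not an abbreviated number: {text!r}")
    sign, billions, millions, thousands, hundreds, rest, fraction = m.groups()
    whole = (
        int(billions or 0) * 10**9
        + int(millions or 0) * 10**6
        + int(thousands or 0) * 10**3
        + int(hundreds or 0) * 100
        + int(rest or 0)
    )
    # "," in the format spec inserts the thousands separators.
    return f"{sign}{whole:,}{fraction or ''}"


print(expand("-2T2H22.55"))  # -2,222.55
print(expand("-2M2H22.63"))  # -2,000,222.63
```

In ActionScript the same arithmetic would go into the callback passed to the replace call, using the captured groups in the same order.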