Preparing number using abbreviations - regex

RegEx for BMHT in a sequence is my previous post.
I'm looking to build a number using abbreviations, and ofcourse using regex.
Now I know how to validate a number with BMTH abbreviations.
Now my next and final target is to build a number using the abbreviations.
e.g. -2T2H22.55 should be displayed as -2,222.55
-2M2H22.63 should be displayed as -2,000,222.63
Help appreciated.

Flex's scripting language, ActionScript, is an ECMAScript implementation like JavaScript, so regex literals have to be delimited with slashes, for example: /^(?:\d+B)?(?:\d{1,3}M)?(?:\d{1,3}T)?(?:\d{1}H)?(\.[0-9]*)?/.
But that regex still has some problems. For one thing, you don't account for the minus sign or the two digits after the hundreds place. And, while the decimal point may be optional, if it is present you should require it to be followed by at least one digit (so +, not * in that last group).
Finally, you'll need to capture the various components so you can use them to construct the number. Here's my result:
/^(-?)(?:(\d+)B)?(?:(\d{1,3})M)?(?:(\d{1,3})T)?(?:(\d)H)?(\d{0,2})(\.\d+)?$/
The minus sign, if present, will be captured in group $1. The rest of the components will be in groups $2 through $7. You can use them in a callback function to construct the number. Also, notice that everything in this regex is optional; it will match an empty string or just a hyphen, so you'll need to check for that.

Related

Money amount regex currency name after the amount

I am trying to create a regular expression that matches money amount (various currencies either in front of or after the given amount. The decimals are separated by a dot or by a comma).
This is what I've got so far:
\$[0-9.,]+|\£[0-9.,]+|\€[0-9.,]+
However, if I put currencies in the square brackets together with the other signs, it does not work as I expect it to (it still doesn't match 20,000$, only $20,000 and I want it to match both).
Can you tell me how I can modify my regex so that it also matches the amounts with the currency after the digits?
Also, is the only way to include more than one currency in the regex to separate them with a pipe and rewrite the same regular expression over and over again?
Updated:
This regex should match numbers with decimal group separators (zero or more) and a decimal point (zero or one):
(?:\d{1,3},)*\d{1,3}(?:\.\d+)?
For your use-case you should be happy with this regex:
[\$£€](?:\d{1,3},)*\d{1,3}(?:\.\d/{1,2})?|(?:\d{1,3},)*\d{1,3}(?:\.\d{1,2})?[\$£€]
Legacy answer
There is no way in regular expressions (at least that I know of) that would allow you to swap the order of two groups of characters, thus you'll have to specify it like "AB or BA".
Hope, this one works for you:
[\$\£\€]\d+(?:[.,]\d+)?|\d+(?:[.,]\d+)?[\$\£\€]
The \d+(?:[.,]\d+)? part could be simplified back to [\d.,]+. The simplest for of regex (with a lot of information lost) is this:
[\$£€]?[\d.,]+[\$£€]?
... but that allows a lot of erroneous inputs, like 20.$ or $.,€ or simply 5.

Using Regex to find repeating groups in phone numbers

I'm looking for a way to use regex to search for obviously false phone numbers that have the same digit repeating. The numbers are all formatted and stored as follows:
(111)111-1111
I'm not able to alter the text in any way.
I've tried modifying a few of the regex lines I've seen such as:
^([0-9])\1{2}.\1{3}.\1{4}$
which was for finding repeating digits with a period in between the numbers. However, I haven't figured out how to get around the first character as a parenthesis.
Any help would be appreciated!
You misunderstand the purpose of the . Dot Operator. It is not to match a period, it matches anything. In that (quite badly) regex, it serves only to skip the - – and because it matches anything, it will also match something like 11121113111.
Use this regexp instead:
^\(?([0-9])\1{2}\)?\1{3}-?\1{4}$
This checks for parentheses around the first group, optionally so it will still work without; and specifically checks for the presence of a dash between the second and third group of digits, also optionally.

Multiple spaces, multiple commas and multiple hypens in alphanumeric regex

I am very new to regex and regular expressions, and I am stuck in a situation where I want to apply a regex on an JSF input field.
Where
alphanumeric
multiple spaces
multiple dot(.)
multiple hyphen (‐)
are allowed, and Minimum limit is 1 and Maximum limit is 5.
And for multiple values - they must be separated by comma (,)
So a Single value can be:
3kd-R
or
k3
or
-4
And multiple values (must be comma separated):
kdk30,3.K-4,ER--U,2,.I3,
By the help of stackoverflow, so far I am able to achieve only this:
(^[a-zA-Z0-9 ]{5}(,[a-zA-Z0-9 ]{5})*$)
Something like
^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$
Changes made
[-.a-zA-Z0-9 ] Added - and . to the character class so that those are matched as well.
{1,5} Quantifier, ensures that it is matched minimum 1 and maximum 5 characters
Regex demo
You've done pretty good. You need to add hyphen and dot to that first character class. Note: With the hyphen, since it delegates ranges within a character class, you need to position it where contextually it cannot be specifying a range--not to say put it where it seems like it would be an invalid range, e.g., 7-., but positionally cannot be a range, i.e., first or last. So your first character class would look something like this:
[a-zA-Z 0-9.-]{1,5} or [-a-zA-Z0-9 .]{1,5}
So, we've just defined what one segment looks like. That pattern can reoccur zero or more times. Of course, there are many ways to do that, but I would favor a regex subroutine because this allows code reuse. Now if the specs change or you're testing and realize you have to tweak that segment pattern, you only need to change it in one place.
Subroutines are not supported in BRE or ERE, but most widely-used modern regex engines support them (Perl, PCRE, Ruby, Delphi, R, PHP). They are very simple to use and understand. Basically, you just need to be able to refer to it (sound familiar? refer-back? back-reference?), so this means we need to capture the regex we wish to repeat. Then it's as simple as referring back to it, but instead of \1 which refers to the captured value (data), we want to refer to it as (?1), the capturing expression. In doing so, we've logically defined a subroutine:
([a-zA-Z 0-9.-]{1,5})(,(?1))*
So, the first group basically defines our subroutine and the second group consists of a comma followed by the same segment-definition expression we used for the first group, and that is optional ('*' is the zero-or-more quantifier).
If you operate on large quantities of data where efficiency is a consideration, don't capture when you don't have to. If your sole purpose for using parenthesis is to alternate (e.g., \b[bB](asset|eagle)\b hound) or to quantify, as in our second group, use the (?: ... ) notation, which signifies to the regex engine that this is a non-capturing group. Without going into great detail, there is a lot of overhead in maintaining the match locations--not that it's complex, per se, just potentially highly repetitive. Regex engines will match, store the information, then when the match fails, they "give up" the match and try again starting with the next matching substring. Each time they match your capture group, they're storing that information again. Okay, I'm off the soapbox now. :-)
So, we're almost there. I say "almost" because I don't have all the information. But if this should be the sole occupant of the "subject" (line, field, etc.--the data sample you're evaluating), you should anchor it to "assert" that requirement. The caret '^' is beginning of subject, and the dollar '$' is end of subject, so by encapsulating our expression in ^ ... $ we are asserting that the subject matches in it's entirety, front-to-back. These assertions have zero-length; they consume no data, only assert a relative position. You can operate on them, e.g., s/^/ / would indent your entire document two spaces. You haven't really substituted the beginning of line with two spaces, but you're able to operate on that imaginary, zero-length location. (Do some research on zero-length assertions [aka zero-width assertions, or look-arounds] to uncover a powerful feature of modern regex. For example, in the previous regex if I wanted to make sure I did not insert two spaces on blank lines: s/^(?!$)/ /)
Also, you didn't say if you need to capture the results to do something with it. My impression was it's validation only, so that's not necessary. However, if it is needed, you can wrap the entire expression in capturing parenthesis: ^( ... )$.
I'm going to provide a final solution that does not assume you need to capture but does assume the entire subject should consist of this value:
^([a-zA-Z 0-9. -]{1,5})(?:,(?1))*$
I know I went on a bit, but you said you were new to regex, so wanted to provide some detail. I hope it wasn't too much detail.
By the way, an excellent resource with tutorials is regular-expressions dot info, and a wonderful regex development and testing tool is regex101 dot com. And I can never say enough about stack overflow!

positive look ahead and replace

Recently I'm writing/testing regexps on https://regex101.com/.
My question is: Is it possible to do a positive look-ahead AND a replacement in the same "replacement"? Or just limited kind of replacement is possible.
Input is several lines with phone numbers. Let's say the correct phone number where the number of "numbers" are 11. No matter how the numbers are divided/group together with - / characters, no matter if starts with + 00 or it is omitted.
Some example lines:
+48301234567
+48/30/1234567
+48-30-12-345-67
+483011223344556677
0048301234567
+(48)30/1234567
Positive look-ahead able to check if from the beginning until the end of line there are only 11 digits, regardless how many other, above specified character separating them. This works perfectly.
Where the positive look-ahead check is fine, I would like to delete every character but numbers. The replacement works fine until I'm not involving look-ahead.
Checking the regexp itself working perfectly ("gm" modes):
^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$
Checking the replace part works perfectly (replace to nothing):
[^\d\n]
Put this into look-ahead, after the deletion of non new-line and non-digit characters from the matching lines:
(?=^(?:\+|00)?(?:[\-\/\(\)]?\d){11}$)[^\d\n]
Even I put the ^ $ into look-ahead, seems the replacement working only from beginning of the lines until the very first digit.
I know in real life the replacement and the check should/would go separate ways, however I'm curious if I could mix look-ahead/look-behind with string operations like replace, delete, take the string apart and put together as I like.
UPDATE: This is what would do the trick, however I feel this one "ugly" a bit. Is there any prettier solution?
https://regex101.com/r/yT5dA4/2
Or the version which I asked originally, where only digits remains: regex101.com/r/yT5dA4/3
You cannot replace/delete text with regex. Regex is just a tool for matching certain strings and then taking certain action depending on the matching text, eg. perform a substitution, retrieve the second capture group.
However it is possible to perform certain decisions within a regex engine, by using conditionals. The common syntax for this, with a lookahead assertion, is (?(?=regex)then|else).
With conditionals you can change the behaviour depending on how the text matches the regex. For your example you could do something like:
^(\+)?(?(1)\(|\d)
If the phone number starts with a plus it must be followed by a bracket, else it should start with a digit. Although in your situation, this is not very useful.
If you want to read up more on conditionals in regex you can do so here.

A typical regex for phone number validation in Java

I am struggling with getting the right regex for a phone number validation in my application. I have got a regex that will accept only numbers and some special symbols like ()- etc, however, the problem is that it accepts only symbols as well. So for example, it would accept something like ()()()(). I want to modify the regex or get a whole new regex that accepts these symbols but it should have at least one number before and after each symbol.
My requirements are:
Only numbers
Number with combination of special symbols
Each symbol should be followed by a number (before and after) but white spaces are okay
Max length should be 15
In my experience, the parenthesis only appear around the first group of digits and there are never fewer than 3 digits in a group. This regex does that, and prevents multiple consecutive separators with the exception of a space following a paren "(123) 456-7890". I also added support for periods as separators. It allows for 1, 2, or 3 groups of numbers and attempts to enforce an overall range of 7-15 digits but it errs on the permissive side.
^\\s*(\\d{7,15})||(\\d{3,12}[\\-.]?\\s?\\d{3,12}[\\-.\\s]?)||([(]?\\d{3,9}[)\\-.]?\\s?\\d{3,9}[\\-.\\s]?\\d{3,9})\\s*
In my environment I have to escape the backslashes - you may not have to so you may need to replace the \ with . The hyphen must be escaped because in this context it represents a range.