How can I use REGEX to test for currency formats

How can you create a regular expression that checks if a user input matches characters formally found in a currency syntax? (number, period/decimal place, comma, or dollar sign?).
The following can find all characters listed above except for the dollar sign, any idea how to properly structure this?
/([0-9.,])/g

The regex I use for currency validation is as follows:
^(\$)?([1-9]{1}[0-9]{0,2})(\,\d{3})*(\.\d{2})?$|^(\$)?([1-9]{1}[0-9]{0,2})(\d{3})*(\.\d{2})?$|^(0)?(\.\d{2})?$|^(\$0)?(\.\d{2})?$|^$
RegExr is a great website for testing and reviewing these strings (perhaps you could make a regex string that's less of a beast!)

Are you just trying to test the characters? In that case
[0-9,.$]+
will suffice. Or are you testing for the format $1,123,123.12 with the correct placements of commas and everything?
In that case you would need something more like
(\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?)
(the decimal point is escaped so that it matches only a literal period, not any character).
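As a rough illustration of the comma-grouped format check, here is a minimal Python sketch. It assumes the whole input must be the amount (hence `fullmatch`) and that the decimal point is a literal period; the names `CURRENCY_FORMAT` and `is_formatted_amount` are made up for this example.

```python
import re

# Optional dollar sign, 1-3 leading digits, then groups of ",ddd",
# then an optional ".dd". The dot is escaped so it is a literal period.
CURRENCY_FORMAT = re.compile(r"\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?", re.ASCII)


def is_formatted_amount(text: str) -> bool:
    """Return True if the whole string is a comma-grouped amount."""
    return CURRENCY_FORMAT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["$1,123,123.12", "1,123", "$1123.12", "1,12", "12x34"]:
        print(sample, is_formatted_amount(sample))
```

Using `fullmatch` stands in for the `^...$` anchors; without anchoring, the pattern would happily match just the first few digits of a malformed amount.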

You need to define what you want your regex to match, more formally than "matches characters formally found in a currency syntax". We don't know which currencies you're interested in. We don't know how strict you need it to be.
Maybe you'll come up with something like:
These elements must come in this order:
A currency symbol ('£', '€' or '$') (your requirement might specify more currencies)
1 or more numeric digits
A period or a comma
Exactly two numeric digits
Once you have a specification like that, it's easy to translate into a regular expression:
[£€$] // one of these chars.
\d+ // '+' means 'one or more'
[.,] // '[]' means 'any one of these'.
\d\d // Two digits. Could also be written as '\d{2}'
Or concatenated together:
[£€$]\d+[.,]\d\d
If you've learned about escaping special characters like $ and ., you may be surprised not to see it done here. Within [], they lose their special meaning.
(There are dialects of regex -- check the documentation for whatever implementation you're using)
Your requirements may be different though. The example I've given doesn't match:
$ 12.00
$12
USD12
¥200.00
25¢
$0.00005
20 μBTC
44 dollars
£1/19/11¾d ("one pound, nineteen shillings and elevenpence three farthings")
Work out your requirement, then write your code to meet it.
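The four-part specification above can be sketched in Python with `re.VERBOSE`, which lets each element keep its comment. This is only an illustration of that one example spec; `SIMPLE_PRICE` and `matches_spec` are names invented here.

```python
import re

# Each line corresponds to one element of the example specification.
SIMPLE_PRICE = re.compile(
    r"""
    [£€$]   # one currency symbol
    \d+     # one or more digits
    [.,]    # a period or a comma
    \d{2}   # exactly two digits
    """,
    re.VERBOSE | re.ASCII,
)


def matches_spec(text: str) -> bool:
    """Return True if the whole string satisfies the example spec."""
    return SIMPLE_PRICE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["$12.00", "€3,50", "$ 12.00", "$12", "USD12", "¥200.00"]:
        print(repr(sample), matches_spec(sample))
```

The rejected samples are taken from the list of non-matching inputs above, which is a quick way to confirm the regex really implements the spec and nothing more.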

You should put a \ before special characters, and add a star (zero or more) or a plus (one or more) so that the match covers the full run of currency characters, for example:
/([0-9\.,]*)/g
Or, for a real price such as 200,00, where there are always two digits after the separator:
/(([0-9]+)(\.|,)([0-9]){2})/g

Related

Is it possible to negate a group in a regular expression?

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So I want to get all dates of any year, but not those in the 10th month, and the day must be 30.
I tried hard to do this using negative lookahead assertions, but I did not come up with a working regex.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, hence the \d\d part.
Generally speaking you can not (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern is 10.
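To see the exclusion-then-inclusion idea in action, here is a small Python sketch. The date list is a shortened version of the one in the question, with one extra line (2020-10-30) added here purely to show the lookahead rejecting something.

```python
import re

# Shortened sample data; 2020-10-30 is an added, hypothetical line
# so that there is a day-30 date in October to exclude.
dates = """2020-09-29
2020-09-30
2020-10-01
2020-10-30
2020-12-30"""

# (?!10) rejects month 10 without consuming anything;
# \d{2} then consumes whichever month is actually there.
pattern = re.compile(r"\d{4}-(?!10)\d{2}-30")

matches = pattern.findall(dates)
print(matches)  # ['2020-09-30', '2020-12-30']
```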
Try :
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation :
20\d{2} it will match 20XX
(?:0[1-9]|1[12]) it will match 0X or 11, 12
30 it will match 30
Demo :https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.

VIN Validation RegEx

I have written a VIN validation RegEx based on http://en.wikipedia.org/wiki/Vehicle_identification_number, but when I run some tests it rejects some valid VINs.
My RegEx:
^[A-HJ-NPR-Za-hj-npr-z\\d]{8}[\\dX][A-HJ-NPR-Za-hj-npr-z\\d]{2}\\d{6}$
VIN Number Not Working:
1ftfw1et4bfc45903
WP0ZZZ99ZTS392124
VIN Numbers Working:
19uya31581l000000
1hwa31aa5ae006086
(I think the problem occurs with the numbers at the end, Wikipedia made it sound like it would end with only 6 numbers and the one that is not working but is a valid number only ends with 5)
Any Help Correcting this issue would be greatly appreciated!
I can't help you with a perfect regex for VIN numbers -- but I can explain why this one is failing in your example of 1ftfw1et4bfc45903:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}[\dX][A-HJ-NPR-Za-hj-npr-z\d]{2}\d{6}$
Explanation:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}
This allows for 8 characters, composed of any digits and any letters except I, O, and Q; it properly finds the first 8 characters:
1ftfw1et
[\dX]
This allows for 1 character, either a digit or a capital X; it properly finds the next character:
4
[A-HJ-NPR-Za-hj-npr-z\d]{2}
This allows for 2 characters, composed of any digits and any letters except I, O, and Q; it properly finds the next 2 characters:
bf
\d{6}$
This allows for exactly 6 digits, and is the reason the regex fails; because the final 6 characters are not all digits:
c45903
Dan is correct: VINs have a checksum. A regex can't verify it, so the best a regex can do is cast too wide a net. By that I mean that your regex will accept all valid VINs, and also around a trillion (rough estimate) non-VIN 17-character strings.
If you are working in a language with named capture groups, you can extract that data as well.
So, if your goal is:
Only to not reject valid VINs (letting in invalid ones is ok) then use Fransisco's answer, [A-HJ-NPR-Z0-9]{17}.
Not reject valid VINs, and grab info like model year, plant code, etc, then use this (note, you must use a language that supports named capture groups - off the top of my head: Perl, Elixir, Python (which spells them (?P<name>...)), JavaScript since ES2018, and almost certainly others): /^(?<wmi>[A-HJ-NPR-Z\d]{3})(?<vds>[A-HJ-NPR-Z\d]{5})(?<check>[\dX])(?<vis>(?<year>[A-HJ-NPR-Z\d])(?<plant>[A-HJ-NPR-Z\d])(?<seq>[A-HJ-NPR-Z\d]{6}))$/ where the names are defined at the end of this answer.
Not reject valid VINs, and prevent some but not all invalid VINs, you can get specific like Pedro does.
Only accept valid VINs: you need to write code (just kidding, GitHub exists).
Capture group name key:
wmi - World manufacturer identifier
vds - Vehicle descriptor section
check - Check digit
vis - Vehicle identifier section
year - Model year
plant - Plant code
seq - Production sequence number
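For illustration, the same named-group idea might look like this in Python, which spells named groups as `(?P<name>...)`. This is a sketch only: `VIN_PARTS` and `split_vin` are invented names, and upper-casing the input first is an assumption made here so that lower-case VINs are accepted.

```python
import re

# Same structure as the named-group pattern, in Python syntax.
ALLOWED = r"[A-HJ-NPR-Z\d]"  # letters except I, O, Q, plus digits
VIN_PARTS = re.compile(
    rf"(?P<wmi>{ALLOWED}{{3}})"
    rf"(?P<vds>{ALLOWED}{{5}})"
    r"(?P<check>[\dX])"
    rf"(?P<vis>(?P<year>{ALLOWED})(?P<plant>{ALLOWED})(?P<seq>{ALLOWED}{{6}}))",
    re.ASCII,
)


def split_vin(vin: str):
    """Return a dict of VIN sections, or None if the shape is wrong."""
    match = VIN_PARTS.fullmatch(vin.upper())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(split_vin("1ftfw1et4bfc45903"))
```

Like the regex it mirrors, this checks only the shape of the string; it does not verify the check digit.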
This regular expression is working fine for validating US VINs, including the one you described:
[A-HJ-NPR-Z0-9]{17}
Remember to make it case insensitive with flag i
Source: https://github.com/rfink/angular-vin
VIN should have only A-Z, 0-9 characters, but not I, O, or Q
Last 6 characters of VIN should be a number
VIN should be 17 characters long
You didn't specify which language you're using but the following regex can be used to validate a US VIN with php:
/^(?:([A-HJ-NPR-Z]){3}|\d{3})(?1){2}\d{2}(?:(?1)|\d)(?:\d|X)(?:(?1)+\d+|\d+(?1)+)\d{6}$/i
I feel regex is not the ideal validation. VINs have a built in check digit. https://en.wikibooks.org/wiki/Vehicle_Identification_Numbers_(VIN_codes)/Check_digit or http://www.vsource.org/VFR-RVF_files/BVINcalc.htm
I suggest you build an algorithm using this. (Untested algorithm example)
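A minimal Python sketch of such a check-digit routine follows. It assumes the commonly published North American scheme (letters transliterated to numbers, per-position weights, sum modulo 11, with 10 written as X); the function names are invented for this example. VINs from markets that do not use position 9 as a check digit, which appears to be the case for WP0ZZZ99ZTS392124 above, will fail this test even though they are real.

```python
# Letter-to-number transliteration (I, O and Q never appear in a VIN).
TRANSLITERATION = {str(d): d for d in range(10)}
TRANSLITERATION.update(zip("ABCDEFGH", range(1, 9)))
TRANSLITERATION.update(zip("JKLMN", range(1, 6)))
TRANSLITERATION.update({"P": 7, "R": 9})
TRANSLITERATION.update(zip("STUVWXYZ", range(2, 10)))

# Weight for each of the 17 positions; position 9 (the check digit) is 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def expected_check_digit(vin: str):
    """Return the check character the VIN should carry, or None if malformed."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        return None
    total = sum(TRANSLITERATION[ch] * w for ch, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """Return True if position 9 matches the computed check character."""
    expected = expected_check_digit(vin)
    return expected is not None and vin.upper()[8] == expected
```

A regex such as [A-HJ-NPR-Z0-9]{17} can still be used first as a cheap shape test, with this routine as the second step.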
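This should work. It comes from a Splunk search, so there are some additional exclusions. (A ^ negates a character class only when it is the first character inside the brackets, so the letters I, O and Q are left out of the ranges instead.)
(?i)(?<VIN>[A-HJ-NPR-Z0-9]{11}\d{6})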
The NHTSA website provides the method used to calculate the 9th character checksum, if you're interested. It also provides lots of other useful data, such as which characters are allowed in which position, or how to determine whether the 10th character, if alphabetic, refers to a model year up to 1999 or a model year from 2010.
NHTSA VIN eCFR
Hope that helps.
Please, use this regular expression. It is shorter and works with all VIN types
(?=.*\d|[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
I replaced the formula above with the new one below:
(?=.*\d|=.*[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
This regular expression accepts any letter but requires at least one digit, with a maximum of 17 characters.

Regex to match number of digits in a phone string?

Trying to put together regex that can match minimum 4 digits, maximum 16 digits, and those digits can be separated by characters: ()- x+ (but should not be part of the min/max count).
ie. "555-123-4567" would return true, "1-234" is true, "+44(55)123-3333" is true, "abcd1" is false, "1-()-4++++-()-6" is false.
Any way to do that with purely regex? Trying a couple expressions but not working.
What you need to do is match any number of the allowed characters, followed by a digit, followed by any number of the allowed characters, and match that same sequence between 4 and 16 times.
Like this:
^([()\- x+]*\d[()\- x+]*){4,16}$
http://rubular.com/r/6VhALkFPQZ
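As a quick check of that pattern against the examples from the question, here is a short Python sketch (the function name `has_4_to_16_digits` is invented for this example):

```python
import re

# Each repetition of the group consumes exactly one digit, so the
# {4,16} quantifier effectively counts digits, not separators.
PHONE = re.compile(r"^([()\- x+]*\d[()\- x+]*){4,16}$")


def has_4_to_16_digits(text: str) -> bool:
    """Return True if text holds 4-16 digits and only allowed separators."""
    return PHONE.match(text) is not None


if __name__ == "__main__":
    samples = [
        "555-123-4567",
        "1-234",
        "+44(55)123-3333",
        "abcd1",
        "1-()-4++++-()-6",
    ]
    for sample in samples:
        print(sample, has_4_to_16_digits(sample))
```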
This:
/^[(]{0,1}[0-9]{3}[)]{0,1}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/
Works with these formats:
123-456-7890
(123) 456-7890
1234567890
123.456.7890
TL;DR
The OP has already accepted a regex solution. Below I present an alternative way of looking at the problem. Hopefully it helps the OP, but it's really aimed more at future visitors and followers of regex.
Don't Validate Logic with Regexps
Regular expressions work best for matching or extracting patterns, rather than for complex data validation. For example, the OP gives the following rules:
Trying to put together regex that can match minimum 4 digits, maximum 16 digits, and those digits can be separated by characters: ()- x+ (but should not be part of the min/max count).
but then says that 1-()-4++++4-()-66 should be false. However, it meets the rules for truth as originally defined by the OP. (NB: This example was later changed in the OP's question, but the point I'm making remains valid.)
Example: Using Code to Simplify the Regex Pattern Match
Logic should be encapsulated in short, testable pieces of code, not in complex regular expressions. For example, consider the following Ruby code:
numbers = [
  '555-123-4567',
  '1-234',
  '+44(55)123-3333',
  'abcd1',
  '1-()-4++++4-()-66'
]
numbers.map { |num| num.delete '- x+()' }.grep /\A\d{4,16}\z/
#=> ["5551234567", "1234", "44551233333", "14466"]
Even if you aren't a Rubyist, the code should be easy to follow. This code strips out the characters that are irrelevant to our match, then checks that each string contains nothing but 4-16 digits anchored to the beginning and end of the string. Instead of validating a complex pattern, you're now just validating a simple pattern (e.g. all numbers) with a well-defined interval from 4 to 16. Furthermore, you can break this kind of logic up into smaller steps rather than simply calling long method chains, making this inherently more testable.
Example: Avoiding Regexp Validation Altogether
You could even go further by avoiding the regex for any sort of validation, and making your Boolean expressions more explicit. Consider the following:
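numbers = [
  '555-123-4567',
  '1-234',
  '+44(55)123-3333',
  'abcd1',
  '1-()-4++++4-()-66'
]
numbers.each do |num|
  digits = num.scan(/\d/)
  # Use && here: `and` binds more loosely than `=`, so with `and`
  # only the lower bound would be assigned to `valid`.
  valid = digits.count >= 4 && digits.count <= 16
  puts "#{num}: #{valid}"
end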
This will print:
555-123-4567: true
1-234: true
+44(55)123-3333: true
abcd1: false
1-()-4++++4-()-66: true
To me, this seems like a much more robust and flexible way of solving the "phone number validation" question, which gets asked here on Stack Overflow in one form or another with amazing regularity. Your mileage may vary.

Regular Expression for a 0.25 interval

My aim is to write a regular expression for a decimal number where a valid number is one of
xx.0, xx.125, xx.25, xx.375, xx.5, xx.625, xx.75, xx.875 (i.e. measured in 1/8ths) The xx can be 0, 1 or 2 digits.
I have come up with the following regex:
^\d*\.?((25)|(50)|(5)|(75)|(0)|(00))?$
While this works for 0.25, 0.5 and 0.75, it won't work for 0.225, 0.675, etc.
I assumed that the '?' would work in a case where there is a preceding number as well.
Can someone point out my mistake?
Edit: the number is required to be a decimal!
Edit 2: I realized my mistake; I was confused about the '?'. Thank you.
I would add another \d* after the literal . check \.
^\d*\.?\d*((25)|(50)|(5)|(75)|(0)|(00))?$
I think it would probably just be easier to multiply the decimal part by 8, but you don't consider digits that lead the last two decimals in the regex.
^\d{0,2}\.(00?|(1|6)?25|(3|8)?75|50?)$
Your mistake is: \.? indicates one optional \., not a digit (or anything else, in this case).
About the ? (question mark) operator: Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. (source)
^\d{0,2}\.(0|(1|2|6)?25|(3|6|8)?75|5)$
Regular expressions are for matching patterns, not checking numeric values. Find a likely string with the regex, then check its numeric value in whatever your host language is (PHP, whatever).
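A minimal Python sketch of that two-step approach, assuming the rules from the question (zero to two digits before a mandatory decimal point, value measured in eighths). The regex only checks the shape; the arithmetic does the real validation. The names `CANDIDATE` and `is_eighth` are invented here, and `Decimal` is used rather than `float` to keep the arithmetic exact.

```python
import re
from decimal import Decimal

# Shape only: up to two integer digits, a literal dot, some decimals.
CANDIDATE = re.compile(r"\d{0,2}\.\d+", re.ASCII)


def is_eighth(text: str) -> bool:
    """Return True if text is a decimal that is a whole number of eighths."""
    if CANDIDATE.fullmatch(text) is None:
        return False
    # A multiple of 1/8 becomes an integer when multiplied by 8.
    return (Decimal(text) * 8) % 1 == 0


if __name__ == "__main__":
    for sample in ["12.125", "0.25", ".5", "0.225", "123.5", "5"]:
        print(sample, is_eighth(sample))
```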

Create shortest possible regex

I want to create a regex that will match any of these values
7-5
6-6 ((0-99) - (0-99))
6-4
6-3
6-2
6-1
6-0
0-6
1-6
2-6
3-6
4-6
the 6-6 example is a special case, here are some examples of values:
6-6 (23-8)
6-6 (4-25)
6-6 (56-34)
Is it possible to make one regex that can do this?
If so, is it possible to further extend that regex for the 6-6 special case such that the difference between the two numbers within the parentheses is equal to 2 or -2?
I could easily write this with procedural code, but I'm really curious whether someone can devise a regex for this.
Lastly, if it could be further extended such that the individual digits were in their own match groups, I'd be amazed. For example, for 7-5 I could have a match group that just had the value 7, and another that had the value 5. However, for 6-6 (24-26) I'd like a match group that had the first 6, a match group for the second 6, a match group for the 24 and a match group for the 26.
This may be impossible, but some of you can probably get this part of the way there.
Good luck, and thanks for the help.
NO. The answer is "We can't," and the reason is because you're trying to use a hammer to dig a hole.
The problem with writing one long "clever" (this word causes a knee-jerk reaction in many people who are far more anti-regex than I) regex is that, six months from now, you'll have forgotten those clever regex features that you used so heavily, and you'll have written six months worth of code related to something else, and you'll get back to your impressive regex and have to tweak one detail, and you'll say, "WTF?"
This is what (I understand) you want, in Perl:
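# data is in $_
if (/7-5|6-[0-4]|[0-4]-6|6-6 \((\d{1,2})-(\d{1,2})\)/) {
    # $1 and $2 are only set when the 6-6 branch matched; test with
    # defined() so that a score of 0 is not treated as false
    if (defined $1 and defined $2 and abs($1 - $2) == 2) {
        # we have the right difference
    }
}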
Some might say that the given regex is a bit much, but I don't think it's too bad. If the \d{1,2} bit is a little too obscure you could use \d\d? (which is what I used at first, but didn't like the repetition).
You can do it like this:
7-5|6-[0-4]|[0-4]-6|6-6 \(\d\d?-\d\d?\)
Just add parens to get your match groups.
Off the top of my head (there may be some errors but the principle should be good):
6-6 \(\d+-\d+\)|\d-\d
And like with any regexp, you can surround what you want to extract with parentheses for match groups:
(6)-(6) \((\d+)-(\d+)\)|(\d)-(\d)
In the 6-6 case, the first two parentheses should get the sixes, and the second two should get the multi-digit values that come afterwards.
Here is one that will match only the numbers you want and let you get each digit by name:
p = r'(?P<a>[0-4]|6|7)-(?P<b>[0-4]|6|5) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?'
To get each digit you could use:
values = re.search(p, string).group('a', 'b', 'c', 'd')
Which will return a four-element tuple with the values you are looking for; groups that did not take part in the match come back as None. (If nothing matches at all, re.search itself returns None, so check for that before calling .group.)
One problem with this pattern is that it will match the stuff in the parentheses whether or not there was a match to '6-6'. This one will only match the final parenthesis if 6-6 is matched:
p = r'(?P<a>[0-4]|(?P<tmp_a>6)|7)-(?P<b>(?(tmp_a)(?P<tmp_b>6)|([0-4]|5)))(?(tmp_b) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?)'
I don't know of any way to look for a difference between the numbers in the parenthesis; regex only knows about strings, not numerical values . . .
(I am assuming python syntax here; the perl syntax is slightly different, though perl supports the python way of doing things.)
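Since the difference between the two numbers cannot be checked by the regex itself, one option is to let a loose regex capture the digits and do the rest in code. The following Python sketch takes that route; it is not any of the answers' patterns, and `parse_score` and its `require_margin_of_two` parameter are names invented for this example.

```python
import re

# Loose shape: "d-d", optionally followed by " (n-n)" with 1-2 digit numbers.
SCORE = re.compile(r"(\d)-(\d)(?: \((\d{1,2})-(\d{1,2})\))?", re.ASCII)

# The scores from the question that carry no parenthesised part.
REGULAR_SETS = {(7, 5)} | {(6, n) for n in range(5)} | {(n, 6) for n in range(5)}


def parse_score(text, require_margin_of_two=False):
    """Return (a, b, c, d) with c and d None for regular sets, or None if invalid."""
    match = SCORE.fullmatch(text)
    if match is None:
        return None
    a, b = int(match.group(1)), int(match.group(2))
    has_extra = match.group(3) is not None
    if (a, b) == (6, 6):
        if not has_extra:
            return None  # 6-6 must carry the parenthesised score
        c, d = int(match.group(3)), int(match.group(4))
        if require_margin_of_two and abs(c - d) != 2:
            return None
        return (a, b, c, d)
    if has_extra or (a, b) not in REGULAR_SETS:
        return None
    return (a, b, None, None)
```

For example, `parse_score("6-6 (24-26)")` gives `(6, 6, 24, 26)`, while `parse_score("5-6")` gives `None`.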