VIN Validation RegEx - regex

I have written a VIN validation regex based on http://en.wikipedia.org/wiki/Vehicle_identification_number, but when I run some tests it rejects some valid VINs.
My RegEx:
^[A-HJ-NPR-Za-hj-npr-z\\d]{8}[\\dX][A-HJ-NPR-Za-hj-npr-z\\d]{2}\\d{6}$
VIN Number Not Working:
1ftfw1et4bfc45903
WP0ZZZ99ZTS392124
VIN Numbers Working:
19uya31581l000000
1hwa31aa5ae006086
(I think the problem occurs with the numbers at the end, Wikipedia made it sound like it would end with only 6 numbers and the one that is not working but is a valid number only ends with 5)
Any Help Correcting this issue would be greatly appreciated!

I can't help you with a perfect regex for VIN numbers -- but I can explain why this one is failing in your example of 1ftfw1et4bfc45903:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}[\dX][A-HJ-NPR-Za-hj-npr-z\d]{2}\d{6}$
Explanation:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}
This allows for 8 characters, composed of any digits and any letters except I, O, and Q; it properly finds the first 8 characters:
1ftfw1et
[\dX]
This allows for 1 character, either a digit or a capital X; it properly finds the next character:
4
[A-HJ-NPR-Za-hj-npr-z\d]{2}
This allows for 2 characters, composed of any digits and any letters except I, O, and Q; it properly finds the next 2 characters:
bf
\d{6}$
This allows for exactly 6 digits, and is the reason the regex fails; because the final 6 characters are not all digits:
c45903
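A quick way to confirm this is to run the pattern against the four VINs from the question. This is only a minimal sketch in Python (my choice, since the question does not name a language); the helper name accepted is made up for illustration.

```python
import re

# The pattern from the question, written as a Python raw string.
ORIGINAL = re.compile(
    r"^[A-HJ-NPR-Za-hj-npr-z\d]{8}[\dX][A-HJ-NPR-Za-hj-npr-z\d]{2}\d{6}$"
)


def accepted(vin: str) -> bool:
    """Return True when the original pattern matches the VIN."""
    return ORIGINAL.match(vin) is not None


for vin in ("1ftfw1et4bfc45903", "WP0ZZZ99ZTS392124",
            "19uya31581l000000", "1hwa31aa5ae006086"):
    print(vin, accepted(vin))
```

The first VIN is rejected because of the trailing c45903. The second is rejected for a different reason: its 9th character is Z, which [\dX] does not allow.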

Dan is correct: VINs have a checksum. A regex can't verify it, so the best a regex can do is cast too wide a net. By that I mean that your regex will accept all valid VINs, and also a vast number of non-VIN 17-character strings.
If you are working in a language with named capture groups, you can extract that data as well.
So, if your goal is:
Only to not reject valid VINs (letting in invalid ones is ok) then use Fransisco's answer, [A-HJ-NPR-Z0-9]{17}.
Not reject valid VINs, and grab info like model year, plant code, etc, then use this (note, you must use a language that supports named capture groups; off the top of my head: Perl, Python (which spells them (?P<name>...)), Elixir, JavaScript since ES2018, and almost certainly others): /^(?<wmi>[A-HJ-NPR-Z\d]{3})(?<vds>[A-HJ-NPR-Z\d]{5})(?<check>[\dX])(?<vis>(?<year>[A-HJ-NPR-Z\d])(?<plant>[A-HJ-NPR-Z\d])(?<seq>[A-HJ-NPR-Z\d]{6}))$/ where the names are defined at the end of this answer.
Not reject valid VINs, and prevent some but not all invalid VINs, you can get specific like Pedro does.
Only accept valid VINs: you need to write code (just kidding, GitHub exists).
Capture group name key:
wmi - World manufacturer identifier
vds - Vehicle descriptor section
check - Check digit
vis - Vehicle identifier section
year - Model year
plant - Plant code
seq - Production sequence number
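As an illustration of the named-group approach, here is a rough Python sketch. It assumes Python's (?P<name>...) spelling for named groups and adds a case-insensitive flag so the lowercase VINs from the question can be used; the function name split_vin is my own.

```python
import re

# Same structure as the pattern in this answer, using Python's named-group syntax.
VIN_PARTS = re.compile(
    r"^(?P<wmi>[A-HJ-NPR-Z\d]{3})"
    r"(?P<vds>[A-HJ-NPR-Z\d]{5})"
    r"(?P<check>[\dX])"
    r"(?P<vis>(?P<year>[A-HJ-NPR-Z\d])"
    r"(?P<plant>[A-HJ-NPR-Z\d])"
    r"(?P<seq>[A-HJ-NPR-Z\d]{6}))$",
    re.IGNORECASE,
)


def split_vin(vin: str):
    """Return a dict of the named sections, or None if the pattern does not match."""
    match = VIN_PARTS.match(vin)
    return match.groupdict() if match else None


print(split_vin("1ftfw1et4bfc45903"))
```

Note that this pattern still requires a digit or X in the 9th position, so a VIN such as WP0ZZZ99ZTS392124 (with Z there) does not match it.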

This regular expression is working fine for validating US VINs, including the one you described:
[A-HJ-NPR-Z0-9]{17}
Remember to make it case insensitive with flag i
Source: https://github.com/rfink/angular-vin

VIN should have only A-Z, 0-9 characters, but not I, O, or Q
Last 6 characters of VIN should be a number
VIN should be 17 characters long
You didn't specify which language you're using but the following regex can be used to validate a US VIN with php:
/^(?:([A-HJ-NPR-Z]){3}|\d{3})(?1){2}\d{2}(?:(?1)|\d)(?:\d|X)(?:(?1)+\d+|\d+(?1)+)\d{6}$/i

I feel regex is not the ideal validation. VINs have a built in check digit. https://en.wikibooks.org/wiki/Vehicle_Identification_Numbers_(VIN_codes)/Check_digit or http://www.vsource.org/VFR-RVF_files/BVINcalc.htm
I suggest you build an algorithm using this. (Untested algorithm example)
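As a rough illustration of the check-digit approach, here is a Python sketch of the calculation as I understand it from the pages linked above: each character is transliterated to a number, multiplied by a weight for its position, and the sum is taken modulo 11, with a remainder of 10 written as X. The function names are my own, and the tables should be checked against the linked references before relying on them. It only applies to VINs that actually use the 9th position as a check digit.

```python
# Transliteration values for letters (I, O and Q are not valid VIN characters).
LETTER_VALUES = dict(zip("ABCDEFGH", range(1, 9)))
LETTER_VALUES.update(zip("JKLMN", range(1, 6)))
LETTER_VALUES.update({"P": 7, "R": 9})
LETTER_VALUES.update(zip("STUVWXYZ", range(2, 10)))

# One weight per position; the check-digit position itself (9th) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def check_digit(vin: str) -> str:
    """Compute the expected 9th character for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN must be 17 characters long")
    total = 0
    for char, weight in zip(vin, WEIGHTS):
        # Digits count as themselves; letters use the table (KeyError if invalid).
        value = int(char) if char in "0123456789" else LETTER_VALUES[char]
        total += value * weight
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """Return True when the 9th character matches the computed check digit."""
    try:
        expected = check_digit(vin)
    except (KeyError, ValueError):
        return False
    return vin.upper()[8] == expected


print(has_valid_check_digit("1M8GDM9AXKP042788"))
```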

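This should work. It comes from a Splunk search, so there are some additional exclusions. Note that a ^ only negates a character class when it is the first character, so I, O, and Q are excluded here by listing the allowed ranges instead:
(?i)(?<VIN>[A-HJ-NPR-Z0-9]{11}\d{6})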

The NHTSA website provides the method used to calculate the 9th character checksum, if you're interested. It also provides lots of other useful data, such as which characters are allowed in which position, or how to determine whether the 10th character, if alphabetic, refers to a model year up to 1999 or a model year from 2010.
NHTSA VIN eCFR
Hope that helps.

Please, use this regular expression. It is shorter and works with all VIN types
(?=.*\d|[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
I replaced the formula above with the new one below:
(?=.*\d|=.*[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
This regular expression accepts any letter but requires at least one digit, and matches exactly 17 characters.

Related

Regex expression for date within dates range

I need to validate with regex a date in format yyyy-mm-dd (2019-12-31) that should be within the range 2019-12-20 - 2020-01-10.
What would be the regex for this?
Thanks
Regexes only deal with characters, so we have to work out which characters are valid at each position in the date.
The first part is easy: the first two characters have to be 20.
Now it gets complicated. The next character can be a 1 or a 2, but what follows depends on that character, so we split the rest of the regex into two branches: one for when the third character is 1 and one for when it is 2.
If the third character is a 1, what follows must be the characters 9-12-, as the range starts at 2019-12-20. Now for the day part: the 9th character is the tens digit of the day, which can only be 2 or 3, as we are already in the last month and the minimum date is the 20th. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, we have a pattern for dates in 2019 (after the leading 20) of 19-12-[23][0-9].
If the third character is a 2, we can again match up to the day part of the date, as the range ends in January. This gives us a partial match of 20-01-, leaving the day part. Here we know that the first character of the day can be either a 0 or a 1; if it's a 1, the last character must be a 0, and if it's a 0, the last character can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together, we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming that the date you are testing against is a validly formatted date.
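To check the final pattern, here is a small Python sketch (the helper name matches_range is hypothetical). It uses fullmatch so that nothing may precede or follow the date.

```python
import re

# The final regex from this answer.
IN_RANGE = re.compile(r"20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))")


def matches_range(text: str) -> bool:
    """Return True when the whole string matches the range pattern."""
    return IN_RANGE.fullmatch(text) is not None


for candidate in ("2019-12-19", "2019-12-20", "2019-12-31",
                  "2020-01-10", "2020-01-11"):
    print(candidate, matches_range(candidate))
```

Because the pattern relies on the input already being a real date, a string such as 2019-12-32 also matches.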
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that this allows any dd value whose first digit is between zero and three and whose second digit is between zero and nine (or a single digit). Instead, you could list every day value you want to allow (20 to 31 and 1 to 10), like this: (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly, regular expressions are not the right tool for this. A regex describes the shape of a string, not the logical relationships between its parts. Use a regex to extract the values from the string, and validate those values in the surrounding language.
The second regex above will match your dates, for example, but also values outside the range, since nothing ties the year group (2019|2020) to the month group (12|01) or to the day group: it matches 2019-12-25, but also 2019-12-05 and 2020-12-25.
To only match the values you want this will be a really large regex like this (inner brackets only if you need them) ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...) and continue with all possible dates - and ask yourself: what would you do if you find such a regex in a project you have to work with ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This way you have three named groups (yyyy, mm, dd) you can access and validate the matched values... The regex is smaller, you have a better association between code and regex and both are easier to maintain.
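A minimal sketch of that extract-then-validate idea in Python might look like the following. It assumes Python's (?P<name>...) spelling for the named groups and uses the standard datetime module for the actual range check; the function name date_in_range is my own.

```python
import re
from datetime import date

# Same shape as the regex above, with Python-style named groups.
DATE_RE = re.compile(
    r"(?P<yyyy>20[0-9]{2})-(?P<mm>[01][0-9]|[0-9])-(?P<dd>[0-3][0-9]|[0-9])"
)
START = date(2019, 12, 20)
END = date(2020, 1, 10)


def date_in_range(text: str) -> bool:
    """Extract year, month and day with the regex, then validate them in code."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return False
    try:
        value = date(int(match["yyyy"]), int(match["mm"]), int(match["dd"]))
    except ValueError:
        # The shape was right but it is not a real calendar date, e.g. day 32.
        return False
    return START <= value <= END


print(date_in_range("2019-12-31"), date_in_range("2020-12-11"))
```

The regex only checks the shape; the range boundaries live in START and END, so changing the range does not mean rewriting the pattern.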

Regex for validation of a street number

I'm using an online tool to create contests. In order to send prizes, there's a form in there asking for user information (first name, last name, address,... etc).
There's an option to use regular expressions to validate the data entered in this form.
I'm struggling with the regular expression to put for the street number (I'm located in Belgium).
A street number can be the following:
1234
1234a
1234a12
Begins with a number (max 4 digits)
Can have letters after that (max 2 characters)
Can have numbers after the letter(s) (max 3 digits)
I came up with the following expression:
^([0-9]{1,4})([A-Za-z]{1,2})?([0-9]{1,3})?$
But the problem is that, as the letters and the second group of numbers are both optional, it allows numbers with up to 7 digits, which is not optimal.
1234 (first group) (no letters in the second group) 567 (third group)
If one of you can tip me on how to achieve the expected result, it would be greatly appreciated !
You might use this regex:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
where:
\d{1,4} - 1-4 digits
([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|) - optional group, which can be
[a-zA-Z]{1,2}\d{1,3} - 1-2 letters + 1-3 digits
or
[a-zA-Z]{1,2} - 1-2 letters
or
empty
\d{0,4}[a-zA-Z]{0,2}\d{0,3}
\d{0,4} - the first group matches a number of up to 4 digits
[a-zA-Z]{0,2} - the second group matches up to 2 letters
\d{0,3} - the third group matches a number of up to 3 digits
You have to keep the last two groups together, so that the last one cannot be present if the second isn't, e.g.
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
or a little less optimized (but showing the approach a bit better)
^\d{1,4}(?:[a-zA-Z]{1,2}(?:\d{1,3})?)?$
As you are using this for a validation I assumed that you don't need the capturing groups and replaced them with non-capturing ones.
You might want to change the first number check to [1-9]\d{0,3} to disallow leading zeros.
Thank you so much for your answers! I tried Sebastian's solution:
^\d{1,4}(?:[a-zA-Z]{1,2}\d{0,3})?$
And it works like a charm! I still don't really understand what the ":" stands for, but I'll try to figure it out next time I have to fiddle with regex!
Have a nice day,
Stan
The first digit cannot be 0.
There shouldn't be other symbols before and after the number.
So:
^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$
The ?: combination makes the () group non-capturing: it still groups the pattern, but does not store the matched substring.
Here is the regex with tests for it.
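For readers without access to an online tester, a small Python sketch along these lines can exercise the pattern. The helper name is_street_number is hypothetical, and fullmatch is used so that a trailing newline is not silently accepted.

```python
import re

# 1-4 digits with no leading zero, optionally followed by 1-2 letters
# and then up to 3 more digits.
STREET_NUMBER = re.compile(r"^[1-9]\d{0,3}(?:[a-zA-Z]{1,2}\d{0,3})?$")


def is_street_number(text: str) -> bool:
    """Return True when the whole string is a valid street number."""
    return STREET_NUMBER.fullmatch(text) is not None


for candidate in ("1234", "1234a", "1234a12", "12345678", "0123", "12abc"):
    print(candidate, is_street_number(candidate))
```

Digits can only follow the first number if at least one letter separates them, which is what rules out inputs such as 12345678.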

How can I use REGEX to test for currency formats

How can you create a regular expression that checks if a user input matches characters formally found in a currency syntax? (number, period/decimal place, comma, or dollar sign?).
The following can find all characters listed above except for the dollar sign, any idea how to properly structure this?
/([0-9.,])/g
The regex I use for currency validation is as follows:
^(\$)?([1-9]{1}[0-9]{0,2})(\,\d{3})*(\.\d{2})?$|^(\$)?([1-9]{1}[0-9]{0,2})(\d{3})*(\.\d{2})?$|^(0)?(\.\d{2})?$|^(\$0)?(\.\d{2})?$|^$
RegExr is a great website for testing and reviewing these strings (perhaps you could make a regex string that's less of a beast!)
Are you just trying to test the characters? In that case
[0-9,.$]+
will suffice. Or are you testing for the format $1,123,123.12 with the correct placements of commas and everything?
In that case, something more like
(\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?)
should do.
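Here is a quick Python sketch of the stricter format check, assuming the period before the cents is escaped as \. so that only a literal period is accepted there. The helper name is_dollar_amount is my own.

```python
import re

# Optional $, 1-3 leading digits, comma-separated groups of 3, optional cents.
DOLLARS = re.compile(r"\$?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


def is_dollar_amount(text: str) -> bool:
    """Return True when the whole string looks like $1,123,123.12."""
    return DOLLARS.fullmatch(text) is not None


for candidate in ("$1,123,123.12", "1,123", "$12.5", "$1234", "1,12"):
    print(candidate, is_dollar_amount(candidate))
```

Note that this form requires the thousands separators, so an ungrouped amount such as $1234 is rejected.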
You need to define what you want your regex to match, more formally than "matches characters formally found in a currency syntax". We don't know which currencies you're interested in. We don't know how strict you need it to be.
Maybe you'll come up with something like:
These elements must come in this order:
A currency symbol ('£', '€' or '$') (your requirement might specify more currencies)
1 or more numeric digits
A period or a comma
Exactly two numeric digits
Once you have a specification like that, it's easy to translate into a regular expression:
[£€$] // one of these chars.
\d+ // '+' means 'one or more'
[.,] // '[]' means 'any one of these'.
\d\d // Two digits. Could also be written as '\d{2}'
Or concatenated together:
[£€$]\d+[.,]\d\d
If you've learned about escaping special characters like $ and ., you may be surprised not to see it done here. Within [], they lose their special meaning.
(There are dialects of regex -- check the documentation for whatever implementation you're using)
Your requirements may be different though. The example I've given doesn't match:
$ 12.00
$12
USD12
¥200.00
25¢
$0.00005
20 μBTC
44 dollars
£1/19/11¾d ("one pound, nineteen shillings and elevenpence three farthings")
Work out your requirement, then write your code to meet it.
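To make the gap between specification and real input concrete, here is a small Python sketch that runs the example pattern against a few of the strings listed above. The helper name matches_spec is hypothetical.

```python
import re

# Currency symbol, one or more digits, a period or comma, exactly two digits.
PRICE = re.compile(r"[£€$]\d+[.,]\d\d")


def matches_spec(text: str) -> bool:
    """Return True when the whole string satisfies the example specification."""
    return PRICE.fullmatch(text) is not None


for candidate in ("$12.00", "€12,50", "$ 12.00", "$12", "USD12", "¥200.00"):
    print(candidate, matches_spec(candidate))
```

Only the first two candidates satisfy the specification; each of the others fails on exactly one of the four listed elements, which is a useful way to decide whether the specification needs widening.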
You should put a \ before special characters, and add a star (0 or more) or plus (1 or more) quantifier to match the full run of currency characters, for example:
/([0-9\.,]*)/g
or, for a real price such as 200,00, where there are always 2 digits after the comma:
/(([0-9]+)(\.|,)([0-9]){2})/g

Regular expression for secret code

I've created one text field which accepts the product code.
I have tried many ways and got disappointed.
The product code has the following validation rules:
Product code: 315299AZ
1. The first 2 digits range from 01 to 31 (00 is not allowed).
2. The second 2 digits range from 01 to 52 (00 is not allowed).
3. The third 2 digits range from 00 to 99.
4. The last 2 characters are optional, but must be letters, not digits.
Could someone please help me with this?
You can use the following regex :
(?!00)(([0-2][0-9])|31|30)(?!00)(([0-4][0-9])|51|50|52)(\d{2})([a-zA-Z]{2})?
(?!00) is a negative look-ahead that disallows 00.
Debuggex Demo
There you go:
((0[1-9])|([1-2]\d)|(3[0-1]))((0[1-9])|([1-4]\d)|(5[0-2]))\d{2}([a-zA-Z]{2})?
If you don't like look-aheads.
I know it's not the spirit, but any sensible language supporting regular expressions should allow you to access groups, hence do something along these lines (pseudocode follows):
if product_code matches /^(\d\d)(\d\d)\d\d([a-zA-Z]{2})?$/ {
assert 1 <= int($1) <= 31 // validate first group
assert 1 <= int($2) <= 52 // validate second group
}
Bonus: you can actually read it.
(This is assuming the last optional group contains either two or zero characters. If one character is acceptable, you can replace it with [a-zA-Z]{0,2})
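A runnable version of that pseudocode might look like the following Python sketch. The function name is_valid_product_code is my own, and it keeps the assumption that the optional suffix is exactly two letters or absent.

```python
import re

# Three pairs of digits, then an optional pair of letters.
CODE_RE = re.compile(r"([0-9]{2})([0-9]{2})[0-9]{2}([a-zA-Z]{2})?")


def is_valid_product_code(code: str) -> bool:
    """Check the shape with a regex, then the numeric ranges in plain code."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        return False
    first = int(match.group(1))   # must be 01-31
    second = int(match.group(2))  # must be 01-52
    return 1 <= first <= 31 and 1 <= second <= 52


for candidate in ("315299AZ", "315299", "005299AZ", "315399", "315299A"):
    print(candidate, is_valid_product_code(candidate))
```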

Custom RegEx expression for validating different possibilities of phone number entries?

I'm looking for a custom regex (that works!) that will validate common phone number entries with area code (no country code) such as:
111-111-1111
(111) 111-1111
(111)111-1111
111 111 1111
111.111.1111
1111111111
And combinations of these / anything else I may have forgotten.
Also, is it possible to have the RegEx expression itself reformat the entry? So take the 1111111111 and put it in 111-111-1111 format. The regex will most likely be entered in a Joomla / some type of CMS module, so I can't really add code to it aside from the expression itself.
\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})
will match all your examples; after a match, backreference 1 will contain the area code, and backreferences 2 and 3 will contain the rest of the phone number.
I hope you don't need to handle international phone numbers, too.
If the phone number is in a string by itself, you could also use
^\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*$
allowing for leading/trailing whitespace and nothing else.
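On the reformatting question: a pattern by itself only matches, so producing 111-111-1111 needs a substitution or a bit of code built on the captured groups. As a sketch of what that could look like where code is available (Python here, with a made-up helper name normalize_phone):

```python
import re

# The whole-string variant from this answer; the three groups hold the digits.
PHONE_RE = re.compile(r"\s*\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})\s*")


def normalize_phone(text: str):
    """Return the number as 111-111-1111, or None if it does not match."""
    match = PHONE_RE.fullmatch(text)
    if match is None:
        return None
    return "-".join(match.groups())


for entry in ("111-111-1111", "(111) 111-1111", "(111)111-1111",
              "111 111 1111", "111.111.1111", "1111111111"):
    print(entry, "->", normalize_phone(entry))
```

In a tool that only accepts a search pattern plus a replacement string, the equivalent replacement would typically reference the three groups (for example $1-$2-$3 or \1-\2-\3, depending on the tool).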
Why not just remove spaces, parentheses, dashes, and periods, then check that what remains is a 10-digit number?
Depending on the language in question, you might be better off using a replace-like statement to replace non-numeric characters: ()-/. with nothing, and then just check if what is left is a 10-digit number.
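A minimal Python sketch of that strip-then-check idea, assuming the separators to remove are whitespace plus the characters ()-/. and using a hypothetical helper name:

```python
import re


def is_ten_digit_phone(text: str) -> bool:
    """Remove common separators, then require exactly 10 digits."""
    digits = re.sub(r"[()\-/.\s]", "", text)
    return re.fullmatch(r"[0-9]{10}", digits) is not None


for entry in ("(111) 111-1111", "111.111.1111", "111-111-111", "111-111-1111x"):
    print(entry, is_ten_digit_phone(entry))
```

This accepts odd separator placements such as 1-1-1-1111111 as well, which may or may not be acceptable depending on how strict the form needs to be.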