Simple regex for EU VAT-numbers

I need a regex for validating EU VAT numbers. There are some out there, but they are all specific to each member state, and I do not need it to be that specific. Something that requires the user to enter a certain number of characters, with the first ones required to be letters and the rest digits (with a few letters allowed), is good enough.
So essentially I need to match the following:
2-4 first characters must be letters
The rest can either be digits only, or contain max 2 letters among the digits
Ignore hyphens (some member states use them)
Ignore spaces and underscores (because users)
So far I have the following, which kind of does the job, but unfortunately also matches input consisting only of letters (ABCDEFGHIJKLMNOP):
([A-Za-z]{2,4})([a-zA-Z0-9\-\_ ]{2,12})
Here you can see the format of all the VAT numbers.
https://www.gov.uk/guidance/vat-eu-country-codes-vat-numbers-and-vat-in-other-languages

You may use
^[A-Za-z]{2,4}(?=.{2,12}$)[-_\s0-9]*(?:[a-zA-Z][-_\s0-9]*){0,2}$
See the regex demo
Details
^ - start of string
[A-Za-z]{2,4} - 2 to 4 ASCII letters
(?=.{2,12}$) - then, there must be 2 to 12 chars up to the end of the string (it does not matter much what chars, we are just checking the length of the rest of the string here)
[-_\s0-9]* - zero or more digits, -, _ or whitespace
(?:[a-zA-Z][-_\s0-9]*){0,2} - 0 to 2 consecutive sequences of:
[a-zA-Z] - an ASCII letter
[-_\s0-9]* - zero or more digits, -, _ or whitespace
$ - end of string.
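One way to try the pattern is Python's re module. The sketch below reuses the answer's regex unchanged, assumes ASCII input, and uses sample values chosen only for illustration. Note that hyphens, underscores and spaces count toward the 2 to 12 character length checked by the lookahead.

```python
import re

# Pattern from the answer above, used unchanged.
VAT_RE = re.compile(r"^[A-Za-z]{2,4}(?=.{2,12}$)[-_\s0-9]*(?:[a-zA-Z][-_\s0-9]*){0,2}$")

def looks_like_vat(value: str) -> bool:
    """Loose shape check only: no country-specific rules or checksums."""
    return VAT_RE.fullmatch(value) is not None

# Illustrative inputs: the last two should be rejected.
samples = ["DE123456789", "NL123456789B01", "ATU12345678", "GB 123 4567 89",
           "ABCDEFGHIJKLMNOP", "DE12A3B4C5"]
for s in samples:
    print(f"{s!r}: {looks_like_vat(s)}")
```

"ABCDEFGHIJKLMNOP" fails because at most two letters may follow the 2 to 4 letter prefix, and "DE12A3B4C5" fails because it has three letters after the prefix.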

There is a Python module for verifying VAT numbers; internally it uses a series of regexes. I have been using it personally and it is very accurate. You may want to check it out: https://pypi.org/project/vatnumber/

Related

Regex to match code with fixed country code and variable wildcard usage

I need to implement a regex which covers several requirements. These are the following:
A length restriction to a maximum of 8 chars (with or without wildcard). In any case, the code is never longer than 8 chars.
When a wildcard is given, fewer than 8 chars are also allowed. Without a wildcard, exactly 8 chars are needed.
Allowed characters are: 0-9A-Za-z* (all digits, all letters, asterisk as wildcard).
A pure wildcard must be possible.
Otherwise, the first two chars must be a 2-char country code (alphanumeric), followed only by digits or wildcards.
After the country code, a wildcard can be used at any position (in the middle, at the end; multiple consecutive asterisks/wildcards are also allowed).
I have tried many things so far and thought about lookahead/lookbehind because of the asterisk and the maximum length.
My latest state which covers the most of the requirements is the following:
^([A-Za-z]{2}[0-9*]{0,6}|\*)$
check the live demo with right/wrong combo
But in this example, a code without an asterisk/wildcard is possible with fewer than 8 chars, which is wrong.
Thanks a lot for any help in advance :)
You can use
^(?!.*\*\*$)(?!.{9})(?:[A-Za-z]{2}(?:\d*(?:\*\d*)+|\d{6})|\*)$
See the regex demo.
Details:
^ - start of string
(?!.*\*\*$) - no two ** at the end of string allowed
(?!.{9}) - the string must contain less than 9 chars other than line break chars
(?:[A-Za-z]{2}(?:\d*(?:\*\d*)+|\d{6})|\*) - one of the two alternatives:
[A-Za-z]{2}(?:\d*(?:\*\d*)+|\d{6}) - two letters and then either six digits or zero or more digits followed with one or more sequences of an asterisk and zero or more digits
| - or
\* - an asterisk
$ - end of string.
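The sketch below checks the answer's pattern against a few illustrative inputs with Python's re module; the inputs and the expected results are assumptions derived from the stated requirements and from the answer's own lookaheads (note that the answer deliberately rejects ** at the very end).

```python
import re

# Pattern from the answer above, used unchanged.
WILDCARD_CODE_RE = re.compile(r"^(?!.*\*\*$)(?!.{9})(?:[A-Za-z]{2}(?:\d*(?:\*\d*)+|\d{6})|\*)$")

# Illustrative inputs mapped to the result the pattern should give.
cases = {
    "AB123456": True,    # country code + 6 digits, no wildcard
    "AB12345": False,    # too short and no wildcard
    "AB12*": True,       # a wildcard allows a shorter code
    "AB1**2": True,      # consecutive wildcards in the middle
    "AB**": False,       # rejected only by (?!.*\*\*$)
    "*": True,           # pure wildcard
    "AB1234567": False,  # 9 chars, rejected by (?!.{9})
}

for code, expected in cases.items():
    actual = WILDCARD_CODE_RE.fullmatch(code) is not None
    print(f"{code!r}: matched={actual}, expected={expected}")
```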

Validating an obfuscation token

I am building a secure algorithm to get rid of obfuscation attacks. The user is validated with a token which should satisfy the following conditions:
The username consists of lowercase letters only and is at least 5 characters long.
The username is followed by #.
After the #, the first two characters are important: always a digit followed by a letter. This part contains at least one digit, one lowercase letter and one uppercase letter.
In between, there can be any number of digits or letters only.
At the end, the digit and letter must exactly match the digit and letter from point 3.
It should end with #.
The part between the two # must be at least 5 characters long.
The complete token consists only of two #, lowercase and uppercase letters, and digits.
I don't know much about regular expressions, but my guide told me this task is easily achieved at validation time with regular expressions. After searching the internet for a long time, I found some similar links, tried to combine them, and got this:
^[a-z]{5,}#[a-zA-Z0-9]{2}[A-Z][0-9A-Za-z]*[a-zA-Z0-9]{2}#$
But this only matches one test case. I don't know how to handle the part between the two hashes. I have tried to explain my problem as well as my English allows. Please help.
The following test cases should pass:
userabcd#4a39A234a#
randomuser#4A39a234A#
abcduser#2Aa39232A#
abcdxyz#1q39A231q#
randzzs#1aB1a#
The following test cases should fail:
randuser#1aaa1a#
randuser#1112#
randuser#a1a1##
randuser#1aa#
u#4a39a234a#
userstre#1qqeqe123231q$
user#1239a23$a#
useabcd#4a39a234a#12
You may try:
^[a-z]{5,}#(?=[^a-z\n]*[a-z])(?=[^A-Z\n]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$
Explanation of the above regex:
^, $ - Represents start and end of the line respectively.
[a-z]{5,} - Matches a lowercase username of 5 or more letters.
# - Matches # literally.
(?=[^a-z]*[a-z]) - Represents a positive lookahead asserting at least one lowercase letter.
(?=[^A-Z]*[A-Z]) - Represents a positive lookahead asserting at least one uppercase letter.
(\d[a-zA-Z]) - Represents a capturing group matching the first 2 characters, i.e. a digit followed by a letter. If you want the reverse order, use [a-zA-Z]\d.
[a-zA-Z\d]* - Matches zero or more of the characters in the character set.
\1 - Represents a back-reference that exactly matches the captured group.
You can find a demo of the above regex here.
Note: if you want to match one string at a time (i.e. for practical purposes), remove \n from the character sets.
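As a rough check, here is the answer's first pattern with \n removed from the lookahead classes (as the note suggests for one token at a time), run in Python against the test cases from the question. Python's re is an assumption here; any flavor with lookaheads and back-references should behave the same for these ASCII inputs.

```python
import re

# First pattern from the answer, with "\n" dropped from the lookahead classes.
TOKEN_RE = re.compile(r"^[a-z]{5,}#(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$")

# Test cases copied from the question.
should_pass = ["userabcd#4a39A234a#", "randomuser#4A39a234A#", "abcduser#2Aa39232A#",
               "abcdxyz#1q39A231q#", "randzzs#1aB1a#"]
should_fail = ["randuser#1aaa1a#", "randuser#1112#", "randuser#a1a1##", "randuser#1aa#",
               "u#4a39a234a#", "userstre#1qqeqe123231q$", "user#1239a23$a#", "useabcd#4a39a234a#12"]

for token in should_pass + should_fail:
    print(f"{token}: {TOKEN_RE.fullmatch(token) is not None}")
```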
You can use this regex as an alternative.
^[a-z]{5,}#(?=.*?[a-z])(?=.*?[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$
Recommended reading: Principle of contrast

Only allow 2 digits in a string using regex

I need regex that only allows a maximum of 2 digits (or whatever the desired limit is actually) to be entered into an input field.
The requirements for the field are as follows:
Allow a-z A-Z
Allow 0-9
Allow - and . characters
Allow spaces (\s)
Do not allow more than 2 digits
Do not allow any other special characters
I have managed to put together the following regex based on several answers on SO:
^(?:([a-zA-z\d\s\.\-])(?!([a-zA-Z]*\d.*){3}))*$
The above regex is really close. It works successfully for the following:
test 12 test
test12
test-test.12
But it allows an input of:
123 (but not 1234, so it's close).
It only needs to allow an input of 12 when only digits are entered into the field.
I would like some help in finding a more efficient and cleaner (if possible) solution than my current regex - but it must still be regex, no JS.
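You could use a positive lookahead like this (free-spacing mode, so the # comments are ignored; note that as written it requires exactly two digits, and \w also allows underscores):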
(?=^(?:\D*\d\D*){2}$) # only two digits
^[- .\w]+$ # allowed characters
See a demo on regex101.com.
You may use a negative lookahead anchored at the start that will make the match fail once there are 3 digits found anywhere in the string:
^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$
See the regex demo
Details:
^ - start of string
(?!(?:[^0-9]*[0-9]){3}) - the negative lookahead that fails the match if 3 consecutive occurrences of the following sequence are found (i.e. 3 or more digits anywhere in the string):
[^0-9]* - zero or more chars other than digits
[0-9] - a digit (thus, the digits do not have to be adjoining)
[a-zA-Z0-9\s.-]* - 0+ ASCII letters, digits, whitespace, . or - symbols
$ - end of string.
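A quick Python sketch of the negative-lookahead pattern, using the inputs from the question plus a few illustrative failures (the choice of Python's re is an assumption; the pattern itself is unchanged):

```python
import re

# Pattern from the answer above: fail as soon as 3 digits appear anywhere.
MAX_TWO_DIGITS_RE = re.compile(r"^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$")

allowed = ["test 12 test", "test12", "test-test.12", "12"]
rejected = ["123", "1a2b3", "test_12"]  # 3 digits, non-adjacent digits, disallowed underscore

for s in allowed + rejected:
    print(f"{s!r}: {MAX_TWO_DIGITS_RE.fullmatch(s) is not None}")
```

"1a2b3" shows that the digits do not need to be adjacent for the lookahead to reject the input.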

Need a Regex that contains at least one number, zero or more letters, no spaces, min/max

I need a regular expression that will match a string that contains:
at least one number
zero or more letters
no other characters such as spaces
The string must also be a minimum of 8 characters and a maximum of 13 characters.
Placement of the numbers and/or letters within the 8-13 character string does not matter. I haven't figured out how to make sure that the string contains a number, but here are some expressions that don't work because they are picking up spaces in the online tool Regexr. Take a look below:
- ([\w^/s]){8,13}
- ([a-zA-Z0-9]){8,13}
- ([a-zA-Z\d]){8,13}
I am specifically looking to exclude spaces and special characters. The linked and related questions all appear to allow for these characters. This is not for validating passwords, it is for detecting case numbers in natural language processing. This is different from "Password REGEX with min 6 chars, at least one letter and one number and may contain special characters" because I am looking for at least one number but zero or more letters. I also do not want to return strings that contain any special characters including spaces.
This is a typical password validation with your requirements.
Note that this will also match strings of 8-13 digits only (which the question allows).
Ten million + 1 (and counting) happy customers ..
^(?=.*\d)[a-zA-Z\d]{8,13}$
Explained
^ # Beginning of string
(?= .* \d ) # Lookahead for a digit
[a-zA-Z\d]{8,13} # Consume 8 to 13 alphanum characters
$ # End of string
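The one-line pattern and the commented "Explained" version above can both be compiled in Python, the latter with re.VERBOSE so that whitespace and # comments are ignored. The sample strings below are illustrative assumptions, not case numbers from the question.

```python
import re

# One-line pattern from the answer above.
CASE_NO_RE = re.compile(r"^(?=.*\d)[a-zA-Z\d]{8,13}$")

# The same pattern in free-spacing form, mirroring the "Explained" listing.
CASE_NO_VERBOSE_RE = re.compile(r"""
    ^                   # Beginning of string
    (?= .* \d )         # Lookahead for a digit
    [a-zA-Z\d]{8,13}    # Consume 8 to 13 alphanum characters
    $                   # End of string
""", re.VERBOSE)

samples = {
    "ab12cd34ef": True,       # letters and digits, 10 chars
    "12345678": True,         # digits only (allowed by the question)
    "abcdefgh": False,        # no digit, so the lookahead fails
    "ab12 cd34": False,       # contains a space
    "abc1234": False,         # only 7 chars
    "abcdefghijklm1": False,  # 14 chars
}

for s, expected in samples.items():
    one_line = CASE_NO_RE.fullmatch(s) is not None
    verbose = CASE_NO_VERBOSE_RE.fullmatch(s) is not None
    print(f"{s!r}: {one_line} / {verbose} (expected {expected})")
```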
I've seen the answer above (by sln) everywhere over the internet, but as far as I can tell, it is NOT ACCURATE.
If your string contains 8 to 13 characters with no numbers this expression will match it, because it uses the * quantifier on the wildcard character . in the positive lookahead.
In order to match at least 1 digit, 1 A-Z and 1 a-z in a password that's at least 8 characters long, you'll need something like this:
(?=.{1,7}\d)(?=.{1,7}[a-z])(?=.{1,7}[A-Z])[a-zA-Z\d]{8,13}
it uses 3 lookaheads:
(?=.{1,7}\d)
(?=.{1,7}[a-z])
(?=.{1,7}[A-Z])
Each lookahead looks for its target (e.g. the first digit) but allows 1 to 7 occurrences of any character before it.
Then it will match 8 to 13 alphanumeric characters.
Note to PowerShell users:
Use a capturing group to be able to extract the result:
$password = [regex]::Match($stringToSearch,'(?=.{1,7}\d)(?=.{1,7}[a-z])(?=.{1,7}[A-Z])([a-zA-Z\d]{8,13})').Groups[1].Value

How can I recognize a valid barcode using regex?

I have a barcode of the format 123456########. That is, the first 6 digits are always the same followed by 8 digits.
How would I check that a variable matches that format?
You haven't specified a language, but regexp. syntax is relatively uniform across implementations, so something like the following should work: 123456\d{8}
\d indicates numeric characters and is typically equivalent to the set [0-9].
{8} indicates repetition of the preceding character set precisely eight times.
Depending on how the input is coming in, you may want to anchor the regexp. thusly:
^123456\d{8}$
Where ^ matches the beginning of the line or string and $ matches the end. Alternatively, you may wish to use word boundaries, to ensure that your bar-code strings are properly separated:
\b123456\d{8}\b
Where \b matches the empty string but only at the edges of a word (normally defined as a sequence consisting exclusively of alphanumeric characters plus the underscore, but this can be locale-dependent).
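The difference between the unanchored, word-boundary and anchored forms can be seen in a short Python sketch; the sample text is made up for illustration, and the middle value is a longer number that merely contains a barcode-shaped run.

```python
import re

text = "Scanned: 12345612345678, 9912345612345678, 12345687654321"

UNANCHORED = re.compile(r"123456\d{8}")
BOUNDED = re.compile(r"\b123456\d{8}\b")
ANCHORED = re.compile(r"^123456\d{8}$")

# Unanchored: also finds the run inside the longer number 9912345612345678.
print(UNANCHORED.findall(text))
# \b on both sides: only standalone codes are returned.
print(BOUNDED.findall(text))
# ^...$: the whole string must be a single barcode.
print(ANCHORED.match("12345612345678") is not None, ANCHORED.match(text) is not None)
```

This prints three matches for the unanchored pattern, two for the \b version, and True False for the anchored checks.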
123456\d{8}
123456 # Literals
\d # Match a digit
{8} # 8 times
You can change the {8} to any number of digits depending on how many are after your static ones.
Regexr will let you try out the regex.
123456\d{8}
should do it. This breaks down to:
123456 - the fixed part; obviously substitute your own fixed part here, and remember to escape any regex special characters in it (although with just digits you should be fine)
\d - a digit
{8} - the number of times the previous element must be repeated, 8 in this case.
The {8} can also take two numbers if you need a minimum and a maximum, so you could write {6,8} if the previous element had to be repeated between 6 and 8 times.
The way you describe it, it's just
^123456[0-9]{8}$
...where you'd replace 123456 with your 6 known digits. I'm using [0-9] instead of \d because I don't know what flavor of regex you're using, and \d allows non-Arabic numerals in some flavors (if that concerns you).
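Python 3 is one flavor where this matters: for str patterns, \d matches any Unicode decimal digit unless the re.ASCII flag is set. The sketch below uses Arabic-Indic digits purely as an illustrative input.

```python
import re

ascii_code = "123456" + "12345678"
# Same shape, but the last 8 digits are ARABIC-INDIC DIGIT ZERO to SEVEN (U+0660 to U+0667).
indic_code = "123456" + "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667"

def check(s: str):
    return (
        re.fullmatch(r"123456\d{8}", s) is not None,                  # \d: any Unicode decimal digit
        re.fullmatch(r"123456[0-9]{8}", s) is not None,               # [0-9]: ASCII digits only
        re.fullmatch(r"123456\d{8}", s, flags=re.ASCII) is not None,  # re.ASCII restricts \d to [0-9]
    )

print(check(ascii_code))  # (True, True, True)
print(check(indic_code))  # (True, False, False)
```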