Check ICD10 via regex


I need to validate an ICD-10 code. The code is generated under a few conditions:
The minimum length is 3.
The first character is a letter and is not 'U'.
The second and third characters are digits.
The fourth character is a dot (.).
The fifth to eighth characters are letters or digits.
Ex.:
Right : "A18.32","A28.2","A04.0","A18.R252", "A18", "A18.52", "R18", "R18."
Wrong : "A184.32","U18","111."

Is this an ICD-10-CM code that you are looking to verify?
If so, I believe that the third character can be alphabetic or numeric.
This is taken from page 7 of
https://www.cms.gov/Medicare/Coding/ICD10/downloads/032310_ICD10_Slides.pdf
In that case, the following regular expression should validate it:
^([a-tA-T]|[v-zV-Z])\d[a-zA-Z0-9](\.[a-zA-Z0-9]{1,4})?$
Otherwise, you can edit the above regular expression to require characters 2 and 3 to be numeric. Note that {1,4} rejects a trailing dot such as "R18."; use {0,4} if that should be accepted.
^([a-tA-T]|[v-zV-Z])\d{2}(\.[a-zA-Z0-9]{1,4})?$

You could try something like so: ^[A-TV-Z]\d{2}(\.[A-Z\d]{0,4})?$. An example is available here.
This is how the answer satisfies your condition:
Min length is 3: ^[A-TV-Z]\d{2}...$ attempts to match a letter and 2 digits. The ^ and $ ensure that there is nothing else in the string which does not satisfy the regular expression. The segment (\.[A-Z\d]{0,4})? is a group followed by the ? operator: (...)?. This means that the content within the round brackets may or may not be there.
First character is a letter and is not 'U': this is satisfied by [A-TV-Z], which matches all the upper case letters between A and T and between V and Z inclusive. This omits the letter U.
Second and third are digits: \d{2} means match two digits.
Fourth is a dot (.): this is satisfied by \.. The extra \ is needed because the period is a special character in regular expressions, where it means match any character (except newlines, unless a special option is passed along).
Fifth to eighth characters are letters or digits: [A-Z\d]{0,4} means any upper case letter or digit, repeated between 0 and 4 times.
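A quick way to check a pattern like this is to run it against the examples from the question. The sketch below uses Python's re module, which is an assumption since the question does not name a language, and the helper name is_icd10_like is made up for illustration.

```python
import re

# Pattern from the answer above: a letter other than U, two digits,
# then an optional dot followed by up to four upper case letters or digits.
ICD10_PATTERN = re.compile(r"^[A-TV-Z]\d{2}(\.[A-Z\d]{0,4})?$")


def is_icd10_like(code: str) -> bool:
    """Return True if the code has the structure described in the question."""
    return ICD10_PATTERN.match(code) is not None


# Examples copied from the question.
right = ["A18.32", "A28.2", "A04.0", "A18.R252", "A18", "A18.52", "R18", "R18."]
wrong = ["A184.32", "U18", "111."]

for code in right + wrong:
    print(code, is_icd10_like(code))
```

Every entry in right should print True and every entry in wrong should print False. This only checks the shape of the code, not whether the code exists.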

Try this:
\b[a-tv-zA-TV-Z]\d{2}(\.[a-zA-Z0-9]{0,4})?\b
I assume from your example that the dot and everything after it are optional.
This regex will match a word boundary \b, a letter other than u or U [a-tv-zA-TV-Z], two digits \d{2}, then an optional dot followed by 0-4 letters or digits (\.[a-zA-Z0-9]{0,4})?, and a second word boundary \b. The lower bound is written out as {0,4} because the shorthand {,4} is not accepted by every regex flavor.

This question is old, but I had the same issue of validating ICD-10 codes, so it seemed worth an updated answer.
As it turns out, there are two flavors of ICD-10 codes: ICD-10-CM and ICD-10-PCS. From their usage guidelines:
The ICD-10-CM is a morbidity classification published by the United
States for classifying diagnoses and reason for visits in all health
care settings.
and
The ICD-10-PCS is a procedure classification published by the United
States for classifying procedures performed in hospital inpatient
health care settings.
Both Sets
In both the ICD-10-CM and ICD-10-PCS coding systems, you can validate the structure of a code with a regular expression, but validating the content (in terms of which specific combinations of letters and numbers are valid) may be technically possible, but is practically infeasible. A lookup table would be a better bet.
ICD-10-CM
From the Conventions section of the guidelines:
Format and Structure:
The ICD-10-CM Tabular List contains categories, subcategories and
codes. Characters for categories, subcategories and codes may be
either a letter or a number. All categories are 3 characters. A
three-character category that has no further subdivision is equivalent
to a code. Subcategories are either 4 or 5 characters. Codes may be 3,
4, 5, 6 or 7 characters. That is, each level of subdivision after a
category is a subcategory. The final level of subdivision is a code.
Codes that have applicable 7th characters are still referred to as
codes, not subcategories. A code that has an applicable 7th character
is considered invalid without the 7th character.
According to this specification, you'd expect a valid regular expression would look like this:
^\w{3,7}$
However, a review of the actual values shows that, in all cases, the first character is an upper case letter, the second character is a digit, and any alphabetic characters in the remaining available positions are upper case as well. As such, you can use this information to more precisely specify what you're validating:
^[A-Z]\d[A-Z\d]{1,5}$
If you want to allow for a possible period in the fourth position followed by up to four more characters as specified by the OP:
^[A-Z]\d[A-Z\d](\.[A-Z\d]{0,4})?$
ICD-10-PCS
From the Conventions section of the guidelines:
One of 34 possible values can be assigned to each axis of
classification in the seven character code: they are the numbers 0
through 9 and the alphabet (except I and O because they are easily
confused with the numbers 1 and 0). The number of unique values used
in an axis of classification differs as needed...As with words in their
context, the meaning of any single value is a combination of its axis
of classification and any preceding values on which it may be
dependent...Within a PCS table, valid codes include all combinations
of choices in characters 4 through 7 contained in the same row of the
table. [For example], 0JHT3VZ is a valid code, and 0JHW3VZ is
not a valid code.
So to validate the structure of an ICD-10-PCS code:
^[A-HJ-NP-Z\d]{7}$
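Since the structure check alone cannot tell 0JHT3VZ from 0JHW3VZ, the two steps can be combined: a regex for the shape, then a lookup for the content. Here is a minimal sketch in Python, assuming the valid codes have been loaded into a set. The set below is a hypothetical stand-in for a real code table and holds only the one code named in the quoted guideline.

```python
import re

# Shape check from the answer: seven characters, digits or letters except I and O.
PCS_SHAPE = re.compile(r"[A-HJ-NP-Z\d]{7}")

# Hypothetical stand-in for a full ICD-10-PCS table loaded from a file or database.
KNOWN_PCS_CODES = {"0JHT3VZ"}


def check_pcs(code: str) -> str:
    """Classify a code as malformed, well-formed but unknown, or valid."""
    if not PCS_SHAPE.fullmatch(code):
        return "malformed"
    if code not in KNOWN_PCS_CODES:
        return "well-formed but unknown"
    return "valid"


print(check_pcs("0JHT3VZ"))
print(check_pcs("0JHW3VZ"))
print(check_pcs("0JHI3VZ"))
```

The second code passes the regex but fails the lookup, which is the case a regex cannot catch on its own.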

Use this simple expression (the dot is escaped so that it matches only a literal period):
'^([A-TV-Za-tv-z]{1}[0-9]{1}[A-Za-z0-9]{1}|[A-TV-Za-tv-z]{1}[0-9]{1}[A-Za-z0-9]{1}\.[A-Za-z0-9]{1,4})$'

Related

Regex expression for date within dates range

I need to validate with regex a date in format yyyy-mm-dd (2019-12-31) that should be within the range 2019-12-20 - 2020-01-10.
What would be the regex for this?
Thanks

A regex only deals with characters, so we have to work out which characters are valid at each position in the date.
The first part is easy: the first two characters have to be 20.
Now it gets complicated. The next character can be a 1 or a 2, but what follows depends on the value of that character, so we split the rest of the regex into two sections: the first applies if the third character is 1 and the second if it is 2.
If the third character is a 1, then what must follow is the characters 9-12-, as the range starts at 2019-12-20. Now for the day part. The 9th character is the tens digit of the day; this can only be 2 or 3, as we are already in the last month and the minimum date is the 20th. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, we now have a pattern for dates in 2019: 19-12-[23][0-9]
If the third character is a 2, then we can again match everything up to the day part of the date, as the range ends in January. This gives us a partial match of 20-01-, leaving us to work on the day part. Here we know that the first character of the day can be either a 1 or a 0; however, if it's a 1 then the last character must be a 0, and if it's a 0 then the last character can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together we get 20-01-(?:0[1-9]|10).
Combining these gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming that the date you are testing against is a validly formatted date.
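To see the pattern at work, including the reason for that last assumption, here is a small Python check (Python is my choice, the question does not name a language). The helper name in_range and the sample list are made up for illustration.

```python
import re

# Final regex from the answer, applied to the whole string.
DATE_RANGE = re.compile(r"20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))")


def in_range(text: str) -> bool:
    """Return True if the text matches the range pattern in full."""
    return DATE_RANGE.fullmatch(text) is not None


samples = [
    "2019-12-19",  # one day before the range
    "2019-12-20",  # first day of the range
    "2019-12-31",
    "2020-01-01",
    "2020-01-10",  # last day of the range
    "2020-01-11",  # one day after the range
    "2019-12-32",  # not a real date, but [23][0-9] lets it through
]

for sample in samples:
    print(sample, in_range(sample))
```

The last sample is accepted, which is why the pattern should only be applied to input already known to be a real date.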

Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that this accepts any dd value whose first digit is between zero and three and whose second digit is between zero and nine. You could instead list every day number you want to allow (20 through 31 and 1 through 10) like this: (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly, regular expressions are not the right tool for this. A regex gives a mask for something, not a logical context. Use a regex to extract the values from a string, and validate those values in another language.
The second regex above will, for example, match your dates, but also values outside the range, since nothing ties the year group 2019|2020 to the month group 12|01 or to the day group. It therefore also matches values like 2019-12-05 and 2020-01-25.
To match only the values you want, you would need a really large regex like this (inner brackets only if you need them): ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...), continuing with all possible dates. Ask yourself: what would you do if you found such a regex in a project you had to work with ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This way you have three named groups (yyyy, mm, dd) you can access and validate the matched values... The regex is smaller, you have a better association between code and regex and both are easier to maintain.
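As a rough illustration of that split between extraction and validation, here is a Python sketch. It assumes Python's named-group syntax, (?P<name>...), in place of (?<name>...), and the range limits from the question. The names DATE_PARTS, START, END and in_range are made up.

```python
import re
from datetime import date

# Same idea as the regex above, written with Python's (?P<name>...) syntax.
DATE_PARTS = re.compile(
    r"(?P<yyyy>20[0-9]{2})-(?P<mm>[01][0-9]|[0-9])-(?P<dd>[0-3][0-9]|[0-9])"
)

# Range limits taken from the question.
START = date(2019, 12, 20)
END = date(2020, 1, 10)


def in_range(text: str) -> bool:
    """Extract year, month and day with the regex, then validate them in code."""
    match = DATE_PARTS.fullmatch(text)
    if match is None:
        return False
    try:
        parsed = date(
            int(match.group("yyyy")),
            int(match.group("mm")),
            int(match.group("dd")),
        )
    except ValueError:
        # The regex let through something that is not a calendar date, e.g. day 32.
        return False
    return START <= parsed <= END


for sample in ["2019-12-20", "2020-1-5", "2020-12-11", "2019-12-32"]:
    print(sample, in_range(sample))
```

The regex stays short because the range logic lives in the date comparison, where changing the limits is a one-line edit.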

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.

It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
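For anyone who wants to try it locally, here is a short Python sketch that runs the pattern against the four examples from the question. Python is my assumption; the pattern itself is unchanged.

```python
import re

# Negative lookahead rejects one digit, or two digits starting with 1-4, before the comma.
AT_LEAST_50 = re.compile(r"^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b")

# The four examples from the question.
samples = ["100.000,00", "50.000,00", "50,00", "34,34"]

matched = [s for s in samples if AT_LEAST_50.search(s)]
print(matched)
```

Only the first three samples should end up in matched.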

Try:
EDIT: ^(.{3,}|[5-9]\d),\d{2}$
It checks that:
there are 3 characters or more before the ,
or there are 2 digits before the , and the first is between 5 and 9
and then a , and 2 digits
I don't know if it answers your question, as it will also return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a digit):
([1-9]\d{0,2}\.): either d., dd. or ddd. once, with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or more times.
\d{3}: ddd, exactly 3 digits.
These 3 parts combined match any number equal to or greater than 1000, like 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d): a digit between 5 and 9 followed by another digit. Matches anything between 50 and 99.
So the integer part is either one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
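The difference between the first attempt and the stricter regex shows up on malformed input. Here is a small Python comparison (Python is my assumption, and the names LOOSE, STRICT and accepts are made up):

```python
import re

# First attempt: anything with 3+ characters, or two digits 50-99, before the comma.
LOOSE = re.compile(r"^(.{3,}|[5-9]\d),\d{2}$")

# Stricter version: also checks the continental grouping.
STRICT = re.compile(
    r"^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$"
)


def accepts(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the pattern matches the text."""
    return pattern.match(text) is not None


for sample in ["50,00", "1.000,00", "100.000.000,00", "49,99", "aa50,00", "1000,00"]:
    print(sample, accepts(LOOSE, sample), accepts(STRICT, sample))
```

Both patterns agree on well-formed numbers. Only the strict one rejects aa50,00 and 1000,00, which lacks its thousands separator.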

I have named all inner groups to make it easier to understand which part of the number is matched by each group. Once you understand how it works, change every ?P<..> to ?:.
This one is for any decimal number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is the same, with the added limit number >= 50:
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If the integer part is never above 999.999 and the fractional part always has 2 digits, it gets a bit simpler:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test

If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
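(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d,)
You could make it more compact by merging some of the cases. The end-of-string anchor has to stay outside a character class, because inside [...] a $ is just a literal dollar sign:
(\d{3}|[5-9]\d)(,|$)
Caveat: the two end-of-string cases only make sense if the fractional part can be missing. If every input ends in a comma and two digits, drop them; otherwise 34,56 would match on its trailing 56.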
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.
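A minimal sketch of that parse-and-compare approach in Python (my choice of language). I use Decimal rather than an integer so that the cents are kept, and the helper names are made up. It assumes the string has already passed the separate format check, so it does no validation of its own.

```python
from decimal import Decimal


def parse_continental(text: str) -> Decimal:
    """Convert e.g. '50.000,00' to Decimal('50000.00').

    Assumes well-formed input: dots as thousands separators, comma as decimal mark.
    """
    return Decimal(text.replace(".", "").replace(",", "."))


def at_least_50(text: str) -> bool:
    """Return True if the parsed amount is 50 or more."""
    return parse_continental(text) >= 50


for sample in ["100.000,00", "50.000,00", "50,00", "34,34"]:
    print(sample, at_least_50(sample))
```

Changing the threshold later means editing one number instead of rewriting a pattern.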

Regex to validate only US and India retail number

I am trying to get a regex which meets the requirements below:
Validates US and India retail phone numbers.
Excludes special purpose/business purpose phone numbers in both countries, i.e. numbers starting with 800, 888, 877, 866 or 900, and requires at least 10 digits for the US. There can be more guidelines; the above is just an example.
It should accept special characters and prefixes such as (, ), +, 1, 0 when they are included; if the number satisfies all these points, then it should be a valid phone number.
If preceded by STD or ISD, consider it valid.
Landline and mobile numbers should both be valid.
I looked whether some came across the same requirements, but the solutions I am getting serve different requirements and not exactly the one I am looking for.

Without a definitive exclusion/inclusion list of the phone numbers you want to match, here is a "template" regular expression that you could use to match US numbers:
(?:^|\b)(\+?1[ .\/-]?)?\(?(?!37|950|958|959|96|976)[2-9]([0-8])(?!\2)\d(?:\) ?|[ .\/-])?[2-9](?!11)\d\d[ .-]?\d{4}(?:$|\b)
A break-down:
(?:^|\b): Start of string or break. This prevents, for example, the match of digits from starting in the middle of a longer series of digits;
(\+?1[ .\/-]?)?: this matches an optional prefix of the US country code (i.e. 1), optionally followed by a separator, and accepts values like +1, 1/, 1- and 1. The hyphen is written last in the character class so that it is a literal hyphen; written as [ -.] it would be a range covering every character from space to period;
\(?: an optional opening bracket for the region code;
(?!37|950|958|959|96|976): exclusion list of region codes. When only 2 digits are given, any region code starting with those is rejected -- you'll need to extend this list to identify other "special business" phone numbers you want to exclude;
[2-9]: first digit of region code; cannot be 0 or 1;
([0-8]): second digit of region code; cannot be 9;
(?!\2)\d: third digit of region code; cannot be the same as the second digit (\2 refers to the second match group);
(?:\) ?|[ .\/-])?: optional separator: ), -, ., / or space. If ), it can optionally be followed by a space;
[2-9]: first digit of exchange code; cannot be 0 or 1;
(?!11): exclusion for second and third digits of exchange code -- they cannot both be 1 at the same time;
\d\d: second and third digit of exchange code; no further limitations;
[ .-]?: optional separator; can be -, . or space;
\d{4}: four digit customer number; no restrictions.
(?:$|\b): End of string or break. This prevents, for example, the match of digits from stopping in the middle of a longer series of digits;
Here is an online regex test.
I suppose with the above as inspiration, you could fine-tune it to your expectations, and add the Indian formats in the same manner. You can use the | operator to separate the two sub-regular expressions you will have, like (US|IND), where you need to replace those two arguments by real expressions of course.
To capture also the prefix STD or ISD, you can insert the following in the above regex, right after the break test:
(?:STD\b\s*|ISD\b\s*|)
...which matches these optional words followed by optional spaces.
However, the complexity of the final regex will increase the more precise you want to match and exclude invalid numbers. For example, if you would want to validate against the All India STD Code List, then your regular expression would get very long and hard to manage.
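To get a feel for what the US template accepts and rejects, here is a small Python sketch (Python is my assumption). It uses a variant of the template in which the hyphen is written last in each character class, so that it is a literal hyphen rather than part of a range. The helper name and the sample numbers are made up.

```python
import re

# US template from the answer; the hyphen sits last in each character class
# so that it is a literal hyphen.
US_PHONE = re.compile(
    r"(?:^|\b)(\+?1[ ./-]?)?\(?(?!37|950|958|959|96|976)[2-9]([0-8])(?!\2)\d"
    r"(?:\) ?|[ ./-])?[2-9](?!11)\d\d[ .-]?\d{4}(?:$|\b)"
)


def looks_like_us_retail_number(text: str) -> bool:
    """Return True if the whole string matches the template."""
    return US_PHONE.fullmatch(text) is not None


samples = [
    "(212) 555-0123",
    "+1 212 555 0123",
    "800-555-0123",  # second and third digits of the region code are equal
    "900-555-0123",
    "212-911-0123",  # exchange code ends in 11
]

for sample in samples:
    print(sample, looks_like_us_retail_number(sample))
```

The 800, 888, 877, 866 and 900 prefixes from the question are all rejected by the rule that the second and third digits of the region code must differ. That rule would also reject any ordinary region code with a repeated digit, so the exclusion list is the safer place to encode business prefixes.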

How Can I Create a RegEx Pattern that will Get N Words Using Custom Word Boundary?

I need a RegEx pattern that will return the first N words using a custom word boundary that is the normal RegEx white space (\s) plus punctuation like .,;:!?-*_
EDIT #1: Thanks for all your comments.
To be clear:
I'd like to set the characters that would be the word delimiters
Lets call this the "Delimiter Set", or strDelimiters
strDelimiters = ".,;:!?-*_"
nNumWordsToFind = 5
A word is defined as any contiguous text that does NOT contain any character in strDelimiters
The RegEx word boundary is any contiguous text that contains one or more of the characters in strDelimiters
I'd like to build the RegEx pattern to get/return the first nNumWordsToFind using the strDelimiters.
EDIT #2: Sat, Aug 8, 2015 at 12:49 AM US CT
#maraca definitely answered my question as originally stated.
But what I actually need is to return the number of words ≤ nNumWordsToFind.
So if the source text has only 3 words, but my RegEx asks for 4 words, I need it to return the 3 words. The answer provided by maraca fails if nNumWordsToFind > number of actual words in the source text.
For example:
one,two;three-four_five.six:seven eight nine! ten
It would see this as 10 words.
If I want the first 5 words, it would return:
one,two;three-four_five.
I have this pattern using the normal \s whitespace, which works, but NOT exactly what I need:
([\w]+\s+){<NumWordsOut>}
where <NumWordsOut> is the number of words to return.
I have also found this word boundary pattern, but I don't know how to use it:
a "real word boundary" that detects the edge between an ASCII letter
and a non-letter.
(?i)(?<=^|[^a-z])(?=[a-z])|(?<=[a-z])(?=$|[^a-z])
However, I would want my words to allow numbers as well.
IAC, I have not been able how to use the above custom word boundary pattern to return the first N words of my text.
BTW, I will be using this in a Keyboard Maestro macro.
Can anyone help?
TIA.

All you have to do is to adapt your pattern ([\w]+\s+){<NumWordsOut>} to, including some special cases:
^[\s.,;:!?*_-]*([^\s.,;:!?*_-]+([\s.,;:!?*_-]+|$)){<NumWordsOut>}
The pattern has five parts:
1. ^[\s.,;:!?*_-]* matches any amount of delimiters before the first word
2. [^\s.,;:!?*_-]+ matches a word (= at least one non-delimiter)
3. [\s.,;:!?*_-]+ requires the word to be followed by at least one delimiter
4. |$ lets the word be at the end of the string instead (in case no delimiter follows at the end)
5. {<NumWordsOut>} repeats 2. to 4. <NumWordsOut> times
Note how I changed the order of the -, it has to be at the start or end, otherwise it needs to be escaped: \-.
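The same pattern can be built from the delimiter string. The sketch below is in Python rather than Keyboard Maestro (my assumption, for illustration only), and it changes one thing: the quantifier becomes {1,N} so that a text with fewer than N words still returns what it has, as asked for in EDIT #2. The function name first_n_words is made up.

```python
import re


def first_n_words(text: str, n: int, delimiters: str = ".,;:!?-*_") -> str:
    """Return the first n words (or fewer, if the text is shorter).

    A word is any run of characters that are neither whitespace nor delimiters.
    """
    # Whitespace plus the escaped delimiter characters, for use inside [...].
    cls = r"\s" + re.escape(delimiters)
    # Leading delimiters, then 1..n repetitions of word + (delimiters or end of string).
    pattern = rf"^[{cls}]*((?:[^{cls}]+(?:[{cls}]+|$)){{1,{n}}})"
    match = re.match(pattern, text)
    return match.group(1) if match else ""


source = "one,two;three-four_five.six:seven eight nine! ten"
print(first_n_words(source, 5))
print(first_n_words("one two three", 4))
```

Escaping the delimiters with re.escape avoids the hyphen-placement issue, at the cost of a slightly noisier pattern. The returned text keeps the trailing delimiter of the last word, as in the question's expected output.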

Thanks to #maraca for providing the complete answer to my question.
I just wanted to post the Keyboard Maestro macro that I have built using #maraca's RegEx pattern for anyone interested in the complete solution.
See KM Forum Macro: Get a Max of N Words in String Using RegEx

VIN Validation RegEx

I have written a VIN validation RegEx based on the http://en.wikipedia.org/wiki/Vehicle_identification_number but then when I try to run some tests it is not accepting some valid VIN Numbers.
My RegEx:
^[A-HJ-NPR-Za-hj-npr-z\\d]{8}[\\dX][A-HJ-NPR-Za-hj-npr-z\\d]{2}\\d{6}$
VIN Number Not Working:
1ftfw1et4bfc45903
WP0ZZZ99ZTS392124
VIN Numbers Working:
19uya31581l000000
1hwa31aa5ae006086
(I think the problem occurs with the numbers at the end, Wikipedia made it sound like it would end with only 6 numbers and the one that is not working but is a valid number only ends with 5)
Any Help Correcting this issue would be greatly appreciated!

I can't help you with a perfect regex for VIN numbers -- but I can explain why this one is failing in your example of 1ftfw1et4bfc45903:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}[\dX][A-HJ-NPR-Za-hj-npr-z\d]{2}\d{6}$
Explanation:
^[A-HJ-NPR-Za-hj-npr-z\d]{8}
This allows for 8 characters, composed of any digits and any letters except I, O, and Q; it properly finds the first 8 characters:
1ftfw1et
[\dX]
This allows for 1 character, either a digit or a capital X; it properly finds the next character:
4
[A-HJ-NPR-Za-hj-npr-z\d]{2}
This allows for 2 characters, composed of any digits and any letters except I, O, and Q; it properly finds the next 2 characters:
bf
\d{6}$
This allows for exactly 6 digits, and is the reason the regex fails; because the final 6 characters are not all digits:
c45903
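The walk-through above can be reproduced mechanically by testing each piece of the pattern on its own slice of the VIN. Here is a rough Python sketch (the language is my assumption; the segment labels and the helper failing_segments are made up):

```python
import re

# The four pieces of the question's regex, with the width each one consumes.
SEGMENTS = [
    ("first 8 characters", r"[A-HJ-NPR-Za-hj-npr-z\d]{8}", 8),
    ("9th character", r"[\dX]", 1),
    ("characters 10-11", r"[A-HJ-NPR-Za-hj-npr-z\d]{2}", 2),
    ("last 6 characters", r"\d{6}", 6),
]


def failing_segments(vin: str) -> list:
    """Return (label, text) for every slice that its piece of the regex rejects."""
    failures = []
    position = 0
    for label, pattern, width in SEGMENTS:
        chunk = vin[position:position + width]
        if not re.fullmatch(pattern, chunk):
            failures.append((label, chunk))
        position += width
    return failures


for vin in ["1ftfw1et4bfc45903", "WP0ZZZ99ZTS392124", "19uya31581l000000"]:
    print(vin, failing_segments(vin))
```

For the second VIN from the question, the slice that fails is the 9th character (Z), not the final six digits, so the two rejected VINs fail for different reasons.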

Dan is correct - VINs have a checksum. You can't utilize that in regex, so the best you can do with regex is casting too wide of a net. By that I mean that your regex will accept all valid VINs, and also around a trillion (rough estimate) non-VIN 17-character strings.
If you are working in a language with named capture groups, you can extract that data as well.
So, if your goal is:
Only to not reject valid VINs (letting in invalid ones is ok) then use Fransisco's answer, [A-HJ-NPR-Z0-9]{17}.
Not reject valid VINs, and grab info like model year, plant code, etc, then use this (note, you must use a language that supports named capture groups - off the top of my head: Perl, Python (where the syntax is (?P<name>...)), Elixir and almost certainly others; JavaScript only gained them in ES2018): /^(?<wmi>[A-HJ-NPR-Z\d]{3})(?<vds>[A-HJ-NPR-Z\d]{5})(?<check>[\dX])(?<vis>(?<year>[A-HJ-NPR-Z\d])(?<plant>[A-HJ-NPR-Z\d])(?<seq>[A-HJ-NPR-Z\d]{6}))$/ where the names are defined at the end of this answer.
Not reject valid VINs, and prevent some but not all invalid VINs, you can get specific like Pedro does.
Only accept valid VINs: you need to write code (just kidding, GitHub exists).
Capture group name key:
wmi - World manufacturer identifier
vds - Vehicle descriptor section
check - Check digit
vis - Vehicle identifier section
year - Model year
plant - Plant code
seq - Production sequence number
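As an illustration of the named-group variant, here is a Python sketch. Python writes named groups as (?P<name>...), so the pattern is adapted accordingly. The helper split_vin is a made-up name, and the sample VIN is one I believe is commonly used in check-digit examples rather than one taken from this thread.

```python
import re

# Same structure as the regex above, using Python's (?P<name>...) syntax.
VIN_PARTS = re.compile(
    r"^(?P<wmi>[A-HJ-NPR-Z\d]{3})"
    r"(?P<vds>[A-HJ-NPR-Z\d]{5})"
    r"(?P<check>[\dX])"
    r"(?P<vis>(?P<year>[A-HJ-NPR-Z\d])(?P<plant>[A-HJ-NPR-Z\d])(?P<seq>[A-HJ-NPR-Z\d]{6}))$"
)


def split_vin(vin: str):
    """Return a dict of the named parts, or None if the shape does not match."""
    match = VIN_PARTS.match(vin.upper())
    return match.groupdict() if match else None


parts = split_vin("1M8GDM9AXKP042788")
print(parts)
```

Upper-casing the input first stands in for the case-insensitive flag. A None result only means the shape is wrong; a dict does not mean the VIN is real.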

This regular expression is working fine for validating US VINs, including the one you described:
[A-HJ-NPR-Z0-9]{17}
Remember to make it case insensitive with flag i
Source: https://github.com/rfink/angular-vin

VIN should have only A-Z, 0-9 characters, but not I, O, or Q
Last 6 characters of VIN should be a number
VIN should be 17 characters long
You didn't specify which language you're using but the following regex can be used to validate a US VIN with php:
/^(?:([A-HJ-NPR-Z]){3}|\d{3})(?1){2}\d{2}(?:(?1)|\d)(?:\d|X)(?:(?1)+\d+|\d+(?1)+)\d{6}$/i

I feel regex is not the ideal validation. VINs have a built in check digit. https://en.wikibooks.org/wiki/Vehicle_Identification_Numbers_(VIN_codes)/Check_digit or http://www.vsource.org/VFR-RVF_files/BVINcalc.htm
I suggest you build an algorithm using this. (Untested algorithm example)
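A sketch of such an algorithm in Python, based on the transliteration table and position weights as I understand them from the check digit description linked above. Treat the table values as something to verify against that source before relying on them; the function names are made up.

```python
# Letter values used for the check digit (I, O and Q never appear in a VIN).
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}

# One weight per position; the 9th position (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def expected_check_digit(vin: str) -> str:
    """Compute the check character that position 9 should contain."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        raise ValueError("expected 17 characters without I, O or Q")
    total = sum(TRANSLITERATION[ch] * weight for ch, weight in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """Return True if the 9th character matches the computed check digit."""
    try:
        return vin[8].upper() == expected_check_digit(vin)
    except (ValueError, IndexError):
        return False


print(expected_check_digit("1M8GDM9AXKP042788"))
print(has_valid_check_digit("11111111111111111"))
```

A regex can run first as a cheap shape filter and this check afterwards. Note that VINs from markets that do not use the check digit will fail this test even though they are genuine.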

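This should work. It comes from a Splunk search, so there are some additional exclusions. Note that a ^ only negates a character class when it is the first character inside the brackets, so I, O and Q are excluded here by leaving them out of the ranges:
(?i)(?<VIN>[A-HJ-NPR-Z0-9]{11}\d{6})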

The NHTSA website provides the method used to calculate the 9th character checksum, if you're interested. It also provides lots of other useful data, such as which characters are allowed in which position, or how to determine whether the 10th character, if alphabetic, refers to a model year up to 1999 or a model year from 2010.
NHTSA VIN eCFR
Hope that helps.

Please use this regular expression. It is shorter and works with all VIN types:
(?=.*\d|[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
I have since replaced the formula above with the new one below:
(?=.*\d|=.*[A-Z])(?=.*[A-Z])[A-Z0-9]{17}
This regular expression matches 17 characters made up of any letters and digits, with at least one digit.
