RegEx not matching date - regex

Why does the date RegEx:
^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])$
Does not match 1999/01-01?
Can't figure this out. Is it because of delimiters?

This regex uses a reference to the second captured group using \2. Your second captured group is:
^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])$
            ^^^^^^^^
This is the delimiter of your date; it can be any one of the following: - (hyphen), a space, / or . (dot). As written, the second delimiter has to be the same as the first one (because of the backreference to the second capturing group), and that's why you can't match a string of the form xxxxZxxYxx if Z != Y.
If you want such case to be matched, you can change your regex to:
^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$
Note, however, that while this is a reasonable regex for finding a date in the range 1900-2099, it will not tell you whether the date is actually valid: it doesn't check that the day number fits the month, so it accepts, for example, 31 days in February.
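The difference between the two patterns can be checked with a short sketch. The question does not name a language, so Python's re module is an assumption here, and the names SAME_DELIM and ANY_DELIM are only illustrative:

```python
import re

# Original pattern: \2 forces the second delimiter to repeat the first one.
SAME_DELIM = re.compile(
    r"^(19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])$"
)

# Relaxed pattern: each delimiter is matched independently.
ANY_DELIM = re.compile(
    r"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"
)

def matches(pattern, text):
    """Return True if the whole string satisfies the pattern."""
    return pattern.match(text) is not None

if __name__ == "__main__":
    for sample in ("1999/01/01", "1999/01-01", "1999-02-31"):
        print(sample, matches(SAME_DELIM, sample), matches(ANY_DELIM, sample))
```

The last sample illustrates the caveat above: both patterns accept 1999-02-31, because neither knows how many days February has.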

Related

Regex to validate date format

I'm trying to find a way to validate my input date in Altova MapForce, which is in the format "YYYYMMDD".
I know that to verify the year I can use [0-9]{4}, but I'm having trouble figuring out how to restrict the day range to "01-31" and the month to "01-12". Please note that "01" is valid while "1" should be invalid.
Can someone please provide a regex to validate this sort of input?
From searching the internet I found one for the day: ([1-9]|[12]\d|3[01]), but it is valid for the range 1-31. I want 01-31, and so on.
For a month from 1-12 adding a zero for a single digit 1-9:
(?:0[1-9]|1[012])
For a day 1-31 adding a zero for a single digit 1-9:
(?:0[1-9]|[12]\d|3[01])
Putting it all together with 4 digits to match a year (note that \d{4} can also match 0000 and 9999), enclosed in word boundaries \b to prevent a partial match with leading or trailing digits / word characters:
\b\d{4}(?:0[1-9]|1[012])(?:0[1-9]|[12]\d|3[01])\b
A variation, limiting the year to, for example, 1900-2099:
\b(?:19|20)\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[12]\d|3[01])\b
But note that this does not validate the date itself; it can, for example, also match 20210231. To validate a date, use a dedicated date-handling API in the tool or code.
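As an illustration of combining the shape check with a real date API, here is a minimal sketch in Python (an assumption, since the question is about MapForce; the function name is_valid_yyyymmdd is hypothetical). It uses fullmatch instead of word boundaries because it validates a whole field rather than searching inside text:

```python
import re
from datetime import datetime

# Shape check only: 4-digit year in 1900-2099, zero-padded month and day.
SHAPE = re.compile(r"(?:19|20)\d{2}(?:0[1-9]|1[012])(?:0[1-9]|[12]\d|3[01])")

def is_valid_yyyymmdd(text):
    """Return True only if text has the YYYYMMDD shape and is a real date."""
    if SHAPE.fullmatch(text) is None:
        return False
    try:
        # strptime rejects impossible dates such as 31 February.
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    for sample in ("20210228", "20210231", "2021021", "20211301"):
        print(sample, is_valid_yyyymmdd(sample))
```

The regex rejects malformed input such as a missing leading zero, while the calendar check catches inputs like 20210231 that only look right.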

Matching multiple pattern without knowing if they are in the string

I am trying to find a regex in order to match with one pattern and then capture all the year dates following this pattern.
For example, I have the following strings and I am trying to get one regex expression to capture pattern 1 and then the following dates (max 2) if they exist.
"'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"
"'pattern1' foobar 4 foo 1 bar foo 1900"
"'pattern1' foobar 4 foo 1 bar foo"
The following expression matches the first case but not if a date is removed:
('pattern1').*?(\d{4}).*?(\d{4})
Adding ? after the potential date groups matches only the pattern, because the expression is already satisfied without matching any dates:
('pattern1').*?(\d{4})?.*?(\d{4})?
Hence my issue is that I cannot specify that a group may or may not be present in the expression, but should be matched if it is.
You could make both parts optional and use word boundaries around the digits to prevent them from being part of a larger word.
If you want to match more years, you would have to add more optional groups. In that case, I would suggest using an approach like the one in the answer by @Alexander Mashin.
('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?
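The nested optional groups can be checked against the three sample strings with a short sketch. The question does not name a language, so Python is an assumption here, and the helper name capture is illustrative:

```python
import re

# Each year group is optional, and the second is nested inside the first,
# so a second year is only attempted once a first year has been found.
PATTERN = re.compile(r"('pattern1')(?:.*?(\b\d{4}\b)(?:.*?(\b\d{4}\b))?)?")

def capture(text):
    """Return (pattern, first_year, second_year); missing years are None."""
    m = PATTERN.search(text)
    return m.groups() if m else None

if __name__ == "__main__":
    samples = [
        "'pattern1' foobar 4 foo 1 bar foo 1900 and 2000",
        "'pattern1' foobar 4 foo 1 bar foo 1900",
        "'pattern1' foobar 4 foo 1 bar foo",
    ]
    for s in samples:
        print(capture(s))
```

Groups that did not take part in the match come back as None, which makes it easy to tell how many years were present.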
If you must solve your problem with one regular expression, simply use ^'pattern1'|\d{4}. The first match will be 'pattern1' at the beginning of the string; the remaining matches will be the years (from AD 1000 onwards).
A more correct solution would be to match a line containing 'pattern1' and dates, capturing 'pattern1' and the tail containing the dates (e.g. ^(?<head>'pattern1')(?<tail>(?:.*?\d{4})+.*$)), and then to match the dates in the tail (just \d{4}). But the exact code depends on your environment.
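A minimal sketch of this two-step approach in Python (an assumption, since the environment is not stated). Python spells named groups (?P<name>...), and the tail here is simplified to .* so that lines without any year still match; the cap of two years comes from the question, and the function name extract is hypothetical:

```python
import re

# Step 1: split the line into the marker and everything after it.
LINE = re.compile(r"^(?P<head>'pattern1')(?P<tail>.*)$")
# Step 2: pull the 4-digit runs out of the tail.
YEAR = re.compile(r"\d{4}")

def extract(text, max_years=2):
    """Return (head, [years]) or None if the line lacks the marker."""
    m = LINE.match(text)
    if m is None:
        return None
    return m.group("head"), YEAR.findall(m.group("tail"))[:max_years]

if __name__ == "__main__":
    print(extract("'pattern1' foobar 4 foo 1 bar foo 1900 and 2000"))
    print(extract("'pattern1' foobar 4 foo 1 bar foo"))
```

Splitting the work this way means the number of years is no longer baked into the expression; only max_years needs to change.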

How to create a regex pattern in VBA to extract dates from a string and exclude false matches

I am trying to use Regex to parse a series of strings to extract one or more text dates that may be in multiple formats. The strings will look something like the following:
24 Aug 2016: nno-emvirt010a/b; 16 Aug 2016 nnt-emvirt010a/b nnd-emvirt010a/b COSI-1.6.5
24.16 nno-emvirt010a/b nnt-emvirt010a/b nnd-emvirt010a/b EI.01.02.03\
9/23/16: COSI-1.6.5 Logs updated at /vobs/COTS/1.6.5/files/Status_2016-07-27.log, Status_2016-07-28.log, Status_2016-08-05.log, Status_2016-08-08.log
I am not concerned about validating the individual date fields; just extracting the date string. The part I am unable to figure out is how to not match on number sequences that match the pattern but aren’t dates (‘1.6.5’ in ex. (1) and 01.02.03 in ex. (2)) and dates that are part of a file name (2016-07-27 in ex. (3)). In each of these exception cases in my input data, the initial numbers are preceded by either a period(.), underscore (_) or dash (-), but I cannot determine how to use this to edit the pattern syntax to not match these strings.
The pattern I have that partially works is below. It will only ignore the non date matches if it starts with 1 digit as in example 1.
/[^_\.\(\/]\d{1,4}[/\-\.\s*]([1-9]|0[1-9]|[12][0-9]|3[01]|[a-z]{3})[/\-\.\s*]\d{1,4}/ig
I am not sure whether this works in VBA, so please check. The Regular Expressions Cookbook gives several options: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html
^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
^(?:
# m/d or mm/dd
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
# d/m or dd/mm
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
According to the test strings you've presented, you can use the following regex
(?<=[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
This regex ensures that specific date formats are met and that each date is preceded either by nothing (the beginning of the string) or by a character that is not a letter (a-z, A-Z), a digit (0-9) or a dot (.), and is followed by such a character. The date formats that will be matched are:
24 Aug 2016
24.16
9/23/16
The regex could be further manipulated to ensure numbers are in the proper range according to days/month, etc., however, I don't feel that is really necessary.
Edits
Edit 1
Since VBA doesn't support lookbehinds, you can use the following. The date is in capture group 1.
(?:[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
Edit 2
As per bulbus's comment below
(?:[^\w.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d{2,4})|(?:(?:\d{1,2}\/){2}\d{2,4})|(?:\d{2,4}(?:-\d{2}){2})|\d{2}\.\d{2})
I took the liberty of editing that a bit:
1. Replaced [^a-zA-Z\d.] with [^\w.], which has the added advantage of excluding dates such as the one in _2016-07-28.log.
2. Because of 1, removed the trailing condition (?=[^a-zA-Z\d.]).
3. Forced the year digits from \d+ to \d{2,4}.
Edit 3
Due to added conditions of the regex, I've made the following edits (to improve upon both previous edits). As per the OP:
The edited pattern above works in all but 2 cases:
it does not find dates with the year first (ex. 2016/07/11)
if the date is contained within parenthesis in the string, it returns the left parenthesis as part of the date (ex. match = (8/20/2016)
Can you provide the edit to fix these?
In the below regexes, I've changed years to \d+ in order for it to work on any year greater than or equal to 0.
(?:[^\w.]|^)((?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:\/\d{1,2}){2})|(?:\d+(?:-\d{2}){2})|\d{2}\.\d+)
This regex adds the possibility of dates in the XXXX/XX/XX format, where the year may appear first.
The reason you are getting ( at the start of the match is the nature of the full match: it includes the character consumed by the leading non-capturing group (?:[^\w.]|^). Instead, you need to grab the value of the first capture group and not the whole regex result. See this answer on how to grab submatches from a regex pattern in VBA.
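The difference between the full match and capture group 1 can be seen in a small sketch. It is written in Python rather than VBA purely for illustration (an assumption; in VBA the equivalent value would come from the match's submatches), and the helper name extract_dates is hypothetical:

```python
import re

# Pattern from Edit 3, split across lines for readability only.
DATE = re.compile(
    r"(?:[^\w.]|^)("
    r"(?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)"   # 24 Aug 2016
    r"|(?:(?:\d{1,2}\/){2}\d+)"            # 9/23/16
    r"|(?:\d+(?:\/\d{1,2}){2})"            # 2016/07/11
    r"|(?:\d+(?:-\d{2}){2})"               # 2016-07-11
    r"|\d{2}\.\d+"                         # 24.16
    r")"
)

def extract_dates(text):
    """Return capture group 1 of every match, i.e. the date alone."""
    return [m.group(1) for m in DATE.finditer(text)]

if __name__ == "__main__":
    sample = "(8/20/2016)"
    m = DATE.search(sample)
    print(repr(m.group(0)))  # full match, includes the leading parenthesis
    print(repr(m.group(1)))  # capture group 1, the date only
    print(extract_dates("24 Aug 2016: nno-emvirt010a/b; 16 Aug 2016 COSI-1.6.5"))
```

Reading group 1 drops whatever single character the leading group consumed, so the parenthesis no longer shows up in the result.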
Also, note that any additional date formats you need to catch need to be explicitly set in the regex. Currently, the regex supports the following date formats:
\d{1,2}\s+[A-Z][a-z]{2}\s+\d+
12 Apr 17
12 Apr 2017
(?:\d{1,2}\/){2}\d+
1/4/17
01/04/17
1/4/2017
01/04/2017
\d+(?:\/\d{1,2}){2}
17/04/01
2017/4/1
2017/04/01
17/4/1
\d+(?:-\d{2}){2}
17-04-01
2017-04-01
\d{2}\.\d+ - Although I'm not sure what this date format is even used for and how it could be considered efficient if it's missing month
24.16

Regular expression to match credit card expiration date

I have the following pattern which I'm trying to use to match credit card expiration dates:
(0[1-9]|1[0-2])\/?(([0-9]{4})|[0-9]{2}$)
and I'm testing on the following strings:
02/13
0213
022013
02/2013
02/203
02/2
02/20322
It should only match the first four strings, and the last 3 should not be a match as they are invalid. However the current pattern is also matching the last string. What am I doing wrong?
You're missing the start-of-line anchor ^, and the parentheses are misplaced: the $ sits inside the second alternative, so it only applies to [0-9]{2}.
This should work:
re = /^(0[1-9]|1[0-2])\/?([0-9]{4}|[0-9]{2})$/;
OR using word boundaries:
re = /\b(0[1-9]|1[0-2])\/?([0-9]{4}|[0-9]{2})\b/;
Working Demo: http://regex101.com/r/gN5wH2
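The anchored pattern can be run against the seven test strings from the question with a short sketch. The answer's snippet is JavaScript; Python is used here only as an illustration, and the names ORIGINAL and ANCHORED are hypothetical:

```python
import re

# Pattern from the question: $ applies only to the two-digit alternative.
ORIGINAL = re.compile(r"(0[1-9]|1[0-2])\/?(([0-9]{4})|[0-9]{2}$)")
# Corrected pattern: anchored at both ends.
ANCHORED = re.compile(r"^(0[1-9]|1[0-2])\/?([0-9]{4}|[0-9]{2})$")

SAMPLES = ["02/13", "0213", "022013", "02/2013", "02/203", "02/2", "02/20322"]

def accepted(pattern, samples):
    """Return the samples in which the pattern finds a match."""
    return [s for s in samples if pattern.search(s)]

if __name__ == "__main__":
    print(accepted(ORIGINAL, SAMPLES))
    print(accepted(ANCHORED, SAMPLES))
```

With the original pattern the last sample is accepted because the four-digit alternative matches 2032 and nothing requires the string to end there.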
Since we're talking about a credit card expiration date, once you have validated the input date string using one of the fine regex expressions in the other answers, you'll certainly want to confirm that the date is not in the past.
To do so:
Express your input date string as YYYYMM. For example: 201409
Do the same for the current date. For example: 201312
Then simply compare the date strings lexicographically: For example: 201409 ge 201312.
In Perl, ge is the "greater than or equal to" string comparison operator. Note that, as @Dan Cowell advised, credit cards typically expire on the last day of the expiry month, so it would be inappropriate to use the gt (greater than) operator.
Alternatively, if your language doesn't support comparing strings in this fashion, convert both strings to integers and instead do an arithmetic comparison.
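The comparison steps above can be sketched as follows. The answer refers to Perl; Python is used here as an illustration, the function names are hypothetical, and expanding a two-digit year by prefixing "20" is an assumption that the original does not state:

```python
import re
from datetime import date

EXPIRY = re.compile(r"^(0[1-9]|1[0-2])/?([0-9]{4}|[0-9]{2})$")

def expiry_key(text):
    """Convert MM/YY, MMYY, MM/YYYY or MMYYYY to a YYYYMM string, or None."""
    m = EXPIRY.match(text)
    if m is None:
        return None
    month, year = m.groups()
    if len(year) == 2:
        year = "20" + year  # assumption: two-digit years mean 20YY
    return year + month

def is_unexpired(text, today):
    """True if the card is still valid during the month of `today`."""
    key = expiry_key(text)
    # Equal-length digit strings compare lexicographically like numbers;
    # >= is used because a card is valid through its expiry month.
    return key is not None and key >= today.strftime("%Y%m")

if __name__ == "__main__":
    print(is_unexpired("09/14", date(2013, 12, 1)))
    print(is_unexpired("11/13", date(2013, 12, 1)))
```

Passing today in explicitly, instead of calling date.today() inside the function, keeps the check easy to test.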
Move a right paren:
^(0[1-9]|1[0-2])\/?(([0-9]{4}|[0-9]{2})$)
The end anchor wasn't being applied to the [0-9]{4} option, so more numbers were allowed.

Something concerns me about regular expression

I want to match substrings such as a day of the month, like "21st", "22nd" or "23rd", in a string, so I made a regular expression using this pattern:
((\d{1,2})(st)|(nd)|(rd)|(th)).
I made these groups because I want to do a replacement. But when I match a string like "Monday March 21st 2012", it always matches two substrings: Mo'nd'ay March '21st' 2012.
So I am confused: why does it match "Mo'nd'ay"?
Because you are missing a set of parentheses. Try:
((\d{1,2})((st)|(nd)|(rd)|(th)))
What you had, matched:
(\d{1,2})(st)
OR (nd)
OR (rd)
OR (th)
You don't have the correct parentheses around your |s. You have ((\d{1,2})(st)|(nd)|(rd)|(th)), but you should have: (\d{1,2})(st|nd|rd|th).
You're matching the strings nd, rd, th, or (one or two digits followed by st).
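The effect of the grouping can be reproduced with a short sketch. The question does not name a language, so Python is an assumption, and the replacement shown (dropping the suffix) is only a guess at the kind of replace the asker has in mind:

```python
import re

# Original: alternation splits the pattern into "digits + st" OR nd OR rd OR th.
ORIGINAL = re.compile(r"((\d{1,2})(st)|(nd)|(rd)|(th))")
# Fixed: the alternation is confined to the suffix.
FIXED = re.compile(r"(\d{1,2})(st|nd|rd|th)")

def matches(pattern, text):
    """Return the full text of every match."""
    return [m.group(0) for m in pattern.finditer(text)]

def strip_ordinals(text):
    """Hypothetical replacement: keep the number, drop the suffix."""
    return FIXED.sub(r"\1", text)

if __name__ == "__main__":
    text = "Monday March 21st 2012"
    print(matches(ORIGINAL, text))
    print(matches(FIXED, text))
    print(strip_ordinals(text))
```

With the fixed pattern, group 1 holds the number and group 2 the suffix, which is usually all a replacement needs.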