Modify regex to match dates with ordinals "st", "nd", "rd", "th" - regex

How can the regex below be modified to match dates with ordinals on the day part? This regex matches "Jan 1, 2003 | February 29, 2004 | November 02, 3202" but I need it to match also: "Jan 1st, 2003 | February 29th, 2004 | November 02nd, 3202 | March 3rd, 2010"
^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))))\,\ ((1[6-9]|[2-9]\d)\d{2}))
Thank you.

This will depend on your use case, but in the interest of pragmatism, you might do well to just match anything matching:
(1) any month name or abbreviation;
(2) whitespace;
(3) any one or two digits;
(4) whitespace;
(5) any st,nd,rd,th;
(6) whitespace OR comma + optional whitespace;
(7) any four digits;
I'm not sure what you're matching in, but if I had Jan 35nd,3001, I think I'd rather capture it now and invalidate it later than to just skip over it right at the get-go.
Also, depending on your data set, consider case sensitivity issues and common international English variants, like 1 Jan 2004 or 1st Jan, 2004 or January, 2004 etc.
line breaks added
^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)?|feb(?:ruary)?|ma(?:r(?:ch)?|y)
|a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)
\s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b
Even more pragmatic (and readable), unless you have a very bizarre dataset, is to allow anything after the common prefixes:
(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b
Would this match octagenarianism 99xx, 0000 ? Yes. Is that likely to be an issue? I doubt it.

That regex is doing waaaaay too much. You'd be much better off using your language's equivalent of strptime(). However, the regex below will match ordinals:
^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31(st)?)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))(st|nd|rd|th)?|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(th)?(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))(st|nd|rd|th)?))\,\ ((1[6-9]|[2-9]\d)\d{2}))
Note that it will also match things like "20nd" but the likelihood of encountering that in real data is way too low to bother caring in most cases.

Related

Regex to match some dates matching non-dates

I'm using some Regex to find date strings of the form Jan 12, 2015 or Feb 3, 1999.
The regex I'm using is \w+\s\d{1,2},\s\d{4} and it's working correctly, but the thing is that on the file are also some strings with the form:
Weg 58, 4047 or Strasse 1, 4482 and I also match them.
How can I avoid those non-date matches? My approach is:
The first string (the one of the month, Jan, Feb, etc.) has to have always length 3.
The year has to start with 1 or 2.
The thing is that I dont know how can I add these two options to my regex. Any help please?
You can make the test right here: https://regex101.com/r/bN2pO0/1
Thanks in advance.
Since the months won't change (ie: consistent values between January - Decemeber, we can put the 3 starting characters).
We can then use a OR | operator to select years starting with 1 or 2
/((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2},\s(1|2)\d{3})/ig
https://regex101.com/r/bN2pO0/3
Just as you used \d{1,2} to match a digit 1 or 2 times and \d{4} to match a digit 4 times, you can use \w{3} to match a word character 3 times.
For the year, you can use the pipe "or" operator |.
\w{3}\s\d{1,2},\s(?:1|2)\d{3}
Although, this will also match non-dates of form Abc xy, 1xyz
If you want, you can go with brute force approach or just get rid of regex and use code to capture the dates.
Brute force:
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s[0-2]?[0-9],\s[12]\d{3}

Regex for capturing different date formats

I'm tasked to capture date for itineraries in email message, but the dates given were all in different formats, I guess I need help to find out if there's any way to capture the following formats:
02 APR
APR 02
2 APR
APR 2
2nd APR
APR 2nd
2nd April
April 2nd
APR 12th
April 12th
12th April
April 13-16
13-16 April
APR 13-16
13-16 APR
April 13th-16th
13th-16th April
APR 13th-16th
13th-16th APR
I've tried numerous ways but just could not understand or fathom as I'm a
newbie to regex.
The closest I could get was using this:
(\d*)-(\d*) APR|April \d*\d*
EDIT- Found out that i`ve missed some more formats.
13th - 16th APR
13~16 April
13/16 APR
I`ve tried using the following:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?: * \d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?: . \d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
Could either capture dates with space or without space.
Is there a way to capture all formats, and split the dates with '-', '/','~' and output/write into a single standardize format?
(Group 1 Date)-Month (Group 2 Date)-Month eg: 13-Apr 16-Apr
Appreciate for your kind suggestions and comments.
You need to account for optional values. Here is an enhanced version matching your sample input:
/(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?/i
See the regex demo (note you need to use a case-insensitive modifier to match any variants of April)
Basically, there are 2 alternatives matching April and date ranges:
(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)? - 1+ digits followed with an optional st, nd, rd, th, followed with an optional hyphen, followed with 0+ digits, followed with optional st, etc. followed with 0+ whitespace and then Apr or April (case insensitive due to /i modifier)
| - or
Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)? - the same as above but swapped.
I came up with this Regex:
(?:APR|April)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:APR|April)
See details here: Regex101
Maybe it's overkill, but I came up with this regex that will match with any month:
(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)
Unreadable, check here if you want details: Regex101
Improved version using Wiktor Stribiżew's trick:
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
See details here: Regex101
It matches every monthes, it uses less steps (more efficient)
BUT, you need to make sure you're case insensitive
I came up with this:
(\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?\s*(?:APR|April))|((?:APR|April)\s*\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?)
Live Demo

Regular expression bracket mystery in R

I'm trying to use str_extract to find dates in a text document. However, I've run into a bit of a conundrum. Generally I expect dates to come in one of two forms: 1) June 15th, 1914 2) June 15, 1914. But when I try to build a pattern to catch both of these options, I get an NA result.
For example, if I try to str_extract("No. 1. June 20th, 1914.", "[:alpha:]{3,8} [0-9]{1,2}[[a-z]{2}]?, [0-9]{4}"), I get NA. But if I remove the brackets around [a-z]{2} it works. However, if I remove the brackets, I of course get an NA for the string "No. 1. June 20, 1914.". This does, however, work if I leave the brackets.
I can of course work around this by using a simple if/else if statement, but I'm curious as to why this isn't working, and if there is a better way to handle these combined cases.
If you're trying to extract dates, why not use the lubridate package?
> lubridate::mdy("No. 1. June 20th, 1914.")
[1] "1914-01-20 UTC"
(where mdy is telling lubridate that the date-data appears in month-day-year order).
It's not working because of the following reasons:
Your POSIX character class is not properly wrapped inside a bracketed expression.
You're trying to use a character class as an optional group construct.
Your regular expression fixed would look like:
x <- 'No. 1. June 20th, 1914.'
str_extract(x, '[[:alpha:]]{3,8} [0-9]{1,2}([a-z]{2})?, [0-9]{4}')
## [1] "June 20th, 1914"
You could modify your regular expression:
str_extract(x, '[a-zA-Z]+ \\d{1,2}([a-z]{2})?, \\d{4}')
>str_extract("No. 1. June 20, 1914.", "[[:alpha:]]{3,8} [[:digit:]]{1,2}.+?, [[:digit:]]{4}")
[1] "June 20, 1914"
> str_extract("No. 1. June 20th, 1914.", "[[:alpha:]]{3,8} [[:digit:]]{1,2}.+?, [[:digit:]]{4}")
[1] "June 20th, 1914"
As the . matches any character, the function returns the greatest possible sequence of any characters before ',' and then we use quantifiers + and ? for the condition

Parsing dates from string - regex

I'm terible with regex and I can't seem to wrap my head around this simple task.
I need to parse out the two dates in a string which always has one of two formats:
"Inquiry at your property for December 29, 2013 - January 03, 2014"
OR
"Inquiry at your property for 29 December , 2013 - 03 January, 2014"
the 2 different date formats are throwing me off. Any insights would be appreciated!
/(\d+ \w+, \d+|\w+ \d+, \d+)/ for example. Try it out on Rubular.
For sure, it would pickup more stuff, like 2013 NotReallyAMonth, 12345. But if you don't have things in the input that look like a date, but not actually a date this might work.
You could make the regexp stronger, but applying more restrictions on what is matched:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4})/
In this case the day is always two digits, the year is 4. Months are listed explicitly (you would have to list all of them).
Update: For ranges it would be a different regexp:
/((?:Jan|Dec) \d+ - \d+, \d{4})/
Obviously they can all be combined together:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4}|(?:Jan|Dec) \d+ - \d+, \d{4})/

Regular Expression to match valid dates

I'm trying to write a regular expression that validates a date. The regex needs to match the following
M/D/YYYY
MM/DD/YYYY
Single digit months can start with a leading zero (eg: 03/12/2008)
Single digit days can start with a leading zero (eg: 3/02/2008)
CANNOT include February 30 or February 31 (eg: 2/31/2008)
So far I have
^(([1-9]|1[012])[-/.]([1-9]|[12][0-9]|3[01])[-/.](19|20)\d\d)|((1[012]|0[1-9])(3[01]|2\d|1\d|0[1-9])(19|20)\d\d)|((1[012]|0[1-9])[-/.](3[01]|2\d|1\d|0[1-9])[-/.](19|20)\d\d)$
This matches properly EXCEPT it still includes 2/30/2008 & 2/31/2008.
Does anyone have a better suggestion?
Edit: I found the answer on RegExLib
^((((0[13578])|([13578])|(1[02]))[\/](([1-9])|([0-2][0-9])|(3[01])))|(((0[469])|([469])|(11))[\/](([1-9])|([0-2][0-9])|(30)))|((2|02)[\/](([1-9])|([0-2][0-9]))))[\/]\d{4}$|^\d{4}$
It matches all valid months that follow the MM/DD/YYYY format.
Thanks everyone for the help.
This is not an appropriate use of regular expressions. You'd be better off using
[0-9]{2}/[0-9]{2}/[0-9]{4}
and then checking ranges in a higher-level language.
Here is the Reg ex that matches all valid dates including leap years. Formats accepted mm/dd/yyyy or mm-dd-yyyy or mm.dd.yyyy format
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
courtesy Asiq Ahamed
I landed here because the title of this question is broad and I was looking for a regex that I could use to match on a specific date format (like the OP). But I then discovered, as many of the answers and comments have comprehensively highlighted, there are many pitfalls that make constructing an effective pattern very tricky when extracting dates that are mixed-in with poor quality or non-structured source data.
In my exploration of the issues, I have come up with a system that enables you to build a regular expression by arranging together four simpler sub-expressions that match on the delimiter, and valid ranges for the year, month and day fields in the order you require.
These are :-
Delimeters
[^\w\d\r\n:]
This will match anything that is not a word character, digit character, carriage return, new line or colon. The colon has to be there to prevent matching on times that look like dates (see my test Data)
You can optimise this part of the pattern to speed up matching, but this is a good foundation that detects most valid delimiters.
Note however; It will match a string with mixed delimiters like this 2/12-73 that may not actually be a valid date.
Year Values
(\d{4}|\d{2})
This matches a group of two or 4 digits, in most cases this is acceptable, but if you're dealing with data from the years 0-999 or beyond 9999 you need to decide how to handle that because in most cases a 1, 3 or >4 digit year is garbage.
Month Values
(0?[1-9]|1[0-2])
Matches any number between 1 and 12 with or without a leading zero - note: 0 and 00 is not matched.
Date Values
(0?[1-9]|[12]\d|30|31)
Matches any number between 1 and 31 with or without a leading zero - note: 0 and 00 is not matched.
This expression matches Date, Month, Year formatted dates
(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})
But it will also match some of the Year, Month Date ones. It should also be bookended with the boundary operators to ensure the whole date string is selected and prevent valid sub-dates being extracted from data that is not well-formed i.e. without boundary tags 20/12/194 matches as 20/12/19 and 101/12/1974 matches as 01/12/1974
Compare the results of the next expression to the one above with the test data in the nonsense section (below)
\b(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})\b
There's no validation in this regex so a well-formed but invalid date such as 31/02/2001 would be matched. That is a data quality issue, and as others have said, your regex shouldn't need to validate the data.
Because you (as a developer) can't guarantee the quality of the source data you do need to perform and handle additional validation in your code, if you try to match and validate the data in the RegEx it gets very messy and becomes difficult to support without very concise documentation.
Garbage in, garbage out.
Having said that, if you do have mixed formats where the date values vary, and you have to extract as much as you can; You can combine a couple of expressions together like so;
This (disastrous) expression matches DMY and YMD dates
(\b(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})\b)|(\b(0?[1-9]|1[0-2])[^\w\d\r\n:](0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](\d{4}|\d{2})\b)
BUT you won't be able to tell if dates like 6/9/1973 are the 6th of September or the 9th of June. I'm struggling to think of a scenario where that is not going to cause a problem somewhere down the line, it's bad practice and you shouldn't have to deal with it like that - find the data owner and hit them with the governance hammer.
Finally, if you want to match a YYYYMMDD string with no delimiters you can take some of the uncertainty out and the expression looks like this
\b(\d{4})(0[1-9]|1[0-2])(0[1-9]|[12]\d|30|31)\b
But note again, it will match on well-formed but invalid values like 20010231 (31th Feb!) :)
Test data
In experimenting with the solutions in this thread I ended up with a test data set that includes a variety of valid and non-valid dates and some tricky situations where you may or may not want to match i.e. Times that could match as dates and dates on multiple lines.
I hope this is useful to someone.
Valid Dates in various formats
Day, month, year
2/11/73
02/11/1973
2/1/73
02/01/73
31/1/1973
02/1/1973
31.1.2011
31-1-2001
29/2/1973
29/02/1976
03/06/2010
12/6/90
month, day, year
02/24/1975
06/19/66
03.31.1991
2.29.2003
02-29-55
03-13-55
03-13-1955
12\24\1974
12\30\1974
1\31\1974
03/31/2001
01/21/2001
12/13/2001
Match both DMY and MDY
12/12/1978
6/6/78
06/6/1978
6/06/1978
using whitespace as a delimiter
13 11 2001
11 13 2001
11 13 01
13 11 01
1 1 01
1 1 2001
Year Month Day order
76/02/02
1976/02/29
1976/2/13
76/09/31
YYYYMMDD sortable format
19741213
19750101
Valid dates before Epoch
12/1/10
12/01/660
12/01/00
12/01/0000
Valid date after 2038
01/01/2039
01/01/39
Valid date beyond the year 9999
01/01/10000
Dates with leading or trailing characters
12/31/21/
31/12/1921AD
31/12/1921.10:55
12/10/2016 8:26:00.39
wfuwdf12/11/74iuhwf
fwefew13/11/1974
01/12/1974vdwdfwe
01/01/99werwer
12321301/01/99
Times that look like dates
12:13:56
13:12:01
1:12:01PM
1:12:01 AM
Dates that runs across two lines
1/12/19
74
01/12/19
74/13/1946
31/12/20
08:13
Invalid, corrupted or nonsense dates
0/1/2001
1/0/2001
00/01/2100
01/0/2001
0101/2001
01/131/2001
31/31/2001
101/12/1974
56/56/56
00/00/0000
0/0/1999
12/01/0
12/10/-100
74/2/29
12/32/45
20/12/194
2/12-73
Maintainable Perl 5.10 version
/
(?:
(?<month> (?&mon_29)) [\/] (?<day>(?&day_29))
| (?<month> (?&mon_30)) [\/] (?<day>(?&day_30))
| (?<month> (?&mon_31)) [\/] (?<day>(?&day_31))
)
[\/]
(?<year> [0-9]{4})
(?(DEFINE)
(?<mon_29> 0?2 )
(?<mon_30> 0?[469] | (11) )
(?<mon_31> 0?[13578] | 1[02] )
(?<day_29> 0?[1-9] | [1-2]?[0-9] )
(?<day_30> 0?[1-9] | [1-2]?[0-9] | 30 )
(?<day_31> 0?[1-9] | [1-2]?[0-9] | 3[01] )
)
/x
You can retrieve the elements by name in this version.
say "Month=$+{month} Day=$+{day} Year=$+{year}";
( No attempt has been made to restrict the values for the year. )
To control a date validity under the following format :
YYYY/MM/DD or YYYY-MM-DD
I would recommand you tu use the following regular expression :
(((19|20)([2468][048]|[13579][26]|0[48])|2000)[/-]02[/-]29|((19|20)[0-9]{2}[/-](0[4678]|1[02])[/-](0[1-9]|[12][0-9]|30)|(19|20)[0-9]{2}[/-](0[1359]|11)[/-](0[1-9]|[12][0-9]|3[01])|(19|20)[0-9]{2}[/-]02[/-](0[1-9]|1[0-9]|2[0-8])))
Matches
2016-02-29 | 2012-04-30 | 2019/09/31
Non-Matches
2016-02-30 | 2012-04-31 | 2019/09/35
You can customise it if you wants to allow only '/' or '-' separators.
This RegEx strictly controls the validity of the date and verify 28,30 and 31 days months, even leap years with 29/02 month.
Try it, it works very well and prevent your code from lot of bugs !
FYI : I made a variant for the SQL datetime. You'll find it there (look for my name) : Regular Expression to validate a timestamp
Feedback are welcomed :)
Sounds like you're overextending regex for this purpose. What I would do is use a regex to match a few date formats and then use a separate function to validate the values of the date fields so extracted.
Perl expanded version
Note use of /x modifier.
/^(
(
( # 31 day months
(0[13578])
| ([13578])
| (1[02])
)
[\/]
(
([1-9])
| ([0-2][0-9])
| (3[01])
)
)
| (
( # 30 day months
(0[469])
| ([469])
| (11)
)
[\/]
(
([1-9])
| ([0-2][0-9])
| (30)
)
)
| ( # 29 day month (Feb)
(2|02)
[\/]
(
([1-9])
| ([0-2][0-9])
)
)
)
[\/]
# year
\d{4}$
| ^\d{4}$ # year only
/x
Original
^((((0[13578])|([13578])|(1[02]))[\/](([1-9])|([0-2][0-9])|(3[01])))|(((0[469])|([469])|(11))[\/](([1-9])|([0-2][0-9])|(30)))|((2|02)[\/](([1-9])|([0-2][0-9]))))[\/]\d{4}$|^\d{4}$
if you didn't get those above suggestions working, I use this, as it gets any date I ran this expression through 50 links, and it got all the dates on each page.
^20\d\d-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(0[1-9]|[1-2][0-9]|3[01])$
This regex validates dates between 01-01-2000 and 12-31-2099 with matching separators.
^(0[1-9]|1[012])([- /.])(0[1-9]|[12][0-9]|3[01])\2(19|20)\d\d$
var dtRegex = new RegExp(/[1-9\-]{4}[0-9\-]{2}[0-9\-]{2}/);
if(dtRegex.test(date) == true){
var evalDate = date.split('-');
if(evalDate[0] != '0000' && evalDate[1] != '00' && evalDate[2] != '00'){
return true;
}
}
Regex was not meant to validate number ranges(this number must be from 1 to 5 when the number preceding it happens to be a 2 and the number preceding that happens to be below 6).
Just look for the pattern of placement of numbers in regex. If you need to validate is qualities of a date, put it in a date object js/c#/vb, and interogate the numbers there.
I know this does not answer your question, but why don't you use a date handling routine to check if it's a valid date? Even if you modify the regexp with a negative lookahead assertion like (?!31/0?2) (ie, do not match 31/2 or 31/02) you'll still have the problem of accepting 29 02 on non leap years and about a single separator date format.
The problem is not easy if you want to really validate a date, check this forum thread.
For an example or a better way, in C#, check this link
If you are using another platform/language, let us know
Perl 6 version
rx{
^
$<month> = (\d ** 1..2)
{ $<month> <= 12 or fail }
'/'
$<day> = (\d ** 1..2)
{
given( +$<month> ){
when 1|3|5|7|8|10|12 {
$<day> <= 31 or fail
}
when 4|6|9|11 {
$<day> <= 30 or fail
}
when 2 {
$<day> <= 29 or fail
}
default { fail }
}
}
'/'
$<year> = (\d ** 4)
$
}
After you use this to check the input the values are available in $/ or individually as $<month>, $<day>, $<year>. ( those are just syntax for accessing values in $/ )
No attempt has been made to check the year, or that it doesn't match the 29th of Feburary on non leap years.
If you're going to insist on doing this with a regular expression, I'd recommend something like:
( (0?1|0?3| <...> |10|11|12) / (0?1| <...> |30|31) |
0?2 / (0?1| <...> |28|29) )
/ (19|20)[0-9]{2}
This might make it possible to read and understand.
/(([1-9]{1}|0[1-9]|1[0-2])\/(0[1-9]|[1-9]{1}|[12]\d|3[01])\/[12]\d{3})/
This would validate for following -
Single and 2 digit day with range from 1 to 31. Eg, 1, 01, 11, 31.
Single and 2 digit month with range from 1 to 12. Eg. 1, 01, 12.
4 digit year. Eg. 2021, 1980.
A slightly different approach that may or may not be useful for you.
I'm in php.
The project this relates to will never have a date prior to the 1st of January 2008. So, I take the 'date' inputed and use strtotime(). If the answer is >= 1199167200 then I have a date that is useful to me. If something that doesn't look like a date is entered -1 is returned. If null is entered it does return today's date number so you do need a check for a non-null entry first.
Works for my situation, perhaps yours too?