String Split AND Replace - regex

I am trying to replace a string based on the split portion. This string is a date, where the year should be formatted as a superscript.
Eg. Jan 24, 2014 needs to be split at 2014 then replaced with Jan 24, ^2014^ where 2014 is the superscript.
Example pseudo:
mydate.Split(" ", 2).Replace("^2014^")
But, instead of replacing the new split string, it should be the original (or copy of original). I can't just edit based on index because the formatting may not always be the same, at times the date may be expanded to January 24th, 2014 which would then break the traditional replace by index.

You can try
(?<=[A-Z][a-z]{2} \d{2}, )(\d{4})
Replaced with ^$1^ or ^\1^
Here is an online demo; I also tested it on regexstorm.
If you want to match January 24th, 2014 as well then try
([A-Z][a-z]{2,9} \d{2}[a-z]{0,2}, )(\d{4})
Replaced with $1^$2^
Here is demo
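As a rough illustration of the second pattern, here is a minimal Python sketch. Python is an assumption here (the question only shows .NET-style pseudocode), and the helper name mark_year is hypothetical. Python writes group references as \1 and \2 rather than $1 and $2.

```python
import re

# Pattern copied from the answer above: month name, two-digit day with an
# optional ordinal suffix, a comma and a space, then the four-digit year.
DATE_WITH_YEAR = re.compile(r"([A-Z][a-z]{2,9} \d{2}[a-z]{0,2}, )(\d{4})")

def mark_year(text: str) -> str:
    """Wrap the year of each matched date in carets (hypothetical helper)."""
    return DATE_WITH_YEAR.sub(r"\1^\2^", text)

if __name__ == "__main__":
    print(mark_year("Jan 24, 2014"))
    print(mark_year("January 24th, 2014"))
```

Because the pattern requires \d{2}, a single-digit day such as Jan 4, 2014 is left untouched; \d{1,2} would relax that.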

You can use a combination of lookarounds to achieve your result.
Regex.Replace(input, "(?<=\d{4})|(?=\d{4})", "^")
Explanation:
(?<= # look behind to see if there is:
\d{4} # digits (0-9) (4 times)
) # end of look-behind
| # OR
(?= # look ahead to see if there is:
\d{4} # digits (0-9) (4 times)
) # end of look-ahead
Live Demo
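The same lookaround idea can be sketched in Python (an assumption; the answer uses .NET). The second function is not from the answer: it is an assumed, stricter variant that only touches standalone four-digit numbers, since the lookaround version also inserts carets around longer digit runs.

```python
import re

def caret_lookaround(text: str) -> str:
    # Same idea as the .NET call above: insert "^" at every position that is
    # directly before or directly after a run of four digits.
    return re.sub(r"(?<=\d{4})|(?=\d{4})", "^", text)

def caret_bounded(text: str) -> str:
    # Assumed variant, not from the answer: only standalone four-digit
    # numbers (delimited by word boundaries) are wrapped.
    return re.sub(r"\b(\d{4})\b", r"^\1^", text)

if __name__ == "__main__":
    print(caret_lookaround("Jan 24, 2014"))
    print(caret_bounded("ID 12345, Jan 24, 2014"))
```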

Normalize your date string by assigning it to a Date variable, then do the formatting from there.
Dim dt As Date = "Jan 24, 2014"
Dim s As String = dt.ToShortDateString.Replace("2014", "^2014^")
MsgBox(s)
' or '
s = dt.Month.ToString & "/" & dt.Day.ToString & "/^" & dt.Year.ToString & "^"
MsgBox(s)
IMO, RegEx is write-once code that is difficult to debug and maintain.
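For comparison, the parse-then-format approach might look like this in Python. This is a sketch under assumptions: the function name superscript_year, the two accepted input formats and the fixed output layout are all hypothetical, and month names are parsed with the default (English) locale.

```python
import re
from datetime import datetime

def superscript_year(raw: str) -> str:
    # Drop ordinal suffixes such as "24th" so strptime can parse the day.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", raw)
    for fmt in ("%b %d, %Y", "%B %d, %Y"):  # short and full month names
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        # Re-emit in one normalized layout with the year wrapped in carets.
        return f"{parsed:%b} {parsed.day}, ^{parsed.year}^"
    raise ValueError(f"unrecognised date: {raw!r}")

if __name__ == "__main__":
    print(superscript_year("Jan 24, 2014"))
    print(superscript_year("January 24th, 2014"))
```

Unlike the regex answers, this normalizes the output, so January 24th, 2014 comes back in the short form.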

Related

Complex Regex finding date and time

Is there someone to help me with the following:
I'm trying to find specific date and time strings in a text (to be used within VBA Word).
Currently working with the following RegEx string:
(?:([0-9]{1,2})[ |-])?(?:(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?(?: |-)?(?(3)(?: around | at | ))?(?:([0-9]{1,2}:[0-9]{1,2})?(?: uur| u|u)?)?
Tested output on following text:
date with around time: 26 sep 2016 around 09:00u
date with at time: 1 sep 2016 at 09:00 uur
date and time u: 1 sep 2018 09:00 u
time without date: 08:30 uur
date with time u: 1 sep 2016 at 09:00u
only time: 09:00
only month: jan
month and year: feb 2019
only day: 02
only day with '-': 2-
day and month: 2 jan
month year: jan 2018
date with '-': 2-feb-2018 09:00
other month: 01 sept 2016
full month: 1 september 2018
shortened year: jul '18
Rules:
a date followed by time is valid
a date followed by text 'around' or 'at', followed by time is valid
a date without day number is valid
a date without year is valid
a date, month only is not valid
a day, without month or year not valid
a date may contain dashes '-'
a year may be shortened with ', like jun '18
month name can be short or long
full match includes ' uur' or 'u' (to highlight the text in ms-Word)
submatches text from capture are without prepending or trailing spaces
example at: [https://regex101.com/r/6CFgBP/1/]
Expected output (when using in VBA Word):
A regex Matches collection object in which each Match.SubMatches contains the individual items d, m, y, hh:mm from the capture groups in the regex search string.
So for example 1: the SubMatches (or capture groups) contain the values: '26','sep','2016','09:00'
The RegEx works fine, but some false-positives need to be excluded:
In case there is a day without month/year, should be excluded from Regex (example 9 and 10)
In case there is a month without day, should be excluded (example 7)
(I was trying with some lookaheads and references \1 and ?(1), but was not able to get it running properly...)
Any advice highly appreciated!
As I understood, you require that each date/time part (day, month, year, hour
and minute) must be present.
So you should remove ? after relevant groups (they are not optional).
It is also a good practice to have each group captured as a relevant capturing group.
There is no need to write something like jun(?:i)?. It is enough
(and easier to read) when you write just juni? (the ? refers just
to preceding i).
Another hint: As the regex language contains the \d char class, use just
it instead of [0-9] (the regex is shorter and easier to read).
Optional parts (at / around) should be an optional and non-capturing group.
Anything after the minute part is not needed in the regex.
So I propose a regex like below (for readability, I divided it into rows):
(\d{1,2})[ -](jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?
|juli?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?)
[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})
Details:
(\d{1,2}) - Day.
[ -] - A separator after the day (either a space or a minus).
(jan(?:uari)?|...dec(?:ember)?) - Month.
[ -] - A separator after the month.
(\d{4}) - year.
(?:around |at )? - Actually, 3 variants of a separator between year
and hour (space / around / at), note the space before (...)?.
(\d{1,2}:\d{1,2}) - Hour and minute.
It matches variants 1, 2, 3, 5 and 13.
All remaining fail to contain each required part, so they are not matched.
If you allow e.g. that the hour/minute part is optional, change the respective fragment
into:
( (?:around |at )?(\d{1,2}:\d{1,2}))?
i.e. surround the space/around/at / hour / minute part with ( and )?,
making this part an optional group. Then, variants 14 and 15 will also
be matched.
One more extension: If you also allow the hour/minute part alone,
add |(\d{1,2}:\d{1,2}) to the regex (everything before it is the first variant and
the added part is the second variant, for just hour/minute).
Then, your variants No 4 and 6 will also be matched.
For a working example see https://regex101.com/r/33t1ps/1
Edit
Following your list of rules, I propose the following regex:
(\d{1,2}[ -])? - Day + separator, optional.
(jan(?:uari)?|...|dec(?:ember)?) - Month.
(?:[ -](\d{4}|'\d{2}))? - Separator + year (either 4 or 2 digits with "'").
( (?:around |at )?(\d{1,2}:\d{1,2}))? - Separator + hour/minute -
optional end of variant 1.
|(\d{1,2}:\d{1,2}) - Variant 2 - only hour and minute.
It does not match only your variants No 9 and 10.
For full regex, including also "uur" see https://regex101.com/r/33t1ps/3
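To see how the fragments from the edit fit together, here is a minimal Python sketch (Python is an assumption; the question targets VBA). The grouping is slightly rearranged with non-capturing wrappers so that the capture groups map directly to day, month, year and time; that rearrangement, the omission of the "uur" suffix, and the names MONTH, DATE_TIME and parse are mine, not the answer's.

```python
import re

MONTH = (r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
         r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?")

DATE_TIME = re.compile(
    rf"(?:(\d{{1,2}})[ -])?"                          # optional day + separator
    rf"({MONTH})"                                     # month (required in variant 1)
    rf"(?:[ -](\d{{4}}|'\d{{2}}))?"                   # optional year: 2018 or '18
    rf"(?: (?:around |at )?(\d{{1,2}}:\d{{1,2}}))?"   # optional time
    rf"|(\d{{1,2}}:\d{{1,2}})"                        # variant 2: time only
)

def parse(text):
    """Return the captured parts as a dict, or None when nothing matches."""
    m = DATE_TIME.search(text)
    if m is None:
        return None
    day, month, year, time, time_only = m.groups()
    return {"day": day, "month": month, "year": year, "time": time or time_only}

if __name__ == "__main__":
    for sample in ("26 sep 2016 around 09:00u", "08:30 uur", "jul '18", "02", "2-"):
        print(sample, "->", parse(sample))
```

A bare day such as 02 or 2- yields None, while a lone month name still matches, which is consistent with the remark that only variants 9 and 10 are excluded.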
Finally I found something that helps me using the month properly :-)
\b(?:([1-3]|[0-3]\d)[ |-](?'month'(?:[1-9]|\d[12])|(?:jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?)?(?:(\g'month')[ |-]((?:19|20|\')(?:\d{2})))?\b(?: omstreeks | om | )?(?:(\d{1,2}[:]\d{2}(?: uur|u)?|[0-2]\d{3}(?: uur|u)))?\b
It uses a named constructor/subroutine. Found here:
https://www.regular-expressions.info/subroutine.html

Excluding text at the beginning of a string

I'm new to using RegEx and I'm still stumbling around a bit, so I'm sorry if this is a basic question. I'm trying to extract a string from between two parentheses and I can't seem to figure out how to exclude the first part from my match.
This is my regex pattern:
(.+?)(?= -)
I want to extract a birth date, for example, excluding the "b." and the trailing "-". Here's a sample set:
( b. circa 1883 - d. Mar 03, 1960 )
( b. May 21, 1887 - d. Jan 24, 1979 )
( b. May 28, 1902 Zembin, BELARUS - d. Dec 22, 1998 Florida, USA )
( b. Jan 09, 1886 Philadelphia, Pennsylvania, USA - d. May 17, 1969 New York, New York, USA )
My regex matches ( b. Jan 09, 1886 Philadelphia, Pennsylvania, USA (for example) but also includes "( b. " prefix, which I want to exclude.
The regex also matches the following text, which I would like to exclude as well:
Husband of Sarah Wilder (August 2000
Also, I cannot get the following string to match, presumably because of the dot and space in St. Louis.
( b. Jun 28, 1920 St. Louis, Missouri, USA )
I've been banging my head for several hours and just can't quite get the rest of it. Any help or guidance would be very much appreciated. I've already gotten a lot of help from reading many of the posts here.
Thanks so much!
Assuming that your data always contains a hyphen followed by d., you can try this: (?<=b\. )(.*) - d\.
(?<=b\. ) matches the b. text without it being added to the matching text.
(.*) is a capturing group that contains the match. It captures everything until the terminating - d. is hit. Note that the . characters must be escaped to match correctly as they are regex special characters.
If it always starts with ( b. and ends with - d. <something> ), you can simply do
(?<=^\( b\. ).*(?= - d\..*\))
Which actually means you match any characters (.*), with <start of line>( b. in front of it ((?<=^\( b\. )), and with - d. <something>) behind it ((?= - d\..*\))). https://regex101.com/r/vB2fmP/1
Or, if you don't mind using matching group:
^\( b\. (.*) - d\..*\)$
^ start of line
\( b\. open parenthesis, space, b, dot, space
( ) capture group
.* any char, any occurrence
 - d\..*\) space, hyphen, space, d, dot,
then any char any occurrence,
close parenthesis,
$ end of line
and capture group 1 is the value you need (personally I prefer this one instead).
To prevent capturing the leading ( b. you could prefix your regex with \(\s*b\.\s* which will match the ( and the b. surrounded by zero or more whitespace characters \s*.
Then from that point you would capture your values in a group (.*?) and you could update your positive lookahead (?= (?:\-|\))) to include a whitespace with either a - or a ).
\(\s*b\.\s*(.*?)(?= (?:\-|\)))
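A quick way to check this pattern against the sample lines is a small Python sketch (Python is an assumption, since the question does not name a language; the helper name extract_birth is hypothetical). The birth text is read from capture group 1.

```python
import re

# Pattern copied from the answer above; group 1 holds the birth details.
BIRTH = re.compile(r"\(\s*b\.\s*(.*?)(?= (?:\-|\)))")

def extract_birth(line: str):
    """Return the text after '( b.' up to ' -' or ' )', or None."""
    m = BIRTH.search(line)
    return m.group(1) if m else None

if __name__ == "__main__":
    samples = [
        "( b. circa 1883 - d. Mar 03, 1960 )",
        "( b. Jun 28, 1920 St. Louis, Missouri, USA )",
        "Husband of Sarah Wilder (August 2000",
    ]
    for line in samples:
        print(extract_birth(line))
```

The line without a "b." prefix yields None, and the St. Louis line is handled because the lookahead also accepts a closing parenthesis.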
You can do this by making two passes through the search string. On the first pass you capture all text inside brackets, and on the second you clean up your results by removing the unwanted expressions. You don't say what language you are using, so I will use PHP.
$want = "/\(.+?\)/";
$dontWant = "/(b\.|-)/"; // Escape the dot with a backslash; the hyphen needs no escape here
$desiredResult = array();
$result = preg_match_all($want, $searchText, $matches); // Get all text inside brackets
if (count($matches[0])>0) { // $matches[0] holds all the matches
foreach ($matches[0] as $match) { // Loop through the matches
$desiredResult[] = preg_replace( $dontWant, "", $match); // Remove unwanted text
}
}
You can adjust this to whatever language you are using.

Validate Month Year Format

I need to validate a text box in the format MMM YYYY (e.g. FEB 2014).
I am using the following regular expression string
^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\-\d{4}$
Only issue is that my input is with a 'space' and not '-' i.e. JUN 2012 not JUN-2012
Can someone please amend the above regex to cater for space
Thanks
Try the regex below to match month and year in the MMM YYYY format:
^(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) \d{4}$
DEMO
use \s instead of \- in your regex
like this :
^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s\d{4}$
@Avinash is right: \s matches any whitespace character ([ \r\n\t\f]), so it is better to use a literal space " " instead.
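The difference between \s and a literal space can be checked with a short Python sketch (Python and the helper name is_month_year are assumptions; the question does not name a language). fullmatch is used so that no explicit ^ and $ anchors are needed.

```python
import re

# Literal space between month and year, as recommended above.
MONTH_YEAR = re.compile(
    r"(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) \d{4}"
)

def is_month_year(value: str) -> bool:
    """True only for inputs shaped exactly like 'JUN 2012'."""
    return MONTH_YEAR.fullmatch(value) is not None

if __name__ == "__main__":
    for value in ("JUN 2012", "JUN-2012", "JUN\t2012", "jun 2012"):
        print(repr(value), is_month_year(value))
```

With \s in place of the space, the tab-separated input would also be accepted.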

Regex pattern for HH:MM:SS time string

I want to parse a hh:mm:ss string.
A simple one is ([0-1]?\d|2[0-3]):([0-5]?\d):([0-5]?\d) which expects 2:3:24 or 02:03:24 string.
I want to take it a step further and pass the validation even in cases like
if you enter just 56, it should be pass, as 56 can be considered as 56 secs [SS]
if you enter 2:3 or 02:03 or 02:3 or 2:03 it should pass. 2 minutes and 3 seconds [MM:SS]
If you enter 20:30:12 pass with 20 hrs, 30 minutes and 12 secs [HH:MM:SS]
if you enter 78:12 , do not pass 78 minutes is wrong....
Basically, if one ":" is found, consider number before ":" as MM and number after ":" as SS. If two ":" are found consider as HH:MM:SS
I came up with this pattern.
(^([0-1]?\d|2[0-3]):([0-5]?\d):([0-5]?\d)$)|(^([0-5]?\d):([0-5]?\d)$)|(^[0-5]?\d$)
It seems to be working fine. I wanted to know any other simpler regular expression, that can do the job.
^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)$
Explanation:
^ # Start of string
(?: # Try to match...
(?: # Try to match...
([01]?\d|2[0-3]): # HH:
)? # (optionally).
([0-5]?\d): # MM: (required)
)? # (entire group optional, so either HH:MM:, MM: or nothing)
([0-5]?\d) # SS (required)
$ # End of string
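To show how the three capture groups can be used once the string has matched, here is a minimal Python sketch (Python and the helper name to_seconds are assumptions). Groups that did not take part in the match are treated as zero.

```python
import re

# Pattern copied from the answer above: optional HH:, optional MM:, required SS.
TIME = re.compile(r"^(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)$")

def to_seconds(text: str):
    """Return the total number of seconds, or None if the input is rejected."""
    m = TIME.fullmatch(text)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) if g is not None else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds

if __name__ == "__main__":
    for sample in ("56", "2:03", "20:30:12", "78:12"):
        print(sample, "->", to_seconds(sample))
```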
@Tim Pietzcker covers the OP's requirement for a HH:MM:SS parser where SS was mandatory, i.e.
HH:MM:SS
MM:SS
SS
If you permit me to deviate from the OP's requirement for a bit, and consider a case where HH is mandatory, i.e.
HH
HH:MM
HH:MM:SS
The regex I came up with was:
^([0-1]?\d|2[0-3])(?::([0-5]?\d))?(?::([0-5]?\d))?$
Let's break it down:
([0-1]?\d|2[0-3]) - matches for hours
(?::([0-5]?\d))? - optionally matches for minutes
(?::([0-5]?\d))? - optionally matches for seconds

Regular Expression to match valid dates

I'm trying to write a regular expression that validates a date. The regex needs to match the following
M/D/YYYY
MM/DD/YYYY
Single digit months can start with a leading zero (eg: 03/12/2008)
Single digit days can start with a leading zero (eg: 3/02/2008)
CANNOT include February 30 or February 31 (eg: 2/31/2008)
So far I have
^(([1-9]|1[012])[-/.]([1-9]|[12][0-9]|3[01])[-/.](19|20)\d\d)|((1[012]|0[1-9])(3[01]|2\d|1\d|0[1-9])(19|20)\d\d)|((1[012]|0[1-9])[-/.](3[01]|2\d|1\d|0[1-9])[-/.](19|20)\d\d)$
This matches properly EXCEPT it still includes 2/30/2008 & 2/31/2008.
Does anyone have a better suggestion?
Edit: I found the answer on RegExLib
^((((0[13578])|([13578])|(1[02]))[\/](([1-9])|([0-2][0-9])|(3[01])))|(((0[469])|([469])|(11))[\/](([1-9])|([0-2][0-9])|(30)))|((2|02)[\/](([1-9])|([0-2][0-9]))))[\/]\d{4}$|^\d{4}$
It matches all valid months that follow the MM/DD/YYYY format.
Thanks everyone for the help.
This is not an appropriate use of regular expressions. You'd be better off using
[0-9]{2}/[0-9]{2}/[0-9]{4}
and then checking ranges in a higher-level language.
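A minimal sketch of that split in Python (an assumption; the thread does not fix a language): the regex checks only the shape, and datetime.date does the range checking, including leap years. The pattern uses {1,2} rather than {2} so that the M/D/YYYY inputs from the question are accepted; that tweak and the name is_valid_date are mine.

```python
import re
from datetime import date

# Shape only: one or two digits, slash, one or two digits, slash, four digits.
SHAPE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")

def is_valid_date(text: str) -> bool:
    m = SHAPE.fullmatch(text)
    if m is None:
        return False
    month, day, year = map(int, m.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    for sample in ("03/12/2008", "2/29/2008", "2/29/2007", "2/30/2008"):
        print(sample, is_valid_date(sample))
```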
Here is the regex that matches all valid dates, including leap years. Accepted formats: mm/dd/yyyy, mm-dd-yyyy or mm.dd.yyyy.
^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
courtesy Asiq Ahamed
I landed here because the title of this question is broad and I was looking for a regex that I could use to match on a specific date format (like the OP). But I then discovered, as many of the answers and comments have comprehensively highlighted, there are many pitfalls that make constructing an effective pattern very tricky when extracting dates that are mixed-in with poor quality or non-structured source data.
In my exploration of the issues, I have come up with a system that enables you to build a regular expression by arranging together four simpler sub-expressions that match on the delimiter, and valid ranges for the year, month and day fields in the order you require.
These are :-
Delimiters
[^\w\d\r\n:]
This will match anything that is not a word character, digit character, carriage return, new line or colon. The colon has to be there to prevent matching on times that look like dates (see my test Data)
You can optimise this part of the pattern to speed up matching, but this is a good foundation that detects most valid delimiters.
Note however; It will match a string with mixed delimiters like this 2/12-73 that may not actually be a valid date.
Year Values
(\d{4}|\d{2})
This matches a group of two or 4 digits, in most cases this is acceptable, but if you're dealing with data from the years 0-999 or beyond 9999 you need to decide how to handle that because in most cases a 1, 3 or >4 digit year is garbage.
Month Values
(0?[1-9]|1[0-2])
Matches any number between 1 and 12 with or without a leading zero - note: 0 and 00 is not matched.
Date Values
(0?[1-9]|[12]\d|30|31)
Matches any number between 1 and 31 with or without a leading zero - note: 0 and 00 is not matched.
This expression matches Date, Month, Year formatted dates
(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})
But it will also match some of the Year, Month Date ones. It should also be bookended with the boundary operators to ensure the whole date string is selected and prevent valid sub-dates being extracted from data that is not well-formed i.e. without boundary tags 20/12/194 matches as 20/12/19 and 101/12/1974 matches as 01/12/1974
Compare the results of the next expression to the one above with the test data in the nonsense section (below)
\b(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})\b
There's no validation in this regex so a well-formed but invalid date such as 31/02/2001 would be matched. That is a data quality issue, and as others have said, your regex shouldn't need to validate the data.
Because you (as a developer) can't guarantee the quality of the source data you do need to perform and handle additional validation in your code, if you try to match and validate the data in the RegEx it gets very messy and becomes difficult to support without very concise documentation.
Garbage in, garbage out.
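The way the four sub-expressions slot together, and what the \b boundaries change, can be sketched in Python (an assumption; the answer is language-neutral). The constant and function names are hypothetical; the sub-patterns are copied from above.

```python
import re

DELIM = r"[^\w\d\r\n:]"           # anything but word chars, digits, CR, LF, colon
YEAR = r"(\d{4}|\d{2})"
MONTH = r"(0?[1-9]|1[0-2])"
DAY = r"(0?[1-9]|[12]\d|30|31)"

# Day, month, year order; the pieces can be rearranged for other orders.
DMY_LOOSE = re.compile(DAY + DELIM + MONTH + DELIM + YEAR)
DMY_BOUNDED = re.compile(r"\b" + DAY + DELIM + MONTH + DELIM + YEAR + r"\b")

def first_match(pattern, text):
    """Return the first matched substring, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for sample in ("20/12/194", "101/12/1974", "31/02/2001", "12:13:56", "2/12-73"):
        print(sample, first_match(DMY_LOOSE, sample), first_match(DMY_BOUNDED, sample))
```

The bounded pattern still accepts 31/02/2001 and the mixed-delimiter 2/12-73, which is why the values need a separate validation step in code.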
Having said that, if you do have mixed formats where the date values vary, and you have to extract as much as you can; You can combine a couple of expressions together like so;
This (disastrous) expression matches DMY and MDY dates
(\b(0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](0?[1-9]|1[0-2])[^\w\d\r\n:](\d{4}|\d{2})\b)|(\b(0?[1-9]|1[0-2])[^\w\d\r\n:](0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](\d{4}|\d{2})\b)
BUT you won't be able to tell if dates like 6/9/1973 are the 6th of September or the 9th of June. I'm struggling to think of a scenario where that is not going to cause a problem somewhere down the line, it's bad practice and you shouldn't have to deal with it like that - find the data owner and hit them with the governance hammer.
Finally, if you want to match a YYYYMMDD string with no delimiters you can take some of the uncertainty out and the expression looks like this
\b(\d{4})(0[1-9]|1[0-2])(0[1-9]|[12]\d|30|31)\b
But note again, it will match on well-formed but invalid values like 20010231 (31st Feb!) :)
Test data
In experimenting with the solutions in this thread I ended up with a test data set that includes a variety of valid and non-valid dates and some tricky situations where you may or may not want to match i.e. Times that could match as dates and dates on multiple lines.
I hope this is useful to someone.
Valid Dates in various formats
Day, month, year
2/11/73
02/11/1973
2/1/73
02/01/73
31/1/1973
02/1/1973
31.1.2011
31-1-2001
29/2/1973
29/02/1976
03/06/2010
12/6/90
month, day, year
02/24/1975
06/19/66
03.31.1991
2.29.2003
02-29-55
03-13-55
03-13-1955
12\24\1974
12\30\1974
1\31\1974
03/31/2001
01/21/2001
12/13/2001
Match both DMY and MDY
12/12/1978
6/6/78
06/6/1978
6/06/1978
using whitespace as a delimiter
13 11 2001
11 13 2001
11 13 01
13 11 01
1 1 01
1 1 2001
Year Month Day order
76/02/02
1976/02/29
1976/2/13
76/09/31
YYYYMMDD sortable format
19741213
19750101
Valid dates before Epoch
12/1/10
12/01/660
12/01/00
12/01/0000
Valid date after 2038
01/01/2039
01/01/39
Valid date beyond the year 9999
01/01/10000
Dates with leading or trailing characters
12/31/21/
31/12/1921AD
31/12/1921.10:55
12/10/2016 8:26:00.39
wfuwdf12/11/74iuhwf
fwefew13/11/1974
01/12/1974vdwdfwe
01/01/99werwer
12321301/01/99
Times that look like dates
12:13:56
13:12:01
1:12:01PM
1:12:01 AM
Dates that runs across two lines
1/12/19
74
01/12/19
74/13/1946
31/12/20
08:13
Invalid, corrupted or nonsense dates
0/1/2001
1/0/2001
00/01/2100
01/0/2001
0101/2001
01/131/2001
31/31/2001
101/12/1974
56/56/56
00/00/0000
0/0/1999
12/01/0
12/10/-100
74/2/29
12/32/45
20/12/194
2/12-73
Maintainable Perl 5.10 version
/
(?:
(?<month> (?&mon_29)) [\/] (?<day>(?&day_29))
| (?<month> (?&mon_30)) [\/] (?<day>(?&day_30))
| (?<month> (?&mon_31)) [\/] (?<day>(?&day_31))
)
[\/]
(?<year> [0-9]{4})
(?(DEFINE)
(?<mon_29> 0?2 )
(?<mon_30> 0?[469] | (11) )
(?<mon_31> 0?[13578] | 1[02] )
(?<day_29> 0?[1-9] | [1-2]?[0-9] )
(?<day_30> 0?[1-9] | [1-2]?[0-9] | 30 )
(?<day_31> 0?[1-9] | [1-2]?[0-9] | 3[01] )
)
/x
You can retrieve the elements by name in this version.
say "Month=$+{month} Day=$+{day} Year=$+{year}";
( No attempt has been made to restrict the values for the year. )
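Python's re module has no (?(DEFINE)...) blocks and does not allow duplicate group names, but a similar layout can be approximated by composing sub-patterns in verbose mode. This is only a sketch under those assumptions: the names below are hypothetical, and the day ranges are tightened slightly ([12][0-9] instead of [1-2]?[0-9]) so that a bare 0 is not accepted as a day.

```python
import re

MON_29, DAY_29 = r"0?2", r"0?[1-9]|[12][0-9]"
MON_30, DAY_30 = r"0?[469]|11", r"0?[1-9]|[12][0-9]|30"
MON_31, DAY_31 = r"0?[13578]|1[02]", r"0?[1-9]|[12][0-9]|3[01]"

DATE = re.compile(rf"""
    (?P<month_day>
        (?:{MON_29}) / (?:{DAY_29})
      | (?:{MON_30}) / (?:{DAY_30})
      | (?:{MON_31}) / (?:{DAY_31})
    )
    / (?P<year> [0-9]{{4}} )
""", re.VERBOSE)

def parse_mdy(text: str):
    """Return (month, day, year) as ints, or None if the date is rejected."""
    m = DATE.fullmatch(text)
    if m is None:
        return None
    month, day = m.group("month_day").split("/")
    return int(month), int(day), int(m.group("year"))

if __name__ == "__main__":
    for sample in ("02/29/2008", "2/30/2008", "4/31/2008", "12/31/2008"):
        print(sample, "->", parse_mdy(sample))
```

As in the Perl version, the year is not restricted, so 2/29 is accepted for every year.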
To control a date validity under the following format :
YYYY/MM/DD or YYYY-MM-DD
I would recommend that you use the following regular expression:
(((19|20)([2468][048]|[13579][26]|0[48])|2000)[/-]02[/-]29|((19|20)[0-9]{2}[/-](0[469]|11)[/-](0[1-9]|[12][0-9]|30)|(19|20)[0-9]{2}[/-](0[13578]|1[02])[/-](0[1-9]|[12][0-9]|3[01])|(19|20)[0-9]{2}[/-]02[/-](0[1-9]|1[0-9]|2[0-8])))
Matches
2016-02-29 | 2012-04-30 | 2019/08/31
Non-Matches
2016-02-30 | 2012-04-31 | 2019/09/31 | 2019/09/35
You can customise it if you want to allow only '/' or only '-' as the separator.
This RegEx strictly controls the validity of the date and verifies months with 28, 30 and 31 days, and even leap years with 29/02.
Try it: it works well and saves your code from a lot of bugs!
FYI : I made a variant for the SQL datetime. You'll find it there (look for my name) : Regular Expression to validate a timestamp
Feedback is welcome :)
Sounds like you're overextending regex for this purpose. What I would do is use a regex to match a few date formats and then use a separate function to validate the values of the date fields so extracted.
Perl expanded version
Note use of /x modifier.
/^(
(
( # 31 day months
(0[13578])
| ([13578])
| (1[02])
)
[\/]
(
([1-9])
| ([0-2][0-9])
| (3[01])
)
)
| (
( # 30 day months
(0[469])
| ([469])
| (11)
)
[\/]
(
([1-9])
| ([0-2][0-9])
| (30)
)
)
| ( # 29 day month (Feb)
(2|02)
[\/]
(
([1-9])
| ([0-2][0-9])
)
)
)
[\/]
# year
\d{4}$
| ^\d{4}$ # year only
/x
Original
^((((0[13578])|([13578])|(1[02]))[\/](([1-9])|([0-2][0-9])|(3[01])))|(((0[469])|([469])|(11))[\/](([1-9])|([0-2][0-9])|(30)))|((2|02)[\/](([1-9])|([0-2][0-9]))))[\/]\d{4}$|^\d{4}$
If you didn't get the suggestions above working, I use this. I ran this expression over 50 links, and it got all the dates on each page.
^20\d\d-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(0[1-9]|[1-2][0-9]|3[01])$
This regex validates dates between 01-01-1900 and 12-31-2099 with matching separators.
^(0[1-9]|1[012])([- /.])(0[1-9]|[12][0-9]|3[01])\2(19|20)\d\d$
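The "matching separators" part comes from the \2 backreference, which must repeat whatever the first separator was. A small Python sketch (Python and the helper name same_separator_date are assumptions) makes that visible:

```python
import re

# Pattern copied from the answer above; \2 repeats the first separator.
MDY = re.compile(r"^(0[1-9]|1[012])([- /.])(0[1-9]|[12][0-9]|3[01])\2(19|20)\d\d$")

def same_separator_date(text: str) -> bool:
    return MDY.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ("12-31-2099", "12/31/2099", "12-31/2099", "02-31-2000"):
        print(sample, same_separator_date(sample))
```

Note that 02-31-2000 is accepted: the pattern checks field ranges and separators, not month lengths.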
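function isPlausibleDate(date) {
// Shape check only: YYYY-MM-DD, digits in every field.
var dtRegex = /^\d{4}-\d{2}-\d{2}$/;
if (dtRegex.test(date)) {
var evalDate = date.split('-');
// Reject all-zero fields; real range checks belong in a date object.
if (evalDate[0] != '0000' && evalDate[1] != '00' && evalDate[2] != '00') {
return true;
}
}
return false;
}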
Regex was not meant to validate number ranges (this number must be from 1 to 5 when the number preceding it happens to be a 2 and the number preceding that happens to be below 6).
Just use the regex to look for the pattern of digit placement. If you need to validate the qualities of a date, put it in a date object (JS/C#/VB) and interrogate the numbers there.
I know this does not answer your question, but why don't you use a date handling routine to check if it's a valid date? Even if you modify the regexp with a negative lookahead assertion like (?!31/0?2) (ie, do not match 31/2 or 31/02) you'll still have the problem of accepting 29 02 on non leap years and about a single separator date format.
The problem is not easy if you want to really validate a date, check this forum thread.
For an example or a better way, in C#, check this link
If you are using another platform/language, let us know
Perl 6 version
rx{
^
$<month> = (\d ** 1..2)
{ $<month> <= 12 or fail }
'/'
$<day> = (\d ** 1..2)
{
given( +$<month> ){
when 1|3|5|7|8|10|12 {
$<day> <= 31 or fail
}
when 4|6|9|11 {
$<day> <= 30 or fail
}
when 2 {
$<day> <= 29 or fail
}
default { fail }
}
}
'/'
$<year> = (\d ** 4)
$
}
After you use this to check the input the values are available in $/ or individually as $<month>, $<day>, $<year>. ( those are just syntax for accessing values in $/ )
No attempt has been made to check the year, or to reject the 29th of February in non-leap years.
If you're going to insist on doing this with a regular expression, I'd recommend something like:
( (0?1|0?3| <...> |10|11|12) / (0?1| <...> |30|31) |
0?2 / (0?1| <...> |28|29) )
/ (19|20)[0-9]{2}
This might make it possible to read and understand.
/(([1-9]{1}|0[1-9]|1[0-2])\/(0[1-9]|[1-9]{1}|[12]\d|3[01])\/[12]\d{3})/
This would validate for following -
Single and 2 digit day with range from 1 to 31. Eg, 1, 01, 11, 31.
Single and 2 digit month with range from 1 to 12. Eg. 1, 01, 12.
4 digit year. Eg. 2021, 1980.
A slightly different approach that may or may not be useful for you.
I'm in php.
The project this relates to will never have a date prior to the 1st of January 2008. So, I take the 'date' entered and pass it to strtotime(). If the result is >= 1199167200 then I have a date that is useful to me. If something that doesn't look like a date is entered, -1 is returned (false since PHP 5.1.0). If null is entered it does return today's date number, so you do need a check for a non-null entry first.
Works for my situation, perhaps yours too?