I have been looking at date regular expressions for a while now, but whenever I come up with a regular expression there always seems to be a problem.
The expression is below:
^(0?[1-9]|[12][0-9]|3[01])
Any help on what a date regex looks like for the DDMMM format?
I read the below example from somewhere.
Let's say we want to match a date in mm/dd/yy format, but we want to leave the user the choice of date separators.
The quick solution is \d\d.\d\d.\d\d. It seems fine at first, but the unescaped dot matches any character, so this also matches 02512703.
\d\d[- /.]\d\d[- /.]\d\d is a better solution. This regex allows a dash, space, dot and forward slash as date separators.
This regex is still far from perfect. It matches 99/99/99 as a valid date.
[0-1]\d[- /.][0-3]\d[- /.]\d\d is a step ahead, though it will still match 19/39/99.
A better one still, with the month limited to 01 through 12:
(0[1-9]|1[0-2])[- /.]?(0[1-9]|[12][0-9]|3[01])[- /.]?(18|19|20|21)\d\d$
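To see where each of the first three attempts goes wrong, here is a small Python sketch that runs them against a few awkward inputs. The dictionary keys and the helper name matching_patterns are made up for this illustration; only the patterns themselves come from the text above.

```python
import re

# The first three patterns from the text, in the order they were introduced.
PATTERNS = {
    "dot": r"\d\d.\d\d.\d\d",
    "separators": r"\d\d[- /.]\d\d[- /.]\d\d",
    "ranges": r"[0-1]\d[- /.][0-3]\d[- /.]\d\d",
}


def matching_patterns(text):
    """Return the names of the patterns that match the whole string."""
    return [name for name, pattern in PATTERNS.items()
            if re.fullmatch(pattern, text)]


for sample in ("02512703", "99/99/99", "19/39/99", "12/31/99"):
    print(sample, matching_patterns(sample))
```

Each tightening step rejects one more bad input, but none of the three rejects 19/39/99.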
I came up with this, it's long, but it covers most of the situations, except leap years:
^(((0?[1-9]|1[0-9]|2[0-8])[/.-](0[1-9]|1[0-2]))|((29|30)[/.-](0[13-9]|1[0-2]))|(31[/.-](0[13578]|1[02])))$
Every month has at least 28 days (you'll have to check for leap years with some sort of function inside your code).
(0?[1-9]|1[0-9]|2[0-8]) # days : 01 to 28
[/.-] # separator
(0[1-9]|1[0-2]) # months : 1 to 12
Every month except February has a 29th and a 30th:
(29|30) # days : 29 or 30
[/.-] #separator
(0[13-9]|1[0-2]) # months: 1, and 3 to 12
Only some months have 31 days: (January, March, May, July, August, October and December)
31 # days : 31
[/.-] # separator
(0[13578]|1[02]) # months: 01, 03, 05, 07, 08, 10, 12
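The leap-year check that the regex leaves out could look roughly like this in Python. It is only a sketch: the names DAY_MONTH, FEB_29 and is_valid_day_month are invented here, the year has to be supplied separately because the pattern only covers day and month, and the day range is written as 0?[1-9]|1[0-9]|2[0-8] so that the 19th is accepted.

```python
import calendar
import re

# Day/month pattern following the three cases described above.
DAY_MONTH = re.compile(
    r"^(?:(?:0?[1-9]|1[0-9]|2[0-8])[/.-](?:0[1-9]|1[0-2])"  # 01-28, any month
    r"|(?:29|30)[/.-](?:0[13-9]|1[0-2])"                    # 29-30, not February
    r"|31[/.-](?:0[13578]|1[02]))$"                         # 31, long months only
)
# The one case the pattern above rejects on purpose: 29 February.
FEB_29 = re.compile(r"^29[/.-]02$")


def is_valid_day_month(text, year):
    """Validate a DD/MM string; 29/02 is accepted only in a leap year."""
    if DAY_MONTH.match(text):
        return True
    return bool(FEB_29.match(text)) and calendar.isleap(year)


print(is_valid_day_month("29/02", 2024))
print(is_valid_day_month("29/02", 2023))
```

Keeping the leap-year rule in code means the regex itself stays the same for every year.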
Related
How do I capture the days of months as numbers, excluding any suffixes? For instance, January 11th would be 11, and March 25th would be 25.
You could use the regex string below and then only use the 3rd capturing group.
It accepts 3-letter months (Jan 1st) and full names (January 1st), with a space, hyphen, comma or slash as separator, as in Jan 01, Jan-01, Jan,1st and Jan/31.
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?[ \/,-]{0,2}([0-3]?[0-9])
You would do better to look for native time manipulation if possible.
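As a rough illustration of using only the 3rd capturing group, here is a Python sketch. MONTH_DAY and day_of_month are names invented for this example, and the suffix list includes ober so that October matches.

```python
import re

# Group 1: month abbreviation, group 2: optional rest of the month name,
# group 3: the day number (any ordinal suffix such as "th" is simply not captured).
MONTH_DAY = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?"
    r"[ /,-]{0,2}([0-3]?[0-9])"
)


def day_of_month(text):
    """Return the day as an int, or None when no month/day pair is found."""
    match = MONTH_DAY.search(text)
    return int(match.group(3)) if match else None


for sample in ("January 11th", "March 25th", "Jan/31", "Jan,1st"):
    print(sample, day_of_month(sample))
```

Note that the pattern only checks the shape of the day, so something like "Jan 39" would still be captured as 39.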
Is there someone to help me with the following:
I'm trying to find specific date and time strings in a text (to be used within VBA Word).
Currently working with the following RegEx string:
(?:([0-9]{1,2})[ |-])?(?:(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?(?: |-)?(?(3)(?: around | at | ))?(?:([0-9]{1,2}:[0-9]{1,2})?(?: uur| u|u)?)?
Tested output on following text:
date with around time: 26 sep 2016 around 09:00u
date with at time: 1 sep 2016 at 09:00 uur
date and time u: 1 sep 2018 09:00 u
time without date: 08:30 uur
date with time u: 1 sep 2016 at 09:00u
only time: 09:00
only month: jan
month and year: feb 2019
only day: 02
only day with '-': 2-
day and month: 2 jan
month year: jan 2018
date with '-': 2-feb-2018 09:00
other month: 01 sept 2016
full month: 1 september 2018
shortened year: jul '18
Rules:
a date followed by time is valid
a date followed by text 'around' or 'at', followed by time is valid
a date without day number is valid
a date without year is valid
a month on its own is not valid
a day without a month or year is not valid
a date may contain dashes '-'
a year may be shortened with ', like jun '18
month name can be short or long
full match includes ' uur' or 'u' (to highlight the text in ms-Word)
submatches text from capture are without prepending or trailing spaces
example at: [https://regex101.com/r/6CFgBP/1/]
Expected output (when using in VBA Word):
A regex Matches collection object in which each Match.SubMatches contains the individual items d, m, y, hh:mm from the capture groups in the regex search string.
So for example 1, the SubMatches (capture groups) contain the values '26', 'sep', '2016', '09:00'.
The RegEx works fine, but some false positives need to be excluded:
A day without a month/year should be excluded (examples 9 and 10)
A month without a day should be excluded (example 7)
(I was trying with some lookahead and references \1 and ?(1), but was not able to get it running properly...)
Any advice highly appreciated!
As I understood, you require that each date/time part (day, month, year, hour
and minute) must be present.
So you should remove ? after relevant groups (they are not optional).
It is also a good practice to have each group captured as a relevant capturing group.
There is no need to write something like jun(?:i)?. It is enough
(and easier to read) when you write just juni? (the ? refers just
to preceding i).
Another hint: since the regex language has the \d character class, use it
instead of [0-9] (the regex is shorter and easier to read).
Optional parts (at / around) should be an optional and non-capturing group.
Anything after the minute part is not needed in the regex.
So I propose a regex like below (for readability, I divided it into rows):
(\d{1,2})[ -](jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?
|juli?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?)
[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})
Details:
(\d{1,2}) - Day.
[ -] - A separator after the day (either a space or a minus).
(jan(?:uari)?|...dec(?:ember)?) - Month.
[ -] - A separator after the month.
(\d{4}) - year.
(?:around |at )? - Actually, 3 variants of a separator between year
and hour (space / around / at), note the space before (...)?.
(\d{1,2}:\d{1,2}) - Hour and minute.
It matches variants 1, 2, 3, 5 and 13.
All remaining fail to contain each required part, so they are not matched.
If you allow e.g. that the hour/minute part is optional, change the respective fragment
into:
( (?:around |at )?(\d{1,2}:\d{1,2}))?
i.e. surround the space/around/at / hour / minute part with ( and )?,
making this part an optional group. Then, variants 14 and 15 will also
be matched.
One more extension: If you also allow the hour/minute part alone,
add |(\d{1,2}:\d{1,2}) to the regex (everything before it is the first variant and
the added part is the second variant, for just hour/minute).
Then, your variants No 4 and 6 will also be matched.
For a working example see https://regex101.com/r/33t1ps/1
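For a quick check outside regex101, the first proposed regex (all parts required) can be tried in Python as sketched below. This uses Python's re module rather than the VBA RegExp object, so treat it only as a way to inspect the capture groups; the name DATE_TIME and the sample list are made up for this illustration.

```python
import re

# The regex proposed above, split over several string literals for readability.
DATE_TIME = re.compile(
    r"(\d{1,2})[ -]"
    r"(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
    r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
    r"[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})"
)

samples = [
    "date with around time: 26 sep 2016 around 09:00u",
    "date with '-': 2-feb-2018 09:00",
    "time without date: 08:30 uur",
    "other month: 01 sept 2016",
]

for text in samples:
    match = DATE_TIME.search(text)
    # groups() corresponds to SubMatches: day, month, year, hh:mm
    print(text, "->", match.groups() if match else None)
```

The last two samples print None because they lack one of the required parts.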
Edit
Following your list of rules, I propose the following regex:
(\d{1,2}[ -])? - Day + separator, optional.
(jan(?:uari)?|...|dec(?:ember)?) - Month.
(?:[ -](\d{4}|'\d{2}))? - Separator + year (either 4 or 2 digits with "'").
( (?:around |at )?(\d{1,2}:\d{1,2}))? - Separator + hour/minute -
optional end of variant 1.
|(\d{1,2}:\d{1,2}) - Variant 2 - only hour and minute.
The only variants it does not match are No 9 and 10.
For full regex, including also "uur" see https://regex101.com/r/33t1ps/3
Finally I found something that helps me using the month properly :-)
\b(?:([1-3]|[0-3]\d)[ |-](?'month'(?:[1-9]|\d[12])|(?:jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?)?(?:(\g'month')[ |-]((?:19|20|\')(?:\d{2})))?\b(?: omstreeks | om | )?(?:(\d{1,2}[:]\d{2}(?: uur|u)?|[0-2]\d{3}(?: uur|u)))?\b
It uses a named group that is called again as a subroutine. Found here:
https://www.regular-expressions.info/subroutine.html
In QML2 I didn't find any Calendar control, so I implemented a control that takes a date and time as input. I am using a regular expression for the validation, which matches dates including leap years and other checks.
The main problem is that space/backspace should also be considered valid, for example:
\s\s/\s\s/\s\s \s\s:\s\s:\s\s
Following is the code :
TextField{
id:textEditDate
width:parent.width * 0.50
height:parent.height
text : "01/01/2017 00:00:00"
inputMask: "99/99/9999 99:99:99"
validator: RegExpValidator { regExp: /^(((([0\s][1-9\s]|[1\s][0-9\s]|[2\s][0-8\s])[\/]([0\s][1-9\s]|[1\s][012\s]))|((29|30|31)[\/]([0\s][13578\s]|[1\s][02\s]))|((29|30)[\/]([0\s][4,6,9]|11)))[\/]([19\s[2-9\s][0-9\s])\d\d|(^29[\/]02[\/]([19\s]|[2-9\s][0-9\s])(00|04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96)))\s([0-1\s]?[0-9\s]|2[0-3\s]):([0-5\s][0-9\s]):([0-5\s][0-9\s])$/}
horizontalAlignment: Text.AlignHCenter
inputMethodHints: Qt.ImhDigitsOnly
}
Now, everything works well except for the year: I am not able to match backspace/space for the year, so the user is not able to clear the year.
Can you please suggest how to achieve this? Or is there any other method to do this?
Answer
Brief
So I decided to make a really nice regex that actually works on leap years properly! I then added the rest of the logic you required, and voila, a beauty!
Code
See regex in use here
(?(DEFINE)
(?# Date )
(?# Day ranges )
(?<d_day28>0[1-9]|1\d|2[0-8])
(?<d_day29>0[1-9]|1\d|2\d)
(?<d_day30>0[1-9]|1\d|2\d|30)
(?<d_day31>0[1-9]|1\d|2\d|3[01])
(?# Month specifications )
(?<d_month28>02)
(?<d_month29>02)
(?<d_month30>0[469]|11)
(?<d_month31>0[13578]|1[02])
(?# Year specifications )
(?<d_year>\d+)
(?<d_yearLeap>(?:\d*?(?:(?:(?!00)[02468][048]|[13579][26])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D))
(?# Valid date formats )
(?<d_format>
(?&d_day28)\/(?&d_month28)\/(?&d_year)|
(?&d_day29)\/(?&d_month29)\/(?&d_yearLeap)|
(?&d_day30)\/(?&d_month30)\/(?&d_year)|
(?&d_day31)\/(?&d_month31)\/(?&d_year)
)
(?# Time )
(?# Time properties )
(?<t_period12>(?i)[ap]m|[ap]\.m\.(?-i))
(?# Hours )
(?<t_hours12>0\d|1[01])
(?<t_hours24>[01]\d|2[0-3])
(?# Minutes )
(?<t_minutes>[0-5]\d)
(?# Seconds )
(?<t_seconds>[0-5]\d)
(?# Milliseconds )
(?<t_milliseconds>\d{3})
(?# Valid time formats )
(?<t_format>
(?&t_hours12):(?&t_minutes):(?&t_seconds)(?:\.(?&t_milliseconds))?\ ?(?&t_period12)|
(?&t_hours24):(?&t_minutes):(?&t_seconds)(?:\.(?&t_milliseconds))?
)
(?# Datetime )
(?<dt_format>(?&d_format)\ (?&t_format))
)
\b(?&dt_format)\b
Or in one line...
See regex in use here
\b(?:(?:0[1-9]|1\d|2[0-8])\/(?:02)\/(?:\d+)|(?:0[1-9]|1\d|2\d)\/(?:02)\/(?:(?:\d*?(?:(?:(?!00)[02468][048]|[13579][26])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D))|(?:0[1-9]|1\d|2\d|30)\/(?:0[469]|11)\/(?:\d+)|(?:0[1-9]|1\d|2\d|3[01])\/(?:0[13578]|1[02])\/(?:\d+))\ (?:(?:0\d|1[01]):(?:[0-5]\d):(?:[0-5]\d)(?:\.(?:\d{3}))?\ ?(?:(?i)[ap]m|[ap]\.m\.(?-i))|(?:[01]\d|2[0-3]):(?:[0-5]\d):(?:[0-5]\d)(?:\.(?:\d{3}))?)\b
Explanation
I'll explain the first version as the second version is simply a slimmed down version of it. Note that the regex can easily be changed to accommodate for more formats (only 1 format with slight variations is accepted, but this is a very customizable regex).
d_day28: Match any number from 01 to 28
d_day29: Match any number from 01 to 29
d_day30: Match any number from 01 to 30
d_day31: Match any number from 01 to 31
d_month28: Match months that may only have 28 days (February - thus 02)
d_month29: Match months that may only have 29 days (February - thus 02)
d_month30: Match months that only have 30 days (April, June, September, November - thus 04, 06, 09, 11)
d_month31: Match months that only have 31 days (January, March, May, July, August, October, December - thus 01, 03, 05, 07, 08, 10, 12)
d_year: Match any year (must have at least one digit \d)
d_yearLeap: I'll break this into multiple segments for better clarity
\d*? - Match any number of digits, but as few as possible
Match one of the following
    (?:(?:(?!00)[02468][048]|[13579][26])|(?:(?:[02468][048]|[13579][26])00)) - Match one of the following
        (?:(?!00)[02468][048]|[13579][26]) - Match one of the following
            One of 02468, followed by one of 048, but not 00
            One of 13579, followed by one of 26
        (?:(?:[02468][048]|[13579][26])00) - Match one of the following, followed by 00
            One of 02468, followed by one of 048
            One of 13579, followed by one of 26
    [48]00 - Match 400 or 800
    [48] - Match 4 or 8
(?=\D|\b) - Ensure what follows is either a non-digit character \D or a word boundary \b
d_format: This points to previous groups in order to ensure months are properly formatted and match the days/month and days/year(leap year) requirements so that we can ensure proper date validation
t_period12: This was added in case others needed this for validation purposes
Ensures the period is either am, pm, a.m., p.m. or their respective uppercase versions (including things such as a.M. where multiple cases are used)
t_hours12: Match any hour from 00 to 11
t_hours24: Match any hour from 00 to 23
t_minutes: Match any minutes from 00 to 59
t_seconds: Match any seconds from 00 to 59
t_milliseconds: Match any 3 digits (000 to 999)
t_format: This points to previous groups in order to ensure time is properly formatted. I've added an additional time setting (as well as an addition including milliseconds and time period for others' use)
dt_format: Datetime format to check against (in your case it's date time - separation by a space character)
Following the define block is \b(?&dt_format)\b, which simply matches the dt_format as specified above, ensuring that what precedes and follows it is a word boundary \b (or no character at all)
Leap year
To further understand the leap year section of the regex...
I am assuming the following:
All years are NOT leap years, unless, the following is true
((Year modulo 4 is 0) AND (year modulo 100 is not 0)) OR (year modulo 400 is 0)
Source: leap year calculation
Leap years have always existed (at least since year 1) - since I don't want to start assuming and do even more research.
The regex works by ensuring:
1. All leap years that end in 0, 4, 8 are preceded by a 0, 2, 4, 6, 8 (all of which result in 0 after modulus -> i.e. 24 % 4 = 0)
2. All leap years that end in 2, 6 are preceded by a 1, 3, 5, 7, 9 (all of which result in 0 after modulus -> i.e. 32 % 4 = 0)
3. All leap years that end in 00, for 1. and 2., are negated ((?!00) does this)
4. All leap years that end in 00 are preceded by 1. and 2. (exactly the same since 4 * 100 = 400 - nothing needs to be changed except the last two digits)
5. Add the years 400, 800, 4, 8 since they are not satisfied by any of the above conditions
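If you want to convince yourself that the leap year part does what the list above says, one option is to compare it against a library implementation. The sketch below does that in Python. Python's re module does not support (?(DEFINE)...) or (?&name) calls, so only the year sub-pattern is copied, and the trailing lookahead is replaced by fullmatch(); the names LEAP_YEAR and regex_says_leap are invented for this example.

```python
import calendar
import re

# The d_yearLeap sub-pattern from the first version, without the (?=\D) lookahead
# (fullmatch() already requires the pattern to consume the whole string).
LEAP_YEAR = re.compile(
    r"(?:\d*?(?:(?:(?!00)[02468][048]|[13579][26])"
    r"|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])"
)


def regex_says_leap(year):
    """True when the regex accepts the decimal form of the year."""
    return LEAP_YEAR.fullmatch(str(year)) is not None


# Compare with (year % 4 == 0 and year % 100 != 0) or year % 400 == 0.
mismatches = [y for y in range(1, 10000)
              if regex_says_leap(y) != calendar.isleap(y)]
print(mismatches)
```

An empty list means the regex and the modulo rule agree for every year from 1 to 9999.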
Edits
October 25th, 2017
Thanks to #sln for the input on the leap year's functionality. The regex below performs slightly faster due to changes provided in the comments of this answer by sln (on a separate question). Changed (?:(?!00)[02468][048]|[13579][26]) to (?:0[48]|[13579][26]|[2468][048]) in the leap year section.
See regex in use here
(?(DEFINE)
(?# Date )
(?# Day ranges )
(?<d_day28>0[1-9]|1\d|2[0-8])
(?<d_day29>0[1-9]|1\d|2\d)
(?<d_day30>0[1-9]|1\d|2\d|30)
(?<d_day31>0[1-9]|1\d|2\d|3[01])
(?# Month specifications )
(?<d_month28>02)
(?<d_month29>02)
(?<d_month30>0[469]|11)
(?<d_month31>0[13578]|1[02])
(?# Year specifications )
(?<d_year>\d+)
(?<d_yearLeap>(?:\d*?(?:(?:0[48]|[13579][26]|[2468][048])|(?:(?:[02468][048]|[13579][26])00))|[48]00|[48])(?=\D|\b))
(?# Valid date formats )
(?<d_format>
(?&d_day28)\/(?&d_month28)\/(?&d_year)|
(?&d_day29)\/(?&d_month29)\/(?&d_yearLeap)|
(?&d_day30)\/(?&d_month30)\/(?&d_year)|
(?&d_day31)\/(?&d_month31)\/(?&d_year)
)
(?# Time )
(?# Time properties )
(?<t_period12>(?i)[ap]m|[ap]\.m\.(?-i))
(?# Hours )
(?<t_hours12>0\d|1[01])
(?<t_hours24>[01]\d|2[0-3])
(?# Minutes )
(?<t_minutes>[0-5]\d)
(?# Seconds )
(?<t_seconds>[0-5]\d)
(?# Milliseconds )
(?<t_milliseconds>\d{3})
(?# Valid time formats )
(?<t_format>
(?&t_hours12):(?&t_minutes):(?&t_seconds)(?:\.(?&t_milliseconds))?\ ?(?&t_period12)|
(?&t_hours24):(?&t_minutes):(?&t_seconds)(?:\.(?&t_milliseconds))?
)
(?# Datetime )
(?<dt_format>(?&d_format)\ (?&t_format))
)
\b(?&dt_format)\b
Your sequence to match the year is:
([19\s[2-9\s][0-9\s])\d\d
Which looks malformed, as the brackets do not match.
Also, the presence of the two digits (using \d) means that the expression will not match white space.
RegEx to check for valid dates following ISO 8601, SQL standard.
Has a range from 1000-9999
Checks for Invalid Dates
Checks for valid leap year dates
Format: YYYY-MM-DD HH:MM:SS
^(?:([1-9]\d{3}[\-.](0[13578]|1[02])[\-.](0[1-9]|[12][0-9]|3[01]) ([01]\d|2[0123]):([012345]\d):([012345]\d))|([1-9]\d{3}[\-.](0[469]|11)[\-.](0[1-9]|[12][0-9]|30) ([01]\d|2[0123]):([012345]\d):([012345]\d))|([1-9]\d{3}[\-.](02)[\-.](0[1-9]|1[0-9]|2[0-8]) ([01]\d|2[0123]):([012345]\d):([012345]\d))|(((([1-9]\d)(0[48]|[2468][048]|[13579][26])|(([2468][048]|[13579][26])00)))[\-.](02)[\-.]29 ([01]\d|2[0123]):([012345]\d):([012345]\d)))$
The date can easily be changed to the format DD/MM/YYYY.
Also, replace the "-" separator by "/"
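A regex like this is easy to get subtly wrong, so it may be worth cross-checking it against a date library. Below is a Python sketch that rebuilds an equivalent pattern from the same four alternatives (31-day months, 30-day months, February 01 to 28, February 29 in leap years) with the time part factored out, and compares it with datetime for a few years. All names (TIME, YEAR, LEAP, SEP, ISO_DATETIME, is_real_date) are made up for this illustration.

```python
import re
from datetime import datetime

TIME = r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"
YEAR = r"[1-9]\d{3}"
# Leap years between 1000 and 9999: divisible by 4 but not 100, or by 400.
LEAP = r"(?:[1-9]\d(?:0[48]|[2468][048]|[13579][26])|(?:[2468][048]|[13579][26])00)"
SEP = r"[-.]"

ISO_DATETIME = re.compile(
    rf"(?:{YEAR}{SEP}(?:0[13578]|1[02]){SEP}(?:0[1-9]|[12]\d|3[01])"  # 31-day months
    rf"|{YEAR}{SEP}(?:0[469]|11){SEP}(?:0[1-9]|[12]\d|30)"            # 30-day months
    rf"|{YEAR}{SEP}02{SEP}(?:0[1-9]|1\d|2[0-8])"                      # February 01-28
    rf"|{LEAP}{SEP}02{SEP}29) {TIME}"                                 # 29 February
)


def is_real_date(year, month, day):
    """True when datetime accepts the date."""
    try:
        datetime(year, month, day)
        return True
    except ValueError:
        return False


disagreements = [
    (y, m, d)
    for y in (1900, 2000, 2019, 2020)
    for m in range(1, 13)
    for d in range(1, 32)
    if bool(ISO_DATETIME.fullmatch(f"{y:04d}-{m:02d}-{d:02d} 12:30:45"))
    != is_real_date(y, m, d)
]
print(disagreements)
```

Using fullmatch() here sidesteps the question of how ^ and $ bind around the top-level alternation.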
I have made a regex, which matches all months perfectly. Well, perfectly as far as I can see. It matches 01-end of each month, and I cannot seem to generate a false month, unless I enter something like the 32nd of March, which is an invalid date.
Anyway, what I need to do is check the last yy of the regex. 2902yy should be valid ONLY if yy is divisible by 4, such as 16, 20, 24, etc. Since I am not checking yyyy, I cannot check if the year is 1900 or 2000, which both end in 00. Here you can see my current regex:
(((((0[1-9]|1[0-9]|2[0-9]|30)|31)(0[13789]|(10|12)))|(((0[1-9]|1[0-9]|2[0-9]|30))(0[34569]|11))|(((0[1-9]|1[0-9]|2[0-7])|(28|29))02))(0?[0-9]|[1-9][0-9]){2})
Check out my regex and matches here: http://regexr.com/3buc2
Should not match:
290291 because there is no leap year ending in xx91
Should match:
290292 because there is a leap year in 1992/1892/1792
Get what I mean? How can I possibly do that to my regex? Also, can my regex be optimized? \d instead of [0-9] could be done, but it's slower because it matches numbers in different encodings too, and I only need to match 0-9.
Using regular expressions for this is madness, or at least borderline. But here is a sketch at a solution.
Days 01 through 28 should always be okay.
Day 30 should be okay if the month is not 02.
Day 31 should be okay if the month is not 02, 04, 06, 09, or 11.
Day 29 should be okay if the month is not 02 or the year is a leap year.
Since you only have two digits for the year, we assume you only want to operate in the current century. The leap years are the years which are divisible by 4. (There are some complications, but they do not apply in this century, because 2000 is evenly divisible by 400 as well as by 100.)
So we can enumerate the years which are leap years: 00, 04, 08, 12, 16, 20, ...
If the first digit in the two-digit year is an even number, then the year is a leap year if the second digit is 0, 4, or 8.
If the first digit is odd, the year is a leap year if the second digit is 2 or 6.
(0[1-9]|1[0-9]|2[0-8])(0[1-9]|1[0-2])[0-9][0-9]|
30(0[13-9]|1[0-2])[0-9][0-9]|
31(0[13578]|1[02])[0-9][0-9]|
29((0[13-9]|1[0-2])[0-9][0-9]|02([02468][048]|[13579][26]))
Note that you will need a different regex for the years 1900-1999 because the leap years were different then (in particular, 1900 was not a leap year, because it is not divisible by 400.)
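A sketch like this is easy to test exhaustively, because there are only 100 x 12 x 31 candidate strings. The Python snippet below writes the four rules as one verbose pattern and compares it with datetime for 2000 through 2099. It is an illustration under assumptions: DDMMYY and is_real_date are invented names, day 00 and month 00 are rejected, and the even first digit of a leap year includes 8.

```python
import re
from datetime import datetime

DDMMYY = re.compile(r"""
      (?:0[1-9]|1[0-9]|2[0-8])(?:0[1-9]|1[0-2])[0-9]{2}   # days 01-28, any month
    | 30(?:0[13-9]|1[0-2])[0-9]{2}                        # day 30, not February
    | 31(?:0[13578]|1[02])[0-9]{2}                        # day 31, long months only
    | 29(?:(?:0[13-9]|1[0-2])[0-9]{2}                     # day 29, not February
          |02(?:[02468][048]|[13579][26]))                # 29 February, leap years
""", re.VERBOSE)


def is_real_date(year, month, day):
    """True when datetime accepts the date."""
    try:
        datetime(year, month, day)
        return True
    except ValueError:
        return False


# Compare the regex with the calendar for every DDMMYY string in 2000-2099.
disagreements = [
    f"{d:02d}{m:02d}{yy:02d}"
    for yy in range(100)
    for m in range(1, 13)
    for d in range(1, 32)
    if bool(DDMMYY.fullmatch(f"{d:02d}{m:02d}{yy:02d}"))
    != is_real_date(2000 + yy, m, d)
]
print(disagreements)
```

An empty list means the pattern and the calendar agree for the whole century.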
I'm terrible with regex and I can't seem to wrap my head around this simple task.
I need to parse out the two dates in a string which always has one of two formats:
"Inquiry at your property for December 29, 2013 - January 03, 2014"
OR
"Inquiry at your property for 29 December , 2013 - 03 January, 2014"
The 2 different date formats are throwing me off. Any insights would be appreciated!
/(\d+ \w+, \d+|\w+ \d+, \d+)/ for example. Try it out on Rubular.
For sure, it would pick up more stuff, like 2013 NotReallyAMonth, 12345. But if your input doesn't contain things that look like a date without actually being one, this might work.
You could make the regexp stronger by applying more restrictions on what is matched:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4})/
In this case the day is always two digits, the year is 4. Months are listed explicitly (you would have to list all of them).
Update: For ranges it would be a different regexp:
/((?:Jan|Dec) \d+ - \d+, \d{4})/
Obviously they can all be combined together:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4}|(?:Jan|Dec) \d+ - \d+, \d{4})/
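For comparison, here is roughly how the same idea looks in Python, with both word orders handled by one alternation and the matches turned into date objects. This is only a sketch: MONTHS, DATE_RE and extract_dates are names made up for this example, and the optional space before the comma is there because the second sample string contains "December , 2013".

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_ALT = "|".join(MONTHS)

# Either "29 December, 2013" (day first) or "December 29, 2013" (month first).
DATE_RE = re.compile(
    rf"\b(?:(?P<d1>\d{{1,2}}) (?P<m1>{MONTH_ALT}) ?, (?P<y1>\d{{4}})"
    rf"|(?P<m2>{MONTH_ALT}) (?P<d2>\d{{1,2}}), (?P<y2>\d{{4}}))\b"
)


def extract_dates(text):
    """Return every date found in the text, in order of appearance."""
    dates = []
    for m in DATE_RE.finditer(text):
        day = m.group("d1") or m.group("d2")
        month = m.group("m1") or m.group("m2")
        year = m.group("y1") or m.group("y2")
        dates.append(date(int(year), MONTHS.index(month) + 1, int(day)))
    return dates


print(extract_dates(
    "Inquiry at your property for December 29, 2013 - January 03, 2014"))
print(extract_dates(
    "Inquiry at your property for 29 December , 2013 - 03 January, 2014"))
```

Listing the month names explicitly avoids matching arbitrary words, and building date objects rejects impossible days such as 31 February with a ValueError.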