I'm trying to find dates in a text, in any of the following formats:
04.04.17
4/5/2016
6 December 1900
9 Dec 2014
1st of May 1920
2017
Dec. 21
October 10, 1930
October 10th, 2017
March 10-12 2015
Only years from 1800 to 2017 should match.
This is what I have so far:
(0?[1-9]|[12][0-9]|3[01])?([\/\-\.]|st of\s|nd of\s|rd of\s|th of\s|\s)(Jan.?(uary)?|Feb.?(ruary)?|Mar.?(ch)?|Apr.?(il)?|May|Jun.?(e)?|Jul.?(y)?|Aug.?(ust)?|Sep.?(tember)?|Oct.?(ober)?|Nov.?(ember)?|Dec.?(ember)?|0?[1-9]|1[012])([\/\-\.]|\s)(((18|19)\d{2}|20[01][0-7])|[01][0-7])
The expression above finds formats 1 to 5. If I make the first groups optional with the question mark quantifier, so that it also finds dates like "Dec. 21" and "2017", it no longer works for the other date formats.
Furthermore, formats 1 to 7 are more or less dd/mm/yyyy, whereas formats 8 to 10 are mm/dd/yyyy.
Any advice on how to solve this in a single regex?
Thank you in advance!
Suggestion: instead of a monster regex, which would be nearly impossible to maintain, how about an array of regexes, one for each format you accept? Then loop through the array to see whether the input matches any of them. It would be easier to maintain, and would likely run faster, too.
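A minimal Python sketch of that suggestion, assuming the ten formats listed in the question and four-digit years restricted to 1800-2017. The pattern labels, the `classify` helper, and the exact sub-patterns are illustrative and not taken from the question; two-digit years in the numeric format are accepted as-is.

```python
import re

# Building blocks (illustrative): month names with an optional trailing dot,
# four-digit years limited to 1800-2017, and a day with an optional suffix.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\.?")
YEAR = r"(?:1[89]\d{2}|200\d|201[0-7])"
DAY = r"\d{1,2}(?:st|nd|rd|th)?"

# One small regex per accepted format, tried in order.
DATE_PATTERNS = [
    ("numeric", re.compile(r"\d{1,2}[./-]\d{1,2}[./-](?:" + YEAR + r"|\d{2})")),
    ("day month year", re.compile(DAY + r"(?: of)? " + MONTH + " " + YEAR)),
    ("month day year", re.compile(MONTH + " " + DAY + r"(?:-\d{1,2})?,? " + YEAR)),
    ("month day", re.compile(MONTH + " " + DAY)),
    ("year only", re.compile(YEAR)),
]


def classify(candidate):
    """Return the label of the first pattern matching the whole string, else None."""
    for label, pattern in DATE_PATTERNS:
        if pattern.fullmatch(candidate):
            return label
    return None


for sample in ["04.04.17", "1st of May 1920", "Dec. 21", "March 10-12 2015", "2018"]:
    print(sample, "->", classify(sample))
```

Note that `classify` checks one isolated candidate string. Scanning free text would need `finditer` plus word boundaries around each pattern, which this sketch leaves out.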
Related
Similar to "How to use REGEX patterns to return day of the week if a date is entered?", except that I'm only looking for regex answers, not "any other clever method".
What is the simplest regex to match a particular day of the week, e.g. Thursday, from a date string formatted like 2017-05-04T10:14:07Z?
This is what I have as a starting point.
2017-05-(04|11|18|25)T.*$
Is there a way to achieve a solution without too many pipes and covering all possible years or at least the last decade?
First and foremost, Thursday cannot be extracted from a date string formatted like 2017-05-04T10:14:07Z, as "Thursday" never appears in the string. The best you can capture is the two-digit day number (the "04").
You can get the day number with \d+-\d{1,2}-(\d{1,2})T.*?Z. However, a regex can't verify that the day is correct (for example, for an arbitrary year, can you tell whether Feb 29 is valid without listing every single instance?). So only do this if you accept that the day may not be valid, or if the source will always be right.
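For comparison, here is a hedged Python sketch of the non-regex route. It assumes the timestamp shape from the question, adds capture groups for the year and month to the pattern above (an assumption, since the answer captures only the day), and lets `datetime.date` both validate the date and compute the weekday. The names `STAMP_RE`, `DAY_NAMES`, and `weekday_name` are hypothetical.

```python
import re
from datetime import date

# Same shape as the answer's pattern, with extra groups for year and month.
STAMP_RE = re.compile(r"(\d+)-(\d{1,2})-(\d{1,2})T.*?Z")

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_name(stamp):
    """Return the weekday name for a timestamp, or None if it is not a real date."""
    m = STAMP_RE.fullmatch(stamp)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        return DAY_NAMES[date(year, month, day).weekday()]
    except ValueError:  # e.g. February 30, month 13, year 0
        return None


print(weekday_name("2017-05-04T10:14:07Z"))  # Thursday
print(weekday_name("2017-02-30T10:14:07Z"))  # None
```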
I have a data frame with entries like:
"Wittmann 2014 100 Hills Dry Riesling (Rheinhessen)" and
"Hazlitt 1852 Vineyards 2013 Riesling (Finger Lakes)"
I need to extract the years (the vintage of the wine) from the strings, but only the years from 2012 to 2015.
It would be nice if someone could help me find the right code/regex in R.
Maybe this regex will work for you:
/(\b201[2-5]\b)/g
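To see the pattern at work on the two sample entries, here is a small sketch in Python rather than R (the question asks for R; the pattern itself should carry over to R's regex functions, with the backslashes doubled inside an R string). The names `VINTAGE_RE` and `vintages` are made up for the illustration.

```python
import re

# \b keeps the match from firing inside longer digit runs such as "12013".
VINTAGE_RE = re.compile(r"\b201[2-5]\b")


def vintages(title):
    """Return every year from 2012 to 2015 that appears in the title."""
    return [int(year) for year in VINTAGE_RE.findall(title)]


titles = [
    "Wittmann 2014 100 Hills Dry Riesling (Rheinhessen)",
    "Hazlitt 1852 Vineyards 2013 Riesling (Finger Lakes)",
]
for title in titles:
    print(vintages(title))  # [2014], then [2013]
```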
In the Gregorian calendar, a year that is divisible by 400 is a leap year: 2000 is a leap year, whereas 2100 is not. How would you implement (if year % 400 == 0) using only regular expression constructs? An implementation using an if/else statement would be considered invalid, since that will be dealt with externally. The solution will be used to validate whether February has 28 or 29 days. My problem deals with 4-digit years (from 1000), but any guide to a general solution would also be very helpful.
EDIT: Never mind, I found a guide:
http://regexadvice.com/blogs/mash/archive/2004/04/02/Dealing-with-dates-and-leap-years.aspx
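For reference, one way to express the divisibility test directly, sketched in Python and not taken from the linked guide: a 4-digit year is divisible by 400 exactly when it ends in 00 and its first two digits form a multiple of 4. A two-digit number is a multiple of 4 when its tens digit is even and its units digit is 0, 4, or 8, or when its tens digit is odd and its units digit is 2 or 6. The names `DIV_400_RE` and `LEAP_RE` are illustrative.

```python
import calendar
import re

# First two digits are a multiple of 4, last two digits are 00.
DIV_400_RE = re.compile(r"(?:[02468][048]|[13579][26])00")

# Full Gregorian rule: divisible by 4 but not by 100, or divisible by 400.
LEAP_RE = re.compile(
    r"\d\d(?:0[48]|[2468][048]|[13579][26])"  # ends in a non-zero multiple of 4
    r"|(?:[02468][048]|[13579][26])00"        # century year divisible by 400
)

# Brute-force check of both patterns over every 4-digit year from 1000.
for year in range(1000, 10000):
    text = str(year)
    assert bool(DIV_400_RE.fullmatch(text)) == (year % 400 == 0)
    assert bool(LEAP_RE.fullmatch(text)) == calendar.isleap(year)

print(bool(DIV_400_RE.fullmatch("2000")), bool(DIV_400_RE.fullmatch("2100")))
```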
Try this regex.
It matches only valid dates, and accepts February 29 only in leap years:
((^(10|12|0?[13578])([/])(3[01]|[12][0-9]|0?[1-9])([/])((1[8-9]\d{2})|([2-9]\d{3}))$)|(^(11|0?[469])([/])(30|[12][0-9]|0?[1-9])([/])((1[8-9]\d{2})|([2-9]\d{3}))$)|(^(0?2)([/])(2[0-8]|1[0-9]|0?[1-9])([/])((1[8-9]\d{2})|([2-9]\d{3}))$)|(^(0?2)([/])(29)([/])([2468][048]00)$)|(^(0?2)([/])(29)([/])([3579][26]00)$)|(^(0?2)([/])(29)([/])([1][89][0][48])$)|(^(0?2)([/])(29)([/])([2-9][0-9][0][48])$)|(^(0?2)([/])(29)([/])([1][89][2468][048])$)|(^(0?2)([/])(29)([/])([2-9][0-9][2468][048])$)|(^(0?2)([/])(29)([/])([1][89][13579][26])$)|(^(0?2)([/])(29)([/])([2-9][0-9][13579][26])$))
It will match
mm/dd/yyyy, m/dd/yyyy, mm/d/yyyy and m/d/yyyy
I'm trying to match dates of the form ddmmyyyy, like 04072001.
So far I have this:
^(?:(?:31(?:0?[13578]|1[02]))\1|(?:(?:29|30)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:290?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
which is almost the same as here, but without the delimiters ( (\/|-|\.) ).
You could use something simpler, like this:
^(0[1-9]|[1-2][0-9]|31(?!(?:0[2469]|11))|30(?!02))(0[1-9]|1[0-2])([12]\d{3})$
It captures the day, month, and year, and validates everything except whether Feb. 29 falls in a leap year. (To do that, I'd just do the math on the captured year afterwards rather than trying to write it into the expression.)
Working example: http://regex101.com/r/dH8mG3
Explained:
- Capture the day: 01-29
- OR 31, if not succeeded by 02, 04, 06, 09, or 11
- OR 30, if not succeeded by 02
- Capture the month: 01-12
- Capture the year: 1000-2999 (you could narrow this down by using number ranges like (1[8-9]\d{2}|20\d{2}) == 1800-2099)
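The "do the math afterwards" step might look like the following Python sketch. It reuses the expression above unchanged and relies on `calendar.isleap`, which applies the Gregorian rule to every year; the names `DDMMYYYY_RE` and `parse_ddmmyyyy` are assumptions made for the example.

```python
import calendar
import re

# The expression from the answer: day, month, and year in three groups.
DDMMYYYY_RE = re.compile(
    r"^(0[1-9]|[1-2][0-9]|31(?!(?:0[2469]|11))|30(?!02))(0[1-9]|1[0-2])([12]\d{3})$"
)


def parse_ddmmyyyy(text):
    """Return (day, month, year), or None if the text is not a valid date."""
    m = DDMMYYYY_RE.match(text)
    if m is None:
        return None
    day, month, year = (int(g) for g in m.groups())
    # The regex lets every Feb 29 through, so the leap-year test happens here.
    if month == 2 and day == 29 and not calendar.isleap(year):
        return None
    return day, month, year


print(parse_ddmmyyyy("04072001"))  # (4, 7, 2001)
print(parse_ddmmyyyy("29022001"))  # None, 2001 is not a leap year
print(parse_ddmmyyyy("29022000"))  # (29, 2, 2000)
```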
A regular expression is not the best tool for this job.
If at all possible, just match ^\d{8}$ (or ^\d\d\d\d\d\d\d\d$ if your regexp engine doesn't support the {8} syntax) and then programmatically check that the date is valid.
In a bit more detail:
Match ^(\d\d)(\d\d)(\d\d\d\d)$ (adjust syntax as needed).
Extract the three matching groups and programmatically check that they constitute a valid date.
The latter requires (a) knowing the number of days in each month, and (b) knowing which years are leap years (which depends on which calendar you're using; Gregorian is the obvious choice, but think about years before it was introduced).
The resulting code will be much easier to read and maintain.
(Also, if you have any control over the format, consider using YYYYMMDD rather than DDMMYYYY; it sorts correctly and it's one of the formats specified by the ISO 8601 standard.)
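As a rough illustration of that approach in Python: the regex only checks the shape, and `datetime.date` does the calendar work (it uses the proleptic Gregorian calendar, so the caveat about years before the calendar's introduction still applies). The helper name `to_date` is hypothetical.

```python
import re
from datetime import date

# Shape check only: two digits, two digits, four digits.
SHAPE_RE = re.compile(r"^(\d\d)(\d\d)(\d\d\d\d)$")


def to_date(text):
    """Parse a DDMMYYYY string into a date, or return None if it is invalid."""
    m = SHAPE_RE.fullmatch(text)
    if m is None:
        return None
    day, month, year = (int(g) for g in m.groups())
    try:
        return date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return None


parsed = to_date("04072001")
print(parsed)                      # 2001-07-04
print(parsed.strftime("%Y%m%d"))   # 20010704, the sortable YYYYMMDD form
print(to_date("29021900"))         # None, 1900 was not a leap year
```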
Do not validate dates using only regex. Your language probably has a built-in date type with its own methods, which you can use on the input once you've validated its format with a regex.
Can you translate something for me? My boyfriend is a programmer and has posted a message that I can't understand at all.
^((((31\/(0?[13578]|1[02]))|((29|30)\/(0?[1,3-9]|1[0-2])))\/(1[6-9]|[2-9]\d)?\d{2})|(29\/0?2\/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))|(0?[1-9]|1\d|2[0-8])\/((0?[1-9])|(1[0-2]))\/((1[6-9]|[2-9]\d)?\d{2})) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$
What does it mean? Is it a normal message with words or is it some kind of other code?
It's an expression that attempts to match all valid date/times in d/m/y H:M:S format, with or without leading zeros, and using 2- or 4-digit years, including Feb 29 on leap years. Not sure why he'd be sending you this, unless the context of your conversation makes it relevant somehow.
It'd match:
the 31st day of January, March, May, July, August, October, or December, or the 29th or 30th days of any month but February, in any year from 1600 to 9999;
the 29th day of February in any multiple-of-4-but-not-100 year from 1604 to 9996, or multiple-of-400 years from 1600 to 9600;
or day 1-28 in any month in any year from 1600 to 9999;
plus a time in 24-hour format.
Looks like he didn't account for leap seconds. Bad boy.
EDIT:
Looking over the regex again, it also looks like it won't match 29/2/00 00:00:00. The leap year match for multiple-of-400 years doesn't take 2-digit years into account. It really can't do so in a way that won't break in 80 years or so (or whenever 00 starts to mean 2100 and not 2000), unless he wants to define 00 as meaning 2000 for the expected life of the software and risk a very subtle Y2.1K bug if it lives that long.
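The behaviour described in this answer can be checked mechanically. The Python sketch below copies the expression from the question, split across several string literals purely for readability, and tries a few inputs; the name `BOYFRIEND_RE` and the sample strings are made up for the test.

```python
import re

# The expression from the question, split into its three date alternatives
# and the trailing time part.
BOYFRIEND_RE = re.compile(
    r"^((((31\/(0?[13578]|1[02]))|((29|30)\/(0?[1,3-9]|1[0-2])))\/(1[6-9]|[2-9]\d)?\d{2})"
    r"|(29\/0?2\/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))"
    r"|(0?[1-9]|1\d|2[0-8])\/((0?[1-9])|(1[0-2]))\/((1[6-9]|[2-9]\d)?\d{2}))"
    r" (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$"
)

samples = [
    "4/7/2001 10:14:07",   # ordinary date
    "29/2/2000 00:00:00",  # leap day in a multiple-of-400 year
    "29/2/1900 00:00:00",  # 1900 is not a leap year
    "29/2/04 00:00:00",    # two-digit leap year
    "29/2/00 00:00:00",    # the case from the EDIT
]
for sample in samples:
    print(sample, "->", BOYFRIEND_RE.match(sample) is not None)
```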