Similar to How to use REGEX patterns to return day of the week if a date is entered? except I'm only looking for Regex answers not "any other clever method".
What is the simplest regex to match a particular day of the week, e.g. Thursday, from a date string formatted like 2017-05-04T10:14:07Z?
This is what I have as a starting point.
2017-05-(04|11|18|25)T.*$
Is there a way to achieve a solution without too many pipes and covering all possible years or at least the last decade?
First and foremost, Thursday cannot be extracted from a date string formatted like 2017-05-04T10:14:07Z, because "Thursday" never appears in the string. The best you can capture is the two-digit day number (the "04").
You can get the day number with \d+-\d{1,2}-(\d{1,2})T.*?Z. However, a regex can't verify that the day is correct. (For example, for an arbitrary year, can you tell whether Feb 29 is valid without listing every single instance?) So only do this if you accept that the day may not be valid, or if the source will always be right.
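If a little code around the regex is acceptable, the weekday can be computed from the captured fields instead of being matched. The following is only a minimal sketch in Python; the names ISO_DATE and is_weekday are made up for illustration, and it assumes timestamps always end in Z as in the question.

```python
import re
from datetime import date

# Capture year, month and day from a timestamp such as 2017-05-04T10:14:07Z.
ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})T.*Z$")

def is_weekday(stamp, weekday):
    """Return True if stamp falls on the given weekday (Monday=0 ... Sunday=6)."""
    m = ISO_DATE.match(stamp)
    if not m:
        return False
    year, month, day = map(int, m.groups())
    try:
        # date() rejects impossible dates such as 2017-02-30.
        return date(year, month, day).weekday() == weekday
    except ValueError:
        return False

THURSDAY = 3
print(is_weekday("2017-05-04T10:14:07Z", THURSDAY))  # True
print(is_weekday("2017-05-05T10:14:07Z", THURSDAY))  # False
```

The regex only extracts the fields; the calendar arithmetic is left to the date type, which also takes care of leap years.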
I'm trying to extract regions around keywords from longer passages of text. They should include complete sentences, based on the following conditions:
n=250 characters before / after the keyword should be included if they exist (the keyword can be closer than this to the start / end of the text)
from there it should expand further to include the complete sentence (let's assume here we can define sentence borders with ".?! or :" knowing it's not completely accurate)
I already achieved the expanding to the end of the last sentence, but not to start of the first in the following example, where vitamin is the keyword and the italic is captured by the regex. However, it should capture from "An extra 24 hours..."
Apparently I don't get the corresponding group up front, neither using lazy nor using lookbehind.
((.{0,250}(vitamin)\b.{0,250})(.+?(\.|\!|\?|\:))?)/ig
Well, this year you’re getting an extra day to get ahead on your taxes or (finally) clean out the garage. (Hey, we’re not trying to tell you what do but you might as well be productive.) February 29 is back on the calendar this year because it’s a leap year. Whether you love or loathe the extra winter day, you’re probably wondering why it happens in the first place. An extra 24 hours — or day — is built into the calendar every four years to ensure it aligns with the Earth’s movement around the sun. There’s 365 days in a calendar year, but it actually takes longer for the Earth’s annual journey — about 365.2421 days — around the star that gives us light, life and vitamin D. The difference may seem like no big deal to us, but over time, it adds up. “To ensure consistency with the true astronomical year, it is necessary to periodically add in an extra day to make up the lost time and get the calendar back in sync with the heavens,” according the history.com.
Acknowledgement of the need for a leap year happened around the time of Julius Caesar. In 46 B.C., Caesar enlisted the help of astronomer Sosigenes to update the calendar so that it had 12 months and 365 days, including a leap year every four years.
You can try something like this:
(([.?!:][^.?!:]*.{250}\bvitamin\b.{250})[^.?!:]*[.?!:])
It works by consuming 250 characters of text before and after the keyword "vitamin". From that point it finds the first punctuation point (.?!:) before/after the 250 characters of text.
Here's a sample of it in action.
You can use extra parentheses () to group exactly the output you want. For example, the pattern above includes the ending period of the preceding sentence in the output. So you could use
(([.?!:]([^.?!:]*.{250}\bvitamin\b.{250})[^.?!:]*[.?!:]))
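and use group 3 from the result set, which doesn't have this ending period. (As written, group 3 also stops 250 characters after the keyword, so it leaves out the expansion to the end of the last sentence.)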
I do not see how the specification in the question can be matched by a regex. It boils down to the following logic problem:
to match as many characters as possible but no more than 250 before/after the keyword, .{0,250} needs to be greedy and can neither be lazy .{0,250}? nor possessive .{0,250}+
if this part is greedy, you will miss the occurrences of the keyword that start before the .{0,250} part is matched.
As I understand it, the same logic applies to matching back to the start of the sentence as well.
I played around with the following more or less meaningful regex:
[.?!:]?([^.?!:]*?(.{0,250}\byear\b.{0,250})[^.?!:]*[.?!:]?) misses first 'year'
[.?!:]?([^.?!:]*?(.{0,250}?\byear\b.{0,250})[^.?!:]*[.?!:]?) gets the first 'year' but fails on others.
I suggest you write your own extraction logic in a function, either using regex or not, to achieve the extraction you want.
You could, for example, find the index of the start of the keyword (\bkeyword\b) and of the full stops (\.[^\d]|[.?!:]$), and then use this information to extract the part of the text you want.
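A rough sketch of that index-based approach in Python could look like the following. The function name extract_regions is hypothetical, and sentence borders are simply taken to be the characters . ? ! : as assumed in the question, so abbreviations and decimals such as 365.2421 will cut regions short.

```python
import re

SENTENCE_END = re.compile(r"[.?!:]")

def extract_regions(text, keyword, n=250):
    """Return one region per keyword hit: up to n characters on each side,
    widened to the surrounding sentence borders."""
    regions = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        # Fixed-size window, clamped to the text.
        start = max(0, m.start() - n)
        end = min(len(text), m.end() + n)
        # Widen left: begin right after the last terminator before the window
        # (rfind returns -1 if there is none, which gives start = 0).
        start = max(text.rfind(ch, 0, start) for ch in ".?!:") + 1
        # Widen right: include the next terminator at or after the window end.
        nxt = SENTENCE_END.search(text, end)
        end = nxt.end() if nxt else len(text)
        regions.append(text[start:end].strip())
    return regions

sample = "One. Two has vitamin D in it. Three."
print(extract_regions(sample, "vitamin", n=3))  # ['Two has vitamin D in it.']
```

Because every occurrence is located first and widened afterwards, no occurrence can be swallowed by a greedy .{0,250}, which is the problem described above.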
I am writing a regex to capture various date formats. To keep it short and flexible, I wanted to pack all the possible combinations of months, days and years into separate groups. Let's assume I have two dates like this:
01.01. - 31.12.2013
jan - dec 2013
Now, what I want to achieve is to write a regex that would capture both dates like the ones above. That's easy. But I also want to exclude dates such as this one:
01.01. - 31 dec 2013
In other words, whenever the months are mixed, I don't want those dates. Also, if the first date doesn't have a day, I don't want that day to be captured in the second one either.
I wanted to build a conditional which captures only the second date's appropriate fields, based on what is found in the first one (so, e.g. if the first date has an alpha month, look only for an alpha month in the second one, ignore numeric). My regex looks like this:
(?<firstDay>0[1-9]|[12][0-9]|3[01]|[1-9])[-/\s\.](?<firstMonth>0[1-9]|1[012]|[\p{L}]{3,}|[1-9])\s*[-\s/\.]*\s*(?<secondDay>0[1-9]|[12][0-9]|3[01]|[1-9])[-\s/.]*(?<secondMonth>((?<firstMonth>)(?<=0[1-9]|1[012]|[1-9]))(0[1-9]|1[012]|[1-9])|[\p{L}]{3,})[-\s/\.]*(?<year>(19|20)\d\d|[012][0-9]$)
This is all background. My question is: is it possible to check what a captured group is equal to and build a capturing condition based on that? I found a similar topic on Stack Overflow (unfortunately I can't find it now to reference), but when I implement it, it stops capturing some proper dates (e.g. 01.01. - 31.12.2013). This is that part:
(?<secondMonth>((?<firstMonth>)(?<=0[1-9]|1[012]|[1-9]))(0[1-9]|1[012]|[1-9])|[\p{L}]{3,})
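For comparison, here is how a conditional on an earlier group can look in an engine that supports the (?(name)yes|no) construct. This is only a sketch in Python's re module; the engine used in the question is not stated, so the syntax may need adapting. The pattern is deliberately reduced to the two example formats, the group names d1, m1, a1, d2, m2 and a2 are invented for the illustration, and [^\W\d_] stands in for \p{L}, which Python's re does not support.

```python
import re

RANGE = re.compile(
    # First date: either "DD.MM." or an alphabetic month.
    r"^(?:(?P<d1>\d{1,2})\.(?P<m1>\d{1,2})\.|(?P<a1>[^\W\d_]{3,}))"
    r"\s*-\s*"
    # Second date: if the first month was alphabetic (group a1 took part in
    # the match), require an alphabetic month; otherwise require "DD.MM.".
    r"(?(a1)(?P<a2>[^\W\d_]{3,})\s+|(?P<d2>\d{1,2})\.(?P<m2>\d{1,2})\.)"
    r"(?P<year>(?:19|20)\d\d)$"
)

for text in ("01.01. - 31.12.2013", "jan - dec 2013", "01.01. - 31 dec 2013"):
    m = RANGE.match(text)
    print(text, "->", m.groupdict() if m else None)
```

The condition tests whether the group participated in the match, not what text it captured, so the numeric and alphabetic months are kept in separate groups to make that test meaningful.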
I'm trying to match dates of type ddmmyyyy, like 04072001.
So far I have this:
^(?:(?:31(?:0?[13578]|1[02]))\1|(?:(?:29|30)(?:0?[1,3-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:290?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
which is almost the same as here but without the delimiters( (\/|-|\.) )
You could use something more simple like this:
^(0[1-9]|[1-2][0-9]|31(?!(?:0[2469]|11))|30(?!02))(0[1-9]|1[0-2])([12]\d{3})$
It captures the day, month, and year, and validates everything except whether a Feb. 29 actually falls in a leap year. (To do that, I'd just perform the math on the captured year afterwards rather than trying to write it into the expression.)
Working example: http://regex101.com/r/dH8mG3
Explained:
- Capture the day: 01-29
- OR 31, if not succeeded by 02, 04, 06, 09, or 11
- OR 30, if not succeeded by 02
- Capture the month: 01-12
- Capture the year: 1000-2999 (you could narrow this down by using number ranges like (1[8-9]\d{2}|20\d{2}) == 1800-2099)
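The leap-year math on the captured groups might be sketched like this in Python. The function name is_valid_ddmmyyyy is hypothetical, and calendar.isleap applies the Gregorian rule to every year, which is an assumption for the early part of the 1000-2999 range.

```python
import calendar
import re

# The expression from above, unchanged: day, month, year as groups 1-3.
DDMMYYYY = re.compile(
    r"^(0[1-9]|[1-2][0-9]|31(?!(?:0[2469]|11))|30(?!02))(0[1-9]|1[0-2])([12]\d{3})$"
)

def is_valid_ddmmyyyy(s):
    """Regex check first, then the one case the regex leaves open: Feb. 29."""
    m = DDMMYYYY.fullmatch(s)
    if not m:
        return False
    day, month, year = map(int, m.groups())
    if month == 2 and day == 29:
        return calendar.isleap(year)
    return True

print(is_valid_ddmmyyyy("04072001"))  # True
print(is_valid_ddmmyyyy("31042001"))  # False: April has 30 days
print(is_valid_ddmmyyyy("29022001"))  # False: 2001 is not a leap year
print(is_valid_ddmmyyyy("29022000"))  # True
```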
A regular expression is not the best tool for this job.
If at all possible, just match ^\d{8}$ (or ^\d\d\d\d\d\d\d\d$ if your regexp engine doesn't support the {8} syntax) and then programmatically check that the date is valid.
In a bit more detail:
Match ^(\d\d)(\d\d)(\d\d\d\d)$ (adjust syntax as needed).
Extract the three matching groups and programmatically check that they constitute a valid date.
The latter requires (a) knowing the number of days in each month, and (b) knowing which years are leap years (which depends on which calendar you're using; Gregorian is the obvious choice, but think about years before it was introduced).
The resulting code will be much easier to read and maintain.
(Also, if you have any control over the format, consider using YYYYMMDD rather than DDMMYYYY; it sorts correctly and it's one of the formats specified by the ISO 8601 standard.)
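As an illustration of the two steps, here is a minimal Python sketch. The function name parse_ddmmyyyy is made up, and the check relies on datetime.date, which uses the proleptic Gregorian calendar for all years.

```python
import re
from datetime import date

def parse_ddmmyyyy(s):
    """Return a date for a valid DDMMYYYY string, or None otherwise."""
    # Step 1: the regex only checks the shape, eight digits in three groups.
    m = re.fullmatch(r"(\d\d)(\d\d)(\d\d\d\d)", s)
    if not m:
        return None
    day, month, year = map(int, m.groups())
    # Step 2: let the date type decide whether the combination exists.
    try:
        return date(year, month, day)
    except ValueError:
        return None

print(parse_ddmmyyyy("04072001"))  # 2001-07-04
print(parse_ddmmyyyy("29022001"))  # None
```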
Do not validate dates using only regex. Your language probably has a built-in date object with its own methods, which you can use on input whose format you've already validated with a regex.
I need serious help building two Regex statements for a project. The software we're using ONLY accepts Regex for validation.
I need one that fires for any date <4/1/2009
and a second that fires for any date <10/1/2009
My co-worker gave me the following code to check for <=10/01/2010, but it checks leap years and all that stuff. I need something a little more streamlined than this in the MM/DD/YYYY format. Thanks in advance!
^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:201[1-9])$|^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)))(\/|-|\.)(?:201[1-9])$|^(?:(?:(?:11)(\/|-|\.))(?:0?[1-9]|1\d|2[0-9]|30)(\/|-|\.))(2010)$|^(?:(?:(?:10|12)(\/|-|\.))(?:0?[1-9]|1\d|2[0-9]|30|31)(\/|-|\.))(2010)$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)))(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2011)$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:2[0-9][1-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$
^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[13]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$
will match any date between 1/1/1900 and 3/31/2009, ignoring leap years but otherwise matching only valid dates;
^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[469]/(?:30|[12][0-9]|0?[1-9])|0?[13578]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$
does the same for 1/1/1900-9/30/2009.
EDIT: It looks like "firing" means "not matching" in your question. So
^(?:(?:(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:[3-9][0-9]{2}|2[1-9][0-9]|20[1-9])[0-9])$
will match any date from 4/1/2009 onwards, and
^(?:(?:11/(?:30|[12][0-9]|0?[1-9])|1[02]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:[3-9][0-9]{2}|2[1-9][0-9]|20[1-9])[0-9])$
will match any date from 10/1/2009 onwards.
All regexes created using RegexMagic.
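Patterns of this size are hard to review by eye, so it can help to check them against generated dates before pasting them into the validation software. Below is a small sketch of such a check in Python, run against the first pattern (dates before 4/1/2009). The names BEFORE_APRIL_2009 and mismatches are hypothetical. The harness only generates real calendar dates, so it does not exercise inputs like 2/29/2009, which the pattern accepts because it ignores leap years.

```python
import re
from datetime import date, timedelta

# First pattern from above, split across lines only for readability.
BEFORE_APRIL_2009 = re.compile(
    r"^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[13]/(?:3[01]|[12][0-9]|0?[1-9]))/2009"
    r"|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])"
    r"|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$"
)

def mismatches(pattern, cutoff, first, last):
    """List the date strings in [first, last] where 'pattern matches'
    disagrees with 'date is before cutoff'."""
    bad = []
    d = first
    while d <= last:
        # Try both spellings: without and with leading zeros.
        for text in (f"{d.month}/{d.day}/{d.year}",
                     f"{d.month:02d}/{d.day:02d}/{d.year}"):
            if bool(pattern.match(text)) != (d < cutoff):
                bad.append(text)
        d += timedelta(days=1)
    return bad

# An empty list means the pattern agrees with the cutoff on every date tried.
print(mismatches(BEFORE_APRIL_2009, date(2009, 4, 1),
                 date(1900, 1, 1), date(2010, 12, 31)))
```

The same function can be pointed at the other patterns by swapping in the pattern and, for the "from ... onwards" variants, inverting the comparison.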
I have this regex (\d{4})-(\d{2})-(\d{2}) to detect a valid date, however, it is not perfect as some of the incoming data are 2009-24-09 (YYYY-DD-MM) and some are 2009-09-24 (YYYY-MM-DD).
Is it possible to have a one-line regex to detect whether the second & third portion is greater than 12 to better validate the date?
If you don't know the format, you will get ambiguous results.
Take 2010-01-04: is that January 4th or April 1st?
You can't validate that with a regex.
As Albert said, try to parse the date, and make sure users know which format to use. You might try to separate the month and year portions into different fields or comboboxes.
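To make the ambiguity concrete, here is a small Python sketch that tries both readings of a string and returns every one that is a real date. The function name interpretations is hypothetical. One result means the string is unambiguous, two means it cannot be decided from the string alone.

```python
import re
from datetime import date

def interpretations(s):
    """Return the valid dates s could denote, read as YYYY-MM-DD and YYYY-DD-MM."""
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", s)
    if not m:
        return []
    year, a, b = map(int, m.groups())
    found = []
    for month, day in ((a, b), (b, a)):
        try:
            d = date(year, month, day)
        except ValueError:
            continue  # this reading is not a real date
        if d not in found:  # 2010-05-05 reads the same both ways
            found.append(d)
    return found

print(interpretations("2009-24-09"))  # [datetime.date(2009, 9, 24)]
print(interpretations("2010-01-04"))  # [datetime.date(2010, 1, 4), datetime.date(2010, 4, 1)]
```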
Regexes are not really good at date validation. In my opinion it is better to try to parse the date, and you could keep the regex as a sanity check before parsing it.
But if you still need it, you can fix the month section using the following regex: (\d{4})-(\d{2})-((1[012])|(0\d)|\d). It goes downhill after that, though, since you need to check for the correct number of days in each month and for leap years.
(\d{4})-(?:((0[1-9]|1[0-2])-(\d{2}))|((\d{2})-(0[1-9]|1[0-2])))
YYYY-(?:(MM-DD)|(DD-MM))
To validate YYYY-MM-DD or YYYY-DD-MM:
// day part is 01-31 (3[01]); month part is 01-12
$ptn = '/(\d{4})-(?:(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[01])|(0[1-9]|' .
'[1-2][0-9]|3[01])-(0[1-9]|1[0-2]))/';
echo preg_match_all($ptn, '2009-24-09 2009-09-24 dd', $m); // returns 2
Even so, the date could still be invalid, e.g. 2010-02-29. To deal with that there's checkdate(), which takes month, day, year:
checkdate(2, 29, 2010); // returns false