Regex for capturing different date formats - regex

I've been tasked with capturing dates for itineraries in email messages, but the dates come in many different formats. Is there a way to capture all of the following?
02 APR
APR 02
2 APR
APR 2
2nd APR
APR 2nd
2nd April
April 2nd
APR 12th
April 12th
12th April
April 13-16
13-16 April
APR 13-16
13-16 APR
April 13th-16th
13th-16th April
APR 13th-16th
13th-16th APR
I've tried numerous approaches but couldn't get any of them to work, as I'm new to regex.
The closest I could get was using this:
(\d*)-(\d*) APR|April \d*\d*
EDIT: I found that I had missed some more formats:
13th - 16th APR
13~16 April
13/16 APR
I've tried using the following:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?: * \d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?: . \d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
It captures ranges either with spaces around the separator or without them, but not both.
Is there a way to capture all of the formats, split date ranges on '-', '/' or '~', and write the result in a single standardized format?
(Group 1 date)-Month (Group 2 date)-Month, e.g. 13-Apr 16-Apr
I'd appreciate any suggestions and comments.

You need to account for optional values. Here is an enhanced version matching your sample input:
/(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?/i
See the regex demo (note you need to use a case-insensitive modifier to match any variants of April)
Basically, there are 2 alternatives matching April and date ranges:
(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)? - 1+ digits followed by an optional st, nd, rd or th, then an optional hyphen, then 0+ digits, again followed by an optional suffix, then 0+ whitespace characters and Apr or April (case-insensitive due to the /i modifier)
| - or
Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)? - the same as above but swapped.
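For the follow-up about splitting on '-', '/' or '~' and writing everything out as "13-Apr 16-Apr", here is a minimal Python sketch rather than a definitive implementation. It follows the same two-alternative idea, but it assumes the separator may have spaces around it and, like the pattern above, handles only April. The names DAY, RANGE, MONTH, PATTERN and normalize are made up for this example.

```python
import re

# Day with an optional ordinal suffix; only the digits are captured.
DAY = r"(\d{1,2})(?:st|nd|rd|th)?"
# Optional second day, separated by '-', '/' or '~' with optional spaces.
RANGE = DAY + r"(?:\s*[-/~]\s*" + DAY + r")?"
# Only April here (assumption); extend the alternation for other months.
MONTH = r"(Apr)(?:il)?"

# Alternative 1: day(s) then month -> groups 1, 2, 3
# Alternative 2: month then day(s) -> groups 4, 5, 6
PATTERN = re.compile(
    rf"\b{RANGE}\s*{MONTH}\b|\b{MONTH}\s*{RANGE}\b",
    re.IGNORECASE,
)


def normalize(text):
    """Return strings like '13-Apr 16-Apr' for every date found in text."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group(3):  # day(s) came before the month
            first, second, month = m.group(1), m.group(2), m.group(3)
        else:           # month came before the day(s)
            month, first, second = m.group(4), m.group(5), m.group(6)
        month = month.capitalize()
        days = [first] + ([second] if second else [])
        # int() drops a leading zero, so "02" becomes "2".
        results.append(" ".join(f"{int(d)}-{month}" for d in days))
    return results


for sample in ["02 APR", "April 2nd", "13th - 16th APR", "13~16 April", "13/16 APR"]:
    print(sample, "->", normalize(sample))
```

Capturing only the digits and leaving the suffix outside the group means the output step never has to strip "st", "nd", "rd" or "th" again.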

I came up with this Regex:
(?:APR|April)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:APR|April)
See details here: Regex101
Maybe it's overkill, but I came up with this regex that will match with any month:
(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)
Unreadable, check here if you want details: Regex101
Improved version using Wiktor Stribiżew's trick:
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
See details here: Regex101
It matches every month and takes fewer steps (so it is more efficient),
BUT you need to make sure the match is case-insensitive.
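If the long alternation is hard to maintain by hand, it can be generated. The following Python sketch builds the "Jan(?:uary)?" style alternatives from a list of month names and compiles the result with re.IGNORECASE. MONTHS, month_alternative, ANY_MONTH and find_dates are illustrative names, and only hyphenated ranges are handled, as in the regex above.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def month_alternative(name):
    """Turn 'January' into 'Jan(?:uary)?'; 'May' stays 'May'."""
    head, tail = name[:3], name[3:]
    return f"{head}(?:{tail})?" if tail else head


MONTH = "(?:" + "|".join(month_alternative(m) for m in MONTHS) + ")"
# One day, optionally followed by '-' and a second day, each with an optional suffix.
DAYS = r"\d+(?:st|[nr]d|th)?(?:-\d+(?:st|[nr]d|th)?)?"

# Month first, or day(s) first; case-insensitive so APR, Apr and apr all match.
ANY_MONTH = re.compile(rf"{MONTH} *{DAYS}|{DAYS} *{MONTH}", re.IGNORECASE)


def find_dates(text):
    """Return every matched date string in text, in order."""
    return [m.group(0) for m in ANY_MONTH.finditer(text)]


print(find_dates("Depart 2nd April, return MAY 13th-16th, dinner on 3 jun"))
```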

I came up with this:
(\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?\s*(?:APR|April))|((?:APR|April)\s*\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?)
Live Demo

Related

How do I capture the Months with proper Regex?

How do I capture the days of months as numbers, excluding any suffixes. For instance - January 11th would be 11, and March 25th would be 25.
You could use the regex below and read only the 3rd capturing group.
It accepts three-letter month abbreviations (Jan 1st) and full names (January 1st), with a space, hyphen, comma or slash as the separator, as in Jan 01, Jan-01, Jan,1st and Jan/31.
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?[ \/,-]{0,2}([0-3]?[0-9])
You would do better to look for native time manipulation if possible.
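As a rough illustration of both routes in Python: the first function reads the 3rd capturing group, and the second strips the ordinal suffix and hands the rest to datetime.strptime. The suffix list used here also contains "ober" so that October is recognised. MONTH_DAY, day_of_month and parse_month_day are hypothetical names, the default year of 2000 is an arbitrary assumption (the input carries no year), and %B relies on an English locale.

```python
import re
from datetime import datetime

MONTH_DAY = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?"
    r"[ /,-]{0,2}([0-3]?[0-9])"
)


def day_of_month(text):
    """Regex route: return the day as an int, or None if nothing matches."""
    m = MONTH_DAY.search(text)
    return int(m.group(3)) if m else None


def parse_month_day(text, year=2000):
    """Native route: drop the ordinal suffix, then let strptime validate."""
    cleaned = re.sub(r"(?<=\d)(?:st|nd|rd|th)\b", "", text)
    return datetime.strptime(f"{cleaned} {year}", "%B %d %Y").date()


print(day_of_month("January 11th"))
print(day_of_month("March 25th"))
print(parse_month_day("March 25th").day)
```

The strptime route also rejects impossible dates such as "February 31st", which the regex alone would accept.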

Regex for date format dd Mmm yyyy from email header

I have the following regex that I have been working on:
^(\d\d)\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d{4})?$
I am trying to grab the date from an email header that is formatted like so:
"Mon, 18 Nov 2019 09:19:17 -0700 (MST)"
and I want the result to be:
18 Nov 2019
It seems that the \s for whitespace could be the culprit, but I have yet to find another forum result that grabs dates with whitespace instead of "-" or "/".
Does anyone have any suggestions for getting this working to extract as described above? Thanks in advance.
The problem is that you have anchored the regex with "^" at the start and "$" at the end.
"^n": the ^ anchor matches n only at the beginning of the string.
"n$": the $ anchor matches n only at the end of the string.
Since the text neither starts with two digits (\d\d) nor ends with four digits (\d{4}), this regex finds no match.
You can simply remove those two symbols, or use the following code:
/(\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4})/.exec("Mon, 18 Nov 2019 09:19:17 -0700 (MST)")[1]
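The same unanchored search can be sketched in Python, alongside a standard-library alternative that parses the whole header instead of pattern matching. The function names and the HEADER constant are made up for this example; email.utils.parsedate_to_datetime is part of the Python standard library, and %b assumes an English locale.

```python
import re
from email.utils import parsedate_to_datetime

HEADER = "Mon, 18 Nov 2019 09:19:17 -0700 (MST)"

# No ^ or $: the date may sit anywhere inside the header value.
DATE = re.compile(
    r"(\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4})"
)


def extract_date(header):
    """Regex route: return 'dd Mmm yyyy', or None if there is no match."""
    m = DATE.search(header)
    return m.group(1) if m else None


def extract_date_stdlib(header):
    """Parser route: let the email module interpret the header."""
    return parsedate_to_datetime(header).strftime("%d %b %Y")


print(extract_date(HEADER))
print(extract_date_stdlib(HEADER))
```

Checking for None before reading the group avoids an exception on headers that contain no date, and the parser route also copes with single-digit days, which \d{2} does not.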

REQ: Assistance with Splunk - Rex Query

I'm having some issues with a rex query where a single digit date renders an incorrect result, but a double digit date provides the correct result.
These are the log entries I'm querying:
Mar  7 14:24:29 10.52.176.215 Mar  7 12:24:29 963568 - Melbourne details-cable-issue - vdvfvfv
Mar 20 09:52:55 10.52.176.215 Mar 20 07:52:55 963569 - Brisbane cable-issue
And this is the query:
^(?:[^ \n]* ){7}(?P<extension>[^ ]+)[^\-\n]*\-\s+(?P<location>\w+)
For the Mar 7 entry, my query is giving me group extension "7" whilst my Mar 20 entry is giving me group extension "963569" which is correct.
Can someone shed some light on how to make my query handle both single- and double-digit dates (7 vs 20)?
Thanks all :)
There are several consecutive spaces (they look like padding spaces) in the first string, and since you only match one space within (?:[^ \n]* ) you get mismatches.
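I suggest matching 1 or more spaces in that first group, so that a padded single-digit day still counts as one field. With seven space-separated fields in front of the extension, as in both sample lines, the limiting quantifier stays at {7}:
^(?:[^ \n]* +){7}(?P<extension>[^ ]+)[^-\n]*-\s+(?P<location>\w+)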
See the regex demo
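To check the idea outside Splunk, here is a small Python sketch (Python's re module accepts the same (?P<name>...) syntax). It assumes the single-digit day is padded with two spaces, as syslog timestamps usually are, and that seven space-separated fields precede the extension in both sample lines, so it uses " +" with a count of 7. REX, LINES and extract are names invented for this example.

```python
import re

# " +" lets one field separator be one or more spaces.
REX = re.compile(
    r"^(?:[^ \n]* +){7}(?P<extension>[^ ]+)[^-\n]*-\s+(?P<location>\w+)"
)

LINES = [
    # Single-digit day, padded with an extra space (assumption).
    "Mar  7 14:24:29 10.52.176.215 Mar  7 12:24:29 963568 - Melbourne details-cable-issue - vdvfvfv",
    "Mar 20 09:52:55 10.52.176.215 Mar 20 07:52:55 963569 - Brisbane cable-issue",
]


def extract(line):
    """Return (extension, location), or None when the line does not match."""
    m = REX.match(line)
    return (m.group("extension"), m.group("location")) if m else None


for line in LINES:
    print(extract(line))
```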

Matching possible date elements in range

I'm having difficulty matching other cases for a date range. The end goal will be to extract each group to build an ISO 8601 date format.
Test cases
May 8th – 14th, 2019
November 25th – December 2nd
November 5th, 2018 – January 13th, 2019
September 17th – 23rd
Regex
(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}\s–\s(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}
regexr
I would like to be able to capture each group regardless of whether it is present in the input.
For example, May 8th – 14th, 2019
Group 1 May
Group 2 8th
Group 3
Group 4
Group 5 14th
Group 6 2019
And November 5th, 2018 – January 13th, 2019
Group 1 November
Group 2 5th
Group 3 2018
Group 4 January
Group 5 13th
Group 6 2019
To capture the empty string if the group doesn't match otherwise, the general idea is to use (<characters to match>|)
Try this one:
([A-Za-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)\s–\s([A-Za-z]{3,9}|)\s?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)
https://regex101.com/r/4UY0WE/1/
When trying to capture the month (the first group), make sure to use [A-Za-z]{3,9} rather than \w{3,9}, otherwise you might match, e.g., 23rd rather than a month string.
Separated out:
([A-Za-z]{3,9}) # Month ("January")
\s
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th)) # Day of month, including suffix ("23rd")
(?:, (?=19|20))? # Comma and space, if followed by year
(\d{4}|) # Year, or the empty string
\s–\s # Separator: en dash surrounded by whitespace
([A-Za-z]{3,9}|) # same as first line, or the empty string
\s?
# same as third to fifth lines:
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
(?:, (?=19|20))?
(\d{4}|)
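The difference between an optional group and a group with an empty alternative is easy to see in a small Python sketch. The two patterns below are simplified stand-ins for a single "Month day[, year]" date, not the full range regex, and the names OPTIONAL and EMPTY_ALT are made up for the example.

```python
import re

# Optional group: when the year is absent, the group does not participate.
OPTIONAL = re.compile(r"([A-Za-z]{3,9}) (\d{1,2})(?:, (\d{4}))?$")
# Empty alternative: the group always participates, possibly matching "".
EMPTY_ALT = re.compile(r"([A-Za-z]{3,9}) (\d{1,2})(?:, )?(\d{4}|)$")

print(OPTIONAL.match("May 8").groups())         # year group is None
print(EMPTY_ALT.match("May 8").groups())        # year group is ''
print(EMPTY_ALT.match("May 8, 2019").groups())  # year group is '2019'
```

Whether a non-participating group shows up as None, undefined or an empty string depends on the language, so the empty alternative is a way to get the same result everywhere.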
This one saves some space by consolidating some of the groupings.
Try it here
Full regex:
([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))? [–-] ([A-Za-z]{3,9}\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?
Separated by group (spaces replaced by \s for readability):
1. ([A-Za-z]{3,9})
\s
2. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
3. (?:,\s((?:19|20)\d{2}))?
\s[–-]\s
4. ([A-Za-z]{3,9}\s)?
5. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
6. (?:,\s((?:19|20)\d{2}))?
This method does not use lookarounds, so it is generally safe for any regex engine.
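Since the end goal is ISO 8601, here is one way the six groups could be turned into a pair of dates in Python. This is a sketch under assumptions, not part of the answer above: the day groups capture digits only, a missing second month is copied from the first, a missing first year is copied from the second, and default_year is a hypothetical parameter for ranges that name no year at all. Month names must be full, capitalised English names.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

DAY = r"([1-9]|[12]\d|3[01])(?:st|nd|rd|th)"  # digits captured, suffix not
YEAR = r"(?:, ((?:19|20)\d{2}))?"             # optional ", yyyy"
RANGE = re.compile(
    rf"([A-Za-z]{{3,9}}) {DAY}{YEAR} [–-] (?:([A-Za-z]{{3,9}}) )?{DAY}{YEAR}"
)


def to_iso_range(text, default_year=None):
    """Return (start, end) as ISO 8601 strings, or None if nothing matches."""
    m = RANGE.search(text)
    if not m:
        return None
    month1, day1, year1, month2, day2, year2 = m.groups()
    month2 = month2 or month1                 # "May 8th – 14th"
    year2 = year2 or year1 or default_year
    year1 = year1 or year2                    # "May 8th – 14th, 2019"
    if year2 is None:
        raise ValueError("no year in text and no default_year given")
    start = date(int(year1), MONTHS.index(month1) + 1, int(day1))
    end = date(int(year2), MONTHS.index(month2) + 1, int(day2))
    return start.isoformat(), end.isoformat()


print(to_iso_range("May 8th – 14th, 2019"))
print(to_iso_range("November 5th, 2018 – January 13th, 2019"))
print(to_iso_range("September 17th – 23rd", default_year=2019))
```

A range such as "November 25th – January 2nd" with no year would need extra logic to roll the end year forward; this sketch does not attempt that.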

Parsing dates from string - regex

I'm terrible with regex and I can't seem to wrap my head around this simple task.
I need to parse out the two dates in a string which always has one of two formats:
"Inquiry at your property for December 29, 2013 - January 03, 2014"
OR
"Inquiry at your property for 29 December , 2013 - 03 January, 2014"
the 2 different date formats are throwing me off. Any insights would be appreciated!
/(\d+ \w+, \d+|\w+ \d+, \d+)/ for example. Try it out on Rubular.
Of course, it would also pick up other things, like 2013 NotReallyAMonth, 12345. But if your input contains nothing that looks like a date without actually being one, this might work.
You could make the regexp stronger by applying more restrictions on what is matched:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4})/
In this case the day is always two digits and the year is always four. Months are listed explicitly (you would have to list all of them).
Update: For ranges it would be a different regexp:
/((?:Jan|Dec) \d+ - \d+, \d{4})/
Obviously they can all be combined together:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4}|(?:Jan|Dec) \d+ - \d+, \d{4})/
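For the two inquiry strings from the question, a combined pattern with named groups could look like the following Python sketch. It lists all twelve months and tolerates a stray space before the comma, as in "29 December , 2013"; both choices are assumptions on top of the patterns above, and MONTH, DATE and parse_dates are illustrative names. %B relies on an English locale.

```python
import re
from datetime import datetime

MONTH = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)

# Alternative 1: "29 December , 2013"   Alternative 2: "December 29, 2013"
DATE = re.compile(
    rf"(?P<d1>\d{{1,2}})\s*(?P<m1>{MONTH})\s*,\s*(?P<y1>\d{{4}})"
    rf"|(?P<m2>{MONTH})\s+(?P<d2>\d{{1,2}})\s*,\s*(?P<y2>\d{{4}})"
)


def parse_dates(text):
    """Return every date found in text as a datetime.date, in order."""
    dates = []
    for m in DATE.finditer(text):
        day = m.group("d1") or m.group("d2")
        month = m.group("m1") or m.group("m2")
        year = m.group("y1") or m.group("y2")
        parsed = datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
        dates.append(parsed.date())
    return dates


a = "Inquiry at your property for December 29, 2013 - January 03, 2014"
b = "Inquiry at your property for 29 December , 2013 - 03 January, 2014"
print(parse_dates(a))
print(parse_dates(b))
```

Passing the captured pieces through strptime means an impossible date raises an error instead of being silently accepted.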