Regex for date format dd Mmm yyyy from email header

I have the following regex that I have been working on:
^(\d\d)\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d{4})?$
I am trying to grab the date from an email header that is formatted like so:
"Mon, 18 Nov 2019 09:19:17 -0700 (MST)"
and I want the result to be:
18 Nov 2019
It seems that the \s for whitespace could be the culprit, but I have yet to find another forum result that grabs dates with whitespace instead of "-" or "/".
Does anyone have any suggestions for getting this working to extract as described above? Thanks in advance.

The problem is the "^" and "$" you have added at the start and end of the regex. These are anchors, not quantifiers:
"^n": matches n only at the very beginning of the string.
"n$": matches n only at the very end of the string.
Since the header does not start with two digits (\d\d) and does not end with the four-digit year (\d{4}), the anchored regex matches nothing.
You can simply remove those two symbols, or use the following code:
/(\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4})/.exec("Mon, 18 Nov 2019 09:19:17 -0700 (MST)")[1]
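For comparison, here is a minimal Python sketch of the same idea: drop the anchors and search inside the header instead of matching the whole string. The names DATE_RE, header, and date are illustrative, and allowing a one-digit day (\d{1,2}) is an assumption that goes beyond the original pattern.

```python
import re

# Same month alternation as in the question, but without ^ and $,
# so the pattern may match anywhere inside the header.
DATE_RE = re.compile(
    r"\b\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}\b"
)

header = "Mon, 18 Nov 2019 09:19:17 -0700 (MST)"
match = DATE_RE.search(header)            # search() scans the whole string
date = match.group(0) if match else None  # None when no date is present
print(date)
```

Checking for None before calling group() avoids an exception on headers that contain no date in this format.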

Related

REQ: Assistance with Splunk - Rex Query

I'm having some issues with a rex query where a single digit date renders an incorrect result, but a double digit date provides the correct result.
These are the log entries I'm querying:
Mar 7 14:24:29 10.52.176.215 Mar 7 12:24:29 963568 - Melbourne details-cable-issue - vdvfvfv
Mar 20 09:52:55 10.52.176.215 Mar 20 07:52:55 963569 - Brisbane cable-issue
And this is the query:
^(?:[^ \n]* ){7}(?P<extension>[^ ]+)[^\-\n]*\-\s+(?P<location>\w+)
For the Mar 7 entry, my query is giving me group extension "7" whilst my Mar 20 entry is giving me group extension "963569" which is correct.
Can someone shed some light on how to make my query handle both single-digit and double-digit dates (7 vs 20)?
Thanks all :)
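There are consecutive spaces in the first string (they look like padding in front of the single-digit day). Since (?:[^ \n]* ) consumes exactly one space per repetition, each extra space is counted as an empty field and every later field shifts.
I suggest matching 1 or more spaces in that first group. With the sample lines above, seven space-separated fields precede the extension, so the limiting quantifier stays at 7; adjust it if your raw events have a different number of leading fields. The hyphens do not need escaping either:
^(?:[^ \n]* +){7}(?P<extension>[^ ]+)[^-\n]*-\s+(?P<location>\w+)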
See the regex demo
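The effect of the padding space can be reproduced outside Splunk. The sketch below uses Python's re module, which accepts the same (?P<name>...) group syntax; it assumes the first log line really contains two spaces before the single-digit day, and it assumes seven space-separated fields precede the extension, as in the two sample lines. ORIGINAL and FIXED are illustrative names.

```python
import re

# One space per repetition: a padding space is counted as an empty field.
ORIGINAL = re.compile(
    r"^(?:[^ \n]* ){7}(?P<extension>[^ ]+)[^\-\n]*\-\s+(?P<location>\w+)"
)
# One or more spaces per repetition: padding no longer shifts the fields.
FIXED = re.compile(
    r"^(?:[^ \n]* +){7}(?P<extension>[^ ]+)[^-\n]*-\s+(?P<location>\w+)"
)

# Sample lines from the question, with the padding restored (assumption).
single = "Mar  7 14:24:29 10.52.176.215 Mar  7 12:24:29 963568 - Melbourne details-cable-issue - vdvfvfv"
double = "Mar 20 09:52:55 10.52.176.215 Mar 20 07:52:55 963569 - Brisbane cable-issue"

for line in (single, double):
    before = ORIGINAL.match(line).group("extension")
    after = FIXED.match(line).groupdict()
    print(before, after)
```

On the padded line the original pattern stops at the second "7", while the version with " +" reaches the extension on both lines.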

get date from arabic string

I have an Arabic string that contains a date along with other details. How can I get dd mm yyyy from it with a regex?
Example:
الأحد 21 مايو 2017 01:20 م
I use this regex, but it doesn't work:
^\d{2} [\u0600-\u06FF] \d{4}
What can I do?
This regex:
(\d{4}\ \d{2}:\d{2})
will match "2017 01:20", that is, the year and the time.
I guess you could play around with capture groups to get it in the right order if you want to use the result afterwards.
Here is the regex. Because Arabic is written right to left, the text may be displayed in the reverse order of the pattern:
\d\d\d\d\s\d\d:\d\d
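Two details in the question's pattern prevent a match: "^" anchors it to the start of the string, where the day name sits, and [\u0600-\u06FF] without a quantifier matches only a single Arabic character. A minimal Python sketch that captures day, month name, and year follows. The MONTHS table and the extract_date helper are hypothetical additions for illustration, and the month spellings are an assumption; other regional month names would need their own entries.

```python
import re

# Hypothetical lookup table (not from the original post); extend as needed.
MONTHS = {
    "يناير": "01", "فبراير": "02", "مارس": "03", "أبريل": "04",
    "مايو": "05", "يونيو": "06", "يوليو": "07", "أغسطس": "08",
    "سبتمبر": "09", "أكتوبر": "10", "نوفمبر": "11", "ديسمبر": "12",
}

# Day, then one or more Arabic letters (the month name), then a 4-digit year.
DATE_RE = re.compile(r"(\d{1,2}) ([\u0600-\u06FF]+) (\d{4})")

def extract_date(text):
    """Return 'dd mm yyyy', or None if no known date is found."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    day, month_name, year = m.groups()
    month = MONTHS.get(month_name)
    if month is None:
        return None
    return f"{int(day):02d} {month} {year}"

print(extract_date("الأحد 21 مايو 2017 01:20 م"))
```

The pattern is written in logical (storage) order, so the right-to-left display of the text does not change how it is matched.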

Regex for capturing different date formats

I've been tasked with capturing dates for itineraries in email messages, but the dates come in many different formats. Is there any way to capture all of the following formats?
02 APR
APR 02
2 APR
APR 2
2nd APR
APR 2nd
2nd April
April 2nd
APR 12th
April 12th
12th April
April 13-16
13-16 April
APR 13-16
13-16 APR
April 13th-16th
13th-16th April
APR 13th-16th
13th-16th APR
I've tried numerous approaches but could not work it out, as I'm a newbie to regex.
The closest I could get was using this:
(\d*)-(\d*) APR|April \d*\d*
EDIT: I found out that I've missed some more formats:
13th - 16th APR
13~16 April
13/16 APR
I've tried using the following:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?: * \d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?: . \d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
It can capture dates either with a space or without one.
Is there a way to capture all formats, split the date ranges on '-', '/', or '~', and write the output in a single standardized format?
(Group 1 Date)-Month (Group 2 Date)-Month, e.g. 13-Apr 16-Apr
I'd appreciate any suggestions and comments.
You need to account for optional values. Here is an enhanced version matching your sample input:
/(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?/i
See the regex demo (note you need to use a case-insensitive modifier to match any variants of April)
Basically, there are 2 alternatives matching April and date ranges:
(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)? - 1+ digits followed with an optional st, nd, rd, th, followed with an optional hyphen, followed with 0+ digits, followed with optional st, etc. followed with 0+ whitespace and then Apr or April (case insensitive due to /i modifier)
| - or
Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)? - the same as above but swapped.
I came up with this Regex:
(?:APR|April)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:APR|April)
See details here: Regex101
Maybe it's overkill, but I came up with this regex that will match with any month:
(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)
Unreadable, check here if you want details: Regex101
Improved version using Wiktor Stribiżew's trick:
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
See details here: Regex101
It matches every month and takes fewer steps (so it is more efficient),
BUT you need to make sure the match is case-insensitive.
I came up with this:
(\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?\s*(?:APR|April))|((?:APR|April)\s*\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?)
Live Demo
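None of the patterns above produces the standardized output the question asks for (for example 13-Apr 16-Apr). The following Python sketch shows one possible way to do it, combining the month alternation from the answers with a separator class for '-', '/', and '~'. The names MONTH, SUFFIX, DAYS, PATTERN, and normalize are illustrative, and treating optional spaces around the separator as valid is an assumption based on the "13th - 16th APR" sample.

```python
import re

MONTH = (r"(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
SUFFIX = r"(?:st|nd|rd|th)?"
# A day, optionally followed by a separator (-, / or ~) and a second day.
DAYS = rf"(\d{{1,2}}){SUFFIX}(?:\s*[-/~]\s*(\d{{1,2}}){SUFFIX})?"

# Either "days month" or "month days"; case-insensitive for APR/Apr/April.
PATTERN = re.compile(rf"\b{DAYS}\s*{MONTH}\b|\b{MONTH}\s*{DAYS}\b", re.IGNORECASE)

def normalize(text):
    """Return e.g. '13-Apr 16-Apr' for the first date found, else None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    d1, d2, mon_a, mon_b, d3, d4 = m.groups()
    first, second = (d1, d2) if mon_a else (d3, d4)
    month = (mon_a or mon_b)[:3].capitalize()   # APRIL -> Apr
    parts = [f"{int(first)}-{month}"]
    if second:
        parts.append(f"{int(second)}-{month}")
    return " ".join(parts)

for sample in ["02 APR", "April 12th", "13th - 16th APR", "13~16 April", "APR 13/16"]:
    print(sample, "->", normalize(sample))
```

Converting the day with int() drops leading zeros, so "02 APR" becomes "2-Apr"; keep the captured string instead if the zero should be preserved.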

regex to skip intermediate characters

Given the following string "Mon Feb 01 02:42:27 +0000 2013", what should the regex be to get the following "Feb 01 2013".
The following regex "\w{3}\s\w{3}\s\d{1,2}" will yield "Mon Feb 01". How do I write a regex that ignores the day name (Mon) and the time?
I had asked a related question in an earlier post, which was kindly answered by the community - unable to parse whitespace in regex
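This could work. It matches the whole string; the month, day, and year are captured in groups 2, 3, and 8, so you can join them afterwards:
(Mon|Tue|Wed|Thu|Fri|Sat|Sun) ([a-zA-Z]{3}) ([0-9]{2}) ([0-9]{2})\:([0-9]{2})\:([0-9]{2}) \+([0-9]{4}) ([0-9]{4})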
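A single regex match is always one contiguous piece of the input, so "Feb 01 2013" cannot come out as one match; the usual approach is to capture the wanted parts and join them. A minimal Python sketch, with illustrative names and the assumption that the offset may be signed with either + or -:

```python
import re

stamp = "Mon Feb 01 02:42:27 +0000 2013"

# Capture month, day, and year; match but do not capture the rest.
STAMP_RE = re.compile(
    r"^\w{3}\s(\w{3})\s(\d{1,2})\s\d{2}:\d{2}:\d{2}\s[+-]\d{4}\s(\d{4})$"
)

m = STAMP_RE.match(stamp)
result = " ".join(m.groups()) if m else None
print(result)
```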

Regex: How to match a unix datestamp?

I'd like to be able to match this entire line (to highlight this sort of thing in vim): Fri Mar 18 14:10:23 ICT 2011. I'm trying to do it by finding a line that contains ICT 20 (the first two digits of the year), like this: syntax match myDate /^*ICT 20*$/, but I can't get it working. I'm very new to regex. Basically what I want to say is: find a line that contains "ICT 20", with anything on either side of it, and match that whole line. Is there an easy way to do this?
.*ICT 20.*
should do the trick. . is a wildcard that matches any character, and * means you can have 0 or more of the pattern it follows. (i.e. ba(na)* will match ba, banana, bananananana and so on)
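Vim has its own regex flavor, but "." and "*" behave the same way in its default mode, so the behavior can be checked quickly in Python. This is only a sketch with illustrative names; it uses the ICT spelling from the question's sample line.

```python
import re

line = "Fri Mar 18 14:10:23 ICT 2011"

# .* on both sides lets the literal "ICT 20" sit anywhere in the line,
# and fullmatch() requires the pattern to cover the entire line.
whole_line = re.fullmatch(r".*ICT 20.*", line)
print(whole_line.group(0) if whole_line else None)

# (na)* repeats the group zero or more times.
words = ["ba", "banana", "bananananana", "ban"]
matched = [w for w in words if re.fullmatch(r"ba(na)*", w)]
print(matched)
```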