Pattern for month with low and capital letters - regex

I have the following strings:
Emini Mar 15 ME
Emini ICE MAR 15 RTA
Emini ABC Apr 15 RTA
and I use this pattern:
[\S]*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)+(\s+\d{1,2})
How can I write a short pattern that matches any capitalization, instead of spelling out every variant like (Jan|JAN|jan|Feb|FEB|feb...)?
Thanks in advance

Just add the case-insensitive modifier i:
(?i)\S*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)+(\s+\d{1,2})
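As a rough illustration, here is how the pattern above could be tried out in Python. The language is an assumption (the question does not name one), and the names MONTH_RE and find_month_day are made up for this sketch.

```python
import re

# Pattern from the answer; the inline (?i) flag makes matching case-insensitive.
MONTH_RE = re.compile(
    r"(?i)\S*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)+(\s+\d{1,2})"
)

def find_month_day(text):
    """Return (month, day) exactly as written in text, or None if no match."""
    m = MONTH_RE.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2).strip()

for s in ["Emini Mar 15 ME", "Emini ICE MAR 15 RTA", "Emini ABC Apr 15 RTA"]:
    print(find_month_day(s))
```

In Python, passing re.IGNORECASE to re.compile instead of writing (?i) inside the pattern should behave the same way.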

Related

Regular Expression that matches a set of characters or nothing

I have a set of text coming in as:
1 240R 15 Apr 2021 240 Litre Sulo Bin Recycling - Dkt#11053610 O/N: ONE DENTAL $10.00.
1 1.5cm 06 Apr 2021 1.5m Co-Mingle Recycling Bin - Dkt#11028471 $18.00.
1 1.5m 12 Apr 2021 Service 1.5m Front Lift bin - Dkt#11028421 $24.00
1 660 14 Apr 2021 660L Rear Lift Bin - Dkt#11156377 O/N: YOUR CAR SOLD $22.50
I am trying this regex (PCRE, PHP < 7.3):
^(\d+) [^\$]+ (\d+ \w+ \d+) (\d.+ \w.+) \S (Dkt\S\d+)[^\$]+ (\$.*)$
This parses the lines above. However, I am unable to capture "O/N: ONE DENTAL" or "O/N: YOUR CAR SOLD" from them.
How can I make it so that anything after Dkt#xxxx is captured if present, and nothing otherwise?
Since you already have a pattern, I made a few slight modifications to it. This is based on the example strings provided and may need further adjustment if your input has other variations.
Example:
^(\d) (\S+) (\d{2} \w{3} \d{4}) (.*)\s+-\s(Dkt\S+)\s*(.*?)\s*(\$\d+(?:\.\d+)?)\.?\s*$
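The pattern above can be checked against the sample lines with a short script. This sketch uses Python's re module rather than PHP's PCRE (an assumption made for illustration; the pattern uses no PCRE-only syntax), and LINE_RE and parse_line are hypothetical names.

```python
import re

# Same pattern as in the answer. Group 6 is the optional text between the
# docket number and the price; it is an empty string when nothing is there.
LINE_RE = re.compile(
    r"^(\d) (\S+) (\d{2} \w{3} \d{4}) (.*)\s+-\s(Dkt\S+)\s*(.*?)\s*"
    r"(\$\d+(?:\.\d+)?)\.?\s*$"
)

def parse_line(line):
    """Return the seven captured groups as a tuple, or None if no match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

lines = [
    "1 240R 15 Apr 2021 240 Litre Sulo Bin Recycling - Dkt#11053610 O/N: ONE DENTAL $10.00.",
    "1 1.5cm 06 Apr 2021 1.5m Co-Mingle Recycling Bin - Dkt#11028471 $18.00.",
    "1 1.5m 12 Apr 2021 Service 1.5m Front Lift bin - Dkt#11028421 $24.00",
    "1 660 14 Apr 2021 660L Rear Lift Bin - Dkt#11156377 O/N: YOUR CAR SOLD $22.50",
]
for line in lines:
    print(parse_line(line))
```

The lazy (.*?) is what makes the "something or nothing" part work: it grows only as far as needed for the price to match right after it.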

How to capture strings between time stamps with python?

I have paragraphs like the ones below:
Dec 27 09:00:06 test event[1] number one
Dec 30 02:00:06 here is event[22] Feb 01 04:36:11 helloworld2
Dec 07 04:00:11 Now is event{3} Jan 01 04:36:11 Helloworld
Jan 02 23:00:11 helloworld evnt{45}
Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12
I want to capture each timestamp and the message that occurred at that timestamp.
I'm using pythex.org to test the Python regex (?P\w{3}\s\w{2}\s\w{2}:\w{2}:\w{2})\b(?P.*)
but this only works when each entry is on its own line, and it fails on paragraphs that have more than one timestamp and message on the same line. For example, in the paragraphs above I cannot capture the timestamps and messages in Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12
Here is a 2.x Python solution which uses findall to find multiple matches in each line of your log file:
import re
# Match a timestamp, lazily capture the message, and stop (without consuming)
# at the next timestamp or at the end of the line.
p_str = r'\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s(.*?)(?=\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s|$)'
pattern = re.compile(p_str, re.IGNORECASE)
log_str = 'Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12'
match = pattern.findall(log_str)
print(match)
# Output: ['mesg10 ', 'mesg11 ', 'testmesg12']
The challenge here is formulating a pattern that works. I went the route of matching a timestamp and then using a lookahead to know when to stop matching. We stop when we see either another timestamp or the end of the line. Note that consuming the next timestamp as part of the match won't work here, because it has to remain available as the start of the next match as the regex works its way across the line.
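Since the question asks for the timestamp as well as the message, the same lookahead idea can be extended with named groups. This is only a sketch: the group names timestamp and message and the helper parse_entries are assumptions, and the lookahead is widened slightly with \s* so that trailing spaces are left out of the message.

```python
import re

# One timestamp such as "Feb 12 04:36:11".
TS = r"\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}"

# Capture the timestamp, then lazily capture the message up to (but not
# including) the next timestamp or the end of the line.
ENTRY_RE = re.compile(
    r"(?P<timestamp>" + TS + r")\s(?P<message>.*?)(?=\s*" + TS + r"\s|\s*$)"
)

def parse_entries(line):
    """Return a list of (timestamp, message) pairs found in one line."""
    return [(m.group("timestamp"), m.group("message"))
            for m in ENTRY_RE.finditer(line)]

line = "Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12"
for ts, msg in parse_entries(line):
    print(ts, "->", msg)
```

The function is meant to be called once per line, because $ is used without re.MULTILINE here.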

Regular expression that returns two dates

I'm struggling to create a regex that, given this string:
20 - 29 APR 2017, 9 nights
returns two groups, first:
20 APR 2017
and second:
29 APR 2017
You can try this regex:
(\d+)\s*-\s*(\d+)\s*([a-zA-Z]+)\s*(\d+).*
And replace with:
$1 $3 $4\n$2 $3 $4
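The search-and-replace above might look like this in Python, which is an assumption since the question does not name a language. Python writes backreferences in the replacement as \1 rather than $1; RANGE_RE and split_range are hypothetical names.

```python
import re

# Groups: 1 = first day, 2 = second day, 3 = month, 4 = year.
# The trailing .* swallows the rest of the string (", 9 nights").
RANGE_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s*([a-zA-Z]+)\s*(\d+).*")

def split_range(text):
    """Rewrite 'D1 - D2 MON YYYY ...' as two dates and return them as a list."""
    replaced = RANGE_RE.sub(r"\1 \3 \4\n\2 \3 \4", text)
    return replaced.split("\n")

print(split_range("20 - 29 APR 2017, 9 nights"))
```

If only the two dates are needed, reading the groups from a match object directly would avoid the replace-then-split step.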

Regex for capturing different date formats

I'm tasked with capturing itinerary dates in email messages, but the dates come in many different formats. Is there a way to capture all of the following formats?
02 APR
APR 02
2 APR
APR 2
2nd APR
APR 2nd
2nd April
April 2nd
APR 12th
April 12th
12th April
April 13-16
13-16 April
APR 13-16
13-16 APR
April 13th-16th
13th-16th April
APR 13th-16th
13th-16th APR
I've tried numerous approaches but couldn't work it out, as I'm a newbie to regex.
The closest I could get was using this:
(\d*)-(\d*) APR|April \d*\d*
EDIT: I found that I've missed some more formats:
13th - 16th APR
13~16 April
13/16 APR
I've tried using the following:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?: * \d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?: . \d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
It could capture dates either with a space or without one, but not both.
Is there a way to capture all formats, split the dates on '-', '/', '~' and write them out in a single standardized format?
(Group 1 Date)-Month (Group 2 Date)-Month, e.g. 13-Apr 16-Apr
I'd appreciate your suggestions and comments.
You need to account for optional values. Here is an enhanced version matching your sample input:
/(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?/i
Note that you need a case-insensitive modifier to match all variants of April.
Basically, there are 2 alternatives matching April and date ranges:
(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)? - 1+ digits followed by an optional st, nd, rd or th, then an optional hyphen, then 0+ digits, then another optional st, nd, rd or th, then 0+ whitespace characters and finally Apr or April (case-insensitive due to the /i modifier)
| - or
Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)? - the same as above but swapped.
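To show how the four groups could feed the standardized output the question asks for (13-Apr 16-Apr), here is a small Python sketch. The choice of Python, the helper name normalize_april, and the decision to drop leading zeros are all assumptions.

```python
import re

# The two alternatives from the answer, with re.IGNORECASE replacing /i.
# Groups 1-2 are filled when the days come first, groups 3-4 when Apr does.
APRIL_RE = re.compile(
    r"(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?"
    r"|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?",
    re.IGNORECASE,
)

def normalize_april(text):
    """Return e.g. '13-Apr 16-Apr' or '2-Apr', or None if nothing matches."""
    m = APRIL_RE.search(text)
    if m is None:
        return None
    first = m.group(1) or m.group(3)
    second = m.group(2) or m.group(4)  # empty or None when there is no range
    days = [first] + ([second] if second else [])
    return " ".join("%d-Apr" % int(d) for d in days)

for s in ["02 APR", "April 2nd", "13-16 April", "APR 13th-16th"]:
    print(s, "->", normalize_april(s))
```

As written, the pattern only allows a bare hyphen between the two days, so the later formats from the question (13th - 16th APR, 13~16 April, 13/16 APR) would need the separator part widened.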
I came up with this Regex:
(?:APR|April)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:APR|April)
Maybe it's overkill, but I came up with this regex that will match with any month:
(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)
It is hard to read, though.
Improved version using Wiktor Stribiżew's trick:
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
It matches every month and takes fewer steps (more efficient),
but you need to make sure matching is case-insensitive.
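One way to keep the any-month pattern readable is to assemble it from named pieces. The sketch below builds what should be an equivalent pattern in Python (an assumption; the answer does not name a language) and compiles it with re.IGNORECASE. MONTH, DAY, DAYS, DATE_RE and find_dates are hypothetical names.

```python
import re

# Short or long month name, e.g. Apr or April.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
         r"|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?"
         r"|Dec(?:ember)?)")
# A day number with an optional ordinal suffix, e.g. 2, 2nd, 13th.
DAY = r"\d+(?:[nr]d|th|st)?"
# A single day or a hyphenated range, e.g. 13th-16th.
DAYS = DAY + r"(?:-" + DAY + r")?"

# Month first, or day(s) first, with optional spaces in between.
DATE_RE = re.compile(MONTH + r" *" + DAYS + r"|" + DAYS + r" *" + MONTH,
                     re.IGNORECASE)

def find_dates(text):
    """Return every date-like substring found in text."""
    return [m.group(0) for m in DATE_RE.finditer(text)]

print(find_dates("Depart 13th-16th April, return May 2nd"))
```

There are no word boundaries in the pattern, so a month abbreviation inside a longer word followed by digits could also match; adding \b around the pieces is one possible refinement.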
I came up with this:
(\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?\s*(?:APR|April))|((?:APR|April)\s*\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?)

Regex- How to exclude words

I'm trying to match and exclude specific words using regex. I'm essentially trying to match all time strings in 24-hour form with an optional am or pm. However, I would like to exclude strings that begin with 2014 or 2013. For example:
Input:
11 45 pm
12 34 am
1230pm
2013pm
12 pm
12p
2014 pm
Desired output:
11 45 pm
12 34 am
1230pm
12 pm
I would like to use only one regex to match this. I know how to accomplish this task with two regexes.
I'm using the following command:
grep -E '^(?!2014)(?!2013)([01]?[0-9]|2[0-3])( )?[0-5][0-9]?\s?(am|pm)?' output.txt
with no success. Any suggestions? Thanks!
You can use a pattern like:
^((?:0?[0-9]|1[012])\s?(?:[0-5][0-9])?\s?[ap]m)
Here, I've assumed that either am or pm is present at the end of the string.
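The pattern above uses (?:...) and \s, which are not part of the POSIX extended syntax that grep -E understands (the same goes for the lookaheads in the question), so it needs a PCRE-style engine such as GNU grep -P. As a quick check, here is a sketch that runs it through Python's re module on the sample input; the choice of Python and the names TIME_RE and keep_times are assumptions.

```python
import re

# Pattern from the answer: hour 0-12, optional minutes, then am or pm.
TIME_RE = re.compile(r"^((?:0?[0-9]|1[012])\s?(?:[0-5][0-9])?\s?[ap]m)")

def keep_times(lines):
    """Return only the lines that start with a time ending in am or pm."""
    return [line for line in lines if TIME_RE.search(line)]

samples = ["11 45 pm", "12 34 am", "1230pm", "2013pm", "12 pm", "12p", "2014 pm"]
print(keep_times(samples))
```

The 2013 and 2014 lines drop out without any explicit exclusion because the hour part only accepts 0 to 12, so "20" can never be read as an hour.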
This works too:
[0-9]{1,2}\s*[0-9]{1,2}(?<!2013)(?<!2014)\s*(am|pm)