Regular expression that returns two dates - regex

I'm struggling to create a regex that, given this string:
20 - 29 APR 2017, 9 nights
returns two groups, first:
20 APR 2017
and second:
29 APR 2017

You can try this regex:
(\d+)\s*-\s*(\d+)\s*([a-zA-Z]+)\s*(\d+).*
And replace by:
$1 $3 $4\n$2 $3 $4
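The question does not name a language, so as a rough sketch only, here is how the same pattern might be used in Python. The helper name split_date_range is made up for illustration; instead of a replacement string it builds the two dates directly from the captured groups.

```python
import re

# Same pattern as above: start day, end day, month name, year, then the rest of the line.
RANGE_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s*([a-zA-Z]+)\s*(\d+).*")

def split_date_range(text):
    """Return (start_date, end_date) strings, or None if the text does not match."""
    m = RANGE_RE.search(text)
    if m is None:
        return None
    start_day, end_day, month, year = m.groups()
    # The month and year are shared by both ends of the range.
    return f"{start_day} {month} {year}", f"{end_day} {month} {year}"

dates = split_date_range("20 - 29 APR 2017, 9 nights")
print(dates)
```

If a substitution is preferred, note that Python's re.sub writes backreferences as \1, \2 and so on rather than $1, $2.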

Related

Regex greedy match failed?

Among other lines, I have the two lines below that I need to match.
2015 - 2016 2016 - 2017 2017 - 2018 2018 - 2019 2019 - 2020 2020
2015 - 2016 2016 - 2017 2017 - 2018 2018 - 2019 2019 - 2020 APR 2021
For the top line, I want to match and get the year value '2020'; for the bottom line, I want to match and get the month and year values 'APR 2021'.
I'm able to match both lines using this regex (^\d.*(?<month>[a-zA-Z]{3})?\s(?<year>\d{4})$), but for the bottom line it does not give me the month value.
Any ideas?
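If the characters outside the month name are only ever digits, dashes and spaces, match those explicitly (e.g. [0-9\- ]*) rather than matching anything (e.g. .*). The greedy .* consumes the month itself, and because the month group is optional, the engine never needs to backtrack to fill it: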
# 'anything goes' matched here
#  v
^\d[0-9 \-]*(?<month>[a-zA-Z]{3})?\s(?<year>\d{4})$
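To see the difference side by side, here is a small sketch in Python (an assumption, since the question does not say which engine is in use). Python's re module spells named groups (?P<name>...) instead of (?<name>...); the names GREEDY, FIXED and extract are made up for this example.

```python
import re

# GREEDY is the question's pattern, FIXED is the one suggested above,
# both rewritten with Python's (?P<name>...) named-group syntax.
GREEDY = re.compile(r"^\d.*(?P<month>[a-zA-Z]{3})?\s(?P<year>\d{4})$")
FIXED = re.compile(r"^\d[0-9 \-]*(?P<month>[a-zA-Z]{3})?\s(?P<year>\d{4})$")

lines = [
    "2015 - 2016 2016 - 2017 2017 - 2018 2018 - 2019 2019 - 2020 2020",
    "2015 - 2016 2016 - 2017 2017 - 2018 2018 - 2019 2019 - 2020 APR 2021",
]

def extract(pattern, line):
    """Return (month, year) for a matching line; month is None when absent."""
    m = pattern.match(line)
    return (m.group("month"), m.group("year")) if m else None

for line in lines:
    print(extract(GREEDY, line), extract(FIXED, line))
```

With the restricted character class, the prefix cannot swallow letters, so the optional month group is the only part of the pattern that can consume 'APR'.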

How to capture strings between time stamps with python?

I have paragraphs like the ones below:
Dec 27 09:00:06 test event[1] number one
Dec 30 02:00:06 here is event[22] Feb 01 04:36:11 helloworld2
Dec 07 04:00:11 Now is event{3} Jan 01 04:36:11 Helloworld
Jan 02 23:00:11 helloworld evnt{45}
Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12
I want to capture each timestamp and the message that occurred at that timestamp.
I'm using pythex.org to test the Python regex (?P\w{3}\s\w{2}\s\w{2}:\w{2}:\w{2})\b(?P.*)
but this only works when each entry is on its own line, and fails on paragraphs that have more than one timestamp and message on the same line. For example, in the paragraphs above I cannot capture each timestamp and message in Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12
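Here is a Python solution (it runs on both 2.x and 3.x) which uses findall to find multiple matches in each line of your log file:
import re
# Match a timestamp, then lazily capture the message up to the next timestamp or the end of the line.
p_str = r'\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s(.*?)(?=\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}\s|$)'
pattern = re.compile(p_str, re.IGNORECASE)
log_str = 'Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12'
# findall returns only the contents of the single capture group, i.e. the messages.
match = pattern.findall(log_str)
print(match)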
['mesg10 ', 'mesg11 ', 'testmesg12']
The challenge here is formulating a pattern that works. I went the route of matching a timestamp and then using a lookahead to know when to stop matching: we stop when we see either another timestamp or the end of the line. Note that consuming the next timestamp as part of the match won't work here, because that timestamp has to be the start of the next match as the regex works its way across the line.
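Since the question asks for the timestamp as well as the message, here is a hedged variation on the same lookahead idea that captures both. The group names timestamp and message and the variable names are chosen for this sketch only; the lookahead also skips the whitespace before the next timestamp so the messages come back without trailing spaces.

```python
import re

# One timestamp such as "Feb 12 04:36:11".
TS = r"\w{3}\s\d{2}\s\d{2}:\d{2}:\d{2}"

# Capture the timestamp, then lazily capture the message until the lookahead
# sees the next timestamp (preceded by optional whitespace) or the end of the line.
ENTRY_RE = re.compile(
    r"(?P<timestamp>" + TS + r")\s(?P<message>.*?)(?=\s*" + TS + r"\s|$)"
)

log_str = "Feb 12 04:36:11 mesg10 Feb 13 04:36:11 mesg11 Feb 14 04:36:11 testmesg12"
entries = [(m.group("timestamp"), m.group("message")) for m in ENTRY_RE.finditer(log_str)]

for timestamp, message in entries:
    print(timestamp, "->", message)
```

Because the lookahead does not consume the next timestamp, finditer can pick it up as the start of the following match.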

Pattern for month with low and capital letters

I have the following strings:
Emini Mar 15 ME
Emini ICE MAR 15 RTA
Emini ABC Apr 15 RTA
and use this pattern:
[\S]*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)+(\s+\d{1,2})
How can I write a short pattern instead of spelling out every case, as in ...(Jan|JAN|jan|Feb|FEB|feb...) etc.?
Thanks in advance
Just add the case-insensitive modifier (?i):
(?i)\S*(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)+(\s+\d{1,2})
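The question does not name a regex engine. As one possible illustration, in Python the same effect can be had by passing the re.IGNORECASE flag instead of the inline (?i) modifier. The pattern below is a slightly simplified variant of the one above (the leading \S* and the + after the month group are dropped, and the day is captured without its leading whitespace); those changes are choices made for this sketch.

```python
import re

# Month abbreviation followed by a one- or two-digit number, matched case-insensitively.
MONTH_RE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})",
    re.IGNORECASE,
)

samples = [
    "Emini Mar 15 ME",
    "Emini ICE MAR 15 RTA",
    "Emini ABC Apr 15 RTA",
]

# Each entry is (month as written in the input, number after it).
found = [MONTH_RE.search(s).groups() for s in samples]
print(found)
```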

Regex- How to exclude words

I'm trying to match and exclude specific words using regex. I'm essentially trying to match all strings in 24-hour format with an optional am or pm. However, I would like to exclude strings that begin with 2014 or 2013. For example:
Input:
11 45 pm
12 34 am
1230pm
2013pm
12 pm
12p
2014 pm
Desired output:
11 45 pm
12 34 am
1230pm
12 pm
I would like to use only one regex for this; I already know how to do it with two regexes.
I'm using the following command:
grep -E '^(?!2014)(?!2013)([01]?[0-9]|2[0-3])( )?[0-5][0-9]?\s?(am|pm)?' output.txt
with no success. Any suggestions? Thanks!
You can use a pattern like:
^((?:0?[0-9]|1[012])\s?(?:[0-5][0-9])?\s?[ap]m)
Here, I've assumed that either am or pm is always present at the end of the string.
This works too:
[0-9]{1,2}\s*[0-9]{1,2}(?<!2013)(?<!2014)\s*(am|pm)
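One likely reason the command in the question fails is that grep -E uses POSIX extended regular expressions, which have no lookahead or lookbehind; GNU grep needs -P (PCRE) for those, where that option is available. As a sketch of the single-regex idea in an engine that does support lookahead, here is a Python version. The pattern is an assumption built from the question's own attempt, with am/pm made mandatory as in the answers above; TIME_RE and kept are names chosen for the example.

```python
import re

# The negative lookahead rejects anything starting with 2013 or 2014;
# the rest accepts an hour (0-23), optional minutes, and a mandatory am/pm.
TIME_RE = re.compile(r"^(?!201[34])(?:[01]?[0-9]|2[0-3])\s?(?:[0-5][0-9])?\s?[ap]m$")

lines = ["11 45 pm", "12 34 am", "1230pm", "2013pm", "12 pm", "12p", "2014 pm"]

kept = [line for line in lines if TIME_RE.match(line)]
print(kept)
```

Without the lookahead, "2013pm" would be accepted as hour 20 and minutes 13, which is why the exclusion has to be stated explicitly once hours above 12 are allowed.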

regex to skip intermediate characters

Given the string "Mon Feb 01 02:42:27 +0000 2013", what regex would extract "Feb 01 2013"?
The regex "\w{3}\s\w{3}\s\d{1,2}" yields "Mon Feb 01". How do I write a regex that ignores the day name (Mon) and the time?
I had asked a related question in an earlier post, which was kindly answered by the community - unable to parse whitespace in regex
This could work; the month, day and year end up in groups 2, 3 and 8:
(Mon|Tue|Wed|Thu|Fri|Sat|Sun) ([a-zA-Z]{3}) ([0-9]{2}) ([0-9]{2})\:([0-9]{2})\:([0-9]{2}) \+([0-9]{4}) ([0-9]{4})
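A regex match is always one contiguous span, so the unwanted parts cannot be skipped inside a single match; the usual approach is to capture only the wanted pieces and join them afterwards. A minimal sketch in Python (the language is an assumption, as the question does not specify one; DATE_RE and result are illustrative names):

```python
import re

DATE_RE = re.compile(
    r"\w{3}\s"                # day name, matched but not captured
    r"(\w{3})\s(\d{1,2})\s"   # month and day, captured
    r"\d{2}:\d{2}:\d{2}\s"    # time, matched but not captured
    r"[+-]\d{4}\s"            # UTC offset, matched but not captured
    r"(\d{4})"                # year, captured
)

text = "Mon Feb 01 02:42:27 +0000 2013"
m = DATE_RE.search(text)
# Join the three captured groups with single spaces.
result = " ".join(m.groups()) if m else None
print(result)
```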