In the string below, I need to use re.findall to find '(any day)' and then print everything back to the preceding ',' delimiter.
rr='PU3lserver1^server2|ABAP|Revisions|true|null|Weekend
only,ATN|server3|ABAP|Revisions|true|null|1:00 AM to 3:00 AM CET (any
day),B4P|server4^server5|ABAP|Revisions|true|Generic AFL|8:00 PM to 3:00 AM
CET (any day),C8B|server6|ABAP|Revisions|true|Generic AFL|8:00 PM to 3:00 AM
CET (any day),QU8|testserver|ABAP|Revisions|true|null|1:00 AM to 3:00 AM CET
(any day),S77|testserver|ABAP|Revisions|true|null|Weekend only'
This works as expected:
re.findall(r'[^\s,]+Weekend\s\bonly' ,rr, re.M)
Does not work as expected:
re.findall(r'[\s,]+\(any\s\bday\)' ,rr, re.M)
Any help or suggestions on where I am going wrong?
You were almost there.
All you need to do is negate the character class:
r'[^,]+\(any\s\bday\)'
[^,]+ is a negated character class. It matches one or more characters other than ',', so the match runs from just after the previous comma up to '(any day)'.
re.M can be dropped because it only changes how ^ and $ behave, and the pattern uses neither.
Test
>>> re.findall(r'[^,]+\(any\s\bday\)' ,rr)
['B4P|server4^server5|ABAP|Revisions|true|Generic AFL|8:00 PM to 3:00 AM \nCET (any day)',
'C8B|server6|ABAP|Revisions|true|Generic AFL|8:00 PM to 3:00 AM \nCET (any day)',
'QU8|testserver|ABAP|Revisions|true|null|1:00 AM to 3:00 AM CET \n(any day)']
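To see the difference between the negated class and the original attempt in isolation, here is a minimal sketch. The string is a shortened, single-line stand-in for rr (not the original data), and the variable names are only for illustration.

```python
import re

# Shortened, hypothetical stand-in for the question's string (single line).
rr = ("PU3|server1|ABAP|true|Weekend only,"
      "ATN|server3|ABAP|true|1:00 AM to 3:00 AM CET (any day),"
      "S77|testserver|ABAP|true|Weekend only")

# [^,]+ consumes everything since the previous comma, so the whole
# comma-delimited record ending in "(any day)" is returned.
any_day = re.findall(r'[^,]+\(any\s\bday\)', rr)

# The original attempt, [\s,]+, only matches whitespace or commas that
# sit directly in front of "(any day)", so the record itself is lost.
original_attempt = re.findall(r'[\s,]+\(any\s\bday\)', rr)

print(any_day)
print(original_attempt)
```

With this sample, the first call returns the full ATN record and the second returns only the space plus '(any day)'.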
Related
I have the following regex that I have been working on:
^(\d\d)\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d{4})?$
I am trying to grab the date from an email header that is formatted like so:
"Mon, 18 Nov 2019 09:19:17 -0700 (MST)"
and I want the result to be:
18 Nov 2019
It seems that the \s for whitespace could be the culprit, but I have yet to find another forum result that grabs dates with whitespace instead of "-" or "/".
Does anyone have any suggestions for getting this working to extract as described above? Thanks in advance.
The problem is that you have added the "^" and "$" anchors at the start and end of the regex.
"^n": The ^ anchor matches n only at the beginning of the string.
"n$": The $ anchor matches n only at the end of the string.
Since the text does not start with two digits (\d\d) and does not end with a four-digit year (\d{4}), this regex returns no result.
You can simply remove those two symbols, or use the following code:
/(\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4})/.exec("Mon, 18 Nov 2019 09:19:17 -0700 (MST)")[1]
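For comparison, a rough Python equivalent of the same idea: drop the anchors and search anywhere in the header. The function name extract_date is made up for this sketch, and \d{1,2} is an assumption that single-digit days can occur.

```python
import re

# No ^ or $ anchors, so the date may appear anywhere in the header.
DATE_RE = re.compile(
    r"\b(\d{1,2})\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d{4})\b"
)

def extract_date(header):
    """Return the 'DD Mon YYYY' part of a header, or None if absent."""
    match = DATE_RE.search(header)
    return match.group(0) if match else None

print(extract_date("Mon, 18 Nov 2019 09:19:17 -0700 (MST)"))
```

Using search instead of match is what makes the anchors unnecessary: match only tries at the start of the string.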
I am vexed - and I suspect there is an easy solution to this but after a fair amount of research, I'm reaching out to the community.
I'm using the regex method in vba to try to split strings. What I want to occur is that the entire string will match the pattern unless there is another name in the string. The name can be described by:
"\s?[a-zA-Z-]*,\s[a-zA-Z]*:\s.*"
I would expect that the method would return everything after the name is matched - until another name is matched. This would be the desired outcome.
The strings I'm applying that pattern to are:
Meck, Mary: Fri 6/14/2019 5:00 PM -- 10:00 PM CLERKPETRO Flinstone, Fred: Fri 6/14/2019 10:00 AM -- 4:00 PM CLERKPETRO Powers, Kenny: Fri 6/14/2019 10:00 PM -- 11:00 PM
Rhodes, Randy: Sat 6/15/2019 10:15 AM -- 11:30 AM SERVCNTR Sat 6/15/2019 11:30 AM -- 12:45 PM CLICK AND PICK Sat 6/15/2019 12:45 PM -- 2:15 PM SERVCNTR
When I apply the pattern to either string, the entire string is returned. This is not what I want, because I'm trying to split on names using matches(0), matches(1), etc. The first string should match on:
Meck, Mary: Fri 6/14/2019 5:00 PM -- 10:00 PM CLERKPETRO
Flinstone, Fred: Fri 6/14/2019 10:00 AM -- 4:00 PM CLERKPETRO
Powers, Kenny: Fri 6/14/2019 10:00 PM -- 11:00 PM
yet the second string should match on the entire string (as it currently does) because there is not a second name in that string.
How do I solve this problem?
This is one way to do it
\b[a-zA-Z-]+,\s?[a-zA-Z]+:.*?(?=\b[a-zA-Z-]+,\s?[a-zA-Z]+:|$)
https://regex101.com/r/ccj6ea/1
Expanded
\b
[a-zA-Z-]+
,
\s?
[a-zA-Z]+
:
.*?
(?=
\b
[a-zA-Z-]+
,
\s?
[a-zA-Z]+
:
|
$
)
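The question is about VBA, but the lazy match plus lookahead can be sketched in Python to check it against both sample strings. The helper name split_on_names and the strip() call (which drops the space left before the next name) are additions for this sketch.

```python
import re

NAME = r"\b[a-zA-Z-]+,\s?[a-zA-Z]+:"
# Lazily consume text until the next name (lookahead) or the end of the string.
RECORD = re.compile(NAME + r".*?(?=" + NAME + r"|$)")

def split_on_names(text):
    """Return one entry per 'Last, First:' record found in text."""
    return [record.strip() for record in RECORD.findall(text)]

first = ("Meck, Mary: Fri 6/14/2019 5:00 PM -- 10:00 PM CLERKPETRO "
         "Flinstone, Fred: Fri 6/14/2019 10:00 AM -- 4:00 PM CLERKPETRO "
         "Powers, Kenny: Fri 6/14/2019 10:00 PM -- 11:00 PM")
second = ("Rhodes, Randy: Sat 6/15/2019 10:15 AM -- 11:30 AM SERVCNTR "
          "Sat 6/15/2019 11:30 AM -- 12:45 PM CLICK AND PICK "
          "Sat 6/15/2019 12:45 PM -- 2:15 PM SERVCNTR")

for record in split_on_names(first):
    print(record)
print(len(split_on_names(second)))
```

The first string yields three records and the second yields one, because the lookahead never finds a second name there and falls through to $.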
RegEx 1
I'm guessing that we wish to capture three parts of each string listed in the question. If so, we could start by slightly modifying the original expression:
(?:\s+)?([a-zA-Z-]+),?(?:\s+)?([a-zA-Z]+):(.+?[A-Z]{3,}).*
where our desired outputs are in these three groups:
([a-zA-Z-]+)
([a-zA-Z]+)
(.+?[A-Z]{3,})
Demo
RegEx 2
If we wish to split them on names, we would simplify our expression to:
(?:\s+)?([A-Z][a-zA-Z-]+),?(?:\s+)?([A-Z][a-zA-Z]+):
Demo 2
I'm tasked with capturing the dates of itineraries in email messages, but the dates come in many different formats. Is there any way to capture the following formats?
02 APR
APR 02
2 APR
APR 2
2nd APR
APR 2nd
2nd April
April 2nd
APR 12th
April 12th
12th April
April 13-16
13-16 April
APR 13-16
13-16 APR
April 13th-16th
13th-16th April
APR 13th-16th
13th-16th APR
I've tried numerous approaches but could not work it out, as I'm new to regex.
The closest I could get was using this:
(\d*)-(\d*) APR|April \d*\d*
EDIT: I found that I had missed some more formats.
13th - 16th APR
13~16 April
13/16 APR
I've tried using the following:
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?: * \d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?: . \d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
This captures dates either with a space or without one.
Is there a way to capture all formats, split the date ranges on '-', '/', or '~', and write the output in a single standardized format?
(Group 1 Date)-Month (Group 2 Date)-Month, e.g. 13-Apr 16-Apr
I'd appreciate any suggestions and comments.
You need to account for optional values. Here is an enhanced version matching your sample input:
/(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)?|Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?/i
See the regex demo (note you need to use a case-insensitive modifier to match any variants of April)
Basically, there are 2 alternatives matching April and date ranges:
(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)?\s*Apr(?:il)? - 1+ digits followed with an optional st, nd, rd, th, followed with an optional hyphen, followed with 0+ digits, followed with optional st, etc. followed with 0+ whitespace and then Apr or April (case insensitive due to /i modifier)
| - or
Apr(?:il)?\s*(\d+)(?:st|[nr]d|th)?-?(\d*)(?:st|[nr]d|th)? - the same as above but swapped.
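The two-alternative structure can be sketched in Python to produce the "13-Apr 16-Apr" output the question asks for. This is an illustration under assumptions, not the answer's exact regex: the separator class [-~/] with optional spaces comes from the question's EDIT, the name normalize is made up, and only April is handled.

```python
import re

# One or two day numbers, each with an optional ordinal suffix.
# Assumption: ranges may be separated by '-', '~' or '/', optionally spaced.
DAYS = r"(\d{1,2})(?:st|[nr]d|th)?(?:\s*[-~/]\s*(\d{1,2})(?:st|[nr]d|th)?)?"
APRIL = r"Apr(?:il)?"

# Alternative 1: days then month (groups 1, 2).
# Alternative 2: month then days (groups 3, 4).
PATTERN = re.compile(rf"\b{DAYS}\s*{APRIL}\b|\b{APRIL}\s*{DAYS}\b", re.IGNORECASE)

def normalize(text):
    """Return e.g. '13-Apr 16-Apr' for the first April date in text, else None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    first = match.group(1) or match.group(3)
    second = match.group(2) or match.group(4)
    days = [first] + ([second] if second else [])
    return " ".join(f"{int(day)}-Apr" for day in days)

for sample in ("02 APR", "April 12th", "13th - 16th APR", "13~16 April"):
    print(sample, "->", normalize(sample))
```

Only one alternative can match at a time, so the unused pair of groups is None and the `or` picks whichever side matched.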
I came up with this Regex:
(?:APR|April)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:APR|April)
See details here: Regex101
Maybe it's overkill, but I came up with this regex that will match with any month:
(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:January|JAN|February|FEB|March|MAR|April|APR|May|MAY|June|JUN|July|JUL|August|AUG|September|SEP|October|OCT|November|NOV|December|DEC)
Unreadable, check here if you want details: Regex101
Improved version using Wiktor Stribiżew's trick:
(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\ *\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?|\d+(?:[nr]d|th|st)?(?:-\d+(?:[nr]d|th|st)?)?\ *(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)
See details here: Regex101
It matches every month and uses fewer steps (more efficient).
BUT you need to make sure matching is case-insensitive.
I came up with this:
(\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?\s*(?:APR|April))|((?:APR|April)\s*\d+(?:th|st|[nr]d)?(?:-\d+(?:th|st|[nr]d)?)?)
Live Demo
I have a date in the following format which is how it is stored in an external application
06/12/2014 6:31 PM IST
I want to convert it to the following format. The conversion needs to take AM and PM into account, and the seconds and UTC offset can be hardcoded as :00+05:30
2014-06-12 18:31:00+05:30
Since it is not a simple GMT to PDT change, I tried going through epoch time, but the problem is that the date is in a different format.
You can just do it with a regex and printf. In your particular case I suggest:
my $yourstring = '06/12/2014 6:31 PM IST';
my ($mm,$dd,$yy,$hh,$mi,$ampm) = $yourstring =~ m~(\d+)/(\d+)/(\d+)\s+(\d+):(\d+)\s+(AM|PM)\s+~;
# Convert the 12-hour clock to 24-hour: 12 AM is hour 0, 1-11 PM become 13-23
if ($ampm eq 'PM' && $hh < 12) { $hh += 12 }
elsif ($ampm eq 'AM' && $hh == 12) { $hh = 0 }
printf('%04d-%02d-%02d %02d:%02d:00+05:30', $yy, $mm, $dd, $hh, $mi);
or look into DateTime::Format::DateParse
The best thing to do is to use Perl's now built in Date/Time functions in Time::Piece. Once you convert your time into a true Perl time, you can format it any way you please.
The problem is that Time::Piece doesn't do a very good job at handling timezones. The easiest way of handling timezones on strings is to simply remove them:
use strict;
use warnings;
use Time::Piece;
# This is what you have given
my $input_time="6/12/2014 6:31 PM IST";
# Remove timezone (the last four characters)
my $munged_time = substr $input_time, 0, -4;
# This is a description of your "format" above
# You can find this by doing a "man date" on Unix.
# Sometimes it's "man strftime".
#
# The various %x are bits that represent the time
# in various formats.
# %m = month %d = day of month %Y = four digit year
# %I = hour 01-12 %M = Minutes %p = AM/PM
my $time_format="%m/%d/%Y %I:%M %p";
# Now convert the time from the format to a Time::Piece
# object.
my $time = Time::Piece->strptime($munged_time, $time_format);
# Now that the time is in a Time::Piece object, we can easily
# manipulate it to do whatever we want. We could add or subtract
# date times from each other, add or subtract hours, months, etc.
# Here, I'm just printing out the various bits and pieces of
# "datetime" that I am interested in. Note I'm using time->second
# even though I never specified seconds. It's okay, Time::Piece
# simply assumes that seconds are zero.
#
printf "%04d-%02d-%02d %02d:%02d:%02d+05:30\n",
$time->year, $time->mon, $time->mday,
$time->hour, $time->minute, $time->second;
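The same parse-then-format idea can be sketched in Python with datetime. Assumptions in this sketch: the trailing zone abbreviation is simply dropped (as in the Perl above), the +05:30 offset is hardcoded as the question requests, the function name reformat is made up, and %p is parsed under an English/C locale.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the offset is always +05:30, as hardcoded in the question.
IST = timezone(timedelta(hours=5, minutes=30))

def reformat(stamp):
    """Convert 'MM/DD/YYYY H:MM AM|PM ZONE' to 'YYYY-MM-DD HH:MM:SS+05:30'."""
    # Drop the trailing zone abbreviation instead of trying to parse it.
    without_zone = stamp.rsplit(" ", 1)[0]
    parsed = datetime.strptime(without_zone, "%m/%d/%Y %I:%M %p")
    # Attach the fixed offset; seconds default to zero.
    return parsed.replace(tzinfo=IST).isoformat(sep=" ")

print(reformat("06/12/2014 6:31 PM IST"))
```

Because %I and %p handle the 12-hour clock, no manual AM/PM arithmetic is needed, including for 12 AM and 12 PM.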
The question does not contain any information about which time formats are possible at all. Are day of month and month always with 2 digits? Is the hour in range 0 to 9 always without a leading 0? Is it possible that the minutes 0 to 9 are without a leading 0? Which time zones are possible, just IST or others as well?
Here is a list of Perl regular expression search and replace strings. All of them must be applied to reformat date and time strings for time zone IST into the format YYYY-MM-DD hh:mm:00+05:30
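Reformat all times with AM in string. Note that this rule leaves 12:xx AM as 12:xx; midnight would need its own rule that maps hour 12 to 00.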
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +([01]?\d:[0-5]?\d) AM IST
Replace: $3-$1-$2 $4:00+05:30
Reformat all times with 12:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +(12:[0-5]?\d) PM IST
Replace: $3-$1-$2 $4:00+05:30
Reformat all times with 01:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?1(:[0-5]?\d) PM IST
Replace: $3-$1-$2 13$4:00+05:30
Reformat all times with 02:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?2(:[0-5]?\d) PM IST
Replace: $3-$1-$2 14$4:00+05:30
Reformat all times with 03:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?3(:[0-5]?\d) PM IST
Replace: $3-$1-$2 15$4:00+05:30
Reformat all times with 04:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?4(:[0-5]?\d) PM IST
Replace: $3-$1-$2 16$4:00+05:30
Reformat all times with 05:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?5(:[0-5]?\d) PM IST
Replace: $3-$1-$2 17$4:00+05:30
Reformat all times with 06:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?6(:[0-5]?\d) PM IST
Replace: $3-$1-$2 18$4:00+05:30
Reformat all times with 07:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?7(:[0-5]?\d) PM IST
Replace: $3-$1-$2 19$4:00+05:30
Reformat all times with 08:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?8(:[0-5]?\d) PM IST
Replace: $3-$1-$2 20$4:00+05:30
Reformat all times with 09:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +0?9(:[0-5]?\d) PM IST
Replace: $3-$1-$2 21$4:00+05:30
Reformat all times with 10:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +10(:[0-5]?\d) PM IST
Replace: $3-$1-$2 22$4:00+05:30
Reformat all times with 11:xx PM in string.
Search: ([01]?\d)/([0-3]?\d)/([12][09]\d\d) +11(:[0-5]?\d) PM IST
Replace: $3-$1-$2 23$4:00+05:30
Insert a leading 0 where it is missing in the date (month, day of month) or time (hour, minute) using an expression with a positive lookbehind and a positive lookahead:
Search: (?<=[ \-:])(\d)(?=[ \-:])
Replace: 0$1
If the expression above does not work because lookbehind or lookahead is not supported, a less restrictive regular expression can be used instead:
Search: \b(\d)\b
Replace: 0$1
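As an alternative to one rule per hour, the whole rule list can be collapsed into a single substitution with a replacement function. This is a sketch in Python rather than Perl; the names STAMP, to_24h and reformat_all are invented, and it assumes every timestamp ends in IST.

```python
import re

# One pattern for all cases; the callback does the 12h -> 24h arithmetic.
STAMP = re.compile(
    r"([01]?\d)/([0-3]?\d)/(\d{4}) +([01]?\d):([0-5]?\d) (AM|PM) IST"
)

def to_24h(match):
    """Build 'YYYY-MM-DD hh:mm:00+05:30' from one matched timestamp."""
    month, day, year, hour, minute = (int(g) for g in match.groups()[:5])
    # 12 AM -> 0, 12 PM -> 12, 1-11 PM -> 13-23
    hour = hour % 12 + (12 if match.group(6) == "PM" else 0)
    # Zero-padding in the format string replaces the separate leading-0 rule.
    return f"{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:00+05:30"

def reformat_all(text):
    """Rewrite every IST timestamp found in text."""
    return STAMP.sub(to_24h, text)

print(reformat_all("saved 06/12/2014 6:31 PM IST, sent 6/3/2014 12:05 AM IST"))
```

Doing the arithmetic in the callback also covers 12:xx AM, which the AM rule above leaves unchanged.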
Given the following string "Mon Feb 01 02:42:27 +0000 2013", what should the regex be to get the following "Feb 01 2013".
The following regex "\w{3}\s\w{3}\s\d{1,2}" will yield "Mon Feb 01". How do I write a regex to ignore the day (Mon) and the time and secs ?
I had asked a related question in an earlier post, which was kindly answered by the community - unable to parse whitespace in regex
This could work. The month, day, and year are captured in groups 2, 3, and 8:
(Mon|Tue|Wed|Thu|Fri|Sat|Sun) ([a-zA-Z]{3}) ([0-9]{2}) ([0-9]{2})\:([0-9]{2})\:([0-9]{2}) \+([0-9]{4}) ([0-9]{4})
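A short Python sketch of the same idea: match the whole timestamp, but capture only the month, day, and year, then join them. The function name short_date is hypothetical, and the pattern assumes the layout is always "Day Mon DD hh:mm:ss +zzzz YYYY".

```python
import re

# Only the month, day of month and year are captured; the weekday, time
# and UTC offset are matched but not kept.
STAMP = re.compile(
    r"^\w{3}\s(\w{3})\s(\d{1,2})\s\d{2}:\d{2}:\d{2}\s[+-]\d{4}\s(\d{4})$"
)

def short_date(stamp):
    """Return 'Mon DD YYYY' from a full timestamp, or None if it does not fit."""
    match = STAMP.match(stamp)
    return " ".join(match.groups()) if match else None

print(short_date("Mon Feb 01 02:42:27 +0000 2013"))
```

A single regex match cannot skip over the time in the middle, so the pieces are captured separately and reassembled.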