Python Regex. Capturing text between matches issue


Quite new to regex and having trouble getting the right match.
I have the following string:
The AGM will be held at the Company's registered office at Unity House, Telford Road, Basingstoke, Hampshire, RG21 6YJ on 13 January 2016 at 10.00 a.m.
The Company announces that its 2016 Annual General Meeting will be held on 11 February 2016 at 10.00 a.m. at Hangar 89, London Luton Airport, Luton, Bedfordshire, LU2 9PF.
I am trying to extract the address from the last occurrence of 'at' till the postcode. So Unity House, Telford Road, Basingstoke, Hampshire, RG21 6YJ and Hangar 89, London Luton Airport, Luton, Bedfordshire, LU2 9PF
This is what I use (at)(?!.*at)(.*)\s([A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2})
it extracts only the second address. Any thoughts?
Thanks

It looks like you want ((?:(?!at).)*) instead of (?!.*at)(.*), so that the capture cannot skip over another at:
(at)((?:(?!at).)*)\s([A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2})
See demo at regex101
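If you use (at)(?!.*at)(.*) with the s flag, the lookahead only succeeds at the last at in the whole text, because no other at may appear anywhere ahead of it. So it is expected that only the last address matches. (at)((?:(?!at).)*) instead stops the capture before the next at, so the pattern can match once per address.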
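As a rough illustration of the answer above, here is a minimal Python sketch that applies the suggested pattern to the two sentences from the question. The names `POSTCODE`, `PATTERN` and `extract_addresses` are made up for this example, and joining groups 2 and 3 back into one address string is an assumption about the desired output.

```python
import re

# Postcode sub-pattern copied from the question.
POSTCODE = r"[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}"

# (?:(?!at).)* consumes one character at a time, but only while the next
# two characters are not "at", so the capture cannot run past another "at".
PATTERN = re.compile(r"(at)((?:(?!at).)*)\s(" + POSTCODE + r")")

TEXT = (
    "The AGM will be held at the Company's registered office at Unity House, "
    "Telford Road, Basingstoke, Hampshire, RG21 6YJ on 13 January 2016 at 10.00 a.m.\n"
    "The Company announces that its 2016 Annual General Meeting will be held on "
    "11 February 2016 at 10.00 a.m. at Hangar 89, London Luton Airport, Luton, "
    "Bedfordshire, LU2 9PF."
)


def extract_addresses(text):
    """Return every address found, as 'street part, POSTCODE'."""
    # Group 2 is the text after "at"; group 3 is the postcode.
    return [
        (m.group(2) + " " + m.group(3)).strip()
        for m in PATTERN.finditer(text)
    ]


addresses = extract_addresses(TEXT)
for address in addresses:
    print(address)
```

Note that the plain at also matches inside words such as "that"; here those attempts simply fail because no postcode follows before the next at.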

Related

Google sheets: Parse date from text

Google Sheets - Parsing
From given text, how do you extract the date?
Given text | Extracted date (to be generated)
Graduation reunion on Saturday, September 10, 2022 at 123 Front Street | September 10, 2022
BBQ Party on Sunday October 1, 2022 at 213 South Street | October 1, 2022
Google Sheets link
--
I've tried
=regexextract(A2,"\w{9} \d{2}, \d{4}")*1
As shown in the Google Sheets, this only works for the first one which is September 10, 2022. However, not all months have the same number of characters.

You may use either of the formulas below.
With this one, you have to drag the formula down to fill the rows below:
=REGEXEXTRACT(A2,", (.*?) at")
This one expands down the column automatically:
=ARRAYFORMULA(IF(A2:A="","",REGEXEXTRACT(A2:A,", (.*?) at")))
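The formula takes the characters after the first comma up to 'at'. Note that it relies on a comma directly before the date, as in the first sample row; in the second row the first comma follows the day, so only the year is captured.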

try:
=INDEX(IFNA(REGEXEXTRACT(A2:A, "(\w+ \d+, \d{4})")*1))
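For readers who want to try the pattern outside a spreadsheet, the following is a small Python sketch of the same idea: `\w+ \d+, \d{4}` matches a month name of any length, a day of one or more digits, and a four-digit year. The function names are hypothetical, and `datetime.strptime` with `%B` stands in for the `*1` date conversion; it assumes English month names in the default locale.

```python
import re
from datetime import datetime

# Same pattern as in the formula above: word, space, digits, comma, 4-digit year.
DATE = re.compile(r"\w+ \d+, \d{4}")


def extract_date_text(text):
    """Return the first 'Month D, YYYY' substring, or None if there is none."""
    m = DATE.search(text)
    return m.group(0) if m else None


def to_date(date_text):
    """Convert 'Month D, YYYY' to a date object (assumes English month names)."""
    return datetime.strptime(date_text, "%B %d, %Y").date()


rows = [
    "Graduation reunion on Saturday, September 10, 2022 at 123 Front Street",
    "BBQ Party on Sunday October 1, 2022 at 213 South Street",
]
for row in rows:
    found = extract_date_text(row)
    print(found, "->", to_date(found))
```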

How do I capture the Months with proper Regex?

How do I capture the days of months as numbers, excluding any suffixes. For instance - January 11th would be 11, and March 25th would be 25.

You could use the regex below and read only the third capturing group, which holds the day number.
It accepts three-letter month abbreviations (Jan 1st) as well as full names (January 1st), and allows a space, hyphen, comma or slash as separator, as in Jan 01, Jan-01, Jan,1st and Jan/31.
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?[ \/,-]{0,2}([0-3]?[0-9])
You would do better to look for native time manipulation if possible.
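A minimal Python sketch of reading the third capturing group might look like this. The helper name `day_of_month` is invented for the example, and the suffix list used here includes `ober` so that the full name October is covered.

```python
import re

# Group 1: month abbreviation, group 2: optional rest of the month name,
# group 3: the day number (any "st", "nd", "rd", "th" suffix is left unmatched).
MONTH_DAY = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"(uary|ruary|ch|il|e|y|ust|tember|ober|ember)?"
    r"[ /,-]{0,2}"
    r"([0-3]?[0-9])"
)


def day_of_month(text):
    """Return the day as an int, or None when no month/day pair is found."""
    m = MONTH_DAY.search(text)
    return int(m.group(3)) if m else None


for sample in ["January 11th", "March 25th", "Jan-01", "Jan/31", "October 1st"]:
    print(sample, "->", day_of_month(sample))
```

The pattern does not validate the day, so a value such as 39 would still be accepted; a date library is the safer choice when validation matters.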

Regex to find capitalized words and code string?

I'm looking to change a simple dash to an em dash in some obituaries we receive. But it's only after the City of Death where that em dash should go.
The text looks like this:
#M_DeathNoticeHed:Alex <\n>Ornelas
#M_DeathNoticeBod:ALAMO <\!-> Alex Ornelas <\n>, 25, died Tuesday, Aug. <\n>16, 2016 at Alamo. Me<\h>morial Funeral Home of <\n>San Juan is in charge of ar<\h>rangements.
#M_DeathNoticeHed:Almaquire Cadena
#M_DeathNoticeBod:RIO GRANDE CITY <\!-> Almaquire <\n>Cadena , 87, died Tues<\h>day, Aug. 16, 2016 at Pax <\n>Villa Hospice, in McAllen, <\n>TX. Sanchez Funeral Home <\n> of Rio Grande City is in <\n>charge of arrangements.
#M_DeathNoticeHed:AnaRose <\n>Collazi
#M_DeathNoticeBod:MISSION <\!-> AnaRose <\n>Collazo , 44, died Wednes<\h>day, Aug. 17, 2016 at Mis<\h>sion Regional Medical Cen<\h>ter in Mission. Virgil Wilson <\n>Mortuary of Mission is in <\n>charge of arrangements.
#M_DeathNoticeHed:Andy Garza
#M_DeathNoticeBod:RIO GRANDE CITY <\!-> Andy <\n>Garza , 21, died Tuesday, <\n>Aug. 16, 2016 at Chicago, <\n>IL. Rodriguez Funeral <\n>Home of Roma is in <\n>charge of arrangements.
Notice that after every "#M_DeathNoticeBod: CITY" is "<\!->" which symbolizes the dash that I need changed to an em dash.
My regex is not selecting the "<\!->" together with the preceding city and "#M_DeathNoticeBod:".
#M_DeathNoticeBod:([^A-Za-z]*?[A-Z][A-Za-z]*)([^A-Za-z]*?[A-Z][A-Za-z]*) [<\!->]
It is also not selecting cities with 3 names in it like "RIO GRANDE CITY". I'm selecting this because the dash appears in other spots in the file that I do not want replaced.
If I can select that section I can replace the dash here.

If the lines you care about always start with "#M_DeathNoticeBod:" followed by the city of death, followed by the <\!-> you wish to replace, I think something simple would do the job:
(#M_DeathNoticeBod:.*?)<\\!->
Capture group 1 will contain everything up to the first "<\!->" on the line (the lazy .*? is what makes it stop at the first one), so if you're doing a search and replace you can replace each occurrence of that regex with the contents of group 1 (usually denoted by '\1') followed by an em dash.
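As a sketch of that search and replace in Python (the question does not say which tool is used, so Python's `re.sub` is an assumption, and `fix_dashes` is a made-up name): group 1 is written back unchanged and only the `<\!->` marker after it is swapped for an em dash. The lazy `.*?` keeps group 1 from running past the first marker on the line.

```python
import re

EM_DASH = "\u2014"

# "<\\!->" in the regex matches the literal characters <\!-> in the text.
PATTERN = re.compile(r"(#M_DeathNoticeBod:.*?)<\\!->")


def fix_dashes(text):
    """Replace the first <\\!-> after each #M_DeathNoticeBod: with an em dash."""
    # A function is used as the replacement so no escaping rules apply to it.
    return PATTERN.sub(lambda m: m.group(1) + EM_DASH, text)


sample = (
    r"#M_DeathNoticeHed:Andy Garza" "\n"
    r"#M_DeathNoticeBod:RIO GRANDE CITY <\!-> Andy <\n>Garza , 21, died Tuesday"
)
print(fix_dashes(sample))
```

Because `.` does not match a newline by default, a marker on a line without the `#M_DeathNoticeBod:` prefix is left alone.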

This regex should do:
#M_DeathNoticeBod:([A-Z ]*) (<\\!->)

I think this is what you are actually looking for:
(?<=#M_DeathNoticeBod:).+<\\!->
To explain: the first portion, (?<=#M_DeathNoticeBod:), is a positive lookbehind. It does not become part of the match, but it ensures that what follows is always preceded by that expression.
The trailing portion, .+, should match any city name made up of any character sequence, followed by your <\!-> delimiter, which is matched by <\\!->.

Regex expression matches without specifying comma (,) in regex

I am using this website to learn regex and I am stuck at one particular lesson. It looks like the regex check is wrong there.
When I write (\w+\s\d+)((\,\d+)?) the "text" and "capture" columns go green, but the "result" column shows cross marks.
But if I write (\w+ (\d+)) it gives the result below.
your task | text | capture | result
capture text | Jan 1987 | Jan 1987, 1987 | ✓
capture text | May 1969 | May 1969, 1969 | ✓
capture text | Aug 2011 | Aug 2011, 2011 | ✓
Now, the question is: (\w+ (\d+)) does nothing to capture a comma, yet it is the right answer. In (\w+\s\d+)((\,\d+)?) I have specified the comma, but it is marked wrong. Why?

That's because the capture column tells you what you should capture. For example, Jan 1987, 1987 means you should capture two groups: 1) Jan 1987 2) 1987
They use the comma as a divider between the groups. So it is not part of the string you should capture, just a divider that tells you where the next expected capture group starts.
If you step to the next lesson, http://regexone.com/lesson/13, this becomes much clearer. In the text column there isn't any comma (e.g. 1280x720), but in the capture column you're asked for "1280, 720". So this supports my theory.
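To see the groups directly, the two patterns from the question can be run in Python (an assumption, since the lesson site itself is not tied to Python). The nested version yields exactly the two groups the lesson lists, while the other version yields three groups, two of which carry no text.

```python
import re

text = "Jan 1987"

# Nested groups: the outer group is the whole date, the inner group the year.
accepted = re.search(r"(\w+ (\d+))", text).groups()

# Here group 2 matches the empty string and the optional group 3 never
# takes part in the match, because the text contains no comma.
rejected = re.search(r"(\w+\s\d+)((\,\d+)?)", text).groups()

print(accepted)  # ('Jan 1987', '1987')
print(rejected)  # ('Jan 1987', '', None)
```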

Parsing dates from string - regex

I'm terrible with regex and I can't seem to wrap my head around this simple task.
I need to parse out the two dates in a string which always has one of two formats:
"Inquiry at your property for December 29, 2013 - January 03, 2014"
OR
"Inquiry at your property for 29 December , 2013 - 03 January, 2014"
The two different date formats are throwing me off. Any insights would be appreciated!

/(\d+ \w+, \d+|\w+ \d+, \d+)/ for example. Try it out on Rubular.
Of course, it would also pick up other things, like 2013 NotReallyAMonth, 12345. But if your input contains nothing that looks like a date without actually being one, this might work.
You could make the regexp stricter by applying more restrictions on what is matched:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4})/
In this case the day is always two digits, the year is 4. Months are listed explicitly (you would have to list all of them).
Update: For ranges it would be a different regexp:
/((?:Jan|Dec) \d+ - \d+, \d{4})/
Obviously they can all be combined together:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4}|(?:Jan|Dec) \d+ - \d+, \d{4})/
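The answer's patterns are written for Ruby (Rubular); as a hedged Python sketch of the same approach, the loose pattern can be combined with `datetime.strptime` to get real dates. Two things here are assumptions rather than part of the answer: the pattern allows an optional space before the comma, because the second sample in the question contains "29 December , 2013", and the names `DATE`, `FORMATS` and `parse_dates` are invented for the example.

```python
import re
from datetime import datetime

# Day-first or month-first; " ?" tolerates a stray space before the comma.
DATE = re.compile(r"\d+ \w+ ?, \d+|\w+ \d+, \d+")

# Month-first and day-first layouts (assumes English month names).
FORMATS = ("%B %d, %Y", "%d %B, %Y")


def parse_dates(text):
    """Return all dates found in text as date objects, in order of appearance."""
    dates = []
    for raw in DATE.findall(text):
        cleaned = re.sub(r"\s+,", ",", raw)  # "December , 2013" -> "December, 2013"
        for fmt in FORMATS:
            try:
                dates.append(datetime.strptime(cleaned, fmt).date())
                break
            except ValueError:
                continue  # not this layout, try the next one
    return dates


first = "Inquiry at your property for December 29, 2013 - January 03, 2014"
second = "Inquiry at your property for 29 December , 2013 - 03 January, 2014"
print(parse_dates(first))
print(parse_dates(second))
```

Candidates that look like a date but do not parse in either layout are silently skipped, which also filters out false positives such as "NotReallyAMonth".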
