regex to select text between 2 string? - regex

I have the following strings:
Month: March 2011
Month: January 2012
Month: December 2011
and I'd like to write a regex that selects the name of the month (i.e. "March") only for 2011. This means selecting everything between the string "Month: " and the year "2011".
The regex I made is
^(Month:)[A-Za-z0-9]+(2011)$
but it doesn't seem to work. What's wrong?
The results should be "March" and "December".
Thanks!

Use a lookahead:
/[a-z]+(?= 2011)/ig
See it here in action: http://regex101.com/r/fE9lR9
Here's a JavaScript demo: http://jsfiddle.net/rZJur/
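As a quick check of the lookahead idea, here is a minimal Python sketch. Python is an assumption here (the demos linked above use JavaScript), and the variable names are illustrative only.

```python
import re

text = "Month: March 2011\nMonth: January 2012\nMonth: December 2011"

# [a-z]+ matches a run of letters; (?= 2011) requires " 2011" right after it
# without making it part of the match. re.IGNORECASE mirrors the /i flag.
months_2011 = re.findall(r"[a-z]+(?= 2011)", text, flags=re.IGNORECASE)
print(months_2011)  # ['March', 'December']
```

Because the lookahead consumes nothing, the match itself is just the month name, so no capture group is needed.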

This expression works (lookbehind / lookahead): (?<=Month\:\s)(.+?)(?=\s2011)
Edit:
With just non-capturing groups, this works (the month ends up in capture group 1): (?:Month\:\s)(.+?)(?:\s2011)

It isn't matching because you're not accounting for the whitespace after "Month:" and between the month and the year; [A-Za-z0-9]+ cannot match a space.
You could capture the month group after accounting for the space:
^(?:Month: )([A-Za-z]+)(?: 2011)$
Or you could use a look ahead/lookbehind combo:
(?<=^Month: )[A-Za-z]+(?= 2011$)
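The difference between the original pattern and the capturing fix can be checked with a short Python sketch. It assumes the three lines arrive as one multi-line string, so re.MULTILINE is used to let ^ and $ match at every line; that flag is an assumption about the input, not something stated in the question.

```python
import re

text = "Month: March 2011\nMonth: January 2012\nMonth: December 2011"

# Original attempt: [A-Za-z0-9]+ cannot match the spaces around the month.
broken = re.findall(r"^(Month:)[A-Za-z0-9]+(2011)$", text, flags=re.MULTILINE)

# Fixed version: the literal spaces are matched and only the month is captured.
fixed = re.findall(r"^(?:Month: )([A-Za-z]+)(?: 2011)$", text, flags=re.MULTILINE)

print(broken)  # []
print(fixed)   # ['March', 'December']
```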

Related

REGEXP_EXTRACT specific string to extract year or month in Google Data Studio

I'm trying to categorize my sites, but they don't always have the same URI structure, so I want to extract the year into one column and the month into a second one.
The results should be the year and months in separate columns/fields:
url                                            year    months
/www.site.com/path1/resort/2021/02/sitename    2021    02
/www.site.com/path1/2021/02                    2021    02
/www.site.com/path1/2020/11-12                 2020    11-12
/www.site.com/path1/2020/07-08                 2020    07-08
/www.site.com/path1/resort/                    null    null
The following regex for the year worked:
REGEXP_EXTRACT(url,'([0-9]{4})') >> result: 2020, null etc.
but the regex for the month didn't extract only the months:
REGEXP_EXTRACT(url,'((?:[0-9]{4}/)[0-9]+.?[0-9]*/)') >> result: 2020/11-12/,2021/02/, null etc.
Thanks for the help in advance.
You can use
(?:^|/)((?:19|20)[0-9]{2})/((?:0?[1-9]|1[0-2])(?:-(?:0?[1-9]|1[0-2]))?)(?:/|$)
See the regex demo.
If you need to capture only one value per match, replace the other capturing group with a non-capturing one, or remove the extra pattern:
REGEXP_EXTRACT(col_url, '(?:^|/)((?:19|20)[0-9]{2})(?:/|$)') as Year
REGEXP_EXTRACT(col_url, '(?:^|/)((?:0?[1-9]|1[0-2])(?:-(?:0?[1-9]|1[0-2]))?)(?:/|$)') as Month
Details:
(?:^|/) - string start or /
((?:19|20)[0-9]{2}) - Group 1: a year, 19 or 20 followed by any two digits
/ - a / char
((?:0?[1-9]|1[0-2])(?:-(?:0?[1-9]|1[0-2]))?) - Group 2 (month): an optional 0 and then 1 to 9, or 1 and then 0 to 2 (i.e. 1-12, with an optional leading zero), and then an optional occurrence of - and the same month pattern
(?:/|$) - / or end of string.
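The combined pattern can be tried outside Data Studio with a small Python sketch. Note the assumptions: this uses Python's re module rather than Data Studio's REGEXP_EXTRACT, and extract_year_month is a hypothetical helper name introduced only for illustration.

```python
import re

# Combined pattern from the answer: group 1 = year, group 2 = month or month range.
PATTERN = re.compile(
    r"(?:^|/)((?:19|20)[0-9]{2})/"
    r"((?:0?[1-9]|1[0-2])(?:-(?:0?[1-9]|1[0-2]))?)(?:/|$)"
)

def extract_year_month(url):
    """Return (year, month) as strings, or (None, None) when nothing matches."""
    m = PATTERN.search(url)
    return (m.group(1), m.group(2)) if m else (None, None)

urls = [
    "/www.site.com/path1/resort/2021/02/sitename",
    "/www.site.com/path1/2021/02",
    "/www.site.com/path1/2020/11-12",
    "/www.site.com/path1/2020/07-08",
    "/www.site.com/path1/resort/",
]
for url in urls:
    print(url, extract_year_month(url))
```

The last URL has no year/month segment, so the helper returns (None, None), which corresponds to the null values in the expected output.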

Regex, 1) call last matching group 2) match exact word in a line( not partially matching in a line)

UiPath-based regex: I'm trying to get matches in UiPath for
(1) the last line of a matching group
(2) lines that match as a whole (not a partial match within a line)
RawData
(this is just part of the full data)
MAT year 2019
MAT year 2020
MAT year 2021
year 2016
year 2017
year 2018
Expected outcome (1)
MAT year 2021
Expected outcome (2)
year 2017
year 2018
year 2019
P.S. This should not include the "year" part of the first three lines:
year 2019
year 2020
year 2021
Solution (1) I tried:
Get the index variable from a For Each loop up to the very last item, then use RawData(IndexVariable).ToString
(not working; RawData(2).ToString works, but the index will not always be 2)
Regex for (1): MAT to (\d\d|\d)/(\d\d|\d)/\d\d\d\d
Solution (2) I tried:
Regex for (2): Year\s\d\d\d\d
and (?!mat)(Year\s\d\d\d\d) (P.S. the lookahead is not working)
Remarks: I also tried ^ and $, but these only match the first or the last line, not the start of every line.
I'm guessing that your desired expression might be:
^MAT\syear\s\d{4}(?=(?:\s*year\s\d{4}))$|^(?!MAT\s)year\s\d{4}$
Demo
If you wish to simplify/modify/explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also see at this link how it would match against some sample inputs.
After a long investigation, I settled on an answer that fits my case, together with the solution for query 2 from @Emma.
Explanation of the query 1 solution: for each query I run, get the last match, on the condition that all lines starting with MAT are grouped together and not interleaved like below:
MAT year 2012
year 2019
MAT year 2322
Solution:
^MAT(?:.(?!\nMAT))+$
This matches a line that starts with MAT and whose next line does not begin with MAT.
Solution for query 2, which only selects lines that match exactly:
^year\s\d{4}$
i.e. a line that starts with "year" and ends with four digits (\d repeated 4 times).
Also, learning from @Emma, the two queries can be combined into one by joining them with |, so that a single run returns both values:
^MAT(?:.(?!\nMAT))+$|^year\s\d{4}$
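A minimal Python sketch of the combined pattern against the sample data follows. It assumes multiline mode and plain \n line endings; UiPath uses the .NET regex engine, so treat this only as a check of the pattern's logic, not of UiPath behaviour.

```python
import re

raw_data = "\n".join([
    "MAT year 2019", "MAT year 2020", "MAT year 2021",
    "year 2016", "year 2017", "year 2018",
])

# First alternative: a MAT line in which no character is directly followed by
# "\nMAT", i.e. the last line of a consecutive MAT block.
# Second alternative: a whole line of the form "year NNNN".
combined = re.compile(r"^MAT(?:.(?!\nMAT))+$|^year\s\d{4}$", re.MULTILINE)

matches = combined.findall(raw_data)
print(matches)  # ['MAT year 2021', 'year 2016', 'year 2017', 'year 2018']
```

The first two MAT lines are rejected because their final character is followed by a newline and another MAT line, which the negative lookahead forbids.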

Complex Regex finding date and time

Is there someone to help me with the following:
I'm trying to find specific date and time strings in a text (to be used within VBA Word).
Currently working with the following RegEx string:
(?:([0-9]{1,2})[ |-])?(?:(jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?(?: |-)?(?(3)(?: around | at | ))?(?:([0-9]{1,2}:[0-9]{1,2})?(?: uur| u|u)?)?
Tested output on following text:
date with around time: 26 sep 2016 around 09:00u
date with at time: 1 sep 2016 at 09:00 uur
date and time u: 1 sep 2018 09:00 u
time without date: 08:30 uur
date with time u: 1 sep 2016 at 09:00u
only time: 09:00
only month: jan
month and year: feb 2019
only day: 02
only day with '-': 2-
day and month: 2 jan
month year: jan 2018
date with '-': 2-feb-2018 09:00
other month: 01 sept 2016
full month: 1 september 2018
shortened year: jul '18
Rules:
a date followed by time is valid
a date followed by text 'around' or 'at', followed by time is valid
a date without day number is valid
a date without year is valid
a date, month only is not valid
a day, without month or year not valid
a date may contain dashes '-'
a year may be shortened with ', like jun '18
month name can be short or long
full match includes ' uur' or 'u' (to highlight the text in ms-Word)
submatches text from capture are without prepending or trailing spaces
example at: [https://regex101.com/r/6CFgBP/1/]
Expected output (when using in VBA Word):
A regex Matches collection object in which each Match.SubMatches contains the individual items d, m, y, hh:mm from the capture groups in the regex search string.
So for example 1, the SubMatches (or capture groups) contain the values '26', 'sep', '2016', '09:00'.
The RegEx works fine, but some false-positives need to be excluded:
A day without month/year should be excluded from the regex (examples 9 and 10)
A month without a day should be excluded (example 7)
(I was trying with some lookahead and the references \1 and ?(1), but was not able to get it running properly...)
Any advice highly appreciated!
As I understand it, you require that each date/time part (day, month, year, hour and minute) must be present.
So you should remove the ? after the relevant groups (they are not optional).
It is also good practice to capture each part in its own capturing group.
There is no need to write something like jun(?:i)?. It is enough (and easier to read) to write just juni? (the ? applies only to the preceding i).
Another hint: as the regex language has the \d character class, use it instead of [0-9] (the regex is shorter and easier to read).
The optional parts (at / around) should be an optional, non-capturing group.
Anything after the minute part is not needed in the regex.
So I propose a regex like the one below (for readability, I split it into rows):
(\d{1,2})[ -](jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?
|juli?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?)
[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})
Details:
(\d{1,2}) - Day.
[ -] - A separator after the day (either a space or a minus).
(jan(?:uari)?|...dec(?:ember)?) - Month.
[ -] - A separator after the month.
(\d{4}) - year.
(?:around |at )? - Actually, 3 variants of a separator between year and hour (space / around / at); note the space before (...)?.
(\d{1,2}:\d{1,2}) - Hour and minute.
It matches variants 1, 2, 3, 5 and 13.
All remaining fail to contain each required part, so they are not matched.
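The claim about which variants match can be checked with a short Python sketch. The rows of the proposed regex are joined back into one pattern; the names MONTH, STRICT and samples are illustrative, and Python's re module stands in for the VBA RegExp object here, which is an assumption.

```python
import re

MONTH = (
    r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
    r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?"
)
# day, separator, month, separator, year, space, optional "around "/"at ", hh:mm
STRICT = re.compile(
    r"(\d{1,2})[ -](" + MONTH + r")[ -](\d{4}) (?:around |at )?(\d{1,2}:\d{1,2})"
)

samples = [
    "date with around time: 26 sep 2016 around 09:00u",
    "date with at time: 1 sep 2016 at 09:00 uur",
    "date and time u: 1 sep 2018 09:00 u",
    "time without date: 08:30 uur",
    "date with time u: 1 sep 2016 at 09:00u",
    "only time: 09:00",
    "only month: jan",
    "month and year: feb 2019",
    "only day: 02",
    "only day with '-': 2-",
    "day and month: 2 jan",
    "month year: jan 2018",
    "date with '-': 2-feb-2018 09:00",
    "other month: 01 sept 2016",
    "full month: 1 september 2018",
    "shortened year: jul '18",
]

# 1-based numbers of the sample lines that contain a full date plus time
matched = [i for i, s in enumerate(samples, start=1) if STRICT.search(s)]
print(matched)                             # [1, 2, 3, 5, 13]
print(STRICT.search(samples[0]).groups())  # ('26', 'sep', '2016', '09:00')
```

The four groups correspond to the d, m, y and hh:mm SubMatches described in the question.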
If you allow the hour/minute part to be optional, change the respective fragment into:
( (?:around |at )?(\d{1,2}:\d{1,2}))?
i.e. surround the space/around/at/hour/minute part with ( and )?, making this part an optional group. Then variants 14 and 15 will also be matched.
One more extension: if you also allow the hour/minute part alone, add |(\d{1,2}:\d{1,2}) to the regex (everything before it is the first variant, and the added part is the second variant, for just hour/minute).
Then your variants No 4 and 6 will also be matched.
For a working example see https://regex101.com/r/33t1ps/1
Edit
Following your list of rules, I propose the following regex:
(\d{1,2}[ -])? - Day + separator, optional.
(jan(?:uari)?|...|dec(?:ember)?) - Month.
(?:[ -](\d{4}|'\d{2}))? - Separator + year (either 4 or 2 digits with "'").
( (?:around |at )?(\d{1,2}:\d{1,2}))? - Separator + hour/minute -
optional end of variant 1.
|(\d{1,2}:\d{1,2}) - Variant 2 - only hour and minute.
The only variants it does not match are No 9 and 10.
For full regex, including also "uur" see https://regex101.com/r/33t1ps/3
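A minimal Python sketch of the parts listed above, joined into one pattern, is given here as a rough check. It leaves out the "uur" handling from the linked full regex, and the names MONTH, RELAXED and samples are illustrative assumptions.

```python
import re

MONTH = (
    r"jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|juni?|juli?"
    r"|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?"
)
RELAXED = re.compile(
    r"(\d{1,2}[ -])?(" + MONTH + r")(?:[ -](\d{4}|'\d{2}))?"  # variant 1: date
    r"( (?:around |at )?(\d{1,2}:\d{1,2}))?"                   # optional time
    r"|(\d{1,2}:\d{1,2})"                                      # variant 2: time only
)

samples = [
    "1 sep 2016 at 09:00u",  # full date with time
    "jul '18",               # shortened year
    "2 jan",                 # day and month
    "08:30 uur",             # time only
    "jan",                   # month only
    "02",                    # day only
    "2-",                    # day with dash only
]
results = {s: bool(RELAXED.search(s)) for s in samples}
for sample, found in results.items():
    print(repr(sample), found)

print(RELAXED.search("1 sep 2016 at 09:00u").groups())
# ('1 ', 'sep', '2016', ' at 09:00', '09:00', None)
```

Two things are worth noticing in the output: the day group includes its trailing separator, and a bare month such as "jan" still matches, which the rules in the question list as not valid.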
Finally I found something that helps me handle the month properly :-)
\b(?:([1-3]|[0-3]\d)[ |-](?'month'(?:[1-9]|\d[12])|(?:jan(?:uari)?|feb(?:ruari)?|m(?:aa)?rt|apr(?:il)?|mei|jun(?:i)?|jul(?:i)?|aug(?:ustus)?|sep(?:tember|t)?|okt(?:ober)?|nov(?:ember)?|dec(?:ember)?))?)?(?:(\g'month')[ |-]((?:19|20|\')(?:\d{2})))?\b(?: omstreeks | om | )?(?:(\d{1,2}[:]\d{2}(?: uur|u)?|[0-2]\d{3}(?: uur|u)))?\b
It uses a named group that is called again as a subroutine. Found here:
https://www.regular-expressions.info/subroutine.html

Validate Month Year Format

I need to validate a text box in this format (e.g. FEB 2014, i.e. MMM YYYY).
I am using the following regular expression string
^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\-\d{4}$
The only issue is that my input has a space and not a '-', i.e. JUN 2012, not JUN-2012.
Can someone please amend the above regex to allow for the space?
Thanks
Try the regex below to match month and year in this MMM YYYY format:
^(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) \d{4}$
DEMO
Use \s instead of \- in your regex,
like this:
^(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\s\d{4}$
@Avinash is right: \s also matches [\r\n\t\f], so it is better to use a literal space " " instead.
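The difference between the literal space and \s can be seen in a small Python sketch. The helper name is_valid and the variable names are hypothetical; the original question does not say which language the text box validation runs in.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"

# Literal space: accepts exactly one space between month and year.
with_space = re.compile(r"^(?:" + MONTHS + r") \d{4}$")
# \s: also accepts a tab or other whitespace character in that position.
with_backslash_s = re.compile(r"^(?:" + MONTHS + r")\s\d{4}$")

def is_valid(value, pattern=with_space):
    """Return True when value has the MMM YYYY shape required by pattern."""
    return pattern.match(value) is not None

print(is_valid("JUN 2012"))                     # True
print(is_valid("JUN-2012"))                     # False
print(is_valid("JUN\t2012"))                    # False
print(is_valid("JUN\t2012", with_backslash_s))  # True
```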

Parsing dates from string - regex

I'm terrible with regex and I can't seem to wrap my head around this simple task.
I need to parse out the two dates in a string which always has one of two formats:
"Inquiry at your property for December 29, 2013 - January 03, 2014"
OR
"Inquiry at your property for 29 December , 2013 - 03 January, 2014"
The two different date formats are throwing me off. Any insights would be appreciated!
/(\d+ \w+, \d+|\w+ \d+, \d+)/ for example. Try it out on Rubular.
Admittedly, it would pick up more things, like 2013 NotReallyAMonth, 12345. But if your input has nothing that looks like a date without actually being one, this might work.
You could make the regexp stricter by applying more restrictions on what is matched:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4})/
In this case the day is always two digits and the year is four. Months are listed explicitly (you would have to list all of them).
Update: For ranges it would be a different regexp:
/((?:Jan|Dec) \d+ - \d+, \d{4})/
Obviously they can all be combined together:
/(\d{2} (?:January|December), \d{4}|(?:January|December) \d{2}, \d{4}|(?:Jan|Dec) \d+ - \d+, \d{4})/
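As a rough check of the first, loose pattern, here is a Python sketch (the answer itself targets Ruby via Rubular, so Python is an assumption). The second subject line is written without the stray space before the first comma that appears in the question; with that space present, the first date would not match this pattern.

```python
import re

# Loose pattern from the answer: "29 December, 2013" or "December 29, 2013"
DATE = re.compile(r"(\d+ \w+, \d+|\w+ \d+, \d+)")

subject_a = "Inquiry at your property for December 29, 2013 - January 03, 2014"
subject_b = "Inquiry at your property for 29 December, 2013 - 03 January, 2014"

dates_a = DATE.findall(subject_a)
dates_b = DATE.findall(subject_b)

print(dates_a)  # ['December 29, 2013', 'January 03, 2014']
print(dates_b)  # ['29 December, 2013', '03 January, 2014']
```

Each call returns the two dates in order of appearance, so the first element is the start of the stay and the second is the end.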