How to extract the year out of a string in R - regex

I have a data frame with entries like:
"Wittmann 2014 100 Hills Dry Riesling (Rheinhessen)" and
"Hazlitt 1852 Vineyards 2013 Riesling (Finger Lakes)"
I need to extract the year (the vintage of the wine) from each string, but only years from 2012 to 2015.
It would be nice if someone could help me find the right code/regex in R.

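Maybe this regex will work for you:
/(\b201[2-5]\b)/g
The /.../g wrapper is JavaScript-style notation. In R the pattern itself is written as a string with the backslashes doubled: "\\b201[2-5]\\b".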
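The question asks for R, but the behaviour of the pattern is the same in most engines. As a minimal sketch, here it is applied to the two sample strings in Python (the function name extract_vintages is made up for illustration):

```python
import re

# Years 2012-2015 as whole words, so "1852" and "100" are ignored.
VINTAGE = re.compile(r"\b201[2-5]\b")

def extract_vintages(text):
    """Return every 2012-2015 year found in text, in order of appearance."""
    return VINTAGE.findall(text)

wines = [
    "Wittmann 2014 100 Hills Dry Riesling (Rheinhessen)",
    "Hazlitt 1852 Vineyards 2013 Riesling (Finger Lakes)",
]
for wine in wines:
    print(wine, "->", extract_vintages(wine))
```

The first string yields 2014 and the second 2013; the 1852 in the winery name falls outside the [2-5] range and is skipped.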

Related

Regex: dd/mm/yyyy excluding some years [duplicate]

This question already has answers here:
Regex to match Date
(7 answers)
Closed 2 years ago.
I made a Google Form in which I ask for a date of birth in dd/mm/yyyy format.
I'm looking for a regex that allows every date from 01/01/1900 to 31/12/2015 but rejects any date in one of these five years: 2016, 2017, 2018, 2019, 2020.
Does anyone have an idea?
Thanks for your help.
If you really only want to check dates for years 1900-2015, it suffices to code
\b(\d{1,2}/\d{1,2}/(19\d{2}|200\d|201[0-5]))\b
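The \b...\b bound is less restrictive than ^...$: it lets the date match inside a longer text, whereas ^...$ requires the whole input to be the date.
Because the previous answer did not specify any year bounds, they need to be added, for example 1900-2099 (excluding 2016-2020). The negative lookahead has to sit directly before the year, because a lookahead only inspects the text at its own position:
\b(\d{1,2}/\d{1,2}/(?!201[6-9]|2020)(19|20)\d{2})\b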
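A lookahead only inspects the text at the position where it is written, so to exclude years it has to stand directly in front of the year digits. A minimal Python sketch of that idea, assuming the 1900-2099 bounds mentioned above (the name is_allowed is hypothetical):

```python
import re

# Assumed bounds: years 1900-2099 minus 2016-2020. The lookahead is placed
# right before the year so that it inspects the year digits.
# Day and month are only checked for digit count, not for valid ranges.
DOB = re.compile(r"\b\d{1,2}/\d{1,2}/(?!201[6-9]|2020)(?:19|20)\d{2}\b")

def is_allowed(date_text):
    """True if date_text looks like dd/mm/yyyy and the year is not 2016-2020."""
    return DOB.fullmatch(date_text) is not None

for sample in ["31/12/2015", "01/01/2016", "15/06/2020", "15/06/2021"]:
    print(sample, is_allowed(sample))
```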

Trying to separate the year month and day fields via RegEx [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have a series of dates that I am trying to separate into years, months and days. These dates are in the yyyy-mm-dd format. I'm not very familiar with RegEx, but I have tried (\dddd)\-(\dd)\-(\dd).
Any help is appreciated.
The accepted answer would also match invalid dates like 1234-56-78, or dates where mm and dd are swapped.
([1-2][0-9]{3})\-([0-1][0-9])\-([0-3][0-9])
does a little more validation, although it still accepts impossible values such as month 19 or day 39.
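As a rough sketch of how the three capture groups can be used, the Python snippet below splits a yyyy-mm-dd string with the pattern above and then lets datetime.date reject values the regex cannot rule out. The function name split_date is an assumption for illustration.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"([1-2][0-9]{3})-([0-1][0-9])-([0-3][0-9])")

def split_date(text):
    """Return (year, month, day) as ints, or None if text is not a real date."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects e.g. month 19 or 30 February
    except ValueError:
        return None
    return year, month, day

print(split_date("2017-04-23"))
print(split_date("1234-56-78"))
```

The regex does the splitting; the calendar check is left to the date constructor, which keeps the pattern short.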

Regex to validate any kind of date format

I'm trying to find any kind of date format in a text as:
1. 04.04.17
2. 4/5/2016
3. 6 December 1900
4. 9 Dec 2014
5. 1st of May 1920
6. 2017
7. Dec. 21
8. October 10, 1930
9. October 10th, 2017
10. March 10-12 2015
Years should only range from 1800 to 2017.
That's what I have so far:
(0?[1-9]|[12][0-9]|3[01])?([\/\-\.]|st of\s|nd of\s|rd of\s|th of\s|\s)(Jan.?(uary)?|Feb.?(ruary)?|Mar.?(ch)?|Apr.?(il)?|May|Jun.?(e)?|Jul.?(y)?|Aug.?(ust)?|Sep.?(tember)?|Oct.?(ober)?|Nov.?(ember)?|Dec.?(ember)?|0?[1-9]|1[012])([\/\-\.]|\s)(((18|19)\d{2}|20[01][0-7])|[01][0-7])
The expression above finds formats 1 to 5. If I add the question mark quantifier after the first groups to find dates like "Dec. 21" and "2017", it no longer works for the other date formats.
Furthermore, formats 1 to 7 are more or less dd/mm/yyyy, whereas formats 8 to 10 are mm/dd/yyyy.
Any advice on solving this problem in a single regular expression?
Thank you in advance!
Suggestion: instead of a monster regex, which would be nearly impossible to maintain, how about having an array of regexes, one for each format you're accepting? Then loop through the array to see if the input matches any of them. It would be easier to maintain, and may well run faster, too.
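One way the array-of-regexes idea could look, sketched in Python. The individual patterns, their names and the classify helper are assumptions written for the ten sample formats in the question, not a complete date grammar; only the year-bearing month formats enforce the 1800-2017 range.

```python
import re

MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\.?")
YEAR = r"(?:1[89]\d{2}|200\d|201[0-7])"  # 1800-2017
ORDINAL = r"(?:st|nd|rd|th)?"

# One small pattern per accepted format; the first full match wins.
DATE_FORMATS = [
    ("numeric", re.compile(r"\d{1,2}[./-]\d{1,2}[./-](?:\d{4}|\d{2})")),
    ("day_month_year", re.compile(rf"\d{{1,2}}{ORDINAL}(?: of)? {MONTH} {YEAR}")),
    ("month_day_year", re.compile(rf"{MONTH} \d{{1,2}}{ORDINAL}(?:-\d{{1,2}})?,? {YEAR}")),
    ("month_day", re.compile(rf"{MONTH} \d{{1,2}}")),
    ("year_only", re.compile(YEAR)),
]

def classify(text):
    """Return the name of the first format that matches all of text, else None."""
    for name, pattern in DATE_FORMATS:
        if pattern.fullmatch(text):
            return name
    return None

for sample in ["04.04.17", "1st of May 1920", "2017", "Dec. 21", "March 10-12 2015"]:
    print(sample, "->", classify(sample))
```

Adding or dropping a format then means editing one list entry, and each pattern can be tested on its own.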

Regular expression to return full file name and path

I'm a bit rusty when it comes to regular expressions so I could really use some expert help for the syntax. I'm looking for a regular expression that will return the full file name and path from a string. I am using the reference "Microsoft VBScript Regular Expressions 5.5" for Excel 2010 VBA. I just need the regex string.
Here's an example of what I'm working on. If the string is
=VLOOKUP($X18, 'E:\BUDGET 2012-13\Round 2 - final\program worksheets[AD allocations Support 2012 R2.xlsx]2013'!costcenter, Y$5+2, FALSE)
then the returned value would be
'E:\BUDGET 2012-13\Round 2 - final\program worksheets[AD allocations Support 2012 R2.xlsx]2013'
OR
'E:\BUDGET 2012-13\Round 2 - final\program worksheets[AD allocations Support 2012 R2.xlsx]
(I can code around either return value.)
Thank you!
-- DOH! --
I figured it out a few minutes ago. Being a newb I can't answer my own question so I'm doing it here -- some of the rules here are odd...anyway... The syntax is
'.+?'
and will return
'E:\BUDGET 2012-13\Round 2 - final\program worksheets[AD allocations Support 2012 R2.xlsx]2013'
If you know of a better way please feel free to post it.
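For illustration, here is the lazy pattern '.+?' run against the sample formula. The question targets VBA, so this Python snippet is only a sketch of what the pattern captures; the variable names are made up.

```python
import re

formula = (r"=VLOOKUP($X18, 'E:\BUDGET 2012-13\Round 2 - final\program worksheets"
           r"[AD allocations Support 2012 R2.xlsx]2013'!costcenter, Y$5+2, FALSE)")

# Lazy match: from the first single quote to the nearest following one.
match = re.search(r"'.+?'", formula)
quoted = match.group(0) if match else None
print(quoted)
```

Because the quantifier is lazy, the match stops at the first closing quote, so a formula with several quoted references would return only the first one.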
Copied from my regex toolbox:
'([a-zA-Z]:\\(?:[^\\/:*?"<>|#]++\\)*+)([^\\/:*?"<>|#]+)'
I tested it against your data at http://regexr.com?31oaq: group 1 holds the path and group 2 the file name. I hope this helps :)
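Possessive quantifiers (++ and *+) are not supported by every engine: Python's re module accepts them only from version 3.11, and the VBScript engine may reject them. The sketch below therefore uses the plain greedy form of the same pattern, which should capture the same two groups on the sample formula. The variable names are assumptions.

```python
import re

formula = (r"=VLOOKUP($X18, 'E:\BUDGET 2012-13\Round 2 - final\program worksheets"
           r"[AD allocations Support 2012 R2.xlsx]2013'!costcenter, Y$5+2, FALSE)")

# Same pattern as above with ++ and *+ replaced by + and *.
# Group 1: drive letter plus folders, each ending in a backslash.
# Group 2: the last path component, up to the closing single quote.
PATH = re.compile(r"""'([a-zA-Z]:\\(?:[^\\/:*?"<>|#]+\\)*)([^\\/:*?"<>|#]+)'""")

match = PATH.search(formula)
folder, name = match.groups() if match else (None, None)
print(folder)
print(name)
```

Note that group 2 here still includes the bracketed workbook name and the trailing sheet name, since neither square brackets nor digits are excluded by the character class.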

Regex statements for date ranges <=4/1/2009 and <=10/01/2009

I need serious help building two Regex statements for a project. The software we're using ONLY accepts Regex for validation.
I need one that fires for any date <4/1/2009
and a second that fires for any date <10/1/2009
My co-worker gave me the following code to check for <=10/01/2010, but it checks leap years and all that stuff. I need something a little more streamlined than this in the MM/DD/YYYY format. Thanks in advance!
^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:201[1-9])$|^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)))(\/|-|\.)(?:201[1-9])$|^(?:(?:(?:11)(\/|-|\.))(?:0?[1-9]|1\d|2[0-9]|30)(\/|-|\.))(2010)$|^(?:(?:(?:10|12)(\/|-|\.))(?:0?[1-9]|1\d|2[0-9]|30|31)(\/|-|\.))(2010)$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/|-|\.)(?:29|30)))(\/|-|\.)(?:2[0-9][2-9][0-9])$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:2011)$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:2[0-9][1-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$
^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[13]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$
will match any date between 1/1/1900 and 3/31/2009. It ignores leap years (2/29 is accepted in every year) but otherwise matches only valid dates;
^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[469]/(?:30|[12][0-9]|0?[1-9])|0?[13578]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$
does the same for 1/1/1900-9/30/2009.
EDIT: It looks like "firing" means "not matching" in your question. So
^(?:(?:(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:[3-9][0-9]{2}|2[1-9][0-9]|20[1-9])[0-9])$
will match any date from 4/1/2009 onwards, and
^(?:(?:11/(?:30|[12][0-9]|0?[1-9])|1[02]/(?:3[01]|[12][0-9]|0?[1-9]))/2009|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:[3-9][0-9]{2}|2[1-9][0-9]|20[1-9])[0-9])$
will match any date from 10/1/2009 onwards.
All regexes created using RegexMagic.
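Boundary regexes like these are easy to get subtly wrong, so it can help to cross-check them against real date arithmetic before putting them into the validation software. The Python sketch below does that for the two 4/1/2009 patterns quoted above (copied verbatim); the check function and the three-year test window are assumptions.

```python
import re
from datetime import date, timedelta

BEFORE_APRIL_2009 = re.compile(
    r"^(?:(?:0?2/(?:[12][0-9]|0?[1-9])|0?[13]/(?:3[01]|[12][0-9]|0?[1-9]))/2009"
    r"|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])"
    r"|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/(?:200[0-8]|19[0-9]{2}))$"
)
FROM_APRIL_2009 = re.compile(
    r"^(?:(?:(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])"
    r"|(?:0?[578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))/2009"
    r"|(?:0?2/(?:[12][0-9]|0?[1-9])|(?:0?[469]|11)/(?:30|[12][0-9]|0?[1-9])"
    r"|(?:0?[13578]|1[02])/(?:3[01]|[12][0-9]|0?[1-9]))"
    r"/(?:[3-9][0-9]{2}|2[1-9][0-9]|20[1-9])[0-9])$"
)

def check(start, end, cutoff):
    """Return the M/D/YYYY strings on which the regexes disagree with datetime."""
    mismatches = []
    day = start
    while day <= end:
        text = f"{day.month}/{day.day}/{day.year}"
        before = BEFORE_APRIL_2009.match(text) is not None
        after = FROM_APRIL_2009.match(text) is not None
        if before != (day < cutoff) or after != (day >= cutoff):
            mismatches.append(text)
        day += timedelta(days=1)
    return mismatches

# An empty list means the regexes agree with datetime on every day in the window.
print(check(date(2008, 1, 1), date(2010, 12, 31), date(2009, 4, 1)))
```

The loop only generates real calendar dates, so it does not exercise the leap-year shortcut (2/29 in a non-leap year), which both patterns accept by design.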