Hello, I'm trying to extract only the time groups, such as these 3 strings:
'6 hrs'
'16 mins'
'5 hrs 30 mins'
From the text:
6 hrs blah blah 16 mins blah
blah 5 hrs 30 mins xyz
I tried this: /(\d+ hrs)?( \d+ mins)?/gm,
but I get 32 matches instead. Please see https://regex101.com/r/xmoMKz/4
How can I get only the groups I want, without the other matches? I don't even know what the other matches are. Positions?
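If you get rid of the capturing groups with ?:, you should be able to get what you want. And rather than have both subgroups be optional, it’s better to make the first one required and just check for either hrs or mins: when every part of a pattern is optional, it also matches the empty string at each position, which is where the extra matches come from. So try: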
(?:\d+ (?:hrs|mins))(?: \d+ mins)?
Regex101: https://regex101.com/r/nQbrb0/1
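As a quick check, here is a minimal Python sketch of the suggested pattern run against the sample text from the question. Python's re module is an assumption here (the question does not name an engine), and the variable names are only illustrative.

```python
import re

# Pattern from the answer: a required "<n> hrs" or "<n> mins" part,
# optionally followed by " <n> mins". There are no capturing groups,
# so findall returns each whole match as a string.
TIME_RE = re.compile(r"(?:\d+ (?:hrs|mins))(?: \d+ mins)?")

text = "6 hrs blah blah 16 mins blah\nblah 5 hrs 30 mins xyz"

matches = TIME_RE.findall(text)
print(matches)  # ['6 hrs', '16 mins', '5 hrs 30 mins']

# The original all-optional pattern also produces empty matches,
# because it can succeed without consuming any characters.
empty_matches = [
    m for m in re.finditer(r"(\d+ hrs)?( \d+ mins)?", text) if m.group(0) == ""
]
print(len(empty_matches) > 0)  # True
```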
Related
I'm having trouble trying to regex extract the 'positions' from the following types of strings:
6 red players position 5, button 2
earn $50 pos3, up to $1,000
earn $50 pos 2, up to $500
table button 4, before Jan 21
I want to get the number that comes after 'pos' or 'position', and if there's no such keyword, get the last number before the first comma. The position value can be a number between 1 and 100. So 'position' for each of the previous rows would be:
Input text                            | Desired match (position)
6 red players position 5, button 2    | 5
earn $50 pos3, up to $1,000           | 3
earn $50 pos 2, up to $500            | 2
table button 4, before Jan 21         | 4
I have a big data set (in BigQuery) populated with basically those 4 types of strings.
I've already searched for this type of problem but found no solution or point to start from.
I've tried .+?(?=,) which extracts everything up to the first comma (,), but then I'm not sure how to go about extracting only the numbers from this.
I've tried (?:position|pos)\s?(\d) which extracts what I want in group 1 (by using non-capturing groups), but doesn't solve the 4th type of string.
I feel like there's a way to combine these two, but I just don't know how to get there yet.
And so, after the two things I've tried, I have two questions:
Is this possible with only regex? If so, how?
What would I need to do in SQL to make my life easier at getting these values?
I'd appreciate the help/guidance with this. Thanks a ton!
You can use
^(?:[^,]*[^0-9,])?(\d+),
See the RE2 regex demo. Details:
^ - start of string
(?:[^,]*[^0-9,])? - an optional sequence of:
    [^,]* - zero or more chars other than a comma
    [^0-9,] - a char other than a digit and a comma
(\d+) - Group 1: one or more digits
, - a comma
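The breakdown above can be tried out with a short Python sketch. This assumes Python's re module rather than RE2; the pattern only uses syntax common to both, but the engine swap is an assumption, and the names below are illustrative.

```python
import re

# Anchored pattern from the answer: capture the last run of digits
# that sits directly before the first comma.
POSITION_RE = re.compile(r"^(?:[^,]*[^0-9,])?(\d+),")

rows = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]

# Group 1 holds the position for each row.
positions = [POSITION_RE.search(row).group(1) for row in rows]
print(positions)  # ['5', '3', '2', '4']
```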
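Use a lookahead for a comma, with a lookbehind requiring the previous char to be a space or a letter to prevent matching the “1” in “$1,000” (note that RE2, the engine BigQuery uses, does not support lookarounds, so this needs an engine such as PCRE):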
(?<=[ a-z])(\d+)(?=,)
See live demo.
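For comparison, a small Python sketch of the lookaround version. Python's re module supports fixed-width lookbehind, so it is used here as a stand-in engine; that choice is an assumption, not something the answer specifies.

```python
import re

# Digits preceded by a space or lowercase letter and followed by a comma.
LOOKAROUND_RE = re.compile(r"(?<=[ a-z])(\d+)(?=,)")

rows = [
    "6 red players position 5, button 2",
    "earn $50 pos3, up to $1,000",
    "earn $50 pos 2, up to $500",
    "table button 4, before Jan 21",
]

# search() returns the first match in each row.
first_matches = [LOOKAROUND_RE.search(row).group(1) for row in rows]
print(first_matches)  # ['5', '3', '2', '4']

# The "1" in "$1,000" is preceded by "$", so it is not matched.
print(LOOKAROUND_RE.findall("earn $50 pos3, up to $1,000"))  # ['3']
```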
I'm using Dreamweaver to replace about 1,000 instances of page titles that have a similar format:
5 5 2016 Nice tasty halibut
5 19 2016 A good king salmon and halibut day
...
I'd like the date to be formatted like:
5-5-2016 Nice tasty halibut
5-19-2016 A good king salmon and halibut day
I tried several ways of using Regular Expressions to fix this, but couldn't get the replaced value with the desired format. Can anyone help me out here?
I suggest using ([0-9]{1,2})\s+([0-9]{1,2})\s+([0-9]{4})\s+(.*).
This is a stricter regex. Essentially you are capturing 3 groups of numbers plus the remaining text, where:
Group 1 must be digits and there can only be 1 or 2.
Group 2, same 1 or 2 digits.
Group 3, exactly 4 digits for the year.
Group 4, rest of the string.
And \s+ means 1 or more whitespace characters.
Then use $1-$2-$3 $4 as the replacement to put all 4 groups back together.
See:
https://regex101.com/r/wO3wD6/1
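A rough Python equivalent of this search and replace is sketched below. It assumes the pattern ends with \s+(.*) so that a fourth group exists, as the group list describes, and it uses Python's \1 backreference syntax where Dreamweaver uses $1.

```python
import re

# Three numeric groups (month, day, year) plus the rest of the title.
# The trailing \s+(.*) is an assumption based on the "Group 4" description.
DATE_TITLE_RE = re.compile(r"([0-9]{1,2})\s+([0-9]{1,2})\s+([0-9]{4})\s+(.*)")

titles = [
    "5 5 2016 Nice tasty halibut",
    "5 19 2016 A good king salmon and halibut day",
]

# Rejoin the date parts with hyphens and keep the rest of the title.
fixed = [DATE_TITLE_RE.sub(r"\1-\2-\3 \4", title) for title in titles]
print(fixed)
# ['5-5-2016 Nice tasty halibut', '5-19-2016 A good king salmon and halibut day']
```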
Search for ([0-9]+) ([0-9]+) (.*) and replace it with $1-$2-$3.
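The shorter pattern can be sketched the same way in Python (again with \1 in place of $1). The third title below is a hypothetical example, not from the question, added only to show that this looser pattern will also rewrite any two space-separated numbers that are not a date.

```python
import re

# Two numbers, then everything else (which starts with the year).
LOOSE_RE = re.compile(r"([0-9]+) ([0-9]+) (.*)")

titles = [
    "5 5 2016 Nice tasty halibut",
    "5 19 2016 A good king salmon and halibut day",
    "Caught 3 4 fish",  # hypothetical title without a date
]

loosely_fixed = [LOOSE_RE.sub(r"\1-\2-\3", title) for title in titles]
print(loosely_fixed)
# ['5-5-2016 Nice tasty halibut',
#  '5-19-2016 A good king salmon and halibut day',
#  'Caught 3-4-fish']
```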
I'm using some Regex to find date strings of the form Jan 12, 2015 or Feb 3, 1999.
The regex I'm using is \w+\s\d{1,2},\s\d{4} and it works correctly, but the file also contains strings of the form:
Weg 58, 4047 or Strasse 1, 4482, and these get matched too.
How can I avoid those non-date matches? My approach is:
The first string (the one of the month, Jan, Feb, etc.) has to have always length 3.
The year has to start with 1 or 2.
The thing is that I don't know how to add these two conditions to my regex. Any help, please?
You can make the test right here: https://regex101.com/r/bN2pO0/1
Thanks in advance.
Since the months won't change (i.e. they are always one of January through December), we can list their first 3 characters explicitly.
We can then use the OR operator | to select years starting with 1 or 2:
/((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2},\s(1|2)\d{3})/ig
https://regex101.com/r/bN2pO0/3
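Here is a small Python sketch of that pattern applied to the strings from the question. The /ig flags are translated as re.IGNORECASE plus finditer; the combined sample text is made up from the question's examples.

```python
import re

# Month abbreviation, day, comma, and a 4-digit year starting with 1 or 2.
DATE_RE = re.compile(
    r"((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2},\s(1|2)\d{3})",
    re.IGNORECASE,
)

text = "Jan 12, 2015 Weg 58, 4047 Feb 3, 1999 Strasse 1, 4482"

# Group 1 wraps the whole date.
dates = [m.group(1) for m in DATE_RE.finditer(text)]
print(dates)  # ['Jan 12, 2015', 'Feb 3, 1999']
```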
Just as you used \d{1,2} to match a digit 1 or 2 times and \d{4} to match a digit 4 times, you can use \w{3} to match a word character 3 times.
For the year, you can use the pipe "or" operator |.
\w{3}\s\d{1,2},\s(?:1|2)\d{3}
Although this will also match non-dates of the form Abc xy, 1xyz.
If you want, you can go with a brute-force approach, or just drop the regex and use code to capture the dates.
Brute force:
(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(?:[0-2]?[0-9]|3[01]),\s[12]\d{3}
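To see the difference between the two approaches, here is a hedged Python sketch. Note one adjustment: the day part of the brute-force pattern is written as (?:[0-2]?[0-9]|3[01]) so that days 30 and 31 are accepted. The sample strings are partly taken from the question and partly hypothetical.

```python
import re

# \w{3} version: any three word characters count as a "month".
LOOSE_RE = re.compile(r"\w{3}\s\d{1,2},\s(?:1|2)\d{3}")

# Brute-force version: explicit month names; day part accepts 0-29, 30, 31.
STRICT_RE = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    r"\s(?:[0-2]?[0-9]|3[01]),\s[12]\d{3}"
)

samples = ["Jan 31, 2015", "Feb 3, 1999", "Abc 12, 1999", "Weg 58, 4047"]

loose_hits = [s for s in samples if LOOSE_RE.search(s)]
strict_hits = [s for s in samples if STRICT_RE.search(s)]

print(loose_hits)   # ['Jan 31, 2015', 'Feb 3, 1999', 'Abc 12, 1999']
print(strict_hits)  # ['Jan 31, 2015', 'Feb 3, 1999']
```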
10 Sts - $5,763Jan17 11 Lon -2 ft-1 Janet HallFeb2 9 Lon -10gd-4 F-nw7000lc
Using Notepad++ on the line above, I want to start a new line at the dates Jan17 and Feb2. But when I replace Jan\d+ with \r\nJan, I get Jan 11 Lon -2 ft-1 Janet Hall, without the 17 part of the date.
I can split the line again by replacing Feb\d+ with \r\nFeb, but again the 2 part of the date is missing from the newly created line.
You need to use a capture group.
Try Find what: Jan(\d+)
Replace with: \r\nJan\1
Using (\d+) captures the number into group 1, and \1 in the replacement inserts the characters that group captured.
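The same idea can be sketched outside Notepad++ in Python, whose re.sub also understands \1 in the replacement. This is only an illustration under that assumption; it uses "\n" rather than "\r\n" to keep the output simple.

```python
import re

line = "10 Sts - $5,763Jan17 11 Lon -2 ft-1 Janet HallFeb2 9 Lon -10gd-4 F-nw7000lc"

# Without a group, the matched digits are replaced along with "Jan" and lost.
lossy = re.sub(r"Jan\d+", r"\nJan", line)
print(lossy.split("\n")[1][:6])  # Jan 11

# With a capture group, \1 puts the digits back after the line break.
step1 = re.sub(r"Jan(\d+)", r"\nJan\1", line)
step2 = re.sub(r"Feb(\d+)", r"\nFeb\1", step1)

lines = step2.split("\n")
print(lines)
# ['10 Sts - $5,763', 'Jan17 11 Lon -2 ft-1 Janet Hall', 'Feb2 9 Lon -10gd-4 F-nw7000lc']
```

"Janet" is left alone because the pattern requires at least one digit right after "Jan".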
I'm writing a regular expression that accepts days of the month: ([0-3])([0-9]). How can I change it so that it only accepts valid days from 1 to 31, and not 37 as mine does? I tried alternation with |, but I don't know how to include the first group in it.
([0-2])([0-9])|(3)([0-1]) does not work.
How can I change it so that I still have 2 groups and only valid days?
Edit: 2 groups, not 4.
Try this:
(0)([1-9])|(1|2)([0-9])|(3)(0|1)
Match numbers between 01 and 31 only:
(0[1-9]|[12][0-9]|3[01])
This accepts values from 01 to 31 in one group, but it does not account for shorter months; for example, February has no 30th or 31st.
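A quick way to check the single-group pattern is the Python sketch below. The candidate strings are made up for illustration, and fullmatch is used so that each string must be exactly two digits.

```python
import re

# One group: 01-09, 10-29, or 30-31.
DAY_RE = re.compile(r"(0[1-9]|[12][0-9]|3[01])")

candidates = ["00", "01", "09", "10", "29", "30", "31", "32", "37"]

valid_days = [d for d in candidates if DAY_RE.fullmatch(d)]
print(valid_days)  # ['01', '09', '10', '29', '30', '31']
```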
Sorry, misread it.
If you want to get the values in two groups you have to use negative lookahead like so:
([0-2]|3(?![^0-1]))([0-9])
But I think gawk does not support this.
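For engines that do support lookahead, the two-group version can be sketched in Python as follows. Python's re module is an assumption here (the answer mentions gawk, which may not accept this syntax), and the test strings are illustrative.

```python
import re

# Group 1: 0-2, or a 3 that is not followed by anything other than 0 or 1.
# Group 2: the second digit.
TWO_GROUP_DAY_RE = re.compile(r"([0-2]|3(?![^0-1]))([0-9])")

def split_day(day):
    """Return (first_digit, second_digit), or None if the day is rejected."""
    m = TWO_GROUP_DAY_RE.fullmatch(day)
    return m.groups() if m else None

print(split_day("31"))  # ('3', '1')
print(split_day("29"))  # ('2', '9')
print(split_day("37"))  # None
print(split_day("00"))  # ('0', '0'), so 00 is still accepted by this pattern
```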