Regex to match all hours except some specific hours - regex

In Google BigQuery, I have to list all tables that contain an hour from 02 through 23 in their names.
In other words, match any hour in the range [02->23], so I created the following regex:
([0-2][2-9])
But the problem is that it skips hours 10 and 11, and REGEXP_EXTRACT returns a null value for those “unparsable?” tables.
I could try to match all hours first and then exclude hours 00 and 01, but I couldn't find a way in regex to add them as exceptions.
([0-2][0-9])
What would you recommend/suggest in this case? Given that I can’t split this into 2 different regexes.
Thank you.

You could write the pattern as
\b(?:0[2-9]|1\d|2[0-3])\b
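As a quick sanity check, here is a minimal Python sketch that runs the pattern above against every two-digit value from 00 to 29. Python's re module is used only for illustration (BigQuery uses RE2, which also supports \b and non-capturing groups), and the table names at the end are hypothetical.

```python
import re

# Pattern from the answer: 02-09, 10-19, or 20-23 as a standalone two-digit token.
HOUR = re.compile(r"\b(?:0[2-9]|1\d|2[0-3])\b")

def matching_hours(pattern):
    """Return the two-digit strings in 00..29 that the pattern finds."""
    return [f"{h:02d}" for h in range(30) if pattern.search(f"{h:02d}")]

hours = matching_hours(HOUR)
print(hours[0], hours[-1], len(hours))

# Hypothetical table names. "_" counts as a word character, so \b does not
# fire between "_" and a digit; a "-" separator does create a boundary.
print(bool(HOUR.search("events-10")))
print(bool(HOUR.search("events_10")))
```

If the hour is delimited by underscores in the real table names, the \b anchors would need to be swapped for something that fits that naming scheme.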

Related

How do I regextract the second date in a string?

I am trying to extract the second date displayed in this string, however my code keeps extracting just the first date in gsheet:
String: BOT +1 1/1 CUSTOM IWM 100 12 SEP 22/7 SEP 22 184/184 PUT/CALL #6.13
This is my code: =REGEXEXTRACT(A3,"(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
my result: 12 SEP 22
Desired result should be: 7 SEP 22
Appreciate the help, thanks in advance!
Since you already have a working pattern for detecting dates, you can repeat the same structure outside the parentheses. The regex then matches the first date, .+ consumes the characters in between, and your original pattern inside the parentheses matches the later date. Only that parenthesized part is extracted:
=REGEXEXTRACT(A3,"\d{1,2}\s+[A-Za-z]+\s\d{2,4}.+(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
Here's one approach to dynamically extract N number of dates within your string OR extract the 2nd or 3rd date pattern as per the requirement.
=index(if(len(A:A),lambda(y,regexextract(y,lambda(z,regexreplace(y,"(?i)("&z&")","($1)"))("\d{1,2}\s"&JOIN("\s\d{2}|\d{1,2}\s",INDEX(TEXT(SEQUENCE(12,1,DATE(2022,1,1),31),"MMM")))&"\s\d{2}")))(regexreplace(A:A,"[\(\)/+]","")),))
If you need to pick one specific match, wrap the formula in INDEX with the number of the match you want:
=index(formula,,pattern number)
To extract just the second date, you can modify the code as follows:
=REGEXEXTRACT(A3,"\d{1,2}\s+[A-Za-z]+\s\d{2,4}.*(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
The regular expression \d{1,2}\s+[A-Za-z]+\s\d{2,4}.*(\d{1,2}\s+[A-Za-z]+\s\d{2,4}) matches both the first and the second date in the string, but only the second date is inside the capture group, so only that one is extracted.
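The capture-group approach from these answers can be checked outside Sheets. The following is a minimal Python sketch using the sample string from the question; the variable names are made up for illustration, and Python's re module stands in for the Sheets regex engine.

```python
import re

DATE = r"\d{1,2}\s+[A-Za-z]+\s\d{2,4}"
text = "BOT +1 1/1 CUSTOM IWM 100 12 SEP 22/7 SEP 22 184/184 PUT/CALL #6.13"

# Same idea as the formula: match the first date, skip ahead, capture a later one.
m = re.search(DATE + r".*(" + DATE + r")", text)
second_via_group = m.group(1) if m else None

# Alternative: collect every date-like match and pick the second by position.
all_dates = re.findall(DATE, text)
second_via_list = all_dates[1] if len(all_dates) > 1 else None

print(second_via_group, "|", second_via_list)
```

Because .* is greedy, the capture-group version returns the last date in the string, which is the second one only when there are exactly two dates. Picking by position from the list of all matches avoids that assumption.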

REGEX filtering on time

Looking to use Regex to filter time information. My idea is as follows:
Sample Inputs:
..."sunday\":[[\"1...
..."sunday\":[[\"2...
..."sunday\":[[\"3...
...
..."sunday\":[[\"9:59...
Essentially, I am looking to filter times that come before 10:00 on Sunday. My data comes in the following format, with the preceding and latter part of the string denoted by "..." as text representing other days of the week. I am looking to create a regex that is able to accomplish this. All of the sample input should pass. Example of input that would fail:
..."sunday\":[[\"11:00...
..."sunday\":[[\"10:01...
..."sunday\":[[\"12:01...
Thank you!
Try this .+?sunday.+?"[0-9]:\d+.+
It checks for anything + sunday + anything + an hour format that suits your < 10:00 requirement.
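A minimal Python sketch of how this pattern behaves on the sample inputs follows. The STRICT pattern and the "mixed" input are hypothetical additions, not part of the answer: they illustrate that the lazy .+? after sunday can run on into a later day's opening time, and one possible way to pin the hour to the text directly after sunday.

```python
import re

# Pattern from the answer, unchanged.
LOOSE = re.compile(r'.+?sunday.+?"[0-9]:\d+.+')

# Hypothetical tighter variant: the single-digit hour must come directly
# after the literal text sunday\":[[\" seen in the sample inputs.
STRICT = re.compile(r'sunday\\":\[\[\\"[0-9]:[0-5]\d')

early = r'..."sunday\":[[\"9:59...'
late = r'..."sunday\":[[\"10:01...'
# Hypothetical input where Sunday opens late but a later day opens before 10:00.
mixed = r'..."sunday\":[[\"11:00...\"monday\":[[\"9:00...'

for name, text in [("early", early), ("late", late), ("mixed", mixed)]:
    print(name, bool(LOOSE.search(text)), bool(STRICT.search(text)))
```

Both patterns accept the early sample and reject the late one; only the loose pattern accepts the mixed input, because its second .+? is free to skip ahead to the Monday entry.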

Regex for month and day before specific symbols

I'm trying to get the day and month from strings such as:
5月2日 or 4月22日 or 12月2日
However, I can't seem to figure out the correct regex:
I've tried \d{1,2}[^月] and \d{1,2}[^日] however this only returns something if there is a double digit in the day or month.
Any ideas what I'm missing?
Thanks.
\d{1,2} matches one or two digits, and [^月] then has to consume one more character that is not 月. With a single-digit number such as 5月, nothing is left for [^月] to match, so your regex only succeeds on two-digit numbers (\d{1,2} takes the first digit and [^月] takes the second).
The correct way to ensure that 月 follows is to use a lookahead: \d{1,2}(?=月)
Assuming 12 months per year and up to 31 days per month, the following gets you close. You will still have to do bounds checking after you determine that the syntax is correct (for example, month 19, day 37 is valid syntax here):
1?\d月[123]?\d日
Edit: Here's a better regex that doesn't need to be bounds checked and doesn't require lookahead;
^(1[012]|[1-9])月(3[01]|[12]\d|[1-9])日$
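To show how the two capture groups in this last pattern yield the month and day, here is a minimal Python sketch. The function name parse_month_day is made up for illustration.

```python
import re

# Anchored pattern from the answer; groups 1 and 2 are month and day.
MONTH_DAY = re.compile(r"^(1[012]|[1-9])月(3[01]|[12]\d|[1-9])日$")

def parse_month_day(text):
    """Return (month, day) as ints, or None if the text does not match."""
    m = MONTH_DAY.match(text)
    return (int(m.group(1)), int(m.group(2))) if m else None

for sample in ["5月2日", "4月22日", "12月2日", "19月37日"]:
    print(sample, parse_month_day(sample))
```

The pattern still accepts combinations such as 2月31日, since it checks each number's range independently and knows nothing about month lengths.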

date regex which validates 3 different formats

I need a regex for date string which validates
YYYY:MM:DD:HH
YYYY:MM:DD:HH:mm
YYYY:MM:DD:HH:mm:ss
means all 3 formats are valid.
Can someone help me with this ?
I have
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])$ YYYY:MM:DD:HH
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d$ YYYY:MM:DD:HH:mm
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d:[0-5]\d$ YYYY:MM:DD:HH:mm:ss
These 3 regexes need to be combined into one.
this is your pattern
YYYY:MM:DD:HH(:mm(:ss)?)?
? means zero or one occurrence
I kept your year, month, day and hour expression ^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]). Since your minute and second expressions were the same, :[0-5]\d, I just required that group to appear zero, one or two times with {0,2}.
The resulting expression is:
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
This expression by francis-gagnon is a slight modification to prevent edge cases where the day or month is expressed as 00.
^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
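As a minimal sketch, the combined expression can be exercised in Python against all three formats. The sample timestamps and the is_valid helper are made up for illustration.

```python
import re

# Combined pattern from the answer: the (:[0-5]\d){0,2} tail covers
# "no minutes", "minutes only", and "minutes and seconds".
TIMESTAMP = re.compile(
    r"^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$"
)

def is_valid(text):
    """True if text is YYYY:MM:DD:HH, optionally followed by :mm or :mm:ss."""
    return TIMESTAMP.fullmatch(text) is not None

samples = [
    "2023:04:15:23",           # YYYY:MM:DD:HH
    "2023:04:15:23:59",        # YYYY:MM:DD:HH:mm
    "2023:04:15:23:59:59",     # YYYY:MM:DD:HH:mm:ss
    "2023:00:15:23",           # month 00
    "2023:04:15:24",           # hour 24
    "2023:04:15:23:59:59:59",  # one field too many
]
for s in samples:
    print(s, is_valid(s))
```

This checks syntax and per-field ranges only, so a date such as 2023:02:31:10 still passes.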
If you also want to check that the date itself is valid, you could use something like this monster, which tests that each date position is valid and that the time fits a 24-hour clock:
^(?:(?:(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00)))(:|\/|-|\.)(?:0?2\1(?:29)))|(?:(?:(?:1[6-9]|[2-9]\d)?\d{2})(:|\/|-|\.)(?:(?:(?:0?[13578]|1[02])\2(?:31))|(?:(?:0?[13-9]|1[0-2])\2(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))\2(?:0?[1-9]|1\d|2[0-8]))))(?::(?:[01]\d|2[0-3]))?(?::[0-5]\d){0,2}$
\d{4}:[0-1][0-9]:[0-3][0-9](?::[0-5][0-9](?::[0-5][0-9])?)?

Gawk regular expression for days in month

I'm writing a regular expression that accepts days of the month: ([0-3])([0-9]). How can I change it so that it only accepts a proper day number from 1 to 31, and not 37 as mine does? I tried alternation with |, but I don't know how to include the first group in it.
([0-2])([0-9])|(3)([0-1]) does not work
How can I change it so that I still have 2 groups and only proper dates?
edit: 2 groups, not 4
Try this :
(0)([1-9])|(1|2)([0-9])|(3)(0|1)
Match numbers between 01 and 31 only:
(0[1-9]|[12][0-9]|3[01])
This accepts values from 01 to 31 in one group, but it does not account for months with fewer than 31 days (for example, February has no day 30 or 31).
Sorry, misread it.
If you want to get the values in two groups you have to use negative lookahead like so:
([0-2]|3(?![^0-1]))([0-9])
But I think gawk does not support this.
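To illustrate the lookahead logic only (not gawk, which is the concern raised above), here is a minimal Python sketch. The NO_ZERO variant and the split_day helper are hypothetical additions: the answer's pattern still accepts 00, and the variant shows one way to exclude it while keeping two groups.

```python
import re

# Pattern from the answer: a leading 3 is allowed only if the next digit is 0 or 1.
TWO_GROUPS = re.compile(r"([0-2]|3(?![^0-1]))([0-9])")

# Hypothetical variant that also rejects 00 by requiring 1-9 after a leading 0.
NO_ZERO = re.compile(r"(0(?=[1-9])|[12]|3(?=[01]))([0-9])")

def split_day(text, pattern=TWO_GROUPS):
    """Return (tens, units) as strings, or None if text is not an accepted day."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None

for sample in ["01", "29", "31", "37", "00"]:
    print(sample, split_day(sample), split_day(sample, NO_ZERO))
```

In an engine without lookahead, the same check would have to be done some other way, for example by matching with a single group and splitting the two digits afterwards.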