Regex pattern for a number within a number - regex

Can anyone think of a better way to write this? It works but it is a little ugly.
Input data looks like this: 125100001
The first two digits are the year, the next two are the week number, and the last five are the serial. I want to validate that the week number is not over 52 for an Angular input[number] pattern option, basically just to leverage the $error field :)
So here it is:
^\d\d(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-2]){1}\d{5}$

Use this:
^(\d{2})([0-4][1-9]|[1-5]0|5[12])(\d{5})$
Notes
The first set of parentheses (\d{2}) captures the two-digit year
The second set of parentheses ([0-4][1-9]|[1-5]0|5[12]) validates the week: 01-52
The third set of parentheses (\d{5}) captures the five-digit serial
If you wish, you can retrieve each component with groups 1, 2 and 3
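As a rough illustration of retrieving the three groups, here is a minimal Python sketch. The function name split_code is made up for this example, and Python's re module stands in for the Angular pattern option, which may behave slightly differently.

```python
import re

# Pattern from the answer above: year (2 digits), week 01-52, serial (5 digits).
PATTERN = re.compile(r"^(\d{2})([0-4][1-9]|[1-5]0|5[12])(\d{5})$")

def split_code(code):
    """Return (year, week, serial) as strings, or None if the code is rejected.

    split_code is a hypothetical helper name, not part of any library.
    """
    m = PATTERN.fullmatch(code)
    return m.groups() if m else None

print(split_code("125100001"))  # ('12', '51', '00001')
print(split_code("125300001"))  # None, week 53 is rejected
```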

Just for the week part:
[0-4]\d|5[0-2]
so the entire regex would be:
^\d\d([0-4]\d|5[0-2])\d{5}$
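One way to see exactly which week values this shorter pattern lets through is to try all of 00-99. The sketch below does that in Python; the helper name accepted_weeks and the fixed year and serial digits are assumptions made only for the demonstration.

```python
import re

WEEK_ONLY = re.compile(r"^\d\d([0-4]\d|5[0-2])\d{5}$")

def accepted_weeks(pattern):
    """List every two-digit week the pattern accepts, with year and serial held fixed."""
    return [f"{w:02d}" for w in range(100) if pattern.fullmatch(f"12{w:02d}00001")]

weeks = accepted_weeks(WEEK_ONLY)
print(weeks[0], weeks[-1], len(weeks))  # 00 52 53
```

Note that week 00 is accepted here, as it is by the pattern in the question. If week 00 should be rejected, the alternation needs a 0[1-9] branch instead of 0\d.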

Related

Regex expression for date within dates range

I need to validate with regex a date in format yyyy-mm-dd (2019-12-31) that should be within the range 2019-12-20 - 2020-01-10.
What would be the regex for this?
Thanks
Regexes only deal with characters, so we have to work out, at each position in the date, what the valid characters are.
The first part is easy. The first two characters have to be 20
Now it gets complicated. The next character can be a 1 or a 2, but what follows depends on the value of that character, so we split the rest of the regex into two sections: the first applies if the third character is a 1 and the second if it is a 2.
We know that if the third character is a 1, then what must follow is the characters 9-12-, as the range starts at 2019-12-20. Now for the day part. The 9th character is the tens digit of the day; this can only be 2 or 3, as we are already in the last month and the minimum date is 20. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, we now have a pattern for years starting 2019 of 19-12-[23][0-9]
If the third character is a 2, then we can again match up to the day part of the date, as the range ends in January. This gives us a partial match of 20-01-, leaving us to work on the day part. Here we know that the first character of the day can be either a 1 or a 0; however, if it's a 1 then the last character must be a 0, and if it's a 0 then the last character can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming that the date you are testing against is a validly formatted date.
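To check the derivation, the regex can be compared against an ordinary date comparison for every real calendar day in 2019 and 2020. This is only a sketch in Python; the function name regex_agrees_with_calendar is invented for the example.

```python
import re
from datetime import date, timedelta

RANGE_RE = re.compile(r"20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))")
START, END = date(2019, 12, 20), date(2020, 1, 10)

def regex_agrees_with_calendar():
    """Compare the regex with a plain range check for each day of 2019 and 2020."""
    day = date(2019, 1, 1)
    while day <= date(2020, 12, 31):
        in_range = START <= day <= END
        matched = RANGE_RE.fullmatch(day.isoformat()) is not None
        if in_range != matched:
            return False
        day += timedelta(days=1)
    return True

print(regex_agrees_with_calendar())  # True
```

The agreement only holds for real dates. A string such as 2019-12-35 also matches, which is why the input is assumed to be a validly formatted date.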
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that this allows any dd value whose first digit is between zero and three and whose second digit is between zero and nine. You could instead list every day you want to allow (20 to 31 and 1 to 10) like this (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly... regular expressions are not the right tool for this. A regex gives a mask for something, not a logical context. Use regex to extract the data/value from a string and validate those values in the surrounding programming language.
The second regex above will, for example, match your dates, but also values outside of the range, since there is no link between the year group 2019|2020 and the month group 12|01: it matches values like 2019-12-21 but also 2020-12-21.
To only match the values you want this will be a really large regex like this (inner brackets only if you need them) ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...) and continue with all possible dates - and ask yourself: what would you do if you find such a regex in a project you have to work with ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This way you have three named groups (yyyy, mm, dd) you can access and validate the matched values... The regex is smaller, you have a better association between code and regex and both are easier to maintain.
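A minimal sketch of that split between extraction and validation, written in Python. Two assumptions to flag: Python spells named groups (?P<name>...) rather than (?<name>...), and the function name in_range and the START/END constants are made up for this example.

```python
import re
from datetime import date

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
DATE_RE = re.compile(
    r"(?P<yyyy>20[0-9]{2})-(?P<mm>[01][0-9]|[0-9])-(?P<dd>[0-3][0-9]|[0-9])"
)
START, END = date(2019, 12, 20), date(2020, 1, 10)

def in_range(text):
    """Extract yyyy, mm, dd with the regex, then validate them as a real date."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    try:
        parsed = date(int(m.group("yyyy")), int(m.group("mm")), int(m.group("dd")))
    except ValueError:
        # The regex matched, but it is not a real calendar date (e.g. day 32).
        return False
    return START <= parsed <= END

print(in_range("2019-12-31"))  # True
print(in_range("2020-12-11"))  # False
```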

REGEXEXTRACT - Error when trying to get a phone number from string

I am wondering if someone can help me get this formula right in google spreadsheets.
After a 2 week event I get a spreadsheet with more than 2000 rows of comments which include phone numbers here and there. I am trying to extract the phone numbers from those strings.
example string: call at 228-219-4241 after
formula: =IFERROR(REGEXEXTRACT(V133,"^(?(?:\d{3}))?[-.]?(?:\d{3})[-.]?(?:\d{4})$"),"NOT FOUND!!!")
and I get "NOT FOUND!!!"
But it works only when the cell contains just the number.
Cheers.
Your regex is too complicated and you're restricting it to a rule that says the number is the first thing in the string. Change it to this:
=iferror(regexextract(A1,"\d{3}\-\d{3}\-\d{4}"))
In your example the '^' sign means beginning of the line and '$' means the end, so you're saying the first thing in your string will always be 3 digits and the last will always be 4
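The effect of the anchors can be reproduced outside of Sheets. The sketch below uses Python's re module as a stand-in for REGEXEXTRACT, which is an assumption, since Sheets uses a different regex engine. The anchored pattern is a simplified version of the one in the question, and extract is a hypothetical helper.

```python
import re

text = "call at 228-219-4241 after"

# Simplified anchored pattern: the whole string must be the phone number.
ANCHORED = re.compile(r"^\d{3}-\d{3}-\d{4}$")
# Unanchored pattern from the answer: the number may appear anywhere.
UNANCHORED = re.compile(r"\d{3}-\d{3}-\d{4}")

def extract(pattern, s):
    """Return the first match of pattern in s, or None (hypothetical helper)."""
    m = pattern.search(s)
    return m.group(0) if m else None

print(extract(ANCHORED, text))            # None
print(extract(UNANCHORED, text))          # 228-219-4241
print(extract(ANCHORED, "228-219-4241"))  # 228-219-4241
```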

Date format comparison

I have a question. I am trying to prepare a regex for date comparison. The problem is the month and day: if either is one digit, it can be written as 03 or 3. For instance, possible values:
2015/03/27 or 2015/4/12 or 2015/07/05 or 2015/2/2 or 2015/02/3
What i did so far is:
^(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))$
I started to make now for month:
(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})
(0([0-1]|1[0-2])|([1-12])){1,2})
so {1,2} - because can be one digit or two for instance (12, 2, 02)
0([0-1]|1[0-2]) | ([1-12])) - because can be two digits or one
Somehow I can't work it into the final version.
Can you help me out?
Using just \d, you might end up with fake dates, like 12/67/4567.
Also, your input has another date format: Year/Month/Day.
I suggest using this regex for your input format:
^(?<Year>(?:19|20)\d{2})\/(?<Month>0?[1-9]|1[0-2])\/(?<Day>3[01]|0?[1-9]|[12][0-9])$
Optional 0s are made possible due to the ? quantifier after 0.
If it is for .NET, you do not have to escape /s.
To validate the date, use the classes and methods of the programming environment you are using. Here is an example in C#:
var resultFromRegex = "2015/03/27";
DateTime validDate;
var isValid = DateTime.TryParseExact(resultFromRegex, "yyyy/MM/dd", new System.Globalization.CultureInfo("en-US"), System.Globalization.DateTimeStyles.None, out validDate);
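For comparison, the same two-step idea (regex for the shape, the date library for validity) could look roughly like this in Python. Python needs the (?P<name>...) spelling for named groups, and parse_date is a name invented for this sketch.

```python
import re
from datetime import date

# The suggested regex, with Python's named-group syntax and unescaped slashes.
DATE_RE = re.compile(
    r"^(?P<Year>(?:19|20)\d{2})/(?P<Month>0?[1-9]|1[0-2])/(?P<Day>3[01]|0?[1-9]|[12][0-9])$"
)

def parse_date(text):
    """Return a date for Year/Month/Day input, or None if the shape or date is invalid."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    try:
        return date(int(m.group("Year")), int(m.group("Month")), int(m.group("Day")))
    except ValueError:
        # Shape is fine but the calendar date does not exist, e.g. 2015/02/30.
        return None

for sample in ["2015/03/27", "2015/4/12", "2015/07/05", "2015/2/2", "2015/02/3"]:
    print(sample, parse_date(sample))
```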

date regex which validates 3 different formats

I need a regex for date string which validates
YYYY:MM:DD:HH
YYYY:MM:DD:HH:mm
YYYY:MM:DD:HH:mm:ss
means all 3 formats are valid.
Can someone help me with this ?
I have
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])$ YYYY:MM:DD:HH
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d$ YYYY:MM:DD:HH:MM
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d:[0-5]\d$ YYYY:MM:DD:HH:MM:SS
I need to combine these 3 regexes into one.
this is your pattern
YYYY:MM:DD:HH(:mm(:ss)?)?
? means 0 or 1 time
I kept your year, month, day and hour expression ^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]). Since your minute and second expressions were the same, :[0-5]\d, I just allowed that group to appear zero, one or two times with {0,2}.
The resulting expression is:
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
This expression by francis-gagnon is a slight modification to prevent edge cases where the day or month is expressed as 00.
^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
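A quick way to confirm that the single expression covers all three formats is to run it over a few samples. This is a sketch in Python; the sample strings and the helper name is_stamp are invented for the check.

```python
import re

STAMP_RE = re.compile(
    r"^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$"
)

def is_stamp(text):
    """True if text is YYYY:MM:DD:HH, optionally followed by :mm and :ss."""
    return STAMP_RE.fullmatch(text) is not None

print(is_stamp("2020:01:15:23"))        # True
print(is_stamp("2020:01:15:23:59"))     # True
print(is_stamp("2020:01:15:23:59:59"))  # True
print(is_stamp("2020:00:15:23"))        # False, month 00
print(is_stamp("2020:01:15:24"))        # False, hour 24
```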
If you also want to check that the date is valid, you could use something like this monster, which tests each date position for validity and checks that the time fits into a 24-hour clock:
^(?:(?:(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00)))(:|\/|-|\.)(?:0?2\1(?:29)))|(?:(?:(?:1[6-9]|[2-9]\d)?\d{2})(:|\/|-|\.)(?:(?:(?:0?[13578]|1[02])\2(?:31))|(?:(?:0?[13-9]|1[0-2])\2(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))\2(?:0?[1-9]|1\d|2[0-8]))))(?::(?:[01]\d|2[0-3]))?(?::[0-5]\d){0,2}$
\d{4}:[0-1][0-9]:[0-3][0-9](?::[0-5][0-9](?::[0-5][0-9])?)?

Gawk regular expression for days in month

I'm writing a regular expression that accepts days in months: ([0-3])([0-9]). How do I change it so it only accepts a proper number of days, from 1 to 31, and not 37 like mine does? I tried alternation with |, but I don't know how to include the first group in it.
([0-2])([0-9])|(3)([0-1]) does not work
How do I change it so I still have 2 groups and proper dates?
edit: 2 groups, not 4
Try this :
(0)([1-9])|(1|2)([0-9])|(3)(0|1)
Match numbers between 01 and 31 only:
(0[1-9]|[12][0-9]|3[01])
This accepts values between 01 and 31 in one group, but does not account for February having no 30th or 31st.
Sorry, misread it.
If you want to get the values in two groups you have to use negative lookahead like so:
([0-2]|3(?![^0-1]))([0-9])
But I think gawk does not support this.
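Since the lookahead cannot be tried in gawk, here is a small Python sketch of how the two-group pattern behaves in an engine that does support (?!...). The helper name split_day is an assumption for the example.

```python
import re

# Lookahead variant from the answer above; Python's re supports (?!...).
DAY_RE = re.compile(r"([0-2]|3(?![^0-1]))([0-9])")

def split_day(text):
    """Return (tens, units) if text is accepted, else None."""
    m = DAY_RE.fullmatch(text)
    return m.groups() if m else None

print(split_day("31"))  # ('3', '1')
print(split_day("37"))  # None
print(split_day("00"))  # ('0', '0'), so day 00 still slips through
```

The pattern keeps exactly two groups and rejects 32-39, but it still accepts 00, so a separate check for that value would be needed.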