Restrict Regex to accept certain format - regex

Right now, the regex says a date is valid if I have 200011, which is Jan 1st 2000,
but I want to restrict it to the format YYYYMMDD so that it accepts only 20000101 as a valid date. How can I achieve this?
My code:
^(?:(?:(?:(?:(?:[1-9]\d)(?:0[48]|[2468][048]|[13579][26])|(?:(?:[2468][048]|[13579][26])00))([-\/.]?)(?:0?2\1(?:29)))|(?:(?:[1-9]\d{3})([-\/.]?)(?:(?:(?:0?[13578]|1[02])\2(?:31))|(?:(?:0?[13-9]|1[0-2])\2(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))\2(?:0?[1-9]|1\d|2[0-8])))))$

You need to remove ? after all 0s:
^(?:(?:(?:(?:(?:[1-9]\d)(?:0[48]|[2468][048]|[13579][26])|(?:(?:[2468][048]|[13579][26])00))([-\/.]?)(?:02\1(?:29)))|(?:(?:[1-9]\d{3})([-\/.]?)(?:(?:(?:0[13578]|1[02])\2(?:31))|(?:(?:0[13-9]|1[0-2])\2(?:29|30))|(?:(?:0[1-9])|(?:1[0-2]))\2(?:0[1-9]|1\d|2[0-8])))))$
See the regex demo
For example, the last 0?[1-9] would match 0 one or zero times, and then a non-zero digit. When you remove the ? quantifier, the 0 becomes required.
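As a quick check of the difference, here is a minimal Python sketch that runs the stricter pattern against a few inputs. Python and the names STRICT_DATE and is_valid are my own choices for illustration; the question does not say which language is used.

```python
import re

# The stricter pattern from the answer above: every 0 in the month and
# day parts is now required. Split across lines only for readability.
STRICT_DATE = re.compile(
    r"^(?:(?:(?:(?:(?:[1-9]\d)(?:0[48]|[2468][048]|[13579][26])"
    r"|(?:(?:[2468][048]|[13579][26])00))([-\/.]?)(?:02\1(?:29)))"
    r"|(?:(?:[1-9]\d{3})([-\/.]?)(?:(?:(?:0[13578]|1[02])\2(?:31))"
    r"|(?:(?:0[13-9]|1[0-2])\2(?:29|30))"
    r"|(?:(?:0[1-9])|(?:1[0-2]))\2(?:0[1-9]|1\d|2[0-8])))))$"
)


def is_valid(text: str) -> bool:
    """Return True if text matches the strict date pattern."""
    return STRICT_DATE.match(text) is not None


for sample in ("20000101", "200011", "20000229", "19000229", "2000-01-01"):
    print(sample, is_valid(sample))
```

Note that the optional separator groups ([-\/.]?) are untouched, so a value such as 2000-01-01 is still accepted; if only the bare YYYYMMDD form should pass, those groups would have to go as well.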

Related

Regex expression for date within dates range

I need to validate with regex a date in format yyyy-mm-dd (2019-12-31) that should be within the range 2019-12-20 - 2020-01-10.
What would be the regex for this?
Thanks
Regexes only deal with characters, so we have to work out which characters are valid at each position in the date.
The first part is easy: the first two characters have to be 20.
Now it gets complicated. The next character can be a 1 or a 2, but what follows depends on its value, so we split the rest of the regex into two branches: the first for when the third character is 1 and the second for when it is 2.
If the third character is a 1, then what must follow is the characters 9-12-, as the range starts at 2019-12-20. Now for the day part: the 9th character is the tens digit of the day, which can only be 2 or 3, as we are already in the last month and the minimum day is 20. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, we have a pattern for dates in 2019 (after the leading 20): 19-12-[23][0-9]
If the third character is a 2, then we can again match up to the day part of the date, as the range ends in January. This gives us a partial match of 20-01-, leaving us to work on the day part. Here we know that the first character of the day can be either a 1 or a 0; however, if it's a 1 then the last character must be a 0, and if it's a 0 then the last character can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming the date you are testing is already a valid, correctly formatted date; the pattern itself would also accept impossible days such as 2019-12-32.
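The pattern built up above can be tried out with a short Python sketch. The names RANGE and in_range are illustrative only, and fullmatch is used so that the whole string has to match without adding ^ and $.

```python
import re

# Final pattern from the walkthrough above.
RANGE = re.compile(r"20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))")


def in_range(text: str) -> bool:
    """True if the whole string matches the range pattern."""
    return RANGE.fullmatch(text) is not None


for sample in ("2019-12-20", "2019-12-19", "2020-01-10", "2020-01-11", "2019-12-32"):
    print(sample, in_range(sample))
```

The last sample shows why the input has to be a valid date beforehand: 2019-12-32 passes the pattern even though no such day exists.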
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that, for the dd value, this allows any number whose first digit is between zero and three and whose second digit is between zero and nine. You could instead list every day you want to allow (20 to 31 and 1 to 10) like this: (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly, regular expressions are not the right tool for this. A regex describes the shape of a string, not the logical relationship between its parts. Use a regex to extract the values from the string, and validate those values in the host language.
The second regex above will, for example, match your dates, but also values outside the range, since nothing ties the year group 2019|2020 to the month group 12|01; it matches values like 2019-12-11 but also 2020-12-11.
To match only the values you want, you would need a really large regex like this (the inner brackets only if you need them): ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...), continuing through all possible dates. Ask yourself: what would you do if you found such a regex in a project you had to work with? ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This way you have three named groups (yyyy, mm, dd) you can access and validate the matched values... The regex is smaller, you have a better association between code and regex and both are easier to maintain.
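As a rough sketch of the "extract with regex, validate in code" idea, assuming Python as the host language: Python spells named groups (?P<name>...) rather than (?<name>...), and the names DATE_PARTS, START, END and in_range are made up for this example.

```python
import re
from datetime import date

# Same shape as the regex above, written with Python's (?P<name>...) syntax.
DATE_PARTS = re.compile(
    r"(?P<yyyy>20[0-9]{2})-(?P<mm>[01][0-9]|[0-9])-(?P<dd>[0-3][0-9]|[0-9])"
)

# Range taken from the question.
START, END = date(2019, 12, 20), date(2020, 1, 10)


def in_range(text: str) -> bool:
    """Extract the parts with the regex, then validate them as a real date."""
    m = DATE_PARTS.fullmatch(text)
    if m is None:
        return False
    try:
        parsed = date(int(m["yyyy"]), int(m["mm"]), int(m["dd"]))
    except ValueError:
        # Shape was right, but the date does not exist (e.g. 2019-12-32).
        return False
    return START <= parsed <= END


print(in_range("2019-12-31"), in_range("2020-12-11"), in_range("2019-12-32"))
```

The regex only checks the shape here; the range logic lives in ordinary date comparisons, which are much easier to change when the range moves.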

How to create a regex pattern in VBA to extract dates from a string and exclude false matches

I am trying to use Regex to parse a series of strings to extract one or more text dates that may be in multiple formats. The strings will look something like the following:
24 Aug 2016: nno-emvirt010a/b; 16 Aug 2016 nnt-emvirt010a/b nnd-emvirt010a/b COSI-1.6.5
24.16 nno-emvirt010a/b nnt-emvirt010a/b nnd-emvirt010a/b EI.01.02.03\
9/23/16: COSI-1.6.5 Logs updated at /vobs/COTS/1.6.5/files/Status_2016-07-27.log, Status_2016-07-28.log, Status_2016-08-05.log, Status_2016-08-08.log
I am not concerned about validating the individual date fields; just extracting the date string. The part I am unable to figure out is how to not match on number sequences that match the pattern but aren’t dates (‘1.6.5’ in ex. (1) and 01.02.03 in ex. (2)) and dates that are part of a file name (2016-07-27 in ex. (3)). In each of these exception cases in my input data, the initial numbers are preceded by either a period(.), underscore (_) or dash (-), but I cannot determine how to use this to edit the pattern syntax to not match these strings.
The pattern I have that partially works is below. It will only ignore the non date matches if it starts with 1 digit as in example 1.
/[^_\.\(\/]\d{1,4}[/\-\.\s*]([1-9]|0[1-9]|[12][0-9]|3[01]|[a-z]{3})[/\-\.\s*]\d{1,4}/ig
I am not sure whether this works in VBA, so please check it. The Regular Expressions Cookbook gives several options: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html
^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$
^(?:
# m/d or mm/dd
(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
# d/m or dd/mm
(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$
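For what it's worth, the commented free-spacing form can be used directly in engines that support a verbose mode. The sketch below assumes Python and its re.VERBOSE flag (VBScript's RegExp object in VBA has no such mode, so there the single-line form would be needed); the name DATE is illustrative.

```python
import re

# Free-spacing version of the cookbook pattern; whitespace and # comments
# inside the pattern are ignored because of re.VERBOSE.
DATE = re.compile(
    r"""
    ^(?:
      # m/d or mm/dd
      (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
    |
      # d/m or dd/mm
      (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
    )
    # /yy or /yyyy
    /(?:[0-9]{2})?[0-9]{2}$
    """,
    re.VERBOSE,
)

for sample in ("9/23/16", "23/9/2016", "13/13/2016"):
    print(sample, DATE.match(sample) is not None)
```

Because of the ^ and $ anchors this validates a whole string as a date; it does not pull dates out of longer text like the strings in the question.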
According to the test strings you've presented, you can use the following regex
See this regex in use here
(?<=[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
This regex ensures that specific date formats are met and that each date is preceded either by nothing (the beginning of the string) or by a character that is not a letter (a-z, A-Z), a digit (0-9), or a dot (.). The date formats that will be matched are:
24 Aug 2016
24.16
9/23/16
The regex could be further manipulated to ensure numbers are in the proper range according to days/month, etc., however, I don't feel that is really necessary.
Edits
Edit 1
Since VBA doesn't support lookbehinds, you can use the following. The date is in capture group 1.
(?:[^a-zA-Z\d.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:-\d{2}){2})|\d{2}\.\d{2})(?=[^a-zA-Z\d.])
Edit 2
As per bulbus's comment below
(?:[^\w.]|^)((?:\d{1,2}\s*[A-Z][a-z]{2}\s*\d{2,4})|(?:(?:\d{1,2}\/){2}\d{2,4})|(?:\d{2,4}(?:-\d{2}){2})|\d{2}\.\d{2})
Took the liberty of editing that a bit.
Replaced [^a-zA-Z\d.] with [^\w.], which has the added advantage of excluding dates such as _2016-07-28.log (\w also covers the underscore)
Because of change 1, removed the trailing condition (?=[^a-zA-Z\d.]).
Restricted the year digits from \d+ to \d{2,4}
Edit 3
Due to added conditions of the regex, I've made the following edits (to improve upon both previous edits). As per the OP:
The edited pattern above works in all but 2 cases:
it does not find dates with the year first (ex. 2016/07/11)
if the date is contained within parenthesis in the string, it returns the left parenthesis as part of the date (ex. match = (8/20/2016)
Can you provide the edit to fix these?
In the below regexes, I've changed years to \d+ in order for it to work on any year greater than or equal to 0.
See the code in use here
(?:[^\w.]|^)((?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)|(?:(?:\d{1,2}\/){2}\d+)|(?:\d+(?:\/\d{1,2}){2})|(?:\d+(?:-\d{2}){2})|\d{2}\.\d+)
This regex adds support for dates in the XXXX/XX/XX format, where the year may appear first.
The reason you are getting ( at the start of the match is the nature of the full match: it includes the character consumed by the leading non-capturing group. Instead, you need to grab the value of the first capture group rather than the whole regex result. See this answer on how to grab submatches from a regex pattern in VBA.
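To make the full match versus capture group distinction concrete, here is a small sketch. It uses Python rather than VBA purely for illustration (in VBA the equivalent would go through the match's SubMatches collection); DATES and extract_dates are hypothetical names.

```python
import re

# The Edit 3 pattern from above, split across lines only for readability.
DATES = re.compile(
    r"(?:[^\w.]|^)("
    r"(?:\d{1,2}\s+[A-Z][a-z]{2}\s+\d+)"
    r"|(?:(?:\d{1,2}\/){2}\d+)"
    r"|(?:\d+(?:\/\d{1,2}){2})"
    r"|(?:\d+(?:-\d{2}){2})"
    r"|\d{2}\.\d+)"
)


def extract_dates(text):
    """Return capture group 1 of every match, i.e. the date without its prefix."""
    return [m.group(1) for m in DATES.finditer(text)]


m = DATES.search("(8/20/2016)")
print(repr(m.group(0)))  # full match, still carries the leading parenthesis
print(repr(m.group(1)))  # capture group 1, the date only
print(extract_dates("Built 2016/07/11 from Status_2016-07-28.log"))
```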
Also, note that any additional date formats you need to catch need to be explicitly set in the regex. Currently, the regex supports the following date formats:
\d{1,2}\s+[A-Z][a-z]{2}\s+\d+
12 Apr 17
12 Apr 2017
(?:\d{1,2}\/){2}\d+
1/4/17
01/04/17
1/4/2017
01/04/2017
\d+(?:\/\d{1,2}){2}
17/04/01
2017/4/1
2017/04/01
17/4/1
\d+(?:-\d{2}){2}
17-04-01
2017-04-01
\d{2}\.\d+ - Although I'm not sure what this date format is even used for and how it could be considered efficient if it's missing month
24.16

Regex to add leading zero in date record

Question - what is the shortest form of regex to add a leading zero into single digit in date record?
So I want to convert 8/8/2014 8:04:34 to 08/08/2014 8:04:34 - add leading zero when only one digit is presented.
The record can have two single digit entry, one single digit entry or no single digit entry. Some records can be in forms like 25/06/2014 19:50:18 or 9/06/2014 8:27:35 - in other words, some of them could be already normalized and regex needs to fix only single digit entry.
Not a regex user by any means. Your help is appreciated.
How about:
Ctrl+H
Find what: \b(\d)(?=/)
Replace with: 0$1
Replace all
This will change 8/8/2014 8:04:34 into 08/08/2014 8:04:34
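Outside Notepad++, the same find/replace pair can be sketched in Python. Here re.sub plays the role of "Replace all", the replacement is written \1 instead of $1, and pad_date_fields is a made-up name.

```python
import re


def pad_date_fields(text: str) -> str:
    """Prefix a 0 to any lone digit that is directly followed by a slash."""
    # \b(\d) is a digit at a word boundary; (?=/) requires a slash after it
    # without consuming it, so two-digit fields like 25/ are left alone.
    return re.sub(r"\b(\d)(?=/)", r"0\1", text)


for record in ("8/8/2014 8:04:34", "9/06/2014 8:27:35", "25/06/2014 19:50:18"):
    print(pad_date_fields(record))
```

The time part is untouched because its digits are followed by colons, not slashes.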
Use the following regex to find:
(\d)(\d)?/(\d)(\d)?/(.*)
Then use the following to replace:
(?{2}\1\2:0\1)/(?{4}\3\4:0\3)/\5
What we are using is called conditionals in terms of regex. Refer this answer for explanation.
Make sure you have unselected the checkbox which says ". matches newline".
First of all, let's do some test-driven development and write the test cases. We can ignore the time and concentrate on the date alone. Also, the year is not important. We have to find all the possible cases for the day and the month. For each of them, we can have:
A single digit
Two digits, the first of which is already a 0
Two digits, the first of which is not a 0
Two digits, the second of which is a 0 (probably not needed, but just in case).
The case where we have to do something is only the first one, and the last 3 could be joined into a single one, but I prefer to keep them separated. We need to test 16 combinations:
8/8/2014
8/08/2014
8/12/2014
8/10/2014
08/8/2014
08/08/2014
08/12/2014
08/10/2014
12/8/2014
12/08/2014
12/12/2014
12/10/2014
10/8/2014
10/08/2014
10/12/2014
10/10/2014
Of all of these, only 1, 2, 3, 4, 5, 9, 13 must be changed. I don't know how to do it with a single regex, but with 2 regexes it's easy:
First regex, for the day:
(?<!\d)(\d/\d{1,2}/\d+)
replace with:
0\1
It matches a date where the day has only one digit, followed by a month with either 1 or 2 digits, followed by a year with any number of digits, and it simply adds a 0 at the beginning.
Second regex, for the month:
(\d{2}/)(\d/\d+)
replace with:
\10\2
This one assumes that the first one has already been run, and thus the day has 2 digits. It finds dates where the month has a single digit, and adds a 0 before it. Please note that \10\2 means: the first group that matched, followed by a 0, followed by the second group. It doesn't mean: the tenth group, followed by the second. So the digits 1 and 0 are logically separated.
Run the first one, then the second one, and it gives the correct result:
08/08/2014
08/08/2014
08/12/2014
08/10/2014
08/08/2014
08/08/2014
08/12/2014
08/10/2014
12/08/2014
12/08/2014
12/12/2014
12/10/2014
10/08/2014
10/08/2014
10/12/2014
10/10/2014
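The two passes and the 16 test cases can be reproduced with a short script. This is a sketch in Python, not the editor-based workflow the answer describes; note that Python would read \10 as "group 10", so the second replacement is written \g<1>0\2 there. The names pad_two_pass and fields are illustrative.

```python
import re
from itertools import product


def pad_two_pass(text: str) -> str:
    """Pad the first field, then the second, as in the two regexes above."""
    # Pass 1: a one-digit first field gets a leading 0.
    text = re.sub(r"(?<!\d)(\d/\d{1,2}/\d+)", r"0\1", text)
    # Pass 2: a one-digit second field gets a leading 0.
    # \g<1>0 means "group 1, then a literal 0" (plain \10 would be group 10).
    text = re.sub(r"(\d{2}/)(\d/\d+)", r"\g<1>0\2", text)
    return text


# The same 16 combinations as in the list of test cases.
fields = ["8", "08", "12", "10"]
for first, second in product(fields, fields):
    original = f"{first}/{second}/2014"
    print(original, "->", pad_two_pass(original))
```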
Thanks to this recent answer I can finally give you a (hopefully) correct answer ;)
Replace
\b(?:(\d\d)|(\d))/(?:(\d\d)|(\d))/(\d\d)
with
(?{1}\1:0$2)/(?{3}\3:0\4)/\5
It uses Notepad++ conditionals (which I didn't know of until I stumbled over the mentioned question) to handle the cases where only one or the other is a single digit.
The regex matches a word boundary \b followed by two digits, captured in group 1, or one digit, captured in group 2, followed by a /. The same logic is then repeated for the day, which is captured in group 3 (two digits) or group 4 (one digit). Finally, it checks that a year follows (at least two digits).
The conditional replace is explained in the linked answer. Simply put, (?{1} tests whether group 1 matched: if it did, it is replaced with the expression before the :, otherwise with the one after.
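The conditional replacement syntax is specific to Notepad++. In a language without it, one way to get the same effect is a replacement function; the sketch below assumes Python, and PATTERN, _pad and pad_date are illustrative names rather than anything from the linked answer.

```python
import re

# Same search pattern as above: groups 1/3 hold two-digit fields,
# groups 2/4 hold one-digit fields, group 5 the first two year digits.
PATTERN = re.compile(r"\b(?:(\d\d)|(\d))/(?:(\d\d)|(\d))/(\d\d)")


def _pad(m: re.Match) -> str:
    # Mirrors (?{1}\1:0\2): use group 1 if it took part in the match,
    # otherwise a literal 0 followed by group 2.
    first = m.group(1) or ("0" + m.group(2))
    second = m.group(3) or ("0" + m.group(4))
    return f"{first}/{second}/{m.group(5)}"


def pad_date(text: str) -> str:
    return PATTERN.sub(_pad, text)


for record in ("8/8/2014 8:04:34", "9/06/2014 8:27:35", "25/06/2014 19:50:18"):
    print(pad_date(record))
```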
Hope this helps.
Regards
If you had a date like (ISO format)
2017-9-5
This
replace(/(\D)(\d)(?!\d)/g, '$10$2')
will turn it into
2017-09-05
and will preserve two digits in dates like
2017-11-11 or 2017-9-05
A more general approach (in this case padding to 5-digit numbers) is to search for:
(\d)??(\d)??(\d)??(\d)??(\d)
Replace with
(?1\1:0)(?2\2:0)(?3\3:0)(?4\4:0)\5
You can use /^\d\/|(?<=\/)\d\/\d/g to select text, then add 0 before selected text, it should work for all your conditions.

Visual Basic - RegEx - Overall Length Check regardless the number of matches

I have the following problem :
This is my RegEx-Pattern :
\d*[a-z A-Z][a-zA-Z0-9 _?!()\/\\]*
It allows anything except numbers that stand alone, like 1, 11, 111, and so on.
My question: how can I set the overall length of the input regardless of the matches?
I tried several options, like putting {1,30} before each match, and putting the regex in a group with ( ) followed by {1,30}, but it still doesn't work.
If anyone could help me I would appreciate it :).
Allowed string:
Group1
Group 1
1Group
Group!?()\/
Group !()\?!
a1 a1 a1 a1
Not Allowed:
1
11
And so on. {1,30} after a match restricts how many times I can input the match. What I want to know is: how can I set the maximum length for my regex above, so that the input is capped at 30 characters regardless of the matches?
To disallow input that consists only of digits, you can use a negative look-ahead (?!\d+$), and to limit the length of the input, use a limiting quantifier {1,30}:
(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}
See demo
Note that if you plan to match whole strings, you'd need anchors: ^ at the beginning will anchor the regex to the beginning of string, and $ will anchor at the end.
^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$
See another demo
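A minimal way to check the anchored pattern against the examples from the question, sketched in Python (the question itself is about Visual Basic, and the pattern uses only syntax that is common to both engines as far as I know); LIMITED and is_allowed are made-up names.

```python
import re

# Anchored pattern from above: not digits-only, 1 to 30 allowed characters.
LIMITED = re.compile(r"^(?!\d+$)[a-zA-Z0-9 _?!()\/\\]{1,30}$")


def is_allowed(text: str) -> bool:
    return LIMITED.match(text) is not None


allowed = ["Group1", "Group 1", "1Group", "Group!?()\\/", "Group !()\\?!", "a1 a1 a1 a1"]
rejected = ["1", "11", "a" * 31]

for sample in allowed + rejected:
    print(repr(sample), is_allowed(sample))
```

The look-ahead rejects digit-only input before anything is consumed, and the single {1,30} on the character class is what caps the total length.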

Is there a regex to match a numeric range, and also ensure that it's a valid range?

Is there a regular expression to match a numeric range, e.g. 1 - 20?
If so, is it possible to ensure that the left value is always less than the right value? It wouldn't make sense to have a range e.g. 20 - 1 or 15 - 5
As the commenters note: if this is possible in a regex it will be very hard. There is no direct support to perform arithmetical comparisons (including greater than) in a regex.
Better to use a regex to validate the format and capture the two numbers. If the regex matches then use the host language to convert the captures into numbers and compare.
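A minimal sketch of that approach, assuming Python as the host language and a format of two non-negative integers separated by a dash with optional spaces (the exact format is an assumption, as are the names RANGE and is_valid_range):

```python
import re

# The regex only validates the shape and captures the two numbers.
RANGE = re.compile(r"\s*(\d+)\s*-\s*(\d+)\s*")


def is_valid_range(text: str) -> bool:
    """True if text looks like 'a - b' and a is strictly less than b."""
    m = RANGE.fullmatch(text)
    if m is None:
        return False
    # The numeric comparison happens in the host language, not in the regex.
    return int(m.group(1)) < int(m.group(2))


for sample in ("1 - 20", "20 - 1", "15 - 5", "5-15", "abc"):
    print(repr(sample), is_valid_range(sample))
```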
Yes you can do that. You can ensure that a given number is smaller than another number in the same text.
This regex tests whether the first number is smaller than the next number: Format: XXX,YYY
XXX < YYY:
\b(?:[1-9](?<open>\B\d)+\d*,(?<close-open>\d)+(?(open)(?!))\b
|
(?<prefix>\d*)(?:(?<g0>0)|(?<g1>1)|(?<g2>2)|(?<g3>3)|(?<g4>4)|(?<g5>5)|(?<g6>6)|(?<g7>7)|(?<g8>8)|(?<g9>9))(?<suffix>\d)*,\k<prefix>(?(g0)(?!)|(?(g1)0|(?(g2)[01]|(?(g3)[0-2]|(?(g4)[0-3]|(?(g5)[0-4]|(?(g6)[0-5]|(?(g7)[0-6]|(?(g8)[0-7]|(?(g9)[0-8]))))))))))(?<suffix2-suffix>\d)*(?(suffix)(?!)))\b
XXX > YYY:
(?<open>\B\d|\b[1-9])+,[1-9](?<close-open>\d)+(?(open)(?!))\d*\b
|
(?<prefix>\d*)(?:(?<g0>0)|(?<g1>1)|(?<g2>2)|(?<g3>3)|(?<g4>4)|(?<g5>5)|(?<g6>6)|(?<g7>7)|(?<g8>8)|(?<g9>9))(?<suffix>\d)*,\k<prefix>(?(g0)[1-9]|(?(g1)[2-9]|(?(g2)[3-9]|(?(g3)[4-9]|(?(g4)[5-9]|(?(g5)[6-9]|(?(g6)[7-9]|(?(g7)[89]|(?(g8)9|(?(g9)(?!)))))))))))(?<suffix2-suffix>\d)*(?(suffix)(?!))
If you want to use - as separator you only have to replace the , with - in this regex. This regex was created and tested using C# regex.