I need to implement regex validation for a value that will be used on my server side to fetch data where a certain timestamp is older (smaller) than now() - interval 'myValue'.
The PostgreSQL interval type is explained here; in short, it accepts values like 2 days, 3 years or 12 hours, but you can also combine several units, like 2 days 6 hours 30 minutes.
I currently have a regex /^\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?)$/i that accepts only one value (e.g. 2 days), but I can't figure out how to allow multiple values while making sure each unit from this group appears at most once.
This regex /^\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?)(\s\d+\s(seconds?|minutes?|hours?|days?|weeks?|months?|years?))*$/i allows combinations, but it also allows repeated units, e.g. 2 days 12 hours 6 hours 2 minutes, which causes a fatal error in the PostgreSQL query.
I tried restricting repetition within this group using a combination of \1 and {0,1}, but I just can't get it precise enough.
NOTE: Unfortunately, a regex is the only way I can validate this value, since I have access neither to the server-side controller that receives this value nor to the client-side frontend of this form. I can't just throw exceptions or skip the query, because it is part of an important cron job and must be stable at all times.
(All I have access to is the JSON Schema for this value, so I can only define a regex pattern for it.)
Any help is appreciated, thanks.
You can use
^(?!.*(second|minute|hour|day|week|month|year).*\1)\d+\s+(?:second|minute|hour|day|week|month|year)s?(?:\s+\d+\s+(?:second|minute|hour|day|week|month|year)s?)*$
See the regex demo
Details
^ - start of string
(?!.*(second|minute|hour|day|week|month|year).*\1) - a negative lookahead: none of second, minute, hour, day, week, month or year may be repeated anywhere in the string
\d+\s+(?:second|minute|hour|day|week|month|year)s? - one or more digits, one or more whitespace characters, then second, minute, hour, day, week, month or year, and then an optional s letter
(?:\s+\d+\s+(?:second|minute|hour|day|week|month|year)s?)* - zero or more repetitions of one or more whitespace characters followed by the pattern described above
$ - end of string.
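A small JavaScript check of the pattern, as a sketch: JSON Schema's pattern keyword is specified to use the ECMA-262 regex dialect, so a JavaScript RegExp built from the same string should behave the same way, assuming your validator follows that dialect. A schema pattern carries no flags, so unlike the /i in the question this match is case-sensitive. The sample strings are illustrative.

```javascript
// The pattern from the answer, written as the string a JSON Schema "pattern" would hold.
// No flags are available in JSON Schema, so matching is case-sensitive.
const INTERVAL_PATTERN = new RegExp(
  "^(?!.*(second|minute|hour|day|week|month|year).*\\1)" +
  "\\d+\\s+(?:second|minute|hour|day|week|month|year)s?" +
  "(?:\\s+\\d+\\s+(?:second|minute|hour|day|week|month|year)s?)*$"
);

// Illustrative inputs only.
const samples = [
  "2 days",
  "2 days 6 hours 30 minutes",
  "2 days 12 hours 6 hours 2 minutes", // "hour" repeated
  "1 hour 5 hours",                    // same unit, singular and plural
  "2 Days",                            // uppercase unit
];

for (const s of samples) {
  console.log(JSON.stringify(s), INTERVAL_PATTERN.test(s));
}
```

In the schema file itself the pattern is a JSON string, so the backslashes are doubled there as well, exactly as in the JavaScript string above.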
Forget it. The only complete documentation of the supported values for interval is the implementation (the guts are in ParseDateTime).
Consider these:
SELECT INTERVAL '12 00:12:00';
interval
══════════════════
12 days 00:12:00
(1 row)
SELECT INTERVAL '12 d 12 mins';
interval
══════════════════
12 days 00:12:00
(1 row)
SELECT INTERVAL '3-2';
interval
════════════════
3 years 2 mons
(1 row)
What I would do in your place is to write a function that casts the string to interval and catches and reports an error:
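CREATE FUNCTION interval_ok(text) RETURNS boolean
LANGUAGE plpgsql AS
$$BEGIN
   -- attempt the cast; a bad value jumps to the EXCEPTION block
   PERFORM CAST ($1 AS interval);
   RETURN TRUE;
EXCEPTION
   -- unparseable input, or a field value out of range for an interval
   WHEN invalid_datetime_format OR datetime_field_overflow THEN
      RETURN FALSE;
END;$$;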
I need to use a regex to validate a date in the format yyyy-mm-dd (e.g. 2019-12-31) that must fall within the range 2019-12-20 to 2020-01-10.
What would be the regex for this?
Thanks
Regexes only deal with characters, so we have to work out, for each position in the date, which characters are valid.
The first part is easy. The first two characters have to be 20.
Now it gets complicated: the next character can be a 1 or a 2, but what follows depends on that character, so we split the rest of the regex into two branches, the first for when the third character is 1 and the second for when it is 2.
If the third character is a 1, what follows must be the characters 9-12-, because the range starts at 2019-12-20. Now for the day part: the 9th character is the tens digit of the day, and it can only be 2 or 3, because we are already in the last month and the minimum day is 20. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together, the branch for 2019 is 19-12-[23][0-9].
If the third character is a 2, we can again match literally up to the day part, since the range ends in January. This gives a partial match of 20-01-, leaving us to work on the day part. Here the first digit of the day can be either 0 or 1; if it's a 1, the last digit must be 0, and if it's a 0, the last digit can only be in the range 1 to 9. This gives us another alternation, (?:0[1-9]|10). Putting the second part together, we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming that the date you are testing against is a validly formatted date.
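A quick JavaScript check of the combined pattern. The ^ and $ anchors are an addition here so that the whole input has to match; without them the pattern would also find a date inside a longer string. The inputs are made up for illustration.

```javascript
// The combined pattern from above, anchored so the entire input must match.
const IN_RANGE = /^20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))$/;

const inputs = ["2019-12-20", "2019-12-31", "2020-01-10", "2019-12-19", "2020-01-11", "2019-12-35"];
for (const d of inputs) {
  console.log(d, IN_RANGE.test(d));
}
// 2019-12-35 also passes: the pattern relies on the input already being a valid date.
```

The last input passing shows the valid-date assumption at work: [23][0-9] on its own also accepts days 32 to 39.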
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that, for the dd value, this allows any number whose first digit is 0 to 3 and whose second digit is 0 to 9 (so 00 and 39 both match). You could instead list every day you want to allow (20 through 31, then 01 through 10, with or without the leading zero), like this: (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly... regular expressions are not the right tool for this. A regex describes a mask for something, not a logical context. Use the regex to extract the values from a string and validate those values in another language.
The second regex above will, for example, match your dates, but also values outside this range, because there is no link between the 2019|2020 group and the 12|01 group: it matches values like 2019-01-05 and 2020-12-25 as well.
To match only the values you want, you would need a really large regex like this (inner parentheses only if you need them): ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...), continuing with every possible date. And ask yourself: what would you do if you found such a regex in a project you had to work with? ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This gives you three named groups (yyyy, mm, dd) whose matched values you can access and validate in code. The regex is smaller, the association between code and regex is clearer, and both are easier to maintain.
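A sketch of that approach in JavaScript, under a few assumptions: the named-group regex is the one above with ^ and $ added, isDateInRange is a hypothetical helper, and the range check uses Date.UTC. Because Date.UTC silently rolls impossible dates over (2019-12-32 becomes 2020-01-01, which is inside the range), the helper also checks that the parsed components survive the round trip.

```javascript
// Named groups as in the answer; ^ and $ added so the whole input must match.
const DATE_PARTS = /^(?<yyyy>20[0-9]{2})-(?<mm>[01][0-9]|[0-9])-(?<dd>[0-3][0-9]|[0-9])$/;

// Hypothetical helper: true if text is a real date between from and to (inclusive).
function isDateInRange(text, from, to) {
  const m = DATE_PARTS.exec(text);
  if (m === null) return false;
  const year = Number(m.groups.yyyy);
  const month = Number(m.groups.mm);
  const day = Number(m.groups.dd);
  // Date.UTC rolls impossible dates over, so check that the components round-trip.
  const date = new Date(Date.UTC(year, month - 1, day));
  if (date.getUTCFullYear() !== year || date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return false;
  }
  return date >= from && date <= to;
}

const FROM = new Date(Date.UTC(2019, 11, 20)); // 2019-12-20 (months are 0-based)
const TO = new Date(Date.UTC(2020, 0, 10));    // 2020-01-10

console.log(isDateInRange("2019-12-31", FROM, TO)); // true
console.log(isDateInRange("2020-1-5", FROM, TO));   // true (single-digit parts allowed by the regex)
console.log(isDateInRange("2020-12-11", FROM, TO)); // false (valid date, outside the range)
console.log(isDateInRange("2019-12-32", FROM, TO)); // false (would roll over to 2020-01-01)
```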
I'm trying to get the day and month from strings such as:
5月2日 or 4月22日 or 12月2日
However, I can't seem to figure out the correct regex.
I've tried \d{1,2}[^月] and \d{1,2}[^日], but these only return something when the day or month has two digits.
Any ideas what I'm missing?
Thanks.
\d{1,2} is matching one digit and [^月] is matching another: your current regex matches one or two digits followed by any single character except 月, so it never actually requires 月 to follow.
The correct way to ensure the 月 follows is to use a lookahead \d{1,2}(?=月) as seen in use here
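A minimal JavaScript sketch of the lookahead approach; extractMonthDay is a hypothetical helper name. The lookahead requires 月 (or 日) to follow the digits without including it in the match.

```javascript
// Hypothetical helper: returns the digits that precede 月 and 日, as strings.
function extractMonthDay(text) {
  const month = text.match(/\d{1,2}(?=月)/); // one or two digits followed by 月
  const day = text.match(/\d{1,2}(?=日)/);   // one or two digits followed by 日
  if (month === null || day === null) return null;
  return { month: month[0], day: day[0] };
}

for (const s of ["5月2日", "4月22日", "12月2日"]) {
  console.log(s, extractMonthDay(s));
}
```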
Assuming 12 months per year and up to 31 days per month, this will get you close, but you'll still have to do bounds checking after you determine the syntax is correct (i.e. month 19, day 37 is valid syntax here):
1?\d月[123]?\d日
Edit: Here's a better regex that doesn't need bounds checking and doesn't require a lookahead:
^(1[012]|[1-9])月(3[01]|[12]\d|[1-9])日$
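The anchored pattern can also hand back the numbers through its two capture groups, sketched here in JavaScript with made-up inputs. One caveat: it checks month and day separately, so combinations such as 2月31日 still match.

```javascript
// Anchored pattern from the edit above: month 1-12, day 1-31, no leading zeros.
const MONTH_DAY = /^(1[012]|[1-9])月(3[01]|[12]\d|[1-9])日$/;

for (const s of ["5月2日", "4月22日", "12月2日", "19月37日"]) {
  const m = MONTH_DAY.exec(s);
  // Group 1 is the month, group 2 the day; null when the string does not match.
  console.log(s, m ? { month: Number(m[1]), day: Number(m[2]) } : null);
}
```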
I have a question. I am trying to write a regex to match dates. The problem is that a one-digit month or day can be written either as 03 or as 3. For instance, possible values are:
2015/03/27 or 2015/4/12 or 2015/07/05 or 2015/2/2 or 2015/02/3
What I have so far is:
^(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))$
I started with the month:
(?<Month>\d(0([0-1]|1[0-2])|([1-12])){1,2})
(0([0-1]|1[0-2])|([1-12])){1,2})
{1,2} - because it can be one digit or two, for instance 12, 2 or 02
0([0-1]|1[0-2]) | ([1-12])) - because it can be two digits or one
Somehow I can't work it into the final version.
Can you help me out?
Using just \d, you might end up with fake dates, like 12/67/4567.
Also, your input uses a different order: Year/Month/Day.
I suggest using this regex for your input format:
^(?<Year>(?:19|20)\d{2})\/(?<Month>0?[1-9]|1[0-2])\/(?<Day>3[01]|0?[1-9]|[12][0-9])$
See demo
The leading 0s are optional thanks to the ? quantifier after each 0.
If it is for .NET, you do not have to escape /s.
To validate the date, use the classes and methods of the programming environment you are using. Here is an example in C#:
using System;
using System.Globalization;
var resultFromRegex = "2015/03/27";
DateTime validDate;
// "M" and "d" parse one or two digits, so both 2015/4/12 and 2015/03/27 are accepted
var isValid = DateTime.TryParseExact(resultFromRegex, "yyyy/M/d", new CultureInfo("en-US"), DateTimeStyles.None, out validDate);
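For comparison, the same two steps (regex for the shape, then a real calendar check) sketched in JavaScript rather than C#. parseYmd is a hypothetical helper, and the calendar check relies on Date.UTC rolling impossible dates such as 2015/2/30 into the following month, which the round-trip comparison then rejects.

```javascript
// Regex from the answer; in a JavaScript literal the / must stay escaped.
const YMD = /^(?<Year>(?:19|20)\d{2})\/(?<Month>0?[1-9]|1[0-2])\/(?<Day>3[01]|0?[1-9]|[12][0-9])$/;

// Hypothetical helper: regex for the format, Date.UTC round-trip for calendar validity.
function parseYmd(text) {
  const m = YMD.exec(text);
  if (m === null) return null;
  const y = Number(m.groups.Year), mo = Number(m.groups.Month), d = Number(m.groups.Day);
  const date = new Date(Date.UTC(y, mo - 1, d));
  // An impossible day (e.g. February 30) rolls over, so month or day no longer match.
  return date.getUTCMonth() === mo - 1 && date.getUTCDate() === d ? date : null;
}

for (const s of ["2015/03/27", "2015/4/12", "2015/2/2", "2015/02/3", "2015/2/30", "12/67/4567"]) {
  console.log(s, parseYmd(s) !== null);
}
```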
In GAS, using .replace(), is it possible to match any term within a long text string that is at least 5 consecutive ALL CAPS characters (possibly with 1 space among them) and prefix it with a string, such as ][? There may be multiple matches within the text string, so I want to insert markers that begin and end a phrase starting with an ALL CAPS category.
An example of a similar type of text would be this (structurally similar, but with other sensitive data):
"VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need."
I have learned, through Serge, how to replace globally, which was a big help, but the more I research regexps, the more confusing it gets. I tried substituting the all-caps regexp for a specific term, but failed. I think I could go through, extract all of the all-caps terms and use them in a replace with multiple values, but that seems like a very long way around.
Is it possible, in a couple of lines to make the above text look like this:
"][VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. ][SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. ][FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need."
My intention is to then split on the ], which would mean that new cells would start with the all-caps term and end with ]. I have the code to convert the text to an array (there are lots of entries), then use .replace() to find and replace within the array, and to set the values back into the sheet, but I just don't know whether there is a way to either add a prefix (my research says lookbehind isn't possible in GAS) or to pick up the all-caps value, add the string "][", and put it back.
If this is asking too much, or feels like I haven't included any code, here is the first part that Serge already helped with: Looking for a Google script that will perform CTRL+F replace for a string
Here is the code as I used it, combining Serge's previous help and the new recommendation. I had to fix the case of one term before running the all-caps replacement, because some people can't follow a template, but it works.
function insertSplitMarkers(){
  var sh = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Freq Iso');
  // all data below the header row: rows 2..lastRow, i.e. getLastRow()-1 rows
  var data = sh.getRange(2,1,sh.getLastRow()-1,sh.getLastColumn()).getValues();
  var regexp = /(([A-Z]\s*){5,})/g; // 5 or more capitals, optionally separated by whitespace
  for(var n=0;n<data.length;n++){
    for(var m=0;m<data[0].length;m++){
      if(typeof(data[n][m])=='string'){ // only touch text cells
        // normalize the case of this term first; /g means replace "globally"
        data[n][m]=data[n][m].replace(/Interventions/g,'INTERVENTIONS');
        // prefix every all-caps run with the ][ marker
        data[n][m]=data[n][m].replace(regexp, "][$1");
      }
    }
  }
  Logger.log(data);
  sh.getRange(2,1,data.length,data[0].length).setValues(data);
}
It looks like this will do what you want, although, as written, it will also pick out aoAOEOUE:
var yourString = "VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. FMLA FEDERAL: Approved by HR - Frequency as needed, must be approved at least 14 days in advance, or within 24 hours of employee's identified need.";
var regexp = /(([A-Z]\s*){5,})/g;
var newString = yourString.replace(regexp, "][$1");
Logger.log(newString);
@user3169581 I've adjusted your regex slightly to try to avoid matching whitespace around the desired phrase and to make sure you get the whole phrase. Because the pattern now has two groups (the phrase and the colon), the replace call needs a small adjustment:
var regexp = /\b([A-Z\s]{5,})(:)/g;
...
data[n][m] = data[n][m].replace(regexp,"][$1$2");
Link to regex101 with working matching here: http://regex101.com/r/rD5kS9
HTH
EDIT: for some reason the existing answer wasn't showing up for me when I started this response. Forgive the redundancy.
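A standalone JavaScript check of the colon-anchored regex on an abridged version of the sample text from the question. Since the pattern defines two groups (the phrase and the colon), the replacement string here re-inserts them as $1$2.

```javascript
// Two groups: (1) the all-caps phrase, (2) the colon that follows it.
const CATEGORY = /\b([A-Z\s]{5,})(:)/g;

// Abridged sample text from the question.
const text = "VACATION: Approved by Supervisor - Frequency 1-3 times per year; duration not to exceed 5 days. " +
  "SICK LEAVE: Approved by Supervisor - Frequency up to 8 per year, no more than 5 days consecutively without MD excuse. " +
  "FMLA FEDERAL: Approved by HR - Frequency as needed.";

// $1$2 puts the phrase and its colon back after the "][" marker.
const marked = text.replace(CATEGORY, "][$1$2");
console.log(marked);
```

Capitals that are not followed by a colon, such as HR and MD here, are left alone.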
I need a regex for a date string that validates these formats:
YYYY:MM:DD:HH
YYYY:MM:DD:HH:mm
YYYY:MM:DD:HH:mm:ss
All 3 formats should be valid.
Can someone help me with this ?
I have:
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])$ YYYY:MM:DD:HH
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d$ YYYY:MM:DD:HH:MM
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]):[0-5]\d:[0-5]\d$ YYYY:MM:DD:HH:MM:SS
These 3 regexes need to be combined into one.
This is your pattern:
YYYY:MM:DD:HH(:mm(:ss)?)?
? means 0 or 1 times.
You can test it here.
I kept your year, month, day and hour expression ^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3]). Since your minute and second expressions were the same, :[0-5]\d, I just required that group to appear zero, one or two times with {0,2}.
The resulting expression is:
^\d\d\d\d:(0\d|1[012]):([012]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
This expression by francis-gagnon is a slight modification to prevent edge cases where the day or month is expressed as 00.
^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$
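A quick JavaScript check of this expression against the three formats from the question plus a few inputs that should fail; the sample timestamps are made up.

```javascript
// francis-gagnon's version: month and day cannot be 00; (:[0-5]\d){0,2} is the optional mm/ss tail.
const STAMP = /^\d\d\d\d:(0[1-9]|1[012]):(0[1-9]|[12]\d|3[01]):([01]\d|2[0-3])(:[0-5]\d){0,2}$/;

const cases = [
  "2015:03:27:14",          // YYYY:MM:DD:HH -> true
  "2015:03:27:14:05",       // YYYY:MM:DD:HH:mm -> true
  "2015:03:27:14:05:59",    // YYYY:MM:DD:HH:mm:ss -> true
  "2015:03:27:24",          // hour 24 -> false
  "2015:00:27:14",          // month 00 -> false
  "2015:03:27:14:05:59:01", // one field too many -> false
];
for (const s of cases) {
  console.log(s, STAMP.test(s));
}
```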
If you also want to check that the date is valid, you could use something like this monster, which checks that each date position holds a valid value and that the time fits a 24-hour clock:
^(?:(?:(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00)))(:|\/|-|\.)(?:0?2\1(?:29)))|(?:(?:(?:1[6-9]|[2-9]\d)?\d{2})(:|\/|-|\.)(?:(?:(?:0?[13578]|1[02])\2(?:31))|(?:(?:0?[13-9]|1[0-2])\2(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))\2(?:0?[1-9]|1\d|2[0-8]))))(?::(?:[01]\d|2[0-3]))?(?::[0-5]\d){0,2}$
^\d{4}:[0-1][0-9]:[0-3][0-9]:(?:[01][0-9]|2[0-3])(?::[0-5][0-9](?::[0-5][0-9])?)?$