Regex date formats - regex

I need help with three regular expressions for date validation. The date formats to validate against should be:
- MMyy
- ddMMyy
- ddMMyyyy
Further:
I want the regular expressions to match the exact number of digits in the formats above. For instance, January should be 01, NOT 1:
060117 // ddMMyy format: Ok
06117 // ddMMyy format: NOT Ok
Hyphens and slashes are NOT allowed, like: 06-01-17, or 06/01/17.
Below are the regexes that I use. I cannot get them quite right though.
string regex_MMyy = @"^(1[0-2]|0[1-9]|\d)(\d{2})$";
string regex_ddMMyy = @"^(0[1-9]|[12]\d|3[01])(1[0-2]|0[1-9]|\d)(\d{2})$";
string regex_ddMMyyyy = @"^(0[1-9]|[12]\d|3[01])(1[0-2]|0[1-9]|\d)(\d{4})$";
var test_MMyy_1 = Regex.IsMatch("0617", regex_MMyy); // Pass
var test_MMyy_2 = Regex.IsMatch("617", regex_MMyy); // Pass, do NOT want this to pass.
var test_ddMMyy_1 = Regex.IsMatch("060117", regex_ddMMyy); // Pass
var test_ddMMyy_2 = Regex.IsMatch("06117", regex_ddMMyy); // Pass, do NOT want this to pass.
var test_ddMMyyyy_1 = Regex.IsMatch("06012017", regex_ddMMyyyy); // Pass
var test_ddMMyyyy_2 = Regex.IsMatch("0612017", regex_ddMMyyyy); // Pass, do NOT want this to pass.
(If anyone could take allowed days for each month, and leap years into account, that would be a huge bonus :)).
Thanks,
Best Regards
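One possible reading of the failing tests above: the trailing `|\d` alternative in the month group is what lets a one-digit month through. The sketch below (in Python rather than C#) drops that alternative so every field has a fixed width, and hands the bonus requirement (days per month, leap years) to `datetime.strptime` instead of the regex. The name `is_valid` and the `PATTERNS` table are illustrative assumptions, not part of the original code.

```python
import re
from datetime import datetime

# Shape checks: every field has a fixed width, so "617" or "06117" cannot match.
# Each entry pairs the shape regex with the strptime format used for the
# calendar check (names and table layout are assumptions for this sketch).
PATTERNS = {
    "MMyy": (re.compile(r"(0[1-9]|1[0-2])\d{2}"), "%m%y"),
    "ddMMyy": (re.compile(r"(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}"), "%d%m%y"),
    "ddMMyyyy": (re.compile(r"(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{4}"), "%d%m%Y"),
}

def is_valid(text, fmt_name):
    """True if text has exactly the digits of fmt_name and is a real date."""
    pattern, strptime_fmt = PATTERNS[fmt_name]
    # fullmatch anchors at both ends, so no ^ and $ are needed.
    if not pattern.fullmatch(text):
        return False
    try:
        # strptime alone is too lenient about digit counts, hence the regex
        # first; here it only rejects impossible dates such as 31 April.
        datetime.strptime(text, strptime_fmt)
    except ValueError:
        return False
    return True

print(is_valid("060117", "ddMMyy"))  # fixed-width input
print(is_valid("06117", "ddMMyy"))   # one digit short
```

Note that for the two-digit `yy` formats, `strptime` maps 00-68 to 2000-2068 and 69-99 to 1969-1999, which decides whether 29 February is accepted.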

Related

How to detect an incomplete date in a list and replace it with Flutter?

Hello, I can't figure out how to detect an incomplete date in a list of strings. I am thinking about a regex, but I don't know how to extract this sequence from the input.
input=[2022-01-20 20:01, 2022-01-20 21, 2022-01-20 22:25, 2022-01-20 23:01]
Here I want to match 2022-01-20 21 (it is the only one that has no minutes).
After the match, I want to append the minutes (:00) to fix the wrong date format.
Here is what I am looking for:
output=[2022-01-20 20:01, 2022-01-20 21:00, 2022-01-20 22:25, 2022-01-20 23:01]
Here is what I tried:
dateList=[2022-01-20 20:01, 2022-01-20 21, 2022-01-20 22:25, 2022-01-20 23:01];
for (var i = 1; i < dateList.length; i++) {
RegExp regExp = new RegExp(
r"^((?!:).)*$",
);
var match = regExp.firstMatch("${dateList}");
var index = dateList1.indexOf(match);
dateList.replaceRange(index, index + 1, ["$match:00"]);
}
For each index of my string list I search for the only entry that has no ":". Once I have found the index with the problem, I replace that entry with itself plus ":00".
The problem: match returns null...
Thank you
I agree that using regular expressions is the way to go here. Detecting a date is relatively simple, you're basically looking for
4-digits dash 2-digits dash 2-digits space 2-digits colon 2-digits
Which, in RegExp language is
\d{4}-\d{2}-\d{2} \d{2}:\d{2}
Now we can detect whether a given String contains a complete datetime. The only thing that's left is to add the trailing minutes when it is missing. Note that you can decide what to add using another regular expression, but this code will just add the minutes, assuming that's always the issue.
List<String> input = ['2022-01-20 20:01', '2022-01-20 21', '2022-01-20 22:25', '2022-01-20 23:01'];
List<String> output = [];
// detect a date + time
RegExp regex = RegExp(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}');
for (String maybeDate in input) {
bool isCompleteDate = regex.hasMatch(maybeDate);
if (isCompleteDate) {
output.add(maybeDate);
} else {
// we want to complete the String
// in this case, I assume it's always just the minutes missing, but you could use another regex to see which part is missing
output.add(maybeDate + ':00');
}
}
print(output);
Alternatively, you can indeed use negative lookahead to find the missing minutes:
// detects a date and hour, without a colon and two digits (the minutes)
RegExp missingMinutes = RegExp(r'(\d{4}-\d{2}-\d{2} \d{2})(?!:\d{2})');
Which, in case you have a String instead of a List<String> would result in
List<String> input = ['2022-01-20 20:01', '2022-01-20 21', '2022-01-20 22:25', '2022-01-20 23:01'];
String listAsString = input.toString();
RegExp missingMinutes = RegExp(r'(\d{4}-\d{2}-\d{2} \d{2})(?!:\d{2})');
List<RegExpMatch?> matches = missingMinutes.allMatches(listAsString).toList();
for (int i = matches.length - 1; i >= 0; i--) {
// walk through all matches
if (matches[i] == null) continue;
listAsString = listAsString.substring(0, matches[i]!.end) + ':00' + listAsString.substring(matches[i]!.end);
}
print(listAsString);
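For comparison, the per-element variant of the same idea can be sketched in Python: match the pattern against the whole string, so that only an entry that stops right after the hour is rewritten. The name `complete_minutes` is made up for this illustration and, as in the answer above, the sketch assumes the minutes are the only part that can be missing.

```python
import re

# Date plus hour and nothing else: the minutes are missing.
MISSING_MINUTES = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}")

def complete_minutes(entries):
    """Append ':00' to entries that stop after the hour; leave the rest alone."""
    return [e + ":00" if MISSING_MINUTES.fullmatch(e) else e for e in entries]

dates = ["2022-01-20 20:01", "2022-01-20 21", "2022-01-20 22:25", "2022-01-20 23:01"]
print(complete_minutes(dates))
```

Using `fullmatch` instead of a lookahead keeps the rule easy to read: an entry is incomplete exactly when it is a date followed by a two-digit hour and nothing more.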

Regexp to get the utm values

I'm looking to extract some of the utm values from a URL using regexp. My URL would look something like the below -
utm_source=ko_1d5b57661294a3154&utm_medium=internetq&utm_campaign=-android5436af9f1aef91a654a7255038&utm_term=searchthis&utm_content=mainpage&
Is there any way to have a regexp that would extract all the utm values such as utm_source, utm_medium, utm_campaign, utm_term, utm_content?
You could grab all matching pairs and then convert them to an object.
NOTE: The object conversion is simplistic (doesn't account for multiple params of the same key, etc.).
var regexp = /(?!&)utm_[^=]*=[^&]*(?=&)/g;
var query = 'utm_source=ko_1d5b57661294a3154&utm_medium=internetq&utm_campaign=-android5436af9f1aef91a654a7255038&utm_term=searchthis&utm_content=mainpage&';
var matches = query.match(regexp);
var values = matches.reduce(function(obj, param) {
var keyVal = param.split('=');
obj[keyVal[0]] = keyVal[1];
return obj;
}, {});
document.write('<pre>' + JSON.stringify({
matches: matches,
values: values
}, null, 2) + '</pre>');
You could use a positive lookbehind for this case. The pattern would look like that:
(?<=utm_[a-z]+=)\w+
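This pattern matches runs of word characters (letters, digits and underscores) that are preceded by "utm_???=". Note that \w does not match a hyphen, so a value that starts with "-", such as the utm_campaign value in the example, is not matched.
What I am doing here is getting every value between the = and & signs.
/[^=]\w+(?=&)/g
Another one, based on utm_ (note that [^utm_=] is a character class that excludes the single characters u, t, m, _ and =, not the literal prefix utm_):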
/[^utm_=]\w+(?=&)/g
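As an alternative to a hand-written regex, a query-string parser sidesteps the edge cases above (values starting with "-", the trailing "&"). A minimal sketch in Python using the standard library's `urllib.parse.parse_qsl`; the helper name `utm_params` is an assumption for this example, and like the object conversion above it keeps only the last value when a key repeats.

```python
from urllib.parse import parse_qsl

def utm_params(query):
    """Return the utm_* parameters of a query string as a dict (last value wins)."""
    # parse_qsl splits on '&' and '=', decodes percent-escapes, and skips the
    # empty piece left by a trailing '&'.
    return {key: value for key, value in parse_qsl(query) if key.startswith("utm_")}

query = (
    "utm_source=ko_1d5b57661294a3154&utm_medium=internetq"
    "&utm_campaign=-android5436af9f1aef91a654a7255038"
    "&utm_term=searchthis&utm_content=mainpage&"
)
print(utm_params(query))
```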

How to parse out data of a string

I have a function which gets a string from another website, and when I extract it I end up with the following string:
IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37
In this case it's 3 records with 6 fields each, all separated by a space. How can I read the string and put the records into a structure and array so that I can access them?
The fields would be set up like this:
IFX
TMP2134567 (this field may contain a space)
1433010010
WT33
PARTIAL
2014-11-26 09:43:58.
So if we use " " as the separator we get 7 fields, since the 6th is a date-time with a space in it. I could also live with 7, since I can put 6 and 7 back together or store date and time separately.
My question: is there a way to do this with 6 fields, and if I have to use 7, how would I do that? I tried valuelist, but that does not work.
I know a couple of things about my list: the 1st field is always 3 characters, the 4th is always 4 characters, and each record ends with a date-time in the format YYYY-MM-DD HH:MM:SS.
To make it a bit more complicated, I just found that the 2nd field can contain spaces, as in the 3rd record, where it looks like this: "AP RETERM 007".
Another option is to create a JSON string with your data like this, and then deserialize it.
<cfsavecontent variable="sampledata">
IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEM P12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37</cfsavecontent>
<cfset asJson = ReReplaceNoCase(sampledata,"\s*(.{3}) (.*?) (\d+) (.{4}) ([^\s]*) (\d+-\d+-\d+ \d+:\d+:\d+)\s*",'["\1","\2","\3","\4","\5","\6"],',"ALL")>
<!--- Replace the last comma in the generated string with a closing bracket --->
<cfset asJson = "[" & ReReplace(asJson,",$","]","ALL")>
<cfset result_array = DeSerializeJSON(asJson)>
<cfdump var="#result_array#">
You can access the data simply with the resulting array.
So here's how I understand it
3 characters
Variable string
All digits
4 characters
I assume this value never contains a space
Date/Time
Based on assuming a "yes" to my question above, this solution works:
<cfscript>
raw = " IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37";
recordPattern = "(\S+)\s+([\w\s]+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})";
keys = ["a","b","c","d","e","f"];
records = getRecordsFromString(raw, recordPattern, keys);
writeDump(records);
function getRecordsFromString(raw, pattern, keys){
var offset = 1;
var records = [];
while (true) {
// use the pattern argument rather than the outer recordPattern variable
var result = getRecord(raw, pattern, keys, offset);
offset = result.offset;
if (!offset) break;
arrayAppend(records, result.record);
}
return records;
}
function getRecord(raw, recordPattern, keys, offset){
var match = reFind(recordPattern, raw, offset, true);
if (arrayLen(match.pos) != arrayLen(keys)+1){
return {record="", offset=0};
}
// a fresh struct per record, so the records do not share one struct
var record = {};
var keyIdx=1;
for (var key in keys){
record[key] = mid(raw, match.pos[++keyIdx], match.len[keyIdx]);
}
// continue searching right after the end of this match
return {record=record, offset=match.pos[1]+match.len[1]};
}
</cfscript>
Obviously you will need to tweak the recordPattern and keys to suit your actual needs.
And if you don't understand the regular expression usage there, do yourself a favour and read up on it. I do a series on "regular expressions in CFML" on my blog, which would be an adequate starting point.
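The record-by-record matching described above can also be sketched in Python with `re.finditer`, which does the offset bookkeeping itself. The field names (`code`, `name`, `number`, `unit`, `status`, `timestamp`) are made up for this illustration, and the pattern relies on the constraints stated in the question: a 3-character first field, a 4-character fourth field, and a trailing date-time.

```python
import re

# One record: 3 chars, free text (may contain spaces), digits, 4 chars,
# a status word, and a timestamp. The lazy ".+?" lets the second field grow
# only as far as needed for the remaining fields to line up.
RECORD = re.compile(
    r"(?P<code>\S{3})\s+"
    r"(?P<name>.+?)\s+"
    r"(?P<number>\d+)\s+"
    r"(?P<unit>\S{4})\s+"
    r"(?P<status>\S+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)

def parse_records(raw):
    """Return one dict per record found in raw, in order of appearance."""
    return [m.groupdict() for m in RECORD.finditer(raw)]

raw = (
    "IFX TMP2134567 1433010010 WT33 PARTIAL 2014-11-26 09:43:58 "
    "IFX TEMP12345 1433010003 SW80 PARTIAL 2014-11-26 09:43:10 "
    "IFX AP RETERM 007 1418310108 MB01 CONFIRMED 2014-07-03 09:48:37"
)
for record in parse_records(raw):
    print(record)
```

A second field that itself ends in a number followed by a 4-character token could still be split in the wrong place, so the pattern would need tightening for data that does not follow the sample's shape.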

Matching algorithm or regular expression?

I have a huge log file with different types of string rows, and I need to extract data in a "smart" way from these.
Sample snippet:
2011-03-05 node32_three INFO stack trace, at empty string asfa 11120023
--- - MON 23 02 2011 ERROR stack trace NONE      
For instance, what is the best way to extract the date from each row, independent of date format?
You could make a regex for different formats like so:
(fmt1)|(fmt2)|....
Where fmt1, fmt2 etc. are the individual regexes; for your example:
(20\d\d-[01]\d-[0123]\d)|((?:MON|TUE|WED|THU|FRI|SAT|SUN) [0123]\d [01]\d 20\d\d)
Note that to prevent the chance to match arbitrary numbers I restricted year, month and day numbers accordingly. For example, a day number cannot start with 4, neither can a month number start with 2.
This gives the following pseudo code:
// remember that you need to double each backslash when writing the
// pattern in string form
Pattern p = Pattern.compile("..."); // compile once and for all
for (String s : lines) { // lines: your input lines, read however suits you
Matcher m = p.matcher(s);
if (m.find()) {
String d = m.group(); // d is the string that matched
....
}
}
Each individual date pattern is written in () to make it possible to find out what format we had, like so:
int fmt = 0;
// each (fmt) is a group, numbered starting with 1 from left to right
for (int i = 1; fmt == 0 && i <= total number of different formats; i++)
if (m.group(i) != null) fmt = i;
For this to work, inner (regex) groups must be written (?:regex) so that they do not count as capture groups; see the updated example.
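The same "which alternative matched" idea can be sketched in Python, where named groups and `Match.lastgroup` replace the loop over group numbers. The group names `iso` and `weekday` and the function `find_date` are assumptions for this example; the two sub-patterns are the ones given above.

```python
import re

# One named group per date format; the weekday list stays non-capturing so
# that it does not count as a group of its own.
DATE = re.compile(
    r"(?P<iso>20\d\d-[01]\d-[0123]\d)"
    r"|(?P<weekday>(?:MON|TUE|WED|THU|FRI|SAT|SUN) [0123]\d [01]\d 20\d\d)"
)

def find_date(line):
    """Return (format_name, matched_text), or None when no known format occurs."""
    m = DATE.search(line)
    return (m.lastgroup, m.group()) if m else None

log_lines = [
    "2011-03-05 node32_three INFO stack trace, at empty string asfa 11120023",
    "--- - MON 23 02 2011 ERROR stack trace NONE",
]
for line in log_lines:
    print(find_date(line))
```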
If you use Java, you may want to have a look at Joda-Time. Also, read this question and related answers. I think Joda-Time's DateTimeFormat should give you all the flexibility that you need to parse the various date/time formats of your log file.
A quick example:
String dateString = "2011-04-18 10:41:33";
DateTimeFormatter formatter =
DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss");
DateTime dateTime = formatter.parseDateTime(dateString);
Just define a String[] for the formats of your date/time, and pass each element to DateTimeFormat to get the corresponding DateTimeFormatter. You can use a regex just to separate date strings from other stuff in the log lines, and then you can use the various DateTimeFormatters to try and parse them.
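The "list of formats, try each in turn" approach described above can be sketched in Python with `datetime.strptime` standing in for Joda-Time's formatters. The format list and the name `parse_any` are assumptions for this illustration; the weekday prefix from the sample log is left out because `%a` depends on the locale.

```python
from datetime import datetime

# Candidate formats, tried in order; extend the list to suit the log file.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d %m %Y"]

def parse_any(text):
    """Return a datetime for the first format that parses text, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong format or impossible date: try the next one
    return None

print(parse_any("2011-04-18 10:41:33"))
print(parse_any("23 02 2011"))
```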

Does anyone know of a regular expression for the UK date format?

Hi, does anyone know a regex for a UK date format, e.g. dd/mm/yyyy?
The dd or mm can be 1 character, e.g. 1/1/2010, but the year must always be 4 characters.
Thanks in advance
^\d{1,2}/\d{1,2}/\d{4}$
will match 1/1/2000, 07/05/1999, but also 99/77/8765.
So if you want to do some rudimentary plausibility checking, you need
^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/\d{4}$
This will still match 31/02/9999, so if you want to catch those, it's getting hairier:
^(?:(?:[12][0-9]|0?[1-9])/0?2|(?:30|[12][0-9]|0?[1-9])/(?:0?[469]|11)|(?:3[01]|[12][0-9]|0?[1-9])/(?:0?[13578]|1[02]))/\d{4}$
But this still won't catch leap years. So, modifying a beast of a regex from regexlib.com:
^(?:(?:(?:(?:31\/(?:0?[13578]|1[02]))|(?:(?:29|30)\/(?:0?[13-9]|1[0-2])))\/(?:1[6-9]|[2-9]\d)\d{2})|(?:29\/0?2\/(?:(?:(1[6-9]|[2-9]\d)(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))|(?:0?[1-9]|1\d|2[0-8])\/(?:(?:0?[1-9])|(?:1[0-2]))\/(?:(?:1[6-9]|[2-9]\d)\d{2}))$
will match
1/1/2001
31/5/2010
29/02/2000
29/2/2400
23/5/1671
01/1/9000
and fail
31/2/2000
31/6/1800
12/12/90
29/2/2100
33/3/3333
All in all, regular expressions may be able to match dates; validating them is not their forte, but if they are all you can use, it's certainly possible. It just looks horrifying :)
Regex is not the right tool for this job.
It is very difficult (but possible) to come up with a regex that matches only valid dates. Rules such as February having 29 days only in a leap year are not easy to express in a regex.
Instead check if your language library provides any function for validating dates.
PHP has one such function called checkdate :
bool checkdate ( int $month , int $day , int $year)
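The same division of labour, a simple regex for the shape and a library call for the calendar rules, can be sketched in Python, where constructing a `datetime.date` plays the role of checkdate. The name `is_uk_date` is an assumption for this example; the shape regex is the lenient `\d{1,2}/\d{1,2}/\d{4}` from the first answer.

```python
import re
from datetime import date

# Shape only: one or two digits, slash, one or two digits, slash, four digits.
UK_SHAPE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")

def is_uk_date(text):
    """True if text looks like d/m/yyyy and names a real calendar date."""
    m = UK_SHAPE.fullmatch(text)
    if not m:
        return False
    day, month, year = map(int, m.groups())
    try:
        date(year, month, day)  # raises ValueError for 31/2, 29/2/2100, ...
    except ValueError:
        return False
    return True

print(is_uk_date("29/2/2400"))
print(is_uk_date("29/2/2100"))
```

Leap years, month lengths and the year-0 case are all handled by the `date` constructor, so the regex never has to grow beyond the format check.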
\b(0?[1-9]|[12][0-9]|3[01])[/](0?[1-9]|1[012])[/](19|20)?[0-9]{2}\b
Match :
1/1/2010
01/01/2010
But also invalid dates such as February 31st
^\d{1,2}/\d{1,2}/\d{4}$
In braces there is min and max char count. \d means digit, ^ start, and $ end of string.
\b(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\d\d
This won't validate the date, but you can check the format with it.
I ran into similar requirements.
Here is the complete regular expression along with Leap Year validation.
Format: dd/MM/yyyy
(3[01]|[12]\d|0[1-9])/(0[13578]|10|12)/((?!0000)\d{4})|(30|[12]\d|0[1-9])/(0[469]|11)/((?!0000)\d{4})|(2[0-8]|[01]\d|0[1-9])/(02)/((?!0000)\d{4})|
29/(02)/(1600|2000|2400|2800|00)|29/(02)/(\d\d)(0[48]|[2468][048]|[13579][26])
It can be easily modified to US format or other EU formats.
edited:
(3[01]|[12]\d|0[1-9])/(0[13578]|10|12)/((?!0000)\d{4})|(30|[12]\d|0[1-9])/(0[469]|11)/((?!0000)\d{4})|(2[0-8]|[01]\d|0[1-9])/(02)/((?!0000)\d{4})|29/(02)/(1600|2000|2400|2800|00)|29/(02)/(\d\d)(0[48]|[2468][048]|[13579][26])
There are two things you want to do, which in my view are best considered separately
1) You want to make sure that the date is a real, actual date.
For example the 2019-02-29 isn't a real date whereas 2020-02-29 is a real date because 2020 is a leap year
2) You want to check that the date is in the correct format (so dd/mm/yyyy)
The second point can be done easily enough with a simple RegEx, plenty of examples of that.
To complicate matters, if you ask Firefox whether 2019-02-29 is a real date, it'll return NaN, which is what you'd expect.
Chrome, on the other hand, will say it is a real date and give you back the 1st of March 2019, which will validate.
Chrome will also accept a single-digit number as a proper date for some strange reason: feed it "2" and it'll give you back a full date in 2001, which will validate.
So the first step is to create a function which attempts to decipher a date (no matter the format) and works cross-browser to return a boolean indicating whether the date is valid or not:
function validatableDate(value)
{
Date.prototype.isValid = function()
{ // An invalid date object returns NaN for getTime() and NaN is the only
// value not strictly equal to itself.
return this.getTime() === this.getTime();
};
var minTwoDigits = function(n)
{ //pads any digit less than 10 with a leading 0
return (parseInt(n, 10) < 10 ? '0' : '') + parseInt(n, 10);
}
var valid_date = false;
var iso_array = null;
// check if there are date dividers (gets around chrome allowing single digit numbers)
if ((value.indexOf('/') != -1) || (value.indexOf('-') != -1)) { //if we're dealing with - dividers we'll do some pre-processing and swap them out for /
if (value.indexOf('-') != -1) {
var dash_parts = value.split('-');
value = dash_parts.join("/");
//if we have a leading year, we'll put it at the end and work things out from there
if (dash_parts[0].length > 2) {
value = dash_parts[1] + '/' + dash_parts[2] + '/' + dash_parts[0];
}
}
var parts = value.split('/');
if (parts[0] > 12) { //convert to ISO from UK dd/mm/yyyy format
iso_array = [parts[2], minTwoDigits(parts[1]), minTwoDigits(parts[0])]
} else if (parts[1] > 12) { //convert to ISO from American mm/dd/yyyy format
iso_array = [parts[2], minTwoDigits(parts[0]), minTwoDigits(parts[1])]
} else //if a date is valid in either UK or US (e.g. 12/12/2017 , 10/10/2017) then we don't particularly care what format it is in - it's valid regardless
{
iso_array = [parts[2], minTwoDigits(parts[0]), minTwoDigits(parts[1])]
}
if (Array.isArray(iso_array)) {
value = iso_array.join("-");
var d = new Date(value + 'T00:00:01Z');
if (d.isValid()) //test if it is a valid date (there are issues with this in Chrome with Feb)
{
valid_date = true;
}
//if the month is Feb we need to do another step to cope with Chrome peculiarities
if (parseInt(iso_array[1], 10) == 2) {
// day 0 of the following month is the last day of this month
var month_info = new Date(iso_array[0], iso_array[1], 0);
//if the day entered is larger than the last day of February in that year
if (iso_array[2] > month_info.getDate()) {
valid_date = false;
}
}
}
}
return valid_date;
}
That can be compressed down to
function validatableDate(t) {
Date.prototype.isValid = function () {
return this.getTime() === this.getTime()
}, minTwoDigits = function (t) {
return (parseInt(t) < 10 ? "0" : "") + parseInt(t)
};
var a = !1,
i = null;
return -1 == t.indexOf("/") && -1 == t.indexOf("-") || (-1 != t.indexOf("-") && (dash_parts = t.split("-"), t = dash_parts.join("/"), dash_parts[0].length > 2 && (t = dash_parts[1] + "/" + dash_parts[2] + "/" + dash_parts[0])), parts = t.split("/"), i = parts[0] > 12 ? [parts[2], minTwoDigits(parts[1]), minTwoDigits(parts[0])] : (parts[1], [parts[2], minTwoDigits(parts[0]), minTwoDigits(parts[1])]), Array.isArray(i) && (t = i.join("-"), new Date(t + "T00:00:01Z").isValid() && (a = !0), 2 == parseInt(i[1]) && (month_info = new Date(i[0], i[1], 0), i[2] > month_info.getDate() && (a = !1)))), a
}
That gets you a cross-browser test as to whether the date can be validated or not and it'll read & decipher dates in formats
yyyy-mm-dd
dd-mm-yyyy
mm-dd-yyyy
dd/mm/yyyy
mm/dd/yyyy
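The February check in the function above works by asking for the last day of the month and comparing the entered day against it. The same last-day lookup can be sketched in Python with the standard library's `calendar.monthrange`; the helper names `last_day_of_month` and `day_exists` are assumptions for this example.

```python
import calendar

def last_day_of_month(year, month):
    """Number of days in the given month, leap years included."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, month)[1]

def day_exists(year, month, day):
    """True if day is a real day of that month (month must be 1..12)."""
    return 1 <= day <= last_day_of_month(year, month)

print(last_day_of_month(2019, 2), last_day_of_month(2020, 2))
```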
Once you've validated that the date is a real, proper one, you can then test the format with a regex. So for UK dd/mm/yyyy:
function dateUK(value) {
var valid_uk_date = false;
var valid_date = validatableDate(value);
if (valid_date && value.match(/^(0?[1-9]|[12][0-9]|3[01])[\/](0?[1-9]|1[012])[\/]\d{4}$/)) {
valid_uk_date = true;
}
return valid_uk_date;
}
You then know that the date is a real one and that it's in the correct format.
For yyyy-mm-dd format, you'd do:
function dateISO(value) {
var valid_iso_date = false;
var valid_date = validatableDate(value);
if (valid_date && value.match(/^\d{4}[\/\-]\d{1,2}[\/\-]\d{1,2}$/)) {
valid_iso_date = true;
}
return valid_iso_date;
}
It depends on how thorough you want to be, of course; for a rough check of format sanity a regex may be enough for your purposes. If, however, you want to test that the date is a real one AND that the format is valid, then this will hopefully help point you along the way.
Thanks