Dynamic regex for date-time formats - regex

Is there an existing solution for creating regular expressions dynamically from a given date-time format pattern? It does not matter which pattern syntax is supported (Joda DateTimeFormat, java.text.SimpleDateFormat or others).
As a specific example, for a given date-time format like dd/MM/yyyy hh:mm, it should generate the corresponding regular expression that matches date-times written in that format.

I guess you have a limited alphabet that your time formats can be constructed of. That means, "HH" would always be "hours" on the 24-hour clock, "dd" always the day with leading zero, and so on.
Because of the sequential nature of a time format, you could try to tokenize a format string of "dd/mm/yyyy HH:nn" into an array ["dd", "/", "mm", "/", "yyyy", " ", "HH", ":", "nn"]. Then go ahead and form a pattern string from that array by replacing "HH" with "([01][0-9]|2[0-3])" and so on. Preconstruct these pattern atoms in a lookup table/array. All parts of your array that are not in the lookup table are literals. Escape them according to regex rules and append them to your pattern string.
EDIT: As a side effect for a regex based solution, when you put all regex "atoms" of your lookup table into parens and keep track of their order in a given format string, you would be able to use sub-matches to extract the required components from a match and feed them into a CreateDate function, thus skipping the ParseDate part altogether.
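The tokenize-and-lookup idea above can be sketched roughly as follows. This is only an illustration in Python: the token table, the names TOKEN_PATTERNS and format_to_regex, and the choice of tokens (dd, MM, yyyy, HH, hh, mm) are assumptions made for the example, not part of any existing library.

```python
import re

# Hypothetical lookup table: format token -> capturing regex "atom".
TOKEN_PATTERNS = {
    "yyyy": r"(\d{4})",
    "dd": r"(0[1-9]|[12][0-9]|3[01])",
    "MM": r"(0[1-9]|1[0-2])",
    "HH": r"([01][0-9]|2[0-3])",
    "hh": r"(0[1-9]|1[0-2])",
    "mm": r"([0-5][0-9])",
}

# Longest tokens first, so "yyyy" is never split into shorter tokens.
_TOKENIZER = re.compile(
    "|".join(sorted(TOKEN_PATTERNS, key=len, reverse=True))
)


def format_to_regex(fmt):
    """Return (compiled regex, token order) for a format string."""
    parts = []
    order = []
    pos = 0
    for m in _TOKENIZER.finditer(fmt):
        # Anything between two tokens is a literal and must be escaped.
        parts.append(re.escape(fmt[pos:m.start()]))
        parts.append(TOKEN_PATTERNS[m.group()])
        order.append(m.group())
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$"), order


pattern, order = format_to_regex("dd/MM/yyyy hh:mm")
match = pattern.match("25/12/2020 09:30")
# Pair each token with its sub-match, ready to feed into a date constructor.
print(dict(zip(order, match.groups())))
```

Because every atom is a capturing group and the token order is recorded, the sub-matches can be mapped straight back to day, month, year and so on, as the EDIT above suggests.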

If you are looking for basic date checking, this code matches this data.
\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b
10/07/2008
10.07.2008
1-01/2008
10/07/08
10.07.2008
1-01/08
Code via regexbuddy

SimpleDateFormat already does this with the parse() method.
If you need to parse multiple dates from a single string, start with a regex (even if it matches too leniently), and use parse() on all the potential matches found by the regex.
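A rough sketch of that "lenient regex first, strict parse second" approach, written in Python with datetime.strptime standing in for SimpleDateFormat.parse(). The names CANDIDATE and find_dates and the dd/MM/yyyy format are assumptions chosen for the example.

```python
import re
from datetime import datetime

# Deliberately lenient: any digits/digits/4-digits run is a candidate.
CANDIDATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_dates(text, fmt="%d/%m/%Y"):
    """Return every candidate in text that really parses as a date."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            found.append(datetime.strptime(candidate, fmt))
        except ValueError:
            # The regex matched too leniently; the parser rejects it.
            pass
    return found


print(find_dates("due 10/07/2008, not 45/13/2008"))
```

The regex only has to locate plausible substrings; all range checking (days per month, leap years) is left to the parser.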

The JS / jQuery code below dynamically generates a RegEx for a date format only, not for date-time (development version, not fully tested yet).
The date format should be built from the characters "D", "M" and "Y".
E.g.
DD-MM-YY
DD-MM-YYYY
YYYY-MM-DD
YYYY-DD-MM
MM-DD-YYYY
MM-DD-YY
DD/MM/YY
DD/MM/YYYY
YYYY/MM/DD
YYYY/DD/MM
MM/DD/YYYY
MM/DD/YY
Or other formats but created with "D M Y" characters:
var dateFormat = "DD-MM-YYYY";
var order = [];
var position = {"D":dateFormat.search('D'),"M":dateFormat.search('M'),"Y":dateFormat.search('Y')};
var count = {"D":dateFormat.split("D").length - 1,"M":dateFormat.split("M").length - 1,"Y":dateFormat.split("Y").length - 1};
var separator = '';
for (var i = 0; i < dateFormat.length; i++) {
    var ch = dateFormat.charAt(i);
    if (["Y","M","D"].indexOf(ch) < 0) {
        separator = ch;   // any character other than D/M/Y is the separator
    } else if (order.indexOf(ch) < 0) {
        order.push(ch);   // record each field once, in order of appearance
    }
}
var groups = [];
$(order).each(function (ok, ov) {
    // '\\d' keeps the backslash in the string; '\d' would collapse to 'd'
    groups.push('(\\d{' + count[ov] + '})');
});
// Escape the separator so characters such as "." are matched literally
var escapedSeparator = separator.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
var regEx = "^" + groups.join(escapedSeparator) + "$";
var re = new RegExp(regEx);
console.log(re);
NOTE: There is no validation check for months / days
e.g. month should be in 01-12 or date should be in 01-31

Related

Regex match everything not between a pair of characters

Suppose a string (representing elapsed time in the format HH:MM:ss) like this:
"123:59:00"
I want to match everything except the numbers for the minutes, i.e.: the regex should match the bold parts and not the number between colons:
"123: 59 :00"
In the example, the 59 should be the only part unmatched.
Is there any way to accomplish this with a js regex?
EDIT: I'm asking explicitly for a regex, because I'm using the Notion Formula API and can only use JS regex here.
You don't necessarily need to use RegEx for this. Use split() instead.
const timeString = "12:59:00";
const [hours, _, seconds] = timeString.split(":");
console.log(hours, seconds);
If you want to use Regex you can use the following:
const timeString = "12:59:00";
const matches = timeString.match(/(?<hours>^\d{2}(?=:\d{2}:))|(?<seconds>(?<=:\d{2}:)\d{2}$)/g);
console.log(matches);
// if you want to include the colons use this
const matchesWithColons = timeString.match(/(?<hours>^\d{2}:(?=\d{2}:))|(?<seconds>(?<=:\d{2}):\d{2}$)/g);
console.log(matchesWithColons);
You can drop the named groups ?<hours> and ?<seconds>.
Using split() might be the most canonical way to go, but here is a regex approach using match():
var input = "123:59:00";
var parts = input.match(/^[^:]+|[^:]+$/g);
console.log(parts);
If you want to also capture the trailing/leading colons, then use this version:
var input = "123:59:00";
var parts = input.match(/^[^:]+:|:[^:]+$/g);
console.log(parts);
Could also work
/^([0-9]{2})\:[0-9]{2}\:([0-9]{2})$/mg

Regex to insert space with certain characters but avoid date and time

I made a regex which inserts a space wherever any of the characters
-:\*_/;, is present. For example, JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c should become JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I have added exceptions for some words like a/c, w/d, etc. The \D conditions are there to keep date/time values from being separated, but this created an issue: numbers followed by the above-mentioned characters never get split.
My requirement is
1. Insert a space after characters -:\*_/;,
2. but date and time should not get split which may have / :
3. need exception on words like a/c w/d
The following is the full code
Private Function formatColon(oldString As String) As String
    Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
    Dim newString As String: newString = reg.Replace(oldString, "$1 ")
    formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
Now run your original regex to replace all special characters.
Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)
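The three-replacement idea could look roughly like this. The sketch is in Python rather than VBA, and the placeholder strings, the DATE/TIME/EXCEPTIONS patterns and the name add_spaces are all assumptions made for illustration; the placeholders use NUL characters instead of a name-based macro, on the assumption that NUL never occurs in the input.

```python
import re

# Placeholders assumed never to appear in real input.
SLASH = "\x00SLASH\x00"
COLON = "\x00COLON\x00"

# Assumed shapes of dates, times and the word exceptions from the question.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
TIME = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
EXCEPTIONS = re.compile(r"\b(?:a/c|w/d|m/s|s/w|m/o)\b")


def _protect(match):
    """Swap '/' and ':' inside a protected match for placeholders."""
    return match.group().replace("/", SLASH).replace(":", COLON)


def add_spaces(text):
    # 1. Hide dates, times and exception words behind placeholders.
    for pattern in (DATE, TIME, EXCEPTIONS):
        text = pattern.sub(_protect, text)
    # 2. Insert a space after each special character that lacks one.
    text = re.sub(r"([-:\\*_/;,])(?!\s)", r"\1 ", text)
    # 3. Restore the hidden characters.
    return text.replace(SLASH, "/").replace(COLON, ":")


print(add_spaces(r"JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c"))
```

Each step stays simple enough to read on its own, which is the point of splitting the work instead of packing every case into one pattern.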

Scala regex find/replace with additional formatting

I'm trying to replace parts of a string that contains what should be dates, but which are possibly in an impermissible format. Specifically, all of the dates are in the form "mm/dd/YYYY" and they need to be in the form "YYYY-mm-dd". One caveat is that the original dates may not exactly be in the mm/dd/YYYY format; some are like "5/6/2015". For example, if
val x = "where date >= '05/06/2017'"
then
x.replaceAll("'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})'", "'$3-$1-$2'")
performs the desired replacement (returns "2017-05-06"), but for
val y = "where date >= '5/6/2017'"
this does not return the desired replacement (returns "2017-5-6" -- for me, an invalid representation). With the Joda Time wrapper nscala-time, I've tried capturing the dates and then reformatting them:
import com.github.nscala_time.time.Imports._
import org.joda.time.DateTime
val f = DateTimeFormat.forPattern("yyyy-MM-dd")
y.replaceAll("'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})'",
"'"+f.print(DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime("$1"))+"'")
But this fails with a java.lang.IllegalArgumentException: Invalid format: "$1". I've also tried using the f interpolator and padding with 0s, but it doesn't seem to like that either.
Are you not able to do additional processing on the captured groups ($1, etc.) inside the replaceAll? If not, how else can I achieve the desired result?
Backreferences like $1 can only be used inside string replacement patterns. In your code, "$1" is passed to parseDateTime as a literal string before replaceAll ever runs, so it is no longer a backreference.
You may use a "callback" with replaceAllIn to actually get the match object and access its groups to further manipulate them:
val pattern = "'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})'".r
val result = pattern replaceAllIn (y, m => "'"+f.print(DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime(m.group(1)))+"'")
Regex.replaceAllIn is overloaded and can take a Match => String.
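For comparison, the same callback idea sketched in Python rather than Scala: re.sub also accepts a function that receives the match object. The names DATE, to_iso and rewrite_dates are made up for this example, and datetime.strptime is used here in place of Joda Time.

```python
import re
from datetime import datetime

# Quoted dates in (possibly unpadded) month/day/year form.
DATE = re.compile(r"'(\d{1,2}/\d{1,2}/\d{4})'")


def to_iso(match):
    """Callback: parse the captured date and re-print it as yyyy-MM-dd."""
    parsed = datetime.strptime(match.group(1), "%m/%d/%Y")
    return "'" + parsed.strftime("%Y-%m-%d") + "'"


def rewrite_dates(sql):
    # The function is called once per match, with the real captured text.
    return DATE.sub(to_iso, sql)


print(rewrite_dates("where date >= '5/6/2017'"))
```

The parser accepts both "5/6/2017" and "05/06/2017", and the formatter always zero-pads, so no manual padding is needed.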

Use RegEx to find dates and increment year by a value

I have a large number of files that contain dates. I would like to use a Regular Expression to find the dates and if possible increment the year of the date by 10.
The files can have multiple date formats ..
04/22/78
06-OCT-14
How would one write a regular expression that could find, increment, and replace the dates, or even just the year of the dates?
I plan to use a text editor like Text Pad, UltraEdit, or Notepad++ to search the files
Assuming each date has the shape date.month.year, where . stands for any field separator, and the year is the last thing on the line:
You can use a simple Perl one-liner to do this:
perl -ne 's/(\d+)$/($1+10)/e && print' filename
This will add 10 to the year, and print the date.
Output for this is:
04/22/88
06-OCT-24
Just wrote this python snippet to get it done.
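import re

def add_ten_years(date):
    # Groups: whole match, first field, separator, second field, separator, two-digit year
    reg = r"((\d{2})(.)(\w{2,4})(.)(\d{2}))"
    mat = re.search(reg, date)
    if mat:
        mat = mat.groups()
        # Rejoin everything before the year, then append the year plus 10
        return ''.join(mat[1:5]) + str(int(mat[5]) + 10)

print(add_ten_years("04/22/78"))
print(add_ten_years("06-OCT-14"))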
You can adjust the regex pattern to generalize it even more, and it can easily be translated to other languages. Hope it helped!
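To rewrite the dates in place inside a larger text, one possible variation is to pass a function to re.sub. This is only a sketch: the DATE pattern covers just the two formats shown in the question, the names DATE, bump_year and add_years are invented for the example, and wrapping two-digit years past 99 back to 00 is an assumption about the desired behaviour.

```python
import re

# Two digits, separator, two digits or a three-letter month, separator, year.
DATE = re.compile(r"\b(\d{2}[/-](?:\d{2}|[A-Z]{3})[/-])(\d{2})\b")


def bump_year(match, years=10):
    """Keep the day/month part, add `years` to the two-digit year."""
    new_year = (int(match.group(2)) + years) % 100  # assumed wrap-around
    return match.group(1) + "%02d" % new_year


def add_years(text):
    return DATE.sub(bump_year, text)


print(add_years("born 04/22/78, hired 06-OCT-14"))
```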

Splitting a string based on positions with regex

I need to convert this (date) String "12112014" to "12.11.2014"
What I would like to do is:
Split off the first 2 characters "12", add ".",
then split off characters 3-4 to get "11", add "."
and at the end split off the last 4 characters (5-8) to get "2014"
I already found out how to get the first 2 characters ( "^\d{2}" ), but I failed to get characters based on a position.
Whatever the programming language, you should try to extract the digit groups from the string and then join them with a ".".
In Perl, it can be done like this:
$_ = '12112014';
s/(\d{2})(\d{2})(\d{4})/$1.$2.$3/;
print "$_";
Without you specifying the language you're after, I've picked javascript:
var s = '12012011';
var s2 = s.replace(/(\d{2})(\d{2})(\d{4})/,'$1.$2.$3');
console.log(s2); // prints "12.01.2011"
The gist of it is that you use () to specify groups inside your regular expression and then can use the groups in your replace expression.
Same in Java:
String s = "12012011";
String s2 = s.replaceAll("(\\d{2})(\\d{2})(\\d{4})", "$1.$2.$3");
System.out.println(s2);
I don't think that you could do that with split alone.
You could expand your expression to:
"(^(\d{2})(\d{2})(\d{4}))"
Then access the groups with the Regex language of your choice and build the string you want.
Note that - besides all regex learning - alternatively you could always parse the original string into strongly typed Date or DateTime variables and output the value using the appropriate locales.
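As a rough illustration of that last suggestion, here is a Python sketch that parses the string into a date and prints it back with dots. The function name reformat_date is hypothetical, and the day-month-year field order is assumed from the example in the question.

```python
from datetime import datetime


def reformat_date(raw):
    """Parse a ddMMyyyy string and print it as dd.MM.yyyy."""
    parsed = datetime.strptime(raw, "%d%m%Y")  # raises ValueError if invalid
    return parsed.strftime("%d.%m.%Y")


print(reformat_date("12112014"))
```

Unlike the pure regex replacement, this also rejects impossible values such as a month of 13.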