Check if string is of SortableDateTimePattern format - regex

Is there any way I can easily check if a string conforms to the SortableDateTimePattern ("s"), or do I need to write a regular expression?
I've got a form where users can input a copyright date (as a string), and these are the allowed formats:
Year: YYYY (eg 1997)
Year and month: YYYY-MM (eg 1997-07)
Complete date: YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second: YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
I don't have much experience of writing regular expressions so if there's an easier way of doing it I'd be very grateful!

Not thoroughly tested and hence not foolproof, but the following seems to work:
var regex:RegExp = /(?<=\s|^)\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{2})?)?\+\d{2}:\d{2})?)?)?(?=\s|$)/g;
var test:String = "23 1997 1998-07 1995-07s 1937-04-16 " +
"1970-0716 1993-07-16T19:20+01:01 1979-07-16T19:20+0100 " +
"2997-07-16T19:20:30+01:08 3997-07-16T19:20:30.45+01:00";
var result:Object;
while (result = regex.exec(test))
    trace(result[0]);
Traced output:
1997
1998-07
1937-04-16
1993-07-16T19:20+01:01
2997-07-16T19:20:30+01:08
3997-07-16T19:20:30.45+01:00
I am using ActionScript here, but the regex should work in most flavors. When implementing it in your language, note that the first and last / are delimiters and the last g stands for global.
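As a rough sketch of the same idea in Python, the pattern below checks a whole string against the six formats listed in the question. The name is_w3c_datetime is made up for illustration, and two details are assumptions beyond the regex above: the time zone designator may be Z, +hh:mm or -hh:mm, and the fraction of a second may have any number of digits. Like the original, it only checks the shape of the string, so a month such as 13 still passes.

```python
import re

# Hypothetical pattern, assuming TZD is "Z", "+hh:mm" or "-hh:mm".
W3C_DATETIME = re.compile(
    r"\d{4}"                      # YYYY
    r"(?:-\d{2}"                  # -MM
    r"(?:-\d{2}"                  # -DD
    r"(?:T\d{2}:\d{2}"            # Thh:mm
    r"(?::\d{2}(?:\.\d+)?)?"      # optional :ss and optional .s fraction
    r"(?:Z|[+-]\d{2}:\d{2})"      # TZD, required whenever a time is present
    r")?)?)?"
)


def is_w3c_datetime(text):
    """Return True if the whole string has one of the six allowed shapes."""
    return W3C_DATETIME.fullmatch(text) is not None
```

Using fullmatch on a single form field avoids the lookbehind and lookahead that were needed to pick dates out of a longer string.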

I'd split the input field into many (one for year, month, day etc.).
You can use JavaScript to advance from one field to the next once full (i.e. once four characters are in the year box, move focus to month) for smoother entry.
You can then validate each field independently and finally construct the complete date string.
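A minimal sketch of that last step, written in Python for illustration and assuming the year, month and day arrive as separate integers (the function name build_date_string is hypothetical). It relies on datetime.date to reject impossible values and then joins whatever fields were filled in:

```python
from datetime import date


def build_date_string(year, month=None, day=None):
    """Validate separately entered fields and join them as YYYY[-MM[-DD]]."""
    if day is not None and month is None:
        raise ValueError("a day needs a month")
    # date() raises ValueError for values such as month 13 or 30 February.
    date(year, 1 if month is None else month, 1 if day is None else day)
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}")
    return "-".join(parts)
```

Time and time zone fields could be handled the same way, each with its own range check.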

How to find the difference in hours between two dates dd/mm/yyyy hh:mm

I have two dates in cells
A1=05.11.2021 18:16
B1=05.11.2021 20:16
I need to find the difference in hours between the two dates. The result should be (B1-A1)=2. I can't find an answer on the Internet, so I'm asking for help.
use:
=TEXT((DATE(
REGEXEXTRACT(B1, "\d{4}"),
REGEXEXTRACT(B1, "\.(\d+)\."),
REGEXEXTRACT(B1, "^\d+"))+INDEX(SPLIT(B1, " "),,2))-(DATE(
REGEXEXTRACT(A1, "\d{4}"),
REGEXEXTRACT(A1, "\.(\d+)\."),
REGEXEXTRACT(A1, "^\d+"))+INDEX(SPLIT(A1, " "),,2)), "[h]")
arrayformula:
=INDEX(IFNA(TEXT((DATE(
REGEXEXTRACT(B1:B, "\d{4}"),
REGEXEXTRACT(B1:B, "\.(\d+)\."),
REGEXEXTRACT(B1:B, "^\d+"))+INDEX(SPLIT(B1:B, " "),,2))-(DATE(
REGEXEXTRACT(A1:A, "\d{4}"),
REGEXEXTRACT(A1:A, "\.(\d+)\."),
REGEXEXTRACT(A1:A, "^\d+"))+INDEX(SPLIT(A1:A, " "),,2)), "[h]")))
shorter:
=INDEX(IFERROR(1/(1/(TEXT(
REGEXREPLACE(B1:B, "(\d+).(\d+).(\d{4})", "$2/$1/$3")-
REGEXREPLACE(A1:A, "(\d+).(\d+).(\d{4})", "$2/$1/$3"), "[h]")))))
EDIT:
As @basic mentioned in a comment, you can format the cell where your output goes, or use TEXT with h for the hour component of the difference and [h] for the whole duration in hours (taken from Cooper's answer). See usage and difference below:
Text:
=text(B1-A1, "h")
or
=text(B1-A1, "[h]")
Update:
Make sure your date-time values use delimiters that your spreadsheet recognizes. / and - are acceptable (e.g. 5/11/2021 18:16:00 or 5-11-2021 18:16:00). (This depends entirely on your locale.)
If you want to show it having . as delimiter, just use a custom Date Time format and use . as its delimiter.
If you don't want to change the date-time values and want to keep them as text, replace the delimiters with REGEXREPLACE before passing them to TEXT.
RegexReplace:
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "h")
or
=text(REGEXREPLACE(B1, "\.", "/") - REGEXREPLACE(A1, "\.", "/"), "[h]")
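For comparison, the same parse-then-subtract logic can be sketched outside the spreadsheet. This is a small Python illustration, assuming the text always has the dd.mm.yyyy hh:mm shape from the question; hours_between is a made-up helper name.

```python
from datetime import datetime

# Assumed input shape: day.month.year hours:minutes, e.g. "05.11.2021 18:16".
FORMAT = "%d.%m.%Y %H:%M"


def hours_between(start_text, end_text):
    """Parse both strings and return the elapsed time in hours."""
    start = datetime.strptime(start_text, FORMAT)
    end = datetime.strptime(end_text, FORMAT)
    return (end - start).total_seconds() / 3600


print(hours_between("05.11.2021 18:16", "05.11.2021 20:16"))  # prints 2.0
```

Because the result is the total elapsed time, it behaves like the [h] format rather than h when the two values fall on different days.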

Condensing a string of time blocks; displaying date ranges

A user selects a number of hours (as DateTime objects) when making a booking for a rental space.
List<DateTime> dateTimeList = getDateTimeList();
I convert that list to a presentable string like so:
List<String> hourList = <String>[];
for (DateTime dateTime in dateTimeList) {
  String hour = getHour(dateTime, context); // getHour returns e.g. 14:00 or 2pm
  String nextHour = getHour(dateTime.add(Duration(hours: 1)), context);
  hourList.add(hour + " - " + nextHour);
}
hourList.sort();
return hourList.join(", ");
Eventually I have the following list:
10:00 - 11:00
11:00 - 12:00
15:00 - 16:00
16:00 - 17:00
20:00 - 21:00
Q: How can I condense it, so consecutive blocks are merged? Like so:
10:00 - 12:00
15:00 - 17:00
20:00 - 21:00
I've thought of regex replace and various for loops that get too complicated before I finish... and this is not very elegant:
string = string.replaceAll("- 01:00, 01:00", "");
string = string.replaceAll("- 02:00, 02:00", "");
string = string.replaceAll("- 03:00, 03:00", "");
etc
A RegExp doesn't understand the meaning of the text, just the structure, so you are usually better off parsing the text and handling it with code that understands what is going on.
In this particular case, your textual structure is actually so simple that a RegExp can handle it, because you are looking for - xy:zw, xy:zw for any digits x, y, z and w.
A RegExp matching that is:
var repeatedTimeRE = RegExp(r"- (\d\d:\d\d), \1 ");
You can then call string = string.replaceAll(repeatedTimeRE, ""); to join adjacent time intervals where the second starts at exactly the same time as the first ends.
If your format is not precisely as written, say one o'clock can be written both as "1:00" and "01:00", then textual matching becomes much harder.
If you can have overlapping intervals, then a RegExp also can't catch it, say:
01:00 - 02:00, 01:59 - 02:35.
A semantic merge function could recognize the overlap and merge anyway; textual matching only really works for strictly equal texts.
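A semantic merge of this kind might look like the sketch below, written in Python rather than Dart purely for illustration. It assumes the "hh:mm - hh:mm" list format from the question and that no block crosses midnight; condense, to_minutes and to_clock are hypothetical helper names.

```python
def to_minutes(clock):
    """Convert "hh:mm" (or "h:mm") to minutes since midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(total):
    """Convert minutes since midnight back to zero-padded "hh:mm"."""
    return f"{total // 60:02d}:{total % 60:02d}"


def condense(text):
    """Merge touching or overlapping blocks in a "a - b, c - d" string."""
    blocks = []
    for item in text.split(","):
        start, end = item.split("-")
        blocks.append((to_minutes(start.strip()), to_minutes(end.strip())))
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Starts before (or exactly when) the previous block ends: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return ", ".join(f"{to_clock(s)} - {to_clock(e)}" for s, e in merged)
```

Because the comparison is done on numbers, "1:00" and "01:00" are treated alike and overlapping blocks such as 01:00 - 02:00, 01:59 - 02:35 collapse into one range.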

Matching diverse dates in Openrefine

I am trying to use the value.match command in OpenRefine 2.6 to split the information present in a column into (at least) 2 columns.
The data are, however, quite messed up.
I have sometimes full dates:
May 30, 1949
Sometimes full dates are combined with other dates and attributes:
May 30, 1949, published 1979
May 30, 1949 and 1951, published 1979
May 30, 1949, printed 1980
May 30, 1949, print executed 1988
May 30, 1949, prints executed 1988
published 1940
Sometimes you have timespan:
1905-05 OR 1905-1906
Sometimes only the year
1905
Sometimes year with attributes
August or September 1908
It doesn't seem to follow any specific schema or order.
I would like to extract (at least) a start and end year, in order to have two columns:
-----------------------
|start_date | end_date|
|1905 | 1906 |
-----------------------
without the rest of the attributes.
I can find the last date using
value.match(/.*(\d{4}).*?/)[0]
and the first one with
value.match(/.*^(\d{4}).*?/)[0]
but I have some trouble with the two formulas.
The latter cannot match anything in case of:
May 30, 1949 and 1951, published 1979
while in the case of:
Paris, winter 1911-12
the latter formula cannot match anything and the former formula matches 1911.
Does anyone know how I can resolve the problem?
I would need a solution that takes the first date as start_date and the final date as end_date, or better (I don't know if it is possible) the earliest date as start_date and the latest date as end_date.
Moreover, I would be glad to have some clues about how to extract other information, such as:
if published or printed or executed is present in the text -> copy the date to a new column named “execution”.
It should be something like creating a new column with:
if(value.match("string1|string2|string3" + (\d{4}), "perform the operation", do nothing)
value.match() is a very useful but sometimes tricky function. To extract a pattern from a text, I prefer to use Python/Jython's regular expressions :
import re
pattern = re.compile(r"\d{4}")
return pattern.findall(value)
From there, you can create a string with all the years concatenated:
return ",".join(pattern.findall(value))
Or select only the first:
return pattern.findall(value)[0]
Or the last:
return pattern.findall(value)[-1]
etc.
Same thing for your sub-question:
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
return pattern.findall(value)[0][1]
Or:
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
m = pattern.search(value)
# search() returns None when the cell has no such attribute
return m.group(2) if m else None
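For the "earliest date as start_date and latest date as end_date" variant, one possible sketch in plain Python is shown below. The function name year_range is hypothetical, and the code makes two assumptions: every standalone four-digit number is a year, and abbreviated ranges such as 1911-12 contribute only their first year. Inside an OpenRefine Jython expression the body would need to end with a return of the value wanted for the column.

```python
import re

# Assumption: any standalone four-digit number in the cell is a year.
YEAR = re.compile(r"\b\d{4}\b")


def year_range(value):
    """Return (earliest, latest) year found in the text, or (None, None)."""
    years = [int(y) for y in YEAR.findall(value)]
    if not years:
        return None, None
    return min(years), max(years)
```

Note that a publication year counts like any other year here, so "May 30, 1949, published 1979" yields 1949 and 1979; filtering those out first would need the published|printed|executed pattern above.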
Here is a regex which will extract start_date and end_date in named groups.
If there is only one date, it is treated as the start_date:
((?<start_date>\d{4}).*?)?(?<end_date>\d{4}|(?<=-)\d{2})?$

Python Time Series

I am working on a real estate cash-flow simulation.
What I want in the end is a time series where, for every day, I report whether the property is vacant or leased and whether I collected rent.
In my present code, I create first a profit array with values of "Leased", "Vacant" or "Today you collected rent of $1000", so I used this to create my time series:
rng=pd.date_range('6/1/2016', periods=len(profit), freq='D')
ts=pd.Series(profit, index=rng)
To simplify, I assumed I collected rent every 30 days. Now I want to be more specific and collect it every 5th day of the month (for example) and be flexible on the day the next tenant will move in.
Do you know commands or a good source where I can learn how to iterate from month to month?
Any help would be appreciated
You can build a sequence of dates using date_range and .shift() (freq='M' is the month-end frequency, spelled 'ME' from pandas 2.2 on). The 'D' alias used here stands in for pd.datetools.day, which is no longer available in current pandas:
date_sequence = pd.date_range(start, end, freq='M').shift(num_of_days, freq='D')
and then use this sequence to select dates from the DateTimeIndex using
df.loc[date_sequence, 'column_name'] = value
Alternatively, you can use pd.DateOffset() like so:
from datetime import date
ts = pd.date_range(start=date(2015, 6, 1), end=date(2015, 12, 1), freq='MS')
DatetimeIndex(['2015-06-01', '2015-07-01', '2015-08-01', '2015-09-01',
'2015-10-01', '2015-11-01', '2015-12-01'],
dtype='datetime64[ns]', freq='MS')
Now add 5 days (this lands on the 6th of each month; use days=4 to get the 5th):
ts + pd.DateOffset(days=5)
to get:
DatetimeIndex(['2015-06-06', '2015-07-06', '2015-08-06', '2015-09-06',
'2015-10-06', '2015-11-06', '2015-12-06'],
dtype='datetime64[ns]', freq=None)
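If it helps to see the month-to-month stepping spelled out, here is a small sketch that uses only the standard library instead of pandas offsets. The generator name monthly_dates is made up, and it assumes the chosen day exists in every month (28 or lower).

```python
from datetime import date


def monthly_dates(start, end, day_of_month):
    """Yield the given day of each month that falls within [start, end]."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        candidate = date(year, month, day_of_month)
        if start <= candidate <= end:
            yield candidate
        # Step to the next month, rolling the year over after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


rent_days = list(monthly_dates(date(2016, 6, 1), date(2016, 9, 1), 5))
```

The resulting dates could then be passed through pd.to_datetime and used to mark the rent-collection days in the series; the move-in date of the next tenant can simply be used as the new start.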

How can I limit two dates in an ExtJS DatePicker?

I have a DatePicker in ExtJS4. I only want to allow TWO dates for each month. The 15th and last day (30/31/28/29 depending on month/year)
How can I disable every day in the picker but allow those two dates?
See disabledDates config option for Ext.form.field.Date
From API docs:
disabledDates : String[] An array of "dates" to disable, as strings.
These strings will be used to build a dynamic regular expression so
they are very powerful. Some examples:
// disable these exact dates:
disabledDates: ["03/08/2003", "09/16/2003"]
// disable these days for every year:
disabledDates: ["03/08", "09/16"]
// only match the beginning (useful if you are using short years):
disabledDates: ["^03/08"]
// disable every day in March 2006:
disabledDates: ["03/../2006"]
// disable every day in every March:
disabledDates: ["^03"]
Note that the format of the dates included in the array should exactly
match the format config. In order to support regular expressions, if
you are using a date format that has "." in it, you will have to
escape the dot when restricting dates. For example: ["03\\.08\\.03"].
// Get the last day of the current month. If in Feb 2012, lastDate is 29.
var lastDate = Ext.Date.getDaysInMonth(new Date());
// 15th
var middleDate = 15;
// Construct the regular expression
var disabledArray = [];
var today = Ext.Date.format(new Date(), 'm/d/Y');
var dateReg = /(\d{2}\/)\d{2}(\/\d{4})/;
disabledArray.push(today.replace(dateReg, '$1' + middleDate + '$2'));
disabledArray.push(today.replace(dateReg, '$1' + lastDate + '$2'));
// Something like "^(?!02/15/2012|02/29/2012).*$": disables every date except the two allowed days.
var disabledReg = '^(?!' + disabledArray.join('|') + ').*$';
// Apply the regular expression to the date field
var dateField = new Ext.form.field.Date({
    format: 'm/d/Y',
    disabledDates: [disabledReg]
});
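The negative-lookahead trick itself can be checked in isolation. The sketch below rebuilds the same pattern string in Python for a given month and tests a few dates against it; allowed_only_pattern is a hypothetical helper, and the m/d/Y format is taken from the code above. As in the original, the pattern only whitelists the two days of one month, so dates in every other month would be disabled as well.

```python
import calendar
import re


def allowed_only_pattern(year, month):
    """Build a pattern matching every m/d/Y date except the 15th and last day."""
    last_day = calendar.monthrange(year, month)[1]
    allowed = [f"{month:02d}/{day:02d}/{year}" for day in (15, last_day)]
    # Negative lookahead: fail on the allowed dates, match anything else.
    return "^(?!" + "|".join(allowed) + ").*$"


pattern = re.compile(allowed_only_pattern(2012, 2))
disabled = [d for d in ("02/14/2012", "02/15/2012", "02/29/2012", "03/15/2012")
            if pattern.match(d)]
```

Here disabled ends up holding the dates the picker would grey out, which makes it easy to confirm that only the two intended days stay selectable.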