locale specific dateformatting buggy? - coldfusion

We have been running into date formatting issues for over a year now. We have EU formatting in our database (dd/mm/yyyy) and we also want to output that on our website. The problem is that we run the dates through ColdFusion's date formatting functions, to be certain we always output our dates the same way (and for other reasons).
That's where it goes wrong. The code below outputs 2 different dates, where we would expect the same date.
<cfoutput>
#LSDateFormat('01/02/2015', 'dd/mm/yyyy', 'nl_BE')# <br />
#LSDateTimeFormat('01/02/2015', 'dd/mm/yyyy HH:nn', 'nl_BE')#
</cfoutput>
// output
// 01/02/2015
// 02/01/2015 00:00
I have tried on trycf.com, using all different available engines.
Please explain to me what I'm doing wrong here. Or tell me this is a bug that no one has ever mentioned. But I would prefer to be the one who is wrong here.

I think you are misunderstanding the 'format' functions. They are designed for presentation. Their purpose is to convert a date object into a string: ie LSDateTimeFormat (date , mask). The 'mask' is used to determine what that output string looks like, not to parse the input. Notice if you pass in a date object, NOT a string, it works exactly as you expected? The result is 01/02/2015 00:00
dateObject = createDate(2015,2,1);
writeDump("dateObject = "& LSDateTimeFormat(dateObject, 'dd/mm/yyyy HH:nn', 'nl_BE'));
Yes, CF allows you to be lazy, and pass in a string instead. However, CF must still convert that string into a date object before it can apply the mask - and you have no control over how CF does that. When you use strings, you are essentially leaving the interpretation of that string entirely up to CF. In this case, CF interprets the ambiguous string "01/02/2015" according to U.S. date rules ie month first. That produces January 2, 2015. Hence why the output of the mask dd/mm/... is 02/01/2015 00:00. So in essence, what your code is really doing is this:
// parse string according to U.S. rules - mm/dd/yyyy
HowCFInterpretsYourString = parseDateTime(dateString);
LSDateTimeFormat(HowCFInterpretsYourString, 'dd/mm/yyyy HH:nn', 'nl_BE');
Results:
HowCFInterpretsYourString = {ts '2015-01-02 00:00:00'} <=== January 2nd
LSDateTimeFormat = 02/01/2015 00:00 <=== Day = 2, Month = 1
If you do not want CF doing the interpretation for you, pass in date objects - not strings.
As for why LSDateFormat's behavior seems inconsistent with LSDateTimeFormat, I do not know. However, strings are ambiguous. So when you use them instead of date objects, well ... expect the unexpected.
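The same ambiguity can be reproduced outside ColdFusion. Here is a minimal Python sketch (used purely as an illustration, not as a description of what CF does internally) in which one string yields two different dates depending on which convention the parser is told to assume:

```python
from datetime import datetime

text = "01/02/2015"

# Month-first (U.S.) reading: January 2nd, 2015
us = datetime.strptime(text, "%m/%d/%Y")

# Day-first (EU) reading: February 1st, 2015
eu = datetime.strptime(text, "%d/%m/%Y")

# Formatting both with the same day-first mask exposes the difference
print(us.strftime("%d/%m/%Y %H:%M"))  # 02/01/2015 00:00
print(eu.strftime("%d/%m/%Y %H:%M"))  # 01/02/2015 00:00
```

The mask only controls the output. Which date you get was already decided at parsing time.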
We should just convert to the correct date object first and then format using the normal dateFormat method.
Just because you are only using the format functions to output the numeric date parts, does not mean that is all they do ;-) The format functions also output names, which are locale specific. For example, "MMMMM" might produce "September" or "septiembre" depending on the current locale. There are also other region specific rules, such as the placement of "month" and "day" and the exact capitalization of names. The standard Date/TimeFormat functions always use U.S. date conventions. Whereas LSDateTimeFormat uses whatever locale is supplied. In this specific case, there is not much difference because you are only outputting the numeric date parts:
Numeric date parts (only)
dateObject = createDate(2015,2,1);
writeDump("LSDateTimeFormat = "& LSDateTimeFormat(dateObject, 'dd/mm/yyyy', 'nl_BE'));
writeDump("DateTimeFormat = "& DateTimeFormat(dateObject, 'dd/mm/yyyy'));
Results:
LSDateTimeFormat = 01/02/2015
DateTimeFormat = 01/02/2015
However, for other formats there is a big difference. A date object may not be tied to a locale, but a string representation of a date is, so the two functions are not interchangeable.
Date Names:
dateObject = createDate(2015,2,1);
writeDump("LSDateTimeFormat = "& LSDateTimeFormat(dateObject, 'full', 'nl_BE'));
writeDump("DateTimeFormat = "& DateTimeFormat(dateObject, 'full'));
Results:
LSDateTimeFormat = zondag 1 februari 2015 0.00 u. UTC
DateTimeFormat = Sunday, February 1, 2015 12:00:00 AM UTC
"EU formatting in our database (dd/mm/yyyy)"
Not sure what you mean by that. Date/time objects do not have a format. Your IDE may display them as human friendly strings, but the date values themselves are stored as numbers. Based on what you described, it sounds like either the values are stored as strings OR are being converted to strings, which would explain the results. Instead, store the values in a date/time column, then retrieve them and pass them into the function "as is" and it should work fine.

I use English (U.S.) locale conventions, so I can use ParseDateTime(), DateFormat() or DateTimeFormat(). Since your dates are EU, you must use LSParseDateTime() on the string that represents the date, and the results should be consistent:
<cfset D = "01/02/2015">
<cfoutput>
0. #D#<br />
1. #DateFormat(D)#<br />
2. #ParseDateTime(D)# (Parse default US)<br />
3. #LSParseDateTime(D,'en_GB')# (Parse en_GB)<br />
4. #LSDateFormat(D, 'dd/mm/yyyy', 'nl_BE')# (no parsing, default US)<br />
5. #LSDateTimeFormat(D, 'dd/mm/yyyy HH:nn', 'nl_BE')# (no parsing, default US)<br />
6. #LSDateFormat(ParseDateTime(D), 'dd/mm/yyyy', 'nl_BE')# (parsed as default US)<br />
7. #LSDateTimeFormat(ParseDateTime(D), 'dd/mm/yyyy HH:nn', 'nl_BE')# (parsed as default US)<br />
8. #LSDateFormat(LSParseDateTime(D,'en_GB'), 'dd/mm/yyyy', 'nl_BE')# (parsed as en_GB locale)<br />
9. #LSDateTimeFormat(LSParseDateTime(D,'en_GB'), 'dd/mm/yyyy HH:nn', 'nl_BE')# (parsed as en_GB locale)<br />
</cfoutput>
Results for text string "01/02/2015":
0. 01/02/2015
1. 02-Jan-15
2. {ts '2015-01-02 00:00:00'} (Parse default US)
3. {ts '2015-02-01 00:00:00'} (Parse en_GB)
4. 01/02/2015 (no parsing, default US)
5. 02/01/2015 00:00 (no parsing, default US)
6. 02/01/2015 (parsed as default US)
7. 02/01/2015 00:00 (parsed as default US)
8. 01/02/2015 (parsed as en_GB locale)
9. 01/02/2015 00:00 (parsed as en_GB locale)
You could use SQL to re-format the query data to standardize on US locale conventions:
http://www.sql-server-helper.com/tips/date-formats.aspx
SELECT CONVERT(VARCHAR(10), DateField, 101) AS USDate_MMDDYYYY
Either way, I recommend creating a UDF so that you can modify this rule in one single place in case you need to make any future modifications.

Related

In Coldfusion a certain date June 01, 2008 is not getting casted/parsed to datetime object while using “CreateODBCDateTime” method

<cfoutput>
<cfset mydate = 'June 01, 2008'>
<cfset JobStartDate=CreateODBCDateTime(mydate)>
</cfoutput>
Error:
Date value passed to date function createDateTime is unspecified or invalid.
Specify a valid date in createDateTime function.
Even isdate(mydate) // isdate('June 01, 2008') throws the exception.
Even DateDiff // DateDiff('m', 'June 01, 2008', 'October 14, 2010') also throws an exception.
It works okay with other dates for example: 'August 01, 2008', 'May 14, 2012' etc
I am using ColdFusion 2021 Update 3. Any help would be highly appreciated.
Adding a few more details in response to the comments:
Under JVM details, the Java Default Locale is en_US. Running GetLocale() also gives me English (US).
The issue doesn't reproduce on trycf or cffiddle, but it can be reproduced if you install ColdFusion via CommandBox and run the code.
Just use lsParseDateTime to fix this. You are declaring the value as a string, so CF won't treat it as a date:
<cfset JobStartDate = CreateODBCDateTime(lsParseDateTime(mydate, "en_US"))>

Pandas regex replacement when there is no match

I'm using pandas.Series.str.replace to extract numbers from strings (it's data that has been scraped from #WPWeather) and have got to the point where I've extracted all the fields into a DataFrame like this...
df.head()
Out[48]:
temp pressure relative_humidity \
created_at
2019-12-13 10:19:13 5.2\xc2\xbaC, 975.4mb, 91.3%.
2019-12-12 10:19:07 2\xc2\xbaC, 990.3mb, 96.9%.
2019-12-11 10:19:07 4.2\xc2\xbaC, 1000.8mb, 85.7%.
2019-12-10 10:19:00 6.3\xc2\xbaC, 1008.5mb, 94.4%.
2019-12-09 10:18:51 5.4\xc2\xbaC, 1006.7mb, 68.5%.
last_24_max_temp last_24_min_temp rain sunshine
created_at
2019-12-13 10:19:13 7\xc2\xbaC, 2\xc2\xbaC, 9.5mm, 0
2019-12-12 10:19:07 6\xc2\xbaC, 1.5\xc2\xbaC, 0.9mm.' NaN
2019-12-11 10:19:07 11.7\xc2\xbaC, 2.2\xc2\xbaC, 14.1mm.' NaN
2019-12-10 10:19:00 6.5\xc2\xbaC, 1.9\xc2\xbaC, 1.1mm.' NaN
2019-12-09 10:18:51 8.5\xc2\xbaC, 5.2\xc2\xbaC, 1.5mm, 1.9
I'm trying to use regex's to extract the numerical values using...
pd.to_numeric(df['temp'].str.replace(r'(^-?\d+(?:\.\d+)?)(.*)', r'\1', regex=True))
...and it works well but I've hit an instance where one of the temperature fields doesn't have a value and is simply \xc2\xbaC,, as a consequence there is nothing matched in the first grouping to use in r'\1' and when it gets to trying to convert to numeric it fails with...
pandas/_libs/lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "\xc2\xbaC," at position 120
How do I replace non-matches with something sane such as blank so that when I then call pd.to_numeric() it will convert to NaN?
One idea is to change the pattern passed to replace so that it strips the unit suffix instead; entries that have no value then end up as missing values:
df['temp'] = pd.to_numeric(df['temp'].str.replace(r'\xc2\xbaC,', '', regex=True))
print (df)
temp pressure relative_humidity
created_at
2019-12-13 10:19:13 5.2 975.4mb, 91.3%.
2019-12-12 10:19:07 2.0 990.3mb, 96.9%.
2019-12-11 10:19:07 4.2 1000.8mb, 85.7%.
2019-12-10 10:19:00 6.3 1008.5mb, 94.4%.
2019-12-09 10:18:51 5.4 1006.7mb, 68.5%.
Alternatively, keep your solution and add the parameter errors='coerce' to to_numeric, so that non-numeric values are converted to missing values:
df['temp'] = (pd.to_numeric(df['temp'].str.replace(r'(^-?\d+(?:\.\d+)?)(.*)',r'\1',regex=True),
errors='coerce'))
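A third option, sketched here with made-up sample values shaped like the ones in the question, is Series.str.extract. Unlike replace, extract returns NaN for rows where the pattern does not match, so nothing unparseable is left over for to_numeric:

```python
import pandas as pd

# Hypothetical sample; the third entry has a unit but no number
raw = pd.Series(["5.2ºC,", "2ºC,", "ºC,", "-1.5ºC,"])

# With a single capture group and expand=False, extract returns a Series;
# rows without a match become NaN instead of keeping the original string
extracted = raw.str.extract(r"^(-?\d+(?:\.\d+)?)", expand=False)

temps = pd.to_numeric(extracted)
print(temps)
```

This keeps the original regex for the number itself and only changes what happens on a non-match.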

display regular date for FormattedRelative

I want to display dates in a friendly format, just like WhatsApp and Telegram do. For example, today's date is shown as "today" and yesterday's date as "yesterday". But I don't want a date three days back to be shown as "3 days ago". It should be a regular date like "Sun, 7 Jul 2019".
I haven't customized the current code, because it still uses the example from the repo. I tried changing the format, but none of that works.
What does your code look like? You'd have to do some logic like
if (daysAgo > -2) {
return <FormattedRelativeTime numeric="auto" unit="day" value={daysAgo} />
}
return <FormattedDate weekday="short" day="numeric" month="short" year="numeric" value={ts} />
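The branching itself does not depend on the UI library. As a language-neutral illustration, here is the same decision written as a small Python sketch; the function name friendly_date is hypothetical, and the English labels and day/month names assume the default C locale:

```python
from datetime import date

def friendly_date(d, today):
    """Return 'today', 'yesterday', or a regular date such as 'Sun, 7 Jul 2019'."""
    days_ago = (today - d).days
    if days_ago == 0:
        return "today"
    if days_ago == 1:
        return "yesterday"
    # Anything older (or in the future) falls back to a regular date;
    # d.day is used instead of %d to avoid a leading zero
    return f"{d:%a}, {d.day} {d:%b %Y}"

print(friendly_date(date(2019, 7, 7), date(2019, 7, 10)))
```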

Matching diverse dates in Openrefine

I am trying to use the value.match command in OpenRefine 2.6 to split the information present in a column into (at least) 2 columns.
The data are, however, quite messy.
I have sometimes full dates:
May 30, 1949
Sometimes full dates are combined with other dates and attributes:
May 30, 1949, published 1979
May 30, 1949 and 1951, published 1979
May 30, 1949, printed 1980
May 30, 1949, print executed 1988
May 30, 1949, prints executed 1988
published 1940
Sometimes you have timespan:
1905-05 OR 1905-1906
Sometimes only the year
1905
Sometimes year with attributes
August or September 1908
It doesn't seem to follow any specific schema or order.
I would like to extract (at least) a start and an end year, in order to have two columns:
-----------------------
|start_date | end_date|
|1905 | 1906 |
-----------------------
without the rest of the attributes.
I can find the last date using
value.match(/.*(\d{4}).*?/)[0]
and the first one with
value.match(/.*^(\d{4}).*?/)[0]
but I got some trouble with the two formulas.
The latter cannot match anything in case of:
May 30, 1949 and 1951, published 1979
while in the case of:
Paris, winter 1911-12
The latter formula cannot match anything and the former formula matches 1911.
Does anyone know how I can resolve the problem?
I would need a solution that takes the first date as start_date and the final date as end_date, or better (I don't know if it is possible) the earliest date as start_date and the latest date as end_date.
Moreover, I would be glad to have some clue about how to extract other information, such as
if published or printed or executed is present in the text -> copy date to a new column name “execution”.
should be something like create a new column
if(value.match("string1|string2|string3" + (\d{4}), "perform the operation", do nothing)
value.match() is a very useful but sometimes tricky function. To extract a pattern from a text, I prefer to use Python/Jython's regular expressions :
import re
pattern = re.compile(r"\d{4}")
return pattern.findall(value)
From there, you can create a string with all the years concatenated:
return ",".join(pattern.findall(value))
Or select only the first:
return pattern.findall(value)[0]
Or the last:
return pattern.findall(value)[-1]
etc.
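For the "earliest as start_date, latest as end_date" variant, something along these lines might work. It is only a sketch: it is written as a standalone function (year_range is a made-up name) so it can be tried outside OpenRefine, it returns None for cells without any year, and it does not expand abbreviated ranges such as "1911-12":

```python
import re

# Four digits standing alone, so "30" in "May 30" is ignored
YEAR = re.compile(r"\b\d{4}\b")

def year_range(value):
    """Return (earliest, latest) four-digit year found in value, or (None, None)."""
    years = [int(y) for y in YEAR.findall(value)]
    if not years:
        # No year at all: avoid the IndexError that findall(...)[0] would raise
        return (None, None)
    return (min(years), max(years))

print(year_range("May 30, 1949 and 1951, published 1979"))  # (1949, 1979)
print(year_range("1905-1906"))                               # (1905, 1906)
```

In an OpenRefine Jython expression the body of the function would be used directly, returning one of the two values per new column.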
Same thing for your sub-question:
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
return pattern.findall(value)[0][1]
Or :
import re
pattern = re.compile(r"(published|printed|executed)\s+(\d+)")
m = re.search(pattern, value)
return m.group(2)
Here is a regex that extracts start_date and end_date into named groups.
If there is only one date, it is treated as the start_date:
((?<start_date>\d{4}).*?)?(?<end_date>\d{4}|(?<=-)\d{2})?$

Check if string is of SortableDateTimePattern format

Is there any way I can easily check if a string conforms to the SortableDateTimePattern ("s"), or do I need to write a regular expression?
I've got a form where users can input a copyright date (as a string), and these are the allowed formats:
Year: YYYY (eg 1997)
Year and month: YYYY-MM (eg 1997-07)
Complete date: YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second: YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
I don't have much experience of writing regular expressions so if there's an easier way of doing it I'd be very grateful!
Not thoroughly tested and hence not foolproof, but the following seems to work:
var regex:RegExp = /(?<=\s|^)\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{2})?)?\+\d{2}:\d{2})?)?)?(?=\s|$)/g;
var test:String = "23 1997 1998-07 1995-07s 1937-04-16 " +
"1970-0716 1993-07-16T19:20+01:01 1979-07-16T19:20+0100 " +
"2997-07-16T19:20:30+01:08 3997-07-16T19:20:30.45+01:00";
var result:Object
while(result = regex.exec(test))
trace(result[0]);
Traced output:
1997
1998-07
1937-04-16
1993-07-16T19:20+01:01
2997-07-16T19:20:30+01:08
3997-07-16T19:20:30.45+01:00
I am using ActionScript here, but the regex should work in most flavors. When implementing it in your language, note that the first and last / are delimiters and the last g stands for global.
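For validating a single input field (rather than scanning a longer text), the same idea can be written with nested optional groups and a full-string match. Below is a rough Python sketch, not a tested validator: the name is_allowed_date is made up for illustration, TZD is assumed to mean either Z or a +hh:mm / -hh:mm offset, and only the shape is checked (a month of 13 would still pass):

```python
import re

ALLOWED_DATE = re.compile(
    r"\d{4}"                                  # YYYY
    r"(?:-\d{2}"                              # -MM
    r"(?:-\d{2}"                              # -DD
    r"(?:T\d{2}:\d{2}"                        # Thh:mm
    r"(?::\d{2}(?:\.\d+)?)?"                  # optional :ss and fraction
    r"(?:Z|[+-]\d{2}:\d{2})"                  # TZD (assumed: Z or +/-hh:mm)
    r")?)?)?"
)

def is_allowed_date(text):
    """True if the whole string has one of the allowed shapes."""
    return ALLOWED_DATE.fullmatch(text) is not None

for sample in ["1997", "1997-07-16T19:20+01:00", "1970-0716", "1995-07s"]:
    print(sample, is_allowed_date(sample))
```

A time part without a time zone designator is rejected here, because every listed format that includes a time also includes TZD.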
I'd split the input field into many (one for year, month, day etc.).
You can use JavaScript to advance from one field to the next once full (i.e. once four characters are in the year box, move focus to month) for smoother entry.
You can then validate each field independently and finally construct the complete date string.