AWS Athena BIGINT with ddmmyyyyhhmmss to date time

I have a BIGINT value, 10062019192751, which I'm told is a datetime formatted as ddmmyyyyhhmmss (10-06-2019 19:27:51).
How can I convert or parse it to a datetime in AWS Athena?
Using from_unixtime gives me a different (wrong) value.

Amazon Athena is based on Presto, so you can use the functions documented in Date and Time Functions and Operators in the Presto documentation.
The date_parse() function converts a string into a timestamp according to a format string you supply (consult the above link for the format syntax).
Here is a solution, which first converts the number into a string (varchar) and then converts it into a timestamp:
select date_parse(cast(10062019192751 as varchar),'%d%c%Y%k%i%s')
The output is:
2019-06-10 19:27:51.000
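One caveat (an addition, not from the original answer): if the day of the month is below 10, the leading zero is lost when the value is stored as a BIGINT (01062019192751 becomes 1062019192751), and the format string then mis-parses it. A minimal defensive sketch, padding the string back to 14 digits with lpad before parsing:
select date_parse(lpad(cast(1062019192751 as varchar), 14, '0'), '%d%c%Y%k%i%s')
This returns 2019-06-01 19:27:51.000, which the unpadded string would not parse correctly.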

Related

AWS Firehose dynamic partitioning and date parsing

I'm trying to do dynamic data partitioning by date with a Kinesis Data Firehose delivery stream. The payload I'm expecting is JSON, with this general format:
{
"clientId": "ASGr496mndGs80oCC97mf",
"createdAt": "2022-09-21T14:44:53.708Z",
...
}
I don't control the format of this date I'm working with.
I have my delivery firehose set to have "Dynamic Partitioning" and "Inline JSON Parsing" enabled (because both are apparently required per the AWS console UI).
I've got these set as "Dynamic Partitioning Keys":
year
.createdAt | strptime("%Y-%m-%dT%H:%M:%S.%fZ") | strftime("%Y")
month
.createdAt | strptime("%Y-%m-%dT%H:%M:%S.%fZ") | strftime("%m")
day
.createdAt | strptime("%Y-%m-%dT%H:%M:%S.%fZ") | strftime("%d")
hour
.createdAt | strptime("%Y-%m-%dT%H:%M:%S.%fZ") | strftime("%h")
But that gives me errors like: date "2022-09-21T18:30:04.431Z" does not match format "%Y-%m-%dT%H:%M:%S.%fZ".
It looks like strptime expects the decimal seconds to be padded out to six places, but I have three. These appear to be jq expressions, but I have exactly zero experience with jq, and the AWS documentation for this feature leaves an awful lot to be desired.
Is there a way to get strptime to successfully parse this format, or to just ignore the minute, second, and millisecond part of the time (I only care about hours)?
Is there another way to achieve what I'm trying to do here?
You can try the following:
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%Y")
This trims the milliseconds while retaining the rest of the information in the datetime.
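Putting this together with the four partition keys from the question, the full set of inline-parsing expressions would look like the sketch below (note %H rather than the question's %h for the hour key; in strftime, %h is an abbreviated month name):
year
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%Y")
month
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%m")
day
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%d")
hour
.createdAt | strptime("%Y-%m-%dT%H:%M:%S%Z") | strftime("%H")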

Cannot parse UTC date in Athena

I have a date string of the form 2019-02-18 09:17:31.260000+00:00 and I am trying to convert it into a date in Athena.
I have tried converting it into a timestamp, as suggested in other SO answers, but failed.
There is a discussion in https://github.com/prestodb/presto/issues/10567 but no answer to this particular date format.
I tried several formats like 'YYYY-MM-dd HH:mm:ss.SSSSSSZ', but they don't work, and I get errors like INVALID_FUNCTION_ARGUMENT: Invalid format: ..is malformed at "+00:00".
Been stuck for a while, any help is appreciated!
Athena is based on a very old version of Presto, so there is no straightforward way of doing this without some string manipulation trickery. For instance, you can use regexp_replace to extract the part of the string that's compatible with the built-in timestamp with time zone type and do:
SELECT cast(regexp_replace('2019-02-18 09:17:31.260000+00:00','(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\d{3}(.*)', '$1$2') AS timestamp with time zone)
Recent versions of Trino (formerly known as PrestoSQL) introduced support for variable-precision temporal types with up to picosecond precision (12 decimal digits).
With that feature, you can just do:
trino> select cast('2019-02-18 09:17:31.260000+00:00' as timestamp(6) with time zone);
_col0
--------------------------------
2019-02-18 09:17:31.260000 UTC
(1 row)
A shorter alternative to Martin Traverso's answer is to substring away the extra characters:
select cast(substr('2019-02-18 09:17:31.260000+00:00',1,23) as timestamp);
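Note that both tricks above simply discard the UTC offset, which is only safe if it is always +00:00, as it is here. And since the question asks for a date rather than a timestamp, a minimal variation on the same idea (an illustrative sketch, not from the original answers) keeps only the first 10 characters and casts those:
select cast(substr('2019-02-18 09:17:31.260000+00:00', 1, 10) as date)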

S3 select - How can I query by non-standard timestamp comparison

I'm using an S3 bucket where the data is organized into files by ID and year/month, meaning one file per ID per month.
In each (csv.gz) file each record has a timestamp in the format: YYYY-MM-dd HH:mm:ss (note the missing T).
Now, when querying the data, I want to support datetime granularity down to seconds, so naturally I'd like to filter the data in S3 itself before handling it in Python.
However, I can't find any method to do this.
The function TO_TIMESTAMP doesn't seem to support a user-provided format (it expects a T date/time separator), and combining SUBSTRING and CAST (CAST(SUBSTRING(my_timestamp_column, 1, 10) AS TIMESTAMP)) yields a "The query cannot be evaluated" error.
Is there any way around this?
The documentation states that the function TO_TIMESTAMP is "the inverse operation of TO_STRING", which is not quite true, as the latter supports a time_format_pattern.
I think I had to solve the same or a similar problem. As you said, the documentation (https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-glacier-select-sql-reference-date.html#s3-glacier-select-sql-reference-to-timestamp) states that the function TO_TIMESTAMP is the inverse operation of TO_STRING. But the documentation was misleading to me, because it does not make clear that TO_TIMESTAMP does support a time_format_pattern as a second argument: it is documented as taking only one argument, but it can in fact take two.
I was able to convert a non-standard timestamp 20190101T050000.000Z from type string to timestamp like so:
aws s3api select-object-content --bucket foo_bucket --key foo.json.gz --expression "SELECT * FROM s3object s WHERE TO_TIMESTAMP(s.\"timestamp\", 'yMMdd''T''Hmmss.SSS''Z''') < TO_TIMESTAMP('20190101T050000.000Z', 'yMMdd''T''Hmmss.SSS''Z''')" --expression-type 'SQL' --input-serialization '{ "CompressionType": "GZIP","JSON": {"Type": "DOCUMENT"}}' --output-serialization '{"JSON": {"RecordDelimiter": "\n"}}' /dev/shm/foo.json
Hope that helps somebody out.
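Applied to the format in the question (YYYY-MM-dd HH:mm:ss, with a space instead of T), the same two-argument call would look something like the sketch below; the column name my_timestamp_column and the cutoff value are placeholders, and the exact pattern symbols should be checked against the S3 Select documentation:
SELECT * FROM s3object s WHERE TO_TIMESTAMP(s."my_timestamp_column", 'y-MM-dd HH:mm:ss') >= TO_TIMESTAMP('2021-06-01 00:00:00', 'y-MM-dd HH:mm:ss')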
Having the same issue over here, I went a step further and changed my CSV file so that the date field has the format required by the TIMESTAMP data type in S3 Select. The required format is described here: S3 data types.
So first, to answer the question: based on the S3 Select documentation, I think it is not possible to work with a date without the T separator. Once you correct that, you will be able to work with the CAST function. Here is what I do:
select * from s3object as s where CAST('2020-01-01T' AS TIMESTAMP) < CAST('2021-01-01T' AS TIMESTAMP)
That works just fine; however, as you can see, I'm not passing s."Date" (which is the field header in my CSV file) due to the following error:
Attempt to convert from one data type to another failed at line 1, column 39: cast from STRING to TIMESTAMP.
I hope this helps a little bit, and I hope someone can help with this error.

Extract Date from epoch in NiFi

I have a CSV file with an attribute containing epoch values like '1517334599.906'.
I want to convert/update the epoch values into timestamps of the form 'yyyy-MM-dd HH:mm:ss.SSS' via NiFi.
The conversion is so that Kibana recognizes the field as a timestamp. Is there a way to do this? If there is, can anyone help me with the configuration?
Using NiFi's record capabilities, you can use UpdateRecord with a CSVReader and a CSVRecordSetWriter.
See the "format" function in expression language for converting an epoch to a date string:
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#format
In UpdateRecord you would do something like:
/eventDate = ${field.value:format("yyyy-MM-dd HH:mm:ss.SSS")}
This says take the value of /eventDate (change this to your field name) and set the value of that field to the result of the format function on the right.
The only thing I am not sure about is whether an epoch can have a decimal portion as shown in your example. I would expect it to be converted to a long which would be a whole number.
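One way to handle the decimal portion, assuming the values always carry exactly three decimal places (seconds plus milliseconds, as in 1517334599.906): strip the dot so the value becomes milliseconds since the epoch, then format it. A sketch, not taken from the original answer:
/eventDate = ${field.value:replace('.', ''):toNumber():format("yyyy-MM-dd HH:mm:ss.SSS")}
With '1517334599.906', removing the dot yields 1517334599906, which format then interprets as epoch milliseconds.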

convert wxString to time_t

I have a wxString whose value is a date. The date format depends on the regional/locale settings.
For example: wxString dateStr = "9/10/2013" (dd/mm/yyyy format, with Italy as the regional locale setting).
I parse the date string using wxDateTime::ParseDate(dateStr) and then convert it to time_t using wxDateTime::GetTicks(). But this swaps the day and month whenever the day is less than or equal to 12, for example 3/10/2013 or 12/11/2013: I get months 3 and 12, and days 10 and 11, respectively. It works fine if the day is greater than 12, e.g. 14/10/2013 or 28/10/2013.
I want to convert the above date string into time_t according to the locale setting. I am using both Windows and Linux as development environments.
Please help me out with an example or code snippet.
I suggest you use wxDateTime::ParseFormat instead; then you can specify the exact format of the date string.
The reason you have problems with ParseDate is that it first tries to parse the date string in the American format (mm/dd/yyyy), and only if that fails does it try other formats.
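A minimal sketch of that approach (assuming wxWidgets 2.9+ for the end-iterator overload of ParseFormat and for wxLocale::GetInfo with wxLOCALE_SHORT_DATE_FMT, which asks the current locale for its short date format instead of hard-coding one):
#include <wx/datetime.h>
#include <wx/intl.h>

// Parse a locale-formatted date string into time_t; returns (time_t)-1 on failure.
time_t DateStringToTimeT(const wxString& dateStr)
{
    // The locale's short date format, e.g. "%d/%m/%Y" for Italian settings.
    const wxString fmt = wxLocale::GetInfo(wxLOCALE_SHORT_DATE_FMT);

    wxDateTime dt;
    wxString::const_iterator end;
    // ParseFormat applies exactly the format we pass, so day and month
    // cannot be swapped the way ParseDate() can swap them.
    if ( !dt.ParseFormat(dateStr, fmt, &end) )
        return static_cast<time_t>(-1);

    return dt.GetTicks();
}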