Timestamp field into s3 table from parquet file - amazon-web-services

I am trying to insert a timestamp-with-time-zone value into an S3 table from a Parquet file, but I cannot cast the milliseconds and the time zone.
Source data: 2022-03-12 13:21:38.688000 +00:00
Cast syntax used: cast(to_timestamp(timestamp_col,'yyyy-MM-dd HH:mm:ss') as timestamp) EXECUTION_TS /
cast(from_unixtime(unix_timestamp(timestamp_col,'yyyy-MM-dd HH:mm:ss')) as timestamp) EXECUTION_TS
Output I got: 2022-03-12T13:21:38.000+0000
I used .ms TZ/.SSSSS TZD to get the milliseconds, but the result comes back null. Can someone tell me how to retrieve the milliseconds and time zone information?
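If the load is going through Spark SQL (an assumption on my part - the to_timestamp/from_unixtime/unix_timestamp syntax above is Spark/Hive-style), Spark 3's datetime patterns can keep both the fraction and the offset: SSSSSS matches the microseconds and XXX matches a +00:00-style offset. A minimal sketch:

```sql
-- Sketch, assuming Spark SQL 3.x and a string column timestamp_col
-- holding values like '2022-03-12 13:21:38.688000 +00:00'.
-- SSSSSS parses the fractional seconds, XXX the +00:00 offset.
SELECT to_timestamp(timestamp_col, 'yyyy-MM-dd HH:mm:ss.SSSSSS XXX') AS EXECUTION_TS
FROM source_table;  -- source_table is a placeholder name
```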

Related

Prestosql/Amazon Athena: Time Zone Change

I need to change a UTC timestamp to 'US/Eastern' timestamp without changing the date and time - essentially update only the timezone information and later convert that to a different timezone.
For example (what I need):
'2021-06-09 19:00:36.000000' UTC --> '2021-06-09 19:00:36.000000' US/Eastern
Then I need to convert that to 'America/Los_Angeles':
'2021-06-09 19:00:36.000000' US/Eastern --> '2021-06-09 16:00:36.000000' America/Los_Angeles
When I try the query below, it doesn't give me the correct results: it converts from UTC to America/Los_Angeles, when it should convert from US/Eastern to America/Los_Angeles.
SELECT id
, date_utc
, date_utc AT TIME ZONE 'America/Los_Angeles' AS date_la
FROM call_records
I'm not sure if this will work for Athena, as it's based on a very old version of Presto/Trino.
In recent versions of Trino (formerly known as PrestoSQL), you can do this:
Cast the timestamp with time zone to timestamp to remove the timezone part.
Then, use with_timezone to reinterpret the resulting timestamp in US/Eastern.
Finally, use AT TIME ZONE to change the time zone of the resulting timestamp with time zone while preserving the instant.
Take a look at the example below:
trino:tiny> WITH t(ts) AS (VALUES TIMESTAMP '2021-06-09 19:00:36.000000 UTC')
-> SELECT with_timezone(cast(ts as timestamp(6)), 'US/Eastern') AT TIME ZONE 'America/Los_Angeles'
-> FROM t;
_col0
------------------------------------------------
2021-06-09 16:00:36.000000 America/Los_Angeles
(1 row)

Redshift to_timestamp with timezone offset

I have difficulties in converting this timestamp string 2020-09-08T15:30:00+00:00 to a correct UTC time:
If I do this:
select to_timestamp('2020-09-08T15:30:00+00:00', 'YYYY-MM-DD"T"HH24:MI:SS');
I get 2020-09-08 15:30:00.000000 -04:00, which is in the wrong time zone.
How can I parse the +00:00 part of the string? Based on the AWS documentation I tried TZ/OF, as in:
select to_timestamp('2020-09-08T15:30:00+00:00', 'YYYY-MM-DD"T"HH24:MI:SS+TZ');
but they are not allowed:
[0A000][500310] [Amazon](500310) Invalid operation: "TZ"/"tz" not supported;
Not sure why you are trying to convert using to_timestamp, because in your example it's already UTC/GMT.
I have used the approach below in the past; I hope it works for you as well.
-- below converts a value saved in IST (+05:30) to GMT
SELECT CONVERT_TIMEZONE('GMT','2020-09-08 15:30:00+05:30');
Similarly, it can be converted to the America/New_York time zone:
SELECT CONVERT_TIMEZONE('America/New_York','2020-09-08 15:30:00+05:30');
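For the exact string in the question, with its T separator and +00:00 offset, another route (a sketch - worth verifying on your cluster) is to cast straight to TIMESTAMPTZ, since Redshift's parser understands ISO 8601 offsets:

```sql
-- Assumption: Redshift accepts the ISO 8601 'T' separator and the
-- +00:00 offset when casting a literal to TIMESTAMPTZ.
SELECT '2020-09-08T15:30:00+00:00'::timestamptz AS ts_utc;
```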

How to convert a varchar datetime to timestamp in Athena (Presto)?

I'm having a problem converting this varchar into an AWS Athena datetime:
"2012-06-10T11:33:25.202615+00:00"
I've tried something like date_parse(pickup, '%Y-%m-%dT%T')
I want to make a view like this using the timestamp already converted
CREATE OR REPLACE VIEW vw_ton AS
(
SELECT
id,
date_parse(pickup, timestamp) as pickup,
date_parse(dropoff, timestamp) as dropoff
FROM "table"."ton"
)
You can use parse_datetime() function:
presto> SELECT parse_datetime('2012-06-10T11:33:25.202615+00:00', 'YYYY-MM-dd''T''HH:mm:ss.SSSSSSZ');
_col0
-----------------------------
2012-06-10 11:33:25.202 UTC
(1 row)
(Verified on Presto 339. Note that the pattern must use MM for the month; mm means minutes, and using it in the date part makes the month come back wrong.)
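Since the input is already ISO 8601, Athena/Presto's from_iso8601_timestamp is another option; it parses the offset without needing a format string. A sketch of the view under that assumption:

```sql
-- Assumption: pickup and dropoff are varchar columns holding ISO 8601
-- strings like '2012-06-10T11:33:25.202615+00:00'.
CREATE OR REPLACE VIEW vw_ton AS
SELECT
  id,
  from_iso8601_timestamp(pickup) AS pickup,
  from_iso8601_timestamp(dropoff) AS dropoff
FROM "table"."ton";
```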

AWS Athena query error when trying to filter by date

I am trying to use Athena to query some data I have stored in an s3 bucket in parquet format. I have field called datetime which is defined as a date data type in my AWS Glue Data Catalog.
When I try running the following query in Athena, I get the error below:
SELECT DISTINCT datetime
FROM "craigslist"."pq_craigslist_rental_data_parquet"
WHERE datetime > '2018-09-14'
ORDER BY datetime DESC;
And the error:
Your query has the following error(s):
SYNTAX_ERROR: line 3:16: '>' cannot be applied to date, varchar(10)
What am I doing wrong here? How can I properly filter this data by date?
The string literal that you provide has to be cast to a date in order to compare to a date.
where datetime = date('2019-11-27')
Athena is having an issue with the string literal used for the date filter. Use WHERE datetime > date '2018-09-14'.
from_iso8601_date or date should work.
SELECT DISTINCT datetime
FROM "craigslist"."pq_craigslist_rental_data_parquet"
WHERE datetime > from_iso8601_date('2018-09-14')
ORDER BY datetime DESC;
both return a proper date object.
SELECT typeof(from_iso8601_date('2018-09-14'))
Bit late here, but I had the same issue and the only workaround I have found is:
WHERE datetime > (select date '2018-09-14')

DB2 The syntax of the string representation of a datetime value is incorrect

We have a staging table that's used to load raw data from our suppliers.
One column is used to capture a timestamp, but its data type is varchar(265). The data is dirty: about 40% of the time it contains garbage; otherwise it holds timestamp data like this:
2011/11/15 20:58:48.041
I have to create a report that filters on the dates/timestamps in that column, but when I try to cast it, I get an error:
db2 => select cast(loadedon as timestamp) from automation
1
--------------------------
SQL0180N The syntax of the string representation of a datetime value is incorrect. SQLSTATE=22007
What do I need to do in order to parse/cast the timestamp string?
The string format for a DB2 timestamp is either:
'2002-10-20-12.00.00.000000'
or
'2002-10-20 12:00:00'
You have to get your date string into either of these formats.
Also, DB2 runs on a 24-hour clock, even though the output sometimes uses a 12-hour clock (AM/PM):
So '2002-10-20 14:49:50' for 2:49:50 PM
Or '2002-10-20 00:00:00' for midnight (output would be 12:00:00 AM).
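As a sketch of the reformat-then-cast idea (assuming your DB2 platform accepts the 'yyyy-mm-dd hh:mm:ss.fff' shape once the slashes are replaced - check this on your version), TRANSLATE can swap the separators before the cast:

```sql
-- Sketch: turn '2011/11/15 20:58:48.041' into '2011-11-15 20:58:48.041'
-- by swapping '/' for '-', then cast. Assumes the remaining separators
-- already match a string form DB2 accepts, and the data is clean.
SELECT CAST(TRANSLATE(loadedon, '-', '/') AS TIMESTAMP)
FROM automation;
```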
It seems you have a lot of garbage data, so first of all you should check whether the data is a valid timestamp in the format you expect ('2011/11/15 20:58:48.041'). We can use a simple trick - replace all digits with '0' and check the resulting shape:
TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
If the format is the expected one, you can convert to a DB2 timestamp. In DB2 for iSeries there is a built-in function, TIMESTAMP_FORMAT, available since V6R1. In your case it will look like this:
TIMESTAMP_FORMAT('2011/11/15 20:58:48.041','YYYY/MM/DD HH24:MI:SS.NNNNNN')
So the combined solution query should look something like this:
SELECT
CASE
WHEN TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
THEN TIMESTAMP_FORMAT(timestamp_column,'YYYY/MM/DD HH24:MI:SS.NNNNNN')
ELSE NULL
END
FROM
your_table_with_bad_data
EDIT
I just saw your comment that provider agreed to clean the data. You could use the solution provided to speed up the process and clean the data by yourself:
ALTER TABLE your_table_with_bad_data ADD COLUMN clean_timestamp TIMESTAMP DEFAULT NULL;
UPDATE your_table_with_bad_data
SET clean_timestamp =
CASE
WHEN TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
THEN TIMESTAMP_FORMAT(timestamp_column,'YYYY/MM/DD HH24:MI:SS.NNNNNN')
ELSE NULL
END;