How to convert a varchar datetime to timestamp in Athena (Presto)?

I'm having a problem converting this varchar into an AWS Athena datetime
"2012-06-10T11:33:25.202615+00:00"
I've tried something like date_parse(pickup, %Y-%m-%dT%T)
I want to create a view like this that uses the converted timestamp:
CREATE OR REPLACE VIEW vw_ton AS
(
SELECT
id,
date_parse(pickup, timestamp) as pickup,
date_parse(dropoff, timestamp) as dropoff,
FROM "table"."ton"
)

You can use the parse_datetime() function. It takes Joda-Time patterns, where MM is the month and mm is the minute:
presto> SELECT parse_datetime('2012-06-10T11:33:25.202615+00:00', 'yyyy-MM-dd''T''HH:mm:ss.SSSSSSZ');
_col0
-----------------------------
2012-06-10 11:33:25.202 UTC
(1 row)
(Verified on Presto 339)
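As a cross-check outside Athena, the same string can be parsed in Python to confirm the value the query should produce (June 10, not January 10). This is only a sketch, assuming Python 3.7 or later, where %z accepts an offset written with a colon:

```python
from datetime import datetime, timezone

raw = "2012-06-10T11:33:25.202615+00:00"

# %f reads the six fractional digits; %z reads the +00:00 offset (Python 3.7+).
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Show millisecond precision, comparable to what the Presto CLI prints.
print(parsed.isoformat(sep=" ", timespec="milliseconds"))
# prints: 2012-06-10 11:33:25.202+00:00
```

If the month comes back as 01 in Athena, the usual cause is a lowercase mm in the month position of the pattern.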

Related

Converting BigQuery string to datetime

I have a BigQuery table that has a timestamp column stored as a string in the format 20090630 16:36:23:880. How can I convert it to a proper timestamp?
parse_datetime('%Y%m%d %H:%M:%E3S', '20090630 16:36:23.880')
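Note that the sample in the question has a colon before the milliseconds (23:880), while the call above parses a value with a dot (23.880). If the stored data really uses the colon, it has to be normalized or parsed with a pattern that includes it. A small Python sketch of both options, useful for checking a few sample values before writing the BigQuery expression (the variable names are illustrative only):

```python
from datetime import datetime

raw = "20090630 16:36:23:880"  # format exactly as given in the question

# Option 1: swap the last colon for a dot so a "seconds.millis" pattern applies.
head, _, millis = raw.rpartition(":")
normalized = f"{head}.{millis}"
print(normalized)  # prints: 20090630 16:36:23.880

# Option 2: parse the colon-separated form directly.
# %f accepts one to six digits, so "880" becomes 880000 microseconds.
parsed = datetime.strptime(raw, "%Y%m%d %H:%M:%S:%f")
print(parsed.isoformat(sep=" ", timespec="milliseconds"))
# prints: 2009-06-30 16:36:23.880
```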

Parse timestamp in Hive during table creation

I have a file that looks like this:
33.49.147.163 20140416123526 https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en 29 409 Firefox/5.0
I want to load it into a hive table. I do it this way:
create external table Logs (
ip string,
ts timestamp,
request string,
page_size smallint,
status_code smallint,
info string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties (
"timestamp.formats" = "yyyyMMddHHmmss",
"input.regex" = '^(\\S*)\\t{3}(\\d{14})\\t(\\S*)\\t(\\S*)\\t(\\S*)\\t(\\S*).*$'
)
stored as textfile
location '/data/user_logs/user_logs_M';
And
select * from Logs limit 10;
results in
33.49.147.16 NULL https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en 29 409 Firefox/5.0
How can I parse the timestamps correctly to avoid these NULLs?
The "timestamp.formats" SerDe property works only with LazySimpleSerDe (STORED AS TEXTFILE); it does not work with RegexSerDe. If you are using RegexSerDe, parse the timestamp in a query instead.
Define the ts column as STRING in CREATE TABLE and transform it in the query like this:
select timestamp(regexp_replace(ts,'(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})','$1-$2-$3 $4:$5:$6.0')) as ts
You could also capture each part of the timestamp as a separate column in the SerDe regex and concatenate the parts with delimiters in the query, but that gains nothing: you still need a transformation in the query.
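To see what the regexp_replace call does to a value, the same substitution can be reproduced in Python. This is a sketch for checking the pattern against sample data, not part of the Hive query; the function name is made up for illustration:

```python
import re
from datetime import datetime

# Same six capture groups as the Hive expression: yyyy MM dd HH mm ss.
COMPACT_TS = re.compile(r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})")


def to_timestamp_string(compact: str) -> str:
    """Rewrite yyyyMMddHHmmss as 'yyyy-MM-dd HH:mm:ss.0'."""
    return COMPACT_TS.sub(r"\1-\2-\3 \4:\5:\6.0", compact)


formatted = to_timestamp_string("20140416123526")
print(formatted)  # prints: 2014-04-16 12:35:26.0

# The rewritten string is in a layout that standard timestamp parsers accept.
parsed = datetime.strptime(formatted, "%Y-%m-%d %H:%M:%S.%f")
```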

AWS Athena query error when trying to filter by date

I am trying to use Athena to query some data I have stored in an s3 bucket in parquet format. I have field called datetime which is defined as a date data type in my AWS Glue Data Catalog.
When I try running the following query in Athena, I get the error below:
SELECT DISTINCT datetime
FROM "craigslist"."pq_craigslist_rental_data_parquet"
WHERE datetime > '2018-09-14'
ORDER BY datetime DESC;
And the error:
Your query has the following error(s):
SYNTAX_ERROR: line 3:16: '>' cannot be applied to date, varchar(10)
What am I doing wrong here? How can I properly filter this data by date?
The string literal that you provide has to be cast to a date before it can be compared to a date:
where datetime = date('2019-11-27')
The problem is the string literal used in the date filter. Use WHERE datetime > date '2018-09-14' instead.
Either from_iso8601_date or date should work:
SELECT DISTINCT datetime
FROM "craigslist"."pq_craigslist_rental_data_parquet"
WHERE datetime > from_iso8601_date('2018-09-14')
ORDER BY datetime DESC;
Both return a proper date value, which you can confirm with typeof:
SELECT typeof(from_iso8601_date('2018-09-14'))
Bit late here, but I had the same issue and the only workaround I have found is:
WHERE datetime > (select date '2018-09-14')
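The error is a plain type mismatch: a date is being compared with a string. Python behaves in a similar way, which makes for a compact illustration of why the literal has to be converted first. This is only an analogy with made-up sample rows, not Athena code, and it assumes Python 3.7 or later for date.fromisoformat:

```python
from datetime import date

rows = [date(2018, 9, 13), date(2018, 9, 14), date(2018, 9, 15)]

mismatch = None
try:
    # Comparing a date with a str fails, much like date > varchar(10) in Athena.
    [d for d in rows if d > "2018-09-14"]
except TypeError as exc:
    mismatch = str(exc)

# Converting the literal to a date first makes the comparison valid.
cutoff = date.fromisoformat("2018-09-14")
later = [d for d in rows if d > cutoff]
print(later)  # prints: [datetime.date(2018, 9, 15)]
```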

AWS Athena - Cast CloudFront log time field to timestamp

I'm following the example AWS documentation gave for creating a CloudFront log table in Athena.
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
requestip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
useragent STRING,
querystring STRING,
cookie STRING,
resulttype STRING,
requestid STRING,
hostheader STRING,
requestprotocol STRING,
requestbytes BIGINT,
timetaken FLOAT,
xforwardedfor STRING,
sslprotocol STRING,
sslcipher STRING,
responseresulttype STRING,
httpversion STRING,
filestatus STRING,
encryptedfields INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://your_log_bucket/prefix/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )
Creating the table with the time field as a string doesn't allow me to run conditional queries. I tried re-creating the table with the following:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time timestamp,
....
Unfortunately this did not work and I received no results in the time field when I previewed the table.
Does anyone have any experience casting the time to something that I can use to query?
Concatenate the date and time into a timestamp in a subquery:
WITH ds AS
(SELECT *,
parse_datetime(concat(concat(format_datetime(date, 'yyyy-MM-dd'), '-'), time), 'yyyy-MM-dd-HH:mm:ss') AS datetime
FROM default.cloudfront_www
WHERE requestip = '207.30.46.111')
SELECT *
FROM ds
WHERE datetime
BETWEEN timestamp '2018-11-19 06:00:00'
AND timestamp '2018-11-19 12:00:00'
It's frustrating that there isn't a straightforward way to have usable timestamps (dates with times included) in a table based on CloudFront logs.
However, this is now my workaround:
I create a view based on the original table. Say my original table is cloudfront_prod_logs. I create a view, cloudfront_prod_logs_w_datetime that has a proper datetime/timestamp field and I use that in queries, instead of the original table.
CREATE OR REPLACE VIEW cloudfront_prod_logs_w_datetime AS
SELECT
"date_parse"("concat"(CAST(date AS varchar), ' ', CAST(time AS varchar)), '%Y-%m-%d %H:%i:%s') datetime
, *
FROM
cloudfront_prod_logs
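Both answers do the same thing: join the date and time columns into one string and parse it with a single pattern. Note that date_parse uses MySQL-style specifiers, where %i is the minute, while parse_datetime uses Joda-Time patterns, where mm is the minute. A Python sketch of the same idea, with hypothetical sample values standing in for one log row:

```python
from datetime import datetime


def combine(log_date: str, log_time: str) -> datetime:
    """Join separate date and time strings and parse them as one timestamp."""
    return datetime.strptime(f"{log_date} {log_time}", "%Y-%m-%d %H:%M:%S")


# Hypothetical values for the date and time columns of a single row.
ts = combine("2018-11-19", "06:30:00")

# Once the value is a real timestamp, range filters behave as expected.
start = datetime(2018, 11, 19, 6, 0, 0)
end = datetime(2018, 11, 19, 12, 0, 0)
in_window = start <= ts <= end
print(ts, in_window)  # prints: 2018-11-19 06:30:00 True
```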

Inserting date/timestamp values into SQL Server table using Python

I have three variables stored as number, string and string, as shown below.
load_id = 100
t_date = '2014-06-18'
p_date = '19-JUN-14 10.51.45.378196'
I would like to insert them into a SQL Server table using Python 2.7. The SQL Server table structure is as follows
load_id = float
t_date = date
p_date = timestamp
In Oracle, we tend to use TO_DATE or TO_TIMESTAMP to convert the string to DATE or TIMESTAMP field.
I would like to know how I can do similar conversion while inserting into an SQL Server table.
Thanks in advance.
Convert with:
import datetime
import calendar
thedate=datetime.datetime.strptime(p_date,'%d-%b-%y %H.%M.%S.%f')
thetimestamp=calendar.timegm(thedate.utctimetuple())
https://community.toadworld.com/platforms/sql-server/b/weblog/archive/2012/04/18/convert-datetime-to-timestamp
DECLARE @DateTimeVariable DATETIME
SELECT @DateTimeVariable = GETDATE()
SELECT @DateTimeVariable AS DateTimeValue,
CAST(@DateTimeVariable AS TIMESTAMP) AS DateTimeConvertedToTimestampCAST
SELECT CAST(CAST(@DateTimeVariable AS TIMESTAMP) AS DATETIME) AS TimestampToDatetime
Do the conversion with SQL instead of trying to get Python to match the SQL format.
Neither format matches yours; however, the DATETIME type should be adequate.
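Another option is to parse both strings into Python date and datetime objects and pass those as query parameters, so the driver handles the conversion. Keep in mind that SQL Server's TIMESTAMP type is a synonym for rowversion and does not store a date or time, so the target column would need to be DATETIME or DATETIME2. The sketch below covers only the parsing step; it is written for Python 3, assumes an English locale for the month abbreviation, and leaves the driver call as a comment because the table name and driver are not given in the question:

```python
from datetime import date, datetime

load_id = 100
t_date = "2014-06-18"
p_date = "19-JUN-14 10.51.45.378196"

# Plain ISO date string -> date object.
t_date_value = datetime.strptime(t_date, "%Y-%m-%d").date()

# Oracle-style string -> datetime; %b matches the month abbreviation
# case-insensitively and %f reads the six fractional digits.
p_date_value = datetime.strptime(p_date, "%d-%b-%y %H.%M.%S.%f")

params = (float(load_id), t_date_value, p_date_value)
print(params)

# With a DB-API driver, the tuple would be passed to a parameterized INSERT,
# for example (hypothetical table name, placeholder style depends on the driver):
# cursor.execute("INSERT INTO my_table (load_id, t_date, p_date) VALUES (?, ?, ?)", params)
```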