Athena / Presto data last week - amazon-web-services

I am currently trying to write an Athena query to fetch all data in a table from the last 7 days.
SELECT *
FROM "engagement_metrics"."spikes"
where spike_noticed_moment_utc > date_add('day', -7, now())
When running this query I get the following error:
SYNTAX_ERROR: line 3:32: '>' cannot be applied to varchar, timestamp with time zone
How can I fetch the data from the last week, relative to the current day, in Athena?

It looks like the column spike_noticed_moment_utc is defined as varchar. You can convert it to a timestamp with from_iso8601_timestamp:
SELECT *
FROM "engagement_metrics"."spikes"
where from_iso8601_timestamp(spike_noticed_moment_utc) > date_add('day', -7, now())
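As a self-contained sketch of the same idea, the query below replaces the real table with a small inline data set (the sample rows are made up for illustration). It also wraps the conversion in try(), on the assumption that some strings in the column might not be valid ISO 8601; such rows become NULL and drop out of the filter instead of failing the query.

```sql
-- Hypothetical sample rows standing in for "engagement_metrics"."spikes".
WITH spikes AS (
  SELECT * FROM (VALUES
    (to_iso8601(now() - interval '1' day)),   -- inside the 7-day window
    (to_iso8601(now() - interval '30' day)),  -- outside the window
    ('not-a-timestamp')                       -- malformed value
  ) AS t (spike_noticed_moment_utc)
)
SELECT spike_noticed_moment_utc,
       try(from_iso8601_timestamp(spike_noticed_moment_utc)) AS spike_ts
FROM spikes
-- try() yields NULL for unparseable strings, so those rows are filtered out
WHERE try(from_iso8601_timestamp(spike_noticed_moment_utc)) > date_add('day', -7, now())
```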

Related

How to query the time in unix epoch timestamp in aws athena

I have a simple table that contains node, message, starttime, and endtime columns, where starttime and endtime are Unix timestamps. The query I am running is:
select node, message, (select from_unixtime(starttime)), (select from_unixtime(endtime)) from table1 WHERE try(select from_unixtime(starttime)) > to_iso8601(current_timestamp - interval '24' hour) limit 100
The query does not work and throws a syntax error.
I am trying to fetch the following information from the table:
query the table using start time and end time for past 'n' hours or 'n' days and get the output of starttime and endtime in human readable format
query the table using a specific date and time in human readable format
You don't need the "extra" selects, and you don't need to_iso8601 in the where clause:
WITH dataset AS (
SELECT * FROM (VALUES
(1627409073, 1627409074),
(1627225824, 1627225826)
) AS t (starttime, endtime))
SELECT from_unixtime(starttime), from_unixtime(endtime)
FROM
dataset
WHERE from_unixtime(starttime) > (current_timestamp - interval '24' hour) limit 100
Output:
_col0                    _col1
2021-07-27 18:04:33.000  2021-07-27 18:04:34.000
To search the last week when the column holds epoch seconds, you can use:
WHERE your_date >= to_unixtime(CAST(now() - interval '7' day AS timestamp))
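The question also asks about filtering by a specific date and time written in human-readable form. A minimal sketch, reusing the inline dataset from the answer above: convert the readable bounds to epoch seconds with to_unixtime and compare them against the raw column. The bounds chosen here are arbitrary examples, and the literals are assumed to be interpreted in UTC.

```sql
WITH dataset AS (
  SELECT * FROM (VALUES
    (1627409073, 1627409074),
    (1627225824, 1627225826)
  ) AS t (starttime, endtime)
)
SELECT from_unixtime(starttime) AS start_ts,
       from_unixtime(endtime)   AS end_ts
FROM dataset
-- Compare the raw epoch column against readable bounds converted to epoch seconds
WHERE starttime >= to_unixtime(TIMESTAMP '2021-07-27 00:00:00')
  AND starttime <  to_unixtime(TIMESTAMP '2021-07-28 00:00:00')
```

Comparing the raw column rather than from_unixtime(starttime) keeps the conversion on the constant side of the predicate.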

How to select data from aws athena table which is partitioned like 'year=yyyy/month=MM/date=dd/' for a given date range?

The Athena table is partitioned the same way as the S3 folder path:
parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=17
parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=9
parent=0fc966a0-bba7-4c0b-a648-cff7f0332059/year=2020/month=4/date=16
parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=14
PARTITIONED BY (
`parent` string,
`year` int,
`month` tinyint,
`date` tinyint)
Now how should I form the where condition for a select query to get data for parent = "9ab4fcca-65d8-11ea-bc55-0242ac130003" from 2019-06-01 to 2020-04-31 ?
SELECT *
FROM table
WHERE parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003' AND year >= 2019 AND year <= 2020 AND month >= 04 AND month <= 06 AND date >= 01 AND date <= 31 ;
But this isn't correct. Please help
Partitioning on year, month, and day separately makes it unnecessarily difficult to query tables. If you're starting out, I strongly suggest avoiding this kind of partitioning scheme. If you can't avoid it, you can still make things easier by creating the table partitions differently.
Most guides will tell you to create directory structures like year=2020/month=4/date=1/file1, create a table with three corresponding partition columns, and then run MSCK REPAIR TABLE to load partitions. This works, but it's far from the best way to use Athena. MSCK REPAIR TABLE has atrocious performance, and partitioning like that is far from ideal.
I suggest creating directory structures that are just 2020-03-01/file1. If you can't, you can use any structure you want: 2020/03/01/file1, year=2020/month=4/date=1/file1, or any other layout with one distinct prefix per date. They all work more or less equally well.
I also suggest you create tables with only one partition column: date (or dt or day if you want to avoid quoting), typed as DATE, not string.
Then, instead of running MSCK REPAIR TABLE, add partitions with ALTER TABLE … ADD PARTITION or with the Glue APIs directly. This command lets you specify the location separately from the partition column value:
ALTER TABLE my_table ADD
PARTITION (day = '2020-04-01') LOCATION 's3://some-bucket/path/to/2020-04-01/'
The important thing here is that the partition column value doesn't need to have any relationship with the location. This would work equally well:
ALTER TABLE my_table ADD
PARTITION (day = '2020-04-01') LOCATION 's3://some-bucket/path/to/data-for-first-of-april/'
For your specific case you could have:
PARTITIONED BY (`parent` string, `day` date)
and then do:
ALTER TABLE your_table ADD
PARTITION (parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003', day = '2020-04-17') LOCATION 's3://your-bucket/parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=17'
PARTITION (parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003', day = '2020-04-09') LOCATION 's3://your-bucket/parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=9'
PARTITION (parent = '0fc966a0-bba7-4c0b-a648-cff7f0332059', day = '2020-04-16') LOCATION 's3://your-bucket/parent=0fc966a0-bba7-4c0b-a648-cff7f0332059/year=2020/month=4/date=16'
PARTITION (parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003', day = '2020-04-14') LOCATION 's3://your-bucket/parent=9ab4fcca-65d8-11ea-bc55-0242ac130003/year=2020/month=4/date=14'
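With a single day partition column typed as DATE, the date-range question reduces to a plain comparison. A sketch, assuming the your_table layout above; note that April has 30 days, so the upper bound is written as 2020-04-30 rather than the 2020-04-31 from the question:

```sql
SELECT *
FROM your_table
WHERE parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003'
  -- day is the DATE-typed partition column, so this is an ordinary range filter
  AND day BETWEEN DATE '2019-06-01' AND DATE '2020-04-30'
```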
Here is how you can use the year, month, and day values that come from the partitions to select a date range (April has only 30 days, so the upper bound is 2020-04-30):
SELECT col1, col2
FROM my_table
WHERE CAST(date_parse(concat(CAST(year AS VARCHAR(4)), '-',
CAST(month AS VARCHAR(2)), '-',
CAST(day AS VARCHAR(2))
), '%Y-%m-%d') AS DATE)
BETWEEN DATE '2019-06-01' AND DATE '2020-04-30'
You can add additional filter conditions as needed.
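When the range covers whole months, as in the question (June 2019 through April 2020), the original year/month partition columns can also be compared directly, without building a date string. This is only a sketch for that whole-month case; my_table stands in for the real table name, and a range that starts or ends mid-month would need extra conditions on the day-level column.

```sql
SELECT *
FROM my_table
WHERE parent = '9ab4fcca-65d8-11ea-bc55-0242ac130003'
  AND (
        (year = 2019 AND month >= 6)   -- June to December 2019
     OR (year = 2020 AND month <= 4)   -- January to April 2020
  )
```

The attempt in the question fails because it applies the month bounds independently of the year (month >= 04 AND month <= 06), which excludes most of the intended range.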

How to calculate gap between 2 timestamps (edited for AWS Athena )

I have many IoT devices that send data to Amazon Athena. I created a table to store the data with two columns: LocalTime is the time at which the IoT device captured its status, and ServerTime is the time at which the data arrived at the server (sometimes the IoT device has no network connection).
I would like to count the "gaps" in blocks of hours (say, 1 hour) to see how late the data arrives, for example:
The result that I would like to get is:
To calculate the result, I want to work out how many hours passed between ServerTime and LocalTime.
So the first entry (1.1.2019 12:15 - 1.1.2019 10:25) falls in the 1-2 hours bucket.
Thanks
If your database is MSSQL Server, you can try the script below to get the desired output:
SELECT
CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR) [Hours],
COUNT(*) [Count]
FROM your_table
GROUP BY CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR)
Oracle
If you are using an Oracle database, you can use this statement:
select CONCAT(CONCAT(diff_hours, '-'), diff_hours + 1) as Hours, count(diff_hours) as Count
from (select floor(24 * (to_date(ServerTime, 'YYYY-MM-DD hh24:mi') - to_date(LocalTime, 'YYYY-MM-DD hh24:mi'))) diff_hours from T_TIMETABLE)
group by diff_hours
order by diff_hours;
Note: This will not display the empty intervals.
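Since the question was edited to target Athena, here is a rough Presto-flavoured equivalent of the two answers above. It is a sketch, not taken from the thread: the readings data set and the column names local_ts and server_ts are hypothetical stand-ins (localtime is a keyword in Presto, so a column with that name would likely need double quotes), and only the first sample row comes from the question. It relies on date_diff('hour', ...) returning whole elapsed hours, as far as I know.

```sql
-- Hypothetical sample data; the first row is the example from the question.
WITH readings AS (
  SELECT * FROM (VALUES
    (TIMESTAMP '2019-01-01 10:25:00', TIMESTAMP '2019-01-01 12:15:00'),
    (TIMESTAMP '2019-01-01 10:00:00', TIMESTAMP '2019-01-01 10:20:00')
  ) AS t (local_ts, server_ts)
),
gaps AS (
  -- Hours elapsed between capture on the device and arrival at the server
  SELECT date_diff('hour', local_ts, server_ts) AS gap_hours
  FROM readings
)
SELECT concat(CAST(gap_hours AS varchar), '-', CAST(gap_hours + 1 AS varchar)) AS hours,
       count(*) AS cnt
FROM gaps
GROUP BY gap_hours
ORDER BY gap_hours
```

As with the other answers, buckets with no rows do not appear in the result.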

Query to calculate cost by month using AWS Athena querying

I have a table like below.
item_id  bill_start_date          bill_end_date          usage_amount  user_project
635212   2019-02-01 00:00:00.000  3/1/2019 00:00:00.000  13.345        IBM
I am trying to find usage_amount for each month and each project. The Amazon Athena query engine is based on Presto 0.172. Because of limitations in Athena, it does not recognize queries like select sysdate from dual;.
I tried to convert bill_start_date and bill_end_date from timestamp to date but failed. Even current_date() didn't work in my case. I am able to calculate the total cost by hard-coding the values, but my end goal is to perform the calculation on the columns.
SELECT (FLOOR(SUM(usage_amount)*100)/100) AS total,
user_project
FROM test_table
WHERE bill_start_date
BETWEEN date '2019-02-01'
AND date '2019-03-01'
GROUP BY user_project;
In Presto, current_timestamp is a SQL standard function which does not use parentheses.
To group by month, I'd use date_trunc('month', bill_start_date).
All of these functions are documented in the Presto date and time functions documentation.
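Putting the date_trunc suggestion together with the query from the question gives something like the sketch below. The inline test_table is a stand-in so the query is self-contained: the first row matches the sample in the question, the other rows are invented for illustration, and bill_start_date is assumed to be a timestamp column.

```sql
-- Stand-in for test_table; only the first row comes from the question.
WITH test_table AS (
  SELECT * FROM (VALUES
    (TIMESTAMP '2019-02-01 00:00:00', 13.345, 'IBM'),
    (TIMESTAMP '2019-02-15 00:00:00', 1.005,  'IBM'),
    (TIMESTAMP '2019-03-01 00:00:00', 2.5,    'IBM')
  ) AS t (bill_start_date, usage_amount, user_project)
)
SELECT date_trunc('month', bill_start_date) AS bill_month,
       user_project,
       floor(sum(usage_amount) * 100) / 100 AS total
FROM test_table
-- One output row per calendar month and project
GROUP BY date_trunc('month', bill_start_date), user_project
ORDER BY bill_month, user_project
```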

copy timestamp from AWS iot rule to Amazon redshift table column

My current IoT design is IoT > rule > Kinesis Firehose > Redshift.
I have an IoT rule like this:
SELECT *, timestamp() AS timestamp FROM 'topic/#'
I get a JSON message like the one below:
{
"deviceID": "device6",
"timestamp": 1480926222159
}
In my Redshift table I have a column eventtime of type Timestamp.
Now I want to store the JSON timestamp value in the eventtime column, but it gives me an error because it expects
TIMEFORMAT AS 'MM.DD.YYYY HH:MI:SS'
for the timestamp. So how do I convert the IoT rule's timestamp to a Redshift timestamp?
There is no direct way to convert an epoch value while inserting it into a Redshift column of the Timestamp data type.
I created a column with the Bigint data type and insert the epoch value directly into this column.
After that, I use QuickSight for analytics, so I can edit my dataset, create a new calculated field for this column, and use the QuickSight function below:
epochDate(epoch_date)
which converts the epoch value to a timestamp field.
In SQL, a similar conversion looks like this:
SELECT
(TIMESTAMP 'epoch' + myunixtimeclm * INTERVAL '1 second')
AS mytimestamp
FROM
example_table
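One detail worth noting: the sample message in the question carries the timestamp in milliseconds (1480926222159), while the expression above multiplies by one-second intervals. A sketch that divides by 1000 first is shown below; event_epoch_ms is a hypothetical column name for the Bigint column described in this answer.

```sql
-- Hypothetical column: example_table.event_epoch_ms BIGINT (epoch milliseconds)
SELECT
    event_epoch_ms,
    -- Integer division drops the millisecond part before scaling by one second
    TIMESTAMP 'epoch' + (event_epoch_ms / 1000) * INTERVAL '1 second' AS eventtime
FROM example_table;
```

Redshift's COPY command also documents a TIMEFORMAT 'epochmillisecs' option. Whether it fits the Firehose delivery setup described here is not something this thread confirms, so treat it as something to check rather than a verified fix.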