Query to calculate cost by month using AWS Athena

I have a table like below.
item_id  bill_start_date          bill_end_date          usage_amount  user_project
635212   2019-02-01 00:00:00.000  3/1/2019 00:00:00.000  13.345        IBM
I am trying to find usage_amount for each month and each project. The Amazon Athena query engine is based on Presto 0.172, and because of its limitations it does not recognize a query like select sysdate from dual;.
I tried to convert bill_start_date and bill_end_date from timestamp to date but failed; even current_date() didn't work in my case. I am able to calculate the total cost by hard-coding the values, but my end goal is to perform the calculation on the columns.
SELECT (FLOOR(SUM(usage_amount)*100)/100) AS total,
       user_project
FROM test_table
WHERE bill_start_date
      BETWEEN date '2019-02-01'
      AND date '2019-03-01'
GROUP BY user_project;

In Presto, current_timestamp is a SQL standard function that does not take parentheses; the same applies to current_date.
To group by month, I'd use date_trunc('month', bill_start_date).
All of these functions are documented in the Presto date and time functions documentation.
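To make the grouping concrete, here is a minimal Python sketch of what date_trunc('month', bill_start_date) combined with GROUP BY amounts to: each timestamp is reduced to the first day of its month, and usage is summed per (month, project) pair. Only the first row comes from the question; the other rows and the helper name month_start are made up for illustration. The rounding mirrors FLOOR(SUM(usage_amount)*100)/100 from the question.

```python
import math
from collections import defaultdict
from datetime import datetime

# Illustrative rows: (bill_start_date, usage_amount, user_project).
# Only the first row comes from the question; the others are made up.
rows = [
    (datetime(2019, 2, 1), 13.345, "IBM"),
    (datetime(2019, 2, 15), 1.5, "IBM"),
    (datetime(2019, 3, 2), 2.0, "IBM"),
]

def month_start(ts):
    """Mimic date_trunc('month', ts): keep year and month, reset the rest."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

# Sum usage per (month, project), which is what GROUP BY does.
totals = defaultdict(float)
for bill_start, usage, project in rows:
    totals[(month_start(bill_start), project)] += usage

# Floor to two decimals, as FLOOR(SUM(usage_amount) * 100) / 100 does.
result = {key: math.floor(total * 100) / 100 for key, total in totals.items()}

for (month, project), total in sorted(result.items()):
    print(month.date(), project, total)
```

In the SQL itself, the same effect would presumably come from selecting date_trunc('month', bill_start_date) alongside user_project and grouping by both.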

Related

Explode a table with a monthly increment in Amazon Redshift

I have a sample table:
id   start_dt    end_dt
100  06/07/2021  30/09/2021
I would like to get the following output
id   start_dt    end_dt
100  06/07/2021  31/07/2021
100  01/08/2021  30/08/2021
100  01/09/2021  30/09/2021
I have tried using GENERATE_SERIES() in Amazon Redshift, but that does not give the required result.
The existing table is quite large so I could use temp tables then join back to another table at a later stage.
I have trawled through other posts, but the other proposed solutions either don't quite give the desired results or don't work at all on Amazon Redshift. Any help in solving this would be appreciated.
The traditional method would be:
Create a Calendar table that contains one row per month, with start_date and end_date columns
Join your table to the Calendar table, where table.start_dt <= calendar.end_dt AND table.end_dt >= calendar.start_dt
The two columns would be:
GREATEST(table.start_dt, calendar.start_dt)
LEAST(table.end_dt, calendar.end_dt)
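The logic of the calendar join can be sketched in Python to show how the overlap condition and GREATEST/LEAST interact. This is only an illustration of the idea, not Redshift code: month_calendar and explode are hypothetical helper names, and the calendar is generated on the fly instead of being stored in a table.

```python
from datetime import date, timedelta

def month_calendar(first, last):
    """Yield (start_date, end_date) for every month overlapping [first, last]."""
    year, month = first.year, first.month
    while date(year, month, 1) <= last:
        start = date(year, month, 1)
        # First day of the next month, minus one day, is the month end.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        end = date(next_year, next_month, 1) - timedelta(days=1)
        yield start, end
        year, month = next_year, next_month

def explode(row_id, start_dt, end_dt):
    """Split one (id, start_dt, end_dt) row into one row per calendar month."""
    pieces = []
    for cal_start, cal_end in month_calendar(start_dt, end_dt):
        # Join condition: the row's range overlaps the calendar month.
        if start_dt <= cal_end and end_dt >= cal_start:
            # GREATEST / LEAST clip the month to the row's own range.
            pieces.append((row_id, max(start_dt, cal_start), min(end_dt, cal_end)))
    return pieces

for piece in explode(100, date(2021, 7, 6), date(2021, 9, 30)):
    print(piece)
```

With real calendar months the August piece ends on 31/08/2021, since the calendar row for August runs to the last day of the month.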

AWS Athena date sql query

Below is the data in the CSV file in the S3 bucket that I used to build the Athena database.
John   Wright  cricket   25
Steve  Adams   football  30
I am able to run the query and get the data.
Now I am trying to derive the date of birth from the age column. Is it possible to generate a date of birth from the age column, i.e. current date - age (column), and print only the date of birth?
I tried the query below, but I am not sure whether it is the correct way:
select (current_date - interval age day) from table_name;
Please help me with this.
You can use the date_add function, like this:
SELECT date_add('year', -age, current_date) FROM table_name
I.e. subtract age number of 'year'(s) from the current date.
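As a rough illustration of the arithmetic, the following Python sketch moves a date back by a number of years. The helper name subtract_years, the fixed reference date, and the fallback to 28 February for leap days are assumptions of this sketch, not something taken from the Athena documentation. Since an age does not carry the day and month of birth, the result can only approximate the actual date of birth.

```python
from datetime import date

def subtract_years(day, years):
    """Return `day` moved back by `years` years."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year;
        # this sketch falls back to 28 February (an assumption).
        return day.replace(year=day.year - years, day=28)

# Fixed reference date so the example is reproducible; a real query
# would use the current date instead.
reference = date(2019, 6, 15)

# (first name, age) pairs taken from the CSV in the question.
for name, age in [("John", 25), ("Steve", 30)]:
    print(name, subtract_years(reference, age))
```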

Subtracting current date from column to show in oracle apex classic report

I have a very simple question, but since I am not familiar with SQL or PL/SQL, I have no idea how to do this.
In my Oracle APEX Application, I am loading data from a table into a CLASSIC REPORT through setting Local Database/SQL Query as source.
I have to make 4 columns from data of 2 columns stored in a table. I can load 3 without any issue using the below simple statement:
Select TaskName, DueDate, DueDate - 3 as ReminderDate
from table_name
The fourth column should be "RemainingDays", which equals DueDate - current date. I have tried writing DueDate - Sys_date and DueDate - current_date in the above statement to get the fourth column, but that is probably not the correct way, as I get an error instead of all 4 columns. (I am doing it the basic Excel/DAX way.) Any help here?
When you subtract a date from another date, Oracle returns a number which is the number of days between the two dates.
One thing to note when using SYSDATE or CURRENT_DATE is that you may get different results if your user is not in the same time zone as the database. SYSDATE returns the current date and time of the operating system on which the database server runs. CURRENT_DATE returns the current date and time in the time zone of the user's session.
If possible, try building the query in a tool such as SQL Developer, get it working there, then build your Classic Report in APEX. If you are still receiving an error, please share the error you are receiving as well as the query you are using.
Example
--Start of sample data
WITH
    t (task_name, due_date)
    AS
        (SELECT 'task1', DATE '2020-09-30' FROM DUAL
         UNION ALL
         SELECT 'task2', DATE '2020-09-28' FROM DUAL)
--End of sample data
SELECT task_name,
       due_date,
       due_date - 3 AS reminder_date,
       ROUND (due_date - SYSDATE, 2) AS days_remaining
  FROM t;
Result
TASK_NAME DUE_DATE REMINDER_DATE DAYS_REMAINING
____________ ____________ ________________ _________________
task1 30-SEP-20 27-SEP-20 13.66
task2 28-SEP-20 25-SEP-20 11.66

How to calculate gap between 2 timestamps (edited for AWS Athena)

I have many IoT devices that send data to my Amazon Athena server. I created a table to store the data, and it contains 2 columns: LocalTime indicates the time at which the IoT device captured its status, and ServerTime indicates the time at which the data arrived at the server (sometimes the IoT device doesn't have a network connection).
I would like to count the "gaps" in blocks of hours (let's say 1 hour) in order to know the deviation of the data arrival, for example:
the result that I would like to get is:
To calculate the result, I want to calculate how many hours passed between ServerTime and LocalTime,
so the first entry (1.1.2019 12:15 - 1.1.2019 10:25) = 1-2 hours.
Thanks
If your database is Microsoft SQL Server, you can try the script below to get your desired output:
SELECT
    CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
    CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR) [Hours],
    COUNT(*) [Count]
FROM your_table
GROUP BY CAST(DATEDIFF(HH,localTime,serverTime)-1 AS VARCHAR) +'-'+
    CAST(DATEDIFF(HH,localTime,serverTime) AS VARCHAR)
Oracle
If you are using an Oracle database, you can use this statement:
-- ServerTime minus LocalTime keeps the delay positive; FLOOR puts it into whole-hour blocks
select CONCAT(CONCAT (diff_hours,'-') , diff_hours+1) as Hours, count(diff_hours) as Count
from (select FLOOR(24 * (to_date(ServerTime, 'YYYY-MM-DD hh24:mi') - to_date(LocalTime, 'YYYY-MM-DD hh24:mi'))) diff_hours from T_TIMETABLE )
group by diff_hours
order by diff_hours;
Note: This will not display the empty intervals.
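Independently of the SQL dialect, the bucketing itself can be sketched in Python: take the delay between the two timestamps, floor it to whole hours, and count the rows per block. Only the first pair of timestamps comes from the question; the others, and the helper name hour_bucket, are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def hour_bucket(local_time, server_time):
    """Label the whole-hour block that the delay falls into, e.g. '1-2'."""
    delay_seconds = (server_time - local_time).total_seconds()
    hours = int(delay_seconds // 3600)  # floor to whole elapsed hours
    return f"{hours}-{hours + 1}"

# (LocalTime, ServerTime) pairs. Only the first pair comes from the
# question; the other two are made up.
readings = [
    (datetime(2019, 1, 1, 10, 25), datetime(2019, 1, 1, 12, 15)),
    (datetime(2019, 1, 1, 10, 0), datetime(2019, 1, 1, 10, 20)),
    (datetime(2019, 1, 1, 9, 0), datetime(2019, 1, 1, 10, 30)),
]

# Count how many readings fall into each block.
counts = Counter(hour_bucket(local, server) for local, server in readings)
for label, count in sorted(counts.items()):
    print(label, count)
```

This sketch floors the elapsed time. SQL Server's DATEDIFF counts the hour boundaries crossed instead, so the two approaches can label short delays differently.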

Python UDF in Redshift faster with CTE than with direct select query

I have a function written in Python for Redshift (code below) that calculates the business days between two dates, excluding South African holidays and weekends:
CREATE OR REPLACE FUNCTION b_days (start_date timestamp, end_date timestamp)
RETURNS INTEGER IMMUTABLE as $$
from pandas import date_range
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
The code is super quick when tested:
in a Jupyter Notebook
by running the function in a simple select: select business_days('2018-06-29','2018-07-02') returns 2 as the answer
using a CTE:
with cte as(
select id, sdate, edate
from bigtable --with million rows
)
select id, sdate, edate, business_days(sdate,edate)
from cte
But it runs non-stop when I query the table directly:
select id, sdate, edate, business_days(sdate,edate)
from bigtable --with million rows
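For reference, the kind of calculation described above can be sketched with the Python standard library alone. This is not the UDF from the question, whose body is not shown in full. The holiday set holds a single real South African holiday as a placeholder, and counting both endpoints is an assumption inferred from the example that returns 2.

```python
from datetime import date, timedelta

# Placeholder holiday set for illustration; a real one would list every
# South African public holiday in the range of interest.
HOLIDAYS = {date(2018, 8, 9)}  # National Women's Day

def business_days(start, end, holidays=HOLIDAYS):
    """Count days from start to end (inclusive) that are neither weekend nor holiday."""
    count = 0
    day = start
    while day <= end:
        # weekday() is 0-4 for Monday-Friday, 5-6 for the weekend.
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count

print(business_days(date(2018, 6, 29), date(2018, 7, 2)))
```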