AWS Athena date SQL query

Below is the data in a CSV file in an S3 bucket, which I used to build the Athena database.
John   Wright  cricket   25
Steve  Adams   football  30
I am able to run a query and get the data.
Now I am trying to derive a date of birth from the age column. Is it possible to generate the date of birth as current date minus age (column) and print only the date of birth?
I tried the query below, but I am not sure whether it is the correct approach:
select (current_date - interval age day) from table_name;
Please help me with this.

You can use the date_add function, like this:
SELECT date_add('year', -age, current_date) FROM table_name
That is, it subtracts the value of the age column, in years, from the current date.
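A minimal sketch in Python of the same arithmetic, for illustration only; it is not Athena code. The helper name subtract_years and the fixed reference date are assumptions made for this example. Keep in mind that an age in whole years pins the birth date down only to within a year, so the result is an approximation.

```python
from datetime import date


def subtract_years(day, years):
    """Move `day` back by whole years, clamping 29 Feb to 28 Feb when needed."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # 29 February does not exist in the target year.
        return day.replace(year=day.year - years, day=28)


# Fixed reference date (an assumption) so the result is reproducible.
today = date(2021, 8, 16)

# Names and ages taken from the sample rows in the question.
people = [("John", 25), ("Steve", 30)]

birth_dates = {name: subtract_years(today, age) for name, age in people}

for name, born in birth_dates.items():
    print(name, born.isoformat())
```

The 29 February clamp is a choice made in this sketch; how Athena treats that edge case is worth checking against its documentation.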

Explode a table with a monthly increment in Amazon Redshift

I have a sample table:
id   start_dt    end_dt
100  06/07/2021  30/09/2021
I would like to get the following output:
id   start_dt    end_dt
100  06/07/2021  31/07/2021
100  01/08/2021  30/08/2021
100  01/09/2021  30/09/2021
I have tried using GENERATE_SERIES() in Amazon Redshift, but that does not give the required result.
The existing table is quite large, so I could use temp tables and then join back to another table at a later stage.
I have trawled through other posts, but the other proposed solutions either don't quite give the desired results or don't work at all on Amazon Redshift. Any help in solving this would be appreciated.

The traditional method would be:
Create a Calendar table that contains one row per month, with start_date and end_date columns
Join your table to the Calendar table, where table.start_dt <= calendar.end_dt AND table.end_dt >= calendar.start_dt
The two output columns (start_dt and end_dt) would then be:
GREATEST(table.start_dt, calendar.start_dt)
LEAST(table.end_dt, calendar.end_dt)
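As a rough illustration of the calendar-join approach, here is a small Python script that runs the join in an in-memory SQLite database. SQLite is only a stand-in for Redshift here: its two-argument max() and min() play the role of GREATEST and LEAST, dates are stored as ISO-8601 text so that they compare correctly, and the table names spans and calendar are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in for the question's table (dates as ISO text).
    CREATE TABLE spans (id INTEGER, start_dt TEXT, end_dt TEXT);
    INSERT INTO spans VALUES (100, '2021-07-06', '2021-09-30');

    -- Calendar table: one row per month with its first and last day.
    CREATE TABLE calendar (start_date TEXT, end_date TEXT);
    INSERT INTO calendar VALUES
        ('2021-06-01', '2021-06-30'),
        ('2021-07-01', '2021-07-31'),
        ('2021-08-01', '2021-08-31'),
        ('2021-09-01', '2021-09-30'),
        ('2021-10-01', '2021-10-31');
    """
)

rows = conn.execute(
    """
    SELECT s.id,
           max(s.start_dt, c.start_date) AS start_dt,  -- GREATEST in Redshift
           min(s.end_dt, c.end_date)     AS end_dt     -- LEAST in Redshift
    FROM spans AS s
    JOIN calendar AS c
      ON s.start_dt <= c.end_date
     AND s.end_dt >= c.start_date
    ORDER BY s.id, c.start_date
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample row from the question, this yields one row each for July, August, and September. The August row ends on 31/08/2021, because the calendar table holds the true month ends.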

Power BI translating a sql query to filters

I was wondering whether this is possible in Power BI. I am extremely new to it, and I am trying to work out how a SQL query translates into a Power BI report.
SELECT
    expiresDate,
    Name,
    Addr,
    ValidFrom,
    ValidTo,
    ChildName,
    ChildValidFrom,
    ChildValidTo,
    RecValidFrom,
    RecValidTo
FROM Table
WHERE expiresDate BETWEEN <date1> AND <date2>
  AND <Date3> BETWEEN ValidFrom AND ValidTo
  AND <Date3> BETWEEN ChildValidFrom AND ChildValidTo
  AND <Date3> BETWEEN RecValidFrom AND RecValidTo
A brief explanation: the report looks up to 3 months ahead. So in August, the report covers data for September (date1 = 01/09/2021) and October (date2 = 31/10/2021). However, the data can change on a daily basis, so the result depends on Date3, which could be any day in August.
I have created a calendar table with additional columns that calculate the start and end dates from a particular date. I just can't work out how to relate this to the dataset, which is the query without the WHERE clause. I would then want the filters to determine the result. Ultimately, as I have it at present, the user picks a single date, and the start and end dates are derived from it as described earlier. Alternatively, the report could display by range, using the latest iteration of each record.
For example, here is the first part of the table:
expiresDate  AccNo  Name    Addr    ValidFrom   ValidTo     ChildName
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Cheese
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Rhubarb
2021-10-01   1      Bob     1 Here  2021-08-17  2020-08-23  Rhubarb
Second half of the table:
ChildValidFrom  ChildValidTo  RecValidFrom  RecValidTo
2019-01-01      2021-08-10    2019-19-01    2020-12-31
2021-08-11      2021-08-23    2021-01-01    2021-08-15
2021-08-11      2021-08-23    2021-08-16    2020-08-23
The table is a view that squashes the data down to unique records and the dates on which changes occurred. This makes the dataset considerably smaller: the record count drops from 10m to 54k.
The requirement is that every From - To date range contains the specified date, which is either a calendar date entered as a filter or today.
The report would bring out all records that have an expiresDate more than 1 calendar month and less than 3 calendar months after that date. I am just using August dates for the example, so this would be from 01/09/2021 to 31/10/2021.
In my example there are 3 results for AccNo 1, but only 1 should be displayed:
If I use the date 2021-08-01, the first row should be displayed.
If I use the date 2021-08-12, the second row should be displayed.
If I use the date 2021-08-23, the third row should be displayed.
This is because the date used should fall within the date ranges of all 3 criteria:
ValidFrom - ValidTo
ChildValidFrom - ChildValidTo
RecValidFrom - RecValidTo
Any help would be greatly appreciated. This is extremely frustrating, but if it is possible, it would make a nice visual for users to check through their data by entering a date.
Many thanks

Power BI Sum by Category and Month

I have a Power BI/DAX question. I'm looking to summarize my data by getting monthly transaction sums (including the year as well, i.e. MM/YY) and filtering them by individual account numbers. Here is an example:
I want to take that and make it into this:
I converted the dates to the format I want with this code: 
Transaction Month = MONTH(Table[Date]) & "/" & YEAR(Table[Date])
Then got the total monthly sum:
Total Monthly Sum = CALCULATE(sum(Table[Transaction Amount]),ALLEXCEPT(Table, Table[Transaction Month]))
Now I'm trying to figure out how to filter the total monthly sum by individual account numbers. Just as a note - I need this to be a calculated column as well because I'll want to identify accounts that surpass individual account monthly spending limits. Can anyone help me with this?
Thanks so much!

When working with calendar dates, it pays to have a calendar table linked to the transaction table. In the calendar table you will have each date, from the start date of your relevant time period to the end of the time period relevant to your data. The columns of the calendar table can then contain calculations on that date like month number, month name, year, year-month key, transaction month (as the first day of the month for the date in that row), etc.
Next, connect the two tables in the data model by dragging the transaction date to the calendar date column.
Now you can build charts and report tables that group data by month without writing any complicated DAX. Just pull the field "transaction month" from the calendar table and the Total Sum measure from the transaction table into the field well of the visual.
That's what Power BI is all about.
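The question also asks for the monthly sum per account. With the calendar approach, that amounts to grouping by account number as well as by transaction month. The grouping logic can be sketched in Python (not DAX); the account numbers, amounts, and the 120.0 limit below are made-up sample values.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (account number, transaction date, amount).
transactions = [
    ("A-1", date(2021, 1, 5), 100.0),
    ("A-1", date(2021, 1, 20), 50.0),
    ("A-1", date(2021, 2, 3), 25.0),
    ("B-2", date(2021, 1, 9), 10.0),
]


def transaction_month(day):
    """First day of the month: the key a calendar table would supply."""
    return day.replace(day=1)


# Sum per (account, month), like grouping a visual by both fields.
totals = defaultdict(float)
for account, day, amount in transactions:
    totals[(account, transaction_month(day))] += amount

# The same total repeated on every row, like a calculated column.
monthly_sum_per_row = [
    totals[(account, transaction_month(day))]
    for account, day, _ in transactions
]

# Flag account/month pairs above a hypothetical spending limit.
MONTHLY_LIMIT = 120.0
over_limit = [key for key, total in totals.items() if total > MONTHLY_LIMIT]

print(monthly_sum_per_row)
print(over_limit)
```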

How to extract Month and Year from column in PowerBI powerquery

I have a column (monthyear) in the image below. I want to extract the Month and year from the column to put it in the new column. Note: In my dataset this information goes for every day of the year
So the new column would look like:
01/2020
01/2020
01/2020
etc.

In Power Query, use some of the date functions.
To get the year it will be
Date.Year([monthyear])
For the month, it will depend on how you want to format it. Using the month of June as an example:
To get 'Jun'
Date.ToText([monthyear],"MMM")
To get the month number in the format 06
Number.ToText(Date.Month([monthyear]), "00")
Just to get the number 6 it will be:
Date.Month([monthyear])
In DAX use the date functions
For year the calculated column will be:
YEAR([monthyear])
For the month:
MONTH([monthyear])
I would always do as much data transformation as possible in Power Query, before the data reaches the data model.

Query to calculate cost by month using AWS Athena querying

I have a table like below.
item_id  bill_start_date          bill_end_date          usage_amount  user_project
635212   2019-02-01 00:00:00.000  3/1/2019 00:00:00.000  13.345        IBM
I am trying to find the usage_amount per month and per project. The Amazon Athena query engine is based on Presto 0.172. Because of Athena's limitations, it does not recognize queries such as select sysdate from dual;.
I tried to convert bill_start_date and bill_end_date from timestamp to date but failed. Even current_date() didn't work in my case. I am able to calculate the total cost by hard-coding the values, but my end goal is to perform the calculation on the columns.
SELECT (FLOOR(SUM(usage_amount)*100)/100) AS total,
user_project
FROM test_table
WHERE bill_start_date
BETWEEN date '2019-02-01'
AND date '2019-03-01'
GROUP BY user_project;

In Presto, current_date and current_timestamp are SQL-standard functions that are called without parentheses.
To group by month, I'd use date_trunc('month', bill_start_date).
All of these functions are documented here
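The grouping suggested above can be sketched in plain Python to show what date_trunc('month', ...) contributes. This is an illustration of the logic, not Athena SQL; the helper date_trunc_month and the second and third sample rows are assumptions added for the example.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# (bill_start_date, user_project, usage_amount).
# The first row comes from the question; the other two are hypothetical.
rows = [
    (datetime(2019, 2, 1), "IBM", Decimal("13.345")),
    (datetime(2019, 2, 14), "IBM", Decimal("1.005")),
    (datetime(2019, 3, 2), "IBM", Decimal("2.500")),
]


def date_trunc_month(ts):
    """Python analogue of date_trunc('month', ts): first instant of the month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# Equivalent of GROUP BY date_trunc('month', bill_start_date), user_project.
monthly_cost = defaultdict(Decimal)
for bill_start, project, usage in rows:
    monthly_cost[(date_trunc_month(bill_start), project)] += usage

for (month, project), total in sorted(monthly_cost.items()):
    print(month.date().isoformat(), project, total)
```

Every timestamp within a month collapses to the same key, so the sum is taken once per month and project rather than once per distinct timestamp.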