I have a function written in Python for Redshift (code below) that calculates business days between two dates, accounting for South African holidays and weekends:
CREATE OR REPLACE FUNCTION business_days (start_date timestamp, end_date timestamp)
RETURNS INTEGER IMMUTABLE AS $$
    from pandas import date_range
    from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
    # ... rest of the function body omitted ...
$$ LANGUAGE plpythonu;
The code is super quick when I test it in a Jupyter Notebook.
Running the function in a simple select works: select business_days('2018-06-29','2018-07-02') returns 2 as the answer.
It also completes when I go through a CTE:
with cte as(
select id, sdate, edate
from bigtable --with million rows
)
select id, sdate, edate, business_days(sdate,edate)
from cte
but it runs non-stop when I query the table directly:
select id, sdate, edate, business_days(sdate,edate)
from bigtable --with million rows
I have a question pertaining to Google BigQuery tables. We currently want to query a BigQuery table based on the file uploaded that day into Cloud Storage.
Meaning:
I have to load each day's Cloud Storage data into the BigQuery table and then query it like:
select * from BQT where load_date = <TODAY'S DATE>
Can we achieve this without adding the date field into the file?
If you just don't want to add a date column, append the current date as a suffix to your table name, like BQT_20200112, when the GCS file is uploaded.
Then you can query the table for a specific day with the _TABLE_SUFFIX syntax.
Below is an example query using _TABLE_SUFFIX:
SELECT
field1,
field2,
field3
FROM
`your_dataset.BQT_*`
WHERE
_TABLE_SUFFIX = '20200112'
As you can see, you don't need an additional field like load_date when you query the tables using a date suffix and the wildcard symbol.
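If you don't want to hard-code the suffix for "today", you can derive it from the current date. Below is a sketch of that variation (assuming the same your_dataset.BQT_* naming); note that BigQuery may not prune tables as aggressively when the _TABLE_SUFFIX filter is not a constant, so check the billed bytes.
SELECT
  field1,
  field2,
  field3
FROM
  `your_dataset.BQT_*`
WHERE
  _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE())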
I have a very simple question, but since I am not familiar with SQL or PL/SQL, I have no idea how to do this.
In my Oracle APEX application, I am loading data from a table into a CLASSIC REPORT by setting Local Database / SQL Query as the source.
I have to produce 4 columns from the data of 2 columns stored in a table. I can get 3 without any issue using the simple statement below:
Select TaskName, DueDate, DueDate - 3 as ReminderDate
from table_name
The fourth column should be "RemainingDays", which equals DueDate minus the current date. I have tried writing DueDate - Sys_date and DueDate - current_date in the above statement to get the fourth column, but that's probably not the correct way, as I get an error instead of all 4 columns. (I am approaching it the basic Excel/DAX way.) Any help here?
When you subtract a date from another date, Oracle returns a number which is the number of days between the two dates.
One thing to note when using SYSDATE or CURRENT_DATE is that you may get different results if your user is not in the same time zone as the database. SYSDATE returns the current time of the database; CURRENT_DATE returns the current time of the user, whatever time zone they may be in.
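For instance, a quick way to see the difference for yourself (assuming your session time zone differs from the database's):
ALTER SESSION SET TIME_ZONE = 'America/New_York';

SELECT SYSDATE, CURRENT_DATE FROM DUAL;
The two values will differ by the offset between the session time zone set above and the database server's time zone.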
If possible, try building the query in a tool such as SQL Developer, get it working there, then build your Classic Report in APEX. If you are still receiving an error, please share the error you are receiving as well as the query you are using.
Example
--Start of sample data
WITH
    t (task_name, due_date)
    AS
        (SELECT 'task1', DATE '2020-09-30' FROM DUAL
         UNION ALL
         SELECT 'task2', DATE '2020-09-28' FROM DUAL)
--End of sample data
SELECT task_name,
due_date,
due_date - 3 AS reminder_date,
ROUND (due_date - SYSDATE,2) AS days_remaining
FROM t;
Result
TASK_NAME    DUE_DATE     REMINDER_DATE    DAYS_REMAINING
____________ ____________ ________________ ________________
task1        30-SEP-20    27-SEP-20                  13.66
task2        28-SEP-20    25-SEP-20                  11.66
I have a query that I created from a table.
Example:
select
    count(pkey) pkey_count,
    trunc(createdformat) business_date,
    regexp_substr(statistics, 'business_\w*') business_statistics
from business_data
where statistics like '%business_%'
group by regexp_substr(statistics, 'business_\w*'), trunc(createdformat)
This works great, thanks to your help.
Now I want to show that in a crosstab / pivot.
That means the first column holds the business_statistics values, and the column headings are the dynamic days from business_date.
I've tried the following, but it doesn't quite work yet:
SELECT *
FROM (
    select
        pkey,
        trunc(createdformat) business_date,
        regexp_substr(statistics, 'business_\w*') business_statistics
    from business_data
    where statistics like '%business_%'
)
PIVOT (
    count(pkey)
    FOR business_date
    IN ('17.06.2020', '18.06.2020')
)
ORDER BY business_statistics
If I specify the dates, like 17.06.2020 and 18.06.2020 here, it works: 3 columns (business_statistics, 17.06.2020, 18.06.2020). But from column 2 onwards it should be dynamic: it should show me the days (dates) that are actually contained in the query/table. So the result would be X columns (business_statistics, date1, date2, date3, date4, ...), dynamic based on the table data.
For example, this does not work:
...
IN (SELECT DISTINCT trunc(createdformat) FROM BUSINESS_DATA WHERE statistics like '%business_%' order by trunc(createdformat))
...
The PIVOT clause doesn't work with dynamic values.
But there are some workarounds discussed here: How to Convert Rows to Columns and Back Again with SQL (Aka PIVOT and UNPIVOT)
You may find one workaround that suits your requirements.
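One workaround built into Oracle itself, if your client can cope with XML output, is PIVOT XML, which does accept the ANY keyword (or a subquery) in the IN list. A sketch based on your query; the pivoted data comes back as a single XMLTYPE column rather than real relational columns:
SELECT *
FROM (
    select
        pkey,
        trunc(createdformat) business_date,
        regexp_substr(statistics, 'business_\w*') business_statistics
    from business_data
    where statistics like '%business_%'
)
PIVOT XML (
    count(pkey)
    FOR business_date
    IN (ANY)
)
ORDER BY business_statistics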
Unfortunately, I am not very familiar with PL/SQL. But could I still feed the user's start date and end date into the query?
For example, the user enters StartDate: June 17, 2020 and EndDate: June 20, 2020 in the APEX environment.
Then the PL/SQL block calculates the day-by-day range, and a variable is filled with the dates of the entered period using a loop.
Example: (just an idea, I'm not that fit in PL/SQL yet)
DECLARE
    startdate := :P9999_StartDate  -- example 17.06.2020
    enddate   := :P9999_EndDate    -- example 20.06.2020
BEGIN
    LOOP  -- from the start date to the end date, day by day
        businessdate := businessdate ....  -- example: 17.06.2020,18.06.2020,19.06.2020, ...
    END LOOP;
    SELECT *
    FROM (
        select
            pkey,
            trunc(createdformat) business_date,
            regexp_substr(statistics, 'business_\w*') business_statistics
        from business_data
        where statistics like '%business_%'
    )
    PIVOT (
        count(pkey)
        FOR business_date
        IN (businessdate)
    )
    ORDER BY business_statistics;
END;
That would be my idea, but I'm failing at the implementation. Is that possible? I hope you understand what I mean.
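Roughly, yes, that is the shape of it: a loop can build up the list of dates as a string, but because PIVOT only accepts literals in its IN list, the finished query then has to be assembled and executed as dynamic SQL. A rough sketch of the list-building part (assuming :P9999_StartDate and :P9999_EndDate arrive as DD.MM.YYYY strings; the names are illustrative):
DECLARE
    l_day     DATE := TO_DATE(:P9999_StartDate, 'DD.MM.YYYY');
    l_end     DATE := TO_DATE(:P9999_EndDate, 'DD.MM.YYYY');
    l_in_list VARCHAR2(4000);
BEGIN
    WHILE l_day <= l_end
    LOOP
        IF l_in_list IS NOT NULL THEN
            l_in_list := l_in_list || ',';
        END IF;
        -- each day becomes a quoted literal, e.g. '17.06.2020'
        l_in_list := l_in_list || '''' || TO_CHAR(l_day, 'DD.MM.YYYY') || '''';
        l_day := l_day + 1;
    END LOOP;
    -- l_in_list is now '17.06.2020','18.06.2020','19.06.2020','20.06.2020';
    -- splice it into the PIVOT ... IN (...) text and run the finished
    -- statement with EXECUTE IMMEDIATE, or open a ref cursor for the report.
END;
In APEX, one way to wire this up is a region source of type "PL/SQL Function Body returning SQL Query" that returns the assembled statement.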
I have a table like below.
item_id  bill_start_date          bill_end_date          usage_amount  user_project
635212   2019-02-01 00:00:00.000  3/1/2019 00:00:00.000  13.345        IBM
I am trying to find the usage_amount by each month and each project. The Amazon Athena query engine is based on Presto 0.172. Due to limitations in Athena, it does not recognize queries like select sysdate from dual;.
I tried to convert bill_start_date and bill_end_date from timestamp to date but failed; even current_date() didn't work in my case. I am able to calculate the total cost by hard-coding the values, but my end goal is to perform the calculation on the columns.
SELECT (FLOOR(SUM(usage_amount)*100)/100) AS total,
user_project
FROM test_table
WHERE bill_start_date
BETWEEN date '2019-02-01'
AND date '2019-03-01'
GROUP BY user_project;
In Presto, current_timestamp is a SQL standard function and does not use parentheses; the same goes for current_date, which is why current_date() fails.
To group by month, I'd use date_trunc('month', bill_start_date).
All of these functions are documented in the Presto documentation.
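Putting those together, a sketch of the month-by-month rollup (assuming the same test_table columns from your question):
SELECT date_trunc('month', bill_start_date) AS bill_month,
       user_project,
       floor(sum(usage_amount) * 100) / 100 AS total
FROM test_table
GROUP BY 1, 2
ORDER BY 1, 2;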
I want to project forward the size of my Amazon Redshift tables because I'm planning to expand my Redshift cluster.
I know how to query the table sizes for today (see the query below), but how can I measure the growth of my tables over time without building an ETL job that snapshots table sizes day by day?
-- Capture table sizes
select
trim(pgdb.datname) as Database,
trim(pgn.nspname) as Schema,
trim(a.name) as Table,
b.mbytes,
a.rows
from (
select db_id, id, name, sum(rows) as rows
from stv_tbl_perm a
group by db_id, id, name
) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (
select tbl, count(*) as mbytes
from stv_blocklist
group by tbl
) b on a.id = b.tbl
order by mbytes desc, a.db_id, a.name;
There is no historical table size information retained by Amazon Redshift. You would need to run a query on a regular basis, such as the one in your question.
You could wrap the query in an INSERT statement and run it on a weekly basis, inserting the results into a table. This way, you'll have historical table size information for each table each week that you can use to predict future growth.
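A minimal sketch of that idea, reusing the query from your question as the INSERT source (the history table name and column names here are just placeholders):
-- one-time setup
CREATE TABLE IF NOT EXISTS table_size_history (
    captured_at   timestamp,
    database_name varchar(128),
    schema_name   varchar(128),
    table_name    varchar(128),
    mbytes        bigint,
    row_count     bigint
);

-- run on a schedule (e.g. weekly): snapshot the current sizes
INSERT INTO table_size_history
SELECT
    getdate(),
    trim(pgdb.datname),
    trim(pgn.nspname),
    trim(a.name),
    b.mbytes,
    a.rows
FROM (
    SELECT db_id, id, name, sum(rows) AS rows
    FROM stv_tbl_perm
    GROUP BY db_id, id, name
) AS a
JOIN pg_class AS pgc ON pgc.oid = a.id
JOIN pg_namespace AS pgn ON pgn.oid = pgc.relnamespace
JOIN pg_database AS pgdb ON pgdb.oid = a.db_id
JOIN (
    SELECT tbl, count(*) AS mbytes
    FROM stv_blocklist
    GROUP BY tbl
) AS b ON a.id = b.tbl;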
It would be worth doing a VACUUM prior to such measurements, to remove deleted rows from storage.
The following metrics are available in CloudWatch:
RedshiftManagedStorageTotalCapacity (m1)
PercentageDiskSpaceUsed (m2)
Create a CloudWatch metric math expression m1*m2/100 to get this data for the past 3 months.