How to improve the query to result the data in one row?

How to improve the query to result the data in one row? - amazon-web-services

The following redshift query is showing the data as screenshot in below:
select
distinct id_butaca,
max(CASE WHEN id_tipoorden = 1 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_compra,
max(CASE WHEN id_tipoorden = 4 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_devo
from dw_fact_table
where
id_butaca = 175044501
How can I remove the empty values and put the values in the same row?
id_butaca, max_fecha_compra, max_fecha_devo
175044501 2023-01-09 12:11:04.0 2023-01-09 12:09:55

This will help you to merge two rows in one row.
select id_butaca,
max(max_fecha_compra) as max_fecha_compra,
max(max_fecha_devo) as max_fecha_devo
from (
select distinct id_butaca,
max(
CASE
WHEN id_tipoorden = 1 THEN fechahoracompra
ELSE to_timestamp('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
END
) over (partition by id_butaca, id_tipoorden) as max_fecha_compra,
max(
CASE
WHEN id_tipoorden = 4 THEN fechahoracompra
ELSE to_timestamp('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
END
) over (partition by id_butaca, id_tipoorden) as max_fecha_devo
from dw_fact_table
where id_butaca = 175044501
)
group by id_butaca

Maybe something like this?
select id_butaca,
max(CASE WHEN id_tipoorden = 1 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_compra,
max(CASE WHEN id_tipoorden = 4 THEN fechahoracompra ELSE NULL END) over (partition by id_butaca,id_tipoorden ) as max_fecha_devo
from dw_fact_table
where id_butaca = 175044501
group by id_butaca

Related

Redshift Error when executing the delete script with EXISTS function. The Select runs fine for this query

This Redshift query fails -
DELETE FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Error: Invalid operation: syntax error at or near "stg"
However, the below query runs fine -
SELECT * FROM TBL_1 stg
WHERE EXISTS (
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
SELECT emp_id,customer_id FROM CCDA WHERE rn = 1
AND stg.emp_id = CCDA.emp_id
AND stg.customer_id = CCDA.customer_id
);
Am I missing something?

You cannot use an alias in a DELETE statement for the target table. "stg" cannot be used as the alias and this is why you are getting this error.
Also to reference other tables in a DELETE statement you need to use the USING clause.
See: https://docs.aws.amazon.com/redshift/latest/dg/r_DELETE.html
A quick stab of what this would look like (untested):
WITH CCDA as (
SELECT
row_number() OVER (PARTITION BY emp_id,customer_id ORDER BY seq_num desc) rn
, *
FROM TBL_2
WHERE end_dt > (SELECT max(end_dt) FROM TBL_3)
)
DELETE FROM TBL_1
USING CCDA
WHERE CCDA.rn = 1
AND TBL_1.emp_id = CCDA.emp_id
AND TBL_1.customer_id = CCDA.customer_id
;

SQL for nested WITH CLAUSE - RESULTS OFFSET in Oracle 19c

Please suggest a way to implement nesting of (temp - results - select) as shown below?
I see that oracle 19c does not allow nesting of WITH clause.
with temp2 as
(
with temp1 as
(
__
__
),
results(..fields..) as
(
select ..<calc part>.. from temp1, results where __
)
select ..<calc part>.. from temp1 join results where __
),
results(..fields..) as
(
select ..<calc part>.. from temp2, results where __
)
select ..<calc part>.. from temp2 join results where __
For instance:
DB Fiddle
I need to calculate CALC3 in similar recursive way as of CALC
CREATE TABLE TEST ( DT DATE, NAME VARCHAR2(10), VALUE NUMBER(10,3));
insert into TEST values ( to_date( '01-jan-2021'), 'apple', 198.95 );
insert into TEST values ( to_date( '02-jan-2021'), 'apple', 6.15 );
insert into TEST values ( to_date( '03-jan-2021'), 'apple', 4.65 );
insert into TEST values ( to_date( '06-jan-2021'), 'apple', 20.85 );
insert into TEST values ( to_date( '01-jan-2021'), 'banana', 80.5 );
insert into TEST values ( to_date( '02-jan-2021'), 'banana', 9.5 );
insert into TEST values ( to_date( '03-jan-2021'), 'banana', 31.65 );
--Existing working code -
with t as
( select
test.*,
row_number() over ( partition by name order by dt ) as seq
from test
),
results(name, dt, value, calc ,seq) as
(
select name, dt, value, value/5 calc, seq
from t
where seq = 1
union all
select t.name, t.dt, t.value, ( 4 * results.calc + t.value ) / 5, t.seq
from t, results
where t.seq - 1 = results.seq
and t.name = results.name
)
select results.*, calc*3 as calc2 -- Some xyz complex logic as calc2
from results
order by name, seq;
Desired output:
CALC3 - grouped by name and dt -
((CALC3 of prev day record * 4) + CALC2 of current record )/ 5
i.e for APPLE
for 1-jan-21, CALC = ((0*4)+119.37)/5 = 23.87 -------> since it is 1st record, have taken 0 as CALC3 of prev day record
for 2-jan-21, CALC = ((23.87*4)+99.19)/5= 115.33 -----> prev CALC3 is considered from 1-jan-21 - 23.87 and 99.19 from current row
for 3-jan-21, CALC = ((115.33*4)+82.14)/5= 477.76 and so on
For BANANA
1-jan-21, CALC = ((0*4)+48.30)/5=9.66
1-jan-21, CALC = ((9.66*4)+44.34)/5=47.51
etc

You do not need to, you can just do it all in one level:
with temp1(...fields...) as
(
__
__
__
),
results1(...fields...) as
(
select ...<calc part>... from temp1 where __
),
temp2( ...fields...) as
(
select ...<calc part>... from temp1 join results1 where __
),
results2(...fields...) as
(
select ...<calc part>... from temp2 where __
)
select ...<calc part>... from temp2 join results2 where __
For your actual problem, you can use a MODEL clause:
SELECT dt,
name,
amount,
calc,
seq,
calc2,
calc3
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY dt) AS seq
FROM test t
)
MODEL
PARTITION BY (name)
DIMENSION BY (seq)
MEASURES ( dt, amount, 0 AS calc, 0 AS calc2, 0 as calc3)
RULES (
calc[1] = amount[1]/5,
calc[seq>1] = (amount[cv(seq)] + 4*calc[cv(seq)-1])/5,
calc2[seq] = 3*calc[cv(seq)],
calc3[1] = calc2[1]/5,
calc3[seq>1] = (calc2[cv(seq)] + 4*calc3[cv(seq)-1])/5
)
Which outputs:
DT
NAME
AMOUNT
CALC
SEQ
CALC2
CALC3
01-JAN-21
banana
80.5
16.1
1
48.3
9.66
02-JAN-21
banana
9.5
14.78
2
44.34
16.596
03-JAN-21
banana
31.65
18.154
3
54.462
24.1692
01-JAN-21
apple
198.95
39.79
1
119.37
23.874
02-JAN-21
apple
6.15
33.062
2
99.186
38.9364
03-JAN-21
apple
4.65
27.3796
3
82.1388
47.57688
06-JAN-21
apple
20.85
26.07368
4
78.22104
53.705712
db<>fiddle here

Converting Case statement, Filter from t-SQL query to DAX

I have a problem converting below t-sql query into DAX.
Overview - There are two sample tables - Table1 and Table2 with below schema
Table1 (ID varchar(20),Name varchar(30))
Table2 (CapID varchar(20),CAPName varchar(30), CapID_Final varchar(20))
Please note : There exists one to many relationship between above tables : [ID] in Table2 with [CapID] in Table1
I am trying to derive CapID_Final column in table2 based on conditions as per my t-SQL query in below which works perfectly fine -
SELECT CASE
WHEN [CapID] like 'CA%' and [CAPName]='x12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='x12345-Sample')
THEN 'Undefined_Cap_1'
WHEN [CapID] like 'CA%' and [CAPName]='z12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='z12345-Sample')
THEN 'Undefined_Cap_2'
WHEN [CapID] like 'CA%' and [CAPName]='a123-Sample'
and [CapID] not in(select [ID] from Table1 where Name='a123-Sample')
THEN 'Undefined'
ELSE [CapID]
END AS [CapID_Final] from Table2
However, I want the same derivation for CapID_Final column in Power BI in a calculated column using DAX.
So far, I have tried below code - but it returns "Undefined" for even matched conditions -
CapID_Final =
IF(LEFT(Table2[CapID],2)="CA" && Table2[CAPName]="z12345-Sample" &&
NOT
(COUNTROWS (
FILTER (
Table1,CONTAINS(Table1,Table1[ID],Table2[CapID])
)
) > 0),"Undefined_Cap_1","Undefined"
)
I am not familiar with DAX, however I tried and couldn't figure it out.
Could you please let me know how to convert my sql query to equivalent DAX in Power BI?

A SWITCH is basically the equivalent of a CASE clause here:
CapID_Final =
SWITCH (
TRUE (),
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "x12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "x12345-Sample" )
), "Undefined_Cap_1",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "z12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "z12345-Sample" )
), "Undefined_Cap_2",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "a12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "a12345-Sample" )
), "Undefined",
Table1[CapID]
)
You might even be able to refactor it a bit to be more code efficient. Assuming I didn't make any logic mistakes:
CapID_Final =
VAR IDs =
CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = Table2[CAPName] )
RETURN
IF (
LEFT ( Table2[CapID], 2 ) = "CA"
&& NOT ( Table2[CapID] IN IDs ),
SWITCH (
Table2[CAPName],
"x12345-Sample", "Undefined_Cap_1",
"z12345-Sample", "Undefined_Cap_2",
"a12345-Sample", "Undefined"
),
Table1[CapID]
)

As a best-practice never use calculated column. In fact, if extensively used they slow down your model refresh and heavily increase your model weight (because they are not compressed). Instead, calculate it in your back-end database or using M Query.
Having said this, the solution to your question is very simple using a SWITCH function:
SWITCH ( <Expression>, <Value>, <Result> [, <Value>, <Result> [, … ] ] [, <Else>] )
In your case would be as follow:
CapIDFinal:=
SWITCH(TRUE(),
AND(CONDITION_1_1, CONDITION_1_2), "Value if condition 1 is true",
AND(CONDITION_2_1, CONDITION_2_2), "Value if condition 2 is true",
"Value if none of above conditions is true
)

Percentage of change with 2 data slicers in Power BI

I have a scenario with two data slicers. The first data slicer filters data for one period, the second one for another period. By editing visual interactions I got this works at the same page.
Now I want to compare two resulting values (in this case, number of transactions, and find a percentage of change between two selected periods.
I duplicated data column so I have two date columns for each slicer and I calculated the next measures:
# of Transactions 1 = CALCULATE(COUNT(Report[ProductID]),DATESBETWEEN(Report[Date1],[Start Date 1],[Last Date 1]))
# of Transactions 2 = CALCULATE(COUNT(Report[ProductID]),DATESBETWEEN(Report[Date2],[Start Date 2],[Last Date 2]))
% Transaction Change = ([# of Transactions 1]/[# of Transactions 2]) - 1
The first 2 measures are accurate (# of Transactions 1 & 2), but % of change doesn't work.
If you look at the screenshot below, you'll see # od Transactions 1 = 1,990 and # of Transactions 2 = 2,787. I want to compare this 2 values now.
How can I solve this?
Thank you.

First create two measure for your date bounds:
Min Date :=
MIN ( 'Report'[Date] )
Max Date :=
MAX ( 'Report'[Date] )
Then create a date table using the following DAX, this will join to you 'Report' table on the primary date:
Dates :=
VAR MinDate = [Min Date]
VAR MaxDate = [Max Date]
VAR BaseCalendar =
CALENDAR ( MinDate, MaxDate )
RETURN
GENERATE (
BaseCalendar,
VAR BaseDate = [Date]
VAR YearDate =
YEAR ( BaseDate )
VAR MonthNumber =
MONTH ( BaseDate )
VAR YrMonth =
100 * YEAR ( BaseDate )
+ MONTH ( BaseDate )
VAR Qtr =
CONCATENATE ( "Q", CEILING ( MONTH ( BaseDate ) / 3, 1 ) )
VAR YrMonthQtr =
100 * YEAR ( BaseDate )
+ MONTH ( BaseDate )
& CONCATENATE ( "Q", CEILING ( MONTH ( BaseDate ) / 3, 1 ) )
VAR YrMonthQtrDay =
100 * YEAR ( BaseDate )
+ MONTH ( BaseDate )
& CONCATENATE ( "Q", CEILING ( MONTH ( BaseDate ) / 3, 1 ) )
& DAY ( BaseDate )
RETURN
ROW (
"Day", BaseDate,
"Year", YearDate,
"Month Number", MonthNumber,
"Month", FORMAT ( BaseDate, "mmmm" ),
"Year Month", FORMAT ( BaseDate, "mmm yy" ),
"YrMonth", YrMonth,
"Qtr", Qtr,
"YrMonthQtr", YrMonthQtr,
"YrMonthQtrday", YrMonthQtrDay
)
)
Now create another date table from which to compare, and join to your primary date table in 'Report' and ensure the relationship is inactive:
Compare Dates :=
ALLNOBLANKROW ( 'Dates' )
Now create the [# of transaction] measure; one for 'Dates' and another for 'Compare Dates' like so:
[# of Transaction 1] :=
CALCULATE (
COUNT ( Report[ProductID] )
)
[# of Transaction 2] :=
CALCULATE (
[# of transaction 1],
ALL ( 'Dates' ),
USERELATIONSHIP ( 'Compare Dates'[Date], 'Report'[Date] )
)
Now Create the % Delta measure:
Transaction Change := CALCULATE(DIVIDE([# of Transactions 1],[# of Transactions 2]) - 1)
This should work like a charm and will work for any dates selected in your slicers, you will still need to associate your date slicers with your new date tables.
I hope this helps!!

Translating MySql query into Django ORM query

I have a query in MySql that I need translated into Django ORM. It involves joining on two tables with two counts on one of the tables. I'm pretty close to it in Django but I get duplicate results. Here's the query:
SELECT au.id,
au.username,
COALESCE(orders_ct, 0) AS orders_ct,
COALESCE(clean_ct, 0) AS clean_ct,
COALESCE(wash_ct, 0) AS wash_ct
FROM auth_user AS au
LEFT OUTER JOIN
( SELECT user_id,
Count(*) AS orders_ct
FROM `order`
GROUP BY user_id
) AS o
ON au.id = o.user_id
LEFT OUTER JOIN
( SELECT user_id,
Count(CASE WHEN service = 'clean' THEN 1
END) AS clean_ct,
Count(CASE WHEN service = 'wash' THEN 1
END) AS wash_ct
FROM job
GROUP BY user_id
) AS j
ON au.id = j.user_id
ORDER BY au.id DESC
LIMIT 100 ;
My current Django query (which brings back unwanted duplicates):
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) )
).annotate(
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)
The above Django code produces the following query which is close but not right:
SELECT DISTINCT `auth_user`.`id`,
`auth_user`.`username`,
Count(DISTINCT `order`.`id`) AS `orders_ct`,
Count(CASE
WHEN `job`.`service` = 'clean' THEN 1
ELSE NULL
end) AS `clean_ct`,
Count(CASE
WHEN `job`.`service` = 'wash' THEN 1
ELSE NULL
end) AS `wash_ct`
FROM `auth_user`
LEFT OUTER JOIN `order`
ON ( `auth_user`.`id` = `order`.`user_id` )
LEFT OUTER JOIN `job`
ON ( `auth_user`.`id` = `job`.`user_id` )
GROUP BY `auth_user`.`id`
ORDER BY `auth_user`.`id` DESC
LIMIT 100
I could probably achieve it by doing some raw sql subqueries but I would like to remain as abstract as possible.

Based on this answer, you can write:
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True ),
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = F('job__pk') )
), distinct = True ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = F('job__pk') )
), distinct = True )
)
Table (after joins):
user.id order.id job.id job.service your case/when my case/when
1 1 1 wash 1 1
1 1 2 wash 1 2
1 1 3 clean NULL NULL
1 1 4 other NULL NULL
1 2 1 wash 1 1
1 2 2 wash 1 2
1 2 3 clean NULL NULL
1 2 4 other NULL NULL
Desired output for wash_ct is 2. Counting distinct values in my case/when, we will get 2.

I think this will work, the chained annotation of job might have produced duplicate users.
If not can you elaborate on duplicates you are seeing.
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)

Try adding values(), also when distinct=True you can combine Count()'s in one annotation().
Users.objects.values("id").annotate(
orders_ct = Count('orders', distinct = True)
).annotate(
clean_ct = Count(Case(When(job__service__exact='clean', then=1)),
distinct = True),
wash_ct = Count(Case(When(job__service__exact='wash',then=1)),
distinct = True)
).values("id", "username", "orders_ct", "clean_ct", "wash_сt")
Using values("id") should add GROUP BY 'id' for annotations and therefore prevent duplicates, see docs.
Also, there's Coalesce, but it doesn't look like it's needed, since Count() returns int anyway. And distinct, but again the distinct in Count() should be enough.
Not sure if Case needed inside Count() as it should count them anyway.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to improve the query to result the data in one row? - amazon-web-services

Related

Redshift Error when executing the delete script with EXISTS function. The Select runs fine for this query

SQL for nested WITH CLAUSE - RESULTS OFFSET in Oracle 19c

Converting Case statement, Filter from t-SQL query to DAX

Percentage of change with 2 data slicers in Power BI

Translating MySql query into Django ORM query

Categories

Resources