Django aggregate/annotate with additional join - django

I have two model classes: Class A which represents table_a and Class B which represents table_b. Is there a way to aggregate or annotate Class A to include the value of a field in Class B other than the column that is aggregated on?
The SQL query that would accomplish what I need is as follows:
SELECT
a.*,
b2.col2
FROM table_a AS a
LEFT JOIN (SELECT
max(col1) AS m,
a_id
FROM table_b
GROUP BY a_id) AS b1 ON b1.a_id = a.id
LEFT JOIN table_b AS b2 ON b1.a_id = b2.a_id AND b2.col1 = b1.m
But the aggregate function only returns the data that would be retrieved with this:
SELECT
a.*,
b1.m
FROM table_a AS a
LEFT JOIN (SELECT
max(col1) AS m,
a_id
FROM table_b
GROUP BY a_id) AS b1 ON b1.a_id = a.id
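One way to get col2 from the row with the largest col1 is a correlated subquery instead of the double join. The sketch below runs both shapes against an in-memory SQLite database and checks that they agree; the name column and the sample rows are made up for illustration. In the Django ORM, this shape roughly corresponds to annotating A with a Subquery over B filtered by OuterRef('pk'), ordered by -col1 and sliced to one col2 value, though that is an assumption about your models rather than something tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);  -- name is hypothetical
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER, col1 INTEGER, col2 TEXT);
    INSERT INTO table_a VALUES (1, 'first'), (2, 'second'), (3, 'no children');
    INSERT INTO table_b (a_id, col1, col2) VALUES
        (1, 10, 'low'), (1, 30, 'high'), (2, 5, 'only');
""")

# Double join from the question: find max(col1) per a_id, then join back for col2.
join_rows = conn.execute("""
    SELECT a.id, a.name, b2.col2
    FROM table_a AS a
    LEFT JOIN (SELECT max(col1) AS m, a_id
               FROM table_b
               GROUP BY a_id) AS b1 ON b1.a_id = a.id
    LEFT JOIN table_b AS b2 ON b1.a_id = b2.a_id AND b2.col1 = b1.m
    ORDER BY a.id
""").fetchall()

# Correlated subquery: pick col2 from the row with the largest col1 for each a.id.
subquery_rows = conn.execute("""
    SELECT a.id, a.name,
           (SELECT b.col2
            FROM table_b AS b
            WHERE b.a_id = a.id
            ORDER BY b.col1 DESC
            LIMIT 1) AS col2
    FROM table_a AS a
    ORDER BY a.id
""").fetchall()

print(subquery_rows)
```

Note that the two shapes can differ when several table_b rows tie on the maximum col1: the join returns one row per tie, while the subquery returns a single value.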

Related

BigQuery: compare all the columns (100+) from two rows in a single table

I have an input table as below:
id  col1  col2  time
01  abc   001   12:00
01  def   002   12:10
Required output table:
id  col1  col2  time   diff_field
01  abc   001   12:00  null
01  def   002   12:10  col1,col2
I need to compare the two rows, find every column whose value differs, and store those column names in a new column, diff_field.
I need an optimized solution, as my table has more than 100 columns (all of which need to be compared).
You might consider the approach below:
WITH sample_table AS (
SELECT '01' id, 'abc' col1, '001' col2, '12:00' time UNION ALL
SELECT '01' id, 'def' col1, '002' col2, '12:10' time UNION ALL
SELECT '01' id, 'def' col1, '002' col2, '12:20' time UNION ALL
SELECT '01' id, 'ddf' col1, '002' col2, '12:30' time
)
SELECT * EXCEPT(curr, prev),
(SELECT STRING_AGG('col' || offset)
FROM UNNEST(SPLIT(curr)) c WITH offset
JOIN UNNEST(SPLIT(prev)) p WITH offset USING (offset)
WHERE c <> p AND offset < ARRAY_LENGTH(SPLIT(curr)) - 1
) diff_field
FROM (
SELECT *, FORMAT('%t', t) AS curr, LAG(FORMAT('%t', t)) OVER w AS prev
FROM sample_table t
WINDOW w AS (PARTITION BY id ORDER BY time)
);
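To make the comparison logic easier to follow outside of BigQuery, here is a minimal Python sketch of the same idea: order the rows per id by time, compare each row with the previous one, and collect the names of the columns that changed. The function name add_diff_field and its parameters are made up for illustration; this is not a replacement for the SQL, just a way to check what diff_field should contain.

```python
from itertools import groupby
from operator import itemgetter

# Extended sample data, matching the sample_table CTE in the query.
rows = [
    {"id": "01", "col1": "abc", "col2": "001", "time": "12:00"},
    {"id": "01", "col1": "def", "col2": "002", "time": "12:10"},
    {"id": "01", "col1": "def", "col2": "002", "time": "12:20"},
    {"id": "01", "col1": "ddf", "col2": "002", "time": "12:30"},
]


def add_diff_field(rows, key="id", order="time"):
    """Return copies of rows with a diff_field naming columns that differ from the previous row."""
    out = []
    ordered = sorted(rows, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        prev = None
        for row in group:
            changed = None  # first row of each id has nothing to compare against
            if prev is not None:
                names = [c for c in row
                         if c not in (key, order) and row[c] != prev[c]]
                changed = ",".join(names) or None  # None when nothing changed
            out.append({**row, "diff_field": changed})
            prev = row
    return out


result = add_diff_field(rows)
for r in result:
    print(r["time"], r["diff_field"])
```

The key and order columns are skipped on purpose, since time always differs between consecutive rows.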
The approach below does not depend on the actual column names or on any naming convention, other than the id and time columns:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
select t.*,
( select string_agg(col)
from unnest(extract_keys(cur)) as col with offset
join unnest(extract_values(cur)) as cur_val with offset using(offset)
join unnest(extract_values(prev)) as prev_val with offset using(offset)
where cur_val != prev_val and col != 'time'
) as diff_field
from (
select t, to_json_string(t) cur, to_json_string(ifnull(lag(t) over(win), t)) prev
from your_table t
window win as (partition by id order by time)
)
If applied to the sample data in your question (or rather, the extended version of it that I borrowed from Jaytiger's answer), the output is

SELECT MAX PARTITION TABLE

I have a table partitioned on date(transaction_time), and I have a problem with a SELECT MAX.
I'm trying to get the row with the highest timestamp when the result contains more than one row for the same ID.
Example of data:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 1 , Transaction_time = "2018-12-09 12:00:00"
3. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
4. ID = 2 , Transaction_time = "2018-12-09 12:00:00"
Result that I want:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
This is my query
SELECT ID, TRANSACTION_TIME FROM `table1` AS T1
WHERE TRANSACTION_TIME = (SELECT MAX(TRANSACTION_TIME)
FROM `table1` AS T2
WHERE T2.ID = T1.ID )
The error I receive:
Error: Cannot query over table 'table1' without a filter over
column(s) 'TRANSACTION_TIME' that can be used for partition
elimination
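It looks like BigQuery does not accept the correlated subquery in the WHERE clause. I don't know how to fix your current approach, but you might be able to just use ROW_NUMBER here (the error message suggests that a filter on TRANSACTION_TIME may still be needed for partition elimination):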
SELECT t.ID, t.TRANSACTION_TIME
FROM
(
SELECT ID, TRANSACTION_TIME,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TRANSACTION_TIME DESC) rn
FROM table1
) t
WHERE rn = 1;
If only the ID and its latest timestamp are needed, this can also be done with a GROUP BY:
SELECT id, MAX(transaction_time) FROM `table1` GROUP BY id;
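As a quick sanity check, the sketch below runs both the ROW_NUMBER and the GROUP BY variants against the sample data in an in-memory SQLite database (window functions need SQLite 3.25 or later). SQLite has no partitioned tables, so the WHERE TRANSACTION_TIME >= '2018-12-01' filter is only a stand-in for the kind of partition filter the BigQuery error asks for; the cutoff date is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, TRANSACTION_TIME TEXT);
    INSERT INTO table1 VALUES
        (1, '2018-12-10 12:00:00'), (1, '2018-12-09 12:00:00'),
        (2, '2018-12-10 12:00:00'), (2, '2018-12-09 12:00:00');
""")

# Variant 1: rank rows per ID by timestamp and keep the newest one.
ranked = conn.execute("""
    SELECT ID, TRANSACTION_TIME
    FROM (
        SELECT ID, TRANSACTION_TIME,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TRANSACTION_TIME DESC) AS rn
        FROM table1
        WHERE TRANSACTION_TIME >= '2018-12-01'  -- hypothetical partition filter
    )
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

# Variant 2: plain aggregation, enough when only ID and the max timestamp are needed.
grouped = conn.execute("""
    SELECT ID, MAX(TRANSACTION_TIME)
    FROM table1
    WHERE TRANSACTION_TIME >= '2018-12-01'  -- hypothetical partition filter
    GROUP BY ID
    ORDER BY ID
""").fetchall()

print(ranked)
print(grouped)
```

The ROW_NUMBER form is the one to reach for when other columns from the newest row are needed as well; the GROUP BY form only returns the grouped and aggregated columns.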

SAS: Map a column from one table to any of multiple columns in another table

I have a table1 that contains four different kinds of IDs:
Data table1;
Input id1 $ id2 $ id3 $ final_id $;
Datalines;
1 a a1 p
2 b b2 q
- c c2 r
3 d - s
4 - d4 t
;
run;
A second table, table2, contains an id that matches any one of id1, id2 or id3 from table1:
Data table2;
Input id $ col1 $ col2 $;
Datalines;
1 gsh ywu
b hsjs kall
c2 jsjs ywe
3 sja weei
d4 ase uwh
;
run;
I want to left join table1 on table2 such that I get a new column in table2 giving me final_id from table1.
How do I go about this problem?
Please help.
Thank you.
You can do it using SQL:
proc SQL noprint;
create table merged as
select b.final_id, a.*
from table2 as a left join table1 as b
on (a.id eq b.id1 or a.id eq b.id2 or a.id eq b.id3)
;
quit;
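The same OR-based join condition can be tried outside SAS. The sketch below loads the sample data into an in-memory SQLite database and runs an equivalent left join; it assumes the "-" entries in table1 are meant as missing values, so they are stored as NULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id1 TEXT, id2 TEXT, id3 TEXT, final_id TEXT);
    CREATE TABLE table2 (id TEXT, col1 TEXT, col2 TEXT);
    -- '-' in the question is treated as missing (NULL): an assumption.
    INSERT INTO table1 VALUES
        ('1', 'a', 'a1', 'p'), ('2', 'b', 'b2', 'q'), (NULL, 'c', 'c2', 'r'),
        ('3', 'd', NULL, 's'), ('4', NULL, 'd4', 't');
    INSERT INTO table2 VALUES
        ('1', 'gsh', 'ywu'), ('b', 'hsjs', 'kall'), ('c2', 'jsjs', 'ywe'),
        ('3', 'sja', 'weei'), ('d4', 'ase', 'uwh');
""")

# Left join table2 to table1, matching id against any of the three id columns.
merged = conn.execute("""
    SELECT b.final_id, a.id, a.col1, a.col2
    FROM table2 AS a
    LEFT JOIN table1 AS b
      ON a.id = b.id1 OR a.id = b.id2 OR a.id = b.id3
    ORDER BY a.rowid
""").fetchall()

for row in merged:
    print(row)
```

If an id could match more than one row of table1 (for example the same value appearing as id1 in one row and id2 in another), this join returns one output row per match, so it may be worth checking for duplicates afterwards.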

I have two tables emp and dept and I want to update the salary in emp table to increase by 10000 when the department name is "Software Engineer"

I have two tables, emp and dept, and I want to update the salary in the emp table to increase by 10000 when the department name is "Software Engineer". The emp table does not have the department name.
I have tried this query :
update emp
set salary = salary + 10000
where exists (select d.depatment_name, e.salary
from emp e
join department d on e.dep_id = d.department_id
where dep_name = 'Software Engineer');
select * from emp;
But it's updating the salary for all rows.
I think the subquery is the problem rather than the join: it is not tied to the row being updated, so EXISTS is true for every row of emp as soon as any employee belongs to 'Software Engineer'. Try correlating the subquery with the outer emp table instead. Also, dep_name needed the "d." table alias.
update emp
set salary = salary + 10000
where exists (select 1
from department d
where d.department_id = emp.dep_id
and d.dep_name = 'Software Engineer');
select * from emp;
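The difference between an uncorrelated and a correlated EXISTS is easy to check with a small script. The sketch below uses an in-memory SQLite database with made-up rows (the emp_id column and the sample values are assumptions) and applies each form of the UPDATE to a fresh copy of the data.

```python
import sqlite3


def make_db():
    """Build a fresh in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (department_id INTEGER PRIMARY KEY, dep_name TEXT);
        CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dep_id INTEGER, salary INTEGER);
        INSERT INTO department VALUES (1, 'Software Engineer'), (2, 'Sales');
        INSERT INTO emp VALUES (1, 1, 50000), (2, 2, 40000), (3, 1, 60000);
    """)
    return conn


# Uncorrelated EXISTS: the subquery never refers to the row being updated,
# so it is true for every row and all salaries are raised.
uncorrelated = make_db()
uncorrelated.execute("""
    UPDATE emp SET salary = salary + 10000
    WHERE EXISTS (SELECT 1
                  FROM emp e
                  JOIN department d ON e.dep_id = d.department_id
                  WHERE d.dep_name = 'Software Engineer')
""")
all_raised = uncorrelated.execute(
    "SELECT emp_id, salary FROM emp ORDER BY emp_id").fetchall()

# Correlated EXISTS: the subquery checks the department of the current emp row.
correlated = make_db()
correlated.execute("""
    UPDATE emp SET salary = salary + 10000
    WHERE EXISTS (SELECT 1
                  FROM department d
                  WHERE d.department_id = emp.dep_id
                    AND d.dep_name = 'Software Engineer')
""")
only_matching = correlated.execute(
    "SELECT emp_id, salary FROM emp ORDER BY emp_id").fetchall()

print(all_raised)
print(only_matching)
```

In the second run only employees 1 and 3 get the raise, while the Sales employee keeps the original salary.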

SAS Proc SQL, combine where, left join and case

I have three pieces of code. How can I combine them into one so that they look elegant? data1: pull data with some conditions; data2: data1 left joined with new data; data3: set data2 and create a new variable.
proc sql; create table data1 as select
a.ID,
b.decison_CD,
c.type
from
dataA a,
dataB b,
dataC c
where a.ID=b.ID
and a.ID=c.ID
and c.type not in ('Unknown')
and b.decison_CD in ('Y','N')
; quit;
proc sql;
create table data2 as select
a.*
,b.payId
from data1 a
left join datanew b
on a.ID=b.ID;
quit;
data data3;
set data2;
if payID= . then booked =0;
else if payID=1 then booked=1;
run;
It looks like you can do this in one query by treating datanew as a fourth dataset and left joining it. Keep inner joins for dataB and dataC, so that the filters behave like the original WHERE clause and rows without a match are dropped:
proc sql;
create table data1 as select
a.ID, b.decison_CD, c.type, d.payId,
case when missing(d.payId) then 0
when d.payId = 1 then 1 end as booked
from dataA as a
inner join dataB (where = (decison_CD in ('Y','N'))) as b on a.ID = b.ID
inner join dataC (where = (type not in ('Unknown'))) as c on a.ID = c.ID
left join datanew as d on a.ID = d.ID;
quit;
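To see how the combined query behaves, here is a rough equivalent run against an in-memory SQLite database. The sample rows are invented, and the sketch uses inner joins for dataB and dataC so that it mirrors the original comma-join with WHERE conditions; only datanew is left joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataA (ID INTEGER);
    CREATE TABLE dataB (ID INTEGER, decison_CD TEXT);
    CREATE TABLE dataC (ID INTEGER, type TEXT);
    CREATE TABLE datanew (ID INTEGER, payId INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO dataA VALUES (1), (2), (3), (4);
    INSERT INTO dataB VALUES (1, 'Y'), (2, 'N'), (3, 'X'), (4, 'Y');
    INSERT INTO dataC VALUES (1, 'Retail'), (2, 'Retail'), (3, 'Retail'), (4, 'Unknown');
    INSERT INTO datanew VALUES (1, 1);
""")

# One query: filter dataB/dataC via inner joins, left join datanew,
# and derive booked with a single CASE expression.
combined = conn.execute("""
    SELECT a.ID, b.decison_CD, c.type, d.payId,
           CASE WHEN d.payId IS NULL THEN 0
                WHEN d.payId = 1 THEN 1
           END AS booked
    FROM dataA AS a
    JOIN dataB AS b ON a.ID = b.ID AND b.decison_CD IN ('Y', 'N')
    JOIN dataC AS c ON a.ID = c.ID AND c.type NOT IN ('Unknown')
    LEFT JOIN datanew AS d ON a.ID = d.ID
    ORDER BY a.ID
""").fetchall()

for row in combined:
    print(row)
```

ID 3 is dropped because its decision code is not Y or N, and ID 4 is dropped because its type is 'Unknown'; ID 2 has no row in datanew, so payId is missing and booked is 0.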