SELECT MAX PARTITION TABLE - google-cloud-platform

I have a table partitioned on date(transaction_time), and I have a
problem with a SELECT MAX query.
I'm trying to get the row with the highest timestamp when the result contains more than one row for the same ID.
Example of data:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 1 , Transaction_time = "2018-12-09 12:00:00"
3. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
4. ID = 2 , Transaction_time = "2018-12-09 12:00:00"
Result that I want:
1. ID = 1 , Transaction_time = "2018-12-10 12:00:00"
2. ID = 2 , Transaction_time = "2018-12-10 12:00:00"
This is my query
SELECT ID, TRANSACTION_TIME FROM `table1` AS T1
WHERE TRANSACTION_TIME = (SELECT MAX(TRANSACTION_TIME)
FROM `table1` AS T2
WHERE T2.ID = T1.ID )
The error I receive:
Error: Cannot query over table 'table1' without a filter over
column(s) 'TRANSACTION_TIME' that can be used for partition
elimination

It looks like BigQuery does not like the correlated subquery in the WHERE clause. I don't know how to fix your current approach, but you might be able to just use ROW_NUMBER here:
SELECT t.ID, t.TRANSACTION_TIME
FROM
(
SELECT ID, TRANSACTION_TIME,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TRANSACTION_TIME DESC) rn
FROM table1
) t
WHERE rn = 1;

It can be done this way:
SELECT id, MAX(transaction_time) FROM `table1` GROUP BY id;
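Note that the error message asks for a filter on TRANSACTION_TIME, which usually means the table requires a partition filter, so the query probably also needs a WHERE condition on that column. A minimal sketch of the idea in Python, assuming a cutoff date of 2018-12-01 is acceptable (the cutoff is an assumption, not part of the question) and using an in-memory SQLite database as a local stand-in for BigQuery:

```python
import sqlite3

# SQLite stands in for BigQuery here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, transaction_time TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2018-12-10 12:00:00"),
        (1, "2018-12-09 12:00:00"),
        (2, "2018-12-10 12:00:00"),
        (2, "2018-12-09 12:00:00"),
    ],
)

# Latest transaction per id, restricted to an assumed date range on the
# partitioning column (the '2018-12-01' cutoff is hypothetical).
latest_per_id = conn.execute(
    """
    SELECT id, MAX(transaction_time) AS transaction_time
    FROM table1
    WHERE DATE(transaction_time) >= '2018-12-01'
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(latest_per_id)
```

In BigQuery the same shape would apply to `table1`, with the filter written against the partitioning column. Whether a particular filter expression is accepted for partition elimination should be checked against the BigQuery documentation.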

Related

BigQuery compare all the columns (100+) from two rows in a single table

I have an input table as below:
id | col1 | col2 | time
01 | abc  | 001  | 12:00
01 | def  | 002  | 12:10
Required output table:
id | col1 | col2 | time  | diff_field
01 | abc  | 001  | 12:00 | null
01 | def  | 002  | 12:10 | col1,col2
I need to compare the two rows, find all the columns whose values differ, and keep those column names in a new column diff_field.
I need an optimized solution for this, as my table has more than 100 columns (all of which need to be compared).

You might consider the approach below:
WITH sample_table AS (
SELECT '01' id, 'abc' col1, '001' col2, '12:00' time UNION ALL
SELECT '01' id, 'def' col1, '002' col2, '12:10' time UNION ALL
SELECT '01' id, 'def' col1, '002' col2, '12:20' time UNION ALL
SELECT '01' id, 'ddf' col1, '002' col2, '12:30' time
)
SELECT * EXCEPT(curr, prev),
(SELECT STRING_AGG('col' || offset)
FROM UNNEST(SPLIT(curr)) c WITH offset
JOIN UNNEST(SPLIT(prev)) p WITH offset USING (offset)
WHERE c <> p AND offset < ARRAY_LENGTH(SPLIT(curr)) - 1
) diff_field
FROM (
SELECT *, FORMAT('%t', t) AS curr, LAG(FORMAT('%t', t)) OVER w AS prev
FROM sample_table t
WINDOW w AS (PARTITION BY id ORDER BY time)
);

The approach below has no dependency on the actual column names or on any naming convention, other than id and time:
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
select t.*,
( select string_agg(col)
from unnest(extract_keys(cur)) as col with offset
join unnest(extract_values(cur)) as cur_val with offset using(offset)
join unnest(extract_values(prev)) as prev_val with offset using(offset)
where cur_val != prev_val and col != 'time'
) as diff_field
from (
select t, to_json_string(t) cur, to_json_string(ifnull(lag(t) over(win), t)) prev
from your_table t
window win as (partition by id order by time)
)
This can be applied to the sample data in your question, or rather to the extended version of it that I borrowed from Jaytiger's answer.
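For readers who want to check the expected diff_field values outside BigQuery, the comparison logic can be sketched in plain Python. This only illustrates the row-to-previous-row comparison and is not a replacement for the queries above; the function name add_diff_field and its parameters are hypothetical, and rows are assumed to be dictionaries that all share the same keys.

```python
from itertools import groupby
from operator import itemgetter


def add_diff_field(rows, key="id", order="time"):
    """Return copies of rows with a diff_field naming the columns that
    changed compared with the previous row of the same key."""
    result = []
    ordered = sorted(rows, key=itemgetter(key, order))
    for _, group in groupby(ordered, key=itemgetter(key)):
        prev = None
        for row in group:
            changed = []
            if prev is not None:
                # Compare every column except the grouping and ordering ones.
                changed = [
                    col for col in row
                    if col not in (key, order) and row[col] != prev[col]
                ]
            # None plays the role of SQL NULL when nothing changed.
            result.append({**row, "diff_field": ",".join(changed) or None})
            prev = row
    return result


sample_table = [
    {"id": "01", "col1": "abc", "col2": "001", "time": "12:00"},
    {"id": "01", "col1": "def", "col2": "002", "time": "12:10"},
    {"id": "01", "col1": "def", "col2": "002", "time": "12:20"},
    {"id": "01", "col1": "ddf", "col2": "002", "time": "12:30"},
]

diffed = add_diff_field(sample_table)
for row in diffed:
    print(row)
```

Because the column names are read from each row, nothing has to be listed by hand, which is the same property the JSON-based query relies on.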

PowerBi Dax subquery with filter

Any help on this is greatly appreciated.
I'm looking to count the rows in a table where the PersonID is in a subquery list with a filter condition. In SQL it would look like this.
select count(*)
from tableA
where PersonId in(select distinct PersonId from tableA where CallResult = 1)
tableA has the same PersonId multiple times for each day and I'm looking to count how many times that PersonId is in the table but only if the PersonId has CallResult = 1 in any row within the table. There are other PersonId that don't have CallResult = 1 and I'm not looking to count those.
Maybe I'm overthinking this one, but DAX isn't my strength.
PersonId | CallResult | CallNumber
AB12     | 1          | 3
AB12     | 0          | 2
AB12     | 0          | 1
CD21     | 0          | 2
CD21     | 0          | 1
EF32     | 1          | 2
EF32     | 0          | 1
In this example I would expect the subquery to return AB12 and EF32, and the count to be 5 (3+2) calls.

I may have figured it out. Posting here in case it helps anyone.
Count = CALCULATE(
SUM(TableA[Calls]),
CALCULATETABLE(
SUMMARIZE(TableA,TableA[PersonID]),
TableA[CallResult]=1
)
)
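To sanity-check the expected result of 5 against the sample data in the question, the same two-step logic (find the PersonIds that have CallResult = 1 in any row, then count all of their rows) can be sketched in Python. The variable names are illustrative only.

```python
# (PersonId, CallResult, CallNumber) rows from the question's sample table.
calls = [
    ("AB12", 1, 3),
    ("AB12", 0, 2),
    ("AB12", 0, 1),
    ("CD21", 0, 2),
    ("CD21", 0, 1),
    ("EF32", 1, 2),
    ("EF32", 0, 1),
]

# Step 1: the "subquery" - distinct PersonIds with CallResult = 1 in any row.
qualifying_ids = {person_id for person_id, call_result, _ in calls if call_result == 1}

# Step 2: count every row that belongs to one of those PersonIds.
row_count = sum(1 for person_id, _, _ in calls if person_id in qualifying_ids)

print(sorted(qualifying_ids), row_count)
```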

How to do the count

I have a query that gives me the count of records. In addition, I need to know at which level 'vinod' appears (second, third, or fourth level). This information is needed in the label.
Select 'Open' label, count(*) value from DATA
WHERE SECOND_LEVEL = 'vinod' or THIRD_LEVEL = 'vinod' or FOURTH_LEVEL = 'vinod'

See whether this helps; lvl represents your second to fourth level.
with temp as
(select 2 lvl, count(*) cnt
from data
where second_level = 'vinod'
union all
select 3 lvl, count(*) cnt
from data
where third_level = 'vinod'
union all
select 4 lvl, count(*) cnt
from data
where fourth_level = 'vinod'
)
select lvl, sum(cnt) sum_cnt
from temp
group by lvl;
Or this (which might perform better, as the above example queries the same table 3 times):
select case when second_level = 'vinod' then 2
when third_level = 'vinod' then 3
when fourth_level = 'vinod' then 4
end lvl,
count(*) cnt
from data
where 'vinod' in (second_level, third_level, fourth_level)
group by case when second_level = 'vinod' then 2
when third_level = 'vinod' then 3
when fourth_level = 'vinod' then 4
end;
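Since the question asks for the level to appear in the label, the CASE-based query can be wrapped so that it returns a label and a value. A small sketch run through Python's sqlite3 module; the sample rows are made up for illustration, SQLite stands in for the original database, and the label text 'Open - level N' is an assumption about the desired format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "data" (second_level TEXT, third_level TEXT, fourth_level TEXT)'
)
# Hypothetical rows, only to exercise the query.
conn.executemany(
    'INSERT INTO "data" VALUES (?, ?, ?)',
    [
        ("vinod", "a", "b"),
        ("x", "vinod", "b"),
        ("x", "vinod", "c"),
        ("x", "y", "vinod"),
        ("x", "y", "z"),
    ],
)

# The inner query assigns a level to each matching row; the outer query
# turns the level into a label and counts the rows per level.
labelled_counts = conn.execute(
    """
    SELECT 'Open - level ' || lvl AS label, COUNT(*) AS value
    FROM (
        SELECT CASE WHEN second_level = 'vinod' THEN 2
                    WHEN third_level  = 'vinod' THEN 3
                    WHEN fourth_level = 'vinod' THEN 4
               END AS lvl
        FROM "data"
        WHERE 'vinod' IN (second_level, third_level, fourth_level)
    )
    GROUP BY lvl
    ORDER BY lvl
    """
).fetchall()

print(labelled_counts)
```

Computing the CASE expression once in a subquery avoids repeating it in the GROUP BY clause.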

SAS: merge/ join datasets by dynamic columns in the look up table

I want to join a SAS dataset with a lookup table, but the column (key) to join on is stored as a value in the lookup table.
Dataset: table4
ID  lev1   lev2   lev3   lev4   lev5
1   12548  14589  85652  45896  45889
2   12548  14589  85652  45896  45890
3   12548  14547  85685  45845  45825
4   66588  24647  55255  30895  15764
Look up table:
context  table_name  column  operator  value
extract  table1      col1    equals    xyd
asset    table2      var1    equals    11111
asset    table2      var2    equals    25858
prod     table3      x1      equals    87999
unprod   table4      lev2    equals    14589
unprod   table4      lev2    equals    14589
unprod   table4      lev3    equals    55255
Now I want to join table4 with the lookup table, but at the moment that is only possible on the fields lev2 and lev3. The fields are dynamic and could change in the future, so I don't want to hardcode them.
I have tried the code below, but I don't want to hardcode the fields because they are dynamic (someone might add lev4 in the future).
proc sql ;
create table want as
select ID
from table4 as a
inner join lookup as b
on a.lev2 = input(value,12.) or a.lev3=input(value,12.)
where Context="unprod";
quit;
Thanks heaps in advance.

That does not look like a lookup table. It appears to be a set of rules. You could use it to generate code. Let's simplify the process by making the table contain actual code instead of three columns. But you could easily write the code to convert from your current format into code strings.
data rules ;
infile cards truncover ;
input context $ table_name $ rule $100. ;
cards;
extract table1 col1 = xyd
asset table2 var1 = 11111
asset table2 var2 = 25858
prod table3 x1 = 87999
unprod table4 lev2 = 14589
unprod table4 lev2 = 14589
unprod table4 lev3 = 55255
;
So now it looks like you want to take the rules that have a specific value of CONTEXT and use that to generate a new dataset from the dataset named in TABLE_NAME. Not sure what name you want to use for the generated table or what you want to do when more than one table is mentioned in the same "context".
%let context=unprod ;
filename code temp;
data _null_;
set rules ;
where context=symget('context');
by table_name ;
file code ;
if first.table_name then table_no+1;
if first.table_name then put
'data want' table_no ';'
/ ' set ' table_name ';'
/ ' where 1=0'
;
put ' or (' rule ')' ;
if last.table_name then put
';'
/ 'run;'
;
run;
%include code / source2 ;
Which results in code like this:
130 +data want1 ;
131 + set table4 ;
132 + where 1=0
133 + or (lev2 = 14589 )
134 + or (lev2 = 14589 )
135 + or (lev3 = 55255 )
136 +;
137 +run;
NOTE: There were 3 observations read from the data set WORK.TABLE4.
WHERE (lev2=14589) or (lev3=55255);
NOTE: The data set WORK.WANT1 has 3 observations and 6 variables.

Here is some sample code that does what I understood you are trying to do. It is based on the comment by @Reeza. If this is not what you are trying to do, please post a sample of the expected output.
data table4;
input ID $ lev1 $ lev2 $ lev3 $ lev4 $ lev5 $;
datalines;
1 12548 14589 85652 45896 45889
2 12548 14589 85652 45896 45890
3 12548 14547 85685 45845 45825
4 66588 24647 55255 30895 15764
;
run;
data look_up;
input context $ table_name $ column $ operator $ value $;
datalines;
extract table1 col1 equals xyd
asset table2 var1 equals 11111
asset table2 var2 equals 25858
prod table3 x1 equals 87999
unprod table4 lev2 equals 14589
unprod table4 lev2 equals 14589
unprod table4 lev3 equals 55255
;
run;
PROC transpose DATA=work.table4 out=temp1 prefix=value;
by ID;
VAR lev1-lev5;
run;
proc sql;
create table want as
select a.*, b.ID
from look_up as a
inner join temp1 as b
on a.value=b.value1 and a.column=_Name_;
quit;
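The idea shared by both answers, reading the column names from the rules instead of hardcoding them, can be sketched outside SAS as well. The following Python snippet is only an illustration under stated assumptions: the function name matching_ids is hypothetical, only the "equals" operator is handled, and values are compared as strings.

```python
table4 = [
    {"ID": 1, "lev1": "12548", "lev2": "14589", "lev3": "85652", "lev4": "45896", "lev5": "45889"},
    {"ID": 2, "lev1": "12548", "lev2": "14589", "lev3": "85652", "lev4": "45896", "lev5": "45890"},
    {"ID": 3, "lev1": "12548", "lev2": "14547", "lev3": "85685", "lev4": "45845", "lev5": "45825"},
    {"ID": 4, "lev1": "66588", "lev2": "24647", "lev3": "55255", "lev4": "30895", "lev5": "15764"},
]

# (context, table_name, column, operator, value) rows from the lookup table.
lookup = [
    ("extract", "table1", "col1", "equals", "xyd"),
    ("asset", "table2", "var1", "equals", "11111"),
    ("asset", "table2", "var2", "equals", "25858"),
    ("prod", "table3", "x1", "equals", "87999"),
    ("unprod", "table4", "lev2", "equals", "14589"),
    ("unprod", "table4", "lev2", "equals", "14589"),
    ("unprod", "table4", "lev3", "equals", "55255"),
]


def matching_ids(rows, rules, context, table_name):
    """Return the IDs of rows that satisfy at least one rule for the given
    context and table; the columns to test come from the rules themselves."""
    wanted = {
        (column, value)
        for ctx, tbl, column, operator, value in rules
        if ctx == context and tbl == table_name and operator == "equals"
    }
    return [
        row["ID"]
        for row in rows
        if any(row.get(column) == value for column, value in wanted)
    ]


ids = matching_ids(table4, lookup, "unprod", "table4")
print(ids)
```

Adding a rule for lev4 to the lookup data would be picked up without touching the function, which is the behaviour the question asks for.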

Extract string from a large string oracle regexp

I have a string as below.
select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on a.col = b.col and a.col = b.col
inner join (select col1, col2, col3,col4 from tablename ) c on a.col1=b.col2
where
a.col = 'value'
The output needs to be table1, table2 and tablename from the above string. Please let me know the regex to get this result.

Should be a simple one :-)
SQL> WITH DATA AS(
select q'[select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on
a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename )
c on a.col1=b.col2 where a.col = 'value']' str
FROM DUAL)
SELECT LISTAGG(TABLE_NAMES, ' , ') WITHIN GROUP (
ORDER BY val) table_names
FROM
(SELECT 1 val,
regexp_substr(str,'table[[:alnum:]]+',1,level) table_names
FROM DATA
CONNECT BY level <= regexp_count(str,'table')
)
/
TABLE_NAMES
--------------------------------------------------------------------------------
table1 , table2 , tablename
SQL>
Brief explanation, so that the OP or others might find it useful:
REGEXP_SUBSTR looks for words that start with 'table' followed by one or more letters or digits, such as 1, 2 or name.
To find all such words, I used the CONNECT BY LEVEL technique, but it returns the matches in separate rows.
Finally, to put them in a single row as comma-separated values, I used LISTAGG.
Oh yes, and q'[]' is the alternative quoting syntax for string literals.
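The pattern above only finds names that begin with 'table'. For comparison, here is a small Python sketch that shows both that prefix-based pattern and a slightly more general one that captures the identifier following FROM or JOIN. The second pattern is an assumption-laden illustration rather than a SQL parser: it assumes plain, unquoted names without schema prefixes and no comma-separated table lists.

```python
import re

sql_text = (
    "select b.col1,a.col2,lower(a.col3) from table1 a inner join table2 b on "
    "a.col = b.col and a.col = b.col inner join (select col1, col2, col3,col4 from tablename ) "
    "c on a.col1=b.col2 where a.col = 'value'"
)

# Same idea as the Oracle pattern: 'table' followed by one or more letters or digits.
prefix_matches = re.findall(r"table[0-9A-Za-z]+", sql_text)

# More general: the identifier right after FROM or JOIN. A subquery such as
# "join (select ..." is skipped because '(' cannot start an identifier.
after_keyword = re.findall(
    r"\b(?:from|join)\s+([A-Za-z_]\w*)", sql_text, flags=re.IGNORECASE
)

print(", ".join(prefix_matches))
print(", ".join(after_keyword))
```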
