Compare Tables in BigQuery


How can I compare two tables (Table1 and Table2) and find all the new or changed entries in Table2?
In SQL Server I can use:
Select * from Table2
Except
Select * from Table1
Here is a sample of what I want:
Table1
A | 1
B | 2
C | 3
Table2
A | 1
B | 2
C | 2
D | 4
So, if I compare the two tables, I want my results to show the following:
C | 2
D | 4
I tried a few statements with no luck.

Now that I have your actual sample dataset, I can write a query that finds every domain in one table that is not in the other table:
https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1024 has 24,729,816 rows. https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1025 has 24,732,640 rows.
Let's look at everything in 1025 that is not in 1024:
SELECT a.domain
FROM [inbound-acolyte-377:demo.1025] a
LEFT OUTER JOIN EACH [inbound-acolyte-377:demo.1024] b
ON a.domain = b.domain
WHERE b.domain IS NULL
Result: 39,629 rows.
(8.1s elapsed, 2.04 GB processed)

To get the differences (given that tkey is your unique row identifier):
SELECT a.tkey, a.name, b.name
FROM [your.tableold] a
JOIN EACH [your.tablenew] b
ON a.tkey = b.tkey
WHERE a.name != b.name
LIMIT 100
For the new rows, one way is the one you proposed:
SELECT col1, col2
FROM Table2
WHERE col1 NOT IN
(SELECT col1 FROM Table1)
(You'll have to switch to a JOIN EACH when Table1 gets too large.)
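
As a small, self-contained check of the idea, the sketch below loads the sample rows from the question into an in-memory SQLite database and takes the set difference Table2 minus Table1. The column names name and val are hypothetical (the question does not name its columns), and SQLite only stands in for BigQuery here; BigQuery's standard SQL dialect spells the same operator EXCEPT DISTINCT.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Column names "name" and "val" are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (name TEXT, val INTEGER);
    CREATE TABLE Table2 (name TEXT, val INTEGER);
    INSERT INTO Table1 VALUES ('A', 1), ('B', 2), ('C', 3);
    INSERT INTO Table2 VALUES ('A', 1), ('B', 2), ('C', 2), ('D', 4);
""")

# Rows of Table2 with no identical row in Table1: new or changed entries.
diff_rows = conn.execute("""
    SELECT name, val FROM Table2
    EXCEPT
    SELECT name, val FROM Table1
    ORDER BY name
""").fetchall()

print(diff_rows)  # [('C', 2), ('D', 4)]
```

Note that the order of the two SELECTs matters: reversing them would return the rows of Table1 that are missing from Table2 instead.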

Related

Summarize a table based on a column in another table

I have two tables in a Power BI model:
Table A
value1
value2
value3
....
value 1000
Table B
value 1 | 10
value 2 | 10
value 1 | 50
value 3 | 10
value 1 | -10
value 2 | 70
Can I make a new column (or measure) in Table A that sums up the connected values?
Expected result:
value 1 | 50 --- (10+50-10)
value 2 | 80 --- (10+70)
value 3 | 10 --- (10)
Something like SUMIF in Excel, which I can drag down to all rows. Thanks in advance.
I tried CALCULATE, but I can't get it to work for all the different rows in Table A.

You don't need Table A for this. SUMMARIZE() will create a column of distinct values to group by. Use the following Calculated Table. Note that this is NOT a MEASURE!
Result =
SUMMARIZE(
    'Table B',
    'Table B'[value ID],
    "Sum", SUM('Table B'[number])
)
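
To make the grouping logic concrete, here is a minimal Python sketch of the same group-and-sum over the sample rows of Table B. It is only an illustration of what the calculated table computes, not DAX; the variable names are made up for the example.

```python
from collections import defaultdict

# Sample rows of Table B from the question: (value ID, number).
table_b = [
    ("value 1", 10),
    ("value 2", 10),
    ("value 1", 50),
    ("value 3", 10),
    ("value 1", -10),
    ("value 2", 70),
]

# Accumulate one running total per distinct value ID.
totals = defaultdict(int)
for value_id, number in table_b:
    totals[value_id] += number

summary = sorted(totals.items())
print(summary)  # [('value 1', 50), ('value 2', 80), ('value 3', 10)]
```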

Yes, this is possible. Once the relationship between the two tables has been created, add a new calculated column to Table A and write the DAX code for the sum there.

Use DAX to get data between 2 tables

I have a table 'tblA' with only one column, named 'Value':
Value
1
2
The second table, 'tblB', has several columns:
Col1 Col2
Test A
Dump B
How can I join them so that I get a new table like this, where each value in tblA is paired with every row in tblB?
Col1 Col2 Value
Test A 1
Dump B 1
Test A 2
Dump B 2
I also tried to use a for loop to take the values in tblA one by one, but it seems that DAX does not support loops.
Please advise.

Use this expression for a calculated table:
tblC = CROSSJOIN ( tblA, tblB )
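
A cross join pairs every row of one table with every row of the other, so no loop over tblA is needed. The Python sketch below illustrates that pairing on the sample data from the question; it is an analogy for what the calculated table contains, not DAX, and the variable names are made up for the example.

```python
from itertools import product

tbl_a = [1, 2]                            # the single 'Value' column
tbl_b = [("Test", "A"), ("Dump", "B")]    # rows of (Col1, Col2)

# Cartesian product: each value of tbl_a combined with each row of tbl_b.
tbl_c = [(col1, col2, value) for value, (col1, col2) in product(tbl_a, tbl_b)]

for row in tbl_c:
    print(row)
```

The result has len(tbl_a) * len(tbl_b) rows, which is worth keeping in mind before cross joining large tables.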

Query exhausted resources on this scale factor

I am trying to left join a very big table (52 million rows) to a massive table with 11,553,668,111 observations but just two columns.
A simple left join errors out with "Query exhausted resources at this scale factor."
-- create smaller table to save $$
CREATE TABLE targetsmart_idl_data_mi_pa_maid AS
SELECT t.idl, t.grouping_indicator, t.vb_voterbase_dob, t.vb_voterbase_gender,
       t.ts_tsmart_urbanicity, t.ts_tsmart_high_school_only_score,
       t.ts_tsmart_college_graduate_score, t.ts_tsmart_partisan_score,
       t.ts_tsmart_presidential_general_turnout_score, t.vb_voterbase_marital_status,
       t.vb_tsmart_census_id, t.vb_voterbase_deceased_flag,
       m.maid
FROM targetsmart_idl_data_pa_mi_pa t
LEFT JOIN idl_maid_base m
ON t.idl = m.idl

I was able to overcome the issue by making the large table the driving (left-hand) table.
For example:
select a.col1, col2 from small_table a join large_table b on a.col1 = b.col1
Here small_table has fewer than 1,000 records, whereas large_table has millions of records. The query above errors out.
Rewrite the query as:
select a.col1, col2 from large_table b join small_table a on a.col1 = b.col1
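
One plausible reason the order matters, assuming an engine that holds the right-hand table of a join in memory and streams the left-hand table past it (Presto-based engines are documented to work roughly this way): memory use then depends on the size of the right-hand table. The toy hash join below illustrates that build/probe split. It is a sketch of the general idea, not of any particular engine's implementation, and the function and parameter names are invented for the example.

```python
def hash_join(probe_rows, build_rows):
    """Inner-join two iterables of (key, payload) tuples on key."""
    # Build phase: the entire build side is held in memory,
    # so this side should be the smaller table.
    build_index = {}
    for key, payload in build_rows:
        build_index.setdefault(key, []).append(payload)

    # Probe phase: the other side is streamed one row at a time.
    for key, payload in probe_rows:
        for match in build_index.get(key, []):
            yield key, payload, match


small = [(1, "a"), (2, "b")]
large = [(k % 3, f"row{k}") for k in range(6)]

joined = list(hash_join(probe_rows=large, build_rows=small))
print(joined)
```

Swapping the two arguments yields the same matches but forces the large input into the in-memory index, which is the situation the rewritten query avoids.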

Inputting missing value in primary dataset based on values in secondary dataset and a matching condition

My understanding of SAS is very elementary, and I need help with the following.
I have a primary dataset A with 20,000 observations, where Col1 stores the CITY and Col2 stores the MILES. Col2 contains a lot of missing data, as shown below.
+----------------+---------------+
| Col1 | Col2 |
+----------------+---------------+
| Gary,IN | 242.34 |
+----------------+---------------+
| Lafayette,OH | . |
+----------------+---------------+
| Ames, IA | 123.19 |
+----------------+---------------+
| San Jose,CA | 212.55 |
+----------------+---------------+
| Schuaumburg,IL | . |
+----------------+---------------+
| Santa Cruz,CA | 454.44 |
+----------------+---------------+
I have a secondary dataset B with around 5,000 observations. It is very similar to dataset A: Col1 stores the CITY and Col2 stores the MILES. However, in dataset B, Col2 DOES NOT CONTAIN MISSING DATA.
+----------------+---------------+
| Col1 | Col2 |
+----------------+---------------+
| Lafayette,OH | 321.45 |
+----------------+---------------+
| San Jose,CA | 212.55 |
+----------------+---------------+
| Schuaumburg,IL | 176.34 |
+----------------+---------------+
| Santa Cruz,CA | 454.44 |
+----------------+---------------+
My goal is to fill in the missing miles in dataset A from the miles in dataset B by matching the city names in Col1.
In this example, I want to fill in 321.45 and, similarly, 176.34 in dataset A from dataset B by matching Col1 (city names) between the two datasets.
I need help doing this in SAS.

You just have to merge the two datasets. Note that the values of Col1 need to match exactly in the two datasets.
Also, I am assuming that Col1 is unique in dataset B. Otherwise you need to specify more precisely which value you want to use, or remove the duplicates (for example by adding nodupkey to the proc sort statement).
Here is an example of how to merge in SAS:
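/* MERGE with BY requires both datasets to be sorted by the key */
proc sort data=A;
  by Col1;
run;
proc sort data=B;
  by Col1;
run;
data AB;
  merge A(in=a) B(keep=Col1 Col2 rename=(Col2=Col2_new));
  by Col1;
  if a;                                    /* keep only observations present in A */
  if missing(Col2) then Col2 = Col2_new;   /* fill gaps from B */
  drop Col2_new;
run;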
This includes all observations and columns from dataset A. If Col2 is missing in A then we use the value from B.

Pekka's solution works perfectly; I am adding an alternative for the sake of completeness.
In SAS, PROC SQL sometimes lets you skip steps that a DATA step requires (with a corresponding saving in storage and computation time), and a MERGE is a typical example.
Here you can avoid sorting both input datasets and renaming variables (here the matching key has the same name, col1, in both datasets, but in general this is not the case).
proc sql;
create table want as
select A.col1,
coalesce(A.col2,B.col2) as col2
from A left join B
on A.col1=B.col1
order by A.col1;
quit;
The coalesce() function returns the first non-missing value in its argument list.
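
For readers without SAS at hand, the same left join with COALESCE can be tried in SQLite from Python. This is a sketch on a subset of the sample rows, with NULL standing in for the SAS missing value; it assumes, as the first answer does, that col1 is unique in B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 TEXT, col2 REAL);
    CREATE TABLE B (col1 TEXT, col2 REAL);
    INSERT INTO A VALUES
        ('Gary,IN', 242.34), ('Lafayette,OH', NULL),
        ('San Jose,CA', 212.55), ('Schuaumburg,IL', NULL);
    INSERT INTO B VALUES
        ('Lafayette,OH', 321.45), ('San Jose,CA', 212.55),
        ('Schuaumburg,IL', 176.34);
""")

# Keep every row of A; take col2 from A when present, otherwise from B.
want = conn.execute("""
    SELECT A.col1, COALESCE(A.col2, B.col2) AS col2
    FROM A LEFT JOIN B ON A.col1 = B.col1
    ORDER BY A.col1
""").fetchall()

for row in want:
    print(row)
```

Rows of A with no match in B (such as Gary,IN here) keep their original value, and would stay missing if they had none.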

Kettle Pentaho - Getting records from two table which does not match on common Key (Merge)

I have two tables in a transformation and I need the records that do not match on a common key, i.e. I am joining tables A and B
and need the records from table A that are not present in table B.
It would be helpful if someone could tell me which step to use in Kettle Spoon for this transformation.

You can achieve this with the Merge Join step. Under Join Type choose LEFT OUTER. After this step your results will look like this:
key_a | value_a | key_b | value_b
1 | 1 | null | null
2 | 2 | null | null
3 | 3 | 3 | 3
Then add a Filter rows step, with key_b as the field and IS NULL as the condition.
If you also need the records from table B whose key has no match in table A, choose FULL OUTER as the Join Type.
If both your tables are in a database of the same type, this can easily be achieved by using the Table input step and doing the join in the query itself:
SELECT table_a.key
, table_a.value
FROM table_a
LEFT JOIN table_b
ON table_a.key = table_b.key
WHERE table_b.key IS NULL
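
The LEFT JOIN plus IS NULL pattern can be checked on the three sample rows above with a small SQLite script. This is a sketch only: the column names key_a, value_a, key_b and value_b follow the sample output rather than any real schema, and SQLite stands in for whatever database the Table input step would query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key_a INTEGER, value_a INTEGER);
    CREATE TABLE table_b (key_b INTEGER, value_b INTEGER);
    INSERT INTO table_a VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO table_b VALUES (3, 3);
""")

# Anti-join: unmatched rows of table_a come back with NULLs on the
# table_b side, and the WHERE clause keeps only those rows.
only_in_a = conn.execute("""
    SELECT a.key_a, a.value_a
    FROM table_a AS a
    LEFT JOIN table_b AS b ON a.key_a = b.key_b
    WHERE b.key_b IS NULL
    ORDER BY a.key_a
""").fetchall()

print(only_in_a)  # [(1, 1), (2, 2)]
```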
