How to perform Left Anti Join in AWS Athena DB? or any alternative to it? - amazon-athena

How can I do a left anti join in AWS Athena? I have searched online and found nothing helpful. Any alternative solution would also be appreciated.
I have two tables, emp and dept, and I want to left anti join them on the columns "emp_dept_id" and "dept_id".
I need a query that works in Athena.

Here is a left anti-join query per your request:
SELECT e.*
FROM Emp e
LEFT JOIN Dept d
ON d.dept_id = e.emp_dept_id
WHERE d.dept_id IS NULL;
Note that you could also express the above using exists logic:
SELECT e.*
FROM Emp e
WHERE NOT EXISTS (
SELECT 1
FROM Dept d
WHERE d.dept_id = e.emp_dept_id
);
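To check that the two forms return the same rows, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite only stands in for Athena here, and the sample rows (and the extra emp_id, name, and dept_name columns) are invented for illustration; only the table names and the join columns come from the question.

```python
import sqlite3

# In-memory SQLite stands in for Athena; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp  (emp_id INTEGER, name TEXT, emp_dept_id INTEGER);
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    INSERT INTO emp  VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', 50), (4, 'Dee', NULL);
    INSERT INTO dept VALUES (10, 'Finance'), (20, 'Marketing'), (30, 'Sales');
""")

# Form 1: LEFT JOIN, then keep only the emp rows that found no partner in dept.
left_join_rows = conn.execute("""
    SELECT e.*
    FROM emp e
    LEFT JOIN dept d ON d.dept_id = e.emp_dept_id
    WHERE d.dept_id IS NULL
    ORDER BY e.emp_id
""").fetchall()

# Form 2: NOT EXISTS with a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT e.*
    FROM emp e
    WHERE NOT EXISTS (
        SELECT 1 FROM dept d WHERE d.dept_id = e.emp_dept_id
    )
    ORDER BY e.emp_id
""").fetchall()

print(left_join_rows)   # [(3, 'Cid', 50), (4, 'Dee', None)]
print(not_exists_rows)  # [(3, 'Cid', 50), (4, 'Dee', None)]
```

Both forms return the employee whose department id has no match and the employee whose department id is NULL.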

Related

Informatica Cloud Data Integration - find non matching rows

I am working on Informatica Cloud Data Integration. I have 2 tables, Tab1 and Tab2, and the joining column is id. I want to find all records in Tab1 that do not exist in Tab2. What transformations can I use to achieve this?
Tab1
id name
1 n1
2 n2
3 n3
Tab2
id
1
5
6
I want to get records with id 2 and 3 from tab1 as they do not exist in tab2
You can use a SQL override in the source qualifier:
SELECT * FROM Tab1 WHERE id NOT IN (SELECT id FROM Tab2)
Alternatively, you can do it with Informatica transformations as follows.
Do a lookup on Tab2 with a join condition on id.
In an expression transformation, create a flag:
out_flag = iif(isnull(:lkp(id)), 'pass', 'fail')
Put a filter next and set its condition to out_flag = 'pass'.
The whole mapping should look like this:
Lkp
|
Sq --exp|-----> fil---tgt
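The lookup, flag, and filter steps can be mimicked outside Informatica to sanity-check the expected output. The following Python sketch is only an analogy, not Informatica code: the set plays the role of the lookup on Tab2, the out_flag function mirrors the expression, and the final list comprehension mirrors the filter. The data is the sample from the question; the variable names are made up.

```python
# Sample data from the question.
tab1 = [(1, "n1"), (2, "n2"), (3, "n3")]
tab2_ids = [1, 5, 6]

# "Lookup": index Tab2 by id so each Tab1 row can be probed.
lookup = set(tab2_ids)


def out_flag(row_id):
    """Mirror of iif(isnull(:lkp(id)), 'pass', 'fail')."""
    return "pass" if row_id not in lookup else "fail"


# "Expression": attach the flag to every Tab1 row.
flagged = [(row_id, name, out_flag(row_id)) for row_id, name in tab1]

# "Filter": keep only the rows flagged 'pass'.
non_matching = [(row_id, name) for row_id, name, flag in flagged if flag == "pass"]

print(non_matching)  # [(2, 'n2'), (3, 'n3')]
```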

Join 2 tables results in query timeout

I have a few tables created in AWS Athena under "TestDB". These tables are created by running an AWS Glue crawler through the S3 buckets. I am trying to create a new table by joining 2 existing tables under "TestDB". It is a simple left outer join as follows:
CREATE TABLE IF NOT EXISTS TestTab1
AS (
SELECT *
FROM (
(
SELECT col1, col2, col3, col4
FROM "TestDB"."tab1" a
WHERE a.partition_0 = '10-24-2021'
AND substring(a.datetimestamp, 1, 10) = '2021-10-24'
)
LEFT OUTER JOIN (
SELECT col1, col2, col3,col4
FROM "TestDB"."tab2" b
WHERE b.partition_0 = '10-24-2021'
AND substring(b.datetimestamp,1,10) = '2021-10-24'
)
ON (a.col1 = b.col1)
)
)
The query scans around 5GB worth of data but times out after ~30 mins since that is the timeout limit. Other than requesting an increase in timeout limit, is there any other way to create a join of 2 tables on AWS?
It's very hard to say from the information you provide, but it's probably down to the result becoming very big or an intermediate result becoming big enough for the executors to run out of memory and having to spill to disk.
Does running just the query work? You can also try to run EXPLAIN SELECT … to get the query plan and see if that tells you anything.
Your query is unnecessarily complex with multiple nested SELECT statements. I think Athena's query planner will be smart enough to rewrite it to something like the following, which is easier to read and understand:
CREATE TABLE IF NOT EXISTS TestTab1 AS
SELECT a.col1, a.col2, a.col3, a.col4
FROM "TestDB"."tab1" a
LEFT JOIN "TestDB"."tab2" b
ON a.col1 = b.col1
AND b.partition_0 = '10-24-2021'
AND substring(b.datetimestamp, 1, 10) = '2021-10-24'
WHERE a.partition_0 = '10-24-2021'
AND substring(a.datetimestamp, 1, 10) = '2021-10-24'
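Where the right-hand table's filters are placed matters for a LEFT JOIN: in the ON clause they only restrict which rows can match, while in WHERE they are applied after the join and discard left rows that found no match. A minimal sketch using SQLite from Python (standing in for Athena, with invented rows and a schema cut down to two columns) shows the difference:

```python
import sqlite3

# In-memory SQLite stands in for Athena; schema and rows are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (col1 INTEGER, partition_0 TEXT);
    CREATE TABLE tab2 (col1 INTEGER, partition_0 TEXT);
    INSERT INTO tab1 VALUES (1, '10-24-2021'), (2, '10-24-2021');
    INSERT INTO tab2 VALUES (1, '10-24-2021'), (2, '10-23-2021');
""")

# Right-table filter in ON: unmatched left rows survive with NULLs.
filter_in_on = conn.execute("""
    SELECT a.col1, b.col1
    FROM tab1 a
    LEFT JOIN tab2 b
      ON a.col1 = b.col1 AND b.partition_0 = '10-24-2021'
    WHERE a.partition_0 = '10-24-2021'
    ORDER BY a.col1
""").fetchall()

# Right-table filter in WHERE: the NULL-extended row fails the filter and is dropped.
filter_in_where = conn.execute("""
    SELECT a.col1, b.col1
    FROM tab1 a
    LEFT JOIN tab2 b ON a.col1 = b.col1
    WHERE a.partition_0 = '10-24-2021'
      AND b.partition_0 = '10-24-2021'
    ORDER BY a.col1
""").fetchall()

print(filter_in_on)     # [(1, 1), (2, None)]
print(filter_in_where)  # [(1, 1)]
```

With the filter in WHERE the query behaves like an inner join, which may or may not be what the original nested query intended.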

How to do an Update from a Select in Azure?

I need to update a second table with the results of this query:
SELECT Tag, battery, Wearlevel, SensorTime
FROM (
SELECT m.* , ROW_NUMBER() OVER (PARTITION BY TAG ORDER BY SensorTime DESC) AS rn
FROM [dbo].[TELE] m
) m2
where m2.rn = 1;
However, I had a hard time writing the SET clause without breaking it. I want a table that holds the data from the latest date for each TAG, without duplicates.
The code below may be what you want.
UPDATE
Table_A
SET
Table_A.Primarykey = 'ss'+Table_B.Primarykey,
Table_A.AddTime = 'jason_'+Table_B.AddTime
FROM
Test AS Table_A
INNER JOIN UsersInfo AS Table_B
ON Table_A.id = Table_B.id
WHERE
Table_A.Primarykey = '559713e6-0d85-4fe7-87a4-e9ceb22abdcf'
For more details, you can also refer to the posts and blogs below.
1. How do I UPDATE from a SELECT in SQL Server?
2. How to UPDATE from SELECT in SQL Server
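As a rough illustration of how the asker's latest-row-per-tag query could feed an update, here is a sketch run against in-memory SQLite from Python. Several things are assumptions: the target table name latest is hypothetical, the sample rows are invented, and the row-value SET with a correlated subquery is SQLite syntax. In SQL Server the same subquery would more typically be joined in the FROM clause of the UPDATE, following the pattern shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tele   (tag TEXT, battery INTEGER, wearlevel INTEGER, sensortime TEXT);
    -- 'latest' is a hypothetical target table holding one row per tag.
    CREATE TABLE latest (tag TEXT PRIMARY KEY, battery INTEGER, wearlevel INTEGER, sensortime TEXT);
    INSERT INTO tele VALUES
        ('A', 90, 1, '2021-01-01 10:00'),
        ('A', 80, 2, '2021-01-02 10:00'),
        ('B', 70, 5, '2021-01-01 09:00');
    INSERT INTO latest VALUES ('A', NULL, NULL, NULL), ('B', NULL, NULL, NULL);
""")

# The asker's query: newest row per tag, picked with ROW_NUMBER().
LATEST_PER_TAG = """
    SELECT tag, battery, wearlevel, sensortime
    FROM (
        SELECT m.*, ROW_NUMBER() OVER (PARTITION BY tag ORDER BY sensortime DESC) AS rn
        FROM tele m
    ) m2
    WHERE m2.rn = 1
"""

# Copy the newest values into the target row with the same tag.
conn.execute(f"""
    UPDATE latest
    SET (battery, wearlevel, sensortime) = (
        SELECT s.battery, s.wearlevel, s.sensortime
        FROM ({LATEST_PER_TAG}) s
        WHERE s.tag = latest.tag
    )
    WHERE tag IN (SELECT tag FROM tele)
""")
conn.commit()

result = conn.execute("SELECT * FROM latest ORDER BY tag").fetchall()
print(result)  # [('A', 80, 2, '2021-01-02 10:00'), ('B', 70, 5, '2021-01-01 09:00')]
```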

Exasol Update Table using subselect

I got this statement, which works in Oracle:
update table a set
a.attribute =
(select
round(sum(r.attribute1),4)
from table2 p, table3 r
where 1 = 1
and some joins
)
where 1 = 1
and a.attribute3 > 10
;
Now I would like to do the same statement in Exasol DB. But I got error [Code: 0, SQL State: 0A000] Feature not supported: this kind of correlated subselect (Session: 1665921074538906818)
After some research, I found out you need to write the query in following syntax:
UPDATE table a
set a.attribute = r.attribute2
FROM table a, table2 p, table3 r
where 1 = 1
and some joins
and a.attribute3 > 10;
The problem is that I can't take the sum of r.attribute2, so I get an "unstable set of rows" error. Is there any way to run the first query in Exasol?
Thanks for help guys!
The following UPDATE statement works when the join between table1 and table2 is 1-to-1 (or, more generally, when there is a 1-to-1 relation between the target table and the result set of the joins).
In that case the val column of the target table is updated; otherwise an error is returned.
UPDATE table1 AS a
SET a.val = table2.val
FROM table1, table2
WHERE table1.id = table2.id;
On the other hand, if the join returns multiple rows for a single table1 row, the "unstable set of rows" error is raised.
If you want to sum the column values of those multiple rows, the following approach may help.
First sum the rows of table2 per id in a sub-select, then use that sub-select as a temporary table in the UPDATE ... FROM statement:
UPDATE table1 AS a
SET a.val = table2.val
FROM table1
INNER JOIN (
select id, sum(val) val from table2 group by id
) table2
ON table1.id = table2.id;
I reproduced the issue using two tables.
In your case you will probably use table2 and table3 in the sub-select.
I hope this is the answer you were looking for.
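To see why pre-aggregating removes the ambiguity, the sketch below uses in-memory SQLite from Python, not Exasol, with invented rows. It does not run the UPDATE itself; it only checks how many source rows each target row matches before and after the GROUP BY, which is the condition that decides whether the update is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, val REAL);
    CREATE TABLE table2 (id INTEGER, val REAL);
    INSERT INTO table1 VALUES (1, 0), (2, 0);
    INSERT INTO table2 VALUES (1, 1.5), (1, 2.5), (2, 4.0);
""")

# Target rows that match more than one source row: an update would be ambiguous here.
ambiguous = conn.execute("""
    SELECT table1.id, COUNT(*) AS matches
    FROM table1 JOIN table2 ON table1.id = table2.id
    GROUP BY table1.id
    HAVING COUNT(*) > 1
""").fetchall()

# After pre-aggregating table2 by id, every target row matches exactly one source row.
aggregated = conn.execute("""
    SELECT table1.id, agg.val
    FROM table1
    JOIN (SELECT id, SUM(val) AS val FROM table2 GROUP BY id) agg
      ON table1.id = agg.id
    ORDER BY table1.id
""").fetchall()

print(ambiguous)   # [(1, 2)]
print(aggregated)  # [(1, 4.0), (2, 4.0)]
```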

Amazon Athena LEFT OUTER JOIN query not working as expected

I am trying to do a left outer join in Athena and my query looks like the following:
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN order
ON customer.id = orders.customer_id
WHERE price IS NULL;
Each customer can have at most one order in the orders table, and some customers have no order at all. So I expect to get some records for customers in the customer table that have no rows in the orders table, since the LEFT OUTER JOIN should leave their price as NULL. But this query returns 0 rows every time I run it. I have queried both tables separately and am pretty sure there is data in both, and the join returns rows if I remove the price IS NULL condition. I have also tried price = '' and price IN (''), and neither works. Has anyone had a similar experience? Or is there something wrong with my query that I cannot see?
It seems that your query is correct. To validate, I created two CTEs that should match up with your customer and orders table and ran your query against them. When running the query below, it returns a record for customer 3 Ted Johnson who did not have an order.
WITH customer AS (
SELECT 1 AS id, 'John Doe' AS name
UNION
SELECT 2 AS id, 'Jane Smith' AS name
UNION
SELECT 3 AS id, 'Ted Johnson' AS name
),
orders AS (
SELECT 1 AS customer_id, 20 AS price
UNION
SELECT 2 AS customer_id, 15 AS price
)
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
ON customer.id = orders.customer_id
WHERE price IS NULL;
I'd suggest running the following queries:
SELECT COUNT(DISTINCT id) FROM customer;
SELECT COUNT(DISTINCT customer_id) FROM orders;
Based on the results you are seeing, I would expect those counts to match. Perhaps your system is creating a record in the orders table whenever a customer is created with a price of 0.
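The effect of such placeholder rows can be reproduced with a small sketch, again using in-memory SQLite from Python rather than Athena. The customer rows are the ones from the CTE above; the placeholder order with price 0 and the diagnose helper are hypothetical, added only to illustrate the suspected cause.

```python
import sqlite3


def diagnose(order_rows):
    """Return (distinct customers, distinct customers with orders, anti-join rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, price INTEGER);
        INSERT INTO customer VALUES (1, 'John Doe'), (2, 'Jane Smith'), (3, 'Ted Johnson');
    """)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", order_rows)
    customers = conn.execute("SELECT COUNT(DISTINCT id) FROM customer").fetchone()[0]
    with_orders = conn.execute("SELECT COUNT(DISTINCT customer_id) FROM orders").fetchone()[0]
    missing = conn.execute("""
        SELECT customer.name, orders.price
        FROM customer LEFT OUTER JOIN orders
          ON customer.id = orders.customer_id
        WHERE price IS NULL
    """).fetchall()
    conn.close()
    return customers, with_orders, missing


# Customer 3 has no order row at all: the counts differ and the query finds him.
no_placeholder = diagnose([(1, 20), (2, 15)])
# Customer 3 has a placeholder order with price 0: the counts match and nothing is returned.
with_placeholder = diagnose([(1, 20), (2, 15), (3, 0)])

print(no_placeholder)    # (3, 2, [('Ted Johnson', None)])
print(with_placeholder)  # (3, 3, [])
```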
You probably can't use WHERE to filter on the orders table. Try putting the condition in the ON clause instead:
SELECT customer.name, orders.price
FROM customer LEFT OUTER JOIN orders
ON customer.id = orders.customer_id AND orders.price IS NULL;