I have to convert below SQL Query to informatica cloud logic - informatica

Out put of joiner transformation(TestData table name in SQL Query) as below
I need to load data in to target as below
I wrote SQL Query like
SELECT * FROM TestData AS A
WHERE SourceSystem = 'NC' AND
EXISTS
(SELECT * FROM TestData AS B
WHERE B.ClusterID = A.ClusterID AND B.SourceString = '50012559'
AND B.SourceSystem = 'ACB'
)
can you help me how to covert this SQL Query to Informatica cloud .

Assuming testData is a joiner output here is the solution. I assumed, cluster id is the unique key. if it not unique key, then this solution will cause duplicates.
First, sort your joiner output TestData by ClusterID.
Then put a filter(filter1) on SourceSystem = 'NC' and create pipeline 1.
Connect another filter(filter2) to sorter on SourceString = '50012559' AND B.SourceSystem = 'ACB' and create pipeline 2.
Add another joiner - conditions will be pipeline1.ClusterID =pipeline2.ClusterID.
The output from joiner will be your desired data.
This is how the mapping would look like -
|-Filter 1 ->|
TestData_Joiner... -SRT_ClusterID-->|-Filter 2 ->| -JNR_ClusterID-> <desired output>
Pls note, this will generate data just like below SQL.
``
SELECT * FROM TestData AS A WHERE SourceSystem = 'NC' AND EXISTS (SELECT * FROM TestData AS B WHERE B.ClusterID = A.ClusterID AND B.SourceString = '50012559' AND B.SourceSystem = 'ACB' )

Related

Why does left join in redshift not working?

We are facing a weird issue with Redshift and I am looking for help to debug it please. Details of the issue are following:
I have 2 tables and I am trying to perform left join as follows:
select count(*)
from abc.orders ot
left outer join abc.events e on **ot.context_id = e.context_id**
where ot.order_id = '222:102'
Above query returns ~7000 records. Looks like it is performing default join as we have only 1 record in [Orders] table with Order ID = ‘222:102’
select count(*)
from abc.orders ot
left outer join abc.events e on **ot.event_id = e.event_id**
where ot.order_id = '222:102'
Above query returns 1 record correctly. If you notice, I have just changed column for joining 2 tables. Event_ID in [Events] table is identity column but I thought I should get similar records even if I use any other column like Context_ID.
Further, I tried following query under the impression it should return all the ~7000 records as I am using default join but surprisingly it returned only 1 record.
select count(*)
from abc.orders ot
**join** abc.events e on ot.event_id = e.event_id
where ot.order_id = '222:102'
Following are the Redshift database details:
Cutdown version of table metadata:
CREATE TABLE abc.orders (
order_id character varying(30) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
event_id character varying(21) NOT NULL ENCODE zstd,
FOREIGN KEY (event_id) REFERENCES events_20191014(event_id)
)
DISTSTYLE EVEN
SORTKEY ( context_id, order_id );
CREATE TABLE abc.events (
event_id character varying(21) NOT NULL ENCODE raw,
context_id integer ENCODE raw,
PRIMARY KEY (event_id)
)
DISTSTYLE ALL
SORTKEY ( context_id, event_id );
Database: Amazon Redshift cluster
I think, I am missing something essential while joining the tables. Could you please guide me in right direction?
Thank you

saving athena results to another table with partitions

From AWS Athena
I am trying to concatenate multiple tables then save it with partitoned key.
after running
select *
from t1
union all
select *
from t2
select *
from t3
create table on the console creates query like this,
create table db.table_name
with(
format='parquet',
external_location=...
) AS
select *
from t1
union all
select *
from t2
select *
from t3;
But I want to add partitoned by column. I've tried
adding partitioned by on top and bottom. Also saved query result then created new table from that using CREATE EXTERNAL TABLE command (this works but return empty row -> even after running MSCK REPAIR)
From https://aws.amazon.com/premiumsupport/knowledge-center/athena-create-use-partitioned-tables/ It seems like I need to save data into S3 by partitions so in bucket1 it will have bucket1/2021, bucket1/2022 if 'year' column is the partition column. Correct? If yes, is there efficient to create partitioned buckets?
I have successfully created new, partitioned tables by using this method:
CREATE TABLE my_table
WITH (
format = 'PARQUET',
parquet_compression = 'SNAPPY',
external_location = 's3://bucket/folder/',
partitioned_by = ARRAY['year']
)
AS
SELECT
...

How to add a partition boundary only when not exists in SQL Data Warehouse?

I am using Azure SQL Data Warehouse Gen 1, and I create a partition table like this
CREATE TABLE [dbo].[StatsPerBin1](
[Bin1] [varchar](100) NOT NULL,
[TimeWindow] [datetime] NOT NULL,
[Count] [int] NOT NULL,
[Timestamp] [datetime] NOT NULL)
WITH
(
DISTRIBUTION = HASH ( [Bin1] ),
CLUSTERED INDEX([Bin1]),
PARTITION
(
[TimeWindow] RANGE RIGHT FOR VALUES ()
)
)
How should I split a partition only when there is no such boundary?
First I think if I can get partition boundaries by table name, then I can write a if statement to determine add partition boundary or not.
But I cannot find a way to associate a table with its corresponding partition values, the partition values of all partitions can be retrieved by
SELECT * FROM sys.partition_range_values
But it only contains function_id as identifier which I don't know how to join other tables so that I can get partition boundaries by table name.
Have you tried joining sys.partition_range_values with sys.partition_functions view?
Granted we cannot create partition functions in SQL DW, but the view seems to be still supported.
I know this is an out of date question, but I was having the same problem. Here is a query I ended up with that can get you started. It is modified slightly from a query for SQL Server documentation:
SELECT s.[name] AS [schema_name]
, t.[name] AS [table_name]
, p.[partition_number] AS [partition_number]
, rv.[value] AS [partition_boundary_value]
, p.[data_compression_desc] AS [partition_compression_desc]
FROM sys.schemas s
JOIN sys.tables t ON t.[schema_id] = s.[schema_id]
JOIN sys.partitions p ON p.[object_id] = t.[object_id]
JOIN sys.indexes i ON i.[object_id] = p.[object_id]
AND i.[index_id] = p.[index_id]
JOIN sys.data_spaces ds ON ds.[data_space_id] = i.[data_space_id]
LEFT JOIN sys.partition_schemes ps ON ps.[data_space_id] = ds.[data_space_id]
LEFT JOIN sys.partition_functions pf ON pf.[function_id] = ps.[function_id]
LEFT JOIN sys.partition_range_values rv ON rv.[function_id] = pf.[function_id]
AND rv.[boundary_id] = p.[partition_number]

CFQuery - Update a table by comparing it to another table [duplicate]

I have a database with account numbers and card numbers. I match these to a file to update any card numbers to the account number so that I am only working with account numbers.
I created a view linking the table to the account/card database to return the Table ID and the related account number, and now I need to update those records where the ID matches the Account Number.
This is the Sales_Import table, where the account number field needs to be updated:
LeadID
AccountNumber
147
5807811235
150
5807811326
185
7006100100007267039
And this is the RetrieveAccountNumber table, where I need to update from:
LeadID
AccountNumber
147
7006100100007266957
150
7006100100007267039
I tried the below, but no luck so far:
UPDATE [Sales_Lead].[dbo].[Sales_Import]
SET [AccountNumber] = (SELECT RetrieveAccountNumber.AccountNumber
FROM RetrieveAccountNumber
WHERE [Sales_Lead].[dbo].[Sales_Import]. LeadID =
RetrieveAccountNumber.LeadID)
It updates the card numbers to account numbers, but the account numbers get replaced by NULL
I believe an UPDATE FROM with a JOIN will help:
MS SQL
UPDATE
Sales_Import
SET
Sales_Import.AccountNumber = RAN.AccountNumber
FROM
Sales_Import SI
INNER JOIN
RetrieveAccountNumber RAN
ON
SI.LeadID = RAN.LeadID;
MySQL and MariaDB
UPDATE
Sales_Import SI,
RetrieveAccountNumber RAN
SET
SI.AccountNumber = RAN.AccountNumber
WHERE
SI.LeadID = RAN.LeadID;
The simple Way to copy the content from one table to other is as follow:
UPDATE table2
SET table2.col1 = table1.col1,
table2.col2 = table1.col2,
...
FROM table1, table2
WHERE table1.memberid = table2.memberid
You can also add the condition to get the particular data copied.
For SQL Server 2008 + Using MERGE rather than the proprietary UPDATE ... FROM syntax has some appeal.
As well as being standard SQL and thus more portable it also will raise an error in the event of there being multiple joined rows on the source side (and thus multiple possible different values to use in the update) rather than having the final result be undeterministic.
MERGE INTO Sales_Import
USING RetrieveAccountNumber
ON Sales_Import.LeadID = RetrieveAccountNumber.LeadID
WHEN MATCHED THEN
UPDATE
SET AccountNumber = RetrieveAccountNumber.AccountNumber;
Unfortunately the choice of which to use may not come down purely to preferred style however. The implementation of MERGE in SQL Server has been afflicted with various bugs. Aaron Bertrand has compiled a list of the reported ones here.
Generic answer for future developers.
SQL Server
UPDATE
t1
SET
t1.column = t2.column
FROM
Table1 t1
INNER JOIN Table2 t2
ON t1.id = t2.id;
Oracle (and SQL Server)
UPDATE
t1
SET
t1.colmun = t2.column
FROM
Table1 t1,
Table2 t2
WHERE
t1.ID = t2.ID;
MySQL
UPDATE
Table1 t1,
Table2 t2
SET
t1.column = t2.column
WHERE
t1.ID = t2.ID;
For PostgreSQL:
UPDATE Sales_Import SI
SET AccountNumber = RAN.AccountNumber
FROM RetrieveAccountNumber RAN
WHERE RAN.LeadID = SI.LeadID;
Seems you are using MSSQL, then, if I remember correctly, it is done like this:
UPDATE [Sales_Lead].[dbo].[Sales_Import] SET [AccountNumber] =
RetrieveAccountNumber.AccountNumber
FROM RetrieveAccountNumber
WHERE [Sales_Lead].[dbo].[Sales_Import].LeadID = RetrieveAccountNumber.LeadID
I had the same problem with foo.new being set to null for rows of foo that had no matching key in bar. I did something like this in Oracle:
update foo
set foo.new = (select bar.new
from bar
where foo.key = bar.key)
where exists (select 1
from bar
where foo.key = bar.key)
Here's what worked for me in SQL Server:
UPDATE [AspNetUsers] SET
[AspNetUsers].[OrganizationId] = [UserProfile].[OrganizationId],
[AspNetUsers].[Name] = [UserProfile].[Name]
FROM [AspNetUsers], [UserProfile]
WHERE [AspNetUsers].[Id] = [UserProfile].[Id];
For MySql that works fine:
UPDATE
Sales_Import SI,RetrieveAccountNumber RAN
SET
SI.AccountNumber = RAN.AccountNumber
WHERE
SI.LeadID = RAN.LeadID
Thanks for the responses. I found a solution tho.
UPDATE Sales_Import
SET AccountNumber = (SELECT RetrieveAccountNumber.AccountNumber
FROM RetrieveAccountNumber
WHERE Sales_Import.leadid =RetrieveAccountNumber.LeadID)
WHERE Sales_Import.leadid = (SELECT RetrieveAccountNumber.LeadID
FROM RetrieveAccountNumber
WHERE Sales_Import.leadid = RetrieveAccountNumber.LeadID)
In case the tables are in a different databases. (MSSQL)
update database1..Ciudad
set CiudadDistrito=c2.CiudadDistrito
FROM database1..Ciudad c1
inner join
database2..Ciudad c2 on c2.CiudadID=c1.CiudadID
Use the following block of query to update Table1 with Table2 based on ID:
UPDATE Sales_Import, RetrieveAccountNumber
SET Sales_Import.AccountNumber = RetrieveAccountNumber.AccountNumber
where Sales_Import.LeadID = RetrieveAccountNumber.LeadID;
This is the easiest way to tackle this problem.
MS Sql
UPDATE c4 SET Price=cp.Price*p.FactorRate FROM TableNamea_A c4
inner join TableNamea_B p on c4.Calcid=p.calcid
inner join TableNamea_A cp on c4.Calcid=cp.calcid
WHERE c4..Name='MyName';
Oracle 11g
MERGE INTO TableNamea_A u
using
(
SELECT c4.TableName_A_ID,(cp.Price*p.FactorRate) as CalcTot
FROM TableNamea_A c4
inner join TableNamea_B p on c4.Calcid=p.calcid
inner join TableNamea_A cp on c4.Calcid=cp.calcid
WHERE p.Name='MyName'
) rt
on (u.TableNamea_A_ID=rt.TableNamea_B_ID)
WHEN MATCHED THEN
Update set Price=CalcTot ;
update from one table to another table on id matched
UPDATE
TABLE1 t1,
TABLE2 t2
SET
t1.column_name = t2.column_name
WHERE
t1.id = t2.id;
The below SQL someone suggested, does NOT work in SQL Server. This syntax reminds me of my old school class:
UPDATE table2
SET table2.col1 = table1.col1,
table2.col2 = table1.col2,
...
FROM table1, table2
WHERE table1.memberid = table2.memberid
All other queries using NOT IN or NOT EXISTS are not recommended. NULLs show up because OP compares entire dataset with smaller subset, then of course there will be matching problem. This must be fixed by writing proper SQL with correct JOIN instead of dodging problem by using NOT IN. You might run into other problems by using NOT IN or NOT EXISTS in this case.
My vote for the top one, which is conventional way of updating a table based on another table by joining in SQL Server. Like I said, you cannot use two tables in same UPDATE statement in SQL Server unless you join them first.
This is the easiest and best have seen for Mysql and Maria DB
UPDATE table2, table1 SET table2.by_department = table1.department WHERE table1.id = table2.by_id
Note: If you encounter the following error based on your Mysql/Maria DB version "Error Code: 1175. You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column To disable safe mode, toggle the option in Preferences"
Then run the code like this
SET SQL_SAFE_UPDATES=0;
UPDATE table2, table1 SET table2.by_department = table1.department WHERE table1.id = table2.by_id
it works with postgresql
UPDATE application
SET omts_received_date = (
SELECT
date_created
FROM
application_history
WHERE
application.id = application_history.application_id
AND application_history.application_status_id = 8
);
update within the same table:
DECLARE #TB1 TABLE
(
No Int
,Name NVarchar(50)
,linkNo int
)
DECLARE #TB2 TABLE
(
No Int
,Name NVarchar(50)
,linkNo int
)
INSERT INTO #TB1 VALUES(1,'changed person data', 0);
INSERT INTO #TB1 VALUES(2,'old linked data of person', 1);
INSERT INTO #TB2 SELECT * FROM #TB1 WHERE linkNo = 0
SELECT * FROM #TB1
SELECT * FROM #TB2
UPDATE #TB1
SET Name = T2.Name
FROM #TB1 T1
INNER JOIN #TB2 T2 ON T2.No = T1.linkNo
SELECT * FROM #TB1
I thought this is a simple example might someone get it easier,
DECLARE #TB1 TABLE
(
No Int
,Name NVarchar(50)
)
DECLARE #TB2 TABLE
(
No Int
,Name NVarchar(50)
)
INSERT INTO #TB1 VALUES(1,'asdf');
INSERT INTO #TB1 VALUES(2,'awerq');
INSERT INTO #TB2 VALUES(1,';oiup');
INSERT INTO #TB2 VALUES(2,'lkjhj');
SELECT * FROM #TB1
UPDATE #TB1 SET Name =S.Name
FROM #TB1 T
INNER JOIN #TB2 S
ON S.No = T.No
SELECT * FROM #TB1
try this :
UPDATE
Table_A
SET
Table_A.AccountNumber = Table_B.AccountNumber ,
FROM
dbo.Sales_Import AS Table_A
INNER JOIN dbo.RetrieveAccountNumber AS Table_B
ON Table_A.LeadID = Table_B.LeadID
WHERE
Table_A.LeadID = Table_B.LeadID
MYSQL (This is my preferred way for restoring all specific column reasonId values, based on primary key id equivalence)
UPDATE `site` AS destination
INNER JOIN `site_copy` AS backupOnTuesday
ON backupOnTuesday.`id` = destination.`id`
SET destdestination.`reasonId` = backupOnTuesday.`reasonId`
This will allow you to update a table based on the column value not being found in another table.
UPDATE table1 SET table1.column = 'some_new_val' WHERE table1.id IN (
SELECT *
FROM (
SELECT table1.id
FROM table1
LEFT JOIN table2 ON ( table2.column = table1.column )
WHERE table1.column = 'some_expected_val'
AND table12.column IS NULL
) AS Xalias
)
This will update a table based on the column value being found in both tables.
UPDATE table1 SET table1.column = 'some_new_val' WHERE table1.id IN (
SELECT *
FROM (
SELECT table1.id
FROM table1
JOIN table2 ON ( table2.column = table1.column )
WHERE table1.column = 'some_expected_val'
) AS Xalias
)
Summarizing the other answers, there're 4 variants of how to update target table using data from another table only when "match exists"
Query and sub-query:
update si
set si.AccountNumber = (
select ran.AccountNumber
from RetrieveAccountNumber ran
where si.LeadID = ran.LeadID
)
from Sales_Import si
where exists (select * from RetrieveAccountNumber ran where ran.LeadID = si.LeadID)
Inner join:
update si
set si.AccountNumber = ran.AccountNumber
from Sales_Import si inner join RetrieveAccountNumber ran on si.LeadID = ran.LeadID
Cross join:
update si
set si.AccountNumber = ran.AccountNumber
from Sales_Import si, RetrieveAccountNumber ran
where si.LeadID = ran.LeadID
Merge:
merge into Sales_Import si
using RetrieveAccountNumber ran on si.LeadID = ran.LeadID
when matched then update set si.accountnumber = ran.accountnumber;
All variants are more-less trivial and understandable, personally I prefer "inner join" option. But any of them could be used and developer has to select "better option" according to his/her needs
From performance perspective variants with join-s are more preferable:
Oracle 11g
merge into Sales_Import
using RetrieveAccountNumber
on (Sales_Import.LeadId = RetrieveAccountNumber.LeadId)
when matched then update set Sales_Import.AccountNumber = RetrieveAccountNumber.AccountNumber;
For Oracle SQL try using alias
UPDATE Sales_Lead.dbo.Sales_Import SI
SET SI.AccountNumber = (SELECT RAN.AccountNumber FROM RetrieveAccountNumber RAN WHERE RAN.LeadID = SI.LeadID);
I'd like to add one extra thing.
Don't update a value with the same value, it generates extra logging and unnecessary overhead.
See example below - it will only perform the update on 2 records despite linking on 3.
DROP TABLE #TMP1
DROP TABLE #TMP2
CREATE TABLE #TMP1(LeadID Int,AccountNumber NVarchar(50))
CREATE TABLE #TMP2(LeadID Int,AccountNumber NVarchar(50))
INSERT INTO #TMP1 VALUES
(147,'5807811235')
,(150,'5807811326')
,(185,'7006100100007267039');
INSERT INTO #TMP2 VALUES
(147,'7006100100007266957')
,(150,'7006100100007267039')
,(185,'7006100100007267039');
UPDATE A
SET A.AccountNumber = B.AccountNumber
FROM
#TMP1 A
INNER JOIN #TMP2 B
ON
A.LeadID = B.LeadID
WHERE
A.AccountNumber <> B.AccountNumber --DON'T OVERWRITE A VALUE WITH THE SAME VALUE
SELECT * FROM #TMP1
ORACLE
use
UPDATE suppliers
SET supplier_name = (SELECT customers.customer_name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id)
WHERE EXISTS (SELECT customers.customer_name
FROM customers
WHERE customers.customer_id = suppliers.supplier_id);
update table1 dpm set col1 = dpu.col1 from table2 dpu where dpm.parameter_master_id = dpu.parameter_master_id;
If above answers not working for you try this
Update Sales_Import A left join RetrieveAccountNumber B on A.LeadID = B.LeadID
Set A.AccountNumber = B.AccountNumber
where A.LeadID = B.LeadID

Doctrine join query to get all record satisfies count greater than 1

I tried with normal sql query
SELECT activity_shares.id FROM `activity_shares`
INNER JOIN (SELECT `activity_id` FROM `activity_shares`
GROUP BY `activity_id`
HAVING COUNT(`activity_id`) > 1 ) dup ON activity_shares.activity_id = dup.activity_id
Which gives me record id say 10 and 11
But same query I tried to do in Doctrine query builder,
$qb3=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca')
// ->andWhere('ca.id = c.activity')
->groupBy('ca.id')
->having('count(ca.id)>1');
Edited:
$query3=$qb3->getQuery();
$query3->getResult();
Generated SQL is:
SELECT a0_.id AS id0 FROM activity_shares a0_
INNER JOIN activities a1_ ON a0_.activity_id = a1_.id
GROUP BY a1_.id HAVING count(a1_.id) > 1
Gives only 1 record that is 10.I want to get both.I'm not getting idea where I went wrong.Any idea?
My tables structure is:
ActivityShare
+-----+---------+-----+---
| Id |activity |Share| etc...
+-----+---------+-----+----
| 1 | 1 |1 |
+-----+---------+-----+---
| 2 | 1 | 2 |
+-----+---------+-----+---
Activity is foreign key to Activity table.
I want to get Id's 1 and 2
Simplified SQL
first of all let me simplify that query so it gives the same result :
SELECT id FROM `activity_shares`
GROUP BY `id`
HAVING COUNT(`activity_id`) > 1
Docrtrine QueryBuilder
If you store the id of the activty in the table like you sql suggests:
You can use the simplified SQL to build a query:
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','MyBundleDataBundle:ActivityShare c')
->groupBy('c.id')
->having('count(c.activity)>1');
->getResult();
If you are using association tables ( Doctrine logic)
here you will have to use join but the count may be tricky
Solution 1
use the associative table like an entitiy ( as i see it you only need the id)
Let's say the table name is activityshare_activity
it will have two fields activity_id and activityshare_id, if you find a way to add a new column id to that table and make it Autoincrement + Primary the rest is easy :
the new entity being called ActivityShareActivity
$results =$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.activityshare_id')
->add('from','MyBundleDataBundle:ActivityShareActivity c')
->groupBy('c.activityshare_id')
->having('count(c.activity_id)>1');
->getResult();
the steps to add the new identification column to make it compatible with doctrine (you need to do this once):
add the column (INT , NOT NULL) don' t put the autoincrement yet
ALTER TABLE tableName ADD id INT NOT NULL
Populate the column using a php loop like for
Modify the column to be autoincrement
ALTER TABLE tableName MODIFY id INT NOT NULL AUTO_INCREMENT
Solution2
The correction to your query
$result=$this->getEntityManager()->createQueryBuilder()
->select('c.id')
->from('MyBundleDataBundle:ActivityShare', 'c')
->innerJoin('c.activity', 'ca')
->groupBy('c.id') //note: it's c.id not ca.id
->having('count(ca.id)>1')
->getResult();
I posted this one last because i am not 100% sure of the output of having+ count but it should word just fine :)
Thanks for your answers.I finally managed to get answer
My Doctrine query is:
$subquery=$this->getEntityManager()->createQueryBuilder('as')
->add('select','a.id')
->add('from','MyBundleDataBundle:ActivityShare as')
->innerJoin('as.activity', 'a')
->groupBy('a.id')
->having('count(a.id)>1');
$query=$this->getEntityManager()->createQueryBuilder('c')
->add('select','c.id')
->add('from','ChowzterDataBundle:ActivityShare c')
->innerJoin('c.activity', 'ca');
$query->andWhere($query->expr()->in('ca.id', $subquery->getDql()))
;
$result = $query->getQuery();
print_r($result->getResult());
And SQL looks like:
SELECT a0_.id AS id0 FROM activity_shares a0_ INNER JOIN activities a1_ ON a0_.activity_id = a1_.id WHERE a1_.id IN (SELECT a2_.id FROM activity_shares a3_ INNER JOIN activities a2_ ON a3_.activity_id = a2_.id GROUP BY a2_.id HAVING count(a2_.id) > 1