Informatica: something like CDC without adding any column in the target table

I have a source table named A in Oracle.
Initially, table A is loaded (copied) into table B.
I then run DML on table A: inserts, deletes, and updates.
How do I reflect those changes in table B without creating any extra column in the target table?
No timestamp is available on the rows.
I have to compare the rows in the source and the target.
For example: if a row is deleted in the source, it should be deleted in the target; if a row is updated, update it in the target; and if a row is not available in the target, insert it there.
Please help!

Take A and B as sources.
Do a full outer join using a Joiner transformation (or, if both tables are in the same database, you can join in the Source Qualifier).
In an Expression transformation, create a flag based on the following scenarios:
A's key fields are null => flag = 'Delete',
B's key fields are null => flag = 'Insert',
both A's and B's key fields are present => compare the non-key fields of A and B; if any of them are not equal, set the flag to 'Update', else 'No Change'.
Now you can send the records to the target (B), applying the appropriate operation for each flag with an Update Strategy transformation. A plain-SQL sketch of the same flag logic follows.
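For illustration, here is the equivalent flag derivation in plain SQL, assuming a single key column key_col and one non-key column val_col (both names hypothetical; a real mapping would compare every non-key field):

SELECT COALESCE(a.key_col, b.key_col) AS key_col,
       CASE
         WHEN a.key_col IS NULL THEN 'Delete'      -- row exists only in target B
         WHEN b.key_col IS NULL THEN 'Insert'      -- row exists only in source A
         WHEN a.val_col <> b.val_col THEN 'Update' -- non-key fields differ
         ELSE 'No Change'
       END AS flag
FROM A a
FULL OUTER JOIN B b
  ON a.key_col = b.key_col;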

If you do not need to track which operation was applied to each row (since no extra column is allowed in the target), the fastest way would simply be a full refresh:
1) Truncate B
2) Insert A into B
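In plain SQL (assuming B has the same column layout as A), that full refresh is just:

TRUNCATE TABLE B;
INSERT INTO B SELECT * FROM A;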


BigQuery MERGE statement billing more bytes than editor shows

I have a very large (3.5B records) table that I want to update/insert (upsert) using the MERGE statement in BigQuery. The source table is a staging table that contains only the new data, and I need to check if the record with a corresponding ID is in the target table, updating the row if so or inserting if not.
The target table is partitioned by an integer field called IdParent, and the matching is done on IdParent and another integer field called IdChild. My merge statement/script looks like this:
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched and t.IdParent in unnest(parentList) then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched and IdParent in unnest(parentList) then
insert (<all the fields>)
values (<all the fields>)
;
So I:
Pull the IdParent list from the staging table to know which partitions to prune
Limit the partitions of the target table in the join predicate
Also limit the partitions of the target table in the matched/not-matched conditions
The total size of dataset.Target is ~250GB. If I put this script in my BQ editor and remove all the IdParent in unnest(parentList) conditions, it shows ~250GB to bill in the editor (as expected, since there's no partition pruning). If I add the IdParent in unnest(parentList) conditions back in, so the script is exactly as you see it above, i.e. attempting to partition prune, the editor shows ~97MB to bill. However, when I look at the query results, I see that it actually billed ~180GB.
The target table is also clustered on the two fields being matched, and I'm aware that the benefits of clustering are typically not shown in the editor's estimate. However, my understanding is that clustering should only make the bytes billed smaller... I can't think of any reason why this would happen.
Is this a BQ bug, or am I just missing something? BigQuery doesn't even say "the script is estimated to process XX MB", it says "This will process XX MB" and then it processes way more.
That's very interesting. What you did looks totally correct.
It seems the BQ query planner can interpret your SQL and recognizes that partition pruning is requested, but fails to apply it at execution time.
Try removing t.IdParent in unnest(parentList) from both the matched and not-matched clauses to see if the issue still happens, that is:
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched then
insert (<all the fields>)
values (<all the fields>)
;
It would be a good idea to file a bug report with BigQuery if it can't be resolved.

Teradata update target table with source table with multiple ID instances with different dates

I have two tables, A and B.
Table A has fields such as ID, DATE, AMOUNT, etc. The ID field holds ID numbers that are 10 to 15 digits long, but some rows contain only the last 1 to 4 digits, and we need to recover the full number for those rows. Table B holds the full ID number details, and its IDs match the partial IDs in table A on the trailing digits. I need to update the partial IDs in table A with the full IDs from table B where those trailing digits match.
I attempted the following and keep getting an error, please help.
Update a
FROM tablenamexyz a,
     (SELECT DISTINCT a.ID1, b.ID, date
      FROM tablenamexyz a, tablenamebty b
      WHERE a.ID1 = RIGHT(b.ID, 4)
      QUALIFY ROW_NUMBER() OVER (PARTITION BY b.ID ORDER BY date DESC) = 1
     ) z
SET a.ID = z.ID
WHERE a.ID1 = z.ID1
I keep getting "update failed: error 7547, Target row updated by multiple source rows". As you can see in table B, we can just use the one source row with the latest date. Please help. Thanks in advance.
No need to include table A in your derived table z, but you do need to partition by the value you are going to use in the join (ID1, not ID) for uniqueness:
Update a
FROM tablenamexyz a,
     (SELECT RIGHT(b.ID, 4) AS ID_4, b.ID, "date"
      FROM tablenamebty b
      QUALIFY ROW_NUMBER() OVER (PARTITION BY ID_4 ORDER BY "date" DESC) = 1
     ) z
SET a.ID = z.ID
WHERE a.ID1 = z.ID_4
Note that DATE is a reserved word, so if that's really your column name you should quote it, as above.

How to update redshift column: simple text replacement

I have a large target table with columns (id, value). I want to update value='old' to value='new'.
The simplest way would be UPDATE target SET value='new' WHERE value='old';
However, this marks the old rows as deleted and writes new rows, which is supposedly not recommended. So I tried to do a merge-style column update instead:
# staging
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage (SELECT id, value FROM target WHERE value='old');
UPDATE stage SET value='new' WHERE value='old'; # ??? how do you update value?
# merge
begin transaction;
UPDATE target
SET value = stage.value FROM stage
WHERE target.id = stage.id and target.distkey = stage.distkey; # collocated join?
end transaction;
DROP TABLE stage;
This can't be the best way of creating the stage table: I still have to do all these UPDATE delete/writes this way. Is there a way to do it in the INSERT instead?
Is it necessary to force the collocated join when I use CREATE TABLE LIKE?
Are you updating all the rows in the table?
If yes, you can use CTAS (CREATE TABLE AS), which is the recommended method.
Assuming your table looks like this:
table1
id, col1,col2, value
You can use the following SQL to create a new table
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1;
After you verify the data in tmp_table:
DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;
If you are not updating all the rows, you can do the CTAS with a filter and then insert the rest of the rows into the new table (let me know if you need more info if this is the case):
CREATE TABLE tmp_table AS
SELECT id, col1, col2, 'new_value' AS value
FROM table1
WHERE value = 'old';
INSERT INTO tmp_table SELECT * FROM table1 WHERE value <> 'old';
The next step would be to drop table1 and rename tmp_table, as shown below.
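That is, repeating the earlier statements:

DROP TABLE table1;
ALTER TABLE tmp_table RENAME TO table1;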
Update: Based on your comment, you can do the following; let me know if this solves your case.
This method basically creates a new table to replace your existing table.
I have used some of your code
CREATE TABLE stage (LIKE target INCLUDING DEFAULTS);
INSERT INTO stage SELECT id, 'new' FROM target WHERE value='old';
Above INSERT inserts rows to be updated with 'new', no need to run an UPDATE after this.
Bring unchanged rows
INSERT INTO stage SELECT id, value FROM target WHERE value!='old';
After this point you still have the target table, your original table, intact.
The stage table will have both sets of rows: the updated rows with the 'new' value and the rows you did not want to change.
To replace target with stage:
DROP TABLE target;
or, to keep it for further verification:
ALTER TABLE target RENAME TO target_old;
Then, in either case:
ALTER TABLE stage RENAME TO target;
From a Redshift developer:
This case doesn't require an upsert or update+insert; it is fine to just run the update:
UPDATE target SET value='new' WHERE value='old';
Another way would be to INSERT the rows you need and DELETE the other rows, but that's unnecessarily complicated, as the sketch below shows.
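For completeness, a sketch of that more complicated route; it temporarily duplicates ids inside the transaction, which is part of why the plain UPDATE is preferable:

begin transaction;
-- write corrected copies of the affected rows
INSERT INTO target SELECT id, 'new' FROM target WHERE value = 'old';
-- then remove the originals
DELETE FROM target WHERE value = 'old';
end transaction;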

How to substitute NULL with value in Power BI when joining one to many

In my model I have a table AssignedToUser that doesn't contain any NULL values.
The table TotalCounts contains the number of tasks for each user.
The two tables are joined on UserGUID, and table TotalCounts contains NULL values for UserGUID.
When I drop everything into one table, there is a NULL value for AssignedToUser.
How can I substitute the NULL value for AssignedToUser with "POOL"?
Under Edit Queries I tried to create an additional column:
if [AssignedToUser] = null then "POOL" else [AssignedToUser]
but that didn't help.
UPDATE:
Thanks, Alexis.
I have created the FullAssignedToUsers table, but when I try to make a relationship with TotalCounts on UserGUID, it doesn't like it.
Data in the new table looks like this:
UPDATE:
The .pbix file can be accessed here:
https://www.dropbox.com/s/95frggpaq6tce7q/User%20Open%20Closed%20Tasks%20Experiment.pbix?dl=0
I believe the problem here is that your join has UserGUID values that are not in your AssignedToUsers table.
To correct this, one possibility is to replace your AssignedToUsers table with one that contains all the UserGUID values from the TotalCounts table.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID]))
That should get you the full outer join. You can then create the custom column like you described and use that column in your visual.
You'll probably want to break the relationships with the original AssignedToUsers table and create relationships with the new one instead.
If you don't want to take that extra step, you can do an ISBLANK inside your new table formula.
FullAssignedToUsers =
ADDCOLUMNS(VALUES(TotalCounts[UserGUID]),
"AssignedToUser",
IF(
ISBLANK(
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])),
"POOL",
LOOKUPVALUE(AssignedToUsers[AssignedToUser],
AssignedToUsers[UserGUID], TotalCounts[UserGUID])))
Note: This is equivalent to doing a right outer join merge on the AssignedToUsers table in the query editor and then replacing the nulls with "POOL". I'd actually recommend approaching it that way instead.
Another way to approach it is to pull the AssignedToUser column over to the TotalCounts table in a custom column and use that column in your visual instead.
AssignedToUsers =
IF(ISBLANK(RELATED(AssignedToUsers[AssignedToUser])),
"POOL",
RELATED(AssignedToUsers[AssignedToUser]))
This is equivalent to doing a left outer join merge on the TotalCounts table in the query editor, expanding the AssignedToUser column, then replacing nulls with "POOL" in that column.
In DAX, missing values are Blank(), not null. Try this:
=IF(
ISBLANK(AssignedToUsers[AssignedToUser]),
"Pool",
AssignedToUsers[AssignedToUser]
)

Adding NOT NULL constraint to Cloud Spanner table

If a Cloud Spanner table is created with nullable columns, is it possible to add a NOT NULL constraint on a column without recreating the table?
You can add a NOT NULL constraint to a non-key column. You must first ensure that all rows actually do have values for the column; Spanner will scan the data to verify this before fully applying the NOT NULL constraint. More information about how to alter tables is in the Cloud Spanner documentation.
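For example (table and column names hypothetical; note that Spanner requires restating the column's full type in the ALTER COLUMN statement):

ALTER TABLE Orders ALTER COLUMN Notes STRING(MAX) NOT NULL;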
However, you cannot add such a constraint to a key column. That kind of change would require rewriting all the data in the table, because the nullness of the key affects how the data is encoded. The only option for making that change is to create a new table that's set up the way you want, make code changes to support using both tables temporarily, gradually move the data from the old table to the new one, and eventually change the code to use only the new table and drop the old one. If you then also wanted the original table name back, you'd have to repeat the whole process.
Unfortunately, there is no way to add a NOT NULL column directly.
The way to do it:
1. Add a nullable column:
ALTER TABLE table1 ADD COLUMN column1 STRING(255)
2. UPDATE the column to a non-NULL value (if the table is not empty):
UPDATE table1 SET column1 = "<GENERATED DATA>" WHERE TRUE
3. Add the constraint:
ALTER TABLE table1 ALTER COLUMN column1 STRING(255) NOT NULL
Thanks.
Creating a non-nullable column in Spanner on an existing table is typically a three step process:
# add new column to table
ALTER TABLE <table_name> ADD COLUMN <column_name> <value_type>;
# create default values
UPDATE <table_name> SET <column_name>=<default_value> WHERE TRUE;
# add constraint
ALTER TABLE <table_name> ALTER COLUMN <column_name> <value_type> NOT NULL;