Powercenter - concurrent target instances - concurrency

We have a situation where we have a different execution order of instances of the same target being loaded from a single source qualifier.
We have a problem when we promote a mapping from DEV to TEST when we execute in TEST after promoting there are problems.
For instance we have a router with 3 groups for Insert, Update and Delete followed by the appropriate update strategies to set the row type accordingly followed by three target instances.
RTR ----> UPD_Insert -----> TGT_Insert
\
\__> UPD_Update -------> TGT_Update
\
\__> UPD_Delete ---------> TGT_Delete
When we test this out using data to do an insert followed by an update followed by a delete all based on the same primary key we get a different execution order in TEST compared to the same data in our DEV environment.
Anyone have any thoughts - I would post an image but I don't have enough cred yet.
Cheers,
Gil.

You can not controll the load order as long as you have a single source. I you could separate the loads to use separate sources the target load order setting in the mapping could be used, or you could even create separate mappings for them.
As it is now you should use a single target and utilize the update strategy transformation to determine the wanted operation for each record passing through. It is then possible to use a sort to define in what order the different operations is made to the physical table.

You can use the sorter transformation just before update strategy......based on update strategy condition you can sort the incoming rows....So first date will go through the Insert, than update at last through delete strategy.

Simple solution is try renaming the target definition in alphabetical order... like INSERT_A, UPDATE_B, DELETE_C then start loading
This will load in A,B,C order. Try and let me know

Related

Maintain a audit table through re usable frame work

I was asked to create control table with Informatica. I am a newbie and do not have much knowledge about it. I saw the same kind of stuff in my previous project but don't know the way to create a mapplet for that. So the requirement is that I have to create a mapplet which has the following columns:
-mapping_name
-session_name
-last_run_date
--source count
--target count
--status
So what happens is
Example: We executed a workflow with a particular mapping last week.
Now after 1 week we are executing the same mapping.
The requirement is that we should be fetching only those records which fall in this particular time frame(i.e from previous run to the current run). This is something I do not know.
Can you please help me out? I can provide furthur details if required.
There is a solution provided in below link but it doesnt use mapplet.
See, if you want to use mapplet, you wont get 'status' attribute and mapplet approach can be difficult to implement for all mappings.
You can use this link to gather statistics as well.
http://powercenternotes.blogspot.com/2014/01/an-etl-framework-for-operational.html
Now, regarding your other requirement, it seems to me to be an issue with incremental extract. So, you need to store the date parameter when you ran your flow last - into a DB table or flat file.
Use that as reference and pull anything greater than that date.
Mapplet - We used this approach earlier to gather statistics. But this is difficult because you need to add this mapplet + a reusable generic target to capture stats.
Input -
Type_of_data- (this can be source, target)
unique_key - (unique key of the mapping)
MappingName - $PMMappingName
SessionName - $PMSessionName
Aggregator -
i/p-
Type_of_data
unique_key
MappingName group by
SessionName group by
o/p-
count_row = COUNT(*)
Output -
Type_of_data
MappingName
SessionName
count_row
Use a reusable generic target to capture all the rows. You need to add one set after each source, one set before each target. The approach in the link is better i think.

In Camunda , What is the correct way of optimised coding among two options?

Retrieving userTasks first and then from task, retrive the processInstanceId.
First retrieving ProcessInstances and then iterate it with a loop and find usertasks in each process instance.
Please give reason too.
If you want to build a task list you should use the task queries, as you can do the right queries and filters there easily. Process instance id's are part of the result.
But it would indeed be interesting to understand the use case, what exactly you need the process instance id for.

Informatica session refuses to update first then insert in "update then insert" mode

Very basic setup: source-to-target - wanted to replicate the MERGE behavior.
Removed the update strategy, activated "update then insert" rule on target within the session. Doesn't work as described, always attempts to insert into the primary key column, even though the same key arrives, which should have triggered an "update" statement. Tried other target methods - always attempts to insert. Attached is the mapping pic.
basic merge attempt
Finally figured this out. You have to make edits in 3 places: a) mapping - remove update strategy b) session::target properties - set the "update then insert" method c) session's own properties - "treat source rows as"
In the third case you have to switch "treat source rows as" from insert to update.
Which will then allow both - updates and inserts.
Why is it set like this is beyond me. But it works.
I'll make an attempt to clarify this a bit.
First of all, using Update Strategy in the mapping requires the session Treat source rows as property to be set to Data driven. This is slowest possible option as it means it will be set on row-by-row basis within the mapping - but that's exactly what you need if using the Update Strategy transformation. So in order to mirror MERGE, you need to remove it.
And tell the session not to expect this in the mapping anymore - so the property needs to be set to one of the remaining ones. There are two options:
set Treat source rows as to Insert - this means all the rows will be inserted each time. If there are no errors (e.g. caused by unique index), the data will be multiplied. In order to mimic MERGE behavior, you'd need to add the unique index that would prevent inserts and tell the target connector to insert else update. This way in case the insert fails it will make an update attempt.
set Treat source rows as to Update - now this will tell PowerCenter to try updates for each and every input row. Now, using update else insert will cause that in case of failure (i.e. no row to update) there will be no error - instead an insert attempt will be made. Here there's no need for unique index. That's one difference.
Additional difference - although both solutions will reflect the MERGE operation - might be observed in performance. In the environment where new data is very rare, the first approach will be slow: each time an insert attempt will be made just to fail and do an update operation then. Just a few times it will succeed at first attempt. Second approach will be faster: updates will succeed most of the time and just on a rare occasion it will fail and result in an insert operation.
Of course, if updates are not often expected, it will be exactly the opposite.
This can be seen as complex solution for a simple merge. But it also lets the developer to influence the performance.
Hope this sheds some light!

Load One Source into Two Targets With Fail condition

We have a single source & two targets Target A & Target B. We want to load Both Tar A first & Tar B. We want the Load to continue In case Either one Fails. Meaning in case Target A fails, Target B should be loaded and vice versa. We donot want the load to abandon with All or None Scenario. Any options where we query the Source only onetime. Two independent Job flows is not an option as we want a single pull
In your workflow you can write the output to a staging table in your first session and then query the same staging table in 2 parallel sessions writing to the 2 separate tables, that way you guarantee that it doesn't matter what happens with the unrelated load to each of the targets.

Reprocess batches of items over and over again - and the batch might change any time

I am just looking for ideas on how to solve one specific thing I'd like to build.
Say I have two sets of items. Each item is just a couple of lines of JSON. Any time an item is added to one set I immediately (well, almost) want to process this against the full other set. So item is added to set A: Process against each item in set B. And vice versa.
Items come in through API Gateway + Lambda. Match processing in Lambda from a queue/stream.
What AWS technology would be a good fit? I have no idea and no clear pattern on when or how often the sets change. Also, I want it to be as strongly consistent as possible. And of course, I want it to be as serverless and cost-effective as possible. :)
Options could be:
sets stored in Aurora, match processing for a new item in A would need to query the full set B from the database each time
sets stored in DynamoDB, maybe with DynamoDB stream in the background; match processing for a new item in A would need to query the full set B from Dynamo; but spiky load, not a good fit because of unclear read/write provisioning
have each set in its own "static" Kinesis stream where match processing reads through items but doesn't trim. Streams to be replaced with fresh sets regularly
My pain point is: While processing items from A there might be thousands of items in B to be matched. And I want to avoid having to load the full set B from some database every time I process an item from A. I was thinking about some caching of sets but then would need a good option to invalidate that cache whenever something changes.