Trigger Informatica workflow based on the status column in an Oracle table - informatica

I want to implement the scenario below without using a PL/SQL procedure or trigger.
I have a table called emp_details with columns (empno, ename, salary, emp_status, flag, date1).
If someone updates the columns to emp_status='abc' and flag='y', Informatica WF1, which is running continuously, should detect the emp_status value "ABC".
If it finds a record (or records), it should query all of those records and invoke WF2.
WF1 will pass the values ename, salary, date1 to WF2 (WF2 will insert the records into the table emp_details2).
How can I do this with an Informatica approach instead of PL/SQL or a trigger?

If you want to achieve this in real time, write the output of WF1 to a message queue and have the second workflow, WF2, subscribe to the message queue produced by WF1.
If you have a batch process in place, produce an output file from WF1 and use this output file in WF2. You can easily set up this dependency using job schedulers.

I don't understand why you need two workflows in the first place. Why not accomplish the emp_details2 table updates with the very same workflow that is looking for differences?
Anyway, this can be done using an indicator file:
WF1, running continuously, should create a file if any changes have been found.
WF2 should also be running continuously, with an Event-Wait task set to wait for the indicator file specified above. Once the file is found, WF2 should use a Command task to rename/delete the file, then fetch the desired data from the source and populate the emp_details2 table.
If you need it this way, you can pass the data through the indicator file.

You can do this in a single workflow. Create a dummy session which checks for the flag in the table (a possible check query is sketched below), then divide the flow into two based on the following link conditions:
Flow one: link condition Session.Status = SUCCEEDED AND SOURCE_SUCCESS_ROWS (the row count) >= 1, then run your actual session which will load the data.
Flow two: link condition Session.Status = SUCCEEDED AND SOURCE_SUCCESS_ROWS = 0; connect this to a Control task and mark the workflow as complete.
Make sure you schedule the workflow at the Informatica level to run continuously.
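A minimal sketch of what the check session's source query could be, based on the table and values from the question (adjust the case of the literals to match your data):
SELECT ename, salary, date1
FROM emp_details
WHERE emp_status = 'ABC'
AND flag = 'y';
The link conditions above then just branch on how many rows that session read.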
Cheers

Related

How to avoid "ALREADY EXISTS" Mutation while writing Mutation to Spanner Table?

I have a certain use case where I'm trying to write around terabytes of data to Spanner. We are extracting that data from DynamoDB and exporting it in bzip2 format to Google Cloud Storage. Basically, we have primary IDs in Spanner and we have to ignore rows that already exist in Spanner. So I have written the code below to achieve that.
Mutation.WriteBuilder mutation = Mutation.newInsertBuilder(spannerTable.get());
I used the insert builder because I don't want to update existing rows in Spanner. I'm using the code below to write rows to Spanner, setting the FailureMode.
results2.apply("Write Mutations to Spanner",SpannerIO.write()
.withInstanceId(spannerInstanceId)
.withDatabaseId(spannerDatabaseId)
//.withBatchSizeBytes(2000000)
//.withMaxNumMutations(maxNumMutations)
.withFailureMode(FailureMode.REPORT_FAILURES)
);
But the problem with this code is that Dataflow is retrying the entire batch because of the "ALREADY EXISTS" mutation. I can't use FailureMode.FAIL_FAST because it stops the entire pipeline. I also tried setting a minimal MaxNumMutations value to decrease the batch size (basically to decrease the probability of an 'already exists' record in a mutation batch), but overall performance suffers. So is there any way I can stop the retry mechanism for mutations on records that already exist?
You should use an InsertOrUpdate mutation instead of an Insert mutation. That will do exactly what you expect in this case: insert the row if it is not already there, and otherwise update it. In your case the update will not do anything useful, as no values should change, but it will stop your pipeline from breaking.
Example:
Mutation mutation = Mutation.newInsertOrUpdateBuilder("FOO")
    .set("ID").to(1)
    .set("BAR").to("BAZ")
    .build();
You can also find the above in the official Spanner documentation.

PDI - Update field value in Logging tables

I'm trying to create a transformation that can change a field value in the DB (PostgreSQL is what I use).
Case:
In the PostgreSQL DB I have a table called Monitoring with several fields: id, date, starttime, endtime, duration, transformation name, status, desc. All of those values come from Transformation Logging.
So, when I run the transformation it inserts a row into the Monitoring table and sets the status field to Running, and when it is done it updates the status to Finish. What I'm trying to do is define the value of that field myself instead of taking it from Transformation Logging, so I can customize the value the way I want.
The goal is to update the transformation status value from 'running' to 'finish/error/abort etc.' in my DB using Pentaho, and to display that status in a web app.
I was thinking of using the Modified Java Script step to do it, but is there another, perhaps better, way? (Just looking for opinions on this.)
Apart from my remark, did you try the Value Mapper?
Modified Java Script is not a good idea to use; ideally it shouldn't be used because of performance issues. You can use the "Add constants" step or a "User Defined Java Class" as an alternative.
You cannot change the values of the built-in logging tables, for the simple reason that they are reserved for PDI's own usage. This causes a known issue in case of a hard error: for example, the status is not set to Finish when the database server crashes, or when an exception is not caught by the PDI code.
You have a few workarounds:
The simplest, the one used in the ETL-Pilot, is to test (Status = Finish OR LogDate < 15 minutes ago) in the web app.
You can update the table when the transformation is not running. For example, set up an hourly (or more frequent) crontab that changes the status to Finish for any transformation whose LogDate is older than 15 minutes. This crontab may run a simple SQL statement (a sketch follows below) or be part of a transformation that also checks the table sizes and/or sends an email in case of a potential error.
You can copy the table (if that is a non-locking operation in your DB system), modify the Status column and use this copy for your web app.
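As a hedged sketch of that crontab SQL in PostgreSQL (the status/logdate column names are taken from the discussion above and may differ in your Monitoring table):
UPDATE monitoring
SET status = 'Finish'
WHERE status = 'Running'
AND logdate < NOW() - INTERVAL '15 minutes';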

Select stmt in source qualifier along with procedure call in Informatica

We have a situation where we are dealing with a relational source (Oracle). The system is built in such a way that we first have to execute a package which enables data reads from Oracle; only then can the user get results out of a select statement. I am trying to find a way to implement this in an Informatica mapping.
What we tried
1. In Pre-SQL we tried to execute the package, and in the SQL query we wrote the select statement - data is not getting loaded into the target.
2. In Pre-SQL we wrote a block in which we execute the package and, just after that (within the same begin...end block), wrote an insert statement on top of the select statement - this does insert data through the insert statement, however I am not in favor of this solution as both source and target are dummies, which will confuse people in the future.
Is there any possibility of implementing this somehow using the first option?
Please help and suggest.
Thanks
The Stored Procedure transformation is there for this purpose; configure it to execute at Source Pre-load.
Pre-SQL and the data read are not part of the same database session. From what I understand, this needs to be done within the same session, as otherwise the read is granted only for the session that executed the package.
What you can do is create a stored procedure/package that will grant the read access and then return the data. Use it as a SQL override on your Source Qualifier. This way the SQ will read the data as usual. The concept:
CREATE OR REPLACE PROCEDURE ReadMyData (p_data OUT SYS_REFCURSOR) AS
BEGIN
    GiveMeTheReadAccess;                    -- the package call that enables the read
    OPEN p_data FOR SELECT * FROM MyTable;  -- then return the data
END;
And use ReadMyData in the Source Qualifier.

Simple data update with talend

I have a job in Talend which migrates some data from one database to another... At the end of the data migration, I should update the date of last extraction in the source database with SYSDATE, so it can be used as a criterion for the next extraction. The SQL query would be something like:
UPDATE MIGR_FOLLOWUP SET LAST_EXTR = SYSDATE WHERE SYSTEM = 'TARGET3'
I'd like to do that update in Talend, and I guess it should be a component triggered by OnSubjobOK, but I just can't seem to figure out how to do this in a simple manner... The only way I can think of is using both tOracleInput and tOracleOutput components, first extracting the wanted row and then updating it, but that really doesn't sound like a good way to do this...
Can anyone point me out on how to do this?
Thanks!!
You can run arbitrary SQL by using the database row components such as tOracleRow.
If you link this with an OnSubjobOK link from your main migration, then once the main migration completes successfully it will update the LAST_EXTR field with the current time.
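For example, the Query setting of that tOracleRow component could simply hold the statement from your question (the component set-up is an assumption; the SQL itself is yours):
"UPDATE MIGR_FOLLOWUP SET LAST_EXTR = SYSDATE WHERE SYSTEM = 'TARGET3'"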
Alternatively, you could update this using a tOracleOutput component, but you would then need to have Talend define the date-time stamp itself, using something like TalendDate.getCurrentDate() in a tMap or tFixedFlowInput.

Informatica target file

I have a workflow which writes data from a table into a flat file. It works just fine, but I want to insert a blank line between records. How can this be achieved? Any pointers?
Here, you can create 2 target instances: one with the proper data, and in the other instance pass a blank line. Set the Merge Type to "Concurrent Merge" in the session properties.
Multiple possibilities -
You can prepare an appropriate data set in a relational table and afterwards dump the data from that into a flat file. For the preparation of that data set, you can insert blank rows into that relational target (see the SQL sketch below).
Send a blank line to a separate target file (based on some business condition, using a Router or something similar); after that you can use the merge files option (in the session config) to get the data into a single file.
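A minimal SQL sketch of the first option; the staging table and column names here are made up, the point is only that the ordering makes every data row come out followed by an empty one:
SELECT empno AS sort_key, 1 AS line_order, ename || ',' || salary AS out_line FROM emp_stage
UNION ALL
SELECT empno, 2, '' FROM emp_stage
ORDER BY sort_key, line_order;
Write only the out_line column to the flat file in that order.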