Auditing Tables in Informatica

We want to maintain auditing of tables. My questions are:
1) Is the commit interval in Informatica stored anywhere, in any variable, so that we can maintain the record count for every commit interval?
2) Is there any method/script to read the stats from the session log and save them in an audit table?
3) If there are multiple targets in my mapping, then after execution the Monitor shows the target success count and target reject count as a total across all targets in the mapping. How do I get the individual success and reject counts per target?

You need to use the Informatica metadata tables, which Informatica doesn't recommend (still, I am mentioning them for your reference). Otherwise, your options are to create a sh/bat script that pulls this information from the session log, or to create a mapplet that collects these statistics and add that mapplet to every Informatica mapping. To answer your questions:
1) Yes, the commit interval is stored in the Informatica repository table OPB_TASK_ATTR where ATTR_ID = 14; select ATTR_VALUE to read it (see the sketch below).
2) No; you can either use an Informatica mapplet to collect such stats or a shell script that parses the session log.
3) Yes, this is possible. Use the Informatica repository view REP_SESS_TBL_LOG for this purpose; it gives each target's statistics for a particular session run (also sketched below).
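A minimal sketch of both repository queries, assuming you have read access to the repository schema; OPB_TASK_ATTR and REP_SESS_TBL_LOG are standard PowerCenter repository objects, but column names can vary by version, so verify against your repository:

-- Commit interval of session tasks (ATTR_ID = 14 holds the commit interval)
SELECT TASK_ID, ATTR_VALUE AS COMMIT_INTERVAL
FROM OPB_TASK_ATTR
WHERE ATTR_ID = 14;

-- Per-target success/reject counts for one session's runs
SELECT SESSION_NAME, TABLE_NAME, SUCCESSFUL_ROWS, FAILED_ROWS
FROM REP_SESS_TBL_LOG
WHERE SESSION_NAME = 's_my_session';  -- hypothetical session name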
Koushik

Related

How to create an audit mapplet in Informatica?

I want to create an audit that can be re-used across multiple mappings to capture the source record count and target record count, where the source database is Oracle and the target database is SQL Server.
We are using it for source-to-staging mappings.
It's all there in the metadata tables. There's no need to add anything that will make your loads longer and more complex.
You can review this Framework for some ideas.
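For instance, here is a hedged sketch of a repository query that captures source and target row counts per session run, using the PowerCenter repository view REP_SESS_LOG (column names may differ slightly across versions):

-- Session-level source and target counts straight from the repository
SELECT SUBJECT_AREA,
       SESSION_NAME,
       SUCCESSFUL_SOURCE_ROWS,
       FAILED_SOURCE_ROWS,
       SUCCESSFUL_ROWS AS TARGET_SUCCESS_ROWS,
       FAILED_ROWS     AS TARGET_FAILED_ROWS
FROM REP_SESS_LOG
WHERE SESSION_NAME = 's_src_to_stg_orders';  -- hypothetical session name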

Reflecting changes on big tables in HDFS

I have an order table in the OLTP system.
Each order record has an OrderStatus field.
When an end user creates an order, the OrderStatus field is set to "Open".
When somebody cancels the order, the OrderStatus field is set to "Canceled".
When order processing finishes (the order is transformed into an invoice), the OrderStatus field is set to "Close".
There are more than one hundred million records in the table in the OLTP system.
I want to design and populate a data warehouse and data marts on the HDFS layer.
In order to design the data marts, I need to import the whole order table into HDFS and then reflect changes on the table continuously.
First, I can import the whole table into HDFS in the initial load by using Sqoop. It may take a long time, but I will do this only once.
When an order record is updated or a new order record is entered, I need to reflect the change in HDFS. How can I achieve this for such a big transaction table?
Thanks
One of the easier ways is to work with database triggers in your OLTP source db: every time an insert or update happens, use the trigger to push a change event towards your Hadoop environment (a trigger sketch follows below).
On the other hand (this depends on the requirements of your data users), it might be enough to reload the whole data dump every night.
Also, if there is some kind of last-changed timestamp, it might be possible to load only the newest data and do some kind of delta check.
This all depends on your data structure, your requirements and the resources at hand.
There are several other ways to do this, but usually those involve messaging, development and new servers, and I suppose in your case that infrastructure or those resources are not available.
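As a hedged illustration of the trigger approach, here is an Oracle-style trigger that records each change in a side table (orders and orders_changes are hypothetical names) which a downstream job could then ship to Hadoop; treat this staging-table variant as one possible interpretation rather than the only design:

-- Log every order insert/update so a downstream job can ship the delta to HDFS
CREATE OR REPLACE TRIGGER trg_orders_change
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW
BEGIN
    INSERT INTO orders_changes (order_id, order_status, change_time)
    VALUES (:NEW.order_id, :NEW.order_status, SYSTIMESTAMP);
END;
/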
EDIT
Since you have a last-changed date, you might be able to pull the data with a statement like
SELECT columns FROM table WHERE lastchangedate > (now - 24 hours)
or whatever your loading interval might be.
Then process the data with Sqoop or ETL tools or the like. If a record is already available in your Hadoop environment, you want to UPDATE it; if it is not, INSERT it. This combination is sometimes called an UPSERT (see the sketch below).
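As one hedged example, if the Hadoop target is a Hive ACID (transactional) table, the upsert can be expressed as a single MERGE statement (Hive 2.2+); the table and column names here are hypothetical:

-- Upsert the daily delta into the full orders table
MERGE INTO orders_dwh AS t
USING orders_delta AS s   -- the delta pulled by Sqoop from the OLTP system
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET
    order_status = s.order_status,
    last_changed = s.last_changed
WHEN NOT MATCHED THEN INSERT VALUES
    (s.order_id, s.order_status, s.last_changed);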

API Gateway generating 11 SQL queries per second on REG_LOG

We have sysdig running on our WSO2 API Gateway machine and we notice that it fires a large number of SQL queries at the database for a minute, then waits a minute and repeats.
Every minute it goes wild, waits for a minute, and then goes wild again with a request of the following format:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
[screenshot: sysdig output for the API Gateway process]
This particular query is the result of the registry indexing task that runs in the background. The REG_LOG table is queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped. However, you can configure the frequency of the indexing task through the following parameter in registry.xml. See [1] for more information.
indexingFrequencyInSeconds
If this table has filled up, you can clean the data using a simple SQL query. However, when deleting the records, be careful not to delete all the data: the latest record for each resource path should be left in the REG_LOG table, since reindexing requires at least one reference to each resource path (see the sketch below).
Also, if required, before clearing up the REG_LOG table you can take a dump of the data in case you do not want to lose old records. Hope this answer provides the information you require.
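A hedged sketch of such a cleanup, keeping only the most recent REG_LOG row per resource path and tenant; correlated-delete syntax varies by database, so treat this as illustrative and test it on a copy first:

-- Delete all but the latest log entry for each resource path and tenant
DELETE FROM REG_LOG rl
WHERE rl.REG_LOGGED_TIME < (
    SELECT MAX(rl2.REG_LOGGED_TIME)
    FROM REG_LOG rl2
    WHERE rl2.REG_PATH = rl.REG_PATH
      AND rl2.REG_TENANT_ID = rl.REG_TENANT_ID
);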
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing

z/OS CICS DB2 COBOL program to process database entries concurrently

I have a DB2 table containing a large number of records to be sent out to an external system via MQ. There is a column in the table containing the record status (sent, or pending to be sent).
I wrote a scheduler program to continually check whether there are records in the table that are pending to be sent. If so, the program sends the pending records out and updates the status accordingly.
That scheduler will be started in multiple transactions, so I am expecting multiple instances of the same program to be running concurrently.
My question is: how do I prevent the same records from being picked up and sent by multiple schedulers at the same time?
I was told to use a cursor with row-level locks, but I am not sure how this works.
Remarks: I am working with CICS COBOL in a z/OS environment.
I think you have a design problem. We accomplish something similar to what you are trying to do by having a trigger on the DB2 table which sends an MQ message to a queue that is defined to trigger a CICS transaction.
In your case, you can probably dispense with CICS altogether and just do as @BillWoodger suggests and send the message when you set the pending flag.
One way to do this is as follows:
1) Determine the clustering index for the large DB2 table.
2) Then have different instances of the program look only at different portions of this clustering index. E.g., if the clustering index is on a numeric ID field that is unique, like Account ID, and the ID is a 9-digit integer, have instance one look at account IDs from 0 to 099999999, instance two at account IDs from 100000000 to 199999999, and so on (a sketch follows this list).
This way you can declare your cursor WITH HOLD and perform updates and commits as needed.
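A minimal sketch of how one instance's cursor might be declared, assuming a hypothetical ORDERS_OUT table with an ACCOUNT_ID clustering key and a SENT flag; each program instance would be given its own range:

-- Instance two of N: only rows in this instance's slice of the clustering index
DECLARE PENDING_CSR CURSOR WITH HOLD FOR
    SELECT ORDER_ID, MSG_PAYLOAD
    FROM ORDERS_OUT
    WHERE SENT <> 'Y'
      AND ACCOUNT_ID BETWEEN 100000000 AND 199999999
    FOR UPDATE OF SENT;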
CICS will coordinate SQL transactions with DB2 for you. Each of the CICS transactions you run will be able to select and lock rows for update, and DB2 will coordinate between all of them and prevent the same records from being selected twice if you do two things.
When you read the rows that qualify, use a SELECT ... FOR UPDATE type operation; this will lock every row you retrieve and prevent other concurrent transactions from accessing the same ones (it also requires that you BIND with row-level locks unless you want full pages locked; see your DBA about the options based on row size).
Before you release the records or end the CICS transaction, you must do something to flag said records as "sent" so that other, waiting, concurrent transactions do not grab them and send them again. This could be as simple as adding a sent Y/N column to the table and adding "AND sent <> 'Y'" to your select WHERE clause. After you have sent the records, do an UPDATE on those records and set sent = 'Y' (sketched below). Depending on your row data, you could maybe use something else, like the time sent; it just needs to be something that excludes the row from reselection.
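Putting both pieces together, a hedged embedded-SQL sketch; the table and column names are hypothetical, and the SKIP LOCKED DATA clause (available on recent DB2 for z/OS versions) lets a concurrent instance pass over rows another instance has already locked, so verify the clause ordering against your DB2 level:

-- Lock pending rows so no other concurrent transaction can pick them up
DECLARE SEND_CSR CURSOR WITH HOLD FOR
    SELECT ORDER_ID, MSG_PAYLOAD
    FROM ORDERS_OUT
    WHERE SENT <> 'Y'
    FOR UPDATE OF SENT
    SKIP LOCKED DATA;

-- After each row is sent to MQ, flag it so it cannot be reselected
UPDATE ORDERS_OUT
   SET SENT = 'Y'
 WHERE CURRENT OF SEND_CSR;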

Is it possible to trigger Informatica workflow using a data in database table?

I am a newbie in ETL and will be using Informatica soon for one of the requirements we have.
The requirement is that Informatica needs to monitor a table in Oracle for certain "trigger data" and as soon as that data is available in that table, Informatica should start executing steps in its workflow.
Is it possible to do this? If yes, could someone please point me to a link/document where this is explained.
Many thanks.
No, it is not possible (checked in PowerCenter 9.5.1).
The Event-Wait task supports only two types of events:
predefined events (the task instructs the Integration Service to wait for the specified indicator file to appear before continuing),
user-defined events (the event is triggered by an Event-Raise task somewhere in the workflow).
Yes, it is possible, and you will need a script that can be created with the following steps:
--Create a shell script that checks whether data is present in the table or not; you can do this just by taking a count of the table (a sample query is sketched after this list).
--If the count is greater than zero, create an empty file, say DUMMY.txt (using the touch command), at a specified path.
--In your Informatica scheduling, either by scheduler or by script, check every 5 minutes whether the file is present.
--If the file is present, call your Informatica workflow and delete the DUMMY file.
--Once the workflow is completed, start the process again.
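A hedged sketch of the check the script could run each cycle; TRIGGER_TABLE and its PROCESSED flag are hypothetical names standing in for your Oracle trigger-data table:

-- Returns a non-zero count when new trigger data has arrived
SELECT COUNT(*) AS PENDING_ROWS
FROM TRIGGER_TABLE
WHERE PROCESSED = 'N';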