PDI - Update field value in Logging tables - kettle

I'm trying to create a transformation that can change a field value in the DB (PostgreSQL is what I use).
Case:
In the PostgreSQL DB I have a table called Monitoring, and it has several fields like id, date, starttime, endtime, duration, transformation name, status, desc. All those values I get from Transformation Logging.
So, when I run the transformation it inserts into the Monitoring table and sets the status field to Running, and when it is done it updates the status to Finish. What I'm trying to do is define the values in the table fields myself, rather than taking them from Transformation Logging, so I can customize the values the way I want.
The goal is to update the transformation status value from 'running' to 'finish/error/abort etc.' in my DB using Pentaho, and to display that status in a web app.
I have been thinking of using the Modified Java Script step to do it, but is there maybe another, better way? (I just need an opinion about this.)

Apart from my remark, did you try the Value Mapper?

Modified Java Script is not a good step to use; ideally it should be avoided because of its performance cost. You can use the "Add constants" step or a "User Defined Java Class" step as an alternative.

You cannot change the values of the built-in logging tables, for the simple reason that they are reserved for PDI's own use. This causes a known issue in case of a hard error: for example, the status is not set to Finish when the database server crashes, or when a NullException is not caught by the PDI code.
You have some workarounds.
The simplest, the one used in the ETL-Pilot, is to test (Status = Finish OR LogDate < 15 minutes ago) in the web app.
You can update the table when the transformation is not running. For example, set up an hourly (or more frequent) crontab that sets to Finish the status of any transformation whose LogDate is older than 15 minutes. This crontab may run a simple SQL statement, or a transformation that also checks the table sizes and/or sends an email in case of a potential error (see the sketch after this list).
You can copy the table (if that is a non-locking operation in your DB system), modify the Status column, and use the copy for your web app.
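
To illustrate the crontab workaround, here is a minimal Python sketch using psycopg2 (the database is PostgreSQL per the question). The table and column names (monitoring, status, logdate) and the connection details are assumptions; adjust them to your actual logging schema.

    #!/usr/bin/env python
    # Hourly cron job: mark as Finish any transformation still flagged as
    # Running whose last log entry is older than 15 minutes.
    import psycopg2

    # Connection details are placeholders for illustration.
    conn = psycopg2.connect(dbname="etl", user="etl", password="secret",
                            host="localhost")
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute("""
            UPDATE monitoring
               SET status = 'Finish'
             WHERE status = 'Running'
               AND logdate < now() - interval '15 minutes'
        """)
    conn.close()

The same WHERE clause, used in a SELECT, is essentially the test the web app would run under the first workaround.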

Related

Informatica: taking a very long time when doing an insert

I have one mapping that includes just one source table and one target table. The source table has 100 columns and around 33xxxx records; I need to use this tool to insert them into the target table, and the logic is insert only. The Informatica version is 9.6.1 and the database is SQL Server 2012.
After I run the workflow, it inserts at 5x/s. The speed is too slow. I think it may be related to the number of columns.
Can anyone help me increase the speed?
Thanks a lot
I think I know the reason why it happened: there are two ntext fields in this table. That's why it takes so long.
You can try the options below:
1) Use the Bulk option for the 'Target load type' attribute in the session, if the target table doesn't have any indexes or keys on it.
2) If there is any SQL override in the Source Qualifier, try to tune the query.
3) Search for 'BUSY' in the session log and note down the busy percentage of each thread. Based on those percentages you will be able to identify exactly which thread (Reader, Transformation, or Writer) is taking the most time.
4) Try to use Informatica partitions, through which you can achieve parallel processing.
Thanks and Regards,
Raj
Consider the following points to increase performance:
Increase the "commit interval" size in the session-level properties.
Use "bulk load" in the session-level properties.
You can also use "partitioning" at the session level; to do this you need a partitioning license.
If your source is a database and you are doing a SQL override in the Source Qualifier transformation, then you can also use "Hints" to increase performance.

Trigger Informatica workflow based on the status column in an Oracle table

I want to implement the scenario below without using a PL/SQL procedure or trigger.
I have a table called emp_details with columns (empno, ename, salary, emp_status, flag, date1).
If someone updates the columns so that emp_status = 'abc' and flag = 'y', Informatica WF 1, which is continuously running, checks for the emp_status value 'ABC'.
If it finds any records, it queries all of them and invokes WF 2.
WF 1 will pass the values ename, salary, and date1 to WF 2 (WF 2 will insert the records into the table emp_details2).
How can I do this using an Informatica approach instead of PL/SQL or a trigger?
If you want to achieve this in real time, write the output of WF1 to a message queue, and in the second workflow, WF2, subscribe to the message queue produced by WF1.
If you have a batch process in place, produce an output file from WF1 and use this output file in WF2. You can easily set up this dependency using job schedulers.
I don't understand why you need two workflows in the first place. Why not accomplish the emp_details2 table updates with the very same workflow that is looking for changes?
Anyway, this can be done using an indicator file:
WF1, running continuously, should create a file if any changes have been found.
WF2 should be running continuously with an EventWait task set to wait for the indicator file specified above. Once the file is found, WF2 should use an Assignment task to rename/delete the file, then fetch the desired data from the source and populate the emp_details2 table.
If you need it this way, you can also pass the data through the indicator file itself; a minimal sketch of this handshake follows.
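
Outside of Informatica, the indicator-file handshake looks like the following Python sketch; the file path, polling interval, and function names are arbitrary choices for illustration (in Informatica, the waiting side is the EventWait task and the rename is done by the Assignment task):

    import os
    import time

    INDICATOR = "/shared/wf1_changes.ind"  # path is an assumption

    # Producer side (WF1's role): create the file when changes are found.
    def signal_changes():
        with open(INDICATOR, "w") as f:
            f.write("changes found\n")  # data for WF2 could be written here too

    # Consumer side (WF2's role): wait for the file, claim it, then work.
    def wait_and_process(poll_seconds=10):
        while not os.path.exists(INDICATOR):
            time.sleep(poll_seconds)
        os.rename(INDICATOR, INDICATOR + ".processing")  # claim the file
        # ... fetch the desired data from source and populate emp_details2 ...
        os.remove(INDICATOR + ".processing")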
You can do this in a single workflow. Create a dummy session that checks for the flag in the table, then divide the flow into two based on the link conditions below.
Flow one: link condition Session.Status = SUCCEEDED AND SOURCE_SUCCESS_ROWS >= 1; then run your actual session, which will load the data.
Flow two: link condition Session.Status = SUCCEEDED AND SOURCE_SUCCESS_ROWS = 0; connect this to a Control task and mark the workflow as complete.
Make sure you schedule the workflow at the Informatica level to run continuously.
Cheers

Changing Length of Siebel Column

Suppose we have an existing Siebel column, and this column also has a corresponding mapped EIM column. If I change the length of this Siebel base table's column from varchar(100) to varchar(200) by running an ALTER query from the backend, how will it impact the EIM process? Will the import process be successful?
Regards,
Robin
If you are interested in knowing conceptually, here are the implications that I can foresee:
a) A table column added using ALTER TABLE is virtually useless, as the application won't be able to use it because its definition is missing from the Siebel Repository.
b) If you change the length of an existing column, the application will still use the length defined in the Siebel Repository.
c) The EIM process will ignore your new column length, as it loads the data dictionary before running the job.
d) And finally, during code migration you will have to run the ALTER TABLE every time, since the DDLSync process cannot take care of your scenario.
I would advise you not to alter the length of an existing vanilla table column, and instead extend the database table to add a new column. Just as the other poster mentioned, you should do this using Siebel Tools. You will then also need to add a reference to this new field in the EIM components (this you also do using Siebel Tools).
This is a best practice. If your client ever had a Siebel code review done by Oracle, you would be told to do what I described above (not what you were considering doing).
Changing the column length using the ALTER TABLE command will only change it in the database layer, which will have no repercussions from a Siebel standpoint. The EIM tables will still be valid, as they will be using the column length defined in the repository applied by Tools. If you don't change it in Tools and apply the table, I don't think the changes will work.
I would not recommend that you do this. In this case, probably nothing will go wrong: EIM columns will load data that is up to 100 characters long, but from the GUI you could insert up to 200 characters. Still, something unexpected can go wrong; we would need to know your application better to answer this question.

Can data be changed in my tables that will break batch updating while it modifies multiple rows?

I have an update query that is based on the result of a select, typically returning more than 1000 rows.
If some of these rows are updated by other queries before this update can touch them, could that cause a problem with the records? For example, could they get out of sync with the original query?
If so would it be better to select and update individual rows rather than in batch?
If it makes a difference, the query is being run on Microsoft SQL Server 2008 R2
Thanks.
No.
A table cannot be updated while something else is in the process of updating it.
Databases use concurrency control and have ACID properties to prevent exactly this type of problem.
I would recommend reading up on isolation levels. The default in SQL Server is READ COMMITTED, which means that other transactions cannot read data that has been updated but not committed by a given transaction.
This means that data returned by your select/update statement will be an accurate reflection of the database at a moment in time.
If you were to change your database to READ UNCOMMITTED, then you could get into a situation where the data from your select/update is out of sync.
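
To make the difference concrete, here is a hypothetical Python sketch using pyodbc (the connection string, table, and values are assumptions for illustration) showing a dirty read under READ UNCOMMITTED:

    import pyodbc

    # Connection string is a placeholder; adjust for your server.
    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=localhost;DATABASE=test;Trusted_Connection=yes")

    # Session 1 updates a row but does not commit yet.
    writer = pyodbc.connect(CONN_STR, autocommit=False)
    writer.execute("UPDATE mytable SET value = 999 WHERE id = 1")

    # Session 2, at READ UNCOMMITTED, can see the uncommitted value.
    reader = pyodbc.connect(CONN_STR, autocommit=True)
    reader.execute("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
    row = reader.execute("SELECT value FROM mytable WHERE id = 1").fetchone()
    print(row)  # dirty read: 999, even though it may be rolled back

    writer.rollback()  # at READ COMMITTED, the reader would have blocked instead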
If you're selecting first, then updating, you can use a transaction:
BEGIN TRAN
-- your select WITHOUT LOCKING HINT
-- your update based upon select
COMMIT TRAN
However, if you're updating directly from a select, there's no need to worry about it: a single statement runs in its own implicit transaction.
UPDATE m
SET m.value = mot.value
FROM mytable m
JOIN myOtherTable mot ON mot.id = m.id -- join column assumed for illustration
BUT... do NOT do the following, otherwise you'll run into a deadlock
UPDATE m
SET m.value = mot.value
FROM mytable m
JOIN myOtherTable mot WITH (NOLOCK) ON mot.id = m.id

Memcache db models to make search more efficient

I need to set up some kind of e-store with search functionality.
For every search request I have to query a structure like this:
product:
-name
-tags
--tag
-ingredients
--ingredient
---tags
----tag
---options
----option
-----option details
-variants
--variant
---tags
----tag
---options
----option measure
----value
---price
Now imagine the number of queries... The database is normalized (to second normal form, I guess).
It seems to me that one obvious solution here is to store each fetched model result set (product set, ingredient set, attribute set, tag set, etc.) in memory for a very long time (products and their attributes are updated infrequently, and only by an admin) and query from there.
So what do you think? Is there a better way to reduce the DB query count?
Another option I thought about is to use Sphinx, but I don't need full-text search at all, just exact matches on tag-like fields.
Thank you in advance!
On my Google App Engine app I normally move things from the datastore into memcache and work with them there, since querying for the data can take a lot of time. Memcache, in my case, returns the data with less CPU load than fetching it from the datastore, which can take a number of queries until it gets what it is looking for.
I would recommend setting a long timeout on your memcache entries so that memcache doesn't flush them more often than you expect. I think the maximum timeout is up to one month, but normally setting it for a couple of days will suffice.
You can always add code to flush the memcache entry when the data for a product has been updated, so that you take the DB hit again, but only once this time. A minimal sketch of this pattern follows.
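
Here is a minimal sketch of that pattern in Python with pymemcache; the key scheme, the two-day expiry, and the load_product_from_db helper are assumptions for illustration:

    import json
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))
    TTL = 2 * 24 * 3600  # a couple of days, per the recommendation above

    def get_product(product_id):
        key = "product:%d" % product_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)          # cache hit: no DB queries
        product = load_product_from_db(product_id)  # hypothetical DB helper
        cache.set(key, json.dumps(product), expire=TTL)
        return product

    def on_product_updated(product_id):
        # Flush the entry when an admin edits the product, so the next
        # request takes the DB hit once and re-caches the fresh data.
        cache.delete("product:%d" % product_id)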