Informatica: taking a very long time when doing an insert

I have one mapping which includes just one source table and one target table. The source table has 100 columns and around 33xxxx records, and I need to use this tool to insert into the target table; the logic is insert only. The Informatica version is 9.6.1 and the database is SQL Server 2012.
After I run the workflow, it inserts at about 5x/s. The speed is too slow. I think it may be related to the number of columns.
Can anyone help me with how to increase the speed?
Thanks a lot

I think I know the reason why it happened: there are two ntext fields in this table. That's why it takes a very long time.

You can try the options below:
1) Use the bulk option for the 'Target Load type' attribute in the session if the target table doesn't have any indexes or keys on it
2) If there is any SQL override in the Source Qualifier, try to tune the query
3) Search for 'BUSY' in the session log and note down the busy percentages of each thread. Based on the thread percentages you will be able to identify exactly which thread is taking the most time (Reader, Transformation, Writer)
4) Try to use Informatica partitions, through which you can achieve parallel processing.
Thanks and Regards,
Raj

Consider the following points to increase performance:
Increase the "commit interval" size in the session-level properties.
Use "bulk load" in the session-level properties.
You can also use "partitioning" at the session level; to do this you need the partitioning license.
If your source is a database and you are doing a SQL override in the Source Qualifier transformation, then you can also use hints to increase performance (a hedged sketch of such an override follows).
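As an illustration of that last point, here is a minimal sketch of a Source Qualifier SQL override with SQL Server hints; the table and column names are placeholders, not taken from the original mapping:

-- Hypothetical source-side override; adjust the names to your actual source table.
SELECT src.col1,
       src.col2
FROM   dbo.SourceTable AS src WITH (NOLOCK)  -- table hint: read without taking shared locks
OPTION (MAXDOP 4)                            -- query hint: allow a parallel read

Whether this helps depends on the source workload; if the reader thread is not the bottleneck (see the BUSY percentages mentioned above), hints will not change much.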


PDI - Update field value in Logging tables

I'm trying to create a transformation that can change a field value in the DB (PostgreSQL is what I use).
Case:
In the PostgreSQL DB I have a table called Monitoring, and it has several fields like id, date, starttime, endtime, duration, transformation name, status, desc. All those values I get from Transformation Logging.
So, when I run the transformation it will insert into the Monitoring table and set the status field to Running, and when it is done it will update the status to Finish. What I'm trying to do is define the values in the table fields myself rather than take them from Transformation Logging, so I can customize the values the way I want.
The goal is to update the transformation status value from 'running' to 'finish/error/abort etc.' in my DB using Pentaho, and display that status in a web app.
I have been thinking of using the Modified Java Script step to do it, but is there any other way? A better one, maybe. (I just need opinions about this.)
Apart from my remark, did you try the Value Mapper?
Modified Java Script is not a good idea to use; ideally, it should be avoided due to performance issues. You can use the "Add constants" step or "User Defined Java Class" as an alternative.
You cannot change the values of the built-in logging tables, for the simple reason that they are reserved for PDI's usage. This causes a known issue in the case of a hard error: for example, the status is not set to Finish when the database server crashes, or when a NullException is not caught by the PDI code.
You have some workarounds.
The simplest, the one used in the ETL-Pilot, is to test (Status=Finish OR LogDate < 15 minutes ago) in the web app.
You can update the table when the transformation is not running. For example, put an hourly (or more frequent) crontab in place that changes the status to Finish for any transformation whose LogDate is older than 15 minutes (a SQL sketch follows below). This crontab may be a simple SQL statement or be included in a transformation that also checks the table sizes and/or sends an email in case of a potential error.
You can copy the table (if that is a non-locking operation in your DB system), modify the Status column, and use this copy for your web app.
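A minimal sketch of that scheduled update, assuming a log/monitoring table named monitoring with status and logdate columns; the names are illustrative and should be adjusted to your actual table:

-- Mark entries still flagged as running after 15 minutes as finished;
-- table and column names below are assumptions, not PDI's built-in names.
UPDATE monitoring
SET    status = 'Finish'
WHERE  status = 'Running'
  AND  logdate < now() - interval '15 minutes';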

Best way to update a column of a table of tens of millions of rows

Question
What is the best way to update a column of a table of tens of millions of rows?
1) I've seen creating a new table and renaming the old one when finished.
2) I've seen updating in batches using a temp table.
3) I've seen a single transaction (I don't like this one, though).
4) I've never heard anyone recommend a cursor solution for a problem like this, and I don't think it's worth trying.
5) I've read about loading data from a file (using BCP), but haven't read whether the performance is better or not. It was not clear whether it is just for copying, or whether it would allow joining a big table with something and then bulk copying.
I would really like to have some advice here.
Priority is performance.
At the moment I'm testing solution 2) and exploring solution 5).
Additional Information (UPDATE)
Thank you for the critical thinking here.
The operation will be done during downtime.
The UPDATE will not cause row forwarding.
All the tables have indexes, 5 on average, although a few tables have around 13 indexes.
The probability that the target column is present in one of the table's indexes is something like 50%.
Some tables can be rebuilt and replaced; others can't, because they are part of a software solution and we might lose support for those.
Some of those tables have triggers.
I'll need to do this for more than 600 tables, of which ~150 range from 0.8 million to 35 million rows.
The update is always to the same column across the various tables.
References
BCP for data transfer
Actually it depends:
on the number of indexes the table contains
the size of the row before and after the UPDATE operation
type of UPDATE - would it be in place? does it need to modify the row length?
does the operation cause row forwarding?
how big is the table?
how big would the transaction log of the UPDATE command be?
does the table contain triggers?
can the operation be done in downtime?
will the table be modified during the operation?
are minimal logging operations allowed?
would the whole UPDATE transaction fit in the transaction log?
can the table be rebuilt & replaced with a new one?
what was the timing of the operation on the test environment?
what about free space in the database - is there enough space for a copy of the table?
what kind of UPDATE operation is to be performed? do additional SELECT commands have to be run to calculate the new value of every row, or is it a static change?
Depending on the answers and the results of the operation in the test environment, we could consider the fastest operations to be:
a minimally logged copy of the table
an in-place UPDATE operation, preferably in batches (a sketch follows below)
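For the batched in-place approach, here is a minimal T-SQL sketch; the table name dbo.BigTable, the column TargetCol, the new value, and the batch size are all placeholders, not taken from the question:

-- Update in small chunks so each transaction (and its log usage) stays bounded.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) t
    SET    t.TargetCol = 'new value'
    FROM   dbo.BigTable AS t
    WHERE  t.TargetCol <> 'new value'     -- only touch rows not yet updated
        OR t.TargetCol IS NULL;

    SET @rows = @@ROWCOUNT;               -- stop when a batch updates nothing
END;

For the copy-and-swap approach, a SELECT ... INTO a new table followed by sp_rename can be minimally logged under the SIMPLE or BULK_LOGGED recovery model, but indexes, triggers, and constraints then have to be recreated by hand, which the question notes is not possible for every table.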

DynamoDB Scan with no FilterExpression vs Query

I have created a DynamoDB table and a Global Secondary Index on that table. I need to fetch all data from the GSI of that table.
There are two options:
Scan operation with No Filter Expression.
Query operation with no condition.
I need to find out which one has better performance, so that I can start my implementation.
I have read a lot about the DynamoDB Scan and Query operations but could not resolve my query.
Please help me in resolving my query.
Thanks in advance.
Abhishek
They will both impose the same performance overhead. So choosing either should be okay.
You should think about adding optimizations on top of whichever approach you use - for instance, performing parallel scans as mentioned in the best practices:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScanGuidelines.html
or caching data in your application
Do note that parallel scans will eat up your provisioned throughput.
Another thing to watch out for while making your decision is how likely the query pattern is to change. Do you plan on adding filters in the future? If so, then Query would be better, since Scan loads all the data (consuming provisioned read capacity) and then filters the results.

Changing Length of Siebel Column

Suppose we have an existing Siebel column, and this column has a corresponding mapped EIM column as well. If I change the length of this Siebel base table's column from varchar(100) to varchar(200) by running an ALTER query from the backend, how will it impact the EIM process? Will the import process be successful?
Regards,
Robin
If you are interested in knowing conceptually, here are the implications that I can foresee.
a) A table column added using ALTER TABLE is virtually useless, as the application won't be able to use it because its definition is missing from the Siebel Repository.
b) If you change the length of an existing column, the application will still use the length mentioned in the Siebel Repository.
c) The EIM process will ignore your new column length, as it loads the data dictionary before running the job.
d) And finally, during code migration you will have to run the ALTER TABLE every time, since the DDLSync process cannot take care of your scenario.
I would advise you not to alter the length of an existing vanilla table column, and instead extend the database table by adding a new column. Just as the other poster mentioned, you should do this using Siebel Tools. You will then also need to add a reference for this new field in the EIM components (this you also do using Siebel Tools).
This is a best practice. If your client ever had a Siebel code review done by Oracle, you would be told to do what I described above (not what you were considering doing).
Changing the column length using the ALTER TABLE command will only change it in the database layer, which will have no repercussions from a Siebel standpoint. The EIM tables will still be valid, as they will use the column length defined in the repository via Tools. If you don't change it in Tools and apply the table, I don't think the changes will take effect.
I would not recommend that you do this. In this case, probably nothing will go wrong: EIM columns will load data that is up to 100 characters long, but from the GUI you could insert up to 200 characters. Something unexpected could still go wrong; we would need to know your application better to answer this question.

Can data be changed in my tables that will break batch updating while it modifies multiple rows?

I have an update query that is based on the result of a select, typically returning more than 1000 rows.
If some of these rows are updated by other queries before this update can touch them, could that cause a problem with the records? For example, could they get out of sync with the original query?
If so would it be better to select and update individual rows rather than in batch?
If it makes a difference, the query is being run on Microsoft SQL Server 2008 R2
Thanks.
No.
A table cannot be updated while something else is in the process of updating it.
Databases use concurrency control and have ACID properties to prevent exactly this type of problem.
I would recommend reading up on isolation levels. The default in SQL Server is READ COMMITTED, which means that other transactions cannot read data that has been updated but not committed by a given transaction.
This means that data returned by your select/update statement will be an accurate reflection of the database at a moment in time.
If you were to change your database to READ UNCOMMITTED, then you could get into a situation where the data from your select/update is out of sync.
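As a quick illustration (a sketch only; the choice of level is up to you):

-- READ COMMITTED is the default: readers do not see other sessions' uncommitted changes.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Dirty reads only become possible if you explicitly opt in:
-- SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DBCC USEROPTIONS;  -- shows, among other settings, the isolation level in effect for this session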
If you're selecting first, then updating, you can use a transaction:
BEGIN TRAN
-- your select WITHOUT LOCKING HINT
-- your update based upon select
COMMIT TRAN
However, if you're updating directly from a select, then there's no need to worry about it. A single transaction is implied.
UPDATE mt
SET    mt.value = mot.value
FROM   mytable AS mt
JOIN   myOtherTable AS mot
       ON mot.id = mt.id   -- join condition shown for illustration
BUT... do NOT do the following, otherwise you'll run into a deadlock:
UPDATE mt
SET    mt.value = mot.value
FROM   mytable AS mt
JOIN   myOtherTable AS mot WITH (NOLOCK)   -- the NOLOCK hint is the problem here
       ON mot.id = mt.id