PDI - How to keep a transformation running even when an error occurs? - kettle

I have a transformation with several steps that runs from a batch script using Windows Task Scheduler.
Sometimes the first step or the nth step fails and that stops the entire transformation.
I want the transformation to run from start to end regardless of any errors. Is there any way of doing this?

1) One way is to use "error handling"; however, it is not available for all steps. You can right-click on a step to check whether the error handling option is available.
2) If you are getting errors because of an incorrect data type (for example, you expect an integer value but some specific record contains a string), the step may fail; to handle such situations you can use the Data Validator step.
Basically, you can implement the logic based on the transformation you have created. The above are some of the general methods.

This is what is called "error handling": even though your transformation hits some errors, you still want it to continue running.
Situations:
- Data type issues in the data stream. Ex: you have a column X of type integer, but by mistake you receive a string value; you can define error handling to capture all such records.
- While processing JSON data. Ex: the path you specify to retrieve the value of a JSON field may be missing or not resolvable for some data node; you can define error handling to capture the details of all missing paths.
- While updating a table. If you are updating a table on some key and a key coming from the input stream is not available in the table, an error will occur; you can define error handling here as well.


How can I remove junk values and load multiple .csv files (with different schemas) into BigQuery?

I have many .csv files stored in GCS, and I want to load data from the .csv files into BigQuery using the command below:
bq load 'datasate.table' gs://path.csv json_schema
I have tried this, but it gives errors; the same error occurs for many files.
(error screenshot)
How can I remove unwanted values from the .csv files before importing them into the table?
Please suggest the easiest way to load the files.
The answer depends on what you want to do with these junk rows. If you look at the documentation, you have several options:
Number of errors allowed. By default it is set to 0, which is why the load job fails at the first bad line. If you know the total number of bad rows, set this value accordingly and all of those errors will be ignored by the load job.
Ignore unknown values. If your errors occur because some lines contain more columns than defined in the schema, this option keeps such lines but only the known columns; the extra ones are ignored.
Allow jagged rows. If your errors come from lines that are too short (and that is what your error message shows) and you still want to keep the first columns (because the last ones are optional and/or not relevant), you can check this option.
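On the bq command line these options map to flags. For example (the dataset, table, paths and the error budget of 100 below are placeholders mirroring your command):
bq load --source_format=CSV --max_bad_records=100 --ignore_unknown_values --allow_jagged_rows 'dataset.table' gs://path.csv json_schema
Add --skip_leading_rows=1 as well if your files start with a header row.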
For more advanced and specific filters, you have to perform pre- or post-processing. If that is your case, let me know and I will add that part to my answer.
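As one illustration of the post-processing approach (the table and column names here are invented): load every column as STRING into a staging table, then cast and filter with SQL, for example:
SELECT
  SAFE_CAST(id AS INT64)       AS id,      -- NULL instead of an error when the value is junk
  SAFE_CAST(amount AS NUMERIC) AS amount,
  name
FROM dataset.staging_table
WHERE SAFE_CAST(id AS INT64) IS NOT NULL;  -- drop rows whose key column is not a valid integer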

PDI - Check data types of field

I'm trying to create a transformation that reads csv files and checks the data type of each field in the csv.
Like this: field A should be string(1), a single character, and field B should be an integer/number.
What I want is to check/validate: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do the check, but how do I move the file based on that status? I can't find any step to do it.
You can read the files in a loop and add steps as below:
After the data validation, filter the rows with a negative result (not matched) -> an Add constants step that sets error = 1 -> a Set Variables step that sets an ERROR variable from that field (with a default value of 0).
After the transformation finishes, add a Simple evaluation step in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the files; else ....
I hope this helps.
You can do the same as in this question. Once the files are read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you have to use a job.
Your use case is in the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you get one status per file rather than one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you do not know whether every row has been processed before you move the file.
Alternatively, you have the quick and dirty solution from your similar question: change the Filter rows to a type check, and the final Synchronize after merge to a Process files / Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step as shown there. It is more flexible if you need maintenance in the long run.

How to call a sequence function from SQL Server in Informatica

I have a port 'Number_1' in an Expression transformation in Informatica, and I connected the Number_1 port to a target SQL table. I need to generate a number for the port 'Number_1' every time I run the mapping, starting from 1 and going up to 999; once it reaches 999, the value of Number_1 should reset to 1. I'm aware there is a Sequence Generator transformation, but I need to call a sequence function from SQL Server. How can I achieve this?
Create a stored procedure in SQL Server, then use a Stored Procedure transformation to call it from Informatica.
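A rough T-SQL sketch of that approach (the sequence and procedure names are just placeholders): create a sequence that cycles from 1 to 999 and wrap it in a procedure that Informatica can call.
-- Sequence that counts 1..999 and then starts over at 1
CREATE SEQUENCE dbo.seq_number_1
    AS INT
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 999
    CYCLE;
GO
-- Procedure that returns the next value through an output parameter
CREATE PROCEDURE dbo.usp_get_next_number
    @next_number INT OUTPUT
AS
BEGIN
    SET @next_number = NEXT VALUE FOR dbo.seq_number_1;
END;
GO
In the mapping, a Stored Procedure transformation can then call dbo.usp_get_next_number and its output port can feed Number_1.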
You may call it using a Stored Procedure transformation. You may also use a stored procedure as a SQL override on the Source Qualifier, however...
I hope you know what you're doing, as this is in general a very bad idea. Each call requires communication between the Integration Service and the database, which will cause huge delays. Therefore it's much better to use Informatica's Sequence Generator or, perhaps even better if all you need is an integer port cycling round robin, a simple expression variable that increments each row and wraps back to 1 after 999.
While what maciejg says makes a lot of sense performance-wise, I've known a fair few people who were more comfortable using the native database sequencer than the built-in one (even some Informatica specialists).
The thing with the Informatica sequencer is how much flexibility it gives, and when it is set up wrong it can lead to unexpected numbers being picked.
One example I have is a sequencer used to create unique keys in a table: if you persist the value between sessions, it works fine until you select the incorrect option while reimporting the mapping.
If you choose to look up your previous ending value from a config file or table and add the value produced by the sequencer to it, then one day, when someone mistakenly sets the sequencer to persist values between runs, you will suddenly skip that many numbers in the sequence each time the session is restarted. Native DB sequencers are very basic, which makes them predictable and foolproof.

Informatica skipped records need to be written to bad files

In my case, I have an Expression transformation that uses the built-in function ERROR('transformation') to skip records whose incoming date value is not in the correct format. The skipped rows are not written to the reject files, so we have a reconciliation problem. I need the skipped rows to be written to the bad files. Please help me understand how I can achieve this.
Put the Update Strategy transformation in your mapping and flag these rows for reject (use the DD_REJECT constant).
More information: Update Strategy Transformation
Well, don't use the ERROR() function then. Use a Router transformation to write the badly formatted dates to a different target instead.

Underlying mechanism for firing SQL queries in Oracle

When we fire a SQL query in Oracle, such as
SELECT * FROM SOME_TABLE_NAME
what exactly happens internally? Is there a parser at work? Is it written in C/C++?
Can anybody please explain?
Thanks in advance to all.
Short answer is yes, of course there is a parser module inside Oracle that interprets the statement text. My understanding is that the bulk of Oracle's source code is in C.
For general reference:
Any SQL statement potentially goes through three steps when Oracle is asked to execute it. Often, control is returned to the client between each of these steps, although the details can depend on the specific client being used and the manner in which calls are made.
(1) Parse -- I believe the first action is actually to check whether Oracle has a cached copy of the exact statement text. If so, it can save the work of parsing your statement again. If not, it must of course parse the text, then determine an execution plan that Oracle thinks is optimal for the statement. So conceptually at least there are two entities at work in this phase -- the parser and the optimizer.
(2) Execute -- For a SELECT statement this step would generally run just enough of the execution plan to be ready to return some rows to the client. Depending on the details of the plan, that might mean running the whole thing, or might mean doing just a very small fraction of the work. For any other kind of statement, the execute phase is when all of the work is actually done.
(3) Fetch -- This is when rows are actually returned to the client. Generally the client has a predetermined fetch array size which sets the maximum number of rows that will be returned by a single fetch call. So there may be many fetches made for a single statement. Of course if the statement is one that cannot return rows, then there is no fetch step necessary.
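If you want to see those three phases as explicit calls, the PL/SQL DBMS_SQL API exposes them directly. Here is a small sketch (just an illustration, run against a trivial query on DUAL):
DECLARE
  c   INTEGER;
  n   INTEGER;
  val VARCHAR2(30);
BEGIN
  c := DBMS_SQL.OPEN_CURSOR;
  -- (1) Parse: the statement text is checked and a plan is built (or reused from the shared pool)
  DBMS_SQL.PARSE(c, 'SELECT dummy FROM dual', DBMS_SQL.NATIVE);
  DBMS_SQL.DEFINE_COLUMN(c, 1, val, 30);
  -- (2) Execute: run (enough of) the plan
  n := DBMS_SQL.EXECUTE(c);
  -- (3) Fetch: pull the rows back, one fetch call at a time
  WHILE DBMS_SQL.FETCH_ROWS(c) > 0 LOOP
    DBMS_SQL.COLUMN_VALUE(c, 1, val);
    DBMS_OUTPUT.PUT_LINE(val);
  END LOOP;
  DBMS_SQL.CLOSE_CURSOR(c);
END;
/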
Manasi,
I think internally Oracle has its own parser, which parses and tries to compile the query. I don't think it is related to C or C++.
But I would need to confirm.
-Justin Samuel.