PDI (Kettle) - Check data types of fields

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string(1) (one character) and field B should be an integer/number.
What I want to check/validate is: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator step for the validation, but how do I move the file based on that status? I can't find a step that does it.

You can read the files in a loop and add steps as follows:
After the Data Validator, filter the rows with a negative result (not matched) -> an Add constants step that sets error = 1 -> a Set Variables step for the error field, with a default value of 0.
After the transformation finishes, add a Simple Evaluation entry in the parent job to check the value of the ERROR variable.
If it has the value 1, move the file; otherwise continue.
I hope this helps.
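The per-file flow described above (validate each row, flag the file, then move it) can be sketched in plain Python. This is only an illustration of the logic, not PDI API calls; the field rules (A: one-character string, B: integer) and the error-folder move mirror the question's example.

```python
# Sketch of the validate-then-move logic from the answer above.
import csv
import shutil
from pathlib import Path

def row_is_valid(row):
    """Field 0 must be a 1-character string, field 1 an integer."""
    if len(row) < 2 or len(row[0]) != 1:
        return False
    try:
        int(row[1])
    except ValueError:
        return False
    return True

def check_and_move(csv_path, error_dir):
    """Move the whole file to error_dir if any row fails validation."""
    with open(csv_path, newline="") as f:
        valid = all(row_is_valid(row) for row in csv.reader(f))
    if not valid:
        Path(error_dir).mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_path), str(Path(error_dir) / Path(csv_path).name))
    return valid
```

In PDI terms, `row_is_valid` plays the role of the Data Validator plus filter, and `check_and_move` is the parent job's evaluation-and-move branch.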

You can do the same as in this question. Once the files are read, use a Group by step to get one flag per file. However, this time you cannot do it in one transformation; you need a job.
Your use case is covered by the samples shipped with your PDI distribution, in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so you get one status per file rather than one status per row.
In case you wonder why such a task needs this much machinery, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether it is safe to move the file until every row has been processed.
Alternatively, there is the quick-and-dirty solution from your similar question: change the Filter rows step to a type check, and replace the final Synchronize after merge with a Process files (move) step.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need to maintain it in the long run.

How to set NULL values to a single character in IICS?

There are 100+ incoming fields for a target transformation in IICS. NULLs can appear in any of these columns, but the end goal is to convert the NULLs in each incoming field to * so that the data in the target contains * instead of NULL.
A laborious way to do this is to define an expression for each column: that's 100+ expressions, each one converting NULL into *, which is difficult to maintain.
In Informatica PowerCenter there is a property on the target object that converts all NULL values to *, as shown in the screenshot below.
I tried setting the Replacement Character property on the target transformation in IICS, but that didn't help; the data still comes through as NULL.
Is there similar functionality or a similar property for the target transformation in IICS? If so, how do I use it?
I think it's easier to create a reusable Expression transformation with 10 inputs and 10 outputs, then copy it 10 times to cover 100 fields.
Create input and output ports like below:
in_col
out_col = IIF(ISNULL(in_col) OR IS_SPACES(in_col), '*', in_col)
Then copy in_col 10 times, and copy out_col 10 times. You will need to adjust the formula for each port, though.
Save it and make it reusable.
Then copy that reusable widget 10 times.
This has flexibility: if the formula changes, you only have to change one widget and voilà, everything is updated.
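The maintainability point above can be illustrated outside Informatica: one rule defined in one place and applied to every field. This is a hedged Python sketch of the IIF(ISNULL(...) OR IS_SPACES(...), '*', ...) expression, not IICS functionality; the record shape is an assumption.

```python
# One rule, defined once, applied to all fields of a record.
def null_to_star(value):
    """Equivalent of IIF(ISNULL(in_col) OR IS_SPACES(in_col), '*', in_col)."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return "*"
    return value

def clean_record(record):
    """Apply the rule to every field without one expression per column."""
    return {name: null_to_star(v) for name, v in record.items()}
```

If the rule changes, only `null_to_star` changes, which is exactly the benefit the answer claims for the single reusable widget.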
Try using a Vertical macro. It lets you write a function that is applied to a set of indicated ports. Follow the link for full documentation with examples.

How can I remove junk values and load multiple .csv files (different schemas) into BigQuery?

I have many .csv files stored in GCS and I want to load data from the .csv files into BigQuery using the command below:
bq load 'dataset.table' gs://path.csv json_schema
I have tried, but it gives errors; the same error occurs for many files.
[error screenshot]
How can I remove unwanted values from the .csv files before importing them into the table?
Please suggest the easiest way to load the files.
The answer depends on what you want to do with the junk rows. If you look at the documentation, you have several options:
- Number of errors allowed (--max_bad_records). By default it is set to 0, which is why the load job fails at the first bad line. If you set this value to the number of errors you expect, all those errors will be tolerated by the load job.
- Ignore unknown values (--ignore_unknown_values). If your errors occur because some lines contain more columns than defined in the schema, this option keeps the known columns of the offending line and ignores the extra ones.
- Allow jagged rows (--allow_jagged_rows). If your errors come from lines that are too short (and that is what your error message says) and you still want to keep the first columns (because the last ones are optional and/or not relevant), you can use this option.
For more advanced and specific filters, you have to perform pre- or post-processing. If that is the case, let me know and I will add that part to my answer.
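If the built-in load options are not enough, a small pre-processing pass can drop junk rows before running `bq load`. This is a minimal stdlib sketch under one assumption: "junk" means a row whose column count does not match the schema (the expected count is a parameter you would set per file).

```python
# Keep only CSV rows whose column count matches the schema.
import csv
import io

def filter_csv(text, expected_cols):
    """Return (cleaned_csv_text, number_of_rows_kept)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    kept = 0
    for row in csv.reader(io.StringIO(text)):
        if len(row) == expected_cols:
            writer.writerow(row)
            kept += 1
    return out.getvalue(), kept
```

You would run this over each file (or a GCS download of it) and load the cleaned output instead of the original.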

PDI - How to keep a transformation running even when an error occurs?

I have a transformation with several steps that runs via a batch script scheduled with Windows Task Scheduler.
Sometimes the first step or the nth step fails, and that stops the entire transformation.
I want the transformation to run from start to end regardless of any errors. Is there any way of doing this?
1) One way is to use error handling; however, it is not available for all steps. You can right-click on a step and check whether the error-handling option is available.
2) If you are getting errors because of an incorrect data type (for example, you expect an integer value but some specific record contains a string), the step may fail. For handling such situations you can use the Data Validator step.
Basically, you can implement the logic based on the transformation you have created; the above are some general methods.
This is what is called "error handling": even though your transformation hits some errors, you still want it to continue running.
Situations:
- Data type issues in the data stream.
Example: you have a column X of type integer but by mistake receive a string value; you can define error handling to capture all of these records.
- While processing JSON data.
Example: the path you specified to retrieve the value of a JSON field cannot be resolved, or is missing for some data node; you can define error handling to capture all the missing-path details.
- While updating a table.
If you are updating a table by some key and the key coming from the input stream is not present, an error will occur; you can define error handling here as well.
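The common pattern behind all three situations can be shown in plain Python: process each row inside its own try/except so that one failure is captured and the run continues. This mirrors the idea of a PDI error-handling hop, not any PDI API.

```python
# Keep processing rows; capture failures instead of stopping the whole run.
def run_with_error_handling(rows, process):
    """Return (successful_results, captured_errors)."""
    ok, errors = [], []
    for row in rows:
        try:
            ok.append(process(row))
        except Exception as exc:
            errors.append((row, str(exc)))  # routed like an error stream
    return ok, errors
```

The `errors` list plays the role of the error-handling target step: bad records are recorded with their cause, and good records are unaffected.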

If the source header (first row) of a flat file contains dates, the mapping should succeed, else fail

I have a requirement where my source flat file has dates in the first row, field names in the second row, and data from the third row on, and I am reading each line as one string and loading it into a target table.
So I need a unit test: if the first row of the source file does not contain dates but something else, the mapping should fail; otherwise it should succeed.
Example of a source file:
"2015-05-23","2015-06-05"
"carrier","contract","Group","Name","record"
"1234","abcd","4567","kiran","1"
How do I approach this logic in Informatica? Please share your inputs.
You can take a substring of the first line and check whether it contains a date using the IS_DATE function.
e.g. IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD')
Then, if the above returns false, use the ABORT function to fail the workflow.
You can create two separate pipelines:
- one that picks up the first row from the file and checks whether it is a date, aborting the whole flow if it isn't. To identify the first row you can use a Sequence Generator; then use IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD') and ABORT.
- a second pipeline that processes the data as usual.
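For reference, here is a hedged Python equivalent of the IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD') check used in both answers: skip the opening quote and try to parse the next 10 characters as a date.

```python
# Python equivalent of IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD').
from datetime import datetime

def first_row_has_date(line):
    """True if the line starts with a quoted YYYY-MM-DD date."""
    candidate = line[1:11]  # SUBSTR(input, 2, 10): 10 chars from position 2 (1-based)
    try:
        datetime.strptime(candidate, "%Y-%m-%d")
        return True
    except ValueError:
        return False
```

Running it against the sample file's rows shows the intended behavior: the date header passes, the field-name row fails.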

Informatica skipped records need to be written to bad files

In my case, I have an Expression transformation that uses the built-in ERROR('transformation') function to skip records whose incoming date value is not in the correct format. The skipped rows are not written to the reject file, which causes a reconciliation problem. I need the skipped rows to be written to the bad file. Please help me achieve this.
Put an Update Strategy transformation in your mapping and flag these rows for reject (use the DD_REJECT constant).
More information: Update Strategy Transformation
Well, don't use the ERROR() function then. Use a Router to write the badly formatted dates to a different target instead.
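The Router idea can be sketched in plain Python: instead of skipping bad rows, split the stream into two "targets" by date validity, so rejects are preserved for reconciliation. The field name and date format here are assumptions for illustration, not Informatica API.

```python
# Route rows into a good target and a reject target by date validity.
from datetime import datetime

def route_by_date(rows, field="date", fmt="%Y-%m-%d"):
    """Return (good_rows, rejected_rows) instead of discarding failures."""
    good, bad = [], []
    for row in rows:
        try:
            datetime.strptime(row[field], fmt)
            good.append(row)
        except (KeyError, ValueError):
            bad.append(row)  # written to the bad-file/reject target
    return good, bad
```

Every input row ends up in exactly one of the two outputs, which is what makes the counts reconcile.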