If the source header (first row) of a flat file contains dates, then the mapping should succeed, else fail - Informatica

I have a requirement where my source flat file has dates in the first row, field names in the second row, and data from the third row onward. I am reading each line as one string and loading it into a target table.
So I need a unit test: if the source file does not have dates in its first row but something else instead, I want the mapping to fail; otherwise it should succeed.
Example of source file:
"2015-05-23","2015-06-05"
"carrier","contract",'Group",'Name",'record"
"1234","abcd","4567","kiran","1".
How do I approach this logic in Informatica? Please share your inputs.

You can take a substring of the first line and check whether it contains a date using the IS_DATE function,
e.g. IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD')
Then, if the above returns false, use the ABORT function to fail the workflow.
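Putting that together, a minimal sketch of a single expression port (the port name input and the abort message are placeholders):

IIF(NOT IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD'),
    ABORT('First row does not contain a date'),
    input)

SUBSTR(input, 2, 10) skips the opening quote and takes the 10-character date. Note that on its own this expression tests every row, so the second row of field names would also trigger the abort; the next answer shows how to restrict the check to the first row.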

You can create two separate pipelines:
The first picks up the first row from the file and checks whether it is a date, aborting the whole flow if it isn't. To pick up the first row, you can use a Sequence Generator to determine whether a row is the first one, then use IS_DATE(SUBSTR(input, 2, 10),'YYYY-MM-DD') and ABORT.
The second pipeline processes the data as usual.
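A sketch of that first-row check as one expression, where NEXTVAL is the port fed from the Sequence Generator (names illustrative):

IIF(NEXTVAL = 1 AND NOT IS_DATE(SUBSTR(input, 2, 10), 'YYYY-MM-DD'),
    ABORT('First row does not contain a date'),
    input)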


In LibreOffice Calc, which formula will check if a keyword or part of it is contained in a cell in a row and copy the entire content of that cell?

I am learning how to use formulas in spreadsheets; I use LibreOffice.
I need to sort out the data in a huge, messy spreadsheet.
Each column contains mixed data. The sheet is huge: dozens of columns and thousands of rows. If the spreadsheet contains no errors, each cell in a row either contains a different keyword or is empty; there should not be two cells in the same row containing the same keyword.
The problem to solve is to sort all the data so as to end up with a new spreadsheet in which each cell marked with a given keyword keeps its row position but is placed in one column dedicated to that keyword.
[Screenshot: the kind of spreadsheet with mixed-up cells to be sorted out]
[Screenshot: the way the data should appear once fixed]
A formula that can be used to extract the sorted data from a cell is the following:
=IF(SEARCH("Text1";B2;1);B2;0)
The formula can be dragged to each cell below so that it hits the proper cell next to it.
The results are correct, but I do not know why the expected 0 is not printed; there is #VALUE! instead.
The logic is very simple: if the cell contains the keyword, or any other text containing that keyword, the result is the full content of that cell; otherwise the result is 0.
Here comes the first question: why do I get #VALUE! as a result for the cells that do not contain the keyword? I expected to get 0 instead, just as indicated in the formula.
I tried leaving this field empty and also putting the 0 result in quotes; the actual result is always the same, #VALUE!...
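(Worth noting: in LibreOffice Calc, SEARCH does not return FALSE when the text is not found; it returns the #VALUE! error, which IF then propagates. Wrapping the test in ISNUMBER turns the error into FALSE, e.g.:

=IF(ISNUMBER(SEARCH("Text1";B2;1));B2;0)

This is the same formula as above, only with the SEARCH result tested through ISNUMBER.)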
However, this formula of course extracts only the information contained in one column, so the process must be repeated for each other column.
To avoid creating a formula column for every column in the spreadsheet (or, in any case, processing each column one by one and then having to merge all the results into one column containing only the cells with a given keyword), I thought of using the same formula but extending the parsing to each next cell in the row, as follows:
=IF(SEARCH("text";B2;1);B2;IF(SEARCH("text";C2;1);C2;IF(SEARCH("text";D2;1);D2;0)))
The logic is very simple and should output, in one go, a column containing all the cells in each row that contain the keyword: check whether the first cell in the row contains the word using the SEARCH function; if it does, the result is the content of that cell; otherwise perform the next test. The next test is the same: check whether the next cell contains the word, and so on until the last test. If no test gives a true result, print 0 (but we get #VALUE!; OK, I could live with that...).
In theory this should work for any number of cells, but in practice it does not work at all; in fact it works only for the first IF test and cell indicated in the formula.
WHY?
The result of using the extended version of the formula to parse N cells in sequence is the same as that obtained with the simple formula parsing only one cell.
Finally, how do I resolve this problem using IF and SEARCH?
Is there any better approach or way to solve this kind of problem and sort out data in huge spreadsheets of this kind?
Thank you for any hint and help.
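(A possible explanation for the extended formula as well: the first failing SEARCH returns #VALUE!, and that error short-circuits the whole nested IF before the next test can run. Wrapping each test in ISNUMBER should let the chain fall through to the next cell:

=IF(ISNUMBER(SEARCH("text";B2;1));B2;IF(ISNUMBER(SEARCH("text";C2;1));C2;IF(ISNUMBER(SEARCH("text";D2;1));D2;0))))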

PDI - Check data types of fields

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string(1), i.e. one character, and field B should be an integer/number.
What I want to check/validate is: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do this, but how do I move the file with that status? I can't find any step to do it.
You can read the files in a loop and add steps as below:
After the data validation, filter the rows with a negative result (not matched) -> an Add constants step that sets error = 1 -> a Set Variables step for the error field (with a default value of 0).
After the transformation finishes, add a Simple evaluation entry in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the file; else ...
I hope this can help.
You can do the same as in this question. Once the file is read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is in the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you have one status per file and not one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it means you do not know whether you have to move the file until every row has been processed.
Alternatively, you have the quick and dirty solution of your similar question: change the Filter rows step to a type check, and the final Synchronize after merge to a Process files/Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like
there. It is more flexible if you need maintenance in the long run.
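For illustration, a minimal sketch of such a check in a Modified Java Script Value step, assuming the field names A and B from the question (status would be declared as a new output field of the step):

// A should be a one-character string, B an integer.
var strA = '' + A;  // coerce the field value to a JavaScript string
var strB = '' + B;
var validA = (A != null && strA.length == 1);
var validB = (B != null && strB.match(/^-?\d+$/) != null);
var status = (validA && validB) ? 'Valid' : 'Not Valid';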

Fixed-Length Flat File Parsing

I have flat file tables, say student.tbl and employee.tbl. Both are fixed-length files. I have a supporting file for each of them, containing the field code, field description, field offset and field size.
For example:
ename string 0 10
eage number 10 2
ecity string 12 10
I wrote code to fetch data from the flat files using the STL in C++. I am using vectors to load the data.
My simple algorithm to load data from a fixed-length file:
1) Read the supporting file.
2) Load the supporting file data into a 2D vector of strings, say column_records.
3) Read the table file.
4) Get the first line from the table file, say Data Line.
5) Get the first column information from the supporting table, which is the first row of column_records.
6) Chop Data Line based on the column record.
7) Push the chopped data into a one-dimensional vector, say record_vector.
8) Repeat step 5 until the last column information has been processed.
9) Push record_vector into a 2D vector, say Table_Vector.
10) Repeat step 4 until the last line of the fixed-length file has been reached.
Well, I did it and it works fine. But my problem is in step 5.
For every fixed-length record, I repeat the same iterations over the column information.
I know for a fact that the column descriptions obtained for the first fixed-length record could be retained for all the other records. But I keep doing N*M iterations; I wish my iteration were 1*M.
I know that I could store my column descriptions in an array of structures. But I have many types of tables, say student.tbl and employee.tbl, and both have different sets of columns. So I think it would be a bad idea to have N struct declarations to load N supporting tables.
I want to use the same routine to fetch data from both tables, or from N tables. My supporting table format will not change; it is tab-delimited. This is my case.
How do I fetch data from a table with 1*M iterations?
I hope I can use an enumeration to do this, but I don't know how. Will using an enumeration or a macro solve this issue?
I hope my working source code will not be needed for this question. If you think source code is needed to answer it, I will definitely update the question with it. I have a medium level of English, so sorry for any inconvenience.
Thank you.
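For what it's worth, a minimal C++ sketch of one way to get there: parse each supporting file once into a generic vector of field specs (M iterations, no per-table struct declarations), then reuse those specs for every data line of any table. All names are illustrative, not taken from the original code.

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// One row of a supporting file: field code, description, offset, size.
struct FieldSpec {
    std::string code;
    std::string type;
    std::size_t offset;
    std::size_t size;
};

// Parse the tab-delimited supporting file once (1*M, not N*M).
std::vector<FieldSpec> loadSpecs(const std::string& path) {
    std::vector<FieldSpec> specs;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        FieldSpec f;
        if (row >> f.code >> f.type >> f.offset >> f.size)
            specs.push_back(f);
    }
    return specs;
}

// Chop every data line using the already-parsed specs; the column
// descriptions are never re-read per record.
std::vector<std::vector<std::string>> loadTable(const std::string& path,
                                                const std::vector<FieldSpec>& specs) {
    std::vector<std::vector<std::string>> table;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> record;
        for (const FieldSpec& f : specs)
            // Guard against short lines before chopping.
            record.push_back(f.offset < line.size()
                                 ? line.substr(f.offset, f.size)
                                 : std::string());
        table.push_back(record);
    }
    return table;
}

The same two functions then serve student.tbl, employee.tbl or any other table whose supporting file follows the same format, e.g. loadTable("employee.tbl", loadSpecs("employee.sup")) with a hypothetical supporting file name.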

Informatica character sequence

I am trying to create a character sequence like AAA, AAB, AAC, AAD, ..., BAA, BAB, BAC, ... and so on in a flat file using Informatica. I have the formula to create the character sequence.
For this I need sequence numbers generated in Informatica, but I don't have any source file or database table to serve as the source.
Is there any method in Informatica to create a sequence using the Sequence Generator when there are no source records to read?
This is a bit tricky, as Informatica does row-by-row processing and your mapping won't run until you feed it source rows through an input (file or DB). So to generate a sequence of length n with Informatica transformations, you need n rows of input.
Another solution is to use a dummy source (i.e. a source with one row): you can pass the loop parameters from this source and then use a Java transformation with Java code to generate the sequence.
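As an illustrative sketch (the output port name SEQ_CHARS is an assumption), the On Input Row code of such a Java transformation could emit all 26*26*26 combinations from the single dummy row:

// On Input Row: for the one incoming dummy row, emit every combination.
for (int i = 0; i < 26 * 26 * 26; i++) {
    SEQ_CHARS = "" + (char) ('A' + i / (26 * 26))
                   + (char) ('A' + (i / 26) % 26)
                   + (char) ('A' + i % 26);
    generateRow(); // emit one output row per combination
}

The Java transformation has to be created as an active transformation for generateRow() to produce extra rows.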
There is no way to generate rows without a source in a mapping.
When I need to do that, I use one of these methods:
Generating a file with as many lines as I need, with the seq command under Unix (e.g. seq 17576 for the 26*26*26 combinations). It could also be used as a direct pipeline source, without creating the file.
Getting the lines from a database.
For example, Oracle can generate as many lines as you want with a hierarchical query:
SELECT LEVEL just_a_column
FROM dual
CONNECT BY LEVEL <= 26*26*26
DB2 can do the same with a recursive query:
WITH DUMMY(ID) AS (
SELECT 1 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT ID + 1 FROM DUMMY WHERE ID < 26*26*26
)
SELECT ID FROM DUMMY
You can generate rows using a Java transformation, but even to use that, you need a source. I suggest putting the formula in the Java transformation and using a dummy source against a database with a SELECT GETDATE() statement, so that one record is returned to invoke the Java transformation. You can then generate the sequence in the Java transformation as well, or connect a Sequence Generator to the output of the Java transformation to number the rows.
We have an option to create sequence numbers even when they are not available in the source.
Create a Sequence Generator transformation; it gives you NEXTVAL and CURRVAL ports.
In the Properties tab you have the options for creating the sequence numbers:
Start Value - the value from which it should start
Increment By - the increment value
End Value - the value at which it should end
Current Value - your current value
Cycle - in case you require the sequence to repeat cyclically
Number of Cached Values
Reset
Tracing Level
Connect the NEXTVAL port to your target column.
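As a sketch of the character-sequence part (not spelled out in the original answer): an Expression transformation port fed by NEXTVAL, with the sequence running from 1 to 26*26*26, can build the three-letter string with the standard CHR, TRUNC and MOD functions:

CHR(65 + TRUNC((NEXTVAL - 1) / 676)) ||
CHR(65 + MOD(TRUNC((NEXTVAL - 1) / 26), 26)) ||
CHR(65 + MOD(NEXTVAL - 1, 26))

NEXTVAL = 1 gives AAA, 2 gives AAB, 27 gives ABA, and so on (676 is 26*26; CHR(65) is 'A').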

How to get the middle data from a file

I have 10 records in a file and I don't need the first and the last lines; I need the data from lines 2 through 9 only.
Can anybody provide me with a solution for this?
Source file example:
SIDE,MTYPE,PAGENO,CONTIND,SUBACC,SIGN,DEAL QUANTITY,SECURITY,SOURCE SYSTEM,TODATE,SETTLEMENT DATE,REFERENCE 4,REFERENCE 2,TRADE DATE,ACCRUED INTEREST,ACCRUED INTEREST CURRENCY,XAMT1,XAMT2,XAMT3,XAMT4,XAMT5
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00107020052_CSA,107020052,6/12/2013,0,USD,,0,250000,0,200000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00115020036_CSA,115020036,6/12/2013,0,USD,,0,250000,0,220000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00301410097_CSA,301410097,6/12/2013,0,USD,,0,226725,0,226725
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00030020088_CSA,30020088,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00106410075_CSA,106410075,6/12/2013,0,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00116510010_CSA,116510010,6/12/2013,300000,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00177020015_CSA,177020015,6/12/2013,0,USD,,0,250000,0,270000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00189110093_CSA,189110093,6/12/2013,0,USD,,0,250000,0,280000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00272220015_CSA,272220015,6/12/2013,0,USD,,0,250000,0,10000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE1,189110093,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE2,272220015,6/12/2013,0,USD,,0,250000,0,1000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE3,301410097,6/12/2013,0,USD,,0,250000,0,200
I'm not an expert in Informatica, but I found the following answer on the web; I hope it is useful for you.
Step 1: Assign row numbers to the records. Generate the row numbers using an Expression transformation, in an output port O_count. Create a DUMMY output port in the same Expression transformation and assign 1 to that port, so that the DUMMY output port returns 1 for each row.
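A sketch of those Expression transformation ports (this row-numbering port layout is the usual idiom; it is not spelled out in the original answer):

V_count (variable port) = V_count + 1
O_count (output port) = V_count
DUMMY (output port) = 1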
Step 2: Pass the output of the Expression transformation to an Aggregator and do not specify any group-by condition. Create an output port O_total_records in the Aggregator and assign the O_count port to it. The Aggregator returns the last row by default. The output of the Aggregator therefore contains the DUMMY port, which has the value 1, and the O_total_records port, which holds the total number of records in the source.
Step 3: Pass the outputs of the Expression and Aggregator transformations to a Joiner transformation and join on the DUMMY port. Check the Sorted Input property in the Joiner transformation; only then can you connect both the Expression and the Aggregator to the Joiner.
Step 4: In the last step, use a Router transformation and create two output groups.
In the first group the condition should be O_count = 1; connect the corresponding output group to table A. In the second group the condition should be O_count = O_total_records; connect the corresponding output group to table B. The output of the default group should be connected to table C, which will contain all records except the first and the last.
Source: http://www.queryhome.com/47922/informatica-how-to-get-middle-data-from-a-file
From an Informatica perspective, there are multiple ways to do this.
If the data is in a flat file, a SQL override will not work. You can create two pipelines: the first reads from the source and uses an Aggregator to get the row count, assigning it to a mapping variable, say v_total. In the second pipeline use another variable, v_count, initialized to 0 and incremented for every row. Create a Filter transformation that filters out the first row (v_count = 1) and the last row (v_count = v_total); the rest will be loaded to the target.
That seems like a lot of wasted effort making the mapping unnecessarily complex when a simple Unix pipeline such as
head -n 9 currentfilename | tail -n +2 > newinputfilename
will do the job: keep the first 9 lines of the 10-line file, then drop the first of those, writing lines 2 through 9 to newinputfilename. Then all you need to do is use the new file for your mapping (if you even need it any more).
For a windows server equivalent see https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent