How to separate Header, Footer and Trailer of a flat file in Informatica

Suppose I have a flat file with data like
HEADER 2082021
Rec1
Rec2
Rec3
.
.
.
Trailer total_rec_count
Footer
How do I separate the header, footer, and trailer from the flat file and load the data into the DB?

Two options -
Use a shell script to preprocess the data and load via three separate Source Qualifiers (SQs).
Use Informatica to number all rows with a sequence and then separate them using the sequence number.
Each option has its own pros, cons, and complexity.
Option 1 -
a. Use a shell script to generate three files - header, trailer and main file.
head -1 file.txt > header.txt
tail -1 file.txt > trailer.txt
sed '1d;$d' file.txt > main.txt
b. Use three separate Source Qualifiers to process header.txt, trailer.txt, and main.txt.
If you have multiple files, you need to create a separate set of files and list files. This fits nicely for multiple files because you can control them from the shell script, as sketched below.
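For illustration, a minimal shell sketch of that multi-file preprocessing (the file pattern and the /data paths are placeholders, not from the original post):
for f in /data/in/file*.txt; do                  # split every input file
  base=$(basename "$f" .txt)
  head -1 "$f"     > "/data/out/${base}_header.txt"
  tail -1 "$f"     > "/data/out/${base}_trailer.txt"
  sed '1d;$d' "$f" > "/data/out/${base}_main.txt"
done
ls -1 /data/out/*_header.txt  > /data/out/header_filelist.txt   # list files for the three SQs
ls -1 /data/out/*_trailer.txt > /data/out/trailer_filelist.txt
ls -1 /data/out/*_main.txt    > /data/out/main_filelist.txt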
Option 2 -
a. Read data using one SQ with all possible fields.
b. Link a sequence generator to give unique id to each row. always start from 0 and reset every time before running.
c. Use an Aggregator to get the max sequence number: pass the sequence field plus any dummy column and set no group by. Join the Aggregator back to the main pipeline using the dummy port.
d. Use a Router and set the groups like this -
Seq No. = 1: header row
max_row > Seq No. > 1: main data row
Seq No. = max_row: trailer row
e. Then you have three pipelines for the different data sets - header, trailer, and main data. Process them next as per your business logic. The mapping should look like this -
SEQ_GEN -+           +-> AGG_MAX -+
         |           |            |
SQ ------+-> SRT ----+------------+-> JNR --> RTR_SEPARATOR --> seq = 1       : header
                                                            --> seq = max     : trailer
                                                            --> max > seq > 1 : main data
Now, if you have multiple files, you need to process one file at a time; you can't process all of them together. Also, if you have millions of rows, this may be inefficient.

Read each record as a single string.
In an Expression transformation, create an output port named REC_TYPE with the expression SUBSTR(record, 1, 6), i.e. the first six characters of the record.
Segregate the header, trailer, and detail records on REC_TYPE using a Router or Filter transformation:
Header record: REC_TYPE = 'HEADER'
Detail record: REC_TYPE != 'HEADER' AND REC_TYPE != 'Traile'
Trailer record: REC_TYPE = 'Traile' (the first six characters of 'Trailer')
Process each group as per your requirement.
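To preview the same split outside Informatica, here is a hedged awk sketch of that first-six-characters classification (the input name file.txt is an assumption):
awk '{
  rec = substr($0, 1, 6)                 # first six characters, like the REC_TYPE port
  if      (rec == "HEADER") print > "header.txt"
  else if (rec == "Traile") print > "trailer.txt"
  else                      print > "detail.txt"
}' file.txt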

Related

How to load multiple files into one file using Informatica

I am new to Informatica. I have created a mapping that uses Expression and Sorter transformations to load multiple files into one single file, which has 2 columns:
1 data
2 seq number
All 10 files have random sequence numbers, like this
example:
file1
erfef 3
abcdn 1
file 2
wewewr 4
wderfv 5
and so on, up to file 10.
Expression transformation code is :
INTEGER(LTRIM(RTRIM(seq_num)),TRUE)
What I want is to load the files into one big file sorted according to the seq number.
I get data in the output file, but in the wrong sequence order.
How do I get the data into the final table with the correct sequence?
I am doing exactly what is mentioned in the solution below but still getting the wrong output, like:
erfef 3
abcdn 1
wewewr 4
wderfv 5
whereas it should be like:
abcdn 1
erfef 3
wewewr 4
wderfv 5
Thanks in Advance !!!
Use an indirect file load with a list of files to load all the files together. Then use a Sorter on col2 to order the data. Finally use a target file to store the data.
Whole mapping should be like this -
SQ --> EXP--> SRT(key = col2) --> Target
A few things to note -
In the session, set the source filetype to Indirect and give the list file name - e.g. filelist1.txt.
Use ls -1 file* > filelist1.txt in a pre-session command task to create a file list with all required files (a quick shell sanity check is sketched after this list).
Expression transformation - convert col2 to INTEGER if it comes out of the SQ as a string.
Sorter transformation- use col2 as key column.
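As an aside, you can sanity-check the expected order outside Informatica; a minimal sketch, assuming space-delimited files named file1 through file10 in the current directory:
cat file* | sort -k2,2n    # numeric sort on column 2; should match the mapping's output order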
Using indirect file source is one way.
Another way is to use command as source and specify a command that will spit out data from all the files, like cat file*.csv.
Just change the Input Type to Command and provide the command - all this can be set by editing session -> mapping tab -> Source -> properties.

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string(1) character and field B should be an integer/number.
What I want is to check/validate: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do it, but how do I move the file based on that status? I can't find any step that does it.
You can read the files in a loop and add steps as below:
after data validation, filter the rows with a negative result (not matched) -> an Add constants step that sets error = 1 -> a Set Variables step for the ERROR field (default value 0).
After the transformation finishes, add a Simple evaluation step in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the file; else continue.
I hope this can help.
You can do the same as in this question. Once read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is in the samples that were shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by to get one status per file and not one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether to move a file until every row has been processed.
Alternatively, you have the quick-and-dirty solution of your similar question: change the Filter rows step to a type check, and the final Synchronize after merge to a Process files/Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need maintenance in the long run.

How to load the first half records in one file and other half in other file in informatica?

So far I have tried an Expression transformation along with an Aggregator transformation to get the maximum value of the sequence number. The source is a flat file.
The way you are designing it would require reading the source twice in the mapping: once to get the total number of records (the max sequence, as you called it) and again to read the detail records and pass them to target1 or target2.
You can simplify it by passing the number of records as a mapping parameter.
Either way, to decide when to route to a target, you can count the number of records read by keeping a running total in a variable port, incrementing it every time a row passes through the expression and checking it against (record count)/2.
If you don't really care about first half and second half and all you need is two output files equal in size, you can:
number the rows (with a rank transformation or a variable port),
then route even and odd rows to two different targets.
If you can, write a Unix shell script (assuming your platform is Unix) to do a head of the first file with half its size in lines (use wc -l on the file and divide by 2 to get the parameter for head) and direct the output to a 3rd file. Then do a tail on the second file, also sized using wc -l as just described, and >> the output to the 3rd file you created. These would be pre-session commands, and you'd use that 3rd file as the source file for your session. It'd look something like this (untested, but it gets the general idea across):
halfsize=`wc -l < filename`      # redirect so wc prints only the count, not the file name
halfsize=$((halfsize/2))
head -n $halfsize filename > thirdfile
halfsize=`wc -l < filename2`
halfsize=$((halfsize/2))
tail -n $halfsize filename2 >> thirdfile
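Since the question as stated splits a single file into two halves, here is a minimal sketch under that assumption (the name source.txt is a placeholder):
total=$(wc -l < source.txt)                       # total line count
half=$((total / 2))
head -n "$half" source.txt > half1.txt            # first half
tail -n +"$((half + 1))" source.txt > half2.txt   # remaining lines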
Prior to writing to the target, keep a running count in an Expression transformation, then connect this expression to a Router.
The Router should have 2 groups:
group1: count1 <= n/2, routed to Target1
group2: count1 > n/2, routed to Target2
Or
MOD(NEXTVAL, 2) will send alternate records to alternate targets.
Note that it won't send the first half to the 1st target and the 2nd half to the 2nd target.
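If an even/odd split is acceptable, the shell analogue of that MOD routing is a one-liner; a sketch, assuming the input is file.txt:
awk 'NR % 2 { print > "odd.txt"; next } { print > "even.txt" }' file.txt   # odd lines to one file, even to the other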

Informatica character sequence

I am trying to create a character sequence like AAA, AAB, AAC, AAD, ..., BAA, BAB, BAC, ... and so on in a flat file using Informatica. I have the formula to create the character sequence.
Here I need sequence numbers generated in Informatica, but I don't have any source file or database to serve as the source.
Is there any method in Informatica to create the sequence using the Sequence Generator when there are no source records to read?
This is a bit tricky, as Informatica does row-by-row processing and your mapping won't initialize until you give it source rows through an input (file or DB). So to generate a sequence of length n with Informatica transformations, you need n rows of input.
Another solution is to use a dummy source (i.e. a source with one row), pass the loop parameters from this source, and then use a Java transformation with Java code to generate the sequence.
There is no way to generate rows without a source in a mapping.
When I need to do that, I use one of these methods:
Generating a file with as many lines as I need, with the seq command under Unix (see the shell sketch after the SQL examples below). It could also be used as a direct pipeline source without creating the file.
Getting lines from a database
For example, Oracle can generate as many lines as you want with a hierarchical query:
SELECT LEVEL just_a_column
FROM dual
CONNECT BY LEVEL <= 26*26*26
DB2 can do that with a recursive query:
WITH DUMMY(ID) AS (
  SELECT 1 FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT ID + 1 FROM DUMMY WHERE ID < 26*26*26
)
SELECT ID FROM DUMMY
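As promised above, a shell sketch of the file-based route. Under Unix you can either generate plain row numbers with seq, or produce the AAA..ZZZ values directly with bash brace expansion (26*26*26 = 17576 rows); the file names are placeholders:
seq 17576 > rows.txt                             # one numbered row per combination
printf '%s\n' {A..Z}{A..Z}{A..Z} > charseq.txt   # or generate the sequence itself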
You can generate rows using a Java transformation, but even to use that you need a source. I suggest putting the formula in the Java transformation and using a dummy source against a database with a SELECT GETDATE() statement, so that one record is returned to invoke the Java transformation. You can then generate the sequence inside the Java transformation, or connect a Sequence Generator to the output of the Java transformation to number the rows.
We have an option to create a sequence number even when it is not available in the source.
Create a Sequence Generator transformation. You get NEXTVAL and CURRVAL ports.
In the Properties tab you have the options that control the sequence:
Start Value - the value from which it should start
Increment By - the increment value
End Value - the value at which it should end
Current Value - your current value
Cycle - in case you need it to be cyclic
Number of Cached Values
Reset
Tracing Level
Connect the NEXTVAL to your target column.

How to get middle data from a file

I have 10 records in a file and I don't need the first and the last line; I need data from lines 2 through 9 only.
Can anybody provide me a solution for it?
Source file example:
SIDE,MTYPE,PAGENO,CONTIND,SUBACC,SIGN,DEAL QUANTITY,SECURITY,SOURCE SYSTEM,TODATE,SETTLEMENT DATE,REFERENCE 4,REFERENCE 2,TRADE DATE,ACCRUED INTEREST,ACCRUED INTEREST CURRENCY,XAMT1,XAMT2,XAMT3,XAMT4,XAMT5
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00107020052_CSA,107020052,6/12/2013,0,USD,,0,250000,0,200000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00115020036_CSA,115020036,6/12/2013,0,USD,,0,250000,0,220000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00301410097_CSA,301410097,6/12/2013,0,USD,,0,226725,0,226725
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00030020088_CSA,30020088,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00106410075_CSA,106410075,6/12/2013,0,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00116510010_CSA,116510010,6/12/2013,300000,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00177020015_CSA,177020015,6/12/2013,0,USD,,0,250000,0,270000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00189110093_CSA,189110093,6/12/2013,0,USD,,0,250000,0,280000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00272220015_CSA,272220015,6/12/2013,0,USD,,0,250000,0,10000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE1,189110093,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE2,272220015,6/12/2013,0,USD,,0,250000,0,1000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE3,301410097,6/12/2013,0,USD,,0,250000,0,200
Not an expert in Informatica, but I found the following answer on the web; hope it is useful for you.
Step 1: Assign row numbers to each record. Generate the row numbers in an Expression transformation, in an output port O_count. Create a DUMMY output port in the same Expression transformation and assign 1 to it, so that the DUMMY port always returns 1 for each row.
Step 2: Pass the output of the Expression transformation to an Aggregator and do not specify any group by condition. Create an output port O_total_records in the Aggregator and assign the O_count port to it. The Aggregator will return the last row by default, so its output contains the DUMMY port with value 1 and the O_total_records port with the total number of records in the source.
Step 3: Pass the outputs of the Expression and Aggregator transformations to a Joiner transformation and join on the DUMMY port. In the Joiner transformation check the Sorted Input property; only then can you connect both the Expression and the Aggregator to the Joiner.
Step 4: In the last step use a Router transformation with two output groups.
In the first group the condition should be O_count = 1; connect the corresponding output group to table A. In the second group the condition should be O_count = O_total_records; connect the corresponding output group to table B. The output of the default group should be connected to table C, which will contain all records except the first and last.
Source: http://www.queryhome.com/47922/informatica-how-to-get-middle-data-from-a-file
From an Informatica perspective, there are multiple ways to do this.
If the data is in a flat file, a SQL override will not work. You can create two pipelines: the first reads the source and uses an Aggregator to get the record count, assigning it to a mapping variable such as v_total. In the second pipeline, use another variable v_count, initialized to 0 and incremented for each row. Then create a Filter transformation to drop the first row (v_count = 1) and the last row (v_count = v_total); the rest will be loaded to the target.
Seems like a lot of code making the mapping unnecessarily complex when a simple Unix command such as
head -9 currentfilename | tail -8 > newinputfilename
will do the job (head -9 keeps the first 9 lines; tail -8 then drops the header, leaving lines 2 through 9). Then all you need to do is use the new file as the source for your mapping (if you even need the mapping anymore).
For a Windows Server equivalent, see https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent