How to get MIDDLE Data from a FILE - informatica

I have 10 records in a file and I don't need the first and last lines; I only need lines 2 through 9.
Can anybody provide a solution for this?
Source file example:
SIDE,MTYPE,PAGENO,CONTIND,SUBACC,SIGN,DEAL QUANTITY,SECURITY,SOURCE SYSTEM,TODATE,SETTLEMENT DATE,REFERENCE 4,REFERENCE 2,TRADE DATE,ACCRUED INTEREST,ACCRUED INTEREST CURRENCY,XAMT1,XAMT2,XAMT3,XAMT4,XAMT5
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00107020052_CSA,107020052,6/12/2013,0,USD,,0,250000,0,200000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00115020036_CSA,115020036,6/12/2013,0,USD,,0,250000,0,220000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00301410097_CSA,301410097,6/12/2013,0,USD,,0,226725,0,226725
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00030020088_CSA,30020088,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00106410075_CSA,106410075,6/12/2013,0,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00116510010_CSA,116510010,6/12/2013,300000,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00177020015_CSA,177020015,6/12/2013,0,USD,,0,250000,0,270000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00189110093_CSA,189110093,6/12/2013,0,USD,,0,250000,0,280000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00272220015_CSA,272220015,6/12/2013,0,USD,,0,250000,0,10000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE1,189110093,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE2,272220015,6/12/2013,0,USD,,0,250000,0,1000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE3,301410097,6/12/2013,0,USD,,0,250000,0,200

I'm not an expert in Informatica, but I found the following answer on the web; I hope it's useful for you.
Step 1: Assign row numbers to each record. Generate the row numbers in an Expression transformation by incrementing a variable port for every row and assigning it to an output port O_count. In the same Expression transformation create a DUMMY output port and assign 1 to it, so that the DUMMY port returns 1 for every row.
Step 2: Pass the output of the Expression transformation to an Aggregator and do not specify any group by condition. Create an output port O_total_records in the Aggregator and assign O_count to it. By default the Aggregator returns the last row, so its output contains the DUMMY port with value 1 and the O_total_records port holding the total number of records in the source.
Step 3: Pass the outputs of both the Expression and the Aggregator transformations to a Joiner transformation and join on the DUMMY port. Check the Sorted Input property in the Joiner transformation; only then can you connect both the Expression and the Aggregator to the same Joiner.
Step 4: In the last step use a Router transformation with two output groups.
In the first group the condition should be O_count = 1; connect this group to table A. In the second group the condition should be O_count = O_total_records; connect this group to table B. Connect the default group to table C, which will then contain all records except the first and the last.
Source: http://www.queryhome.com/47922/informatica-how-to-get-middle-data-from-a-file
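Outside the mapping, the same keep-the-middle-rows logic is easy to sanity-check. Below is a minimal Python sketch that numbers the rows, counts them, and keeps everything strictly between the first and last row; the file names and the comma delimiter are assumptions for illustration, not part of the original post.
import csv

# Read all rows, number them, and keep only rows 2 .. total-1
# (drop the first and the last record), mirroring the
# O_count / O_total_records comparison done in the mapping.
with open("source_file.csv", newline="") as src:        # assumed input name
    rows = list(csv.reader(src, delimiter=","))

total = len(rows)
middle = [row for i, row in enumerate(rows, start=1) if 1 < i < total]

with open("middle_only.csv", "w", newline="") as tgt:    # assumed output name
    csv.writer(tgt, delimiter=",").writerows(middle)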

From an Informatica perspective, there are multiple ways to do this.
If the data is in a flat file, a SQL override won't work. You can create two pipelines: in the first, read the source and use an Aggregator to get the record count and assign it to a mapping variable such as v_total; in the second, use another variable v_count, initialized to 0 and incremented for every row. Then create a Filter transformation that filters out the rows where v_count = 1 (the first record) or v_count = v_total (the last record); the rest will be loaded to the target.

That seems like a lot of work making the mapping unnecessarily complex, when a simple Unix command such as
head -9 currentfilename | tail -n +2 > newinputfilename
will do the job (head -9 keeps the first nine lines, dropping the last one; tail -n +2 then drops the first line). Then all you need to do is use the new file for your mapping (if you even need the mapping any more).
For a windows server equivalent see https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent

Related

How to set NULL values to a single character in IICS?

There are 100+ incoming fields for a target transformation in IICS. NULLs can appear in any of these columns, but the end goal is to convert the NULLs in each incoming field to * so that the data in the target contains * instead of NULL.
A laborious way to do this is to define an expression for each column - that's 100+ expressions, one per column, each converting NULL into *. That is difficult to maintain.
In Informatica PowerCenter there is a property on the target object that converts all NULL values to *, as shown in the screenshot below.
I tried setting the Replacement Character property on the target transformation in IICS, but that didn't help; the data still comes through as NULL.
Is there a similar functionality or property for the target transformation in IICS? If so, how do I use it?
I think it's easier to create a reusable Expression transformation with 10 inputs and 10 outputs, then copy it 10 times to cover 100 fields.
Create an input and output port pair like below -
in_col
out_col = IIF(ISNULL(in_col) OR IS_SPACES(in_col), '*', in_col)
Then copy in_col 10 times and copy out_col 10 times. You will need to adjust the formula in each copy so it references the right input port.
Save it and make it reusable.
Then copy that reusable widget 10 times.
This gives you flexibility - if the formula changes, you only have to change one widget and voilà, everything is changed.
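For what it's worth, the per-column rule itself is simple; outside of IICS, here is a rough Python sketch of the same NULL-to-* replacement applied to every field of every row. The file names are illustrative assumptions, not from the question.
import csv

# Replace NULL/blank values with '*' in every column of every row,
# the same rule the IIF(ISNULL(...) OR IS_SPACES(...)) expression applies per port.
with open("source.csv", newline="") as src:              # assumed input file
    reader = csv.DictReader(src)
    rows = [
        {col: ("*" if val is None or val.strip() == "" else val)
         for col, val in row.items()}
        for row in reader
    ]

with open("target.csv", "w", newline="") as tgt:          # assumed output file
    writer = csv.DictWriter(tgt, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)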
Try using a vertical macro. It lets you write a single function that is applied to a set of indicated ports. Follow the link for the full documentation with examples.

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string of 1 character and field B should be an integer/number.
What I want is to check/validate each file: if A is not a 1-character string then set Status = Not Valid, and the same if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do the check, but how do I move the file based on that status? I can't find any step that does it.
You can read the files in a loop and add steps as below:
after the data validation, filter the rows with a negative result (not matched) -> add an Add constants step with error = 1 -> add a Set Variables step for the ERROR field with a default value of 0.
After the transformation finishes, add a Simple Evaluation step in the parent job to check the value of the ERROR variable.
If it has value 1 then move the file, else continue.
I hope this helps.
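If it helps to see the end result outside of PDI, here is a rough Python sketch of the same idea: validate each row's field types, flag the file if any row fails, and move flagged files to an error folder. The folder names and column positions are assumptions for illustration only.
import csv
import shutil
from pathlib import Path

def row_is_valid(row):
    # Field A must be a single-character string, field B an integer.
    return (len(row) >= 2
            and len(row[0]) == 1
            and row[1].lstrip("-").isdigit())

error_dir = Path("error")                       # assumed error folder
error_dir.mkdir(exist_ok=True)

for path in Path("incoming").glob("*.csv"):     # assumed input folder
    with open(path, newline="") as f:
        valid = all(row_is_valid(row) for row in csv.reader(f))
    if not valid:
        # One status per file, like the Group By / ERROR variable approach in PDI.
        shutil.move(str(path), str(error_dir / path.name))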
You can do the same as in this question. Once the file is read, use a Group By to get one flag per file. This time, however, you cannot do it in a single transformation; you have to use a job.
Your use case is in the samples shipped with your PDI distribution. The samples are in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace Filter 2 of Get Files - Get all transformations.ktr with your logic, which should include a Group By so you get one status per file rather than one status per row.
In case you wonder why you need such complex logic for such a simple task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether it is safe to move the file until every row has been processed.
Alternatively, you have the quick and dirty solution from your similar question: change the Filter rows step to a type check, and the final Synchronize after merge to a Process files / move step.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step as described there. It is more flexible if you need to maintain it in the long run.

Sequence generator with aggregator

Data is being passed through an aggregator transformation and grouped by customer account number to ensure I have distinct values. This is then passed to an expression transformation. I have a sequence generator transformation linked to the expression transformation - it never touches the aggregator. A variable in the expression is populated with the sequence number.
The problem I am running into is that the variable comes up with a value in excess of the sequence number - e.g. if there are 499 rows, the value of the variable is 501. It's as though the value assigned to the variable ignores the grouping and returns a non-distinct count.
Any idea what's happening here?
edit: More info on how this is being done. (Can't screenshot as it's too big.)
Flow 1 takes a list of account numbers, service numbers and destination systems and uses a router to sort them into flat files by destination system.
123456|0299999999|SYSA
123456|0299999999|SYSB
123457|0299999998|SYSA
123457|0299999998|SYSB
123457|0299999997|SYSA
123457|0299999997|SYSB
Some systems don't want the service number and some do. For those that do, it's a simple exercise of routing them through an expression transformation to set the variable using the sequence number. So the required output for SYSA would look like:
123456|0299999999|SYSA
123457|0299999998|SYSA
123457|0299999997|SYSA
And the expression transformation sets the variable using:
SETVARIABLE($$SYSA, SEQUENCE_NO)
In a second flow, I construct header and trailer files. For the trailer record count, I simply output the current value of $$SYSA like so:
SETVARIABLE($$SYSA, NULL)
I use Target Load Plan to execute the second flow only after the first completes.
I can demonstrate that using the variable in this way works, because the workflow outputs the correct values every time - I can alter the source data to increase or decrease the number of rows, and the value output for $$SYSA in the second flow is correct the first time (i.e. it can't be a persisted value).
Where this is falling down is when the destination system only wants distinct account numbers and no service numbers. The required output for SYSB would be:
123456|SYSB
123457|SYSB
i.e. the third row for SYSB is discarded because the account number is not unique. I'm trying to achieve this by putting an aggregator between the router and the expression and grouping by the account number. However, the $$SYSB variable isn't being assigned correctly in this case.
It appears Informatica only updates the value of the variable if it is higher than the persisted value stored in the repository. So if a successful run persists a value of 501 to the repository, that value is picked up again at the start of the next run and is only overridden if the new value is higher. I worked around it by declaring a starting value of 0 in the parameter file.
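That behaviour matches what the answer describes for a mapping variable with Max aggregation: at the end of a run, the repository keeps whichever is greater, the previously persisted value or the run's final value. A tiny Python sketch of that rule (purely illustrative, not Informatica code) shows why a stale 501 survives a 499-row run unless the start value is forced back to 0:
def persist(previous_persisted, run_final_value):
    # A Max-aggregation mapping variable only moves upward:
    # the repository keeps the larger of the two values.
    return max(previous_persisted, run_final_value)

print(persist(501, 499))   # -> 501  stale value wins, trailer count is wrong
print(persist(0, 499))     # -> 499  start value declared as 0 in the parameter file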

How to load the first half records in one file and other half in other file in informatica?

So far I have tried an expression transformation along with an aggregator transformation to get the maximum value of the sequence number. The source is a flat file.
The way you are designing it would require reading the source twice in the mapping: once to get the total number of records (the max sequence, as you called it) and again to read the detail records and pass them to target1 or target2.
You can simplify it by passing the number of records as a mapping parameter.
Either way, to decide when to route to a target you can count the number of records read by keeping a running total in a variable port, incrementing it every time a row passes through the expression and checking it against (record count)/2.
If you don't really care about first half and second half and all you need is two output files equal in size, you can:
number the rows (with a rank transformation or a variable port),
then route even and odd rows to two different targets.
If you can, write a Unix shell script (assuming your platform is Unix) that takes the head of the first file with half the file's line count (get the line count with wc -l and divide it by 2) and directs the output to a third file, then does a tail on the second file in the same way and appends (>>) its output to that third file. These would be pre-session commands, and you'd use that third file as the source file for your session. It would look something like this (untested, but it gets the general idea across):
halfsize=`wc -l < filename`
halfsize=$((halfsize/2))
head -n $halfsize filename > thirdfile
halfsize=`wc -l < filename2`
halfsize=$((halfsize/2))
tail -n $halfsize filename2 >> thirdfile
(Using wc -l < filename prints just the number, without the file name, so the arithmetic works.)
Prior to writing to the target, keep a running count in an expression transformation, then connect this expression to a router.
The router should have 2 groups:
group 1: count1 <= n/2 - route it to Target1
group 2: count1 > n/2 - route it to Target2
Or
MOD(NEXTVAL, 2) will send alternate records to alternate targets.
I guess it won't send the first half to the 1st target and the 2nd half to the 2nd target, though.
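To make the difference concrete, here is a small Python sketch (illustrative only) of the two routing rules: a running count compared against half the record count, which gives contiguous halves, versus MOD on the sequence number, which alternates rows between the targets.
rows = ["row%d" % i for i in range(1, 11)]     # assumed 10 source rows
total = len(rows)

# Running-count rule: first half to Target1, second half to Target2.
target1 = [r for i, r in enumerate(rows, 1) if i <= total // 2]
target2 = [r for i, r in enumerate(rows, 1) if i > total // 2]

# MOD(NEXTVAL, 2) rule: odd and even sequence numbers alternate between targets.
alt1 = [r for i, r in enumerate(rows, 1) if i % 2 == 1]
alt2 = [r for i, r in enumerate(rows, 1) if i % 2 == 0]

print(target1, target2)   # contiguous halves
print(alt1, alt2)         # equal sizes, but alternating rather than halves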

Informatica character sequence

I am trying to create a character sequence like AAA, AAB, AAC, AAD, ..., BAA, BAB, BAC, ... and so on in a flat file using Informatica. I have the formula to create the character sequence.
For this I need sequence numbers generated in Informatica, but I don't have any source file or database to drive the mapping.
Is there any method in Informatica to create the sequence using a Sequence Generator when there are no source records to read?
This is a bit tricky, as Informatica does row-by-row processing and your mapping won't start until you feed it source rows (from a file or a database). So to generate a sequence of length n with Informatica transformations alone, you need n rows of input.
Another solution is to use a dummy source (i.e. a source with one row): you can pass the loop parameters from that source and then use a Java transformation with Java code to generate the sequence.
There is no way to generate rows without a source in a mapping.
When I need to do that I use one of these methods:
Generating a file with as many lines as I need, with the seq command under Unix. It can also be used as a direct pipeline source without creating the file.
Getting the lines from a database.
For example, Oracle can generate as many lines as you want with a hierarchical query:
SELECT LEVEL just_a_column
FROM dual
CONNECT BY LEVEL <= 26*26*26
DB2 can do that with a recursive query :
WITH DUMMY(ID) AS (
SELECT 1 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT ID + 1 FROM DUMMY WHERE ID < 26*26*26
)
SELECT ID FROM DUMMY
You can also generate rows using a Java transformation, but even for that you need a source. I suggest putting the formula in the Java transformation and using a dummy source - a database source with a SELECT GETDATE() statement - so that one record is returned to call the Java transformation. You can then generate the sequence in the Java transformation itself, or connect a Sequence Generator to the output of the Java transformation to number the rows.
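The questioner says they already have the formula, but for readers who don't, the mapping from a sequence number to a three-letter code is just a base-26 conversion. Here is a short Python sketch of that conversion; the equivalent logic could live in the Java transformation or in expression ports, and the function name is only illustrative.
def to_letters(n, width=3):
    # Convert a 0-based sequence number to base-26 letters: 0 -> AAA, 1 -> AAB, ...
    letters = []
    for _ in range(width):
        n, remainder = divmod(n, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))

print([to_letters(i) for i in range(5)])   # ['AAA', 'AAB', 'AAC', 'AAD', 'AAE']
print(to_letters(26**3 - 1))               # 'ZZZ'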
You do have an option to create a sequence number even when it is not available in the source.
Create a Sequence Generator transformation. You will get NEXTVAL and CURRVAL ports.
In the Properties tab you have the options that control the sequence numbers:
Start Value - the value from which it should start
Increment By - the increment value
End Value - the value at which it should end
Current Value - your current value
Cycle - in case you need the sequence to cycle
Number of Cached Values
Reset
Tracing Level
Connect the NEXTVAL port to your target column.