Informatica records scenario - informatica

I have a huge number of records coming from the source. I have to load the first 1000 records into the first target, the second 1000 into the second target, the third 1000 into the third target, then the fourth 1000 back into the first target, the fifth 1000 into the second target, the next thousand into the third target, and so on. Can anyone give me a solution for this? I was able to load 3000 records into the 3 different targets, but I am unable to route records 3001-4000 back to the first target, 4001-5000 to the second target, and so on.

Try something like this:
Use a Sequence Generator to tag each row with a row number.
Create a calculated field that evaluates to 1, 2 or 3 based on that sequence value, i.e. using the block-of-1000 logic you outlined in your question.
Use a Router transformation to direct each row to the appropriate target based on the calculated field (see the sketch below).
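A minimal sketch of that bucketing arithmetic (Python, purely to illustrate; the function name is made up). In an Informatica expression port the same idea can be built from TRUNC and MOD on the sequence value:

def target_for(row_num, block_size=1000, num_targets=3):
    # Rows 1-1000 -> target 1, 1001-2000 -> target 2, 2001-3000 -> target 3,
    # 3001-4000 -> target 1 again, and so on.
    return ((row_num - 1) // block_size) % num_targets + 1

# boundaries mentioned in the question
assert target_for(1) == 1 and target_for(1000) == 1
assert target_for(3001) == 1 and target_for(4001) == 2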

Related

how to find a pattern which is repeated n number of times in a column of a table in informatica

I have a scenario in which a field of a particular record in my table looks like the example below (array format).
The set of id, email and address can be repeated n number of times for each record, so I need to set up a mapping in Informatica that will give me output like the one below:
...waiting for a solution, thanks.
I tried with the SUBSTR and INSTR functions, but with those I need to know beforehand how many times the email id occurs in a particular record. Since the email can be repeated n number of times for each row, I am not able to find a way to dynamically tell my INSTR function how many times to run.

Power BI - comparing row value with column from another table results in excessive computational load

I'm analyzing the logs from a firewall and I wanted to add two columns in Power Query M to determine whether the source or the destination IP address is a LAN address or comes from the Internet.
I created a file called Private_IPs.txt that contains, row by row, all the internal subnets (10., 172.16., 172.17., etc.) and loaded it as a table.
The code for the calculation is this:
#"Add dst isPrivate" = Table.AddColumn(#"Add src isPrivate", "dst_isPrivate", each List.Count(let tmp_dst = [dest_ip] in List.Select(Table.Column(Private_IPs, "Subnet"), each Text.StartsWith(tmp_dst, _))))
It creates a list by selecting the column "Subnet" of the table Private_IPs.
From this list, it keeps only the elements that [dest_ip] starts with, if any.
It then counts the number of items in the filtered list, so the resulting column value is either 0 or 1.
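The per-row logic is just a prefix-membership count. A rough Python equivalent of it, with made-up sample values:

private_subnets = ["10.", "172.16.", "172.17."]   # what the Subnet column of Private_IPs holds

def count_private_prefixes(dest_ip):
    # number of subnet prefixes that dest_ip starts with: 0 (Internet) or 1 (LAN)
    return len([s for s in private_subnets if dest_ip.startswith(s)])

print(count_private_prefixes("10.1.2.3"))   # 1
print(count_private_prefixes("8.8.8.8"))    # 0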
It works, but the problem is that when I refresh the data, it loads the file "PrivateIPs.csv" two times for each row of the table, resulting in minutes and minutes of loading time, and the counter reports something like
"10MB from Private_IPs.csv"
"20MB from Private_IPs.csv"
[...]
"400MB from Private_IPs.csv"
Why does this happen? Shouldn't it keep the table in memory instead of reading from the file each time? How do I make it do that? It's only a text file with 17 rows in it; maybe my solution is too convoluted?
Since the table with the private subnets is very small and won't change, I created it manually inside the Query Editor, and now it only loads once.

Redshift -- Query Performance Issues

SELECT
a.id,
b.url as codingurl
FROM fact_A a
INNER JOIN dim_B b
ON strpos(a.url,b.url)> 0
Record count in Fact_A: 2 million
Record count in Dim_B: 1,500
Time taken to execute: 10 minutes
Number of nodes: 2
Could someone help me understand why the above query takes so long to execute?
We have declared the distribution key on Fact_A so that the records are distributed evenly across both nodes, and a sort key is created on URL in Fact_A.
The Dim_B table is created with DISTSTYLE ALL.
Redshift does not have full-text search indexes or prefix indexes, so a query like this (with strpos used as the join filter) will result in a full table scan, executing strpos about 3 billion times (2 million rows x 1,500 rows).
Depending on which URLs are in dim_B, you might be able to optimise this by extracting prefixes into separate columns. For example, if you always compare subpaths of the form http[s]://hostname/part1/part2/part3, then you can extract "part1/part2/part3" as a separate column in both fact_A and dim_B and make it the dist and sort key (see the sketch below).
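A rough Python sketch of that prefix extraction, e.g. as a preprocessing step before loading; the URL layout and the depth of three segments are assumptions taken from the example above:

from urllib.parse import urlparse

def subpath_prefix(url, depth=3):
    # 'https://host/part1/part2/part3/x' -> 'part1/part2/part3'
    parts = [p for p in urlparse(url).path.split("/") if p]
    return "/".join(parts[:depth])

# stored as an extra column in both fact_A and dim_B and used as the
# dist and sort key, the join can then become an equality join on the prefix
print(subpath_prefix("https://example.com/part1/part2/part3/page"))   # part1/part2/part3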
You can also rely on the parallelism of Redshift. If you resize your cluster from 2 nodes to 20 nodes, you should see an immediate performance improvement of roughly 8-10x, as this kind of query can (for the most part) be executed by each node in parallel.

How to load the first half records in one file and other half in other file in informatica?

I have tried an Expression transformation so far, along with an Aggregator transformation to get the maximum value of the sequence number. The source is a flat file.
The way you are designing it would require reading the source twice in the mapping: once to get the total number of records (the max sequence, as you called it) and again to read the detail records and pass them to target1 or target2.
You can simplify it by passing the number of records in as a mapping parameter.
Either way, to decide when to route to a target, you can count the number of records read by keeping a running total in a variable port, incrementing it every time a row passes through the expression, and checking it against (record count)/2, as sketched below.
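A minimal sketch of that routing decision, assuming the total record count is known up front (e.g. supplied as a mapping parameter):

def route_halves(rows, total_count):
    # first half -> target1, second half -> target2
    count = 0                                    # running total, like a variable port
    for row in rows:
        count += 1
        yield row, ("target1" if count <= total_count / 2 else "target2")

for row, target in route_halves(["r1", "r2", "r3", "r4", "r5", "r6"], 6):
    print(row, "->", target)                     # r1-r3 -> target1, r4-r6 -> target2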
If you don't really care about first half and second half and all you need is two output files equal in size, you can:
number the rows (with a rank transformation or a variable port),
then route even and odd rows to two different targets.
If you can, write a Unix shell script (assuming your platform is Unix) to take the head of the first file with half the file's size in lines (use wc -l on the file and divide the result by 2 to get the parameter for head) and direct the output to a 3rd file. Then do a tail on the second file, again using wc -l as just described, and >> the output to the 3rd file you created. These would be pre-session commands, and you'd use that 3rd file as the source file for your session. It'd look something like this (untested, but it gets the general idea across):
halfsize=$(wc -l < filename)                 # line count only; plain `wc -l file` would also print the filename
halfsize=$((halfsize/2))
head -n "$halfsize" filename > thirdfile     # first half of file 1
halfsize=$(wc -l < filename2)
halfsize=$((halfsize/2))
tail -n "$halfsize" filename2 >> thirdfile   # append second half of file 2
Prior to writing to the target, keep a count in an Expression transformation, then connect this expression to a Router.
The router should have 2 groups:
group1: count1 <= n/2, route it to Target1
group2: count1 > n/2, route it to Target2
Or
MOD(NEXTVAL, 2) will send alternate records to alternate targets.
I guess it won't send the first half to the 1st target and the 2nd half to the 2nd target, though.
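For completeness, a quick sketch of that alternating split; it produces two equal-sized but interleaved outputs rather than a true first half and second half:

def route_alternating(rows):
    # odd row numbers -> target1, even row numbers -> target2
    for row_num, row in enumerate(rows, start=1):
        yield row, ("target1" if row_num % 2 == 1 else "target2")

for row, target in route_alternating(["r1", "r2", "r3", "r4"]):
    print(row, "->", target)                     # r1, r3 -> target1; r2, r4 -> target2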

How to get MIDDLE Data from a FILE

I have 10 records in a file and I don't need the first and the last lines; I need only the data from lines 2 through 9.
Can anybody provide me with a solution for this?
Source file example:
SIDE,MTYPE,PAGENO,CONTIND,SUBACC,SIGN,DEAL QUANTITY,SECURITY,SOURCE SYSTEM,TODATE,SETTLEMENT DATE,REFERENCE 4,REFERENCE 2,TRADE DATE,ACCRUED INTEREST,ACCRUED INTEREST CURRENCY,XAMT1,XAMT2,XAMT3,XAMT4,XAMT5
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00107020052_CSA,107020052,6/12/2013,0,USD,,0,250000,0,200000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00115020036_CSA,115020036,6/12/2013,0,USD,,0,250000,0,220000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00301410097_CSA,301410097,6/12/2013,0,USD,,0,226725,0,226725
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00030020088_CSA,30020088,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00106410075_CSA,106410075,6/12/2013,0,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00116510010_CSA,116510010,6/12/2013,300000,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00177020015_CSA,177020015,6/12/2013,0,USD,,0,250000,0,270000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00189110093_CSA,189110093,6/12/2013,0,USD,,0,250000,0,280000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00272220015_CSA,272220015,6/12/2013,0,USD,,0,250000,0,10000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE1,189110093,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE2,272220015,6/12/2013,0,USD,,0,250000,0,1000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE3,301410097,6/12/2013,0,USD,,0,250000,0,200
I'm not an expert in Informatica, but I found the following answer on the web; I hope it is useful for you.
Step 1: You have to assign row numbers to each record. Generate the row numbers using an Expression transformation, with an output port O_count holding the running row number. In the same expression transformation create a DUMMY output port and assign 1 to it, so the DUMMY output port always returns 1 for each row.
Step 2: Pass the output of the expression transformation to an Aggregator and do not specify any group-by condition. Create an output port O_total_records in the aggregator and assign the O_count port to it. The aggregator will return the last row by default, so its output contains the DUMMY port with value 1 and the O_total_records port holding the total number of records in the source.
Step 3: Pass the outputs of the expression transformation and the aggregator transformation to a Joiner transformation and join on the DUMMY port. In the joiner transformation check the Sorted Input property; only then can you connect both the expression and the aggregator to the joiner transformation.
Step 4: In the last step use a Router transformation. In the router transformation create two output groups.
In the first group the condition should be O_count = 1; connect the corresponding output group to table A. In the second group the condition should be O_count = O_total_records; connect the corresponding output group to table B. The output of the default group, which contains all records except the first and last, should be connected to table C.
Source: http://www.queryhome.com/47922/informatica-how-to-get-middle-data-from-a-file
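A compact sketch of the routing that steps 1-4 produce, with the list length playing the role of O_total_records and the running row number playing the role of O_count:

def route_first_middle_last(rows):
    rows = list(rows)
    total = len(rows)                            # O_total_records
    for count, row in enumerate(rows, start=1):  # count plays the role of O_count
        if count == 1:
            target = "table_A"                   # first record
        elif count == total:
            target = "table_B"                   # last record
        else:
            target = "table_C"                   # the "middle" data, lines 2 .. total-1
        yield row, target

for row, target in route_first_middle_last(["line1", "line2", "line3", "line4", "line5"]):
    print(row, "->", target)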
From an Informatica perspective, there are multiple ways to do this.
If the data is in a flat file, a SQL override would not work. You can create two pipelines: the first reads from the source and uses an Aggregator to get the record count, assigning it to a mapping variable such as v_total. In the second pipeline, use another variable, v_count, initialized to 0 and incremented for every row. Then create a Filter transformation that filters out the first record (v_count = 1) and the last record (v_count = v_total); the rest will be loaded to the target.
Seems like a lot of code wasted making the mapping unnecessarily complex when a simple Unix pipeline such as
head -9 currentfilename | tail -n +2 > newinputfilename
will do the job (it keeps lines 2 through 9 of the 10-line file). Then all you need to do is use the new file for your mapping (if you even need the mapping anymore).
For a windows server equivalent see https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent