Infomatica Row Count - informatica

I am new to Informatica and I am confused.
I have data from a flat file and need to do some transformation for it. I just need a general idea on how to actually do it.
Say I have data that looks like this:
COL1, COl2, COl3, COL4
A B C D
A B B B
G G G G
B D D X
F F F F
B B A D
1) I need to transfer only rows that have the first column as A or B
2) I need the count of the rows that are A, and I need a separate count that is B
3) I need a comparison of the count of A and the count of B. If the count do not match then I need an email sent.
Can someone give me a link to something helpful or tell me exactly that types of transformation / logic I should be using? Thanks

There are multiple ways. Here's a simple one, step-by-step.
Use a filter on Source Qualifier to get just the data you need.
Separate into two pipelines using Router Transformation with two groups defined as COL1='A' and COL1='B'
Use Aggregate Transformation to get the counts (for each pipeline).
Use Expression Transformaiton to add a dummy port e.g. joinPort = 1 (for each pipeline).
Join the piplelines on dummy port using Joiner Transformation
Use Expression Transformation to compare the results.
Sending email a separate story.
Use a Workflow variable e.g. wfSendEmail initialized to 0 and a mapping variable e.g. mSendEmail
On session Components tab do the Pre-session variable assignment and assign wfSendEmail to mSendEmail.
In the exression transformation mentioned in p.6 above use the SETVARIABLE function if the counts do not match to set the mSendEmail to 1.
On session Components tab do the Post-session variable assignment and assign mSendEmail value to wfSendEmail.
Add and Email task with a condition wfSendEmail=1 on the link from the session.

Related

How to set NULL values to a single character in IICS?

There are 100+ incoming fields for a target transformation in IICS. NULLs can appear in any of these columns. But the end goal is to convert the NULLs in each of the incoming fields to * so that the data in the target consists of * instead of NULL.
A laborious way to do this is to define an expression for each column. That 100+ expressions to cover each and every column. The task of the expression is to convert NULL into *. But that is difficult in terms of maintenance.
In Informatica Power center there is a property on the target object that converts all the NULL values to * as shown in the below screenshot.
Tried setting the property Replacement Character on IICS for the target transformation. But that didn't help. The data is still coming in as NULL.
Do we have a similar functionality or property for target transformation on IICS? If so how to use it?
i think i find easier to create a reusable exp transformation with 10 input and 10 putput. Then copy it 10 times for 100 fields.
create an input, output port like below -
in_col
out_col = IIF(isnull(in_col) OR is_spaces(in_col),'*',in_col)
Then copy in_col - 10 times. And copy out_col 10 times. You need to adjust/fix the formula though.
Save it and make it reusable'
Then copy that reusable widget 10 times.
This has flexibility - if formula changes, you just have to change only 1 widget and viola, everything changed.
Try using Vertical macro. It allows writing a function that will affect a set of indicated ports. Follow the link for full documentation with examples.

sumproduct if a cell contains a certain value, give all results otherwise just the specified result

My issue is that I need to reference a cell (A1) which will either be the name of a state that can be found in column L, or it can be "All States" which I then want to include all results of column L. I can't work out how to include this.
=SUMPRODUCT(--(IF(A1="All States",Data!$L:$L,Data!$L:$L=A1)),Data!Q:Q)
I want to add a bunch more criteria based on the above so I don’t want to go down the route of imbedding the sumproduct in an if function because the formula will quickly become too unweildy.
You have a lot of choices. Using your initial formula I would tweak it to
(A) =SUMPRODUCT((IF($A$1="All States",1,($L$2:$L$11=$A$1)))*($Q$2:$Q$11))
But this would need to be entered as an array formula so instead of just confirming with ENTER, you need CONTROL+SHIFT+ENTER. You will know you have done it right when { } show up around your formula. Note that they cannot be added manually.
A non array type formula which would be faster I believe would be to look at your two options. You are either dealing with a single state or all states. Set up an IF check to determine if you need to sum all of column Q, or if you need to find a single value from column Q. I used the following formula:
(B) =IF(A1="all states",SUM($Q$2:$Q$11),INDEX($Q$2:$Q$11,MATCH($A$1,$L$2:$L$11,0)))
A bit of a cheat but but simplifies things, is to add a final state to the bottom of your list in L and call is "All States". In the corresponding row in Q place =sum(First Cell:Last Cell). If you do that then you can use the following formula:
(C) =SUMPRODUCT(($L$2:$L$12=$A$1)*($Q$2:$Q$12))
That are other options out there as well, just thought I would show some options.

Calculate ever expanding number of columns with data to the right

Currently have a spreadsheet that tracks attendance. First column is name, second column is attendance % and contains the formula I need to revise, subsequent columns simply have an X or O in them and denote whether someone attended or not (headers for these columns are dates).
Currently using a COUNTIF() I can check how many X's there are and then the formula is SUM(100/no_of_columns*COUNTIF(A3:A12))
Ideally I want to firstly replace no_of_columns with the actual number of columns with data to the right.
I've thought about replacing this with a SUM(COUNTIF('X')+COUNTIF('O')) but it seems pretty messy?
Secondly I want to replace the A12 with whatever the last column value is.
I could just make the last column a very high column value, but again feels messy and would like to know if there is a better way...
Example: https://docs.google.com/spreadsheets/d/1rjnUQP7V-U1EZTp3Z8yO7HybBCuQjf2y4LJ4Dv4ctF8/edit?usp=sharing
Presume you only have the attendance dates in Row 1 without other information such as headers for Column A and B,
Put the following formula in Cell B2 and drag it down,
=COUNTIF(INDEX(OFFSET($C2,,,,COUNTA($1:$1)),),"x")/COUNTA($1:$1)*100
The logic is to use INDEX + OFFSET function to dynamically return the range of columns on the right, and use COUNTA to find out how many dates are there, and you should understand the use of COUNTIF, the calculation is self-explanatory.
EDIT #2
After looking into your worksheet, I guess you are adding the new dates by inserting columns between B and C so you probably want to use the following formula in Cell B2 instead to avoid the system shifting the starting cell reference automatically:
=COUNTIF(INDEX(OFFSET($B2,,1,,COUNTA($1:$1)),),"x")/COUNTA($1:$1)*100
The logic is the same as the previous one but just a little change to the OFFSET references so it starts looking for the range from Column B instead of C.
I have tested the above in both Excel and Google-sheets working just fine. Let me know if you have any questions. Cheers :)
paste in B2:
=ARRAYFORMULA(IFERROR(IF(LEN(A2:A),
MMULT(IF(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), ))))="x", 1, 0),
TRANSPOSE(COLUMN(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), )))))^0))/
MMULT(IF(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), ))))<>"", 1, 0),
TRANSPOSE(COLUMN(INDIRECT("C2:"&ADDRESS(ROWS(A2:A), MAX(IF(1:1<>"", COLUMN(1:1), )))))^0))*100, ), 0))
spreadsheet demo

How can one create a single table from multiple datasets?

I'm trying to create a descriptive table by treatment group. For my analysis, I have 3 different partitions of the data (because I'm running 3 separate analyses) from a complete data set, but I only have one statistic from each subset that I am trying to describe, so I think it'd look better in one complete table. At the end, I'd like an output that can convert to latex (as I'm using bookdown).
I've been using the compareGroups package to easily create each table individually. I know that there is an rbind function that allows to create a stacked table, but it won't let me combine them because the n of each separate data frame is different (due to missingness). For instance, I'm trying to study marriage in one of my analyses, and later divorce (which is a separate analysis), and so the n's of these two data frames differ, but the definition of treatment group is the same.
Ideally, I'd have two columns, one for the treatment group and one for the control group. There would be two rows, one that has age of first marriage, and the second row which would have length of that first marriage, and then the respective ns of the cells.
library(compareGroups)
d1 <- compareGroups(treat ~ time1mar,
data = nlsy.mar,
simplify=TRUE,
na.action=na.omit) %>% createTable(.,
type=1,
show.p.overall = FALSE)
d2 <- compareGroups(treat ~ time1div,
data = nlsy.div,
simplify=TRUE,
na.action=na.omit) %>% createTable(.,
type=1,
show.p.overall = FALSE)
d.tot <- rbind(`First Age at Marriage` = d1, `Length of First Marriage` = d2)
This is the error that I get:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 6626, 5057
Any suggestions?
The problem might be that you're using na.omit which delets the cases/rows with NAs from both of your datasets. Probably a different amount of cases get removed from each data set. But actually different numbers of row should only be a problem with cbind. However you might try to change the na.action option.
I'm just guessing. As said by joshpk without sample data is difficult to reproduce your problem.

How to get MIDDLE Data from a FILE

I have 10 records in a file and I don't need the first and the last line, I need data from 2 through 9 lines only.
Can anybody provide me solution on it?
Source file example:
SIDE,MTYPE,PAGENO,CONTIND,SUBACC,SIGN,DEAL QUANTITY,SECURITY,SOURCE SYSTEM,TODATE,SETTLEMENT DATE,REFERENCE 4,REFERENCE 2,TRADE DATE,ACCRUED INTEREST,ACCRUED INTEREST CURRENCY,XAMT1,XAMT2,XAMT3,XAMT4,XAMT5
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00107020052_CSA,107020052,6/12/2013,0,USD,,0,250000,0,200000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00115020036_CSA,115020036,6/12/2013,0,USD,,0,250000,0,220000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00301410097_CSA,301410097,6/12/2013,0,USD,,0,226725,0,226725
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00030020088_CSA,30020088,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00106410075_CSA,106410075,6/12/2013,0,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00116510010_CSA,116510010,6/12/2013,300000,USD,,0,250000,0,260000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00177020015_CSA,177020015,6/12/2013,0,USD,,0,250000,0,270000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00189110093_CSA,189110093,6/12/2013,0,USD,,0,250000,0,280000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,00272220015_CSA,272220015,6/12/2013,0,USD,,0,250000,0,10000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE1,189110093,6/12/2013,0,USD,,0,250000,0,250000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE2,272220015,6/12/2013,0,USD,,0,250000,0,1000
L,536,1,M,L_CAMS_COLATAGREEMENT,C,0,AGREEMENTS,CAMS_AGREEMENT,6/12/2013,6/12/2013,SLAVE3,301410097,6/12/2013,0,USD,,0,250000,0,200
Not an expert in Informatica but I found the following answer on the web, hope it should be useful for you.
Step 1: You have to assign row numbers to each record. Generate the row numbers using the expression transformation. Create a DUMMY output port in the same expression transformation and assign 1 to that port. So that, the DUMMY output port always return 1 for each row.
Step 2: Pass the output of expression transformation to aggregator and do not specify any group by condition. Create an output port Ototalrecords in the aggregator and assign Ocount port to it. The aggregator will return the last row by default. The output of aggregator contains the DUMMY port which has value 1 and Ototal_records port which has the value of total number of records in the source.
Step 3: Pass the output of expression transformation, aggregator transformation to joiner transformation and join on the DUMMY port. In the joiner transformation check the property sorted input, then only you can connect both expression and aggregator to joiner transformation.
Step 4: In the last step use router transformation. In the router transformation create two output groups.
In the first group, the condition should be Ocount = 1 and connect the corresponding output group to table A. In the second group, the condition should be Ocount = Ototalrecords and connect the corresponding output group to table B. The output of default group should be connected to table C, which will contain all records except first & last.
Source: http://www.queryhome.com/47922/informatica-how-to-get-middle-data-from-a-file
From informatica prospective, There are multiple way to do this.
if data in flat file, the sqloverride would not work. you can create two pipe line, first line read from source and use aggregator get the count and assign to a mapping variable such v_total. second pipe line you use another variable v_count, initialize to 0 , call count function. create filter transformation, filter out v_count=1 and (v_total-v_count)=1, the rest will be load to target.
Seems a lot of code wasted making the mapping unnecessarilly complex when a simple unix command such as
head -9 (currentfilename) (newinputfilename)
Will do the job. Then all you need do is use the new file for your mapping (if you even need it anymore)
For a windows server equivalent see https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent