Is it compulsory to select group by while performing a count operation in the Aggregator transformation in Informatica?
In order to perform a count, you should mark at least one port as group by in the Aggregator transformation, so that it knows which column to group on.
Even if you don't select any group-by port, the mapping will not fail, but you won't get the expected result.
While using an Aggregator transformation, you need to check group by because the transformation aggregates the rows group by group and then passes the results to the pipeline. If no group by is checked, only the last row is returned, since the transformation has no key to aggregate the data on. In order to perform a count with respect to a specific column, it is mandatory to check group by for the required columns.
If you want to avoid grouping, you can use an Expression transformation and a count function to aggregate the required column without grouping.
Thank you
It's not mandatory to select at least one port as group by. However, if you don't choose any group-by port, Informatica will return only the last row.
Hope this helps
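To make the behavior described above concrete, here is a plain-Python sketch (not Informatica code; the data and names are made up for illustration) of what an Aggregator with a COUNT port effectively does with and without a group-by port:

```python
from collections import OrderedDict

rows = [
    {"dept": "A", "emp": "e1"},
    {"dept": "B", "emp": "e2"},
    {"dept": "A", "emp": "e3"},
]

def aggregate_count(rows, group_by=None):
    """Mimic an Aggregator transformation with a COUNT port."""
    if group_by is None:
        # No group-by port: the whole input is one group, and the
        # non-aggregated ports carry the values of the last row.
        last = rows[-1]
        return [{**last, "cnt": len(rows)}]
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[group_by], []).append(row)
    return [{group_by: key, "cnt": len(grp)} for key, grp in groups.items()]

print(aggregate_count(rows))                   # one row, based on the last input row
print(aggregate_count(rows, group_by="dept"))  # one count per dept
```

With no group-by port you still get a count, but attached to a single row (the last one); with group by on `dept` you get one count per group, which is usually the expected result.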
I have a requirement to filter a flat file for only those records whose colA value is one of 1, 2, 3, 4, 5, 6 and whose ColB is 'N'. The records that satisfy this condition should pass from the source file to the target.
Earlier the requirement was to check for only one value of colA, so I applied:
IIF(COLA='1' AND COLB='N',TRUE)
How to filter with multiple values for the same column? I am new to informatica power center.
There are two ways you can achieve this in an expression : using OR logical operator or using IN function.
With OR
IIF((COLA='1' OR COLA='2' OR COLA='3' OR COLA='4' OR COLA='5' OR COLA='6') AND COLB='N',TRUE)
Parentheses are essential to group the conditions on COLA.
With IN
IIF(IN(COLA,'1','2','3','4','5','6') AND COLB='N',TRUE)
I find this one easier to read.
I would like to be able to partition an Arrow table by the values of one of its columns (assuming the set of n values occurring in that column is known). The straightforward way is a for-loop: for each of these values, scan the whole table and build a new table of matching rows. Are there ways to do this in one pass instead of n passes?
I initially thought that Arrow's support for group-by scans would be the solution -- but Arrow (in contrast to Pandas) does not support extracting groups after a group-by scan.
Am I just thinking about this wrong and there is another way to partition a table in one pass?
For the group by support, there is a "hash_list" function that returns all values in the group. Is that what you're looking for? You could then slice the resulting values after-the-fact to extract the individual groups.
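If the group-by route doesn't work out, the one-pass idea itself is simple to express. Here is a plain-Python sketch (independent of the Arrow API; the data is made up) that partitions rows by a key column in a single scan, rather than one filtering pass per distinct value:

```python
from collections import defaultdict

def partition_one_pass(rows, key):
    """Split rows into one bucket per distinct key value in a single scan."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return dict(buckets)

table = [
    {"color": "red", "v": 1},
    {"color": "blue", "v": 2},
    {"color": "red", "v": 3},
]
parts = partition_one_pass(table, "color")
# parts["red"] → [{"color": "red", "v": 1}, {"color": "red", "v": 3}]
```

In Arrow you would typically still materialize one table per bucket afterwards; the point here is only that the scan is O(rows) rather than O(values × rows).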
In my project I have to create my own multiple-sorting and multiple-grouping dialogs. Basically, the user can choose which columns should be included and select the order and direction of the operation.
For multiple sorting I use this function and it works:
.igGridSorting( "sortMultiple", [exprs:array] );
The problem is now with grouping. Is there any function which behaves similarly? I mean, one executed with an array of grouping expressions (which define the columns to group by, the order of grouping, and the direction of grouping (asc/desc)) as a parameter? (This feature is supported by the Ignite UI built-in dialog.)
In the documentation I have found:
.igGridGroupBy( "groupByColumns" );
The description is "Adds a column to the group by columns list, executes the group by operation and updates the view."
But there is nothing about how to add these columns.
There is no public API method for grouping multiple columns.
The built-in dialog sets the expressions into the data source and also takes care to rebind the grid and rebuild the grouping area. Unfortunately, none of this is exposed as a public API.
So the easiest approach would be to loop over the columns you need to group and invoke groupByColumn for each one.
Another thing you can do is to re-create the grid with another set of columnSettings for the GroupBy feature.
I am trying to get the row count for a dataset with 280 fields without affecting performance. Looking for the best possible way to do this.
The best option to avoid performance issues is to use a Sorter transformation to sort the data and pass the pipeline to an Aggregator transformation. In the Aggregator transformation, check the Sorted Input option.
Additionally, if your source is a database, index the columns used in the conditions and partition the table if required.
For your solution, I have two options in mind:
Using an Aggregator (remember to use a predefined sort order to improve performance with the next transformation): SQ > Aggregator > Target. Inside the Aggregator, add new ports with the SUM() and/or COUNT() functions. Remember to select the columns to group by.
Check out this example:
https://www.guru99.com/aggregator-transformation-informatica.html
Using a Source Qualifier query override: use a traditional SELECT COUNT/SUM with GROUP BY against the database. SQ > Target.
By the way, Informatica performs very well; rather than the number of columns, you need to review how many records you are processing. A best practice is always to push as much of the work as possible to the data source/database rather than the Informatica app.
Regards,
Juan
If all you need is just to count the rows, use the Aggregator. That's what it's for. However, this will create a cache; to limit its size, use a single port.
To avoid caching, you can use a variable port in an Expression transformation and just increment it. However, this will give you an extra column numbering all rows, not a single value, so you'll still need to aggregate it. Here you could use an Aggregator with no function, which returns just the last row.
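The variable-port trick can be sketched outside Informatica as well. Assuming a variable port like v_count = v_count + 1 evaluated on every row (names here are illustrative, not Informatica syntax), the total count is simply the running number carried by the last row:

```python
def count_via_running_variable(rows):
    """Mimic an Expression transformation with a variable port
    v_count = v_count + 1 evaluated once per input row."""
    v_count = 0
    numbered = []
    for row in rows:
        v_count += 1                  # variable port incremented on every row
        numbered.append((row, v_count))
    # Every row carries the running number; the final aggregation step
    # (e.g. an Aggregator with no function) just keeps the last value.
    total = numbered[-1][1] if numbered else 0
    return numbered, total

numbered, total = count_via_running_variable(["r1", "r2", "r3"])
# total → 3
```

This avoids the aggregate cache but, as noted above, every row now carries the counter, so a final step is still needed to keep only the last value.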
Can someone please explain how to implement the following logic in Informatica, not with a Source Qualifier override but with other transformations inside the mapping?
SUM(WIN_30_DUR) OVER(PARTITION BY AGENT_MASTER_ID ORDER BY ROW_DT ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING)
Basically this is an SQL (Oracle) level requirement, but I want it at the Informatica level.
Use the Aggregator to calculate sums grouped by AGENT_MASTER_ID and self-join the result on AGENT_MASTER_ID. Make sure to have the data sorted and use the Sorted Input property for the Aggregator; it is also mandatory for the self-join.
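For reference, the exact window logic of the SQL above (per AGENT_MASTER_ID, ordered by ROW_DT, summing WIN_30_DUR over the 30 rows before the current one and excluding the current row) can be written as a plain-Python sketch, which is handy for validating whatever the mapping produces. The column names follow the SQL; the data layout (a list of dicts) is an assumption for illustration:

```python
from collections import defaultdict

def rolling_sum_30_preceding(rows):
    """rows: dicts with AGENT_MASTER_ID, ROW_DT, WIN_30_DUR.
    Returns each row plus win_sum = SUM(WIN_30_DUR) OVER (
    PARTITION BY AGENT_MASTER_ID ORDER BY ROW_DT
    ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING)."""
    by_agent = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["AGENT_MASTER_ID"], r["ROW_DT"])):
        by_agent[row["AGENT_MASTER_ID"]].append(row)
    results = []
    for agent_rows in by_agent.values():
        durs = [r["WIN_30_DUR"] for r in agent_rows]
        for i, row in enumerate(agent_rows):
            window = durs[max(0, i - 30):i]   # 30 preceding .. 1 preceding
            results.append({**row, "win_sum": sum(window)})
    return results
```

Note that the frame excludes the current row and is capped at 30 preceding rows, so the first row of each partition gets a sum of 0.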