Get comma-separated values into multiple rows in Informatica?

I have 2 columns in the source:
SID CID
1 101,102
2 201,2021,231
In the target I need:
SID CID
1 101
1 102
2 201
2 2021
2 231

You need to use a Normalizer transformation.
First, after the Source Qualifier, use an Expression transformation to split the CID column into separate ports.
o_cid1 = substr(cid, 1, 3) -- assumes a fixed length of 3; if the length is variable, use instr to find the comma
o_cid2 = substr(cid, instr(cid, ',', 1) + 1, 3) -- same caveat: if the length is variable, use instr to find the next comma
...
Then add the Normalizer. Its properties should be:
Number of occurrences of sid = 0
Number of occurrences of cid = 3
You will see 4 input ports (3 for cid1, cid2, cid3 and 1 for sid) and 2 output ports (1 for cid, 1 for sid).
Connect sid, o_cid1, o_cid2, ... to the corresponding input ports.
Finally, connect the output ports cid and sid to the target.
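To make the intended result concrete, here is a minimal Python sketch of what the split-then-normalize steps produce. It only illustrates the logic and is not Informatica code; the function name normalize_rows is hypothetical, and it assumes the values are separated by plain commas.

```python
def normalize_rows(rows, sep=","):
    """Turn (sid, 'a,b,c') pairs into one (sid, value) row per list element."""
    out = []
    for sid, cid_list in rows:
        for cid in cid_list.split(sep):
            cid = cid.strip()
            if cid:  # skip empty pieces, e.g. from a trailing comma
                out.append((sid, cid))
    return out


# Sample data taken from the question.
source = [(1, "101,102"), (2, "201,2021,231")]
target = normalize_rows(source)
for row in target:
    print(row)
```

Splitting on the separator rather than on fixed positions is what handles values of different lengths, such as 2021 in the sample.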

BigQuery Challenge: How can I correctly assign the value by next available value and keep it until the next update?

I have a fairly challenging BigQuery question.
Basically, I have to assign the next available value to session 1's code (in this case, session 1 should get the next available value, 123).
However, we want to keep the code value at 234 in session 4 until it gets another update.
Here's what I have:
timestamp  session  user_id  code
ts1        1        User A   NULL
ts2        2        User A   NULL
ts3        2        User A   123
ts4        3        User A   NULL
ts5        3        User A   234
ts6        4        User A   NULL
And the desired output table:
timestamp  session  user_id  code
ts1        1        User A   123
ts2        2        User A   123
ts3        2        User A   123
ts4        3        User A   234
ts5        3        User A   234
ts6        4        User A   234
Thanks everyone for the help!
You might consider the approach below. It takes the next non-null code at or after the current row and, if there is none, falls back to the most recent non-null code.
SELECT *,
  COALESCE(
    FIRST_VALUE(code IGNORE NULLS) OVER w0,
    LAST_VALUE(code IGNORE NULLS) OVER w1
  ) AS new_code
FROM sample_table
WINDOW w AS (PARTITION BY user_id ORDER BY timestamp),
  w0 AS (w RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
  w1 AS (w RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
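The same fill logic can be sketched in plain Python to check it against the sample data. This is only an illustration under simplifying assumptions: the list holds the codes of a single user already sorted by timestamp, and the function name fill_codes is hypothetical.

```python
def fill_codes(codes):
    """Fill None entries with the next non-null code, else the last non-null one."""
    n = len(codes)

    # next_code[i]: first non-null code at or after position i
    next_code = [None] * n
    upcoming = None
    for i in range(n - 1, -1, -1):
        if codes[i] is not None:
            upcoming = codes[i]
        next_code[i] = upcoming

    # prev_code[i]: last non-null code at or before position i
    prev_code = [None] * n
    latest = None
    for i in range(n):
        if codes[i] is not None:
            latest = codes[i]
        prev_code[i] = latest

    # Prefer the upcoming code, fall back to the previous one (like COALESCE).
    return [nx if nx is not None else pv for nx, pv in zip(next_code, prev_code)]


# Codes for ts1..ts6 from the question.
filled = fill_codes([None, None, 123, None, 234, None])
print(filled)
```

The backward pass plays the role of the window running from the current row to the end, and the forward pass the role of the window running from the start to the current row.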
JayTiger's answer is much cleaner, but here's what I came up with, which can be used as an alternative:
SELECT * EXCEPT (Code),
  IFNULL(
    (FIRST_VALUE(LatestCodeBySession IGNORE NULLS)
      OVER (PARTITION BY user_id ORDER BY event_timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)),
    (LAST_VALUE(LatestCodeBySession IGNORE NULLS)
      OVER (PARTITION BY user_id ORDER BY event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))
  AS Code
Here LatestCodeBySession is a column computed in an earlier step (for example in a subquery) as: `LAST_VALUE(Code IGNORE NULLS) OVER (PARTITION BY session ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)`

Informatica: Count the number of rows based on a sequence of numbers

I have a source table with 1 column. For example, the column name is A and the source contains the following records:
A
1
1
1
2
2
3
I want to populate two columns in the target, say A and B.
Column A in the target has the same values as in the source, and column B holds a running count within each value of A:
A B
1 1
1 2
1 3
2 1
2 2
3 1
Can someone please explain how I can achieve this?
Thanks in advance.
If the source is a DBMS like Oracle, you can use a Source Qualifier SQL override like the one below. Use row_number with partition by to generate a sequence for every value of A.
SELECT A,
  row_number() OVER (PARTITION BY A ORDER BY A) AS B
FROM mytable
If you're looking for an Informatica-only solution, this is how you can do it.
Sort the data by column A.
Use an Expression transformation and create one in/out port, two variable ports, and one output port, in the order shown below.
We compare the current value with the previous value: if they are the same, add 1 to the sequence; otherwise start from 1 again.
A in/out
v_B = iif (A=prev_A, v_B +1, 1)
prev_A=A
o_B =v_B
Link A and o_B to the target.
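The variable-port logic above can be sketched in Python to see how the counter resets. This is only an illustration of the comparison with the previous row, not Informatica code; the function name running_count is hypothetical.

```python
_NO_PREVIOUS = object()  # sentinel that never equals a real value


def running_count(values):
    """Return (A, B) pairs where B counts rows within each run of equal A."""
    out = []
    prev_a = _NO_PREVIOUS
    count = 0
    for a in sorted(values):  # the data must be sorted by A first
        # Same value as the previous row: increment; otherwise restart at 1.
        count = count + 1 if a == prev_a else 1
        prev_a = a  # remember the current value for the next row
        out.append((a, count))
    return out


# Sample data from the question.
result = running_count([1, 1, 1, 2, 2, 3])
for row in result:
    print(row)
```

As in the Expression transformation, the counter is evaluated before the previous value is updated, which is why the order of the ports matters.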

BigQuery: Select rows in between two values

I'm trying to select rows between two values using BigQuery.
Here is the table:
ID Group values
1 A 10I
1 B 20I
1 C 30I
1 D 40I
1 E 50I
1 F 60I
1 G 70I
1 H 80I
1 I 90I
I need to select the rows from Group C to Group G.
The code I'm using is:
select * from data
where Group >= 'C' and Group <='G'
The above code gave no results.
I also tried:
select * from data
where Group between 'C' and 'G'
This also returned no results.
Could someone please provide a solution?
This is because "Group" is a reserved word (as in GROUP BY): BigQuery expects you to group something and does not understand that here it is the name of a column. To make BigQuery treat it as a column name, just add backticks around it:
SELECT *
FROM data
WHERE `Group` BETWEEN "C" AND "G"

Concatenating row values in AWS Athena

I have 2 columns, say id and values. I want to concatenate the values grouped by the id column.
For example, I have:
ID Values
1 a
1 b
2 a
2 b
I need the output as
ID Values
1 a,b
2 a,b
You can use array_agg followed by array_join:
select id, array_join(array_agg("values"), ',') from my_table group by 1
The array_agg gives you an array of all values with the same id, and the array_join concatenates them into a string. Note that values is a reserved word in Athena, so it is double-quoted here, and my_table stands in for your table name. See the docs.
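The two steps (collect per id, then join) can be sketched in Python as follows. This is only an illustration of the aggregation, not Athena code; the function name concat_by_id is hypothetical, and it keeps values in input order, which the SQL query does not necessarily guarantee without an explicit ordering.

```python
from collections import defaultdict


def concat_by_id(rows, sep=","):
    """Group (id, value) pairs by id and join each group's values into a string."""
    groups = defaultdict(list)  # plays the role of array_agg
    for id_, value in rows:
        groups[id_].append(value)
    # Joining each list plays the role of array_join.
    return {id_: sep.join(values) for id_, values in groups.items()}


# Sample data from the question.
joined = concat_by_id([(1, "a"), (1, "b"), (2, "a"), (2, "b")])
print(joined)
```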

How to select count of distinct key based on indicator in another column?

I have a table which is like this:
Geo_Key Var1 Var2..Var50
123 1 0 .. 1
524 0 1 .. 1
323 1 1 .. 1
Where Var1-Var50 represents 50 columns having value 1/0.
I want to select the count of distinct Geo_Key for each column (Var1-Var50) where its value is 1.
So the results would look like:
Var1 50
Var2 60
....
...
Var50 10
Since your variables are binary (specifically 0/1), you can also try summing each column. The sum gives you the number of rows where that variable equals 1, which matches the distinct Geo_Key count as long as each Geo_Key appears in only one row.
Or you can try it using proc freq. Please check the following link:
http://www2.sas.com/proceedings/sugi25/25/btu/25p069.pdf
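For comparison with the summing approach, here is a minimal Python sketch that counts distinct keys per indicator column directly, so it also works when a Geo_Key is repeated. It only illustrates the logic and is not SAS code; the function name count_keys_per_indicator is hypothetical, and the sample rows use just the three columns visible in the question.

```python
def count_keys_per_indicator(rows, key="Geo_Key"):
    """For each indicator column, count distinct keys whose value is 1."""
    keys_by_col = {}
    for row in rows:
        for col, flag in row.items():
            if col == key:
                continue
            bucket = keys_by_col.setdefault(col, set())
            if flag == 1:
                bucket.add(row[key])  # a set ignores repeated keys
    return {col: len(keys) for col, keys in keys_by_col.items()}


# The three visible rows and columns from the question.
sample = [
    {"Geo_Key": 123, "Var1": 1, "Var2": 0, "Var50": 1},
    {"Geo_Key": 524, "Var1": 0, "Var2": 1, "Var50": 1},
    {"Geo_Key": 323, "Var1": 1, "Var2": 1, "Var50": 1},
]
counts = count_keys_per_indicator(sample)
print(counts)
```

When every Geo_Key is unique, these counts equal the plain column sums.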