Remove duplicates in listagg?

Input data:
ID CID
1 101,103
1 101
In the target I am getting the following output, using the listagg function:
ID CID
1 101,103,101
But I want the output to look like this:
ID CID
1 101,103

Use DISTINCT first, then use listagg. You can refer to the SQL below.
SELECT
  subqry.ID as id,
  listagg(cid,',') within group(order by(id)) as cid
FROM
  (select distinct ID, CID from Table1) subqry -- this will deduplicate CIDs first
group by id
Order by 1
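The same "deduplicate first, then aggregate" idea can be sketched outside the database. The Python below is only an illustration, not the answerer's method: the function name `dedup_listagg` is hypothetical, and it assumes that a CID field may itself contain several comma-separated values, as the sample input in the question suggests.

```python
def dedup_listagg(rows, sep=","):
    """Aggregate CIDs per ID, keeping each CID once in first-seen order."""
    grouped = {}
    for row_id, cid_field in rows:
        seen = grouped.setdefault(row_id, {})
        # Assumption: one CID field can hold several separator-delimited values.
        for cid in str(cid_field).split(sep):
            cid = cid.strip()
            if cid:
                seen.setdefault(cid, None)  # dict keys keep insertion order
    return {row_id: sep.join(seen) for row_id, seen in grouped.items()}


rows = [(1, "101,103"), (1, "101")]  # sample rows from the question
print(dedup_listagg(rows))  # {1: '101,103'}
```

If the values really are stored as comma-separated strings, a plain DISTINCT on whole rows would not remove the repeated 101, which is why this sketch splits each field before deduplicating.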

Related

Why bigquery can't handle a query processing 4TB data?

I'm trying to run this query
SELECT
id AS id,
ARRAY_AGG(DISTINCT users_ids) AS users_ids,
MAX(date) AS date
FROM
users,
UNNEST(users_ids) AS users_ids
WHERE
users_ids != " 1111"
AND users_ids != " 2222"
GROUP BY
id;
The users table is a split table with an id column, a users_ids column (comma-separated values), and a date column.
I'm running the query on more than 4 TB of data and it fails with a resources error:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations.
Any idea why?
Sample data:
id  userids  date
1   2,3,4    1-10-20
2   4,5,6    1-10-20
1   7,8,4    2-10-20
The final result I'm trying to reach:
id  userids    date
1   2,3,4,7,8  2-10-20
2   4,5,6      1-10-20
Execution details: (screenshot not preserved)

It's constantly repartitioning. I would guess that you're trying to cram too much into the aggregation step. Just remove the aggregation part; I don't think you even need the cross join here.
Use a subquery instead of this cross join + aggregation combination.
Edit: I just realized that you want to aggregate the arrays, but with distinct values:
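WITH t AS (
  SELECT
    id,
    -- De-duplicate and filter inside each row's array, then concatenate the arrays per id.
    ARRAY_CONCAT_AGG(ARRAY(
      SELECT DISTINCT uids FROM UNNEST(users_ids) AS uids
      WHERE uids != " 1111" AND uids != " 2222"
    )) AS users_ids,
    -- A plain aggregate; a window function over an ungrouped column is not valid with GROUP BY.
    MAX(date) AS date
  FROM users
  GROUP BY id
)
SELECT
  id,
  -- Second step: remove duplicates that appear across the concatenated arrays.
  ARRAY(SELECT DISTINCT uid FROM UNNEST(users_ids) AS uid) AS users_ids,
  date
FROM t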
This is just a draft (I assume id is unique), but it should be something along those lines. Grouping by arrays is not possible.
ARRAY_CONCAT_AGG() has no DISTINCT option, so the de-duplication comes in a second step.
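The two-step logic (filter and de-duplicate within each row, then merge per id and keep the latest date) can be illustrated in plain Python. This is a sketch of the logic only, not of how BigQuery executes it. The function name `merge_user_ids` is hypothetical, and reading the sample dates as day-month-year is an assumption.

```python
from datetime import date


def merge_user_ids(rows, excluded=frozenset({" 1111", " 2222"})):
    """Merge user-id arrays per id, dropping excluded ids and duplicates."""
    merged = {}
    for row_id, user_ids, row_date in rows:
        ids, latest = merged.get(row_id, ({}, row_date))
        for uid in user_ids:
            if uid not in excluded:
                ids.setdefault(uid, None)  # keeps first-seen order, no duplicates
        merged[row_id] = (ids, max(latest, row_date))
    return {row_id: (list(ids), latest) for row_id, (ids, latest) in merged.items()}


# Sample rows from the question; the date interpretation is an assumption.
rows = [
    (1, ["2", "3", "4"], date(2020, 10, 1)),
    (2, ["4", "5", "6"], date(2020, 10, 1)),
    (1, ["7", "8", "4"], date(2020, 10, 2)),
]
for row_id, (ids, latest) in merge_user_ids(rows).items():
    print(row_id, ",".join(ids), latest.isoformat())
```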

SAS: Grouping by ID and summing the number of a condition in a variable for the ID

I have a dataset that contains the ID and a variable called CC. The CC holds multiple numbered values where each value represents something. It looks like this:
An ID can have the same CC in multiple rows. I just want to flag whether the CC exists or not, so even if Joe had five rows stating that he has CC equal to 3, I just want a 1 or 0 stating whether Joe ever had a CC equal to 3.
I want it to look like this:
I tried coding it as shown below, but the issue is that, although I know an ID can have more than one type of CC, the final dataset created by the code only shows one filled CC for each ID. I think maybe it's overwriting it?
I should also note that prior to this code I created the CC flag variables and filled them all with zeros.
proc sql;
DROP TABLE Flagged_CCs;
CREATE TABLE Flagged_CCs AS
select
ID,
COUNT(ID) as count_ID,
case when CC=1 then 1 end as CC_1,
case when CC=2 then 1 end as CC_2,
case when CC=3 then 1 end as CC_3
from Original_Dataset
group by ID;
quit;
Any help is appreciated, thank you.

Is your issue the fact that after running your new code you still get multiple lines per ID?
If so, I propose this:
proc sql;
  DROP TABLE Flagged_CCs;
  CREATE TABLE Flagged_CCs AS
  select ID
    ,case when CC_1 > 0 then 1 else 0 end as CC_1
    ,case when CC_2 > 0 then 1 else 0 end as CC_2
    ,case when CC_3 > 0 then 1 else 0 end as CC_3
  from (
    select
      ID,
      COUNT(ID) as count_ID,
      sum(case when CC=1 then 1 end) as CC_1,
      sum(case when CC=2 then 1 end) as CC_2,
      sum(case when CC=3 then 1 end) as CC_3
    from Original_Dataset
    group by ID
  );
quit;
The reason you are having the issue is that you are only aggregating the count of ID and not the other values. Using an aggregate on the CC columns as well collapses the duplicate records into one row per ID.
Hope this helps.
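For readers who want to see the flagging logic outside SAS, here is a minimal Python sketch of the same idea: one row per ID, with a 0/1 flag for each CC value that ever appears. The function name `flag_ccs` and the sample rows are hypothetical; only the CC_1 to CC_3 column names come from the question.

```python
def flag_ccs(rows, cc_values=(1, 2, 3)):
    """Return one dict of 0/1 flags per ID, one flag for each CC value."""
    flags = {}
    for person_id, cc in rows:
        # Start every ID with all flags at 0, as the question describes.
        row = flags.setdefault(person_id, {f"CC_{v}": 0 for v in cc_values})
        if cc in cc_values:
            row[f"CC_{cc}"] = 1  # repeated rows simply set the same flag again
    return flags


# Hypothetical sample data: Joe has CC 3 twice and CC 1 once.
rows = [("Joe", 3), ("Joe", 3), ("Joe", 1), ("Ann", 2)]
print(flag_ccs(rows))
```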

If you're looking for a report, here's one method, using PROC TABULATE.
proc format ;
value indicator_fmt
low - 0, . = 0
0 - high = 1;
run;
proc tabulate data=have;
class id cc;
table id , cc*N=''*f=indicator_fmt.;
run;
Your output will look like this then:
If you want a fully dynamic approach that builds a table without needing to know anything ahead of time, such as the number of CCs, a different approach is needed. It's a bit longer, but the dynamic part may make it worthwhile to implement.

How to identify if an observation is repeated every day in Stata

I have a dataset with a date variable, an id variable and a city variable. Sometimes the id variable is repeated on the same date and in the same city.
Data looks something like this:
Date ID City
2/1/2015 1 1
2/1/2015 1 1
2/1/2015 1 2
2/2/2015 1 1
2/1/2015 2 1
2/2/2015 2 1
I would like to know how many days each ID is present, identify the IDs that are present every day, and later on, those that are present every day in every city.
In the example above both IDs 1 and 2 are present each day, but only ID 1 is present in each city each day.
Thanks!

I think I just did what I wanted to do.
All I had to do was:
by ID city date, sort: gen nvals = _n == 1
by ID city: replace nvals = sum(nvals)
by ID city : replace nvals = nvals[_N]
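The Stata lines above tag the first observation of each ID, city and date combination and then total those tags within each ID and city, which gives the number of distinct days. A rough Python sketch of the same counting follows. It is an illustration under assumptions: the function name `days_present` is hypothetical, and "every day" is taken to mean every date that appears anywhere in the data.

```python
from collections import defaultdict


def days_present(rows):
    """Count distinct days per ID and per (ID, city); find IDs seen on every day."""
    days_by_id = defaultdict(set)
    days_by_id_city = defaultdict(set)
    all_days = set()
    for day, person_id, city in rows:
        all_days.add(day)
        days_by_id[person_id].add(day)  # sets ignore repeated observations
        days_by_id_city[(person_id, city)].add(day)
    every_day = {pid for pid, days in days_by_id.items() if days == all_days}
    return (
        {pid: len(days) for pid, days in days_by_id.items()},
        {key: len(days) for key, days in days_by_id_city.items()},
        every_day,
    )


# Sample rows from the question: (Date, ID, City)
rows = [
    ("2/1/2015", 1, 1), ("2/1/2015", 1, 1), ("2/1/2015", 1, 2),
    ("2/2/2015", 1, 1), ("2/1/2015", 2, 1), ("2/2/2015", 2, 1),
]
per_id, per_id_city, every_day = days_present(rows)
print(per_id, per_id_city, every_day)
```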

How to Update a single column through informatica?

I have a target table with the following attributes:
PARTY_ID PK
START_DATE PK
STATUS_CD PK
END_DATE
I have a dynamic lookup which returns 1 (insert), 2 (update) or 0 (duplicate) for each row from the source table.
What I want is, when I get 2 (update), to add an END_DATE to the updated row without changing anything else.
For example, I have the following row in my target table:
1 12/01/2014 2 NULL
and I get this row from my source table:
1 14/01/2014 6 NULL
What I want is to add ONLY the end date to the target table, without changing anything else, like this:
1 12/01/2014 2 14/01/2014
I know how to update the whole row, but I don't know how to update only one column.
Schema:
CREATE SET TABLE IND_MAR_STATUS ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
INDIVIDUAL_PARTY_ID DECIMAL(18,0) NOT NULL,
INDIV_MARITAL_STAT_START_DTTM DATE FORMAT 'YYYY-MM-DD' NOT NULL,
MARITAL_STATUS_CD VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
INDIV_MARITAL_STAT_END_DTTM DATE FORMAT 'YYYY-MM-DD',
ETL_SOURCE_ID DECIMAL(18,0) NOT NULL,
ETL_EXTRACT_SPEC_ID DECIMAL(18,0),
ETL_JOB_RUN_ID DECIMAL(18,0))
PRIMARY INDEX ( INDIVIDUAL_PARTY_ID );

Simply disconnect the target ports you don't want to update (i.e. only PARTY_ID and END_DATE should be connected).

mysql connect all fields in two columns

I have a view with two columns: a person's ID (a number) and the sector that they belong to (given as numbers 1-5).
I want to create a view to show whether people belong to the same sector. I think this would have three columns: ID1, ID2, and SameSector. The first column would list IDs, and for each ID in column 1 the second column would list ALL of the IDs. The third column would be an if statement, 1 if the sector was the same for both IDs, 0 if it wasn't. This is made slightly more complicated because a person can belong to more than one sector.
For example:
I have:
ID Sector
1 1
2 1
2 5
3 1
I want:
ID1 ID2 SameSector
1 1 1
1 2 1
1 2 0
1 3 0
2 1 1
2 1 0
etc.
I'm guessing this involves some sort of self join and an if statement, but I can't figure out how to get all of the ID fields listed in the ID1 column and matched to all of the ID fields in ID2. Any ideas?

This should be what you want:
SELECT a.ID AS ID1, b.ID AS ID2, IF(a.Sector=b.Sector,1,0) AS SameSector
FROM theTable AS a, theTable AS b
http://sqlfiddle.com/#!2/f2cbc/4
I initially had a much more complicated query, but then realized you wanted a complete cross-join, including the same ID comparing to itself.
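To make the shape of that cross join concrete, here is a small Python sketch that pairs every row with every row and compares sectors. It only illustrates what the query returns: the function name `same_sector_pairs` is hypothetical, and the row order shown by Python is not something SQL guarantees without an ORDER BY.

```python
from itertools import product


def same_sector_pairs(table):
    """Pair every (ID, Sector) row with every row, flagging equal sectors."""
    return [
        (a_id, b_id, int(a_sector == b_sector))
        for (a_id, a_sector), (b_id, b_sector) in product(table, table)
    ]


# Sample rows from the question: (ID, Sector)
table = [(1, 1), (2, 1), (2, 5), (3, 1)]
for id1, id2, same_sector in same_sector_pairs(table):
    print(id1, id2, same_sector)
```

Because a person can belong to more than one sector, the same ID pair can appear several times with different flags, which matches the duplicated pairs in the desired output.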
