COUNTING VALUE PER PARTCIPANTS

COUNTING VALUE PER PARTCIPANTS - sas

I would like to add a new column to a dataset but I am not sure how to do so. My dataset has a variable called KEYVAR (character variable) with three different values. A participant can appear multiple times in my dataset, with each row containing a similar or different value for KEYVAR. What I want to do is create a new variable call NEWVAR that counts how many times a participant has a specific value for KEYVAR; when a participant does not have an observation for that specific value, I want NEWVAR to have a result of zero.
Here's an example of the dataset I would like (in this example, I want to count every instance of "Y" per participants as newvar):
have
PARTICIPANT KEYVAR
A Y
A N
B Y
B Y
B Y
C W
C N
C W
D Y
D N
D N
D Y
D W
want
PARTICIPANT KEYVAR NEWVAR
A Y 1
A N 1
B Y 3
B Y 3
B Y 3
C W 0
C N 0
C W 0
D Y 2
D N 2
D N 2
D Y 2
D W 2

You can use Proc SQL to compute an aggregate result over a group meeting a criteria, and have that aggregate value automatically merged into the result set.
-OR-
Use a MEANS, TRANSPOSE, MERGE approach
Sample Code (SQL)
data have;
input ID $ value $; datalines;
A Y
A N
B Y
B Y
B Y
C W
C N
C W
D Y
D N
D N
D Y
D W
E X
;
proc sql;
create table want as
select ID, value
, sum(value='Y') as Y_COUNT /* relies on logic eval 'math' 0 false, 1 true */
, sum(value='N') as N_COUNT
, sum(value='W') as W_COUNT
from have
group by ID
;
Sample Code (PROC and MERGE)
* format for PRELOADFMT and COMPLETETYPES;
proc format;
value $eachvalue
'Y' = 'Y'
'N' = 'N'
'W' = 'W'
other = '-';
;
run;
* Count how many per combination ID/VALUE;
proc means noprint data=have nway completetypes;
class ID ;
class value / preloadfmt;
format value $eachvalue.;
output out=freqs(keep=id value _freq_);
run;
* TRANSPOSE reshapes to wide (across) data layout, one row per ID;
proc transpose data=freqs suffix=_count out=counts_across(drop=_name_);
by id;
id value;
var _freq_;
where put(value,$eachvalue.) ne '-';
run;
* MERGE;
data want_way_2;
merge have counts_across;
by id;
run;

Related

Deleting first instance of a column after group by in sas proc sql

I have the following SAS dataset.
correlation
policynum
risknum
A
X
Y
A
X
Y
A
X
Y
B
X
Y
B
X
Y
B
X
Y
B
X
L
B
X
L
B
X
L
C
Z
M
C
Z
M
C
Z
M
D
Z
M
D
Z
M
D
Z
M
In SAS, I want to filter the above dataset so I get my final output as:
correlation
policynum
risknum
B
X
Y
B
X
Y
B
X
Y
B
X
L
B
X
L
B
X
L
D
Z
M
D
Z
M
D
Z
M
i.e. for each group of policynum and risknum, if multiple values exist for correlation, I want to keep the second value and get rid of the first value.
If only a single value of correlation exists for a group of policynum and risknum, I want to retain that group in my final output too.
What would be the best way to do this? It might be something simple as I am relatively new to SAS.
Thanks in advance!

If the order of the correlation values, in sort order, is the same ordering as they appear row-wise in the data set you can use SQL. Otherwise, SQL, being based on set theory, which does not have implicit row numbers, can not be used. A DATA step with DOW loop can be used.
Example:
FYI, one common situation in which SAS coders use the phrase 'DOW loop' is when SET & BY statements occur inside a DO loop.
data have;
input correlation $ policynum $ risknum $;
datalines;
A X Y
A X Y
A X Y
B X Y
B X Y
B X Y
B X L
B X L
B X L
C Z M
C Z M
C Z M
D Z M
D Z M
D Z M
;
/* keep last group of a nested group */
* SQL can be used only if correlation wanted is ALWAYS highest valued correlation;
proc sql;
create table want as
select * from have
group by policynum, risknum
having correlation = max(correlation)
;
* DATA Step DOW loops can be used when correlation wanted is last occurring correlation within by group;
data want;
do _n_ = 1 by 1 until (last.policynum);
set have;
by policynum risknum notsorted; /* presume at least contiguous */
end;
_want_correlation = correlation;
do _n_ = 1 to _n_;
set have;
if _want_correlation = correlation then OUTPUT;
end;
run;

SAS logic to populate 1 row based on another row

I have a SAS dataset like this:
Name MgrName Dept.
A B
B C
C D
X Y
I need to fill in the Dept. using a recursive logic. I know D is the head of 'Payroll', so I fill in:
Name MgrName Dept.
A B
B C
C D Payroll
X Y
But using some kind of recursion,everybody that is in D's reporting chain (A, B,C) also needs to be assigned 'Payroll'. How can I do that in SAS?

Here is a hash based approach in the context of a Proc DS2 program.
Each node (name) has only one parent (mgrname) so a hash can key with name and have mgrname as data. A loop over find method will seek the ancestral parent for a node or not find one.
Example (name is id and namemgr is pid):
data have;
input id $ pid $;datalines;
A B
B C
C D
F D
G F
H F
P H
Q H
R H
X Y
run;
proc ds2;
data want / overwrite=yes;
declare package hash links();
declare char _seek_pid;
declare char _seek_id;
declare char dept;
keep id pid dept;
* populate hash;
method init();
links.definekey('id');
links.definedata('pid');
links.dataset('{select pid, id from have {options locktable=share}}');
links.multidata('yes');
links.definedone();
end;
* seek ancestor from which value should be applied;
method apply(char rootid, char value, char id);
declare int limit;
limit = 0;
_seek_id = id;
do while (
links.find([_seek_id], [_seek_pid]) = 0 and
limit < 100 and
rootid ne _seek_pid
);
limit+1;
_seek_id = _seek_pid;
end;
if rootid = _seek_pid then dept = value;
end;
* apply some values to some nodes and children thereof;
method run();
set have (locktable=share);
apply ('D','payroll', id);
apply ('F','shadow$', id);
end;
enddata;
run;
quit;
%let syslast = want;

There is probably a smarter way, but here is a hash object approach
data have;
input Name $ MgrName $;
datalines;
A B
B C
C D
X Y
;
data want(drop=rc);
declare hash h1(dataset:'have');
h1.definekey('MgrName');
h1.definedata('Name');
h1.definedone();
declare hash h2();
h2.definekey('Name');
h2.definedone();
length Name $ 100 MgrName $ 100;
do rc=h1.find(key:'D') by 0 while (rc=0);
h2.replace();
rc=h1.find(key:Name);
end;
do until (lr);
set have end=lr;
Dept=ifc(h2.check()=0, 'Payroll', '');
output;
end;
run;

SAS for following scenario (most frequent observation)

Assume I have a data-set D1 as follows:
ID ATR1 ATR2 ATR3
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
I want to create a data-set D2 from this as follows
ID ATR1 ATR2 ATR3
1 A R W
2 C T X
3 D U I
In other words, Data-set D2 consists of unique IDs from D1. For each ID in D2, the values of ATR1-ATR3 are selected as the most frequent (of the respective variable) among the records in D1 with the same ID. For example ID = 1 in D2 has ATR1 = A (most frequent).
I have one solution which is very clumsy. I simply sort copies of the data set `D1' three times (by ID and ATR1 e.g) and remove duplicates. I later merge the three data-sets to get what I want. However, I think there might be an elegant way to do this. I have about 20 such variables in the original data-set.
Thanks

/*
read and restructure so we end up with:
id attr_id value
1 1 A
1 2 R
1 3 W
etc.
*/
data a(keep=id attr_id value);
length value $1;
array attrs_{*} $ 1 attr_1 - attr_3;
infile cards;
input id attr_1 - attr_3;
do attr_id=1 to dim(attrs_);
value = attrs_{attr_id};
output;
end;
cards;
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
;
run;
/* calculate frequencies of values per id and attr_id */
proc freq data=a noprint;
tables id*attr_id*value / out=freqs(keep=id attr_id value count);
run;
/* sort so the most frequent value per id and attr_id ends up at the bottom of the group.
if there are ties then it's a matter of luck which value we get */
proc sort data = freqs;
by id attr_id count;
run;
/* read and recreate the original structure. */
data b(keep=id attr_1 - attr_3);
retain attr_1 - attr_3;
array attrs_{*} $ 1 attr_1 - attr_3;
set freqs;
by id attr_id;
if first.id then do;
do i=1 to dim(attrs_);
attrs_{i} = ' ';
end;
end;
if last.attr_id then do;
attrs_{attr_id} = value;
end;
if last.id then do;
output;
end;
run;

Comparison of two data sets in SAS

I have the following data set:
data data_one;
length X 3
Y $ 20;
input x y ;
datalines;
1 test
2 test
3 test1
4 test1
5 test
6 test
7 test1
run;
data data_two;
length Z 3
A $ 20;
input Z A;
datalines;
1 test
2 test1
3 test2
run;
What I would like to have is a data set which tells me how often column Y in data_one contains the same string of column A in data_two. The result should look like this one:
Obs test test1 test2
1 4 3 0
Thanks in advance!

First we need the counts for those values of Y present in data_one.
Then we create a sorted (for the next merge) list of the values present in data_two.
The data_one Y counts from 1. are merged with the list from 2.
The Y values present in data_two but not in data_one (b and not a) are assigned count=0, the Y values not present in data_two are discarded (if b).
The last passage transposes the vertical list of counts in an horizontal set of variables.
proc freq data=data_one noprint;
table y / out=count_one (keep=y count);
run;
proc sort data=data_two out=list_two (keep=a rename=(a=y)) nodupkey;
by a;
run;
data count_all;
merge count_one (in=a) list_two (in=b);
by y;
if (b and not a) then count=0;
if b;
run;
proc transpose data=count_all out=final (drop=_name_ _label_);
id y;
run;
The first 3 steps can be replaced with one proc SQL:
proc sql;
create table count_all as
select distinct
coalesce(t1.y,t2.a) as y,
case
when missing(t1.y) then 0
else count(t1.y)
end as N
from data_one as t1
right join data_two as t2
on t1.y=t2.a
group by 1
order by 1;
quit;
proc transpose data=count_all out=final (drop=_name_);
id y;
run;

SQL Left Join logic in SAS Merge or Data step

I have below two datasets and need the third dataset as an output.
ONE TWO
---------- ----------
ID FLAG NUMB
1 N 2
2 Y 3
3 Y 9
4 N 2
5 N 3
9 Y 9
10 Y
OUTPUT
-------
ID FLAG NEW
1 N N
2 Y Y
3 Y Y
4 N N
5 N N
9 Y Y
10 Y N
If ONE.ID is found in TWO.NUMB and it's ONE.FLAG = Y then the new variable NEW = Y
else NEW = N
I was able to do this using PROC SQL as below.
proc sql;
create table output as
(
select distinct id, flag, case when numb is null then 'N' else 'Y' end as NEW
from one
left join
two
on id = numb
and flag = 'Y'
);
quit;
Could this be done in DATA step/MERGE?

since you have a sql step attempt here's an improvement on that
--this sql step does not require a merge--
proc sql noprint;
create table output as
select distinct *, case
when id in (select distinct numb from two) then "Y"
else "N"
end as new
from one
;
quit;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

COUNTING VALUE PER PARTCIPANTS - sas

Related

Deleting first instance of a column after group by in sas proc sql

SAS logic to populate 1 row based on another row

SAS for following scenario (most frequent observation)

Comparison of two data sets in SAS

SQL Left Join logic in SAS Merge or Data step

Categories

Resources