I have a dataset of diagnosis records in which a patient can have one or more records, even for the same code. I am unable to use the variable 'code' as the ID variable in PROC TRANSPOSE because it fails with an error like: The ID value "code_v58" occurs twice in the same BY group.
data have;
input id rand found code $;
datalines;
1 101 1 001
2 102 1 v58
2 103 0 v58 /* second diagnosis record for patient 2 */
3 104 1 v58
4 105 1 003
4 106 1 003 /* second diagnosis record for patient 4 */
5 107 0 v58
;
Desired output:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
When I tried the LET option like this,
proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
I got output as below:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 0 .
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
I also modified PROC TRANSPOSE to use both id and count in the BY statement
proc transpose data=temp out=want(drop=_name_) prefix=code_;
by id count;
id code; * column name becomes <prefix><code>;
var found;
run;
and got output like below:
Obs id count code_001 code_v58 code_003
1 1 1 1 . .
2 2 1 . 1 .
3 2 2 . 0 .
4 3 1 . 1 .
5 4 1 . . 1
6 4 2 . . 1
7 5 1 . 0 .
May I know how to remove the duplicate patient ids and set the code to 1 if it is found in any of the records?
You can transpose a group aggregate view.
proc sql;
create view have_v as
select id, code, max(found) as found
from have
group by id, code
order by id, code
;
proc transpose data=have_v out=want prefix=code_;
by id;
id code;
var found;
run;
Follow up with PROC STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0:
proc stdize data=want out=want missing=0 reponly;
var code_:;
run;
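If you would rather keep the LET option from the question, one possible sketch is to sort first so that the largest FOUND comes last within each id and code, since LET keeps the last occurrence of a duplicated ID value in the BY group. The dataset names have_sorted and want_let are made up for illustration.

```sas
* Sort so the highest FOUND is the last record within each id/code pair;
proc sort data=have out=have_sorted;
by id code found;
run;
* LET keeps the last occurrence of each duplicated ID value per BY group;
proc transpose data=have_sorted out=want_let(drop=_name_) prefix=code_ let;
by id;
id code;
var found;
run;
```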
It seems you want to preprocess the data first to get the value you want for FOUND, and then transpose (if you actually need to). PROC TABULATE computes what you appear to want for FOUND: the maximum within each id and code, which is 1 if any record has a 1, 0 if only 0s are present, and missing otherwise. Then run TRANSPOSE on that output the same way you were doing before.
proc tabulate data=have out=tab;
class id code;
var found;
tables id,code*found*max;
run;
proc transpose data=tab out=want prefix=code_;
by id;
id code;
var found_max;
run;
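PROC SUMMARY can do the same pre-aggregation, as a rough alternative sketch. The names have_max and want_summary are illustrative and not part of the answers above.

```sas
* One row per id/code with the maximum FOUND (NWAY keeps only the id*code level);
proc summary data=have nway;
class id code;
var found;
output out=have_max(drop=_type_ _freq_) max=found;
run;
proc transpose data=have_max out=want_summary(drop=_name_) prefix=code_;
by id;
id code;
var found;
run;
```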
I have a dataset with a varying number of observations per ID, and the participants have different treatment statuses (Group). Can I use proc means to quickly calculate the number of participants and the number of clinic visits per group status? Ideally, I would use the proc means sum function to capture those with 0 and 1 based on group status and get the totals. However, I am stuck on how to proceed.
ID Visit Group
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
Specifically, I am interested in 1) the total number of participants in each group status. In this case we have 3 participants (ID: 1, 3, and 5) in the control group (0) and another 3 participants (ID: 2, 4, and 6) in the treatment group (1).
2) the total number of visits per group status. In this case, the total visits in the control group (0) will be 5 (2+1+2=5) and the total visits in the treatment group (1) will be 9 (3+2+4=9).
Can the proc means procedure quickly calculate such values? Thanks.
Yes, you can use proc means to get counts.
data have;
input ID$ Visit Group;
cards;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
;
run;
proc means data=have n;
class group id;
var visit;
types group id group*id;
run;
If you want the sum of visit as well, add SUM after N in the PROC MEANS statement, that is, between "proc means data=have n" and the closing semicolon.
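For reference, the step with both statistics requested would look roughly like this, using the same HAVE dataset as above:

```sas
* N counts the visit records; SUM adds up the values of VISIT;
proc means data=have n sum;
class group id;
var visit;
types group id group*id;
run;
```

Note that N is the number of visit records in each cell, while SUM totals the visit numbers themselves, so N is the statistic that answers the "total visits" question.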
It looks like GROUP is assigned at the ID level and not the ID/VISIT level. In that case if you want to count the number of ID's in each group you need to first get down to one observation per ID.
proc sort data=have nodupkey out=unique_ids ;
by id;
run;
Now you can count how many ID's are in each group. The normal way is to use PROC FREQ.
proc freq data=unique_ids;
tables group;
run;
But you can count with PROC MEANS/SUMMARY also.
proc summary data=unique_ids nway;
class group;
output out=counts N=N_ids ;
run;
proc print data=counts;
var group n_ids;
run;
MEANS doesn't do a distinct count easily, so SQL may be the simpler option to understand here.
proc sql;
create table want as
select group, count(*) as num_visits, count(distinct ID) as num_participants
from have
group by group
order by 1;
quit;
I have data with three variables: the first is id, the second is the observation count for that id, and the third is the value of that observation. I want to transpose the data from long to wide. The issue is that I am getting an error saying my BY group is not sorted in ascending order (even though it is). Another issue is that not all ids have the same number of observations. Please see the example below and the data structure I am looking for.
data have;
input id observation value;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 1 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9';run;
data want;
input id observation1 observation2 observation3;
cards;
1 '4.8.9' '4.5.7' NA
2 '5.0.5' NA NA
3 '4.2.0' '4.1.0' '5.1.9'
;run;
/* i have tried the following:
proc transpose data=b out=c ;
by value ;
id id;
var value;
run;
proc transpose data=b out=c ;
by value ;
id id;
var observation;
run;
*/
Your BY variable is called ID in your example dataset.
Your example data step is not defining VALUE as character. Also don't indent the in-line data lines.
You can use the prefix= option to help name the new variables. Also let's modify the value of OBSERVATION for ID=2 to demonstrate more clearly how the value of OBSERVATION is setting the variable name instead of just the order of the observations in the ID group. Now the value '5.0.5' will be stored in OBSERVATION2 even though it is the first observation for that value of ID.
data have;
input id observation value $;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 2 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9'
;
proc transpose data=have out=want(drop=_name_) prefix=observation;
by id;
id observation;
var value;
run;
Results:
Obs  id  observation1  observation2  observation3
1    1   '4.8.9'       '4.5.7'
2    2                 '5.0.5'
3    3   '4.2.0'       '4.1.0'       '5.1.9'
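If the OBSERVATION variable were not available, a sketch that relies only on row order within each ID would omit the ID statement; PROC TRANSPOSE then names the new columns from the prefix plus a running number. The output name want_by_order is illustrative.

```sas
* No ID statement: columns are numbered by position within each BY group;
proc transpose data=have out=want_by_order(drop=_name_) prefix=observation;
by id;
var value;
run;
```

With this variant the single value for ID=2 would land in OBSERVATION1, because it is the first row of that BY group regardless of what OBSERVATION contains.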
I have a dataset like this (but with several hundred vars):
id q1 g7 q3 b2 zz gl az tre
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
I'd like to keep id, b2, and tre, but set everything else to missing. In a dataset this small, I can easily use call missing (q1, g7, q3, zz, gl, az) - but in a set with many more variables, I would effectively like to say call missing (of _ALL_ *except ID, b2, tre*).
Obviously, SAS can't read my mind. I've considered workarounds that involve another data step or proc sql where I copy the original variables to a new ds and merge them back on post, but I'm trying to find a more elegant solution.
This technique uses an unexecuted SET statement (it only has an effect at compile time) to define all the variables from the original data set. That keeps the variable order and all variable attributes: type, label, format, and so on. Because that SET never executes, those variables are never filled and stay missing. The next SET statement, which does execute, brings in only the variables that are NOT to be set to missing. The step doesn't explicitly set variables to missing, but it achieves the same result.
data nomiss;
input id q1 g7 q3 b2 zz gl az tre;
cards;
1 1 2 1 1 1 2 1 1
2 2 3 3 2 2 2 1 1
3 1 2 3 3 2 1 3 3
4 3 1 2 2 3 2 1 1
5 2 1 2 2 1 2 3 3
6 3 1 1 2 2 1 3 3
;;;;
run;
proc print;
run;
data manymiss;
if 0 then set nomiss;
set nomiss(keep=id b2 tre:);
run;
proc print;
run;
Another fairly simple option is to set them missing using a macro, and basic code writing techniques.
For example, let's say we have a macro:
%macro call_missing(var=);
call missing(&var.);
%mend call_missing;
Now we can write a query that uses dictionary.columns to identify the variables we want set to missing:
proc sql;
select name
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (upcase(name) in ('ID','B2','TRE')); *LIBNAME and MEMNAME are stored in uppercase, NAME keeps its original case, so upcase it;
quit;
Now, we can combine these two things to get a macro variable containing code we want, and use that:
proc sql;
select cats('%call_missing(var=',name ,')')
into :misslist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
and not (upcase(name) in ('ID','B2','TRE')); *LIBNAME and MEMNAME are stored in uppercase, NAME keeps its original case, so upcase it;
quit;
data want;
set have;
&misslist.;
run;
This has the advantage that it doesn't care about the variable types, nor the order. It has the disadvantage that it's somewhat more code, but it shouldn't be particularly long.
If the variables are all of the same type (numeric or character) then you could use an array.
data want ;
set have;
array _all_ _numeric_ ;
do over _all_;
if upcase(vname(_all_)) not in ('ID','B2','TRE') then _all_=.;
end;
run;
If you don't care about the order then just drop the variables and add them back on with 0 observations.
data want;
set have (keep=ID B2 TRE:) have (obs=0 drop=ID B2 TRE:);
run;
I want to count the number of times a certain value appears in a particular column in SAS. For example, in the following dataset the value 1 appears 3 times,
value 2 appears twice, value 3 appears once, value 4 appears 4 times and value 5 appears four times.
Game_ball
1
1
1
2
2
3
4
4
4
5
5
5
5
5
I want the dataset to be represented like the following:
Game_ball Count
1 3
2 2
3 1
4 4
5 4
. .
. .
. .
Thanks in advance
As per @Dwal, proc freq is the easiest solution.
Using your sample data,
proc freq data=sample;
table game_ball/out=output;
run;
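The OUT= dataset from PROC FREQ also carries a PERCENT column next to COUNT. If only the two columns shown in the question are wanted, a small variation could drop it and suppress the printed table:

```sas
* NOPRINT skips the listing; DROP= removes the PERCENT column from the output;
proc freq data=sample noprint;
table game_ball / out=output(drop=percent);
run;
```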
Or do it in a single data step after sorting by game_ball:
proc sort data=sample; by game_ball; run;
data output;
set sample;
by game_ball;
if first.game_ball then count = 0;
count + 1; * the sum statement retains COUNT automatically;
if last.game_ball then output;
run;
Or in SQL
proc sql;
create table output as
select game_ball, count(*) as count
from sample
group by game_ball;
quit;
Using PROC REPORT in SAS, if a certain ACROSS variable has 5 different value possibilities (for example, 1 2 3 4 5), but in my data set there are no observations where that variable is equal to, say, 5, how can I get the report to show the column for 5 and display 0 for the # of observations having that value?
Currently my PROC REPORT output is just not displaying those value columns that have no observations.
When push comes to shove, you can use a hack like this. Notice that there are no missing values in the SEX variable of SASHELP.CLASS:
proc format;
value $sex 'F' = 'female' 'M' = 'male' 'X' = 'other';
run;
options missing=0;
proc report data=sashelp.class nowd ;
column age sex;
define age/ group;
define sex/ across format=$sex. preloadfmt;
run;
options missing=.;
/*
            Sex
Age  female  male  other
11   1       1     0
12   2       3     0
13   2       1     0
14   2       2     0
15   2       2     0
16   0       1     0
*/