SAS PROC TRANSPOSE by group with character variables

I have a dataset with three variables: an id, the observation number within that id, and the value of that observation. I want to transpose the data from long to wide. The issue is that I am getting an error saying my BY group is not sorted in ascending order (even though it is). Another issue is that not all ids have the same number of observations. Please see the example below and the data structure I am looking for.
data have;
input id observation value;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 1 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9';run;
data want;
input id observation1 observation2 observation3;
cards;
1 '4.8.9' '4.5.7' NA
2 '5.0.5' NA NA
3 '4.2.0' '4.1.0' '5.1.9'
;run;
/* i have tried the following:
proc transpose data=b out=c ;
by value ;
id id;
var value;
run;
proc transpose data=b out=c ;
by value ;
id id;
var observation;
run;
*/

Your BY variable is called ID in your example dataset, so the BY statement should be BY ID, not BY VALUE.
Your example data step is not defining VALUE as character. Also don't indent the in-line data lines.
You can use the prefix= option to help name the new variables. Also let's modify the value of OBSERVATION for ID=2 to demonstrate more clearly how the value of OBSERVATION is setting the variable name instead of just the order of the observations in the ID group. Now the value '5.0.5' will be stored in OBSERVATION2 even though it is the first observation for that value of ID.
data have;
input id observation value $;
cards;
1 1 '4.8.9'
1 2 '4.5.7'
2 2 '5.0.5'
3 1 '4.2.0'
3 2 '4.1.0'
3 3 '5.1.9'
;
proc transpose data=have out=want(drop=_name_) prefix=observation;
by id;
id observation;
var value;
run;
Results:
Obs    id    observation1    observation2    observation3
 1      1    '4.8.9'         '4.5.7'
 2      2                    '5.0.5'
 3      3    '4.2.0'         '4.1.0'         '5.1.9'
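On the sorting error mentioned in the question: BY-group processing expects the input to be ordered by the BY variable. If the real data is not already ordered by ID, sorting it first is one way to be safe. A minimal sketch, where HAVE_SORTED is a dataset name made up for illustration:

```sas
/* Sort so that each ID forms one contiguous, ascending BY group. */
/* HAVE_SORTED is a hypothetical output name, not from the original post. */
proc sort data=have out=have_sorted;
  by id observation;
run;

/* Same transpose as above, now reading the sorted copy. */
proc transpose data=have_sorted out=want(drop=_name_) prefix=observation;
  by id;
  id observation;
  var value;
run;
```

Sorting into a separate dataset leaves HAVE untouched, which helps if other steps rely on its original order.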


Apply PROC MEANS to quickly calculate multiple observations?

I have a dataset with a varying number of observations per ID, and the participants are in different treatment groups (Group). Can I use PROC MEANS to quickly calculate the number of participants and the number of clinic visits per group? Ideally, I would use the PROC MEANS sum function to capture those with 0 and 1 based on group status and get the totals, but I am stuck on how to proceed.
ID Visit Group
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
Specifically, I am interested in:
1) the total number of participants in each group. In this case we have 3 participants (IDs 1, 3, and 5) in the control group (0) and another 3 participants (IDs 2, 4, and 6) in the treatment group (1).
2) the total number of visits per group. In this case, the total visits in the control group (0) will be 5 (2+1+2=5) and the total visits in the treatment group (1) will be 9 (3+2+4=9).
Can PROC MEANS help quickly calculate such values? Thanks.
Yes, you can use proc means to get counts.
data have;
input ID$ Visit Group;
cards;
1 1 0
1 2 0
2 1 1
2 2 1
2 3 1
3 1 0
4 1 1
4 2 1
5 1 0
5 2 0
6 1 1
6 2 1
6 3 1
6 4 1
;
run;
proc means data=have n;
class group id;
var visit;
types group id group*id;
run;
If you also want the sum of VISIT, add the SUM keyword after N in the PROC MEANS statement, i.e. proc means data=have n sum;
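If only the visit totals per group are needed, a smaller variant is to count non-missing VISIT values with GROUP as the single class variable. This is a sketch under the assumption that every row in HAVE is one visit; VISIT_COUNTS and NUM_VISITS are names made up for illustration:

```sas
/* Count visits per group; NWAY keeps only the GROUP-level rows in the output. */
/* VISIT_COUNTS and NUM_VISITS are hypothetical names. */
proc means data=have n nway;
  class group;
  var visit;
  output out=visit_counts n=num_visits;
run;

proc print data=visit_counts;
  var group num_visits;
run;
```

This does not give the number of distinct participants, because each ID contributes one row per visit.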
It looks like GROUP is assigned at the ID level and not the ID/VISIT level. In that case if you want to count the number of ID's in each group you need to first get down to one observation per ID.
proc sort data=have nodupkey out=unique_ids ;
by id;
run;
Now you can count how many ID's are in each group. The normal way is to use PROC FREQ.
proc freq data=unique_ids;
tables group;
run;
But you can count with PROC MEANS/SUMMARY also.
proc summary data=unique_ids nway;
class group;
output out=counts N=N_ids ;
run;
proc print data=counts;
var group n_ids;
run;
MEANS doesn't do a distinct count easily, so SQL may be the simpler option to understand here.
proc sql;
create table want as
select group, count(*) as num_visits, count(distinct ID) as num_participants
from have
group by group
order by 1;
quit;

Transposing table while collapsing duplicate observations per BY group

I have a dataset with diagnosis records, where a patient can have one or more records, even for the same code. I am unable to transpose by the variable 'code', since I get an error similar to: The ID value "code_v58" occurs twice in the same BY group.
data have;
input id rand found code $;
datalines;
1 101 1 001
2 102 1 v58
2 103 0 v58 /* second diagnosis record for patient 2 */
3 104 1 v58
4 105 1 003
4 106 1 003 /* second diagnosis record for patient 4 */
5 107 0 v58
;
Desired output:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
When I tried the LET option like this,
proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
I got output as below:
Obs id code_001 code_v58 code_003
1 1 1 . .
2 2 . 0 .
3 3 . 1 .
4 4 . . 1
5 5 . 0 .
I also tried modifying PROC TRANSPOSE to use ID and count in the BY statement
proc transpose data=temp out=want(drop=_name_) prefix=code_;
by id count;
id code; * column name becomes <prefix><code>;
var found;
run;
and got output like below:
Obs id count code_001 code_v58 code_003
1 1 1 1 . .
2 2 1 . 1 .
3 2 2 . 0 .
4 3 1 . 1 .
5 4 1 . . 1
6 4 2 . . 1
7 5 1 . 0 .
May I know how to remove duplicate patient ids and update the code to 1 if found in any records?
You can transpose a group aggregate view.
proc sql;
create view have_v as
select id, code, max(found) as found
from have
group by id, code
order by id, code
;
proc transpose data=have_v out=want prefix=code_;
by id;
id code;
var found;
run;
Follow up with PROC STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0
proc stdize data=want out=want missing=0 reponly;
var code_:;
run;
It seems to me that you want something like this: first preprocess the data to get the value you want for FOUND, then transpose (if you actually need to). PROC TABULATE computes what you seem to want for FOUND (the maximum value: 1 if present, 0 if only 0s are present, missing otherwise), and PROC TRANSPOSE then works the same way as before.
proc tabulate data=have out=tab;
class id code;
var found;
tables id,code*found*max;
run;
proc transpose data=tab out=want prefix=code_;
by id;
id code;
var found_max;
run;
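The same "aggregate first, then transpose" idea can also be sketched with PROC SUMMARY. This is an alternative, not the approach either answer used, and HAVE_MAX is a dataset name made up for illustration:

```sas
/* Collapse to one row per ID and CODE, keeping the maximum of FOUND. */
/* NWAY keeps only the ID*CODE combinations; output is ordered by the class variables. */
proc summary data=have nway;
  class id code;
  var found;
  output out=have_max(drop=_type_ _freq_) max=found;
run;

/* Each CODE now occurs at most once per ID, so no LET option is needed. */
proc transpose data=have_max out=want(drop=_name_) prefix=code_;
  by id;
  id code;
  var found;
run;
```

Because the collapsed dataset has at most one row per ID and CODE, the "occurs twice in the same BY group" error should not arise.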

How can I show all orders of the mode using PROC UNIVARIATE?

I am trying to show all orders of the mode.
For example, I import an Excel file like:
A
1
1
2
3
3
3
and the code is:
ods select Modes;
proc univariate data=Want modes;
var A;
run;
The result shows:
Mode Count
3 3
I want it to show:
Mode Count
3 3
1 2
2 1
How can I do that?
Your desired output is actually not modes. The MODES option returns the most frequent value, or values if several share the same highest frequency, with the corresponding count. In your example, there is only one mode (3), as it is the value with the highest frequency. And that's what the result shows.
You may be interested in showing frequencies of every value present in variable A. In that case, you want to use this code:
ods select Frequencies;
proc univariate data=Want freq;
var A;
run;
That is a frequency table.
data have ;
input A @@;
cards;
1 1 2 3 3 3
;
proc freq data=have order=freq ;
tables a / out=counts;
run;
proc print data=counts;
run;
Result:
Obs A COUNT PERCENT
1 3 3 50.0000
2 1 2 33.3333
3 2 1 16.6667

Multiple transactions lines to base table SAS

I am new to SAS and am trying to handle some customer data, and I'm not really sure how to do this.
What I have:
data transactions;
input ID $ Week Segment $ Average Freq;
datalines;
1 1 Sports 500 2
1 1 PC 400 3
1 2 Sports 350 3
1 2 PC 550 3
2 1 Sports 650 2
2 1 PC 700 3
2 2 Sports 720 3
2 2 PC 250 3
;
run;
What I want:
data transactions2;
input ID Week1_Sports_Average Week1_PC_Average Week1_Sports_Freq
Week1_PC_Freq
Week2_Sports_Average Week2_PC_Average Week2_Sports_Freq Week2_PC_Freq;
datalines;
1 500 400 2 3 350 550 3 3
2 650 700 2 3 720 250 3 3
;
run;
The only thing I got so far is this:
Data transactions3;
SET transactions;
if week=1 and Segment="Sports" then DO;
Week1_Sports_Freq=Freq;
Week1_Sports_Average=Average;
END;
else DO;
Week1_Sports_Freq=0;
Week1_Sports_Average=0;
END;
run;
This will be way too much work, as I have a lot of weeks and more variables than just freq/avg.
I'm really hoping for some tips, as I'm stuck.
You can use PROC TRANSPOSE to create that structure. But you need to use it twice since your original dataset is not fully normalized.
The first PROC TRANSPOSE will get the AVERAGE and FREQ readings onto separate rows.
proc transpose data=transactions out=tall ;
by id week segment notsorted;
var average freq ;
run;
If you don't mind having the variables named slightly differently than in your proposed solution you can just use another proc transpose to create one observation per ID.
proc transpose data=tall out=want delim=_;
by id;
id segment _name_ week ;
var col1 ;
run;
If you want the exact names you had before, you could add a data step that first creates a variable you can use in the ID statement of PROC TRANSPOSE.
data tall ;
set tall ;
length new_name $32 ;
new_name = catx('_',cats('WEEK',week),segment,_name_);
run;
proc transpose data=tall out=want ;
by id;
id new_name;
var col1 ;
run;
Note that a numbered series of variables is easier to work with in SAS if the number appears at the end of the name, because you can then use a variable list. So instead of WEEK1_AVERAGE, WEEK2_AVERAGE, ... you would use WEEK_AVERAGE_1, WEEK_AVERAGE_2, ... and could write a variable list like WEEK_AVERAGE_1 - WEEK_AVERAGE_5 in your SAS code.
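As a small illustration of why the trailing number helps, here is a sketch that uses a numbered range list both to read and to summarise the variables. The dataset WANT_WIDE, the variable OVERALL_AVERAGE, and the data values are all made up for illustration:

```sas
/* Hypothetical wide dataset; the values below are invented sample data. */
data want_wide;
  input id week_average_1-week_average_5;
datalines;
1 500 350 400 450 300
2 650 720 700 250 600
;

/* The numbered range list covers all five variables without naming each one. */
data summary;
  set want_wide;
  overall_average = mean(of week_average_1-week_average_5);
run;
```

With names like WEEK1_AVERAGE the number sits in the middle, so this kind of numbered range list is not available.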

How can I get PROC REPORT in SAS to show values in an ACROSS variable that have no observations?

Using PROC REPORT in SAS, if a certain ACROSS variable has 5 different value possibilities (for example, 1 2 3 4 5), but in my data set there are no observations where that variable is equal to, say, 5, how can I get the report to show the column for 5 and display 0 for the # of observations having that value?
Currently my PROC REPORT output is just not displaying those value columns that have no observations.
When push comes to shove, you can do some hacks like this. Notice that the SEX variable in SASHELP.CLASS has no observations with the value 'X', yet the 'other' column still appears:
proc format;
value $sex 'F' = 'female' 'M' = 'male' 'X' = 'other';
run;
options missing=0;
proc report data=sashelp.class nowd ;
column age sex;
define age/ group;
define sex/ across format=$sex. preloadfmt;
run;
options missing=.;
/*
Sex
Age female male other
11 1 1 0
12 2 3 0
13 2 1 0
14 2 2 0
15 2 2 0
16 0 1 0
*/