Averaging Panel Data in SAS

I have a panel data set that looks like this:
ID Usage month
1234 2 -2
1234 4 -1
1234 3 1
1234 2 2
2345 5 -2
2345 6 -1
2345 3 1
2345 6 2
The real data has more IDs and more usage rows, but this is the general form. For each unique ID, I want to average the usage separately over the rows where month is negative and over the rows where month is positive. My goal is to get something like this:
ID avg_usage_neg avg_usage_pos
1234 3 2.5
2345 5.5 4.5

Here are a few options for you.
First create the test data:
data sample;
input ID
Usage
month;
datalines;
1234 2 -2
1234 4 -1
1234 3 1
1234 2 2
2345 5 -2
2345 6 -1
2345 3 1
2345 6 2
;
run;
Here's an SQL solution:
proc sql noprint;
create table result as
select id,
avg(ifn(month < 0, usage, .)) as avg_usage_neg,
avg(ifn(month > 0, usage, .)) as avg_usage_pos
from sample
group by 1
;
quit;
Here's a datastep / proc means solution:
data sample2;
set sample;
usage_neg = ifn(month < 0, usage, .);
usage_pos = ifn(month > 0, usage, .);
run;
proc means data=sample2 noprint missing nway;
class id;
var usage_neg usage_pos;
output out=result2 mean=;
run;
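The PROC MEANS step above keeps the input variable names (usage_neg, usage_pos) for the means. If the column names from the question are wanted, one possible variation is to name the statistics explicitly on the OUTPUT statement. This is only a sketch: it rebuilds the test data in a single step so it can run on its own, and the choice to drop _TYPE_ and _FREQ_ is an assumption about what the final table should contain.

```sas
/* Sketch: rebuild the test data with the two helper columns in one step */
data sample2;
  input ID Usage month;
  usage_neg = ifn(month < 0, usage, .);  /* usage only for negative months */
  usage_pos = ifn(month > 0, usage, .);  /* usage only for positive months */
  datalines;
1234 2 -2
1234 4 -1
1234 3 1
1234 2 2
2345 5 -2
2345 6 -1
2345 3 1
2345 6 2
;
run;

/* Name the output means to match the columns requested in the question */
proc means data=sample2 noprint nway;
  class id;
  var usage_neg usage_pos;
  output out=result2(drop=_type_ _freq_)  /* assumption: bookkeeping columns not needed */
         mean=avg_usage_neg avg_usage_pos;
run;
```

The names after mean= are matched to the VAR list by position, so their order has to follow the order of the variables in the VAR statement.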

Related

SAS LOOP - create columns from the records which are having a value

Suppose I have arbitrary diagnostic codes, such as 001, v58, ..., 142. How can I construct one indicator column per code that is 1 for the records where that code was found?
Input:
id found code
1 1 001
2 0 v58
3 1 v58
4 1 003
5 0 v58
......
......
15000 0 v58
Output:
id code_001 code_v58 code_003 .......
1 1 0 0
2 0 0 0
3 0 1 0
4 0 0 1
5 0 0 0
.........
.........
You will want to TRANSPOSE the values and name the pivoted columns according to the data (the value of code) with an ID statement.
In real-world data, a diagnosis that is absent for an id usually needs to be flagged as zero rather than left missing, and that has to be done in a subsequent step.
Example:
data have;
input id found code $;
datalines;
1 1 001
2 0 v58
2 1 003 /* second diagnosis result for patient 2 */
3 1 v58
4 1 003
5 0 v58
;
proc transpose data=have out=want(drop=_name_) prefix=code_;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
* missing occurs when an id was not diagnosed with a code;
* if that means the flag should be zero (for logistic modeling perhaps)
* the missings need to be changed to zeroes;
data want;
set want;
array codes code_:;
do _n_ = 1 to dim(codes); /* repurpose automatic variable _n_ for loop index */
if missing(codes(_n_)) then codes(_n_) = 0;
end;
run;
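One caveat with the ID statement: PROC TRANSPOSE reports an error when the same ID value occurs more than once within a BY group, which would happen if an id carried the same code on two rows. A possible way around that, sketched below, is to collapse the data to one row per id and code before transposing. The data set names have_dup, have_dedup and want_dedup are made up for the illustration, and taking max(found) assumes that a single positive finding should win.

```sas
/* Hypothetical input in which id 1 has code 001 recorded twice */
data have_dup;
  input id found code $;
  datalines;
1 1 001
1 0 001
2 0 v58
3 1 v58
;
run;

/* Collapse to one row per id and code; max(found) keeps a 1 if any row had it */
proc sql;
  create table have_dedup as
  select id, code, max(found) as found
  from have_dup
  group by id, code
  order by id, code;
quit;

/* Same transpose as above, now safe because id/code pairs are unique */
proc transpose data=have_dedup out=want_dedup(drop=_name_) prefix=code_;
  by id;
  id code;
  var found;
run;
```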

SAS flag each row that contains the max value

I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 3 1
456 3 1
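It is easier to define it with 1 or 0 instead of 1 or missing. In PROC SQL the comparison dec=max(dec) evaluates to 1 or 0, and SAS remerges the group maximum back onto every row of the group.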
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
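If the flag really has to be 1 or missing, as in the desired output, a CASE expression is one way to get it from the same query. The sketch below recreates the sample data so it can run on its own; the table name want_missing is an assumption, and the ORDER BY is added only because the row order after a remerge is not something to rely on.

```sas
/* Sample data from the question */
data have;
  input ID DEC;
  datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
run;

/* Flag rows holding the group maximum with 1, leave the others missing */
proc sql;
  create table want_missing as
  select id, dec,
         case when dec = max(dec) then 1 else . end as maxdec
  from have
  group by id
  order by id, dec;
quit;
```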
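Another option is PROC SUMMARY followed by a merge:
proc sort data=have;
by id;
run;
/* nway keeps only the per-ID rows; without it the overall (_TYPE_=0) row */
/* with a missing id would be merged in as an extra observation */
proc summary data=have nway;
class id;
var dec;
output out=max_info max=max_value;
run;
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;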
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
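data want(drop = m);
/* First pass over the BY group: find the maximum DEC for this ID */
do _N_ = 1 by 1 until (last.id);
set have;
by id;
m = max(m, dec);
end;
/* Second pass: reread the same rows and flag those equal to the maximum */
do _N_ = 1 to _N_;
set have;
maxdec = ifn(dec = m, 1, .);
output;
end;
run;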

In a sas compare, output only differences and new records

In a PROC COMPARE with an ID variable, how can I output only the differences and the new records,
but not the old records that are no longer present?
Example, suppose I have two tables:
mybase:
key other
1 Ann
3 Ann
4 Charlie
5 Emily
and mycompare:
key other
2 Bill
3 Charlie
4 Charlie
running:
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
I get a table "myoutput" like this:
type obs key other
base 1 1 Ann
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
base 4 5 Emily
I would like to have this:
type obs key other
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
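This works for your example. I think you want to output the compare records that have no match in base, plus any matched records that have differences. The final data step deletes any key that appears only once in the PROC COMPARE output and only as a BASE record.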
data mybase;
input key other $;
cards;
1 Ann
3 Ann
4 Charlie
5 Emily
;;;;
data mycompare;
input key other $;
cards;
2 Bill
3 Charlie
4 Charlie
;;;;
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
proc print;
run;
data test;
set myoutput;
by key;
if (first.key and last.key) and _type_ eq 'BASE' then delete;
run;
proc print;
run;
Obs _TYPE_ _OBS_ key other
1 COMPARE 1 2 Bill
2 BASE 2 3 Ann
3 COMPARE 2 3 Charlie
4 DIF 1 3 XXXXXXX.
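If the BASE/COMPARE/DIF layout is not required and only the list of new or changed records matters, a plain data step merge is a possible alternative to PROC COMPARE. This is a sketch under assumptions: both tables are already sorted by key, there is a single variable to compare, and the names changes and base_other are invented here.

```sas
data mybase;
  input key other $;
  cards;
1 Ann
3 Ann
4 Charlie
5 Emily
;
data mycompare;
  input key other $;
  cards;
2 Bill
3 Charlie
4 Charlie
;

/* Keep records that are new in mycompare, or present in both but different */
data changes;
  merge mybase(in=inbase rename=(other=base_other))  /* base value kept for reference */
        mycompare(in=incomp);
  by key;
  if incomp and (not inbase or other ne base_other);
run;

proc print data=changes;
run;
```

Records found only in mybase fail the incomp test and are dropped, which matches the request to ignore old records that are no longer present.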

Transposing data in SAS

I have a dataset of laboratory results. Each row corresponds to one time point of a subject (for example, row 1 is subject #1 at the first visit, row 2 is subject #1 at the second visit, and so on). In each row I have the values of 5 tests (test1, test2, ...), and for each test I have, in addition to the result, two columns of reference values (the normal low and high levels). I wish to transpose the data so that each row is identified by subject+visit+test, with two columns: the numerical result and the status (normal or not). I failed to transpose the data this way. I managed to get all tests into a long format, but I couldn't keep the reference values. How should I do it? My alternative is a set of IF statements, which is going to be very long!
This question was also posted on communities.sas.com.
The two-step process extracts the PARAMCD (lab test code) and the variable type (value or normal range limit) from the variable names. PARAMCD becomes a new row id variable, and V, L, and H are used to create new variable names when the data are transposed again into a format that is more or less CDISC SDTM.
data A;
input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;
datalines;
1 1 0 5 3 6.7 1 10 2 7 3 9
1 2 0 5.5 3.8 8.7 1 10 2 7 3 6
1 3 0 4.5 2.8 5.7 1 10 3 7 3 6
2 1 1 5 3 6.7 1 10 2 7 3 9
2 2 1 5.5 3.8 8.7 1 10 2 7 3 9
2 3 1 4.5 2.8 5.7 1 10 2 7 3 9
;;;;
run;
proc print;
run;
proc transpose data=a out=b;
by id visit group;
run;
data b;
set b;
length paramcd $8 namecd $1;
call scan(_name_,1,p,l,'_');
paramcd = substrn(_name_,p,l);
namecd = coalesceC(substrn(_name_,p+l+1),'V');
drop p l _name_;
run;
proc sort data=b;
by id visit group paramcd;
run;
proc format;
value $namecd 'V'='Value' 'H'='High' 'L'='Low';
run;
proc transpose data=b out=c(drop=_name_);
by id visit group paramcd;
id namecd;
format namecd $namecd.;
var col1;
run;
data c;
set c;
length RangeFL $1;
if n(low,value) eq 2 and value lt low then RangeFL='L';
else if n(high,value) eq 2 and value gt high then RangeFL='H';
else RangeFL='N';
run;
proc print;
run;

Create different decimal format in SAS proc report and ODS

I have the sample SAS code below, and I want to know how to make PROC REPORT display the percentage with two different decimal formats. For example, I want 100% shown with zero decimals, and every other percentage shown with one decimal, such as 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need to use a picture format:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
picture pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
quit;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The -1 <-< 1 means anything that is between -1 and 1, exclusive. The [PERCENT7.0] is telling it to use that format for that section.
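The same idea can also be written with a VALUE statement, which accepts an existing format in square brackets as the label for a range. The sketch below is a reduced variant rather than the answer's exact format: the name pctmix and the test values are assumptions, and it only separates the value 1 from everything else.

```sas
/* Sketch: route exactly 1 to PERCENT7.0 and all other values to PERCENT7.1 */
proc format;
  value pctmix
    1     = [percent7.0]
    other = [percent7.1];
run;

data test;
  input x;
  datalines;
0.25
0.5
0.751
1
;
run;

proc print data=test;
  format x pctmix.;
  var x;
run;
```

In the PROC REPORT code from the question, a format like this would be attached with format=pctmix. on the define statement for mean; the parentheses added by the original pct picture would then have to be handled separately.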
Try using this format:
define mean/ '(%)' format=percent7.1