I have data with
One binary variable, poor
Two socio-demographic variables var1 and var2
I would like to have the poverty rate of each of my var1 * var2 possible value, that would look like that :
But with three variables in a proc freq, I get multiple outputs, one for each value of the first variable I put on my product
proc freq data=test;
table var1*var2*poor;
run;
How can I get something close to what I would like ?
Try this
data test;
input var1 var2 poor;
cards;
1 1 1
2 3 0
3 2 1
4 1 1
1 2 1
2 3 0
4 1 0
4 2 0
3 1 1
1 2 0
3 2 0
1 3 1
3 3 0
3 3 0
3 3 1
1 1 0
2 2 0
2 2 1
2 2 1
2 1 1
2 1 1
2 1 1
;
run;
proc tabulate data=test;
class var1 var2 poor;
tables var1,
var2*poor*pctn<poor>={label="%"};
run;
Related
I want to find the number of unique ids for every subset combination of the variables. For example
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
run;
I want the result to be
var1 var2 var3 count
. . 0 5
. . 1 5
. 0 . 7
. 0 0 5
. 0 1 3
. 1 . 2
. 1 1 2
0 . . 3
0 . 0 2
0 . 1 1
0 0 . 3
0 0 0 2
0 0 1 1
1 . . 7
1 . 0 4
1 . 1 4
1 0 . 5
1 0 0 4
1 0 1 2
1 1 . 2
1 1 1 2
which is the result of appending all the possible proc sql; group bys (var1 is shown below)
proc sql;
create table sub1 as
select var1, count(distinct id) as count
from have
where not missing(var1)
group by var1
;
quit;
I don't care about the case where all variables are missing or when any of the variables in the group by are missing. Is there a more efficient way of doing this?
You can use Proc SUMMARY to compute the combinations of var1-var3 values for each id by group. From the SUMMARY output a SQL query can count the distinct ids per combination.
Example:
data have;
input id var1 var2 var3;
datalines;
5 1 0 0
5 1 1 1
5 1 0 1
5 0 0 0
6 1 0 0
7 1 1 1
8 1 0 1
9 0 0 0
10 1 0 0
11 1 0 0
12 1 . 1
13 0 0 1
;
proc summary noprint missing data=have;
by id;
class var1-var3;
output out=combos;
run;
proc sql;
create table want as
select var1, var2, var3, count(distinct id) as count
from combos
group by var1, var2, var3
;
I have a time series SAS dataset and I want to transfer it to vertical dataset.
My data looks like..
ID A2009 A2010 A2011 A2012
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
data multcol;
infile datalines;
input ID A2009 A2010 A2011 A2012 A2013;
return;
datalines;
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
;
run;
proc print data=multcol noobs;
run;
I search the web only find someone's solution as following.Not worked.
But my dataset is too large, this method shut down my computer.
data cmbcol(keep=a orig_varname orig_obsnum);
set multcol;
array myvars _numeric_;
do i = 2 to dim(myvars);
orig_varname = vname(myvars(i));
orig_obsnum = _n_;
A = myvars(i);
output;
end;
run;
proc print data=cmbcol ;
title 'cmbcol';
run;
proc sort data=cmbcol;
by orig_varname a;
run;
proc print data=cmbcol noobs;
title 'cmbcol';
run;
And I want them to become like this.
ID t t+1
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
1 2 3
2 2 3
3 2 3
4 2 3
5 2 3
1 3 4
2 3 4
3 3 4
4 3 4
5 3 4
How can we do that?
Thanks in advance.
That is an unusual data structure for sure, but you could achieve this using the following macro (adjust to your needs).
options validvarname = any;
%macro transp;
%let i = 2009;
%do %while (&i <= 2011);
%let j = %eval(&i + 1);
data part_&i(rename = (A&i = t A&j = 't+1'n));
set multcol(keep = ID A&i A&j);
run;
%let i = %eval(&i + 1);
%end;
data combined;
set part_:;
run;
proc datasets nolist nodetails;
delete part_:;
quit;
%mend transp;
%transp
I have a sample SAS code below. And I want to know how to proc report the percentage using two decimal formats. For example, I want to put 100% as just zero decimal, and all the other percentages which is not 100% with 1 decimal such like 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need to use a picture format:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
picture pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
quit;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The -1 <-< 1 means anything that is between -1 and 1, exclusive. The [PERCENT7.0] is telling it to use that format for that section.
Try using this format:
define mean/ '(%)' format=percent7.1
Hi my dataset looks something like this:
Var1 Var2 mainvar
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
I want to tabulate Var1 and Var2 based on the value of mainvar (which ranges from 1 to 5) so I tried:
%let class=Var1 Var2
proc tabulate data=x noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
But this is giving me the table without the data being factored by values of mainvar. Any help? Thanks!
In general, I think it's best to create a reproducible example. The following works fine for me:
data example ;
input var1 var2 mainvar ;
cards;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;
%let class=Var1 Var2 ;
proc tabulate data=example noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
How can I manage proc tabulate to show the value of a variable with missing value instead of its statistic? Thanks!
For example, I want to show the value of sym. It takes value 'x' or missing value. How can I do it?
Sample code:
data test;
input tx mod bm $ yr sym $;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
2 1 a 0 x
2 2 a 0 x
2 3 a 0 x
3 1 a 0
3 2 a 0
3 3 a 0 x
1 1 b 0 x
1 2 b 0
1 3 b 0
1 4 b 0
1 5 b 0
2 1 b 0
2 2 b 0
2 3 b 0
2 4 b 0
2 5 b 0
3 1 b 0 x
3 2 b 0
3 3 b 0
1 1 c 0
1 2 c 0 x
1 3 c 0
2 1 c 0
2 2 c 0
2 3 c 0
3 1 c 0
3 2 c 0
3 3 c 0
1 3 a 1 x
2 3 a 1
3 3 a 1
1 3 b 1
2 3 b 1
3 3 b 1
1 3 c 1 x
2 3 c 1
3 3 c 1
;
run;
proc tabulate data=test;
class yr bm tx mod ;
var sym;
table yr*bm, tx*mod;
run;
proc tabulate data=test;
class tx mod bm yr sym;
table yr*bm, tx*mod*sym*n;
run;
That gives you ones for each SYM=x (since n=missing). That hides the rows for SYM=missing, hence you miss some values overall from your example table. (You could format the column with a format that defines 1 = 'x' easily).
proc tabulate data=test;
class tx mod bm yr;
class sym /missing;
table yr*bm, tx*mod*sym=' '*n;
run;
That gives you all of your combinations of the 4 main variables, but includes missing syms as their own column.
If you want to have your cake and eat it too, then you need to redefine SYM to be a numeric variable, so you can use it as a VAR.
proc format;
invalue ISYM
x=1
;
value FSYM
1='x';
quit;
data test;
infile datalines truncover;
input tx mod bm $ yr sym :ISYM.;
format sym FSYM.;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
... more lines ...
;
run;
proc tabulate data=test;
class tx mod bm yr;
var sym;
table yr*bm, tx*mod*sym*sum*f=FSYM.;
run;
All of these assume these are unique combination rows. If you start having multiples of yr*bm*tx*mod, you would have a problem here as this wouldn't give you the expected result (sum 1+1+1=3 would not give you an 'x').