Using SASHELP.CARS, I would like to produce PROC TABULATE output by Origin. The first way is to run three PROC TABULATE steps, such as:
PROC TABULATE DATA = data out=tabulate;
where Origin="Asia";
CLASS Make DriveTrain ;
TABLE (Make), (DriveTrain) / nocellmerge ;
run;
But instead I would like to automate this in a macro loop (here is a simple example I made; the real database I work with is more complex, which is why I need a macro). Could you please help me understand why the following code won't work? It seems to be the "where Origin=reg;" part that causes the problem. Thank you! So here is my code:
data data; set sashelp.cars;run;
data classes;
input id_reg reg_name $ ;
cards;
1 Asia
2 Europe
3 USA
run;
%macro comp;
%local i reg;
%do i=1 %to 3;
proc sql ;
select reg_name
into
:reg_name
from classes
where id_reg = &i.;
quit;
%let reg=reg_name;
PROC TABULATE DATA = data out=tabulate_&i;
where Origin=reg;
CLASS Make DriveTrain ;
TABLE (Make), (DriveTrain) / nocellmerge ;
run;
%end;
%mend comp;
%comp
If you insist on using Macro, the correct statement will be generated by double quoting the macro variable resolution so as to inject a string literal into the submit stream.
where Origin="®";
Use a BY statement to independently process like grouped subsets of a data set.
Use a WHERE statement to select the subset(s) to process.
Example:
ods html file='output.html' style=plateau;
proc sort data=sashelp.cars out=cars_sorted;
by origin;
run;
title;footnote;
options nocenter nodate nonumber;
PROC TABULATE DATA=cars_sorted;
by origin;
where Origin in ("Asia", "Europe", "USA");
where also make >= 'P'; * further subset for reduced size of output screen shot;
CLASS Make DriveTrain ;
TABLE (Make), (DriveTrain) / nocellmerge ;
run;
ods html close;
Output
Alternatively, use a TABLE statement of the form <page dimension>,<row dimension>,<column dimension> in lieu of BY-group processing. Such a form does not need presorted data because the table is constructed from CLASS variables.
Example:
PROC TABULATE DATA=sashelp.cars; /* original data, not sorted */
where Origin in ("Asia", "Europe", "USA");
where also make >= 'P'; * further subset for reduced size of output screen shot;
CLASS Origin Make DriveTrain ; /* Origin added to CLASS */
TABLE Origin, (Make), (DriveTrain) / nocellmerge ; /* Origin is page dimension */
run;
Output
Thank you very much! Here is the code that works:
data data; set sashelp.cars;run;
data classes;
input id_reg reg_name $ ;
cards;
1 Asia
2 Europe
3 USA
run;
%macro comp;
%local i ;
%do i=1 %to 3;
proc sql ;
select reg_name
into
:reg_name
from classes
where id_reg = &i.;
quit;
PROC TABULATE DATA = data(where=(Origin="&reg_name")) out=tabulate_&i;
CLASS Make DriveTrain ;
TABLE (Make), (DriveTrain) / nocellmerge ;
run;
%end;
%mend comp;
%comp
I have this piece of code; however, the macro around PROC UNIVARIATE generates too many separate datasets because of the loop over t from 1 to 310. How can I modify this code so that all the PROC UNIVARIATE output goes into one dataset, and then adjust the rest of the code for a more efficient run?
%let L=10; %* 10th percentile *;
%let H=%eval(100 - &L); %* 90th percentile*;
%let wlo=V1&L V2&L V3&L ;
%let whi=V1&H V2&H V3&H ;
%let wval=wV1 wV2 wV3 ;
%let val=V1 V2 V3;
%macro winsorise();
%do v=1 %to %sysfunc(countw(&val));
%do t=1 %to 310;
proc univariate data=regressors noprint;
var &val;
output out=_winsor&t._V&v pctlpts=&H &L
prtlpre=&val&t._V&v;
where time_count<=&t;run;
%end;
data regressors (drop=__:);
set regressors;
if _n_=1 then set _winsor&t._V&v;
&wval&t._V&v=min(max(&val&t._V&v,&wlo&t._V&v),&whi&t._V&v);
run;
%end;
%mend;
Thank you.
Presume you have data with time_count, x1, x2, x3 sampled every 0.5 time units.
data regressors;
call streaminit(123);
do time_count = 0 to 310 by .5;
x1 = 2 ** (sin(time_count/6) * log(time_count+1));
x2 = log2 (time_count+1) + log(time_count/10+.1);
x3 = rand('normal');
output;
end;
format x: 7.3;
run;
Stack the data into groups based on integer time_count levels. The stack is constructed from a full outer join with a less-than-or-equal (<=) criterion. Each group is identified by the top time_count in the group.
proc sql;
create table stack as
select
a.time_count
, a.x1
, a.x2
, a.x3
, b.time_count as time_count_group /* save top value in group variable */
from regressors as a
full join regressors as b /* self full join */
on a.time_count <= b.time_count /* triangular criteria */
where
int(b.time_count)=b.time_count /* select integer top values */
order by
b.time_count, a.time_count
;
quit;
Now compute ALL your stats for ALL your variables for ALL your groups in one go. No macro, no muss, no fuss.
proc univariate data=stack noprint;
by time_count_group;
var x1 x2 x3;
output out=_winsor n=group_size pctlpts=90 10 pctlpre=x1_ x2_ x3_;
run;
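From here, a minimal sketch of finishing the winsorisation (assuming the percentile variable names x1_10, x1_90, and so on implied by the PCTLPRE= and PCTLPTS= options above, and made-up names wx1-wx3 for the clamped values) merges the group percentiles back onto the stacked data:
data winsorised;
  merge stack _winsor;
  by time_count_group;
  /* clamp each value to the group 10th-90th percentile range */
  wx1 = min(max(x1, x1_10), x1_90);
  wx2 = min(max(x2, x2_10), x2_90);
  wx3 = min(max(x3, x3_10), x3_90);
run;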
So I'm having trouble getting what appears to be a fairly common model validation macro to run with my code. Here's the code:
%macro hoslem (data=, pred=, y=, ngro=10,print=T,out=hl);
/*---
Macro computes the Hosmer-Lemeshow Chi Square Statistic
for Calibration
Parameters (* = required)
-------------------------
data* input dataset
pred* Predicted Probabilities
y* Outcome 0/1 variable
ngro # of groups for the calibration test (default 10)
print Prints output (set to F to Suppress)
out output dataset (default HL)
Author: Kevin Kennedy
---*/
%let print = %upcase(&print);
proc format;
value pval 0-.0001='<.0001';
run;
data first;set &data;where &y^=. and &pred^=.;run;
proc rank groups=&ngro out=ranks data=first;ranks phat_grp;var &pred;run;
proc sort data=ranks;by phat_grp;run;
proc sql;
create table ranks2 as select *, count(*) as num_dec label='Sample Size', sum(&pred) as sum_pred
label='Sum Probabilities', sum(&y) as sum_y label='Number of Events'
from ranks
group by phat_grp;
create table ranks3 as select distinct(phat_grp),num_dec,sum_pred,sum_y,
((sum_y-sum_pred)**2/(sum_pred*(1-sum_pred/num_dec))) as chi_part label='Chi-Square Term'
from ranks2 ;
select sum(chi_part) into :chi_sq
from ranks3;
quit;
data &out;
chi_sq=&chi_sq; label chi_sq='Hosmer Lemeshow Chi Square';
df=&ngro-2; label df ='Degree of Freedom';
p_value=1-cdf('chisquared',chi_sq,df);label p_value= 'P-Value';
format p_value pval.;
run;
%if &print=T %then %do;
title 'Hosmer Lemeshow Details';
proc print data=ranks3 noobs label;run;
Options formdlim='-';
title 'Hosmer Lemeshow Calibration Test';
proc print data=&out noobs label;run;
%end;
Options formdlim='';
%mend;
%macro add_predictive(data=, y=, p_old=, p_new= , nripoints=%str(),hoslemgrp=10) ;
/*---
this macro attempts to quantify the added predictive ability of a covariate(s)
in logistic regression, based on the statistics in M. J. Pencina et al., Statistics
in Medicine (2008; 27(2):157-72). Statistics returned will be: C-statistics (for 9.2 users),
IDI (INTEGRATED DISCRIMINATION IMPROVEMENT), NRI (net reclassification index)
for both Category-Free and User defined groups, and Hosmer Lemeshow GOF test
with associated pvalues and z scores.
Parameters (* = required)
-------------------------
data* Specifies the SAS dataset
y* Response variable (Outcome must be 0/1)
p_old* Predicted Probability of an Event using Initial Model
p_new* Predicted Probability of an Event using New Model
nripoints Groups for User defined classification (Optional),
Example 3 groups: (<.06, .06-.2, >.2) then nripoints=.06 .2
hoslemgrp # of groups for the Hosmer Lemeshow test (default 10)
Author: Kevin Kennedy and Michael Pencina
Date: May 26, 2010
---*/
options nonotes nodate nonumber;
ods select none;
%local start end ;
%let start=%sysfunc(datetime());
proc format;
value pval 0-.0001='<.0001';
run;
/******Step 1: C-Statistics******/
/********************************/
%if %sysevalf(&sysver >= 9.2) %then %do;
%put ********Running AUC Analysis************;
proc logistic data=&data descending;
model &y=&p_old &p_new;
roc 'first' &p_old;
roc 'second' &p_new;
roccontrast reference('first')/estimate e;
ods output ROCAssociation=rocass ROCContrastEstimate=rocdiff;
run;
proc sql noprint;
select estimate, StdErr, lowercl, uppercl, (ProbChiSq*100) as pval
into :rocdiff, :rocdiff_stderr, :rocdiff_low, :rocdiff_up, :rocp
from rocdiff
where find(contrast,'second');
quit;
data _null_;
set rocass;
if ROCModel='first' then do;
call symputx('c_old',Area);
end;
if ROCModel='second' then do;
call symputx('c_new',Area);
end;
run;
data cstat;
cstat_old=&c_old; label cstat_old='Model1 AUC';
cstat_new=&c_new; label cstat_new='Model2 AUC';
cstat_diff=&rocdiff; label cstat_diff='Difference in AUC';
cstat_stderr=&rocdiff_stderr; label cstat_stderr='Standard Error of Difference in AUC';
cstat_low=&rocdiff_low; label cstat_low='Difference in AUC Lower 95% CI';
cstat_up=&rocdiff_up; label cstat_up='Difference in AUC Upper 95% CI';
cstat_ci='('!!trim(left(cstat_low))!!','!!trim(left(cstat_up))!!')';
label cstat_ci='95% CI for Difference in AUC';
cstat_pval=&rocp/100; label cstat_pval='P-value for AUC Difference';
format cstat_pval pval.;
run;
%end;
%if %sysevalf(&sysver < 9.2) %then %do;
options notes ;
%put *********************;
%put NOTE: You are running a Pre 9.2 version of SAS;
%put NOTE: Go to SAS website to get example of ROC Macro for AUC Comps;
%put NOTE: http://support.sas.com/kb/25/017.html;
%put *********************;
%put;
options nonotes ;
%end;
/******************************/
/*****End step 1***************/
/******************************/
/******Step 2: IDI***************/
%put ********Running IDI Analysis************;
proc sql noprint;
create table idinri as select &y,&p_old, &p_new, (&p_new-&p_old) as pdiff
from &data
where &p_old^=. and &p_new^=.
order by &y;
quit;
proc sql noprint; /*define mean probabilities for old and new model and event and nonevent*/
select count(*),avg(&p_old), avg(&p_new),stderr(pdiff) into
:num_event, :p_event_old, :p_event_new,:eventstderr
from idinri
where &y=1 ;
select count(*),avg(&p_old), avg(&p_new),stderr(pdiff) into
:num_nonevent, :p_nonevent_old, :p_nonevent_new ,:noneventstderr
from idinri
where &y=0;
quit;
data fin(drop=slope_noadd slope_add);
pen=&p_event_new; label pen='Mean Probability for Events: Model2';
peo=&p_event_old; label peo='Mean Probability for Events: Model1';
pnen=&p_nonevent_new; label pnen='Mean Probability for NonEvents: Model2';
pneo=&p_nonevent_old; label pneo='Mean Probability for NonEvents: Model1';
idi=(&p_event_new-&p_nonevent_new)-(&p_event_old-&p_nonevent_old);
label idi='Integrated Discrimination Improvement';
idi_stderr=sqrt((&eventstderr**2)+(&noneventstderr**2));
label idi_stderr='IDI Standard Error';
idi_lowci=round(idi-1.96*idi_stderr,.0001);
idi_upci=round(idi+1.96*idi_stderr,.0001);
idi_ci='('!!trim(left(idi_lowci))!!','!!trim(left(idi_upci))!!')';
label idi_ci='IDI 95% CI';
z_idi=abs(idi/(sqrt((&eventstderr**2)+(&noneventstderr**2))));
label z_idi='Z-value for IDI';
pvalue_idi=2*(1-PROBNORM(abs(z_idi))); label pvalue_idi='P-value for IDI';
change_event=&p_event_new-&p_event_old;
label change_event='Probability change for Events';
change_nonevent=&p_nonevent_new-&p_nonevent_old;
label change_nonevent='Probability change for Nonevents';
slope_noadd=&p_event_old-&p_nonevent_old;
slope_add=&p_event_new-&p_nonevent_new;
relative_idi=slope_add/slope_noadd-1; label relative_idi='Relative IDI';
format pvalue_idi pval.;
run;
/************step 3 NRI analysis*******/
%put ********Running NRI Analysis************;
data nri_inf;
set idinri;
if &y=1 then do;
down_event=(pdiff<0);up_event=(pdiff>0);down_nonevent=0;up_nonevent=0;
end;
if &y=0 then do;
down_nonevent=(pdiff<0);up_nonevent=(pdiff>0);down_event=0;up_event=0;
end;
run;
proc sql;
select sum(up_nonevent), sum(down_nonevent), sum(up_event),sum(down_event)
into :num_nonevent_up_user, :num_nonevent_down_user, :num_event_up_user, :num_event_down_user
from nri_inf;
quit;
/* Category-Free Groups */
data nri1;
group="Category-Free NRI";
p_up_event=&num_event_up_user/&num_event;
p_down_event=&num_event_down_user/&num_event;
p_up_nonevent=&num_nonevent_up_user/&num_nonevent;
p_down_nonevent=&num_nonevent_down_user/&num_nonevent;
nri=(p_up_event-p_down_event)-(p_up_nonevent-p_down_nonevent);
nri_stderr=sqrt(((&num_event_up_user+&num_event_down_user)/&num_event**2-(&num_event_up_user-
&num_event_down_user)**2/&num_event**3)+
((&num_nonevent_down_user+&num_nonevent_up_user)/&num_nonevent**2-
(&num_nonevent_down_user-&num_nonevent_up_user)**2/&num_nonevent**3));
low_nrici=round(nri-1.96*nri_stderr,.0001);
up_nrici=round(nri+1.96*nri_stderr,.0001);
nri_ci='('!!trim(left(low_nrici))!!','!!trim(left(up_nrici))!!')';
z_nri=nri/sqrt(((p_up_event+p_down_event)/&num_event)
+((p_up_nonevent+p_down_nonevent)/&num_nonevent)) ;
pvalue_nri=2*(1-PROBNORM(abs(z_nri)));
event_correct_reclass=p_up_event-p_down_event;
nonevent_correct_reclass=p_down_nonevent-p_up_nonevent;
z_event=event_correct_reclass/sqrt((p_up_event+p_down_event)/&num_event);
pvalue_event=2*(1-probnorm(abs(z_event)));
z_nonevent=nonevent_correct_reclass/sqrt((p_up_nonevent+p_down_nonevent)/&num_nonevent);
pvalue_nonevent=2*(1-probnorm(abs(z_nonevent)));
format pvalue_nri pvalue_event pvalue_nonevent pval. event_correct_reclass
nonevent_correct_reclass percent.;
label nri='Net Reclassification Improvement'
nri_stderr='NRI Standard Error'
low_nrici='NRI lower 95% CI'
up_nrici='NRI upper 95% CI'
nri_ci='NRI 95% CI'
z_nri='Z-Value for NRI'
pvalue_nri='NRI P-Value'
pvalue_event='Event P-Value'
pvalue_nonevent='Non-Event P-Value'
event_correct_reclass='% of Events correctly reclassified'
nonevent_correct_reclass='% of Nonevents correctly reclassified';
run;
/*User Defined NRI*/
%if &nripoints^=%str() %then %do;
/*words macro*/
%macro words(list,delim=%str( ));
%local count;
%let count=0;
%do %while(%qscan(%bquote(&list),&count+1,%str(&delim)) ne %str());
%let count=%eval(&count+1);
%end;
&count
%mend words;
%let numgroups=%eval(%words(&nripoints)+1); /*figure out how many ordinal groups*/
proc format ;
value group
1 = "0 to %scan(&nripoints,1,%str( ))"
%do i=2 %to %eval(&numgroups-1);
%let j=%eval(&i-1);
&i="%scan(&nripoints,&j,%str( )) to %scan(&nripoints,&i,%str( ))"
%end;
%let j=%eval(&numgroups-1);
&numgroups="%scan(&nripoints,&j,%str( )) to 1";
run;
data idinri;
set idinri;
/*define first ordinal group for pre and post*/
if 0<=&p_old<=%scan(&nripoints,1,%str( )) then group_pre=1;
if 0<=&p_new<=%scan(&nripoints,1,%str( )) then group_post=1;
%let i=1;
%do %until(&i>%eval(&numgroups-1));
if %scan(&nripoints,&i,%str( ))<&p_old then do;
group_pre=&i+1;
end;
if %scan(&nripoints,&i,%str( ))<&p_new then do;
group_post=&i+1;
end;
%let i=%eval(&i+1);
%end;
if &y=0 then do;
up_nonevent=(group_post>group_pre);
down_nonevent=(group_post<group_pre);
down_event=0; up_event=0;
end;
if &y=1 then do;
up_event=(group_post>group_pre);
down_event=(group_post<group_pre);
down_nonevent=0; up_nonevent=0;
end;
format group_pre group_post group.;
run;
proc sql;
select sum(up_nonevent), sum(down_nonevent), sum(up_event),sum(down_event),avg(&y)
into :num_nonevent_up_user, :num_nonevent_down_user, :num_event_up_user,
:num_event_down_user, :eventrate
from idinri;
quit;
data nri2;
group='User Category NRI';
p_up_event=&num_event_up_user/&num_event;
p_down_event=&num_event_down_user/&num_event;
p_up_nonevent=&num_nonevent_up_user/&num_nonevent;
p_down_nonevent=&num_nonevent_down_user/&num_nonevent;
nri=(p_up_event-p_down_event)-(p_up_nonevent-p_down_nonevent);
nri_stderr=sqrt(((&num_event_up_user+&num_event_down_user)/&num_event**2-
(&num_event_up_user-&num_event_down_user)**2/&num_event**3)+
((&num_nonevent_down_user+&num_nonevent_up_user)/&num_nonevent**2-
(&num_nonevent_down_user-&num_nonevent_up_user)**2/&num_nonevent**3));
low_nrici=round(nri-1.96*nri_stderr,.0001);
up_nrici=round(nri+1.96*nri_stderr,.0001);
nri_ci='('!!trim(left(low_nrici))!!','!!trim(left(up_nrici))!!')';
z_nri=nri/sqrt(((p_up_event+p_down_event)/&num_event)
+((p_up_nonevent+p_down_nonevent)/&num_nonevent)) ;
pvalue_nri=2*(1-PROBNORM(abs(z_nri)));
event_correct_reclass=p_up_event-p_down_event;
nonevent_correct_reclass=p_down_nonevent-p_up_nonevent;
z_event=event_correct_reclass/sqrt((p_up_event+p_down_event)/&num_event);
pvalue_event=2*(1-probnorm(abs(z_event)));
z_nonevent=nonevent_correct_reclass/sqrt((p_up_nonevent+p_down_nonevent)/&num_nonevent);
pvalue_nonevent=2*(1-probnorm(abs(z_nonevent)));
format pvalue_nri pval.;
run;
data nri1;
set nri1 nri2;
run;
%end;
/**************/
/*step 4 gof */
/**************/
%hoslem(data=idinri,pred=&p_old,y=&y,ngro=&hoslemgrp,out=m1,print=F);
%hoslem(data=idinri,pred=&p_new,y=&y,ngro=&hoslemgrp,out=m2,print=F);
data hoslem(drop=cnt);
retain model;
set m1 m2;
cnt+1;
if cnt=1 then model='Model1';
else model='Model2';
run;
ods select all;
/*output for cstat*/
%if %sysevalf(&sysver >= 9.2) %then %do;
proc print data=cstat label noobs;
title1 "Evaluating added predictive ability of model2";
title2 'AUC Analysis';run;
%END;
/*output for IDI*/
proc print data=fin label noobs;
title1 "Evaluating added predictive ability of model2";
title2 'IDI Analysis';
var idi idi_stderr z_idi pvalue_idi idi_ci
pen peo pnen pneo change_event change_nonevent relative_idi;
run;
/*output for NRI*/
proc print data=nri1 label noobs;
title1 "Evaluating added predictive ability of model2";
title2 'NRI Analysis';
var group nri nri_stderr z_nri pvalue_nri nri_ci event_correct_reclass pvalue_event
nonevent_correct_reclass pvalue_nonevent;
run;
%if &nripoints^=%str() %then %do;
proc freq data=idinri;
where &y=0;
title 'NRI Table for Non-Events';
tables group_pre*group_post/nopercent nocol;
run;
proc freq data=idinri;
where &y=1;
title 'NRI Table for Events';
tables group_pre*group_post/nopercent nocol;
run;
%end;
/*print HL gof*/
proc print data=hoslem noobs label;
title "Hosmer Lemeshow Test with %sysevalf(&hoslemgrp-2) df";
run;
proc datasets library=work nolist;
delete fin idinri nri1 nri2 nri_inf stderr;
quit;
options notes;
%put NOTE: Macro %nrstr(%%)add_predictive completed.;
%let end=%sysfunc(datetime());
%let runtime=%sysfunc(round(%sysevalf(&end-&start)));
%put NOTE: Macro Real Run Time=&runtime seconds;
title;
%mend;
/*initial model*/
proc logistic data=DATA;
model value(event='1')=X1-X20;
output out=m1 pred=p1;
run;
/*new model*/
proc logistic data=data;
model value(event='1')=X1-X20;
output out=m2 pred=p2;
run;
proc sql;
create table TABLENAME as select *
from m1 as a left join m2 as b on a.cnt=b.cnt;
quit;
%add_predictive(data=DATA,y=value,p_old=p1,p_new=p2, nripoints=.1 .3);
The error I get after running is:
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
However, when troubleshooting, it seems to take me down a never-ending path of unrecognized variables. Am I calling the macro incorrectly?
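The ERROR itself means that neither m1 nor m2 contains a variable named cnt, so the join key does not exist. One possible sketch of a fix, assuming the two PROC LOGISTIC output datasets are row-aligned because they come from the same input data, is to add a row counter to each before joining:
/* hypothetical key: the row number */
data m1; set m1; cnt=_n_; run;
data m2; set m2; cnt=_n_; run;
proc sql;
create table TABLENAME as select *
from m1 as a left join m2 as b on a.cnt=b.cnt;
quit;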
I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1  S2  S3  S4  Frequency
A               10
A   E           100
A   E   J   F   300
B               10
B   E           100
B   E   J   F   300
but with an instruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is the PROC FREQ code:
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is a way to get the counts using PROC SUMMARY:
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages, you can instead use the WAYS statement to get both the overall count and the individual cell counts, and then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
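Put together, a minimal sketch for that case (assuming the dataset and variable names prova and s1-s10 from the question):
proc summary missing data=prova;
  class s1-s10;
  ways 0 10;
  output out=want;
run;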
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
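A macro %DO variant of the same idea could look like this sketch (the macro name makelist is just for illustration):
%macro makelist(n);
  %local i list;
  %let list=S1;
  %do i=2 %to &n;
    %let list=&list*S&i;
  %end;
  &list
%mend makelist;
%put %makelist(10);  %* writes S1*S2*...*S10 to the log;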
But here is an easy way to make such a macro variable from ANY variable list, not just those with numeric suffixes.
First get the variable names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimensions.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to &top;
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to &top;
end;
%end;
run;
%mend;
%have;
As you probably noticed, tables s: creates one freq table per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimensions.
NOTE: If you specify out=, the column name in the output data set will be that of the last variable in the level. So for the above, the out= table will have a column "s5" but contain counts corresponding to the combinations of s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
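For example, a small illustration using the have data generated above (a variable list in a dimension expands to one crossing per variable in the list):
proc freq data=have;
  tables s1 * (s2 s3) * s5 / list missing;  * expands to s1*s2*s5 and s1*s3*s5;
run;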
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE; I don't trust eyeballs. If there are zero rows with differences (per OUTNOEQUAL), then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)
I have data with state variation in the U.S. Now I want to create many dummies to control for state fixed effects. In Stata this is easy, while in SAS it seems I have to create all the dummies manually. However, logit regression with fixed effects runs quite slowly in Stata. I wonder whether there is a more efficient way in SAS to create dummies from character variables (not numeric ones, for which I know a few methods), since I have too many character variables that need to be turned into dummies.
Cheers,
Eva
proc logistic supports the class statement. Place your variables in the class statement, and you can also specify the type of parameterization you'd like. The most common method is reference coding.
proc logistic data=sashelp.heart;
class sex bp_status/param=ref;
model status = sex ageAtStart height weight bp_status;
run;
https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect006.htm
Not all procs support the class statement; in those cases you can use proc glmmod or a variety of other methods to create your dummy variables.
http://blogs.sas.com/content/iml/2016/02/22/create-dummy-variables-in-sas.html
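For example, a rough sketch of the proc glmmod route (the response variable here is only a placeholder required by the MODEL statement; the dummy columns land in the OUTDESIGN= data set, and OUTPARM= maps them back to the class levels):
proc glmmod data=sashelp.heart outdesign=design outparm=parm;
  class sex bp_status;               * character variables to expand into dummies;
  model ageAtStart = sex bp_status;  * numeric response used only as a placeholder;
run;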
If you absolutely need to manually create dummy variables you can use a macro like this one. You would need to call it for each variable.
%macro create_dummy(dataset=, var=);
%* Save Distinct Values and Dummy Variable Names;
proc sql noprint;
select distinct
&var,
tranwrd(tranwrd(trim(&var), " ", "_"), ".", "")
into
:value1-,
:name1-
from
&dataset
;
select
count(distinct(&var))
into
:total
from
&dataset
;
quit;
%* Create Dummy Variables;
data &dataset;
set &dataset;
%do i=1 %to &total;
if &var = "&&value&i" then &&name&i = 1; else &&name&i = 0;
%end;
run;
%mend create_dummy;
You can add a loop to the macro if you want to call it only once. Add a %DO loop at the top like this:
%macro create_dummy(dataset=, var=);
%do l=1 %to %sysfunc(countw(&var));
%let var1 = %scan(&var, &l);
%* Save Distinct Values and Dummy Variable Names;
proc sql noprint;
select distinct
&var1,
tranwrd(tranwrd(trim(&var1), " ", "_"), ".", "")
into
:value1-,
:name1-
from
&dataset
;
select
count(distinct(&var1))
into
:total
from
&dataset
;
quit;
%* Create Dummy Variables;
data &dataset;
set &dataset;
%do i=1 %to &total;
if &var1 = "&&value&i" then &&name&i = 1; else &&name&i = 0;
%end;
run;
%end;
%mend create_dummy;
I'm a beginner in SAS and I have the following problem.
I need to calculate counts and percents of several variables (A B C) from one dataset and save the results to another dataset.
my code is:
proc freq data=mydata;
tables A B C / out=data_out ; run;
The result of the procedure for each variable appears in the SAS output window, but data_out contains the results only for the last variable. How can I save them all in data_out?
Any help is appreciated.
ODS OUTPUT is your answer. You can't send them all to one dataset directly using OUT=, but you can capture them like so:
ods output OneWayFreqs=freqs;
proc freq data=sashelp.class;
tables age height weight;
run;
ods output close;
OneWayFreqs holds the one-way tables; (n>1)-way tables are CrossTabFreqs:
ods output CrossTabFreqs=freqs;
ods trace on;
proc freq data=sashelp.class;
tables age*height*weight;
run;
ods output close;
You can find out the correct name by running ods trace on; and then running your initial proc whatever (to the screen); it will tell you the names of the output in the log. (ods trace off; when you get tired of seeing it.)
Lots of good basic SAS stuff to learn here:
1) Run three proc freq steps (one for each variable a, b, c) with a different output dataset name each time so the datasets are not overwritten.
2) Use a rename option on the out= dataset to change the count and percent variables for when you combine the datasets.
3) Sort by category and merge all the datasets together.
(I'm assuming there are values that appear in multiple variables; if not, you could just stack the data sets.)
data mydata;
input a $ b $ c$;
datalines;
r r g
g r b
b b r
r r r
g g b
b r r
;
run;
proc freq noprint data = mydata;
tables a / out = data_a
(rename = (a = category count = count_a percent = percent_a));
run;
proc freq noprint data = mydata;
tables b / out = data_b
(rename = (b = category count = count_b percent = percent_b));
run;
proc freq noprint data = mydata;
tables c / out = data_c
(rename = (c = category count = count_c percent = percent_c));
run;
proc sort data = data_a; by category; run;
proc sort data = data_b; by category; run;
proc sort data = data_c; by category; run;
data data_out;
merge data_a data_b data_c;
by category;
run;
As ever, there are lots of different ways of doing this sort of thing in SAS. Here are a couple of other options:
1. Use proc summary rather than proc freq:
proc summary data = sashelp.class;
class age height weight;
ways 1;
output out = freqs;
run;
2. Use multiple table statements in a single proc freq
This is more efficient than running 3 separate proc freq statements, as SAS only has to read the input dataset once rather than 3 times:
proc freq data = sashelp.class noprint;
table age /out = freq_age;
table height /out = freq_height;
table weight /out = freq_weight;
run;
data freqs;
informat age height weight count percent;
set freq_age freq_height freq_weight;
run;
This is a question I've dealt with many times and I WISH SAS had a better way of doing this.
My solution has been a generalized macro: provide your input data, your list of variables, and the name of your output dataset. It takes the format/type/label of each variable into consideration, which you would otherwise have to handle yourself.
Hope it helps:
https://gist.github.com/statgeek/c099e294e2a8c8b5580a
/*
Description: Creates a One-Way Freq table of variables including percent/count
Parameters:
dsetin - inputdataset
varlist - list of variables to be analyzed separated by spaces
dsetout - name of dataset to be created
Author: F.Khurshed
Date: November 2011
*/
%macro one_way_summary(dsetin, varlist, dsetout);
proc datasets nodetails nolist;
delete &dsetout;
quit;
*loop through variable list;
%let i=1;
%do %while (%scan(&varlist, &i, " ") ^=%str());
%let var=%scan(&varlist, &i, " ");
%put &i &var;
*Cross tab;
proc freq data=&dsetin noprint;
table &var/ out=temp1;
run;
*Get variable label as name;
data _null_;
set &dsetin (obs=1);
call symput('var_name', vlabel(&var.));
run;
%put &var_name;
*Add in Variable name and store the levels as a text field;
data temp2;
keep variable value count percent;
Variable = "&var_name";
set temp1;
value=strip(vvalue(&var.));
percent=percent/100; * I like to store these as decimals instead of numbers;
format percent percent8.1;
drop &var.;
run;
%put &var_name;
*Append datasets;
proc append data=temp2 base=&dsetout force;
run;
/*drop temp tables so theres no accidents*/
proc datasets nodetails nolist;
delete temp1 temp2;
quit;
*Increment counter;
%let i=%eval(&i+1);
%end;
%mend;
%one_way_summary(sashelp.class, sex age, summary1);
proc report data=summary1 nowd;
column variable value count percent;
define variable/ order 'Variable';
define value / format=$8. 'Value';
define count/'N';
define percent/'Percentage %';
run;
EDIT (2022):
Better way of doing this is to use the ODS Tables:
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macros are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;
The STACKODSOUTPUT option (alias STACKODS), added to PROC MEANS in 9.3, makes this a much simpler task.
proc means data=have n nmiss stackods;
ods output summary=want;
run;
| Variable | N | NMiss |
| ------ | ----- | ----- |
| a | 4 | 3 |
| b | 7 | 0 |
| c | 6 | 1 |