I am trying to make a data set that has the p-values (or alpha values) along with the Pearson correlation coefficient for different variables.
I have about 6000 variables and 4 variables that I am correlating them with. I obtained the coefficients using the OUTP= option shown below.
Full Code:
proc corr data=dat outp = corr noprint;
var v1
v2
v3
v4;
with
v1
v2
v3
v4
v5
...; *upto about v6000;
run;
However, I would also like the alpha values that I usually get in the 'Results Viewer' window, as a data set.
Thank you.
Try the following:
*Get corrs;
proc corr data=test outp=Corr;
var v1-v4;
with v:;
run;
*Get Alphas;
ods output CronbachAlpha=Alpha;
proc corr data=test alpha nocorr;
var v:;
run;
ods output close;
*Merge and Format;
data out(drop=Variables Alpha);
set Alpha Corr;
if Variables^='' then do;
_TYPE_='Alpha';
_NAME_=Variables;
v1=alpha;
v2=alpha;
v3=alpha;
v4=alpha;
end;
run;
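If "alpha values" here means the p-values printed under each coefficient in the Results Viewer, rather than Cronbach's alpha, one option is to capture the PearsonCorr ODS table. The following is a minimal sketch under that assumption. The simulated DAT and the output name corr_p are hypothetical stand-ins. The p-value columns are expected to carry a P prefix (Pv1, Pv2, ...), but it is worth confirming the names with PROC CONTENTS on your release.

```sas
/* Hypothetical stand-in for the asker's DAT: 50 rows, variables v1-v10 */
data dat;
  call streaminit(1);
  array v[10];
  do i = 1 to 50;
    do j = 1 to 10;
      v[j] = rand('normal');
    end;
    output;
  end;
  drop i j;
run;

ods select none;                /* hide the printed report; ODS OUTPUT still works */
ods output PearsonCorr=corr_p;  /* capture coefficients and p-values together */
proc corr data=dat pearson;     /* no NOPRINT here: it would suppress the ODS table */
  var v1-v4;
  with v:;
run;
ods select all;

proc print data=corr_p(obs=5);
run;
```

Each row of corr_p corresponds to one WITH variable, so with about 6000 variables the table stays long rather than wide.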
I have the code below. However, the PROC UNIVARIATE call inside the macro generates too many separate data sets because it loops t from 1 to 310. How can I modify the code so that all PROC UNIVARIATE output goes into one data set, and then adapt the rest of the code to run more efficiently?
%let L=10; %* 10th percentile *;
%let H=%eval(100 - &L); %* 90th percentile*;
%let wlo=V1&L V2&L V3&L ;
%let whi=V1&H V2&H V3&H ;
%let wval=wV1 wV2 wV3 ;
%let val=V1 V2 V3;
%macro winsorise();
%do v=1 %to %sysfunc(countw(&val));
%do t=1 %to 310;
proc univariate data=regressors noprint;
var &val;
output out=_winsor&t._V&v pctlpts=&H &L
pctlpre=&val&t._V&v;
where time_count<=&t;run;
%end;
data regressors (drop=__:);
set regressors;
if _n_=1 then set _winsor&t._V&v;
&wval&t._V&v=min(max(&val&t._V&v,&wlo&t._V&v),&whi&t._V&v);
run;
%end;
%mend;
Thank you.
Assume the data contain time_count, x1, x2, and x3, with one sample every 0.5 time units.
data regressors;
call streaminit(123);
do time_count = 0 to 310 by .5;
x1 = 2 ** (sin(time_count/6) * log(time_count+1));
x2 = log2 (time_count+1) + log(time_count/10+.1);
x3 = rand('normal'); /* arguments were cut off in the source; a standard normal is assumed */
output;
end;
format x: 7.3;
run;
Stack the data into groups based on integer time_count levels. The stack is built from a self full join with a less-than-or-equal (<=) criterion, so each group holds every row up to and including its top time_count. Each group is identified by that top time_count.
proc sql;
create table stack as
select
a.time_count
, a.x1
, a.x2
, a.x3
, b.time_count as time_count_group /* save top value in group variable */
from regressors as a
full join regressors as b /* self full join */
on a.time_count <= b.time_count /* triangular criteria */
where
int(b.time_count)=b.time_count /* select integer top values */
order by
b.time_count, a.time_count
;
quit;
Now compute ALL your stats for ALL your variables for ALL your groups in one go. No macro, no muss, no fuss.
proc univariate data=stack noprint;
by time_count_group;
var x1 x2 x3;
output out=_winsor n=group_size pctlpts=90 10 pctlpre=x1_ x2_ x3_;
run;
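The question's final step, clamping each value to its group's 10th and 90th percentiles, can then be done in a single DATA step. This is a sketch only, assuming the STACK and _WINSOR data sets created above and the percentile names that PCTLPRE= produces (x1_10, x1_90, and so on). The output names winsorized and wx1-wx3 are hypothetical.

```sas
/* Attach each group's cutoffs to its rows, then clamp the values */
data winsorized;
  merge stack _winsor;          /* one-to-many merge: cutoffs repeat per group */
  by time_count_group;
  array x[3]  x1 x2 x3;
  array lo[3] x1_10 x2_10 x3_10;
  array hi[3] x1_90 x2_90 x3_90;
  array w[3]  wx1 wx2 wx3;
  do i = 1 to 3;
    /* skip missing inputs so MAX does not silently return the cutoff */
    if not missing(x[i]) then w[i] = min(max(x[i], lo[i]), hi[i]);
  end;
  drop i;
run;
```

Both inputs are already ordered by time_count_group, so no extra sort should be needed before the merge.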
proc means data = data1 stackODSoutput MIN P10 P25 P50 P75 P90 MAX N NMISS SUM nolabels maxdec=3;
var var1 var2;
output out = output;
run;
The generated report shows all the percentiles and the SUM, but the output data set contains only the basic statistics: N, MIN, MAX, MEAN, and STD.
How can I also output the percentiles and the sum?
For output datasets in proc means, you need to specify which statistics you'd like within the output statement. Think of the proc statement as only controlling the visual output. Try this instead:
proc means data=sashelp.cars;
var horsepower MPG_City MPG_Highway;
output out=output
sum=
mean=
median=
std=
min=
max=
p10=
p25=
p75=
p90=
/ autoname
;
run;
Note that none of the statistics have anything after the =. The autoname option is automatically naming the statistic variables.
To make it easier to read, we can change the format of the output table. The naming convention of all variables is <variable>_<statistic>. Knowing this, we can transpose the table, separate out the variable and statistics from the name, then re-transpose it into a nicer format.
proc transpose data=output out=output_transposed;
var _NUMERIC_;
run;
data _want(index=(variable) );
set output_transposed;
Stat = scan(_NAME_, -1, '_');
Variable = tranwrd(_NAME_, cats('_', Stat), '');
keep Variable Stat COL1;
rename COL1 = Value;
run;
proc transpose data=_want out=want(drop=_NAME_);
by variable;
id stat;
var Value;
run;
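When only a few statistics are needed, an alternative to AUTONAME plus transposing is to name each output variable explicitly in the OUTPUT statement. A minimal sketch follows. The data set name named_stats and the variable names hp_sum, hp_p90, city_p90, and city_median are hypothetical choices for illustration.

```sas
proc means data=sashelp.cars noprint;
  var horsepower mpg_city;
  output out=named_stats(drop=_type_ _freq_)
    sum(horsepower)=hp_sum                    /* one statistic, one variable */
    p90(horsepower mpg_city)=hp_p90 city_p90  /* names are matched by position */
    median(mpg_city)=city_median;
run;

proc print data=named_stats noobs;
run;
```

This gives full control over the names but gets tedious with many variables, which is where AUTONAME is more convenient.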
I am trying to make a box plot using SGPLOT in SAS. I would like to use SGPLOT with the VBOX statement and mark the mean and median on the graph for each box.
Below is the data set I created as an example. Can someone help me with that?
/* Set the graphics environment */
goptions reset=all cback=white border htitle=12pt htext=10pt;
/* Create a sample data set to plot */
data one(drop=i);
do i=1 to 10;
do xvar=1 to 9 by 2;
yvar=ranuni(0)*100;
output;
end;
end;
run;
/* Sort the data by XVAR */
proc sort data=one;
by xvar;
run;
/* Use the UNIVARIATE procedure to determine */
/* the mean and median values */
proc univariate data=one noprint;
var yvar;
by xvar;
output mean=mean median=median out=stat;
run;
/* Merge the mean and median values back */
/* into the original data set by XVAR */
data all;
merge one stat;
by xvar;
run;
Use VBOX for box plot, SCATTER for mean/median.
/*--Compute the Mean and Median by cause of death--*/
proc means data=sashelp.heart;
class deathcause;
var cholesterol;
output out=heart(where=(_type_ > 0) keep=deathcause mean median _type_)
mean = mean
median = median;
run;
/*--Append the summary rows to the raw data--*/
data heart2;
keep deathcause mean median cholesterol;
set sashelp.heart heart;
run;
proc print data=heart2;run;
/*--Box plot with connect and group colors--*/
ods graphics / reset ANTIALIASMAX=5300 width=5in height=3in imagename='Box_Group_Multi_Connect';
title 'Cholesterol by Cause of Death';
proc sgplot data=heart2 noautolegend noborder;
vbox cholesterol / category=deathcause group=deathcause;
scatter x=deathcause y=mean / name='mean' legendlabel='Mean' markerattrs=(color=green);
scatter x=deathcause y=median / name='median' legendlabel='Median' markerattrs=(color=red);
keylegend "mean" "median" / linelength=32 location=inside across=1 position=topright;
xaxis display=(nolabel);
run;
EDIT: Within SGPLOT and the VBOX statement, you can also plot the median as the line, and the mean as a point on the box plot, without any other manual calculations ahead of time. This is available as of SAS 9.4 M5+.
ods graphics / reset ANTIALIASMAX=5300 width=5in height=3in imagename='Box_Group';
title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
displaystats=(median mean)
meanattrs=(color=red)
medianattrs=(color=green);
*xaxis display=(nolabel);
run;
I am trying to create a histogram with PROC UNIVARIATE. The goal is to show the distribution with bins of width 0.1 from 0 to 1.5 and all remaining values in one bin.
The following code handles the range from 0 to 1.5, but it cannot handle the rest. How can I correct it?
proc univariate data=HAVE;
where pred between 0 and 1.5;
var pred;
histogram pred/ vscale=percent midpoints=0 to 2 by 0.1 normal (noprint);
run;
You can try something like the following code to combine two Histograms by creating two variables from one variable:
/*Temporary DS with values ranging from 0.1 to 2.0*/
data have;
do i=0.1 to 2.0 by 0.1;
output;
end;
rename i=pred;
run;
/*Creating two variables x(0.1-1.5) and y(1.6-2.0)*/
data have;
set have;
if pred<1.6 then x=pred;
else y=pred;
drop pred;
run;
/*Combine two Histograms*/
proc sgplot data=have;
histogram x / nbins=15 binwidth=0.1;
density x / type=normal;
histogram y / nbins=5 binwidth=1.0;
density y / type=normal;
keylegend / location=inside position=topright noborder across=2;
xaxis display=(nolabel) values=(0.1 to 2.5 by 0.1);
run;
1. Create your own groups.
2. Create a format so it shows the way you'd like.
3. Plot it with SGPLOT.
*create your own groups for data, especially the last group;
data mileage;
set sashelp.cars;
mpg_group=ceil(mpg_highway / 10); *1-10 gives 1, 11-20 gives 2, and so on;
if mpg_group > 5 then
mpg_group=5; *everything above 40 goes into the last group;
keep mpg_highway mpg_group;
run;
*format to control display;
proc format;
value mpg_fmt 1='0 to 10' 2='11 to 20' 3='21 to 30' 4='31 to 40' 5='Over 40';
run;
run;
*plot the data;
proc sgplot data=mileage;
vbar mpg_group /stat=freq barwidth=1;
format mpg_group mpg_fmt.;
run;
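The same three steps can be applied to the 0.1-wide bins and single overflow bin from the question. This is a rough sketch, not a drop-in solution: the simulated HAVE data, the names binned, pred_bin, and bin_fmt, and the choice to label each bin by its upper edge are all assumptions made for illustration.

```sas
/* Hypothetical stand-in for the asker's HAVE: pred between 0 and 2.5 */
data have;
  call streaminit(42);
  do i = 1 to 500;
    pred = rand('uniform') * 2.5;
    output;
  end;
  drop i;
run;

/* Step 1: 0.1-wide bins up to 1.5, then one overflow bin */
data binned;
  set have(where=(not missing(pred)));
  if pred > 1.5 then pred_bin = 1.6;             /* overflow bin */
  else pred_bin = max(ceil(pred * 10), 1) / 10;  /* upper edge of the bin */
run;

/* Step 2: label the overflow bin, show other bins as plain numbers */
proc format;
  value bin_fmt 1.6 = '>1.5'
                other = [3.1];
run;

/* Step 3: bar chart of percentages per bin */
proc sgplot data=binned;
  vbar pred_bin / stat=percent barwidth=1;
  format pred_bin bin_fmt.;
  xaxis label='pred (upper bin edge)';
run;
```

Unlike a true histogram, a bar chart omits bins that contain no observations, so sparse data may show gaps in the axis.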
I want to output extended PROC MEANS statistics for my data. The default output is N, Min, Max, Std, and Mean, but I also need the Median.
I have a lot of variables, so I do not want to specify each one individually in the OUTPUT statement, like median(var1)=var1_median, etc.
The following does not work and just gives me the standard outputs:
proc means data=have n mean median std;
output out= want_means (drop=_type_ _freq_);
run;
This one also doesn't work:
proc means data=have n mean median std;
var volume price [xyz variables];
output out= want_means (drop=_type_ _freq_);
run;
I now use the following, which works for me (note that I have to transpose it to get the statistics as observations rather than as variables):
proc means data=have;
output out= want(drop=_type_ _freq_)
n= mean= median= std= /autoname ;
run;
proc transpose data=want
out=want; run;