SAS MEANS/SUMMARY procedure output

Since I need to standardize my dataset (subtract the mean and divide by the standard deviation), I need the mean and standard deviation of price and volume for each stock on each date. My dataset contains multiple stocks and dates, as shown in the following picture.
Hence, I used the following code to output the mean and standard deviation.
proc summary data=HAVE nway;
class _ric date;
var price volume;
output out=WANT(drop=_:) mean= std=/autoname;
run;
However, my output table only has the date, mean, and standard deviation. I don't know why the RIC isn't included in the output table. How can I solve this problem? Thanks

The dataset doesn't have the variable _RIC because you told it to drop all variables whose name starts with _. Just be more specific in the variables you drop.
out=WANT(drop=_type_ _freq_)
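Putting the fix into the full step, a minimal sketch might look like the following. The small HAVE dataset is made up for illustration, and the final PROC SQL step (with the hypothetical names STANDARDIZED, price_z and volume_z) is only one possible way to carry out the standardization the question describes; it is not part of the original answer.

```sas
/* Hypothetical sample data standing in for HAVE */
data HAVE;
  input _ric $ date :date9. price volume;
  format date date9.;
  datalines;
AAA 01JAN2020 10 100
AAA 01JAN2020 12 140
AAA 02JAN2020 11 120
AAA 02JAN2020 13 160
BBB 01JAN2020 20 300
BBB 01JAN2020 24 340
;
run;

/* Drop only the automatic variables by name, so _ric is kept */
proc summary data=HAVE nway;
  class _ric date;
  var price volume;
  output out=WANT(drop=_type_ _freq_) mean= std= / autoname;
run;

/* Assumed follow-up: join the statistics back and standardize.
   AUTONAME produces price_Mean, price_StdDev, volume_Mean, volume_StdDev. */
proc sql;
  create table STANDARDIZED as
  select h.*,
         (h.price  - w.price_Mean)  / w.price_StdDev  as price_z,
         (h.volume - w.volume_Mean) / w.volume_StdDev as volume_z
  from HAVE as h
       inner join WANT as w
       on h._ric = w._ric and h.date = w.date;
quit;
```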

Related

SAS - Totaling values in a column by a different column's value

The title may be a little ambiguous. Essentially, using the SASHELP.SHOES dataset, I'm trying to summarize the data in a new table by totaling the Sales and Returns for each region. For instance, instead of having 56 rows for shoes sold in Africa with their individual sales/returns values, I would have one row for Africa with columns TotalSales and TotalReturns. I need to do this for each region in the original dataset.
I'm not familiar with SAS at all; this is more or less the first thing I've really had to program in it. I've tried a few variations of data steps with IN or WHERE conditions, proc means steps with SUM() statements, and DO/DO WHILE loops, but I've missed something each time.
In Proc MEANS
Use a CLASS statement to specify which variable(s) are to be used to group the data. In your case REGION.
Use the VAR statement to specify which variable(s) are to have statistics calculated for within each grouping.
Default output
This corresponds to the minimal syntax:
ods listing;
proc means noprint data=SASHELP.SHOES;
class region;
var sales returns;
output out=shoes_stats;
run;
This creates data set WORK.SHOES_STATS with one row per default statistic (N, MIN, MAX, MEAN, STD) per region, plus a set of rows for the overall total (_TYPE_=0).
Other output structure
Use procedure option NWAY to only get summarizations for combinations of all the CLASS variables. (In your case this corresponds to rows with _TYPE_=1)
The output columns can have the statistic name automatically concatenated to the variable name using the OUTPUT statement option / autoname.
Use data set options to control variables that are kept or dropped.
proc means nway noprint data=SASHELP.SHOES;
class region;
var sales returns;
output out=shoes_sums(drop=_type_ _freq_) sum= / autoname;
run;
dm 'vt shoes_sums; column names' viewtable;
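If the columns should literally be named TotalSales and TotalReturns, as in the question, one possible variation is to list explicit names after SUM= instead of using AUTONAME. This is a sketch rather than part of the original answer; the output dataset name shoes_totals is made up.

```sas
proc means nway noprint data=SASHELP.SHOES;
  class region;
  var sales returns;
  /* Names after SUM= are matched to the VAR variables in order */
  output out=shoes_totals(drop=_type_ _freq_) sum=TotalSales TotalReturns;
run;

/* Show the result: one row per region */
proc print data=shoes_totals noobs;
run;
```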

I need to find the confidence intervals for proportions using stratified data

I'm trying to report estimates of the proportions of subjects in a stratified random sample.
I've tried every website I can find on SAS proc surveymeans, and I don't understand what I'm doing wrong.
data b;
set Data;
keep id texting section;
run;
proc surveyselect data=b out=samp_b method=srs n=(15,12,10,8)
seed=123;
strata section;
run;
proc surveymeans data=samp_b;
strata section;
weight SamplingWeight;
var texting;
run;
I should get confidence intervals for the strata, but they are not showing up. Also I need confidence intervals for the proportions!
I don't know what version of SAS/STAT you are using, but per SAS/STAT 9.2 Proc Surveymeans documentation pages, you can do one or both of the following:
1) Add the relevant statistics keywords to the proc surveymeans statement
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm
In the PROC SURVEYMEANS statement, you also can use statistic-keywords to specify statistics for the procedure to compute. Available statistics include the population mean and population total, together with their variance estimates and confidence limits. You can also request data set summary information and sample design information.
The available statistics keywords are listed and described on these pages:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_a0000000238.htm
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm#statug.surveymeans.smeanskeys
So, to print the 95% two-sided confidence interval for the mean, you would add CLM to the end of your Proc Surveymeans statement.
2) Save the Statistics table with confidence intervals to a separate SAS dataset with an additional ods output Statistics=MyStat; statement, per these instructions.
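Both suggestions can be combined in one run. The sketch below is self-contained only because it fabricates a small population (the data step for b is invented, with texting coded 0/1), so treat the numbers as meaningless. The CLASS statement is an addition not mentioned above: it asks PROC SURVEYMEANS to treat texting as categorical, so that MEAN and CLM are reported as proportions per level.

```sas
/* Hypothetical population: 4 sections of 40 students each */
data b;
  call streaminit(123);
  do section = 1 to 4;
    do i = 1 to 40;
      id = (section - 1) * 40 + i;
      texting = rand('bernoulli', 0.3 + 0.1 * section);
      output;
    end;
  end;
  drop i;
run;

/* Stratified simple random sample, as in the question */
proc surveyselect data=b out=samp_b method=srs n=(15,12,10,8) seed=123;
  strata section;
run;

/* 2) Save the Statistics table to a dataset (MyStat) */
ods output Statistics=MyStat;

/* 1) Request the estimate and its confidence limits with MEAN and CLM */
proc surveymeans data=samp_b mean clm;
  strata section;
  weight SamplingWeight;
  class texting;   /* assumption: texting is categorical, so proportions are estimated */
  var texting;
run;
```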

confidence interval of the standard deviation with proc sql

My data set is really simple: just one column with a ratio and another column with a categorical variable. I need to calculate the standard deviation for each class as well as its confidence interval.
Is there a built-in function in SAS (proc SQL) to calculate the confidence interval of the standard deviation,
something like what the Excel function confidence() does?
Thanks!
Not PROC SQL, but PROC UNIVARIATE will give you confidence intervals for the mean, standard deviation, and variance. The details are available in the SAS support documentation:
https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect064.htm
The following statements produce confidence limits for the mean, standard deviation, and variance of the population of heights:
ods select BasicIntervals;
proc univariate data=Heights cibasic;
var Height;
run;
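Since the question asks for the interval per class, a CLASS statement can be added. The sketch below uses SASHELP.CLASS as a stand-in for the asker's data (sex plays the role of the categorical variable, height the ratio), and the output dataset name ci_by_sex is made up.

```sas
/* Print only the confidence-limit table and also save it to a dataset */
ods select BasicIntervals;
ods output BasicIntervals=work.ci_by_sex;

proc univariate data=sashelp.class cibasic(alpha=0.05);
  class sex;      /* one set of intervals per level of sex */
  var height;
run;
```

The limits assume normally distributed data, so they are worth a sanity check before being reported.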

How to format the proc means results?

proc means data=tableA mean;
var interestRate;
run;
The variable interestRate already has a percentage format in table A, but proc means doesn't return the results as percentages.
How can I get the mean interest rate in percentage format when running proc means?
I don't believe there's any way to format the output directly. You can use a format statement to change how the data would be formatted in an output dataset, and then use PROC REPORT or similar to produce output similar to PROC MEANS output. You might also consider PROC TABULATE if it's a fairly simple table.
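A rough sketch of both suggestions follows. The tiny tableA dataset and the names rate_mean and meanRate are invented for illustration, and PERCENT8.2 is just one plausible choice of format.

```sas
/* Hypothetical data: rates stored as fractions with a percent format */
data tableA;
  input interestRate;
  format interestRate percent8.2;
  datalines;
0.035
0.042
0.051
;
run;

/* Route the mean to an output dataset instead of the printed table */
proc means data=tableA noprint;
  var interestRate;
  output out=rate_mean(drop=_type_ _freq_) mean=meanRate;
run;

/* Display it with an explicit percent format */
proc report data=rate_mean;
  column meanRate;
  define meanRate / display format=percent8.2 'Mean interest rate';
run;

/* Alternative: PROC TABULATE computes and formats in one step */
proc tabulate data=tableA;
  var interestRate;
  table interestRate*mean*f=percent8.2;
run;
```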

SAS Proc means: How to capture non default statistics in output dataset such as nmiss p1 p99 etc?

Original Question:
By default, Proc Means outputs N, MIN, MEAN, MAX and STD in the output dataset. How do I add NMISS, P1, P5, etc. to this list?
Additional info 1:
I want statistics on all numeric variables in my dataset. So I use _numeric_ in the var specification.
I want each statistic to be in a row, with the variables as columns.
Obs  _TYPE_  _FREQ_  _STAT_      var1  var2  var3  etc
  1       0   84829  N       84826.00
  2       0   84829  MIN         0.00
  3       0   84829  MAX      5000.00
  4       0   84829  MEAN      151.22
  5       0   84829  STD      1989.47
  6       0   84829  NMISS       3
  7       0   84829  P1          2.00
  8       0   84839  P99      4999.00
How do I do this?
Thanks!
Assuming you are using the output option in proc means (and not ODS OUTPUT), you can control what comes in that dataset like so:
proc means data=sashelp.class;
var age;
class sex;
output out=mymeans nmiss= P1= P5= /autoname;
run;
The full list of statistic names is available in the PROC MEANS documentation under "statistics keyword".
You can also achieve the same result (with a slightly different output format) with ODS OUTPUT.
ods output summary=mymeans;
ods trace on;
proc means data=sashelp.class nmiss p1 p5;
var age;
class sex;
run;
ods trace off;
ods output close;
ODS TRACE ON/OFF shows the name of the table created (i.e., 'summary'); it's not needed in production. In this case you request statistics the same way you would request them for the output window (in the PROC MEANS statement).
Based on your edits, you want it transposed (one row per statistic). You can't get that directly, but the transposition isn't very hard.
proc means data=sashelp.class nmiss p1 p5;
class sex;
var _numeric_;
output out=mymeans n= mean= nmiss= p1= p5= /autoname ;
run;
data mymeans_out;
set mymeans(drop=_type_ _freq_);
by sex;
array numvars _numeric_;
length var stat $32;
do _t = 1 to dim(numvars);
var=scan(vname(numvars[_t]),1,'_');
stat=scan(vname(numvars[_t]),-1,'_');
value = numvars[_t];
output;
end;
keep sex var stat value;
run;
This has a few limitations. If your variable names have underscores in them already, the var=scan... line will need to be rewritten to use substr and find the last underscore, then var = substr(vname(...),1,position_of_last_underscore). Stat should be fine since it uses -1 (reverse direction). If your variable names might exceed ~23 characters, you may not get the exact variable name back out again as it may be truncated or modified. If that's the case, then the ODS OUTPUT solution from above will help you (as it provides in an additional column the name of the original variable), but some more work would be needed to relate that value to the truncated name.
I also drop _TYPE_ and _FREQ_, to simplify the array definition; if you need those, then you'd need to write a bit of code to exclude them from separately being output, and keep them.
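For the underscore limitation mentioned above, one way to find the last underscore is FINDC with the 'b' (backward) modifier. This is a standalone sketch with made-up names (split_demo, name, pos), not a drop-in replacement for the data step above.

```sas
data split_demo;
  length name var stat $32;
  /* Example AUTONAME-style names, one containing an extra underscore */
  do name = 'age_NMiss', 'body_weight_P1', 'height_Mean';
    pos  = findc(name, '_', 'b');     /* position of the last underscore */
    var  = substr(name, 1, pos - 1);  /* everything before it: the variable */
    stat = substr(name, pos + 1);     /* everything after it: the statistic */
    output;
  end;
run;

proc print data=split_demo noobs;
run;
```

In the transposing data step, the same two SUBSTR calls would be applied to vname(numvars[_t]) in place of the SCAN calls.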
This paper has an excellent discussion of the exact issue you describe, along with macro code to output a dataset fitting your description.
A Better Means — The ODS Data Trap
Update:
I've discovered that there is a more recent paper that "presents a revised version of the macro supporting additional features and eliminating a surprising error." This is the updated solution:
Solve the SAS® ODS Data Trap in PROC MEANS
The macro appears well designed and avoids a wide variety of possible issues. The contortions used to create the output dataset involve calls to proc means (of course), proc sql, proc contents, and proc datasets and extensive use of the macro language architecture, and a description of them would probably not be instructive in this answer. I don't claim to understand it entirely myself.
However, once you have compiled the macro you should be able to create your desired dataset with one simple statement.
%better_means(data=MyDataSet)
Now that I've found this convenient solution I may start to use it myself.