I am trying to report my proc means output with 10 decimal places by specifying maxdec=10. However, SAS does not report more than 7 decimal places.
Here is the warning I get:
WARNING: NDec value is inappropriate, BEST format will be used.
I appreciate any suggestion.
If you look at the documentation, it states that MEANS will print out 0-8 decimal places based on the value of MAXDEC. If you want more, you will need to save the results and print them yourself.
Try this:
data test;
format x 12.11;
do i=1 to 1000;
x = rannor(0);
output;
end;
drop i;
run;
proc means data=test noprint;
var x;
output out=means_out mean=mean std=std;
run;
proc print data=means_out noobs;
var mean std;
format mean std 12.11;
run;
As already mentioned, maxdec= works for limiting the number of decimal places below 8. Proc means isn't going to let you do too much to change the format of the summary statistics. I'd suggest using proc tabulate:
If your proc means looks like:
proc means data=yourdata;
var yourvariable;
run;
Then use something like:
proc tabulate data=yourdata;
var yourvariable;
table yourvariable*
(n
mean*format=15.10
stddev*format=15.10
min*format=15.10
max*format=15.10);
run;
I'm trying to export quartile information on a grouped dataset as a dataset in SAS, but when I run this code the printed output is a table with the correct information, while the dataset WORK.TOP_10_PERC contains only summary statistics of the set (no quartiles). Does anyone know how I can export this as the CLASS (PDX) and its 25th and 75th percentiles? Thanks!
PROC MEANS DATA=WORK.TOP_10_DX P25 P75;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC;
RUN;
I like the STACKODS option, which produces an output data set laid out like the default printed output.
proc means data=sashelp.class n p25 p75 stackods;
ods output summary=summary;
run;
proc print;
run;
You can use the OUTPUT statement with <statistic>= options.
PROC MEANS DATA=WORK.TOP_10_DX NOPRINT;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC P25=P25 P75=P75;
RUN;
Compared to ODS OUTPUT, the OUTPUT statement is much faster, but it is less flexible when there are multiple analysis variables or a BY statement.
I often work with a large number of variables that have zero or empty values only, but I could not find a SAS command to drop these unwanted variables. I know we can use SAS/IML, but I encountered such cases many times and would like to have a macro that may help me without having to type the variable names to avoid errors. Here is my code for removing variables with zero values only. It works to produce a cleaned output data set y from a raw data set x without using the names of the variables. I hope others could have a better solution or help me to make mine better.
%Macro dropZeroV(x, y) ;
proc means data = &x. ;
var _numeric_;
output out = sumTab ; run;
proc transpose data = sumTab(drop = _TYPE_) out= sumt; var _Numeric_; id _STAT_; run;
%let dropLst =;
proc sql noprint;
select _NAME_ into : dropLst separated by ' '
from sumT
where Max=0 and Min =0;
quit;
data &y.;
set &x.; drop &dropLst.;
run;
proc print data = &y.; run;
%Mend dropZeroV;
Use STACKODS with ODS OUTPUT SUMMARY= to get the table in the format needed in one step rather than multiple steps. This limits the statistics to the sum: as long as no value is negative, a sum of 0 means all values are 0. You may also want to look at rounding to avoid any issues with numeric precision.
The PROC MEANS and PROC TRANSPOSE steps become:
ods select none;
proc means data= &x. stackods sum;
var _numeric_;
ods output summary = sumT;
run;
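The rest of the macro could then be adapted roughly as follows. This is only a sketch: it assumes the sumT table produced above has the columns Variable and Sum, that it runs inside the %dropZeroV macro (so &x. and &y. are defined and %if is allowed), and that no analysis variable holds negative values that could cancel out to a zero sum.

```sas
/* Sketch only: assumes sumT has columns Variable and Sum (STACKODS layout) */
ods select all;                  /* restore printed output after "ods select none" */

%let dropLst = ;                 /* stays empty if no variable qualifies */
proc sql noprint;
  select Variable into :dropLst separated by ' '
  from sumT
  where Sum = 0;                 /* all-zero variables, given non-negative data */
quit;

data &y.;
  set &x.;
  %if %length(&dropLst.) %then %do;
    drop &dropLst.;              /* emitted only when there is something to drop */
  %end;
run;
```

Variables that are entirely missing have a missing sum rather than 0, so they are not caught by this condition.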
I have the following problem. I need to run PROC FREQ on multiple variables, but I want the output to all be on the same table. Currently, a PROC FREQ statement with something like TABLES ERstatus Age Race, InsuranceStatus; will calculate frequencies for each variable and print them all on separate tables. I just want the data on ONE table.
Any help would be appreciated. Thanks!
P.S. I tried using PROC TABULATE, but it did not calculate N correctly, so I'm not sure what I did wrong. Here is my code for PROC TABULATE. My variables are all categorical, so I just need to know N and percentages.
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
The above code does not return the correct frequencies based on InsuranceStatus, where 0 = insured and 1 = uninsured, but PROC FREQ does. It also doesn't calculate correctly with ROWPCTN. So any way that I can get PROC FREQ to calculate multiple variables on one table, or PROC TABULATE to return the correct frequencies, would be appreciated.
Here is a nice image of my output in a simplified analysis of only ERstatus and InsuranceStatus. You can see that PROC FREQ returns 204 people with an ERstatus of 1 and InsuranceStatus of 1. That's correct. The values in PROC TABULATE are not.
[Image: OUTPUT]
I'll answer this separately as this is answering the other possible interpretation of the question; when it's clarified I'll delete one or the other.
If you want this in a single printed table, then you either need to use proc tabulate or you need to normalize your data - meaning put it in the form of variable | value. PROC FREQ is not capable of doing multiple one-way frequencies in a single table.
For PROC TABULATE, likely your issue is missing data. Any variable that is on the class statement will be checked for missingness, and if any rows are missing data for any of the class variables, those rows are entirely excluded from the tabulation for all variables.
You can override this by adding the missing option on the class statement, or in the table statement, or in the proc tabulate statement. So:
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus/missing;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
This will result in a slightly different appearance than on your table, though, as it will include the missing rows in places you probably do not want them, and they'll be factored against the colpctn when again you probably don't want them.
Typically some manipulation is then necessary; the easiest is to normalize your data and then run a tabulation (using PROC TABULATE or PROC FREQ, whichever is more appropriate; TABULATE has better percentaging options though) against that normalized dataset.
Let's say we have this:
data class;
set sashelp.class;
if _n_=5 then call missing(age);
if _n_=3 then call missing(sex);
run;
And we want these two tables in one table.
proc freq data=class;
tables age sex;
run;
If we do this:
proc tabulate data=class;
class age sex;
tables (age sex),(N colpctn);
run;
Then we get an N=17 total for both subtables - that's not what we want, we want N=18. Then we can do:
proc tabulate data=class;
class age sex/missing;
tables (age sex),(N colpctn);
run;
But that's not quite right either; I want F to have 8/18 = 44.44% and M 10/18 = 55.56%, not 42% and 53% with 5% allocated to the missing row.
The way I do this is to normalize the data. This means you get a dataset with 2 variables, varname and val, or whatever makes sense for your data, plus whatever identifier/demographic/whatnot variables you might have. val has to be character unless all of your values are numeric.
So, for example, here I normalize class with the age and sex variables. I don't keep any identifiers, but you certainly could in your data; I imagine InsuranceStatus would be kept there, if I understand what you're doing in that table. Once I have the normalized table, I just use those two variables and carefully construct a denominator definition in proc tabulate to have the right basis for my pctn value. It's not quite the same as the single table before (the variable name is in its own column, not on top of the list of values), but honestly that looks better in my opinion.
data class_norm;
set class;
length val $2;
varname='age';
val=put(age,2. -l);
if not missing(age) then output;
varname='sex';
val=sex;
if not missing(sex) then output;
keep varname val;
run;
proc tabulate data=class_norm;
class varname val;
tables varname=' '*val=' ',n pctn<val>;
run;
If you want something better than this, you'll probably have to construct it in proc report. That gives you the most flexibility, but is the most onerous to program in also.
You can use ODS OUTPUT to get all of the PROC FREQ output to one dataset.
ods output onewayfreqs=class_freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close;
or
ods output crosstabfreqs=class_tabs;
proc freq data=sashelp.class;
tables sex*(height weight);
run;
ods output close;
Crosstabfreqs is the name of the cross-tab output, while one-way frequencies are onewayfreqs. You can use ods trace to find out the name if you forget it.
You may (probably will) still need to manipulate this dataset some to get the structure you want ultimately.
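To look up a table name such as onewayfreqs or crosstabfreqs, ODS TRACE can be switched on around the procedure. A minimal sketch, reusing the sashelp.class example from above:

```sas
ods trace on;                  /* write each output object's details to the log */
proc freq data=sashelp.class;
  tables age sex;              /* one-way tables appear in the log as Name: OneWayFreqs */
run;
ods trace off;                 /* stop tracing */
```

The Name: value shown in the log is what goes on the left of the equals sign in the ODS OUTPUT statement.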
I have a null dataset such as
data a;
if 0;
run;
Now I wish to use proc report to print this dataset. Of course, there will be nothing in the report, but I want one sentence in the report saying "It is a null dataset". Any ideas?
Thanks.
You can test to see if there are any observations in the dataset first. If there are observations, then use the dataset, otherwise use a dummy dataset that looks like this and print it:
data use_this_if_no_obs;
msg = 'It is a null dataset';
run;
There are plenty of ways to test datasets to see if they contain any observations or not. My personal favorite is the %nobs macro found here: https://stackoverflow.com/a/5665758/214994 (other than my answer, there are several alternate approaches to pick from, or do a google search).
Using this %nobs macro we can then determine the dataset to use in a single line of code:
%let ds = %sysfunc(ifc(%nobs(iDs=sashelp.class) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
Here's some code showing the alternate outcome:
data for_testing_only;
if 0;
run;
%let ds = %sysfunc(ifc(%nobs(iDs=for_testing_only) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
I've used proc print to simplify the example, but you can adapt it to use proc report as necessary.
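If you would rather not depend on the %nobs macro, one of the alternate approaches is to read the observation count with the OPEN and ATTRN functions. This is a rough sketch: the macro variable names dsid, n and rc are only illustrative, and it assumes for_testing_only exists and is a regular SAS dataset (not a view).

```sas
/* Sketch: count observations with OPEN/ATTRN instead of %nobs */
%let dsid = %sysfunc(open(for_testing_only));   /* returns 0 if the open fails */
%let n    = %sysfunc(attrn(&dsid, nlobs));      /* logical obs, deleted rows excluded */
%let rc   = %sysfunc(close(&dsid));             /* always close what you open */

%let ds = %sysfunc(ifc(&n eq 0, use_this_if_no_obs, for_testing_only));
proc print data=&ds;
run;
```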
For the no-data report you don't need to know how many observations are in the data, just that there are none. This example shows how I would approach the problem.
Create example data with zero obs.
data class;
stop;
set sashelp.class;
run;
Check for no obs and add one obs with missing on all vars. Note that no observations are ever read from class in this step.
data class;
if eof then output;
stop;
modify class end=eof;
run;
Make the report.
proc report data=class missing;
column _all_;
define _all_ / display;
define name / order;
compute before name;
retain_name=name;
endcomp;
compute after;
if not missing(retain_name) then l=0;
else l=40;
msg = 'No data for this report';
line msg $varying. l;
endcomp;
run;
I want to calculate the 95th percentile of a distribution. I think I cannot use proc means because I need the value, while the output of proc means is a table. I have to use the percentile to filter the dataset and create another dataset with only the observations greater than the percentile.
Clearly I don't want to hard-code the numeric value, because I want to use it in a macro.
Don't put summary statistics into macro variables. You risk loss of precision.
This is based on your cryptic description of the problem.
proc means...
output out=pct95 p95=pct95;
run;
data subset;
if _n_ eq 1 then set pct95;
set data;
if value > pct95;
run;
You can suppress proc means from outputting your results in a new tab using the noprint option. Try this:
proc means data = your_data noprint;
var variable_name;
output out = your_data2 p95= / autoname;
run;
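The percentile can then be merged onto every row and used as a filter, without going through a macro variable. A sketch, assuming the placeholder names above (your_data, your_data2, variable_name) and that AUTONAME stores the statistic as variable_name_P95; the output name above_p95 is made up for illustration:

```sas
/* Sketch: keep only the observations above the 95th percentile */
data above_p95;
  /* read the single summary row once; its value is retained for all rows */
  if _n_ = 1 then set your_data2(keep=variable_name_P95);
  set your_data;
  if variable_name > variable_name_P95;   /* subsetting IF */
run;
```

Use drop=variable_name_P95 on the output data set if the percentile column should not be kept.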