I need help printing multiple confidence intervals in sas - sas

I am being asked to provide summary statistics including corresponding confidence interval (CI) with its width for the population mean. I need to print 85% 90% and 99%. I know I can either use univariate or proc means to return 1 interval of your choice but how do you print all 3 in a table? Also could someone explain the difference between univariate, proc means and proc sql and when they are used?
This is what I did and it only printed 85% confidence.
proc means data = mydata n mean clm alpha = 0.01 alpha =0.1 alpha = 0.15;
var variable;
RUN;

To put all three values in one table you can execute your step three times and put the results in one table by using an append step.
For shorter code and easier usage you can define a macro for this purpose.
%macro clm_val(TAB=, VARIABLE=, CONF=);
proc means
data = &TAB. n mean clm
alpha = &CONF.;
ods output summary=result;
var &VARIABLE.;
run;
data result;
length conf $8;
format conf_interval percentn8.0;
conf="&CONF.";
conf_interval=1-&CONF.;
set result;
run;
proc append data = result
base = all_results;
quit;
%mend;
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.01);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.1);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.15);
The resulting table looks like this:

Related

Vertical column summation in sas

I have the following piece of result, which i need to add. Seems like a simple request, but i have spent a few days already trying to find the solution to this problem.
Data have:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Data want:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Total 400 700
I want individually placed vertical sum results of each column under the respective column please.
Can someone help me arrive at the solution for this request, please?
To do this in data step code, you would do something like:
data want;
set have end=end; * Var 'end' will be true when we get to the end of 'have'.;
jan_sum + jan_total; * These 'sum statements' accumulate the totals from each observation.;
feb_sum + feb_total;
output; * Output each of the original obbservations.;
if end then do; * When we reach the end of the input...;
measure = 'Total'; * ...update the value in Measure...;
jan_total = jan_sum; * ...move the accumulated totals to the original vars...;
feb_total = feb_sum;
output; * ...and output them in an additional observation.
end;
drop jan_sum feb_sum; * Get rid of the accumulator variables (this statement can go anywhere in the step).;
run;
You could do this many other ways. Assuming that you actually have columns for all the months, you might re-write the data step code to use arrays, or you might use PROC SUMMARY or PROC SQL to calculate the totals and add the resulting totals back using a much shorter data step, etc.
proc means noprint
data = have;
output out= want
class measure;
var Jan_total Feb_total;
run;
It depends on if this is for display or for a data set. It usually makes no sense to have a total in the data set and it's just used for reporting.
PROC PRINT has a SUM statement that will add the totals to the end of a report. PROC TABULATE also provides another mechanism for reporting like this.
example from here.
options obs=10 nobyline;
proc sort data=exprev;
by sale_type;
run;
proc print data=exprev noobs label sumlabel
n='Number of observations for the order type: '
'Number of observations for the data set: ';
var country order_date quantity price;
label sale_type='Sale Type'
price='Total Retail Price* in USD'
country='Country' order_date='Date' quantity='Quantity';
sum price quantity;
by sale_type;
format price dollar7.2;
title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
options byline;
Results:

PROC FREQ on multiple variables combined into one table

I have the following problem. I need to run PROC FREQ on multiple variables, but I want the output to all be on the same table. Currently, a PROC FREQ statement with something like TABLES ERstatus Age Race, InsuranceStatus; will calculate frequencies for each variable and print them all on separate tables. I just want the data on ONE table.
Any help would be appreciated. Thanks!
P.S. I tried using PROC TABULATE, but it didn't not calculate N correctly, so I'm not sure what I did wrong. Here is my code for PROC TABULATE. My variables are all categorical, so I just need to know N and percentages.
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
The above code does not return the correct frequencies based on InsuranceStatus where 0 = insured and 1 = uninsured, but PROC FREQ does. Also doesn't calculate correctly with ROWPCTN. So any way that I can get PROC FREQ to calculate multiple variables on one table, or PROC TABULATE to return the correct frequencies, would be appreciated.
Here is a nice image of my output in a simplified analysis of only ERstatus and InsuranceStatus. You can see that PROC FREQ returns 204 people with an ERstatus of 1 and InsuranceStatus of 1. That's correct. The values in PROC TABULATE are not.
OUTPUT
I'll answer this separately as this is answering the other possible interpretation of the question; when it's clarified I'll delete one or the other.
If you want this in a single printed table, then you either need to use proc tabulate or you need to normalize your data - meaning put it in the form of variable | value. PROC FREQ is not capable of doing multiple one-way frequencies in a single table.
For PROC TABULATE, likely your issue is missing data. Any variable that is on the class statement will be checked for missingness, and if any rows are missing data for any of the class variables, those rows are entirely excluded from the tabulation for all variables.
You can override this by adding the missing option on the class statement, or in the table statement, or in the proc tabulate statement. So:
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus/missing;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
This will result in a slightly different appearance than on your table, though, as it will include the missing rows in places you probably do not want them, and they'll be factored against the colpctn when again you probably don't want them.
Typically some manipulation is then necessary; the easiest is to normalize your data and then run a tabulation (using PROC TABULATE or PROC FREQ, whichever is more appropriate; TABULATE has better percentaging options though) against that normalized dataset.
Let's say we have this:
data class;
set sashelp.class;
if _n_=5 then call missing(age);
if _n_=3 then call missing(sex);
run;
And we want these two tables in one table.
proc freq data=class;
tables age sex;
run;
If we do this:
proc tabulate data=class;
class age sex;
tables (age sex),(N colpctn);
run;
Then we get an N=17 total for both subtables - that's not what we want, we want N=18. Then we can do:
proc tabulate data=class;
class age sex/missing;
tables (age sex),(N colpctn);
run;
But that's not quite right either; I want F to have 8/18 = 44.44% and M 10/18 = 55.55%, not 42% and 53% with 5% allocated to the missing row.
The way I do this is to normalize the data. This means you get a dataset with 2 variables, varname and val, or whatever makes sense for your data, plus whatever identifier/demographic/whatnot variables you might have. val has to be character unless all of your values are numeric.
So for example here I normalize class with age and sex variables. I don't keep any identifiers, but you certainly could in your data, I imagine InsuranceStatus would be kept there if I understand what you're doing in that table. Once I have the normalized table, I just use those two variables, and carefully construct a denominator definition in proc tabulate to have the right basis for my pctn value. It's not quite the same as the single table before - the variable name is in its own column, not on top of the list of values - but honestly that looks better in my opinion.
data class_norm;
set class;
length val $2;
varname='age';
val=put(age,2. -l);
if not missing(age) then output;
varname='sex';
val=sex;
if not missing(sex) then output;
keep varname val;
run;
proc tabulate data=class_norm;
class varname val;
tables varname=' '*val=' ',n pctn<val>;
run;
If you want something better than this, you'll probably have to construct it in proc report. That gives you the most flexibility, but is the most onerous to program in also.
You can use ODS OUTPUT to get all of the PROC FREQ output to one dataset.
ods output onewayfreqs=class_freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close;
or
ods output crosstabfreqs=class_tabs;
proc freq data=sashelp.class;
tables sex*(height weight);
run;
ods output close;
Crosstabfreqs is the name of the cross-tab output, while one-way frequencies are onewayfreqs. You can use ods trace to find out the name if you forget it.
You may (probably will) still need to manipulate this dataset some to get the structure you want ultimately.

how to calculate percentile in SAS

I want to calculate the 95th percentile of a distribution. I think I cannot use proc means because I need the value, while the output of proc means is a table. I have to use the percentile to filter the dataset and create another dataset with only the observations greater than the percentile.
Clearly I don't want to use the numeric value..because I want to use it in a macro.
Don't put summary statistics into macro variables. You risk loss of precision.
This is based on your cryptic description of the problem.
proc means...
output out=pct95 pct95=
run;
data subset;
if _n_ eq 1 then set pct95;
set data;
if value < pct95;
run;
You can suppress proc means from outputting your results in a new tab using the noprint option. Try this:
proc means data = your_data noprint;
var variable_name;
output out = your_data2 p95= / autoname;
run;

Using Tabulate for 3-way table

I am trying to output a three way frequency table. I am able to do this (roughly) with proc freq, but would like the control for variable to be joined. I thought proc tabulate would be a good way to customize the output. Basically I want to fill in the cells with frequency, and then customize the percents at a later time. So, have count and column percent in each cell. Is that doable with proc tabulate?
Right now I have:
proc freq data=have;
table group*age*level / norow nopercent;
run;
that gives me e.g.:
What I want:
Here is the code I am using:
proc tabulate data=ex1;
class age level group;
var age;
table age='Age Category',
mean=' '*group=''*level=''*F=10./ RTS=13.;
run;
Thanks!
You can certainly get close to that. You can't really get in 'one' cell, it needs to write each thing out to a different cell, but theoretically with some complex formatting (probably using CSS) you could remove the borders.
You can't use VAR and CLASS together, but since you're just doing percents, you don't need to use MEAN - you should just use N and COLPCTN. If you're dealing with already summarized data, you may need to do this differently - if so then post an example of your dataset (but that wouldn't work in PROC FREQ either without a FREQ statement).
data have;
do _t = 1 to 100;
age = ceil(3*rand('Uniform'));
group = floor(2*rand('Uniform'));
level = floor(5*rand('Uniform'));
output;
end;
drop _t;
run;
proc tabulate data=have;
class age level group;
table age='Age Category',
group=''*level=''*(n='n' colpctn='p')*F=10./ RTS=13.;
run;
This puts N and P (n and column %) in separate adjacent cells inside a single level.

several regressions on a single dataset in SAS

I have a dataset of the following format:
a table of M rows and 2K columns.
My columns are pairs of variables: X_i, Y_i and the rows are observations.
I would like to perform many linear regressions: one for each pair of columns (Y_i ~ X_i)
and obtain the results.
I know how to access specific columns using arrays, like so:
data Xs_Ys_data (drop=i);
array Xs[60] X1-X60;
array Ys[60] Y1-Y60;
I also know how to fit a single linear regression model, like so:
proc reg data=some_data;
model y = x;
output out=out_lin_reg;
run;
And I am familiar with the concept of loops:
do i=1 to 60;
Xs[i] .......;
end;
How do I combine these three to get what I need?
Thanks!
P.S - I asked a similar question on a different format here:
SAS reading a file in long format
Update:
I have managed to create the regressions using a macro like so:
%macro mylogit();
%do i = 1 %to 60;
proc reg data=Xs_Ys_data;
model Y&i = X&i;
run;
%end;
%mend;
%mylogit()
Now I am not sure how to export the results into a single table...
You have this in your macro:
proc reg data=Xs_Ys_data;
model Y&i = X&i;
run;
So instead create:
data x_y_Data;
set xs_yx_data;
array xs x1-x60;
array yx y1-y60;
do iter = 1 to dim(xs);
x=xs[iter];
y=ys[iter];
output;
end;
run;
proc reg data=X_Y_data;
by iter;
model Y = X;
run;
And then add an output statement however you normally would to get your resulting dataset. Now you get 1 output table with all 60 iterations (still 60 printed outputs), and if you want to create one printed output you can construct that from the output dataset.