I'm trying to get SAS to give me all the age_groups when count = 0. I can't figure it out and if I have to copy and paste 2000 lines from one xlsx to another, I will mess up counts somewhere as there are rows missing for age_groups. I tried various ways to make it work, including the use of sparse in the PROC FREQ step. None of them seem to work and I think it's because the age_group is a character value so it's completely missing from the frequency counts.
Here is what currently outputs:
Here's what I need:
Here is the SAS Code I used for this output:
PROC FREQ data=partial ORDER=INTERNAL ;
tables mmwr_case*age_group / out=partial_2021 sparse list nocol nocum norow nopercent;
by mmwr_case;
where case_year = 2021;
title 'Cases, 2021, by MMWR Week';
RUN;
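One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the BY statement makes PROC FREQ build a separate table for each mmwr_case, so within each BY group SPARSE only sees the age_groups that occur in that week. Dropping the BY statement and keeping the two-way request lets SPARSE write zero-count rows to the OUT= data set for every week/age_group combination. This assumes each age_group occurs at least once somewhere in the 2021 data.

```sas
/* Same request as above, minus the BY statement, so that SPARSE   */
/* can see all weeks and all age groups in one crosstabulation.    */
proc freq data=partial order=internal;
  where case_year = 2021;
  tables mmwr_case*age_group / out=partial_2021 sparse list nocum nopercent;
  title 'Cases, 2021, by MMWR Week';
run;
```

If an age_group never occurs at all in 2021, SPARSE cannot invent it. In that case a data set listing every level would be needed, for example through the CLASSDATA= option of PROC MEANS or PROC TABULATE.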
I am being asked to provide summary statistics including corresponding confidence interval (CI) with its width for the population mean. I need to print 85% 90% and 99%. I know I can either use univariate or proc means to return 1 interval of your choice but how do you print all 3 in a table? Also could someone explain the difference between univariate, proc means and proc sql and when they are used?
This is what I did and it only printed 85% confidence.
proc means data = mydata n mean clm alpha = 0.01 alpha =0.1 alpha = 0.15;
var variable;
RUN;
To put all three values in one table you can execute your step three times and put the results in one table by using an append step.
For shorter code and easier usage you can define a macro for this purpose.
%macro clm_val(TAB=, VARIABLE=, CONF=);
proc means
data = &TAB. n mean clm
alpha = &CONF.;
ods output summary=result;
var &VARIABLE.;
run;
data result;
length conf $8;
format conf_interval percentn8.0;
conf="&CONF.";
conf_interval=1-&CONF.;
set result;
run;
proc append data = result
base = all_results;
run;
%mend;
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.01);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.1);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.15);
The resulting table looks like this:
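Since the question also asks for the width of each interval, here is a minimal alternative sketch that avoids rerunning PROC MEANS. It takes n, the mean, and the standard error once, then computes each interval from the t quantile, which is the same formula PROC MEANS uses for CLM. The data set names stats and ci_table and the variable names in them are made up for illustration.

```sas
/* One pass to get n, the mean, and the standard error of the mean */
proc means data=sashelp.class noprint;
  var age;
  output out=stats n=n mean=mean stderr=stderr;
run;

/* One output row per confidence level */
data ci_table;
  set stats;
  do alpha = 0.15, 0.10, 0.01;
    conf_level = 1 - alpha;
    t_crit = tinv(1 - alpha/2, n - 1);   /* two-sided t quantile, n-1 df */
    lclm  = mean - t_crit*stderr;
    uclm  = mean + t_crit*stderr;
    width = uclm - lclm;
    output;
  end;
  format conf_level percent8.0;
  keep alpha conf_level n mean lclm uclm width;
run;

proc print data=ci_table noobs;
run;
```

The lclm and uclm values should agree with what PROC MEANS reports for the same alpha, which is an easy way to check the sketch.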
When I run a proc glimmix in SAS, sometimes it drops observations.
How do I get the set of dropped/excluded observations or maybe the set of included observations so that I can identify the dropped set?
My current PROC GLIMMIX code is as follows:
%LET EST=inputf.aarefestimates;
%LET MODEL_VAR3 = age Male Yearc2010 HOSPST
Hx_CTSURG Cardiogenic_Shock COPD MCANCER DIABETES;
data work.refmodel;
set inputf.readmref;
Yearc2010 = YEAR - 2010;
run;
PROC GLIMMIX DATA = work.refmodel NOCLPRINT MAXLMMUPDATE=100;
CLASS hospid HOSPST(ref="xx");
ODS OUTPUT PARAMETERESTIMATES = &est (KEEP=EFFECT ESTIMATE STDERR);
MODEL RADM30 = &MODEL_VAR3 /Dist=b LINK=LOGIT SOLUTION;
XBETA=_XBETA_;
LINP=_LINP_;
RANDOM INTERCEPT/SUBJECT= hospid SOLUTION;
OUTPUT OUT = inputf.aar
PRED(BLUP ILINK)=PREDPROB PRED(NOBLUP ILINK)=EXPPROB;
ID XBETA LINP hospst hospid Visitlink Key RADM30;
NLOPTIONS TECH=NRRIDG;
run;
Thank you in advance!
It drops records that have a missing value in any variable the model uses, that is, any variable named in the CLASS, MODEL, or RANDOM statements. So you can check for missing values among those variables to see which records are affected. Usually the output data set will also indicate this by not having predictions for the records that are not used.
You can run the code below.
*create fake data;
data heart; set sashelp.heart; run;
*Logistic Regression model, ageCHDdiag is missing ;
proc logistic data=heart;
class sex / param=ref;
model status(event='Dead') = ageCHDdiag height weight diastolic;
*generate output data;
output out=want p=pred;
run;
*explicitly flag records as included;
data included;
set want;
if missing(pred) then include='N'; else include='Y';
run;
*check that Y equals total obs included above;
proc freq data=included;
table include;
run;
The output will show:
The LOGISTIC Procedure
Model Information
Data Set WORK.HEART
Response Variable Status
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 5209
Number of Observations Used 1446
And then the PROC FREQ will show:
The FREQ Procedure
                                  Cumulative    Cumulative
include    Frequency    Percent    Frequency       Percent
----------------------------------------------------------
N               3763      72.24         3763         72.24
Y               1446      27.76         5209        100.00
And 1,446 records are included in both of the data sets.
I think I answered my question.
The code line -
OUTPUT OUT = inputf.aar
gives the output of the model. This table includes all the observations used in the proc statement. So I can match the data in this table to my input table and find the observations that get dropped.
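A rough sketch of that matching step, assuming Key uniquely identifies each record in the input (the post does not say so, so treat that as an assumption). The data set names model_in, model_out, and dropped are made up for illustration. The filter catches both cases: a record that is absent from the output table, and a record that is present but has no prediction.

```sas
/* Sort both tables by the (assumed unique) record identifier */
proc sort data=work.refmodel out=model_in;
  by Key;
run;

proc sort data=inputf.aar(keep=Key PREDPROB) out=model_out;
  by Key;
run;

/* Keep input records that are absent from the model output */
/* or that came back without a predicted probability        */
data dropped;
  merge model_in(in=in_input) model_out(in=in_output);
  by Key;
  if in_input and (not in_output or missing(PREDPROB));
run;
```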
#REEZA - I already looked for missing values in all the columns of the data. Counting the records with missing values was not enough to identify which records are getting dropped. Thanks for the suggestion though.
I am new to this and I already posted this question. But I think I did not explain it well.
I have a DATA inside SAS.
Some of the cells are empty (nothing in them), and in the SAS output window they show a DOT in the cell.
When I run it, the result adds MISSING FREQUENCY = 7 (or whatever the number is) at the end of the table...
How do I make SAS disregard the missing frequency and ONLY use the values that have a result?
Please see my screen shot, code and my CSV:OUTPUT DATA
RESULT WITH the MISSING frequency at the bottom
/* Generated Code (IMPORT) */
/* Source File:2012_16_ChathamPed.csv */
/* Source Path: /home/cwacta0/my_courses/Week2/ACCIDENTS */
PROC IMPORT
DATAFILE='/home/cwacta0/my_courses/Week2/ACCIDENTS/2012_16_ChathamPed.csv'
OUT=imported REPLACE;
GETNAMES=YES;
GUESSINGROWS=32767;
RUN;
proc contents data=work.imported;
run;
libname mydata"/courses/d1406ae5ba27fe300" access=readonly;
run;
/* sorting data by location*/
PROC SORT ;
by LocationOfimpact;
LABEL Route="STREET NAME" Fatalities="FATALITIES" Injuries="INJURIES"
SeriousInjuries="SERIOUS INJURIES" LocationOfimpact="LOCATION OF IMPACT"
MannerOfCollision="MANNER OF COLLISION"
U1Factors="PRIMARY CAUSES OF ACCIDENT"
U1TrafficControl="TRAFFIC CONTROL SIGNS AT THE LOCATION"
U2Factors="SECONDARY CAUSES OF ACCIDENT"
U2TrafficControl="OTHER TRAFFIC CONTROL SIGNS AT THE LOCATION"
Light="TYPE OF LIGHTING AT THE TIME OF THE ACCIDENT"
DriverAge1="AGE OF THE DRIVER" DriverAge2="AGE OF THE CYCLIST";
/* Here I was unable to extract the drivers aged 25 or less and the drivers who disregarded the stop sign. Here is how I coded it;
IF DriverAge1 LE 25;
IF U1Factors="Failed to Yield" OR U1Factors= "Disregard Stop Sign";
Run;
Also, I want to remove the Missing DATA under the results. But in the data, those are just a blank cell. How do I tell SAS to disregard a blank cell and not add it to the result?
Here is what I did and it does not work...
if U1Factors="BLANK" Then U1Factors=".";
Please help me figure this out... Thanks
IF U1Factors="." Then call missing(U1Factors)*/;
Data want;
set imported;
IF DriverAge1 LE 25 And U1Factors in ("Failed to Yield", "Wrong Side of Road",
"Inattentive");
IF Light in ("DarkLighted", "DarkNot Lighted", "Dawn");
run;
proc freq ;
tables /*Route Fatalities Injuries SeriousInjuries LocationOfimpact MannerOfCollision*/
U1Factors /*U1TrafficControl U2Factors U2TrafficControl*/
light DriverAge1 DriverAge2;
RUN;
SAS will display missing numeric variables using a period. So if there was nothing in column for DriverAge1 in the CSV file then that observation will have a missing value. If your variable is character then SAS will also normally convert values of just a period in the input stream into blanks in the SAS variable.
Missing numeric values are considered less than any real number. So if you want to use a condition like "less than or equal to", missing values will be included unless you exclude them with some other condition.
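A small illustration of that comparison rule, using a made-up four-row data set (ages, naive, and safe are hypothetical names, not from the question):

```sas
/* Four rows: one missing age and three real ages */
data ages;
  input DriverAge1;
  datalines;
.
18
25
40
;
run;

/* Missing sorts below every number, so this keeps the missing row too: */
/* the result has 3 rows (., 18, 25)                                    */
data naive;
  set ages;
  if DriverAge1 le 25;
run;

/* Excluding missing explicitly leaves only the real ages: 2 rows (18, 25) */
data safe;
  set ages;
  if not missing(DriverAge1) and DriverAge1 le 25;
run;
```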
You can use a WHERE statement on procs to filter the data. If you want to append to the WHERE condition in a separate statement you can use the WHERE ALSO syntax to add the extra conditions.
If you want the missing category to appear in the PROC FREQ output add the MISSPRINT option to the TABLES statement. Or add the MISSING option and it will appear and also be counted in statistics.
proc freq ;
where . < DriverAge1 <= 25
and U1Factors in ("Failed to Yield", "Wrong Side of Road","Inattentive")
;
where also Light in ("DarkLighted", "DarkNot Lighted", "Dawn");
tables U1Factors light DriverAge1 DriverAge2 / missing;
run;
The WHERE conditions will apply to the whole dataset. So if you exclude missing DriverAge1 and missing U1Factors
proc freq ;
where not missing(U1Factors) and not missing(DriverAge1);
tables U1Factors DriverAge1 ;
run;
then only the observations that are not missing for both will be included. So you might want to generate the statistics separately for each variable.
proc freq ;
where not missing(U1Factors);
tables U1Factors ;
run;
proc freq ;
where not missing(DriverAge1);
tables DriverAge1 ;
run;
I have the following problem. I need to run PROC FREQ on multiple variables, but I want the output to all be on the same table. Currently, a PROC FREQ statement with something like TABLES ERstatus Age Race, InsuranceStatus; will calculate frequencies for each variable and print them all on separate tables. I just want the data on ONE table.
Any help would be appreciated. Thanks!
P.S. I tried using PROC TABULATE, but it did not calculate N correctly, so I'm not sure what I did wrong. Here is my code for PROC TABULATE. My variables are all categorical, so I just need to know N and percentages.
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
The above code does not return the correct frequencies based on InsuranceStatus where 0 = insured and 1 = uninsured, but PROC FREQ does. Also doesn't calculate correctly with ROWPCTN. So any way that I can get PROC FREQ to calculate multiple variables on one table, or PROC TABULATE to return the correct frequencies, would be appreciated.
Here is a nice image of my output in a simplified analysis of only ERstatus and InsuranceStatus. You can see that PROC FREQ returns 204 people with an ERstatus of 1 and InsuranceStatus of 1. That's correct. The values in PROC TABULATE are not.
OUTPUT
I'll answer this separately as this is answering the other possible interpretation of the question; when it's clarified I'll delete one or the other.
If you want this in a single printed table, then you either need to use proc tabulate or you need to normalize your data - meaning put it in the form of variable | value. PROC FREQ is not capable of doing multiple one-way frequencies in a single table.
For PROC TABULATE, likely your issue is missing data. Any variable that is on the class statement will be checked for missingness, and if any rows are missing data for any of the class variables, those rows are entirely excluded from the tabulation for all variables.
You can override this by adding the missing option on the class statement, or in the table statement, or in the proc tabulate statement. So:
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus/missing;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
This will result in a slightly different appearance than on your table, though, as it will include the missing rows in places you probably do not want them, and they'll be factored against the colpctn when again you probably don't want them.
Typically some manipulation is then necessary; the easiest is to normalize your data and then run a tabulation (using PROC TABULATE or PROC FREQ, whichever is more appropriate; TABULATE has better percentaging options though) against that normalized dataset.
Let's say we have this:
data class;
set sashelp.class;
if _n_=5 then call missing(age);
if _n_=3 then call missing(sex);
run;
And we want these two tables in one table.
proc freq data=class;
tables age sex;
run;
If we do this:
proc tabulate data=class;
class age sex;
tables (age sex),(N colpctn);
run;
Then we get an N=17 total for both subtables - that's not what we want, we want N=18. Then we can do:
proc tabulate data=class;
class age sex/missing;
tables (age sex),(N colpctn);
run;
But that's not quite right either; I want F to have 8/18 = 44.44% and M 10/18 = 55.56%, not 42% and 53% with 5% allocated to the missing row.
The way I do this is to normalize the data. This means you get a dataset with 2 variables, varname and val, or whatever makes sense for your data, plus whatever identifier/demographic/whatnot variables you might have. val has to be character unless all of your values are numeric.
So for example here I normalize class with age and sex variables. I don't keep any identifiers, but you certainly could in your data, I imagine InsuranceStatus would be kept there if I understand what you're doing in that table. Once I have the normalized table, I just use those two variables, and carefully construct a denominator definition in proc tabulate to have the right basis for my pctn value. It's not quite the same as the single table before - the variable name is in its own column, not on top of the list of values - but honestly that looks better in my opinion.
data class_norm;
set class;
length val $2;
varname='age';
val=put(age,2. -l);
if not missing(age) then output;
varname='sex';
val=sex;
if not missing(sex) then output;
keep varname val;
run;
proc tabulate data=class_norm;
class varname val;
tables varname=' '*val=' ',n pctn<val>;
run;
If you want something better than this, you'll probably have to construct it in proc report. That gives you the most flexibility, but is the most onerous to program in also.
You can use ODS OUTPUT to get all of the PROC FREQ output to one dataset.
ods output onewayfreqs=class_freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close;
or
ods output crosstabfreqs=class_tabs;
proc freq data=sashelp.class;
tables sex*(height weight);
run;
ods output close;
Crosstabfreqs is the name of the cross-tab output, while one-way frequencies are onewayfreqs. You can use ods trace to find out the name if you forget it.
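For reference, a minimal sketch of the ODS TRACE step. With tracing on, the log lists each output object the procedure produces, including its Name, which is what goes on the ODS OUTPUT statement.

```sas
/* Write the name of every output object to the log */
ods trace on;

proc freq data=sashelp.class;
  tables age sex;
run;

/* Stop tracing once the names are known */
ods trace off;
```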
You may (probably will) still need to manipulate this dataset some to get the structure you want ultimately.
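As a hedged sketch of that manipulation, assuming class_freqs was created by the ods output onewayfreqs= example above: as far as I know, the one-way output carries a Table column with text such as "Table Age" and one character column of formatted values per variable, prefixed with F_. The column names F_Age and F_Sex are therefore an assumption; check them with PROC CONTENTS before relying on this. The names class_freqs_long, varname, and value are made up for illustration.

```sas
/* Stack the per-variable columns into variable | value | count | percent */
data class_freqs_long;
  set class_freqs;
  length varname $32 value $20;
  varname = scan(Table, 2, ' ');            /* "Table Age" -> "Age"         */
  value   = strip(coalescec(F_Age, F_Sex)); /* whichever column is filled   */
  keep varname value Frequency Percent;
run;

proc print data=class_freqs_long noobs;
run;
```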
I have a dataset of the following format:
a table of M rows and 2K columns.
My columns are pairs of variables: X_i, Y_i and the rows are observations.
I would like to perform many linear regressions: one for each pair of columns (Y_i ~ X_i)
and obtain the results.
I know how to access specific columns using arrays, like so:
data Xs_Ys_data (drop=i);
array Xs[60] X1-X60;
array Ys[60] Y1-Y60;
I also know how to fit a single linear regression model, like so:
proc reg data=some_data;
model y = x;
output out=out_lin_reg;
run;
And I am familiar with the concept of loops:
do i=1 to 60;
Xs[i] .......;
end;
How do I combine these three to get what I need?
Thanks!
P.S - I asked a similar question on a different format here:
SAS reading a file in long format
Update:
I have managed to create the regressions using a macro like so:
%macro mylogit();
%do i = 1 %to 60;
proc reg data=Xs_Ys_data;
model Y&i = X&i;
run;
%end;
%mend;
%mylogit()
Now I am not sure how to export the results into a single table...
You have this in your macro:
proc reg data=Xs_Ys_data;
model Y&i = X&i;
run;
So instead create:
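data x_y_data;
set Xs_Ys_data;
array xs x1-x60;
array ys y1-y60;
* one output row per (observation, pair);
do iter = 1 to dim(xs);
x=xs[iter];
y=ys[iter];
output;
end;
keep iter x y;
run;
* BY processing needs the data sorted by iter;
proc sort data=x_y_data;
by iter;
run;
proc reg data=x_y_data;
by iter;
model Y = X;
run;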
And then add an output statement however you normally would to get your resulting dataset. Now you get 1 output table with all 60 iterations (still 60 printed outputs), and if you want to create one printed output you can construct that from the output dataset.
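One possible way to collect the results, sketched under the assumption that the long data set is called x_y_data with variables iter, x, and y: the OUTEST= option on the PROC REG statement writes one row of estimates per BY group, and NOPRINT suppresses the 60 printed outputs. The names x_y_sorted and reg_est are made up for illustration.

```sas
/* BY-group processing requires the data to be sorted by iter */
proc sort data=x_y_data out=x_y_sorted;
  by iter;
run;

/* One row per iter in reg_est: Intercept, slope (column x), and _RMSE_ */
proc reg data=x_y_sorted outest=reg_est noprint;
  by iter;
  model y = x;
run;
quit;

proc print data=reg_est noobs;
  var iter Intercept x _RMSE_;
run;
```

If standard errors or p-values are needed as well, an ODS OUTPUT statement for the ParameterEstimates table (without NOPRINT) is another route.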