I need to calculate the following for my dataset. I could calculate the individual PPV (95% CI) and NPV (95% CI), but got a tad confused about how to calculate this:
PPV+NPV-1 (95% CI)
How do I do this calculation?
This page on SAS support gives code as follows:
title 'Sensitivity';
proc freq data=FatComp;
  where Response=1;
  weight Count;
  tables Test / binomial(level="1");
  exact binomial;
run;
title 'Specificity';
proc freq data=FatComp;
  where Response=0;
  weight Count;
  tables Test / binomial(level="0");
  exact binomial;
run;
title 'Positive predictive value';
proc freq data=FatComp;
  where Test=1;
  weight Count;
  tables Response / binomial(level="1");
  exact binomial;
run;
title 'Negative predictive value';
proc freq data=FatComp;
  where Test=0;
  weight Count;
  tables Response / binomial(level="0");
  exact binomial;
run;
I doubt that this is a useful measure. In general you should present sensitivity, specificity, positive and negative predictive values. If you want a global measure of accuracy you should go for the proportion of correctly classified subjects.
On the webpage already suggested by Peter Flom, you can scroll down to a piece of code for overall accuracy. The accuracy can be computed by creating a binary variable indicating whether test and response agree in each observation:
data acc;
  set FatComp;
  if (test and response) or
     (not test and not response) then acc=1;
  else acc=0;
run;
proc freq;
  weight count;
  tables acc / binomial(level="1");
  exact binomial;
run;
Hope it helps
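For the original PPV+NPV-1 question, one possible route (a sketch, not taken from the SAS note above): because NPV = P(Response=0 | Test=0) and 1-PPV = P(Response=0 | Test=1), the quantity PPV+NPV-1 is the difference between two independent row proportions, which PROC FREQ can estimate with the RISKDIFF option. This assumes the same FatComp data set, with Test and Response coded 0/1 and the default table ordering, so that Test=0 is row 1 and Response=0 is column 1.

```sas
title 'PPV + NPV - 1 as a risk difference';
proc freq data=FatComp;
  weight Count;
  /* Column 1 risk = P(Response=0 | Test); Row 1 - Row 2 = NPV - (1 - PPV) */
  tables Test*Response / riskdiff;
run;
```

Under those assumptions, the Difference row of the "Column 1 Risk Estimates" table would be PPV+NPV-1 with its asymptotic 95% confidence limits. If the coding or ordering differs, the sign or the column to read may change.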
I am being asked to provide summary statistics for the population mean, including the corresponding confidence interval (CI) and its width. I need to print the 85%, 90%, and 99% intervals. I know I can use either PROC UNIVARIATE or PROC MEANS to return one interval of my choice, but how do I print all three in a table? Also, could someone explain the difference between PROC UNIVARIATE, PROC MEANS, and PROC SQL, and when each is used?
This is what I did, and it only printed the 85% confidence interval.
proc means data = mydata n mean clm alpha = 0.01 alpha =0.1 alpha = 0.15;
  var variable;
run;
To put all three values in one table you can execute your step three times and put the results in one table by using an append step.
For shorter code and easier usage you can define a macro for this purpose.
%macro clm_val(TAB=, VARIABLE=, CONF=);
  proc means
    data = &TAB. n mean clm
    alpha = &CONF.;
    var &VARIABLE.;
    ods output summary=result;
  run;
  data result;
    length conf $8;
    format conf_interval percentn8.0;
    set result;
    conf = "&CONF.";               /* alpha level used for this run */
    conf_interval = 1 - &CONF.;    /* confidence level, e.g. 0.99 */
  run;
  proc append data = result
    base = all_results;
  run;
%mend clm_val;
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.01);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.1);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.15);
The resulting table looks like this:
I have the following result, to which I need to add a total row. It seems like a simple request, but I have already spent a few days trying to find a solution.
Data have:
Measure   Jan_total  Feb_total
Startup         100        200
Switcher        300        500
Data want:
Measure   Jan_total  Feb_total
Startup         100        200
Switcher        300        500
Total           400        700
I want the total of each column placed under its respective column, please.
Can someone help me arrive at the solution for this request, please?
To do this in data step code, you would do something like:
data want;
  set have end=end;        * Var 'end' will be true when we get to the end of 'have'.;
  jan_sum + jan_total;     * These 'sum statements' accumulate the totals from each observation.;
  feb_sum + feb_total;
  output;                  * Output each of the original observations.;
  if end then do;          * When we reach the end of the input...;
    measure = 'Total';     * ...update the value in Measure...;
    jan_total = jan_sum;   * ...move the accumulated totals to the original vars...;
    feb_total = feb_sum;
    output;                * ...and output them in an additional observation.;
  end;
  drop jan_sum feb_sum;    * Get rid of the accumulator variables (this statement can go anywhere in the step).;
run;
You could do this many other ways. Assuming that you actually have columns for all the months, you might re-write the data step code to use arrays, or you might use PROC SUMMARY or PROC SQL to calculate the totals and add the resulting totals back using a much shorter data step, etc.
proc means data=have noprint;
  class measure;
  var Jan_total Feb_total;
  * The row with _TYPE_=0 holds the overall totals;
  output out=want sum=;
run;
It depends on whether this is for display or for a data set. It usually makes no sense to store a total row in the data set; totals are normally needed only for reporting.
PROC PRINT has a SUM statement that adds totals to the end of a report. PROC TABULATE provides another mechanism for this kind of reporting.
example from here.
options obs=10 nobyline;
proc sort data=exprev;
  by sale_type;
run;
proc print data=exprev noobs label sumlabel
    n='Number of observations for the order type: '
    'Number of observations for the data set: ';
  var country order_date quantity price;
  label sale_type='Sale Type'
        price='Total Retail Price* in USD'
        country='Country' order_date='Date' quantity='Quantity';
  sum price quantity;
  by sale_type;
  format price dollar7.2;
  title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
options byline;
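Applied to the two-column table from the question, the PROC PRINT approach could be as small as the sketch below. It assumes the data set is named have and contains the variables Measure, Jan_total, and Feb_total shown in the question.

```sas
proc print data=have noobs;
  var Measure Jan_total Feb_total;
  sum Jan_total Feb_total;   /* adds a totals line under these two columns */
run;
```

This only changes the printed report; the data set itself is left without a total row.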
When I run a proc glimmix in SAS, sometimes it drops observations.
How do I get the set of dropped/excluded observations or maybe the set of included observations so that I can identify the dropped set?
My current PROC GLIMMIX code is as follows:
%LET EST=inputf.aarefestimates;
%LET MODEL_VAR3 = age Male Yearc2010 HOSPST
data work.refmodel;
set inputf.readmref;
Yearc2010 = YEAR - 2010;
CLASS hospid HOSPST(ref="xx");
OUTPUT OUT = inputf.aar
ID XBETA LINP hospst hospid Visitlink Key RADM30;
Thank you in advance!
It drops records with a missing value in any variable used in the model, that is, any variable named in a CLASS, BY, MODEL, or RANDOM statement. So you can check for missing values among those variables to see which records are affected. Usually the output data set will also indicate this by not having predictions for the records that are not used.
You can run the code below.
*create fake data;
data heart;
  set sashelp.heart;
run;
*Logistic Regression model, ageCHDdiag is missing;
proc logistic data=heart;
  class sex / param=ref;
  model status(event='Dead') = ageCHDdiag height weight diastolic;
  *generate output data;
  output out=want p=pred;
run;
*explicitly flag records as included;
data included;
  set want;
  if missing(pred) then include='N'; else include='Y';
run;
*check that Y equals total obs included above;
proc freq data=included;
  table include;
run;
The output will show:
The LOGISTIC Procedure

Model Information
Response Variable            Status
Number of Response Levels    2
Model                        binary logit
Optimization Technique       Fisher's scoring

Number of Observations Read  5209
Number of Observations Used  1446
And then the PROC FREQ will show:
The FREQ Procedure

                                   Cumulative    Cumulative
include    Frequency    Percent     Frequency       Percent
N               3763      72.24          3763         72.24
Y               1446      27.76          5209        100.00
So 1,446 records are counted as included in both outputs.
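The same check can also be made directly on the input data, before fitting anything. The sketch below is an illustration for the GLIMMIX question and rests on assumptions: it reuses the MODEL_VAR3 variable list and hospid from that post, and it assumes RADM30 is the response, which the post does not confirm. The names flagged, n_missing, and dropped_candidate are made up for the example.

```sas
data flagged;
  set work.refmodel;
  /* CMISS counts missing values across numeric and character variables */
  n_missing = cmiss(of age Male Yearc2010 HOSPST hospid RADM30);
  dropped_candidate = (n_missing > 0);   /* 1 = at least one model variable is missing */
run;

proc freq data=flagged;
  tables dropped_candidate;
run;
```

If the count of flagged records does not match the number of observations the procedure reports as not used, the procedure may be excluding observations for other reasons, such as invalid response values.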
I think I answered my question.
The code line -
OUTPUT OUT = inputf.aar
gives the output of the model. This table includes all the observations used in the proc statement. So I can match the data in this table to my input table and find the observations that get dropped.
#REEZA - I already looked for missing values in all the columns in the data. I was not able to identify the records that are getting dropped just by counting the records with missing values. Thanks for the suggestion, though.
data work.smallmarket;
  set work.market;
  where country='Nigeria';
  keep Product# NetMargin DT;
run;
Question 1: How can I calculate an industry average NetMargin by date (DT) across all products, bearing in mind that not every product will have data on every date? In other words, no data is not the same as 0.
Question 2: How can I calculate a moving industry average for NetMargin?
Question 1:
proc sort data=smallmarket; by DT; run;
proc means data=smallmarket noprint;
  by DT;
  var NetMargin;
  * Missing NetMargin values are excluded from the mean and are not treated as 0;
  output out=by_date mean=;
run;
Question 2:
If you have access to SAS/ETS, you could use PROC EXPAND. If not, you can find a worked example at:
Edit: found better example:
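As a rough sketch of the PROC EXPAND route (which requires SAS/ETS): assuming the per-date averages are in by_date, sorted by the date variable DT, with the averaged variable still named NetMargin, a 3-period trailing moving average could be requested as below. The output names moving_avg and NetMargin_ma3 and the window length of 3 are illustrative choices, not taken from the original posts.

```sas
proc expand data=by_date out=moving_avg method=none;
  id DT;   /* input must be sorted by DT */
  /* METHOD=NONE skips interpolation; MOVAVE 3 averages the current and two prior values */
  convert NetMargin = NetMargin_ma3 / transformout=(movave 3);
run;
```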
I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error with the odds ratios from PROC LOGISTIC.
The example below is of the response variable "MonthStay" and one of the variables in question "KennelCough". MonthStay = Y and the event of interest is KennelCough = N.
I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations? Or am I totally misunderstanding what's going on? Thanks!
Here is the PROC FREQ code and result:
proc freq data = capstone.adopts_dog order = freq;
  tables KennelCough*MonthStay / relrisk;
run;
Here is the PROC LOGISTIC CODE and results:
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
  class Breed(ref='Chihuahua') Gender(ref='Female')
        Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
        FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y')
        Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
        InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
        / param=ref;
  model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNatIn
        NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful
        Handling UnderAge Injuris InRegion OutRegion
        / lackfit aggregate scale = none selection = backward rsquare;
  output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
[Output tables not shown: Class Level Information, Odds Ratio Estimates]
In PROC FREQ, you are calculating an unadjusted odds ratio, while in PROC LOGISTIC all odds ratios are adjusted for the other covariates included in the logistic regression model.
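To compare like with like, one option is to fit a model with KennelCough as the only predictor, which should reproduce the unadjusted odds ratio. This is a minimal sketch using the data set and variable names from the question. Note too that PROC FREQ computes its odds ratio from the table's row and column order, so ORDER=FREQ can make it the reciprocal of the ratio for the comparison you have in mind.

```sas
proc logistic data=capstone.adopts_dog;
  class KennelCough(ref='Y') / param=ref;
  /* single predictor: the odds ratio (N vs Y) is not adjusted for other covariates */
  model MonthStay(event='Y') = KennelCough;
run;
```

If this odds ratio (or its reciprocal) matches the PROC FREQ value, the difference seen in the full model comes from adjustment for the other covariates rather than from a calculation error.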