I want to run this procedure in SAS 9.1:
proc ttest data=auc dist=lognormal tost(0.8, 1.25);
paired TestAUC*RefAUC;
run;
but it had an error!
How should I solve this problem?
It looks like you're trying to compare the area under the curve (AUC) from two ROC curves. This link may help you:
http://support.sas.com/kb/45/339.html
It uses a chi-square test, but it may be more appropriate than a t-test. It's hard to tell with the limited information you have provided.
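If AUC here instead means the pharmacokinetic area under the concentration-time curve and the goal is an equivalence test, the likely cause of the error is that the dist= and tost options are not available in PROC TTEST in SAS 9.1 (as far as I know they arrived in 9.2). Below is a minimal sketch of the same analysis done by hand, assuming the data set auc has one row per subject with positive TestAUC and RefAUC. It back-transforms the 90% confidence interval for the mean log ratio and compares it with the 0.8 to 1.25 bounds, which is equivalent to two one-sided tests at the 5% level. The names auc_log, ci, tost_result and the derived variables are made up for illustration.

```sas
/* Work on the log scale: the paired difference of logs is the log of the ratio */
data auc_log;
  set auc;
  log_ratio = log(TestAUC) - log(RefAUC);
run;

/* alpha=0.10 gives a two-sided 90% confidence interval for the mean */
proc means data=auc_log noprint alpha=0.10;
  var log_ratio;
  output out=ci mean=mean_log lclm=lower_log uclm=upper_log;
run;

/* Back-transform to the ratio scale and compare with the equivalence bounds */
data tost_result;
  set ci;
  geo_mean_ratio = exp(mean_log);
  lower_ratio    = exp(lower_log);
  upper_ratio    = exp(upper_log);
  equivalent     = (lower_ratio > 0.8 and upper_ratio < 1.25);
run;

proc print data=tost_result;
  var geo_mean_ratio lower_ratio upper_ratio equivalent;
run;
```

Equivalence is concluded only when the whole back-transformed interval lies inside the bounds, so equivalent is 1 in that case and 0 otherwise.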
Is there a default map for SAS that projects states like AK, HI, and PR in a more US-centric manner, without the need for code workarounds like those seen in this link?
Alternatively, is there a simpler way to perform this than in the link provided above? Seems like a lot of code for what would seem to be a relatively common task...
The output I'm looking for would be something similar to this:
Ah I found one...
proc gmap data=mapsgfk.us map=mapsgfk.us;
id statecode;
choro segment / levels=1 nolegend coutline=gray99 des='' name="blah";
run;
quit;
This produces the image below. It's missing PR, but I can get by with this for the time being.
I have a SAS data set with missing data in multiple columns. I would like to replace the missing data with a prediction based on the other data in the data set. Here is a link that describes the method but doesn't show me how to do it. How do I replace the missing values with a prediction?
EDIT:
The method I had in mind was just using PROC REG and then applying the coefficients to the missing data to generate the estimate. Does this answer your question?
PROC STDIZE, PROC EXPAND, and PROC MI are all capable of performing different kinds of imputation on your data, depending on exactly how you want to determine the 'prediction'.
For simple things like replacing with the mean, PROC STDIZE is the way to go. PROC MI is the most advanced - it performs multiple imputation. PROC EXPAND is appropriate if you have time-series data, as it will try to work out what the correct value is for that point in the time series.
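A minimal sketch of the simple mean replacement with PROC STDIZE, assuming a data set called have with numeric variables x1, x2 and x3 (all of these names are placeholders, not from the question):

```sas
/* REPONLY replaces only missing values; non-missing values are left unchanged */
proc stdize data=have out=want_mean reponly method=mean;
  var x1 x2 x3;
run;
```

Without reponly the procedure would also standardize the non-missing values, which is not what is wanted here.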
If you have missing data in multiple columns you'll require multiple regressions. This probably isn't a good way to do this, but to answer the question - what you're requesting is called scoring a dataset and you can use PROC SCORE.
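As a rough sketch of the scoring approach for one column, assuming a data set have where y has missing values and the regressors x1 and x2 are complete (the data set and variable names are placeholders):

```sas
/* Fit the regression and save the coefficients; rows with missing y are dropped from the fit */
proc reg data=have outest=est noprint;
  model y = x1 x2;
run;
quit;

/* Apply the coefficients to every row; PREDICT makes the score the predicted value */
proc score data=have score=est out=scored type=parms predict;
  var x1 x2;
run;

/* The score variable takes the model label, MODEL1 by default */
data filled;
  set scored;
  if missing(y) then y = MODEL1;
run;
```

Each column with missing values would need its own model and its own pass like this one.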
An alternative is to request an OUTPUT data set in your regression procedure that contains the predicted values for that regression:
output out=predicted1 p=pred_var_missing;
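In context, that statement might be used as in the sketch below. The names have, y, x1, x2 and filled are assumptions for illustration; only the output line comes from above.

```sas
/* Rows with missing y are excluded from the fit but still get a predicted value,
   provided their regressors are non-missing */
proc reg data=have noprint;
  model y = x1 x2;
  output out=predicted1 p=pred_var_missing;
run;
quit;

/* Keep observed values and fill only the gaps with the prediction */
data filled;
  set predicted1;
  if missing(y) then y = pred_var_missing;
run;
```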
As a matter of methodology, I recommend @Joe's method instead.
Adding to @Joe's answer, if you tell us why you want to do this imputation, we can provide better advice. I wrote a blog post called How to Ask a Statistics Question that may help.
However, often, single imputation is a bad method. More particularly, if you are going to do further analysis on this data (with the imputed values) then single imputation will underestimate the variability of the data and give wrong results.
PROC MI is usually a better approach.
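A minimal sketch of that workflow, assuming a data set have with variables y, x1 and x2, and a regression as the later analysis (the names, the seed and the number of imputations are placeholders):

```sas
/* Create 5 completed copies of the data, indexed by _Imputation_ */
proc mi data=have out=mi_out nimpute=5 seed=12345;
  var y x1 x2;
run;

/* Run the analysis separately on each completed copy */
proc reg data=mi_out outest=mi_est covout noprint;
  model y = x1 x2;
  by _Imputation_;
run;
quit;

/* Combine the per-imputation estimates into one set of results */
proc mianalyze data=mi_est;
  modeleffects Intercept x1 x2;
run;
```

The combined standard errors reflect the extra uncertainty from the imputation, which is the part that single imputation leaves out.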
I have a time series dataset with a serious serial correlation problem, so I adopted the Prais-Winsten estimator with iterated estimates to fix that. I ran the regressions in Stata with the following command:
prais depvar indepvar indepvar2, vce(robust) rhotype(regress)
My colleague wanted to reproduce my results in SAS, so she used the following:
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 iter itprint method=YW;
run;
For the different specifications we ran, some of them roughly match, while others do not. Also I noticed that for each regression specification, Stata has many more iterations than SAS. I wonder if there is something wrong with my (or my colleague's) code.
Update
Inspired by Joe's comment, I modified my SAS code.
/*Iterated Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=ITYW;
run;
/*Twostep Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=YW;
run;
I have a few suggestions. Note that I'm not a real statistician and am not familiar with the specific estimators here, so this is just a quick read of the docs.
First off, the most likely issue is that it looks like SAS uses the OLS variance estimation method. That is, in your Stata code, you have vce(robust), which is in contrast to what I read SAS as using, the equivalent of vce(ols). See this page in the docs which explains how SAS does the Y-W method of autoregression, compared to this doc page that explains how Stata does it.
Second, you probably should not specify method=YW. SAS distinguishes between the simple Y-W estimation ("two-step" method) and iterated Y-W estimation. method=ITYW is what you want. You specify iter, so it may well be that you're getting this anyway as SAS tends to be smart about those sorts of things, but it's good to verify.
I would suggest actually turning the iterations off to begin with - have both do the two-step method (Stata option twostep, SAS by removing the iter request and specifying method=YW or no method specification). See how well they match there. Once you can get those to match, then move on to iterated; it's possible SAS has a different cutoff than Stata and may well not iterate past that.
I'd also suggest trying this with only one independent and dependent variable pair first, as it's possible the two programs handle things differently when you add in a second independent variable. Always start simple and then add complexity.
Is it possible to show the mathematical formula / concept behind the analysis done with SAS Enterprise?
Assuming SAS would calculate a correlation between a list of numbers -- is it possible to see what exactly SAS did from a mathematical perspective?
It is not possible to ask SAS for the mathematical formula, no. You can check the documentation; for example, this page gives many of the 'elementary statistics' formulas (like variance, UCLM, etc.).
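For the correlation example in the question, the default statistic reported by PROC CORR is the Pearson product-moment correlation. The standard textbook definition (written out here, not copied from the SAS documentation) for n pairs of values is:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the sample means of the two lists of numbers.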
If you need the formula behind something more complex that you can't find online, contact your SAS Support rep, and they may be able to put you in contact with the developer of that particular proc, for example if you need to know some particular detail of how PROC GLM does something.
In most cases you can ask SAS to give you the SAS code that it ran when you executed a task (it's usually available by clicking on the task node), but that would be something like proc freq; tables a*b; run;, not a mathematical formula per se.
I have a single continuous variable with a highly skewed distribution. I have log-transformed it to normalize it. While creating a histogram of the variable with PROC UNIVARIATE (SAS 9.3), is there a way to plot the transformed variable but keep the values of the original variable on the x axis?
If this topic has already been discussed, I would really appreciate it if someone could provide a link. Thank you.
You could use the SAS Graph Template Language (GTL) to do this. The documentation contains plenty of examples that you should be able to change and modify to your needs. The output from PROC UNIVARIATE is produced by the GTL so you should be able to generate something similar.
Take the output dataset from proc univariate and base the plot off that. You will need to reverse the transformations first.
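A rough sketch of that idea, using PROC SGPLOT rather than a hand-written GTL template to keep it short. It assumes a data set have in which log_x holds the natural log of the original variable; the names have, log_x, hist_out, hist_orig and x_midpoint are placeholders. The needle plot on a log-scaled axis is a stand-in for histogram bars and should work in 9.3, but treat it as a starting point.

```sas
/* Bin the log-transformed variable and save the bins without drawing a plot */
proc univariate data=have noprint;
  var log_x;
  histogram log_x / noplot outhistogram=hist_out;
run;

/* Reverse the transformation so the bin midpoints are in the original units */
data hist_orig;
  set hist_out;
  x_midpoint = exp(_MIDPT_);
run;

/* A log-scaled axis spaces the bins evenly while the ticks show original values */
proc sgplot data=hist_orig;
  needle x=x_midpoint y=_OBSPCT_;
  xaxis type=log label="x (original units)";
  yaxis label="Percent";
run;
```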
Documentation for the GTL:
http://support.sas.com/documentation/cdl/en/grstatgraph/65377/HTML/default/viewer.htm#p1sxw5gidyzrygn1ibkzfmc5c93m.htm