SAS Proc MI: output SAS code for scoring

Proc MI is used to impute missing values in a SAS dataset. Is there a way to obtain SAS code from the Proc MI procedure, so that we can score datasets with missing values without having to run Proc MI? This is needed so that datasets in the production environment can be scored consistently. I don't want to use Proc MI in the production environment.
Thanks!

I don't believe this is possible, at least not in the way that it is for proc import. If you want to use a purely deterministic imputation algorithm, you will need to write your own data step to do it.
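As a minimal sketch of that idea, assuming a simple mean imputation is acceptable (this is single imputation, not what PROC MI does), you can compute the fill-in values once during development, store them as a dataset, and apply them in production with an ordinary DATA step. The dataset names `work.impute_vals`, `work.prod`, and `work.prod_imputed` are hypothetical, and SASHELP.CLASS with some values blanked out stands in for real data.

```sas
/* Development: compute the fill-in values once and keep the dataset. */
/* SASHELP.CLASS stands in for the real development data.             */
proc means data=sashelp.class noprint;
  var height weight;
  output out=work.impute_vals(drop=_type_ _freq_)
         mean=m_height m_weight;
run;

/* Simulated production data with some values blanked out (hypothetical) */
data work.prod;
  set sashelp.class;
  if mod(_n_, 4) = 0 then height = .;
  if mod(_n_, 5) = 0 then weight = .;
run;

/* Production: a plain DATA step, no PROC MI required */
data work.prod_imputed;
  /* Read the stored means once; variables read by SET are retained */
  if _n_ = 1 then set work.impute_vals;
  set work.prod;
  if missing(height) then height = m_height;
  if missing(weight) then weight = m_weight;
  drop m_height m_weight;
run;
```

Because the fill-in values live in a stored dataset instead of being recomputed on each run, every production run imputes the same way.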

I don't know how complicated your data set is, but if you score a data set that covers all the possibilities you will see in the production data, then you can use your scored data set as a lookup table for the production data.

Related

Kruskal-Wallis test vs. ANOVA in SAS with complex survey data?

I am analyzing the temporal trend (yr) of certain chemicals (a, b, and c).
I used proc sgplot with a series statement to draw a plot and found a decreasing trend.
Because the data are right-skewed, I used the median concentration for each year to draw the plot.
Now I would like to conduct a statistical test on the trend. My data come from NHANES, so I need to use the proc survey** procedures for the analysis. I know I can do an ANOVA test with proc surveyreg by using the ANOVA option in the MODEL statement.
proc surveyreg data=a;
stratum stra;
cluster clus;
weight wt;
model a=yr/anova;
run;
But since the original data are right-skewed, I think it may be better to use a Kruskal-Wallis test on the original data. However, I don't know how to write the code in SAS, and I didn't find any information in the proc survey** documentation.
My plan B is to use the log-transformed data and an ANOVA test, but I am not sure whether that is an appropriate approach. Can somebody tell me how to get a normality test of the residuals in the ANOVA when using proc surveyreg? I would also like to know whether I can test a, b, and c in one procedure or whether I should write multiple procedures, changing the MODEL statement each time.
Looking forward to your engagement. Thank you!
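One way to try plan B is sketched below, under stated assumptions: the simulated dataset is a hypothetical stand-in for the NHANES extract, and the variable names `stra`, `clus`, `wt`, `yr`, and `a` follow the question. The OUTPUT statement of PROC SURVEYREG can save residuals, and PROC UNIVARIATE with the NORMAL option then reports normality tests on them. PROC UNIVARIATE here ignores the survey design and weights, so treat its result as a rough diagnostic only.

```sas
/* Hypothetical stand-in for the NHANES extract: right-skewed values */
data work.a;
  call streaminit(2024);
  do stra = 1 to 4;
    do clus = 1 to 2;
      do yr = 1 to 5;
        do i = 1 to 10;
          wt = 1 + rand('uniform');
          a  = exp(2 - 0.1*yr + rand('normal', 0, 0.5));
          output;
        end;
      end;
    end;
  end;
  drop i;
run;

/* Log-transform the response (a > 0 by construction) */
data work.a_log;
  set work.a;
  log_a = log(a);
run;

/* Survey-weighted ANOVA on the log scale; save the residuals */
proc surveyreg data=work.a_log;
  strata stra;
  cluster clus;
  weight wt;
  model log_a = yr / anova;
  output out=work.resid residual=r;
run;

/* Normality tests (unweighted) on the saved residuals */
proc univariate data=work.resid normal;
  var r;
run;
```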

Interpreting WHERE= in Proc

I'm converting SAS code to an SQL-based process and came across this fairly simple snippet:
proc freq data=temp1(where=(SAMERETAIL='Y')) noprint;
tables RETAILER*store/list nocum nopercent out=retailer_list;
run;
My interpretation of this is:
1. From temp1, choose all observations that meet the criterion (SAMERETAIL='Y').
2. Extract the RETAILER, store frequency counts (Store, Retailer, Count(*)).
3. Output them to retailer_list.
My question is about the WHERE=: is it applied to the procedure or to the data? Is my interpretation correct? Business-wise this seems incorrect, since we are only keeping the records with flag=Y. Hence the question.
Any pointers?
Any help is greatly appreciated.
TIA.
where= is applied to the dataset you're pulling observations from to use in the proc freq. SUGI 24 has a good summary of how this works (see page 3).
THE WHERE DATA SET OPTION
The syntax of the WHERE data set option, called WHERE=, is a combination of standard data set option parentheses and a where-expression.
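Since the goal is an SQL conversion, a sketch of a rough PROC SQL equivalent may help. The small `temp1` built here is hypothetical sample data, and `retailer_list_sql` is an illustrative name. In both versions the filter is applied to rows as they are read from `temp1`, so only SAMERETAIL='Y' rows are counted. The SQL version reproduces only the counts; PROC FREQ's OUT= dataset may carry additional variables such as PERCENT.

```sas
/* Hypothetical sample data standing in for temp1 */
data temp1;
  input RETAILER $ store $ SAMERETAIL $;
  datalines;
R1 S1 Y
R1 S1 Y
R1 S2 N
R2 S1 Y
;
run;

/* Original: WHERE= filters rows as they are read from temp1 */
proc freq data=temp1(where=(SAMERETAIL='Y')) noprint;
  tables RETAILER*store / list nocum nopercent out=retailer_list;
run;

/* Rough SQL equivalent (counts only) */
proc sql;
  create table retailer_list_sql as
  select RETAILER, store, count(*) as COUNT
  from temp1
  where SAMERETAIL = 'Y'
  group by RETAILER, store
  order by RETAILER, store;
quit;
```

With this sample, both outputs contain R1/S1 with a count of 2 and R2/S1 with a count of 1; the R1/S2 row is excluded before counting.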

Replace missing data in SAS with prediction: Regression Imputation

I have a SAS data set with missing data in multiple columns. I would like to replace the missing data with a prediction based on the other data in the data set. Here is a link that describes the method but doesn't show how to do it. How do I replace the missing values with a prediction?
EDIT:
The method I had in mind was simply using Proc Reg and then applying the coefficients to the missing data to generate the estimates. Does this answer your question?
PROC STDIZE, PROC EXPAND, and PROC MI are all capable of performing different kinds of imputation on your data, depending on exactly how you want to determine the 'prediction'.
For simple things like replacing with the mean, PROC STDIZE is the way to go. PROC MI is the most advanced - it performs multiple imputation. PROC EXPAND is appropriate if you have time-series data, as it will try to work out what the correct value is for that point in the time series.
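For the mean-replacement case, a minimal sketch with PROC STDIZE: the REPONLY option replaces only the missing values (it does not standardize the data), and METHOD=MEAN uses the variable mean as the replacement. The blanked-out copy of SASHELP.CLASS (`work.have`) is hypothetical test data.

```sas
/* Hypothetical data: blank out some HEIGHT and WEIGHT values */
data work.have;
  set sashelp.class;
  if mod(_n_, 4) = 0 then height = .;
  if mod(_n_, 5) = 0 then weight = .;
run;

/* Replace missing values with the column mean; leave others unchanged */
proc stdize data=work.have out=work.have_mean reponly method=mean;
  var height weight;
run;
```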
If you have missing data in multiple columns, you'll need multiple regressions. This probably isn't a good way to do it, but to answer the question: what you're requesting is called scoring a dataset, and you can use PROC SCORE.
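A sketch of the PROC SCORE route, assuming a single regression of WEIGHT on HEIGHT and AGE in SASHELP.CLASS (hypothetical stand-in data): the OUTEST= dataset from PROC REG stores the coefficients, and PROC SCORE with TYPE=PARMS applies them to any dataset that contains the predictor variables. The model label `pred_weight` is an illustrative name and becomes the name of the scored variable. Because the coefficients are stored, the same approach lets you score new data later without refitting.

```sas
/* Hypothetical data: blank out some WEIGHT values */
data work.have;
  set sashelp.class;
  if mod(_n_, 4) = 0 then weight = .;
run;

/* Fit on complete cases; OUTEST= stores the coefficients */
proc reg data=work.have outest=work.est noprint;
  pred_weight: model weight = height age;
run;
quit;

/* Apply the stored coefficients; creates the variable PRED_WEIGHT */
proc score data=work.have score=work.est out=work.scored type=parms;
  var height age;
run;

/* Keep observed values and fill only the missing ones */
data work.imputed;
  set work.scored;
  if missing(weight) then weight = pred_weight;
run;
```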
An alternative is to request, in your regression procedure, an OUTPUT data set that contains the predicted values for that regression:
output out=predicted1 p=pred_var_missing;
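In context, that OUTPUT statement sits inside a PROC REG step. A minimal sketch, again with hypothetical blanked-out SASHELP.CLASS data: PROC REG skips observations whose response is missing when fitting the model, but still writes predicted values for them to the OUTPUT dataset as long as the predictors are present, so COALESCE can fill in just the gaps. `predicted1` and `pred_var_missing` are the names from the answer; `weight_imp` is illustrative.

```sas
/* Hypothetical data: blank out some WEIGHT values */
data work.have;
  set sashelp.class;
  if mod(_n_, 4) = 0 then weight = .;
run;

/* Fit on complete cases; predictions are written for every row */
proc reg data=work.have noprint;
  model weight = height age;
  output out=work.predicted1 p=pred_var_missing;
run;
quit;

/* Use the observed value when present, otherwise the prediction */
data work.imputed;
  set work.predicted1;
  weight_imp = coalesce(weight, pred_var_missing);
run;
```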
As a matter of methodology, I recommend @Joe's method instead.
Adding to @Joe's answer: if you tell us why you want to do this imputation, we can provide better advice. I wrote a blog post called "How to Ask a Statistics Question" that may help.
However, single imputation is often a bad method. In particular, if you are going to do further analysis on this data (with the imputed values), single imputation will underestimate the variability of the data and give wrong results.
PROC MI is usually a better approach.
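For comparison, a sketch of the multiple-imputation workflow on the same kind of hypothetical data: PROC MI creates several completed copies stacked in one dataset and indexed by `_Imputation_`, the analysis model runs once per copy with a BY statement, and PROC MIANALYZE combines the estimates so that the between-imputation variability is reflected in the standard errors. NIMPUTE=5 and the seed value are arbitrary choices.

```sas
/* Hypothetical data: blank out some WEIGHT values */
data work.have;
  set sashelp.class;
  if mod(_n_, 4) = 0 then weight = .;
run;

/* Five imputed copies, stacked and indexed by _Imputation_ */
proc mi data=work.have nimpute=5 seed=12345 out=work.mi_out noprint;
  var height age weight;
run;

/* Fit the analysis model separately in each imputed copy */
proc reg data=work.mi_out outest=work.est covout noprint;
  model weight = height age;
  by _imputation_;
run;
quit;

/* Combine the five sets of estimates */
proc mianalyze data=work.est;
  modeleffects Intercept height age;
run;
```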

plot entire time series in SAS?

I am trying to make a forecast, and I want to see the entire time series with the forecasted period at the end (I need to compare it with another graph of this kind).
SAS 9.4 does not want to comply, however, and only shows me the forecasting part.
What can I do to remedy this?
The code I'm using is:
Proc arima data=logtabell;
identify var=y(12) nlag=24;
estimate p=1 q=2;
forecast lead=12 interval=month id=date out=results;
run;
Your out= dataset will contain all values by default. If you want to specifically see the full graph that it outputs, add the plots option:
proc arima data=logtabell plots=forecast(forecast);
Or, just do it the easy way by getting every plot:
proc arima data=logtabell plots = all;
Also, make sure ods graphics on; is set.
SAS procedures have an out=resultSet option, from which you can get the results in a dataset.
Combine this output with your time series in one graph created with proc sgplot.
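A sketch of that approach, assuming monthly data: here `logtabell` is rebuilt from SASHELP.AIR as the log of the airline series (a stand-in, since the real data isn't shown), and the question's ARIMA specification is kept unchanged. The OUT= dataset holds the full history plus the 12 forecast months, with columns such as FORECAST, L95, and U95, so PROC SGPLOT can overlay the actual series and the forecast in one graph.

```sas
ods graphics on;

/* Stand-in for the question's data: log of the monthly airline series */
data logtabell;
  set sashelp.air;
  y = log(air);
run;

proc arima data=logtabell;
  identify var=y(12) nlag=24 noprint;
  estimate p=1 q=2 noprint;
  forecast lead=12 interval=month id=date out=results noprint;
run;
quit;

/* Full history plus forecast and 95% limits in one graph */
proc sgplot data=results;
  band x=date lower=l95 upper=u95 / transparency=0.5 legendlabel="95% limits";
  series x=date y=y / legendlabel="Actual";
  series x=date y=forecast / legendlabel="Forecast";
run;
```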

how to get timing information of a data step query

I just wanted to know: in PROC SQL we can specify the STIMER option.
The PROC SQL option STIMER | NOSTIMER specifies whether PROC SQL writes timing information for each statement to the SAS log, instead of writing a cumulative value for the entire procedure. NOSTIMER is the default.
Is there a similar way to get timing information in a DATA step? I am not using a PROC SQL step:
data h;
  /* DATA step equivalent of selecting NAME and EMPID */
  set employeemaster(keep=name empid);
run;
PROC SQL steps individually are effectively separate data steps, so in a sense you always get the identical information from SAS. What you're asking is effectively how to find out how long 'select name' takes versus 'empid'.
There's not a direct way to get the timing of an individual statement in a data step, but you could write data step code to find out. The problem is that the data step is executed row-wise, so it's really quite different from the PROC SQL STIMER details; almost nothing you do in a data step will take very long by itself, unless you are doing something more complex like a hash table lookup. What usually takes the most time is writing the data out, followed by reading it in.
You do have some options for troubleshooting long data steps, if that's your concern. OPTIONS MSGLEVEL=I will give you information about index usage, merge details, etc., which can be helpful if you aren't sure why it is taking a long time to do certain things (see http://goo.gl/bpGWL in the SAS documentation for more info). You can also write your own timestamps:
data test;
set sashelp.class sashelp.class;
_t=time();
put _t=;
run;
Odds are that won't show you much of use, since most data step iterations don't take very long, but if you are doing something fancy it might help. You could also use conditional statements to print the time only at certain points, for example at FIRST.ID in a process that works BY ID.
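A sketch of the FIRST.-based variant, using SASHELP.CLASS grouped by SEX as a hypothetical stand-in for a BY ID process: the clock is written to the log only when a new BY group starts, which keeps the log short. DATETIME() is used instead of TIME() so the stamps stay comparable across midnight.

```sas
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

data work.timed;
  set work.class_sorted;
  by sex;
  /* Write a timestamp only at the start of each BY group */
  if first.sex then do;
    _t = datetime();
    put 'BY group start: ' sex= _t= datetime23.3;
  end;
  drop _t;
run;
```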
Ultimately, though, the information you already get from the log notes is what is most useful. In PROC SQL you need the STIMER information because SQL does several things at once, while SAS lets (or makes) you do everything step by step. Example:
PROC SQL;
create table X as select * from A,B where A.ID=B.ID;
quit;
is one step - but in SAS this would be:
proc sort data=a; by ID; run;
proc sort data=b; by ID; run;
data x;
merge a(in=a) b(in=b);
by id;
if a and b;
run;
For that you would get information on the duration of each of those steps (the two sorts and the merge) in SAS, which is similar to what STIMER would tell you.
No way.
PROC SQL STIMER logs timing for each separately executable SQL statement/query.
In a data step, as you may know, the implicit loop runs one observation at a time, so statement-level timing would be per observation (transactional, so to speak). Even then, it would not show where the time is actually being spent: waiting for disk reads, writes, etc.
So I guess this won't be very usable. In general, SAS performance is I/O driven.