We have a large dataset, and when we run proc mixed or hpmixed on a sample of the data, we get the same coefficients for the continuous variables. However, for HPMIXED, SAS ignores the reference values we give it for categorical variables, and the intercept becomes 0.
Is there an option to force HPMIXED to use an intercept and to 'not ignore' the ref="" values we choose?
Thank you in advance for your help!
Related
I have a SAS data set with missing data in multiple columns. I would like replace the missing data with a prediction based on the other data in the data set. Here a link that describes the method but doesn't show me how to do it. How do I replace the missing values with a prediction?
EDIT:
The method I had in mind was just using Proc Reg then apply the coefficents to the missing data to generate the estimate. Does this answer your question?
PROC STDIZE, PROC EXPAND, and PROC MI are all capable of performing different kinds of imputations on your data depending on exactly how you want do determine the 'prediction'.
For simple things like replacing with the mean, PROC STDIZE is the way to go. PROC MI is the most advanced - it performs multiple imputation. PROC EXPAND is appropriate if you have time-series data, as it will try to work out what the correct value is for that point in the time series.
If you have missing data in multiple columns you'll require multiple regressions. This probably isn't a good way to do this, but to answer the question - what you're requesting is called scoring a dataset and you can use PROC SCORE.
An alternative method is in your regression procedure request an OUTPUT data set that contains the predicted values for that regression.
output out=predicted1 p=pred_var_missing;
As a matter of methodology, I recommend #Joe's method instead.
Adding to #Joe 's answer, if you tell us why you want to do this imputation, we can provide better advice. I wrote a blog post called How to Ask a Statistics Question that may help.
However, often, single imputation is a bad method. More particularly, if you are going to do further analysis on this data (with the imputed values) then single imputation will underestimate the variability of the data and give wrong results.
PROC MI is usually a better approach.
I'm considering teaching my introductory statistics course in SAS Enterprise Guide. I want my students to be able to calculate p-values and percentiles for various distributions (binomial, normal, t, chi-square) with the drop-down menus if at all possible. For example, is there a way to do both of:
DATA pval;
pval=1-PROBBNML(0.5,25,15);
RUN;
PROC PRINT DATA=pval;
RUN;
and
DATA chi;
qchi=CINV(0.95,4);
RUN;
PROC PRINT DATA=chi;
RUN;
via the drop-down menus?
When you open a data set in SAS, there is a button at the top called 'Analyze'. This has some built in functions ,although they are a bit more advanced than calculating p values.
There is in fact a way of adding dropdown menus in EG to do what you need and it is call PROMPTS.
You could have a prompt that let you select between normal and chi-square for example. You can have distribution paramaters static knowing that some won't apply in some distributions or make it dinamically depending on your selection.
here is a nice article re prompts how do you use a variable prompt
hope this helps
You could create a new dataset (File/New/Data) and define parameters of the function as variables. You can then fill in one or several lines/examples with different parameters. Using 'query builder' then computed columns icon, you can create a new variable using the desired function (CINV, PROBBNML or other) which will store the result in a new dataset.
It would be better to use only one dataset by function but you can show the result for different values of the parameters which may be interesting for your students.
This might be a weird question. I have a data set contains data like agree, neutral, disagree...for many questions. There is not so many observations so for some question, one or more options has frequency of 0, say neutral. When I run proc freq, since neutral shows up in that variable, the table does not contain a row for neutral. I end up with tables with different number of rows. I would like to know if there is a option to show these 0 frequency rows. I will also need to run proc gchart for the same data set, and I will run into the same problem for having different number of bars. Please help me on this. Thank you!
This depends on how exactly you are running your PROC FREQ. It has the sparse option, which tells it to create a value for every logical cell on the table when creating an output dataset; normally, while you would have a cell with a missing value (or zero) in a crosstab, if that is output to a dataset (which is vertical, ie each combination of x and y axis value are placed in one row) those rows are left off. Sparse makes sure that doesn't happen; and in a larger (n-dimensional) crosstab, it creates rows for every possible combination of every variable, even ones that don't occur in the data.
However, if you're just doing
proc freq data=mydata;
tables myvar;
run;
That won't help you, as SAS doesn't really have anything to go on to figure out what should be there.
For that, you have to use a class variable procedure. Proc Tabulate is one of such procedures, and is similar to Proc Freq in its syntax (sort of). You need to either use CLASSDATA on the proc statement, or PRINTMISS on the table statement. In the former case, you do not need to use a format, I don't believe. In the latter case (PRINTMISS), you need to create a format for your variable (if you don't already have one) that contains all levels of the data that you want to display (even if it's just an identity format, e.g. formatting character strings to identical character strings), and specify PRELOADFMT on the proc statement. See this man page for more details.
Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline).
Problem/issue: I want to be able to calculate the value of a linear combination (i.e. using SAS as a calculator) of these dummy variables. For example, 2*B1 + 2*B2 + B3.
In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the parameters.
In SAS in a procedure such as PROC GLM, I think I should use the ESTIMATE statement, but I'm not sure how I would specify the "weights" for each variable in this case.
You are looking for PROC SCORE. This takes output regression or factor estimates and scores a new data set. See here for an example. http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_score_examples02.htm
FYI, PROC MODEL does allow this in the model statement, which may be less work than PROC SCORE. I know PROC MODEL can be used readily in place of PROC REG, but I'm not sure how advanced of modeling PROC MODEL does, so it may not be an option for more complex models. I was hoping for something with less coding, but given the nature of SAS, I think this and PROC SCORE are the best I'm going to get.
What if you add your linear combination as a variable in your input dataset?
data myDatasetWithLinCom;
set mydata;
LinComb=2*(x=1)+ 2*(x=2)+(x=3); /*equvilent to 2*B1 + 2*B2 + B3*/
run;
then you can specify LinComb as one of the explanatory variables and you can lookup the coefficient directly from the output.
I have a single continuous variable with highly skewed distribution. I have log transformed it for normalization. while creating a histogram of the variable with PROC UNIVARIATE (SAS 9.3), is there a way by which I can plot the transformed variable, but keep the values of original variable on x axis ?
if this topic has been already discussed then, I would really appreciate if someone can provide a link. Thank You.
You could use the SAS Graph Template Language (GTL) to do this. The documentation contains plenty of examples that you should be able to change and modify to your needs. The output from PROC UNIVARIATE is produced by the GTL so you should be able to generate something similar.
Take the output dataset from proc univariate and base the plot off that. You will need to reverse the transformations first.
Documentation for the GTL:
http://support.sas.com/documentation/cdl/en/grstatgraph/65377/HTML/default/viewer.htm#p1sxw5gidyzrygn1ibkzfmc5c93m.htm