CONTRAST for a CLASS variable with more than two levels in PROC GLM (SAS)

Background: When we test the significance of a categorical variable that has been coded as dummy variables, we need to test simultaneously that all of the dummy-variable coefficients are 0. For example, if X takes on the values 0, 1, 2, 3 and 4, I would fit dummy variables for levels 1-4 (assuming I want 0 as the baseline) and then test B1=B2=B3=B4=0 simultaneously.
If this is the only variable in my model, I can use the overall F-statistic to do this. However, if I have other covariates, the overall F-test no longer tests this hypothesis, because it tests all of the coefficients (including those of the covariates) at once.
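As a minimal Python sketch of the dummy coding described above (the function `dummy_code` is a hypothetical helper, not part of SAS or Stata), a five-level x with baseline 0 becomes four indicator columns, one per coefficient B1-B4. The joint hypothesis B1=B2=B3=B4=0 therefore has 4 numerator degrees of freedom.

```python
def dummy_code(values, levels, baseline):
    """Return one 0/1 indicator per non-baseline level for each observation."""
    non_baseline = [lvl for lvl in levels if lvl != baseline]
    return [[1 if v == lvl else 0 for lvl in non_baseline] for v in values]


x = [0, 1, 2, 3, 4, 2]
rows = dummy_code(x, levels=[0, 1, 2, 3, 4], baseline=0)
for value, row in zip(x, rows):
    print(value, row)   # the baseline level 0 maps to [0, 0, 0, 0]
```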
In Stata, for example, this is (very, very) simply carried out by the testparm command as:
testparm i.x (after fitting the desired regression model), where the i. prefix tells Stata that x is a categorical variable to be treated as a set of dummy variables.
Question/issue: I'm wondering how I can do this in SAS with a CONTRAST (or ESTIMATE?) statement while fitting a regression model with PROC GLM. I have scoured the internet and haven't found what I'm looking for, so I'm guessing I'm missing something very obvious. All of the examples I've seen are NOT for categorical (CLASS) variables, but rather for two separate (say, continuous) variables. The CONTRAST statement in that case would simply be something like
CONTRAST 'Contrast1' y 1 z 1;
Otherwise, they're for calculating hypotheses like H_0: B1-B2=0.
I feel like I need to break the hypothesis down into smaller pieces and determine the set that defines the whole relationship, but I'm not doing it correctly. For example, for B1=B2=B3=B4=0, I thought I might say B1=B2=B3=-B4 and then define (1) B1=-B4, (2) B2=-B4 and (3) B2=B3. I tried to code this as a CONTRAST statement as follows (say X is in descending order in the data set: 4-0):
CONTRAST 'Contrast' x -1 0 0 1 0
x -1 0 1 0 0
x 0 1 1 0 0;
I know this is not correct, and I tried many, many variations and whatever random logic I could come up with. My problem is I have relatively novice-level knowledge of CONTRAST (and unfortunately have not found great documentation to help with this) and also of how this hypothesis test should really be formulated for the sake of estimation (do I try to split it up into pieces as I did above, or...?).
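One way to see what the hypothesis requires, sketched in Python under the assumption that the contrast coefficients apply to the five level means of x in ascending level order (PROC GLM's default ORDER=FORMATTED orders an unformatted numeric CLASS variable by value, not by its order in the data set): B1=B2=B3=B4=0 is equivalent to four linearly independent comparisons with level 0, each row summing to zero. In SAS those rows would go into a single CONTRAST statement with the rows separated by commas, something like `contrast 'x overall' x -1 1 0 0 0, x -1 0 1 0 0, x -1 0 0 1 0, x -1 0 0 0 1;`. When x is not involved in any interaction, that 4-df test should agree with the Type III test for x in the same model. The `rank` helper below is illustrative only.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Columns = level means mu0..mu4 (ascending order assumed);
# each row compares one level with the baseline level 0.
joint = [
    [-1, 1, 0, 0, 0],
    [-1, 0, 1, 0, 0],
    [-1, 0, 0, 1, 0],
    [-1, 0, 0, 0, 1],
]
# The three rows tried in the question, as written.
attempt = [
    [-1, 0, 0, 1, 0],
    [-1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]

for name, L in [("joint", joint), ("attempt", attempt)]:
    print(name, "rank =", rank(L), "row sums =", [sum(row) for row in L])
```

A rank of 4 with every row summing to 0 is what the joint 4-df test needs. The three rows in the question have rank 3, and the third row sums to 2, so it is not a contrast among the x levels and would not be estimable on its own in a model with an intercept.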

You can actually get SAS to do this for you with PROC GENMOD, using the CLASS statement and the TYPE3 option:
proc genmod data=input;
class classvar ;                        /* classvar holds the class levels */
model slope= classvar othervar/ type3;  /* TYPE3 requests Type 3 likelihood-ratio tests */
run;
quit;
In the example above, the class levels are in the classvar variable, and othervar is my other covariate.
At the end of the output you will see a table labeled LR Statistics For Type 3 Analysis. The row for classvar is the likelihood-ratio test that all of the class effects are 0.
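The statistic in that row is a likelihood-ratio test: twice the difference in log-likelihood between the model with classvar and the model without it, referred to a chi-square distribution whose degrees of freedom equal the number of non-redundant classvar parameters (for example, 4 if classvar has five levels, as in the question). A Python sketch of that arithmetic, using hypothetical log-likelihood values and a closed-form chi-square tail that is valid only for even degrees of freedom:

```python
import math


def lr_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic 2 * (l_full - l_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Hypothetical log-likelihoods, not taken from any real output.
lr = lr_statistic(-100.0, -105.0)
p = chi2_sf_even_df(lr, df=4)
print(lr, round(p, 4))   # 10.0 0.0404
```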

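Another approach that works is PROC REG with a TEST statement (e.g., TEST x1=0, x2=0, x3=0, x4=0;, where x1-x4 are dummy variables you create yourself, since PROC REG has no CLASS statement). This doesn't answer my original question about PROC GLM, but it is an option if PROC REG gets the job done for your type of model.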
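For an ordinary least-squares fit with the default (non-robust) variance estimate, the F statistic from a joint test like this one (and from Stata's testparm after regress) can be written as the extra-sum-of-squares F. A minimal Python sketch with hypothetical residual sums of squares; `partial_f` is an illustrative helper, not a SAS or Stata function:

```python
def partial_f(rss_reduced, rss_full, q, n, p):
    """F statistic for q linear restrictions (e.g. four dummy coefficients = 0).

    rss_reduced: residual SS of the model without the dummies
    rss_full:    residual SS of the model with them
    n:           number of observations
    p:           parameters in the full model, including the intercept
    """
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - p))


# Hypothetical: 50 observations; full model = intercept + 4 dummies + 1 covariate.
f = partial_f(rss_reduced=120.0, rss_full=100.0, q=4, n=50, p=6)
print(round(f, 6))   # 2.2
```

The result is compared with an F distribution on q and n - p degrees of freedom (here 4 and 44).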


Why does SAS not give p-values for the fixed effects in the mixed model?

I work with a mixed model to see the effects of variables. The code I use is:
proc mixed data=pb2;
class treat_a treat_b hoknr_ day;
model conc=treat_a|treat_b hoknr_/outp=residuals1 residual;
repeated day/subject=hoknr_(treat_a treat_b)type=vc;
run;
The output has no p-values for treat_a, treat_b or treat_a*treat_b, but it does for hoknr_. I excluded the REPEATED statement, simplified the model, and changed the CLASS statement, but I still got no p-values for all of my fixed effects. I have used this model before and it worked; now that I have fitted it to this data set, it does not fully work.
Edit 1
The table of Type 3 Tests of Fixed Effects looks like this:
[image: Type 3 table]
The treatments could be non-estimable (treat_a is yes or no, and likewise for treat_b). I changed yes/no to 0 or 1, which did not change the Type 3 table. I have worked before with treatments expressed in words, and that did not result in a table like this.
Edit 2: When SOLUTION is added to the MODEL statement, this is the result: [image: Solution for Fixed Effects table]
What is wrong with this model that it does not show p-values for all fixed effects?
You might need to specify the type of test you want with the HTYPE= option in the MODEL statement. It sounds like one of those procs where someone didn't program the function initially and it was added as an afterthought late in development (not unlike the SHOWPVALUES option in PROC GLMSELECT; to this day I think that's the weirdest option in a regression proc).
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect015.htm#statug.mixed.mixedmodelhtype
Type 3 Tests of Fixed Effects
You can use the HTYPE= option in the MODEL statement to obtain tables
of Type 1 (sequential) tests and Type 2 (adjusted) tests in addition
to or instead of the table of Type 3 (partial) tests.
The ODS table names are "Tests1" for the Type 1 tests, "Tests2" for the Type 2 tests, and "Tests3" for the Type 3 tests.
Or, it could be that some of your fixed effects are not estimable.
The reason for the 0s in your result is that treat_a and treat_b are categorical (CLASS) variables, and treat_a = 1 and treat_b = 1 are the reference levels, whose estimates are set to 0. That is why p-values are missing in your Solution table. Likewise, interaction terms that involve treat_a = 1 or treat_b = 1 will have no p-values.
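A small Python illustration of why a reference level gets a 0 estimate: with an intercept plus one indicator column per level of a two-level CLASS variable (the data below are hypothetical), the intercept column is exactly the sum of the level columns, so the three parameters are not separately estimable. SAS's less-than-full-rank coding resolves this by setting the estimate for the last level to 0, which is why that row of the Solution table shows no standard error or p-value.

```python
# Hypothetical two-level CLASS variable treat_a with levels "no" and "yes".
treat_a = ["no", "yes", "yes", "no", "yes"]

# Less-than-full-rank (GLM-style) coding: intercept plus one column per level.
design = [[1, int(v == "no"), int(v == "yes")] for v in treat_a]

# The intercept column equals the sum of the two level columns,
# so only two of the three parameters can be estimated separately.
aliased = all(row[0] == row[1] + row[2] for row in design)
print(aliased)   # True
```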

Value of coefficient (Beta1) at different values of other covariate (X2), hopefully graphed

(cross-posted at http://www.statalist.org/forums/forum/general-stata-discussion/general/1370770-margins-plot-of-treatment-effect-rather-than-y-for-values-of-a-covariate)
I'm running a multiple regression (the outcome variable is continuous; it happens to be GPA). The covariate of interest is a dummy variable for treatment status; another of the covariates is a pre-score. We want to look at how the treatment effect differs at various values of pre-score. The structure of the model is not complicated:
regress GPA treatment pre_score X3 X4 X5...
What I want is a graph that shows what the treatment effect is (values of Beta1) at various values of pre-score (X2). It's straightforward to get a graph with values of the OUTCOME at various values of X2:
margins, at(pre_score= (1(0.25)5)) post
marginsplot
I have consulted an array of resources and tried alternatives using marginscontplot, coefplot with recast, the dydx() option, and so forth. I remain unsuccessful. But it seems like there must be a way to do this; wanting to know whether a treatment effect varies with the value of a control (say, income) must be common.
Can anyone direct me to the right command, or options for Margins, to output values of Beta1 (coefficient on treatment dummy), rather than of Y (GPA), at values of the pre_score?
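The question was resolved on Statalist. It turns out that margins alone can't do what I was trying to do; the model needs to include a treatment × pre_score interaction term (for example, i.treatment##c.pre_score in factor-variable notation). Then it's simple: margins with dydx(treatment) at the desired pre_score values, followed by marginsplot, plots the treatment effect.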
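With the interaction in the model, GPA = b0 + b1*treatment + b2*pre_score + b3*treatment*pre_score + ..., the treatment effect at a given pre_score is b1 + b3*pre_score, with variance Var(b1) + pre_score^2*Var(b3) + 2*pre_score*Cov(b1, b3). A Python sketch that evaluates both over the grid 1(0.25)5 used in the margins call above, with made-up coefficient and covariance values (substitute the estimates from your own fit); the normal-approximation 95% interval is an assumption of this sketch:

```python
import math


def treatment_effect(b1, b3, pre):
    """Effect of the treatment dummy at a given pre_score: b1 + b3 * pre_score."""
    return b1 + b3 * pre


def effect_se(var_b1, var_b3, cov_b1_b3, pre):
    """Standard error of the linear combination b1 + b3 * pre_score."""
    return math.sqrt(var_b1 + pre ** 2 * var_b3 + 2 * pre * cov_b1_b3)


# Hypothetical estimates and (co)variances, not from any real fit.
b1, b3 = 0.40, -0.05
var_b1, var_b3, cov_b1_b3 = 0.010, 0.0004, -0.0015

grid = [1 + 0.25 * k for k in range(17)]   # 1, 1.25, ..., 5
for pre in grid:
    eff = treatment_effect(b1, b3, pre)
    se = effect_se(var_b1, var_b3, cov_b1_b3, pre)
    print(f"{pre:4.2f}  effect={eff:6.3f}  "
          f"95% CI=({eff - 1.96 * se:6.3f}, {eff + 1.96 * se:6.3f})")
```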

Extract a single value from a variable in Stata

I have variables:
set obs 1000
g X= rnormal(0,1)
egen t=fill(1 2)
I need to generate a new variable that would consist of one value: the first value of X. I tried:
separate X, by(_n <= 1)
and
gen X1 = X if t<=1
But these options give me a 1000x1 vector with the value I need in the first cell and 999 empty cells. How can I generate simply a one value variable:1x1?
You have to write two lines of code, my friend:
gen X1 = X if t<=1
replace X1=X1[_n-1] if missing(X1[_n])
and
local my_parameter=X1[1]
and then you can happily use your `my_parameter' macro in your ARMA regressions:
. di `my_parameter'
-.44087866
Remember: to use a macro (more usually called a parameter in other languages) in a regression in Stata, you need to enclose its name in a left quote (`) and a right quote (').
I don't disagree with the other two helpful answers already posted, but when I read "How can I generate simply a one value variable:1x1?", I can't help but think you are looking for a scalar or a macro.
If that is true, then you might be better off with
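The two-line trick works because Stata's replace processes observations in order, so X1[_n-1] already holds the value filled in for the previous observation. A Python sketch of that carry-forward logic (the function name and the use of None for Stata's missing value are illustrative assumptions):

```python
def carry_forward(values):
    """Mimic `replace X1 = X1[_n-1] if missing(X1)` run once over the data.

    Observations are visited in order and each one sees the already-updated
    value of the one before it, so one pass fills every missing value below
    the first non-missing one.
    """
    out = list(values)                 # leave the input untouched
    for i in range(1, len(out)):
        if out[i] is None:             # None stands in for Stata's missing (.)
            out[i] = out[i - 1]
    return out


x1 = [1.5, None, None, None]           # only the first observation is non-missing
print(carry_forward(x1))               # [1.5, 1.5, 1.5, 1.5]
```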
sum X in 1
di r(mean)
From here, storing this value to use later is trivial:
sca MyVar = r(mean)
From help summarize, you will see that sum stores the mean, min and max among many other useful measures.
To see for yourself, run return list after the call to sum to see what is returned.
By using in 1 you are restricting the summarize command to only run for the first observation. Naturally then, many of the scalars returned by summarize will equal the value you desire.
If you wish, you may also precede sum with quietly to suppress the output, or add the meanonly option to calculate only the mean along with suppressing the display.
Perhaps this will point you in a helpful direction
generate X1 = X[1]
The point is that X[1] is the value of X in the first observation. Now having said that, what do you want to do with that value? Your dataset has 1000 observations. Do you want a local or global macro? A scalar? If you intend to use it in a formula applied to all 1000 observations, then perhaps a variable with the same value for every observation will be sufficient.

Dummy and Heckman

I'm using a Heckman selection model, which consists of two equations: a probit as the selection equation and a multiple regression as the outcome equation.
How can I put dummy variables into those equations?
Do we have to put the variables into logarithmic form?
How can I make logarithmic variables with Stata?
Thank you.
Here's an example of how you might do what you ask. The example looks at the effect of being a union member on log wages:
webuse union3
gen log_wage = ln(wage)
etregress log_wage age grade i.smsa i.black tenure, treat(union = i.south i.black tenure) twostep
etregress estimates an average treatment effect of an endogenous binary-treatment variable. In plain English, that means the "first-stage" is a probit. Estimation is by either full maximum likelihood or a two-step consistent estimator, as above.
The dummies are created on the fly by putting an i. in front of the covariates. This is called factor variable notation, and it also makes interactions a breeze. You can also do tab race, gen(d_) to create d_1, d_2, and d_3 (3 race dummies, one of which you can drop).
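The question asks about the Heckman selection model, while the answer above uses etregress; both have a textbook two-step version in which a probit is fitted first and a correction term computed from its linear index is added to the second-stage regression. A Python sketch of those textbook correction terms, evaluated at a single index value (the function names are illustrative; this is not Stata's internal code, and heckman or etregress with the twostep option handle the estimation and the corrected standard errors themselves):

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazard(z, treated):
    """Correction term from a first-stage probit index z = w'gamma.

    treated (or selected):  phi(z) / Phi(z)   (inverse Mills ratio)
    not treated:           -phi(z) / (1 - Phi(z))
    """
    if treated:
        return norm_pdf(z) / norm_cdf(z)
    return -norm_pdf(z) / (1.0 - norm_cdf(z))


print(round(hazard(0.0, True), 6), round(hazard(0.0, False), 6))   # 0.797885 -0.797885
```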

Combining a list of observations into one single variable in Stata

I ran 500 simulations in Stata, i.e., I drew 500 samples, and each sample contains 10 observations. I want to compute a mean for each sample and combine all 500 means into one variable, because I need to plot a histogram of the means. Currently I have 500 samples, named X1, X2, ... X500, where each X has 10 elements. I want to get a mean for each X and plot a histogram of the means. Can someone please show me how to do that? I tried to generate a new variable for the mean, i.e. X1mean = mean(X1), but this wouldn't work, because all 10 empty elements would be filled with the mean.
"Please tell me the code" questions are widely considered off-topic here. See https://stackoverflow.com/help/on-topic : "Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results."
There are various ways to do this. One is to collapse and then xpose or reshape long. In fact, you could have produced a combined sample of 500 x 10 in the first place.
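Another is to loop over the variables like this:
set obs 500                          // expand to 500 observations, one per sample mean
gen mean = .
quietly forval j = 1/500 {
    su X`j', meanonly                // mean of sample j is left in r(mean)
    replace mean = r(mean) in `j'    // store it in observation j
}
histogram mean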
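The same idea as a Python sketch, with simulated data standing in for X1-X500 (the seed, the normal draws, and the 0.25 bin width are arbitrary choices for illustration): compute one mean per sample, keep all 500 means in a single list, and bin them for a rough text histogram.

```python
import math
import random
import statistics
from collections import Counter

random.seed(1)
# Hypothetical stand-in for X1..X500: 500 samples of 10 standard-normal draws.
samples = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(500)]

# One mean per sample, all stored in a single list (like the `mean` variable).
means = [statistics.mean(s) for s in samples]

# Rough text histogram: count the means falling in bins of width 0.25.
width = 0.25
bins = Counter(math.floor(m / width) * width for m in means)
for left in sorted(bins):
    print(f"{left:6.2f} {'*' * bins[left]}")
```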
What you are presumably alluding to is code such as
egen X1mean = mean(X1)
That would be no use, but not for the reason you mention, as identical values can always be ignored: it would be no use because similar code would just produce 500 more variables. Note that mean() would not work with generate as mean() is an egen function.
The terminology you seek is observations, not elements.