Consider two variables sex and age, where sex is treated as categorical variable and age is treated as continuous variable. I want to specify the full factorial using the non-interacted age-term as baseline (the default for sex##c.age is to omit one of the sex-categories).
The best I could come up with, so far, is to write out the factorial manually (and omitting `age' from the regression):
reg y sex#c.age i.sex
This is mathematically equivalent to
reg y sex##c.age
but allows me to directly infer the regression coefficients (and standard errors!) on the interaction term sex*age for both genders.
Is there a way to stick to the more economic "##" notation but make the continuous variable the omitted category?
(I know the manual approach has little overhead notation in the example given here, but the overhead becomes huge when working with triple-interaction terms).
Not a complete answer, but a workaround: One can use `lincom' to obtain linear combinations of coefficients and standard errors. E.g., one could obtain the combined effects for both sexes as follows:
reg y sex##c.age
forval i = 1/2 {
lincom age + `i'.sex#c.age
}
Related
I am meant to find the the variable that accounts for the highest percentage of variance in dependent variable SalePrice. I use the following code to find that the natural log of the SalePrice had the highest r-square value of over 90%:
proc reg data = proj.team4;
model SalePrice = Log_Price;
title "Regression of SalePrice and Log_Price";
run;
The varaible that came in second was Overall_Qual with a r-square value of about 50%.
I must write a prediction equation that will predict the value of SalePrice given the best significant predictor. Even though the Log_price r-square is much higher, am I right to think that the Overall_Qual variable should be used in the prediction equation instead?
My reasoning being that an equation that looks like this:
SalePrice = m(ln(SalePrice)) + b
would never predict the value of SalePrice since the value of SalePrice is needed in order to carry out the equation.
Though this mathematically makes some sense to try and fit any function over a domain by a linear function, this is statistical nonsense, because it is not telling anything about what determines SalesPrice.
When defining a mixed integer linear programming problem using pulp, one may define sos like so:
x1 = LpVariable('x1', cat = LpInteger)
x2 = LpVariable('x2', cat = LpInteger)
prob.sos1['sos'] = x1 + 2*x2
(an "sos", or specially ordered set, is a special constraint specifying that only a single variable in the set may be nonzero).
We see that this allows specifying weights for the sos variables (1,2 in this case). Presumably they define the precedence of each variable, i.e. which variables to allow to be nonzero first when branching.
But how exactly are the weights defined?
The underlying solver is coin-or-cbc, and I couldn't find anything about how they use SOS weights.
Weights can be used for branching, although not all solvers use them that way. I believe CBC does, but you'll probably need to check the source code to confirm.
Weights in SOS2 are often needed to specify the ordering (SOS2 has the concept of neighbors). SOS1 does not have this issue.
Finally, if you have good bounds, binary variables are often better than SOS1 variables. Solvers do better bounding and generate better cuts when using binary variables. My rule is: if you can formulate a SOS1 structure with binary variables using good big-M values use binary variables. If you cannot find good big-M values, consider SOS1.
Is it possible to constrain parameters in Stata's xttobit to be non-negative? I read a paper where the authors said they did just that, and I am trying to work out how.
I know that you can constrain parameters to be strictly positive by exponentially transforming the variables (e.g. gen x1_e = exp(x1)) and then calling nlcom after estimation (e.g. nlcom exp(_b[x1:_y]) where y is the independent variable. (That may not be exactly right, but I am pretty sure the general idea is correct. Here is a similar question from Statlist re: nlsur).
But what would a non-negative constraint look like? I know that one way to proceed is by transforming the variables, for example squaring them. However, I tried this with the author's data and still found negative estimates from xttobit. Sorry if this is a trivial question, but it has me a little confused.
(Note: this was first posted on CV by mistake. Mea culpa.)
Update: It seems I misunderstand what transformation means. Suppose we want to estimate the following random effects model:
y_{it} = a + b*x_{it} + v_i + e_{it}
where v_i is the individual random effect for i and e_{it} is the idiosyncratic error.
From the first answer, would, say, an exponential transformation to constrain all coefficients to be positive look like:
y_{it} = exp(a) + exp(b)*x_{it} + v_i + e_{it}
?
I think your understanding of constraining parameters by transforming the associated variable is incorrect. You don't transform the variable, but rather you fit your model having reexpressed your model in terms of transformed parameters. For more details, see the FAQ at http://www.stata.com/support/faqs/statistics/regression-with-interval-constraints/, and be prepared to work harder on your problem than you might have expected to, since you will need to replace the use of xttobit with mlexp for the transformed parameterization of the tobit log-likelihood function.
With regard to the difference between non-negative and strictly positive constraints, for continuous parameters all such constraints are effectively non-negative, because (for reasonable parameterization) a strictly positive constraint can be made arbitrarily close to zero.
I am running regressions with fixed effects in Stata, using areg, and I just realized its reports a constant term. Allegedly, what areg does is perform a "within" transformation to the data and then run OLS on the transformed data. However, the constant term from the original model is destroyed by the "within" transformation.
Therefore, what does the constant reported by areg mean? Is it a programming mistake? I don't think so, because areg does not allow the -nocons- option, and it would seem that the reason is connected to the meaning of the constant.
With areg (as well as xtreg with the fe option), the intercept is fit so that y-bar minus x-bar times beta-hat is equal to zero. In other words, Stata calculates an intercept that makes the prediction calculated at the means of the independent variables equal to the mean of the dependent variable.
As said in the Stata manual:
The intercept reported by areg deserves some explanation because, given k mutually exclusive and exhaustive dummies, it is arbitrary. areg identifies the model by choosing the intercept that makes the prediction calculated at the means of the independent variables equal to the mean of the dependent variable: y = xb.
areg does give a different constant than using regress.
areg is kind of tricky as it is based on the assumption that the (absorbed) no. of groups will not increase with sample size.
I have a proc discrim statement which runs a KNN analysis. When I set k = 1 then it assigns everything a category (as expected). But when k > 1 it leaves some observations unassigned (sets category as Other).
I'm assuming this is a result of deadlock votes for two or more of the categories. I know there are ways around this by either taking a random one of the deadlocked votes as the answer, or taking the nearest of the deadlocked votes as the answer.
Is this functionality available in proc discrim? How do you tell it how to deal with deadlocks?
Cheers!
Your assumption that the assignment of an observation to the "Other" class results from the same probability of assignment to two or more of the designated classes is correct when the number of nearest neighbors is two or more. You can see this by specifying the PROC DISCRIM statement option, OUT=SASdsn, to write a SAS output data set of how well the procedure classified the input observations. This output data set contains probabilities for assignment to each of the designated classes. For example, using two nearest neighbors (K=2) with the iris data set yields five observations that the procedure classifies as ambiguous, with a probability of 0.50 for being assigned to either the Versicolor or the Virginica class. From the output data set, you can select these ambiguously classified observations and assign them randomly to these classes in a subsequent DATA step. Or, you can compare the values of the variables used to classify these ambiguously classified observations to the means of these values for each of the classes, perhaps by calculating a squared distance +/- standardized by the standard deviation of each value and by assigning the observation to the "closest" class.