I have two variables containing a state identifier and a year. If I want to create dummy variables indicating each state, I usually write the following code:
tab state_id, gen(state_id_)
This will give me a group of variables, state_id_1, state_id_2, etc. But what can I do if I want a set of dummy variables for the interaction of state and year, for instance a dummy variable indicating state 1 in year 2005?
Have you tried looking at xi (https://www.stata.com/manuals13/rxi.pdf)? It will create dummies for each of the categorical variables and for the interaction of those two. So if you do:
xi i.state*i.year
This should give you what you are looking for, but note that by default xi omits the first category of each of your categorical variables as the base category.
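As a rough sketch of what this looks like in practice, the lines below build a small made-up panel (the data, and the variable name state1_2005, are assumptions for illustration only), run xi on it, and also create the single "state 1 in 2005" dummy by hand:

```stata
* Toy data (hypothetical): 3 states observed in 2005-2008
clear
set obs 12
generate state_id = ceil(_n/4)
generate year = 2005 + mod(_n - 1, 4)

* xi creates the main-effect and interaction dummies, all prefixed with _I
xi i.state_id*i.year
describe _I*

* A single hand-made dummy for state 1 in 2005
generate byte state1_2005 = (state_id == 1 & year == 2005)
list state_id year state1_2005 in 1/5
```

If only one or two specific state-year cells are needed, the hand-made version is probably the simplest route.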
The example below reproduces my problem. There is a string variable which takes several values. I want to create a global list and iterate over it in a loop. But it does not work. I've tried several versions without success. Here is the example code:
webuse auto, clear
levelsof make // list of car makes
global MAKE r(levels) // save levels in global list
foreach i in $MAKE { // loop some command over saved list
sum if make == "`$MAKE'" // ERROR 198, invalid 'Concord'
}
Using "`$MAKE'" or $MAKE does not yield the desired output.
Any ideas what I am doing wrong?
Normally, for lists to work, they should be saved as in A B C D [...]. In my case, levelsof produces a list of the following kind:
di $MAKE
`"AMC Concord"' `"AMC Pacer"' `"AMC Spirit"' `"Audi 5000"' `"Audi Fox"' `"BMW 320i"' [...]
This is clearly not what is needed, but I am not sure how to get what I need.
Here is a solution. Note that I am using a local instead of a global. The difference is only scope. Only use global if you need to reference the value across do-files. You can remove the display lines below.
*Sysuse reads this data from disk, it comes with all Stata installations
sysuse auto, clear
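*Use levelsof with its local() option to store the levels directly in a local.
*This copies the list as text; assigning r(levels) with = evaluates it as an
*expression, which can truncate long lists in older Stata versions
levelsof make, local(all_makes)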
*Loop over the local like this. Note that foreach creates a local, in this
*case called this_make that stores the elements in the local one per iteration
foreach this_make of local all_makes {
display "`this_make'"
sum if make == "`this_make'"
}
If global is what you need, then you simply change it to this:
*Sysuse reads this data from disk, it comes with all Stata installations
sysuse auto, clear
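*Use levelsof, and copy the returned r(levels) into the global as text.
*Assigning with = evaluates an expression, which can truncate long lists
*in older Stata versions
levelsof make
global all_makes `"`r(levels)'"'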
*Loop over the global like this. Note that foreach creates a local, in this
*case called this_make that stores the elements in the global one per iteration
foreach this_make of global all_makes {
display "`this_make'"
sum if make == "`this_make'"
}
There is a fine accepted answer but plenty more can be said. See for example this FAQ.
I am positive about levelsof as its original author, but for the purpose specified, to loop over the levels of a variable, it can be a lot cleaner to use egen, group() and loop over the integer levels of that variable. See the FAQ just linked for more. The example in the original question is a case in point, as looping over distinct string values can be tricky with a need to use double quotes " " and to watch out for spaces and so forth.
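A minimal sketch of the egen, group() approach, using the auto data from the question (the names make_id, n_groups and this_make are just illustrative choices, and summarizing price stands in for whatever command is really wanted):

```stata
sysuse auto, clear

* Map each distinct make to an integer 1, 2, ...; the label option keeps
* the original strings as value labels
egen make_id = group(make), label

* r(max) is the number of groups, since group() numbers them from 1
summarize make_id, meanonly
local n_groups = r(max)

forvalues i = 1/`n_groups' {
    * Look up the string value behind the current integer level
    local this_make : label (make_id) `i'
    display "`this_make'"
    summarize price if make_id == `i'
}
```

Because the loop runs over integers, there is no quoting to get right inside the if condition.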
The underlying problem is not revealed, but it is worth underlining that by: and its sibling commands such as statsby, or commands similar in spirit such as rangestat from SSC, offer, in effect, looping without looping.
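To illustrate "looping without looping", here is a small sketch on the auto data. Grouping on foreign and summarizing price are assumptions made only to keep the output short; mean_price and sd_price are made-up names:

```stata
sysuse auto, clear

* by: repeats the command once per group, with no explicit loop
bysort foreign: summarize price

* statsby collects the per-group results into a new dataset
statsby mean_price=r(mean) sd_price=r(sd), by(foreign) clear: summarize price
list
```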
I have data on 18 states for 6 years (2009-2014). How can I create dummies that capture state and time effects simultaneously?
Without your data I have to assume a lot to answer this, but if I assume your state variable is a string and your year variable is numeric, then to create dummy variables for this I would put the two variables together and then encode them, like below:
tostring year, replace
gen state_year = state+year
encode state_year, gen(state_year_num)
state_year_num is then a single numeric variable identifying each state-year pair.
If you want a bunch of dummy variables you can add this line:
tabulate state_year_num, gen(dummy)
which will generate as many dummy variables as state-year pairs.
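An alternative sketch that leaves year numeric is to let egen build the state-year identifier. The toy data below only mimics the 18 states by 6 years layout described in the question, and state_year_id and the sy_ prefix are hypothetical names:

```stata
* Toy data (hypothetical): 18 string-coded states, years 2009-2014
clear
set obs 108
generate state = "S" + string(ceil(_n/6))
generate year = 2009 + mod(_n - 1, 6)

* One integer per distinct state-year pair, labelled with the pair's values
egen state_year_id = group(state year), label

* One dummy per state-year pair: sy_1, sy_2, ...
quietly tabulate state_year_id, generate(sy_)
ds sy_*
```

This works whether state is a string or numeric, since group() accepts both.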
I am setting up a dynamic model in Stata 13 by using the xtabond command. I need to add interaction between the lagged dependent variable and other variables, as attached here Formula
My attempts:
xtabond depvariable c.L1.depvariable#c.indvariable, lags(1) artests(2)
xtabond depvariable c.L1.depvariable*c.indvariable, lags(1) artests(2)
xtabond depvariable L1.depvariable*indvariable, lags(1) artests(2)
but they do not work.
Can someone please help me with the syntax? Or does some dirty alternative exist (for instance, creating interaction variable by hand)?
I am using Stata 12 and I have to run an ordered probit (oprobit) with a panel dataset. I know that the "oprobit" command is meant for cross-section analysis. The new version of Stata (Stata 13) has the "xtoprobit" command for random-effects ordered probit. I need a similar command for Stata 12. I have checked the "reoprob" command, but when I use it with my panel dataset I get the following error:
"factor variables and time-series operators not allowed"
That means you need to create your own dummy variables instead of using the factor variable notation i.dummyvar. Try this:
tab dummyvar, gen(D)
reg y D*
This creates a set of dummy variables (D1, D2, ...) reflecting the observed values of the tabulated variable.
Some of the older user-written commands do not know what to do with the factor variable notation, which is convenient, but fairly new.
You can also explore xi for more complicated tasks.
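As a quick check that hand-made dummies reproduce the factor variable notation, here is a sketch on the auto data (rep78 and price are stand-ins for your own variables, and rep_ is an arbitrary prefix). Leaving out the first dummy makes category 1 the base, which matches what i.rep78 does by default:

```stata
sysuse auto, clear

* rep78 takes the values 1-5, so this creates rep_1 ... rep_5
tabulate rep78, generate(rep_)

* Hand-made dummies, omitting rep_1 as the base category
regress price rep_2-rep_5

* Same model with factor variable notation, for comparison
regress price i.rep78
```

The rep_2-rep_5 list can then be passed to an older command that rejects the i. prefix.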
In Stata, is there a quick way to show the correlation between a variable and a bunch of dummies? In my data I have an independent variable, goals_scored in a game, and a bunch of dummies for the stadium played in. How can I show the correlation between goals_scored and i.stadium in one table, without getting the correlations between stadiums, which I do not care about?
Here's one way:
#delimit ;
quietly tab stadium, gen(D); // create dummies
foreach var of varlist D* {;
quietly corr goals_scored `var';
di as text "`: variable label `var'': " as result r(rho);
};
drop D*; // get rid of dummies
#delimit cr
cpcorr from SSC (install with ssc inst cpcorr) supports minimal correlation tables, i.e. only the correlations between one set and another set, without the others. But it's an old program (2001) and doesn't support factor variables directly. The indicator variables (a.k.a. dummy variables) would have to exist first.
If you store all of the stadium variables in a local, you would probably loop through them to pull the correlations.
1. If all stadium variables are placed next to each other in the dataset:
foreach s of varlist stadium1-stadium150 {
// do whatever
}
2a. If the stadium variables are not next to each other, use order to get there.
2b. If the variable names follow a pattern, there might be another workaround.
3. I would not use correlation for this. Depending on the distribution of goals, I would consider something else.
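For the case where the names follow a pattern, one possible workaround is a wildcard in the varlist. The sketch below assumes dummies named stadium1, stadium2, ... and uses made-up data with only three stadiums:

```stata
* Toy data (hypothetical): 100 games spread over 3 stadiums
clear
set obs 100
set seed 12345
generate goals_scored = floor(5*runiform())
forvalues k = 1/3 {
    generate byte stadium`k' = (mod(_n, 3) == `k' - 1)
}

* stadium* matches every variable whose name starts with "stadium",
* wherever it sits in the dataset
foreach s of varlist stadium* {
    quietly correlate goals_scored `s'
    display as text "`s': " as result %6.3f r(rho)
}
```

Note that the wildcard would also pick up any other variable whose name begins with stadium, so check the match with describe stadium* first.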