Iterate over list produced with levelsof - stata

The example below reproduces my problem. There is a string variable which takes several values. I want to create a global list and iterate over it in a loop. But it does not work. I've tried several versions without success. Here is the example code:
webuse auto, clear
levelsof make // list of car makes
global MAKE r(levels) // save levels in global list
foreach i in $MAKE { // loop some command over saved list
sum if make == "`$MAKE'" // ERROR 198, invalid 'Concord'
}
Using "`$MAKE'" or $MAKE does not yield desired output.
Any ideas of what am I doing wrong?
Normally, for lists to work, they should be saved as in A B C D [...]. In my case, levelsof produces a list of the following kind:
di $MAKE
`"AMC Concord"' `"AMC Pacer"' `"AMC Spirit"' `"Audi 5000"' `"Audi Fox"' `"BMW 320i"' [...]
So clearly not what is needed. But not sure how to get what I need.

Here is a solution. Note that I am using a local instead of a global. The difference is only scope. Only use global if you need to reference the value across do-files. You can remove the display lines below.
*Sysuse reads this data from disk, it comes with all Stata installations
sysuse auto, clear
*Use levelsof, and assign the returned r(levels) using a = to the local
levelsof make
local all_makes = r(levels)
*Loop over the local like this. Note that foreach creates a local, in this
*case called this_make that stores the elements in the local one per iteration
foreach this_make of local all_makes {
display "`this_make'"
sum if make == "`this_make'"
}
If global is what you need, then you simply change it to this:
*Sysuse reads this data from disk, it comes with all Stata installations
sysuse auto, clear
*Use levelsof, and assign the returned r(levels) using a = to the global
levelsof make
global all_makes = r(levels)
*Loop over the global like this. Note that foreach creates a local, in this
*case called this_make that stores the elements in the global one per iteration
foreach this_make of global all_makes {
display "`this_make'"
sum if make == "`this_make'"
}

There is a fine accepted answer but plenty more can be said. See for example this FAQ.
I am positive about levelsof as its original author, but for the purpose specified, to loop over the levels of a variable, it can be a lot cleaner to use egen, group() and loop over the integer levels of that variable. See the FAQ just linked for more. The example in the original question is a case in point, as looping over distinct string values can be tricky with a need to use double quotes " " and to watch out for spaces and so forth.
The underlying problem is not revealed but an extra comment is to underline that by: and its sibling commands such as statsby or commands similar in spirit such as rangestat from SSC offer, in effect, looping without looping.

Related

Is there a way to get the variable type?

I would like to pull out the categorical variables in Stata but it seems I am not doing the right thing. Any leads?
* Pull out variables with value labels
sysuse auto
label dir
local catvars = r(names)
foreach var of varlist _all{
if `var' in `catvars'{
di "`var'"
}
}
If the criterion for a categorical variable is that it has value labels, then ds is the simplest tool:
. sysuse auto, clear
(1978 automobile data)
. ds, has(vallabel)
foreign
ds returns a list of such variables (here just one) in r(varlist).
ds is an official command. The specific syntax just used, the option has(), was my addition in a community-contributed command later folded back into what is distributed with Stata itself (which is what "official" means here).
findname from the Stata Journal has the same functionality, and much more. It was written by me as an extension of ds. In some cases. as for this problem, the syntax is different, representing my second thoughts on what is clean, but users naturally may disagree.
. findname, vallabel
foreign
THe general question in the title can be answered too. help macro is the gateway to documentation of various low-level tools, which in essence underline most of what ds and findname do as a matter of convenience. findname now also uses some Mata functionality that isn't directly matched in Stata itself. Among the extra options of findname is local() to put the results of findname directly into a local macro; there is no such option in ds.

Stata: expand by the number of variables

This relates to a general question I'm asking myself, how can I use the results of some code in another code if Stata does not create new objects except these clandestine locals and globals?
I would like to combine:
di c(k)
and:
expand
which I R I would simply do by writing something like expand(di c(k)). How does Stata take care of wrapped functions?
edit: I'm fine with using locals and globals but I don't always know how to call them into a function.
edit2: for everyone else who has trouble keeping track of 'clandestine' globals and locals: macro list
The difficulty you have in using locals, globals, scalars, saved results is not obvious from your question. An example is:
clear
set more off
sysuse auto
keep rep78
summarize
return list
expand r(max)
Saved results may disappear when other commands are issued, but you can save them into a local, for example, and use them later:
local rmax = r(max)
display `rmax'
expand `rmax'

How do I delete observations with no data in Stata?

I have data with IDs which may or may not have all values present. I want to delete ONLY the observations with no data in them; if there are observations with even one value, I want to retain them. Eg, if my data set is:
ID val1 val2 val3 val4
1 23 . 24 75
2 . . . .
3 45 45 70 9
I want to drop only ID 2 as it is the only one with no data -- just an ID.
I have tried Statalist and Google but couldn't find anything relevant.
This will also work with strings as long as they are empty:
ds id*, not
egen num_nonmiss = rownonmiss(`r(varlist)'), strok
drop if num_nonmiss == 0
This gets a list of variables that are not the id and drops any observations that only have the id.
Brian Albert Monroe is quite correct that anyone using dropmiss (SJ) needs to install it first. As there is interest in varying ways of solving this problem, I will add another.
foreach v of var val* {
qui count if missing(`v')
if r(N) == _N local todrop `todrop' `v'
}
if "`todrop'" != "" drop `todrop'
Although it should be a comment under Brian's answer, I will add here a comment here as (a) this format is more suited for showing code (b) the comment follows from my code above. I agree that unab is a useful command and have often commended it in public. Here, however, it is unnecessary as Brian's loops could easily start something like
foreach v of var * {
UPDATE September 2015: See http://www.statalist.org/forums/forum/general-stata-discussion/general/1308777-missings-now-available-from-ssc-new-program-for-managing-missings for information on missings, considered by the author of both to be an improvement on dropmiss. The syntax to drop observations if and only if all values are missing is missings dropobs.
Just another way to do it which helps you discover how flexible local macros are without installing anything extra to Stata. I rarely see code using locals storing commands or logical conditions, though it is often very useful.
// Loop through all variables to build a useful local
foreach vname of varlist _all {
// We don't want to include ID in our drop condition, so don't execute the remaining code if our loop is currently on ID
if "`vname'" == "ID" continue
// This local stores all the variable names except 'ID' and a logical condition that checks if it is missing
local dropper "`dropper' `vname' ==. &"
}
// Let's see all the observations which have missing data for all variables except for ID
// The '1==1' bit is a condition to deal with the last '&' in the `dropper' local, it is of course true.
list if `dropper' 1==1
// Now let's drop those variables
drop if `dropper' 1==1
// Now check they're all gone
list if `dropper' 1==1
// They are.
Now dropmiss may be convenient once you've downloaded and installed it, but if you are writing a do file to be used by someone else, unless they also have dropmiss installed, your code won't work on their machine.
With this approach, if you remove the lines of comments and the two unnecessary list commands, this is a fairly sparse 5 lines of code which will run with Stata out of the box.

How to store a mean value in a local macro and then save it in another file?

I have a Stata file file1.dta and one of the variables is income. I need to calculate average_income, assign it to a local macro, and store in a different Stata file, New.dta.
I have tried the following in a do file:
#delimit;
clear;
set mem 700m;
use file1.dta;
local average_income = mean income;
use New.dta;
gen avincome = average_income;
However, it does not work.
One way to do this would be the following:
#delimit;
clear;
set mem 700m;
use file1.dta;
quietly: summarize income;
local average_income = r(mean);
use New.dta;
gen avincome = `average_income';
This overlaps with your other post, namely How to retrieve data from multiple Stata files?. You don't say why you think
use file1.dta;
local average_income = mean income;
will work, but the second line is just fantasy syntax. There are various ways to calculate the mean of a variable, the most common being to use summarize and pick up the mean from r(mean).
You should probably delete this question: it serves no long-term purpose.

Correlation with one variable and a lot of others

In Stata, is there a quick way to show the correlation between a variable and a bunch of dummies. In my data I have an independent variable, goals_scored in a game, and a bunch of dummies for stadium played. How can I show the correlation between the goals_scored and i.stadium in one table, without getting the correlations between stadiums, which I do not care about.
Here's one way:
#delimit;
quietly tab stadium, gen(D); // create dummies
foreach var of varlist D* {;
quietly corr goals_scored `var';
di as text "`: variable label `var'': " as result r(rho);
};
drop D*; // get rid of dummies
cpcorr from SSC (install with ssc inst cpcorr) supports minimal correlation tables, i.e. only the correlations between one set and another set, without the others. But it's an old program (2001) and doesn't support factor variables directly. The indicator variables (a.k.a. dummy variables) would have to exist first.
If you store all of the stadium variables in a local, you would probably loop through them to pull the correlations.
1.
If all stadium variables are placed next to each other in the dataset:
foreach s of varlist stadium1-stadium150 {
// do whatever
}
2a.
If the stadium variables are not next to each other, use order to get there.
2b.
If the variable names follow a pattern, there might be another workaround.
3.
I would not use correlation for this. Depending on the distribution of goals, I would consider something else.