I am trying to run a Stata foreach loop, but I keep getting an error saying that the variable does not exist, even though it appears in the Data Editor and I can display it with list some_column. This is what I am doing:
foreach x of varlist some_column1 some_column2{
list x
}
Could someone help me identify the problem?
You're asking Stata to list the variable x, which clearly you don't have. What you really want is to list the contents of the local macro x. To do that, enclose the macro name in a left single quote (`) and a right single quote ('), as in `x'.
clear all
set more off
sysuse auto
foreach x of varlist weight mpg {
list `x' in 1/10
}
See the manual: [P] macro. help foreach is filled with examples.
Related
I want to drop all variables that have a mean of 0. The code I'm using is
foreach var of varlist _all {
drop 'var' if mean 'var'==0
}
and I'm getting the error message mean not found.
How can I get around this?
There are several reasons why that won't work. First, consider this suggested solution:
foreach var of varlist _all {
su `var', meanonly
if r(mean) == 0 drop `var'
}
This will work with string variables too, as the request to summarize a string variable isn't illegal, and the mean will be returned as missing.
What's wrong with your code?
Problem 1. The sequence
mean `var' == 0
is just fantasy syntax. There isn't a mean function that you can apply in this context and if there were, the syntax would be different.
Problem 2. You can drop observations using an if qualifier or you can drop variables, but you can't mix the two syntaxes. It's hard even to know what the mix would mean, but it's illegal anyway. The deeper problem here is confusing the if command and the if qualifier. See also the help for drop.
Problem 3. As typed here you have used matching quotation marks for local macro references. It's possible to guess that you really used left and right quotation marks as otherwise you would have got a different error message. Nevertheless, your code as typed would not work for that reason also.
A wider comment is a reminder that a mean of zero doesn't imply that all values are zero. If you wanted just to drop variables with all values zero, then findname (Stata Journal) allows that:
findname, all(@ == 0)
drop `r(varlist)'
and there are extensions to allow missing values too.
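To make the distinction concrete, here is a minimal sketch that uses only built-in commands. It drops variables whose values are all zero and leaves alone a variable whose mean merely happens to be zero. The variables zero and plusminus are made up for illustration. Because a missing value counts as not equal to zero, any variable containing missing values is kept.

```stata
sysuse auto, clear

// hypothetical test variables, not part of the original question
generate zero = 0                          // every value is zero
generate plusminus = cond(mod(_n, 2), 1, -1)  // 37 ones and 37 minus ones: mean 0

foreach var of varlist _all {
    // skip string variables such as make
    capture confirm numeric variable `var'
    if _rc continue

    // count observations that differ from zero (missing counts as different)
    quietly count if `var' != 0
    if r(N) == 0 drop `var'
}

describe, simple
```

After the loop, zero is gone but plusminus survives, whereas a test on r(mean) == 0 would have dropped both.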
I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!
Your request is a bit confusing. First, this is precisely the purpose of a loop, and second, loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
summ `v'
}
The wildcard ? takes the place of one character. * could be used for more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough -- four very short lines of code.
Postscript
Upon further reflection (sometimes the structure of the question anchors a certain method, when an alternative approach is more straightforward), searching the help files for information on the tabulate command (help tabulate) will direct you to the following syntax: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
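As for why the one-line attempt in the question fails: a local macro is substituted as plain text, all at once, so a macro holding three names puts all three names into the command. A small sketch that displays the expanded text rather than running tab (so it does not need the lod_ variables to exist):

```stata
local phth mep mibp mbp

// what Stata sees after macro substitution in the original command
local cmd tab lod_`phth'_BL
display "`cmd'"

// what a loop over the individual elements produces instead
foreach p of local phth {
    display "tab lod_`p'_BL"
}
```

The first display shows tab lod_mep mibp mbp_BL, so Stata looks for a variable called lod_mep, which matches the error message reported in the question.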
I would like to enter groups of variables into a Stata command, but can't find a way to do so.
E.g. In a factor analysis, with a set of 41 variables, I would like to exclude the 5th, 33rd and 35th, but include the rest.
Should it be something like: factor x1-x4, x6-x32, x34, x36-41, factors(5) pcf
Your example calls up a factor analysis. Let's stick with that. If your dataset does include variables named x1 through x41, then
factor x1-x4 x6-x32 x34 x36-x41
could be legal. Note that (1) the commas are not included; (2) the last varlist was corrected, as x36-41 could never be a legal varlist (41 could never be a legal varname); and (3) when two variable names are joined with a hyphen, here x6-x32 and x36-x41, the varlist indicates a block of variables in the current dataset order, not necessarily all variables whose names begin with x followed by each number in the implied range, e.g. 36(1)41. Thus x36-x41 could mean x36 frog toad x41 if you have variables with those names in that order.
The moral is simple: have your variables in an order that makes management and analysis simple and easy to think about. The order command provides the easiest way to change variable order programmatically.
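A small illustration of point (3) and of order, using a made-up one-observation dataset in which frog and toad sit between the x variables:

```stata
clear
set obs 1

// create variables in a deliberately awkward order (names are illustrative)
foreach v in x1 frog x2 toad x3 {
    generate `v' = 1
}

// the hyphenated varlist follows dataset order, so it picks up frog and toad
unab before : x1-x3
display "`before'"

// move the x variables to the front, in the order given
order x1 x2 x3

unab after : x1-x3
display "`after'"
```

The first display shows x1 frog x2 toad x3; after order, the same varlist expands to x1 x2 x3.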
The more general problem of removing variable j in order from an arbitrary varlist seems a little artificial, but here we go. Suppose we have a list of variable names (in fact any names) in a local macro. tokenize maps them one by one to local macros numbered 1 up, after which we can remove whatever we like. In the example below the output of mac li is edited to remove stuff irrelevant to this example, which could be quite a lot.
. local varlist foo bar bazz frog toad newt whatever
. tokenize `varlist'
. mac li
_7: whatever
_6: newt
_5: toad
_4: frog
_3: bazz
_2: bar
_1: foo
_varlist: foo bar bazz frog toad newt whatever
. foreach j in 1 3 5 {
2. local varlist : list varlist - `j'
}
. mac li
_varlist: bar frog newt whatever
_7: whatever
_6: newt
_5: toad
_4: frog
_3: bazz
_2: bar
_1: foo
For other methods of manipulating lists, see help macrolists.
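If the variables to exclude are known by name rather than by position, one of those methods avoids tokenize entirely. The following is a sketch for the factor analysis example, run on simulated data because the original dataset is not available; the seed and the number of observations are arbitrary assumptions.

```stata
// simulated stand-in for the real data: 41 variables named x1 to x41
clear
set obs 200
set seed 12345
forvalues i = 1/41 {
    generate x`i' = rnormal()
}

// expand the hyphenated varlist into a full list of names
unab all : x1-x41

// names to leave out, then subtract one list from the other
local exclude x5 x33 x35
local keep : list all - exclude
display "`keep'"

factor `keep', factors(5) pcf
```

Note that the list operator takes macro names (all, exclude), not their contents, which is the same reason the loop above can subtract the numbered macros 1, 3 and 5 directly.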
I am wondering how to write code to find the median of a variable in Stata without using sort, egen, or summarize. This is what I have so far:
capture program drop find_median
program find_median
local n = _N
gen ord=0
forvalues i= 0/`n' {
replace ord = `i' if [`1']> [`1'][_n-1] & [`1']> [`1'][_n+1]
}
end
find_median (the variables...)
The centile command gives you the median directly.
If you insist on reinventing the wheel, then you can use sort inside your program but leave the sort order of the data unchanged after the program ends by adding the sortpreserve option to the line program find_median; see http://www.stata.com/help.cgi?program. That should make the program much simpler and thus easier to debug.
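A minimal sketch of what such a program might look like, assuming a single numeric variable is passed as the argument. The returned result r(median) and the handling of missing values are choices made for this illustration, not part of the question.

```stata
capture program drop find_median
program find_median, rclass sortpreserve
    // accept exactly one numeric variable; its name lands in `varlist'
    syntax varname(numeric)

    // number of nonmissing values
    quietly count if !missing(`varlist')
    local n = r(N)
    if `n' == 0 error 2000

    // numeric missing values sort last, so the nonmissing values occupy 1..n
    sort `varlist'

    // middle position(s): identical for odd n, adjacent for even n
    local lo = floor((`n' + 1) / 2)
    local hi = ceil((`n' + 1) / 2)

    tempname med
    scalar `med' = (`varlist'[`lo'] + `varlist'[`hi']) / 2

    display as text "median of `varlist' = " as result `med'
    return scalar median = `med'
end

sysuse auto, clear
find_median mpg
```

Thanks to sortpreserve, the data are back in their original order once the program finishes. Running centile mpg afterwards is an easy way to check the result.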
I have a problem at work: I have merged two datasets, and there are a number of variables that have the same content, but where an observation that has a value in the variable from dataset 1 has a missing value in the variable from dataset 2. So I need to transfer the values from one variable into the other.
This is my best shot so far:
replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9
However, it is tedious to do that for 40+ variables, and since some of them are numerical variables, it becomes a Sisyphean task.
Here's a start:
foreach v of varlist V23 {
local w `v'2
replace `v' = `w' if missing(`v')
replace `v' = .u if `w' == 10 | `w' == 9
}
Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "V23" is associated with the new variable "V232". You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.
If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.