How do I loop over part of a variable name? - stata

I need to use a local macro to loop over part of a variable name in Stata.
Here is what I tried to do:
local phth mep mibp mbp
tab lod_`phth'_BL
Stata will not recognize the entire variable name.
variable lod_mep not found
r(111);
If I remove the underscore after the `phth' it still does not recognize anything after the macro name.
I want to avoid using a complicated foreach loop.
Is there any way this can be done just using the simple macro?
Thanks!

Your request is a bit confusing. First, this is precisely the purpose of a loop, and second, loops in Stata are (at the "introductory level") quite simple. The following example is a bit nonsensical (and given the structure, there are easier ways of going about this), but should convey the basic idea.
// set up a similar variable name structure
sysuse auto , clear
rename (price mpg weight length) ///
(pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)
// define a local macro to hold the elements to loop over
local varbases = "base1 base2 base3 base4"
// refer to the items of the local macro in a loop
foreach b of local varbases {
summ pref_`b'_suff
}
See help foreach for the syntax of foreach. In particular, note that the structure employed above may not even be required due to Stata's varlist structure (see help varlist). For example, continuing with the code above:
foreach v of varlist pref_base?_suff {
summ `v'
}
The wildcard ? takes the place of one character. * could be used for more flexibility. However, if your variables are not as easily identifiable using the pattern matching allowed by varlist, a loop as in the first example is simple enough -- four very short lines of code.
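Before looping over a wildcard pattern, it can be worth checking which variables the pattern actually matches. The following is a minimal sketch that reuses the renamed auto variables from above; the macro name matched is just an illustrative choice.

```stata
* rebuild the example variable names from the auto dataset
sysuse auto, clear
rename (price mpg weight length) (pref_base1_suff pref_base2_suff pref_base3_suff pref_base4_suff)

* unab expands a varlist pattern into the full list of matching names
unab matched : pref_base?_suff
display "`matched'"

* count how many variables matched before relying on the pattern in a loop
local nmatched : word count `matched'
display `nmatched'
```

If the list contains more (or fewer) names than expected, an explicit list of bases in a local macro is the safer route.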
Postscript
On further reflection (the framing of a question can anchor you to one method when another is more straightforward), the help for the tabulate command (help tabulate) points to tab1, which accepts a varlist: tab1 varlist [if] [in] [weight] [, tab1_options]
Given the discussion above about the use of varlists, you can simply code
tab1 lod_m*_BL
assuming, of course, that there are no other variables matching the pattern for which you do not want to report a frequency table. Alternatively,
tab1 lod_mep_BL lod_mibp_BL lod_mbp_BL
is not much longer and does the trick, albeit without the use of any sort of wildcard or macro substitution.
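As for why the original attempt failed: a local macro is substituted as a whole, so the entire list lands in the middle of the variable name. The sketch below illustrates this with hypothetical stand-in variables (the data are generated only for illustration; the names follow the question).

```stata
* hypothetical stand-in data, for illustration only
clear
set obs 6
foreach p in mep mibp mbp {
    generate byte lod_`p'_BL = mod(_n, 2)
}

local phth mep mibp mbp

* the whole list is substituted at once, which is why Stata looked for lod_mep
display "tab lod_`phth'_BL"

* a loop substitutes one element at a time
foreach p of local phth {
    tabulate lod_`p'_BL
}
```

The display line shows the command Stata actually saw: lod_mep, then mibp, then mbp_BL, as three separate names.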

Related

Drop variables if mean is 0 for loop

I want to drop all variables that have a mean of 0. The code I'm using is
foreach var of varlist _all {
drop 'var' if mean 'var'==0
}
and I'm getting the error message mean not found.
How can I get around this?
There are several reasons why that won't work. First, consider this suggested solution:
foreach var of varlist _all {
su `var', meanonly
if r(mean) == 0 drop `var'
}
This will work with string variables too, as the request to summarize a string variable isn't illegal, and the mean will be returned as missing.
What's wrong with your code?
Problem 1. The sequence
mean `var' == 0
is just fantasy syntax. There isn't a mean function that you can apply in this context and if there were, the syntax would be different.
Problem 2. You can drop observations using an if qualifier or you can drop variables, but you can't mix the two syntaxes. It's hard even to know what the mix would mean, but it's illegal anyway. The deeper problem here is confusing the if command and the if qualifier. See also the help for drop.
Problem 3. As typed here you have used matching quotation marks for local macro references. It's possible to guess that you really used left and right quotation marks as otherwise you would have got a different error message. Nevertheless, your code as typed would not work for that reason also.
A wider comment is a reminder that a mean of zero doesn't imply that all values are zero. If you wanted just to drop variables with all values zero, then findname (from the Stata Journal, so it must be installed first) allows that
findname, all(@ == 0)
drop `r(varlist)'
and there are extensions to allow missing values too.
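If installing a user-written command is not an option, the distinction between "mean is zero" and "all values are zero" can also be sketched with official commands only. This is an illustrative example on made-up data; the variable names allzero, meanzero and s are hypothetical.

```stata
* made-up data: one all-zero variable, one with mean zero, one string variable
clear
set obs 4
generate allzero = 0
generate meanzero = cond(mod(_n, 2), -1, 1)
generate str1 s = "a"

foreach var of varlist _all {
    * meanonly also leaves r(N), r(min) and r(max) behind
    summarize `var', meanonly
    * every nonmissing value is zero exactly when min and max are both zero;
    * r(N) > 0 skips string variables and variables that are entirely missing
    if r(N) > 0 & r(min) == 0 & r(max) == 0 drop `var'
}

describe
```

Here meanzero survives even though its mean is 0, because its values are -1 and 1. Note that a variable holding only zeros and missing values would be dropped by this check.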

Wildcard usage for string variables

Pretty straightforward question. Can you use wildcard functions for strings in Stata? I haven't been able to find a suitable workaround.
Here's the code I am trying to use:
gen newvar= "output" if reg_id == "input*"
I have different values of input, i.e. input12, input18, input28292, etc. The wildcard selection does not appear to be working.
This won't work as you want. So far as Stata is concerned here, "*" is a literal character you are looking for and won't find.
Wildcard syntax like this applies when a variable list is expected, i.e. it can apply to variable names, but to use it with string values you need a dedicated function.
In your example, all cases begin with the string input, so this would work:
gen newvar = "output" if substr(reg_id, 1, 5) == "input"
Stata also supports pattern matching and regular expressions.
gen newvar = "output" if strmatch(reg_id, "input*")
is in fact the simplest way to get what you ask.
All documented:
help string functions
One simple solution:
gen newvar = "output" if strmatch(reg_id, "input*")
see help strmatch for usage.
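Note also that you can use regexm in place of strmatch, although it takes a regular expression rather than a wildcard pattern (for example, "^input" to match values that begin with input).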
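The three approaches mentioned in these answers can be compared side by side. The following is a small sketch on made-up values of reg_id; the indicator names by_substr, by_strmatch and by_regexm are illustrative only.

```stata
* made-up values, two of which begin with "input"
clear
input str12 reg_id
"input12"
"input28292"
"other7"
"myinput3"
end

* three ways of flagging values that start with "input"
generate byte by_substr   = substr(reg_id, 1, 5) == "input"
generate byte by_strmatch = strmatch(reg_id, "input*")
generate byte by_regexm   = regexm(reg_id, "^input")

list

* all three should agree on these data
assert by_substr == by_strmatch & by_strmatch == by_regexm
```

Note that "myinput3" is not flagged by any of them, since the pattern is anchored at the start of the string.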

Stata foreach loop

I am trying to execute a Stata foreach loop, but I keep encountering an error that the variable does not exist even though when I look in my data editor it does exist, and I am capable of looking at it using list some_column. This is what I am doing:
foreach x of varlist some_column1 some_column2{
list x
}
Could someone help me identify the problem?
You're asking Stata to list the variable x, which clearly you don't have. What you really want is to list the contents of the local macro x. To do that, enclose the macro name in local macro quotes: a left single quote (backtick) before it and a right single quote (apostrophe) after it.
clear all
set more off
sysuse auto
foreach x of varlist weight mpg {
list `x' in 1/10
}
See the manual: [P] macro. help foreach is filled with examples.

how do I loop through file names in stata

1) Is it possible to create a vector of strings in stata? 2) If yes, is it then possible to loop through the elements in this vector, performing commands on each element?
To create a single string in stata I know you do this:
local x = "a string"
But I have about 200 data files I need to loop through, and they are not conveniently named with consecutive suffixes like "_2000" "_2001" "_2002" etc. In fact there is no rhyme or reason to the file names, but I do have a list of them which I could easily cut and paste into a string vector, and then call the elements of this vector one by one, as one might do in MATLAB.
Is there a way to do this in stata?
On top of Keith's answer: you can also get the list of files in a directory with
local myfilelist : dir . files "*.dta"
or more generally
local theirfilelist : dir <directory name> files <file mask>
See help extended_fcn.
Sure -- You just create a list using a typical local call. If you don't put quotes around the whole thing your lists can be really long.
local mylist aaa bbb "cc c" dd ee ff
Then you just use foreach.
foreach filename of local mylist {
use `"`filename'"'
}
The compound double quotes (`" "') are used because one of the filenames contains a space and so carries its own double quotes inside the list. Using foreach filename of local mylist { is also a touch faster than putting foreach filename in `mylist' { on the first line.
If you want to manipulate your list, see help macrolists.
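As a brief illustration of what help macrolists covers, the sketch below removes already-processed names from a list and tests membership. The macro names wanted, done, todo and isin are hypothetical.

```stata
* two hypothetical lists: everything wanted, and what has been handled already
local wanted aaa bbb ccc
local done bbb

* list difference: elements of wanted that are not in done
local todo : list wanted - done
display "`todo'"

* membership test: 1 if every element of done appears in wanted, else 0
local isin : list done in wanted
display `isin'
```

The same extended macro function family also offers union, intersection, sorting and removal of duplicates.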
Related questions have been asked more than once on Stack Overflow:
In Stata how do you assign a long list of variable names to a local macro?
Equivalent function of R's "%in%" for Stata
Many people might want a combination of the two, as I did. Here it is:
* Create a local containing the list of files.
local myfilelist : dir "." files "*.dta"
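* Or manually create the list by typing in the filenames. Compound double quotes
* around the whole list keep Stata from stripping the outermost pair of quotes.
local myfilelist `""file1.dta" "file2.dta" "file3.dta""'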
* Then loop through them as you need.
foreach filename of local myfilelist {
use "`filename'"
}
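I hope that helps. Note that the length of a local macro is limited, and the limit depends on the Stata flavor and version (67,784 characters is the figure for some of them; see help limits), so watch out for this when you have a really long list of files or really long filenames.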
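To show the loop doing some actual work, here is a self-contained sketch that stacks every file in a list with append. It assumes nothing about real files: two temporary datasets are created purely for illustration, and the names f1, f2 and x are hypothetical.

```stata
* create two small temporary datasets to stand in for real files
tempfile f1 f2
clear
set obs 2
generate x = _n
save `"`f1'"'
replace x = x + 10
save `"`f2'"'

* build the list; compound double quotes protect paths that contain spaces
local myfilelist `""`f1'" "`f2'""'

* start from an empty dataset and stack each file in turn
clear
foreach filename of local myfilelist {
    append using `"`filename'"'
}

count
```

With real files, the list would come from the dir extended function or be typed in by hand, as above.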

transfer values from one variable to another in Stata

I have a problem at work: I have merged two datasets, and there are a number of variables that have the same content, but where an observation that has a value in the variable from dataset 1 has a missing value in the variable from dataset 2. So I need to transfer the values from one variable into the other.
This is my best shot so far:
replace V23=1 if V232==1
replace V23=2 if V232==2
replace V23=3 if V232==3
replace V23=4 if V232==4
replace V23=8 if V232==8
replace V23=.u if V232==10 | V232==9
However, it is tedious to do that for 40+ variables, and since some of them are numerical variables, it becomes a Sisyphean task.
Here's a start:
foreach v of varlist V23 {
local w `v'2
replace `v' = `w' if missing(`v')
replace `v' = .u if `w' == 10 | `w' == 9
}
Notice how this solution relies on a lexical relationship among the variable names: it assumes the old variable "V23" is associated with the new variable "V232". (Variable names in Stata are case-sensitive, so the names in the loop must match the capitalization in your dataset.) You can make a list of such associations and use it, but this is inconvenient. It's probably easier to rename the variables, if necessary, to conform to such a convention, then run the replacement script, and then restore the desired names.
If you're unfamiliar with this kind of automation, read the help pages for macro and foreach.
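To see the loop handle more than one pair, here is a sketch on made-up data. It assumes every variable from the second dataset is named by appending 2 to the name of its partner; V24 and V242 are hypothetical additions to the names in the question.

```stata
* made-up data: V23/V24 from dataset 1, V232/V242 from dataset 2
clear
input V23 V232 V24 V242
1  .  3  .
.  2  .  9
. 10  .  4
end

foreach v of varlist V23 V24 {
    * partner variable name, built from the naming convention
    local w `v'2
    * fill gaps in the first variable from its partner
    replace `v' = `w' if missing(`v')
    * codes 9 and 10 in the partner become the extended missing value .u
    replace `v' = .u if inlist(`w', 9, 10)
}

list
```

Extending this to 40+ variables is then a matter of listing them (or a wildcard pattern that matches only them) after of varlist.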