I am trying to write a mini-program that simply takes a list of variables, and returns a sub-list that contains variables that are actually in the dataset.
I do this many times in a do file (not sequentially, so no loops), so it is easier to just have a quick program rather than effectively duplicate this code every time.
A simplified version of the program is below. The main issue appears to be in line 6. The first argument for the program should be the name of a local macro that contains variable names to compare with those in the dataset. So, for example, if the local macro is list1, the first argument of the program is list1, and I want to store a new local macro, vlist, which contains all the variables in list1.
But when I try to do this with:
``1''
the resulting local macro vlist just ends up being empty, while the local allvars is fine.
My program's code is the following:
clear
cap program drop lm
program define lm
* Create a new local, vlist, that is all the variables in the local macro identified in argument 1
local vlist ``1''
* Create a new local, allvars, that is all variables in the dataset
qui ds, a
local allvars `r(varlist)'
* Display both macros, to illustrate that vlist is empty, while allvars contains all variables
di as text "vlist: " as result "`vlist'"
di as text "allvars: " as result "`allvars'" _newline
* Create a new local macro that is the intersection of the two lists (if it worked)
local `1'_inter : list vlist & allvars
* Display different messages depending on the outcome (e.g. if a list was created, or empty)
if missing("``1'_inter'") di as error "There are no common variables between `1' and the dataset."
else di as result "`1' intersection with varlist _all now stored as local: `1'_inter"
end
clear
input float(var1 var2 var3 var4 var5) // Input irrelevant data
. . . . .
end
/* Next, create a macro with a list of variables that are in the dataset
(e.g. var1 and var2), and even some that are not in the data (var7). */
local list1 var1 var2 var7
* Execute the list-match program
lm list1
As you can see, the local vlist ends up being empty, so there is no intersection between the lists.
Any idea what I am doing wrong here?
I am sure it is the double local macro, but I am not sure how to fix it.
The macro in macro reference
``1''
is evaluated as follows. First, macro 1 is the first thing you typed as an argument to the command, which in your example is list1. After substituting that, Stata sees
`list1'
and the entire command is now evaluated as
local vlist `list1'
which is then evaluated as follows. What is in the local macro list1? Precisely nothing, or an empty string, as no such local macro exists at that point in the program (and any other local macros created anywhere else are invisible, because that is what local means: local to the program space concerned). So next Stata sees
local vlist
whereby you simultaneously create a local macro and blank it out, thus destroying it.
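A small sketch may make the scoping point concrete. The program name showlocal and the returned name seen are hypothetical, introduced only for this illustration; the program hands back whatever ``1'' evaluates to inside the program.

```stata
* Hypothetical demo: showlocal is an illustrative name, not part of the original post
capture program drop showlocal
program define showlocal, rclass
    * `1' is the first argument (here: list1);
    * ``1'' is the content of a local macro of that name INSIDE this program
    return local seen "``1''"
end

* list1 is defined in the caller's space only
local list1 var1 var2 var7
showlocal list1
display "inside the program, list1 contained: [`r(seen)']"
```

The brackets come out with nothing between them, because the caller's list1 is not visible inside showlocal.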
The solution is that you intended to write
local vlist "`1'"
or perhaps just
args vlist
However, that is not the end of it. The command
lm list1
mentions the name of a local macro you have just created. But passing the name of a local macro to another program is futile, as the other program can't see the contents of the macro. You have to pass the contents.
Yet more: the program then needs to be revised as you may want to pass several names to it and they will only be regarded as a single argument if you bind them in double quotes.
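As a rough sketch of such a revision (not the only way to do it), the program below takes the names themselves and hands the matches back in r(varlist), since a local macro created inside a program is just as invisible to the caller as the caller's macros are to the program. The name lm2 is hypothetical, and the sketch assumes plain names without wildcards or abbreviations.

```stata
* Hypothetical revision: lm2 is an illustrative name
capture program drop lm2
program define lm2, rclass
    * accept whatever names are typed, existing variables or not
    syntax anything
    local found
    foreach name of local anything {
        * keep a name only if it is exactly the name of an existing variable
        capture confirm variable `name', exact
        if _rc == 0 local found `found' `name'
    }
    * results must be returned; locals defined here vanish when the program ends
    return local varlist "`found'"
end

clear
input float(var1 var2 var3)
. . .
end

local list1 var1 var2 var7
* pass the CONTENTS of the macro, not its name
lm2 `list1'
display "`r(varlist)'"
```

Here var7 is dropped and r(varlist) holds var1 var2, which the caller can copy into a local macro of its own.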
FWIW, the problem you describe seems to be the one solved by isvar from SSC. Here it is:
*! NJC 1.0.0 20 Sept 2005
program isvar, rclass
version 8
syntax anything
foreach v of local anything {
capture unab V : `v'
if _rc == 0 local varlist `varlist' `V'
else local badlist `badlist' `v'
}
di
if "`varlist'" != "" {
local n : word count `varlist'
local what = plural(`n', "variable")
di as txt "{p}`what': " as res "`varlist'{p_end}"
return local varlist "`varlist'"
}
if "`badlist'" != "" {
local n : word count `badlist'
local what = plural(`n', "not variable")
di as txt "{p}`what': " as res "`badlist'{p_end}"
return local badlist "`badlist'"
}
end
I am fond of ds but I wouldn't use it here. You could be calling up thousands of variable names just to check whether a few names are in a dataset.
A final comment on style
local list1 var1 var2 var7
lm list1
might as well be
lm var1 var2 var7
as there is no value in putting the names into a bag (local macro) and using the name of the bag when you can just pass the names directly. As said, this won't work with your program until you fix it, but it would work with isvar.
When there are just a few names, looping in Stata is easy.
Also, when there is a rule as to how the names change (e.g. increment) I can do the following:
forval i = 1/5 {
...
}
However, there are cases where I have hundreds of names that I need to loop over, and these follow no rule of increment.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?
The answer is that it depends.
If these are part of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
If these are file names, a macro extended function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
The local macro names will automatically expand during run time to include all the specified items.
I want to store a list of variables in a macro and then call that macro inside a mi() statement. The original application is for a programme that uses data I cannot bring online for secrecy reasons, and which will include the following statement:
generate u = cond(mi(`vars'),., runiform(0,1))
The issue is that mi() requires comma-separated variable names, but vars is delimited by spaces.
I use the auto dataset and mark to illustrate my problem:
sysuse auto
local myvars foreign price
mark missing if mi(`myvars')
In this example, mi() expects arguments separated by commas, so Stata stops and complains that it cannot find a foreignprice variable. Is there a utility function that will insert commas between the macro elements?
A direct answer to the question as set is to use the macro extended function subinstr to change spaces to commas:
sysuse auto
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
mark missing if mi(`myvars')
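To see what the substitution produces, here is a minimal sketch that only manipulates the macro. The name withcommas is hypothetical. The retokenize step is an extra precaution of mine, not part of the code above: it collapses repeated spaces so that they do not turn into repeated commas.

```stata
* a list typed with irregular spacing
local myvars   foreign    price

* collapse runs of spaces into single spaces
local myvars : list retokenize myvars

* replace every space with a comma
local withcommas : subinstr local myvars " " ",", all

display "`withcommas'"
```

The result can then be used wherever a function wants comma-separated arguments, as in mi(`withcommas').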
If the aim is to create a marker variable that marks observations with any values missing on specified variables, then there are alternative ways, most of which don't need any fiddling with separators in a list. This doesn't purport to be a complete set.
A1.
regress foreign price
gen missing = !e(sample)
A2.
egen missing = rowmiss(foreign price)
replace missing = missing > 0
A3.
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
gen missing = missing(`myvars')
A4.
gen missing = 0
quietly foreach v in foreign price {
replace missing = 1 if missing(`v')
}
A5.
mark missing
markout missing foreign price
replace missing = !missing
EDIT In the edited question there is reference to this within a program:
generate u = cond(mi(`vars'),., runiform(0,1))
I wouldn't do that, even with the macro edited to include commas too, although any issue is more one of personal taste.
marksample touse
markout `touse' `vars'
generate u = runiform(0,1) if `touse'
It's likely that the indicator variable so produced is needed, or at least useful, somewhere else in the same program.
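For context, here is a minimal sketch of how those lines might sit inside a complete program. The program name genu and the option vars() are hypothetical, and the auto dataset stands in for the data that cannot be shared. runiform() is called without arguments so that the sketch also runs in versions that predate the two-argument form.

```stata
* Hypothetical wrapper: genu and vars() are illustrative names
capture program drop genu
program define genu
    version 12
    syntax [if] [in], VARS(varlist)
    * touse = 1 for observations satisfying if/in, 0 otherwise
    marksample touse
    * set touse to 0 wherever any of the listed variables is missing
    markout `touse' `vars'
    generate u = runiform() if `touse'
end

sysuse auto, clear
genu, vars(rep78 price)
count if missing(u)
```

The same `touse' indicator can then restrict any later command in the program, which is the main reason to prefer it over a one-off cond().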
I have two large datasets (more than 1000 variables in each), one of which has all the variables of the second, plus additional variables. I would like to get a list of all these additional variables, and then drop them and append one dataset to another. I have tried the command dta_equal, but got the same problem found here: http://www.stata.com/statalist/archive/2011-08/msg00308.html
I suspect that append with the keep() option cannot do what I want directly, that is, append a dataset while dropping the additional variables, because I would have to type the variables into keep() one by one, which is not realistic given the size of my datasets.
Are there any ways to deal with this?
There are several Stata commands that can be useful here.
The unab command is used in the first example to make a list of the variables in the dataset with fewer variables. The second and third examples use the describe command to obtain the list of variables in a dataset not currently in memory.
The final part of the example shows how to use extended macro list functions to obtain the list of common variables and the sets of variables not common to both datasets.
* simulate 2 datasets, one has more variables than the other
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace
* example 1: drop after append
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta"
replace source = 2 if mi(source)
keep `vcommon' source
* example 2: drop first then append
clear
describe using "data1.dta", varlist short
local vcommon `r(varlist)'
use `vcommon' using "data2.dta", clear
gen source = 2
append using "data1.dta"
replace source = 1 if mi(source)
* example 3: append and keep on the fly
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)
* use extended macro list functions to manipulate variable list
clear
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vcommon : list vlist1 & vlist2
local vinonly1 : list vlist1 - vlist2
local vinonly2 : list vlist2 - vlist1
dis "common variables = `vcommon'"
dis "variables in data1 not found in data2 = `vinonly1'"
dis "variables in data2 not found in data1 = `vinonly2'"
I have a set of variables whose names I have saved in a global macro so that I can use them in a function:
global inlist_cond "amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss"
The reason why they are saved in a macro is because the list will be in a loop and its content will change depending on the year.
What I need to do is to generate a dummy variable so that water_dummy == 1 if any of the variables in the macro list has the WATER classification. In Stata, I need to write
gen water_dummy = inlist("WATER", "$inlist_cond")
which, ideally, should translate to
gen water_dummy = inlist("WATER", amz2002ras_clss, amz2003ras_clss, amz2004ras_clss, amz2005ras_clss, amz2006ras_clss, amz2007ras_clss, amz2008ras_clss, amz2009ras_clss, amz2010ras_clss, amz2011ras_clss)
But this did not work: the code executed without any errors, but the dummy variable contained only 0s. I know that it is possible to invoke macros inside functions in Stata, but I have never tried it when the macro contains a whole list of conditions. Any thoughts?
The double quotes in the generate statement make the second argument a literal string, so you are comparing text with text and the comparison does not involve the data at all.
. clear
. set obs 1
number of observations (_N) was 0, now 1
. gen a = "water"
. gen b = "wine"
. gen c = "beer"
. global myvars "a,b,c"
. gen found1 = inlist("water", "$myvars")
. gen found2 = inlist("water", $myvars)
. list
     +---------------------------------------+
     |     a      b      c   found1   found2 |
     |---------------------------------------|
  1. | water   wine   beer        0        1 |
     +---------------------------------------+
The first comparison is equivalent to
. di inlist("water", "a,b,c")
0
which finds no match, as "water" is not matched by the (single!) other argument.
Macro references are certainly allowed within function or command calls: as each macro name is replaced by its contents before the syntax is checked, the function or command never even knows that a macro reference was ever used.
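A tiny sketch of that substitution at work, using a deliberately odd example in which even the function name comes from a macro. The macro name fn is hypothetical, and upper() is used only because it is a simple string function.

```stata
* the macro holds the NAME of a function
local fn upper

* Stata substitutes first, so what is parsed is: display upper("water")
display `fn'("water")
```

By the time the expression is parsed, nothing remains of the macro reference; the same holds for a macro that expands to a comma-separated argument list.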
As @Aspen Chen concisely points out, omitting the double quotes gives what you want so long as the inlist() syntax remains legal.
If your data structure is something like in the following example, you can try the egen function incss, from egenmore (ssc install egenmore):
clear
set more off
input ///
str15(amz2009 amz2010)
"water" "juice"
"milk" "water"
"lemonade" "wine"
"water & beer" "tea"
end
list
egen watindic = incss(amz*), sub(water)
list
Be aware it searches for substrings (see the result for the last example observation).
A solution with a loop achieving different results is:
gen watindic2 = 0
forvalues i = 2009/2010 {
replace watindic2 = 1 if amz`i' == "water"
}
list
Another solution involves reshape, but I'll leave it at that.
Suppose in Stata I wish to define a program:
capture program drop myprg
program define myprg
syntax varlist
foreach var of varlist `varlist' {
disp "`var'"
}
end
I want my program to be able to accept both names of variables that exist in my dataset and names of non-existent variables. If the variable exists, it displays the name. Otherwise, it does nothing.
Suppose my dataset has two variables: age1 and age2. The current output is:
. myprg age1
age1
. myprg age*
age1
age2
. myprg varThatDoesntExist
variable varThatDoesntExist not found
r(111);
Instead, the desired output for the last command is:
. myprg varThatDoesntExist
.
How can I get this functionality?
See the help for syntax. The specification namelist generalises varlist, so that the program prints out any legal name, whether or not it is the name of an existing variable.
program myprg
syntax namelist
foreach var of local namelist {
disp "`var'"
}
end
A variant requested after first posting of this question was to print actual variable names and to ignore anything else. For that you need to set up your own parsing. Again, see the help for syntax. You need something like
program myprg
version 8.2
syntax anything
local varlist
foreach thing of local anything {
capture unab Thing : `thing'
if _rc == 0 local varlist `varlist' `Thing'
}
foreach v of local varlist {
di `"`v'"'
}
end