Like many others, I often loop through variables in Stata, running some estimation command and then extracting the results to a variable created to hold them. This is simple when the variables are numbered sequentially or in some pattern (e.g. even numbers in a set). As an example:
sysuse auto
gen var1 = uniform()
gen var2 = uniform()
gen var3 = uniform()
*Create variables to hold results
gen str4 varname=""
gen results=.
*Loop through three variables
foreach i of numlist 1/3{
replace varname="var`i'" in `i'
sum var`i'
replace results=r(mean) in `i'
}
However, I often want to do something similar when the variables are not numeric and/or are not in an easy-to-handle order. Let's say I wanted to do the same thing for price, mpg, weight and length in the auto dataset. If we set up the for-loop as:
sysuse auto
gen str4 varname=""
gen results=.
foreach var of varlist price mpg weight length{
sum `var'
*Place values, in order, in rows?
}
then we need some way to understand that price is the first variable in the list, so its results should go in row 1 (or its name in row 1, or whatever we want to do).
This must be possible, but I would appreciate some suggestions. A clean/non-hackish way would be ideal, as I will be doing this a lot.
You can use a local counter that you start at 1 and increment at the end of each iteration:
sysuse auto, clear
gen varname=""
gen mean=.
local i=1
foreach var of varlist price mpg weight {
quietly sum `var'
replace mean = r(mean) in `i'
replace varname = "`var'" in `i'
local ++i
}
You could also do this. It's unlikely to seem as direct or simple as the standard technique explained by @Dimitriy V. Masterov, but it has its uses.
sysuse auto, clear
gen varname = ""
gen mean = .
local nvars : word count price mpg weight
tokenize "price mpg weight"
quietly forval j = 1/`nvars' {
sum ``j'', meanonly
replace mean = r(mean) in `j'
replace varname = "``j''" in `j'
}
The general points are:
Words are separated by spaces, except that double quotation marks and compound double quotation marks bind tighter. Thus a, b and c are unsurprisingly the words in a b c, but there are just two words in Stata "is great".
You can count how many objects you are looping over. It is the number of words in a string.
Applying tokenize to an argument string maps the separate words of that argument to local macros named 1, 2 and so forth. The nested macro references that this is likely to imply are interpreted just as you would guess from elementary algebra: the innermost reference is evaluated first.
For more complicated problems, including the unpacking of a varlist, see also unab.
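The points above can be tried out interactively. What follows is only a minimal sketch, assuming the auto dataset shipped with Stata; the macro names n, vars and k are illustrative and not part of the answer.

```stata
* Words are separated by spaces, but double quotes bind tighter:
* Stata "is great" counts as two words
local n : word count Stata "is great"
display `n'

* unab unpacks a varlist (ranges, wildcards) into full variable names
sysuse auto, clear
unab vars : price-rep78 w*
display "`vars'"

* tokenize maps each word to the local macros 1, 2, ...
local k : word count `vars'
tokenize "`vars'"
forval j = 1/`k' {
    * `j' is evaluated first, then the macro it names
    display "`j': ``j''"
}
```

Because unab expands the varlist before the loop starts, the same loop works whether the list is typed out in full or given as a range or wildcard.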
In Stata's auto data the following command creates all missing values: why?
bysort mpg: egen n1 = mean(price) if rep78[_n]!=rep78
For example take the 14 mpg group:
price mpg rep78
11385 14 3
14500 14 2
6303 14 4
12990 14
5379 14 4
13466 14 3
I expected that n1 for the first row would be mean(14500,6303,12990,5379). Basically I want the mean after excluding the first and last rows, because for them we have rep78[_n]==rep78 (both equal 3). But instead, I get all missing values.
The subscript [_n] is harmless but vacuous here as referring to the current observation. So the condition is just equivalent to rep78 != rep78 or rep78[_n] != rep78[_n] -- which is never true and so no observations satisfy the condition and the mean is returned as missing.
You're hoping or imagining that the prefix by: implies comparisons within a group, but at best that works only if subscripts are explicit and different.
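To see that the condition selects nothing, one can count the observations that satisfy it. This is a small sketch on the auto data; the names nmatch and differs_prev are illustrative, and sorting by price within mpg is an assumption made only to give a reproducible order.

```stata
sysuse auto, clear

* [_n] is the current observation, so this compares rep78 with itself
count if rep78[_n] != rep78
local nmatch = r(N)
display "observations satisfying the condition: `nmatch'"

* An explicit, different subscript does compare across observations
* within each mpg group (here: with the previous observation)
bysort mpg (price): gen byte differs_prev = rep78 != rep78[_n-1] if _n > 1
```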
This works for your problem:
sysuse auto, clear
gen wanted = .
quietly forval i = 1/`=_N' {
su price if mpg == mpg[`i'] & rep78 != rep78[`i'], meanonly
replace wanted = r(mean) in `i'
}
There may be a way to do this with rangestat or rangerun from SSC, or otherwise, in which case a better solution may follow.
EDIT: The OP's code suggestion in comments
bysort mpg rep78: egen sum_m_r_price = sum(price)
bysort mpg rep78: egen count_m_r_price = count(price)
bysort mpg: egen sum_r_price = sum(price)
bysort mpg: egen count_r_price = count(price)
gen b_wanted = ( sum_r_price-sum_m_r_price)/ (count_r_price-count_m_r_price)
appears equivalent.
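One way to check the equivalence rather than eyeball it is sketched below. It reruns the loop and the sum-and-count code on the auto data and compares the two results observation by observation. The tolerance of 1e-6 is an assumption chosen to allow for float rounding, and total() is used as the documented name of the egen function that sum() refers to.

```stata
sysuse auto, clear

* Loop-based result
gen wanted = .
quietly forval i = 1/`=_N' {
    su price if mpg == mpg[`i'] & rep78 != rep78[`i'], meanonly
    replace wanted = r(mean) in `i'
}

* Sum-and-count result
bysort mpg rep78: egen sum_m_r_price = total(price)
bysort mpg rep78: egen count_m_r_price = count(price)
bysort mpg: egen sum_r_price = total(price)
bysort mpg: egen count_r_price = count(price)
gen b_wanted = (sum_r_price - sum_m_r_price) / (count_r_price - count_m_r_price)

* reldif() is 0 when both values are missing and missing when only one is,
* so the assertion fails if the two variables disagree anywhere
assert reldif(wanted, b_wanted) < 1e-6
```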
In turn, this rangestat version should be faster than the OP's sum-and-count code:
rangestat (sum) sum2=price (count) count2=price, i(rep78 0 0) by(mpg)
rangestat (sum) sum1=price (count) count1=price, i(mpg 0 0)
gen double wanted = (sum1 - sum2) / (count1 - count2)
Previous Thread: How to make a new observation in Stata that has the average of all observations above it for all variables, but also ignore set observations?
The code is
local last = _N - 1
foreach v in `r(varlist)' {
su `v' in 1/`last', meanonly
replace `v' = r(mean) in L
}
How do I tell Stata to ignore rows 3, 62, and 99 when calculating the average?
I don't follow why you exclude the last row or why you use replace. However, to exclude individual rows you could use an if qualifier with inlist() and _n.
Example:
if !inlist(_n, 1) would exclude first row.
In your case the code should be as follows (excludes rows 3, 62, 99 and the last one (_N)):
local last = _N - 1
foreach v in `r(varlist)' {
su `v' if !inlist(_n, 3, 62, 99, _N), meanonly
replace `v' = r(mean) in L
}
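A self-contained version of the same idea might look like the sketch below. It is an illustration under assumptions: it uses the auto data (74 observations, so only rows 3 and 62 are excluded here), restricts itself to numeric variables via ds, and appends the new observation with set obs. The macro name numvars is hypothetical.

```stata
sysuse auto, clear

* Collect the numeric variables and copy r(varlist) into a local
ds, has(type numeric)
local numvars `r(varlist)'

* Append one empty observation to hold the averages
set obs `=_N + 1'

foreach v of local numvars {
    * Skip rows 3 and 62 and the new last row itself
    su `v' if !inlist(_n, 3, 62, _N), meanonly
    * replace promotes integer types to hold a noninteger mean
    quietly replace `v' = r(mean) in l
}
```

Note that inlist() accepts _N directly, so no separate local for the last row is needed.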
I have three variables varA, varB and varC.
I attempted to first replace "missing" with NA in all three variables then add a label to all three variables.
First I replaced "missing" with NA:
local mylist1 varA-varC
foreach v1 of varlist `mylist1' {
replace `v1'="NA" if `v1' =="missing"
}
Now if I want to call the list again to add the same label to all three variables:
foreach v1 of varlist `mylist1' {
label var `v1' "testvaraible"
}
but I will get an error message saying:
varlist required
Could anyone explain why I can't recall the list?
For your first example, this would be legal syntax if the variables concerned were all string:
local mylist1 varA-varC
foreach v of varlist `mylist1' {
replace `v' = "NA" if `v' == "missing"
}
Notice the different punctuation for referring to a local macro (different left and right quotation marks) and the difference in placing braces.
It is difficult even to work out what you want in your second example, but the loop is over differing values of a local macro v which you never refer to inside the loop. Also, depending on the definition of the unspecified local macro testvaraible [sic], it is still puzzling why you would label three variables identically.
You may need to be much more explicit about your data and exactly what you want if this does not answer the question. In particular, we can't see definitions for local macros v1 and testvaraible.
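The punctuation point, and what happens when a local macro is not defined, can be seen in the sketch below. It assumes the auto data, and the macro name notdefined is deliberately never defined. An undefined local expands to nothing, so foreach receives an empty varlist. That is one possible source of an error like the one reported, for instance if the line defining the local was run in a separate do-file execution, although the question does not confirm that this is what happened.

```stata
sysuse auto, clear
local mylist1 price-weight

* A macro reference opens with a left quote ` and closes with a right quote '
foreach v of varlist `mylist1' {
    display "`v'"
}

* An undefined local expands to an empty string
display "[`notdefined']"

* With an empty varlist, foreach stops with an error;
* capture noisily shows the message and keeps the return code
capture noisily {
    foreach v of varlist `notdefined' {
        display "`v'"
    }
}
local rc = _rc
display "return code: `rc'"
```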
I came to this discussion after the edits made in response to the discussion around the previous answer. At this point, copying the code as it now stands in the original post, the problem has apparently been corrected. Hate to post this as an answer, but it's apparently too long for a comment.
. set obs 1
obs was 0, now 1
. generate str8 varA = "a"
. generate str8 varB = "missing"
. generate str8 varC = "c"
. local mylist1 varA-varC
. foreach v1 of varlist `mylist1' {
2. replace `v1'="NA" if `v1' =="missing"
3. }
(0 real changes made)
(1 real change made)
(0 real changes made)
. foreach v1 of varlist `mylist1' {
2. label var `v1' "testvariable"
3. }
. list, clean noobs
varA varB varC
a NA c
. describe
Contains data
obs: 1
vars: 3
size: 24
------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------
varA str8 %9s testvariable
varB str8 %9s testvariable
varC str8 %9s testvariable
------------------------------------------------------------------------------------------------
Sorted by:
Note: dataset has changed since last saved
.
How can I create a new variable by dividing an existing variable by its IQR? I have done it the long way, as follows.
Sample data and code is the following:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
foreach var of varlist read-socst {
egen `var'75 = pctile(`var'), p(75)
egen `var'25 = pctile(`var'), p(25)
gen `var'q =`var'75 - `var'25
drop `var'75 `var'25
}
gen readI = read/readq
gen sciI = science/scienceq
The simplest way is just to use summarize results directly:
sysuse auto, clear
quietly foreach v of var price-foreign {
su `v', detail
gen `v'q = `v' / (r(p75) - r(p25))
}
The egen route is overkill if it means creating new variables for each original variable, just to hold the quartiles or the IQR as repeated constants. But egen comes into its own when you want to do this by groups:
bysort foreign: egen mpg_upq = pctile(mpg), p(75)
by foreign: egen mpg_loq = pctile(mpg), p(25)
gen mpg_Q = mpg / (mpg_upq - mpg_loq)
Note that the IQR can be 0, and will often be 0 for indicator variables.
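A guard against a zero IQR could be sketched as follows. The indicator rare is a hypothetical variable created only so that the zero case occurs; it is 1 in 5 of the 74 observations, so both quartiles are 0.

```stata
sysuse auto, clear

* Hypothetical rare indicator: p25 = p75 = 0, so the IQR is 0
gen byte rare = _n <= 5

foreach v of varlist price mpg rare {
    quietly su `v', detail
    local iqr = r(p75) - r(p25)
    * Missing counts as larger than any number, so rule it out explicitly
    if `iqr' > 0 & `iqr' < . {
        gen `v'q = `v' / `iqr'
    }
    else {
        display as text "`v': IQR is 0 or missing, no scaled variable created"
    }
}
```

Skipping the variable is only one choice; leaving the result as missing (which is what division by 0 yields anyway) may be preferable if the set of new variables must be the same across datasets.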
I have 100 dta files. I have a list of variables that I need to keep and save temporary copies on the fly. Some variables may or may not exist in a certain dta.
I need Stata to keep all variables that exist in a dta and ignore those that do not exist.
The following code has wrong syntax, but it could serve as a good pseudo code to give one a general idea of what should be done:
forval j = 1/100 {
use data`j'
local myVarList =""
foreach i of varlist var1 var2 var3 var4 var5 var6 var7 var8 {
capture sum `i'
if _rc = 0 {
`myVarList' = `myVarList'" "`i'
}
}
keep `myVarList'
save temporaryData`j'
}
Is there any way to do this?
There are many issues with your code. Here's one way to do the inner loop.
/* one fake dataset */
set obs 5
gen var1 = 1
gen var2 = 2
gen var3 = "c"
gen z = 35
ds
/* keep part */
local masterlist "var1 var2"
local keeplist = ""
foreach i of local masterlist {
capture confirm variable `i'
if !_rc {
local keeplist "`keeplist' `i'"
}
}
keep `keeplist'
The key part is that you can't use foreach i of varlist phantomvar, since Stata will check that the variable exists and exit with an error if it does not. Similarly, putting the local name in special quotes evaluates the macro, whereas you are trying to redefine it, which requires the local command. You may find set trace on useful when debugging.
This is somewhat better code:
unab allvars: _all
local masterlist "var1 var2 phantomvar"
local keeplist: list allvars & masterlist
keep `keeplist'
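Wrapped in the outer loop over files, the same idea might look like the sketch below. It is illustrative only: two small fake datasets stand in for the 100 files, the file names data1, data2 and temporaryData`j' follow the question, and everything is written to the current working directory with replace.

```stata
* Two fake datasets standing in for data1.dta ... data100.dta
clear
set obs 5
gen var1 = 1
gen var3 = "c"
gen z = 35
save data1, replace

clear
set obs 5
gen var2 = 2
gen z = 35
save data2, replace

* Variables wanted; not all of them exist in every file
local masterlist "var1 var2 var3"

forvalues j = 1/2 {
    use data`j', clear
    unab allvars : _all
    * Intersection of the variables present and the variables wanted
    local keeplist : list allvars & masterlist
    * keep with an empty list would be an error, so skip such files
    if "`keeplist'" != "" {
        keep `keeplist'
        save temporaryData`j', replace
    }
}
```

If the copies really are temporary, tempfile could be used instead of named files, at the cost of slightly more involved macro references.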