Generate variables with loop over pairs of variables - stata

I have data on quantities and Values for a set of countries, and currently the variable names are Q_US V_US Q_UK V_UK Q_France V_France and in that order: Quantity_country Value_country, etc.
For each country (US, UK, France, etc.) I want to generate a new variable that gives me the unit value. Manually I would create them as
gen unit_US = V_US/Q_US
gen unit_UK = V_UK/Q_UK
gen unit_France = V_France/Q_France
But I have 100+ countries, and it would be great to do this in a loop if possible.
Is there an easy way to do this?

Let's get a list of all the countries as you have used them in variable names.
unab where : V_*
local where " `where'"
local where : subinstr local where " V_" " ", all
The additional space is designed to ensure that the text removed is just the prefix V_ at the start of variable names. For another example of using unab, see this FAQ.
Check it worked:
display "`where'"
Now loop:
foreach c of local where {
gen unit_`c' = V_`c'/Q_`c'
}
I'd also consider reshape long.

Related

Looping over many names which don't have rules

When there are just few names, looping in Stata is easy.
Also, when there is a rule as to how the names change (e.g. increment) I can do the following:
forval i = 1/5 {
...
}
However, there are cases where i have hundreds of names that I need to loop over, which don't have rules of increment.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?
The answer is it depends.
If these are part of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
If these are file names, a macro extended function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
The local macro names will automatically expand during run time to include all the specified items.

How to separate Stata macro `varlist' with commas for using in mi( ) and inlist( )?

I want to store a list of variables in a macro and then call that macro inside a mi() statement. The original application is for a programme that uses data I cannot bring online for secrecy reasons, and which will include the following statement:
generate u = cond(mi(`vars'),., runiform(0,1))
The issue being that mi() requires comma separated variable names but vars is delimited by spaces.
I use the auto dataset and mark to illustrate my problem:
sysuse auto
local myvars foreign price
mark missing if mi(`myvars')
In this example, mi() asks for arguments separated by commas, Stata stops and complains that it cannot find a foreignprice variable. Is there a utility function that will insert the commas between the macro elements?
A direct answer to the question as set is to use the macro extended function subinstr to change spaces to commas:
sysuse auto
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
mark missing if mi(`myvars')
If the aim is to create a marker variable that marks observations with any values missing on specified variables, then there are other alternative ways, most of which don't need any fiddling with separators in a list. This doesn't purport to be a complete set.
A1.
regress foreign price
gen missing = !e(sample)
A2.
egen missing = rowmiss(foreign price)
replace missing = missing > 0
A3.
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
gen missing = missing(`myvars')
A4.
gen missing = 0
quietly foreach v in foreign price {
replace missing = 1 if missing(`v')
}
A5.
mark missing
markout missing foreign price
replace missing = !missing
EDIT In the edited question there is reference to this within a program:
generate u = cond(mi(`vars'),., runiform(0,1))
I wouldn't do that, even with the macro edited to include commas too, although any issue is more one of personal taste.
marksample touse
markout `vars'
generate u = runiform(0,1) if `touse'
It's likely that the indicator variable so produced is needed, or at least useful, somewhere else in the same program.

Stata macro list uniq extended function (remove duplicates from a macro var list)

This question has been edited to add sample data and clean-up (hopefully) some unnecessary steps per feedback.
I am starting with longitudinal data in wide format. I need to subset, reshape, and perform summary steps for multiple different chunks of data. I want to create macro variables with varlists needed for reshaping and other repetitive steps in wide and long format. The variables being reshaped follow a consistent naming pattern of (prefix)_(name)_#. There are also variables following the same pattern that do not need to be reshaped, and variables that are time-invariant and follow other naming conventions. To generate sample data:
set obs 1
foreach t in 0 6 15 18 21 {
foreach w in score postint postintc constime starttime {
gen p_`w'_`t' = 1
}
}
gen p_miles_0 = 1
gen p_hea_0 = 1
gen cons_age = 1
ds
I want to create two macro vars 1) wide_varlist for wide format data where the variables end in a number and 2) uniquestubs for long format data where the macro list contains just the stubs. I am having trouble using the macro list extended function "uniq" to generate #2 here. Here is my code so far. My full varlists are actually much longer.
Steps to create macro with wide format varlist:
/* create varlist for wide format data a time point 0,6,15,18,21 */
ds p_score_* p_postint_* p_postintc_* p_constime_* p_starttime_*
di "`r(varlist)'"
global wide_varlist `r(varlist)'
Start steps to create macro with long format varlist:
/*copy in wide format varlist*/
global stubs "$wide_varlist"
/*remove # - this results in a macro with 5 dups of same stub*/
foreach mo of numlist 0,6,15,18,21{
global stubs : subinstr global stubs "`mo'" "", all
}
/*keep unique stubs*/
global uniquestubs : list uniq stubs
Everything above works as I intend until global uniquestubs : list uniq stubs, which doesn't create the macro uniquestubs at all.
My situation seems similar to this this question but the same solution didn't work for me.
Any thoughts? Appreciate the help.
It's a bit difficult to follow what you are trying to do (a) without a reproducible example (b) because much of your code is just copying the same varlist to different places, which is a distraction.
We can fix (a) by creating a toy dataset:
clear
set obs 1
foreach t in 0 6 15 18 21 {
foreach w in score postint postintc constime starttime {
gen p_`w'_`t' = 1
}
}
ds
p_score_0 p_score_6 p_score_15 p_score_18 p_score_21
p_postint_0 p_postint_6 p_postint_15 p_postint_18 p_postint_21
p_postintc_0 p_postintc_6 p_postintc~5 p_postintc~8 p_postintc~1
p_constime_0 p_constime_6 p_constim~15 p_constim~18 p_constim~21
p_starttim~0 p_starttim~6 p_startti~15 p_startti~18 p_startti~21
Now the main difficulty seems to be that you want stubs for a reshape long. This code suffices for the toy dataset. There is no need to scan yet more variable names with the same information. If you don't have all variables for all time points, you may need more complicated code.
unab stubs: p_*_0
local stubs : subinstr local stubs "0" "", all
di "`stubs'"
p_score_ p_postint_ p_postintc_ p_constime_ p_starttime_
I don't understand the enthusiasm for globals here, but, programming taste aside, you can put the last result in a global quite easily.

Stata: compare two datasets and drop different variables

I have two large datasets (more than 1000 variables in each), one of which has all the variables of the second, plus additional variables. I would like to get a list of all these additional variables, and then drop them and append one dataset to another. I have tried the command dta_equal, but got the same problem found here: http://www.stata.com/statalist/archive/2011-08/msg00308.html
I guess append, keep() cannot realize what I want to do directly, i.e., cannot append dataset while drop additional variables since I have to manually type in variables one by one in the keep() option, which is not realistic given my large dataset.
Are there any ways to deal with this?
There are several Stata commands that can be useful here.
The unab command is used in the first example to make a list of variable in the dataset with fewer variables. The second and third example use the describe command to obtain the list of variables in a dataset not currently in memory.
The final part the the example shows how to use extended macro list functions to obtain a list of common variables and the set of variables not common to both datasets.
* simulate 2 datasets, one has more variables than the other
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace
* example 1: drop after append
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta"
replace source = 2 if mi(source)
keep `vcommon' source
* example 2: drop first then append
clear
describe using "data1.dta", varlist short
local vcommon `r(varlist)'
use `vcommon' using "data2.dta", clear
gen source = 2
append using "data1.dta"
replace source = 1 if mi(source)
* example 3: append and keep on the fly
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)
* use extended macro list functions to manipulate variable list
clear
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vcommon : list vlist1 & vlist2
local vinonly1 : list vlist1 - vlist2
local vinonly2 : list vlist2 - vlist1
dis "common variables = `vcommon'"
dis "variables in data1 not found in data2 = `vinonly1'"
dis "variables in data2 not found in data1 = `vinonly2'"

Extract the mean from svy mean result in Stata

I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat village
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then try to turn this macro names into variable
foreach v in names {
gen `v' = "``v''"
}
However, the variable names is empty. What did I do wrong? Since a lot of this is copied from Stata mailing list, I particularly don't understand the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
Some notes on your original post
Your loop doesn't do what you intend (although again, not crystal clear to me) because you are supplying a list (with one element: the text name). You can see it running and comparing:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
My impression is that for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, If you have not read the initial chapters of the Stata User's Guide, then you must do so.