Appending a suffix to a list of variables - Stata

Is there a way to quickly append the same suffix to a list of variables from monthly reports in Stata? For example:
Variable list: report, total_enrollment
Append _jan22 so the new variable list is: report_jan22, total_enrollment_jan22
Variable list: report, total_enrollment
Append _feb22 so the new variable list is: report_feb22, total_enrollment_feb22
I have looked at the rename * command. Any advice on using it?

Do you have multiple datasets? Otherwise, how can the variables have the same names in the first place?
One way to solve this is to use a loop. Provide more detail if this does not work.
clear
set obs 10
gen report = 1
gen total_enrollment = 1
local vars report total_enrollment
* add the suffix _jan22 to each name in the list
foreach var of local vars {
    rename `var' `var'_jan22
}
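The rename * approach mentioned in the question also works: since Stata 12, rename accepts wildcard patterns, and a * in the new name stands for whatever the * matched in the old name. A minimal sketch, assuming every variable in memory should get the suffix (to rename only some variables, see help rename group):

```stata
clear
set obs 10
generate report = 1
generate total_enrollment = 1
* every variable in memory gets the suffix: report -> report_jan22, etc.
rename * *_jan22
describe, simple
```

For monthly files, only the suffix changes from one dataset to the next (_jan22, _feb22, ...).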

Related

Looping over many names which don't have rules

When there are only a few names, looping in Stata is easy.
Also, when there is a rule for how the names change (e.g. an increment), I can do the following:
forval i = 1/5 {
...
}
However, there are cases where I have hundreds of names to loop over that don't follow any increment rule.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?
The answer is: it depends.
If these are parts of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
    generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
    display `var'
}
.41988069
.06420179
.36276805
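When the names belong to variables, ds is not strictly needed: foreach also accepts a varlist directly, and wildcards are expanded when the loop starts. A sketch reusing the simulated variables from the example above (the loop name v is arbitrary):

```stata
clear
set obs 5
foreach num in 48700 48900 48999 49020 49180 49340 {
    generate var`num' = runiform()
}
* var48* is expanded to var48700 var48900 var48999 before the loop runs
foreach v of varlist var48* {
    display "`v'"
}
```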
If these are file names, an extended macro function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
    display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
    display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
When the loop runs, the local macro names expands to all the items you specified.
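With hundreds of names, the local macro can be spread over several lines of a do-file with the /// continuation comment, which keeps the list readable. A minimal sketch, assuming the code runs from a do-file (/// is not recognized when typed interactively in the Command window):

```stata
* one long list, written over several lines of a do-file
local names var48700 var48900 var48999 ///
    var49020 var49180 var49340
local n : word count `names'
display "`n' names"
foreach name of local names {
    display "`name'"
}
```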

A local within a local evaluates as empty in a program

I am trying to write a mini-program that simply takes a list of variables, and returns a sub-list that contains variables that are actually in the dataset.
I do this many times in a do file (not sequentially, so no loops), so it is easier to just have a quick program rather than effectively duplicate this code every time.
A simplified version of the program is below. The main issue appears to be in line 6. The first argument for the program should be the name of a local macro that contains variable names to compare with those in the dataset. So, for example, if the local macro is list1, the first argument of the program is list1, and I want to store a new local macro, vlist, which contains all the variables in list1.
But when I try to do this with:
``1''
the resulting local macro vlist just ends up being empty, while the local allvars is fine.
My program's code is the following:
clear
cap program drop lm
program define lm
    * Create a new local, vlist, that is all the variables in the local macro identified in argument 1
    local vlist ``1''
    * Create a new local, allvars, that is all variables in the dataset
    qui ds, a
    local allvars `r(varlist)'
    * Display both macros, to illustrate that vlist is empty, while allvars contains all variables
    di as text "vlist: " as result "`vlist'"
    di as text "allvars: " as result "`allvars'" _newline
    * Create a new local macro that is the intersection of the two lists (if it worked)
    local `1'_inter : list vlist & allvars
    * Display different messages depending on the outcome (e.g. if a list was created, or empty)
    if missing("``1'_inter'") di as error "There are no common variables between `1' and the dataset."
    else di as result "`1' intersection with varlist _all now stored as local: `1'_inter"
end
clear
input float(var1 var2 var3 var4 var5) // Input irrelevant data
. . . . .
end
/* Next, create a macro with a list of variables that are in the dataset
(e.g. var1 and var2), and even some that are not in the data (var7). */
local list1 var1 var2 var7
* Execute the list-match program
lm list1
As you can see, the local vlist ends up being empty, so there is no intersection between the lists.
Any idea what I am doing wrong here?
I am sure it is the double local macro, but I am not sure how to fix it.
The macro in macro reference
``1''
is evaluated as follows. First, macro 1 holds the first thing you typed as an argument to the command, which in your example is list1, so after that substitution Stata sees
`list1'
and the entire command is now evaluated as
local vlist `list1'
which is then evaluated as follows. What is in the local macro list1? Precisely nothing, or an empty string, as no such local macro exists at that point in the program (and any other local macros created anywhere else are invisible, because that is what local means: local to the program space concerned). So next Stata sees
local vlist
whereby you simultaneously create a local macro and blank it out, thus destroying it.
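The problem is scope, not syntax. A short demonstration: at the level where list1 is defined, the nested reference resolves as intended, but inside a program the same reference comes out empty. The program name showscope is made up for illustration.

```stata
* in the same scope, the nested reference resolves as intended
local list1 var1 var2 var7
local 1 list1
display "``1''"       // displays var1 var2 var7

* inside a program, list1 belongs to the caller and is invisible
capture program drop showscope
program define showscope
    display "inside program: [``1'']"
end
showscope list1        // displays inside program: []
```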
The solution is that you intended to write
local vlist "`1'"
or perhaps just
args vlist
However, that is not the end of it. The command
lm list1
mentions the name of a local macro you have just created. But passing the name of a local macro to another program is futile, as the other program can't see the contents of the macro. You have to pass the contents.
Moreover, the program needs revising: you may want to pass several names to it, and they will only be regarded as a single argument if you bind them in double quotes.
FWIW, the problem you describe seems to be that solved by isvar from SSC. Here it is:
*! NJC 1.0.0 20 Sept 2005
program isvar, rclass
    version 8
    syntax anything
    foreach v of local anything {
        capture unab V : `v'
        if _rc == 0 local varlist `varlist' `V'
        else local badlist `badlist' `v'
    }
    di
    if "`varlist'" != "" {
        local n : word count `varlist'
        local what = plural(`n', "variable")
        di as txt "{p}`what': " as res "`varlist'{p_end}"
        return local varlist "`varlist'"
    }
    if "`badlist'" != "" {
        local n : word count `badlist'
        local what = plural(`n', "not variable")
        di as txt "{p}`what': " as res "`badlist'{p_end}"
        return local badlist "`badlist'"
    }
end
I am fond of ds but I wouldn't use it here. You could be calling up thousands of variable names just to check whether a few names are in a dataset.
A final comment on style
local list1 var1 var2 var7
lm list1
might as well be
lm var1 var2 var7
as there is no value in putting the names into a bag (local macro) and using the name of the bag when you can just pass the names directly. As said, this won't work with your program until you fix it, but it would work with isvar.
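For comparison, here is one way the original program might be repaired so that it takes the names themselves, as in lm var1 var2 var7. This is a sketch, not the isvar approach: it keeps the list-intersection idea from the question, swaps ds for unab with _all, and returns the result in r(inter), a name chosen here for illustration. Unlike isvar, it matches exact names only, so abbreviations and wildcards are not expanded.

```stata
capture program drop lm
program define lm, rclass
    version 8
    * hypothetical revision: the names are passed directly, not the name of a local
    syntax anything
    * all variable names in the dataset (unab fails if there are no variables)
    unab allvars : _all
    * names that were supplied and also exist in the dataset, in the order supplied
    local inter : list anything & allvars
    if "`inter'" == "" {
        di as error "There are no common variables between the list and the dataset."
    }
    else {
        di as result "Common variables: `inter'"
    }
    return local inter "`inter'"
end

clear
input float(var1 var2 var3 var4 var5)
. . . . .
end
lm var1 var2 var7
return list
```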

Day of the week effect - excluding dummy variables one at a time in a loop

I want to test the day of the week effect on stock returns. The Stata code I have written works, but looks fairly inefficient.
// 1) Monday effect
eststo:reg return day_dummy2 day_dummy3 day_dummy4 day_dummy5
// 2) Tuesday effect
eststo:reg return day_dummy1 day_dummy3 day_dummy4 day_dummy5
// 3) Wednesday effect
eststo:reg return day_dummy1 day_dummy2 day_dummy4 day_dummy5
and so on.
Is there a way to write a code with the same function (excluding one day at a time) with e.g. a foreach loop?
Thank you very much for your help!
A bit clunky, perhaps, but you could use Stata's extended macro functions (see help extended_fcn) to exclude one of your listed variables at a time and generate the list of remaining variables.
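local vars "day_dummy1 day_dummy2 day_dummy3 day_dummy4 day_dummy5"
local nvars : word count `vars'
forvalues i = 1/`nvars' {
    local varexclude : word `i' of `vars'
    * the word option removes only the whole name, never part of a longer name
    local varsout`i' : subinstr local vars "`varexclude'" "", word
    eststo: regress return `varsout`i''
}
macro list // to verify the individual varsout1 to varsout5 local macros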
You can obtain the initial varlist with ds day*, which stores the variable list in r(varlist).
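If the five dummies can be replaced by one categorical day variable, factor-variable notation (Stata 11 or later) does the exclusion for you: ib3.dow, for example, makes Wednesday the base category, which is equivalent to leaving day_dummy3 out. A sketch on simulated data; the variable dow and the random returns are hypothetical, and estimates store/estimates table stand in for eststo so the block needs nothing from SSC:

```stata
clear
set seed 2024
set obs 500
* hypothetical dummies: day_dummy1 = Monday, ..., day_dummy5 = Friday
forvalues j = 1/5 {
    generate byte day_dummy`j' = (mod(_n - 1, 5) + 1 == `j')
}
* combine the dummies into a single day-of-week code, 1 to 5
generate byte dow = day_dummy1 + 2*day_dummy2 + 3*day_dummy3 + 4*day_dummy4 + 5*day_dummy5
generate double return = rnormal(0, 0.01)

* ibN.dow makes day N the base category, i.e. day_dummyN is the one left out
forvalues d = 1/5 {
    quietly regress return ib`d'.dow
    estimates store day`d'
}
estimates table day1 day2 day3 day4 day5, b(%9.4f) se
```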

Generate variables with loop over pairs of variables

I have data on quantities and values for a set of countries; currently the variable names are Q_US V_US Q_UK V_UK Q_France V_France, in that order: Quantity_country Value_country, etc.
For each country (US, UK, France, etc.) I want to generate a new variable that gives me the unit value. Manually I would create them as
gen unit_US = V_US/Q_US
gen unit_UK = V_UK/Q_UK
gen unit_France = V_France/Q_France
But I have 100+ countries, and it would be great to do this in a loop if possible.
Is there an easy way to do this?
Let's get a list of all the countries as you have used them in variable names.
unab where : V_*
local where " `where'"
local where : subinstr local where " V_" " ", all
The additional space is designed to ensure that the text removed is just the prefix V_ at the start of variable names. For another example of using unab, see this FAQ.
Check it worked:
display "`where'"
Now loop:
foreach c of local where {
    gen unit_`c' = V_`c'/Q_`c'
}
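An alternative that skips the string manipulation is to loop over the V_* variables themselves and strip the two-character prefix with substr(). A sketch with two made-up countries; it assumes every V_ variable has a matching Q_ variable:

```stata
clear
set obs 3
generate Q_US = 2
generate V_US = 10
generate Q_France = 4
generate V_France = 8
foreach v of varlist V_* {
    * V_US -> US: drop the first two characters of the name
    local c = substr("`v'", 3, .)
    generate unit_`c' = `v' / Q_`c'
}
```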
I'd also consider reshape long.
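For the reshape long route, the country part of each name becomes a string identifier and a single generate computes every unit value. A sketch, assuming the wide data have an identifier such as year (hypothetical here):

```stata
clear
input year Q_US V_US Q_UK V_UK
2020 2 10 4 8
2021 5 20 1 3
end
* stubs Q_ and V_; the rest of each name becomes the string variable country
reshape long Q_ V_, i(year) j(country) string
generate unit = V_ / Q_
list, sepby(year)
```

With 100+ countries, the long layout also makes later comparisons across countries a matter of by: or if conditions rather than more loops.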

Stata: compare two datasets and drop different variables

I have two large datasets (more than 1000 variables in each), one of which has all the variables of the second, plus additional variables. I would like to get a list of all these additional variables, and then drop them and append one dataset to another. I have tried the command dta_equal, but got the same problem found here: http://www.stata.com/statalist/archive/2011-08/msg00308.html
I don't think append, keep() can do what I want directly, i.e., append one dataset while dropping the additional variables, since I would have to type the variables one by one in the keep() option, which is not realistic given the size of my datasets.
Are there any ways to deal with this?
There are several Stata commands that can be useful here.
The unab command is used in the first and third examples to make a list of the variables in the dataset with fewer variables. The second example and the final part use describe using to obtain the list of variables in a dataset not currently in memory.
The final part of the example shows how to use macro list functions to obtain a list of common variables and the sets of variables not common to both datasets.
* simulate 2 datasets, one has more variables than the other
sysuse auto, clear
save "data1.dta", replace
gen x = _n
gen y = -_n
save "data2.dta", replace
* example 1: drop after append
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta"
replace source = 2 if mi(source)
keep `vcommon' source
* example 2: drop first then append
clear
describe using "data1.dta", varlist short
local vcommon `r(varlist)'
use `vcommon' using "data2.dta", clear
gen source = 2
append using "data1.dta"
replace source = 1 if mi(source)
* example 3: append and keep on the fly
use "data1.dta", clear
unab vcommon : *
gen source = 1
append using "data2.dta", keep(`vcommon')
replace source = 2 if mi(source)
* use extended macro list functions to manipulate variable list
clear
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vcommon : list vlist1 & vlist2
local vinonly1 : list vlist1 - vlist2
local vinonly2 : list vlist2 - vlist1
dis "common variables = `vcommon'"
dis "variables in data1 not found in data2 = `vinonly1'"
dis "variables in data2 not found in data1 = `vinonly2'"
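The vinonly2 list computed above is enough to finish the job: drop those variables from the larger dataset, then append. A sketch reusing the simulated files; the guard skips drop when there is nothing to drop, because drop with an empty list is an error.

```stata
* simulate the two datasets, as in the example above
sysuse auto, clear
save "data1.dta", replace
generate x = _n
generate y = -_n
save "data2.dta", replace

* variables found only in data2
describe using "data1.dta", varlist short
local vlist1 `r(varlist)'
describe using "data2.dta", varlist short
local vlist2 `r(varlist)'
local vinonly2 : list vlist2 - vlist1

* drop the extras from the larger dataset, then append the smaller one
use "data2.dta", clear
if "`vinonly2'" != "" drop `vinonly2'
append using "data1.dta"
```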