Looping over many names which don't have rules - stata

When there are just a few names, looping in Stata is easy.
Also, when there is a rule as to how the names change (e.g. increment) I can do the following:
forval i = 1/5 {
...
}
However, there are cases where I have hundreds of names to loop over, and they don't follow any rule of increment.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?

The answer is: it depends.
If these are part of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
If these are file names, a macro extended function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
The local macro names will automatically expand during run time to include all the specified items.

Related

read files using local macros in a loop in Stata

I'm stuck trying to write a loop that reads many files whose names follow no pattern and saves them under sequentially numbered names.
This is an example of what I have:
paths:
"example_84745.dta"
"example_74632.dta"
"example_18390.dta"
So, I want to read all of them and save them like
"example_1.dta"
"example_2.dta"
"example_3.dta"
What I'm trying to do is to work with local macros, like this:
local path_1 = "example_84745.dta"
local path_2 = "example_74632.dta"
local path_3 = "example_18390.dta"
forvalues i = 1(1)3{
display "path_`i'"
use "path_`i'", clear
save "example_`i'"
}
But it is not working. It prints path_1 is not found.
I really appreciate your comments.
You're looping over 1 2 3 and asking to display path_1 path_2 path_3 and then use them, but you need to spell out that they are in turn local macro names.
local path_1 = "example_84745.dta"
local path_2 = "example_74632.dta"
local path_3 = "example_18390.dta"
forvalues i = 1(1)3{
display "`path_`i''"
use "`path_`i''", clear
save "example_`i'"
}
Local macros in Stata can be used like variables in many programming languages, but it is not customary to regard them as variables. In Stata, a variable is (is only) a column in a dataset.
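As an alternative to numbering the macros by hand, here is a minimal sketch that keeps all the file names in one local macro and counts as it goes. The file names are the ones from the question. The macro names files and i are just illustrative choices, and the use and save lines are commented out on the assumption that the files may not exist in your working directory.

```stata
// All source file names in a single local macro (names taken from the question)
local files example_84745.dta example_74632.dta example_18390.dta

// Counter that supplies the new sequential suffix
local i = 0
foreach f of local files {
    local ++i
    // Show the mapping from old name to new name
    display "`f' -> example_`i'.dta"
    // In real use, assuming the files exist in the working directory:
    // use "`f'", clear
    // save "example_`i'", replace
}
```

This avoids the nested reference `path_`i'' altogether, because each pass of the loop already holds the file name itself in f.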

Appending a suffix to a list of variables

Is there a way to quickly append the same suffix to a list of variables from monthly reports in Stata? For example:
Variable list: report, total_enrollment
Append _jan22 so new variable list: report_jan22, total_enrollment_jan22
Variable list: report, total_enrollment
Append _feb22 so new variable list: report_feb22, total_enrollment_feb22
I have looked at the rename * command. Any advice on using it?
Do you have multiple datasets or how can the variables have the same name in the first place?
One way to solve this would be to use a loop. Provide more detail if this does not work.
clear
set obs 10
gen report = 1
gen total_enrollment = 1
local vars report total_enrollment
foreach var of local vars {
rename `var' `var'_jan22
}
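Since the question asks about rename with wildcards: a minimal sketch of the group form of rename, assuming Stata 12 or later (where rename accepts wildcards) and assuming every variable in memory should get the suffix. The toy data are the same as in the loop above.

```stata
clear
set obs 10
generate report = 1
generate total_enrollment = 1

// Append _jan22 to every variable name in one command;
// the * in the new name stands for whatever * matched in the old name
rename * *_jan22

describe, simple
```

If only some variables should be renamed, the loop over a local macro shown above gives finer control.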

A local within a local evaluates as empty in a program

I am trying to write a mini-program that simply takes a list of variables, and returns a sub-list that contains variables that are actually in the dataset.
I do this many times in a do file (not sequentially, so no loops), so it is easier to just have a quick program rather than effectively duplicate this code every time.
A simplified version of the program is below. The main issue appears to be in line 6. The first argument for the program should be the name of a local macro that contains variable names to compare with those in the dataset. So, for example, if the local macro is list1, the first argument of the program is list1, and I want to store a new local macro, vlist, which contains all the variables in list1.
But when I try to do this with:
``1''
the resulting local macro vlist just ends up being empty, while the local allvars is fine.
My program's code is the following:
clear
cap program drop lm
program define lm
* Create a new local, vlist, that is all the variables in the local macro identified in argument 1
local vlist ``1''
* Create a new local, allvars, that is all variables in the dataset
qui ds, a
local allvars `r(varlist)'
* Display both macros, to illustrate that vlist is empty, while allvars contains all variables
di as text "vlist: " as result "`vlist'"
di as text "allvars: " as result "`allvars'" _newline
* Create a new local macro that is the intersection of the two lists (if it worked)
local `1'_inter : list vlist & allvars
* Display different messages depending on the outcome (e.g. if a list was created, or empty)
if missing("``1'_inter'") di as error "There are no common variables between `1' and the dataset."
else di as result "`1' intersection with varlist _all now stored as local: `1'_inter"
end
clear
input float(var1 var2 var3 var4 var5) // Input irrelevant data
. . . . .
end
/* Next, create a macro with a list of variables that are in the dataset
(e.g. var1 and var2), and even some that are not in the data (var7). */
local list1 var1 var2 var7
* Execute the list-match program
lm list1
As you can see, the local vlist ends up being empty, so there is no intersection between the lists.
Any idea what I am doing wrong here?
I am sure it is the double local macro, but I am not sure how to fix it.
The macro in macro reference
``1''
is evaluated as follows. First, macro 1 is the first thing you typed as argument to the command, which in your example is list1, so substituting that Stata then sees
`list1'
and the entire command is now evaluated as
local vlist `list1'
which is then evaluated as follows. What is in the local macro list1? Precisely nothing, or an empty string, as no such local macro exists at that point in the program (and any other local macros created anywhere else are invisible, because that is what local means: local to the program space concerned). So next Stata sees
local vlist
whereby you simultaneously create a local macro and blank it out, thus destroying it.
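To see the scoping rule in isolation, here is a minimal sketch. The program name peek and the returned name seen are made up for illustration; only list1 comes from the question.

```stata
capture program drop peek
program define peek, rclass
    // Inside the program, `list1' refers to peek's own local macro,
    // which was never defined here, so it expands to an empty string
    return local seen "`list1'"
end

// A local macro defined in the calling do-file
local list1 var1 var2 var7
peek

display "caller sees:  [`list1']"
display "program sees: [`r(seen)']"
```

The caller's copy of list1 is intact, while the program sees nothing under that name.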
The solution is that you intended to write
local vlist "`1'"
or perhaps just
args vlist
However, that is not the end of it. The command
lm list1
mentions the name of a local macro you have just created. But passing the name of a local macro to another program is futile, as the other program can't see the contents of the macro. You have to pass the contents.
Yet more: the program then needs to be revised as you may want to pass several names to it and they will only be regarded as a single argument if you bind them in double quotes.
FWIW, the problem you describe seems to be that solved by isvar from SSC. Here it is:
*! NJC 1.0.0 20 Sept 2005
program isvar, rclass
version 8
syntax anything
foreach v of local anything {
capture unab V : `v'
if _rc == 0 local varlist `varlist' `V'
else local badlist `badlist' `v'
}
di
if "`varlist'" != "" {
local n : word count `varlist'
local what = plural(`n', "variable")
di as txt "{p}`what': " as res "`varlist'{p_end}"
return local varlist "`varlist'"
}
if "`badlist'" != "" {
local n : word count `badlist'
local what = plural(`n', "not variable")
di as txt "{p}`what': " as res "`badlist'{p_end}"
return local badlist "`badlist'"
}
end
I am fond of ds but I wouldn't use it here. You could be calling up thousands of variable names just to check whether a few names are in a dataset.
A final comment on style
local list1 var1 var2 var7
lm list1
might as well be
lm var1 var2 var7
as there is no value in putting the names into a bag (local macro) and using the name of the bag when you can just pass the names directly. As said, this won't work with your program until you fix it, but it would work with isvar.
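For completeness, a minimal sketch of how the question's program might be revised along those lines, assuming the candidate names are passed directly as arguments. It reuses the name lm from the question; the returned name r(inter) is my own choice, and confirm variable with the exact option stands in for the ds call. This is not the isvar code, just one way to apply the points above.

```stata
capture program drop lm
program define lm, rclass
    // Assumption: the candidate names themselves are the arguments
    syntax anything
    local found
    foreach v of local anything {
        // exact rules out matches by abbreviation
        capture confirm variable `v', exact
        if _rc == 0 local found `found' `v'
    }
    if "`found'" == "" display as error "No listed names are variables in the dataset."
    // Hand the result back to the caller, who cannot see lm's local macros
    return local inter "`found'"
end

clear
input float(var1 var2 var3 var4 var5)
. . . . .
end

lm var1 var2 var7
display "`r(inter)'"
```

Returning the list in r() matters because any local macro created inside the program disappears when the program ends.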

Generate variables with loop over pairs of variables

I have data on quantities and values for a set of countries. Currently the variable names are Q_US V_US Q_UK V_UK Q_France V_France, in that order: Quantity_country, Value_country, etc.
For each country (US, UK, France, etc.) I want to generate a new variable that gives me the unit value. Manually I would create them as
gen unit_US = V_US/Q_US
gen unit_UK = V_UK/Q_UK
gen unit_France = V_France/Q_France
But I have 100+ countries, and it would be great to do this in a loop if possible.
Is there an easy way to do this?
Let's get a list of all the countries as you have used them in variable names.
unab where : V_*
local where " `where'"
local where : subinstr local where " V_" " ", all
The additional space is designed to ensure that the text removed is just the prefix V_ at the start of variable names. For another example of using unab, see this FAQ.
Check it worked:
display "`where'"
Now loop:
foreach c of local where {
gen unit_`c' = V_`c'/Q_`c'
}
I'd also consider reshape long.
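A minimal sketch of the reshape long route, on made-up data. The identifier id and the numbers are assumptions for illustration: reshape needs some variable that uniquely identifies each row, which the question does not mention.

```stata
clear
input id Q_US V_US Q_UK V_UK
1 10 50 4 20
2  5 40 8 16
end

// Stubs Q_ and V_ stay as variables; the country suffix moves into a
// string variable named country
reshape long Q_ V_, i(id) j(country) string

// One command now covers every country
generate unit = V_/Q_
list
```

In long layout there is a single unit variable with one row per id and country, so no loop over countries is needed.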

stata - variable operations conditional to existent vars and to a list of varnames

I have this problem.
My dataset has variables like:
sec20_var1 sec22_var1 sec30_var1
sec20_var2 sec22_var2 sec30_var2 sec31_var2
(~102 sectors, ~60 variables; not all of the combinations are complete or even exist)
My intention is to build an indicator that averages each variable across sectors. So it is an "aggregated sector" containing the sectors that belong to a class, in a high-medium-low technology fashion. I already have the definitions of which sectors to include in each category. Let's say that in high technology I should put sec20 and sec31.
The problem: the list of sectors belonging to a class and the sectors actually available for each variable don't match. So I'm stuck with this problem and started to do it manually. My best approach was:
set more off
foreach v in _var02 {
ds *`v'
di "`r(varlist)'"
local sects`v' `r(varlist)'
foreach s in sec26 sec28 sec37 {
capture confirm local sects`v'
if !_rc {
egen oecd_medhigh_avg_`v'=rowmean(`s'`v' sec28`v' sec37`v' sec40`v' sec59`v' sec92`v' sec54`v' sec55`v' sec48`v' sec50`v' sec53`v' sec4`v' sec5`v' sec6`v')
}
else {
di "`v' does not exist"
}
}
}
I got it to work only for those variables that have all the sectors present in the rowmean call (which is simpler, since I don't have to store the varlist in a macro). I would like to average the AVAILABLE sectors, even if there are only two per variable.
I also noticed that storing the list in a macro could be helpful, but I don't know how to put it into my code. I'm totally stuck here.
Thanks for your help! :)
Thank you @SOConnell. As I said in my comment, I went in the same direction, but I'm still searching for the solution I expected (I don't know how to program it, or even whether it's possible).
I used the code below, which goes in the same direction as the one by @SOConnell, but I found this version clearer. The trick is the _rc==111 check, which catches the missing sector-by-variable combinations and creates them so that they can be used in the second part. Everything worked. It's not elegant, but it has some practical use. :) The third part drops the empty variables that were created.
*COMPLETING THE LIST OF COMBINATIONS
set more off
foreach v in _var02 _var03 _var08 _var13 _... {
foreach s in sec27 sec35 sec42 sec43 sec45 sec46 sec39 sec52 sec67 {
capture confirm variable `s'`v'
if _rc==111 {
gen `s'`v'=.
}
}
}
*GENERATING THE INDICATOR WITH ALL POSSIBLE COMBINATIONS
set more off
foreach v in _var02 _var03 _var08 _var13 ... {
egen oecd_high_avg_`v'=rowmean(sec27`v' sec35`v' sec42`v' sec43`v' sec45`v' sec46`v' sec39`v' sec52`v' sec67`v')
}
*DROPPING MISSING VARIABLES CREATED TO DO THE INDICATOR.
set more off
foreach v of varlist * {
gen TEMP=.
replace TEMP=1 if !missing(`v')
egen TEMPSUM=sum(TEMP)
if TEMPSUM==0 {
di " >>> Dropping empty variable: `v'"
drop `v'
}
drop TEMP TEMPSUM
}
Note that I have shortened the list of variables.
I will refer to what you call variables as "accounts".
The workaround would be to create empty variables in the dataset for all sectorXaccount combinations. From a point where you already have your dataset loaded into memory:
forval sec = 1/102 {
forval account = 1/60 {
cap gen sec`sec'_var`account'=. /*this will skip over generating the secXaccount combination if it already exists in the dataset */
}
}
Then apply the rowmean operation to the full definition of each indicator. The missings won't be calculated into your rowmean, so it will effectively be an average of available cells without you having to do the selection manually. You could then probably automate deleting the empty variables you created if you do something like:
g start=.
forval sec = 1/102 {
forval account = 1/60 {
cap gen sec`sec'_var`account'=. /*this will skip over generating the secXaccount combination if it already exists in the dataset */
}
}
g end=.
[indicator calculations go here]
drop start-end
However, it seems like you would be creating averages that might not be comparable (some will have 2 underlying values, some 3, some 4, etc.) so you need to be careful there (but you are probably already aware of that).
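A different route, which avoids creating and later dropping placeholder variables, is to collect only the sector variables that actually exist and pass that list to rowmean. The following is a minimal sketch on toy data: the sector list sec20 sec26 sec31, the values, and the macro name have are all illustrative assumptions, not taken from the real dataset.

```stata
clear
set obs 3
// Toy data: sec26 is missing for both accounts, sec31 is missing for _var03
generate sec20_var02 = 1
generate sec31_var02 = 3
generate sec20_var03 = 2

foreach v in _var02 _var03 {
    // Build the list of class members that exist for this account
    local have
    foreach s in sec20 sec26 sec31 {
        capture confirm variable `s'`v', exact
        if _rc == 0 local have `have' `s'`v'
    }
    // Average over whatever is available, if anything
    if "`have'" != "" {
        egen oecd_high_avg`v' = rowmean(`have')
    }
    else display "no sectors of this class available for `v'"
}
list
```

The same caveat about comparability applies: each average is taken over however many sectors happen to exist for that account.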