Read files using local macros in a loop in Stata

I'm stuck trying to write a loop that reads many files whose names follow no pattern and saves them under sequentially numbered names.
This is an example of what I have:
paths:
"example_84745.dta"
"example_74632.dta"
"example_18390.dta"
So, I want to read all of them and save them like
"example_1.dta"
"example_2.dta"
"example_3.dta"
What I'm trying to do is to work with local macros, like this:
local path_1 = "example_84745.dta"
local path_2 = "example_74632.dta"
local path_3 = "example_18390.dta"
forvalues i = 1(1)3{
display "path_`i'"
use "path_`i'", clear
save "example_`i'"
}
But this does not work: Stata reports that path_1 is not found.
I really appreciate your comments.

You're looping over 1 2 3 and asking Stata to display, and then use, the literal text path_1 path_2 path_3. You need to spell out that these are in turn local macro names, by wrapping each name in its own pair of macro quotes.
local path_1 = "example_84745.dta"
local path_2 = "example_74632.dta"
local path_3 = "example_18390.dta"
forvalues i = 1(1)3{
display "`path_`i''"
use "`path_`i''", clear
save "example_`i'"
}
Local macros in Stata can be used like variables in many programming languages, but it is not customary to regard them as variables. In Stata, a variable is (is only) a column in a dataset.
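To see why the nested quotes matter, here is a minimal sketch that only displays text and opens no files. Stata expands the innermost macro first, so the reference to path_`i' is resolved to path_1 before the outer quotes fetch its contents. The value of i is fixed at 1 purely for illustration.

```stata
* Minimal sketch: nested macro expansion, no files needed
local path_1 "example_84745.dta"
local i = 1

display "path_`i'"      // only i is expanded: shows path_1
display "`path_`i''"    // i first, then path_1: shows example_84745.dta
```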

Related

Is there a way to extract year range from wide data?

I have a series of wide panel datasets. In each of these, I want to generate a series of new variables. E.g., in Dataset1, I have variables Car2009 Car2010 Car2011 in a dataset. Using this, I want to create a variable HadCar2009, which is 1 if Car2009 is non-missing, and 0 if missing, similarly HadCar2010, and so on. Of course, this is simple to do but I want to do it for multiple datasets which could have different ranges in terms of time. E.g., Dataset2 has variables Car2005, Car2006, Car2008.
These are all very large datasets (I have about 60 such datasets), so I wouldn't want to convert them to long either.
For now, this is what I tried:
forval j = 1/2{
use Dataset`j', clear
forval i=2005/2011{
capture gen HadCar`i' = .
capture replace HadCar`i' = 1 if !missing(Car`i')
capture replace HadCar`i' = 0 if missing(Car`i')
}
save Dataset`j', replace
}
This works, but I am reluctant to use capture, because perhaps some datasets have a variable called car2008 instead of Car2008, and this would be an error I would like the program to stop at.
Also, the ranges of years across my 60-odd datasets are different. Ideally, I would like to somehow get this range in a local (perhaps somehow using describe? I'm not sure) and then just generate these variables using that local with a simple for loop.
But I'm not sure I can do this in Stata.
Your inner loop could be rewritten from
forval i=2005/2011{
capture gen HadCar`i' = .
capture replace HadCar`i' = 1 if !missing(Car`i')
capture replace HadCar`i' = 0 if missing(Car`i')
}
to
foreach v of var Car???? {
gen Had`v' = !missing(`v')
}
noting that in Stata true-or-false expressions evaluate directly to 1 or 0.
https://www.stata-journal.com/article.html?article=dm0099
https://www.stata-journal.com/article.html?article=dm0087
https://www.stata.com/support/faqs/data-management/true-and-false/
This code is going to ignore variables beginning with car. There are other ways to check for their existence. However, if there are no variables Car???? the loop will trigger an error message. A loop over ?ar???? would catch car???? and Car???? (but just possibly other variables too).
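The question also asked for the range of years in a local macro. One possible sketch, assuming every matching variable is named Car followed by a four-digit year, is to loop over Car???? and cut the year out of each name with substr(). The toy dataset and the macro names years and year are made up for illustration.

```stata
* Toy data standing in for one of the wide datasets (illustrative only)
clear
set obs 3
generate Car2005 = 1
generate Car2006 = .
generate Car2008 = 2

* Collect the years found in the variable names Car????
local years
foreach v of varlist Car???? {
    local year = substr("`v'", 4, 4)   // characters 4 to 7 of the name
    local years `years' `year'
    generate Had`v' = !missing(`v')
}
display "`years'"
```

With the toy data this displays 2005 2006 2008, and the list can then drive any later loop via foreach y of local years.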

How to recode a qualitative variable with over 100 categories into several levels as quantitative in SAS

I am working in SAS and want to recode a variable that has 50+ distinct qualitative categories, for example U.S. states.
I just want to collapse them into 4 or 5 levels, stored as a quantitative variable.
I have several ideas, for example using if/else statements; the problem is that I would have to write out every area name in SAS, and the code becomes very heavy.
Is there any other way to do this in SAS without redundant code, or without writing out each specific value?
Any ideas are appreciated!
Method 1:
Use IN, but you still have to list the values. You can also do it via a format, but you have to define the format first anyway.
if state in ('AL', 'AK', 'AZ' ... etc) then state_group = 1;
else if state in ( .... ) then state_group = 2;
Method 2:
For a format, you create format using PROC FORMAT and then apply it.
proc format;
value $ state_grp_fmt
'AL', 'AK', 'AZ' = 1
'DC', 'NC' = 2 ;
run;
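And then you can use it with the PUT function. The format name needs its leading $ and a trailing period. PUT returns a character value, so wrap it in INPUT(..., 8.) if you need a numeric variable.
State_Group = put(state, $state_grp_fmt.);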

Looping over many names which don't have rules

When there are just a few names, looping in Stata is easy.
Also, when there is a rule as to how the names change (e.g. increment) I can do the following:
forval i = 1/5 {
...
}
However, there are cases where I have hundreds of names to loop over that follow no rule of increment.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?
The answer is: it depends.
If these are part of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
If these are file names, a macro extended function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
The local macro names will automatically expand during run time to include all the specified items.

Day of the week effect - excluding dummy variables not individually

I want to test the day of the week effect of stock returns. The Stata code I have written works, but looks fairly inefficient.
// 1) Monday effect
eststo:reg return day_dummy2 day_dummy3 day_dummy4 day_dummy5
// 2) Tuesday effect
eststo:reg return day_dummy1 day_dummy3 day_dummy4 day_dummy5
// 3) Wednesday effect
eststo:reg return day_dummy1 day_dummy2 day_dummy4 day_dummy5
and so on.
Is there a way to write code that does the same thing (excluding one day at a time) with, e.g., a foreach loop?
Thank you very much for your help!
A bit clunky, perhaps, but you could use Stata's macro extended functions (see help extended_fcn) to iteratively exclude one of your listed variables and generate the list of remaining variables.
local vars "day1 day2 day3 day4 day5 day6 day7"
forvalues i = 1/7 {
local varexclude : word `i' of `vars'
local varsout`i' : subinstr local vars "`varexclude'" ""
// insert -estout- command here
}
macro list // to verify the individual `varsout`i'' local macros
You can obtain the initial varlist with ds day*, which stores the variable list in r(varlist).
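An alternative sketch uses macro list subtraction (see help macrolists) instead of subinstr. It works on whole words, so excluding day1 could never clip a name such as day10. The variable names are those from the question, the macro name rhs is chosen for illustration, and the loop only displays the commands it would run.

```stata
* Variable names taken from the question; no data needed for this sketch
local vars day_dummy1 day_dummy2 day_dummy3 day_dummy4 day_dummy5
forvalues i = 1/5 {
    local varexclude : word `i' of `vars'
    local rhs : list vars - varexclude      // every word of vars except the excluded one
    display "eststo: regress return `rhs'"
}
```

Once the displayed commands look right, the display line can be replaced by the eststo call itself.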

Generate variables with loop over pairs of variables

I have data on quantities and values for a set of countries. The variables are currently named Q_US V_US Q_UK V_UK Q_France V_France and appear in that order: quantity then value for each country.
For each country (US, UK, France, etc.) I want to generate a new variable that gives me the unit value. Manually I would create them as
gen unit_US = V_US/Q_US
gen unit_UK = V_UK/Q_UK
gen unit_France = V_France/Q_France
But I have 100+ countries, and it would be great to do this in a loop if possible.
Is there an easy way to do this?
Let's get a list of all the countries as you have used them in variable names.
unab where : V_*
local where " `where'"
local where : subinstr local where " V_" " ", all
The additional space is designed to ensure that the text removed is just the prefix V_ at the start of variable names. For another example of using unab, see this FAQ.
Check it worked:
display "`where'"
Now loop:
foreach c of local where {
gen unit_`c' = V_`c'/Q_`c'
}
I'd also consider reshape long.
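As a rough sketch of the reshape long route, assuming the data carry an identifier (here a made-up variable id) and using toy numbers for two countries:

```stata
* Toy data (illustrative only): one row per id, wide by country
clear
input id Q_US V_US Q_UK V_UK
1 2 10 4 12
2 5 20 8 16
end

* Q_ and V_ are the stubs; the country suffix becomes a string variable
reshape long Q_ V_, i(id) j(country) string
generate unit = V_ / Q_
list
```

This gives one row per id and country, so the unit value is a single variable rather than 100+ separate ones.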