In Stata I am trying to repeat code inside an if qualifier using perhaps a forvalues loop. My code looks something like this:
gen y=0
replace y=1 if x_1==1 & x_2==1 & x_3==1 & x_4==1
Instead of writing the & x_i==1 statement every time for each variable, I want to do it using a loop, something like this:
gen y=0
replace y=1 if forvalues i=1/4{x_`i'==1 &}
LATER EDIT:
Would it be possible to create a local in the line of this with the elements added together:
forvalues i=1/4{
local text_`i' "x_`i'==1 &"
display "`text_`i''"
}
And then call it at the if qualifier ?
Although you use the term "if statement" all your code is phrased in terms of if qualifiers, which aren't commands or statements. (Your use of the term "statement" is looser than customary, but that doesn't affect an answer directly.)
You can't insert loops in if qualifiers.
See for the differences
help if
help ifcmd
The entire example
gen y = 0
replace y = 1 if x==1 | x==2 | x==3 | x==4
would be better as
gen y = inlist(x, 1, 2, 3, 4)
or (dependent possibly on whatever values are allowed)
gen y = inrange(x, 1, 4)
A loop solution could be
gen y = 0
quietly forval i = 1/4 {
replace y = 1 if x == `i'
}
We can't discuss whether inlist() or inrange() would or would not be a solution for your real problem if you don't show to us.
I usually don't like - in Nick's terms - to write code to write code. I see an immediate, though not elegant nor 'heterodox', solution to your issue. The whole thing amounts to generate an indicator function for all your indicators, and use it with your if qualifier.
Implicit assumptions, which make this a bad, non-generalizable solution, are: 1) all variables are dummies, and you need them to be == 1, and 2) variable names are conveniently ordered 1 to N (although, if that is not the case, you can easily change the forv into a 'foreach var of varlist etc.')
g touse = 1
forv i =1/30{
replace touse = touse * x_'i'
}
<your action> if touse == 1
Related
I am using two-level loops to create a set of variables. But Stata reports a syntax error.
forvalues i = 1/5 {
local to `i'+1
dis `to'
forvalues j = `to'/6{
dis `j'
gen e_`i'_`j' = .
}
}
I could not figure out where I made the syntax error.
And a follow-up question. I would like to change how the number of loops are coded in the example above. Right now, it's hard-coded as 5 and 6. But I want to make it based on the data. For instance,I am coding as below:
sum x
scalar x_max_1 = `r(max)'-1
scalar x_max_2 = `r(max)'
forvalues i = 1/x_max_1 {
local to = `i'+1
dis `to'
forvalues j = `to'/x_max_2{
dis `j'
gen e_`i'_`j' = .
}
}
However, Stata reports a syntax error in this case. I am not sure why. The scalar is a numeric variable. Why would the code above not work?
Your code would be better as
forvalues i = 1/5 {
local to = `i' + 1
forvalues j = `to'/6 {
gen e_`i'_`j' = .
}
}
With your code you went
local to `i' + 1
so first time around the loop to becomes the string or text 1 + 1 which is then illegal as an argument to forvalues. That is, a local definition without an = sign will result in copying of text, not evaluation of the expression.
The way you used display could not show you this error because display used that way will evaluate expressions to the extent possible. If you had insisted that the macro was a string with
di "`to'"
then you would have seen its contents.
Another way to do it is
forvalues i = 1/5 {
forvalues j = `= `i' + 1'/6 {
gen e_`i'_`j' = .
}
}
EDIT
You asked further about
sum x
scalar x_max_1 = `r(max)'-1
scalar x_max_2 = `r(max)'
forvalues i = 1/x_max_1 {
and quite a lot can be said about that. Let's work backwards from one of various better solutions:
sum x, meanonly
forvalues i = 1/`= r(max) - 1' {
or another, perhaps a little more transparent:
sum x, meanonly
local max = r(max) - 1
forvalues i = 1/`max' {
What are the messages here:
If you only want the maximum, specify meanonly. Agreed: the option name alone does not imply this. See https://www.stata-journal.com/sjpdf.html?articlenum=st0135 for more.
What is the point of pushing the r-class result r(max) into a scalar? You already have what you need in r(max). Educate yourself out of this with the following analogy.
I have what I want. Now I put it into a box. Now I take it out of the box. Now I have what I want again. Come to think of it, the box business can be cut.
The box is the scalar, two scalars in this case.
forvalues won't evaluate scalars to give you the number you want. That will happen in many languages, but not here.
More subtly, forvalues doesn't even evaluate local references or similar constructs. What happens is that Stata's generic syntax parser does that for you before what you typed is passed to forvalues.
I am attempting to reproduce the following in stata. This is a scatter plot of average portfolio returns (y axis) and predicted retruns (x axis).
To do so, I need your help on how I can extract the intercepts from 25 regressions into one variable? I am currently running the 25 portfolio regressions as follows. I have seen that parmest can potentially do this but can't get it to work with the forval. Many thanks
forval s = 1 / 5 {
forval h = 1 / 5 {
reg S`s'H`h' Mkt_Rf SMB HML
}
}
I don't know what your data look like, but maybe something like this will work:
gen intercepts = .
local i = 1
forval s = 1 / 5 {
forval h = 1 / 5 {
reg S`s'H`h' Mkt_Rf SMB HML
// assign the ith observation of intercepts
// equal to the regression constant
replace intercepts = _b[_cons] if _n == `i'
// increment i
local ++i
}
}
The postfile series of commands can be very helpful in a situation like this. The commands allows you to store results in a separate data set without losing the data in memory.
You can start with this as a simple example. This code will produce a Stata data set called "results.dta" with the variables s h and constant with a record of each regression.
cap postclose results
postfile results s h constant using results.dta, replace
forval s = 1 / 5 {
forval h = 1 / 5 {
reg S`s'H`h' Mkt_Rf SMB HML
loc c = _b[_cons]
post results (`s') (`h') (`c')
}
}
postclose results
use results, clear
I want to write six temp data files from my original data keeping the following variables:
temp1: v1-v18
temp2: v1-v5 v19-v31
temp3: v1-v5 v32-v44
temp4: v1-v5 v45-v57
temp5: v1-v5 v58-v70
temp6: v1-v5 v71-v84
I have tried the following:
forvalues i =1(1)6 {
preserve
local j = 6 + (`i'-1)*13
local k = `j'+12
keep v1-v18 if `j'==6
keep v1-v5 v`i'-v`k' if `i'>6 & `j'<71
keep v1-v5 v71-v84 if `j'==71
export delimited using temp`i'.csv, delimiter(";") novarnames replace
restore
}
I get an invalid syntax error. The problem lies with the keep statements. Specifically the if condition with a local macro seems to be against syntax rules.
I think part of your confusion is due to misunderstanding the if qualifier vs the if command.
The if command evaluates an expression: if that expression is true, it executes what follows. The if command should be used to evaluate a single expression, in this case, the value of a macro.
You might use an if qualifier, for example, when you want to regress y x if x > 2 or replace x = . if x <= 2 etc. See here for a short description.
Your syntax has other issues too. You cannot have code following on the same line as the open brace in your forvalues loop, or again on the same line as your closing brace. You also use the local i to condition your keep. I think you mean to use j here, as i simply serves to iterate the loop, not identify a variable suffix.
Further, the logic here seems to work, but doesn't seem very general or efficient. I imagine there is a better way to do this but I don't have time to play around with it at the moment - perhaps an update later.
In any case, I think the correct syntax most analogous to what you have tried is something like the following.
clear *
set more off
set obs 5
forvalues i = 1/84 {
gen v`i' = runiform()
}
forvalues i =1/6 {
preserve
local j = 6 + (`i'-1)*13
local k = `j'+12
if `j' == 6 {
keep v1-v18
}
else if `j' > 6 & `j' < 71 {
keep v1-v5 v`j'-v`k'
}
else keep v1-v5 v71-v84
ds
di
restore
}
I use ds here to simply list the variables in the data followed by di do display a blank line as a separator, but you could simply plug back in your export and it should work just fine.
Another thing to consider if you truly want temp data files is to consider using tempfile so that you aren't writing anything to disk. You might use
forvalues i = 1/6 {
tempfile temp`i'
// other commands
save `temp`i''
}
This will create six Stata data files temp1 - temp6 that are held in memory until the program terminates.
I am using carryforward (ssc install carryforward) to fill in missing observations. Some of my data are annual and I want to use them for subsequent monthly observations, but only if the carried forward data are less than two years old. Can I achieve this logic with the dynamic_condition() option, particularly using #? I have to complete this for many variables, and would like to avoid a lot of variable generation and dropping (and really I'd like to know if it's possible).
The following "manual" solution works, but can I replicate it on the fly with dynamic_condition()? My attempts below fail.
/* generate data with observation every June */
clear
set obs 100
generate date_ym = ym(2001, 1) + (_n - 1)
format date_ym %tm
generate date_m = month(dofm(date_ym))
generate x = runiform() if (date_m == 6) & !inlist(_n, 30, 42)
/* carryforward (ssc install carryforward), "manual" solution */
egen date_m2 = group(date_ym) if !missing(x)
carryforward date_m2, replace
bysort date_m2 (date_ym): generate date_m3 = cond(_n > 24, ., date_m2)
carryforward x if !missing(date_m3), gen(x_cf)
tsset date_ym
list, sep(12)
/* can I replicate this with dynamic_condition() option? */
/* no time series operators with # */
/* carryforward x, gen(x_cf2) dynamic_condition(sum(d.# == 0) < 24) */
/* x_cf2: d.x_cf2 invalid name */
/* second # doesn't work */
/* carryforward x, gen(x_cf3) dynamic_condition(sum(# == #[_n - 1]) < 24) */
/* x_cf3: equation [_n-1] not found */
Disclosure: I don't use carryforward (SSC), but that's because I tend to think back to the principles as I understand them, as documented here.
To do this, you need to keep a record not only of previous non-missing values but also of the dates when a variable was last not missing. This arose previously: see this answer
The essence of a simpler approach is here:
clear
set seed 2803
set obs 100
generate date_ym = ym(2001, 1) + (_n - 1)
format date_ym %tm
generate x = runiform() if inlist(_n, 30, 42)
gen last = date_ym if !missing(x)
replace last = last[_n-1] if missing(last)
replace x = x[_n-1] if missing(x) & (date_ym - last) < 24
The generalisation to panels is using by: and the generalisation to multiple variables uses a foreach loop. If the dates of missing values can be different for different variables, that mostly just shifts the loop.
Schematically, suppose we are cycling over an arbitary varlist and that the dates of missing values differ, but we use the rule of using the last value within 24 months.
gen last = .
quietly foreach v of varlist <varlist> {
replace last = cond(!missing(`v'), date_ym, .)
replace last = last[_n-1] if missing(last)
replace `v' = `v'[_n-1] if missing(`v') & (date_ym - last) < 24
}
EDIT: Thank to Joe's advice, I will make my question more specific. Actually I need to code a function in Stata which takes variables A,B,C,D,... as inputs and a variable Y as output which can be evaluated with usual Stata functions/commands like "generate dummy=2*myfun(X) if ..."
The function itself contains numerical calculations. A pseudo Stata code will look like
myfun(X)
gen Y=0.5*X if X==1
replace Y=31-X if X==2
replace Y=X-2 if X==3
.... a long list
return(Y)
Notice that X can be a huge set of different Stata variables and the numerical calculations are rather long inside the function. That's why I would like to use a function. I guess that the native "program" command in Stata is not suitable for this type of problem because it cannot take variables as input/output.
(ANSWER TO ORIGINAL QUESTION)
I have never used SAS, but at a wild guess you want something like
foreach v in A B C D {
gen test`v' = 0.5 * (`v' == 1) + 0.6 * (`v' == 2) + 0.7 * (`v' == 3)
}
or
foreach v in A B C D {
gen test`v' = cond(`v' == 1, 0.5, cond(`v' == 2, 0.6, cond(`v' == 3, 0.7, .)))
}
But hang on; that middle line also looks like
gen test`v' = (4 + `v') / 10
(ANSWER TO COMPLETELY DIFFERENT REVISED QUESTION)
This can be done in various ways. As above you could have a loop
foreach v in A B C D {
gen test`v' = 0.5 * `v' if `v' == 1
replace test`v' = 31 - `v' if `v' == 2
replace test`v' = `v' - 2 if `v' == 3
}
The question says "I guess that the native "program" command in Stata is not suitable for this type of problem because it cannot take variables as input/output." That guess is completely incorrect. You could write a program to do this too. This example is schematic, not definitive. A real program would include more checks and error messages to match any incorrect input. For detailed advice, you really need to read the documentation. One answer on SO can't teach you all you need to know even to write simple Stata programs. In any case, the example is evidently frivolous and/or incomplete, so a complete working example would be pointless or impossible.
program myweirdexample
version 13
syntax varlist(numeric), Generate(namelist)
local nold : word count `varlist'
local nnew : word count `generate'
if `nold' != `nnew' {
di as err "`generate' does not match `varlist'"
exit 198
}
local i = 1
quietly foreach v of local varlist {
local new : word `i' of `generate'
gen `new' = 0.5 * `v' if `v' == 1
replace `new' = 31 - `v' if `v' == 2
replace `new' = `v' - 2 if `v' == 3
local ++i
}
end
Footnote on terminology: The question uses the term function more broadly than it is used in Stata. In Stata, commands and functions are distinct; "function" is not a synonym for command.
Second footnote: Check out recode. It may be what you need, but it is best for mapping integer codes to other integer codes.
Third footnote: An example of a needed check is that the argument of generate() should be variable names that are legal and new.