Loop through a set of variables based on condition in another variable - stata

I have a list of variables a_23 a_24_1 a_24_2 a_24_3 a_24_4 a_24_5 a_24_6 a_24_7 a_24_8.
The values in variables a_24* are based on the response in a_23.
If a_23==1, then at least one variable in a_24* must be equal to 1.
I therefore want to check if any of the variables a_24* does not contain the value 1 if a_23==1
I tried the loop below,
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23==1 & `var' != 1
}
but it returns all the variables that do not contain 1 in the set of variables. However, I only need cases where all variables do not contain the value 1 if the determining variable is equal to 1.

A data example as well as code would be a good idea, so that you then base your question on an MCVE: see https://stackoverflow.com/help/mcve for explanation.
As I understand it an intermediate variable would help here:
egen maxa_24 = rowmax(a_24_*)
as the maximum will be 0 if and only if all values are 0 (assuming the a_24_* variables are coded 0/1).
Note that your loop
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23 == 1 & `var' != 1
}
is a loop over the single variable a_24_1; presumably you mean a_24_* in the foreach line.
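A minimal sketch of the intermediate-variable idea, using made-up data and assuming the a_24_* variables are coded 0/1. It uses egen's rowmax() function, because the row maximum equals 1 exactly when at least one of the variables is 1; the variable name maxa_24 and the data values are illustrative only.

```stata
* Made-up example data: the second observation violates the rule
clear
input a_23 a_24_1 a_24_2 a_24_3
1 0 1 0
1 0 0 0
0 0 0 0
1 1 1 1
end

* Row maximum across the a_24_* variables
egen maxa_24 = rowmax(a_24_*)

* Cases where a_23 is 1 but none of the a_24_* variables is 1
list a_23 a_24_* if a_23 == 1 & maxa_24 != 1
```

Because the condition is maxa_24 != 1 rather than maxa_24 == 0, rows in which every a_24_* value is missing are listed too.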

Related

Repeating code in an if qualifier in Stata

In Stata I am trying to repeat code inside an if qualifier using perhaps a forvalues loop. My code looks something like this:
gen y=0
replace y=1 if x_1==1 & x_2==1 & x_3==1 & x_4==1
Instead of writing the & x_i==1 statement every time for each variable, I want to do it using a loop, something like this:
gen y=0
replace y=1 if forvalues i=1/4{x_`i'==1 &}
LATER EDIT:
Would it be possible to create a local in the line of this with the elements added together:
forvalues i=1/4{
local text_`i' "x_`i'==1 &"
display "`text_`i''"
}
And then call it at the if qualifier ?
Although you use the term "if statement" all your code is phrased in terms of if qualifiers, which aren't commands or statements. (Your use of the term "statement" is looser than customary, but that doesn't affect an answer directly.)
You can't insert loops in if qualifiers.
For the differences, see
help if
help ifcmd
The entire example
gen y = 0
replace y = 1 if x==1 | x==2 | x==3 | x==4
would be better as
gen y = inlist(x, 1, 2, 3, 4)
or (dependent possibly on whatever values are allowed)
gen y = inrange(x, 1, 4)
A loop solution could be
gen y = 0
quietly forval i = 1/4 {
replace y = 1 if x == `i'
}
We can't discuss whether inlist() or inrange() would or would not be a solution for your real problem if you don't show it to us.
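On the idea in the question's later edit: a loop cannot sit inside an if qualifier, but a loop can build the condition in a local macro beforehand, and the macro can then be used in the qualifier. A minimal sketch with made-up data; the macro name cond and the example values are assumptions for illustration.

```stata
* Made-up data: four indicators, one of them switched off in observation 2
clear
set obs 3
forvalues i = 1/4 {
    generate x_`i' = 1
}
replace x_3 = 0 in 2

* Start from 1 (true) so that every term can be appended with &
local cond "1"
forvalues i = 1/4 {
    local cond "`cond' & x_`i' == 1"
}
display "`cond'"

* The macro is expanded before the if qualifier is evaluated
generate y = 0
replace y = 1 if `cond'
```

The macro is plain text substitution, so by the time replace runs, the qualifier reads 1 & x_1 == 1 & x_2 == 1 & x_3 == 1 & x_4 == 1.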
I usually don't like - in Nick's terms - to write code to write code. I see an immediate, though not elegant nor 'heterodox', solution to your issue. The whole thing amounts to generating an indicator that combines all your indicators, and using it in your if qualifier.
Implicit assumptions, which make this a bad, non-generalizable solution, are: 1) all variables are dummies, and you need them to be == 1, and 2) variable names are conveniently ordered 1 to N (although, if that is not the case, you can easily change the forv into a 'foreach var of varlist etc.')
g touse = 1
forv i =1/30{
replace touse = touse * x_`i'
}
<your action> if touse == 1

How can I sort variables based on part of a string variable?

I have a dataset with string variables and I am trying to generate a new binary variable based on the first two characters. All strings are 5 characters long, but I'm only concerned with the first two in order to sort.
For example, I could have 22001 and 22005. Since both are of the form 22XXX, I want to assign value 1 for both in the variable type_A. And if I have 25001 and 25005, since both are not of the form 22XXX, I want to assign value 0 for both in the variable type_A.
This should do the job:
clear
set obs 4
generate str5 var1 = "22001" in 1
replace var1 = "22005" in 2
replace var1 = "25001" in 3
replace var1 = "25005" in 4
gen type_A = substr(var1, 1, 2) == "22"
Please note that, as you explain your problem, it looks like you are storing 22005 as text, which may not necessarily be the best idea.
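If the codes were stored as numbers instead, the same indicator could be built arithmetically. A sketch under the assumption that every code is a five-digit integer; the variable name code is hypothetical.

```stata
* Made-up numeric versions of the example codes
clear
input long code
22001
22005
25001
25005
end

* Dropping the last three digits leaves the two-digit prefix
generate type_A = floor(code/1000) == 22
list
```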

Use of local macro

I want to write six temp data files from my original data keeping the following variables:
temp1: v1-v18
temp2: v1-v5 v19-v31
temp3: v1-v5 v32-v44
temp4: v1-v5 v45-v57
temp5: v1-v5 v58-v70
temp6: v1-v5 v71-v84
I have tried the following:
forvalues i =1(1)6 {
preserve
local j = 6 + (`i'-1)*13
local k = `j'+12
keep v1-v18 if `j'==6
keep v1-v5 v`i'-v`k' if `i'>6 & `j'<71
keep v1-v5 v71-v84 if `j'==71
export delimited using temp`i'.csv, delimiter(";") novarnames replace
restore
}
I get an invalid syntax error. The problem lies with the keep statements. Specifically the if condition with a local macro seems to be against syntax rules.
I think part of your confusion is due to misunderstanding the if qualifier vs the if command.
The if command evaluates an expression: if that expression is true, it executes what follows. The if command should be used to evaluate a single expression, in this case, the value of a macro.
You might use an if qualifier, for example, when you want to regress y x if x > 2 or replace x = . if x <= 2 etc. See here for a short description.
Your syntax has other issues too. You cannot have code following on the same line as the open brace in your forvalues loop, or again on the same line as your closing brace. You also use the local i to condition your keep. I think you mean to use j here, as i simply serves to iterate the loop, not identify a variable suffix.
Further, the logic here seems to work, but doesn't seem very general or efficient. I imagine there is a better way to do this but I don't have time to play around with it at the moment - perhaps an update later.
In any case, I think the correct syntax most analogous to what you have tried is something like the following.
clear *
set more off
set obs 5
forvalues i = 1/84 {
gen v`i' = runiform()
}
forvalues i =1/6 {
preserve
local j = 6 + (`i'-1)*13
local k = `j'+12
if `j' == 6 {
keep v1-v18
}
else if `j' > 6 & `j' < 71 {
keep v1-v5 v`j'-v`k'
}
else keep v1-v5 v71-v84
ds
di
restore
}
I use ds here to simply list the variables in the data, followed by di to display a blank line as a separator, but you could simply plug back in your export and it should work just fine.
Another thing to consider, if you truly want temporary data files, is tempfile, which gives you file names in Stata's temporary directory so that you don't have to clean up after yourself. You might use
forvalues i = 1/6 {
tempfile temp`i'
// other commands
save `temp`i''
}
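This will save six Stata data files under the temporary names held in the local macros temp1 - temp6; they are written to Stata's temporary directory and erased automatically when the do-file or program terminates.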
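Putting the pieces together, here is one possible sketch that combines the keep logic with tempfile, run on made-up data. Computing the upper bound with cond() is my own shortcut for handling the longer last block (v71-v84), not something taken from the question.

```stata
* Made-up data with v1-v84
clear
set obs 5
forvalues i = 1/84 {
    generate v`i' = runiform()
}

forvalues i = 1/6 {
    preserve
    if `i' == 1 {
        keep v1-v18
    }
    else {
        * Blocks start at v19, v32, ..., v71; the last one runs to v84
        local j = 6 + (`i' - 1)*13
        local k = cond(`i' == 6, 84, `j' + 12)
        keep v1-v5 v`j'-v`k'
    }
    tempfile temp`i'
    save "`temp`i''"
    restore
}

* Reload one of the temporary files to check its contents
use "`temp2'", clear
describe, short
```

The local macros temp1 - temp6 are only visible in the do-file or program that defined them, so the files have to be used before it ends.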

Error surrounding use of scan(&varlist) + Comparison of macro variables

As a follow up to this question, for which my existing answer appears to be best:
Extracting sub-data from a SAS dataset & applying to a different dataset
Given a dataset dsn_in, I currently have a set of macro variables max_1 - max_N that contain numeric data. I also have a macro variable varlist containing a list of variables. The two sets of macros are related such that max_1 is associated with scan(&varlist, 1), etc. I am trying to compare the data values within dsn_in for each variable in varlist to the associated comparison values max_1 - max_N. I would like to output the updated data to dsn_out. Here is what I have so far:
data dsn_out;
set dsn_in;
/* scan list of variables and compare to decision criteria.
if > decision criteria, cap variable */
do i = 1 by 1 while(scan(&varlist, i) ~= '');
if scan("&varlist.", i) > input(symget('max_' || left(put(i, 2.))), best12.) then
scan("&varlist.", i) = input(symget('max_' || left(put(i, 2.))), best12.);
end;
run;
However, I'm getting the following error, which I don't understand (log shown with options mprint;). SAS appears to be interpreting scan as both an array and a variable, when it's a SAS function.
ERROR: Undeclared array referenced: scan.
MPRINT(OUTLIERS_MAX): if scan("var1 var2 var3 ... varN", i) > input(symget('max_'
|| left(put(i, 2.))), best12.) then scan("var1 var2 var3 ... varN", i) =
input(symget('max_' || left(put(i, 2.))), best12.);
ERROR: Variable scan has not been declared as an array.
MPRINT(OUTLIERS_MAX): end;
MPRINT(OUTLIERS_MAX): run;
Any help you can provide would be greatly appreciated.
The specific issue you have here is that you place SCAN on the left side of an equal sign. That is not allowed; SUBSTR is allowed to be used in this fashion, but not SCAN.

Capitalizing value labels in Stata

Some datasets come with full-lowercase value labels, and I end up with graphs and tables showing results for "egypt", "jordan" and "saudi arabia" instead of the capitalized country names.
I guess that the proper() string function can do something for me, but I am not finding the right way to write the code for Stata 11 that will capitalize all value labels for a given variable.
I basically need to run the proper() function on all value labels on the variable, and then assign them to the variable. Is that possible using a foreach loop and macros in Stata?
Yes. First let's create some sample data with labels for testing:
clear
drawnorm x, n(10)
gen byte v = int(4+x)
drop x
label define types 0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 6 "six"
label list types
label values v types
Here's code, built on local macros, to capitalize the value labels attached to the variable "v":
local varname v
local sLabelName: value label `varname'
di "`sLabelName'"
levelsof `varname', local(xValues)
foreach x of local xValues {
local sLabel: label (`varname') `x', strict
local sLabelNew =proper("`sLabel'")
noi di "`x': `sLabel' ==> `sLabelNew'"
label define `sLabelName' `x' "`sLabelNew'", modify
}
After running it, check the results:
label list types
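The loop above only visits values that actually occur in the variable. As a possible variant, the sketch below walks over the labelled range itself, using the r(min) and r(max) results that label list leaves behind. The label name demo and its contents are made up, and the sketch assumes the labelled values are integers and does not handle labels on extended missing values.

```stata
* Made-up value label for illustration
clear
label define demo 1 "egypt" 2 "jordan" 3 "saudi arabia"

* label list stores the smallest and largest labelled values in r()
quietly label list demo
local lo = r(min)
local hi = r(max)

forvalues x = `lo'/`hi' {
    * With strict, the result is empty when the value has no label
    local old : label demo `x', strict
    if `"`old'"' != "" {
        local new = proper(`"`old'"')
        label define demo `x' `"`new'"', modify
    }
}
label list demo
```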