I have a list of variables with similarly structured names:
variable_1_2
variable_1_3
variable_1_4
variable_2_2
variable_2_3
etc.
I would like to define a variable based on the values of all of these variables. If any of them is 0, I would like my new variable to be 0 as well, like this:
gen newVar = .
replace newVar = 0 if variable_* == 0
However this returns "invalid name".
Can wildcard not be used inside an if statement? Is there a way around this?
The result of
generate wanted = .
foreach v of var variable_* {
replace wanted = 0 if `v' == 0
}
will be 0 if any argument is 0 and missing otherwise.
egen anyzero = anymatch(variable_*), values(0)
will produce an indicator that is 1 if any argument is 0 and 0 if no arguments are. You can always flip 0 and 1 as results if you need the complement.
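A minimal sketch on made-up data (the values below are hypothetical). It shows the `anymatch()` indicator and how to recover the coding asked for in the question: 0 if any variable is 0, missing otherwise.

```stata
* Hypothetical toy data: three variables matching variable_*
clear
input variable_1_2 variable_1_3 variable_2_2
1 2 3
0 5 6
4 . 0
end

* anyzero = 1 if any variable_* equals 0, else 0 (missing values never match)
egen anyzero = anymatch(variable_*), values(0)

* the coding from the question: 0 if any variable is 0, missing otherwise
generate newVar = 0 if anyzero
list
```

Because `anyzero` and `newVar` are created after the wildcard is expanded, neither is swept into `variable_*`.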
I have a list of variables a_23 a_24_1 a_24_2 a_24_3 a_24_4 a_24_5 a_24_6 a_24_7 a_24_8.
The values in variables a_24* are based on the response in a_23.
If a_23==1, then at least one variable in a_24* must be equal to 1.
I therefore want to find the observations where a_23==1 but none of the variables a_24* contains the value 1.
I tried the loop below,
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23==1 & `var' != 1
}
but it shows every observation in which any one of these variables does not contain 1. However, I only need cases where none of the variables contains the value 1 when the determining variable a_23 is equal to 1.
A data example as well as code would be a good idea, so that you then base your question on an MCVE: see https://stackoverflow.com/help/mcve for explanation.
As I understand it an intermediate variable would help here:
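egen maxa_24 = rowmax(a_24_*)
as, for 0/1 indicators, the maximum will be 0 if and only if all (nonmissing) values are 0, that is, if none of them is 1. The cases you want are then those with a_23 == 1 & maxa_24 == 0.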
Note that your loop
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23 == 1 & `var' != 1
}
is a loop over the single variable a_24_1; presumably you mean a_24_* in the foreach line.
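A small check on made-up data, assuming the `a_24_*` variables are 0/1 indicators. `rowmax()` is 0 exactly when every nonmissing `a_24_*` value is 0. `anymatch()` with `values(1)` tests for the value 1 directly and so does not need the 0/1 assumption. The names `maxa_24` and `any1` are illustrative.

```stata
* Hypothetical toy data
clear
input a_23 a_24_1 a_24_2 a_24_3
1 0 0 0
1 0 1 0
2 0 0 0
end

* maximum across a_24_*: 0 only if no a_24_* equals 1 (given 0/1 values)
egen maxa_24 = rowmax(a_24_*)

* alternative that tests for the value 1 directly
egen any1 = anymatch(a_24_*), values(1)

* problem cases: a_23 == 1 but no a_24_* equals 1
list a_23 a_24_* if a_23 == 1 & maxa_24 == 0
count if a_23 == 1 & any1 == 0
```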
I have a dataset with string variables and I am trying to generate a new binary variable based on the first two characters. All strings are 5 characters long, but I'm only concerned with the first two in order to sort.
For example, I could have 22001 and 22005. Since both are of the form 22XXX, I want to assign the value 1 to both in the variable type_A. If I have 25001 and 25005, since neither is of the form 22XXX, I want to assign the value 0 to both in the variable type_A.
This should do the job:
clear
set obs 4
generate str5 var1 = "22001" in 1
replace var1 = "22005" in 2
replace var1 = "25001" in 3
replace var1 = "25005" in 4
gen type_A = substr(var1, 1, 2) == "22"
Please note that, as you explain your problem, it looks like you are storing 22005 as text, which may not necessarily be the best idea.
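If the codes were stored as numbers instead, the same indicator can be computed arithmetically. This sketch assumes every code has exactly five digits with no leading zero; leading zeros would be lost in a numeric variable, which is one reason to keep such codes as strings. The variable name `code` is hypothetical.

```stata
* Hypothetical numeric version of the codes
clear
input long code
22001
22005
25001
25005
end

* for a 5-digit number, floor(code / 1000) gives its first two digits
generate type_A = floor(code / 1000) == 22
list
```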
To populate missing data with a fixed range of values
I would like to know how to populate the column aktype with a range of values for the cells with missing values. For the same pidlink, the range is always fixed at the 11 values listed below. I have about 17,000+ observations that are missing.
The range of values are as follows:
A
B
C
D
E
G
H
I
J
K
L
I have tried the following command, but it does not work:
foreach x of varlist aktype=1/11 {
replace aktype = "A" in 1 if aktype==""
replace aktype = "B" in 2 if aktype==""
replace aktype = "C" in 3 if aktype==""
replace aktype = "D" in 4 if aktype==""
replace aktype = "E" in 5 if aktype==""
replace aktype = "G" in 6 if aktype==""
replace aktype = "H" in 7 if aktype==""
replace aktype = "I" in 8 if aktype==""
replace aktype = "J" in 9 if aktype==""
replace aktype = "K" in 10 if aktype==""
replace aktype = "L" in 11 if aktype==""
}
Would appreciate it if you could advise on the right command to use. Many thanks!
I would generate a variable AK that cycles through the 11 letters A, B, C, D, E, G, H, I, J, K, L (the question's list, which skips F) in positions 1-11 (and 12-22, and 23-33, and so on). Then replace missing values with the value of this variable AK.
* generate data
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1/1
replace aktype = "bar" in 10/12
* generate variable cycling through A-E, G-L (11 letters)
generate AK = substr("ABCDEGHIJKL", mod(_n - 1, 11) + 1, 1)
* fill missing values
replace aktype = AK if missing(aktype)
list
This yields the following.
. list
+-------------+
| aktype AK |
|-------------|
1. | foo A |
2. | B B |
3. | C C |
4. | D D |
5. | E E |
|-------------|
This first addresses the comment "it does not work".
Generally, in this kind of forum you should always be specific and say exactly what happens, namely where the code breaks down and what the result is (e.g. what error message you get). If necessary, add why that is not what is wanted.
Specifically, in this case Stata would get no further than
foreach x of varlist aktype=1/11
which is illegal (as well as unclear to Stata programmers).
You can loop over a varlist. In this case looping over a single variable aktype is legal. (It is usually pointless, but that's style, not syntax.) So this is legal:
foreach x of varlist aktype
By the way, you define x as the loop argument, but never refer to it inside the loop. That isn't illegal, but it is unusual.
You can also loop over a numlist, e.g.
foreach x of numlist 1/11
although
forval x = 1/11
is a more direct way of doing that. All this follows from the syntax diagrams for the commands concerned, where whatever is not explicitly allowed is forbidden.
On occasions when you need to loop over a varlist and a numlist you will need to use different syntax, but what is best depends on the precise problem.
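One common pattern for walking a varlist together with a number is to keep a counter in a local macro. The variable names `x1`-`x3` and `y1`-`y3` below are hypothetical.

```stata
* Hypothetical toy data
clear
set obs 2
generate x1 = 1
generate x2 = 2
generate x3 = 3

* walk the varlist, incrementing a counter in parallel
local i = 0
foreach v of varlist x1-x3 {
    local ++i
    generate y`i' = `v' * `i'
}
list
```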
Now, second, to the question itself: I can't see any rule in the question for which values get assigned A through L, so I can't advise positively.
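Suppose, purely as an assumption not stated in the question, that each `pidlink` has exactly 11 rows that should take the values A to L (no F) in their current order. A sketch follows; `obsno` is a hypothetical helper that records the current order so that `bysort` keeps it within each `pidlink`.

```stata
* Hypothetical toy data: two people (pidlink), 11 rows each
clear
set obs 22
generate pidlink = ceil(_n / 11)
generate long obsno = _n
generate str1 aktype = ""
replace aktype = "C" in 3

* Assumption: within each pidlink, row 1 is A, row 2 is B, ..., row 11 is L
local letters "A B C D E G H I J K L"
bysort pidlink (obsno): replace aktype = word("`letters'", _n) if missing(aktype)
list
```

Existing nonmissing values are left alone because of the `if missing(aktype)` condition.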
I am trying to script a dynamic way to take only the first two elements in a list, and I am having some trouble. Below is a breakdown of what I have in my List.
Declaration:
Set List = CreateObject("Scripting.Dictionary")
List Contents:
List(0) = "0-0-0-0"
List(1) = "0-1-0-0"
List(2) = "0-2-0-0"
Code so far:
for count = 0 To UBound(List) -1 step 1
' not sure how to return
next
What I currently have does not work.
Desired Return List:
0-0-0-0
0-1-0-0
You need to use the Items method of the Dictionary, which returns the items as an array.
For example:
Dim a, i
a = List.Items
For i = 0 To List.Count - 1
MsgBox(a(i))
Next
or if you just want the first 2:
For i = 0 To 1
MsgBox(a(i))
Next
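UBound() is for arrays, not dictionaries. You need to use the Count property of the Dictionary object. Note that List(i) looks an item up by key, not by position: it works here because the keys are the integers 0, 1 and 2, and reading a key that does not exist silently adds it with an empty item.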
' Show all dictionary items...
For i = 0 To List.Count - 1
MsgBox List(i)
Next
' Show the first two dictionary items...
For i = 0 To 1
MsgBox List(i)
Next
Sorry that title is confusing. Hopefully it's clear below.
I'm using Stata and I'd like to assign the value 1 to a variable that depends on the value within a different variable. I have 20 order variables and also 20 corresponding variables. For example if order1 = 3, I'd like to assign variable3 = 1. Below is a snippet of what the final dataset would look like if I had only 3 of each variable.
Right now I'm doing this with two loops, but there is another loop around these that goes through this 9 more times, and I'm doing this for a couple hundred data files. I'd like to make it more efficient.
forvalues i = 1/20 {
forvalues j = 1/20 {
replace variable`j' = 1 if order`i'==`j'
}
}
Is it possible to use the value of order`i' to pick the variable to assign directly, i.e. variable[value of order`i']? Then I can get rid of the j loop above. Something like this:
forvalues i = 1/20 {
replace variable[`order`i'value] = 1
}
Thanks for your help!
CLARIFICATION ADDED Feb 2nd.
I oversimplified my problem and the dataset: the suggested solutions work for what I presented but don't get at what I'm really trying to do. Thank you all three for your solutions, though. I was not clear enough in my post.
To clarify, my data doesn't have a one to one correspondence of each order# assigning variable# a 1 if it's not missing. For example, the first observation for order1=3, variable1 isn't supposed to get a 1, variable3 should get a 1. What I didn't include in my original post is that I'm actually checking for other conditions to set it equal to 1.
For more background, I'm counting births by birth order (1st child, 2nd child, etc.) that occurred at different ages of the mother. Each row is a woman, and each order# is the birth order of a child (order1=3 means it is her third child). The corresponding variable#s are the counts (variable# means the woman has a child of birth order #). As I mentioned in the post, I do this 9 times because I'm doing it for 5-year age groups (15-19, 20-24, etc.). So the first set of variable# would be counts of births by order when women were aged 15-19, the second set counts of births by order when women were aged 20-24, and so on. After this, I sum up the counts in different ways (by woman's education, geography, etc.).
So with the additional loop what I do is something more like this
forvalues k = 1/9 {
forvalues i = 1/20 {
forvalues j = 1/20 {
replace variable`k'_`j' = 1 if order`i'==`j' & age`i'==`k' & birth_age`i'<36
}
}
}
Not sure if it's possible, but I wanted to simplify so I only need to cycle through each child once, without cycling through the birth orders and directly use the value in order# to assign a 1 to the correct variable. So if order1=3 and the woman had the child at the specific age group, assign variable[agegroup][3]=1; if order1=2, then variable[agegroup][2] should get a 1.
forvalues k = 1/9 {
forvalues i = 1/20 {
replace variable`k'_[`order`i'value] = 1 if age`i'==`k' & birth_age`i'<36
}
}
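Within a single `replace`, Stata cannot choose a variable name from a value that differs across observations. Reshaping to one row per child, however, turns the values of `order#` and `age#` into data. The sketch below uses made-up data and assumes the variables are named `order#`, `age#` (the age-group code 1-9) and `birth_age#`, as in the loop above. The names `child`, `cell` and `counts` are hypothetical.

```stata
* Hypothetical toy data: 2 women, up to 3 children each
clear
input id order1 order2 order3 age1 age2 age3 birth_age1 birth_age2 birth_age3
1 1 2 3 1 2 2 18 22 24
2 1 2 . 2 3 . 21 27 .
end

preserve
* one row per woman and child
reshape long order age birth_age, i(id) j(child)
* keep only the births that meet the conditions
keep if !missing(order) & birth_age < 36
* suffix "k_j": age group k, birth order j
generate str8 cell = string(age) + "_" + string(order)
generate byte variable = 1
keep id cell variable
* back to one row per woman: variable1_1, variable2_2, ...
reshape wide variable, i(id) j(cell) string
tempfile counts
save `counts'
restore

merge 1:1 id using `counts', nogenerate
* cells with no qualifying birth become 0
foreach v of varlist variable* {
    replace `v' = 0 if missing(`v')
}
list id variable*
```

The wide result contains only the age-group/order cells that occur somewhere in the data. Any other combinations would need to be added separately if a full 9 × 20 grid is required.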
I would reshape twice: first reshape to long, then define variable as !missing(order), then reshape back to wide.
* generate your data
clear
set obs 3
forvalues i = 1/3 {
generate order`i' = .
local k = (3 - `i' + 1)
forvalues j = 1/`k' {
replace order`i' = (`k' - `j' + 1) if (_n == `j')
}
}
list
*. list
*
* +--------------------------+
* | order1 order2 order3 |
* |--------------------------|
* 1. | 3 2 1 |
* 2. | 2 1 . |
* 3. | 1 . . |
* +--------------------------+
* I would reshape to long, then back to wide
generate id = _n
reshape long order, i(id)
generate variable = !missing(order)
reshape wide order variable, i(id) j(_j)
order order* variable*
drop id
list
*. list
*
* +-----------------------------------------------------------+
* | order1 order2 order3 variab~1 variab~2 variab~3 |
* |-----------------------------------------------------------|
* 1. | 3 2 1 1 1 1 |
* 2. | 2 1 . 1 1 0 |
* 3. | 1 . . 1 0 0 |
* +-----------------------------------------------------------+
Using a simple forvalues loop with generate and missing() is orders of magnitude faster than the other solutions proposed so far. For this problem you need only one loop to traverse the complete list of variables, not two as in the original post. The code below shows both points.
*----------------- generate some data ----------------------
clear all
set more off
local numobs 60
set obs `numobs'
quietly {
forvalues i = 1/`numobs' {
generate order`i' = .
local k = (`numobs' - `i' + 1)
forvalues j = 1/`k' {
replace order`i' = (`k' - `j' + 1) if (_n == `j')
}
}
}
timer clear
*------------- method 1 (gen + missing()) ------------------
timer on 1
quietly {
forvalues i = 1/`numobs' {
generate variable`i' = !missing(order`i')
}
}
timer off 1
* ----------- method 2 (reshape + missing()) ---------------
drop variable*
timer on 2
quietly {
generate id = _n
reshape long order, i(id)
generate variable = !missing(order)
reshape wide order variable, i(id) j(_j)
}
timer off 2
*--------------- method 3 (egen, rowmax()) -----------------
drop variable*
timer on 3
quietly {
// loop over the order variables creating dummies
forvalues v=1/`numobs' {
tab order`v', gen(var`v'_)
}
// loop over the domain of the order variables
// (may need to change)
forvalues l=1/`numobs' {
egen variable`l' = rowmax(var*_`l')
drop var*_`l'
}
}
timer off 3
*----------------- method 4 (original post) ----------------
drop variable*
timer on 4
quietly {
forvalues i = 1/`numobs' {
gen variable`i' = 0
forvalues j = 1/`numobs' {
replace variable`i' = 1 if order`i'==`j'
}
}
}
timer off 4
*-----------------------------------------------------------
timer list
The timed procedures give
. timer list
1: 0.00 / 1 = 0.0010
2: 0.30 / 1 = 0.3000
3: 0.34 / 1 = 0.3390
4: 0.07 / 1 = 0.0700
where timer 1 is the simple gen, timer 2 the reshape, timer 3 the egen, rowmax(), and timer 4 the original post.
The reason you need only one loop is that Stata executes each command for all observations in the dataset, from the first observation to the last. For example, variable1 is generated according to whether order1 is missing or not; this is done for every observation without an explicit loop.
I wonder if you actually need to do this. For future questions, if you have a further goal in mind, I think a good strategy is to mention it in your post.
Note: I've reused code from other posters' answers.
Here's a simpler way to do it (that still requires 2 loops):
// loop over the order variables creating dummies
forvalues v=1/20 {
tab order`v', gen(var`v'_)
}
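// loop over the domain of the order variables (may need to change)
// caution: tab, gen() numbers the dummies by the sorted distinct values,
// so var`v'_`l' matches the value `l' only if the values run 1, 2, 3, ...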
forvalues l=1/3 {
egen variable`l' = rowmax(var*_`l')
drop var*_`l'
}
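For the original (pre-clarification) version of the question, a single loop over the possible values may be enough if `egen`'s `anymatch()` does the work across the `order*` variables. This sketch uses made-up data like the first answer's. The upper limit 3 is an assumption to adjust to the real domain of the order variables.

```stata
* Hypothetical toy data, shaped like the reshape answer's example
clear
input order1 order2 order3
3 2 1
2 1 .
1 . .
end

* variable`l' = 1 if any order* equals `l', else 0
forvalues l = 1/3 {
    egen byte variable`l' = anymatch(order*), values(`l')
}
list
```

Unlike `tab, gen()`, this ties each indicator to the value `l` itself rather than to the rank of that value among the distinct values observed.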