Var1 is given. Var2 should take value 1 if the Observation or one of the previous 5 observations is a missing value or 0. What is the Syntax for Var2?
I know how to do it with a lot of if Statements. But when I need to do it for the previous 50 observations that gets too inconvenient.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Var1 Var2)
5 0
. 1
2 1
5 1
7 1
9 1
5 1
9 0
0 1
2 1
7 1
5 1
3 1
2 1
5 0
end
The question is similar to your previous --Finding the second smallest value -- which you should quote. So is this answer. rangestat is from SSC.
clear
input float(Var1 Var2)
5 0
. 1
2 1
5 1
7 1
9 1
5 1
9 0
0 1
2 1
7 1
5 1
3 1
2 1
5 0
end
gen long id = _n
gen Bad = inlist(Var1, 0, .)
rangestat (sum) Bad, int(id -5 0)
list, sepby(Bad_sum)
+----------------------------------+
| Var1 Var2 id Bad Bad_sum |
|----------------------------------|
1. | 5 0 1 0 0 |
|----------------------------------|
2. | . 1 2 1 1 |
3. | 2 1 3 0 1 |
4. | 5 1 4 0 1 |
5. | 7 1 5 0 1 |
6. | 9 1 6 0 1 |
7. | 5 1 7 0 1 |
|----------------------------------|
8. | 9 0 8 0 0 |
|----------------------------------|
9. | 0 1 9 1 1 |
10. | 2 1 10 0 1 |
11. | 7 1 11 0 1 |
12. | 5 1 12 0 1 |
13. | 3 1 13 0 1 |
14. | 2 1 14 0 1 |
|----------------------------------|
15. | 5 0 15 0 0 |
+----------------------------------+
Related
I have a panel dataset in Stata with several countries and each country containing groups. I would like to rank the groups by country, according to the variable var1.
The structure of my dataset is as follows (the rank column is what I would like to achieve). Note that var1 is indeed constant within groups (it is just the within group average of another variable).
--country--|--groupId--|---time----|---var1----|---rank---
1 | 1 | 1 | 50 | 3
1 | 1 | 2 | 50 | 3
1 | 1 | 3 | 50 | 3
1 | 2 | 1 | 90 | 1
1 | 2 | 2 | 90 | 1
1 | 2 | 3 | 90 | 1
1 | 3 | 1 | 60 | 2
1 | 3 | 2 | 60 | 2
1 | 3 | 3 | 60 | 2
2 | 4 | 1 | 15 | 2
2 | 4 | 2 | 15 | 2
2 | 4 | 3 | 15 | 2
2 | 5 | 1 | 10 | 3
2 | 5 | 2 | 10 | 3
2 | 5 | 3 | 10 | 3
2 | 6 | 1 | 80 | 1
2 | 6 | 2 | 80 | 1
2 | 6 | 3 | 80 | 1
Among the options I have tried is:
sort country groupId
by country (groupId): egen rank = rank(var1)
However, I cannot achieve the desired result.
Thanks for the data example. There are two problems with your code. One is that as you want to rank from highest to lowest, you need to negate the argument to rank(). The second is that given the repetitions, you need to rank on one time only and then copy those ranks to other times.
This works with your data example, here edited to be input code. (See also the Stata tag wiki for that principle.)
clear
input country groupId time var1 rank
1 1 1 50 3
1 1 2 50 3
1 1 3 50 3
1 2 1 90 1
1 2 2 90 1
1 2 3 90 1
1 3 1 60 2
1 3 2 60 2
1 3 3 60 2
2 4 1 15 2
2 4 2 15 2
2 4 3 15 2
2 5 1 10 3
2 5 2 10 3
2 5 3 10 3
2 6 1 80 1
2 6 2 80 1
2 6 3 80 1
end
bysort country : egen wanted = rank(-var) if time == 1
bysort country groupId (time) : replace wanted = wanted[1]
assert rank == wanted
I have a dataset with only variable values:
clear
input value new_var
1 1
3 3
5 5
30 1
40 3
50 5
11 1
12 3
13 5
end
How can I generate a new variable new_var containing a repeating sequence of the first three observations in value?
Many ways to do it: here are two:
clear
input value new_var
1 1
3 3
5 5
30 1
40 3
50 5
11 1
12 3
13 5
end
egen index = seq(), to(3)
generate wanted = value[index]
generate direct = cond(mod(_n, 3) == 1, 1, cond(mod(_n, 3) == 2, 3, 5))
list, sep(3)
+-------------------------------------------+
| value new_var index wanted direct |
|-------------------------------------------|
1. | 1 1 1 1 1 |
2. | 3 3 2 3 3 |
3. | 5 5 3 5 5 |
|-------------------------------------------|
4. | 30 1 1 1 1 |
5. | 40 3 2 3 3 |
6. | 50 5 3 5 5 |
|-------------------------------------------|
7. | 11 1 1 1 1 |
8. | 12 3 2 3 3 |
9. | 13 5 3 5 5 |
+-------------------------------------------+
This is a follow-up to my previous question: Connect IDs based on values in rows.
I would now like to consider the case, where connections between identical idb's should be classified as 0.
The output is similar to the matrix in my previous post but with diagonal elements equal to 0:
62014 62015 62016 62017 62018
62014 0 1 0 1 1
62015 1 0 0 0 0
62016 0 0 0 0 1
62017 1 0 0 0 1
62018 1 0 1 1 0
How can I do this in Stata?
You can easily change the values in the diagonal of a matrix as follows:
: B
[symmetric]
1 2 3 4 5
+---------------------+
1 | 1 |
2 | 1 1 |
3 | 0 0 1 |
4 | 1 0 0 1 |
5 | 1 0 1 1 1 |
+---------------------+
: _diag(B, 0)
: B
[symmetric]
1 2 3 4 5
+---------------------+
1 | 0 |
2 | 1 0 |
3 | 0 0 0 |
4 | 1 0 0 0 |
5 | 1 0 1 1 0 |
+---------------------+
In the context of your question, you can simply do the following:
mata: B = foo1(A)
mata: _diag(B, 0)
getmata (idb*) = B
list
+------------------------------------------------------------------------+
| idb idd1 idd2 idd3 idb1 idb2 idb3 idb4 idb5 |
|------------------------------------------------------------------------|
1. | 62014 370490 879271 1112878 0 1 0 1 1 |
2. | 62015 457013 1112878 370490 1 0 0 0 0 |
3. | 62016 341863 1366174 533773 0 0 0 0 1 |
4. | 62017 879271 327069 341596 1 0 0 0 1 |
5. | 62018 1391443 1366174 879271 1 0 1 1 0 |
+------------------------------------------------------------------------+
I would like to create new observations as follows:
A B C
1 1 1
1 2 2
1 3 4
1 4 5
1 5 2
2 1 1
2 2 5
2 3 3
2 4 3
*3* 1 .
*3* 2 .
*3* 3 .
*3* 4 .
*3* 5 .
4 1 4
4 2 3
4 3 1
The new lines are indicated by asterisks.
How can I create new observations for variable A and B?
This is a simple expand:
clear
input A B C
1 1 1
1 2 2
1 3 4
1 4 5
1 5 2
2 1 1
2 2 5
2 3 3
2 4 3
4 1 4
4 2 3
4 3 1
end
generate id = _n
expand 6 if id == 10
replace id = 11 if _n == _N
replace A = 3 if id == 10
replace C = . if id == 10
bysort id: replace B = cond(_n == 1, 1, B[_n-1]+1) if id == 10
Which will produce the desired output:
list, sepby(A)
+----------------+
| A B C id |
|----------------|
1. | 1 1 1 1 |
2. | 1 2 2 2 |
3. | 1 3 4 3 |
4. | 1 4 5 4 |
5. | 1 5 2 5 |
|----------------|
6. | 2 1 1 6 |
7. | 2 2 5 7 |
8. | 2 3 3 8 |
9. | 2 4 3 9 |
|----------------|
10. | 3 1 . 10 |
11. | 3 2 . 10 |
12. | 3 3 . 10 |
13. | 3 4 . 10 |
14. | 3 5 . 10 |
|----------------|
15. | 4 1 4 11 |
16. | 4 2 3 11 |
17. | 4 3 1 12 |
+----------------+
The code could be shorter.
expand 2 if _n < 6
replace A = 3 if _n > _N - 5
*replace B = _n + 5 - _N if A == 3
replace C = . if A == 3
sort A B
Say I have a dataset with three variables a, b, c, and having 5 observations. Like the following:
a b c
1 0 1
1 1 1
0 1 0
0 1 1
1 0 0
Now I want to generate a new variable called type, which is a possible combination of variable a, b and c. Specifically,
type=1 if a=b=c=0
type=2 if a=c=0 & b=1
type=3 if a=b=0 & c=1
type=4 if a=0 & b=c=1
type=5 if a=1 & b=c=0
type=6 if a=b=1 & c=0
type=7 if a=c=1 & b=0
type=8 if a=b=c=1
The new dataset I want to get is:
a b c type
1 0 1 7
1 1 1 8
0 1 0 2
0 1 1 4
1 0 0 5
Are there any general ways to realize this in Stata? It's better if this can also be extended when type is large, say 100 types. Thx a lot.
If the specific values of type don't matter, egen's group function works.
E.g.:
clear
input a b c
1 0 1
1 1 1
0 1 0
0 1 1
1 0 0
0 1 0
1 1 1
end
sort a b c // not necessary, but clearer presentation
egen type = group(a b c)
li
with the result
+------------------+
| a b c type |
|------------------|
1. | 0 1 0 1 |
2. | 0 1 0 1 |
3. | 0 1 1 2 |
4. | 1 0 0 3 |
5. | 1 0 1 4 |
|------------------|
6. | 1 1 1 5 |
7. | 1 1 1 5 |
+------------------+