How to create dummies based on multiple variables - stata

The following command can generate dummy variables:
tabulate age, generate(I)
However, what should I do when I want dummies based on multiple variables?
For example, I would like to do the following more concisely:
generate I1=1 if age==1 & year==2000
generate I2=1 if age==1 & year==2001
generate I3=1 if age==2 & year==2000
generate I4=1 if age==2 & year==2001
I have already tried this:
tabulate age year, generate(I)
However, it did not work.

You can get what you want as follows:
sysuse auto, clear
keep if !missing(rep78)
egen rf = group(rep78 foreign)
tabulate rf, generate(I)
group(rep78 |
   foreign) |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         27       39.13       53.62
          4 |          3        4.35       57.97
          5 |          9       13.04       71.01
          6 |          9       13.04       84.06
          7 |          2        2.90       86.96
          8 |          9       13.04      100.00
------------+-----------------------------------
      Total |         69      100.00
list I* in 1/10
     +---------------------------------------+
     | I1   I2   I3   I4   I5   I6   I7   I8 |
     |---------------------------------------|
  1. |  0    0    1    0    0    0    0    0 |
  2. |  0    0    1    0    0    0    0    0 |
  3. |  0    0    1    0    0    0    0    0 |
  4. |  0    0    0    0    1    0    0    0 |
  5. |  0    0    1    0    0    0    0    0 |
  6. |  0    0    1    0    0    0    0    0 |
  7. |  0    0    1    0    0    0    0    0 |
  8. |  0    0    1    0    0    0    0    0 |
  9. |  0    0    1    0    0    0    0    0 |
 10. |  0    1    0    0    0    0    0    0 |
     +---------------------------------------+
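If it helps to see which combination each dummy stands for, the label option of egen's group() function attaches value labels to the groups. The following is a minimal sketch on the same auto data; the variable name nset and the two checks at the end are my own additions for illustration, not part of the answer above.

```stata
sysuse auto, clear
keep if !missing(rep78)

* label gives each group a value label built from rep78 and foreign
egen rf = group(rep78 foreign), label
tabulate rf, generate(I)

* every observation should belong to exactly one combination
egen nset = rowtotal(I*)
assert nset == 1

* group 3 is rep78 == 3 combined with foreign == 0 (Domestic)
assert I3 == (rep78 == 3 & foreign == 0)
```

Note that generate I1=1 if ... leaves all other observations missing rather than 0, whereas the dummies created by tabulate are coded 0/1.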

Connect IDs based on values in rows, ignoring connection between identical IDs

This is a follow-up to my previous question: Connect IDs based on values in rows.
I would now like to consider the case where connections between identical idb values should be classified as 0.
The output is similar to the matrix in my previous post but with diagonal elements equal to 0:
       62014  62015  62016  62017  62018
62014      0      1      0      1      1
62015      1      0      0      0      0
62016      0      0      0      0      1
62017      1      0      0      0      1
62018      1      0      1      1      0
How can I do this in Stata?
You can easily change the values in the diagonal of a matrix as follows:
: B
[symmetric]
       1   2   3   4   5
    +---------------------+
  1 |  1                  |
  2 |  1   1              |
  3 |  0   0   1          |
  4 |  1   0   0   1      |
  5 |  1   0   1   1   1  |
    +---------------------+
: _diag(B, 0)
: B
[symmetric]
       1   2   3   4   5
    +---------------------+
  1 |  0                  |
  2 |  1   0              |
  3 |  0   0   0          |
  4 |  1   0   0   0      |
  5 |  1   0   1   1   0  |
    +---------------------+
In the context of your question, you can simply do the following:
mata: B = foo1(A)
mata: _diag(B, 0)
getmata (idb*) = B
list
     +------------------------------------------------------------------------+
     |   idb      idd1      idd2      idd3   idb1   idb2   idb3   idb4   idb5 |
     |------------------------------------------------------------------------|
  1. | 62014    370490    879271   1112878      0      1      0      1      1 |
  2. | 62015    457013   1112878    370490      1      0      0      0      0 |
  3. | 62016    341863   1366174    533773      0      0      0      0      1 |
  4. | 62017    879271    327069    341596      1      0      0      0      1 |
  5. | 62018   1391443   1366174    879271      1      0      1      1      0 |
     +------------------------------------------------------------------------+
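Because foo1() and A come from the previous question and are not defined here, the following is a self-contained sketch that types in the example matrix by hand (an assumption standing in for the result of foo1(A)) and checks what _diag() does to it.

```stata
mata:
// example matrix entered by hand; it stands in for B = foo1(A)
B = (1,1,0,1,1 \ 1,1,0,0,0 \ 0,0,1,0,1 \ 1,0,0,1,1 \ 1,0,1,1,1)

// overwrite the diagonal in place with zeros
_diag(B, 0)

// the diagonal is now all zero and the matrix is still symmetric
assert(trace(B) == 0)
assert(issymmetric(B))
B
end
```

_diag() modifies B in place and returns nothing, which is why it is called on its own line rather than assigned to a new matrix.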

Create an indicator flag when condition has been met

I would like to find a way to create an indicator flag across rows such that once a criterion has been met, the flag persists across all cases within a group.
In the sample data below, I have a variable _p that defines statistical significance of the comparison of values in _mar across levels of _m. I also have a grouping variable _g that indicates the comparisons are made within a group.
The variables _f_s and _f_n represent the end result that I would like to have.
clear
input _mar _m _p _g _f_s _f_n
2.99 0 0.00000 0 1 0
3.03 1 0.00000 0 1 0
3.05 2 0.00000 0 1 1
3.06 3 0.22179 0 0 1
3.07 4 0.18044 0 0 1
3.07 5 0.58009 0 0 1
3.06 6 0.40620 0 0 1
3.06 7 0.47257 0 0 1
3.06 8 0.91196 0 0 1
3.05 9 0.68560 0 0 1
2.65 0 0.00000 1 1 0
2.70 1 0.00000 1 1 0
2.73 2 0.00103 1 1 0
2.75 3 0.00944 1 1 1
2.75 4 0.64713 1 0 1
2.76 5 0.55476 1 0 1
2.77 6 0.32807 1 0 1
2.78 7 0.03271 1 0 1
2.78 8 0.00219 1 0 1
2.79 9 0.57361 1 0 1
end
I would like to use the flag to indicate in a graph where statistical significance "stops" and to ignore the other comparison values.
Below you can also find the code that I have attempted up to this point:
Snippet 1 - graph works, lines are structured as desired
snapshot save, label("import")
snapshot list
twoway ///
(line _mar _m if _g == 0 & _f_s==1, lcolor(orange) lpattern(solid)) ///
(line _mar _m if _g == 0 & _f_n==1, lcolor(orange) lpattern(dash )) ///
(scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
///
(line _mar _m if _g == 1 & _f_s==1, lcolor(blue*2) lpattern(solid)) ///
(line _mar _m if _g == 1 & _f_n==1, lcolor(blue*2) lpattern(dash )) ///
(scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
, legend(off) ///
xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///
xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
name(good)
Snippet 2 - graph does not work, lines are not structured as desired
snapshot restore 1
sort _g _m
gen x_f_s = (_p <= .05)
replace x_f_s = 0 if x_f_s ==1 & x_f_s[_n-1]==0 & x_f_s[_n+1]==0
replace x_f_s = 1 if _m == 0
gen x_f_n = x_f_s == 0
replace x_f_n = 1 if x_f_s ==1 & x_f_s[_n+1]==0
/***** the created flags are not correct *****/
list, sepby(_g)
twoway ///
(line _mar _m if _g == 0 & x_f_s==1, lcolor(orange) lpattern(solid)) ///
(line _mar _m if _g == 0 & x_f_n==1, lcolor(orange) lpattern(dash )) ///
(scatter _mar _m if _g == 0, mcolor(orange) msymbol(o) mlabel(_mar) mlabcolor(orange) mlabsize(vsmall) mlabposition(11)) ///
///
(line _mar _m if _g == 1 & x_f_s==1, lcolor(blue*2) lpattern(solid)) ///
(line _mar _m if _g == 1 & x_f_n==1, lcolor(blue*2) lpattern(dash )) ///
(scatter _mar _m if _g == 1, mcolor(blue*2) msymbol(o) mlabel(_mar) mlabcolor(blue*2) mlabsize(vsmall) mlabposition(11)) ///
, legend(off) ///
xlabel(-1(1)9 -1 " " 0 "0 " 9 "9+" ) ///
ylabel(2.5(0.10)3.5, angle(horizontal) format(%5.2f) ) ymlabel(2.5(0.10)3.5, grid nolabel) ///
xtitle( "Levels" ) ytitle("Adjusted First Year GPA", height(8) ) ///
name(not_good)
The variables that I have tried to calculate are noted with x_f_s and x_f_n.
The flags work when there are no subsequent statistical comparisons that happen to be significant. However, when there is a significant comparison after the initial "stop", the plotting does not work.
There should also be a second flag that indicates where "non-significance" starts. This would carry forward in a similar way to the first flag.
I am using solid and dashed lines to indicate where significance exists, and then stops.
Ultimately, I would like to create flags within groups for plotting purposes.
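This is how I would do it. First, flag every significant comparison. As the listing shows, these naive flags also pick up the late significant values in rows 18 and 19: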
bysort _g (_m): generate x_f_s = (_p <= .05)
bysort _g (_m): generate x_f_n = x_f_s == 0
list, sepby(_g)
+-------------------------------------------------------+
| _mar _m _p _g _f_s _f_n x_f_s x_f_n |
|-------------------------------------------------------|
1. | 2.99 0 0 0 1 0 1 0 |
2. | 3.03 1 0 0 1 0 1 0 |
3. | 3.05 2 0 0 1 1 1 0 |
4. | 3.06 3 .22179 0 0 1 0 1 |
5. | 3.07 4 .18044 0 0 1 0 1 |
6. | 3.07 5 .58009 0 0 1 0 1 |
7. | 3.06 6 .4062 0 0 1 0 1 |
8. | 3.06 7 .47257 0 0 1 0 1 |
9. | 3.06 8 .91196 0 0 1 0 1 |
10. | 3.05 9 .6856 0 0 1 0 1 |
|-------------------------------------------------------|
11. | 2.65 0 0 1 1 0 1 0 |
12. | 2.7 1 0 1 1 0 1 0 |
13. | 2.73 2 .00103 1 1 0 1 0 |
14. | 2.75 3 .00944 1 1 1 1 0 |
15. | 2.75 4 .64713 1 0 1 0 1 |
16. | 2.76 5 .55476 1 0 1 0 1 |
17. | 2.77 6 .32807 1 0 1 0 1 |
18. | 2.78 7 .03271 1 0 1 1 0 |
19. | 2.78 8 .00219 1 0 1 1 0 |
20. | 2.79 9 .57361 1 0 1 0 1 |
+-------------------------------------------------------+
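This is how you can automate the application of the first rule, starting again from the naive flag:
capture drop x_f_s
capture drop x_f_n
bysort _g (_m): generate x_f_s = (_p <= .05)
clonevar tag = x_f_s
local i 1
while `i' == 1 {
    * the assert fails as long as a flagged row directly follows an unflagged one
    capture noisily {
        bysort _g (_m): assert x_f_s == 0 if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
    }
    if _rc {
        * switch those rows off, refresh the copy, and check again
        bysort _g (_m): replace x_f_s = 0 if _p <= .05 & (tag == 1 & tag[_n-1] == 0)
        drop tag
        clonevar tag = x_f_s
    }
    else local i 0
}
drop tag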
This produces the desired output for x_f_s:
list
+-----------------------------------------------+
| _mar _m _p _g _f_s _f_n x_f_s |
|-----------------------------------------------|
1. | 2.99 0 0 0 1 0 1 |
2. | 3.03 1 0 0 1 0 1 |
3. | 3.05 2 0 0 1 1 1 |
4. | 3.06 3 .22179 0 0 1 0 |
5. | 3.07 4 .18044 0 0 1 0 |
|-----------------------------------------------|
6. | 3.07 5 .58009 0 0 1 0 |
7. | 3.06 6 .4062 0 0 1 0 |
8. | 3.06 7 .47257 0 0 1 0 |
9. | 3.06 8 .91196 0 0 1 0 |
10. | 3.05 9 .6856 0 0 1 0 |
|-----------------------------------------------|
11. | 2.65 0 0 1 1 0 1 |
12. | 2.7 1 0 1 1 0 1 |
13. | 2.73 2 .00103 1 1 0 1 |
14. | 2.75 3 .00944 1 1 1 1 |
15. | 2.75 4 .64713 1 0 1 0 |
|-----------------------------------------------|
16. | 2.76 5 .55476 1 0 1 0 |
17. | 2.77 6 .32807 1 0 1 0 |
18. | 2.78 7 .03271 1 0 1 0 |
19. | 2.78 8 .00219 1 0 1 0 |
20. | 2.79 9 .57361 1 0 1 0 |
+-----------------------------------------------+
The second rule is more straightforward as you only need to replace just before the cut-off point:
bysort _g (_m): generate x_f_n = x_f_s == 0
bysort _g (_m): replace x_f_n = 1 if x_f_s == 1 & x_f_s[_n+1]== 0
list
+-------------------------------------------------------+
| _mar _m _p _g _f_s _f_n x_f_s x_f_n |
|-------------------------------------------------------|
1. | 2.99 0 0 0 1 0 1 0 |
2. | 3.03 1 0 0 1 0 1 0 |
3. | 3.05 2 0 0 1 1 1 1 |
4. | 3.06 3 .22179 0 0 1 0 1 |
5. | 3.07 4 .18044 0 0 1 0 1 |
|-------------------------------------------------------|
6. | 3.07 5 .58009 0 0 1 0 1 |
7. | 3.06 6 .4062 0 0 1 0 1 |
8. | 3.06 7 .47257 0 0 1 0 1 |
9. | 3.06 8 .91196 0 0 1 0 1 |
10. | 3.05 9 .6856 0 0 1 0 1 |
|-------------------------------------------------------|
11. | 2.65 0 0 1 1 0 1 0 |
12. | 2.7 1 0 1 1 0 1 0 |
13. | 2.73 2 .00103 1 1 0 1 0 |
14. | 2.75 3 .00944 1 1 1 1 1 |
15. | 2.75 4 .64713 1 0 1 0 1 |
|-------------------------------------------------------|
16. | 2.76 5 .55476 1 0 1 0 1 |
17. | 2.77 6 .32807 1 0 1 0 1 |
18. | 2.78 7 .03271 1 0 1 0 1 |
19. | 2.78 8 .00219 1 0 1 0 1 |
20. | 2.79 9 .57361 1 0 1 0 1 |
+-------------------------------------------------------+
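As an alternative to the loop, the first rule can also be written with a running sum: within each group, the flag stays 1 only while no non-significant value has been seen yet. This is a sketch under the assumption that a flag should never switch back on once it is off; the names s2 and n2 are mine. The result is checked against the _f_s and _f_n columns given in the question.

```stata
clear
input _mar _m _p _g _f_s _f_n
2.99 0 0.00000 0 1 0
3.03 1 0.00000 0 1 0
3.05 2 0.00000 0 1 1
3.06 3 0.22179 0 0 1
3.07 4 0.18044 0 0 1
3.07 5 0.58009 0 0 1
3.06 6 0.40620 0 0 1
3.06 7 0.47257 0 0 1
3.06 8 0.91196 0 0 1
3.05 9 0.68560 0 0 1
2.65 0 0.00000 1 1 0
2.70 1 0.00000 1 1 0
2.73 2 0.00103 1 1 0
2.75 3 0.00944 1 1 1
2.75 4 0.64713 1 0 1
2.76 5 0.55476 1 0 1
2.77 6 0.32807 1 0 1
2.78 7 0.03271 1 0 1
2.78 8 0.00219 1 0 1
2.79 9 0.57361 1 0 1
end

* under by:, sum() restarts in each group; it counts the non-significant rows seen so far
bysort _g (_m): generate byte s2 = sum(_p > .05) == 0

* dashed segment: every unflagged row plus the last flagged row before the cut-off
bysort _g (_m): generate byte n2 = !s2 | (s2 == 1 & s2[_n+1] == 0)

* compare with the flags the question asks for
assert s2 == _f_s
assert n2 == _f_n
```

Because missing values compare as larger than any number in Stata, a missing _p counts as non-significant here, just as it does with _p <= .05.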

Check whether a range contains specific values

Var1 is given. Var2 should take the value 1 if the current observation or one of the previous 5 observations is missing or 0. What is the syntax for Var2?
I know how to do it with a lot of if statements, but when I need to do it for the previous 50 observations, that becomes too inconvenient.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Var1 Var2)
5 0
. 1
2 1
5 1
7 1
9 1
5 1
9 0
0 1
2 1
7 1
5 1
3 1
2 1
5 0
end
This question is similar to your previous one, "Finding the second smallest value", which you should cite, and so is this answer. rangestat is from SSC (ssc install rangestat).
clear
input float(Var1 Var2)
5 0
. 1
2 1
5 1
7 1
9 1
5 1
9 0
0 1
2 1
7 1
5 1
3 1
2 1
5 0
end
gen long id = _n
gen Bad = inlist(Var1, 0, .)
rangestat (sum) Bad, int(id -5 0)
list, sepby(Bad_sum)
     +----------------------------------+
     | Var1   Var2   id   Bad   Bad_sum |
     |----------------------------------|
  1. |    5      0    1     0         0 |
     |----------------------------------|
  2. |    .      1    2     1         1 |
  3. |    2      1    3     0         1 |
  4. |    5      1    4     0         1 |
  5. |    7      1    5     0         1 |
  6. |    9      1    6     0         1 |
  7. |    5      1    7     0         1 |
     |----------------------------------|
  8. |    9      0    8     0         0 |
     |----------------------------------|
  9. |    0      1    9     1         1 |
 10. |    2      1   10     0         1 |
 11. |    7      1   11     0         1 |
 12. |    5      1   12     0         1 |
 13. |    3      1   13     0         1 |
 14. |    2      1   14     0         1 |
     |----------------------------------|
 15. |    5      0   15     0         0 |
     +----------------------------------+
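If installing rangestat is not an option, the same window count can be sketched with official commands only, as the difference of two running sums. This assumes the data are already in the intended order; the names cum and Var2x and the local macro w are mine. Changing w to 50 covers the longer window mentioned in the question.

```stata
clear
input float(Var1 Var2)
5 0
. 1
2 1
5 1
7 1
9 1
5 1
9 0
0 1
2 1
7 1
5 1
3 1
2 1
5 0
end

* 1 if the value is 0 or missing
generate byte Bad = inlist(Var1, 0, .)

* running count of bad values up to and including the current observation
generate long cum = sum(Bad)

* bad values in the current and the previous w observations
local w 5
generate byte Var2x = (cum - cond(_n > `w' + 1, cum[_n - `w' - 1], 0)) > 0

* agrees with the Var2 column given in the question
assert Var2x == Var2
```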

Generate variables by many different groups

I have a dataset with:
A unique person_id.
Different subjects that the person took in the past (humanities, IT, business etc.).
The Degree of each subject.
This looks as follows:
person_id humanities business IT Degree
1 0 1 0 BSc
1 0 0 1 MSc
2 1 0 0 PhD
2 0 1 0 MSc
2 0 0 1 BSc
3 0 0 1 BSc
I would like to transform this dataset so that I have variables consisting of each possible combination of degree and subject for each person_id.
The idea is that when I collapse it later by person_id, I will have one value for each person (namely 0 or 1). I have twelve different subjects and four main degrees.
person_id humanities business IT Degree BSc_humanities MSc_Hum
1 0 1 0 BSc 0 0
1 0 0 1 MSc 0 0
2 1 0 0 PhD 0 1
2 1 0 0 MSc 0 1
2 0 0 1 BSc 0 1
3 0 0 1 BSc 0 0
What would be the best possible way to achieve this?
You could use fillin:
clear
input person_id humanities business IT str3 Degree
1 0 1 0 BSc
1 0 0 1 MSc
2 1 0 0 PhD
2 0 1 0 MSc
2 0 0 1 BSc
3 0 0 1 BSc
end
fillin person_id humanities business Degree
list person_id humanities business Degree
+-----------------------------------------+
| person~d humani~s business Degree |
|-----------------------------------------|
1. | 1 0 0 BSc |
2. | 1 0 0 MSc |
3. | 1 0 0 PhD |
4. | 1 0 1 BSc |
5. | 1 0 1 MSc |
|-----------------------------------------|
6. | 1 0 1 PhD |
7. | 1 1 0 BSc |
8. | 1 1 0 MSc |
9. | 1 1 0 PhD |
10. | 1 1 1 BSc |
|-----------------------------------------|
11. | 1 1 1 MSc |
12. | 1 1 1 PhD |
13. | 2 0 0 BSc |
14. | 2 0 0 MSc |
15. | 2 0 0 PhD |
|-----------------------------------------|
16. | 2 0 1 BSc |
17. | 2 0 1 MSc |
18. | 2 0 1 PhD |
19. | 2 1 0 BSc |
20. | 2 1 0 MSc |
|-----------------------------------------|
21. | 2 1 0 PhD |
22. | 2 1 1 BSc |
23. | 2 1 1 MSc |
24. | 2 1 1 PhD |
25. | 3 0 0 BSc |
|-----------------------------------------|
26. | 3 0 0 MSc |
27. | 3 0 0 PhD |
28. | 3 0 1 BSc |
29. | 3 0 1 MSc |
30. | 3 0 1 PhD |
|-----------------------------------------|
31. | 3 1 0 BSc |
32. | 3 1 0 MSc |
33. | 3 1 0 PhD |
34. | 3 1 1 BSc |
35. | 3 1 1 MSc |
|-----------------------------------------|
36. | 3 1 1 PhD |
+-----------------------------------------+
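fillin lists the combinations, but the indicators themselves still have to be created. One way to sketch that step is a nested loop over degrees and subjects followed by collapse (max). The degree list below is typed in by hand from the example and would need to be extended to the four degrees and twelve subjects in the real data; the generated names such as BSc_IT are illustrative.

```stata
clear
input person_id humanities business IT str3 Degree
1 0 1 0 BSc
1 0 0 1 MSc
2 1 0 0 PhD
2 0 1 0 MSc
2 0 0 1 BSc
3 0 0 1 BSc
end

* one 0/1 indicator per degree-subject combination
foreach d in BSc MSc PhD {
    foreach s of varlist humanities business IT {
        generate byte `d'_`s' = (Degree == "`d'") & (`s' == 1)
    }
}

* one row per person: 1 if the person ever had that combination
collapse (max) BSc_* MSc_* PhD_*, by(person_id)

* spot checks against the example rows
assert BSc_business == (person_id == 1)
assert PhD_humanities == (person_id == 2)
assert BSc_IT == inlist(person_id, 2, 3)
list
```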

Manipulating Bit-wise Operations

There is a puzzle that asks for an equivalent of bitwise & using only the | and ~ operators.
I've been brute-forcing combinations of | and ~ using 6 (0110) and 5 (0101), trying to get 4 (0100), but I still cannot get the answer.
The maximum number of operations that can be used is 8.
Can someone please give me hints?
What helps you here is De Morgan's Law, which basically says:
~(a & b) == ~a | ~b
Thus we can just negate this and get:
a & b == ~(~a | ~b) //4 operations
And looking at the truth table (and in fact, god bless the simplicity of binary logic, there are only four possible combinations of inputs to check), we can see that both are equivalent (last two columns):
a | b | ~a | ~b | ~a OR ~b | ~(~a OR ~b) | a AND b
--|---|----|----|----------|-------------|--------
0 | 0 | 1  | 1  | 1        | 0           | 0
1 | 0 | 0  | 1  | 1        | 0           | 0
0 | 1 | 1  | 0  | 1        | 0           | 0
1 | 1 | 0  | 0  | 0        | 1           | 1
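Applied to the numbers from the question, and assuming a 4-bit word so that ~ flips exactly four bits, the identity works out as follows (four operations in total):

```latex
\begin{aligned}
a &= 0110_2 = 6, & b &= 0101_2 = 5 \\
{\sim}a &= 1001_2, & {\sim}b &= 1010_2 \\
{\sim}a \mid {\sim}b &= 1011_2 \\
{\sim}({\sim}a \mid {\sim}b) &= 0100_2 = 4 = a \mathbin{\&} b
\end{aligned}
```

With a wider word the extra high bits are all 1 after the inner negations and all 0 after the outer one, so the result is still 4.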
Truth table time...
A  B  A&B  !A  !B  !A|!B  !(!A|!B)
0  0   0    1   1    1       0
0  1   0    1   0    1       0
1  0   0    0   1    1       0
1  1   1    0   0    0       1