Stata inverse matrix function

I'm trying to get the inverse of a matrix with the inv() function.
The equivalent Excel function works fine, but I can't get the expected result in Stata 11 or Stata 12.
matrix A = (0,0,553959,18071,0,0,86985,0,0,0\0,0,13752,1986661,0,0,14178,0,0,0\245764,55172,0,0,0,0,210238,15835,0,174155\135950,1217897,0,0,211554,0,348453,197592,424893,704246\0,0,40442,171113,0,0,0,0,0,0\277015,720994,0,0,0,0,0,0,0,0\0,0,0,0,0,989861,121720,67779,0,58624\286,20529,34840,90896,0,8147,157021,265924,51955,4187\0,0,0,0,0,0,299389,86656,0,90804\0,0,58171,973844,0,0,0,0,0,0)
matrix list A
matrix D = inv(A)*A
matrix list D
I get:
D[10,10]
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
c1 .99815163 -.0007439 24 256 -4.441e-18 .02180544 .63042827 .71306993 .13905754 .72740125
c2 .00071017 1.0002858 -64 -640 1.978e-17 -.00837793 1.0656752 -.27397047 -.05342766 -.27947675
c3 2.008e-20 8.082e-21 2.143632 20.313752 0 -2.369e-19 .08155506 -7.747e-18 -1.511e-18 -2.800e-19
c4 7.748e-22 3.118e-22 .04412596 1.7837869 0 -9.141e-21 .00314672 -2.989e-19 -5.829e-20 -1.080e-20
c5 -.03648975 -.01468572 512 2048 1 .430473 13.357737 14.077098 2.7452099 14.360021
c6 .000033 .00001328 -1.125 -12 0 .9996107 -.09016952 -.01273068 -.00248264 -.01298654
c7 -1.280e-19 -5.153e-20 -7.292322 -129.5298 0 1.511e-18 .47996753 4.940e-17 9.633e-18 1.785e-18
c8 -.00276051 -.001111 32 512 0 .03256598 3.1088352 2.0649553 .20767957 1.0863588
c9 .01364134 .00549012 0 -1024 0 -.1609282 -9.7734934 -5.2625881 -.02627036 -5.3683558
c10 .00263441 .00106025 0 -128 0 -.03107834 -1.6240499 -1.0163072 -.1981926 -.03673303
But I think it should be:
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
14 -8 1 -9,42932E-17 -64 1,00614E-16 -1,73472E-18 0 1,33227E-15 32
0 -32 -5,55112E-17 1 0 1,11022E-16 0 0 0 -128
0,021597505 -0,210228064 0 0 0,788571331 4,27485E-20 2,13743E-20 0 5,47181E-18 0,788571331
0 0 0 0 0 1 0 0 0 0
0 0 0 0 -16 -8,67362E-19 1 0 0 0
3,5 -1,75 6,93889E-18 -2,7765E-19 -128 1,38778E-17 0 1 2,22045E-16 35
0 0 0 0 0 0 0 0 1 0
-0,007446191 -0,112499947 0 0 1,604924249 8,70031E-20 4,35016E-20 0 1,11364E-17 1,604924249

I believe the problem is your matrix is ill-conditioned, i.e. almost singular.
If you try to compute the inverse within Mata (Stata's matrix programming language), the result is:
: Ainv = luinv(A)
: Ainv
[symmetric]
1 2 3 4 5 6 7 8 9 10
+---------------------------------------------------+
1 | . |
2 | . . |
3 | . . . |
4 | . . . . |
5 | . . . . . |
6 | . . . . . . |
7 | . . . . . . . |
8 | . . . . . . . . |
9 | . . . . . . . . . |
10 | . . . . . . . . . . |
+---------------------------------------------------+
Couple that with:
If you use these functions with a singular matrix, returned will be a
matrix of missing values. The determination of singularity is made
relative to tol. See Tolerance under Remarks in [M-5]
lusolve() for details.
Source: help mf_luinv.
Checking the condition number, we see that it is very high, which confirms that the matrix is ill-conditioned:
: C = cond(A)
: C
7.47519e+17
Numerical methods vary, but for a matrix like this, you can expect large inaccuracies. See help mf_lusolve##remarks3 as indicated above.
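As a side note on the posted matrix: rows 1, 2, 5 and 10 are nonzero only in columns 3, 4 and 7, so those four rows cannot be linearly independent and the matrix is exactly singular, not merely close to it. The checks above can be sketched on a small matrix in Mata. The 3 x 3 matrix below is a hypothetical example (its third row is the sum of the first two), not the matrix from the question, and pinv() is shown only as one possible fallback.

```stata
mata:
// Hypothetical example: row 3 = row 1 + row 2, so A is exactly singular
A = (1, 2, 3 \ 4, 5, 6 \ 5, 7, 9)

// Numerical rank; a value below rows(A) signals singularity
rank(A)

// Condition number; very large values signal ill-conditioning
cond(A)

// luinv() returns missing values when A is judged singular relative to tol
luinv(A)

// Moore-Penrose pseudoinverse, which is defined even for singular matrices
pinv(A)
end
```

If rank(A) is smaller than the number of rows, no true inverse exists, and any software that returns one is reporting numerical noise.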

Related

Stata: Reverse Sort

I have a data set that looks like this:
id varA varB varC
1 0 10 .
1 0 20 .
1 0 35 .
2 1 60 76
2 1 76 60
2 0 32 .
I want to create varC, which holds the values of varB in reverse order within each id for observations with varA == 1 and is missing otherwise.
This may help:
clear
input id varA varB varC
1 0 10 .
1 0 20 .
1 0 35 .
2 1 60 76
2 1 76 60
2 0 32 .
end
gen group = sum(id != id[_n-1] | varA != varA[_n-1])
sort group, stable
by group: gen wanted = cond(varA == 1, varB[_N - _n + 1], .)
list id var* wanted, sepby(id varA)
+----------------------------------+
| id varA varB varC wanted |
|----------------------------------|
1. | 1 0 10 . . |
2. | 1 0 20 . . |
3. | 1 0 35 . . |
|----------------------------------|
4. | 2 1 60 76 76 |
5. | 2 1 76 60 60 |
|----------------------------------|
6. | 2 0 32 . . |
+----------------------------------+

Aggregate dummy variables to multiple categorical variables

I have 8 dummy variables (0/1) that have to be aggregated into one categorical variable with 8 categories. Normally, each person should have marked just one of the 8 dummy variables, but some marked several.
When a person has marked two items, the first value should go into the first categorical variable and the second value into the second categorical variable. When three items are marked, the third value should go into a third categorical variable (three is the maximum).
I know how to aggregate the dummies into a categorical variable, but I do not know how to distribute the values across different variables based on the number of marked dummies.
If the problem is not clear, please tell me. It was difficult for me to describe it properly.
Edit:
My approach is the following:
local MCM_zahl4 F0801 F0802 F0803 F0804 F0805 F0806 F0807 F0808
gen MCM_zaehl_4 = 0
foreach var of varlist `MCM_zahl4' {
replace MCM_zaehl_4 = MCM_zaehl_4 + 1 if `var' == 1
}
tab MCM_zaehl_4
/*
MCM_zaehl_4 | Freq. Percent Cum.
------------+-----------------------------------
0 | 31 4.74 4.74
1 | 598 91.44 96.18
2 | 22 3.36 99.54
3 | 3 0.46 100.00
------------+-----------------------------------
Total | 654 100.00
*/
gen bildu2 = -999999
gen bildu2_D = -999999
replace bildu2 = 1 if F0801 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 2 if F0802 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 3 if F0803 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 4 if F0804 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 5 if F0805 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 6 if F0806 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 7 if F0807 == 1 & MCM_zaehl_4 == 1
replace bildu2 = 8 if F0808 == 1 & MCM_zaehl_4 == 1
Then I split all cases with MCM_zaehl_4 > 1 manually into three variables.
E. g. for two mcm:
replace bildu2 = 5 if ID == XXX
replace bildu2_D = 2 if ID == XXX
I need to automate that approach, because with more observations I won't be able to do it manually.
If I understood you correctly, you could try the following to aggregate your multiple dummy variables into several aggregate columns, based on the number of answers that each person marked. It assumes the repeated answers are consecutive. I reduced your problem to 6 dummies (a1-a6), where people can mark up to 3 of them.
clear
input id a1 a2 a3 a4 a5 a6
1 1 0 0 0 0 0
2 1 1 0 0 0 0
3 1 1 1 0 0 0
4 1 1 1 0 0 0
5 0 1 0 0 0 0
6 1 0 0 0 0 0
7 0 0 0 0 1 0
8 0 0 0 0 0 1
end
egen n_asnwers = rowtotal(a*)
gen wanted_1 = .
gen wanted_2 = .
gen wanted_3 = .
local i = 1
foreach v of varlist a* {
replace wanted_1 = `v' if `v' == 1 & n_asnwers == 1
replace wanted_2 = `v' if `v' == 1 & n_asnwers == 2
replace wanted_3 = `v' if `v' == 1 & n_asnwers == 3
local ++i
}
list
/*
+------------------------------------------------------------------------------+
| id a1 a2 a3 a4 a5 a6 n_asnw~s wanted_1 wanted_2 wanted_3 |
|------------------------------------------------------------------------------|
1. | 1 1 0 0 0 0 0 1 1 . . |
2. | 2 1 1 0 0 0 0 2 . 1 . |
3. | 3 1 1 1 0 0 0 3 . . 1 |
4. | 4 1 1 1 0 0 0 3 . . 1 |
5. | 5 0 1 0 0 0 0 1 1 . . |
|------------------------------------------------------------------------------|
6. | 6 1 0 0 0 0 0 1 1 . . |
7. | 7 0 0 0 0 1 0 1 1 . . |
8. | 8 0 0 0 0 0 1 1 1 . . |
+------------------------------------------------------------------------------+
*/
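A variant that stores the item number of the first, second and third marked dummy, and that does not require the marks to be consecutive, could look like the sketch below. The toy data and the names n_marked and wanted_1 to wanted_3 are assumptions for illustration; with the original data the inner variables would be F0801 to F0808 and the outer loop would run over 8 items.

```stata
clear
input id a1 a2 a3 a4 a5 a6
1 1 0 0 0 0 0
2 1 1 0 0 0 0
3 0 1 0 1 1 0
4 0 0 0 0 0 1
end

* wanted_k will hold the item number of the k-th marked dummy (k = 1, 2, 3)
gen byte n_marked = 0
forvalues k = 1/3 {
    gen byte wanted_`k' = .
}

forvalues j = 1/6 {
    * running count of marked items up to and including item j
    replace n_marked = n_marked + 1 if a`j' == 1
    forvalues k = 1/3 {
        * if item j is the k-th mark, store its number in wanted_k
        replace wanted_`k' = `j' if a`j' == 1 & n_marked == `k'
    }
}

list, noobs
```

After the loop, n_marked is the total number of marked items per person, so it can also be used to flag anyone who marked more than three.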

Connect IDs based on values in rows, ignoring connection between identical IDs

This is a follow-up to my previous question: Connect IDs based on values in rows.
I would now like to consider the case where connections between identical idb values should be classified as 0.
The output is similar to the matrix in my previous post but with diagonal elements equal to 0:
62014 62015 62016 62017 62018
62014 0 1 0 1 1
62015 1 0 0 0 0
62016 0 0 0 0 1
62017 1 0 0 0 1
62018 1 0 1 1 0
How can I do this in Stata?
You can easily change the values in the diagonal of a matrix as follows:
: B
[symmetric]
1 2 3 4 5
+---------------------+
1 | 1 |
2 | 1 1 |
3 | 0 0 1 |
4 | 1 0 0 1 |
5 | 1 0 1 1 1 |
+---------------------+
: _diag(B, 0)
: B
[symmetric]
1 2 3 4 5
+---------------------+
1 | 0 |
2 | 1 0 |
3 | 0 0 0 |
4 | 1 0 0 0 |
5 | 1 0 1 1 0 |
+---------------------+
In the context of your question, you can simply do the following:
mata: B = foo1(A)
mata: _diag(B, 0)
getmata (idb*) = B
list
+------------------------------------------------------------------------+
| idb idd1 idd2 idd3 idb1 idb2 idb3 idb4 idb5 |
|------------------------------------------------------------------------|
1. | 62014 370490 879271 1112878 0 1 0 1 1 |
2. | 62015 457013 1112878 370490 1 0 0 0 0 |
3. | 62016 341863 1366174 533773 0 0 0 0 1 |
4. | 62017 879271 327069 341596 1 0 0 0 1 |
5. | 62018 1391443 1366174 879271 1 0 1 1 0 |
+------------------------------------------------------------------------+

How to create dummies based on multiple variables

The following command can generate dummy variables:
tabulate age, generate(I)
However, what should I do when I want dummies based on multiple variables?
For example, I would like to do the following concisely:
generate I1=1 if age==1 & year==2000
generate I2=1 if age==1 & year==2001
generate I3=1 if age==2 & year==2000
generate I4=1 if age==2 & year==2001
I have already tried this:
tabulate age year, generate(I)
However, it did not work.
You can get what you want as follows:
sysuse auto, clear
keep if !missing(rep78)
egen rf = group(rep78 foreign)
tabulate rf, generate(I)
group(rep78 |
foreign) | Freq. Percent Cum.
------------+-----------------------------------
1 | 2 2.90 2.90
2 | 8 11.59 14.49
3 | 27 39.13 53.62
4 | 3 4.35 57.97
5 | 9 13.04 71.01
6 | 9 13.04 84.06
7 | 2 2.90 86.96
8 | 9 13.04 100.00
------------+-----------------------------------
Total | 69 100.00
list I* in 1 / 10
+---------------------------------------+
| I1 I2 I3 I4 I5 I6 I7 I8 |
|---------------------------------------|
1. | 0 0 1 0 0 0 0 0 |
2. | 0 0 1 0 0 0 0 0 |
3. | 0 0 1 0 0 0 0 0 |
4. | 0 0 0 0 1 0 0 0 |
5. | 0 0 1 0 0 0 0 0 |
6. | 0 0 1 0 0 0 0 0 |
7. | 0 0 1 0 0 0 0 0 |
8. | 0 0 1 0 0 0 0 0 |
9. | 0 0 1 0 0 0 0 0 |
10. | 0 1 0 0 0 0 0 0 |
+---------------------------------------+
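Two small additions may make the result easier to read. This is a sketch on the same auto data: the label option of egen's group() function attaches the underlying combination to each group, and a single dummy can also be written directly as a logical expression. The variable name poor_domestic is made up for illustration.

```stata
sysuse auto, clear
keep if !missing(rep78)

* A single dummy written directly as a logical expression (0/1)
generate byte poor_domestic = rep78 == 1 & foreign == 0

* Labelled groups show which combination each I* dummy stands for
egen rf = group(rep78 foreign), label
tabulate rf, generate(I)

* The hand-made dummy should match the first generated one
count if poor_domestic != I1
```

Unlike generate I1=1 if ..., the logical-expression form yields 0 rather than missing when the condition is false, which is usually what a dummy should be.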

Updated exposure variables in Stata

I'm trying to create a variable for updated body mass index (BMI) across the 4 visits of a study. I've tried the code below, but it only gives the value from the last visit. My data are in wide format, where visit_v1 == 1 if the participant was present for visit 1 and bmi_v1 is the BMI at visit 1. I want bmi_su to equal bmi_v1 if visit_v1 == 1, bmi_v2 if visit_v2 == 1, etc. Any thoughts on where I'm going wrong?
gen bmi_su = .
replace bmi_su = bmi_v4 if visit_v4==1
replace bmi_su = bmi_v3 if visit_v3==1 & visit_v4==0
replace bmi_su = bmi_v2 if visit_v2==1 & visit_v4==0 & visit_v3==0
replace bmi_su = bmi_v1 if visit_v1==1 & visit_v4==0 & visit_v3==0 & visit_v2==0
Do you seek something like this:
. clear all
. set more off
.
. * Assumed data structure
. input ///
> id bmi visit1 visit2 visit3 bmi1 bmi2 bmi3
id bmi visit1 visit2 visit3 bmi1 bmi2 bmi3
1. 1 20 1 0 0 20 0 0
2. 1 . 0 1 0 0 25 0
3. 1 . 0 0 1 0 0 28
4. end
.
. list, noobs
+----------------------------------------------------------+
| id bmi visit1 visit2 visit3 bmi1 bmi2 bmi3 |
|----------------------------------------------------------|
| 1 20 1 0 0 20 0 0 |
| 1 . 0 1 0 0 25 0 |
| 1 . 0 0 1 0 0 28 |
+----------------------------------------------------------+
.
. * What you want?
. gen bmisu = bmi1 + bmi2 + bmi3
.
. list, noobs
+------------------------------------------------------------------+
| id bmi visit1 visit2 visit3 bmi1 bmi2 bmi3 bmisu |
|------------------------------------------------------------------|
| 1 20 1 0 0 20 0 0 20 |
| 1 . 0 1 0 0 25 0 25 |
| 1 . 0 0 1 0 0 28 28 |
+------------------------------------------------------------------+
?
Panel or longitudinal data are usually much better off in a long data structure or shape (some say format).
In your case, the definitions imply that the last measurement will trump earlier measurements, so it is not clear why you seem surprised.
Here are some more systematic ways to do calculations. First,
* later replacements win, so the earliest attended visit ends up in bmi_su
gen bmi_su = bmi_v4
forval j = 3(-1)1 {
    replace bmi_su = bmi_v`j' if visit_v`j' == 1
}
Second,
* later replacements win, so the latest attended visit ends up in bmi_su2
gen bmi_su2 = bmi_v1
forval j = 2/4 {
    replace bmi_su2 = bmi_v`j' if visit_v`j' == 1
}
Consider also variants of the above with if missing(bmi_su) or if missing(bmi_su2) rather than the if conditions shown.
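One such variant, sketched under the assumption that the goal is the most recent available BMI: start from the last visit and fall back to earlier visits only while the result is still missing. The toy data below are invented for illustration and use the variable names from the question.

```stata
clear
input id visit_v1 visit_v2 visit_v3 visit_v4 bmi_v1 bmi_v2 bmi_v3 bmi_v4
1 1 1 0 0 22 23 . .
2 1 1 1 1 30 31 29 28
3 1 0 1 0 25 . 26 .
end

* Start with the last visit, if attended
gen bmi_su = bmi_v4 if visit_v4 == 1

* Fall back to earlier visits only where nothing has been filled in yet
forvalues j = 3(-1)1 {
    replace bmi_su = bmi_v`j' if visit_v`j' == 1 & missing(bmi_su)
}

list id bmi_v* bmi_su, noobs
```

Because each replace is guarded by missing(bmi_su), the order of the loop decides which visit takes priority; running it as forvalues j = 2/4 after starting from bmi_v1 would give the earliest available value instead.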