Plot graph from within Mata - stata

Consider the following toy matrix in Mata:
mata: A
            1       2
     +-----------------+
   1 |   6555     140  |
   2 |   7205     135  |
   3 |   6255     140  |
   4 |   7272     138  |
   5 |  10283     133  |
   6 |   8244     136  |
   7 |   6909     144  |
   8 |   7645     138  |
   9 |  12828     134  |
  10 |   6538     137  |
     +-----------------+
If I want to draw a scatter plot using this matrix, I first need to transfer it
to Stata and then also convert it to variables with the svmat command:
mata: st_matrix("A", A)
svmat A
list, separator(0)
+-------------+
| A1 A2 |
|-------------|
1. | 6555 140 |
2. | 7205 135 |
3. | 6255 140 |
4. | 7272 138 |
5. | 10283 133 |
6. | 8244 136 |
7. | 6909 144 |
8. | 7645 138 |
9. | 12828 134 |
10. | 6538 137 |
+-------------+
twoway scatter A1 A2
Is there a way to draw the graph directly, without leaving Mata?

You can plot a Mata matrix without first converting it to Stata variables as follows:
twoway scatter matamatrix(A)
See help twoway_mata for more details.
Edit by @PearlySpencer:
This can be run directly from within mata using the stata() function:
mata: stata("twoway scatter matamatrix(A)")
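Putting the pieces together, the whole workflow can stay inside a single Mata block. The following is a minimal sketch rather than a definitive recipe: it re-creates the toy matrix A from the question by hand and assumes a Stata version in which matamatrix() is available.

```stata
mata:
// Toy matrix from the question: 10 rows, 2 columns
A = (6555, 140 \ 7205, 135 \ 6255, 140 \ 7272, 138 \ 10283, 133 \
     8244, 136 \ 6909, 144 \ 7645, 138 \ 12828, 134 \ 6538, 137)

// Hand the graph command to Stata; matamatrix() reads A straight from Mata
stata("twoway scatter matamatrix(A)")
end
```

Which column ends up on which axis is described in help twoway_mata, so it is worth checking there before relying on the default.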

An alternative approach is to use the community-contributed Mata function mm_plot():
mata: mm_plot(A, "scatter")
This is part of the moremata collection of functions and must thus be downloaded first:
ssc install moremata

Related

SAS - Combine like values within rows, then add new variable for non like value(s)

I have a large dataset and want to run an analysis on each customer (identified by the same account and routing number); each customer has hundreds of transactions in the dataset. I was able to add a SEQ # for matching account and routing numbers. How would I run an analysis that, for example, takes SEQ #1 and gives the total number of deposits (Amount), the maximum and minimum deposit, and potentially other helpful statistics?
+----------+-------+---------+--------+-------+
| Routing# | Acct# | AMOUNT  | TOTAL  | SEQ # |
+----------+-------+---------+--------+-------+
| 518      | 0     | 490.50  | 3777.5 | 1     |
| 518      | 0     | 170.00  | 3777.5 | 1     |
| 518      | 0     | 3117.00 | 3777.5 | 1     |
| 518      | 99    | 875.00  | 875    | 2     |
| 518      | 999   | 499.00  | 499    | 3     |
| 519      | 2     | 100.00  | 200.00 | 4     |
| 519      | 2     | 100.00  | 200.00 | 4     |
+----------+-------+---------+--------+-------+
Thanks
There are multiple ways to do this, but here is a DATA step approach:
data have;
input Routing Acct AMOUNT;
datalines;
518 0 490.50
518 0 170.00
518 0 3117.00
518 99 875.00
518 999 499.00
519 2 100.00
519 2 100.00
;
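data want;
    /* First pass: accumulate the total for the current Routing/Acct group */
    do until (last.Acct);
        set have;
        by Routing Acct notsorted;
        total+amount;
    end;
    /* One sequence number per group */
    seq+1;
    /* Second pass: re-read the same group and write each row with its total */
    do until (last.Acct);
        set have;
        by Routing Acct notsorted;
        output;
    end;
    /* Reset the accumulator before the next group starts */
    total=0;
run;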

How to subtract across columns

I want to subtract the values by apid in the table below:
-----------------------------------------------
| apid | AB | AS | BS | CS | DS | difference |
|-------|----|----|----|----|----|----------- |
| AP013 | 43 | 36 | | | | 7 |
-----------------------------------------------
For example, for "AP013", the difference is subtracting AS from AB (43 - 36 = 7).
The new value also needs to be saved in a new column called diff.
Can you please tell me how to do this in Stata?
You just generate a new variable diff:
clear
input str5 apid AB AS
"AP013" 43 36
end
generate diff = AB - AS
list
+------------------------+
| apid AB AS diff |
|------------------------|
1. | AP013 43 36 7 |
+------------------------+

Chi-Sq test result difference when done Manually and by SAS

I am trying to perform a chi-square test on my data using SAS University Edition.
Here is the structure of my data:
+----------+------------+------------------+-------------------+
| study_id | Control_id | study_mortality | control_mortality |
+----------+------------+------------------|-------------------+
| 1 | 50 | Alive | Alive |
| 1 | 52 | Alive | Alive |
| 2 | 65 | Dead | Dead |
| 2 | 70 | Dead | Alive |
+----------+------------+------------------+-------------------+
I am getting different results when I run the test in SAS versus when I do it manually with an online calculator. I used the frequencies from PROC FREQ as the input to the online calculator. Here are the frequency tables and the chi-square output. Can someone point out where the issue is?
proc freq data = mydata;
tables study_mortality control_mortality;
where type=1;
run;
+-----------------+-------------------+
| study_mortality | Frequency |
+-----------------+-------------------
| Alive | 7614 |
| Dead | 324 |
+-----------------+-------------------+
+----------------- +-------------------+
| control_mortality| Frequency |
+----------------- +-------------------
| Alive | 6922 |
| Dead | 159 |
+----------------- +-------------------+
proc freq data = mydata;
tables study_mortality*control_mortality/ CHISQ;
where type=1;
run;
+-----------------+-------------------+---------+-------+
| | Control_mortality | | |
+-----------------+-------------------+---------+-------+
| Study_mortality | Alive | Dead | Total |
| Alive | 5515 | 134 | 5649 |
| Dead | 249 | 5 | 254 |
| Total | 5764 | 139 | 5903 |
+-----------------+-------------------+---------+-------+
Statistic DF Value Prob
Chi-Square 1 0.1722 0.6782
Likelihood Ratio Chi-Square 1 0.1818 0.6699
Continuity Adj. Chi-Square 1 0.0414 0.8388
Mantel-Haenszel Chi-Square 1 0.1722 0.6782
Phi Coefficient -0.0054
Contingency Coefficient 0.0054
Cramer's V -0.0054
You have missing data. Look at the Ns in those tables.
study_mortality has 7,938 records (7,614 + 324) and control_mortality has 7,081 (6,922 + 159), but the cross-tabulation contains only 5,903. This means that records with a missing value in either variable are excluded from the two-way table. PROC FREQ normally reports this in a "Frequency Missing" line below the table; I am not sure whether SAS did not print it or you pasted only part of the output. When I enter the cross-tabulated counts into an online calculator, the p-value matches your SAS output exactly.
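/* Cell counts from the cross-tabulation; 1 = Alive, 0 = Dead */
data have;
    infile cards;
    input Study Control N;
    cards;
1 1 5515
1 0 134
0 1 249
0 0 5
;
run;
/* WEIGHT tells PROC FREQ that each row stands for N observations */
proc freq data=have;
    table study*control / chisq;
    weight N;
run;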
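The agreement can also be checked by hand from the cross-tabulated counts alone. As a sketch, using the standard shortcut formula for the Pearson chi-square of a 2x2 table (the cell labels a, b, c, d and the total n are notation introduced here, not part of the SAS output), with a = 5515, b = 134, c = 249, d = 5 and n = 5903:

```latex
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{5903\,(5515 \cdot 5 - 134 \cdot 249)^2}{5649 \cdot 254 \cdot 5764 \cdot 139}
       = \frac{5903 \cdot (-5791)^2}{5649 \cdot 254 \cdot 5764 \cdot 139}
       \approx 0.1722
```

This agrees with the Chi-Square value of 0.1722 that PROC FREQ reports for the 5,903 complete records.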

Stata : distinct values, count and histogram

I am new to Stata and still learning.
I have a variable that looks like this:
+-------+
| Phase |
+-------+
| I |
+-------+
| I |
+-------+
| II |
+-------+
| III |
+-------+
| II |
+-------+
My goal is to draw a histogram with the possible values (I, II, III) on the x-axis and the count of each (2, 2, 1) on the y-axis.
I thought I could loop over the values and store the count of each in an array, but arrays do not seem to be implemented in Stata.
Is there a built-in function that does this, or do I have to write one that finds the distinct values, counts them, and then draws the histogram?
Thank you.
Edit:
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
I found a way of counting the distinct values:
tab processedphase, matcell(x)
which produces
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
Then, to inspect the stored counts and copy them into a variable:
matrix list x
svmat x
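To get from the counts to the plot itself, one option is to let Stata count and draw in a single step. The following is a minimal sketch, assuming Stata 13 or later (where graph bar accepts (count) without a variable list); the variable name phase and the five toy observations are stand-ins for processedphase and the real data.

```stata
clear
input str3 phase
"I"
"I"
"II"
"III"
"II"
end

* One bar per distinct value; bar height is the number of observations
graph bar (count), over(phase) ytitle("Frequency")

* Alternative: map the strings to labelled integers, then use histogram
encode phase, generate(phase_id)
histogram phase_id, discrete frequency xlabel(1/3, valuelabel)
```

Either command works on the raw variable, so the intermediate matrix from matcell() is only needed if the counts are wanted for something else.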

Summarizing statistics (mean, sd) in Stata, when there are three dichotomous explanatory variables

I am trying to create a table of summary statistics (mean, sd) for a DV when there are three dichotomous IVs. Using the command tab IV1 IV2, sum(DV) I can create a summary statistics table for only two IVs, but not for three. However, I need the summary statistics for all three IVs and their interactions. Is there a way around this, or an alternative command? Thanks!
You can make an interaction variable like this:
webuse nlswork
egen interaction = group(race nev_mar union), label
tab interaction, sum(ln_wage)
Here I use the same well-chosen sandbox as Dimitriy.
webuse nlswork, clear
quietly statsby n=r(N) mean=r(mean) sd=r(sd), by(race nev_mar union) subsets clear: summarize ln_wage
egen nvars = rownonmiss(race nev_mar union )
sort nvars race nev_mar union
format mean sd %4.3f
l race-sd, sepby(nvars) noobs
+-------------------------------------------------+
| race nev_mar union n mean sd |
|-------------------------------------------------|
| . . . 28534 1.675 0.478 |
|-------------------------------------------------|
| white . . 13590 1.796 0.464 |
| black . . 5426 1.647 0.458 |
| other . . 211 1.890 0.510 |
| . 0 . 15509 1.758 0.466 |
| . 1 . 3718 1.740 0.477 |
| . . 0 14720 1.702 0.466 |
| . . 1 4507 1.927 0.432 |
|-------------------------------------------------|
| white 0 . 11399 1.794 0.462 |
| white 1 . 2191 1.808 0.474 |
| white . 0 10774 1.753 0.465 |
| white . 1 2816 1.961 0.422 |
| black 0 . 3955 1.651 0.455 |
| black 1 . 1471 1.634 0.467 |
| black . 0 3779 1.551 0.432 |
| black . 1 1647 1.867 0.440 |
| other 0 . 155 1.893 0.553 |
| other 1 . 56 1.881 0.369 |
| other . 0 167 1.865 0.510 |
| other . 1 44 1.983 0.507 |
| . 0 0 11936 1.707 0.464 |
| . 0 1 3573 1.930 0.429 |
| . 1 0 2784 1.682 0.474 |
| . 1 1 934 1.914 0.444 |
|-------------------------------------------------|
| white 0 0 9071 1.751 0.462 |
| white 0 1 2328 1.961 0.423 |
| white 1 0 1703 1.766 0.479 |
| white 1 1 488 1.958 0.420 |
| black 0 0 2745 1.556 0.433 |
| black 0 1 1210 1.867 0.429 |
| black 1 0 1034 1.536 0.430 |
| black 1 1 437 1.866 0.469 |
| other 0 0 120 1.856 0.550 |
| other 0 1 35 2.020 0.553 |
| other 1 0 47 1.888 0.391 |
| other 1 1 9 1.842 0.235 |
+-------------------------------------------------+
So you get compactly all the three-way combinations, all the two-way, all the one-way and the overall summary. Moreover, this summary set is now the dataset in memory, so you can manipulate it further, export it, and so forth.