Stata: distinct values, count and histogram

I am new to Stata and still learning.
I have a variable that looks like this:
+-------+
| Phase |
+-------+
| I |
+-------+
| I |
+-------+
| II |
+-------+
| III |
+-------+
| II |
+-------+
My goal is to draw a histogram with the possible values (I, II, III) on the x-axis and the count of each (2, 2, 1) on the y-axis.
I thought I could write a loop and store the count of each possible value in an array, but arrays do not seem to be implemented in Stata.
Is there a built-in function that already does what I want, or do I have to write one that finds the distinct values, counts them, and then draws the histogram?
Thank you.
Edit:
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
I found a way of counting distinct values.

I found the solution:
tab processedphase, matcell(x)
which stores the frequencies in the matrix x and displays:
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
Then, to list the matrix and convert it to variables:
matrix list x
svmat x
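
For readers who want to check the counting logic outside Stata, the same tabulation can be sketched in Python. This is only an illustration on the five toy rows from the question; the names `phases` and `count_values` are hypothetical and are not part of any Stata workflow.

```python
from collections import Counter

# Toy data copied from the question's Phase column.
phases = ["I", "I", "II", "III", "II"]


def count_values(values):
    """Return (value, frequency) pairs sorted by value, like a one-way tabulate."""
    counts = Counter(values)
    return sorted(counts.items())


table = count_values(phases)

# Crude text bar chart: one row per distinct value, one '#' per observation.
for value, freq in table:
    print(f"{value:<4} {'#' * freq} ({freq})")
```

Each row of `table` corresponds to one bar of the desired histogram: the distinct value gives the x-axis label and the frequency gives the bar height.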

How do I select the interaction coefficients to keep in Stata?

This is similar to the question posed here, but I think I am not applying it correctly.
I used help fvvarlist to guide me on interactions.
I am using a triple interaction of three binary variables.
As a toy model, assume:
x = gender (1 = male, 0 = female)
y = health (1 = good, 0 = poor)
z = employment (1 = employed, 0 = not employed)
using the following regression:
reg x##y##z if state == "NY" & year >1985
I am interested in the results for 1.x#1.y#1.z, but this coefficient is omitted.
1.x#1.y#1.z omitted because of collinearity
Is there a way I can keep this interaction?
It would be best to verify that you actually have this combination in your data with egen, group.
You should also use i. prefixes to keep Stata from treating your variables as continuous. This has the added benefit of a more informative note, "interaction identifies no observations in the sample", rather than a mysterious collinearity one.
Here is a reproducible example:
. sysuse auto, clear
(1978 automobile data)
. sum mpg weight
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
. gen efficient = mpg > 21
. lab define efficient 0 "Inefficient" 1 "Efficient"
. lab val efficient efficient
. gen heavy = weight > 3e3
. lab define heavy 0 "Light" 1 "Heavy"
. lab val heavy heavy
. egen group = group(foreign efficient heavy), label(group)
. tab group, sort
group(foreign efficient |
heavy) | Freq. Percent Cum.
---------------------------+-----------------------------------
Domestic Inefficient Heavy | 34 45.95 45.95
Foreign Efficient Light | 15 20.27 66.22
Domestic Efficient Light | 13 17.57 83.78
Foreign Inefficient Light | 5 6.76 90.54
Domestic Efficient Heavy | 3 4.05 94.59
Domestic Inefficient Light | 2 2.70 97.30
Foreign Inefficient Heavy | 2 2.70 100.00
---------------------------+-----------------------------------
Total | 74 100.00
. reg price c.foreign##c.efficient##c.heavy, robust
note: c.foreign#c.efficient#c.heavy omitted because of collinearity.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
-----------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
------------------------------+----------------------------------------------------------------
foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
c.foreign#c.efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
c.foreign#c.heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
c.efficient#c.heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
c.foreign#c.efficient#c.heavy | 0 (omitted)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
-----------------------------------------------------------------------------------------------
. reg price i.foreign##i.efficient##i.heavy, robust
note: 1.foreign#1.efficient#1.heavy identifies no observations in the sample.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
-------------------------+----------------------------------------------------------------
foreign |
Foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
|
efficient |
Efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
foreign#efficient |
Foreign#Efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy |
Heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
foreign#heavy |
Foreign#Heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
efficient#heavy |
Efficient#Heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
foreign#efficient#heavy |
Foreign#Efficient#Heavy | 0 (empty)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
------------------------------------------------------------------------------------------
There are no foreign, efficient, and heavy cars in the data, and when you let Stata know that you have categorical variables on the RHS, you get an understandable error message about why the triple interaction is missing.
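
The empty-cell check can also be sketched outside Stata. The Python snippet below is a minimal illustration, assuming the cell counts are typed in by hand from the tab group output above; the names `counts`, `levels`, and `empty_cells` are hypothetical.

```python
from itertools import product

# Cell counts copied from the `tab group, sort` output above.
counts = {
    ("Domestic", "Inefficient", "Heavy"): 34,
    ("Foreign", "Efficient", "Light"): 15,
    ("Domestic", "Efficient", "Light"): 13,
    ("Foreign", "Inefficient", "Light"): 5,
    ("Domestic", "Efficient", "Heavy"): 3,
    ("Domestic", "Inefficient", "Light"): 2,
    ("Foreign", "Inefficient", "Heavy"): 2,
}

# The two levels of each binary variable, in the order foreign, efficient, heavy.
levels = [("Domestic", "Foreign"), ("Inefficient", "Efficient"), ("Light", "Heavy")]

# Every combination of the three variables that never occurs in the data.
empty_cells = [cell for cell in product(*levels) if counts.get(cell, 0) == 0]
print(empty_cells)
```

Only seven of the eight possible combinations appear in the table, so the one combination left in `empty_cells` is the interaction that cannot be estimated.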

Chart Behavior in Oracle APEX

I have time-series data in my table. Sample data is given below:
+------+------------+-----------+-----------+-------------+-------------+
| CODE | YEAR_MONTH | CALC_LVL1 | CALC_LVL2 | MSRMT_PCT_1 | MSRMT_PCT_2 |
+------+------------+-----------+-----------+-------------+-------------+
| A1 | 201912 | 87 | 564 | 0.14 | 0.1 |
| A1 | 201911 | 34 | 455 | 0.15 | 0.08 |
| A1 | 201910 | 20 | 295 | 0.1 | 0.14 |
| A1 | 201909 | 39 | 219 | 0.08 | 0.14 |
| A1 | 201908 | 98 | 438 | 0.14 | 0.11 |
| A1 | 201907 | 7 | 219 | 0.08 | 0.14 |
| A1 | 201812 | 63 | 564 | 0.14 | 0.17 |
| A1 | 201808 | 12 | 455 | 0.15 | 0.13 |
| A1 | 201805 | 48 | 409 | 0.13 | 0.13 |
| A1 | 201802 | 88 | 289 | 0.11 | 0.08 |
| A1 | 201801 | 9 | 492 | 0.14 | 0.13 |
+------+------------+-----------+-----------+-------------+-------------+
Is there any way for the default chart to show the year values and, when the user clicks on a year label, to show the monthly data?
I am assuming you are looking for a Time Axis chart, where the chart label is mapped to a DATE or TIMESTAMP column. In the chart attributes, set the Time Axis Type to Enabled. Labels will then be rendered as readable dates. You can then build another chart or report that is drilled down to from this chart. To do this, navigate to the chart, select the series, and then in the property editor navigate to Column Mapping. Select the column names for LABEL and VALUE. For Link > Type, select Redirect to Page in this application. Click Target, select the page, and set the page item and value.
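
The data shaping behind such a drill-down can be sketched independently of APEX. The Python snippet below is a rough illustration only: it assumes the yearly chart shows the sum of CALC_LVL1, which the question does not specify, and the function names `yearly_totals` and `monthly_detail` are hypothetical.

```python
from collections import defaultdict

# (YEAR_MONTH, CALC_LVL1) pairs copied from the sample table.
rows = [
    (201912, 87), (201911, 34), (201910, 20), (201909, 39),
    (201908, 98), (201907, 7), (201812, 63), (201808, 12),
    (201805, 48), (201802, 88), (201801, 9),
]


def yearly_totals(rows):
    """Top-level series: one value per year (assumed aggregation: sum)."""
    totals = defaultdict(int)
    for year_month, value in rows:
        totals[year_month // 100] += value
    return dict(totals)


def monthly_detail(rows, year):
    """Drill-down series: (month, value) pairs for the clicked year."""
    return sorted((ym % 100, value) for ym, value in rows if ym // 100 == year)


print(yearly_totals(rows))
print(monthly_detail(rows, 2018))
```

The first function corresponds to the query behind the default chart, and the second to the query on the target page, filtered by the year passed through the link.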

Plot graph from within Mata

Consider the following toy matrix in Mata:
mata: A
1 2
+-----------------+
1 | 6555 140 |
2 | 7205 135 |
3 | 6255 140 |
4 | 7272 138 |
5 | 10283 133 |
6 | 8244 136 |
7 | 6909 144 |
8 | 7645 138 |
9 | 12828 134 |
10 | 6538 137 |
+-----------------+
If I want to draw a scatter plot using this matrix, I first need to transfer it to Stata and then convert it to variables with the svmat command:
mata: st_matrix("A", A)
svmat A
list, separator(0)
+-------------+
| A1 A2 |
|-------------|
1. | 6555 140 |
2. | 7205 135 |
3. | 6255 140 |
4. | 7272 138 |
5. | 10283 133 |
6. | 8244 136 |
7. | 6909 144 |
8. | 7645 138 |
9. | 12828 134 |
10. | 6538 137 |
+-------------+
twoway scatter A1 A2
Is there a way to draw the graph directly, without leaving Mata?
One can plot a Mata matrix without first converting it to Stata variables as follows:
twoway scatter matamatrix(A)
See help twoway_mata for more details.
Edit by @PearlySpencer:
This can be run directly from within mata using the stata() function:
mata: stata("twoway scatter matamatrix(A)")
An alternative approach is to use the community-contributed mata function mm_plot():
mata: mm_plot(A, "scatter")
This is part of the moremata collection of functions and must thus be downloaded first:
ssc install moremata

Chi-Sq test result difference when done Manually and by SAS

I am trying to perform a chi-square test on my data using SAS University Edition.
Here is the structure of my data:
+----------+------------+------------------+-------------------+
| study_id | Control_id | study_mortality | control_mortality |
+----------+------------+------------------+-------------------+
| 1 | 50 | Alive | Alive |
| 1 | 52 | Alive | Alive |
| 2 | 65 | Dead | Dead |
| 2 | 70 | Dead | Alive |
+----------+------------+------------------+-------------------+
I am getting different results when I run the test in SAS versus when I do it manually with an online calculator. I used the values from PROC FREQ to calculate the chi-square statistic in the online calculator. Here are the frequency tables and the chi-square test output. Can someone point out where the issue is?
proc freq data = mydata;
tables study_mortality control_mortality;
where type=1;
run;
+-----------------+-----------+
| study_mortality | Frequency |
+-----------------+-----------+
| Alive | 7614 |
| Dead | 324 |
+-----------------+-----------+
+-------------------+-----------+
| control_mortality | Frequency |
+-------------------+-----------+
| Alive | 6922 |
| Dead | 159 |
+-------------------+-----------+
proc freq data = mydata;
tables study_mortality*control_mortality/ CHISQ;
where type=1;
run;
+-----------------+-------------------+---------+-------+
| | Control_mortality | | |
+-----------------+-------------------+---------+-------+
| Study_mortality | Alive | Dead | Total |
| Alive | 5515 | 134 | 5649 |
| Dead | 249 | 5 | 254 |
| Total | 5764 | 139 | 5903 |
+-----------------+-------------------+---------+-------+
Statistic DF Value Prob
Chi-Square 1 0.1722 0.6782
Likelihood Ratio Chi-Square 1 0.1818 0.6699
Continuity Adj. Chi-Square 1 0.0414 0.8388
Mantel-Haenszel Chi-Square 1 0.1722 0.6782
Phi Coefficient -0.0054
Contingency Coefficient 0.0054
Cramer's V -0.0054
You have missing data. Look at the Ns in those tables.
Study mortality has 7,938 records (7614 + 324) and control mortality has 7,081 (6922 + 159), but when you cross them you only have 5,903 records. This means that certain records are excluded. There should be a line somewhere in the output reporting the number missing; I am not sure whether SAS did not print it or you only pasted selected output. When I enter the cross-tabulated counts into an online calculator, the p-value matches your output exactly.
data have;
infile cards;
input Study Control N;
cards;
1 1 5515
1 0 134
0 1 249
0 0 5
;
run;
proc freq data=have;
table study*control / chisq;
weight N;
run;
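
As a cross-check that does not depend on SAS or an online calculator, the Pearson chi-square for a 2x2 table can be computed by hand. The Python sketch below uses the standard shortcut formula for a 2x2 table and the four cell counts from the cross-tabulation; the variable names are hypothetical.

```python
import math

# Cell counts from the cross-tabulation: rows = study, columns = control.
a, b = 5515, 134  # study Alive: control Alive, control Dead
c, d = 249, 5     # study Dead:  control Alive, control Dead

n = a + b + c + d

# Pearson chi-square for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 4), round(p_value, 4))
```

Run on the 5,903 complete pairs, this should agree with the Chi-Square row of the SAS output (0.1722 with probability 0.6782). Feeding the calculator the marginal frequencies instead would describe a different set of records.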

DAX measure with month variable based on date field

I am having a hard time getting the following measure to work. I am trying to change the target based on a date filter. My filter is on the Workday column, which is a standard date column. sMonth is a month column formatted as a whole number. I want to keep the slicer granular so that it works by day; adding custom columns with month and year and basing the measure on those would help. This is what I have tried and could not get to work:
Cars Inspected =
VAR
selectedMonth = MONTH(SELECTEDVALUE('All Cars Inspected'[Workday]))
RETURN CALCULATE(SUM(Targets[Target]),
FILTER(Targets,Targets[Location]="Texas"),
FILTER(Targets,Targets[Description]="CarsInspected"),
FILTER(Targets,Targets[sMonth]=selectedMonth))
I would appreciate it if someone could suggest a different way of achieving the same result.
Later edit:
This is a mock-up of what I am trying to achieve:
The total cars get filtered by the Workday. I would like to make the Targets/Ranges dynamic. When the slider gets adjusted everything else is adjusted.
My tables look like this:
+-----------+--------------------+----------+
| Workday | TotalCarsInspected | Location |
+-----------+--------------------+----------+
| 4/4/2017 | 1 | Texas |
| 4/11/2017 | 149 | Texas |
| 4/12/2017 | 129 | Texas |
| 4/13/2017 | 201 | Texas |
| 4/14/2017 | 4 | Texas |
| 4/15/2017 | 6 | Texas |
+-----------+--------------------+----------+
+----------+--------+----------+---------------+--------+-----+--------+
| TargetID | sMonth | Location | Description | Target | Red | Yellow |
+----------+--------+----------+---------------+--------+-----+--------+
| 495 | 1 | Texas | CarsInspected | 3636 | 0.5 | 0.75 |
| 496 | 2 | Texas | CarsInspected | 4148 | 0.5 | 0.75 |
| 497 | 3 | Texas | CarsInspected | 4861 | 0.5 | 0.75 |
| 498 | 4 | Texas | CarsInspected | 4938 | 0.5 | 0.75 |
| 499 | 5 | Texas | CarsInspected | 5094 | 0.5 | 0.75 |
| 500 | 6 | Texas | CarsInspected | 5044 | 0.5 | 0.75 |
| 501 | 7 | Texas | CarsInspected | 5043 | 0.5 | 0.75 |
| 502 | 8 | Texas | CarsInspected | 4229 | 0.5 | 0.75 |
| 503 | 9 | Texas | CarsInspected | 4311 | 0.5 | 0.75 |
| 504 | 10 | Texas | CarsInspected | 4152 | 0.5 | 0.75 |
| 505 | 11 | Texas | CarsInspected | 3592 | 0.5 | 0.75 |
| 506 | 12 | Texas | CarsInspected | 3748 | 0.5 | 0.75 |
+----------+--------+----------+---------------+--------+-----+--------+
Let the Value for your gauge be the sum of TotalCarsInspected and set the Maximum value to the following measure:
Cars Inspected =
VAR selectedMonth = MONTH(MAX('All Cars Inspected'[Workday]))
RETURN LOOKUPVALUE(Targets[Target],
Targets[Location], "Texas",
Targets[Description], "CarsInspected",
Targets[sMonth], selectedMonth)
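
The logic of this measure (take the latest selected Workday, extract its month, look up the matching target) can be sketched in Python to make the steps explicit. This is only an illustration on the sample rows above, not DAX; the names `targets`, `selected_workdays`, and `target_for_selection` are hypothetical, and the dictionary covers only the Texas / CarsInspected rows.

```python
from datetime import date

# sMonth -> Target, copied from the Targets table (Texas, CarsInspected).
targets = {
    1: 3636, 2: 4148, 3: 4861, 4: 4938, 5: 5094, 6: 5044,
    7: 5043, 8: 4229, 9: 4311, 10: 4152, 11: 3592, 12: 3748,
}

# Workday values left visible by the slicer (sample rows from the question).
selected_workdays = [
    date(2017, 4, 4), date(2017, 4, 11), date(2017, 4, 12),
    date(2017, 4, 13), date(2017, 4, 14), date(2017, 4, 15),
]


def target_for_selection(workdays, targets):
    """Mirror MONTH(MAX(Workday)) followed by a lookup on sMonth."""
    selected_month = max(workdays).month
    return targets[selected_month]


print(target_for_selection(selected_workdays, targets))
```

Taking the maximum date gives a single month even when the slicer spans many days, which is why the lookup still has one key to match when more than one Workday is selected.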