Stata: Changing Number Format

I am using estpost and esttab to export tabulation results in Stata.
sysuse auto, clear
estpost tabulate turn foreign
esttab ., cells("b(fmt(0))") unstack
---------------------------------------------------
                              (1)
                 Domestic      Foreign        Total
                        b            b            b
---------------------------------------------------
31                      1            0            1
32                      0            1            1
33                      1            1            2
34                      2            4            6
35                      2            4            6
36                      1            8            9
37                      2            2            4
38                      1            2            3
39                      1            0            1
40                      6            0            6
41                      4            0            4
42                      7            0            7
43                     12            0           12
44                      3            0            3
45                      3            0            3
46                      3            0            3
48                      2            0            2
51                      1            0            1
Total                  52           22           74
---------------------------------------------------
N                      74
---------------------------------------------------
Although I can change the format of the cells, I couldn't find a way to change the format of the number of observations (N) or of the column totals. I tried adding obs(fmt(%10.2fc)) as an esttab option, but it didn't work.

Related

How to add a row where there is a disruption in series of numbers in Stata

I'm attempting to format a table of 40 different age-race-sex strata to be used as input to R-INLA, and I noticed that it's important to include all strata (even those not present in a county). These should be zeros. However, at this point my table only contains records for strata that are not empty. I can identify places where strata are missing for each county by looking at my strata variable and finding the breaks in the series 1 through 40 (marked with a red x in the image below).
In these places (marked by the red x) I need to add the missing rows and fill in the corresponding county code, strata code, population=0, and the correct corresponding race, sex, age code for the strata.
If I can figure out a way to add an empty row in the spaces with the red Xs from the image, and correctly assign the strata code (and county code) to these empty/missing rows, I am able to populate the rest of the values with the code below:
replace race = 1 if strata == 4
replace sex = 1 if strata == 4
replace age = 4 if strata == 4
...etc
I'm wondering if there is a way to add the missing rows using an if statement that considers the fact that there are supposed to be forty strata for each county code. It would be ideal if this could populate the correct county code and strata code as well!
Dataex sample data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float OID str5 fips_statecounty double population byte(race sex age) float strata
1 "" 672 1 1 1 1
2 "" 1048 1 1 2 2
3 "" 883 1 1 3 3
4 "" 1129 1 1 4 4
5 "" 574 1 2 1 5
6 "" 986 1 2 2 6
7 "" 899 1 2 3 7
8 "" 1820 1 2 4 8
9 "" 96 2 1 1 9
10 "" 142 2 1 2 10
11 "" 81 2 1 3 11
12 "" 99 2 1 4 12
13 "" 71 2 2 1 13
14 "" 125 2 2 2 14
15 "" 103 2 2 3 15
16 "" 162 2 2 4 16
17 "" 31 3 1 1 17
18 "" 32 3 1 2 18
19 "" 18 3 1 3 19
20 "" 31 3 1 4 20
21 "" 22 3 2 1 21
22 "" 28 3 2 2 22
23 "" 28 3 2 3 23
24 "" 44 3 2 4 24
25 "" 20 4 1 1 25
26 "" 24 4 1 2 26
27 "" 21 4 1 3 27
28 "" 43 4 1 4 28
29 "" 19 4 2 1 29
30 "" 26 4 2 2 30
31 "" 24 4 2 3 31
32 "" 58 4 2 4 32
33 "" 6 5 1 1 33
34 "" 11 5 1 2 34
35 "" 13 5 1 3 35
36 "" 7 5 1 4 36
37 "" 7 5 2 1 37
38 "" 9 5 2 2 38
39 "" 10 5 2 3 39
40 "" 11 5 2 4 40
41 "01001" 239 1 1 1 1
42 "01001" 464 1 1 2 2
43 "01001" 314 1 1 3 3
44 "01001" 232 1 1 4 4
45 "01001" 284 1 2 1 5
46 "01001" 580 1 2 2 6
47 "01001" 392 1 2 3 7
48 "01001" 440 1 2 4 8
49 "01001" 41 2 1 1 9
50 "01001" 38 2 1 2 10
51 "01001" 23 2 1 3 11
52 "01001" 26 2 1 4 12
53 "01001" 34 2 2 1 13
54 "01001" 52 2 2 2 14
55 "01001" 40 2 2 3 15
56 "01001" 50 2 2 4 16
57 "01001" 4 3 1 1 17
58 "01001" 2 3 1 2 18
59 "01001" 3 3 1 3 19
60 "01001" 6 3 2 1 21
61 "01001" 4 3 2 2 22
62 "01001" 6 3 2 3 23
63 "01001" 4 3 2 4 24
64 "01001" 1 4 1 4 28
65 "01003" 1424 1 1 1 1
66 "01003" 2415 1 1 2 2
67 "01003" 1680 1 1 3 3
68 "01003" 1823 1 1 4 4
69 "01003" 1545 1 2 1 5
70 "01003" 2592 1 2 2 6
71 "01003" 1916 1 2 3 7
72 "01003" 2527 1 2 4 8
73 "01003" 68 2 1 1 9
74 "01003" 82 2 1 2 10
75 "01003" 52 2 1 3 11
76 "01003" 54 2 1 4 12
77 "01003" 72 2 2 1 13
78 "01003" 129 2 2 2 14
79 "01003" 81 2 2 3 15
80 "01003" 106 2 2 4 16
81 "01003" 10 3 1 1 17
82 "01003" 14 3 1 2 18
83 "01003" 8 3 1 3 19
84 "01003" 4 3 1 4 20
85 "01003" 8 3 2 1 21
86 "01003" 14 3 2 2 22
87 "01003" 17 3 2 3 23
88 "01003" 10 3 2 4 24
89 "01003" 4 4 1 1 25
90 "01003" 1 4 1 3 27
91 "01003" 2 4 1 4 28
92 "01003" 2 4 2 1 29
93 "01003" 3 4 2 2 30
94 "01003" 4 4 2 3 31
95 "01003" 10 4 2 4 32
96 "01003" 5 5 1 1 33
97 "01003" 4 5 1 2 34
98 "01003" 3 5 1 3 35
99 "01003" 5 5 1 4 36
100 "01003" 5 5 2 2 38
end
label values race race
label values sex sex
My answer to your previous question, "Nested for-loop: error variable already defined", detailed how to create a minimal dataset with all strata present. Therefore you should just merge that with your main dataset and replace missing values on the absent strata with whatever your other software expects, which seems to be zeros.
The most obvious complication at this point is that you need to factor in a county variable. I can't see any information on how many counties you have in your dataset, which may affect what is practical. You should be able to break the preparation down into two steps: first, prepare a minimal county dataset with identifiers only; then combine that with a complete strata dataset.
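The two preparation steps above can be sketched as follows. This is a minimal sketch, not the answerer's own code: it uses Stata's cross command to pair every county with strata 1 to 40, and it assumes the coding visible in the sample, strata = 8*(race-1) + 4*(sex-1) + age, to refill race, sex, and age on the added rows. The four input rows are copied from the dataex sample; with the full data you would skip the input block. Unlike fillin, cross also creates strata that appear in no county at all.

```stata
* Toy rows copied from the dataex sample above (skip with the full data)
clear
input str5 fips_statecounty double population byte(race sex age) float strata
"01001"  239 1 1 1  1
"01001"    1 4 1 4 28
"01003" 1424 1 1 1  1
"01003"    5 5 2 2 38
end

preserve
* Step 1: minimal county dataset, identifiers only
keep fips_statecounty
duplicates drop
tempfile counties
save `counties'

* Step 2: complete strata dataset (1 to 40), paired with every county
clear
set obs 40
generate strata = _n
cross using `counties'
tempfile full
save `full'
restore

* Step 3: merge onto the real data; _merge == 2 marks strata that were absent
merge 1:1 fips_statecounty strata using `full'
replace population = 0 if _merge == 2

* Step 4: recover race, sex, age on added rows
* (assumes strata = 8*(race-1) + 4*(sex-1) + age, as in the sample)
replace race = ceil(strata/8)                 if _merge == 2
replace sex  = mod(ceil(strata/4) - 1, 2) + 1 if _merge == 2
replace age  = mod(strata - 1, 4) + 1         if _merge == 2
sort fips_statecounty strata
```

After this, _merge == 2 flags the rows that were added, so you can inspect them before dropping _merge.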

Calculating Frequency of Defaults in a given Bucketed Score

I have a table with Scores and default indicator values.
I sorted the table on the basis of descending scores and then applied proc rank to populate the group column.
Below is a sample of the dataset after the proc rank step.
Obs Scores Def group
1 100 0 9
2 100 1 9
3 99 0 9
4 97 0 9
5 97 0 9
6 95 0 9
7 94 0 9
8 92 0 9
9 92 0 9
10 91 0 9
11 91 0 9
12 89 1 8
13 88 0 8
14 87 0 8
15 87 0 8
16 86 0 8
17 85 0 8
18 84 0 8
19 84 0 8
20 83 0 8
21 83 0 8
22 83 0 8
23 82 0 8
24 81 0 7
25 80 0 7
26 80 1 7
I want to count the population (i.e., the number of scores that lie within each group) and the number of defaults in each group.
I tried the below code:
proc rank data = sortedScore groups = 10 out = Score_sorted_10;
var Scores ;
ranks Scores_group;
run;
data NumCount;
set Score_sorted_10;
Retain Popnum 0;
Retain Badnum 0;
do i=0 to 9;
if Scores_group=i
then Popnum=sum(Popnum,1);
if Scores_group=i and Def=1
then Badnum=sum(Def,1);
end;
But this code gets stuck in an infinite loop.
Please help.
I think it is easier to do it using proc sql.
The following query will do the trick:
proc sql;
create table want as
select distinct
Group,
count(scores) as Nbr_Scores,
sum(def) as Nbr_Def
from have
group by group;
quit;

select minimum value by ID, over range of visits

I'm trying to create a variable holding the lowest value over a range of visits. In this case, I want the lowest value over the first 3 days of admission (admission day 1, 2, or 3), by visitID. Any suggestions?
visitID value day of admission
1 941 1
1 948 2
1 935 4
2 83 1
2 84 2
2 50 4
2 79 5
and I would want:
visitID value day minvalue
1 941 1 941
1 948 2 941
1 935 4 941
2 83 1 83
2 84 2 83
2 50 4 83
2 79 5 83
It would have been helpful if you had presented your data in an easily usable form. But here's an approach that should point you in a useful direction.
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte visitid int value byte day
1 941 1
1 948 2
1 935 4
2 83 1
2 84 2
2 50 4
2 79 5
end
bysort visitid (day) : egen minvalue = min(cond(day<=3,value,.))
Which results in
. list, sepby(visitid)
+----------------------------------+
| visitid value day minvalue |
|----------------------------------|
1. | 1 941 1 941 |
2. | 1 948 2 941 |
3. | 1 935 4 941 |
|----------------------------------|
4. | 2 83 1 83 |
5. | 2 84 2 83 |
6. | 2 50 4 83 |
7. | 2 79 5 83 |
+----------------------------------+
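The same result can be reached in two explicit steps, which may make the cond() trick easier to follow: first blank out values outside days 1 to 3, then take the group minimum, relying on egen's min() ignoring missing values. This is a sketch of an equivalent variant, not the answer's code; note that inrange(day, 1, 3) also excludes day 0 or negative days, which day<=3 would keep.

```stata
clear
input byte visitid int value byte day
1 941 1
1 948 2
1 935 4
2  83 1
2  84 2
2  50 4
2  79 5
end

* Step 1: keep the value only on admission days 1 to 3; other days become missing
generate value_first3 = value if inrange(day, 1, 3)
* Step 2: egen min() skips missing values, so this is the minimum over days 1 to 3
bysort visitid: egen minvalue = min(value_first3)
drop value_first3
list, sepby(visitid)
```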

Consecutive tagging in Stata

The task is to identify which consecutive week a product (in a specific store) has been on promotion.
clear
input ///
upc week store promo
1 1 86 1
1 2 86 1
1 3 86 1
1 4 86 1
3 1 86 0
3 2 86 1
4 1 86 0
4 2 86 1
4 3 86 1
end
The end result should look something like this:
upc week store promo promocount
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
end
I have 800K observations, and I run into a problem with the real data set. When I run bysort upc week store promo: gen prcount = _n if promo==1, my data set is sorted in a different way, which yields the wrong tagging:
upc week store promo
1 1 86 1
3 1 86 0
4 1 86 0
1 2 86 1
3 2 86 1
4 2 86 1
1 3 86 1
4 3 86 1
1 4 86 1
Anyway, I now realize my code is wrong. Any suggestions?
I think
. quietly input ///
> upc week store promo
. generate promocount = 0
. bysort store upc (week): replace promocount = 1+cond(_n==1,0,promocount[_n-1]) if promo>0
(7 real changes made)
. list, clean noobs
upc week store promo promoc~t
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
does do what you want.
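A self-contained version of this approach, with comments on why it works. The key is that replace processes observations in the current sort order, so promocount[_n-1] already holds the updated count for the previous week; a non-promo week keeps 0, so the next promo week restarts at 1. By contrast, bysort upc week store promo makes each upc-week-store combination its own group, so _n is 1 whenever those combinations are unique. The extra check week != week[_n-1] + 1 is an assumption added here, not part of the answer: it restarts the count when a week is missing from the data, which the original line would treat as consecutive.

```stata
clear
input upc week store promo
1 1 86 1
1 2 86 1
1 3 86 1
1 4 86 1
3 1 86 0
3 2 86 1
4 1 86 0
4 2 86 1
4 3 86 1
end

generate promocount = 0
* Within each store-product pair, process weeks in order.
* Restart at 1 on the first week of a panel or after a skipped week (added assumption).
bysort store upc (week): replace promocount = ///
    1 + cond(_n == 1 | week != week[_n-1] + 1, 0, promocount[_n-1]) if promo > 0
list, clean noobs
```

If your real data never skip weeks, the extra condition changes nothing and you can drop it.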

SAS and line pointers in a loop

data test;
infile datalines;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
do i=1 to 10;
if a(i) eq . then stop;
line=a(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
I want to read only the lines whose numbers are listed in the first row. The expected result:
0 2 12 45 92 3 60 24 6 2
21 40 3 21 3 19 3 2 4 2
29 57 32 9 2 29 2 0 23 1
0 84 62 75 3 52 65 1 5 2
47 24 87 2 52 36 1 17 3 1
83 34 28 1 43 3 24 2 6 2
The error I get after running my code:
ERROR: Old line 3387 wanted but SAS is at line 3391.
Use: INFILE N=X; , with a suitable value of x.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
3391 47 24 87 2 52 36 1 17 3 1
k1=0 k2=2 k3=12 k4=45 k5=92 k6=3 k7=60 k8=24 k9=6 k10=2 i=2 line=2 _ERROR_=1 _N_=1
What does "a suitable value of x" mean? What should I change in my code?
You are overwriting the values in your array with your second input statement. Here they are read into different variables so as not to be overwritten.
data test;
infile datalines n=100;
input h1 h2 h3 h4 h5 h6 h7 h8 h9 h10;
array h{*} h1-h10;
do i = 1 to 10;
line = h[i];
if line then do;
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
end;
keep k:;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
SAS is telling you that you need to amend your infile statement so that it can read a sufficient number of lines ahead. For your code as written, n=10 should be OK, as none of the variables you're using to get the line number have values greater than 10.
data test;
/*Add the n= option to the infile statement as suggested by log message*/
infile datalines n= 10;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
array b(*) b1-b10;
/*Make a copy of the first row
that won't get overwritten by subsequent input statements*/
do i=1 to 10;
b(i) = a(i);
end;
do i=1 to 10;
if b(i) eq . then stop;
line=b(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;