Calculating Frequency of Defaults in a given Bucketed Score - sas

I have a table with Scores and default indicator values.
I sorted the table by descending score and then applied PROC RANK to populate the group column.
Below is a sample of the dataset after the proc rank step.
Obs Scores Def group
1 100 0 9
2 100 1 9
3 99 0 9
4 97 0 9
5 97 0 9
6 95 0 9
7 94 0 9
8 92 0 9
9 92 0 9
10 91 0 9
11 91 0 9
12 89 1 8
13 88 0 8
14 87 0 8
15 87 0 8
16 86 0 8
17 85 0 8
18 84 0 8
19 84 0 8
20 83 0 8
21 83 0 8
22 83 0 8
23 82 0 8
24 81 0 7
25 80 0 7
26 80 1 7
I want to count the population of each group (i.e. the number of scores that fall within it).
I also want to count the number of defaults in each group.
I tried the below code:
proc rank data = sortedScore groups = 10 out = Score_sorted_10;
var Scores ;
ranks Scores_group;
run;
data NumCount;
set Score_sorted_10;
Retain Popnum 0;
Retain Badnum 0;
do i=0 to 9;
if Scores_group=i
then Popnum=sum(Popnum,1);
if Scores_group=i and Def=1
then Badnum=sum(Def,1);
end;
But this code seems to get stuck in an infinite loop.
Please help.

I think it is easier to do this with PROC SQL.
The following query will do the trick:
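proc sql;
create table want as
select Group,
       count(scores) as Nbr_Scores, /* population: number of scores in the group */
       sum(def) as Nbr_Def          /* number of defaults (Def = 1) in the group */
from have                           /* have = the ranked data set from PROC RANK */
group by group;                     /* GROUP BY already gives one row per group */
quit;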
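For readers working in Python, the same per-group population and default counts can be sketched with pandas. This is a hedged illustration rather than the answer's method: it assumes the ranked data sit in a DataFrame named `have` with columns `Scores`, `Def` and `group`, and it uses only the sample rows shown in the question.

```python
import pandas as pd

# Sample rows from the question: (Scores, Def, group) after PROC RANK.
rows = [
    (100, 0, 9), (100, 1, 9), (99, 0, 9), (97, 0, 9), (97, 0, 9), (95, 0, 9),
    (94, 0, 9), (92, 0, 9), (92, 0, 9), (91, 0, 9), (91, 0, 9),
    (89, 1, 8), (88, 0, 8), (87, 0, 8), (87, 0, 8), (86, 0, 8), (85, 0, 8),
    (84, 0, 8), (84, 0, 8), (83, 0, 8), (83, 0, 8), (83, 0, 8), (82, 0, 8),
    (81, 0, 7), (80, 0, 7), (80, 1, 7),
]
have = pd.DataFrame(rows, columns=["Scores", "Def", "group"])

# Mirrors the PROC SQL query: one row per group, the count of scores
# (population) and the sum of the 0/1 default flag (number of defaults).
want = (
    have.groupby("group")
    .agg(Nbr_Scores=("Scores", "count"), Nbr_Def=("Def", "sum"))
    .reset_index()
)
print(want)
```

Because `Def` is a 0/1 flag, summing it counts the defaults, which is the same trick the SQL `sum(def)` relies on.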


Stata: Changing Number Format

I am using estpost and esttab to export tabulation results in Stata.
sysuse auto, clear
estpost tabulate turn foreign
esttab ., cells("b(fmt(0))") unstack
---------------------------------------------------
(1)
Domestic Foreign Total
b b b
---------------------------------------------------
31 1 0 1
32 0 1 1
33 1 1 2
34 2 4 6
35 2 4 6
36 1 8 9
37 2 2 4
38 1 2 3
39 1 0 1
40 6 0 6
41 4 0 4
42 7 0 7
43 12 0 12
44 3 0 3
45 3 0 3
46 3 0 3
48 2 0 2
51 1 0 1
Total 52 22 74
---------------------------------------------------
N 74
---------------------------------------------------
Although I can change the format of the cells, I couldn't find a way to change the format of the number of observations (N) or of the totals in each column. I tried adding obs(fmt(%10.2fc)) as an esttab option, but it didn't work.

Pandas groupby, pivot, or stack? Turn groups of a single column into multiple columns

My data looks like this:
2 PresentationID 12954
5 Attendees 65
6 Downloads 0
7 Questions 0
8 Likes 11
9 Tweets 0
10 Polls 0
73 PresentationID 12953
76 Attendees 64
77 Downloads 31
78 Questions 0
79 Likes 11
80 Tweets 0
81 Polls 0
143 PresentationID 12951
146 Attendees 64
147 Downloads 28
148 Questions 2
149 Likes 2
150 Tweets 0
151 Polls 0
And I need to get it into this format:
PresentationID Attendees Downloads Questions Likes Tweets Polls
0 12954 65 0 0 11 0 0
1 12953 64 31 6 0 4
2 12892 204 0 0 14 0 0
I have tried several combinations of groupby, pivot, and stack, to no avail. Any advice is greatly appreciated. Thanks.
You can use cumcount with pivot:
print (df)
A B C
0 2 PresentationID 12954
1 5 Attendees 65
2 6 Downloads 0
3 7 Questions 0
4 8 Likes 11
5 9 Tweets 0
6 10 Polls 0
7 73 PresentationID 12953
8 76 Attendees 64
9 77 Downloads 31
10 78 Questions 0
11 79 Likes 11
12 80 Tweets 0
13 81 Polls 0
14 143 PresentationID 12951
15 146 Attendees 64
16 147 Downloads 28
17 148 Questions 2
18 149 Likes 2
19 150 Tweets 0
20 151 Polls 0
df['G'] = df.groupby('B').cumcount()
df = df.pivot(index='G', columns='B', values='C')
print (df)
B Attendees Downloads Likes Polls PresentationID Questions Tweets
G
0 65 0 11 0 12954 0 0
1 64 31 11 0 12953 0 0
2 64 28 2 0 12951 2 0
# Alternative one-liner, starting again from the original df:
df = df.set_index([df.groupby('B').cumcount(), 'B'])['C'].unstack()
print (df)
B Attendees Downloads Likes Polls PresentationID Questions Tweets
0 65 0 11 0 12954 0 0
1 64 31 11 0 12953 0 0
2 64 28 2 0 12951 2 0
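A self-contained sketch of the cumcount-and-pivot approach, rebuilt from the three presentations shown in the question. It assumes every presentation has the same seven fields; the `order` list is an illustrative addition that puts the columns (which `pivot` sorts alphabetically) back into the order the question asks for.

```python
import pandas as pd

fields = ["PresentationID", "Attendees", "Downloads", "Questions",
          "Likes", "Tweets", "Polls"]

# Long data from the question: A = original row label, B = field, C = value.
df = pd.DataFrame({
    "A": [2, 5, 6, 7, 8, 9, 10, 73, 76, 77, 78, 79, 80, 81,
          143, 146, 147, 148, 149, 150, 151],
    "B": fields * 3,
    "C": [12954, 65, 0, 0, 11, 0, 0,
          12953, 64, 31, 0, 11, 0, 0,
          12951, 64, 28, 2, 2, 0, 0],
})

# cumcount numbers each repeat of a field name 0, 1, 2, ...
# so the n-th PresentationID, n-th Attendees, etc. all land in row n.
df["G"] = df.groupby("B").cumcount()
wide = df.pivot(index="G", columns="B", values="C")

# pivot sorts the columns alphabetically; restore the original field order.
order = fields
wide = wide[order]
print(wide)
```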

SAS macro variables in PROC MIXED

This is my first foray into using SAS macros, and I'm following this page from the amazing UCLA Stats Consulting Group. I'm interested in using macro variables in PROC MIXED to avoid copying and pasting blocks of code (my actual data set has ~400 variables).
My example modifies the UCLA example to have students in many schools.
data hsb3;
input id school female race ses prog
read write math science socst;
datalines;
1 1 0 4 1 1 57 52 41 47 57
2 1 1 4 2 3 68 59 53 63 61
3 1 0 2 3 1 44 33 54 58 31
4 1 0 4 3 3 63 44 47 53 56
5 1 0 4 2 2 47 51 43 50 61
6 1 1 4 2 2 44 52 51 50 61
7 1 0 3 2 1 50 59 60 56 52
8 1 0 1 2 2 34 46 52 53 57
9 1 0 4 2 2 63 57 51 63 61
19 2 0 3 1 2 57 63 41 63 61
20 2 1 4 2 2 60 57 51 58 31
21 2 0 4 3 2 57 55 51 53 56
22 2 0 4 3 2 73 46 71 50 61
23 2 0 4 2 1 54 65 57 50 61
24 2 1 4 2 2 45 60 50 56 52
25 2 0 3 2 1 42 63 43 53 57
26 2 0 1 1 2 34 57 51 63 61
27 2 0 4 2 2 63 49 60 55 31
10 3 1 3 2 2 57 55 51 55 31
11 3 1 4 3 3 60 46 71 31 56
12 3 1 4 2 2 57 66 57 55 61
13 3 0 3 3 2 50 60 50 31 61
14 3 0 4 3 2 57 57 57 55 46
15 3 0 3 3 3 68 55 50 31 56
16 3 0 4 1 2 34 46 43 50 56
17 3 0 4 3 2 34 65 51 50 56
18 3 0 4 1 2 63 60 60 47 57
28 4 1 3 2 2 57 52 52 53 61
29 4 1 4 2 3 60 57 51 63 61
30 4 1 1 2 2 57 65 51 55 46
31 4 0 4 3 2 73 60 71 31 56
32 4 0 4 3 2 54 63 57 55 46
33 4 0 3 1 2 45 57 50 31 56
34 4 0 1 1 1 42 49 43 50 56
35 4 0 4 3 2 47 52 51 50 56
36 4 0 4 2 1 57 57 60 56 52
;
run;
The UCLA example shows how to use a macro variable with PROC REG to fit a linear regression model that predicts reading score from a list of other variables:
%let indvars = write math female socst;
proc reg data = hsb3;
model read = &indvars;
run;
quit;
To do this while taking school into account, we can use PROC MIXED instead:
proc mixed data = hsb3;
class school;
model read = &indvars;
random school;
run;
quit;
But what I really want to do is to see if any of the scores differ by gender (still taking school into account).
%let scores = read write math science socst;
proc mixed data = hsb3;
class school;
model &scores = female;
random school;
run;
quit;
Now I get the error:
NOTE: The SAS System stopped processing this step because of errors.
167 class school;
168 model &indvars = female;
-
22
200
NOTE: Line generated by the macro variable "INDVARS".
1 write math female socst
----
73
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, *, -, /, :, #,
_CHARACTER_, _CHAR_, _NUMERIC_, |.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 73-322: Expecting an =.
Somehow the macro variable is not working. Is there a problem with using macro variables as the response variable in PROC MIXED? They work as a response variable in PROC REG:
proc reg data = hsb3;
model &scores = female;
run;
quit;
Your problem doesn't have anything to do with macro variables or macro code. Instead, you are not writing a valid MODEL statement for PROC MIXED.
The MODEL statement names a single dependent variable ...
Try transforming the data instead: reshape it to a long format with one row per student and score, then run PROC MIXED once per score with a BY statement.
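%let scores = read write math science socst;
/* Reshape to long format: one row per student per score variable */
data want;
  set hsb3;
  array scores &scores;
  do i = 1 to dim(scores);
    score = scores(i);         /* value of the i-th score variable */
    name  = vname(scores(i));  /* name of the i-th score variable */
    output;
  end;
  drop i;
run;
proc sort data=want; by name; run;
/* Fit one mixed model per score variable */
proc mixed data=want;
  by name;
  class school;
  model score = female;
  random school;
run;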
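The reshape in that DATA step (wide to long, one row per student per score) can be sketched in pandas with `melt`, for readers coming from Python. This is an illustration only: it uses four students copied from `hsb3`, omits race, ses and prog, and does not fit the mixed model itself; each `name` group is what one BY group of PROC MIXED would receive.

```python
import pandas as pd

# Four students copied from the hsb3 datalines (race, ses, prog omitted).
hsb3 = pd.DataFrame({
    "id": [1, 2, 19, 10],
    "school": [1, 1, 2, 3],
    "female": [0, 1, 0, 1],
    "read": [57, 68, 57, 57],
    "write": [52, 59, 63, 55],
    "math": [41, 53, 41, 51],
    "science": [47, 63, 63, 55],
    "socst": [57, 61, 61, 31],
})
scores = ["read", "write", "math", "science", "socst"]

# Wide to long: one row per (student, score variable),
# like the array / OUTPUT loop in the DATA step.
long_df = hsb3.melt(id_vars=["id", "school", "female"], value_vars=scores,
                    var_name="name", value_name="score")

# Sort by name, as PROC SORT does before the BY statement.
long_df = long_df.sort_values(["name", "id"]).reset_index(drop=True)

# Each group is the data one BY group of PROC MIXED would model.
for name, sub in long_df.groupby("name"):
    print(name, len(sub), "rows")
```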

How to loop rows and columns in pandas while replacing values with a constant increment

I am trying to replace values in a DataFrame with 0. In the first column I need to replace the first 3 values, in the next column the first 6 values, and so on, increasing by 3 each time.
a=np.array([133,124,156,189,132,176,189,192,100,120,130,140,150,50,70,133,124,156,189,132])
b = pd.DataFrame(a.reshape(10,2), columns= ['s','t'])
for columns in b:
    yy = 3
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
The outcome is the following:
s t
0 0 0
1 0 0
2 0 0
3 189 189
4 132 132
5 176 176
6 189 189
7 192 192
8 100 100
9 120 120
I am clearly missing something simple: how do I make the loop replace 6 values instead of only 3 in column t? Any ideas?
I would do it this way:
i = 1
for c in b.columns:
    # .loc slicing is label-based and includes the end label (3*i - 1)
    b.loc[0 : 3*i-1, c] = 0
    i += 1
Demo:
In [86]: b = pd.DataFrame(np.random.randint(0, 100, size=(20, 4)), columns=list('abcd'))
In [87]: %paste
i = 1
for c in b.columns:
    b.loc[0 : 3*i-1, c] = 0
    i += 1
## -- End pasted text --
In [88]: b
Out[88]:
a b c d
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 10 0 0 0
4 8 0 0 0
5 49 0 0 0
6 55 48 0 0
7 99 43 0 0
8 63 29 0 0
9 61 65 74 0
10 15 29 41 0
11 79 88 3 0
12 91 74 11 4
13 56 71 6 79
14 15 65 46 81
15 81 42 60 24
16 71 57 95 18
17 53 4 80 15
18 42 55 84 11
19 26 80 67 59
You need to initialize yy = 3 before the loop:
yy = 3
for columns in b:
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
Python 3 solution:
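yy = 3
for columns in b:
    for i in range(yy):
        # .loc writes to b directly; chained b[columns][i] = 0
        # no longer updates b under pandas Copy-on-Write
        b.loc[i, columns] = 0
    yy += 3
print (b)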
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
Another solution:
yy = 3
for i, col in enumerate(b.columns):
    b.loc[:i*yy+yy-1, col] = 0
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
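A loop-free alternative, offered as a sketch rather than as part of any answer above: build a boolean mask with NumPy broadcasting, where column j (counting from 0) is True for the first 3*(j+1) rows, and pass it to `DataFrame.mask`. It uses the question's array `a` and the same step of 3.

```python
import numpy as np
import pandas as pd

a = np.array([133, 124, 156, 189, 132, 176, 189, 192, 100, 120,
              130, 140, 150, 50, 70, 133, 124, 156, 189, 132])
b = pd.DataFrame(a.reshape(10, 2), columns=["s", "t"])

step = 3                                        # extra rows zeroed per column
rows = np.arange(len(b))[:, None]               # row positions, shape (n_rows, 1)
limits = step * (np.arange(b.shape[1]) + 1)     # [3, 6, ...], one per column
mask = rows < limits                            # broadcasts to (n_rows, n_cols)

# Replace every True position with 0, keep the rest.
b = b.mask(mask, 0)
print(b)
```

The mask version touches each cell once and avoids per-cell assignment, which matters more as the frame grows.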

Use ODS Graphics to produce grouped histogram

I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to show the two genders as separate groups rather than stacking them. I have tried the following code, to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
It sounds like you should use SGPANEL, not SGPLOT. SGPLOT can make grouped bar charts, but it can't automatically bin a continuous variable into histogram bins without a format (you could use one if you want), and it doesn't support GROUP= on the HISTOGRAM statement. SGPANEL, however, can handle this by drawing one histogram panel per gender:
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;
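To see what each SGPANEL panel summarizes, the per-gender bin counts can be sketched in Python with `numpy.histogram`, using one shared set of bin edges for both genders. The bin edges (width 20, from 20 to 160) are an assumption chosen for illustration; SAS picks its own bins, so the counts in the SAS panels may differ.

```python
import numpy as np

# gender and wbc for the 25 patients in data set a1q1, in pid order.
gender = np.array(list("FFFFFMMFFMFMFFFMMMMMMMFFF"))
wbc = np.array([82, 52, 122, 42, 112, 61, 81, 72, 72, 122, 72, 32, 111,
                141, 112, 92, 61, 62, 51, 62, 102, 141, 42, 52, 52])

# Shared edges so the two panels are directly comparable:
# [20, 40), [40, 60), ..., [140, 160]  (numpy closes the last bin).
edges = np.arange(20, 161, 20)

counts = {}
for g in ("F", "M"):
    counts[g], _ = np.histogram(wbc[gender == g], bins=edges)
    print(g, counts[g].tolist())
```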