I am looking for SAS code to compute scores for a whole cohort based on a model fitted to a subgroup.
I can compute scores when the model is fitted on the whole population as my dataset, but I have no experience using the fitted values from a subgroup dataset to compute scores for the whole population.
I work in SAS.
Welcome to Stack Overflow! If I understand your question, this will do what you want.
I grabbed some data from SAS support:
Data Neuralgia;
input Treatment $ Sex $ Age Duration Pain $ @@;
datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;
run;
Then I subset the data to build a model using only the males:
data males;
set Neuralgia;
where sex = "M";
run;
Then I built a model and saved the model details to a dataset called theMaleModel in the work library:
proc logistic data=males outmodel=work.theMaleModel;
class Treatment;
model Pain = Treatment Age Duration ;
run;
Then I apply the male model to the full dataset and save the scored results into a dataset, in the work library, called scoreEverybody:
proc logistic inmodel=work.theMaleModel;
score data=Neuralgia out=scoreEverybody;
run;
You can see more examples like this if you look here. If that answers your question please click the check next to this answer.
data.Hotel_Address.head(10)
0 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
1 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
2 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
3 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
4 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
5 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
6 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
7 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
8 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
9 s Gravesandestraat 55 Oost 1092 AA Amsterdam ...
How can I extract the country name (everything after the last space to the end of the string) with a regex in pandas?
A regex is not necessary. Use split and select the last element of each resulting list with str[-1]:
data['new'] = data.Hotel_Address.str.split().str[-1]
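Since the question asked about a regex, here is a minimal sketch of the regex route next to the split approach. The sample addresses are made up for illustration (the real ones are truncated in the question's output), and both routes assume the country is a single word; a multi-word country name would need different logic.

```python
import pandas as pd

# Hypothetical sample data; the real addresses are truncated in the question.
data = pd.DataFrame({
    "Hotel_Address": [
        "1 Example Street 1000 AA Sampletown Netherlands",
        "22 Placeholder Road Testville France",
    ]
})

# Regex route: capture the last run of non-whitespace characters in the string.
data["country_regex"] = data["Hotel_Address"].str.extract(r"(\S+)\s*$", expand=False)

# Split route from the answer: take the last whitespace-separated token.
data["country_split"] = data["Hotel_Address"].str.split().str[-1]

print(data[["country_regex", "country_split"]])
```

Both columns should hold the same values; the split version is usually easier to read, while the regex version is easier to extend if the pattern needs to change.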
I have the data set below. Can someone help me import this data into HDFS using a Sqoop boundary query on the column (id), which contains duplicate keys?
mysql> select id,name,age from employee;
id name age
1 A 30
2 B 35
3 C 40
4 D 23
5 E 26
1 A 24
2 B 16
3 C 78
4 G 66
3 H 56
4 A 63
20 C 58
13 F 47
2 A 49
3 B 60
I often end up with the following situation. I have a dataframe with two IDs
A = pd.DataFrame([[1,'a', 'a1'], [2, None, 'a2'], [3,'c', 'a3'], [4,'None', 'a3'], [None, 'e', 'a3'], ['None', 'None', 'None']], columns = ['id1', 'id2', 'colA'])
id1 id2 colA
0 1 a a1
1 2 None a2
2 3 c a3
3 4 None a3
4 None e a3
5 None None None
and I have another dataframe with additional info I want to add to the first dataframe
B = pd.DataFrame([[1,'a', 'b1', 'c1'], [2, 'b', 'b2', 'c2'], [3,'c', 'b3', 'c3'], [4, 'd', 'b4', 'c4'], [5, 'e', 'b5', 'c5'], [6, 'e', 'b5', 'c5']], columns = ['id1', 'id2', 'colB', 'colC'])
Out[15]:
id1 id2 colB colC
0 1 a b1 c1
1 2 b b2 c2
2 3 c b3 c3
3 4 d b4 c4
4 5 e b5 c5
5 6 e b5 c5
I want to merge on id1, like this
A.merge(B, how='left', on='id1')
id1 id2_x colA id2_y colB colC
0 1 a a1 a b1 c1
1 2 None a2 b b2 c2
2 3 c a3 c b3 c3
3 4 None a3 d b4 c4
4 None e a3 NaN NaN NaN
5 None None None NaN NaN NaN
This is close to what I want. However, for the failed lookups (that is, when id1 is not available), I would like to merge on id2, so the result looks like
id1 id2_x colA id2_y colB colC
0 1 a a1 a b1 c1
1 2 None a2 b b2 c2
2 3 c a3 c b3 c3
3 4 None a3 d b4 c4
4 None e a3 NaN b5 c5
5 None None None NaN NaN NaN
What's the best way to achieve this? Note I don't really want 2 id2 columns in the result and id2 may have duplicates.
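IIUC, you can use fillna. Note that it fills the last row too, because fillna(B) aligns on the index rather than on id2.
df = A.merge(B, how='left', on='id1')
print(df)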
id1 id2_x colA id2_y colB colC
0 1 a a1 a b1 c1
1 2 None a2 b b2 c2
2 3 c a3 c b3 c3
3 4 None a3 d b4 c4
4 None e a3 NaN NaN NaN
5 None None None NaN NaN NaN
df = df.fillna(B)
print(df)
id1 id2_x colA id2_y colB colC
0 1 a a1 a b1 c1
1 2 None a2 b b2 c2
2 3 c a3 c b3 c3
3 4 None a3 d b4 c4
4 None e a3 NaN b5 c5
5 None None None NaN b5 c5
As EdChum mentioned in the comments, another solution is to use combine_first, but the output is different:
print(A.combine_first(B))
colA colB colC id1 id2
0 a1 b1 c1 1 a
1 a2 b2 c2 2 b
2 a3 b3 c3 3 c
3 a3 b4 c4 4 None
4 a3 b5 c5 5 e
5 None b5 c5 None None
Timing comparison:
In [142]: %timeit A.combine_first(B)
100 loops, best of 3: 3.44 ms per loop
In [143]: %timeit A.merge(B, how='left', on='id1').fillna(B)
100 loops, best of 3: 2.89 ms per loop
I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried coding this way but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
It sounds like you should use SGPANEL, not SGPLOT. SGPLOT can make grouped bar charts, but it cannot create histogram bins automatically unless you apply a format (you could do that if you want), and it doesn't support group with the histogram plot. SGPANEL, however, can handle that:
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;