I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried coding this way but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
It sounds like you should use SGPANEL rather than SGPLOT. SGPLOT can make grouped bar charts, but it will not create histogram bins automatically unless you apply a format (you could do that if you want), and it doesn't support GROUP= with the histogram plot. SGPANEL, however, can handle that:
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;
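If you would rather stay with SGPLOT, the format route mentioned above can be sketched roughly as follows. This is only a sketch: the format name wbcbin and its bin boundaries are made up for illustration and should be adjusted to your data. GROUPDISPLAY=CLUSTER is what places the gender bars side by side instead of stacking them.

```sas
/* Hypothetical bins for wbc; choose boundaries that suit your data */
proc format;
  value wbcbin
    low  -<  50 = '< 50'
    50   -< 100 = '50 - 99'
    100  - high = '100+';
run;

proc sgplot data=a1q1;
  format wbc wbcbin.;   /* VBAR builds its categories from the formatted values */
  vbar wbc / group=gender groupdisplay=cluster stat=freq;
run;
```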
I am looking for SAS code to compute scores for a whole cohort based on scores calculated for a subgroup.
I can create scores for the whole population when it is my entire dataset, but I have no experience using the fitted values from a subgroup dataset to compute scores for the whole population.
I work in SAS.
Welcome to stackoverflow! If I understand your question, this will do what you want.
I grabbed some data from sas support:
Data Neuralgia;
input Treatment $ Sex $ Age Duration Pain $ @@;
datalines;
P F 68 1 No B M 74 16 No P F 67 30 No
P M 66 26 Yes B F 67 28 No B F 77 16 No
A F 71 12 No B F 72 50 No B F 76 9 Yes
A M 71 17 Yes A F 63 27 No A F 69 18 Yes
B F 66 12 No A M 62 42 No P F 64 1 Yes
A F 64 17 No P M 74 4 No A F 72 25 No
P M 70 1 Yes B M 66 19 No B M 59 29 No
A F 64 30 No A M 70 28 No A M 69 1 No
B F 78 1 No P M 83 1 Yes B F 69 42 No
B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes
A M 70 12 No A F 69 12 No B F 65 14 No
B M 70 1 No B M 67 23 No A M 76 25 Yes
P M 78 12 Yes B M 77 1 Yes B F 69 24 No
P M 66 4 Yes P F 65 29 No P M 60 26 Yes
A M 78 15 Yes B M 75 21 Yes A F 67 11 No
P F 72 27 No P F 70 13 Yes A M 75 6 Yes
B F 65 7 No P F 68 27 Yes P M 68 11 Yes
P M 67 17 Yes B M 70 22 No A M 65 15 No
P F 67 1 Yes A M 67 10 No P F 72 11 Yes
A F 74 1 No B M 80 21 Yes A F 69 3 No
;
run;
Then subsetted down to build a model using only the males:
data males;
set Neuralgia;
where sex = "M";
run;
Then I built a model and saved the model details, into the work library, in a file called theMaleModel.
proc logistic data=males outmodel=work.theMaleModel;
class Treatment;
model Pain = Treatment Age Duration ;
run;
Then I apply the male model to the full dataset and save the scored results into a dataset, in the work library, called scoreEverybody:
proc logistic inmodel=work.theMaleModel;
score data=Neuralgia out=scoreEverybody;
run;
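If you don't need to keep the fitted model around, a more compact variant is to put the SCORE statement in the same PROC LOGISTIC step that fits the model. This is a minimal sketch under that assumption; the output dataset name scoreEverybody2 is made up here so it does not overwrite the one above.

```sas
/* Fit on the males only, then score the full dataset in the same step */
proc logistic data=males;
  class Treatment;
  model Pain = Treatment Age Duration;
  score data=Neuralgia out=scoreEverybody2;   /* hypothetical output name */
run;

/* Inspect the first few scored rows */
proc print data=scoreEverybody2(obs=5);
run;
```

The two-step OUTMODEL=/INMODEL= version is the better choice when the model has to be reused later or in another session.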
You can see more examples like this if you look here. If that answers your question please click the check next to this answer.
I am practicing SAS programming with two SAS datasets (sample and master). The dummy data below were created to illustrate my problem. For each id in the sample dataset, I would like to extract the next 12 months of information from the master dataset (test), based on yearmonth (the desired output is shown in the third listing below).
Below is the code that extracts the previous 12 months of data, but I cannot work out how to extract the next 12 months of records in the same way. Can anyone help me solve this efficiently in SAS?
proc sort data=test;
by id yearmonth;
run;
data result;
set test;
array prev_month {13} PREV_MONTH_0-PREV_MONTH_12;
by id;
if first.id then do;
do i =1 to 13;
prev_month(i)=0;
end;
end;
do i = 13 to 2 by -1;
prev_month(i)=prev_month(i-1);
end;
prev_month(1)=no_of_cust;
drop i prev_month_0;
retain prev_month:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
Sample dataset (dataset name: sample):
ID YEARMONTH NO_OF_CUST
1 200909 50
1 201005 65
1 201008 78
1 201106 95
2 200901 65
2 200902 45
2 200903 69
2 201005 14
2 201006 26
2 201007 98
Master dataset (dataset name: test). This is a huge dataset covering each id from the start of the account to date:
ID YEARMONTH NO_OF_CUST
1 200808 125
1 200809 125
1 200810 111
1 200811 174
1 200812 98
1 200901 45
1 200902 74
1 200903 73
1 200904 101
1 200905 164
1 200906 104
1 200907 22
1 200908 35
1 200909 50
1 200910 77
1 200911 86
1 200912 95
1 201001 95
1 201002 87
1 201003 79
1 201004 71
1 201005 65
1 201006 66
1 201007 66
1 201008 78
1 201009 88
1 201010 54
1 201011 45
1 201012 100
1 201101 136
1 201102 111
1 201103 17
1 201104 77
1 201105 111
1 201106 95
1 201107 79
1 201108 777
1 201109 758
1 201110 32
1 201111 15
1 201112 22
2 200711 150
2 200712 150
2 200801 44
2 200802 385
2 200803 65
2 200804 66
2 200805 200
2 200806 333
2 200807 285
2 200808 265
2 200809 222
2 200810 220
2 200811 205
2 200812 185
2 200901 65
2 200902 45
2 200903 69
2 200904 546
2 200905 21
2 200906 256
2 200907 214
2 200908 14
2 200909 44
2 200910 65
2 200911 88
2 200912 79
2 201001 65
2 201002 45
2 201003 69
2 201004 54
2 201005 14
2 201006 26
2 201007 98
The desired output should look like this:
ID YEARMONTH NO_OF_CUST AFTER_MONTH_1 AFTER_MONTH_2 AFTER_MONTH_3 AFTER_MONTH_4 AFTER_MONTH_5 AFTER_MONTH_6 AFTER_MONTH_7 AFTER_MONTH_8 AFTER_MONTH_9 AFTER_MONTH_10 AFTER_MONTH_11 AFTER_MONTH_12
1 200909 50 77 86 95 95 87 79 71 65 66 66 78 88
Step 1: Join your sample table with the main (test) table, using intnx to get all the values for the next 12 months.
Step 2: Create a column that holds the "after_month" names.
Step 3: Transpose to get your final output.
proc sql;
create table abc as
select a.id,a.yearmonth,b.yearmonth as yearmonth1, b.no_of_cust
from
sample a
left join
test b
on a.id = b.id and a.yearmonth <= b.yearmonth <= intnx("month",a.yearmonth,12)
order by a.id,a.yearmonth,b.yearmonth;
quit;
data abc1(drop=col yearmonth1);
set abc;
by id yearmonth;
if first.yearmonth then col=-1;
col+1;
columns = compress("after_month_"||col);
run;
proc transpose data=abc1 out=abc2(rename=(after_month_0 = no_of_cust) drop=_name_);
by id yearmonth;
id columns;
var no_of_cust;
run;
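One caveat about the join condition: intnx works on SAS date values, so it only selects a 12-month window if yearmonth is stored as a date (for example with a yymmn6. format). If yearmonth is instead a plain number such as 200909, which the post does not say either way, the values would need converting first. A sketch under that assumption, with abc_dates as a made-up table name:

```sas
proc sql;
  create table abc_dates as
  select a.id, a.yearmonth, b.yearmonth as yearmonth1, b.no_of_cust
  from sample a
  left join test b
    on a.id = b.id
   /* Assumes yearmonth is numeric YYYYMM: turn both sides into SAS dates,
      then keep rows 0 to 12 months after the sample month */
   and intck('month',
             input(put(a.yearmonth, 6.), yymmn6.),
             input(put(b.yearmonth, 6.), yymmn6.)) between 0 and 12
  order by a.id, a.yearmonth, b.yearmonth;
quit;
```

The remaining two steps (naming the columns and transposing) stay the same, reading from abc_dates instead of abc.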
Alternatively, if you would rather adapt your own code, you could use the following:
proc sort data=test;
by id descending yearmonth;
run;
data result;
set test;
array after_month {13} after_MONTH_0-after_MONTH_12;
by id;
if first.id then do;
do i = 1 to 13;
after_month(i) = 0;
end;
end;
do i = 13 to 2 by -1;
after_month(i) = after_month(i-1);
end;
after_month(1) = NO_OF_CUST;
drop i after_MONTH_0;
retain after_MONTH:;
run;
data sample1;
set sample(drop=no_of_cust);
run;
proc sort data=result;
by id yearmonth;
run;
proc sort data=sample1;
by id yearmonth;
run;
data all;
merge sample1(in=a) result(in=b);
by id yearmonth;
if a;
run;
Let me know in case of any queries.
For the x, y data below, how can I plot a vertical line exactly in the middle of the plot? (Please note that X should not be sorted; the data should be plotted as it is.)
Also, how can I plot a vertical line at x=5 (a distance along X from X=0) when X in the data below is taken as 0, 1, 2, 3, and so forth?
data sample;
infile cards truncover expandtabs;
input X Y;
cards;
29 21
18 23
28 24
16 26
3 27
18 29
2 33
3 37
26 39
2 42
25 47
9 54
13 57
17 58
29 60
5 63
23 66
4 69
3 72
17 73
7 73
12 72
8 69
20 66
12 63
8 60
28 58
3 57
18 54
11 47
21 42
8 39
1 37
16 29
3 27
17 22
3 19
6 17
19 14
18 10
;
run;
I tried:
proc sort data=sample ;
by x;
run;
proc sgplot data=sample;
needle x=x y=y;
run;
data Trapezoidal;
set sample end=last;
dif_x=dif(x);
mean_y=mean(lag(y),y);
integral + (dif_x*mean_y);
if last then putlog 'area under curve is ' integral;
run;
Vertical lines are plotted with the REFLINE statement. Determine what the 'middle' is however you wish (using proc means, proc sql, or similar), get it into a variable in your dataset or a macro variable, and use refline in proc sgplot to produce the line.
The same applies to your specific values of x (except that you don't need to compute anything to get the value to plot at). Add refline 5 / axis=x; or similar to your plots to get them.
You could also use band plots if you're trying to highlight certain areas.
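As a rough sketch of the refline approach, assuming "middle" means the midpoint between the smallest and largest X (swap in whatever definition you actually want), and with the macro variable name mid chosen here for illustration:

```sas
/* Put the midpoint of the X range into a macro variable (hypothetical name: mid) */
proc sql noprint;
  select (min(x) + max(x)) / 2 into :mid trimmed
  from sample;
quit;

proc sgplot data=sample;
  needle x=x y=y;
  /* Vertical line at the computed middle of the X range */
  refline &mid / axis=x lineattrs=(pattern=dash) label="middle";
  /* Vertical line at a fixed X value */
  refline 5 / axis=x label="x=5";
run;
```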
This is my first foray into using SAS macros, and I'm following this page from the amazing UCLA Stats Consulting Group. I'm interested in using macro variables in PROC MIXED to avoid copying and pasting blocks of code (my actual data set has ~400 variables).
My example modifies the UCLA example to have students in many schools.
data hsb3;
input id school female race ses prog
read write math science socst;
datalines;
1 1 0 4 1 1 57 52 41 47 57
2 1 1 4 2 3 68 59 53 63 61
3 1 0 2 3 1 44 33 54 58 31
4 1 0 4 3 3 63 44 47 53 56
5 1 0 4 2 2 47 51 43 50 61
6 1 1 4 2 2 44 52 51 50 61
7 1 0 3 2 1 50 59 60 56 52
8 1 0 1 2 2 34 46 52 53 57
9 1 0 4 2 2 63 57 51 63 61
19 2 0 3 1 2 57 63 41 63 61
20 2 1 4 2 2 60 57 51 58 31
21 2 0 4 3 2 57 55 51 53 56
22 2 0 4 3 2 73 46 71 50 61
23 2 0 4 2 1 54 65 57 50 61
24 2 1 4 2 2 45 60 50 56 52
25 2 0 3 2 1 42 63 43 53 57
26 2 0 1 1 2 34 57 51 63 61
27 2 0 4 2 2 63 49 60 55 31
10 3 1 3 2 2 57 55 51 55 31
11 3 1 4 3 3 60 46 71 31 56
12 3 1 4 2 2 57 66 57 55 61
13 3 0 3 3 2 50 60 50 31 61
14 3 0 4 3 2 57 57 57 55 46
15 3 0 3 3 3 68 55 50 31 56
16 3 0 4 1 2 34 46 43 50 56
17 3 0 4 3 2 34 65 51 50 56
18 3 0 4 1 2 63 60 60 47 57
28 4 1 3 2 2 57 52 52 53 61
29 4 1 4 2 3 60 57 51 63 61
30 4 1 1 2 2 57 65 51 55 46
31 4 0 4 3 2 73 60 71 31 56
32 4 0 4 3 2 54 63 57 55 46
33 4 0 3 1 2 45 57 50 31 56
34 4 0 1 1 1 42 49 43 50 56
35 4 0 4 3 2 47 52 51 50 56
36 4 0 4 2 1 57 57 60 56 52
;
run;
The UCLA example shows how to use macro variables with proc reg to do several simple linear regression models to predict reading score with any of the other variables:
%let indvars = write math female socst;
proc reg data = hsb3;
model read = &indvars;
run;
quit;
To do this taking school into account, we can use PROC MIXED instead:
proc mixed data = hsb3;
class school;
model read = &indvars;
random school;
run;
quit;
But what I really want to do is to see if any of the scores differ by gender (still taking school into account).
%let scores = read write math science socst;
proc mixed data = hsb3;
class school;
model &scores = female;
random school;
run;
quit;
Now I get the error:
NOTE: The SAS System stopped processing this step because of errors.
167 class school;
168 model &indvars = female;
-
22
200
NOTE: Line generated by the macro variable "INDVARS".
1 write math female socst
----
73
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, *, -, /, :, #,
_CHARACTER_, _CHAR_, _NUMERIC_, |.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 73-322: Expecting an =.
Somehow the macro variable is not working. Is there a problem with using macro variables as a response variable in PROC MIXED? They work as a response variable in PROC REG....
proc reg data = hsb3;
model &scores = female;
run;
quit;
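Your problem doesn't have anything to do with macro variables or macro code. Rather, the macro variable expands to a list of several variable names, and that does not form a valid MODEL statement in PROC MIXED. PROC REG accepts several dependent variables on the left of the equals sign, but for PROC MIXED:
The MODEL statement names a single dependent variable ...
Try transforming the data, perhaps, so that there is one score per row and you can run the model BY score name: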
%let scores = read write math science socst;
data want ; set hsb3 ;
array scores &scores ;
do i=1 to dim(scores);
score=scores(i);
name=vname(scores(i));
output;
end;
run;
proc sort; by name ; run;
proc mixed data = want;
by name;
class school;
model score = female;
random school;
run;
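If you would rather keep the data in its wide layout, another option is a small macro that loops over the list and runs PROC MIXED once per score. This is only a sketch; the macro name mixed_by_score is made up, and it assumes the variable names in the list are separated by blanks.

```sas
%macro mixed_by_score(vars);
  %local i var;
  /* One PROC MIXED run per variable name in the list */
  %do i = 1 %to %sysfunc(countw(&vars));
    %let var = %scan(&vars, &i);
    title "Response: &var";
    proc mixed data=hsb3;
      class school;
      model &var = female / solution;
      random school;
    run;
  %end;
  title;
%mend mixed_by_score;

%mixed_by_score(&scores)
```

The BY-group version above is usually easier to post-process, since every model lands in a single set of output tables; the macro loop is handy when each response needs slightly different options.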
data test;
infile datalines;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
do i=1 to 10;
if a(i) eq . then stop;
line=a(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
I want to read only the observations whose numbers are in the first row. The expected result:
0 2 12 45 92 3 60 24 6 2
21 40 3 21 3 19 3 2 4 2
29 57 32 9 2 29 2 0 23 1
0 84 62 75 3 52 65 1 5 2
47 24 87 2 52 36 1 17 3 1
83 34 28 1 43 3 24 2 6 2
The error I get after running my code:
ERROR: Old line 3387 wanted but SAS is at line 3391.
Use: INFILE N=X; , with a suitable value of x.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
3391 47 24 87 2 52 36 1 17 3 1
k1=0 k2=2 k3=12 k4=45 k5=92 k6=3 k7=60 k8=24 k9=6 k10=2 i=2 line=2 _ERROR_=1 _N_=1
What does "a suitable value of x" mean? What should I change in my code?
You are overwriting the values in your array with your second input statement. Here they are read into different variables so as not to be overwritten.
data test;
infile datalines n=100;
input h1 h2 h3 h4 h5 h6 h7 h8 h9 h10;
array h{*} h1-h10;
do i = 1 to 10;
line = h[i];
if line then do;
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
end;
keep k:;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;
SAS is telling you that you need to amend your infile statement so that it can read a sufficient number of lines ahead. For your code as written, n=10 should be OK, as none of the variables you're using to get the line number have values greater than 10.
data test;
/*Add the n= option to the infile statement as suggested by log message*/
infile datalines n= 10;
input k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
array a(*) k1-k10;
array b(*) b1-b10;
/*Make a copy of the first row
that won't get overwritten by subsequent input statements*/
do i=1 to 10;
b(i) = a(i);
end;
do i=1 to 10;
if b(i) eq . then stop;
line=b(i);
input #line k1 k2 k3 k4 k5 k6 k7 k8 k9 k10;
output;
end;
stop;
datalines;
5 9 2 4 6 3 . . . .
29 57 32 9 2 29 2 0 23 1
83 34 28 1 43 3 24 2 6 2
0 84 62 75 3 52 65 1 5 2
0 2 12 45 92 3 60 24 6 2
47 24 87 2 52 36 1 17 3 1
90 93 2 1 40 20 75 2 5 14
78 27 27 2 4 1 12 21 4 2
21 40 3 21 3 19 3 2 4 2
84 2 5 3 13 6 23 98 1 2
;
run;