Plotting a vertical line exactly in the middle and at a specific value of X in SAS

For the x, y data below, how can I plot a vertical line exactly in the middle of the plot? (Please note that X should not be sorted; the plot should stay as it is.)
Also, how can I plot a vertical line at x=5 (a distance along X from X=0) when X in the data below is taken as 0, 1, 2, 3, and so forth?
data sample;
infile cards truncover expandtabs;
input X Y;
cards;
29 21
18 23
28 24
16 26
3 27
18 29
2 33
3 37
26 39
2 42
25 47
9 54
13 57
17 58
29 60
5 63
23 66
4 69
3 72
17 73
7 73
12 72
8 69
20 66
12 63
8 60
28 58
3 57
18 54
11 47
21 42
8 39
1 37
16 29
3 27
17 22
3 19
6 17
19 14
18 10
;
run;
I tried:
proc sort data=sample ;
by x;
run;
proc sgplot data=sample;
needle x=x y=y;
run;
data Trapezoidal;
set sample end=last;
dif_x=dif(x);
mean_y=mean(lag(y),y);
integral + (dif_x*mean_y);
if last then putlog 'area under curve is ' integral;
run;

Vertical lines are plotted with refline. Determine what the 'middle' is however you wish (using proc means or proc sql or similar), get it into a variable in your dataset or a macro variable, and use refline in proc sgplot to produce the line.
The same applies to your specific values of x, except that there is nothing to compute there. Add refline 5 / axis=x; or similar to your plots to get them (axis=x is what makes the line vertical; refline draws against the Y axis by default).
You could also use band plots if you're trying to highlight certain areas.
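A minimal sketch of the refline approach described above. It assumes "middle" means the midpoint of the observed X range, which is only one possible reading; the macro variable name xmid is made up for illustration.

```sas
/* Assumption: "middle" = midpoint of the observed X range */
proc sql noprint;
  select (min(x) + max(x)) / 2 into :xmid trimmed
  from sample;
quit;

proc sgplot data=sample;
  needle x=x y=y;
  /* AXIS=X makes the reference lines vertical */
  refline &xmid / axis=x lineattrs=(pattern=dash);
  refline 5 / axis=x;
run;
```

If the middle should instead be the median or mean of X, only the select expression needs to change.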

Related

How to read from a variable having varying character length from in-stream data in SAS?

Hopefully an easy one.
I am fairly new to SAS.
I am trying to read data using the following SAS code (as given in lecture notes), but it does not produce a proper SAS dataset because of a character-length issue. I tried : and & and some trial and error, but was not able to fix it. Can anyone explain how I can fix this?
data SMSA_subset_weather;
length city $ 27;
input city & JanTemp JulyTemp RelHum Rain;
datalines;
Akron, OH 27 71 59 36
Albany-Schenectady-Troy, NY 23 72 57 35
Baltimore, MD 35 77 55 43
Allentown, Bethlehem, PA-NJ 29 74 54 44
Atlanta, GA 45 79 56 47
;
run
;
I am using SAS ON DEMAND. When I use dsd it is giving me the desired output, like below:
data SMSA_subset_weather;
infile datalines delimiter=" " dsd;
length city $ 27;
input city & JanTemp JulyTemp RelHum Rain;
datalines;
"Akron, OH" 27 71 59 36
"Albany-Schenectady-Troy, NY" 23 72 57 35
"Baltimore, MD" 35 77 55 43
"Allentown, Bethlehem, PA-NJ" 29 74 54 44
"Atlanta, GA" 45 79 56 47
;
run
;
I am achieving this by modifying the in-stream data and using the dsd option. But I assume I can't handle it this way if there were many more such observations.
You need a second space following the city name, or quotation marks.
When using a delimiter that may appear in the data, you can use the & as you do in order to ask SAS to read until it finds two consecutive delimiters. (You also can use DSD option and quotation marks.)
So here:
data SMSA_subset_weather;
length city $ 27;
input city & JanTemp JulyTemp RelHum Rain;
datalines;
Akron, OH  27 71 59 36
Albany-Schenectady-Troy, NY  23 72 57 35
Baltimore, MD  35 77 55 43
Allentown, Bethlehem, PA-NJ  29 74 54 44
Atlanta, GA  45 79 56 47
;
run
;
Or here, using DSD with quotation marks:
data SMSA_subset_weather;
length city $ 27;
infile datalines dsd dlm=' ';
input city JanTemp JulyTemp RelHum Rain;
datalines;
"Akron, OH" 27 71 59 36
"Albany-Schenectady-Troy, NY" 23 72 57 35
"Baltimore, MD" 35 77 55 43
"Allentown, Bethlehem, PA-NJ" 29 74 54 44
"Atlanta, GA" 45 79 56 47
;
run
;
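If editing the data lines is not practical, one possible alternative is to parse each line from the right. This is a sketch, not part of the answer above, and it assumes every line ends with exactly four blank-delimited numeric fields. The helper variables pos and len are introduced only for illustration.

```sas
data SMSA_subset_weather;
  length city $ 27;
  input;   /* load the raw line into the automatic _INFILE_ buffer */

  /* The four numeric fields are the last four blank-delimited words */
  JanTemp  = input(scan(_infile_, -4, ' '), 32.);
  JulyTemp = input(scan(_infile_, -3, ' '), 32.);
  RelHum   = input(scan(_infile_, -2, ' '), 32.);
  Rain     = input(scan(_infile_, -1, ' '), 32.);

  /* POS is where the 4th word from the end starts;
     the city is everything before it */
  call scan(_infile_, -4, pos, len, ' ');
  city = substr(_infile_, 1, pos - 1);
  drop pos len;
datalines;
Akron, OH 27 71 59 36
Albany-Schenectady-Troy, NY 23 72 57 35
Baltimore, MD 35 77 55 43
Allentown, Bethlehem, PA-NJ 29 74 54 44
Atlanta, GA 45 79 56 47
;
run;
```

Negative word counts make SCAN and CALL SCAN count from the end of the string, so embedded spaces and commas in the city name do not matter.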

SAS macro variables in PROC MIXED

This is my first foray into using SAS macros, and I'm following this page from the amazing UCLA Stats Consulting Group. I'm interested in using macro variables in PROC MIXED to avoid copying and pasting blocks of code (my actual data set has ~400 variables).
My example modifies the UCLA example to have students in many schools.
data hsb3;
input id school female race ses prog
read write math science socst;
datalines;
1 1 0 4 1 1 57 52 41 47 57
2 1 1 4 2 3 68 59 53 63 61
3 1 0 2 3 1 44 33 54 58 31
4 1 0 4 3 3 63 44 47 53 56
5 1 0 4 2 2 47 51 43 50 61
6 1 1 4 2 2 44 52 51 50 61
7 1 0 3 2 1 50 59 60 56 52
8 1 0 1 2 2 34 46 52 53 57
9 1 0 4 2 2 63 57 51 63 61
19 2 0 3 1 2 57 63 41 63 61
20 2 1 4 2 2 60 57 51 58 31
21 2 0 4 3 2 57 55 51 53 56
22 2 0 4 3 2 73 46 71 50 61
23 2 0 4 2 1 54 65 57 50 61
24 2 1 4 2 2 45 60 50 56 52
25 2 0 3 2 1 42 63 43 53 57
26 2 0 1 1 2 34 57 51 63 61
27 2 0 4 2 2 63 49 60 55 31
10 3 1 3 2 2 57 55 51 55 31
11 3 1 4 3 3 60 46 71 31 56
12 3 1 4 2 2 57 66 57 55 61
13 3 0 3 3 2 50 60 50 31 61
14 3 0 4 3 2 57 57 57 55 46
15 3 0 3 3 3 68 55 50 31 56
16 3 0 4 1 2 34 46 43 50 56
17 3 0 4 3 2 34 65 51 50 56
18 3 0 4 1 2 63 60 60 47 57
28 4 1 3 2 2 57 52 52 53 61
29 4 1 4 2 3 60 57 51 63 61
30 4 1 1 2 2 57 65 51 55 46
31 4 0 4 3 2 73 60 71 31 56
32 4 0 4 3 2 54 63 57 55 46
33 4 0 3 1 2 45 57 50 31 56
34 4 0 1 1 1 42 49 43 50 56
35 4 0 4 3 2 47 52 51 50 56
36 4 0 4 2 1 57 57 60 56 52
;
run;
The UCLA example shows how to use macro variables with proc reg to do several simple linear regression models to predict reading score with any of the other variables:
%let indvars = write math female socst;
proc reg data = hsb3;
model read = &indvars;
run;
quit;
To do this taking school into account, we can use PROC MIXED instead:
proc mixed data = hsb3;
class school;
model read = &indvars;
random school;
run;
quit;
But what I really want to do is to see if any of the scores differ by gender (still taking school into account).
%let scores = read write math science socst;
proc mixed data = hsb3;
class school;
model &scores = female;
random school;
run;
quit;
Now I get the error:
NOTE: The SAS System stopped processing this step because of errors.
167 class school;
168 model &indvars = female;
-
22
200
NOTE: Line generated by the macro variable "INDVARS".
1 write math female socst
----
73
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, *, -, /, :, #,
_CHARACTER_, _CHAR_, _NUMERIC_, |.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 73-322: Expecting an =.
Somehow the macro variable is not working. Is there a problem with using macro variables as a response variable in PROC MIXED? They work as a response variable in PROC REG....
proc reg data = hsb3;
model &scores = female;
run;
quit;
Your problem doesn't have anything to do with macro variables or macro code. Instead you are not creating a valid MODEL statement to use in PROC MIXED.
The MODEL statement names a single dependent variable ...
Try transforming the data perhaps?
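%let scores = read write math science socst;
/* Reshape to one row per student per score so one MODEL statement serves all scores */
data want ; set hsb3 ;
array scores &scores ;
length name $ 32;
do i=1 to dim(scores);
score=scores(i);
name=vname(scores(i));
output;
end;
drop i;
run;
proc sort data=want; by name ; run;
/* Fit a separate mixed model for each score via BY-group processing */
proc mixed data = want;
by name;
class school;
model score = female;
random school;
run;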
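Since the question was about macros, the same result can also be sketched with a macro loop that runs PROC MIXED once per response variable. This is an alternative to the BY-group approach above, not the answerer's method; the macro name mixed_by_score is made up for illustration.

```sas
%macro mixed_by_score(scores);
  %local i score;
  %do i = 1 %to %sysfunc(countw(&scores, %str( )));
    /* Pick the i-th variable name from the space-separated list */
    %let score = %scan(&scores, &i, %str( ));
    title "Response: &score";
    proc mixed data=hsb3;
      class school;
      model &score = female;   /* exactly one dependent variable per MODEL */
      random school;
    run;
  %end;
  title;
%mend mixed_by_score;

%mixed_by_score(read write math science socst)
```

The BY-group version is usually easier to post-process with ODS OUTPUT because all results land in one table keyed by name; the macro loop keeps the original wide data set untouched.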

select specific rows by row number in sas

I am new to SAS. I have SAS data like the following (it does not contain an Obs column):
Obs ID Name Score1 Score2 Score3
1 101 90 95 98
2 203 78 77 75
3 223 88 67 75
4 280 68 87 75
.
.
.
.
100 468 78 77 75
I want the rows numbered 2, 6, 8, 10, and 34. The output should look like:
Obs ID Name Score1 Score2 Score3
1 203 78 77 75
2 227 88 67 75
3 280 68 87 75
.
.
.
Thanks in advance.
The other answer is ok for small tables, but if you are working with a very large table it's inefficient as it reads every row in the table to see whether it has the right row number. Here's a more direct approach:
data example;
do i = 2, 6, 8, 10;
set sashelp.class point = i;
output;
end;
stop;
run;
This picks out just the rows you actually want and doesn't read all the others.
A data step reads the table one row at a time, so you can output only the rows whose iteration number is in your list, with a condition like this.
data test;
set LIB.TABLE;
if _N_ in (2, 6, 8, 10, 34) then output;
run;
Here _N_ counts data step iterations, which matches the row number because each iteration reads exactly one row.

Use ODS Graphics to produce grouped histogram

I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried coding this way but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!
Sounds like you should use SGPANEL, not SGPLOT. SGPLOT can make grouped bar charts, but not automatically make histogram bins without using a format (you could do that if you want) and doesn't support group with the histogram plot. However, SGPANEL can handle that.
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;
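The format-based route mentioned above could look roughly like this. The bin boundaries and the format name wbcbin are assumptions chosen only for illustration; pick cut points that suit the data.

```sas
/* Hypothetical bins for wbc; adjust the cut points as needed */
proc format;
  value wbcbin
    low -< 50  = '< 50'
    50  -< 100 = '50 to < 100'
    100 - high = '100 or more';
run;

proc sgplot data=a1q1;
  format wbc wbcbin.;
  /* GROUPDISPLAY=CLUSTER places the gender bars side by side instead of stacking */
  vbar wbc / group=gender groupdisplay=cluster;
run;
```

Because VBAR treats the formatted values as categories, this gives a clustered frequency chart rather than a true histogram with a continuous axis.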

SAS PROC GLM in a loop

I tried to run PROC GLM in a loop because I have many models (different combinations of dependent and independent variables), and it is very time-consuming to run them one by one. But an error in the log indicates that only one MODEL statement is allowed in PROC GLM. Are there any solutions for this?
My code looks like this:
data old;
input year A1 A2 A3 A4 B C D;
datalines;
2000 22 22 30 37 4 13 14
2000 37 29 31 38 6 16 12
2000 42 29 34 37 3 15 15
2000 28 28 27 35 10 13 15
2000 33 22 37 40 9 12 15
2000 22 29 26 40 3 11 15
2000 37 20 32 40 6 12 13
2001 44 22 33 35 7 20 12
2001 33 20 26 40 6 13 15
2001 32 30 37 35 1 12 13
2001 44 25 31 39 4 20 14
2001 25 30 37 38 4 20 10
2001 43 21 35 38 6 11 10
2001 25 23 34 37 5 17 11
2001 45 30 35 37 1 13 14
2001 48 24 36 39 2 13 15
2001 25 24 35 40 9 16 11
2002 38 26 33 35 2 14 10
2002 29 24 35 36 1 16 13
2002 34 28 32 35 9 16 11
2002 45 26 29 35 9 19 10
2002 26 22 38 35 1 14 12
2002 20 26 26 39 8 17 10
2002 33 20 35 37 9 18 12
;
run;
%macro regression (in, YLIST,XLIST);
%local NYLIST;
%let NYLIST=%sysfunc(countw(&YLIST));
ods tagsets.ExcelXP path='D:\REG' file='Regression.xls';
Proc GLM data=&in; class year;
%do i=1 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
%end;
%do i=2 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
%end;
run;
ods tagsets.excelxp close;
%mend regression;
options mprint;
%regression
(in=old
,YLIST= A1 A2 A3 A4
,XLIST= B C D);
/*potential solutions*/
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y=&x1 &x2 &x3/solution;
run;
%mend regression;
%macro sql (mydataset,y,x1,x2,x3);
proc sql noprint;
select cats('%regression(&mydataset,',&y,',',&x1,',',&x2,',',&x3,')')
into :calllist separated by ' ' from &mydataset;
quit;
&calllist.;
%mend sql;
%sql
(mydataset=old
,y=A1
,X1=B
,X2=C
,X3=D);
You should do this in two steps. One is a macro that contains one instance of PROC GLM:
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y = &x1 &x2 &x3 / solution;
run;
%mend regression;
And then call that macro from something else, either a macro with the looping elements, or better, from a dataset that contains your y/x1/x2/x3 as columns (one row per model statement) using call execute or proc sql select into methods. For example, with a data set modeldata containing the y/x values for your model:
proc sql noprint;
select cats('%regression(mydataset,',y,',',x1,',',x2,',',x3,')') into :calllist separated by ' ' from modeldata;
quit;
&calllist.;
Or similar.
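The call execute route mentioned above might be sketched as follows. The contents of modeldata are made up for illustration, and the macro is repeated (with the = in the MODEL statement) so the example is self-contained.

```sas
%macro regression(data, y, x1, x2, x3);
  proc glm data=&data;
    class year;
    model &y = &x1 &x2 &x3 / solution;
  run;
  quit;
%mend regression;

/* Hypothetical control table: one row per model to fit */
data modeldata;
  length y x1 x2 x3 $ 32;
  input y x1 x2 x3;
datalines;
A1 B C D
A2 B C D
A3 B C D
A4 B C D
;
run;

/* Queue one macro call per row; %NRSTR delays macro execution
   until after this data step has finished */
data _null_;
  set modeldata;
  call execute(cats('%nrstr(%regression)(old,', y, ',', x1, ',', x2, ',', x3, ')'));
run;
```

Unlike the select into approach, this does not build one long macro variable, so it is not constrained by the maximum macro variable length when the list of models is large.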