I am currently trying to use PROC SGPLOT in SAS to create a series plot with five lines (8th grade, 10th grade, 12th grade, College Students, and Young Adults). The y-axis is the prevalence of drug use as a percentage, ranging from 0 to 100. The x-axis is the year, 1975 to 2019, formatted (using PROC FORMAT) so that each year is shown as '75 through '19. I would like to label each line with its group name (8th Grade through Young Adults). But when I use:
proc sgplot data = save.fig2_1data noautolegend ;
series x=year y=eighth / lineattrs=(color=orange) curvelabel='8th Grade' curvelabelpos=start ;
series x=year y=tenth / lineattrs=(color=green) curvelabel='10th Grade' curvelabelpos=start ;
series x=year y=twelfth / lineattrs=(color=blue) curvelabel='12th Grade' curvelabelpos=start;
series x=year y=college / lineattrs=(color=red) curvelabel='College Students' curvelabelpos=start;
series x=year y=youngadult / lineattrs=(color=purple) curvelabel='Young Adults' curvelabelpos=start ;
xaxis label="YEAR" values=(1975 to 2019 by 2) minor;
yaxis label="PERCENT" max=100 min=0 ;
format year yr. ; run ;
(Image: the resulting series plot)
The CURVELABELPOS= option does not let me place the labels for "12th Grade" and "College Students" above the first data point of each line, so the x-axis ends up with a lot of empty space on the left side of the plot. How do I move these two labels above the first data point of each line so that the x-axis does not have this empty space?
No SERIES statement option will produce the labeling you want.
You will have to create an annotation data set for PROC SGPLOT.
In this sample code, the CURVELABEL= option is set to '' for those two series, so the procedure draws them using the widest possible horizontal drawing space. The SGANNO data set contains the annotation functions that draw your own curve label text near the first data point of each series that has a blank CURVELABEL=. Adjust the %SGTEXT ANCHOR= value as needed, and be sure to read the SG Annotation Macro Dictionary documentation to understand all of the text annotation capabilities.
If you also want an artificial split in the series lines, there are two things to try:
1. Introduce a fake year, 2012.5, for which none of the series variables have a value. I tried this, but only one of the five series was drawn with the 'fake' split.
2. Introduce N new variables for the N lines that need a split. For the post-split time frame, copy the data into the new variables and set the original variables to missing. Then add SERIES statements for the new variables.
data have;
call streaminit(1234);
do year = 1975 to 2019;
array response eighth tenth twelfth college youngadult;
if year >= 1991 then do;
eighth = round (10 + rand('uniform',10), .1);
tenth = eighth + round (5 + rand('uniform',5), .1);
twelfth = tenth + round (5 + rand('uniform',5), .1);
if year in (1998:2001) then tenth = .;
end;
else do;
twelfth = 20 + round (10 + rand('uniform',25), .1);
end;
if year >= 1985 then do;
youngadult = 25 + round (5 + rand('uniform',20), .1);
end;
if year >= 1980 then do;
college = 35 + round (7 + rand('uniform',25), .1);
end;
if year >= 2013 then do _n_ = 1 to dim(response);
%* simulate inflated response level;
if response[_n_] then response[_n_] = 1.35 * response[_n_];
end;
output;
end;
run;
data have_split;
set have;
array response eighth tenth twelfth college youngadult;
array response2 eighth2 tenth2 twelfth2 college2 youngadult2;
if year >= 2013 then do _n_ = 1 to dim(response);
response2[_n_] = response[_n_];
response [_n_] = .;
end;
run;
ods graphics on;
ods html;
%sganno;
data sganno;
%* these variables are used to track '1st' or 'start' point
%* of series being annotated
;
retain y12 ycl;
set have;
if missing(y12) and not missing(twelfth) then do;
y12=twelfth;
%sgtext(label="12th Grade", textcolor="blue", drawspace="datavalue", anchor="top", x1=year, y1=y12, width=100, widthunit='pixel')
end;
if missing(ycl) and not missing(college) then do;
ycl=college;
%sgtext(label="College Students", textcolor="red", drawspace="datavalue", anchor="bottom", x1=year, y1=ycl, width=100, widthunit='pixel')
end;
run;
proc sgplot data=have_split noautolegend sganno=sganno;
series x=year y=eighth / lineattrs=(color=orange) curvelabel='8th Grade' curvelabelpos=start;*auto curvelabelloc=outside ;
series x=year y=tenth / lineattrs=(color=green) curvelabel='10th Grade' curvelabelpos=start;*auto curvelabelloc=outside ;
series x=year y=twelfth / lineattrs=(color=blue) curvelabel='' curvelabelpos=start;*auto curvelabelloc=outside ;
series x=year y=college / lineattrs=(color=red) curvelabel='' curvelabelpos=start;*auto curvelabelloc=outside ;
series x=year y=youngadult / lineattrs=(color=purple) curvelabel='Young Adults' curvelabelpos=start;*auto curvelabelloc=outside ;
* series for the 'shifted' time period use the new variables;
series x=year y=eighth2 / lineattrs=(color=orange) ;
series x=year y=tenth2 / lineattrs=(color=green) ;
series x=year y=twelfth2 / lineattrs=(color=blue) ;
series x=year y=college2 / lineattrs=(color=red) ;
series x=year y=youngadult2 / lineattrs=(color=purple) ;
xaxis label="YEAR" values=(1975 to 2019 by 2) minor;
yaxis label="PERCENT" max=100 min=0 ;
run ;
ods html close;
ods html;
Richard has answered what you explicitly asked for, but I think what you want isn't ideal from a graphical standpoint, and that's why SAS won't do it for you.
Labelling over a line is hard to read, especially when the label uses the same color as the line. Labelling outside the chart is much cleaner, as is placing the labels in a legend (KEYLEGEND).
In this case, I would use CURVELABELLOC=OUTSIDE with either CURVELABELPOS=MAX (the default, which places the labels to the right of the chart) or CURVELABELPOS=MIN. MIN places the labels nearer the start, as you prefer, but also overlays the axis, which is not as clean-looking.
Here is an example. It is highly legible, the curve labels are in a place the eye naturally travels to, and they don't alter the size of the axis. Putting them at the right also means they're in the same spot for all of the lines, which is cleaner than placing them at the staggered starts of the lines.
data fig2_1data;
call streaminit(7);
tenth = 0.5;
twelfth= 0.6;
do year=1975 to 2019;
if year eq 1987 then eighth=0.4;
eighth = rand('Uniform',0.2)-0.1 + eighth;
tenth = rand('Uniform',0.2)-0.1 + tenth;
twelfth = rand('Uniform',0.2)-0.1 + twelfth;
output;
end;
run;
proc sgplot data = fig2_1data noautolegend ;
series x=year y=eighth / lineattrs=(color=orange)
curvelabel='8th Grade' curvelabelpos=max curvelabelloc=outside;
series x=year y=tenth / lineattrs=(color=green)
curvelabel='10th Grade' curvelabelpos=max curvelabelloc=outside;
series x=year y=twelfth / lineattrs=(color=blue)
curvelabel='12th Grade' curvelabelpos=max curvelabelloc=outside;
xaxis label="YEAR" values=(1975 to 2019 by 2) minor;
yaxis label="PERCENT" max=1 min=0 ;
format year yr. ; run ;
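The legend alternative mentioned above could look something like the following sketch. It reuses the FIG2_1DATA data set from the example. The NAME= values ('g8', 'g10', 'g12') are arbitrary identifiers chosen for illustration. The user-defined YR. format is left out so the step runs without it.

```sas
/* Sketch: identify each series with LEGENDLABEL= and collect them in one legend.
   The NAME= values are hypothetical identifiers that KEYLEGEND refers to. */
proc sgplot data=fig2_1data;
   series x=year y=eighth  / lineattrs=(color=orange) name='g8'  legendlabel='8th Grade';
   series x=year y=tenth   / lineattrs=(color=green)  name='g10' legendlabel='10th Grade';
   series x=year y=twelfth / lineattrs=(color=blue)   name='g12' legendlabel='12th Grade';
   /* Legend placed outside the plot area on the right, one entry per row */
   keylegend 'g8' 'g10' 'g12' / location=outside position=right across=1;
   xaxis label="YEAR" values=(1975 to 2019 by 2) minor;
   yaxis label="PERCENT" max=1 min=0;
run;
```

Like curve labels placed outside the plot, a legend leaves the full axis width for the data. The trade-off is that the reader has to match colors to legend entries.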
Related
The data I have is:
Year Score
2020 100
2020 45
2020 82
...
2020 91
2020 14
2020 35
And the output I want is:
Score_Ranking Count_Percent Cumulative_count_percent Sum
top100 x y z
101-200
...
800-900
900-989
The data set has a total of 989 observations, all for the same year. I want to divide the whole data set into 10 bins with a bin size of 100. However, when I use PROC HPBIN, my results get divided into 989/10 bins. Is there a way I can set the bin size?
I also want additional columns that show the proportion, the cumulative proportion, and the sum of the scores. How can I print these next to the bins?
Thank you in advance.
1. Sort your data.
2. Classify the observations into bins.
3. Use PROC FREQ for the count and cumulative count.
4. Use PROC FREQ with a WEIGHT statement for the sum.
5. Merge the results.
Alternatively, do steps 3 and 4 in the same DATA step.
I'm not actually sure what the first two columns will tell you, since they will be the same for every bin except the last one.
First, generate some fake data to work with. The sort is important!
*generate fake data;
data have;
do score=1 to 998;
output;
end;
run;
proc sort data=have;
by score;
run;
Method #1
Note that I use a view here rather than a data set, which can help if efficiency is a concern.
*create bins;
data binned / view=binned;
set have ;
if mod(_n_, 100) = 1 then bin+1;
run;
*calculate counts/percentages;
proc freq data=binned noprint;
table bin / out=binned_counts outcum;
run;
*calculate sums - note addition of WEIGHT;
proc freq data=binned noprint;
table bin / out=binned_sum outcum;
weight score;
run;
*merge results together;
data want_merged;
merge binned_counts binned_sum (keep=bin count rename=(count=sum));
by bin;
run;
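If you would rather not run PROC FREQ twice, one possible variation is shown below. It is a sketch and not part of the method above. It computes the count and sum per bin in a single PROC MEANS pass over the BINNED view, then derives the percentages in a short DATA step. The names BIN_STATS and WANT_MEANS and the macro variable TOTAL are made up for this example.

```sas
/* Count and sum of SCORE per bin in one pass.
   BIN only ever increases in the BINNED view, so BY processing works without a sort. */
proc means data=binned noprint;
   by bin;
   var score;
   output out=bin_stats(drop=_type_ _freq_) n=count sum=sum;
run;

/* Total number of observations, used as the percent denominator (hypothetical macro variable) */
proc sql noprint;
   select count(*) into :total trimmed from have;
quit;

/* Per-bin and cumulative percentages */
data want_means;
   set bin_stats;
   percent = count / &total;
   cum_count + count;              /* sum statement: retained running total */
   cum_percent = cum_count / &total;
   format percent cum_percent percent8.1;
run;
```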
Method #2
And here is another method, which requires only a single pass through your data rather than the multiple passes of the PROC FREQ approach:
*manual approach;
data want;
set have
nobs = _nobs /*Total number of observations in data set*/
End=last /*flag for last record*/;
*holds values across rows and sets initial value;
retain bin 1 count cum_count cum_sum 0 percent cum_percent ;
*increments bins and resets count at start of each 100;
if mod(_n_, 100) = 1 and _n_ ne 1 then do;
*output only when end of bin;
output;
bin+1;
count=0;
end;
*increment counters and calculate percents;
count+1;
percent = count / _nobs;
cum_count + 1;
cum_percent = cum_count / _nobs;
cum_sum + score;
*output last record/final stats;
if last then output;
*format percents;
format percent cum_percent percent12.1;
run;
I am comparing the evolution of plasma concentrations over time for different treatments of patients.
We applied each treatment to different subjects, and for each treatment we want a graph that shows the profile of each subject in black, as well as the mean in red.
It should look like this (image not included),
but it actually looks like this (image not included).
My data has the following variables:
trtan and trta for treatment number and name
subjid for the patient receiving that treatment
ATPT for timepoint
AVAL for Individual Concentrations
MEAN for average Concentrations
I am using SGPLOT to produce this line plot. The y-axis shows concentrations and the x-axis shows time points. I sort the data by treatment, subject and time point before passing it to PROC SGPLOT.
The lines for the individual subjects are fine. The issue is with the mean line: since the data set is sorted by subject, I am getting multiple mean lines, one per subject.
My requirement is to have multiple individual lines and one overlaid mean line. Can anyone advise how I can solve this?
I am using the code below. How can I repair it?
proc sort data = pc2;
by trtan trta subjid atptn atpt;
run;
proc sgplot data = pc2 dattrmap = anno pad = (bottom = 20%) NOAUTOLEGEND ;
by trtan trta;
series x = atptn y = aval/ group = trta
lineattrs = (color = black thickness = 1 pattern = solid );
series x = atptn y = mean/ group = trta attrid = trtcolor
lineattrs = (thickness = 2 pattern = solid );
xaxis label= "Actual Time (h)"
labelattrs = (size = 10)
values = (0 12 24 36 48 72 96 120 168)
valueattrs = (size = 10)
grid;
yaxis label= "Plasma Concentration (ng/mL)"
labelattrs = (size = 10)
valueattrs = (size = 10)
grid;
run;
This is not a problem with the mean only.
Leave out the mean, add min=-20 to your YAXIS specification, and you will see the same problem.
Alternatively, run this code:
data pc2;
do subj = 1 to 3;
do time = 1 to 25;
value = 2*sin(time/3) + rand('Normal');
output;
end;
end;
run;
proc sgplot data=pc2;
series x=time y=value;
run;
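and you will get a single line that runs through the points in data order, jumping from the last time point of one subject back to the first time point of the next (image not included).
The solution is to have one series for each subject. So first sort the data by time and transpose it, so that there is one variable (subj_1, subj_2, etc.) for each subject.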
proc sort data=pc2 out=SORTED;
by time subj;
run;
proc transpose data=SORTED out=TRANS prefix=subj_;
by time;
id subj;
run;
I leave it as an exercise for you to add the mean to this dataset.
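One possible way to do that exercise is sketched below. It assumes that TRANS was created by the PROC TRANSPOSE step above, so the per-subject columns are named subj_1, subj_2, and so on. The variable is named MEAN so that the metadata query below picks it up.

```sas
/* Add the across-subject mean at each time point.
   SUBJ_: is a name-prefix list matching every variable that starts with SUBJ_;
   the MEAN function ignores missing values. */
data trans;
   set trans;
   mean = mean(of subj_:);
run;
```

Note that the query below assigns the same black LINEATTRS= to every selected column, so the mean line would still need its own color.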
Then run SGPLOT with a SERIES statement per subject. To build these statements, query the metadata of data set WORK.TRANS:
proc sql noprint;
select distinct 'series x=time y=' || strip(name) || ' / lineattrs=(color=black)'
into :series_statements separated by ';'
from sashelp.vcolumn
where libname eq 'WORK' and memname eq 'TRANS'
and (lowcase(name) like 'subj%' or lowcase(name) = 'mean');
quit;
proc sgplot data=TRANS;
&series_statements;
run;
The result, without the mean, looks like this for my example:
Of course, you will have to do some graphical fine tuning.
You can achieve this simply by computing the mean by ATPT. Then, instead of merging the mean records onto the PK data by ATPT, append them. Your code should then give you the result you expect. Please let me know if it does not work; it seems to have worked for me.
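A sketch of the append approach is shown below. It uses fake data with the question's variable names (TRTAN, TRTA, SUBJID, ATPTN, AVAL). The data set names PC_MEAN and PC_PLOT and the variable MEAN_AVAL are hypothetical. The sketch also assumes that an explicit LINEATTRS=(COLOR=BLACK) overrides the per-group colors, as it does in recent SAS 9.4 releases; please verify this on your release.

```sas
/* Hypothetical fake PK data using the question's variable names */
data pc2;
   call streaminit(42);
   length trta $12;
   do trtan = 1 to 2;
      trta = catx(' ', 'Treatment', trtan);
      do subjid = 1 to 5;
         do atptn = 0, 12, 24, 48, 72;
            aval = 100 * exp(-atptn / 30) + rand('normal', 0, 5);
            output;
         end;
      end;
   end;
run;

/* Mean concentration per treatment and time point */
proc means data=pc2 noprint nway;
   class trtan trta atptn;
   var aval;
   output out=pc_mean(drop=_type_ _freq_) mean=mean_aval;
run;

/* Append (not merge): mean records have missing SUBJID and AVAL,
   subject records have missing MEAN_AVAL */
data pc_plot;
   set pc2 pc_mean;
run;

/* Missing SUBJID sorts first, so each BY group starts with its mean records */
proc sort data=pc_plot;
   by trtan trta subjid atptn;
run;

proc sgplot data=pc_plot noautolegend;
   by trtan trta;
   /* One black line per subject (assumes LINEATTRS overrides group colors) */
   series x=atptn y=aval / group=subjid
          lineattrs=(color=black thickness=1 pattern=solid);
   /* A single red line for the mean; missing values are skipped, not plotted */
   series x=atptn y=mean_aval / lineattrs=(color=red thickness=2 pattern=solid);
run;
```

The sketch uses GROUP=SUBJID so that each subject gets its own line. The mean rows are kept in a separate variable so that they form exactly one line per treatment.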
I have a dataset that looks like the following:
pt_fin Admit_Type MONTH_YEAR BED_ORDERED_TO_DISPO (minutes)
1 Acute Jan 214
2 Acute Jan 628
3 ICU Jan 300
4 ICU Feb 99
I already have code (see below) that produces a plot with an x-axis (admit type grouped by month) and a y-axis (median bed-order-to-dispo time). I want to add a secondary Y axis that shows the number of patients used to compute each median.
For example, I want a secondary Y axis data point for each combination of month and admit type. For Jan, there would be two separate counts: 1) the patients admitted to Acute and 2) the patients admitted to ICU.
proc sgplot data=Combined;
title "Median Bed Order To Dispo By Month, Admit Location";
vbar MONTH_YEAR / response=BED_ORDERED_TO_DISPO stat=median
group = Admit_Type groupdisplay=cluster ;
run;
I've been trying to adapt what I've found here but the plots my code produces are super messy and incorrect.
https://blogs.sas.com/content/iml/2019/01/14/align-y-y2-axes-sgplot.html
Desired output (rough sketch): for each month (Jan, Feb), clustered bars for Acute and ICU showing the median on the primary Y axis. The patient counts are drawn as points (X's and *'s), each connected as a line graph against a secondary Y axis (#).
Code I've tried, which produces rubbish:
proc sgplot data=Combined;
vbarbasic MONTH_YEAR / response=Bed_Order_Hour group=Admit_Type y2axis; /*needs to be on y axis 1*/
series x=MONTH_YEAR y=Pt_fin / markers; /*Pt_fin needs to be on y axis 2*/
run;
Your visualization explanation is weak. You might want to use two plotting statements in your SGPLOT, VBAR and VLINE.
data have;
do type = 'Acute', 'ICU';
do month = '01jan2018'd to '31dec2018'd;
do _n_ = 1 to floor (50 * ranuni(123));
patid + 1;
minutes = 10 + floor(1000 * ranuni(123));
output;
end;
month = intnx ('month', month, 0, 'e');
end;
end;
format month monname3.;
run;
ods html5 file="plot.html" path="c:\temp";
proc sgplot data=have;
title "Median of patient minutes by month";
vbar month / group=type groupdisplay=cluster response=minutes stat=median;
vline month / group=type groupdisplay=cluster response=minutes stat=freq y2axis ;
run;
ods html5 close;
The VLINE gives the viewer a secondary focus: the frequency behind each median. The same information could instead be conveyed by simply varying the intensity of the bars. The medians based on the highest frequencies would get the strongest shade, and those based on lower frequencies would be faded.
What is the easiest way to plot how a variable y changes over x in SAS, for example, how temperature changes over time? Thank you very much! (PC-SAS)
Two common ways are the scatter and series statements in Proc SGPLOT.
For example:
data have;
do time = 0 to 24;
time = time + ranuni(123) - 1.2;
temp = 60 + 6 * sin(time / 7.6);
rowNumber + 1;
output;
end;
run;
ods graphics on / width=325px;
proc sgplot data=have;
title "Scatter";
footnote "Created: &sysdate";
scatter x=time y=temp;
run;
proc sgplot data=have;
title "Series";
footnote "Created: &sysdate";
series x=time y=temp / markers;
run;
I am trying to create a histogram with PROC UNIVARIATE. The goal is to show the distribution in bins of width 0.1 from 0 to 1.5, with all remaining values in one final bin.
I used the following code to handle the range from 0 to 1.5, but it does not handle the rest. How can I correct the code?
proc univariate data=HAVE;
where pred between 0 and 1.5;
var pred;
histogram pred/ vscale=percent midpoints=0 to 2 by 0.1 normal (noprint);
run;
You can try something like the following code, which combines two histograms by creating two variables from one variable:
/*Temporary DS with values ranging from 0.1 to 2.0*/
data have;
do i=0.1 to 2.0 by 0.1;
output;
end;
rename i=pred;
run;
/*Creating two variables x(0.1-1.5) and y(1.6-2.0)*/
data have;
set have;
if pred<1.6 then x=pred;
else y=pred;
drop pred;
run;
/*Combine two Histograms*/
proc sgplot data=have;
histogram x / nbins=15 binwidth=0.1;
density x / type=normal;
histogram y / nbins=5 binwidth=1.0;
density y / type=normal;
keylegend / location=inside position=topright noborder across=2;
xaxis display=(nolabel) values=(0.1 to 2.5 by 0.1);
run;
1. Create your own groups.
2. Create a format so the groups display the way you'd like.
3. Plot them with SGPLOT.
*create your own groups for data, especially the last group;
data mileage;
set sashelp.cars;
mpg_group=floor(mpg_highway / 10);
if mpg_group in (5, 6, 7) then
mpg_group=5;
keep mpg_highway mpg_group;
run;
*format to control display;
proc format;
value mpg_fmt 1='10 to 19' 2='20 to 29' 3='30 to 39' 4='40 to 49' 5='50+';
run;
*plot the data;
proc sgplot data=mileage;
vbar mpg_group /stat=freq barwidth=1;
format mpg_group mpg_fmt.;
run;
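Applied to the PRED variable from the histogram question above, the same group-then-plot idea might look like the sketch below. It assumes PRED is non-negative and generates hypothetical exponential fake data in place of the real HAVE. It builds a text label for each bin instead of a format, so the 16 bins do not have to be typed out. Bins that contain no observations will not appear, because VBAR categories come from the data.

```sas
/* Hypothetical fake data: non-negative PRED with a long right tail */
data have;
   call streaminit(2020);
   do i = 1 to 500;
      pred = 0.5 * rand('exponential');
      output;
   end;
   drop i;
run;

/* Bins 0-14 are 0.1 wide; every value >= 1.5 goes into bin 15.
   FLOOR fuzzes results within 1E-12 of an integer, so 0.3/0.1 lands in bin 3. */
data pred_binned;
   set have;
   length bin_label $8;
   pred_bin = min(floor(pred / 0.1), 15);
   if pred_bin < 15 then
      bin_label = catx('-', put(pred_bin / 10, 3.1), put((pred_bin + 1) / 10, 3.1));
   else bin_label = '1.5+';
run;

/* Percent of observations per bin, bins ordered by their labels */
proc sgplot data=pred_binned;
   vbar bin_label / stat=percent barwidth=1;
   xaxis discreteorder=unformatted label='PRED';
run;
```

The labels ('0.0-0.1', ..., '1.4-1.5', '1.5+') happen to sort correctly as character strings, which is why DISCRETEORDER=UNFORMATTED is enough to keep the bins in numeric order.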