By group controlling line colors/where clause - sas

I want to plot Y by X, grouped by year, but color-code each year based on a different variable (dry). So each year shows as a separate line, but dry=1 years plot in one color and dry=0 years plot in a different color. I actually figured out one option (yeah!), which is below, but it doesn't give me much control.
Is there a way to put a WHERE clause in the SERIES statement to select specific categories, so that I can assign a specific color (or other format) to each? Or is there another way? This would be analogous to R, where one can use multiple line statements for different subsets of data.
Thanks!!
This code works.
proc sgplot data = tmp;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;

Use an attribute map; see the documentation.
You can use the DRY variable to set the specific colours. For each year, assign the colour using the DRY variable in a data step.
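proc sort data=tmp(keep=year dry) out=attr_data nodupkey; by year; run; * one row per year;
data attrs;
set attr_data;
length id $8 value $8 linecolor markercolor $12;
id='year'; * must match ATTRID= in the SERIES statement;
value=put(year, 4.); * VALUE is required and must match the formatted group value of YEAR;
if dry=0 then linecolor='green';
else if dry=1 then linecolor='red';
markercolor=linecolor; * color the markers the same way as the lines;
keep id value linecolor markercolor;
run;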
Then add DATTRMAP=attrs to the PROC SGPLOT statement and ATTRID=year to the SERIES statement options.
ods graphics / attrpriority=none;
proc sgplot data = tmp dattrmap=attrs;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year attrid=year;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Note that I tested and edited this post so it should work now.
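If you want something closer to the R approach of one line statement per subset: as far as I know, individual SGPLOT statements do not take their own WHERE clause, but you can get the same effect by splitting the response into one variable per DRY category in a DATA step and then using one SERIES statement per variable, each with a fixed color. This is a sketch, not the answer above. The TMP data built at the top is a made-up stand-in (odd years flagged as dry) so the block runs on its own, and Y_DRY and Y_WET are hypothetical names. It relies on LINEATTRS=/MARKERATTRS= colors overriding the group colors, which is how I understand SGPLOT behaves in SAS 9.3 and later.

```sas
/* Hypothetical stand-in for TMP: odd years are flagged as dry */
data tmp;
  call streaminit(42);
  do year = 2010 to 2013;
    dry = mod(year, 2);
    do jday = 121 to 288 by 15;
      tpctwett = 100 * rand('uniform');
      output;
    end;
  end;
run;

/* One response variable per DRY category; the other one stays missing */
data tmp2;
  set tmp;
  if dry = 1 then y_dry = tpctwett;
  else y_wet = tpctwett;
run;

proc sgplot data=tmp2 noautolegend;
  /* GROUP=YEAR keeps one line per year; the color is fixed per statement */
  series x=jday y=y_dry / group=year markers
         lineattrs=(color=red) markerattrs=(color=red symbol=plus);
  series x=jday y=y_wet / group=year markers
         lineattrs=(color=green) markerattrs=(color=green symbol=plus);
  yaxis label='Pct. Stakes' values=(0 to 100 by 20);
run;
```

Because each subset is its own statement, you can also give each one a different line pattern, marker symbol, or thickness.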

Related

Vertical column summation in sas

I have the following result, whose columns I need to total. It seems like a simple request, but I have already spent a few days trying to find a solution to this problem.
Data have:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Data want:
Measure Jan_total Feb_total
Startup 100 200
Switcher 300 500
Total 400 700
I want the vertical sum of each column placed under its respective column, please.
Can someone help me arrive at the solution for this request, please?
To do this in data step code, you would do something like:
data want;
set have end=eof; * EOF will be true when we reach the last observation of HAVE.;
jan_sum + jan_total; * These sum statements accumulate the totals across observations.;
feb_sum + feb_total;
output; * Output each of the original observations.;
if eof then do; * When we reach the end of the input...;
measure = 'Total'; * ...update the value in MEASURE...;
jan_total = jan_sum; * ...move the accumulated totals to the original variables...;
feb_total = feb_sum;
output; * ...and output them as an additional observation.;
end;
drop jan_sum feb_sum; * Drop the accumulator variables (this statement can go anywhere in the step).;
run;
You could do this many other ways. Assuming that you actually have columns for all the months, you might rewrite the data step code to use arrays, or you might use PROC SUMMARY or PROC SQL to calculate the totals and add them back with a much shorter data step, etc.
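The array version could look roughly like this. It is a sketch that assumes only the two month columns from the question; with all twelve months you would list them all in the TOT array and size ACC to match. TOT, ACC and EOF are illustrative names, and the DATA HAVE step simply rebuilds the sample from the question so the block runs on its own.

```sas
data have;
  length measure $8;
  input measure $ jan_total feb_total;
  datalines;
Startup 100 200
Switcher 300 500
;

data want;
  set have end=eof;
  array tot{*} jan_total feb_total;   /* list every month column here */
  array acc{2} _temporary_;           /* running sums; size must match TOT */
  do i = 1 to dim(tot);
    acc{i} = sum(acc{i}, tot{i});     /* temporary arrays keep their values */
  end;
  output;                             /* keep the original row */
  if eof then do;
    measure = 'Total';
    do i = 1 to dim(tot);
      tot{i} = acc{i};                /* move the sums into the month columns */
    end;
    output;                           /* extra row holding the totals */
  end;
  drop i;
run;
```

Temporary array elements are retained and never written to WANT, so no DROP is needed for the accumulators.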
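proc means data=have noprint;
class measure;
var jan_total feb_total;
output out=want sum=; * SUM= with no names keeps the original variable names.;
run; * The _TYPE_=0 row (MEASURE missing) holds the overall totals.;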
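For the PROC SQL route, one option is to compute the column sums in a second query and stack them under the original rows with UNION ALL. This is a sketch that again rebuilds HAVE from the question. As far as I know PROC SQL does not formally guarantee row order without ORDER BY, although the Total row normally comes out last.

```sas
data have;
  length measure $8;
  input measure $ jan_total feb_total;
  datalines;
Startup 100 200
Switcher 300 500
;

proc sql;
  create table want as
    select measure, jan_total, feb_total
      from have
    union all
    /* one extra row: column sums labeled 'Total' */
    select 'Total' as measure,
           sum(jan_total) as jan_total,
           sum(feb_total) as feb_total
      from have;
quit;
```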
It depends on whether this is for display or for a data set. It usually makes no sense to store a total row in the data set; a total like this is normally used only for reporting.
PROC PRINT has a SUM statement that will add the totals to the end of a report. PROC TABULATE also provides another mechanism for reporting like this.
Example from here:
options obs=10 nobyline;
proc sort data=exprev;
by sale_type;
run;
proc print data=exprev noobs label sumlabel
n='Number of observations for the order type: '
'Number of observations for the data set: ';
var country order_date quantity price;
label sale_type='Sale Type'
price='Total Retail Price* in USD'
country='Country' order_date='Date' quantity='Quantity';
sum price quantity;
by sale_type;
format price dollar7.2;
title 'Retail and Quantity Totals for #byval(sale_type) Sales';
run;
options byline;
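Applied to the question's data, a minimal sketch of both reporting routes: PROC PRINT with a SUM statement, and PROC TABULATE with an ALL class level labeled 'Total'. HAVE is rebuilt from the question so the block runs on its own, and neither report step changes it.

```sas
data have;
  length measure $8;
  input measure $ jan_total feb_total;
  datalines;
Startup 100 200
Switcher 300 500
;

/* Totals appear at the bottom of the listing only */
proc print data=have noobs;
  var measure jan_total feb_total;
  sum jan_total feb_total;
run;

/* ALL adds a summary row across every MEASURE value */
proc tabulate data=have;
  class measure;
  var jan_total feb_total;
  table measure all='Total', (jan_total feb_total)*sum;
run;
```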

Different number formats for axis value labels and bar labels in PROC GCHART

Is there any way to specify formats directly for axis values and data labels? As far as I can tell, it uses whatever format is applied to the dependent variable.
Example:
data sample;
input group $ number;
format number dollar6.1;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
axis1 minor=none order=0 to 60 by 10;
proc gchart data=sample;
vbar group/ type=sum sumvar=number sum levels=all raxis=axis1;
run;
If I set the format to dollar6.1, then the axis labels have an unnecessary decimal (0.0, 10.0, 20.0, etc.).
But if I set the format to dollar6.0, then the labels on top of each bar are missing the decimal that I would like to show.
Is there any way to specify formats independently for each of these?
I don't believe you can control the formats separately; you have some specific kinds of control, such as a time axis or a log axis, but otherwise no control over the numeric format.
What you can do is one of two things. First, at least in SGPLOT, you can create a second variable with a different format and produce an empty graph (or an identical copy of your bar chart but with no labels) using that variable, formatted the way you want the axis formatted; then overlay the chart produced from the original variable, which carries the label format.
Second, you can assign explicit values to the axis. Rather than using the automatic values arising from your data, you can use VALUE= to overwrite the labels placed on the tick marks. This isn't ideal if you have a varying axis (i.e., you produce twenty of these with different axis ranges), but if it's a fixed axis then you can probably get away with it. Look at the AXIS statement documentation for GCHART for more information.
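A sketch of the second option for the chart in the question, assuming the fixed 0 to 60 axis: VALUE= on the AXIS statement supplies one text string per major tick, so the axis shows whole dollars while the SUM labels keep NUMBER's DOLLAR6.1 format. The strings are hard-coded, so they have to be kept in step with ORDER=.

```sas
data sample;
  input group $ number;
  format number dollar6.1;   /* used by the SUM labels on the bars */
  cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;

/* One text string per tick in ORDER= (0, 10, ..., 60) */
axis1 minor=none order=(0 to 60 by 10)
      value=('$0' '$10' '$20' '$30' '$40' '$50' '$60');

proc gchart data=sample;
  vbar group / type=sum sumvar=number sum levels=all raxis=axis1;
run;
quit;
```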
How you'd do the first option:
data sample;
input group $ number;
format number dollar6.1;
axis_number = number;
format axis_number dollar6.0;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
proc sgplot data=sample;
vbar group /response=axis_number;
vbar group /response=number datalabel;
yaxis label='Number (sum)';
run;
That creates the bar chart twice, once with axis_number which then defines the axis, and once with number which defines the labels.
You can do this sort of thing using an annotate dataset. I'd give a better explanation of this if I had a solid understanding of how it works, but I use it so rarely that it's usually more of a trial-and-error process:
data sample;
input group $ number;
format number dollar6.0;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
Create the anno data set. I pulled this from the SAS Annotate Dictionary and got rid of extraneous stuff. Set FUNCTION='label' and POSITION='2' to place the labels above the bars, and XSYS='2' and YSYS='2' to base the coordinates on the data values. SIZE and STYLE control the font.
midpoint=group puts the labels on the bars, y=number makes the y coordinate of the label equal the height of the bars, and text is where you specify the value and format of your label.
SAS Annotate Dictionary
data anno;
length function style $12;
retain function 'label' size 1 position '2'
xsys '2' ysys '2' style 'Albany AMT';
set sample;
midpoint=group;
y=number;
text=put(number,dollar6.1);
run;
Make your chart using your current code, but removing the sum and inserting annotate=anno.
axis1 minor=none order=0 to 60 by 10;
proc gchart data=sample;
vbar group/ type=sum sumvar=number annotate=anno levels=all raxis=axis1;
run;
If you're running 9.2 or later, and are happy to use the Graphics Template Language (GTL) then you can do it like this:
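Add a copy of the value that carries a whole-dollar format for the axis, while NUMBER keeps DOLLAR6.1 for the labels:
data sample;
input group $ number;
format number dollar6.1;
axisval=number; * same bar heights as NUMBER...;
format axisval dollar6.0; * ...but formatted without decimals for the axis;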
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
Define the chart:
proc template;
define statgraph mychart;
begingraph;
layout overlay;
barchartparm x=group y=axisval / datalabel=number;
endlayout;
endgraph;
end;
run;
Render the chart using the data we created earlier:
proc sgrender data=sample template=mychart;
run;
The trick here is using the datalabel= option of the barchartparm statement to specify which column contains the values for the labels. There may be some other ways to do this using the GTL and specifying formats but this seemed pretty straightforward to me.
The GTL is included in Base SAS 9.2 onwards I believe.

Color options in GPLOT in SAS

I have a time series with the year on the horizontal axis. Once I have drawn it with PROC GPLOT, I want to divide the graph by year, painting each year in a different color. I have tried putting an IF statement inside the SYMBOL options when defining the color, like this:
symbol
if year=2006 then c=red;
(this is very simplified; it would depend on many more years and so on)
but this doesn't work.
EDITED:
Thanks, everybody, but I don't think I explained myself properly. I
have this code:
PROC GPLOT DATA = work.Datosipppa
;
PLOT IPPPA * date /
OVERLAY
VAXIS=AXIS1
HAXIS=AXIS2
FRAME LEGEND=LEGEND1
href='01jun2006'd '01jun2007'd
;
PLOT2 tasaParoMensual * date = 2 /
OVERLAY
VAXIS=AXIS3
OVERLAY
LEGEND=LEGEND1
;
run;
quit;
and I want to color each of the years in a different color.
I wanted to show you my graph, but I can't post images until I have 10 reputation. :(
In fact, I want to do something like this example:
http://support.sas.com/documentation/cdl/en/graphref/63022/HTML/default/viewer.htm#a003259878.htm
but in GPLOT instead of that procedure.
One straightforward approach is to create a list of colors in the GOPTIONS statement, like this:
goptions reset=all colors=(red yellow green blue purple black);
symbol value=dot;
proc gplot data=sashelp.cars;
plot horsepower * enginesize = type;
run;
quit;
You will need to review the output carefully to make sure the years match the colors you want.
Another way is to specify separate SYMBOL statements for each group you are plotting. Try the example below, which is a stripped-down version of your code. You will need to create a YEAR variable and include it in the PLOT statement so that each year is assigned to a different SYMBOL statement and color.
goptions reset=all;
*** GENERATE TEST DATA ***;
data have;
do date = '01Jun2005'd to '01aug2007'd;
ipppa = ranuni(123456);
tasaParoMensual = 10 + rannor(123456) ;
year = year(date);
output;
end;
run;
*** SYMBOLS 1-3 ARE USED IN THE FIRST PLOT STATEMENT TO SYMBOLIZE THE THREE YEARS IN THE DATA ***;
symbol1 value=dot color=red;
symbol2 value=dot color=green;
symbol3 value=dot color=yellow;
*** SYMBOL4 IS USED IN THE PLOT2 STATEMENT ***;
symbol4 value=star color=black i=join;
proc gplot data=have;
plot ipppa * date = year /
href='01jun2006'd '01jun2007'd
;
plot2 tasaParoMensual * date ;
run;
quit;
Hope that helps.

drawing histogram and boxplot in SAS

I wrote the following code in SAS, but I did not get the result I expected!
The resulting histogram is grey, and the range of the data is not as I specified. What is the problem?
I also got the following warning: WARNING: The MIDPOINTS= list was extended to accommodate the data
And what about the color?
axis1 order=(0 to 100000 by 50000);
axis2 order=(0 to 100 by 5);
run;
proc capability data=HW2 noprint;
histogram Mvisits/midpoints=0 to 98000 by 10000
haxis=axis1
cfill=blue;
run;
I have the same problem with a box plot. For example, I got a plot where I want to change the spacing so that I can see it better, but I could not.
The code below is for PROC UNIVARIATE rather than PROC CAPABILITY. I do not have access to SAS/QC to test, but the user guide shows very similar syntax for the HISTOGRAM statement. Hopefully, you'll be able to translate it back.
It looks like you are having problems with the colour due to your output system. Your graphs are probably delivered via ODS Graphics, in which case the CFILL= option does not apply (see here and note the Traditional Graphics tag).
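Conversely, if you would rather keep CFILL=, a sketch is to switch ODS Graphics off so that PROC UNIVARIATE falls back to traditional graphics, where CFILL= applies. As far as I know traditional graphics output needs SAS/GRAPH, and I have not tried this with PROC CAPABILITY; SASHELP.CARS is used only as test data, as in the other examples here.

```sas
ods graphics off;   /* use traditional (SAS/GRAPH) output */

proc univariate data=sashelp.cars noprint;
  histogram mpg_city / cfill=blue;   /* honored by traditional graphics */
run;

ods graphics on;    /* restore ODS Graphics for later steps */
```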
To change the colour of the histogram bars in ODS output you can use proc template:
proc template;
define style styles.testStyle;
parent = styles.htmlblue;
style GraphDataDefault /
color = green;
end;
run;
ods listing style = styles.testStyle;
proc univariate data = sashelp.cars;
histogram mpg_city;
run;
An example explaining this can be found here.
Alternatively you can use proc sgplot to create a histogram with more control of the colour as follows:
proc sgplot data = sashelp.cars;
histogram mpg_city / fillattrs = (color = red);
run;
As to your question about truncating the histogram: it doesn't really make a great deal of sense to ignore the extreme values, as that will give you an erroneous image of the distribution, which somewhat defeats the purpose of the histogram. That said, you can achieve what you are asking for with a bit of a hack:
data tempData;
set sashelp.cars;
tempClass = 1;
run;
proc univariate data = tempData noprint;
class tempClass;
histogram mpg_city / maxnbin = 5 endpoints = 0 to 25 by 5;
run;
In the above, a dummy class variable tempClass is created, and comparative histograms are then requested with the CLASS statement. MAXNBIN= limits the number of bins displayed, but only in a comparative histogram.
Your other option is to exclude (or cap) your extreme points before creating the histogram, but this will lead to slightly erroneous frequency counts/percentages/bar heights.
data tempData;
set sashelp.cars;
mpg_city = min(mpg_city, 20);
run;
proc univariate data = tempData noprint;
histogram mpg_city / endpoints = 0 to 25 by 5;
run;
This is a possible approach to the original question (untested, as I have no SAS/QC or data):
proc capability data = HW2 noprint;
histogram Mvisits /
midpoints = 0 to 300000 by 10000
noplot
outhistogram = histData;
run;
proc sgplot data = histData;
vbar _MIDPT_ /
response = _OBSPCT_
fillattrs = (color = blue);
where _MIDPT_ <= 100000;
run;

SAS: Box-and-Whisker Plots using multiple datasets

My objective is to create a box-and-whisker plot using data from multiple datasets. Important: the datasets are not all the same size; I am not sure whether this can be an issue. I'm trying the following code:
%macro plot;
%do i=1 %to 10;
ods graphics on;
title 'Box Plot for Durations';
proc boxplot data=d&i; /*where d&i refers to my datasets*/
plot durations / /* HERE I am also having difficulties: PLOT needs y*x, but I only have y (durations), the variable I want to box-plot; my x should be the dataset each value comes from. */
boxstyle = schematic
nohlabel;
label durations = 'Durations';
run;
%end;
%mend plot;
%plot;
I want my x values to refer to the dataset that each duration value comes from. d1, d2, d3, ..., d10 are ten different datasets corresponding to 10 different firms. Therefore, I want 10 box plots in one graph. Any insights?
I figured that the best approach was simply to take all the data I wish to plot from my datasets and combine them into one data set. I created a unique ID for each dataset before combining the data. Then it's easy to box-plot the data:
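The stacking step itself could be sketched like this. It assumes the firm data sets are named D1, D2, ... as in the question, and uses the INDSNAME= option of the SET statement (SAS 9.2 and later) to record which data set each row came from, instead of adding an ID to each data set beforehand. The three small toy data sets of different sizes are made up purely so the block runs; with the real data you would write SET d1-d10.

```sas
/* Hypothetical toy data: three "firms" with different numbers of rows */
data d1; do k = 1 to 20; durations = 10 * ranexp(1); output; end; drop k; run;
data d2; do k = 1 to 35; durations = 15 * ranexp(2); output; end; drop k; run;
data d3; do k = 1 to 12; durations =  5 * ranexp(3); output; end; drop k; run;

/* Stack the data sets; SRC holds e.g. WORK.D1 for each row */
data all_data;
  length id $32;
  set d1-d3 indsname=src;
  id = scan(src, 2, '.');   /* keep just the member name, e.g. D1 */
run;

/* Rows are already grouped by ID in D1, D2, D3 order */
proc boxplot data=all_data;
  plot durations*id / boxstyle=schematic nohlabel;
  label durations='Durations';
run;
```

Unequal data set sizes are not a problem here; each box simply summarizes however many rows that firm has.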
title 'Box Plot for Durations';
proc boxplot data=ALL_DATA;
plot durations*id /
boxstyle = schematic
nohlabel;
label durations = 'Durations';
run;