I've got a panel of three histograms and I've been able to figure out how to tweak all of the formatting except for one thing: getting the ticks to be the endpoints for the bins, instead of the midpoints.
I know that in 'proc univariate,' one can use an 'endpoints=' option in the histogram statement.
However, I cannot find a similar statement in the documentation for 'proc sgpanel'.
Here is my code:
ods graphics on;
title "Baseline";
proc sgpanel data=baseline;
panelby scrp_cohort2 / rows=3 layout=rowlattice;
histogram pt_eq5d3l_health_state / boundary=lower group=scrp_cohort2;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10);
run;
ods graphics off;
Specify a colaxis offsetmin and offsetmax that are 1/2 the bin width (expressed as a fraction of the axis length).
Example:
Three SGPANEL runs to compare and contrast. The final one is the one you want.
data have;
call streaminit(2021);
do panel = 1 to 3;
do _n_ = 1 to 100 + rand('integer',50);
id + 1;
group = rand('integer',3);
do time = 0 to 10;
status = rand('integer',0,100);
output;
end;
end;
end;
stop;
run;
ods html file='gfx.html';
ods graphics on/ height=400 width=500;
title "Baseline";
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
run;
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 20);
run;
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
colaxis grid values=(0 to 100 by 10)
offsetmin=0.05
offsetmax=0.05
;
run;
ods graphics off;
ods html close;
The issue here is that you're trying to manipulate a histogram, which is not a discrete-values chart, even though it looks like one. VBAR, for example, offers a DISCRETEOFFSET option that would let you do exactly what you ask.
A histogram, however, plots continuous (not discrete) values on an x/y axis, just in a particular way that ends up looking like a bar chart. So it won't let you move the labels around, because they're not just labels: they're fixed positions on a continuous axis, and the histogram collapses the data points into bins around them.
Unfortunately, the endpoints option isn't available for PROC SGPANEL, which of course would be how you'd ideally solve this issue. You have a couple of options for what would work, depending on what you want to do exactly and what your data look like.
First, you can simply summarize your data using proc univariate or whatever works best, and then use vbar to graph the (now discrete) data. You can get a histogram dataset out of proc univariate easily enough (with ODS OUTPUT or the OUTHISTOGRAM= option), using a BY statement for your group/panel values, and then you can graph that with VBAR in SGPANEL.
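A minimal sketch of that first option, assuming Richard's HAVE data set from the example above (PANEL, GROUP, TIME and STATUS with values 0 to 100). The names HAVE0 and HIST are made up for illustration. With ENDPOINTS=, the OUTHISTOGRAM= data set stores each interval's lower edge in _MINPT_; BARWIDTH=1 with DISCRETEOFFSET=0.5 then shifts each bar half a category to the right so its tick sits at the bar's left edge. Treat it as a starting point and check the bar placement in your SAS release.

```sas
/* Keep the baseline rows and sort them for BY processing */
proc sort data=have(where=(time=0)) out=have0;
  by panel group;
run;

/* Bin STATUS per PANEL and GROUP. NOPLOT skips the graph and ODS SELECT NONE
   hides the printed tables; the OUTHISTOGRAM= data set is still written. */
ods select none;
proc univariate data=have0;
  by panel group;
  var status;
  /* Assuming the default left-closed intervals, 110 is used as the last
     endpoint so that values of exactly 100 fall inside the last interval */
  histogram status / endpoints=0 to 110 by 10 outhistogram=hist noplot;
run;
ods select all;

/* HIST is now discrete: one row per PANEL, GROUP and interval (_MINPT_) */
proc sgpanel data=hist;
  panelby panel / rows=3 layout=rowlattice;
  vbar _minpt_ / response=_obspct_ group=group
                 barwidth=1 discreteoffset=0.5;
  colaxis label='status (lower endpoint of interval)';
  rowaxis label='Percent within panel and group';
run;
```

Note that _OBSPCT_ is a percentage within each PANEL*GROUP combination, which may differ from how the SGPANEL histogram scales grouped bars.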
Second, you can make some adjustments to how things are done in SGPANEL, which might be enough for your needs. Look at the following graph, using Richard's example data.
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group binstart=-5 binwidth=10;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10) ;
run;
What it does is put the midpoint of the first bin at -5 (BINSTART= specifies the first bin's midpoint), so with BINWIDTH=10 the bin edges fall at -10, 0, 10, 20 and so on, while the colaxis still starts at zero. That now does what you want, I think, except that 0 itself ends up in the -5 bar, which you might not want. The bins are now centered at 5/15/25/35/etc., which is hopefully what you do want. If you do have 0 in your data, you may be able to use options to move where 0 is bucketed (but it would affect all of the other exact endpoints also).
This is what that looks like with the 0's removed. If there are actual 0's, then you would have a bar to the left of the plot area, though.
Here is the same thing but with 0's in it, which you'll note means a bar to the left of 0.
This is a similar plot, but with 0's allowed and with boundary=upper, which moves every value that falls exactly on a bin boundary into the upper bin (so 0 goes to the 0-10 bin). Note the other changes: there is now a 100-110 bar, which contains the 100 values.
Code for the latter chart (earlier chart is same but boundary=lower):
title "Baseline";
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / group=group binstart=-5 binwidth=10 boundary=upper;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10) ;
run;
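To see why 0 and 100 move between bars, here is a small DATA step that mimics the boundary rule as described above. It is a hypothetical illustration of the arithmetic, not SGPANEL's internal code: with BINSTART=-5 (the first bin's midpoint) and BINWIDTH=10 the first bin's lower edge is -10, and a value sitting exactly on an edge goes to the bin below it under BOUNDARY=LOWER and to the bin above it under BOUNDARY=UPPER. The data set and variable names are made up.

```sas
data bin_check;
  binstart = -5;                          /* midpoint of the first bin */
  binwidth = 10;
  first_edge = binstart - binwidth / 2;   /* lower edge of the first bin: -10 */
  do value = 0, 10, 55, 100;
    /* position of VALUE in bin widths, counted from FIRST_EDGE */
    pos = (value - first_edge) / binwidth;
    /* BOUNDARY=UPPER: an on-edge value starts the bin above the edge */
    mid_upper = first_edge + binwidth * (floor(pos) + 0.5);
    /* BOUNDARY=LOWER: an on-edge value closes the bin below the edge */
    mid_lower = first_edge + binwidth * (ceil(pos) - 0.5);
    output;
  end;
  keep value mid_lower mid_upper;
run;

proc print data=bin_check noobs;
run;
```

With these settings 0 lands in the bar centred at -5 under BOUNDARY=LOWER but at 5 under BOUNDARY=UPPER, and 100 lands at 95 versus 105, which is the extra 100-110 bar; 55 is not on an edge, so both rules put it in the bar centred at 55.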
I am trying to learn SAS and specifically PROC REPORT. I am using the SASHELP.CARS data set.
In the 6th column of the output, labelled 'Number of Cars > Mean(Invoice)', I want to compute the number of cars whose Invoice is greater than the group's mean Invoice. I am using the code below.
PROC REPORT DATA=sashelp.CARS NOWD OUT=learning.MyFirstReport;
COLUMNS Type Origin INVOICE=Max_INVOICE INVOICE=Mean_Invoice
INVOICE=Count_Invoice TEST DriveTrain;
DEFINE Type / Group 'Type of Car' CENTER;
DEFINE Origin / Group 'Origin of Car' CENTER;
DEFINE Max_Invoice / ANALYSIS MAX 'Max of Invoice';
DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice';
DEFINE Count_Invoice / ANALYSIS N FORMAT=5.0 'Total Number of Cars' center;
DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
DEFINE TEST / COMPUTED 'Number of Cars > Mean(Invoice)' center;
COMPUTE TEST;
TEST=N(_c7_>Mean_Invoice);
ENDCOMP;
RUN;
The Output that I am getting is in the image below.
Output of the above SAS code
I don't think that is the correct output since all the rows in the column show a value of 1. How do I get the desired output in the 6th column of the output?
The non-group columns are defined as ANALYSIS columns, which compute aggregate statistics. One way to get a count of a logical evaluation is to prep the data so that the SUM of an individual 0/1 flag equals the number of rows where the condition is true.
Prepare
proc sql;
create view cars_v as
select *
, mean(invoice) as invoice_mean_over_type_origin
, (invoice > calculated invoice_mean_over_type_origin) as flag_50
from sashelp.cars
group by type, origin
;
quit;
Report
PROC REPORT DATA=CARS_V OUT=work.MyFirstReport;
COLUMNS
Type
Origin
INVOICE/*=Max_INVOICE */
INVOICE=INVOICE_use_2/*=Mean_Invoice */
flag_50
flag_50=flag_50_use_2
flag_50_other
DriveTrain
;
DEFINE Type / Group 'Type of Car' CENTER;
DEFINE Origin / Group 'Origin of Car' CENTER;
DEFINE Invoice / ANALYSIS MAX 'Max of Invoice';
DEFINE Invoice_use_2 / ANALYSIS MEAN 'Mean of Invoice';
DEFINE flag_50 / analysis sum 'Number of Cars > Mean ( Invoice )' center;
DEFINE flag_50_use_2 / noprint analysis N ;
* noprint makes a hidden column whose value is available to compute blocks;
DEFINE flag_50_other / computed 'Number of Cars <= Mean ( Invoice )' center;
DEFINE DriveTrain / ACROSS 'Type of DriveTrain of Car';
compute flag_50_other;
flag_50_other = flag_50_use_2 - flag_50.sum;
endcomp;
RUN;
In newer versions of SAS, NOWD is the default. New PROC REPORT code does not need to specify it explicitly.
Reusing a variable such as invoice=mean_invoice is OK, but a future reader of the code might be confused on seeing the DEFINE Mean_Invoice / ANALYSIS MEAN 'Mean of Invoice'; line of code: is the define for the mean, or for the mean of a mean?
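If you want to check the report's numbers independently, PROC SQL can recount them. This minimal sketch reuses the idea behind the view above (a true/false comparison evaluates to 1 or 0, so its SUM is a count); the table name CHECK_COUNTS is made up. Its N_ABOVE_MEAN column should match the 'Number of Cars > Mean ( Invoice )' column of the report, row for row.

```sas
proc sql;
  create table check_counts as
  select type,
         origin,
         count(*)                as n_cars,
         /* (invoice > mean_inv) is 1 or 0, so the sum counts cars above the mean */
         sum(invoice > mean_inv) as n_above_mean
  from (select type, origin, invoice,
               /* remerged TYPE*ORIGIN mean, attached to every row */
               mean(invoice) as mean_inv
        from sashelp.cars
        group by type, origin)
  group by type, origin
  order by type, origin;
quit;
```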
I want to make a Y-by-X plot grouped by year, but color-code each year based on a different variable (dry). So each year shows as a separate line, but dry=1 years plot in one color and dry=0 years plot in a different color. I actually figured out one option (yeah!), which is below. But this doesn't give me much control.
Is there a way to put a where clause in the series statement to select specific categories so that I can specifically assign a color (or other format)? Or is there another way? This would be analogous to R where one can use multiple line statements for different subsets of data.
Thanks!!
This code works.
proc sgplot data = tmp;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Use an attribute map (see the documentation).
You can use the DRY variable to set the specific colours. For each year, assign the colour using the DRY variable in a data step.
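proc sort data=tmp(keep=year dry) out=attr_data nodupkey; by year; run;
data attrs;
set attr_data;
length id $ 8 value $ 12 linecolor markercolor $ 8;
id='year';
value=cats(year); /* an attribute map needs a VALUE column holding the YEAR group values */
if dry=0 then linecolor='green';
if dry=1 then linecolor='red';
markercolor=linecolor; /* markers match their line */
keep id value linecolor markercolor;
run;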
Then add DATTRMAP=attrs to the PROC SGPLOT statement and ATTRID=year to the SERIES statement options.
ods graphics / attrpriority=none;
proc sgplot data = tmp dattrmap=attrs;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year attrid=year;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Note that I tested and edited this post so it should work now.
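For the other part of the question (an R-style approach with one line statement per subset): SGPLOT has no per-statement WHERE, but a common workaround is to split the response into one variable per subset and use one SERIES statement for each. A sketch, assuming the TMP data set (sorted by MICROSITE_ID, with DRY, YEAR, JDAY and TPCTWETT) and the &MSIT macro variable from the question; WETT_DRY and WETT_WET are made-up names. It relies on explicitly set LINEATTRS/MARKERATTRS colors overriding the group colors, which as far as I know holds in SAS 9.3 and later; verify in your release.

```sas
data tmp_split;
  set tmp;
  /* Neither new variable is retained, so both start missing on every row;
     only the one matching DRY receives the value */
  if dry = 1 then wett_dry = tpctwett;
  else if dry = 0 then wett_wet = tpctwett;
run;

proc sgplot data=tmp_split;
  where microsite_id = "&msit";
  by microsite_id;
  yaxis label="Pct. Stakes" values=(0 to 100 by 20);
  xaxis label='Date' values=(121 to 288 by 15);
  /* Still one line per YEAR, but each statement gets its own fixed color */
  series y=wett_dry x=jday / group=year markers
         lineattrs=(color=red pattern=solid)
         markerattrs=(symbol=plus color=red);
  series y=wett_wet x=jday / group=year markers
         lineattrs=(color=green pattern=solid)
         markerattrs=(symbol=plus color=green);
run;
```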
Sample SAS PROC REPORT code is below. I want to change the header background and foreground colors for one value of the across variable. I used a compute block to change the column background to light gray using the absolute column references c4 and c5. How do I change the header style attributes for c4 and c5 to background=gainsboro and foreground=black?
data test;
length name $ 10 disease $ 10.;
infile datalines dsd;
input name $ disease cases rate;
datalines;
State,Fever,4847,25.16
State,Cold,25632,131.5
State,Flu,103825,535.82
Lincoln,Fever,3920,44.17
Lincoln,Cold,16913,190.18
Lincoln,Flu,62965,735.39
Washington,Fever,827,56.56
Washington,Cold,3609,234.26
Washington,Flu,16610,1078.8
Kings,Fever,1026,37.45
Kings,Cold,4984,181.85
Kings,Flu,18388,694.33
Sussex,Fever,1411,78.38
Sussex,Cold,5515,300.46
Sussex,Flu,13881,813.11
Queens,Fever,616,26.03
Queens,Cold,2496,107.75
Queens,Flu,12518,558.09
;
run;
proc report data=test nowd headline headskip
STYLE(Header)={background=charcoal foreground=white }
style(column)={background=gray foreground=black}
style(report)=[rules=rows bordercolor=white];
columns (name disease,(cases rate));
define name/group order=data 'County' style(column)={background=lighttgray} style(header)=[bordertopcolor=gainsboro background=gainsboro foreground=black];
define disease/across '' order=data ;
define cases/'Cases' format=comma9. ;
define rate/'Rate' format=comma12.1 ;
compute cases;
call define('_c4_','style','style={background=lighttgray}');
call define('_c5_','style','style={background=lighttgray}');
endcomp;
run;
quit;
run;
You can use formats to do something close to what you're asking, but I'm not sure it's possible to do exactly what you're asking. Maybe Cynthia Zender on communities.sas.com would know?
data test;
length name $ 10 disease $ 10.;
infile datalines dsd;
input name $ disease cases rate;
datalines;
State,Fever,4847,25.16
State,Cold,25632,131.5
State,Flu,103825,535.82
Lincoln,Fever,3920,44.17
Lincoln,Cold,16913,190.18
Lincoln,Flu,62965,735.39
Washington,Fever,827,56.56
Washington,Cold,3609,234.26
Washington,Flu,16610,1078.8
Kings,Fever,1026,37.45
Kings,Cold,4984,181.85
Kings,Flu,18388,694.33
Sussex,Fever,1411,78.38
Sussex,Cold,5515,300.46
Sussex,Flu,13881,813.11
Queens,Fever,616,26.03
Queens,Cold,2496,107.75
Queens,Flu,12518,558.09
;
run;
proc format;
value $headerbackf
'Cold' = 'gainsboro'
other = 'charcoal';
value $headerforef
'Cold' = 'black'
other = 'white'
;
quit;
proc report data=test nowd headline headskip
STYLE(Header)={background=charcoal foreground=white }
style(column)={background=gray foreground=black}
style(report)=[rules=rows bordercolor=white];
columns (name disease,(cases rate));
define name/group order=data 'County' style(column)={background=lightgray} style(header)=[bordertopcolor=gainsboro background=gainsboro foreground=black];
define disease/across '' order=data style(header)={background=$HEADERBACKF. foreground=$HEADERFOREF.};
define cases/'Cases' format=comma9. style(header)=inherit;
define rate/'Rate' format=comma12.1 ;
compute cases;
call define('_c4_','style','style={background=lightgray}');
call define('_c5_','style','style={background=lightgray}');
endcomp;
run;
That gets the top row formatted, but doesn't actually get the row you're asking for. I'm not sure that's possible.
It's possible, as @ChrisJ noted, that you might be able to do this with CSS styles and nth-child selection. It's also possible that you can't, unfortunately, because of how SAS builds PROC REPORT output: everything, including the header rows, gets put inside <tr>s, so nth-child and sibling selectors can't relate one header cell to another, because the headers are not children or siblings of each other.
Here's a kludgey example of this, using sashelp.cars.
CSS: (save in a .css file on your drive somewhere, say "c:\temp\test.css"):
@import 'base.css';
/* Make the second (really third) column header red */
.table thead tr:nth-child(2) th:nth-child(3) {
color:red
}
/* Yellow background for the mpg headers under Europe */
.table thead tr:nth-child(3) th:nth-child(4),
.table thead tr:nth-child(3) th:nth-child(5)
{
background-color:yellow
}
/* Make the MPG (City) headers green */
.table thead tr:nth-child(3) th:nth-child(even) {
color:green
}
SAS program: (assumes the above-saved CSS file)
ods html file='example.html' cssstyle='c:\temp\test.css'(html);
ods pdf file='example.pdf' cssstyle='c:\temp\test.css'(print);
proc sort data=sashelp.cars out=cars; by origin;
run;
proc report data=cars nowd;
columns type origin,(mpg_city mpg_highway);
define origin/across;
define type/group;
define mpg_City / analysis mean;
define mpg_highway / analysis mean;
run;
ods _all_ close;
This is partially based on Kevin Smith's paper "Unveiling the Power of Cascading Style Sheets (CSS) in ODS".
Unfortunately, we can't in any way identify a cell that has "MPG(City)" in it except by knowing they'll be even column numbers. We similarly can't identify a cell under a "Europe" except by knowing what cells those will be.
Try adding a dummy column _c to the end of your columns statement, and add a define & compute to go with it.
Also, ensure your colour names are actually valid, e.g. lighttgray is invalid and will not work.
columns ... _c ;
define _c / computed noprint ;
compute _c ;
call define('_c4_','style','style={background=lightgray}');
call define('_c5_','style','style={background=lightgray}');
endcomp ;
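Put together with the TEST data set from the question, a runnable sketch looks like the one below. The report styles are copied from the question; the column numbering assumes the across values come out in data order (Fever, Cold, Flu), which makes _c4_ and _c5_ the Cold cases and rate columns. This shades only the data cells of those columns; it does not change their headers.

```sas
proc report data=test nowd
  style(header)={background=charcoal foreground=white}
  style(column)={background=gray foreground=black}
  style(report)=[rules=rows bordercolor=white];
  columns name disease,(cases rate) _c;
  define name    / group order=data 'County';
  define disease / across '' order=data;
  define cases   / analysis sum 'Cases' format=comma9.;
  define rate    / analysis sum 'Rate' format=comma12.1;
  /* Dummy column: never printed; placed last so its COMPUTE block runs
     after every other column on the row */
  define _c      / computed noprint;
  compute _c;
    /* _c4_ and _c5_ are absolute column numbers: Cold cases and Cold rate */
    call define('_c4_', 'style', 'style={background=lightgray}');
    call define('_c5_', 'style', 'style={background=lightgray}');
  endcomp;
run;
```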
I have a time series with the year on the horizontal axis. Once I have drawn it with PROC GPLOT, I want to divide the graph by year, painting each year in a different color. I have tried to use an IF statement inside the SYMBOL statement when defining the color, like this:
symbol
if year=2006 then c=red;
(this is very simplified, it would depend on much more years and all this stuff)
but this doesn't work.
EDITED:
Thanks everybody, but I think I didn't explain myself properly. I have this code:
PROC GPLOT DATA = work.Datosipppa
;
PLOT IPPPA * date /
OVERLAY
VAXIS=AXIS1
HAXIS=AXIS2
FRAME LEGEND=LEGEND1
href='01jun2006'd '01jun2007'd
;
PLOT2 tasaParoMensual * date = 2 /
OVERLAY
VAXIS=AXIS3
OVERLAY
LEGEND=LEGEND1
;
run;
quit;
and I want to color each of the years in a different colour.
I wanted to show you my graph, but I can't post images until I have 10 reputation :(
In fact, I want to do something equivalent to this example:
http://support.sas.com/documentation/cdl/en/graphref/63022/HTML/default/viewer.htm#a003259878.htm
but in GPLOT instead of the procedure used there.
One straightforward approach is to create a list of colors in the GOPTIONS statement, like this:
goptions reset=all colors=(red yellow green blue purple black);
symbol value=dot;
proc gplot data=sashelp.cars;
plot horsepower * enginesize = type;
run;
quit;
You will need to review the output carefully to make sure the years match the colors you want.
Another way is to specify separate SYMBOL statements for each group you are plotting. Try the example below, which is a stripped-down version of your code. You will need to create a YEAR variable and include it in the PLOT statement so each year is assigned to a different SYMBOL statement / color.
goptions reset=all;
*** GENERATE TEST DATA ***;
data have;
do date = '01Jun2005'd to '01aug2007'd;
ipppa = ranuni(123456);
tasaParoMensual = 10 + rannor(123456) ;
year = year(date);
output;
end;
run;
*** SYMBOLS 1-3 ARE USED IN THE FIRST PLOT STATEMENT TO SYMBOLIZE THE THREE YEARS IN THE DATA ***;
symbol1 value=dot color=red;
symbol2 value=dot color=green;
symbol3 value=dot color=yellow;
*** SYMBOLS 4 IS USED IN THE PLOT2 STATEMENT ***;
symbol4 value=star color=black i=join;
proc gplot data=have;
plot ipppa * date = year /
href='01jun2006'd '01jun2007'd
;
plot2 tasaParoMensual * date ;
run;
quit;
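If the real data cover many more years, typing one SYMBOL statement per year gets tedious. Below is a sketch of a small macro that writes them, assuming the HAVE data set above; the macro name YEAR_SYMBOLS and its COLORS= list are made up. It counts the distinct years, cycles through the colour list if there are more years than colours, and then defines the next SYMBOL number for PLOT2, following the numbering used above (SYMBOL4 after three years).

```sas
/* Number of distinct years in the data */
proc sql noprint;
  select count(distinct year) into :n_years trimmed
  from have;
quit;

%macro year_symbols(colors=red green yellow blue purple orange);
  %local i ncolors idx next;
  %let ncolors = %sysfunc(countw(&colors));
  %do i = 1 %to &n_years;
    /* Wrap around the colour list when there are more years than colours */
    %let idx = %eval(%sysfunc(mod(%eval(&i - 1), &ncolors)) + 1);
    symbol&i value=dot color=%scan(&colors, &idx);
  %end;
  /* The PLOT2 statement uses the next SYMBOL number */
  %let next = %eval(&n_years + 1);
  symbol&next value=star color=black i=join;
%mend year_symbols;

/* RESET=ALL also clears SYMBOL statements, so run it before the macro */
goptions reset=all;
%year_symbols()

proc gplot data=have;
  plot ipppa * date = year / href='01jun2006'd '01jun2007'd;
  plot2 tasaParoMensual * date;
run;
quit;
```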
Hope that helps.