Order of data displayed in a SAS proc GCHART HBAR statement - sas

I have the following, but I wish to control the order in which the data is displayed. Instead of displaying the bars in the order of A, B, C, D, E, F, I wish to display the bars based on a user-specified ordering. For example, I would like to be able to assign in a SAS dataset a value to a variable named rank that will control the order in which the bars are stacked.
How can I do this?
%let name=ex_17;
%let myfont=Albany AMT;
goptions reset=all;
goptions reset=(global goptions);
/*GOPTIONS DEVICE=png xpixels=800 ypixels=400;*/
goptions gunit=pct border cback=white colors=(blacks) ctext=black
htitle=4 htext=3.0 ftitle="&myfont" ftext="&myfont";
data mileage;
length factor $ 24;
input factor $ level $ value;
datalines;
C left -38.882
C right 39.068
D right 38.99
D left -38.97
E right 38.982
E left -38.975
F left -38.973
F right 38.979
B left -38.975
B right 38.975
A right 38.977
A left -38.973
;
/* base case: 38.975 */
data mileage;
set mileage;
if level="right" then value = value - 38.975;
if level="left" then value = -1*(38.975 - value*-1);
run;
data convert;
set mileage;
*if level='left' then value=-value;
run;
proc format;
picture posval low-high='000,009';
run;
data anlabels(drop=factor level value);
length text $ 24;
retain function 'label' when 'a' xsys ysys '2' hsys '3' size 2;
set convert;
midpoint=factor; subgroup=level;
*text=left(put(value, BEST6.3));
if level ='left' then position='>';
else position='<'; output;
run;
title1 'Sensitivity Analysis graph';
*footnote1 justify=left ' SAS/GRAPH' move=(+0,+.5) 'a9'x move=(+0,-.5) ' Software'
justify=right 'DRIVER ';
*title2 'by Daniel Underwood' h=3.0;
footnote1 'Estimates accurate within +/- 0.002';
*axis1 label=(justify=left 'Disutility') style=0 color=black;
axis1 label=(justify=left '') style=0 color=black;
*axis2 label=none value=(tick=3 '') minor=none major=none
width=3 order=(-10000 to 20000 by 10000) color=black;
axis2 label=none minor=none major=none value=(tick=3 '')
width=3 order=(-0.093 to 0.093 by 0.186) color=black;
pattern1 value=solid color=ltgray;
pattern2 value=solid color=ltgray;
/*
goption vpos=25;
goptions vsize=5in;
*/
proc gchart data=convert;
format value BEST6.3;
note move=(40,90) height=3 'Women' move=(+12,+0) 'Men';
hbar factor / sumvar=value discrete nostat subgroup=level
maxis=axis1 raxis=axis2 nolegend annotate=anlabels
coutline=same des='' space=2;
run;
quit;

The order of values displayed is controlled by the ORDER= option on either an AXIS statement (to order midpoints or the chart variable) or a LEGEND statement (to order values of a sub-group variable).
If you are asking for a way to use a variable named RANK to control the order for sub-group variables, here is a SAS sample program that does exactly that.
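As a minimal sketch of the ORDER= approach for the midpoint axis: assume a hypothetical data set named ranks that assigns a rank to each factor value (it is not part of the question, and the rank values below are made up for illustration). The ordered list of midpoints can be built from that data set and handed to the AXIS statement that MAXIS= already points to.

```sas
/* Hypothetical lookup table: one row per factor with the desired display rank */
data ranks;
  length factor $ 24;
  input factor $ rank;
  datalines;
C 1
D 2
E 3
F 4
B 5
A 6
;
run;

/* Build a quoted, space-separated list of factor values in rank order */
proc sql noprint;
  select quote(trim(factor)) into :factor_order separated by ' '
  from ranks
  order by rank;
quit;

/* ORDER= on the midpoint axis controls the order in which the bars appear */
axis1 label=(justify=left '') style=0 color=black order=(&factor_order);
```

With this AXIS1 definition in place before PROC GCHART, the existing hbar statement (maxis=axis1) should draw the bars in rank order rather than alphabetically.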

Related

SAS / PROC FREQ TABLES - can I suppress frequencies and percents if frequency is less than a given value?

I'm using tagsets.excelxp in SAS to output dozens of two-way tables to an .xml file. Is there syntax that will suppress rows (frequencies and percents) if the frequency in that row is less than 10? I need to apply that in order to de-identify the results, and it would be ideal if I could automate the process rather than use conditional formatting in each of the outputted tables. Below is the syntax I'm using to create the tables.
ETA: I need those suppressed values to be included in the computation of column frequencies and percents, but I need them to be invisible in the final table (examples of options I have considered: gray out the entire row, turn the font white so it doesn't show for those cells, replace those values with an asterisk).
Any suggestions would be greatly appreciated!!!
Thanks!
dr j
%include 'C:\Users\Me\Documents\excltags.tpl';
ods tagsets.excelxp file = "C:\Users\Me\Documents\Participation_rdg_LSS_3-8.xml"
style = MonoChromePrinter
options(
convert_percentages = 'yes'
embedded_titles = 'yes'
);
title1 'Participation';
title2 'LSS-Level';
title3 'Grades 3-8';
title4 'Reading';
ods noproctitle;
proc sort data = part_rdg_3to8;
by flag_accomm flag_participation lss_nm;
run;
proc freq data = part_rdg_3to8;
by flag_accomm flag_participation;
tables lss_nm*grade_p / crosslist nopercent;
run;
ods tagsets.excelxp close;
D.Jay: Proc FREQ does not contain any options for conditionally masking cells of its output. You can leverage the output data capture capability of the ODS system with a follow-up Proc REPORT to produce the desired masked output.
I am guessing that lss is a skill level and grade_p is a student grade level.
Generate some sample data
data have;
do student_id = 1 to 10000;
flag1 = ranuni(123) < 0.4;
flag2 = ranuni(123) < 0.6;
lss = byte(65+int(26*ranuni(123)));
grade = int(6*ranuni(123));
* at every third lss force data to have a low percent of grades < 3;
if mod(rank(lss),3)=0 then
do until (grade > 2 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
else if mod(rank(lss),7)=0 then
do until (grade < 3 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
output;
end;
run;
proc sort data=have;
by flag1 flag2;
*where lss in ('A' 'B') and flag1 and flag2; * remove comment to limit amount of output during 'learning the code' phase;
run;
Perform the Proc FREQ
Only capture the data corresponding to the output that would have been generated
ods _all_ close;
* ods trace on;
/* trace will log the Output names
* that a procedure creates, and thus can be captured
*/
ods output CrossList=crosslist;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
ods output close;
ods trace off;
Now generate output to your target ODS destination (be it ExcelXP, html, pdf, etc)
First, produce the regular Proc FREQ output for reference; the goal is an equivalent report in which some values are masked.
* regular output of FREQ, to be compared to masked output
* of some information via REPORT;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
Proc REPORT has great features for producing conditional output. The compute block is used to select either a value or a masked value indicator for output.
options missing = ' ';
proc format;
value $lss_report ' '= 'A0'x'Total';
value grade_report . = 'Total';
value blankfrq .b = '*masked*' ._=' ' other=[best8.];
value blankpct .b = '*masked*' ._=' ' other=[6.2];
proc report data=CrossList;
by flag1 flag2;
columns
('Table of lss by grade'
lss grade
Frequency RowPercent ColPercent
FreqMask RowPMask ColPMask
)
;
define lss / order order=formatted format=$lss_report. missing;
define grade / display format=grade_report.;
define Frequency / display noprint;
define RowPercent / display noprint;
define ColPercent / display noprint;
define FreqMask / computed format=blankfrq. 'Frequency' ;
define RowPMask / computed format=blankpct. 'Row/Percent';
define ColPMask / computed format=blankpct. 'Column/Percent';
compute FreqMask;
if 0 <= RowPercent < 10
then FreqMask = .b;
else FreqMask = Frequency;
endcomp;
compute RowPMask;
if 0 <= RowPercent < 10
then RowPMask = .b;
else RowPMask = RowPercent;
endcomp;
compute ColPMask;
if 0 <= RowPercent < 10
then ColPMask = .b;
else ColPMask = ColPercent;
endcomp;
run;
ods html close;
If you have to produce lots of cross listings for different data sets, the code is easily macro-ized.
When I've done this in the past, I've first generated the frequency to a dataset, then filtered out the N, then re-printed the dataset (using tabulate usually).
If you can't recreate the frequency table perfectly from the freq output, you can do a simple frequency, check which IDs or variables or what have you to exclude, and then filter them out from the input dataset and rerun the whole frequency.
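A rough sketch of that "frequency to a dataset, then filter" idea, using the data set and variable names from the question. The output names cell_counts and cell_counts_masked are hypothetical, the threshold of 10 comes from the question, and PROC PRINT stands in for whatever reporting procedure you prefer (the answer above mentions TABULATE).

```sas
/* Write the cell counts to a data set instead of printing them */
proc freq data=part_rdg_3to8 noprint;
  by flag_accomm flag_participation;
  tables lss_nm*grade_p / out=cell_counts;
run;

/* Blank out the small cells (fewer than 10) before re-printing */
data cell_counts_masked;
  set cell_counts;
  if count < 10 then call missing(count, percent);
run;

/* Re-print the masked counts; missing values display as the MISSING= character */
proc print data=cell_counts_masked noobs;
  by flag_accomm flag_participation;
run;
```

Because the masking happens after PROC FREQ has run, the small cells still contribute to the totals and percentages of the remaining cells.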
I don't believe that you can with PROC FREQ, but you can easily replicate your code with PROC TABULATE and you can use a custom format there to mask the numbers. This example sets it to M for missing and N for less than 5 and with one decimal place for the rest of the values. You could also replace the M/N with a space (single space) to have no values shown instead.
*Create a format to mask values less than 5;
proc format;
value mask_fmt
. = 'M' /*missing*/
low - < 5='N' /*less than 5 */
other = [8.1]; /*remaining values with one decimal place*/
run;
*sort data for demo;
proc sort data=sashelp.cars out=cars;
by origin;
run;
ods tagsets.excelxp file='/folders/myfolders/demo.xml';
*values partially masked;
proc tabulate data=cars;
where origin='Asia';
by origin;
class make cylinders;
table make, cylinders*n*f=mask_fmt. ;
run;
ods tagsets.excelxp close;
This was tested on SAS UE.
EDIT: Forgot the percentage piece, so this likely will not work for that, primarily because I don't think you'll get the percentages the same as in PROC FREQ (appearance) so it depends on how important that is to you. The other possibility to accomplish this would be to modify the PROC FREQ template to use the custom format as above. Unfortunately I do not have time to mock this up for you but maybe someone else can. I'll leave this here to help get you started and delete it later on.

How can I add non-overlapping labels for North-Eastern US states in proc gmap?

I am trying to plot two variables on a US map. I would like to show the price of product A and, below it in parentheses, the difference v/s product B. The code is almost finished. The only problem I am facing is that I am unable to place labels for the smaller north-eastern states like New Jersey, Vermont and New Hampshire without them overlapping. I would like something like the attached file, wherein the labels for the above-mentioned states are shown with a line.
Below is the code I have so far.
proc import datafile="../Book8.csv" out=response dbms=csv replace;
run;
proc export data=response outfile="check.csv" dbms=csv replace;
run;
proc sort data=response out=sallx2(drop=Price_B); by STATECODE; run;
proc sort data=maps.us2 out=sus2(keep=STATE STATECODE); by STATECODE; run;
data mapfips;
merge sallx2 (in=a)
sus2 (in=b)
;
by STATECODE;
if a;
run;
data mapfips;
set mapfips;
dummy="$";
dummy1="(";
dummy2=")";
new_Price_A=catx("", of dummy Price_A);
new_Difference=catx("", of dummy1 dummy Difference dummy2);
run;
proc sort data=mapfips out=smapfips; by STATE; run;
proc sort data=maps.uscenter out=suscenter(keep=STATE X Y) nodupkey;
by STATE; run;
data mapfips2;
merge smapfips (in=a)
suscenter (in=b)
;
by STATE;
if a;
run;
data stlabel;
length function $ 8 position $ 1
text $ 20 style $ 30;
set mapfips2;
retain flag 0
xsys ysys '2'
hsys '3' when 'a';
format Difference dollar5.2;
text=new_Difference; style="'Albany AMT'";
color='black'; size=2; position='7'; output;
format Price_A dollar5.2;
text=new_Price_A; style="'Albany AMT'";
color='black'; size=2; position='4'; output;
if ocean='Y' then do;
text=new_Difference; position='6'; output;
function='move';
flag=1;
end;
else if flag=1 then do;
function='draw'; size=2; output;
flag=0;
end;
output;
run;
proc contents data=stlabel;
run;
proc format;
picture Difference_
low - -0.01 = 'negative'
0.00 = 'parity'
0.01 -high = 'positive'
;
run;
proc contents data=response;
pattern1 color=green;
pattern2 color=yellow;
pattern3 color= red;
title 'PRODUCT A V/S PRODUCT B';
proc gmap
data=response
map=maps.us
all;
id STATECODE;
format Difference Difference_.;
choro Difference / discrete annotate=stlabel ;
run;
quit;
Pawan:
You need to understand "Annotation Variables" and "Annotation Functions", as well as the Maps.USCENTER data set.
This code is a modification of the SAS sample "Example 6: Labeling the States on a U.S. Map". The code is more verbose than the example, for explanation, and due to two label lines per state and call-out tweaking.
The USCENTER data has a special feature:
The Ocean variable: when it is Y (yes), there are two rows for the state.
The first row holds a 'safe' X & Y for text, offset from the state's actual geo-center. Use it for placing labels and as the start-point of the call-out line.
The second row holds the X & Y of the actual geo-center, which is the end-point of the call-out line.
The code has these features:
A Flag variable, retained to track whether the current row is the second row of an ocean state's call-out line, in which case the function should be set to 'draw'.
Tweaks for specific states to change the SAS-provided call-out coordinates, for example:
if state2 = 'VT' then do;
* tweak first end-point of call out for VT;
x = 0.27;
y = 0.20;
position1 = 'A'; /* RAD: right aligned 1/2 cell above **/
position2 = 'D'; /* RAD: right aligned 1/2 cell below **/
end;
The code does not create new call-outs where none existed before. You would have to add rows to a copy of the maps.uscenter data to create new call-outs.
/* Original from SAS Example 6: Labeling the States on a U.S. Map */
goptions reset=global gunit=pct border cback=white
colors=(black blue green red)
ftext='Albany AMT' /* RAD: Change default font to 'Albany AMT' */
htitle=6 htext=3
;
data WORK.myTexts;
set maps.uscenter;
by state;
if first.state;
line1 = 'Line 1';
line2 = 'Line 2';
state2 = fipstate(state);
if state2 ne 'DC';
run;
data WORK.map_annotation; /* RAD: use WORK libref instead of REFLIB */
length function $ 8 x y 8 position $1 text $20;
retain
flag 0
xsys ysys '2' /* RAD: coordinate system for drawing, 2 means data values */
hsys '3' /* RAD: coordinate system for heights, 3 means % of graphics output area */
when 'a' /* RAD: annotation occurs after all procedure drawing is done */
style "'Albany AMT'" /* RAD: quoted style value indicates a true type font is being requested for drawn labels */
;
merge
myTexts (in=myAnno)
maps.uscenter (drop=long lat)
;
by state;
if myAnno;
function='label';
size=1.5; /* RAD: size for label is font height in HSYS coordinate system, make it small enough for stacking two labels */
position='B'; /* RAD: text position is centered about X and Y at half cell above Y */
if ocean='Y' then do;
position1 = 'C'; /* RAD: left aligned 1/2 cell above */
position2 = 'F'; /* RAD: left aligned 1/2 cell below */
if state2 = 'VT' then do;
* tweak first end-point of call out for VT;
x = 0.27;
y = 0.20;
position1 = 'A'; /* RAD: right aligned 1/2 cell above */
position2 = 'D'; /* RAD: right aligned 1/2 cell below */
end;
text=catx(':', state2, line1);
position=position1;
output;
text=line2;
position=position2;
output;
function='move'; /* RAD: move the pen to the start of call-out line */
flag=1;
output;
end;
else if flag=1 then do;
/* Dealing with an Ocean state,
* this is the second observation for it (data feature of MAPS.USCENTER)
*/
function='draw'; /* RAD: draw line to the end of the call-out line (which is state geo-center) */
size=.25; /* Size for 'draw' is line thickness */
flag=0;
output;
end;
else do;
/* USCENTER row is neither ocean state, nor ocean state 2nd row */
/* Thus a state is one without a call-out line
* place the annotation at the states center */
text=line1;
position='B'; /* RAD: Center aligned 1/2 cell above */
output;
text=line2;
position='E'; /* RAD: Center aligned 1/2 cell below */
output;
end;
run;
title 'Positioning State Labels with MAPS.USCENTER';
footnote j=r 'GR19N06 ';
pattern1 value=mempty color=blue repeat=50;
proc gmap data=maps.us map=maps.us;
id state;
choro state / nolegend
annotate=WORK.map_annotation;
run;
quit;

SAS Second smallest value

The following code, built using the Summary Statistics task from SAS Enterprise Guide, finds the min of each column of a table.
How can I find the second smallest value?
I tried replacing MIN with SMALLEST(2), but it doesn't work.
Thank you.
TITLE;
TITLE1 "Summary Statistics";
TITLE2 "Results";
FOOTNOTE;
FOOTNOTE1 "Generated by the SAS System (&_SASSERVERNAME, &SYSSCPL) on
%TRIM (%QSYSFUNC(DATE(), NLDATE20.)) at
%TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC MEANS DATA=WORK.SORTTempTableSorted
NOPRINT
CHARTYPE
MIN NONOBS ;
VAR A B C;
OUTPUT OUT=WORK.MEANSummaryStats(LABEL="Summary Statistics for
WORK.QUERY_FOR_TRNSTRANSPOSEDPD__0001")
MIN()=
/ AUTONAME AUTOLABEL INHERIT
;
RUN;
Use the ExtremeValues table from PROC UNIVARIATE:
ods select none;
ods output ExtremeValues=ExtremeValues(where=(loworder=2) drop=high:);
proc univariate data=sashelp.class NEXTRVAL=2;
run;
ods select all;
proc print;
run;
I don't think there's any way to accomplish this within proc means. There are ways using a variety of other procs. The Univariate procedure highlights one method using the Extreme Observations.
https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect058.htm
title 'Extreme Blood Pressure Observations';
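ods select ExtremeObs;
ods output ExtremeObs=ExtremeObs; /* capture the table so PROC PRINT below has a data set to read */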
proc univariate data=BPressure;
var Systolic Diastolic;
id PatientID;
run;
proc print data=ExtremeObs;
run;
I will assume you are interested in all numeric columns. If the IFLIST macro variable is longer than 64k bytes in length due to the number of numeric variables and the length of their names, this code will fail. It should work for all reasonably narrow data sets.
UNTESTED CODE
We get a list of the variables in your data set.
PROC CONTENTS DATA=SORTTempTableSorted OUT=md NOPRINT ;
RUN ;
We use that list to create statements and expressions.
IFLIST is a block of statements that stores the minimum value of fieldname in fieldname_1 and the second-lowest in fieldname_2. If the comparison is LT, we keep distinct values, not necessarily the order statistics. If the comparison is LE and multiple observations share the minimum value, fieldname_1 and fieldname_2 will be equal. I will assume you want distinct values.
MAXLIST is an expression that will resolve to the largest numeric value in the data set :)
MINLIST and MINLIST2 are created for use in RETAIN and KEEP statements.
PROC SQL STIMER NOPRINT EXEC ;
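/* TRIM removes the trailing blanks that pad NAME in the PROC CONTENTS output */
SELECT 'IF ' || TRIM(name) || ' LT ' || TRIM(name) || '_1 THEN DO;' ||
TRIM(name) || '_2=' || TRIM(name) || '_1;' ||
TRIM(name) || '_1=' || TRIM(name) || ';END;ELSE IF ' ||
TRIM(name) || ' LT ' || TRIM(name) || '_2 THEN ' ||
TRIM(name) || '_2=' || TRIM(name),
'MAX(' || TRIM(name) || ')',
TRIM(name) || '_1',
TRIM(name) || '_2'
INTO :iflist SEPARATED BY '; ',
:maxlist SEPARATED BY ',',
:minlist SEPARATED BY ' ',
:min2list SEPARATED BY ' '
FROM md
WHERE type EQ 1
;
Now we get the largest numeric value from the data set:
/* In PROC SQL, <> means "not equal", so the column maxima are combined with the MAX function instead. */
/* This assumes at least two numeric variables, as in VAR A B C. */
SELECT MAX(&maxlist)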
INTO :maxval
FROM SORTTempTableSorted
;
QUIT ;
Now we do the work. The END option sets "eof" to 1 on the last observation, which is the only time we want to write a record to the output data set.
DATA min2 ;
SET SORTTempTableSorted END=eof;
RETAIN &minlist &min2list &maxval;
KEEP &minlist &min2list ;
&iflist ;
IF eof THEN
OUTPUT ;
RUN ;
A data step solution that should work for any number of columns without ever running into macro limitations:
proc sql noprint;
select count(*) into :NUM_COUNT from dictionary.columns
where LIBNAME='SASHELP' and MEMNAME = 'CLASS' and TYPE = 'num';
quit;
data class_min2;
do until(eof);
set sashelp.class end = eof;
array min2[&NUM_COUNT,2] _temporary_;
array nums[*] _numeric_;
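do _n_ = 1 to &NUM_COUNT;
if missing(nums[_n_]) then continue; /* ignore missing values */
if missing(min2[_n_,1]) or nums[_n_] < min2[_n_,1] then do;
/* new minimum: the previous minimum becomes the second-smallest value */
min2[_n_,2] = min2[_n_,1];
min2[_n_,1] = nums[_n_];
end;
else if nums[_n_] > min2[_n_,1] then min2[_n_,2] = min(nums[_n_],min2[_n_,2]);
end;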
end;
do _iorc_ = 1 to 2;
do _n_ = 1 to &NUM_COUNT;
nums[_n_] = min2[_n_,_iorc_];
end;
output;
end;
keep _NUMERIC_;
run;
This outputs the two lowest distinct values of each numeric variable, without transposing the data in the same way that proc univariate does. You can easily allow for duplicate minimum values with a bit of tweaking.
Sort the values in ascending order. Delete the first value. This would be the minimum value. Now the value left at the first position is your second minimum.
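A minimal sketch of the sort-and-skip idea for a single column, using sashelp.class and its age variable for illustration (the output names age_sorted and age_second_smallest are hypothetical). NODUPKEY is an added assumption here: it makes the result the second-smallest distinct value; drop it if ties with the minimum should count.

```sas
/* Keep one row per distinct value of the column, in ascending order */
proc sort data=sashelp.class(keep=age) out=age_sorted nodupkey;
  by age;
run;

/* Skip the first row (the minimum); the next row is the second-smallest value */
data age_second_smallest;
  set age_sorted(firstobs=2 obs=2);
run;
```

For several columns this would have to be repeated per column, since each one needs its own sort order.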

SAS proc format for background colours for dates including blank values

I have this code where I create a proc format statement for dates based on today's date.
Any date prior to today is red and any date in the future is green. However this is a proc report I am calling this statement in and there are blank values for date in some cases. Therefore, I want fields that don't contain a date to be white.
data _null_;
sdate = date ();
format sdate date9.;
call symput('sdate',sdate);
run;
proc format;
value closefmt
low - &sdate ='red'
' ' = 'white'
&sdate - high = 'green';
run;
It doesn't like ' ' = white and doesn't accept null.
Any help would be appreciated.
Thanks in advance.
Use a single dot for missing values in a numeric variable:
proc format;
value closefmt
low - &sdate ='red'
. = 'white'
&sdate - high = 'green';
run;
/* test code */
data indata;
d = .; output;
do i=-3 to 3;
d = today()+i;
output;
end;
run;
data _null_;
set indata;
format d yymmdds10.;
length color $10;
color = put(d, closefmt. - L);
putlog d color;
run;
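Since the question mentions PROC REPORT, here is a hedged sketch of how such a format is typically applied there: the format name is given as the BACKGROUND= style attribute of the column. It reuses the indata test data set and the closefmt format from above, and assumes both have already been created.

```sas
proc report data=indata;
  column d;
  /* The format maps each date (or missing value) to a colour name,
     which is then used as the cell background */
  define d / display format=yymmdds10.
             style(column)={background=closefmt.};
run;
```

With the dot range in place, rows where d is missing should pick up the white background instead of falling through unformatted.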

Tornado diagrams in SAS

There is an example "Tornado Diagram" here. I am trying to modify that code. Here is my modified version:
%let name=ex_17;
goptions reset=(global goptions);
GOPTIONS DEVICE=png xpixels=800 ypixels=600;
goptions gunit=pct border cback=lightgray colors=(blacks) ctext=black
htitle=6.5 htext=3 ftitle="albany amt" ftext="albany amt";
data mileage;
input factor $ level $ value;
datalines;
Screening M 7199
Diagnosis F 4502
Biopsy M 12304
Treatment F 5428
Recovery M 15701
Metastasis F 6915
;
data convert;
set mileage;
if level='F' then value=-value;
run;
proc format;
picture posval low-high='000,009';
run;
data anlabels(drop=factor level value);
length text $ 24;
retain function 'label' when 'a' xsys ysys '2' hsys '3' size 2;
set convert;
midpoint=factor; subgroup=level;
text=left(put(value, posval.));
if level ='F' then position='>';
else position='<'; output;
run;
title1 'One-Way Sensitivity Analysis on NNS to Gain 1 QALY';
*axis1 label=(justify=left 'Disutility') style=0 color=black;
axis1 label=(justify=left '') style=0 color=black;
axis2 label=none value=(tick=3 '') minor=none major=none
width=3 order=(-10000 to 20000 by 10000) color=black;
pattern1 value=solid color=green;
pattern2 value=solid color=blue;
proc gchart data=convert;
format value posval.;
note move=(25,80) height=3 'Women' move=(+10,+0) 'Men';
hbar factor / sumvar=value discrete nostat subgroup=level
maxis=axis1 raxis=axis2 nolegend annotate=anlabels
coutline=same des='';
run;
quit;
However, as you can see by running this code, the labels for each bar are cut off, not fully visible. Also, some halves of the bars aren't visible.
What am I doing to make these things not visible, and how can I fix this?
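Your axis labels are getting cut off in the input dataset. With list input, a character variable gets a default length of 8 unless you declare it, so values such as Screening and Metastasis are truncated. Add a LENGTH statement before the INPUT statement: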
data mileage;
length factor $20;
input factor $ level $ value;
datalines;
Screening M 7199
Diagnosis F 4502
Biopsy M 12304
Treatment F 5428
Recovery M 15701
Metastasis F 6915
;
run;
As far as "some halves are not visible", what halves aren't visible? You only have either M or F for each factor, so you aren't going to get two bars on each factor. You're getting all of the bars you're asking for, or at least I see all of them (6 bars, some on left some on right).