Stalk bar chart in controlled order in SAS - sas

I want to create stlak bar charts, The column WORKSCOPE having HSB,OHS,RES. but I want to reorder the WORKSCOPE in the order of OHS,HSB,RES in the stalk bar. Usually default its taking in alphabetical order. how can I achieve it.
goptions reset=all;
goptions colors=(red blue green);
legend1 label=none
order=('OHS' 'HSB' 'RES');
proc gchart data=FINALREPV3;
vbar year / discrete type=sum sumvar=VALUE
subgroup=WORKSCOPE legend=legend1 ;
run;

You can create a sorting variable beforehand, use it to sort the input data and then plot using proc sgplot with xaxis discreteorder = data.
/* Create dummy sorting variable */
data want;
set finalrepv3;
if workscope = "OHS" then _sort = 1;
if workscope = "HBS" then _sort = 2;
if workscope = "RES" then _sort = 3;
run;
/* Put the data in the required order */
proc sort data = want;
by _sort;
run;
/* Create the plot */
proc sgplot data = want;
vbar year / group = workscope response = value stat = sum;
/* Request that the x axis respect the data's order */
xaxis discreteorder = data;
run;

Related

SAS SGPLOT VBOX: Display Mean and Median on Boxplot

I am trying to make a boxplot by using the SGPLOT in SAS. I would like to use SGPLOT with VBOX statement to flag out the Mean and Median on the gragh for each box.
Below is the data set I created as an example. Can someone give me a kind help on that?
/* Set the graphics environment */
goptions reset=all cback=white border htitle=12pt htext=10pt;
/* Create a sample data set to plot */
data one(drop=i);
do i=1 to 10;
do xvar=1 to 9 by 2;
yvar=ranuni(0)*100;
output;
end;
end;
run;
/* Sort the data by XVAR */
proc sort data=one;
by xvar;
run;
/* Use the UNIVARIATE procedure to determine */
/* the mean and median values */
proc univariate data=one noprint;
var yvar;
by xvar;
output mean=mean median=median out=stat;
run;
/* Merge the mean and median values back */
/* into the original data set by XVAR */
data all;
merge one stat;
by xvar;
run;
Use VBOX for box plot, SCATTER for mean/median.
/*--Compute the Mean and Median by sex--*/
proc means data=sashelp.heart;
class deathcause;
var cholesterol;
output out=heart(where=(_type_ > 0) keep=deathcause mean median _type_)
mean = mean
median = median;
run;
/*--Merge the data--*/
data heart2;
keep deathcause mean median cholesterol;
set sashelp.heart heart;
run;
proc print data=heart2;run;
/*--Box plot with connect and group colors--*/
ods graphics / reset ANTIALIASMAX=5300 width=5in height=3in imagename='Box_Group_Multi_Connect';
title 'Cholesterol by Cause of Death';
proc sgplot data=heart2 noautolegend noborder;
vbox cholesterol / category=deathcause group=deathcause;
scatter x=deathcause y=mean / name='mean' legendlabel='Mean' markerattrs=(color=green);
scatter x=deathcause y=median / name='median' legendlabel='Median' markerattrs=(color=red);
keylegend "mean" "median" / linelength=32 location=inside across=1 position=topright;
xaxis display=(nolabel);
run;
EDIT: Within SGPLOT and the VBOX statement, you can also plot the median as the line, and the mean as a point on the box plot, without any other manual calculations ahead of time. This is available as of SAS 9.4 M5+.
ods graphics / reset ANTIALIASMAX=5300 width=5in height=3in imagename='Box_Group';
title 'Cholesterol by Cause of Death';
proc sgplot data=sashelp.heart noborder;
vbox cholesterol / category=deathcause
displaystats=(median mean)
meanattrs=(color=red)
medianattrs=(color=green);
*xaxis display=(nolabel);
run;

How do I create a SAS data view from an 'Out=' option?

I have a process flow in SAS Enterprise Guide which is comprised mainly of Data views rather than tables, for the sake of storage in the work library.
The problem is that I need to calculate percentiles (using proc univariate) from one of the data views and left join this to the final table (shown in the screenshot of my process flow).
Is there any way that I can specify the outfile in the univariate procedure as being a data view, so that the procedure doesn't calculate everything prior to it in the flow? When the percentiles are left joined to the final table, the flow is calculated again so I'm effectively doubling my processing time.
Please find the code for the univariate procedure below
proc univariate data=WORK.QUERY_FOR_SGFIX noprint;
var CSA_Price;
by product_id;
output out= work.CSA_Percentiles_Prod
pctlpre= P
pctlpts= 40 to 60 by 10;
run;
In SAS, my understanding is that procs such as proc univariate cannot generally produce views as output. The only workaround I can think of would be for you to replicate the proc logic within a data step and produce a view from the data step. You could do this e.g. by transposing your variables into temporary arrays and using the pctl function.
Here's a simple example:
data example /view = example;
array _height[19]; /*Number of rows in sashelp.class dataset*/
/*Populate array*/
do _n_ = 1 by 1 until(eof);
set sashelp.class end = eof;
_height[_n_] = height;
end;
/*Calculate quantiles*/
array quantiles[3] q40 q50 q60;
array points[3] (40 50 60);
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/*Keep only the quantiles we calculated*/
keep q40--q60;
run;
With a bit more work, you could also make this approach return percentiles for individual by groups rather than for the whole dataset at once. You would need to write a double-DOW loop to do this, e.g.:
data example;
array _height[19];
array quantiles[3] q40 q50 q60;
array points[3] _temporary_ (40 50 60);
/*Clear heights array between by groups*/
call missing(of _height[*]);
/*Populate heights array*/
do _n_ = 1 by 1 until(last.sex);
set class end = eof;
by sex;
_height[_n_] = height;
end;
/*Calculate quantiles*/
do i = 1 to 3;
quantiles[i] = pctl(points[i], of _height{*});
end;
/* Output all rows from input dataset, with by-group quantiles attached*/
do _n_ = 1 to _n_;
set class;
output;
end;
keep name sex q40--q60;
run;

SAS / PROC FREQ TABLES - can I suppress frequencies and percents if frequency is less than a given value?

I'm using tagsets.excelxp in SAS to output dozens of two-way tables to an .xml file. Is there syntax that will suppress rows (frequencies and percents) if the frequency in that row is less than 10? I need to apply that in order to de-identify the results, and it would be ideal if I could automate the process rather than use conditional formatting in each of the outputted tables. Below is the syntax I'm using to create the tables.
ETA: I need those suppressed values to be included in the computation of column frequencies and percents, but I need them to be invisible in the final table (examples of options I have considered: gray out the entire row, turn the font white so it doesn't show for those cells, replace those values with an asterisk).
Any suggestions would be greatly appreciated!!!
Thanks!
dr j
%include 'C:\Users\Me\Documents\excltags.tpl';
ods tagsets.excelxp file = "C:\Users\Me\Documents\Participation_rdg_LSS_3-8.xml"
style = MonoChromePrinter
options(
convert_percentages = 'yes'
embedded_titles = 'yes'
);
title1 'Participation';
title2 'LSS-Level';
title3 'Grades 3-8';
title4 'Reading';
ods noproctitle;
proc sort data = part_rdg_3to8;
by flag_accomm flag_participation lss_nm;
run;
proc freq data = part_rdg_3to8;
by flag_accomm flag_participation;
tables lss_nm*grade_p / crosslist nopercent;
run;
ods tagsets.excelxp close;
D.Jay: Proc FREQ does not contain any options for conditionally masking cells of it's output. You can leverage the output data capture capability of the ODS system with a follow-up Proc REPORT to produce the desired masked output.
I am guessing on the roles of the lss and grade_p as to be a skill level and a student grade level respectively.
Generate some sample data
data have;
do student_id = 1 to 10000;
flag1 = ranuni(123) < 0.4;
flag2 = ranuni(123) < 0.6;
lss = byte(65+int(26*ranuni(123)));
grade = int(6*ranuni(123));
* at every third lss force data to have a low percent of grades < 3;
if mod(rank(lss),3)=0 then
do until (grade > 2 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
else if mod(rank(lss),7)=0 then
do until (grade < 3 or _n_ < 0.15);
grade = int(6*ranuni(123));
_n_ = ranuni(123);
end;
output;
end;
run;
proc sort data=have;
by flag1 flag2;
*where lss in ('A' 'B') and flag1 and flag2; * remove comment to limit amount of output during 'learning the code' phase;
run;
Perform the Proc FREQ
Only capture the data corresponding to the output that would have been generated
ods _all_ close;
* ods trace on;
/* trace will log the Output names
* that a procedure creates, and thus can be captured
*/
ods output CrossList=crosslist;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
ods output close;
ods trace off;
Now generate output to your target ODS destination (be it ExcelXP, html, pdf, etc)
Reference output of which needs to be produced an equivalent having masked values.
* regular output of FREQ, to be compare to masked output
* of some information via REPORT;
proc freq data=have;
by flag1 flag2;
tables lss * grade / crosslist nopercent;
run;
Proc REPORT has great features for producing conditional output. The compute block is used to select either a value or a masked value indicator for output.
options missing = ' ';
proc format;
value $lss_report ' '= 'A0'x'Total';
value grade_report . = 'Total';
value blankfrq .b = '*masked*' ._=' ' other=[best8.];
value blankpct .b = '*masked*' ._=' ' other=[6.2];
proc report data=CrossList;
by flag1 flag2;
columns
('Table of lss by grade'
lss grade
Frequency RowPercent ColPercent
FreqMask RowPMask ColPMask
)
;
define lss / order order=formatted format=$lss_report. missing;
define grade / display format=grade_report.;
define Frequency / display noprint;
define RowPercent / display noprint;
define ColPercent / display noprint;
define FreqMask / computed format=blankfrq. 'Frequency' ;
define RowPMask / computed format=blankpct. 'Row/Percent';
define ColPMask / computed format=blankpct. 'Column/Percent';
compute FreqMask;
if 0 <= RowPercent < 10
then FreqMask = .b;
else FreqMask = Frequency;
endcomp;
compute RowPMask;
if 0 <= RowPercent < 10
then RowPMask = .b;
else RowPMask = RowPercent;
endcomp;
compute ColPMask;
if 0 <= RowPercent < 10
then ColPMask = .b;
else ColPMask = ColPercent;
endcomp;
run;
ods html close;
If you have to produce lots of cross listings for different data sets, the code is easily macro-ized.
When I've done this in the past, I've first generated the frequency to a dataset, then filtered out the N, then re-printed the dataset (using tabulate usually).
If you can't recreate the frequency table perfectly from the freq output, you can do a simple frequency, check which IDs or variables or what have you to exclude, and then filter them out from the input dataset and rerun the whole frequency.
I don't believe that you can with PROC FREQ, but you can easily replicate your code with PROC TABULATE and you can use a custom format there to mask the numbers. This example sets it to M for missing and N for less than 5 and with one decimal place for the rest of the values. You could also replace the M/N with a space (single space) to have no values shown instead.
*Create a format to mask values less than 5;
proc format;
value mask_fmt
. = 'M' /*missing*/
low - < 5='N' /*less than 5 */
other = [8.1]; /*remaining values with one decimal place*/
run;
*sort data for demo;
proc sort data=sashelp.cars out=cars;
by origin;
run;
ods tagsets.excelxp file='/folders/myfolders/demo.xml';
*values partially masked;
proc tabulate data=cars;
where origin='Asia';
by origin;
class make cylinders;
table make, cylinders*n*f=mask_fmt. ;
run;
ods tagsets.excelxp close;
This was tested on SAS UE.
EDIT: Forgot the percentage piece, so this likely will not work for that, primarily because I don't think you'll get the percentages the same as in PROC FREQ (appearance) so it depends on how important that is to you. The other possibility to accomplish this would be to modify the PROC FREQ template to use the custom format as above. Unfortunately I do not have time to mock this up for you but maybe someone else can. I'll leave this here to help get you started and delete it later on.

How do I order the bars of a bar chart by height rather than alphabetically

Here's a simple data set:
data dat;
do i = 1 to 100;
if rand('unif') > 0.85 then txt = 'DEFG';
else if rand('unif') > 0.75 then txt = 'ABC';
else if rand('unif') > 0.80 then txt = 'KLMNOP';
else txt = 'HIJ';
output;
end;
run;
I want to create a bar chart that displays the frequencies of txt:
proc sgplot data = dat;
/* Bars are ordered alphabetically */
vbar txt;
run;
This creates
What I'd like instead is to order the bars by height (from left to right: HIJ, ABC, DEFG, KLMN).
Is there an option to proc sgplot to achieve that?
You can add option categoryorder in your vbar statement:
proc sgplot data = dat;
vbar txt /categoryorder=respdesc ;
run; quit;
I found it here: http://blogs.sas.com/content/graphicallyspeaking/2012/06/07/bar-chart-with-response-sort/#prettyPhoto

frequency of the variable

prg below tries to count the instances of variables.
%macro freq(dsn=,variable=,freq=,label);
proc freq data = &dsn;
tables &variable;
run;
%mend;
%freq(dsn=fff,variable=ggg);
you should be able to do this by adding a format and print procedure. I have tested this and I believe it achieves what you are looking to accomplish.
%macro freq(indsn=,variable=,freq=,label);
/* Create a new format or label when a value falls between
0 and the user defined frequency. */
proc format;
value fmt 0-&freq. = "&label.";
run;
/* Run the frequency procedure and suppress the output
to a temporary data set. */
proc freq data = &indsn;
tables &variable / noprint out=temp;
run;
/* Print the temporary data set and format the Count
variable (which was created in the freq procedure)
to the format, fmt, that we created. Finally, print only
records with a frequency less than the user defined
frequency. */
proc print data=temp noobs;
format count fmt.;
where count le &freq.;
run;
%mend;
With your recent edit you will be able to accomplish this with a data step and an if statement.
%macro freq(indsn=,variable=,freq=,label);
/* Run the frequency procedure and suppress the output
to a temporary data set. */
proc freq data = &indsn;
tables &variable / noprint out=temp;
run;
/* Assign variable a new name if its frequency equals the
user defined frequency and only keep records with a count
less than or equal to the frequency. */
data temp;
set temp;
if count le &freq.;
if count = &freq. then &variable. = &label.;
run;
/* Print only the &variable variable. */
proc print data=temp noobs;
var &variable.;
run;
%mend;