Is it possible to choose the whisker values with PROC SGPLOT?
It seems only the 25th and 75th percentiles are available in SGPLOT.
Does anyone know whether this is possible?
Thanks
It is not possible to add non-standard whiskers, probably because it is discouraged by statisticians.
They discourage it because the boxplot has a specific definition in terms of the quartiles.
While there are occasional variations (e.g., to get a rough normality plot),
in general people expect to see quartiles in a box plot.
Adding arbitrary percentiles, even ones that make sense like the ones you propose,
is likely to confuse the audience more than it helps.
Try this visualization: A waterfall graph of sales contributions based on the percentile intervals you suggest:
data actualBinned; set sashelp.prdsale;
keep actual;
run;
proc rank data=actualBinned out=actualBinned
groups=100
descending;
var actual;
ranks rank;
run;
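data actualBinned; set actualBinned;
length bin $6; /* otherwise bin gets length 5 from the first assignment and "95-100" is truncated */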
if rank < 5 then bin="00-05";
else if rank < 25 then bin="05-25";
else if rank < 50 then bin="25-50";
else if rank < 75 then bin="50-75";
else if rank < 95 then bin="75-95";
else bin="95-100";
run;
proc sort data=actualBinned;
by bin;
run;
proc sgplot data=actualBinned;
waterfall category=bin response=actual;
run;
I am not a huge fan of bins of different width displayed with the same width. I would rather use 20 bins of width 5.
With that caveat, I can see how a manager might find this visualization more useful in a specific context.
BTW, the waterfall graph is experimental in SAS 9.3. For older versions of SAS there are several recipes online.
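A minimal sketch of the 20-bin variant mentioned above, assuming the same sashelp.prdsale data. The data set name actualBinned20 and the zero-padded bin labels are illustrative choices, not part of the original answer:

```sas
/* Rank ACTUAL into 20 groups (0..19), largest values first */
proc rank data=sashelp.prdsale(keep=actual) out=actualBinned20
          groups=20 descending;
  var actual;
  ranks rank;
run;

data actualBinned20;
  set actualBinned20;
  length bin $7;
  /* Label each group by its 5-point interval, e.g. 000-005 ... 095-100. */
  /* Zero-padding keeps the labels in the right order when sorted.       */
  bin = catx('-', put(rank*5, z3.), put(rank*5 + 5, z3.));
run;

proc sort data=actualBinned20;
  by bin;
run;

proc sgplot data=actualBinned20;
  waterfall category=bin response=actual;
run;
```

With equal-width bins, each bar covers the same share of observations, so bar heights can be compared directly.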
I have a target population with some characteristics, and I have been asked to select an appropriate control group based on these characteristics. I am trying to do a stratified sample using Base SAS, but I need to be able to define my 4 strata percentages from my target and apply these to my sample. Is there any way I can do that? Thank you!
To do stratified sampling, you can use PROC SURVEYSELECT.
Here is an example:-
/*Dataset creation*/
data data_dummy;
input revenue rev_tag $ Premiership_level; /* rev_tag is character ($) and matches the BY/STRATA names below */
datalines;
1000 High 1
90 Low 2
500 Medium 3
1200 High 4
;
run;
/*Now you need to Sort by rev_tag, Premiership_level (say these are the
variables you need to do stratified sampling on)*/
proc sort data = data_dummy;
by rev_tag Premiership_level;
run;
/*Now use SURVEYSELECT to do stratified sampling using 10% samprate (You can
change this 10% as per your requirement)*/
/*Surveyselect is used to pick entries for groups such that , both the
groups created are similar in terms of variables specified under strata*/
proc surveyselect data=data_dummy method = srs samprate=0.10
seed=12345 out=data_control;
strata rev_tag Premiership_level;
run;
/*Finally tag (if you want for more clarity) your 10% data as control
group*/
Data data_control;
Set data_control;
Group = "Control";
Run;
Hope this helps:-)
Is there a way to detect an outlier from PROC MEANS while calculating min, max, Q1 and Q3?
The box plot procedure is not working in my SAS installation, and I am trying to build a box plot in Excel with the values from SAS.
Assuming you have a specific definition for what an outlier is, PROC UNIVARIATE can calculate the value that appears at that percentile using the PCTLPTS= option (together with PCTLPRE=, which names the output variables) on the OUTPUT statement. It will also identify extreme observations individually, so you can see the top few observations (if you have few enough observations that the number of extremes is likely to be <= 5).
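As a rough sketch of the PCTLPTS= approach, assuming sashelp.class as stand-in data and the 5th and 95th percentiles as the outlier cut-offs (both are illustrative choices, as are the output names):

```sas
/* Write the 5th and 95th percentiles of HEIGHT to a data set.   */
/* PCTLPRE= supplies the variable-name prefix: p_5 and p_95.     */
proc univariate data=sashelp.class noprint;
  var height;
  output out=height_pctls pctlpts=5 95 pctlpre=p_;
run;

proc print data=height_pctls;
run;
```

Dropping NOPRINT also prints the procedure's default output, which includes the Extreme Observations table with the lowest and highest values.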
The paper A SAS Application to Identify and Evaluate Outliers goes over a few of the ways you can look at outliers, including box plots and PROC UNIVARIATE, and includes some regression-based approaches as well.
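If you want a 'standard boxplot', use the outbox= option in PROC BOXPLOT to create the standard data set used for a box plot. Sort by the group variable first so that each group's rows are contiguous.
proc sort data=sashelp.class out=class_sorted;
by sex;
run;
proc boxplot data=class_sorted;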
plot age*sex / outbox = xyz;
run;
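If PROC BOXPLOT is not available, the same summary values can be approximated with PROC MEANS. The sketch below assumes the common convention that a point is an outlier when it lies more than 1.5 times the interquartile range beyond Q1 or Q3; the data set and variable names are illustrative:

```sas
/* Five-number summary of AGE, written to a one-row data set */
proc means data=sashelp.class noprint;
  var age;
  output out=age_stats min=min_age q1=q1_age median=med_age
                       q3=q3_age max=max_age;
run;

/* Attach Q1 and Q3 to every row and flag values outside the fences */
data age_flagged;
  if _n_ = 1 then set age_stats(keep=q1_age q3_age);
  set sashelp.class;
  iqr = q3_age - q1_age;
  is_outlier = (age < q1_age - 1.5*iqr) or (age > q3_age + 1.5*iqr);
run;
```

The age_stats row can be exported to Excel for the box itself, and the rows with is_outlier = 1 give the points to draw beyond the whiskers.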
Short of using annotations, I have been unable to find a reasonable way to prevent my x-axis labels from overlapping when using a barchartparm in SAS. From the documentation, they clearly state that barcharts use a discrete axis and the other axis types such as time are not permissible for them. Although conceptually this makes sense it seems like an 'unnecessary' limitation to enforce as it leaves no control over the x-axis labeling as every discrete label will be printed.
Sample data:
data test;
format rpt_date date9.;
do rpt_date=date()-90 to date();
root = round(ranuni(1) *100,1);
output;
end;
run;
Define the chart template:
proc template;
define statgraph giddyup;
begingraph;
layout overlay;
barchartparm x=rpt_date y=root ;
endlayout;
endgraph;
end;
run;
Create the chart:
proc sgrender data=test template=giddyup;
run;
Result: (chart image not included; the x-axis date labels overlap)
I tried to duct-tape it by creating a custom format for the x-axis that would 'blank out' many of the values. Although the chart was produced, it stacked all the blanks together (??) and also produced a warning.
I've also tried using the alternate x2axisopts and setting the axis to secondary with no luck.
If I used a series chart I would be able to control the axis fine, but in my case the data is much easier to interpret as a barchart. Perhaps they needed to add additional options to the xaxisopts for barcharts.
The most frustrating thing here is that it's something you can do in Excel in 2 seconds, and it seems like it would be a very common chart in Excel, yet it is not easily reproducible in SAS!
EDIT: I also don't want to use PROC GCHART.
Ok now I feel silly. Turns out that histograms will achieve the same result nicely:
histogramparm x=rpt_date y=root ;
Still a valuable question I guess as I spent a lot of time googling for answers and could not find a solution.
Good thing I didn't want it horizontal...
I have household IDs and their respective sales. As it turns out, there are a few of these household IDs that have extremely high total sales. Can you please suggest a good method for outlier treatment?
It would be great if you could suggest one in SAS.
Regards,
Saket
The following is a basic, rather crude method. It involves removing values more than 3 standard deviations from the mean:-
** Standardise data;
proc standard data=sales_data mean=0 std=1 out=sales_data_std;
var sales;
run;
** Remove values more than 3 std devs from mean;
data sales_data_no_outliers;
set sales_data_std;
where sales between -3 and 3;
run;
There's a reference to this approach in Wikipedia.
Still, it's crude; it relies on your variable being normally distributed and will almost always find outliers (if n > 100) even if, in all reasonableness, the values are not really outlying.
The subject of outliers is long and detailed but a cursory overview of the topic might be useful. Unfortunately, I can't really think of any introductory sources off-hand.
I am trying to use PROC GCHART in SAS to plot the values I've got. Here is my code:
title "WOE Trend of VarA.";
proc gchart data=work.VarA;
vbar VarB /
type=sum sumvar = VarA ASCENDING
subgroup = VarA nolegend
raxis=axis1
maxis=axis2
autoref clipref
width=32;
run;
There are four observations in table VarA, so I expect to see four bars in the plot. In practice, however, two of the bars are stacked together, forming a stacked bar chart as follows. Also, the values of the observations are integers, yet there are decimals on the X-axis.
I guess I must have missed something in the options since I am very new to this. Can anyone give me a clue about what I am doing wrong and how I can fix it? Thank you very much.
Probably what you have is
VarA VarB
42 0.75
20 0.75
35 -0.75
28 2.25
That would generate the above chart. If you didn't subgroup by VarA, you'd get a single bar 62 long for the first observation instead of splitting it partway through. Summing and subgrouping by the same variable doesn't make a whole lot of sense, to me, but it depends on what you're trying to do I suppose.
The decimals are likely in the data, and are just rounded by your format. If you want more useful help, you might post your actual data and code.
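If VarB really does hold four distinct integers, one more thing worth checking is the DISCRETE option: by default GCHART treats a numeric chart variable as continuous and groups values around midpoints it chooses, which can merge observations into one bar and put non-integer values on the axis. A minimal sketch with made-up data (the data set VarA_demo and its values are hypothetical, not the poster's):

```sas
/* Hypothetical data: four rows, four distinct integer VarB values */
data work.VarA_demo;
  input VarA VarB;
  datalines;
42 1
20 2
35 3
28 4
;
run;

/* DISCRETE gives one bar per distinct VarB value */
proc gchart data=work.VarA_demo;
  vbar VarB / discrete type=sum sumvar=VarA;
run;
quit;
```

If two rows share the same VarB value, they will still be summed into a single bar, which is the behavior described in the answer above.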