Hello, I would like to draw a SERIES plot in PROC SGPLOT where the Y axis is the ratio of two values, expressed as a percentage.
For example I have:
|Month|Chickens_sold|Total_sold|
|-----|-------------|----------|
|01 |5 |10 |
|02 |6 |13 |
|03 |4 |11 |
|04 |9 |9 |
I want a graph that has Month on the x axis, with y being a calculated field: (Chickens_sold/Total_sold*100).
My code is something like this:
PROC SGPLOT DATA=Farm;
SERIES x=Month y=(Chickens_sold/Total_sold*100);
RUN;
Create your calculation within your dataset first.
data want;
set farm;
percent = Chickens_sold/Total_sold*100;
run;
proc sgplot data=want;
series x = month y = percent;
run;
Note that in CAS actions on Viya, a calculated variable like this is valid. It is defined with the computedVars and computedVarsProgram parameters.
There are many other SAS PROCs that also let you run programs or functions within them, but SGPLOT is not one of them. Generally SGPLOT is designed around prepared data.
Ok, so I have a dataset that I have to sample based on another dataset's proportions, and I already have an allocation dataset with 2 columns: strata and alloc. When I run the following code:
proc surveyselect data=have out=want outall method = srs sampsize=10000 seed=1994;
strata strata/alloc = alloc;
id name;
run;
I get this error:
ERROR: The sum of the _ALLOC_ proportions in the data set ALLOC must equal 1.
I checked my allocation dataset and I see that the proportions sum to 1. I'm not sure if there's an issue with my dataset or code. I've already sorted the have dataset by strata, and I also sorted the allocation dataset by strata as well. I've been using the same (or a similar) script to randomly sample from many different datasets before, so I'm not sure why it isn't working for this one.
Any ideas? Thanks!
Edit: For more info, I'm using SAS Enterprise Guide 7.1.
For reference, the alloc table is as follows (I can't give real strata names, but I've checked and they are identical to the strata in my have dataset):
_alloc_ | strata
0.3636363636 | strata1
0.0909090909 | strata2
0.0909090909 | strata3
0.0909090909 | strata4
0.1818181818 | strata5
0.0909090909 | strata6
0.0909090909 | strata7
I am also perplexed. As I mentioned, this code worked in other datasets except for this one. If there is any correlation at all, I created the alloc dataset using R and imported it to SAS.
Do a distinct count on all the strata values in your dataset:
proc sql noprint;
create table check as
select distinct strata
from have
;
quit;
If there are any extra groups that do not exist in the alloc dataset, or vice versa, your error message will appear. In the example code below, alloc has 7 strata but have has only 6.
data alloc;
infile datalines dlm='|';
input _alloc_ strata$;
datalines;
0.3636363636 | strata1
0.0909090909 | strata2
0.0909090909 | strata3
0.0909090909 | strata4
0.1818181818 | strata5
0.0909090909 | strata6
0.0909090909 | strata7
;
run;
/* Only have 6 strata instead of 7 in the data */
data have;
do strata = 'strata1', 'strata2', 'strata3', 'strata4', 'strata5', 'strata6';
do i = 1 to 100;
name = 'name';
output;
end;
end;
run;
proc surveyselect data=have
out=want
outall
method = srs
sampsize=10
seed=1994
;
strata strata / alloc = alloc;
id name;
run;
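To see exactly which strata differ, the two lists can be compared directly. This is only a sketch, assuming both have and alloc hold a character column named strata, as in the example above:

```sas
proc sql;
  /* Strata that appear in HAVE but not in ALLOC */
  title "In HAVE but not in ALLOC";
  select distinct strata from have
  except
  select distinct strata from alloc;

  /* Strata that appear in ALLOC but not in HAVE */
  title "In ALLOC but not in HAVE";
  select distinct strata from alloc
  except
  select distinct strata from have;
quit;
title;
```

If either query returns rows, those are the groups to reconcile before rerunning PROC SURVEYSELECT. Values that look identical but differ in case or in leading blanks will also show up here.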
I have data that look like the following:
data have;
format date date9.;
input date:mmddyy10. Intervention _24hrPtVolumeESI_1_5;
datalines;
9/17/2018 0 204
9/24/2018 0 139
10/17/2018 0 527
10/23/2018 1 430
11/01/2018 1 231
;
run;
I would like to create a bar chart where the x axis contains ranges of median wait time (e.g. 100-125, 126-150, etc.) and the bars compare those times by intervention (0 or 1). Thus, each range would have two bars: one for pre-intervention (0) and one for post-intervention (1). The Y axis would simply show the count of how many median scores fell within each x axis range.
I've tried toying around with some SGPLOT code, but it produces sloppy results.
proc sgplot data=WORK.FelaCombo;
vbar _24hrPtVolumeESI_1_5 / response=_24hrPtVolumeESI_1_5 stat=sum
group=intervention nostatlabel
groupdisplay=cluster;
xaxis display=(nolabel);
yaxis grid;
run;
Try using a histogram instead. vbar is more for discrete categories, whereas histogram will automatically create bins.
proc sgplot data=WORK.have;
histogram _24hrPtVolumeESI_1_5 /
scale=count
binstart=100
binwidth=25
group=intervention
transparency=0.5
showbins
;
xaxis display=(nolabel);
yaxis grid;
run;
I am trying to create a prediction interval in SAS. My SAS code is
Data M;
input y x;
datalines;
100 20
120 40
125 32
..
;
proc reg;
model y = x / clb clm alpha =0.05;
Output out=want p=Ypredicted;
run;
data want;
set want;
y1= Ypredicted;
run;
proc reg data= want;
model y1 = x / clm cli;
run;
But when I run the code I cannot find the new Y1. How can I predict the new Y?
What you're trying to do is score your model, which takes the results from the regression and uses them to estimate new values.
The most common way to do this in SAS is simply to use PROC SCORE. This allows you to take the output of PROC REG and apply it to your data.
To use PROC SCORE, you need the OUTEST= option (think 'output estimates') on your PROC REG statement. The dataset that you assign there will be the input to PROC SCORE, along with the new data you want to score.
As Reeza notes in comments, this is covered, along with a bunch of other ways to do this that might work better for you, in Rick Wicklin's blog post, Scoring a regression model in SAS.
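A minimal sketch of that workflow, assuming the training data are in M as in the question. The dataset names est, new, scored, combined, and pred and the x values in new are made up for illustration. Because PROC SCORE only returns point predictions, the second half shows one alternative that yields prediction limits: append the new rows with y missing and let PROC REG compute the limits.

```sas
/* Fit the model and save the parameter estimates */
proc reg data=M outest=est noprint;
  model y = x;
run;
quit;

/* Hypothetical new observations to score (x only, no y) */
data new;
  input x;
  datalines;
25
35
;
run;

/* Point predictions: apply the saved estimates to the new data */
proc score data=new score=est out=scored type=parms;
  var x;
run;

/* Prediction intervals: rows with missing y are not used in the fit,
   but still receive predicted values and limits */
data combined;
  set M new;
run;

proc reg data=combined;
  model y = x / cli;
  output out=pred p=predicted lcl=lower ucl=upper;
run;
quit;
```

In the OUTPUT statement, LCL= and UCL= request the limits for an individual prediction, while LCLM= and UCLM= would give the confidence limits for the mean.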
I have a table of this form
id1|A|
id1| |var1
id1|B|var2
id2|C|
I would like to retrieve the rows that have values for all variables, i.e.
id1|B|var2
To do this, I want to count the number of non-missing values in each row and keep only the rows that are complete:
id|name|age |cntrow
id1| A | |2
id1| |var1|2
id1| B |var2|3
id2| C | |2
Any guess how to perform this task?
You can use the CMISS function. Something along the lines of:
Data nomissing missing;
Set input_dataset;
if CMISS(of _ALL_)=0 then output nomissing;
if CMISS(of _ALL_)>0 then output missing;
run;
The n function would work if this were numeric. Since the data are not, you can use CMISS to find out how many are missing:
data have;
infile datalines dlm='|';
input
id $ charvar1 $ charvar2 $ numvar;
vars_missing = cmiss(of _all_)-1; *because vars_missing is also missing at this point!;
put _all_;
datalines;
id1|A| |3
id1| |var1|2
id1|B|var2|.
id2|C| |2
;;;;
run;
Then subtract vars_missing from the known number of variables to get the count of non-missing values. If you don't know that number, you can create _CHARACTER_ and _NUMERIC_ arrays and use dim() on them to find out.
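The array approach might look like the sketch below. It assumes the have dataset created just above, which contains both character and numeric variables; the names complete, chars, nums, n_vars, n_missing, and n_present are illustrative only.

```sas
data complete;
  /* Drop the helper column so it is not counted as a data variable */
  set have(drop=vars_missing);

  /* The arrays pick up every variable defined so far, by type */
  array chars {*} _character_;
  array nums  {*} _numeric_;

  /* These are created after the arrays, so they are not array members */
  n_vars    = dim(chars) + dim(nums);
  n_missing = cmiss(of chars[*], of nums[*]);
  n_present = n_vars - n_missing;

  /* Keep only rows where every variable has a value */
  if n_missing = 0;
run;
```

Defining the arrays before the counter variables avoids the "minus one" adjustment needed with cmiss(of _all_).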
I am trying to create histograms in SAS with PROC UNIVARIATE, but it gives me histograms with equal class widths. Suppose I want a histogram whose first class interval runs from 1 to 10 and whose second runs from 10 to 100.
I tried using:
proc univariate data=sasdata1.dataone;
var sum;
histogram sum/ midpoints=0 to 10 by 10 10 to 100 by 90 ;run;
But this does not work. What is the correct way of doing this?
You can't do it with UNIVARIATE as far as I know, but any of the SGPLOT/GPLOT/etc. procedures will work; just bin your data into a categorical variable and VBAR that variable.
If you're okay with frequencies (not percents), this would work:
data test;
set sashelp.class;
do _t = 1 to floor(ranuni(7)*20);
age=age+floor(ranuni(7)*10);
output;
end;
run;
proc format;
value agerange
low-12 = "Pre-Teen"
13-14 = "Early Teen"
15-18 = "Teen"
19-21 = "Young Adult"
22-high = "Adult";
quit;
ods graphics on;
ods preferences;
proc sgplot data=test;
format age agerange.;
vbar age;
run;
I believe if you need percents, you'd want to PROC FREQ or TABULATE your data first and then SGPLOT (or GPLOT) the results.
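A sketch of that percent version, assuming the test dataset and the agerange format defined above; age_pct is a made-up name for the summary dataset:

```sas
/* Summarize by formatted age group; OUT= contains COUNT and PERCENT */
proc freq data=test noprint;
  format age agerange.;
  tables age / out=age_pct;
run;

/* Plot the precomputed percentages, one bar per age group */
proc sgplot data=age_pct;
  format age agerange.;
  vbar age / response=percent;
  yaxis label="Percent of observations";
run;
```

PROC FREQ groups by the formatted value, so the unequal-width ranges defined in the format become the bars.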
I did find a macro that can be used to create histograms with unequal endpoints.
The code can be found in the NESUG 2008 proceedings