Bar Graph by Month - SAS EG - sas

I am trying to create a bar graph in SAS Enterprise Guide. The graph is Savings by Month.
The input Data is
Ref Date Savings
A 03JUN2013 1000
A 08JUN2013 2000
A 08JUL2013 1500
A 08AUG2013 300
A 08NOV2013 100
B 09DEC2012 500
B 09MAY2013 400
B 19MAY2013 5999
B 09OCT2013 511
C 15OCT2013 1200
C 01NOV2013 1500
The first step I do is to convert the date into month. The I use PROC MEANS to calculate total savings by month by Ref.
Then I create a bar graph. The issue I am getting is the bar graph is not in a sequential order as it should be. Like it is AUG13 JUl13 JUN13 .. etc. instead of JUN JUL AUG.
PROC SQL;
CREATE TABLE SAVINGS_11 AS
SELECT
PUT(DATE,monname3.) AS MONTH,
(DATE) FORMAT=MONNAME3. AS MONTH1,
MONTH(DATE) AS MONTH2,
PUT(DATE,MONYY5.) AS MONTH3,
(DATE) FORMAT=MONYY5. AS MONTH4,
DATE,
REF,
SAVINGS
FROM INPUT;
QUIT;
/* -------------------------------------------------------------------
Sort data set
------------------------------------------------------------------- */
PROC SORT
DATA=SAVINGS_11(KEEP=SAVINGS MONTH MONTH1 MONTH2 MONTH3 MONTH4 REF)
OUT=SORT1;
BY REF;
RUN;
/* -------------------------------------------------------------------
Run the Means Procedure
------------------------------------------------------------------- */
TITLE;
TITLE1 "Summary";
TITLE2 "Results";
FOOTNOTE;
PROC MEANS DATA=SORT1
NOPRINT
CHARTYPE
NOLABELS
NWAY
SUM NONOBS ;
VAR SAVINGS;
CLASS MONTH / ORDER=DATA ASCENDING;
BY REF;
ID MONTH1 MONTH2 MONTH3 MONTH4;
OUTPUT OUT=MEANSUMMARY
SUM()=
/ AUTONAME AUTOLABEL WAYS INHERIT
;
RUN;
/* -------------------------------------------------------------------
End of task code.
------------------------------------------------------------------- */
RUN; QUIT;
TITLE; FOOTNOTE;
PROC SORT
DATA=MEANSUMMARY(KEEP=MONTH MONTH2 "SAVINGS_Sum"n REF)
OUT=SORT2
;
BY REF MONTH2;
RUN;
Axis1
STYLE=1
WIDTH=1
MINOR=NONE
;
Axis2
STYLE=1
WIDTH=1
;
TITLE;
TITLE1 "Bar Chart";
FOOTNOTE;
PROC GCHART DATA=SORT2
;
VBAR
MONTH
/
SUMVAR="SAVINGS_Sum"n
CLIPREF
FRAME LEVELS=ALL
TYPE=SUM
INSIDE=SUM
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
;
BY REF;
/* -------------------------------------------------------------------
End of task code.
------------------------------------------------------------------- */
RUN; QUIT;
TITLE; FOOTNOTE;
Whatever format I use, the end result is not in a sequential order. Please help.

Your problem is that you're converting the date value to a character variable. MONTH, at least, should be a formatted date variable, not a character variable; so this line:
PUT(DATE,monname3.) AS MONTH,
should be
DATE AS MONTH FORMAT=monname3.,
Most procedures (like PROC MEANS and PROC GPLOT) will respect formats and group by same-formatted values. I don't completely understand why you have 5 month variables all containing different versions of the same thing, so perhaps there are better ways to do what you're doing here.
In particular, if you have SAS 9.2 or later, SGPLOT will probably do this entire process for you without any of the summarization steps.

Apart from what Joe mentioned above, you also need to include the key word DISCRETE in your VBAR statement if you want to be able to see all months for each reference on the x-axis where they made a saving (Note: this wil generate warning messages if some references do not have any savings in some months).
You could try the following code which I believe produces the output you are after:
PROC SQL;
CREATE TABLE DATA_TO_PLOT AS
SELECT
REF
,INPUT(PUT(date,YYMMN6.),YYMMN6.) FORMAT =DATE9. AS MONTH
,SUM(Savings) AS MONTHLY_SAVINGS
FROM INPUT
GROUP BY 1,2
ORDER BY 1,2 ;
QUIT;
Axis1 STYLE=1 WIDTH=1 MINOR=NONE;
Axis2 STYLE=1 WIDTH=1;
TITLE;
TITLE1 "Bar Chart";
PROC GCHART DATA=DATA_TO_PLOT;
VBAR MONTH
/ SUMVAR=MONTHLY_SAVINGS
CLIPREF
FRAME TYPE=SUM
COUTLINE=BLACK
RAXIS=AXIS1
MAXIS=AXIS2
INSIDE=SUM
DISCRETE
;
FORMAT MONTH MONYY7.;
BY Ref;
RUN; QUIT;

Related

Deleting and adding specific row/ column in SAS output

I have the following data
DATA HAVE;
input year dz $8. area;
cards;
2000 stroke 08
2000 stroke 06
2000 stroke 06
;
run;
After using proc freq
proc freq data=have;
table area*dz/ list nocum ;
run;
I get the below output
In this output
I want to delete the 'dz', what can I do to delete this column?
I want a row in the end that gives 'total', what can I do to get a 'total' row?
Thank you!
There must be a better way of doing this, but the following code creates the desired table:
data have;
input year dz $8. area;
cards;
2000 stroke 08
2000 stroke 06
2000 stroke 06
;
run;
ods output List=list;
proc freq data=have;
table area*dz / list;
run;
data stage1;
set list(keep= area frequency percent CumFrequency CumPercent) end=eof;
area_char = put(area,best.-l); /* Convert it to char to add the Total row */
if eof then do;
call symputx("cumFreq", cumfrequency);
call symputx("cumPerc", cumpercent);
end;
drop area;
run;
data want;
retain area frequency percent; /* Put the variables in the desired order */
set stage1(rename=(area_char=area) drop=cumfrequency cumpercent) end=eof;
output;
if eof then do; /* Manually create the Total row */
area = "Total";
Frequency = &cumfreq.;
Percent = &cumperc.;
output;
end;
run;
Output (want table):
You should subset your data with a where clause and use a title statement if a important partitioning variable is to be removed from output. If you didn't subset how would your audience know if a count contained say episodes of stroke and ministroke if ministroke was also in the data.
Compute the frequencies with freq and use a reporting procedure (print, report, tababulate) that summarizes to show a total line.
Example:
data have;
input year dz $ area;
cards;
2000 stroke 08
2000 stroke 06
2000 stroke 06
;
proc freq noprint data=have;
where dz = 'stroke';
table area / out=freqs;
run;
title 'Stroke dz';
title2 'print';
proc print data=freqs noobs label;
var area;
sum count percent;
run;
title2 'report';
proc report data=freqs;
columns area count percent;
define area / display;
define count / analysis;
rbreak after / summarize;
run;
title2 'tabulate';
proc tabulate data=freqs;
class area;
var count percent;
table area all, count percent;
run;
Thank you all for your valuable responses. The following code gives me the desired output in a concise way
proc freq data=HAVE;
tables area / list nocum out=a;
run;
proc sql;
create table b as
select * from a
union
select
'Total' as area,
sum(count) as count,
sum(percent) as percent
FROM a
;
quit;
proc print data=b; run;

SAS Demographic Table

I have been trying to create a demographic table like below this but I can't seem append the different tables. Please advise on where I can make adjustments in the code.
Group A Group B
chort 1 cohort 2 cohort 3 subtotal cohort 4 cohort 5 cohort 6 subtotal
Age
n
mean
sd
median
min
Gender
n
female
male
Race
n
white
asian
hispanic
black
My Code:
PROC FORMAT;
value content
1=' '
2='Age'
3='Gender'
4='Race'
value sex
1=' n'
2=' female'
3=' male';
value race
1=' n'
2=' white'
3=' asian'
4=' hispanic'
5=' black';
value stat
1=' n'
2=' Mean'
3=' Std. Dev.'
4=' Median'
5=' Minimum';
RUN;
DATA testtest;
SET test.test(keep = id group cohort age gender race);
RUN;
data tottest;
set testtest;
output;
if prxmatch('m/COHORT 1|COHORT 2|COHORT 3/oi', cohort) then do;
cohort='Subtotal';
output;
end;
if prxmatch('m/COHORT 4|COHORT 5|COHORT 6/oi', cohort) then do;
cohort='Subtotal';
output;
end;
run;
data count;
if 0 then set testtest nobs=npats;
call symput('npats',put(npats,1.));
stop;
run;
proc freq data=tottest;
tables cohort /out=patk0 noprint;
tables cohort*sex /out=sex0 noprint;
tables cohort*race /out=race0 noprint;
run;
PROC MEANS DATA = testtest n mean std min median;
class cohort;
VAR age;
RUN;
I know that I would have to transpose it and out it in a report. But before I do that, how do I get the variable out of my proc means, proc freq, etc?

How to filter a dataset according to its quantile

In the following code, how could I keep only the observations superior to the 95th quantile?
data test;
input business_ID $ count;
datalines;
'busi1' 2
'busi1' 10
'busi1' 4
'busi2' 1
'busi3' 2
'busi3' 1
;
run;
proc sort data = test;
by descending count;
run;
I don't know how to cleanly stock the quartile and then re-use it with an if condition.
Thanks
Edit : I can determine the quantile with this code :
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
But how can I relate to it in the Test dataset so that I can select every observations above that quantile?
You could either read the value of the quantile in a macro variable to use in a subsequent if or where condition:
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
data _null_;
set quantile;
call symput('quantile',count_p75);
run;
data test;
set test;
where count > &quantile.;
run;
or you could use an SQL subquery
proc means data=test noprint;
var count;
output out=quantile P75= / autoname;
run;
proc sql undo_policy=none;
create table test as
select *
from test
where count > (select count_p75 from quantile)
;
quit;
(Note that your question mentions the 95th quantile whereas your sample code mentions the 75th)
User2877959's solution is solid. Recently I did this with Proc Rank. The solution is a bit 'work around-y', but saves a lot of typing.
proc rank data=Input groups=1000 out=rank_out;
var var_to_rank;
ranks Rank_val;
run;
data seventy_five;
set rank_out;
if rank_val>750;
run;
More on Rank: http://documentation.sas.com/?docsetId=proc&docsetTarget=p0le3p5ngj1zlbn1mh3tistq9t76.htm&docsetVersion=9.4&locale=en

How to sum up previous rows value's with current row in SAS?

I need a summation column, however, both retain and lag commando'es are inefficient.
There are number of ways. You could use proc sql or proc means. I've written a way below:
data begin;
length person $3 sallary 5;
input person sallary;
datalines;
a 200
a 300
b 800
c 400
c 500
c 600
;
run;
proc means data=begin noprint;
by person; /*Handle each person as distinct subset*/
output out=Sal_by_person(drop= _type_ _freq_)
sum(sallary)=Total_sallary /*What we calculate and what we call them.*/
;
run;

Calculating correlation and covariance for a event window in SAS

I have to calculate the correlation and covariance for my daily sales values for an event window. The event window is of 45 day period and my data looks like -
store_id date sales
5927 12-Jan-07 3,714.00
5927 12-Jan-07 3,259.00
5927 14-Jan-07 3,787.00
5927 14-Jan-07 3,480.00
5927 17-Jan-07 3,646.00
5927 17-Jan-07 3,316.00
4978 18-Jan-07 3,530.00
4978 18-Jan-07 3,103.00
4978 18-Jan-07 3,026.00
4978 21-Jan-07 3,448.00
Now, for every store_id, date combination, I need to go back 45 days (there is more data for each combination in my original data set) calculate the correlation between sales and lag(sales) i.e. autocorrelation of degree one. As you can see, the date column is not continuous. So something like (date - 45) is not going to work.
I have gotten till this part -
data ds1;
set ds;
by store_id;
LAG_SALE = lag(sales);
IF FIRST.store_idTHEN DO;
LAG_SALE = .;
END;
run;
For calculating correlation and covariances -
proc corr data=ds1 outp=Corr
by store_id date;
cov; /** include covariances **/
var sales lag_sale;
run;
But how do I insert the event window for each date, store_id combination? My final output should look something like this -
id date corr cov
5927 12-Jan-07 ... ...
5927 14-Jan-07 ... ...
Here is what I've come up with:
First I convert the date to a SAS date, which is the number of days since Jan. 1 1960:
data ds;
set ds (rename=(date=old_date));
date = input(old_date, date11.);
drop old_date;
run;
Then compute lag_sale (I am using the same calculation you used in the question, but make sure this is what you want to do. For some observations the lag sale is the previous recorded date, but for some it is the same store_id and date, just a different observation.):
proc sort data=ds; by store_id; run;
data ds;
set ds;
by store_id;
lag_sale = lag(sales);
if first.store_id then lag_sale = .;
run;
Then set up the final data set:
data final;
length store_id 8 date 8 cov 8 corr 8;
if _n_ = 0;
run;
Then create a macro which takes a store_id and date and runs proc corr. The first part of the macro selects only the data with that store_id and within the past 45 days of the date. Then it runs proc corr. Then it formats proc corr how you want it and appends the results to the "final" data set.
%macro corr(store_id, date);
data ds2;
set ds;
where store_id = &store_id and %eval(&date-45) <= date <=&date
and lag_sale ne .;
run;
proc corr noprint data=ds2 cov outp=corr;
by store_id;
var sales lag_sale;
run;
data corr2;
set corr;
where _type_ in ('CORR', 'COV') and _name_ = 'sales';
retain cov;
date = &date;
if _type_ = 'COV' then cov = lag_sale;
else do;
corr = lag_sale;
output;
end;
keep store_id date corr cov;
run;
proc append base=final data=corr2 force; run;
%mend corr;
Finally run the macro for each store_id/date combination.
proc sort data=ds out=ds3 nodupkey;
by store_id date;
run;
data _null_;
set ds3;
call execute('%corr('||store_id||','||date||');');
run;
proc sort data=final;
by store_id date;
run;