I want to create a list of years. Since I just need a handful, I do it like this:
create table work.year_range
(YEAR num);
insert into work.year_range
values (2006) values (2007) values (2008) values (2009) values (2010) values (2011)
values (2012) values (2013) values (2014) values (2015) values (2016) values (2017);
But that is clearly ugly. What is the idiomatic way to create a range of numbers in PROC SQL?
The natural tool for this is a DO loop in a DATA step; it is hard to emulate the same thing in PROC SQL:
data year_range;
do year= 2006 to 2017;
output;
end;
run;
Another crude way (not recommended) to do this in PROC SQL is to use the undocumented MONOTONIC function against any dataset that has at least 12 records, as shown below:
proc sql;
create table work.year_range as
select 2005 + monotonic() as year
from sashelp.class
where calculated year between 2006 and 2017;
quit;
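If you want to stay in PROC SQL without relying on MONOTONIC, one possible sketch is to let a small macro generate the VALUES clauses for you. The macro name year_values and its parameters are made up for illustration; this is an alternative technique, not something taken from the answer above.

```sas
/* Hypothetical helper: emits "values (2006) values (2007) ..." */
%macro year_values(from, to);
  %local y;
  %do y = &from %to &to;
    values (&y)
  %end;
%mend year_values;

proc sql;
  create table work.year_range (year num);
  /* The semicolon after the macro call ends the INSERT statement */
  insert into work.year_range
    %year_values(2006, 2017);
quit;
```

This keeps the INSERT syntax from the question but removes the hand-typed list, so changing the range only means changing the two arguments.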
I am working with panel data that looks something like this:
I am going to perform a t-test in SAS 9.4 to find out if there is a significant change in var1 from 2014 to 2016, and I am assuming that I have to use a paired t-test, since I have an observation in both 2014 and 2016 for each individual (ID).
My question is, can this be done in SAS when I am using panel data like the one I have shown? Or do I need to create a wide dataset with one variable containing the data from 2014 and one variable containing the data from 2016? I know that I have to do that in Stata, but maybe I don't have to change my entire dataset to do this in SAS?
You will have to transpose your data to do a paired t-test. You can use PROC TRANSPOSE for that, though.
*sort for transpose;
proc sort data=have; by id year; run;
*reformat from long to wide;
proc transpose data=have out=want prefix=Year_;
by ID;
ID Year;
Var Var1;
run;
*Paired T-Test;
proc ttest data=want;
paired Year_2014*Year_2016;
run;
PS. Please include your data as text not an image in the future. We cannot write code off an image and I'm not typing out your data, so at present this is untested but should work.
Hi I have a time series data table from October 2013 to October 2016. I would like to plot the time series from October 2013 to November 2014, October 2014 to November 2015, and October 2015 to November 2016 on top of each other on the same graph to analyze any seasonal trends.
My idea is to create separate data tables with each subsegment, but is there an easier way to do this in SAS?
This is an example of the data table I want to plot the seasonality of.
The approach I would take here is to add a group variable (say, year) that has the same value for all rows you want drawn together as one line.
Then you use the GROUP= option in whatever plot statement you want. Something like:
data stocks_years;
set sashelp.stocks;
date_year = intck('YEAR','01AUG1986'd,date,'c')+1986;
date_month= month(date);
run;
proc sgplot data=stocks_years;
vline date_month/response=close group=date_year stat=mean;
run;
This is an example of doing that to see the average close per month of the three stocks in the SASHELP.STOCKS dataset. It is a terrible plot of course but it should give you some idea of what it would look like. Each of those differently colored lines is from a different year (aug->jul being defined as a year, with the number being the year number of aug).
The lead off provided by Joe gave me everything I needed. Here is the completed code for anyone else's reference.
%macro Plot_Seasonal_Worse_TP(tbl_name, tp, cutoff_date);
/*tp = transition probability */
proc sql;
create table &tbl_name._trim as
SELECT *
FROM &tbl_name
WHERE asofdt > &cutoff_date;
quit;
data &tbl_name._trim;
set &tbl_name._trim;
date_year = intck('YEAR','01NOV2013'd,asofdt,'c')+2014;
date_month= MOD(month(asofdt)+2, 12); /* move november and december of previous year to front of time series (nov -> 1, dec -> 2, ..., sep -> 11; note that oct -> 0) */
run;
proc sgplot data=&tbl_name._trim;
vline date_month/response=&tp group=date_year;
title "&tbl_name (&tp)";
run;
%mend Plot_Seasonal_Worse_TP;
Output looks like this as well.
First question, is it possible to produce a boxplot using proc gchart in SAS?
If it is possible, please give me a brief idea.
Or else, on the topic of using proc boxplot.
Suppose I have a dataset that has three variables ID score year;
something like,
data aaa;
input id score year;
datalines;
1 50 2008
1 40 2007
2 30 2008
2 20 2007
;
run;
I want to produce a boxplot of score for each ID in each year. (So in this case, 4 boxplots in a single plot.)
How can I achieve this?
I have tried using
proc boxplot data=aaa;
plot score*ID;
by year;
run;
However, this does not work because, as we can see, the data is not sorted by year.
Is there a way to get around this?
You need to sort your input dataset first. Run this
proc sort data = aaa;
by year;
run;
and then your proc boxplot should work as written.
This is quite easy to do with sgplot, which is part of the newer ODS Graphics suite which is available in base SAS.
proc sgplot data=sashelp.cars;
vbox mpg_city/category=type group=origin grouporder=ascending;
run;
You would use category=id and group=year in your example data. You get one separate tick on the x axis for each category, and within each tick you get a separate box for each group, clustered together.
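Applied to the example data from the question, that could look roughly like the sketch below. The dataset aaa and its variables come from the question; note that with only one score per ID and year, each box will be degenerate, so real data with several scores per cell is needed to see proper boxes.

```sas
/* Example data copied from the question */
data aaa;
  input id score year;
  datalines;
1 50 2008
1 40 2007
2 30 2008
2 20 2007
;
run;

/* One tick per ID on the x axis, one box per year within each ID */
proc sgplot data=aaa;
  vbox score / category=id group=year grouporder=ascending;
run;
```

No sorting step is needed here, because SGPLOT does the grouping itself rather than relying on BY-group processing.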
Let's say I have 50 years of data for each day and month. I also have a column which lists the max rainfall for each day of that dataset. I want to be able to compute the average monthly rainfall and standard deviation for each of those 50 years. How would I accomplish this task? I've considered using PROC MEANS:
PROC MEANS DATA = WORK.rainfall;
BY DATE;
VAR AVG(max_rainfall);
RUN;
but I'm not sure how to tell SAS that I want to use the MM part of the MMDDYY format to decide where to start and stop calculating the averages for each month. I also do not know how to tell SAS within this PROC MEANS step to format the data correctly using MMDDYY10. This is why my code fails.
Update: I've also tried using this statement,
proc sql;
create table new as
select date,count(max_rainfall) as rainfall
from WORK.rainfall
group by date;
create table average as
select year(date) as year,month(date) as month,avg(rainfall) as avg
from new
group by year,month;
quit;
but that doesn't solve the problem either, unfortunately. It does create a table, but it gives me the wrong values. Where in my code could I have gone wrong? Am I correctly telling SAS to add up all the rainfall values in a month and then divide by the number of days in that month? Here's a snippet of my table.
You can use a format to group the dates for you. But you should use a CLASS statement instead of a BY statement. Here is an example using the dataset SASHELP.STOCKS.
proc means data=sashelp.stocks nway;
where date between '01JAN2005'd and '31DEC2005'd ;
class date ;
format date yymon. ;
var close ;
run;
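Applied to the rainfall question, the same idea might look like the sketch below. The table name WORK.rainfall and the variables date and max_rainfall are taken from the question; the sample data is randomly generated purely so the step can run, and the output names monthly_stats, avg_rainfall and sd_rainfall are made up for illustration.

```sas
/* Hypothetical stand-in for WORK.rainfall: two years of random daily values */
data work.rainfall;
  call streaminit(1);
  do date = '01JAN2000'd to '31DEC2001'd;
    max_rainfall = round(rand('uniform') * 50, 0.1);
    output;
  end;
  format date mmddyy10.;
run;

/* The YYMON. format makes CLASS group the daily dates by year and month */
proc means data=work.rainfall nway mean std;
  class date;
  format date yymon.;
  var max_rainfall;
  output out=work.monthly_stats mean=avg_rainfall std=sd_rainfall;
run;
```

Because the grouping comes from the format, there is no need to create separate year and month columns first; the OUTPUT statement is only there in case you want the monthly means and standard deviations in a dataset rather than just in the printed report.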
I am unable to create stacked charts by group and subgroup in SAS 9.4. I want charts similar to Excel graphs. Please find the sample data and Excel graph below (first image) and also the SAS graph (second image).
I am unable to put the SEGMENT values 'ACTUAL' and 'FORECAST' on one common year axis. ACTUAL means the data up to 2014 and FORECAST means the data after 2014; both should fall on the same axis.
goptions reset=all ;
goptions colors=(red blue green);
legend1 label=none ;
proc gchart data=NEW;
vbar year/ discrete type=sum sumvar=VALUE
group= segment subgroup=WKSCOPE ;
where year le 2020 AND YEAR ge 2012;
run;
I would solve this with annotation. I know SGPLOT better than GCHART so I'll answer it this way.
data have;
input segment $ year wkscope $ value;
datalines;
ACTUAL 2012 PH 5
ACTUAL 2012 PH 1
ACTUAL 2012 BHS 1
ACTUAL 2012 RES 2
ACTUAL 2013 PH 2
ACTUAL 2013 PH 5
ACTUAL 2013 BHS 1
ACTUAL 2014 RES 2
FORECAST 2015 PH 3
FORECAST 2015 BHS 0
FORECAST 2016 PH 4
FORECAST 2016 RES 1
FORECAST 2017 PH 5
FORECAST 2017 BHS 1
FORECAST 2017 RES 2
;;;;
run;
data sgannods;
x1space='wallpercent';
y1space='wallpercent';
x1=75;
y1=-10;
label="Forecast";
function='text';
output;
x1=25;
label="Actual";
output;
run;
proc sgplot data=have sganno=sgannods;
vbar year/response=value group=wkscope groupdisplay=stack;
run;
Basically, do everything except segment, then annotate using that value. You can generate the annotation dataset by hand like I do, or (preferably) generate it from the original data if it could change. I use WALLPERCENT because here the first half of the axis is actual and the last half is forecast. If that split could change (say, 2 actual years and 4 forecast years), you shouldn't hardcode the positions; either keep WALLPERCENT and work out the proper position from the data (with a PROC FREQ, probably), or use DATAVALUE and put the label under the middle value.
If this isn't close enough, I would go to robslink.com, which has a nice set of examples (and is written by one of the developers of the GCHART set of procs). Sanjay also has a blog, Graphically Speaking, which has some great examples, and both post on SAS Communities.
The image I produce follows here. It's not particularly close in other manners but all of those are easy to fix (color scheme, sizes, location of legends).
Data labels are the one thing you can't really have this way; they can be added if you use VBARPARM, but that requires summarizing the data ahead of time. Sanjay covers this in one of his blog posts about 9.4M2 (if you have the M2 maintenance release); if you have an older version, I also cover this in my MWSUG paper, Labelling without the Hassle: How to Produce Labeled Stacked Bar Charts Using SGPLOT and GTL Without Annotate.