Season Identification in SAS

I have three years' worth of monthly data showing concentrations of chemical X in a sample. The data shows seasonality, as predicted. However, the seasons are not the regular summer/winter/etc. I am trying to find out how I can delineate the seasons in SAS. I want to break the year down into 2 main seasons (high vs. low concentrations), so I need SAS to identify where the break between the two seasons is (i.e., which months are in the high-concentration season and which months are in the low-concentration season). Is there any way to do that?

Yes. You will want to use proc timeseries to decompose the series and estimate the season. You can alternatively use proc spectra, but proc timeseries is much more comprehensive.
ods graphics on;
proc timeseries data=sashelp.air plots=(decomp sc sa cycles sic periodogram);
id date interval=month;
var air;
run;
The results clearly indicate a seasonal period of 12 months in our example.
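Once the 12-month cycle is confirmed, one simple way to split the year into a high and a low season is to average the series by calendar month and compare each monthly average with the overall average. This is only a sketch under assumptions: the table names month_means and seasons are made up for illustration, the "above or below the overall mean" cutoff is an arbitrary choice, and a strong trend in the data could distort the monthly averages.

```sas
proc sql;
   /* average value for each calendar month across all years */
   create table month_means as
   select month(date) as mon, mean(air) as mean_value
   from sashelp.air
   group by calculated mon;

   /* label a month High if its average exceeds the mean of the monthly averages */
   create table seasons as
   select mon, mean_value,
          case when mean_value > (select mean(mean_value) from month_means)
               then 'High' else 'Low' end as season
   from month_means
   order by mon;
quit;

proc print data=seasons noobs;
run;
```

The printed table lists the 12 months with their label, so the break between the two seasons is wherever the label changes.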

Related

Renaming date variable to perform an intck to calculate day difference

I have this dataset and need to calculate the difference in days between each dose date per period. How do I label each period's study date so I can use intck to calculate the difference in days per subject (ptno)?
Just use the DIF() function to calculate the change in value for your date variable. SAS stores dates as number of days so the difference will be the number of days between the two observations. You could then test if the difference is 7 days or not.
data want;
set have;
by ptno period;
interval = dif(ex_stadt);
if first.ptno then interval=0;
seven_days = (interval = 7) ;
run;
Tom's code works very well. I simulated the data set with a few rows based on
the sample shown above and the result is OK.
The only thing missing is PROC SORT. If the data set is not already sorted by ptno and period, the BY statement will cause an error in the log.
proc sort data=have;
by ptno period;
run;
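To see the sort and the DIF() step working together, here is a small end-to-end sketch. The rows in the datalines block are invented purely for illustration; only the variable names ptno, period and ex_stadt come from the question.

```sas
/* hypothetical sample data, deliberately out of order */
data have;
   input ptno period ex_stadt :date9.;
   format ex_stadt date9.;
   datalines;
2 1 05FEB2020
1 2 08JAN2020
1 1 01JAN2020
1 3 16JAN2020
2 2 12FEB2020
;
run;

/* BY-group processing requires sorted input */
proc sort data=have;
   by ptno period;
run;

data want;
   set have;
   by ptno period;
   interval = dif(ex_stadt);          /* days since the previous row */
   if first.ptno then interval = 0;   /* reset at the start of each subject */
   seven_days = (interval = 7);       /* 1 if exactly 7 days, else 0 */
run;

proc print data=want noobs;
run;
```

With these rows, subject 1 gets intervals of 0, 7 and 8 days, and subject 2 gets 0 and 7.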

I need to find the confidence intervals for proportions using stratified data

I'm trying to report estimates of proportions of subjects from a stratified random sample.
I've tried every website I can find for SAS proc surveymeans, and I don't understand what I'm doing wrong.
data b;
set Data;
keep id texting section;
run;
proc surveyselect data=b out=samp_b method=srs n=(15,12,10,8)
seed=123;
strata section;
run;
proc surveymeans data=samp_b;
strata section;
weight SamplingWeight;
var texting;
run;
I should get confidence intervals for the strata, but they are not showing up. Also I need confidence intervals for the proportions!
I don't know what version of SAS/STAT you are using, but per SAS/STAT 9.2 Proc Surveymeans documentation pages, you can do one or both of the following:
1) Add the relevant statistics keywords to the proc surveymeans statement
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm
In the PROC SURVEYMEANS statement, you also can use statistic-keywords to specify statistics for the procedure to compute. Available statistics include the population mean and population total, together with their variance estimates and confidence limits. You can also request data set summary information and sample design information.
The available statistics keywords are listed and described on these pages:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_a0000000238.htm
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveymeans_sect007.htm#statug.surveymeans.smeanskeys
So, to print the 95% two-sided confidence interval for the mean, you would add CLM to the end of your Proc Surveymeans statement.
2) Save the Statistics table with confidence intervals to a separate SAS dataset with an additional ods output Statistics=MyStat; statement, per these instructions.
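Putting both suggestions together might look like the sketch below. It rests on some assumptions beyond the answer above: that texting is a categorical variable (hence the CLASS statement, so that MEAN reports a proportion for each level), that per-section estimates are wanted (hence the DOMAIN statement), and that the data set names MyStat and MyDomainStat are free to use. Check that these statements are available in your SAS/STAT version.

```sas
proc surveymeans data=samp_b mean clm;
   strata section;
   weight SamplingWeight;
   class texting;    /* treat texting as categorical so MEAN gives proportions */
   var texting;
   domain section;   /* request estimates within each section */
   /* save the overall and the per-section tables to data sets */
   ods output Statistics=MyStat Domain=MyDomainStat;
run;
```

The CLM keyword adds two-sided 95% confidence limits to both the overall and the per-section estimates.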

Percent split with where condition in SAS

I am new to SAS and data analytics in general, so sorry if my question sounds too dumb.
I have a dataset of brand medicine with three variables. Variable 1 contains the drug name, variable 2 contains whether that drug is Branded, Generic or Branded-Generic, and variable 3 contains the total sale of that drug.
What I want is the percent split of Branded, Generic and Branded-Generic drugs within the total drug sale. The final output should look like
Branded : 35%
Generic : 25%
Branded-Generic : 40%
Any help with SAS code that would do that is greatly appreciated. Thank you.
So you want a % sale split! You can try using SQL (proc sql) to get your desired answer.
proc sql;
/* step 1: total sale for each drug type */
create table totals as
select drug_type, sum(total_sale) as tot_sale
from have
group by drug_type;
/* step 2: share of each drug type in the grand total */
create table want as
select *, tot_sale/sum(tot_sale) as percent_sale format=percent10.2
from totals;
quit;
I first created a table 'totals' that holds the total sale for each drug type. Using that table, I created 'want' with an extra column that has the calculated sale percentage, formatted as a percent (for easy viewing). Using a separate intermediate table avoids the warning SAS issues when a CREATE TABLE statement reads from the table it is creating.
Of course, there are other ways of doing it, like using proc summary or proc freq or even a data step. But as a beginner, I guess starting out with SQL would be a good decision.
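For comparison, the proc freq route mentioned above can be sketched in a few lines. It assumes the same have, drug_type and total_sale names used in the SQL code; want_freq is a made-up output name.

```sas
proc freq data=have;
   /* weight each row by its sales so PERCENT is the share of total sales */
   weight total_sale;
   tables drug_type / out=want_freq;
run;
```

The output data set want_freq holds one row per drug type, with COUNT as the summed sales and PERCENT as the share of the total.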

Getting average price across stores and across months

I am trying to use proc tabulate to arrive at the average price of some configurable items, across stores and across months. Below is the sample data set that I need to process:
Configuration|Store_Postcode|Retail Price|month
163|SE1 2BN|455|1
320|SW12 9HD|545|1
23|E2 0RY|515|1
The below code is displaying the month wise average price for each configuration.
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,month*MEAN*retail_price;
run;
But can I get this grouped one more level - at the Store Post code level? I modified the code to read as shown below, but executing this is crashing the system!
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,store_postcode*month*MEAN*retail_price;
run;
Please advise if my approach is incorrect, or what I am doing wrong in proc tabulate that makes it crash the system.
I am not sure if this exactly answers your question since I am new to SAS, but when I switched store_postcode*month*MEAN*retail_price to month*store_postcode*MEAN*retail_price, it worked without crashing. I am just guessing that the reason is that your data contains only 1 value for month and multiple values for postal code, so month is the most general level of categorization and postal code is more specific.
On a side note, I tried to format the table in another way also to segment the data by postal code:
proc tabulate data=pos_data_raw;
class configuration store_postcode month;
var retail_price;
table store_postcode*configuration, month*MEAN*retail_price;
run;
The resulting table has postal code and configuration id on the left and month and retail price on top.

Plotting seasonal data, with years on top of each other in SAS?

Hi I have a time series data table from October 2013 to October 2016. I would like to plot the time series from October 2013 to November 2014, October 2014 to November 2015, and October 2015 to November 2016 on top of each other on the same graph to analyze any seasonal trends.
My idea is to create separate data tables with each subsegment, but is there an easier way to do this in SAS?
This is an example of the data table I want to plot the seasonality of.
The workflow I think here is to add a group variable that indicates, say, year, which has the same value for all rows you want plotted in one plot-grouping.
Then you use the group statement in whatever plot type you want. Something like:
data stocks_years;
set sashelp.stocks;
date_year = intck('YEAR','01AUG1986'd,date,'c')+1986;
date_month= month(date);
run;
proc sgplot data=stocks_years;
vline date_month/response=close group=date_year stat=mean;
run;
This is an example of doing that to see the average close per month of the three stocks in the SASHELP.STOCKS dataset. It is a terrible plot of course but it should give you some idea of what it would look like. Each of those differently colored lines is from a different year (aug->jul being defined as a year, with the number being the year number of aug).
The lead off provided by Joe gave me everything I needed. Here is the completed code for anyone else's reference.
%macro Plot_Seasonal_Worse_TP(tbl_name, tp, cutoff_date);
/*tp = transition probability */
proc sql;
create table &tbl_name._trim as
SELECT *
FROM &tbl_name
WHERE asofdt > &cutoff_date;
quit;
data &tbl_name._trim;
set &tbl_name._trim;
date_year = intck('YEAR','01NOV2013'd,asofdt,'c')+2014;
date_month= MOD(month(asofdt)+1, 12)+1; /* shift months so November=1, December=2, ..., October=12, matching the November-to-October year used for date_year */
run;
proc sgplot data=&tbl_name._trim;
vline date_month/response=&tp group=date_year;
title "&tbl_name (&tp)";
run;
%mend Plot_Seasonal_Worse_TP;
Output looks like this as well.