Averages in SAS with dates using months - sas

Let's say I have 50 years of data for each day and month. I also have a column which lists the max rainfall for each day of that dataset. I want to be able to compute the average monthly rainfall and standard deviation for each of those 50 years. How would I accomplish this task? I've considered using PROC MEANS:
PROC MEANS DATA = WORK.rainfall;
BY DATE;
VAR AVG(max_rainfall);
RUN;
but I'm not sure how to tell SAS that the MM part of the MMDDYY format should determine where each monthly average starts and stops. I also don't know how to tell SAS, within this PROC MEANS step, to format the data correctly using MMDDYY10. This is why my code fails.
Update: I've also tried using this statement,
proc sql;
create table new as
select date,count(max_rainfall) as rainfall
from WORK.rainfall
group by date;
create table average as
select year(date) as year,month(date) as month,avg(rainfall) as avg
from new
group by year,month;
quit;
but that doesn't solve the problem either, unfortunately. It creates a table, but the values are wrong. Where in my code could I have gone wrong? Am I correctly telling SAS to add up all the rainfall values in a month and then divide by the number of days in that month?

You can use a format to group the dates for you. But you should use a CLASS statement instead of a BY statement. Here is an example using the dataset SASHELP.STOCKS.
proc means data=sashelp.stocks nway;
where date between '01JAN2005'd and '31DEC2005'd ;
class date ;
format date yymon. ;
var close ;
run;
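Applied to the rainfall data from the question, the same idea might look like the sketch below. It assumes WORK.rainfall has a numeric SAS date variable named date and a numeric variable named max_rainfall, as the question describes; the output dataset and statistic names (monthly_rainfall, avg_rainfall, std_rainfall) are made up for illustration. The PROC SQL variant is included because the second attempt in the question uses count(), which counts non-missing values rather than averaging them.

```sas
/* Sketch: monthly mean and standard deviation of daily rainfall.      */
/* Assumes DATE is a numeric SAS date and MAX_RAINFALL is numeric.     */
proc means data=work.rainfall nway mean std;
  class date;
  format date yymon7.;          /* group daily dates by year and month */
  var max_rainfall;
  output out=work.monthly_rainfall (drop=_type_ _freq_)
         mean=avg_rainfall std=std_rainfall;
run;

/* Equivalent PROC SQL sketch: avg() and std() are applied directly to */
/* the daily values, with no intermediate count() step.                */
proc sql;
  create table work.monthly_rainfall_sql as
  select year(date)        as year,
         month(date)       as month,
         avg(max_rainfall) as avg_rainfall,
         std(max_rainfall) as std_rainfall
  from work.rainfall
  group by calculated year, calculated month;
quit;
```

Both versions should produce one row per year and month, so 50 years of daily data would give roughly 600 rows.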

Related

Renaming date variable to perform an intck to calculate day difference

I have this dataset and need to calculate the difference in days between dose dates within each period. How do I label each period's study date so that I can use INTCK to calculate the difference in days per subject (ptno)?
Just use the DIF() function to calculate the change in value for your date variable. SAS stores dates as number of days so the difference will be the number of days between the two observations. You could then test if the difference is 7 days or not.
data want;
set have;
by ptno period;
interval = dif(ex_stadt);
if first.ptno then interval=0;
seven_days = (interval = 7) ;
run;
Tom's code works very well. I tested it on a small simulated data set based on the sample shown above, and the result is correct.
The only thing missing is a PROC SORT. Because the DATA step uses a BY statement, the log will show an error if the data set is not already sorted by ptno and period.
proc sort data=have;
by ptno period;
run;

Relabel Year Month Variable To Inform Proc Freq Order

I have a date time variable 'chg_date_of_svc' and would like to turn it into a month_year variable. To do this, I simply wrote the following code:
data combined1;
set combined;
MONTH_YEAR=chg_date_of_svc;
format MONTH_YEAR monyy7.;
run;
I would then like to use the month_year variable in a proc freq statement; however, the month_years do not appear in chronological order when using the following code. For example, January 2019 appears before December 2018 in the tables the proc freq statement produces.
This may not be the easiest solution, but I suspect I have to relabel the specific year_months so that they appear in the correct chronological order?
proc freq data = combined1 order=data;
table EM_Charge*MONTH_YEAR;
run;
Thank you for the help.
With ORDER=DATA you asked PROC FREQ to list the columns in the order in which they first appear in the input dataset. If you want them in chronological order, remove the ORDER=DATA option. If you must use ORDER=DATA, sort the data first.
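As a rough sketch of both options, assuming MONTH_YEAR holds SAS date values with the MONYY7. format attached as in the question (the dataset name combined1_sorted is made up for illustration):

```sas
/* Option 1: drop ORDER=DATA. The default ordering uses the stored    */
/* (unformatted) date values, so the months come out chronologically. */
proc freq data=combined1;
  tables EM_Charge*MONTH_YEAR;
run;

/* Option 2: keep ORDER=DATA, but sort by the date variable first so  */
/* that the order of first appearance is already chronological.       */
proc sort data=combined1 out=combined1_sorted;
  by MONTH_YEAR;
run;

proc freq data=combined1_sorted order=data;
  tables EM_Charge*MONTH_YEAR;
run;
```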

Add row percentage within top column in proc tabulate (SAS)

I am using PROC TABULATE in SAS and I want to make a table like the one in this example.
As you can see, I want to calculate the rowpctn within each year. My problem is that I don't know where to put rowpctn in the code, and after searching the web for an hour I still haven't been able to find the answer.
My code looks like this:
Proc tabulate Data=Have;
class year gender question_1;
table gender, year*question_1;
run;
Where do I have to state the rowpctn to get the desired result?
While waiting for a response, I kept searching the internet for an answer, and I managed to find a solution myself.
The code is below:
Proc tabulate Data=Have;
class year gender question_1;
table gender*pctn<question_1>, year*question_1;
run;

Getting average price across stores and across months

I am trying to use the proc tabulate procedure to arrive at the average price of some configurable items, across stores and across months. Below is the sample data set, which I need to process
Configuration|Store_Postcode|Retail Price|month
163|SE1 2BN|455|1
320|SW12 9HD|545|1
23|E2 0RY|515|1
The below code is displaying the month wise average price for each configuration.
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,month*MEAN*retail_price;
run;
But can I get this grouped one more level - at the Store Post code level? I modified the code to read as shown below, but executing this is crashing the system!
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,store_postcode*month*MEAN*retail_price;
run;
Please advise whether my approach is incorrect, or what I am doing wrong in PROC TABULATE that makes it crash the system.
I am not sure this exactly answers your question, since I am new to SAS, but when I changed store_postcode*month*MEAN*retail_price to month*store_postcode*MEAN*retail_price, it ran without crashing. My guess is that this works because your data contains only one value for month but many values for postcode, so month is the most general level of grouping and postcode is more specific.
On a side note, I tried to format the table in another way also to segment the data by postal code:
proc tabulate data=pos_data_raw;
class configuration store_postcode month;
var retail_price;
table store_postcode*configuration, month*MEAN*retail_price;
run;
In the output, the table has postal code and configuration ID down the left side, and month and retail price across the top.

Fill in missing values with mode in SAS

I think the logic to replace missingness is quite clear but when I dump it to SAS I find it too complicated to start with.
Given no code was provided, I'll give you some rough directions to get you started, but put it on you to determine any specifics.
First, let's create a month column for the data and then calculate the mode for each key in each month. Additionally, let's put this new data in its own dataset.
data temp;
set original_data;
month = month(date);
run;
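/* BY-group processing requires the data to be sorted by key and month */
proc sort data=temp;
by key month;
run;
proc univariate data=temp noprint;
by key month;
var value;
output out=mode_data mode=mode_value;
run;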
However, this procedure calculates the mode in a very specific way that you may not want: it defaults to the lowest value in the case of a tie and produces no mode if nothing occurs at least twice. Documentation: http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect027.htm
If that doesn't work for you, I recommend using proc sql to get a count of each key, month, value combination and calculating your own mode from there.
proc sql;
create table mode_data as select distinct
key, month, value, count(*) as distinct_count
from temp
group by key, month, value;
quit;
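One way to turn those counts into a mode, sketched under the assumption that the mode_data table from the query above has the columns key, month, value and distinct_count: sort so that the most frequent value comes first within each key and month, then keep only that first row. The dataset names mode_sorted and mode_by_month and the variable name mode_value are made up for illustration, and ties are broken here by taking the lowest value, which is a choice rather than a requirement.

```sas
/* Put the most frequent non-missing value first within each key/month. */
proc sort data=mode_data out=mode_sorted;
  where not missing(value);     /* missing values should not be a mode */
  by key month descending distinct_count value;
run;

/* Keep the first row per key/month: the value with the highest count. */
data mode_by_month;
  set mode_sorted;
  by key month;
  if first.month;
  rename value=mode_value;
  keep key month value;
run;
```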
From there you might want to create a table containing all months in the data.
proc sql;
create table all_months as select distinct month
from temp;
quit;
Don't forget to merge any missing months back into the mode data, and use the LAG function or a RETAIN statement to search previous months for "old modes".
Then simply merge your fully populated mode data back onto the temp dataset we created above and set the value to the mode wherever value is missing (i.e. value = .).
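That final merge could look roughly like the sketch below. It assumes a hypothetical dataset mode_lookup with exactly one row per key and month and a column mode_value holding the mode to use; neither name comes from the question, and the output dataset name imputed is also made up.

```sas
/* Both inputs must be sorted by the merge keys. */
proc sort data=temp;
  by key month;
run;

proc sort data=mode_lookup;
  by key month;
run;

/* Attach the mode to every row, then fill in missing values. */
data imputed;
  merge temp (in=in_temp)
        mode_lookup (keep=key month mode_value);
  by key month;
  if in_temp;                                 /* keep only original rows */
  if missing(value) then value = mode_value;  /* impute with the mode    */
  drop mode_value;
run;
```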
Hope that helps get you started.