SAS: Printing monthly and weekly average

How can I print (and export to a file) the monthly and weekly averages of Value? The data is stored in a library and has the following form:
Obs. Date Value
1 08FEB2016:00:00:00 29.00
2 05FEB2016:00:00:00 29.30
3 04FEB2016:00:00:00 29.93
4 03FEB2016:00:00:00 28.65
5 02FEB2016:00:00:00 28.40
(...)
3078 08MAR2004:00:00:00 32.59
3079 05MAR2004:00:00:00 32.75
3080 04MAR2004:00:00:00 32.05
3081 03MAR2004:00:00:00 31.82
EDIT: I managed to get the monthly average, but my code returns the average for each month separately. I would like to get it as a single result (month and average) and export it to a file or a data set. I still have no idea how to deal with weeks.
%macro printAvgM(start,end);
proc summary data=sur1.dane(where=(Date>=&start
and Date<=&end)) nway;
var Value;
output out=want (drop=_:) mean=;
proc print;
run;
%mend printAvgM;
%printAvgM('01jan2003'd,'31jan2003'd);
EDIT2: Here is my code, step by step:
libname sur 'C:\myPath';
run;
proc import datafile="C:\myPath\myData.csv"
out=SUR.DANE
dbms=csv replace;
getnames=yes;
run;
proc sort data=sur.dane out=sur.dane;
by Date;
run;
libname sur1 "C:\myPath\myDB.accdb";
run;
proc datasets;
copy in=sur out=sur1;
select dane;
run;
data sur1.dane2;
set sur1.dane;
date2=datepart(Date);
format date2 WEEKV11.;
run;
The last step results in "NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables." and the date2 variable in dane2 ends up with the format DATETIME19.

OK, it's small enough to handle easily then. I would recommend first converting your datetime variable to a date variable with the DATEPART() function and then applying a format within PROC MEANS. You can look up the WEEKU and WEEKV formats to see if they meet your needs. The code below should be enough to get you started. You could do the monthly summary without the date conversion, but I couldn't find a weekly format for datetime variables.
*Fake data generated;
data fd;
start=datetime();
do i=1 to 3000000 by 120;
datetime=start+(i-1)*30;
var=rand('normal', 25, 5);
output;
end;
keep datetime var;
format datetime datetime21.;
run;
*Get date variable;
data fd_date;
set fd;
date_var = datepart(datetime);
date_month = put(date_var, yymon7.);
Date_week = put(date_var, weekv11.);
run;
*Monthly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_monthly mean(var)=avg_var std(var)=std_var;
format date_var monyy7.;
run;
*Weekly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_weekly mean(var)=avg_var std(var)=std_var;
format date_var weekv11.;
run;
Alternatively, replace date_var in the CLASS statements with the new date_month and Date_week variables. Because these are character variables, though, they won't sort chronologically.
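The question also asks how to print and export the result. Here is a minimal sketch, assuming a small made-up sample in WORK and a hypothetical output path. It groups the datetime variable by month directly with the DTMONYY7. format (the "monthly without the date conversion" option mentioned in the answer), prints the result, and writes it to CSV with PROC EXPORT. The names work.sample, work.monthly_avg, avg_value and the file name are illustrative and not from the original post.

```sas
/* Made-up sample in the same shape as the question's data */
data work.sample;
  input Date :datetime18. Value;
  format Date datetime19.;
  datalines;
08FEB2016:00:00:00 29.00
05FEB2016:00:00:00 29.30
04FEB2016:00:00:00 29.93
29JAN2016:00:00:00 28.65
28JAN2016:00:00:00 28.40
;
run;

/* One row per month: CLASS groups by the formatted value of Date */
proc means data=work.sample noprint nway;
  class Date;
  format Date dtmonyy7.;
  var Value;
  output out=work.monthly_avg(drop=_type_ _freq_) mean=avg_value;
run;

/* Print the result */
proc print data=work.monthly_avg noobs;
run;

/* Export to CSV; the path is a placeholder, adjust it to your system */
proc export data=work.monthly_avg
  outfile="C:\myPath\monthly_avg.csv"
  dbms=csv replace;
run;
```

Keeping the intermediate data sets in a SAS library such as WORK, rather than in the Access libname, should also avoid the note about formats not being written to DBMS tables.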

Related

SAS interpolation : alternative to proc expand

From annual data :
I would like to create daily data, but I can't use PROC EXPAND because SAS/ETS is not available.
Thank you for your suggestions.
Something like this is perhaps a basic approach:
Create a list of dates for interpolation.
Merge it with the have data (shown above, not included in the code below).
Plot to check whether the pattern is linear (it looks somewhat exponential/curved).
Run a linear regression, saving the predicted values.
Plot the interpolated values against the actual values.
data years;
do date='30Jun2017'd to '30Jun2022'd;
output;
end;
run;
data have;
merge years have;
by date;
format date date9.;
run;
proc sgplot data=have;
series x=date y=px_last;
run;
proc reg data=have plots;
model px_last = date;
output out=pred p=predicted_value;
run;
proc sgplot data=pred;
series x=date y=predicted_Value;
scatter x=date y=px_last;
run;
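The regression above fits one straight line through all points, so the predicted values do not necessarily pass through the known annual values. If piecewise linear interpolation between consecutive known points is wanted instead, a plain DATA step can do it without SAS/ETS. The sketch below is a different technique from the answer's regression, and it assumes made-up annual values (the question's data is not shown) in a data set sorted by date. The names annual, daily and px_interp are hypothetical.

```sas
/* Made-up annual observations; replace with the real data */
data annual;
  input date :date9. px_last;
  format date date9.;
  datalines;
30JUN2017 100
30JUN2018 110
30JUN2019 125
;
run;

data daily(keep=date px_interp);
  set annual end=last;
  /* previous known point, carried across observations */
  retain prev_date prev_px;
  format date date9.;
  if _n_ > 1 then do;
    next_date = date;
    next_px = px_last;
    /* fill every day from the previous point up to the day before this one */
    do date = prev_date to next_date - 1;
      px_interp = prev_px
                + (next_px - prev_px) * (date - prev_date) / (next_date - prev_date);
      output;
    end;
    date = next_date;
  end;
  /* the final known point is written as-is */
  if last then do;
    px_interp = px_last;
    output;
  end;
  prev_date = date;
  prev_px = px_last;
run;
```

Each known date appears exactly once in daily, with px_interp equal to the observed px_last on that date.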

SAS Aggregate to show results for the current year only

I have a table where I compare the results week to week.
I have aggregations of old dates using these functions.
%let date_old=%sysfunc(intnx(year,%sysfunc(Today()),-1,s));
%put &=date_old;
proc format;
value vintf low-&date_old = 'OLD' other=[yymmd7.];
run;
/* aggregate the results up to the vintf date as OLD */
proc summary data=tablea_new nway;
class policy_vintage;
format policy_vintage vintf.;
var AKTYWNE WYGASLE;
output out=newtabe sum=;
And I would like to do exactly the same, only to aggregate the dates to show the yearly range, i.e. 2021-01-2022-01. Or the current year 2021-01-2021-12. Is the following sample okay? What's the best way to do this?
%let date_future=%sysfunc(intnx(year,%sysfunc(Today()),+12,s));
%put &=date_future;
proc format;
value vintfutr +&date_future= 'FUTURE' other=[yymmd7.];
run;
%let date_old=%sysfunc(intnx(year,%sysfunc(Today()),-1,s));
%let date_future=%sysfunc(intnx(year,%sysfunc(Today()),+1,s));
proc format;
value vintf
low-&date_old = 'OLD'
&date_future-high = 'FUTURE'
other=[yymmd7.]
;
run;
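To see how a format with both an OLD and a FUTURE bucket behaves in the aggregation, here is a small sketch that applies it to made-up data. The data set work.demo, its date range and the constant aktywne value are assumptions for illustration only. The format definition repeats the one above so the example runs on its own.

```sas
%let date_old=%sysfunc(intnx(year,%sysfunc(today()),-1,s));
%let date_future=%sysfunc(intnx(year,%sysfunc(today()),+1,s));

proc format;
  value vintf
    low-&date_old = 'OLD'
    &date_future-high = 'FUTURE'
    other=[yymmd7.];
run;

/* Made-up policy dates spanning three years either side of today */
data work.demo;
  do policy_vintage = intnx('year', today(), -3, 's')
                   to intnx('year', today(), 3, 's') by 100;
    aktywne = 1;
    output;
  end;
run;

/* CLASS groups by the formatted value: OLD, FUTURE, or year-month */
proc summary data=work.demo nway;
  class policy_vintage;
  format policy_vintage vintf.;
  var aktywne;
  output out=work.demo_sum(drop=_type_ _freq_) sum=;
run;

proc print data=work.demo_sum noobs;
run;
```

Dates between the two cut-offs keep their year-month label, while everything outside collapses into a single OLD or FUTURE row.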

Adding a column calculated from subset of another column

I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
/* NWAY keeps only the per-date rows (no overall total row) */
proc means data=have nway noprint;
class date;
var count;
output out=want(drop=_type_ _freq_) sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
quit;
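Since the question mentions trying first., here is a minimal sketch of the same total using BY-group processing in a DATA step. It reuses the question's sample data. The output data set name want_by is hypothetical.

```sas
data have;
  input date :date9. count;
  format date date9.;
  datalines;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;

/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=have;
  by date;
run;

data want_by(keep=date total);
  set have;
  by date;
  /* reset the running total at the start of each date */
  if first.date then total = 0;
  /* sum statement: total is retained across observations automatically */
  total + count;
  /* write one row per date, once the last row of the group is reached */
  if last.date then output;
run;
```

For the sample data this gives the two rows shown in the question, with totals 50 and 20.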

SAS: PROC IMPORT: CSV WITH DATES AS VAR NAMES

I'm importing CSV data in the following format:
SEDOL,12/08/2009,13/08/2009,14/08/2009,17/08/2009,18/08/2009
B1YVN39,7.8431,7.8431,7.8431,7.8431,7.598
B00G7R3,3.8,3.61,3.81,3.81,3.81
2965237,4.5351,4.5351,4.5351,4.5351,4.5351
2554345,7.355,7.355,7.355,7.355,7.355
I'm using the following command:
PROC IMPORT OUT= want
DATAFILE= have
DBMS=CSV REPLACE;
RUN;
Then transposing the data to long format, as follows:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
proc print; run;
How can I import the dates correctly formatted and change the variable type from default to date?
Importing and transposing are handy procedures, but if you understand your data well, a little data step program can deal with this in one step:
data want(keep=sedol v_date v_value);
infile have dsd dlm=',' truncover;
informat sedol $8. d1-d50 ddmmyy10. v1-v50 8.;
format v_date yymmdd10.;
array d(50) d1-d50;
array v(50) v1-v50;
/* Retain the date values and the count of dates */
retain d1-d50 idx;
/* Read header */
if _n_ = 1 then do;
input sedol d1-d50;
/* loop to find how many date columns there are */
do idx=1 to 50 while(d(idx) ne .);
end;
idx = idx - 1; /* must subtract one here */
delete;
end;
/* Read data lines */
input sedol v1-v50;
do i=1 to idx;
v_date = d(i);
v_value = v(i);
output;
end;
run;
As long as your input file is exactly as you describe (a header record with a leading ID variable of at most 8 characters, followed by some number of date values representing columns), this will process up to 50 measurements per row. It should be easy enough to modify if your needs change.
In this case I would suggest importing the data and the headers separately.
First, we import data:
PROC IMPORT OUT= want
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
datarow=2;
RUN;
Then we import only the first row, which holds the variable names:
options obs=1;
PROC IMPORT OUT= header
DATAFILE= "C:\have.csv"
DBMS=CSV REPLACE;
getnames=no;
RUN;
options obs=max;
Then we transpose the header row into a column and "mask" the values that are illegal as SAS names: add a letter (it doesn't matter which one; I chose 'D') as the first character and replace all slashes '/' with underscores '_':
proc transpose data=header out=header(drop=_name_);var _all_;run;
data header;
set header;
if anydigit(substr(COL1,1,1)) then COL1=cats("D",COL1);
COL1=translate(COL1,"_","/");
run;
Put these cleaned column names into a macro variable:
proc sql noprint;
select COL1 into :names separated by ' '
from header;
quit;
And generate a DATA step that does the renaming, using the CALL EXECUTE routine:
data _null_;
dsid=open("want","i");
num=attrn(dsid,"nvars");
call execute("data want;");
call execute("set want;");
call execute("rename");
do i=1 to num;
call execute(varname(dsid,i)||"="||scan("&names",i," "));
end;
call execute(";run;");
rc=close(dsid);
run;
Now your original SORT and TRANSPOSE:
PROC SORT DATA=want OUT=want; BY SEDOL;RUN;
proc transpose data=want out=transp;
by SEDOL;
run;
Finally, "unmask" the dates (delete the leading D and replace _ with /) and convert them to real dates with INPUT(). The RETAIN statement is there only to place the new variable DATE second, right after SEDOL.
data transp;
retain SEDOL date;
set transp;
substr(_name_,1,1)='';
_name_=translate(_name_,"/","_");
date=input(strip(_name_),ddmmyy10.);
drop _name_;
format date ddmmyy10.;
run;

How to customize PROC FREQ to deal with missing values

I have the following code
data work.customBins;
retain fmtname 'bins' type 'n';
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
proc format library=work cntlin=work.customBins; run;
proc freq data=work.myData;
table variable /missing;
format variable bins.;
run;
This code works properly. My only issue is that PROC FREQ disregards bins that have no values, for example -1.45 to -1.40. I want such empty bins to be displayed with the cumulative frequency of the previous bin, for example:
-1.50 to -1.45: cumulative freq = 2%
-1.45 to -1.40 has no values, but its cumulative freq should also be 2%
I have also tried doing this
data work.combined;
set work.myData (in=a) work.customBins (in=b)
if a then cont=1;
if b then cont=0;
run;
proc freq data=work.combined;
table variable /missing;
format variable bins.;
weight cont/zeros;
run;
But this does not work either.
myData contains just a single variable called variable, which holds decimal numbers in the range -2.45 to 2.45.
Here is a working variant:
data work.customBins;
do binStart=-2.5 to 2.45 by 0.05;
binEnd=binStart+0.05;
difference=cat(binStart," to ",binEnd);
output;
end;
run;
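proc sql;
create table want as
/* binStart is selected and grouped too, so the ORDER BY column belongs to the grouped result */
select binStart, difference, count(variable) as count
from customBins left join mydata
on binStart < variable <= binEnd
group by binStart, difference
order by binStart;
quit;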
proc freq data=want order=data;
tables difference;
weight count / zeros;
run;
Regarding your first variant: are you sure that your PROC FORMAT works as expected? The dataset passed in the CNTLIN= option should have variables named START, END and LABEL, not arbitrarily named ones. Either way, it wouldn't work, because PROC FREQ only reports values that actually occur in the mydata dataset, no matter how many other labels you defined in the format.
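The point about the CNTLIN= variable names can be sketched as follows. This is an illustrative control data set, not the asker's original: it uses the names PROC FORMAT expects (FMTNAME, TYPE, START, END, LABEL), rounds the bin edges to avoid accumulated floating-point error from stepping by 0.05, and sets SEXCL='Y' so each bin excludes its start value, matching the binStart < variable <= binEnd condition used in the SQL join. The loop counter i and the test value are assumptions.

```sas
data work.customBins;
  /* constant columns required or used by CNTLIN= */
  retain fmtname 'bins' type 'n' sexcl 'Y';
  length label $32;
  do i = 0 to 99;                       /* 100 bins from -2.5 to 2.5 */
    start = round(-2.5 + i*0.05, 0.01);
    end   = round(start + 0.05, 0.01);
    label = catx(' to ', start, end);
    output;
  end;
  drop i;
run;

proc format library=work cntlin=work.customBins;
run;

/* Quick check: write one value and its bin label to the log */
data _null_;
  x = -1.42;
  binned = put(x, bins.);
  put x= binned=;
run;
```

Even with a correctly built format, PROC FREQ still lists only the bins that occur in the data, so the join-based variant is what brings in the empty bins.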