I have a dataset with a value for each day of a particular month, like 01JAN2020, 03JAN2020, 06JAN2020, 01FEB2020, 04FEB2020. I need to count the rows in each month and year. When I use the count(*) function and group by that column, I only get daily row counts. Which function will give the number of rows per month rather than per day?
Thank you,
In SQL, to compute an aggregate count per month, the GROUP BY must be on the date's month rather than on the date itself. The month (represented as the 1st day of the month) can be computed using the INTNX function, or the year and month can be extracted using the YEAR and MONTH functions.
Proc SQL
Example:
data have;
call streaminit(2021); * initialize random number stream;
do date = '01jan2020'd to today();
do _n_ = 1 to rand('integer', 5); * random, up to 5 repeats per day;
output;
end;
end;
format date date9.;
run;
proc sql;
create table want as
select
intnx('month', date, 0) as month format=yymon7.
, count(*) as count
from
have
group by
calculated month /* calculated is SAS SQL special feature */
;
quit;
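The YEAR and MONTH route mentioned above could look like the following. This is a minimal sketch that reuses the have dataset from the example; the output table name want_ym and the column names yr and mon are illustrative choices, not part of the original answer.

```sas
proc sql;
  create table want_ym as
  select
      year(date)  as yr            /* calendar year of each row  */
    , month(date) as mon           /* month number, 1 through 12 */
    , count(*)    as count
  from
    have
  group by
    calculated yr, calculated mon  /* one output row per year and month */
  ;
quit;
```

Grouping on both columns keeps January 2020 separate from January 2021, which grouping on the month number alone would not.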
Proc MEANS
You can also use Proc MEANS and format the date as a month representation. The procedure will group according to the formatted value.
Example:
proc means nway data=have noprint ;
format date yymon7.;
class date;
var date;
output out=want N=count;
run;
Use PROC FREQ with a format that controls how the date is displayed. The first example groups by year and month; the second groups by month name only.
*by year month;
proc freq data=sashelp.stocks;
format date yymmn6.;
table date;
run;
*by month name;
proc freq data=sashelp.stocks;
format date monname.;
table date;
run;
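If the counts are needed in a dataset rather than in printed output, the OUT= option of the TABLES statement can be used. The sketch below assumes the have dataset from the first answer; the output name want_freq is illustrative.

```sas
proc freq data=have noprint;
  format date yymon7.;                          /* group by the formatted month value */
  tables date / out=want_freq(keep=date count);
run;
```

The date column in want_freq keeps one underlying date value per month and carries the format, so it still displays as a month.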
Related
I have a dataset where I have several variables with suffixes that correspond to given dates. I want to replace the suffixes with the dates to make my output tables more user friendly.
Here is a sample of my code
the fields in my sales dataset are
product number_of_sales_1 number_of_sales_2 number_of_sales_3 revenue_1 revenue_2 revenue_3 tax_1 tax_2 tax_3
The suffixes 1,2,3 correspond to dates which are held in a second dataset with the following format
dates
id date
1 01Apr
2 01May
3 01Jun
I want to bulk replace the suffixes with the dates so my fields in sales become
product number_of_sales_01Apr number_of_sales_01May number_of_sales_01Jun revenue_01Apr revenue_01May revenue_01Jun tax_01Apr tax_01May tax_01Jun
Both the number of dates and the number of metrics in sales are dynamic, so I can't just hardcode them in the code.
I assume your datasets look like below:
data sales;
product="abc";number_of_sales_1=1;number_of_sales_2=2;number_of_sales_3=3;
revenue_1=1000;revenue_2=2000;revenue_3=3000;tax_1=100;tax_2=200;tax_3=300;
run;
data dates;
id=1;date="01Apr";output;id=2;date="01May";output;id=3;date="01Jun";output;
run;
1st Step - Finding the variables which need to be renamed
proc contents data=sales out=sales_temp(keep=name) noprint; run;
data sales_temp1;
length check_date_vars $1. id 8.;
set sales_temp;
check_date_vars=compress(substr(name,length(name)));
temp=notdigit(check_date_vars);
if temp=0 then id=check_date_vars;
run;
2nd Step - Merging the above dataset with the dataset which contains the dates, to create a mapping between old names and new names, and creating macro variables from it
proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
new_name=substr(name,1,length(name)-1)||date;
run;
proc sql noprint;
select count(*) into :num_vars separated by " " from sales_temp_date;
quit;
proc sql noprint;
select name into:old_name1 - :old_name&num_vars. from sales_temp_date;
select new_name into:new_name1 - :new_name&num_vars. from sales_temp_date;
quit;
3rd Step - Renaming the variables
%macro rename();
proc datasets library=work nolist;
modify sales;
rename
%do i=1 %to &num_vars.;
&&old_name&i.= &&new_name&i.
%end;
;
run;
quit;
%mend;
%rename;
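As an alternative to the numbered macro variables and the %do loop, the whole rename list can be built in a single macro variable. This is a sketch only: it assumes the sales_temp_date dataset from the 2nd step exists, that it is run in place of the 3rd step (not after it, since the old names would already be gone), and the macro variable name rename_list is illustrative.

```sas
proc sql noprint;
  /* build "old1=new1 old2=new2 ..." in one macro variable */
  select catx('=', name, new_name)
    into :rename_list separated by ' '
    from sales_temp_date;
quit;

proc datasets library=work nolist;
  modify sales;
  rename &rename_list;
  run;
quit;
```

Like the original steps, this only handles single-digit suffixes, because the mapping in sales_temp_date is built from the last character of each name.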
How can I print (and export to a file) the monthly and weekly average of a value? The data is stored in a library and has the following form:
Obs. Date Value
1 08FEB2016:00:00:00 29.00
2 05FEB2016:00:00:00 29.30
3 04FEB2016:00:00:00 29.93
4 03FEB2016:00:00:00 28.65
5 02FEB2016:00:00:00 28.40
(...)
3078 08MAR2004:00:00:00 32.59
3079 05MAR2004:00:00:00 32.75
3080 04MAR2004:00:00:00 32.05
3081 03MAR2004:00:00:00 31.82
EDIT: I somehow managed to get the monthly data, but I'm returning the average for each month separately. I would like to have it done as one result, namely Month-Average, and export it to a file or a data set. And I still have no idea how to deal with weeks.
%macro printAvgM(start,end);
proc summary data=sur1.dane(where=(Date>=&start
and Date<=&end)) nway;
var Value;
output out=want (drop=_:) mean=;
proc print;
run;
%mend printAvgM;
%printAvgM('01jan2003'd,'31jan2003'd);
EDIT2: Here is my code, step by step:
libname sur 'C:\myPath';
run;
proc import datafile="C:\myPath\myData.csv"
out=SUR.DANE
dbms=csv replace;
getnames=yes;
run;
proc sort data=sur.dane out=sur.dane;
by Date;
run;
libname sur1 "C:\myPath\myDB.accdb";
run;
proc datasets;
copy in=sur out=sur1;
select dane;
run;
data sur1.dane2;
set sur1.dane;
date2=datepart(Date);
format date2 WEEKV11.;
run;
The last step results in NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables. and the format of dane2 variable is DATETIME19..
Ok, it's small enough to handle easily then. I would recommend first converting your datetime variable to a date variable using DATEPART() function and then use a format within PROC MEANS. You can look up the WEEKU and WEEKV formats to see if they meet your needs. The code below should be enough to get you started. You could do the monthly without the date conversion, but I couldn't find a weekly format for the datetime variable.
*Fake data generated;
data fd;
start=datetime();
do i=1 to 3000000 by 120;
datetime=start+(i-1)*30;
var=rand('normal', 25, 5);
output;
end;
keep datetime var;
format datetime datetime21.;
run;
*Get date variable;
data fd_date;
set fd;
date_var = datepart(datetime);
date_month = put(date_var, yymon7.);
Date_week = put(date_var, weekv11.);
run;
*Monthly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_monthly mean(var)=avg_var std(var)=std_var;
format date_var monyy7.;
run;
*Weekly summary;
proc means data=fd_date noprint nway;
class date_var;
var var;
output out=want_weekly mean(var)=avg_var std(var)=std_var;
format date_var weekv11.;
run;
Alternatively, you can replace date_var in the CLASS statement with the new monthly and weekly variables (date_month and date_week). Because these are character variables, they won't sort chronologically.
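The monthly summary without the date conversion, followed by an export to a file, could be sketched as below. It assumes the fd dataset generated above and applies the DTMONYY7. datetime format directly; the output name want_monthly_dt and the CSV path are hypothetical.

```sas
*Monthly summary directly on the datetime variable;
proc means data=fd noprint nway;
  class datetime;
  var var;
  output out=want_monthly_dt mean(var)=avg_var std(var)=std_var;
  format datetime dtmonyy7.;   /* groups datetimes by month and year */
run;

*Export the result (hypothetical path);
proc export data=want_monthly_dt
  outfile="C:\myPath\monthly_avg.csv"
  dbms=csv replace;
run;
```

The same PROC EXPORT step should work for want_weekly by changing the DATA= and OUTFILE= values.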
I am looking to automate a daily report for my company, but I have run into a bit of trouble. The report gets updated only on the 2nd working day of each month. I found some code on the SAS website which works out what the 2nd working day of any month is.
data scdwrk;
/* advance date to the first day of the month using the INTNX function */
second=intnx('month',today(),0);
/* determine the day of the week using the WEEKDAY function */
day=weekday(second);
/* if day=Monday then advance by 1 */
if day=2 then second+1;
/* if day=Sunday then advance by 2 */
else if day=1 then second+2;
format second date9.;
run ;
I have also set a flag that compares today's date to the date generated by this piece of code.
I now need to find a way so that, if the code is run on the first working day of the month, it uses a particular set of macro date variables
%let start_date= &prevmnth;
%let end_date= &endprevmnth;
%let month= &prevyearmnth;
and then when it's run on the 2nd working day of the month it uses the other set of macro date variables (calendar month)
%let start_date= &currmnth;
%let end_date= &endcurrmnth;
%let month= &curryearmnth;
Any help on this would be greatly appreciated.
I have some recent code that does just this. Here is how I tackled it.
First, create a table of holidays. This can be maintained yearly.
Second, create a table with the first 5 days of the month that are not weekend days.
Third, delete holidays.
Finally, get the second value in the data set.
data holidays;
format holiday_date date9.;
informat holiday_date date9.;
input holiday_date;
datalines;
01JAN2015
19JAN2015
16FEB2015
03APR2015
25MAY2015
03JUL2015
07SEP2015
26NOV2015
25DEC2015
;
data _dates;
firstday = intnx('month',today(),0);
format firstday date date9.;
do date=firstday to firstday+5;
if 1 < weekday(date) < 7 then
output;
end;
run;
proc sql noprint;
delete from _dates
where date in (select holiday_date from holidays);
quit;
data _null_;
set _dates(firstobs=2);
call symput("secondWorkDay",put(date,date9.));
stop;
run;
%put &secondWorkDay;
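With &secondWorkDay available, switching between the two sets of macro variables from the question could be sketched as follows. This assumes that &prevmnth, &endprevmnth, &prevyearmnth, &currmnth, &endcurrmnth and &curryearmnth are already defined elsewhere, as in the question; the macro name set_report_dates is hypothetical.

```sas
%macro set_report_dates;
  %global start_date end_date month;
  %local today_num second_num;

  /* convert both dates to SAS date numbers so they compare as integers */
  %let today_num  = %sysfunc(today());
  %let second_num = %sysfunc(inputn(&secondWorkDay, date9.));

  %if &today_num < &second_num %then %do;
    /* before the 2nd working day: report on the previous month */
    %let start_date = &prevmnth;
    %let end_date   = &endprevmnth;
    %let month      = &prevyearmnth;
  %end;
  %else %do;
    /* on or after the 2nd working day: report on the current month */
    %let start_date = &currmnth;
    %let end_date   = &endcurrmnth;
    %let month      = &curryearmnth;
  %end;
%mend set_report_dates;

%set_report_dates;
%put &=start_date &=end_date &=month;
```

Comparing with < rather than testing for the first working day exactly means a run on a weekend or holiday before the 2nd working day also falls back to the previous month.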
Hi does anyone know how to calculate the standard deviation over the next four quarters for each quarter? Thanks :)
My attempt is below:
date1 is the sas date for the quarter in a year
Proc sql ; create table th.totalroll as
Select distinct permco, date1 ,
(select std(adjret) from th.returns1 where qtr between
intnx('quarter',qtr(date),0) and intnx('quarter', qtr(date),+3)) as
TOTALroll From th.returns1 group by permco ,date1;
QUIT;
It's hard to tell how close you are because I'm not entirely certain what your data looks like, but here's an example assuming you have more than one date in each quarter. Create sample data:
data have;
format date date9.;
do m = 1 to 128;
date = intnx('month','01JAN2008'd,m-1);
amount = round(ranuni(date)*10);
output;
end;
drop m;
run;
Using proc sql, create quarter variable (you might already have this variable?) and group by this variable. Use a having clause to restrict results to the first date of each quarter.
proc sql;
create table want as
select
yyq(year(t1.date),qtr(t1.date)) as quarter format=yyq.,
(select std(t2.amount)
from have t2
where t2.date >= yyq(year(t1.date),qtr(t1.date))
and t2.date < intnx('quarter',yyq(year(t1.date),qtr(t1.date)),4)) as stddev
from
have t1
group by
calculated quarter
having
t1.date = min(t1.date)
;
quit;
You should be able to adapt this to work for your data.
You can use proc expand if your dataset is already quarterly. So something like this:
proc expand data=th.returns1
out=th.totalroll
from=quarter
to=quarter;
by permco date1;
id date;
convert adjret=TOTALroll / transformout=( MOVSTD 4 );
run;
Don't forget to sort your data first. Note that MOVSTD gives you the backward moving standard deviation. You may need to shift the output series if you want the forward moving standard deviation.
Transformation Operations for proc expand:
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_expand_sect026.htm
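One way to get a forward-looking window is to chain a LEAD operation after MOVSTD. The sketch below rests on several assumptions: SAS/ETS is licensed, th.returns1 has one row per permco and quarter, and date1 is the quarterly SAS date. The dataset names returns_sorted and totalroll_fwd are illustrative.

```sas
proc sort data=th.returns1 out=returns_sorted;
  by permco date1;
run;

proc expand data=returns_sorted
            out=totalroll_fwd
            from=qtr
            method=none;        /* no interpolation, only the transformation */
  by permco;
  id date1;
  /* 4-quarter backward window, then pulled forward by 3 quarters,
     so each row covers the current quarter and the next three */
  convert adjret = TOTALroll / transformout=(movstd 4 lead 3);
run;
```

The last three quarters of each permco come out missing because no future values exist for them. Using lead 4 instead would cover the four quarters strictly after the current one.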
I have a SAS dataset similar to the one created here.
data have;
input date :date. count;
cards;
20APR2012 10
20APR2012 20
20APR2012 20
27APR2012 15
27APR2012 5
;
run;
proc sort data=have;
by date;
run;
I want to create a column containing the sum for each date, so it would look like
date total
20APR2012 50
27APR2012 20
I have tried using first. but I think my syntax is off. Thanks.
This is what proc means is for.
proc means data=have nway; /* nway keeps only the per-date rows in the output dataset */
class date;
var count;
output out=want sum=total;
run;
The code below works to give you your desired result.
proc sql;
create table wanted_tab as
select
date format date9.,
sum(count) as Total
from have
group by date;
quit;
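Since the question mentions trying first., here is a minimal sketch of the DATA step version. It assumes have is sorted by date, as in the question; the output name want_datastep is illustrative.

```sas
data want_datastep;
  set have;
  by date;
  if first.date then total = 0;  /* reset the running sum at the start of each date */
  total + count;                 /* sum statement: total is retained across rows */
  if last.date then output;      /* write one row per date */
  keep date total;
  format date date9.;
run;
```

For the sample data this should produce one row per date with totals of 50 and 20.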