We have code that we use to create quarterly reports of projects. One piece of it, a do loop, takes the startdate and enddate of each project in our dataset and creates an observation for each month and year in which the project took place. For example, if we have a project called "Employment Help" with a startdate value of 01JAN2022 and an enddate value of 01APR2022, the do loop will create 4 observations for this project with the month and year values 1 2022, 2 2022, 3 2022, and 4 2022. We use this to count how many projects happened during our quarters. We are running into an issue where the do loop drops some projects without giving them a month or year value, so we lose those projects in our count. The dates are all in the same format.
Here is an example of some data that is pulled in. EXAMPLE 2 is properly pulled into the do loop; EXAMPLE 1 does not get pulled through.
Here is the code:
data test2;
set users3;
do i = 0 to (year(enddate)-year(startdate));
year = year(startdate)+i;
end;
do i = 0 to (month(enddate)-month(startdate));
month = month(startdate)+i;
drop i;
output;
end;
run;
Consider the following example:
data have;
input project$ startdate:date9. enddate:date9.;
format startdate enddate date9.;
datalines;
A 01JAN2022 01APR2022
B 01MAR2022 01JUN2022
C 01NOV2022 01JAN2023
;
run;
The third row produces no output because the difference between the end month number and the start month number is negative (1 - 11 = -10), so the loop do i = 0 to -10 never executes. Instead of using two loops, one for year and one for month, use a single loop over all of the months from the start date. Use intnx() to generate the months with startdate as the reference month; i is the offset of each month from the start date. For example:
code                               output
intnx('month', '01JAN2022'd, 0)    01JAN2022
intnx('month', '01JAN2022'd, 1)    01FEB2022
intnx('month', '01JAN2022'd, 2)    01MAR2022
Since you're incrementing by exactly one month for each date, you can get the year and month number in a single loop.
data want;
set have;
do i = 0 to intck('month', startdate, enddate);
month = month(intnx('month', startdate, i) );
year = year(intnx('month', startdate, i) );
output;
end;
drop i;
run;
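Since the original goal is counting projects per quarter, here is a minimal sketch of how the single-loop output could feed that count. It assumes a project counts toward every quarter in which it has at least one month; the data set names MONTHS and QUARTER_COUNTS are illustrative.

```sas
data have;
  input project $ startdate :date9. enddate :date9.;
  format startdate enddate date9.;
  datalines;
A 01JAN2022 01APR2022
B 01MAR2022 01JUN2022
C 01NOV2022 01JAN2023
;
run;

/* One row per project-month, using the single INTCK/INTNX loop */
data months;
  set have;
  do i = 0 to intck('month', startdate, enddate);
    month_start = intnx('month', startdate, i);  /* first day of the i-th month */
    year    = year(month_start);
    quarter = qtr(month_start);
    output;
  end;
  format month_start date9.;
  drop i;
run;

/* Count the distinct projects active in each quarter */
proc sql;
  create table quarter_counts as
  select year, quarter, count(distinct project) as n_projects
  from months
  group by year, quarter
  order by year, quarter;
quit;
```

With the sample rows, project C now contributes to 2022 Q4 and 2023 Q1 instead of being dropped.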
Your code doesn't seem to handle crossing years, i.e. a project that starts in 2021 and ends in 2022.
This should get you closer.
data have;
input startdate : date9. enddate : date9.;
format startdate enddate date9.;
cards;
01Jan2022 01Apr2022
01Sep2021 01Apr2022
;;;
run;
data want;
set have;
nmonths = intck('month', startdate, enddate) +1 ;
date = startdate;
do i = 1 to nmonths;
month = month(date);
year = year(date);
date = intnx('month', startdate, i, 'b');
output;
end;
run;
I have time series data. It looks like the following:
date variable
01-Dec-2012 0.1
02-Dec-2012 0.1
03-Dec-2012 0.1
04-Dec-2012 0.1
05-Dec-2012 0.1
...
20-Dec-2012 0.1
21-Dec-2012 0.1
22-Dec-2012 0.1
I want to create a dummy variable that equals 1 if the date is in December and on or before the second Thursday. It equals 0 if the date is in December and after the second Thursday. It is missing if month(date) ^= 12.
Can anyone show me how to identify the second Thursday of December and solve this problem, please?
NWKDOM
Third Friday in a month, where the month/year are extracted from a SAS date:
Friday3 = NWKDOM(3, 6, month(sas_date), year( sas_date));
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p1kdveu0ry8ltxn1m3um2ntxs7d5.htm
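Applied to the question, a minimal sketch might look like this. It assumes SAS 9.3 or later (for NWKDOM); weekday 5 is Thursday and the first argument, 2, asks for the second occurrence. The input data set is made up for illustration: every day from 25NOV2012 to 31DEC2012, so the November rows show the missing case.

```sas
/* Illustrative input: every day from 25NOV2012 to 31DEC2012 */
data have;
  format date date11.;
  do date = '25NOV2012'd to '31DEC2012'd;
    output;
  end;
run;

data want;
  set have;
  if month(date) = 12 then do;
    /* 2nd occurrence (n=2) of Thursday (weekday=5) in December of that year */
    second_thu = nwkdom(2, 5, 12, year(date));
    dummy = (date <= second_thu);
  end;
  else dummy = .;   /* not December */
  format second_thu date11.;
run;
```

In December 2012 the second Thursday is 13DEC2012, so DUMMY is 1 from 01DEC through 13DEC and 0 afterwards.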
Here's another approach for people who don't have SAS 9.3+ and can't use nwkdom to do this:
Dummy = intck('week.5',intnx('month',date,0)-1,date-1) < 2;
How this works, from the inside working outwards:
intnx is used to find the first day of the month.
Subtract 1 to get the last day of the previous month.
Subtract 1 from date to get yesterday's date.
Using intck, count the number of Thursdays (week.5) in between these two dates. N.B. this includes yesterday if it was a Thursday, but not the last day of the previous month if that was a Thursday.
If this number is less than 2, date is currently less than or equal to the second Thursday of the month.
Sample usage:
data _null_;
do date = '01dec2011'd to '30dec2011'd;
Dummy = intck('week.5',intnx('month',date,0)-1,date-1) < 2;
put date weekdate. +1 dummy;
end;
run;
EDIT: now works correctly when the first day of the month is a Thursday.
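One way to check that this INTCK expression agrees with NWKDOM is to compare the two over many Decembers, including ones that start on a Thursday. This sketch assumes SAS 9.3 or later (for NWKDOM); the year range 2000 to 2030 is arbitrary.

```sas
/* Compare the INTCK approach with NWKDOM for every December day, 2000-2030 */
data check;
  do yr = 2000 to 2030;
    do date = mdy(12, 1, yr) to mdy(12, 31, yr);
      /* fewer than 2 Thursdays between the 1st of the month and yesterday */
      dummy_intck  = intck('week.5', intnx('month', date, 0) - 1, date - 1) < 2;
      /* on or before the 2nd Thursday (weekday 5) of December */
      dummy_nwkdom = (date <= nwkdom(2, 5, 12, yr));
      mismatch = (dummy_intck ne dummy_nwkdom);
      output;
    end;
  end;
  format date date9.;
run;

/* Every row should have MISMATCH = 0 */
proc freq data=check;
  tables mismatch / nocum;
run;
```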
I think this will solve your problem. I have a feeling there is a nicer solution, but it should work.
data YourData;
format date date9. ;
do i=1 to 100 ;
date=intnx('day', '17oct03'd,i);
var=rand('uniform');
output;
end;
drop i;
run;
Data Find;
set YourData;
Month=month(date);
day=day(date);
Weekday=WEEKDAY(date);
/* weekday=5 is Thursday */
if weekday=5 and month=12 then flag=1;
/* start the Thursday count again in each new year (assumes the data is sorted by date) */
if year(date) ne lag(year(date)) then flag2=0;
/* flag2 retains the running count of December Thursdays */
flag2+flag;
if month=12 and flag2 < 2 then Dummy=1;
else if month=12 and flag2=2 and flag=1 then Dummy=1;
else if month=12 then Dummy=0;
else Dummy=.;
run;
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month; e.g., for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the 5 smallest and 5 largest day values, you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join (select distinct year, month, day from DAYS) b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with the name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Create some data with the desired structure:
Data inData (drop=_:); * forget all variables starting with an underscore;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month:
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days:
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 weekdays and last 5 weekdays.
You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test whether it is within the first 5 weekdays or the last 5 weekdays of the month, you could do something like this:
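/* start from the last day of the previous month: INTNX('weekday',d,0) maps a weekend d back to the preceding Friday */
first5 = intnx('weekday',intnx('month',today,0,'B')-1,1) <= today
<= intnx('weekday',intnx('month',today,0,'B')-1,5) ;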
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include weekends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.
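A self-contained sketch that applies the weekday test to every weekday of 2008. It starts the first-5 range from the last day of the previous month because INTNX('weekday', d, 0) maps a Saturday or Sunday back to the preceding Friday, which would otherwise pull the start of a weekend-starting month (e.g. March 2008) back into the previous month. Holidays are not handled, and the data set names FLAGS and COUNTS are illustrative.

```sas
/* Flag the first 5 and last 5 weekdays of every month of 2008 (weekends skipped, holidays ignored) */
data flags;
  do today = '01JAN2008'd to '31DEC2008'd;
    if weekday(today) not in (1, 7) then do;   /* 1 = Sunday, 7 = Saturday */
      bom = intnx('month', today, 0, 'B');     /* first day of the month */
      eom = intnx('month', today, 0, 'E');     /* last day of the month  */
      first5 = intnx('weekday', bom - 1, 1) <= today <= intnx('weekday', bom - 1, 5);
      last5  = intnx('weekday', eom, -4)    <= today <= intnx('weekday', eom, 0);
      month  = month(today);
      output;
    end;
  end;
  format today bom eom date9.;
run;

/* Each month should show first5 = 5 and last5 = 5 */
proc summary data=flags nway;
  class month;
  var first5 last5;
  output out=counts(drop=_type_ _freq_) sum=;
run;

proc print data=counts noobs;
run;
```

To keep only the wanted rows of the real data, the same two expressions could be followed by a subsetting if first5 or last5; statement.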
Lets suppose we have the following table ("Purchases"):
Date Units_Sold Brand Year
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
And the same logic continues until the end of 2014, for different brands.
What I want to do is calculate the number of Units_Sold for every Brand, in each year. However, I don't want to do it for the calendar year, but for the actual year.
So an example of what I don't want:
proc sql;
create table Dont_Want as
select Year, Brand, sum(Units_Sold) as Unit_per_Year
from Purchases
group by Year, Brand;
quit;
The above logic is ok if we know that e.g. Brand "A" exists throughout the whole 2010. But if Brand "A" appeared on 18/03/2010 for the first time, and exists until now, then a comparison of Years 2010 and 2011 would not be good enough as for 2010 we are "lacking" 3 months.
So what I want to do is calculate:
for A: the sum from 18/03/2010 until 17/03/2011, then from 18/03/2011 until 17/03/2012, etc.
for B: the sum from 12/07/2010 until 11/07/2011, etc.
and so on for all Brands.
Is there a smart way of doing this?
Step 1: Make sure your dataset is sorted or indexed by Brand and Date
proc sort data=have;
by brand date;
run;
Step 2: Calculate the start/end dates for each product
The idea behind the below code:
We know that the first occurrence of the brand in the sorted dataset is the day in which the brand was introduced. We'll call this Product_Year_Start.
The intnx function with the 'S' (same day) alignment can be used to move that date forward by one year; subtracting 1 from the result gives the last day of the product year. Let's call this date Product_Year_End.
Since we now know the product's year end date, we know that if the date on any given row exceeds the product's year end date, we have started the next product year. We'll just take the calculated Product_Year_End and Product_Year_Start for that brand and bump them up by one year.
This is all achieved using by-group processing and the retain statement.
data Comparison_Dates;
set have;
by brand date;
retain Product_Year_Start Product_Year_End;
if(first.brand) then do;
Product_Year_Start = date;
Product_Year_End = intnx('year', date, 1, 'S') - 1;
end;
/* DO WHILE rather than IF, in case a brand has a gap of more than one year between sales */
do while(Date > Product_Year_End);
Product_Year_Start = intnx('year', Product_Year_Start, 1, 'S');
Product_Year_End = intnx('year', Product_Year_End, 1, 'S');
end;
format Product_Year_Start Product_Year_End date9.;
run;
Step 3: Using the original SQL code, group instead by the new product start/end dates
proc sql;
create table want as
select catt(year(Product_Year_Start), '-', year(Product_Year_End) ) as Product_Year
, Brand
, sum(Units_Sold) as Unit_per_Year
from Comparison_Dates
group by Brand, calculated Product_Year
order by Brand, calculated Product_Year;
quit;
The following code does what you ask in a literal sense: starting from the earliest 'date' of each 'brand', it accumulates 'units_sold'; when it hits the 365-day mark, it outputs the total, resets the count, and starts another cycle.
data have;
informat date ddmmyy10.;
input date units_sold brand $ year;
format date date9.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;
proc sort data=have;
by brand date;
run;
data want;
do until (last.brand);
set have;
by brand date;
if first.brand then
do;
Sales_Over_365=0;
_end=intnx('day',date,365);
end;
if date <= _end then
Sales_Over_365+units_sold;
else
do;
output;
Sales_Over_365=units_sold;
_end=intnx('day',date,365);
end;
end;
output;
drop _end;
run;
You need to have a start date for each brand. For now we can use the first sale date, but that might not be what you want. Then you can classify each sales date into which year it is for that brand.
Let's start by creating a dataset from your sample data. The YEAR variable is not needed.
data have ;
input Date Units_Sold Brand $ Year ;
informat date ddmmyy10.;
format date yymmdd10.;
cards;
18/03/2010 5 A 2010
12/04/2010 2 A 2010
22/05/2010 1 A 2010
25/05/2010 7 A 2010
11/08/2011 5 A 2011
12/07/2010 2 B 2010
22/10/2010 1 B 2010
05/05/2011 7 B 2011
;;;;
Now we can get the answer you want with an SQL query.
proc sql ;
create table want as
select brand
, start_date
, 1+floor((date - start_date)/365) as sales_year
, intnx('year',start_date,calculated sales_year -1,'same')
as start_sales_year format=yymmdd10.
, sum(units_sold) as total_units_sold
from
( select brand
, min(date) as start_date format=yymmdd10.
, date
, units_sold
from have
group by 1
)
group by 1,2,3,4
;
quit;
This will produce this result:
Brand  start_date  sales_year  start_sales_year  total_units_sold
A      2010-03-18  1           2010-03-18        15
A      2010-03-18  2           2011-03-18        5
B      2010-07-12  1           2010-07-12        10
There is no straightforward way of doing it. You can do something like this.
To test the code, I saved your table into a text file.
Then I created a class called Sale.
public class Sale
{
public DateTime Date { get; set; }
public int UnitsSold { get; set; }
public string Brand { get; set; }
public int Year { get; set; }
}
Then I populated a List<Sale> using the saved text file.
var lines = File.ReadAllLines(@"C:\Users\kosala\Documents\data.text");
var validLines = lines.Where(l => !l.Contains("Date")).ToList();//remove the first line.
List<Sale> sales = validLines.Select(l => new Sale()
{
Date = DateTime.Parse(l.Substring(0,10)),
UnitsSold = int.Parse(l.Substring(26,5)),
Brand = l.Substring(46,1),
Year = int.Parse(l.Substring(56,4)),
}).ToList();
//All the above code is for testing purposes. The actual code starts from here.
var totalUnitsSold = sales.OrderBy(s => s.Date).GroupBy(s => s.Brand);
foreach (var soldUnit in totalUnitsSold)
{
DateTime? minDate = null;
DateTime? maxDate = null;
int total = 0;
string brand = "";
foreach (var sale in soldUnit)
{
brand = sale.Brand;
if (minDate == null)
{
minDate = sale.Date;
}
if ((sale.Date - minDate).Value.Days <= 365)
{
maxDate = sale.Date;
total += sale.UnitsSold;
}
else
{
break;
}
}
Console.WriteLine("Brand : {0} UnitsSold Between {1} - {2} is {3}",brand, minDate.Value, maxDate.Value, total);
}
I have the dataset below. I need to find the week number of each date based on the financial year (e.g. April 2013 to March 2014). For example, 01AprXXX should be the 0th or 1st week of the year, and the last week of March in the following year should be week 52/53. I have tried a way to find this (the code is below as well).
I am just curious to know if there is a better way to do this in SAS. Thanks in advance. Please let me know if this question is redundant; in that case I will delete it as soon as possible, although I searched for the concept and didn't find anything.
Also, my apologies for my English; it may not be grammatically correct, but I hope I am able to convey my point.
DATA
data dsn;
format date date9.;
input date date9.;
cards;
01Nov2015
08Sep2013
06Feb2011
09Mar2004
31Mar2009
01Apr2007
;
run;
CODE
data dsn2;
set dsn;
week_normal = week(date);
dat2 = input(compress("01Apr"||year(date)),date9.);
week_temp = week(dat2);
format dat2 date9.;
x1 = month(input(compress('01Jan'||(year(date)+1)),date9.)) ;***lower cutoff;
x2 = month(input(compress("31mar"||year(date)),date9.)); ***upper cutoff;
x3 = week(input(compress("31dec"||(year(date)-1)),date9.)); ***final week value for the previous year;
if month(dat2) <= month(date) <= month(input(compress("31dec"||year(date)),date9.)) then week_f = week(date) - week_temp;
else if x2 >= month(date) >= x1 then week_f = week_normal + x3 - week(input(compress("31mar"||(year(date)+1)),date9.)) ;
run;
INTCK and INTNX are your best bets here. You could use them as I do below, or you could use the advanced functionality of defining your own interval type (fiscal year); that's described in more detail in the documentation.
data dsn2;
set dsn;
week_normal = week(date);
dat2 = intnx('month12.4',date,0); *12 month periods, starting at month 4 (so 01APR), go to the start of the current one;
week_F = intck('week',dat2,date); *Weeks start on Sunday by default (same as 'week.1'). Use a shift index to change the start day, for example 'week.2' starts weeks on Monday;
format dat2 date9.;
run;
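To see what this produces on the question's sample dates, here is a self-contained sketch. WEEK_F is 0-based (01APR itself is week 0); WEEK_F1 is a hypothetical 1-based variant added for illustration, not part of the answer above.

```sas
data dsn;
  format date date9.;
  input date date9.;
  cards;
01Nov2015
08Sep2013
06Feb2011
09Mar2004
31Mar2009
01Apr2007
;
run;

data fiscal_weeks;
  set dsn;
  fy_start = intnx('month12.4', date, 0);    /* the 01APR on or before DATE */
  week_f   = intck('week', fy_start, date);  /* Sundays crossed since 01APR, 0-based */
  week_f1  = week_f + 1;                     /* hypothetical 1-based variant */
  format fy_start date9.;
run;
```

For example, 31MAR2009 falls in the fiscal year starting 01APR2008 and gets WEEK_F = 52, while 01APR2007 gets WEEK_F = 0.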