Quickest way to fill in missing dates in a sequence? SAS - sas

Say you download data for stocks or bonds, you have the stock price or yield for every trading day. So you have two variables, stock price (or yield if bond) and date. What is the quickest way to add weekends and holidays to the dates variable while using the previous open day as the values for those missing days?
For example, if it were July 1, 2022 there would be a stock price, lets say $100, corresponding to that date, but during the long weekend (4th of July) there are no observations in the data with the date being July 2nd through 4th. How do you add those dates with the stock price equaling $100 until the next trading day, July 5th?
I used a do loop to create the dates then merged and retain, but I feel like theres got to be a quicker method

You could just add an OUTPUT statement in a DO loop. The tricky part is getting the next date. Here is a method using a second SET statement that is offset by one observation.
data want;
set have ;
by date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
next_date = coalesce(next_date,date);
do date=date to next_date;
output;
end;
run;
But your real data probably has multiple stocks. So add some BY group processing.
data want;
set have ;
by stock date;
set have(firstobs=2 keep=date rename=(date=next_date)) have(obs=0 drop=_all_);
if last.stock the next_date=date;
do date=date to next_date;
output;
end;
run;

Related

Dates and intervals in SAS

If I have two dates e.g. 22MAR2005 to 01MAR2006 and I want to create season intervals (spring, summer, autumn, winter) based on this interval, how can this be done in a data step?
Season's are defined as:
Spring: March to May
Summer: June to August
Autumn: September to November
Winter: December to February
I need to calculate how long they spent in each season.
You need to convert that to MONTH first and then to SEASON. This exact question was asked recently so it's relatively easy to find via search (I'm assuming some course is using this as homework?).
data want;
set have;
*create month;
month_date = month(date);
*assign to season;
if month_date in (6, 7, 8) then season = 'Summer';
else if month_date in (9, 10, 11) then season ='Fall';
....etc;
run;
You could also use a format but since you're just starting out this is likely easier.
Other users question which seems really really similar:
https://communities.sas.com/t5/SAS-Enterprise-Guide/proc-glm/m-p/492142
EDIT: Use INTCK() to calculate the number of intervals.
Then use INTNX to increment across the intervals and count your days.
How you align your dates can be controlled with the first parameter to the INTNX() function. You can check the documentation for the exact specifics.
data want;
start_date="22Mar2005"d;
end_date="01Mar2006"d;
num_intervals=intck('quarter', start_date, end_date, 'C') ;
do interval=0 to num_intervals;
season_start=intnx('Month3.3', start_date, interval, 'b');
season_end=intnx('Month3.3', start_date, interval, 'e');
Number_Days=season_end - season_start + 1;
output;
end;
format start_date end_date season: yymmddd10.;
run;

How can I pull certain records and update data in SAS?

I have a table of customer purchases. The goal is to be able to pull summary statistics on the last 20 purchases for each customer and update them as each new order comes in. What is the best way to do this? Do I need to a table for each customer? Keep in mind there are over 500 customers. Thanks.
This is asked at a high level, so I'll answer it at that level. If you want more detailed help, you'll want to give more detailed information, and make an attempt to solve the problem yourself.
In SAS, you have the BY statement available in every PROC or DATA step, as well as the CLASS statement, available in most PROCs. These both are useful for doing data analysis at a level below global. For many basic uses they give a similar result, although not in all cases; look up the particular PROC you're using to do your analysis for more detailed information.
Presumably, you'd create one table containing your most twenty recent records per customer, or even one view (a view is like a table, except it's not written to disk), and then run your analysis PROC BY your customer ID variable. If you set it up as a view, you don't even have to rerun that part - you can create a permanent view pointing to your constantly updating data, and the subsetting to last 20 records will happen every time you run the analysis PROC.
Yes, You can either add a Rank to your existing table or create another table containing the last 20 purchases for each customer.
My recommendation is to use a datasetp to select the top20 purchasers per customer then do your summary statistics. My Code below will create a table called "WANT" with the top 20 and a rank field.
Sample Data:
data have;
input id $ purchase_date amount;
informat purchase_date datetime19.;
format purchase_date datetime19.;
datalines;
cust01 21dec2017:12:12:30 234.57
cust01 23dec2017:12:12:30 2.88
cust01 24dec2017:12:12:30 4.99
cust02 21nov2017:12:12:30 34.5
cust02 23nov2017:12:12:30 12.6
cust02 24nov2017:12:12:30 14.01
;
run;
Sort Data in Descending order by ID and Date:
proc sort data=have ;
by id descending purchase_date ;
run;
Select Top 2: Change my 2 to 20 in your case
/*Top 2*/
%let top=2;
data want (where=(Rank ne .));
set have;
by id;
retain i;
/*reset counter for top */
if first.id then do; i=1; end;
if i <= &top then do; Rank= &top+1-i; output; i=i+1;end;
drop i;
run;
Output: Last 2 Customer Purchases:
id=cust01 purchase_date=24DEC2017:12:12:30 amount=4.99 Rank=2
id=cust01 purchase_date=23DEC2017:12:12:30 amount=2.88 Rank=1
id=cust02 purchase_date=24NOV2017:12:12:30 amount=14.01 Rank=2
id=cust02 purchase_date=23NOV2017:12:12:30 amount=12.6 Rank=1

Counting working days in SAS EG

Hi to all and good time of a day!
Here is my case I need to solve I will very gratefull if you can help me.
I have some data set it contains only one variable date format.
Example:
01JAN2016
06JAN2016
15FEB2016
The second data set is days - holidays for a period 5 years.
Example:
01JAN2016
02JAN2016
and etc, all these days are not working days.
The case is I need to count number of working days from date for every observation from first data set till now. It seems that I need to count number of days
"Now date" minus Date(from first data set) and minus number of days from second data set with holidays (count(date) where Date(from first data set)< date < "Now"
You can define your own type of interval to use with SAS funcions intck and intnx. Here's how to do it:
First create a table of weekdays for whichever years you have holidays for, up to present (or a future) year.
Here we'll start by including all weekdays from 2014 to 2016. This is assuming you don't want to count weekend days. If that's not the case, just modify the code so that the condition "weekday(date) in (2:6)" is not applied. You'll get the full 365 days of the year.
data mon_fri;
do date = "01JAN2014"d to "31DEC2016"d;
if weekday(date) in (2:6) then output;
end;
format date date9.;
run;
Then we'll create a table having all those dates we just created, minus the holidays we have in the table Holidays. We'll place the table in a library called myLib, and rename the date column to "Begin" for compliance with SAS custom intervals.
libname myLib "some/place/on/your/drive";
data mylib.workdays(RENAME=(date=Begin));
merge mon_fri (in=weekday)
Holidays (in=holiday);
by date;
if weekday and not holiday then output;
run;
Now we set up a custom interval which we'll simply call "workdays".
options intervalds=(workdays=mylib.workdays);
From there, all you have left to do is something like this:
data dateCalculations;
set mydata;
numOfDays = intck("workdays", theDate, today());
run;
SAS will take care of counting the number of dates (lines in the workdays dataset) separating the startdate (column called theDate) from the enddate (today's date).
Et voilĂ !
This is wonderful and very helpful. I use two different SAS systems (both on remote Unix servers). Setting the intervalds option only seems to work on one of them. I copy/paste the same code and on the other nothing happens - no warning, no error, it simply doesn't work.
Here is how I'm setting it (download the CSV from Yahoo! Finance for the S&P500, daily data, starting January 1950):
PROC IMPORT DATAFILE="sp500_1950_2016.csv"
OUT=sp500_1950_2016
DBMS=DLM
REPLACE;
delimiter=',';
getnames=yes;
RUN;
data trading_days;
set sp500_1950_2016 (keep = date rename=(date=begin));
where year(begin) < 2017;
run;
options intervalds=(TradingDay=trading_days) ;
Then I call it like so to count number of observations I should have from fund inception to Dec 31, 2016 or when the fund closed, whichever is sooner:
data ops2; set operations_master; where ~missing(inception);
if missing(enddate) then enddate = '31dec2016'd;
datadays = INTCK('TradingDay',inception,enddate);run;
proc univariate; var datadays;run;quit;
On system 1, this works just fine. On system 2, I get 0 for the variable datadays. I've already checked to see if there is a sys admin override on setting the intervalds option, and there is not. Is there another reason why this might not work on a given system?

SAS Get week number for each month

I'm new to SAS and have been trying to figure out how to get the week number for each month. I'm having an issue with the months where they don't start at the beginning of the week. For example, if I have a month where the data from the 1st of the month falls on a Thursday, it shows the 1st and 2nd of that month as week 0. Is there a way to display those weeks as week 1? I've tried different things and have been unsuccessful.
DATA getweek;
set test;
if year(rundt) ne year(today())then delete;
month = month(rundt);
week1=intck('week',intnx('month',rundt,0),rundt);
format rundt MMDDYY8.;
RUN;
This depends largely on how you want to define week number. If the first of the month falls on a wednesday, what week is that week? Is that week one, or week zero? It can be commonly referred to either way.
If you want one, so Thurs/Fri/Sat are week 1, then the 4th is week 1, then just add one to the result - it's consistently going to be off (down) by one, after all.
data test;
do rundt=19805 to 19904;
day_of_month=day(rundt);
output;
end;
format rundt date9.;
run;
DATA getweek;
set test;
if year(rundt) ne year(today())then delete;
month = month(rundt);
week1=intck('week',intnx('month',rundt,0),rundt)+1;
format rundt MMDDYY8.;
RUN;
Now, if you want the first 9 or 10 days to be week 1, you have a different solution (though I don't recommend this). Use min to set it to minimum of one. This means your first week could be as many as 13 days, which is counterintuitive, but if it's what you need, this is how you do it.
DATA getweek;
set test;
if year(rundt) ne year(today())then delete;
month = month(rundt);
week1=min(intck('week',intnx('month',rundt,0),rundt),1);
format rundt MMDDYY8.;
RUN;

week function giving strange result

using the week function to clean some data and eventually will order the weeks. I used week() on the date 8/26/2011 and I got 34, and when the function inserted the date 01/13/2012 it spit out 2. I thouhgt I was getting number of weeks since jan 1, 1960?
As per the WEEK Function documentation, the default U descriptor specifies the number of the week within the year, with Sunday being deemed the 1st day of the week. (You can use V if you want Monday to be considered the 1st day instead.)
The week function calculates the week of the current year. The answer to the implied question, "how do I calculated the number of days since 1/1/1960 [or some arbitrary date]," is the intck function.
data have;
input datevar date9.;
datalines;
01JAN1960
02JAN2013
13JAN2012
26AUG2011
;;;;
run;
data want;
set have;
wks = intck('week',0,datevar); *# of weeks from 0 to datevar [0=1/1/1960].
*Can replace 0 with any other date variable.;
run;