Data set where one date var precedes a different date var - sas

I have merged two datasets. One data set has the date a project was submitted and the other has when a project was ended. I want to have a new dataset that has only the projects where the ending date is before the submission date. I'm basically trying to identify where projects are being properly closed out before we submit them for outside reviews. Both date variables are date9. formats.
The data looks something like this (edit: there are no missing dates)
Service Submission_date End_date
1 1/1/2010 2/1/2009
2 2/1/2010 12/31/2010
3 5/1/2012 3/1/2010
I used a simple where statement but I am not still seeing incorrect dates. I used code like this:
data correctsubmission;
set projects;
where end_date < submit_date;
run;
Any ideas?

Make sure your variables actually contain dates.
data have;
input Service Submission_date End_date ;
informat Submission_date End_date mmddyy.;
format Submission_date End_date yymmdd10.;
cards;
1 1/1/2010 2/1/2009
2 2/1/2010 12/31/2010
3 5/1/2012 3/1/2010
;
They should be numeric variables that contain the number of days since 1960. Preferable with a date format (like DATE, YYMMDD, etc) so that humans can read the displayed value.
Also make sure to account for missing values.
data want;
set have;
where .Z < end_date < submission_date;
run;
Or reverse the test.
data want ;
set have;
where Submission_date > End_date ;
run;

Related

Extremely New to SAS

I am new to SAS and I am struggling struggling with my code. I would love some help. Am I thinking about this the right way? I have a huge table and I want to extract that data from certain dates. My two dates: 1969-12-01 and 1948-01-01 my sample code:
data null;
call symput ('timenow',put (time(),time.));
call symput ('datenow',put (date(),date9.));
run;
title "The current time is timenow and the date is datenow";
proc print data=sashelp.buy;
run;
So first learn about your dataset. So for example run PROC CONTENTS.
proc contents data=sashelp.buy; run;
Which will show you that there is variable named DATE that has date values (number of days since 1960).
So to reference a specific date use a date literal. That is a quoted string in the style that the DATE informat can read followed by the letter D. You can then use a WHERE statement to filter the data.
data want;
set sashelp.buy;
where date = '31dec1969'd ;
run;
Which will not find any observations since that date does not appear in that dataset.
If you want to select for multiple dates you could either add more conditions using OR.
where (date = '31dec1969'd) or (date = '01jan1948'd);
You can also use the IN operator:
where date in ('31dec1969'd '01jan1948'd);
Note that if your variable contains datetime values (number of seconds) then to pick a specific date you would either need to use a range of datetime literals:
where datetime between '31dec1969:00:00'dt and '31dec1969:11:59:59'dt);
Or convert the number of seconds into number of days and compare to the date literal.
where datepart(datetime) = '31dec1969'd ;
Welcome to StackOverflow Sportsguy3090.
Here I make a dataset called sample with some sample dates. That dataset has a variable called name and another variable called date. Internally, SAS stores dates as the number of days until or after January 1st 1970. That is rough to look at. So I use the format statement to have the dates appear as a 10 character string with month/day/year.
data sample;
name = "Abe "; date = "01Dec1969"d; output;
name = "Betty"; date = "01Jan1948"d; output;
name = "Carl"; date = "06Jun1960"d; output;
name = "Doug"; date = "06Dec1969"d; output;
name = "Ed"; date = "01Jan1947"d; output;
format date mmddyy10.;
run;
The code below subsets the data and puts the good records into a new dataset called keepers. It only keeps the records that are in the date range (including the limit dates).
data keepers;
set sample;
where date between "01jan1948"d and "01Dec1969"d;
run;
I hope that helps.... if not send up another flare.

I want to extract month data from a datetime format column in SAS

I have a data in format 01 Jan 19.00.00 (datetime), and I want to extract the month name only from it.
Tried the below code but getting output in numbers i.e. 1 2 3 and so on. I want the output either in Jan Feb mar format or January February march format.
data want;
set detail;
month = month(datepart(BEGIN_DATE_TIME));
run;
You can use the MONNAME format.
data test;
dt = datetime();
monname = put(datepart(dt),MONNAME.);
put monname=;
run;
If you want "OCT" not "OCTOBER" you can add a 3 to the format (MONNAME3.).
If you are using the value in a report the better approach might be to use a date value formatted with MONNAME.
The values of a date formatted variable will be ordered properly when the variable is used in a CLASS or BY statement. If you had instead computed a new variable as the month name, the default ordering of values would be alphabetical.
data want;
set have;
begin_date = datepart(BEGIN_DATE_TIME);
format begin_date MONNAME3.;
run;

Input date format YYYYMMDD

I need to input dates in YYYYMMDD format and create macro variables from these dates to use in a WHERE clause. The FINAL dataset should select one record from Sales but 0 observations are returned.
data work.FiscalYear2019;
input #1 fiscalYear $4. #5 StartDate mmddyy8.;
retain diff;
if fiscalYear = '2019' then do;
tday = today();
diff = tday - StartDate;
call symputx('FYTD_days',diff);
call symputx('CY_StartDate', StartDate);
call symputx('CY_EndDate', put(today(),mmddyy8.));
end;
else if fiscalYear = '2018' then do;
PY_EndDate = StartDate + diff;
call symput('PY_EndDate', put(PY_EndDate,mmddyy8.));
call symput('PY_StartDate', put(StartDate,mmddyy8.));
end;
datalines;
201912312018
201801012018
;
data work.Sales;
input #1 fiscalYear $4. #5 orderDate mmddyy8.;
format orderDate mmddyy6.;
datalines;
201902042019
201801012018
;
data final (WHERE=(orderDate >= &PY_StartDate AND
orderDate <= &PY_EndDate));
set Sales;
run;
I expect the FINAL dataset to contain one record from the Sales dataset but FINAL has 0 observations.
To use your macro variables as date values you need to either generate the macro variables as the raw number of days values, like you did with CY_StartDate, or generate them using the DATE format and enclose them in quotes and append the letter D to make a date literal.
Like this:
call symputX('PY_StartDate', put(StartDate,date9.));
call symputX('PY_EndDate', PY_EndDate);
...
data final
set Sales;
WHERE orderDate >= "&PY_StartDate"d
AND orderDate <= &PY_EndDate
;
run;
Also your subject line mentions YYYYMMDD informat and it does not appear in your code. Are you interpreting your source data properly? Does 201801012018 represent an 8 digit date in YMD order plus a four digit year? Or a four digit year plus an 8 digit date in MDY order?
You are just not referring to your macro variables in the last data step with the proper syntax. Those &PY_StartDate and &PY_EndDate variables are just strings after macro code is compiled, and you need to refer to them as date constants. So this should fix the issue:
data final (WHERE=(orderDate >= "&PY_StartDate"d AND
orderDate <= "&PY_EndDate"d));
set Sales;
run;
In the future, I recommend including options mprint; at the start of your code. This option displays the text generated by macro execution in your log, which can help greatly with debugging macros.

Date missing in dataset

I created below SAS code to pull the data for particular a date.
%let date =2016-12-31;
proc sql;
connect to teradata as tera ( user=testuser password=testpass );
create table new as select * from connection tera (select acct,org
from dw.act
where date= &date.);
disconnect from tera;
quit;
There are situation where that particular date may be missing in the dataset due to holiday.
I thinking how to query the previous date(non-holiday) if the mention date in the %let statement is holiday
Before running your query you have to do a lookup or data check on the date you are using. You have two options:
Use a Date Dimension table in order identify/lookup holidays.
Count how many records you have for that date, if you get 0 obs for this date, use date+1 in your query.
I recommend using the date dimension table option.
Teradata has Sys_Calendar.Calendar view. You can use that in query, it has all the information regarding weekdays and others.
if you want to SAS way use weekday function and use call symput as shown below. Teradata needs single quote around the date, so it is better to have single quotes around when creating macro variable
data _null_;
/* this is for intial date*/
date_int = input('2016-12-31', yymmdd10.);
/* create a new date variable depending on weekday*/
if weekday(date_int) = 7 then date =date_int-2; /*sunday -2 days to get
friday*/
else if weekday(date_int) = 6 then date =date_int-1;/*saturday -1 day to get
friday*/
else date =date_int;
format date date_int yymmdd10.;
call symputx('date', ''''||put(date,yymmdd10.)||'''');
run;
%put modfied date is &date;
modified date is '2016-12-29'
Now you can use this macro variable in your pass through.

Counting working days in SAS EG

Hi to all and good time of a day!
Here is my case I need to solve I will very gratefull if you can help me.
I have some data set it contains only one variable date format.
Example:
01JAN2016
06JAN2016
15FEB2016
The second data set is days - holidays for a period 5 years.
Example:
01JAN2016
02JAN2016
and etc, all these days are not working days.
The case is I need to count number of working days from date for every observation from first data set till now. It seems that I need to count number of days
"Now date" minus Date(from first data set) and minus number of days from second data set with holidays (count(date) where Date(from first data set)< date < "Now"
You can define your own type of interval to use with SAS funcions intck and intnx. Here's how to do it:
First create a table of weekdays for whichever years you have holidays for, up to present (or a future) year.
Here we'll start by including all weekdays from 2014 to 2016. This is assuming you don't want to count weekend days. If that's not the case, just modify the code so that the condition "weekday(date) in (2:6)" is not applied. You'll get the full 365 days of the year.
data mon_fri;
do date = "01JAN2014"d to "31DEC2016"d;
if weekday(date) in (2:6) then output;
end;
format date date9.;
run;
Then we'll create a table having all those dates we just created, minus the holidays we have in the table Holidays. We'll place the table in a library called myLib, and rename the date column to "Begin" for compliance with SAS custom intervals.
libname myLib "some/place/on/your/drive";
data mylib.workdays(RENAME=(date=Begin));
merge mon_fri (in=weekday)
Holidays (in=holiday);
by date;
if weekday and not holiday then output;
run;
Now we set up a custom interval which we'll simply call "workdays".
options intervalds=(workdays=mylib.workdays);
From there, all you have left to do is something like this:
data dateCalculations;
set mydata;
numOfDays = intck("workdays", theDate, today());
run;
SAS will take care of counting the number of dates (lines in the workdays dataset) separating the startdate (column called theDate) from the enddate (today's date).
Et voilĂ !
This is wonderful and very helpful. I use two different SAS systems (both on remote Unix servers). Setting the intervalds option only seems to work on one of them. I copy/paste the same code and on the other nothing happens - no warning, no error, it simply doesn't work.
Here is how I'm setting it (download the CSV from Yahoo! Finance for the S&P500, daily data, starting January 1950):
PROC IMPORT DATAFILE="sp500_1950_2016.csv"
OUT=sp500_1950_2016
DBMS=DLM
REPLACE;
delimiter=',';
getnames=yes;
RUN;
data trading_days;
set sp500_1950_2016 (keep = date rename=(date=begin));
where year(begin) < 2017;
run;
options intervalds=(TradingDay=trading_days) ;
Then I call it like so to count number of observations I should have from fund inception to Dec 31, 2016 or when the fund closed, whichever is sooner:
data ops2; set operations_master; where ~missing(inception);
if missing(enddate) then enddate = '31dec2016'd;
datadays = INTCK('TradingDay',inception,enddate);run;
proc univariate; var datadays;run;quit;
On system 1, this works just fine. On system 2, I get 0 for the variable datadays. I've already checked to see if there is a sys admin override on setting the intervalds option, and there is not. Is there another reason why this might not work on a given system?