Format error, operation not executed in WPS using SAS language - sas

Good afternoon.
Data description first:
if cell = . , rightfully blank
if cell = yyyymm, missing value needs to be imputed
if cell = 0 or other numeric value, number of days
I have monthly data from January 2010 to November 2016. I need to derive the missing values based on the available data. I don't change anything if a cell should be blank, but if a value is missing, I need to derive it.
Here is my code (I've changed the variable names because it might be confusing):
data trial3;
set work.trial2;
if currentmonth=. Then currentmonth=.; else do;
if currentmonth=201001 then do;
if nextmonth=0 then currentmonth=0;
else do;
if nextmonth ne 201002 then _201001=nextmonth-31;
if currentmonth<0 then currentmonth=0;
end;
end;
end;
There's no error message, but the log just stops, and every line that did run is preceded by an exclamation point.

Related

write conditional in SAS with DATA _NULL_

I am writing a conditional in SAS that starts with DATA _NULL_:
%LET today = today();
DATA _NULL_;
if day(today) ge 1 and day(today) le 15 then do;
date1=put(intnx('month',today,-1,'E'), date11.);
date2=put(intnx('month',today,-1,'L'), date11.);
end;
if day(today) > 15 then do;
date1=put(intnx('month',today,0,'B'), date11.);
date2=put(intnx('month',today,0,'L'), date11.);
end;
call symput('report_date',date1);
call symput('report_date2',date2);
RUN;
But with the above, I am not getting any values for my report dates.
The conditions are:
date1 = If the current day of the month is greater than or equal to 1 and less than 16, set date1 to the 16th of the previous month; otherwise set it to the 1st of the current month.
date2 = If the current day of the month is 16 or above, set date2 to the 15th of the current month; otherwise set date2 to the last day of the previous month.
The IF/THEN logic does not account for the missing value you are passing to the DAY() function calls.
The variable TODAY is never created in the data step. So just remove the %LET statement and add an actual assignment statement instead.
DATA _NULL_;
today=today();
if day(today) ge 1 and day(today) le 15 then do;
...
Just because you used the same name for the macro variable in the %LET statement as you used for the variable in the data step does not imply that the two have anything at all to do with each other.
If you wanted to use the macro variable to generate the code for the data step you would need to replace the TODAY with &TODAY.
if day(&today) ge 1 and day(&today) le 15 then do;
So for the value you set to the macro variable TODAY it would mean that the SAS code you are trying to run is:
if day(today()) ge 1 and day(today()) le 15 then do;
Note that this is actually not a good way to handle the problem, because it calls the TODAY() function multiple times. That could cause strange results if the data step started right before midnight: the IF condition might be evaluated on the 31st, but by the time you get to the ELSE condition the clock has ticked over to the first of the next month.
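A minimal sketch of the fix described above, with TODAY() called once and stored in a data step variable. The date arithmetic follows the conditions stated in the question and is only one reading of what the asker wants; CALL SYMPUTX is swapped in for CALL SYMPUT simply to trim blanks from the stored values.

```sas
data _null_;
  today = today();                      /* evaluate the clock exactly once */
  if day(today) le 15 then do;
    /* days 1-15: 16th and last day of the previous month */
    date1 = intnx('month', today, -1, 'B') + 15;   /* 16th of previous month */
    date2 = intnx('month', today, -1, 'E');        /* last day of previous month */
  end;
  else do;
    /* days 16 and later: 1st and 15th of the current month */
    date1 = intnx('month', today, 0, 'B');         /* 1st of current month */
    date2 = date1 + 14;                            /* 15th of current month */
  end;
  call symputx('report_date',  put(date1, date11.));
  call symputx('report_date2', put(date2, date11.));
run;

%put report_date=&report_date report_date2=&report_date2;
```

Because both branches work from the same stored value, the result cannot change halfway through the step.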

Check for valid dates using SAS macro

I have written a macro to check for invalid dates and set them to '11111111', but I got an unexpected NOTE: Invalid argument to function INPUT. The reason for the note is that the source data has the date "0212-04-26", which is outside the range of SAS dates (A.D. 1582 to A.D. 19,900). So now I'm looking for a way to check for invalid dates without any message.
My Code:
%macro chkdate(datefld=, num_date=) ;
/* invalid date gets set to '11111111' */
if &datefld ne '0001-01-01' then do ;
t_date = input(compress(&datefld, '-'), yymmdd8.) ;
if t_date eq . then do ;
%errors( key_desc=Invalid Date, fname=&datefld,)
&datefld = '11111111' ;
end;
end;
%mend chkdate ;
thanks
Different approaches exist...
I'd recommend reading this paper which illustrates the ? and ?? informat modifiers to be used with input().
You could, among other avenues, start by using input() with the 8. informat and check that the result is between, for instance, 19600101 and 20250101 (or whichever numbers are appropriate), and only then call input() with the yymmdd8. informat.
If you want to be more thorough, you could use input() with 3 substrings and check separately the year, month and day parts (using between 1960 and 2020, between 1 and 12, etc.).
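As a rough sketch of the two ideas above: the ?? modifier suppresses the invalid-data note and keeps _ERROR_ from being set, and a numeric range check can screen a value before the date informat is applied. The sample values and the 15820101 to 99991231 bounds are illustrative assumptions, not taken from the question.

```sas
data _null_;
  length datefld $10;
  do datefld = '2016-04-26', '0212-04-26', '2016-02-30';
    digits = compress(datefld, '-');

    /* ?? suppresses the log note and does not set _ERROR_ on bad input */
    t_date = input(digits, ?? yymmdd8.);

    /* alternative pre-check: read the digits as a plain number first */
    as_num   = input(digits, ?? 8.);
    in_range = (15820101 <= as_num <= 99991231);

    if missing(t_date) then put datefld= 'is not a valid SAS date ' in_range=;
    else put datefld= t_date= date9. in_range=;
  end;
run;
```

Inside the original macro, the same modifier could be dropped into the existing input() call, leaving the rest of the logic unchanged.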

Counting working days in SAS EG

Hi all, and good day!
Here is the case I need to solve; I will be very grateful if you can help me.
I have a data set that contains only one variable, in date format.
Example:
01JAN2016
06JAN2016
15FEB2016
The second data set contains holidays for a period of 5 years.
Example:
01JAN2016
02JAN2016
and so on; none of these days are working days.
I need to count the number of working days from the date in every observation of the first data set until now. It seems that I need to count the number of days as
"now" minus date (from the first data set), minus the number of holidays from the second data set, i.e. count(date) where date (from the first data set) < date < "now".
You can define your own type of interval to use with the SAS functions intck and intnx. Here's how to do it:
First create a table of weekdays for whichever years you have holidays for, up to present (or a future) year.
Here we'll start by including all weekdays from 2014 to 2016. This is assuming you don't want to count weekend days. If that's not the case, just modify the code so that the condition "weekday(date) in (2:6)" is not applied. You'll get the full 365 days of the year.
data mon_fri;
do date = "01JAN2014"d to "31DEC2016"d;
if weekday(date) in (2:6) then output;
end;
format date date9.;
run;
Then we'll create a table having all those dates we just created, minus the holidays we have in the table Holidays. We'll place the table in a library called myLib, and rename the date column to "Begin" for compliance with SAS custom intervals.
libname myLib "some/place/on/your/drive";
data mylib.workdays(RENAME=(date=Begin));
merge mon_fri (in=weekday)
Holidays (in=holiday);
by date;
if weekday and not holiday then output;
run;
Now we set up a custom interval which we'll simply call "workdays".
options intervalds=(workdays=mylib.workdays);
From there, all you have left to do is something like this:
data dateCalculations;
set mydata;
numOfDays = intck("workdays", theDate, today());
run;
SAS will take care of counting the number of dates (lines in the workdays dataset) separating the startdate (column called theDate) from the enddate (today's date).
Et voilà!
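One way to sanity-check the custom interval is to count the rows of the workdays table directly. This is a different technique from INTCK, sketched here under the assumption that mydata (with column theDate) and mylib.workdays (with column Begin) exist as built above. It counts workdays strictly after theDate up to and including today; whether that matches INTCK exactly depends on how interval boundaries fall at the two end points.

```sas
/* Cross-check: count workday rows directly instead of using INTCK.        */
/* Assumes mydata(theDate) and mylib.workdays(Begin) exist as built above. */
proc sql;
  create table workday_check as
  select a.theDate format=date9.,
         count(b.Begin) as numOfDays_check
  from mydata as a
       left join mylib.workdays as b
         on b.Begin > a.theDate and b.Begin <= today()
  group by a.theDate;
quit;
```

The result has one row per distinct theDate, so duplicate start dates in mydata collapse into a single row.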
This is wonderful and very helpful. I use two different SAS systems (both on remote Unix servers). Setting the intervalds option only seems to work on one of them. I copy/paste the same code and on the other nothing happens - no warning, no error, it simply doesn't work.
Here is how I'm setting it (download the CSV from Yahoo! Finance for the S&P500, daily data, starting January 1950):
PROC IMPORT DATAFILE="sp500_1950_2016.csv"
OUT=sp500_1950_2016
DBMS=DLM
REPLACE;
delimiter=',';
getnames=yes;
RUN;
data trading_days;
set sp500_1950_2016 (keep = date rename=(date=begin));
where year(begin) < 2017;
run;
options intervalds=(TradingDay=trading_days) ;
Then I call it like so to count number of observations I should have from fund inception to Dec 31, 2016 or when the fund closed, whichever is sooner:
data ops2;
  set operations_master;
  where ~missing(inception);
  if missing(enddate) then enddate = '31dec2016'd;
  datadays = INTCK('TradingDay', inception, enddate);
run;
proc univariate;
  var datadays;
run;
quit;
On system 1, this works just fine. On system 2, I get 0 for the variable datadays. I've already checked to see if there is a sys admin override on setting the intervalds option, and there is not. Is there another reason why this might not work on a given system?

SAS Data Step | Between 2 Dates

Probably a simple question. I have a simple dataset with scheduled payment dates in it.
DATA INFORM2;
INFORMAT previous_pmt_date scheduled_pmt_date MMDDYY10.;
INPUT previous_pmt_date scheduled_pmt_date;
FORMAT previous_pmt_date scheduled_pmt_date MMDDYYS10.;
DATALINES;
11/16/2015 12/16/2015
12/17/2015 01/16/2016
01/17/2016 02/16/2016
;
What I'm trying to do is to create a binary latest row indicator. For example, if I wanted to know the latest row as of 1/31/2016, I'd want row 2 to be flagged as the latest row. What I had been doing before is finding out where 1/31/2016 is between the previous_pmt_date and the scheduled_pmt_date, but that isn't correct for my purposes. I'd like to do this in a data step as opposed to SQL subqueries. Any ideas?
Want:
previous_pmt_date  scheduled_pmt_date  latest_row_ind
11/16/2015         12/16/2015          0
12/17/2015         01/16/2016          1
01/17/2016         02/16/2016          0
Here's a solution that does it all in the single existing datastep without any additional sorting. First I'm going to modify your data slightly to include account as the solution really should take that into account as well:
DATA INFORM2;
INFORMAT previous_pmt_date scheduled_pmt_date MMDDYY10.;
INPUT account previous_pmt_date scheduled_pmt_date;
FORMAT previous_pmt_date scheduled_pmt_date MMDDYYS10.;
DATALINES;
1 11/16/2015 12/16/2015
1 12/17/2015 01/16/2016
1 01/17/2016 02/16/2016
2 11/16/2015 12/16/2015
2 12/17/2015 01/16/2016
2 01/17/2016 02/16/2016
;
run;
Specify a cutoff date:
%let cutoff_date = %sysfunc(mdy(1,31,2016));
This solution uses the approach from this question to save the variables in the next row of data, into the current row. You can drop the vars at the end if desired (I've commented out for the purposes of testing).
data want;
  set inform2 end=eof;
  by account scheduled_pmt_date;
  recno = _n_ + 1;
  if not eof then do;
    set inform2 (keep=account previous_pmt_date scheduled_pmt_date
                 rename=(account = next_account
                         previous_pmt_date = next_previous_pmt_date
                         scheduled_pmt_date = next_scheduled_pmt_date)
                ) point=recno;
  end;
  else do;
    call missing(next_account, next_previous_pmt_date, next_scheduled_pmt_date);
  end;
  select;
    when ( next_account eq account and next_scheduled_pmt_date gt &cutoff_date ) flag='a';
    when ( next_account ne account ) flag='b';
    otherwise flag = 'z';
  end;
  *drop next:;
run;
This approach works by taking the current observation number (obtained via _n_) and adding 1 to it to get the number of the next observation. We then use a second set statement with the point= option to load that next observation, renaming the variables at the same time so that they don't overwrite the current variables.
We then use some logic to flag the necessary records. I'm not 100% sure of the logic you require for your purposes, so I've provided some sample logic and used different flags to show which logic is being triggered.
Some notes...
The by statement isn't strictly necessary but I'm including it to (a) ensure that the data is sorted correctly, and (b) help future readers understand the intent of the datastep as some of the logic requires this sort order.
The call missing statement resets the next_ variables on the very last observation, where there is no following row to read. Variables read with a set statement keep their values from one iteration to the next, so without it the last row would still carry the values loaded on the previous iteration. Comment it out to see what happens.
The end=eof syntax basically creates a temporary variable called eof that has a value of 1 when we get to the last observation on that set statement. We simply use this to determine if we're at the last row or not.
Finally, and very importantly, make sure you keep only the required variables when you load the second copy of the dataset; otherwise you will overwrite existing variables from the original data.
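As a sketch of how the look-ahead could be turned into the binary indicator the question asks for, the step below flags a row when its scheduled date is on or before the cutoff and the next row either belongs to another account or is scheduled after the cutoff. That rule is one possible reading of the asker's requirement, not something the question confirms, and the variable name latest_row_ind is taken from the wanted output.

```sas
%let cutoff_date = %sysfunc(mdy(1,31,2016));

data inform2;
  informat previous_pmt_date scheduled_pmt_date mmddyy10.;
  input account previous_pmt_date scheduled_pmt_date;
  format previous_pmt_date scheduled_pmt_date mmddyys10.;
  datalines;
1 11/16/2015 12/16/2015
1 12/17/2015 01/16/2016
1 01/17/2016 02/16/2016
;
run;

data want;
  set inform2 end=eof;
  recno = _n_ + 1;                       /* number of the next observation */
  if not eof then do;
    /* read the next row under different names (look-ahead) */
    set inform2 (keep=account scheduled_pmt_date
                 rename=(account = next_account
                         scheduled_pmt_date = next_scheduled_pmt_date))
        point=recno;
  end;
  else do;
    call missing(next_account, next_scheduled_pmt_date);
  end;
  /* assumed rule: last row per account scheduled on or before the cutoff */
  latest_row_ind = (scheduled_pmt_date le &cutoff_date)
                   and (next_account ne account
                        or next_scheduled_pmt_date gt &cutoff_date);
  drop next_:;
run;
```

With the three sample rows and a cutoff of 1/31/2016, only the second row satisfies both conditions, which matches the wanted output in the question.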

SAS date or numeric data?

%let months_back = %sysget(months_back);
data;
m = intnx('month', "&sysdate9"d, -&months_back - 2, 'begin');
m = intnx('day', put(m, date9.), 26, 'same');
m2back = put(m, yymmddd10.);
put m2back;
run;
NOTE: Character values have been converted to numeric values at the
places given by: (Line):(Column).
5:19 NOTE: Invalid numeric data, '01OCT2012' , at line 5 column 19.
I really don't know why this goes wrong. Is the date string numeric data?
PUT(m, date9.) is the culprit here. The 2nd argument of INTNX needs to be numeric (i.e. a date), the PUT function always returns a character value, in this instance '01OCT2012'. Just take out the PUT function completely and the code should work.
m = intnx('day', m, 26, 'same');
SAS stores dates as numbers - and in fact does not have a truly separate type for them. A SAS date is the number of days since 1/1/1960, so a bit over 19000 for today. The date format is entirely irrelevant to any date calculations - it is solely for human readability.
The bit where you say:
"&sysdate9"d
actually converts the string "01JAN2012" to a numeric value (18993).
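A small sketch that makes the point visible in the log: a date literal is just a day count from 01JAN1960, and a format only changes how that number is displayed. The variable names are illustrative.

```sas
data _null_;
  epoch = '01JAN1960'd;     /* day zero of the SAS date scale */
  d     = '01JAN2012'd;     /* a date literal is just a number */
  put epoch= d=;            /* unformatted: epoch=0 d=18993 */
  put d= date9. d= yymmdd10.;
  next_day = d + 1;         /* plain arithmetic moves the date by one day */
  put next_day= date9.;     /* next_day=02JAN2012 */
run;
```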
There's actually a quicker way to accomplish what you're trying to do. Because days correspond to whole numbers in SAS, to increment by one day you can simply add one to the value.
For example:
%let months_back=5;
data _null_;
m = intnx('month', today(), -&months_back - 2, 'begin');
m2 = intnx('day', m, 26, 'same');
m3 = intnx('month',"&sysdate9"d, -&months_back - 2)+26;
m2back = put(m2, yymmdd10.);
put m= date9. m2= yymmdd10. m3= yymmdd10.;
run;
M3 does your entire calculation in one step, by using the MONTH interval, then adding 26. INTNX('day'...) is basically pointless, unless there's some other value to using the function (using a shift index for example).
You can also see the use of a format in the PUT statement (which writes to the log) here: you don't have to convert the value to a character string with the PUT function and then write that to the log to get the formatted value. Just write put var format.; and string together as many variable/format pairs as you want that way.
Also, "&sysdate9."d is not the best way to get the current date. &sysdate. is only defined on startup of SAS, so if your session ran for 3 days you would not be on the current day (though perhaps that's desired?). Instead, the TODAY() function gets the current date, up to date no matter how long your SAS session has been running.
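To see the difference described above, a short sketch can compare the two values in the log. In a freshly started session they will normally match; in a long-running one they can drift apart. The variable names are illustrative.

```sas
data _null_;
  session_start = "&sysdate9"d;   /* fixed when the SAS session started */
  now           = today();        /* evaluated when this step runs */
  days_running  = now - session_start;
  put session_start= date9. now= date9. days_running=;
run;
```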
Finally - I recommend data _null_; if you don't want a dataset (and naming the result dataset if you do want it). data _null_ does not create a dataset. data; simply creates increasing numbers of datasets (data1, data2, ...) which quickly fill up your workspace and make it hard to tell what you're doing.