Date comparison using PROC SQL within SAS

I'm using PROC SQL within SAS and trying to get a count of rows where the current month is equal to the month of a date field I'm reading. The format of the input date is mmddyy10.
This is a sample of what I'm trying:
data test;
input job $ lastrun;
DateNew = datejul(lastrun);
Format datenew mmddyy10.;
datalines;
joba 19300
jobb 19200
jobc 19303
jobx 19288
run;
proc print; run;
proc sql;
select
count(job) AS cnt_LastMonth
from test
where datepart(datenew) = intnx('month', today(), -1, 'same');
quit;
In this example I'm expecting the cnt_LastMonth to return 3, however it returns 0.

You can't apply DATEPART to a date variable, only to a datetime. DateNew is already a date, so DATEPART treats it as a count of seconds and returns 0 (01JAN1960). Also, if you want to compare dates that belong to the same month, don't ignore the year.
proc sql;
create table qert as
select
count(job) AS cnt_LastMonth
from test
where intnx('month', DateNew, 0, 'b') = intnx('month', today(), -1, 'b');
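/* INTNX with 'b' moves both dates to the first day of their month,
so month and year are compared together.
A shorter alternative, which fails in January because month(today())-1 is 0:
where month(DateNew) = month(today())-1 and year(DateNew)=year(today());
*/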
quit;
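A minimal sketch of why the original WHERE clause matched nothing, assuming an arbitrary sample date (the variable names d, wrong, dt and right are illustrative only). DATEPART divides its argument by the number of seconds in a day, so a date value, which is a small day count, collapses to 0:

```sas
data _null_;
  d     = '30oct2019'd;      /* a date: days since 01JAN1960 */
  wrong = datepart(d);       /* d is read as seconds, so the result is day 0 */
  dt    = dhms(d, 0, 0, 0);  /* the same day as a datetime (seconds) */
  right = datepart(dt);      /* recovers the original date */
  put d= date9. wrong= date9. right= date9.;
  /* log: d=30OCT2019 wrong=01JAN1960 right=30OCT2019 */
run;
```

If the column really were a datetime, DATEPART would be the right tool; for a plain date it should simply be dropped.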

proc sql;
select count(job) AS cnt_LastMonth
from test
where month(DateNew)= 10;
quit;
OR
proc sql;
SELECT count(A2.job) AS cnt_LastMonth
FROM (SELECT *,
MONTH(Date_Minus_1) as Month_filter,
MONTH(DateNew) as Month
FROM(SELECT *,
intnx('Month',today(),-1,'s') as Date_Minus_1 format=mmddyy10.
FROM test) A1)A2
Where A2.Month =A2.Month_filter;
quit;

Related

Finding the max value of a variable in SAS per ID per time period

proc sql;
create table abc as select distinct formatted_date ,Contract, late_days
from merged_dpd_raw_2602
group by 1,2
;quit;
This gives me the 3 variables I'm working with.
They have the form:
|ID|Date in YYMMs.10| number|
proc sql;
create table max_dpd_per_contract as select distinct contract, max(late_days) as DPD_for_contract
from sasa
group by 1
;quit;
This gives me the maximum number for the entire period, but how do I go on to get it per period?
I'm guessing PROC TIMESERIES should be used here.
proc timeseries data=sasa
out=sasa2;
by contract;
id formatted_date interval=day ACCUMULATE=maximum ;
trend maximum ;
var late_days;
run;
but I am unsure how to continue.
I want to find the maximum value of the variable late_days per time period (month). For example, for contract A in the period jan2018 the max late_days value is X.
How the data looks: https://imgur.com/iIufDAx
In SQL you will want to calculate your aggregate within a group that uses a computed month value.
Example:
data have;
call streaminit(2021);
length contract date days_late 8;
do contract = 1 to 10;
days_late = 0;
do date = '01jan2020'd to '31dec2020'd;
if days_late then
if rand('uniform') < .55 then
days_late + 1;
else
days_late = 0;
else
days_late + rand('uniform') < 0.25;
output;
end;
end;
format date date9.;
run;
options fmterr;
proc sql;
create table want as
select
contract
, intnx('month', date, 0) as month format = monyy7.
, max(days_late) as max_days_late
from
have
group by
contract, month
;
quit;
You will get the same results using PROC MEANS:
proc means nway data=have noprint;
class contract date;
format date monyy7.;
output out=want_2 max(days_late) = max_days_late;
run;
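To see what the computed month value looks like, here is a small sketch (the three sample dates are arbitrary assumptions). INTNX with an increment of 0 and the default alignment returns the first day of the month that contains each date, which is why it works as a grouping key:

```sas
data _null_;
  /* Two dates in January and one in February */
  do date = '14jan2020'd, '31jan2020'd, '01feb2020'd;
    month_start = intnx('month', date, 0);  /* first day of that month */
    put date= date9. month_start= date9.;
  end;
run;
```

The PROC MEANS variant reaches the same grouping differently: the CLASS statement groups by the formatted value of date, so applying monyy7. puts every day of a month in one class level.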

Is there an IML equivalent to SQL count?

In the following example that produces a frequency based on a variable
proc iml;
call randseed(1);
Y = sample(2014:2020, 3000);
M = sample(1:12, 3000);
D = sample(1:31, 3000);
create Date var {Y M D};
append;
close Date;
quit;
data Date;
set Date;
format M z2. D z2.;
Year=put(Y, z4. -L);
Month=put(M, z2. -L);
Day=put(D, z2. -L);
drop Y M D;
run;
data Date;
set Date;
Date = catx('-', Year, Month, Day);
drop Year Month Day;
run;
proc sort data=Date; by Date; run;
proc sql;
create table Count as
select Date, count(*) as Frequency from Date
group by Date;
quit;
Is there corresponding functionality in PROC IML that does the same thing as the final PROC SQL step?

sas proc sql - get min date and add 1 year

I have a dataset with IDs, and each ID has multiple dates (actually datetime). I want to use PROC SQL to get the minimum datetime and also add 1 year to the minimum. I'm trying to do this all in one PROC SQL but have been fumbling and can't get this to work. Below are two attempts. Would appreciate any advice.
*** GENERATE RANDOM DATES AFTER JAN 1, 2012 AND CREATE DATE/TIME VARIABLE ***;
data have ;
format date mmddyy10. dt datetime15.;
do person_id=100, 200, 300, 400, 500;
do i = 1 to 100;
jdate = int(1000 * ranuni(123987));
date = mdy(1,1,2012) + jdate;
dt = dhms(date, 0,0,0);
output;
end;
end;
run;
*** TRY1: THIS DOES NOT WORK - GETS MIN DATE/TIME AND REMERGES WITH EVERY RECORD***;
proc sql;
create table try1 as
select min(dt) as index_dt format=datetime15. ,
(dt + 365*24*60*60) as followup_date format=datetime15.
from have
;
quit;
*** TRY2: USE MIN() IN "HAVING" STATEMENT ***;
*** PROBLEMATIC IF PERSON_ID HAS MIN(DT) OCCUR MULTIPLE TIMES ***;
proc sql;
create table try2 as
select person_id,
dt as index_dt format=datetime15.,
(dt + 365*24*60*60) as followup_date format=datetime15.
from have
group by person_id
having dt=min(dt)
;
quit;
Try this:
proc sql;
create table try1 as
select
min(dt) as index_dt format=datetime15. ,
calculated index_dt + 365*24*60*60 as followup_date format=datetime15.
from have
;
quit;
The trick here is using the "calculated" keyword.
Also you may want to do the following to add a year on instead of your multiplications:
proc sql;
create table try1 as
select
min(dt) as index_dt format=datetime15. ,
input(compress(
put(intnx('YEAR', datepart(calculated index_dt),1,'SAMEDAY'),date9.)||":"||
put(timepart(calculated index_dt),time5.)),datetime15.) as followup_date format=datetime15.
from have
;
quit;
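The snippets above return one row for the whole table. If the minimum is wanted per ID, as the question suggests, a hedged sketch could look like the following. It assumes the have data set from the question; the output name want and the column followup_dt are hypothetical. INTNX with the 'dtyear' interval and 'same' alignment shifts a datetime by one calendar year while keeping the day and time:

```sas
proc sql;
  create table want as
  select person_id,
         min(dt) as index_dt format=datetime20.,
         /* one calendar year after the earliest datetime of each person */
         intnx('dtyear', min(dt), 1, 'same') as followup_dt format=datetime20.
  from have
  group by person_id;
quit;
```

Because every selected column is either the GROUP BY key or built from an aggregate, PROC SQL returns exactly one row per person_id and does not remerge the summary with the detail rows.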
Try using "select distinct person_id" instead of "select person_id"; that should remove the duplicate rows when min(dt) occurs more than once for a person. Note that 365*24*60*60 is plain arithmetic and always adds exactly 365 days, so the follow-up date lands one day early whenever the interval spans 29 February.
I don't think you can do this with PROC SQL alone. I would do it this way:
*** GENERATE RANDOM DATES AFTER JAN 1, 2012 AND CREATE DATE/TIME VARIABLE ***;
data have ;
format date mmddyy10. dt datetime15.;
do person_id=100, 200, 300, 400, 500;
do i = 1 to 100;
jdate = int(1000 * ranuni(123987));
date = mdy(1,1,2012) + jdate;
dt = dhms(date, 0,0,0);
output;
end;
end;
run;
%macro do_elaboration(ds=);
/*count how many rows has my table */
%let dataset=&ds.;
%let DSID = %sysfunc(open(&dataset., IS));
%let nobs = %sysfunc(attrn(&DSID., NLOBS));
%let rc=%sysfunc(close(&DSID.));
/*loop over the number of rows*/
%do i=1 %to &nobs.;
/*at each loop get one id*/
data _NULL_;
set &ds. (firstobs=&i. obs=&i.);
call symputx("id", person_id);
run;
/*with proc sql get the min_dt*/
proc sql noprint;
/* format=20. avoids the default BEST8. rounding of large datetime values */
select min(dt) format=20. into :min_dt
from &ds.
where person_id=&id.
;
quit;
/*increment the min_dt with the function sas intnx*/
data have_final_tmp;
person_id = &id.;
followup_date = intnx('dtyear', &min_dt., 1, 'same'); /* 'same' keeps day and time; the default would return the start of next year */
format followup_date datetime15.;
run;
/*put all id with the followup_date in only one dataset*/
proc append base=have_final data=have_final_tmp force;
run;
%end;
%mend do_elaboration;
/*call the macro*/
%do_elaboration(ds=have);
I wrote this code quickly and have not tested it, so you should check it, but the concept should be clear.

Calculating correlation and covariance for an event window in SAS

I have to calculate the correlation and covariance of my daily sales values over an event window. The event window is a 45-day period, and my data looks like this:
store_id date sales
5927 12-Jan-07 3,714.00
5927 12-Jan-07 3,259.00
5927 14-Jan-07 3,787.00
5927 14-Jan-07 3,480.00
5927 17-Jan-07 3,646.00
5927 17-Jan-07 3,316.00
4978 18-Jan-07 3,530.00
4978 18-Jan-07 3,103.00
4978 18-Jan-07 3,026.00
4978 21-Jan-07 3,448.00
Now, for every store_id, date combination, I need to go back 45 days (there is more data for each combination in my original data set) calculate the correlation between sales and lag(sales) i.e. autocorrelation of degree one. As you can see, the date column is not continuous. So something like (date - 45) is not going to work.
I have gotten as far as this:
data ds1;
set ds;
by store_id;
LAG_SALE = lag(sales);
IF FIRST.store_id THEN DO;
LAG_SALE = .;
END;
run;
For calculating correlation and covariances:
proc corr data=ds1 outp=Corr cov; /** cov: include covariances **/
by store_id date;
var sales lag_sale;
run;
But how do I insert the event window for each date, store_id combination? My final output should look something like this -
id date corr cov
5927 12-Jan-07 ... ...
5927 14-Jan-07 ... ...
Here is what I've come up with:
First I convert the date to a SAS date, which is the number of days since Jan. 1 1960:
data ds;
set ds (rename=(date=old_date));
date = input(old_date, date11.);
drop old_date;
run;
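As a quick check of that conversion, the sketch below reads one date string from the sample data and derives the start of its 45-day window. The variable window_start is hypothetical, and the two-digit year is assumed to resolve to 2007 under the session's YEARCUTOFF setting:

```sas
data _null_;
  old_date     = '12-Jan-07';              /* text as it appears in the question */
  date         = input(old_date, date11.); /* numeric SAS date */
  window_start = date - 45;                /* plain subtraction works on day counts */
  put date= date= date9. window_start= date9.;
run;
```

Once the date is numeric, the gaps between recorded dates no longer matter: a condition such as window_start <= date <= event date selects whatever rows fall inside the window.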
Then compute lag_sale (I am using the same calculation you used in the question, but make sure this is what you want to do. For some observations the lag sale is the previous recorded date, but for some it is the same store_id and date, just a different observation.):
proc sort data=ds; by store_id; run;
data ds;
set ds;
by store_id;
lag_sale = lag(sales);
if first.store_id then lag_sale = .;
run;
Then set up the final data set:
data final;
length store_id 8 date 8 cov 8 corr 8;
if _n_ = 0;
run;
Then create a macro which takes a store_id and date and runs proc corr. The first part of the macro selects only the data with that store_id and within the past 45 days of the date. Then it runs proc corr. Then it formats proc corr how you want it and appends the results to the "final" data set.
%macro corr(store_id, date);
data ds2;
set ds;
where store_id = &store_id and %eval(&date-45) <= date <=&date
and lag_sale ne .;
run;
proc corr noprint data=ds2 cov outp=corr;
by store_id;
var sales lag_sale;
run;
data corr2;
set corr;
where _type_ in ('CORR', 'COV') and _name_ = 'sales';
retain cov;
date = &date;
if _type_ = 'COV' then cov = lag_sale;
else do;
corr = lag_sale;
output;
end;
keep store_id date corr cov;
run;
proc append base=final data=corr2 force; run;
%mend corr;
Finally run the macro for each store_id/date combination.
proc sort data=ds out=ds3 nodupkey;
by store_id date;
run;
data _null_;
set ds3;
call execute('%corr('||store_id||','||date||');');
run;
proc sort data=final;
by store_id date;
run;

Changing date format in SAS9.3

Does anyone know how to change a date variable from Date9 to MMDDYY10 format in SAS9.3? I've tried using the put and input functions, but the result is null
Formats are nothing but instructions on how to display a value. Dates are numeric values stored as the number of days since 1JAN1960.
data x;
format formated1 date9. formated2 mmddyy10.;
noformated = "01JAN1960"d;
formated1 = noformated;
formated2 = noformated;
run;
proc print data=x;
run;
Obs    formated1     formated2    noformated
  1    01JAN1960    01/01/1960             0
In short, just change the format on the dataset and the date will be displayed with the new format.
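If you need the date as a character string in the new layout, use PUT with the target format:
newdate_char = put(olddate,MMDDYY10.);
Reading a DATE9. string back with the MMDDYY10. informat returns a missing value, which is probably the null result you saw. The informat has to match the text being read:
tmpdate = put(olddate,DATE9.);
newdate = input(tmpdate,DATE9.);
Or in one step, with the new display format attached:
newdate = input(put(olddate,DATE9.),DATE9.);
format newdate MMDDYY10.;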
To change the format of a variable in an existing table, use PROC DATASETS or PROC SQL:
data WORK.TABLE1;
format DATE1 DATE2 date9.;
DATE1 = today();
DATE2 = DATE1;
run;
proc contents;
run;
proc datasets lib=WORK nodetails nolist;
modify TABLE1;
format DATE1 mmddyy10.;
quit;
proc sql;
alter table WORK.TABLE1
modify DATE2 format=mmddyy10.
;
quit;
proc contents;
run;