Removing rows out of a table given a certain condition - sas

I have the following table "HAVE":
ID                        Date
Test_5000_ABC_2022-01     01MAY2020
Test_12345_XYZ_2022-05    15OCT2021
Test_00000_UMX_2022-12    01SEP2021
Test_00000_UMX_2022-12    01DEC2022
The last part of the string in the "ID" column is always a year and a month delimited by "-", while the "Date" column holds a date in the "DDMONYYYY" format.
Now I want to delete all entries from this table where the date from the "ID" column is after the date in the "Date" column (comparing month and year only) and save the result as a new table. So, basically, my WANT table would look like this:
ID                        Date
Test_00000_UMX_2022-12    01DEC2022
I appreciate any kind of help, as I am very new to SAS. Thank you!

Extract date from the ID variable
Align the date to beginning of the month
Compare as needed
data have;
infile cards truncover; * the data lines below are blank-separated, which is the default delimiter;
input ID : $23. Date : date9.;
cards;
Test_5000_ABC_2022-01 01MAY2020
Test_12345_XYZ_2022-05 15OCT2021
Test_00000_UMX_2022-12 01SEP2021
Test_00000_UMX_2022-12 01DEC2022
;;;;
run;
data want;
set have;
date_id = mdy(input(scan(id, -1, "-_"), 8.) , 1, input(scan(id, -2, "-_"), 8.) );
*check your condition;
if date_id > intnx('month', date, 0, 'b') then flag=1;
*if date_id > intnx('month', date, 0, 'b') then delete;
format date_id date yymmdds10.;
run;

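You can use the following condition both in proc sql and in a DATA step. It is true for the rows you want to remove, so negate it (or use <=) when selecting the rows to keep:
where input(cats(scan(ID, -1, '_'), '-01'), yymmdd10.) > Date
The scan takes the part of your ID after the last _. The cats strips any trailing blanks and appends -01, giving text such as 2022-12-01. The input applies the informat yymmdd10. to it, which returns the first day of that month as a SAS date.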
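As a minimal sketch of how such a condition might be applied, assuming the HAVE table from the question with a character ID and a numeric SAS date in Date: the comparison is reversed to <= so that the matching rows are kept rather than removed. The output names want_ds and want_sql are placeholders, and building the text with cats() and reading it with the yymmdd10. informat are choices made for this sketch.

```sas
/* Sample data as in the question (blank-separated) */
data have;
    input ID : $23. Date : date9.;
    format Date date9.;
    cards;
Test_5000_ABC_2022-01 01MAY2020
Test_12345_XYZ_2022-05 15OCT2021
Test_00000_UMX_2022-12 01SEP2021
Test_00000_UMX_2022-12 01DEC2022
;
run;

/* DATA step: keep rows whose ID month is not after the month of Date */
data want_ds;
    set have;
    where input(cats(scan(ID, -1, '_'), '-01'), yymmdd10.) <= Date;
run;

/* PROC SQL: the same filter */
proc sql;
    create table want_sql as
    select *
    from have
    where input(cats(scan(ID, -1, '_'), '-01'), yymmdd10.) <= Date;
quit;
```

With the four sample rows, only the row dated 01DEC2022 satisfies the condition, which matches the WANT table in the question.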

Related

Removing duplicates using several conditions using SAS

I have a data set with id and date. I want a data set with two records per duplicated id, on the condition that one should be before 8th June and the other should be after 8th June.
To take the first date and the first date after 2021-06-08 you can sort by ID and DATE and use LAG() to detect when you cross the date boundary.
data have ;
input id date :date. ;
format date date9.;
cards;
1 01jun2021
1 07jun2021
1 08jun2021
1 09jun2021
;
data want;
set have ;
by id date;
if first.id or ( (date<='08JUN2021'd) ne lag(date<='08JUN2021'd));
run;
Results:
Obs    id    date
1      1     01JUN2021
2      1     09JUN2021

SAS sum a particular value

I have a dataset with a value for each day of a particular month, like 01JAN2020, 03JAN2020, 06JAN2020, 01FEB2020, 04FEB2020. I need to count the rows in each month and year. When I use the count(*) function and group by a particular column, I only get daily row counts. Which function will give the number of rows per month rather than per day?
Thank you,
In SQL in order to compute an aggregate count for the months of the dates, the GROUP BY should be by the date's month. The month (or 1st day of the month) can be computed using the INTNX function, or the YEAR and MONTH functions.
Proc SQL
Example:
data have;
call streaminit(2021); * initialize random number stream;
do date = '01jan2020'd to today();
do _n_ = 1 to rand('integer', 5); * random, up to 5 repeats per day;
output;
end;
end;
format date date9.;
run;
proc sql;
create table want as
select
intnx('month', date, 0) as month format=yymon7.
, count(*) as count
from
have
group by
calculated month /* calculated is SAS SQL special feature */
;
quit;
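The YEAR and MONTH alternative mentioned in this answer might look like the following sketch. It assumes a numeric SAS date variable named date. The small sample data set have_small and the names yr, mon and want_ym are made up for illustration.

```sas
/* Tiny illustrative input: three January rows and two February rows */
data have_small;
    input date : date9.;
    format date date9.;
    datalines;
01JAN2020
03JAN2020
06JAN2020
01FEB2020
04FEB2020
;
run;

/* One output row per year and month, with the number of input rows in it */
proc sql;
    create table want_ym as
    select year(date)  as yr
         , month(date) as mon
         , count(*)    as count
    from have_small
    group by calculated yr, calculated mon;
quit;
```

Grouping by both year and month keeps January 2020 separate from January of any other year, which grouping by month alone would not.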
Proc MEANS
You can also use Proc MEANS and format the date as a month representation. The procedure will group according to the formatted value.
Example:
proc means nway data=have noprint ;
format date yymon7.;
class date;
var date;
output out=want N=count;
run;
PROC FREQ + a format for how you want your date displayed. The first example is by year month, the second is just by month name.
*by year month;
proc freq data=sashelp.stocks;
format date yymmn6.;
table date;
run;
*by month name;
proc freq data=sashelp.stocks;
format date monname.;
table date;
run;

How do I extract the month and year from dates in a date column, within SAS?

I formatted the date column within my table, using the following code:
data data1;
set data;
format Date ddmmyy10.;
run;
I want to know how I can create a new column within my table which just extracts the month and year from the dates in the "Date" column.
How do I do this please?
In order to create new columns calculated from the existing Date column, you could try this:
data data1;
set data;
format Date ddmmyy10.;
yearNo=year(Date);
monthNo=month(Date);
run;
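If a single month-and-year column is wanted rather than two numeric columns, one option is sketched below. The input data set dates_in and the new variable names monthStart and monthYear are hypothetical. intnx, put and the monyy7. and yymmn6. formats are standard SAS.

```sas
/* Hypothetical input with one date column */
data dates_in;
    input Date : ddmmyy10.;
    format Date ddmmyy10.;
    datalines;
10/03/2012
17/11/2020
;
run;

data data1;
    set dates_in;
    /* numeric date moved to the first day of its month, displayed like MAR2012 */
    monthStart = intnx('month', Date, 0, 'b');
    format monthStart monyy7.;
    /* character version of the same information, such as 201203 */
    monthYear = put(Date, yymmn6.);
run;
```

The numeric version still sorts and compares as a date, while the character version is convenient for labels or for building keys.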

Alternative to LAG function

My dataset is structured like this:
ID, Order Date, Delivery Date, Flag
1, 10/03/12, 15/03/12,
1, 17/03/12, 20/03/12, 1
I want to be able to calculate the date difference between the first occurring delivery date and subsequent order dates. Eventual aim is group these records with an identifier.
Have tried the monotonic() with monotonic()+1 for a self join - but the problem with this is that each ID can have multiple different numbers of rows needing to be grouped together. Am using SAS Enterprise Guide 7 - unfortunately LAG is not available.
An example of what I'm looking to achieve is:
ID, Order Date, Deliv Date, Order Date_1, Deliv Date_1, DateDIFF(Deliv Date - Order Date_1)
1, 10/03/12, 15/03/12, 17/03/12, 20/03/12, 2
Any ideas?
Sort the data with PROC SORT by ID and descending Delivery_Date, then use RETAIN to carry the previous record's Delivery_Date forward within each ID and calculate the difference.
Code:
data have;
infile datalines dlm=',' dsd;
informat id 11. Order_Date ddmmyy8. Delivery_Date ddmmyy8. flag 11.;
format Order_Date ddmmyy8. Delivery_Date ddmmyy8.;
input ID Order_Date Delivery_Date Flag;
datalines;
1, 10/03/12, 15/03/12,.
1, 17/03/12, 20/03/12, 1
2, 10/03/12, 10/03/12,.
2, 17/03/12, 20/03/12, 1
2, 10/03/12, 27/03/12,1
2, 17/03/12, 23/03/12, 1
;
run;
proc sort data=have;
by id descending Delivery_Date ;
run;
data want;
set have;
by id;
retain nxt_date;
if first.id=1 then do;
nxt_date = Delivery_Date;
diff=0;
end;
else do;
prv_date=nxt_date;
diff=nxt_date-Delivery_Date;
nxt_date = Delivery_Date;
end;
format prv_date ddmmyy8.;
drop nxt_date;
run;
Output:
id=1 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=0 prv_date=.
id=1 Order_Date=10/03/12 Delivery_Date=15/03/12 flag=. diff=5 prv_date=20/03/12
id=2 Order_Date=10/03/12 Delivery_Date=27/03/12 flag=1 diff=0 prv_date=.
id=2 Order_Date=17/03/12 Delivery_Date=23/03/12 flag=1 diff=4 prv_date=27/03/12
id=2 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=3 prv_date=23/03/12
id=2 Order_Date=10/03/12 Delivery_Date=10/03/12 flag=. diff=10 prv_date=20/03/12

How to calculate 'age last birthday' on a given date for a given birthday in SAS PROC SQL step

I want to calculate 'age last birthday' on a specific evaluation date, given a specific date of birth, using a SAS PROC SQL command.
How can I do this and are there any limitations?
Sample Input
DATA INPUTS;
infile cards dlm=',' dsd;
INPUT DOBDt :DATE9. EvalDt :DATE9. expected;
FORMAT DOBDt date9. EvalDt date9.;
CARDS;
11MAY2009,10MAY2015,5
11MAY2009,11MAY2015,6
11MAY2009,12MAY2015,6
28FEB1984,01DEC2015,31
29FEB1984,28FEB2012,27
29FEB1984,29FEB2012,28
29FEB1984,01MAR2012,28
;
RUN;
The goal would be to take the dobDt as an input, evaluate on the EvalDt and produce the answer of expected
This can be done as follows:
PROC SQL
PROC SQL;
CREATE TABLE outputs2 AS
select
*
,intck('year',DOBDt,EvalDt,'c') AS actual
,((calculated actual) eq expected) AS check
FROM
inputs
;
QUIT;
actual, the calculated value, matches expected, the desired outcome, for all the examples provided. I am not aware of any limitations to this approach although there are probably some extreme ages that it cannot calculate due to SAS dates having a limited range of values.
As a bonus:
DATA STEP
DATA outputs;
set inputs;
actual = intck('year',DOBDt,EvalDt,'c');
check = (actual eq expected);
RUN;
This is how we used to do it back in the day. Also "age at last birthday" seems pretty clear to me.
DATA INPUTS;
infile cards dlm=',' dsd;
INPUT DOBDt :DATE9. EvalDt :DATE9. expected;
FORMAT DOBDt date9. EvalDt date9.;
age = year(evaldt)-year(dobdt) - (month(evaldt) eq month(dobdt) and day(evaldt) lt day(dobdt)) - (month(evaldt) lt month(dobdt));
CARDS;
11MAY2009,10MAY2015,5
11MAY2009,11MAY2015,6
11MAY2009,12MAY2015,6
28FEB1984,01DEC2015,31
29FEB1984,28FEB2012,27
29FEB1984,29FEB2012,28
29FEB1984,01MAR2012,28
;;;;
RUN;
proc print;
run;