My dataset is structured like this:
ID, Order Date, Delivery Date, Flag
1, 10/03/12, 15/03/12,
1, 17/03/12, 20/03/12, 1
I want to calculate the date difference between the first delivery date and the subsequent order date. The eventual aim is to group these records with an identifier.
I have tried monotonic() with monotonic()+1 for a self join, but the problem is that each ID can have a different number of rows that need to be grouped together. I am using SAS Enterprise Guide 7; unfortunately LAG is not available.
An example of what I'm looking to achieve is:
ID, Order Date, Deliv Date, Order Date_1, Deliv Date_1, DateDIFF(Deliv Date - Order Date_1)
1, 10/03/12, 15/03/12, 17/03/12, 20/03/12, 2
Any ideas?
You need to sort the data with proc sort by ID and descending Delivery_Date, then retain the value from the previous record while processing by ID in order to calculate the difference.
Code:
data have;
infile datalines dlm=',' dsd;
informat id 11. Order_Date ddmmyy8. Delivery_Date ddmmyy8. flag 11.;
format Order_Date ddmmyy8. Delivery_Date ddmmyy8.;
input ID Order_Date Delivery_Date Flag;
datalines;
1, 10/03/12, 15/03/12,.
1, 17/03/12, 20/03/12, 1
2, 10/03/12, 10/03/12,.
2, 17/03/12, 20/03/12, 1
2, 10/03/12, 27/03/12,1
2, 17/03/12, 23/03/12, 1
;
run;
proc sort data=have;
by id descending Delivery_Date ;
run;
data want;
set have;
by id;
retain nxt_date;
if first.id=1 then do;
nxt_date = Delivery_Date;
diff=0;
end;
else do;
prv_date=nxt_date;
diff=nxt_date-Delivery_Date;
nxt_date = Delivery_Date;
end;
format prv_date ddmmyy8.;
drop nxt_date;
run;
Output:
id=1 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=0 prv_date=.
id=1 Order_Date=10/03/12 Delivery_Date=15/03/12 flag=. diff=5 prv_date=20/03/12
id=2 Order_Date=10/03/12 Delivery_Date=27/03/12 flag=1 diff=0 prv_date=.
id=2 Order_Date=17/03/12 Delivery_Date=23/03/12 flag=1 diff=4 prv_date=27/03/12
id=2 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=3 prv_date=23/03/12
id=2 Order_Date=10/03/12 Delivery_Date=10/03/12 flag=. diff=10 prv_date=20/03/12
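The question asks for the gap between a delivery date and the next order date for the same ID, which is slightly different from the difference between consecutive delivery dates computed above. A minimal sketch of that variant, assuming the same have data set and sorting by order date within each ID; the names have_sorted, gaps, prev_deliv and gap are illustrative and not from the original post:

```sas
/* Sort a copy by ID and ascending order date (leaves HAVE untouched) */
proc sort data=have out=have_sorted;
  by id Order_Date;
run;

data gaps;
  set have_sorted;
  by id;
  retain prev_deliv;                        /* delivery date of the previous row */
  if first.id then call missing(prev_deliv); /* do not carry over from another ID */
  if not missing(prev_deliv) then gap = Order_Date - prev_deliv;
  output;                                   /* write the row before updating prev_deliv */
  prev_deliv = Delivery_Date;
  format prev_deliv ddmmyy8.;
run;
```

For ID 1 this gives gap=2 on the second row (17/03/12 minus 15/03/12), which matches the example in the question.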
I am new to SAS
I have multiple datasets with the following variables
Dataset 1 Subid;visit; flag; date; time
Dataset 2 Subid;visit; flag; date; time
Dataset 3 Subid;visit; date; time
Dataset 4 Subid;visit; date; time
I need to:
When flag is present in the dataset, compare the date and time for the flag across datasets and across visits.
When flag is not present in the dataset, compare the date across the mentioned datasets and across visits.
You have two datasets with the flag and two datasets without the flag. If you simply want a pure comparison of two datasets, proc compare will produce a report for you that compares variables with each other.
Example data:
data dataset1;
input subid visit flag date:date9. time:time.;
format date date9. time time.;
datalines;
1 1 1 01JAN2022 00:00
2 2 0 01JAN2022 01:00
;
run;
data dataset2;
input subid visit flag date:date9. time:time.;
format date date9. time time.;
datalines;
1 1 1 01JAN2022 00:00
2 2 1 03JAN2022 02:00
;
run;
Code:
proc sort data=dataset1;
by subid visit;
run;
proc sort data=dataset2;
by subid visit;
run;
proc compare base=dataset1 compare=dataset2;
id subid visit;
var date time;
run;
You can produce a dataset of only the differences as well.
proc compare base = dataset1
compare = dataset2
out = compare
outnoequal
noprint
;
id subid visit;
var date time;
run;
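If a side-by-side view is more useful than the proc compare report, the two datasets can also be merged by the ID variables. This is only a sketch, assuming both datasets are already sorted by subid and visit as above; the renamed variables (date1, date2, and so on) and the date_differs and time_differs indicators are illustrative names:

```sas
data side_by_side;
  merge dataset1(rename=(flag=flag1 date=date1 time=time1))
        dataset2(rename=(flag=flag2 date=date2 time=time2));
  by subid visit;
  /* 1 when the values differ between the two datasets, 0 otherwise */
  date_differs = (date1 ne date2);
  time_differs = (time1 ne time2);
run;
```

For the datasets without a flag, the same pattern should apply with the flag rename left out.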
I have the following table "HAVE":
ID                      Date
Test_5000_ABC_2022-01   01MAY2020
Test_12345_XYZ_2022-05  15OCT2021
Test_00000_UMX_2022-12  01SEP2021
Test_00000_UMX_2022-12  01DEC2022
The last part of the string in the "ID" column is always a year and a month delimited by "-", while the "Date" column holds a date in the "DDMONYYYY" format.
Now, I want to delete all entries from this table where the date from the "ID" column (its month and year) is after the date in the "Date" column, and save the result as a new table. So, basically, my WANT table would look like this:
ID                      Date
Test_00000_UMX_2022-12  01DEC2022
I appreciate any kind of help, as I am very new to SAS. Thank you!
Extract date from the ID variable
Align the date to beginning of the month
Compare as needed
data have;
infile cards truncover; /* values below are separated by blanks; use dlm='09'x only if the lines are tab-delimited */
input ID : $23. Date : date9.;
cards;
Test_5000_ABC_2022-01 01MAY2020
Test_12345_XYZ_2022-05 15OCT2021
Test_00000_UMX_2022-12 01SEP2021
Test_00000_UMX_2022-12 01DEC2022
;;;;
run;
data want;
set have;
date_id = mdy(input(scan(id, -1, "-_"), 8.) , 1, input(scan(id, -2, "-_"), 8.) );
*check your condition;
if date_id > intnx('month', date, 0, 'b') then flag=1;
*if date_id > intnx('month', date, 0, 'b') then delete;
format date_id date yymmdds10.;
run;
You can use the following condition both in proc sql and in a DATA step. It is true for the rows to be deleted:
where input(cats(scan(ID, -1, '_'), '-01'), yymmdd10.) > Date
The scan takes the part of your ID after the last _, cats strips any trailing blanks and appends -01, and the input applies the informat yymmdd10. to the result.
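As a sketch of how such a condition could be used to build the new table, assuming the have data set from the question with a character ID and a numeric SAS date in Date; the condition is negated so that only the rows to keep are selected:

```sas
proc sql;
  create table want as
  select *
  from have
  /* keep a row unless the year-month in ID is after Date */
  where not (input(cats(scan(ID, -1, '_'), '-01'), yymmdd10.) > Date);
quit;
```

With the four sample rows, only the row with ID Test_00000_UMX_2022-12 and Date 01DEC2022 remains, which is the WANT table shown in the question.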
I have a SAS database with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks for your help, I'm in my second week of using SAS <3
Edit: thanks for reminding me that I was not looking for a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means every variable whose name begins with date, compare with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
quit;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed by using an additional data view, or an output Proc that allows you to specify the desired order.
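One way to present the columns in the order shown in the question without rewriting the data is a view. A minimal sketch, assuming the want table above after the rename (columns id, Datetype, Date); the view name want_v is illustrative:

```sas
proc sql;
  /* a view stores only the query; the column order comes from the select list */
  create view want_v as
  select id, Date, Datetype
  from want;
quit;

proc print data=want_v noobs;
run;
```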
I have the following dataset:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
What I would like to do now is classify the records based on their last visit. So what I want to do is:
Set a date (e.g. 10SEP2016)
Classify all records whose last visit was more than 30 days ago as 1, all records whose last visit was more than 60 days ago as 2, etc.
Any thoughts on how I need to program this?
You could build something like this (count the days between the dates, divide by 30 and round up with ceil). Alternatively, if you want to use months and not 30 days, you can replace the first intck parameter with 'month' and remove the ceil and /30:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
4 09SEP2016
5 10AUG2016
;
RUN;
%let lastvisit=10SEP2016;
data result;
set survey;
days_30=ceil(intck('day', order_date,"&lastvisit"d)/30)-1;
run;
PROC PRINT data = result;
format order_date date9.;
RUN;
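The month-based alternative mentioned above might look like the following sketch. It assumes the same survey data set and lastvisit macro variable; result_months and months are illustrative names. Note that intck with its default method counts the month boundaries crossed between the two dates, not complete 30-day periods, so the classes will not line up exactly with the day-based version:

```sas
%let lastvisit=10SEP2016;

data result_months;
  set survey;
  /* number of month boundaries between order_date and the reference date */
  months = intck('month', order_date, "&lastvisit"d);
run;

proc print data=result_months;
  format order_date date9.;
run;
```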
I have the following dataset and code:
DATA survey;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
PROC PRINT; RUN;
data work;
set survey;
where '11JAN2007'<= order_date <= '13JAN2007';
proc print data=work;
run;
When I run this code it does not give the desired output, however. It only gives a table with three empty order_date values.
Any thoughts on what goes wrong here?
This would work:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
data work;
set survey;
where '11JAN2007'd<= order_date <= '13JAN2007'd;
run;
proc print data=work;
format order_date date9. ;
run;
See SAS help for topics date, informat,...
If you want to query based on a date, you need to tell SAS that your string is a date. You do this by putting a d immediately after the closing quote of the date string, e.g.
'11JAN2007'd
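A small sketch of what the d suffix does: the literal becomes an ordinary number (days since 01JAN1960), which is why it can be compared with a numeric date variable while a plain quoted string cannot. The variable name d is just for illustration:

```sas
data _null_;
  d = '11JAN2007'd;   /* date literal, stored as a number */
  put d=;             /* raw numeric value written to the log */
  put d= date9.;      /* same value displayed with a date format */
run;
```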