I have the following dataset:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
What I would like to do now is classify the records based on their last visit. So what I want to do is:
Set a reference date (e.g., 10SEP2016).
Classify all records whose last visit is more than 30 days before that date as 1, more than 60 days as 2, etc.
Any thoughts on how I need to program this?
You could build something like this: count the days between the dates, divide by 30, round up with CEIL and subtract 1 (so 1 to 30 days gives 0, 31 to 60 days gives 1, and so on). Alternatively, if you want to use months rather than 30-day blocks, you can replace the first INTCK argument with 'month' and remove the CEIL and /30:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11SEP2016
2 12AUG2016
3 14JAN2016
4 09SEP2016
5 10AUG2016
;
RUN;
%let lastvisit=10SEP2016;
data result;
set survey;
/* days since the visit in 30-day blocks: 1-30 -> 0, 31-60 -> 1, 61-90 -> 2, ... */
days_30=ceil(intck('day', order_date,"&lastvisit"d)/30)-1;
run;
PROC PRINT data = result;
format order_date date9.;
RUN;
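For comparison, here is a sketch that computes both variants side by side on the same test data. It assumes plain date subtraction is an acceptable stand-in for the day count (SAS dates are day counts, so `"&lastvisit"d - order_date` equals `intck('day', order_date, "&lastvisit"d)`), and that "months" should mean completed calendar months, which is what INTCK's `'c'` (continuous) method counts. The variable names `days_ago`, `class_30` and `months_ago` are only illustrative.

```sas
/* Reference date for the classification */
%let lastvisit=10SEP2016;

data survey;
  informat order_date date9.;
  format order_date date9.;
  input id order_date;
  datalines;
1 11SEP2016
2 12AUG2016
3 14JAN2016
4 09SEP2016
5 10AUG2016
;
run;

data result;
  set survey;
  /* SAS dates are day counts, so subtraction gives the number of days */
  days_ago = "&lastvisit"d - order_date;
  /* 1-30 days -> 0, 31-60 days -> 1, 61-90 days -> 2, ... */
  class_30 = ceil(days_ago / 30) - 1;
  /* Completed calendar months between the dates ('c' = continuous method) */
  months_ago = intck('month', order_date, "&lastvisit"d, 'c');
run;

proc print data=result;
run;
```

With this data, id 5 (10AUG2016, 31 days before the reference date) lands in class 1, while id 2 (12AUG2016, 29 days) stays in class 0. A visit on or after the reference date gets a negative class, so you may want to handle those rows separately.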
I have a SAS dataset with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks for your help, I'm in my second week using SAS <3
Edit: thanks for reminding me that I was not looking for a sorting method.
Good day. The following should be what you are after. I did not find an easy way to rename the Datetype values (to 1 and 2), as those values are not in the starting data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means every variable whose name begins with date; compare SQL LIKE 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
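If you would rather get the 1 and 2 labels without renaming anything afterwards, a plain DATA step can do the reshape itself by writing one OUTPUT per source column. This is a swapped-in technique, not PROC TRANSPOSE; it assumes the three column names are fixed (birthday, Date1, Date2) and that the dates are stored as 8-character strings, as in the test data. The output name `long` is hypothetical.

```sas
/* Test data: dates kept as character strings, as in the answer's example */
data begin;
  input id birthday $ Date1 $ Date2 $;
  cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
;
run;

/* One output row per original date column, with Datetype set explicitly */
data long;
  set begin;
  length Date $8 Datetype $8;
  Date = birthday; Datetype = 'birthday'; output;
  Date = Date1;    Datetype = '1';        output;
  Date = Date2;    Datetype = '2';        output;
  keep id Date Datetype;
run;
```

The trade-off is that a new date column has to be added by hand, whereas `var birthday date:;` in PROC TRANSPOSE picks it up automatically.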
Two steps:
1. Pivot the data using PROC TRANSPOSE.
2. Change the names of the output columns and their labels with PROC DATASETS.
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data-named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed by using an additional data view, or an output procedure that lets you specify the desired order.
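As a sketch of the view approach, assuming the renamed output table is WORK.WANT with columns id, Datetype and Date: a PROC SQL view lists the columns in whatever order you want without copying the data, and PROC PRINT's VAR statement sets the order for a single report. The small stand-in table and the view name `want_v` are hypothetical.

```sas
/* Stand-in for the transposed table, columns in TRANSPOSE order: id, Datetype, Date */
data want;
  input id Datetype :$8. Date :$8.;
  datalines;
1 birthday 12/4/01
1 1 12/4/13
1 2 12/3/14
;
run;

/* The view stores only the query; the underlying data set is not rewritten */
proc sql;
  create view want_v as
  select id, Date, Datetype
  from want;
quit;

/* Alternatively, choose the column order just for one report */
proc print data=want;
  var id Date Datetype;
run;
```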
My dataset is structured like this:
ID, Order Date, Delivery Date, Flag
1, 10/03/12, 15/03/12,
1, 17/03/12, 20/03/12, 1
I want to calculate the date difference between the first delivery date and the subsequent order dates. The eventual aim is to group these records with an identifier.
I have tried a self-join on monotonic() and monotonic()+1, but the problem is that each ID can have a different number of rows that need to be grouped together. I am using SAS Enterprise Guide 7; unfortunately, LAG is not available.
An example of what I'm looking to achieve is:
ID, Order Date, Deliv Date, Order Date_1, Deliv Date_1, DateDIFF(Order Date_1 - Deliv Date)
1, 10/03/12, 15/03/12, 17/03/12, 20/03/12, 2
Any ideas?
You need to sort the data with PROC SORT in descending order of Delivery_Date, then RETAIN the value from the previous record within each ID to calculate the difference.
Code:
data have;
infile datalines dlm=',' dsd;
informat id 11. Order_Date ddmmyy8. Delivery_Date ddmmyy8. flag 11.;
format Order_Date ddmmyy8. Delivery_Date ddmmyy8.;
input ID Order_Date Delivery_Date Flag;
datalines;
1, 10/03/12, 15/03/12,.
1, 17/03/12, 20/03/12, 1
2, 10/03/12, 10/03/12,.
2, 17/03/12, 20/03/12, 1
2, 10/03/12, 27/03/12,1
2, 17/03/12, 23/03/12, 1
;
run;
proc sort data=have;
by id descending Delivery_Date ;
run;
data want;
set have;
by id;
retain nxt_date;
if first.id=1 then do;
nxt_date = Delivery_Date;
diff=0;
end;
else do;
prv_date=nxt_date;
diff=nxt_date-Delivery_Date;
nxt_date = Delivery_Date;
end;
format prv_date ddmmyy8.;
drop nxt_date;
run;
Output:
id=1 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=0 prv_date=.
id=1 Order_Date=10/03/12 Delivery_Date=15/03/12 flag=. diff=5 prv_date=20/03/12
id=2 Order_Date=10/03/12 Delivery_Date=27/03/12 flag=1 diff=0 prv_date=.
id=2 Order_Date=17/03/12 Delivery_Date=23/03/12 flag=1 diff=4 prv_date=27/03/12
id=2 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=3 prv_date=23/03/12
id=2 Order_Date=10/03/12 Delivery_Date=10/03/12 flag=. diff=10 prv_date=20/03/12
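Note that the code above measures the gap between consecutive delivery dates. If you want the quantity from the question's example instead (each order date minus the previous row's delivery date for the same ID, which gives 2 for ID 1), a minimal RETAIN-based sketch, again without LAG, could look like this. It assumes rows should be processed in order-date order with delivery date as a tie-breaker; `orders`, `gaps`, `prev_deliv` and `gap` are illustrative names.

```sas
data orders;
  infile datalines dlm=',' dsd;
  informat Order_Date Delivery_Date ddmmyy8.;
  format Order_Date Delivery_Date ddmmyy8.;
  input ID Order_Date Delivery_Date Flag;
  datalines;
1, 10/03/12, 15/03/12,.
1, 17/03/12, 20/03/12, 1
2, 10/03/12, 10/03/12,.
2, 17/03/12, 20/03/12, 1
;
run;

proc sort data=orders;
  by ID Order_Date Delivery_Date;
run;

data gaps;
  set orders;
  by ID;
  retain prev_deliv;
  format prev_deliv ddmmyy8.;
  /* gap stays missing on the first row of each ID, so it never spans two IDs */
  if not first.ID then gap = Order_Date - prev_deliv;
  output;
  /* Remember this row's delivery date for the next row */
  prev_deliv = Delivery_Date;
run;
```

A grouping identifier could then be derived from `gap` (for example, starting a new group when it exceeds some threshold), but the threshold depends on your business rule.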
Lets say I have the following dates for the observations
data dates;
input obs date$11.;
cards;
1 06/10/1949
2 01/07/1952
3 02/10/1947
;
run;
Now I want to add another column next to date, called newdate, that is numeric and displayed with the date9. format.
I tried the following,
data newdata;
set dates;
newdate=input(date,date9.);
run;
But when I run this, the newdate column is empty.
Your string values are not using a format that is compatible with the DATE. informat. They appear to be using either MMDDYY. or DDMMYY., but it is not possible to tell which from your example values.
data dates;
input obs datestr :$11.;
date1 = input(datestr,mmddyy10.);
date2 = input(datestr,ddmmyy10.);
format date1 date2 date9. ;
cards;
1 06/10/1949
2 01/07/1952
3 02/10/1947
;
results:
Obs obs datestr date1 date2
1 1 06/10/1949 10JUN1949 06OCT1949
2 2 01/07/1952 07JAN1952 01JUL1952
3 3 02/10/1947 10FEB1947 02OCT1947
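Applied to the question's own DATES data set, the fix is just a different informat in the INPUT function plus a FORMAT statement so the numeric result prints as DATE9. This sketch assumes the strings are month/day/year; swap in DDMMYY10. if they are day/month/year.

```sas
data dates;
  input obs date :$11.;
  cards;
1 06/10/1949
2 01/07/1952
3 02/10/1947
;
run;

data newdata;
  set dates;
  /* MMDDYY10. matches strings like 06/10/1949; DATE9. expects 10JUN1949 */
  newdate = input(date, mmddyy10.);
  /* newdate is numeric (days since 01JAN1960); the format only changes how it prints */
  format newdate date9.;
run;
```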
I am trying to extract all the Time values for only the most recent visit. Can someone help me with the code, please?
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var time;
run;
data want;
set temp1;
by name date;
if last.name;
length value $2000;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
This is very similar to another user's question from yesterday, and quite a few solutions work here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not in the GROUP BY, SAS lets you put them in the SELECT: it computes max(date) for each name, automatically remerges that summary value back onto the original rows of HAVE, and the HAVING clause then keeps every row whose date equals the maximum, returning multiple rows per name as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
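For the "generate a table of maximum dates and merge yourself" route, a sketch could look like the following; the last step also does the row collapsing. It assumes Time is stored as text (so 0810 keeps its leading zero) and that the data set is called HAVE; `maxdates`, `latest` and `collapsed` are illustrative names.

```sas
/* Test data from the question; Time is read as character to keep leading zeros */
data have;
  input Name $ Date Time $;
  datalines;
Bob 2017090 1305
Bob 2017090 1015
Bob 2017081 0810
Bob 2017072 0602
Tom 2017090 1300
Tom 2017090 1010
Tom 2017090 0805
Tom 2017072 0607
Joe 2017085 1309
Joe 2017081 0815
;
run;

/* Step 1: one row per name with its latest date */
proc sql;
  create table maxdates as
  select Name, max(Date) as max_date
  from have
  group by Name
  order by Name;
quit;

/* Step 2: merge back and keep only the rows on the latest date */
proc sort data=have;
  by Name;
run;

data latest;
  merge have maxdates;
  by Name;
  if Date = max_date;
  drop max_date;
run;

/* Step 3: collapse the remaining times into one comma-separated list per name */
data collapsed;
  length timelist $200;
  do until (last.Name);
    set latest;
    by Name;
    timelist = catx(',', timelist, Time);
  end;
  keep Name Date timelist;
run;
```

Keeping the maximum dates in their own table makes each step easy to inspect when troubleshooting.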
The DoW loop also appeals here. This is basically the precise SAS DATA step implementation of the SQL above: first iterate over the rows for a name to find the max date, then iterate over them again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course, here you can more easily collapse the rows, too:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;
I have the following dataset and code:
DATA survey;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
PROC PRINT; RUN;
data work;
set survey;
where '11JAN2007'<= order_date <= '13JAN2007';
proc print data=work;
run;
When I run this code, however, it does not give the desired output. It only gives a table with three empty order_date values.
Any thoughts on what goes wrong here?
This would work:
DATA survey;
informat order_date date9. ;
INPUT id order_date ;
DATALINES;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
RUN;
PROC PRINT data = survey;
format order_date date9.;
RUN;
data work;
set survey;
where '11JAN2007'd<= order_date <= '13JAN2007'd;
run;
proc print data=work;
format order_date date9. ;
run;
See the SAS help on topics such as dates and informats.
If you want to query based on date, you need to tell SAS that your string is a date. You do this by putting a 'd' after the date string, e.g.
'11JAN2007'd
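A date literal is just a number: the count of days since 01JAN1960. This small sketch writes that number to the log and reuses the literal in a WHERE clause. Without the d suffix, '11JAN2007' is a character constant, which cannot be compared with the numeric order_date. The data set name `in_range` is illustrative.

```sas
data survey;
  informat order_date date9.;
  format order_date date9.;
  input id order_date;
  datalines;
1 11JAN2007
2 12JAN2007
3 14JAN2007
;
run;

data _null_;
  /* The d suffix turns the quoted string into a numeric SAS date value */
  d = '11JAN2007'd;
  /* First d= prints the raw number, second applies the DATE9. format */
  put d= d= date9.;
run;

/* Keeps ids 1 and 2; 14JAN2007 falls outside the range */
data in_range;
  set survey;
  where '11JAN2007'd <= order_date <= '13JAN2007'd;
run;
```

The PUT statement writes d=17177 d=11JAN2007 to the log.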