I have a dataset with several subjects with hour timepoints (0.02,24.02,48.02 etc) for each record per subject.
Each record has 4 dates with a single record assigned to each timepoint (0.02= 28AUG2019, 24.02= 29AUG2019 etc).
The date should be the same for each hour timepoint.
What sas function could I use to validate that the dates for each record assigned to each hour timepoint is the same for each subject ?
Would the IFC/IFN function work in this scenario?
Sample data for one subject
For the case of a data set with variables time1-time4 and date1-date4 I see at least two interpretations of the question.
One
Are the date values all the same across the record?
The date values can be arrayed and examined in a loop for a change from one index to the previous.
data have;
format time1-time4 6.2 date1-date4 date9.;
informat date1-date4 date9.;
input time1-time4 date1-date4;
datalines;
0.02 24.02 48.02 72.02 28AUG2019 29AUG2019 28AUG2019 28AUG2019
1.02 34.02 58.02 82.02 01SEP2019 01SEP2019 01SEP2019 01SEP2019
run;
data want;
set have;
array dates date1-date4;
do index=2 to dim(dates);
if dates(index) ne dates(index-1) then at_least_one_date_different_flag = 1;
end;
drop index;
run;
Two
The date should be the same for each hour timepoint.
This could be construed to mean looking down the data set, for each distinct value of time in any of the time1-time4 variables the corresponding date values in the date1-date4 must be also be distinct.
row 1: 1 2 3 4 A B C D
row 2: 1 2 3 5 A B C D
row 3: 1 3 5 2 A X D B
Looking down you have 3:C, 3:C, 3:X which would be different (3 has C's and X's)
If this is the case, please update the question with more sample data of what you have and want.
Related
Hi I need to calculate the value of month supposed in sas
01jan1960 is equal to 1
02jan1960 is equal to 2
So I need to calculate for 01aug2020
I used intck function but no output
I want in datastep only .
SAS stores dates as the number of days since 1960 with zero representing first day of 1960. To represent a date in a program just use a quoted string followed by the letter D. The string needs to be something the DATE informat can interpret.
Let's run a little test.
6 data _null_;
7 do dt=0 to 3,"01-JAN-1960"d,'01AUG2020'd;
8 put dt= +1 dt date9.;
9 end;
10 run;
dt=0 01JAN1960
dt=1 02JAN1960
dt=2 03JAN1960
dt=3 04JAN1960
dt=0 01JAN1960
dt=22128 01AUG2020
So the date value for '01AUG2020'd is 22,128.
Subtraction works
days_interval = '01Aug2020'd - '01Jan1960'd;
Or looking at the unformatted value as SAS stores dates from 01Jan1960
days_interval = '01Aug2020'd;
format days_interval 8.;
I am working with crime data. Now, I have the following table crimes. Each row contains a specific crime (e.g. assault): the date it was committed (date) and a person-ID of the offender (person).
date person
------------------------------
02JAN2017 1
03FEB2017 1
04JAN2018 1 --> not to be counted (more than a year after 02JAN2017)
27NOV2017 2
28NOV2018 2 --> should not be counted (more than a year after 27NOV2017)
01MAY2017 3
24FEB2018 3
10OCT2017 4
I am interested in whether each person has committed (relapse=1) or not committed (relapse=0) another crime within 1 year after the first crime committed by the same person. Another condition is that the first crime has to be committed within a specific year (here 2017).
The result should therefore look like this:
date person relapse
------------------------------
02JAN2017 1 1
03FEB2017 1 1
04JAN2018 1 1
27NOV2017 2 0
28NOV2018 2 0
01MAY2017 3 1
24FEB2018 3 1
10OCT2017 4 0
Can anyone please give me a hint on how to do this in SAS?
Obviously, the real data are much larger, so I cannot do it manually.
One approach is to use DATA step by group processing.
The BY <var> statement sets up binary variables first.<var> and last.<var> that flag the first row in a group and the last row in a group.
You appear to be assigning the computed relapse flag over the entire group, and that kind of computation can be done with what SAS coders call a DOW loop -- a loop with the SET statement inside loop, with a follow up loop that assigns the computation to each row in the group.
The INTCK function can compute the number of years between two dates.
For example:
data want(keep=person date relapse);
* DOW loop computes assertion that relapse occurred;
relapse = 0;
do _n_ = 1 by 1 until (last.person);
set crimes; * <-------------- CRIMES;
by person date;
* check if persons first crime was in 2017;
if _n_ = 1 and year(date) = 2017 then _first = date;
* check if persons second crime was within 1 year of first;
if _n_ = 2 and _first then relapse = intck('year', _first, date, 'C') < 1;
end;
* at this point the relapse flag has been computed, and its value
* will be repeated for each row output;
* serial loop over same number of rows in the group, but
* read in through a second SET statement;
do _n_ = 1 to _n_;
set crimes; * <-------------- CRIMES;
output;
end;
run;
The process would be more complex, with more bookkeeping variables, if the actual process is to classify different time frames of a person as either relapsed or reformed based on rules more nuanced than "1st in 2017 and next within 1 year".
I started using sas relatively recent - I'm not by any means attempting to create perfect code here.
I'd sort the data by id/person and date first (date should be numeric), and then use retain statements check against the date of the first crime. It's not perfect, but if your data is good (no missing dates), it'll work, and it is easy to follow imho.
This only works if the first record and act of crime is supposed to happen in 2017. If you have crimes happening in 2016, and want to check whether 'a crime' is committed in 2017 and then check the relapse, then this code is not going to work - but I think that is covered in the comments beneath your question.
data test;
input tmp_year $ 1-9 person;
datalines;
02JAN2017 1
03FEB2017 1
04JAN2018 1
27NOV2017 2
28NOV2018 2
01MAY2017 3
24FEB2018 3
10OCT2017 4
;
run;
data test2;
set test;
crime_date = input(tmp_year, date9.);
act_year = year(crime_date);
run;
proc sort data=test2;
by person crime_date ;
run;
data want;
set test2;
by person crime_date;
retain date_of_crime;
if first.person and act_year = 2017 then date_of_crime = crime_date;
else if first.person then call missing(date_of_crime);
if intck('YEAR', date_of_crime, crime_date) =< 1 and not first.person
then relapse = 1;
else relapse = 0;
run;
The above code flags the act of crimes committed one year after an act of crime in 2017. You can then retrieve the unique persons with a proc sql statement, and join them with whatever dataset you have.
I am simply looking to create a new column retaining the value of a specific column within the previous record, so that I can compare the existing column against the new column. Further down the line I want to be able to output records whereby the values in both columns are different and lose the records where the values are the same
Essentially I want my dataset to look like this, where the first idr has the retained date set to null:
Idr Date1 Date2
1 20/01/2016 .
1 20/01/2016 20/01/2016
1 18/10/2016 20/01/2016
2 07/03/2016 .
2 18/05/2016 07/03/2016
2 21/10/2016 18/05/2016
3 29/01/2016 .
3 04/02/2016 29/01/2016
3 04/02/2016 04/02/2016
I have used code along the following lines in the past whereby I have created a temporary variable referencing the data I want to retain:
date_temp=date1;
data example2;
set example1;
by idr date1;
date_temp=date1;
retain date_temp ;
if first.idr then do;
date_temp=date1;
end;else do;
date2=date_temp;
end;
run;
I have searched the highs and lows of google - Any help would be greatly appreciated
The trick here is to set the value of the retained variable ready for the next row after you've already output the current row, rather than relying on the default implicit output at the end of the data step:
data example2;
set example1;
by idr;
retain date2;
if first.idr then call missing(date2);
output;
date2 = date1;
format date2 ddmmyy10.;
run;
Logic that executes after the output statement doesn't make any difference to the row that's just been output, but until the data step proceeds to the next iteration, everything is still in the PDV, including variables that are being retained across to the next row, so this is an opportunity to update them. More details on the PDV.
Another way of doing this, using the lag function:
data example3;
set example1;
date2 = lag(date1);
if idr ne lag(idr) then call missing(date2);
run;
Be careful when using lag - it returns the value from the last time that instance of the lag function was executed during your data step, not necessarily the value of that variable from the previous row, so weird things tend to happen if you do something like if condition then mylaggedvar=lag(var);
To achieve your final outcome (remove records where the idr and date are the same as the previous row), you can easily achieve this without creating the extra column. Providing the data is sorted by idr and date1, then just use first.date1 to keep the required records.
data have;
input Idr Date1 :ddmmyy10.;
format date1 ddmmyy10.;
datalines;
1 20/01/2016
1 20/01/2016
1 18/10/2016
2 07/03/2016
2 18/05/2016
2 21/10/2016
3 29/01/2016
3 04/02/2016
3 04/02/2016
;
run;
data want;
set have;
by idr date1;
if first.date1 then output;
run;
Case 1
Suppose the data are sorted by year then by month (always have 3 observations in data).
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
I need to copy the Index of last month to new observation
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
2015 2 1.2
Case 2
Year is removed from data. So we only have Month and Index.
Month Index
1 1.2
11 1.1
12 1.5
Data is always collected from consecutive 3 months. So 1 is the last month.
Still, ideal output is
Month Index
1 1.2
2 1.2
11 1.1
12 1.5
I solve it by creating another dataset only contains Month (1,2...12). Then right join the original dataset twice. But I think there's more elegant way to deal with this.
Case 1 can be a straight-forward data step. Add end=eof to the set statement to initialize a variable eof that returns value 1 when the data step is reading the last row of the data set. An output statement in the data step outputs a row during each iteration. If eof=1, a do block runs that increments the month by 1 and outputs another row.
data want;
set have end=eof;
output;
if eof then do;
month=mod(month+1,12);
output;
end;
run;
For case 2, I would switch to an sql solution. Self join the table to itself on month, incremented by 1 in the second table. Use the coalesce function to keep the values from the existing table if it exists. If not, use the values from the second table. Since a case crossing December-January will produce 5 months, limit the output to four rows using the outobs= option in proc sql to exclude the unwanted second January.
proc sql outobs=4;
create table want as
select
coalesce(t1.month,mod(t2.month+1,12)) as month,
coalesce(t1.index,t2.index) as index
from
have t1
full outer join have t2
on t1.month = t2.month+1
order by
coalesce(t1.month,t2.month+1)
;
quit;
using the week function to clean some data and eventually will order the weeks. I used week() on the date 8/26/2011 and I got 34, and when the function inserted the date 01/13/2012 it spit out 2. I thouhgt I was getting number of weeks since jan 1, 1960?
As per the WEEK Function documentation, the default U descriptor specifies the number of the week within the year, with Sunday being deemed the 1st day of the week. (You can use V if you want Monday to be considered the 1st day instead.)
The week function calculates the week of the current year. The answer to the implied question, "how do I calculated the number of days since 1/1/1960 [or some arbitrary date]," is the intck function.
data have;
input datevar date9.;
datalines;
01JAN1960
02JAN2013
13JAN2012
26AUG2011
;;;;
run;
data want;
set have;
wks = intck('week',0,datevar); *# of weeks from 0 to datevar [0=1/1/1960].
*Can replace 0 with any other date variable.;
run;