I have a dataset of policies that began between 2018-01-01 and 2019-12-31 (2 years), with the following info:
policy policy_beg_date policy_end_date
a 01-01-2018 06-02-2018
b 04-02-2019 02-04-2020
c 23-12-2019 03-02-2020
d 02-02-2019
policy_beg_date is the date the policy started and policy_end_date is the date it ended (missing if the policy is still active).
I would like to create flags for months 0 through 13 (flag_0month, flag_1month, ..., flag_13month), set to 1 if the policy is active in month 0, month 1, and so on, where month 0 is the month the policy began, month 1 is the month after that, and so on. So I would have something like this:
policy policy_beg_date policy_end_date month_begining month_end act_month0 act_month1 act_month2
a 01-01-2018 06-01-2018 201801 201801 1
b 04-02-2019 02-04-2020 201902 202003 1 1 1
c 23-12-2019 03-02-2020 201912 202002 1 1 1
d 02-02-2019 201902 1 1 1
Can anyone please help me achieve this?
I already tried something like in this post: https://communities.sas.com/t5/SAS-Programming/How-to-create-a-flag-for-each-month-lying-between-2-d... but I get errors, and I think it's not quite what I need.
Thank you!!
data have ;
infile cards truncover ;
input policy : $1. policy_beg_date : ddmmyy10. policy_end_date : ddmmyy10. ;
format policy_beg_date policy_end_date date9. ;
datalines ;
a 01-01-2018 06-02-2018
b 04-02-2019 02-04-2020
c 23-12-2019 03-02-2020
d 02-02-2019
e 01-01-2020 31-01-2020
;
run;
data want ;
set have ;
/* use intck() to get # months */
months = intck('month',policy_beg_date,min(policy_end_date,date())) ;
/* array to hold the flags */
array act{0:13} act_month0-act_month13 ;
/* loop from 0 to months, set the array flags */
do mn = 0 to min(months,hbound(act)) ;
act{mn} = 1 ;
end ;
run ;
proc print noobs ; run ;
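The question also shows month_begining and month_end columns in yyyymm form. A minimal sketch of one way to add them, assuming the want dataset created above; the output dataset name want_months is only illustrative, and the columns are stored as 6-character strings built with the YYMMN6. format:

```sas
data want_months ;
  set want ;
  /* yymmn6. writes a SAS date as yyyymm with no separator, e.g. 201801 */
  month_begining = put(policy_beg_date, yymmn6.) ;
  /* leave month_end blank for policies that are still active */
  if not missing(policy_end_date) then
    month_end = put(policy_end_date, yymmn6.) ;
run ;
```

If numeric yyyymm values are preferred, the strings could be wrapped in input(..., 6.) instead.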
I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
data temp;
input ID date :$10. type typea :$10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date :$10. type typea :$10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried something like this, based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
proc sort data=temp;
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use the UPDATE statement to carry forward the DATE values within a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;
Hello, this is a sample of my data (there is an additional column, LBCAT = URINALYSIS, for this panel of tests).
I've been asked to include only the panels of tests where LBNRIND is populated for at least one test in the panel, and to remove the rest. Some subjects have multiple test results at different visit timepoints and others have only one. I can't use a simple where LBNRIND ne '' in the data step because I need the entire panel of urinalysis tests, not just that particular test result. What would be the best approach here? I think transposing the data would be too messy, but maybe putting the variables in an array or macro and using a do loop over the panel of tests?
Update: I've tried this code, but it doesn't keep the corresponding tests where lb_nrind is populated. I get the same result whether I use sum(lb_nrind > '') > 0 or just lb_nrind > '' in the HAVING clause.
*proc sql;
*create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group; run;*
You can use a SQL HAVING clause to retain the rows of a group that meets some aggregate condition. In your case the group might be patientid and panelid, and the condition that at least one LBNRIND is not null.
Example:
Consider this example where a group of rows is to be kept only if at least one of the rows in the group meets the criteria result7=77
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
quit;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;
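The 1/0 evaluation that the SQL and DOW-loop versions rely on can be seen in isolation. A small sketch with made-up variable names (x, is_77, is_big):

```sas
data _null_;
  x = 77;
  is_77  = (x = 77);   /* comparison is true, so the result is 1 */
  is_big = (x > 100);  /* comparison is false, so the result is 0 */
  put is_77= is_big=;  /* writes is_77=1 is_big=0 to the log */
run;
```

Summing such an expression over a group therefore counts the rows that meet the condition, which is why sum(result7=77) > 0 means "at least one row in the group matches".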
My data is as following:
id balance date
1 10 02Mar2018
1 12 05Mar2018
1 -15 07Mar2018
1 14 14Mar2018
1 -25 25Mar2018
Now I want the number of days id 1 had a positive balance and the number of days it had a negative balance in March.
For example, the positive days are counted as follows: 01Mar to 06Mar, because the first negative entry came on 07Mar, so that is 6 days.
Then the balance was positive again from 14Mar to 24Mar, which is 11 days,
so in total it was 6 + 11 = 17 days positive.
And similarly for the negative balance.
I tried using following code:
DATA B;
SET A ;
BY ID;
IF FIRST.ID THEN Y=DATE;
RETAIN Y;
ELSE Y=INTCK('day',DATE,Y);
RUN;
But I couldn't get the exact results.
Any help will be appreciated.
Assuming your data is sorted by id and date.
First do a 'look-ahead' merge (to get the next date) :
data lookahead ;
merge have
have (firstobs=2 rename=(date=nextdate id=nextid)) ;
if id ^= nextid then call missing(nextdate) ;
drop nextid ;
run ;
/* data now looks like this */
id balance date nextdate
1 10 02Mar2018 05Mar2018
1 12 05Mar2018 07Mar2018
1 -15 07Mar2018 14Mar2018
1 14 14Mar2018 25Mar2018
1 -25 25Mar2018
Then, expand out the missing dates, dealing with instances where the first date per id isn't the 1st of a month, and the last record per id isn't the last day of the month :
data expand ;
set lookahead (rename=(date=thisdate)) ;
by id ;
if first.id and day(thisdate) ^= 1 then do ;
/* loop from 1st of month to day before date, output new record for each date */
do date = intnx('month',thisdate,0,'b') to thisdate - 1 ;
output ;
end ;
end ;
/* output the input record */
date = thisdate ; output ;
/* output dates up to the next date */
if nextdate > thisdate + 1 then do ;
do date = thisdate + 1 to nextdate - 1 ;
output ;
end ;
end ;
else
/* last record for id, loop to end of month */
if missing(nextdate) and thisdate ^= intnx('month',thisdate,0,'end') then do ;
do date = thisdate + 1 to intnx('month',thisdate,0,'end') ;
output ;
end ;
end ;
drop thisdate nextdate ;
format date date9. ;
run ;
/* data now looks like this */
id balance date
1 10 01Mar2018
1 10 02Mar2018
1 10 03Mar2018
1 10 04Mar2018
1 12 05Mar2018
1 12 06Mar2018
1 -15 07Mar2018
1 -15 08Mar2018
... etc ...
1 -15 13Mar2018
1 14 14Mar2018
1 14 15Mar2018
... etc ...
1 14 24Mar2018
1 -25 25Mar2018
... etc ...
1 -25 31Mar2018
It should now be relatively easy to flag the values accordingly, and count them up per id/month.
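One possible way to do that counting, as a sketch rather than the only approach: it assumes the expand dataset above, assumes balance is never missing, and treats a zero balance as positive. The names daycounts, days_positive and days_negative are illustrative.

```sas
proc sql ;
  create table daycounts as
  select id,
         /* first day of the month, used as the month key */
         intnx('month', date, 0, 'b') as month format=monyy7.,
         /* each comparison evaluates to 1 or 0, so sum() counts days */
         sum(balance >= 0) as days_positive,
         sum(balance < 0)  as days_negative
  from expand
  group by id, calculated month ;
quit ;
```

For the sample data this should give 17 positive days and 14 negative days for id 1 in March 2018, which matches the 6 + 11 = 17 worked out in the question.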
I have a dataset in SAS and I want to convert one column into a string by product. I have attached an image of the input and the required output.
I need the column STRING in the output. Can anyone please help me?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
Here's my quick solution.
data temp;
length cat $20.;
/* build one comma-separated string per product */
do until (last.products);
set have;
by products notsorted;
cat=catx(',',cat,value);
end;
drop value dates;
run;
/* attach the string back to every original row */
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.products=temp.products;
quit;
In a summarized dataset, I have the status of an event at each hour after baseline in which it was recorded. I also have the last hour the event could have been recorded. I want to create a new dataset with one record for each hour from the first through the last hour, with the status for each record being the one from the last recorded status.
Here is an example dataset:
data new;
input hour status last_hour;
cards;
2 1 12
4 1 12
5 1 12
6 1 12
7 0 12
9 1 12
10 0 12
;
run;
In this case, the first recorded hour was the second, and the last recorded hour was the 10th. The last possible hour to record data was the 12th.
The final dataset should look like so:
0 . 12
1 . 12
2 1 12
3 1 12
4 1 12
5 1 12
6 1 12
7 0 12
8 0 12
9 1 12
10 0 12
11 0 12
12 0 12
I sort of have it working with this series of data steps, but I'm not sure if there's a cleaner way I'm not seeing.
data step1;
set new (keep=id hour);
by id;
do hour = 0 to last_hour;
output;
end;
run;
proc sort data=step1;
by id hour;
run;
proc sql;
create table step2 as
select distinct a.id, a.hour, b.status
from step1 as a
left join new as b
on a.id = b.id
and a.hour = b.hour
order by a.id, a.hour;
quit;
data step3;
set step2;
by id hour;
retain previous_status;
if first.id then do;
previous_status = .;
if status > . then previous_status = status;
end;
if not first.id then do;
if status = . and previous_status > . then status = previous_status;
if status > . then previous_status = status;
end;
run;
Seeing your code, it seems you left out of your question the fact that you also have id's. So this is a newer solution that deals with different id's. See further below for my first solution ignoring id's.
Since last_hour is always 12, I left it out of the have dataset. It will be added later on.
data have;
input id hour status;
cards;
1 2 1
1 4 1
1 5 1
1 6 1
1 7 0
1 9 1
1 10 0
2 2 1
2 4 1
2 5 1
2 6 1
2 7 0
2 9 1
2 10 0
;
Create an hours dataset containing just the numbers 0 through 12:
data hours;
do i = 0 to 12;
hour = i;
output;
end;
drop i;
run;
Create a temporary dataset that will have the right number of rows (13 rows for every id, with valid hour values where they exist in the have table).
proc sql;
create table tmp as
select distinct t1.id, t2.hour, 12 as last_hour
from have as t1
cross join
(select hour from hours) as t2
order by id, hour;
quit;
Then use merge and retain to fill in the missing status values where appropriate.
data want;
merge have
tmp;
by id hour;
retain status_previous;
/* start each id with nothing to carry forward */
if first.id then status_previous = .;
/* remember a recorded status, otherwise reuse the last one seen */
if status ne . then status_previous = status;
else status = status_previous;
drop status_previous;
run;
Previous solution (no id's)
If last_hour is always 12, then this should do it:
data have;
input hour status last_hour;
datalines;
2 1 12
4 1 12
5 1 12
6 1 12
7 0 12
9 1 12
10 0 12
;
data hours;
do i = 0 to 12;
hour = i;
last_hour = 12;
output;
end;
drop i;
run;
data want;
merge have
hours;
by hour;
retain status_previous;
if status ne . then status_previous = status;
else if status_previous ne . then status = status_previous;
drop status_previous;
run;