Identifying overlap medication use - sas

Also asked on SAS Communities; I haven't gotten a correct response yet.
https://communities.sas.com/t5/SAS-Programming/Identifying-overlap-medication-use/m-p/628115#M185541
I have a problem similar to the one in this thread:
https://communities.sas.com/t5/SAS-Programming/Concomitant-drug-medication-use/m-p/339879#M77587
However, in my case the same drug can also overlap with itself.
E.g.:
+----+------+-----------+-----------+-----------+
| ID | DRUG | START_DT | DAYS_SUPP | END_DT |
+----+------+-----------+-----------+-----------+
| 1 | A | 2/17/2010 | 30 | 3/19/2010 |
| 1 | A | 3/17/2010 | 30 | 4/16/2010 |
| 1 | A | 4/12/2010 | 30 | 5/12/2010 |
| 1 | A | 8/20/2010 | 30 | 9/19/2010 |
| 1 | B | 5/6/2009 | 30 | 6/5/2009 |
+----+------+-----------+-----------+-----------+
Here the three A prescriptions overlap.
Using the code in the link gives me combinations like A-A-B,
which I don't want.
Instead, I want to account for the overlapping days for drug A by shifting the second prescription to run from 3/20/2010 to 4/19/2010, and similarly for the third A prescription.
The code I have tried:
data have2;
set have_sorted1;
format NEW_START_DT NEW_END_DT _lagEND_DT date9.;
_lagID = lag(patient_ID);
_lagDRUG = lag(drg_cls);
_lagEND_DT = lag(rx_ed_dt);
if patient_ID = _lagID and drg_cls= _lagDRUG and rx_st_dt <= _lagEND_DT then flag=1;
else flag = 0;
retain NEW_START_DT NEW_END_DT;
if flag=0 then do;
NEW_START_DT = rx_st_dt;
NEW_END_DT = rx_ed_dt;
end;
else do;
New_start_dt = NEW_End_DT + 1;
NEW_END_DT = new_start_dt + DAY_SUPP ;
end;
/* drop flag _:;*/
run;
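But even then I get an incorrect result:
+----+------+------------+----------+-----------+-----------+-----------+
| id | Drug | drug_start | day_supp | drug_end | New_start | New_end |
+----+------+------------+----------+-----------+-----------+-----------+
| 15 | A | 6-Sep-15 | 30 | 5-Oct-15 | 6-Sep-15 | 5-Oct-15 |
| 15 | A | 24-Sep-15 | 90 | 22-Dec-15 | 6-Oct-15 | 4-Jan-16 |
| 15 | A | 6-Dec-15 | 90 | 4-Mar-16 | 5-Jan-16 | 4-Apr-16 |
| 15 | A | 26-Feb-16 | 90 | 25-May-16 | 5-Apr-16 | 4-Jul-16 |
| 15 | A | 29-May-16 | 90 | 26-Aug-16 | 29-May-16 | 26-Aug-16 |
| 15 | A | 7-Dec-16 | 90 | 6-Mar-17 | 7-Dec-16 | 6-Mar-17 |
| 15 | A | 17-Feb-17 | 90 | 17-May-17 | 7-Mar-17 | 5-Jun-17 |
+----+------+------------+----------+-----------+-----------+-----------+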

It might be easier to track the 'flag' state implicitly in a shift variable that tracks how many days to shift forward.
Example:
Shift is always applied, but will be zero when no overlap occurs. The prior end, after computation, is tracked in a retained variable. The code does not need to rely on LAG.
data have;
infile cards firstobs=3 dlm='|';
input ID DRUG: $ START_DT: mmddyy10. DAYS_SUPP END_DT: mmddyy10.;
format start_dt end_dt mmddyy10.;
datalines;
| ID | DRUG | START_DT | DAYS_SUPP | END_DT |
+----+------+-----------+-----------+-----------+
| 1 | A | 2/17/2010 | 30 | 3/19/2010 |
| 1 | A | 3/17/2010 | 30 | 4/16/2010 |
| 1 | A | 4/12/2010 | 30 | 5/12/2010 |
| 1 | A | 8/20/2010 | 30 | 9/19/2010 |
| 1 | B | 5/6/2009 | 30 | 6/5/2009 |
;
data want;
set have;
by id drug;
retain shift prior_shifted_end;
select;
when (first.drug) shift = 0;
/* >= so that a start on the prior end date itself also counts as an overlap */
when (prior_shifted_end >= start_dt) shift = prior_shifted_end - start_dt + 1;
otherwise shift = 0;
end;
original_start_dt = start_dt;
original_end_dt = end_dt;
start_dt + shift;
end_dt + shift;
prior_shifted_end = end_dt;
format prior: original: mmddyy10.;
run;
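As a rough check, the same shift logic can be run against the second example from the question (ID 15). This is only a sketch: the dataset names rx and rx_shifted and the variable prior_new_end are made up for illustration, while the other variable names are taken from the question's code. The point it tries to show is that each start date is compared with the previous shifted end rather than with the previous original end.

```sas
/* Sketch only: RX, RX_SHIFTED and PRIOR_NEW_END are illustrative names. */
data rx;
  input patient_ID drg_cls $ rx_st_dt :date9. DAY_SUPP rx_ed_dt :date9.;
  format rx_st_dt rx_ed_dt date9.;
  datalines;
15 A 06SEP2015 30 05OCT2015
15 A 24SEP2015 90 22DEC2015
15 A 06DEC2015 90 04MAR2016
15 A 26FEB2016 90 25MAY2016
15 A 29MAY2016 90 26AUG2016
15 A 07DEC2016 90 06MAR2017
15 A 17FEB2017 90 17MAY2017
;

/* BY-group processing needs the rows ordered within patient and drug */
proc sort data=rx;
  by patient_ID drg_cls rx_st_dt;
run;

data rx_shifted;
  set rx;
  by patient_ID drg_cls;
  retain prior_new_end;
  /* Compare against the previous SHIFTED end, not the original one */
  if first.drg_cls or rx_st_dt > prior_new_end then shift = 0;
  else shift = prior_new_end - rx_st_dt + 1;
  NEW_START_DT = rx_st_dt + shift;
  NEW_END_DT   = rx_ed_dt + shift;
  prior_new_end = NEW_END_DT;
  format NEW_START_DT NEW_END_DT prior_new_end date9.;
run;
```

Working it through by hand, the fifth prescription (original start 29MAY2016) should move to start on 02JUL2016, because the fourth prescription's shifted end is 01JUL2016. In the question's output that row was left unshifted because the flag was computed from the original end date of the previous row.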


How to extract all values that contain part of a particular number and then delete them?

How do you extract all values containing part of a particular number and then delete them?
I have data where the IDs have different lengths, and I want to extract all the IDs that contain a particular number. For example, if the ID has either "-00" or "02" or "-01" at the end, I want to pull those rows to see the hit rate, and then delete the suffix from the ID. Is there a more efficient way to write this code?
I tried to use the substring function to slice the ID at a fixed position, but some other IDs also have characters at the specified position.
Code:
Proc sql;
Create table work.data1 AS
SELECT Product, Amount_sold, Price_per_unit,
CASE WHEN Product Contains "Pen" and Length(ID) >= 9 Then SUBSTR(ID,1,9)
WHEN Product Contains "Book" and Length(ID) >= 11 Then SUBSTR(ID,1,11)
WHEN Product Contains "Folder" and Length(ID) >= 12 Then SUBSTR(ID,1,12)
...
END AS ID
FROM A;
Quit;
Have:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229-01 | Book | 20 | 5 |
| ABC134475472 02 | Folder | 29 | 7 |
| AB-1235674467-00 | Pencil | 26 | 1 |
| 69598346-02 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Desired final result:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229 | Book | 20 | 5 |
| ABC134475472 | Folder | 29 | 7 |
| AB-1235674467 | Pencil | 26 | 1 |
| 69598346 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Test whether the string has any embedded spaces or hyphens and whether the last word, when delimited by space or hyphen, is 00, 01, or 02. If so, chop off the last three characters.
data have;
infile cards dsd dlm='|' truncover ;
input id :$20. product :$20. amount_sold price_per_unit;
cards;
123456789 | Pen | 30 | 2 |
63495837229-01 | Book | 20 | 5 |
ABC134475472 02 | Folder | 29 | 7 |
AB-1235674467-00 | Pencil | 26 | 1 |
69598346-02 | Correction pen | 15 | 1.50 |
6970457688 | Highlighter | 15 | 2 |
584028467 | Color pencil | 15 | 10 |
;
data want;
set have ;
if indexc(trim(id),'- ') and scan(id,-1,'- ') in ('00' '01' '02') then
id = substrn(id,1,length(id)-3);
run;
Result
amount_ price_
Obs id product sold per_unit
1 123456789 Pen 30 2.0
2 63495837229 Book 20 5.0
3 ABC134475472 Folder 29 7.0
4 AB-1235674467 Pencil 26 1.0
5 69598346 Correction pen 15 1.5
6 6970457688 Highlighter 15 2.0
7 584028467 Color pencil 15 10.0
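Since the question also asks about the hit rate, one possible variation is to flag the matching rows before stripping the suffix. The sketch below swaps in Perl regular expression functions (PRXMATCH and PRXCHANGE) instead of INDEXC/SCAN; the variable had_suffix is a made-up name, and the pattern assumes the suffix is always a hyphen or space followed by 00, 01, or 02 at the very end of the ID.

```sas
/* Sketch only: HAD_SUFFIX is an illustrative name, not from the question. */
data have;
  infile cards dsd dlm='|' truncover;
  input id :$20. product :$20.;
  cards;
123456789|Pen
63495837229-01|Book
ABC134475472 02|Folder
AB-1235674467-00|Pencil
69598346-02|Correction pen
6970457688|Highlighter
584028467|Color pencil
;

data want;
  set have;
  /* 1 when the ID ends in a hyphen or space followed by 00, 01 or 02 */
  had_suffix = prxmatch('/[- ](00|01|02)\s*$/', id) > 0;
  /* Remove that suffix; \s* absorbs the trailing blanks of the variable */
  id = prxchange('s/[- ](00|01|02)\s*$//', 1, id);
run;

/* The mean of the 0/1 flag is the share of IDs that carried a suffix */
proc means data=want n sum mean;
  var had_suffix;
run;
```

With the seven sample rows, four IDs carry a suffix, so the flag sums to 4.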
There may be other solutions, but all of them need some string functions. Here I used substr, reverse (to reverse the string) and indexc (to find the position of the first of the given characters in the string):
data have;
input text $20.;
datalines;
12345678
AB-142353 00
AU-234343-02
132453 02
221344-09
;
run;
data want (drop=reverted pos);
set have;
if countw(text) gt 1
then do;
reverted=strip(reverse(text));
pos=indexc(reverted,'- ')+1;
new=strip(reverse(substr(reverted,pos)));
end;
else new=text;
run;

Grouping child items and displaying parent sum

I have the following table
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
I would like to group the table by group, insert the grouped sum into value, and then ungroup:
+-------+--------+
| item | value |
+-------+--------+
| 1 | 30 |
| a | 10 |
| b | 20 |
| 2 | 70 |
| b | 30 |
| c | 40 |
+-------+--------+
The purpose of the result is to interpret the first column as items a and b belonging to group 1 with sum 30 and items b and c belonging to group 2 with sum 70.
Such a data transformation can be indicative of a reporting requirement more than a useful data structure for downstream processing. Proc REPORT can create output in the form desired.
data have;
infile datalines;
input group $ item $ value @@;
datalines;
1 a 10 1 b 20 2 b 30 2 c 40
;
proc report data=have;
column group item value;
define group / order order=data noprint;
break before group / summarize;
compute item;
if missing(item) then item=group;
endcomp;
run;
I assume that both group and item are character variables.
data have;
infile datalines firstobs=4 dlm='|';
input group $ item $ value;
datalines;
+-------+--------+---------+
| group | item | value |
+-------+--------+---------+
| 1 | a | 10 |
| 1 | b | 20 |
| 2 | b | 30 |
| 2 | c | 40 |
+-------+--------+---------+
;
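data want (keep=group value);
/* First pass over the BY group: accumulate the group total in v */
do _N_=1 by 1 until (last.group);
set have;
by group;
v + value;
end;
/* Write the parent row: the group id with the group total */
value = v;output;v=0;
/* Second pass re-reads the same _N_ rows and writes one row per item */
do _N_=1 to _N_;
set have;
group = item;
output;
end;
run;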
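If the double DO loop feels hard to follow, another way to get the same layout is to compute the group totals separately and interleave them with the detail rows. This is a sketch under the same assumption that group and item are character; the names sums, is_sum and label are made up for illustration. It relies on the fact that when a SET statement with BY interleaves two datasets, the rows from the first dataset listed come first within each BY group.

```sas
/* Sketch only: SUMS, IS_SUM and LABEL are illustrative names. */
data have;
  input group $ item $ value;
  datalines;
1 a 10
1 b 20
2 b 30
2 c 40
;

/* One row per group holding the group total */
proc summary data=have nway;
  class group;
  var value;
  output out=sums(drop=_type_ _freq_) sum=value;
run;

/* Interleave: within each group the total row precedes the item rows */
data want(keep=label value);
  length label $8;
  set sums(in=is_sum) have;
  by group;
  if is_sum then label = group;
  else label = item;
run;
```

This needs have to be sorted by group, which the sample data already is.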

Sequences in SAS Tables

I'm looking to add a sequence column to my sas dataset, but according to ids and transaction dates. To illustrate, below is the table I'm referring to:
ID | TXN_DT |
01 | 01JAN2020 |
01 | 01JAN2020 |
01 | 02JAN2020 |
01 | 03JAN2020 |
02 | 01JAN2020 |
02 | 02JAN2020 |
02 | 02JAN2020 |
02 | 03JAN2020 |
02 | 03JAN2020 |
and I want to add a sequence like so:
ID | TXN_DT | SEQ |
01 | 01JAN2020 | 1 |
01 | 01JAN2020 | 1 |
01 | 02JAN2020 | 2 |
01 | 03JAN2020 | 3 |
02 | 01JAN2020 | 1 |
02 | 02JAN2020 | 2 |
02 | 02JAN2020 | 2 |
02 | 03JAN2020 | 3 |
02 | 03JAN2020 | 3 |
I'm trying to run the following code, but it doesn't copy the previous row's value; instead it seems to skip to the value from two rows above.
data want;
set have;
by id;
if first.id then seq=1;
else seq+1;
if txn_dt=lag(txn_dt) then seq = lag(seq);
keep id seq txn_dt;
run;
Any help? Thanks in advance!
Try
if first.id then seq=0;
seq + (first.id or txn_dt ne lag(txn_dt));
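A note on why the code in the question appears to look two rows up: LAG does not read the previous observation, it returns the value stored the last time that particular LAG call executed. Because lag(seq) in the question sits inside a THEN clause, it only runs on rows where the dates match. The toy step below (dataset and variable names are made up for illustration) is a small sketch of that effect.

```sas
/* Sketch only: DEMO, LAG_ALWAYS and LAG_CONDITIONAL are illustrative names. */
data demo;
  input x;
  /* Executed on every row, so it really is the previous row's value */
  lag_always = lag(x);
  /* Executed only on even rows, so its queue only sees even rows */
  if mod(_n_, 2) = 0 then lag_conditional = lag(x);
  datalines;
1
2
3
4
;

proc print data=demo;
run;
```

On the fourth row lag_always is 3, but lag_conditional is 2, the value of x from the last row on which that LAG call ran.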
Try using RETAIN together with the FIRST. variables.
data want(drop=txn_dt_group);
set have;
by id txn_dt;
retain txn_dt_group seq;
if first.id then do;
txn_dt_group=txn_dt;
seq=1;
end;
if txn_dt ne txn_dt_group then do;
seq=seq+1;
txn_dt_group=txn_dt;
end;
run;
Output:
+-----------+----+-----+
| txn_dt | ID | seq |
+-----------+----+-----+
| 01JAN2020 | 1 | 1 |
| 01JAN2020 | 1 | 1 |
| 02JAN2020 | 1 | 2 |
| 03JAN2020 | 1 | 3 |
| 01JAN2020 | 2 | 1 |
| 02JAN2020 | 2 | 2 |
| 02JAN2020 | 2 | 2 |
| 03JAN2020 | 2 | 3 |
| 03JAN2020 | 2 | 3 |
+-----------+----+-----+
data want;
set have;
by id txn_dt;
if first.id then seq=1;
else if first.txn_dt then seq+1;
run;
I think that should do it.
For completeness, here is a hash solution that does not depend on the order of your data.
data have;
infile datalines dlm='|';
input ID $ TXN_DT :date9.;
format TXN_DT date9.;
datalines;
01|01JAN2020
01|01JAN2020
01|02JAN2020
01|03JAN2020
02|01JAN2020
02|02JAN2020
02|02JAN2020
02|03JAN2020
02|03JAN2020
;
data want(drop=rc);
if _N_ = 1 then do;
dcl hash h1 ();
h1.definekey ('ID', 'TXN_DT');
h1.definedata ('SEQ');
h1.definedone ();
dcl hash h2 ();
h2.definekey ('ID');
h2.definedata ('SEQ');
h2.definedone ();
do until (lr);
set have end=lr;
if h2.find() = 0 then do;
if h1.check() ne 0 then seq + 1;
end;
else seq = 1;
h1.ref();
h2.replace();
end;
end;
set have;
rc = h1.find();
run;

adding rows given a certain condition

I have a dataset with 3 columns: ID, Date and Amount. It is ordered by ID and Date. All I want to do is add a row after the latest occurrence of every ID, with the same ID, Date = Date + 1 month and Amount = 0.
As an illustration, I want to go from this:
id | Date |amount |
A | 01JAN| 1 |
A | 01FEB| 1 |
B | 01FEB| 0 |
B | 01MAR| 1 |
to this:
id | Date |amount |
A | 01JAN| 1 |
A | 01FEB| 1 |
A | 01MAR| 0 | <- ADD THIS ROW
B | 01FEB| 0 |
B | 01MAR| 1 |
B | 01APR| 0 |<- ADD THIS ROW
I know I should use intnx, but beyond that I don't really know what to do. I appreciate any input.
Assuming that the DATE variable has actual date values in it, you just need to output twice on the last observation in each group.
data want;
set have;
by id;
output;
if last.id then do;
date=intnx('month',date,1,'b');
amount=0;
output;
end;
run;
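The fourth argument of INTNX controls where in the target month the result lands. The answer uses 'b' (beginning), which suits dates that already sit on the first of the month. If the dates can fall mid-month, 's' (same day) may be what is wanted instead. A small sketch with a made-up date:

```sas
/* Sketch only: the date 15JAN2020 is an arbitrary illustration. */
data _null_;
  d = '15JAN2020'd;
  next_begin = intnx('month', d, 1, 'b');  /* first day of next month */
  next_same  = intnx('month', d, 1, 's');  /* same day of next month  */
  put next_begin= date9. next_same= date9.;
run;
```

This writes next_begin=01FEB2020 and next_same=15FEB2020 to the log.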

Chi-Sq test result difference when done Manually and by SAS

I am trying to perform a chi-square test on my data using SAS University Edition.
Here is the structure of my data:
+----------+------------+------------------+-------------------+
| study_id | Control_id | study_mortality | control_mortality |
+----------+------------+------------------|-------------------+
| 1 | 50 | Alive | Alive |
| 1 | 52 | Alive | Alive |
| 2 | 65 | Dead | Dead |
| 2 | 70 | Dead | Alive |
+----------+------------+------------------+-------------------+
I am getting different results when I run the test in SAS versus when I do it manually with an online calculator. I used the values from PROC FREQ to calculate the chi-square in the online calculator. Here are the frequency outputs and the chi-square test results. Can someone point out where the issue is?
proc freq data = mydata;
tables study_mortality control_mortality;
where type=1;
run;
+-----------------+-----------+
| study_mortality | Frequency |
+-----------------+-----------+
| Alive | 7614 |
| Dead | 324 |
+-----------------+-----------+
+-------------------+-----------+
| control_mortality | Frequency |
+-------------------+-----------+
| Alive | 6922 |
| Dead | 159 |
+-------------------+-----------+
proc freq data = mydata;
tables study_mortality*control_mortality/ CHISQ;
where type=1;
run;
+-----------------+-------------------+---------+-------+
| | Control_mortality | | |
+-----------------+-------------------+---------+-------+
| Study_mortality | Alive | Dead | Total |
| Alive | 5515 | 134 | 5649 |
| Dead | 249 | 5 | 254 |
| Total | 5764 | 139 | 5903 |
+-----------------+-------------------+---------+-------+
Statistic DF Value Prob
Chi-Square 1 0.1722 0.6782
Likelihood Ratio Chi-Square 1 0.1818 0.6699
Continuity Adj. Chi-Square 1 0.0414 0.8388
Mantel-Haenszel Chi-Square 1 0.1722 0.6782
Phi Coefficient -0.0054
Contingency Coefficient 0.0054
Cramer's V -0.0054
You have missing data. Look at the N's on those tables.
Study mortality has around 8,000 records and control mortality around 7,000, but when you cross them you only have 5,903 records. This means that records with a missing value in either variable are excluded from the crossed table. There should be a line in the output reporting the frequency missing; either SAS didn't print it or you pasted only part of the output. When I enter the crossed counts into an online calculator, the p-value matches your SAS output exactly.
data have;
infile cards;
input Study Control N;
cards;
1 1 5515
1 0 134
0 1 249
0 0 5
;
run;
proc freq data=have;
table study*control / chisq;
weight N;
run;
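For a hand check without an online calculator, the Pearson chi-square for a 2x2 table can be computed directly from the four crossed counts with the usual shortcut formula N(ad - bc)^2 divided by the product of the four marginal totals. The sketch below uses the counts from the cross-tabulation in the question; the variable names a, b, c, d are just the conventional cell labels.

```sas
/* Sketch only: a-d are the four cells of the crossed table in the question. */
data _null_;
  a = 5515;  /* study Alive, control Alive */
  b = 134;   /* study Alive, control Dead  */
  c = 249;   /* study Dead,  control Alive */
  d = 5;     /* study Dead,  control Dead  */
  n = a + b + c + d;
  chisq = n * (a*d - b*c)**2 / ((a+b) * (c+d) * (a+c) * (b+d));
  p = 1 - probchi(chisq, 1);  /* upper tail, 1 degree of freedom */
  put chisq= 8.4 p= 8.4;
run;
```

The statistic comes out at about 0.1722, in line with the Chi-Square row of the PROC FREQ output above. Feeding the one-way totals (7614/324 and 6922/159) into a calculator instead describes a different table, which would explain a different result.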