I have this parameter code specification
"Create a record with PARAMCD set to "DLQI7" for every record in SDTM.QS where QS.QSTESTCD equals "DLQI7". If there is a corresponding (by USUBJID, VISITNUM) record where QS.QSTESTCD equals "DLQI7SCO" and QS.QSSTRESN is not missing then set AVAL to the non-missing value of QS.QSSTRESN from this record. Else, if QS.QSORRES equals "00" on the original record where QS.QSTESTCD equals "DLQI7", then set AVAL to 0."
Here is a sample of my data with the relevant columns
USUBJID VISITNUM QSTESTCD QSSTRESN QSORRES
1001 2 DLQI7 0 0
1001 4 DLQI7 0 0
1001 5 DLQI7 0 0
1001 6 DLQI7 0 0
1001 2 DLQI7SCO 0 0
1001 4 DLQI7SCO 0 0
1001 5 DLQI7SCO 0 0
1001 6 DLQI7SCO 0 0
1002 2 DLQI7 0 0
1002 4 DLQI7 0 0
1002 5 DLQI7 0 0
1002 6 DLQI7 0 00
1002 2 DLQI7SCO 1 1
1002 4 DLQI7SCO 1 1
1002 5 DLQI7SCO 1 1
1002 6 DLQI7SCO
1002 2 DLQI7 0
1002 10 DLQI7 0
1002 2 DLQI7SCO 2 2
1002 10 DLQI7SCO 0 0
1003 2 DLQI7 0 0
1003 2 DLQI7SCO 1 1
1004 2 DLQI7 0 00
1004 4 DLQI7 0 00
1004 5 DLQI7 0 00
1004 2 DLQI7SCO
1004 4 DLQI7SCO
1004 5 DLQI7SCO
What's the best approach here? I have to remove either the DLQI7SCO or the DLQI7 records, depending on whether they are non-missing, because there has to be one record where QS.QSTESTCD equals "DLQI7" for each subject. Should I use proc transpose for the DLQI7SCO records and then maybe use a combination of nmiss and coalesce to produce AVAL?
Let's break this down step by step and translate each statement into a piece of code.
1. Create a record with PARAMCD set to "DLQI7" for every record in SDTM.QS where QS.QSTESTCD equals "DLQI7"
if(QSTESTCD = 'DLQI7') then PARAMCD = QSTESTCD;
2. If there is a corresponding (by USUBJID, VISITNUM) record where QS.QSTESTCD equals "DLQI7SCO" and QS.QSSTRESN is not missing then set AVAL to the non-missing value of QS.QSSTRESN from this record.
if(QSTESTCD = 'DLQI7SCO' AND NOT missing(QSSTRESN) ) then AVAL = QSSTRESN;
3. Else, if QS.QSORRES equals "00" on the original record where QS.QSTESTCD equals "DLQI7", then set AVAL to 0.
else if(QSORRES = '00' AND QSTESTCD = 'DLQI7') then AVAL = 0;
The data looks like it is sorted in a particular order. We'll leave it in that order.
This yields the following program:
data want;
set have;
by usubjid qstestcd visitnum notsorted;
if(QSTESTCD = 'DLQI7') then PARAMCD = QSTESTCD;
if(QSTESTCD = 'DLQI7SCO' AND NOT missing(QSSTRESN) ) then AVAL = QSSTRESN;
else if(QSORRES = '00' AND QSTESTCD = 'DLQI7') then AVAL = 0;
run;
If you can give me some feedback on whether this is the right approach, that would be helpful, but this is my best direct translation of the instructions.
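One caveat with the direct translation: each IF is evaluated on a single row, so a DLQI7 row never sees the QSSTRESN of its DLQI7SCO partner. A minimal sketch of one way to bring the two rows together is a left join on USUBJID and VISITNUM. The dataset names qs and adqs and the three-visit sample are assumptions for illustration, and the sketch assumes at most one DLQI7SCO record per subject and visit (the sample in the question has repeated keys for 1002, which would multiply rows in the join).

```sas
/* Small stand-in for SDTM.QS (hypothetical name: qs); a period marks a missing value */
data qs;
  input usubjid $ visitnum qstestcd $ qsstresn qsorres $;
  datalines;
1003 2 DLQI7    0 0
1003 2 DLQI7SCO 1 1
1004 2 DLQI7    0 00
1004 2 DLQI7SCO . .
1004 4 DLQI7    0 0
;

proc sql;
  create table adqs as
  select a.usubjid,
         a.visitnum,
         'DLQI7' as paramcd length=8,
         case
           /* rule 2: a matching score record with a non-missing QSSTRESN wins */
           when b.qsstresn is not null then b.qsstresn
           /* rule 3: otherwise fall back to QSORRES = "00" on the DLQI7 record */
           when a.qsorres = '00' then 0
           /* neither rule applies: AVAL stays missing */
           else .
         end as aval
  from qs(where=(qstestcd = 'DLQI7')) as a
       left join
       qs(where=(qstestcd = 'DLQI7SCO' and qsstresn is not null)) as b
       on a.usubjid = b.usubjid and a.visitnum = b.visitnum
  order by a.usubjid, a.visitnum;
quit;
```

With these rows, 1003 visit 2 takes AVAL from the score record, 1004 visit 2 falls back to 0 because QSORRES is "00", and 1004 visit 4 is left missing because neither rule applies.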
I'm trying to create a data set that will show me the duplicate transactions. The trouble I'm running into is when there are multiple orders on one order_id. The records that get assigned a 2 are the ones I would consider duplicate orders.
data have;
input acct_id order_id;
datalines;
1 121
1 122
2 123
2 124
3 125
3 125
3 125
3 126
3 126
3 126
;
data want;
set have;
by acct_id order_id;
if first.acct_id then order_count = 1;
else order_count =2;
run;
My desired output is below.
acct_id | order_id | order_count
1 121 1
1 122 2
2 123 1
2 124 2
3 125 1
3 125 1
3 125 1
3 126 2
3 126 2
3 126 2
What I have coded out already I feel like is close but I can't get it figured out.
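data want;
set have;
by acct_id order_id notsorted;
/* restart the counter on the first row of each account */
if first.acct_id then order_count=0;
/* sum statement: order_count is retained and goes up once per new order_id */
if first.order_id then order_count+1;
put acct_id order_id order_count;
run;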
acct_id order_id order_count
1 121 1
1 122 2
2 123 1
2 124 2
3 125 1
3 125 1
3 125 1
3 126 2
3 126 2
3 126 2
I need to do this:
table 1:
ID Cod.
1 20
2 102
4 30
7 10
9 201
10 305
table 2:
ID Cod.
1 20
2 50
3 15
4 30
5 25
7 10
10 300
Now, I got a table like this with an outer join:
ID Cod. ID1 Cod1.
1 20 1 20
2 50 . .
. . 2 102
3 15 . .
4 30 4 30
5 25 . .
7 10 7 10
. . 9 201
10 300 . .
. . 10 305
Now I want to add flags that tell me whether the ID and the code have common values, so:
ID Cod. ID1 Cod1. Flag_ID Flag_cod
1 20 1 20 0 0
2 50 . . 0 1
. . 2 102 0 1
3 15 . . 1 1
4 30 4 30 0 0
5 25 . . 1 1
7 10 7 10 0 0
. . 9 201 1 1
10 300 . . 0 1
. . 10 305 0 1
I would like to know how I can get Flag_ID, specifically to cover the cases of ID = 2 or ID = 10.
Thank you
You can group by a coalescence of id in order to count and compare details.
Example
data table1;
input id code @@; datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;
data table2;
input id code @@; datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;
proc sql;
create table got as
select
table2.id, table2.code
, table1.id as id1, table1.code as code1
, case
when count(table1.id) = 1 and count(table2.id) = 1 then 0 else 1
end as flag_id
, case
when table1.code - table2.code ne 0 then 1 else 0
end as flag_code
from
table1
full join
table2
on
table2.id=table1.id and table2.code=table1.code
group by
coalesce(table2.id,table1.id)
;
quit;
You might also want to look into PROC COMPARE with a BY statement.
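As a rough sketch of the PROC COMPARE route, here is a variant that matches rows with an ID statement instead of BY, since each id appears once per table. It reuses the two small tables from the example; LISTALL asks the report to list the ids found in only one of the tables.

```sas
data table1;
  input id code @@;
  datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;

data table2;
  input id code @@;
  datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;

/* Both tables are already in ascending id order, which the ID statement expects */
proc compare base=table2 compare=table1 listall;
  id id;      /* match observations on id */
  var code;   /* compare only the code values */
run;
```

The report shows which ids are missing from either side and which matched ids carry different codes, which is the same information the two flags encode.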
I'm using PROC SQL, version 9.4. There are no window functions to use.
The data contain a client id, a time period (month), an amount, and the corresponding class.
client_id data_period amount class
1 200801 30000 2
2 200801 17000 1
3 200801 9000 1
1 200802 30000 2
2 200802 55555 2
3 200802 11000 2
Threshold amount = 20 000.
amount > 20k gives class = 2, amount <= 20k makes class = 1
client_id = 1, amount and class are the same for 200801 and 200802.
client_id = 2, amount gets higher from 17k to 55.5k, class change is correct, from 1 to 2.
client_id =3, amount changed within the same class 1 (<20K), but class changed incorrectly.
Desired result is
client_id oldDate newDate AmtOld AmtNew ClassOld ClassNew Good Bad
2 200801 200802 17000 55555 1 2 1 0
3 200801 200802 9000 11000 1 1 0 1
I tried to apply a self join to get all the differences between data periods, but there are too many rows in the output. The data below is not from the example above; these are real numbers.
client_id oldDate newDate AmtOld AmtNew ClassOld ClassNew
A001687463 200808 200802 -5613 1690386 I03 I04
A001687463 200807 200802 -5613 1690386 I03 I04
A001687463 200806 200802 -5613 1690386 I03 I04
A001687463 200805 200802 -5613 1690386 I03 I04
PROC SQL;
CREATE TABLE WORK.'Q'n AS
SELECT distinct
t1.client_id, t1.data_period as oldDate, t2.data_period as newDate, t1.amount as expAmtOld, t2.amount as expAmtNew, t1.class as classOld, t2.class as classNew
FROM WORK.'E'n t1, WORK.'E'n t2
where
t1.client_id = t2.client_id and
t1.amount <> t2.amount
order by t1.client_id;
QUIT;
Do not attempt to do sequential processing using SQL. It is not built for that.
It should be easy to do in a data step. For example let's convert your printout into an actual SAS dataset so we have something to code with.
data have ;
input client_id data_period amount class ;
cards;
1 200801 30000 2
2 200801 17000 1
3 200801 9000 1
1 200802 30000 2
2 200802 55555 2
3 200802 11000 2
;
And let's sort it by client and period.
proc sort data=have ;
by client_id data_period ;
run;
Now just set the data and use the LAG() function to get the previous values.
Not sure what your definitions of GOOD and BAD were, so I just created new class variables based on your rule of 20K.
data want ;
set have ;
by client_id;
old_period = lag(data_period);
old_class = lag(class);
newclass = 1 + (amount > 20000) ;
old_newclass = lag(newclass);
if first.client_id then call missing(of old_:);
bad = (class ne newclass) or (old_newclass ne old_class) ;
run;
So here are the results.
client_ data_ old_ old_ old_
id period amount class period class newclass newclass bad
1 200801 30000 2 . . 2 . 0
1 200802 30000 2 200801 2 2 2 0
2 200801 17000 1 . . 1 . 0
2 200802 55555 2 200801 1 2 1 0
3 200801 9000 1 . . 1 . 0
3 200802 11000 2 200801 1 1 1 1
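If the goal is the layout from the question (one row per change, with old and new values side by side), a minimal sketch along the same lines might look like this. The dataset name changes and the definition of Good (the new class agrees with the 20,000 rule) are assumptions, since the question does not spell them out. Note that ClassNew here carries the class as recorded, whereas the desired output in the question shows 1 for client 3.

```sas
data have;
  input client_id data_period amount class;
  datalines;
1 200801 30000 2
2 200801 17000 1
3 200801 9000 1
1 200802 30000 2
2 200802 55555 2
3 200802 11000 2
;

proc sort data=have;
  by client_id data_period;
run;

data changes;
  set have;
  by client_id;
  /* LAG is called on every row so its queue stays in step with the data */
  oldDate  = lag(data_period);
  AmtOld   = lag(amount);
  ClassOld = lag(class);
  /* the first row of a client has no earlier period to compare with */
  if first.client_id then delete;
  /* keep only periods where the amount actually changed */
  if amount = AmtOld then delete;
  /* assumption: a row is Good when its class agrees with the 20,000 rule */
  Good = (class = 1 + (amount > 20000));
  Bad  = not Good;
  rename data_period=newDate amount=AmtNew class=ClassNew;
run;
```

On the sample data this leaves two rows: client 2 with Good=1 and client 3 with Bad=1. Client 1 drops out because its amount did not change.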
Here is the data I have. I use proc tabulate to present it the way it is presented in Excel and to make the visualization easier. The goal is for groups strictly below the diagonal (I know it's a rectangle; I mean the (1,1), (2,2), ..., (7,7) "diagonal") to roll up the column until the group hits the diagonal or reaches a size of at least 75.
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
I've already used if/thens to regroup certain data values, but I need a general way to do it for other sets.
Thanks in advance
desired results
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 104 90 87 45 12 41 88
4 0 94 100 110 22 141 88
5 0 0 0 0 0 0 2
6 0 0 0 0 0 0 6
7 0 0 0 0 0 0 13
8 0 0 0 0 0 0 0
Mock up some categorical data for some patients who have to be counted
data mock;
do patient_id = 1 to 2500;
month = ceil(7*ranuni(123));
age = ceil(8*ranuni(123));
output;
end;
stop;
run;
Create a tabulation of counts (N) similar to the one shown in the question:
options missing='0';
proc tabulate data=mock;
class month age;
table age,month*n=''/nocellmerge;
run;
For each month get the sub-diagonal patient count
proc sql;
/* create table subdiagonal_column as */
select month, count(*) as subdiag_col_freq
from mock
where age > month
group by month;
For each row get the pre-diagonal patient count
/* create table prediagonal_row as */
select age, count(*) as prediag_row_freq
from mock
where age > month
group by age;
quit;
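The counts above only locate what sits below the diagonal. The roll-up itself could be sketched with a two-dimensional temporary array, walking each column from the bottom row up to the diagonal. This is one reading of the rule, checked by hand against the desired table in the question; the dataset names counts and rolled, the variables m1-m7, and the fixed 8 by 7 shape are assumptions.

```sas
/* The tabulated counts from the question, one row per age */
data counts;
  input age m1-m7;
  datalines;
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
;

data rolled;
  array c[8,7] _temporary_;   /* c[age, month] holds the whole table */
  array m[7] m1-m7;
  /* load every row into the array before changing anything */
  do until (done);
    set counts end=done;
    do j = 1 to 7;
      c[age, j] = m[j];
    end;
  end;
  do j = 1 to 7;
    carry = 0;
    /* walk up column j from the last age to the row just below the diagonal */
    do i = 8 to j + 1 by -1;
      carry = carry + c[i, j];
      if carry >= 75 then do;   /* group is large enough: keep it in this row */
        c[i, j] = carry;
        carry = 0;
      end;
      else c[i, j] = 0;         /* too small: push the count further up */
    end;
    c[j, j] = c[j, j] + carry;  /* whatever is left lands on the diagonal */
  end;
  /* write the adjusted table back out, one row per age */
  do age = 1 to 8;
    do j = 1 to 7;
      m[j] = c[age, j];
    end;
    output;
  end;
  keep age m1-m7;
  stop;
run;
```

For month 1 this moves the 24 at age 4 up into age 3 (80 + 24 = 104), and for month 7 it moves the 11 at age 8 onto the diagonal (2 + 11 = 13), as in the desired results.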
Other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-monotonic categorical values you will need to create surrogate variables that are +1 monotonic. For example:
data mock;
do item_id = 1 to 2500;
pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
output;
end;
run;
proc tabulate data=mock;
class pet place;
table pet,place*n=''/nocellmerge;
run;
proc sql;
create table unq_pets as select distinct pet from mock;
create table unq_places as select distinct place from mock;
data pets;
set unq_pets;
pet_num = _n_;
run;
data places;
set unq_places;
place_num = _n_;
run;
proc sql;
select distinct place_num, mock.place, count(*) as subdiag_col_freq
from mock
join pets on pets.pet = mock.pet
join places on places.place = mock.place
where pet_num > place_num
group by place_num
order by place_num
;
quit;
For example, if I have a collection of individual records, organized into households, and I'd like to assign coresidentwithdaughter=1 to parents who have a daughter and coresidentwithson=1 to parents who have a son, how would I do that?
How would I code coresidentwithdaughter and coresidentwithson for the following result, given the variables of household number and relation
sample data:
household 703 703 703 703 703 703
sex 1 2 2 2 1 1
age 43 41 17 16 13 12
relation head spouse child child child child
coresidentwithdaughter 1 1 0 0 0 0
coresidentwithson 1 1 0 0 0 0
household 704 704 704
sex 2 2 1
age 29 20 2
relation head sister child
coresidentwithdaughter 0 0 0
coresidentwithson 1 0 0
This assumes that 'child' is the only relation deemed a son or daughter, and that 'head' and 'spouse' are the only values allowed to be parents (not sure what you intended to do with 'sister'). Under that assumption, the following will do the job using PROC SQL.
proc sql;
select t.*,
case when t.relation in ('head','spouse') and x.son then 1 else 0 end as coresidentwithson,
case when t.relation in ('head','spouse') and x.daughter then 1 else 0 end as coresidentwithdaughter
/* "table" stands in for the name of your dataset */
from table t
inner join
(
/* one row per household: number of sons and daughters among the children */
select
household,
sum(case when sex=1 then 1 else 0 end) as son,
sum(case when sex=2 then 1 else 0 end) as daughter
from table
where relation = 'child'
group by 1
) x
on t.household=x.household;
quit;
This is an example of a self-join, which is required when performing a query that refers back to the same group. This will also be possible using the data-step, but I don't have that solution.
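For what it's worth, one data-step route is a double DOW loop: read each household once to see whether it contains a son or a daughter, then read it a second time to write the flags. This is a sketch under the same assumptions as the SQL (sex 1 is male, sex 2 is female, only 'child' rows count as sons or daughters, only 'head' and 'spouse' count as parents). The dataset names have and want and the helper variables are made up for illustration.

```sas
data have;
  input household sex age relation $;
  datalines;
703 1 43 head
703 2 41 spouse
703 2 17 child
703 2 16 child
703 1 13 child
703 1 12 child
704 2 29 head
704 2 20 sister
704 1 2 child
;

proc sort data=have;
  by household;
run;

data want;
  /* first pass over the household: note whether any child is a son or a daughter */
  do until (last.household);
    set have;
    by household;
    if relation = 'child' then do;
      if sex = 1 then has_son = 1;
      else if sex = 2 then has_daughter = 1;
    end;
  end;
  /* second pass over the same household: flag the parents and write every row */
  do until (last.household);
    set have;
    by household;
    is_parent = relation in ('head', 'spouse');
    coresidentwithson      = (is_parent and has_son = 1);
    coresidentwithdaughter = (is_parent and has_daughter = 1);
    output;
  end;
  drop has_son has_daughter is_parent;
run;
```

Because has_son and has_daughter are not retained, they start out missing for each household, so no explicit reset is needed. Unlike the inner join, this version also keeps households that have no child rows, with both flags set to 0.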