How to compare values of a variable conditional on another variable - SAS

Hi everyone, in the dataset in figure 1 my goal is to say, for each usubjid patient, whether the avalc value reported for paramcd=WFNSECRF is equal to that reported for paramcd=WFNSIRT. For example, for patient ID-054-304-1101002 these two values are the same, while for patient ID-054-304-1107007 they differ. In particular, I have to create the variable discrepancy, in which I put 'No' if the values are equal and 'Yes' if they are different. How can I do it?
Thanks in advance for the answers

This is easily done by using proc sql with a simple case statement.
data have;
input USUBJID :$20. PARAMCD :$15. AVALC :$10.;
datalines;
ID-054-304-1101001 WFNSECRF GRADE_I
ID-054-304-1101002 WFNSECRF GRADE_I
ID-054-304-1101002 WFNSIRT GRADE_I
ID-054-304-1101003 WFNSECRF GRADE_I
ID-054-304-1101004 WFNSECRF GRADE_I
ID-054-304-1101004 WFNSECRF GRADE_II
ID-054-304-1101004 WFNSIRT GRADE_II
ID-054-304-1101005 WFNSECRF GRADE_I
ID-054-304-1101005 WFNSIRT GRADE_II
ID-054-304-1101006 WFNSECRF GRADE_I
ID-054-304-1101007 WFNSIRT GRADE_I
;
run;
If every AVALC value within the same USUBJID is the same, then count(distinct avalc) will equal 1, so the value of discrepancy should be No. In every other case the value will be Yes, including when AVALC is missing altogether.
proc sql;
create table want as
select *, case
when count(distinct avalc) = 1 then 'No'
else 'Yes'
end as discrepancy
from have
group by usubjid
;
quit;
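The query above treats a subject with only one record as 'No'. A minimal sketch of a variant that compares the two PARAMCD values explicitly, assuming at most one row per PARAMCD per subject (the table name want_pairs and the columns ecrf_value and irt_value are made up for illustration):

```sas
proc sql;
  create table want_pairs as
  select usubjid,
         /* pick out the value for each parameter; blank if absent */
         max(case when paramcd = 'WFNSECRF' then avalc else ' ' end) as ecrf_value,
         max(case when paramcd = 'WFNSIRT'  then avalc else ' ' end) as irt_value,
         case
           when calculated ecrf_value = calculated irt_value
                and not missing(calculated ecrf_value) then 'No'
           else 'Yes'
         end as discrepancy
  from have
  group by usubjid;
quit;
```

Under this sketch a subject lacking either record is flagged 'Yes'; whether that is the right rule depends on how such subjects should be reported.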

Related

Alternative to LAG function

My dataset is structured like this:
ID, Order Date, Delivery Date, Flag
1, 10/03/12, 15/03/12,
1, 17/03/12, 20/03/12, 1
I want to be able to calculate the date difference between the first occurring delivery date and subsequent order dates. Eventual aim is group these records with an identifier.
Have tried the monotonic() with monotonic()+1 for a self join - but the problem with this is that each ID can have multiple different numbers of rows needing to be grouped together. Am using SAS Enterprise Guide 7 - unfortunately LAG is not available.
An example of what I'm looking to achieve is:
ID, Order Date, Deliv Date, Order Date_1, Deliv Date_1, DateDIFF(Deliv Date - Order Date_1)
1, 10/03/12, 15/03/12, 17/03/12, 20/03/12, 2
Any ideas?
You need to sort the data with proc sort by ID and descending Delivery_Date, then retain the value from the previous record within each ID in order to calculate the difference.
Code:
data have;
infile datalines dlm=',' dsd;
informat id 11. Order_Date ddmmyy8. Delivery_Date ddmmyy8. flag 11.;
format Order_Date ddmmyy8. Delivery_Date ddmmyy8.;
input ID Order_Date Delivery_Date Flag;
datalines;
1, 10/03/12, 15/03/12,.
1, 17/03/12, 20/03/12, 1
2, 10/03/12, 10/03/12,.
2, 17/03/12, 20/03/12, 1
2, 10/03/12, 27/03/12,1
2, 17/03/12, 23/03/12, 1
;
run;
proc sort data=have;
by id descending Delivery_Date ;
run;
data want;
set have;
by id;
retain nxt_date;
if first.id=1 then do;
nxt_date = Delivery_Date;
diff=0;
end;
else do;
prv_date=nxt_date;
diff=nxt_date-Delivery_Date;
nxt_date = Delivery_Date;
end;
format prv_date ddmmyy8.;
drop nxt_date;
run;
Output:
id=1 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=0 prv_date=.
id=1 Order_Date=10/03/12 Delivery_Date=15/03/12 flag=. diff=5 prv_date=20/03/12
id=2 Order_Date=10/03/12 Delivery_Date=27/03/12 flag=1 diff=0 prv_date=.
id=2 Order_Date=17/03/12 Delivery_Date=23/03/12 flag=1 diff=4 prv_date=27/03/12
id=2 Order_Date=17/03/12 Delivery_Date=20/03/12 flag=1 diff=3 prv_date=23/03/12
id=2 Order_Date=10/03/12 Delivery_Date=10/03/12 flag=. diff=10 prv_date=20/03/12

All values for only most recent occurrence

I am trying to extract all the Time occurrences for only the recent visit. Can someone help me with the code please.
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var time;
run;
data want;
set temp1;
by name date;
if last.name;
format value $2000.;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
Very similar to the question yesterday from another user, you can use quite a few solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not in the group by, it's legal in SAS to put them in the select. SAS then automatically remerges (inner joins) the result of select name, max(date) from have group by name having date=max(date) with the original have dataset, returning multiple rows as needed, and writes a note about the remerge to the log. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
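As a rough sketch of that "build a table of maximum dates, then merge" approach, assuming the input dataset is called have with variables name, date and time (maxdates, have_sorted and recent are names chosen here for illustration):

```sas
/* one row per name holding that name's latest date */
proc sql;
  create table maxdates as
  select name, max(date) as date
  from have
  group by name
  order by name;
quit;

proc sort data=have out=have_sorted;
  by name date;
run;

/* keep only the rows whose name/date pair appears in maxdates */
data recent;
  merge have_sorted(in=in_have) maxdates(in=in_max);
  by name date;
  if in_have and in_max;
run;
```

Because maxdates is a separate table, it can be inspected on its own when the result looks wrong.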
The DoW loop also appeals here. This is basically the precise SAS data step implementation of the SQL above. First iterate over that name, figure out the max, then iterate again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course, here you can also collapse the rows more easily:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;

Select many columns and other non-continuous columns to find duplicate?

I have a dataset with many columns like this:
ID Indicator Name C1 C2 C3....C90
A 0001 Black 0 1 1.....0
B 0001 Blue 1 0 0.....1
B 0002 Blue 1 0 0.....1
Some of the IDs are duplicates because the indicator is different, but they're essentially the same record. To find duplicates, I want to select distinct ID, Name and then C1 through C90 to check because some claims who have the same Id and indicator have different C1...C90 values.
Is there a way to select c1...c90 either through proc sql or a sas data step? It seems the only way I can think of is to set the dataset and then drop the non essential columns, but in the actual dataset, it's not only Indicator but at least 15 other columns.
It would be nice if PROC SQL used the : variable name wildcard like other Procs do. When no other alternative is reasonable, I usually use a macro to select bulk columns. This might work for you:
%macro sel_C(n);
%do i=1 %to %eval(&n.-1);
C&i.,
%end;
C&n.
%mend sel_C;
proc sql;
select ID,
Indicator,
Name,
%sel_C(90)
from have_data;
quit;
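Another route that avoids the macro, sketched here on the assumption that the columns really are named C1 through C90: a dataset option on the table in the FROM clause accepts the usual variable-list shorthand, even though the SELECT list itself does not. The output table name distinct_rows is made up for illustration.

```sas
proc sql;
  create table distinct_rows as
  select distinct *
  from have_data(keep=ID Name C1-C90); /* keep= limits the columns before the query sees them */
quit;
```

Since only ID, Name and C1-C90 reach the query, select distinct * returns one row per unique combination of those columns.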
If I understand the question properly, the easiest way would be to concatenate the columns to one. RETAIN that value from row to row, and you can compare it across rows to see if it's the same or not.
data want;
set have;
by id indicator;
retain last_cols;
length last_cols cols $500;
cols = catx('|',of c1-c90);
if first.id then call missing(last_cols);
else do;
identical = (cols = last_cols); *or whatever check you need to perform;
end;
output;
last_cols = cols;
run;
There are a few different ways you can do this and it will be much easier if the actual column names are C1 - C90. If you're just looking to remove anything that you know is a duplicate you can use proc sort.
proc sort data=dups out=nodups nodupkey;
by ID Name C1-C90;
run;
The nodupkey option keeps only the first observation for each unique combination of the variables in the by statement and removes the rest.
Alternatively, if you want to know which records contain duplicates, you could use proc summary.
proc summary data=dups nway missing;
class ID Name C1-C90;
output out=onlydups(where=(_freq_ > 1));
run;
proc summary creates two new variables, _type_ and _freq_. If you specify _freq_ > 1 you will only output the duplicate records. Also, note that this will remove the Indicator variable.

First statement in SAS

I have a data in the given way below
ID typ date
1 A 2014jan01
1 B 2014mar01
1 B 2014apr01
1 A 2014jun01
I want to create a new variable with Count, wrt the typ and also date.
DESIRED OUTPUT
ID typ date count
1 A 2014jan01 1
1 B 2014mar01 1
1 B 2014apr01 2
1 A 2014jun01 1
i wrote this program
proc sort data=have; by ID date typ;run;
data want;
set have;
by ID date typ;
if first.typ then Count=1;
else
Count+1; run;
but it is not giving the desired result.
@Quentin has provided the correct answer (using the NOTSORTED option in the data step). You need to understand how the FIRST. variables work with the order of variables in the BY statement. Your order is ID DATE TYP, so FIRST.ID is set to 1 for the first record only, FIRST.DATE is set to 1 for all records as the date is different each time, which means that any subsequent variable (i.e. FIRST.TYP) is also set to 1 for all records. Below is the code you should be running (credit to @Quentin):
proc sort data=have; by ID date typ;run;
data want;
set have;
by ID typ date notsorted;
if first.typ then Count=1;
else Count+1;
run;

How to create a new variable in SAS by extracting part of the value of an existing numeric variable?

I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL (using the sample data from the Data Step code below):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
run;
An alternative would be to use proc sql with a join on substr(), just as explained above, if you are comfortable with SQL.
If your "subject_id" variable is numeric, then the substr function won't work as expected, because SAS will try to convert the number to a string and by default pads it with some spaces on the left.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.
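Put together, the numeric route might look like the following sketch. The datasets have1_num and have2_num and their 9-digit values are invented for illustration, and the if a and b filter (keep matches only) is an assumption about what the merge should return.

```sas
data have1_num;
  input subject_id other $;
  /* drop the last 3 digits, then keep the 3 digits that remain on the right */
  id = mod(floor(subject_id / 1000), 1000);
  datalines;
123001456 other1
123002456 other2
123003456 other3
;

data have2_num;
  input mom_subject_id misc $;
  id = mod(floor(mom_subject_id / 1000), 1000);
  datalines;
789001012 misc1
789003012 misc3
;

proc sort data=have1_num; by id; run;
proc sort data=have2_num; by id; run;

data want_num;
  merge have1_num(in=a) have2_num(in=b);
  by id;
  if a and b; /* keep only ids present in both datasets */
run;
```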