Using a sas lookup table when the column number changes - sas

I have two sas datasets,
Table 1 Table 2
col1 col2 col3 col4 col5 a b
. 1 2 3 4 1 1
1 5 8 6 1 1 4
2 5 9 7 1 4 3
3 6 9 7 1 2 1
4 6 9 7 2 2 2
where table 1 is a lookup table for values a and b in table 2, such that I can make a column c. In table 1 a is equivalent to col1 and b to row1 (i.e. the new column c in table 2 should read 5,1,7,5,9. How can I achieve this in sas. I was thinking of reading table 1 into a 2d array then get column c = array(a,b), but can't get it to work

Here's an IML solution, first, as I think this is really the 'best' solution for you - you're using a matrix, so use the matrix language. I'm not sure if there's a non-loop method - there may well be; if you want to find out, I would add the sas-iml tag to the question and see if Rick Wicklin happens by the question.
data table1;
input col1 col2 col3 col4 col5 ;
datalines;
. 1 2 3 4
1 5 8 6 1
2 5 9 7 1
3 6 9 7 1
4 6 9 7 2
;;;;
run;
data table2;
input a b;
datalines;
1 1
1 4
4 3
2 1
2 2
;;;;
run;
proc iml;
use table1;
read all var _ALL_ into table1[colname=varnames1];
use table2;
read all var _ALL_ into table2[colname=varnames2];
print table1;
print table2;
table3 = j(nrow(table2),3);
table3[,1:2] = table2;
do _i = 1 to nrow(table3);
table3[_i,3] = table1[table3[_i,1]+1,table3[_i,2]+1];
end;
print table3;
quit;

Here is the temporary array solution. It's not all that pretty. If speed is an issue you don't have to loop over the array to insert it, you can use direct memory access, but I don't want to do that unless speed is a huge issue (and if it is, you should use a better data structure first).
data table3;
set table2;
array _table1[4,4] _temporary_;
if _n_ = 1 then do;
do _i = 1 by 1 until (eof);
set table1(firstobs=2) nobs=_nrows end=eof;
array _cols col2-col5;
do _j = 1 to dim(_cols);
_table1[_i,_j] = _cols[_j];
end;
end;
end;
c = _table1[a,b];
keep a b c;
run;

Just use the POINT= option on a SET statement to pick the row. You can then use an ARRAY to pick the column.
data table1 ;
input col1-col4 ;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;
data table2 ;
input a b ;
cards;
1 1
1 4
4 3
2 1
2 2
;
data want ;
set table2 ;
p=a ;
set table1 point=p ;
array col col1-col4 ;
c=col(b);
drop col1-col4;
run;

Related

Generating Unique ID for same group

I have data set,
CustID Rating
1 A
1 A
1 B
2 A
2 B
2 C
2 D
3 X
3 X
3 Z
4 Y
4 Y
5 M
6 N
7 O
8 U
8 T
8 U
And expecting Output
CustID Rating ID
1 A 1
1 A 1
1 B 1
2 A 1
2 B 2
2 C 3
2 D 4
3 X 1
3 X 1
3 Z 2
4 Y 1
4 Y 1
5 M 1
6 N 1
7 O 1
8 U 1
8 T 2
8 U 1
In the solution below, I selected the distinct possible ratings into a macro variable to be used in an array statement. These distinct values are then searched in the ratings tolumn to return the number assigned at each successful find.
You can avoid the macro statement in this case by replacing the %sysfunc by 3 (the number of distinct ratings, if you know it before hand). But the %sysfunc statement helps resolve this in case you don't know.
data have;
input CustomerID Rating $;
cards;
1 A
1 A
1 B
2 A
2 A
3 A
3 A
3 B
3 C
;
run;
proc sql noprint;
select distinct quote(strip(rating)) into :list separated by ' '
from have
order by 1;
%put &list.;
quit;
If you know the number before hand:
data want;
set have;
array num(3) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
Otherwise:
%macro Y;
data want;
set have;
array num(%sysfunc(countw(&list., %str( )))) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
%mend;
%Y;
The output:
Obs CustomerID Rating id
1 1 A 1
2 1 A 1
3 1 B 2
4 2 A 1
5 2 A 1
6 3 A 1
7 3 A 1
8 3 B 2
9 3 C 3
Assuming data is sorted by customerid and rating (as in the original unedited question). Is the following what you want:
data want;
set have;
by customerid rating;
if first.customerid then
id = 0;
if first.rating then
id + 1;
run;

Compare column values

I have 5 columns and want to check which columns have exact values
num1 num2 num3 num4 num5
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
Here column 1(num1) and last(num5) have exact same values everywhere. How can I find it?
You could transpose and then look for duplicate rows instead.
data have ;
input num1-num5 ;
cards;
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
;
data _null_;
call symputx('nobs',nobs);
stop;
set have nobs=nobs;
run;
proc transpose data=have out=tran; var num1-num5; run;
proc sort data=tran; by col1-col&nobs; run;
data want;
set tran ;
by col1-col&nobs;
if not (first.col&nobs and last.col&nobs) ;
run;
proc print data=want;
run;
Results
Obs _NAME_ COL1 COL2 COL3 COL4
1 num1 1 2 2 4
2 num5 1 2 2 4

How to join multiple columns into one in sas

I have a time series SAS dataset and I want to transfer it to vertical dataset.
My data looks like..
ID A2009 A2010 A2011 A2012
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
data multcol;
infile datalines;
input ID A2009 A2010 A2011 A2012 A2013;
return;
datalines;
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
;
run;
proc print data=multcol noobs;
run;
I search the web only find someone's solution as following.Not worked.
But my dataset is too large, this method shut down my computer.
data cmbcol(keep=a orig_varname orig_obsnum);
set multcol;
array myvars _numeric_;
do i = 2 to dim(myvars);
orig_varname = vname(myvars(i));
orig_obsnum = _n_;
A = myvars(i);
output;
end;
run;
proc print data=cmbcol ;
title 'cmbcol';
run;
proc sort data=cmbcol;
by orig_varname a;
run;
proc print data=cmbcol noobs;
title 'cmbcol';
run;
And I want them to become like this.
ID t t+1
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
1 2 3
2 2 3
3 2 3
4 2 3
5 2 3
1 3 4
2 3 4
3 3 4
4 3 4
5 3 4
How can we do that?
Thanks in advance.
That is an unusual data structure for sure, but you could achieve this using the following macro (adjust to your needs).
options validvarname = any;
%macro transp;
%let i = 2009;
%do %while (&i <= 2011);
%let j = %eval(&i + 1);
data part_&i(rename = (A&i = t A&j = 't+1'n));
set multcol(keep = ID A&i A&j);
run;
%let i = %eval(&i + 1);
%end;
data combined;
set part_:;
run;
proc datasets nolist nodetails;
delete part_:;
quit;
%mend transp;
%transp

Transposing data in SAS

I have a dataset of laboratory results. Each row corresponds to a time point of a subject (for example: row 1 is subject #1 at his first visit, row 2 is subject #1 at his second visit,...). In each row, I have values of 5 tests (test1, test2, ....) and for each test, I have in addition to the result, two columns of reference values of the test (normal low and high levels). I wish to transpose the data, in a way that each row will be identical for subject+visit+test, with two columns, the numerical result and the status (normal or not). I failed transposing the data. I managed to get all tests in a long format, but I couldn't save the reference values. How should I do it ? My alternative is a set of if statements, it's going to be very long !
This question was also posted on communities.sas.com.
The two step process extracts data about PARAMCD (lab test code) and variable type (value and normal range limits) from the names. PARAMCD becomes a new row id variable and V L and H are used to create new variable names when the data are transposed again to the more or less (CDISC SDTM) format.
data A;
input ID Visit Group Test1 Test2 Test3 Test1_L Test1_H Test2_L Test2_H Test3_L Test3_H;
datalines;
1 1 0 5 3 6.7 1 10 2 7 3 9
1 2 0 5.5 3.8 8.7 1 10 2 7 3 6
1 3 0 4.5 2.8 5.7 1 10 3 7 3 6
2 1 1 5 3 6.7 1 10 2 7 3 9
2 2 1 5.5 3.8 8.7 1 10 2 7 3 9
2 3 1 4.5 2.8 5.7 1 10 2 7 3 9
;;;;
run;
proc print;
run;
proc transpose data=a out=b;
by id visit group;
run;
data b;
set b;
length paramcd $8 namecd $1;
call scan(_name_,1,p,l,'_');
paramcd = substrn(_name_,p,l);
namecd = coalesceC(substrn(_name_,p+l+1),'V');
drop p l _name_;
run;
proc sort data=b;
by id visit group paramcd;
run;
proc format;
value $namecd 'V'='Value' 'H'='High' 'L'='Low';
run;
proc transpose data=b out=c(drop=_name_);
by id visit group paramcd;
id namecd;
format namecd $namecd.;
var col1;
run;
data c;
set c;
length RangeFL $1;
if n(low,value) eq 2 and value lt low then RangeFL='L';
else if n(high,value) eq 2 and value gt high then RangeFL='H';
else RangeFL='N';
run;
proc print;
run;

SAS sort by original order

Say you have three separate data sets consisting of the same number of observations. Each observation has an ID letter, A-Z, followed by some numerical observation. For example:
Data set 1:
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
Data set 2:
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
I want to merge the data sets BY that first variable. The problem is, the first variable is NOT already sorted in alphabetical form, and I do not want to sort it in alphabetical form. I want to merge the data but keep the original order. For example, I would get:
Merged data:
B 3 8 1 9 4
B 0 3 3 1 8
C 4 1 9 3 1
C 3 1 9 4 0
A 4 4 5 4 9
A 4 1 2 0 0
Is there any way to do this?
You can create a variable that holds the order and then apply that the new dataset after its "merged". I believe this is an append rather than merge though. I've used a format, though you could use a sql or data set merge as well.
data have1;
input id $ var1-var5;
cards;
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
;
run;
data have2;
input id $ var1-var5;
cards;
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
;
run;
data order;
set have1;
fmtname='sort_order';
type='J';
label=_n_;
start=id;
keep id fmtname type label start;
run;
proc format cntlin=order;
run;
data want;
set have1 have2;
order_var=input(id, $sort_order.);
run;
proc sort data=want;
by order_var;
run;
This is just one SQL version which follows along a similar path to Joe's answer. Row order is input via a sub-query rather than a format. However the initial order of the two input tables is lost in the join to the row order sub-query. The original order (have2 follows have1) is re-instated by using the table names as a secondary order variable.
proc sql;
create table want1 as
select want.id
,want.var1
,want.var2
,want.var3
,want.var4
,want.var5
from (
select *
, 'have1' as source
from have1
union all
select *
, 'have2' as source
from have2
) as want
left join
(
select id
, monotonic() as row_no
from have1
) as order
on want.id eq order.id
order by order.row_no
,want.source
;
quit;
proc compare
base=want1
compare=want
;
run;
And this is a data step version without a format. Here the have1 table with row order is re-merged with the concatenated data (have1 and have2) and then re-sorted by row order.
data want2;
set have1 have2;
run;
data have1;
set have1;
order_var = _n_;
run;
proc sort data=want2;
by id;
run;
proc sort data=have1;
by id;
run;
data want2;
merge want2 have1;
by id;
run;
proc sort data=want2;
by order_var;
run;
proc compare
base=want2
compare=want
;
run;