How to join multiple columns into one in SAS

I have a time series SAS dataset and I want to transform it into a vertical (long) dataset.
My data looks like this:
ID A2009 A2010 A2011 A2012
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
data multcol;
infile datalines;
input ID A2009 A2010 A2011 A2012 A2013;
return;
datalines;
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
;
run;
proc print data=multcol noobs;
run;
I searched the web and found only the following solution, which did not work for me.
My dataset is too large, and this method crashed my computer.
data cmbcol(keep=a orig_varname orig_obsnum);
set multcol;
array myvars _numeric_;
do i = 2 to dim(myvars);
orig_varname = vname(myvars(i));
orig_obsnum = _n_;
A = myvars(i);
output;
end;
run;
proc print data=cmbcol ;
title 'cmbcol';
run;
proc sort data=cmbcol;
by orig_varname a;
run;
proc print data=cmbcol noobs;
title 'cmbcol';
run;
I want the result to look like this:
ID t t+1
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
1 2 3
2 2 3
3 2 3
4 2 3
5 2 3
1 3 4
2 3 4
3 3 4
4 3 4
5 3 4
How can we do that?
Thanks in advance.

That is an unusual data structure for sure, but you could achieve this using the following macro (adjust to your needs).
options validvarname = any;
%macro transp;
%let i = 2009;
%do %while (&i <= 2011);
%let j = %eval(&i + 1);
data part_&i(rename = (A&i = t A&j = 't+1'n));
set multcol(keep = ID A&i A&j);
run;
%let i = %eval(&i + 1);
%end;
data combined;
set part_:;
run;
proc datasets nolist nodetails;
delete part_:;
quit;
%mend transp;
%transp
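As a possible alternative to the macro, the same pairing of consecutive years can be sketched in a single DATA step pass with an array, which avoids creating one intermediate dataset per year. This is only a sketch under stated assumptions: the output name combined_alt and the variables year, k and t_plus_1 are made up for illustration (t_plus_1 is used instead of 't+1'n so that VALIDVARNAME=ANY is not needed), and the year columns are assumed to be A2009-A2012 as in the question.

```sas
/* Sketch: pair each year with the following year in one pass over multcol. */
/* Assumes the year columns are A2009-A2012; adjust the range as needed.    */
data combined_alt;
  set multcol;
  array yrs {*} A2009-A2012;
  do k = 1 to dim(yrs) - 1;
    year     = 2009 + k - 1;  /* hypothetical helper: year of the t column */
    t        = yrs{k};
    t_plus_1 = yrs{k + 1};
    output;
  end;
  keep ID year t t_plus_1;
run;

/* Optional: reorder so that all IDs for one period come together, */
/* as in the layout shown in the question.                         */
proc sort data=combined_alt;
  by year ID;
run;
```

The sort is only needed if the period-by-period row order matters; on a very large dataset it is the expensive step and can be skipped.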

Related

Is there a way to easily save output from PROC LTA to a MS Word document in SAS?

I am using the PROC LTA plugin provided by The Methodology Center at Penn State. It writes output to the SAS output window, but not in the usual HTML format; it appears to be plain text. Is there a way I can easily copy the values from the output into a Word document?
Things I have tried:
Copy pasting directly from SAS output into Word or Excel. It does not correctly place values into cells.
Printing the output to a PDF, and copying from that PDF. It does not correctly place values into cells.
Using the ODS system. It does not seem to save any results to ODS. I have checked using ODS TRACE.
TIA.
Without seeing your data and LTA code: you can save the contents of the OUTPUT window to a catalog entry and then, using ODS with style=monospace, write them into a document with a DATA _NULL_ step.
Example (LCA code from plugin page):
DATA test;
INPUT it1 it2 it3 it4 count;
DATALINES;
1 1 1 1 5
1 1 1 2 5
1 1 2 1 9
1 1 2 2 8
1 2 1 2 5
1 2 2 1 8
1 2 2 2 4
2 1 1 1 5
2 1 1 2 3
2 1 2 1 6
2 1 2 2 8
2 2 1 1 3
2 2 1 2 7
2 2 2 1 5
2 2 2 2 10
;
RUN;
dm 'clear output';
PROC LCA DATA=test ;
NCLASS 2;
ITEMS it1 it2 it3 it4;
CATEGORIES 2 2 2 2;
FREQ count;
SEED 100000;
RHO PRIOR=1;
RUN;
* save contents of output window to catalog entry;
dm 'output; saveas work.lca.results.output';
filename results catalog 'work.lca.results.output';
ods rtf file='results.rtf' style=monospace;
title; footnote;
options nodate nonumber nocenter;
* read contents of catalog entry and write to ODS;
data _null_;
infile results;
input;
line = _infile_;
file print ods;
put _ods_;
run;
ods rtf close;
[Image of the resulting document]
The PROC LCA and PROC LTA procedures include the OUTPOST, OUTEST and OUTPARAM options that allow you to save some results to datasets, which can then be printed using ODS.
Example code:
DATA test;
INPUT it1 it2 it3 it4 count;
DATALINES;
1 1 1 1 5
1 1 1 2 5
1 1 2 1 9
1 1 2 2 8
1 2 1 2 5
1 2 2 1 8
1 2 2 2 4
2 1 1 1 5
2 1 1 2 3
2 1 2 1 6
2 1 2 2 8
2 2 1 1 3
2 2 1 2 7
2 2 2 1 5
2 2 2 2 10
;
RUN;
PROC LTA DATA=test OUTEST=est1 OUTPARAM=par1 ;
NSTATUS 2;
NTIMES 4;
ITEMS it1 it2 it3 it4;
CATEGORIES 2;
SEED 100000;
RUN;
ods rtf file="results.rtf";
proc print data=par1; run;
ods rtf close;

Compare column values

I have 5 columns and want to check which columns have exactly the same values:
num1 num2 num3 num4 num5
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
Here the first column (num1) and the last column (num5) have exactly the same values in every row. How can I find such columns?
You could transpose and then look for duplicate rows instead.
data have ;
input num1-num5 ;
cards;
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
;
data _null_;
call symputx('nobs',nobs);
stop;
set have nobs=nobs;
run;
proc transpose data=have out=tran; var num1-num5; run;
proc sort data=tran; by col1-col&nobs; run;
data want;
set tran ;
by col1-col&nobs;
if not (first.col&nobs and last.col&nobs) ;
run;
proc print data=want;
run;
Results
Obs _NAME_ COL1 COL2 COL3 COL4
1 num1 1 2 2 4
2 num5 1 2 2 4
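If transposing is impractical (for example, when the dataset has many rows), a pairwise comparison in one DATA step is another option. The following is a minimal sketch, assuming exactly five numeric columns named num1-num5; the array names nums and same and the variables name_i and name_j are hypothetical. Note that two missing values compare as equal here.

```sas
/* Sketch: same{i,j} stays 1 only while columns i and j agree on every row. */
data _null_;
  set have end=last;
  array nums{5} num1-num5;
  array same{5,5} _temporary_ (25*1);  /* temporary arrays are retained */
  length name_i name_j $32;
  do i = 1 to 4;
    do j = i + 1 to 5;
      if nums{i} ne nums{j} then same{i,j} = 0;
    end;
  end;
  /* After the last row, write every pair that never differed to the log. */
  if last then do i = 1 to 4;
    do j = i + 1 to 5;
      if same{i,j} then do;
        name_i = vname(nums{i});
        name_j = vname(nums{j});
        put 'Identical columns: ' name_i name_j;
      end;
    end;
  end;
run;
```

For the sample data this should report the pair num1 and num5 in the log. The work grows with the square of the number of columns, so the transpose approach is likely the better choice for wide tables.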

Using a sas lookup table when the column number changes

I have two sas datasets,
Table 1                         Table 2
col1 col2 col3 col4 col5        a b
.    1    2    3    4           1 1
1    5    8    6    1           1 4
2    5    9    7    1           4 3
3    6    9    7    1           2 1
4    6    9    7    2           2 2
where table 1 is a lookup table for the values a and b in table 2, from which I want to build a new column c. In table 1, a corresponds to the values in col1 and b to the values in the first row (i.e. the new column c in table 2 should read 5, 1, 7, 5, 9). How can I achieve this in SAS? I was thinking of reading table 1 into a 2D array and then setting c = array(a,b), but I can't get it to work.
Here's an IML solution, first, as I think this is really the 'best' solution for you - you're using a matrix, so use the matrix language. I'm not sure if there's a non-loop method - there may well be; if you want to find out, I would add the sas-iml tag to the question and see if Rick Wicklin happens by the question.
data table1;
input col1 col2 col3 col4 col5 ;
datalines;
. 1 2 3 4
1 5 8 6 1
2 5 9 7 1
3 6 9 7 1
4 6 9 7 2
;;;;
run;
data table2;
input a b;
datalines;
1 1
1 4
4 3
2 1
2 2
;;;;
run;
proc iml;
use table1;
read all var _ALL_ into table1[colname=varnames1];
use table2;
read all var _ALL_ into table2[colname=varnames2];
print table1;
print table2;
table3 = j(nrow(table2),3);
table3[,1:2] = table2;
do _i = 1 to nrow(table3);
table3[_i,3] = table1[table3[_i,1]+1,table3[_i,2]+1];
end;
print table3;
quit;
Here is the temporary array solution. It's not all that pretty. If speed is an issue you don't have to loop over the array to insert it, you can use direct memory access, but I don't want to do that unless speed is a huge issue (and if it is, you should use a better data structure first).
data table3;
set table2;
array _table1[4,4] _temporary_;
if _n_ = 1 then do;
do _i = 1 by 1 until (eof);
set table1(firstobs=2) nobs=_nrows end=eof;
array _cols col2-col5;
do _j = 1 to dim(_cols);
_table1[_i,_j] = _cols[_j];
end;
end;
end;
c = _table1[a,b];
keep a b c;
run;
Just use the POINT= option on a SET statement to pick the row. You can then use an ARRAY to pick the column.
data table1 ;
input col1-col4 ;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;
data table2 ;
input a b ;
cards;
1 1
1 4
4 3
2 1
2 2
;
data want ;
set table2 ;
p=a ;
set table1 point=p ;
array col col1-col4 ;
c=col(b);
drop col1-col4;
run;

Easily splitting out multiple saved mean values into separate macro variables in SAS

I have a data set with a ton of variables. For example:
ID v1 v2 v3 v4 v5 v6 v7 v8
1 4 1 2 2 2 2 1 2
2 2 3 1 4 3 4 4 2
3 3 5 1 3 4 3 4 3
4 3 1 2 3 2 2 4 2
5 5 1 5 5 3 5 1 5
...
I want to take the average of each variable, store it, and then be able to use it for other data sets.
What I have tried so far is for each variable, over and over:
proc means data=data;
var v1;
output out=v1out mean=meanv1;
run;
proc means data=data;
var v2;
output out=v2out mean=meanv2;
run;
...
then, for each (again):
data v1temp;
set v1out;
call symput("meanv1",meanv1);
run;
data v2temp;
set v2out;
call symput("meanv2",meanv2);
run;
...
But this is very tedious with a lot of variables. Is there an easier way?
"I want to take the average of each variable, store it, and then be able to use it for other data sets."
There doesn't seem to be an advantage to using global macro variables for this. Another option is to calculate the means as @user102890 suggests:
proc means data = myData noprint;
var v1-v8;
output out = myDataMeans(drop = _type_ _freq_
where = (_stat_='MEAN')
rename = (v1-v8 = meanV1-meanV8));
run;
And then just set that one observation into your data set:
DATA myData;
set myData;
if _N_ = 1 then set myDataMeans;
...;
RUN;
Then you have the variables meanV1-meanV8 available as actual data set values on every observation of data set myData. You could do the same thing for any other data set in which you want to use the means of those variables.
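To illustrate how the merged means might then be used, here is a minimal sketch that centers each variable on its mean. It assumes myDataMeans was created as above; the output name centered and the variables c1-c8 are hypothetical names chosen for this example.

```sas
/* Sketch: subtract each variable's mean, using the one-row means dataset. */
data centered;
  set myData;
  /* Values read by SET are retained, so the means stay on every row. */
  if _n_ = 1 then set myDataMeans(drop=_stat_);
  array v{*} v1-v8;
  array m{*} meanV1-meanV8;
  array c{*} c1-c8;          /* hypothetical names for the centered values */
  do i = 1 to dim(v);
    c{i} = v{i} - m{i};
  end;
  drop i meanV1-meanV8;
run;
```

Keeping the means in a dataset rather than in macro variables keeps them at full numeric precision and lets arrays handle any number of variables.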
Behold the power of PROC SQL;)
data myData;
input id v1-v8;
datalines;
1 4 1 2 2 2 2 1 2
2 2 3 1 4 3 4 4 2
3 3 5 1 3 4 3 4 3
4 3 1 2 3 2 2 4 2
5 5 1 5 5 3 5 1 5
;
run;
proc transpose data= myData out= myXData;
by id;
var v1-v8;
run;
proc sql noprint;
select mean( col1 )
into :mean1 - :mean8
from myXData
group by _name_
;
quit;
%put &mean1 &mean2 &mean3 &mean4 &mean5 &mean6 &mean7 &mean8;
Log output:
171
172 %put &mean1 &mean2 &mean3 &mean4 &mean5 &mean6 &mean7 &mean8;
3.4 2.2 2.2 3.4 2.8 3.2 2.8 2.8
I agree that macro variables are not the best way to store sequential data, but here is how to do it.
data myData;
input id v1-v8;
datalines;
1 4 1 2 2 2 2 1 2
2 2 3 1 4 3 4 4 2
3 3 5 1 3 4 3 4 3
4 3 1 2 3 2 2 4 2
5 5 1 5 5 3 5 1 5
;
run;
proc means data = myData noprint;
var v1-v8;
output out = myDataMeans(drop = _type_ _freq_
where = (_stat_='MEAN')
rename = (v1-v8 = meanV1-meanV8));
run;
The output dataset, myDataMeans, looks like the following:
_STAT_ meanV1 meanV2 meanV3 meanV4 meanV5 meanV6 meanV7 meanV8
MEAN 3.4 2.2 2.2 3.4 2.8 3.2 2.8 2.8
The following will read the myDataMeans dataset and put each column in it into its own macro variable.
%let dsid=%sysfunc(open(myDataMeans,i));/*open the dataset which has macro vars to read in cols*/
%syscall set(dsid); /*no leading ampersand with %SYSCALL */
%let rc=%sysfunc(fetchobs(&dsid,1));/*just reading 1 obs*/
%let rc=%sysfunc(close(&dsid));/*close dataset after reading*/
%put _user_;
The following global macro variables are created as shown in the log:
GLOBAL _STAT_ MEAN
GLOBAL MEANV1 3.4
GLOBAL MEANV2 2.2
GLOBAL MEANV3 2.2
GLOBAL MEANV4 3.4
GLOBAL MEANV5 2.8
GLOBAL MEANV6 3.2
GLOBAL MEANV7 2.8
GLOBAL MEANV8 2.8

Creating a dummy variable for "switching"

I'm working on a project in SAS and I want to create a dummy variable that accounts for "preferences in medicine". I have a long dataset, by time period, of individuals taking either medicine type 1 or type 2. For my research, I want to create a variable that flags individuals who took type 1 medicine, then switched to type 2, and then went back to type 1. I am not concerned with how long the individual was on each medication, only that they followed this pattern.
id month type
1 1 2
1 2 2
1 3 2
2 1 1
2 2 2
2 3 1
...
I have more months, but just wanted to provide something to elucidate what I'm trying to get. Basically, I want to tally those subjects who are like subject 2.
well, nothing fancy, but it works for me:
DATA LONG1;
input id month type;
cards;
1 1 2
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
2 1 1
2 2 1
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
2 8 1
2 9 1
2 10 1
3 1 1
3 2 1
3 3 1
3 4 2
3 5 1
3 6 1
3 7 1
3 8 1
3 9 1
3 10 1
;
Proc Print; run;
* 1) make a wide dataset by deconstructing the initial long data by month & rejoining by id
2) then use if/then statements to create your dummy variable,
3) then merge the dummy variable back into your long dataset using ID;
DATA month1; set long1; where month=1; rename month=month_1 type=type_1; Proc Sort; by ID; run;
DATA month2; set long1; where month=2; rename month=month_2 type=type_2; Proc Sort; by ID; run;
DATA month3; set long1; where month=3; rename month=month_3 type=type_3; Proc Sort; by ID; run;
DATA month4; set long1; where month=4; rename month=month_4 type=type_4; Proc Sort; by ID; run;
DATA month5; set long1; where month=5; rename month=month_5 type=type_5; Proc Sort; by ID; run;
DATA month6; set long1; where month=6; rename month=month_6 type=type_6; Proc Sort; by ID; run;
DATA month7; set long1; where month=7; rename month=month_7 type=type_7; Proc Sort; by ID; run;
DATA month8; set long1; where month=8; rename month=month_8 type=type_8; Proc Sort; by ID; run;
DATA month9; set long1; where month=9; rename month=month_9 type=type_9; Proc Sort; by ID; run;
DATA month10; set long1; where month=10; rename month=month_10 type=type_10; Proc Sort; by ID; run;
DATA WIDE;
merge month1 month2 month3 month4 month5 month6 month7 month8 month9 month10; by ID;
if (type_1=1 and type_2=1 and type_3=1 and type_4=1 and type_5=1
and type_6=1 and type_7=1 and type_8=1 and type_9=1 and type_10=1) or
(type_1=2 and type_2=2 and type_3=2 and type_4=2 and type_5=2
and type_6=2 and type_7=2 and type_8=2 and type_9=2 and type_10=2)
then switch='no '; else switch='yes '; keep ID switch; run;
DATA LONG2;
merge wide long1; by ID;
Proc Print; run;
btw: also go to the SAS listserv, they love stuff like this:
http://www.listserv.uga.edu/archives/sas-l.html
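The ten per-month DATA steps above could probably be replaced by a single PROC TRANSPOSE. The sketch below reproduces the same yes/no logic (every month has the same type, or not) under stated assumptions: the dataset names wide_t and wide_flag are hypothetical, and unlike the IF chain above, the MIN and MAX functions ignore missing months instead of counting them as a switch.

```sas
/* Sketch: build type_1, type_2, ... per ID in one step. */
proc sort data=long1;
  by id month;
run;

proc transpose data=long1 out=wide_t(drop=_name_) prefix=type_;
  by id;
  id month;    /* month values become the suffixes of the new columns */
  var type;
run;

data wide_flag;
  set wide_t;
  length switch $3;
  /* If the smallest and largest type are equal, the subject never changed. */
  if min(of type_:) = max(of type_:) then switch = 'no';
  else switch = 'yes';
  keep id switch;
run;
```

Like the original, this only flags whether the type ever changed; it does not check for the specific 1, then 2, then 1 pattern asked about in the question.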
This worked on the limited data I used:
DATA Have;
input id month type;
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
Data Temp(keep=id dummy);
length dummy $15;
retain Start Type2 dummy;
set Have;
by id;
if first.id then Do;
Start=0;
Type2=0;
Dummy="";
end;
If Type=1 then do;
If Start=0 then Start=1;
else if Start=1 and Type2=1 then Dummy="Switch-er-Roo";
end;
else do;
if Start=1 then Type2=1;
end;
if last.id then output;
run;
Data Want;
merge temp(in=a) have(in=b);
by id;
run;
I prefer @CarolinaJay65's approach: it's a lot cleaner and involves just one pass of the data. If all you are interested in are the patients who start and finish on type 1 but use type 2 at some point, then the code can be simplified slightly. The following code (using @CarolinaJay65's source data) outputs only the IDs of patients matching these criteria.
data switch_id (keep=id);
set have;
by id month;
retain switch;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then output;
run;
If you just wanted the number of patients who match the criteria then you could tweak this code further.
data switch (keep=count);
set have end=final;
by id month;
retain switch count 0;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then count+1;
if final then output;
run;
I think the following should work:
DATA Have;
input id month type;
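diftype = dif(type); /* call DIF and LAG on every row so their queues stay in step */
if id ^= lag(id) then diftype = .; /* the first row of each id has no within-id difference */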
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
proc sql;
select case when max(diftype) = 1 and min(diftype) = -1 then 1 else 0 end as flag, * from have
group by id
;
quit;