How to reshape data wide to long [duplicate] - sas

I want to reshape data from columns to rows.
The initial table is shown below:
ID1 ID2 ID3 Name
----------------------------
I001 I002 I003 John
The desired table looks like this:
ID Name
------------
I001 John
I002 John
I003 John
Can anyone help out?
Thanks a lot!

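One way to do this is to set up an array of the ID variables and loop through it with an explicit OUTPUT statement, which writes one row per ID.
data want;
set have;
array ids(3) id1-id3;
do i=1 to dim(ids);
ID=ids(i); * copy the current ID value into a single column;
OUTPUT; * write one row for each ID;
end;
keep ID Name; * drop the original wide columns and the loop counter;
run;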

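You can also use PROC TRANSPOSE. Make sure your data is sorted by Name. Because ID1-ID3 are character variables, they must be listed in a VAR statement (by default only numeric variables are transposed). The transposed values land in COL1, which is renamed to ID here.
proc transpose data=have out=want(drop=_name_ rename=(col1=ID));
by Name;
var ID1-ID3;
run;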

Related

SAS: How to create a table that lists the variable names of another table

I am starting out with SAS and PROC SQL, and I want to do a very simple task as described below but am having trouble.
data test;
input x_ray y_chromosome z_sword;
cards;
1 2 3
4 5 6
4 7 8
7 8 9
7 9 10
7 10 11
;
In the table test we have three variables, x_ray, y_chromosome, and z_sword.
My goal is to make another table, let's say, result, that looks like this.
data result;
input name $;
cards;
x_ray
y_chromosome
z_sword
;
I looked up ways to do this online, but all the methods are very well beyond my understanding and I simply do not believe that such a simple process has to be so complicated.
May I have some help, please?
Use PROC CONTENTS.
proc contents data=test noprint out=result;
run;
If you only want the variable names and not the other information, you can use the KEEP= dataset option to limit the variables kept in the RESULT dataset.
proc contents data=test noprint out=result(keep=name) ;
run;
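One thing to watch with the PROC CONTENTS approach: the OUT= dataset is typically listed alphabetically by variable name rather than in the order the variables appear in the table. A minimal sketch, assuming you want the original column order, keeps the VARNUM variable that PROC CONTENTS writes and sorts on it (the dataset name `result` follows the question):

```sas
proc contents data=test noprint out=result(keep=name varnum);
run;

* VARNUM holds each variable's position in TEST;
proc sort data=result;
  by varnum;
run;
```

For the TEST table in the question both orders happen to coincide, so this only matters when the variable names are not already in alphabetical order.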
I cannot find a shorter or simpler answer than @Tom's; here are two more ways for reference.
Use a dictionary table:
proc sql;
create table result as
select name
from sashelp.vcolumn
where libname='WORK' and memname='TEST';
quit;
or query them with the dataset I/O functions:
data result;
dsid = open('work.test'); * OPEN returns 0 if the dataset cannot be opened;
if dsid then do;
do i = 1 to attrn(dsid,'nvars');
name = varname(dsid,i);
output;
end;
rc = close(dsid);
end;
keep name;
run;

Proc transpose in SAS with multiple observations in var variable

I have a dataset that I want to transpose from long to wide. I have:
ID Question Answer
1 Referral to a
1 Referral to b
1 Referral to d
2 Referral to a
2 Referral to c
4 Referral to a
6 Referral to a
6 Referral to c
6 Referral to d
What I want the transposed dataset to look like:
ID Referral to
1 a, b, d
2 a, c
4 a
6 a, c, d
I've tried to transpose the data, but the resulting dataset only contains 1 of the responses from the answer column, not all of them.
Code I've been using:
proc transpose data=test out=test2 let;
by ID;
id Question;
var Answer;
run;
The dataset has hundreds of thousands of rows with dozens of variables that are structured exactly like the 'Referral to' example. How can I make the transposed wide dataset contain all of the answers to a Question in the same cell, and not just one? Any help would be appreciated.
Thank you.
Here are two methods you can use in this case.
The first uses a data step approach, which is a single step. The second is more dynamic and uses a PROC TRANSPOSE + CATX() after the fact to create the combined variable. Note the use of PREFIX option in the transpose procedure to make the variables easier to identify and concatenate.
*create sample data for demonstration;
data have;
* the values below are separated by spaces, the default delimiter for list input;
input OrgID Product $ States $;
cards;
1 football DC
1 football VA
1 football MD
2 football CA
3 football NV
3 football CA
;
run;
*Sort - required for both options;
proc sort data=have;
by orgID;
run;
**********************************************************************;
*Use RETAIN and BY group processing to combine the information;
**********************************************************************;
data want_option1;
set have;
by orgID;
length combined $100.;
retain combined;
if first.orgID then
combined=states;
else
combined=catx(', ', combined, states);
if last.orgID then
output;
run;
**********************************************************************;
*Transpose it to a wide format and then combine into a single field;
**********************************************************************;
proc transpose data=have out=wide prefix=state_;
by orgID;
var states;
run;
data want_option2;
set wide;
length combined $100.;
combined=catx(', ', of state_:);
run;
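The same RETAIN approach can be adapted to the question's own columns. This is only a sketch: it assumes the input dataset is named `test` with variables `ID`, `Question`, and `Answer`, as in the question, and that 200 characters is long enough for the combined answers. The output names `sorted` and `want` are made up for illustration.

```sas
proc sort data=test out=sorted;
  by ID Question;
run;

data want;
  set sorted;
  by ID Question;
  length combined $200;
  retain combined;
  * start a new list at the first answer of each ID/Question pair;
  if first.Question then combined = Answer;
  else combined = catx(', ', combined, Answer);
  * write one row per ID/Question pair once all answers are collected;
  if last.Question then output;
  drop Answer;
run;
```

Grouping by both ID and Question keeps the dozens of different questions separate, so each ID/Question pair gets its own combined cell.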

Using a contains operator or equivalent [closed]

In SAS, how do I use a contains (or alternative) operator when there is more than one set of letters to match? For example, where have_variable = abd, afg, afd, acc and want_variable = abd, afg, afd (containing ab or af only).
I split your have and want lists into two tables with one value per row, then left joined them on the have list to find the matching values.
The final table flags each have value as match or no-match.
/* Create your input String */
data Have;
have="abd , afg , afd , acc";
run;
data Want ;
want="abd , afg , afd";
run;
/* Split input strings into multiple rows */
data Have_List;
set Have;
do i=1 to countw(have,',');
source=lowcase(strip(scan(have,i,','))); /* strip the blanks around each value */
output;
end;
keep source;
run;
data Want_List;
set Want;
do i=1 to countw(want,',');
lookup=lowcase(strip(scan(want,i,',')));
match='match';
output;
end;
keep lookup match;
run;
/* Create a SQL left join to lookup the matching values */
proc sql;
create table match as
select h.source as have , COALESCE(w.match,"no-match") as match
from have_list h left join want_list w on h.source=w.lookup;
quit;
You can use a list in your WHERE clause, like this:
proc sql;
select * from my_table where have_variable in ('abd','afg','afd','acc') and want_variable in ('abd','afg','afd');
quit;
You can also use the IN operator in a DATA step, like this:
data want;
set my_table;
if have_variable in ('abd','afg','afd','acc') and
want_variable in ('abd','afg','afd');
run;
If you want only the values that contain either two-letter string, you can use LIKE:
proc sql;
select * from my_table where have_variable like '%ab%' or have_variable like '%af%';
quit;
In a DATA step:
data want;
set my_table;
where have_variable like '%ab%' or
have_variable like '%af%';
run;
Regards
If you only want records that begin with ab or af (rather than containing them anywhere in the string), you can use in followed by a colon (in:). With this usage, the colon instructs SAS to compare only the first n characters of the string, where n is the length of the comparison value (2 in your example).
Note that this only works in a DATA step, not in PROC SQL.
data have;
input have_var $;
datalines;
abd
afg
afd
acc
;
run;
data _null_;
set have;
where have_var in: ('ab','af');
put _all_;
run;
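In PROC SQL, a begins-with test can instead be written with LIKE and a trailing wildcard only. A small sketch, reusing the `have` dataset and `have_var` variable from the example above:

```sas
proc sql;
  select *
  from have
  where have_var like 'ab%' or have_var like 'af%';
quit;
```

Leaving out the leading % is what restricts the match to the start of the value.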

Frequency of a value across multiple variables?

I have a data set of patient information where I want to count how many patients (observations) have a given diagnostic code. I have 9 possible variables where it can be, in diag1, diag2... diag9. The code is V271. I cannot figure out how to do this with the "WHERE" clause or proc freq.
Any help would be appreciated!!
The basic strategy is to create a dataset that is not at the patient level, but instead has one observation per patient-diagnosis code (so up to 9 observations per patient). Something like this:
data want;
set have;
array diag[9];
do _i = 1 to dim(diag);
if not missing(diag[_i]) then do;
diagnosis_Code = diag[_i];
output;
end;
end;
keep diagnosis_code patient_id; * add any other variables you might want to keep;
run;
You could then run a proc freq on the resulting dataset. You could also change the criteria from not missing to if diag[_i] = 'V271' then do; to get only V271s in the data.
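If the goal is a count of patients rather than a count of diagnosis records, one possible variation is to flag each patient once and tabulate the flag. This sketch assumes `have` holds one row per patient and that diag1-diag9 are character variables; the names `flagged` and `has_v271` are made up for illustration.

```sas
data flagged;
  set have;
  array diag[9];            * refers to diag1-diag9;
  has_v271 = 0;
  do _i = 1 to dim(diag);
    if diag[_i] = 'V271' then has_v271 = 1;
  end;
  drop _i;
run;

* the has_v271=1 row gives the number of patients with the code;
proc freq data=flagged;
  tables has_v271;
run;
```

A patient who carries V271 in more than one diag variable is still counted once, because the flag is set per row rather than per diagnosis.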
An alternate way to reshape the data that can match Joe's method is to use proc transpose like so.
proc transpose data=have out=want(keep=patient_id col1
rename=(col1=diag)
where=(diag is not missing));
by patient_id;
var diag1-diag9;
run;

Transposing one column in a dataset but by year and another column

I have this dataset here which looks like this:
Basically I want to manipulate the data set so that I have
a unique GVKEY1 (such as 1004), then a unique year (such as 1996), followed by several gvkey2 values. However, the number of gvkey2 values for each year is not the same. Does anyone know how to get around this problem? This means I will have one row per year for gvkey1 1004, since I have years from 1996 to 2008. Then for each year I will have many columns, where each column holds a gvkey2.
Best Regards,
Naz
Can you not just use PROC TRANSPOSE?
proc sort data=your_data_set out=temp1;
by gvkey1 year;
run;
proc transpose data=temp1 out=temp2;
by gvkey1 year;
var gvkey2;
run;
This will give you a series of variables COL1 - COLx. Use the PREFIX option for different variable names.
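As a sketch of the PREFIX option mentioned above, the call below names the new columns gvkey2_1, gvkey2_2, and so on. The prefix `gvkey2_` is an arbitrary choice, and `temp1` is the sorted dataset from the previous step.

```sas
proc transpose data=temp1 out=temp2(drop=_name_) prefix=gvkey2_;
  by gvkey1 year;
  var gvkey2;
run;
```

Because the number of gvkey2 values differs by year, the output has as many columns as the largest group needs, and smaller groups have missing values in the trailing columns.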
I'm not sure I've understood your question, but if you're looking for unique gvkey1/year pairs, you could do either of these:
proc sql;
create table results as
select distinct gvkey1, year
from _your_data_set;
quit;
or
proc sort data=_your_data_set(keep=gvkey1 year) out=results nodupkey;
by gvkey1 year;
run;
If that's not what you're looking for, I suggest posting an example of the results you want.