Transposing wide to long in SAS, without extra columns - sas

I'd like to transpose a dataset, but SAS insists on adding a new column whenever a value of the "by" column
occurs on more than one row.
So if I run
data test;
input a b $ c $ ;
datalines;
1 aaa bbb
1 bbb bbb
2 ccc ccc
3 ccc ccc
;
run;
proc transpose data=test;
by a;
var b c;
run;
I get a table with two columns that looks like this:
1 b aaa bbb
1 c bbb bbb
2 b ccc
2 c ccc
3 b ccc
3 c ccc
What I'd like is a table that looks like this:
1 b aaa
1 c bbb
1 b bbb
1 c bbb
2 b ccc
2 c ccc
3 b ccc
3 c ccc
So instead of adding a column for each entry, I want SAS to add rows. Any ideas on how to do this?
Just to be clear, this is a toy example! The dataset I'm working with has more columns.

This ought to work (using your example code):
proc transpose data=test out=test_tran1(rename=(_name_ = old_var));
by a;
var b c;
run;
proc transpose data=test_tran1 out=test_tran2(drop=_: rename = (col1=values) where = (not missing(values)));
by a old_var;
var col:;
run;

You can't use PROC TRANSPOSE in a single step with a 'mixed' dataset (multiple rows per by group AND multiple columns) to get long. Transpose only really works well going all one or the other.
The easiest way to get long is usually the data step:
data want;
set test;
array vars b c;
do _i = 1 to dim(vars);
varname = vname(vars[_i]);
value = vars[_i];
output;
end;
keep a varname value;
run;

data output;
set test;
array vars {*} b -- c; * define array composed of list of variables from B to C, have to be of same type;
length varname $32;
keep a varname value;
do i=1 to dim(vars);* loop array (list of variables);
varname= vname(vars(i));* get name of variable that provided value;
value = vars(i);* get the value of variable;
output; *output row;
end;
run;

Related

SAS Mapping between two datasets

I’m trying to map between two datasets using SAS.
Dataset 1
ID Start_Date End_Date
Aaa 1/1/2023 5/1/2023
Bbb 10/1/2023 1/2/2023
Ccc 15/1/2023 27/1/2023
Dataset 2
ID Date
Aaa 4/1/2023
Aaa 10/1/2023
Bbb 1/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Ccc 10/1/2023
For each ID, I want to keep only the rows in dataset2 whose date falls within the time range (between start and end date) given in dataset1.
For this example, the output should be as follows:
ID Date
Aaa 4/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Perhaps a simple inner join will give you the expected result
proc sql;
create table want as
select t2.*
from dataset1 t1
inner join dataset2 t2
on t1.id = t2.id
and t2.date between t1.start_date and t1.end_date
;
quit;
ID date
Aaa 04/01/23
Bbb 15/01/23
Bbb 31/01/23
If your dates are stored as strings instead (which seems to be the case, judging by your desired output), consider using the input() function in the join:
proc sql;
create table want as
select t2.*
from have1 t1
inner join have2 t2
on t1.id = t2.id
and input(t2.date, ddmmyy10.)
between input(t1.start_date, ddmmyy10.)
and input(t1.end_date, ddmmyy10.)
;
quit;
ID date
Aaa 4/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Try this
data one;
input ID $ (Start_Date End_Date)(:ddmmyy10.);
format Start_Date End_Date ddmmyy10.;
datalines;
Aaa 1/1/2023 5/1/2023
Bbb 10/1/2023 1/2/2023
Ccc 15/1/2023 27/1/2023
;
data two;
input ID $ Date :ddmmyy10.;
format Date ddmmyy10.;
datalines;
Aaa 4/1/2023
Aaa 10/1/2023
Bbb 1/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Ccc 10/1/2023
;
data want(keep = ID Date);
merge one two;
by ID;
if Start_Date <= Date <= End_Date;
run;
Using two SET statements can solve this problem.
The first SET loads dataset 2, and the second loads the ranges (dataset 1). An IF statement then outputs only the dates that fall within the range.
Try this:
data dsout;
set ds2;
/* Scan ds1 row by row with direct access. END= cannot be combined with POINT=, so NOBS= alone bounds the loop. */
do i=1 to rec;
set ds1(rename=(ID=ID_)) point=i nobs=rec;
if ID=ID_ and date>=Start_Date and date<=End_Date then do;
output;
leave;
end;
end;
drop ID_ Start_Date End_Date;
run;

Values of common column in A replaced by those in B with MERGE in SAS

I want to merge two tables that have two columns in common, and I do not want the value of var1 in A to be replaced by the one in B. Is there a way to do this without using drop or rename?
I can fix it with SQL, but I am curious about how to do it with MERGE!
data a;
infile datalines;
input id1 $ id2 $ var1;
datalines;
1 a 10
1 b 10
2 a 10
2 b 10
;
run;
/* create table B */
data b;
infile datalines;
input id1 $ id2 $ var1 var2;
datalines;
1 a 30 50
2 b 30 50
;
run;
/* Merge A and B */
data c;
merge a (in=N) b(in=M);
if N;
by id1;
run;
But what I would like is:
data C;
infile datalines;
input id1 $ id2 $ var1 var2;
datalines;
1 a 10 50
1 b 10 50
2 a 10 50
2 b 10 50
;
run;
Use rename
data c;
merge a (in=N) b(in=M rename=(var1=var1_2));
by id1;
if N;
run;
If you don't want to use rename / drop etc., then you could just flip the merge order so that the dataset whose var1 should be retained overwrites the other:
data c;
merge b (in=M) a(in=N);
by id1;
if N;
run;
When the data step loads data from the datasets mentioned, it does so in the order in which they appear on the MERGE (or SET or UPDATE) statement. So if you are merging two datasets and the BY variable values match, the record from the first is loaded and then the record from the second is loaded, overwriting the values read from the first.
For 1 to 1 matching you can just change the order that the datasets are mentioned.
merge b(in=M) a(in=N) ;
If you really want the variables defined in the output dataset in the order they appear in A then add a SET statement that the compiler will process but that can never execute before your MERGE statement.
if 0 then set a b ;
If you are doing a 1 to many matching then you might have other trouble since when a dataset stops contributing values to the current BY group then SAS does not re-read the last observation. In that case you will have to use some combination of RENAME=, DROP= or KEEP= dataset options.
In PROC SQL when you have duplicate names for selected columns (and are trying to create an output dataset instead of report) then SAS ignores the second copy of the named variable. So in a sense it is the reverse of what happens with the MERGE statement.
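For the one-to-many case, here is a minimal sketch of the dataset-option route, assuming the tables a and b from the question and that b has at most one row per id1. The output name c_keep_a is only illustrative. KEEP= lets b contribute nothing except the BY variable and var2, so var1 and id2 can only come from a:

```sas
/* Both inputs must already be sorted by id1 (the question's data is). */
data c_keep_a;
  merge a (in=inA)
        b (keep=id1 var2);  /* b supplies only the BY variable and var2 */
  by id1;
  if inA;                   /* keep only BY groups that exist in a */
run;
```

Because var2 is read from b once per BY group and then retained, every row of a within that group should carry the same var2 value.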

Create new row to data set based existing ones SAS

I have a dataset looking something like this:
var1 var2 count
cat1 no 1
cat1 yes 4
cat1 unkown 3
cat2 no 7
cat2 yes 3
cat2 unkown 5
cat3 no 2
cat3 yes 9
cat3 unkown 0
What I want to do is combine var1 and var2 into a new variable, where the first row of each group comes from var1 and the others from var2. So it is supposed to look like this:
comb count
cat1
no 1
yes 4
unkown 3
cat2
no 7
yes 3
unkown 5
cat3
no 2
yes 9
unkown 0
Any help would be highly appreciated!
It's quite simple.
Here is the solution:
1) Create the source dataset:
data testa;
infile datalines dsd dlm=',';
input var1 : $200. var2 : $200. count : 8. ;
datalines;
cat1,no,1,
cat1,yes,4,
cat1,unkown,3,
cat2,no,7,
cat2,yes,3,
cat2,unkown,5,
cat3,no,2,
cat3,yes,9,
cat3,unkown,0,
;
run;
2) Select the list of categories into a macro variable (cat1|cat2|cat3):
proc sql noprint;
select distinct(var1) into: list_var separated by '|' from testa;
quit;
3) Process the var list one by one
%macro processListVar(list_var);
data want;
stop; /* start empty: without STOP this step would write one blank observation */
run;
%let k=1;
%do %while (%qscan(&list_var, &k,|) ne );
%let var = %scan(&list_var, &k,|);
data testb(drop=var1 rename=(var2=comb));
set testa;
N=_N_+1+&k;
where var1="&var";
run;
data testc;
N=1+&k;
comb="&var";
count=.;
run;
data tmp;
set testb testc;
run;
proc sort data=tmp out=teste;
by N;
run;
data want;
set want teste;
run;
%put var=&var;
%let k = %eval(&k + 1);
%end;
%mend processListVar;
%processListVar(&list_var);
4) At the end you get the result in the dataset want.
Finally, you have to drop the N column, like this:
data want_cleaned (drop=N);
set want;
run;
5) More explanation of the code.
a. The key problem was to keep the order of cat1, cat2, cat3.
b. So I split the problem into one sub-dataset per category (cat1, cat2, ...) and used a %do %while loop to go through the categories.
c. The column N counts the lines (like an index), so sorting on it keeps the order.
d. For example, for the first category, cat1: we select the column var2, rename it to comb, and drop the var1 column. This creates the testb dataset, which also gets the index column N.
The header line of the sub-dataset (N=1+&k) is created in testc. Because &k is part of N in every sub-dataset, the header always sorts ahead of its detail rows. We concatenate testb and testc with SET; the resulting dataset tmp contains everything needed for cat1. After sorting, each sub-dataset is appended to want.
To summarize: we loop over the categories, sort each sub-dataset on N so the lines appear in the order you wanted, and append them to want.
Regards,
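As an alternative to the macro loop, here is a minimal single data step sketch. It assumes the testa dataset created in step 1 and that the rows for each var1 value are stored together. The names want_simple and _count are only illustrative. BY with NOTSORTED makes first.var1 available without sorting, so the original category order is kept:

```sas
data want_simple (keep=comb count);
  set testa (rename=(count=_count));
  by var1 notsorted;      /* groups are contiguous but need not be sorted */
  length comb $200;
  if first.var1 then do;  /* header row: category name, no count */
    comb  = var1;
    count = .;
    output;
  end;
  comb  = var2;           /* detail row: var2 value with its count */
  count = _count;
  output;
run;
```

Each input row produces one detail row, and the first row of every category additionally produces the header row ahead of it.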

SAS merging/condensing data

I have a dataset similar to the one below
ID A B C D E
1  1 . . . .
1  . . 1 . .
1  . 1 . . .
2  . 1 . . .
2  . . . 1 .
3  . . . . 1
3  1 . . . .
4  . . 1 . .
5  . 1 . . .
I want to condense the data into one row for each ID. So the dataset would look like the one below.
ID A B C D E
1  1 1 1 . .
2  . 1 . 1 .
3  1 . . . 1
4  . . 1 . .
5  . 1 . . .
I created another table with the duplicate IDs removed, so I have two tables, A and B. I then tried merging the two datasets together. I was playing around with the following SAS code.
data C;
merge A B;
by ID;
run;
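Here's a neat trick I picked up from another forum. There's no need to split up the original dataset: in the UPDATE statement, the first reference to the dataset (with obs=0) only supplies the structure, and the second supplies the values. Missing values in the second do not overwrite values already collected, and the BY statement ensures you get only one record per ID.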
data have;
infile datalines dsd;
input ID A B C D E;
datalines;
1,1,,,,,
1,,,1,,,
1,,1,,,,
2,,1,,,,
2,,,,1,,
3,,,,,1,
3,1,,,,,
4,,,1,,,
5,,1,,,
;
run;
data want;
update have (obs=0) have;
by id;
run;
This could be solved using the retain statement.
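data B(rename=(A2=A B2=B C2=C D2=D E2=E));
set A;
by id;
retain A2 B2 C2 D2 E2;
/* reset the carried-over values at the start of each ID */
if first.id then do;
A2 = .;
B2 = .;
C2 = .;
D2 = .;
E2 = .;
end;
/* keep any non-missing value seen within the ID */
if A ne . then A2=A;
if B ne . then B2=B;
if C ne . then C2=C;
if D ne . then D2=D;
if E ne . then E2=E;
if last.id then output;
drop A B C D E;
run;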
There are other ways to solve this, but hopefully this is helpful.
PROC MEANS is a great tool for something like this. PROC SQL would also give you a reasonable solution, but MEANS is faster.
proc means data=yourdata noprint;
var a b c d e;
class id;
types id; *to avoid the 'overall' row;
output out=want(drop=_type_ _freq_) max=; *output the maximum of each var for each ID - use SUM instead if you want more than 1;
run;

Update one dataset with another without using PROC SQL

I have the below two datasets
Dataset A
id age mark
1 . .
2 . .
1 . .
Dataset B
id age mark
2 20 200
1 10 100
I need the below dataset as output
Output Dataset
id age mark
1 10 100
2 20 200
1 10 100
How to carry out this without using PROC SQL i.e. using DATA STEP?
There are many ways to do this. The easiest is to sort the two data sets and then use MERGE. For example:
proc sort data=A;
by id;
run;
proc sort data=B;
by id;
run;
data WANT;
merge A(drop=age mark) B;
by ID;
run;
The trick is to drop the variables you are adding from the first data set A; the new variables will come from the second data set B.
Of course, this solution does not preserve the original order of the observations in your data set AND only works because your second data set contains unique values of id.
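If the original row order matters, one possible workaround is to record the row number before sorting and sort back on it afterwards. This is only a sketch under the same assumption as above, that B has unique values of id. The names A_seq, B_sorted, seq and want_ordered are illustrative:

```sas
/* Remember the original position of each row in A. */
data A_seq;
  set A;
  seq = _n_;
run;

proc sort data=A_seq;            by id; run;
proc sort data=B out=B_sorted;   by id; run;

data want_ordered;
  merge A_seq (in=inA drop=age mark) B_sorted;
  by id;
  if inA;                        /* keep only ids that occur in A */
run;

/* Restore the original order, then discard the helper column. */
proc sort data=want_ordered out=want_ordered (drop=seq);
  by seq;
run;
```

With the example data, this should give the rows in the order 1, 2, 1, each with the age and mark looked up from B.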
I tried this and it worked for me, even if you have data you would like to preserve in that column. Just for completeness' sake, I added an SQL variant too.
data a;
input id a;
datalines;
1 10
2 20
;
data b;
input id a;
datalines;
1 .
1 5
1 .
2 .
3 4
;
data c (drop=b);
merge a (rename = (a=b) in=ina) b (in = inb);
by id;
if b ne . then a = b;
run;
proc sql;
create table d as
select a.id, a.a from a right join b on a.id=b.id where a.id is not null
union all
select b.id, b.a from a right join b on a.id = b.id where a.id is null
;
quit;