SAS data manipulation for 2 tables based on one id column

I have 2 tables as follows:
Table 1
data table1;
input id $ value;
datalines;
A 1
A 2
B 1
B 2
C 1
D 1
;
Table 2
data table2;
input id $ value;
datalines;
A 1
B 2
C 1
D 1
E 1
;
As you can see, the unique ids in table1 are A, B, C, D.
I would like to delete the observations in table2 whose id does not appear in table1.
Therefore the last observation of table2 should be deleted, as E is not in {A, B, C, D}.
Desired output:
A 1
B 2
C 1
D 1

You can do this with proc sql:
proc sql;
delete from table2
where not exists (select 1 from table1 where table1.id = table2.id);
quit;
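If you would rather leave table2 untouched, a variation is to write the matching rows to a new table instead of deleting in place. This is only a sketch; the output name table2_kept is made up for illustration.

```sas
proc sql;
    /* Keep only the table2 rows whose id also occurs in table1. */
    /* table2_kept is a hypothetical name; table2 itself is not modified. */
    create table table2_kept as
    select *
    from table2
    where id in (select id from table1);
quit;
```

With the sample data this should keep the A, B, C and D rows and leave out E.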

Related

How to rank variables in SAS by summarising another variable

For example:
Groupvar Value
A 5
A 1
B 0
B 9
B 8
C 2
C 2
I want to rank by the Groupvar summarising on Value. So in this example sum(A) = 6, sum(B) = 17, sum(C) = 4. So rank 1 = B, rank 2 = A, rank 3 = C.
Ideal output:
Groupvar Value Rank
A 5 2
A 1 2
B 0 1
B 9 1
B 8 1
C 2 3
C 2 3
Any ideas how this can be done? I can create a proc summary > then rank> then merge the rank back on. But I'm wondering if there's a better way to do it.
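You might be able to use the _LEVEL_ variable created by the LEVELS option plus ORDER=FREQ. Note that this treats Value as a frequency, so it only suits non-negative integer values (the FREQ statement truncates non-integers and skips observations with a value below 1). For example: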
data have;
input Groupvar:$1. Value;
cards;
A 5
A 1
B 0
B 9
B 8
C 2
C 2
;;;;
run;
proc summary nway order=freq missing;
class groupvar;
freq value;
output out=test(drop=_type_ index=(groupvar)) / levels;
run;
proc print;
run;
data want;
merge have test;
by groupvar;
run;
proc print;
run;
You can do it in a single PROC SQL step:
data test1;
Ord + 1;
input Groupvar$ Value;
cards;
A 5
A 1
B 0
B 9
B 8
C 2
C 2
;
run;
proc sql noprint;
create table Test2 as
select Groupvar, sum(Value) as Sum
from Test1
group by Groupvar
order by Sum desc /* largest sum first, so that it gets rank 1 */
;
create table Test3 as
select a.*, b.Sum, b.Rank
from Test1 as a
left join (
select a.*, monotonic() as Rank
from Test2 as a
) as b
on a.Groupvar = b.Groupvar
order by Ord
;
quit;
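The summary > rank > merge route mentioned in the question can also be written fairly compactly with PROC RANK. The following is a sketch that assumes the have dataset from the first answer; the names sums, ranks, Total and want are made up for illustration.

```sas
/* Step 1: one row per group with its total */
proc sql;
    create table sums as
    select Groupvar, sum(Value) as Total
    from have
    group by Groupvar;
quit;

/* Step 2: rank the totals, largest first; TIES=DENSE keeps ranks consecutive */
proc rank data=sums out=ranks(keep=Groupvar Rank) descending ties=dense;
    var Total;
    ranks Rank;
run;

/* Step 3: attach the group rank to every original row */
proc sql;
    create table want as
    select h.*, r.Rank
    from have as h left join ranks as r
    on h.Groupvar = r.Groupvar;
quit;
```

Unlike monotonic(), which is undocumented, PROC RANK gives tied totals the same rank, which may or may not be what you want.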

SAS : Map a column from 1 table to any of the multiple columns in another table

I have a table1 that contains 4 different kinds of ids:
Data table1;
Input id1 $ id2 $ id3 $ final_id $;
Datalines;
1 a a1 p
2 b b2 q
- c c2 r
3 d - s
4 - d4 t
;
Table2 contains any of the ids from id1, id2 or id3 of table1:
Data table2;
Input id $ col1 $ col2 $;
Datalines;
1 gsh ywu
b hsjs kall
c2 jsjs ywe
3 sja weei
d4 ase uwh
;
I want to left join table1 onto table2 so that table2 gets a new column holding final_id from table1.
How do I go about this problem?
Please help.
Thank you.
You can do it using SQL:
proc SQL noprint;
create table merged as
select b.final_id, a.*
from table2 as a left join table1 as b
on (a.id eq b.id1 or a.id eq b.id2 or a.id eq b.id3)
;
quit;

Append one table below another, including its variable names, in SAS

I have 2 sas output tables. First table has a,b,c columns and second table has d,e,f columns
First table is :
a b c
1 2 3
4 5 6
Second table is :
d e f
7 8 9
Is it possible to append them in one sheet to get this desired output?
a b c
1 2 3
4 5 6
d e f
7 8 9
Yes, you can append the second table to the first using proc append; you just need to rename the columns in the second table before appending.
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Full Code:
data table1;
input a b c;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d e f;
datalines;
7 8 9
;
run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Table1 will look like this after appending:
a=1 b=2 c=3
a=4 b=5 c=6
a=7 b=8 c=9
Another option: if all your data is character, you just need to create a third table to hold the column names you want to append.
Full Code:
data table1;
input a $ b $ c $;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d $ e $ f $;
datalines;
7 8 9
;
run;
data table2_names;
input d $ e $ f $;
datalines;
d e f
;
run;
proc append base=table1 data=table2_names(rename=(d=a e=b f=c)) ;run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Output:
a=1 b=2 c=3
a=4 b=5 c=6
a=d b=e c=f
a=7 b=8 c=9
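Note that proc append changes table1 in place. If you would rather keep both inputs as they are, a data step with several datasets on the SET statement stacks them into a new table. This is a sketch using the table1 and table2 from the first example; stacked is a made-up output name.

```sas
data stacked;
    /* Read all of table1, then all of table2; the rename lines the columns up */
    set table1
        table2(rename=(d=a e=b f=c));
run;
```

The same idea works for the character version, by listing table2_names between the two tables with the same rename.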

Get the unique combinations per variable in SAS

I am attempting to group by a variable that is not unique with a discrete variable to get the unique combinations per non-unique variable. For example:
A B
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
I want:
A Unique_combos
1 a, b
2 a
3 a
4 b, d
5 c, e
My current attempt is something along the lines of:
proc sql outobs=50;
title 'Unique Combinations of b per a';
select a, b
from mylib.mydata
group by distinct a;
run;
If you are happy to use a data step instead of proc sql you can use the retain keyword combined with first/last processing:
Example data:
data have;
attrib b length=$1 format=$1. informat=$1.;
input a
b $
;
datalines;
1 a
1 b
2 a
2 a
3 a
4 b
4 d
5 c
5 e
;
run;
Eliminate duplicates and make sure the data is sorted for first/last processing:
proc sql noprint;
create table tmp as select distinct a,b from have order by a,b;
quit;
Iterate over the distinct list and concatenate the values of b together:
data want;
length combinations $200; * ADJUST TO BE BIG ENOUGH TO STORE ALL THE COMBINATIONS;
set tmp;
by a;
retain combinations '';
if first.a then do;
combinations = '';
end;
combinations = catx(', ',combinations, b);
if last.a then do;
output;
end;
drop b;
run;
Result:
combinations a
a, b 1
a 2
a 3
b, d 4
c, e 5
You just need to put a distinct keyword in the select clause, eg:
title 'Unique Combinations of b per a';
proc sql outobs=50;
select distinct a, b
from mylib.mydata;
The run statement is unnecessary, the sql procedure is normally ended with a quit - although I personally never use it, as the statement will execute upon hitting the semicolon and the procedure quits anyway upon hitting the next step boundary.

Update one dataset with another without using PROC SQL

I have the below two datasets
Dataset A
id age mark
1 . .
2 . .
1 . .
Dataset B
id age mark
2 20 200
1 10 100
I need the below dataset as output
Output Dataset
id age mark
1 10 100
2 20 200
1 10 100
How can I do this without using PROC SQL, i.e. using a DATA step?
There are many ways to do this. The easiest is to sort the two data sets and then use MERGE. For example:
proc sort data=A;
by id;
run;
proc sort data=B;
by id;
run;
data WANT;
merge A(drop=age mark) B;
by ID;
run;
The trick is to drop the variables you are adding from the first data set A; the new variables will come from the second data set B.
Of course, this solution does not preserve the original order of the observations in your data set AND only works because your second data set contains unique values of id.
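If the original order of A matters, one option that avoids sorting altogether is a hash object lookup in the data step. This is only a sketch under the same assumption that id is unique in B; the names lookup and rc are made up for illustration.

```sas
data want;
    /* Load B into an in-memory lookup table once, on the first iteration */
    if _n_ = 1 then do;
        declare hash lookup(dataset: 'B');
        lookup.defineKey('id');
        lookup.defineData('age', 'mark');
        lookup.defineDone();
    end;
    /* Read A in its stored order; age and mark come in as missing */
    set A;
    /* On a match, find() overwrites age and mark with the values from B */
    rc = lookup.find();
    drop rc;
run;
```

Rows of A whose id is not in B simply keep whatever values they had in A.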
I tried this and it worked for me, even if you have data you would like to preserve in that column. Just for completeness' sake I added an SQL variant too.
data a;
input id a;
datalines;
1 10
2 20
;
data b;
input id a;
datalines;
1 .
1 5
1 .
2 .
3 4
;
data c (drop=b);
merge a (rename = (a=b) in=ina) b (in = inb);
by id;
if b ne . then a = b;
run;
proc sql;
create table d as
select a.id, a.a from a right join b on a.id=b.id where a.id is not null
union all
select b.id, b.a from a right join b on a.id = b.id where a.id is null
;
quit;