The required condition is
"SUBJECT in A = SUBJECT in B
and
VISIT in A NE(not equal to) VISIT in B"
I would like to find the exact mismatched and missing VISIT values between Tables A and B below using PROC SQL. Can anyone help me, please?
Table A
SUBJECT Test VISIT
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
Table B
SUBJECT Test VISIT1
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
Expected output:
SUBJECT Test VISIT VISIT1
1001 ABCD 3 .
1001 ABCD 5 .
1001 ABCD . 4
Visits 3 and 5 are present in dataset A but not in B, and visit 4 is present in dataset B but not in A, and so on.
Code for the datasets:
DATA A;
LENGTH SUBJECT 8 Test $10 visit 8;
INPUT SUBJECT Test $ visit ;
DATALINES;
1001 ABCD 1
1001 ABCD 2
1001 ABCD 3
1001 ABCD 5
;
RUN;
DATA B;
LENGTH SUBJECT 8 Test $10 visit1 8;
INPUT SUBJECT Test $ visit1 ;
DATALINES;
1001 ABCD 2
1001 ABCD 1
1001 ABCD 4
;
RUN;
Thanks in advance!
The code I tried is below (but it is not working as expected):
****************(VISIT ) in A and not in B****;
proc sql;
create table SS1 as
select distinct a.* FROM
A a where a.visit not in(select s.visit1 from B s WHERE A.SUBJECT = S.SUBJECT );
create table INRAVE as
select * from SS1 A
left join
B B
on a.subject=b.SUBJECT and a.VISIT NE b.VISIT1
where b.SUBJECT is not null
;
quit;
****************VISIT in B and not in A****;
proc sql;
create table SS2 as
select distinct a.* from
B a where a.VISIT1 not in(select S.VISIT from A s WHERE A.SUBJECT = S.SUBJECT );
create table INVENDOR as
select * from SS2 A
left join
A B
on a.subject=b.SUBJECT and a.VISIT1 NE b.VISIT
where b.SUBJECT is not null
;
quit;
data ALL;;
set inrave invendor;
where subject=subject ;
RUN;
Seems you know SQL very well, why not try union all, just like this:
proc sql noprint;
create table C as
select *, 'A' as Source from A
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit1) from B
)
union all corr
select *, 'B' as Source from B(rename=VISIT1=VISIT)
where catx('#',SUBJECT,Test,visit) not in (
select distinct catx('#',SUBJECT,Test,visit) from A
)
;
create table D(drop=TmpVISIT Source) as
select *,
case when Source = 'B' then . else TmpVISIT end as VISIT,
case when Source = 'B' then TmpVISIT else . end as VISIT1
from C(rename=VISIT=TmpVISIT);
quit;
This takes every observation from dataset A that has no match in dataset B, and then does the opposite for dataset B.
I also have another solution, which is shorter:
proc sql noprint;
select catx('#',SUBJECT,Test,visit) into :Ununique separated by '" "' from (
select * from A union all select * from B(rename=visit1=visit)
)
group by SUBJECT, Test, visit
having count(*) > 1;
quit;
data D;
set A B;
if catx('#',SUBJECT,Test,coalesce(visit1,visit)) in ("&Ununique") then delete;
run;
However, this method is limited by the maximum length of a macro variable.
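For comparison, here is a minimal sketch of a FULL JOIN variant that avoids the macro-variable length limit. It is not from the answers above; it assumes the datasets A and B exactly as defined in the question (A has `visit`, B has `visit1`), and the output name `D_FULLJOIN` is hypothetical.

```sas
proc sql;
  /* Pair rows on SUBJECT, Test and visit number; a row without a partner */
  /* on the other side comes back with that side's columns missing.       */
  create table D_FULLJOIN as
  select coalesce(a.SUBJECT, b.SUBJECT) as SUBJECT,
         coalesce(a.Test, b.Test)       as Test,
         a.visit                        as VISIT,
         b.visit1                       as VISIT1
  from A as a
  full join B as b
    on  a.SUBJECT = b.SUBJECT
    and a.Test    = b.Test
    and a.visit   = b.visit1
  /* Keep only the unmatched rows from either side. */
  where a.visit is missing or b.visit1 is missing;
quit;
```

With the sample data this should keep visits 3 and 5 (present only in A) under VISIT and visit 4 (present only in B) under VISIT1.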
PROC SQL in SAS lets the user do a count(distinct colname), grouped by some dimension(s). What is the fastest way to get the same behavior for SUM(distinct colname)?
data: have
grp1 grp2 col1 col2
a b 20 .
a b 30 10
a b 20 10
a b . 10
data want:
grp1 grp2 col1_sum col2_sum
a b 50(20+30) 10
So basically, for the dimension (a,b), I need a sum of the distinct values in col1 and col2.
sum(distinct col) as mentioned in your question should work:
data have;
input grp1 $ grp2 $ col1 col2;
datalines;
a b 20 .
a b 30 10
a b 20 10
a b . 10
;run;
proc sql;
select
grp1, grp2,
sum(distinct col1) as s1,
sum(distinct col2) as s2
from have
group by grp1, grp2;
quit;
... should yield results:
grp1 grp2 s1 s2
---- ---- ---- ----
a b 50 10
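As a cross-check, the same total can be reached by de-duplicating first and summing afterwards. This is only a sketch under the assumption that the `have` dataset above exists; it handles one column at a time, because each column needs its own distinct set.

```sas
proc sql;
  /* The inner query keeps one row per distinct (grp1, grp2, col1);   */
  /* the outer query then sums those distinct values within a group. */
  select grp1, grp2, sum(col1) as s1
  from (select distinct grp1, grp2, col1 from have)
  group by grp1, grp2;
quit;
```

For the sample data the distinct col1 values in group (a, b) are 20, 30 and a missing value, so s1 is 50, which agrees with sum(distinct col1).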
I have two datasets. Both have a common column- ID. I would like to check if ID from df1 lies in df2 and extract all such rows from df1. I'm doing this in SAS.
It is easily done in one sql query.
proc sql;
create table extract_from_df1 as
select
*
from
df1
where
id in (select id from df2)
;
quit;
There are lots of ways to do this. For example:
proc sql;
create table compare as select distinct
a.id as id1, b.id as id2
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
and then keep the matches (rows where id2 is not missing). Or you can try:
proc sql;
delete from table2 where id in (select distinct id from table1);
quit;
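If only the matching rows are wanted, an inner join does the filtering in one step. A minimal sketch, assuming `table1` and `table2` both carry a column named `id`; the output name `matches` is hypothetical.

```sas
proc sql;
  /* Only ids present in both tables survive the inner join;         */
  /* DISTINCT guards against duplicate rows when table2 repeats ids. */
  create table matches as
  select distinct a.*
  from table1 as a
  inner join table2 as b
    on a.id = b.id;
quit;
```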
data df1;
input id name $;
cards;
1 abc
2 cde
3 fgh
4 ijk
;
run;
data df2;
input id address $;
cards;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;
data c;
merge df1(in=x) df2(in=y);
by id; /* match on id; both inputs must be sorted by id */
if x and y;
keep id name;
run;
proc print data=c;
run;
I'm using this SAS code:
data test1;
input cust_id $
month
category $
status $;
datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
quit;
proc sql;
create view test2 as
select cust_id, input(put(month, 6.), yymmn6.) as month format date9.,
category, status from test1 order by cust_id, month asc;
quit;
proc expand data=test2 out=test3 to=month method=none;
by cust_id;
id month;
quit;
proc print data=test3;
title "after expand";
quit;
and I want to create a dataset that looks like this:
Obs cust_id month category status
1 A 01MAR2000 ABC C
2 A 01APR2000 DEF C
3 A 01MAY2000 . .
4 A 01JUN2000 XYZ 3
5 B 01OCT1999 ASD X
6 B 01NOV1999 . .
7 B 01DEC1999 ASD C
but the output from proc expand just says "Nothing to do. The data set WORK.TEST3 has 0 observations and 0 variables." I don't want/need to change the frequency of the data, just interpolate it with missing values.
What am I doing wrong here? I think proc expand is the correct procedure to use, based on this example and the documentation, but for whatever reason it doesn't create the data.
You need to add a VAR statement. Unfortunately, the variables need to be numeric. So just expand the month by cust_id. Then join back the original values.
proc expand data=test2 out=test3 to=month ;
by cust_id;
id month;
var _numeric_;
quit;
proc sql noprint;
create table test4 as
select a.*,
b.category,
b.status
from test3 as a
left join
test2 as b
on a.cust_id = b.cust_id
and a.month = b.month;
quit;
proc print data=test4;
title "after expand";
quit;
I know how to count group and subgroup numbers through proc freq or sql. My question is: when some level of the subgroup is missing, I still want to show that level with a count of 0. How can I do that? For example,
the data set is:
group1 group2
1 A
1 A
1 A
1 A
2 A
2 B
2 B
I want a result as:
group1 group2 N
1 A 4
1 B 0
2 A 1
2 B 2
If I only use the default SAS setting, it will usually show as
group1 group2 N
1 A 4
2 A 1
2 B 2
But I still want the second line of the result to tell me that there are 0 observations in that category.
Use the SPARSE option within proc freq. Consider it a cross join between all options from GROUP1 and GROUP2.
data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc freq data=have;
table group1*group2/out=want sparse;
run;
proc print data=want;
run;
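The cross-join idea behind SPARSE can also be spelled out in PROC SQL. This is a sketch only, assuming the `have` dataset above; the table names `grid` and `want_sql` are hypothetical.

```sas
proc sql;
  /* Build every group1 x group2 combination that occurs in the data. */
  create table grid as
  select g1.group1, g2.group2
  from (select distinct group1 from have) as g1,
       (select distinct group2 from have) as g2;

  /* Left join the data onto the grid; combinations without a match  */
  /* get count 0, because COUNT ignores the missing h.group1 values. */
  create table want_sql as
  select g.group1, g.group2, count(h.group1) as N
  from grid as g
  left join have as h
    on  g.group1 = h.group1
    and g.group2 = h.group2
  group by g.group1, g.group2;
quit;
```

The first query is a deliberate Cartesian product, so SAS is expected to write a note about it to the log.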
Reeza's sparse option works as long as each group is represented in your data at least once. Suppose there were a group1 3 that is not represented in your data, and you would still want them to show up in the frequency table. If that is the case, the solution is to create a reference table with all of your categories then right join your frequency table to it.
Create a reference table:
data ref;
do group1 = 1 to 3;
group2 = 'A';
output;
group2 = 'B';
output;
end;
run;
Create the frequency table with proc sql, right joining to the reference table:
proc sql;
select
r.group1,
r.group2,
count(h.group1) as freq
from
have h
right join ref r
on h.group1 = r.group1
and h.group2 = r.group2
group by
r.group1,
r.group2
order by
r.group1,
r.group2
;
quit;
Another option that's a cross between DWal's issue of "what if the data isn't in the data" and Reeza's One Proc, One Solution, is proc tabulate. If the format contains all possible values, even if the values don't appear, it works, with printmiss.
proc format;
value groupformat
1='Group 1'
2='Group 2'
3='Group 3'
;
quit;
data have;
input group1 group2 $;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc tabulate data=have;
class group1 group2/preloadfmt;
format group1 groupformat.;
tables group1*group2,n/printmiss misstext='0';
run;
How to do this via proc summary, using DWal's reference table to specify which combinations of values to use:
data ref;
do group1 = 1 to 3;
group2 = 'A';
output;
group2 = 'B';
output;
end;
run;
data have;
input group1 group2 $1.;
cards;
1 A
1 A
1 A
1 A
2 A
2 B
2 B
;
run;
proc summary nway data = have classdata=ref;
class group1 group2;
output out = summary (drop = _TYPE_);
run;
N.B. I had to tweak the have dataset slightly to make sure that group2 has length 1 in both datasets. If you use variables with the same name but different lengths in your classdata= and data= datasets, SAS will complain.
I have the below two datasets
Dataset A
id age mark
1 . .
2 . .
1 . .
Dataset B
id age mark
2 20 200
1 10 100
I need the below dataset as output
Output Dataset
id age mark
1 10 100
2 20 200
1 10 100
How can I do this without PROC SQL, i.e. using a DATA step?
There are many ways to do this. The easiest is to sort the two data sets and then use MERGE. For example:
proc sort data=A;
by id;
run;
proc sort data=B;
by id;
run;
data WANT;
merge A(drop=age mark) B;
by ID;
run;
The trick is to drop the variables you are adding from the first data set A; the new variables will come from the second data set B.
Of course, this solution does not preserve the original order of the observations in your data set AND only works because your second data set contains unique values of id.
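If the original order of A has to be kept, a hash-object lookup is one DATA step alternative that needs no sorting. This is a sketch rather than part of the answer above; it assumes datasets A and B as shown in the question, and the output name `WANT_HASH` is hypothetical.

```sas
data WANT_HASH;
  /* Load B into an in-memory lookup table once, keyed by id. */
  if _n_ = 1 then do;
    declare hash h(dataset: 'B');
    h.defineKey('id');
    h.defineData('age', 'mark');
    h.defineDone();
  end;
  /* Read A in its stored order. */
  set A;
  /* On a hit, find() overwrites age and mark with the values from B; */
  /* on a miss, the values read from A are left unchanged.            */
  rc = h.find();
  drop rc;
run;
```

With the sample data this should return the ids in the order 1, 2, 1, each with the age and mark from B.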
I tried this and it worked for me, even if you have data you would like to preserve in that column. For completeness' sake, I added an SQL variant too.
data a;
input id a;
datalines;
1 10
2 20
;
data b;
input id a;
datalines;
1 .
1 5
1 .
2 .
3 4
;
data c (drop=b);
merge a (rename = (a=b) in=ina) b (in = inb);
by id;
if b ne . then a = b;
run;
proc sql;
create table d as
select a.id, a.a from a right join b on a.id=b.id where a.id is not null
union all
select b.id, b.a from a right join b on a.id = b.id where a.id is null
;
quit;