SAS proc compare, 2 versions of the same table - sas

I have two tables, version 1 and version n of the same dataset. The data can be in any format. I need to compare these tables, and find out the rows and values, which are different. There could be additional rows in the new tables.
My code:
proc compare
base=dk_upd
compare=dk_old
out=compare_test
outall
OUTNOEQUAL
allobs .
;
run;
Tried various options, couldn't find the needed combination

Related

Compare datasets and plot in sas

I have two variables in 2 separate datasets in sas. Both have a primary key of Customer_Id and another column say LVR . One dataset has old values for the LVR Column. The other one has values from the new calculation for the same column.
I need to show the differences between both on a graph.
I tried to merge them and then tried proc gplot to plot the two LVRs.
Merged dataset looks something like this :
Cust_id LVR_new LVR_old
111 1 2
222 2 .
333 5 4
The dataset containing LVR_new is almost twice in size (number of rows) than the one containing LVR_old.We got more customers qualifying post the new calculations.
The merged dataset has 3046778 observations and 3 variables.
I tried to use proc gplot using the code below:
proc plot data=djia;
plot LVR_old*LVR_new = Cust_id;
run;
This has been running since long so i don't expect the results are going to be very useful.
Can anyone please suggest how can I achieve this. I need to showcase the differences between the two datasets on a graph to be able to show the shift in the results.
Thanks!
Why not use PROC TTEST? There are some ODS GRAPHICS plots that PROC TTEST makes.
Your problem looks exactly liked the paired comparisons example in the documentation.
http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_ttest_examples03.htm
proc ttest;
paired LVR_old*LVR_new;
run;

Using Proc Compare to compare observations in columns from two different datasets

I'm trying to use proc compare to compare two columns from two different datasets. Below is a snippet of my code:
proc compare base=combT compare=groupT out=results allobs listobs;
var comb_Var ;
with descr;
run;
The issue it is saying there is nothing in common between the two columns, while I can see there are some common observation between the two columns. Any guidance will be helpful.
I'm using SAS EG and the observations in both columns are char type.

Filter a SAS dataset to contain only identifiers given in a list

I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting on of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to make a subquery that includes what variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in proc sql but probably simpler to use a merge in a data step with an in= statement.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A;
run;
You merge the two datasets together, and then using if A you only take the ID's that appear in the single column dataset. For this to work you have to merge on id_list which must be in both datasets, and both datasets must be sorted by id_list.
The problem with using a Data Step instead of a PROC SQL is that for the Data step the Data-set must be sorted on the variable used for the merge. If this is not yet the case, the complete Data-set must be sorted first.
If I have a very large SAS Data-set, which is not sorted on the variable to be merged, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the Data-set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large Data-sets from which you want only a small subset.

How to compare variables in more than two datasets in sas?

How to compare variables in more than two tables?
I even tried with compare but its not comparing more than two tables.
Using PROC COMPARE will not tell you much about the data aside from whether or not 2 tables have the same variable (which I assume is not the info you need).
You'll need to merge the tables together using MERGE. Your code will look something like this:
DATA TABLE1 TABLE2;
MERGE TABLE1 (IN=A) TABLE2 (IN=B);
BY VAR1;
IF A AND B THEN OUTPUT NEWTABLENAME;
ELSE IF A AND NOT B THEN OUTPUT NEWTABLENAME2;
RUN;
The BY statement tells SAS which variable you'd like the tables to merge on, as SAS isn't going to just smush some extra columns onto your existing table.
You can check out more about what PROC COMPARE does here
You can read more about PROC MERGE here

How to merge multiple datasets by common variable in SAS?

I have multiple datasets. Each of them has different number of attributes. I want to merge them all by common variable. This is 'union' if I use proc SQL. But there is hunderds of variables.
Example.
Dataset_Name Number of columns
dataset1 110
dataset2 120
dataset3 130
... ...
Say they have 100 columns in common. The final dataset which contains all dataset1,dataset2,dataset3..etc
only has common columns(in this case, 100 columns).
How do I do this?
And how do I get columns for each dataset this is not in common with the final dataset.
example: dataset1 will have 10 columns that are not in the final dataset, and list the name of 10 columns.
Thanks!!!!
UNION in SQL is equivalent to sequential SET in SAS.
data want;
set dataset1 dataset2 dataset3;
run;
Now, SAS by default includes all columns present in any dataset. To limit to just what's in all datasets, you have to use a keep statement.
You can determine this using proc sql, among other ways.
proc sql;
select name into :commonlist separated by ' '
from dictionary.columns C, dictionary.columns D
where C.libname=D.libname
and C.memname='DATASET1'
and D.memname='DATASET2'
and C.name=D.name
;
quit;
For more than two datasets it's more complicated and partially depends on your, but if you're comfortable in SQL you can figure that out pretty easily. A similar construct can create a list of just dataset 1 variables. The important part is the into :commonlist separated by ' ', which says to pull the select results into a macro variable called commonlist, separating rows by space. (The colon says to create a macro variable, not a table.)
So you can then run:
data want (keep=&commonlist.) dset1(keep=&dset1list.) dset2(keep=&dset2list.);
set dataset1(in=ds1) dataset2(in=ds2) dataset3(in=ds3);
output want;
if ds1 then output dset1;
else if ds2 then output dset2;
else if ds3 then output dset3;
run;
The in=xyz indicates which dataset a row came from. Each output dataset can have a separate list of variables to keep. You might want to keep the ID variable in those other datasets as well.
I will say that usually in SAS you don't do what you're doing here: it's not easy to do because it doesn't tend to be the best way to handle things - specifically, the little split off datasets. In general you would just keep those extra variables on the master dataset, and they'd just be nulls for anyone not in a dataset with that variable - assuming it makes sense to make this 'master' dataset at all.