Union tables with different columns - sas

I have two large tables (~1GB each) with many different columns on which I want to perform a union all in sas.
Currently, I use the following method with proc sql and union all.
SELECT A, B, '' as C from Table_1
UNION ALL
SELECT '' as A, B, C from Table_2
However, this is not ideal because both tables have dozens of columns and I am constantly adding more. I am therefore looking for a way to create the blank columns automatically, without writing each one out explicitly.
I also tried the following query:
select * from
(select * from Table_1),
(select * from Table_2)
However, this seems very computationally intensive and takes forever to run.
Are there any better ways to do this? I am also open to using a data step instead of proc sql.

A simple data step should do the trick:
data result_tab;
set Table_1 Table_2;
run;
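This reads both tables and writes them to result_tab, with the records from Table_2 appended after those from Table_1. The set statement declares the variables from both input tables, so a variable that exists in only one table is missing on the rows that come from the other.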

Unfortunately, PROC SQL does require all datasets to have the same variables when using UNION. If you can use a DATA step followed by PROC SORT NODUPKEY, that would be simplest (though maybe not the most efficient). To use PROC SQL, you need to assign missing (NULL) values to the variables a table lacks. For example:
data dset1;
input var1 var2;
datalines;
1 2
2 2
3 2
;
run;
data dset2;
input var1 var3;
datalines;
4 1
5 1
6 1
;
run;
PROC SQL;
CREATE TABLE dset3 AS
SELECT var1, var2, . AS var3 FROM dset1
UNION
SELECT var1, . AS var2, var3 FROM dset2;
QUIT;
PROC PRINT DATA=dset3; RUN;
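If PROC SQL is preferred, the OUTER UNION set operator with the CORR keyword may avoid listing the blank columns by hand: it lines columns up by name and keeps columns that exist in only one table, leaving them missing elsewhere. A minimal sketch, using small stand-in datasets named dset1 and dset2 as above (note that, like UNION ALL, it does not remove duplicate rows):

```sas
/* Stand-in data: var2 exists only in dset1, var3 only in dset2 */
data dset1;
  input var1 var2;
  datalines;
1 2
2 2
;
run;

data dset2;
  input var1 var3;
  datalines;
4 1
5 1
;
run;

proc sql;
  create table dset3 as
  select * from dset1
  outer union corr   /* CORR matches columns by name, not by position */
  select * from dset2;
quit;

proc print data=dset3; run;
```

New columns added to either table later are picked up by the select * without further changes to the query.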

Related

Put everything into sas sql

I have two pieces of code: a proc sql step, and a set of proc and data steps. The second works on the dataset created by the first.
Below is the proc sql lines.
create table new as select a.id,a.alid,b.pdate from tb a inner join
tb1 act on a.aid =act.aid left join tb2 as b on (r.alid=a.alid) where
a.did in (15,45); quit;
Below are the proc and data steps that work on the dataset new created above.
proc sort data = new uodupkey;
by alid;
data new1;
set new;
format ddate date9.
dat1=datepart(today);
datno=input(number,20.);
key=_n_;
rename alid blid;
run;
proc sort data=new1 nodupkey;
by datno dat1;
run;
I need to put everything into a single proc sql step.

You mention two data steps, but I only see one.
Anyway, your data step and proc sort can indeed be written as one SQL query (which you can then insert in your proc sql):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from new
group by datno, dat1
having key=min(key)
;
quit;
One remark, though. Your data step expects variables called ddate, today and number in your input dataset new. If that dataset is supposed to be the result of your first SQL query, then those variables don't exist, and their values, along with those of dat1 and datno in new1, will always be missing.
Also, I assume you misspelled nodupkey in your first proc sort.
EDIT: or, to have it all in the same query (if that's what you meant with "the same proc sql"):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from (
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act
on a.aid =act.aid
left join tb2 as b
on (r.alid=a.alid)
where a.did in (15,45)
)
group by datno, dat1
having key=min(key)
;
quit;

SAS date operation within proc sql

I'm very new to SAS, and am trying to modify this piece of code:
proc sql;
select a, Current_Date - 2 - b as some_date
from table
Current_Date is a function in SAS. I'm trying to replace it with my own date. a and b are column names in a database I'm connecting to. I tried changing Current_Date to '12nov2013'd, but it doesn't seem to work.
I tried:
1.
%let startFrom = '12nov2013'd;
proc sql;
select a, &startFrom - 2 - b as some_date
from table
This doesn't work.
2.
proc sql;
select a, input('12nov2013',date9.) - 2 - b as some_date
from table
This doesn't work either.
How should I do this date operation in SAS?
3.
proc sql;
select a, intck('DAY','19nov2015'd,rqo_tran_date_alt) as TranMonth
from table
This also doesn't work.

There is nothing really wrong with your first example, though it could be improved.
%let startFrom = '12nov2013'd;
data have;
input a $ b;
datalines;
Zero 0
Five 5
Ten 10
Hundred 100
;;;;
run;
proc sql;
select a, &startFrom - 2 - b as some_date format=date9.
from have;
quit;
Adding the format is pretty helpful usually, though not required. This assumes b is a numeric variable containing a number of days. If it contains anything else, or is character, this won't necessarily give you the correct result.
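One possible variation, shown only as a sketch: keep just the date text in the macro variable and build the date literal where it is used, by wrapping the reference in double quotes followed by d. The dataset have and the macro variable name startFrom are the ones from the example above; macro variables resolve inside double quotes but not inside single quotes.

```sas
%let startFrom = 12nov2013;   /* plain text, no quotes and no trailing d */

data have;
  input a $ b;
  datalines;
Zero 0
Five 5
Ten 10
;
run;

proc sql;
  /* "&startFrom"d resolves to the date literal "12nov2013"d */
  select a, "&startFrom"d - 2 - b as some_date format=date9.
  from have;
quit;
```

This keeps the macro variable reusable in places that need the bare text, such as titles or file names.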

How to count distinct of the concatenation/cross of two variables in SAS Proc Sql?

I know in teradata or other sql platforms you can find the count distinct of a combination of variables by doing:
select count(distinct x1||x2)
from db.table
And this will give the number of unique (x1, x2) pairs.
This syntax, however, does not work in proc sql.
Is there any way to perform such a count in proc sql?
Thanks.

That syntax works perfectly fine in PROC SQL.
proc sql;
select count(distinct name||sex)
from sashelp.class;
quit;
If the fields are numeric, you must convert them to character (using put), or use cat or one of its siblings, which happily take either numeric or character arguments.
proc sql;
select count(distinct cats(age,sex))
from sashelp.class;
quit;
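One caveat with concatenation is that cats strips blanks, so two different pairs such as ('AB','C') and ('A','BC') would produce the same string and be counted once. A hedged alternative that avoids concatenation altogether is to count the rows of a select distinct subquery; the sketch below uses sashelp.class as in the examples above, and n_pairs is just an illustrative column name.

```sas
proc sql;
  /* The inner query keeps one row per distinct (age, sex) pair;
     the outer query counts those rows. */
  select count(*) as n_pairs
  from (select distinct age, sex from sashelp.class);
quit;
```

This also works unchanged for a mix of numeric and character columns.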
This may be redundant, but when you mentioned "combination", it instantly triggered "permutation" in my mind. So here is one solution that differentiates the two:
DATA TEST;
INPUT (X1 X2) (:$8.);
CARDS;
A B
B A
C D
C D
;
PROC SQL;
SELECT COUNT(*) AS TOTAL, COUNT(DISTINCT CATS(X1,X2)) AS PERMUTATION,
COUNT(DISTINCT CATS(IFC(X1<=X2,X1,X2),IFC(X1>X2,X1,X2))) AS COMBINATION
FROM TEST;
QUIT;

Data step merge PROC SQL equivalent flagging which table record was found in

I merge two data sets as follows:
data ds3;
merge ds1(in=in1) ds2(in=in2);
by mrgvar;
if in1;
if in2 then flag=1;
run;
If I were to do this with a PROC SQL step instead, how can I set the flag variable as above?
proc sql;
create table ds3 as
select a.*
,b.*
,???
from ds1 as a
left join
ds2 as b
on a.mrgvar=b.mrgvar;
quit;
A common way is to use the table alias with the join variable.
proc sql;
create table ds3 as
select a.*
,b.*
,case when b.mrgvar is null then 0 else 1 end as flag
from ds1 as a
left join
ds2 as b
on a.mrgvar=b.mrgvar;
quit;
Something to that effect - if b.mrgvar is null/missing then it's only coming from table a. (Yes, you can separately reference the two even though they're basically the same and get combined in the result table.)
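For a self-contained illustration, the sketch below uses two made-up datasets with hypothetical variables x and y. It also leaves out the else branch, so that flag is missing for unmatched rows, as it is in the original data step, rather than 0; SAS writes a note to the log about the missing else clause, which is expected here.

```sas
/* Made-up sample data: mrgvar 1 exists only in ds1, mrgvar 4 only in ds2 */
data ds1;
  input mrgvar x;
  datalines;
1 10
2 20
3 30
;
run;

data ds2;
  input mrgvar y;
  datalines;
2 200
3 300
4 400
;
run;

proc sql;
  create table ds3 as
  select a.*
        ,b.y
        /* no else branch: flag stays missing when ds2 has no match */
        ,case when b.mrgvar is not null then 1 end as flag
  from ds1 as a
  left join ds2 as b
    on a.mrgvar = b.mrgvar;
quit;
```

Selecting b.y rather than b.* also avoids the warning about mrgvar already existing in the output table.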

SAS enterprise guide summing by personal ID

I have a dataset which has multiple obs per person. I want to have each single record showing the sum of a variable per person ID. However I do not want to group the data into single personal IDs. I hope the example below explains my question
I want to create the SUM column (shown in bold). How can I do this in SAS EG (or in SAS code if necessary)?
ID   Var1   SUM
X    10     30
X    20     30
Y    20     80
Y    20     80
Y    40     80
Z    30     30
You can do this using either proc sql or proc means
proc sql:
proc sql noprint;
create table new_table as
select distinct id, var1, sum(var_to_sum) as summed_var_name
from old_table
group by id
;
quit;
After rereading your question: with proc means you would need to merge var1 back in, so you are better off using the proc sql approach above.
proc means:
proc means data = old_table sum;
by id var1;
var var_to_sum;
output out = new_table sum = summed_var_name;
run;
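Applied to the data in the question, the proc sql approach might look like the sketch below. It assumes the value to sum is Var1 itself and uses sum_var1 as an illustrative output name. Because var1 is selected but not part of the group by, PROC SQL remerges the group total onto every original row (and says so in a log note); distinct is left out here so that the two identical Y rows are both kept.

```sas
data old_table;
  input id $ var1;
  datalines;
X 10
X 20
Y 20
Y 20
Y 40
Z 30
;
run;

proc sql;
  create table new_table as
  select id
        ,var1
        ,sum(var1) as sum_var1   /* total per id, repeated on each row */
  from old_table
  group by id;
quit;
```

The order of rows within each id is not guaranteed after the remerge, so add an order by clause if it matters.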