Extract data using SQL, analyse and write back to SQL table - sas

I am a newbie to SAS Base, and I am struggling to create a simple program that extracts data from a table on my database, runs e.g. PROC MEANS, and writes the data back to the table.
I know how to use PROC SQL (read and update tables) and PROC MEANS, but I can't figure out how to combine the steps.
PROC SQL;
SELECT make,model,type,invoice,horsepower
FROM
SASHELP.CARS
;
QUIT;
PROC Means;
RUN;
What I want to accomplish is to create an additional column in the dataset with, e.g., the mean of horsepower, and in the end I want to write that computed column to the table on the database.
Edit
What I was looking for is this:
PROC SQL;
create table want as
select make,model,type,invoice,horsepower
, mean(horsepower) as mean_horsepower
from sashelp.cars
;
QUIT;
PROC MEANS DATA=want;
RUN;

SAS makes this very easy to do with SQL, since PROC SQL automatically remerges summary statistics back onto the detail records.
proc sql;
create table want as
select make,model,type,invoice,horsepower
, mean(horsepower) as mean_horsepower
from sashelp.cars
;
quit;
Or using normal SAS code.
proc means data=sashelp.cars nway noprint ;
var horsepower ;
output out=mean_horsepower mean=mean_horsepower ;
run;
data want ;
set sashelp.cars ;
if _n_=1 then set mean_horsepower (keep=mean_horsepower);
run;
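The same remerge behaviour also works per group. As a minimal sketch (the table name want_by_make and the column name mean_horsepower_make are made up for illustration), adding a GROUP BY attaches each make's mean horsepower to every car of that make:

```sas
proc sql;
  /* one output row per car; the summary value is computed per MAKE */
  create table want_by_make as
  select make, model, type, invoice, horsepower
       , mean(horsepower) as mean_horsepower_make
  from sashelp.cars
  group by make
  ;
quit;
```

SAS should write a note to the log saying the query required remerging summary statistics with the original data, which is expected here.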

Related

SAS - Change date format returned from database

I'm pulling data from many Teradata tables that have dates stored in MM/DD/YYYY format (ex: 8/21/2003, 10/7/2013). SAS returns them as DDMMMYYYY, or DATE9 format (ex: 21AUG2003, 07OCT2013). Is there a way to force SAS to return date variables as MM/DD/YYYY, or MMDDYY10 format? I know I can manually specify this for specific columns, but I have a macro set up to execute the same query for 65 different tables:
%macro query(x);
proc sql;
connect using dbase;
create table &x. as select * from connection to dbase
(select *
from table.&x.);
disconnect from dbase;
quit;
%mend query;
%query(bankaccount);
%query(budgetcat);
%query(timeattendance);
Some of these tables will have date variables and some won't. So I'd like the value to be returned as MMDDYY10 format by default. Thanks for your help!
Per the comments on my question, I was able to figure this out using the FMTINFO function. I used essentially the code below, which applies DDMMYYS10.; substitute MMDDYY10. to get the format asked about above:
proc contents data=mylib._all_ noprint out=contents;
run;
data _null_;
set contents;
where fmtinfo(format,'cat')='date';
by libname memname ;
if first.libname then call execute(catx(' ','proc datasets nolist lib=',libname,';')) ;
if first.memname then call execute(catx(' ','modify',memname,';format',name)) ;
else call execute(' '||trim(name)) ;
if last.memname then call execute(' DDMMYYS10.; run;') ;
if last.libname then call execute('quit;') ;
run;
Found here:
https://communities.sas.com/t5/SAS-Procedures/Change-DATE-formats-to-DDMMYYS10-for-ALL-unknown-number-date/td-p/366637
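To make the CALL EXECUTE logic easier to follow, here is roughly what the data step above generates for one table. This is only a sketch: the column names open_date and close_date are hypothetical, and MMDDYY10. has been swapped in for DDMMYYS10. to match the original question.

```sas
/* generated once per library */
proc datasets nolist lib=mylib;
  /* generated once per dataset, listing every date-formatted column */
  modify bankaccount;
  format open_date close_date MMDDYY10.;
  run;
quit;
```

Because PROC DATASETS only changes the metadata, the data itself is not rewritten.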

Out of Memory using PROC FREQ

I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.
I am not able to directly code in column names in PROC SQL and count distinct as I have numerous data sets with different column names and I'm trying to automatically return the desired outcome for all tables with one piece of code.
I've tried running the following code
proc freq nlevels data= &DATASET_NAME;
ods output nlevels=nlevels ;
tables _all_ / noprint;
run;
This returns an out of memory error. Is there another way to achieve the result that avoids the out of memory error?
Using tables _all_ saves you from typing the column names, but processing every column at once may be what exhausts memory. Try running PROC FREQ one column at a time and then combining the results:
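proc sql;
create table name as
select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
/* start from an empty WANT so the appends below have a base table */
data want;
stop;
run;
/* run PROC FREQ once per column and append each NLEVELS result to WANT */
data _null_;
set name;
call execute(
'proc freq data=sashelp.class nlevels;
table '||name||';
ods output nlevels=nlevels;
run;
data want;
set want nlevels;
run;'
);
run;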
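Another option, shown here only as a sketch, is to let PROC SQL generate the COUNT(DISTINCT ...) expressions from dictionary.columns so that no column names are typed by hand. The macro variable distinct_list and the table distinct_counts are made-up names, and whether this uses less memory than PROC FREQ on your data is an assumption you would need to test.

```sas
proc sql noprint;
  /* build "count(distinct col) as col" for every column of the table */
  select catx(' ', 'count(distinct', name, ') as', name)
    into :distinct_list separated by ', '
  from dictionary.columns
  where libname='SASHELP' and memname='CLASS';

  /* one row holding the number of distinct non-missing values per column */
  create table distinct_counts as
  select &distinct_list
  from sashelp.class;
quit;
```

COUNT(DISTINCT ...) ignores missing values, so a missing-value check would need a separate expression such as count(*) - count(col) per column.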
This question is very similar to "SAS summary statistic from a dataset".
The answers there cover techniques for:
transpose + freq
hash
freq w/ ODS exclude+output

Including proc sort in proc sql

The sample data below is from an Oracle database:
promo flag
vijay a
vijay b
vijay c
sam b
sam g
sam c
I have one PROC SQL step that connects to Oracle (I have left the Oracle connection out below):
proc sql;
create table a as select * from new;
quit;
It is followed by two PROC SORT steps based on the dataset a above:
proc sort data = a;
by promo descending flag;
run;
proc sort data =a nodupkey out =new1;
by promo;
run;
Now I want to do the work of these two PROC SORT steps inside the PROC SQL step itself. Any idea how to do that?
proc sql;
create table want as
select distinct promo,flag from new group by promo having flag=max(flag);
quit;
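For testing, the sample data can be recreated locally and the same result obtained with a plain aggregate. This is a sketch that assumes, as in the sample, only promo and flag need to be kept; the output table name want_max is made up.

```sas
/* recreate the sample rows from the question */
data new;
  input promo $ flag $;
  datalines;
vijay a
vijay b
vijay c
sam b
sam g
sam c
;
run;

proc sql;
  /* one row per promo, keeping the highest flag value */
  create table want_max as
  select promo, max(flag) as flag
  from new
  group by promo;
quit;
```

If other columns from the winning row are needed as well, the HAVING flag=max(flag) form above is the one to use, since it returns the full matching rows.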

Put everything into sas sql

I have two pieces of code: one PROC SQL step, and a group of PROC and data steps. The datasets they use are linked.
Below is the PROC SQL step.
proc sql;
create table new as
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act on a.aid =act.aid
left join tb2 as b on (b.alid=a.alid)
where a.did in (15,45);
quit;
Below is the proc and datasteps created from above datatset new.
proc sort data = new uodupkey;
by alid;
data new1;
set new;
format ddate date9.;
dat1=datepart(today);
datno=input(number,20.);
key=_n_;
rename alid blid;
run;
proc sort data=new1 nodupkey;
by datno dat1;
run;
I need to put everything into a single PROC SQL step.
You mention two data steps but I only see one.
Anyway, your data step and proc sort can indeed be written in one sql query (which you can then insert in your proc sql):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from new
group by datno, dat1
having key=min(key)
;
quit;
One remark, though: your data step expects variables called ddate, today and number in your input dataset new. If that dataset is supposed to be the result of your first SQL query, then those variables don't exist, and their values, along with those of dat1 and datno in new1, will always be missing.
Also I assume you misspelled nodupkey on your proc sort.
EDIT: or, to have it all in the same query (if that's what you meant with "the same proc sql"):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from (
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act
on a.aid =act.aid
left join tb2 as b
on (b.alid=a.alid)
where a.did in (15,45)
)
group by datno, dat1
having key=min(key)
;
quit;

SAS equivalent to R’s is.element()

It’s the first time that I’ve opened sas today and I’m looking at some code a colleague wrote.
So let’s say I have some data (import) where duplicates occur but I want only those which have a unique number named VTNR.
First she looks for unique numbers:
data M.import;
set M.import;
by VTNR;
if first.VTNR=1 then unique=1;
run;
Then she creates a table with the duplicated numbers:
data M.import_dup1;
set M.import;
where unique^=1;
run;
And finally a table with all duplicates.
But here she is really hardcoding the numbers, so for example:
data M.import_dup2;
set M.import;
where VTNR in (130001292951,130100975613,130107546425,130108026864,130131307133,130134696722,130136267001,130137413257,130137839451,130138291041);
run;
I’m sure there must be a better way.
Since I’m only familiar with R I would write something like:
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
I guess there must be something like the $ also for sas?
To me it looks like the most direct translation of the R code
import_dup2 <- subset(import, is.element(import$VTNR, import_dup1$VTNR))
would be to use SQL code:
proc sql;
create table import_dup2 as
select * from import
where VTNR in (select VTNR from import_dup1)
;
quit;
But if your intent is to find the observations in IMPORT that have more than one observation per VTNR value there is no need to first create some other table.
data import_dup2 ;
set import;
by VTNR ;
if not (first.VTNR and last.VTNR);
run;
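The BY statement in that data step relies on IMPORT already being sorted by VTNR. A minimal sketch that sorts first and splits unique and duplicated VTNR values in one pass (import_sorted and import_unique are made-up names):

```sas
/* BY-group processing needs the data in VTNR order */
proc sort data=import out=import_sorted;
  by VTNR;
run;

data import_unique import_dup2;
  set import_sorted;
  by VTNR;
  /* a VTNR that is both first and last in its group occurs exactly once */
  if first.VTNR and last.VTNR then output import_unique;
  else output import_dup2;
run;
```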
I would use the options in PROC SORT.
Make sure to specify an OUT= dataset otherwise you'll overwrite your original data.
/*Generate fake data with dups*/
data class;
set sashelp.class sashelp.class(obs=5);
run;
/*Create unique and dup dataset*/
proc sort data=class nouniquekey uniqueout=uniquerecs out=dups;
by name;
run;
/*Display results - for demo*/
proc print data=uniquerecs;
title 'Unique Records';
run;
proc print data=dups;
title 'Duplicate Records';
run;
The above solution gives you the duplicates but not the unique values. There are many possible ways to get both in SAS. A SQL solution is very easy to understand.
proc sql;
create table no_duplicates as
select *
from import
group by VTNR
having count(*) = 1
;
create table all_duplicates as
select *
from import
group by VTNR
having count(*) > 1
;
quit;
I would use Reeza's or Tom's solution, but for completeness, the solution most similar to R (and your preexisting code) would be three steps. Again, I wouldn't use this here, it's excess work for something you can do more easily, but the concept is helpful in other situations.
First, get the dataset of duplicates - either her method, or proc sort.
proc sort nodupkey data=have out=nodups dupout=dups;
by byvar;
run;
Then pull those into a macro list:
proc sql;
select byvar
into :duplist separated by ','
from dups;
quit;
Then you have them in &duplist. and can use them like so:
data want;
set have;
if not (byvar in (&duplist.));
run;
data want;
set import;
where VTNR in import_dup1;
run;