SAS: Run SQL-query using a macro - sas

Using an answer from this thread, I was trying to get the following code to work. I have a table containing a list of SQL queries plus an id for each query. Now I'd like to collect the results of these queries, together with the id, in another table.
/* The Macro */
%macro run_query(q,id);
proc sql noprint;
select count into: count
from (&q.) a;
quit;
%mend;
/* Some fake-data */
DATA queries;
INFILE DATALINES DSD; /* INFILE must execute before INPUT for DSD to apply to the first record */
INFORMAT id $12.;
INPUT id :$12. query :$3000.;
DATALINES;
01,SELECT COUNT(*) AS count FROM sashelp.bweight WHERE Married=1
0101,SELECT COUNT(*) AS count FROM sashelp.bweight WHERE Boy=1
0102,SELECT COUNT(*) AS count FROM sashelp.bweight WHERE Black=1
;
RUN;
/* Make a copy of the dataset */
DATA want;
SET queries;
RUN;
/* Insert the results */
data want;
set queries;
call execute(%nrstr(%run_query('||query||','||id||')));
run;
Can anyone see what the problem is? The error report looks like this:

In the /* Insert the results */ part you're basically sending all your values/results to /dev/null with the data step:
data _null_;
Instead try:
data want;

You can try this for the second part.
Use PROC SQL inside a macro to extract the counts, then build the dataset from the macro variables count1, count2 and count3:
%macro a;
proc sql;
select count(*) into :count1 FROM sashelp.bweight WHERE Married=1;
SELECT COUNT(*) into :count2 FROM sashelp.bweight WHERE Boy=1;
SELECT COUNT(*) into :count3 FROM sashelp.bweight WHERE Black=1;
quit;
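/* DATALINES is not allowed inside a macro and never resolves macro variables, */
/* so the values are assigned directly instead */
DATA queries;
length id $12 query $3000;
id = "01"; query = strip("&count1"); output;
id = "0101"; query = strip("&count2"); output;
id = "0102"; query = strip("&count3"); output;
run;
%mend a;
/* Call the macro above */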
%a;
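Another route to the table the question asks for is to skip the macro and let the DATA step generate a single PROC SQL statement, one SELECT per row of queries, glued together with UNION ALL. Note that the CALL EXECUTE argument in the question is not a quoted string, which is one reason it cannot compile. The following is only a sketch under some assumptions: every stored query returns exactly one column named count, and neither the ids nor the query texts contain double quotes, ampersands or percent signs.

```sas
/* Generate: proc sql; create table want as select ... union all select ... ; quit; */
data _null_;
set queries end=last;
if _n_ = 1 then call execute('proc sql; create table want as');
else call execute('union all');
/* One SELECT per stored query, with its id attached as a character constant */
call execute(cat('select "', strip(id), '" as id length=12, count from (',
                 strip(query), ') as a'));
if last then call execute('; quit;');
run;

proc print data=want;
run;
```

Because the generated text contains no macro calls, CALL EXECUTE simply queues it and SAS runs it once the DATA step has finished, leaving one row per id in want.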

Related

Using PRXMATCH to match strings from another sas dataset

Need your assistance and guidance. Please see below
*rsubmit;proc sql;
connect to teradata(user=&user_id. password=&user_pwd.);
create table mylib.DWH_table as select * from connection to teradata(
select distinct nm from DWH_table
);
quit;*endrsubmit;
*rsubmit;
DATA mylib.out_sas1;
set mylib.DWH_table;
if prxmatch ("m/studio/i",nm) > 0;
run;*endrsubmit;
So the above code checks for the word "studio" in the column nm and returns the results. However, this is a manual process that needs to be automated. I have another dataset that contains just one column named "KEYWORDS". Some sample data is given below.
KEYWORDS:
apple
mango
banana
grapes
The goal is that SAS should take the word in the column and compare it to the value in the database and create a separate output table.
So for example:
*rsubmit;
DATA mylib.out_sas2;
set mylib.DWH_table;
if prxmatch ("m/apple/i",nm) > 0;
run;*endrsubmit;
*rsubmit;
DATA mylib.out_sas3;
set mylib.DWH_table;
if prxmatch ("m/mango/i",nm) > 0;
run;*endrsubmit;
Can this be done in SAS?
Put your keywords in macro variables
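proc sql noprint;
/* TRIMMED (SAS 9.3 and later) removes the leading blanks that would break key_&no_keys */
select count(distinct KEYWORDS)
into :no_keys trimmed
from mylib.MY_KEYWORDS;
select distinct KEYWORDS
into :key_1 - :key_&no_keys
from mylib.MY_KEYWORDS;
quit;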
Now use those macro variables
%macro find_keywords;
data
%do key_nr = 1 %to &no_keys;
mylib.out_sas&key_nr (drop = UP_nm)
%end;
;
set mylib.DWH_table;
UP_nm = upcase(nm);
%do key_nr = 1 %to &no_keys;
keyword = "&&key_&key_nr";
if prxmatch ("m/&&key_&key_nr/i",UP_nm) > 0 then output mylib.out_sas&key_nr;
%end;
run;
%mend;
%find_keywords;
You need to embed this in a macro, because you cannot use %do ... %end; in "open" code. && resolves to &, which makes it a delayed &, that is resolved after resolving &key_nr.
Disclaimer: this code is not tested. If you have trouble getting it running, please respond.
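The delayed resolution of && described above can be checked in isolation. A minimal sketch, using made-up values for the macro variables:

```sas
%let key_1 = apple;
%let key_2 = mango;
%let key_nr = 2;

/* Pass 1: && becomes &, and &key_nr becomes 2, leaving &key_2 */
/* Pass 2: &key_2 resolves to its value */
%put &&key_&key_nr;
```

The %put statement writes mango to the log.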
Consider a macro call via a data step using CALL EXECUTE:
%macro subset_data(key);
%let name_unquoted = %qsysfunc(compress(&key., %str(%")));
data mylib.out_&name_unquoted.;
set mylib.DWH_table;
if prxmatch ("m/"||trim(&key.)||"/i",nm) > 0;
run;
%mend;
data _null_;
set mydata;
call execute('%nrstr(%subset_data("'||KEYWORDS||'"))');
run;
Alternatively, instead of call execute, create a SAS script file of macro calls, then run with %include:
data _null_;
set mydata;
file "Temp.sas" ;
put '%subset_data("' KEYWORDS '") ;' ;
run;
%include "Temp.sas";
But if there are many keywords (tens, hundreds or thousands), consider @Richard's comment above and build an indicator column in one concatenated dataset via a temporary helper dataset:
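%macro subset_data(key);
*** BUILD temp WITH INDICATOR;
data temp;
set mylib.DWH_table;
length keyword $200; /* assumed upper bound, so every call creates the same length */
if prxmatch ("m/"||trim(&key.)||"/i",nm) > 0;
keyword = &key.;
run;
*** APPEND temp, since the base table does not exist before the first call;
proc append base=mylib.subset_data data=temp;
run;
%mend;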
Reproducible Example (using sashelp.class dataset)
proc contents data = sashelp.class; run;
%macro subset_data(key);
%let name_unquoted = %qsysfunc(compress(&key.,%str(%")));
data &name_unquoted.;
set sashelp.class;
if prxmatch("m/"||trim(&key.)||"/i", Name) > 0;
run;
%mend;
data keywords;
input id keyword $;
datalines;
1 w
2 u
3 y
;
data _null_;
set keywords;
call execute('%nrstr(%subset_data("'||keyword||'"))');
run;
proc sql version
%macro subset_data(key);
%let name_unquoted = %qsysfunc(compress(&key., %str(%")));
proc sql;
create table &name_unquoted. as
select * from mylib.DWH_table
where nm like "%" || trim(&key.) || "%";
/* alternative: where index(nm, trim(&key.)) > 0 */
quit;
%mend;
proc sql (with SAS## datasets)
data keywords;
set keywords;
dname = cats("sas", _n_);
run;
%macro subset_data(key, dname);
%let name_unquoted = %qsysfunc(compress(&dname.,%str(%")));
proc sql;
create table &name_unquoted. as
select * from mylib.DWH_table
where nm like "%" || trim(&key.) || "%";
/* alternative: where index(nm, trim(&key.)) > 0 */
quit;
%mend;
data _null_;
set keywords;
call execute('%nrstr(%subset_data("'||keyword||'", "'||dname||'"))');
run;
One idea is to perform a cross join with an is-match criterion. The result is a single table with one row per name/noun match.
Sample data and code:
data names;
length name $80;
infile cards length=L;
input name $varying. L;
datalines;
Bob
Bob's Burgers
Angel
Angle iron city
Chad
Chadwicks town council
Dutch
Edward
run;
data nouns;
length noun $10;
infile cards length=L;
input noun $varying. L;
datalines;
chad
own
ward
burger
run;
/*
* might want to pre-lowercase the data being matched up
data lower_names;
set names;
lower_name = lowcase(name);
run;
data lower_nouns;
set nouns;
lower_noun = lowcase(noun);
run;
*/
proc sql;
create table want as
select name, noun
from names as NAMES
cross join nouns as NOUNS
where index(lowcase(NAMES.name),lowcase(trim(NOUNS.noun))) >= 1 /* SAS INDEX() result: 1 or higher means noun is present */
;
quit;
Regardless of your approach there will be a lot of activity. Suppose there are 100 nouns to be checked against all names: that's 26M names x 100 nouns = 2.6B is-match evaluations. The system with the most power and available resources will usually get you the fastest answer.
Case 1: SAS installation better
Download names to SAS
cross join names to nouns in SAS
Case 2: Teradata installation is better
Upload nouns to Teradata
cross join names to nouns in Teradata (via passthrough SQL)
Case 1 code:
Proc SQL;
connect to teradata (user=&user_id. password=&user_pwd.);
* download names;
create table mylib.DWH_names as
select * from connection to Teradata (
select distinct nm from DWH_table
);
create table work.NameNounMatches as
select
nm,
noun
from
mylib.dwh_names as NAMES
cross join
mylib.nouns as NOUNS
where
INDEX(lowcase(NAMES.nm),lowcase(trim(NOUNS.noun))) >= 1
;
quit;
Case 2 code:
Teradata temp table -- Upload (connection=global) from Tom on https://communities.sas.com/t5/SAS-Enterprise-Guide/SAS-Access-to-Teradata-How-to-create-Temporary-tables-in/td-p/228852
libname tdwork teradata username=&username password=&password server=&server
connection=global dbmstemp=yes
;
data tdwork.NOUNS_UPLOADED;
set mylib.nouns;
run;
* cross join in Teradata via passthrough;
proc sql;
connect using tdwork;
create table work.NameNounMatches as
select * from connection to tdwork
( select NAMES_LIST.nm, NOUNS_LIST.noun
from DWH_table as NAMES_LIST
cross join NOUNS_UPLOADED as NOUNS_LIST
where POSITION(TRIM(NOUNS_LIST.noun) IN NAMES_LIST.nm) >= 1
);
quit;

How do I calculate range of a variable in SAS?

I have a table in SAS which has a variable say X.
I only want to know the range of X. I used PROC UNIVARIATE, but it gives out a lot of other information.
I have been trying to use the RANGE function in the following way, but it doesn't yield any result. Please help!
DATA DATASET2;
SET DATASET1;
R=RANGE(X);
KEEP R;
RUN;
PROC PRINT DATASET2;
RUN;
The RANGE function works across the arguments within a row, but you applied it to a single column, so you probably got zeros.
In a DATA step the RANGE function is used as follows:
R = range(x,y,z);
To get the range within a column, use PROC MEANS:
proc means data=sashelp.class range maxdec=2;
var age;
run;
Or use PROC SQL, as shown below:
proc sql;
select max(age) - min(age) as range
from sashelp.class;
quit;
You can also use the range function in proc sql, where it operates on columns rather than rows:
proc sql;
select range(age) from sashelp.class;
quit;
This is also possible within a data step, if you don't like sql:
data _null_;
set sashelp.class end = eof;
retain min_age max_age;
min_age = min(age,min_age);
max_age = max(age,max_age);
if eof then do;
range = max_age - min_age;
put range= min_age= max_age=;
end;
run;
Or equivalently:
data _null_;
do until (eof);
set sashelp.class end = eof;
min_age = min(age,min_age);
max_age = max(age,max_age);
end;
range = max_age - min_age;
put range= min_age= max_age=;
run;
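If the range is needed in a dataset rather than in printed output, as DATASET2 in the question was meant to be, PROC MEANS can write it out with an OUTPUT statement. A minimal sketch, reusing the sashelp.class example from above; the output dataset name age_range is just a placeholder:

```sas
/* NOPRINT suppresses the report; OUTPUT writes the statistic to a dataset */
proc means data=sashelp.class noprint;
var age;
output out=age_range(keep=R) range=R;
run;

proc print data=age_range;
run;
```

The KEEP= option drops the automatic _TYPE_ and _FREQ_ variables, so age_range holds a single row with the single variable R.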

SAS - Add origin table name as a column in report

I have an output table that contains 300+ variables from 30 different tables that are joined by UNION, which is used for modelling. I have created a macro that creates a report with a number of statistics, such as mean, min/max values etc. using this output table. I am trying to add a column to the report that details which table(s) the variables come from. I say table(s) as some of the variables are shared across different tables. I want to avoid having the same variable in the report multiple times as the statistics are the same irrespective of what table the variable comes from. Is there an efficient way to do this?
Instead of UNION, consider using a DATA step with the INDSNAME= option:
data want;
set sashelp.class sashelp.cars indsname=source;
source_dataset = source;
run;
If it were me, I would loop over each of the union datasets and put the table name and variable names into one compiled dataset. You probably have all the table names either in a macro list or typed out, so you only need a few more lines of code to run PROC CONTENTS on each of them and compile a full list of table and variable names. Note that, as in your example, there will be duplicate variable names, which you can deal with after the table is compiled:
** create different tables **;
data height; set sashelp.class(keep=name height); run;
data weight; set sashelp.class(keep=name weight); run;
data sex; set sashelp.class(keep=name sex); run;
** put your datasets into a list either manually or dynamically **;
/* manually */
%let ds_list=height weight sex;
/* dynamically -- be careful to include only tables in your union */
proc sql noprint;
select MEMNAME
into: ds_list separated by " "
from sashelp.vmember
where libname = "WORK" and memname not in ("SASMACR","FORMATS");
quit;
%put &ds_list.;
** loop over each table to put the table name and variables in a dataset **;
%MACRO get_names(ds_list);
%do i=1 %to %sysfunc(countw(&ds_list.));
%let ds = %scan(&ds_list.,&i.);
proc contents data = &ds. noprint
out=names_&ds.(keep=MEMNAME NAME rename=(MEMNAME=SOURCE_DATASET));
run;
proc append data = names_&ds. base=full force; run;
%end;
%MEND;
%get_names(&ds_list.);
I managed to do this using the following:
Create table with source tables.
PROC SQL;
CREATE TABLE SOURCES AS
SELECT NAME
,MEMNAME
FROM DICTIONARY.COLUMNS
WHERE LIBNAME='LIBNAME'
ORDER BY 1,2;
QUIT;
Join to my stats table.
PROC SQL;
CREATE TABLE STATS_NEW AS
SELECT memname AS TABLE_NAME,a.*
FROM STATS a
LEFT JOIN SOURCES b
ON a.name = b.name
GROUP BY a.name
ORDER BY a.name;
QUIT;
Transpose data and add in comma separators.
DATA STATS_TRANSPOSE (drop=TABLE_NAME);
LENGTH INPUT_TABLES $1000;
SET STATS_NEW;
BY name;
RETAIN INPUT_TABLES;
IF FIRST.name THEN DO; INPUT_TABLES=TABLE_NAME; END;
IF NOT FIRST.name
THEN DO;
INPUT_TABLES=CATX(', ',INPUT_TABLES,TABLE_NAME); /* CATS would strip the blank from the separator */
END;
IF LAST.name THEN DO;
IF name IN ('FIELD1','FIELD2')
THEN DO; INPUT_TABLES='ALL'; END;
OUTPUT;
END;
RUN;

replacing field name suffixes in bulk

I have a dataset where I have several variables with suffixes that correspond to given dates. I want to replace the suffixes with the dates to make my output tables more user friendly.
Here is a sample.
The fields in my sales dataset are:
product number_of_sales_1 number_of_sales_2 number_of_sales_3 revenue_1 revenue_2 revenue_3 tax_1 tax_2 tax_3
The suffixes 1,2,3 correspond to dates which are held in a second dataset with the following format
dates
id date
1 01Apr
2 01May
3 01Jun
I want to bulk replace the suffixes with the dates so my fields in sales become
product number_of_sales_01Apr number_of_sales_01May number_of_sales_01Jun revenue_01Apr revenue_01May revenue_01Jun tax_01Apr tax_01May tax_01Jun
Both the number of dates and the number of metrics in sales are dynamic, so I can't just hardcode the names.
I assume your datasets look like below:
data sales;
product="abc";number_of_sales_1=1;number_of_sales_2=2;number_of_sales_3=3;
revenue_1=1000;revenue_2=2000;revenue_3=3000;tax_1=100;tax_2=200;tax_3=300;
run;
data dates;
id=1;date="01Apr";output;id=2;date="01May";output;id=3;date="01Jun";output;
run;
1st step - Find the date variables that need to be renamed
proc contents data=sales out=sales_temp(keep=name) noprint; run;
data sales_temp1;
length check_date_vars $1. id 8.;
set sales_temp;
check_date_vars=compress(substr(name,length(name)));
temp=notdigit(check_date_vars);
if temp=0 then id=check_date_vars;
run;
2nd step - Merge the above dataset with the dataset that contains the dates to create a mapping between old names and new names, then create macro variables from it
proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
new_name=substr(name,1,length(name)-1)||date;
run;
proc sql noprint;
select count(*) into :num_vars separated by " " from sales_temp_date;
quit;
proc sql noprint;
select name into:old_name1 - :old_name&num_vars. from sales_temp_date;
select new_name into:new_name1 - :new_name&num_vars. from sales_temp_date;
quit;
3rd step - Rename the variables
%macro rename();
proc datasets library=work nolist;
modify sales;
rename
%do i=1 %to &num_vars.;
&&old_name&i.= &&new_name&i.
%end;
;
run;
quit;
%mend;
%rename;
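The first step above reads only the last character of each variable name, so it works for at most nine dates. A possible variation for longer suffixes is sketched below; it assumes the suffix is always the part after the last underscore, and the variables suffix and stem are helper names introduced here. It would replace the first step and the merge in the second step, while the macro variable creation and the rename stay as they are.

```sas
proc contents data=sales out=sales_temp(keep=name) noprint; run;

/* Split each name at the last underscore into a stem and a numeric suffix */
data sales_temp1;
set sales_temp;
length suffix stem $32 id 8;
suffix = scan(name, -1, '_');
if notdigit(strip(suffix)) = 0 then do;
id = input(suffix, 8.);
stem = substr(name, 1, length(name) - length(suffix));
end;
run;

proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;

/* Map old names to new names by joining on the numeric suffix */
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
length new_name $32;
new_name = cats(stem, date);
run;
```

Variables without a numeric suffix, such as product, keep a missing id and drop out in the merge, as in the original steps.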

group by in sas

I've the below dataset as input
ID
--
1
2
2
3
4
4
4
5
And need a new dataset as below
ID count of ID
-- -----------
1 1
2 2
3 1
4 3
5 1
Could you please tell me how to do this in SAS without using PROC SQL?
How about PROC FREQ or PROC SUMMARY? These avoid having to presort the data.
proc freq data=have noprint;
table id / out=want1 (drop=percent);
run;
proc summary data=have nway;
class id;
output out=want2 (drop=_type_);
run;
proc sql noprint;
create table test as select distinct id, count(id) as count
from your_table
group by ID
order by ID
;
quit;
Try this:
DATA Have;
input id ;
datalines;
1
2
2
3
4
4
4
5
;
Proc Sort data=Have;
by ID;
run;
Data Want;
Set Have;
By ID;
If first.ID then Count=0;
Count+1;
If Last.ID then Output;
Run;
PROC SORT DATA=YOURS;
BY ID; RUN;
/* NOPRINT belongs on PROC MEANS, not PROC SORT; N=N names the count variable N */
PROC MEANS DATA=YOURS NOPRINT;
VAR ID;
BY ID;
OUTPUT OUT=NEWDATASET N=N; RUN;
You can also choose to keep only the Id and N variables in your newdataset.
We can use simple PROC SQL count to do this:
proc sql;
create table want as
select id, count(id) as count_of_id
from have
group by id;
quit;
Here is yet another possibility, often known as a DoW construction:
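Data want;
/* count goes up by 1 for every row read and the loop stops on the last row of each ID, */
/* so the implicit OUTPUT at the end of the step writes one row per ID (have must be sorted by id) */
do count=1 by 1 until(last.ID);
set have;
by id;
end;
run;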
If the aggregation you want to do is complex, go with PROC SQL, since most of us are more familiar with GROUP BY in SQL:
proc sql ;
create table solution_1 as select distinct ID, count(ID) as count
from table_1
group by ID
order by ID
;
quit;
OR
If you are using SAS EG, the Query Builder is very useful for small analyses.
Just drag and drop the columns you want to aggregate and, in the summary option, select whatever operation you want to perform, such as Avg, Count, Miss or NMiss.