SAS conditional execution of proc sql - sas

I have created a stored process in SAS that prompts the user to select a month/year combination which looks like this 2015_10. Then from the next box they can click on a calendar and select a startdate which is a timestamp and an enddate also a timestamp.
I would like to combine this into one step, where the user only selects the start and end date. However, my source table is in SQL Server, and the tables are partitioned by months, and the tables are named like this datatabel_2015_10 where the last two digits represent the month. Once the user selects the month, I have proc sql query that table, and then after that there is another query to pull only the rows which fall between the start date and end date, those are time1 and time2 stored as character strings in MS SQL Server which look like this 30JAN2015:19:52:29
How can I code this up so as to eliminate the month/year prompt and only have two selections, namely startdatetime and enddatetime, and still get the query.
Concatenating the monthly tables is not an option because they are huge and runs forever even if I use a pass through query.
Please help.
Thanks
LIBNAME SWAPPLI ODBC ACCESS=READONLY read_lock_type=nolock noprompt="driver=SQL server; server=XXX; Trusted Connection=yes; database=XXX" schema='dbo';
proc sql;
create table a as
select
startdate_time,
enddate_time
from SWAPPLI.SQL_DB_2015_10
quit;
proc sort data=a out=b;
by startdate_time, enddate_time
where enddate_time between "&startdate"dt and "&enddate"dt;
run;

You can delete a step by asking directly the timestamp begin and end.
Then you can deduce the YEAR and MONTH from the timestamp selected.
For &startdate = 30JAN2015:19:52:29, you can create 2 macro-variable by creating 2 substring following this :
%let startdate = 30JAN2015:19:52:29;
%let YEAR=%SUBSTR("&startdate",4,3);
%let MONTH=%SUBSTR("&startdate",7,4);
%PUT YEAR=&YEAR;
%PUT MONTH=&MONTH;
Result :
%PUT YEAR=&YEAR;
YEAR=JAN
%PUT MONTH=&MONTH;
MONTH=2015
Then you can create a %if condition to match JAN with 01, FEB with 02, ect...
Here MONTH will be 01
So you don't need to ask the information 2 times any more.
Then you can select your dataset by doing this :
proc sql;
create table a as
select
startdate_time,
enddate_time
from SWAPPLI.SQL_DB_&YEAR._&MONTH.
quit;
proc sort data=a out=b;
by startdate_time, enddate_time
where enddate_time between "&startdate"dt and "&enddate"dt;
run;
You should probably restrict the selection to a MONTH and not allow the selection through several month like start=01JAN and end=01MAR. Because the selection will be only on the dataset SQL_DB_2017_01 and not taking into account the end=01MAR.

Related

Iteratively adding to merged SAS dataset

I have 18 separate datasets that contain similar information: patient ID, number of 30-day equivalents, and total day supply of those 30-day equivalents. I've output these from a dataset that contains those 3 variables plus the medication class (VA_CLASS) and the quarter it was captured in (a total of 6 quarters).
Here's how I've created the 18 separate datasets from the snip of the dataset shown above:
%macro rx(class,num);
proc sql;
create table dm_sum&clas._qtr&num as select PatID,
sum(equiv_30) as equiv_30_&class._&num
from dm_qtrs
where va_class = "HS&class" and dm_qtr = &qtr
group by 1;
quit;
%mend;
%rx(500,1);
%rx(500,2);
%rx(500,3);
%rx(500,4);
%rx(500,5);
%rx(500,6);
%rx(501,1);
and so on...
I then need to merge all 18 datasets back together by PatID and what I'd like to do is iteratively add the next dataset created to the previous, as in, add dataset dm_sum_500_qtr3 to a file that already contains the results of dm_sum_500_qtr1 & dm_sum_500_qtr1.
Thanks for looking, Brian
In the macro append the created data set to it an accumulator data set. Be sure to delete it before starting so there is a fresh accumulation. If the process is run at different times (like weekly or monthly) you may want to incorporate a unique index to prevent repeated appendings. If you are stacking all these sums, the create table should also select va_class and dm_qtr
%macro (class, num, stack=perm.allClassNumSums);
proc sql; create table dm_sum&clas._qtr&num as … ;
proc append force base=perm.allClassNumSums data=dm_sum&clas._qtr#
run;
%mend;
proc sql;
drop table perm.allClassNumSums;
%rx(500,1)
%rx(500,2)
%rx(500,3)
%rx(500,4)
%rx(500,5)
…
A better approach might be a single query with an larger where, and leave the class and qtr as categorical variables. Your current approach is moving data (class and qtr) into metadata (column names). Such a transformation makes additional downstream processing more difficult.
Proc TABULATE or REPORT can be use a CLASS statement to assist the creation of output having category based columns. These procedures might even be able to work directly with the original data set and not require a preparatory SQL query.
proc sql;
create table want as
select
PatID, va_class, dm_qtr,
sum(equiv_30) as equiv_30_sum
from dm_qtrs
where catx(':', va_class, dm_sqt) in
(
'HS500:1'
'HS500:2'
'HS500:3'
…
'HS501:1'
)
group by PatID, va_class, dm_qtr;
quit;

How can I show the date in the format of date9 in this SQL process?

After I run this SAS SQL process. The date variable "EXVISDAT" showed in the results in a numerical way. How can I change the code to let the date variable "EXVISDAT" show in format of "date9."?
proc sql;
create table test2 as
select VISIT, Targetdays, Targetdays + '01JUL2019'd - 1 as EXVISDAT
from data_in.chrono_visits_lts14424
order by Targetdays
;
quit;
You can read about it in documentation.
In yours case, query will look like:
proc sql;
create table test2 as
select VISIT, Targetdays, Targetdays + '01JUL2019'd - 1 as EXVISDAT format=date9.
from data_in.chrono_visits_lts14424
order by Targetdays
;
quit;

Convert SAS DATE and use in a PROC SQL

I'm having problems to DATEs in SAS Enterprise Guide 7.1 M4.
it's very very simple in SQL Server or VBA but in SAS is driving me crazy.
Problem:
For some strange reason I'm unable to make a simple select. I tried many different forms of formating and convertions but any seems to work
My Simple select returns no observations.
Description of T1.DT_DATE in proc contents
Type: Num
Len: 8
Format: DDMMYY10.
Informat: DATETIME20.
%let DATE_EXAMPLE='01JAN2019'd;
data _null_;
call symput ('CONVERTED_DATE',put(&DATE_EXAMPLE, ddmmyy10.));
run;
%put &CONVERTED_DATE;
PROC SQL;
CREATE TABLE TEST_SELECT AS
SELECT *
FROM MY_SAMPLE_DATA as T1
WHERE T1.DT_DATE = &CONVERTED_DATE
;QUIT;
Intially you are setting up the date properly but you are changing it to a different value that is not understood in where clause. See the resolutions of macrovariable for both macrovariables you have created
%put value of my earlier date value is &DATE_EXAMPLE;
value of my earlier date value is '01JAN2019'd
%put value of my current date value is &CONVERTED_DATE;
value of my current date value is 01/01/2019
change your code to use date literal that is '01JAN2019'd then your code will work. 01/01/2019 value will not make sense in where clause.
PROC SQL;
CREATE TABLE TEST_SELECT AS
SELECT *
FROM MY_SAMPLE_DATA as T1
WHERE T1.DT_DATE = &CONVERTED_DATE
;QUIT;

Date missing in dataset

I created below SAS code to pull the data for particular a date.
%let date =2016-12-31;
proc sql;
connect to teradata as tera ( user=testuser password=testpass );
create table new as select * from connection tera (select acct,org
from dw.act
where date= &date.);
disconnect from tera;
quit;
There are situation where that particular date may be missing in the dataset due to holiday.
I thinking how to query the previous date(non-holiday) if the mention date in the %let statement is holiday
Before running your query you have to do a lookup or data check on the date you are using. You have two options:
Use a Date Dimension table in order identify/lookup holidays.
Count how many records you have for that date, if you get 0 obs for this date, use date+1 in your query.
I recommend using the date dimension table option.
Teradata has Sys_Calendar.Calendar view. You can use that in query, it has all the information regarding weekdays and others.
if you want to SAS way use weekday function and use call symput as shown below. Teradata needs single quote around the date, so it is better to have single quotes around when creating macro variable
data _null_;
/* this is for intial date*/
date_int = input('2016-12-31', yymmdd10.);
/* create a new date variable depending on weekday*/
if weekday(date_int) = 7 then date =date_int-2; /*sunday -2 days to get
friday*/
else if weekday(date_int) = 6 then date =date_int-1;/*saturday -1 day to get
friday*/
else date =date_int;
format date date_int yymmdd10.;
call symputx('date', ''''||put(date,yymmdd10.)||'''');
run;
%put modfied date is &date;
modified date is '2016-12-29'
Now you can use this macro variable in your pass through.

SAS Macro help to loop monthly sas datasets

I have monthly datasets in SAS Library for customers from Jan 2013 onwards with datasets name as CUST_JAN2013,CUST_FEB2013........CUST_OCT2017. These customers datasets have huge records of 2 million members for each month.This monthly datset has two columns (customer number and customer monthly expenses).
I have one input dataset Cust_Expense with customer number and month as columns. This Cust_Expense table has only 250,000 members and want to pull expense data for each member from SPECIFIC monthly SAS dataset by joining customer number.
Cust_Expense
------------
Customer_Number Month
111 FEB2014
987 APR2017
784 FEB2014
768 APR2017
.....
145 AUG2017
345 AUG2014
I have tried using call execute, but it tries to loop thru each 250,000 records of input dataset (Cust_Expense) and join with corresponding monthly SAS customer tables which takes too much of time.
Is there a way to read input tables (Cust_Expense) by month so that we read all customers for a specific month and then read the same monthly table ONCE to pull all the records from that month, so that it does not loop 250,000 times.
Depending on what you want the result to be, you can create one output per month by filtering on cust_expenses per month and joining with the corresponding monthly dataset
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expenses
;
quit;
proc sql;
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
create table want_&month. as
select *
from cust_expense(where=(month="&month.")) t1
inner join cust_&month. t2
on t1.customer_number=t2.customer_number
;
%end;
quit;
%mend;
%want;
Or you could have one output using one join by 'unioning' all those monthly datasets into one and dynamically adding a month column.
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expenses
;
quit;
proc sql;
create table want as
select *
from cust_expense t1
inner join (
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
%if &i>1 %then union;
select *, "&month." as month
from cust_&month
%end;
) t2
on t1.customer_number=t2.customer_number
and t1.month=t2.month
;
quit;
%mend;
%want;
In either case, I don't really see the point in joining those monthly datasets with the cust_expense dataset. The latter does not seem to hold any information that isn't already present in the monthly datasets.
Your first, best answer is to get rid of these monthly separate tables and make them into one large table with ID and month as key. Then you can simply join on this and go on your way. Having many separate tables like this where a data element determines what table they're in is never a good idea. Then index on month to make it faster.
If you can't do that, then try creating a view that is all of those tables unioned. It may be faster to do that; SAS might decide to materialize the view but maybe not (but if it's extremely slow, then look in your temp table space to see if that's what's happening).
Third option then is probably to make use of SAS formats. Turn the smaller table into a format, using the CNTLIN option. Then a single large datastep will allow you to perform the join.
data want;
set jan feb mar apr ... ;
where put(id,CUSTEXPF1.) = '1';
run;
That only makes one pass through the 250k table and one pass through the monthly tables, plus the very very fast format lookup which is undoubtedly zero cost in this data step (as the disk i/o will be slower).
I guess you could output your data in specific dataset like this example :
data test;
infile datalines dsd;
input ID : $2. MONTH $3. ;
datalines;
1,JAN
2,JAN
3,JAN
4,FEB
5,FEB
6,MAR
7,MAR
8,MAR
9,MAR
;
run;
data JAN FEB MAR;
set test;
if MONTH = "JAN" then output JAN;
if MONTH = "FEB" then output FEB;
if MONTH = "MAR" then output MAR;
run;
You will avoid to loop through all your ID (250000)
and you will use dataset statement from SAS
At the end you will get 12 DATASET containing the ID related.
If you case, FEB2014 , for example, you will use a substring fonction and the condition in your dataset will become :
...
set test;
...
if SUBSTR(MONTH,1,3)="FEB" then output FEB;
...
Regards