I am building code that imports a daily file of sales from last 30 days, then replaces last 30 days of records in a main table. I want to only replace the last 30 days if the imported daily file has enough records:
proc sql;
select min(sale_date) into :oldestSale from daily_sales;
quit;
Here's a logic that I want to build in SAS:
IF oldestSale < (today() - 30) THEN
PROC SQL to replace
ELSE
do nothing
END
What would be the best solution? This is trivial in Python which I'm translating this from, but I'm stuck on SAS syntax for doing anything similar...
The 'logic' is quite vague in specifics, but the gist seems to be
import daily sales
presume daily sales is complete and every daily sale is represented
if daily sales info starting date is more than 30 days ago
remove all sales from main table from starting date
append current daily sales
One approach would be to have a macro that peforms the conditional remove/append as a 'replace' action.
%macro process_daily_sales;
proc import … out=daily_sales;
run;
%local oldestSale;
%let oldestSale = %sysfunc(today()); %* sentinel value just in case;
proc sql;
select min(sale_date) into :oldestSale from daily_sales;
quit;
%local gap;
%let gap = %eval ( %sysfunc(today()) - &oldestSale );
%if &gap > 30 %then %do;
proc sql;
delete from main_sales where sales_date >= &oldestSale;
quit;
proc append base=main_sales data=daily_sales;
run;
%end;
%mend;
Related
Good afternoon,
I am decent with SAS but I've never written macros.
I have a DB where I need to break out in separate datasets ID's where the date in a field occurs. E.g. All ID's with a date in Jan 2018 would be one dataset, All ID's with a date in Feb 2018 would be another data set, so on and so forth. Field name is ZDate.
I found this which seems to do exactly what I want. However I think my date isn't in the correct format. The date I'm pulling is in a timestamp in snowflake and I'm converting it to a date with to_date. It's showing as formatted date (16FEB2020) in the original vintagedata data set but the subsequent data sets are completely blank.
%macro month;
%local mindate maxdate i date month ;
proc sql noprint;
select min(ZDate),max(ZDate)
into :mindate , :maxdate
from vintagedata
;
quit;
data
%do i=0 %to %sysfunc(intck(month,&mindate,&maxdate));
%let date=%sysfunc(intnx(month,&mindate,&i));
%let month=%sysfunc(putn(&date,monyy7.));
&month
%end;
;
set vintagedata;
%do i=0 %to %sysfunc(intck(month,&mindate,&maxdate));
%let date=%sysfunc(intnx(month,&mindate,&i));
%let month=%sysfunc(putn(&date,monyy7.));
if intnx('month',date,0)=&date then output &month ;
%end;
run;
%mend;
%month;
I have 10 csv files that would like to do PROC IMPORT, one at each time, depending on the value user input. For example, user will input 201801 to run January 2018 csv file(PROC IMPORT).
For that case, i tried to use substring &period(which stores the yymmn6. value which user input) and then from there, put in proc import. My code as below:
%let period=201812;
%macro convertdate();
data convertdate;
year=substr("&period",1,4);
month=substr("&period",5,2);
if month='01' then newmonth='jan';
if month='02' then newmonth='feb';
if month='03' then newmonth='mac';
if month='04' then newmonth='apr';
if month='05' then newmonth='may';
if month='06' then newmonth='jun';
if month='07' then newmonth='jul';
if month='08' then newmonth='aug';
if month='09' then newmonth='sep';
if month='10' then newmonth='oct';
if month='11' then newmonth='nov';
if month='12' then newmonth='dec';
run;
/*Assign convertdate the only distinct record into macro, then set as the data step for proc import*/
%if month='01' %then %do;
%let newmonth=jan;
%end;
/*proc import and remaining transformation from existing script*/
proc import out=rmr_raw_source
file="/sasdata/source/user_files/re_&newmonth.2018.xlsx"
dbms=xlsx replace;
sheet="re_&newmonth.2018";
getnames=no;
dbsaslabel=none;
run;
%mend;
%convertdate;
However, it wont work. I am getting warning below:
WARNING: Apparent symbolic reference NEWMONTH not resolved.
Does anyone have better solution to it?
You can try something as follows:
%let period = '201801'; /*7*/
%global month;
data test123;
period = . /*1*/
full_date = cats(period,'01'); /*2*/
format converted_date DATE10.; /*3*/
converted_date = input(full_date, yymmdd8.); /*4*/
month = put(converted_date,monname3.); /*5*/
call symput('month',month); /*6*/
run;
Create a SAS variable out of Macro.
Set to first date of the given month.
Create a new var and set the date format.
convert the TEXT date to SAS date.
get the month name out of date created above.
create a macro variable named 'month' out of SAS variable month.
Make macro variable 'month' global.
You can now use month macro variable anywhere in the code.
I have monthly datasets in SAS Library for customers from Jan 2013 onwards with datasets name as CUST_JAN2013,CUST_FEB2013........CUST_OCT2017. These customers datasets have huge records of 2 million members for each month.This monthly datset has two columns (customer number and customer monthly expenses).
I have one input dataset Cust_Expense with customer number and month as columns. This Cust_Expense table has only 250,000 members and want to pull expense data for each member from SPECIFIC monthly SAS dataset by joining customer number.
Cust_Expense
------------
Customer_Number Month
111 FEB2014
987 APR2017
784 FEB2014
768 APR2017
.....
145 AUG2017
345 AUG2014
I have tried using call execute, but it tries to loop thru each 250,000 records of input dataset (Cust_Expense) and join with corresponding monthly SAS customer tables which takes too much of time.
Is there a way to read input tables (Cust_Expense) by month so that we read all customers for a specific month and then read the same monthly table ONCE to pull all the records from that month, so that it does not loop 250,000 times.
Depending on what you want the result to be, you can create one output per month by filtering on cust_expenses per month and joining with the corresponding monthly dataset
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expenses
;
quit;
proc sql;
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
create table want_&month. as
select *
from cust_expense(where=(month="&month.")) t1
inner join cust_&month. t2
on t1.customer_number=t2.customer_number
;
%end;
quit;
%mend;
%want;
Or you could have one output using one join by 'unioning' all those monthly datasets into one and dynamically adding a month column.
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expenses
;
quit;
proc sql;
create table want as
select *
from cust_expense t1
inner join (
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
%if &i>1 %then union;
select *, "&month." as month
from cust_&month
%end;
) t2
on t1.customer_number=t2.customer_number
and t1.month=t2.month
;
quit;
%mend;
%want;
In either case, I don't really see the point in joining those monthly datasets with the cust_expense dataset. The latter does not seem to hold any information that isn't already present in the monthly datasets.
Your first, best answer is to get rid of these monthly separate tables and make them into one large table with ID and month as key. Then you can simply join on this and go on your way. Having many separate tables like this where a data element determines what table they're in is never a good idea. Then index on month to make it faster.
If you can't do that, then try creating a view that is all of those tables unioned. It may be faster to do that; SAS might decide to materialize the view but maybe not (but if it's extremely slow, then look in your temp table space to see if that's what's happening).
Third option then is probably to make use of SAS formats. Turn the smaller table into a format, using the CNTLIN option. Then a single large datastep will allow you to perform the join.
data want;
set jan feb mar apr ... ;
where put(id,CUSTEXPF1.) = '1';
run;
That only makes one pass through the 250k table and one pass through the monthly tables, plus the very very fast format lookup which is undoubtedly zero cost in this data step (as the disk i/o will be slower).
I guess you could output your data in specific dataset like this example :
data test;
infile datalines dsd;
input ID : $2. MONTH $3. ;
datalines;
1,JAN
2,JAN
3,JAN
4,FEB
5,FEB
6,MAR
7,MAR
8,MAR
9,MAR
;
run;
data JAN FEB MAR;
set test;
if MONTH = "JAN" then output JAN;
if MONTH = "FEB" then output FEB;
if MONTH = "MAR" then output MAR;
run;
You will avoid to loop through all your ID (250000)
and you will use dataset statement from SAS
At the end you will get 12 DATASET containing the ID related.
If you case, FEB2014 , for example, you will use a substring fonction and the condition in your dataset will become :
...
set test;
...
if SUBSTR(MONTH,1,3)="FEB" then output FEB;
...
Regards
I have a table with observations from the date 01.08.2016 to 30.08.2016.
How to create 12 tables in the following way:
the first one contains observations from the date 01.08.2016 to 20.08.2016;
the second one contains observations from the date 01.08.2016 to 21.08.2016;
...
the 12th one contains observations from the date 01.08.2016 to 30.08.2016.
I think that it is better to do using loops, but dont know how.
This assumes that the date is in SAS date format. You can use character comparison if your date is in character format.
The data vector still contains the observation after the output statement is executed. So as long as the condition is true, the data step will write the same observation to multiple datasets. Also, I think you will need the date comparisons till 31st August if you want 12 datasets.
data want1 want2 want3 ... want12;
set have;
if date <= '20AUG2016'd then output want1;
if date <= '21AUG2016'd then output want2;
if date <= '22AUG2016'd then output want3;
.
.
.
if date <= '31AUG2016'd then output want12;
run;
It is probably better to use WHERE statements than to make separate tables. But to do either without hardcoding you need to use code generation. That is normally done using macro logic.
%macro split(start,stop);
%local i n;
%let n=%sysfunc(intck(day,&start,&stop));
%let n=%eval(&n+1);
DATA
%do i=1 %to &n;
WANT&i
%end;
;
set have ;
%do i=1 %to &n ;
if date <= %sysfunc(intnx(day,&start,&i-1)) then output WANT&i ;
%end;
run;
%mend split;
%split('20AUG2016'd,'31AUG2016'd);
I have multiple tables in a library call snap1:
cust1, cust2, cust3, etc
I want to generate a loop that gets the records' count of the same column in each of these tables and then insert the results into a different table.
My desired output is:
Table Count
cust1 5,000
cust2 5,555
cust3 6,000
I'm trying this but its not working:
%macro sqlloop(data, byvar);
proc sql noprint;
select &byvar.into:_values SEPARATED by '_'
from %data.;
quit;
data_&values.;
set &data;
select (%byvar);
%do i=1 %to %sysfunc(count(_&_values.,_));
%let var = %sysfunc(scan(_&_values.,&i.));
output &var.;
%end;
end;
run;
%mend;
%sqlloop(data=libsnap, byvar=membername);
First off, if you just want the number of observations, you can get that trivially from dictionary.tables or sashelp.vtable without any loops.
proc sql;
select memname, nlobs
from dictionary.tables
where libname='SNAP1';
quit;
This is fine to retrieve number of rows if you haven't done anything that would cause the number of logical observations to differ - usually a delete in proc sql.
Second, if you're interested in the number of valid responses, there are easier non-loopy ways too.
For example, given whatever query that you can write determining your table names, we can just put them all in a set statement and count in a simple data step.
%let varname=mycol; *the column you are counting;
%let libname=snap1;
proc sql;
select cats("&libname..",memname)
into :tables separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
data counts;
set &tables. indsname=ds_name end=eof; *9.3 or later;
retain count dataset_name;
if _n_=1 then count=0;
if ds_name ne lag(ds_name) and _n_ ne 1 then do;
output;
count=0;
end;
dataset_name=ds_name;
count = count + ifn(&varname.,1,1,0); *true, false, missing; *false is 0 only;
if eof then output;
keep count dataset_name;
run;
Macros are rarely needed for this sort of thing, and macro loops like you're writing even less so.
If you did want to write a macro, the easier way to do it is:
Write code to do it once, for one dataset
Wrap that in a macro that takes a parameter (dataset name)
Create macro calls for that macro as needed
That way you don't have to deal with %scan and troubleshooting macro code that's hard to debug. You write something that works once, then just call it several times.
proc sql;
select cats('%mymacro(name=',"&libname..",memname,')')
into :macrocalls separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
¯ocalls.;
Assuming you have a macro, %mymacro, which does whatever counting you want for one dataset.
* Updated *
In the future, please post the log so we can see what is specifically not working. I can see some issues in your code, particularly where your macro variables are being declared, and a select statement that is not doing anything. Here is an alternative process to achieve your goal:
Step 1: Read all of the customer datasets in the snap1 library into a macro variable:
proc sql noprint;
select memname
into :total_cust separated by ' '
from sashelp.vmember
where upcase(memname) LIKE 'CUST%'
AND upcase(libname) = 'SNAP1';
quit;
Step 2: Count the total number of obs in each data set, output to permanent table:
%macro count_obs;
%do i = 1 %to %sysfunc(countw(&total_cust) );
%let dsname = %scan(&total_cust, &i);
%let dsid=%sysfunc(open(&dsname) );
%let nobs=%sysfunc(attrn(&dsid,nobs) );
%let rc=%sysfunc(close(&dsid) );
data _total_obs;
length Member_Name $15.;
Member_Name = "&dsname";
Total_Obs = &nobs;
format Total_Obs comma8.;
run;
proc append base=Total_Obs
data=_total_obs;
run;
%end;
proc datasets lib=work nolist;
delete _total_obs;
quit;
%mend;
%count_obs;
You will need to delete the permanent table Total_Obs if it already exists, but you can add code to handle that if you wish.
If you want to get the total number of non-missing observations for a particular column, do the same code as above, but delete the 3 %let statements below %let dsname = and replace the data step with:
data _total_obs;
length Member_Name $7.;
set snap1.&dsname end=eof;
retain Member_Name "&dsname";
if(NOT missing(var) ) then Total_Obs+1;
if(eof);
format Total_Obs comma8.;
run;
(Update: Fixed %do loop in step 2)