SAS Macro using multiple lists for conditions - sas

I have a condition table that I want to use to create my other tables. Now I want my code to go through my condition table, taking both conditions into account. My data looks as follows:
Month Date1 Date2 Line
Jan2010 01Jan2010 31Jan2010 PL
Feb2010 01Feb2010 28Feb2010 CB
Feb2010 01Feb2010 28Feb2010 HB
Mar2010 01Mar2010 31Mar2010 PL
Current Code
%Macro Split_Data(Month,BeginDate,EndDate,Line);
Data Want_&Month._&Line.;
Set Have;
Where Date > &BeginDate and Date < &EndDate and Line = "&Line.";
Run;
%Mend;
%Split_Data(Jan2010,'01JAN2010'd,'31JAN2010'd,PL);
%Split_Data(Jan2010,'01JAN2010'd,'31JAN2010'd,CB);
I don't want to list the macro calls like this. I would much rather have the macro read a table and use it for the conditions. Is this possible? How can I make this less manual, so that I can update my condition table without having to update my SAS code?

I think the dates could complicate things. That said, you can pass the SAS date values without quotation marks and your macro would still work. If you have other areas that use the date, you would have to confirm that it meets your requirements. I would usually use a data _null_ step, but I saved it to a data set WANT here so you can see the string, if desired.
data want;
set have;
str = catt('%split_data(', put(month, yymon7.), ",", date1, ",", date2, ",", line, ");");
call execute(str);
run;
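A variant of the step above, offered as a sketch rather than a drop-in fix: it assumes HAVE is the condition table, with DATE1/DATE2 stored as SAS dates. It writes the dates as readable date literals and wraps the macro name in %NRSTR, a common safeguard that delays each %split_data call until after this step has finished.

```sas
/* Assumes HAVE is the condition table: MONTH (character, or a SAS date
   with a MONYY-style format), DATE1/DATE2 numeric SAS dates, LINE character */
data _null_;
  set have;
  length str $200;
  /* VVALUE returns MONTH as displayed, whether character or a formatted date */
  str = cats('%nrstr(%split_data)(', vvalue(month), ',"', put(date1, date9.),
             '"d,"', put(date2, date9.), '"d,', line, ');');
  putlog str;          /* show each generated call in the log */
  call execute(str);   /* %NRSTR delays the macro call until after this step */
run;
```

For the first row of the question's table this generates `%nrstr(%split_data)(Jan2010,"01JAN2010"d,"31JAN2010"d,PL);`.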

Use your metadata to generate the code you need.
First set some macro variables (or make a macro with parameters).
%let inds=HAVE;
%let base=WANT ;
%let metadata=METADATA;
Then use the metadata to generate a single data step to write all of the output datasets.
filename code temp;
data _null_;
set &metadata end=eof;
file code ;
if _n_=1 then put 'DATA';
dsname= catx('_',symget('base'),month,line);
put @2 dsname ;
if eof then put ';' / @2 "set &inds;" ;
run;
data _null_;
set &metadata end=eof;
file code mod;
dsname= catx('_',symget('base'),month,line);
put @2 'IF date > "' date1 date9. '"d and date < "' date2 date9. '"d and line=' line :$quote.
'then output ' dsname ';'
;
if eof then put 'run;' ;
run;
Then include the generated code to run it.
%inc code / source2 ;
So for your example the generated code would look like this:
DATA
WANT_Jan2010_PL
WANT_Feb2010_CB
WANT_Feb2010_HB
WANT_Mar2010_PL
;
set HAVE;
IF date > "01JAN2010"d and date < "31JAN2010"d and line="PL" then output WANT_Jan2010_PL ;
IF date > "01FEB2010"d and date < "28FEB2010"d and line="CB" then output WANT_Feb2010_CB ;
IF date > "01FEB2010"d and date < "28FEB2010"d and line="HB" then output WANT_Feb2010_HB ;
IF date > "01MAR2010"d and date < "31MAR2010"d and line="PL" then output WANT_Mar2010_PL ;
run;
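To try the two generator steps above, a METADATA table matching the question's condition table could be created like this. The layout is an assumption: MONTH and LINE are character, while DATE1 and DATE2 must be numeric SAS dates so that the DATE9. format in the generator writes valid date literals.

```sas
/* Assumed METADATA layout: MONTH/LINE character, DATE1/DATE2 SAS dates */
data metadata;
  input month :$7. date1 :date9. date2 :date9. line :$2.;
  format date1 date2 date9.;
datalines;
Jan2010 01Jan2010 31Jan2010 PL
Feb2010 01Feb2010 28Feb2010 CB
Feb2010 01Feb2010 28Feb2010 HB
Mar2010 01Mar2010 31Mar2010 PL
;
run;
```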

Related

How to generate new import code following a structure change in an upstream flat file without headers?

I have an existing process that imports data from a flat file with no headers. There are hundreds of columns. The provider of the file has added several hundred more columns at different points within the existing columns. I have a list of the old and new column names and SAS code that properly sets the data types for the old columns but not the new ones. I'd rather not have to go through my existing import code and manually write column headers and data formats but I'm not sure how to use these parts to get new import code for the new headers.
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
informat oldcol1 best32.;
informat oldcol2 mmddyy10.;
informat oldcolN $60.;
format oldcol1 best32.;
format oldcol2 mmddyy10.;
format oldcolN $60.;
input
oldcol1
oldcol2
oldcolN $;
run;
I have the header information in an Excel file right now.
old K010H K010I K010J K020A
new K010H K010I K010J K010L K010M K010N K020A
Based on your description, I presume you either know or will find out the informats for the new columns also. If that is the case, why don't you auto generate the code to read the file?
Since you have the header information, assuming you can modify it to the following format and save as a CSV:
var infmt
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
Then something like this would automatically generate the code and read the data for you:
proc import datafile="cols.csv" out=cols replace;
run;
proc sql;
select var into :cols separated by ' ' from cols ;
select infmt into :infmts separated by ' ' from cols ;
quit;
%macro gen_code;
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) ~= %str());
%let col = %scan(&cols, &ii, %str( ));
%let infmt = %scan(&infmts, &ii, %str( ));
informat &col &infmt ;
%let ii = %eval(&ii + 1);
%end;
input
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
&col
%let ii = %eval(&ii + 1);
%end;
;
run;
%mend;
%gen_code;
In the future, you could make modifications to your header CSV file and the rest will be taken care by the code itself.
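If you would rather avoid the %SCAN loops, the same program can be written to a temporary file with a DATA _NULL_ step and then %INCLUDEd. This is a sketch assuming COLS has one row per column, in file order, with character variables VAR and INFMT as in the CSV above; GEN is an arbitrary fileref. Because the INFORMAT statements define the variables in file order, the INPUT statement can use a positional (first -- last) list.

```sas
/* Assumes COLS has one row per column, in file order, with character
   variables VAR (column name) and INFMT (informat), as in cols.csv */
filename gen temp;

data _null_;
  set cols end=eof;
  file gen;
  length firstvar $32;
  retain firstvar;
  if _n_ = 1 then do;
    firstvar = var;
    put 'data raw_file;'
      / '  infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;';
  end;
  /* The INFORMAT statements also define the variables in file order */
  put '  informat ' var infmt ';';
  /* Positional list: first column -- last column */
  if eof then put '  input ' firstvar '-- ' var ';' / 'run;';
run;

/* Run the generated step and echo it to the log */
%include gen / source2;
```

TRUNCOVER is used instead of MISSOVER so that a short last field is kept rather than set to missing.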
If you have a machine readable data dictionary then you can generate the code from that. Otherwise you will need to just edit your data step. While you are at it you can clean it up so that it is easier to maintain.
First, use LENGTH or ATTRIB to define the variables, instead of forcing SAS to guess. Second, only attach informats or formats to variables that need them. For example, there is no need to attach informats to normal strings or numbers, and no need to attach a $xx format to character variables. Do you really need to attach the BEST32. format to numbers, instead of letting SAS display numeric variables that have no format attached using the default BEST12. format?
Third, if you define the variables in the order they appear, then you can use a positional variable list in the INPUT statement. Then you only have to change the INPUT statement if the first or last variable changes.
So for your example you might create a data step like this instead.
data raw_file;
infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
length
oldcol1 8
oldcol2 8
oldcolN $60
;
informat oldcol2 mmddyy10.;
format oldcol2 mmddyy10.;
input oldcol1 -- oldcolN ;
run;
Then adding new variables is as simple as inserting them into the right place in the LENGTH statement and, when needed, adding them to the INFORMAT and/or FORMAT statements. If you don't know what the variables contain, then make them character strings, look at the resulting values, and decide later if you need to define them differently.
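Applied to the headers in the question, the cleaned-up step might look like the sketch below. The types are assumptions: K010H, K010I, K010J and K020A take the informats from the sample cols.csv in the previous answer, and the new columns K010L, K010M and K010N are read as text until their contents are known.

```sas
/* Hypothetical layout using the question's column names; types are assumed */
data raw_file;
  infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
  length
    K010H 8
    K010I 8
    K010J $60
    K010L $200   /* new column: contents unknown, read as text for now */
    K010M $200   /* new column */
    K010N $200   /* new column */
    K020A 8
  ;
  informat K010I mmddyy10.;
  format K010I mmddyy10.;
  /* Positional list covers every variable from K010H through K020A */
  input K010H -- K020A;
run;
```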

Drop a variable from a SAS dataset based on a condition (IF THEN DO)

So I've done some searching around online but haven't managed to find anything that can solve this problem. Essentially, I have been given a dataset that I've then split into individual datasets based on name.
However, if the person is a female, the age needs to be omitted from the dataset. Example output:
Males
Name Age Weight Height
Females
Name Weight Height
I have tried the following IF statement, but it just seems to drop the age variable from both the male and female tables:
if sex="F" then do;
drop age;
end;
I'm fairly new to SAS so any help would be greatly appreciated!
When you run a data step in SAS, some statements are processed during compilation, and others subsequently during execution. In this case, the DROP statement is processed at compile time, before your IF-THEN logic ever executes, so you can't use it to conditionally drop a column.
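A quick way to see this: the DROP below sits in a branch that can never execute, yet AGE is still dropped from the output, because DROP is a compile-time declaration. DEMO is just an illustrative dataset name.

```sas
/* DROP applies even though the DO block never runs */
data demo;
  set sashelp.class;
  if 0 then do;
    drop age;
  end;
run;
```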
Alternatively, you could output a missing value for age for each affected row, e.g.
if sex = 'F' then call missing(age);
Or you could use a drop clause on one output dataset but not the other:
data boys girls(drop=age);
set sashelp.class;
if sex = 'F' then output girls;
else if sex = 'M' then output boys;
run;
The DROP statement cannot be run conditionally. You need to conditionally generate the DROP statement (or DROP= dataset option).
As a trivial example dataset, let's start with SASHELP.CLASS and split it into individual datasets. Note that this dataset only has one observation per NAME, but I will add BY group processing to the code generation step so you can see how you could use it when there are multiple observations per name.
First let's generate code for a single DATA statement that creates multiple output datasets. Based on the value of the SEX variable, it will conditionally add a DROP= dataset option.
filename code temp;
data _null_;
set sashelp.class end=eof ;
by name ;
file code ;
if _n_=1 then put 'data' ;
if first.name then do;
put ' ' name @ ;
if sex='F' then put '(drop=age)' @ ;
put ;
end;
if eof then put ';' ;
run;
Now let's append the code for the rest of the data step that will read the source dataset and output the records to the appropriate dataset.
data _null_;
set sashelp.class end=eof ;
by name ;
file code mod ;
if _n_=1 then put ' set sashelp.class; ' ;
if first.name then put ' if name =' name $quote. ' then output ' name ';' ;
if eof then put 'run;' ;
run;
Finally run the generated code.
%include code / source2 ;

How to create several tables from one table using Loops in SAS?

I have a table with observations from the date 01.08.2016 to 30.08.2016.
How to create 12 tables in the following way:
the first one contains observations from the date 01.08.2016 to 20.08.2016;
the second one contains observations from the date 01.08.2016 to 21.08.2016;
...
the 12th one contains observations from the date 01.08.2016 to 30.08.2016.
I think it is better to do this using loops, but I don't know how.
This assumes that the date is in SAS date format. You can use character comparison if your date is in character format.
The data vector still contains the observation after the output statement is executed. So as long as the condition is true, the data step will write the same observation to multiple datasets. Also, I think you will need the date comparisons till 31st August if you want 12 datasets.
data want1 want2 want3 ... want12;
set have;
if date <= '20AUG2016'd then output want1;
if date <= '21AUG2016'd then output want2;
if date <= '22AUG2016'd then output want3;
.
.
.
if date <= '31AUG2016'd then output want12;
run;
It is probably better to use WHERE statements than to make separate tables. But to do either without hardcoding you need to use code generation. That is normally done using macro logic.
%macro split(start,stop);
%local i n;
%let n=%sysfunc(intck(day,&start,&stop));
%let n=%eval(&n+1);
DATA
%do i=1 %to &n;
WANT&i
%end;
;
set have ;
%do i=1 %to &n ;
if date <= %sysfunc(intnx(day,&start,&i-1)) then output WANT&i ;
%end;
run;
%mend split;
%split('20AUG2016'd,'31AUG2016'd);
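Following the suggestion to use WHERE statements instead of separate tables, here is a sketch of that approach: the hypothetical %by_cutoff macro runs one query per cutoff date, assuming DATE in HAVE is a numeric SAS date. The PROC SQL count is only a placeholder for whatever analysis you need per cutoff.

```sas
/* Hypothetical: one WHERE-filtered step per cutoff, no WANT1-WANTn tables */
%macro by_cutoff(start, stop);
  %local i cutoff;
  %do i = 0 %to %sysfunc(intck(day,&start,&stop));
    %let cutoff = %sysfunc(intnx(day,&start,&i));
    title "Observations up to %sysfunc(putn(&cutoff,date9.))";
    proc sql;
      select count(*) as n_obs
        from have
        where date <= &cutoff;
    quit;
  %end;
  title;
%mend by_cutoff;

%by_cutoff('20AUG2016'd,'31AUG2016'd)
```

From 20AUG2016 to 31AUG2016 the loop runs 12 times, one per cutoff.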

SAS loop through datasets

I have multiple tables in a library called snap1:
cust1, cust2, cust3, etc
I want to generate a loop that gets the records' count of the same column in each of these tables and then insert the results into a different table.
My desired output is:
Table Count
cust1 5,000
cust2 5,555
cust3 6,000
I'm trying this but it's not working:
%macro sqlloop(data, byvar);
proc sql noprint;
select &byvar.into:_values SEPARATED by '_'
from %data.;
quit;
data_&values.;
set &data;
select (%byvar);
%do i=1 %to %sysfunc(count(_&_values.,_));
%let var = %sysfunc(scan(_&_values.,&i.));
output &var.;
%end;
end;
run;
%mend;
%sqlloop(data=libsnap, byvar=membername);
First off, if you just want the number of observations, you can get that trivially from dictionary.tables or sashelp.vtable without any loops.
proc sql;
select memname, nlobs
from dictionary.tables
where libname='SNAP1';
quit;
This is fine to retrieve number of rows if you haven't done anything that would cause the number of logical observations to differ - usually a delete in proc sql.
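To keep the Table/Count layout from the question as a dataset instead of only printing it, the same query can feed CREATE TABLE. This is a sketch: COUNTS and MEMBER are illustrative names, and the MEMTYPE filter skips views, for which NLOBS is missing.

```sas
/* Store one row per CUST table in SNAP1 with its logical observation count */
proc sql;
  create table counts as
    select memname as Member label='Table',
           nlobs as Count format=comma12.
    from dictionary.tables
    where libname = 'SNAP1'
      and memtype = 'DATA'
      and memname like 'CUST%';
quit;
```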
Second, if you're interested in the number of valid responses, there are easier non-loopy ways too.
For example, given whatever query that you can write determining your table names, we can just put them all in a set statement and count in a simple data step.
%let varname=mycol; *the column you are counting;
%let libname=snap1;
proc sql;
select cats("&libname..",memname)
into :tables separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
data counts;
set &tables. indsname=ds_name end=eof; *9.3 or later;
retain count dataset_name;
if _n_=1 then count=0;
if ds_name ne lag(ds_name) and _n_ ne 1 then do;
output;
count=0;
end;
dataset_name=ds_name;
count = count + ifn(&varname.,1,1,0); *true, false, missing; *false is 0 only;
if eof then output;
keep count dataset_name;
run;
Macros are rarely needed for this sort of thing, and macro loops like you're writing even less so.
If you did want to write a macro, the easier way to do it is:
Write code to do it once, for one dataset
Wrap that in a macro that takes a parameter (dataset name)
Create macro calls for that macro as needed
That way you don't have to deal with %scan and troubleshooting macro code that's hard to debug. You write something that works once, then just call it several times.
proc sql;
select cats('%mymacro(name=',"&libname..",memname,')')
into :macrocalls separated by ' '
from dictionary.tables
where libname=upcase("&libname.");
quit;
&macrocalls.;
Assuming you have a macro, %mymacro, which does whatever counting you want for one dataset.
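A minimal sketch of such a %mymacro, assuming &varname. holds the column to count (as in the %let above) and that results should accumulate in WORK.COUNTS. The body is hypothetical; only its NAME= parameter matches the generated calls. COUNT(column) in PROC SQL counts only non-missing values.

```sas
/* Hypothetical %mymacro: count non-missing &varname in one dataset */
%macro mymacro(name=);
  proc sql;
    create table _one as
      select "&name" as dataset_name length=41,
             count(&varname.) as count
      from &name;
  quit;
  /* Append this dataset's row to the running results table */
  proc append base=counts data=_one;
  run;
%mend mymacro;
```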
* Updated *
In the future, please post the log so we can see what is specifically not working. I can see some issues in your code, particularly where your macro variables are being declared, and a select statement that is not doing anything. Here is an alternative process to achieve your goal:
Step 1: Read the names of all of the customer datasets in the snap1 library into a macro variable:
proc sql noprint;
select memname
into :total_cust separated by ' '
from sashelp.vmember
where upcase(memname) LIKE 'CUST%'
AND upcase(libname) = 'SNAP1';
quit;
Step 2: Count the total number of obs in each data set and append the results to the table Total_Obs:
%macro count_obs;
%do i = 1 %to %sysfunc(countw(&total_cust) );
%let dsname = %scan(&total_cust, &i);
%let dsid=%sysfunc(open(snap1.&dsname) );
%let nobs=%sysfunc(attrn(&dsid,nobs) );
%let rc=%sysfunc(close(&dsid) );
data _total_obs;
length Member_Name $15.;
Member_Name = "&dsname";
Total_Obs = &nobs;
format Total_Obs comma8.;
run;
proc append base=Total_Obs
data=_total_obs;
run;
%end;
proc datasets lib=work nolist;
delete _total_obs;
quit;
%mend;
%count_obs;
You will need to delete the table Total_Obs if it already exists (otherwise the new counts are appended to the old ones), but you can add code to handle that if you wish.
If you want the number of non-missing values of a particular column instead, use the same code as above, but delete the 3 %let statements below %let dsname = and replace the data step with this one (VAR stands for your column name):
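One way to handle that, run before %count_obs: PROC DATASETS with the NOWARN option deletes WORK.Total_Obs if it exists and stays quiet if it does not.

```sas
/* Remove results left by an earlier run; NOWARN suppresses the
   warning when Total_Obs does not exist yet */
proc datasets lib=work nolist nowarn;
  delete Total_Obs;
quit;
```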
data _total_obs;
length Member_Name $32;
set snap1.&dsname end=eof;
retain Member_Name "&dsname";
if(NOT missing(var) ) then Total_Obs+1;
if(eof);
format Total_Obs comma8.;
run;
(Update: Fixed %do loop in step 2)

How can I perform a macro for each observation in sas data set?

Here is the macro code.....
libname myfmt "&FBRMrootPath./Formats";
%macro CreateFormat(DSN,Label,Start,fmtname,type);
options mprint mlogic symbolgen;
%If &type=n %then %do;
proc sort data=&DSN out=Out; by &Label;
Run;
Data ctrl;
set Out(rename=(&Label=label &Start=start )) end=last;
retain fmtname "&fmtname" type "%upcase(&type)";
output;
If last then do;
hlo='O';
label='*ERROR';
output;
End;
Run;
%End;
%Else %do;
proc sort data=&DSN out=Out; by &Start;
Run;
Data ctrl;
set Out(rename=(&Start=label &Label=start )) end=last;
retain fmtname "&fmtname" type "%upcase(&type)";
output;
If last then do;
hlo='O';
label='*ERROR';
output;
End;
Run;
%End;
proc format library=myfmt cntlin=ctrl;
Run;
%Mend CreateFormat;
Here is the code for the control data set. The macro above should run once for each observation of this data set, using the values of that observation as the macro's parameters:
Data OPER.format_control;
Input DSN :$12. Label :$15. Start :$15. fmtName :$8. type :$1. fmt_Startdt :mmddyy. fmt_Enddt :mmddyy.;
format fmt_Startdt fmt_Enddt date9.;
Datalines;
ssin.prd prd_nm prd_id mealnm n . 12/31/9999
ssin.prd prd_id prd_nm mealid c . 12/31/9999
ssin.fac fac_nm onesrc_fac_id fac1SRnm n . 12/31/9999
ssin.fac fac_nm D3_fac_id facD3nm n . 12/31/9999
ssin.fac onesrc_fac_id D3_fac_id facD31SR n . 12/31/9999
oper.wrkgrp wrkgrp_nm wrkgrp_id grpnm n . 12/31/9999
;
Something like this.
proc sql;
select catx(',',cats('%CreateFormat(',DSN),Label,Start,fmtname,cats(type,')'))
into :formcreatelist separated by ' '
from oper.format_control;
quit;
&formcreatelist.
You may need to PUT some of your variables to get the format you want into the macro variable. I use a slightly kludgy CATS/CATX combination here; you could also use a single CATS with ',' inserted between the arguments.
You do have a limit here: a macro variable can hold at most 65,534 characters. If the list is longer than that, you either have to use CALL EXECUTE (which has some quirky features) or write the macro calls to a text file and %INCLUDE it.
There is a better way to do this rather than select ... into a macro variable. Use a temp file like this:
filename dyncode temp;
data _null_;
file dyncode;
set OPER.format_control;
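length macro_call $300;
/* e.g. %createformat(ssin.prd,prd_nm,prd_id,mealnm,n) */
macro_call = cats('%createformat(', catx(',', DSN, Label, Start, fmtname, type), ')');
put macro_call;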
run;
%include dyncode;
filename dyncode clear;
This technique is not limited by the maximum length of a macro variable.
Note that you should definitely use single quotes around the %createformat to prevent SAS from invoking the macro just prior to data step compilation. You want the macro to run when the %include runs.
The above approach is analogous to call execute, but call execute is evil because it does not execute the macro and embedded data/proc code within the macro in the expected order. Avoid call execute.
Finally, if you are running interactive SAS and using the technique there is a neat trick you can use to debug. Comment out the last two lines of code -- the include and the filename clear. After you run the remaining code, enter the SAS command "fslist dyncode" in the command window. This will pop up a notepad view on the dynamic code you just generated. You can review it and make sure you got what you intended.
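If you are running in batch, or prefer not to use the FSLIST window, an alternative sketch is to copy the generated file to the log (again with the %include and filename clear lines commented out, so the DYNCODE fileref is still assigned):

```sas
/* Echo every generated line to the log for review */
data _null_;
  infile dyncode;
  input;
  putlog _infile_;
run;
```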
Here's a call execute solution, just for completeness:
data _null_;
set OPER.format_control;
call execute('%CreateFormat(' || DSN || ',' || Label || ',' || Start || ',' || fmtname || ',' || type || ');');
run;
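If you do use CALL EXECUTE, a common way to avoid the ordering problem described above is to wrap the macro name in %NRSTR. The macro is then not executed while the DATA step runs; the calls are placed on the input stack and run one after another once the step ends. CATX also trims the trailing blanks that || leaves around each argument. This is a sketch against the same OPER.format_control table.

```sas
/* %NRSTR defers each %CreateFormat call until after this step ends */
data _null_;
  set OPER.format_control;
  call execute(cats('%nrstr(%CreateFormat)(',
                    catx(',', DSN, Label, Start, fmtname, type), ');'));
run;
```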