Macro to Include Based on Variable Record Names - sas

I'm attempting to create a macro to apply to a number of tables, where it would keep only the rows where Account_Description equals 'Office Equipment', 'Computer Peripherals' or 'Equipment Rental'.
For example, below is the macro code (my version does not work) followed by the tables, each of which is followed by the macro call.
You may ask, why not just copy and paste? What I'm asking is a simplified version of my problem, as I have 13 other sets of Account_Description pools that I would like to turn into macros as well.
%MACRO Total_Minor_Equipment;
%if &Account_Description = 'Office Equipment'
or &Account_Description = 'Computer Peripherals'
or &Account_Description = 'Equipment Rental';
%MEND Total_Minor_Equipment;
Data June_2019_v2;
set June_2019_v1;
run;
%Total_Minor_Equipment
Data July_2019_v2;
set July_2019_v1;
run;
%Total_Minor_Equipment
Data Aug_2019_v2;
set Aug_2019_v1;
run;
%Total_Minor_Equipment

I think a brief study of the SAS data step and the macro language may help, since you have them a bit confused. Remember, anything inside a macro is resolved at compile time, and its output is code that gets executed at run time. So when you call your macro outside a data step, it writes out code outside the data step, which fails to execute. You also seem to mix up the macro %if with the data step if.
What you need to do is:
%MACRO Total_Minor_Equipment;
if Account_Description in ('Office Equipment', 'Computer Peripherals', 'Equipment Rental');
%MEND Total_Minor_Equipment;
Data June_2019_v2;
set June_2019_v1;
%Total_Minor_Equipment
run;
You could use the IN operator instead of OR here, assuming the field name is Account_Description. If the macro variable Account_Description actually contains the field name, then change it to &Account_Description as you had it before.

Rather than store the account description classification rules in source code, use a second data set to map the descriptions to the class. The data set can be joined to the original data, or be used as the basis of a custom format that returns the class value.
As in your prior question, I recommend stacking all the monthly data sets into one (per the SET statement shown by @Reeza). (If you had previously split them out from a single table, maybe don't do that.) Apply the format to the description to compute the desired pool. Use WHERE and BY statements in downstream analytics when month-wise grouping or segregation is called for.
data account_pool_cntlin;
length description pool fmtname $32 ;
retain fmtname "$acctpool";
input description & pool &; datalines;
Office Equipment  Minor_Equipment
Computer Peripherals  Minor_Equipment
Equipment Rental  Minor_Equipment
Shipping Container  Major Equipment
BigRig Tractor  Major Equipment
BigRig Tailer  Major Equipment
Baseball Cap  Major League Equipment
Baseball Uniforms  Major League Equipment
Baseball Bases  Major League Equipment
run;
proc format cntlin=account_pool_cntlin (
rename = (
description = start
pool = label
))
;
run;
data want;
set Jan_2019 Feb_2019 ... Aug_2019 indsname=dataset_name;
mon_year = dataset_name;
pool = put (account_description, $acctpool.);
run;
Note: A deeper understanding and application of formats in your analyses may result in code in which there is no need to compute the pool variable at all.
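As a minimal sketch of that note, assuming the $acctpool. format and the stacked data set want from the code above already exist: many procedures group by formatted values, so attaching the format inside the procedure is enough to get pool-level results without a separate pool variable. PROC FREQ is used here only as an illustration.

```sas
proc freq data=want;
  /* The format maps each description to its pool at analysis time, */
  /* so counts are reported per pool rather than per description.   */
  format account_description $acctpool.;
  tables account_description;
run;
```

Descriptions that are not listed in the format's control data set are not mapped and would appear under their own values.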

You can try this macro if you want to call the macro manually several times:
%MACRO Total_Minor_Equipment(base_table_name,out_table_name);
Data &out_table_name;
set &base_table_name;
if Account_Description in ('Office Equipment', 'Computer Peripherals', 'Equipment Rental');
run;
%MEND Total_Minor_Equipment;
%Total_Minor_Equipment(June_2019_v1, June_2019_v2);
%Total_Minor_Equipment(July_2019_v1, July_2019_v2);
%Total_Minor_Equipment(Aug_2019_v1, Aug_2019_v2);
If you want to do it dynamically, calling the macro once (from _v1 to _v2):
%let mask_list = June_2019_ July_2019_ Aug_2019_;
%MACRO Total_Minor_Equipment(tmask_list);
%do i=1 %to %sysfunc(countw(&tmask_list,%str( )));
%let name&i = %scan(&tmask_list,&i,%str( ));
Data &&name&i..v2;
set &&name&i..v1;
if Account_Description in ('Office Equipment', 'Computer Peripherals', 'Equipment Rental');
run;
%end;
%MEND Total_Minor_Equipment;
%Total_Minor_Equipment(&mask_list);

I'd recommend stacking all the data sets at once and filtering at once.
%let dset_list = June_2019_v1 July_2019_v1 Aug_2019_v1;
data want;
set &dset_list indsname=source;
input_file = source;
if account_description in ('Office Equipment', 'Computer Peripherals', 'Equipment Rental');
run;
That will combine all the data sets, filter the result, and add a variable (input_file) indicating which data set each record came from.

Related

loop a list of variables in SAS

I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I plan to use proc sgplot and proc mixed for the analysis. However, putting the variables one by one into the same procedure would be really time consuming. I'm pretty new to SAS; is there a way to loop over the dependent variables and pass each one to the procedure?
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (ie 'read','math', ....) as separate analyses with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this. This creates a loop around your proc mixed call. I didn't check the proc mixed specification itself, so it may not work exactly as written in your example.
The loop itself works, however: it runs whatever you put in place of the proc mixed call once per dependent variable, and the number of iterations is determined by the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* count the number of variables in y_variables;
%let len_var_list = %sysfunc(countw(&y_variables, %str( )));
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);

SAS-Creating Panel by several datasets

Suppose there are ten datasets with the same structure (date and price). They cover the same time period but have different prices:
date price
20140604 5
20140605 7
20140607 9
I want to combine them and create a panel dataset. Since the datasets do not contain a name, I am attempting to add a new variable, name, to each dataset and then combine them.
The following code is used to add the name variable to each dataset:
%macro name(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
%do i=1 %to &obs.;
data
&going.&&memname&i;
set
&from.&&memname&i;
name=&&memname&i;
run;
%end;
%mend;
So, is this strategy correct? Is there a different way to create panel data?
There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.
You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.
data want ;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
panel = scan(dsn,-1,'.') ;
run;
And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.
set in1.panel1-in1.panel10 indsname=dsn ;
or perhaps names that all start with a particular prefix.
set in1.panel: indsname=dsn ;
If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:
data want ;
merge in1.panel1 (rename=(price=price1))
in1.panel2 (rename=(price=price2))
...
;
by date;
run;
Or perhaps it would be easier to add a BY statement to the data step that makes the TALL dataset and then transpose it into the WIDE format.
data tall;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
by date ;
panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
by date;
id panel;
var price ;
run;
I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.
That is a valid way to achieve what you are looking for.
You are going to need two periods between the macro variables for the library.dataset syntax. The first period only marks the end of the macro variable reference. The second shows up as a literal period in the generated code.
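A small check of the two-period rule, using hypothetical macro variables lib and mem that are not part of the original code:

```sas
%let lib = work;
%let mem = panel1;
%put &lib.&mem;    /* one period: it only ends the reference to lib */
%put &lib..&mem;   /* two periods: the second one survives as text  */
```

The first %put writes workpanel1 to the log and the second writes work.panel1, which is the form a SET or DATA statement needs.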
I assume you will want to append all of these data sets together. You can add
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i
%end;
;
run;
You can combine your loop that adds the names and that data step like this:
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i (in=d&i)
%end;
;
%do i=1 %to &obs;
if d&i then
name = "&&memname&i";
%end;
run;

SAS: How to Automate the Creation of Many Datasets using Another Data set

I am looking to create multiple datasets from city_variables dataset. There are a total of 58 observations that I summed up into macrovariable (&count) to stop the do loop.
The city_variables dataset looks like this (vertically, of course):
CITY_NAME
City1
City2
City3
City4
City5
City6
City7
City8
City9
City10
..........
City58
I created the macro variable &name in a data _null_ step in order to put the city name into the dataset name.
Any help would be great on how to automate the creation of the 48 files by name (not number). Thanks again.
/*Create macro with number of observations in concordinate file*/
proc sql;
select count(area_name);
into :count
from main.state_all;
quit;
%macro repeat;
data _null_;
set city_variables;
%do i= 1 %UNTIL (i = &count);
call symput('name',CITY_NAME);
run;
data &name;
set dataset;
where city_name = &name;
run;
%end;
%mend repeat;
%repeat
Well, if you're going to do
proc sql;
select count(area_name);
into :count
from main.state_all;
quit;
Then why not go all the way? Make a macro that does one dataset output, given the criteria as parameters, then make one call for each separate whatever-name. This might be close to what you're looking at.
%macro make_data(data_name=, set_name=, where=);
data &data_name.;
set &set_name.;
where &where.;
run;
%mend make_data;
proc sql;
select
cats('%make_data(data_name=',city_name,
', set_name=dataset, where=city_name="',
city_name,
'" )')
into :make_datalist
separated by ' '
from main.state_all;
quit;
&make_datalist.;
Some other options that I'll just link to:
Chris Hemedinger @ The SAS Dummy blog, How to Split One Data Set Into Many, shows a similar concept, except he doesn't put the macro wrapper where I do.
Paul Dorfman, Data Step Hash Objects as Programming Tools is the seminal paper on using a hash table to do this. This is the "fastest" way to do this, likely, if you understand hash tables and have the memory available.
You don't need to use a macro to automate splitting up your data in this way. Since your example is really simple, I would consider using call execute in a data _null_ step:
data test;
infile datalines ;
input city_name $20.;
datalines;
City1
City2
City2
City3
City3
City3
;
run;
data _null_;
set test;
call execute("data "||strip(city_name)||";"||"
set test;
where city_name = '"||strip(city_name)||"';"||"
run;");
run;

SAS: Drop column in a if statement

I have a dataset called have with one entry with multiple variables that look like this:
message reference time qty price
x 101 35000 100 .
The dataset above changes on every iteration of a loop, and message can also be "A". If message="X", this means removing 100 qty from the MASTER set where the reference number equals the reference number in the MASTER database. Price is missing (price=.) because it is already in the MASTER database under reference=101. The MASTER database aggregates all the available orders at some price with the quantity available. If message="A" in the next iteration, the have dataset would look like this:
message reference time qty price
A 102 35010 150 500
This means adding a new reference number to the MASTER database, in other words, appending the line to MASTER.
I have the following code in my loop to update the quantity in my MASTER database when there is a message X:
data b.master;
modify b.master have(where=(message="X")) updatemode=nomissingcheck;
by order_reference_number;
if _iorc_ = %sysrc(_SOK) then do;
replace;
end;
else if _iorc_ = %sysrc(_DSENMR) then do;
output;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSEMTR) then do;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSENOM) then do;
_error_ = 0;
end;
run;
I use the replace statement to update the quantity. But since price is missing (price=.) when message is X, the code above sets price to missing where reference=101 in MASTER, which I don't want. Hence, I would prefer to delete the price column if message=X in the have dataset. But I don't want to delete the price column when message=A, since I use this code:
proc append base=MASTER data=have(where=(msg_type="A")) force;
run;
Hence, I have this code prior to my MODIFY statement:
data have(drop=price_alt);
set have; if message="X" then do;
output;end;
else do; /*I WANT TO MAKE NO CHANGE*/
end;run;
but it doesn't do what I want. If the message is not equal X then I don't want to drop the column. If it is equal X, I want to drop the column. How can I adapt the code above to make it work?
It's a bit of a strange request to be honest, such that it raises questions about whether what you're doing is the best way of doing it. However, in the spirit of answering the question...
The answer by DomPazz gives the option of splitting the data into two possible sets, but if you want code down the line to always refer to a specific data set, this creates its own complications.
You also can't, in a single data step, tell SAS to output to the "same" data set where one instance has a column and one instance doesn't. So what you'd like, therefore, is for the code itself to be dynamic, so that the data step that runs is either one that drops the column or one that does not, depending on whether message=x. The answer to this need for dynamic code, like many things in SAS, comes down to the creative use of macros. It looks something like this:
/* Just making your input data set */
data have;
message='x';
time=35000;
qty=1000;
price=10.05;
price_alt=10.6;
run;
/* Writing the macro */
%macro solution;
%local id rc1 rc2;
%let id=%sysfunc(open(work.have));
%syscall set(id);
%let rc1=%sysfunc(fetchobs(&id, 1));
%let rc2=%sysfunc(close(&id));
%IF &message=x %THEN %DO;
data have(drop=price_alt);
set have;
run;
%END;
%ELSE %DO;
data have;
set have;
run;
%END;
%mend solution;
/* Running the macro */
%solution;
Try this:
data outX(drop=price_alt) outNoX;
set have;
if message = "X" then
output outX;
else
output outNoX;
run;
As @sasfrog says in the comments, a table either has a column or it does not. If you want to subset things where MESSAGE="X" then you can use something like this to create 2 data sets.

Block bootstrap from subject list, extract coefficients in PROC MIXED

I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients from PROC MIXED. The main outline is as follows:
I have a panel data set, say firm and year are the indices. For each iteration of the bootstrap, I wish to sample with replacement n subjects. From this sample, I need to construct a new data set that is a "stack" (concatenated row on top of row) of all the observations for each sampled subject. With this new data set, I can run the regression and pull out the coefficients of interest. Repeat for a bunch of iterations, say 2000.
Each firm can potentially be selected multiple times, so I need to include its data multiple times in each iteration's data set.
Using a loop and subset approach, seems computationally burdensome.
My real data set is quite large (a 2Gb .sas7bdat file).
Example pseudo/explanatory code (please pardon all noob errors!):
DATA subjectlist;
SET mydata;
BY firm;
IF first.firm;
RUN;
%macro blockboot(input=, subjects=, iterations=);
%let numberfirms = LENGTH(&subjects);
%do i = 1 %to &iterations ;
DATA mytempdat;
DO i=1 TO &numberfirms;
rec = ceil(&numberfirms * ranuni(0));
*** This is where I want to include all observations for the randomly selected subjects;
*** However, this code doesn't include the same subject multiple times, which...;
*** ...is what I want;
SET &INPUT subjects IN &subjects;
OUTPUT;
END;
STOP;
PROC MIXED DATA=mytempdat;
CLASS firm year;
MODEL yval= cov1 cov2;
RANDOM intercept /sub=subject type=un;
OUTPUT out=outx cov1=cov1 ***want to output the coefficient estimate on cov1 here;
RUN;
%IF &i = 1 %THEN %DO;
DATA outall;
SET outx;
%END;
%ELSE %DO;
PROC APPEND base=outall data=outx;
%END;
%END; /* i=1 to &REPS loop */
PROC UNIVARIATE data=outall;
VAR cov1;
OUTPUT out=final pctlpts=2.5, 97.5 pctlpre=ci;
%mend;
%blockboot(input=mydata,subjects=subjectlist, reps=2000)
This question is identical to a question I asked previously, found here:
block bootstrap from subject list
Any help is appreciated!
See the following paper for details on the best way to do this in SAS:
http://www2.sas.com/proceedings/forum2007/183-2007.pdf
The general summary is to use PROC SURVEYSELECT with a method that allows sampling with replacement to create your bootstrap sample, then use BY processing with PROC MIXED to run the PROC only once rather than running it 2000 times.
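A rough sketch of that outline follows; it is not the paper's code. It reuses subjectlist, mydata, firm, yval, cov1 and cov2 from the question, while boot_firms, draw_id, boot_data, boot_coefs and the seed value are names and values assumed here for illustration. The random-effects specification is simplified to a random intercept per drawn firm and would need to be adapted to the real model.

```sas
/* 1. Draw firms with replacement: as many draws as firms, 2000 replicates. */
/*    OUTHITS writes one row per draw, so repeated firms appear repeatedly.  */
proc surveyselect data=subjectlist(keep=firm) out=boot_firms
    method=urs samprate=1 reps=2000 seed=20190601 outhits;
run;

/* 2. Give each draw its own id so a firm drawn twice is treated */
/*    as two distinct subjects within a replicate.               */
data boot_firms;
  set boot_firms;
  draw_id = _n_;
run;

/* 3. Attach every observation of each drawn firm (the "stack"). */
proc sql;
  create table boot_data as
  select b.replicate, b.draw_id, m.*
  from boot_firms as b inner join mydata as m
    on b.firm = m.firm
  order by b.replicate, b.draw_id;
quit;

/* 4. One PROC MIXED call, fitted separately for each replicate. */
ods exclude all;                      /* suppress 2000 sets of printed output */
proc mixed data=boot_data;
  by replicate;
  class draw_id;
  model yval = cov1 cov2 / solution;
  random intercept / subject=draw_id;
  ods output SolutionF=boot_coefs;    /* fixed-effect estimates per replicate */
run;
ods exclude none;

/* 5. Percentile interval for the cov1 coefficient. */
proc univariate data=boot_coefs(where=(Effect='cov1')) noprint;
  var Estimate;
  output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;
```

With a 2Gb input, the stacked boot_data holding all 2000 replicates may be too large to materialize at once; running the same steps in batches with a smaller reps= value and appending boot_coefs after each batch is one way around that.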