Block bootstrap from subject list, extract coefficients in PROC MIXED - sas

I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients from PROC MIXED. The main outline is as follows:
I have a panel data set, say firm and year are the indices. For each iteration of the bootstrap, I wish to sample with replacement n subjects. From this sample, I need to construct a new data set that is a "stack" (concatenated row on top of row) of all the observations for each sampled subject. With this new data set, I can run the regression and pull out the coefficients of interest. Repeat for a bunch of iterations, say 2000.
Each firm can potentially be selected multiple times, so I need to include its data multiple times in each iteration's data set.
A loop-and-subset approach seems computationally burdensome.
My real data set is quite large (a 2 GB .sas7bdat file).
Example pseudo/explanatory code (please pardon all noob errors!):
DATA subjectlist;
SET mydata;
BY firm;
IF first.firm;
RUN;
%macro blockboot(input=, subjects=, iterations=);
%let numberfirms = LENGTH(&subjects);
%do i = 1 %to &iterations ;
DATA mytempdat;
DO i=1 TO &numberfirms;
rec = ceil(&numberfirms * ranuni(0));
*** This is where I want to include all observations for the randomly selected subjects;
*** However, this code doesn't include the same subject multiple times, which...;
*** ...is what I want;
SET &INPUT subjects IN &subjects;
OUTPUT;
END;
STOP;
PROC MIXED DATA=mytempdat;
CLASS firm year;
MODEL yval= cov1 cov2;
RANDOM intercept /sub=subject type=un;
OUTPUT out=outx cov1=cov1 ***want to output the coefficient estimate on cov1 here;
RUN;
%IF &i = 1 %THEN %DO;
DATA outall;
SET outx;
%END;
%ELSE %DO;
PROC APPEND base=outall data=outx;
%END;
%END; /* i=1 to &REPS loop */
PROC UNIVARIATE data=outall;
VAR cov1;
OUTPUT out=final pctlpts=2.5, 97.5 pctlpre=ci;
%mend;
%blockboot(input=mydata,subjects=subjectlist, reps=2000)
This question is identical to a question I asked previously, found here:
block bootstrap from subject list
Any help is appreciated!

See the following paper for details on the best way to do this in SAS:
http://www2.sas.com/proceedings/forum2007/183-2007.pdf
The general summary is to use PROC SURVEYSELECT with a method that allows sampling with replacement to create your bootstrap sample, then use BY processing with PROC MIXED to run the PROC only once rather than running it 2000 times.
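A minimal sketch of that approach, not taken from the paper: it samples firms (not rows) with replacement, stacks every observation for each drawn firm, and fits the model once with BY processing. The data set and variable names mydata, firm, year, yval, cov1 and cov2 come from the question; bootfirms, bootid, bootdata, bootest and the seed value are hypothetical names chosen for illustration. The model is simplified to a random intercept per drawn subject.

```sas
/* One row per firm: the sampling frame for the block bootstrap */
proc sort data=mydata(keep=firm) out=subjectlist nodupkey;
  by firm;
run;

/* Draw 2000 samples of firms with replacement (METHOD=URS).
   OUTHITS writes one row per draw, so a firm drawn twice appears twice. */
proc surveyselect data=subjectlist out=bootfirms
     method=urs samprate=1 outhits reps=2000 seed=12345;
run;

/* Number the draws within each replicate so that repeated draws of the
   same firm are treated as separate subjects (bootid is a hypothetical name) */
data bootfirms;
  set bootfirms;
  by replicate;
  if first.replicate then bootid = 0;
  bootid + 1;
run;

/* Stack all observations of every drawn firm */
proc sql;
  create table bootdata as
  select b.replicate, b.bootid, m.*
  from bootfirms as b
       inner join mydata as m
       on b.firm = m.firm
  order by b.replicate, b.bootid, m.year;
quit;

/* Fit the model once per replicate and capture the fixed effects */
ods exclude all;
ods output SolutionF=bootest;
proc mixed data=bootdata;
  by replicate;
  class bootid;
  model yval = cov1 cov2 / solution;
  random intercept / subject=bootid;
run;
ods exclude none;

/* Percentile interval for the coefficient on cov1 */
proc univariate data=bootest noprint;
  where upcase(Effect) = 'COV1';
  var Estimate;
  output out=final pctlpts=2.5 97.5 pctlpre=ci;
run;
```

Note that bootdata holds every replicate at once, so with a 2 GB input it may be necessary to keep only the model variables or to run the replicates in batches.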

Related

loop a list of variables in SAS

I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I plan to use PROC SGPLOT and PROC MIXED for the analysis. However, writing a separate call for each variable would be really time consuming. I'm pretty new to SAS; is there a way to loop over the dependent variables and pass each one to the procedure?
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (i.e. 'read', 'math', ...) as a separate analysis with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this, which wraps a loop around your PROC MIXED call. I didn't check the PROC MIXED specification itself, so it may not work exactly as written in your example.
The loop does work, however: it runs whatever you put in place of the PROC MIXED call once per dependent variable, and its length is set by the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* this checks the nr of variables in y_var_list;
%let len_var_list = %sysfunc(countw(&y_variables, %str( )));
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);

SAS-Creating Panel by several datasets

Suppose there are ten datasets with the same structure (date and price); they cover the same time period but have different prices.
date price
20140604 5
20140605 7
20140607 9
I want to combine them into a panel dataset. Since the datasets do not contain a name variable, I am trying to add a new variable, name, to each dataset and then combine them.
The following code adds the name variable to each dataset:
%macro name(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
%do i=1 %to &obs.;
data
&going.&&memname&i;
set
&from.&&memname&i;
name=&&memname&i;
run;
%end;
%mend;
So, is this strategy correct? Is there a different way to create a panel dataset?
There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.
You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.
data want ;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
panel = scan(dsn,-1,'.') ;
run;
And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.
set in1.panel1-in1.panel10 indsname=dsn ;
or perhaps names that all start with a particular prefix.
set in1.panel: indsname=dsn ;
If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:
data want ;
merge in1.panel1 (rename=(price=price1))
in1.panel2 (rename=(price=price2))
...
;
by date;
run;
Or perhaps it would be easier to add a BY statement to the data set that makes the TALL dataset and then transpose it into the WIDE format.
data tall;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
by date ;
panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
by date;
id panel;
var price ;
run;
I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.
That is a valid way to achieve what you are looking for.
You are going to need two periods between the macro variables for library.data syntax. The first period ends the macro variable reference; the second is emitted as a literal period.
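A small check of the period rule, using made-up values for the macro variables (in1 and PANEL1 are placeholders, not names from the question):

```sas
%let from = in1;
%let memname1 = PANEL1;
%let i = 1;

%put &from.&&memname&i;    /* one period: writes in1PANEL1 to the log  */
%put &from..&&memname&i;   /* two periods: writes in1.PANEL1 to the log */
```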
I assume you will want to append all of these data sets together. You can add
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i
%end;
;
run;
You can combine your loop that adds the names and that data step like this:
data &going..want;
length name $32;
set
%do i=1 %to &obs;
&from..&&memname&i (in=d&i)
%end;
;
%do i=1 %to &obs;
if d&i then
name = "&&memname&i";
%end;
run;

Insert text into all cells of first column in a sas dataset

I've output 'Moments' from Proc Univariate to datasets. Many.
Example: Moments_001.sas7bdat through to Moments_237.sas7bdat
For the first column of each dataset (new added first column, and probably new dataset, as opposed to the original) I would like to have a particular text in every cell going down to bottom row.
The exact text would be the name of the respective dataset file: say, "Moments_001".
I do not have to 'grab' the filename, per se, if that's not possible. As I know what the names are already, I can put that text into the procedure. However, grabbing the filenames, if possible, would be easier from my standpoint.
I'd greatly appreciate any help anyone could provide to accomplish this.
Thanks,
Nicholas Kormanik
Are you looking for the INDSNAME option of the SET statement? You need to define two variables because the one generated by the option is automatically dropped.
data want;
length moment dsn $41 ;
set Moments_001 - Moments_237 indsname=dsn ;
moment=dsn;
run;
I think something along these lines should be what you're after. Assuming you have a list of moments, you can loop through it and add a new variable as the first column of each dataset.
%let list_of_moments = moments_001 moments_002 ... moments_237;
%macro your_macro;
%do i = 1 %to %sysfunc(countw(&list_of_moments.));
%let this_moment = %scan(&list_of_moments., &i.);
data &this_moment._v2;
retain new_variable;
set &this_moment.;
new_variable = "&this_moment.";
run;
%end;
%mend your_macro;
%your_macro;
The brute force entering of text into column 1 looks like this:
data moments_001;
length text $ 16;
set moments_001;
text="Moments_001";
run;
You could also write a macro that would loop through all 237 data sets and insert the text.
UNTESTED CODE
%macro do_all;
%do i=1 %to 237;
%let num = %sysfunc(putn(&i,z3.));
data moments_&num;
length text $ 16;
set moments_&num;
text="Moments_&num";
run;
%end;
%mend;
%do_all
It seems to me (not knowing your problem) that if you use PROC UNIVARIATE with the BY option, then you wouldn't need 237 different data sets, all of your output would be in one data set and the BY variable would also be in the data set. Does that solve your problem?
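A rough sketch of that idea, assuming the 237 analyses correspond to the levels of a grouping variable. The names mydata, group, x and all_moments are hypothetical; Moments is the ODS table that PROC UNIVARIATE produces.

```sas
/* BY processing needs the data sorted by the grouping variable */
proc sort data=mydata out=sorted;
  by group;
run;

/* One output data set with the BY variable as an identifying column */
ods output Moments=all_moments;
proc univariate data=sorted;
  by group;
  var x;
run;
```

Each row of all_moments then carries its group value, so no per-dataset label needs to be added afterwards.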

Get data filtered by dynamic column list in SAS stored process

My goal is to create a SAS stored process that returns data from a single dataset, with the columns filtered by a multi-value input parameter passed into the stored process.
Is there a simple way to do this?
Is there a way to do this at all?
Here's what I have so far. I'm using a macro to dynamically generate the KEEP statement to define which columns to return. I've defined macro variables at the top to mimic what gets passed into the stored process when called through SAS BI Web Services, so unfortunately those have to remain as they are. That's why I've tried to use the VVALUEX method to turn the column name strings into variable names.
Note - I'm new to SAS
libname fetchlib meta library="lib01" metaserver="123.12.123.123"
password="password" port=1234
repname="myRepo" user="myUserName";
/* This data represents input parameters to stored process and
* is removed in the actual stored process*/
%let inccol0=3;
%let inccol='STREET';
%let inccol1='STREET';
%let inccol2='ADDRESS';
%let inccol3='POSTAL';
%let inccol_count=3;
%macro keepInputColumns;
%if &INCCOL_COUNT = 1 %then
&inccol;
%else
%do k=1 %to (&INCCOL_COUNT);
var&k = VVALUEX(&&inccol&k);
%end;
KEEP
%do k=1 %to (&INCCOL_COUNT);
var&k
%end;
;
%mend;
data test1;
SET fetchlib.Table1;
%keepInputColumns;
run;
/*I switch this output to _WEBOUT in the actual stored process*/
proc json out='C:\Logs\Log1.txt';
options firstobs=1 obs=10;
export test1 /nosastags;
run;
There are some problems with this. The output uses var1, var2 and var3 as the column names rather than the actual column names. It also doesn't filter by any columns when I change the output to _webout and run it using BI Web Services.
OK, I think I have some understanding of what you're doing here.
You can use KEEP and RENAME in conjunction to get your variable names back.
KEEP
%do k=1 %to (&INCCOL_COUNT);
var&k
%end;
;
This has an equivalent
/* DEQUOTE strips the quotes stored in the inccol values */
RENAME
%do k=1 %to (&INCCOL_COUNT);
var&k = %sysfunc(dequote(&&inccol&k))
%end;
;
and now, as long as the user doesn't separately keep the original variables, you're okay. (If they do, then you will get a conflict and an error).
If this way doesn't work for your needs, and I don't have a solution for the _webout as I don't have a server to play with, you might consider trying this in a slightly different way.
proc format;
value agef
11-13 = '11-13'
14-16 = '14-16';
quit;
ods output report=mydata(drop=_BREAK_);
proc report data=sashelp.class nowd;
format age agef.;
columns name age;
run;
ods output close;
The first part is just a proc format to show that this grabs the formatted value not the underlying value. (I assume that's desired, as if it's not this is a LOT easier.)
Now you have the data in a dataset a bit more conveniently, I think, and can put it out to JSON however you want. In your example you'd do something like
/* the %do loop has to run inside a macro definition */
ods output report=work.mydata(drop=_BREAK_);
proc report data=fetchlib.Table1 nowd;
columns
%do k=1 %to (&INCCOL_COUNT);
%sysfunc(dequote(&&inccol&k))
%end;
;
run;
ods output close;
And then you can send that dataset to JSON or whatever. You might even be able to go more directly than that, but I know almost nothing about PROC JSON.
Reading more about JSON, you may actually have an easier way to do this.
On the export line, you have the various format options. So, assuming we have a dataset that is just a subset of the original:
proc json out='C:\Logs\Log1.txt';
options firstobs=1 obs=10;
export fetchlib.Table1
(keep=
%do k=1 %to (&INCCOL_COUNT);
%sysfunc(dequote(&&inccol&k))
%end;
)
/ nosastags FMTCHARACTER FMTDATETIME FMTNUMERIC ;
run;
This method doesn't allow for the variable order to be changed; if you need that, you can use an intermediate dataset:
data intermediate/view=intermediate;
/* RETAIN before SET fixes the variable order */
retain
%do k=1 %to (&INCCOL_COUNT);
%sysfunc(dequote(&&inccol&k))
%end;
;
set fetchlib.Table1;
keep
%do k=1 %to (&INCCOL_COUNT);
%sysfunc(dequote(&&inccol&k))
%end;
;
run;
and then write that out. I'm just guessing that you can use a view in this context.
It turns out that the simplest way to implement this was to change the way that the columns (aka SAS variables) were passed into the stored process. Although Joe's answer was helpful, I ended up solving the problem by passing in the columns to the keep statement as a space-separated column list, which greatly simplified the SAS code because I didn't have to deal with a dynamic list of columns.
libname fetchlib meta library="lib01" metaserver="123.12.123.123"
password="password" port=1234
repname="myRepo" user="myUserName";
proc json out=_webout;
export fetchlib.&tablename(keep=&columns) /nosastags;
run;
Where &columns gets set to something like this:
Column1 Column2 Column3

SAS: Drop column in a if statement

I have a dataset called have with one entry with multiple variables that look like this:
message reference time qty price
x 101 35000 100 .
The above dataset changes on every pass of a loop; message can also be "A". If message="X", 100 qty should be removed from the MASTER set for the row whose reference number equals the reference number in have. The price is missing (.) because it is already in the MASTER database under reference=101. The MASTER database aggregates all the available orders at each price, with the quantity available. If on the next pass message="A", the have dataset would look like this:
message reference time qty price
A 102 35010 150 500
This means a new reference number should be added to the MASTER database; in other words, the line is appended to MASTER.
I have the following code in my loop to update the quantity in my MASTER database when there is a message X:
data b.master;
modify b.master have(where=(message="X")) updatemode=nomissingcheck;
by order_reference_number;
if _iorc_ = %sysrc(_SOK) then do;
replace;
end;
else if _iorc_ = %sysrc(_DSENMR) then do;
output;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSEMTR) then do;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSENOM) then do;
_error_ = 0;
end;
run;
I use REPLACE to update the quantity. But since price is missing when message is X, the code above sets price to missing where reference=101 in MASTER via the REPLACE statement, which I don't want. Hence, I would prefer to delete the price column if message=X in the have dataset. But I don't want to delete the price column when message=A, since I use this code:
proc append base=MASTER data=have(where=(msg_type="A")) force;
run;
Hence, I have this code prior to my MODIFY statement:
data have(drop=price_alt);
set have; if message="X" then do;
output;end;
else do; /*I WANT TO MAKE NO CHANGE*/
end;run;
but it doesn't do what I want. If the message is not equal X then I don't want to drop the column. If it is equal X, I want to drop the column. How can I adapt the code above to make it work?
It's a bit of a strange request, to be honest, which raises the question of whether what you're doing is the best way of doing it. However, in the spirit of answering the question...
The answer by DomPazz gives the option of splitting the data into two possible sets, but if you want code down the line to always refer to a specific data set, this creates its own complications.
You also can't, in one data step, tell SAS to output to the "same" data set where one instance has a column and one doesn't. What you'd like, therefore, is for the code itself to be dynamic, so that the data step that runs either drops the column or does not, depending on whether message=x. Dynamic code, like many things in SAS, comes down to the creative use of macros, and it looks something like this:
/* Just making your input data set */
data have;
message='x';
time=35000;
qty=1000;
price=10.05;
price_alt=10.6;
run;
/* Writing the macro */
%macro solution;
%local id rc1 rc2;
%let id=%sysfunc(open(work.have));
%syscall set(id);
%let rc1=%sysfunc(fetchobs(&id, 1));
%let rc2=%sysfunc(close(&id));
%IF &message=x %THEN %DO;
data have(drop=price_alt);
set have;
run;
%END;
%ELSE %DO;
data have;
set have;
run;
%END;
%mend solution;
/* Running the macro */
%solution;
Try this:
data outX(drop=price_alt) outNoX;
set have;
if message = "X" then
output outX;
else
output outNoX;
run;
As @sasfrog says in the comments, a table either has a column or it does not. If you want to subset things where MESSAGE="X" then you can use something like this to create 2 data sets.