'notin' used within a 'where' statement syntax in PROC PROBIT - sas

I'm working with some one else's code which contains the following instance of PROC PROBIT.
proc probit data = mortality order=data;
where group notin (9);
class survive;
model survive =log_dose / D = LOGISTIC INVERSECL;
ods output /*logprobitanalysis=logprobliv_dose*/ probitanalysis=probliv_dose;
RUN;
What function does the (9) serve in the where statement?
I'm scouring documentation, but not having much luck finding an explanation. Is it native to the where statement? Or, does the order= option alter the capabilities of where within proc probit? I assume that notin is a variable, but it's not entirely clear to me from the code. Is notin some obscure keyword for not in (list)?
(Un)Fortunately, the author is no longer with us.

NOTIN is the same as NOT IN. I assume SAS sees NOT and applies it as a modifier for what comes next, if what comes next is an operator.
So this works:
data test;
do group=1 to 9;
output;
end;
run;
data want;
set test;
where group notin (1,9);
run;
Leaving you with group in {2,3,...,8}

Related

loop a list of variables in SAS

I have a dataset with 10+ dependent variables and several categorical variables as independent variables. I'm plan to use proc sgplot and proc mixed functions to do analysis. However, putting all variables one by one in the same function will be really time consuming. I'm pretty new to SAS, is there a way to create a loop with dependent variables and put them into the function.
Something like:
%let var_list= read math science english spanish
proc mixed data=mydata;
model var_list= gender age race/ solution;
random int/subject=School;
run;
Thank you!
SAS has a macro language you can use to generate code. But for this problem you might want to just restructure your data so that you can use BY processing instead.
data tall ;
set mydata ;
array var_list read math science english spanish ;
length varname $32 value 8;
do _n_=1 to dim(var_list);
varname=vname(var_list(_n_));
value = var_list(_n_);
output;
end;
run;
proc sort data=tall;
by varname ;
run;
Now you can process each value of VARNAME (ie 'read','math', ....) as separate analyses with one PROC MIXED call.
proc mixed data=tall;
by varname;
model value = gender age race/ solution;
random int/subject=School;
run;
I would do something like this. This creates a loop around your proc mixed -call. I didn't take a look at the proc mixed -specification, but that may not work as described in your example.
The loop works however, and loops through whatever you put in the place of the proc mixed -call and the loop is dynamically sized based on the number of elements in the dependent variable list.
First define some macro variables.
%let y_var_list = read math science english spanish;
%let x_var_list = gender age race;
%let mydata = my_student_data;
Then define the macro that does the looping.
%macro do_analysis(my_data=, y_variables=, x_variables=);
%* this checks the nr of variables in y_var_list;
%let len_var_list = %eval(%sysfunc(count(&y_variables., %quote( )))+1);
%do _i=1 %to &len_var_list;
%let y_var = %scan(&y_variables, &_i);
%put &y_var; %* just printing out the macrovar to be sure it works;
%* model specification;
proc mixed data=&my_data.; %* data given as parameter in the macro call. proc mixed probably needs some output options too, to work;
model &y_var = &x_variables/ solution; %* independent vars as a macro parameter;
random int/subject=School;
run;
%end;
%mend do_analysis;
Last but not least, remember to call your macro with the given variable lists and dataset specifications. Hope this helps!
%do_analysis(my_data=&mydata, y_variables=&y_var_list, x_variables=&x_var_list);

SAS DI LAG1 alternative?

Trying to use the LAG function in SAS to replicate a piece of code in a migration into SAS DI, however there doesn't seem to be the same function in SAS DI at all.
Using SAS DI 4.21 currently, with a view to move up to 4.9 soon.
So my question is, is there an alternative way of replicating the following code in SAS DI:
DATA work.dm_chg_bal;
SET tmp_bal_chg;
FORMAT dt2 date9.;
acct_id2 = LAG1(acct_id);
app_suf2 = LAG1(app_suf);
dt2 = LAG1(start_dt);
RUN;
Cheers,
For this I would use a User Written transform. There is really no issue with doing this so long as you are diligent enough to perform the variable mapping - hence retaining the data lineage in metadata.
An explanation of this is available here.
I don't know the DI Studio transformations well (I typically only use the User Written transform).
I wonder if there is a transformation you could trick into generating:
data work.dm_chg_bal;
set tmp_bal_chg;
output;
set tmp_bal_chg(rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
run;
or
data work.dm_chg_bal;
if _n_ > 1 then set tmp_bal_chg(rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
set tmp_bal_chg;
run;
If not, I'm sure there are data transformations that would allow you to make two copies of the dataset, one with ID=_n_ and the other with ID=_n_+1, and then merge by ID. That is, generate:
data main;
set tmp_bal_chg;
ID = _n_ ;
run;
data lag;
set tmp_bal_chg (rename=(acct_id=acct_id2 app_suf=app_suf2 start_dt=dt2));
ID = _n_ + 1;
run;
data work.dm_chg_bal;
merge main (in=a)
lag (keep=id acct_id2 app_suf2 dt2 in=b)
;
by id;
if a;
run;

PROC FREQ on multiple variables combined into one table

I have the following problem. I need to run PROC FREQ on multiple variables, but I want the output to all be on the same table. Currently, a PROC FREQ statement with something like TABLES ERstatus Age Race, InsuranceStatus; will calculate frequencies for each variable and print them all on separate tables. I just want the data on ONE table.
Any help would be appreciated. Thanks!
P.S. I tried using PROC TABULATE, but it didn't not calculate N correctly, so I'm not sure what I did wrong. Here is my code for PROC TABULATE. My variables are all categorical, so I just need to know N and percentages.
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
The above code does not return the correct frequencies based on InsuranceStatus where 0 = insured and 1 = uninsured, but PROC FREQ does. Also doesn't calculate correctly with ROWPCTN. So any way that I can get PROC FREQ to calculate multiple variables on one table, or PROC TABULATE to return the correct frequencies, would be appreciated.
Here is a nice image of my output in a simplified analysis of only ERstatus and InsuranceStatus. You can see that PROC FREQ returns 204 people with an ERstatus of 1 and InsuranceStatus of 1. That's correct. The values in PROC TABULATE are not.
OUTPUT
I'll answer this separately as this is answering the other possible interpretation of the question; when it's clarified I'll delete one or the other.
If you want this in a single printed table, then you either need to use proc tabulate or you need to normalize your data - meaning put it in the form of variable | value. PROC FREQ is not capable of doing multiple one-way frequencies in a single table.
For PROC TABULATE, likely your issue is missing data. Any variable that is on the class statement will be checked for missingness, and if any rows are missing data for any of the class variables, those rows are entirely excluded from the tabulation for all variables.
You can override this by adding the missing option on the class statement, or in the table statement, or in the proc tabulate statement. So:
PROC TABULATE DATA = BCanalysis;
CLASS ERstatus PRstatus Race TumorStage InsuranceStatus/missing;
TABLE (ERstatus PRstatus Race TumorStage) * (N COLPCTN), InsuranceStatus;
RUN;
This will result in a slightly different appearance than on your table, though, as it will include the missing rows in places you probably do not want them, and they'll be factored against the colpctn when again you probably don't want them.
Typically some manipulation is then necessary; the easiest is to normalize your data and then run a tabulation (using PROC TABULATE or PROC FREQ, whichever is more appropriate; TABULATE has better percentaging options though) against that normalized dataset.
Let's say we have this:
data class;
set sashelp.class;
if _n_=5 then call missing(age);
if _n_=3 then call missing(sex);
run;
And we want these two tables in one table.
proc freq data=class;
tables age sex;
run;
If we do this:
proc tabulate data=class;
class age sex;
tables (age sex),(N colpctn);
run;
Then we get an N=17 total for both subtables - that's not what we want, we want N=18. Then we can do:
proc tabulate data=class;
class age sex/missing;
tables (age sex),(N colpctn);
run;
But that's not quite right either; I want F to have 8/18 = 44.44% and M 10/18 = 55.55%, not 42% and 53% with 5% allocated to the missing row.
The way I do this is to normalize the data. This means you get a dataset with 2 variables, varname and val, or whatever makes sense for your data, plus whatever identifier/demographic/whatnot variables you might have. val has to be character unless all of your values are numeric.
So for example here I normalize class with age and sex variables. I don't keep any identifiers, but you certainly could in your data, I imagine InsuranceStatus would be kept there if I understand what you're doing in that table. Once I have the normalized table, I just use those two variables, and carefully construct a denominator definition in proc tabulate to have the right basis for my pctn value. It's not quite the same as the single table before - the variable name is in its own column, not on top of the list of values - but honestly that looks better in my opinion.
data class_norm;
set class;
length val $2;
varname='age';
val=put(age,2. -l);
if not missing(age) then output;
varname='sex';
val=sex;
if not missing(sex) then output;
keep varname val;
run;
proc tabulate data=class_norm;
class varname val;
tables varname=' '*val=' ',n pctn<val>;
run;
If you want something better than this, you'll probably have to construct it in proc report. That gives you the most flexibility, but is the most onerous to program in also.
You can use ODS OUTPUT to get all of the PROC FREQ output to one dataset.
ods output onewayfreqs=class_freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close;
or
ods output crosstabfreqs=class_tabs;
proc freq data=sashelp.class;
tables sex*(height weight);
run;
ods output close;
Crosstabfreqs is the name of the cross-tab output, while one-way frequencies are onewayfreqs. You can use ods trace to find out the name if you forget it.
You may (probably will) still need to manipulate this dataset some to get the structure you want ultimately.

proc report print null dataset

I have a null dataset such as
data a;
if 0;
run;
Now I wish to use proc report to print this dataset. Of course, there will be nothing in the report, but I want one sentence in the report said "It is a null dataset". Any ideas?
Thanks.
You can test to see if there are any observations in the dataset first. If there are observations, then use the dataset, otherwise use a dummy dataset that looks like this and print it:
data use_this_if_no_obs;
msg = 'It is a null dataset';
run;
There are plenty of ways to test datasets to see if they contain any observations or not. My personal favorite is the %nobs macro found here: https://stackoverflow.com/a/5665758/214994 (other than my answer, there are several alternate approaches to pick from, or do a google search).
Using this %nobs macro we can then determine the dataset to use in a single line of code:
%let ds = %sysfunc(ifc(%nobs(iDs=sashelp.class) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
Here's some code showing the alternate outcome:
data for_testing_only;
if 0;
run;
%let ds = %sysfunc(ifc(%nobs(iDs=for_testing_only) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
I've used proc print to simplify the example, but you can adapt it to use proc report as necessary.
For the no data report you don't need to know how many observations are in the data just that there are none. This example shows how I would approach the problem.
Create example data with zero obs.
data class;
stop;
set sashelp.class;
run;
Check for no obs and add one obs with missing on all vars. Note that no observation are every read from class in this step.
data class;
if eof then output;
stop;
modify class end=eof;
run;
make the report
proc report data=class missing;
column _all_;
define _all_ / display;
define name / order;
compute before name;
retain_name=name;
endcomp;
compute after;
if not missing(retain_name) then l=0;
else l=40;
msg = 'No data for this report';
line msg $varying. l;
endcomp;
run;

SAS Drop Condition Output - Not enough variables observed in Output

I am learning drop_conditions in SAS. I am requesting a sample on three variables, however only two observations are retrieved. Please advise! Thank you!
data temp;
set mydata.ames_housing_data;
format drop condition $30.;
if (LotArea in (7000:9000)) then drop_condition = '01: Between 7000-9000 sqft';
else if (LotShape EQ 'IR3') then drop_condition = '02: Irregular Shape of Property';
else if (condition1 in 'Artery' OR 'Feedr') then drop_condition = '03: Proximity to arterial/feeder ST';
run;
proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;
Your conditions/comparisons aren't specified correctly, I think you're looking for the IN operator to do multiple comparisons in a single line. I'm surprised there aren't errors in your log.
Rather than the following:
if (LotArea = 7000:9000)
Try:
if (lotArea in (7000:9000))
and
if (condition1 EQ 'Atrery' OR 'Feedr')
should be
if (condition1 in ('Atrery', 'Feedr'))
EDIT:
You also need to specify a length for the drop_condition variable, instead of a format to ensure the variable is long enough to hold the text specified. It's also useful to verify your answer afterwards with a proc freq against the specified conditions, for example:
proc freq data=temp;
where drop_condition=:'01';
tables drop_condition*lot_area;
run;