Logistic regression with BY statement ERROR message - sas

I'm currently working on a SAS program that runs 50 logistic regressions on 50 different samples. I previously got help in this thread (How to loop a logistic regression n number of times?), where people advised me to use a BY statement instead of looping the process n times. It works really well, but I get this error message:
ERROR: No valid observations due either to missing values in the response, explanatory, frequency, or weight variable, or to
nonpositive frequency or weight values.
NOTE: The above message was for the following BY group:
Sample Replicate Number=.
You'll find my code below. If any of you have an idea where this comes from, I'm open to anything. Thank you in advance!
proc surveyselect data=TOP_1 NOPRINT out=ALEA_1
seed=0
method=urs
outhits
reps=5
n=300;
run;
proc surveyselect data=TOP_0 NOPRINT out=ALEA_0
seed=0
method=urs
outhits
reps=5
n=300;
run;
PROC SQL;
CREATE TABLE APPEND_TABLE As
SELECT * FROM ALEA_1
OUTER UNION CORR
SELECT * FROM ALEA_0;
QUIT;
/* Logistic regression */
DATA WORK.TMP0TempTableAddtnlPredictData;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT *
FROM WORK.TMP0TempTableAddtnlPredictData
ORDER BY REPLICATE;
QUIT;
TITLE;
TITLE1 "Résultats de la régression logistique";
FOOTNOTE;
FOOTNOTE1 "Généré par le Système SAS (&_SASSERVERNAME, &SYSSCPL) le %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) à %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC LOGISTIC DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=ROC
;
By Replicate;
CLASS age_classe (PARAM=EFFECT) Flag_bq_principale (PARAM=EFFECT) flag_univers_detenus (PARAM=EFFECT) csp_1 (PARAM=EFFECT) SGMT_FIDELITE (PARAM=EFFECT) situ_fam_1 (PARAM=EFFECT);
MODEL TOP_CREDIT_HABITAT_2017 (Event = '1') [...] /
SELECTION=STEPWISE
SLE=0.1
SLS=0.1
INCLUDE=0
LINK=LOGIT
;
OUTPUT OUT=WORK.PREDLogRegPredictions(LABEL="Statistiques et prédictions de régression logistique pour WORK.APPEND_TABLE" WHERE=(NOT ws__FLAG))
PREDPROBS=INDIVIDUAL;
RUN;
QUIT;
DATA WORK.PREDLogRegPredictions;
set WORK.PREDLogRegPredictions;
TOP_CREDIT_HABITAT_2017=__DEP;
_FROM_=__DEP;
DROP __DEP;
DROP __FLAG;
RUN ;
QUIT ;
/* Create the final output file */
PROC SQL;
CREATE TABLE MODELE_RESULTS As
SELECT IDCLI_CALCULE, IP_1
FROM PREDLogRegPredictions;
RUN;
QUIT;
ODS GRAPHICS OFF;

Probably from this:
DATA WORK.TMP0TempTableAddtnlPredictData;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
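You're appending a dataset here that does not have a replicate number on it. The rows from WORK.BASE_PREDICT_2 get Replicate=. and a missing response, so they form their own BY group with no valid observations, which is what the error reports. I'm not really sure I follow what this dataset is. Are you intending it to be added to each replicate, perhaps? Then you might do something like this (untested):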
DATA WORK.TMP0TempTableAddtnlPredictData;
/* Pass 1: keep every sampled row, flagged as original data */
do _n_ = 1 by 1 until (eof);
SET WORK.APPEND_TABLE end=eof;
__FLAG=1;
__DEP=TOP_CREDIT_HABITAT_2017;
output;
end;
/* Pass 2: add one copy of the prediction data per replicate, with the response set to missing */
do replicate = 1 to 5;
do n_predict = 1 to nobs_predict;
set WORK.BASE_PREDICT_2 nobs=nobs_predict point=n_predict;
__FLAG=0;
__DEP=TOP_CREDIT_HABITAT_2017;
TOP_CREDIT_HABITAT_2017=.;
output;
end;
end;
stop;
RUN;
This is the complicated way to get 5 copies of that, one for each replicate. But I'm not sure that's actually what you want - does it even have all of the variables you need? Are you sure you didn't mean to MERGE instead of SET?
Separately, I don't understand why you use the SQL step to append the two samples. I'd either do that in the same data step here or use PROC APPEND. Both would be faster than doing the SQL union and then immediately appending more rows to the dataset.
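Both alternatives to the SQL union can be sketched as follows. This is only an illustration: the _demo datasets are hypothetical stand-ins built from sashelp.class, playing the role of ALEA_1 and ALEA_0 from the question, and the sketch assumes the two samples have the same variables.

```sas
/* Hypothetical stand-ins for the two samples (ALEA_1 and ALEA_0 in the question) */
data alea_1_demo alea_0_demo;
set sashelp.class;
if sex = 'F' then output alea_1_demo;
else output alea_0_demo;
run;

/* Alternative 1: stack both samples in a single data step */
data append_demo;
set alea_1_demo alea_0_demo;
run;

/* Alternative 2: PROC APPEND adds the DATA= rows to the end of BASE=
   without rereading the rows that are already in BASE= */
data append_demo2;
set alea_1_demo;
run;
proc append base=append_demo2 data=alea_0_demo;
run;
```

With the real data, the prediction dataset could simply be listed as a third dataset on the SET statement of the first alternative, which avoids writing the intermediate table at all.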

Related

I need to find top 5 Transaction_Due_Date in Proc sql

This code works for top value but I need top 5 values
proc sql;
create table cash.gO5 as
select * , max(Transaction_Due_Date) as max1 format = date9.
from cash.Orders_Dim65
group by Customer_Name;
quit;
PROC SQL does not support ordered analytic (window) functions such as rank(), which are found in other flavors of SQL. However, there are numerous ways to get a rank by group. Here are a few options you can use.
Option 1: PROC RANK
proc rank does exactly what it sounds like: it ranks values. Note that your data must be sorted by the BY variable if you are using SAS 9 or SPRE.
proc rank data=sashelp.cars
out=want(where=(msrp_rank LE 5))
descending;
by make;
var msrp; /* Variable to rank */
ranks msrp_rank; /* Name of variable holding ranks */
run;
Option 2: Data Step
You can rank using a data step. Note that your data must be sorted if using SAS 9 or SPRE.
proc sort data=sashelp.cars
out=cars;
by make descending msrp;
run;
data want;
set cars;
by make descending msrp;
if(first.make) then Rank = 0;
Rank+1;
if(Rank LE 5);
run;
Option 3: simple.topK CAS Action
If you have Viya, you can use CAS actions to quickly rank large datasets. This can be used in both SAS and Python with the SWAT package.
/* Load sashelp.cars into CAS */
data casuser.cars;
set sashelp.cars;
run;
proc cas;
simple.topk result=r /
table = {caslib='casuser' name='cars' groupby='make'}
casout = {caslib='casuser' name='cars_top_5' replace=true}
aggregator ='max'
bottomK = 0
topK = 5
inputs = {{name='msrp'}}
;
quit;

PROC REPORT within DATA in SAS

I am trying to do a simple thing: run a PROC REPORT procedure from within a DATA step. My main idea is that if the condition in the data step is true, PROC REPORT should be executed, and if it is false, PROC REPORT should not be executed. Any ideas? The code runs without errors for now, but I see that the condition in the IF statement is not applied and PROC REPORT is executed even though the condition is not fulfilled.
Thank you in Advance.
%let DATO = 13062016;
PROC IMPORT OUT= WORK.auto1 DATAFILE= "C:\Users\BC1554\Desktop\andel.xlsx"
DBMS=xlsx REPLACE;
SHEET="sheet1";
GETNAMES=YES;
RUN;
data want;
set WORK.auto1;
rownum=_n_;
run;
DATA tbl2;
SET want;
if (rownum => 1 and rownum <=6 ) then output work.tbl2 ;
RUN;
ODS NORESULTS;
ods LISTING close;
ODS RTF FILE="C:\Users\BC1554\Desktop\Statistik_andel_&DATO..rtf";
title "Statistics from monthly run of DK shares of housing companies (andelsboliger)";
data Tbl21 ;
set work.Tbl2;
where (DKANDEL='Daekning_pct_24052016' or DKANDEL='Daekning_pct_18042016') ;
difference = dif(Andel);
difference1 = dif(Total);
run;
data Tbl211 ;
set work.Tbl21;
where (DKANDEL='Daekning_pct_18042016') ;
run;
data Tbl2111 ;
set work.Tbl211;
where (DKANDEL='Daekning_pct_18042016') ;
if abs(difference) > 10 and abs (difference1) > 107 then ;
run;
proc report data= work.Tbl2 spanrows;
columns DKANDEL Andel Total Ukendt ;
title2 "-";
title3 "We REPORT numbers on p.4-5".;
title4 "-";
title5 "The models coverage";
title6 "Run date &DATO.";
footnote1 "Assets without currency code not included";
define DKANDEL / order;
define Andel / order;
define Total / order;
define Ukendt / order;
define DKANDEL/ display;
define Andel / display;
Compute DKANDEL;
call define (_col_,"style","style={background=orange}");
endcomp;
Compute Andel;
call define (_col_,"style","style={background=red}");
endcomp;
run; title; footnote1;
ODS RTF close;
ODS LISTING;
title;
run;
To conditionally execute code you need to use a macro so that you can use macro logic like %IF to conditionally generate the code.
But for your simple problem you can use a macro variable to modify the RUN; statement on your PROC REPORT step. Create a macro variable and set it to the value CANCEL when you don't want the step to run.
%let cancel=CANCEL;
...
if abs(difference) > 10 and abs (difference1) > 107 then call symputx('cancel','');
...
proc report ... ;
...
run &cancel ;
Simple example. Produce report if anyone is aged 13.
%let cancel=CANCEL;
data _null_;
set sashelp.class ;
if age=13 then call symputx('cancel',' ');
run;
proc report data=sashelp.class ;
run &cancel;
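The macro-logic route mentioned at the start of this answer could look roughly like the sketch below. The macro name report_if and the macro variable found are hypothetical names chosen for illustration; the condition is the same "anyone aged 13" test as in the simple example.

```sas
/* Hypothetical macro: generates the PROC REPORT step only when flag=1 */
%macro report_if(flag=);
%if &flag = 1 %then %do;
proc report data=sashelp.class;
run;
%end;
%else %put NOTE: Report skipped because the condition was not met.;
%mend report_if;

/* Derive the flag from the data: 1 if anyone is aged 13, otherwise 0 */
%let found=0;
data _null_;
set sashelp.class;
if age=13 then call symputx('found', 1);
run;

%report_if(flag=&found)
```

Compared with RUN CANCEL, this keeps the PROC REPORT code out of the log entirely when the condition fails, at the cost of wrapping the step in a macro.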
Tom's answer is a good one, and probably what I'd do. But, an alternative that is more exactly what you suggested in the question seems also appropriate.
The way you execute a PROC REPORT in a data step (or execute any non-data-step code in a data step) is with call execute. You can use call execute to execute a macro, or just a string of code; up to you how you want to handle it. I would make it a macro, because that makes development much easier (you can write the macro just like regular code, and you can test it independently).
Here's a simple example that is analogous to what Tom put in his answer.
%macro print_report(data=);
proc report data=&data.;
run;
%mend print_report;
data _null_;
set sashelp.class ;
if age=13 then do;
call execute('%print_report(data=sashelp.class)');
stop; *prevent it from doing this more than once;
end;
run;

Create new variables from format values

What I want to do: I need to create a new variable for each value label of a variable and do some recoding. I have all the value labels output from an SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that I need to recode the variables following this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one from data_null_ work, but Stack Overflow won't let me accept more than one answer.
Update to add an underscore to the end of each name. It looks like PROC TRANSREG has no option to put an underscore between the variable name and the value of the class variable, so we can just do a temporary rename. Create name=newname pairs to rename each class variable so that it ends in an underscore, and a second set of pairs to rename them back. The pairs are built with the CAT functions and stored in macro variables with SQL INTO.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
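The data step alternative to PROC EXPAND mentioned above can be sketched like this. The want_demo table is a small hypothetical stand-in for the transposed dataset, with made-up values, so that the step can be run on its own.

```sas
/* Hypothetical stand-in for the transposed table, with missings where no dummy was set */
data want_demo;
input caseID fumert1_0 fumert1_1 sex_1 sex_2;
datalines;
1 1 . . 1
2 . 1 1 .
;
run;

/* Replace every missing numeric value with 0 */
data want_demo_e;
set want_demo;
array nums {*} _numeric_;   /* all numeric variables read from want_demo */
do _i = 1 to dim(nums);
if missing(nums{_i}) then nums{_i} = 0;
end;
drop _i;
run;
```

Because the array is declared before _i is created, the loop counter is not part of the array. Like the PROC EXPAND call, this touches every numeric variable, so list the dummy variables explicitly in the array if other numeric columns should keep their missing values.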
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;

Macro not loading data set

Previous Posts :
Variable check and summary out
Macro that outputs table with testing results of SAS table
Question/Problem
From the previous posts, I thought I was able to run the macro and produce the desired results. However, after finally getting a report back that the output is not working I'm really confused as to why I'm getting the error that there were missing variables. It appears as if the data set is not being loaded after sub-setting. I'm able to process basic summary statistic tables, but when I load the macro the output is not working.
Why is the data set not loading? Does a macro require a certain type of data set?
Note : A limitation is that I do not have access to the data set, so I must send code to be run and won't get results for a few days. It's a very long and frustrating process, but I'm sure some can relate.
The code that is causing problems is the macro (in beginning of code) and the very last section which calls the macro with the data set.
Error Log :
Code :
# Filename : Census2007_Hawaii_BearingCoffee_BigIsland.sas
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend clearance_test;
/***********************************************************************/
*Define file path statics;
Libname def 'P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets';
Libname abc "P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets";
option obs=max;
/* Initialize database */
DATA def.Census2007_Hawaii_Coffee;
SET abc.census2007_hawaii_SubSet_Coffee;
**<create the variables used in the macro> **;
RUN;
/* Clearance Test Results */
%clearance_test(input_dataset=def.census2007_hawaii_SubSet_Coffee, output_dataset=test_data ,variable_to_consider= OIR OIRO ROA ROAO SProfit
LProfit SProfitAcre LProfitAcre Profitable MachineandRent UtilityandFuel LaborH LaborO FertilizerandChem MaintandCustom
Interest Tax Dep Others TFPE_cal operators workers operatorsandworkers)
A Complete/Verifiable Example :
This has been tested on the remote machine and works perfectly.
/* Create test data set*/
data business_data;
do firm = 1 to 3;
revenue = rand("uniform");
costs = rand("uniform");
profits = rand("uniform");
vcost = rand("uniform");
output;
end;
run;
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend clearance_test;
/* Print summary table, run macro, and print clearance test table */
PROC MEANS data = business_data n sum mean median std;
VAR revenue costs profits vcost;
RUN;
%clearance_test(input_dataset=business_data, output_dataset=test_data ,
variable_to_consider=revenue costs profits vcost)
proc print data = test_data; run;
This is where a minimal, complete verifiable example (MCVE) would be helpful for testing whether your problem is a problem with the code, or the data.
Here's the code above, but with a SASHELP dataset (those are built-in to SAS so everyone has them).
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend clearance_test;
%clearance_test(input_dataset=sashelp.cars, output_dataset=work.test, variable_to_consider=mpg_city mpg_highway);
That's the exact macro, just using a different input dataset. It works correctly on my machine (the flag variables are meaningless since the data isn't right for them, but the code works).
Run the same on your colleague's machine, and if it runs, then you know the data is the problem (ie, the dataset doesn't have the variables you think it does). If it doesn't run, then you have some other problem (perhaps an issue with how it's being submitted, maybe you end up with spurious characters or something).
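One way to check whether a dataset really has the variables you expect, without seeing the data yourself, is to query DICTIONARY.COLUMNS and have the listing sent back with the results. The sketch below uses sashelp.cars and a deliberately nonexistent name (NOT_A_VARIABLE, purely for illustration); for the real check you would substitute your own libref, dataset name, and variable list.

```sas
/* List which of the expected variables actually exist in the dataset.
   LIBNAME and MEMNAME are stored in upper case in DICTIONARY.COLUMNS. */
proc sql;
select name, type
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CARS'
and upcase(name) in ('MPG_CITY', 'MPG_HIGHWAY', 'NOT_A_VARIABLE');
quit;
```

Any name from the IN list that does not show up in the output is missing from the dataset, which would explain a missing-variable error in the macro.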

How do I eliminate variables with missing results in SAS?

Here are my results. Since PPD has a missing result I'd like to eliminate all results for PPD. I.e. I'd like to eliminate all records where ticker='PPD' if any record where ticker='PPD' has a missing result (corr).
How can I program this in SAS? I don't want to just eliminate that missing observation but eliminate PPD altogether. Thanks.
Ticker Day Corr
PPD 7 -1
PPD 8
PTP 7 0.547561231
PTP 8 0.183279038
Lots of ways to do this, and what is most efficient depends on your data. If you don't have too much data, then I'd use the easiest method that fits with your knowledge and other habits.
*SQL delete;
proc sql;
delete from have H where exists (
select 1 from have V where H.ticker=V.ticker and V.corr is null);
quit;
*FREQ for missing (or means or whatever) then delete from that;
*Requires have to be sorted.;
proc freq data=have;
tables ticker*corr/missing out=ismiss(where=(missing(corr)));
run;
data want;
merge have(in=_h) ismiss(in=_m);
by ticker;
if _h and not _m;
run;
*double DoW. Requires that the dataset is either sorted by ticker;
*or at least grouped by ticker (the groups need not be in alphabetical order),;
*in which case use NOTSORTED on the BY statement;
data want;
do _n_=1 by 1 until (last.ticker);
set have;
by ticker;
if missing(corr) then _miss=1;
end;
do _n_=1 by 1 until (last.ticker);
set have;
by ticker;
if _miss ne 1 then output;
end;
run;
This is easily accomplished in PROC SQL...
proc sql ;
create table to_delete as
select distinct ticker
from mydata
where missing(corr) ;
delete from mydata
where ticker in(select ticker from to_delete) ;
quit ;
Unfortunately, it can't be done in a single SQL statement, as the DELETE FROM statement would recursively reference the source dataset.