Macro not loading data set - sas

Previous Posts :
Variable check and summary out
Macro that outputs table with testing results of SAS table
Question/Problem
From the previous posts, I thought I was able to run the macro and produce the desired results. However, after finally getting a report back that the output is not working I'm really confused as to why I'm getting the error that there were missing variables. It appears as if the data set is not being loaded after sub-setting. I'm able to process basic summary statistic tables, but when I load the macro the output is not working.
Why is the data set not loading? Does a macro require a certain type of data set?
Note : A limitation is that I do not have access to the data set, so I must send code to be run and won't get results for a few days. It's a very long and frustrating process, but I'm sure some can relate.
The code that is causing problems is the macro (in beginning of code) and the very last section which calls the macro with the data set.
Error Log :
Code :
# Filename : Census2007_Hawaii_BearingCoffee_BigIsland.sas
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend mymacro;
/***********************************************************************/
*Define file path statics;
Libname def 'P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets';
Libname abc "P:\Hawaii_Arita\John_Hawaii_Coffee\Datasets";
option obs=max;
/* Initialize database */
DATA def.Census2007_Hawaii_Coffee;
SET abc.census2007_hawaii_SubSet_Coffee;
**<create the variables used in the macro> **;
RUN;
/* Clearance Test Results */
%clearance_test(input_dataset=def.census2007_hawaii_SubSet_Coffee, output_dataset=test_data ,variable_to_consider= OIR OIRO ROA ROAO SProfit
LProfit SProfitAcre LProfitAcre Profitable MachineandRent UtilityandFuel LaborH LaborO FertilizerandChem MaintandCustom
Interest Tax Dep Others TFPE_cal operators workers operatorsandworkers)
A Complete/Verifiable Example :
This has been tested on the remote machine and works perfectly.
/* Create test data set*/
data business_data;
do firm = 1 to 3;
revenue = rand("uniform");
costs = rand("uniform");
profits = rand("uniform");
vcost = rand("uniform");
output;
end;
run;
/******************************************************************
Clearance Test Macro
input_dataset - desired dataset which variables are located
output_dataset - an output table with test results
variable_to_consider - list of variables to compute test on
*******************************************************************/
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend mymacro;
/* Print summary table, run macro, and print clearance test table */
PROC MEANS data = business_data n sum mean median std;
VAR revenue costs profits vcost;
RUN;
%clearance_test(input_dataset=business_data, output_dataset=test_data ,
variable_to_consider=revenue costs profits vcost)
proc print data = test_data; run;

This is where a minimal, complete verifiable example (MCVE) would be helpful for testing whether your problem is a problem with the code, or the data.
Here's the code above, but with a SASHELP dataset (those are built-in to SAS so everyone has them).
%macro clearance_test(input_dataset= ,output_dataset=, variable_to_consider=);
%let variable_to_consider=%cmpres(&variable_to_consider);
proc sql noprint;
select count(*) into : obs_count from &input_dataset;
quit;
%let obs_count=&obs_count;
proc transpose data=&input_dataset out=&output_dataset prefix=top_;
var &variable_to_consider;
run;
data &output_dataset;
set &output_dataset end=eof;
array top(*) top_&obs_count.-top_1;
x=dim(top);
call sortn(of top[*]);
total=sum(of top[*]);
top_2_total=sum(top_1, top_2);
if sum(top_1,top_2) > 0.9 * total then Flag90=1; else Flag90=0;
if top_1 > total * 0.6 then Flag60=1; else Flag60=0;
keep total top_1 top_2 _name_ top_2_total total Flag60 Flag90;
run;
%mend clearance_test;
%clearance_test(input_dataset=sashelp.cars, output_dataset=work.test, variable_to_consider=mpg_city mpg_highway);
That's the exact macro, just using a different input dataset. It works correctly on my machine (the flag variables are meaningless since the data isn't right for them, but the code works).
Run the same on your colleague's machine, and if it runs, then you know the data is the problem (ie, the dataset doesn't have the variables you think it does). If it doesn't run, then you have some other problem (perhaps an issue with how it's being submitted, maybe you end up with spurious characters or something).

Related

SAS - Loop through rows and calculate MD5

I want to sweep each table in a libname and calculate a hash over each row.
For that purpose, i have already a table with libname, memname, concatenated columns with ',' and number of observations
libname
memname
columns
num_obs
lib_A
table_A
col1a,col2a...colna
1
lib_A
table_B
col1b,col2b...colnb
2
lib_B
table_C
col1c,col2c...colnc
1
I first get all data into ranged macro variables (i think its easier to work, but could be wrong, ofc)
proc sql;
select libname, memname, columns, num_obs
into :lib1-, :tab1-, :column1-, :sqlobs1-
from have
where libname="&sel_livraria"; /*macro var = prompt from user*/
quit;
Just for developing guideline i made the code just to check one specific table without getting the row number of it since with a simple counter doesn't work (i get the order of the rows mess up each time i run) and it works for that purpose
%let lib=lib_A;
%let tab=table_B;
%let columns=col1b,col2b,colnb;
data want;
length check $32.;
format check $hex32.;
set &lib..&tab;
libname="&lib";
memname="&tab";
check = md5(cats(&columns));
hash = put(check,$hex32.);
keep libname memname hash;
put hash;
put _all_;
run;
So, what’s the best approach for getting a MD5 from each row (same order as tables) of all tables in a libname? I saw problems i couldn’t overcame using data steps, procs or macros.
The result i wanted if lib_A was selected in prompt were something like:
libname
memname
obs_row
hash
lib_A
table_A
1
64A29CCA15F53C83A9583841294A26AA
lib_A
table_B
1
80DAC7B9854CF71A67F9C00A7EC4D9EF
lib_A
table_B
2
0AC44CD79DAB2E33C93BB2312D3A9A40
Need some help.
Tks in advance.
You're pretty close. This is how I would approach it. We'll create a macro with three parameters: data, lib, and out. data is the dataset you have with the column information. lib is the library you want to pull from your dataset, and out is the output dataset that you want to have.
We'll read each column into an individual macro variable:
memname1
memname2
memname3
libname1
libname2
libname3
etc.
From here, we simply need to loop over all of the macro variables and apply them where appropriate. We can easily count how many there are in a data step. All we need to do is add double-ampersands to resolve them correctly. For more information on why this is, check out this MWSUG paper.
%macro get_md5(data=, lib=, out=);
/* Save all variables into macro variables:
memname1 memname2 ...
columns1 columns2 ...
*/
data _null_;
set &data.;
where upcase(libname)=upcase("&lib.");
call symputx(cats('memname', _N_), memname);
call symputx(cats('columns', _N_), columns);
call symputx(cats('obs', _N_), obs);
call symputx('n_datasets', _N_);
run;
/* Loop through all the datasets and access each macro variable */
%do i = 1 %to &n_datasets.;
/* Double ampersand needed:
First, resolve &i. to get &memname1
Then resolve &mename1 to get the value stored in the macro variable memname1
*/
%let memname = &&memname&i.;
%let columns = &&columns&i.;
%let obs = &&obs&i.;
/* Calculate md5 in a temporary dataset */
data _tmp_;
length lib $8.
memname $32.
obs_row 8.
hash $32.
;
set &lib..&memname.(obs=&obs.);
lib = "&lib.";
memname = "&memname.";
obs_row = _N_;
hash = put(md5(cats(&columns.)), $hex32.);
keep libname memname obs_row hash;
run;
/* Overwrite the dataset so we don't keep appending */
%if(&i. = 1) %then %do;
data &out.;
set _tmp_;
run;
%end;
%else %do;
proc append base=&out. data=_tmp_;
run;
%end;
%end;
/* Remove temporary data */
proc datasets lib=work nolist;
delete _tmp_;
quit;
%mend;
Example:
data have;
length libname memname columns $15.;
input libname$ memname$ columns$ obs;
datalines;
sashelp cars make,model,msrp 1
sashelp class age,height,name 2
sashelp comet dose,length,sample 1
;
run;
%get_md5(data=have, lib=sashelp, out=want);
Output:
libname memname obs_row hash
sashelp cars 1 258DADA4843E7068ABAF95667E881B7F
sashelp class 1 29E8F4F03AD2275C0F191FE3DAA03778
sashelp class 2 DB664382B88BE7E445418B1A1C8CE13B
sashelp comet 1 210394B77E7696506FDEFD78890A8AB9
I would make a macro that takes as input the four values in your metadata dataset. Note that commas are anathema to SAS programs, especially macro code, so make the macro so it can accept space delimited variable lists (like normal SAS program statements do).
To reduce the risk of name conflict I will name the variable using triple underscores and then rename them back to human friendly names when the dataset is written.
%macro next_ds(libname,memname,num_obs,varlist);
data next_ds;
length ___1 $8 ___2 $32 ___3 8 ___4 $32 ;
___1 = "&libname";
___2 = "&memname";
___3 + 1;
set &libname..&memname(obs=&num_obs keep=&varlist);
___4 = put(md5(cats(of &varlist)),$hex32.);
keep ___1-___4 ;
rename ___1=libname ___2=memname ___3=obs_row ___4=hash;
run;
%mend next_ds;
Let's make some test metadata that reference datasets everyone should have.
data have;
infile cards truncover ;
input libname :$8. memname :$32. num_obs columns $200.;
cards;
sashelp class 3 name,sex,age
sashelp cars 2 make,model
;
And make sure the target dataset does not already exists.
%if %sysfunc(exist(want)) %then %do;
proc delete data=want; run;
%end;
Now you can call that macro once for each observation in your source metadata dataset. There is no need to generated oodles of macro variables. Instead you can use CALL EXECUTE() to generate the macro calls directly from the dataset.
We can replace the commas in the column lists when making the macro call. You can add in a PROC APPEND step after each macro call to aggregate the results into a single dataset.
data _null_;
set have;
call execute(cats(
'%nrstr(%next_ds)(',libname,',',memname,',',num_obs
,',',translate(columns,' ',','),')'
));
call execute('proc append data=next_ds base=want force; run;');
run;
Notice that wrapping the macro call in %NRSTR() makes the SAS log easier to read.
1 + %next_ds(sashelp,class,3,name sex age)
2 + proc append data=next_ds base=want force; run;
3 + %next_ds(sashelp,cars,2,make model)
4 + proc append data=next_ds base=want force; run;
Results:
Obs libname memname obs_row hash
1 sashelp class 1 5425E9CEDA1DDEB71B2692A3C7050A8A
2 sashelp class 2 C532D227D358A3764C2D225DC8C02D18
3 sashelp class 3 13AD5F1517E0C4494780773B6DC15211
4 sashelp cars 1 777C60693BF5E16F38706C89301CD0A8
5 sashelp cars 2 07080C9321145395D1A2BCC10FBE6B83
Note that CATS() might not be the best method for generating the string to pass to the MD5() function. That can generate the same string for different combinations of the source variables. For example 'AB' || 'CD' is the same as 'A' || 'BCD'. Perhaps just use CAT() instead.
Stu's approach is nice, and will work most of the time but will fall over when you have wiiiide variables, a large number of variables, variables with large precision, and other edge cases.
So for the actual hashing part, you might consider this macro, which is extensively tested within Data Controller for SAS:
https://core.sasjs.io/mp__md5_8sas.html
Usage:
data _null_;
set sashelp.class;
hashvar=%mp_md5(cvars=name sex, nvars=age height weight);
put hashvar=;
run;

Using SAS macro

Here is code:
%macro do_regression(dep);
proc glimmix data=xxx;
class group id intv month1;
model &dep = group month1 group*month1/d=normal link=identity;
random intv(id);
lsmeans group*month1/ilink diff cl ;
lsmestimate group*month1 'bsl-3 ' 1 -1 0 0 -1 1 0 0/cl ;
lsmestimate group*month1 'bsl-6' 1 0 -1 0 -1 0 1 0/cl;
ods output LSMEstimates
run; quit;
%mend;
%do_regression(original total domain1)
Here is example of data structure:
Question: I am new to SAS macros and am working with a SAS macro code to run the following regression model for three outcome variables (original total domain1). I output the results using: ods output LSMEstimates which created three datasets named data1—data3 with the estimates. However, I cannot figure out how to attach the outcome variable names of these datasets. Eventually, I would only want the following to be stored in one final dataset that can “set” data1—data3: effect label estimate lower upper. [I only want to store estimates from the two lsmestimate statements shown that I am outputting using: ods output LSMEstimates]
To aggregate the datasets you can use PROC APPEND.
ods output LSMEstimates=lsm;
run;quit;
proc append data=lsm base=lsm_aggregate force;
run;
If the value/variable &DEP is not already in the dataset generated by the ODS OUTPUT statement then add a step to add it.
data lsm_dep ;
length dep $32 ;
dep = "&dep";
set lsm;
run;
proc append data=lsm_dep base=lsm_aggregate force;
run;
Make sure to remove the LSM_AGGREGATE dataset before running a new batch of models.
proc delete data=lsm_aggregate; run;
%do_regression(original )
%do_regression(total )
%do_regression(domain1)

Missing values in VARMAX

I have a dataset with visitors and weather variables. I'm trying to forecast visitors based on the weather variables. Since the dataset only consists of visitors in season there is missing values and gaps for every year. When running proc reg in sas it's all okay but the issue comes when i'm using proc VARMAX. I cannot run the regression due to missing values. How can i tackle this?
proc varmax data=tivoli4 printall plots=forecast(all);
id obs interval=day;
model lvisitors = rain sunshine averagetemp
dfebruary dmarch dmay djune djuly daugust doctober dnovember ddecember
dwednesday dthursday dfriday dsaturday dsunday
d_24Dec2016 d_05Dec2013 d_24Dec2017 d_24Dec2014 d_24Dec2015 d_24Dec2019
d_24Dec2018 d_24Sep2012 d_06Jul2015
d_08feb2019 d_16oct2014 d_15oct2019 d_20oct2016 d_15oct2015 d_22sep2017 d_08jul2015
d_20Sep2019 d_08jul2016 d_16oct2013 d_01aug2012 d_18oct2012 d_23dec2012 d_30nov2013 d_20sep2014 d_17oct2012 d_17jun2014
dFrock2012 dFrock2013 dFrock2014 dFrock2015 dFrock2016 dFrock2017 dFrock2018 dFrock2019
dYear2015 dYear2016 dYear2017
/p=7 q=2 Method=ml dftest;
garch p=1 q=1 form=ccc OUTHT=CONDITIONAL;
restrict
ar(3,1,1)=0, ar(4,1,1)=0, ar(5,1,1)=0,
XL(0,1,13)=0, XL(0,1,14)=0, XL(0,1,13)=0, XL(0,1,27)=0, XL(0,1,38)=0, XL(0,1,42)=0;
output lead=10 out=forecast;
run;
As with any forecast, you will first need to prepare your time-series. You should first run through your data through PROC TIMESERIES to fill-in or impute missing values. The impute choice that is most appropriate is dependent on your variables. The below code will:
Sum lvisitors by day and set missing values to 0
Set missing values of averagetemp to average
Set missing values of rain, sunshine, and your variables starting with d to 0 (assuming these are indicators)
Code:
proc timeseries data=have out=want;
id obs interval = day
setmissing = 0
notsorted
;
var lvisitors / accumulate=total;
crossvar averagetemp / accumulate=none setmissing=average;
crossvar rain sunshine d: / accumulate=none;
run;
Important Time Interval Consideration
Depending on your data, this could bias your error rate and estimates since you always know no one will be around in the off-season. If you have many missing values for off-season data, you will want to remove those rows.
Since PROC VARMAX does not support custom time intervals, you can instead create a simple time identifier. You can alternatively turn this into a format for proc format and converttime_id at the end.
data want;
set have;
time_id+1;
run;
proc varmax data=want;
id time_id interval=day;
...
output lead=10 out=myforecast;
run;
data myforecast;
merge myforecast
want(keep=time_id date)
;
by time_id;
run;
Or, if you made a format:
data myforecast;
set myforecast;
date = put(time_id, timeid.);
drop time_id;
run;

Logistic regression with BY statement ERROR message

I'm currently working on a SAS program that processes 50 logistic regression for 50 different samples. I previously had help on this thread (How to loop a logistic regression n number of times?), people advised me to use a BY statement to avoid looping this process n times. Works really well but I get this ERROR MESSAGE:
ERROR: No valid observations due either to missing values in the response, explanatory, frequency, or weight variable, or to
nonpositive frequency or weight values.
NOTE: The above message was for the following BY group:
Sample Replicate Number=.
You'll find my code below, if any of you have an idea of where does it come from, I'm open to anything, thank you in advance!
proc surveyselect data=TOP_1 NOPRINT out=ALEA_1
seed=0
method=urs
outhits
reps=5
n=300;
run;
proc surveyselect data=TOP_0 NOPRINT out=ALEA_0
seed=0
method=urs
outhits
reps=5
n=300;
run;
PROC SQL;
CREATE TABLE APPEND_TABLE As
SELECT * FROM ALEA_1
OUTER UNION CORR
SELECT * FROM ALEA_0;
QUIT;
/* Régression logistique*/
DATA WORK.TMP0TempTableAddtnlPredictData;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT *
FROM WORK.TMP0TempTableAddtnlPredictData
ORDER BY REPLICATE;
QUIT;
TITLE;
TITLE1 "Résultats de la régression logistique";
FOOTNOTE;
FOOTNOTE1 "Généré par le Système SAS (&_SASSERVERNAME, &SYSSCPL) le %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) à %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC LOGISTIC DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=ROC
;
By Replicate;
CLASS age_classe (PARAM=EFFECT) Flag_bq_principale (PARAM=EFFECT) flag_univers_detenus (PARAM=EFFECT) csp_1 (PARAM=EFFECT) SGMT_FIDELITE (PARAM=EFFECT) situ_fam_1 (PARAM=EFFECT);
MODEL TOP_CREDIT_HABITAT_2017 (Event = '1') [...6## Heading ##] /
SELECTION=STEPWISE
SLE=0.1
SLS=0.1
INCLUDE=0
LINK=LOGIT
;
OUTPUT OUT=WORK.PREDLogRegPredictions(LABEL="Statistiques et prédictions de régression logistique pour WORK.APPEND_TABLE" WHERE=(NOT ws__FLAG))
PREDPROBS=INDIVIDUAL;
RUN;
QUIT;
DATA WORK.PREDLogRegPredictions;
set WORK.PREDLogRegPredictions;
TOP_CREDIT_HABITAT_2017=__DEP;
_FROM_=__DEP;
DROP __DEP;
DROP __FLAG;
RUN ;
QUIT ;
/* Création du fichier de sorti final*/
PROC SQL;
CREATE TABLE MODELE_RESULTS As
SELECT IDCLI_CALCULE, IP_1
FROM PREDLogRegPredictions;
RUN;
QUIT;
ODS GRAPHICS OFF;
Probably from this:
DATA WORK.TMP0TempTableAddtnlPredictData;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
You're appending a dataset that does not have a replicate number on it here. I'm not really sure I follow what this dataset is - are you intending this to be added to each replicate perhaps? Then you might do something like this (untested):
DATA WORK.TMP0TempTableAddtnlPredictData;
do _n_ = 1 by 1 until (eof);
SET WORK.APPEND_TABLE(IN=__ORIG) end=eof;
output;
end;
do replicate = 1 to 5;
do n_predict = 1 to nobs_predict;
set WORK.BASE_PREDICT_2 nobs=nobs_predict point=n_predict;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
output;
end;
end;
stop;
RUN;
This is the complicated way to get 5 copies of that, one for each replicate. But I'm not sure that's actually what you want - does it even have all of the variables you need? Are you sure you didn't mean to MERGE instead of SET?
Separately, I don't understand why you use the SQL step to append the two samples. I'd either do that in the same data step here or I'd use PROC APPEND, both would be faster than the SQL union and then immediately appending more to the dataset.

proc report print null dataset

I have a null dataset such as
data a;
if 0;
run;
Now I wish to use proc report to print this dataset. Of course, there will be nothing in the report, but I want one sentence in the report said "It is a null dataset". Any ideas?
Thanks.
You can test to see if there are any observations in the dataset first. If there are observations, then use the dataset, otherwise use a dummy dataset that looks like this and print it:
data use_this_if_no_obs;
msg = 'It is a null dataset';
run;
There are plenty of ways to test datasets to see if they contain any observations or not. My personal favorite is the %nobs macro found here: https://stackoverflow.com/a/5665758/214994 (other than my answer, there are several alternate approaches to pick from, or do a google search).
Using this %nobs macro we can then determine the dataset to use in a single line of code:
%let ds = %sysfunc(ifc(%nobs(iDs=sashelp.class) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
Here's some code showing the alternate outcome:
data for_testing_only;
if 0;
run;
%let ds = %sysfunc(ifc(%nobs(iDs=for_testing_only) eq 0, use_this_if_no_obs, sashelp.class));
proc print data=&ds;
run;
I've used proc print to simplify the example, but you can adapt it to use proc report as necessary.
For the no data report you don't need to know how many observations are in the data just that there are none. This example shows how I would approach the problem.
Create example data with zero obs.
data class;
stop;
set sashelp.class;
run;
Check for no obs and add one obs with missing on all vars. Note that no observation are every read from class in this step.
data class;
if eof then output;
stop;
modify class end=eof;
run;
make the report
proc report data=class missing;
column _all_;
define _all_ / display;
define name / order;
compute before name;
retain_name=name;
endcomp;
compute after;
if not missing(retain_name) then l=0;
else l=40;
msg = 'No data for this report';
line msg $varying. l;
endcomp;
run;