SAS: Get number of variables in current data step

I need a way to dynamically return the number of variables in the current data step.
Using SAS NOTE 24671: Dynamically determining the number of observations and variables in a SAS data set, I have come up with the following macro.
%macro GetVarCount(dataset);
/* Open assigns ID to open data set. Assigns 0 if DNE */
%let exists = %sysfunc(open(&dataset));
%if &exists %then
%do;
%let returnValue = %sysfunc(attrn(&exists, nvars));
%let closed = %sysfunc(close(&exists));
%end;
/* Output error if no dataset */
%else %put %sysfunc(sysmsg());
&returnValue
%mend;
Unfortunately, this fails on the first run of the data step, because the data set does not exist yet when the macro call is resolved. After the first run has created the data set (even one with 0 observations), the macro can open the table and read the number of variables.
For instance,
data example;
input x y;
put "NOTE: [DEV] There are %GetVarCount(example) variables in the EXAMPLE data set.";
datalines;
1
2
;
run;
The first run produces:
ERROR: File WORK.EXAMPLE.DATA does not exist.
WARNING: Apparent symbolic reference RETURNVALUE not resolved.
NOTE: [DEV] There are &returnValue variables in the EXAMPLE data set.
The second run produces:
NOTE: [DEV] There are 2 variables in the EXAMPLE data set.
Is there a way to get the number of variables in a data set the first time the data step is run?

In your example, you're trying to determine the number of active variables in a data step - this isn't necessarily the same as the number of variables that will be in the output data set, because (a) there might not be an output data set and (b) some of the variables might get dropped.
With that caveat in mind, if you really want to do that, then this works:
data fred;
length x y z $ 20 f g 8;
array vars_char _character_;
array vars_num _numeric_;
total_vars = dim(vars_char) + dim(vars_num);
put "Vars in data step: " total_vars;
run;
This works by using the special _character_ and _numeric_ name lists to create arrays of all the character and numeric variables defined so far in the program data vector (PDV), and the dim() function to get the sizes of those arrays.
It only counts variables that exist when the arrays are declared, so it doesn't count total_vars in this case.
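Because the arrays only see variables that are already in the PDV when the ARRAY statements are compiled, their position in the step matters. A small sketch, where new_num and new_char are hypothetical extra variables created before the arrays:

```sas
data fred;
  length x y z $ 20 f g 8;
  /* Hypothetical extra variables, created before the ARRAY statements */
  new_num = 1;
  new_char = 'abc';
  /* The arrays pick up every character/numeric variable compiled so far */
  array vars_char _character_;
  array vars_num _numeric_;
  /* total_vars is created after the arrays, so it is not counted */
  total_vars = dim(vars_char) + dim(vars_num);
  put "Vars in data step: " total_vars;
run;
```

Run as written, this should report 7 variables: x, y, z and new_char on the character side, plus f, g and new_num on the numeric side.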
You could wrap this in a macro like:
%macro var_count(var_count_name);
array vars_char _character_;
array vars_num _numeric_;
&var_count_name = dim(vars_char) + dim(vars_num);
%mend;
and then use it like:
data fred;
length x y z $ 20 f g 8;
%var_count(total_vars);
put "Vars in data step: " total_vars;
run;

Try to open a data set that has already been created.
The OPEN function requires the data set it opens to exist already. I think you want OPEN to give you an ID for the data set that is currently being built; that is not the case.
The reason it works only after the first run is that the first run created the data set, along with metadata about the variables it contains.
Use a library to store your data set permanently first, and then have your macro read from it:
data <lib>.dataset;
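A sketch of that two-step workflow. It includes a slightly hardened copy of the question's macro (macro variables declared %LOCAL, and returnValue initialised to empty so a missing data set prints the SYSMSG text instead of an unresolved-reference warning). The libref mylib and its path are hypothetical placeholders for <lib>.

```sas
/* Copy of the question's macro with %LOCAL variables and an empty default */
%macro GetVarCount(dataset);
  %local dsid rc returnValue;
  %let returnValue = ;
  /* OPEN returns a data set ID, or 0 if the data set cannot be opened */
  %let dsid = %sysfunc(open(&dataset));
  %if &dsid %then %do;
    %let returnValue = %sysfunc(attrn(&dsid, nvars));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %put %sysfunc(sysmsg());
&returnValue
%mend GetVarCount;

/* Hypothetical library location: point this at a real folder */
libname mylib "/path/to/folder";

/* Step 1: create and store the data set, so its metadata exists */
data mylib.example;
  input x y;
  datalines;
1 2
;
run;

/* Step 2: later code can read the variable count when it is resolved */
%put NOTE: [DEV] There are %GetVarCount(mylib.example) variables in MYLIB.EXAMPLE.;
```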
Update:
@Reeza already gave you the answer in the comments.
Another alternative:
PUT _ALL_; writes all the variables to the log. If you direct that PUT to a file instead, then read the file back and count the '=' signs, you get the variable count too. Just subtract _N_ and _ERROR_ from the count.
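A rough sketch of the PUT _ALL_ idea, assuming (as the answer does) that the only automatic variables written are _N_ and _ERROR_, and that no character value contains an '=' sign. The fileref pdvout is hypothetical. Note that the count is only available after the step has finished, not inside it.

```sas
/* Temporary file to capture the PUT _ALL_ output (fileref is hypothetical) */
filename pdvout temp;

data example;
  input x y;
  /* Write the PDV once, in named (name=value) format, to the temp file */
  file pdvout;
  if _n_ = 1 then put _all_;
  datalines;
1 2
;
run;

data _null_;
  infile pdvout end=eof;
  input;
  /* Sum statement: eq_count is retained across records */
  eq_count + countc(_infile_, '=');
  if eof then do;
    /* PUT _ALL_ also writes _N_ and _ERROR_, so subtract 2 */
    nvars = eq_count - 2;
    put "NOTE: [DEV] Variables counted: " nvars;
  end;
run;
```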

Related

is there a way to keep and update a "list" in a %let variable?

Still pretty new (and struggling!) to SAS, here's something that I'd really like to be able to do but just can't figure out: keep and update/append to a list throughout a SAS script.
Situation: my job involves creating/sorting datasets based on a large database, for others to use. E.g. we receive a list of selection criteria and requested variables that we use to create a dataset for research purposes. Some of the requested variables will be 'delivery ready' within our database; others we have to compute/create in the requested format. We write SAS scripts that document the entire process, from selection to writing the dataset for delivery. This means that towards the end there is a step where we select, from all the variables in the dataset we work in, only the variables that we want to deliver to our 'clients'. What I would really like to do is to 'build' the list of variables for delivery 'as I go', i.e. add the name of each variable that I created or have verified to be ready for delivery to a list called "varstodeliver", so that at the end I can simply tell it to select all vars in "varstodeliver". Is this possible?
This is how far I've got:
%let varstodeliver = IDvar;
%put &varstodeliver; * prints IDvar;
data _null_;
call symputx("varstodeliver", catx(" ", vname(&varstodeliver.), "var1 var2"));
run;
%put &varstodeliver; * prints IDvar var1 var2 ;
Note that I ended up resorting to 'vname( )' in order to get the actual name stored in &varstodeliver. So far so good, but if I then attempt to add a fourth variable name (or rather, a third addition, since the previous addition was two variable names in one go), it fails, because &varstodeliver now resolves to several names inside vname( ):
data _null_;
call symputx("varstodeliver", catx(" ", vname(&varstodeliver), "var3"));
run;
ERROR: The VNAME function call has too many arguments.
Input or ideas on how else to keep a running tally are very welcome!!
p.s. among the things I've tried is this:
data _null_;
call symputx("varsteleveren", catx(" ", vlist(vname(&varsteleveren.(*))), "var3"));
run;
which returns:
ERROR: Undeclared array referenced: var2.
ERROR: The ARRAYNAME[*] specification requires an array.
ERROR: The VNAME function call has too many arguments.
If you just want to add names of variables to a macro variable and keep them all, you can use %let:
%let varstodeliver = IDvar;
%put &=varstodeliver;
VARSTODELIVER=IDvar
%let varstodeliver= &varstodeliver var1 var2;
%put &=varstodeliver;
VARSTODELIVER=IDvar var1 var2
%let varstodeliver= &varstodeliver var3;
%put &=varstodeliver;
VARSTODELIVER=IDvar var1 var2 var3
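A sketch of how the list might be used at the end, plus the DATA step form of the same append (the question's CALL SYMPUTX attempt works once the macro reference is wrapped in double quotes, so CATX receives the list as a string instead of as variable names). The data set work.analysis and the extra name var4 are hypothetical.

```sas
/* Build the delivery list as you go */
%let varstodeliver = IDvar;
%let varstodeliver = &varstodeliver var1 var2;
%let varstodeliver = &varstodeliver var3;

/* Same append from inside a DATA step: quote the macro reference */
data _null_;
  call symputx("varstodeliver", catx(" ", "&varstodeliver", "var4"));
run;
%put &=varstodeliver;

/* At the end, keep only the listed variables (work.analysis is hypothetical) */
data delivery;
  set work.analysis(keep=&varstodeliver);
run;
```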

Is there a way to skip missing data sets when iterating through names?

I have a few data sets in SAS which I am trying to collate into one larger set that I will filter later. They're all named like table_201802. My problem is that a few months are missing (i.e. table_201802 and table_201804 onward exist, but table_201803 does not).
I'm fairly new to SAS, but what I've tried so far is to create a new data set called output_testing and run a macro loop iterating over the names (they go from 201802 to 201903, and since they're monthly data, suffixes from 813 to 900 won't exist).
%macro append;
data output_testing;
set
%do i=802 %to 812;
LIBRARY.table_201&i
%end;
;
run;
%mend append;
%append;
I want the code to ignore missing tables and just look for ones that do exist and then append them to the new output_testing table.
If the table name prefix is distinct, and you are confident the data structures of the tables are consistent (variable names, types and lengths are the same), then the tables can be stacked using table name prefix lists (:).
For a specific known range of table names you can also use numbered range lists (-). Setting the NODSNFERR system option, as in the example below, lets SAS skip members of the range that don't exist instead of stopping with an error.
data have190101 have190102 have190103;
x =1;
run;
data want_version1_stack; /* any table name that starts with have */
set have:;
run;
data want_version1b_stack; /* 2019 and 2020 */
set have19: have20:;
run;
options nodsnferr;
data want_version2_stack; /* any table names in the iterated numeric range */
set have190101-have191231;
run;
options dsnferr;
From the SAS Help:
Using Data Set Lists with SET
You can use data set lists with the SET statement. Data set lists provide a quick way to reference existing groups of data sets. These data set lists must either be name prefix lists or numbered range lists.
Name prefix lists refer to all data sets that begin with a specified character string. For example, set SALES1:; tells SAS to read all data sets that start with "SALES1" such as SALES1, SALES10, SALES11, and SALES12.
Numbered range lists require you to have a series of data sets with the same name, except for the last character or characters, which are consecutive numbers. In a numbered range list, you can begin with any number and end with any number. For example, these lists refer to the same data sets:
sales1 sales2 sales3 sales4
sales1-sales4
Some macro code with PROC APPEND and the EXIST function should solve the problem:
%let n = 10;
%macro get_list_table;
%local i dsn;
%do i = 1 %to &n;
%let dsn = data&i;
%if %sysfunc(exist(&dsn)) %then %do;
proc append data = &dsn base = appended_data force;
run;
%end;
%end;
%mend;
%get_list_table;
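The same EXIST check can be pointed at the question's naming scheme. A sketch, assuming the libref LIBRARY and names of the form table_YYYYMM from the question; the macro name append_months and its parameters are hypothetical. It steps one month at a time with INTNX, so it never generates impossible suffixes such as 201813.

```sas
/* Hypothetical macro: append LIBRARY.table_YYYYMM for each month in a */
/* range, skipping months whose table does not exist.                  */
%macro append_months(lib=LIBRARY, start=201802, end=201903, out=output_testing);
  %local d enddate dsn;
  /* Start from a clean output table */
  %if %sysfunc(exist(&out)) %then %do;
    proc delete data=&out;
    run;
  %end;
  /* Convert YYYYMM to SAS dates (first day of each month) */
  %let d = %sysfunc(inputn(&start.01, yymmdd8.));
  %let enddate = %sysfunc(inputn(&end.01, yymmdd8.));
  %do %while (&d <= &enddate);
    /* YYMMN6. writes a date as YYYYMM, e.g. 201802 */
    %let dsn = &lib..table_%sysfunc(putn(&d, yymmn6.));
    %if %sysfunc(exist(&dsn)) %then %do;
      proc append base=&out data=&dsn force;
      run;
    %end;
    %else %put NOTE: &dsn does not exist, skipping.;
    /* Move to the first day of the next month */
    %let d = %sysfunc(intnx(month, &d, 1));
  %end;
%mend append_months;

%append_months()
```

As with the generic version above, FORCE lets PROC APPEND proceed when the tables differ slightly, so it is still worth checking that their structures really match.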
You can use shortcuts:
data output_testing;
set LIBRARY.table_201:
;
run;
but in this case the SET statement will read every table whose name starts with "table_201", including unrelated ones.
For example:
LIBRARY.table_201tablesss LIBRARY.table_201ed56

macro variable is uninitialized after %let statement in sas

I want to create something in SAS that works like an Excel lookup function. Basically, I set the values for macro variables var1, var2, ... and I want to find their index number according to the ref table. But I get the following messages in the data step.
NOTE: Variable A is uninitialized.
NOTE: Variable B is uninitialized.
NOTE: Variable NULL is uninitialized.
When I print the variables &num1,&num2, I get nothing. Here is my code.
data ref;
input index varname $;
datalines;
0 NULL
1 A
2 B
3 C
;
run;
%let var1=A;
%let var2=B;
%let var3=NULL;
data temp;
set ref;
if varname=&var1 then call symput('num1',trim(left(index)));
if varname=&var2 then call symput('num2',trim(left(index)));
if varname=&var3 then call symput('num3',trim(left(index)));
run;
%put &num1;
%put &num2;
%put &num3;
I get the correct values for &num1, &num2, ... if I type varname='A' in the IF-THEN statement. And if I subsequently change the statement back to varname=&var1, I still get the required output. Why is that? I don't want to type the actual string value and then change it back to the macro variable every time to get the result.
Solution to immediate problem
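You need to wrap your macro variables in double quotes if you want SAS to treat them as string constants. Otherwise, once &var1 resolves, the data step sees varname=A and treats A as a variable name, which is why you get the "Variable A is uninitialized" notes. It only appears to work after you switch back from varname='A' because &num1 and the others still hold the values created by that earlier run.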
Alternatively, you could re-define the macro vars to include the quotes.
As a further option, you could use the symget or resolve functions, but these are not usually needed unless you want to create a macro variable and use it again within the same data step. If you use them as a replacement for double quotes they tend to use a lot more CPU as they will evaluate the macro vars once per row by default - normally, macro vars are evaluated just once, at compile time, before your code executes.
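A sketch of the first two fixes applied to the question's code. It swaps CALL SYMPUT(..., trim(left(index))) for CALL SYMPUTX, which strips blanks and converts the numeric index without a conversion note; var1q and num1q are hypothetical names used for the second option.

```sas
data ref;
  input index varname $;
  datalines;
0 NULL
1 A
2 B
3 C
;
run;

%let var1=A;
%let var2=B;
%let var3=NULL;

/* Double quotes: &var1 resolves inside a string constant, so SAS */
/* compares varname with the text A, not with a variable named A. */
data _null_;
  set ref;
  if varname = "&var1" then call symputx('num1', index);
  if varname = "&var2" then call symputx('num2', index);
  if varname = "&var3" then call symputx('num3', index);
run;
%put &=num1 &=num2 &=num3;

/* Alternative: store the quotes in the macro variable itself */
%let var1q = "A";
data _null_;
  set ref;
  if varname = &var1q then call symputx('num1q', index);
run;
%put &=num1q;
```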
A better approach?
For the sort of lookup you're doing, you actually don't need to use a dataset at all - you can instead define a custom format, which gives you much more flexibility in how you can use it. E.g. this creates a format called lookup:
proc format;
value lookup
1 = 'A'
2 = 'B'
3 = 'C'
other = '#N/A' /*Since this is what vlookup would do :) */
;
run;
Then you can use the format like so:
%let testvar = 1;
%let testvar_lookup = %sysfunc(putn(&testvar, lookup.));
Or in a data step:
data _null_;
var1 = 1;
format var1 lookup.;
put var1=;
run;
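The format above maps an index to a name. The question's lookup runs the other way (name to index); one way to cover that direction is a numeric informat built with INVALUE. A sketch, where the informat name lookupidx is hypothetical:

```sas
/* Hypothetical informat: reads the text in VARNAME and returns its index */
proc format;
  invalue lookupidx
    'NULL' = 0
    'A'    = 1
    'B'    = 2
    'C'    = 3
    other  = .;
run;

/* Macro-level lookup, no data set needed */
%let var1 = A;
%let num1 = %sysfunc(inputn(&var1, lookupidx.));
%put &=num1;

/* Data step lookup */
data _null_;
  varname = 'B';
  index = input(varname, lookupidx.);
  put index=;
run;
```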

SAS creating and populating dataset variables from macro variables

I have a group of data sets where certain variables have been defined as having lengths >2000 characters. What I want to do is create a macro that identifies these variables and then creates a set of new variables to hold the values.
doing this in base code would be something like:
data new_dset;
set old_dset;
length colnam1 colnam2 colnam3 $2000;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);
run;
I can build up the list of variable names and lengths as a set of macro variables, but I don't know how to create the new variables from the macro variables.
What I was thinking it would look like is:
%macro split;
data new_dset;
set old_dset;
%do i = 1 %to &num_cols;
if &&collen&i > 2000 then do;
&&colnam&i 1 = substr(&&colnam&i,1,2000);
end;
%end;
run;
%mend;
I know that doesn't work, but that's the idea I have.
If anyone can help me work out how I can do this I would be very grateful.
Thanks
Bryan
Your macro doesn't need to be an entire data step. In this case it's helpful to see exactly what you're replicating and then write a macro based on that.
So your code is:
data new_dset;
set old_dset;
length colnam1 colnam2 colnam3 $2000;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);
run;
Your macro then really needs to be:
length colnam1 colnam2 colnam3 $2000;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);
So what you can do is put that in a macro:
%macro split(colname=);
length &colname._1 &colname._2 $2000;
&colname._1 = substr(&colname.,1,2000);
&colname._2 = substr(&colname.,2001,2000);
%mend;
Then you generate a list of calls:
proc sql noprint;
select cats('%split(colname=',name,')') into :calllist separated by ' '
from dictionary.columns
where libname = 'WORK' and memname='MYDATASET'
and length > 2000;
quit;
Then you run them:
data new_dset;
set old_dset;
&calllist;
run;
Now you're done :) &calllist contains a list of %split(colname=...) calls. If you might need more than two variables (i.e., lengths over 4000), you may want to add a 'length' parameter; or, if you're on 9.2 or newer, you can use SUBPAD instead of SUBSTR and generate all three variables for each original variable.
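A sketch of the 'length' parameter idea: the macro computes how many 2000-character pieces a column needs and generates that many variables. The macro name split_n and its collen parameter are hypothetical, old_dset is assumed to live in WORK, and SUBSTRN is used instead of SUBSTR because it returns a shorter or empty result, rather than an invalid-argument note, when a piece runs past the end of the column.

```sas
/* Hypothetical macro: split &colname into ceil(&collen / 2000) pieces */
/* named &colname._1, &colname._2, ..., each $2000.                    */
%macro split_n(colname=, collen=);
  %local k npieces;
  %let npieces = %sysevalf(&collen / 2000, ceil);
  length
  %do k = 1 %to &npieces;
    &colname._&k
  %end;
  $2000;
  %do k = 1 %to &npieces;
    &colname._&k = substrn(&colname., %eval((&k - 1) * 2000 + 1), 2000);
  %end;
%mend split_n;

/* Start empty so &calllist resolves even if no column qualifies */
%let calllist = ;
proc sql noprint;
  select cats('%split_n(colname=', name, ', collen=', length, ')')
    into :calllist separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'OLD_DSET'
      and type = 'char' and length > 2000;
quit;

data new_dset;
  set old_dset;
  &calllist
run;
```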

Categorical variables with macro

I am trying to create categorical variables in sas. I have written the following macro, but I get an error: "Invalid symbolic variable name xxx" when I try to run. I am not sure this is even the correct way to accomplish my goal.
Here is my code:
%macro addvars;
proc sql noprint;
select distinct coverageid
into :coverageid1 - :coverageid9999999
from save.test;
%do i=1 %to &sqlobs;
%let n=coverageid&i;
%let v=%superq(&n);
%let f=coverageid_&v;
%put &f;
data save.test;
set save.test;
%if coverageid eq %superq(&v)
%then &f=1;
%else &f=0;
run;
%end;
%mend addvars;
%addvars;
You're combining macro code with data step code in a way that isn't correct. %if is macro language, so you are actually evaluating whether the text "coverageid" is equal to the text that %superq(&v) evaluates to, not whether the contents of the coverageid variable equal the value in &v. You could just convert %if to if, but even if you got that to work properly it would be hideously inefficient: you're rewriting the dataset N times, so if you have 1500 values for coverageID you rewrite the entire 500MB dataset (or whatever size it is) 1500 times, instead of just once.
If what you want to do is take the variable 'coverageid' and convert it to a set of 1/0 binary variables, one for each possible value of coverageid, there are a number of ways to do it. I'm fairly sure the ETS module has a procedure that just does this, but I don't recall it off the top of my head; if you were to post this to the SAS mailing list, one of the people there would undoubtedly have it quickly.
The simplest way, for me, is to do this entirely with data step code. First determine how many potential values there are for COVERAGEID, then map each one to a consecutive number, then use that number to set the correct variable.
If the COVERAGEID values are consecutive (i.e., 1 to some number with no skips, or you don't mind skipping) then this is easy: set up an array and iterate over it. I will assume they are NOT consecutive.
*First, get the distinct values of coverageID. There are a dozen ways to do this, this works as well as any;
proc freq data=save.test;
tables coverageid/out=coverage_values(keep=coverageid);
run;
*Then save them into a format. This converts each value to a consecutive number (so the lowest value becomes 1, the next lowest 2, etc.) This is not only useful for this step, but it can be useful in the future in converting back.;
data coverage_values_fmt;
set coverage_values;
start=coverageid;
label=_n_;
fmtname='COVERAGEF';
type='i';
call symputx('CoverageCount',_n_);
run;
*Import the created format;
proc format cntlin=coverage_values_fmt;
quit;
*Now use the created format. If you had already-consecutive values, you could skip to this step and skip the input statement - just use the value itself;
data save.test_fin;
set save.test;
array coverageids coverageid1-coverageid&coveragecount.;
do _t = 1 to &coveragecount.;
if input(coverageid,COVERAGEF.) = _t then coverageids[_t]=1;
else coverageids[_t]=0;
end;
drop _t;
run;
Here's another way that doesn't use formats, and may be easier to follow.
First, just make some test data:
data test;
input coverageid @@;
cards;
3 27 99 105
;
run;
Next, create a data set with no observations but one variable for each level of coverageid. Note that this approach allows arbitrary values here.
proc transpose data=test out=wide(drop=_name_);
id coverageid;
run;
Finally, create a new data set that combines the initial data set and the wide one. Then, for each observation, look at each indicator variable and decide whether to turn it "on".
data want;
set test wide;
array vars{*} _:;
do i=1 to dim(vars);
vars{i} = (coverageid = substr(vname(vars{i}),2));
end;
drop i;
run;
The line
vars{i} = (coverageid = substr(vname(vars{i}),2));
may require more explanation. vname returns the name of the variable, and since we didn't specify a prefix in proc transpose, the numeric coverageid values produce variable names like _3, _27, _99 and _105. So we take the substring of the variable name that starts in the second position and compare it to coverageid; if they're the same, the comparison evaluates to 1 and the variable is set to 1; otherwise it evaluates to 0.