Convert the number of observations in a dataset into a macro variable - sas

I am trying to determine the number of observations in a dataset, then convert this number into a macro variable that i can use as part of a loop. I've searched the web for answers and not had much luck. I would post some example code I've tried but I have literally no idea how to approach this.
Could anybody assist?
Thanks
Chris

SAS stores dataset information, such as number of observations, separately, so the key is to access this information without having to read in the entire dataset.
The following code will do just that, the if 0 part is never true so the dataset isn't read, however the information is.
data _null_;
if 0 then set sashelp.class nobs=n;
call symput('numobs',n);
stop;
run;
%put n=&numobs;

You can also get it from dictionary.tables like this:
proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname='YourLibrary' and memname='YourDatasetName';
quit;

Here it is:
Create macro variable:
data _null_;
set sashelp.class;
call symput("nbobs",_N_);
run;
See result:
%put &nbobs;
Use it:
data test;
do i = 1 to &nbobs;
put i;
end;
run;

Related

How to skip code if created dataset has zero rows

I have a job which at first imports some xlsx files, then connects to multiple DB tables. Based on conditions, the job selects rows to output, and creates an excel file to send on to the final end-user.
Sometimes, that job returns zero rows, which is acceptable; in that case, I would prefer to create an empty excel file with only the variables, but not run the other code (checking/cleaning code).
How can I conditionally execute code only when there are results?
Something like this:
I get 0 rows
If Result = 0 then Go to *"here"*
Else *"just run the code further"*
You have a few useful things that can help you here.
First off, PROC SQL sets a macro variable SQLOBS, which is particularly useful in identifying how many records were returned from the last SQL query it ran.
proc sql;
select * from sashelp.class;
quit;
%put I returned &SQLOBS rows;
You might use this to drive further processing, either with %IF blocks as Tom notes in comments or other methods I will cover below.
You can also check how many rows are in a dataset explicitly, if you prefer a slightly more robust option.
proc sql;
select count(*) into :class_count from sashelp.class;
quit;
%put I returned &class_count rows;
For very large datasets, there are faster options (using the dataset descriptors, dictionary tables, or a few other options), but for most tables this is fine.
Either way, what I would typically do with a program I intended to run in production would be then to drive the rest of the program from macros.
%macro whatIWantToDo(params);
...
do stuff
...
%mend whatIWantToDo;
proc sql;
mySqlStuff;
quit;
%if &sqlobs. gt 0 %then %do;
%whatIWantToDo(params);
%end;
%else %do;
%put Nothing to do;
%end;
Another option is to use call execute; this is appropriate if your data drives the macro parameters. The big advantage of call execute is that it only runs if you have data rows - if you have zero, it won't do anything!
Say you have some datasets to run code on. You could have up to twelve - one per month - but only have them for the current calendar year, so in Jan you have one, Feb you have two, etc. You could do this:
data mydata_jan mydata_feb mydata_mar;
set sashelp.class;
run;
%macro printit(data=);
title "Printing &data.";
proc print data=&data;
run;
title;
%mend printit;
data _null_;
set sashelp.vtable;
where upcase(memname) like 'MYDATA_%' and nobs gt 0;
callstr = cats('%printit(data=',memname,')');
call execute(Callstr);
run;
First I make the datasets, with a name I can programmatically identify. Then I make the macro that I want to run on each (this could be checking, cleaning, whatever). Then I use sashelp.vtable which shows which tables are created, and check the nobs variable (number of observations) is more than zero. Then I use call execute to run the macro on that dataset!

Is there any way to apply conditional logic for keeping/dropping columns in SAS?

I'm trying to drop particular columns in SAS based on a macro variable, my hands are slightly tied in terms of what code I can use - so I need a solution in BASE SAS.
I've already tried wrapping the drop/keep in an if, but I know the drop happens at run time so this won't work.
Example:
data dropsomecolumns;
if &somemacro =1 then do;
drop somecol1 somecol2;
end;
run;
You either need to use macro code to conditionally generate the code you want.
data dropsomecolumns;
set have;
%if &somemacro =1 %then %do;
drop somecol1 somecol2;
%end;
run;
Or change so that the macro variable has the list of columns to drop.
%let drop_columns=somecol1 somecol2;
data dropsomecolumns(drop=&drop_columns);
set have;
run;
Note that the drop statement will give a warning if no variables are listed, but the drop= dataset option will not give that warning.
You have have no variables listed and it will work fine and assumes not are to be dropped. So if the macro variable is a list of variables just use:
drop &drop_columns;
This works fine:
data demo;
set sashelp.class;
drop ;
run;
So no conditional logic needed.

SAS: proc reg and macro

i have a data that contain 30 variable and 2000 Observations.
I want to calculate regression in a loop, whan in each step I delete the i row in the data.
so in the end I need thet my output will be 2001 regrsion, one for the regrsion on all the data end 2000 on each time thet I drop a row.
I am new to sas, and I tray to find how to do it withe macro, but I didn't understand.
Any comments and help will be appreciated!
This will create the data set I was talking about in my comment to Chris.
data del1V /view=del1v;
length group _obs_ 8;
set sashelp.class nobs=nobs;
_obs_ = _n_;
group=0;
output;
do group=1 to nobs;
if group eq _n_ then;
else output;
end;
run;
proc sort out=analysis;
by group;
run;
DATA NEW;
DATA OLD;
do i = 1 to 2001;
IF _N_ ^= i THEN group=i;
else group=.;
output;
end;
proc sort data=new;
by group;
proc reg syntax;
by group;
run;
This will create a data set that is much longer. You will only call proc reg once, but it will run 2001 models.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. SAS can create a data set with the GROUP column to differentiate the results.
I edited my original answer per #data null suggestion. I agree that the above is probably faster, though I'm not as confident that it would be 100x faster. I do not know enough about the costs of the overhead of proc reg versus the cost of the group by statement and a larger data set. Regardless the answer above is simpler programming. Here is my original answer/alternate approach.
You can do this within a macro program. It will have this general structure:
%macro regress;
%do i=1 %to 2001;
DATA NEW;
DATA OLD;
IF _N_=&I THEN DELETE;
RUN;
proc reg syntax;
run;
%end;
%mend;
%regress
Macros are an advanced programming function in SAS. The macro program is required in order to do a loop of proc reg. The %'s are indicative of macro functions. &i is a macro variable (& is the prefix of a macro variable that is being called). The macro is created in a block that starts and ends with %macro / %mend, and called by %regress.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. Use &i to create a different data set each time and then append together as part of the macro loop.

Single and Double Hash in Proc sql

I have 12 columns and I want to add them through sql. I have tried:
proc sql;
select*,sum(a1-a12) as total
from tablename;
quit;
However this isn't working. Is there an alternative or can we use single and double hash only in Data steps.
If you want to add values in the same observation then you need to use SAS function sum(,...) and not the SQL aggregate function sum(). You current code looks like the later since it only has one value listed, the difference between variables A1 and A12. This is because PROC SQL does not recognize variable lists. You will need to list all of your variables.
select *,sum(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12) as total
from have
;
If you want this in SQL because you're making use of other SQL functionality in addition to this, make a view.
data have_v/view=have_v;
set have;
total = sum(of a1-a12);
run;
proc sql;
select * from have_v; *presumably you do other things here;
quit;
In some cases you do not know how many variables there are or you don't want to hard code it. The syntax is in this case: sum(of < variable>:);
data test;
a1=1;
a2=2;
/*number 3 is missing*/
a4=4;
a5=5;
run;
data test2;
set test;
sum_of_all_As= sum(of a:);
run;
For more tips and tricks see: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm

Using a series of values for a SAS macro parameter

I'm looking for a way to use a series of values for a macro parameter instead of a single value. I'm basically manipulating a series of files for consecutive months (May 2014 to Sept 2015) and I've written a macro to take advantage of the naming conventions. However, I'm still manually writing out the months to use the macro. I'm doing this many times over with lots of different files from this month. Is there a way to have the parameter reference a list of values and go through them like an array/do-loop? I've looked into %ARRAY as a possibility but that doesn't seem to do what I'm looking for unless I'm not seeing it's full capability. I've attached a sample of this code below.
%MACRO memmonth(monyr=);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr;
run;
data work.mem_&monyr;
set work.mem_&monyr;
count_&monyr=count;
run;
%MEND memmonth;
%memmonth(monyr=may14)
%memmonth(monyr=jun14)
%memmonth(monyr=jul14)
%memmonth(monyr=aug14)
%memmonth(monyr=sep14)
%memmonth(monyr=oct14)
%memmonth(monyr=nov14)
%memmonth(monyr=dec14)
%memmonth(monyr=jan15)
%memmonth(monyr=feb15)
%memmonth(monyr=mar15)
%memmonth(monyr=apr15)
%memmonth(monyr=may15)
%memmonth(monyr=jun15)
%memmonth(monyr=jul15)
%memmonth(monyr=aug15)
%memmonth(monyr=sep15)
In general I would recommend passing the list of values as a space delimited list and adding looping logic in the macro. If spaces are valid characters in the values then use some other delimiter. Do not use comma as the delimiter as it means you will need to use macro quoting to call the macro.
So your basic macro is this.
%macro memmonth(monyr);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%mend memmonth;
%memmonth(may14)
%memmonth(jun14)
You could change it to this.
%macro memmonth(monyrlist);
%local i monyr;
%do i=1 %to %sysfunc(countw(&monyrlist));
%let monyr=%scan(&monyrlist,&i);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%end;
%mend memmonth;
%memmonth(may14 jun14)
If you always want to process all of the months in an interval then you could just pass in the start and end month of the interval.
%macro memmonth(start,end);
%local i monyr;
%do i=0 %to %sysfunc(intck(month,"01&start"d,"01&end"d));
%let monyr=%sysfunc(intnx(month,"01&start"d,&i),monyy5.);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%end;
%mend memmonth;
%memmonth(may14,sep15)
If you have a source list, whether it is a text file or a dataset, you can use a simple data step to generate macro calls. So if you have an input dataset with the variable MONYR then your driver program would look like this:
data _null_;
set mylist ;
call execute(cats('%nrstr(memmonth)(',MONYR,')'));
run;
If the source is a file with the names then replace the SET statement with the appropriate INFILE and INPUT statements. If the source is a directory name then look at one of the many SAS ways to read the names of files in a directory into a dataset and use that to drive the macro call generation.