'Unordered' by group processing in SAS

Is there a way to use by group processing in SAS when the data is grouped together but is out of order?
data sample;
input x;
datalines;
3
3
1
1
2
2
;
run;
Try to print out the first of each group:
data _null_;
set sample;
by x;
if first.x then do;
put _all_;
end;
run;
Results in the below error:
x=3 FIRST.x=1 LAST.x=0 _ERROR_=0 _N_=1
ERROR: BY variables are not properly sorted on data set WORK.SAMPLE.
x=3 FIRST.x=1 LAST.x=0 _ERROR_=1 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 3 observations read from the data set WORK.SAMPLE.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
And just to reiterate: I do not want to sort the grouped data first. I need to process it in this order. I know I could create a proxy variable to sort on using an intermediate DATA step and either a RETAIN statement or the LAG() function, but I'm really looking for a solution that avoids this step. Also, I'd like to use the FIRST. and LAST. variables in my by-group processing.

Use the NOTSORTED option on your BY statement:
data _null_;
set sample;
by x NOTSORTED;
if first.x then do;
put _all_;
end;
run;
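Since the question also asks about LAST., here is a minimal sketch showing both flags with NOTSORTED. It assumes the sample data set from the question already exists; the message text is only illustrative. With NOTSORTED, a group is any run of consecutive observations with the same BY value, so the step should report a start and an end for groups 3, 1, and 2, in that order.

```sas
data _null_;
set sample;
by x notsorted;
/* FIRST.x and LAST.x are set per run of consecutive equal values */
if first.x then put 'Start of group: ' x=;
if last.x then put 'End of group: ' x=;
run;
```

Note that if the same value appears again later in the data (for example 3, 3, 1, 1, 3), NOTSORTED treats the later run as a separate group.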

Related

How to get last two observations from a dataset in SAS without using sort or First. and Last. or Do loop

The code below gets the last two observations from the data set without using loops, FIRST./LAST. processing, or sorting.
data a;
set sashelp.cars nobs=_nobs_;/*create the temporary variable to store total no of obs*/
if _N_ ge _nobs_-1;/*Now compare the automatic variable _N_ to _nobs_*/
run;
Not sure there is a question here.
You can also use the POINT= option on the SET statement. You have to explicitly end the data step since most data steps end when they read past the end of the input data and this step cannot do that.
data want;
do p=max(1,nobs-1) to nobs;
set have point=p nobs=nobs;
output;
end;
stop;
run;
A DATA step is an implicit loop with an automatic index variable _N_. You can leverage that fact to implicitly output rows without an explicit DO loop. As per @Tom, the POINT= option is used so the entire data set does not need to be read to reach the last two rows.
Example:
data want;
if _N_ > min(2,_Z_) then stop;
_P_ = _Z_ - min(2,_Z_) + _N_;
set sashelp.class point=_P_ nobs=_Z_;
run;

What is the difference between one SET statement in a DO loop and multiple SET statements?

I am learning how to use a double SET statement and ran into trouble with the following code:
data test1;
do i = 1 to 2;
set sashelp.class;
end;
run;
data test2;
set sashelp.class;
set sashelp.class;
run;
Test1 has 9 observations (all of the even rows) and Test2 has 19 observations. Can somebody explain this for me?
The SAS output statement writes out observations to your output data set. When no explicit output statement is used (as in your data steps) an implicit output at the end of the data step outputs the current observation to the output data set.
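In your first data step the do loop causes the set statement to be executed twice, the first time reading obs #1, the second time reading obs #2. The loop finishes and the next statement is run, so the implicit output outputs the current observation which is #2. The next iteration of the data step causes the do loop to read obs #3 and then #4, so the last obs (#4) is output, and so on until the end of the data set. On the tenth iteration the loop reads obs #19 and then tries to read past the end of the data set, which ends the data step before the implicit output runs. That is why obs #19 is never written and you get only 9 observations.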
The second data step executes the first set statement reading in obs #1, then it executes the second set statement, reading obs #1 from that input data set, overwriting the current observation. The implicit output causes this obs to be written out. The data step reiterates causing the same to happen to obs #2, and so on until all 19 obs are read and output.
Inserting some diagnostics can help you understand what is happening, e.g., submit the following and check the log:
data test1;
do i = 1 to 2;
set sashelp.class;
putlog 'In loop: ' i= name=;
end;
putlog 'About to output: ' name=;
run;
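The same kind of diagnostic can be applied to the second step. This is a sketch in the same spirit, with illustrative message text. Each SET statement keeps its own position in its input, so both PUTLOG lines should show the same name on every iteration.

```sas
data test2;
set sashelp.class;
putlog 'After first SET: ' _n_= name=;
/* the second SET reads the same row number from its own copy of the input */
set sashelp.class;
putlog 'After second SET: ' _n_= name=;
run;
```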

SAS: proc reg and macro

I have a data set that contains 30 variables and 2000 observations.
I want to run a regression in a loop, where in each step I delete the i-th row of the data.
So in the end I need my output to be 2001 regressions: one on all the data and 2000 where one row is dropped each time.
I am new to SAS, and I tried to find out how to do it with a macro, but I didn't understand.
Any comments and help will be appreciated!
This will create the data set I was talking about in my comment to Chris.
data del1V /view=del1v;
length group _obs_ 8;
set sashelp.class nobs=nobs;
_obs_ = _n_;
group=0;
output;
do group=1 to nobs;
if group eq _n_ then;
else output;
end;
run;
proc sort data=del1v out=analysis;
by group;
run;
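data new;
set old; /* old is the original 2000-row data set */
do group = 1 to 2001;
/* leave row i out of group i; group 2001 keeps every row */
if _n_ ne group then output;
end;
run;
proc sort data=new;
by group;
run;
proc reg data=new;
by group;
/* MODEL statement goes here */
run;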
This will create a data set that is much longer. You will only call proc reg once, but it will run 2001 models.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. SAS can create a data set with the GROUP column to differentiate the results.
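As one possible way to collect the results in a data set, here is a sketch that uses the OUTEST= option of PROC REG. It assumes the analysis data set built from sashelp.class by the view-based code above; the model (weight on height) and the output name estimates are placeholders chosen only for illustration.

```sas
proc reg data=analysis outest=estimates noprint;
by group;
/* placeholder model; substitute your own response and predictors */
model weight = height;
run;
quit;

/* one row of parameter estimates per BY group; group 0 is the full data */
proc print data=estimates;
var group intercept height _rmse_;
run;
```

NOPRINT suppresses the printed output for each group, so only the estimates data set is left to examine.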
I edited my original answer per @data null's suggestion. I agree that the above is probably faster, though I'm not as confident that it would be 100x faster. I do not know enough about the cost of the PROC REG overhead versus the cost of the BY statement and a larger data set. Regardless, the answer above is simpler programming. Here is my original answer/alternate approach.
You can do this within a macro program. It will have this general structure:
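%macro regress;
%do i=1 %to 2001;
data new;
set old;
/* when i=2001 nothing is deleted, so the last model uses all rows */
if _n_ = &i then delete;
run;
proc reg data=new;
/* MODEL statement goes here */
run;
quit;
%end;
%mend;
%regress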
Macros are an advanced programming function in SAS. The macro program is required in order to do a loop of proc reg. The %'s are indicative of macro functions. &i is a macro variable (& is the prefix of a macro variable that is being called). The macro is created in a block that starts and ends with %macro / %mend, and called by %regress.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. Use &i to create a different data set each time and then append together as part of the macro loop.

How to suppress infinite loop note when using "IF 0 THEN SET"

To 'copy' the PDV structure of a data set, it has been advised to "reference a data set at compile time" using
if 0 then set <data-set>
For example,
data toBeCopied;
length var1 $ 4. var2 $ 4. ;
input var1 $ var2 $;
datalines;
this is
just some
fake data
;
run;
data copyPDV;
if 0 then set toBeCopied;
do var1 = 'cutoff' ;
do var2 = 'words';
output;
end;
end;
run;
When you run this, however, the following NOTE appears in the log:
NOTE: DATA STEP stopped due to looping.
This is because the data step never reaches the EOF marker and gets stuck in an infinite loop, as explained in Data Set Looping. (It turns out the DATA step recognizes this and terminates the loop, hence the NOTE in the log).
It seems like usage of the if 0 then set <data-set> statement is a longstanding practice, dating as far back as 1987. Although it seems hacky to me, I can't think of another way to produce the same result (i.e. copying PDV structure), aside from manually restating the attribute requirements. It also strikes me as poor form to allow ERRORs, WARNINGs, and NOTEs which imply unintended program behavior to remain in the log.
Is there a way to suppress this note, or an altogether better method to achieve the same result (i.e. copying the PDV structure of a data set)?
If you include a stop; statement, as in
if 0 then do;
set toBeCopied;
stop;
end;
the NOTE still persists.
Trying to restrict the SET to a single observation also seems to have no effect:
if 0 then set toBeCopied (obs=1);
Normally SAS will terminate a data step at the point when you read past the end of the input, whether that is raw data or SAS datasets. For example, this data step will stop when it executes the SET statement for the 6th time and finds there are no more observations to read.
data want;
put _n_=;
set sashelp.class(obs=5);
run;
To prevent loops SAS checks to see if you read any observations in this iteration of the data step. It is smart enough not to warn you if you are not reading from any datasets. So this program does not get a warning.
data want ;
do age=10 to 15;
output;
end;
run;
But by adding that SET statement you triggered the checking. You can prevent the warning by having a dataset that you are actually reading so that it stops when it reads past the end of the actual input data.
data want;
if 0 then set sashelp.class ;
set my_class;
run;
Or a file you are reading.
data want ;
if 0 then set sashelp.class ;
infile 'my_class.csv' dsd firstobs=2 truncover ;
input (_all_) (:) ;
run;
Otherwise add a STOP statement to manually end the data step.
data want ;
if 0 then set sashelp.class;
do age=10 to 15;
do sex='M','F';
output;
end;
end;
stop;
run;
The stop needs to not be in the if 0 branch. That branch is never executed. stop needs to be executed, and executed at the spot where you want the execution to stop.
data copyPDV;
if 0 then set toBeCopied;
do var1 = 'cutoff' ;
do var2 = 'words';
output;
end;
end;
stop;
run;
This does not suppress the note in your situation, but it is another method of getting the structure of a data set:
data class;
set sashelp.class(obs=0);
run;
or
proc sql;
create table class1 like sashelp.class;
quit;

Controlling export (ODS or PROC) based on Number of Observations

I currently have a SAS process that generates multiple data sets (whether they have observations or not). I want to determine a way to control the export procedure based on the total number of observations (if nobs > 0, then export). My first attempt was something primitive using if/then logic comparing a select into macro var (counting obs in a data set) -
DATA _NULL_;
SET A_EXISTS_ON_B;
IF &A_E > 0 THEN DO;
FILE "C:\Users\ME\Desktop\WORKLIST_T &PDAY..xls";
PUT TASK;
END;
RUN;
The issue here is that I don't have a way to write multiple data sets to the same workbook with multiple sheets (or do I?)
In addition, whenever I try to add another "Do" block with similar logic, the execution fails. If this cannot be done with a DATA _NULL_ step, would ODS be the answer?
The core of what you want to do, conditionally execute code, can be done one of a number of ways.
Let's imagine we have a short macro that exports a dataset to excel. Simple as pie.
%macro export_to_excel(data=,file=,sheet=);
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%mend export_to_excel;
Now let's say we want to do this conditionally. How we do it depends, to some degree, on how we call this macro in our code now.
Let's say you have:
%let wherecondition=1; *always true!;
data class;
set sashelp.class;
if &wherecondition. then output;
run;
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1);
Now you want to make this so it only exports if class has some rows in it, right? So you get the # of obs in class:
proc sql;
select count(1) into :classobs from class;
quit;
And now you need to incorporate that somehow. In this case, the easiest way is to add a condition to the export macro. In SAS releases before 9.4M5, open code doesn't allow conditional execution of code, so it needs to be in a macro.
So we do:
%macro export_to_excel(data=,file=,sheet=,condition=1);
%if &condition. %then %do;
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%end;
%mend export_to_excel;
And you add the count to the call:
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1,condition=&classobs.)
Tada, now it won't try to export when it's 0. Great.
If this code is already in a macro, you don't have to alter the export macro itself. You can simply put that %if %then part around the macro call. But that's only if the whole thing is already a macro: in releases before SAS 9.4M5, %if isn't allowed outside of macros (sorry).
Now, if you're exporting a whole bunch of datasets, and you're generating your export calls from something, you can add the condition there, more easily and more smoothly than this.
Basically, either make by hand (if that makes sense), or use proc sql or proc contents or (other method of your choice) to make a dataset that contains one row per dataset-to-export, with four variables: dataset name, file to export, sheet to export (unless that's the same as the dataset name), and count of observations for that dataset. Often the first three would be made by hand, and then merged/updated via sql or something else to the count of obs per dataset.
Then you can generate calls to export, like so:
proc sql;
select cats('%export_to_excel(data=',dataname,',file=',filename,',sheet=',sheetname,')')
into :explist separated by ' '
from datasetwithnames
where obsnum>0;
quit;
&explist.; *this actually executes them;
Assuming obsnum is the new variable you created with the # of obs, and the other variables are obviously named. That won't pull a line with anything with 0 observations - so it never tries to execute the export. That works with the initial export macro just as well as with the modified one.
Suggest you google around for different approaches to writing XLS files.
Regarding using a DATA step or PROC step, the DATA step is tolerant of datasets that have 0 obs. If the SET statement reads a dataset that has 0 obs, it will simply end the step. So you don't need special logic. Most PROCs also accommodate a 0-obs dataset without throwing a warning or error.
For example:
1218 *Make a 0 obs dataset;
1219 data empty;
1220 x=1;
1221 stop;
1222 run;
NOTE: The data set WORK.EMPTY has 0 observations and 1 variables.
1223
1224 data want;
1225 put "I run before SET statement.";
1226 set empty;
1227 put "I do not run after SET statement.";
1228 run;
I run before SET statement.
NOTE: There were 0 observations read from the data set WORK.EMPTY.
NOTE: The data set WORK.WANT has 0 observations and 1 variables.
1229
1230 proc print data=empty;
1231 run;
NOTE: No observations in data set WORK.EMPTY.
But note, as Joe points out, PROC EXPORT will happily export a dataset with 0 obs and write a file with 0 records, overwriting the file if it was there already, e.g.:
1582 proc export data=sashelp.class outfile="d:\junk\class.xls";
1583 run;
NOTE: File "d:\junk\class.xls" will be created if the export process succeeds.
NOTE: "CLASS" range/sheet was successfully created.
1584
1585 data class;
1586 stop;
1587 set sashelp.class;
1588 run;
NOTE: The data set WORK.CLASS has 0 observations and 5 variables.
1589
1590 *This will replace class.xls";
1591 proc export data=class outfile="d:\junk\class.xls" replace;
1592 run;
NOTE: "CLASS" range/sheet was successfully created.
ODS statements would likely do the same.
I use a macro to check if a dataset is empty. SO answers like:
How to detect how many observations in a dataset (or if it is empty), in SAS?
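A minimal sketch of such a macro, assuming the OPEN, ATTRN, and CLOSE functions called through %SYSFUNC. The macro name nobs is hypothetical, and treating a data set that cannot be opened as having 0 observations is a choice made here only for illustration.

```sas
%macro nobs(data);
%local dsid n rc;
%let n = 0;
%let dsid = %sysfunc(open(&data));
%if &dsid %then %do;
/* NLOBS counts observations that are not marked for deletion */
%let n = %sysfunc(attrn(&dsid,nlobs));
%let rc = %sysfunc(close(&dsid));
%end;
&n
%mend nobs;

%put NOTE: sashelp.class has %nobs(sashelp.class) observations.;
```

The result could then be passed as the condition= argument of the modified export macro shown earlier, for example condition=%nobs(class). ATTRN can return -1 when the engine cannot determine the count (for instance, for some views), so a stricter check may be needed in that case.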