I am a beginner in SAS and want to know why the program below gives me only the first observation from sashelp.class, while the log says the DATA step stopped because of looping.
Can someone please explain what is happening in the background?
data test;
if age<14;
set sashelp.class;
run;
proc print;
run;
Excellent question! It's important to learn how the DATA step works, and part of that is to know when it stops.
The typical way a DATA step stops is that the SET statement tries to read the next record in a dataset and hits the end of the file.
Another way a step will stop is if it has a SET statement in it, and it goes one full iteration of the DATA step loop without a SET statement executing. When it stops for this reason, you get the "stopped due to looping" message. It's basically protection against an infinite loop.
Look at your code, with some PUT statements added:
data test;
put "top of loop " _n_= age=;
if age<14;
set sashelp.class;
put "bottom of loop " _n_= age=;
run;
top of loop _N_=1 age=.
bottom of loop _N_=1 age=14
top of loop _N_=2 age=14
NOTE: DATA STEP stopped due to looping.
NOTE: There were 1 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TEST has 1 observations and 5 variables.
At the top of the first iteration of the loop, age=., so age<14 is true (SAS treats a missing numeric value as smaller than any number). The SET statement executes and the first record is read. At the bottom of the loop, age=14.
At the top of the second iteration of the loop (_n_=2), age=14 because variables read by a SET statement are automatically retained. The subsetting IF statement is false, so control returns to the top of the loop without reaching the SET statement. The DATA step sees that during the second iteration of the loop, no records were read. It stops, with the note that it stopped "due to looping."
If you change your subsetting IF to be AFTER the SET statement, the step will not stop due to looping, because on every iteration of the DATA step loop a record will be read.
data test;
put "top of loop " _n_= age=;
set sashelp.class;
if age<14;
put "bottom of loop " _n_= age=;
run;
top of loop _N_=1 age=.
top of loop _N_=2 age=14
bottom of loop _N_=2 age=13
top of loop _N_=3 age=13
bottom of loop _N_=3 age=13
top of loop _N_=4 age=13
top of loop _N_=5 age=14
top of loop _N_=6 age=14
bottom of loop _N_=6 age=12
top of loop _N_=7 age=12
bottom of loop _N_=7 age=12
top of loop _N_=8 age=12
top of loop _N_=9 age=15
bottom of loop _N_=9 age=13
top of loop _N_=10 age=13
bottom of loop _N_=10 age=12
top of loop _N_=11 age=12
bottom of loop _N_=11 age=11
top of loop _N_=12 age=11
top of loop _N_=13 age=14
bottom of loop _N_=13 age=12
top of loop _N_=14 age=12
top of loop _N_=15 age=15
top of loop _N_=16 age=16
bottom of loop _N_=16 age=12
top of loop _N_=17 age=12
top of loop _N_=18 age=15
bottom of loop _N_=18 age=11
top of loop _N_=19 age=11
top of loop _N_=20 age=15
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.TEST has 10 observations and 5 variables.
I have the following SAS code:
proc sort data=MYDATA1;
by VarNum Size Flavour Brand Retailer Market date;
run;
DATA MYDATA;
SET MYDATA1;
by VarNum Brand Size Flavour Retailer Market date;
/* Loop while for transformations. */
SUM = 0;
VAR1 = 1;
V1= Transformation;
VAR = Variable_for_SAS;
DO WHILE(FIND(V1,";")<>0);
V=V1;
V1=substr(V1,1,FIND(V1,";")-1);
IF SUBSTR(V1,1,1)="/" THEN
VT=STRIP(SUBSTR(V1,2,Find(V1,";")-2))||STRIP(date);
if _n_=1 then do;
declare hash h(dataset: 'MYDATA1');
h.definekey('Variable_date');
h.definedata('Variable_for_SAS');
h.definedone();
end;
if not h.find(key: VT) then new=Variable_for_SAS;
h.find();
SUM1=1*VAR;
/*Overwrite variable*/
VAR=SUM1;
V1=substr(TRIM(V),FIND(V,";")+1);
run;
But I get an error:
run;
_
117
ERROR 117-185: There was 1 unclosed DO block.
Do you know what I should do to solve this problem?
Is the problem that I am using DO WHILE and a hash object together?
The code above is now complete.
Just add the missing END statement at the point where you want your DO WHILE () loop to end.
Because the DO WHILE loop can iterate several times even within the first iteration of the data step, your IF _N_=1 condition is not sufficient to make sure the statements that create the hash object run only once. So either move the block that creates the hash object to BEFORE the DO WHILE loop, or add additional conditions to the IF statement to keep it from re-running on every iteration of the DO WHILE loop.
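As a rough sketch of the first option: the datasets lookup and tokens and the variables code, value, list, rest and total below are made up for illustration and are not taken from your data. The hash object is created once, before the DO WHILE loop, and the loop is closed with its own END.

```sas
/* Hypothetical lookup table: one value per code */
data lookup;
  input code $ value;
  datalines;
A 10
B 20
C 30
;
run;

/* Hypothetical input: semicolon-delimited lists of codes */
data tokens;
  length list $20;
  list = 'A;B;';   output;
  list = 'C;';     output;
  list = 'B;C;A;'; output;
run;

data result;
  length code $8 value 8;
  /* Create the hash object once, outside the DO WHILE loop */
  if _n_ = 1 then do;
    declare hash h(dataset: 'lookup');
    h.defineKey('code');
    h.defineData('value');
    h.defineDone();
    call missing(code, value);
  end;
  set tokens;
  rest  = list;
  total = 0;
  /* Peel off one code per pass until no delimiter is left */
  do while (find(rest, ';') > 0);
    code = substr(rest, 1, find(rest, ';') - 1);
    if h.find() = 0 then total = total + value;  /* 0 means the key was found */
    rest = substr(rest, find(rest, ';') + 1);
  end;  /* the END that closes the DO WHILE block */
  drop code value rest;
run;
```

With this sample data, total should come out as 30, 30 and 60 for the three rows. The point is only the structure: one IF _N_=1 block for the hash, then a DO WHILE block with a matching END.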
Can someone explain this code to me in depth? I have added comments in the code where I am confused. Is there any way I can attach a CSV of the data? Thanks in advance.
data have;
infile "&sasforum.\datasets\Returns.csv" firstobs=2 dsd truncover;
input DATE :mmddyy10. A B B_changed;
format date yymmdd10.;
run;
data spread;
do nb = 1 by 1 until(not missing(B));
set have;
end;
br = B;
do i = 1 to nb;
set have; *** I don't get how you can use do i = 1 to nb with set have. There is no variable nb in have. The variable nb is read into the dataset spread;
if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
else B_spread = B;
output;
end;
drop nb i br;
run;
***** If I comment out "drop nb i br" I see that nb takes a value of 2 for the null values of B. I don't get how this is done or possible. If I run the code only up to the line "br = B", and put an output statement in the first do loop, I clearly see that nb takes a value of one for null values of B. Honestly, it is as if the first do loop reads in future observations of B as BR. Can you please explain this to me? The second dataset "bunch" seems to follow the same principles as the first, so I imagine that if I understand how the dataset spread is created, I will understand how bunch is created.;
This is an advanced DATA step programming technique, commonly referred to as a DoW loop. If you search lexjansen.com for DoW, you will find helpful papers like http://support.sas.com/resources/papers/proceedings09/038-2009.pdf. The DoW loop codes an explicit loop around a SET statement. This is actually a "Double-DoW loop", because you have two explicit loops.
I made some sample data, and added some PUT statements to your code:
data have ;
input B ;
cards ;
.
.
1
2
.
.
.
3
;
data spread;
do nb = 1 by 1 until(not missing(B));
set have;
put _n_= "top do-loop " (nb B)(=) ;
end;
br = B;
do i = 1 to nb;
set have;
if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
else B_spread = B;
output;
put _n_= "bottom do-loop " (nb B br B_spread)(=) ;
end;
drop nb i br;
run;
With that sample data, on the first iteration of the DATA step (_N_=1), the top DO loop will iterate three times, reading the first three records of HAVE. At that point, (not missing(B)) will be true, and the loop will not iterate again. The variable NB will have a value of 3. The bottom loop will then iterate 3 times, because NB has a value of 3. It will also read the first three records of HAVE, because each SET statement keeps its own position in the dataset. It will compute B_spread, and output each record.
On the second iteration of the DATA step, the top DO loop will iterate only once. It will read the 4th record, with B=2. The bottom loop will iterate once, reading the 4th record, computing B_spread, and output.
On the third iteration of the DATA step, the top DO loop will iterate four times, reading the 5th through 8th records. The bottom loop will also iterate four times, reading the 5th through 8th records, computing B_spread, and output.
On the fourth iteration of the DATA step, the step will complete, because the SET statement in the top loop will read the End Of File mark.
The core concept of a Double-DoW loop is that typically you are reading the data in groups. Often groups are identified by an ID. Here they are defined by sequential records read until not missing(B). The top DO-loop reads the first group of records, and computes some value (in this case, it computes NB, the number of records in the group). Then the bottom DO-loop reads the first group of records, and computes some new value, using the value computed in top DO-loop. In this case, the bottom DO-loop computes B_spread, using NB.
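For comparison, here is a minimal sketch of the more common ID-based form of the Double-DoW loop. The dataset scores and the variables id, x, n, total and mean_x are invented for this illustration. The top loop reads one BY group and accumulates a total and a count, and the bottom loop re-reads the same group and writes each record with the group mean attached.

```sas
/* Hypothetical sample data, already sorted by id */
data scores;
  input id x;
  datalines;
1 10
1 20
2 5
2 7
2 9
;
run;

data with_mean;
  /* Top loop: read one BY group, counting records and summing x */
  do n = 1 by 1 until (last.id);
    set scores;
    by id;
    total = sum(total, x);
  end;
  mean_x = total / n;
  /* Bottom loop: re-read the same group and output every record */
  do until (last.id);
    set scores;
    by id;
    output;
  end;
  drop n total;
run;
```

Because total and mean_x are not read from a dataset, they are reset to missing at the start of each DATA step iteration, so every group starts from a clean slate. With this sample data, mean_x should be 15 for id=1 and 7 for id=2.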
I have the following code. I am trying to test a paragraph (descr) for a list of keywords (key_words). When I execute this code, the log reads in all the variables for the array, but will only test 2 of the 20,000 rows in the do loop (do i=1 to 100 and on). Any suggestions on how to fix this issue?
data JE.KeywordMatchTemp1;
set JE.JEMasterTemp end=eof;
if _n_ = 1 then do i = 1 by 1 until (eof);
set JE.KeyWords;
array keywords[100] $30 _temporary_;
keywords[i] = Key_Words;
end;
match = 0;
do i = 1 to 100;
if index(descr, keywords[i]) then match = 1;
end;
drop i;
run;
Your problem is that your end=eof is in the wrong place.
This is a trivial example calculating the 'rank' of the age value for each SASHELP.CLASS respondent.
See where I put the end=eof. That's because you need to use it to control the array filling operation. Otherwise, your loop, do i = 1 by 1 until (eof);, doesn't really do what you're saying it should: it's not actually terminating at eof, since eof is not true at that point (it is defined on the first set statement, so it only becomes true on the last record of that dataset). Instead it terminates because you read beyond the end of the keyword dataset, which is specifically what you don't want.
That's what the end=eof is doing: it's preventing you from trying to pull a row when the array filling dataset is finished, which would terminate the whole data step. Any time you see a data step terminate after exactly 2 iterations, you can be confident that's what the problem is likely to be - it is a very common issue.
data class_ranks;
set sashelp.class; *This dataset you are okay iterating over until the end of the dataset and then quitting the data step, like a normal data step.;
array ages[19] _temporary_;
if _n_=1 then do;
do _i = 1 by 1 until (eof); *iterate until the end of the *second* set statement;
set sashelp.class(keep=age rename=(age=_age)) end=eof; *see here? This eof is telling this loop when to stop. It is okay that it is not created until after the loop is. The keep and rename stop this read from overwriting the row already read by the first set statement.;
ages[_i] = _age;
end;
call sortn(of ages[*]); *ordering the ages loaded by number so they are in proper order for doing the trivial rank task;
end;
age_rank = whichn(age,of ages[*]); *determine where in the list the age falls. For a real version of this task you would have to check whether this ever happens, and if not you would have to have logic to find the nearest point or whatnot.;
drop _i _age;
run;
Is it possible to loop through the records of a table to populate an html email without repeating the beginning and the end of the email?
With this example I get a mail with 5 tables of 1 row (because WORK.MyEmailTable is table of 5 records and set creates a loop in the data step):
data _null_;
file mymail;
set WORK.MyEmailTable;
put '<html><body><table>';
***loop through all records;
put '<tr>';
put %sysfunc(cats('<td>',var1,'</td>'));
put %sysfunc(cats('<td>',var2,'</td>'));
put %sysfunc(cats('<td>',var3,'</td>'));
put '</tr>';
put '</table></body></html>';
run;
And I'm looking to have 1 table of 5 rows.
I don't know if there is a way to avoid repeating the beginning and the end of the mail for every record when you use set in the data step.
(Let me know if it's not clear I'll update.)
Thank you,
You can use the _n_ automatic datastep variable to let you know when you are on the first observation, and the set statement option end= to know that you are on the last observation:
data _null_;
file mymail;
set WORK.MyEmailTable end=eof;
if _n_ eq 1 then do;
put '<html><body><table>';
end;
/*loop through all records*/
put '<tr>';
put %sysfunc(cats('<td>_n_=',_n_,' eof=',eof,' ',var1,'</td>'));
put %sysfunc(cats('<td>_n_=',_n_,' eof=',eof,' ',var2,'</td>'));
put %sysfunc(cats('<td>_n_=',_n_,' eof=',eof,' ',var3,'</td>'));
put '</tr>';
if eof then do;
put '</table></body></html>';
end;
run;
I've added the values _n_ and eof to the output so you can see clearly how they work.
Rob's method is pretty much the standard, but there is another option if you prefer scripting an explicit loop (which can be more comfortable for non-SAS programmers to read). This will function exactly like Rob's answer, and may well compile to the same machine code even.
data _null_;
file mymail;
put '<html><body><table>';
do _n_ = 1 by 1 until (eof);
/*loop through all records*/
set WORK.MyEmailTable end=eof;
put '<tr>';
put %sysfunc(cats('<td>',var1,'</td>'));
put %sysfunc(cats('<td>',var2,'</td>'));
put %sysfunc(cats('<td>',var3,'</td>'));
put '</tr>';
end;
put '</table></body></html>';
stop;
run;
_n_ here doesn't have any special meaning (like it does in Rob's answer); it's used by convention since this way it does effectively have the same meaning as it does normally.
You need to use the end=eof to create a variable eof which is true on the last record of the dataset; otherwise the data step will terminate prematurely (before actually hitting your final statement). You also need the stop to tell it to not go back to the start - otherwise it will, and will put a new starting section, then terminate instantly when it hits the set. (Try it and see.)
do _n_=1 by 1 until (eof); is a SAS-specific way of writing an incremental loop; it's similar to the C/C++ for (_n_=1; !eof; _n_++), except that the UNTIL condition is checked at the bottom of each pass, so the body always runs at least once. It allows you to have an auto-incremented do loop whilst having a separate, unrelated stopping criterion.
Could you please explain why none of the data step statements are processed if we set the (obs=0) data set option in the (wrong) example below?
data temp;
x=0;
run;
data wrong;
set temp(obs=0);
x=1;
y=1;
output;
y=2;
output;
run;
data right;
set temp(obs=1);
x=1;
y=1;
output;
y=2;
output;
run;
I would normally expect that both work.wrong and work.right have the same output.
One of the ways a data step stops executing is when a SET statement executes and reads the end-of-file marker (i.e. there are no more records to read).
So if you SET a dataset with (obs=0), when the SET statement executes, the data step stops. For example:
data _null_ ;
put _n_= "I ran" ;
set sashelp.class(obs=0) ;
put _n_= "I did not run" ;
run;
_N_=1 I ran
NOTE: There were 0 observations read from the data set SASHELP.CLASS.
The first PUT statement executes, but the second does not, because the step stopped when the SET statement executed.
When you SET a dataset with (OBS=1), the data step stops on the SECOND iteration:
data _null_ ;
put _n_= "I ran before SET" ;
set sashelp.class(obs=1) ;
put _n_= "I ran after SET" ;
run;
_N_=1 I ran before SET
_N_=1 I ran after SET
_N_=2 I ran before SET
NOTE: There were 1 observations read from the data set SASHELP.CLASS.
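If the goal is to have the statements run even though no record is read, one possible workaround (a sketch, not the only way) is to make sure the SET statement never executes at run time. The dataset name both_rows is made up here. IF 0 THEN SET still lets the compiler pick up the variable attributes from temp, and the explicit STOP ends the step after one pass, which avoids the "stopped due to looping" note.

```sas
data temp;
  x = 0;
run;

data both_rows;
  /* Never executes at run time; only defines the variables at compile time */
  if 0 then set temp(obs=0);
  x = 1;
  y = 1;
  output;
  y = 2;
  output;
  /* End the step explicitly, since no SET statement will ever reach end of file */
  stop;
run;
```

This should write two observations to both_rows, the same as work.right in the question.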