The docs say that an end= option on a set statement will not work if you're using point= or by, but the code below does neither, and yet my final do block never runs. Is there a way to get that to run?
data my_peeps ;
input
@1 mrn $char10.
;
datalines ;
roy
mary
gene
bobby
joey
dee-dee
sting
evelyn
yo-mama
dude
dude2
bam-bam
;
run ;
data mrn_remaps ;
input
@1 mrn $char10.
@13 should_be $char10.
;
datalines ;
roy         teddy
bobby       robert
yo-mama     mrs_p
dude        phil
dude2       phil
bam-bam     james
;
run ;
data corrected_peeps ;
length should_be $ 10 ; * <-- needed for hash ;
retain __num_corrections 0 ;
set my_peeps end = last_record ;
if _n_ = 1 then do ;
declare hash corrections(dataset: 'mrn_remaps') ;
corrections.definekey('mrn') ;
corrections.definedata('mrn', 'should_be') ;
corrections.definedone() ;
call missing (should_be) ;
end ;
* This while loop seems to be whats killing the end= block. ;
do while(corrections.find() = 0) ;
put "INFO: found a merged MRN. BEWARE OF DUPLICATES IN OUTPUT DATASET!" ;
mrn = should_be ;
__num_corrections = __num_corrections + 1 ;
end ;
if last_record then do ;
put 'INFO: Corrected ' __num_corrections 'MRNs.' ;
end ;
drop should_be __num_corrections ;
run ;
proc print ;
run ;
Codewise, you can let .find() perform the replacement directly, and use .check() if you need to log information. These changes compact the code a little, but they do require changing the remap table to have columns old_mrn (key) and mrn, instead of mrn (key) and should_be.
Example:
data my_peeps ;
input
@1 mrn $char10.
;
datalines ;
roy
mary
gene
bobby
joey
dee-dee
sting
evelyn
yo-mama
dude
dude2
bam-bam
;
data mrn_remaps ;
input
@1 old_mrn $char10.
@13 mrn $char10.
;
datalines ;
roy         teddy
bobby       robert
yo-mama     mrs_p
dude        phil
dude2       phil
bam-bam     james
;
data corrected_peeps ;
retain _count;
set my_peeps end = last_record ;
length old_mrn $ 10 ; * <-- needed for hash ;
if _n_ = 1 then do ;
declare hash corrections(dataset: 'mrn_remaps') ;
corrections.definekey('old_mrn') ;
corrections.definedata('mrn') ;
corrections.definedone();
call missing(old_mrn); * <-- prevent uninitialized NOTE;
end ;
* if corrections.check(key:mrn) = 0 then
put 'INFO:' mrn= 'will be corrected';
* increment counter and perform replacement of value in host variable if find succeeds;
_count + (corrections.find(key:mrn) = 0);
if last_record then do ;
put 'INFO: Corrected ' _count 'MRNs.' ;
end ;
drop _:;
run ;
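To make the return-code logic concrete, here is a minimal sketch. The names remap, ids, demo, original, rc_check and rc_find are made up for illustration and are not from the question. Both check() and find() return 0 on a hit and a nonzero value on a miss; only find() copies the stored data item into the host variable.

```sas
/* Hypothetical lookup table: old_mrn is the key, mrn is the replacement */
data remap ;
  length old_mrn mrn $ 10 ;
  old_mrn = 'roy' ; mrn = 'teddy' ; output ;
run ;

/* Hypothetical input: one value that is remapped, one that is not */
data ids ;
  length mrn $ 10 ;
  mrn = 'roy' ; output ;
  mrn = 'mary' ; output ;
run ;

data demo ;
  set ids ;
  length old_mrn $ 10 ;
  if _n_ = 1 then do ;
    declare hash h(dataset: 'remap') ;
    h.definekey('old_mrn') ;
    h.definedata('mrn') ;
    h.definedone() ;
    call missing(old_mrn) ;
  end ;
  original = mrn ;                /* keep the value before any replacement */
  rc_check = h.check(key: mrn) ;  /* 0 = key present; mrn is left untouched */
  rc_find  = h.find(key: mrn) ;   /* 0 = key present; mrn is overwritten on a hit */
  put original= rc_check= rc_find= mrn= ;
  drop old_mrn ;
run ;
```

For 'roy' both return codes are 0 and mrn becomes 'teddy'; for 'mary' both are nonzero and mrn is unchanged. That is why the sum statement in the answer can count corrections and apply them in one expression.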
I studied the SET statement in SAS, but I don't understand how a SET statement works inside a DO loop.
I create the following example dataset a1:
/* Create data a1 */
data a1 ;
input fruit $ ;
cards ;
melon
apple
orange
;
run ;
proc print data=a1 ;
title "Results of a1" ;
run;
Then, I create the following new dataset c1 :
/* Create data c1 using a1 -- This is a upper code block */
data c1 ;
do i = 1 to 3 ;
set a1 ;
count + 1 ;
N_VAR = _N_ ;
ERR_VAR = _ERROR_ ;
output ;
end;
run ;
proc print data=c1 LABEL ;
LABEL N_VAR = "_N_" ;
LABEL ERR_VAR = "_ERROR_" ;
title "Results of c1" ;
run ;
Question: Why doesn't the upper code block produce the same output as the code block below? I don't understand how the SET statement works in a DO loop. What concept am I missing?
/* My expectation for c1 -- This is a below code block */
data my_expectation ;
input i fruit $ count N ERROR ;
cards ;
1 melon 1 1 0
1 apple 2 2 0
1 orange 3 3 0
2 melon 4 1 0
2 apple 5 2 0
2 orange 6 3 0
3 melon 7 1 0
3 apple 8 2 0
3 orange 9 3 0
;
run;
proc print data=my_expectation label ;
LABEL N = "_N_" ;
LABEL ERROR = "_ERROR_" ;
title "The result that I expected for c1" ;
run ;
I attached result image file below.
Thank you for your attention.
Each SET statement sets up an independent reading stream.
A DATA step is an implicit loop.
After the DO loop iterates 3 times the implicit DATA step loop returns control to the top of the step.
At the second implicit iteration, the DO loop is entered, and in its first iteration the SET statement is reached (for the 4th time). The input data set (A1) has no more observations, so the DATA step ends.
You can observe the flow behavior with this version of your DATA step:
data c1 ;
put 'TOP';
do i = 1 to 3 ;
put i= 'pre SET';
set a1 ;
put i= 'post SET';
count + 1 ;
N_VAR = _N_ ;
ERR_VAR = _ERROR_ ;
output ;
end;
put 'BOTTOM';
run;
Aside:
When a DATA step does not have any explicit OUTPUT statements, the step implicitly outputs an observation when control reaches the bottom of the step. Some statements prevent flow from reaching the bottom, such as a DELETE statement or a subsetting IF statement that fails.
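A small sketch of that aside, using made-up dataset names (nums, implicit_out, subset_out, explicit_out) that are not part of the question:

```sas
data nums ;
  do x = 1 to 4 ; output ; end ;
run ;

/* No OUTPUT statement: one observation is written each time control reaches the bottom */
data implicit_out ;   /* 4 observations */
  set nums ;
run ;

/* Subsetting IF: when the condition fails, control returns to the top and nothing is written */
data subset_out ;     /* 2 observations: x=2 and x=4 */
  set nums ;
  if mod(x, 2) = 0 ;
run ;

/* An explicit OUTPUT anywhere in the step means there is no implicit output at the bottom */
data explicit_out ;   /* 1 observation: x=1 */
  set nums ;
  if x = 1 then output ;
run ;
```

The original C1 step falls into the third case: it contains an explicit OUTPUT, so rows are written only where that statement executes inside the DO loop.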
I answered your why question, and @Tom showed you how to produce your expected result with a DATA step. The result is a cross join, which SQL can also perform:
data a1 ;
input fruit $ ;
cards ;
melon
apple
orange
;
data replicates;
do i = 1 to 3;
output;
end;
run;
proc sql;
create table want as
select i, a1.*
from replicates cross join a1
;
quit;
If you want to output each observation three times then move the DO loop after the SET.
set a1;
do i=1 to 3; output; end;
If you really want to read through the dataset three times then you either need three separate SET statements
i=1;
set a1;
output;
i=2;
set a1;
output;
i=3;
set a1;
output;
or use POINT= option to explicitly control which observation you are reading with the SET statement.
do i=1 to 3 ;
do p=1 to nobs;
set a1 point=p nobs=nobs ;
output;
end;
end;
stop;
Most DATA steps stop when they read past the end of the input. Since that cannot happen with the POINT= option, you need the STOP statement to prevent the DATA step from repeating forever.
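For reference, the POINT= fragment can be wrapped into a complete step roughly as follows. This is a sketch; the output dataset name c1_point is an assumption, and A1 is recreated so the block runs on its own.

```sas
data a1 ;
  input fruit $ ;
  cards ;
melon
apple
orange
;
run ;

data c1_point ;
  do i = 1 to 3 ;                  /* three passes over A1 */
    do p = 1 to nobs ;             /* NOBS= is assigned at compile time, so it is usable here */
      set a1 point=p nobs=nobs ;   /* direct access to observation p */
      output ;
    end ;
  end ;
  stop ;                           /* POINT= never triggers end of file, so stop explicitly */
run ;
```

The result has nine observations, with melon, apple and orange repeated for i = 1, 2 and 3. The POINT= and NOBS= variables are not written to the output dataset.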
I tried this in C# but have not had much success. So I am now trying in SAS. Using an EG session and my SAS code, we work with the list of students in SASHELP.CLASS.
These people want to get to know each other and have a monthly random pairing to go on a Coffee Date.
Rules:
A random Coffee Date List is Generated monthly;
I store each month's pairing in a Historical Dataset, which I append to monthly.
One person cannot have coffee with the same person within a 6 month period. So we keep a separate dataset for historical purposes with 3 Vars:
LastDate,InviterID,InvitedID
We check each pairing against the Historical list, loading only the most recent 6 months of data into a temp dataset for checking purposes.
If no recent matched pair is found, a new matched pair is added to a new Paired Dataset, and the 2 names (rows) are removed from the original Participants dataset until the dataset has fewer than 2 rows (a single person cannot be paired with another).
Unfortunately we have 19 people in this list so one person will be left out until we can add a new participant. Is anyone interested in joining our coffee club? :-)
So I start by deriving an ID (_n_) from the dataset, and I only keep the Name:
Data Participants(Keep=ID Name);
FORMAT ID 8.;
set SASHelp.class;
ID=_n_;
run;
These 19 People will be my Participants in the Coffee Club.
I more or less follow the line of thought:
data _null_;
randvar = ceil(rand('UNIFORM') * 100000);
call symput('RANDSEED', randvar);
run;
data CR.names2(keep=MEMID randid);
set CR.MasterNames;
randid = rand('UNIFORM');
run;
proc sort data=CR.names2 ; by randid; run;
data CR.pairs(keep=pairgrp MEMID);
set CR.names2 nobs=num_peeps;
pairgrp+1;
if pairgrp > floor(num_peeps/2) then pairgrp=1;
run;
proc sort data=CR.pairs; by pairgrp;run;
proc transpose data=CR.pairs
out=CR.pairs2 (drop=_NAME_);
var memid;
by pairgrp;
run;
Data CR.Pairs3;
set CR.pairs2;
rename COL1=InviterID COL2=InvitedID;
run;
But I get stuck :-(
I need help with the rest please...
Has anyone else done this type of random pairing successfully before? I am grasping at straws here...
Any help much appreciated.
Len
Here is my idea. It is far from efficient, especially when NOBS gets large, because a Cartesian product is involved. I also cheated on the odd number of members by adding an extra row in that case.
Prepare data and generate empty result table.
Create a list of all possible pairings (combinations) excluding recent pairings.
Random sort and descend through the list until every element has been picked once.
Append to result table.
There is a drawback as there might be members who will not get pairings as all possible partners are already picked. To avoid that we could iterate until we get a maximum of pairings.
EDIT: Added iteration. Now the program makes draws randomly until everyone is matched or a threshold is reached.
This problem should probably be implemented in a matrix orientated language like IML or R.
data Participants(Keep=ID Name) ;
set SASHelp.class nobs = num_peeps ;
ID=_n_ ;
output ;
if _n_ = 1 and mod(num_peeps,2) then do ; /* get even number of members: empty ID to pair with last participant*/
name = 'empty' ;
id = 0 ;
output ;
end ;
run ;
data list_of_meetings ;
length iteration InviterID InvitedID 8. ;
run ;
/****
iter = number of club meetings
hist = length of memory for pairings
tries = number of iterations to pair everyone
****/
%macro loop_coffee (iter=, hist=6, tries= 10) ;
proc sql noprint ;
select max(0,max(iteration)) + 1 into :base
from list_of_meetings ;
quit ;
%do i = &base. %to &iter. ; /* loop through number of meetings */
proc sort data = list_of_meetings (where=(iteration >= &i - &hist )) out = lookup nodupkey ; by InviterID InvitedID ; run ; /* get memory of pairings */
proc sql ; /* list all acceptable pairs */
create table all_pairs as
select a.ID as InviterID, b.ID as InvitedID
from Participants a
inner join Participants b
on a.ID lt b.ID
left join lookup c /* exclude the memory */
on a.ID eq c.InviterID and b.ID eq c.InvitedID
where c.InviterID is NULL ;
quit ;
%let j = 0 ;
%let all_pairs = 0 ;
%do %until (&all_pairs | &j > &tries) ; /* iterate and random sort until all members are paired */
%let j = %eval( &j + 1 ) ;
data all_pairs;
set all_pairs;
randnum = ranuni(12345 + &i + &j);
run;
proc sort data = all_pairs ; by randnum ; run ; /* random sort */
data out_pairs ; /* select the pairs: no. of IDs/2 */
declare hash h() ;
h.defineKey("ID") ;
h.defineDone() ;
do until ( eof1 ) ;
set Participants (keep= ID) end = eof1 ;
rc = h.add () ; /* populate list of members */
end ;
do until ( eof2 ) ;
set all_pairs (keep= InviterID InvitedID) end = eof2 ;
rc1 = h.check (key:InviterID) ;
rc2 = h.check (key:InvitedID) ;
if rc1 = 0 and rc2 = 0 then do ;
rc = h.remove (key:InviterID) ; /* delete member from list if paired */
rc = h.remove (key:InvitedID) ;
output ;
end ;
if h.num_items = 0 then do ;
call symputx('all_pairs', 1 ) ; /* symputx avoids the numeric-to-character conversion note */
stop ;
end;
end ;
stop ;
keep InviterID InvitedID ;
run ;
%end ;
data list_of_meetings ;
set list_of_meetings (where=(iteration ne .))
Out_pairs (in=pairs) ;
if pairs then iteration = &i. ;
run ;
%end ;
%mend ;
%loop_coffee (iter=10,hist=6,tries=10) ;
I have a data set with a variable named "Condition" that I want to use in the code. I'm guessing I need to do it in a macro but I'm still learning how to write macros in SAS.
So if my data set is this:
Question,Answer,Condition,Result
Q1,1,Answer=1," "
Q2,2,Answer=1," "
Q3,3,Answer=4," "
Then I want the program to take the Condition variable as a string and then use it as an if statement:
if Condition then Result = "Correct";
Is this possible?
That is not easy to do. For your simple example you could do:
data want ;
set have ;
if cats('Answer=',answer) = condition then ....
But that will not generalize to situations where CONDITION references the values of other variables. You might be able to generate code from a set of unique values of CONDITION.
Sample data:
data have ;
infile cards dsd truncover ;
input Question $ Answer Condition :$30. Expected $ ;
cards;
Q1,1,Answer=1,"Correct"
Q2,2,Answer=1,"Wrong"
Q3,3,Answer=4,"Wrong"
;;;;
Generate code using unique values of CONDITION.
filename code temp ;
data _null_;
set have end=eof ;
by condition ;
file code ;
if _n_=1 then put 'SELECT ;' ;
if first.condition then put ' WHEN (' CONDITION= :$quote. ' AND (' condition ')) RESULT="CORRECT" ;' ;
if eof then put ' OTHERWISE RESULT="WRONG";'
/ 'END;'
;
run;
Use the generated code in a data step.
data want ;
set have ;
%inc code / source2;
run;
Sample log records:
252 data want ;
253 set have ;
254 %inc code / source2;
255 +SELECT ;
256 + WHEN (Condition="Answer=1" AND (Answer=1 )) RESULT="CORRECT" ;
257 + WHEN (Condition="Answer=4" AND (Answer=4 )) RESULT="CORRECT" ;
258 + OTHERWISE RESULT="WRONG";
259 +END;
NOTE: %INCLUDE (level 1) ending.
260 run;
I have a SAS dataset which contains one column of polynomials. For example, X1**(-2)+X1**(2).
Is there a function to transform this into a numeric expression?
Many thanks,
If I understand you correctly, I don't think there is a specific function that will easily let you do this. You have two options - write your own logic to interpret the polynomial expressions, or use call execute to have SAS write out a (potentially very long) data step for you, assuming that the polynomials are all entered as valid data step code. Here's a call execute approach:
data have;
infile datalines truncover; /* must come before INPUT so TRUNCOVER applies to the $255. read */
input x1 polynomial $255.;
datalines;
1 X1**(-2)+X1**(2)
2 X1**(-1)+X1**(1)
3 X1**(1)+X1**(-1)
;
run;
data _null_;
set have end = eof;
if _n_ = 1 then call execute('data want; set have; select(_n_);');
call execute(catx(' ','when(',_N_,') y =',polynomial,';'));
if eof then call execute('end; run;');
run;
Convert them to macro variables, and then resolve them into a calculation...
Using the dataset example in user667489's answer :
/* Create numbered macro variables, 1 per row of data */
data _null_ ;
set have end=eof ;
call symputx(cats('POLY',_n_),polynomial) ;
if eof then call symputx('POLYN',_n_) ;
run ;
%MACRO ROWLOOPER ;
%DO N = 1 %TO &POLYN ;
if _n_ = &N then result = &&POLY&N ;
%END ;
%MEND ;
data want ;
set have ;
/* Not very efficient, looping over all polynomials on each row of data */
/* So for 3 rows, you'll perform 9 iterations here */
%ROWLOOPER ;
run ;
Or, alternatively, write your dataset out into a SAS program, and %inc that program :
data _null_ ;
file "polynomials.sas" ;
set have end=eof ;
if _n_ = 1 then do ;
put "data poly;" ;
put " set have;" ;
end ;
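/* Guard each assignment with its row number so every row uses its own polynomial */
put " if _n_ = " _n_ "then result = " polynomial ";" ;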
if eof then put "run;" ;
run ;
%inc "polynomials.sas" ;
I want a dataset like the one shown below, built from a dataset that has no grand total row or column but is otherwise the same as the dataset in the image.
Some dummy data:
data input ;
array M(5) M201402 M201404 M201405 M201406 M201409 ; * month columns ;
do desc='ABCD','EFGH' ;
do i=1 to 5 ;
M(i)=int(ranuni(1)*100) ; * random integer from 0 to 99 ;
end ;
output ; * one row per desc ;
end ;
drop i ;
run ;
Generate grand total column plus grand total row:
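data output ;
length desc $ 5 ; * wide enough for the Total label ;
set input end=eof;
array M(*) M2014: ;
array F(5) _temporary_ ; * running column totals, sized for the 5 month columns ;
* Create grand total column ;
grand_total=sum(of M(*)) ;
* Accumulate column totals (temporary arrays are retained across rows) ;
do i=1 to dim(M) ;
F(i)=sum(F(i),M(i)) ;
end ;
output ;
* Output grand total row ;
if eof then do ;
desc='Total' ;
do i=1 to dim(M) ;
M(i)=F(i) ;
end ;
grand_total=sum(of M(*)) ;
output ;
end ;
drop i ;
run ;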