Random number generation without repetition in SAS

I am trying to generate random numbers without repetition. My idea is to write a do while loop that runs 5 times. Inside it I get a random number, store it in an array, and at every iteration check whether the picked number is already in the array, to decide whether this pick is a repetition or not.
Here is my code, where I try to implement this idea, but something is wrong and I do not know where I made a mistake.
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then
do;
temp(1) = rand;
counter=counter+1;
output;
continue;
end;
do a=1 to counter by 1;
if temp(a) = rand then continue ;
end;
temp(counter) = rand;
output;
counter=counter+1;
if counter = 5 then do;
leave;
end;
end;
run;

Sounds like you want a random permutation.
data _null_;
seed=12345;
array r[5] (1:5);
put r[*];
call ranperm(seed,of r[*]);
put r[*];
run;
1 2 3 4 5
5 1 4 3 2
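If you want the shuffled values as rows of a data set (one row per pick, as in the question), a minimal sketch is to run CALL RANPERM once and then output the array elements one at a time. The data set name PERM_ROWS and the variables K and DRAW are made up for illustration; the seed value is arbitrary.

```sas
data perm_rows;
  seed = 12345;                  /* arbitrary seed; the same seed gives the same order */
  array r[5] (1 2 3 4 5);        /* the values to shuffle */
  call ranperm(seed, of r[*]);   /* shuffle r1-r5 in place */
  do k = 1 to dim(r);
    draw = r[k];                 /* one row per value, so no value can repeat */
    output;
  end;
  keep k draw;
run;
```

Because the array is permuted rather than sampled, there is no need to check for repeats at all.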
This is a simplified version of what you are trying to do.
data WithoutRepetition;
i=0;
array temp[5];
do r=1 by 1 until(i eq dim(temp));
rand=round(4*ranuni(0)+1,1);
if rand not in temp then do; i+1; temp[i]=rand; end;
end;
drop i rand;
run;
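One side note on the generator itself: round(4*ranuni(0)+1,1) returns 1 and 5 only half as often as 2, 3 and 4 (1 needs a uniform value below 0.125, 5 needs one of at least 0.875). That does not break the no-repetition logic, but it makes the loop wait longer for the end values. A sketch of the same loop with an evenly distributed integer, assuming RAND('UNIFORM') with CALL STREAMINIT is acceptable in your SAS version (the data set name UNIFORM_DRAWS and the seed are hypothetical):

```sas
data uniform_draws;
  call streaminit(12345);                     /* fixed seed so the run is reproducible */
  array temp[5];                              /* creates temp1-temp5 */
  i = 0;
  do tries = 1 by 1 until (i = dim(temp));
    draw = 1 + floor(5 * rand('uniform'));    /* 1..5, each with probability 1/5 */
    if draw not in temp then do;              /* keep only values not drawn before */
      i + 1;
      temp[i] = draw;
    end;
  end;
  drop i draw;
run;
```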

You were reasonably close to a working, if convoluted solution. For educational purposes, although data _null_'s answer is much cleaner, here's why your code wasn't working:
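Your continue statement inside the do a=1 to counter loop only skips to the next iteration of that inner loop, not of the outer do while loop, so the duplicate check has no effect: every value is stored and output whether or not it is a repeat.
LEAVE and CONTINUE act on the innermost iterative DO loop that contains them. A plain do; ... end; group is not a loop, so your leave does exit the do while loop, and the first continue (in the counter = 0 branch) does start the next do while iteration. That first continue becomes unnecessary once the duplicate check works, because the check rejects the repeated first value anyway.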
Because you are updating your array with newly-found unique values before you increment counter, previously populated values are being overwritten. This often results in duplicates of the overwritten values appearing in your output.
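A minimal sketch of the loop rule behind these points (the data set name NESTED_DEMO is made up): a CONTINUE inside an inner iterative DO loop only moves on to the next value of that inner loop. In the question's code this is why if temp(a) = rand then continue; never stops a duplicate from being stored.

```sas
data nested_demo;
  do outer = 1 to 2;
    do inner = 1 to 3;
      if inner = 2 then continue;   /* jumps to the next INNER value, not the next OUTER value */
      output;                       /* so this still runs for inner = 1 and inner = 3 */
    end;
  end;
run;
```

The result has four rows, (1,1), (1,3), (2,1) and (2,3); the outer loop is never skipped.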
I would place continue and leave in the same category as goto - avoid using them if at all possible, as they tend to make code difficult to debug. It's clearer to set the exit conditions for all your loops at the point of entry.
Just for fun, though, here's a fixed version of your original code:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
dupe = 0;
do a=1 to counter;
if temp(a) = rand then dupe=1;
end;
if dupe then continue;
counter +1;
temp(counter) = rand;
output;
if counter = 5 then leave;
end;
run;
And here is an equivalent version with all the leave and continue statements replaced with more readable alternatives:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(counter < 5);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
else do;
dupe = 0;
do a=1 to counter while(dupe = 0);
if temp(a) = rand then dupe=1;
end;
if dupe = 0 then do;
counter +1;
temp(counter) = rand;
output;
end;
end;
end;
run;

Related

mean of 10 variables with different starting point (SAS)

I have 19 numeric variables, pm25_total2000 to pm25_total2018.
Each person has a starting year between 2013 and 2018; call that variable "reqyear".
Now I want to calculate, for each person, the mean over the 10 years up to and including the starting year.
For example, if a person has starting year 2015 I want mean(of pm25_total2006-pm25_total2015).
Or if a person has starting year 2013 I want mean(of pm25_total2004-pm25_total2013).
How do I do this?
data _null_;
set scapkon;
reqyear=substr(iCDate,1,4)*1;
call symput('reqy',reqyear);
run;
data scatm;
set scapkon;
/* Mean of the 10 years before the recruitment year */
pm25means=mean(of pm25_total%eval(&reqy.-9)-pm25_total%eval(&reqy.));
run;
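%eval(&reqy.-9) resolves to a constant (the same value for every person), in my case 2007. CALL SYMPUT overwrites &reqy on every row, so it ends up holding the last row's value, and the macro reference is resolved only once, when the second DATA step is compiled.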
That doesn't work.
You can compute the mean with a traditional loop.
data want;
set have;
array x x2000-x2018;
call missing(sum, mean, n);
do _n_ = 1 to 10;
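v = x ( start - 1999 -_n_ ); /* element 1 is x2000, so this is year start-_n_: the 10 years before start, excluding start itself */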
if not missing(v) then do;
sum + v;
n + 1;
end;
end;
if n then mean = sum / n;
run;
If you want to flex your SAS skill, you can use POKE and PEEK concepts to copy a fixed length slice (i.e. a fixed number of array elements) of an array to another array and compute the mean of the slice.
Example:
You will need to add sentinel elements and range checks on start to prevent errors when start-10 < 2000.
data have;
length id start x2000-x2018 8;
do id = 1 to 15;
start = 2013 + mod(id,6);
array x x2000-x2018;
do over x;
x = _n_;
_n_+1;
end;
output;
end;
format x: 5.;
run;
data want;
length id start mean10yrPriorStart 8;
set have;
array x x2000-x2018;
array slice(10) _temporary_;
call pokelong (
peekclong ( addrlong ( x(start-1999-10) ) , 10*8 ) ,
addrlong ( slice (1))
);
mean10yrPriorStart = mean(of slice(*));
run;
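The range check mentioned above can also be sketched without POKE/PEEK, by clamping the window to the array bounds with LBOUND and HBOUND and looping over the remaining years. This is a sketch under assumptions: it uses the x2000-x2018 layout of the test data, averages the 10 years before start (excluding start, like the POKE example), and the names HAVE_SMALL, WANT_CHECKED, LO and HI are hypothetical.

```sas
/* Hypothetical test data: x(yr) = yr - 1999, so x2000=1, ..., x2018=19 */
data have_small;
  array x(2000:2018) x2000-x2018;
  do start = 2003, 2015;
    do yr = 2000 to 2018;
      x(yr) = yr - 1999;
    end;
    output;
  end;
  drop yr;
run;

data want_checked;
  set have_small;
  array x(2000:2018) x2000-x2018;
  /* clamp the 10-year window before START to years that exist in the array */
  lo = max(start - 10, lbound(x));
  hi = min(start - 1, hbound(x));
  sum = 0;
  cnt = 0;
  do yr = lo to hi;
    if not missing(x(yr)) then do;
      sum = sum + x(yr);
      cnt = cnt + 1;
    end;
  end;
  if cnt > 0 then mean10yrPriorStart = sum / cnt;
  keep start lo hi cnt mean10yrPriorStart;
run;
```

For start=2003 only x2000-x2002 exist, so the mean is taken over 3 years instead of failing on an out-of-range subscript.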
use an array and loop
index the array with years
accumulate the sum of the values
accumulate the count to account for any missing values
divide to obtain the mean value
data want;
set have;
array _pm(2000:2018) pm25_total2000 - pm25_total2018;
do year=reqyear to (reqyear-9) by -1;
*add totals;
total = sum(total, _pm(year));
*add counts;
nyears = sum(nyears,not missing(_pm(year)));
end;
*accounts for possible missing years;
mean = total/nyears;
run;
Note this loop goes in reverse (start year to 9 years previous) because it's slightly easier to understand this way IMO.
If you have no missing values you could divide by 10 and drop nyears, but it does no harm to keep it.
NOTE: My first answer did not address the OP's question, so this is a redux.
For this solution, I used Richard's code for generating test data. However, I added a line to randomly add missing values.
x = _n_;
if ranuni(1) < .1 then x = .;
_n_+1;
This alternative does not perform any explicit checks for missing values; the sum() and n() functions already ignore them. The loop over the dynamic slice of the data array only copies the values into a temporary array, and the final sum and count are computed on that temporary array outside the loop.
data want;
set have;
array x(2000:2018) x:;
array t(10) _temporary_;
j = 1;
do i = start-9 to start;
t(j) = x(i);
j + 1;
end;
sum = sum(of t(*));
cnt = n(of t(*));
mean = sum / cnt;
drop x: i j;
run;
Result:
id start sum cnt mean
1 2014 72 7 10.285714286
2 2015 305 10 30.5
3 2016 458 9 50.888888889
4 2017 631 9 70.111111111

SAS maximize a function of variables

Given a set of variables v(1) - v(k), a function f is defined as f(v1,v2,...,vk).
The goal is to find the v(i) that maximize f subject to v(1)+v(2)+...+v(k)=n. All elements are restricted to non-negative integers.
Note: I don't have SAS/IML or SAS/OR.
If k is known, say 2, then I can do something like this:
data out;
set in;
maxf = 0;
n1 = 0;
n2 = 0;
do i = 0 to n;
do j = 0 to n;
if i + j ne n then continue;
_max = f(i,j);
if _max > maxf then do;
maxf = max(maxf,_max);
n1 = i;
n2 = j;
end;
end;
end;
drop i j;
run;
However, this solution has several issues.
Using loops seems very inefficient.
It doesn't know how many nested loops are needed when k is unknown.
It's exactly the "allocate n balls into k bins" problem, where k is the number of columns in data set IN with a specific prefix and n is given by a macro variable.
Function f is known, e.g. f(i,j) = 2*i+3*j;
Is this possible to be done in data step?
As said in the comments, general non-linear integer programs are hard to solve. The method below will solve for continuous parameters. You will have to take the output and find the nearest integer values that maximize your function. However, the loop will now be much smaller and quicker to run.
First let's make a function. This function has an extra parameter and is linear in that parameter. Wrap your function inside something like this.
proc fcmp outlib=work.fns.fns;
function f(x1,x2,a);
out = -10*(x1-5)*(x1-5) + -2*(x2-2)*(x2-2) + 2*(x1-5) + 3*(x2-2);
return(out+a);
endsub;
run;quit;
options cmplib=work.fns;
We need to add the a parameter so that we can have a value that SAS can pass besides the actual parameters. SAS will think it's solving the likelihood of A, based on x1 and x2.
Generate a Data Set with an A value.
data temp;
a = 1;
run;
Now use PROC NLMIXED to maximize the likelihood of A.
ods output ParameterEstimates=Parameters;
ods select ParameterEstimates;
proc nlmixed data=temp;
parms x1=1 x2=1;
bounds x1>0, x2>0;
y = f(x1,x2,a);
model a ~ general(y);
run;
ods select default;
I get output of x1=5.1 and x2=2.75. You can then search "around" that to see where the maximum comes out.
Here's my attempt at a Data Step to search around the value:
%macro call_fn(fn,n,parr);
%local i;
&fn(&parr[1]
%do i=2 %to &n;
, &parr[&i]
%end;
,0)
%mend;
%let n=2;
%let c=%sysevalf(2**&n);
data max;
set Parameters end=last;
array parms[&n] _temporary_;
array start[&n] _temporary_;
array pmax[&n];
max = -9.99e256;
parms[_n_] = estimate;
if last then do;
do i=1 to &n;
start[i] = floor(parms[i]);
end;
do i=0 to &c-1; /* 0..2**n-1 covers every floor/ceiling corner */
x = put(i,binary&n..); /* numeric BINARYw. format, e.g. 3 -> '11' */
do j=1 to &n;
parms[j] = input(substr(x,j,1),best.) + start[j];
end;
/*You need a macro to write this dynamically*/
val = %call_fn(f,&n,parms);
*put i= max= val=;
if val > max then do;
do j=1 to &n;
pmax[j] = parms[j];
end;
max = val;
end;
end;
output;
end;
run;
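For comparison, when n and k are small the exhaustive search from the question can be written for any k without nested loops, by stepping through all allocations with an "odometer" over v1..v(k-1) and giving v(k) whatever is left. This is a sketch under assumptions: n=5, k=3 and the linear f = 2*v1 + 3*v2 + 1*v3 are hypothetical stand-ins, and the number of allocations grows as C(n+k-1, k-1), so it is only practical for small problems.

```sas
%let n = 5;   /* hypothetical total to allocate */
%let k = 3;   /* hypothetical number of bins    */

data best;
  array v[&k] v1-v&k;                  /* current allocation */
  array coef[&k] _temporary_ (2 3 1);  /* hypothetical f = 2*v1 + 3*v2 + 1*v3 */
  array vbest[&k] vbest1-vbest&k;      /* best allocation found so far */
  maxf = .;
  do i = 1 to &k; v[i] = 0; end;
  v[&k] = &n;                          /* first allocation: (0, ..., 0, n) */
  done = 0;
  do until (done);
    /* evaluate f at the current allocation */
    f = 0;
    do i = 1 to &k;
      f = f + coef[i] * v[i];
    end;
    if missing(maxf) or f > maxf then do;
      maxf = f;
      do i = 1 to &k; vbest[i] = v[i]; end;
    end;
    /* odometer step over v1..v(k-1); v(k) takes whatever is left */
    j = &k - 1;
    carry = 1;
    do while (carry and j >= 1);
      v[j] = v[j] + 1;
      partial = 0;
      do i = 1 to &k - 1; partial = partial + v[i]; end;
      if partial <= &n then carry = 0;
      else do;
        v[j] = 0;
        j = j - 1;
      end;
    end;
    if carry then done = 1;            /* every allocation has been visited */
    else v[&k] = &n - partial;
  end;
  keep maxf vbest1-vbest&k;
run;
```

With these stand-in coefficients the search returns maxf=15 at (0, 5, 0), i.e. everything goes to the bin with the largest coefficient, as expected for a linear f.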

Please explain the value of j in output

data primes;
length status $12.;
do i=3 to 6;
status='Prime';
do j=2 to i-1;
if mod(i, j) = 0 then do;
status='Composite';
leave; *exit loop;
end;
end;
Output;
end;
run;
proc print data = primes;
run;
I wrote this code and got the output below. Can someone please explain how the value of j is being picked here? How can j=i in the output when the loop only goes up to j=i-1?
Obs status i j
1 Prime 3 3
2 Composite 4 2
3 Prime 5 5
4 Composite 6 2
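It has to do with the way the loop is stopped. The index variable is incremented at the bottom of the loop and the check is done at the top; if the index is greater than the stop value, the loop stops. So for a prime, j ends one past i-1, that is, equal to i. For a composite, LEAVE exits before j is incremented, so j keeps the value of the first divisor. You could stop your loop with until(j eq i-1) instead and see the value you expect. The DO loop tests for "greater than stop" rather than "equal to stop" because the index may never hit the stop value exactly (for example, with a fractional BY increment).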
Also note that this is all covered in the documentation:
DO Statement, Iterative
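A small sketch of the index values left behind after the loop ends (the data set name LOOPDEMO is made up). The loops have empty bodies because only the final index values matter here.

```sas
data loopdemo;
  do j = 2 to 4;                 /* body runs for j = 2, 3, 4 */
  end;                           /* j is then incremented to 5, which fails j <= 4 */
  do k = 0 to 1 by 0.3;          /* body runs for k = 0, 0.3, 0.6, 0.9 */
  end;                           /* k then becomes about 1.2; it never equals 1 exactly */
  do m = 2 by 1 until (m eq 4);  /* UNTIL is tested at the bottom, before any increment */
  end;
  put j= k= m=;
run;
```

The log shows j=5 k=1.2 m=4: the TO form overshoots the stop value by one step, while the UNTIL form stops on the value you test for.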

SAS do loop substring index with if statement

I have a binary string like '100111111100001111111111000'
It is stored as a character variable in SAS.
How can I capture every single change either from 1 to 0 or 0 to 1?
My ideal output would be like:
type position
1-0 2
0-1 4
1-0 11
0-1 15
1-0 22
I am stuck on how to write a recursive statement (I need to process about 20,000 strings at once, and each string could be really long). I'm thinking I can have:
zero=index(string,'0'); one=index(string,'1'); if zero>one then
string=substr(string, zero); else if zero
Is this the right direction? How should I put it in a DO loop statement?
Thank you very much
Aaron
Seems reasonable to me. Slightly simplified.
do position = 1 to length(String)-1;
if subpad(string,position,2)='10' then do;
... output a row for the 1-0 change ...
end;
else if subpad(string,position,2)='01' then do;
... output a row for the 0-1 change ...
end;
end;
Replace the placeholders with whatever you want to output (I assume something like setting a variable to '1-0' and then output;).
I use SUBPAD there mostly out of habit; SUBSTR should work just as well as long as you check the string length properly. The only difference is that SUBPAD won't raise an error if it goes past the end of the string.
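A runnable sketch of the loop above, assuming the strings live in a data set with one string per row (HAVE_STRINGS, CHANGES, ID and PAIR are hypothetical names). The implicit DATA step loop handles every row, so 20,000 strings need no extra code. It uses SUBSTR, which is safe here because the loop stops at length(string)-1, and it reports pos as the position of the first character after the change, to match the table in the question.

```sas
data have_strings;
  length id 8 string $200;
  input id string $;
  datalines;
1 100111111100001111111111000
2 0101
;
run;

data changes;
  set have_strings;
  length type $3 pair $2;
  do position = 1 to length(string) - 1;
    pair = substr(string, position, 2);   /* two adjacent characters */
    if pair = '10' then do;
      type = '1-0';
      pos = position + 1;                 /* first character after the change */
      output;
    end;
    else if pair = '01' then do;
      type = '0-1';
      pos = position + 1;
      output;
    end;
  end;
  keep id type pos;
run;
```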
Please try this code to see if it is what you are looking for:
data have (keep=type pos);
retain type pos;
x = '100111111100001111111111000';
ct01 = count(x,'01');
ct10 = count(x,'10');
pos = 1;
do i =1 to ct01;
pos = find(x,'01',pos)+1;
type='0-1';
output;
end;
pos = 1;
do i =1 to ct10;
pos = find(x,'10',pos)+1;
type='1-0';
output;
end;
run;
proc sort data=have;
by pos;
run;

Shortcut code writing with loop

I am trying to create a loop that will shortcut code writing.
I want every variable from x1 to x30 to equal x to the power i, where i is the variable's index (e.g. 1 for x1).
For example, x7 will be x7=x**7;
I wrote some code, but it doesn't work and I don't know how to fix it. I would be glad for your help.
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
output;
end;
length x1-x30 $2001;
do i =1 to 30 by 1;
x i=x**i;
output;
end;
run;
You're close. You need to declare an array. You don't explain what the first half is (the e**i part), so it's not clear what you want here - do you want a few thousand rows with powers of e, and then some rows with x1-x30? And why do you output each time in the second loop? To answer the core question, here:
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
output;
end;
*length x1-x30 $2001; *what is this? Why do you want it 2001 characters, instead of numeric?;
array xs x1-x30; *you would need a $ after this if you truly wanted character;
do i =1 to 30 by 1;
xs[i]=x**i;
*output; *You probably do not want this. Output is probably outside of the loop.;
end;
run;
I would guess what you really want is this:
DATA maarah (drop = i e);
e = constant("e");
do i = -10 to 10 by 0.01;
x=i;
y=e**x;
*length x1-x30 $2001; *what is this? Why do you want it 2001 characters, instead of numeric?;
array xs x1-x30; *you would need a $ after this if you truly wanted character;
do j =1 to 30;
xs[j]=x**j;
end; *the x1-x30 loop;
output;
end; *the outer loop;
run;