SAS maximize a function of variables - sas

Given a set of variable v(1) - v(k), a function f is defined as f(v1,v2,...vk).
The target is to have a set of v(i) that maximize f given v(1)+v(2)+....+v(k)=n. All elements are restricted to non-negative integers.
Note: I don't have SAS/IML or SAS/OR.
If k is known, say 2, then I can do sth like this.
data out;
set in;
maxf = 0;
n1 = 0;
n2 = 0;
do i = 0 to n;
do j = 0 to n;
if i + j ne n then continue;
_max = f(i,j);
if _max > maxf then do;
maxf = max(maxf,_max);
n1 = i;
n2 = j;
end;
end;
end;
drop i j;
run;
However, this solution has several issues.
Using loops seems to be very inefficient.
It doesn't know how may nested loops needed when k is unknown.
It's exactly the "Allocate n balls into k bins" problem where k is determined by # of columns in data in with specific prefix and n is determined by macro variable.
Function f is known, e.g f(i,j) = 2*i+3*j;
Is this possible to be done in data step?

As said in the comments, general non-linear integer programs are hard to solve. The method below will solve for continuous parameters. You will have to take the output and find the nearest integer values that maximize your function. However, the loop will now be much smaller and quicker to run.
First let's make a function. This function has an extra parameter and is linear in that parameter. Wrap your function inside something like this.
proc fcmp outlib=work.fns.fns;
function f(x1,x2,a);
out = -10*(x1-5)*(x1-5) + -2*(x2-2)*(x2-2) + 2*(x1-5) + 3*(x2-2);
return(out+a);
endsub;
run;quit;
options cmplib=work.fns;
We need to add the a parameter so that we can have a value that SAS can pass besides the actual parameters. SAS will think it's solving the likelihood of A, based on x1 and x2.
Generate a Data Set with an A value.
data temp;
a = 1;
run;
Now use PROC NLMIXED to maximize the likelihood of A.
ods output ParameterEstimates=Parameters;
ods select ParameterEstimates;
proc nlmixed data=temp;
parms x1=1 x2=1;
bounds x1>0, x2>0;
y = f(x1,x2,a);
model a ~ general(y);
run;
ods select default;
I get output of x1=5.1 and x2=2.75. You can then search "around" that to see where the maximum comes out.
Here's my attempt at a Data Step to search around the value:
%macro call_fn(fn,n,parr);
%local i;
&fn(&parr[1]
%do i=2 %to &n;
, &parr[&i]
%end;
,0)
%mend;
%let n=2;
%let c=%sysevalf(2**&n);
data max;
set Parameters end=last;
array parms[&n] _temporary_;
array start[&n] _temporary_;
array pmax[&n];
max = -9.99e256;
parms[_n_] = estimate;
if last then do;
do i=1 to &n;
start[i] = floor(parms[i]);
end;
do i=1 to &c;
x = put(i,$binary2.);
do j=1 to &n;
parms[j] = input(substr(x,j,1),best.) + start[j];
end;
/*You need a macro to write this dynamically*/
val = %call_fn(f,&n,parms);
*put i= max= val=;
if val > max then do;
do j=1 to &n;
pmax[j] = parms[j];
end;
max = val;
end;
end;
output;
end;

Related

mean of 10 variables with different starting point (SAS)

I have 18 numerical variables pm25_total2000 to pm25_total2018
Each person have a starting year between 2013 and 2018, we can call that variable "reqyear".
Now I want to calculate mean for each persons 10 years before the starting year.
For example if a person have starting year 2015 I want mean(of pm25_total2006-pm25_total2015)
Or if a person have starting year 2013 I want mean(of pm25_total2004-pm25_total2013)
How to do this?
data _null_;
set scapkon;
reqyear=substr(iCDate,1,4)*1;
call symput('reqy',reqyear);
run;
data scatm;
set scapkon;
/* Medelvärde av 10 år innan rekryteringsår */
pm25means=mean(of pm25_total%eval(&reqy.-9)-pm25_total%eval(&reqy.));
run;
%eval(&reqy.-9) will be constant value (the same value for all as for the first person) , in my case 2007
That doesn't work.
You can compute the mean with a traditional loop.
data want;
set have;
array x x2000-x2018;
call missing(sum, mean, n);
do _n_ = 1 to 10;
v = x ( start - 1999 -_n_ );
if not missing(v) then do;
sum + v;
n + 1;
end;
end;
if n then mean = sum / n;
run;
If you want to flex your SAS skill, you can use POKE and PEEK concepts to copy a fixed length slice (i.e. a fixed number of array elements) of an array to another array and compute the mean of the slice.
Example:
You will need to add sentinel elements and range checks on start to prevent errors when start-10 < 2000.
data have;
length id start x2000-x2018 8;
do id = 1 to 15;
start = 2013 + mod(id,6);
array x x2000-x2018;
do over x;
x = _n_;
_n_+1;
end;
output;
end;
format x: 5.;
run;
data want;
length id start mean10yrPriorStart 8;
set have;
array x x2000-x2018;
array slice(10) _temporary_;
call pokelong (
peekclong ( addrlong ( x(start-1999-10) ) , 10*8 ) ,
addrlong ( slice (1))
);
mean10yrPriorStart = mean(of slice(*));
run;
use an array and loop
index the array with years
accumulate the sum of the values
accumulate the count to account for any missing values
divide to obtain the mean value
data want;
set have;
array _pm(2000:2018) pm25_total2000 - pm25_total2018;
do year=reqyear to (reqyear-9) by -1;
*add totals;
total = sum(total, _pm(year));
*add counts;
nyears = sum(nyears,not missing(_pm(year)));
end;
*accounts for possible missing years;
mean = total/nyears;
run;
Note this loop goes in reverse (start year to 9 years previous) because it's slightly easier to understand this way IMO.
If you have no missing values you can remove the nyears step, but not a bad thing to include anyways.
NOTE: My first answer did not address the OP's question, so this a redux.
For this solution, I used Richard's code for generating test data. However, I added a line to randomly add missing values.
x = _n_;
if ranuni(1) < .1 then x = .;
_n_+1;
This alternative does not perform any checks for missing values. The sum() and n() functions inherently handle missing values appropriately. The loop over the dynamic slice of the data array only transfers the value to a temporary array. The final sum and count is performed on the temp array outside of the loop.
data want;
set have;
array x(2000:2018) x:;
array t(10) _temporary_;
j = 1;
do i = start-9 to start;
t(j) = x(i);
j + 1;
end;
sum = sum(of t(*));
cnt = n(of t(*));
mean = sum / cnt;
drop x: i j;
run;
Result:
id start sum cnt mean
1 2014 72 7 10.285714286
2 2015 305 10 30.5
3 2016 458 9 50.888888889
4 2017 631 9 70.111111111

Macro does not retain in a IF-THEN loop in DATA STEP

data output;
set input;
by id;
if first.id = 1 then do;
call symputx('i', 1); ------ also tried %let i = 1;
a_&i = a;
end;
else do;
call symputx('i', &i + 1); ------ also tried %let i = %sysevalf (&i + 1);
a_&i = a;
end;
run;
Example Data:
ID A
1 2
1 3
2 2
2 4
Want output:
ID A A_1 A_2
1 2 2 .
1 3 . 3
2 2 2 .
2 4 . 4
I know that you can do this using transpose, but i'm just curious why does this way not work. The macro does not retain its value for the next observation.
Thanks!
edit: Since %let is compile time, and call symput is execution time, %let will run only once and call symput will always be 1 step slow.
why does this way not work
The sequence of behavior in SAS executor is
resolve macro expressions
process steps
automatic compile of proc or data step (compile-time)
run the compilation (run-time)
a running data step can not modify its pdv layout (part of the compilation process) while it is running.
call symput() is performed at run-time, so any changes it makes will not and can not be applied to the source code as a_&i = a;
Array based transposition
You will need to determine the maximum number of items in the groups prior to coding the data step. Use array addressing to place the a value in the desired array slot:
* Hand coded transpose requires a scan over the data first;
* determine largest group size;
data _null_;
set have end=lastrecord_flag;
by id;
if first.id
then seq=1;
else seq+1;
retain maxseq 0;
if last.id then maxseq = max(seq,maxseq);
if lastrecord_flag then call symputx('maxseq', maxseq);
run;
* Use maxseq macro variable computed during scan to define array size;
data want (drop=seq);
set have;
by id;
array a_[&maxseq]; %* <--- set array size to max group size;
if first.id
then seq=1;
else seq+1;
a_[seq] = a; * 'triangular' transpose;
run;
Note: Your 'want' is a triangular reshaping of the data. To achieve a row per id reshaping the a_ elements would have to be cleared (call missing()) at first.id and output at last.id.

Random number generation without repetition in SAS

I am trying to obtain random numer generation without repetition. My idea is to make do while loop which will go 5 times. Inside i will get the random numer, store it in the table and check at every iteration if the picked numer is in the table or not and then decide if this random pick is an repetition or not.
Here is my code where I try to perform my idea but something is wrong and i do not know where i made a mistake.
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then
do;
temp(1) = rand;
counter=counter+1;
output;
continue;
end;
do a=1 to counter by 1;
if temp(a) = rand then continue ;
end;
temp(counter) = rand;
output;
counter=counter+1;
if counter = 5 then do;
leave;
end;
end;
run;
Sounds like you want a random permutation.
165 data _null_;
166 seed=12345;
167 array r[5] (1:5);
168 put r[*];
169 call ranperm(seed,of r[*]);
170 put r[*];
171 run;
1 2 3 4 5
5 1 4 3 2
This is a simplified version of what you are trying to do.
data WithoutRepetition;
i=0;
array temp[5];
do r=1 by 1 until(i eq dim(temp));
rand=round(4*ranuni(0)+1,1);
if rand not in temp then do; i+1; temp[i]=rand; end;
end;
drop i rand;
run;
You were reasonably close to a working, if convoluted solution. For educational purposes, although data _null_'s answer is much cleaner, here's why your code wasn't working:
Your leave statement is inside a do-end block which is inside another do-loop. Leave statements only break out of the innermost do-loop, so yours has no effect.
The same is true of your continue statements, the first of which is completely unnecessary.
Because you are updating your array with newly-found unique values before you increment counter, previously populated values are being overwritten. This often results in duplicates of the overwritten values appearing in your output.
I would place continue and leave in the same category as goto - avoid using them if at all possible, as they tend to make code difficult to debug. It's clearer to set the exit conditions for all your loops at the point of entry.
Just for fun, though, here's a fixed version of your original code:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(1);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
dupe = 0;
do a=1 to counter;
if temp(a) = rand then dupe=1;
end;
if dupe then continue;
counter +1;
temp(counter) = rand;
output;
if counter = 5 then leave;
end;
run;
And here is an equivalent version with all the leave and continue statements replaced with more readable alternatives:
data WithoutRepetition;
counter = 0;
array temp (5) _temporary_;
do while(counter < 5);
rand=round(4*ranuni(0) +1,1);
if counter = 0 then do;
temp(1) = rand;
counter +1;
output;
end;
else do;
dupe = 0;
do a=1 to counter while(dupe = 0);
if temp(a) = rand then dupe=1;
end;
if dupe = 0 then do;
counter +1;
temp(counter) = rand;
output;
end;
end;
end;
run;

SAS: Continuously adding observations in a do loop

I am working on a piece of code to take an array of dates and then out put an array of all dates that are within a given buffer of the dates within the original dates vector.
My plan is to use 2 nested do loops to loop through the original array of dates and each time add/subtract the buffer and then use data set to add these two observations to the original set.
I have been using the following code, but I end up in an infinite loop and SAS crashes.
%let buffer = 3;
data dates_with_buffer;
do i = -1*&buffer. to &buffer.;
do j = 1 to 14;
set original_dates point = j;
output_dates = dates + &buffer.;
output;
end;
end;
run;
When you use point= on a set statement, you need to include a stop statement as well to prevent an infinite loop. Try this:
%let buffer = 3;
data dates_with_buffer;
do i = -1*&buffer. to &buffer.;
do j = 1 to 14;
set original_dates point = j;
output_dates = dates + &buffer.;
output;
end;
end;
stop;
run;

Transposing wide to long with an array in SAS

I have a SAS dataset, with the following variables: ID, Var1_0, Var1_3, Var1_6, Var2_0, Var2_3, Var2_6, which can be read like this: Var1_0 is parameter 1 at time 0. For every subjects I have 2 variables and 3 time points. I want to transpose this into a long format using an array, I did this:
data long;
set wide;
array a Var1_0 Var1_3 Var1_6 Var2_0 Var_3 Var_6;
Do i=1 to dim(a);
outcome = a[i];
*Var = ???;
if (mod(i,3)=1) then Time = 0;
else if (mod(i,3)=2) then Time = 3;
else Time = 6;
Output;
end;
keep ID Outcome Time;
run;
The problem is that I don't know how to calculate the parameter variable, i.e., I want to add a variables that is either 1 or 2, depending on which parameter the value is related to. Is there a better way of doing this? Thank you !
Reeza gave you the answer in her comment. Here it is typed out.
data long;
set wide;
array a[*] Var1_0 Var1_3 Var1_6 Var2_0 Var2_3 Var2_6;
do i=1 to dim(a);
outcome = a[i];
var = vname(a[i]);
time = input(scan(var,2,'_'),best.);
/*Other stuff you want to do*/
output;
end;
run;
VNAME(array[sub]) gives you the variable name of the variable referenced by array[sub].
scan(str,i,delim) gives you the ith word in str using the specified delimiter.