Force append in SAS to save missing data - sas

I am trying to save a vector named ez product of some matrix calculations based on a group variable named ID_bloque. For each ID_bloque value my code computes a vector called ez and append it in a matrix with the same name. However, if zk is a missing data vector (which is not a mistake) the code stops and make no other computations, which is a problem for other groups since no calculations are made for them. Is there any way to force append clause to keep this data and those resulting for the other groups? Thank you so much
proc iml;
/*Maximo= maximum ID_bloque value*/
do i=1 to %EVAL(&MAXIMO);
use T6; /*Dataset*/
/*Variables*/
read all var{E1S1 E1S2 E2S1 E2S2 E3S1 E3S2 E4S1 E4S2}
into XK where (ID_bloque=i) ;
read all var{FEX_P} into dk where (ID_bloque=i) ;
read all var{VK} into vk where (ID_bloque=i) ;
read all var{z} into zk where (ID_bloque=i) ;
/* Matrix computations */
MAT=J(NCOL(XK),NCOL(XK),0);
do j=1 to Nrow(XK);
MAT= MAT + dk[j]*vk[j]*((XK[j,]`)*XK[j,]);
end;
/* ez values depending on missing information in zk*/
if all(zk)=. then do;
ez=zk;
end;
else do;
Brz=(Ginv(MAT))*((XK`)*(dk#vk#zk));
ez=zk-XK*Brz;
end;
/* Vectors appending (error source) */
if i=1 then do;
create ez var{ez}; APPEND;
end;else do; APPEND var{ez} ;end;
end;
close ez;
quit;

To check whether a vector is all missing, you should use
if all(zk = .) then...
The best way to handle appending to a data set within a loop is to open the data set BEFORE the loop, append within the loop, and then close after the loop. SAS needs to know what data types and lengths will be written, so I usually use fake data (such as missing values) when I open the data set just to provide the variable types.
Lastly, the MAT computation you are doing is a matrix multiplication, so you can simplify that computation:
proc iml;
/*Maximo= maximum ID_bloque value*/
ez = .; /* tell IML that ez is a numeric vector */
create ez var {ez}; /* open the data set */
do i=1 to %EVAL(&MAXIMO);
use T6; /*Dataset*/
/*Variables*/
read all var{E1S1 E1S2 E2S1 E2S2 E3S1 E3S2 E4S1 E4S2}
into XK where (ID_bloque=i) ;
read all var{FEX_P} into dk where (ID_bloque=i) ;
read all var{VK} into vk where (ID_bloque=i) ;
read all var{z} into zk where (ID_bloque=i) ;
/* Matrix computations */
/*
MAT=J(NCOL(XK),NCOL(XK),0);
do j=1 to Nrow(XK);
MAT= MAT + dk[j]*vk[j]*((XK[j,]`)*XK[j,]);
end;
*/
MAT = (dk#vk#XK)` * XK;
/* ez values depending on missing information in zk*/
if all(zk=.) then do;
ez=repeat(1, Nrow(XK));
end;
else do;
Brz=(Ginv(MAT))*((XK`)*(dk#vk#zk));
ez=zk-XK*Brz;
end;
/* Vectors appending (error source) */
append;
end;
close ez;
quit;

Related

Macro does not retain in a IF-THEN loop in DATA STEP

data output;
set input;
by id;
if first.id = 1 then do;
call symputx('i', 1); ------ also tried %let i = 1;
a_&i = a;
end;
else do;
call symputx('i', &i + 1); ------ also tried %let i = %sysevalf (&i + 1);
a_&i = a;
end;
run;
Example Data:
ID A
1 2
1 3
2 2
2 4
Want output:
ID A A_1 A_2
1 2 2 .
1 3 . 3
2 2 2 .
2 4 . 4
I know that you can do this using transpose, but i'm just curious why does this way not work. The macro does not retain its value for the next observation.
Thanks!
edit: Since %let is compile time, and call symput is execution time, %let will run only once and call symput will always be 1 step slow.
why does this way not work
The sequence of behavior in SAS executor is
resolve macro expressions
process steps
automatic compile of proc or data step (compile-time)
run the compilation (run-time)
a running data step can not modify its pdv layout (part of the compilation process) while it is running.
call symput() is performed at run-time, so any changes it makes will not and can not be applied to the source code as a_&i = a;
Array based transposition
You will need to determine the maximum number of items in the groups prior to coding the data step. Use array addressing to place the a value in the desired array slot:
* Hand coded transpose requires a scan over the data first;
* determine largest group size;
data _null_;
set have end=lastrecord_flag;
by id;
if first.id
then seq=1;
else seq+1;
retain maxseq 0;
if last.id then maxseq = max(seq,maxseq);
if lastrecord_flag then call symputx('maxseq', maxseq);
run;
* Use maxseq macro variable computed during scan to define array size;
data want (drop=seq);
set have;
by id;
array a_[&maxseq]; %* <--- set array size to max group size;
if first.id
then seq=1;
else seq+1;
a_[seq] = a; * 'triangular' transpose;
run;
Note: Your 'want' is a triangular reshaping of the data. To achieve a row per id reshaping the a_ elements would have to be cleared (call missing()) at first.id and output at last.id.

SAS Random array with a nonrecurrent elements

Could you please help me.
I would like to generate a random array from 0 to 5 and I'm using this function
rand_num = int(ranuni(0)*5+1)
But I would like to generate a random array with a nonrecurrent elements.
For example (1,2,3,4,5) (3,1,5,4,2) etc..
How I can do it?
Thank You!
/* draw with repetition */
data a;
array rand(5);
do i = 1 to dim(rand);
rand(i) = int(ranuni(0)*5+1);
end;
keep rand:;
run;
/* draw without repetition */
data a;
array rand(5);
do i = 1 to dim(rand);
do until(rand(i) ^= .);
value = int(ranuni(0)*5+1);
if value not in rand then
rand(i) = value;
end;
end;
keep rand:;
run;
I think call ranperm is the better solution for this, though both seem to have roughly the same statistical properties. Here's a solution using that (very similar to what Keith pointed to in #data_null_'s solution on another question):
data want;
array rand_array[5];
*initialize the array (once);
do _i = 1 to dim(rand_array);
rand_array[_i]=_i;
end;
*seed for the RNG;
seed=5;
*randomize;
*each time `call ranperm` is used, this shuffles the group;
do _i = 1 to 1e5;
call ranperm(seed,of rand_array[*]);
output;
end;
run;

SAS maximize a function of variables

Given a set of variable v(1) - v(k), a function f is defined as f(v1,v2,...vk).
The target is to have a set of v(i) that maximize f given v(1)+v(2)+....+v(k)=n. All elements are restricted to non-negative integers.
Note: I don't have SAS/IML or SAS/OR.
If k is known, say 2, then I can do sth like this.
data out;
set in;
maxf = 0;
n1 = 0;
n2 = 0;
do i = 0 to n;
do j = 0 to n;
if i + j ne n then continue;
_max = f(i,j);
if _max > maxf then do;
maxf = max(maxf,_max);
n1 = i;
n2 = j;
end;
end;
end;
drop i j;
run;
However, this solution has several issues.
Using loops seems to be very inefficient.
It doesn't know how may nested loops needed when k is unknown.
It's exactly the "Allocate n balls into k bins" problem where k is determined by # of columns in data in with specific prefix and n is determined by macro variable.
Function f is known, e.g f(i,j) = 2*i+3*j;
Is this possible to be done in data step?
As said in the comments, general non-linear integer programs are hard to solve. The method below will solve for continuous parameters. You will have to take the output and find the nearest integer values that maximize your function. However, the loop will now be much smaller and quicker to run.
First let's make a function. This function has an extra parameter and is linear in that parameter. Wrap your function inside something like this.
proc fcmp outlib=work.fns.fns;
function f(x1,x2,a);
out = -10*(x1-5)*(x1-5) + -2*(x2-2)*(x2-2) + 2*(x1-5) + 3*(x2-2);
return(out+a);
endsub;
run;quit;
options cmplib=work.fns;
We need to add the a parameter so that we can have a value that SAS can pass besides the actual parameters. SAS will think it's solving the likelihood of A, based on x1 and x2.
Generate a Data Set with an A value.
data temp;
a = 1;
run;
Now use PROC NLMIXED to maximize the likelihood of A.
ods output ParameterEstimates=Parameters;
ods select ParameterEstimates;
proc nlmixed data=temp;
parms x1=1 x2=1;
bounds x1>0, x2>0;
y = f(x1,x2,a);
model a ~ general(y);
run;
ods select default;
I get output of x1=5.1 and x2=2.75. You can then search "around" that to see where the maximum comes out.
Here's my attempt at a Data Step to search around the value:
%macro call_fn(fn,n,parr);
%local i;
&fn(&parr[1]
%do i=2 %to &n;
, &parr[&i]
%end;
,0)
%mend;
%let n=2;
%let c=%sysevalf(2**&n);
data max;
set Parameters end=last;
array parms[&n] _temporary_;
array start[&n] _temporary_;
array pmax[&n];
max = -9.99e256;
parms[_n_] = estimate;
if last then do;
do i=1 to &n;
start[i] = floor(parms[i]);
end;
do i=1 to &c;
x = put(i,$binary2.);
do j=1 to &n;
parms[j] = input(substr(x,j,1),best.) + start[j];
end;
/*You need a macro to write this dynamically*/
val = %call_fn(f,&n,parms);
*put i= max= val=;
if val > max then do;
do j=1 to &n;
pmax[j] = parms[j];
end;
max = val;
end;
end;
output;
end;

SAS: Continuously adding observations in a do loop

I am working on a piece of code to take an array of dates and then out put an array of all dates that are within a given buffer of the dates within the original dates vector.
My plan is to use 2 nested do loops to loop through the original array of dates and each time add/subtract the buffer and then use data set to add these two observations to the original set.
I have been using the following code, but I end up in an infinite loop and SAS crashes.
%let buffer = 3;
data dates_with_buffer;
do i = -1*&buffer. to &buffer.;
do j = 1 to 14;
set original_dates point = j;
output_dates = dates + &buffer.;
output;
end;
end;
run;
When you use point= on a set statement, you need to include a stop statement as well to prevent an infinite loop. Try this:
%let buffer = 3;
data dates_with_buffer;
do i = -1*&buffer. to &buffer.;
do j = 1 to 14;
set original_dates point = j;
output_dates = dates + &buffer.;
output;
end;
end;
stop;
run;

Simulating ARMA/ARIMA time series processes in SAS

I've been trying to find the simplest way to generate simulated time series datasets in SAS. I initially was experimenting with the LAG operator, but this requires input data, so is proabably not the best way to go. (See this question: SAS: Using the lag function without a set statement (to simulate time series data.))
Has anyone developed a macro or dataset that enables time series to be genereated with an arbitrary number of AR and MA terms? What is the best way to do this?
To be specific, I'm looking to generate what SAS calls an ARMA(p,q) process, where p denotes the autoregressive component (lagged values of the dependent variable), and q is the moving average component (lagged values of the error term).
Thanks very much.
I have developed a macro to attempt to answer this question, but I'm not sure whether this is the most efficient way of doing this. Anyway, I thought it might be useful to someone:
%macro TimeSeriesSimulation(numDataPoints=100, model=y=e,outputDataSetName=ts, maxLags=10);
data &outputDataSetName (drop=j);
array lagy(&maxlags) _temporary_;
array lage(&maxlags) _temporary_;
/*Initialise values*/
e = 0;
y=0;
t=1;
do j = 1 to 10;
lagy(j) = 0;
lage(j) = 0;
end;
output;
do t = 2 to &numDataPoints; /*Change this for number of observations*/
/*SPECIFY MODEL HERE*/
e = rannorm(-1); /*Draw from a N(0,1)*/
&model;
/*Update values of lags on the moving average and autoregressive terms*/
do j = &maxlags-1 to 1 by -1; /*Note you have to do this backwards because otherwise you cascade the current value to all past values!*/
lagy(j+1) = lagy(j);
lage(j+1) = lage(j);
end;
lagy(1) = y;
lage(1) = e;
output;
end;
run;
%mend;
/*Example 1: Unit root*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=lagy(1)+e)
/*Example 2: Simple process with AR and MA components*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=0.5*lagy(1)+0.5*lage(1)+e)