Could you please help me.
I would like to generate a random array from 0 to 5 and I'm using this function
rand_num = int(ranuni(0)*5+1)
But I would like to generate a random array with a nonrecurrent elements.
For example (1,2,3,4,5) (3,1,5,4,2) etc..
How I can do it?
Thank You!
/* draw with repetition */
data a;
array rand(5);
do i = 1 to dim(rand);
rand(i) = int(ranuni(0)*5+1);
end;
keep rand:;
run;
/* draw without repetition */
data a;
array rand(5);
do i = 1 to dim(rand);
do until(rand(i) ^= .);
value = int(ranuni(0)*5+1);
if value not in rand then
rand(i) = value;
end;
end;
keep rand:;
run;
I think call ranperm is the better solution for this, though both seem to have roughly the same statistical properties. Here's a solution using that (very similar to what Keith pointed to in #data_null_'s solution on another question):
data want;
array rand_array[5];
*initialize the array (once);
do _i = 1 to dim(rand_array);
rand_array[_i]=_i;
end;
*seed for the RNG;
seed=5;
*randomize;
*each time `call ranperm` is used, this shuffles the group;
do _i = 1 to 1e5;
call ranperm(seed,of rand_array[*]);
output;
end;
run;
Related
I have 18 numerical variables pm25_total2000 to pm25_total2018
Each person have a starting year between 2013 and 2018, we can call that variable "reqyear".
Now I want to calculate mean for each persons 10 years before the starting year.
For example if a person have starting year 2015 I want mean(of pm25_total2006-pm25_total2015)
Or if a person have starting year 2013 I want mean(of pm25_total2004-pm25_total2013)
How to do this?
data _null_;
set scapkon;
reqyear=substr(iCDate,1,4)*1;
call symput('reqy',reqyear);
run;
data scatm;
set scapkon;
/* Medelvärde av 10 år innan rekryteringsår */
pm25means=mean(of pm25_total%eval(&reqy.-9)-pm25_total%eval(&reqy.));
run;
%eval(&reqy.-9) will be constant value (the same value for all as for the first person) , in my case 2007
That doesn't work.
You can compute the mean with a traditional loop.
data want;
set have;
array x x2000-x2018;
call missing(sum, mean, n);
do _n_ = 1 to 10;
v = x ( start - 1999 -_n_ );
if not missing(v) then do;
sum + v;
n + 1;
end;
end;
if n then mean = sum / n;
run;
If you want to flex your SAS skill, you can use POKE and PEEK concepts to copy a fixed length slice (i.e. a fixed number of array elements) of an array to another array and compute the mean of the slice.
Example:
You will need to add sentinel elements and range checks on start to prevent errors when start-10 < 2000.
data have;
length id start x2000-x2018 8;
do id = 1 to 15;
start = 2013 + mod(id,6);
array x x2000-x2018;
do over x;
x = _n_;
_n_+1;
end;
output;
end;
format x: 5.;
run;
data want;
length id start mean10yrPriorStart 8;
set have;
array x x2000-x2018;
array slice(10) _temporary_;
call pokelong (
peekclong ( addrlong ( x(start-1999-10) ) , 10*8 ) ,
addrlong ( slice (1))
);
mean10yrPriorStart = mean(of slice(*));
run;
use an array and loop
index the array with years
accumulate the sum of the values
accumulate the count to account for any missing values
divide to obtain the mean value
data want;
set have;
array _pm(2000:2018) pm25_total2000 - pm25_total2018;
do year=reqyear to (reqyear-9) by -1;
*add totals;
total = sum(total, _pm(year));
*add counts;
nyears = sum(nyears,not missing(_pm(year)));
end;
*accounts for possible missing years;
mean = total/nyears;
run;
Note this loop goes in reverse (start year to 9 years previous) because it's slightly easier to understand this way IMO.
If you have no missing values you can remove the nyears step, but not a bad thing to include anyways.
NOTE: My first answer did not address the OP's question, so this a redux.
For this solution, I used Richard's code for generating test data. However, I added a line to randomly add missing values.
x = _n_;
if ranuni(1) < .1 then x = .;
_n_+1;
This alternative does not perform any checks for missing values. The sum() and n() functions inherently handle missing values appropriately. The loop over the dynamic slice of the data array only transfers the value to a temporary array. The final sum and count is performed on the temp array outside of the loop.
data want;
set have;
array x(2000:2018) x:;
array t(10) _temporary_;
j = 1;
do i = start-9 to start;
t(j) = x(i);
j + 1;
end;
sum = sum(of t(*));
cnt = n(of t(*));
mean = sum / cnt;
drop x: i j;
run;
Result:
id start sum cnt mean
1 2014 72 7 10.285714286
2 2015 305 10 30.5
3 2016 458 9 50.888888889
4 2017 631 9 70.111111111
I have a dataset that consists of variables named month0-month120 and for each record I am trying to check if these variables equal a particular value. I am having a bit of trouble trying to do this dynamically rather than writing 120 lines of code. How would be the proper way to accomplish this? I am also having trouble formulating how to word the question which is also hindering me when searching online.
Edit: So basically I have this time series of values from the last 5 years represented in month0-120. I am trying to see how many '.' values are present within this array for each record. An example of input is as such
data testing;
set blah;
len = 0;
do i = 0 to 120;
if month[i] = . then len+1;
end;
run;
To count the number of missing values use NMISS().
data testing;
set blah;
len = nmiss(of month0-month120);
run;
Note CMISS() will also work since CMISS() works with both numeric and character variables.
For more general solution for referencing a set of variables use an ARRAY.
data testing;
set blah;
array months month0-month120;
do index=1 to dim(months);
* do something with MONTHS[index] ;
end;
run;
For the code you posted, you would need to explicitly declare your array and it's easier if you specify the index from 0 to 120. Otherwise, SAS would index it from 1 to 121 essentially.
data testing;
set blah;
array months(0:120) month0-month120;
len = 0;
do i = 0 to 120;
if month[i] = . then len+1;
end;
run;
Given a set of variable v(1) - v(k), a function f is defined as f(v1,v2,...vk).
The target is to have a set of v(i) that maximize f given v(1)+v(2)+....+v(k)=n. All elements are restricted to non-negative integers.
Note: I don't have SAS/IML or SAS/OR.
If k is known, say 2, then I can do sth like this.
data out;
set in;
maxf = 0;
n1 = 0;
n2 = 0;
do i = 0 to n;
do j = 0 to n;
if i + j ne n then continue;
_max = f(i,j);
if _max > maxf then do;
maxf = max(maxf,_max);
n1 = i;
n2 = j;
end;
end;
end;
drop i j;
run;
However, this solution has several issues.
Using loops seems to be very inefficient.
It doesn't know how may nested loops needed when k is unknown.
It's exactly the "Allocate n balls into k bins" problem where k is determined by # of columns in data in with specific prefix and n is determined by macro variable.
Function f is known, e.g f(i,j) = 2*i+3*j;
Is this possible to be done in data step?
As said in the comments, general non-linear integer programs are hard to solve. The method below will solve for continuous parameters. You will have to take the output and find the nearest integer values that maximize your function. However, the loop will now be much smaller and quicker to run.
First let's make a function. This function has an extra parameter and is linear in that parameter. Wrap your function inside something like this.
proc fcmp outlib=work.fns.fns;
function f(x1,x2,a);
out = -10*(x1-5)*(x1-5) + -2*(x2-2)*(x2-2) + 2*(x1-5) + 3*(x2-2);
return(out+a);
endsub;
run;quit;
options cmplib=work.fns;
We need to add the a parameter so that we can have a value that SAS can pass besides the actual parameters. SAS will think it's solving the likelihood of A, based on x1 and x2.
Generate a Data Set with an A value.
data temp;
a = 1;
run;
Now use PROC NLMIXED to maximize the likelihood of A.
ods output ParameterEstimates=Parameters;
ods select ParameterEstimates;
proc nlmixed data=temp;
parms x1=1 x2=1;
bounds x1>0, x2>0;
y = f(x1,x2,a);
model a ~ general(y);
run;
ods select default;
I get output of x1=5.1 and x2=2.75. You can then search "around" that to see where the maximum comes out.
Here's my attempt at a Data Step to search around the value:
%macro call_fn(fn,n,parr);
%local i;
&fn(&parr[1]
%do i=2 %to &n;
, &parr[&i]
%end;
,0)
%mend;
%let n=2;
%let c=%sysevalf(2**&n);
data max;
set Parameters end=last;
array parms[&n] _temporary_;
array start[&n] _temporary_;
array pmax[&n];
max = -9.99e256;
parms[_n_] = estimate;
if last then do;
do i=1 to &n;
start[i] = floor(parms[i]);
end;
do i=1 to &c;
x = put(i,$binary2.);
do j=1 to &n;
parms[j] = input(substr(x,j,1),best.) + start[j];
end;
/*You need a macro to write this dynamically*/
val = %call_fn(f,&n,parms);
*put i= max= val=;
if val > max then do;
do j=1 to &n;
pmax[j] = parms[j];
end;
max = val;
end;
end;
output;
end;
I have a SAS dataset, with the following variables: ID, Var1_0, Var1_3, Var1_6, Var2_0, Var2_3, Var2_6, which can be read like this: Var1_0 is parameter 1 at time 0. For every subjects I have 2 variables and 3 time points. I want to transpose this into a long format using an array, I did this:
data long;
set wide;
array a Var1_0 Var1_3 Var1_6 Var2_0 Var_3 Var_6;
Do i=1 to dim(a);
outcome = a[i];
*Var = ???;
if (mod(i,3)=1) then Time = 0;
else if (mod(i,3)=2) then Time = 3;
else Time = 6;
Output;
end;
keep ID Outcome Time;
run;
The problem is that I don't know how to calculate the parameter variable, i.e., I want to add a variables that is either 1 or 2, depending on which parameter the value is related to. Is there a better way of doing this? Thank you !
Reeza gave you the answer in her comment. Here it is typed out.
data long;
set wide;
array a[*] Var1_0 Var1_3 Var1_6 Var2_0 Var2_3 Var2_6;
do i=1 to dim(a);
outcome = a[i];
var = vname(a[i]);
time = input(scan(var,2,'_'),best.);
/*Other stuff you want to do*/
output;
end;
run;
VNAME(array[sub]) gives you the variable name of the variable referenced by array[sub].
scan(str,i,delim) gives you the ith word in str using the specified delimiter.
I've been trying to find the simplest way to generate simulated time series datasets in SAS. I initially was experimenting with the LAG operator, but this requires input data, so is proabably not the best way to go. (See this question: SAS: Using the lag function without a set statement (to simulate time series data.))
Has anyone developed a macro or dataset that enables time series to be genereated with an arbitrary number of AR and MA terms? What is the best way to do this?
To be specific, I'm looking to generate what SAS calls an ARMA(p,q) process, where p denotes the autoregressive component (lagged values of the dependent variable), and q is the moving average component (lagged values of the error term).
Thanks very much.
I have developed a macro to attempt to answer this question, but I'm not sure whether this is the most efficient way of doing this. Anyway, I thought it might be useful to someone:
%macro TimeSeriesSimulation(numDataPoints=100, model=y=e,outputDataSetName=ts, maxLags=10);
data &outputDataSetName (drop=j);
array lagy(&maxlags) _temporary_;
array lage(&maxlags) _temporary_;
/*Initialise values*/
e = 0;
y=0;
t=1;
do j = 1 to 10;
lagy(j) = 0;
lage(j) = 0;
end;
output;
do t = 2 to &numDataPoints; /*Change this for number of observations*/
/*SPECIFY MODEL HERE*/
e = rannorm(-1); /*Draw from a N(0,1)*/
&model;
/*Update values of lags on the moving average and autoregressive terms*/
do j = &maxlags-1 to 1 by -1; /*Note you have to do this backwards because otherwise you cascade the current value to all past values!*/
lagy(j+1) = lagy(j);
lage(j+1) = lage(j);
end;
lagy(1) = y;
lage(1) = e;
output;
end;
run;
%mend;
/*Example 1: Unit root*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=lagy(1)+e)
/*Example 2: Simple process with AR and MA components*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=0.5*lagy(1)+0.5*lage(1)+e)