Can someone please help with the scenario below? I am very new to SAS and am not sure how to get this to work.
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code, but I am not sure it is completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
Have a look in the SAS help at the topics "rannor", "ranuni", "generating random numbers", and so on.
rannor: generates standard normally distributed random variables.
ranuni: generates uniformly distributed random variables.
The argument of rannor is the seed number, not the expected value.
If N(x, y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance?), then the code could look like this. Note the changed order of the statements: the definition of Y has to come after the definitions of the random numbers.
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
/* shift and scale standard normals to get the desired mean and SD */
X1 = 1 + 4*Rannor(seed);   /* N(1, SD=4); use 2*rannor(seed) if the 4 is a variance */
X2 = 3 + rannor(seed);     /* N(3, SD=1) */
Noise = rannor(seed);      /* N(0, 1) */
Y = alpha + beta1*X1 + beta2*X2 + Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor", and there are different ways of handling seed numbers in SAS. See the SAS help for these topics.
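If your SAS release has CALL STREAMINIT and the RAND function, the same simulation can also be written with them, since RAND takes the mean and standard deviation directly; a minimal sketch (still assuming the 4 in N(1, 4) is a standard deviation):
DATA CALC2;
call streaminit(1234);            /* seed for the RAND stream */
alpha = 1; beta1 = 2; beta2 = -1.5;
DO i = 1 to 200;
X1 = rand('Normal', 1, 4);        /* mean 1, SD 4 (use 2 if the 4 is a variance) */
X2 = rand('Normal', 3, 1);        /* mean 3, SD 1 */
Noise = rand('Normal', 0, 1);     /* standard normal error */
Y = alpha + beta1*X1 + beta2*X2 + Noise;
OUTPUT;
END;
drop i;
RUN;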
I have a data set named input_data, shown below, imported from Excel.
0.353481635 0.704898683 0.078640917 0.813815803 0.510842666 0.240912872 0.986312218 0.781868961 0.682272971
0.443441526 0.653187181 0.753981865 0.34909803 0.84215961 0.793863082 0.047816942 0.176759112 0.54213244
0.21443281 0.142501578 0.927011587 0.407251043 0.290280445 0.90730524 0.677030212 0.770541244 0.915728969
0.583493041 0.685127614 0.119042255 0.067769934 0.795793907 0.405029459 0.817724346 0.594170688 0.345660875
0.816193304 0.636823417 0.036348358 0.027985453 0.117027493 0.436516667 0.593191955 0.916981676 0.574223091
0.766842249 0.743249552 0.400052263 0.809650253 0.683610082 0.42152573 0.050520292 0.329441952 0.868549022
0.112847881 0.462579082 0.526220066 0.320851313 0.944585551 0.233027402 0.66141107 0.8380858 0.120044416
0.873949265 0.118525986 0.590234323 0.481974796 0.668976582 0.466558592 0.934633956 0.643438048 0.053508922
And I have another data set, called p, created below:
data p;
input p;
datalines;
0.12
0.23
0.11
0.49
0.52
0.78
0.8
0.03
0.02
run;
proc transpose data = p out=p2;
run;
What I want to do is matrix manipulation in SAS using PROC IML.
I have some code already, but the final calculation gives an error. Can someone give me a hand?
proc iml;
use input_data;
read all var _num_ into x;
print x;
proc iml;
use p2;
read all var _num_ into k;
print k;
proc iml;
Value1 = k * x;
print Value1;
quit;
You have several problems here.
First off, you have three PROC IML statements. PROC IML only persists values while it's running; once it quits, all of the vectors go away forever. So remove the PROC IMLs.
Second, you need to make sure your matrices are correctly ordered and structured. Matrix multiplication works by the following:
m x n * n x p = m x p
Where both n's must be the same. Dimensions are rows x columns, so the left-side matrix must have the same number of columns as the right-side matrix has rows. (Each element of a row of the left-side matrix is multiplied by the corresponding element of a column of the right-side matrix and then summed, so if those counts don't match the product is not defined.)
So you have 8x9 and 9x1, which you transpose to 1x9. First off, don't transpose p; leave it 9x1. Then, make sure you have the order right (matrix multiplication is NOT commutative; the order matters). k * x means 9x1 * 8x9, which doesn't work (since 1 and 8 aren't the same - remember, the inner two numbers have to match). x * k does work, since that is 8x9 * 9x1 and the two 9s match.
Final output:
proc iml;
use input_data;
read all var _num_ into x;
print x;
use p;
read all var _num_ into k;
print k;
Value1 = x * k;
print Value1;
quit;
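If you're ever unsure of the dimensions at run time, a quick optional check inside IML before multiplying can save some head-scratching; NROW and NCOL are standard IML functions:
proc iml;
use input_data;  read all var _num_ into x;  close input_data;
use p;           read all var _num_ into k;  close p;
print (nrow(x)) (ncol(x)) (nrow(k)) (ncol(k));  /* expect 8 9 9 1 */
Value1 = x * k;                                 /* 8x9 * 9x1 -> 8x1 */
print Value1;
quit;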
For my SAS project I have to generate pairs of (X,Y) with a distribution Y ~ N(3 + X + .5X^2, sd = 2). I have looked at all of the SAS documentation for normal() and I see absolutely no way to do this. I have tried many different methods and am very frustrated.
I believe this is an example of what the asker wants to do:
data sample;
do i = 1 to 1000;
x = ranuni(1);
y = rand('normal', 3 + x + 0.5*x**2, 2);
output;
end;
run;
proc summary data = sample;
var x y;
output out = xy_summary;
run;
Joe is already more or less there - I think the only key point that needed addressing was making the mean of each y depend on the corresponding x, rather than using a single fixed mean for all the pairs. So rather than 1000 samples from the same Normal distribution, the above generates 1 sample from each of 1000 different Normal distributions.
I've used a uniform [0,1] distribution for x, but you could use any distribution you like.
You generate random numbers in SAS using the RAND function. It has all sorts of distributions available; read the documentation to understand them fully.
I'm not sure whether you can use your PDF directly, but if you can express it through a regular normal distribution, you can do that. Beyond that, most of the univariate distributions SAS supports start from a Uniform draw and then apply the appropriate (discrete or continuous) transformation, so that might be the right way to go. That's heading into stat-land, which is somewhere I'm averse to going. As far as I know, there isn't a direct way to simply pass a function of X, however.
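To make that "start from a Uniform draw" idea concrete, here is a minimal sketch of the inverse-CDF approach using the QUANTILE function (purely illustrative; RAND('Normal', mean, sd) does the same thing more directly):
data inverse_cdf;
call streaminit(42);
do i = 1 to 1000;
x = rand('Uniform');
/* draw U(0,1) and push it through the inverse normal CDF
   with mean 3 + x + 0.5*x**2 and SD 2 */
y = quantile('Normal', rand('Uniform'), 3 + x + 0.5*x**2, 2);
output;
end;
drop i;
run;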
To generate [numsamp] normals with mean M and standard deviation SD:
%let m=0;
%let sd=2;
%let numsamp=100;
data want;
call streaminit(7);
do id = 1 to &numsamp;
y = rand('Normal',&m.,&sd.);
output;
end;
run;
So if I understand what you want right, this might work:
%let m=0;
%let sd=2;
%let numsamp=1000;
data want;
call streaminit(7);
do id = 1 to &numsamp;
x = rand('Normal',&m.,&sd.);
y = 0.5*x**2 + x + 3;
output;
end;
run;
proc means data=want;
var x y;
run;
X has mean 0.5 with SD 1.96 (roughly what you ask for). Y has mean 5 with SD 3.5. If you're asking for Y to have an SD of 2, I'm not sure how to do that.
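For the record, one way to give Y a standard deviation of 2 around the quadratic curve (consistent with Joe's answer above) is to add the noise term explicitly rather than leaving Y deterministic; a minimal sketch reusing the macro variables defined above:
data want2;
call streaminit(7);
do id = 1 to &numsamp;
x = rand('Normal',&m.,&sd.);
/* independent N(0,2) error, so Y has conditional SD 2 around the curve */
y = 0.5*x**2 + x + 3 + rand('Normal', 0, 2);
output;
end;
run;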
I have a dataset mydat with the following variables:
MNES IV
0.84 0.40
0.89 0.34
0.91 0.31
0.93 0.29
0.95 0.26
0.98 0.23
0.99 0.22
1.00 0.22
1.02 0.20
1.04 0.18
1.07 0.18
And I need to fit cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).
I have successfully accomplished what I need through PROC IML but I am afraid this is not the most efficient solution.
Specifically, my intended output dataset is:
mnes iv
0.333 0.40
0.332 0.40 <- for MNES below the sample MNES range, copy the first IV;
0.336 0.40
... ...
0.834 0.40
0.837 0.40
0.840 0.40
0.842 INTERPOLATION
0.845 INTERPOLATION
0.848 INTERPOLATION
...
1.066 INTERPOLATION
1.069 INTERPOLATION
1.072 INTERPOLATION
1.074 0.18
1.077 0.18 <- for MNES above the sample MNES range, copy the last IV;
1.080 0.18
... ...
3.000 0.18
The necessary specifics are the following:
I always have 1001 points for MNES, ranging from 0.(3) to 3 (thus, each step is (3-1/3)/1000).
The interpolation for IV should only be used for the points between the minimum and maximum MNES.
For the points where MNES is greater than the maximum MNES in the sample, IV should be equal to the IV of the maximum MNES and likewise for the minimum MNES (it is always sorted by MNES).
My worry for efficiency is due to the fact that I have to solve this problem roughly 2 million times and right now it (the code below, using PROC IML) takes roughly 5 hours for 100k different input datasets.
My question is: What alternatives do I have if I wish to fit cubic splines given an input data set such as the one above and output it to a specific grid of objects?
And what solution would be the most efficient?
With PROC IML I can do exactly this with the splinev function, but I am concerned that using PROC IML is not the most efficient way;
With PROC EXPAND, given that this is not a time series, it does not seem adequate. Additionally, I do not know how to specify the grid of objects which I need through PROC EXPAND;
With PROC TRANSREG, I do not understand how to input a dataset into the knots and I do not understand whether it will output a dataset with the corresponding interpolation;
With the MSPLINT function, it seems doable but I do not know how to input a data set to its arguments.
I have attached the code I am using below for this purpose and an explanation of what I am doing. Reading what is below is not necessary for answering the question but it could be useful for someone solving this sort of problem with PROC IML or wanting a better understanding of what I am saying.
I am replicating a methodology (Buss and Vilkov (2012)) which, among other things, applies cubic splines to these elements, where MNES is the object (X) and IV is the image (Y).
The following code is heavily based on the Model Free Implied Volatility (MFIV) MATLAB code by Vilkov for Buss and Vilkov (2012), available on his website.
The interpolation is a means to calculate a figure for stock return volatility under the risk-neutral measure, by computing OTM put and call prices. I am using this for the purpose of my master thesis. Additionally, since my version of PROC IML does not have functions for Black-Scholes option pricing, I defined my own.
proc iml;
* Define BlackScholes call and put function;
* Built-in not available in SAS/IML 9.3;
* Reference http://www.lexjansen.com/wuss/1999/WUSS99039.pdf ;
start blackcall(x,t,s,r,v,d);
d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
d2 = d1 - v # sqrt(t);
bcall = s # exp(-d*t) # probnorm(d1) - x # exp(-r*t) # probnorm(d2);
return (bcall);
finish blackcall;
start blackput(x,t,s,r,v,d);
d1 = (log(s/x) + ((r-d) + 0.5#(v##2)) # t) / (v # sqrt(t));
d2 = d1 - v # sqrt(t);
bput = -s # exp(-d*t) # probnorm(-d1) + x # exp(-r*t) # probnorm(-d2);
return (bput);
finish blackput;
store module=(blackcall blackput);
quit;
proc iml;
* Specify necessary input parameters;
currdate = "&currdate"d;
currpermno = &currpermno;
currsecid = &currsecid;
rate = &currrate / 100;
mat = &currdays / 365;
* Use inputed dataset and convert to matrix;
use optday;
read all var{mnes impl_volatility};
mydata = mnes || impl_volatility;
* Load BlackScholes call and Put function;
load module=(blackcall blackput);
* Define parameters;
k = 2;
m = 500;
* Define auxiliary variables according to Buss and Vilkov;
u = (1+k)##(1/m);
a = 2 * (u-1);
* Define moneyness (ki) and implied volatility (vi) grids;
mi = (-m:m);
mi = mi`;
ki = u##mi;
* Preallocation of vi with 2*m+1 ones (1001 in the base case);
vi = J(2*m+1,1,1);
* Define IV below minimum MNESS equal to the IV of the minimum MNESS;
h = loc(ki<=mydata[1,1]);
vi[h,1] = mydata[1,2];
* Define IV above maximum MNESS equal to the IV of the maximum MNESS;
h = loc(ki>=mydata[nrow(mydata),1]);
vi[h,1] = mydata[nrow(mydata),2];
* Define MNES grid where there are IV from data;
* (equal to where ki still has ones resulting from the preallocation);
grid = ki[loc(vi=1),];
* Call splinec to interpolate based on available data and obtain coefficients;
* Use coefficients to create spline on grid and save on smoothFit;
* Save smoothFit in correct vi elements;
call splinec(fitted,coeff,endSlopes,mydata);
smoothFit = splinev(coeff,grid);
vi[loc(vi=1),1] = smoothFit[,2];
* Define elements of mi corresponding to OTM calls (MNES >=1) and OTM puts (MNES <1);
ic = mi[loc(ki>=1)];
ip = mi[loc(ki<1)];
* Calculate call and put prices based on call and put module;
calls = blackcall(ki[loc(ki>=1),1],mat,1,rate,vi[loc(ki>=1),1],0);
puts = blackput(ki[loc(ki<1),1],mat,1,rate,vi[loc(ki<1),1],0);
* Complete volatility calculation based on Buss and Vilkov;
b1 = sum((1-(log(1+k)/m)#ic)#calls/u##ic);
b2 = sum((1-(log(1+k)/m)#ip)#puts/u##ip);
stddev = sqrt(a*(b1+b2)/mat);
* Append to voldata dataset;
edit voldata;
append var{currdate currsecid currpermno mat stddev};
close voldata;
quit;
OK, I'm going to do this for 2 data sets to address the fact that you have a bunch of them. You will have to modify this for your inputs, but it should give you better performance.
1. Create some sample inputs.
2. Get the first and last values from each input data set.
3. Create a list of all MNES values.
4. Merge each input to the MNES list and set the upper and lower values.
5. Append the inputs together.
6. Run PROC EXPAND with a BY statement to process all the inputs in a single pass and create the splines.
The trick is to "trick" EXPAND into thinking MNES is a daily time series. I do this by making it an integer -- date values are integers behind the scenes in SAS, and with no gaps the ETS procedures will assume a "daily" frequency.
After this is done, run a DATA step to call the Black-Scholes functions (BLKSHCLPRC, BLKSHPTPRC) and complete your analysis (a sketch of that step follows the code below).
/*Sample Data*/
data input1;
input MNES IV;
/*Make MNES an integer*/
MNES = MNES * 1000;
datalines;
0.84 0.40
0.89 0.34
0.91 0.31
0.93 0.29
0.95 0.26
0.98 0.23
0.99 0.22
1.00 0.22
1.02 0.20
1.04 0.18
1.07 0.18
;
run;
data input2;
input MNES IV;
MNES = MNES * 1000;
datalines;
0.80 0.40
0.9 0.34
0.91 0.31
0.93 0.29
0.95 0.26
0.98 0.23
1.02 0.19
1.04 0.18
1.07 0.16
;
run;
/*Get the first and last values from the input data*/
data _null_;
set input1 end=last;
if _n_ = 1 then do;
call symput("first1",mnes);
call symput("first1_v",iv);
end;
if last then do;
call symput("last1",mnes);
call symput("last1_v",iv);
end;
run;
data _null_;
set input2 end=last;
if _n_ = 1 then do;
call symput("first2",mnes);
call symput("first2_v",iv);
end;
if last then do;
call symput("last2",mnes);
call symput("last2_v",iv);
end;
run;
/*A list of the MNES values*/
data points;
do mnes=333 to 3000;
output;
end;
run;
/*Join Inputs to the values and set the lower and upper values*/
data input1;
merge points input1;
by mnes;
if mnes < &first1 then
iv = &first1_v;
if mnes > &last1 then
iv = &last1_v;
run;
data input2;
merge points input2;
by mnes;
if mnes < &first2 then
iv = &first2_v;
if mnes > &last2 then
iv = &last2_v;
run;
/*Append the data sets together, keep a value
so you can tell them apart*/
data toSpline;
set input1(in=ds1)
input2(in=ds2);
if ds1 then
Set=1;
else if ds2 then
Set=2;
run;
/*PROC Expand for the spline. The integer values
for MNES makes it think these are "daily" data*/
proc expand data=toSpline out=outSpline method=spline;
by set;
id mnes;
run;
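As noted above, once outSpline exists you can price the options in a follow-up DATA step with the built-in Black-Scholes functions. A minimal sketch; it assumes macro variables &mat and &rate hold the maturity in years and the risk-free rate (mirroring the IML code) and remembers that MNES was scaled by 1000:
/*Follow-up step: price OTM calls and puts from the splined IVs*/
data prices;
set outSpline;
moneyness = mnes / 1000;  /*undo the integer scaling*/
if moneyness >= 1 then
otm_price = blkshclprc(moneyness, &mat, 1, &rate, iv);  /*OTM call, spot normalised to 1*/
else
otm_price = blkshptprc(moneyness, &mat, 1, &rate, iv);  /*OTM put*/
run;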
Here is the solution I came up with. Sadly, I cannot yet conclude whether this is more efficient than the PROC IML solution - for a single data set they both take pretty much the same running time:
MSPLINT:
real time: 1.42 seconds
cpu time: 0.23 seconds
PROC IML:
real time: 1.02 seconds
cpu time: 0.26 seconds
The biggest disadvantage of this solution compared to the one above by @DomPazz is that I cannot process the data with BY groups, which would certainly make it a lot faster... I am still thinking about whether I can solve this without resorting to a macro loop, but I am all out of ideas.
I keep the idea of defining macro variables with the first and last values, as proposed by @DomPazz, but I then use a DATA step that either copies the first or last value or applies the interpolation, depending on which value of MNES it is stepping through. It applies the interpolation through the MSPLINT function, whose syntax is as follows:
MSPLINT(X, n, X1 <, X2, ..., Xn>, Y1 <,Y2, ..., Yn> <, D1, Dn>)
Where X is the object at which you wish to evaluate the spline, n is the number of knots supplied to the function (i.e. the number of observations in the input data), X1,...,Xn are the objects in the input data (i.e. MNES) and Y1,...,Yn are the images in the input data (i.e. IV). D1 and Dn (optional) are the derivatives to use beyond the knot range, i.e. for X < X1 and X > Xn.
An interesting note: by specifying D1 and Dn as 0 you can make the points beyond the knot range equal to the last observation inside the interpolated area. However, this forces the spline to converge to a derivative of zero near the endpoints, potentially generating an unnatural pattern in the data. I opted not to set these to zero and instead defined the points outside the interpolation range separately.
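As a tiny standalone illustration of the argument layout (the three knots and their images below are made up), MSPLINT can be called directly in a DATA step:
data _null_;
/* three knots at x = 1, 2, 3 with images y = 10, 20, 15 */
y_in  = msplint(1.5, 3, 1, 2, 3, 10, 20, 15);         /* interpolate inside the knot range */
y_out = msplint(5,   3, 1, 2, 3, 10, 20, 15, 0, 0);   /* D1 = Dn = 0: flat beyond the knots */
put y_in= y_out=;
run;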
So, I use PROC SQL to define the lists of elements of MNES and IV in macro variables, divided by commas, so that I can input them in the MSPLINT function. I also define the number of observations through PROC SQL.
MNES, as I commented in the answer above, was not well defined in my explanation. It needs to be defined as the variable u to the power of elements from -500 to 500. This is just a detail but it will allow you to understand where MNES comes from in the example below.
So, here is the solution, including example data.
* Set model inputs;
%let m = 500;
%let k = 2;
%let u = (1+&k) ** (1/&m);
/*Sample Data*/
data input1;
input MNES 13.10 IV 8.6;
cards;
0.8444984010 0.400535
0.8901469633 0.347988
0.9129712444 0.318596
0.9357955255 0.291456
0.9586198066 0.264852
0.9814440877 0.236231
0.9928562283 0.224858
1.0042683688 0.220035
1.0270926499 0.201118
1.0499169310 0.189373
1.0727412121 0.185628
;
run;
data _null_;
set input1 end=last;
if _n_ = 1 then do;
call symput("first1",MNES);
call symput("first1_v",IV);
end;
if last then do;
call symput("last1",MNES);
call symput("last1_v",IV);
end;
run;
proc sql noprint;
select MNES into:mneslist
separated by ','
from input1;
select IV into:IVlist
separated by ','
from input1;
select count(*) into:countlist
from input1;
quit;
data splined;
do grid=-500 to 500;
mnes = (&u) ** grid;
if mnes < &first1 then IV = &first1_v;
if mnes > &last1 then IV = &last1_v;
if mnes >= &first1 and mnes <= &last1 then IV = msplint(mnes, &countlist, &mneslist, &IVlist);
output; /* needed so that each of the 1001 grid points is written out */
end;
run;
PRODUCT CODE Quantity
A 1 100
A 2 150
A 3 50
total product A 300
B 1 10
B 2 15
B 3 5
total product B 30
I made a PROC REPORT and the break after product gives me the total quantity for each product. How can I compute an extra column on the right that shows each row's quantity as a percentage of the product subtotal?
SAS has a good example of this in its documentation. I reproduce a portion of it with some additional comments below; see the documentation example for the initial data sets and formats (or create basic ones yourself).
proc report data=test nowd split="~" style(header)=[vjust=b];
format question $myques. answer myyn.;
column question answer,(n pct cum) all;
/* Since n/pct/cum are nested under answer, they are columns 2,3,4 and 5,6,7 */
/* and must be referred to as _c2_ _c3_ etc. rather than by name */
/* in the OP example this may not be the case, if you have no across nesting */
define question / group "Question";
define answer / across "Answer";
define pct / computed "Column~Percent" f=percent8.2;
define cum / computed "Cumulative~Column~Percent" f=percent8.2;
define all / computed "Total number~of answers";
/* Sum total number of ANSWER=0 and ANSWER=1 */
/* Here, _c2_ refers to the 2nd column; den0 and den1 store the sums for those. */
/* compute before would be compute before <variable> if there is a variable to group by */
compute before;
den0 = _c2_;
den1 = _c5_;
endcomp;
/* Calculate percentage */
/* Here you divide the value by its denominator from before */
compute pct;
_c3_ = _c2_ / den0;
_c6_ = _c5_ / den1;
endcomp;
/* This produces a summary total */
compute all;
all = _c2_ + _c5_;
/* Calculate cumulative percent */
temp0 + _c3_;
_c4_ = temp0;
temp1 + _c6_;
_c7_ = temp1;
endcomp;
run;
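Closer to the original PRODUCT/CODE/Quantity layout, here is a minimal sketch (assuming an input data set named have with variables product, code, and quantity): it captures the product subtotal in a COMPUTE BEFORE block and divides each detail row by it.
proc report data=have nowd;
column product code quantity pct;
define product  / group;
define code     / group;
define quantity / analysis sum "Quantity";
define pct      / computed format=percent8.1 "Percent of product total";
/* at the start of each product group, quantity.sum holds the product subtotal */
compute before product;
den = quantity.sum;
endcomp;
/* on each detail row, divide the row quantity by the stored subtotal */
compute pct;
if den > 0 then pct = quantity.sum / den;
endcomp;
break after product / summarize;   /* subtotal row; its pct shows 100% */
run;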