I want to calculate the daily implied volatility for a data set of option chains. I have all the necessary data in a data set with these columns:
OptionID opt_price strike today exp eq_price intrate
The SAS code for the IV is:
options pageno=1 nodate ls=80 ps=64;
proc fcmp;
opt_price=5;
strike=50;
today='20jul2010'd;
exp='21oct2010'd;
eq_price=50;
intrate=.05;
time=exp - today;
array opts[5] initial abconv relconv maxiter status
(.5 .001 1.0e-6 100 -1);
function blksch(strike, time, eq_price, intrate, volty);
return(blkshclprc(strike, time/365.25,
eq_price, intrate, volty));
endsub;
bsvolty=solve("blksch", opts, opt_price, strike,
time, eq_price, intrate, .);
put 'Option Implied Volatility:' bsvolty
'Initial value: ' opts[1]
'Solve status: ' opts[5];
run;
Source: https://documentation.sas.com/?docsetId=proc&docsetTarget=p1xoknqns865t7n1wehj6xarwhdb.htm&docsetVersion=9.4&locale=en#p0ymk0vrf7cecfn1kec073rxqm7z
Now, this function somehow does not need sigma. Why?
Second, how can I feed in and output a dataset with option series for a few years?
I tried processing by OptionID, but I don't know how to feed in the data correctly and then store the result in a data set (as a new variable called bsvolty).
Use the FCMP options DATA= and OUT= to provide inputs and capture outputs.
As for the missing value (.) in the sigma argument position, the SOLVE documentation states:
The SOLVE function finds the value of the specified argument that makes the expression of the following form equal to zero.
expected-value - function-name(argument-1, argument-2, ..., argument-n)
You specify the argument of interest with a missing value (.), which appears in place of the argument in the parameter list that is shown above. If the SOLVE function finds the value, then the value that is returned for this function is the implied value.
So, SOLVE() is solving for the blkshclprc sigma argument (i.e. the volatility).
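To see the missing-value placeholder in isolation, here is a minimal sketch that uses a toy function instead of Black-Scholes. The function name powr and the target value 27 are made up for illustration; the opts array simply mirrors the one in the question, with a different initial guess.

```sas
proc fcmp;
   /* hypothetical helper: base raised to the power expo */
   function powr(base, expo);
      return (base**expo);
   endsub;

   /* same layout as the question's opts array; initial guess is 1 */
   array opts[5] initial abconv relconv maxiter status
                 (1 .001 1.0e-6 100 -1);

   /* the . marks the argument to solve for: find base with 27 - powr(base, 3) = 0 */
   root = solve("powr", opts, 27, ., 3);

   put 'Implied base: ' root 'Solve status: ' opts[5];
run;
```

If the solver converges, root should come out close to 3. The volatility case works the same way, with the . sitting in the volty position.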
Example code:
data have;
input OptionID opt_price strike today: date9. exp: date9. eq_price intrate;
format today exp date9.;
datalines;
1 5 50 20jul2010 21oct2010 50 0.05
2 5 75 21jul2010 22oct2010 50 0.05
3 5 55 22jul2010 23oct2010 50 0.05
4 5 60 23jul2010 24oct2010 50 0.05
;
proc fcmp data=have out=want;
time = exp - today;
array opts[5]
initial abconv relconv maxiter status
( .5 .001 1.0e-6 100 -1)
;
function blksch(strike, time, eq_price, intrate, volty);
put volty=; /* show the SOLVE iterations in the OUTPUT window */
return ( blkshclprc (
strike, /* E: exercise prices */
time/365.25, /* t: time to maturity (years) */
eq_price, /* S: share price */
intrate, /* r: annualized risk-free interest rate, continuously compounded */
volty /* sigma: volatility of the underlying asset */
));
endsub;
bsvolty=solve("blksch", opts, opt_price, strike,
time, eq_price, intrate, .);
run;
The output data set
The OUTPUT window
I have a column for dollar-amount that I need to break apart into $1000 segments - so $0-$999, $1,000-$1,999, etc.
I could use Case/When, but there are an awful lot of groups I would have to make.
Is there a more efficient way to do this?
Thanks!
You could just use arithmetic. For example you could convert them to upper limit of the $1,000 range.
up_to = 1000*ceil(dollar/1000);
Let's make up some example data:
data test;
do dollar=0 to 5000 by 500 ;
up_to = 1000*ceil(dollar/1000);
output;
end;
run;
Results:
Obs dollar up_to
1 0 0
2 500 1000
3 1000 1000
4 1500 2000
5 2000 2000
6 2500 3000
7 3000 3000
8 3500 4000
9 4000 4000
10 4500 5000
11 5000 5000
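Note that with CEIL an exact multiple such as 1000 lands in the bin that ends at 1000. If the bins should instead run $0-$999, $1,000-$1,999 and so on, as in the question, a FLOOR-based variant is one way to get there. This is only a sketch; the names test_floor, range_low, range_high and segment are illustrative.

```sas
data test_floor;
   length segment $ 20;
   do dollar = 0 to 5000 by 500;
      range_low  = 1000*floor(dollar/1000);  /* lower bound of the bin: 0, 1000, 2000, ... */
      range_high = range_low + 999;          /* upper bound shown in the label */
      segment    = catx('-', range_low, range_high);
      output;
   end;
run;

proc print data=test_floor;
run;
```

Here a dollar value of 1000 is labelled 1000-1999 rather than being grouped with the values below it.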
Absolutely. This is a great use case for user-defined formats.
proc format;
value segment
0-<1000 = '0-1000'
1000-<2000 = '1000s'
2000-<3000 = '2000s'
;
quit;
If there are too many ranges to write out by hand, generate them with code!
data segments;
retain
fmtname 'segment'
type 'n' /* numeric format */
eexcl 'Y' /* exclude the "end" match, so 0-1000 excluding 1000 itself */
;
do start = 0 to 1e6 by 1000;
end = start + 1000;
label = catx('- <',start,end); * what you want this to show up as;
output;
end;
run;
proc format cntlin=segments;
quit;
Then you can use segment = put(dollaramt,segment.); to assign the value of segment, or just apply the format format dollaramt segment.; if you're just using it in PROC SUMMARY or somesuch.
And you can combine the two approaches above to generate a User Defined Format that will bin the amounts for you.
1. Create bins to set up a user-defined format. One drawback of this method is that it requires you to know the range of the data ahead of time.
2. Use a user-defined function via PROC FCMP.
3. Use a manual calculation.
I illustrate solutions 1 and 3 below. #2 requires PROC FCMP, but I think a plain data step is simpler.
data thousands_format;
fmtname = 'thousands_fmt';
type = 'N';
do Start = 0 to 10000 by 1000;
END = Start + 1000 - 1;
label = catx(" - ", put(start, dollar12.0), put(end, dollar12.0));
output;
end;
run;
proc format cntlin=thousands_format;
run;
data demo;
do i=100 to 10000 by 50;
custom_format = put(i, thousands_fmt.);
manual_format = catx(" - ", put(floor(i/1000)*1000, dollar12.0), put(floor(i/1000)*1000 + 999, dollar12.0)); /* upper bound is lower bound + 999, so exact multiples of 1000 are labelled correctly */
output;
end;
run;
I have ~4M observations of points data, and would like to segment them into different bins like the following:
Point_Range Frequency
0-100 1000000
100-200 2000000
200-300 1000000
How would I be able to assign a range to each of the observations & output the above-like table without having to write manual "case when" or "if then" statements?
Use a custom format to map a point value to its range representation.
Example:
data user_points;
call streaminit(20201230);
do user_id = 1 to 4e6;
points = rand('integer', 0, 300);
output;
end;
run;
proc format;
value point_range
0-<100 = ' 0 - 99'
100-<200 = '100 - 199'
200- 300 = '200 - 300'
;
run;
proc freq noprint data=user_points;
format points point_range.;
table points / out=bins;
run;
This creates the data set bins.
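If you would rather not define a format at all, the same kind of table can be sketched with plain arithmetic in PROC SQL. This is an alternative under assumptions, not part of the approach above: the data set user_points_small and the column names range_start and frequency are made up here, and a value of exactly 300 falls into its own bin starting at 300 instead of being folded into 200 - 300 as the format does.

```sas
/* small hypothetical sample so the step runs on its own */
data user_points_small;
   call streaminit(20201230);
   do user_id = 1 to 1000;
      points = rand('integer', 0, 300);
      output;
   end;
run;

proc sql;
   create table bins_sql as
   select 100*floor(points/100) as range_start,  /* 0, 100, 200, 300 */
          count(*)              as frequency
   from user_points_small
   group by calculated range_start
   order by range_start;
quit;
```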
I am trying to develop a recursive program to fill in missing string values using flat probabilities (for instance, if a variable had three possible values and one observation was missing, the missing observation would have a 33% chance of being replaced with each value).
Note: The purpose of this post is not to discuss the merit of imputation techniques.
DATA have;
INPUT id gender $ b $ c $ x;
CARDS;
1 M Y . 5
2 F N . 4
3 N Tall 4
4 M Short 2
5 F Y Tall 1
;
/* Counts number of categories i.e. 2 */
proc sql;
SELECT COUNT(Unique(gender)) into :rescats
FROM have
WHERE Gender ~= " " ;
Quit;
%let rescats = &rescats;
%put &rescats; /*internal check */
/* Collects response categories separated by commas i.e. F,M */
proc sql;
SELECT UNIQUE gender into :genders separated by ","
FROM have
WHERE Gender ~= " "
GROUP BY Gender;
QUIT;
%let genders = &genders;
%put &genders; /*internal check */
/* Counts entries to be evaluated. In this case observations 1 - 5 */
/* Note CustomerKey is an ID variable */
proc sql;
SELECT COUNT (UNIQUE(customerKey)) into :ID
FROM have
WHERE customerkey < 6;
QUIT;
%let ID = &ID;
%put &ID; /*internal check */
data want;
SET have;
DO i = 1 to &ID; /* Control works from 1 to 5 */
seed = 12345;
/* Sets u to rand value between 0.00 and 1.00 */
u = RanUni(seed);
/* Sets rand gender to either 1 and 2 */
RandGender = (ROUND(u*(&rescats - 1)) + 1)*1;
/* PROBLEM Should if gender is missing set string value of M or F */
IF gender = ' ' THEN gender = SCAN(&genders, RandGender, ',');
END;
RUN;
The SCAN function does not create an F or M value within gender. It also appears to create new M and F variables. Additionally, the DO loop creates an additional entry within CustomerKey. Is there any way to get rid of these?
I would prefer to use loops and macros to solve this. I'm not yet proficient with arrays.
Here is my attempt at tidying this up a little:
/*Changed to delimited input so that values end up in the right columns*/
DATA have;
infile cards dlm=','; /* INFILE goes before INPUT so the delimiter is in effect when the line is read */
INPUT id gender $ b $ c $ x;
CARDS;
1,M,Y, ,5
2,F,N, ,4
3, ,N,Tall,4
4,M, ,Short,2
5,F,Y,Tall,1
;
/*Consolidated into 1 proc, added noprint and removed unnecessary group by*/
proc sql noprint;
/* Counts number of categories i.e. 2 */
SELECT COUNT(unique(gender)) into :rescats
FROM have
WHERE not(missing(Gender));
/* Collects response categories separated by commas i.e. F,M */
SELECT unique gender into :genders separated by ","
FROM have
WHERE not(missing(Gender))
;
Quit;
/*Removed redundant %let statements*/
%put rescats = &rescats; /*internal check */
%put genders = &genders; /*internal check */
/*Removed ID list code as it wasn't making any difference to the imputation in this example*/
data want;
SET have;
seed = 12345;
/* Sets u to rand value between 0.00 and 1.00 */
u = RanUni(seed);
/* Sets rand gender to an integer from 1 to &rescats, each equally likely */
RandGender = CEIL(u*&rescats);
IF missing(gender) THEN gender = SCAN("&genders", RandGender, ','); /*Added quotes around &genders to prevent SAS interpreting M and F as variable names*/
RUN;
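How the uniform draw is mapped to a category index matters once a variable has more than two categories. With k categories, ROUND(u*(k-1)) + 1 gives the first and last category only half the weight of the ones in between, whereas CEIL(u*k) is flat. The following sketch compares the two for k = 3; the data set sim and its variable names are made up for the illustration.

```sas
data sim;
   call streaminit(2024);
   k = 3;                                   /* pretend there are three categories */
   do i = 1 to 100000;
      u = rand('uniform');                  /* uniform draw between 0 and 1 */
      idx_round = round(u*(k - 1)) + 1;     /* rounding-based index */
      idx_ceil  = ceil(u*k);                /* ceiling-based index */
      output;
   end;
run;

proc freq data=sim;
   tables idx_round idx_ceil;
run;
```

The frequency table for idx_round should show roughly 25% / 50% / 25%, while idx_ceil should be close to one third each.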
Halo8:
/*Changed to delimited input so that values end up in the right columns*/
DATA have;
infile cards dlm=',';
INPUT id gender $ b $ c $ x;
CARDS;
1,M,Y, ,5
2,F,N, ,4
3, ,N,Tall,4
4,M, ,Short,2
5,F,Y,Tall,1
;
run;
Tip: You can use a dot (.) to mean a missing value for a character variable during INPUT.
Tip: DATALINES is the modern alternative to CARDS.
Tip: Data values don't have to line up, but it helps humans.
Thus this works as well:
/*List input, with a dot (.) standing in for each missing value*/
DATA have;
INPUT id gender $ b $ c $ x;
DATALINES;
1 M Y . 5
2 F N . 4
3 . N Tall 4
4 M . Short 2
5 F Y Tall 1
;
run;
Tip: Your technique requires two passes over the data.
One to determine the distinct values.
A second to apply your imputation.
Most approaches require two passes per variable processed. A hash approach can do only two passes but requires more memory.
There are many ways to determine distinct values: SORTING+FIRST., Proc FREQ, DATA Step HASH, SQL, and more.
Tip: Solutions that move data to code back to data are sometimes needed, but can be troublesome. Often the cleanest way is to let data remain data.
For example: INTO will be the wrong approach if the concatenated distinct values would require more than 64K
Tip: Data to Code is especially troublesome for continuous values and other values that are not represented exactly the same when they become code.
For example: high precision numeric values, strings with control-characters, strings with embedded quotes, etc...
This is one approach using SQL. As mentioned before, Proc SURVEYSELECT is far better for real applications.
Proc SQL;
Create table REPLACEMENTS as select distinct gender from have where gender is NOT NULL;
%let REPLACEMENT_COUNT = &SQLOBS; %* Tip: Take advantage of automatic macro variable SQLOBS;
data REPLACEMENTS;
set REPLACEMENTS;
rownum+1; * rownum needed for RANUNI matching;
run;
Proc SQL;
* Perform replacement of missing values;
Update have
set gender =
(
select gender
from REPLACEMENTS
where rownum = ceil(&REPLACEMENT_COUNT * ranuni(1234))
)
where gender is NULL
;
%let SYSLAST = have;
DM 'viewtable have' viewtable;
You don't have to be concerned about columns that have no missing values, because no replacement occurs in them. For a column that does have missing values, the list of candidate REPLACEMENTS excludes the missing value, so REPLACEMENT_COUNT is the right count for a uniform replacement probability of 1/COUNT, coded as rownum = ceil(COUNT * random).
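The "let data remain data" tip can also be sketched in a DATA step by reading a randomly chosen row of the candidate table with POINT=. This is a minimal sketch, not a replacement for SURVEYSELECT; the sample data and the names pick, picked and n_repl are illustrative, and RAND('integer') assumes a reasonably recent SAS 9.4 release.

```sas
data have;
   input id gender $;
   datalines;
1 M
2 F
3 .
4 M
5 .
;

proc sql;
   create table replacements as
   select distinct gender from have where gender is not null;
quit;

data want;
   if _n_ = 1 then call streaminit(1234);
   set have;
   if missing(gender) then do;
      pick = rand('integer', 1, n_repl);   /* random row number in REPLACEMENTS */
      set replacements(rename=(gender=picked)) point=pick nobs=n_repl;
      gender = picked;                     /* copy the drawn value */
   end;
   drop picked;
run;
```

No macro variable holds the candidate values here, so the 64K limit and the quoting issues mentioned above do not come into play.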
I have the numeric values of salaries of different employees. I want to break the ranges up into categories. However I do not want a new column; rather, I want to just format the existing salary column using these ranges:
At least $20,000 but less than $100,000 - <$100,000
At least $100,000 and up to $500,000 - >$100,000
Missing - Missing salary
Any other value - Invalid salary
I've done something similar with gender. I just want to use the proc print and format command to show salary and gender.
DATA Work.nonsales2;
SET Work.nonsales;
RUN;
PROC FORMAT;
VALUE $Gender
'M'='Male'
'F'='Female'
'O'='Other'
other='Invalid Code';
PROC FORMAT;
VALUE salrange
'At least $20,000 but less than $100,000 '=<$100,000
other='Invalid Code';
PROC PRINT;
title 'Salary and Gender';
title2 'for Non-Sales Employees';
format gender $gender.;
RUN;
Proc Format is the correct method and you need a numeric format:
proc format;
value salfmt
20000 - <100000 = "At least $20,000 but less than $100,000"
100000 - 500000 = "100,000 +"
. = 'Missing'
other = 'Other';
run;
Then in your print apply the format, similar to what you did for gender.
format salary salfmt.;
This should help get you started.
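For completeness, here is a sketch that uses the labels from the question and applies both formats in PROC PRINT. The data set nonsales_demo and its rows are made up for the illustration; the format name salrange is taken from the question's attempt.

```sas
proc format;
   value $gender
      'M'   = 'Male'
      'F'   = 'Female'
      'O'   = 'Other'
      other = 'Invalid Code';
   value salrange
      20000-<100000 = '<$100,000'        /* at least 20,000, less than 100,000 */
      100000-500000 = '>$100,000'        /* 100,000 up to and including 500,000 */
      .             = 'Missing salary'
      other         = 'Invalid salary';
run;

/* hypothetical sample rows */
data nonsales_demo;
   input gender $ salary;
   datalines;
M 45000
F 250000
O .
M 5000
;

proc print data=nonsales_demo;
   format gender $gender. salary salrange.;
run;
```

The salary column keeps its numeric values; only the displayed text changes.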
I created a small function that mimics R's cut function:
options cmplib=work.functions;
proc fcmp outlib=work.functions.test;
function cut2string(var, cutoffs[*], values[*] $) $;
if var <cutoffs[1] then return (values[1]);
if var >=cutoffs[dim(cutoffs)] then return (values[dim(values)]);
do i=1 to dim(cutoffs)-1; /* stop one short so cutoffs[i+1] stays in bounds */
if var >=cutoffs[i] & var <cutoffs[i+1] then return (values[i+1]);
end;
return ("Error, this shouldn't ever happen");
endsub;
run;
Then you can use it like this :
data Work.nonsales2;
set Work.nonsales;
array cutoffs[3] _temporary_ (20000 100000 500000);
array valuesString[4] $10 _temporary_ ("<20k " "20k-100k" "100k-500k" ">500k");
salary_string = cut2string(salary ,cutoffs,valuesString);
run;
I want to create a summary report with PROC REPORT that has the following columns:
Probability, Nbr_of_Optys, Total_Media_Value and Tot_Forecast, which is computed as the product of Probability and Total_Media_Value.
I have written this code:
proc report data = Cs1.olympics headline;
column Probability Stage (n) Total_Media_Value Tot_Forecast;
where Probability > 0;
define Probability/group Descending 'Probability';
define Stage/group noprint;
define n / format = comma6. 'Nbr_of_Optys';
define Total_Media_Value/analysis format = dollar25. 'Tot_Budget';
define Tot_Forecast/computed format = dollar25.;
compute Tot_Forecast;
Tot_Forecast = (Total_Media_Value.sum*Probability)/100;
endcomp;
rbreak after / summarize ol ul skip;
run;
After running the report, SAS issues a message saying that missing values were generated by the operation, followed by this report:
Probability  Nbr_of_Optys      Tot_Budget    Tot_Forecast
        100             7    $171,675,000    $171,675,000
         90             4    $205,000,000    $184,500,000
         70             8    $264,000,000    $184,800,000
         50            20    $127,040,000     $63,520,000
         30             3      $2,450,000        $735,000
         10           319    $333,729,670     $33,372,967
                      361  $1,103,894,670
I didn't get any summary value for Tot_Forecast.
Your problem is that, because RBREAK executes at the end of the report (as it should), it has no value of Probability to use in that computation. According to the documentation, the summary line that PROC REPORT writes for RBREAK contains:
the results of the calculations based on the code in the corresponding compute block
Only grouping values can be used. So Probability.sum could be used there, except it's not an analytic variable so it doesn't have a sum; you'd have to add a separate variable with probability in it to use as analytic - but even then it probably doesn't get you what you want.
What you want, I think, is the sum of the tot_forecast. But that's not what you get, unfortunately. What you need to do is help the RBREAK out:
proc report data = olympics headline;
column Probability Stage (n) Total_Media_Value Tot_Forecast ;
where Probability > 0;
define Probability/group Descending 'Probability';
define Stage/group noprint;
define n / format = comma6. 'Nbr_of_Optys';
define Total_Media_Value/analysis format = dollar25. 'Tot_Budget';
define Tot_Forecast/computed format = dollar25.;
compute tot_forecast;
if upcase (_BREAK_) = ' ' then do;
Tot_Forecast = (Total_Media_Value.sum*Probability)/100;
tot_forecast_summer=sum(tot_forecast_summer,tot_Forecast);
end;
else do;
tot_forecast=tot_forecast_Summer;
end;
endcomp;
rbreak after / summarize ol ul skip;
run;
What I've done here is create a summation variable tot_forecast_Summer which sums the tot_forecast values. Then I use it in the else case (when _BREAK_='_RBREAK_' technically, but I just compare empty to else for simplicity). Here we make a different choice in the compute statement when the break is on: we set the computed column to the computed sum. This gives us:
Probability  Nbr_of_Optys      Tot_Budget    Tot_Forecast
        100             1    $171,675,000    $171,675,000
         90             1    $205,000,000    $184,500,000
         70             1    $264,000,000    $184,800,000
         50             1    $127,040,000     $63,520,000
         30             1      $2,450,000        $735,000
         10             1    $333,729,670     $33,372,967
                        6  $1,103,894,670    $638,602,967