Down-and-in call option using Monte Carlo in C++

I am trying to write a C++ program that runs a Monte Carlo simulation to approximate the theoretical price of a down-and-in call option, with the barrier monitored between the moment of pricing and the option expiry. I implemented a BarrOption class with a constructor, but I don't know if I implemented it correctly. If anyone has an idea about what should be corrected, please leave a comment. Code below:
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <vector>
#include <algorithm>
#include "random.h"
#include "function.h"
using namespace std;
//definition of constructor
BarrOption::BarrOption(
    int nInt_,
    double strike_,
    double spot_,
    double vol_,
    double r_,
    double expiry_,
    double barr_){
    nInt = nInt_;
    strike = strike_;
    spot = spot_;
    vol = vol_;
    r = r_;
    expiry = expiry_;
    barr = barr_;
}
void BarrOption::generatePath (){
    double thisDrift = (r * expiry - 0.5 * vol * vol * expiry) / double(nInt);
    double cumShocks = 0;
    for(int i = 0; i < nInt; i++){
        cumShocks += (thisDrift + vol * sqrt(expiry / double(nInt)) * GetOneGaussianByBoxMuller());
        thisPath.push_back(spot * exp(cumShocks));
    }
}
// definition of getBarrOptionPrice(int nReps) method:
double BarrOption::getBarrOptionPrice(int nReps){
    double rollingSum = 0.0;
    for(int i = 0; i < nReps; i++){
        std::vector<double>::iterator minPrice;
        minPrice = std::min_element(thisPath.begin(), thisPath.end());
        if (thisPath[std::distance(thisPath.end(), minPrice)] <= barr & thisPath.back() > strike) {
            rollingSum += (thisPath.back() - strike);
        }
    }
    return exp(-r*expiry)*rollingSum/double(nReps);
}
// definition of printPath() method:
void BarrOption::printPath(){
    for(int i = 0; i < nInt; i++){
        std::cout << thisPath[i] << "\n";
    }
}

Hi, some comments from a first read. I am pretty sure there are other mistakes, but here are a few:
You are using push_back instead of reserve + assignment.
Your diffusion is better done in log-space.
You are recalculating vol * sqrt(expiry / double(nInt)) on every iteration; compute it once before the loop.
Just use auto instead of std::vector<double>::iterator minPrice.
std::distance(thisPath.end(), minPrice) is negative, no? To index the vector you would need std::distance(thisPath.begin(), minPrice), or you can simply dereference the iterator with *minPrice.
You are checking the min first, which is expensive. First check whether the call finishes in the money (S > strike), and only then check the barrier; otherwise you are just wasting time.
You are finding the min, while all you need is one price at or below the barrier, so a for loop with a break is better than finding the min.
Really this is bad code :) however, it's a good start.
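The points above can be sketched as follows. This is only an illustration under stated assumptions, not the asker's BarrOption class: the function name downAndInCallMC, the use of std::mt19937 with std::normal_distribution in place of GetOneGaussianByBoxMuller(), and the choice to treat a spot that starts at or below the barrier as already knocked in are all assumptions made for the example. The path is simulated in log-space, the per-step constants are computed once, and the barrier test stops being evaluated after the first hit.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <random>

// Hypothetical helper (not from the original post): Monte Carlo price of a
// discretely monitored down-and-in call under geometric Brownian motion.
double downAndInCallMC(double spot, double strike, double barrier,
                       double r, double vol, double expiry,
                       int nSteps, int nPaths, unsigned seed = 42) {
    assert(spot > 0.0 && nSteps > 0 && nPaths > 0);
    std::mt19937 gen(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);

    // Per-step constants, computed once instead of on every iteration.
    const double dt = expiry / nSteps;
    const double drift = (r - 0.5 * vol * vol) * dt;
    const double diffusion = vol * std::sqrt(dt);
    const double logBarrier = barrier > 0.0
        ? std::log(barrier)
        : -std::numeric_limits<double>::infinity();

    double payoffSum = 0.0;
    for (int p = 0; p < nPaths; ++p) {
        double logS = std::log(spot);          // work in log-space
        bool knockedIn = logS <= logBarrier;   // assumption: counts at t = 0
        for (int i = 0; i < nSteps; ++i) {
            logS += drift + diffusion * gauss(gen);
            // Once the barrier is hit there is nothing left to check.
            if (!knockedIn && logS <= logBarrier) knockedIn = true;
        }
        const double terminal = std::exp(logS);
        if (knockedIn && terminal > strike) payoffSum += terminal - strike;
    }
    return std::exp(-r * expiry) * payoffSum / nPaths;
}
```

A call such as downAndInCallMC(100, 100, 90, 0.05, 0.2, 1.0, 250, 100000) would price a one-year option with daily monitoring; all of these inputs are illustrative values.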


C++ function to approximate sine using taylor series expansion

Hi I am trying to calculate the results of the Taylor series expansion for sine to the specified number of terms.
I am running into some problems
Your task is to implement makeSineToOrder(k)
This is templated by the type of values used in the calculation.
It must yield a function that takes a value of the specified type and
returns the sine of that value (in the specified type again)
double factorial(double long order){
#include <iostream>
#include <iomanip>
#include <cmath>
    double fact = 1;
    for(int i = 1; i <= num; i++){
        fact *= i;
    }
    return fact;
}
void makeSineToOrder(long double order,long double precision = 15){
    double value = 0;
    for(int n = 0; n < precision; n++){
        value += pow(-1.0, n) * pow(num, 2*n+1) / factorial(2*n + 1);
    }
    return value;
}
int main()
{
    using namespace std;
    long double pi = 3.14159265358979323846264338327950288419716939937510L;
    for(int order = 1;order < 20; order++) {
        auto sine = makeSineToOrder<long double>(order);
        cout << "order(" << order << ") -> sine(pi) = " << setprecision(15) << sine(pi) << endl;
    }
    return 0;
}
I tried debugging
here is a version that at least compiles and gives some output
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
double factorial(double long num) {
    double fact = 1;
    for (int i = 1; i <= num; i++) {
        fact *= i;
    }
    return fact;
}
double makeSineToOrder(double num, double precision = 15) {
    double value = 0;
    for (int n = 0; n < precision; n++) {
        value += pow(-1.0, n) * pow(num, 2 * n + 1) / factorial(2 * n + 1);
    }
    return value;
}
int main(){
    long double pi = 3.14159265358979323846264338327950288419716939937510L;
    for (int order = 1; order < 20; order++) {
        auto sine = makeSineToOrder(order);
        cout << "order(" << order << ") -> sine(pi) = " << setprecision(15) << sine << endl;
    }
    return 0;
}
not sure what that odd sine(pi) was supposed to be doing
Apart from the obvious syntax errors in your code (the includes should come before your factorial function):
I see no templates in your code, which your assignment clearly states you should use
so I would expect a template like:
template <class T> T mysin(T x,int n=15){ ... }
using pow for a generic datatype is not safe
because the built-in pow will use float or double instead of your generic type, so you can expect rounding/casting problems or even an unresolved function in case of an incompatible type.
To remedy that, you can rewrite the code to not use pow, as it is just repeated multiplication in a loop, so why compute pow again and again?
using a factorial function is a waste
you can compute it like pow in the same loop, with no need to redo the already computed multiplications again and again. Also, not using a template for your factorial causes the same problems as using pow
so putting it all together using this formula:
sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
along with templates and replacing the pow and factorial functions with incremental iteration, I got this:
template <class T> T mysin(T x,int n=15)
    {
    int i;
    T y=0; // result
    T x2=x*x; // x^2
    T xi=x; // x^i
    T ii=1; // i!
    if (n>0) for(i=1;;)
        {
        y+=xi/ii; xi*=x2; i++; ii*=i; i++; ii*=i; n--; if (!n) break;
        y-=xi/ii; xi*=x2; i++; ii*=i; i++; ii*=i; n--; if (!n) break;
        }
    return y;
    }
so the factorial ii is multiplied by i+1 and i+2 every iteration and the power xi is multiplied by x^2 every iteration ... the sign change is hard-coded, so the for loop handles 2 terms per pass (that is the reason for the break;)
As you can see, this does not use anything funny, so you do not need any includes for it, not even math ...
You might want to add x=fmod(x,6.283185307179586476925286766559) at the start of mysin in order to use more than just the first period; however, in that case you have to ensure the fmod implementation uses T or a type compatible with it ... Also, the 2*pi constant should be in the target precision or higher
beware: too big an n will overflow both int and the generic type T (so you might want to limit n based on the type used, or just use it wisely).
Also note that with 32-bit floats you cannot get better than about 5 decimal places with this kind of computation, no matter what n is.
Btw. there are faster and more accurate methods of computing trigonometric functions, like Chebyshev approximation and CORDIC
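The assignment asks for makeSineToOrder(k) to yield a function, which neither version above does. A minimal sketch of that shape follows, assuming that "order" means the number of Taylor terms to sum (the assignment text does not say) and using std::function as the return type, which is one possible choice among several. Each term is derived from the previous one, so neither pow nor a factorial function is needed.

```cpp
#include <cassert>
#include <functional>

// Sketch only: "terms" is assumed to be the number of Taylor terms summed.
// Returns a callable computing x - x^3/3! + x^5/5! - ... in type T.
template <class T>
std::function<T(T)> makeSineToOrder(int terms) {
    assert(terms >= 0);
    return [terms](T x) {
        T sum = 0;
        T term = x;  // signed term x^(2k+1)/(2k+1)!, starting at k = 0
        for (int k = 0; k < terms; ++k) {
            sum += term;
            // Next term: multiply by -x^2 / ((2k+2)(2k+3)).
            term *= -(x * x) / static_cast<T>((2 * k + 2) * (2 * k + 3));
        }
        return sum;
    };
}
```

With this shape, the loop in the question's main() works as written: auto sine = makeSineToOrder<long double>(order); followed by sine(pi).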

Monte Carlo Method calculating pi

I'm trying to calculate pi using the Monte Carlo method, but I always get zero and I don't know why.
Here's my code
#include <tchar.h>
#include <Windows.h>
#include <omp.h>
#include <iostream>
using namespace std;
int main(int argc, char *argv[]){
    int N = 1000, n = 0;
    double x = 0, y = 0;
    double answer;
    for (int i = 0; i < N; i++){
        x = (double)rand() / (double)RAND_MAX;
        y = (double)rand() / (double)RAND_MAX;
        if (((x*x) + (y*y)) < 1)
            n++;
        //cout << "n = " <<n << endl;
    }
    answer = n / N;
    cout << answer*4.0 << endl;
}
Integer division in the answer calculation:
answer = n / N;
'nuff said.
Edit 1:
It's Friday, so I'll add some explanation.
The variables n and N are declared as integers.
The division takes precedence over any conversions or assignments. The division is performed as two integers, then the fractional portion is truncated. The remaining value is converted to double then assigned to the variable answer.
Please, please, don't differentiate identifiers by case. The n and N should be different letters. This helps the writer and reviewer refrain from typo defects.
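A minimal sketch of the fix, assuming it is acceptable to replace rand() with the <random> facilities and to wrap the loop in a function; the names estimatePi, samples and inside are made up for the example. The only change that matters for the bug is converting the count to double before dividing.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: estimates pi from the fraction of random points in
// the unit square that fall inside the quarter circle of radius 1.
double estimatePi(int samples, unsigned seed = 12345) {
    assert(samples > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    int inside = 0;
    for (int i = 0; i < samples; ++i) {
        const double x = unif(gen);
        const double y = unif(gen);
        if (x * x + y * y < 1.0) ++inside;
    }
    // Convert before dividing; inside / samples alone would be integer
    // division and would truncate to 0 whenever inside < samples.
    return 4.0 * static_cast<double>(inside) / samples;
}
```

In the original program the equivalent one-line change would be answer = (double)n / N;.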

Why is my C++ code so much slower than R?

I have written the following codes in R and C++ which perform the same algorithm:
a) To simulate the random variable X 500 times. (X has value 0.9 with prob 0.5 and 1.1 with prob 0.5)
b) Multiply these 500 simulated values together to get a value. Save that value in a container
c) Repeat 10000000 times such that the container has 10000000 values
ptm <- proc.time()
steps <- 500
MCsize <- 10000000
a <- rbinom(MCsize,steps,0.5)
b <- rep(500,times=MCsize) - a
result <- rep(1.1,times=MCsize)^a*rep(0.9,times=MCsize)^b
#include <numeric>
#include <vector>
#include <iostream>
#include <random>
#include <thread>
#include <mutex>
#include <cmath>
#include <algorithm>
#include <chrono>
const size_t MCsize = 10000000;
std::mutex mutex1;
std::mutex mutex2;
unsigned seed_;
std::vector<double> cache;
void generatereturns(size_t steps, int RUNS){
    try{
        // setting seed
        std::mt19937 tmpgenerator(seed_);
        seed_ = tmpgenerator();
        std::cout << "SEED : " << seed_ << std::endl;
    }catch(int exception){
    }
    // Creating generator
    std::binomial_distribution<int> distribution(steps,0.5);
    std::mt19937 generator(seed_);
    for(int i = 0; i!= RUNS; ++i){
        double power;
        double returns;
        power = distribution(generator);
        returns = pow(0.9,power) * pow(1.1,(double)steps - power);
        std::lock_guard<std::mutex> guard(mutex1);
        cache.push_back(returns);
    }
}
int main(){
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    size_t steps = 500;
    seed_ = 777;
    unsigned concurentThreadsSupported = std::max(std::thread::hardware_concurrency(),(unsigned)1);
    int remainder = MCsize % concurentThreadsSupported;
    std::vector<std::thread> threads;
    // starting sub-thread simulations
    if(concurentThreadsSupported != 1){
        for(int i = 0 ; i != concurentThreadsSupported - 1; ++i){
            if(remainder != 0){
                threads.push_back(std::thread(generatereturns,steps,MCsize / concurentThreadsSupported + 1));
            }else{
                threads.push_back(std::thread(generatereturns,steps,MCsize / concurentThreadsSupported));
            }
        }
    }
    //starting main thread simulation
    if(remainder != 0){
        generatereturns(steps, MCsize / concurentThreadsSupported + 1);
    }else{
        generatereturns(steps, MCsize / concurentThreadsSupported);
    }
    for (auto& th : threads) th.join();
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now() ;
    typedef std::chrono::duration<int,std::milli> millisecs_t ;
    millisecs_t duration( std::chrono::duration_cast<millisecs_t>(end-start) ) ;
    std::cout << "Time elapsed : " << duration.count() << " milliseconds.\n" ;
    return 0;
}
I can't understand why my R code is so much faster than my C++ code (3.29s vs 12s) even though I have used four threads in the C++ code? Can anyone enlighten me please? How should I improve my C++ code to make it run faster?
Thanks for all the advice! I reserved capacity for my vectors and reduced the amount of locking in my code. The crucial update in the generatereturns() function is :
std::vector<double> cache(MCsize);
std::vector<double>::iterator currit = cache.begin();
// Creating generator
std::binomial_distribution<int> distribution(steps,0.5);
std::mt19937 generator(seed_);
std::vector<double> tmpvec(RUNS);
for(int i = 0; i!= RUNS; ++i){
    double power;
    double returns;
    power = distribution(generator);
    returns = pow(0.9,power) * pow(1.1,(double)steps - power);
    tmpvec[i] = returns;
}
std::lock_guard<std::mutex> guard(mutex1);
std::move(tmpvec.begin(), tmpvec.end(), currit);
currit += RUNS;
Instead of locking every time, I created a temporary vector and then used std::move to shift the elements in that tmpvec into cache. Now the elapsed time has reduced to 1.9 seconds.
First of all, are you running it in release mode?
Switching from debug to release reduced the running time from ~15s to ~4.5s on my laptop (windows 7, i5 3210M).
Also, reducing the number of threads to 2 instead of 4 in my case (I just have 2 cores but with hyperthreading) further reduced the running time to ~2.4s.
Changing the variable power to int (as jimifiki also suggested) also offered a slight boost, reducing the time to ~2.3s.
I really enjoyed your question and I tried the code at home. I tried changing the random number generator; my implementation of std::binomial_distribution requires on average about 9.6 calls of generator().
I know the question is more about comparing R with C++ performance, but since you ask "How should I improve my C++ code to make it run faster?" I insist on the pow optimization. You can easily avoid half of the pow calls by precomputing either 0.9^steps or 1.1^steps before the for loop. This makes your code run a bit faster:
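double power1 = pow(0.9,steps);
double ratio = 1.1/0.9;
for(int i = 0; i!= RUNS; ++i){
    power = distribution(generator);
    // 0.9^(steps-power) * 1.1^power: same distribution as before because p = 0.5
    returns = power1 * pow(ratio, (double)power);
}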
Analogously you can improve the R code:
ratio <-1.1/0.9
pow1 = 0.9^steps
result <- pow1 * ratio^a
Probably doesn't help you that much, but
start by using pow(double,int) when your exponent is an int.
int power;
returns = pow(0.9,power) * pow(1.1,(int)steps - power);
Can you see any improvement?
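The pow reduction suggested above rests on the identity 0.9^k * 1.1^(n-k) = 1.1^n * (0.9/1.1)^k, which leaves one pow call per sample. A small sketch to check that the two forms agree; growthDirect and GrowthFactored are names invented for this example, with k standing for the binomial draw and n for steps.

```cpp
#include <cassert>
#include <cmath>

// Direct form from the question: two pow calls per sample.
double growthDirect(int k, int steps) {
    assert(0 <= k && k <= steps);
    return std::pow(0.9, k) * std::pow(1.1, steps - k);
}

// Factored form: 1.1^steps is computed once, then one pow call per sample.
struct GrowthFactored {
    double base;   // 1.1^steps
    double ratio;  // 0.9 / 1.1
    explicit GrowthFactored(int steps)
        : base(std::pow(1.1, steps)), ratio(0.9 / 1.1) {}
    double operator()(int k) const { return base * std::pow(ratio, k); }
};
```

Inside the simulation loop the object would be built once before the loop (GrowthFactored growth(steps);) and called as growth(power) for each draw. Whether this gives a measurable speedup depends on the compiler and math library, so it is worth timing.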

Armadillo+OpenBLAS slower than MATLAB?

New to SO. I am test-driving Armadillo+OpenBLAS, and a simple Monte-Carlo geometric Brownian motion logic shows much longer runtime than MATLAB. I believe something must be wrong.
Intel i-5 4 core,
8GB ram,
VS 2012 Express,
Armadillo 4.2,
OpenBLAS (official x64 binary) v0.2.9.rc2,
MATLAB takes 2 seconds for the same logic, but Armadillo+OB takes 12 seconds. I also noticed that the program is running on single thread, but I turned to OpenBLAS because I heard of its multi-core capability.
Thanks for any advice.
#include <iostream>
#include <armadillo>
#include <ctime>
using namespace std;
using namespace arma;
int main()
{
    clock_t start;
    start = clock();
    unsigned int R=100000;
    vec Spre = 100*ones<vec> (R);
    vec S = zeros<vec> (R);
    double r = 0.03;
    double Vol = 0.2;
    double TTM = 5;
    unsigned int T=260*TTM;
    double dt = TTM/T;
    for (unsigned int iT=0; iT<T; ++iT)
    {
        S = Spre%exp((r-0.5*Vol*Vol)*dt + Vol*sqrt(dt)*randn(R));
        Spre = S;
    }
    cout << mean(S) << endl;
    cout << (clock()-start) / (double) CLOCKS_PER_SEC << endl;
    return 0;
}
First, the bottleneck is not exp(), though std::exp is slow. The problem is randn().
On my machine, randn() takes most of the time. When I use the MKL VSL Gaussian generator instead of randn(), the run time drops from 12s to 4s, comparable to MATLAB's 3s or so.
#include <iostream>
#include <armadillo>
#include <ctime>
#include "mkl_vml.h"
#include "mkl_vsl.h"
using namespace std;
using namespace arma;
#define SEED 0
#define METHOD 0
int main()
{
    clock_t start;
    VSLStreamStatePtr stream;
    start = clock();
    vslNewStream(&stream, BRNG, SEED);
    unsigned int R=100000;
    vec Spre = 100*ones<vec> (R);
    vec S = zeros<vec> (R);
    double r = 0.03;
    double Vol = 0.2;
    double TTM = 5;
    unsigned int T=260*TTM;
    double dt = TTM/T;
    double tmp = sqrt(dt);
    vec tmp2=100*zeros<vec>(R);
    vec tmp3=100*zeros<vec>(R);
    for (unsigned int iT=0; iT<T; ++iT)
    {
        vdRngGaussian(METHOD,stream, R, tmp3.memptr(), 0, 1);
        tmp2 =(r - 0.5 * Vol * Vol) * dt + Vol * tmp * tmp3;
        vdExp(R, tmp2.memptr(), tmp3.memptr());
        S = Spre%tmp3;
        Spre = S;
    }
    cout << mean(S) << endl;
    cout << (clock()-start) / (double) CLOCKS_PER_SEC << endl;
    return 0;
}
Key observation is that Armadillo exp() function is way slower than MATLAB.
Similar overhead is observed in log(), pow() and sqrt().
Just a guess, but it looks like you need to set the number of threads to use in OpenBLAS via the OPENBLAS_NUM_THREADS environment variable.
Try something like:
set OPENBLAS_NUM_THREADS=4
...on the command line before you run your program (in a Unix-like shell, use export OPENBLAS_NUM_THREADS=4 instead). Substitute the number of cores in your system where I put "4" (some would say set it to twice the number of cores in your system; YMMV).
Make sure you have Streaming SIMD Extensions enabled when you compile your code. In Visual Studio, check your project C/C++ compiler code generation options.
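The answers above disagree on whether random number generation or exp() dominates, and that is easy to measure. The sketch below is an assumption-laden stand-in, not the Armadillo program: it uses only the standard library (std::mt19937_64, std::normal_distribution, std::exp) and times the two phases of each step separately. The names simulateGbm and GbmResult are invented for the example, and the timings it reports say nothing about Armadillo's or MATLAB's own generators.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct GbmResult {
    double mean;        // average terminal price
    double rngSeconds;  // time spent drawing normals
    double expSeconds;  // time spent in exp() and the update
};

// Hypothetical stand-in for the loop in the question, with per-phase timing.
GbmResult simulateGbm(std::size_t nPaths, unsigned nSteps, double ttm,
                      double r = 0.03, double vol = 0.2,
                      double s0 = 100.0, unsigned seed = 0) {
    assert(nPaths > 0 && nSteps > 0);
    using timer = std::chrono::steady_clock;
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<double> s(nPaths, s0), z(nPaths);

    const double dt = ttm / nSteps;
    const double drift = (r - 0.5 * vol * vol) * dt;
    const double diffusion = vol * std::sqrt(dt);
    std::chrono::duration<double> rngTime(0), expTime(0);

    for (unsigned t = 0; t < nSteps; ++t) {
        const auto t0 = timer::now();
        for (double& zi : z) zi = gauss(gen);
        const auto t1 = timer::now();
        for (std::size_t i = 0; i < nPaths; ++i)
            s[i] *= std::exp(drift + diffusion * z[i]);
        const auto t2 = timer::now();
        rngTime += t1 - t0;
        expTime += t2 - t1;
    }

    double sum = 0.0;
    for (double v : s) sum += v;
    return {sum / nPaths, rngTime.count(), expTime.count()};
}
```

Calling simulateGbm(100000, 1300, 5.0) mirrors the sizes in the question (R = 100000, T = 260 * 5); comparing rngSeconds with expSeconds then shows which phase to optimize on a given machine.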

C++ Use secant method to solve function

I have a school problem but I do not understand what it actually asks. Any of you have an idea what it's really asking for? I don't need code, I just need to understand it.
This is the problem:
Construct a computer program that uses the Secant method to solve the problem:
f(x) = (1+x) cos( sin(x)^3 ) - 1.4 = 0
Starting with the initial guesses of x=2.0 and x=2.1, obtain an approximation to x such that  |f(x)| < 0.0000001.
This is my code from what I understand, but I think I'm not understanding the question correctly.
#include <iostream>
#include <cmath>
double secant(double x);
using namespace std;
int main()
{
    double x = 2.0;
    double r = 0.0;
    int counter = 0;
    while( r < 0 && counter <= 40)
    {
        r =secant(x);
        cout << "x: " << x << ", f(x): " << r << endl;
        x += 0.1;
    }
    return 0;
}
double secant(double x)
{
    double r;
    r = (1+x) * cos(pow(sin(x), 3.0)) - 1.4;
    return r;
}
You are supposed to use the Secant Method:
x(n+1) = x(n) - f(x(n)) * (x(n) - x(n-1)) / (f(x(n)) - f(x(n-1)))
Follow the method as described in the article. It is an iterative method much like Newton's method, except that it uses the two most recent points instead of a derivative. You'll need to make a function to evaluate x(n+1) given x(n) and x(n-1), and iterate it until |f(x)| is less than the specified tolerance.
The coding side of this may prove fairly straightforward as long as you know what the secant method is. Also, that page has a code example. That should prove pretty useful. :)
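For reference, the iteration described above can be sketched like this. The function f and the starting guesses 2.0 and 2.1 come from the problem statement; the names target and secantSolve, the iteration cap, and the guard against a zero denominator are assumptions added for the example.

```cpp
#include <cassert>
#include <cmath>

// f(x) = (1 + x) * cos(sin(x)^3) - 1.4, as given in the problem.
double target(double x) {
    return (1.0 + x) * std::cos(std::pow(std::sin(x), 3.0)) - 1.4;
}

// Secant iteration: x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0)),
// repeated until |f(x)| < tol or maxIter steps have been taken.
double secantSolve(double x0, double x1, double tol = 1e-7, int maxIter = 100) {
    assert(tol > 0.0 && maxIter > 0);
    double f0 = target(x0);
    double f1 = target(x1);
    for (int i = 0; i < maxIter && std::fabs(f1) >= tol; ++i) {
        const double denom = f1 - f0;
        if (denom == 0.0) break;  // flat secant line: cannot continue
        const double x2 = x1 - f1 * (x1 - x0) / denom;
        x0 = x1;
        f0 = f1;
        x1 = x2;
        f1 = target(x2);
    }
    return x1;
}
```

Calling secantSolve(2.0, 2.1) runs the iteration from the two guesses in the assignment; a caller should still check |target(root)| afterwards, since the loop can also stop on the iteration cap.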