Timing of a C++ function wrapped with cython

I am not sure how best to ask this question, but I will try to be as clear as possible.
I am timing a C++ function call from Python. The C++ function is wrapped with Cython.
I am currently timing the Python call of the Cython function, and I get about 52.9 ms with time.time(). On the other hand, I am timing the whole C++ function with std::chrono::high_resolution_clock.
The thing is, I measure only 17.1 ms in C++.
The C++ function is declared as vector<float> cppfunc(vector<float> array, int a, int b, int c); and is a method of class A.
The Cython code only calls the C++ class method. The vector contains about 320k elements.
Can these two measured times be compared like this?
If so, what explains the gap?
If not, which timing tool should I use?
Edit1: (links in comments) both timing libraries are precise enough for my use case (10e-9 for cpp on my arch and 10e-6 for python).
Edit2: added simplified code to illustrate my point. With this code, the Python call duration (~210 ms) is about 8x the duration measured inside the C++ function (~28 ms).
// example.cpp
#include "example.h"
#include <cstdio>   // printf
#include <iostream>
#include <chrono>
std::vector<float> wrapped_function(std::vector<float> array)
{
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<float> result;
    for (int i = 0; i < (int) array.size(); i++) {
        result.push_back(array[i] / 1.1);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<float> duration = end - start;
    printf("Within duration: %.5f\n", duration.count());
    return result;
}
// example.h
#ifndef __EXAMPLE_H_
#define __EXAMPLE_H_
#include <vector>
std::vector<float> wrapped_function(std::vector<float> array);
#endif
# example_wrapper.pxd
from libcpp.vector cimport vector
cdef extern from "example.h":
    vector[float] wrapped_function(vector[float])
# example_wrapper.pyx
from example_wrapper cimport wrapped_function
def cython_wrap(array):
    return wrapped_function(array)
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    cmdclass = {"build_ext": build_ext},
    ext_modules = [
        Extension(name="example_wrapper",
                  sources=["example_wrapper.pyx", "example.cpp"],
                  include_dirs=["/home/SO/"],
                  language="c++",
                  extra_compile_args=["-O3", "-Wall", "-std=c++11"]
        )
    ]
)
# test.py
import example_wrapper
from time import time
array = [i for i in range(1000000)]
t0 = time()
result = example_wrapper.cython_wrap(array)
t1 = time()
print("Wrapped duration: {}".format(t1 - t0))

Obviously the difference comes from Cython overhead, but why is it so big?
Calling the wrapped function is more complex than meets the eye:
def cython_wrap(array):
    return wrapped_function(array)
array is a list of integers, but wrapped_function expects a vector of floats, so Cython automatically creates a vector and populates it with the values from the list.
wrapped_function returns a vector of floats, but to be usable from Python it must be converted to a Python list. Once again, Cython automatically creates a Python list and populates it with Python floats, which are quite costly to construct, one for each float in the returned vector.
As you can see, a lot of copying is going on, and this explains the overhead you are observing.
Cython's documentation lists the rules it applies automatically when converting between C++ containers and Python objects.
Another issue: you are passing the vector array by value, so it has to be copied. Your C++ timing does not include this copy, so the comparison is a little unfair.
You should pass the vector by const reference, i.e.
... wrapped_function(const std::vector<float> &array)
One more thing: you return a vector, which could possibly be copied as well, and that copying time is once again not included in your C++ timings. However, all modern compilers apply return value optimization, so this is not an issue here.
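To get a feel for how much of the wrapped call is conversion rather than computation, the two conversions can be mimicked in plain Python. This is only a rough sketch: the helper names are made up for illustration, array("f") merely stands in for std::vector<float>, and the real Cython conversion runs in compiled code, so the absolute numbers will differ.

```python
from array import array
from time import perf_counter


def to_float_buffer(values):
    """Mimic list -> vector<float>: visit every element, store it as a C float."""
    return array("f", (float(v) for v in values))


def to_python_list(buffer):
    """Mimic vector<float> -> list: create one Python float object per element."""
    return [x for x in buffer]


def timed(func, *args):
    """Return (result, elapsed seconds) using a monotonic high-resolution clock."""
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start


data = list(range(100_000))            # hypothetical input, smaller than in the question
buf, t_in = timed(to_float_buffer, data)
out, t_out = timed(to_python_list, buf)
print("list -> float buffer: {:.5f} s".format(t_in))
print("float buffer -> list: {:.5f} s".format(t_out))
```

Neither step does any of the actual work (the division by 1.1), yet both touch every element; that is the part of the wrapped call the timer inside the C++ function never sees.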

Related

Dynamic allocation of matrix. How?? (c++ Stroustrup)

I'm using Stroustrup's matrix.h implementation as I have a lot of matrix heavy computation to do. It will make life easier if I can just get the matrix populated!
I'm receiving a complex object with a matrix that is not known until received. Once it enters the method, I can get the row and column count, but I have to use a double i,j loop to pull the values since they are in a cpp17::any structure and I have to convert them using asNumber().
I declare it as follows as part of an object definition:
Matrix<double,2> inputLayer;
In the code that instantiates the object, I have the following code:
int numRows = sourceSeries->rowCount();
int numColumns = sourceSeries->columnCount();
int i,j = 0;
for(i=0; i<numRows; i++){
    for(j=0;j<numColumns;j++) {
        // make sure you skip the header row in sourceSeries
        inputLayer[i][j] = asNumber(sourceSeries->data(i+1,j,ItemDataRole::Display));
    }
}
There is nothing like push_back() for the matrix template. The examples I can find in his books and on the web either pre-populate the matrix in the definition or create it from existing lists, which I won't have at this particular time.
Do I need to define a "new" double, receive the asNumber() result, and then set inputLayer[][] to that "new" double?
I'm hoping not to have to manage the memory myself; with vectors, the memory is released when they go out of scope, which is why I was avoiding "new."
I'm using the boost frameworks as well and I'm wondering if I should try ublas version instead, or just get this one working.

Thanks for the pointers to Eigen, that was so simple! Here's all I had to do:
In the header file:
#include "Eigen/Dense"
using namespace Eigen;
In the object definition of the header file:
Matrix<double, Dynamic, Dynamic> inputLayer;
In the code where I need to read in the matrix:
int numRows = sourceSeries->rowCount();
int numColumns = sourceSeries->columnCount();
int i,j = 0;
// note: this declares a new matrix in the current scope; to size a member
// declared in the header instead, call inputLayer.resize(numRows, numColumns)
MatrixXd inputLayer(numRows,numColumns);
for(i=0; i<numRows; i++){
    for(j=0;j<numColumns;j++) {
        // make sure you skip the header row in sourceSeries
        inputLayer(i,j) = asNumber(sourceSeries->data(i+1,j,ItemDataRole::Display));
    }
}
Sorry I had to waste so much time trying to get the other code to work, but at least I got real familiar with my debugger and the codebase again. Thanks everyone for the comments!

Random normal distribution by Gaussian in C++

I have a Python function for sampling a normal distribution. I need to convert it to C++, and I am not familiar with the language.
Here is my Python:
import numpy as np
def calculation(value):
    sigma = 0.5
    size = 10000
    x = []  # was `x = 200`, which has no append()
    x_distribution = np.random.normal(value, sigma, size)
    for i in x_distribution:
        x.append(i)
    return x
And it works as expected. I am trying to rewrite the same thing in C++ and found only a linked example in which the line "std::normal_distribution<> d{5,2};" is supposed to do the magic, but I could not figure out how to implement it.
Here is what I have tried; it is failing.
# include frame.distribution
Frame DistributionModel(x_mu, x_sigma)
{
// Motion model;ignore it
model = std::normal_distribution<> d{x_mu,x_sigma};
return model;
}
Please, help me. Looking for any hints. Thanks.

Well, trouble without end...
# include frame.distribution
Syntax for inclusion is:
#include <name_of_header_file>
// or:
#include "name_of_header_file"
(A space between # and include does no harm, but is very uncommon...)
Frame DistributionModel(x_mu, x_sigma)
C++ is a statically typed language, i.e. you cannot just give variables a name as in Python; you need to give them a type as well!
Frame DistributionModel(double x_mu, double x_sigma)
The same applies to local variables; the type must match what you actually assign (unless you use auto):
std::normal_distribution<double> nd(x_mu, x_sigma);
One thing that is a bit special about C++: when you define a local variable, e.g.
std::vector<int> v;
and its type is a class, the object is constructed right away using its default constructor. If you want to call a constructor with arguments, you just append the arguments to the variable name:
std::vector<int> v(10); // vector with 10 elements.
What you saw in the sample is a feature called "uniform initialisation", which uses braces instead of parentheses. I personally am strongly opposed to its usage, though, so you won't ever see it in code I have written (see how I construct the std::normal_distribution above...).
std::normal_distribution is defined in header random, so you need to include it (before your function definition):
#include <random>
About the return value: you can only return Frame if that data type is defined somewhere. Before trying to define a new class, we can simply use an existing one: std::vector (it's a class template, though). A vector is quite similar to a Python list: it is a container class storing a number of objects in contiguous memory; unlike Python lists, though, all elements stored must have the same type. We can use such a vector to collect the results:
std::vector<double> result;
Such a vector can grow dynamically; however, growing can make it necessary to re-allocate the internal storage, which is costly. If you know the number of elements in advance, you can tell the vector to allocate sufficient memory up front, too:
result.reserve(max);
The vector is what we are going to return, so we need to adjust the function signature (I took the liberty of giving it a different name and adding another parameter):
std::vector<double> getDistribution(double x_mu, double x_sigma, size_t numberOfValues)
It would be possible to let the compiler deduce the return type using the auto keyword. While auto brings quite a lot of benefits, I do not recommend it for this purpose: with an explicit return type, users of the function see right from the signature what kind of result to expect and do not have to look into the function body to find out.
std::normal_distribution is a number generator; it does not deliver the entire sequence at once as the Python equivalent does, so you need to draw the values one after another explicitly:
while(numberOfValues-- > 0)
{
    auto value = nd(gen);
    result.push_back(value);
}
nd(gen): std::normal_distribution provides a function call operator, operator(), so objects of this type can be called just like functions (such objects are called "functors" in C++ terminology). The function call, however, requires a random number generator as its argument, so we need to provide one, as in the example you saw. Putting it all together:
#include <random>
#include <vector>
std::vector<double> getDistribution
(
    double x_mu, double x_sigma, size_t numberOfValues
)
{
    // shortened compared to your example:
    std::mt19937 gen((std::random_device())());
    //                ^^ create a temporary (anonymous) instance
    //                   and call it immediately afterwards
    std::normal_distribution<double> nd(x_mu, x_sigma);
    std::vector<double> result;
    result.reserve(numberOfValues);
    while(numberOfValues-- > 0)
    {
        // shorter than above: using result of previous
        // function (functor!) call directly as argument to next one
        result.push_back(nd(gen));
    }
    // finally something familiar from python:
    return result;
}
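For comparison with the original Python, the same draw-one-value-at-a-time structure can be written with the standard library alone. This is a sketch, not a replacement for np.random.normal: get_distribution and its seed parameter are names made up here, random.Random plays the role of the std::mt19937 engine, and its gauss method plays the role of the nd(gen) call.

```python
import random


def get_distribution(mu, sigma, number_of_values, seed=None):
    """Draw number_of_values samples from N(mu, sigma), one call per sample."""
    gen = random.Random(seed)          # the engine; seed=None seeds from the OS
    result = []
    for _ in range(number_of_values):
        # one draw per iteration, like result.push_back(nd(gen)) in the C++ version
        result.append(gen.gauss(mu, sigma))
    return result


samples = get_distribution(5.0, 0.5, 10000, seed=42)
mean = sum(samples) / len(samples)
print(len(samples), round(mean, 2))
```

Passing a fixed seed makes a run repeatable, which the C++ version would achieve by seeding std::mt19937 with a constant instead of std::random_device.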

#include <chrono>
#include <cmath>
#include <iostream>
#include <random>
int main() {
    // seed the engine from the current time
    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    std::default_random_engine generator(seed);
    std::normal_distribution<double> distribution(0.0, 3.0);
    // std::abs from <cmath> keeps the value a double
    double number = std::abs(distribution(generator));
    std::cout << number;
    std::cin.get();
    return 0;
}
This may help: it draws a random number from a Gaussian with mean 0.0 and standard deviation 3.0 (and takes its absolute value).

Cython w/ STL vector & no NumPy to call from other apps, what's missing?

I have Cython code that I'm trying to compile as a DLL so that I can call it from other languages. The strange thing is that using STL vectors instead of NumPy MemoryViews I get 12x lower performance. Trying to use OpenMP with Cython's prange also doesn't seem to work (I get 100% utilization on all 4 threads w/ memoryviews and maybe 50% max w/ STL vectors). Anyone have any thoughts on how to revamp the STL version to be comparable? The Cython profiler only shows the cimport cython and cpdef statements to interact with Python... could it be that when called from C++ just renaming them as cdef would improve things? Or do I have to use Intel MKL vectors as in the examples here https://software.intel.com/en-us/node/531898 which with simple option formulas these are really not a big deal to rewrite...? I'm really so inexperienced in C++ I would have to research on the internet to just create a C++ only test script... Code segment below:
cimport cython
from libcpp.vector cimport vector
cdef extern from "math.h":
    double exp(double)
    double sqrt(double)
    double log(double)
    double erf(double)
cdef inline double std_norm_cdf(double x):
    return 0.5*(1+erf(x/sqrt(2.0)))
cpdef CyBlackP(vector[double] Black_PnL, vector[double] Black_S, vector[double] Black_Texpiry, vector[double] Black_strike, vector[double] Black_volatility, vector[double] Black_IR, vector[int] Black_callput):
    cdef int i, N
    N = Black_PnL.size()
    cdef double d1, d2
    for i in range(N):
        d1 = (log(Black_S[i] / Black_strike[i]) + Black_Texpiry[i] * (Black_volatility[i] *Black_volatility[i]) / 2) / (Black_volatility[i] * sqrt(Black_Texpiry[i]))
        d2 = d1 - Black_volatility[i] * sqrt(Black_Texpiry[i])
        Black_PnL[i] = exp(-Black_IR[i] * Black_Texpiry[i]) * (Black_callput[i] * Black_S[i] * std_norm_cdf(Black_callput[i] * d1) - Black_callput[i] * Black_strike[i] * std_norm_cdf(Black_callput[i] * d2))
    return Black_PnL
For others out there this code above is tied out completely to be accurate w/ the Black model if you want to use it.

This is slow because numpy arrays and c++ vectors are not interchangeable - based on the doc note here it seems like the numpy array is being iterated/copied into a new vector.
For example, consider a couple functions that do nothing:
# ext.pyx
from libcpp.vector cimport vector
cpdef pass_vec(vector[double] v):
    return 0.0
cpdef pass_arr(double[::1] a):
    return 0.0
The timings below highlight just how much overhead there is. Note that your function will be fast if called with a c++ vector as the argument, just not with a numpy array passed.
In [1]: import ext
In [2]: import numpy as np
In [3]: a = np.zeros(1000000)
In [4]: %timeit ext.pass_arr(a)
1000000 loops, best of 3: 808 ns per loop
In [5]: %timeit ext.pass_vec(a)
10 loops, best of 3: 63 ms per loop
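The same contrast can be reproduced without NumPy or Cython. In this sketch (an analogy only: copy_into_new_container is a made-up helper, and array/memoryview stand in for the vector and the typed memoryview), taking a view of an existing buffer touches no elements, while filling a new container visits every one of them.

```python
from array import array
from time import perf_counter


def copy_into_new_container(values):
    """Element-by-element copy, similar in spirit to filling a vector[double]."""
    out = array("d")
    for v in values:
        out.append(v)
    return out


source = array("d", [0.0]) * 200000    # contiguous buffer of doubles

t0 = perf_counter()
view = memoryview(source)              # shares the buffer, copies nothing
t_view = perf_counter() - t0

t0 = perf_counter()
copied = copy_into_new_container(source)
t_copy = perf_counter() - t0

view[0] = 1.5                          # writes through to source, not to copied
print("view: {:.7f} s, copy: {:.7f} s".format(t_view, t_copy))
print(source[0], copied[0])
```

The view costs the same no matter how large the buffer is, whereas the copy grows with the number of elements; this is why the memoryview signature stays fast when it is handed a NumPy array.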

Fast conversion of C/C++ vector to Numpy array

I'm using SWIG to glue together some C++ code to Python (2.6), and part of that glue includes a piece of code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?

You will want to define the __array_interface__ attribute instead. This will let you pass back the pointer and the shape information directly.

Maybe it would be possible to use f2py instead of swig. Despite its name, it is capable of interfacing python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy
The advantage is that it handles the conversion to numpy arrays automatically.
Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.

If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array for initialization (see docs, third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
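The underlying idea, handing over a pointer and an element count instead of iterating, can be sketched with the standard library alone. Here array merely stands in for the contiguous storage of a std::vector<double>, and ctypes plays the role NumPy would play in the real setup; the variable names are made up, and nothing below is SWIG or NumPy API.

```python
import ctypes
from array import array

# Stand-in for the C++ side: contiguous doubles plus (pointer, element count).
storage = array("d", [0.5 * i for i in range(8)])
address, length = storage.buffer_info()

DoubleArray = ctypes.c_double * length

# Zero-copy: reinterpret the existing memory. The storage must stay alive and
# must not be resized for as long as the view is in use.
view = DoubleArray.from_address(address)

# One bulk copy (no per-element Python calls): an independent snapshot.
snapshot = DoubleArray.from_buffer_copy(storage)

storage[1] = 7.0                       # change the original after the fact
print(view[1], snapshot[1])            # the view follows, the snapshot does not
```

Both variants avoid the per-element iterator calls; the view is cheapest but ties its lifetime to the owner of the memory, while the bulk copy is safe to keep after the source has gone away.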

So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
%insert("python") %{
import numpy as np
%}
/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>
template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
    // ValidateUserInput( length == field.size(),
    //         "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN
    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====
%}
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {
    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)
TYPEMAP_COPY_TO_BUFFER(CLASS)
%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;
%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}
%enddef
then you can make a container "Numpy"-able with
%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);
Then in Python, just do:
# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )
This has only the overhead of a single Python <--> C++ translation call, not the N that would result from a typical length-N array.
A slightly more complete version of this code is part of my PyTRT project at github.

Sorting based on associative arrays in D

I am trying to follow examples given in various places for D apps. Generally when learning a language I start on example apps and change them myself, purely to test stuff out.
One app that caught my eye was to count the frequency of words in a block of text passed in. As the dictionary was built up in an associative array (with the elements storing the frequency, and the keys being the words themselves), the output was not in any particular order. So, I attempted to sort the array based on examples given on the site.
Anyway, the example showed a lambda 'sort!(...)(array);' but when I attempt the code dmd won't compile it.
Here's the boiled down code:
import std.stdio;
import std.string;
void main() {
    uint[string] freqs;
    freqs["the"] = 51;
    freqs["programming"] = 3;
    freqs["hello"] = 10;
    freqs["world"] = 10;
    /*...You get the point...*/
    //This is the actual example given, but it doesn't
    //seem to work, old D version???
    //string[] words = array(freqs.keys);
    //This seemed to work
    string[] words = freqs.keys;
    //Example given for how to sort the 'words' array based on
    //external criteria (i.e. the frequency of the words from
    //another array). This is the line where the compiler craps out!
    sort!((a,b) {return freqs[a] < freqs[b];})(words);
    //Should output in frequency order now!
    foreach(word; words) {
        writefln("%s -> %s", word, freqs[word]);
    }
}
When I try to compile this code, I get the following
s1.d(24): Error: undefined identifier sort
s1.d(24): Error: function expected before (), not sort of type int
Can anyone tell me what I need to do here?
I use DMD v2.031, I've tried installing the gdc but this only seems to support the v1 language spec. I've only started looking at dil, so I can't comment on whether this supports the code above.

Try adding this near the top of the file:
import std.algorithm;

Here's an even simpler way to read an input file (given on the command line), split it into lines and words, and print a table of word frequencies in descending order:
import std.algorithm;
import std.file;
import std.stdio;
import std.string;
void main(string[] args)
{
    auto contents = cast(string)read(args[1]);
    uint[string] freqs;
    foreach(i,line; splitLines(contents))
        foreach(word; split(strip(line)))
            ++freqs[word];
    string[] words = freqs.keys;
    sort!((a,b)=> freqs[a]>freqs[b])(words);
    foreach(s;words)
        writefln("%s\t\t%s",s,freqs[s]);
}
Well, almost 4 years later... :-)
