I'm using SWIG to glue some C++ code to Python (2.6), and part of that glue converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?
You will want to define __array_interface__ instead (it is an attribute, not a method). This will let you pass back the pointer and the shape information directly.
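For illustration, here is a minimal sketch of that protocol, assuming the underlying values are doubles and assuming a hypothetical SWIG-generated helper data_ptr() that returns the address of internal_data_[0] as an integer:

import numpy as np

class FieldWrapper(object):
    # ... SWIG-generated methods such as size(), plus the hypothetical data_ptr() ...
    @property
    def __array_interface__(self):
        # 'data' is (address, read_only_flag); '<f8' means little-endian float64
        return {'shape': (self.size(),),
                'typestr': '<f8',
                'data': (self.data_ptr(), False),
                'version': 3}

With this in place, np.asarray(obj) reads the C++ buffer directly instead of making one wrapper call per element.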
Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy
The advantage is that it handles the conversion to numpy arrays automatically.
Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.
If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array for initialization (see docs, third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
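To illustrate the mechanism with a plain Python object standing in for the C++-backed wrapper (a bytearray also implements the buffer interface):

import numpy as np

buf = bytearray(8 * 5)  # room for five float64 values, zero-initialized
a = np.ndarray(shape=(5,), dtype=np.float64, buffer=buf)
a[2] = 1.5              # writes straight into buf, no per-element Python calls
print(a)                # -> [ 0.   0.   1.5  0.   0. ]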
So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
%insert("python") %{
import numpy as np
%}
/*! Templated function to copy contents of a container to an allocated memory
* buffer
*/
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>
template < typename Container_T >
void copy_to_buffer(
const Container_T& field,
typename Container_T::value_type* buffer,
typename Container_T::size_type length
)
{
// ValidateUserInput( length == field.size(),
// "Destination buffer is the wrong size" );
// put your own assertion here or BAD THINGS CAN HAPPEN
if (length == field.size()) {
std::copy( field.begin(), field.end(), buffer );
}
}
//====
%}
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
             (int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {
  res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
  if ( res < 0 ) {
    PyErr_Clear();
    %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                   $symname, $argnum);
  }
  $1 = ($1_ltype) buffer_;
  $2 = ($2_ltype) (size_ / sizeof($*1_type));
}
%enddef
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}
%enddef
then you can make a container "Numpy"-able with
%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);
Then in Python, just do:
# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )
This has only the overhead of a single Python <--> C++ translation call, not the N calls that iterating over a typical length-N array would incur.
A slightly more complete version of this code is part of my PyTRT project on GitHub.
Related
I have a struct that receives an array of type Float from a C++ library.
class MyStruct extends Struct {
  @Array.multi([12])
  external Array<Float> states;
}
I am able to receive data and parse it in Dart.
Now I want to do the reverse. I have a List<double> which I want to assign to this struct and pass to C++.
The following cast fails at run time.
myStructObject.states = listObject as Array<Float>;
Neither the Array class nor the List class has any related methods. Any idea on this?
There's no way to get around copying elements into FFI arrays.
for (var i = 0; i < listObject.length; i++) {
  my_struct.states[i] = listObject[i];
}
This may seem inefficient, but consider that depending on the specialization of listObject, the underlying memory layout of the data may differ significantly from the contiguous FFI layout, so any conversion sugar provided by Dart would likely also have to convert individual elements anyway (as opposed to just performing a single memcpy under the hood).
One possibility for closing the convenience gap would be to define an extension method. For example:
extension FloatArrayFill<T> on ffi.Array<ffi.Float> {
  void fillFromList(List<T> list) {
    for (var i = 0; i < list.length; i++) {
      this[i] = list[i] as double;
    }
  }
}
Usage:
my_struct.states.fillFromList(list);
Note that a separate extension method would need to be defined for each ffi.Array<T> specialization you want to do this for (Array<Uint32>, Array<Double>, Array<Bool>, etc.).
This is due to the [] operator being implemented through a separate extension method for each of these type specializations internally.
I do not really know how to ask this question, but I will try to be as clear as possible.
I am timing a C++ function call from python. The C++ function is wrapped with cython.
I am currently timing the Python call of the Cython function, and I get about 52.9 ms with time.time(). On the other hand, I am timing the whole C++ function with the C++ std::chrono::high_resolution_clock library.
The thing is, I am measuring 17.1 ms in C++.
The C++ function is declared like this: vector<float> cppfunc(vector<float> array, int a, int b, int c); and it is a method of class A.
The cython code only calls the C++ class method. The vector contains about 320k elements.
I was wondering if these two measured times could be compared like this?
If we can, what can explain this gap?
If not, which timing tool should I use?
Edit1: (links in comments) both timing libraries are precise enough for my use case (10^-9 s resolution for C++ on my arch and 10^-6 s for Python).
Edit2: added simplified code to illustrate my point. With this code, the Python call duration (~210 ms) is 8x the internal C++ duration (~28 ms).
// example.cpp
#include "example.h"
#include <iostream>
#include <chrono>
#include <cstdio>  // for printf

std::vector<float> wrapped_function(std::vector<float> array)
{
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<float> result;
    for (int i = 0; i < (int) array.size(); i++) {
        result.push_back(array[i] / 1.1);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<float> duration = end - start;
    printf("Within duration: %.5f\n", duration.count());
    return result;
}
// example.h
#ifndef __EXAMPLE_H_
#define __EXAMPLE_H_
#include <vector>
std::vector<float> wrapped_function(std::vector<float> array);
#endif
# example_wrapper.pxd
from libcpp.vector cimport vector

cdef extern from "example.h":
    vector[float] wrapped_function(vector[float])

# example_wrapper.pyx
from example_wrapper cimport wrapped_function

def cython_wrap(array):
    return wrapped_function(array)
# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass = {"build_ext": build_ext},
    ext_modules = [
        Extension(name="example_wrapper",
                  sources=["example_wrapper.pyx", "example.cpp"],
                  include_dirs=["/home/SO/"],
                  language="c++",
                  extra_compile_args=["-O3", "-Wall", "-std=c++11"]
                  )
    ]
)
# test.py
import example_wrapper
from time import time
array = [i for i in range(1000000)]
t0 = time()
result = example_wrapper.cython_wrap(array)
t1 = time()
print("Wrapped duration: {}".format(t1 - t0))
Obviously the difference comes from Cython overhead, but why is it so big?
The call of the wrapped function is more complex than meets the eye:
def cython_wrap(array):
    return wrapped_function(array)
array is a list of integers, but wrapped_function expects a vector of floats, so Cython automatically creates a vector and populates it with values from the list.
wrapped_function returns a vector of floats, but in order to be usable from Python it must be converted to a Python list. Once again, Cython automatically creates a Python list and populates it with Python floats, which correspond to the floats from the returned vector and are quite costly to construct.
As you can see, a lot of copying is going on, and this explains the overhead you are observing.
Here is the set of rules Cython automatically applies when converting between C++ containers and Python.
Another issue: you are passing the vector array by value, so it has to be copied. Your timing of the C++ code does not include this copying, and thus the comparison is a little bit unfair.
You should pass the vector by const-reference, i.e.
... wrapped_function(const std::vector<float> &array)
One more thing: you return a vector, which may possibly be copied as well, and this copying time is once again not included in your C++ timings. However, all modern compilers apply return value optimization, so this is not an issue here.
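One rough way to check this explanation empirically (a sketch reusing the example_wrapper module built above): if the gap is dominated by per-element list/vector conversions, the Python-side duration should grow roughly linearly with the input size.

import time
import example_wrapper

for n in (10**4, 10**5, 10**6):
    array = [float(i) for i in range(n)]
    t0 = time.time()
    example_wrapper.cython_wrap(array)
    print("n = %d: %.4f s" % (n, time.time() - t0))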
I have my function in Python for the normal distribution. I need to convert it to C++, and I am not familiar with the language.
Here is my Python:
import numpy as np

def calculation(value):
    sigma = 0.5
    size = 10000
    x = []
    x_distribution = np.random.normal(value, sigma, size)
    for i in x_distribution:
        x.append(i)
    return x
And it works as expected. I am trying to re-write the same thing in C++ and found only this link, where std::normal_distribution<> d{5,2}; has to make the magic. But I could not figure out how to implement it.
Here what i have tried and it is failing.
# include frame.distribution

Frame DistributionModel(x_mu, x_sigma)
{
    // Motion model; ignore it
    model = std::normal_distribution<> d{x_mu, x_sigma};
    return model;
}
Please, help me. Looking for any hints. Thanks.
Well, trouble without end...
# include frame.distribution
Syntax for inclusion is:
#include <name_of_header_file>
// or:
#include "name_of_header_file"
(The space between # and include does no harm, but is absolutely uncommon...)
Frame DistributionModel(x_mu, x_sigma)
C++ is a strongly typed language, i.e. you cannot just give variables a name as in Python; you also need to give them a type!
Frame DistributionModel(double x_mu, double x_sigma)
The same goes for local variables; the type must match what you actually assign to it (unless you use auto):
std::normal_distribution<double> nd(x_mu, x_sigma);
This is a bit special about C++: you define a local variable, e.g.
std::vector<int> v;
In the case of a class, it is constructed right away using its default constructor. If you want to call a constructor with arguments, you just append the call to the variable name:
std::vector<int> v(10); // vector with 10 elements.
What you saw in the sample is a feature called "uniform initialisation", using braces instead of parentheses. I personally strongly oppose its usage, though, so you won't ever see it in code I have written (see me constructing the std::normal_distribution above...).
std::normal_distribution is defined in header random, so you need to include it (before your function definition):
#include <random>
About the return value: you can only return Frame if that data type is defined somewhere. Instead of trying to define a new class, we can just use an existing one: std::vector (it's a template class, though). A vector is quite similar to a Python list; it is a container class storing a number of objects in contiguous memory. Unlike Python lists, though, all stored elements must be of the same type. We can use such a vector to collect the results:
std::vector<double> result;
Such a vector can grow dynamically; however, this can make it necessary to re-allocate the internal storage memory, which is costly. If you know the number of elements in advance, you can tell the vector to allocate sufficient memory in advance, too:
result.reserve(max);
The vector is what we are going to return, so we need to adjust the function signature (I have allowed myself to give it a different name and to add another parameter):
std::vector<double> getDistribution(double x_mu, double x_sigma, size_t numberOfValues)
It would be possible to let the compiler deduce the return type, using the auto keyword. While auto brings quite a few benefits, I do not recommend it for this purpose: with an explicit return type, users of the function see right from the signature what kind of result to expect and do not have to look into the function body to find out.
std::normal_distribution is a number generator; it does not deliver the entire sequence at once as the Python equivalent does; you need to draw the values one by one explicitly:
while(numberOfValues-- > 0)
{
    auto value = nd(gen);
    result.push_back(value);
}
nd(gen): std::normal_distribution provides a function call operator operator(), so its objects can be called just like functions (such objects are called "functors" in C++ terminology). The function call, however, requires a random number generator as argument, so we need to provide one as in the example you saw. Putting it all together:
#include <random>
#include <vector>

std::vector<double> getDistribution
(
    double x_mu, double x_sigma, size_t numberOfValues
)
{
    // shortened compared to your example:
    std::mt19937 gen((std::random_device())());
    // create temporary (anonymous)      ^^
    // instance and call it immediately  ^^
    // afterwards
    std::normal_distribution<double> nd(x_mu, x_sigma);

    std::vector<double> result;
    result.reserve(numberOfValues);
    while(numberOfValues-- > 0)
    {
        // shorter than above: using result of previous
        // function (functor!) call directly as argument to next one
        result.push_back(nd(gen));
    }

    // finally something familiar from python:
    return result;
}
#include <iostream>
#include <random>
#include <chrono>
#include <cmath>

int main() {
    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    std::default_random_engine generator(seed);
    std::normal_distribution<double> distribution(0.0, 3.0);
    double number = std::abs(distribution(generator));
    std::cout << number;
    std::cin.get();
    return 0;
}
This may help: it creates a random number using a Gaussian with mean = 0.0 and std_dev = 3.0.
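For comparison, the NumPy equivalent of that C++ snippet is a one-liner:

import numpy as np

number = abs(np.random.normal(0.0, 3.0))
print(number)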
In short, the question is how to pass an
Eigen::Map<Eigen::MatrixXd>
object to a function which expects an
Eigen::MatrixXd
object.
Longer story:
I have this C++ function declaration
void npMatrix(const Eigen::MatrixXd &data, Eigen::MatrixXd &result);
together with this implementation
void npMatrix(const Eigen::MatrixXd &data, Eigen::MatrixXd &result)
{
    // Just do s.th. with the arguments
    std::cout << data << std::endl;
    result(1,1) = -5;
    std::cout << result << std::endl;
}
I want to call this function from Python using numpy.array as arguments. To this end, I use a wrapper function written in C++
void pyMatrix(const double* p_data, const int dimData[],
              double* p_result, const int dimResult[]);
which takes a pointer to the data, the size of the data array, a pointer to the result, and the size of the result array. The data pointer points to a const patch of memory, since the data is not to be altered, while the patch of memory reserved for the result is writable. The implementation of the function
void pyMatrix(const double *p_data, const int dimData[], double *p_result, const int dimResult[])
{
    Eigen::Map<const Eigen::MatrixXd> dataMap(p_data, dimData[0], dimData[1]);
    Eigen::Map<Eigen::MatrixXd> resultMap(p_result, dimResult[0], dimResult[1]);

    resultMap(0,0) = 100;

    npMatrix(dataMap, resultMap);
}
defines an Eigen::Map for data and result, respectively. An Eigen::Map allows raw memory to be accessed as a kind of Eigen::Matrix. The dataMap is of type
<const Eigen::MatrixXd>
since the associated memory is read only; resultMap in contrast is of type
<Eigen::MatrixXd>
since it must be writable. The line
resultMap(0,0) = 100;
shows that resultMap is indeed writable. While passing dataMap to npMatrix(), where a const Eigen::MatrixXd is expected, works, I could not find a way to pass resultMap in the same way. I am sure the trouble comes from the fact that the first argument of npMatrix is const, and the second is not. A possible solution I found is to define
Eigen::MatrixXd resultMatrix = resultMap;
and pass this resultMatrix to npMatrix(). However, I guess this creates a copy and hence kills the nice memory mapping of Eigen::Map. So my question is:
Is there a way to pass an Eigen::Map to a function which expects a non-const Eigen::MatrixXd instead?
As a side note: I could change npMatrix to expect an Eigen::Map, but since in the real project the functions are already there and tested, I would rather not tamper with them.
To complete the question, here is the python file to call pyMatrix()
import ctypes as ct
import numpy as np
import matplotlib.pyplot as plt

# Load libfit and define input types
ct.cdll.LoadLibrary("/home/wmader/Methods/fdmb-refactor/build/pyinterface/libpyfit.so")
libfit = ct.CDLL("libpyfit.so")
libfit.pyMatrix.argtypes = [np.ctypeslib.ndpointer(dtype=np.float64, ndim=2),
                            np.ctypeslib.ndpointer(dtype=np.int32, ndim=1),
                            np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, flags='WRITEABLE'),
                            np.ctypeslib.ndpointer(dtype=np.int32, ndim=1)]

data = np.array(np.random.randn(10, 2), dtype=np.float64, order='F')
result = np.zeros_like(data, dtype=np.float64, order='F')
libfit.pyMatrix(data, np.array(data.shape, dtype=np.int32),
                result, np.array(result.shape, dtype=np.int32))
Pass it as a plain pointer to the data, and Eigen::Map it there. Alternatively, use template <typename Derived> and the like, as described in http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html
My personal choice is the first, though, as it is better to have code that doesn't expose all the stubbornness of every API you have used. Also, you won't lose compatibility with Eigen, nor with any other kind of library that you (or anyone else) may use later.
There is also another trick I found out, which can be used in numerous occasions:

Eigen::MatrixXd a;
// let's assume a data pointer double* DATA that we want to map,
// with dimensions rows and cols
new (&a) Eigen::Map<Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic>>(DATA, rows, cols);
This will do what you ask without wasting memory. I think it is a cool trick, and a will behave as a MatrixXd, but I haven't tested every occasion. There is no memory copy. However, you might need to resize a to the right size before assigning. Even so, the compiler will not immediately allocate all the memory at the time you request the resize operation, so there won't be big useless memory allocations either!
Be careful: resizing operations might reallocate the memory used by an Eigen matrix! So, if you Map some memory but then perform an action that resizes the matrix, it might end up mapped to a different place in memory.
For anyone still struggling with the problem of passing an Eigen::Map to a function with an Eigen::Matrix signature, or vice versa, who found that the Eigen::Matrix-to-Eigen::Map implicit casting trick suggested by @Aperture Laboratories didn't work (in my case this gave runtime errors associated with trying to free already released memory; mismatched delete / invalid delete errors when run under valgrind):
I suggest using the Eigen::Ref class for function signatures as suggested in the answer given by #ggael here:
Passing Eigen::Map<ArrayXd> to a function expecting ArrayXd&
and written in the documentation:
http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html#TopicUsingRefClass
under the title:
How to write generic, but non-templated function?
For example, for the function specified in the question, changing the signature to
void npMatrix(const Eigen::Ref<const Eigen::MatrixXd>& data, Eigen::Ref<Eigen::MatrixXd> result);
means that passing either Eigen::Map<Eigen::MatrixXd> or Eigen::MatrixXd objects to the function works seamlessly (see @ggael's answer to Correct usage of the Eigen::Ref<> class for different ways to use Eigen::Ref in function signatures).
I appreciate the OP said he didn't want to change the function signatures, but in terms of using Eigen::Map and Eigen::Matrix objects interchangeably I found this the easiest and most robust method.
I have Python code that contains the following.
d = {}
d[(0,0)] = 0
d[(1,2)] = 1
d[(2,1)] = 2
d[(2,3)] = 3
d[(3,2)] = 4
for (i,j) in d:
    print d[(i,j)], d[(j,i)]
Unfortunately looping over all the keys in python isn't really fast enough for my purpose, and I would like to translate this code to C++. What is the best C++ data structure to use for a python dictionary that has tuples as its keys? What would be the C++ equivalent of the above code?
I looked at sparse matrices in the boost library, but couldn't find an easy way to loop only over the non-zero elements.
A dictionary would be a std::map in C++, and a tuple with two elements would be a std::pair.
The python code provided would translate to:
#include <iostream>
#include <map>

typedef std::map<std::pair<int, int>, int> Dict;
typedef Dict::const_iterator It;

int main()
{
    Dict d;

    d[std::make_pair(0, 0)] = 0;
    d[std::make_pair(1, 2)] = 1;
    d[std::make_pair(2, 1)] = 2;
    d[std::make_pair(2, 3)] = 3;
    d[std::make_pair(3, 2)] = 4;

    for (It it(d.begin()); it != d.end(); ++it)
    {
        int i(it->first.first);
        int j(it->first.second);
        std::cout << it->second << ' '
                  << d[std::make_pair(j, i)] << '\n';
    }
}
The type is
std::map< std::pair<int,int>, int>
The code to add entries to the map and loop over them looks like this:

typedef std::map< std::pair<int,int>, int> container;

container m;
m[ std::make_pair(1,2) ] = 3; //...

for(container::iterator i = m.begin(); i != m.end(); ++i){
    std::cout << i->second << ' ';
    // not really sure how to translate the [i,j] [j,i] idiom here easily
}
Have a look at Boost.Python. It's for interaction between Python and C++ (basically building Python libs using C++, but also for embedding Python in C++ programs). Most Python data structures and their C++ equivalents are described (I didn't check for the one you want).
std::map or more likely std::tr1::unordered_map / boost::unordered_map (aka hash_map) is what you want.
Also, as kriss said, Boost.Python is a good idea to look at here. It provides a C++ version of python's dict class already, so if you're doing cross-language stuff, it might be useful.
std::map is often implemented as a balanced binary tree, not a hash table, which is not the case for a Python dict. So you need a C++ data structure with O(1) lookup to use your pairs with.
Do you want to call an optimized C++ routine via Python? If so, read on:
Oftentimes I use PyYAML when dealing with dictionaries in Python. Perhaps you could link in something like LibYAML or yaml-cpp to:
Translate a Python dictionary into a YAML string
Use Python to call a C++ function wrapped using something like SWIG, taking the YAML string as a parameter.
Use a C++ library to parse the YAML & obtain a std::map object
Operate on std::map object
Warning: I have never tried this, but using everyone's favorite search engine on "yaml std::map" yields lots of interesting links
As a direct answer to your question (for the python part look at my other answer). You can forget the tuple part if you want. You can use any mapping type key/value (hash, etc.) in C++, you just have to find a unique key function. In some cases that can be easy. For instance if you two integers are integers between 1 and 65536 you just could use a 32 bits integer with each 16 bits part one of the keys. A simple shift and an 'or' or + to combine the two values would do the trick and it's very efficient.