Passing a C++ std::vector to numpy array in Python - c++

I am trying to pass a vector of doubles that I generate in my C++ code to a Python numpy array. I want to do some downstream processing in Python and use some Python facilities once the numpy array is populated. One of the main things I want to do is plot the data, which is a bit clumsy in C++. I also want to be able to leverage Python's statistical tools.
However, I am not clear on how to do it. I spent a lot of time going through the Python C API documentation and came across a function, PyArray_SimpleNewFromData, that apparently can do the trick. I am still unclear about the overall setup of the code, so I am building some very simple test cases to help me understand the process. I generated the following code as a standalone empty project in Visual Studio Express 2012. I call this file Project1:
#include <Python.h>
#include "C:/Python27/Lib/site-packages/numpy/core/include/numpy/arrayobject.h"
PyObject * testCreatArray()
{
float fArray[5] = {0,1,2,3,4};
npy_intp m = 5;
PyObject * c = PyArray_SimpleNewFromData(1,&m,PyArray_FLOAT,fArray);
return c;
}
My goal is to be able to read the PyObject in Python. I am stuck because I don't know how to reference this module in Python. In particular, how do I import this project from Python? I tried an import Project1 from the project path in Python, but it failed. Once I understand this base case, my goal is to figure out a way to pass the vector container that I compute in my main function to Python. I am not sure how to do that either.
If any experts can help me with this, or post a simple, self-contained example of code that populates a numpy array from a simple C++ vector, I will be grateful. Many thanks in advance.

I'm not a cpp-hero, but wanted to provide my solution with two template functions for 1D and 2D vectors. Usage is a one-liner later on, and because the 1D and 2D versions are overloaded templates, the compiler picks the correct version for your vector's shape. The 2D version throws a string if the shape is irregular. The routine copies the data here, but one can easily modify it to take the address of the first element of the input vector in order to make it just a "representation".
Usage looks like this:
// Random data
vector<float> some_vector_1D(3,1.f); // 3 entries set to 1
vector< vector<float> > some_vector_2D(3,vector<float>(3,1.f)); // 3 subvectors with 1
// Convert vectors to numpy arrays
PyObject* np_vec_1D = (PyObject*) vector_to_nparray(some_vector_1D);
PyObject* np_vec_2D = (PyObject*) vector_to_nparray(some_vector_2D);
You may also change the type of the numpy array with the optional type_num argument; it has to match the element type T, because the data is copied through a T* pointer. The template functions are:
/** Convert a C++ 2D vector into a numpy array
*
* @param const vector< vector<T> >& vec : 2D vector data
* @param int type_num : numpy type number of the new array
* @return PyArrayObject* array : converted numpy array
*
* Transforms an arbitrary 2D C++ vector into a numpy array. Throws if the
* rows do not all have the same length. The array may contain empty columns
* or something else, as long as its shape is rectangular.
*
* Warning: this routine makes a copy of the memory!
*/
template<typename T>
static PyArrayObject* vector_to_nparray(const vector< vector<T> >& vec, int type_num = PyArray_FLOAT){
// rows not empty
if( !vec.empty() ){
// column not empty
if( !vec[0].empty() ){
size_t nRows = vec.size();
size_t nCols = vec[0].size();
npy_intp dims[2] = {static_cast<npy_intp>(nRows), static_cast<npy_intp>(nCols)}; // explicit cast avoids a narrowing error
PyArrayObject* vec_array = (PyArrayObject *) PyArray_SimpleNew(2, dims, type_num);
T *vec_array_pointer = (T*) PyArray_DATA(vec_array);
// copy vector line by line ... maybe could be done at one
for (size_t iRow=0; iRow < vec.size(); ++iRow){
if( vec[iRow].size() != nCols){
Py_DECREF(vec_array); // delete
throw(string("Can not convert vector<vector<T>> to np.array, since c++ matrix shape is not uniform."));
}
copy(vec[iRow].begin(),vec[iRow].end(),vec_array_pointer+iRow*nCols);
}
return vec_array;
// Empty columns
} else {
npy_intp dims[2] = {static_cast<npy_intp>(vec.size()), 0};
return (PyArrayObject*) PyArray_ZEROS(2, dims, type_num, 0); // honour the requested type
}
// no data at all
} else {
npy_intp dims[2] = {0, 0};
return (PyArrayObject*) PyArray_ZEROS(2, dims, type_num, 0); // honour the requested type
}
}
/** Convert a C++ vector into a numpy array
*
* @param const vector<T>& vec : 1D vector data
* @param int type_num : numpy type number of the new array
* @return PyArrayObject* array : converted numpy array
*
* Transforms an arbitrary C++ vector into a 1D numpy array.
*
* Warning: this routine makes a copy of the memory!
*/
template<typename T>
static PyArrayObject* vector_to_nparray(const vector<T>& vec, int type_num = PyArray_FLOAT){
// rows not empty
if( !vec.empty() ){
size_t nRows = vec.size();
npy_intp dims[1] = {static_cast<npy_intp>(nRows)}; // explicit cast avoids a narrowing error
PyArrayObject* vec_array = (PyArrayObject *) PyArray_SimpleNew(1, dims, type_num);
T *vec_array_pointer = (T*) PyArray_DATA(vec_array);
copy(vec.begin(),vec.end(),vec_array_pointer);
return vec_array;
// no data at all
} else {
npy_intp dims[1] = {0};
return (PyArrayObject*) PyArray_ZEROS(1, dims, type_num, 0); // honour the requested type
}
}

Since there is no answer here that is actually helpful for people who might be looking for this sort of thing, I figured I'd post an easy solution.
First you will need to create a Python extension module in C++. This is easy enough to do and is all in the Python C API documentation, so I'm not going to go into that.
Converting a C++ std::vector to a numpy array is then extremely simple. You first need to include the numpy array header:
#include <numpy/arrayobject.h>
and in your initialising function you need to call import_array():
PyMODINIT_FUNC
inittestFunction(void){
(void) Py_InitModule("testFunction", testFunctionMethods);
import_array();
}
Now you can use the numpy array functions that are provided.
The one you want, as the OP said a few years back, is PyArray_SimpleNewFromData, and it is very simple to use. All you need is an array of type npy_intp that holds the shape of the array to be created. Make sure the shape matches your data. A single std::vector is guaranteed to store its elements contiguously (unless it is a std::vector<bool>), but a std::vector<std::vector<T>> is not one contiguous block, because every inner vector has its own allocation. For a multi-dimensional array, keep the data in one flat vector of width * height elements:
//create testVector with width * height elements initialised to 0
std::vector<uint16_t> testVector(width * height, 0);
//create shape for numpy array
npy_intp dims[2] = {width, height};
//convert testVector to a numpy array
PyArrayObject* numpyArray = (PyArrayObject*)PyArray_SimpleNewFromData(2, dims, NPY_UINT16, (void*)testVector.data());
To go through the parameters: the function returns a PyObject*, so cast the result to PyArrayObject* if you want to use it with the PyArray_* macros.
The 2 is the number of dimensions in the array.
dims is the shape of the array. This has to be of type npy_intp.
NPY_UINT16 is the data type that the array will have in Python.
You then use testVector.data() to get the data of the array, and cast this to either void* or a pointer of the same data type as your vector. The array does not copy the data, so the vector has to outlive the numpy array.
Hope this helps anyone else who may need this.
(Also, if you don't need pure speed I would advise avoiding the C API; it causes quite a few problems, and Cython or SWIG are still probably your best choices. There is also ctypes, which can be quite helpful.)

I came across your post when trying to do something very similar. I was able to cobble together a solution, the entirety of which is on my GitHub. It makes two C++ vectors, converts them to Python tuples, passes them to Python, converts them to NumPy arrays, then plots them using Matplotlib.
Much of this code is from the Python Documentation.
Here are some of the important bits from the .cpp file :
//Make some vectors containing the data
static const double xarr[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14};
std::vector<double> xvec (xarr, xarr + sizeof(xarr) / sizeof(xarr[0]) );
static const double yarr[] = {0,0,1,1,0,0,2,2,0,0,1,1,0,0};
std::vector<double> yvec (yarr, yarr + sizeof(yarr) / sizeof(yarr[0]) );
//Transfer the C++ vector to a python tuple
pXVec = PyTuple_New(xvec.size());
for (i = 0; i < xvec.size(); ++i) {
pValue = PyFloat_FromDouble(xvec[i]);
if (!pValue) {
Py_DECREF(pXVec);
Py_DECREF(pModule);
fprintf(stderr, "Cannot convert array value\n");
return 1;
}
PyTuple_SetItem(pXVec, i, pValue);
}
//Transfer the other C++ vector to a python tuple
pYVec = PyTuple_New(yvec.size());
for (i = 0; i < yvec.size(); ++i) {
pValue = PyFloat_FromDouble(yvec[i]);
if (!pValue) {
Py_DECREF(pYVec);
Py_DECREF(pModule);
fprintf(stderr, "Cannot convert array value\n");
return 1;
}
PyTuple_SetItem(pYVec, i, pValue); //
}
//Set the argument tuple to contain the two input tuples
PyTuple_SetItem(pArgTuple, 0, pXVec);
PyTuple_SetItem(pArgTuple, 1, pYVec);
//Call the python function
pValue = PyObject_CallObject(pFunc, pArgTuple);
And the Python code:
def plotStdVectors(x, y):
    import numpy as np
    import matplotlib.pyplot as plt
    print("Printing from Python in plotStdVectors()")
    print(x)
    print(y)
    # convert the incoming tuples to float arrays
    x = np.fromiter(x, dtype=float)
    y = np.fromiter(y, dtype=float)
    print(x)
    print(y)
    plt.plot(x, y)
    plt.show()
    return 0
Which results in the plot that I can't post here due to my reputation, but is posted on my blog post here.

_import_array(); //this is required for numpy to create an array correctly
Note: In NumPy's extension guide they use import_array() to accomplish the same goal that I used _import_array() for. When I tried using import_array() on a Mac I got an error, so you may need to try both and see which one works.
By the way, you can use a C++ std::vector in the call to PyArray_SimpleNewFromData.
If your std::vector is my_vector, replace fArray with &my_vector[0]. &my_vector[0] gives you the pointer to the data stored in my_vector.
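To illustrate why &my_vector[0] can stand in for a C array, here is a minimal sketch in plain C++ with no NumPy involved. The names RawView and raw_view are hypothetical and only serve the illustration. It packages the pointer and length that a C API expects and checks the contiguity guarantee with an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer-plus-length view of a vector's storage, the pair that a C API
// taking a raw buffer expects. (Hypothetical helper type.)
struct RawView {
    const double* data;
    std::size_t size;
};

inline RawView raw_view(const std::vector<double>& v) {
    // v.data() is safe even for an empty vector, unlike &v[0].
    RawView view{v.data(), v.size()};
    // Contiguity: the last element sits exactly size-1 slots after the first.
    assert(v.empty() || &v.back() == view.data + (view.size - 1));
    return view;
}
```

The pointer is only valid while the vector is alive and has not reallocated (for example through push_back or resize), so an array that wraps it without copying must not outlive the vector.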

Related

Adding a custom sparse op (Sparse Determinant)

I am working on trying to get some sparse matrix operations working in Tensorflow. The first one I am tackling is a sparse determinant, via a sparse Cholesky decomposition. Eigen has a sparse Cholesky, so my thought is to wrap that.
I have been making some progress, but am now a little bit stuck. I know that SparseTensors in Tensorflow are made up of three parts: indices, values, and shape. Copying similar ops, I went for the following REGISTER_OP declaration:
REGISTER_OP("SparseLogDet")
.Input("a_indices: int64")
.Input("a_values: float32")
.Input("a_shape: int64")
.Output("determinant: float32")
.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
shape_inference::ShapeHandle h;
c->set_output(0, h);
return Status::OK();
});
This compiles fine, but when I run it using some example code:
import tensorflow as tf
log_det_op = tf.load_op_library('./sparse_log_det_op.so')
with tf.Session(''):
    t = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2],
                        dense_shape=[3, 4])
    print(log_det_op.sparse_log_det(t).eval().shape)
    print(log_det_op.sparse_log_det(t).eval())
It complains, saying:
TypeError: sparse_log_det() missing 2 required positional arguments: 'a_values' and 'a_shape'
This makes sense to me, since it's expecting the other arguments. However, I would really just like to pass the sparse tensor, not break it up into components! Does anyone know how this is handled for other sparse operations?
Thanks!
If you want to pass in the sparse tensor and then determine the indices, values and shape from it, this should be possible. Just modify your op to take a single Tensor input and produce a single float output. Then extract the desired information from the Eigen::Tensor by looping through its elements, as seen below:
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/op_kernel.h"
#include <Eigen/Dense>
using namespace tensorflow;
REGISTER_OP("SparseDeterminant")
.Input("sparse_tensor: float")
.Output("sparse_determinant: float");
class SparseDeterminantOp : public OpKernel {
public:
explicit SparseDeterminantOp(OpKernelConstruction *context) : OpKernel(context) {}
void Compute(OpKernelContext *context) override {
// get the input TensorFlow tensor
const Tensor& sparse_tensor = context->input(0);
// get shape of input
const TensorShape& sparse_shape = sparse_tensor.shape();
// get Eigen Tensor for input tensor
auto eigen_sparse = sparse_tensor.matrix<float>();
//extract the data you want from the sparse tensor input
auto a_shape = sparse_tensor.shape();
// loop over all elements of the input tensor and add to values and indices
for (int i=0; i<a_shape.dim_size(0); ++i){
for (int j=0; j<a_shape.dim_size(1); ++j){
if(eigen_sparse(i,j) != 0){
/// *** Here add the nonzero elements to a list/tensor of values and their indices ***
std::cout<<eigen_sparse(i,j)<<" at"<<" "<<i<<" "<<j<<" "<<"not zero."<<std::endl;
}
}
}
// create output tensor
Tensor *output_tensor = NULL;
TensorShape output_shape;
OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output_tensor));
auto output = output_tensor->scalar<float>();
output(0) = 1.; // *** assign the return value ***
}
};
REGISTER_KERNEL_BUILDER(Name("SparseDeterminant").Device(DEVICE_CPU), SparseDeterminantOp);
Sadly, once the input reaches your op it is a tensorflow::Tensor and no longer has the values and indices associated with tf.SparseTensor, so you can't get them easily.
Once compiled, this code can be run with:
# run.py
import tensorflow as tf
import numpy as np
my_module = tf.load_op_library('./out.so')
# create sparse matrix
a = np.zeros((10, 10))
for i in range(len(a)):
    a[i, i] = i
print(a)
# the op is registered with a float (float32) input
a_t = tf.convert_to_tensor(a, dtype=tf.float32)
with tf.Session() as sess:
    sess.run(my_module.sparse_determinant(a_t))
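The placeholder inside the double loop of the kernel ("add non zero elements to list/tensor of values and their indices") could be filled in along the following lines. This is only a sketch in plain C++: a row-major std::vector<float> stands in for the Eigen matrix view, and the names CooMatrix and dense_to_coo are hypothetical, not part of TensorFlow or Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// COO ("coordinate") form: one (row, col) index pair and one value per nonzero.
struct CooMatrix {
    std::vector<std::int64_t> rows;
    std::vector<std::int64_t> cols;
    std::vector<float> values;
};

// dense holds n_rows * n_cols floats in row-major order.
inline CooMatrix dense_to_coo(const std::vector<float>& dense,
                              std::size_t n_rows, std::size_t n_cols) {
    assert(dense.size() == n_rows * n_cols);
    CooMatrix coo;
    for (std::size_t i = 0; i < n_rows; ++i) {
        for (std::size_t j = 0; j < n_cols; ++j) {
            const float v = dense[i * n_cols + j];
            if (v != 0.0f) {  // keep only the nonzero entries
                coo.rows.push_back(static_cast<std::int64_t>(i));
                coo.cols.push_back(static_cast<std::int64_t>(j));
                coo.values.push_back(v);
            }
        }
    }
    return coo;
}
```

The resulting index and value lists are in the shape a sparse solver typically expects as input. Note that scanning a dense matrix costs time proportional to its full size, so this only recovers the sparse structure and not the memory savings.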

Creating a 3D array in C++ using passed in parameters

I have a function that takes in a void* buffer parameter. This function (which is provided by HDF) reads, as I understand it, data from a dataset into the buffer. I have this working, but only if I create a 3D int array using constant values. I need to be able to do this using values passed in by the user.
Here is the start of that function:
void* getDataTest(int countX, int countY)
{
int NX = countX;
int NY = countY;
int NZ = 1;
int data_out[NX][NY][NZ]; //I know this doesn't work, just posting it for reference
//.
//. more code here...
//.
// Read function is eventually called...
h5Dataset.read(data_out, H5::PredType::NATIVE_INT, memspace, h5Dataspace);
}
This constantly fails on me. However, my previous implementation, which used const int values when creating the data_out array, worked fine:
void* getDataTest(int countX, int countY)
{
const int NX = 5;
const int NY = 5;
const int NZ = 1;
int data_out[NX][NY][NZ];
//.
//. more code here...
//.
// Read function is eventually called...
h5Dataset.read(data_out, H5::PredType::NATIVE_INT, memspace, h5Dataspace);
}
This works fine. From my understanding, this function (which I have no control over) requires dataspaces of the same dimensionality (e.g. a 3D array will only work with a 3D array while a 2D array will only work with a 2D array when copying over the data to the buffer).
So, my key problem here is that I can't seem to figure out how to create a 3D int array that the read function is happy with (the function parameter is a void* but I can't seem to get anything other than a 3d int array to work). I've tried a 3D int array represented as an array of arrays of arrays using:
int*** data_out = new int**[NX];
but this failed as well. Any ideas on how I can create a 3D int array of the form int arrayName[non-constant value][non-constant value][non-constant value]? I know you can't create an array using non-constant values, but I added them in an attempt to clarify my goal. Should there be a way in C++ to use function parameters as values for instantiating an array?
I think the easiest is to do this:
int* data_out = new int[NX * NY * NZ];
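You can then access this 1D array as a 3D array like this:
int value = data_out[z * NX * NY + y * NX + x];
This formula makes x the fastest-varying index; to match the layout of int data_out[NX][NY][NZ], where z varies fastest, use data_out[(x * NY + y) * NZ + z] instead.
In a more C++11 style, you can use a std::vector:
std::vector<int> data_out;
data_out.resize(NX * NY * NZ);
And call the function like this (note data(), which returns a raw pointer, rather than begin(), which returns an iterator):
h5Dataset.read(data_out.data(), H5::PredType::NATIVE_INT, memspace, h5Dataspace);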
Do it like this:
std::vector<int> array;
array.resize(Nx*Ny*Nz);
array[z*Ny*Nx + y*Nx + x] = value;
It's nice to have the array[z][y][x] syntax, but supporting it is more trouble than it is worth.
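The flat-buffer idea from both answers can be wrapped in a small class so the index arithmetic lives in one place. This is a minimal sketch, assuming the buffer should match the memory layout of int data_out[NX][NY][NZ] from the question (last index varies fastest). The class name Array3D and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat buffer that mimics `int a[nx][ny][nz]` (row-major: the last index
// varies fastest), with sizes chosen at run time.
class Array3D {
public:
    Array3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, 0) {}

    // Element access with the same meaning as a[x][y][z].
    int& at(std::size_t x, std::size_t y, std::size_t z) {
        assert(x < nx_ && y < ny_ && z < nz_);
        return data_[(x * ny_ + y) * nz_ + z];
    }

    // Contiguous storage, suitable for an API that takes a void* buffer.
    int* data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t nx_, ny_, nz_;
    std::vector<int> data_;
};
```

With this, the buffer argument of the read call from the question would be the pointer returned by data(), and individual values can be read back afterwards with at(x, y, z).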

VlFeat kdtree setup and query

I've managed to get VlFeat's SIFT implementation working and I'd like to try matching two sets of image descriptors.
SIFT's feature vectors are 128 element float arrays, I've stored the descriptor lists in std::vectors as shown in the snippet below:
std::vector<std::vector<float> > ldescriptors = leftImage->descriptors;
std::vector<std::vector<float> > rdescriptors = rightImage->descriptors;
/* KDTree, dimension 128, 1 tree, L1 comparison metric */
VlKDForest* forest = vl_kdforest_new(VL_TYPE_FLOAT, 128, 1, VlDistanceL1);
/* Build the tree from the left descriptors */
vl_kdforest_build(forest, ldescriptors.size(), ldescriptors.data());
/* Searcher object */
VlKDForestSearcher* searcher = vl_kdforest_new_searcher(forest);
VlKDForestNeighbor neighbours[2];
/* Query the first ten points for now */
for(int i=0; i < 10; i++){
int nvisited = vl_kdforestsearcher_query(searcher, &neighbours, 2, rdescriptors[i].data());
cout << nvisited << neighbours[0].distance << neighbours[1].distance;
}
As far as I can tell that should work, but all I get out for the distances are NaNs. The lengths of the descriptor arrays check out, so there does seem to be data going into the tree. I've plotted the keypoints and they also look reasonable, so the data is fairly sane.
What am I missing?
Rather sparse documentation here (links to the API): http://www.vlfeat.org/api/kdtree.html
What am I missing?
The 2nd argument of vl_kdforestsearcher_query takes a pointer to VlKDForestNeighbor:
vl_size
vl_kdforestsearcher_query(
VlKDForestSearcher *self,
VlKDForestNeighbor *neighbors,
vl_size numNeighbors,
void const *query
);
But here you declared VlKDForestNeighbor neighbours[2]; and then passed &neighbours as the 2nd parameter, which is not correct: your compiler probably issued an incompatible pointer types warning.
Since you declared an array, what you must do instead is either explicitly pass a pointer to the 1st neighbor:
int nvisited = vl_kdforestsearcher_query(searcher, &neighbours[0], 2, rdescriptors[i].data());
Or alternatively let the compiler do it for you:
int nvisited = vl_kdforestsearcher_query(searcher, neighbours, 2, rdescriptors[i].data());
EDIT
There is indeed a second (major) problem related to the way you build the kd-tree with ldescriptors.data().
Here you pass a std::vector<float>* pointer when VLFeat expects a float* contiguous array containing all your data points in row-major order. So what you can do is copy your data into this format:
float *data = new float[128*ldescriptors.size()];
for (unsigned int i = 0; i < ldescriptors.size(); i++)
std::copy(ldescriptors[i].begin(), ldescriptors[i].end(), data + 128*i);
vl_kdforest_build(forest, ldescriptors.size(), data);
// ...
// then, right after `vl_kdforest_delete(forest);`
// do a `delete[] data;`
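As an alternative to the manual new[] / delete[] pair above, the copy can go into a std::vector<float> that releases its memory automatically. A minimal sketch follows; the helper name flatten_rows is hypothetical, and it additionally rejects rows of the wrong length instead of copying past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copies equally sized rows into one contiguous row-major buffer.
// Throws std::invalid_argument if any row does not have exactly `dim` entries.
inline std::vector<float> flatten_rows(const std::vector<std::vector<float>>& rows,
                                       std::size_t dim) {
    std::vector<float> flat;
    flat.reserve(rows.size() * dim);
    for (const std::vector<float>& row : rows) {
        if (row.size() != dim) {
            throw std::invalid_argument("row has wrong dimension");
        }
        flat.insert(flat.end(), row.begin(), row.end());
    }
    assert(flat.size() == rows.size() * dim);
    return flat;
}
```

For the descriptors in the question this would be called with dim = 128, and the pointer returned by data() on the result would be passed to vl_kdforest_build. As with the manual copy, the flat vector has to stay alive until the forest has been deleted.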

What is wrong in my mex file? input/output definition?

I am trying to run a MEX function that I've written in C++ in VS. It compiles successfully in MATLAB but returns the wrong values. I'm fairly sure I'm not reading the 16-by-21 input matrix gammas correctly. Can anybody see what is wrong here?
void fun(double gammas[], int num1, int num2, int length, double a[])
{
...
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *gammas, *a;
int num1, num2, length;
size_t mrows, mcols;
mrows = 4; mcols = 21;
length = 21;
plhs[0] = mxCreateDoubleMatrix((mwSize)mrows, (mwSize)mcols, mxREAL);
gammas = mxGetPr(prhs[0]);
num1 = (int)*mxGetPr(prhs[1]);
num2 = (int)*mxGetPr(prhs[2]);
a = mxGetPr(plhs[0]);
fun(gammas, num1, num2, length, a);
}
I get the correct "a" when I call "fun" from a "main" function in VS and provide the input gammas manually. I get the wrong "a" when I call the resulting MEX file from my MATLAB code.
As suspected in the comments on your question, the issue is due to how MATLAB and C/C++ order array elements for linear storage as a 1D array in memory. MATLAB uses column-major order while C/C++ uses row-major.
I would not advise doing the permutation before calling the MEX function, but rather inside the MEX function: either, as suggested by @chappjc, with a call to permute via mexCallMATLAB, or with a call to mxCalcSingleSubscript, which returns MATLAB's linear index from coordinates (whatever the number of dimensions).
Side note: I need confirmation, and need to find the great article I read about this, but MATLAB uses column-major ordering because it is more appropriate for matrix multiplication (it causes fewer page faults when accessing memory, and is thus faster). Again, this needs confirmation, but at least this organisation is better suited to access by columns than by rows.
Edit
Btw, some simple code (C#) to obtain coordinates from MATLAB's zero-based linear index (the reverse of mxCalcSingleSubscript):
private static int[] getCoordinatesFromMatlabLinearIndex(int index, int[] arrayDims)
{
var count = arrayDims.Length; // must be declared before it is used
var ret = new int[count];
for (var k = 0; k < count; k++)
{
index = Math.DivRem(index, arrayDims[k], out ret[k]);
}
return ret;
}
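For use inside a MEX file, the same mapping can be written in C++. The sketch below is a translation of the C# idea together with its inverse, using plain standard-library types; the function names are hypothetical and no MATLAB headers are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Zero-based column-major linear index -> one zero-based subscript per dimension.
inline std::vector<std::size_t> subscripts_from_linear(
        std::size_t index, const std::vector<std::size_t>& dims) {
    std::vector<std::size_t> subs(dims.size());
    for (std::size_t k = 0; k < dims.size(); ++k) {
        assert(dims[k] > 0);
        subs[k] = index % dims[k];  // the first dimension varies fastest
        index /= dims[k];
    }
    return subs;
}

// Inverse mapping: subscripts -> zero-based column-major linear index.
inline std::size_t linear_from_subscripts(const std::vector<std::size_t>& subs,
                                          const std::vector<std::size_t>& dims) {
    assert(subs.size() == dims.size());
    std::size_t index = 0;
    for (std::size_t k = dims.size(); k-- > 0;) {
        index = index * dims[k] + subs[k];
    }
    return index;
}
```

For the 16-by-21 matrix from the question, linear index 37 maps to row 5, column 2 (both zero-based), since 37 = 2 * 16 + 5.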
As an alternative to inputting a transposed matrix to address the row/column-major discrepancy that CitizenInsane pointed out, you can have the transpose handled inside the MEX file. Use a helper C++ function. You can either write a loop to copy elements, or simply call permute via mexCallMATLAB. Something like the following:
int permute2DMATtoC(mxArray*& matPermuted, const mxArray* mat)
{
mxAssert(mxGetNumberOfDimensions(mat)<=3, "Requires 2D or 3D matrix.");
mxArray *permuteRHSArgs[2];
permuteRHSArgs[0] = const_cast<mxArray*>(mat);
permuteRHSArgs[1] = mxCreateDoubleMatrix(1,3,mxREAL);
mxGetPr(permuteRHSArgs[1])[0] = 2;
mxGetPr(permuteRHSArgs[1])[1] = 1;
mxGetPr(permuteRHSArgs[1])[2] = 3; // supports 2D and 3D
return mexCallMATLAB(1, &matPermuted, 2, permuteRHSArgs, "permute");
}
Use:
mxArray *matPermuted;
permute2DMATtoC(matPermuted, prhs[0]); // matPermuted is MATLAB-managed
double *gammas = (double*)mxGetData(matPermuted);
NOTE: Since matPermuted is managed by MATLAB, you don't need to explicitly destroy it to reclaim resources, but when you are done you can do this if you want:
mxDestroyArray(matPermuted);
For RGB, it may be necessary to convert pixel order (RGB-RGB-RGB-...) to planar order (RRRR...-GGGG...-BBBB...).
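The other option mentioned above, writing a loop to copy elements, could look roughly like this for the 2D case. It is a sketch in plain C++ with a hypothetical function name; the input pointer plays the role of the column-major data that mxGetPr returns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies a rows-by-cols matrix stored in column-major order (MATLAB's layout)
// into a new buffer in row-major order (the layout of a C 2D array).
inline std::vector<double> to_row_major(const double* col_major,
                                        std::size_t rows, std::size_t cols) {
    assert(col_major != nullptr || rows * cols == 0);
    std::vector<double> out(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // element (r, c): column-major offset c*rows + r, row-major offset r*cols + c
            out[r * cols + c] = col_major[c * rows + r];
        }
    }
    return out;
}
```

The copy costs one pass over the data. If the function that consumes the matrix can be changed, indexing the original buffer as gammas[c * rows + r] avoids the copy altogether.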

Import 3D array from MAT-file using C++

I would like to know if there is a way to know the 'z' dimension of a 3D array when reading data from a 'MAT-file' using the MATLAB API. I've implemented a function to load the data from file as follows:
double* importMATFile(const char* i_file)
{
MATFile *pMF;
// open MAT-file
pMF = matOpen(i_file, "r");
// check for file errors
// Matlab Array Data
mxArray *mArrayData;
// Matlab Variable Name
const char* mVarName = NULL;
// read data from file
mArrayData = matGetNextVariable(pMF, &mVarName);
// pointer to mxArray data
double *dataPtr;
dataPtr = (double*) mxGetPr(mArrayData);
// NOTE MATLAB work in COLUMN-MAJOR order
// dimension of the array : rows
int32_t NROWS = mxGetM(mArrayData);
// Right now the z dimension must be known a priori
int32_t NDEPTH = 32;
// dimension of the array : cols
int32_t NCOLS = mxGetN(mArrayData) / NDEPTH;
return dataPtr;
}
I'm stuck on getting the DEPTH value, which I need in order to know the number of columns. I have noticed that the result of mxGetNumberOfDimensions(mArrayData) is 3, so the API knows there are three dimensions.
I believe what you want is mxGetDimensions. It returns a pointer to an array holding the size of each dimension, so if dims is the returned pointer, then for a 3D array dims[0] is the number of rows, dims[1] the number of columns and dims[2] the depth. This works for any number of dimensions, not just 3.