Converting arrayfire array to a numpy array using pybind - c++

I am a complete novice trying to interface some code I wrote in Python with other code written in C++. The gist is that I have a function in my Python code that takes three numpy arrays, does some processing on them, and returns another numpy array. The C++ code I am trying to hook it up to produces three arrayfire arrays. These should be converted to numpy arrays and fed into my Python script, which does the processing and returns the resulting array, which should then be converted back into an arrayfire array. I have very little knowledge of C++ or pybind, so I've basically been working by trial and error combined with ChatGPT. Unfortunately, I've hit a complete wall at the conversion step (converting between arrayfire arrays and numpy arrays). I'm trying the simplest "Hello world" example I can think of (i.e. converting a 2 by 2 arrayfire array to numpy), but when I compile and run this code, I inevitably end up with a segmentation fault. I'm at my wit's end here, so I'm trying to figure out what is going wrong.
Here is my c++ code so far
#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/embed.h>
#include <pybind11/numpy.h>
#include <arrayfire.h>
namespace py = pybind11;
int main() {
af::array arr = af::array({2, 2}, {1, 2, 3, 4}).as(af::dtype::f32);
// Get the data as a pointer
float *h_arr = arr.host<float>();
// Convert the C++ array to a NumPy array
py::array np_arr = py::array(
py::buffer_info(
h_arr,
sizeof(float),
py::format_descriptor<float>::format(),
2,
{2, 2},
{sizeof(float) * 2, sizeof(float)}
)
);
return 0;
}
This code compiles fine, but when I run it I inevitably get a segmentation fault, and being a novice, it's very hard to figure out what is the cause of this.

Related

How does one pass an integer, to a pointer, to an std::array<double, integer>, while satisfying a constant expression?

I have a function noise.cpp which is currently of the form,
double* noise(int* steps, ...)
//some code
std::array<double, *steps> NoiseOut;
//rest of code
Which is being tested and accessed by cppnoise.cpp,
#include <random>
#include <cmath>
#include<stdio.h>
#include <iostream>
#include "noise.h"
main(){
double* out;
int steps = 8;
int* ptr = &steps;
out = noise(ptr, 1, 1000, 0.1, 1, 3);
printf("TEST \n");
std::cout << out[0];
}
With header file,
extern double* noise(int*, double, double, double, double, double);
Previously I accessed the noise.cpp function through Python where the NoiseOut array was initially, double* NoiseOut = new double[steps]; with desirable results, but this method resulted in a memory leak.
Initially I tried deleting the allocated memory. But the function returns NoiseOut, so I am not sure if that's possible? So, instead I found out that the modern way is to use std::array as it comes with some form of garbage collection. If I tried to do,
double* noise(int steps, ...)
std::array<double, steps> NoiseOut;
I was told steps was not a constant expression. I tried every which way of constexpr, const, and static but, with no success. Usually with the same error error: ‘steps’ is not a constant expression. Also, the reason I pass a pointer from cppnoise.cpp to noise.cpp is because I read somewhere that the pointer is easier to work with, later in compile-time? Such that maybe I could convert it to a constant expression? Probably a fever dream.
So, how can I declare an integer value in a program, which I pass to a function, which is usable in an std::array without causing that error?
NOTE: I am very new to c++ and work primarily with SQL, Python, R, and SageMath.
std::array is ill suited for this because you don't know what size you need until the code runs. std::array needs to know the size when it is compiled.
Using new gives a dynamic array size at run time, which is why you could use it before.
If you are concerned about memory leaks (or, really, in general), I suggest using a std::vector instead:
#include <vector>
//...
std::vector<double> NoiseOut;
// reserve() only allocates capacity; size() is still 0, so add elements
// with push_back(), or construct as NoiseOut(*steps) to index directly.
NoiseOut.reserve(*steps);
A std::vector lets you do almost everything a std::array or C array can, though I suggest reading its documentation. Note that std::vector releases its memory automatically when it goes out of scope, just as std::array does.
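As a minimal sketch of that suggestion, the function can return the vector by value, so nothing has to be deleted by the caller. The question elides the real parameter list, so the signature below (steps plus a hypothetical sigma) and the fixed seed are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the asker's noise(): the real parameters are
// elided in the question, so only `steps` and an assumed `sigma` are modelled.
std::vector<double> noise(int steps, double sigma) {
    assert(steps >= 0 && sigma > 0.0);
    std::mt19937 gen(42);  // fixed seed (an assumption), for repeatability
    std::normal_distribution<double> dist(0.0, sigma);

    // The size is chosen at run time, which std::array cannot do.
    std::vector<double> out(static_cast<std::size_t>(steps));
    for (double &x : out) {
        x = dist(gen);
    }
    return out;  // returned by value; the vector frees its own storage
}
```

A caller would write std::vector<double> out = noise(8, 0.1); and read out[0] as before. If a raw double* is still needed at some boundary, out.data() provides one that stays valid for as long as the vector is alive.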

Pass Array of pointers to arrays to function where malloc() will occur

I've been avoiding this situation by running malloc() outside the function, but in reality the function knows how big the arrays need to be and the outside can't know how big the arrays need to be.
What I have: uint8_t *jpg[6], which is six pointers to six jpg compressed images which will be malloc-ed by the code that reads in the files. To put it another way this is an array of six pointers to six arrays of indeterminate size.
I have been trying to figure out how to pass the pointer to the pointers into the function so it can malloc() the memory with the known sizes of the jpg data.
I have tried many things but can't get anything to compile.
My latest attempt looks like this and I don't understand why it doesn't work:
Main code:
...
uint8_t *jpg[6];
int size[6]; // returns the size of the images in bytes.
LoadJPG(&jpg, size);
...
Function:
LoadJPG(uint8_t ***jpg, int *size)
{
...
*jpg = (uint8_t *) malloc(blahblahblah);
...
memcpy(**jpg, *indata, blahblahblah);
...
}
Error points to the function call and function:
error: argument of type "uint8_t *(*)[6]" is incompatible with parameter of type "uint8_t ***"
I'm compiling with gcc 4.9.4
Before C++20 it was formally undefined behaviour to write into malloc'd space without also creating objects in it (C++20 made malloc implicitly create objects of simple types such as uint8_t). You mention you're learning - a good way to learn is to use simple, idiomatic C++ code.
The program could look like:
#include <array>
#include <cstdint>
#include <vector>
void LoadJPG( std::array<std::vector<uint8_t>, 6> &jpgs )
{
jpgs[0].resize(12345);
// use std::copy or memcpy to copy into &jpgs[0][0]
jpgs[1].resize(23456);
// etc.
}
int main()
{
std::array<std::vector<uint8_t>, 6> jpgs;
LoadJPG(jpgs);
}
For those who are confused like I was, the right way to do it with C structures (in case you're using something antiquated like CudaC and don't want to spend all eternity converting C++ structures to C structures) is really pretty obvious and I feel pretty dumb for not realizing it until this morning.
main:
uint8_t *jpg[CAMERAS];
int size[CAMERAS];
GetRawImagesFromCamera(jpg, size);
...
for (int i = 0; i < CAMERAS; i++) free(jpg[i]);
function:
void GetRawImagesFromCamera(uint8_t **jpg, int *size)
...
for (i=0; i < CAMERAS; i++)
{
jpg[i] = (uint8_t *) malloc(size[i]);
memcpy((void *) jpg[i], (void *) buff[i], size[i]);
...
}
...
This works because arrays are passed by a pointer to the first element. I had convinced myself that I needed to pass a pointer to the pointers, but that's exactly what gets passed when you pass an array.
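A compilable sketch of this pattern, under some assumptions: CAMERAS is fixed at 6 as in the question, and LoadImages, FreeImages, src, and len are hypothetical names standing in for the elided camera code. Every slot gets its own malloc'd copy of the same source bytes, just to show the pointer plumbing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr int CAMERAS = 6;

// Hypothetical loader: the array `jpg` decays to uint8_t**, so writing
// jpg[i] here updates the caller's array. Returns false if malloc fails.
bool LoadImages(std::uint8_t **jpg, int *size,
                const std::uint8_t *src, int len) {
    assert(jpg && size && src && len > 0);
    for (int i = 0; i < CAMERAS; i++) {
        jpg[i] = nullptr;  // so a partial failure can still be freed safely
        size[i] = 0;
    }
    for (int i = 0; i < CAMERAS; i++) {
        jpg[i] = static_cast<std::uint8_t *>(
            std::malloc(static_cast<std::size_t>(len)));
        if (jpg[i] == nullptr) return false;
        std::memcpy(jpg[i], src, static_cast<std::size_t>(len));
        size[i] = len;
    }
    return true;
}

// Each buffer was allocated separately, so each is freed separately.
void FreeImages(std::uint8_t **jpg) {
    for (int i = 0; i < CAMERAS; i++) {
        std::free(jpg[i]);  // free(nullptr) is a no-op
        jpg[i] = nullptr;
    }
}
```

The caller declares uint8_t *jpg[CAMERAS]; and int size[CAMERAS]; then calls LoadImages(jpg, size, src, len) with no & on jpg, and FreeImages(jpg) when done.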

Error in using max function with Armadillo sparse matrices

Here is the code that gives me a type-mismatch error on the line with max:
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include<armadillo>
using namespace std;
using namespace arma;
int main(int argc, char** argv) {
umat loc;
loc<<0<<0<<3<<endr
<<2<<4<<4<<endr;
vec val={1,2,3};
sp_mat m(loc,val);
double t=arma::max(sum(square(m),1)) + 1.0;
cout<<t<<endl;
return 0;
}
Can somebody tell me why that error is happening and how to get around it?
Note: cout<<max(sum(square(m),1)) prints the result to the console, but adding any number to the output flags an error.
If you want to convert a 1x1 matrix into a pure scalar (like double), use the as_scalar() function. Same goes for any Armadillo expression that results in a 1x1 matrix.
It's a good idea to read the Armadillo documentation thoroughly before posting questions on Stackoverflow.
Modifying your example:
umat loc = { { 0, 0, 3 },
{ 2, 4, 4 } };
vec val = {1, 2, 3};
sp_mat m(loc,val);
m.print("m:");
max(sum(square(m),1)).print("expression:");
double t = as_scalar( max(sum(square(m),1)) );
cout << t << endl;
You haven't told us (and I can't find in the documentation) exactly what data type is returned by arma::max(sum(square(m),1))
You have tested that whatever it is does not implicitly convert to double and whatever it is can be sent to a stream and when that is done it looks like a double.
My guess is it is something that can be explicitly converted to double so try:
(double)arma::max(sum(square(m),1)) + 1.0
The documentation shows the returned value for a dense matrix being used to initialize a double, so that is obviously a type that can be explicitly converted to double. I had initially missed the thing you linked for me, which effectively says that sum does something on a sparse matrix compatible with what it does on a dense one. So you can almost conclude (rather than just guess) that max(sum(m)) should be the same type (explicitly convertible to double).
If that doesn't work, we will really need a full quote of the error message, not just a summary of what it seems to mean.
Now that we have an error message, we can see this is a flaw in Armadillo's template metaprogramming:
Operations are stacked in template meta programming in order to avoid creating excess temporary objects. Then the meta programming must resolve the whole mess when the result is used.
If this is a minor flaw in the meta programming, you could add just one trivial temporary to fix it:
double t = arma::max(sum(square(m),1));
cout << t+1.0 << endl;
But you probably already tried that. So you may need more temporaries and you probably need to give them exact correct types (rather than use auto). My first guess would be:
colvec v = sum(square(m),1);
Then see what works with arma::max(v)
(Earlier I made a negative comment on an answer that suggested starting with auto temporaries for each step. That answer was deleted. It wasn't far wrong. But I'd still say it was wrong to start there without seeing the template meta-programming failures and likely, though I'm not sure, wrong to use auto to try to bypass a meta-programming failure.)

Using vectors with GMP

I'm trying to use vectors with GMP. But when I compile anything like this, I get "[...]\bits\vector.tcc [Error] array must be initialized with a brace-enclosed initializer". Any data structure with dynamic size works - a deque would be best but I had even more errors popping up when I tried that. How do I make this stop failing?
#include <vector>
#include <gmp.h>
int main(){
mpz_t test;
mpz_init(test);
std::vector<mpz_t> a_vector;
a_vector.push_back(test);
return 0;
}
Since GMP numbers are not directly assignable (in other words, you can't do mpz_t test = 0; or mpz_t test1; test1 = test;), I don't believe they can be used in standard C++ container types.
If you want to do that, you may want to use the C++ interface for GMP instead:
https://gmplib.org/manual/C_002b_002b-Interface-General.html

Using "extern C" from C++ with vector for ctypes

My goal is to use ctypes to call from within Python a C++ library I'm creating separately, where I am passing numpy vectors to the C++ routines via pointers. So Python would allocate the memory and just pass the address, and then the C++ routine would access the data and perform its calculations and return the results.
Since I'm new to ctypes, right now I'm gradually building a working toy example. I'm starting off writing the C++ code and creating a C extern interface that the Python wrapper code will use. I haven't even started on the Python wrapper code yet but have some experience with that aspect already.
Here is my example which includes a constructor and a simple total function.
// foo.cpp //
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class Data {
public:
Data(vector<double>*);
void PrintTotal(void);
private:
vector<double>* myContents;
};
Data::Data(vector<double>* input) {
std::cout << "Entered constructor in C++" << std::endl;
myContents = input;
}
void Data::PrintTotal(void) {
double total = 0;
for (int i=0; i<myContents->size(); i++){
total += (*myContents)[i];
}
std::cout << "Hello, my total is " << total << std::endl;
}
extern "C" {
Data* Data_new(double (**input)) {return new Data(input);}
void Print_Total(Data* pData) {pData->PrintTotal();}
}
Note in particular that I'm using the vector class from the STL, which may be part of the problem described below. The idea is that the class Data holds a pointer to the data and does not replicate the data.
When I try to compile this using the command
g++ -c -fPIC foo.cpp -o foo.o
on our Linux machine, I get the following error:
foo.cpp: In function ‘Data* Data_new(double**)’:
foo.cpp:26: error: no matching function for call to ‘Data::Data(double**&)’
foo.cpp:13: note: candidates are: Data::Data(std::vector<double, std::allocator<double> >*)
foo.cpp:6: note: Data::Data(const Data&)
This seems pretty clear to me: it's saying that the way I call the Data constructor in the third to last line (#26) in foo.cpp does not coincide with the constructor I wrote in the C++ part of the code. In both the C++ code and the C extern code, I'm trying to say that the input is a pointer to a vector/array of doubles. I've tried other options on line #26 but it still doesn't compile.
How do I write the extern C part while still using the vector class (which I will find useful when I code the real algorithm) in the C++ part? Am I running into trouble because vector is part of the STL, which doesn't work well with extern C?
Thanks in advance.
First, let's answer your question. You should change your constructor so that it takes the same arguments as Data_new, and converts the input into a std::vector. However, it looks like there are a couple of conceptual issues that you need to focus on first:
Arrays and vectors are not equivalent: if your input points to an array, but you want to work with a vector, you'll need to copy all of the elements from the array into the vector (or ask a vector constructor to do it for you). You need to do the copy because all STL containers hold onto copies of the elements that are put in them.
You can construct the vector using two iterators, with:
myContents(data, data + numberOfElements);
But this brings up the second conceptual point --- there is nothing in the signature of your code that tells you how large the input array is.
In Data you're holding onto a pointer to a vector, but you never free it. This is a memory leak. That being said, you should probably just hold onto a vector, not a pointer to a vector.
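Putting these points together, one possible shape for the interface is sketched below. It is only an illustration: the explicit length parameter n, the Total() method, and the Data_total and Data_delete functions are assumptions added here, not part of the original code. Note that it copies the input, which differs from the question's stated goal of not replicating the data.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

class Data {
public:
    // Copies n doubles from a C array into an owned std::vector,
    // using the two-iterator constructor mentioned above.
    Data(const double *input, std::size_t n) : myContents(input, input + n) {}

    double Total() const {
        return std::accumulate(myContents.begin(), myContents.end(), 0.0);
    }

private:
    std::vector<double> myContents;  // held by value, not by pointer
};

extern "C" {
// Pointer plus explicit length: the signature now says how big the array is.
Data *Data_new(const double *input, std::size_t n) {
    assert(input != nullptr || n == 0);
    return new Data(input, n);
}

double Data_total(const Data *pData) { return pData->Total(); }

// The caller must hand the object back so it can be deleted.
void Data_delete(Data *pData) { delete pData; }
}
```

Only plain pointers and integers cross the extern "C" boundary; std::vector stays inside the C++ side. From ctypes, the argument types for Data_new would be POINTER(c_double) and c_size_t.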