Modifying a C++ array in main() from Lua without extra allocation - c++

I am sketching a small C++ program that will pass arrays to Lua and have them modified there. The idea is that the program reads a Lua script at run time, so I can change its behaviour without recompiling.
My first obstacle is to ensure Lua is able to modify the arrays already allocated in C++, instead of having them allocated again in the Lua space. The data will be float and the size will be really large, but I am starting small for the moment.
To simplify this interface I tried LuaBridge 2.6, but it doesn't produce the expected result. Below is a fully "working" program.
#include <iostream>
#include <cstdint>
#include <cstring>
#include <vector>
#include <lua5.3/lua.hpp>
#include <LuaBridge/LuaBridge.h>
int main(void)
{
    const uint32_t LENGTH = 512 * 256;
    std::vector<float> input(LENGTH), output(LENGTH);
    memset(output.data(), 0, LENGTH * sizeof(float)); // Zero the output
    for(uint32_t i = 0; i < LENGTH; i++) // Populate input
        input[i] = (float)i + 0.5f;
    lua_State *luastate = luaL_newstate();
    luabridge::push(luastate, input.data()); // Supposedly passing a pointer to the first element of input, according to LuaBridge manual chap 3-3.1
    luabridge::push(luastate, output.data()); // Same for output
    luaL_dostring(luastate, "output[10] = input[256]"); // Expecting to assign this value in the C++ arrays, not in the Lua space
    lua_getglobal(luastate, "output[10]"); // Find this assigned value in the Lua stack
    lua_Number val = lua_tonumber(luastate, 1); // Retrieving this value from Lua to C++
    std::cout << input[256] << ", " << output[10] << ", " << val << std::endl; // The values of val and output[10] don't match
    lua_close(luastate);
    return 0;
}
Notice that nothing matches. What ends up in output[10] on the Lua side is not the value of input[256] in the C++ space, but input[0].
The C++ output array is not updated from within Lua; cout shows that it remains as we initialized it (0).
To confirm that, we pushed the value of output[10] onto the stack (which is not input[256] in C++) and retrieved it from C++.
Can you correct me or point me to where I should be going to achieve this?
======= UPDATE 08/11/2020 =======
To clarify what the program is doing (or is supposed to do), after reading Robert's and Joseph's considerations, I post below an updated version of both the C++ part and the Lua script it calls. Notice I abandoned LuaBridge, since I didn't succeed in my first attempt:
C++:
#include <iostream>
#include <cstdint>
#include <cstring>
#include <vector>
#include <luajit-2.0/lua.hpp> // LuaJIT 2.0.4 from Ubuntu 16.04
int main(void)
{
    const uint32_t LENGTH = 256 * 512;
    std::vector<float> input(LENGTH), output(LENGTH);
    memset(output.data(), 0, LENGTH * sizeof(float));
    for(uint32_t i = 0; i < LENGTH; i++)
        input[i] = (float)i + 0.5f;
    lua_State *luastate = luaL_newstate();
    luaL_openlibs(luastate);
    // Here I have to pass &input[0], &output[0] and LENGTH
    // to Lua, which in turn will pass to whatever functions
    // are being called from a .so lib opened in Lua-side
    luaL_dofile(luastate, "my_script.lua");
    lua_close(luastate);
    return 0;
}
The Lua script looks like this:
local ffi = require("ffi")
local mylib = ffi.load("/path_to_lib/mylib.so")
-- Here I import and call any functions needed from mylib.so
-- without needing to recompile anything, just change this script
-- At this point the script has to know &input[0], &output[0] and LENGTH
ffi.cdef[[int func1(const float *in, float *out, const uint32_t LEN);]]
ffi.cdef[[int func2(const float *in, float *out, const uint32_t LEN);]]
ffi.cdef[[int funcX(const float *in, float *out, const uint32_t LEN);]]
if (mylib.func1(input, output, LENGTH) == 0) then
    print("Func1 ran successfully.")
else
    print("Func1 failed.")
end

I am sketching a small C++ program that will pass arrays to Lua
The data will be float and the size will be really large,
My suggestion:
Keep the buffer on the C side (as a global variable, for example)
Expose a C function to Lua: GetTableValue(Index)
Expose a C function to Lua: SetTableValue(Index, Value)
It should be something like this:
static int LUA_GetTableValue(lua_State *LuaState)
{
    float Value;
    int Offset;
    /* lua_gettop returns the number of arguments */
    if ((lua_gettop(LuaState) == 1) && (lua_isinteger(LuaState, -1)))
    {
        /* Get the requested index (first parameter) */
        Offset = lua_tointeger(LuaState, -1);
        /* Get table value */
        Value = LUA_FloatTable[Offset];
        /* Push result to the stack */
        lua_pushnumber(LuaState, Value);
    }
    else
    {
        lua_pushnil(LuaState);
    }
    /* return 1 value */
    return 1;
}
And you also need to register the function:
lua_register(LuaState, "GetTableValue", LUA_GetTableValue);
I'll leave SetTableValue to you, but it should be very similar.
Doing so, the buffer stays on the C side and can be accessed from Lua through dedicated functions.
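For completeness, here is a minimal sketch of the companion setter, under the same assumptions as the getter above (the global buffer LUA_FloatTable and its length LUA_FloatTableLen are illustrative names, not part of any real API):

```cpp
/* Sketch only: LUA_FloatTable / LUA_FloatTableLen are the same hypothetical
 * globals used by LUA_GetTableValue above. */
static int LUA_SetTableValue(lua_State *LuaState)
{
    /* expects exactly two arguments: SetTableValue(Index, Value) */
    if ((lua_gettop(LuaState) == 2) &&
        (lua_isinteger(LuaState, -2)) &&
        (lua_isnumber(LuaState, -1)))
    {
        int Offset = (int)lua_tointeger(LuaState, -2);
        if ((Offset >= 0) && (Offset < LUA_FloatTableLen))
            LUA_FloatTable[Offset] = (float)lua_tonumber(LuaState, -1);
    }
    /* return no values */
    return 0;
}
```

Register it the same way: lua_register(LuaState, "SetTableValue", LUA_SetTableValue);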

I recommend you create a userdata that exposes the arrays via __index and __newindex, something like this (written as a C and C++ polyglot like Lua itself):
#include <stdio.h>
#include <string.h>
#ifdef __cplusplus
extern "C" {
#endif
#include <lua5.3/lua.h>
#include <lua5.3/lauxlib.h>
#ifdef __cplusplus
}
#endif
struct MyNumbers {
    lua_Number *arr;
    lua_Integer len;
};

int MyNumbers_index(lua_State *L) {
    struct MyNumbers *t = (struct MyNumbers *)luaL_checkudata(L, 1, "MyNumbers");
    lua_Integer k = luaL_checkinteger(L, 2);
    if(k >= 0 && k < t->len) {
        lua_pushnumber(L, t->arr[k]);
    } else {
        lua_pushnil(L);
    }
    return 1;
}

int MyNumbers_newindex(lua_State *L) {
    struct MyNumbers *t = (struct MyNumbers *)luaL_checkudata(L, 1, "MyNumbers");
    lua_Integer k = luaL_checkinteger(L, 2);
    if(k >= 0 && k < t->len) {
        t->arr[k] = luaL_checknumber(L, 3);
        return 0;
    } else {
        return luaL_argerror(L, 2,
            lua_pushfstring(L, "index %d out of range", k));
    }
}

struct MyNumbers *MyNumbers_new(lua_State *L, lua_Number *arr, lua_Integer len) {
    struct MyNumbers *var = (struct MyNumbers *)lua_newuserdata(L, sizeof *var);
    var->arr = arr;
    var->len = len;
    luaL_setmetatable(L, "MyNumbers");
    return var;
}

int main(void) {
    const lua_Integer LENGTH = 512 * 256;
    lua_Number input[LENGTH], output[LENGTH];
    memset(output, 0, sizeof output);
    for(lua_Integer i = 0; i < LENGTH; ++i)
        input[i] = i + 0.5f;
    lua_State *L = luaL_newstate();
    luaL_newmetatable(L, "MyNumbers");
    lua_pushcfunction(L, MyNumbers_index);
    lua_setfield(L, -2, "__index");
    lua_pushcfunction(L, MyNumbers_newindex);
    lua_setfield(L, -2, "__newindex");
    /* exercise for the reader: implement __len and __pairs too,
       and maybe shift the indices so they're 1-based to Lua */
    lua_pop(L, 1);
    MyNumbers_new(L, input, LENGTH);
    lua_setglobal(L, "input");
    MyNumbers_new(L, output, LENGTH);
    lua_setglobal(L, "output");
    luaL_dostring(L, "output[10] = input[256]");
    lua_getglobal(L, "output");
    lua_geti(L, -1, 10);
    lua_Number val = lua_tonumber(L, -1);
    printf("%f, %f, %f\n", input[256], output[10], val);
    lua_close(L);
}
With this approach, there is no copy of any data in Lua, and your own MyNumbers_ functions control how all access to them is done.
If you want to be able to use the arrays through LuaJIT's FFI instead of directly manipulating them in Lua, then you can pass their addresses in a light userdata instead, like this:
#include <string.h>
#ifdef __cplusplus
extern "C" {
#endif
#include <luajit-2.0/lua.h>
#include <luajit-2.0/lualib.h>
#include <luajit-2.0/lauxlib.h>
#ifdef __cplusplus
}
#endif
int main(void) {
    const lua_Integer LENGTH = 256 * 512;
    lua_Number input[LENGTH], output[LENGTH];
    memset(output, 0, sizeof output);
    for(lua_Integer i = 0; i < LENGTH; ++i)
        input[i] = i + 0.5f;
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    lua_pushlightuserdata(L, input);
    lua_setglobal(L, "input");
    lua_pushlightuserdata(L, output);
    lua_setglobal(L, "output");
    lua_pushinteger(L, LENGTH);
    lua_setglobal(L, "LENGTH");
    luaL_dofile(L, "my_script.lua");
    lua_close(L);
}

Related

How to pass userdata from one Lua chunk to another in C++

I'm trying to get userdata from a Lua script (chunk A) into C++ (through a returned variable from a function, in my example) and then, later, pass this userdata back to a Lua script (chunk B) from C++ (through a function argument, in my example), so the userdata can be used in chunk B as it was in chunk A.
MyBindings.h
class Vec2
{
public:
    Vec2() : x(0), y(0) {};
    Vec2(float x, float y) : x(x), y(y) {};
    float x, y;
};
MyBindings.i
%module my
%{
#include "MyBindings.h"
%}
%include "MyBindings.h"
main.cpp
#include <iostream>
#include <lua.hpp>
extern "C"
{
int luaopen_my(lua_State *L);
}
int main()
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    luaopen_my(L);
    lua_settop(L, 0);
    /* chunk A */
    luaL_dostring(L, "local vec2 = my.Vec2(3, 4)\n"
                     "function setup()\n"
                     "return vec2\n"
                     "end\n");
    /* chunk B */
    luaL_dostring(L, "function test(p)\n"
                     "print(p.x)\n"
                     "end\n");
    void *userDataPtr = nullptr;
    /* call setup function */
    int top = lua_gettop(L);
    lua_getglobal(L, "setup");
    if (lua_pcall(L, 0, LUA_MULTRET, 0))
    {
        std::cout << lua_tostring(L, -1) << '\n';
        lua_pop(L, 1);
    }
    /* check the return value */
    if (lua_gettop(L) - top)
    {
        /* store userdata to a pointer */
        if (lua_isuserdata(L, -1))
            userDataPtr = lua_touserdata(L, -1);
    }
    /* check if userDataPtr is valid */
    if (userDataPtr != nullptr)
    {
        /* call test function */
        lua_getglobal(L, "test");
        lua_pushlightuserdata(L, userDataPtr); /* pass userdata as an argument */
        if (lua_pcall(L, 1, 0, 0))
        {
            std::cout << lua_tostring(L, -1) << '\n';
            lua_pop(L, 1);
        }
    }
    lua_close(L);
}
The result I get:
[string "local vec2 = my.Vec2(3, 4)..."]:6: attempt to index a userdata value (local 'p')
The result I expect:
3
Is it possible to get userdata from chunk A and then pass this to chunk B so it can be used like it was in chunk A?
You're losing all information about the object's type when you take the raw pointer to the userdata's data and push it back as a light userdata argument. A light userdata doesn't even have an individual metatable.
The correct way is to pass the Lua value as it is. Leave the original returned value on the Lua stack, or copy it into another Lua container (your own Lua table for temporaries, or the Lua registry), then copy that value onto the Lua stack to pass it as an argument. That way you don't have to know anything about the binding implementation. You don't even have to care whether it's a userdata or any other Lua type.
Based on your code, this might look like this:
#include <iostream>
#include <cstdlib>
#include <lua.hpp>
extern "C"
{
int luaopen_my(lua_State *L);
}
int main()
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    /* chunk A */
    luaL_dostring(L, "local vec2 = {x=3, y=4}\n"
                     "function setup()\n"
                     "return vec2\n"
                     "end\n");
    /* chunk B */
    luaL_dostring(L, "function test(p)\n"
                     "print(p.x)\n"
                     "end\n");
    /* call setup function */
    int top = lua_gettop(L);
    lua_getglobal(L, "setup");
    if (lua_pcall(L, 0, LUA_MULTRET, 0))
    {
        std::cout << lua_tostring(L, -1) << '\n';
        lua_pop(L, 1);
        exit(EXIT_FAILURE); // simply fail for the demo
    }
    /* check the return value */
    if (lua_gettop(L) - top)
    {
        // the top now contains the value returned from setup()
        /* call test function */
        lua_getglobal(L, "test");
        // copy the original value as the argument
        lua_pushvalue(L, -2);
        if (lua_pcall(L, 1, 0, 0))
        {
            std::cout << lua_tostring(L, -1) << '\n';
            lua_pop(L, 1);
            exit(EXIT_FAILURE);
        }
        // drop the original value
        lua_pop(L, 1);
    }
    else
    {
        // nothing is returned, nothing to do
    }
    lua_close(L);
}
In addition to the other answer, I would like to show a variant where you store a reference to the value in the Lua registry. The advantage of this approach is that you don't have to keep the value on the stack and think about what the offset will be. See also 27.3.2 – References in “Programming in Lua”.
This approach uses three functions:
int luaL_ref (lua_State *L, int t);
Pops the topmost value from the stack, stores it into the table at index t, and returns the index the value now has in that table. Hence, to save a value in the registry we use
userDataRef = luaL_ref(L, LUA_REGISTRYINDEX);
int lua_rawgeti (lua_State *L, int index, lua_Integer n);
Pushes onto the stack the value of element n of the table at the given index (t[n] in Lua). Hence, to retrieve the value at index userDataRef from the registry we use
lua_rawgeti(L, LUA_REGISTRYINDEX, userDataRef);
void luaL_unref (lua_State *L, int t, int ref);
Removes the reference stored at index ref in the table at t, so that the referenced value can be garbage collected and the index ref can be reused. Hence, to remove the reference userDataRef from the registry we use
luaL_unref(L, LUA_REGISTRYINDEX, userDataRef);
#include <iostream>
#include <lua.hpp>
extern "C" {
int luaopen_my(lua_State *L);
}
int main() {
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    luaopen_my(L);
    lua_settop(L, 0);
    /* chunk A */
    luaL_dostring(L, "local vec2 = my.Vec2(3, 4)\n"
                     "function setup()\n"
                     "return vec2\n"
                     "end\n");
    /* chunk B */
    luaL_dostring(L, "function test(p)\n"
                     "print(p.x)\n"
                     "end\n");
    int userDataRef = LUA_NOREF;
    /* call setup function */
    int top = lua_gettop(L);
    lua_getglobal(L, "setup");
    if (lua_pcall(L, 0, LUA_MULTRET, 0)) {
        std::cout << lua_tostring(L, -1) << '\n';
        lua_pop(L, 1);
    }
    /* check the return value */
    if (lua_gettop(L) - top) {
        /* store a reference to the returned value in the registry */
        userDataRef = luaL_ref(L, LUA_REGISTRYINDEX);
    }
    /* check if userDataRef is valid */
    if (userDataRef != LUA_NOREF && userDataRef != LUA_REFNIL) {
        /* call test function */
        lua_getglobal(L, "test");
        lua_rawgeti(L, LUA_REGISTRYINDEX, userDataRef);
        /* free the registry slot (if you are done) */
        luaL_unref(L, LUA_REGISTRYINDEX, userDataRef);
        if (lua_pcall(L, 1, 0, 0)) {
            std::cout << lua_tostring(L, -1) << '\n';
            lua_pop(L, 1);
        }
    }
    lua_close(L);
}
You may also want to check out the Sol2 wrapper for the Lua C API. It can do exactly what you want with minimal boilerplate. However, it requires C++14.
#include <iostream>
#define SOL_CHECK_ARGUMENTS 1
#include <sol.hpp>
extern "C" int luaopen_my(lua_State *L);
int main() {
    sol::state L;
    L.open_libraries();
    luaopen_my(L);
    /* chunk A */
    L.script("local vec2 = my.Vec2(3, 4)\n"
             "function setup()\n"
             "return vec2\n"
             "end\n");
    /* chunk B */
    L.script("function test(p)\n"
             "print(p.x)\n"
             "end\n");
    auto userDataRef = L["setup"]();
    L["test"](userDataRef);
}

How to use dynamic MPI_Type_create with MPI_Bcast?

So far I've been using openmpi/1.10.2 with gcc/5.3.0, and my code has been working fine.
The cluster I'm working on changed its MPI implementation to cray-mpich/7.5.0 with gcc/5.3.0, and I found the following error.
In the debugger, the local variables (idx, displ, blocks, and types) show as <optimized out>. All the arrays are preallocated with size == 2.
#include<mpi.h>
#include<vector>
#include<iostream>
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    int size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // passing the references
    MPI_Comm_size(MPI_COMM_WORLD, &size); // passing the references
    std::vector<int> mIntegers(0);
    std::vector<double> mFloats(2);
    if (rank == 0)
    {
        mFloats[0] = 1.0;
        mFloats[1] = 1.0;
    }
    int ioRank = 0;
    int nBlocks = 0;
    if (mIntegers.size() > 0)
    {
        nBlocks++;
    }
    if (mFloats.size() > 0)
    {
        nBlocks++;
    }
    int idx = 0;
    MPI_Aint displ[nBlocks];
    int blocks[nBlocks];
    MPI_Datatype types[nBlocks];
    MPI_Aint element;
    // Create integer part
    if (mIntegers.size() > 0)
    {
        MPI_Get_address(mIntegers.data(), &element);
        displ[idx] = element;
        blocks[idx] = mIntegers.size();
        types[idx] = MPI_INT;
        idx++;
    }
    // Create floats part
    if (mFloats.size() > 0)
    {
        MPI_Get_address(mFloats.data(), &element);
        displ[idx] = element;
        blocks[idx] = mFloats.size();
        types[idx] = MPI_DOUBLE;
        idx++;
    }
    MPI_Datatype paramType;
    // Create MPI datatype
    MPI_Type_create_struct(nBlocks, blocks, displ, types, &paramType);
    // Commit MPI datatype
    MPI_Type_commit(&paramType);
    // Broadcast the information
    MPI_Bcast(MPI_BOTTOM, 1, paramType, ioRank, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    std::cout << "Process:" << rank << " of " << size << " F[0] " << mFloats[0] << ", F[1] " << mFloats[1] << std::endl;
    // Free the datatype
    MPI_Type_free(&paramType);
    MPI_Finalize();
    return 0;
}
I've tried initialising the arrays with new, setting them to zero, and using std::vector, to avoid over-optimisation or memory leaks, without any success.
The code is compiled with:
$mpic++ -O2 segFault.cpp -o segFault
and executed:
$mpirun -n 16 segFault
As a result, MPI_Bcast leads to a segmentation fault due to a mismatch in memory allocation.
MPICH defines MPI_BOTTOM and MPIR_F08_MPI_BOTTOM as
#define MPI_BOTTOM (void *)0
extern int MPIR_F08_MPI_BOTTOM;
whereas Open MPI defines MPI_BOTTOM as
#define MPI_BOTTOM ((void *) 0) /* base reference address */
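For what it's worth, a hedged, untested sketch of the usual portable alternative: make the displacements relative to the first block's address with MPI_Aint_diff (MPI-3), and pass that base pointer to MPI_Bcast instead of MPI_BOTTOM, so no absolute addresses are involved. The snippet below assumes the floats-only case from the code above (mIntegers empty, so nBlocks == 1), and reuses the question's displ/blocks/types/paramType variables:

```cpp
// Sketch only: assumes nBlocks == 1 and only mFloats is non-empty, as in the
// question's run. With more blocks, keep the first block's address as base.
MPI_Aint base, addr;
MPI_Get_address(mFloats.data(), &base);
MPI_Get_address(mFloats.data(), &addr);
displ[0] = MPI_Aint_diff(addr, base);   // relative displacement: 0
blocks[0] = mFloats.size();
types[0] = MPI_DOUBLE;
MPI_Type_create_struct(nBlocks, blocks, displ, types, &paramType);
MPI_Type_commit(&paramType);
// Broadcast from the base pointer instead of MPI_BOTTOM:
MPI_Bcast(mFloats.data(), 1, paramType, ioRank, MPI_COMM_WORLD);
```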

reading data in C++

I have the following MATLAB code to read binary data:
nfft = 256;
navg = 1024;
nsamps = navg * nfft;
f_s = 8e6;
nblocks = floor(10 / (nsamps / f_s));
for i = 1:nblocks
    nstart = 1 + (i - 1) * nsamps;
    fid = fopen('data.dat'); % binary data, 320 MB
    fseek(fid, 4 * nstart, 'bof');
    y = fread(fid, [2, nsamps], 'short');
    x = complex(y(1,:), y(2,:));
end
It gives me complex data with length up to 8e6.
I am trying to write C++ code that does what the MATLAB code does, but I could not get all the data, or it doesn't match the original.
Can anyone help with ideas?
Here is the C++ code I am working on.
Thank you so much.
#include <cstdio>
#include <cstring>
#include <iostream>
#include <complex>
#include <vector>
#include <stdlib.h>
struct myfunc {
    char* name;
};
int main() {
    FILE* r = fopen("data.bin", "rb");
    fread(w, sizeof(int), 30, r);
    fread(&c, sizeof(myfunc), 1, r);
    for(int i = 0; i < 30; i++){
        cout << i << ". " << w[i] << endl;
    }
    return 0;
}
Based on a comment:
the c I called is from the struct myfunc, and the w is the vector; so they will be: int w[40]; myfunc c;
fread(&c, sizeof(myfunc), 1, r);
will read one pointer's worth of data from file stream r into c. This will not be particularly useful, as whatever address myfunc.name pointed at when the file was written will almost certainly be invalid when the file is read back.
Solution: serialize myfunc.name when writing to the file and deserialize it when reading. There is insufficient information in the question to say how best to do this. I would store the string Pascal-style and prepend the length of myfunc.name to make reading it back easier:
int len = strlen(myfunc.name);
fwrite(&len, sizeof(len), 1, outfile); // write length
fwrite(myfunc.name, len, 1, outfile);  // write string
and read it back:
int len;
fread(&len, sizeof(len), 1, infile);   // read length
myfunc.name = new char[len + 1];       // size string with space for terminator
fread(myfunc.name, len, 1, infile);    // read string
myfunc.name[len] = '\0';               // terminate string
Note that the above code completely ignores endianness and error handling.
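As for the original MATLAB loop itself: a hedged sketch of the equivalent C++ read, assuming the file layout implied by fread(fid, [2, nsamps], 'short') — interleaved 16-bit (real, imag) pairs, 4 bytes per sample. The function name read_iq is illustrative:

```cpp
#include <complex>
#include <cstdint>
#include <fstream>
#include <vector>

// Read `count` interleaved int16 (real, imag) pairs starting at pair offset
// `nstart`, mirroring the MATLAB fseek(fid, 4*nstart, 'bof') / fread / complex calls.
std::vector<std::complex<float>> read_iq(const char *path,
                                         std::size_t nstart,
                                         std::size_t count)
{
    std::ifstream f(path, std::ios::binary);
    f.seekg(static_cast<std::streamoff>(nstart) * 2 * sizeof(int16_t));
    std::vector<int16_t> raw(count * 2);
    f.read(reinterpret_cast<char *>(raw.data()),
           raw.size() * sizeof(int16_t));
    std::vector<std::complex<float>> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.emplace_back(raw[2 * i], raw[2 * i + 1]); // (real, imag)
    return out;
}
```

Note this assumes the file was written on a little-endian machine (as the MATLAB default reads it); add byte swapping if that is not the case.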

Using extern on Halide with GPU

I am trying to use an extern function in Halide. In my context, I want to run it on the GPU.
I compile AOT with the opencl target.
Of course, OpenCL can still use the CPU, so I use this:
halide_set_ocl_device_type("gpu");
For now, everything is scheduled with compute_root().
First question: if I use compute_root() and the OpenCL GPU, will my pipeline be computed on the device, with the corresponding host-to-device and device-to-host copies? (Or will it stay in the host buffer?)
Second question, more related to the extern functions: we use some extern calls because some of our algorithms are not available in Halide.
Extern call:
foo.define_extern("cool_foo", args, Float(32), 4);
Extern retrieve:
extern "C" int cool_foo(buffer_t * in, int w, int h, int z, buffer_t * out){ .. }
But in the cool_foo function, my buffer_t is only backed by host memory. The dev address is 0 (the default).
If I try to copy the memory before the algorithm:
halide_copy_to_dev(NULL, &in);
it does nothing.
If I make only the device memory available:
in.host = NULL;
my host pointer is null, but the device address is still 0.
(dev_dirty is true in my case and host_dirty is false.)
Any idea?
EDIT (To answer dsharlet)
Here's the structure of my code:
Parse data correctly on the CPU --> send the buffers to the GPU (using halide_copy_to_dev...) --> enter the Halide pipeline, read parameters and add a boundary condition --> go into my extern function --> ...
I don't have a valid buffer_t in my extern function.
I schedule everything with compute_root(), but use HL_TARGET=host-opencl and set the OpenCL device type to gpu.
Before entering Halide, I can read my device address and it's OK.
Here's my code:
Before Halide, everything was CPU data (the pointers), and we transfer it to the GPU:
buffer_t k = { 0, (uint8_t *) k_full, {w_k, h_k, num_patch_x * num_patch_y * 3}, {1, w_k, w_k * h_k}, {0}, sizeof(float), };
#if defined( USEGPU )
    // Transfer into GPU
    halide_copy_to_dev(NULL, &k);
    k.host_dirty = false;
    k.dev_dirty = true;
    //k.host = NULL; // It's k_full
#endif
halide_func(&k);
Inside Halide:
ImageParam ...
Func process;
process = halide_sub_func(k, width, height, k.channels());
process.compute_root();
...
Func halide_sub_func(ImageParam k, Expr width, Expr height, Expr patches)
{
    Func kBounded("kBounded"), kShifted("kShifted"), khat("khat"), khat_tuple("khat_tuple");
    kBounded = repeat_image(constant_exterior(k, 0.0f), 0, width, 0, height, 0, patches);
    kShifted(x, y, pi) = kBounded(x + k.width() / 2, y + k.height() / 2, pi);
    khat = extern_func(kShifted, width, height, patches);
    khat_tuple(x, y, pi) = Tuple(khat(0, x, y, pi), khat(1, x, y, pi));
    kShifted.compute_root();
    khat.compute_root();
    return khat_tuple;
}
Outside Halide (the extern function):
inline ....
{
    // The buffer_t .dev and .host are 0 and null. I expect a null host, but not a null dev...
}
I found the solution to my problem.
I post the answer as code just below. (Since I did a little offline test, the variable names don't match.)
Inside Halide: (Halide_func.cpp)
#include <Halide.h>
using namespace Halide;
using namespace Halide::BoundaryConditions;
Func thirdPartyFunction(ImageParam f);
Func fourthPartyFunction(ImageParam f);
Var x, y;
int main(int argc, char **argv) {
    // Input:
    ImageParam f( Float( 32 ), 2, "f" );
    printf(" Argument: %d\n", argc);
    int test = atoi(argv[1]);
    if (test == 1) {
        Func f1;
        f1(x, y) = f(x, y) + 1.0f;
        f1.gpu_tile(x, 256);
        std::vector<Argument> args( 1 );
        args[ 0 ] = f;
        f1.compile_to_file("halide_func", args);
    } else if (test == 2) {
        Func fOutput("fOutput");
        Func fBounded("fBounded");
        fBounded = repeat_image(f, 0, f.width(), 0, f.height());
        fOutput(x, y) = fBounded(x-1, y) + 1.0f;
        fOutput.gpu_tile(x, 256);
        std::vector<Argument> args( 1 );
        args[ 0 ] = f;
        fOutput.compile_to_file("halide_func", args);
    } else if (test == 3) {
        Func h("hOut");
        h = thirdPartyFunction(f);
        h.gpu_tile(x, 256);
        std::vector<Argument> args( 1 );
        args[ 0 ] = f;
        h.compile_to_file("halide_func", args);
    } else {
        Func h("hOut");
        h = fourthPartyFunction(f);
        std::vector<Argument> args( 1 );
        args[ 0 ] = f;
        h.compile_to_file("halide_func", args);
    }
}
Func thirdPartyFunction(ImageParam f) {
    Func g("g");
    Func fBounded("fBounded");
    Func h("h");
    // Boundary
    fBounded = repeat_image(f, 0, f.width(), 0, f.height());
    g(x, y) = fBounded(x-1, y) + 1.0f;
    h(x, y) = g(x, y) - 1.0f;
    // Needs to be commented out if you want to use the GPU schedule.
    //g.compute_root(); // At least one stage scheduled alone
    //h.compute_root();
    return h;
}

Func fourthPartyFunction(ImageParam f) {
    Func fBounded("fBounded");
    Func g("g");
    Func h("h");
    // Boundary
    fBounded = repeat_image(f, 0, f.width(), 0, f.height());
    // Preprocess
    g(x, y) = fBounded(x-1, y) + 1.0f;
    g.compute_root();
    g.gpu_tile(x, y, 256, 1);
    // Extern
    std::vector<ExternFuncArgument> args = { g, f.width(), f.height() };
    h.define_extern("extern_func", args, Int(16), 3);
    h.compute_root();
    return h;
}
The external function: (external_func.h)
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <cinttypes>
#include <cstring>
#include <fstream>
#include <map>
#include <vector>
#include <complex>
#include <chrono>
#include <iostream>
#include <clFFT.h> // All the OpenCL headers I need are included.
using namespace std;
// Useful stuff.
void completeDetails2D(buffer_t buffer) {
    // Read all elements:
    std::cout << "Buffer information:" << std::endl;
    std::cout << "Extent: " << buffer.extent[0] << ", " << buffer.extent[1] << std::endl;
    std::cout << "Stride: " << buffer.stride[0] << ", " << buffer.stride[1] << std::endl;
    std::cout << "Min: " << buffer.min[0] << ", " << buffer.min[1] << std::endl;
    std::cout << "Elem size: " << buffer.elem_size << std::endl;
    std::cout << "Host dirty: " << buffer.host_dirty << ", Dev dirty: " << buffer.dev_dirty << std::endl;
    printf("Host pointer: %p, Dev pointer: %" PRIu64 "\n\n\n", buffer.host, buffer.dev);
}

extern cl_context _ZN6Halide7Runtime8Internal11weak_cl_ctxE;
extern cl_command_queue _ZN6Halide7Runtime8Internal9weak_cl_qE;

extern "C" int extern_func(buffer_t * in, int width, int height, buffer_t * out)
{
    printf("In extern\n");
    completeDetails2D(*in);
    printf("Out extern\n");
    completeDetails2D(*out);
    if(in->dev == 0) {
        // Boundary stuff
        in->min[0] = 0;
        in->min[1] = 0;
        in->extent[0] = width;
        in->extent[1] = height;
        return 0;
    }
    // Super awesome stuff on GPU
    // ...
    cl_context & ctx = _ZN6Halide7Runtime8Internal11weak_cl_ctxE; // Found by zougloub
    cl_command_queue & queue = _ZN6Halide7Runtime8Internal9weak_cl_qE; // Same
    printf("ctx: %p\n", ctx);
    printf("queue: %p\n", queue);
    cl_mem buffer_in;
    buffer_in = (cl_mem) in->dev;
    cl_mem buffer_out;
    buffer_out = (cl_mem) out->dev;
    // Just copying data from one buffer to another
    int err = clEnqueueCopyBuffer(queue, buffer_in, buffer_out, 0, 0, 256*256*4, 0, NULL, NULL);
    printf("copy: %d\n", err);
    err = clFinish(queue);
    printf("finish: %d\n\n", err);
    return 0;
}
Finally, the non-Halide stuff: (Halide_test.cpp)
#include <halide_func.h>
#include <iostream>
#include <cinttypes>
#include <external_func.h>
// Extern function available inside the .o generated.
#include "HalideRuntime.h"
int main(int argc, char **argv) {
    // Init the kernel in GPU
    halide_set_ocl_device_type("gpu");
    // Create a buffer
    int width = 256;
    int height = 256;
    float * bufferHostIn = (float*) malloc(sizeof(float) * width * height);
    float * bufferHostOut = (float*) malloc(sizeof(float) * width * height);
    for( int j = 0; j < height; ++j) {
        for( int i = 0; i < width; ++i) {
            bufferHostIn[i + j * width] = i+j;
        }
    }
    buffer_t bufferHalideIn = {0, (uint8_t *) bufferHostIn, {width, height}, {1, width, width * height}, {0, 0}, sizeof(float), true, false};
    buffer_t bufferHalideOut = {0, (uint8_t *) bufferHostOut, {width, height}, {1, width, width * height}, {0, 0}, sizeof(float), true, false};
    printf("IN\n");
    completeDetails2D(bufferHalideIn);
    printf("Data (host): ");
    for(int i = 0; i < 10; ++i) {
        printf(" %f, ", bufferHostIn[i]);
    }
    printf("\n");
    printf("OUT\n");
    completeDetails2D(bufferHalideOut);
    // Send to GPU
    halide_copy_to_dev(NULL, &bufferHalideIn);
    halide_copy_to_dev(NULL, &bufferHalideOut);
    bufferHalideIn.host_dirty = false;
    bufferHalideIn.dev_dirty = true;
    bufferHalideOut.host_dirty = false;
    bufferHalideOut.dev_dirty = true;
    // TRICKS Halide to force the use of device.
    bufferHalideIn.host = NULL;
    bufferHalideOut.host = NULL;
    printf("IN After device\n");
    completeDetails2D(bufferHalideIn);
    // Halide function
    halide_func(&bufferHalideIn, &bufferHalideOut);
    // Get back to HOST
    bufferHalideIn.host = (uint8_t*)bufferHostIn;
    bufferHalideOut.host = (uint8_t*)bufferHostOut;
    halide_copy_to_host(NULL, &bufferHalideOut);
    halide_copy_to_host(NULL, &bufferHalideIn);
    // Validation
    printf("\nOUT\n");
    completeDetails2D(bufferHalideOut);
    printf("Data (host): ");
    for(int i = 0; i < 10; ++i) {
        printf(" %f, ", bufferHostOut[i]);
    }
    printf("\n");
    // Free all
    free(bufferHostIn);
    free(bufferHostOut);
}
You can compile halide_func with test 4 to exercise all the extern functionality.
Here are some of the conclusions I reached (thanks to Zalman and zougloub):
compute_root() alone doesn't send work to the device.
We need gpu() or gpu_tile() in the code to invoke the GPU routine. (BTW, you need to put all your variables inside.)
A gpu_tile smaller than your number of items will crash your stuff.
BoundaryConditions work well on the GPU.
Before calling the extern function, the Func that goes in as an input needs:
f.compute_root(); f.gpu_tile(x, y, ..., ...); The compute_root() on the middle stage is not implicit.
If the dev address is 0, it's normal; we send back the dimensions and the extern will be called again.
The last stage has an implicit compute_root().
Are you aware of the bounds inference protocol for external array functions? This takes place when the host pointer of any buffer is NULL. (Briefly, in this case, you need to fill in the extent fields of the buffer_t structures that have NULL host pointers and do nothing else.) If you have already taken care of that, then ignore the above.
If you've tested that the host pointers are non-NULL for all buffers, then calling halide_copy_to_dev should work. You may need to explicitly set host_dirty to true beforehand to get the copy part to happen, depending where the buffer came from. (I would hope Halide gets this right and it is already set if the buffer came from a previous pipeline stage on the CPU. But if the buffer came from something outside Halide, the dirty bits are probably false from initialization. It seems halide_dev_malloc should set dev_dirty if it allocates device memory, and currently it does not.)
I would expect the dev field to be populated after a call to halide_copy_to_dev as the first thing it does is call halide_dev_malloc. You can try calling halide_dev_malloc explicitly yourself, setting host_dirty and then calling halide_copy_to_dev.
Is the previous stage on the host or on the GPU? If it is on the GPU, I'd expect the input buffer to be on the GPU as well.
This API needs work. I am in the middle of a first refactoring of some things that will help, but ultimately it will require changing the buffer_t structure. It is possible to get most things to work, but it requires modifying the host_dirty and dev_dirty bits as well as calling the halide_dev* APIs in just the right way. Thank you for your patience.

c++ mcrypt error at mcrypt_generic

I have the following code:
[test.cpp]
#include <mcrypt.h>
#include <string>
#include <iostream>
#include <vector>
#include <stdio.h>
#include <stdlib.h>
#include <string>
using namespace std;
int main()
{
    char algo[] = "rijndael-256";
    char mode[] = "cbc";
    char *block_buffer = (char*)"HELLO!! MY NAME IS: ";
    cout << "here" << endl;
    string s;
    char key[] = "1234-5678-9654-7512-7895-2543-12";
    char iv[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    MCRYPT td = mcrypt_module_open(algo, NULL, mode, NULL);
    if (td == MCRYPT_FAILED) { cout << "error" << endl; }
    int keysize = 32;
    int r = mcrypt_generic_init(td, key, keysize, iv);
    if (r < 0)
    {
        cout << "error2" << endl;
        mcrypt_perror(r);
        return 1;
    }
    //while ( fread (&block_buffer, 1, 1, stdin) == 1 ) {
    int j = mcrypt_generic(td, &block_buffer, sizeof(block_buffer));
    if (j != 0) { std::cout << "error encrypting" << std::endl; } // I HAVE AN ERROR HERE, J==0
    //how to print the encrypted string??
    cout << "buffer " << block_buffer << endl; //this is not the encrypted string. why?
    mcrypt_generic_deinit(td);
    mcrypt_module_close(td);
}
I am testing the code with:
$ g++ test.cpp -o tst -lmcrypt
$ ./tst
WHERE SHOULD I ADD THE PKCS 7?
I have the following method:
std::string add_pkcs7_padding(std::string s, std::size_t n)
{
    const std::size_t fill = n - (s.length() % n);
    s.append(fill, static_cast<char>(fill));
    return s;
}
std::string strip_pkcs7_padding(std::string s, std::size_t n)
{
    const std::size_t pad = static_cast<unsigned char>(*s.rbegin());
    return s.substr(0, s.length() - pad);
}
I didn't know when I should run it and where in my code.
NEED SOME HELP. APPRECIATE A LOT!!
EDIT:
I get the error at: mcrypt_generic(td, &block_buffer, sizeof(block_buffer)); The program prints
that the value j=0;
You should invoke mcrypt_generic() with a char*, not a char** as you do:
mcrypt_generic(td, block_buffer, sizeof(block_buffer));
                   ^^^           ^^^^^^^^^^^^^^^^^^^^
                   no &!         ouch!
Also, the length is wrong: sizeof(block_buffer) is just the size of the pointer, not of the string; if anything, you need strlen(block_buffer).
But this is still going to be wrong in general because you need your message to be a multiple of the block size. Use the padding function:
std::string s = add_pkcs7_padding(block_buffer, mcrypt_enc_get_block_size(td));
std::vector<char> v(s.begin(), s.end()); // this will hold the encrypted data
mcrypt_generic(td, v.data(), v.size());
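To make the order of operations concrete, here is a standalone sketch using the question's own padding helpers (no mcrypt involved, so it compiles on its own): pad the plaintext to the cipher block size just before encrypting, and strip just after decrypting. For rijndael-256 the block size is 32 bytes, but in real code query it with mcrypt_enc_get_block_size(td) rather than hard-coding it:

```cpp
#include <string>

// The question's PKCS#7 helpers, unchanged:
// pad before encrypting, strip after decrypting.
std::string add_pkcs7_padding(std::string s, std::size_t n)
{
    const std::size_t fill = n - (s.length() % n);
    s.append(fill, static_cast<char>(fill));
    return s;
}

std::string strip_pkcs7_padding(std::string s, std::size_t)
{
    const std::size_t pad = static_cast<unsigned char>(*s.rbegin());
    return s.substr(0, s.length() - pad);
}

// Where the calls go around mcrypt (sketch, not compiled here):
//   std::string padded = add_pkcs7_padding(plaintext, mcrypt_enc_get_block_size(td));
//   std::vector<char> v(padded.begin(), padded.end());
//   mcrypt_generic(td, v.data(), v.size());     // encrypt in place
//   ...
//   mdecrypt_generic(td, v.data(), v.size());   // decrypt in place
//   std::string recovered = strip_pkcs7_padding(std::string(v.begin(), v.end()), block);
```

The padded length is always a whole multiple of the block size, and the two helpers round-trip any plaintext.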
By the way, your plaintext should be declared like this:
const char * block_buffer = "HELLO!! MY NAME IS: ";
^^^^^ ^^^^
constness! no explicit cast!
But why so clumsy? It's better to just use a string:
std::string plaintext = "HELLO!! MY NAME IS: ";
I think you might benefit from picking up a good C++ book and familiarizing yourself with the basics of the language a bit. It's good to have a project to work on, but most of your problems aren't really related to encryption or mcrypt; they're general C++ programming issues.