How to create Apache Arrow vectors in Java, pass them to C++ code through JNI, read/write them in C++ - java-native-interface

I've been reading the Apache Arrow docs, and I've figured out how to use it in Java and C++. But what I'd like to do is offload some work to JNI (C/C++) code from Java, and the documentation (e.g. https://arrow.apache.org/docs/java/cdata.html) just doesn't seem to cover my use cases, and methods in the example (e.g. getMemoryAddress on IntVector) just don't seem to exist like they do in the examples. I want to start simple, so here's what I'd like to do:
Allocate two Arrow IntVector's in Java and fill them with data
Allocate space for another IntVector in Java for the result
Get whatever native pointers I need from those vectors and pass them through a JNI call
Wrap those vectors in C++ so I can access them.
Do whatever work I want to offload and finalize the result vector
Return to Java and have the result accessible.
Can anyone point me to an example or some tips on how to do this?
BTW, the examples also use JavaCPP instead of JNI. But I already have a bunch of JNI code in this project, and I'd rather not mix in another kind of bridge if it's not necessary.
Thanks.
I tried allocating IntVector objects in Java, but I can't tell which naive pointers I have to retrieve to pass to C++ to provide proper access to those vectors.

JavaCPP is merely a convenience for the example, JNI is fine.
The C Data Interface is still what you want. When you say "get whatever native pointers I need": that is exactly what a struct ArrowArray is in the C Data Interface. Use the C Data Interface module in Java to export your Java arrays and get the address of a struct ArrowArray, and then pass that address to your C++ code via JNI. Then, use libarrow's C Data Interface implementation to import the arrays and work with them.
When the C++ side is done, it does the same thing: it exports the result vector and returns an address to Java via JNI; the Java code then imports the vector from that address.

Related

Creating a new c++ object from within a lua script?

---Context---
I want to have a class called "fileProcessor". This class is completely static and merely serves as a convinient namespace (within my normal library namespace) for some global function. This is a basic blueprint of the class with only the relevant stuff
class fileProcessor{
private:
lua_State* LUA_state;
public:
static std::variant<type1,type2> processFile(const char* filePath,const char* processorScript);
}
Please note again that I ommitted most of the stuff from the class so if anything seems odd ignore it.
What process file is supposed to do is:
Read the filePath file, storing all directives including it (this is my own filetype or style of syntax. This is already handeled correctly). The directives are stored with strings, one for the command and one for everything after it.
Read the script file and check if it has a commented out fileProcessor line at the top. This is to make sure that the lua script loaded is relevant and not some random behaviour script
Load and compile the lua script.
Make all read directives available (they are saved in a struct of 2 strings as mentioned before)
Run the file and recieve a object back. The object should only be of types that I listed in the return type (variant)
I am having problems with step 4 and one vital part of the scripting.
---Question---
How can I make the creation of a full new object of type1 or type2 possible within lua, write to it from within lua and then get it back from the lua stack into c++ and still know if its type1 or type2?
---No example provided since this question is more general and the only reason I provided my class is for context.---
It seems like you are trying to do it the other way around. I quote a part of this answer:
...you are expecting Lua to be the primary language, and C++ to be the client. The problem is, that the Lua C interface is not designed to work like that, Lua is meant to be the client, and all the hard work is meant to be written in C so that Lua can call it effortlessly.
If you are convinced there is no other way that doing it other way around you can follow the workaround that answer has given. Otherwise I think you can achieve what you need by using LUA as it meant to be.
LUA has 8 basic types (nil, boolean, number, string, userdata, function, thread, and table). But you can add new types as you require by creating a class as the new type in native C++ and registering it with LUA.
You can register by either:
Using some LUA helper for C++ like luna.h (as shown in this tutorial).
Pushing a new lua table with the C++ class (check this answer).
Class object instance is created in your native C++ code and passed to LUA. LUA then makes use of the methods given by the class interface.

call lua callback with custom data as function argument

I'm just looking for solution how to pass object from C to lua callback as function argument, is it even possible? I cannot find any referece. just trying something like this:
luabind::call_function(State, "Callback_name", object);
but luabind is not a option in this case because it is obsolete and additionally some features just don't work that's probably because of boost as I read but don't know exactly, thats why I switched to toLua++ but still cannot find any solution for it. I'm just trying to achieve to access data from C in lua callback ex:
function LUA_Callback(object1, object2)
print(object1.name)
print(object2.size)
end
Is there any way to solve it or library(without boost) to do this?
That depends on what you mean by "push object". It's trivial to push numbers, strings, C functions, tables, etc. -- in other words, Lua objects -- using API routines like lua_pushstring, lua_pushnumber, etc.
If you're talking about pushing C++ objects, you need to create and push userdata. If you want them to behave like objects from Lua, then you'll need the appropriate metatable/methods.

Modifying SWIG Interface file to Support C void* and structure return types

I'm using SWIG to generate my JNI layer for a large set of C APIs and I was wondering what is the best practices for the below situations. The below not only pertain to SWIG but JNI in general.
When C functions return pointers to Structures, should the SWIG interface file (JNI logic) be heavily used or should C wrapper functions be created to return the data in pieces (i.e. a char array that contains the various data elements)?
When C Functions return void* should the C APIs be modified to return the actual data type, whether it be primitive or structure types?
I'm unsure if I want to add a mass amount of logic and create a middle layer (SWIG interface file/JNI logic). Thoughts?
My approach to this in the past has been to write as little code as possible to make it work. When I have to write code to make it work I write it in this order of preference:
Write as C or C++ in the original library - everyone can use this code, you don't have to write anything Java or SWIG specific (e.g. add more overloads in C++, add more versions of functions in C, use return types that SWIG knows about in them)
Write more of the target language - supply "glue" to bring some bits of the library together. In this case that would be Java.
It doesn't really matter if this is "pure" Java, outside of SWIG altogether, or as part of the SWIG interface file from my perspective. Users of the Java interface shouldn't be able to distinguish the two. You can use SWIG to help avoid repetition in a number of cases though.
Write some JNI through SWIG typemaps. This is ugly, error prone if you're not familiar with writing it, harder to maintain (arguably) and only useful to SWIG+Java. Using SWIG typemaps does at least mean you only write it once for every type you wrap.
The times I'd favour this over 2. is one or more of:
When it comes up a lot (saves repetitious coding)
I don't know the target language at all, in which case using the language's C API probably is easier than writing something in that language
The users will expect this
Or it just isn't possible to use the previous styles.
Basically these guidelines I suggested are trying to deliver functionality to as many users of the library as possible whilst minimising the amount of extra, target language specific code you have to write and reducing the complexity of it when you do have to write it.
For a specific case of sockaddr_in*:
Approach 1
The first thing I'd try and do is avoid wrapping anything more than a pointer to it. This is what swig does by default with the SWIGTYPE_p_sockaddr_in thing. You can use this "unknown" type in Java quite happily if all you do is pass it from one thing to another, store in containers/as a member etc., e.g.
public static void main(String[] argv) {
Module.takes_a_sockaddr(Module.returns_a_sockaddr());
}
If that doesn't do the job you could do something like write another function, in C:
const char * sockaddr2host(struct sockaddr_in *in); // Some code to get the host as a string
unsigned short sockaddr2port(struct sockaddr_in *in); // Some code to get the port
This isn't great in this case though - you've got some complexity to handle there with address families that I'd guess you'd rather avoid (that's why you're using sockaddr_in in the first place), but it's not Java specific, it's not obscure syntax and it all happens automatically for you besides that.
Approach 2
If that still isn't good enough then I'd start to think about writing a little bit of Java - you could expose a nicer interface by hiding the SWIGTYPE_p_sockaddr_in type as a private member of your own Java type, and wrapping the call to the function that returns it in some Java that constructs your type for you, e.g.
public class MyExtension {
private MyExtension() { }
private SWIGTYPE_p_sockaddr_in detail;
public static MyExtension native_call() {
MyExtension e = new MyExtension();
e.detail = Module.real_native_call();
return e;
}
public void some_call_that_takes_a_sockaddr() {
Module.real_call(detail);
}
}
No extra SWIG to write, no JNI to write. You could do this through SWIG using %pragma(modulecode) to make it all overloads on the actual Module SWIG generates - this feels more natural to the Java users probably (it doesn't look like a special case) and isn't really any more complex. The hardwork is being done by SWIG still, this just provides some polish that avoids repetitious coding on the Java side.
Approach 3
This would basically be the second part of my previous answer. It's nice because it looks and feels native to the Java users and the C library doesn't have to be modified either. In essence the typemap provides a clean-ish syntax for encapsulating the JNI calls for converting from what Java users expect to what C works with and neither side knows about the other side's outlook.
The downside though is that it is harder to maintain and really hard to debug. My experience has been that SWIG has a steep learning curve for things like this, but once you reach a point where it doesn't take too much effort to write typemaps like that the power they give you through re-use and encapsulation of the C type->Java type mapping is very useful and powerful.
If you're part of a team, but the only person who really understands the SWIG interface then that puts a big "what if you get hit by a bus?" factor on the project as a whole. (Probably quite good for making you unfirable though!)

Lua + SWIG Monkey Patching

I have used SWIG to bind a set of classes to lua. I know C++ itself doesn't support monkey patching, and I'm not trying to modify my C++ objects, merely their lua representations. The problem comes if I want to start monkey patching the lua tables and objects exported by SWIG, so that I can modify the API presented on the lua side.
e.g. the following lua code:
game.GetEnemies1 = game.GetEnemies2
does not work as expected. The behaviour after that line is still consistent with the original GetEnemies1 not GetEnemies2.
how do I combat this problem?
I've successfully monkeypatched lua userdata by adding and replacing existing methods. It involved modifying their metatables.
Here's a sample of what I had to do in order to add a couple methods to an existing userdata object.
As you can see, instead of modifying the object iself, I had to modify its metatable.
This solution will only work if your userdata objects are set up so their metatables "point to themselves": mt.__index = mt.
Regards!
Swig generates lua wrappers from c++ functions, it does not inject lua functions into c++. If GetEnemies1 is a c++ function, called from other c++ functions, monkey patching wont work.
You will have to rewrite your c++ code so that the code which executes GetEnemies1 looks for some sort of callback which you can wrap with swig.

Java JNI: global variable collision in native code

Good day, masters. Suppose I have a java class A:
class A {
public A() {}
public native void setValue(String value);
public native String getValue();
}
When implementing the native C code, a global char[] variable is used to store the value just be set by native setValue method. The getValue method just return that global char[] variable.
Here comes my question: I create several A objects, and invoke their respective set/get method, I found that they end up writting and readding the same block of memory! Actually the global char[] variable in C native code is totally shared by all A object.
Can anybody give me some in-depth explaination of this behavior? I knew I had a fundmental misunderstanding in terms of the way JNI works. Thanks!
The problem is that you have the object oriented java on the one side, and the procedural C on the other side. When a java class A is loaded, JNI does not somehow create an object in the C world (this is C - not C++), there is no object-to-object mapping. So JNI loads the corresponding CPP file in the C world once into memory (this is completely normal behavior in C). The compiler sees a bunch of functions and a global reference, there is no need to create multiple instances somehow. When loading a Java class, all the JNI compiler does is to add the C file to the list of files to be compiled.
What you would like to have is a mapping between Java objects and C objects. This can be achieved in several ways, I suggest you read about the proxy pattern in JNI.
To explain very shortly, you have two fundamental ways to achieve this. You can either do it on the C side or on the Java side. When storing the relationship in the C world, you would maintain a hashtable in C where C objects map to Java objects. The other way around, you would have an int object in a Java object that is actually a pointer to a C object in memory. In the C code you would then dereference this pointer (through GetIntField()) and cast to the C object you require.
Of course adding
private String value;
to your Java class A, would indeed work very smootly. Java has a notion of objects, so you could access those Strings from the C world with
env->GetObjectField(obj, "value", "Ljava/lang/String;")
with env being the JNI environment.