Modifying SWIG Interface file to Support C void* and structure return types - java-native-interface

I'm using SWIG to generate my JNI layer for a large set of C APIs and I was wondering what is the best practices for the below situations. The below not only pertain to SWIG but JNI in general.
When C functions return pointers to Structures, should the SWIG interface file (JNI logic) be heavily used or should C wrapper functions be created to return the data in pieces (i.e. a char array that contains the various data elements)?
When C Functions return void* should the C APIs be modified to return the actual data type, whether it be primitive or structure types?
I'm unsure if I want to add a mass amount of logic and create a middle layer (SWIG interface file/JNI logic). Thoughts?

My approach to this in the past has been to write as little code as possible to make it work. When I have to write code to make it work I write it in this order of preference:
Write as C or C++ in the original library - everyone can use this code, you don't have to write anything Java or SWIG specific (e.g. add more overloads in C++, add more versions of functions in C, use return types that SWIG knows about in them)
Write more of the target language - supply "glue" to bring some bits of the library together. In this case that would be Java.
It doesn't really matter if this is "pure" Java, outside of SWIG altogether, or as part of the SWIG interface file from my perspective. Users of the Java interface shouldn't be able to distinguish the two. You can use SWIG to help avoid repetition in a number of cases though.
Write some JNI through SWIG typemaps. This is ugly, error prone if you're not familiar with writing it, harder to maintain (arguably) and only useful to SWIG+Java. Using SWIG typemaps does at least mean you only write it once for every type you wrap.
The times I'd favour this over 2. is one or more of:
When it comes up a lot (saves repetitious coding)
I don't know the target language at all, in which case using the language's C API probably is easier than writing something in that language
The users will expect this
Or it just isn't possible to use the previous styles.
Basically these guidelines I suggested are trying to deliver functionality to as many users of the library as possible whilst minimising the amount of extra, target language specific code you have to write and reducing the complexity of it when you do have to write it.
For a specific case of sockaddr_in*:
Approach 1
The first thing I'd try and do is avoid wrapping anything more than a pointer to it. This is what swig does by default with the SWIGTYPE_p_sockaddr_in thing. You can use this "unknown" type in Java quite happily if all you do is pass it from one thing to another, store in containers/as a member etc., e.g.
public static void main(String[] argv) {
Module.takes_a_sockaddr(Module.returns_a_sockaddr());
}
If that doesn't do the job you could do something like write another function, in C:
const char * sockaddr2host(struct sockaddr_in *in); // Some code to get the host as a string
unsigned short sockaddr2port(struct sockaddr_in *in); // Some code to get the port
This isn't great in this case though - you've got some complexity to handle there with address families that I'd guess you'd rather avoid (that's why you're using sockaddr_in in the first place), but it's not Java specific, it's not obscure syntax and it all happens automatically for you besides that.
Approach 2
If that still isn't good enough then I'd start to think about writing a little bit of Java - you could expose a nicer interface by hiding the SWIGTYPE_p_sockaddr_in type as a private member of your own Java type, and wrapping the call to the function that returns it in some Java that constructs your type for you, e.g.
public class MyExtension {
private MyExtension() { }
private SWIGTYPE_p_sockaddr_in detail;
public static MyExtension native_call() {
MyExtension e = new MyExtension();
e.detail = Module.real_native_call();
return e;
}
public void some_call_that_takes_a_sockaddr() {
Module.real_call(detail);
}
}
No extra SWIG to write, no JNI to write. You could do this through SWIG using %pragma(modulecode) to make it all overloads on the actual Module SWIG generates - this feels more natural to the Java users probably (it doesn't look like a special case) and isn't really any more complex. The hardwork is being done by SWIG still, this just provides some polish that avoids repetitious coding on the Java side.
Approach 3
This would basically be the second part of my previous answer. It's nice because it looks and feels native to the Java users and the C library doesn't have to be modified either. In essence the typemap provides a clean-ish syntax for encapsulating the JNI calls for converting from what Java users expect to what C works with and neither side knows about the other side's outlook.
The downside though is that it is harder to maintain and really hard to debug. My experience has been that SWIG has a steep learning curve for things like this, but once you reach a point where it doesn't take too much effort to write typemaps like that the power they give you through re-use and encapsulation of the C type->Java type mapping is very useful and powerful.
If you're part of a team, but the only person who really understands the SWIG interface then that puts a big "what if you get hit by a bus?" factor on the project as a whole. (Probably quite good for making you unfirable though!)

Related

Programming language development practice, how to compile golang style interfaces to c++?

For fun I have been working on my own programming language that compiles down to C++. While most things are fairly straightforward to print, I have been having trouble compiling my golang style interfaces to c++. In golang you don't need to explicitly declare that a particular struct implements an interface, it happens automatically if the struct has all the functions declared in the interface. Originally I was going to compile the interfaces down to a class with all virtual methods like so
class MyInterface {
public:
void DoSomthing() = 0;
}
and all implementing structures would simply extend from the interface like you normally would in c++
class MyClass: public MyInterface {
// ...
}
However this would mean that my compiler would have to loop over every interface defined in the source code (and all dependencies) as well as every struct defined in the source and check if the struct implements the interface using an operation that would take O(N*M) time where N is the number of structs and M is the number of interfaces. I did some searching and stumbled upon some c++ code here: http://wall.org/~lewis/2012/07/23/go-style-interfaces-in-cpp.html that makes golang style interfaces in c++ a reality in which case I could just compile my interfaces to code similar to that (albeit not exactly since I am hesitant to use raw pointers over smart pointers) and not have to worry about explicitly implementing them. However the author states that It should not be done for production code which worries me a little.
This is kinda a loaded question that may be a little subjective, but could anyone with more C++ knowledge tell me if doing it the way suggested in the article is a really bad idea or is it actually not that bad and could be done, or if there is a better way to write c++ code that would allow me to achieve the behavior I want without resorting to the O(N*M) loop?
My initial thought is to make use of the fact that C++ supports multiple inheritance. Decompose your golang interface into single-function interfaces. Hash all interfaces by their unique signature. It now becomes an O(N) operation to find the set of C++ abstract interfaces for your concrete classes.
Similarly, when you consume an object, you find all the consumed interfaces. This is now O(M) by the same logic. The total compiler complexity then becomes O(N)+O(M) instead of O(N*M).
The slight downside is that you're going to have O(N) vtables in C++. Some of those might be merged if certain interfaces are always groupd together.

C++ properties for opaque data types

I am building a C++ API which needs to be extendable without recompiling the software that uses it (for lots of bad reasons). This requires opaque data types so that fields can be added to classes. Something like
struct CheshireCat; // Not defined here
std::unique_ptr<CheshireCat> d_ptr;
But this API will have lots of simple property get/set methods that just access fields within CheshireCat. (I do not want to hear about "encapsulation" philosophy, that is just the way that it is for this application.)
There are clever techniques that create classes like MyInt that override operators to emulate properties like
int & operator = (const int &i) { return value = i; }
operator int () const { return value; }
But I cannot see any good way of such a MyInt getting access to the CheshireCat pointer in the containing class, let alone the hidden properties within the structure.
It seems like this must be a very common problem, so I am looking for other people's clever solutions. Possibly using some obscure macrology.
(The alternatives would be 1. Forget binary compatibility, or 2. write lots and lots of boilerplate code by hand.)
I realize that for binary compatibility new virtual methods need to be added to the end. Mangling is compiler dependent but should be stable for a given compiler (we will ship binaries made for a specific compiler/version).
This feels like C++ 101 but I could not find anything that addressed it elegantly. (My background is Java/.Net/Lisp where JITing solves this problem. A C++ system that did the final compilation phases in the (DLL) linker would be excellent but sadly that is not the way it is.)
I don't think you can get binary compatibility by exposing C++.
What you can do is expose a C interface (which is binary compatible). That means, as you have hinted, exposing functions that deal with opaque handles (i.e. pointers).
Of course, the internals can be written in C++.

How to make API names into variables for easier coding

I am looking for a way to turn some long and confusing API function names into shorter types to reduce the amount of typing and over all errors due to misspelling.
For example : I would like to take gtk_functionName(); and make it a variable like so. doThis = gtk_functionName;
Sometimes the code will have lots of repeating suffix. I want to know if I can take this g_signal_connect_ and turn it into this connect so I could just type connectswapped instead of g_signal_connect_swapped.
I am looking to do this in C\C++ but would be happy to know how its done in any language. I thought I had seen a code that did this before but I can not figure out what this would be called so searching for it has been fruitless.
I am sure this is possible and I am just not able to remember how its done.
I believe what you are wanting to do is apply the Facade Pattern, which is to present a simplified interface to a larger, more complex body of code.
What this basically means is you define your own simplified interfaces for the functionality you want. The implementation of these interfaces use the longer more complex packages you want to simplify. After that, the rest of your code use the simplified interfaces instead of the complex package directly.
void doThis (That *withThat) {
gtk_functionName(withThat->arg1, withThat->arg2 /* etc. */);
}

Ways to use variable as object name in c/c++

Just out of curiosity: is there a way to use variable as object name in c++?
something along the lines:
char a[] = "testme\0";
*a *vr = new *a();
If you were to write a c/c++ compiler how would you go about to implement such a thing?
I know they implemented this feature in zend engine but to lazy to look it up.
Maybe some of you guys can enlight me :)
In case what you are looking for is something like this
<?php
$className = "ClassName";
$instance = new $className();
?>
That's simply not possible in C++. This fails for many reasons, one of them that C++ at runtime doesn't know much about names of classes anymore (only in debug mode) If somebody wanted to write a compiler that would allow something like this, it would be necessary to keep a lot of information that a C++ compiler only needs during compilation and linking. Changing this would create a new language.
If you want to dynamically create classes depending on information only available at runtime, in C++ you would most likely use some of the Creational Design Patterns.
Edit:
PHP is one language, C++ is a very different one. 16M may not be that much nowadays, for a C++ programmer where some programs are in the k range, it's a whole world. Nobody wants to ship a complete compiler with his C++ app to be able to get all the dynamic features (that btw PHP too implements only in a limited way as far as I know, if you want really dynamic runtime code creation, have a look at Ruby or Python). C++ has (as all languages) a certain philosophy and creating objects by name in a string doesn't fit very well with it. This feature alone is quite useless anyway and would by no means justify the overhead necessary to implement it. This could most likely be done without adding runtime compilation, but even the extra kilobytes necessary to store the names alone make no sense in the C++ world. And C++ is strictly typed and this functionality would have to make sure, that type checking doesn't break.
In C and C++, identifier names do not have the same meaning they do in PHP.
PHP is a dynamic language, and (at least conceptually) runs in an interpreted context. Identifier names are present at run time, they can be inspected through PHP's reflection features, you can use strings to refer to functions, variables, globals, and object properties by name, etc. PHP identifiers are actual semantic entities.
In C++, identifiers are lost at run time (again, conceptually speaking). You use them in your source code to identify variables, functions, classes, etc., but the compiler translates them into memory addresses or literal values, or even optimizes them away completely. Identifier names are not generally present in the compiled binary (unless you instructed the compiler to include debug symbols), and there is no way to inspect them at run-time. Even with RTTI, the best you can get is an arbitrary number to identify a type; you can compare them for equality, but you cannot get the name back.
Consequently, if you want to translate strings into identifier names at run-time in C++, you have to perform the mapping manually. std::map can be a great help for this - you hand it a string, and it gives you a value. This doesn't work directly for class names; for these, you need to implement some sort of factory method. A nice solution is to have one wrapper function for each type, and then a std::map that maps class names to the corresponding wrappers. Something like:
map<string, FoobarFactoryMethod> factory_map;
Foobar* FooFactory() { return new Foo(); }
Foobar* BarFactory() { return new Bar(); }
Foobar* BazFactory() { return new Baz(); }
void fill_map() {
factory_map["Foo"] = FooFactory;
factory_map["Bar"] = BarFactory;
factory_map["Baz"] = BazFactory;
}
// and then later:
Foobar* f = factory_map[classname]();
Why do you even want to have this feature? You are most likely misusing OOP. Whenever my needs ran into hard language barriers like this I ended up doing one of the following:
Rethink your solution to the problem so it fits OOP better
Create a DSL for your problem (domain specific language)
Create a code generator for this part of your problem
Pick a language that fits your problem better
A combination of the above
I would think that what you want to do would be best accomplished using interfaces and a factory pattern.

Dynamic C++

I'm wondering about an idea in my head. I want to ask if you know of any library or article related to this. Or you can just tell me this is a dumb idea and why.
I have a class, and I want to dynamically add methods/properties to it at runtime. I'm well aware of the techniques of using composite/command design pattern and using embedded scripting language to accomplish what I'm talking about. I'm just exploring the idea. Not necessary saying that it is a good idea.
class Dynamic
{
public:
typedef std::map<std::string, boost::function<void (Dynamic&)> > FuncMap;
void addMethod(const std::string& name, boost::function<void (Dynamic&)> func) {
funcMap_[name] = func;
}
void operator[](const std::string& name) {
FuncMap::iterator funcItr = funcMap_.find(name);
if (funcItr != funcMap_.end()) {
funcItr->second(*this);
}
}
private:
FuncMap funcMap_;
};
void f(Dynamic& self) {
doStuffWithDynamic(self);
}
int main()
{
Dynamic dyn;
dyn.addMethod("f", boost::bind(&f, _1));
dyn["f"]; // invoke f
}
The idea is that I can rebind the name "f" to any function at runtime. I'm aware of the performance problem in string lookup and boost::function vs. raw function pointer. With some hard work and non-portable hack I think I can make the performance problem less painful.
With the same kind of technique, I can do "runtime inheritance" by having a "v-table" for name lookup and dispatch function calls base on dynamic runtime properties.
If just want to tell me to use smalltalk or Objective-C, I can respect that but I love my C++ and I'm sticking with it.
What you want is to change C++ into something very different. One of the (many) goals of C++ was efficient implementation. Doing string lookup for function calls (no matter how well you implement it), just isn't going to be very efficient compared to the normal call mechanisms.
Basically, I think you're trying to shoehorn in functionality of a different language. You CAN make it work, to some degree, but you're creating C++ code that no one else is going to be able (or willing) to try to understand.
If you really want to write in a language that can change it's objects on the fly, then go find such a language (there are many choices, I'm sure). Trying to shoehorn that functionality into C++ is just going to cause you problems down the road.
Please note that I'm no stranger to bringing in non-C++ concepts into C++. I once spent a considerable amount of time removing another engineer's attempt at bringing a based-object system into a C++ project (he liked the idea of containers of 'Object *', so he made every class in the system descend from his very own 'Object' class).
Bringing in foreign language concepts almost always ends badly in two ways: The concept runs up against other C++ concepts, and can't work as well as it did in the source language, AND the concept tends to break something else in C++. You end up losing a lot of time trying to implement something that just isn't going to work out.
The only way I could see something like this working at all well, is if you implemented a new language on top of C++, with a cfront-style pre-compiler. That way, you could put some decent syntax onto the thing, and eliminate some of your problems.
If you implemented this, even as a pure library, and then used it extensively, you would in a way be using a new language - one with a hideous syntax, and a curious combination of runtime method resolution and unreliable bounds checking.
As a fan of C/C++ style syntax and apparently a fan of dynamic method dispatch, you may be interested in C# 4.0, which is now in Beta, and has the dynamic keyword to allow exactly this kind of thing to be seamlessly mixed into normal statically typed code.
I don't think it would be a good idea to change C++ enough to make this work. I'd suggest working in another language, such as Lisp or Perl or another language that's basically dynamic, or imbedding a dynamic language and using it.
What you are doing is actually a variation of the Visitor pattern.
EDIT: By the way, another approach would be by using Lua, since the language allows you to add functions at runtime. So does Objective-C++.
EDIT 2: You could actually inherit from FuncMap as well:
class Dynamic;
typedef std::map<std::string, boost::function<void (Dynamic&)> > FuncMap;
class Dynamic : public FuncMap
{
public:
};
void f(Dynamic& self) {
//doStuffWithDynamic(self);
}
int main()
{
Dynamic dyn;
dyn["f"] = boost::bind(&f, _1);
dyn["f"](dyn); // invoke f, however, 'dyn'param is awkward...
return 0;
}
If I understand what you are trying to accomplish correctly, it seems as though dynamic linking (i.e. Dynamically loaded libraries in windows or linux) will do most of what you are trying to accomplish.
That is, you can, at runtime, select the name of the function you want to execute (eg. the name of the DLL), which then gets loaded and executed. Much in the way that COM works. Or you can even use the name of the function exported from that library to select the correct function (C++ name mangling issues aside).
I don't think there's a library for this exact thing.
Of course, you have to have these functions pre-written somehow, so it seems there would be an easier way to do what you want. For example you could have just one method to execute arbitrary code from your favorite scripting language. That seems like an easier way to do what you want.
I keep thinking of the Visitor pattern. That allows you to do a vtable lookup on the visiting object (as well as the visited object, thought that doesn't seem relevant to your question).
And at runtime, you could have some variable which refers to the visitor, and call
Dynamic dynamic;
DynamicVisitor * dv = ...;
dynamic->Accept(dv);
dv = ...; // something else
dynamic->Accept(dv);
The point is, the visitor object has a vtable, which you said you wanted, and you can change its value dynamically, which you said you wanted. Accept is basically the "function to call things I didn't know about at compile time."
I've considered doing this before as well. Basically, however, you'd be on your way to writing a simple VM or interpreter (look at, say, Lua or Topaz's source to see what I mean -- Topaz is a dead project that pre-dates Parrot).
But if you're going that route it makes sense to just use an existing VM or interpreter.