How to define a class and method in DDS idl file? - c++

I am new to DDS.... so far I have little experience in OpenDDS and CycloneDDS
Is it possible to define a class inside the idle file and have member variables and member methods? or only structure and primitive data types are supported in DDS standards?

The IDL language is defined in the OMG IDL specification. It consists of a number of building blocks that include Core Data Types, like the structures and primitive data types that you mentioned, and Interfaces, that includes the methods you asked about.
However, only a subset of those building blocks is used by DDS. For the current version 4.2, section 9.3 DDS Profiles defines which of them are relevant for three different levels of support by DDS: Plain DDS, Extensible DDS and DDS over RPC.
You will see that the latter indeed includes Building Block Interfaces - Basic, as you might expect from RPC. However, not all DDS implementations support RPC. Plain DDS and Extensible DDS are more commonly supported and interfaces are not part of that functionality.
Since you asked about this in another question: note that the interface functionality as captured in DDS over RPC is not for the purpose of distributing objects with their methods, but for invoking methods on objects remotely -- as the name Remote Procedure Call implies.

Another answer to your question is that you are, perhaps, asking the follow-up question as if it is the initial one. There are many different ways of building distributed systems, and given your question, three examples seem appropriate:
those designed around remote procedure calls/remote method invocation: in this context, CORBA is the perfect reference, but there are many (RPC, gRPC, DCOM, you name it);
those designed around shipping objects with their implementation across: one example is Java/JINI, but there are many others (JavaScript in a browser could be considered one);
those designed around shipping state (a.k.a. plain old data) and adding/transforming that state: SPLICE in the ancient history, DDS today.
Your question suggests that you are looking for middleware for doing distributed object computing. If that's indeed what you are looking for, DDS is a very suboptimal choice. Yes, RPC can be built on top of it (RPC-over-DDS simply makes it a bit easier to do it) and in a system predominantly built around distributing state it makes sense to do that.
If you can serialise objects with their methods then of course you can use DDS to distribute them in the network (there are fun things you can do that way). However, that's more a function of the programming language you use than of the middleware and IDL won't help you with that.

Related

How to best expose ocaml library to other languages?

There are various exchange languages - json, ect - that provide an ability to quickly and reliably export and parse data to a common format. This is a boon between languages, and for it there is Piqi, which basically generates parsable exchange formats for any type that you define; it automates the process of writing boiler code (writing functions that read in some exchange info and build up a instance of some arbitrary type). Basically, the best option to date is protocol buffers, and I absolutely want, if I go down the route of ocaml-rpc, to use protocol buffers.
It would be nice if there were some declarative pattern to manage function exposure, so that the ocaml library can be reached over some medium (like RPC, or map a function to a url with encoding for arguments).
Imagine offering a library as a service; where you don't want to or can't make actual bindings between every single pair of languages. But servers and the data parsing has already been written... so wouldn't there be some way to integrate the two, and just specify what functions should be exposed and where/how?
Lastly, it appears to me that protocol buffers is a mechanism by which you can encode/decode data quickly, but not a transport mechanism... is there some kind of ocaml-RPC spec or some ocaml RPC library? Aren't there various RPC protocols (and ergo, if I try to point two languages using diff protocols at one another, achieve failure)? Additionally, the server mechanism that waits and receives RPC calls is (possibly) another module(?)
How do I achieve this?
To update this, the latest efforts under the piqi project are aimed at producing a working OCaml RPC service. From this, it would be, in vision, easy to specify what functions to expose at the RPC service end, and target function selection on the client side should allow for some mechanized facility to allow those exposed functions to be selected.
At the current time, this RPC system for ocaml facilitates inter-language exchange of data that can be reconstructed by parsers through the use of proto-buffers; it is under development and still being discussed here
I think that ocaml-rpc library suits your requirements. It can infer serialization functions and, also, can generate client and server code. The interesting part, is that they use OCaml as a IDL language. For example, this is a definition of the rpc function:
external rpc2 : ?opt:string -> variant -> unit = ""
From which there will be inferred server and client functorized code, that will take care on transporting marshaling and demarshaling the data, so that you need to work only with pure OCaml data types.
The problem with this library is that it is barely documented, so you may find it hard to use.
Also, as now I know, that you're tackling with BAP, I would like to bring your attention to a new BAP 1.x, that will be ready soon, and it will have bindings, that will allow to call it from any language, although currently we're mostly targeting python.

COM in the non-Windows world?

Hope this question isn't going to be too vague. Reading through the COM spec and Don Box's Essential COM book, there is plenty of talk of the "problems that COM solves" - and they all sound important, relevant and current.
So how are the problems that COM addresses dealt with on other systems (linux, unix, OSX, android)? I'm thinking of things like:
binary compatibility across compilers and compiler versions
binary component reuse
compiling an application such that it has run-time dependencies rather than load-time ones (so that it runs even when a dependency is missing)
access to library functionality from languages other than the library's own
reasonably low-overhead remote procedure calls to components loaded in the address space of a different process
etc (I'm sure the list goes on)
I'm basically just trying to understand why for instance on Linux CORBA isn't a thing like COM is a thing on Windows (if that makes any sense). Does maybe software development on Linux subscribe to a different philosophy than the component-based model proposed by COM?
And finally, is COM a C/C++ thing? Several times I've come across comments from people saying COM is made "obsolete" by .NET but without really explaining what they meant by that.
For the remainder of this post, I'm going to use Linux as an example of open-source software. Where I mention "Linux" it's mostly a short/simple way to refer to open source software in general though, not anything specific to Linux.
COM vs. .NET
COM isn't actually restricted to C and C++, and .NET doesn't actually replace COM. However, .NET does provide alternatives to COM for some situations. One common use of COM is to provide controls (ActiveX controls). .NET provides/supports its own protocol for controls that allows somebody to write a control in one .NET language, and use that control from any other .NET language--more or less the same sort of thing that COM provides outside the .NET world.
Likewise, .NET provides Windows Communication Foundation (WCF). WCF implements SOAP (Simple Object Access Protocol)--which may have started out simple, but grew into something a lot less simple at best. In any case, WCF provides many of the same kinds of capabilities as COM does. Although WCF itself is specific to .NET, it implements SOAP, and a SOAP server built using WCF can talk to one implemented without WCF (and vice versa). Since you mention overhead, it's probably worth mentioning that WCF/SOAP tend to add more overhead that COM (I've seen anywhere from nearly equal to about double the overhead, depending on the situation).
Differences in Requirements
For Linux, the first two points tend to have relatively low relevance. Most software is open source, and many users are accustomed to building from source in any case. For such users, binary compatibility/reuse is of little or no consequence (in fact, quite a few users are likely to reject all software that isn't distributed in source code form). Although binaries are commonly distributed (e.g., with apt-get, yum, etc.) they're basically just caching a binary built for a specific system. That is, on Windows you might have a single binary for use on anything from Windows XP up through Windows 10, but if you use apt-get on, say, Ubuntu 18.02, you're installing a binary built specifically for Ubuntu 18.02, not one that tries to be compatible with everything back to Ubuntu 10 (or whatever).
Being able to load and run (with reduced capabilities) when a component is missing is also most often a closed-source problem. Closed source software typically has several versions with varying capabilities to support different prices. It's convenient for the vendor to be able to build one version of the main application, and give varying levels of functionality depending on which other components are supplied/omitted.
That's primarily to support different price levels though. When the software is free, there's only one price and one version: the awesome edition.
Access to library functionality between languages again tends to be based more on source code instead of a binary interface, such as using SWIG to allow use of C or C++ source code from languages like Python and Ruby. Again, COM is basically curing a problem that arises primarily from lack of source code; when using open source software, the problem simply doesn't arise to start with.
Low-overhead RPC to code in other processes again seems to stem primarily from closed source software. When/if you want Microsoft Excel to be able to use some internal "stuff" in, say, Adobe Photoshop, you use COM to let them communicate. That adds run-time overhead and extra complexity, but when one of the pieces of code is owned by Microsoft and the other by Adobe, it's pretty much what you're stuck with.
Source Code Level Sharing
In open source software, however, if project A has some functionality that's useful in project B, what you're likely to see is (at most) a fork of project A to turn that functionality into a library, which is then linked into both the remainder of project A and into Project B, and quite possibly projects C, D, and E as well--all without imposing the overhead of COM, cross-procedure RPC, etc.
Now, don't get me wrong: I'm not trying to act as a spokesperson for open source software, nor to say that closed source is terrible and open source is always dramatically superior. What I am saying is that COM is defined primarily at a binary level, but for open source software, people tend to deal more with source code instead.
Of course SWIG is only one example among several of tools that support cross-language development at a source-code level. While SWIG is widely used, COM is different from it in one rather crucial way: with COM, you define an interface in a single, neutral language, and then generate a set of language bindings (proxies and stubs) that fit that interface. This is rather different from SWIG, where you're matching directly from one source to one target language (e.g., bindings to use a C library from Python).
Binary Communication
There are still cases where it's useful to have at least some capabilities similar to those provided by COM. These have led to open-source systems that resemble COM to a rather greater degree. For example, a number of open-source desktop environments use/implement D-bus. Where COM is mostly an RPC kind of thing, D-bus is mostly an agreed-upon way of sending messages between components.
D-bus does, however, specify things it calls objects. Its objects can have methods, to which you can send signals. Although D-bus itself defines this primarily in terms of a messaging protocol, it's fairly trivial to write proxy objects that make invoking a method on a remote object look pretty much like invoking one on a local object. The big difference is that COM has a "compiler" that can take a specification of the protocol, and automatically generate those proxies for you (and corresponding stubs in the far end to receive the message, and invoke the proper function based on the message it received). That's not part of D-bus itself, but people have written tools to take (for example) an interface specification and automatically generate proxies/stubs from that specification.
As such, although the two aren't exactly identical, there's enough similarity that D-bus can be (and often is) used for many of the same sorts of things as COM.
Systems Similar to DCOM
COM also allows you to build distributed systems using DCOM (Distributed COM). That is, a system where you invoke a method on one machine, but (at least potentially) execute that invoked method on another machine. This adds more overhead, but since (as pointed out above with respect to D-bus) RPC is basically communication with proxies/stubs attached to the ends, it's pretty easy to do the same thing in a distributed fashion. The difference in overhead, however, tends to lead to differences in how systems need to be designed to work well, though, so the practical advantage of using exactly the same system for distributed systems as local systems tends to be fairly minimal.
As such, the open source world provides tools for doing distributed RPC, but doesn't usually work hard at making them look the same as non-distributed systems. CORBA is well known, but generally viewed as large and complex, so (at least in my experience) current use is fairly minimal. Apache Thrift provides some of the same general type of capabilities, but in a rather simpler, lighter-weight fashion. In particular, where CORBA attempts to provide a complete set of tools for distributed computing (complete with everything from authentication to distributed time keeping), Thrift follows the Unix philosophy much more closely, attempting to meet exactly one need: generate proxies and stubs from an interface definition (written in a neutral language). If you want to do those CORBA-like things with Thrift you undoubtedly can, but in a more typical case of building internal infrastructure where the caller and callee trust each other, you can avoid a lot of overhead and just get on with the business at hand. Likewise, google RPC provides roughly the same sorts of capabilities as Thrift.
OS X Specific
Cocoa provides distributed objects that are fairly similar to COM. This is based on Objective-C though, and I believe it's now deprecated.
Apple also offers XPC. XPC is more about inter-process communication than RPC, so I'd consider it more directly comparable to D-bus than to COM. But, much like D-bus, it has a lot of the same basic capabilities as COM, but in different form that places more emphasis on communication, and less on making things look like local function calls (and many now prefer messaging to RPC anyway).
Summary
Open source software has enough different factors in its design that there's less demand for something providing the same mix of capabilities as Microsoft's COM provides on Windows. COM is largely a single tool that tries to meet all needs. In the open-source world, there's less drive to provide that single, all-encompassing solution, and more tendency toward a kit of tools, each doing one thing well, that can be put together into a solution for a specific need.
Being more commercially oriented, Apple OS X probably has what are (at least arguably) closer analogs to COM than most of the more purely open-source world.
A quick answer on the last question: COM is far from being obsolete. Almost everything in the Microsoft world is COM-based, including the .NET engine (the CLR), and including the new Windows 8.x's Windows Runtime.
Here is what Microsoft says about .NET in it latest C++ pages Welcome Back to C++ (Modern C++):
C++ is experiencing a renaissance because power is king again.
Languages like Java and C# are good when programmer productivity is
important, but they show their limitations when power and performance
are paramount. For high efficiency and power, especially on devices
that have limited hardware, nothing beats modern C++.
PS: which is a bit of a shock for a developer who has invested more than 10 years on .NET :-)
In the Linux world, it is more common to develop components that are statically linked, or which run in separate processes and communicate by piping text (maybe JSON or XML) back and forth.
Some of this is due to tradition. UNIX developers have been doing stuff like this long before CORBA or COM existed. It's "the UNIX way".
As Jerry Coffin says in his answer, when you have the source code for everything, binary interfaces are not as important, and in fact just make everything more difficult.
COM was invented back when personal computers were a lot slower than they are today. In those days, loading components into your app's process space and invoking native code was often necessary to achieve reasonable performance. Now, parsing text and running interpreted scripts aren't things to be afraid of.
CORBA never really caught on in the open-source world because the initial implementations were proprietary and expensive, and by the time high-quality free implementations were available, the spec was so complicated that nobody wanted to use it if they weren't required to do so.
To a large extent, the problems solved by COM are simply ignored on Linux.
It is true that binary compatibility is less important when you have the source code available. However, you still have to worry about modularisation and versioning. If two different programs depend on different versions of the same library, you need to somehow support that.
Then there is the case of the same program using different versions of the same library. This is often useful when working on large legacy programs, where upgrading everything can be prohibitively expensive but you would like to use new features anyway. With COM, the old parts of the program can just be left alone, since new library versions can more easily be made backwards compatible.
In addition, having to compile from source instead of binary compatibility is a huge hassle. Especially if you are actually developing software, since binary incompatibility means you have to recompile much more often. If one tiny part changes in a large C++ program, you may have to wait for a 30 minute recompile. If the different pieces are compatible, only the part which changed has to be recompiled.
COM and DCOM in particular have been around in windows for some considerable time now and naturally windows developers have made use of this powerful framework.
We are now in the cross platform age and when porting such applications to other platforms we are faced with challenges which in many cases can be mitigated or eliminated altogether unless the application we are porting is more than just one simple standalone app.
If your dealing with a whole suite of modules running on different machines all communicating using windows specific technologies such as DCE/RPC, DCOM or even windows named pipes then your job just became an order of magnitude harder.
DCE/RPC DCOM and windows named pipes all are very windows specific, non portable and of course subject to windows security access control.
For instance anyone familar with OPC DA (an industrial automation protocol based on DCOM still very much in use but now superceded by OPC UA (which avoids DCOM))) will know that there are no elegant solutions here if the client (or server) needs to be available for Linux!!
Sure there appear to be some technical hurdles here given that the MS code is not in the public domain but projects such as Wine have a partly ok DCE/RPC implementation and MS do publish some of the protocol docs. Try searching and you will probably find little information and few products open source or otherwise to help you.
Perhaps the lack of open source or affordable options here is more due to legal concerns - I wonder!
Some simpler solutions simply involve installing a "gateway service" on the windows machines to allow an alternative means of access to DCOM interfaces on that machine. This is fine if the windows machine does not belong to an unwilling 3rd party which unfortunately is sometimes the case!!! I know we'll just chuck another Windows machine as the gateway in the middle is the usual global warming enhancing solution to that problem.
I would conclude that Linux to Windows DCOM interoperability is certainly not impossible but it does appear to be a topic that few are interested in talking about unless you get your wallet out!

Is it common practice to abstract library dependencies from implementation?

My answer to this question would be "no." But my coworkers disagree.
We're rebuilding our product and have a lot of critical decisions to make in the near-term.
While doing some of my own work I noticed that we've got some in-house C++ classes to abstract some of the POSIX API (threads, mutexes, semaphores, and rw locks) and other utility classes. Note that these classes are basic, and haven't been ported from Linux (portability is a factor in the rebuild.) We are also using POCO C++ libraries.
I brought this to the attention of my coworkers and suggested that we ditch our in-house classes in favour of their POCO equivalents. I want to take full advantage of the library we're already using. They suggested that we should implement our in-house classes using POCO, and further abstract additional POCO classes as necessary, so as not to depend on any specific C++ library (citing future unknowns - what if we want to use a different lib/framework like QT or boost, what if the one we choose turns out to be no good or development becomes inactive, etc.)
They also don't want to refactor legacy code, and by abstracting parts of POCO with our own classes, we can implement additional functionality (classic OOP.) Both of these arguments I can appreciate. However, I argue that if we're doing a recode we should go big, or go home. Now would be the time to refactor and it really shouldn't be that bad especially given the similarity between our classes and those in POCO (threads, etc.) I don't know what to say regarding the second point - should we only use extended classes where the functionality is necessary?
My coworkers also don't want to litter the POCO namespace all over the place. I argue that we should pick a library/framework/toolkit, and stick with it. Take full advantage of its features. Is this not typical practice? The only project I've seen that abstracts an entire framework is Freeswitch (that provides its own interface to APR.)
One suggestion is that the API we expose to each other, and potential customers, should be free of POCO, but it would be present in the implementation (which makes sense.)
None of us really have experience in these kinds of design decisions, and it shows in the current product. Having been at this since I was young, I've got some intuition that has brought me here, but no practical experience either. I really want to avoid poor solutions to problems that are already solved.
I think my question boils down to this: When building a product, should we a) choose a dominant framework on which to base most of our code, and b) expect that framework to be tightly coupled with the product? Isn't that the point of a framework? (Is framework or library more appropriate for POCO?)
First, the API that you expose should definitely be free of POCO, boost, qt, or any other type that is not part of the standard C++ library. This is because the base libraries have their own release cycle, distinct from the release cycle of your library. If the users of your library also use boost, but a different, incompatible, version, they would need to spend time to resolve the incompatibility. The only exception to this rule is when you design a library to be released as part of a wider framework - say, an addition to the POCO toolkit. In this case the release of your library is tied to the release of the entire toolkit.
Internally, however, you should avoid using your own wrappers, unless the library that you are abstracting out is a true "commodity library"1. The reason for this is that when you hide an external library behind your classes, most of the time you mimic the level of abstraction of the library that you are hiding. The code that uses your wrapper will program to the level of abstraction dictated by the external library. When you swap the implementation behind your wrapper for a different framework, it is very likely that you would either (1) adapt the new framework to fit the level of abstraction of the old framework, or (2) will need to change the way in which you use your wrapper. Both cases are highly suspect: if you do (1), perhaps you shouldn't switch in the first place, and if you do (2), then your wrappers prove to be useless.
1 By "commodity library" I mean a library that provides a level of abstraction commonly found in other libraries that serve a similar purpose.
There are two situations where I think it's worth having your own wrappers:
1) You've looked at several different mutex implementations on different systems/libraries, you've established a common set of requirements that they can all satisfy and that are sufficient for your software. Then you define that abstraction and implement it one or more times, knowing that you've planned ahead for flexibility. The rest of your code is written to rely only on your abstraction, not on any incidental properties of the current implementation(s). I have done this in the past, although not in code I can show you.
A classic example of this "least common interface" would be to change rename in the filesystem abstraction, on the basis that Windows cannot implement an atomic rename-over-an-existing-file. So your code must not rely on atomic rename-replacement if you might in future swap out your current *nix implementation for one that can't do that. You have to restrict the interface from the start.
When done right, this kind of interface can considerably ease any kind of future porting, either to a new system or because you want to change your third-party library dependencies. However, an entire framework is probably too big to successfully do this with -- essentially you'd be inventing and writing your own framework, which is not a trivial task and conceivably is a larger task than writing your actual software.
2) You want to be able to mock/stub/sham/spoof/plagiarize/whatever the next clever technique is, the mutex in tests, and decide that you will find this easier if you have your own wrapper thrown over it than if you're trying to mess with symbols from third-party libraries, or that are built-in.
Note that defining your own functions called wrap_pthread_mutex_init, wrap_pthread_mutex_lock etc, that precisely mimic pthread_* functions, and take exactly the same parameters, might satisfy (2) but doesn't satisfy (1). And anyway, doing (2) properly probably requires more than just wrappers, you usually also want to inject the dependencies into your code.
Doing extra work under the heading of flexibility, without actually providing for flexibility, is pretty much a waste of time. It can be very difficult or even provably impossible to implement one threading environment in terms of another one. If you decide in future to switch from pthreads to std::thread in C++, then having used an abstraction that looks exactly like the pthreads API under different names is (approximately) no help whatsoever.
For another possible change you might make, implementing the full pthreads API on Windows is sort of possible, but probably more difficult than only implementing what you actually need. So if you port to Windows, all your abstraction saves you is the time to search and replace all calls in the rest of your software. You're still going to have to (a) plug in a complete Posix implementation for Windows, or (b) do the work to figure out what you actually need, and only implement that. Your wrapper won't help with the real work.

Need to know about good C++ Reflection API (For RuntimeType Identification -RTTI and runtime calling)

I need a good C++ Reflection API (like a Microsoft API) which enables me to determine the types (class, struct, enum, int, float, double, etc) identified at runtime, declare them, and call methods on those types at runtime.
Regards,
Usman
If you are trying to get to a plugin-type architecture, the POCO Library at http://pocoproject.org has some pieces that might get you part of the way. It will allow you to load a .dll or .so at runtime and create the classes contained in it. But the calling code will still need a header file which describes an interface (or abstract base class) to be able to get the signatures of the methods.
C++ is an incredibly complex language. "Reflective" APIs weren't part of the language design and so basically it isn't there.
If you want general purpose "reflection" and "metaprogramming", you can get that by stepping outside the language and using a program transformation system (PTS). Such a tool for your purpose has to parse C++ (in more than one compilation unit at a time), provide you with access to all the language structures, let you reflect, that is, determine the type (or other properties) of any construct (e.g., variable, expression or other syntax construction) and enable you to apply arbitrary code modifications. Obviously, this won't happen at "runtime" (although I suppose you could shell out to such machinery if you insisted).
Our DMS Software Reengineering Toolkit with its C++ Front End has a proven track record at analyzing and transformating very large sets of C++ code. See the technical papers for some detailed use cases. I don't think the other tools at the Wikipedia site handle C++, although they have the right mindset.
Although it isn't really a PTS (no source-to-source transformations), Clang might work, too. I'm not sure (since I don't use it all), how it can collect type information and use it to drive transformations to the source code. Its clearly very good at using such information to do LLVM code generation.

How to design a C / C++ library to be usable in many client languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm planning to code a library that should be usable by a large number of people in on a wide spectrum of platforms. What do I have to consider to design it right? To make this questions more specific, there are four "subquestions" at the end.
Choice of language
Considering all the known requirements and details, I concluded that a library written in C or C++ was the way to go. I think the primary usage of my library will be in programs written in C, C++ and Java SE, but I can also think of reasons to use it from Java ME, PHP, .NET, Objective C, Python, Ruby, bash scrips, etc... Maybe I cannot target all of them, but if it's possible, I'll do it.
Requirements
It would be to much to describe the full purpose of my library here, but there are some aspects that might be important to this question:
The library itself will start out small, but definitely will grow to enormous complexity, so it is not an option to maintain several versions in parallel.
Most of the complexity will be hidden inside the library, though
The library will construct an object graph that is used heavily inside. Some clients of the library will only be interested in specific attributes of specific objects, while other clients must traverse the object graph in some way
Clients may change the objects, and the library must be notified thereof
The library may change the objects, and the client must be notified thereof, if it already has a handle to that object
The library must be multi-threaded, because it will maintain network connections to several other hosts
While some requests to the library may be handled synchronously, many of them will take too long and must be processed in the background, and notify the client on success (or failure)
Of course, answers are welcome no matter if they address my specific requirements, or if they answer the question in a general way that matters to a wider audience!
My assumptions, so far
So here are some of my assumptions and conclusions, which I gathered in the past months:
Internally I can use whatever I want, e.g. C++ with operator overloading, multiple inheritance, template meta programming... as long as there is a portable compiler which handles it (think of gcc / g++)
But my interface has to be a clean C interface that does not involve name mangling
Also, I think my interface should only consist of functions, with basic/primitive data types (and maybe pointers) passed as parameters and return values
If I use pointers, I think I should only use them to pass them back to the library, not to operate directly on the referenced memory
For usage in a C++ application, I might also offer an object oriented interface (Which is also prone to name mangling, so the App must either use the same compiler, or include the library in source form)
Is this also true for usage in C# ?
For usage in Java SE / Java EE, the Java native interface (JNI) applies. I have some basic knowledge about it, but I should definitely double check it.
Not all client languages handle multithreading well, so there should be a single thread talking to the client
For usage on Java ME, there is no such thing as JNI, but I might go with Nested VM
For usage in Bash scripts, there must be an executable with a command line interface
For the other client languages, I have no idea
For most client languages, it would be nice to have kind of an adapter interface written in that language. I think there are tools to automatically generate this for Java and some others
For object oriented languages, it might be possible to create an object oriented adapter which hides the fact that the interface to the library is function based - but I don't know if its worth the effort
Possible subquestions
is this possible with manageable effort, or is it just too much portability?
are there any good books / websites about this kind of design criteria?
are any of my assumptions wrong?
which open source libraries are worth studying to learn from their design / interface / souce?
meta: This question is rather long, do you see any way to split it into several smaller ones? (If you reply to this, do it as a comment, not as an answer)
Mostly correct. Straight procedural interface is the best. (which is not entirely the same as C btw(**), but close enough)
I interface DLLs a lot(*), both open source and commercial, so here are some points that I remember from daily practice, note that these are more recommended areas to research, and not cardinal truths:
Watch out for decoration and similar "minor" mangling schemes, specially if you use a MS compiler. Most notably the stdcall convention sometimes leads to decoration generation for VB's sake (decoration is stuff like #6 after the function symbol name)
Not all compilers can actually layout all kinds of structures:
so avoid overusing unions.
avoid bitpacking
and preferably pack the records for 32-bit x86. While theoretically slower, at least all compilers can access packed records afaik, and the official alignment requirements have changed over time as the architecture evolved
On Windows use stdcall. This is the default for Windows DLLs. Avoid fastcall, it is not entirely standarized (specially how small records are passed)
Some tips to make automated header translation easier:
macros are hard to autoconvert due to their untypeness. Avoid them, use functions
Define separate types for each pointer types, and don't use composite types (xtype **) in function declarations.
follow the "define before use" mantra as much as possible, this will avoid users that translate headers to rearrange them if their language in general requires defining before use, and makes it easier for one-pass parsers to translate them. Or if they need context info to auto translate.
Don't expose more than necessary. Leave handle types opague if possible. It will only cause versioning troubles later.
Do not return structured types like records/structs or arrays as returntype of functions.
always have a version check function (easier to make a distinction).
be careful with enums and boolean. Other languages might have slightly different assumptions. You can use them, but document well how they behave and how large they are. Also think ahead, and make sure that enums don't become larger if you add a few fields, break the interface. (e.g. on Delphi/pascal by default booleans are 0 or 1, and other values are undefined. There are special types for C-like booleans (byte,16-bit or 32-bit word size, though they were originally introduced for COM, not C interfacing))
I prefer stringtypes that are pointer to char + length as separate field (COM also does this). Preferably not having to rely on zero terminated. This is not just because of security (overflow) reasons, but also because it is easier/cheaper to interface them to Delphi native types that way.
Memory always create the API in a way that encourages a total separation of memory management. IOW don't assume anything about memory management. This means that all structures in your lib are allocated via your own memory manager, and if a function passes a struct to you, copy it instead of storing a pointer made with the "clients" memory management. Because you will sooner or later accidentally call free or realloc on it :-)
(implementation language, not interface), be reluctant to change the coprocessor exception mask. Some languages change this as part of conforming to their standards floating point error(exception-)handling.
Always pair a callbacks with an user configurable context. This can be used by the user to give the the callback state without defining global variables. (like e.g. an object instance)
be careful with the coprocessor status word. It might be changed by others and break your code, and if you change it, other code might stop working. The status word is generally not saved/restored as part of calling conventions. At least not in practice.
don't use C style varargs parameters. Not all languages allow variable number of parameters in an unsafe way
(*) Delphi programmer by day, a job that involves interfacing a lot of hardware and thus translating vendor SDK headers. By night Free Pascal developer, in charge of, among others, the Windows headers.
(**)
This is because what "C" means binary is still dependant on the used C compiler, specially if there is no real universal system ABI. Think of stuff like:
C adding an underscore prefix on some binary formats (a.out, Coff?)
sometimes different C compilers have different opinions on what to do with small structures passed by value. Officially they shouldn't support it at all afaik, but most do.
structure packing sometimes varies, as do details of calling conventions (like skipping
integer registers or not if a parameter is registerable in a FPU register)
===== automated header conversions ====
While I don't know SWIG that well, I know and use some delphi specific header tools( h2pas, Darth/headconv etc).
However I never use them in fully automatic mode, since more often then not the output sucks. Comments change line or are stripped, and formatting is not retained.
I usually make a small script (in Pascal, but you can use anything with decent string support) that splits a header up, and then try a tool on relatively homogeneous parts (e.g. only structures, or only defines etc).
Then I check if I like the automated conversion output, and either use it, or try to make a specific converter myself. Since it is for a subset (like only structures) it is often way easier than making a complete header converter. Of course it depends a bit what my target is. (nice, readable headers or quick and dirty). At each step I might do a few substitutions (with sed or an editor).
The most complicated scheme I did for Winapi commctrl and ActiveX/comctl headers. There I combined IDL and the C header (IDL for the interfaces, which are a bunch of unparsable macros in C, the C header for the rest), and managed to get the macros typed for about 80% (by propogating the typecasts in sendmessage macros back to the macro declaration, with reasonable (wparam,lparam,lresult) defaults)
The semi automated way has the disadvantage that the order of declarations is different (e.g. first constants, then structures then function declarations), which sometimes makes maintenance a pain. I therefore always keep the original headers/sdk to compare with.
The Jedi winapi conversion project might have more info, they translated about half of the windows headers to Delphi, and thus have enormous experience.
I don't know but if it's for Windows then you might try either a straight C-like API (similar to the WINAPI), or packaging your code as a COM component: because I'd guess that programming languages might want to be able to invoke the Windows API, and/or use COM objects.
Regarding automatic wrapper generation, consider using SWIG. For Java, it will do all the JNI work. Also, it is able to translate complex OO-C++-interfaces properly (provided you follow some basic guidelines, i.e. no nested classes, no over-use of templates, plus the ones mentioned by Marco van de Voort).
Think C, nothing else. C is one of the most popular programming languages. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. All popular high-level languages provide an interface to C. That makes your library accessible from almost all platforms in existence. Don't worry too much about providing an Object Oriented interface. Once you have the library done in C, OOP, functional or any other style interface can be created in appropriate client languages. No other systems programming language will give you C's flexibility and potability.
NestedVM I think is going to be slower than pure Java because of the array bounds checking on the int[][] that represents the MIPS virtual machine memory. It is such a good concept but might not perform well enough right now (until phone manufacturers add NestedVM support (if they do!), most stuff is going to be SLOW for now, n'est-ce pas)? Whilst it may be able to unpack JPEGs without error, speed is of no small concern! :)
Nothing else in what you've written sticks out, which isn't to say that it's right or wrong! The principles sound (mainly just listening to choice of words and language to be honest) like roughly standard best practice but I haven't thought through the details of everything you've said. As you said yourself, this really ought to be several questions. But of course doing this kind of thing is not automatically easy just because you're fixed on perhaps a slightly different architecture to the last code base you've worked on...! ;)
My thoughts:
All your comments on C interface compatibility sound sensible to me, pretty much best practice except you don't seem to properly address memory management policy - some sentences a bit ambiguous/vague/wrong-sounding. The design of the memory management will be to a large extent determined by the access patterns made in your application, rather than the functionality per se. I suiggest you study others' attempts at making portable interfaces like the standard ANSI C API, Unix API, Win32 API, Cocoa, J2SE, etc carefully.
If it was me, I'd write the library in a carefully chosen subset of the common elements of regular Java and Davlik virtual machine Java and also write my own custom parser that translates the code to C for platforms that support C, which would of course be most of them. I would suggest that if you restrict yourself to data types of various size ints, bools, Strings, Dictionaries and Arrays and make careful use of them that will help in cross-platform issues without affecting performance much most of the time.
your assumptions seem ok, but i see trouble ahead, much of which you have already spotted in your assumptions.
As you said, you can't really export c++ classes and methods, you will need to provide a function based c interface. What ever facade you build around that, it will remain a function based interface at heart.
The basic problem i see with that is that people choose a specific language and its runtime because their way of thinking (functional or object oriented) or the problem they address (web programming, database,...) corresponds to that language in some way or other.
A library implemented in c will probably never feel like the libraries they are used to, unless they program in c themselves.
Personally, I would always prefer a library that "feels like python" when I use python, and one that feels like java when I do Java EE, even though I know c and c++.
So your effort might be of little actual use (other than your gain in experience), because people will probably want to stick with their mindset, and rather re-implement the functionality than use a library that does the job, but does not fit.
I also fear the desired portability will seriously hamper development. Just think of the infinite build settings needed, and tests for that. I have worked on a project that tried to maintain compatibility for 5 operating systems (all posix-like, but still) and about 10 compilers, the builds were a nightmare to test and maintain.
Give it an XML interface, whether passed as a parameter and return value or as files through a command-line invocation. This may not seem as direct as a normal function interface, but is the most practical way to access an executable from, e.g., Java.