Use C/C++ in Apache-Flink - c++

My team and I are developing an application that makes use of Flink.
The data will be processed using a computationally-heavy numerical algorithm.
In order to optimize it as much as possible, I would like to write this algorithm in C/C++ rather than in Java.
The question is: is it possible to use C/C++ code within Flink? Perhaps by wrapping it into a Java library?

I've never tested this case in particular. In general, you can always use native code from Java using JNI, the Java Native Interface.
The idea would be to have a Java facade expose your native code and use these methods in your computation graph defined with Flink in Java (or other JVM languages, like Scala). You would have to make both Java and native libraries available on all involved nodes to make this work. If you have an Hadoop cluster you can leverage YARN to ship files along with your job (docs here, see --yarn-ship CLI option).
I would suggest you test this incrementally, with a very small native function exposed. Also, don't underestimate Java's capabilities in terms of performance: with some well thought programming and leveraging JIT and other runtime optimizations, long running processes can enjoy even better performances than similar native code with unmanaged memory.
Keep in mind that this of course resorting to native code will mean restricting the portability of your code to the platforms for which you'll compile your libraries.

Related

Building a simple DSL in Haskell with C++ interop

I'm designing a simple interpreted language for testing real-time embedded systems. The control flow is severely restricted to provide strong static guarantees on what the scripts will do + how long they will run. For example, you can only branch on constant conditionals or loop over fixed ranges.
There is a large existing codebase in C++ with relevant models and IO libs, so this language must be able to call into C++. The systems under test have hard timing requirements, so we can't tolerate much jitter in the test framework. Our past solution was a custom DSL embedded in the C++ runtime, but we ended up re-inventing too many wheels (parser, linter, interactive interpreter, etc..) to achieve the static guarantees we need.
Haskell's facilities for crafting embedded DSLs with these guarantees are extremely appealing to me, but I'm stuck in determining how to embed it into the soft real-time C++ runtime. Any ideas? Pointers to any libraries / existing projects would be greatly appreciated!
Sounds like the path of least resistance would be an EDSL that generates C++. This way, you don't have to worry about the potential mismatch between soft real-time and the GHC RTS.
You might look at how other EDSLs that generate PLs are implemented:
HJScript uses a free monad approach to embedding JS.
JMacro uses more of an external DSL approach but embedded via TH. Wouldn't be my choice.
Instead of generating strings of C++ code, it's nice to have a data structure. Unfortunately, there doesn't seem to be a package available for C++. However, you could take a look at language-c — perhaps extend that or build your own. You might even consider generating C and using the C to C++ interop provided by those languages.
I'd probably dissuade you from looking at the design of Cryptol or Cogent as these are fully-fledged programming languages (that you have indicated you are inclined to steer away from).

Deciding on a language/framework for a modular OpenCV application

What's this about?
We have a C++ application dealing with image processing and computer vision on videos using OpenCV, we're going to rewrite it from scratch and need some help deciding what technologies to use. More specifically, I need help on how to choose the technology I'd use.
About the app
The app's functionality is divided in modules that are called in an order defined by a configuration XML file and can also be changed in runtime, but not in realtime (i.e. the application doesn't need to close, but the processing will start from scratch). These modules share data in a central datapool.
Why are we starting from scratch?
This application wasn't planned to be as dynamic as it currently strives to be, so it's grown to be a collection of buggy patches, macros and workarounds; it's now full of memory leaks, unnecessary QT dependencies, slow conversions between QT and OpenCV image formats and compilation and testing times have grown too much.
Language choice
The original code used C++, just because the guy who originally started the project only knew C++. This may be a good choice, because we need it to be as fast as possible, but there may be better choices to account for the dynamic nature of the application.
We're limited by the languages supported by OpenCV (C++, Java and Python mainly; although I've read there is also support for Ruby, Ch, C# and any JVM language)
What is needed
Speed: We're aiming for realtime tracking. This may rule out Python and Ruby.
Class Instantiation by name: Although our C++ macros and class registration system work, a language designed to be dynamic that has it's own runtime would be nice. Maybe Objective-C++, or Java.
What would be ideal
Module/Plugin/Extension/Component Framework: Why reinvent the wheel, using a good framework for this would let us focus on what's special about our app. There are many options here. Objective-C has it's NSBundles; C++ has libraries like Boost.Extension, Pluma, DynObj, FxEngine, etc; C has C-Pluff; I'd even say there are too many options.
Runtime class loading and reloading: From a developing point of view, it would be interesting to be able to compile and reload just one module. I've seen this done in via code injection in Objective-C and using Java's reflection.
What am I missing?
I have too many interesting options!
Here's where I need help, based on your experiences in modular app development, with this constraints, what kind of language/framework feature should I be looking for?
What question should I make myself about this project that would let me narrow my search?
Edit
I hadn't noticed that OpenCV had GPU bindings only for C++, so I'm stuck with it.
Now that the language is fixed, the search has narrowed a lot. I could use Objective-C++ to get the dynamism needed (Obj-C runtime + NSBundle from Cocoa/GnuStep/Cocotron), which sounds complicated; or C++ with a framework.
So I'll now narrow my question to:
Is using NSBundle in a crossplatform way with Objective-C++ easier than it sounds?
What C++ framework will provide me with hot-swappable modules?
The main reason for swapping modules in runtime is to be able to change code in a fast way, would Runtime-Compiled C++ be a better solution?
Meta: I did my research on how to ask a question like this, I hope it's acceptable.
"What question should I make myself about this project that would let me narrow my search?"
if you need gpu support(cuda/ocl), your only choice is c++.
you can safely discard C, as it won't be supported in the near future
have no fear of python, even if you need direct pixel access, that's all numpy arrays (running c-code again)
i'd be a bit sceptical of ruby, c# ch and the like, since those bindings are community based, and might not be up to date / maintained properly, while the java & python bindings are machine - generated from the c++ api, and are part of the official distribution.
If you're looking for portability and have large memory for disposal then you can go with Java.
The performance hit between C++ and Java is not that bad. For conversion between Mat and other image format I'm still not sure, coz it needs deep copy to perform that, so if your code can display the image in openCV native format then you can fasten the application
pro :
You can stop worrying about memory leak
The project is much more portable compared to C/C++(this can be wrong if you can avoid using primitive datatypes which size is non consistent and for example always use int*_t in C)
cons:
slower than C/C++
more memory and CPU clock needed
http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html

How interoperability works

I know that many large-scale applications such as video games are created using multiple langages. For example, it's likely the game/physics engines are written in C++ while gameplay tasks, GUI are written in something like Python or Lua.
I understand why this division of roles is done; use lower-level languages for tasks that require extreme optimization, tweaking, efficiency and speed, while using higher-level languages to speed up production time, reduce nasty bugs ect.
Recently, I've decided to undertake a larger personal project and would like to divy-up parts of the project similar to above. At this point in time, I'm really confused about how this interoperability between languages (especially compiled vs interpreted) works.
I'm quite familiar with the details of going from ANSCII code test to loading an executable, when written in something like C/C++. I'm very curious at how something like a video game, built from many different languages, works. This is a large/broad question, but specifically I'm interested in:
How does the code-level logic work? I.e. how can I call Python code from a C++ program? Especially since they don't support the same built-in types?
What does the program image look like? From what I can tell, a video game is running in a single process, so what does the runtime image look like when running a C/C++ program that calls a Python function?
If calling code from an interpreted language from a compiled program, what are the sequence of events that occur? I.e If I'm inside my compiled executable, and for some reason have a call to an interpreted language inside a loop, do I have to wait for the interpreter every iteration?
I'm actually finding a hard time finding information on what happening at the machine-level, so any help would be appreciated. Although I'm curious in general about interoperation of software, I'm specifically interested in C++ and Python interaction.
Thank you very much for any insight, even if it's just pointing me to where I can find more information.
In the specific case of python, you have basically three options (and this generally applies across the board):
Host python in C++: From the perspective of the C++ programme, the python interpreter is a C library. On the python side, you may or may not need to use something like ctypes to expose the C(++) api.
Python uses C++ code as DLLs/SOs - C++ code likely knows nothing of python, python definitely has to use a foreign function interface.
Interprocess communication - basically, two separate processes run, and they talk over a socket. These days you'd likely use some kind of web services architecture to accomplish this.
Depending on what you want to do:
Have a look at SWIG: http://www.swig.org/ It's a tool that aims to connect C/C++ code with Python, Tcl, Perl, Ruby, etc. The common use case is a Python interface (graphical or not) that will call the C/C++ code. SWIG will parse the C/C++ code in order to generate the interfaces.
Libpython: it's a lib that allows you to embed Python code. You have some examples here: http://docs.python.org/3.0/extending/embedding.html

Wanted: Compiler Tool for Users of Software System

I am not sure if the title of this question gets to the point. I have written a large software system in C C++ for Windows, and want to give the users of this system the option to add compiled code to it. The user should be able to do basic stuff, and exchange data with my program.
Currently the implemented way is via DLLs. But for this, a grown up compiler is needed, and it is not as easy as I wished. Is there a tiny C compiler that can create Windows DLLs?
Another idea is the Java native interface. But this requires a complete Java system to run in the background, and it is not easy to run code in it.
Do you have any other ideas?
Any interpreted language? (TCL and Lua were designed as extension languages, but you can nearly as easily interface with any other).
How about python integration?
You could create an python interface that interfaces with your application. Python is rather easy to learn and should integrate easily with c/c++. The python documentation has an own chapter on that.
You could also use a tool like swig to generate the interface.
The advantage of this is that they wouldn't have to compile anything. They could just supply python files that could be loaded into your application and run within the software. This is a well known use for python, and something its simple syntax promotes.
As the other says you will be best of by providing an embedded language.
I would chip in for javascript and use the google v8 engine
By using javascript you get a language nearly everbody can use and program in.
There is other javascript engines you can embed like SpiderMonkey.
See this answer for what to choose.
An interpreted language is not good enough. I need speed. The software itself is an interpreted language. So I added support for the tiny C compiler. It is only C, and I do check mingw, which probably would not be as tiny as this. Thanks for all your hints.
Added after several months:
I have now two tools, actually: TinyC and Python. The speed difference between those is noticable (factor 5-10), but that usually does not matter too much. Python is much easier for the user, though I managed to integrate both into the Euler GUI quite nicely.
One of the ways is to add scripting. You application can host scripting environment and expose its services there. Users would be able to execute JScript/VBScript scripts and interact with your application. The advantage is that with reasonable effort you can get the power of well known and well documented scripting languages into your application (I suppose there is even debugger for scripting there). You will be required to shape your app services as COM interfaces and scripts will be able to access them automatically using method names you assigned on C++ side.
C++, Win32 and Scripting: Quick way to add Scripting support to your applications
MSDN Entry Point - IActiveScript interface

What is a good scripting language to integrate into high-performance applications?

I'm a game's developer and am currently in the processing of writing a cross-platform, multi-threaded engine for our company. Arguably, one of the most powerful tools in a game engine is its scripting system, hence I'm on the hunt for a new scripting language to integrate into our engine (currently using a relatively basic in-house engine).
Key features for the desired scripting system (in order of importance) are:
Performance - MUST be fast to call & update scripts
Cross platform - Needs to be relatively easy to port to multiple platforms (don't mind a bit of work, but should only take a few days to port to each platform)
Offline compilation - Being able to pre-parse the script code offline is almost essential (helps with file sizes and load times)
Ability to integrate well with c++ - Should be able to support OO code within the language, and integrate this functionality with c++
Multi-threaded - not required, but desired. Would be best to be able to run separate instances of it on multiple threads that don't interfere with each other (i.e. no globals within the underlying code that need to be altered while running). Critical Section and Mutex based solutions need not apply.
I've so far had experience integrating/using Lua, Squirrel (OO language, based on Lua) and have written an ActionScript 2 virtual machine.
So, what scripting system do you recommend that fits the above criteria? (And if possible, could you also post or link to any comparisons to other scripting languages that you may have)
Thanks,
Grant
Lua has the advantage of being time-tested by a number of big-name video game developers and a good base of knowledgeable developers thanks to Blizzard-Activision's adoption of it as the primary platform for developing World of Warcraft add-ins.
Lua is a very good match for your needs. I'll take them in the same order.
Lua is one of the fastest scripting languages. It's fast to compile and fast to run.
Lua compiles on any platform with an ANSI C compiler, which afaik includes all gaming platforms.
Lua can be pre-compiled, but as a very dynamic languages most errors are only detectable at runtime. Also precompiled code (as bytecode) is often larger in terms of size than source code.
There are many Lua/C++ binding tools.
It doesn't support multi-threading (you cannot access a single instance of the interpreter from multiple threads), but you can have several instances of the interpreter, one per thread, or even one per game object.
Lua have been used in video-game industry for years. Lightweight and efficient.
That being said, ChaiScript and Falcon are good candidates matching your needs and with higher level language than Lua but with less history and community support.
Lua
Boost Python
SWIG
We've had good luck with Squirrel so far. Lua is so popular it's on its way to becoming a standard.
I recommend you worry more about memory than speed. Most scripting languages are "fast enough" and if they get slow you can always push some of that functionality back down into C++. Many of them burn through lots of memory, though, and on a console memory is an even more scarce resource than CPU time. Unbounded memory consumption will crash you eventually, and if you have to allocate 4MB just for the interpreter, that's like having to throw 30 textures out the window to make room.
Lua, and then LuaJIT for extra blaziness!
just don't expect too much from automatic C++ binding libraries, most are slow and restrictive. better do your own binding for your own objects.
as for concurrency, either LuaLanes, or roll your own. if your C++ program is already multithreaded, just call separate LuaStates from each thread, and use your own C++ shared structures as communications channels if needed.
as you might already know, the most often repeated answer in Lua is 'roll your own', and it's often the best advice! except when it's about bindings to common C/C++ libraries, in that case it's quite probable there's already one.
If you haven't looked at it yet I would suggest you check out Angelscript.
I have successfully used it in a cross platform environment (Windows and Linux with only a recompile) and it is designed to integrate well with C++ (both objects and code).
It is lightweight and supports multi-threading (in the sense that the question was asked), performs well and compiles to byte code which could be done in advance.
Start with Python.
If you can prove that you need more speed, then look at Stackless Python. That's what EVE Online uses for their game.
JavaScript may be a reasonable option, because of the mountains of effort that have gone into optimizing the various implementations for use in web-browsers.
These come to mind:
Lua
Python with boost::python
MzScheme or Guile
Ruby with SWIG