JNI Performance - java-native-interface

JNI Performance - java-native-interface

Our main program is in java but the code that extracts our data from storage is written in C. I need to build an HDF5 file from extracted data. Would it be better to use JNI to call the C code to get the data and then build the HDF5 file from Java or to build the HDF5 from the C code?
I have little experience with JNI or C.
Also One of our main criteria is performance. How much of a performance hit is there when using JNI?

The function call boundary is "slow", so if you're doing many calls to your native routine, then performance will suffer.
An example of the kind of thing that may benefit from moving to JNI (I emphasize may, because Java is more than fast enough for many purposes) would be a routine to do some sort of image processing on a large bitmap. But to call a JNI routine for each pixel would be much, much slower than doing it within a loop in pure Java.
Extracting data from one format to another is, frankly, the kind of thing best done in a "scripting" language like Python, and will never be bound by CPU. Rather, the disk speed is going to be way, way slower than any language interpreter.

Looks to me that since you have little experience with C, your best choice is to do it from withing Java.. JNI is really not that bad.

I'm not familiar at all with the HDF interfaces other than the C one, so I cannot really compare for you.
I would point out that this is the API which all other interfaces use, so if you're finding that you need to eek out the maximum performance then the C API is going to be the best.
There is also a question about what parts of the API are available to you. For example I originally started with the C++ interface. Some APIs were still only available in the C API. This is not a problem when you're using C & C++ but it may be a problem if an API you need is not available to you.
Having said that, you do need to have your head screwed on in terms of the object model of C: eg. use of pointers, handles etc.

Related

Price of switching control between C++ and Python

I'm developing a C++ application that is extended/ scriptable with Python. Of course C++ is much faster than Python, in general, but does that necessarily mean that you should prefer to execute C++ code over Python code as often as possible?
I'm asking this because I'm not sure, is there any performance cost of switching control between code written in C++ and code written in Python? Should I use code written in C++ on every occasion, or should I avoid calling back to C++ for simple tasks because any speed gain you might have from executing C++ code is outmatched by the cost of switching between languages?
Edit: I should make this clear, I'm not asking this to actually solve a problem. I'm asking purely out of curiosity and it's something worth keeping in mind for the future. So I'm not interested in alternative solutions, I just want to know the answer, from a technical standpoint. :)

I don't know there is a concrete rule for this, but a general rule that many follow is to:
Prototype in python. This is quicker to write, and may be easier to read/reason about.
Once you have a prototype, you can now identify the slow portions that should be written in c++ (through profiling).
Depending on the domain of your code, the slow bits are usually isolated to the 'inner loop' types of code, so the number of switches between python an this code should be relatively small.
If your program is sufficiently fast, you've successfully avoided prematurely optimizing your code by writing too much in c++.

Keep it simple and tune performance as needed. The primary reason for embedding an interpreter in a C++ app is to allow run-time configuration/data to specify some processing - i.e. you can modify the script without recompiling the C++ program - that's your guide for when to call into the interpreter. Once in some interpreter call, the primary reasons to call back into C++ are:
to access or update some data that can't reasonably be exposed as a parameter to the call (or via some other registration process the interpreter supports)
to get better performance during some critical part of the processing
For the latter, try the script first (assuming it's as easy to develop there), then if it's slow identify where and how some C++ code might help. If/where performance does prove a problem - as a general guideline when calling from C++ to the interpreter or vice versa: try to line up as much work as possible then make the call into the other system. If you get stuck, come back to stackoverflow with a specific problem and actual code.

The cost is present but negligible. That's because you probably do a fair bit of work converting python's high level datatypes to C++-compatible representations. Of course this is similar to the cost of calling one C++ function from another, there's some overhead. The rules for when it's a good idea to switch from python to C++ are:
A function with few arguments
A function which does a large amount of processing on a small amount of data
A function which is called as rarely as possible - consolidate function calls if possible

The best metric should be something that wieghs up for you....
Makes development, debugging and testing easier (lowers dev cost)
Lowers the cost of maintenance
meets the performance requirement (provides solution)

Lua, game state and game loop

Call main.lua script at each game loop iteration - is it good or bad design? How does it affect on the performance (relatively)?
Maintain game state from a. C++ host-program or b. from Lua scripts or c. from both and synchronise them?
(Previous question on the topic: Lua and C++: separation of duties)
(I vote for every answer. The best answer will be accepted.)

My basic rule for lua is - or any script language in a game -
Anything that happens on every frame: c++
asynchronous events - user input - lua
synchronous game engine events - lua
Basically, any code thats called at >33-100Hz (depending on frame rate) is C++
I try to invoke the script engine <10Hz.
Based on any kind of actual metric? not really. but it does put a dividing line in the design, with c++ and lua tasks clearly delineated - without the up front delineation the per frame lua tasks will grow until they are bogging processing per frame - and then theres no clear guideline on what to prune.

The best thing about lua is that it has a lightweight VM, and after the chunks get precompiled running them in the VM is actually quite fast, but still not as fast as a C++ code would be, and I don't think calling lua every rendered frame would be a good idea.
I'd put the game state in C++, and add functions in lua that can reach, and modify the state. An event based approach is almost better, where event registering should be done in lua (preferably only at the start of the game or at specific game events, but no more than a few times per minute), but the actual events should be fired by C++ code. User inputs are events too, and they don't usually happen every frame (except for maybe MouseMove but which should be used carefully because of this). The way you handle user input events (whether you handle everything (like which key was pressed, etc) in lua, or whether there are for example separate events for each keys on the keyboard (in an extreme case) depends on the game you're trying to make (a turn based game might have only one event handler for all events, an RTS should have more events, and an FPS should be dealt with care (mainly because moving the mouse will happen every frame)). Generally the more separate kinds of events you have, the less you have to code in lua (which will increase performance), but the more difficult it gets if a "real event" you need to handle is actually triggered by more separate "programming level events" (which might actually decrease performance, because the lua code needs to be more complex).
Alternatively if performance is really important you can actually improve the lua VM by adding new opcodes to it (I've seen some of the companies to do this, but mainly to make decompilation of the compiled lua chunks more harder), which is actually not a hard thing to do. If you have something that the lua code needs to do a lot of times (like event registering, event running, or changing the state of the game) you might want to implement them in the lua VM, so instead of multiple getglobal and setglobal opcodes they would only take one or two (for example you could make a SETSTATE opcode with a 0-255 and a 0-65535 parameter, where the first parameter descibes which state to modify, and the second desribes the new value of the state. Of course this only works if you have a maximum of 255 events, with a maximum of 2^16 values, but it might be enough in some cases. And the fact that this only takes one opcode means that the code will run faster). This would also make decompilation more harder if you intend to obscure your lua code (although not much to someone who knows the inner workings of lua). Running a few opcodes per frame (around 30-40 tops) won't hit your performance that badly. But 30-40 opcodes in the lua VM won't get you far if you need to do really complex things (a simple if-then-else can take up to 10-20 or more opcodes depending on the expression).

I don't like C++. But I do like games.
My approach might be a bit atypical: I do everything I can in Lua, and only the absolute minimum in C++. The game loop, the entities, etc are all done in Lua. I even have a QuadTree implementation done in Lua. C++ handles graphical and filesystem stuff, as well as interfacing with external libraries.
This is not a machine-based decision, but a programmer-based one; I output code much faster in Lua than In C++. So I spend my programmer cycles on new features rather than on saving computer cycles. My target machines (any laptop from the last 3 years) are able to cope with this amount of Lua very easily.
Lua is surprisingly low-footprint (take a look to luaJIT if you don't know it).
This said, if I ever find a bottleneck (I haven't yet) I'll profile the game in order to find the slow part, and I'll translate that part to C++ ... only if I can't find a way around it using Lua.

I am using Lua for the first time in a game I've been working on. The C++ side of my application actually holds pointers to instances of each game state. Some of the game states are implemented in C++ and some are implemented in Lua (such as the "game play" state).
The update and main application loop live on the C++ side of things. I have exposed functions that allow the Lua VM to add new game states to the application at runtime.
I have not yet had any problems with slowness, even running on hardware with limited resources (Atom processor with integrated video). Lua functions are called every frame. The most expensive (in terms of time) operation in my application is rendering.
The ability to create new states completely in Lua was one of the best decisions I made on the project, since it allows me to freely add portions of the game without recompiling the whole thing.
Edit: I'm using Luabind, which I have read performs slower in general than other binding frameworks and of course the Lua C API.

You probably don't want to execute the entire Lua script on every frame iteration, because any sufficiently complex game will have multiple game objects with their own behaviors. In other words, the advantages of Lua are lost unless you have multiple tiny scripts that handle a specific part of the behavior of a larger game. You can use the lua_call function to call any appropriate lua routine in your script, not just the entire file.
There's no ideal answer here, but the vast majority of your game state is traditionally stored in the game engine (i.e. C++). You reveal to Lua just enough for Lua to do the decision making that you've assigned to Lua.
You need to consider which language is appropriate for which behaviors. Lua is useful for high level controls and decisions, and C++ is useful for performance oriented code. Lua is particularly useful for the parts of your game that you need to tweak without recompiling. All magic constants and variables could go into Lua, for example. Don't try to shoehorn Lua where it does not belong, i.e. graphics or audio rendering.

IMHO Lua scripts are for specific behaviours, it's definitely going to hurt performance if you are calling a Lua script 60 times per second.
Lua scripts are often to separate stuff like Behaviour, and specific events from your Game Engine logic (GUI, Items, Dialogs, game engine events, etc...). A good usage of Lua for example would be when triggering an explosion (particle FX), if the Game Character walks somewhere, hard-coding the output of that event in your engine would be a very ugly choice. Though, making the engine trigger the correct script would be a better choice, decoupling that specific behavior off your engine.
I would recommend, to try to keep your Game State in one part, instead of upscaling the level of complexity of keeping states synchronized in two places (Lua and Engine), add threading to that, and you will end up having a very ugly mess. Keep it simple. (In my Designs I mostly keep Game State in C++)
Good luck with your Game!

I'd like to throw in my two cents since I strongly believe that there's some incorrect advice being given here. For context, I am using Lua in a large game that involves both intensive 3D rendering as well as intensive game logic simulation. I've become more familiar than I'd have liked to with Lua and performance...
Note that I'm going to talk specifically about LuaJIT, because you're going to want to use LuaJIT. It's plug-and-play, really, so if you can embed Lua you can embed LuaJIT. You'll want it, if not for the extra speed, then for the automagic foreign function interface module (require 'ffi') that will allow you to call your native code directly from Lua without ever having to touch the Lua C API (95%+ of cases).
It's perfectly fine to call Lua at 60hz (I call it at 90hz in VR..). The catch is that you are going to have to be careful to do it correctly. As someone else mentioned, it's critical that you load the script only once. You can then use the C API to get access to functions you defined in that script, or to run the script itself as a function. I recommend the former: for a relatively simple game, you can get by with defining functions like onUpdate (dt), onRender (), onKeyPressed (key), onMouseMoved (dx, dy), etc. You can call these at the appropriate time from your main loop in C++. Alternatively, you can actually have your entire main loop be in Lua, and instead invoke your C++ code for performance-critical routines (I do that). This is especially easy to do with the LuaJIT FFI.
This is the really hard question. It will depend on your needs. Can you quite easily hammer down the game state? Great, put it C++-side and access from LuaJIT FFI. Not sure what all will be in the game state / like to be able to prototype quickly? Nothing wrong with keeping it in Lua. That is, until you start talking about a complex game with 1000s of objects, each containing non-trivial state. In this case, hybrid is the way to go, but figuring out exactly how to split state between C++ and Lua, and how to martial said state between the two (especially in perf-critical routines) is something of an art. Let me know if you come up with a bulletproof technique :) As with everything else, the general rule of thumb is: data that goes through performance-critical pathways needs to be on the native side. For example, if your entities have positions and velocities that you update each frame, and you have thousands of said entities, you need to do this in C++. However, you can get away with layering an 'inventory' on top of these entities using Lua (an inventory doesn't need a per-frame update).
Now, a couple more notes I'd like to throw out both as general FYIs and in response to some of the other answers.
Event-based approaches are, in general, critical to the performance of any game, but that goes doubly for systems written in Lua. I said it's perfectly fine to call Lua # 60hz. But it's not perfectly fine to be running tight loops over lots of game objects each frame in said Lua. You might get away with wastefully calling update() on everything in the universe in C++ (though you shouldn't), but doing so in Lua will start eating up those precious milliseconds far too quickly. Instead, as others have mentioned, you need to be thinking of Lua logic as 'reactive' -- usually, this means handling an event. For example, don't check that one entity is in range of another each frame in Lua (I mean, this is fine for one entity, but in general when you're scaling up your game you need to not think like this). Instead, tell your C++ engine to notify you when the two entities get within a certain distance of one another. In this way, Lua becomes the high-level controller of game logic, dispatching high-level commands and responding to high-level events, not carrying out the low-level math grind of trivial game logic.
Be wary of the advice that 'mixed code' is slow. The Lua C API is lightweight and fast. Wrapping functions for use with Lua is, at worst, quite easy (and if you take a while to understand how Lua interfaces with C w.r.t. the virtual stack, etc, you will notice that it has been designed specifically to minimize call overhead), and at best, trivial (no wrapping required) and 100% as performant as native calls (thanks, LuaJIT FFI!) In most cases I find that a careful mixture of Lua and C (via the LJ FFI) is superior to pure Lua. Even vector operations, which I believe the LuaJIT manuals mention somewhere as an example where code should be kept to Lua, I have found to be faster when I execute via Lua->C calls. Ultimately, don't trust anyone or anything but your own (careful) performance measurements when it comes to pieces of code that are performance-sensitive.
Most small-ish games could get away just fine with game state, the core loop, input handling, etc. done entirely in Lua and running under LuaJIT. If you're building a small-ish game, consider the "move to C++ as needed" approach (lazy compilation!) Write in Lua, when you identify a clear bottleneck (and have measured it to make sure it's the culprit), move that bit to C++ and be on your way. If you're writing a big game with 1000s of complex objects with complex logic, advanced rendering, etc. then you'll benefit from more up-front designing.
Best of luck :)

About the performance of 1: if main.lua does not change, load it once with lua_loadfile or loadfile, save a reference to the returned function, and then call it when needed.

Most of the performance will be lost through the binding between Lua and C++. A function call will actually need to be wrapped, and re-wrapped, and like that a couple of time usually. Pure Lua or pure C++ code is usually faster than mixed code (for small operations).
Having said that, I personally didn't see any strong performance hit running a Lua script every frame.
Usually scripting is good at high level. Lua has been used in famous games for the Bots (Quake 3) and for the User Interface (World of Warcraft). Used at high level Lua micro-threads come handy: The coroutines can save a lot (compared to real threads). For example to run some Lua code only once in a while.

How can I implement scripting in my game?

I'm trying to write a game and implement scripting so that later on in development I won't have to recompile everything when I want to change numbers.
My problem is that I don't know how scripts should interface with the game. The scripting language I'm using is angelscript.
Right now, I have a state: the intro state, which I'm using as a test for most of the modules in my game "engine" (it's more like a loose collection of classes). It will load and draw a picture and draw text, and use scripting to update itself, and maybe switch to a dummy state afterwards to test the state manager.
While writing it, I realized that using the script to do most of the updating would require that I register most of my game engine's modules with the script, and pretty much move the bulk of the code to the scripting language. Personally, I'd rather have the C++ portion doing the majority of the work, and have the scripting language come up with the numbers to use in the formulas/drawing/whatever.
However, if I'm right, doing it that way would entail lots of different update modules for the majority of the things in the game that need to be updated, and requiring that they all be loaded in, and that the C++ code would have to run each update function individually.
Or, there's a way to achieve script and program interoperability that I'm overlooking. Either way, could someone help me figure out what the best way to get scripting implemented into my game is?

There's no correct answer to such a large question really. You do it the same way you would do engine/game logic separation in C++. Define an API that the script can call that allows it whatever it is you want it to do. Register functions in that API with the script, and use the API in angelscript. What that API should be depends entirely on your needs and what kind of power you want to give the scripter.

If you want AngelCode (or any other scripting approach of your choice) to just "come up with some numbers", hey, use it that way -- e.g., in AngelCode, compile the scripts by exposing to them a single C++ function of yours, say "void ProvideNumberFor(string reason, number value)", and the scripts will be responsible for calling that function as many times as needed to "provide the numbers", and nothing more.

If you look at real examples like Garry's mod or games written with UnrealScript you'll find that quite a bit of logic in modern games is implemented in the scripts. C/C++ code is best for "static" and bottleneck-prone parts of the engine, like the renderer, physics engine, low-level networking, etc. Scripts are best for content (i.e., game logic).
Aside: The best game-scripting language IMHO is Lua. It provides easy intregration with C/C++, is very well-documented, and will be familiar to users of Javascript.
I've used Lua for exactly this purpose, and it does a great job. Look at all the games programmed using Lua. Also, it's blazingly fast.
EDIT: I didn't read the question fully...sorry. This is my real answer ;)

If you're familiar with Qt4/Javscript you can always use QtScript http://qt.nokia.com/doc/4.5/qtscript.html.

Will my iPhone app take a performance hit if I use Objective-C for low level code?

When programming a CPU intensive or GPU intensive application on the iPhone or other portable hardware, you have to make wise algorithmic decisions to make your code fast.
But even great algorithm choices can be slow if the language you're using performs more poorly than another.
Is there any hard data comparing Objective-C to C++, specifically on the iPhone but maybe just on the Mac desktop, for performance of various similar language aspects? I am very familiar with this article comparing C and Objective-C, but this is a larger question of comparing two object oriented languages to each other.
For example, is a C++ vtable lookup really faster than an Obj-C message? How much faster? Threading, polymorphism, sorting, etc. Before I go on a quest to build a project with duplicate object models and various test code, I want to know if anybody has already done this and what the results where. This type of testing and comparison is a project in and of itself and can take a considerable amount of time. Maybe this isn't one project, but two and only the outputs can be compared.
I'm looking for hard data, not evangelism. Like many of you I love and hate both languages for various reasons. Furthermore, if there is someone out there actively pursuing this same thing I'd be interesting in pitching in some code to see the end results, and I'm sure others would help out too. My guess is that they both have strengths and weaknesses, my goal is to find out precisely what they are so that they can be avoided/exploited in real-world scenarios.

Mike Ash has some hard numbers for performance of various Objective-C method calls versus C and C++ in his post "Performance Comparisons of Common Operations". Also, this post
by Savoy Software is an interesting read when it comes to tuning the performance of an iPhone application by using Objective-C++.
I tend to prefer the clean, descriptive syntax of Objective-C over Objective-C++, and have not found the language itself to be the source of my performance bottlenecks. I even tend to do things that I know sacrifice a little bit of performance if they make my code much more maintainable.

Yes, well written C++ is considerably faster. If you're writing performance critical programs and your C++ is not as fast as C (or within a few percent), something's wrong. If your ObjC implementation is as fast as C, then something's usually wrong -- i.e. the program is likely a bad example of ObjC OOD because it probably uses some 'dirty' tricks to step below the abstraction layer it is operating within, such as direct ivar accesses.
The Mike Ash 'comparison' is very misleading -- I would never recommend the approach to compare execution times of programs you have written, or recommend it to compare C vs C++ vs ObjC. The results presented are provided from a test with compiler optimizations disabled. A program compiled with optimizations disabled is rarely relevant when you are measuring execution times. To view it as a benchmark which compares C++ against Objective-C is flawed. The test also compares individual features, rather than entire, real world optimized implementations -- individual features are combined in very different ways with both languages. This is far from a realistic performance benchmark for optimized implementations. Examples: With optimizations enabled, IMP cache is as slow as virtual function calls. Static dispatch (as opposed to dynamic dispatch, e.g. using virtual) and calls to known C++ types (where dynamic dispatch may be bypassed) may be optimized aggressively. This process is called devirtualization, and when it is used, a member function which is declared virtual may even be inlined. In the case of the Mike Ash test where many calls are made to member functions which have been declared virtual and have empty bodies: these calls are optimized away entirely when the type is known because the compiler sees the implementation and is able to determine dynamic dispatch is unnecessary. The compiler can also eliminate calls to malloc in optimized builds (favoring stack storage). So, enabling compiler optimizations in any of C, C++, or Objective-C can produce dramatic differences in execution times.
That's not to say the presented results are entirely useless. You could get some useful information about external APIs if you want to determine if there are measurable differences between the times they spend in pthread_create or +[NSObject alloc] on one platform or architecture versus another. Of course, these two examples will be using optimized implementations in your test (unless you happen to be developing them). But for comparing one language to another in programs you compile… the presented results are useless with optimizations disabled.
Object Creation
Consider also object creation in ObjC - every object is allocated dynamically (e.g. on the heap). With C++, objects may be created on the stack (e.g. approximately as fast as creating a C struct and calling a simple function in many cases), on the heap, or as elements of abstract data types. Each time you allocate and free (e.g. via malloc/free), you may introduce a lock. When you create a C struct or C++ object on the stack, no lock is required (although interior members may use heap allocations) and it often costs just a few instructions or a few instructions plus a function call.
As well, ObjC objects are reference counted instances. The actual need for an object to be a std::shared_ptr in performance critical C++ is very rare. It's not necessary or desirable in C++ to make every instance a shared, reference counted instance. You have much more control over ownership and lifetime with C++.
Arrays and Collections
Arrays and many collections in C and C++ also use strongly typed containers and contiguous memory. Since the address of the next element's members are often known, the optimizer can do much more, and you have great cache and memory locality. With ObjC, that's far from reality for standard objects (e.g. NSObject).
Dispatch
Regarding methods, many C++ implementations use few virtual/dynamic calls, particularly in highly optimized programs. These are static method calls and fodder for the optimizers.
With ObjC methods, each method call (objc message send) is dynamic, and is consequently a firewall for the optimizer. Ultimately, that results in many restrictions or inconveniences regarding what you can and cannot do to keep performance at a minimum when writing performance critical ObjC. This may result in larger methods, IMP caching, frequent use of C.
Some realtime applications cannot use any ObjC messaging in their render paths. None -- audio rendering is a good example of this. ObjC dispatch is simply not designed for realtime purposes; Allocations and locks may happen behind the scenes when messaging objects, making the complexity/time of objc messaging unpredictable enough that the audio rendering may miss its deadline.
Other Features
C++ also provides generics/template implementations for many of its libraries. These optimize very well. They are typesafe, and a lot of inlining and optimizations may be made with templates (consider it polymorphism, optimization, and specialization which takes place at compilation). C++ adds several features which just are not available or comparable in strict ObjC. Trying to directly compare langs, objects, and libraries which are very different is not so useful -- it's a very small subset of actual realizations. It's better to expand the question to a library/framework or real program, considering many aspects of design and implementation.
Other Points
C and C++ symbols can be more easily removed and optimized away in various stages of the build (stripping, dead code elimination, inlining and early inlining, as well as Link Time Optimization). The benefits of this include reduced binary sizes, reduced launch/load times, reduced memory consumption, etc.. For a single app, that may not be such a big deal; but if you reuse a lot of code, and you should, then your shared libraries could add a lot of unnecessary weight to the program, if implemented ObjC -- unless you are prepared to jump through some flaming hoops. So scalability and reuse are also factors in medium/large projects, and groups where reuse is high.
Included Libraries
ObjC library implementors also optimize for the environment, so its library implementors can make use of some language and environment features to offer optimized implementations. Although there are some pretty significant restrictions when writing an optimized program in pure ObjC, some highly optimized implementations exist in Cocoa. This is one of Cocoa's strong points, although the C++ standard library (what some people call the STL) is no slouch either. Cocoa operates at a much higher level of abstraction than C++ -- if you don't know well what you're doing (or should be doing), operating closer to the metal can really cost you. Falling back on to a good library implementation if you are not an expert in some domain is a good thing, unless you are really prepared to learn. As well, Cocoa's environments are limited; you can find implementations/optimizations which make better use of the OS.
If you're writing optimized programs and have experience doing so in both C++ and ObjC, clean C++ implementations will often be twice as fast or faster than clean ObjC (yes, you can compare against Cocoa). If you know how to optimize, you can often do better than higher level, general purpose abstractions. Although, some optimized C++ implementations will be as fast as or slower than Cocoa's (e.g. my initial attempt at file I/O was slower than Cocoa's -- primarily because the C++ implementation initializes its memory).
A lot of it comes down to the language features you are familiar with. I use both langs, they both have different strengths and models/patterns. They complement each other quite well, and there are great libraries for both. If you're implementing a complex, performance critical program, correct use of C++'s features and libraries will give you much more control and provide significant advantages for optimization, such that in the right hands, "several times faster" is a good default expectation (don't expect to win every time, or without some work, however). Remember, it takes years to understand C++ well enough to really reach that point.
I keep the majority of my performance critical paths as C++, but also recognize that ObjC is also a very good solution for some problems, and that there are some very good libraries available.

It's very hard to collect "hard data" for this that's not misguiding.
The biggest problem with doing a feature-to-feature comparison like you suggest is that the two languages encourage very different coding styles. Objective-C is a dynamic language with duck typing, where typical C++ usage is static. The same object-oriented architecture problem would likely have very different ideal solutions using C++ or Objective-C.
My feeling (as I have programmed much in both languages, mostly on huge projects): To maximize Objective-C performance, it has to be written very close to C. Whereas with C++, it's possible to make much more use of the language without any performance penalty compared to C.
Which one is better? I don't know. For pure performance, C++ will always have the edge. But the OOP style of Objective-C definitely has its merits. I definitely think it is easier to keep a sane architecture with it.

This really isn't something that can be answered in general as it really depends on how you use the language features. Both languages will have things that they are fast at, things that they are slow at, and things that are sometimes fast and sometimes slow. It really depends on what you use and how you use it. The only way to be certain is to profile your code.
In Objective C you can also write c++ code, so it might be easier to code in Objective C for the most part, and if you find something that doesn't perform well in it, then you can have a go at writting a c++ version of it and seeing if that helps (C++ tends to optimize better at compile time). Objective C will be easier to use if APIs you are interfacing with are also written in it, plus you might find it's style of OOP is easier or more flexible.
In the end, you should go with what you know you can write safe, robust code in and if you find an area that needs special attention from the other language, then you can swap to that. X-Code does allow you to compile both in the same project.

I have a couple of tests I did on an iPhone 3G almost 2 years ago, there was no documentation or hard numbers around in those days. Not sure how valid they still are but the source code is posted and attached.
This isn't a very extensive test, I was mainly interested in NSArray vs C Array for iterating a large number of objects.
http://memo.tv/nsarray_vs_c_array_performance_comparison
http://memo.tv/nsarray_vs_c_array_performance_comparison_part_ii_makeobjectsperformselector
You can see the C Array is much faster at high iterations. Since then I've realized that the bottleneck is probably not the iteration of the NSArray but the sending of the message. I wanted to try methodForSelector and calling the methods directly to see how big the difference would be but never got round to it. According to Mike Ash's benchmarks it's just over 5x faster.

I don't have hard data for Objective C, but I do have a good place to look for C++.
C++ started as C with Classes according to Bjarne Stroustroup in his reflection on the early years of C++ (http://www2.research.att.com/~bs/hopl2.pdf), so C++ can be thought of (like Objective C) as pushing C to its limits for object orientation.
What are those limits? In the 1994-1997 time frame, a lot of researchers figured out that object-orientation came at a cost due to dynamic binding, e.g. when C++ functions are marked virtual and there may/may not be children classes that override these functions. (In Java and C#, all functions expect ctors are inherently virtual, and there isnt' much you can do about it.) In "A Study of Devirtualization Techniques for a Java Just-In-Time Compiler" from researchers at IBM Research Tokyo, they contrast the techniques used to deal with this, including one from Urz Hölzle and Gerald Aigner. Urz Hölzle, in a separate paper with Karel Driesen, had shown that on average 5.7% of time in C++ programs (and up to ~50%) was spent in calling virtual functions (e.g. vtables + thunks). He later worked with some Smalltalk researachers in what ended up the Java HotSpot VM to solve these problems in OO. Some of these features are being backported to C++ (e.g. 'protected' and Exception handling).
As I mentioned, C++ is static typed where Objective C is duck typed. The performance difference in execution (but not lines of code) probably is a result of this difference.

This study says to really get the performance in a CPU intensive game, you have to use C. The linked article is complete with a XCode project that you can run.
I believe the bottom line is: Use Objective-C where you must interact with the iPhone's functions (after all, putting trampolines everywhere can't be good for anyone), but when it comes to loops, things like vector object classes, or intensive array access, stick with C++ STL or C arrays to get good performance.
I mean it would be totally silly to see position = [[Vector3 alloc] init] ;. You're just asking for a performance hit if you use references counts on basic objects like a position vector.

yes. c++ reign supreme in performance/expresiveness/resource tradeoff.
"I'm looking for hard data, not evangelism". google is your best friend.
obj-c nsstring is swapped with c++'s by apple enginneers for performance. in a resource constrained devices, only c++ cuts it as a MAINSTREAM oop language.
NSString stringWithFormat is slow
obj-c oop abstraction is deconstructed into procedural-based c-structs for performance, otherwise a MAGNITUDE order slower than java! the author is also aware of message caching - yet no-go. so modeling lots of small players/enemies objects is done in oop with c++ or else, lots of Procedural structs with a simple OOP wrapper around it with obj-c. there can be one paradigm that equates Procedural + Object-Oriented Programming = obj-c.
http://ejourneyman.wordpress.com/2008/04/23/writing-a-ray-tracer-for-cocoa-objective-c/

How do you glue Lua to C++ code?

Do you use Luabind, toLua++, or some other library (if so, which one) or none at all?
For each approach, what are the pro's and con's?

I can't really agree with the 'roll your own' vote, binding basic types and static C functions to Lua is trivial, yes, but the picture changes the moment you start dealing with tables and metatables; things go trickier very quickly.
LuaBind seems to do the job, but I have a philosophical issue with it. For me it seems like if your types are already complicated the fact that Luabind is heavily templated is not going to make your code any easier to follow, as a friend of mine said "you'll need Herb Shutter to figure out the compilation messages". Plus it depends on Boost, plus compilation times get a serious hit, etc.
After trying a few bindings, Tolua++ seems the best. Tolua doesn't seem to be very much in development, where as Tolua++ seems to work fine (plus half the 'Tolua' tutorials out there are, in fact, 'Tolua++' tutorials, trust me on that:) Tolua does generate the right stuff, the source can be modified and it seems to deal with complicated cases (like templates, unions, nameless structs, etc, etc)
The biggest issue with Tolua++ seems to be the lack of proper tutorials, pre-set Visual Studio projects, or the fact that the command line is a bit tricky to follow (you path/files can't have white spaces -in Windows at least- and so on) Still, for me it is the winner.

To answer my own question in part:
Luabind: once you know how to bind methods and classes via this awkward template syntax, it's pretty straightforward and easy to add new bindings. However, luabind has a significant performance impact and shouldn't be used for realtime applications. About 5-20 times more overhead than calling C functions that manipulate the stack directly.

I don't use any library. I have used SWIG to expose a C library some time ago, but there was too much overhead, and I stop using it.
The pros are better performance and more control, but its takes more time to write.

Use raw Lua API for your bindings -- and keep them simple. Take inspiration in the API itself (AUX library) and libraries by Lua authors.
With some practice raw API is the best option -- maximum flexibility and minimum of unneeded overhead. You've got what you want and no more, the way you need it to be.
If you must bind large third-party libraries use automated generators like tolua, tolua++ (or even roll your own for the specific case). It would free you from manual work.
I would not recommend using Luabind. At the moment it's development stalled (however starting to come back to life), and if you would meet some corner case, you may be on your own. Also Luabind heavily uses template metaprogramming. This may (and may not) be unacceptable, depending on the point of view.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js