Languages faster than C++ [closed] - c++

It is said that Blitz++ provides near-Fortran performance.
Does Fortran actually tend to be faster than regular C++ for equivalent tasks?
What about other high-level languages with exceptional runtime performance? I've heard of a few languages surpassing C++ for certain tasks... Objective Caml, Java, D...
I guess GC can make a lot of code faster, because it removes the need for excessive copying around the stack? (assuming the code is not written with performance in mind)
I am asking out of curiosity -- I always assumed C++ is pretty much unbeatable barring expert ASM coding.

Fortran is faster and almost always better than C++ for purely numerical code. There are many reasons why Fortran is faster. It is the oldest compiled language (so a lot of knowledge has gone into optimizing compilers). It is still THE language for numerical computations, so many compiler vendors make a living selling optimized compilers. There are also other, more technical reasons. Fortran (well, at least Fortran 77) does not have pointers and thus does not have the aliasing problems which plague the C/C++ languages in that domain. Many high-performance libraries are still coded in Fortran, with a long (> 30 years) history. Neither C nor C++ has any good array constructs (C is too low-level; C++ has as many array libraries as compilers on the planet, all incompatible with each other, thus preventing a pool of well-tested, fast code).
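To make the aliasing point concrete, here is a minimal C++ sketch (illustrative only; __restrict is a common compiler extension in GCC, Clang and MSVC rather than standard C++, and the function names are made up):

    #include <cstddef>

    // Without a no-aliasing promise the compiler must assume dst and src
    // could overlap, which limits vectorization and reordering.
    void scale_may_alias(double* dst, const double* src, double k, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = k * src[i];
    }

    // With __restrict the programmer promises no overlap, recovering the
    // guarantee Fortran gives its compilers for free on dummy array arguments.
    void scale_no_alias(double* __restrict dst, const double* __restrict src,
                        double k, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = k * src[i];
    }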

Whether Fortran is faster than C++ is a matter of debate. Some say yes, some say no; I won't go into that. It depends on the compiler, the architecture you're running on, the implementation of the algorithm, etc.
Where Fortran does have a big advantage over C is the time it takes you to implement those algorithms. And that makes it extremely well suited for any kind of numerical computing. I'll state just a few obvious advantages over C:
1-based array indexing (tremendously helpful when implementing larger models, and you don't have to think about it, you just FORmula TRANslate)
it has a power operator (**) (God, whose idea was it that a power function would do instead of an operator?!)
it has, I'd say, the best support for multidimensional arrays of all the languages currently on the market (and it doesn't seem that's going to change any time soon) - A(1,2) just like in math
not to mention avoiding the loops - A=B*C multiplies the arrays elementwise (almost Matlab-like syntax with compiled speed; the closest standard C++ analogue is sketched below)
it has parallelism features built into the language (check the new standard on this one)
it is very easily connectible with languages like C and Python, so you can do your heavy-duty calculations in Fortran while doing ... whatever ... in the language of your choice, if you feel so inclined
it is completely backward compatible (since the whole of F77 is a subset of F90), so you have decades of code at your disposal
it is very, very portable (this might not work for some compiler extensions, but in general it works like a charm)
it has a problem-solving-oriented community (Fortran users are usually not computer scientists but mathematicians, physicists, engineers ... people with little formal programming background but lots of problem-solving experience, whose knowledge about your problem can be very helpful)
Can't think of anything else off the top of my head right now, so this will have to do.
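For the A=B*C point above, a minimal sketch of the closest standard C++ analogue, std::valarray, which applies arithmetic operators element by element (illustrative only):

    #include <iostream>
    #include <valarray>

    int main() {
        std::valarray<double> B = {1.0, 2.0, 3.0, 4.0};
        std::valarray<double> C = {4.0, 3.0, 2.0, 1.0};

        std::valarray<double> A = B * C;            // elementwise product, no explicit loop

        for (double x : A) std::cout << x << ' ';   // prints: 4 6 6 4
        std::cout << '\n';
    }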

What Blitz++ is competing against is not so much the Fortran language as the man-centuries of work that have gone into Fortran math libraries. To some extent the language helps: an older language has had a lot more time to get optimizing compilers (and, let's face it, C++ is one of the most complex languages). On the other hand, high-level C++ libraries like Blitz++ and uBLAS allow you to state your intentions more clearly than relatively low-level Fortran code, and allow for whole new classes of compile-time optimizations.
However, using any library effectively all the time requires developers to be well acquainted with the language, the library, and the mathematics. You can usually get faster code by improving any one of the three...
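For illustration, a minimal hand-rolled sketch of the expression-template technique that libraries like Blitz++ and uBLAS build on (the names Vec, Add and Expr are made up): a + b builds a lightweight expression object instead of a temporary vector, and assigning to a Vec fuses the whole expression into one loop.

    #include <cstddef>
    #include <type_traits>
    #include <vector>

    struct Expr {};  // tag type so the operator+ below only matches expressions

    template <class L, class R>
    struct Add : Expr {              // represents l + r without evaluating it
        const L& l; const R& r;
        Add(const L& lhs, const R& rhs) : l(lhs), r(rhs) {}
        double operator[](std::size_t i) const { return l[i] + r[i]; }
        std::size_t size() const { return l.size(); }
    };

    struct Vec : Expr {
        std::vector<double> data;
        explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
        double operator[](std::size_t i) const { return data[i]; }
        std::size_t size() const { return data.size(); }

        template <class E>           // assigning any expression runs ONE fused loop
        Vec& operator=(const E& e) {
            for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
            return *this;
        }
    };

    template <class L, class R,
              class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                              std::is_base_of<Expr, R>::value>::type>
    Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

    int main() {
        Vec a(1000, 1.0), b(1000, 2.0), c(1000, 3.0), out(1000);
        out = a + b + c;   // no temporary vectors: one pass over 1000 elements
    }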

FORTRAN is typically faster than C++ for array processing because of the different ways the languages treat arrays - FORTRAN assumes that array arguments do not alias each other, whereas C++ has to assume that pointers and references might. This makes the FORTRAN compiler's job easier. Also, FORTRAN has many very mature mathematical libraries which have been worked on for nearly 50 years - C++ has not been around that long!

This will depend a lot on the compiler and the programmers, on whether the language has GC, and it can vary too much to generalize. If it is compiled directly to machine code, then expect better performance than an interpreted language most of the time, but there is only a finite amount of optimization possible before you reach hand-written assembly speed anyway.
If someone said Fortran was slightly faster, would you start a new project in it anyway?

The thing with C++ is that it is very close to the hardware. In fact, you can program at the hardware level (via assembly blocks). In general, C++ compilers do a pretty good job at optimisation (for a huge speed boost, enable "Link Time Code Generation" to allow inlining of functions between different .cpp files), but if you know the hardware and have the know-how, you can write a few functions in assembly that work even faster (though sometimes you just can't beat the compiler).
You can also implement your own memory managers (which is something a lot of other high-level languages don't allow), so you can customize them for your specific task (maybe most allocations will be 32 bytes or less; then you can just keep a big list of 32-byte buffers that you can allocate/deallocate in O(1) time). I believe that C++ CAN beat any other language, as long as you fully understand the compiler and the hardware that you are using. The majority of it comes down to what algorithms you use more than anything else.
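A minimal sketch of that fixed-size-buffer idea, for illustration only (the class name FixedPool is made up; no thread safety, no growth, and a production version would use placement new and stricter alignment handling):

    #include <cstddef>
    #include <vector>

    class FixedPool {
        static const std::size_t kBlockSize = 32;
        struct Node { Node* next; };

        std::vector<unsigned char> storage_;
        Node* free_list_;

    public:
        explicit FixedPool(std::size_t block_count)
            : storage_(block_count * kBlockSize), free_list_(nullptr) {
            // Thread every 32-byte block onto a singly linked free list.
            for (std::size_t i = 0; i < block_count; ++i) {
                Node* n = reinterpret_cast<Node*>(&storage_[i * kBlockSize]);
                n->next = free_list_;
                free_list_ = n;
            }
        }

        void* allocate() {                    // O(1): pop the head of the free list
            if (!free_list_) return nullptr;  // pool exhausted
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }

        void deallocate(void* p) {            // O(1): push the block back
            Node* n = static_cast<Node*>(p);
            n->next = free_list_;
            free_list_ = n;
        }
    };

    int main() {
        FixedPool pool(1024);
        void* a = pool.allocate();
        void* b = pool.allocate();
        pool.deallocate(a);
        pool.deallocate(b);
    }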

You must be using some odd managed XML parser as you load this page then. :)
We continuously profile code and the gain is consistent (and this is not naive C++, it is just modern C++ with Boost). It consistently beats any CLR implementation by at least 2x and often by 5x or more. A bit better than the Java days, when it was around 20x faster, but you can still find good instances, simply eliminate all the System.Object bloat, and clearly beat it to a pulp.
One thing managed devs don't get is that the hardware architecture is against any scaling of VM and object-root approaches. You have to see it to believe it: hang on, fire up a browser and go to a 'thin' VM like Silverlight. You'll be shocked at how slow and CPU-hungry it is.
Two, kick off a database app that needs any performance - yes, managed vs native DB.

It's usually the algorithm not the language that determines the performance ballpark that you will end up in.
Within that ballpark, optimising compilers can usually produce better code than most assembly coders.
Premature optimisation is the root of all evil
This may be the "common knowledge" that everyone can parrot, but I submit that's probably because it's correct. I await concrete evidence to the contrary.

D can sometimes be faster than C++ in practical applications, largely because the presence of garbage collection helps avoid the overhead of RAII and of reference counting when using smart pointers. For programs that allocate large numbers of small objects with non-trivial lifetimes, garbage collection can be faster than C++-style memory management. Also, D's built-in arrays allow the compiler to perform better optimizations in some cases than C++'s STL vector, which the compiler doesn't understand. Furthermore, D2 supports immutable data and pure function annotations, which recent versions of DMD2 use for optimization. Walter Bright, D's creator, wrote a JavaScript interpreter in both D and C++, and according to him, the D version is faster.

C# is much faster than C++ - in C# I can write an XML parser and data processor in a tenth of the time it takes me to write it in C++.
Oh, did you mean execution speed?
Even then, if you take the time from the first line of code written to the end of the first execution of the code, C# is still probably faster than C++.
This is a very interesting article about converting a C++ program to C# and the effort required to make the C++ faster than the C#.
So, if you take development speed into account, almost anything beats C++.
OK, to address the OP's runtime-only performance requirement: it's not the language, it's the implementation of the language that determines the runtime performance. I could write a C++ compiler that produces the slowest code imaginable, but it's still C++. It is also theoretically possible to write a compiler for Java that targets IA32 instructions rather than the Java VM byte codes, giving a runtime speed boost.
The performance of your code will depend on the fit between the strengths of the language and the requirements of the code. For example, a program that does lots of memory allocation/deallocation will perform badly as a naive C++ program (i.e. one using the default memory allocator), since the C++ memory allocation strategy is too generalized, whereas C#'s GC-based allocator can perform better (as the above link shows). String manipulation is slow in C++ but quick in languages like PHP, Perl, etc.

It all depends on the compiler. Take for example the Stalin Scheme compiler: it beats almost all languages in the Debian micro-benchmark suite, but do they mention anything about compile times?
No. I suspect (I have not used Stalin before) that compiling for benchmarks (in other words, with all optimizations at maximum effort) takes a jolly long time for anything but the smallest pieces of code.

If the code is not written for performance then C# is faster than C++.
A necessary disclaimer: all benchmarks are evil.
Here are benchmarks in favour of C++.
The above two links show that we can find cases where C++ is faster than C# and vice versa.

Performance of a compiled language is a useless concept: what's important is the quality of the compiler, i.e. what optimizations it is able to apply. For example, often - but not always - the Intel C++ compiler produces better-performing code than g++. So how do you measure the performance of C++?
Where language semantics come in is how easy it is for the programmer to get the compiler to create optimal output. For example, it's often easier to parallelize Fortran code than C code, which is why Fortran is still heavily used for high-performance computation (e.g. climate simulations).
As the question and some of the answers mention assembler: the same is true there; it's just another compiled language and thus not inherently 'faster'. The difference between assembler and other languages is that the programmer - who ideally has absolute knowledge about the program - is responsible for all of the optimizations instead of delegating some of them to the 'dumb' compiler.
E.g. function calls in assembler may use registers to pass arguments and don't need to create unnecessary stack frames, but a good compiler can do this as well (think inlining or fastcall). The downside of using assembler is that better-performing algorithms are harder to implement (think linear search vs. binary search, hash-table lookup, ...).

Doing much better than C++ is mostly going to be about making the compiler understand what the programmer means. An example of this might be a case where a compiler of any language infers that a region of code is independent of its inputs and just computes the result value at compile time.
Another example of this is how C# produces some very high-performance code simply because the compiler knows what particular incantations 'mean' and can cleverly use the implementation that performs best, whereas a transliteration of the same program into C++ results in needless alloc/delete cycles (hidden by templates) because the compiler is handling the general case instead of the particular case this piece of code presents.
A final example might be the Brook/CUDA adaptations of C designed for exotic hardware that isn't so exotic anymore. The language supports the exact primitives (kernel functions) that map to the non-von Neumann hardware being compiled for.

Is that why you are using a managed browser? Because it is faster. Or a managed OS, because it is faster. Nah, hang on, it is the SQL database... Wait, it must be the game you are playing. Stop, there must be some piece of numerical code that Java and C# are frankly useless with. By the way, you should check what your VM is written in before you slag off the root language and say it is slow.
What a misconception. But hey, show me a fast managed app so we can all have a laugh. VS? OpenOffice?

Ahh... The good old question - which compiler makes faster code?
1. It only matters in code that actually spends much time at the bottom of the call stack, i.e. hot spots that don't contain function calls, such as matrix inversion, etc.
2. (Implied by 1) It only matters in code the compiler actually sees. If your program counter spends all its time in 3rd-party libraries you don't build, it doesn't matter.
3. In code where it does matter, it all comes down to which compiler makes better ASM, and that's largely a function of how smartly or stupidly the source code is written.
With all these variables, it's hard to distinguish between good compilers.
However, as was said, if you've got a lot of Fortran code to compile, don't re-write it.

Related

What are examples of efficient, type-inferred languages appropriate for working with multi-dimensional arrays

I don't care much about garbage collection, if present it should be optional. The language D fits the bill but I am exploring other options. Surprising to me, this seems to be a sparsely populated spot among languages. I want something with which I can run things at 80% of the speed of C, if possible.
I also would like the language to have good support for multicores. Not necessarily via threads, but anything that does not involve lots of copying. For example, GNU's parallel mode for libstdc++ is a reasonably good abstraction for me, but a little weak at giving pre-baked array primitives (this is not a complaint, it is not its job to give array primitives).
I suspect what I am driving at is an OCaml-like language with:
good support for multidimensional arrays,
no(or optional) garbage collection,
parallel programming primitives for array intensive code,
convenient C FFI,
and with a reasonable chance of running at 80% of C speed.
I am not sure what tags to use so suggestions are welcome. I also want to make this a wiki, but not sure how to do it. I have heard of Felix but do not know if that would be appropriate here.
You're pretty much describing D, especially if you have a long time horizon and are willing to wait for some stuff that's in the works to fully pan out.
Good support for multidimensional arrays: I'm mentoring a Google Summer of Code project focused on just that. It's not polished and ready for prime time yet, but it's already usable if you live on the bleeding edge. It also includes bindings to BLAS and LAPACK and expression templates and evaluators to provide a nice wrapper over these libraries.
D's default approach to memory management is garbage collection, but enough low-level facilities exist that you don't have to use it where performance or space efficiency is a high priority. You have full access to the C standard library and can use C's malloc and free directly. The ability to work with untyped blocks of memory also allows custom allocators to be implemented. Both I and the GSoC student I'm mentoring have been using a region allocator, which will be reviewed soon for inclusion in Phobos (the D standard library).
Parallel programming: See the std.parallelism module of the standard library. It's not geared specifically to array intensive code but it's definitely usable for that purpose.
D fully supports the C ABI. All you have to do is translate the header file and link in a C object file. There's even a tool called htod that can automate the simple cases.
D is currently somewhat slower than C because it hasn't had years of work put into optimization, as the focus has been on features and bug fixes. However, it's not far behind and almost all of the performance difference can be explained by a somewhat naive garbage collector and lack of aggressive inlining. If you avoid using the garbage collector in the most performance-critical parts of your code and manually inline a few functions here and there, it should be as fast as C.
Although I'm a huge fan of both OCaml and D, with regard to existing libraries I have found C++ to be the out-and-out winner here. Expression-template work cut its teeth on C++'s manipulation of multidimensional arrays and has been used to create extremely efficient libraries that vastly outperform bare C. Temporaries may be removed from calculations in ways you just cannot achieve in C code that is generic enough to compose operations. Loops can be unrolled automatically at translation time. And you can use auto-parallelisation features out of the box that further drive your use of multicore boxes to new heights. With C++11's type inference features, you have all of your requested behavior in a mature implementation that will outperform nearly any other language out there.
Now, I'm not saying you can't do the same in D. You just won't have the maturity of libraries like uBLAS, Eigen, or Blitz++. And you won't have the compiler support that you find with Intel's Cilk Plus and other Parallel Building Blocks. Obviously, down the road I have no doubt that this kind of support will be available from the community, but C++ is the only language I have used that offers it today.
I am saying that you can't get that with OCaml, at least with the standard compiler and library, simply because of the lack of true symmetric multiprocessing. Something like JoCaml, OC4MC, etc. may provide the necessary parallelisation, but you still lack deep temporary-reduction support in commonly available matrix libraries.
This is just my experience. It is much harder yet to get these kinds of optimisations from C#, F#, and that family, due to the lack of deterministic destruction of temporaries - which makes expression-template techniques much more unwieldy. The lack of compile-time type inference of templates makes many techniques unreachable. Of all the languages I have worked with over the years, building tensor and spinor libraries, graph libraries, abstract algebras (elements represented in matrices), etc., C++ still holds the best support for efficiency, now in a much better type-deduced environment.
I would suggest F#.
Familiar syntax of Ocaml
2D array support in std library for F#
TPL, PLINQ and Async support for parallel programming
CLR based, hence garbage collection (Runs on Microsoft .NET as well as open source Mono)
P/Invoke for calling native C libraries
Good performance
Jay's FIsh provides what you want, except for the C FFI. FIsh is polyadic, not merely polymorphic. It infers the dimensions of arrays. You can write code and test on 1 or 2 dimensions but calculate with 5 or 6. FIsh is a partial evaluator. With the Fortran backend it outperforms C code easily. FIsh code is evaluated entirely on the stack (no heap, let alone garbage collector).
FYI: Felix does not have type inference. It is otherwise suitable, and easily has the best binding to C of any language other than C++. It's intended to provide low-level concurrency (shared-memory multi-threading). Felix can usually outperform C by using high-level optimisations no C compiler could dream of. The author of Felix will provide any modifications necessary to make your code easy to read and write and run faster than C (I'm the author :) Just give me some use cases!

Compilation and execution time of C++ vs C source code

I am not sure if this is a valid comparison or a valid statement, but over the years I have heard people claim that programs written in C++ generally take longer to compile than the same programs written in C, and that applications coded in C++ are generally slower at run time than ones written in C.
Is there any truth in these statements?
Apart from reaping the benefits of the OOP flexibility that C++ provides, should the above comparison be given consideration purely from a compilation/execution-time perspective?
I hope that this doesn't get closed as too generic or vague; it is just an attempt to learn the actual facts behind statements I have been hearing over the years from many programmers (predominantly C programmers).
I'll answer one specific part of the question that's pretty objective. C++ code that uses templates is going to be slower to compile than C code. If you don't use templates (which you probably will if you use the standard library) it should be very similar compilation time.
EDIT:
In terms of runtime it's much more subjective. Even though C may be a somewhat lower level language, C++ optimizers are getting really good, and C++ lends itself to more naturally representing real world concepts. If it's easier to represent your requirements in code (as I'd argue in C++) it's often easier to write better (and more performant) code than you would in another language. I don't think there's any objective data showing C or C++ being faster in all possible cases. I would actually suggest picking your language based on project needs and then write it in that language. If things are too slow, profile and proceed with normal performance improvement techniques.
The relative runtime speed is a bit hard to predict. At one time, when most people thought of C++ as being all about inheritance and used virtual functions a lot (even when they weren't particularly appropriate), code written in C++ was typically a little bit slower than equivalent C.
With (what most of us would consider) modern C++, the reverse tends to be true: templates give enough more compile-time flexibility that you can frequently produce code that's noticeably faster than any reasonable equivalent in C. In theory you could always avoid that by writing specialized code equivalent to the result of "expanding" a template -- but in reality, doing so is exceedingly rare, and quite expensive.
There is something of a tendency for C++ to be written somewhat more generally as well -- just for example, reading data into std::string or std::vector (or std::vector<std::string>) so the user can enter an arbitrary amount of data without buffer overflow or the data simply be truncated at some point. In C it's a lot more common to see somebody just code up a fixed-size buffer, and if you enter more than that, it either overflows or truncates. Obviously enough, you pay something for that -- the C++ code typically ends up using dynamic allocation (new), which is typically slower than just defining an array. OTOH, if you write C to accomplish the same thing, you end up writing a lot of extra code, and it typically runs about the same speed as the C++ version.
In other words, it's pretty easy to write C that's noticeably faster for things like benchmarks and single-use utilities, but the speed advantage evaporates in real code that has to be robust. In the latter case, about the best you can usually hope for is that the C code is equivalent to a C++ version, and in all honesty doing even that well is fairly unusual (at least IME).
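A minimal sketch of the fixed-buffer vs growable-container contrast described above, purely for illustration:

    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>

    // C style: a fixed-size buffer. Reads at most 255 characters of the line;
    // a longer line is simply cut short.
    void read_line_c() {
        char buf[256];
        if (std::fgets(buf, sizeof buf, stdin)) {
            // ... use buf ...
        }
    }

    // C++ style: the string and vector grow as needed, so lines of any length
    // are stored without truncation, at the cost of dynamic allocation.
    std::vector<std::string> read_lines_cpp() {
        std::vector<std::string> lines;
        std::string line;
        while (std::getline(std::cin, line))
            lines.push_back(line);
        return lines;
    }

    int main() {
        read_line_c();                                    // first line, C style
        std::vector<std::string> rest = read_lines_cpp(); // the rest, C++ style
        std::cout << "stored " << rest.size() << " more lines\n";
    }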
Comparing compilation speed is no easier. On one hand, it's absolutely true that templates can be slow -- at least with most compilers, instantiating templates is quite expensive. On a line-for-line basis, there's no question that C will almost always be faster than anything in C++ that uses templates much. The problem with that is that a line-for-line comparison rarely makes much sense -- 10 lines of C++ may easily be equivalent to hundreds or even thousands of lines of C. As long as you look only at compile time (not development time), the balance probably favors C anyway, but certainly not by nearly as dramatic a margin as might initially seem to be the case. This also depends heavily on the compiler: just for example, clang does a lot better than gcc in this respect (and gcc has improved a lot in the last few years too).
The run time of C++ versus C would suffer only if you use certain C++-specific features. Exceptions and virtual function calls add to run time compared to returning error code and direct calls. On the other hand, if you find yourself using function pointers in C (as, say, GTK does) you are already paying at least some of the price for virtual functions. And checking error code after each function return will consume time too - you don't do it when you use exceptions.
On the other hand, inlining and templates in C++ may allow you to do a lot of work compile-time - work that C defers to run time. In some cases, C++ may end up faster than C.
If you compile the same code as C and C++, there should be no difference.
If you let the compiler do the job for you, like expanding templates, that will take some time. If you do the same in C, with cut-and-paste or some intricate macros, it will take up your time instead.
In some cases, inline expansion of templates will actually result in code that is more specialized and runs faster than the equivalent C code. Like here:
http://www2.research.att.com/~bs/new_learning.pdf
Or this report showing that many C++ features have no runtime cost:
http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf
Although it's an old question, I'd like to add my five cents here, as I'm probably not the only one who finds this question via a search engine.
I can't comment on compilation speed, but on execution speed:
To the best of my knowledge, there is only one feature in C++ that costs performance even if you don't use it. That feature is C++ exceptions, because they prevent a few compiler optimizations (which is the reason noexcept was introduced in C++11). However, if you use some kind of error-checking mechanism, then exceptions are probably more efficient than the combination of return-value checking and a lot of if-else statements. This is especially true if you have to escalate an error up the stack.
Anyway, if you turn off exceptions during compilation, C++ introduces no overhead except in places where you deliberately use the associated features (e.g. you don't have to pay for polymorphism if you don't use virtual functions), whereas most features introduce no runtime overhead at all (overloading, templates, namespaces and so on).
On the other hand, most forms of generic code will be much faster in C++ than the equivalent in C, because C++ provides the built-in mechanisms (templates and classes) to do this. A typical example is C's qsort vs C++'s std::sort. The C++ version is usually much faster, because inside std::sort the comparator is known at compile time, which at least saves an indirect call through a function pointer and in the best case allows a lot of additional compiler optimizations.
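For illustration, a minimal sketch of that qsort vs std::sort difference (the timing itself is left out; the point is how the comparison is dispatched):

    #include <algorithm>
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // C-style comparator: qsort calls this through a function pointer
    // for every single comparison.
    static int cmp_double(const void* a, const void* b) {
        double x = *static_cast<const double*>(a);
        double y = *static_cast<const double*>(b);
        return (x > y) - (x < y);
    }

    int main() {
        std::vector<double> v1(100000), v2;
        for (std::size_t i = 0; i < v1.size(); ++i)
            v1[i] = static_cast<double>(std::rand()) / RAND_MAX;
        v2 = v1;

        // C style: every comparison is an indirect call through cmp_double.
        std::qsort(v1.data(), v1.size(), sizeof(double), cmp_double);

        // C++ style: the comparator (operator< here) is known at compile time
        // and is typically inlined into the sorting loop.
        std::sort(v2.begin(), v2.end());
    }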
That being said, the "problem" with C++ is that it's easy to hide complexity from the user, such that seemingly innocent code might be much slower than expected. This is mainly due to operator overloading, polymorphism and constructors/destructors, but even a simple call to a member function hides the passed this-pointer, which also isn't a no-op.
Considering operator overloading: when you see a * in C, you know this is (on most architectures) a single, cheap assembler instruction; in C++, on the other hand, it can be a complex function call (think about matrix multiplication). That doesn't mean you could implement the same functionality any faster in C, but in C++ you don't directly see that this might be an expensive operation.
Destructors are a similar case: in "modern" C++ you will hardly see any explicit destructions via delete, but any local variable that goes out of scope might potentially trigger an expensive call to a (virtual) destructor without a single line of code indicating this (ignoring } of course).
And finally, some people (especially those coming from Java) tend to write complex class hierarchies with lots of virtual functions, where each call to such a function is a hidden indirect function call which is difficult or impossible to optimize.
So while hiding complexity from the programmer is in general a good thing, it sometimes has an adverse effect on runtime if the programmer is not aware of the costs of these "easy to use" constructs.
As a summary I would say that C++ makes it easier for inexperienced programmers to write slow code (because they don't directly see the inefficiencies in a program). But C++ also allows good programmers to write "good", correct and fast code faster than they would in C - which gives them more time to think about optimizations when they really are necessary.
P.S.:
Two things I haven't mentioned (probably among others that I simply forgot) are C++'s ability to do complex compile-time computations (thanks to templates and constexpr) and C's restrict keyword. This is because I haven't used either of them in time-critical programs yet, so I can't comment on their general usefulness or their real-world performance benefit.
C applications are faster to compile and execute than C++ applications.
C++ applications are generally slower at runtime, and are much slower to compile than C programs.
Look at these links:
C vs C++ (a)
C vs C++ (b)
Efficient C/C++
C++ Compile Time and RunTime Performance

Is it worth writing part of code in C instead of C++ as micro-optimization?

I am wondering whether, with modern compilers and their optimizations, it is still worth writing some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance when classes are copied where they could have been passed by reference, or when objects are created automatically by the compiler (typically with overloaded operators), and in many other similar cases; but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is supported, intentionally (with divergence only in C99), in C++. Therefore all C++ compilers have to support it. In fact I think it's hard to find any dedicated C compilers anymore. For example, in GCC you'll actually end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question is then: does writing plain C code and compiling it as C++ suffer a performance penalty? The answer is, for all intents and purposes, no. There are a few tricky points about exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard pressed to find an example that actually takes a performance hit that it doesn't seem worth it to write a dedicated module.
What was said about what features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost -- in C you can also suffer this cost, but not as easily I'd say.
Virtual function calls are ever so slightly more expensive than normal function calls. At the same time, forced-inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters on the stack that dominates. Worrying about function-call overhead should come quite late in the optimization process, though - it is rarely a significant problem.
Exceptions are costly at throw time (in GCC at least). But setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others) so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
measure! measure before thinking about optimizing, measure before applying optimization, measure after applying optimization, measure!
If you must run your code 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times in the next 1000 days and that saved time is very important), anything goes.
Yes! It is worth ...
changing languages (C++ to C; Python to COBOL; Matlab to Fortran; PHP to Lisp)
tweaking the compiler (enabling/disabling all the -f options)
using different libraries (or even writing your own)
etc., etc.
What you must not forget is to measure!
pmg nailed it. Just measure instead of making global assumptions. Also think of it this way: compilers like GCC separate the front, middle and back ends, so the front ends for Fortran, C, C++, Ada, etc. all feed into the same internal middle language, if you will, and that is where most of the optimization happens. Then that generic middle language is turned into assembler for the specific target, and target-specific optimizations occur there. So the front end may or may not produce more middle-end code when the languages differ greatly, but for C vs C++ I would assume it is the same or very similar. Binary size is another story: the libraries that get pulled into the binary for C-only code vs C++ (even if it is only C syntax) can and will vary. That doesn't necessarily affect execution performance, but it can bulk up the program file, costing storage and transfer as well as memory if the program is loaded as a whole into RAM. Here again, just measure.
I would also add to the "measure" comment: compile to assembler and/or disassemble the output and compare the results of your different language/compiler choices. This can supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic question, assuming you have measured, etc, and you have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job); and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a certain mindset that is common: trying to do everything with one tool (Java or SQL or C++). It is not just Maslow's hammer, but the actual belief that you can code a C construct in Java, etc. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL and C that will deliver performance. That produces an app that does not need to be revisited; uneventful execution. In which case, it will not matter if or when C++ implements this construct or that.
I am wondering whether, with modern compilers and their optimizations, it is still worth writing some critical code in C instead of C++ to make it faster.
No. Keep it readable. If your team prefers C++ or C, prefer that - especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance when classes are copied where they could have been passed by reference
Then forbid copying and assigning.
or when objects are created automatically by the compiler, typically with overloaded operators and many other similar cases
Could you elaborate? If you are referring to templates, they don't have additional runtime cost (although they can lead to additional exported symbols, resulting in a larger binary). In fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
In my experience, an expert C++ developer can create a faster, more maintainable program.
You have to be selective about the language features that you use (and do not use). If you break C++ features down to the set available in C (e.g., remove exceptions, virtual function calls, RTTI) then you're off to a good start. If you learn to use templates, metaprogramming, optimization techniques, and to avoid type aliasing (which becomes increasingly difficult or verbose in C), etc., then you should be on par with or faster than C - with a program which is more easily maintained (since you are familiar with C++).
If you're comfortable using the features of C++, use C++. It has plenty of features (many of which have been added with speed/cost in mind), and it can be written to be as fast as C (or faster).
With templates and metaprogramming, you could turn many runtime variables into compile-time constants for exceptional gains. Sometimes that goes well into micro-optimization territory.
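A minimal sketch of that runtime-variable-to-compile-time-constant idea, for illustration (the function names are made up):

    #include <cstddef>

    // The array length N is a template parameter, so the trip count is a
    // compile-time constant and the compiler can fully unroll the loop.
    template <std::size_t N>
    double dot(const double (&a)[N], const double (&b)[N]) {
        double s = 0.0;
        for (std::size_t i = 0; i < N; ++i)
            s += a[i] * b[i];
        return s;
    }

    // constexpr: the whole computation can be folded away at compile time.
    constexpr std::size_t factorial(std::size_t n) {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    int main() {
        double a[4] = {1, 2, 3, 4};
        double b[4] = {4, 3, 2, 1};
        double d = dot(a, b);   // N = 4 is deduced; no runtime size checks

        static_assert(factorial(5) == 120, "evaluated entirely at compile time");
        return d > 0 ? 0 : 1;
    }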

F# performance in scientific computing

I am curious as to how F# performance compares to C++ performance? I asked a similar question with regard to Java, and the impression I got was that Java is not suitable for heavy number crunching.
I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++? Specific questions about the current implementation are:
How well does it do floating-point?
Does it allow vector instructions?
How friendly is it towards optimizing compilers?
How big a memory footprint does it have? Does it allow fine-grained control over memory locality?
Does it have capacity for distributed memory processors, for example Cray?
What features does it have that may be of interest to computational science where heavy number processing is involved?
Are there actual scientific computing implementations that use it?
Thanks
I am curious as to how F# performance compares to C++ performance?
Varies wildly depending upon the application. If you are making extensive use of sophisticated data structures in a multi-threaded program then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays then C++ might be 2-3× faster.
Case study: Ray tracer My benchmark here uses a tree for hierarchical culling and numerical ray-sphere intersection code to generate an output image. This benchmark is several years old and the C++ code has been improved upon dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest C++ code when compiled with MSVC and parallelized using OpenMP.
I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++?
Developing code is much easier and faster with F# than C++, and this applies to optimization as well as maintenance. Consequently, when you start optimizing a program the same amount of effort will yield much larger performance gains if you use F# instead of C++. However, F# is a higher-level language and, consequently, places a lower ceiling on performance. So if you have infinite time to spend optimizing you should, in theory, always be able to produce faster code in C++.
This is exactly the same benefit that C++ had over Fortran and Fortran had over hand-written assembler, of course.
Case study: QR decomposition This is a basic numerical method from linear algebra provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10x faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware whilst keeping my code under 150 lines of code and fully generic (it can handle single and double precision, and complex and even symbolic matrices!): for tall thin matrices my F# code is up to 3× faster than the Intel MKL.
Note that the moral of this case study is not that you should expect your F# to be faster than vendor-tuned libraries but, rather, that even experts like Intel's will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization experts failed to exploit parallelism fully because their tools make it extremely cumbersome whereas F# makes it effortless.
How well does it do floating-point?
Performance is similar to ANSI C but some functionality (e.g. rounding modes) is not available from .NET.
Does it allow vector instructions
No.
how friendly is it towards optimizing compilers?
This question does not make sense: F# is a proprietary .NET language from Microsoft with a single compiler.
How big a memory footprint does it have?
An empty application uses 1.3Mb here.
Does it allow fine-grained control over memory locality?
Better than most memory-safe languages but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as "structs".
does it have capacity for distributed memory processors, for example Cray?
Depends what you mean by "capacity for". If you can run .NET on that Cray then you could use message passing in F# (just like the next language) but F# is intended primarily for desktop multicore x86 machines.
what features does it have that may be of interest to computational science where heavy number processing is involved?
Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on-the-fly via the F# interactive session in Visual Studio 2010 is extremely useful for interactive technical computing.
Are there actual scientific computing implementations that use it?
Our commercial products for scientific computing in F# already have hundreds of users.
However, your line of questioning indicates that you think of scientific computing as high-performance computing (e.g. Cray) and not interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.
In addition to what others said, there is one important point about F# and that's parallelism. The performance of ordinary F# code is determined by CLR, although you may be able to use LAPACK from F# or you may be able to make native calls using C++/CLI as part of your project.
However, well-designed functional programs tend to be much easier to parallelize, which means that you can easily gain performance by using multi-core CPUs, which are definitely available to you if you're doing some scientific computing. Here are a couple of relevant links:
F# and Task-Parallel library (blog by Jurgen van Gael, who is doing machine-learning stuff)
Another interesting answer at SO regarding parallelism
An example of using Parallel LINQ from F#
Chapter 14 of my book discusses parallelism (source code is available)
Regarding distributed computing, you can use any distributed computing framework that's available for the .NET platform. There is an MPI.NET project, which works well with F#, but you may also be able to use DryadLINQ, which is an MSR project.
Some articles: F# MPI tools for .NET, Concurrency with MPI.NET
DryadLINQ project homepage
F# does floating point computation as fast as the .NET CLR will allow it. Not much difference from C# or other .NET languages.
F# does not allow vector instructions by itself, but if your CLR has an API for these, F# should not have problems using it. See for instance Mono.
As far as I know, there is only one F# compiler for the moment, so maybe the question should be "how good is the F# compiler when it comes to optimisation?". The answer is in any case "potentially as good as the C# compiler, probably a little bit worse at the moment". Note that F# differs from e.g. C# in its support for inlining at compile time, which potentially allows for more efficient code which rely on generics.
Memory footprints of F# programs are similar to those of other .NET languages. The amount of control you have over allocation and garbage collection is the same as in other .NET languages.
I don't know about the support for distributed memory.
F# has very nice primitives for dealing with flat data structures, e.g. arrays and lists. Look for instance at the content of the Array module: map, map2, mapi, iter, fold, zip... Arrays are popular in scientific computing, I guess due to their inherently good memory locality properties.
For scientific computation packages using F#, you may want to look at what Jon Harrop is doing.
As with all language/performance comparisons, your mileage depends greatly on how well you can code.
F# is a derivative of OCaml. I was surprised to find out that OCaml is used a lot in the financial world, where number crunching performance is very important. I was further surprised to find out that OCaml is one of the faster languages, with performance on par with the fastest C and C++ compilers.
F# is built on the CLR. In the CLR, code is expressed in a form of bytecode called the Common Intermediate Language. As such, it benefits from the optimizing capabilities of the JIT, and has performance comparable to C# (but not necessarily C++), if the code is written well.
CIL code can be compiled to native code in a separate step prior to runtime by using the Native Image Generator (NGEN). This speeds up all later runs of the software as the CIL-to-native compilation is no longer necessary.
One thing to consider is that functional languages like F# benefit from a more declarative style of programming. In a sense, you are over-specifying the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can theoretically give the compiler additional opportunities for algorithmic optimization.
It depends on what kind of scientific computing you are doing.
If you are doing traditional heavy computing, e.g. linear algebra or various optimizations, then you should not put your code in the .NET framework, and it is not a good fit for F#. Because this is at the algorithm level, most of the algorithms must be coded in an imperative language to get good performance in running time and memory usage. Others mentioned parallelism; I must say it is probably useless when you are doing low-level stuff like parallelizing an SVD implementation. When you know how to parallelize an SVD, you simply won't use a high-level language; Fortran, C or modified C (e.g. Cilk) are your friends.
However, a lot of scientific computing today is not of this kind; it consists of higher-level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra or optimization, there are also a lot of data flows, I/O, preprocessing, graphics, etc. For these tasks, F# is really powerful, for its succinctness, functional style, safety, ease of parallelization, etc.
As others have mentioned, .NET has good support for Platform Invoke; quite a few projects inside MS use .NET and P/Invoke together to improve performance at the bottleneck.
I don't think that you'll find a lot of reliable information, unfortunately. F# is still a very new language, so even if it were ideally suited for performance heavy workloads there still wouldn't be that many people with significant experience to report on. Furthermore, performance is very hard to accurately gauge and microbenchmarks are hard to generalize. Even within C++, you can see dramatic differences between compilers - are you wondering whether F# is competitive with any C++ compiler, or with the hypothetical "best possible" C++ executable?
As to specific benchmarks against C++, here are some possibly relevant links: O'Caml vs. F#: QR decomposition; F# vs Unmanaged C++ for parallel numerics. Note that as an author of F#-related material and as the vendor of F# tools, the writer has a vested interest in F#'s success, so take these claims with a grain of salt.
I think it's safe to say that there will be some applications where F# is competitive on execution time and likely some others where it isn't. F# will probably require more memory in most cases. Of course the ultimate performance will also be highly dependent on the skill of the programmer - I think F# will almost certainly be a more productive language to program in for a moderately competent programmer. Furthermore, I think that at the moment, the CLR on Windows performs better than Mono on most OSes for most tasks, which may also affect your decisions. Of course, since F# is probably easier to parallelize than C++, it will also depend on the type of hardware you're planning to run on.
Ultimately, I think that the only way to really answer this question is to write F# and C++ code representative of the type of calculations that you want to perform and compare them.
Here are two examples I can share:
Matrix multiplication:
I have a blog post comparing different matrix multiplication implementations.
LBFGS
I have a large-scale logistic regression solver using LBFGS optimization, which is coded in C++. The implementation is well tuned. I modified some code to C++/CLI, i.e. I compiled the code into .NET. The .NET version is 3 to 5 times slower than the natively compiled one on different datasets. If you code LBFGS in F#, the performance cannot be better than C++/CLI or C# (but it would be very close).
I have another post on why F# is the language for data mining; although it is not quite related to the performance issue you are concerned with here, it is quite related to scientific computing in F#.
If I say "ask again in 2-3 years" I think that will answer your question completely :-)
First, don't expect F# to be any different from C# performance-wise, unless you are doing some convoluted recursion on purpose, and I'd guess you are not, since you asked about numerics.
Floating-point-wise it is bound to be better than Java, since the CLR doesn't aim at cross-platform uniformity, meaning that the JIT will use 80-bit precision whenever it can. On the other hand you don't have control over that, beyond watching the number of variables to make sure there are enough FP registers.
Vector-wise, if you scream loudly enough maybe something will happen in 2-3 years, since Direct3D is entering .NET as a general API anyway and C# code written for XNA runs on the Xbox, which is as close to the bare metal as you can get with the CLR. That still means that you'd need to do some intermediary code on your own.
So don't expect CUDA, or even the ability to just link NVIDIA libs and get going. You'd have much more luck trying that approach with Haskell if for some reason you really, really need a "functional" language, since Haskell was designed to be linking-friendly out of pure necessity.
Mono.Simd has been mentioned already, and while it should be back-portable to the CLR, it might be quite some work to actually do it.
There's quite some code in a social.msdn posting on using SSE3 in .NET, with C++/CLI and C#: some array blitting, injecting SSE3 code for performance, etc.
There was some talk about running Cecil on compiled C# to extract parts into HLSL, compile them into shaders and link in glue code to schedule it (CUDA does the equivalent anyway), but I don't think anything runnable came out of that.
A thing that might be worth more to you, if you want to try something soon, is PhysX.Net on CodePlex. Don't expect it to just unpack and do the magic. However, it has a currently active author, the code is both normal C++ and C++/CLI, and you can probably get some help from the author if you want to go into details and maybe use a similar approach for CUDA. For full-speed CUDA you'll still need to compile your own kernels and then just interface to .NET, so the easier that part goes the happier you are going to be.
There is a CUDA.NET lib which is supposed to be free, but the page gives just an e-mail address, so expect some strings attached; and while the author writes a blog, he's not particularly talkative about what's inside the lib.
Oh, and if you have the budget you might give Psi Lambda a look (KappaCUDAnet is the .NET part). Apparently they are going to jack up the prices in November (if it's not a sales trick :-)
Firstly, C is significantly faster than C++. So if you need that much speed you should make the lib, etc., in C.
With regard to F#, most benchmarks use Mono, which is up to 2x slower than the MS CLR, due partially to its use of the Boehm GC (they have a new GC and LLVM support, but these are still immature, don't support generics, etc.).
.NET languages themselves are compiled to an IR (the CIL), which compiles to native code about as efficiently as C++. There is one problem set that most GC languages suffer with, and that is large amounts of mutable writes (this includes C++ on .NET, as mentioned above). And there is a certain scientific problem set that requires this; when needed, it should probably use a native library or the Flyweight pattern to reuse objects from a pool (which reduces writes). The reason is that there is a write barrier in the .NET CLR: when updating a reference field (including a box), it will set a bit in a table marking that part of the heap as modified. If your code consists of lots of such writes it will suffer.
That said, a .NET app in C# using lots of static code, structs, and ref/out on the structs can produce C-like performance, but it is very difficult to code like this or to maintain the code (just like C).
Where F# shines, however, is parallelism over immutable data, which goes hand in hand with more read-based problems. It's worth noting that most benchmarks are much heavier in mutable writes than real-life applications.
With regard to floating point, you should use an alternative lib (i.e. the .NET one) rather than the OCaml ones, as the latter are slow. C/C++ allows faster code at lower precision, which OCaml doesn't by default.
Lastly, I would argue that a high-level language like C# or F#, plus proper profiling, will give you better performance than C and C++ for the same developer time. If you change a bottleneck into a C-lib P/Invoke call, you will also end up with C-like performance for the critical areas. That said, if you have an unlimited budget and care more about speed than maintenance, then C is the way to go (not C++).
Last I knew, most scientific computing was still done in FORTRAN. It's still faster than anything else for linear algebra problems - not Java, not C, not C++, not C#, not F#. LINPACK is nicely optimized.
But the remark about "your mileage may vary" is true of all benchmarks. Blanket statements (except mine) are rarely true.

Will my iPhone app take a performance hit if I use Objective-C for low level code?

When programming a CPU intensive or GPU intensive application on the iPhone or other portable hardware, you have to make wise algorithmic decisions to make your code fast.
But even great algorithm choices can be slow if the language you're using performs more poorly than another.
Is there any hard data comparing Objective-C to C++, specifically on the iPhone but maybe just on the Mac desktop, for performance of various similar language aspects? I am very familiar with this article comparing C and Objective-C, but this is a larger question of comparing two object oriented languages to each other.
For example, is a C++ vtable lookup really faster than an Obj-C message? How much faster? Threading, polymorphism, sorting, etc. Before I go on a quest to build a project with duplicate object models and various test code, I want to know if anybody has already done this and what the results were. This type of testing and comparison is a project in and of itself and can take a considerable amount of time. Maybe this isn't one project, but two, and only the outputs can be compared.
I'm looking for hard data, not evangelism. Like many of you I love and hate both languages for various reasons. Furthermore, if there is someone out there actively pursuing this same thing I'd be interested in pitching in some code to see the end results, and I'm sure others would help out too. My guess is that they both have strengths and weaknesses; my goal is to find out precisely what they are so that they can be avoided/exploited in real-world scenarios.
Mike Ash has some hard numbers for the performance of various Objective-C method calls versus C and C++ in his post "Performance Comparisons of Common Operations". Also, this post by Savoy Software is an interesting read when it comes to tuning the performance of an iPhone application by using Objective-C++.
I tend to prefer the clean, descriptive syntax of Objective-C over Objective-C++, and have not found the language itself to be the source of my performance bottlenecks. I even tend to do things that I know sacrifice a little bit of performance if they make my code much more maintainable.
Yes, well written C++ is considerably faster. If you're writing performance critical programs and your C++ is not as fast as C (or within a few percent), something's wrong. If your ObjC implementation is as fast as C, then something's usually wrong -- i.e. the program is likely a bad example of ObjC OOD because it probably uses some 'dirty' tricks to step below the abstraction layer it is operating within, such as direct ivar accesses.
The Mike Ash 'comparison' is very misleading -- I would never recommend the approach to compare execution times of programs you have written, or recommend it to compare C vs C++ vs ObjC. The results presented come from a test with compiler optimizations disabled, and a program compiled with optimizations disabled is rarely relevant when you are measuring execution times. To view it as a benchmark which compares C++ against Objective-C is flawed. The test also compares individual features, rather than entire, real-world optimized implementations -- individual features are combined in very different ways in the two languages. This is far from a realistic performance benchmark for optimized implementations.
Examples: with optimizations enabled, IMP cache is as slow as virtual function calls. Static dispatch (as opposed to dynamic dispatch, e.g. using virtual) and calls to known C++ types (where dynamic dispatch may be bypassed) may be optimized aggressively. This process is called devirtualization, and when it is used, a member function which is declared virtual may even be inlined. In the case of the Mike Ash test, where many calls are made to member functions which have been declared virtual and have empty bodies, these calls are optimized away entirely when the type is known, because the compiler sees the implementation and is able to determine that dynamic dispatch is unnecessary. The compiler can also eliminate calls to malloc in optimized builds (favoring stack storage). So enabling compiler optimizations in any of C, C++, or Objective-C can produce dramatic differences in execution times.
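To make the devirtualization point concrete, here is a minimal sketch (not from the original answer); whether the virtual call is actually devirtualized and inlined depends on the compiler and optimization level:

    #include <cstdio>

    struct Shape {
        virtual ~Shape() = default;
        virtual int area() const { return 0; }       // near-empty virtual body
    };

    struct Square : Shape {
        int side = 3;
        int area() const override { return side * side; }
    };

    int main() {
        Square s;
        Shape& shape = s;       // dynamic type is still statically known here
        long total = 0;
        for (int i = 0; i < 1000000; ++i) {
            // At -O0 this is typically an indirect call through the vtable.
            // At -O2, compilers can often devirtualize and inline it, since
            // the referenced object is provably a Square.
            total += shape.area();
        }
        std::printf("%ld\n", total);
    }

Comparing the generated code at -O0 versus -O2 is a more honest exercise than timing unoptimized builds.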
That's not to say the presented results are entirely useless. You could get some useful information about external APIs if you want to determine if there are measurable differences between the times they spend in pthread_create or +[NSObject alloc] on one platform or architecture versus another. Of course, these two examples will be using optimized implementations in your test (unless you happen to be developing them). But for comparing one language to another in programs you compile… the presented results are useless with optimizations disabled.
Object Creation
Consider also object creation in ObjC - every object is allocated dynamically (e.g. on the heap). With C++, objects may be created on the stack (e.g. approximately as fast as creating a C struct and calling a simple function in many cases), on the heap, or as elements of abstract data types. Each time you allocate and free (e.g. via malloc/free), you may introduce a lock. When you create a C struct or C++ object on the stack, no lock is required (although interior members may use heap allocations) and it often costs just a few instructions or a few instructions plus a function call.
As well, ObjC objects are reference counted instances. The actual need for an object to be a std::shared_ptr in performance critical C++ is very rare. It's not necessary or desirable in C++ to make every instance a shared, reference counted instance. You have much more control over ownership and lifetime with C++.
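A minimal C++ sketch of the point about object creation and ownership (illustrative only, not from the original answer):

    #include <memory>

    struct Point { double x, y; };

    void example() {
        // Value/stack creation: roughly the cost of a C struct; no lock,
        // no reference count, lifetime ends automatically at scope exit.
        Point a{1.0, 2.0};

        // Heap creation with single ownership: one allocation, still no
        // reference counting.
        auto b = std::make_unique<Point>();

        // Shared, reference-counted ownership is opt-in and comparatively
        // rare in performance-critical C++ code.
        auto c = std::make_shared<Point>();

        (void)a; (void)b; (void)c;
    }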
Arrays and Collections
Arrays and many collections in C and C++ also use strongly typed containers and contiguous memory. Since the address of the next element's members is often known, the optimizer can do much more, and you get great cache and memory locality. With ObjC, that's far from reality for standard objects (e.g. NSObject).
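As a rough illustration of what a strongly typed, contiguous container gives the optimizer (a sketch, not a benchmark):

    #include <cstddef>
    #include <vector>

    // The element type and stride of a std::vector are known at compile time
    // and the storage is contiguous, so a loop like this can be unrolled and
    // auto-vectorized, with good cache locality. A collection of boxed,
    // individually allocated objects gives the optimizer far less to work with.
    double sum(const std::vector<double>& values) {
        double total = 0.0;
        for (std::size_t i = 0; i < values.size(); ++i)
            total += values[i];
        return total;
    }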
Dispatch
Regarding methods, many C++ implementations use few virtual/dynamic calls, particularly in highly optimized programs. These are static method calls and fodder for the optimizers.
With ObjC methods, each method call (objc message send) is dynamic, and is consequently a firewall for the optimizer. Ultimately, that results in many restrictions or inconveniences regarding what you can and cannot do to keep overhead to a minimum when writing performance-critical ObjC. This may result in larger methods, IMP caching, and frequent use of C.
Some realtime applications cannot use any ObjC messaging in their render paths. None -- audio rendering is a good example of this. ObjC dispatch is simply not designed for realtime purposes; allocations and locks may happen behind the scenes when messaging objects, making the cost of an objc message unpredictable enough that the audio rendering may miss its deadline.
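For illustration, a hedged sketch of the shape such a render path tends to take in C or C++; the Voice struct and the render signature are invented here and do not correspond to any real audio API:

    #include <cstddef>

    // Plain data, preallocated elsewhere; nothing here requires messaging.
    struct Voice {
        float phase = 0.0f;
        float increment = 0.01f;
    };

    // Hypothetical render callback: the inner loop performs no allocation,
    // locking, or dynamic dispatch per sample.
    void render(Voice& voice, float* out, std::size_t frames) {
        float phase = voice.phase;
        const float inc = voice.increment;
        for (std::size_t i = 0; i < frames; ++i) {
            out[i] = phase;                 // placeholder for real synthesis
            phase += inc;
            if (phase >= 1.0f) phase -= 1.0f;
        }
        voice.phase = phase;
    }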
Other Features
C++ also provides generic/template implementations for many of its libraries. These optimize very well. They are typesafe, and a lot of inlining and optimization can be done with templates (think of it as polymorphism, optimization, and specialization which takes place at compile time). C++ adds several features which are just not available or comparable in strict ObjC. Trying to directly compare languages, objects, and libraries which are very different is not so useful -- it covers only a very small subset of actual programs. It's better to expand the question to a library/framework or a real program, considering many aspects of design and implementation.
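One commonly cited illustration of this (my example, not the answerer's): std::sort takes its comparator as a template parameter, so the comparator can be inlined into the sort itself, whereas C's qsort reaches its comparator through a function pointer on every comparison:

    #include <algorithm>
    #include <cstdlib>
    #include <vector>

    // qsort calls this through a function pointer for every comparison.
    static int compare_ints(const void* a, const void* b) {
        const int x = *static_cast<const int*>(a);
        const int y = *static_cast<const int*>(b);
        return (x > y) - (x < y);
    }

    void sort_both(std::vector<int>& c_style, std::vector<int>& cpp_style) {
        std::qsort(c_style.data(), c_style.size(), sizeof(int), compare_ints);

        // std::sort is a template: the comparator becomes part of the
        // instantiated code, so the compiler can inline it.
        std::sort(cpp_style.begin(), cpp_style.end(),
                  [](int a, int b) { return a < b; });
    }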
Other Points
C and C++ symbols can be more easily removed and optimized away in various stages of the build (stripping, dead code elimination, inlining and early inlining, as well as Link Time Optimization). The benefits of this include reduced binary sizes, reduced launch/load times, reduced memory consumption, etc. For a single app, that may not be such a big deal; but if you reuse a lot of code, and you should, then your shared libraries could add a lot of unnecessary weight to the program if implemented in ObjC -- unless you are prepared to jump through some flaming hoops. So scalability and reuse are also factors in medium/large projects, and in groups where reuse is high.
Included Libraries
ObjC library implementors also optimize for the environment: they can make use of language and environment features to offer optimized implementations. Although there are some pretty significant restrictions when writing an optimized program in pure ObjC, some highly optimized implementations exist in Cocoa. This is one of Cocoa's strong points, although the C++ standard library (what some people call the STL) is no slouch either. Cocoa operates at a much higher level of abstraction than C++ -- if you don't know well what you're doing (or should be doing), operating closer to the metal can really cost you. Falling back on a good library implementation if you are not an expert in some domain is a good thing, unless you are really prepared to learn. As well, Cocoa targets a limited set of environments, and within it you can find implementations/optimizations which make better use of the OS.
If you're writing optimized programs and have experience doing so in both C++ and ObjC, clean C++ implementations will often be twice as fast or faster than clean ObjC (yes, you can compare against Cocoa). If you know how to optimize, you can often do better than higher level, general purpose abstractions. Although, some optimized C++ implementations will be as fast as or slower than Cocoa's (e.g. my initial attempt at file I/O was slower than Cocoa's -- primarily because the C++ implementation initializes its memory).
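To make the memory-initialization remark concrete, here is a hedged sketch of two ways to read from an already-opened file in C++; the function names are invented, error handling is omitted, and whether the extra zeroing pass matters depends on file size and platform:

    #include <cstddef>
    #include <cstdio>
    #include <memory>
    #include <vector>

    // The vector constructor zero-initializes the whole buffer before the
    // read overwrites it -- the kind of hidden initialization cost mentioned
    // above.
    std::vector<char> read_zeroed(std::FILE* f, std::size_t n) {
        std::vector<char> buffer(n);          // value-initializes n bytes
        std::fread(buffer.data(), 1, n, f);
        return buffer;
    }

    // One way to skip the redundant pass: a raw buffer whose bytes are left
    // untouched until fread fills them.
    std::unique_ptr<char[]> read_untouched(std::FILE* f, std::size_t n) {
        std::unique_ptr<char[]> buffer(new char[n]);
        std::fread(buffer.get(), 1, n, f);
        return buffer;
    }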
A lot of it comes down to the language features you are familiar with. I use both langs, they both have different strengths and models/patterns. They complement each other quite well, and there are great libraries for both. If you're implementing a complex, performance critical program, correct use of C++'s features and libraries will give you much more control and provide significant advantages for optimization, such that in the right hands, "several times faster" is a good default expectation (don't expect to win every time, or without some work, however). Remember, it takes years to understand C++ well enough to really reach that point.
I keep the majority of my performance critical paths as C++, but also recognize that ObjC is also a very good solution for some problems, and that there are some very good libraries available.
It's very hard to collect "hard data" for this that's not misguiding.
The biggest problem with doing a feature-to-feature comparison like you suggest is that the two languages encourage very different coding styles. Objective-C is a dynamic language with duck typing, where typical C++ usage is static. The same object-oriented architecture problem would likely have very different ideal solutions using C++ or Objective-C.
My feeling (as I have programmed much in both languages, mostly on huge projects): To maximize Objective-C performance, it has to be written very close to C. Whereas with C++, it's possible to make much more use of the language without any performance penalty compared to C.
Which one is better? I don't know. For pure performance, C++ will always have the edge. But the OOP style of Objective-C definitely has its merits. I definitely think it is easier to keep a sane architecture with it.
This really isn't something that can be answered in general as it really depends on how you use the language features. Both languages will have things that they are fast at, things that they are slow at, and things that are sometimes fast and sometimes slow. It really depends on what you use and how you use it. The only way to be certain is to profile your code.
In Objective-C you can also write C++ code, so it might be easier to code in Objective-C for the most part, and if you find something that doesn't perform well in it, then you can have a go at writing a C++ version of it and seeing if that helps (C++ tends to optimize better at compile time). Objective-C will be easier to use if the APIs you are interfacing with are also written in it, plus you might find its style of OOP easier or more flexible.
In the end, you should go with what you know you can write safe, robust code in, and if you find an area that needs special attention from the other language, then you can switch to that. Xcode does allow you to compile both in the same project.
I have a couple of tests I did on an iPhone 3G almost 2 years ago; there was no documentation or hard numbers around in those days. Not sure how valid they still are, but the source code is posted and attached.
This isn't a very extensive test, I was mainly interested in NSArray vs C Array for iterating a large number of objects.
http://memo.tv/nsarray_vs_c_array_performance_comparison
http://memo.tv/nsarray_vs_c_array_performance_comparison_part_ii_makeobjectsperformselector
You can see the C Array is much faster at high iterations. Since then I've realized that the bottleneck is probably not the iteration of the NSArray but the sending of the message. I wanted to try methodForSelector and calling the methods directly to see how big the difference would be but never got round to it. According to Mike Ash's benchmarks it's just over 5x faster.
I don't have hard data for Objective C, but I do have a good place to look for C++.
C++ started as "C with Classes" according to Bjarne Stroustrup in his reflection on the early years of C++ (http://www2.research.att.com/~bs/hopl2.pdf), so C++ can be thought of (like Objective-C) as pushing C to its limits for object orientation.
What are those limits? In the 1994-1997 time frame, a lot of researchers figured out that object-orientation came at a cost due to dynamic binding, e.g. when C++ functions are marked virtual and there may or may not be child classes that override these functions. (In Java, all non-final instance methods are effectively virtual and there isn't much you can do about it; C#, by contrast, makes methods non-virtual unless you mark them virtual.) In "A Study of Devirtualization Techniques for a Java Just-In-Time Compiler" from researchers at IBM Research Tokyo, they contrast the techniques used to deal with this, including one from Urs Hölzle and Gerald Aigner. Urs Hölzle, in a separate paper with Karel Driesen, had shown that on average 5.7% of time in C++ programs (and up to ~50%) was spent in calling virtual functions (e.g. vtables + thunks). He later worked with some Smalltalk researchers in what ended up as the Java HotSpot VM to solve these problems in OO. Some of these features are being backported to C++ (e.g. 'protected' and exception handling).
As I mentioned, C++ is statically typed where Objective-C is duck typed. The performance difference in execution (but not in lines of code) is probably a result of this difference.
This study says that to really get the performance in a CPU-intensive game, you have to use C. The linked article is complete with an Xcode project that you can run.
I believe the bottom line is: Use Objective-C where you must interact with the iPhone's functions (after all, putting trampolines everywhere can't be good for anyone), but when it comes to loops, things like vector object classes, or intensive array access, stick with C++ STL or C arrays to get good performance.
I mean, it would be totally silly to see position = [[Vector3 alloc] init];. You're just asking for a performance hit if you use reference counts on basic objects like a position vector.
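For contrast, a minimal sketch of the value-type approach implied here (the Vector3 struct is illustrative, not a real library type):

    // A plain value type lives on the stack or inline inside an array,
    // with no allocation or reference counting per vector.
    struct Vector3 {
        float x, y, z;
    };

    inline Vector3 operator+(Vector3 a, Vector3 b) {
        return Vector3{a.x + b.x, a.y + b.y, a.z + b.z};
    }

    inline Vector3 operator*(Vector3 v, float s) {
        return Vector3{v.x * s, v.y * s, v.z * s};
    }

    void integrate(Vector3& position, Vector3 velocity, float dt) {
        position = position + velocity * dt;   // no alloc/retain/release
    }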
Yes. C++ reigns supreme in the performance/expressiveness/resource tradeoff.
"I'm looking for hard data, not evangelism" -- Google is your best friend.
Obj-C's NSString gets swapped out for C++ strings by Apple engineers for performance; on resource-constrained devices, only C++ cuts it as a MAINSTREAM OOP language.
NSString stringWithFormat is slow
In the ray tracer linked below, the Obj-C OOP abstraction is deconstructed into procedural C structs for performance -- otherwise it is an order of magnitude slower than Java! The author is also aware of message caching, yet it is still a no-go. So modeling lots of small player/enemy objects is done either in OOP with C++, or with lots of procedural structs and a simple OOP wrapper around them in Obj-C. There can be one paradigm that equates Procedural + Object-Oriented Programming = Obj-C.
http://ejourneyman.wordpress.com/2008/04/23/writing-a-ray-tracer-for-cocoa-objective-c/