Mixing assembler code with c/c++

Mixing assembler code with c/c++ - c++

Why is assembly language code often needed along with C/C++ ?
What can't be done in C/C++, which is possible when assembly language code is mixed?
I have some source code of some 3D computer games. There are a lot of assembler code in use.

Things that pop to mind, in no particular order:
Special instructions. In an embedded application, I need to invalidate the cache after a DMA transfer has filled the memory buffer. The only way to do that on an SH-4 CPU is to execute a special instruction, so inline assembly (or a free-standing assembly function) is the only way to go.
Optimizations. Once upon a time, it was common for compilers to not know every trick that was possible to do. In some of those cases, it was worth the effort to replace an inner loop with a hand-crafted version. On the kinds of CPUs you find in small embedded systems (think 8051, PIC, and so forth) it can be valuable to push inner loops into assembly. I will emphasize that for modern processors with pipelines, multi-issue execution, extensive caching and more, it is often exceptionally difficult for hand coding to even approach the capabilities of the optimizer.
Interrupt handling. In an embedded application it is often needed to catch system events such as interrupts and exceptions. It is often the case that the first few instructions executed by an interrupt have special responsibilities and the only way to guarantee that the right things happen is to write the outer layer of a handler in assembly. For example, on a ColdFire (or any descendant of the 68000) only the very first instruction is guaranteed to execute. To prevent nested interrupts, that instruction must modify the interrupt priority level to mask out the priority of the current interrupt.
Certain portions of an OS kernel. For example, task switching requires that the execution state (at least most registers including PC and stack pointer) be saved for the current task and the state loaded for the new task. Fiddling with execution state of the CPU is well outside of the feature set of the language, but can be wrapped in a small amount of assembly code in a way that allows the rest of the kernel to be written in C or C++.
Edit: I've touched up the wording about optimization. Let me emphasize that for targets with large user populations and well supported compilers with decent optimization, it is highly unlikely that an assembly coder can beat the performance of the optimizer.
Before attempting, start by careful profiling to determine where the bottlenecks really lie. With that information in hand, examine assumptions and algorithms carefully, because the best optimization of all is usually to find a better way to handle the larger picture. Then, if all else fails, isolate the bottleneck in a test case, benchmark it carefully, and begin tweaking in assembly.

Why is assembly language code often
needed along with C/C++ ?
Competitive advantage. Like, if you are writing software for the (soon-to-be) #1 gaming company in the world.
What can't be done in C/C++, which is
possible when assembly language code
is mixed?
Nothing, unless some absolute performance level is needed, say, X frames per second or Y billions of polygons per second.
Edit: based on other replies, it seems the consensus is that embedded systems (iPhone, Android etc) have hardware accelerators that certainly require the use of assembly.
I have some source code of some 3D
computer games. There are a lot of
assembler code in use.
They are either written in the 80's-90's, or they are used sparingly (maybe 1% - 5% of total source code) inside a game engine.
Edit: to this date, compiler auto-vectorization quality is still poor. So, you may see programs that contain vectorization intrinsics, and since it's not really much different from writing in actual assembly (most intrinsics have one-one mapping to assembly instructions) some folks might just decide to write in assembly.
Update:
According to anecdotal evidence, RollerCoaster Tycoon is written in 99% assembly.
http://www.chrissawyergames.com/faq3.htm

In the past, compilers used to be pretty poor at optimizing for a particular architecture, and architectures used to be simpler. Now the reverse is true. These days, it's pretty hard for a human to write better assembly than an optimizing compiler, for deeply-pipelined, branch-predicting processors. And so you won't see it much. What there is will be short, and highly targeted.
In short, you probably won't need to do this. If you think you do, profile your code to make sure you've identified a hotspot - don't optimize something just because it's slow, if you're only spending 0.1% of your execution time there. See if you can improve your design or algorithm. If you don't find any improvement there, or if you need functionality not exposed by your higher-level language, look into hand-coding assembly.

There are certain things that can only be done in assembler and cannot be done in C/C++.
These include:
generating software interrupts (SWI or INT instructions)
Use of instructions like SWP for creating mutexes
specialist coporcessor instructions (such as those needed to program the MMU and manage RAM caches)
Access to carry and overflow flags.
You may also be able to optimize code better in assembler than C/C++ (eg memcpy on Android is written in assembler)

There may be new instructions that your compiler cannot yet generate, or the compiler does a bad job, or you may need to control the CPU directly.

Why is assembly language code often
needed along with C/C++ ?needed along with C/C++ ?
It isn't
What can't be done in C/C++, which is
possible when assembly language code
is mixed?
Accessing system registers or IO ports on the CPU.
Accessing BIOS functions.
Using specialized instructions that doesn't map directly to the programming language,
e.g. SIMD instructions.
Provide optimized code that's better than the compiler produces.
The two first points you usually don't need unless you're writing an operating system, or code
running without an operatiing system.
Modern CPUs are quite complex, and you'll be hard pressed to find people that actually can write assembly than what the compiler produces. Many compilers come with libraries giving you access
to more advanced features, like SIMD instructions, so nowadays you often don't need to fall back to
assembly for that.

One more thing worth mentioning is:
C & C++ do not provide any convenient way to setup stack frames when one needs to implement a binary level interop with a script language - or to implement some kind of support for closures.

Assembly can be very optimal than what any compiler can generate in certain situations.

Related

Convert Freepascal function to assembly?

Due to performance issues, I'd like to attempt to convert a Freepascal function (SHA1Update, from the SHA1 unit) to assembly. I use Freepascal 2.6.4 and Lazxarus 1.2.4.
The reason is, I have a loop structure (repeat...until) that reads 64Kb blocks of raw data from disk into a buffer, and then it is hashed. Without the hashing, I can read the disk at 4Gb p\min. With the hashing, it slows to just over 1Gb p\min. So someone suggested converting the hashing routine to assembly.
I am a below average programmer when using high-level languages, let alone assembly, but the potential for performance improvement is drving me to at least enquire.
So my question is : is there a program or script that can take a procedure or function and magically convert it to assembly that I can then compile using the Freepascal compiler? I know it can be done for C\C++ using even web based system like this one

Assembly is indeed what you would use for optimising selected sequences of code. But, because native code compilers generate machine code, usually using an intermediate assembly source representation, which is then run through an assembler, the advantage you gain from using a compiler to "magically convert" your section of code, subject to optimisation, to assembly which then is linked to the rest of the program, compared to simply compiling the whole program with the compiler, is about zero - you're using the same compiler for converting, after all. From that angle, a compiler is nothing else than such a program which "magically converts it to assembly". For optimisation purposes, you want to hand code those section of code - and you need to be good at it. Many compilers generate code nowadays which performs better than non-expert crafted code, for various reasons. One is that target CPUs are very different in what is best performing code for them, and the rules to determine how efficient code for a specific CPU must look like, are often extremely complex. As a hand coder, you need to know the differences between them, to know how to write code which performs well. This knowledge is something many compilers have, and are therefore able to generate code such that one or another CPU architecture or model can benefit from the differences the compiler puts into code generation.
Often much better performance gains can be achieved by choosing more efficient algorithms. A better algorithm, coded in high level, usually outperforms a less adequate algorithm, hand coded in assembly. Therefore I'd look into possibilities to make the hashing process as such faster, by looking at alternative and faster algorithms, rather than trying to improve speed using assembly at this stage - consider assembly optimisation as a last, final step optimisation, when other means to speed up your code have been exhausted.

As #Bushmills already explained your code is converted to assembly automatically by the FreePascal compiler - before producing the machine code in the Portable Executable (*.exe) format.
What you would need is not the assembly language, but hand-optimized code written in assembly language. This is task for experienced assembly programmer. You can 1) become an assembly language expert by yourself, this Stack Overflow question can give you some starting points: A good NASM/FASM tutorial?
My guess is that any programmer can become an assembly language expert (either CISC or RISC architectures) in about a year. Depending on your previous experience and the courses you'd take and your eagerness. For theoretical background (processor-neutral) I'd recommend Donald Knuth's MMIX lectures
You should be able to 2) see the intermediate assembly files produced by the FreePascal compiler by following instructions in this: http://free-pascal-general.1045716.n5.nabble.com/Assembler-file-generate-by-compiler-td5710837.html discussion
If you want to really move on in a reasonable time-frame then I'd suggest you to create Minimal, Complete and Verifiable example and 3) ask for code review at some code review sites where some more experienced programmers will take a look at your code and propose some changes. These sites should be a good candidates:
https://codereview.stackexchange.com/
https://www.codementor.io/
Those are sites designed especially for helping beginners and intermediate programmers with problems like the one of yours

How much faster if write in-line assembly rather than regular c/c++ code? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
One of my senior collegues optimizes a function (he is implementing image filtering) by writing in-line assembly. Is that really necessary? Wouldn't modern compiler do that for us? Typically, how much gain do we have by converting C code into assembly? If assembly code really brings lots of benefits, when should we convert C/C++ code into assembly and when should we leave the code as it is, since assembly code is hard to read and maintain.

If you are smarter than the compiler, you may be able to make your code faster on one specific platform by writing it by hand in assembly.
However, most big C/C++ compilers are extremely good optimizers; you are unlikely to be smarter than them.

No, that's not really necessary, and this also makes porting the app much more different. This is the main concern about inline assembly.
And, of course, 80% of the time compiler can do this better.

First find an efficient algorithm.
Then implement it in clearly readable code.
Then evaluate its performance.
If your code's performance is inadequate, consider alternative algorithms
Repeat steps 3 and 4 until either performance is acceptable or you have exhausted all algorithmic alternatives
Drink some coffee.
Take a walk.
Repeat steps 3 and 4 again some more.
Have a beer.
Give steps 3 and 4 another few tries.
Get some rest
Back to 3 and 4.
Spend years studying the architecture of the CPU(s) your code will run on
Now consider hand-writing some assembly.

I'd imagine that for image filtering you might benefit from e.g. the availability of SIMD instructions, but not all compilers can automatically compile your code to use them, and not all the time. So in-line assembly or intrinsics can help with that.

One of my senior collegues optimizes a function (he is implementing image filtering) by writing in-line assembly. Is that really necessary?
Obviously I can't comment on your colleagues exact situation, but I wouldn't be surprised if it was necessary. There's many specialised instructions that are used for image filters that won't necessarily be used by the compiler. Inline assembly is often the only way to access those instructions (or through intrinsics).
Wouldn't modern compiler do that for us?
Obviously this depends on what 'that' is, but while modern compilers are certainly good at generating code, they aren't magic. It is often the case where you know something about your code that the compiler doesn't (or can't).
If your line of work involves high performance code then there are definitely places where you can get major improvements from using inline assembly (or even just compiler intrinsics).
If assembly code really brings lots of benefits, when should we convert C/C++ code into assembly and when should we leave the code as it is, since assembly code is hard to read and maintain.
Here's how:
First, profile your code to see what potential benefits are to be gained.
Look at the disassembly to see what the compiler is doing. If it is already doing things optimally then there is no point going further.
If there are opportunities for improvement, consider using compiler intrinsics before hand-written assembly as it is generally easier to maintain and more portable.
Only if all that fails should you go to inline assembly.

The short answer is no it's not necessary, the longer answer is... well, it depends. Modern compilers do indeed do a very good job of optimizing code, but they don't necessarily have access to all the assumptions a human does when optimizing. Hand coded assembler can beat compiled code, but there is a tradeoff between portability and maintenance.
Assuming that you have already determined this bit of code is a hotspot, the first thing you should do is tweak algorithms, then tweak the C++ code to make it faster (for example unrolling loops), and then tweak compiler flags. As a last resort, if you still can't make it go as fast as you need, consider whether its worth paying the cost of hand-optimizing, given all the future cost you will incur in maintenance and portability.

Where image processing is concerned I would be cautious either way as it depends on the input data, the algorithm and the compiler. Intel's ICC has a very good parallelizer and vectorizer for generating SSE code, it may be hard to beat by hand in most general purpose image processing cases. VCC on the other hand might not do such a good job.
However, I would expect that most benefit could be gained using compiler intrinsics rather than inline assembler.

programming languages are very well coded. Unless you are using very simple bitwise operations, like add, bitshift or using pointers or new instruction sets, you should use a practical programming lanugage. you DO NOT need assembly language for anything in your life. standard c operations call the relevant cpu instructions. If somebody makes a new CPU and it supports new instructions and you want to use those instructions, the programming languages or libraries do not support them and adaptation takes time. A new instruction in a cpu, makes things faster but you won't ever work in teams like DirectX or Opengl or MMX, SSE bla bla bla. Think of a day when graphic libraries like directx or opengl were not developed and intel,for say, creates some isntruction sets currently supported by none of the languages or inexistent in none of the libraries developed. then you would want to call some method from cpu and pass your parameters to that, for better performance. You can still do the same things without the new instruction in cpu. Another example, a new cpu by intel can support md5 hash checking, it doesnt mean you can't use md5, it means a library developed which uses md5 instructions will work faster because the cpu has a separate entity inside which will execute the operation efficiently. but normally you would wait until somebody publishes a library which uses md5 instructions int the cpu. cpus today add instrucion sets for zip, hash check, encryption and so on. you would use assembly language for some specific instruction. not for the good old add, multiply, subtract or divide because your programming language is already using them in the most efficient way possible.

Local variables in assembly:are they faster than global variables?

I was wondering if local variables in assembly are faster than the global variables that we use. The context for this is that I am learning some 2d animation using the win32 api, from a book. The author uses a function to initialize(Creation,registration, showing and updating of the window) the main window for the program. I wrote that function in asm(just to practice some asm). So, I was wondering if there is any performance advantage involved, since in the asm function i used, the WNDCLASSEX structure was created locally(in stack). I know that the local variables in assembly are supposed to be faster, but having gone through the disassembly for another program(entirely in cpp), I noticed that the compiler creates the WNDCLASSEX locally as well. This confused me about the topic. So i want to know if there is any difference in performance between the asm code and C++ code.
Devjeet

The top of the stack is touched by a lot of code. That means the the top of the stack is usually in CPU cache. Accessing this would be faster than accessing other memory areas (from .bss, etc).
But for a function like CreateWindow which is called only a few times per program this doesn't really matter. The difference is less than a a few hundred CPU cycles. For other parts of the code the difference might be more noticeable. But an important thing to note is that if you're doing the same thing repeatedly with the same piece of data that data too will end up in CPU cache and thus the performance difference would be negated.

To be honest, I think you should leave such decisions to the compiler. The people who wrote the compiler have spent a lot of person-years optimizing the code, so there is very little reason to worry about such things for 99% of all apps. In the 1% case, when you compile, do an assembly listing and check the code, since there you may earn a cycle or two.

Without having your code to judgment, i have an advise:
Some days, coding in assembly was a great way to write a fast program. But, nowadays with third-party libraries, complex algorithms, smart optimizations in compilers, portability issues and huge improvements in cpu and memory speeds, this approach is leaving off.
Even, in my experience hand made assembly codes may occurs bad effect on performance of code, because compilers can not make smart codes in that assembly block.
Another bad thing with coding with assembly inside a high level language is preventing portability a code.
Note: Yet, there is some machines and systems which coding by assembly is good for them.

C/C++ compiler feedback optimization

Has anyone seen any real world numbers for different programs which are using the feedback optimization that C/C++ compilers offer to support the branch prediction, cache preloading functions etc.
I searched for it and amazingly not even the popular interpreter development groups seem to have checked the effect. And increasing ruby,python,php etc. performance by 10% or so should be considered usefull.
Is there really no benefit or is the whole developer community just to lazy to use it?

10% is a good ballpark figure. That said, ...
You have to REALLY care about the performance to go this route. The product I work on (DB2) uses PGO and other invasive and agressive optimizations. Among the costs are significant build time (triple on some platforms) and development and support nightmares.
When something goes wrong it can be non-trivial to map the fault location in the optimized code back to the source. Developers don't usually expect that functions in different modules can end up merged and inlined and this can have "interesting" effects.
Problems with pointer aliasing, which are nasty to track down also usually show up with these sorts of optimizations. You have the additional fun of having non-deterministic builds (an aliasing problem can show up in monday's build, vanish again till thursday's, ...).
The line between what is correct or incorrect compiler behaviour under these sorts of aggressive optimizations also becomes fairly blurred. Even with the luxury of having our compiler guys in house (literally) the optimization issues (either in our source or the compiler) are still not easy to understand and resolve.

From unladen-swallow (a project optimizing the CPython VM):
For us, the final nail in PyBench's coffin was when experimenting with gcc's feedback-directed optimization tools, we were able to produce a universal 15% performance increase across our macrobenchmarks; using the same training workload, PyBench got 10% slower.
So some people are at least looking at it. That said, PGO sets some pretty tricky requirements on the build environment that are hard to satisfy for open-source projects meant to be built by a distributed heterogeneous group of people. Heavy optimization also creates difficult to debug heisenbugs. It's less work to give the compiler explicit hints for the performance critical parts.
That said, I expect significant performance increases from runtime profile guided optimization. JIT'ing allows the optimizer to cope with the profile of data changing across the execution of a program and do many extremely runtime data specific optimizations that would explode the code size for static compilation. Especially dynamic languages need good runtime data based optimization to perform well. With dynamic language performance getting significant attention lately (JavaScript VM's, MS DLR, JSR-292, PyPy and so on) there's a lot of work being done in this area.

Traditional methods in improving the compiler efficiency via profiling is done by performance analysis tools. However, how the data from the tools may be of use in optimization still depends on the compiler you use. For example, GCC is a framework being worked on to produce compilers for different domains. Providing profiling mechanism in the such compiler framework will be extremely difficult.
We can rely on statistical data to do certain optimization. For instance, GCC unrolls a loop if the loop count is less than a constant (say 7). How it fixes up the constant will be based on statistical result of the code size generated for different target architecture.
Profile guided optimizations track the special areas of the source. Details regarding previous run results needs to be stored which is an overhead. The input on the other hand requires a statistical representation of the target application which may use the compiler. So the complexity level rises with the number of different inputs and outputs. In short, deciding profile guided optimization needs extreme data collection. Automation or embedding such profiling into source needs careful monitoring. If not, the entire result will be awry and in our effort to swim we actually will drown.
However, experimentation on this regard is ongoing. Just have a look at POGO.

Languages faster than C++ [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
It is said that Blitz++ provides near-Fortran performance.
Does Fortran actually tend to be faster than regular C++ for equivalent tasks?
What about other HL languages of exceptional runtime performance? I've heard of a few languages suprassing C++ for certain tasks... Objective Caml, Java, D...
I guess GC can make much code faster, because it removes the need for excessive copying around the stack? (assuming the code is not written for performance)
I am asking out of curiosity -- I always assumed C++ is pretty much unbeatable barring expert ASM coding.

Fortran is faster and almost always better than C++ for purely numerical code. There are many reasons why Fortran is faster. It is the oldest compiled language (a lot of knowledge in optimizing compilers). It is still THE language for numerical computations, so many compiler vendors make a living of selling optimized compilers. There are also other, more technical reasons. Fortran (well, at least Fortran77) does not have pointers, and thus, does not have the aliasing problems, which plague the C/C++ languages in that domain. Many high performance libraries are still coded in Fortran, with a long (> 30 years) history. Neither C or C++ have any good array constructs (C is too low level, C++ has as many array libraries as compilers on the planet, which are all incompatible with each other, thus preventing a pool of well tested, fast code).

Whether fortran is faster than c++ is a matter of discussion. Some say yes, some say no; I won't go into that. It depends on the compiler, the architecture you're running it on, the implementation of the algorithm ... etc.
Where fortran does have a big advantage over C is the time it takes you to implement those algorithms. And that makes it extremely well suited for any kind of numerical computing. I'll state just a few obvious advantages over C:
1-based array indexing (tremendously helpful when implementing larger models, and you don't have to think about it, but just FORmula TRANslate
has a power operator (**) (God, whose idea was that a power function will do ? Instead of an operator?!)
it has, I'd say the best support for multidimensional arrays of all the languages in the current market (and it doesn't seem that's gonna change so soon) - A(1,2) just like in math
not to mention avoiding the loops - A=B*C multiplies the arrays (almost like matlab syntax with compiled speed)
it has parallelism features built into the language (check the new standard on this one)
very easily connectible with languages like C, python, so you can make your heavy duty calculations in fortran, while .. whatever ... in the language of your choice, if you feel so inclined
completely backward compatible (since whole F77 is a subset of F90) so you have whole century of coding at your disposal
very very portable (this might not work for some compiler extensions, but in general it works like a charm)
problem oriented solving community (since fortran users are usually not cs, but math, phy, engineers ... people with no programming, but rather problem solving experience whose knowledge about your problem can be very helpful)
Can't think of anything else off the top of my head right now, so this will have to do.

What Blitz++ is competing against is not so much the Fortran language, but the man-centuries of work going into Fortran math libraries. To some extent the language helps: an older language has had a lot more time to get optimizing compilers (and , let's face it, C++ is one of the most complex languages). On the other hand, high level C++ libraries like Blitz++ and uBLAS allows you to state your intentions more clearly than relatively low-level Fortran code, and allows for whole new classes of compile-time optimizations.
However, using any library effectively all the time requires developers to be well acquainted with the language, the library and the mathematics. You can usually get faster code by improving any one of the three...

FORTAN is typically faster than C++ for array processing because of the different ways the languages implement arrays - FORTRAN doesn't allow aliasing of array elements, whereas C++ does. This makes the FORTRAN compilers job easier. Also, FORTRAN has many very mature mathematical libraries which have been worked on for nearly 50 years - C++ has not been around that long!

This will depend a lot on the compiler, programmers, whether it has gc and can vary too much. If it is compiled directly to machine code then expect to have better performance than interpreted most of the time but there is a finite amount of optimization possible before you have asm speed anyway.
If someone said fortran was slightly faster would you code a new project in that anyway?

the thing with c++ is that it is very close to the hardware level. In fact, you can program at the hardware level (via assembly blocks). In general, c++ compilers do a pretty good job at optimisations (for a huge speed boost, enable "Link Time Code Generation" to allow the inlining of functions between different cpp files), but if you know the hardware and have the know-how, you can write a few functions in assembly that work even faster (though sometimes, you just can't beat the compiler).
You can also implement you're own memory managers (which is something a lot of other high level languages don't allow), thus you can customize them for your specific task (maybe most allocations will be 32 bytes or less, then you can just have a giant list of 32-byte buffers that you can allocate/deallocate in O(1) time). I believe that c++ CAN beat any other language, as long as you fully understand the compiler and the hardware that you are using. The majority of it comes down to what algorithms you use more than anything else.

You must be using some odd managed XML parser as you load this page then. :)
We continously profile code and the gain is consistently (and this is not naive C++, it is just modern C++ with boos). It consistensly paves any CLR implementation by at least 2x and often by 5x or more. A bit better than Java days when it was around 20x times faster but you can still find good instances and simply eliminate all the System.Object bloat and clearly beat it to a pulp.
One thing managed devs don't get is that the hardware architecture is against any scaling of VM and object root aproaches. You have to see it to believe it, hang on, fire up a browser and go to a 'thin' VM like Silverlight. You'll be schocked how slow and CPU hungry it is.
Two, kick of a database app for any performance, yes managed vs native db.

It's usually the algorithm not the language that determines the performance ballpark that you will end up in.
Within that ballpark, optimising compilers can usually produce better code than most assembly coders.
Premature optimisation is the root of all evil
This may be the "common knowledge" that everyone can parrot, but I submit that's probably because it's correct. I await concrete evidence to the contrary.

D can sometimes be faster than C++ in practical applications, largely because the presence of garbage collection helps avoid the overhead of RAII and reference counting when using smart pointers. For programs that allocate large amounts of small objects with non-trivial lifecycles, garbage collection can be faster than C++-style memory management. Also, D's builtin arrays allow the compiler to perform better optimizations in some cases than C++'s STL vector, which the compiler doesn't understand. Furthermore, D2 supports immutable data and pure function annotations, which recent versions of DMD2 optimize based on. Walter Bright, D's creator, wrote a JavaScript interpreter in both D and C++, and according to him, the D version is faster.

C# is much faster than C++ - in C# I can write an XML parser and data processor in a tenth the time it takes me to write it C++.
Oh, did you mean execution speed?
Even then, if you take the time from the first line of code written to the end of the first execution of the code, C# is still probably faster than C++.
This is a very interesting article about converting a C++ program to C# and the effort required to make the C++ faster than the C#.
So, if you take development speed into account, almost anything beats C++.
OK, to address tht OP's runtime only performance requirement: It's not the langauge, it's the implementation of the language that determines the runtime performance. I could write a C++ compiler that produces the slowest code imaginable, but it's still C++. It is also theoretically possible to write a compiler for Java that targets IA32 instructions rather than the Java VM byte codes, giving a runtime speed boost.
The performance of your code will depend on the fit between the strengths of the language and the requirements of the code. For example, a program that does lots of memory allocation / deallocation will perform badly in a naive C++ program (i.e. use the default memory allocator) since the C++ memory allocation strategy is too generalised, whereas C#'s GC based allocator can perform better (as the above link shows). String manipulation is slow in C++ but quick in languages like php, perl, etc.

It all depends on the compiler, take for example the Stalin Scheme compiler, it beats almost all languages in the Debian micro benchmark suite, but do they mention anything about compile times?
No, I suspect (I have not used Stalin before) compiling for benchmarks (iow all optimizations at maximum effort levels) takes a jolly long time for anything but the smallest pieces of code.

if the code is not written for performance then C# is faster than C++.
A necessary disclaimer: All benchmarks are evil.
Here's benchmarks that in favour of C++.
The above two links show that we can find cases where C++ is faster than C# and vice versa.

Performance of a compiled language is a useless concept: What's important is the quality of the compiler, ie what optimizations it is able to apply. For example, often - but not always - the Intel C++ compiler produces better performing code than g++. So how do you measure the performance of C++?
Where language semantics come in is how easy it is for the programmer to get the compiler to create optimal output. For example, it's often easier to parallelize Fortran code than C code, which is why Fortran is still heavily used for high-performance computation (eg climate simulations).
As the question and some of the answers mentioned assembler: the same is true here, it's just another compiled language and thus not inherently 'faster'. The difference between assembler and other languages is that the programmer - who ideally has absolute knowledge about the program - is responsible for all of the optimizations instead of delegating some of them to the 'dumb' compiler.
Eg function calls in assembler may use registers to pass arguments and don't need to create unnecessary stack frames, but a good compiler can do this as well (think inlining or fastcall). The downside of using assembler is that better performing algorithms are harder to implement (think linear search vs. binary seach, hashtable lookup, ...).

Doing much better than C++ is mostly going to be about making the compiler understand what the programmer means. An example of this might be an instance where a compiler of any language infers that a region of code is independent of its inputs and just computes the result value at compile time.
Another example of this is how C# produces some very high performance code simply because the compiler knows what particular incantations 'mean' and can cleverly use the implementation that produces the highest performance, where a transliteration of the same program into C++ results in needless alloc/delete cycles (hidden by templates) because the compiler is handling the general case instead of the particular case this piece of code is giving.
A final example might be in the Brook/Cuda adaptations of C designed for exotic hardware that isn't so exotic anymore. The language supports the exact primitives (kernel functions) that map to the non von-neuman hardware being compiled for.

Is that why you are using a managed browser? Because it is faster. Or managed OS because it is faster. Nah, hang on, it is the SQL database.. Wait, it must be the game you are playing. Stop, there must be a piece of numerical code Java adn Csharp frankly are useless with. BTW, you have to check what your VM is written it to slag the root language and say it is slow.
What a misconecption, but hey show me a fast managed app so we can all have a laugh. VS? OpenOffice?

Ahh... The good old question - which compiler makes faster code?
It only matters in code that actually spends much time at the bottom of the call stack, i.e. hot spots that don't contain function calls, such as matrix inversion, etc.
(Implied by 1) It only matters in code the compiler actually sees. If your program counter spends all its time in 3rd-party libraries you don't build, it doesn't matter.
In code where it does matter, it all comes down to which compiler makes better ASM, and that's largely a function of how smartly or stupidly the source code is written.
With all these variables, it's hard to distinguish between good compilers.
However, as was said, if you've got a lot of Fortran code to compile, don't re-write it.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js