C++ - put an expression into a register and use it in inline assembly

How can I evaluate an expression, put the result in a register, use it in inline assembly, and then store it somewhere again?
For example:
EAX=a[i]; //Any expression that is valid in C++
__asm xor eax,0xFFFF //Do something with this
b[i]=EAX; //And then put it in some variable.
By the way, the reason is performance.

Several compilers have compiler-specific ways of accomplishing this, but it's almost never worth doing.
There are several reasons why this is almost never worth doing:
Most of the time, the compiler will generate better code than you can write.
Even if it doesn't, you can frequently tweak your code slightly to convince the compiler to write code that's at least as good as you could write, and still have your program remain portable.
The code with the perceived performance issue is not actually performance-critical, because the program spends 0.01% of its time there.
You want your program to stay standard C++ and don't want to clutter it with tons of #ifdef guards.
The example you've shown is not very compelling.
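To make the trade-off concrete, here is a minimal sketch of the question's example written both ways. It assumes GCC or Clang on x86 for the inline-asm path: the `"+a"` constraint is GCC-style extended asm and asks the compiler to load the operand into EAX and read it back afterwards (MSVC's `__asm` blocks use a different syntax and are not supported on x64). The function names are hypothetical, and on other targets the sketch falls back to plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable version: the compiler picks whatever register it likes.
inline std::uint32_t flip_low16_portable(std::uint32_t x) {
    return x ^ 0xFFFFu;
}

// GCC/Clang extended-asm version (x86 only). "+a" means: put x in EAX,
// let the instruction read and modify it, then copy EAX back into x.
inline std::uint32_t flip_low16_asm(std::uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("xorl $0xFFFF, %0" : "+a"(x) : : "cc");
    return x;
#else
    return flip_low16_portable(x);  // no inline asm on this target
#endif
}

// The question's "EAX = a[i]; xor; b[i] = EAX" applied to whole arrays,
// written in plain C++ and left to the optimizer.
void flip_all(const std::uint32_t* a, std::uint32_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        b[i] = a[i] ^ 0xFFFFu;
    }
}
```

Note that `flip_all` needs no assembly at all: with optimization enabled a compiler will typically emit the same `xor` (or a vectorized form of it), whereas an asm statement is opaque to the optimizer and can get in the way of such transformations.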

Related

Increase Program Speed By Avoiding Functions? (C++)

When it comes to procedural programming, functional decomposition is ideal for maintaining complicated code. However, functions are expensive: adding to the call stack, passing parameters, storing return addresses. All of this takes extra time! When speed is crucial, how can I get the best of both worlds? I want a highly decomposed program without any unnecessary overhead introduced by function calls. I'm familiar with the keyword "inline", but that seems to be only a suggestion to the compiler, and if used incorrectly by the programmer it will yield an even slower program. I'm using g++, so will the -O3 flag optimize away my functions that call functions that call functions...?
I just wanted to know if my concerns are valid and if there are any methods to combat this issue.
First, as always when dealing with performance issues, you should measure where your bottlenecks are with a profiler. The top item is usually not function calls, and by a large margin. If you have done this, then please read on.
Then, you can hint which functions you want inlined by using the inline keyword. The compiler is usually smart enough to know what to inline and what not to (it can inline functions you forgot to mark, and may decline to inline some you marked if it thinks that won't help).
If you really still want to reduce function-call overhead and force inlining, some compilers allow you to do so (see this question). Keep in mind that massive inlining may actually decrease performance: your code grows larger, and you may get more instruction-cache misses than before (which is not good).
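A minimal sketch of what forcing inlining looks like in practice, assuming GCC/Clang (`__attribute__((always_inline))`) or MSVC (`__forceinline`), which are the usual compiler-specific spellings. The `FORCE_INLINE` macro and the `square`/`sum_of_squares` functions are hypothetical names for illustration; on other compilers the macro degrades to a plain `inline` hint.

```cpp
#include <cassert>

// Map a hypothetical FORCE_INLINE macro onto each compiler's own extension.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // only a hint; there is no portable way to force it
#endif

// Small, hot helper: a typical inlining candidate.
FORCE_INLINE double square(double x) {
    return x * x;
}

// The attribute asks the compiler to inline square() here even if its
// own heuristics would have declined.
double sum_of_squares(const double* xs, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += square(xs[i]);
    }
    return total;
}
```

Since a small function like `square` would almost certainly be inlined at `-O2` anyway, the attribute mostly matters for larger functions, which is exactly where the code-size warning above applies.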
If it's a specific piece of code you're worried about, you can measure the time yourself. Just run it in a loop a large number of times and read the system time before and after. Divide the difference by the number of iterations to get the average time of each call.
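A minimal sketch of that measurement using `std::chrono::steady_clock`, a monotonic clock that is not affected by adjustments to the system time. The `work` function is a hypothetical stand-in for the code being measured, and the `volatile` sink is one common way to keep the optimizer from deleting a loop whose result is otherwise unused.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical code under test; substitute the code you actually care about.
inline std::uint64_t work(std::uint64_t x) {
    return x * 2654435761u + 1;
}

// Runs work() `iterations` times and returns the average time per call in nanoseconds.
double average_ns_per_call(std::uint64_t iterations) {
    volatile std::uint64_t sink = 0;  // each store must happen, so the loop is kept
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = work(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return iterations ? elapsed.count() / iterations : 0.0;
}
```

The result includes the loop overhead and the volatile store, so it is most useful for comparing two variants measured the same way rather than as an absolute cost.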
As always, the numbers you get are system-dependent, since they will vary with your hardware and compiler. You can compare the times you get from different methods, such as replacing the function with a macro, to see which is generally faster. My guess, however, is that you won't notice much difference, or at the very least it will be inconsequential.
If you don't know where the slowdown is, follow J.N's advice: use a code profiler and optimise where it's needed. As a rule of thumb, always pass large objects to functions by reference or const reference to avoid copying them.
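To see why that rule of thumb matters, here is a small sketch with a hypothetical `Big` type that counts its copies: passing by value copies the whole object on every call, while passing by const reference copies nothing and still prevents the callee from modifying the caller's object.

```cpp
#include <cassert>
#include <vector>

// Hypothetical "large" object that records how often it is copied.
struct Big {
    static int copies;
    std::vector<double> data = std::vector<double>(1000, 1.0);
    Big() = default;
    Big(const Big& other) : data(other.data) { ++copies; }
};
int Big::copies = 0;

// By value: every call copies 1000 doubles.
double first_by_value(Big b) { return b.data[0]; }

// By const reference: no copy at all.
double first_by_cref(const Big& b) { return b.data[0]; }
```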
I highly doubt speed is that crucial, but my suggestion would be to use preprocessor macros.
For example:
#define max(a,b) ((a) > (b) ? (a) : (b))
This seems obvious to me, but I don't consider myself an expert in C++, so I may have misunderstood the question.
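Even with every argument parenthesized, a function-like max macro evaluates the argument it returns twice, which changes behavior when an argument has side effects. A sketch comparing a hypothetical `MAX_MACRO` with `std::max`, which is an ordinary (typically inlined) function template and therefore evaluates each argument exactly once:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical macro in the style of the answer above.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Counts how many times next() runs when the macro is used.
int calls_with_macro() {
    int calls = 0;
    auto next = [&calls] { return ++calls; };  // side effect on every call
    int m = MAX_MACRO(next(), 0);  // expands to ((next()) > (0) ? (next()) : (0))
    (void)m;
    return calls;  // the winning argument was evaluated twice
}

// Same computation with std::max: each argument is evaluated once.
int calls_with_std_max() {
    int calls = 0;
    auto next = [&calls] { return ++calls; };
    int m = std::max(next(), 0);
    (void)m;
    return calls;
}
```

With optimization enabled, `std::max` on `int` normally compiles to the same compare-and-select as the macro, so the macro buys no speed in exchange for this pitfall.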

Does separating statements with commas instead of semicolons affect my program's speed?

I was wondering if any of these samples works "faster" than the other. I know there can't be a big difference, but I just want to know if there's any difference.
CODE1:
c=a+b;
c=c*c;
d=c*a;
CODE2:
c=a+b,c=c*c,d=c*a;
So does it matter if I use , or ;?
Just asking... :D
The number of lines of a program is not indicative of its speed. To answer your question: no, there is no difference in speed between the two forms you posted. If you look at the assembly code generated by the compiler for each program, you will see it's exactly the same.
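A minimal sketch of the two forms, with the first statement written as `c = a + b` (the assignable form for built-in types). The built-in comma operator fully evaluates its left operand before its right one, so both functions compute the same result; comparing the output of `g++ -O2 -S` for them is a quick way to check that the generated code matches too. The function names are hypothetical.

```cpp
#include <cassert>

// CODE1: three separate statements.
int code1(int a, int b) {
    int c, d;
    c = a + b;
    c = c * c;
    d = c * a;
    return d;
}

// CODE2: one expression statement joined by the comma operator.
// Each operand is sequenced before the next, left to right.
int code2(int a, int b) {
    int c, d;
    c = a + b, c = c * c, d = c * a;
    return d;
}
```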
How to read the assembly output of a C program
No difference in terms of speed.
There should not be any difference. However, it depends completely on the compiler. There is no way to know for certain whether your compiler/interpreter generated different assembly based on different code that you entered until you look at the assembly generated.
In Visual Studio you can view the assembly like so:
http://msdn.microsoft.com/en-us/library/a3cwf295.aspx
In general, remember that the code you write in C++ is processed by a compiler, which decides how best to generate assembly for you. So in most cases, syntactic sugar like that will generate assembly identical to the longer version.
More importantly, you should stop worrying about the difference in speed here. If speed is a concern, always look to your algorithm first, long before tiny differences like these.
In general, the comma operator is not needed at all, and often enough it is only used to write confusing code for dubious goals. For example, more than once I have seen code like this:
if (expression)
statement1,
statement2,
statement3;
Just for the 'goal' of saving the one or two extra lines for { and }.
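A short sketch of why this style is fragile: the commas silently tie all three statements to the `if`, and a later edit that turns one comma into a semicolon changes which statements are conditional while the indentation still looks the same. The functions are hypothetical and only count how many statements run.

```cpp
#include <cassert>

// All three increments belong to the if: the commas make them one statement.
int with_commas(bool cond) {
    int n = 0;
    if (cond)
        ++n,
        ++n,
        ++n;
    return n;
}

// After one comma becomes a semicolon, only the first increment is conditional.
int after_one_edit(bool cond) {
    int n = 0;
    if (cond)
        ++n;
        ++n,  // looks conditional, but runs every time
        ++n;
    return n;
}

// Braces state the intent explicitly and survive such edits.
int with_braces(bool cond) {
    int n = 0;
    if (cond) {
        ++n;
        ++n;
        ++n;
    }
    return n;
}
```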
My advice:
a) Simply forget about the existence of the comma operator!
b) Don't even think about micro-optimizations like this; instead look for something real to optimize, maybe a loop, the number of c-tors being called, or calls to implicit conversion operators that could be eliminated. A single such optimization will do your program real good.
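As an illustration of point b), here is a sketch of a hidden constructor call that such an optimization would remove. `Name` and `is_root` are hypothetical: passing a string literal where a `const Name&` is expected silently constructs a temporary on every iteration, and hoisting the conversion out of the loop reduces that to a single construction.

```cpp
#include <cassert>
#include <string>

// Hypothetical type with an implicit converting constructor that counts its calls.
struct Name {
    static int conversions;
    std::string text;
    Name(const char* s) : text(s) { ++conversions; }  // not marked explicit
};
int Name::conversions = 0;

bool is_root(const Name& n) { return n.text == "root"; }

// Each iteration builds (and destroys) a temporary Name.
int count_matches_slow(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_root("root")) ++hits;  // hidden conversion here
    }
    return hits;
}

// Converting once, outside the loop, removes the repeated constructor calls.
int count_matches_fast(int iterations) {
    const Name root("root");
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (is_root(root)) ++hits;
    }
    return hits;
}
```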

Complicated code for obvious operations

Sometimes, mainly for optimization purposes, very simple operations are implemented as complicated and clumsy code.
One example is this integer initialization function:
void assign( int* arg )
{
__asm__ __volatile__ ( "mov %%eax, %0" : "=m" (*arg));
}
Then:
int a;
assign ( &a );
But I don't understand why it is written this way...
Have you seen any example with real reasons to do so?
In the case of your example, I think it is a result of the fallacious assumption that writing code in assembly is automatically faster.
The problem is that the person who wrote this didn't understand WHY assembly can sometimes run faster. That is, you sometimes know more than the compiler about what you are trying to do, and can use this knowledge to write lower-level code that performs better because it doesn't have to make the assumptions the compiler does.
In the case of a simple variable assignment, I seriously doubt that holds true, and the code is likely to perform slower because it has the additional overhead of managing the assign function on the stack. Mind you, it won't be noticeably slower; the main cost here is code that is less readable and maintainable.
This is a textbook example of why you shouldn't implement optimizations without understanding WHY it is an optimization.
It seems that the intent of the assembly code was to ensure that the assignment to the *arg int location is performed every time, deliberately preventing any compiler optimization in this regard.
Usually the volatile keyword is used in C++ (and C...) to tell the compiler that a value should not be kept in a register (for instance) and reused from that register (an optimization to get the value faster), because it can be changed asynchronously (by an external module, an assembly program, an interrupt, etc.).
For instance, in a function
int a = 36;
g(a);
a = 21;
f(a);
In this case the compiler knows that the variable a is local to the function and is not modified outside it (for instance, no pointer to a is passed to any call). It may use a processor register to store and use the variable a.
In conclusion, that asm instruction seems to have been injected into the C++ code to prevent some optimizations on that variable.
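If the goal really is to force the store to happen, `volatile` expresses that portably, without any assembly. A minimal sketch under that assumption; `assign_volatile` and `demo` are hypothetical names. Note that it stores an explicit value, whereas the original `mov %%eax, %0` copies whatever EAX happens to hold at that point, which the surrounding C++ code does not control.

```cpp
#include <cassert>

// A write through a volatile lvalue is an observable side effect, so the
// compiler must perform it rather than keep the value only in a register.
void assign_volatile(volatile int* arg, int value) {
    *arg = value;
}

int demo() {
    volatile int a = 36;
    assign_volatile(&a, 21);
    return a;  // the volatile read is performed, not folded to a constant
}
```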
While there are several reasonable justifications for writing something in assembly, in my experience those are rarely the actual reason. Where I've been able to study the rationale, it boils down to:
Age: The code was written so long ago that assembly was the most reasonable option with the compilers of the era. Code from before about 1990 can typically be justified this way, IMHO.
Control freak: Some programmers have trust issues with the compiler, but aren't inclined to investigate its actual behavior.
Misunderstanding: A surprisingly widespread and persistent myth is that anything written in assembly language inherently results in more efficient code than writing in a "clumsy" compiler, what with all its mysterious function entry/exit code, etc. Certainly a few compilers deserved this reputation.
To be "cool": When time and money are not factors, what better way to strut a programmer's significantly elevated hormone levels than some macho, preferably inscrutable, assembly language?
The example you give seems flawed, in that the assign() function is liable to be slower than directly assigning the variable, reason being that calling a function with arguments involves stack usage, whereas just saying int a = x is liable to compile to efficient code without needing the stack.
The only times I have benefited from using assembler were when hand-optimising the assembler output produced by the compiler, and that was in the days when processor speeds were often in the single megahertz range. Algorithmic optimisation tends to give a better return on investment, as you can gain orders of magnitude in improvement rather than small multiples. As others have already said, the only other time you go to assembler is if the compiler or language doesn't do something you need to do. With C and C++ this is very rarely the case any more.
It could well be someone showing off that they know how to write some trivial assembler code, making the next programmer's job more difficult, and possibly as a half-assed measure to protect their own job. For the example given, the code is confusing, possibly slower than native C, less portable, and should probably be removed. Certainly, if I see any inline assembler in modern C code, I'd expect copious comments explaining why it is absolutely necessary.
Let compilers optimize for you. There's no possible way this kind of "optimization" will ever help anything... ever!

How to know what optimizations are done automatically by my compiler

I was going through the link "Will it optimize" and wondered how we can know which optimizations a particular compiler performs.
Like does VC8.0 convert if-else statements to switch-case?
Is such information available on MSDN?
As everyone seems bent on telling the OP that he shouldn't worry about it, here is some useful (although not as specific as the OP requested) information about compiler optimization options.
You'll have to figure out what flags you're using, especially for MSVC and Intel (GCC release build should default to -O2), but here are the links:
GCC
MSVC
Intel
This is about as close as you'll get before disassembling your binary after compilation.
It depends on the optimization level you choose for the compiler.
You can find a very nice article about it here.
First of all, if optimization took place, your program should usually run faster. After that, you can inspect the disassembly to find out what kinds of optimizations were performed.
I don't know anything about VC8.0, so I'm not sure how you would access that information. However, if you are generally interested in the kinds of optimisations that go on and want to experiment, I recommend you use LLVM. You can look at the unoptimised, disassembled byte code generated from the default C front end, and then run various optimiser passes over it to see what the effect is each time. Because it's a nicer, abstract assembly code, it tends to be a little easier to figure out what is an optimisation derivable from the code and what is a machine-specific optimisation.
Like does VC8.0 convert if-else statements to switch-case?
Compilers do not magically rewrite your source code. And even if they did, what would that tell you? What you really want to know is whether the compiler compiled it into a jump table or into multiple compare operations. Any disassembler will tell you that.
To clarify my point: writing a switch-case statement does not necessarily imply that there will be a jump table in the binary. Not needing to worry about this is the whole point of having compilers.
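A small sketch of the kind of code the question is about: the same mapping written as a `switch` and as an `if/else` chain (the `days_*` functions are hypothetical). Whether either becomes a jump table, a lookup table, or a sequence of compares is up to the compiler and its flags; disassembling, or compiling with `g++ -O2 -S` and reading the output, is the reliable way to find out.

```cpp
#include <cassert>

// Dense case labels: a candidate for a jump table or lookup table,
// though a compiler may still choose a compare chain.
int days_switch(int month) {
    switch (month) {
        case 2: return 28;
        case 4: case 6: case 9: case 11: return 30;
        case 1: case 3: case 5: case 7: case 8: case 10: case 12: return 31;
        default: return -1;
    }
}

// The same logic as an if/else chain; an optimizer may lower it to
// code similar to the switch, or not, depending on its analysis.
int days_if(int month) {
    if (month == 2) return 28;
    if (month == 4 || month == 6 || month == 9 || month == 11) return 30;
    if (month >= 1 && month <= 12) return 31;
    return -1;
}
```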
Instead of figuring out which optimizations are done by the compiler in general, it's probably better to NOT have any dependencies on such compiler-specific knowledge.
Instead start out with a good design and algorithm, writing (as much as possible) portable code that's easy to follow. Then profile the code if it's too slow and fix the actual hotspots. Compiler optimizations are useful no doubt, but better is to apply some investigation to what's actually happening in the code. Algorithmic/design improvements at the source level will typically help performance more than the presence or absence of optimizations like transforming if/else into switch-case.
I'm not sure what "convert if/else to switch/case" means. My processor doesn't have a hardware switch/case instruction.
Typical compilers have several different ways to implement switch/case. A well-known one is using a jump table, but this is only done if appropriate.
For if/else, certainly it is normal for compilers to analyse a digraph of execution flow. I would expect a compiler to notice if each condition references the same variable, and I would expect the compiler to treat equivalent forms of conditionals the same way in general. But this isn't something I'd worry about.
IIRC, the general policy in GCC is that regressions in optimisation are tolerable so long as preferred improvements result. Optimisation is complex and what is "generally" a good optimisation isn't always that great. Plus for perfect optimisation, the compiler would have to know things it can't know (e.g. what inputs it will encounter in real life).
The point is that it really isn't worthwhile knowing that much about specific optimisations unless you happen to be a compiler developer. If you depend on something being optimised by VC8, that particular optimisation might not happen in VC9 or VC10.

Should I use a function in a situation where it would be called an extreme number of times?

I have a section of my program that contains a large amount of math with some rather long equations. It's long and unsightly, and I wish to replace it with a function. However, this chunk of code is used an extreme number of times in my code and also requires a lot of variables to be initialized.
If I'm worried about speed, is the cost of calling the function and initializing the variables negligible here, or should I stick to coding it directly each time?
Thanks,
-Faken
Most compilers are smart about inlining reasonably small functions to avoid the overhead of a function call. For functions big enough that the compiler won't inline them, the overhead for the call is probably a very small fraction of the total execution time.
Check your compiler documentation to understand its specific approach. Some older compilers required, or could benefit from, hints that a function is a candidate for inlining.
Either way, stick with functions and keep your code clean.
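A minimal sketch of that approach, with a hypothetical `distance3` standing in for the "long equation". Because the definition is visible wherever it is called, an optimizing compiler can inline it, and the temporaries `dx`, `dy`, `dz` then typically live in registers rather than costing extra initialization.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical "long equation" moved into a small, readable function.
inline double distance3(double x1, double y1, double z1,
                        double x2, double y2, double z2) {
    const double dx = x2 - x1;
    const double dy = y2 - y1;
    const double dz = z2 - z1;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// The caller stays readable even though the formula is used many times.
double path_length(const double* xs, const double* ys, const double* zs, int n) {
    double total = 0.0;
    for (int i = 1; i < n; ++i) {
        total += distance3(xs[i - 1], ys[i - 1], zs[i - 1], xs[i], ys[i], zs[i]);
    }
    return total;
}
```

If profiling later shows this loop is hot, the function boundary is also a convenient place to experiment, for example with the compiler-specific force-inline attributes mentioned earlier in this thread.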
Are you asking if you should optimize prematurely?
Code it in a maintainable manner first; if you then find that this section is a bottleneck in the overall program, worry about tuning it at that point.
You don't know where your bottlenecks are until you profile your code. Anything you can assume about your code hot spots is likely to be wrong. I remember once I wanted to optimize some computational code. I ran a profiler and it turned out that 70 % of the running time was spent zeroing arrays. Nobody would have guessed it by looking at the code.
So, first code clean, then run a profiler, then optimize the rough spots. Not earlier. If it's still slow, change algorithm.
Modern C++ compilers generally inline small functions to avoid function call overhead. As far as the cost of variable initialization, one of the benefits of inlining is that it allows the compiler to perform additional optimizations at the call site. After performing inlining, if the compiler can prove that you don't need those extra variables, the copying will likely be eliminated. (I assume we're talking about primitives, not things with copy constructors.)
The only way to answer that is to test it. Without knowing more about the proposed function, nobody can really say whether the compiler can/will inline that code or not. This may/will also depend on the compiler and compiler flags you use. Depending on the compiler, if you find that it's really a problem, you may be able to use different flags, a pragma, etc., to force it to be generated inline even if it wouldn't be otherwise.
Without knowing how big the function would be and/or how long it will take to execute, it's impossible to guess how much effect on speed it will have if it isn't generated inline.
With both of those unknown, none of us can really guess how much effect moving the code into a function will have. It might be none, small, or huge.