C++ Visual Studio inline

When building projects in Visual Studio (I'm using 2008 SP1) there is an optimization option called Enable link-time code generation. As far as I understand, this allows inlining across translation units, which sounds pretty cool.
Still, using this option dramatically increases the size of the static libraries built. In my case it went from roughly 40 MB to 250 MB, and the build process obviously becomes REALLY slow when you have even 5-6 libraries that are that huge.
So my question is: is it worth it? Is the effect of link-time code generation measurable enough that I should leave it turned on and suffer through the slow builds?
Thank you.
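
For context: that project option corresponds to compiling with /GL and linking with /LTCG, and the classic case it helps is a small function defined in one translation unit and called from another. Without link-time code generation the compiler never sees the body at the call site, so it cannot inline it. A minimal sketch of that situation (the file names and build line are illustrative, not taken from the question):

    // math_utils.cpp -- one translation unit
    int scale(int x) { return x * 3 + 1; }     // body is only visible in this .cpp

    // main.cpp -- another translation unit
    int scale(int x);                          // declaration only
    int main() {
        long long sum = 0;
        for (int i = 0; i < 100000000; ++i)
            sum += scale(i);                   // without LTCG: a real call per iteration
        return static_cast<int>(sum & 1);      // with /GL + /LTCG the call can be inlined
    }

    // Build roughly as:  cl /O2 /GL math_utils.cpp main.cpp /link /LTCG

If most of your hot functions already live in headers (and are therefore inlineable anyway), the gain tends to be smaller, which is one more reason to measure before paying for the build times.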

How are we supposed to know? You're the one suffering the slower link times.
If you can live with the slower builds, then it speeds up your code, which is good.
If you want faster builds, you lose out on optimizations, making your code run slower.
Is it worth it? That depends on you and nothing else. How patient are you? How long can you wait for a build?
It can significantly speed up your code though. If you need the speed, it is a very valuable optimization.

It's up to you; this is rather a subjective question. Here are a few things to go over to help you make that determination:
Benchmark the performance with and without this feature (a minimal timing harness is sketched below). Sometimes smaller code runs faster, sometimes more inlining wins; it's not always clear-cut.
Is performance critical? Will your client reject your application at its current speed unless you find a way to improve things on that front?
How much build slowdown is acceptable? Do you have to keep this on while you build locally, or can you push it off to the test environment / continuous build machine?
Personally, I'd go with whatever helped me develop faster and then worry about the optimizations later. Make sure it does what it needs to do first.
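
As mentioned above, the decision is much easier with a number attached: build the same harness once with link-time code generation on and once with it off, and time a representative workload. A minimal sketch follows; workload() is a placeholder for whatever your libraries actually do, and std::chrono assumes a C++11 toolchain (on VS2008 you could substitute QueryPerformanceCounter):

    #include <chrono>
    #include <cstdio>

    // Placeholder for the code path you actually care about.
    long long workload() {
        long long acc = 0;
        for (int i = 0; i < 50000000; ++i)
            acc += i % 7;
        return acc;
    }

    int main() {
        std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
        long long result = workload();
        std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
        std::printf("result=%lld  elapsed=%lld ms\n", result, ms);
        return 0;
    }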

Related

How much faster is C++ code "supposed" to be with optimizations turned on?

I have a program that runs in around 1 minute when compiled with g++ without any options.
Compiling with -O3, however, makes it run in around 1-2 seconds.
My question is whether it is normal to have this much of a speed up, or whether my code is perhaps so bad that optimization can take away that much time. Obviously I know my code isn't perfect, but because of this huge speedup I'm beginning to think it's worse than I thought. Please tell me what the "normal" amount of speedup is (if that's a thing), and whether too much speedup can mean bad code that could (and should) be easily optimized by hand instead of relying on the compiler.
How much faster is C++ code “supposed” to be with optimizations turned on?
In theory: There doesn't necessarily need to be any speed difference. Nor does there exist any upper limit to the speed difference. The C++ language simply doesn't specify a difference between optimisation and lack thereof.
In practice: It depends. Some programs have more to gain from optimisation than others. Some behaviours are easier to prove than others. Some optimisations can even make the program slower, because the compiler cannot know about everything that may happen at runtime.
... 1 minute ... [optimisation] makes it run in around 1-2 seconds.
My question is whether it is normal to have this much of a speed up?
It is entirely normal. You cannot assume that you'll always get as much improvement, but this is not out of the ordinary.
Or is my code perhaps so bad, that optimization can take away that much time.
If the program is fast with optimisation, then it is a fast program. If the program is slow without optimisation, we don't care because we can enable optimisation. Usually, only the optimised speed is relevant.
Faster is better than slower, although that is not the only important metric of a program. Readability, maintainability and especially correctness are more important.
Please tell me ... whether ... code ... could ... be ... optimized by hand instead of relying on the compiler.
Everything could be optimized by hand, at least if you write the program in assembly.
... or should ...
No. There is no reason to waste time doing what the compiler has already done for you.
There are sometimes reasons to optimise by hand something that is already well optimised by the compiler. Relative speedup is not one of those reasons. An example of a valid reason is that the non-optimised build may be too slow to be executed for debugging purposes when there are real time requirements (whether hard or soft) involved.
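
To make the "it depends" concrete: abstraction-heavy code is usually where the gap is largest, because at -O0 every small function, iterator operation and operator overload remains a genuine call, while at -O3 the same code typically collapses into a tight, often vectorized loop. A toy illustration (not taken from the question's program):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> v(10000000, 1.5);
        double total = 0.0;
        // At -O0 each iterator increment, dereference and comparison is a real call;
        // at -O3 this usually becomes a simple vectorized summation loop.
        for (std::vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
            total += *it;
        std::printf("%f\n", total);
        return 0;
    }

    // Compare for yourself:  g++ -O0 sum.cpp && time ./a.out
    //                        g++ -O3 sum.cpp && time ./a.out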

What is the performance analyzer in vs2012 doing differently?

I decided to try out the performance analyzer in VS 2012. To my surprise, the test code (way too big to post) runs about 15% faster when being analyzed than in the default Release configuration, over a run of ~1 minute. What could be the reason for this? Is it using different compiler flags or something?
To elaborate a bit more on the code: it's a specialized spatial sorting algorithm (most similar to counting sort) that operates on relatively simple POD classes and is looped 10k times; IO time is excluded from the measurements.
Well, I think I finally got a clue. When I run the program from the IDE, the attached debugger (the thing that makes breakpoints work) is slowing it down, and you can't break into code under the analyzer. To confirm, I ran the program from the .exe directly, and sure enough it was even faster than in the analyzer (not by much, but still), probably because it didn't have the sampler poking at it. Mystery solved!
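
On Windows there is also a cheap way to make this kind of mistake visible in the program itself: the Win32 call IsDebuggerPresent() reports whether a debugger is attached, so a benchmark can warn when its numbers are being taken under the IDE's debugger (which is what slows things down, even in a Release build). A small sketch, with the actual benchmark left out:

    #include <windows.h>
    #include <cstdio>

    int main() {
        if (IsDebuggerPresent()) {
            // Timings taken now include debugger overhead; run the .exe directly,
            // or use Start Without Debugging (Ctrl+F5), to get realistic numbers.
            std::printf("warning: running under a debugger, timings will be pessimistic\n");
        }
        // ... run and time the spatial sort here ...
        return 0;
    }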

Profile optimised C++/C code

I have some heavily templated C++ code that I am working with. I can compile and profile it with the AMD tools and Sleepy in debug mode. However, without optimisation most of the time is spent in the templated code and the STL. With optimised compilation, all the profiling tools that I know of produce garbage information. Does anybody know a good way to profile optimised native code?
PS1:
The code that I am writing is also heavily templated. Most of the time spent in the unoptimised code will be optimised away; I am talking about 96-97% of the run time being spent in templated code without optimisation. This is going to corrupt the accuracy of the profiling. And yes, I can change a lot of the templated code, or at least work out which part of it is introducing the most trouble, and do better in those places.
You should focus on the code you wrote, because that is what you can change. Time spent in the STL is irrelevant; just ignore it and focus on the callers of that code. If too much time is spent in the STL, you can probably call some other STL primitive instead of the current one.
Profiling unoptimised code is less interesting, but you can still get some information. If the algorithms used in some parts of the code are totally flawed, that will show up even there. But you should be able to get useful information from any good profiling tool on optimised code. What tools do you use exactly, and why do you call their output garbage?
Also, it's usually easy enough to instrument your code by hand and find out exactly which parts are efficient and which are not. It's just a matter of calling timer functions (or reading the processor's cycle counter, if possible) at well-chosen points. I usually do that from unit tests to get reproducible results, but it all depends on the specifics of your program.
Tools and hand instrumentation are the easy part of optimization. The hard part is finding ways to make the code faster where it's needed.
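
A minimal sketch of that hand-instrumentation idea: a scope-based timer dropped around the regions you suspect. It works the same in optimised and unoptimised builds; if you also want sampling profilers to make sense of the optimised build, compile with symbols (e.g. -O2 -g, or /O2 /Zi on MSVC). The function and label names below are made up for illustration:

    #include <chrono>
    #include <cstdio>

    // Prints the wall-clock time spent in the enclosing scope when it ends.
    struct ScopedTimer {
        const char* label;
        std::chrono::steady_clock::time_point start;
        explicit ScopedTimer(const char* l)
            : label(l), start(std::chrono::steady_clock::now()) {}
        ~ScopedTimer() {
            long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                               std::chrono::steady_clock::now() - start).count();
            std::printf("%s: %lld us\n", label, us);
        }
    };

    void hot_templated_part() {                 // stand-in for the real templated code
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; ++i) x = x + 1.0;
    }

    int main() {
        ScopedTimer t("hot_templated_part");
        hot_templated_part();
        return 0;
    }
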
What do you mean by "garbage information"?
Profiling is only really meaningful on optimized builds, so tools are designed to work with them -- thus if you're getting meaningless results, it's probably due to the profiler not finding the right symbols, or needing to instrument the build.
In the case of Intel VTune, for example, I found I got impossible results from the sampler unless I explicitly told it where to find the PDBs for the executable I was tuning. In the instrumented version, I had to fiddle with the settings until it was reliably putting probes into the function calls.
When kriss says
You should focus on the code you wrote, because that is what you can change
that's exactly what I was going to say.
I would add that, in my opinion, it is easier to do performance tuning first on code compiled without optimization, and then turn the optimizer on later, for the same reason. If something you can fix is costing excess time, it will cost proportionally excess time regardless of what the compiler does, and it's easier to find in code that hasn't been scrambled.
I don't look for such code by measuring time. If the excess time is, say, 20%, then what I do is randomly pause the program several times. As soon as I see something that could obviously be improved in 2 or more samples, I've found it. It's an oddball method, but it doesn't really miss anything. I do measure the overall time before and after to see how much I saved, and this can be repeated until you can't find anything left to fix. (BTW, if you're on Linux, Zoom is a more automated way to do this.)
Then you can turn on the optimizer and see how much it gives you, but when you see what changes you made, you can see there's really no way the compiler could have done it for you.

Best C++ compiler and options for windows build, regarding application speed?

I am making a game for Windows, Mac and GNU/Linux; I can already build it on Windows with MSVC and MinGW...
But I am not finding good information on how much the various compilers actually optimize.
So which compiler, and which options on that compiler, should I use to make my Windows version blazing fast?
Currently the profilers are showing some worrying results, like a good portion of CPU time being wasted on simple floating-point math and on the Lua garbage collector.
EDIT: I am doing other stuff too... I am asking this question specifically about compilers, because a question is supposed to be about one thing, not several :)
Also, any minor speed improvement is welcome: with VSync turned on, dropping even 1 frame per second below 60 FPS is enough to make the game run at 30 FPS to maintain sync.
First of all, don't expect compiler optimizations to make a huge difference. You can rarely expect more than a 15% or perhaps 20% difference between compilers (as long as you don't compare one with all optimizations turned on to another with optimization completely disabled).
That said, the best (especially for floating-point math) tends to be Intel's. It's pretty much the standard that, at best, others attempt to match (and usually, truth be told, that attempt fails). The exact options that give optimal performance vary - if there were one set that was consistently best, the compilers probably wouldn't offer the other possibilities.
I'd emphasize, however, that to get a really substantial difference, you're probably going to need to rewrite some code, not just recompile.
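
If floating-point math really is where the time goes, the settings that matter most are the optimisation level, the floating-point model and the target instruction set. The spellings below are the usual MSVC and GCC ones, but treat them as a starting point to benchmark rather than a recipe, and note that /fp:fast and -ffast-math relax strict IEEE semantics:

    // A floating-point-heavy kernel to compare under different flag sets.
    #include <cstdio>

    double mix(const float* a, const float* b, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += a[i] * b[i] + a[i] / (b[i] + 1.0f);
        return acc;
    }

    int main() {
        static float a[1 << 20], b[1 << 20];
        for (int i = 0; i < (1 << 20); ++i) { a[i] = i * 0.5f; b[i] = i * 0.25f; }
        std::printf("%f\n", mix(a, b, 1 << 20));
        return 0;
    }

    // Typical candidates to measure against each other:
    //   MSVC:  cl /O2 /fp:fast /arch:AVX2 /GL game.cpp /link /LTCG
    //   GCC:   g++ -O3 -ffast-math -march=native game.cpp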

profile-guided optimization (C)

Does anyone know this compiler feature? It seems GCC supports it. How does it work? What is the potential gain? In which cases is it good? Inner loops?
(this question is specific, not about optimization in general, thanks)
It works by inserting extra code that counts the number of times each code path is taken. When you compile a second time, the compiler uses the knowledge about your program's actual execution that it could previously only guess at. There are a couple of things PGO can work toward:
Deciding which functions should be inlined or not, depending on how often they are called.
Deciding how to place hints about which branch of an "if" statement should be predicted, based on the percentage of executions going one way or the other.
Deciding how to optimize loops based on how many iterations are typically taken each time the loop is run.
You never really know how much these things can help until you test it.
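
With GCC the workflow is a two-pass build: compile with -fprofile-generate, run the instrumented binary on a representative workload (it writes *.gcda profile files), then rebuild with -fprofile-use. Here is a sketch of the kind of branch the profile teaches the compiler about; the file name and training input are placeholders:

    #include <cstdio>
    #include <cstdlib>

    // A branch whose real-world bias the compiler can only guess at --
    // exactly the kind of thing profile data pins down.
    int classify(int x) {
        if (x < 0)            // in the training run this is (almost) never taken
            return -1;
        return x % 16;
    }

    int main(int argc, char** argv) {
        int n = argc > 1 ? std::atoi(argv[1]) : 10000000;
        long long acc = 0;
        for (int i = 0; i < n; ++i)
            acc += classify(i);
        std::printf("%lld\n", acc);
        return 0;
    }

    // Pass 1:  g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo
    //          ./pgo_demo            (representative run; writes *.gcda)
    // Pass 2:  g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo
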
PGO gives about a 5% speed boost when compiling x264, the project I work on, and we have a built-in system for it (make fprofiled). It's a nice free speed boost in some cases, and probably helps more in applications that, unlike x264, are less made up of hand-written assembly.
Jason's advice is right on. The best speedups you are going to get come from "discovering" that you let an O(n²) algorithm slip into an inner loop somewhere, or that you can cache certain computations outside of expensive functions.
Compared to the micro-optimizations that PGO can trigger, these are the big winners. Once you've done that level of optimization, PGO might be able to help. We never had much luck with it though - the cost of the instrumentation was such that our application became unusably slow (by several orders of magnitude).
I like using Intel VTune as a profiler primarily because it is non-invasive compared to instrumenting profilers which change behaviour too much.
The fun thing about optimization is that speed gains are found in the unlikeliest of places.
It's also the reason you need a profiler, rather than guessing where the speed problems are.
I recommend starting with a profiler (gprof if you're using GCC) and just poking around the results of running your application through some normal operations.
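
For completeness, the gprof cycle with GCC looks roughly like this; the source and report names are arbitrary:

    // Build with profiling hooks, run the program normally, then ask gprof for a report:
    //   g++ -O2 -pg app.cpp -o app
    //   ./app                          (writes gmon.out in the current directory)
    //   gprof ./app gmon.out > report.txt
    // The flat profile and call graph in report.txt show where the time went.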