Whole program optimization failing in VC2008 - c++

I have a reasonably large C++ program (~11 MB exe) compiled under VS2008 and was interested to see whether whole program optimization would significantly affect its performance. However, turning on whole program optimization and link-time code generation causes the link to fail as follows:
1>c:\cpp\Win32\Atlas\tin\TINDoc.Cpp : fatal error C1083: Cannot open compiler intermediate file: '.\releaseopt\TINDoc.obj': Not enough space
1>LINK : fatal error LNK1257: code generation failed
Looking at Task Manager, I can see the linker using more and more memory until it runs out and bombs out. The compiler is running on 32-bit XP with 2 GB of RAM and a 2 GB page file. Is WPO limited to smaller applications and/or bigger environments, or is there any way to get the linker to be a bit more frugal with memory?
N.B. I have already turned off precompiled headers, which were causing the compilation to fail before linking, and turned off output of debug info and anything else that might take extra resources. The help for C1083 suggests missing header files or inadequate file handles rather than lack of space.
Edit: Got it working under VS2010, albeit without precompiled headers, but the performance gains aren't that significant. I'll leave this option alone until I move onto a beefier 64bit platform with a more robust version of VS2010.

VC2008 is a fragile beast. The optimiser just doesn't work in some cases, and it looks like you may have hit one of them.
Examples of "not working" include:
De-optimised code (slow!)
Compiler or (more frequently) linker crashes
Obscure compile/link error messages
Incorrect code execution (this one is rare, but not unknown).
Actually, if you look at some of the tricks used by modern optimisers it is pretty amazing that it works as often as it does. The complexity is quite astounding.
Addressing Shane's problem specifically, some things that might cause your error:
Running out of file handles used to be a big problem in DOS and early Windows versions. You hardly ever see it in modern Windows versions. I'm not even certain there is still a limit on file handles.
A compiler bug could be doing an infinite loop, meaning that no amount of resources will be enough.
A recursive include file could also cause something similar. Do all your include files have "#pragma once" or "#if !defined(FOO_INCLUDED)"?
Is it possible you've included the file TINDoc.obj twice in the project? The 2008 compiler is aggressively multi-threaded and there might be contention between two threads trying to access the file. Actually this could happen via a compiler bug, even if you haven't included the file twice.
If the program really is just too big, consider breaking it up into one or more DLLs and building it piecemeal, or splitting the contentious source file into multiple files to compile in multiple phases.
Don't assume that "not enough space" has to be the real reason just because the message says so. Some programs (including some compilers) will guess a reason for an error, instead of checking.
You can monitor the use of memory, handles, etc. using task manager or perfmon, or (my preference) Process Explorer
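As a minimal sketch of the include-guard point above: the header name tin_doc.h, the macro TINDOC_H_INCLUDED and the TinDoc struct are hypothetical, not taken from the asker's project. The header body is pasted twice into one file to simulate a double include; the guard makes the second copy expand to nothing.

```cpp
#include <cassert>

// ---- first inclusion of the hypothetical header tin_doc.h ----
#if !defined(TINDOC_H_INCLUDED)
#define TINDOC_H_INCLUDED
struct TinDoc {
    int point_count;
};
#endif

// ---- second inclusion: the guard macro is already defined, so the
// ---- preprocessor skips the body and TinDoc is not redefined
#if !defined(TINDOC_H_INCLUDED)
#define TINDOC_H_INCLUDED
struct TinDoc {
    int point_count;
};
#endif

// Uses the single surviving definition of TinDoc.
inline int point_count_of(const TinDoc& doc) {
    assert(doc.point_count >= 0);
    return doc.point_count;
}
```

Putting #pragma once at the top of a real header has the same effect on MSVC, GCC and Clang, although it is not part of the C++ standard.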
I already posted the first part of this as a comment (above), but I am making it an answer, as suggested by Ben (http://stackoverflow.com/users/587803/ben), and have expanded it somewhat.

You can try booting XP with the /3GB switch to give the linker a 3 GB address space. In Vista/7, run BCDEDIT /Set IncreaseUserVa 3072 from the command line to get the same effect.
This has fixed problems creating .ilk-files for me.

Related

How to reduce release build time in visual studio unmanaged code?

I have a console application written in C/C++. It usually takes 5-10 minutes to compile on non-Windows platforms, even with the optimization flag set to -O3. On Windows, however, it takes approximately 1-2 hours to compile when optimization is set to Full Optimization (/Ox) and Inline Function Expansion is set to Any Suitable (/Ob2) in Visual Studio. This happens in both release and debug mode.
I understand that the compiler is trying to optimize the code and is therefore bound to take more time, but isn't this far too long compared to the time taken by other compilers (mainly g++) on non-Windows platforms?
So far I have tried the following:
Removed unnecessary headers from source and header files and introduced forward declarations wherever possible, but no respite.
Analyzed all the header files. Templates are used in only 2-3 of the ~50 header files in the project, and these headers are not widely included in the source files.
I have two observations from this behaviour:
There is nothing terribly wrong with the source code; otherwise compilers on non-Windows platforms would not be able to finish so quickly.
The VS compiler genuinely seems to take far longer (1-2 hours) for what other compilers do in about 10 minutes, but the VS compiler can't be that bad. Therefore, I must be missing some configuration change (apart from optimization).
Does anyone have an idea how to find out what is going wrong here? Maybe a starting point would be to identify the compilation time taken by each file. How do I find the compilation time of each file?
Is there anything else I could still improve or try?
Here are additional details about hardware, source code etc as requested in some of comments
RAM - 8.0 GB RAM
OS - Windows 7 64 bit
Processor - Intel Core i5 2.6 GHz
Visual Studio - 2013 Ultimate
Note - If I disable optimization (set /Od and /Ob0 flags in VS) then program compiles in less than 5 minutes on the same machine.
Source files - approx. 55 header files and 55 source files, about 80 KLOC of code.
Does anyone have an idea where to start to find out what is going wrong here?
Not without more details.
Is there anything else I could still improve or try?
Yes. In particular, consider:
removing any templated code from header files (both included and defined in the .h file), and accessing that code through pimpl (because templates are re-parsed and re-instantiated in every translation unit that includes them).
optimizing your usage of precompiled headers
splitting your console application into separate modules (so your build system will only update the dirty binaries on build)
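A minimal sketch of the pimpl suggestion from the list above, assuming a hypothetical class Report (none of these names come from the question). In a real project the first part would live in report.h and the second in report.cpp, so clients of the header never see the implementation details; both parts are shown in one file here so the example is self-contained.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// ---- what would live in the hypothetical report.h ----
class Report {
public:
    Report();
    ~Report();                      // defined where Impl is a complete type
    void add(int value);
    int total() const;
private:
    struct Impl;                    // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- what would live in report.cpp: the only place that sees the details ----
struct Report::Impl {
    std::vector<int> values;
};

Report::Report() : impl_(new Impl) {}
Report::~Report() = default;

void Report::add(int value) {
    assert(impl_ != nullptr);
    impl_->values.push_back(value);
}

int Report::total() const {
    int sum = 0;
    for (int v : impl_->values) sum += v;
    return sum;
}
```

With this split, changing Impl (or the headers it needs) only forces report.cpp to be recompiled, not every file that includes report.h. The cost is one extra heap allocation and an indirection per call.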
Based on suggestions received in the comments, I started by finding out the compilation time taken by each file:
Cleaned the project to ensure all *.obj files were removed.
Built the project again.
Noted the time stamp of each file and found a file that was taking almost two hours to compile.
When I opened the source file, I saw something terribly wrong in the code, contrary to my earlier observation. It is a huge monster file of 27 KLOC (Oops! Of course I didn't write this file).
There are 739 instances of a class created dynamically and assigned to an array. Each instance in turn dynamically creates some of its members as well. In short, thousands of objects are being created in this file.
To confirm that this file was the culprit and that VS was taking far too long optimizing it, I disabled optimization for this file, as proposed by @Predelnik in the comments. Voilà! The program now compiles within a couple of minutes. This source code needs some serious refactoring.
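One way to switch optimization off for just part of a file, sketched here as an alternative to changing the per-file project setting (the answer does not say which method was used), is MSVC's optimize pragma. The function build_everything is a hypothetical stand-in for the slow-to-optimize code; the pragma is guarded so other compilers simply ignore it.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#pragma optimize("", off)   // MSVC only: disable optimizations for functions defined below
#endif

// Hypothetical stand-in for the huge function that creates all the objects.
int build_everything(int count) {
    assert(count >= 0);
    int created = 0;
    for (int i = 0; i < count; ++i) {
        ++created;
    }
    return created;
}

#if defined(_MSC_VER)
#pragma optimize("", on)    // restore the optimization settings given on the command line
#endif
```

The rest of the file, and every other file in the project, keeps the normal /Ox settings, so only the code between the two pragmas runs unoptimized.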
If someone is facing such problems, I would proceed as follows:
Enable the Build and Run option and the /MP flag, as discussed here. If there is a problem with the code itself, parallel project and file compilation will not help.
Find out whether any single source file is the culprit, as above. I believe the link I found here gives a way to measure the build time, not the compile time of each file.
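The time-stamp trick described above can be sketched as follows. This is only an illustration under two assumptions: the files were compiled one after another (no /MP), and each .obj is written when its source file finishes. ObjStamp and slowest_by_gap are hypothetical names, and the time stamps are plain integers in seconds rather than values read from disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One object file: its name and last-write time in seconds (any epoch).
struct ObjStamp {
    std::string name;
    long long written_at;
};

// Returns the object file that followed the longest gap, i.e. the likely
// slowest translation unit, assuming the files were compiled serially.
ObjStamp slowest_by_gap(std::vector<ObjStamp> stamps, long long build_start) {
    assert(!stamps.empty());
    std::sort(stamps.begin(), stamps.end(),
              [](const ObjStamp& a, const ObjStamp& b) {
                  return a.written_at < b.written_at;
              });
    ObjStamp worst = stamps.front();
    long long worst_gap = stamps.front().written_at - build_start;
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        const long long gap = stamps[i].written_at - stamps[i - 1].written_at;
        if (gap > worst_gap) {
            worst_gap = gap;
            worst = stamps[i];
        }
    }
    return worst;
}
```

If the compiler batches several files into one invocation, their object files may all be written at about the same time, so the gap then points at the batch rather than at a single file.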

VS2012 heap space issue when compiling a C++ program that runs a Simulink model

I have compiled a (pretty big) Simulink model to a dll file (using an ert_shrdlib target) and created a simple c++ snippet (in Visual Studio 2012 Express) that loads the library and steps through the model. When I try to compile the solution into an executable, I get the following compiler error:
error C1060: compiler is out of heap space c:\matlabr2011b_x86\simulink\include\simstruc.h
I've tried to search SO as well as Google for ways to deal with this, but have yet to find anything that works. I tried setting /Zm to high (2000) and low (256) values, I've tried /Heap with different values, I've tried turning off parallel compilation, and I've tried using an x64 solution platform.
Since the model is something the company has worked on for a long time (and by many people), I don't think I'll be able to do much about that at this point, so I'm hoping there's a way to deal with this in Visual Studio.
Edit:
Yes, in my OP I had not set up the 64-bit compiler correctly, but now I have and I still get the same error.
The simstruc.h header from Simulink that is referred to in the error message includes some really big structs (~400 elements) and I guess they are the root of the issue. I've tried to set the heap to ridiculous values (like 20000000000) but it still isn't enough. Seriously though, this workstation has 64GB RAM and should be able to compile this bloody header, surely?!
Found the problem.
It wasn't a problem with the heap, it was a couple of lines of code in the included header (from Mathworks) that my compiler couldn't handle. Once I found them and commented them out it compiled.
Your compiler may have exceeded the address space limit for 32-bit applications. At first glance the compiler doesn't seem to have a 64-bit version, but one can be found in the vc/bin/amd64 directory. You may have to set PATH accordingly, or just invoke cl.exe manually with its full path.
Also try another compiler, such as ICC, GCC or Clang. Possibly a different OS too.

Why does C++ linking use virtually no CPU?

On a native C++ project, linking right now can take a minute or two. Yet, during this time CPU drops from 100% during compilation to virtually zero. Does this mean linking is primarily a disk activity?
If so, is this the main area an SSD would make big changes? But, why aren't all my OBJ files (or as many as possible) kept in RAM after compilation to avoid this? With 4 GB of RAM I should be able to save a lot of disk access and make it CPU-bound again, no?
Update: so the obvious follow-up is, can the VC++ compiler and linker talk together better to streamline things and keep OBJ files in memory, similar to how Delphi does it?
Linking is indeed primarily a disk-based activity. Borland Pascal (back in the day) would keep the entire program in memory, which is why it would link so fast.
Your OBJ files aren't kept in RAM because the compiler and linker are separate programs. If your development environment had an integrated compiler and linker (instead of running them as separate processes), it could indeed keep everything in RAM.
But you would lose the ability to separate the development environment from the compilers and/or linkers - you would have to use the same compiler/linker, and you wouldn't be able to run the compiler outside the environment.
You can try installing one of the RAM disk utilities and keeping your obj directory, or even the whole project directory, on the RAM disk. That should speed it up considerably.
Don't forget to make it permanent afterwards :-D
The Visual Studio linker is largely I/O bound, but how much so depends on a few variables.
Incremental linking (common in Debug builds) generally requires a lot less I/O.
Writing a PDB file (for symbols) can consume a lot of the time. It's a specific bottleneck that Microsoft targeted in VS 2010. The PDB writing is now done asynchronously. I haven't tried it, but I've heard it can help link times quite a bit.
If you are using link-time code generation (LTCG) (common in Release builds), you have all the usual I/O initially. Then, the linker re-invokes the compiler to re-generate code for sections that can be further optimized. This portion is generally much more CPU-intensive. Offhand, I don't know if the linker actually spins up the compiler in a separate process and waits (in which case you'll still see low CPU usage for the linker process), or if the compilation is done in the linker process (in which case you'll see the linker go through phases of heavy I/O then heavy CPU).
Using an SSD can help with the I/O bound portions. Simply having a second drive can help, too. For example, if your source and objects are all on one drive, and you write your PDB to a separate drive, the linker should spend less time waiting for the PDB writer. Having a second spinning drive has helped my current team's link times dramatically.
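The "low CPU means waiting on I/O" reasoning in this answer can be illustrated by comparing CPU time with wall-clock time. This is a rough sketch that measures the calling process, not link.exe; cpu_share and simulated_io_wait are hypothetical names, and the sleep merely stands in for a disk wait. One caveat: std::clock reports CPU time on POSIX systems, but Microsoft's C runtime documents it as returning elapsed wall-clock time, so on Windows a different CPU-time source would be needed.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Fraction of elapsed wall-clock time that the process spent on the CPU
// while running `work`. Near 1.0 suggests CPU-bound, near 0.0 suggests waiting.
template <typename Work>
double cpu_share(Work work) {
    const std::clock_t cpu_begin = std::clock();
    const auto wall_begin = std::chrono::steady_clock::now();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu_s = double(cpu_end - cpu_begin) / CLOCKS_PER_SEC;
    const double wall_s =
        std::chrono::duration<double>(wall_end - wall_begin).count();
    assert(wall_s > 0.0);
    return cpu_s / wall_s;
}

// Stand-in for waiting on a disk: the thread sleeps and uses almost no CPU.
inline void simulated_io_wait() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}
```

Calling cpu_share(simulated_io_wait) should give a value close to zero, which is the same pattern Task Manager shows for a linker that spends its time waiting on reads and writes.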
In debug builds in Visual Studio you can use incremental linking, which usually lets you avoid a lot of the time spent on linking. Basically, instead of linking the whole EXE (or DLL) file from scratch, the linker builds upon the one you last linked, replacing only the things that changed.
This is, however, not recommended for release builds, since it adds some overhead at run time and can result in an EXE file that is several times larger than usual.
It's hard to say what exactly is taking the linker so long without knowing how it is interacting with the OS. Thankfully, Microsoft provides Process Monitor so you can do just that.
It's helped me diagnose bugs with the Visual Studio IDE and debugger without access to source.

VS 2008 C++ build output?

Why when I watch the build output from a VC++ project in VS do I see:
1>Compiling...
1>a.cpp
1>b.cpp
1>c.cpp
1>d.cpp
1>e.cpp
[etc...]
1>Generating code...
1>x.cpp
1>y.cpp
[etc...]
The output looks as though several compilation units are being handled before any code is generated. Is this really going on? I'm trying to improve build times, and by using pre-compiled headers, I've gotten great speedups for each ".cpp" file, but there is a relatively long pause during the "Generating Code..." message. I do not have "Whole Program Optimization" nor "Link Time Code Generation" turned on. If this is the case, then why? Why doesn't VC++ compile each ".cpp" individually (which would include the code generation phase)? If this isn't just an illusion of the output, is there cross-compilation-unit optimization potentially going on here? There don't appear to be any compiler options to control that behavior (I know about WPO and LTCG, as mentioned above).
EDIT:
The build log just shows the ".obj" files in the output directory, one per line. There is no indication of "Compiling..." vs. "Generating code..." steps.
EDIT:
I have confirmed that this behavior has nothing to do with the "maximum number of parallel project builds" setting in Tools -> Options -> Projects and Solutions -> Build and Run. Nor is it related to the MSBuild project build output verbosity setting. Indeed if I cancel the build before the "Generating code..." step, none of the ".obj" files will exist for the most recent set of "compiled" files. This implies that the compiler truly is handling multiple translation units together. Why is this?
Compiler architecture
The compiler does not generate code from the source directly. It first compiles the source into an intermediate form (see compiler front-end) and then generates the code from the intermediate form, including any optimizations (see compiler back-end).
Visual Studio compiler process spawning
In a Visual Studio build, the compiler process (cl.exe) is executed once to compile multiple source files that share the same command-line options. The compiler first performs "compilation" sequentially for each file (this is most likely the front-end), but "Generating code" (probably the back-end) is done together for all files once compilation is done with them.
You can confirm this by watching cl.exe with Process Explorer.
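The two-phase behaviour described above can be mimicked with a toy model. This is purely illustrative and not how cl.exe is implemented; BatchResult and compile_batch are hypothetical names. It only reproduces the order of the messages and the point at which object files come into existence.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of one simulated compiler invocation.
struct BatchResult {
    std::vector<std::string> log;      // lines in the order the build output shows them
    std::vector<std::string> objects;  // object files produced by the batch
};

// Toy model only: front end per file, then one back-end pass for the whole batch.
BatchResult compile_batch(const std::vector<std::string>& sources) {
    assert(!sources.empty());
    BatchResult result;
    std::vector<std::string> intermediate;  // stand-in for the parsed form of each file

    result.log.push_back("Compiling...");
    for (const std::string& file : sources) {
        result.log.push_back(file);         // front end: one file at a time
        intermediate.push_back(file);
    }

    result.log.push_back("Generating code...");
    for (const std::string& file : intermediate) {
        // back end: object files appear only now, for all files in the batch
        result.objects.push_back(file.substr(0, file.rfind('.')) + ".obj");
    }
    return result;
}
```

In this model, stopping before the "Generating code..." line leaves the objects list empty, which matches the observation in the question that cancelling the build at that point leaves no .obj files behind.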
Why code generation for multiple files at once
My guess is that code generation is done for multiple files at once to make the build faster, since some work, such as instantiating templates, needs to be done only once for multiple sources: there is no point in instantiating them multiple times, as all instances but one would be discarded anyway.
Whole program optimization
In theory it would be possible to perform some cross-compilation-unit optimization at this point as well, but it is not done: no such optimizations are ever performed unless enabled with /LTCG, and with LTCG the whole code generation is done for the whole program at once (hence the name Whole Program Optimization).
Note: it seems as if WPO is done by the linker, as it produces the exe from obj files, but this is a kind of illusion: the obj files are not real object files, they contain the intermediate representation, and the "linker" is not a real linker, as it is not only linking existing code but generating and optimizing the code as well.
It is neither parallelization nor code optimization.
The long "Generating Code..." phase for multiple source files goes back to VC6. It occurs independent of optimizations settings or available CPUs, even in debug builds with optimizations disabled.
I haven't analyzed it in detail, but my observations are that it occurs when switching between units with different compile options, or when a certain amount of code has passed through the "file-by-file" part. It's also the stage where most compiler crashes occurred in VC6.
Speculation: I've always assumed that it's the "hard part" that is improved by processing multiple items at once, maybe just the code and data loaded in cache. Another possibility is that the single step phase eats memory like crazy and "Generating code" releases that.
To improve build performance:
Buy the best machine you can afford
It is the fastest, cheapest improvement you can make. (unless you already have one).
Move to Windows 7 x64, buy loads of RAM, and an i7 860 or similar. (Moving from a Core2 dual core gave me a factor of 6..8, building on all CPUs.)
(Don't go cheap on the disks, either.)
Split into separate projects for parallel builds
This is where 8 CPUs (even if 4 physical + HT) with loads of RAM come into play. You can enable per-project parallelization with the /MP option, but this is incompatible with many other features.
At one time compilation meant parsing the source and generating code. Now, though, compilation means parsing the source and building up a symbolic database representing the code. The database can then be transformed to resolve references between symbols. Later on, the database is used as the source to generate code.
You haven't got optimizations switched on. That will stop the build process from optimizing the generated code (or at least hint that optimizations shouldn't be done... I wouldn't like to guarantee no optimizations are performed). However, the build process is still optimized. So, multiple .cpp files are being batched together to do this.
I'm not sure how the decision is made as to how many .cpp files get batched together. Maybe the compiler starts processing files until it decides the memory size of the database is large enough such that if it grows any more the system will have to start doing excessive paging of data in and out to disk and the performance gains of batching any more .cpp files would be negated.
Anyway, I don't work for the VC compiler team, so can't answer conclusively, but I always assumed it was doing it for this reason.
There's a new write-up on the Visual C++ Blog that details some undocumented switches that can be used to time/profile various stages of the build process (I'm not sure how much, if any, of the write-up applies to versions of MSVC prior to VS2010). Interesting stuff which should provide at least a little insight into what's going on behind the scenes:
http://blogs.msdn.com/vcblog/archive/2010/04/01/vc-tip-get-detailed-build-throughput-diagnostics-using-msbuild-compiler-and-linker.aspx
If nothing else, it lets you know what processes, dlls, and at least some of the phases of translation/processing correspond to which messages you see in normal build output.
It parallelizes the build (or at least the compile) if you have a multicore CPU
edit: I am pretty sure it parallelizes in the same way as "make -j": it compiles multiple cpp files at the same time (since cpp files are generally independent) but obviously links them once.
On my core-2 machine it is showing 2 devenv jobs while compiling a single project.

Partial builds versus full builds in Visual C++

For most of my development work with Visual C++, I am using partial builds, e.g. press F7 and only changed C++ files and their dependencies get rebuilt, followed by an incremental link. Before passing a version onto testing, I take the precaution of doing a full rebuild, which takes about 45 minutes on my current project. I have seen many posts and articles advocating this action, but wonder is this necessary, and if so, why? Does it affect the delivered EXE or the associated PDB (which we also use in testing)? Would the software function any different from a testing perspective?
For release builds, I'm using VS2005, incremental compilation and linking, precompiled headers.
The partial build system works by checking file dates of source files against the build results. So it can break if you e.g. restore an earlier file from source control. The earlier file would have a modified date earlier than the build product, so the product wouldn't be rebuilt. To protect against these errors, you should do a complete build if it is a final build. While you are developing though, incremental builds are of course much more efficient.
Edit: And of course, doing a full rebuild also shields you from possible bugs in the incremental build system.
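The date check this answer describes, and the way it goes wrong, can be sketched like this. BuildInput and needs_rebuild are hypothetical names, and file times are plain integers (larger means newer) instead of real file-system time stamps; the actual rules used by Visual Studio are more involved.

```cpp
#include <cassert>

// File times as plain integers (e.g. seconds since some epoch); larger means newer.
struct BuildInput {
    long long source_modified;
    long long product_built;
};

// The rule a date-based incremental build applies: rebuild only if the
// source is newer than the build product.
inline bool needs_rebuild(const BuildInput& in) {
    assert(in.source_modified >= 0 && in.product_built >= 0);
    return in.source_modified > in.product_built;
}
```

A source edited at time 200 with a product built at time 100 is rebuilt as expected. A source restored from source control with its old date of 50 is not, even though its contents differ from what the product was built from, and that is exactly the case a full rebuild protects against.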
The basic problem is that compilation is dependent on the environment (command-line flags, libraries available, and probably some Black Magic), and so two compilations will only have the same result if they are performed in the same conditions. For testing and deployment, you want to make sure that the environments are as controlled as possible and you aren't getting wacky behaviours due to odd code. A good example is if you update a system library, then recompile half the files - half are still trying to use the old code, half are not. In a perfect world, this would either error out right away or not cause any problems, but sadly, sometimes neither of those happen. As a result, doing a complete recompilation avoids a lot of problems associated with a staggered build process.
Hasn't everyone come across this usage pattern? I get weird build errors, and before even investigating I do a full rebuild, and the problem goes away.
This by itself seems to me to be good enough reason to do a full rebuild before a release.
Whether you would be willing to turn an incremental build that completes without problems over to testing, is a matter of taste, I think.
I would definitely recommend it. On a number of occasions with a large Visual C++ solution, I have seen the dependency checker fail to pick up a dependency on changed code. When the change is to a header file and affects the size of an object, very strange things can start to happen.
I am sure the dependency checker has got better in VS 2008, but I still wouldn't trust it for a release build.
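To make the "size of an object" problem concrete, here is a small sketch with a hypothetical Point struct shown before and after a header edit. The two namespaces stand in for what a stale object file and a freshly compiled one each believe the layout to be.

```cpp
#include <cassert>
#include <cstddef>

namespace before_edit {           // layout a stale .obj was compiled against
struct Point { int x; int y; };
}

namespace after_edit {            // layout after the header gained a member
struct Point { int x; int y; int z; };
}

// Bytes that freshly compiled code allocates per object but stale code
// does not know about.
inline std::size_t layout_mismatch_bytes() {
    assert(sizeof(after_edit::Point) >= sizeof(before_edit::Point));
    return sizeof(after_edit::Point) - sizeof(before_edit::Point);
}
```

If one translation unit was missed by the dependency checker and still uses the old layout, it will index arrays of Point with the wrong stride and copy too few bytes, which is the kind of "very strange" behaviour a full rebuild removes.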
The biggest reason not to ship an incrementally linked binary is that some optimizations are disabled. The linker will leave padding between functions (to make it easier to replace them on the next incremental link). This adds some bloat to the binary. There may be extra jumps as well, which changes the memory access pattern and can cause extra paging and/or cache misses. Older versions of functions may continue to reside in the executable even though they are never called. This also leads to binary bloat and slower performance. And you certainly can't use link-time code generation with incremental linking, so you miss out on more optimizations.
If you're giving a debug build to a tester, then it probably isn't a big deal. But your release candidates should be built from scratch in release mode, preferably on a dedicated build machine with a controlled environment.
Visual Studio has some problems with partial (incremental) builds (I mostly encountered linking errors). From time to time, it is very useful to do a full rebuild.
In case of long compilation times, there are two solutions:
Use a parallel compilation tool and take advantage of your (assumed) multi core hardware.
Use a build machine. What I use most is a separate build machine, with a CruiseControl set up, that performs full rebuilds from time to time. The "official" release that I provide to the testing team, and, eventually, to the customer, is always taken from the build machine, not from the developer's environment.