How to compile and start VSC++ Projects Faster?


What techniques do you use to compile and start VSC++ projects fast?
For us, loading all the DLLs in particular takes a long time. Is there a way to speed this up? The project loads a ton of .dlls and some of them are especially slow.
Now that we use unity build for our projects, it already compiles blazingly fast! =)
Thanks!

DLLs have a default load location embedded into them. This is typically defaulted by the development tool to the same address for all DLLs. This means that when the DLLs are loaded into memory, there are a lot of collisions and each colliding DLL has to be readdressed and loaded into a free memory location. When working on a project that had a significant number of DLL dependencies, we were able to make significant load time savings by setting the default address for our DLLs.
A fuller explanation of what's going on and how it helps can be found at Dr. Dobb's.
It's been some years since I've done this, so it may be out of date now.
It's worth keeping in mind if you go down this route, it might not play very well with .net.
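
To make the idea of non-colliding default addresses concrete, here is a minimal sketch. The helper name assign_base_addresses, the starting address and the image sizes are all made up for illustration; with the MSVC toolchain the chosen address would then be passed to the linker through its /BASE option. The sketch only spaces images out in 64 KB steps (the Windows allocation granularity) so that no two preferred ranges overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: pick a distinct preferred base address for each DLL
// so that no two images overlap. Sizes are image sizes in bytes.
std::vector<std::uint64_t> assign_base_addresses(
    const std::vector<std::uint64_t>& image_sizes,
    std::uint64_t first_base) {
    // Windows reserves address space in 64 KB units.
    const std::uint64_t granularity = 0x10000;
    assert(first_base % granularity == 0);

    std::vector<std::uint64_t> bases;
    std::uint64_t next = first_base;
    for (std::uint64_t size : image_sizes) {
        bases.push_back(next);
        // Round the image size up to the next 64 KB boundary.
        std::uint64_t rounded =
            (size + granularity - 1) / granularity * granularity;
        next += rounded;
    }
    return bases;
}
```

Each returned value would be used as the default address of one DLL, in the same order as the sizes that were passed in.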

Use delay-loaded libraries. It's a simple compile settings change (typically no code changes needed), yet it can offer very big improvements.
Of course, you still have the load times of those DLLs when you actually use them, but if you have many DLLs there's also a large chance that you won't use all of them all of the time.
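
With MSVC, delay loading itself is a linker setting (the /DELAYLOAD option) and needs no code. The sketch below is only a portable analogue of what it buys you: the cost of loading is paid on first use rather than at startup. LazyLibrary and its loader callback are hypothetical names; a real loader would call something like LoadLibrary or dlopen. The class is deliberately not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a library that is expensive to load.
// `loader` is whatever does the real work of loading it.
class LazyLibrary {
public:
    explicit LazyLibrary(std::function<void()> loader)
        : loader_(std::move(loader)) {
        assert(loader_ != nullptr);
    }

    // Pay the load cost on first use only; later calls do nothing.
    // Not thread-safe: a real implementation would need synchronization.
    void ensure_loaded() {
        if (!loaded_) {
            loader_();
            loaded_ = true;
        }
    }

    bool loaded() const { return loaded_; }

private:
    std::function<void()> loader_;
    bool loaded_ = false;
};
```

If a code path never calls ensure_loaded(), the library is never loaded at all, which is where the startup savings come from.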

Related

Should I group all my code and move into DLLs?

Will I get better performance if I break my code into pieces and put them in DLLs?
Is there something wrong with having multiple DLLs in terms of performance, or is it better? Or does it not have any effect?
My project is quite large and I heard that DLLs are not suitable for cross-platform apps. Is it true?

DLLs can have positive and negative impact on performance. As with all performance questions you should get data before committing to a strategy.
Huge headers / huge code / slow compilation does not mean slow performance. It's often the other way around: that slow compilation is because you've given the optimizer a lot to work with. And simply rearranging your project won't reduce the amount of code. You'll just cordon off bits so the optimizer has less to work with. You may benefit in project structure, but at the cost of optimization opportunities.
A lot of modern elegant C++ is templated and purely in header files. That allows inlining as the optimizer sees fit. Breaking code into separate images will prevent optimization across image boundaries.
Consider the development cost of using DLLs as well. Interfaces across DLLs present a lot of complications. For example templated / header-inlined code should be avoided in the ABI. It's the entire reason COM was invented: to deal with the difficulties of passing basic objects between DLL boundaries.
To try and answer the question, my gut feeling is that "No", breaking your project into multiple modules will do absolutely nothing to improve performance and is much more likely to make it perform worse. I say that with about 93% confidence. But as I said: The only way to properly answer your question is to measure.

No. DLLs cause more delay in loading than statically linked libraries.
This is because the operating system loader needs to open each DLL and link it individually, which takes time. Calling DLL functions is not slow, because calls go through pointers in the import address table of your executable image; the loader modifies these pointers to match the addresses of the symbols imported from the DLL.
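
A toy model of that mechanism, for illustration only: ImportTable, resolve_imports and call_add are hypothetical names, and a real import address table is laid out by the linker and patched by the OS loader rather than by user code. The point is simply that the call site pays one extra indirection through a pointer slot.

```cpp
#include <cassert>

// Stand-in for a function exported by a DLL.
int dll_add(int a, int b) { return a + b; }

// Simplified import table: one pointer slot per imported function.
using AddFn = int (*)(int, int);
struct ImportTable {
    AddFn add = nullptr;  // unresolved until the "loader" runs
};

// What the loader does conceptually: write the real address into the slot.
void resolve_imports(ImportTable& iat) { iat.add = &dll_add; }

// What a call site does: one extra indirection through the table.
int call_add(const ImportTable& iat, int a, int b) {
    assert(iat.add != nullptr);
    return iat.add(a, b);
}
```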

Faster build times in C++ [duplicate]

I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.
What are the worst build times you have ever had to handle?
What strategies have you used to improve build times on large projects?
Update:
How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?

1. Forward declaration
2. pimpl idiom
3. Precompiled headers
4. Parallel compilation (e.g. MPCL add-in for Visual Studio).
5. Distributed compilation (e.g. Incredibuild for Visual Studio).
6. Incremental build
7. Split the build into several "projects" so that you do not compile all the code when it is not needed.
[Later Edit]
8. Buy faster machines.
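
The first two items of the list can be sketched together. The Widget class and the widget.h / widget.cpp split below are hypothetical and shown in one file for brevity; the idea is that the header only forward-declares Impl, so changes to the private members (or to the headers they need) do not force clients of widget.h to recompile.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// ---- what would live in widget.h (hypothetical) ----
class Widget {
public:
    Widget();
    ~Widget();              // defined in the .cpp, where Impl is complete
    void add(int value);
    int total() const;

private:
    struct Impl;            // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- what would live in widget.cpp (hypothetical) ----
struct Widget::Impl {
    std::vector<int> values;  // <vector> would be included only here
};

Widget::Widget() : impl_(new Impl()) {}
Widget::~Widget() = default;

void Widget::add(int value) { impl_->values.push_back(value); }

int Widget::total() const {
    int sum = 0;
    for (int v : impl_->values) sum += v;
    return sum;
}

// Usage sketch.
void widget_demo() {
    Widget w;
    w.add(2);
    w.add(3);
    assert(w.total() == 5);
}
```

The price is an extra heap allocation and an extra pointer hop per call, which is why this is usually applied to widely included classes rather than everywhere.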

My strategy is pretty simple - I don't do large projects. The whole thrust of modern computing is away from the giant and monolithic and towards the small and componentised. So when I work on projects, I break things up into libraries and other components that can be built and tested independently, and which have minimal dependencies on each other. A "full build" in this kind of environment never actually takes place, so there is no problem.

One trick that sometimes helps is to include everything into one .cpp file. Since includes are processed once per file, this can save you a lot of time. (The downside to this is that it makes it impossible for the compiler to parallelize compilation)
You should be able to specify that multiple .cpp files should be compiled in parallel (-j with make on linux, /MP on MSVC - MSVC also has an option to compile multiple projects in parallel. These are separate options, and there's no reason why you shouldn't use both)
In the same vein, distributed builds (Incredibuild, for example), may help take the load off a single system.
SSD disks are supposed to be a big win, although I haven't tested this myself (but a C++ build touches a huge number of files, which can quickly become a bottleneck).
Precompiled headers can help too, when used with care. (They can also hurt you, if they have to be recompiled too often).
And finally, trying to minimize dependencies in the code itself is important. Use the pImpl idiom, use forward declarations, keep the code as modular as possible. In some cases, use of templates may help you decouple classes and minimize dependencies. (In other cases, templates can slow down compilation significantly, of course)
But yes, you're right, this is very much a language thing. I don't know of another language which suffers from the problem to this extent. Most languages have a module system that allows them to eliminate header files, which are a huge factor. C has header files, but is such a simple language that compile times are still manageable. C++ gets the worst of both worlds: a big, complex language, and a terribly primitive build mechanism that requires a huge amount of code to be parsed again and again.
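
For the "include everything into one .cpp file" trick, the single file is usually generated rather than written by hand. A minimal sketch of such a generator follows; make_unity_source is a hypothetical helper and the file names passed to it are whatever your project uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build the text of a unity file that #includes every
// listed source file, so headers shared between them are parsed only once.
std::string make_unity_source(const std::vector<std::string>& sources) {
    std::string out = "// Generated unity file: do not edit by hand.\n";
    for (const std::string& path : sources) {
        assert(!path.empty());
        out += "#include \"" + path + "\"\n";
    }
    return out;
}
```

Writing the returned text to, say, unity.cpp and compiling only that file gives the effect described. Generating several such files instead of one keeps some room for parallel compilation.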

Multi core compilation. Very fast with 8 cores compiling on the I7.
Incremental linking
External constants
Removed inline methods on C++ classes.
The last two gave us a reduced linking time from around 12 minutes to 1-2 minutes. Note that this is only needed if things have a huge visibility, i.e. seen "everywhere" and if there are many different constants and classes.
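
A sketch of what "external constants" and "removed inline methods" could look like, assuming that is what was meant. The names kMaxConnections, Pool, config.h and config.cpp are hypothetical. The header carries only declarations, so the value and the function body exist in exactly one object file instead of being duplicated in every translation unit that includes the header.

```cpp
#include <cassert>

// ---- config.h (hypothetical): declarations only ----
extern const int kMaxConnections;  // no value visible to includers

class Pool {
public:
    int capacity() const;          // declared here, not defined inline
};

// ---- config.cpp (hypothetical): the single definition of each ----
const int kMaxConnections = 64;

int Pool::capacity() const { return kMaxConnections; }

// Usage sketch.
void pool_demo() {
    Pool p;
    assert(p.capacity() == kMaxConnections);
}
```

The trade-off is that the compiler can no longer inline capacity() or fold the constant in other translation units without link-time optimization.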
Cheers

IncrediBuild

Unity Builds
Incredibuild
Pointer to implementation
forward declarations
compiling "finished" sections of the project into DLLs

ccache & distcc (for C/C++ projects):
ccache caches compiled output, using the pre-processed file as the 'key' for finding the output. This is great because pre-processing is pretty quick, and quite often changes that force recompile don't actually change the source for many files. Also, it really speeds up a full re-compile. Also nice is the instance where you can have a shared cache among team members. This means that only the first guy to grab the latest code actually compiles anything.
distcc does distributed compilation across a network of machines. This is only good if you HAVE a network of machines to use for compilation. It goes well with ccache, and only moves the pre-processed source around, so the only thing you have to worry about on the compiler engine systems is that they have the right compiler (no need for headers or your entire source tree to be visible).
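
The caching idea can be shown with a toy model. This is not how ccache is implemented (the real tool hashes its input and stores results on disk); CompileCache and its compile callback are hypothetical, and the preprocessed text itself is used as the map key. It only demonstrates that identical preprocessed input never reaches the compiler twice.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Toy compile cache keyed on preprocessed source text (illustration only).
class CompileCache {
public:
    using Compiler = std::function<std::string(const std::string&)>;

    explicit CompileCache(Compiler compile) : compile_(std::move(compile)) {
        assert(compile_ != nullptr);
    }

    // Return the cached object if this exact preprocessed text was seen
    // before; otherwise run the (expensive) compiler and remember the result.
    std::string get_object(const std::string& preprocessed) {
        auto it = cache_.find(preprocessed);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        std::string object = compile_(preprocessed);
        cache_.emplace(preprocessed, object);
        return object;
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    Compiler compile_;
    std::unordered_map<std::string, std::string> cache_;
    int hits_ = 0;
    int misses_ = 0;
};
```

An edit that changes a header but leaves a given file's preprocessed output identical results in a hit, which is the case described above.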

The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.
Can the build be done in parallel? Either with multiple cores, or with multiple servers?
Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. Third-party tools/libraries that are used but not altered are good candidates for this treatment.
Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.
Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?
Finally, you can re-factor the code base to be more modular and to have fewer dependencies. Large Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.
EDIT: Building on a local filesystem as opposed to a NFS mounted filesystem can also dramatically speed up build times.
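
The rule that a dependency-aware build applies to each target is small enough to write down. A minimal sketch, with modification times modelled as plain integers and hypothetical names (Target, needs_rebuild): a target is rebuilt only if it is missing or older than at least one of its dependencies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: modification times are plain integers (e.g. seconds).
struct Target {
    bool exists;
    long mtime;
};

// Rebuild only when the target is missing or some dependency is newer.
bool needs_rebuild(const Target& target,
                   const std::vector<long>& dependency_mtimes) {
    if (!target.exists) return true;
    for (long dep : dependency_mtimes) {
        if (dep > target.mtime) return true;
    }
    return false;
}

// Usage sketch: an object file built at t=100 with sources last touched
// at t=90 and t=95 is up to date.
void rebuild_demo() {
    assert(!needs_rebuild(Target{true, 100}, {90, 95}));
}
```

The quality of the dependency list is what matters in practice: a list that is too broad rebuilds the world, and one that is too narrow produces stale binaries.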

Fiddle with the compiler optimisation flags.
Use option -j4 for gmake for parallel compilation (multicore or single core).
If you are using clearmake, use winkin.
In extreme cases, take out the debug flags.
Use some powerful servers.

This book Large-Scale C++ Software Design has very good advice I've used in past projects.

Minimize your public API
Minimize inline functions in your API. (Unfortunately this also increases linker requirements).
Maximize forward declarations.
Reduce coupling between code. For instance pass in two integers to a function, for coordinates, instead of your custom Point class that has its own header file.
Use Incredibuild. But it has some issues sometimes.
Do NOT put code that gets exported from two different modules in the SAME header file.
Use the PImpl idiom. Mentioned before, but bears repeating.
Use Pre-compiled headers.
Avoid C++/CLI (i.e. managed c++). Linker times are impacted too.
Avoid using a global header file that includes 'everything else' in your API.
Don't put a dependency on a lib file if your code doesn't really need it.
Know the difference between including files with quotes and angle brackets.
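
Two of the points above (maximize forward declarations, pass plain integers instead of a custom Point) in a small sketch. The file names renderer.h, renderer.cpp and point.h, and the function manhattan_length, are hypothetical; everything is shown in one file for brevity. The header part never needs the full definition of Point.

```cpp
#include <cassert>

// ---- renderer.h (hypothetical) ----
class Point;                            // instead of #include "point.h"
int manhattan_length(const Point& p);   // a reference only needs the name
int manhattan_length(int x, int y);     // or avoid the type altogether

// ---- renderer.cpp (hypothetical) ----
// This class definition stands in for #include "point.h".
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};

int manhattan_length(int x, int y) {
    return (x < 0 ? -x : x) + (y < 0 ? -y : y);
}

int manhattan_length(const Point& p) {
    return manhattan_length(p.x(), p.y());
}

// Usage sketch.
void renderer_demo() {
    assert(manhattan_length(Point(3, -4)) == 7);
}
```

Files that include only renderer.h are then unaffected by edits to point.h.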

Powerful compilation machines and parallel compilers. We also make sure the full build is needed as little as possible. We don't alter the code to make it compile faster.
Efficiency and correctness is more important than compilation speed.

In Visual Studio, you can set the number of projects to compile at a time. Its default value is 2; increasing it would save some time.
This will help if you don't want to mess with the code.

This is the list of things we did for a development under Linux :
As Warrior noted, use parallel builds (make -jN)
We use distributed builds (currently icecream, which is very easy to set up); with this we can have tens of processors at a given time. This also has the advantage of giving the builds to the most powerful and least loaded machines.
We use ccache so that when you do a make clean, you don't have to really recompile your sources that didn't change, it's copied from a cache.
Note also that debug builds are usually faster to compile since the compiler doesn't have to make optimisations.

We tried creating proxy classes once.
These are really a simplified version of a class that only includes the public interface, reducing the number of internal dependencies that need to be exposed in the header file. However, they came with a heavy price of spreading each class over several files that all needed to be updated when changes to the class interface were made.
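
One possible reading of such a proxy is an abstract interface plus a factory function, sketched below. ICounter, make_counter and the file split are hypothetical names, and the original project may well have done it differently. It also shows the cost mentioned above: any change to the public interface has to be made in both the interface and the implementation.

```cpp
#include <cassert>
#include <memory>

// ---- counter_interface.h (hypothetical): public interface only ----
class ICounter {
public:
    virtual ~ICounter() = default;
    virtual void increment() = 0;
    virtual int value() const = 0;
};

std::unique_ptr<ICounter> make_counter();

// ---- counter.cpp (hypothetical): internals never appear in a header ----
namespace {
class Counter final : public ICounter {
public:
    void increment() override { ++count_; }
    int value() const override { return count_; }

private:
    int count_ = 0;
};
}  // namespace

std::unique_ptr<ICounter> make_counter() {
    return std::unique_ptr<ICounter>(new Counter());
}

// Usage sketch.
void counter_demo() {
    std::unique_ptr<ICounter> c = make_counter();
    c->increment();
    assert(c->value() == 1);
}
```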

In general large C++ projects that I've worked on that had slow build times were pretty messy, with lots of interdependencies scattered through the code (the same include files used in most cpps, fat interfaces instead of slim ones). In those cases, the slow build time was just a symptom of the larger problem, and a minor symptom at that. Refactoring to make clearer interfaces and break code out into libraries improved the architecture, as well as the build time. When you make a library, it forces you to think about what is an interface and what isn't, which will actually (in my experience) end up improving the code base. If there's no technical reason to have to divide the code, some programmers through the course of maintenance will just throw anything into any header file.

Cătălin Pitiș covered a lot of good things. Other ones we do:
Have a tool that generates reduced Visual Studio .sln files for people working in a specific sub-area of a very large overall project
Cache DLLs and pdbs from when they are built on CI for distribution on developer machines
For CI, make sure that the link machine in particular has lots of memory and high-end drives
Store some expensive-to-regenerate files in source control, even though they could be created as part of the build
Replace Visual Studio's checking of what needs to be relinked by our own script tailored to our circumstances

It's a pet peeve of mine, so even though you already accepted an excellent answer, I'll chime in:
In C++, it's less the language as such, but the language-mandated build model that was great back in the seventies, and the header-heavy libraries.
The only thing that is wrong about Cătălin Pitiș' reply: "buy faster machines" should go first. It is the easiest way with the least impact.
My worst was about 80 minutes on an aging build machine running VC6 on W2K Professional. The same project (with tons of new code) now takes under 6 minutes on a machine with 4 hyperthreaded cores, 8G RAM Win 7 x64 and decent disks. (A similar machine, about 10..20% less processor power, with 4G RAM and Vista x86 takes twice as long)
Strangely, incremental builds are most of the time slower than full rebuilds now.

A full build takes about 2 hours. I try to avoid making modifications to the base classes, and since my work is mainly on the implementation of these base classes I only need to build small components (a couple of minutes).

Create some unit test projects to test individual libraries, so that if you need to edit low level classes that would cause a huge rebuild, you can use TDD to know your new code works before you rebuild the entire app. The John Lakos book as mentioned by Themis has some very practical advice for restructuring your libraries to make this possible.

Understanding static & dynamic library linking [duplicate]

Are there any compelling performance reasons to choose static linking over dynamic linking or vice versa in certain situations? I've heard or read the following, but I don't know enough on the subject to vouch for its veracity.
1) The difference in runtime performance between static linking and dynamic linking is usually negligible.
2) (1) is not true if using a profiling compiler that uses profile data to optimize program hotpaths because with static linking, the compiler can optimize both your code and the library code. With dynamic linking only your code can be optimized. If most of the time is spent running library code, this can make a big difference. Otherwise, (1) still applies.

Dynamic linking can reduce total resource consumption (if more than one process shares the same library (including the version in "the same", of course)). I believe this is the argument that drives its presence in most environments. Here "resources" include disk space, RAM, and cache space. Of course, if your dynamic linker is insufficiently flexible there is a risk of DLL hell.
Dynamic linking means that bug fixes and upgrades to libraries propagate to improve your product without requiring you to ship anything.
Plugins always call for dynamic linking.
Static linking, means that you can know the code will run in very limited environments (early in the boot process, or in rescue mode).
Static linking can make binaries easier to distribute to diverse user environments (at the cost of sending a larger and more resource-hungry program).
Static linking may allow slightly faster startup times, but this depends to some degree on both the size and complexity of your program and on the details of the OS's loading strategy.
Some edits to include the very relevant suggestions in the comments and in other answers. I'd like to note that the way you break on this depends a lot on what environment you plan to run in. Minimal embedded systems may not have enough resources to support dynamic linking. Slightly larger small systems may well support dynamic linking because their memory is small enough to make the RAM savings from dynamic linking very attractive. Full-blown consumer PCs have, as Mark notes, enormous resources, and you can probably let the convenience issues drive your thinking on this matter.
To address the performance and efficiency issues: it depends.
Classically, dynamic libraries require some kind of glue layer which often means double dispatch or an extra layer of indirection in function addressing and can cost a little speed (but is the function calling time actually a big part of your running time???).
However, if you are running multiple processes which all call the same library a lot, you can end up saving cache lines (and thus winning on running performance) when using dynamic linking relative to using static linking. (Unless modern OS's are smart enough to notice identical segments in statically linked binaries. Seems hard, does anyone know?)
Another issue: loading time. You pay loading costs at some point. When you pay this cost depends on how the OS works as well as what linking you use. Maybe you'd rather put off paying it until you know you need it.
Note that static-vs-dynamic linking is traditionally not an optimization issue, because they both involve separate compilation down to object files. However, this is not required: a compiler can in principle, "compile" "static libraries" to a digested AST form initially, and "link" them by adding those ASTs to the ones generated for the main code, thus empowering global optimization. None of the systems I use do this, so I can't comment on how well it works.
The way to answer performance questions is always by testing (and use a test environment as much like the deployment environment as possible).

1) is based on the fact that calling a DLL function is always using an extra indirect jump. Today, this is usually negligible. Inside the DLL there is some more overhead on i386 CPU's, because they can't generate position independent code. On amd64, jumps can be relative to the program counter, so this is a huge improvement.
2) This is correct. With optimizations guided by profiling you can usually win about 10-15 percent performance. Now that CPU speed has reached its limits it might be worth doing it.
I would add: (3) the linker can arrange functions in a more cache-efficient grouping, so that expensive cache-level misses are minimised. It might also especially affect the startup time of applications (based on results I have seen with the Sun C++ compiler).
And don't forget that with DLLs no dead code elimination can be performed. Depending on the language, the DLL code might not be optimal either. Virtual functions are always virtual because the compiler doesn't know whether a client is overriding them.
For these reasons, in case there is no real need for DLLs, then just use static compilation.
EDIT (to answer the comment, by user underscore)
Here is a good resource about the position independent code problem http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
As explained there, x86 does not have them, AFAIK, for anything other than 15-bit jump ranges, and not for unconditional jumps and calls. That's why functions (from generators) larger than 32K have always been a problem and needed embedded trampolines.
But on popular x86 OS like Linux you do not need to care if the .so/DLL file is not generated with the gcc switch -fpic (which enforces the use of the indirect jump tables). Because if you don't, the code is just fixed like a normal linker would relocate it. But while doing this it makes the code segment non shareable and it would need a full mapping of the code from disk into memory and touching it all before it can be used (emptying most of the caches, hitting TLBs) etc. There was a time when this was considered slow.
So you would not have any benefit anymore.
I do not recall what OS (Solaris or FreeBSD) gave me problems with my Unix build system because I just wasn't doing this and wondered why it crashed until I applied -fPIC to gcc.

Dynamic linking is the only practical way to meet some license requirements such as the LGPL.

I agree with the points dnmckee mentions, plus:
Statically linked applications might be easier to deploy, since there are fewer or no additional file dependencies (.dll / .so) that might cause problems when they're missing or installed in the wrong place.

One reason to do a statically linked build is to verify that you have full closure for the executable, i.e. that all symbol references are resolved correctly.
As a part of a large system that was being built and tested using continuous integration, the nightly regression tests were run using a statically linked version of the executables. Occasionally, we would see that a symbol would not resolve and the static link would fail even though the dynamically linked executable would link successfully.
This was usually occurring when symbols that were deep seated within the shared libs had a misspelt name and so would not statically link. The dynamic linker does not completely resolve all symbols, irrespective of using depth-first or breadth-first evaluation, so you can finish up with a dynamically linked executable that does not have full closure.

1/ I've been on projects where dynamic linking vs static linking was benchmarked, and the difference was not judged small enough to justify switching to dynamic linking (I wasn't part of the test, I just know the conclusion).
2/ Dynamic linking is often associated with PIC (Position Independent Code, code which doesn't need to be modified depending on the address at which it is loaded). Depending on the architecture, PIC may bring another slowdown, but it is needed in order to benefit from sharing a dynamically linked library between two executables (and even between two processes of the same executable if the OS uses randomization of load addresses as a security measure). I'm not sure that all OSes allow separating the two concepts, but Solaris and Linux do, and ISTR that HP-UX does as well.
3/ I've been on other projects which used dynamic linking for the "easy patch" feature. But this "easy patch" makes the distribution of a small fix a little easier and that of a complicated one a versioning nightmare. We often ended up having to push everything, plus having to track down problems at customer sites because the wrong version was taken.
My conclusion is that I'd use static linking except:
for things like plugins which depend on dynamic linking
when sharing is important (big libraries used by multiple processes at the same time like C/C++ runtime, GUI libraries, ... which often are managed independently and for which the ABI is strictly defined)
If one wants to use the "easy patch", I'd argue that the libraries have to be managed like the big libraries above: they must be nearly independent, with a defined ABI that must not be changed by fixes.

Static linking is a process at compile time in which linked content is copied into the primary binary, producing a single binary.
Cons:
compile time is longer
output binary is bigger
Dynamic linking is a process at run time in which linked content is loaded. This technique allows you to:
upgrade the linked binary without recompiling the primary one (this relies on ABI stability)
keep a single shared copy
Cons:
start time is slower (linked content has to be loaded)
linker errors are thrown at run time
[iOS Static vs Dynamic framework]

It is pretty simple, really. When you make a change in your source code, do you want to wait 10 minutes for it to build or 20 seconds? Twenty seconds is all I can put up with. Beyond that, I either get out the sword or start thinking about how I can use separate compilation and linking to bring it back into the comfort zone.

Best example for dynamic linking is, when the library is dependent on the used hardware. In ancient times the C math library was decided to be dynamic, so that each platform can use all processor capabilities to optimize it.
An even better example might be OpenGL. OpenGL is an API that is implemented differently by AMD and NVidia. And you are not able to use an NVidia implementation on an AMD card, because the hardware is different. You cannot link OpenGL statically into your program because of that. Dynamic linking is used here to let the API be optimized for all platforms.

Dynamic linking requires extra time for the OS to find the dynamic library and load it. With static linking, everything is together and it is a one-shot load into memory.
Also, see DLL Hell. This is the scenario where the DLL that the OS loads is not the one that came with your application, or the version that your application expects.

On Unix-like systems, dynamic linking can make life difficult for 'root' to use an application with the shared libraries installed in out-of-the-way locations. This is because the dynamic linker generally won't pay attention to LD_LIBRARY_PATH or its equivalent for processes with root privileges. Sometimes, then, static linking saves the day.
Alternatively, the installation process has to locate the libraries, but that can make it difficult for multiple versions of the software to coexist on the machine.

Another issue not yet discussed is fixing bugs in the library.
With static linking, you not only have to rebuild the library, but will also have to relink and redistribute the executable. If the library is just used in one executable, this may not be an issue. But the more executables that need to be relinked and redistributed, the bigger the pain is.
With dynamic linking, you just rebuild and redistribute the dynamic library and you are done.

Static linking includes the files that the program needs in a single executable file.
Dynamic linking is what you would consider the usual, it makes an executable that still requires DLLs and such to be in the same directory (or the DLLs could be in the system folder).
(DLL = dynamic link library)
Dynamically linked executables are compiled faster and aren't as resource-heavy.

Static linking gives you only a single exe; in order to make a change you need to recompile your whole program. With dynamic linking, you need to change only the DLL, and when you run your exe the changes are picked up at run time. It's easier to provide updates and bug fixes with dynamic linking (e.g. Windows).

There are a vast and increasing number of systems where an extreme level of static linking can have an enormous positive impact on applications and system performance.
I refer to what are often called "embedded systems", many of which are now increasingly using general-purpose operating systems, and these systems are used for everything imaginable.
An extremely common example are devices using GNU/Linux systems using Busybox. I've taken this to the extreme with NetBSD by building a bootable i386 (32-bit) system image that includes both a kernel and its root filesystem, the latter which contains a single static-linked (by crunchgen) binary with hard-links to all programs that itself contains all (well at last count 274) of the standard full-feature system programs (most except the toolchain), and it is less than 20 megabytes in size (and probably runs very comfortably in a system with only 64MB of memory (even with the root filesystem uncompressed and entirely in RAM), though I've been unable to find one so small to test it on).
It has been mentioned in earlier posts that the start-up time of static-linked binaries is faster (and it can be a lot faster), but that is only part of the picture, especially when all object code is linked into the same file, and even more especially when the operating system supports demand paging of code direct from the executable file. In this ideal scenario the startup time of programs is literally negligible since almost all pages of code will already be in memory and be in use by the shell (and init and any other background processes that might be running), even if the requested program has never been run since boot, since perhaps only one page of memory need be loaded to fulfill the runtime requirements of the program.
However that's still not the whole story. I also usually build and use the NetBSD operating system installs for my full development systems by static-linking all binaries. Even though this takes a tremendous amount more disk space (~6.6GB total for x86_64 with everything, including toolchain and X11 static-linked) (especially if one keeps full debug symbol tables available for all programs another ~2.5GB), the result still runs faster overall, and for some tasks even uses less memory than a typical dynamic-linked system that purports to share library code pages. Disk is cheap (even fast disk), and memory to cache frequently used disk files is also relatively cheap, but CPU cycles really are not, and paying the ld.so startup cost for every process that starts every time it starts will take hours and hours of CPU cycles away from tasks which require starting many processes, especially when the same programs are used over and over, such as compilers on a development system. Static-linked toolchain programs can reduce whole-OS multi-architecture build times for my systems by hours. I have yet to build the toolchain into my single crunchgen'ed binary, but I suspect when I do there will be more hours of build time saved because of the win for the CPU cache.

Another consideration is the number of object files (translation units) that you actually consume in a library vs the total number available. If a library is built from many object files, but you only use symbols from a few of them, this might be an argument for favoring static linking, since you only link the objects that you use when you static link (typically) and don't normally carry the unused symbols. If you go with a shared lib, that lib contains all translation units and could be much larger than what you want or need.

Is it bad practice to have an application which ships with lots of DLL's?

Is it better to have lots of DLL dependencies or better to static link as much as possible?
Thanks

No, it is not bad practice to ship with lots of DLLs; it is bad practice, though, to put them in %System32%. Actually, it is usually good to use DLLs instead of statically linking; for one thing, you can easily swap out just the DLL that you need to update, rather than having to replace the entire binary, and for another, if your program eventually needs multiple executables that work together, you only pay for one copy of the DLL code (whereas, with static linking, you would end up duplicating the code that was common).

Having static link gives your app a large memory footprint, therefore having DLL's is better from that POV i.e. you only load what you need. Nowadays installations are normally done by an installer so it doesn't matter if you have lots of DLL's.

I don't think it's a bad practice. Look at Office or Adobe or any large-scale application. They end up with lots of DLLs -- because they otherwise would have to pack everything into a 100M+ exe.
Break things out into DLLs when you don't absolutely need them all the time.

Generally speaking, it is not a bad practice. It is better to split the code of a program into separate dynamic libraries, especially if the functions provided are used from more than one executable.
That doesn't mean that every program should have its code split into several dynamic libraries; for simple utilities, that is probably not needed.

As mentioned by others, lots of DLLs is not a bad practice. Put some thought into what to put in each one. I like to keep the DLLs as 'tiny-island-ish' as I can. If these will be distributed, I like to have a specific naming convention that reflects the product and/or company name and/or initials of some sort.

Just wanted to add another observation from other programs that use many dynamically loaded DLLs, for example the GIMP and its plug-ins. The way you load your DLLs will affect your client's perceived application speed, if that's a factor among the other very good ones (updates, reuse, etc.). I'm sure there's some overhead for the OS to load a DLL, and you might run into process limits (like open file handles). A very large number of very small DLLs may be less desirable than a smaller number of larger ones.

Static or dynamic linking the CRT, MFC, ATL, etc

Back in the 90s when I first started out with MFC I used to dynamically link my apps and shipped the relevant MFC DLLs. This caused me a few issues (DLL hell!) and I switched to statically linking instead - not just for MFC, but for the CRT and ATL. Other than larger EXE files, statically linking has never caused me any problems at all - so are there any downsides that other people have come across? Is there a good reason for revisiting dynamic linking again? My apps are mainly STL/Boost nowadays FWIW.

Most of the answers I hear about this involve sharing your dll's with other programs, or having those dll's be updated without the need to patch your software.
Frankly, I consider those to be downsides, not upsides. When a third-party DLL is updated, it can change enough to break your software. And these days hard drive space isn't as precious as it once was. An extra 500k in your executable? Who cares?
Being 100% sure of the version of the DLL that your software is using is a good thing.
Being 100% sure that the client is not going to have a dependency headache is a good thing.
The upsides of static linking far outweigh the downsides, in my opinion.

There are some downsides:
Bigger exe size (esp if you ship multiple exe's)
Problems using other DLL's which rely on or assume dynamic linking (eg: 3rd party DLL's which you cannot get as static libraries)
Different C runtimes between DLLs with independent static linkage (no cross-module allocate/deallocate)
No automatic servicing of shared components (no ability to have 3rd party module supplier update their code to fix issues without recompiling and updating your application)
We do static linking for our Windows apps, primarily because it allows xcopy deployment, which is just not possible when installing or relying on SxS DLLs in a way that works, since the process and mechanism are not well documented or easily remotable. If you use local DLLs in the install directory it will kinda work, but it's not well supported. The inability to easily do remote installation without going through an MSI on the remote system is the primary reason why we don't use dynamic linking, but (as you pointed out) there are many other benefits to static linking. There are pros and cons to each; hopefully this helps enumerate them.

As long as you keep your usage limited to certain libraries and do not use any dll's then you should be good.
Unfortunately, there are some libraries that you cannot link statically. The best example I have is OpenMP. If you take advantage of Visual Studio's OpenMP support, you will have to make sure the runtime is installed (in this case vcomp.dll).
If you do use DLLs, then you can't pass some items back and forth without some serious gymnastics. std::strings come to mind. If your exe and DLL both link the CRT dynamically, then all allocation takes place in the one shared CRT. Otherwise your program may try to allocate the string on one side and deallocate it on the other. Bad things ensue...
That said, I still statically link my exes and DLLs. It does reduce a lot of the variability in the install, and I consider that well worth the few limitations.
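One common way around the std::string problem described above is to never hand an owning object across the module boundary at all: the library copies into a buffer that the caller owns. The following is only a minimal sketch of that idea, not anything from a real library; the names lib_get_greeting and get_greeting are made up for illustration, and the "library" function simply lives in the same file here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical function that a DLL might export. Instead of returning a
// std::string (which the caller would later free with its own CRT), it
// copies into a buffer owned by the caller.
// Returns the number of bytes required, including the terminating NUL.
// If the buffer is null or too small, nothing is written.
extern "C" std::size_t lib_get_greeting(char* buffer, std::size_t buffer_size)
{
    static const char text[] = "hello from the library";
    const std::size_t needed = sizeof(text);  // sizeof includes the NUL
    if (buffer != nullptr && buffer_size >= needed) {
        std::memcpy(buffer, text, needed);
    }
    return needed;
}

// Caller-side helper (also hypothetical): ask for the size, allocate
// locally, then fetch. Every allocation and deallocation happens in the
// caller's own module.
std::string get_greeting()
{
    const std::size_t needed = lib_get_greeting(nullptr, 0);
    assert(needed > 0);
    std::string result(needed, '\0');
    lib_get_greeting(&result[0], result.size());
    result.resize(needed - 1);  // drop the trailing NUL
    return result;
}
```

The two-call "ask for the size, then fill" shape costs an extra call, but it keeps the interface plain C, so it does not matter which CRT either side was built against.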

One good feature of using DLLs is that if multiple processes load the same DLL, its code can be shared between them. This can save memory and shorten loading times for an application loading a DLL that's already used by another program.

No, nothing new on that front. Keep it that way.

Most definitely.
When the CRT is linked statically, each module allocates from its own 'static' heap. Since allocation and deallocation should be done on the same heap, this means that if you ship a library, you should take care that client code cannot call 'your' p = new LibClass() and delete that object itself using delete p;.
My conclusion: either shield allocation and deallocation from client code, or dynamically link the CRT.
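As a rough illustration of the "shield allocation and deallocation" option, the library can expose a create/destroy pair and keep the constructor and destructor private, so that both new and delete always run inside the library. This is just a sketch under assumptions: lib_create, lib_destroy, lib_live_objects, LibDeleter and make_lib_object are hypothetical names, the live-object counter exists only to make the behaviour observable, and in a real project the first half would be compiled into the DLL.

```cpp
#include <cassert>
#include <memory>

// --- Library side (would live in the DLL) ---
class LibClass {
public:
    int value() const { return 42; }
private:
    // Private, so client code cannot write `new LibClass()` or `delete p`.
    LibClass() = default;
    ~LibClass() = default;
    friend LibClass* lib_create();
    friend void lib_destroy(LibClass*);
};

namespace {
int g_live_objects = 0;  // illustration only: objects currently alive
}

// Allocation happens inside the library...
LibClass* lib_create()
{
    ++g_live_objects;
    return new LibClass();
}

// ...and so does deallocation, so both use the same heap.
void lib_destroy(LibClass* p)
{
    if (p == nullptr) return;
    assert(g_live_objects > 0);
    --g_live_objects;
    delete p;
}

int lib_live_objects() { return g_live_objects; }

// --- Client side ---
// A deleter that routes destruction back into the library.
struct LibDeleter {
    void operator()(LibClass* p) const { lib_destroy(p); }
};
using LibPtr = std::unique_ptr<LibClass, LibDeleter>;

LibPtr make_lib_object() { return LibPtr(lib_create()); }
```

Client code then holds a LibPtr and never calls delete directly; when the pointer goes out of scope, the deleter hands the object back to lib_destroy.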

There are some software licenses such as LGPL that require you to either use a DLL or distribute your application as object files that the user can link together. If you are using such a library, you'll probably want to use it as a DLL.
