Safe Unity Builds

Safe Unity Builds - c++

I work on a large C++ project which makes use of Unity builds. For those unfamiliar with the practice, Unity builds #include multiple related C++ implementation files into one large translation unit which is then compiled as one. This saves recompiling headers, reduces link times, improves executable size/performance by bring more functions into internal linkage, etc. Generally good stuff.
However, I just caught a latent bug in one of our builds. An implementation file used library without including the associated header file, yet it compiled and ran. After scratching my head for bit, I realized that it was being included by an implementation file included before this one in our unity build. No harm done here, but it could have been a perplexing surprise if someone had tried to reuse that file later independently.
Is there any way to catch these silent dependencies and still keep the benefits of Unity builds besides periodically building the non-Unity version?

I've used UB approaches before for frozen projects in our source repository that we never planned to maintain again. Unfortunately, I'm pretty sure the answer to your question is no. You have to periodically build all the cpp files separately if you want to test for those kinds of errors.
Probably the closest thing you can get to an automagic solution is a buildbot which automatically gathers all the cpp files in a project (with the exception of your UB files) and builds the regular way periodically from the source repository, pointing out any build errors in the meantime. This way your local development is still fast (using UBs), but you can still catch any errors you miss from using a unity build from these periodic buildbot builds which build all cpps separately.

I suggest not to use Unity Build for your local development environment. Unity Build won't help you improve compile time when you edit and compile anyway. Use Unity Build only for your non-incremental continuous build system, which you are not expect to use the production from.
As long as every one commit changes after they have compiled locally, the situation you described should not appear.
And Unity Build might form unexpected overload calls between locally defined functions happen to use with duplicated names. That's dangerous and you are not aware of that, i.e. no compile error or warning might generated in such case. Unless you have a way to prevent that, please do not reply on the production generated by Unity Build.

Related

Prevent BizTalk projects from invoking a full rebuild?

I'm on a BizTalk 2013 solution and I'm trying to grow into automated testing. However, when I try to run my tests after changing only the test project, or even just run the tests after changing nothing anywhere, I'm stuck building the same amount of projects that I build when I invoke a full rebuild on the project being tested. This eats up an enormous amount of time, and it's a death sentence for my ability to sell future investments into this type of thing.
Is this is a known deficiency with BizTalk, or with its interaction with MSBuild? Is it a known pitfall that I can repair on my end?
EDIT: After reviewing the "possible duplicate" thread, I believe this question to be similar, but distinct. The explanation from the thread highlights the mechanics by which MSBuild determines that a rebuild is necessary, but MSBuild is widely-used technology across all projects in Visual Studio and can differ significantly by project type based on that project type's specific targets import. I've edited the question title to reflect that I want to learn how to prevent this for BizTalk solutions rather than simply asking why it's happening (although knowing why is always helpful).

So, what you're seeing is not a problem with BizTalk (because BizTalk is perfect and wonderful and never has any problems ever...:).
It's actually a behavior of Visual Studio. To note, BizTalk Projects are just specialized c# Projects.
The best workaround, which I do all the time, is to uncheck the Build and Deploy options for Projects I'm not actively working with in the Solution Configuration. If the Project is not checked for Build, it will not build even when you choose Rebuild Solution.

One possible solution would be to reference not the projects, but the DLL files which are the result of the same - already compiled and built - projects.
This way, when building your test project, it would be built against these existing assemblies and hence would not take the time to rebuild those.
You have to make sure however that these DLLs are updated whenever the project behind them also updates. You could do this by rebuilding them, whenever necessary, in a separate Visual Studio instance.
It takes some practice and thinking to make sure you are building against the latest version, but it WILL save you a lot of time.

I've noticed this as well. Turning on diagnostic output on MSBuild, it turned out that the project settings .user files were being modified after the .pdb files. I've tried several ways of resolving this, including changing the modify date on the pdb file, setting the .user file to readonly, removing (renaming) the .user file, etc.
Unfortunately, the build task for BizTalk will overwrite/recreate/create new .user file after every build, and I haven't come up with a way to convince MSBuild that that it can just ignore the .user file being created as new. Due to that, I'd go with one of the other suggestions here.
Even creating an exclusive lock on the file so that MSBuild can't update it causes a rebuild, since then MSBuild thinks the build is dirty ("Project 'Schemas' is not up to date. Project dirty in MSBuild.")

C Compiler automatic Version increment

I'm wondering if there's a macro or a simple way to let the compiler increment either major, minor or revision of my code each time when I compile?
By the way I'm using the ARM compiler and uVision from Keil.

To set the version is not a compiler topic. This should be done in connection with a source code control / version control system like cvs/svn/git or others. Your build id should be connected to the content of your source code database to get reproducible builds from checkouts from your version control system. Or, if your code is not already committed to your database a dirty-tag should be provided and compiled in to give the user of the software a chance to see that this is not a controlled version.
Simply counting a value in a variable can be done by a Makefile or in pre- and post-build instructions which depends on the used IDE. Sorry, for keil I have no experience...

Define Post-Build Event to run small external program. The program has to modify specific .h file. In the header file define macros like VER_MAJOR, VER_MINOR, VER_BUILD. Date/time string can also be updated. I use this method and can control the version numbers as I wish.

IMO, you do not need to do this, especially increase a number every time you compile your code.
Set the major and minor revision manually in a header file; you should not have to do this often.
Build number should only be related to the source control revision number (i.e. you should be able to build and rebuild any revision under source control).
Imagine if you are a team of 5 developers, and everyone build and rebuild on their side what is the actual build number?
do they all update a header file ?
who is responsible of owning that header file ?

Some compilers do support features like "post build", which runs a program of your choice after compiling, but that would be tricky if your program is built from multiple source files. Not all compilers do though.
For that reason, I wouldn't actually do this sort of thing via the compiler. I'd do it in the build script (e.g. makefile) or by configuring the build settings in your IDE.
Assuming you're using make or similar, add a target something like setversion (you pick the name) that runs a program which modifies a header file which specifies the components of your version number. So typing make setversion will update your version numbers.
Optionally, that target can also - after updating the version numbers - do a make clean (i.e. delete all object files and executables) and make all (to recompile and link everything).
I also suggest avoiding changing version numbers after each recompile. Imagine you're busily testing and debugging code, and go through several rebuild cycles. Do you really want the version number updated every time you recompile even one source file? It can be done that way if you choose, but will make every rebuild take longer (and, in projects with multiple source files) you will need to take care if you want to preserve capability for incremental builds.

What is the difference between compile code and executable code?

I always use the terms compile and build interchangeably.
What exactly do these terms stand for?

Compiling is the act of turning source code into object code.
Linking is the act of combining object code with libraries into a raw executable.
Building is the sequence composed of compiling and linking, with possibly other tasks such as installer creation.
Many compilers handle the linking step automatically after compiling source code.

From wikipedia:
In the field of computer software, the term software build refers either to the process of converting source code files into standalone software artifact(s) that can be run on a computer, or the result of doing so. One of the most important steps of a software build is the compilation process where source code files are converted into executable code.
While for simple programs the process consists of a single file being compiled, for complex software the source code may consist of many files and may be combined in different ways to produce many different versions.

A build could be seen as a script, which comprises of many steps - the primary one of which would be to compile the code.
Others could be
running tests
reporting (e.g. coverage)
static analysis
pre and post-build steps
running custom tools over certain files
creating installs
labelling them and deploying/copying them to a repository

They often are used to mean the same thing. However, "build" may also mean the full process of compiling and linking a whole application (in the case of e.g. C and C++), or even more, including, among others
packaging
automatic (unit and/or integration) testing
installer generation
installation/deployment
documentation/site generation
report generation (e.g. test results, coverage).
There are systems like Maven, which generalize this with the concept of lifecycle, which consists of several stages, producing different artifacts, possibly using results and artifacts from previous stages.

From my experience I would say that "compiling" refers to the conversion of one or several human-readable source files to byte code (object files in C) while "building" denominates the whole process of compiling, linking and whatever else needs to be done of an entire package or project.

Most people would probably use the terms interchangeably.
You could see one nuance : compiling is only the step where you pass some source file through the compiler (gcc, javac, whatever).
Building could be heard as the more general process of checking out the source, creating a target folder for the compiled artifacts, checking dependencies, choosing what has to be compiled, running automated tests, creating a tar / zip / ditributions, pushing to an ftp, etc...

Partial builds versus full builds in Visual C++

For most of my development work with Visual C++, I am using partial builds, e.g. press F7 and only changed C++ files and their dependencies get rebuilt, followed by an incremental link. Before passing a version onto testing, I take the precaution of doing a full rebuild, which takes about 45 minutes on my current project. I have seen many posts and articles advocating this action, but wonder is this necessary, and if so, why? Does it affect the delivered EXE or the associated PDB (which we also use in testing)? Would the software function any different from a testing perspective?
For release builds, I'm using VS2005, incremental compilation and linking, precompiled headers.

The partial build system works by checking file dates of source files against the build results. So it can break if you e.g. restore an earlier file from source control. The earlier file would have a modified date earlier than the build product, so the product wouldn't be rebuilt. To protect against these errors, you should do a complete build if it is a final build. While you are developing though, incremental builds are of course much more efficient.
Edit: And of course, doing a full rebuild also shields you from possible bugs in the incremental build system.

The basic problem is that compilation is dependent on the environment (command-line flags, libraries available, and probably some Black Magic), and so two compilations will only have the same result if they are performed in the same conditions. For testing and deployment, you want to make sure that the environments are as controlled as possible and you aren't getting wacky behaviours due to odd code. A good example is if you update a system library, then recompile half the files - half are still trying to use the old code, half are not. In a perfect world, this would either error out right away or not cause any problems, but sadly, sometimes neither of those happen. As a result, doing a complete recompilation avoids a lot of problems associated with a staggered build process.

Hasn't everyone come across this usage pattern? I get weird build errors, and before even investigating I do a full rebuild, and the problem goes away.
This by itself seems to me to be good enough reason to do a full rebuild before a release.
Whether you would be willing to turn an incremental build that completes without problems over to testing, is a matter of taste, I think.

I would definitely recommend it. I have seen on a number of occasions with a large Visual C++ solution the dependency checker fail to pick up some dependency on changed code. When this change is to a header file that effects the size of an object very strange things can start to happen.
I am sure the dependency checker has got better in VS 2008, but I still wouldn't trust it for a release build.

The biggest reason not to ship an incrementally linked binary is that some optimizations are disabled. The linker will leave padding between functions (to make it easier to replace them on the next incremental link). This adds some bloat to the binary. There may be extra jumps as well, which changes the memory access pattern and can cause extra paging and/or cache misses. Older versions of functions may continue to reside in the executable even though they are never called. This also leads to binary bloat and slower performance. And you certainly can't use link-time code generation with incremental linking, so you miss out on more optimizations.
If you're giving a debug build to a tester, then it probably isn't a big deal. But your release candidates should be built from scratch in release mode, preferably on a dedicated build machine with a controlled environment.

Visual Studio has some problems with partial (incremental) builds, (I mostly encountered linking errors) From time to time, it is very useful to have a full rebuild.
In case of long compilation times, there are two solutions:
Use a parallel compilation tool and take advantage of your (assumed) multi core hardware.
Use a build machine. What I use most is a separate build machine, with a CruiseControl set up, that performs full rebuilds from time to time. The "official" release that I provide to the testing team, and, eventually, to the customer, is always taken from the build machine, not from the developer's environment.

What are the pros + cons of Link-Time Code Generation? (VS 2005)

I've heard that enabling Link-Time Code Generation (the /LTCG switch) can be a major optimization for large projects with lots of libraries to link together. My team is using it in the Release configuration of our solution, but the long compile-time is a real drag. One change to one file that no other file depends on triggers another 45 seconds of "Generating code...". Release is certainly much faster than Debug, but we might achieve the same speed-up by disabling LTCG and just leaving /O2 on.
Is it worth it to leave /LTCG enabled?

It is hard to say, because that depends mostly on your project - and of course the quality of the LTCG provided by VS2005 (which I don't have enough experience with to judge). In the end, you'll have to measure.
However, I wonder why you have that much problems with the extra duration of the release build. You should only hand out reproducible, stable, versioned binaries that have reproducible or archived sources. I've rarely seen a reason for frequent, incremental release builds.
The recommended setup for a team is this:
Developers typically create only incremental debug builds on their machines. Building a release should be a complete build from source control to redistributable (binaries or even setup), with a new version number and labeling/archiving the sources. Only these should be given to in-house testers / clients.
Ideally, you would move the complete build to a separate machine, or maybe a virtual machine on a good PC. This gives you a stable environment for your builds (includes, 3rd party libraries, environment variables, etc.).
Ideally, these builds should be automated ("one click from source control to setup"), and should run daily.

It allows the linker to do the actual compilation of the code, and therefore it can do more optimization such as inlining.
If you don't use LTCG, the compiler is the only component in the build process that can inline a function, as in replace a "call" to a function with the contents of the function, which is usually a lot faster. The compiler would only do so anyway for functions where this yields an improvement.
It can therefore only do so with functions that it has the body of. This means that if a function in the cpp file calls another function which is not implemented in the same cpp file (or in a header file that is included) then it doesn't have the actual body of the function and can therefore not inline it.
But if you use LTCG, it's the linker that does the inlining, and it has all the functions in all the of the cpp files of the entire project, minus referenced lib files that were not built with LTCG. This gives the linker (which becomes the compiler) a lot more to work with.
But it also makes your build take longer, especially when doing incremental changes. You might want to turn on LTCG in your release build configuration.
Note that LTCG is not the same as profile-guided optimization.

I know the guys at Bungie used it for Halo3, the only con they mentioned was that it sometimes messed up their deterministic replay data.
Have you profiled your code and determined the need for this? We actually run our servers almost entirely in debug mode, but special-case a few files that profiled as performance critical. That's worked great, and has kept things debuggable when there are problems.
Not sure what kind of app you're making, but breaking up data structures to correspond to the way they were processed in code (for better cache coherency) was a much bigger win for us.

I've found the downsides are longer compile times and that the .obj files made in that mode (LTCG turned on) can be really massive. For example, .obj files that might be 200-500k were about 2-3mb. It just to happened to me that compiling a bunch of projects in my chain led to a 2 gb folder, the bulk of which was .obj files.

I also don't see problems with extra compilation time using link-time code generation with the release build. I only build my release version once per day (overnight), and use the unit-test and debug builds during the day.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Safe Unity Builds - c++

Related

Prevent BizTalk projects from invoking a full rebuild?

C Compiler automatic Version increment

What is the difference between compile code and executable code?

Partial builds versus full builds in Visual C++

What are the pros + cons of Link-Time Code Generation? (VS 2005)

Categories

Resources