Slow Complex Builds & Hudson vs. Electric Cloud

Is Hudson the right tool for complex C++ builds?
I have a C++ build that takes about 4 hours. Compile and packaging take about half the time and testing consumes the other half. Presently, we are using a home-grown system, but there's a push to move to Hudson since we use it for all of our Java builds.
My problem is that continuous integration isn't very...continuous at 4-hour intervals. I want a tool that will let me parallelize the build in an understandable way.
Hudson's been great for small builds or Java builds where I'm sitting at the top of a large Maven project, but I don't think it will scale well for complex C++ builds.
What have your experiences been?

Seems like you have a few questions here:
Should I use a CI server to manage my C++ build? The answer to this is unequivocally YES. Your homegrown system may be adequate, but it's not standard, extending it is probably difficult, and maintaining it is a distraction from the work you're actually paid to do.
Is Hudson the right choice for my project? It will probably get the job done, and it has the advantage of being in deployment at your site already. However, you specifically mention that you want a tool that supports parallelization well, and I don't think that Hudson really fits the bill. The problem is that Hudson was not designed with parallelism in mind. See, the representation of a build process in Hudson is a "job", which is just a series of steps executed in sequence -- checkout, compile, test, package, etc. There's no way to get those steps to run in parallel. Now, you can get around this by modeling your process with multiple jobs. Each job is completely independent, so of course they could be run in parallel; you can use something like the Locks and Latches plugin to coordinate the jobs, but the whole process is more complicated than it ought to be, and kind of clumsy -- instead of a single job representing a single run of the build process, you have several unconnected jobs, at best tied together via naming convention.
Can Electric Cloud help? Again, an unequivocal YES. Electric Cloud offers ElectricCommander, a CI server with parallel support built in from inception. As with Hudson, a job is used to represent a build process, but the steps within a job can easily be run in parallel (just check the "parallel" box on those steps), so you don't have to resort to add-ons and kludges: one run of the build process is one job, with as many parallel steps as you like.
Will the right CI server put "continuous" back into my integration? A CI server will only get you so far. The thing is, a CI server can provide you coarse-grained parallelism -- so with a little work, you can set it up to run packaging in parallel with tests, for example. With a little more work, you can probably split your test phase into a few independent pieces that can be run in parallel.
You didn't give many details, but let's assume that your build is 90 minutes of compile, 30 minutes of packaging, and 2 hours of tests that can be broken down into four 30 minute pieces. Suppose further that you can do packaging and testing simultaneously. That would bring your 4 hour process down to 2 hours total. At this point the "long pole" in your process is the compile phase, and although you might be able to break that up by hand into pieces that can be run in parallel by your CI server, the truth is that the CI server is just not the right tool for that job.
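To make that concrete, here's a minimal sketch of one way to slice the tests across four CI jobs. The directory layout, the naming convention, and the CHUNK variable (0-3, set differently in each job) are all assumptions for illustration:

    #!/bin/sh
    # Run one quarter of the test binaries, selected by $CHUNK (0-3).
    set -e
    for t in $(find tests -type f -name '*_test' | sort \
               | awk -v n=4 -v k="$CHUNK" 'NR % n == k'); do
        "$t"
    done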
A better option is to use a build tool that can give you automatic fine-grained parallelism within the compile phase. For example, if you're using gmake already, you can try gmake -j 8 to run 8 compiles at once. If your makefiles are clean and your dependencies are all correct, and you have a beefy build server, this could give you a pretty good performance boost. You could also use ElectricAccelerator, another product from Electric Cloud, which was specifically designed to accelerate this portion of the build process, even for builds that can't safely use gmake -j due to incorrect or incomplete dependencies.
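For example, a sketch of the kind of invocation I mean (assumes GNU make and coreutils' nproc; --output-sync needs make 4.0+ and just keeps interleaved logs readable):

    # One compile job per core; group the log output per target.
    gmake -j"$(nproc)" --output-sync=target all

    # On a shared build server, also cap the load average:
    gmake -j"$(nproc)" -l"$(nproc)" all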
Hope that helps.

Can you not split the build into multiple parts whatsoever?
You do mention that the job has several distinct parts. The general guidance with Hudson is to do the build part in one job, testing in another, packaging in another, and so on.
You can compile the code in Job A and archive the output, then tell Job B to copy those artifacts from Job A and run the tests on them. Meanwhile, another Job A build can be kicked off due to further commits to the source repository.
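A rough sketch of that wiring (job names are placeholders; the copy step uses the Copy Artifact plugin):

    # Job A (build):
    #   Build step:  compile and package
    #   Post-build:  "Archive the artifacts" -> build/output/**
    # Job B (test):
    #   Build step:  "Copy artifacts from another project" -> project: Job-A,
    #                which build: latest successful
    #   Then run the test suite against the copied binaries.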

Sounds to me like the problem is with your build process (makefiles? MSBuild?) and not Hudson. Hudson will simply execute the build process the same way a user would from a command line. Is it possible to optimize your build process?
Even if a 4 hour build process is unavoidable, Hudson can help because you can attach an unlimited number of slave machines which can all be running multiple builds in parallel, given adequate hardware horsepower.


GNU Parallel host sticky jobs

I am writing a parallel build farm to build C++ cross-platform applications against various platforms / environments. Every time new code is pushed to a git repo, I build and test the latest code against all the platforms.
I've set up parallel to correctly distribute the jobs among several hosts using the --sshlogin option.
I transfer files, collect output and results. It's all working more than fine and I love the tool.
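For context, a minimal sketch of the kind of invocation I'm running (host names, slot counts, and the build script are invented; --trc transfers the input file, returns the named result, and cleans up -- the script is assumed to write that .log file):

    parallel --sshlogin 8/host1,8/host2,8/host3 --trc {}.log \
        ./build-and-test.sh {} ::: platforms/*.cfg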
Since the build can take quite a long time on some platforms, I would like it to be as incremental as possible.
My only issue is that the build is only incremental if the scheduler sends a job to the same machine as before, reusing the artefacts of the previous build on that specific host.
Say I have 3 hosts: each build has a 1 in 3 chance of being incremental. If a host hasn't built a given platform in a while, that build might take a long time.
Is it possible to gain control over which host a specific input source will run on, and only fall back to the other hosts if that host is busy?
Ideally, I would love to see a tag system where I tag an input source with a name and tag several hosts with a name, creating pools of jobs and pools of machines specialized in that type of build.
But a very simple implementation where the input sources are distributed in the same order as the order the sshlogins are defined could be a simple & quick fix in my situation.
I tried to find the source code to implement it myself but I only see doc generation when I browse the code on Savannah.
Any ideas?
Thanks,
M
There is currently no support for prioritizing a given argument to a given sshlogin. The source code is at https://savannah.gnu.org/git/?group=parallel
Feel free to join the mailing list and discuss the idea: https://lists.gnu.org/mailman/listinfo/parallel
The only prioritization in the code today is that when a job has failed on an sshlogin, GNU Parallel prefers to retry that job on another sshlogin. Maybe that could be extended?
If a job is marked as having failed -1 time for a given sshlogin, then GNU Parallel ought to prefer to run the job on that sshlogin.
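For reference, that failure-driven behaviour is reachable from the command line today via --retries (a sketch; hosts and paths invented):

    # Retry each failing job up to 3 times; on retry, another sshlogin is preferred.
    parallel --retries 3 --sshlogin 4/host1,4/host2 ./build.sh {} ::: targets/*.cfg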
I tried to discuss this idea on the mailing list as you suggested but never got any response in more than 10 days... I guess you must be busy with other things at the moment. So I went ahead and forked the source code to make the necessary changes and make my solution work.
I pushed it there a week ago:
http://michakfromparis.github.io/gnu-parallel-sticky/
the source code is available on github here:
https://github.com/michaKFromParis/gnu-parallel-sticky
It wasn't exactly easy without any guidance, as the source code has a lot of history, so I tried to keep the changes surgical to ease merging of your future releases.
I've been using it in production for more than a week now and it works perfectly in my configuration.
It is also backward compatible and should be a drop-in replacement for usual parallel uses, with the extra features on the side.
Would love to get feedback from other users though as it might not be completely dry.
Thanks for sharing the original source code.
Best Regards,
M

Implementing single script build - non-portable dependencies

It seems that a build-system best practice is to have a single script that can build all the source and package the releases. See Joel Test #2.
How do you account for non-portable dependencies? For example, if you code for .NET 4, then you need .NET 4 installed on the box. The standard Microsoft release of .NET 4 is not xcopy-deployable (unless I'm mistaken?). I can see a few avenues:
1. The dependencies are clearly stated in some resource file (wiki, txt, whatever). When you call the build script, the build fails if you don't have a dependency installed. This is an acceptable outcome.
2. The build script is responsible for setting up the environment. So if you require .NET 4 and it's not on the box, the script installs it for you.
3. A flavor of #2: instead of installing dependencies, the script spawns a pre-packaged image (virtual machine, Amazon EC2 AMI) that is set up with all dependencies.
4. ???
When implementing a build script you have to ask yourself how much work you want to (or can) spend on it. This leads to the question of how often you have to set up the build environment. I can see that #2 would be the perfect solution, but it would need a lot of work, since usually you have more than one non-portable dependency.
So we use #1, and it works quite well. The most important thing is that the build script starts with some sort of self-test. It looks for everything that is needed to build the whole software and gives an error if something is not found -- a clear error message, so that any new guy knows what to do to get it running. Of course, as with a lot of software, it is nearly never finished and gets extended as needs arise. The drawback that this test takes a few seconds is insignificant when the whole build process takes many minutes.
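The self-test can be as simple as a loop over the required tools at the top of the script; a minimal sketch in shell form (the tool names are examples, not our real list -- the same pattern works in a batch file):

    #!/bin/sh
    # Fail fast, with a clear message, if a build dependency is missing.
    for tool in msbuild nuget signtool; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "ERROR: '$tool' not found -- see the setup notes in the repo." >&2
            exit 1
        }
    done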
A wiki (or anything else) documenting the setup was not a good solution for us, since after three months nobody knew where it was, whereas the build script is used every day.
The build script itself is a collection of many different things, chosen as needs arose. It starts with a batch file (we are using Windows) which invokes a lot of other things: other batch files, MSBuild, home-grown tools. Each step checks for its own dependencies, so problems stay local and you can see three lines later why a particular thing is needed.
Number 2 asks "Can you make a build in one step?" As described, this means that for a development team to be effective, the build process must be as simple as possible, to reduce errors and ensure consistency. This is especially important as a team gets larger. You want to make sure everyone is building the same thing. (What is done with that package should also be simple, but it is not as important, IMHO.) MSBuild is great at this; it provides the facilities to set up a build server that accesses the source control system independently, so the developers' actions can't corrupt the build environment. I highly recommend setting up a build server using TFS -- many build issues will go away and you will have the one-step build Joel describes.
As for what that package does for deployment -- you have many options with MS, but the more "one click" you can make it, the better. I believe this is slightly different from Joel's #2. In his example he describes changing which software he uses for the install, not because one has fewer steps, but because one can be incorporated into a one-step build.

buildbot vs hudson/jenkins for C++ continuous integration

I'm currently using Jenkins/Hudson for continuous integration of a large, mostly C++ project. We have separate projects for trunk and every branch. Also, there are some related projects for the Java code, but the setup for those is fairly basic right now (we may do more later, though). The C++ projects do the following:
Builds everything with options for whether to reconfigure, do a clean build, or use a fresh checkout
Optionally builds and runs all tests
Optionally runs all tests using Valgrind's memcheck
Runs cppcheck
Generates doxygen documentation
Publishes reports: unit tests, valgrind, cppcheck, compiler warnings, SLOC, open tasks, and code coverage (using gcov, gcovr, and the cobertura plugin)
Deploys code nightly or on demand to a test environment and a package repository
Everything is configurable for automatic builds and optional for on-demand builds. Underneath, there's a bash script that controls much of this, which in turn depends on our build system, which uses automake and autoconf along with custom bash scripts.
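For reference, the coverage step in the list above can be a single command once the tests have produced gcov data; a sketch (paths assumed):

    # Produce a Cobertura-style XML report for the cobertura plugin to pick up.
    gcovr -r . --xml -o coverage.xml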
We started using Hudson (at the time) because that's what the Java guys were using and we just wanted nightly builds. Since then, we've added a lot more and continue to add more. In some ways Hudson is great, but certainly isn't ideal.
I've looked at other solutions and the only one that looks like it could be a replacement is buildbot. Would buildbot be better for this situation? Is the investment worth it since we're already using Hudson? Why?
EDIT: Someone asked why I haven't found Hudson/Jenkins to be ideal. The short answer is that everything can be improved. I'm simply wondering if Jenkins is the best current solution for my use case or whether there is something better (buildbot?) that would be easier to maintain in the long run even as new requirements come up.
Both are open source projects, but you do not need to change buildbot's code to "extend" it; it is actually quite easy to import your own packages in its configuration, in which you can sub-class most of the features with your own additions. Examples: your own compilation or test code, some parsing of outputs/errors to be passed to the next steps, your own formatting of alert emails, etc. There are lots of possibilities.
Generally I would say that buildbot is the most "general purpose" automatic build tool. Jenkins, however, might be the best at running tests, especially for parsing and presenting results in nice ways (results, details, charts... a few clicks away), things that buildbot does not do "out of the box". I'm actually thinking of using both to have sexier test result pages. :-)
Also, as a rule of thumb, it should not be difficult to create a new tool's config: if the specification of what to do (configs, builds, tests) is too hard to switch from one tool to another, it is a (bad) sign that not enough of the configuration scripts have been moved into the sources. Buildbot (or Jenkins) should only call simple commands. If it is simple to run the tests, then developers will do it as well, and this will improve the success rate; whereas if only the continuous integration system runs the tests, you will be chasing it to fix new code failures, and will lose its non-regression value. Just my 0.02€ :-)
Hope it'll help.
The 'result integration' is also in Jenkins/Hudson, and you can relatively easily capture build products without having to 'copy them elsewhere'.
For our instance, the coverage reports, unit test metrics, and javadoc for the Java code are all integrated. For our C++ code, the plugins are a little lacking, but you can still get most of it.
We have run buildbot since before 0.7, and are now running 0.8. We are only now seeing any real reason to switch, because buildbot 0.8 forgot about Windows slaves for an extended period of time and the support was pretty poor.
There are many other solutions out there, besides Jenkins/Hudson/BuildBot:
TeamCity by Jetbrains
Bamboo by Atlassian
Go by Thoughtworks
Cruise Control
OpenMake Meister
The specifics about what you are doing are not so important, in fact, as long as the agents (aka nodes) that you are doing them on support those tasks.
The beauty of a CI server is in noticing when the source changes, triggering a new build (and test), publishing the artifacts, and publishing the test results.
When you compare CI tools like those we mentioned, consider features like the usability of the interface, how easy branching is (and features it might offer, like automatic merging), notifications (like XMPP/Jabber), or an information radiator (like hooking up a monitor to always show status). Product support is another thing to consider -- Jenkins' support is only as good as whoever is responding to community questions at the time you have them.
My personal favorite is Bamboo, but it comes with a license fee.
I'm a long-time Jenkins user in the middle of evaluating Buildbot and would like to offer a few items for folks considering using Buildbot for multi-module solutions:
*) Buildbot doesn't have any out-of-the-box concept of file artifacts related to each build. It's not in the UI, and it's not in any of the built-in "steps" modules as far as I can see:
http://docs.buildbot.net/current/manual/configuration/buildsteps.html
...and I see no third party plugin:
https://github.com/buildbot/buildbot/wiki/PluginList#steps
Buildbot does collect all the console output from a given build, but critically, you can't collect files related to it.
*) Given that artifacts are not supported, it's not easy to create "collector" projects that bring multiple modules into, say, a single installer. Jenkins has a great feature that lets you parameterize a build with builds from other modules (the parameter type is a run).
*) Establishing dependencies between modules is trickier in Buildbot. Say you have a library that three binaries depend on, and you want those binaries to rebuild each time the library changes. Jenkins has triggers built into the UI. If you want triggers in Buildbot, you have to script them using schedulers.Dependent, and that causes a lot of item congestion in the Schedulers UI.
*) When you're working in Buildbot, it seems that pretty much all of the configuration is done in master.cfg in code. This is awesome and frustrating.
*) Buildbot forces you to create a worker in addition to a master server. This is annoying for beginners and systems for which a single build server is sufficient.
My impression after two days of Buildbot evaluation is that we'll stick with Jenkins, primarily because it has artifact support. Buildbot is a tool we'd only use if we had more extensive customization needs, and the time to do it.
On the subject of buildbot and artifacts -- I don't have enough reputation to comment -- you can get artifacts from the buildbot 2.x series pretty easily with the built-in file/directory upload actions. However, you rarely want to just move files. Typically you add a triggered buildstep that does deployment directly off the worker for best results, e.g. pushing to cloud storage, containers, third parties (Steam uploads), etc.
This way you can get metrics on the uploads and control them conditionally with more precision (or even mix and match artifacts across worker machines).

Hudson - different build targets for different triggers

I would like to have different build targets for periodic builds and for those that are triggered by polling SCM.
More specifically: the idea is that nightly builds should call 'mvn verify', which includes integration tests, while a normal build calls 'mvn test', which executes just the unit tests.
Any ideas how this can be achieved using Hudson?
Cheers
Chris
You could create two jobs -- one scheduled and the other polled.
In the scheduled job you can specify a different Maven goal from the polled one.
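A sketch of what that looks like (job names and schedules invented; both triggers use cron-style syntax):

    # Job "myapp-ci":      Poll SCM            */15 * * * *    Goals: test
    # Job "myapp-nightly": Build periodically  0 2 * * *       Goals: verify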
The answer by Raghuram is straightforward and correct. But you can also have three jobs: the first two do the triggering and pass the Maven goal as a parameter to the third job. That sounds like a lot of clutter, and to a certain point it is. But it will help if you have a lot of configuration to do (especially if the configuration needs to be changed regularly), because it keeps the configuration identical for both triggers. Configuration does not only include the build steps but also the harvesting of all reports, post-build cleanup, notifications, triggering of downstream jobs, ... Another advantage is that you don't need to synchronize the two jobs to keep them from running in parallel (if that causes problems).
Don't get me wrong, my first impulse would be to go with two jobs, which has its own advantages. The history of the nightly build will contain the whole day (actually, everything since the last nightly build) and not only the time since the last build (which could be a triggered one). Integration tests usually need a more extensive setup or access to scarce resources; with two jobs you don't block these resources when you run the test goal. In addition, I expect that more test results need to be harvested, displayed, and tracked over time by Hudson, and you might also want to run more metrics against your code whose results should be displayed by Hudson. The disadvantage is that you of course need to keep the build steps basically the same all the time.
But in the end it is a case-by-case decision whether you go with 2 or 3 jobs.

How do you get up and running with a build server?

I think everyone here would agree that in order to be considered a professional software house there are a number of fundamental things you must have in place.
There is no doubt that one of these things is a build server; the question is how far you need to go.
What are the minimum requirements for the build server? (Somewhere to just compile?)
What is the ultimate goal for your build server? (Scheduled, source control integration, auto deployment to test / live servers)
Where is a good place to start assuming you have nothing at the moment?
It would be great if we could list a few simple tasks that an amateur developer could take on in order to get on the right track to a fully functional build server.
It would also be good to hear from people who feel they have a "complete" setup that performs all the functionality they require, and how they went about setting it all up from scratch.
You can start by looking into Cruise Control.
There's also CruiseControl.net if that's your poison.
Essentially though, you need the following ingredients:
A dedicated environment (virtual machine/server). Don't use a developer's machine, unless it's just you. Even then, run a VM if you can -- it's much easier to move it to a server when/if one becomes available in your organisation.
A source control system that supports labelled/tagged revisions (for example, Subversion+TortoiseSVN)
Build scripts. These can be batch files that start devenv.exe or msbuild.exe with a command line (see the sketch after this list), or you can use something like Ant or NAnt.
In this scenario, CruiseControl acts as the Continuous Integration server and can make sure that builds are done as you check in your code. This means you know more quickly whether the build is broken than if you just had nightly builds. You should probably also have nightly builds, though.
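As a concrete example of the build-script ingredient above, the script can start life as a single command (solution name and configuration are placeholders):

    # Rebuild the whole solution in Release; /m builds projects in parallel.
    msbuild MySolution.sln /t:Rebuild /p:Configuration=Release /m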
Hudson is a great CI server.
We run a farm locally, but we started by downloading hudson.war and doing

    java -jar hudson.war

It integrates with SCM and bug tracking systems; it is really awesome.
You'll need some disk space if you want to keep old builds.
Enjoy -- it is the most straightforward CI solution so far.
HTH,
Hubert.
If you're using Cruise Control, the place to start is an Ant build.xml that does the job manually.
You need a version control system that can do labeled check-outs.
You need JUnit tests that run via the Ant task and generate HTML reports.
I'd say you'd have to start by implementing a build strategy so you can build your code in a structured way -- I use NAnt.
For a basic build server, use one of the CI offerings out there that monitors your source control and triggers a build whenever a change is detected, e.g. CruiseControl.
Once you have the basic build together, add the running of your unit tests after a successful build.
The most successful system I've had in place had 3 different builds:
- one that fired on a check-in -- all this did was build the code
- an on-demand one that would build the application, generate the installer, and then put the installer onto a shared drive for the testers to pick up
- a daily build that fired at 10pm. This:
  - ran some code generation to build DB and C# code from a UML model
  - built the code
  - created a new build-verification-test user on a test Oracle instance
  - ran the application schema into the DB
  - fired off a bunch of unit tests
  - cleaned up the DB user (if the tests were successful)
  - ran coverage analysis to build a report of the unit code coverage
Software we used for this was NAnt, CruiseControl.NET, a custom code generation system, a custom app to build an Oracle schema, and NCover for the coverage analysis.
Start by having a read of Martin Fowler's excellent paper on Continuous Integration.
We built such a system for a major project (>2,000 kSLOC) and it proved itself invaluable.
HTH
cheers,
Rob
Cruise, Maven, Hudson, etc. are all great, but it's always worth having a stopgap solution.
You should have a batch file, shell script, or simply written instructions that allow you to run a build from any machine. We have had build servers become unavailable in the past, and the ability to switch quickly to another machine was invaluable!
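Such a stopgap can be almost trivial; a sketch, assuming Subversion and a make-based build (URL and targets invented):

    #!/bin/sh
    # Reproduce the build on any machine that has the toolchain installed.
    set -e
    svn checkout http://svn.example.com/myproject/trunk myproject
    cd myproject
    make clean all test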
The spec of the build machine isn't that important unless you have a monster project. We try to keep our build times under 10 minutes (including unit tests), and we have a pretty big project.
Don't be tempted to create or write your own build system because "none of the tools out there are good enough". All modern build systems allow you to write plugins to do custom stuff.
I'm using CruiseControl.NET and an MSBuild buildscript.
I can use the buildscript manually to get the latest version of the codebase and build it very easily from the command line. (This is very useful if you are working on an application that consists of multiple solutions.)
Next to that, my CruiseControl.NET build server uses this buildscript as well. It checks at a regular interval whether changes have been committed to source control.
If that happens, CC.NET performs the 'get-latest' task that I've defined in the buildscript, builds everything, executes the unit tests, and performs static code analysis (FxCop).
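The static-analysis step is likewise just one more command in the buildscript; a sketch (assembly and report names invented):

    # Analyse the built assemblies and write an XML report for CC.NET to publish.
    FxCopCmd.exe /file:build\bin\MyApp.dll /out:fxcop-report.xml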
My 'build server' is just an old workstation. It's a P4, 3 GHz with 1 GB of RAM, and it does its job perfectly.
One additional thing that I would find interesting is the ability to automatically deploy a new version or build a setup.
I haven't done that yet, since I'm not sure whether it is a good idea, nor have I found a good strategy to do so...
I mean, is deploying a new version of some components into production for a mission-critical application a good idea? I don't think so...
I think this is a good place to start: the CruiseControl home page at http://confluence.public.thoughtworks.org/display/CC/Home
At least, that's where I started looking when setting up my build server. :)
Roughly in order, from minimal/least sophisticated to more sophisticated:
able to get a specific set of source onto any machine
able to build that source (with no problems)
able to schedule builds each night, or at some other defined interval, with no user intervention
one (or more) dedicated build servers (not shared as QA or dev machines)
able to do a build after each check-in/commit
able to notify interested parties of the build status after a build
able to provide build status at any time
able to create installers as part of the build
able to deploy/go live if the build is good
able to run unit tests
able to run tests on the product
able to report the results of those tests
able to run static code analysis and reporting
...
And the list goes on and on
Don't be afraid to just start with batch files, shell scripts, or other ad-hoc means. People made perfectly good software before the CI craze; there were plenty of good processes before Hudson and CruiseControl. (I am not knocking those or others -- I use Hudson among others -- but don't miss the point: these things are here to help you, not to become an overbearing process.)
I couldn't give you all the details about how we set our build server up (I was only involved at the start), but:
We started with an in-house system, implemented in ASP.NET and a .NET Windows Service, using NAnt to do the actual builds. Actually, most of the workflow was implemented in NAnt (e.g. emailing people, copying stuff around, etc.).
We moved to JetBrains TeamCity (there's a free cut-down version available), which is still serving us well.
We use it for builds triggered by a commit: these just build the binaries and run the unit tests. From here, we can do a complete build, which does the MSI as well. From there, we have system test builds that run more in-depth tests, across an environment built with virtual machines (with a separate domain controller, SQL Server box, etc.). When the system tests pass, the build is made available to our QA department for manual testing and some regression tests that we've not automated yet.
In the Java space I've tested most of the available build environments. The issue with automatic builds is that you quite often end up spending a fair amount of time looking after them. After we switched to the commercial Bamboo from Atlassian, we found that we have to spend a lot less time pampering the build box, which in our case turns out to be very good economy. Bamboo also supports clustering, so you can add inexpensive boxes as needs evolve.
Try to find something that fits in with your existing build practices -- for example, it's not going to be a good fit to use an Ant-based build server if you're using Maven!
Ideally, it should just be able to monitor your source control system, check out the code, build, run some tests, and publish the results without you being aware of it -- or at least not till it reports a failure. Personally, I'd suggest Hudson (https://hudson.dev.java.net/) as a good starting point, as it's easy to get installed and running and has a decent UI.
We start by writing batch scripts that will run on the developers' machines. Once we have all the processes automated, we move them to the build server.
On the tools side, we are currently moving from CruiseControl to TFS.