At what stage should coverity static analysis be done? - django

When should we do coverity static analysis (no build, buildless capture since we don't use compiled language) in our CI lifecycle? We have stages like test, build, deploy. What are the pros and cons of different approaches?
This is for a django application which is deployed onto kubernetes.
test stage involves testing django end-points.
build stage involves building a docker container.
deploy stage involves rolling out the recently built docker image.
If I were to create a new stage when should it be done? Any convention followed while doing this?

Deciding where to put certain checks in your build pipeline is a matter of what you want to get out of those checks.
A build pipeline should give you fast feedback first and foremost. You want to know as quickly as possible if there's anything significant that should stop your build from going out to production. That's why you tend to move checks that run fast to the earlier stages of your pipeline. This way you quickly check whether it's worth it to move on to the slower, more cumbersome steps of your pipeline.
If your static code analysis detects issues, do you want to fail the build? If so, this might be an indicator to put this step early into your pipeline.
How long does your static code analysis take to analyse your codebase? If it's a matter of a few seconds, you can put it into an early stage into your pipeline without thinking too much about it. If it takes significant time to build (maybe tens of seconds or even minutes) this is an indicator that you should move this to a later stage so that other, faster checks can run first.
You can but don't have to put static code analysis into one of your existing (test, build and deploy) stages but there's no one stopping you from creating a dedicated stage in your pipeline for that (verification maybe?).
There's no reason to be dogmatic about this. It's valuable to experiment and see what works for you. Putting emphasis on fast feedback is a good rule of thumb to come up with a build pipeline that doesn't require you to watch the build for 20 minutes only to see that you made an indentation error on line 24.

Related

Build only changed item or build all ,which is preferred in continuous integration?

Our software is split into multiple components.
Msbuild scripts automate our build and batch scripts to invoke it.
We are taking the daily build of our components even if we have small changes.
We want to move to continuous integration so that whenever a check-in happens a build is triggered.
Our msbuild scripts are written in such a way that it will build all the sln files for the component.
In continuous integration, do I only need to build the slns that are modified?
If I only build the changed items, do I have to write a msbuild for each sln?
Can I simply use the existing msbuild script in teamcity?
I have done both, and there are advantages and disadvantages to both:
Incremental Build (only what's changed)
This is the default behavior of MSBuild. Even when you build in Visual Studio, it only builds what changed (unless you choose "Rebuild All").
Pros
Faster The faster your feedback loop, the better.
Less load on build server disk and network since you don't have to check everything out of source control each time. This may be important in high load build clusters.
Cons
Possible Source Control Hiccups We have had issues where a complicated rename or restructuring of our source tree caused the checkout to fail. We were using subversion, so it was the update that failed. Had we been doing a clean checkout, this would not have happened (of course a clean checkout means we must do a full rebuild).
Chance of false positives. We had a case where we weren't completely wiping the build agent's source files and checking out fresh copies from source. Someone changed the build file such that it didn't copy the binaries properly before testing them, but since the old binaries were still there on the disk, it was running the tests from them. The build was broken, but it was running old binary tests, so we didn't realize we had introduced bugs until I spotted the problem a week later.
Full Checkout and Rebuild
Pros
More Robust You rule out source control update issues and the chance of false positives becuase you are starting with a clean slate each and every time. This removes the possiblity that a previous build could affect this build.
Cons
Much Slower This involves waiting for a full checkout from source, which may take a long time if the project is large. Plus it requires building the entire source tree from scratch.
Higher Load on Build Agent, Disk & Network Because you are checking out and rebuilding everything, you will tax the cpu and disk of the build agent more, and also of the network (and your source control system as well).
Third Option: Incremental Checkout and Rebuild
This is the case where you only pull incremental changes from source control, but perform a full rebuild.
Pros
Faster than full checkout and is actually only slightly slower than incrementally building. Build times are usually small compared to the time it takes to do a full checkout, or run your tests.
Somewhat more robust since you rebuild all the source, but false positives can still sneak in.
Cons
Possible Source Control Hiccups See my comments on the Incremental Build.
Chance of false positives See my comments on the Incremental Build.
Which is best?
That depends on the balance between your need for fast feedback (speed), and the resource constraints on your network and build agent(s). If you had lots of resources and wanted both the fast feedback and the robustness of a full rebuild, you could build twice. Perhaps on checkin, do an incremental build, but nightly perform a full checkout and rebuild.
Both, sort of.
We do a scheduled daily full build, and the incremental during the day (gated checkin).
All depends on how big and complex your build is really, but you should err towards full if you are going for one or the other.
Whatever your granularity of build is, my bias would be to do a full, clean build of that whole component. If that build time is excessively long for a CI loop, consider breaking that component into smaller parts that are dependent on each-other's binaries. In that case, you would build what changed, and any other components that are dependent on the component that changed. In the .Net world, I'm hearing that Nuget is getting increasingly popular for managing those binary dependencies.
I talk a lot more about this strategy in Practice 2 of my CI pain relief series.

How can I guarantee all unit tests pass before committing?

We've had problems recently where developers commit code to SVN that doesn't pass unit tests, fails to compile on all platforms, or even fails to compile on their own platform. While this is all picked up by our CI server (Cruise Control), and we've instituted processes to try to stop it from happening, we'd really like to be able to stop the rogue commits from happening in the first place.
Based on a few other questions around here, it seems to be a Bad Idea™ to force this as a pre-commit hook on the server side mostly due to the length of time required to build + run the tests. I did some Googling and found this (all devs use TortoiseSVN):
http://cf-bill.blogspot.com/2010/03/pre-commit-force-unit-tests-without.html
Which would solve at least two of the problems (it wouldn't build on Unix), but it doesn't reject the commit if it fails. So my questions:
Is there a way to make a pre-commit hook in TortoiseSVN cause the commit to fail?
Is there a better way to do what I'm trying to do in general?
There is absolutely no reason why your pre-commit hook can't run the Unit tests! All your pre-commit hook has to do is:
Checkout the code to a working directory
Compile everything
Run all the unit tests
Then fail the hook if the unit tests fail.
It's completely possible to do. And, afterwords, everyone in your development shop will hate your guts.
Remember that in a pre-commit hook, the entire hook has to complete before it can allow the commit to take place and control can be returned to the user.
How long does it take to do a build and run through the unit tests? 10 minutes? Imagine doing a commit and sitting there for 10 minutes waiting for your commit to take place. That's the reason why you're told not to do it.
Your continuous integration server is a great place to do your unit testing. I prefer Hudson or Jenkins over CruiseControl. They're easier to setup, and their webpage are more user friendly. Even better they have a variety of plugins that can help.
Developers don't like it to be known that they broke the build. Imagine if everyone in your group got an email stating you committed bad code. Wouldn't you make sure your code was good before you committed it?
Hudson/Jenkins have some nice graphs that show you the results of the unit testing, so you can see from the webpage what tests passed and failed, so it's very clear exactly what happened. (CruiseControl's webpage is harder for the average eye to parse, so these things aren't as obvious).
One of my favorite Hudson/Jenkins plugin is the Continuous Integration Game. In this plugin, users are given points for good builds, fixing unit tests, and creating more passed unit tests. They lose points for bad builds and breaking unit tests. There's a scoreboard that shows all the developer's points.
I was surprised how seriously developers took to it. Once they realized that their CI game scores were public, they became very competitive. They would complain when the build server itself failed for some odd reason, and they lost 10 points for a bad build. However, the number of failed unit tests dropped way, way down, and the number of unit tests that were written soared.
There are two approaches:
Discipline
Tools
In my experience, #1 can only get you so far.
So the solution is probably tools. In your case, the obstacle is Subversion. Replace it with a DVCS like Mercurial or Git. That will allow every developer to work on their own branch without the merge nightmares of Subversion.
Every once in a while, a developer will mark a feature or branch as "complete". That is the time to merge the feature branch into the main branch. Push that into a "staging" repository which your CI server watches. The CI server can then pull the last commit(s), compile and test them and only if this passes, push them to the main repository.
So the loop is: main repo -> developer -> staging -> main.
There are many answers here which give you the details. Start here: Mercurial workflow for ~15 developers - Should we use named branches?
[EDIT] So you say you don't have the time to solve the major problems in your development process ... I'll let you guess how that sounds to anyone... ;-)
Anyway ... Use hg convert to get a Mercurial repo out of your Subversion tree. If you have a standard setup, that shouldn't take much of your time (it will just need a lot of time on your computer but it's automatic).
Clone that repo to get a work repo. The process works like this:
Develop in your second clone. Create feature branches for that.
If you need changes from someone, convert into the first clone. Pull from that into your second clone (that way, you always have a "clean" copy from subversion just in case you mess up).
Now merge the Subversion branch (default) and your feature branch. That should work much better than with Subversion.
When the merge is OK (all the tests run for you), create a patch from a diff between the two branches.
Apply the patch to a local checkout from Subversion. It should apply without problems. If it doesn't, you can clean your local checkout and repeat. No chance to lose work here.
Commit the changes in subversion, convert them back into repo #1 and pull into repo #2.
This sounds like a lot of work but within a week, you'll come up with a script or two to do most of the work.
When you notice someone broke the build (tests aren't running for you anymore), undo the merge (hg clean -C) and continue to work on your working feature branch.
When your colleagues complain that someone broke the build, tell them that you don't have a problem. When people start to notice that your productivity is much better despite all the hoops that you've got to jump, mention "it would be much more simple if we would scratch SVN".
The best thing to do is to work to improve the culture of your team, so that each developer feels enough of a commitment to the process that they'd be ashamed to check in without making sure it works properly, in whatever ways you've all agreed.

buildbot vs hudson/jenkins for C++ continuous integration

I'm currently using jenkins/hudson for continuous integration a large mostly C++ project. We have separate projects for trunk and every branch. Also, there are some related projects for the Java code, but the setup for those are fairly basic right now (we may do more later though). The C++ projects do the following:
Builds everything with options for whether to reconfigure, do a clean build, or use a fresh checkout
Optionally builds and runs all tests
Optionally runs all tests using Valgrind's memcheck
Runs cppcheck
Generates doxygen documentation
Publishes reports: unit tests, valgrind, cppcheck, compiler warnings, SLOC, open tasks, and code coverage (using gcov, gcovr, and the cobertura plugin)
Deploys code nightly or on demand to a test environment and a package repository
Everything is configurable for automatic builds and optional for on demand builds. Underneath, there's a bash script that controls much of this, which farther depends on our build system, which uses automake and autoconf along with custom bash scripts.
We started using Hudson (at the time) because that's what the Java guys were using and we just wanted nightly builds. Since then, we've added a lot more and continue to add more. In some ways Hudson is great, but certainly isn't ideal.
I've looked at other solutions and the only one that looks like it could be a replacement is buildbot. Would buildbot be better for this situation? Is the investment worth it since we're already using Hudson? Why?
EDIT: Someone asked why I haven't found Hudson/Jenkins to be ideal. The short answer is that everything can be improved. I'm simply wondering if Jenkins is the best current solution for my use case or whether there is something better (buildbot?) that would be easier to maintain in the long run even as new requirements come up.
Both are open source projects, but you do not need to change buildbot code to "extend" it, it is actually quite easy to import your own packages in its configuration in which you can sub-class most of the features with your own additions. Examples: your own compilation or test code, some parsing of outputs/errors to be given to the next steps, your own formating of alert emails etc. there are lots of possibilities.
Generally I would say that buildbot is the most "general purpose" automatic builds tools. Jenkins however might be the best related to running tests, especially for parsing and presenting results in nice ways (results, details, charts.. some clicks away), things that buildbot does not do "out-of-the-box". I'm actually thinking of using both to have sexier test result pages.. :-)
Also as a rule of thumb it should not be difficult to create a new tool's config: if the specification of what to do (configs, builds, tests) is too hard to switch from one tool to another, it is a (bad) sign that not enough configuration scripts are moved to the sources. Buildbot (or Jenkins) should only call simple commands. If it is simple to run tests, then developers will do it as well and this will improve the success rate, whereas if only the continuous integration system runs the tests, you will be running after it to fix the new code failures, and will loose its non-regression value, just my 0.02€ :-)
Hope it'll help.
The 'result integration' is also in jenkins/hudson, and you can relatively easily capture build products without having to 'copy them elsewhere'.
For our instance, the coverage reports and unit test metrics and javadoc for the java code is all integrated. For our C++ code, the plugins are a little lacking, but you can still get most of it.
we ran buildbot since pre 0.7, and are now running 0.8 and are only now seeing any real reason to switch, as buildbot 0.8 forgot about windows slaves for an extended period of time and the support was pretty poor.
There are many other solutions out there, besides Jenkins/Hudson/BuildBot:
TeamCity by Jetbrains
Bamboo by Atlassian
Go by Thoughtworks
Cruise Control
OpenMake Meister
The specifics about what you are doing are not so important, in fact, as long as the agents (aka nodes) that you are doing them on support those tasks.
The beauty of a CI server is noticing when the build changes to trigger a new build (and test), publish the artifacts, and publish test results.
When you compare CI tools like those we mentioned, consider features like the usability of its interface, how easy is branching (and features it might offer like automatic merging), notifications (like XMPP/jabber), or an information-radiator (like hooking up a monitor to always show status). Product support is another thing to consider - Jenkins' support is only as good as who is responding to community questions at the time you have questions.
My personal favorite is Bamboo, but it comes with a license fee.
I'm a long-time Jenkins user in the middle of evaluating Buildbot and would like to offer a few items for folks considering using Buildbot for multi-module solutions:
*) Buildbot doesn't have any out-of-the-box concept of file artifacts related to each build. It's not in the UI and it's not in any of the builtin "steps" modules as far as I can see:
http://docs.buildbot.net/current/manual/configuration/buildsteps.html
...and I see no third party plugin:
https://github.com/buildbot/buildbot/wiki/PluginList#steps
Buildbot does collect all the console output from a given build, but critically, you can't collect files related to it.
*) Given that artifacts are not supported, it's not easy to create "collector" projects that bring multiple modules into say, a single installer. Jenkins has a great feature that lets you parameterize a build with builds from other modules (the parameter type is a run).
*) Establishing dependencies between modules is trickier in Buildbot. Say you have a library that three binaries depend on, and you want those binaries to rebuild each time the library changes. Jenkins has triggers built into the UI. If you want to do triggers in Buildbot you have to script them using schedulers.Dependent, and it causes a lot of item congestion in the Schedulers UI.
*) When you're working in Buildbot, it seems that pretty much all of the configuration is done in master.cfg in code. This is awesome and frustrating.
*) Buildbot forces you to create a worker in addition to a master server. This is annoying for beginners and systems for which a single build server is sufficient.
My impression after two days of Buildbot evaluation is that we'll stick with Jenkins, primarily due to it having artifacts. Buildbot is a tool we'd only use if we had more extensive customization needs, and the time to do it.
On the subject of buildbot and artifacts -- I don't have enough user score to make a comment -- you can get artifacts from buildbot 2.x series pretty easy with built-in file/directory upload actions. However you rarely want to just move files. Typically you make a triggered buildstep that does deployment directly off the worker for best results. eg push to cloud storage, containers, thirdparty (steam uploads), etc.
This way you can get metrics on the uploads and conditionally control them better (or even mix and match artifacts across worker machines).

Hudson - different build targets for different triggers

I would like to have different build targets for periodic builds and for those that are triggered by polling SCM.
More specific: The idea is that nightly builds should call 'mvn verify' which includes integration tests, while a normal build calls 'mvn test' that just executes Unit tests.
Any ideas how this can be achieved using Hudson?
Cheers
Chris
You could create two jobs - one scheduled and the other polled.
In the scheduled you can specify a different maven goal from the polled.
The answer by Raghuram is straight forward and correct. But you can also have three jobs. The first two do the triggering and pass the maven goal as a parameter into the third job. Sounds like a lot of clutter, and to a certain point it is. But it will help, if you have a lot of configuration to do (especially if the configuration needs to be changed regularly). It will help to have the configuration correct for both jobs. Configuration does not only include the build steps but also the harvesting of all reports, post build cleanup, notifications, triggering of downstream jobs, ... Another advantage is, that you don't need to synchronize the two jobs, so that they not run in parallel (if that causes problems).
Don't understand me wrong, my first impulse would go for two jobs, which has it's own advantages. The history for the nightly build will be contain the whole day (actually since the last nightly build) and not only for the time since the last build (which could be a triggered one. Integration tests usually need a more extensive setup or access to scarce resources. With two jobs you don't block these resources when you run the test goal. In addition I expect that more test results need to be harvested to be displayed and tracked over time by Hudson. You also might want to run more metrics against your code whose results should be displayed by Hudson. The disadvantage is that you of course need to keep the build steps basically the same all the time.
But in the end it is a case-by base decision if you go with 2 or 3 jobs.

Slow Complex Builds & Hudson vs. Electric Cloud

Is hudson the right tool for complex C++ builds?
I have a C++ build that takes about 4 hours. Compile and packaging take about 1/2 the time and testing consumes the other half. Presently, we are using a home grown system but there's some move to go to hudson since we use it for all of our java builds.
My problem is that continuous integration isn't very...continuous at 4 hour intervals. I want a tool that's going to let me parallelize the build in an understandable way.
Hudson's been great for small builds or java builds where I'm sitting at the top of a large maven project, but I don't think it will scale well for complex c++ builds.
What have your experiences been?
Seems like you have a few questions here:
Should I use a CI server to manage my C++ build? The answer to this is unequivocally YES. Your homegrown system may be adequate, but it's not standard, extending it is probably difficult, and maintaining it is a distraction from the work you're actually paid to do.
Is Hudson the right choice for my project? It will probably get the job done, and it has the advantage of being in deployment at your site already. However, you specifically mention that you want a tool that supports parallelization well, and I don't think that Hudson really fits the bill. The problem is that Hudson was not designed with parallelism in mind. See, the representation of a build process in Hudson is a "job", which is just a series of steps executed in sequence -- checkout, compile, test, package, etc. There's no way to get those steps to run in parallel. Now, you can get around this by modeling your process with multiple jobs. Each job is completely independent, so of course they could be run in parallel; you can use something like the Locks and Latches plugin to coordinate the jobs, but the whole process is more complicated than it ought to be, and kind of clumsy -- instead of a single job representing a single run of the build process, you have several unconnected jobs, at best tied together via naming convention.
Can Electric Cloud help? Again, an unequivocal YES. Electric Cloud offers ElectricCommander, a CI server with parallel support built-in from inception. As with Hudson, a job is used to represent a build process, but the steps within a job can easily be run in parallel (just check the "parallel" box on those steps), so you don't have to resort to add-ons and kludges: one run build process is one job, with as many parallel steps as you like.
Will the right CI server put "continuous" back into my integration? A CI server will only get you so far. The thing is, a CI server can provide you coarse-grained parallelism -- so with a little work, you can set it up to run packaging in parallel with tests, for example. With a little more work, you can probably split your test phase into a few independent pieces that can be run in parallel.
You didn't give many details, but let's assume that your build is 90 minutes of compile, 30 minutes of packaging, and 2 hours of tests that can be broken down into four 30 minute pieces. Suppose further that you can do packaging and testing simultaneously. That would bring your 4 hour process down to 2 hours total. At this point the "long pole" in your process is the compile phase, and although you might be able to break that up by hand into pieces that can be run in parallel by your CI server, the truth is that the CI server is just not the right tool for that job.
A better option is to use a build tool that can give you automatic fine-grained parallelism within the compile phase. For example, if you're using gmake already, you can try gmake -j 8 to run 8 compiles at once. If your makefiles are clean and your dependencies are all correct, and you have a beefy build server, this could give you a pretty good performance boost. You could also use ElectricAccelerator, another product from Electric Cloud, that was specifically designed to accelerate this portion of the build process, even for builds that can't safely use gmake -j due to incorrect or incomplete depedencies.
Hope that helps.
Can you not split the build into multiple parts whatsoever?
You do mention that the job has several distinct parts. The general guidance with Hudson is to do the build part in one job, testing in another, packaging in another, and so on.
You can compile the code in Job A and archive the output, then tell Job B to copy these artifacts from Job A and run the tests on them. Meanwhile, another Job A can be kicked-off, due to further commits to the source repository..
Sounds to me like the problem is with your build process (make files?, msbuild?) and not Hudson. Hudson is simply going to execute the build process the same way a user would from a command-line. Is it possible to optimize your build process?
Even if a 4 hour build process is unavoidable, Hudson can help because you can attach an unlimited number of slave machines which can all be running multiple builds in parallel, given adequate hardware horsepower.