Advice for Web-based Remote Build System

Advice for Web-based Remote Build System - web-services

I'm interested in setting up a remote build system at work, initially for internal use, potentially for some customers going forward. We need to compile library code on several different machines (PC, Mac) and with multiple compilers, and it can be a real pain trying to get access to a full set. This is not our main build system, which is Jenkins-based and uses an approach that is not easily modified for the purpose envisaged here.
The idea would be that you could post your source to a website with some basic build parameters, it would compile the code and you could then download the generated code. Ideally users could pick which version of the underlying software they compiled their libraries against. I envisage it being supported by a virtual machine.
Reason I'm posting is that I don't really want to roll-my-own as much as possible - longer term it has maintenance implications - and would prefer something as pre-existing as possible. Obviously one would expect some adaptation in terms of scripting.
Any suggestions? It would have to be supported on Mac and PC at absolute minimum.

This sounds like something you could do by creating a parameterized Jenkins job (the build params given as input to your web frontent could be passed on to the job, perhaps via the Jenkins API). Personally, I would see if you could skip the step of creating a new webfrontend, and have users pass their build params directly to Jenkins.
To support downloading the resulting compiled code, you could have the Jenkins job archive the build as an artifact. Users could then download the files from the result page for that individual build.
As for how to make a Jenkins job accept source code to compile as input, perhaps you could use branches in your CM system? Your users could push their code to a branch, and then pass the branch as a build param. Otherwise, you might be able to use the file parameter feature of Jenkins.

Related

GNU Parallel host sticky jobs

I am writing a parallel build farm to build C++ cross-platform applications against various platforms / environments. Every time new code is pushed to a git repo, I build and test the latest code against all the platforms.
I've setup parallel to correctly distribute the jobs among several hosts using the --sshlogin option.
I transfer files, collect output and results. It's all working more than fine and I love the tool.
The build time being sometimes quite long for some platforms, I would like the build to be as incremental as possible.
My only issue is that the build is only incremental if the scheduler sends the jobs to the same machine and reuse the artefacts of the previous build on this specific host.
Say I have 3 hosts, I have 1 chance in 3 for the build to be incremental. If a hosts hasn't built this platform in a while, it might take a long time.
Is it possible to gain control over the host a specific input source will run on and only fallback to the other hosts if the host is busy?
Ideally, I would love to see a tag system where I tag input source with a name and tag several hosts with a name, creating pools of jobs and pools of machines specialized into that type of build.
But a very simple implementation where the input sources are distributed in the same order as the order the sshlogins are defined could be a simple & quick fix in my situation.
I tried to find the source code to implement it myself but I only see doc generation when I browse the code on Savannah.
Any ideas?
Thanks,
M

There is currently no support for prioritizing a given argument to a given sshlogin. The source code is at https://savannah.gnu.org/git/?group=parallel
Feel free to join the mailing list and discuss the idea: https://lists.gnu.org/mailman/listinfo/parallel
The only priority in the code is when a job has failed on an sshlogin, then GNU Parallel prefers to retry that job on another sshlogin. Maybe that could be extended?
If a job is marked as having failed -1 time for a given sshlogin, then GNU Parallel ought to prefer to run the job on that sshlogin.

I've been trying to discuss this idea on the mailing list as you suggested but never had any respone in more than 10 days... I guess you must be busy with other things at the moment. So I went along and forked the source code to make the necessary changes and make my solution work.
I pushed it there a week ago:
http://michakfromparis.github.io/gnu-parallel-sticky/
the source code is available on github here:
https://github.com/michaKFromParis/gnu-parallel-sticky
Wasn't exactly easy without any guidance as the source code has a lot of history so I tried to keep the changes surgical to ease merge of your future releases.
I've been using it in production for more than a week now and it works perfectly in my configuration.
It is also compatible with older formats, should be a drop-in replacement for usual parallel uses with extra features on the side.
Would love to get feedback from other users though as it might not be completely dry.
Thanks for sharing the original source code.
Best Regards,
M

C++ build artifacts management

We are moving our build and testing system to Jenkins, and are looking for an easy way (where we don't have to code all the logic ourselves) to manage the build artifacts.
Basically we need an organised way to store them in order by the type of the build, the user who built it and such for example:
johnd/nightly/r543241/win32/program.zip
johnd/nightly/trunk/lin64/program.tar.gz
master/release/2.1/win32/program.zip
This way we can upload when a build is complete, and retrieve the required artifact in the testing stage with ease.
Until now we simply stored the files in directories on NFS, but recently started considering an artifact manager. I've looked at Artifcatory, Archiva and Nexus. But all seem very Java centered or at least required maven to work with. Since I don't want to introduce more complexity (We mainly work with python, scons is our build tool) and I don't want to introduce maven to the mix, I'm looking for something that has an easy command line (or better REST/Python interface) to upload, download, manage artifacts.
If you don't use an artifact manager, but use some other clever method to manage your C++ artifacts for release/test needs I'd be happy to hear that also.

I have exactly the same issue. I didn't succeed to integrate artifactory easilly, so I forgot it.
For the moment I use the copy artefact plugin to copy by scp the file to an NFS share. But this is not a good solution since shares may be mounted differently on two machines, and windows cannot access them directly.
But a real artifact manager would be so such a good thing for our project.
In my previous job, I was using a documentation CMS (LiveLink) which provided good services:
- self organising folder.
- persistant url (we can move, rename folders and files, a public URL to a given files never changes, that was SOOO great : "http://server/go/123456789123456789" always pointed to the same file. So you just had to give this url to wget and you get the file
- file versionning.
- meta data, advanced search
I'm still searching such a solution in opensource software and don't find it.
In you just need to share artefact between two jobs, simple use the "copy artifact for another build" plugin. I use it to split my job into a a build job and a test job (actually there are 3 test jobs, one very fast done after each commit, one quite slow done every day, and one very slow done each week end).

Implementing single script build - non-portable dependencies

It seems that a build system best-practice is to have a single script that can build all source and package the releases. See Joel Test #2
How do you account for non-portable dependencies? For example, if you code for .net 4, then you need .net 4 installed on the box. Standard MS release .net 4 is not xcopy deployable (unless I'm mistaken?). I can see a few avenues:
the dependencies are clearly stated in some resource file (wiki, txt, whatever). When you
call the build script, the build will fail if you don't have the dependency installed. This is an acceptable outcome.
The build script is responsible for setting up the environment. So if you require .net 4 and its not on the box then it installs it for you.
A flavor of #2 - instead of installing dependencies, the script spawns a pre-packaged image (virtual machine, Amazon EC2 AMI) that is setup with all dependencies
???

For implementing a build script you have to ask yourself, how much work you want/can spent on it. This leads to the question how often you have to set up the build environment. I can see #2 would be the perfect solution, but i would need a lot of work, since usually you have more than one non portable dependency.
So we use #1 one. And it works quite well. The most important thing is, that the build script is starting with some sort of self-test. It looks for everything which is needed to build the whole software and gives an error if something is not found. And it gives a clear error message, so that any new guy knows what to do to make it running. Of course as with a lot of software it is nearly never finished and gets extended by needs. The drawback that this test can take some seconds is insignificant when whole build process needs more than minutes.
A wiki (or even sth. else) with the setup solution was not a good solution for us, since after three month nobody knows where this was, but the build script is used every day.
The build script itself is a set of a lot of different things, which where chosen by needs. It is starting with a batch (we are using Windows) which invokes a lot of other things. Other batches, MSBuild, home grown tools. Each step by it self is checking for its own dependencies, to have the problem local and you can see three lines later why this special thing is needed.

Number 2 states "Can you make a build in one step?" As described this means for a development team to be effective the build process must be as simple as possible to reduce errors in the build process and insure consistency. This is especially important as a team gets larger. You want to make sure everyone is building the same thing. (What is done with that package should also be simple, but it is not as important IMHO.) Msbuild is great at this; they provide the facilities to set up a build server that access the source control system independently so the developers actions can't corrupt the build environment. I highly recommend setting up a build server using TFS -- many build issues will go away and you will have the 1-click build Joel describes.
As for your points about what that package does for deployment -- you have many options with MS, but the more "one click" you can make it the better. I believe this is slightly different than Joel's #2. In his example he describes changing what software he will use for the install not because one performs with fewer steps, but instead because one can be incorporated into a one step build.

buildbot vs hudson/jenkins for C++ continuous integration

I'm currently using jenkins/hudson for continuous integration a large mostly C++ project. We have separate projects for trunk and every branch. Also, there are some related projects for the Java code, but the setup for those are fairly basic right now (we may do more later though). The C++ projects do the following:
Builds everything with options for whether to reconfigure, do a clean build, or use a fresh checkout
Optionally builds and runs all tests
Optionally runs all tests using Valgrind's memcheck
Runs cppcheck
Generates doxygen documentation
Publishes reports: unit tests, valgrind, cppcheck, compiler warnings, SLOC, open tasks, and code coverage (using gcov, gcovr, and the cobertura plugin)
Deploys code nightly or on demand to a test environment and a package repository
Everything is configurable for automatic builds and optional for on demand builds. Underneath, there's a bash script that controls much of this, which farther depends on our build system, which uses automake and autoconf along with custom bash scripts.
We started using Hudson (at the time) because that's what the Java guys were using and we just wanted nightly builds. Since then, we've added a lot more and continue to add more. In some ways Hudson is great, but certainly isn't ideal.
I've looked at other solutions and the only one that looks like it could be a replacement is buildbot. Would buildbot be better for this situation? Is the investment worth it since we're already using Hudson? Why?
EDIT: Someone asked why I haven't found Hudson/Jenkins to be ideal. The short answer is that everything can be improved. I'm simply wondering if Jenkins is the best current solution for my use case or whether there is something better (buildbot?) that would be easier to maintain in the long run even as new requirements come up.

Both are open source projects, but you do not need to change buildbot code to "extend" it, it is actually quite easy to import your own packages in its configuration in which you can sub-class most of the features with your own additions. Examples: your own compilation or test code, some parsing of outputs/errors to be given to the next steps, your own formating of alert emails etc. there are lots of possibilities.
Generally I would say that buildbot is the most "general purpose" automatic builds tools. Jenkins however might be the best related to running tests, especially for parsing and presenting results in nice ways (results, details, charts.. some clicks away), things that buildbot does not do "out-of-the-box". I'm actually thinking of using both to have sexier test result pages.. :-)
Also as a rule of thumb it should not be difficult to create a new tool's config: if the specification of what to do (configs, builds, tests) is too hard to switch from one tool to another, it is a (bad) sign that not enough configuration scripts are moved to the sources. Buildbot (or Jenkins) should only call simple commands. If it is simple to run tests, then developers will do it as well and this will improve the success rate, whereas if only the continuous integration system runs the tests, you will be running after it to fix the new code failures, and will loose its non-regression value, just my 0.02€ :-)
Hope it'll help.

The 'result integration' is also in jenkins/hudson, and you can relatively easily capture build products without having to 'copy them elsewhere'.
For our instance, the coverage reports and unit test metrics and javadoc for the java code is all integrated. For our C++ code, the plugins are a little lacking, but you can still get most of it.
we ran buildbot since pre 0.7, and are now running 0.8 and are only now seeing any real reason to switch, as buildbot 0.8 forgot about windows slaves for an extended period of time and the support was pretty poor.

There are many other solutions out there, besides Jenkins/Hudson/BuildBot:
TeamCity by Jetbrains
Bamboo by Atlassian
Go by Thoughtworks
Cruise Control
OpenMake Meister
The specifics about what you are doing are not so important, in fact, as long as the agents (aka nodes) that you are doing them on support those tasks.
The beauty of a CI server is noticing when the build changes to trigger a new build (and test), publish the artifacts, and publish test results.
When you compare CI tools like those we mentioned, consider features like the usability of its interface, how easy is branching (and features it might offer like automatic merging), notifications (like XMPP/jabber), or an information-radiator (like hooking up a monitor to always show status). Product support is another thing to consider - Jenkins' support is only as good as who is responding to community questions at the time you have questions.
My personal favorite is Bamboo, but it comes with a license fee.

I'm a long-time Jenkins user in the middle of evaluating Buildbot and would like to offer a few items for folks considering using Buildbot for multi-module solutions:
*) Buildbot doesn't have any out-of-the-box concept of file artifacts related to each build. It's not in the UI and it's not in any of the builtin "steps" modules as far as I can see:
http://docs.buildbot.net/current/manual/configuration/buildsteps.html
...and I see no third party plugin:
https://github.com/buildbot/buildbot/wiki/PluginList#steps
Buildbot does collect all the console output from a given build, but critically, you can't collect files related to it.
*) Given that artifacts are not supported, it's not easy to create "collector" projects that bring multiple modules into say, a single installer. Jenkins has a great feature that lets you parameterize a build with builds from other modules (the parameter type is a run).
*) Establishing dependencies between modules is trickier in Buildbot. Say you have a library that three binaries depend on, and you want those binaries to rebuild each time the library changes. Jenkins has triggers built into the UI. If you want to do triggers in Buildbot you have to script them using schedulers.Dependent, and it causes a lot of item congestion in the Schedulers UI.
*) When you're working in Buildbot, it seems that pretty much all of the configuration is done in master.cfg in code. This is awesome and frustrating.
*) Buildbot forces you to create a worker in addition to a master server. This is annoying for beginners and systems for which a single build server is sufficient.
My impression after two days of Buildbot evaluation is that we'll stick with Jenkins, primarily due to it having artifacts. Buildbot is a tool we'd only use if we had more extensive customization needs, and the time to do it.

On the subject of buildbot and artifacts -- I don't have enough user score to make a comment -- you can get artifacts from buildbot 2.x series pretty easy with built-in file/directory upload actions. However you rarely want to just move files. Typically you make a triggered buildstep that does deployment directly off the worker for best results. eg push to cloud storage, containers, thirdparty (steam uploads), etc.
This way you can get metrics on the uploads and conditionally control them better (or even mix and match artifacts across worker machines).

Why should one use a build system over that which is included as part of an IDE?

I've heard more than one person say that if your build process is clicking the build button, than your build process is broken. Frequently this is accompanied with advice to use things like make, cmake, nmake, MSBuild, etc. What exactly do these tools offer that justifies manually maintaining a separate configuration file?
EDIT: I'm most interested in answers that would apply to a single developer working on a ~20k line C++ project, but I'm interested in the general case as well.
EDIT2: It doesn't look like there's one good answer to this question, so I've gone ahead and made it CW. In response to those talking about Continuous Integration, yes, I understand completely when you have many developers on a project having CI is nice. However, that's an advantage of CI, not of maintaining separate build scripts. They are orthogonal: For example, Team Foundation Build is a CI solution that uses Visual Studio's project files as it's configuration.

Aside from continuous integration needs which everyone else has already addressed, you may also simply want to automate some other aspects of your build process. Maybe it's something as simple as incrementing a version number on a production build, or running your unit tests, or resetting and verifying your test environment, or running FxCop or a custom script that automates a code review for corporate standards compliance. A build script is just a way to automate something in addition to your simple code compile. However, most of these sorts of things can also be accomplished via pre-compile/post-compile actions that nearly every modern IDE allows you to set up.
Truthfully, unless you have lots of developers committing to your source control system, or have lots of systems or applications relying on shared libraries and need to do CI, using a build script is probably overkill compared to simpler alternatives. But if you are in one of those aforementioned situations, a dedicated build server that pulls from source control and does automated builds should be an essential part of your team's arsenal, and the easiest way to set one up is to use make, MSBuild, Ant, etc.

One reason for using a build system that I'm surprised nobody else has mentioned is flexibility. In the past, I also used my IDE's built-in build system to compile my code. I ran into a big problem, however, when the IDE I was using was discontinued. My ability to compile my code was tied to my IDE, so I was forced to re-do my entire build system. The second time around, though, I didn't make the same mistake. I implemented my build system via makefiles so that I could switch compilers and IDEs at will without needing to re-implement the build system yet again.
I encountered a similar problem at work. We had an in-house utility that was built as a Visual Studio project. It's a fairly simple utility and hasn't needed updating for years, but we recently found a rare bug that needed fixing. To our dismay, we found out that the utility was built using a version of Visual Studio that was 5-6 versions older than what we currently have. The new VS wouldn't read the old-version project file correctly, and we had to re-create the project from scratch. Even though we were still using the same IDE, version differences broke our build system.
When you use a separate build system, you are completely in control of it. Changing IDEs or versions of IDEs won't break anything. If your build system is based on an open-source tool like make, you also don't have to worry about your build tools being discontinued or abandoned because you can always re-build them from source (plus fix bugs) if needed. Relying on your IDE's build system introduces a single point of failure (especially on platforms like Visual Studio that also integrate the compiler), and in my mind that's been enough of a reason for me to separate my build system and IDE.
On a more philosophical level, I'm a firm believer that it's not a good thing to automate away something that you don't understand. It's good to use automation to make yourself more productive, but only if you have a firm understanding of what's going on under the hood (so that you're not stuck when the automation breaks, if for no other reason). I used my IDE's built-in build system when I first started programming because it was easy and automatic. I later started to become more aware that I didn't really understand what was happening when I clicked the "compile" button. I did a little reading and started to put together a simple build script from scratch, comparing my output to that of the IDE's build system. After a while I realized that I now had the power to do all sorts of things that were difficult or impossible through the IDE. Customizing the compiler's command-line options beyond what the IDE provided, I was able to produce a smaller, slightly faster output. More importantly, I became a better programmer by having real knowledge of the entire development process from writing code all the way down through the generation of machine language. Understanding and controlling the entire end-to-end process allows me to optimize and customize all of it to the needs of whatever project I'm currently working on.

If you have a hands-off, continuous integration build process it's going to be driven by an Ant or make-style script. Your CI process will check the code out of version control when changes are detected onto a separate build machine, compile, test, package, deploy, and create a summary report.

Let's say you have 5 people working on the same set of code. Each of of those 5 people are making updates to the same set of files. Now you may click the build button and you know that you're code works, but what about when you integrate it with everyone else. The only you'll know is that if you get everyone else's and try. This is easy every once in a while, but it quickly becomes tiresome to do this over and over again.
With a build server that does it automatically, it checks if the code compiles for everyone all the time. Everyone always knows if the something is wrong with the build, and what the problem is, and no one has to do any work to figure it out. Small things add up, it may take a couple of minutes to pull down the latest code and try and compile it, but doing that 10-20 times a day quickly becomes a waste of time, especially if you have multiple people doing it. Sure you can get by without it, but it is so much easier to let an automated process do the same thing over and over again, then having a real person do it.
Here's another cool thing too. Our process is setup to test all the sql scripts as well. Can't do that with pressing the build button. It reloads snapshots of all the databases it needs to apply patches to and runs them to make sure that they all work, and run in the order they are supposed to. The build server is also smart enough to run all the unit tests/automation tests and return the results. Making sure it can compile is fine, but with an automation server, it can handle many many steps automatically that would take a person maybe an hour to do.
Taking this a step further, if you have an automated deployment process along with the build server, the deployment is automatic. Anyone who can press a button to run the process and deploy can move code to qa or production. This means that a programmer doesn't have to spend time doing it manually, which is error prone. When we didn't have the process, it was always a crap shoot as to whether or not everything would be installed correctly, and generally it was a network admin or a programmer who had to do it, because they had to know how to configure IIS and move the files. Now even our most junior qa person can refresh the server, because all they need to know is what button to push.

the IDE build systems I've used are all usable from things like Automated Build / CI tools so there is no need to have a separate build script as such.
However on top of that build system you need to automate testing, versioning, source control tagging, and deployment (and anything else you need to release your product).
So you create scripts that extend your IDE build and do the extras.

One practical reason why IDE-managed build descriptions are not always ideal has to do with version control and the need to integrate with changes made by other developers (ie. merge).
If your IDE uses a single flat file, it can be very hard (if not impossible) to merge two project files into one. It may be using a text-based format, like XML, but XML it notoriously hard with standard diff/merge tools. Just the fact that people are using a GUI to make edits makes it more likely that you end up with unnecessary changes in the project files.
With distributed, smaller build scripts (CMake files, Makefiles, etc.), it can be easier to reconcile changes to project structure just like you would merge two source files. Some people prefer IDE project generation (using CMake, for example) for this reason, even if everyone is working with the same tools on the same platform.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js