GNU Parallel host sticky jobs

GNU Parallel host sticky jobs - c++

I am writing a parallel build farm to build C++ cross-platform applications against various platforms / environments. Every time new code is pushed to a git repo, I build and test the latest code against all the platforms.
I've setup parallel to correctly distribute the jobs among several hosts using the --sshlogin option.
I transfer files, collect output and results. It's all working more than fine and I love the tool.
The build time being sometimes quite long for some platforms, I would like the build to be as incremental as possible.
My only issue is that the build is only incremental if the scheduler sends the jobs to the same machine and reuse the artefacts of the previous build on this specific host.
Say I have 3 hosts, I have 1 chance in 3 for the build to be incremental. If a hosts hasn't built this platform in a while, it might take a long time.
Is it possible to gain control over the host a specific input source will run on and only fallback to the other hosts if the host is busy?
Ideally, I would love to see a tag system where I tag input source with a name and tag several hosts with a name, creating pools of jobs and pools of machines specialized into that type of build.
But a very simple implementation where the input sources are distributed in the same order as the order the sshlogins are defined could be a simple & quick fix in my situation.
I tried to find the source code to implement it myself but I only see doc generation when I browse the code on Savannah.
Any ideas?
Thanks,
M

There is currently no support for prioritizing a given argument to a given sshlogin. The source code is at https://savannah.gnu.org/git/?group=parallel
Feel free to join the mailing list and discuss the idea: https://lists.gnu.org/mailman/listinfo/parallel
The only priority in the code is when a job has failed on an sshlogin, then GNU Parallel prefers to retry that job on another sshlogin. Maybe that could be extended?
If a job is marked as having failed -1 time for a given sshlogin, then GNU Parallel ought to prefer to run the job on that sshlogin.

I've been trying to discuss this idea on the mailing list as you suggested but never had any respone in more than 10 days... I guess you must be busy with other things at the moment. So I went along and forked the source code to make the necessary changes and make my solution work.
I pushed it there a week ago:
http://michakfromparis.github.io/gnu-parallel-sticky/
the source code is available on github here:
https://github.com/michaKFromParis/gnu-parallel-sticky
Wasn't exactly easy without any guidance as the source code has a lot of history so I tried to keep the changes surgical to ease merge of your future releases.
I've been using it in production for more than a week now and it works perfectly in my configuration.
It is also compatible with older formats, should be a drop-in replacement for usual parallel uses with extra features on the side.
Would love to get feedback from other users though as it might not be completely dry.
Thanks for sharing the original source code.
Best Regards,
M

Related

Advice for Web-based Remote Build System

I'm interested in setting up a remote build system at work, initially for internal use, potentially for some customers going forward. We need to compile library code on several different machines (PC, Mac) and with multiple compilers, and it can be a real pain trying to get access to a full set. This is not our main build system, which is Jenkins-based and uses an approach that is not easily modified for the purpose envisaged here.
The idea would be that you could post your source to a website with some basic build parameters, it would compile the code and you could then download the generated code. Ideally users could pick which version of the underlying software they compiled their libraries against. I envisage it being supported by a virtual machine.
Reason I'm posting is that I don't really want to roll-my-own as much as possible - longer term it has maintenance implications - and would prefer something as pre-existing as possible. Obviously one would expect some adaptation in terms of scripting.
Any suggestions? It would have to be supported on Mac and PC at absolute minimum.

This sounds like something you could do by creating a parameterized Jenkins job (the build params given as input to your web frontent could be passed on to the job, perhaps via the Jenkins API). Personally, I would see if you could skip the step of creating a new webfrontend, and have users pass their build params directly to Jenkins.
To support downloading the resulting compiled code, you could have the Jenkins job archive the build as an artifact. Users could then download the files from the result page for that individual build.
As for how to make a Jenkins job accept source code to compile as input, perhaps you could use branches in your CM system? Your users could push their code to a branch, and then pass the branch as a build param. Otherwise, you might be able to use the file parameter feature of Jenkins.

Implementing single script build - non-portable dependencies

It seems that a build system best-practice is to have a single script that can build all source and package the releases. See Joel Test #2
How do you account for non-portable dependencies? For example, if you code for .net 4, then you need .net 4 installed on the box. Standard MS release .net 4 is not xcopy deployable (unless I'm mistaken?). I can see a few avenues:
the dependencies are clearly stated in some resource file (wiki, txt, whatever). When you
call the build script, the build will fail if you don't have the dependency installed. This is an acceptable outcome.
The build script is responsible for setting up the environment. So if you require .net 4 and its not on the box then it installs it for you.
A flavor of #2 - instead of installing dependencies, the script spawns a pre-packaged image (virtual machine, Amazon EC2 AMI) that is setup with all dependencies
???

For implementing a build script you have to ask yourself, how much work you want/can spent on it. This leads to the question how often you have to set up the build environment. I can see #2 would be the perfect solution, but i would need a lot of work, since usually you have more than one non portable dependency.
So we use #1 one. And it works quite well. The most important thing is, that the build script is starting with some sort of self-test. It looks for everything which is needed to build the whole software and gives an error if something is not found. And it gives a clear error message, so that any new guy knows what to do to make it running. Of course as with a lot of software it is nearly never finished and gets extended by needs. The drawback that this test can take some seconds is insignificant when whole build process needs more than minutes.
A wiki (or even sth. else) with the setup solution was not a good solution for us, since after three month nobody knows where this was, but the build script is used every day.
The build script itself is a set of a lot of different things, which where chosen by needs. It is starting with a batch (we are using Windows) which invokes a lot of other things. Other batches, MSBuild, home grown tools. Each step by it self is checking for its own dependencies, to have the problem local and you can see three lines later why this special thing is needed.

Number 2 states "Can you make a build in one step?" As described this means for a development team to be effective the build process must be as simple as possible to reduce errors in the build process and insure consistency. This is especially important as a team gets larger. You want to make sure everyone is building the same thing. (What is done with that package should also be simple, but it is not as important IMHO.) Msbuild is great at this; they provide the facilities to set up a build server that access the source control system independently so the developers actions can't corrupt the build environment. I highly recommend setting up a build server using TFS -- many build issues will go away and you will have the 1-click build Joel describes.
As for your points about what that package does for deployment -- you have many options with MS, but the more "one click" you can make it the better. I believe this is slightly different than Joel's #2. In his example he describes changing what software he will use for the install not because one performs with fewer steps, but instead because one can be incorporated into a one step build.

buildbot vs hudson/jenkins for C++ continuous integration

I'm currently using jenkins/hudson for continuous integration a large mostly C++ project. We have separate projects for trunk and every branch. Also, there are some related projects for the Java code, but the setup for those are fairly basic right now (we may do more later though). The C++ projects do the following:
Builds everything with options for whether to reconfigure, do a clean build, or use a fresh checkout
Optionally builds and runs all tests
Optionally runs all tests using Valgrind's memcheck
Runs cppcheck
Generates doxygen documentation
Publishes reports: unit tests, valgrind, cppcheck, compiler warnings, SLOC, open tasks, and code coverage (using gcov, gcovr, and the cobertura plugin)
Deploys code nightly or on demand to a test environment and a package repository
Everything is configurable for automatic builds and optional for on demand builds. Underneath, there's a bash script that controls much of this, which farther depends on our build system, which uses automake and autoconf along with custom bash scripts.
We started using Hudson (at the time) because that's what the Java guys were using and we just wanted nightly builds. Since then, we've added a lot more and continue to add more. In some ways Hudson is great, but certainly isn't ideal.
I've looked at other solutions and the only one that looks like it could be a replacement is buildbot. Would buildbot be better for this situation? Is the investment worth it since we're already using Hudson? Why?
EDIT: Someone asked why I haven't found Hudson/Jenkins to be ideal. The short answer is that everything can be improved. I'm simply wondering if Jenkins is the best current solution for my use case or whether there is something better (buildbot?) that would be easier to maintain in the long run even as new requirements come up.

Both are open source projects, but you do not need to change buildbot code to "extend" it, it is actually quite easy to import your own packages in its configuration in which you can sub-class most of the features with your own additions. Examples: your own compilation or test code, some parsing of outputs/errors to be given to the next steps, your own formating of alert emails etc. there are lots of possibilities.
Generally I would say that buildbot is the most "general purpose" automatic builds tools. Jenkins however might be the best related to running tests, especially for parsing and presenting results in nice ways (results, details, charts.. some clicks away), things that buildbot does not do "out-of-the-box". I'm actually thinking of using both to have sexier test result pages.. :-)
Also as a rule of thumb it should not be difficult to create a new tool's config: if the specification of what to do (configs, builds, tests) is too hard to switch from one tool to another, it is a (bad) sign that not enough configuration scripts are moved to the sources. Buildbot (or Jenkins) should only call simple commands. If it is simple to run tests, then developers will do it as well and this will improve the success rate, whereas if only the continuous integration system runs the tests, you will be running after it to fix the new code failures, and will loose its non-regression value, just my 0.02€ :-)
Hope it'll help.

The 'result integration' is also in jenkins/hudson, and you can relatively easily capture build products without having to 'copy them elsewhere'.
For our instance, the coverage reports and unit test metrics and javadoc for the java code is all integrated. For our C++ code, the plugins are a little lacking, but you can still get most of it.
we ran buildbot since pre 0.7, and are now running 0.8 and are only now seeing any real reason to switch, as buildbot 0.8 forgot about windows slaves for an extended period of time and the support was pretty poor.

There are many other solutions out there, besides Jenkins/Hudson/BuildBot:
TeamCity by Jetbrains
Bamboo by Atlassian
Go by Thoughtworks
Cruise Control
OpenMake Meister
The specifics about what you are doing are not so important, in fact, as long as the agents (aka nodes) that you are doing them on support those tasks.
The beauty of a CI server is noticing when the build changes to trigger a new build (and test), publish the artifacts, and publish test results.
When you compare CI tools like those we mentioned, consider features like the usability of its interface, how easy is branching (and features it might offer like automatic merging), notifications (like XMPP/jabber), or an information-radiator (like hooking up a monitor to always show status). Product support is another thing to consider - Jenkins' support is only as good as who is responding to community questions at the time you have questions.
My personal favorite is Bamboo, but it comes with a license fee.

I'm a long-time Jenkins user in the middle of evaluating Buildbot and would like to offer a few items for folks considering using Buildbot for multi-module solutions:
*) Buildbot doesn't have any out-of-the-box concept of file artifacts related to each build. It's not in the UI and it's not in any of the builtin "steps" modules as far as I can see:
http://docs.buildbot.net/current/manual/configuration/buildsteps.html
...and I see no third party plugin:
https://github.com/buildbot/buildbot/wiki/PluginList#steps
Buildbot does collect all the console output from a given build, but critically, you can't collect files related to it.
*) Given that artifacts are not supported, it's not easy to create "collector" projects that bring multiple modules into say, a single installer. Jenkins has a great feature that lets you parameterize a build with builds from other modules (the parameter type is a run).
*) Establishing dependencies between modules is trickier in Buildbot. Say you have a library that three binaries depend on, and you want those binaries to rebuild each time the library changes. Jenkins has triggers built into the UI. If you want to do triggers in Buildbot you have to script them using schedulers.Dependent, and it causes a lot of item congestion in the Schedulers UI.
*) When you're working in Buildbot, it seems that pretty much all of the configuration is done in master.cfg in code. This is awesome and frustrating.
*) Buildbot forces you to create a worker in addition to a master server. This is annoying for beginners and systems for which a single build server is sufficient.
My impression after two days of Buildbot evaluation is that we'll stick with Jenkins, primarily due to it having artifacts. Buildbot is a tool we'd only use if we had more extensive customization needs, and the time to do it.

On the subject of buildbot and artifacts -- I don't have enough user score to make a comment -- you can get artifacts from buildbot 2.x series pretty easy with built-in file/directory upload actions. However you rarely want to just move files. Typically you make a triggered buildstep that does deployment directly off the worker for best results. eg push to cloud storage, containers, thirdparty (steam uploads), etc.
This way you can get metrics on the uploads and conditionally control them better (or even mix and match artifacts across worker machines).

Slow Complex Builds & Hudson vs. Electric Cloud

Is hudson the right tool for complex C++ builds?
I have a C++ build that takes about 4 hours. Compile and packaging take about 1/2 the time and testing consumes the other half. Presently, we are using a home grown system but there's some move to go to hudson since we use it for all of our java builds.
My problem is that continuous integration isn't very...continuous at 4 hour intervals. I want a tool that's going to let me parallelize the build in an understandable way.
Hudson's been great for small builds or java builds where I'm sitting at the top of a large maven project, but I don't think it will scale well for complex c++ builds.
What have your experiences been?

Seems like you have a few questions here:
Should I use a CI server to manage my C++ build? The answer to this is unequivocally YES. Your homegrown system may be adequate, but it's not standard, extending it is probably difficult, and maintaining it is a distraction from the work you're actually paid to do.
Is Hudson the right choice for my project? It will probably get the job done, and it has the advantage of being in deployment at your site already. However, you specifically mention that you want a tool that supports parallelization well, and I don't think that Hudson really fits the bill. The problem is that Hudson was not designed with parallelism in mind. See, the representation of a build process in Hudson is a "job", which is just a series of steps executed in sequence -- checkout, compile, test, package, etc. There's no way to get those steps to run in parallel. Now, you can get around this by modeling your process with multiple jobs. Each job is completely independent, so of course they could be run in parallel; you can use something like the Locks and Latches plugin to coordinate the jobs, but the whole process is more complicated than it ought to be, and kind of clumsy -- instead of a single job representing a single run of the build process, you have several unconnected jobs, at best tied together via naming convention.
Can Electric Cloud help? Again, an unequivocal YES. Electric Cloud offers ElectricCommander, a CI server with parallel support built-in from inception. As with Hudson, a job is used to represent a build process, but the steps within a job can easily be run in parallel (just check the "parallel" box on those steps), so you don't have to resort to add-ons and kludges: one run build process is one job, with as many parallel steps as you like.
Will the right CI server put "continuous" back into my integration? A CI server will only get you so far. The thing is, a CI server can provide you coarse-grained parallelism -- so with a little work, you can set it up to run packaging in parallel with tests, for example. With a little more work, you can probably split your test phase into a few independent pieces that can be run in parallel.
You didn't give many details, but let's assume that your build is 90 minutes of compile, 30 minutes of packaging, and 2 hours of tests that can be broken down into four 30 minute pieces. Suppose further that you can do packaging and testing simultaneously. That would bring your 4 hour process down to 2 hours total. At this point the "long pole" in your process is the compile phase, and although you might be able to break that up by hand into pieces that can be run in parallel by your CI server, the truth is that the CI server is just not the right tool for that job.
A better option is to use a build tool that can give you automatic fine-grained parallelism within the compile phase. For example, if you're using gmake already, you can try gmake -j 8 to run 8 compiles at once. If your makefiles are clean and your dependencies are all correct, and you have a beefy build server, this could give you a pretty good performance boost. You could also use ElectricAccelerator, another product from Electric Cloud, that was specifically designed to accelerate this portion of the build process, even for builds that can't safely use gmake -j due to incorrect or incomplete depedencies.
Hope that helps.

Can you not split the build into multiple parts whatsoever?
You do mention that the job has several distinct parts. The general guidance with Hudson is to do the build part in one job, testing in another, packaging in another, and so on.
You can compile the code in Job A and archive the output, then tell Job B to copy these artifacts from Job A and run the tests on them. Meanwhile, another Job A can be kicked-off, due to further commits to the source repository..

Sounds to me like the problem is with your build process (make files?, msbuild?) and not Hudson. Hudson is simply going to execute the build process the same way a user would from a command-line. Is it possible to optimize your build process?
Even if a 4 hour build process is unavoidable, Hudson can help because you can attach an unlimited number of slave machines which can all be running multiple builds in parallel, given adequate hardware horsepower.

What is the purpose of a dedicated "Build Server"? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
Improve this question
I haven't worked for very large organizations and I've never worked for a company that had a "Build Server".
What is their purpose?
Why aren't the developers building the project on their local machines, or are they?
Are some projects so large that more powerful machines are needed to build it in a reasonable amount of time?
The only place I see a Build Server being useful is for continuous integration with the build server constantly building what is committed to the repository. Is it I have just not worked on projects large enough?
Someone, please enlighten me: What is the purpose of a build server?

The reason given is actually a huge benefit. Builds that go to QA should only ever come from a system that builds only from the repository. This way build packages are reproducible and traceable. Developers manually building code for anything except their own testing is dangerous. Too much risk of stuff not getting checked in, being out of date with other people's changes, etc. etc.
Joel Spolsky on this matter.

Build servers are important for several reasons.
They isolate the environment The local Code Monkey developer says "It compiles on my machine" when it won't compile on yours. This can mean out-of-sync check-ins or it could mean a dependent library is missing. Jar hell isn't near as bad as .dll hell; either way, using a build server is cheap insurance that your builds won't mysteriously fail or package the wrong libraries by mistake.
They focus the tasks associated with builds. This includes updating the build tag, creating any distribution packaging, running automated tests, creating and distributing build reports. Automation is the key.
They coordinate (distributed) development. The standard case is where multiple developers are working on the same code base. The version control system is the heart of this sort of distributed development but depending on the tool, the developers may not interact with each other's code much. Instead of forcing developers to risk bad builds or worry about merging code overly aggressively, design the build process where the automated build can see the appropriate code and processes the build artifacts in a predictable way. That way when a developer commits something with a problem, like not checking in a new file dependency, they can be notified quickly. Doing this in a staged area let's you flag the code that has built so that developers don't pull code that would break their local build. PVCS did this quite well using the idea of promotion groups. Clearcase could do it too using labels but would require more process administration than a lot of shops care to provide.

What is their purpose?
Take load of developer machines, provide a stable, reproducible environment for builds.
Why aren't the developers building the project on their local machines, or are they?
Because with complex software, amazingly many things can go wrong when just "compiling through". problems I have actually encountered:
incomplete dependency checks of different kinds, resulting in binaries not being updated.
Publish commands failing silently, the error message in the log ignored.
Build including local sources not yet commited to source control
(fortunately, no "damn customers" message boxes yet..).
When trying to avoid above problem by building from another folder, some files picked from the wrong folder.
Target folder where binaries are aggregated contains additional stale developer files that shoulkd not be included in release
We've got an amazing stability increase since all public releases start with a get from source control onto an empty folder. Before, there were lots of "funny problems" that "went away when Joe gave me a new DLL".
Are some projects so large that more powerful machines are needed to build it in a reasonable amount of time?
What's "reasonable"? If I run a batch build on my local machine, there are many things I can't do. Rather than pay developers for builds to complete, pay IT to buy a real build machine already.
Is it I have just not worked on projects large enough?
Size is certainly one factor, but not the only one.

A build server is a distinct concept to a Continuous Integration server. The CI server exists to build your projects when changes are made. By contrast a Build server exists to build the project (typically a release, against a tagged revision) on a clean environment. It ensures that no developer hacks, tweaks, unapproved config/artifact versions or uncommitted code makes it into the released code.

The build server is used to build everyone's code when it is checked in. Your code may compile locally, but you most likely won't have all the change made by everyone else all the time.

To add on what has already been said :
An ex-colleague worked on the Microsoft Office team and told me a complete build sometimes took 9 hours. That would suck to do it on YOUR machine, wouldn't it?

It's necessary to have a "clean" environment free of artifacts of previous versions (and configuration changes) in order to ensure that builds and tests work and don't depend on the artifacts. An effective way to isolate is to create a separate build server.

I agree with the answers so far in regards to stability, tracability, and reproducability. (Lots of 'ity's, right?). Having ONLY ever worked for large companies (Health Care, Finance) with MANY build servers, I would add that it's also about security. Ever seen the movie Office Space? If a disgruntled developer builds a banking application on his local machine and no one else looks at it or tests it... BOOM. Superman III.

These machines are used for several reasons, all trying to help you provide a superior product.
One use is to simulate a typical end user configuration. The product might work on your computer, with all your development tools and libraries set up, but the end user most likely won't have the same configuration as you. For that matter, other developers won't have the exact same setup as you either. If you have a hardcoded path somewhere in your code, it will probably work on your machine, but when Dev El O'per tries to build the same code, it won't work.
Also they can be used to monitor who broke the product last, with what update, and where the product regressed at. Whenever new code is checked in, the build server builds it, and if it fails, its clear that something is wrong and the user who committed last is at fault.

For consistent quality and to get the build 'off your machine' to spot environment errors and so that any files you forget to check in to source control also show up as build errors.
I also use it to create installers as these take a lot of time to do on the desktop with code signing etc.

We use one so that we know that the production/test boxes have the same libraries and versions of those libraries installed as what is available on the build server.

It's about management and testing for us. With a build server we always know that we can build our main "trunk" line from version control. We can create a master install with one-click and publish it to the web. We can run all of our unit tests each time code is checked in to make sure it works. By collecting all these tasks into a single machine it makes it easier to get it right repeatedly.

You are right that developers could build on their own machines.
But these are some of the things our build server buys us, and we're hardly sophisticated build makers:
Version control issues (some have been mentioned in earlier responses)
Efficiency. Devs don't have to stop to make builds locally. They can kick it off on the server and get on to the next task. If builds are large, then that is even more time the dev's machine is not occupied. For those doing continuous integration and automated testing, even better.
Centralization. Our build machine has scripts that make the build, distribute it to UAT environments, and even to production staging. Keeping them in one place reduces the hassle of keeping them in sync.
Security. We don't do much special here, but I'm sure a sysadmin can make it such that production migration tools can only be accessed on a build server by certain authorized entities.

Maybe i'm the only one...
I think everyone agrees that one should
use a file repository
do builds from the repository (and in a clean environment)
use a continous testing server (e.g. cruise control) to see if anything is broken after your "fixes"
But no one cares about automatically built versions.
When something was broken in an automatic build, but it's not anymore - who cares? It's a work in progress. Someone fixed it.
When you want to do a release version, you run a build from the repository. And i'm pretty sure you want to tag the version in the repository at that time and not every six hours when the server does it's work.
So, maybe a "build server" is just a misnomer and it's actually a "continous test server". Otherwise it sounds pretty much useless.

A build server gets you a sort of second opinion of your code. When you check it in, the code is checked. If it works, the code has a minimum quality.

Additionally, remember that low level languages take much longer to compile than high level languages. It's easy to think "Well look, my .Net project compiles in a couple of seconds! What's the big deal?" Awhile back I had to mess with some C code and I had forgotten how much longer it takes to compile.

A build server is used to schedule compile tasks (e.g. nightly builds) of usually large projects located in a repository that can sometimes take more than a couple of hours.

A build server also gives you a basis for escrow, being able to capture all the parts necessary to reproduce a build in the case that others may have rights to take ownership.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js