How can I modify the Load Balancing behavior Jenkins uses to control slaves?

We use Jenkins for our CI build system. We also use 'concurrent builds' so that Jenkins will build each change independently. This means we often have 5 or 6 builds of the same job running simultaneously. To accommodate this, we have 4 slaves each with 12 executors.
The problem is that Jenkins doesn't really 'load balance' among its slaves. It tries to build a job on the same slave that it previously built on (presumably to reduce the time syncing from source control). This is a problem because Jenkins will build all 6 instances of our build on the same slave (or more likely between 2 slaves). One build machine gets bogged down and runs very slowly while the rest of them sit idle.
How do I configure the load-balancing behavior of Jenkins and the way it controls its slaves?

We were facing a similar issue. So I've put together a plugin that changes the Load Balancer in Jenkins to select a node that currently has the least load - https://plugins.jenkins.io/leastload/
Any feedback is appreciated.

If you do not find a plugin that does it automatically, here's an idea of what you can do:
Install Node Label Parameter plugin
Add SLAVE parameter to your jobs
Restrict jobs to run on ${SLAVE}
Add a trigger job that will do the following:
Analyze load distribution via a System Groovy Script and decide which node should run the next build (see the sketch below).
Dispatch the build on that node with the Parameterized Trigger plugin by assigning the appropriate value to the SLAVE parameter.
In order to analyze load distribution you need to install the Groovy plugin and familiarize yourself with the Jenkins model API.
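For example, a system Groovy build step along these lines could pick the least-loaded online node and write the choice to a properties file that the Parameterized Trigger plugin can read back in as the SLAVE parameter (the properties file name is only illustrative, and error handling is omitted):

import hudson.FilePath
import jenkins.model.Jenkins

// Consider only online slaves; the master's computer has an empty name.
def candidates = Jenkins.instance.computers.findAll { it.isOnline() && it.name }

// Pick the computer with the most idle executors, i.e. the least loaded one.
def target = candidates.max { it.countIdle() }
out.println "Least loaded node: ${target.name} (${target.countIdle()}/${target.countExecutors()} executors idle)"

// Write the choice where a 'parameters from properties file' trigger can pick it up.
new FilePath(build.workspace, "slave.properties").write("SLAVE=${target.name}\n", "UTF-8")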

If your build machines cannot comfortably handle more than 1 build, why configure them with 12 executors? If that is indeed the case, you should reduce the number of executors to 1. My Jenkins has 30 slaves, each with 1 executor.

You may also use the Throttle Concurrent Builds plugin to restrict how many instances of a job can run in parallel on the same node.

I have two labels -- one for small tasks and one for big tasks. I have one executor for the big task and 4 for the small tasks. This does balance things a little.

Related

Gitlab Runner under GCP Load Balancer

At the moment I have a load balancer in front of a Compute Engine instance group with a minimum of 1 server and a maximum of 5 servers.
This uses autoscaling and a pre-built Ubuntu template with all the base software needed.
When an instance boots up, it registers a runner with the GitLab project and then triggers the job that updates the instance to the latest copy of the code.
This is fine and works well.
The issue is that when I make a change to the git branch and push it, the job only seems to be picked up by one of the 5 instances, apparently at random.
I was under the impression that GitLab would push the job out to all the registered runners, but this doesn't seem to be the case.
I have seen answers on here that cover multiple runners on a single server, but I haven't come across my particular situation.
Has anyone come across this before? I would assume that this is a pretty normal situation, and it seems weird that it doesn't just work.
For each job that runs in GitLab, only 1 runner receives the job. The mechanism is PULL based -- the runners constantly ask GitLab if there are any jobs available to run. GitLab never initiates communication with the runners.
Therefore, your load balancer rules do nothing to affect which runner receives a job, and there is no "fairness" in distributing jobs across servers. Runners will keep asking for jobs every few seconds as long as they are able to take them (according to the concurrency settings in their config.toml), and GitLab will hand them out on a first-come, first-served basis.
If you set the concurrency to 1 and start multiple jobs, you should see multiple servers pick up the jobs.
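For reference, that per-runner concurrency is the concurrent setting at the top of each runner's config.toml; a minimal sketch (the runner name and URL are placeholders, and the registration token is omitted) where each server takes at most one job at a time would be:

concurrent = 1

[[runners]]
  name = "gcp-runner"
  url = "https://gitlab.example.com/"
  executor = "shell"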

How to run dependent jobs on PCF?

I have two batch applications, e.g. batchapp1 and batchapp2. I want to run batchapp2 after completion of batchapp1. Can we achieve this using the PCF Scheduler, or can we achieve it using the Spring Cloud Data Flow server?
Right now we are doing it using Control-M, running on JVMs on VMs (not in the cloud).
What you need is Spring Cloud Data Flow's Composed Task functionality. With this, you'd be able to orchestrate a series of Tasks/Batch jobs as a directed acyclic graph. A graph can include sequential steps, parallel steps, or both, where each step is a Task/Batch job.
For your example, in SCDF, the DSL representation would look like:
task create foo --definition "batchapp1 && batchapp2"
Upon launching the task definition foo in SCDF on PCF, it would launch batchapp1 first and, upon its successful completion, run batchapp2 next. You can also add transitions to run cleanup/error-handling steps based on the exit code of each step.
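For instance, a transition to a hypothetical cleanup task when batchapp1 fails could be declared along these lines:
task create foo --definition "batchapp1 'FAILED' -> cleanup && batchapp2"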
As an alternative, you could do all this on the interactive drag & drop visual canvas, too.
Also note that on PCF, all the steps would be launched as short-lived CF Tasks, which run for a finite time: each one runs only as long as the app needs to and is then cleanly shut down to free up resources.

How to replicate code changes across multiple AWS instances?

We have a load balanced setup in AWS with two instances. We do pretty frequent code updates, utilizing SVN. I need to know how easy it is to update the code changes across all the instances in our cluster. Can we simply do 'snapshots' and create new volumes each time for the instances?...or?...
I would not do updates via EBS snapshots. Think of EBS volumes as hard disks - you would not swap out your hard disk just because you have an update for your software.
As you have your code in a version control system, code updates should be quite simple, like logging in to your (multiple) servers and doing a git pull or svn update. This fetches the latest code onto your servers. Depending on the type of application you may have to do some other tasks afterwards: running build scripts, emptying caches, etc.
The problem is that this kind of setup does not scale well. If you have n servers, you will have to log in and run this command n times. Therefore it makes sense to look into remote management tools that let you do this in one step. With a lot of these tools you also get a complete configuration management stack: you define a set of recipes or tasks (installed packages, configuration files, fetching the latest code, necessary build steps) for each of your servers, and when you boot up a new server it fetches the latest version of its configuration and configures itself.
Popular configuration management tools include Puppet and Salt. Both include remote execution, which should make publishing your code base easier: you only have to fire one command on your master server and it automatically executes the task on all its minions / slave servers.
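For example, with Salt you could run the update on every connected minion with a single command from the master (the path is a placeholder):
salt '*' cmd.run 'svn update /var/www/myapp'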

Amazon EC2 multiple instances with SVN app

Heya,
Quick question.
I've got multiple instances on EC2 with a load balancer between them. I use an SVN app that used to push to my live env. at will.
With the multiple EC2's, how would I push a codebase to all of them at once?
Any thoughts/links would be appreciated.
There are a few different ways to do this.
If You Are Using Elastic Load Balancers
Write a script that does the following (a rough command-level sketch follows this list):
Removes a machine from the pool
Updates the SVN repository
Re-adds the machine to the pool
Repeats for any additional machines
You could also get fancy and remove one machine, update it, remove all other machines and update them, if you're concerned about consistency.
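For a single machine behind a classic ELB, those steps map roughly onto today's AWS CLI (load balancer name, instance ID, address, and path are all placeholders), and the script simply repeats them per instance:
aws elb deregister-instances-from-load-balancer --load-balancer-name my-elb --instances i-0123456789abcdef0
ssh deploy@10.0.1.12 'svn update /var/www/app'
aws elb register-instances-with-load-balancer --load-balancer-name my-elb --instances i-0123456789abcdef0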
If You Are Using a Custom Load Balancing Application
Look into Capistrano. You don't need to use it with Ruby/Rake -- you can write custom cap files that can do parallel deploys.
How about Vlad or Fabric for code deployment?
We use Scalr. It is available as a service (Scalr.net) or you can run it yourself (it is Open Source - though the source in the googlecode repository is sometimes a little behind the version the service uses).
Basically, Scalr has a global scripting feature whereby you can specify a script (e.g. bash, PHP, anything with a #! shebang) and trigger it to run on all instances of a given 'role' (e.g. web instance). In our case we have a script that just does an svn checkout or svn update as appropriate. Scalr supports periodic scheduling of scripts, so in the dev environment I run it every 5 minutes to keep dev in sync with SVN, but obviously I trigger it manually for production.
(I have the script taking a param to specify the SVN branch to use)

What pattern do you use for executing generic steps on a pool of machines

I'm trying to figure out how to model my build process in Hudson. At present most of our Hudson builds are somewhat hard-coded, in that the build process is a series of steps and we have one process per branch.
I have another build system that has many active branches, and each build has a series of integration tests which require a suite of machines to execute. As I migrate from the home-grown system to Hudson, I'm not quite sure of the right way to model this so as to keep maintenance costs and build times to a minimum.
Here's my basic build:
create workspace
compile, link, package
transfer artifacts to test systems
invoke test harness on multiple systems to handle installation and acceptance tests
collect results
publish results
I'd like the integration part to run on a group of generic machines (perhaps an elastic group) which can handle integration tests for any branch. I want to run as much in parallel as possible to keep my build times low. It looks like the best way to execute in parallel on Hudson is to break the steps up into jobs and use the Parameterized Trigger Plugin to customize the generic jobs.
So, I'd have two main jobs: build, test
I could have one build job per branch and a generic test job. The build job would use the Parameterized Trigger Plugin to call the test job and provide the location of the build artifacts. The test job would call a series of jobs in parallel, passing down parameters for branch and artifact.
test
test-client-install (params: artifact location, branch)
test-server-install (params: artifact location, branch)
test-run (params: client machine, server machine)
join - collect results (params: client machine, server machine)
Each of the test-* jobs would pull a slave out of the pool and execute. I'm not quite sure how to inform the slaves running the client and server jobs how to find each other, nor am I sure how to reserve them from the pool and release them back into it.
I guess I could write properties to a common share and have the sub-jobs use that for inter-job communication.
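For example, the client/server install jobs could drop a small properties file on the share, and the downstream jobs could read it back in (e.g. via the Parameterized Trigger Plugin's "parameters from properties file" option); all names below are made up:
CLIENT_MACHINE=slave-client-03
SERVER_MACHINE=slave-server-01
ARTIFACT_LOCATION=/mnt/builds/branch-1.2/457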
Has anyone created this kind of complex setup in Hudson, or is this usually done with another system with which Hudson interacts (e.g. Hudson + STAF, with STAF managing resources)?
A few thoughts. The fewer jobs you have, the easier they are to maintain; the more jobs you have, the more flexible you are and the more you can run in parallel. Since you emphasized fast build times, have a look at the Join plugin: it lets you run a few jobs in parallel and, once all of them have finished, continue with another job in the chain.
If your server is big enough, you can also experiment a little with the Clone Workspace plugin. It helps reduce the need to manually copy files between jobs, and can reduce the need for the Parameterized Trigger plugin.
Reserving a slave is easy. You can group the slaves with labels. In your job you define what label a node has to have in order to execute your job. A node can have more than one label, and your job can be bound to more than one label. This way, Hudson decides where to put your job depending on availability. If your slaves have more than one executor, they can run two jobs in parallel. I haven't used the Locks and Latches plugin for synchronizing across nodes, so I don't know whether locks are per node or for the whole Hudson installation (latches are not supported yet). If you need to ensure that two jobs run on the same slave, try to combine them; otherwise you will lose Hudson's ability to distribute your jobs freely over the available nodes.