GitLab Runner under GCP Load Balancer

At the moment I have a load balancer in front of a Compute Engine instance group which has a minimum of 1 server and a maximum of 5 servers.
This is running autoscaling and uses a pre-built Ubuntu template with all the base stuff needed.
When an instance boots up it registers a runner with the GitLab project, and then triggers the job to update the instance to the latest copy of the code.
This is fine and works well.
The issue comes when I make a change to the git branch and push the changes: the job only seems to be picked up by one of the 5 instances, apparently at random.
I was under the impression that GitLab would push the job out to all the registered runners, but this doesn't seem to be the case.
I have seen answers on here that cover multiple runners on a single server, but I haven't come across my particular situation.
Has anyone come across this before? I would assume that this is a pretty normal situation, and weird that it doesn't just work.

For each job that runs in GitLab, only 1 runner receives the job. The mechanism is PULL based -- the runners constantly ask GitLab if there are any jobs available to run. GitLab never initiates communication with the runners.
Therefore, your load balancer rules do nothing to affect which runner receives a job, and there is no "fairness" in distributing jobs across servers. Runners will keep asking for jobs every few seconds as long as they are able to take them (according to the concurrency settings in their config.toml) and GitLab will hand them out on a first-come, first-served basis.
If you set the concurrency to 1 and start multiple jobs, you should see multiple servers pick up the jobs.
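For reference, the knobs involved live in each runner's config.toml; a minimal sketch (the path, runner name and values here are illustrative, not taken from the question):

# /etc/gitlab-runner/config.toml (typical location when the runner runs as root)
concurrent = 1      # upper limit on jobs this runner process will run at once

[[runners]]
  name = "gcp-autoscaled-instance"   # hypothetical name
  limit = 1                          # upper limit for this particular runner entry

With concurrent = 1 on every instance, a busy runner cannot take a second job, so additional queued jobs will be picked up by the other, idle instances.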

Related

How to run dependent jobs on PCF?

I have two batch applications, e.g. batchapp1 and batchapp2. I want to run batchapp2 after completion of batchapp1. Can we achieve it using the PCF Scheduler, or can we achieve it using the Spring Data Flow server?
Right now we are doing it using Control-M, running on a VM's JVM (not on cloud).
What you need is Spring Cloud Data Flow's Composed Task functionality. With this, you'd be able to orchestrate a series of Tasks/Batch jobs as a directed acyclic graph. A graph can include sequential steps, parallel steps, or both, where each step is a Task/Batch job.
For your example, in SCDF, the DSL representation would look like:
task create foo --definition "batchapp1 && batchapp2"
Upon launching the Task definition foo in SCDF on PCF, it would launch batchapp1 first and, upon its successful completion, run batchapp2 next. You can also define transitions to run cleanup/error-handling steps based on the exit code at each step.
As an alternative, you could do all this on the interactive drag & drop visual canvas, too.
Also to note, in PCF all the steps would be launched as short-lived CF Tasks, which run for a finite time: each one runs only for as long as the app needs to and is then cleanly shut down to free up resources.
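If it helps, the end-to-end flow in the SCDF shell would look roughly like this (the app/task names are the ones from the question; the registration URIs are placeholders you'd point at your own artifacts):

dataflow:> app register --name batchapp1 --type task --uri <uri-to-batchapp1>
dataflow:> app register --name batchapp2 --type task --uri <uri-to-batchapp2>
dataflow:> task create foo --definition "batchapp1 && batchapp2"
dataflow:> task launch foo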

GitHub to AWS EC2 using CodeDeploy best practice

We are a relatively inexperienced development team trying to do things 'the right way'. We are using GitHub along with AWS and CodeDeploy for multiple PHP based web applications. We are utilising GitHub's auto-deployment with CodeDeploy when the master branch is updated.
We have two production EC2 web servers in separate AZs along with a single EC2 staging server.
It currently works as follows:
We write code in a branch and push to GitHub, then we merge into 'master', which kicks off CodeDeploy to deploy to our staging server where we can test it. Once we have tested it we then manually kick off CodeDeploy to deploy to production (with the same commit ID).
The problem is, if testing brings up issues and we have another branch waiting to be merged and tested, everything gets backed up behind it.
We are obviously doing something wrong. We are writing to the master branch to utilise GitHub's auto-deployment, but I assumed master was only to be written to when it was ready to be deployed?
Can someone please help us and put us straight?
Thanks
Make another branch called 'livecandidate'. This branch will have each of the new feature branches merged into it.
Each time a feature branch is merged into 'livecandidate', pull 'livecandidate' into your CodeDeploy process and install it to the test machine.
If the tests pass, then merge 'livecandidate' into 'master' and kick off the install to production.
If the tests do not pass, then unwind the merge into 'livecandidate' (assuming no dependencies on chains of changes etc.).
After doing a production install or an un-merge, try the next feature.
The general idea is to never have a broken master.
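A rough git sketch of that flow (branch names from the steps above; the feature branch, remote and commit SHA are placeholders):

# merge the next feature into the live candidate and test it
git checkout livecandidate
git merge --no-ff feature/my-change
git push origin livecandidate        # then run your CodeDeploy install to the test machine

# tests passed: promote to master and install to production
git checkout master
git merge --no-ff livecandidate
git push origin master

# tests failed: unwind the merge on livecandidate
git checkout livecandidate
git revert -m 1 <merge-commit-sha>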
All problems in computer science can be solved by another level of indirection - David Wheeler

issues with elastic beanstalk leader election

We have a rails app that has been working fine for months. Today we discovered some inconsistencies with leader election. Primarily:
su - "leader_only bundle exec rake db:migrate" webapp
After many hours of trial and error (and dozens of deployments) none of the instances in our dev application run this migration. /usr/bin/leader_only looks for an environment variable that is never set on any instance (the dev app has only one instance).
Setting the application deployment policy to 1 instance at a time and providing the value that /usr/bin/leader_only expects as an env var works, but not the way it used to or should. (Now every instance considers itself the leader, so they will all fruitlessly run db:migrate, and since deployments go 1 instance at a time, this will slow us down once we have many instances.)
We thought maybe it was due to some issues with the code and/or app, so we rebuilt it. No change.
I even cloned our test application's RDS server and created a new application from a saved configuration, deployed a new git hash, and it still never ran db:migrate. It attempts to and shows the leader_only line, but the migration never runs. That rules out code, configuration, and artifacts.
Also, for what it's worth, it never says it is skipping migrations due to RAILS_SKIP_MIGRATIONS (which is set to false). This means that it is in fact trying to run db:migrate but isn't, because the instance is not recognized as the leader.
We have been in talks with the AWS support team. It seems as though EB leader election is very fragile.
Per the tech:
Also, as explained before (the leader is the first instance in an
auto-scaling group, and if it is removed we lose the leader), even
using leader_only: true in container_commands, db:migrate doesn't
work.
What happened is that we lost all instances. The leader is elected once, and leader status is passed along during instance rotation. As long as you do not lose all instances at once, everything is fine.
I did not mention a detail. We have many non-production environments, and through Elastic Beanstalk's auto-scaling settings we use timed scaling to set our instance count to 0 at night and back up to the expected 1-2 during the day. We do this for our dev, test, and UAT environments to make sure we don't run at full speed 24/7. Because of this, we lost the leader and never got it back.
Per the follow up from the tech:
We have a feature request in place to overcome the issue of losing the
leader when very first instance is deleted.
"Elastic Beanstalk uses leader election to determine which instance in
your worker environment queues the periodic task. Each instance
attempts to become leader by writing to a DynamoDB table. The first
instance that succeeds is the leader, and must continue to write to
the table to maintain leader status. If the leader goes out of
service, another instance quickly takes its place."
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks
In Elastic Beanstalk, you can run a command on a single "leader" instance. Just create a .ebextensions file that contains container_commands and then deploy it. Make sure you set the leader_only value to true.
For example:
.ebextensions/00_db_migration.config
container_commands:
  00_db_migrate:
    command: "rake db:migrate"
    leader_only: true
The working directory of this command will be your new application.
The leader-instance environment variable is set by the Elastic Beanstalk agent at deployment time; it is not exported to a normal SSH shell.

How to replicate code changes across multiple AWS instances?

We have a load balanced setup in AWS with two instances. We do pretty frequent code updates, utilizing SVN. I need to know how easy it is to update the code changes across all the instances in our cluster. Can we simply do 'snapshots' and create new volumes each time for the instances?...or?...
I would not do updates via EBS snapshots. Think of EBS volumes as a hard disk - you would not swap out your hard disk just because you have an update for your software.
As you have your code in a version control system, code updates should be quite simple: log in to your (multiple) servers and do a git pull or svn update. This fetches the latest code files onto your servers. Depending on the type of application, you may have to do some other tasks afterwards: running build scripts, emptying caches, etc.
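In its simplest form that manual approach is just a loop over your servers (the hostnames and the deployment path are placeholders):

for host in web1 web2; do
    ssh "$host" 'cd /var/www/myapp && svn update'   # or: git pull
done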
The problem is that this kind of setup does not scale well. If you have n servers, you will have to log in and run this command n times. Therefore it makes sense to look into remote management tools that let you do it in one step. With a lot of these tools, you also get a complete configuration management stack: you define a set of recipes or tasks (installed packages, configuration files, fetching the latest code, necessary build steps) for each of your servers, and when you boot up a new server it fetches the latest version of its configuration and installs itself.
Popular configuration management tools include Puppet and Salt. Both tools include remote execution, which should make publishing your code base easier: you would only have to fire one command on your master server and it automatically executes the task on all of its minions / slave servers.
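For example, with Salt the same update becomes a single command fired from the master (the target glob and path are again placeholders):

salt 'web*' cmd.run 'cd /var/www/myapp && svn update'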

How can I modify the Load Balancing behavior Jenkins uses to control slaves?

We use Jenkins for our CI build system. We also use 'concurrent builds' so that Jenkins will build each change independently. This means we often have 5 or 6 builds of the same job running simultaneously. To accommodate this, we have 4 slaves each with 12 executors.
The problem is that Jenkins doesn't really 'load balance' among its slaves. It tries to build a job on the same slave that it previously built on (presumably to reduce the time syncing from source control). This is a problem because Jenkins will build all 6 instances of our build on the same slave (or more likely between 2 slaves). One build machine gets bogged down and runs very slowly while the rest of them sit idle.
How do I configure the load balancing behavior of Jenkins, and how it controls its slaves?
We were facing a similar issue. So I've put together a plugin that changes the Load Balancer in Jenkins to select a node that currently has the least load - https://plugins.jenkins.io/leastload/
Any feedback is appreciated.
If you do not find a plugin that does it automatically, here's an idea of what you can do:
Install the Node Label Parameter plugin
Add a SLAVE parameter to your jobs
Restrict the jobs to run on ${SLAVE}
Add a trigger job that will do the following:
Analyze the load distribution via a System Groovy Script and decide which node to start the next build on.
Dispatch the build on that node with the Parameterized Trigger plugin by assigning the appropriate value to the SLAVE parameter.
In order to analyze load distribution you need to install Groovy plugin and familiarize yourself with Jenkins Main Module API. Here are some useful initial pointers.
If your build machines cannot comfortably handle more than 1 build, why configure them with 12 executors? If that is indeed the case, you should reduce the number of executors to 1. My Jenkins has 30 slaves, each with 1 executor.
You may also use the Throttle Concurrent Builds plugin to restrict how many instances of a job can run in parallel on the same node.
I have two labels -- one for small tasks and one for big tasks. I have one executor for the big task and 4 for the small tasks. This does balance things a little.