I am planning to use Azure VMSS to deploy a set of Spring Boot apps. The plan is to create a custom Linux VM image with all the required software/utilities as well as the required directory structure, and to configure this image in the VMSS. We use Jenkins as our CI/CD tool and Git as the source code repo. What is the best way to build and deploy these Spring Boot apps on VMSS?
I think one way is to write a Custom Script Extension (CSE) script which downloads the code from the Git repo and then starts these Spring Boot apps. I believe this script will then get executed every time a new VM is provisioned.
But what about cases where multiple VMs are already running at or above the minimum scale instance count? I believe a manual restart will not trigger the CSE script to run on these already-running VMs, right?
Could anyone advise the best way to handle this?
Also, once a VM is deallocated due to auto scale-down, what is the best/most cost-effective way to back up the log files from the VM to storage (blob or file share)?
You could enable "Automatically tear down virtual machines after every use" under Organization settings/Project settings >> Agent pools >> your VMSS agent pool >> Settings. Then a new VM instance is used for every job: after running a job, the VM goes offline and is reimaged before it picks up another job. The Custom Script Extension is executed on every virtual machine in the scale set immediately after it is created or reimaged. Here is the reference document: Create the scale set agent pool.
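For reference, attaching such a script as a Custom Script Extension on the scale set could look roughly like this with the Azure CLI (the resource group, scale set name, script URL and command are placeholders, not taken from your setup):

    az vmss extension set \
      --resource-group my-rg \
      --vmss-name my-vmss \
      --name CustomScript \
      --publisher Microsoft.Azure.Extensions \
      --settings '{"fileUris":["https://example.com/scripts/bootstrap.sh"],"commandToExecute":"bash bootstrap.sh"}'

Note that, as far as I know, if the scale set's upgrade policy is Manual, instances that are already running only pick up a changed extension after you upgrade them, e.g. with az vmss update-instances --resource-group my-rg --name my-vmss --instance-ids '*', which is relevant to your question about the VMs that are already up.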
To back up the log files from the VM, you could refer to the "Troubleshoot and support" documentation for the relevant file paths on the target virtual machine.
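As a sketch (the storage account, container and paths are placeholders), a small script run on a schedule or as a scale-in/shutdown hook could copy the logs up with the Azure CLI:

    #!/bin/bash
    # --auth-mode login assumes the VM has a managed identity with access to the storage
    # account; otherwise use a SAS token or account key instead.
    az storage blob upload-batch \
      --account-name mylogstorage \
      --destination app-logs \
      --destination-path "$(hostname)/$(date +%F)" \
      --source /var/log/myapp \
      --auth-mode login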
Please let me know if this question is more appropriate for a different channel, but I was wondering what the recommended tools are for installing, configuring and deploying Hadoop/Spark across a large number of remote servers. I'm already familiar with how to set up all of the software, but I'm trying to determine what I should start using that would allow me to easily deploy across a large number of servers. I've started to look into configuration management tools (i.e. Chef, Puppet, Ansible), but was wondering what the best and most user-friendly option is to start off with. I also do not want to use spark-ec2. Should I be creating homegrown scripts to loop through a hosts file containing IPs? Should I use pssh? pscp? etc. I just want to be able to SSH into as many servers as needed and install all of the software.
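(For context, the homegrown approach I have in mind is basically a pssh/pscp run over a hosts file, something like the following, where the hosts, user and packages are just placeholders:)

    # hosts.txt has one hostname/IP per line; pssh/pscp are installed as
    # parallel-ssh/parallel-scp on some distros.
    pssh -h hosts.txt -l ubuntu -i "sudo apt-get update && sudo apt-get install -y openjdk-8-jdk"
    pscp -h hosts.txt -l ubuntu spark.tgz /tmp/spark.tgz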
If you have some experience with a scripting language, then you can go for Chef. Recipes are already available for deploying and configuring the cluster, and it's very easy to start with.
And if you want to do it on your own, you can use the sshxcute Java API, which runs scripts on remote servers. You can build up the commands there and pass them to the sshxcute API to deploy the cluster.
Check out Apache Ambari. It's a great tool for central management of configs, adding new nodes, monitoring the cluster, etc. This would be your best bet.
I'm new to configuration management and deployment tools. I have to implement Continuous Delivery/Continuous Deployment tooling for one of the most interesting projects I've ever put my hands on.
First of all, individually, I'm comfortable with AWS, and I know what Ansible is, the logic behind it and its purpose. I do not have the same level of understanding of Docker, but I get the idea. I went through a lot of Internet resources, but I can't get the big picture.
What I've been struggling with is how they fit together. Using Ansible, I can manage my infrastructure as code: building EC2 instances, installing packages... I can even deploy a full application by pulling its code, modifying config files and starting the web server. Docker is, itself, a tool that packages an application and ensures that it can be run wherever you deploy it.
My problems are:
How does Docker (or Ansible and Docker) extend the Continuous Integration process?
Suppose we have a source code repository; the team members finish working on a feature and push their work. Jenkins detects this, runs all the acceptance/unit/integration test suites, and if they all pass, it declares it a stable build. How does Docker fit in here? I mean, when the team pushes their work, does Jenkins have to pull the Dockerfile kept with the app's source code, build the application image, start the container and run all the tests against it? Or does it run the tests the classic way and, if all is good, then build the Docker image from the Dockerfile and save it somewhere private?
Should Jenkins tag the final image, using x.y.z for example?
Docker container configuration:
Suppose we have an image built by Jenkins and stored somewhere. How do we handle deploying the same image into different environments, and even with different configuration parameters (vhost config, DB hosts, queue URLs, S3 endpoints, etc.)? What is the most flexible way to deal with this without breaking Docker principles? Are these configurations baked into the image when it gets built, or when the container based on it is started? If the latter, how are they injected?
Ansible and Docker:
Ansible provides a Docker module to manage Docker containers. Assuming I've solved the problems mentioned above, when I want to deploy a new version x.y.z of my app, I tell Ansible to pull that image from wherever it is stored and start the app container. So how do I inject the configuration settings? Does Ansible have to log in to the Docker image before it's running (this sounds insane to me) and use its Jinja2 templates the same way as with a classic host? If not, how is this handled?
Excuse me if this is a long question or if I misspelled something, but this is me thinking out loud. I've been blocked for the past two weeks and I can't figure out the correct workflow. I want this to be a reference for future readers.
Please, it would be very helpful to read about your experiences and solutions, because this looks like a common workflow.
I would like to answer in parts.
How does Docker (or Ansible and Docker) extend the Continuous Integration process?
Since Docker images are the same everywhere, you use your Docker images as if they were production images. Therefore, when somebody commits code, you build your Docker image, run the tests against it, and when all the tests pass, you tag that image accordingly. Since Docker is fast, this is a feasible workflow.
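As a rough sketch (the image name, test command, version scheme and registry are placeholders, not a prescribed Jenkins setup), a build step could be:

    #!/bin/bash
    set -e
    # Build the image from the Dockerfile committed with the code
    docker build -t myapp:build-$BUILD_NUMBER .
    # Run the test suite inside a container based on that exact image
    docker run --rm myapp:build-$BUILD_NUMBER ./run-tests.sh
    # Only reached if the tests passed: tag the stable build and push it to a private registry
    docker tag myapp:build-$BUILD_NUMBER registry.example.com/myapp:1.2.$BUILD_NUMBER
    docker push registry.example.com/myapp:1.2.$BUILD_NUMBER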
Also, Docker image changes are incremental, so your images will have minimal impact on storage. When your tests fail, you may also choose to save that image; that way a developer can pull it and easily investigate why the tests failed. Developers may choose to run the tests on their own machines too, since the Docker images in Jenkins and on their machines are no different.
What this brings is that all developers have the same environment and the same versions of all software, since you decide what goes into the Docker images. I have come across bugs that were due to differences between developer machines. For example, on the same operating system, Unicode settings may affect your code; but with Docker images, all developers test against the same settings and the same software versions.
Docker container configuration:
If you are using a private registry (and you should use one), then configuration changes will not affect disk space much. Therefore, except for security-sensitive configuration such as DB passwords, you can apply configuration changes to the Docker images themselves (baking the configuration into the container). You can then use Ansible to apply the configuration that is not stored in the image to deployed containers before/after startup, using environment variables or Docker volumes.
https://dantehranian.wordpress.com/2015/03/25/how-should-i-get-application-configuration-into-my-docker-containers/
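For the non-baked part, a minimal sketch of injecting configuration at container start (the image name, variables and paths are placeholders) could be:

    # Secrets and environment-specific settings are passed in at deploy time
    # rather than baked into the image:
    docker run -d \
      -e DB_HOST=db.internal.example.com \
      -e DB_PASSWORD="$DB_PASSWORD" \
      -v /etc/myapp/production:/app/config:ro \
      registry.example.com/myapp:1.2.3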
Does Ansible have to log in to the Docker image, before it's running (this sounds insane to me), and use its Jinja2 templates the same way as with a classic host? If not, how is this handled?
No, Ansible does not have to log in to the Docker image, but Ansible with Jinja2 templates can be used to generate the Dockerfile. You can template the Dockerfile and inject your configuration into the relevant files. Tag the resulting images accordingly, and you have configured images ready to spin up.
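For instance (the file names and the variable are placeholders; this is just one way to drive it from the shell), you could render the Dockerfile per environment and build a tagged image from it:

    # Render a Jinja2-templated Dockerfile for the target environment, then build a tagged image
    mkdir -p build
    ansible localhost -m template -a "src=Dockerfile.j2 dest=build/Dockerfile" -e "env=staging"
    docker build -t myapp:staging build/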
Regarding your question about handling multiple environment configurations with the same Docker image, I have been planning to use a service discovery tool like Consul as a centralized config/property management store. When you start your container, you set an ENV var that tells it what application it is (appID) and what environment config it should use (e.g. MyApplication:Dev), and it pulls its config from Consul at startup. I still have to investigate the security around Consul (if we are storing DB connection credentials in there, for example, how do we restrict who can query/update those values?). I don't want to use this just for containers, but for all apps in general. Another nice capability is changing a config value in Consul and having a hook back into your app to apply the change immediately (maybe a REST endpoint on your app that changes are pushed down to and applied dynamically). Of course, your app has to be written to support this!
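A rough sketch of what that startup lookup could look like in a container entrypoint (the Consul address, key layout and start command here are my own assumptions, not anything Consul prescribes):

    #!/bin/sh
    # APP_ID and APP_ENV are expected to be passed in via `docker run -e ...`
    CONSUL=${CONSUL_ADDR:-http://consul.service.consul:8500}
    # Fetch a single value from Consul's KV store (?raw returns just the value)
    DB_HOST=$(curl -s "$CONSUL/v1/kv/$APP_ID/$APP_ENV/db_host?raw")
    export DB_HOST
    exec /app/start.sh    # hand off to the real application process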
You might be interested in checking out Martin Fowler's blog articles on immutable infrastructure and on Phoenix servers.
Although this is not a complete solution, I have suggestions for two of your issues. They might not be perfect, but these are the practices we are using in our workflow, and they have proven themselves so far.
Defining different environments - supposing you've written a different Ansible role for each environment you launch, we define an environment variable that sets which environment the container belongs to. We then download the suitable configuration file from an S3 bucket into the container using that env variable (which should be possible if you supply AWS creds or give your server an IAM role) and inject those parameters into the code when building it.
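As a sketch of that flow (the bucket name, paths and start command are placeholders), the container entrypoint can do something like:

    #!/bin/sh
    # APP_ENV is set when the container is started; the instance/container needs an IAM role
    # (or AWS credentials) that is allowed to read the config bucket.
    aws s3 cp "s3://my-config-bucket/$APP_ENV/app.properties" /app/config/app.properties
    exec /app/start.sh --config /app/config/app.properties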
Ansible doesn't need to log in to the Docker app, but the solution is a bit tricky. I've tried two ways of tackling this problem, and neither is ideal. The first is to download the configuration file as part of the Docker image's command line and build the app on container startup. While this solution works, it breaches the Docker philosophy and makes the image highly prone to build errors.
Another solution is to push several images to your Docker Hub repo, and then pull the appropriate image according to the environment at hand.
In broader strokes, I've tried launching our app completely with Ansible and it was hell; many configuration steps are tricky and get trickier when you try to implement them as a playbook. Once I switched to maintaining only the servers with Ansible and deploying the app itself with Docker, things got a lot easier.
I am currently using AWS Elastic Beanstalk, and I was curious as to how (as in, internally) it knows, when you fire up an instance (or it does so automatically when scaling), to unpack the zip I uploaded as a version. Is there some environment setting that looks up my zip in my S3 bucket and then unpacks it automatically for every instance running in that environment?
If so, could this be used to automate a task such as running an SQL query on boot-up (instance deployment) too? Are these automated tasks changeable or viewable at all?
Thanks
I don't know how Beanstalk knows which version to download and unpack, but running a task on start-up is trivial. Check out cloud-init, a tool originally written for Ubuntu that's now packaged in Amazon Linux. It allows you to pass arbitrary shell scripts into the UserData section of the instance configuration, and those shell scripts will run on startup.
It's a great way to bootstrap instances on startup, which avoids the soul-sucking misery of managing AMIs.
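For example, a minimal user-data script for the SQL-on-boot idea might look like this (the bucket, DB endpoint, credentials and file names are placeholders, and in practice the password would come from somewhere safer than an environment variable):

    #!/bin/bash
    # Passed as the instance's UserData; cloud-init runs it once on first boot.
    yum install -y mysql aws-cli
    aws s3 cp s3://my-bootstrap-bucket/schema.sql /tmp/schema.sql
    # One-off bootstrap query against an external database (not one living on this instance)
    mysql -h mydb.example.rds.amazonaws.com -u admin -p"$DB_PASSWORD" appdb < /tmp/schema.sql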
A quick (possibly non-applicable) warning: if you're running a SQL query against a database that lives on the Beanstalk AMI itself, you're pretty much guaranteed to lose your database at some point. Those machines are designed to be entirely transient. Do not put databases on them. See this answer for more details.
Since your goal seems to be to run custom configuration tasks, the answer is yes, there is a way to do that. You can define custom actions in .ebextensions config files packaged with your app. For example, you can configure a command to run every time a new machine is deployed:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html#linux-commands
Heya,
Quick question.
I've got multiple instances on EC2 with a load balancer in front of them. I use an SVN-based setup that used to push to my live environment at will.
With multiple EC2 instances, how would I push a codebase to all of them at once?
Any thoughts/links would be appreciated.
There are a few different ways to do this.
If You Are Using Elastic Load Balancers
Write a script that:
Removes a machine from the pool
Updates the SVN repository
Re-adds the machine to the pool
Repeats for any additional machines
You could also get fancy: remove one machine, update it, then remove and update all of the other machines, if you're concerned about consistency. A rough sketch of the basic loop is shown below.
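Something along these lines, assuming a classic ELB and the AWS CLI (the ELB name, instance IDs, SSH user and paths are placeholders):

    #!/bin/bash
    set -e
    ELB_NAME=my-load-balancer
    for INSTANCE in i-0123456789abcdef0 i-0fedcba9876543210; do
        # Take the instance out of the load balancer pool
        aws elb deregister-instances-from-load-balancer --load-balancer-name "$ELB_NAME" --instances "$INSTANCE"
        # Update the working copy on that machine
        HOST=$(aws ec2 describe-instances --instance-ids "$INSTANCE" \
               --query 'Reservations[0].Instances[0].PrivateIpAddress' --output text)
        ssh deploy@"$HOST" 'svn update /var/www/myapp'
        # Put it back into the pool before moving on to the next one
        aws elb register-instances-with-load-balancer --load-balancer-name "$ELB_NAME" --instances "$INSTANCE"
    done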
If You Are Using a Custom Load Balancing Application
Look into Capistrano. You don't need to use it with Ruby/Rake -- you can write custom cap files that can do parallel deploys.
How about Vlad or Fabric for code deployment?
We use Scalr. It is available as a service (Scalr.net), or you can run it yourself (it is open source, though the source in the Google Code repository is sometimes a little behind the version the service uses).
Basically, Scalr has a global scripting feature whereby you can specify a script (e.g. bash, PHP, anything with a #! line) and trigger it to run on all instances of a given 'role' (e.g. web instance). In our case we have a script that just does svn checkout or svn update as appropriate. Scalr supports periodic scheduling of scripts, so in the dev environment I run it every 5 minutes to keep dev in sync with SVN, but obviously I trigger it manually for production.
(I have the script taking a param to specify the SVN branch to use)
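Roughly, the script looks like this (the repo URL and deploy path here are placeholders, not our real ones); $1 is the SVN branch path, e.g. "trunk" or "branches/1.2":

    #!/bin/bash
    BRANCH=${1:-trunk}
    REPO="https://svn.example.com/myapp/$BRANCH"
    DEPLOY_DIR=/var/www/myapp

    if [ -d "$DEPLOY_DIR/.svn" ]; then
        # Already checked out: switch/update the working copy to the requested branch
        svn switch "$REPO" "$DEPLOY_DIR"
    else
        # First run on a fresh instance: do a full checkout
        svn checkout "$REPO" "$DEPLOY_DIR"
    fi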