Blue-Green Deployment in AWS with Ansible - amazon-web-services

I am thinking of using Ansible to manage my AWS infrastructure; I have two servers with auto scaling.
I will deploy using ansible-playbook -i hosts deploy-plats.yml --limit spring-boot
Here is my deploy-plats.yml:
---
- hosts: bastion:apache:spring-boot
  vars:
    remote_user: ec2-user
  tasks:
    - name: Copies the .jar to the Spring Boot boxes
      copy: dest=~/ src=~/dev/plats/target/plats.jar mode=0777
    - name: Restarts the plats service
      service: name=plats state=restarted enabled=yes
      become: yes
      become_user: root
and I am wondering whether using this approach gives me a blue-green deployment, or whether the servers will be restarted at the same time, producing downtime.

By default, Ansible will try to manage all of the machines referenced
in a play in parallel. For a rolling-update use case, you can define
how many hosts Ansible should manage at a single time by using the
serial keyword (perhaps you are looking for something like this rather
than a blue-green deployment):
- name: test play
  hosts: webservers
  serial: 1
See the Ansible documentation on rolling updates and the serial keyword for details.
Also, your playbook is not a blue-green deployment; I suggest you read about it a
little bit. A blue/green deployment is a software deployment strategy
that relies on two identical production configurations that alternate
between active and inactive. One environment is referred to as blue,
and the duplicate environment is dubbed green. The two environments,
blue and green, can each handle the entire production workload and are
used in an alternating manner rather than as a primary and secondary
space. One environment is live and the other is idle at any given
time. When a new software release is ready, the team deploys this
release to the idle environment, where it is thoroughly tested. Once
the new release has been vetted, the team will make the idle
environment active, typically by adjusting a router configuration to
redirect application traffic. This leaves the alternate environment
idle.
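For illustration only, here is a minimal sketch of how the deploy step could target just the idle environment with Ansible. The plats_blue/plats_green inventory groups, the idle_color variable, and the install path are assumptions, not part of the original question:
# Hypothetical play: deploy only to the idle colour, e.g.
#   ansible-playbook -i hosts blue-green-plats.yml -e idle_color=green
- hosts: "plats_{{ idle_color }}"          # assumed inventory groups plats_blue / plats_green
  remote_user: ec2-user
  become: yes
  tasks:
    - name: Copy the new jar to the idle environment only
      copy:
        src: ~/dev/plats/target/plats.jar
        dest: /opt/plats/plats.jar         # assumed install path
        mode: "0755"
    - name: Restart the plats service on the idle environment
      service:
        name: plats
        state: restarted
        enabled: yes
# After the idle environment has been tested, traffic is switched at the load
# balancer or DNS level (for example by repointing an ELB target group); that
# switch, not this play, is what makes the rollout blue-green.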

By default, Ansible will run each task against all targeted hosts in parallel. You can set the play-level directive "serial" to force it to run the play on one node at a time. This is described in detail here:
"Delegation, Rolling Updates, and Local Actions"

The serial keyword should solve your problem. Set the value to 1 so that the restart task is executed in a rolling fashion.

Related

My GKE pods stopped with error "no command specified: CreateContainerError"

Everything was OK and the nodes were fine for months, but suddenly some pods stopped with an error.
I tried to delete the pods and nodes, but I got the same issues.
Try the possible solutions below to resolve your issue:
Solution 1:
Check for a malformed character in your Dockerfile that could cause the build to crash.
When you encounter CreateContainerError, first check that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. If you don't have access to the Dockerfile, you can configure your pod object by setting a valid command in the command attribute of the object.
As a workaround, do not specify any workerConfig explicitly, which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, the similar questions SO1 and SO2, and this similar GitHub link for more information.
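For illustration, a pod spec that supplies an explicit command might look like this (the pod name, image, and start script are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: example-app                        # placeholder name
spec:
  containers:
    - name: app
      image: gcr.io/my-project/app:1.0     # placeholder image
      # An explicit command lets the container start even if the image was
      # built without a valid ENTRYPOINT/CMD.
      command: ["/bin/sh", "-c"]
      args: ["exec /app/start.sh"]         # placeholder start script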
Solution 2:
The kubectl describe pod <podname> command provides detailed information about each of the pods that make up your Kubernetes infrastructure. With its output you can check for clues; if you see Insufficient CPU, follow the solution below.
The solution is to either:
1) Upgrade the boot disk: if using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2) Increase the disk size.
3) Use a node pool with a machine type that has more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version for your cluster (Manually upgrading the control plane) to one of the fixed versions.
Also check whether you have updated your setup in the last year to use the new kubectl authentication plugin coming with GKE v1.26.
Solution 3:
If you have a pipeline on GitLab that deploys an image to a GKE cluster, check the version of the GitLab runner that handles your pipeline's jobs.
It turns out that every image built through a GitLab runner running an old version causes this issue at container start. Simply deactivate those runners, leave only runners on the latest version in the pool, and replay all pipelines.
Also check whether the GitLab CI script uses an old Docker image like docker:19.03.5-dind; updating it to docker:dind helps Kubernetes start the pod again.
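As a hedged sketch of that last point, the relevant .gitlab-ci.yml fragment could look like the following (the job name is a placeholder; CI_REGISTRY_IMAGE and CI_COMMIT_SHORT_SHA are standard GitLab CI variables, and your runner setup may additionally need DOCKER_HOST / TLS variables):
build:
  image: docker:dind                # was docker:19.03.5-dind
  services:
    - docker:dind
  script:
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"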

Google Cloud Run correctly running continuous deployment to github, but not updating when deployed

I've set up a Google Cloud Run service with continuous deployment from a GitHub repository, and it redeploys every time there's a push to main (which is what I want), but when I go to check the site, it hasn't updated the HTML I've been testing with. I've tested it on my local machine, and it updates the code when I run the Django server, so I'm guessing it's something with my cloudbuild.yml? There was another post I tried to mimic, but it didn't take.
Any advice would be very helpful! Thank you!
cloudbuild.yml:
steps:
# Build the container image
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/${PROJECT_ID}/exeplore', './ExePlore']
# Push the image to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/${PROJECT_ID}/exeplore']
# Deploy image to Cloud Run
- name: 'gcr.io/cloud-builders/gcloud'
  args:
  - 'run'
  - 'deploy'
  - 'exeplore'
  - '--image'
  - 'gcr.io/${PROJECT_ID}/exeplore'
  - '--region'
  - 'europe-west2'
  - '--platform'
  - 'managed'
images:
- gcr.io/${PROJECT_ID}/exeplore
Here are the variables for GCR
Edit 1: I've now updated my cloudbuild, so the SHORT_SHA is all gone, but now Cloud Run is saying it can't find my manage.py at /Exeplore/manage.py. I might have to trial-and-error it, as running the container locally is fine, and the same goes for running the server locally. I have yet to try what Ezekias suggested, as I've tried rolling back to when it was correctly running the server and it doesn't like that.
Edit 2: I've checked the service; it is at 100% Latest.
Check your Cloud Run service, either on the Cloud Console or by running gcloud run services describe. It may be set to serve traffic to a specific revision instead of having 100% of traffic serving LATEST.
If that's the case, it won't automatically move traffic to the new revision when you deploy. If you want it to automatically switch to the new update, you can run gcloud run services update-traffic --to-latest or use the "Manage Traffic" button on the revisions tab of the Cloud Console to set 100% of traffic to the latest healthy revision.
It looks like you're building gcr.io/${PROJECT_ID}/exeplore:$SHORT_SHA, but pushing and deploying gcr.io/${PROJECT_ID}/exeplore. These are essentially different images.
Update any image variables to include the SHORT_SHA to ensure all references are the same.
To avoid duplication, you may also want to use dynamic substitution variables.
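For example, here is a hedged sketch of the build config with one consistently tagged image, keeping the asker's service name and region (SHORT_SHA is a default substitution available in triggered builds):
steps:
# Build the container image, tagged with the commit SHA
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/${PROJECT_ID}/exeplore:${SHORT_SHA}', './ExePlore']
# Push exactly that tag to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/${PROJECT_ID}/exeplore:${SHORT_SHA}']
# Deploy the same tag to Cloud Run
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['run', 'deploy', 'exeplore',
         '--image', 'gcr.io/${PROJECT_ID}/exeplore:${SHORT_SHA}',
         '--region', 'europe-west2', '--platform', 'managed']
images:
- 'gcr.io/${PROJECT_ID}/exeplore:${SHORT_SHA}'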

Elastic Kubernetes Service AWS Deployment process to avoid down time

It's been a month since I started working on AWS EKS, and up till now I have successfully deployed my code.
The steps which I follow for deployment are given below:
Create image from docker terminal.
Tag and push to ECR AWS.
Create the deployment "project.json" and service file "project-svc.json".
Save the above files in the "kubectl/bin" path and deploy them with the following commands:
"kubectl apply -f projectname.json" and "kubectl apply -f projectname-svc.json".
So if I want to deploy the same project again with a change, I push the new image to ECR and delete the existing deployment using "kubectl delete -f projectname.json" (without deleting the existing service), then deploy it again using "kubectl apply -f projectname.json".
Now, my confusion is that after I delete the existing deployment there is downtime until I apply or create the deployment again. How do I avoid this? I don't want the downtime; that is the reason why I started using EKS.
One more thing: the deployment process is a bit long, too. I know I'm missing something; can anybody guide me properly, please?
The project is on .NET Core, so if there is any simplified way to do the deployment using Visual Studio, please guide me on that as well.
Thank You in advance!
There is actually no need to delete your deployment. You just need to update the desired state (the deployment configuration) and let K8s do its magic and apply the needed changes, like deploying a new version of your container.
If you have a single instance of your container, you will experience a short downtime while the changes are applied. If your application supports multiple replicas (HA), you can enjoy the rolling update feature.
Start by reading the official Kubernetes documentation on performing a rolling update.
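As a minimal sketch of such a manifest with multiple replicas and an explicit rolling-update strategy (the name, replica count, and ECR image below are placeholders, not the asker's actual files):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: projectname                        # placeholder
spec:
  replicas: 3                              # multiple replicas enable zero-downtime rollouts
  selector:
    matchLabels:
      app: projectname
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0                    # keep every old pod until its replacement is ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: projectname
    spec:
      containers:
        - name: projectname
          image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/projectname:v2   # placeholder ECR image; bump the tag per release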
You only need to use delete/apply if you are changing the ConfigMap attached to the Deployment (and only if you have one).
If the only change you make is the deployment's image, you should use the "set image" command.
kubectl lets you change the actual deployment image and does the rolling update all by itself, and with 3+ pods you have the minimum chance of downtime.
Even better, if you use the --record flag, you can "rollback" to your previous image with no effort because it keeps track of the changes.
You also have the possibility to specify the context, with no need to jump between contexts.
You can go like this:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME --record -n NAMESPACE
Or, specifying the cluster:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME_ECR -n NAMESPACE --cluster EKS_CLUSTER_NPROD --user EKS_CLUSTER --record
For example:
kubectl set image deployment nginx-dep nginx-dep=ecr12345/nginx:latest -n nginx --cluster eu-central-123-prod --user eu-central-123-prod --record
The --record flag is what lets you track all the changes; if you want to roll back, just do:
kubectl rollout undo deployment.v1.apps/nginx-dep
More documentation about it here:
Updating a deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Roll Back Deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment

Same Application in a docker compose configuration mapping different ports on AWS EC2 instance

The app has the following containers
php-fpm
nginx
local mysql
app's API
datadog container
In the dev process, many feature branches are created to add new features, such as:
app-feature1
app-feature2
app-feature3
...
I have one AWS EC2 instance per feature branch running Docker Engine v18 and Docker Compose to build and run the Docker stack that composes the PHP app.
To save on operating costs, one AWS EC2 instance can host 3 feature branches at the same time. I was thinking that there should be a custom docker-compose file with special port mappings and a Docker image tag for each feature branch.
The goal of this configuration is to be able to test 3 feature branches and access the app through different ports while saving money.
I also thought about using Docker networks, keeping the same ports and using an nginx to redirect traffic to the different Docker network ports.
What recommendations do you give?
One straightforward way I can think of in this case is to use a .env file for your docker-compose stacks.
Your docker-compose.yaml file will look something like this:
...
    ports:
      - ${NGINX_PORT}:80
...
    ports:
      - ${API_PORT}:80
The .env file for each stack will look something like this:
NGINX_PORT=30000
API_PORT=30001
and
NGINX_PORT=30100
API_PORT=30101
for different projects.
Note:
.env must be in the same folder as your docker-compose.yaml.
Make sure that the ports inside the different .env files do not conflict with each other. You can use some kind of convention, like a prefix per feature: feature1 gets ports starting with 301, i.e. 301xx.
In this way, your docker-compose.yaml can be as generic as you may like.
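For illustration, a hedged sketch of what such a generic docker-compose.yaml could look like with these variables (the service names, images, and the IMAGE_TAG variable are placeholders, not part of the original answer):
version: "3.7"
services:
  nginx:
    image: nginx:stable
    ports:
      - "${NGINX_PORT}:80"          # e.g. 30000 for one branch, 30100 for another, per the .env file
  api:
    image: myorg/app-api:${IMAGE_TAG}   # hypothetical per-branch image tag variable
    ports:
      - "${API_PORT}:80"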
You're making things harder than they have to be. Your app is containerized; use a container orchestration system.
ECS is very easy to get going with. It's a JSON file that defines your deployment, basically analogous to docker-compose (they actually supported compose files at some point; not sure if that feature stayed around). You can deploy an arbitrary number of services with different container images. We like to use a Terraform module with the image tag as a parameter, but it's easy enough to write a shell script or whatever.
Since you're trying to save money, create a single Application Load Balancer. Each app gets a hostname, and each container gets a subpath. For short-lived feature branch deployments, you can even deploy on Fargate and not have an ongoing server cost.
It turns out the solution involved capabilities of docker-compose. In the Docker docs the concept is called "multiple isolated environments on a single host".
To achieve this:
I used a .env file with many env vars. The main one is CONTAINER_IMAGE_TAG, which holds the git branch ID that identifies the stack.
A separate docker-compose-dev file defines ports, image tags, and extra dev-related metadata.
Finally, using --project-name in the docker-compose command allows the different stacks to coexist.
An example Bash function that wraps the docker-compose command:
docker_compose() {
  docker-compose -f docker/docker-compose.yaml -f docker/docker-compose-dev.yaml --project-name "project${CONTAINER_IMAGE_TAG}" --project-directory . "$@"
}
The separation should be done in the image tags, container names, network names, volume names and project name.
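As a sketch of how that separation could look in the dev override file (all values here are placeholders, not the asker's actual configuration):
# docker/docker-compose-dev.yaml -- hypothetical dev override
version: "3.7"
services:
  nginx:
    container_name: "nginx_${CONTAINER_IMAGE_TAG}"   # branch-specific container name
    ports:
      - "${NGINX_PORT}:80"                           # branch-specific host port
  api:
    image: "app-api:${CONTAINER_IMAGE_TAG}"          # branch-specific image tag
    container_name: "api_${CONTAINER_IMAGE_TAG}"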

Wildfly 10 restart issue on AWS EC2

I am running my Wildfly 10.1.0 server on Linux on an Amazon EC2 instance. I have written start and stop scripts for the server. Whenever I stop my server and restart it after some time, I get the following exception:
WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "rapid.ear")]) - failure description: "WFLYSRV0137: No deployment content with hash dd66eee901c4bf79dd6659873df918e1b639bc1b is available in the deployment content repository for deployment 'rapid.ear'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
When I remove the entry for that WAR from standalone.xml I am able to restart the server, but I need a more permanent solution.
The start script is:
nohup /data/wildfly-10.1.0.Final/bin/standalone.sh -Djavax.net.ssl.trustStore="/usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts" --server-config=standalone.xml &
And the stop script is:
sh /data/wildfly-10.1.0.Final/bin/jboss-cli.sh --connect command=:shutdown
It may not be quite as efficient in terms of I/O, but since you've got a standalone instance, I've simply taken advantage of the deployment scanner. I have:
<subsystem xmlns="urn:jboss:domain:deployment-scanner:2.0">
    <deployment-scanner name="myapp" path="/home/wildfly/sites/www.mysite.tld" scan-interval="60000" auto-deploy-exploded="true"/>
</subsystem>
in my standalone-full.xml (you may or may not need the "-full" part). I then deploy my webapp to "/home/wildfly/sites/www.mysite.tld" and can update it as needed. The code I show only reads the directory once a minute so it isn't terrible on I/O.
Again, your deployment may be different than mine.