Elastic Kubernetes Service AWS Deployment process to avoid down time

Elastic Kubernetes Service AWS Deployment process to avoid down time - amazon-web-services

Its been a month I have started working on EKS AWS and up till now successfully deployed by code.
The steps which I follow for deployment are given below:
Create image from docker terminal.
Tag and push to ECR AWS.
Create the deployment "project.json" and service file "project-svc.json".
Save the above file in "kubectl/bin" path and deploy it with following commands below.
"kubectl apply -f projectname.json" and "kubectl apply -f projectname-svc.json".
So if I want to deployment the same project again with change, I push the new image on ECR and delete the existing deployment by using "kubectl delete -f projectname.json" without deleting the existing service and deploy it again using command "kubectl apply -f projectname.json" again.
Now, I'm in confusing that after I delete the existing deployment there is a downtime until I apply or create the deployment again. So, how to avoid this ? Because I don't want the downtime actually that is the reason why I started to use the EKS.
And one more thing is the process of deployment is a bit long too. I know I'm missing something can anybody guide me properly please?
The project is on .NET Core and if there is any simplified way to do deployment using Visual Studio please guide me for that also.
Thank You in advance!

There is actually no need to delete your deployment. Just need to update the desired state (the deployment configuration) and let K8s do its magic and apply the needed changes, like deploying a new version of your container.
If you have a single instance of your container, you will experience a short down time while changes are applied. If your application supports multiple replicas (HA), you can enjoy the rolling upgrade feature.
Start by reading the official Kubernetes documentation of a Performing a Rolling Update.

You only need to use the delete/apply if you are changing (And if you have) the ConfigMap attached to the Deployment.
Is the only change you do is the "image" of the deployment - you must use the "set-image" command.
Kubectl let you change the actual deployment image and it does the Rolling Updates all by itself and with 3+ pods you have the minimum chance for downtime.
Even more, if you use the --record flag, you can "rollback" to your previous image with no effort because it keep track of the changes.
You also have the possibility to specify the "Context" too, with no need to jump from contexts.
You can go like this:
kubectl set image deployment DEPLOYMENT_NAME DEPLOYMENT_NAME=IMAGE_NAME --record -n NAMESPACE
OR Specifying the Cluster
kubectl set image deployment DEPLOYEMTN_NAME DEPLOYEMTN_NAME=IMAGE_NAME_ECR -n NAMESPACE --cluster EKS_CLUSTER_NPROD --user EKS_CLUSTER --record
As an Eg:
kubectl set image deployment nginx-dep nginx-dep=ecr12345/nginx:latest -n nginx --cluster eu-central-123-prod --user eu-central-123-prod --record
The --record is what let you track all the changes, if you want to rollback just do:
kubectl rollout undo deployment.v1.apps/nginx-dep
More documentations about it here:
Updating a deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Roll Back Deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment

Related

My GKE pods stoped with error "no command specified: CreateContainerError"

Everything was Ok and nodes were fine for months, but suddenly some pods stopped with an error
I tried to delete pods and nodes but same issues.

Try below possible solutions to resolve your issue:
Solution 1 :
Check a malformed character in your Dockerfile and cause it to crash.
When you encounter CreateContainerError is to check that you have a valid ENTRYPOINT in the Dockerfile used to build your container image. However, if you don’t have access to the Dockerfile, you can configure your pod object by using a valid command in the command attribute of the object.
So workaround is to not specify any workerConfig explicitly which makes the workers inherit all configs from the master.
Refer to Troubleshooting the container runtime, similar SO1, SO2 & Also check this similar github link for more information.
Solution 2 :
Kubectl describe pod podname command provides detailed information about each of the pods that provide Kubernetes infrastructure. With the help of this you can check for clues, if Insufficient CPU follows the solution below.
The solution is to either:
1)Upgrade the boot disk: If using a pd-standard disk, it's recommended to upgrade to pd-balanced or pd-ssd.
2)Increase the disk size.
3)Use node pool with machine type with more CPU cores.
See Adjust worker, scheduler, triggerer and web server scale and performance parameters for more information.
If you still have the issue, you can then update the GKE version for your cluster Manually upgrading the control planeto one of the fixed versions.
Also check whether you have updated it in the last year to use the new kubectl authentication coming in the GKE v1.26 plugin?
Solution 3 :
If you're having a pipeline on GitLab that deploys an image to a GKE cluster: Check the version of the Gitlab runner that handles the jobs of your pipeline .
Because it turns out that every image built through a Gitlab runner running on an old version causes this issue at the container start. Simply deactivate them and only let Gitlab runners running last version in the pool, replay all pipelines.
Check the gitlab CI script using an old docker image like docker:19.03.5-dind, update to docker:dind helps the kubernetes to start the pod again.

Created a pipeline using AWS copilot, original push worked but when I make changes to code and push them to github they don't show up

would appreciate any help with this:
I've followed the guide for AWS copilot here: https://aws.github.io/copilot-cli/docs/getting-started/first-app-tutorial/ and then the guide for creating a pipeline and connecting it to github here: https://aws.github.io/copilot-cli/docs/concepts/pipelines/. That all appears to have worked and I can view the react app I'm working on at the url indicated in aws.
My problem is that when I make changes to my code and then push it to the tracked github branch, the changes don't appear when viewing the app at the url. However, when I make the push to github, the pipeline does register that a change has occured. It indicates that a change has been made and goes through the flow of creating a new build. But whatever I try, the changes don't seem to actually show up.
I assume that I'm missing something simple here, and that for some reason, docker is building the app based on the original code. But I can't figure out why that would be. Maybe something is weird with my DockerFile?
My docker file looks like this:
FROM node:16.14
WORKDIR /app
ENV PATH /app/node_modules/.bin:$PATH
COPY package.json ./
COPY package-lock.json ./
RUN npm i
COPY . ./
CMD ["npm", "run", "server"]
My understanding of how this should work, is that I push up new code to github, that is sent to the aws pipeline and a new image is generated based on that code, which is then used to create a container that is hosted on ECS. But clearly I am missing something.
copilot deploy does work. I'm unsure if
the problem is that my pipeline is successfully building (as it does not throw an error in the console) and then just not hosting it at the same url as copilot deploy. Or
the pipeline is hitting an error that just doesn't show up in the pipeline console. Digging into the logs I find this:
echo "Cloudformation stack and config files were not generated. Please check build logs to see if there was a manifest validation error." 1>&2;
Which seems to point towards the second option. Any suggestions on how resolve whatever it going on in the container if that is the problem?
The error suggests that I check build logs but these are the build logs. Are there more granular build logs I can examine?

When running containers in ECS, unless your container is already crashing because of an error, it often won't pick up code changes from your new image unless you force a new deployment. You can do this from the command line using the AWS CLI with the following:
aws ecs update-service --cluster <cluster_name> --service <service_name> --force-new-deployment --profile <aws_profile_name>
Note that the profile is optional if you're using your default aws cli configuration profile.

AWS cloudformation: How to run cfn-nag locally in Windows

I have a cloud formation template where I have all the resources and details for the project.
I have the cfn-lint setup locally and it is running perfectly fine. However when I push the code changes, build fails at deployment stage due to cfn-nag stating some simple changes which could be fixed.
I'm using windows machine and I need a way to run this cfn-nag locally so that I could check this just like cfn-lint and fix them locally instead of waiting 40 minutes for build till it reaches deployment stage.
I referred several posts online, found below two helpful
https://stelligent.com/2018/03/23/validating-aws-cloudformation-templates-with-cfn_nag-and-mu/
https://github.com/stelligent/cfn_nag
What is the difference between cfn-nag and cfn-lint and why lint is not failing on what cfn-nag is complaining about?
The above links have some instructions on Ruby and Brew but I'm using Nodejs, felt lost. Please help.

CFN-Nag looks for patterns in AWS CloudFormation templates that may indicate insecure infrastructure,
Ex:
IAM rules that are too permissive (wildcards),
Security group rules that are too permissive (wildcards),
Access logs that aren’t enabled,
Encryption that isn’t enabled,
CFN-Lint scans the AWS CloudFormation template by processing a collection of Rules, where every rule handles a specific function check or validation of the template. It validates against AWS CloudFormation Resource specification.
This collection of rules can be extended with custom rules using the --append-rules argument.
Ex: Whitespaces, alignment(YAML), type checks, valid values for resource properties, and other best practices.
Those two links you previded above have all the information needed, just not directly for a Nodejs developer using a Windows machine.
Step1: Pull the docket image stelligent/cfn-nag
Step2: Add the script to your package.json for cfn-nag
Ex:
"scripts" : {
"cfn:nag": "cfn-nag"
}
If you're using docker-compose.yml
Add the cfn-nag image details to your docker-compose.yml like below
cfn-nag:
image: "stelligent/cfn-nag"
volumes:
-./path_of_cfn_file_to_copy: /path_to_copy_to
command: ${COMMAND: -/path_to_copy_tp/cfn_file}
Just set the scripts in package.json to run via docker-compose
"cfn:nag": "docker-compose run --rm cfn-nag"

Redeploy latest image without using `kubectl apply -f file`

We have a deployment on a cluster, we want to tell it to pull the latest image and re-deploy. If I run kubectl apply -f deployment.yml that file hasn't actually changed. How do I just tell the cluster to use a newer image version?

As per documentation:
Note: A Deployment’s rollout is triggered if and only if the Deployment’s pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
Please consider using:
kubectl patch deployment my-nginx --patch '{"spec": {"template": {"spec": {"containers": [{"name": "nginx","image": "nginx:1.7.9"}]}}}}'
kubectl --record deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1
kubectl edit deploy/<your_deployment> --record
Documentation about Updating a Deployment and Container Images.
According to the best practices please Note:
You should avoid using the :latest tag when deploying containers in production as it is harder to track which version of the image is running and more difficult to roll back properly.
However if you would like always force pull new image you can use on of those options:
- set the imagePullPolicy of the container to Always.
- omit the imagePullPolicy and use :latest as the tag for the image to use.
- omit the imagePullPolicy and the tag for the image to use.
- enable the AlwaysPullImages admission controller.
Hope this help.

AWS Deploy ECS with Updated Image

It appears that one must provide a new full task definition for each service update. Even though most of the time new deployments exclusively consists of updates to one of the underlying docker images
While this is understandable as a core architectural choice. It is quite cumbersome. Is there a command-line option that makes this easier as the full JSON spec for task definitions are quite complex?
Right now the developers needs to provide complex scripts and deployment orchestrations to achieve this relatively routine task in their CI/CD processes
I see attempts at this Here and Here. These solutions do not appear to work in all cases (for example, for Fargate launches)
I know that if the updated image uses the re-use the same tag this problem is made easier, however in dev cultures that values reproducibility and audibility that is simply not an reasonable option
Is there no other option than to leverage both the AWS API & JSON manipulation libraries?
EDIT It appears this project does a fairly good job https://github.com/fabfuel/ecs-deploy

I found a few approaches
As mentioned in my comment, use ecs-deploy script per the Github link
Create a task definition via the --generate-cli-skeleton option on awscli.
Fill out all details except for execution-rule-arn, task-role-arn, image
These cannot be filled out because they will change per commit or per environment you want to deploy to
Commit this skeleton to git, so it is part of your workspace on the CI
Then use a JSON traversing/parsing library or utility such as https://jqplay.org/ to replace at build time on the CI the roleArn and image name

Use https://github.com/fabfuel/ecs-deploy.
If you want to update only the tag of an existent task:
ecs deploy <CLUSTER NAME> <SERVICE NAME> --region <REGION NAME> --tag <NEW TAG>
e.g. ecs deploy default web-service --region us-east-1 --tag v2.0
In your ci/cd you use git hash:
Using git rev-parse HEAD, will return a hash like: d63c16cd4d0c9a30524c682fe4e7d417faae98c9
docker build -t image-name:$(git rev-parse HEAD) .
docker push image-name:$(git rev-parse HEAD)
And use the same tag on task:
ecs deploy default web-service --region us-east-1 --tag $(git rev-parse HEAD)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js