Infrastructure and code deployment in same pipeline or different?

We are in the process of setting up a new release process in AWS. We are using Terraform with Elastic Beanstalk to spin up the hardware to deploy to (although the actual tools are irrelevant).
As Elastic Beanstalk does not support immutable deployments in Windows environments, we are debating whether to have a separate pipeline for deploying our infrastructure or to run Terraform on every code deployment.
The two are likely to have different rates of churn, which feels like a good reason to separate them. Separation would also reduce risk, as each deployment contains less. But it means code could be deployed to snowflake servers, and that QA and live hardware could drift out of sync, so we would not be testing like for like.
Does anyone have experience with the two approaches and care to share which has worked better and why?

Well, we have both approaches in place. The initial AWS provisioning ends with a Terraform null_resource that runs an Ansible playbook to do the initial code deployment.
Subsequent code deployments are done with standalone Jenkins + Ansible jobs.

Related

What is the difference between AWS Lambda & AWS Elastic Beanstalk

I am studying for my AWS Cloud Practitioner Certification and I am confused about the difference between AWS Lambda and AWS Elastic Beanstalk. From my understanding, for both services you upload your code to AWS and AWS essentially manages the underlying infrastructure for you.
I know with Lambda you upload your code to a 'Lambda Function' and set triggers for when the code executes.
With AWS EB you upload your application code and EB automatically handles the deployment, capacity, provisioning, etc...
They both sound very similar as you upload your code to both and both handle underlying instances/environments.
Thanks!
Elastic Beanstalk and Lambda are very different, even though some of their features may look similar. At a high level, Elastic Beanstalk deploys a long-running application, whereas Lambda deploys a short-running code function.
Lambda can run for at most 15 minutes, whereas an EB application runs continuously. Generally, we deploy websites/apps on EB, whereas Lambda is used for triggered functionality, like processing an image when it gets uploaded to S3.
A Lambda execution environment handles one request at a time, whereas the number of concurrent requests EB can handle depends on your underlying infrastructure. So if you have, say, 100 concurrent requests, 100 Lambda instances will be spun up, whereas those 100 requests could be handled by a single underlying EC2 instance in EB.
Lambda is serverless (the underlying infrastructure is entirely abstracted from the developer), whereas EB is automation over infrastructure provisioning: you can still see your EC2 instances, load balancer, auto scaling group, etc. in your AWS console. You can even SSH/RDP into your instances and change running services. EB also allows you to use custom AMIs.
Lambda suffers from cold starts, since its infrastructure is provisioned on demand by AWS, whereas in EB you generally have EC2 instances already provisioned and waiting to handle your requests.
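To make the "short-running, triggered" model concrete, here is a minimal Python sketch of a Lambda handler reacting to an S3 upload event; the output bucket name and the "processing" step are placeholder assumptions, not anything from the question.

    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Invoked by an S3 ObjectCreated notification; each invocation is short-lived
        # and must finish within the configured timeout (15 minutes at most).
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Placeholder "processing": a real function would resize the image,
            # extract metadata, etc.
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

            s3.put_object(
                Bucket="processed-images-example",   # hypothetical output bucket
                Key="thumbnails/" + key,
                Body=body,
            )

The whole unit of deployment is this one function; there is no instance to SSH into, which is exactly the trade-off described above.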
All great (and exam-specific) points by SmartCoder. If I may add a general ancillary comment:
Wittgenstein said, "In most cases, the meaning of a word is its use." I think this maxim is remarkably apt for software engineering too. In the context of your question, those two AWS services are used for significantly different purposes.
Lambda - Say you developed a photo uploading application with Node.js that uploads processed images to an S3 bucket. The core logic is probably quite straightforward, and it has a single, distinct task: take in an image, do some processing and, barring any exceptions, store it in a bucket. In this case, it's inefficient to waste time spinning up servers, configuring them with a runtime environment, downloading dependencies, doing maintenance, etc. A literal copy-and-paste of your code into the Lambda console, plus a few configuration settings, should get the job done. Plus, you save a lot of money, as infrastructure is "provisioned" only when your Node.js function is invoked. Again, keep in mind the principle of this code performing a single task.
Elastic Beanstalk - This same photo uploading system mentioned above might now mature into a more complex full-fledged software application that requires user management, authentication, and further processing of the images, which certainly requires more provisioning of resources. This application will probably do a lot of things with multiple code repositories for you to manage and deploy. And yet, you don't want to spend money on a DevOps engineer or learn to use an IaC (Infrastructure as Code) platform like CloudFormation or Terraform. In this case, Elastic Beanstalk is useful for a developer without too much in-depth DevOps knowledge as it's a PaaS (Platform as a Service) tool; it pretty much gives you a clear interface to spin up whole new production-ready systems.
Here are two good whitepapers I read a while back on the above topics.
https://docs.aws.amazon.com/whitepapers/latest/serverless-architectures-lambda/serverless-architectures-lambda.pdf
https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/introduction-devops-aws.pdf
Lambda runs in response to specific trigger events and exits as soon as its work is over.

AWS CodePipeline to only deploy files that have changed since previous deploy and not simply replace application

TL;DR: How do I push to CodeDeploy only the changes that have been made in CodeCommit?
I built a simple CI/CD Pipeline with CodePipeline in which I commit to CodeCommit and then it deploys the code using CodePipeline to my Elastic Beanstalk application.
The problem is that it seems to simply copy the entire application and put it online, and in doing so it removes all of the logs that I previously had on the server. For example, anything in .gitignore will not only not be committed to git, but if it was previously on the server, it will also be removed.
Any comments or suggestions are greatly appreciated! ❤️
Thanks!
"In this way, it removes all of the logs that I had previously on the server"
An EB environment, whether single-instance or load-balanced, always runs its instances in an Auto Scaling group. This means they can be terminated at any time, e.g. due to an AZ rebalance or due to changes to your EB environment configuration.
Thus you should build all your applications to be stateless and not depend on any information stored on the instances. Sooner or later, relying on instance-local state will lead to issues (some of which you are experiencing now).
If you wanted to do this on a CodePipeline execution, you would need a first stage that prunes the artifact based on the difference between commits (presumably using Lambda), and this pruned artifact would then be what goes to your instances.
Remember that CodeDeploy will replace the contents of the folder with the contents of your artifact, so you'll need to account for this.
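To give a sense of what that pruning stage could look like, here is a minimal Python sketch that lists the paths changed between two commits using the CodeCommit get_differences API; the repository name and how you track the "previously deployed" commit are assumptions left to you.

    import boto3

    codecommit = boto3.client("codecommit")

    def changed_paths(repo_name, before_commit, after_commit):
        """List the file paths that differ between two commits."""
        paths, token = [], None
        while True:
            kwargs = {
                "repositoryName": repo_name,            # hypothetical repo name passed in
                "beforeCommitSpecifier": before_commit,
                "afterCommitSpecifier": after_commit,
            }
            if token:
                kwargs["NextToken"] = token
            resp = codecommit.get_differences(**kwargs)
            for diff in resp["differences"]:
                blob = diff.get("afterBlob") or diff.get("beforeBlob")
                paths.append(blob["path"])
            token = resp.get("NextToken")
            if not token:
                return paths

The pruning stage would then build a reduced artifact containing only those paths and hand that to the deploy stage.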
However, this is generally bad practice; you should never rely on a specific server, especially for logging.
Instead, architect your servers to ship their logs to a centralized service such as CloudWatch Logs, an ELK stack or a third-party supplier. Always be prepared for your infrastructure to fail; allowing servers to be easily replaced makes your applications more resilient.
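In practice you would normally install the CloudWatch agent or use a logging library for this, but as a rough Python sketch of the idea, nothing about your logs has to live on the instance's disk; the log group and stream names below are placeholders and are assumed to already exist.

    import time
    import boto3

    logs = boto3.client("logs")

    LOG_GROUP = "/myapp/web"        # hypothetical log group, created ahead of time
    LOG_STREAM = "instance-001"     # hypothetical stream, e.g. named after the instance ID

    def ship(message):
        # Push a single log line straight to CloudWatch Logs so nothing depends
        # on the instance's local disk surviving a replacement.
        logs.put_log_events(
            logGroupName=LOG_GROUP,
            logStreamName=LOG_STREAM,
            logEvents=[{"timestamp": int(time.time() * 1000), "message": message}],
        )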

Is it recommended to have multiple deployments of same application in same AWS region?

I have a business requirement where I might need to support different versions of the application at the same time.
One way of doing this would be to deploy the app in different regions, but it might also be necessary to run the same app multiple times in one region.
Of course, this can be done by parameterising the deployment scripts, but will it lead to any issues?
One I can think of is that multiple copies of the same app in the same region consume the same kinds of resources and might hit some of the regional limits. Are there any other issues I should be aware of?
We need more background information on your actual tech stack and more detailed requirements.
Running multiple versions of multiple deployments is something that a lot of companies manage on AWS. CI/CD pipelines and many other topics are very helpful, but I am fishing in the dark here.
You can certainly run those multiple deployments in one region.
For a start, have a look at Elastic Beanstalk environments:
"You can deploy multiple AWS Elastic Beanstalk environments when you need to run multiple versions of an application."
And this gets you started on Elastic Beanstalk.
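As a rough illustration of running the same application more than once in one region, here is a Python (boto3) sketch that creates two Elastic Beanstalk environments pinned to different application versions; the application name, version labels and solution stack are placeholders, and the versions are assumed to have been uploaded already.

    import boto3

    eb = boto3.client("elasticbeanstalk")

    # Two environments of the same application in the same region, each pinned
    # to a different, previously uploaded application version.
    for env_name, version in [("myapp-v1", "v1.4.2"), ("myapp-v2", "v2.0.0")]:
        eb.create_environment(
            ApplicationName="myapp",        # hypothetical application name
            EnvironmentName=env_name,
            VersionLabel=version,
            # Pick a current platform, e.g. via list_available_solution_stacks()
            SolutionStackName="64bit Amazon Linux 2 v3.5.0 running Python 3.8",
        )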

Continuous Integration on AWS EMR

We have a long running EMR cluster that has multiple libraries installed on it using bootstrap actions. Some of these libraries are under continuous development and their codebase is on GitHub.
I've been looking to plug Travis CI with AWS EMR in a similar way to Travis and CodeDeploy. The idea is to get the code on GitHub tested and deployed automatically to EMR while using bootstrap actions to install the updated libraries on all EMR's nodes.
A solution I came up with is to use an EC2 instance in the middle, where Travis and CodeDeploy can first be used to deploy the code onto the instance. After that, a launch script on the instance is triggered to create a new EMR cluster with the updated libraries.
However, the above solution means we need to create a new EMR cluster every time we deploy a new version of the system.
Any other suggestions?
You definitely don't want to maintain an EC2 instance to orchestrate a CI/CD process like that. First of all, it introduces a number of challenges: you have to manage an entire server instance, keep it maintained, deal with networking, and apply monitoring and alerts for availability issues, and even then you won't have availability guarantees, which may cause other problems. Most of all, maintaining an EC2 instance for a purpose like that is simply unnecessary.
I recommend that you investigate using AWS CodePipeline together with AWS Step Functions and Lambda.
A Step Functions state machine can orchestrate the provisioning of your EMR cluster in a fully serverless way. With CodePipeline, you can set up a webhook into your GitHub repo to pull your code and spin up a new deployment automatically whenever changes are committed to your master branch (or whatever branch you specify). You can use EMRFS to sync an S3 bucket or folder to your EMR file system and then obtain the security benefits of IAM, as well as the additional consistency guarantees that come with EMRFS. With Lambda, you also get seamless integration into other services, such as Kinesis, DynamoDB and CloudWatch, among many others, which simplifies many administrative and development tasks and enables more sophisticated automation with minimal effort.
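As a rough sketch of the serverless orchestration, here is a Python Lambda handler (invoked from a Step Functions state) that provisions a fresh cluster with the newly built libraries via the EMR RunJobFlow API; the instance types, release label and S3 paths are placeholder assumptions.

    import boto3

    emr = boto3.client("emr")

    def handler(event, context):
        # Provision a new cluster whose bootstrap action installs the freshly
        # built libraries uploaded by the CI stage.
        response = emr.run_job_flow(
            Name="ci-cluster",
            ReleaseLabel="emr-6.10.0",               # pick a current release label
            Applications=[{"Name": "Spark"}],
            Instances={
                "InstanceGroups": [
                    {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
                "KeepJobFlowAliveWhenNoSteps": True,
            },
            BootstrapActions=[{
                "Name": "install-libraries",
                "ScriptBootstrapAction": {
                    # hypothetical path; the CI stage would upload the built artifacts here
                    "Path": "s3://my-ci-artifacts/bootstrap/install_libs.sh",
                },
            }],
            JobFlowRole="EMR_EC2_DefaultRole",
            ServiceRole="EMR_DefaultRole",
        )
        return {"ClusterId": response["JobFlowId"]}

A later state in the same state machine could poll the cluster status and tear down the old cluster once the new one is healthy.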
There are some great resources and tutorials for using CodePipeline with EMR, as well as in general. Here are some examples:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-ecs-ecr-codedeploy.html
https://chalice-workshop.readthedocs.io/en/latest/index.html
There are also great tutorials for orchestrating applications with Lambda Step Functions, including the use of EMR. Here are some examples:
https://aws.amazon.com/blogs/big-data/orchestrate-apache-spark-applications-using-aws-step-functions-and-apache-livy/
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://github.com/DavidWells/serverless-workshop/tree/master/lessons-code-complete/events/step-functions
https://github.com/aws-samples/lambda-refarch-imagerecognition
https://github.com/aws-samples/aws-serverless-workshops
In the very worst case, if all of those options fail, for example if you need very strict control over the startup process after the cluster completes its bootstrapping, you can always create a Java JAR that is run as a final step and use it to either execute a shell script or call the various AWS Java libraries to run your provisioning commands. Even then, you still have no need to maintain your own EC2 instance for orchestration purposes (which, in my opinion, would be hard to justify even if it were running in a Docker container on Kubernetes), because you can easily keep that deployment process fully serverless as well.
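If a custom JAR feels heavyweight, a similar effect can often be achieved with the built-in command-runner.jar. Below is a hedged boto3 sketch that adds such a final step; the cluster ID and the S3 path of the provisioning script are placeholders.

    import boto3

    emr = boto3.client("emr")

    def add_provisioning_step(cluster_id):
        # Adds a step that runs after bootstrapping; command-runner.jar simply
        # executes the given command on the master node.
        emr.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[{
                "Name": "post-bootstrap-provisioning",
                "ActionOnFailure": "CANCEL_AND_WAIT",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # hypothetical script uploaded by the CI stage
                    "Args": ["bash", "-c",
                             "aws s3 cp s3://my-ci-artifacts/provision.sh . && bash provision.sh"],
                },
            }],
        )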
There are many great videos from the Amazon re:Invent conferences that you may want to watch to get a jump start before you dive into the workshops. For example:
https://www.youtube.com/watch?v=dCDZ7HR7dms
https://www.youtube.com/watch?v=Xi_WrinvTnM&t=1470s
Many more such videos are available on YouTube.
Travis CI also supports Lambda deployment, as mentioned here: https://docs.travis-ci.com/user/deployment/lambda/

Deploying to several environments on Amazon Elastic Beanstalk at the same time

I have an application that has several environments (all running in AWS Elastic Beanstalk), namely Production, Worker and Debug. Each environment has a corresponding git branch that differs from master in some ways (e.g. configuration is changed and some code is deleted).
I use eb deploy to deploy the new version of the application from its branch. It zips the current git branch and sends the archive to Amazon, which then deploys it to the running instances.
The problem, however, is that deploying takes some time (about 5 minutes), so between deploying, say, Worker and Production the two have different code. This is bad, because my changes might have changed the queue protocol or something like that.
What I want is to be able to upload the new version and do its processing on all the environments without actually replacing the code, just preparing it, and then, after doing that for all the environments, issue a command like "finish deploy" so that the code base is replaced on all the environments simultaneously.
Is there a way to do it?
You need to perform a "blue-green" deploy and not do this in-place. Because your deployment model requires synchronization of more than one piece, a change to the protocol those pieces use means those pieces MUST be deployed at the same time. Treat it as a single service if there's a frequently-breaking protocol that strongly binds the design.
"Deployed" means that the outermost layer of the system is exposed and usable by other systems. In this case, it sounds like you have a web server tier exposing an API to some other system, and a worker tier that reads messages produced by the web tier.
When making a breaking queue protocol change, you should deploy BOTH change-sets (web server layer and queue layer) to entirely NEW beanstalk environments, have them configured to use each other, then do a DNS swap on the exposed endpoint, from the old webserver EB environment to the new one. After swapping DNS on the webserver tier and verifying the environment works as expected, you can destroy the old webserver and queue tiers.
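If the exposed endpoint is the environments' EB CNAME (rather than your own Route 53 record), the swap itself can be done with a single API call. A minimal Python sketch, where the environment names are placeholders:

    import boto3

    eb = boto3.client("elasticbeanstalk")

    # Once the new (green) environment has been verified, swap its CNAME with the
    # old (blue) environment so the exposed endpoint points at the new code.
    eb.swap_environment_cnames(
        SourceEnvironmentName="myapp-web-blue",       # hypothetical environment names
        DestinationEnvironmentName="myapp-web-green",
    )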
On non-protocol-breaking updates, you can simply update one environment or the other.
It sounds complex because it is. If you are breaking the protocol frequently, then your system is not decoupled enough to version the worker and webserver tiers independently, which is why you have to go through this more involved process to version them together.
Hope this helps!