I am trying to understand the correct way to set up my project on AWS so that I can eventually have CI/CD for the Lambda functions, and also to adopt good practices.
My application is quite simple: an API that calls Lambda functions based on users' requests.
I have deployed the application using AWS SAM. For that, I used a SAM template that referenced local paths to the Lambda functions' code and created the necessary AWS resources (API Gateway and Lambda). I had to use local paths because SAM does not allow using existing S3 buckets for S3 event triggers (see here), and I deploy a Lambda function that watches the S3 bucket for updated code in order to trigger Lambda updates.
Now I have to push my Lambda code to GitHub, and I need a way for the functions' code to be copied from GitHub to the S3 bucket created during the SAM deploy, under the correct prefix. I would like this to happen automatically on every GitHub push.
What is the preferred way to achieve that? I could not find clear information in the AWS documentation. Also, if you see a clear flaw in my process, don't hesitate to point it out.
What you're looking to do is a standard CI/CD pipeline.
The steps of your pipeline will be (more or less): Pull code from GitHub -> Build/Package -> Deploy
You want this pipeline to be triggered by a push to GitHub. This can be done by setting up a webhook that triggers the pipeline.
The last two steps are supported by SAM, which I think you have already used, so it is a matter of running the same commands from the pipeline.
These capabilities are supported by most CI/CD tools. If you want to keep everything in AWS, you could use CodePipeline, which also supports GitHub integration. Nevertheless, Jenkins is perfectly fine and suitable for your use case as well.
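As a rough sketch of the Build/Package -> Deploy part, a CI job could run something like the following after checking out the repository. The function names, stack name, bucket, and prefix are placeholders I made up; the sam flags are standard SAM CLI options, but check them against the version you have installed.

```python
import subprocess


def sam_pipeline_commands(stack_name, artifact_bucket, artifact_prefix):
    """Return the commands a CI job would run after pulling the code.

    All three arguments are placeholders for your own values.
    """
    build = ["sam", "build"]
    deploy = [
        "sam", "deploy",
        "--stack-name", stack_name,
        "--s3-bucket", artifact_bucket,    # bucket that receives the packaged code
        "--s3-prefix", artifact_prefix,    # key prefix inside that bucket
        "--capabilities", "CAPABILITY_IAM",
        "--no-confirm-changeset",          # do not wait for interactive approval
        "--no-fail-on-empty-changeset",    # a push without changes is not an error
    ]
    return [build, deploy]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        runner(command, check=True)
```

In a real job you would call run_pipeline(sam_pipeline_commands(...)) from whatever step the CI tool triggers on push. The runner parameter only exists so the sketch can be exercised without AWS credentials.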
There are a lot of ways you can do it, so it eventually depends on which tools you are comfortable with. If you want to use native AWS tools, CodePipeline is probably the most useful.
You can use CDK for that:
https://aws.amazon.com/blogs/developer/cdk-pipelines-continuous-delivery-for-aws-cdk-applications/
If you are not familiar with CDK and would prefer CloudFormation, then this can get you started:
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-github-gitclone.html
How are multiple Lambda functions for a single stack/application usually handled?
Considering a use case with more than one function, is it better to keep them all together in the same repository or to have one repository for each?
Having a single repository for all the functions would be much easier for me, coming from classic backend development with a single codebase for all the business logic. But moving to the AWS ecosystem means I can no longer "deploy" my entire business logic with a single command: I need to zip each function and update its archive with the AWS CLI. That seems impossible to automate with standard merge requests or pipelines, because each change could touch a different function or several of them.
On the other hand, having e.g. 5 or 6 repositories, one for each Lambda, alongside the ones for the frontend and the AWS stack would be very impractical to manage.
Bundle your different Lambda functions together as a CloudFormation stack. CloudFormation allows you to create multiple AWS resources and wire them together as you wish. There are many tools you can use to achieve this: AWS CloudFormation, AWS SAM (Serverless Application Model), or third-party tools like the Serverless Framework and Terraform. The base concept is known as Infrastructure as Code (IaC).
As for repositories, you can have a single repository per stack. AWS SAM provides sample code with a good directory structure; you can try sam init as an example.
Consider the AWS Serverless Application Model for your development. It lets you script the build, package, and deploy steps with the SAM CLI based on the YAML template. SAM works out what changed in your code by itself (because it runs CloudFormation under the hood). It not only lets you combine several functions into one package, but also add API Gateways, DynamoDB tables, and much more. Another nice feature is that your functions appear as an integrated application in the Lambda console, so you can monitor them all at the same time.
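To make the "several functions in one stack" idea concrete, here is a minimal sketch that generates a SAM template with one AWS::Serverless::Function per function folder in a single repository. The helper name, folder layout, handler names, and runtime are assumptions for illustration; SAM templates are normally written by hand in YAML, and JSON is used here only so the sketch needs nothing beyond the standard library.

```python
import json


def sam_template(functions, runtime="python3.12"):
    """Build one SAM template containing every function of the stack.

    `functions` maps a logical ID to a dict with the (hypothetical) keys
    code_uri, handler, path, and method.
    """
    resources = {}
    for logical_id, spec in functions.items():
        resources[logical_id] = {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "CodeUri": spec["code_uri"],   # folder inside the repository
                "Handler": spec["handler"],
                "Runtime": runtime,
                "Events": {
                    "Api": {
                        "Type": "Api",         # implicit API Gateway endpoint
                        "Properties": {
                            "Path": spec["path"],
                            "Method": spec["method"],
                        },
                    }
                },
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": resources,
    }


def render(template):
    """Serialize the template as JSON text."""
    return json.dumps(template, indent=2)
```

With a template like this, one build and one deploy cover all functions, and CloudFormation only updates the resources whose definition or code changed.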
I have two AWS accounts. I develop code in a CodeCommit repository in one account. Once it is done, I need to clone that code into a CodeCommit repository in the other account. Is there a way to automate this with a Lambda function or any other method?
Please help me, this has been a headache for more than a month. :)
There are several ways of doing that. Essentially, you need a trigger that kicks off the replication process into the other account after each commit. Below are two documented ways of doing this.
Lambda + Fargate
The first one uses Lambda, for which you can configure CodeCommit as a trigger. The Lambda function then runs a Fargate task, which in turn replicates the repository using git clone --mirror. Fargate is used here because replicating larger repositories might exceed the temporary storage that Lambda can allocate.
https://aws.amazon.com/blogs/devops/replicate-aws-codecommit-repository-between-regions-using-aws-fargate/
CodePipeline + CodeBuild
This is probably the "cleaner" variant as it uses native CI/CD tooling in AWS, making it easier to set up as compared to ECS/Fargate, amongst other advantages.
Here you're setting up AWS CodePipeline, which will monitor the CodeCommit repository for any changes. When a commit is detected, it will trigger CodeBuild, which in turn runs the same git command outlined earlier.
https://medium.com/geekculture/replicate-aws-codecommit-repositories-between-regions-using-codebuild-and-codepipeline-39f6b8fcefd2
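Both variants end up running the same git commands, either in the Fargate task or in the CodeBuild step. A minimal sketch of that step is below; the helper names and the working directory are my own, and credentials (for example the CodeCommit credential helper with a cross-account role) are assumed to be configured separately.

```python
def codecommit_https_url(region, repo_name):
    """HTTPS clone URL of a CodeCommit repository."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


def mirror_commands(source_url, target_url, workdir="repo.git"):
    """Commands that copy every branch and tag from source to target."""
    return [
        # Bare clone that contains all refs of the source repository.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push all of those refs to the repository in the other account.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]
```

Each command list can be passed to subprocess.run(command, check=True) inside the task. Note that push --mirror also deletes refs in the target that no longer exist in the source, which is what you want for a replica but not for a repository people commit to.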
Assuming that you have repo 1 in account A and repo 2 in account B, and you want to sync repo 1 -> repo 2
The easiest way is to do the following:
create an SNS topic in account A
enable notifications for repo 1 and send all events to the SNS topic
create a Lambda function that subscribes to the SNS topic
make sure you follow this guide https://docs.aws.amazon.com/codecommit/latest/userguide/cross-account.html to grant the Lambda function cross-account CodeCommit permissions
write a Python function that decides which git events you want to replicate. If you just want to sync the main branch and ignore all other branches, you can check something like: if event["source_ref"].endswith("main"). Then use the boto3 CodeCommit API https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/codecommit.html (take a look at batch_get_commits) to commit the change to the remote CodeCommit repo.
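The branch filter in the last step could be sketched as below. This assumes the SNS message body is the JSON produced by a CodeCommit repository trigger (a Records list whose entries carry codecommit.references[].ref); if you use notification rules instead, the payload looks different and the parsing has to be adapted. The function names are made up.

```python
import json


def updated_refs(sns_event):
    """Collect the git refs mentioned in an SNS-delivered CodeCommit event.

    The payload shape is an assumption, see the note above.
    """
    refs = []
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        for cc_record in message.get("Records", []):
            for reference in cc_record["codecommit"]["references"]:
                refs.append(reference["ref"])
    return refs


def should_replicate(sns_event, branch="main"):
    """True when the event touches the branch we want to sync."""
    return f"refs/heads/{branch}" in updated_refs(sns_event)
```

The Lambda handler would call should_replicate(event) first and return early for every other branch, so the cross-account calls only run for pushes you care about.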
However, do you really need to do this? How about just dumping the whole git history as a zip to S3 in your remote account, and importing it whenever you see changes? I assume your remote account is mostly read-only and just serves as a backup. If you only need a backup, you can just dump to S3 and don't even need to import.
I am trying to develop a serverless application which will use AWS SQS (Simple Queue Service), AWS SES (Simple Email Service) and AWS Lambda. The application will perform these steps:
get some messages in the SQS queue
trigger a Lambda function to handle all these messages
the Lambda will either send an email using AWS SES or an SMS using some 3rd party API, depending on the type of message
To test this out, I created the queue and the Lambda and configured SES, all manually using the web interface at https://aws.amazon.com. For the Lambda function, I simply typed my code into the web IDE provided in the Lambda console. Since it was a very simple POC, it didn't need any testing and I got it to work.
Now, I want to turn this into a production ready application. My requirements:
code this entire application locally on my machine
test it locally in an environment similar to the one at AWS
publish the code to GitLab and then finally deploy it at AWS
all these resources (SQS queue, SES config, Lambda function) to be created automatically through code
Based on what I read online, I could find 3 different options for doing this:
AWS Cloudformation
AWS SAM
Serverless framework
My questions are:
For my use case, are serverless applications the correct technology or do I need something else like AWS SWF or AWS Step Functions? I also read about AWS Lambda applications. Are they something else?
Which is the best option among these in terms of cost, ease of setup, and use? I checked that CloudFormation itself doesn't cost anything, you just pay for the services being used (SQS, SES, Lambda), but for Serverless there are some costs involved in using the framework.
Are there some other options as well apart from these?
I will be using NodeJS for the code and only AWS as my cloud platform.
Short answer: yes, this is totally doable with Serverless functions and actually a typical Serverless use case.
Long answer:
It's not necessary to use AWS SWF or AWS Step Functions here. However, you could use Step Functions in case your process gets more complicated (e.g. more external services are involved and you need certain error handling, or you want to improve parallel processing powers).
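To show how little orchestration the basic flow needs, here is a minimal sketch of a handler for an SQS-triggered Lambda that routes each message to email or SMS. It is written in Python for brevity, although you plan to use NodeJS; the structure is the same. The message fields (type, to, subject, text) and the two sender callables are assumptions; in practice they would wrap the SES SDK call and your SMS provider's API.

```python
import json


def route_message(body, send_email, send_sms):
    """Dispatch one queue message based on its (hypothetical) type field."""
    message = json.loads(body)
    if message["type"] == "email":
        send_email(message["to"], message["subject"], message["text"])
        return "email"
    if message["type"] == "sms":
        send_sms(message["to"], message["text"])
        return "sms"
    raise ValueError(f"unknown message type: {message['type']!r}")


def handle_sqs_event(event, send_email, send_sms):
    """Process every record of an SQS-triggered Lambda invocation.

    SQS delivers messages in event["Records"], each with a string body.
    """
    return [
        route_message(record["body"], send_email, send_sms)
        for record in event["Records"]
    ]
```

Passing the senders in as parameters also makes the function easy to unit test locally without touching AWS.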
First of all, CloudFront is not comparable to AWS SAM or Serverless Framework. Did you mean AWS CloudFormation instead? CloudFront is a CDN to serve (and cache) any kind of content whereas CloudFormation is a tool to describe your infrastructure as code.
CloudFormation is the "basis" for AWS SAM and Serverless Framework because they both translate their template code to CloudFormation code in the end. However, CloudFormation makes developing Serverless Functions a bit complicated in my opinion. That's why tools like AWS SAM or Serverless Framework popped up at some point. AWS SAM is basically an extension of CloudFormation, i.e. it provides additional resource types like AWS::Serverless::Function but everything else is CloudFormation. Serverless Framework also lets you add CloudFormation resources but has its own syntax for specifying Serverless Functions.
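As an illustration of SAM being "CloudFormation plus a few extra resource types", the sketch below builds a template that mixes a plain CloudFormation resource (AWS::SQS::Queue) with a SAM one (AWS::Serverless::Function) wired to the queue. The logical IDs, code path, handler, and runtime are placeholders, and the template is built as a Python dict only to keep the example self-contained; you would normally write it directly in YAML.

```python
def queue_worker_template(code_uri="src/", handler="index.handler",
                          runtime="nodejs20.x"):
    """SAM template: one SQS queue and one function consuming it."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # This line is what enables the AWS::Serverless::* resource types.
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            # Plain CloudFormation resource.
            "MessageQueue": {"Type": "AWS::SQS::Queue"},
            # SAM resource that expands to a Lambda function, its role,
            # and the event source mapping for the queue.
            "MessageHandler": {
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "CodeUri": code_uri,
                    "Handler": handler,
                    "Runtime": runtime,
                    "Events": {
                        "QueueEvent": {
                            "Type": "SQS",
                            "Properties": {
                                "Queue": {"Fn::GetAtt": ["MessageQueue", "Arn"]},
                                "BatchSize": 10,
                            },
                        }
                    },
                },
            },
        },
    }
```

The function still needs permission to send email through SES, which would be added through the function's Policies property; that part is left out here.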
In terms of costs, CloudFormation, AWS SAM, and Serverless Framework are all free. Serverless Framework offers some premium features, but you don't have to use them. CloudFront, however, is not free to use, but I believe it wasn't the service you were looking for. Besides that, for SQS, SES and Lambda you only pay for what you use.
I personally prefer AWS SAM because you are closer to CloudFormation code and compared to Serverless Framework, you don't need a plugin for some things to circumvent the abstractions that the Serverless Framework does for you. You'll notice this for 'bigger' projects where you are leaving the standard hello world examples. On the other side, the Serverless Framework is quite popular and hence, there are many resources out there to help you. Up to you what you prefer :)
In terms of infrastructure tooling, you could have a look at AWS CDK (a good starting point is cdkworkshop.com) which is becoming more and more popular.
For local development, you can have a look at Localstack. The free version supports emulating SQS and SES locally, so that should be helpful.
CDK Pipelines is great, especially for cross-account deployments. It enables developers to define and customize the CI/CD pipeline for their app to their heart's content.
But to remain SoC compliant, we need to make sure that necessary controls like the ones below are validated/enforced:
A manual approval stage should be present before the stage that does the cross-account deployment to production
Direct deployment to production bypassing dev/staging environment is not allowed
Test cases (Unit tests/Integration tests) and InfoSec tests should pass before deployment
I know that above things are straightforward to implement in CDK Pipelines but I am not quite sure about how to ensure that every CDK Pipeline always conforms to these standards.
I can think of below solutions
Branch restrictions - Merge to master branch (which the CDK pipeline monitors) should be restricted and allowed only via pull requests
Tests - Add unit tests or integration tests which validate that the generated cloud formation template has specific resources/properties
Create a standard production stage with all necessary controls defined and wrap it in a library which developers need to use in their definition of the CDK Pipeline if they want to deploy to production
But how to enforce above controls in an automated fashion? Developers can choose to bypass above controls by simply not specifying them while defining the pipeline. And we do not want to rely on an Approver to check these things manually.
So in summary, the question is - When using CDK pipelines, how to give developers maximum customizability and freedom in designing their CI/CD solution while ensuring that SoC restrictions and mandatory controls are validated and enforced in an automated fashion?
Open Policy Agent might be helpful to check infra against custom policies in the pipeline.
https://aws.amazon.com/blogs/opensource/realize-policy-as-code-with-aws-cloud-development-kit-through-open-policy-agent/
After a lot of research, I concluded that implementing these checks via a custom AWS Config rule is the best approach.
Let's say we want to ensure that
A manual approval stage is always present in every pipeline that has a stage that deploys to prod.
We need to
Enable AWS Config and configure it to record all changes to CodePipeline
Create a custom AWS Config rule backed by an AWS Lambda function (say pipeline-compliance-checker)
The Lambda function is triggered on every configuration change of any pipeline and receives the latest configuration of the pipeline in question
It parses the latest pipeline configuration and checks whether the pipeline has a manual approval stage before the stage that deploys to prod. If yes, it marks the pipeline as COMPLIANT, otherwise as NON_COMPLIANT
Create an Amazon EventBridge rule that sends a notification to an SNS topic (say pipeline-non-compliance) when any pipeline is marked as NON_COMPLIANT - (doc)
Create another AWS Lambda function (say pipeline-compliance-enforcer) that is subscribed to that SNS topic. It stops the non-compliant pipeline in question (if it is in STARTED state) and then disables the incoming transition to the stage that deploys to prod. We can also delete the pipeline here if required.
I have tested the above setup and it fulfils the requirements.
I also learnt later that AWS describes the same solution to this problem in this talk: CI/CD Pipeline Security: Advanced Continuous Delivery Best Practices
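The core of the pipeline-compliance-checker can be sketched as a pure function over the pipeline structure. This is a simplified illustration, not the exact code I deployed: it assumes the pipeline is available in the same shape that CodePipeline's GetPipeline returns (stages with actions and an actionTypeId), and that a "prod" stage is any stage whose name contains "prod" and that has a Deploy action. The naming convention is an assumption you would replace with your own.

```python
def has_manual_approval(stage):
    """True if the stage contains a manual approval action."""
    return any(
        action["actionTypeId"]["category"] == "Approval"
        and action["actionTypeId"]["provider"] == "Manual"
        for action in stage.get("actions", [])
    )


def deploys_to_prod(stage, prod_marker="prod"):
    """Assumed convention: prod stages have 'prod' in their name."""
    has_deploy = any(
        action["actionTypeId"]["category"] == "Deploy"
        for action in stage.get("actions", [])
    )
    return has_deploy and prod_marker in stage["name"].lower()


def evaluate_pipeline(pipeline, prod_marker="prod"):
    """Return an AWS Config compliance type for one pipeline."""
    approval_seen = False
    found_prod = False
    for stage in pipeline["stages"]:
        if deploys_to_prod(stage, prod_marker):
            found_prod = True
            if not approval_seen:
                # A prod deployment that no earlier stage gated.
                return "NON_COMPLIANT"
        if has_manual_approval(stage):
            approval_seen = True
    return "COMPLIANT" if found_prod else "NOT_APPLICABLE"
```

The Lambda behind the Config rule would extract the pipeline from the configuration item, call evaluate_pipeline, and report the result back to AWS Config. Pipelines without a prod stage are reported as NOT_APPLICABLE here so they do not trigger the enforcer.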
I was wondering if there are any AWS Services or projects which allow us to configure a data pipeline using AWS Lambdas in code. I am looking for something like below. Assume there is a library called pipeline
from pipeline import connect, s3, lambda_, deploy  # hypothetical library; "lambda" itself is a reserved word in Python
p = connect(s3('input-bucket/prefix'),
            lambda_(myPythonFunc, dependencies=[list_of_dependencies]),
            s3('output-bucket/prefix'))
deploy(p)
There can be many variations of this idea, of course. For example, this use case assumes only one S3 bucket; there could be a list of input S3 buckets.
Can this be done by AWS Data Pipeline? The documentation I have (quickly) read says that Lambda is used to trigger a pipeline.
I think the closest thing that is available is the state machine functionality within the newly released AWS Step Functions. With these you can coordinate multiple steps that transform your data. I don't believe that they support standard event sources, so you would have to create a standard Lambda function (potentially using the Serverless Application Model) to read from S3 and trigger your state machine.
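A minimal sketch of that glue function is below. It takes the Step Functions client as a parameter (in a real handler that would be boto3.client("stepfunctions")), and the state machine ARN and the shape of the input document are placeholders of mine.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) for every object in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def start_executions(event, sfn_client, state_machine_arn):
    """Start one state machine execution per uploaded object."""
    execution_arns = []
    for bucket, key in s3_objects(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            # Hypothetical input document for the first state.
            input=json.dumps({"bucket": bucket, "key": key}),
        )
        execution_arns.append(response["executionArn"])
    return execution_arns
```

The state machine then chains the transformation steps, and its last state writes the result to the output bucket.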