How to enforce standards and controls when using CDK Pipeline

How to enforce standards and controls when using CDK Pipeline - amazon-web-services

CDK Pipelines is great, specially for cross-account deployments. It enables the developers to define and customize the CI/CD pipeline for their app to their heart's content.
But to remain SoC compliant, we need to make sure that necessary controls like below are validated/enforced
A manual approval stage should be present before the stage that does the cross-account deployment to production
Direct deployment to production bypassing dev/staging environment is not allowed
Test cases (Unit tests/Integration tests) and InfoSec tests should pass before deployment
I know that above things are straightforward to implement in CDK Pipelines but I am not quite sure about how to ensure that every CDK Pipeline always conforms to these standards.
I can think of below solutions
Branch restrictions - Merge to master branch (which the CDK pipeline monitors) should be restricted and allowed only via pull requests
Tests - Add unit tests or integration tests which validate that the generated cloud formation template has specific resources/properties
Create a standard production stage with all necessary controls defined and wrap it in a library which developers need to use in their definition of the CDK Pipeline if the want to deploy to production
But how to enforce above controls in an automated fashion? Developers can choose to bypass above controls by simply not specifying them while defining the pipeline. And we do not want to rely on an Approver to check these things manually.
So in summary, the question is - When using CDK pipelines, how to give developers maximum customizability and freedom in designing their CI/CD solution while ensuring that SoC restrictions and mandatory controls are validated and enforced in an automated fashion?

Open Policy Agent might be helpful to check infra against custom policies in the pipeline.
https://aws.amazon.com/blogs/opensource/realize-policy-as-code-with-aws-cloud-development-kit-through-open-policy-agent/

After researching a lot, concluded that implementing these checks via a custom AWS Config rule is the best approach.
Let's say we want to ensure that
A manual approval stage is always present in every pipeline that
has a stage that deploys to prod.
We need to
Enable AWS Config and configure it to record all changes to Codepipeline
Create a custom AWS Config rule using AWS Lambda function (say pipeline-compliance-checker)
The lambda function gets triggered on every config change of any codepipeline and receives the latest config of the pipeline in question
It parses the latest pipeline config and checks whether the pipeline has a manual approval stage before the stage that deploys to prod. If yes, it deems the pipeline as COMPLIANT else NON_COMPLIANT
Create a AWS EventBridge rule to receive a notification to an SNS topic (say pipeline-non-compliance) when any pipeline is marked as NON_COMPLIANT - (doc)
Create another AWS Lambda function (say pipeline-compliance-enforcer) that is subscribed to that SNS topic. It stops the non-compliant pipeline in question (if it is in STARTED state) and then disables the incoming transition to the stage that deploys to prod. We can also delete the pipeline here if required.
Have tested the above setup and it fulfils the requirements.
Also learnt later that AWS speaks about the same solution to this problem in this talk - CI/CD Pipeline Security: Advanced Continuous Delivery Best Practices

Related

AWS Lambda functions - repository for multiple functions codes?

How is it currently done the handling of multiple lambda functions for a single stack/application?
Considering a use case with more than one function is it better to stick all together in the same repository or have one for each?
Having a single repository for all the functions would be much easier for me coming from old/classic backend development with a single codebase for all the business logic, but moving on the AWS ecosystem means I can no longer "deploy" my entire business logic with a single command since I need to zip a single function and update the archive with the aws cli, and that is impossible with standard merge requests or pipeline due the impossibility of automation for these steps (every time it could be a different function or multiple ones).
From the other side, having e.g. 5 or 6 repositories one for each lambda alongside the ones for frontend and AWS stack would be very impractical to manage.

Bundle your different lambda functions together as a Cloudformation stack. Cloudformation allows you to create multiple AWS services, bridge them together as you wish. There are many tools you can use to achieve this. AWS Cloudformation, AWS SAM (serverless application model) or third party tools like serverless and Terraform. Base concept is known as Infrastructure as Code (IAC).
As per respositories, you can have a single repository per stack. (AWS SAM provides sample codes with a good directory structure) You can try sam init as an example.

Consider AWS Serverless Application Model for your development. It allows you to bash script build, package and deploy using sam cli based on the yaml template. SAM will figure out the diff in your code by itself (because it runs CloudFormation under the hood). It allows not only to combine several functions into one package, but also add API gateways, dynamoDB tables and so much more! Another cool feature is that your functions will appear as an integrated application in Lambda console so you can monitor them all at the same time.

AWS Lambda CI/CD process

I am trying to understand the correct way to setup my project on AWS so that I ultimately get the possibility to have CI/CD on the lambda functions. And also to ingrain good practices.
My application is quite simple : an API that calls lambda functions based on users' requests.
I have deployed the application using AWS SAM. For that, I used a SAM template that was using local paths to the lambda functions' code and that created the necessary AWS ressources (API Gateway and Lambda). It was necessary to use local paths for the lambda functions because the way SAM works does not allow using existing S3 buckets for S3 events trigger (see here) and I deploy a Lambda function that is watching the S3 bucket to see any updated code to trigger lambda updates.
Now what I have to do is to push my Lambda code on Github. And have a way that Github pushes the lambda functions' code from github to the created S3 bucket during the SAM deploy and the correct prefix. Now what I would like is a way to automatically to that upon Github push.
What is the preferred way to achieve that ? I could not find clear information in AWS documentation. Also, if you see a clear flaw in my process don't hesitate to point it out.

What you're looking to do is a standard CI/CD pipeline.
The steps of your pipeline will be (more or less): Pull code from GitHub -> Build/Package -> Deploy
You want this pipeline to be triggered upon a push to GitHub, this can be done by setting up a Webhook which will then trigger the pipeline.
Last two steps are supported by SAM which I think you have already implemented before, so will be a matter of triggering the same from the pipeline.
These capabilities are supported by most CI/CD tools, if you want to keep everything in AWS you could use CodePipeline which also supports GitHub integration. Nevertheless, Jenkins is perfectly fine and suitable for your use case as well.

There are a lot of ways you can do it. So would depend eventually on how you decide to do it and what tools you are comfortable with. If you want to use native AWS tools, then Codepipeline is what might be useful.
You can use CDK for that
https://aws.amazon.com/blogs/developer/cdk-pipelines-continuous-delivery-for-aws-cdk-applications/
If you are not familiar with CDK and would prefer cloudformation, then this can get you started.
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-github-gitclone.html

What is the best way to develop, test and deploy serverless applications which uses AWS services SQS, SES, Lambda

I am trying to develop a serverless application which will use AWS SQS (Simple Queue Service), AWS SES (Simple Email Service) and AWS Lambda. The application will perform these steps:
get some messages in the SQS queue
trigger a Lambda function to handle all these messages
the Lambda will either send an email using AWS SES or an SMS using some 3rd party API, depending on the type of message
To test this out, I created the queue, lambda and configured SES, all manually using the web interface at https://aws.amazon.com . For the Lambda function, I simply typed my code in the web IDE provided at Lambda console. Since it was a very simple POC, it didnt need any testing and I got it to work.
Now, I want to turn this into a production ready application. My requirements:
code this entire application locally on my machine
test it locally in an environment similar to the one at AWS
publish the code to GitLab and then finally deploy it at AWS
all these resources (SQS queue, SES config, Lambda function) to be created automatically through code
Based on what I read online, I could find 3 different options for doing this:
AWS Cloudformation
AWS SAM
Serverless framework
My questions are:
For my use case, are serverless applications the correct technology or do I need something else like AWS SWF or AWS Step Functions? I also read about AWS Lambda applications. Are they something else?
Which is the best option among these in terms of cost, ease of setup and use? I checked that CloudFormation itself doesnt cost anything, you just have to pay for the services (SQS, SES, Lambda) being used but for Serverless, there are some costs involved for using the framework.
Are there some other options as well apart from these?
I will be using NodeJS for the code and only AWS as my cloud platform.

Short answer: yes, this is totally doable with Serverless functions and actually a typical Serverless use case.
Long answer:
It's not necessary to use AWS SWF or AWS Step Functions here. However, you could use Step Functions in case your process gets more complicated (e.g. more external services are involved and you need certain error handling, or you want to improve parallel processing powers).
First of all, CloudFront is not comparable to AWS SAM or Serverless Framework. Did you mean AWS CloudFormation instead? CloudFront is a CDN to serve (and cache) any kind of content whereas CloudFormation is a tool to describe your infrastructure as code.
CloudFormation is the "basis" for AWS SAM and Serverless Framework
because they both translate their template code to CloudFormation
code in the end. However, CloudFormation makes developing Serverless
Functions a bit complicated in my opinion. That's why tools like AWS
SAM or Serverless Framework popped up at some point. AWS SAM is
basically an extension of CloudFormation, i.e. it provides
additional resource types like AWS::Serverless::Function but
everything else is CloudFormation. Serverless Framework also lets
you add CloudFormation resources but has its own syntax for
specifying Serverless Functions.
In terms of costs, CloudFormation, AWS SAM, and Serverless Framework are all
free. However, you can use some premium features of Serverless Framework but you don't have to. However, CloudFront is not free to use - but I believe it wasn't the service you were looking for. Besides that, for SQS, SES and Lambda you only pay for what you use.
I personally prefer AWS SAM because you are closer to CloudFormation code and compared to Serverless Framework, you don't need a plugin for some things to circumvent the abstractions that the Serverless Framework does for you. You'll notice this for 'bigger' projects where you are leaving the standard hello world examples. On the other side, the Serverless Framework is quite popular and hence, there are many resources out there to help you. Up to you what you prefer :)
In terms of infrastructure tooling, you could have a look at AWS CDK (a good starting point is cdkworkshop.com) which is becoming more and more popular.
For local development, you can have a look at Localstack. The free version supports emulating SQS and SES locally, so that should be helpful.

How to extend AWS CDK with non AWS Resources during deploy

I would like to automate setting up the collection of AWS Application Load Balancer logs using Sumo Logic as documented here:
https://help.sumologic.com/07Sumo-Logic-Apps/01Amazon_and_AWS/AWS_Elastic_Load_Balancer_-_Application/01_Collect_Logs_for_the_AWS_Elastic_Load_Balancer_Application_App
This involves creating a bucket, creating a Sumo Logic hosted collector with an S3 source, taking the URL of the collector source provided by Sumo Logic and then creating an SNS Topic with an HTTP subscription where the subscription URL is the one provided by the Sumo Logic source.
The issue with this is that the SumoLogic source URL is not known at synthesis time. The Bucket must be deployed, then the Sumlogic things created, then the SNS topic created.
As best I can figure, I will have to do this through separate invocations of CDK using separate stacks, which is slower. One stack to create the bucket. After deploying that stack, use the Sumo Logic api to create or affirm prior creation of the Sumo Logic hosted collector and source, another CDK deploy to create the SNS topic and HTTP subscription.
I was just wondering if anyone knew of a better way to do this, perhaps some sort of deploy time hook that could be used.

There are two ways(which I know of) in which you can automate the collection of AWS Application Load Balancer.
Using CloudFormation
Sumo Logic have a template that creates the Collection process for AWS Application Load Balancer which is part of the AWS Observability Solution. You can fork the repository and can create your own CloudFormation template after removing resources you do not require.
Sumo Logic also have a Serverless Application which auto enable Access logging for existing and new (which are created after application installation) load balancer. Example template which uses the application.
Using Terraform
As mentioned by Grzegorz, you can create a terraform script also.
Disclaimer: Currently employed by Sumo Logic.

You could try using a Custom Resource SDK Call to trigger a lambda that does what you want.
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_custom-resources.AwsSdkCall.html

(I know this is not a perfect answer as it suggests to use another tool, yet I believe it fulfills the needs expressed in the question)
How about using Terraform?
sumologic_s3_source in Terraform is able to create the source at Sumo AND output its URL for other uses within Terraform - e.g. to set up AWS resources.
The docs on this even mention URL being one of the returned values:
url - The HTTP endpoint to use with SNS to notify Sumo Logic of new
files.
Disclaimer: I am currently employed by Sumo Logic.

Create RDS schema when RDS cluster is part of a AWS Codepipeline

I'm building a system that has a web service(AWS API Gateway + AWS lambda + AWS RDS Aurora MySQL) fully integrated with a CI/CD pipeline(AWS CodePipeline) integrated with a Git WebHook. So, I have a template that provides the gateway, the lambda and the RDS cluster. Additionally, I have a custom resource in my template that creates the database and the tables( not ingesting data for now).
Regarding the architecture previously mentioned, here I have a couple of questions:
In this scenario, is a custom resource for creating the schema the best approach according to standards?
Regarding data ingestion and schema updates, is it a good practice to manage this within the pipeline, or is it better to do it outside(running incremental scripts manually)?.
In case you manage schema changes within the pipeline process... how do you achieve that?
Thanks

For creating the initial schema at this time the best choice is as you said using a custom resource.
Regarding data ingestion/schema updates if you're using version control for managing then having some kind of pipeline is definitely the correct way to go, however, where the difficulties lie are in a rollback scenario (especially with data manipulation).
You could either use a pure Lambda action within CodePipeline (including functionality to test and rollback your changes) or you could integrate the Lambda function with a third party solution for managing rolling updates to your SQL schema.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js