I'm building a system that has a web service (AWS API Gateway + AWS Lambda + AWS RDS Aurora MySQL) fully integrated with a CI/CD pipeline (AWS CodePipeline) triggered by a Git webhook. I have a template that provisions the gateway, the Lambda and the RDS cluster. Additionally, I have a custom resource in my template that creates the database and the tables (it does not ingest data for now).
I have a few questions about this architecture:
In this scenario, is a custom resource for creating the schema the best approach according to standards?
Regarding data ingestion and schema updates, is it good practice to manage them within the pipeline, or is it better to do it outside (running incremental scripts manually)?
If you manage schema changes within the pipeline process, how do you achieve that?
Thanks
For creating the initial schema, a custom resource is, as you said, the best choice at this time.
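As a rough illustration of what such a custom resource handler can look like, here is a minimal sketch of a Lambda-backed custom resource. It only shows the request/response contract with CloudFormation; `create_schema` is a hypothetical placeholder for whatever code runs your DDL against Aurora, and the `schema-initializer` physical ID is likewise an assumption.

```python
import json
import urllib.request


def build_response(event, status, reason=""):
    """Build the JSON body CloudFormation expects back from a custom resource."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        # Hypothetical fixed ID; reusing the same ID on Update avoids a replacement.
        "PhysicalResourceId": event.get("PhysicalResourceId", "schema-initializer"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(url, body):
    """PUT the response to the pre-signed URL supplied in the event."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": ""}
    )
    urllib.request.urlopen(request)


def handler(event, context, create_schema=lambda: None, send=send_response):
    """Entry point; create_schema is a placeholder for the code that runs the DDL."""
    try:
        if event["RequestType"] == "Create":
            create_schema()
        # Update and Delete are no-ops in this sketch, so existing data is left alone.
        body = build_response(event, "SUCCESS")
    except Exception as error:
        body = build_response(event, "FAILED", reason=str(error))
    send(event["ResponseURL"], body)
    return body
```

Always sending a response, even on failure, matters here: if the handler never answers, the stack operation waits until it times out.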
Regarding data ingestion and schema updates: if you keep these scripts under version control, then having some kind of pipeline is definitely the correct way to go. The difficulty lies in the rollback scenario, especially when data manipulation is involved.
You could either use a pure Lambda action within CodePipeline (including functionality to test and roll back your changes), or you could integrate the Lambda function with a third-party solution for managing rolling updates to your SQL schema.
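To make the "incremental scripts inside the pipeline" idea more concrete, here is a minimal sketch of a migration runner that a Lambda action could call. It is not taken from any particular tool: the `schema_migrations` tracking table and the `(version, sql)` format are assumptions, and `sqlite3` stands in for Aurora MySQL so the example is self-contained (a MySQL driver would use a cursor and `%s` placeholders instead).

```python
import sqlite3


def apply_pending(conn, migrations):
    """Apply migrations that are not yet recorded, in version order.

    migrations is a list of (version, sql) tuples; returns the versions applied.
    """
    # Hypothetical tracking table that remembers which versions already ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already ran in an earlier deployment
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    scripts = [
        (1, "CREATE TABLE users (id INTEGER PRIMARY KEY)"),
        (2, "ALTER TABLE users ADD COLUMN email TEXT"),
    ]
    print(apply_pending(connection, scripts))  # first run applies both versions
    print(apply_pending(connection, scripts))  # second run has nothing left to do
```

This only covers the forward direction. Rollback would need a matching "down" script per version, and for data-changing migrations that is exactly where it gets hard.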
How are multiple Lambda functions for a single stack/application currently handled?
Considering a use case with more than one function, is it better to keep them all together in the same repository or to have one repository for each?
Having a single repository for all the functions would be much easier for me, since I come from classic backend development with a single codebase for all the business logic. But moving to the AWS ecosystem means I can no longer deploy my entire business logic with a single command: I need to zip each function and update its archive with the AWS CLI, and I don't see how to automate these steps with standard merge requests or a pipeline, because each change could touch a different function, or several at once.
On the other hand, having e.g. 5 or 6 repositories, one for each Lambda, alongside the ones for the frontend and the AWS stack would be very impractical to manage.
Bundle your different Lambda functions together as a CloudFormation stack. CloudFormation allows you to create multiple AWS resources and connect them as you wish. There are many tools you can use to achieve this: AWS CloudFormation, AWS SAM (Serverless Application Model), or third-party tools like the Serverless Framework and Terraform. The underlying concept is known as Infrastructure as Code (IaC).
As for repositories, you can have a single repository per stack. AWS SAM provides sample code with a good directory structure; you can try sam init as an example.
Consider the AWS Serverless Application Model for your development. It lets you script the build, package and deploy steps with the SAM CLI, based on the YAML template. SAM works out what changed in your code by itself (because it runs CloudFormation under the hood). It lets you not only combine several functions into one package, but also add API gateways, DynamoDB tables and much more. Another nice feature is that your functions appear as an integrated application in the Lambda console, so you can monitor them all at the same time.
CDK Pipelines is great, especially for cross-account deployments. It enables developers to define and customize the CI/CD pipeline for their app to their heart's content.
But to remain SoC compliant, we need to make sure that necessary controls such as the following are validated/enforced:
A manual approval stage should be present before the stage that does the cross-account deployment to production
Direct deployment to production bypassing dev/staging environment is not allowed
Test cases (Unit tests/Integration tests) and InfoSec tests should pass before deployment
I know that the above are straightforward to implement in CDK Pipelines, but I am not sure how to ensure that every CDK pipeline always conforms to these standards.
I can think of the following solutions:
Branch restrictions - Merge to master branch (which the CDK pipeline monitors) should be restricted and allowed only via pull requests
Tests - Add unit tests or integration tests which validate that the generated CloudFormation template has specific resources/properties
Create a standard production stage with all necessary controls defined and wrap it in a library which developers need to use in their definition of the CDK pipeline if they want to deploy to production
But how can the above controls be enforced in an automated fashion? Developers can bypass them simply by not specifying them when defining the pipeline, and we do not want to rely on an approver to check these things manually.
So in summary, the question is: when using CDK Pipelines, how can we give developers maximum customizability and freedom in designing their CI/CD solution while ensuring that SoC restrictions and mandatory controls are validated and enforced in an automated fashion?
Open Policy Agent might be helpful to check infra against custom policies in the pipeline.
https://aws.amazon.com/blogs/opensource/realize-policy-as-code-with-aws-cloud-development-kit-through-open-policy-agent/
After a lot of research, I concluded that implementing these checks via a custom AWS Config rule is the best approach.
Let's say we want to ensure that
A manual approval stage is always present in every pipeline that has a stage that deploys to prod.
We need to
Enable AWS Config and configure it to record all changes to CodePipeline
Create a custom AWS Config rule backed by an AWS Lambda function (say pipeline-compliance-checker)
The Lambda function is triggered on every configuration change of any pipeline and receives the latest configuration of the pipeline in question
It parses the latest pipeline configuration and checks whether the pipeline has a manual approval stage before the stage that deploys to prod. If yes, it marks the pipeline as COMPLIANT, otherwise as NON_COMPLIANT
Create an Amazon EventBridge rule that sends a notification to an SNS topic (say pipeline-non-compliance) when any pipeline is marked as NON_COMPLIANT
Create another AWS Lambda function (say pipeline-compliance-enforcer) that is subscribed to that SNS topic. It stops the non-compliant pipeline in question (if it is in STARTED state) and then disables the incoming transition to the stage that deploys to prod. We can also delete the pipeline here if required.
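The core check inside pipeline-compliance-checker can be sketched roughly as follows. This is only the evaluation logic, not the full AWS Config integration. It assumes the stages have the shape returned by the CodePipeline GetPipeline API (`name`, `actions`, `actionTypeId`), and it assumes a hypothetical naming convention where prod stages have names starting with "prod"; pass your own predicate if your pipelines identify prod differently.

```python
def is_manual_approval(action):
    """True if the action is a CodePipeline manual approval action."""
    type_id = action.get("actionTypeId", {})
    return type_id.get("category") == "Approval" and type_id.get("provider") == "Manual"


def default_is_prod_stage(stage):
    # Hypothetical convention: prod stages are named "Prod", "prod-eu", and so on.
    return stage["name"].lower().startswith("prod")


def evaluate_pipeline(stages, is_prod_stage=default_is_prod_stage):
    """Return COMPLIANT if a manual approval stage precedes the first prod stage."""
    approval_seen = False
    for stage in stages:
        if is_prod_stage(stage) and not approval_seen:
            return "NON_COMPLIANT"
        if any(is_manual_approval(action) for action in stage.get("actions", [])):
            approval_seen = True
    # Pipelines without any prod stage are treated as compliant in this sketch.
    return "COMPLIANT"
```

Note that in this sketch a single approval stage anywhere before the first prod stage is enough. If you need an approval immediately before every prod stage, the loop would have to reset `approval_seen` after each prod stage.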
I have tested the above setup and it fulfils the requirements.
I also learned later that AWS describes the same solution to this problem in the talk "CI/CD Pipeline Security: Advanced Continuous Delivery Best Practices".
I would like to automate setting up the collection of AWS Application Load Balancer logs using Sumo Logic as documented here:
https://help.sumologic.com/07Sumo-Logic-Apps/01Amazon_and_AWS/AWS_Elastic_Load_Balancer_-_Application/01_Collect_Logs_for_the_AWS_Elastic_Load_Balancer_Application_App
This involves creating a bucket, creating a Sumo Logic hosted collector with an S3 source, taking the URL of the collector source provided by Sumo Logic and then creating an SNS Topic with an HTTP subscription where the subscription URL is the one provided by the Sumo Logic source.
The issue with this is that the Sumo Logic source URL is not known at synthesis time. The bucket must be deployed first, then the Sumo Logic resources created, and only then the SNS topic.
As best I can figure, I will have to do this through separate invocations of CDK using separate stacks, which is slower: one stack to create the bucket; after deploying that stack, a call to the Sumo Logic API to create (or confirm prior creation of) the hosted collector and source; then another CDK deploy to create the SNS topic and HTTP subscription.
I was just wondering if anyone knew of a better way to do this, perhaps some sort of deploy time hook that could be used.
There are two ways (that I know of) to automate the collection of AWS Application Load Balancer logs.
Using CloudFormation
Sumo Logic has a template that creates the collection process for AWS Application Load Balancer, which is part of the AWS Observability Solution. You can fork the repository and create your own CloudFormation template after removing the resources you do not require.
Sumo Logic also has a Serverless Application which automatically enables access logging for existing and new load balancers (those created after the application is installed). There is an example template which uses the application.
Using Terraform
As mentioned by Grzegorz, you can also create a Terraform script.
Disclaimer: Currently employed by Sumo Logic.
You could try using a Custom Resource SDK Call to trigger a lambda that does what you want.
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_custom-resources.AwsSdkCall.html
(I know this is not a perfect answer as it suggests to use another tool, yet I believe it fulfills the needs expressed in the question)
How about using Terraform?
sumologic_s3_source in Terraform is able to create the source at Sumo AND output its URL for other uses within Terraform - e.g. to set up AWS resources.
The docs on this even mention URL being one of the returned values:
url - The HTTP endpoint to use with SNS to notify Sumo Logic of new files.
Disclaimer: I am currently employed by Sumo Logic.
I am working on an AWS POC that uses several AWS components. The details of each component are below.
1- A Java function contains the code to generate data; I call it from a Lambda function on a CloudWatch schedule.
2- A Data Pipeline copies data from RDS to S3.
3- Hive scripts run using Athena over the S3 data.
4- QuickSight is used for visualization.
I have built each piece individually, but I don't understand what the best way to connect all these components would be, so that everything can run in one go.
One thought is to use Lambda as a connector for each step, but I have no template for connecting Lambda with Athena.
Can anyone suggest the best way to connect all of the above components so that everything runs in one go?
I am not familiar with Hive scripts or QuickSight, but a CloudFormation stack or a Terraform configuration should help you connect the various AWS components as your workflow demands.
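For the Lambda-to-Athena step mentioned in the question, a minimal sketch could look like the following. It uses the Athena API calls `start_query_execution` and `get_query_execution`; the client is passed in as a parameter (in a Lambda it would be `boto3.client("athena")`), and the query, database name and S3 output location are values you would supply yourself.

```python
import time

FINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(athena, query, database, output_location, poll_seconds=1.0):
    """Start an Athena query and poll until it reaches a final state.

    athena is an Athena client, e.g. boto3.client("athena") inside a Lambda.
    Returns a (query_execution_id, final_state) tuple.
    """
    started = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in FINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

Keep in mind that polling inside a Lambda is bounded by the function timeout, so this only suits queries that finish well within that limit.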
I am using AWS to build an API, and deploy this to multiple stages.
When a call is made to a specific environment, I need to get a stage variable in Lambda and then data is recorded in a DynamoDB table such as "environment-Table".
Is this the best way to work with environments (like development, production etc) using AWS API Gateway, Lambda and DynamoDB?
It is difficult to say what the best approach is for your specific situation, given the limited information in your post. Managing multiple environments such as development and production was one of the intended uses of stages and stage variables. I don't see any obvious problems with what you are proposing.
Depending on your use case, you can call a Lambda function to record data in DynamoDB, or you may be able to skip the Lambda function and record the data in DynamoDB directly using the AWS proxy integration type.
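As a rough sketch of the Lambda variant, assuming the Lambda proxy integration (where the event carries `stageVariables` and `requestContext`): the stage variable name `environment`, the `-Table` suffix and the item attributes `id` and `body` are hypothetical and would need to match your own setup.

```python
def table_name_for(event, suffix="-Table"):
    """Derive the table name from a stage variable, falling back to the stage name."""
    stage_variables = event.get("stageVariables") or {}
    # "environment" is a hypothetical stage variable name.
    environment = stage_variables.get("environment") or event["requestContext"]["stage"]
    return f"{environment}{suffix}"


def handler(event, context, dynamodb=None):
    """Record the request in the table that belongs to the calling stage."""
    if dynamodb is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(table_name_for(event))
    # Hypothetical item layout: the request ID as the key plus the raw body.
    table.put_item(
        Item={"id": event["requestContext"]["requestId"], "body": event.get("body")}
    )
    return {"statusCode": 200, "body": "recorded"}
```

With a stage variable `environment=development`, the call is recorded in `development-Table`; the same function code then serves every stage without changes.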