Resolving cyclical dependencies between AWS CDK CloudFormation stacks - amazon-web-services

Context: I have a CDK app with two stacks, set up as follows:
Stack_A:
StateMachine_A
Lambda_A
S3Bucket_A
IAMRole_A
Stack_B:
StateMachine_B
SageMakerTrainJob_B
IAMRole_B
StateMachine_A runs Lambda_A using execution role IAMRole_A. A separate step in StateMachine_A writes data to S3Bucket_A. StateMachine_B runs SageMakerTrainJob_B using execution role IAMRole_B. Lambda_A's purpose is to start execution of StateMachine_B, whose SageMakerTrainJob_B needs to read from S3Bucket_A. Therefore, we have to configure the following permissions (sketched in CDK after the list):
IAMRole_A needs startExecution permissions on StateMachine_B.
IAMRole_B needs read permissions on S3Bucket_A.
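In CDK terms, those grants would look roughly like this (a hedged sketch; stateMachineB, roleA, bucketA, and roleB are placeholder variables standing in for the cross-stack references):
// Hypothetical cross-stack grants; all identifiers are placeholders.
// In Stack_A, given a reference to StateMachine_B:
stateMachineB.grantStartExecution(roleA); // IAMRole_A -> states:StartExecution on StateMachine_B
// In Stack_B, given a reference to S3Bucket_A:
bucketA.grantRead(roleB);                 // IAMRole_B -> read access on S3Bucket_A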
We tried to model this in CDK by creating a direct dependency in Stack_B on Stack_A, using references to IAMRole_A and S3Bucket_A within Stack_B's definition to grant the needed permissions in code. However, this generated the following error:
Error: 'Stack_B' depends on 'Stack_A' (dependency added using stack.addDependency()). Adding this dependency (Stack_A -> Stack_B/IAMRole_B/Resource.Arn) would create a cyclic reference.
Likewise, trying to model the dependency in the other direction gave the same error:
Error: 'Stack_A' depends on 'Stack_B' (dependency added using stack.addDependency()). Adding this dependency (Stack_B -> Stack_A/S3Bucket_A/Resource.Arn) would create a cyclic reference.
Is there any way around this using code dependencies? Are there any recommended best practices for situations like this? Some options we've considered include:
Using a third stack that depends on both to give Stack_A and Stack_B access to each other's resources.
Creating additional access roles for the necessary resources within each stack and maintaining assumeRole permissions for the Lambda/SageMaker roles somewhere outside of CDK.
Putting them all in one stack. This isn't great for organization and makes the resources tightly coupled; we may not want StateMachine_A to be the only entry point to StateMachine_B in the future.
Also, I see there were similar-sounding issues during CDK development with CodeCommit/CodePipeline and APIGateway/Lambda. Is this a related bug, or are we just trying to do something that's not supported?

Circular references are always tricky, and this isn't a problem unique to the CDK. As you walk through the problem logically you can see where things break down: CloudFormation must create any resources that another resource depends on before it can create the dependent resource. There isn't a one-size-fits-all solution, but here are some ideas that work.
Promote shared resources to another stack. In your case the S3 bucket needs to be used by both stacks, so if it lives in a stack that is deployed before both, you can create the S3 bucket there, use an export/import in Stack B to reference the bucket, and use an export/import in Stack A to reference both the bucket and the state machine in B.
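A rough sketch of that layout (assuming TypeScript CDK v2; the class names, props, placeholder state machine definition, and SageMaker role setup are all illustrative, not taken from the original app):
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { aws_iam as iam, aws_s3 as s3, aws_stepfunctions as sfn } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Deployed first; owns the bucket that both of the other stacks need.
class SharedStack extends Stack {
  readonly bucket: s3.Bucket;
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    this.bucket = new s3.Bucket(this, 'SharedBucket'); // S3Bucket_A lives here now
  }
}

interface StackBProps extends StackProps {
  bucket: s3.IBucket;
}

// Depends only on SharedStack.
class StackB extends Stack {
  readonly stateMachine: sfn.StateMachine;
  constructor(scope: Construct, id: string, props: StackBProps) {
    super(scope, id, props);
    // IAMRole_B: execution role for the SageMaker training job.
    const roleB = new iam.Role(this, 'RoleB', {
      assumedBy: new iam.ServicePrincipal('sagemaker.amazonaws.com'),
    });
    props.bucket.grantRead(roleB); // IAMRole_B can read the shared bucket

    this.stateMachine = new sfn.StateMachine(this, 'StateMachineB', {
      definition: new sfn.Pass(this, 'Placeholder'), // real training step omitted
    });
  }
}

const app = new App();
const shared = new SharedStack(app, 'SharedStack');
const stackB = new StackB(app, 'Stack_B', { bucket: shared.bucket });
// Stack_A would receive shared.bucket and stackB.stateMachine as props, so the
// dependency chain is SharedStack -> Stack_B -> Stack_A with no cycle.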
Use wildcards in permissions. Often you know the name, or enough of the name, of a resource to use wildcards in your permissions. You want to keep permissions tightly scoped, but quite often a partial name match is good enough. Use this option with caution, of course. Also keep in mind that many resources can be named by you. Some people prefer never to do this, and for some resources you shouldn't (like an S3 bucket), but I often find it's easier to name things.
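For example (a hedged sketch; roleB is assumed to be an iam.Role in Stack_B, and the bucket name prefix is made up for illustration):
import { aws_iam as iam } from 'aws-cdk-lib';

// In Stack_B, grant read access by name pattern instead of importing the bucket.
roleB.addToPolicy(new iam.PolicyStatement({
  actions: ['s3:GetObject', 's3:ListBucket'],
  resources: [
    'arn:aws:s3:::my-project-training-data*',   // the bucket itself
    'arn:aws:s3:::my-project-training-data*/*', // objects in the bucket
  ],
}));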
Create a custom resource to tie things together. If you have a true circular dependency that cannot be resolved (even in the same stack) you may need to use a custom resource to do the work for you. A prime example of this is S3 bucket events.
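A rough sketch of that approach using the AwsCustomResource construct (the SDK call and its parameters here are illustrative only, not a drop-in solution for the S3 events case):
import { custom_resources as cr } from 'aws-cdk-lib';

// Makes an SDK call at deploy time instead of modelling the relationship in
// CloudFormation, which sidesteps the dependency cycle.
new cr.AwsCustomResource(this, 'WireUpNotification', {
  onUpdate: {
    service: 'S3',
    action: 'putBucketNotificationConfiguration',
    parameters: {
      Bucket: bucketName, // assumed to be known, or passed in as a plain string
      NotificationConfiguration: { /* event configuration referencing the other stack */ },
    },
    physicalResourceId: cr.PhysicalResourceId.of('WireUpNotification'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});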

In addition to the guidelines by Jason, I'd like to mention a fourth option:
You can also try to decouple the two stacks by moving towards a more event-driven architecture. While you can't resolve the dependency "IAMRole_B needs read permissions on S3Bucket_A" in this example, you can resolve "IAMRole_A needs startExecution permissions on StateMachine_B". Stack A could publish an event (Event_A) to EventBridge as soon as StateMachine_A finishes. StateMachine_B can subscribe to this event and start as soon as it has been raised (a CDK sketch of this wiring follows the dependency list below). The stacks would look like the following:
Stack_A:
StateMachine_A
Event_A
S3Bucket_A
IAMRole_A
Stack_B:
StateMachine_B
SageMakerTrainJob_B
IAMRole_B
You would still have two dependencies:
IAMRole_B needs read permissions on Event_A.
IAMRole_B needs read permissions on S3Bucket_A.
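A hedged sketch of the subscription side in Stack_B (assuming the default event bus; the source and detail-type strings are made up, and stateMachineB is a placeholder for the StateMachine_B construct):
import { aws_events as events, aws_events_targets as targets } from 'aws-cdk-lib';

// In Stack_B: start StateMachine_B whenever Stack_A publishes its "finished" event.
new events.Rule(this, 'StartOnUpstreamFinished', {
  eventPattern: {
    source: ['my.app.stack-a'],            // emitted by StateMachine_A's last step
    detailType: ['StateMachineAFinished'],
  },
  targets: [new targets.SfnStateMachine(stateMachineB)],
});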

Related

Using CDK to Create a Step Function With Dependencies on Other AWS Resources (Like a Lambda) Owned By Different Projects

We're using AWS Step Functions in our application. We have one step function we're creating with the use of the CDK as part of a deployment of Application A from Repository A. That step function needs to include a lambda function as one of the steps. The problem we're having is that this lambda function is created and maintained independently in a different repository (Repository B). We're not sure the best way to connect one AWS resource (AWS Lambda) with another AWS resource (AWS Step Functions) when the creation of those two resources is happening independently in two different places.
We'd like to not manually create the lambda or step function (or both) in each environment. It's time consuming, prone to error and we're going to have a lot of these situations occur.
Our best thought at the moment is that we could maybe have Application A create the step function, but have it create and reference an empty lambda. Initially the step function won't be fully functional of course, but then when we deploy Application B it could look for that empty lambda function and upload new code to it.
So that we don't have an issue where deploying Application B first results in non-working code, we can also handle the opposite condition: Application B could create the lambda function (if it doesn't already exist) before uploading the code to it. Application A could then check whether the lambda function already exists when creating the step function and just reference the lambda function in the step function directly.
Concerns with this approach:
This is extra work and adds a lot of complexity to the deployment, so there is more potential for failure.
I'm not sure I can easily look up a lambda function like this anyway (I guess it would have to be by name since we couldn't know what the ARN would be when we're writing the code). But then we have issues if the name changes too, so maybe there's a pre-defined ID or something we could use to look it up instead.
Potential for code failing in production. If, when deploying to QA for testing, we deploy Application A and then Application B, we only know that that scenario works. If we then deploy them in the opposite order when going to production, it might break.
What are some good options for this kind of thing? I can't think of anything great. My best idea involves not using Lambda at all and instead having the step function step queue something up in SQS, which Application B can then read from without any problem. It feels like this is a common enough scenario that there must be some clean way to do it with Lambda, and I wouldn't want my decisions about which AWS services I can use to be stymied by deployment feasibility.
Thanks
You can easily include an existing Lambda function in a new CDK-created Step Function. Use the Function.fromFunctionArn static method to get a read-only reference to the Lambda using its ARN. The CDK uses the ARN to add the necessary lambda:InvokeFunction permissions to the Step Function's execution role.
import { aws_lambda as lambda, aws_stepfunctions_tasks as tasks } from 'aws-cdk-lib';

const importedLambdaTask = new tasks.LambdaInvoke(this, 'ImportedLambdaTask', {
  lambdaFunction: lambda.Function.fromFunctionArn(
    this,
    'ImportedFunc',
    'arn:aws:lambda:us-east-1:123456789012:function:My-Lambda5C096DFA-RLhGGzBJSnMN'
  ),
  resultPath: '$.importedLambdaTask',
});
If you prefer not to hard-code the Lambda ARN in the CDK stack, save the ARN to an SSM Parameter Store parameter. Then import it into the stack by name and pass its value to fromFunctionArn:
import { aws_ssm as ssm } from 'aws-cdk-lib';

const lambdaArnParam = ssm.StringParameter.fromStringParameterName(
  this,
  'ArnFromParamStore',
  'lambda-arn-saved-as-ssm-param'
);
// Resolve the parameter's value and use it in place of the hard-coded ARN above.
const importedFunc = lambda.Function.fromFunctionArn(this, 'ImportedFunc2', lambdaArnParam.stringValue);
Edit: Optionally add a Trigger construct to your CDK Application A to confirm the existence of the Application B Lambda dependency before deploying. Triggers are a newish CDK feature that let you run Lambda code during deployments. The Trigger Function should return an error if it cannot find the external Lambda, thereby causing Application A's deployment to fail.
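A minimal sketch of such a trigger (assuming the aws-cdk-lib/triggers module; the asset path and handler name are placeholders):
import * as triggers from 'aws-cdk-lib/triggers';
import { aws_lambda as lambda, Duration } from 'aws-cdk-lib';

// Runs during deployment of Application A; the handler should call
// lambda:GetFunction on the Application B function and throw if it is missing,
// which fails the deployment.
new triggers.TriggerFunction(this, 'CheckExternalLambdaExists', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'index.handler',
  timeout: Duration.seconds(30),
  code: lambda.Code.fromAsset('lambda/check-external-lambda'),
});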

Create Infrastructure Documentation from terraform + gitlab-ci system

Our infra pipeline is set up using terraform + gitlab-ci. I have been given the task of providing documentation on the setup, covering what's implemented and what's left. I am new to the infra world and am finding it hard to come up with a template to start the documentation.
So far I have thought of having a table of the resources needed, with details on dependencies, the source of the module, additional notes, etc.
If you have a template, can you share it, or do you have any other suggestions?
For starters, you could try one or both of the below approaches:
a) create a graph of the Terraform resources using its graph command
b) group and then list all of your resources for a specific tag using AWS Resource Groups, specifically its Create Resource Group functionality
The way I do documentation is to keep it as simple as possible: explain how it works, how to use it, and also provide instructions on how it was set up from scratch, for reference and as an insurance policy, so that if it's destroyed, someone other than the person who set it all up could recreate it.
Since this is just a pipeline there is probably not much to diagram. The documentation structure I would provide would be something like the following, added either as part of the README.md, in Confluence, or however your team does documentation.
Summary
1-2 Sentences about the work and why it was created.
How the Repo is Structured
An explanation on how the repo is structured and decisions behind why it was structured the way it was.
How To Use
Provide steps on how a user can use the pipeline
How It Was Created
Provide steps on how it was set up so anybody can manage it and work on it going forward.

Can I load code from a file in an AWS Lambda?

I am thinking of creating 2 generic AWS Lambda functions, one as an "Invoker" to run the other Lambda function. The invoked Lambda function loads the code of the Lambda from a file based on the parameter that is passed to it.
Invoker: Calls the invoked Lambda with a specified parameter, e.g. ID
Invoked: Based on the ID, loads the appropriate text file containing the actual code to run
Can this be done?
The reason for this thinking is that I don't want to have to deploy 100 Lambda functions if I could just save the code in 100 text files in an S3 bucket and load them as required.
The code is uploaded constantly by users and so I cannot include it in the lambda. And the code can be in all languages supported by AWS (.NET, NodeJs, Python, etc.)
For security, is there maybe a way to "containerize" running the code?
Any recommendation and ideas are greatly appreciated.
Thanking you in advance.
The very first thing I'd like to mention is that you should pay a lot of attention to the security aspects of your app, as you are going to execute code uploaded by users, meaning they will potentially be able to access sensitive data.
My example is based on NodeJS, but I think something similar may be achieved using other runtimes; I'm not sure. There are two main things you need to know:
The AWS Lambda execution environment provides you with a /tmp folder with a capacity of 512 MB, and you are allowed to put any resources needed for the current invocation there.
NodeJS allows you to require modules dynamically at any place in the app.
So, basically, you may download the desired js file into the /tmp folder and then require it from your code. I am not going to write out the full implementation here, but these are the general steps (a rough sketch follows the list):
Lambda receives fileId as a parameter in event.
Lambda searches S3 for the file named fileId and then downloads it to the /tmp folder as fileId.js
Now in the app you may require that file and consider it as a module:
const dynamicModule = require("/tmp/fileId.js");
Use the loaded module.
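A minimal sketch of those steps (assuming the AWS SDK v2 for JavaScript, a bucket named my-code-bucket, and that each uploaded module exports a run function; all of these are illustrative assumptions):
import * as AWS from 'aws-sdk';
import * as fs from 'fs';

const s3 = new AWS.S3();

export const handler = async (event: { fileId: string }) => {
  const localPath = `/tmp/${event.fileId}.js`;

  // Step 2: download the module source from S3 into the writable /tmp folder.
  const obj = await s3
    .getObject({ Bucket: 'my-code-bucket', Key: `${event.fileId}.js` })
    .promise();
  fs.writeFileSync(localPath, obj.Body as Buffer);

  // Step 3: load it as a CommonJS module.
  const dynamicModule = require(localPath);

  // Step 4: use the loaded module.
  return dynamicModule.run(event);
};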
You certainly won't be able to run Python code, or .NET code, in a Node lambda. Can you load files and dynamically run the code? Probably. Should you? Probably not. Even if you trust the source of that code, you don't want them all running in the same function: 1) they would share the same permissions, which means that, at a minimum, they would all have access to the same S3 bucket where the code is stored; 2) they would all log to the same place. Good luck debugging.
We have several hundred lambda functions deployed in our account. I would never even entertain this idea as an alternative.

What's the point of full configurations in .tfstate?

So I've gone through some basic reading:
https://blog.gruntwork.io/an-introduction-to-terraform-f17df9c6d180
and
https://blog.gruntwork.io/how-to-manage-terraform-state-28f5697e68fa
So I understand that .tfstate tells the terraform CLI which resources it's actually responsible for managing. But couldn't that be done more minimally with a list of IDs?
Why does .tfstate need to contain full configurations of all resources, if terraform refresh is run implicitly before terraform apply?
Would the fetch not get complete information from the infrastructure? And then that could be used to do the diff etc...
I suppose if you get complete information every time, you might as well record it. But I'm wondering if it's a necessary step. Thanks!

Import current state of my cloud AWS account with terraform

I would like to version control my cloud resources initially, before applying changes through Terraform. Is there any way I can run a single command and store the current state of my cloud?
I have tried to use Terraform's import command:
terraform import ADDR ID
But this takes a long time to identify all the resources and import them.
I have tried terraforming but this also needs a resource type to import:
terraforming s3
Is there any tool that can help in importing all existing resources?
While this doesn't technically answer your question, I would strongly advise against trying to import an entire existing AWS account into Terraform in one go, even if it were possible.
If you look at any Terraform best practices, an awful lot of it comes down to minimising blast radius, so that only things that make sense to change at the same time as each other are ever applied at the same time. Charity Majors wrote a good blog post about this and the impact it had when that wasn't the case.
Any tool that mass-imports things (e.g. terraforming) is just going to dump everything in a single state file, which, as mentioned before, is a bad idea.
While it sounds laborious, I'd recommend that you begin your migration to Terraform more carefully and methodically. In general I'd probably say that only new infrastructure should use Terraform, utilising Terraform's data sources to look up existing things, such as VPC IDs, that already exist.
Once you feel comfortable with using Terraform and structuring your infrastructure code and state files in a particular way, you can then begin to think about how you would map your existing infrastructure into Terraform code and state files, and begin manually importing specific resources as necessary.
Doing things this way also allows you to find your feet with Terraform a bit better and understand its limitations and strengths, while also working out how your team and/or CI will work together (e.g. remote state, state file locking, and orchestration) without tripping over each other or causing potentially crippling state issues.
I'm using terraformer to import my existing AWS infrastructure. It's much more flexible than terraforming and has none of the issues mentioned in the other answers.