I have read in many places on the internet that CloudFormation is not idempotent, but I cannot find any example that proves this.
Could you please provide an example that creates a resource and proves that CloudFormation is not idempotent?
The definition of idempotent according to Wikipedia is as follows:
In computer science, the term idempotent is used more comprehensively
to describe an operation that will produce the same results if
executed once or multiple times.
CloudFormation is considered not idempotent in several aspects of its behavior:
Calling the create API for a stack that already exists will result in an error
Calling the update API with an unchanged CloudFormation stack results in an error
Deleting a stack and creating it again will create resources with different identifiers: new ARNs for IAM users, new security group IDs, EC2 instance IDs, VPC IDs, etc.
Resources modified outside of CloudFormation will not be changed back to their original values if the existing stack is updated with the same template
However, at a high level, one of the main reasons to use CloudFormation is to represent your infrastructure as code so that you can produce the same infrastructure repeatedly. That is almost identical to the definition of idempotent quoted above; the distinction lies in the "multiple times" part. As listed above, when you apply the same template on top of an existing stack, or delete a stack and recreate it, you technically do not get exactly the same results, but from a practical standpoint this is completely understandable and often perfectly acceptable.
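To make the first three points concrete without touching a real account, here is a toy, in-memory model. `ToyStackService`, `StackExistsError` and `NoUpdatesError` are hypothetical stand-ins, not the boto3 API. It mimics the create/update/delete behaviour listed above, plus the kind of wrapper deploy scripts often use to get "apply" semantics: create if missing, update if changed, otherwise do nothing.

```python
import itertools


class StackExistsError(Exception):
    """Stand-in for the error returned when creating a stack that already exists."""


class NoUpdatesError(Exception):
    """Stand-in for the error returned when an update contains no changes."""


class ToyStackService:
    """Hypothetical in-memory model of create/update/delete semantics (not the real API)."""

    def __init__(self):
        self.stacks = {}
        self._ids = itertools.count(1)

    def create_stack(self, name, template):
        if name in self.stacks:
            raise StackExistsError(name)
        # Every creation mints a fresh physical ID, like a new security group ID.
        self.stacks[name] = {"template": template, "physical_id": f"sg-{next(self._ids):08x}"}

    def update_stack(self, name, template):
        if self.stacks[name]["template"] == template:
            raise NoUpdatesError(name)
        self.stacks[name]["template"] = template

    def delete_stack(self, name):
        del self.stacks[name]


def apply_stack(service, name, template):
    """Caller-side idempotency: repeating the call with the same template changes nothing."""
    if name not in service.stacks:
        service.create_stack(name, template)
        return "created"
    try:
        service.update_stack(name, template)
        return "updated"
    except NoUpdatesError:
        return "unchanged"
```

In a real script, the existence check would typically be a `describe_stacks` call, and `NoUpdatesError` would correspond to the "No updates are to be performed." validation error. Note that such a wrapper only hides the errors; it does not revert resources modified outside of CloudFormation (the fourth point above).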
I am not sure whether this answer will be useful, as the question was posted two years ago. Better late than never.
AWS CloudFormation has changed a lot in these two years. Right now, I can say for sure that its API calls are idempotent.
Have a look at these API calls:
CreateStack
UpdateStack
DeleteStack
You will find that there is an optional parameter called ClientRequestToken. This provides idempotency to the API calls. It is a token that the client provides to tell the CloudFormation service that it is not making a new API call. As long as you use the same token and keep the rest of the parameters the same, CloudFormation knows that you are only retrying the call.
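A minimal sketch of how a retry loop might use the token, assuming `client` is a boto3 CloudFormation client (`boto3.client("cloudformation")`), whose `create_stack` accepts `StackName`, `TemplateBody` and `ClientRequestToken` keyword arguments. The key point is that the token is generated once and reused on every retry. The `retryable` default (Python's built-in `ConnectionError`/`TimeoutError`) is purely an illustrative assumption; with boto3 you would pass the botocore exception types you consider transient.

```python
import uuid


def create_stack_with_retries(client, stack_name, template_body,
                              attempts=3, retryable=(ConnectionError, TimeoutError)):
    """Retry CreateStack while reusing one ClientRequestToken, so retries count as the same request."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    token = str(uuid.uuid4())  # generated once, before the first attempt
    last_error = None
    for _ in range(attempts):
        try:
            return client.create_stack(
                StackName=stack_name,
                TemplateBody=template_body,
                ClientRequestToken=token,  # same token on every attempt
            )
        except retryable as err:  # only errors assumed to be transient are retried
            last_error = err
    raise last_error
```

Generating a fresh token inside the loop would defeat the purpose: CloudFormation would then see each retry as a new request.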
CloudFormation is idempotent provided you have not made changes to an already completed stack. If there are changes, it will update the stack, and updating a resource might require deleting and re-creating it, or it might be updated in place without creating a new resource.
To learn more, read about the cfn-hup helper; it will help you.
I have a few things to clarify, specifically regarding modeling the architecture of a serverless application using AWS CDK.
I'm currently working on a serverless application developed using AWS CDK in TypeScript. As a convention, we also follow the rules below:
1. A stack should only have one table (DynamoDB).
2. A stack should only have one REST API (API Gateway).
3. A stack should not depend on any other stack (no cross-references), unless it's the Event-Stack (a stack dedicated to managing EventBridge operations).
The reason for this is so that each stack can be deployed independently without interference from other stacks. In a way, our stacks are equivalent to microservices in a microservice architecture.
At the moment all the REST APIs are public, and we have now decided to make them private by attaching custom Lambda authorizers to each API Gateway resource. In this custom Lambda authorizer, we have to perform certain operations (apart from token validation) in order to allow the user's request to proceed. Those operations are:
1. Get the user's role from the DB using the user ID in the token.
2. Get the user's subscription plan (paid, free, etc.) from the DB using the user ID in the token.
3. Get the user's current payment status (due, no due, fully paid, etc.) from the DB using the user ID in the token.
4. Get the scopes allowed for this user based on 1, 2, and 3.
5. Check whether the user can access this scope (the resource the user is currently requesting) based on 4.
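To make the five steps concrete, here is a minimal sketch of the authorizer's decision logic, written in Python for consistency with the other snippets on this page. The three lookup callables, the role-to-scope table and the "free plan or payment due means read-only" rule are all hypothetical; the lookups are exactly where the cross-stack table access problem described next shows up. The returned dictionary follows the `principalId`/`policyDocument` shape API Gateway expects from a Lambda authorizer, and `event["methodArn"]` is the ARN of the method being called.

```python
# Hypothetical role-to-scope table; a real system would load this from the roles/scopes data.
SCOPES_BY_ROLE = {"admin": {"reports:read", "users:write"}, "member": {"reports:read"}}


def allowed_scopes(role, plan, payment_status):
    """Step 4: derive the scopes from role, plan and payment status (illustrative rules only)."""
    scopes = set(SCOPES_BY_ROLE.get(role, set()))
    if plan == "free" or payment_status == "due":
        scopes = {s for s in scopes if s.endswith(":read")}  # assumption: read-only access
    return scopes


def build_policy(principal_id, effect, method_arn):
    """Authorizer response: an IAM policy allowing or denying execute-api:Invoke."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": method_arn}
            ],
        },
    }


def authorize(event, user_id, required_scope, lookups):
    """user_id and required_scope are assumed to come from an already validated token."""
    role = lookups["role"](user_id)              # step 1
    plan = lookups["plan"](user_id)              # step 2
    status = lookups["payment"](user_id)         # step 3
    scopes = allowed_scopes(role, plan, status)  # step 4
    effect = "Allow" if required_scope in scopes else "Deny"  # step 5
    return build_policy(user_id, effect, event["methodArn"])
```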
This authorizer Lambda function needs to be used by all the other stacks to make their APIs private. The problem is that roles, scopes, subscriptions, payments, and user data live in different stacks, each in its own dedicated DynamoDB table. Because of the rules I explained above (especially rule 3), we cannot depend on resources defined in other stacks. Hence we are unable to create the authorizer we want.
Solutions we could think of and their problems:
Since EventBridge isn't bi-directional we cannot use it to fetch data from a different stack resource.
We can invoke a Lambda in a different stack using its ARN and get the required data from its response, but AWS discourages this as a CDK anti-pattern.
We cannot use a technology like gRPC because it requires a continuously running server, which is outside the scope of a serverless architecture.
There was also a proposal to redesign the CDK layout of our application. The main feature of this layout is moving from no cross-references to a fully cross-referenced pattern (inspired by the layered architecture described in this AWS best practice).
Based on that article, we came up with a layout like this.
Presentation Layer
Stack for deploying the consumer web app
Stack for deploying admin portal web app
Application Layer
Stack for REST API definitions using API Gateway
Stack for Lambda functions running business-specific operations (Ex: CRUDs)
Stack for Lambda functions that run on event triggers
Stack for Authorisation (Custom Lambda authorizer(s))
Stack for Authentication implementation (Cognito user pool and client)
Stack for Events (EventBuses)
Stack for storage (S3)
Data Layer
Stack containing all the database definitions
There could be another stack for reporting, data engineering, etc.
As you can see, the stacks are now going to have multiple dependencies on other stacks' resources (but no circular dependencies, as shown in the attached image). While this pattern unblocks us from writing an effective custom Lambda authorizer, we are not sure whether it will become a problem in the long run as the application's scope grows.
I would highly appreciate any help you could give us to resolve this problem. Thanks!
Multiple options:
Use Parameter Store rather than CloudFormation exports
Split the stacks into a layered architecture like you described in your question, and import things between stacks using SSM Parameter Store as the other answer describes. This is the most obvious choice for breaking inter-stack dependencies. I use it all the time.
Use fixed resource names, which are easily referenceable and importable
Stack A creates the S3 bucket "myapp-users"; Stack B imports the bucket by its fixed name using Bucket.fromBucketName(this, 'Users', 'myapp-users'). Fixed resource names have their own downsides, so this should be used only for resources that are genuinely shared between stacks; for example, they prevent easy replacement of the resource. Also, you need to enforce the correct stack deployment order yourself: CDK will no longer help you with that, since there are no cross-stack dependencies for it to enforce.
Combine the app into a single stack
This sounds extreme and counterintuitive, but I have found that most real-life teams don't actually have a pressing need for multi-stack deployment. If your only concern is separating the code owners of different parts of the application, you can get away with splitting the stack into multiple Constructs composed into a single stack, where each team takes care of its Construct and that Construct's children. Think of it as combining multiple Git repos into a monorepo; a lot of projects are doing that.
A strategy I use to avoid hard cross-references involves storing shared resource values in AWS Systems Manager.
In the exporting stack, we can save the name of an S3 Bucket for instance:
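from aws_cdk import aws_ssm as ssm

# Exporting stack: publish the bucket name under a well-known parameter path
ssm.StringParameter(
    scope=self,
    id="ExampleBucketNameParameter",
    string_value=self.example_bucket.bucket_name,
    parameter_name="/example_stack/example_bucket_name",
)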
and then in the importing stack, retrieve the name and create an IBucket by using a .from_ method.
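from aws_cdk import aws_s3 as s3
from aws_cdk import aws_ssm as ssm

# Importing stack: the value is resolved from Parameter Store at deploy time
example_bucket_name = ssm.StringParameter.value_for_string_parameter(
    scope=self,
    parameter_name="/example_stack/example_bucket_name",
)
# Wrap the imported name as an IBucket without creating a cross-stack reference
example_bucket = s3.Bucket.from_bucket_name(
    scope=self,
    id="example_bucket_from_ssm",
    bucket_name=example_bucket_name,
)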
You'll have to figure out the right order to deploy your stacks but otherwise, I've found this to be a good strategy to avoid the issues encountered with stack dependencies.
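Because Parameter Store (or fixed names) hides the dependencies from CDK, one way to work out the deploy order is to write the dependency map down and let Python's standard-library `graphlib` compute it. This is a sketch: the stack names loosely follow the layered layout from the question, and the edges are assumptions. `static_order()` also raises `CycleError` if two stacks ever end up depending on each other.

```python
from graphlib import CycleError, TopologicalSorter

# Each (hypothetical) stack maps to the stacks whose parameters it reads.
DEPENDS_ON = {
    "DataStack": set(),
    "StorageStack": set(),
    "EventsStack": set(),
    "AuthnStack": set(),
    "AuthzStack": {"DataStack", "AuthnStack"},
    "BusinessLambdaStack": {"DataStack", "StorageStack", "EventsStack"},
    "EventLambdaStack": {"DataStack", "EventsStack"},
    "ApiStack": {"BusinessLambdaStack", "AuthzStack"},
    "WebAppStack": {"ApiStack"},
    "AdminPortalStack": {"ApiStack"},
}


def deploy_order(graph):
    """Return an order in which every stack comes after the stacks it reads from."""
    # Raises CycleError if the graph contains a circular dependency.
    return list(TopologicalSorter(graph).static_order())
```

The resulting list could then drive a deployment script, for example deploying the stacks one at a time in that order.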
I use CDK to deploy a Lambda function (along with an IAM role and a queue), plus monitoring resources for the Lambda function, its log group, and the queue. Right now I basically have two classes: one creates all the Lambda-related resources and the other creates the monitoring resources, and both are added to a single deployment stack.
Recently I deployed this to a new account and realized that my stack fails to create, because some of the monitoring resources look for the Lambda log group and can't find it, since it hasn't been created yet.
So what is the better option:
have two deployment groups: one for the Lambda-related resources and one for the monitoring resources
use dependencies to create some ordering in my stack.
Both seem like possible solutions, but which is the better long-term solution?
Assuming you mean a Stack for each of your two classes, you are better off making them both cdk.NestedStacks and instantiating them in a single common stack. You can then expose constructs as class attributes in one stack and pass them as parameters to the other. Of course, this only works one way; if you have to go both ways, you need to re-evaluate how your stacks are organized.
The advantage of doing this is great: exposing constructs as attributes is the best practice, as it gives you direct access to each construct before the CloudFormation data is generated for it. You have complete access to every part of that construct, from various ARNs (like DynamoDB stream ARNs, which are difficult to import) to the layer versions of Lambda layers, among many other things.
In addition, you never run into a stack dependency. If they are different top-level stacks and you share constructs between them, you can very easily run into lock situations where attempting to change something in one stack creates a dependency lock and prevents the stack from deploying.
The downside is that they are all part of the same deployment, so something may be updated when you didn't expect it to be. CDK does use the CloudFormation change set system, so it should not update things that have no changes applied to them (but sometimes changes occur because of the way CDK generates tokens and such, which you may not be aware of).
If you do not go this route, you are stuck using the various from* methods on CDK constructs to import the existing construct into your stack. This causes some issues, as they can't import everything about a given construct at synth time (layer versions and DynamoDB stream ARNs are two notable ones I mentioned already). Plus, you need to know the name of the construct, and best practices say you shouldn't deliberately name your constructs, so that you can easily spin up ad hoc versions of your app without naming issues.
I am looking at AWS Lambda to create a Python function that would process data.
I need to load a heavy model (a trained word2vec model) to run my script; it takes about 5 minutes on my computer, for example. But once it's loaded, the function executes very quickly.
If I use AWS Lambda, will this model load only once, or will it load each time I call my function?
Maybe.
AWS Lambda reuses containers. So, for your use case, the Lambda function will execute quickly if the invocation lands in an already-initialized container, and slowly otherwise. However, there is no way to predict which will happen.
Relevant documentation:
From here:
The first time a function executes after being created or having its code or resource configuration updated, a new container with the appropriate resources will be created to execute it, and the code for the function will be loaded into the container.
Let’s say your function finishes, and some time passes, then you call it again. Lambda may create a new container all over again, in which case the experience is just as described above. This will be the case for certain if you change your code. However, if you haven’t changed the code and not too much time has gone by, Lambda may reuse the previous container.
Remember, you can’t depend on a container being reused, since it’s Lambda’s prerogative to create a new one instead.
More official documentation here.
It will MAYBE (thanks Michael-sqlbot for the correction) load each time you invoke Lambda.
We can infer that AWS Lambda functions are stateless based on the following:
Lambda is stateless
"Lambda functions are 'stateless' with no affinity to the underlying infrastructure, so that Lambda can rapidly launch as many copies of the function as needed to scale to the rate of incoming events."
Lambda must be coded in stateless style
Your Lambda function code must be written in a stateless style, and have no affinity with the underlying compute infrastructure. Your code should expect local file system access, child processes, and similar artifacts to be limited to the lifetime of the request.
However, container reuse is possible in Lambda
If you haven’t changed the code and not too much time has gone by, Lambda may reuse the previous container
So, to answer your question: it is possible that you get the already-loaded model back, and the longer the gap between two Lambda invocations, the less likely that is. But you simply cannot rely on it.
I mistakenly gave my AWS Lambda function the wrong name, and now I want to change it. I found from the Stack Overflow question below that the best way to do that is to create a new function and copy the exact same code into it.
Is it possible to rename an AWS Lambda function?
I am thinking of doing that, but I am worried about data loss. My Lambda function currently has two SNS triggers from which it is constantly receiving data. So, if I stop this Lambda and create a new one, I would lose some data during that time. Also, if I start the new Lambda before deleting the previous one, some entries would end up in my datastore twice. Is there any way I could get this done?
As @John Rotenstein said, it is not possible to rename an AWS Lambda function. If you look at the CloudFormation documentation for the Lambda function resource (http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html), you will see that updating FunctionName requires replacement of the resource.
If you specify a name, you cannot perform updates that require replacement of this resource. You can perform updates that require no or some interruption. If you must replace the resource, specify a new name.
If you are working with more complex systems, as it seems due to your note of SNS triggers, I would highly encourage you to take a look at CloudFormation (https://aws.amazon.com/cloudformation/), which uses code to manage deployed services. This not only has the benefit of allowing easier updates, but also enables other fun things which are inherent with code, such as integration with a VCS.
As a data loss prevention strategy while you perform this migration, you can create a new Lambda and point it to a staging database, delete the old Lambda, repoint your new Lambda to your production database, and push updates from your staging database into your production database. Check out the import/export docs (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html) to see one method in which you might perform data migration.
There is no rename function for an AWS Lambda function.
You could instead try creating an alias to a Lambda function that would allow both names to function simultaneously. (This is normally used when different versions exist.)
I'm trying to deploy an API suite using API Gateway, implementing the code in Java with Lambda. Is it OK to have many (related, of course) Lambdas in a single jar (which is what I'm planning to do), or is it better to create a separate jar for each Lambda I want to deploy? (That would become a mess very easily.)
This is really a matter of taste but there are a few things you have to consider.
First of all there are limitations to how big a single Lambda upload can be (50MB at time of writing).
Second, there is also a limit on the total size of all the code that you upload (1.5GB at the time of writing).
These limitations may not be a problem for your use case but are good to be aware of.
The next thing you have to consider is where you want your overhead.
Let's say you deploy a CRUD interface to a single Lambda and you pass an "action" parameter from API Gateway so that you know which operation you want to perform when you execute the Lambda function.
This adds a slight overhead to your execution as you have to route the action to the appropriate operation. This is likely a very fast routing but nevertheless, it adds CPU cycles to your function execution.
On the other hand, deploying the same jar across several Lambda functions will quickly get you closer to the limits I mentioned earlier, and it also adds administrative overhead in managing your Lambda functions as their number grows. They can of course be managed via CloudFormation or CLI scripts, but that still adds administrative overhead.
I wouldn't say there is a right and a wrong way to do this. Look at what you are trying to do, think about what you would need to manage the deployment and take it from there. If you get it wrong you can always start over with another approach.
Personally, I like very small service Lambdas that do internal routing and handle more than a single operation, but are still very small and focused on a specific type of task, be it CRUD for a database table or a few very closely related operations.
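The "action" routing described above can be as small as a dictionary lookup. The question is about Java, but the pattern carries over directly; this sketch is in Python like the other snippets here. The operation names, the `action`/`payload` event fields (which would be set up via the API Gateway mapping), and the response shape are hypothetical.

```python
def create_item(payload):
    return {"created": payload}


def read_item(payload):
    return {"read": payload.get("id")}


def delete_item(payload):
    return {"deleted": payload.get("id")}


# One dictionary lookup per invocation is the routing overhead mentioned above.
ROUTES = {"create": create_item, "read": read_item, "delete": delete_item}


def crud_handler(event, context):
    """Single Lambda entry point that dispatches on a hypothetical 'action' field."""
    action = event.get("action")
    route = ROUTES.get(action)
    if route is None:
        return {"statusCode": 400, "body": f"unknown action: {action!r}"}
    return {"statusCode": 200, "body": route(event.get("payload", {}))}
```

Splitting into one function per operation would remove the lookup, but at the cost of the extra functions to manage discussed above.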
There's some nice advice on serverless.com
As polythene says, the answer is "it depends", but they've listed the pros and cons of four ways of going about it:
Microservices Pattern
Services Pattern
Monolithic Pattern
Graph Pattern
https://serverless.com/blog/serverless-architecture-code-patterns/