AWS CloudFormation Rate Exceeded

I am running a multi-branch pipeline in Jenkins for CI/CD that deploys a CloudFormation stack to my AWS account. Occasionally, when multiple developers push to their branches at the same time, I receive this error on one or more branches:
com.amazonaws.services.cloudformation.model.AmazonCloudFormationException:
Rate exceeded (Service: AmazonCloudFormation; Status Code: 400; Error
Code: Throttling;
This seems to be a rate limit that Amazon has imposed on the number of requests to CloudFormation within a specified time frame.
What is the request limit of CloudFormation, and can I request a limit increase?

No - not for requests to the CloudFormation API.
Most likely the issue is that the Jenkins pipeline polls for updates every few seconds to get the current stack status, and when you are deploying multiple stacks at once that polling hits the limit and triggers this error.
This is probably a shortcoming of the CloudFormation plugin in Jenkins; you'll need to raise a ticket and ask them to implement a backoff of requests when the stack is taking longer than expected, so that the plugin doesn't keep requesting the status of the stack so often.
You could also change your Jenkinsfiles to use the AWS CLI, which does a better job of managing requests to AWS during CloudFormation updates.
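For illustration, this is the kind of backed-off status polling the answer is asking for: a minimal boto3 sketch with a placeholder stack name and a hypothetical helper function, not the plugin's actual code.

import time
import boto3
from botocore.exceptions import ClientError

cfn = boto3.client("cloudformation")

def wait_for_stack(stack_name, initial_delay=5, max_delay=60):
    """Poll the stack status, increasing the delay so repeated polls don't get throttled."""
    delay = initial_delay
    while True:
        try:
            stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
            status = stack["StackStatus"]
        except ClientError as err:
            if err.response["Error"]["Code"] != "Throttling":
                raise
            status = None                      # throttled: just wait longer and retry
        if status and not status.endswith("_IN_PROGRESS"):
            return status                      # e.g. CREATE_COMPLETE or ROLLBACK_FAILED
        time.sleep(delay)
        delay = min(delay * 2, max_delay)      # exponential backoff, capped at max_delay

print(wait_for_stack("my-stack"))              # placeholder stack name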

Related

How to delete too many Cloudformation stacks with status DELETE_COMPLETE

At my current site there is a very large number of cloudformation stacks in one account.
If we make an AWS CLI call to list all stacks, we get an error message saying the request has been dynamically throttled, and the request fails.
Following the AWS documentation's advice for avoiding dynamic throttling, I implemented a script to download the list in smaller chunks, using pagination and exponential delays.
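In outline, the script does something like this (a rough boto3 sketch, not the exact code):

import time
import boto3
from botocore.exceptions import ClientError

cfn = boto3.client("cloudformation")

def list_all_stacks(page_delay=1.0):
    """Page through ListStacks slowly, backing off further whenever a page is throttled."""
    stacks, token, delay = [], None, page_delay
    while True:
        try:
            kwargs = {"NextToken": token} if token else {}
            page = cfn.list_stacks(**kwargs)
        except ClientError as err:
            if err.response["Error"]["Code"] != "Throttling":
                raise
            delay *= 2                         # throttled: double the delay and retry the page
            time.sleep(delay)
            continue
        stacks.extend(page["StackSummaries"])
        token = page.get("NextToken")
        if not token:
            return stacks
        time.sleep(delay)                      # pause between pages to stay under the limit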
This succeeded, but if we could get rid of the many stacks in DELETE_COMPLETE status, that would remove around 800 records and the plain listing calls would complete successfully.
How can I remove AWS Cloudformation stacks that are in DELETE_COMPLETE status?
We are also seeing problems in the Cloudformation console with the simplest operations timing out due to the large number of stacks. A request has been raised with AWS for this. The console is useful for development and debugging although all our deployments are automated.
I found an old forum post saying these stacks will auto-delete after 90 days, but we have 800+ of these, some much older than that, and they are still there.
If I delete one of the stacks with a CLI call, like this:
aws cloudformation delete-stack --stack-name arn:aws:cloudformation:eu-west-1:123456789:stack/my-stack-name-here/87654321-1aaa-11aa-00a1-0aa1a0000000
The delete call terminates with no errors but the stack remains as it was.
I can see the call has executed in Cloudtrail.
It looks like the delete-stack operation does nothing if the status is already set to DELETE_COMPLETE.
We need to delete these stacks because there are about 800 of them and we have so many stacks that the console is giving us errors for the simplest tasks, like searching for a stack to edit it.
We did increase the quota size (max number of stacks) via an AWS request but the throttling kicks in when we try to list them all, because there are so many of them.
I found an old forum post saying they will auto-delete after 90 days
Deleted stack records expiring after a certain amount of time is currently the only way deleted stack records can be removed.
We have increased the quota size (max number of stacks) via an AWS request
The quota for the maximum number of stacks only applies to active stacks, so this is unrelated.
the throttling kicks in when we use the console for ordinary actions because there are so many of them
The console has a stack status dropdown next to the search bar that lets you filter by stack status.
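If you are listing from the CLI or an SDK rather than the console, the equivalent is the StackStatusFilter parameter of ListStacks. A minimal boto3 sketch that skips DELETE_COMPLETE records entirely:

import boto3

cfn = boto3.client("cloudformation")

# Every stack status except DELETE_COMPLETE, so deleted-stack records never appear.
ACTIVE_STATUSES = [
    "CREATE_IN_PROGRESS", "CREATE_FAILED", "CREATE_COMPLETE",
    "ROLLBACK_IN_PROGRESS", "ROLLBACK_FAILED", "ROLLBACK_COMPLETE",
    "DELETE_IN_PROGRESS", "DELETE_FAILED",
    "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE_CLEANUP_IN_PROGRESS", "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_IN_PROGRESS", "UPDATE_ROLLBACK_FAILED",
    "UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS", "UPDATE_ROLLBACK_COMPLETE",
    "REVIEW_IN_PROGRESS",
]

paginator = cfn.get_paginator("list_stacks")
for page in paginator.paginate(StackStatusFilter=ACTIVE_STATUSES):
    for stack in page["StackSummaries"]:
        print(stack["StackName"], stack["StackStatus"])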

How to manually rollback CloudFormation deployment of Lambda functions?

In my CodePipeline, I am creating a CloudFormation ChangeSet and then executing it to deploy Lambda functions. It doesn't seem like CloudFormation saves the old ChangeSets so that I can revert to an old version. Am I wrong?
CloudFormation does automatically rollback when it fails to create/execute the ChangeSet due to IAM permission issues and such but I want the ability to manually rollback in case I deploy a buggy function.
You could use rollback triggers in AWS CloudFormation to detect failed tests in your code, via Amazon CloudWatch metrics and alarms, and perform an automated rollback.
Your application code would need to be modified to perform the tests upon deployment, and then write the metric values into Amazon CloudWatch.
There are a couple of limits you'll want to be aware of:
Maximum of five (5) rollback triggers per CloudFormation stack
Monitoring time: 0 - 180 minutes (3 hours)
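As a hedged illustration (the stack name, template handling, and alarm ARN below are hypothetical), a rollback configuration can be attached to a stack update with boto3 like this:

import boto3

cfn = boto3.client("cloudformation")

# Hypothetical ARN: an existing CloudWatch alarm that fires when the deployed
# Lambda reports failing post-deployment tests.
ALARM_ARN = "arn:aws:cloudwatch:eu-west-1:123456789012:alarm:post-deploy-tests-failed"

cfn.update_stack(
    StackName="my-lambda-stack",              # hypothetical stack name
    UsePreviousTemplate=True,
    Capabilities=["CAPABILITY_IAM"],
    RollbackConfiguration={
        # Up to five triggers per stack; any alarm going to ALARM rolls the update back.
        "RollbackTriggers": [
            {"Arn": ALARM_ARN, "Type": "AWS::CloudWatch::Alarm"},
        ],
        # Keep monitoring for 30 minutes after the update completes (0-180 allowed).
        "MonitoringTimeInMinutes": 30,
    },
)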

AWS throttling for Code Commit

I am getting the error below when I do git operations on a CodeCommit repository. The number of operations is in the range of tens over a few minutes: adding, removing, and pulling files.
Is this because of AWS throttling or something else?
If so, what's the limit and how do I increase it in AWS?
"interim_desc": "RequestId: 12e27770db854bf0a6034cd6f851717d. 'git fetch origin --depth 20' returned with exit code 128.
error: RPC failed; HTTP 429 curl 22 The requested URL returned error: 429 Too Many Requests: The remote end hung up unexpectedly'"
Here is the manual on how to handle a 429 error while accessing CodeCommit:
Access error: “Rate Exceeded” or “429” message when connecting to a CodeCommit repository
https://docs.aws.amazon.com/codecommit/latest/userguide/troubleshooting-ae.html#troubleshooting-ae3
I will copy here the most notable parts:
Implement jitter in requests, particularly in periodic polling requests.
If you have an application that is polling CodeCommit periodically and this application is running on multiple Amazon EC2 instances, introduce jitter (a random amount of delay) so that different Amazon EC2 instances do not poll at the same second. We recommend a random number from 0 to 59 seconds to evenly distribute polling mechanisms across a one-minute timeframe.
......
Request a CodeCommit service quota increase in the AWS Support Center.
To receive a service limit increase, you must confirm that you have already followed the suggestions offered here, including implementation of error retries or exponential backoff methods. In your request, you must also provide the AWS Region, AWS account, and timeframe affected by the throttling issues.
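For example, a polling job could add the recommended jitter like this. This is a minimal sketch that assumes it runs inside an existing clone of the repository and reuses the same fetch command shown in the error above.

import random
import subprocess
import time

# Poll roughly once a minute, but start each cycle with 0-59 seconds of jitter so
# many instances do not all hit CodeCommit in the same second.
while True:
    time.sleep(random.randint(0, 59))                  # jitter, as the AWS docs recommend
    subprocess.run(["git", "fetch", "origin", "--depth", "20"], check=False)
    time.sleep(60)                                     # base one-minute polling interval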

AWS Lambda using API Gateway error message

Everything was working yesterday, and I'm still just testing, so my request volume shouldn't be high to begin with, but I keep receiving these errors today:
{
    Message = "We currently do not have sufficient capacity in the region you requested. Our system will be working on provisioning additional capacity. You can avoid getting this error by temporarily reducing your request rate.";
    Type = Service;
}
What does this error message mean, and should I be concerned that something like this could happen when I go into production? This is a serious error for me because my users must log in through calls to API Gateway (backed by AWS Lambda).
This kind of error should not last long, as it immediately triggers AWS to provision additional capacity.
If you are concerned about your API Gateway availability, consider creating redundant Lambda functions in other regions and switching over whenever this error occurs. However, calling Lambda in a remote region can introduce higher latency.
Another suggestion: review the AWS limits for the API Gateway and Lambda services in your account. If your requests exceed the limits, raise a ticket with AWS to increase them.
Amazon API Gateway limits:
Resource                        Default limit
Maximum APIs per AWS account    60
Maximum resources per API       300
Maximum stages per API          10
Increasing these limits is a free service in AWS.
Refer: Amazon API Gateway Limits
AWS Lambda posted an event on the service health dashboard, so please follow this for further details on that specific issue.
Unfortunately, if you want to return a custom code when Lambda errors in this way you would have to write a mapping template and attach it to every integration response where you used a Lambda integration.
We recognize that this is suboptimal and is work most customers would prefer API Gateway just handle for them. With that in mind, we already have a high priority item on our backlog to make it easier to pass through the status codes from the Lambda integration. I cannot, however, commit to a timeframe as to when this would be available.

Error while submitting aws emr job from command line

I am getting the below error from AWS EMR. I submitted the job from the CLI, and the job status is pending.
A client error (ThrottlingException) occurred when calling the ListSteps operation: Rate exceeded
How can I see all the active jobs in an EMR cluster, and how can I kill them from the CLI and also from the AWS console?
Regards
sanjeeb
AWS APIs are rate limited. According to the AWS docs, the recommended approach to dealing with a throttling response is to implement exponential backoff in your retry logic: when you get a ThrottlingException, catch it, sleep for some time (say, half a second), and then retry, increasing the delay each time the call is throttled again.
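A minimal sketch of that pattern for the ListSteps call, assuming boto3 and a placeholder cluster id:

import time
import boto3
from botocore.exceptions import ClientError

emr = boto3.client("emr")

def list_steps_with_backoff(cluster_id, max_retries=6):
    """Retry ListSteps with exponential backoff when EMR returns ThrottlingException."""
    delay = 0.5                                       # start with half a second
    for attempt in range(max_retries):
        try:
            return emr.list_steps(ClusterId=cluster_id,
                                  StepStates=["PENDING", "RUNNING"])
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise                                 # not a throttling problem
            time.sleep(delay)
            delay *= 2                                # double the wait on each retry
    raise RuntimeError("ListSteps still throttled after retries")

steps = list_steps_with_backoff("j-XXXXXXXXXXXXX")    # placeholder cluster id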