AWS throttling for CodeCommit - amazon-web-services

I am getting the error below when performing git operations on a CodeCommit repository. The number of operations is in the range of tens over a few minutes: adding, removing, and pulling files.
Is this because of AWS throttling or something else?
If so, what's the limit and how do I increase it in AWS?
"interim_desc": "RequestId: 12e27770db854bf0a6034cd6f851717d. 'git fetch origin --depth 20' returned with exit code 128.
error: RPC failed; HTTP 429 curl 22 The requested URL returned error: 429 Too Many Requests: The remote end hung up unexpectedly"

Here is the documentation on how to handle the 429 error when accessing CodeCommit:
Access error: “Rate Exceeded” or “429” message when connecting to a CodeCommit repository
https://docs.aws.amazon.com/codecommit/latest/userguide/troubleshooting-ae.html#troubleshooting-ae3
I'll copy the most notable parts here:
Implement jitter in requests, particularly in periodic polling requests.
If you have an application that is polling CodeCommit periodically and this application is running on multiple Amazon EC2 instances, introduce jitter (a random amount of delay) so that different Amazon EC2 instances do not poll at the same second. We recommend a random number from 0 to 59 seconds to evenly distribute polling mechanisms across a one-minute timeframe.
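The jitter recommendation above can be sketched as follows; `polling_schedule` is a hypothetical helper (not from the AWS docs), and the 0-59 second range comes straight from the quoted guidance:

```python
import random

def polling_schedule(base_interval=60, jitter_range=60):
    """Return (initial_offset, interval) for a polling loop.

    Each EC2 instance computes its own random startup offset in
    [0, jitter_range) seconds, then polls every base_interval seconds,
    so instances spread evenly across the minute instead of all
    hitting CodeCommit in the same second.
    """
    return random.randrange(jitter_range), base_interval
```

At startup, each instance would sleep for the returned offset once before entering its fixed-interval polling loop.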
......
Request a CodeCommit service quota increase in the AWS Support Center.
To receive a service limit increase, you must confirm that you have already followed the suggestions offered here, including implementation of error retries or exponential backoff methods. In your request, you must also provide the AWS Region, AWS account, and timeframe affected by the throttling issues.

Related

Rate limiting / scheduling AWS Cognito operations to avoid TooManyRequestsException

AWS Cognito UserUpdate-related operations have a quota of 25 requests per second (a hard limit which can't be increased).
I have a Lambda function which gets 1,000 simultaneous requests and is responsible for calling Cognito's AdminUpdateUserAttributes operation. As a result, some requests succeed and some fail due to TooManyRequestsException.
It is important to note that these 1,000 requests happen once a day, in the morning; there are no requests at all during the rest of the day.
Our stack is completely serverless and managed by CloudFormation (with the Serverless Framework), and we tend to avoid using EC2 if possible.
What is the best way to handle these daily 1,000 requests so that they are processed as soon as I receive them, while avoiding failures due to TooManyRequestsException?
A solution I tried:
A Lambda that receives the requests and sends them to an SQS queue, plus another Lambda with a reserved concurrency of 1 that is triggered by events from the queue and calls Cognito's AdminUpdateUserAttributes operation.
This solution partially worked: I no longer get TooManyRequestsException, but it looks like some of the messages got lost along the way (I think SQS got throttled).
Thanks!
AWS recommends exponential backoff with jitter for any API operations that are rate-limited or produce retryable failures.
Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
Are you sure SQS got throttled?
Another option is to increase the number of retries for failed Lambda invocations.
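The exponential-backoff-with-jitter pattern mentioned above can be sketched like this; `call_with_backoff` is a hypothetical wrapper, and `fn` would be a closure around the actual boto3 call (e.g. `admin_update_user_attributes`), which is not shown here:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base=0.5, cap=30.0,
                      retryable=(Exception,)):
    """Retry fn() with exponential backoff and full jitter.

    fn is any zero-argument callable, e.g. a closure around a boto3
    cognito-idp admin_update_user_attributes call. On a retryable
    exception, sleep a random amount up to base * 2**attempt (capped),
    which is the "full jitter" variant AWS recommends.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # all attempts exhausted, surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Combined with the SQS consumer Lambda, this keeps each Cognito call under the 25 requests-per-second quota by spreading retries over time instead of failing outright.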

AWS lambda to Confluent Cloud latency issues

I am currently using the Basic cluster on Confluent Cloud and I only have one topic with 9 partitions. I have a REST API set up using AWS Lambda which publishes messages to Kafka.
Currently I am stress testing the pipeline with 5k-10k requests per second, and I found that latency shoots up to 20-30 seconds to publish a 1 KB record, versus roughly 300 ms for a single request.
I added producer configurations such as linger.ms = 500 ms and batch.size = 100 KB. I see some improvement (15-20 seconds per request), but I feel it's still too high.
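For reference, a minimal sketch of those producer settings using confluent-kafka-python (the broker address is a placeholder, and a real Confluent Cloud config would also need SASL credentials):

```python
from confluent_kafka import Producer  # assumes confluent-kafka is installed

# linger.ms and batch.size match the tuning described above;
# "<broker>" is a placeholder for the Confluent Cloud bootstrap endpoint.
producer = Producer({
    "bootstrap.servers": "<broker>",
    "linger.ms": 500,       # wait up to 500 ms to fill a batch
    "batch.size": 102400,   # ~100 KB batches
})
```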
Is there anything I am missing, or is it something with the Basic cluster on Confluent Cloud? All of the cluster configurations were default.
Identified that the issue is with the API requests getting throttled. As mentioned by Chris Chen, the average time shoots up due to the AWS SDK's exponential back-off strategy. I have requested an increase in concurrent executions from AWS; I am confident it will solve the issue.

Getting ThrottlingException: Rate exceeded, status code: 400 on AWS API

I have 110 Fargate tasks running (not always in parallel). I am trying to list Lambda functions from other AWS accounts (through a CrossAccountRole) using the "ListFunctions" call, as described in the AWS SDK docs - https://docs.aws.amazon.com/sdk-for-go/api/service/lambda/#Lambda.ListFunctions
I sometimes get a throttling error while making the (SDK) API call:
ThrottlingException: Rate exceeded, status code: 400
Also have gone through this solution - https://docs.aws.amazon.com/general/latest/gr/api-retries.html
I wanted to understand whether the AWS SDK service client (Lambda) already implements this. Do I need a custom retry implementation in my case, or do I just increase the rate limit of Fargate?
Regarding the docs you posted:
In addition to simple retries, each AWS SDK implements an exponential backoff algorithm for better flow control.
This is further clarified in the AWS blog:
Note: Each AWS SDK implements automatic retry logic and exponential backoff algorithms.
Do I need a custom implementation of retries in my case or just increase the rate limit of Fargate?
If you use an AWS SDK, you don't have to implement anything special. However, your exception can be related to AWS Lambda function scaling:
When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
Thus, if you think you are hitting your Lambda concurrency limits, which could be due to the large number of Fargate tasks, consider asking AWS support for an increase. The default limit is 1,000, which seems sufficient for your tasks, but maybe the other account is also running other Lambdas: the 1,000 limit applies to all functions in an account and Region.
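The question uses the Go SDK, but as an illustration of tuning the SDK's built-in retries, here is the equivalent knob in boto3; the values are illustrative, and `adaptive` mode adds client-side rate limiting on top of the exponential backoff:

```python
import boto3
from botocore.config import Config

# Raise max_attempts and enable adaptive client-side rate limiting;
# the SDK then backs off automatically on ThrottlingException.
cfg = Config(retries={"max_attempts": 10, "mode": "adaptive"})
client = boto3.client("lambda", config=cfg)

# A paginator handles ListFunctions pagination; each page request
# is retried by the SDK as configured above.
# for page in client.get_paginator("list_functions").paginate():
#     ...
```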

AWS CloudFormation Rate Exceeded

I am running a multi-branch pipeline in Jenkins for CI/CD that deploys a CloudFormation stack to my AWS account. Occasionally, when multiple developers push to their branches at the same time, I receive this error on one or more branches:
com.amazonaws.services.cloudformation.model.AmazonCloudFormationException:
Rate exceeded (Service: AmazonCloudFormation; Status Code: 400; Error
Code: Throttling;
This seems to be a rate limit that Amazon has imposed on the number of requests to CloudFormation within a specified time frame.
What is the request limit of CloudFormation, and can I request a limit increase?
No - not the number of deployment requests to the CloudFormation API themselves.
Most likely the issue is that the Jenkins pipeline polls for updates every few seconds to get the current stack status, and when you are deploying multiple stacks at once you hit this error.
This is probably a bug in the CloudFormation plugin for Jenkins; you'll need to raise a ticket and ask them to implement a backoff so that, if the stack is taking longer than expected, the plugin doesn't keep requesting the stack status as often.
You could also change your Jenkinsfiles to use the aws-cli, which does a better job of managing requests to AWS during CloudFormation updates.
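The backoff this answer asks for might look like the following sketch; `get_status` is a stand-in for a DescribeStacks call, and the intervals are illustrative:

```python
import random
import time

def wait_for_stack(get_status, timeout=1800, base=5, cap=60):
    """Poll a stack-status callable with exponentially increasing,
    jittered intervals instead of a fixed few-second loop.

    get_status is a placeholder for a call that returns the stack
    status string (e.g. from DescribeStacks). Returns the terminal
    status, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = base
    while time.monotonic() < deadline:
        status = get_status()
        if not status.endswith("_IN_PROGRESS"):
            return status
        time.sleep(random.uniform(0, delay))  # jittered wait
        delay = min(cap, delay * 2)           # exponential growth, capped
    raise TimeoutError("stack did not stabilize in time")
```

The aws-cli's `aws cloudformation wait stack-update-complete` command implements similar rate-aware polling for you.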

WSO2 API Manager 1.10 Throttling tier limits are not performing properly in a gateway cluster setup

We are using wso2am-1.10.0 with a gateway cluster. The throttling tier limits are not behaving as expected.
We have redefined the “Bronze” tier to allow 3 requests per minute. The application is set to “Large”, the subscription to the API is set to “Bronze”, and the resource is set to “Unlimited”. We understand that “Bronze” is the most restrictive tier in this case.
We have tested the API with and without a load-balancer:
With load-balancer:
We are always allowed to exceed 3 requests per minute, after which the throttling behaviour becomes inconsistent, although the allowed count does not seem to be a multiple of the number of gateway workers.
Without load-balancer:
We call our API directly through one gateway worker of the cluster; that worker allows 4 requests (app-tier limit + 1) before returning a quota failure. Then, if we call the API within the same minute through another gateway worker, that worker allows one more request before failing because of the quota limit.
We have tested the API without clustering the gateways and it works as expected.
Any ideas?
Thanks in advance!