I am currently using a Basic cluster on Confluent Cloud, and I only have one topic with 9 partitions. I have a REST API set up using AWS Lambda, which publishes messages to Kafka.
I am stress testing the pipeline with 5k-10k requests per second, and I found that latency shoots up to 20-30 seconds to publish a record of size 1 KB, which normally takes around 300 ms for a single request.
I added producer configurations such as linger.ms = 500 ms and batch.size = 100 KB. I see some improvement (15-20 seconds per request), but that still feels far too high.
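For reference, the producer setup looks roughly like the sketch below, assuming the Python confluent-kafka client; the endpoint, credentials, topic name, and event shape are placeholders, not the actual code.

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<BOOTSTRAP_ENDPOINT>",   # Confluent Cloud bootstrap server (placeholder)
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<API_KEY>",                  # placeholder credentials
    "sasl.password": "<API_SECRET>",
    "linger.ms": 500,           # wait up to 500 ms to fill a batch
    "batch.size": 100_000,      # ~100 KB batches, as described above
    "compression.type": "lz4",  # fewer bytes per batch on the wire
    "acks": "all",
})

def handler(event, context):
    # Producer is created at module scope so warm invocations reuse the same
    # connection and the batching settings actually get a chance to kick in.
    payload = event.get("body", "").encode("utf-8")  # assumes an API Gateway proxy event
    producer.produce("<TOPIC>", value=payload)
    producer.flush(10)  # block until delivery (or a 10 s timeout)
    return {"statusCode": 200}

Note that linger.ms and batch.size only help if the producer instance is reused across invocations; a producer created per request never gets the chance to fill a batch.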
Is there anything that I am missing, or is it something with the Basic cluster on Confluent Cloud? All of the configurations on the cluster were left at their defaults.
I identified that the issue is with the API requests getting throttled. As mentioned by Chris Chen, the AWS SDK's exponential back-off strategy is what pushes the average time up. I have requested an increase in concurrent executions from AWS and am fairly sure that will solve the issue.
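For what it's worth, if the throttled calls go through boto3, the SDK's automatic retries can be capped so a throttled request surfaces quickly instead of spending many seconds in backoff; the client and values below are only illustrative, not the setup from the question.

import boto3
from botocore.config import Config

# Cap automatic retries so a throttled call fails fast instead of inflating
# the measured latency with exponential backoff. Values are illustrative.
config = Config(retries={"max_attempts": 2, "mode": "standard"})
lambda_client = boto3.client("lambda", config=config)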
Related
I am new to AWS and am having some difficulty tracking down and resolving some latency we are seeing on our API. I'm looking for help diagnosing and fixing the issue.
Here is what we are seeing:
If an endpoint hasn't been hit recently, then on the first request we see a 1.5-2s delay marked as "Initialization" in the CloudWatch Trace.
I do not believe this is a cold start, because each endpoint is configured to have 1 provisioned concurrency, so we should not get a cold start unless there are 2 simultaneous requests. Also, the billed duration includes this initialization period.
A cold start means that when your first request hits AWS Lambda, a container has to be prepared to run your code; this takes some time, so that request is delayed.
When a second request hits the Lambda and the container is already up and running, it will be processed quickly.
https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/
This is the default cold start behavior, but since you said you are using provisioned concurrency, that shouldn't happen.
Provisioned concurrency takes some time to activate in the account. You can follow these steps to verify whether an invocation used on-demand or provisioned concurrency.
AWS Lambda sets an environment variable called AWS_LAMBDA_INITIALIZATION_TYPE that contains either on-demand or provisioned-concurrency.
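A minimal way to check this from inside the function, assuming the Python runtime (the handler name is illustrative):

import os

def handler(event, context):
    # "on-demand" vs "provisioned-concurrency" tells you which path served
    # this particular execution environment.
    init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "unknown")
    print(f"initialization type: {init_type}")
    return {"initializationType": init_type}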
I have 110 Fargate tasks running (not always in parallel). I am trying to list Lambda functions from other AWS accounts (through a CrossAccountRole) using the "ListFunctions" call, as described in the AWS SDK docs - https://docs.aws.amazon.com/sdk-for-go/api/service/lambda/#Lambda.ListFunctions
I sometimes get a throttling error while making the (SDK) API call:
ThrottlingException: Rate exceeded, status code: 400
I have also gone through this solution - https://docs.aws.amazon.com/general/latest/gr/api-retries.html
I want to understand whether the AWS SDK's Lambda service client already implements it. Do I need a custom implementation of retries in my case, or do I just need to increase the rate limit for Fargate?
From the docs you posted:
In addition to simple retries, each AWS SDK implements exponential backoff algorithm for better flow control.
This is further clarified in the AWS blog:
Note: Each AWS SDK implements automatic retry logic and exponential backoff algorithms.
Do I need a custom implementation of retries in my case or just increase the rate limit of Fargate?
If you use an AWS SDK, you don't have to implement anything special. However, your exception can be related to AWS Lambda function scaling:
When requests come in faster than your function can scale, or when your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
Thus, if you think you are hitting your concurrency limits on Lambda, which could be due to the large number of Fargate tasks, you can consider asking AWS Support for an increase. The default limit is 1000, which seems sufficient for your tasks, but maybe the other account is also running other Lambdas: the 1000 limit is shared by all functions in an account and region.
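As a sketch of what the built-in retry handling looks like (shown here in Python/boto3 rather than the Go SDK from the question; the role ARN is a placeholder), you can turn on adaptive client-side rate limiting and paginate ListFunctions instead of implementing retries yourself:

import boto3
from botocore.config import Config

# Assume the cross-account role first (ARN is a placeholder).
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::<OTHER_ACCOUNT_ID>:role/CrossAccountRole",
    RoleSessionName="list-functions",
)["Credentials"]

# "adaptive" mode adds client-side rate limiting on top of the usual
# exponential backoff, which helps when many tasks hit the same API.
lambda_client = boto3.client(
    "lambda",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

functions = []
for page in lambda_client.get_paginator("list_functions").paginate():
    functions.extend(page["Functions"])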
I have a Cloud Function that makes Vision API requests, specifically Document Text Detection requests. My peak request rate is usually around ~120-150 requests per minute on an average day.
I've suddenly started getting resource-quota-exceeded errors for Vision API requests, with a request rate of 2,500 requests per minute. Some things to note:
I've had no code changes in 3 months
I deleted and redeployed the Cloud Function making these requests to stop any problematic image that was causing a runaway loop
Neither my code calling the API nor the Cloud Functions themselves were being retried, so there really wasn't a way my request rate could have jumped like that overnight with no changes introduced.
The service account making the Vision calls is making the normal number of requests and is only used by the Cloud Function, i.e. it is not being used by someone's local script.
I've since turned on retries to mitigate the issue, since it will "work" with exponential back-off, but this is expensive, especially with the Vision API. Is there anything I can do to find the root cause of this issue?
To identify the specific quota being exceeded, the Stackdriver (Cloud Monitoring) API helps by exposing quota metrics, as explained here.
GCP lets you inspect which quota is being exceeded in greater depth using the Stackdriver API and UI, with quota metrics appearing in the Metrics Explorer.
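A rough sketch of pulling those quota metrics with the Cloud Monitoring client library; the project ID is a placeholder, and the exact metric and resource label names are best confirmed in Metrics Explorer first.

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# Look at the last hour of quota-exceeded data points for the Vision API.
series = client.list_time_series(
    request={
        "name": "projects/<PROJECT_ID>",
        "filter": (
            'metric.type = "serviceruntime.googleapis.com/quota/exceeded" '
            'AND resource.labels.service = "vision.googleapis.com"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    print(ts.metric.labels, len(ts.points))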
I just started using AWS services. I want to receive notifications if any service usage exceeds a limit. After searching for options, I found that this can be achieved using a CloudWatch alarm and the AWS Limit Monitor deployed through AWS CloudFormation. My question is: will I be charged if I use these services to receive notifications?
Yes, you can set up all kinds of notifications to keep a handle on what you are being billed, but that doesn't stop you from actually being billed if you exceed your limits.
For example, I have alerts that notify me when I reach 25%, 50%, 75%, and 100% of my typical monthly spend, so I should get roughly one notification each week. But a lot can happen between when the notification is sent and when you take action - especially if, for example, someone got access to your account and started crypto-mining on some big EC2 instances.
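As a sketch, one of those spend alerts can be created as a CloudWatch alarm on the EstimatedCharges billing metric; billing metrics live in us-east-1, billing alerts must be enabled on the account first, and the SNS topic ARN and threshold below are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
cloudwatch.put_metric_alarm(
    AlarmName="monthly-spend-75-percent",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                  # the billing metric only updates a few times a day
    EvaluationPeriods=1,
    Threshold=75.0,                # e.g. 75% of a $100 typical monthly spend
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:<ACCOUNT_ID>:billing-alerts"],
)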
In the DynamoDB documentation and in many places around the internet I've seen that single-digit millisecond response times are typical, but I cannot seem to achieve that even with the simplest setup. I have configured a t2.micro EC2 instance and a DynamoDB table, both in us-west-2, and when running the command below from the AWS CLI on the EC2 instance I get responses averaging about 250 ms. The same command run from my local machine (Denver) averages about 700 ms.
aws dynamodb get-item --table-name my-table --key file://key.json
When looking at the CloudWatch metrics in the AWS console it says the average get latency is 12 ms though. If anyone could tell me what I'm doing wrong or point me in the direction of information where I can solve this on my own I would really appreciate it. Thanks in advance.
The response times you are seeing are largely due to the "cold start" of the AWS CLI itself. When running your get-item command, the CLI has to be loaded into memory, fetch temporary credentials (if using an EC2 IAM role on your t2.micro instance), and establish a secure connection to the DynamoDB service. Only after all of that is completed does it execute the get-item request and finally print the results to stdout. Your command also has to read the key.json file off the filesystem, which adds further overhead.
My experience running on a t2.micro instance is that the AWS CLI has around 200 ms of startup overhead, which is in line with what you are seeing.
This will not be an issue with long-running programs, as they only pay a similar overhead price once, at startup. I run a number of web services on t2.micro instances that work with DynamoDB, and the DynamoDB response times are consistently sub-20 ms.
There are a lot of factors that go into the latency you will see when making a REST API call. DynamoDB can provide latencies in the single digit milliseconds but there are some caveats and things you can do to minimize the latency.
The first thing to consider is distance and speed of light. Expect to get the best latency when accessing DynamoDB when you are using an EC2 instance located in the same region. It is normal to see higher latencies when accessing DynamoDB from your laptop or another data center. Note that each region also has multiple data centers.
There are also performance costs from the client side based on the hardware, network connection, and programming language that you are using. When you are talking millisecond latencies the processing time on your machine can make a difference.
Another likely source of the latency is the TLS handshake. Establishing an encrypted connection requires multiple round trips and computation on both sides to get the encrypted channel established. However, as long as you are using Keep-Alive for the connection, you will only pay this overhead for the first query. Successive queries will be substantially faster since they do not incur this initial penalty. Unfortunately the AWS CLI isn't going to keep the connection alive between requests, but the AWS SDKs for most languages will manage this for you automatically.
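A quick way to see this is to time repeated requests from a long-lived SDK client; here is a sketch with boto3, where the table and key names mirror the CLI example and are placeholders.

import time
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-west-2")
key = {"id": {"S": "some-id"}}  # placeholder key matching your table's schema

for i in range(5):
    start = time.perf_counter()
    # First call pays for credential lookup and the TLS handshake;
    # later calls reuse the kept-alive connection and run much faster.
    dynamodb.get_item(TableName="my-table", Key=key)
    print(f"request {i}: {(time.perf_counter() - start) * 1000:.1f} ms")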
Another important consideration is that the latency that DynamoDB reports in the web console is the average. While DynamoDB does provide reliable average low double digit latency, the maximum latency will regularly be in the hundreds of milliseconds or even higher. This is visible by viewing the maximum latency in CloudWatch.
AWS recently announced DAX (Preview).
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. For more information, see In-Memory Acceleration with DAX (Preview).