Has anyone ever had to exceed 1000 concurrent executions in lambda? - amazon-web-services

I'm currently using ~500 concurrent executions, and this tends to reach 5000 easily. Is this a long-term problem, or is it relatively easy to make a quota increase request to AWS?

Getting quota increases is not difficult, but it’s also not instantaneous. In some cases the support person will ask for more information on why you need the increase (often to be sure you aren’t going too far afoul of best practices), which can slow things down. Different support levels have different response times too. So if you are concerned about it you should get ahead of it and get the increase before you think you’ll need it.

To request an increase:
In the AWS Management Console, select Service Quotas
Click AWS Lambda
Select Concurrent executions
Click Request quota increase
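The same request can also be submitted programmatically. This is a minimal sketch that assumes boto3's Service Quotas client is passed in as `client`. It also assumes `L-B99A9384` is the quota code for Lambda "Concurrent executions", so check the code on the quota's page in the console before relying on it. The helper name is hypothetical.

```python
# Hypothetical helper for submitting a Lambda concurrency quota increase.
LAMBDA_SERVICE_CODE = "lambda"
CONCURRENT_EXECUTIONS_QUOTA_CODE = "L-B99A9384"  # assumed; verify in the console


def request_lambda_concurrency_increase(client, desired_value: int) -> dict:
    """Submit a quota increase request and return its RequestedQuota record.

    `client` is expected to be boto3.client("service-quotas"); it is injected
    so the helper can be exercised without AWS credentials.
    """
    if desired_value <= 0:
        raise ValueError("desired_value must be positive")
    response = client.request_service_quota_increase(
        ServiceCode=LAMBDA_SERVICE_CODE,
        QuotaCode=CONCURRENT_EXECUTIONS_QUOTA_CODE,
        DesiredValue=float(desired_value),  # the API takes a floating-point value
    )
    return response["RequestedQuota"]
```

Usage: request_lambda_concurrency_increase(boto3.client("service-quotas", region_name="us-east-1"), 5000). The concurrency quota is set per Region, so the request applies to the client's Region. As with the console, support may still ask for justification before approving it.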

Related

AWS X-Ray monthly trace threshold

I would like to set a monthly threshold on the number of traces collected by AWS X-Ray (mainly to avoid unexpected expenses).
It seems that sampling rules let us limit trace ingestion, but they use a one-second window.
https://docs.aws.amazon.com/xray/latest/devguide/xray-console-sampling.html
But limiting the number of traces per second might cause me to lose some important traces. The one-second window seems unreasonably narrow, and I would rather set the limit for a whole month.
Is there any way to achieve that?
If not, does anyone know the reason why AWS does not enable that?
(Update)
The answer by Lei Wang confirms that it is not possible and speculates about the possible reasons (see the post for details).
Interestingly, Log Analytics workspaces in Azure have this functionality, so it should not be impossible to add something similar to AWS X-Ray.
X-Ray currently supports two basic sampling behaviors:
ratio (sample a fixed percentage of requests)
limit (cap the number of sampled requests per second)
These two can be combined in an OR relationship to form a third behavior: reservoir + ratio. For example, a 1/s reservoir plus a 5% rate means: sample at least 1 trace per second, and if throughput exceeds 1 request per second, additionally sample 5% of the remaining requests.
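To make the reservoir + ratio rule concrete, here is a minimal local sketch. It assumes a single process and whole-second windows. The class name is hypothetical, and the sketch ignores any coordination the X-Ray SDKs do with the X-Ray service across instances. It samples the first `reservoir` requests in each second and then a fraction `rate` of the rest.

```python
import random


class ReservoirRateSampler:
    """Hypothetical local sampler: up to `reservoir` traces per one-second
    window, plus a fraction `rate` of the requests beyond that."""

    def __init__(self, reservoir: int, rate: float, rng=None):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng or random.Random()
        self._window = None  # the whole second currently being counted
        self._used = 0       # reservoir slots used in that second

    def should_sample(self, now: float) -> bool:
        second = int(now)
        if second != self._window:
            # A new one-second window starts: refill the reservoir.
            self._window, self._used = second, 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        # Reservoir exhausted for this second: fall back to the fixed rate.
        return self.rng.random() < self.rate
```

With reservoir=1 and rate=0.05, a burst of 1000 requests within one second yields 1 guaranteed trace plus roughly 5% of the other 999. This also shows why a monthly cap does not fit the model: all of the sampler's state is scoped to the current second.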
My guess is that X-Ray does not support other sampling behaviors, such as the per-month limit you mentioned, because they are technically hard to implement and it is not clear they are a common user requirement. X-Ray cannot guarantee that a customer won't restart the application within a month. Even if the user says the application will never restart, the X-Ray SDK would still need a communication mechanism to count the total traces across the fleet. So the only possible workaround is for the user's application to track how many traces have reached the X-Ray backend in total by querying it periodically.
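The workaround can be sketched as a per-process monthly budget. This assumes the application counts the traces it decides to sample, and the class is hypothetical. Because the count lives in memory, it resets on restart and is not shared across the fleet. Those are exactly the two gaps the answer points out. A real version would have to persist and aggregate the count, or periodically query X-Ray for it.

```python
from datetime import datetime


class MonthlyTraceBudget:
    """Hypothetical in-process guard that stops sampling once a monthly
    trace budget is spent. Not restart-safe and not fleet-wide."""

    def __init__(self, monthly_limit: int):
        self.monthly_limit = monthly_limit
        self._month = None  # (year, month) currently being counted
        self._count = 0     # traces allowed so far in that month

    def allow(self, now: datetime) -> bool:
        month = (now.year, now.month)
        if month != self._month:
            # New calendar month: reset the budget.
            self._month, self._count = month, 0
        if self._count >= self.monthly_limit:
            return False
        self._count += 1
        return True
```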

Is concurrency on Serverless (like Google Cloud Run) pointless?

As far as I can tell, by default on Google Cloud (and presumably elsewhere) each vCPU = 1 hyperthread (3rd paragraph in the intro). From my perspective, this would suggest that unless one changes this setting to 2 or 4 vCPUs, concurrency in the code running in the Docker image achieves nothing. Is there some multithreading knowledge I'm missing that means concurrency on a single hyperthread accomplishes something? Scaling up the vCPU count isn't very attractive, as the minimum memory setting is already forced to 2 GB for 4 vCPUs.
This question is framed based on the Google Cloud tech stack, but is meant to umbrella all providers.
Do Serverless solutions ever really benefit from concurrency?
EDIT:
The accepted answer is a great first look, but I realized my assumptions above ignored idle time and context switching. For example:
If we write a backend that talks to a database, a lot of our compute time may be spent idle, waiting for query results. Context switching to the next request in this case lets us use the CPU more efficiently.
Therefore, depending on the use case, even on a single-threaded vCPU a serverless app can benefit from concurrency.
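The point in the edit can be illustrated with a minimal Python sketch, assuming `asyncio.sleep` stands in for a database round trip (the function names are hypothetical). Ten simulated 100 ms queries run concurrently on a single thread, so they finish in roughly 0.1 s instead of the roughly 1 s that running them one after another would take.

```python
import asyncio
import time


async def fake_db_query(request_id: int, latency: float = 0.1) -> int:
    # Stand-in for a database call: the coroutine yields while "waiting",
    # so the single thread can start other requests in the meantime.
    await asyncio.sleep(latency)
    return request_id


async def handle_requests(n: int) -> list:
    # Run n simulated requests concurrently on one thread.
    return await asyncio.gather(*(fake_db_query(i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_requests(10))
    print(len(results), f"requests in {time.perf_counter() - start:.2f}s")
```

The gain comes entirely from overlapping waits. If each request were CPU-bound instead, a single vCPU would still process them one at a time.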
I wrote this. From my experience, YES, you can handle several threads in parallel, and your performance increases with the number of CPUs. However, you need a process that supports multithreading.
In the case of Cloud Run, each request can be processed in its own thread, so parallelization is easy.

How Could I Monitor Lambda Concurrent Executions on a Second-by-Second Basis (or Find a Better Solution to Limit Lambda ConcurrentExecutions)?

I am working on a massive distributed computing platform built on AWS Lambda. The platform is extremely spiky: most of the time ConcurrentExecutions stays below 50, but we can hit the maximum (currently 1000) for an hour or more when a large batch job hits the system (it is an event-driven system). This is a problem because our customer-facing APIs will lag terribly. Finally, I am not an architect, so I have minimal control over how the system was designed, but I have been asked to devise a clever solution for limiting concurrent executions.
I'm not new to AWS, so I know the standard ways to handle this problem. #1 is reserved concurrency on the user-facing Lambdas. I'm not allowed to use that for the sake of this exercise (though I'll go tell my boss that's what's necessary if it truly is). I'm thinking of a system where we designate high-priority functions (for the UI) and low-priority functions (for batch processing), and the low-priority functions check a value stored in DynamoDB, populated from CloudWatch, that holds the current number of ConcurrentExecutions. If a low-priority function finds that we are in danger of using all the ConcurrentExecutions, it posts to a queue with exponential backoff. This should all work, except that ConcurrentExecutions is only reported in one-minute increments, which is too slow, as many of our Lambdas run for around 500 ms.
So my questions are as follows:
Is there a way to set up a custom ConcurrentExecutions metric with second-by-second data points, and if so, how would you do it?
Is there a better way to implement a counter than CloudWatch?
Am I just missing something here, and does someone have a clever way to manage Lambda ConcurrentExecutions?
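On the counter question, one option is to have each invocation update a shared counter directly instead of polling a one-minute CloudWatch metric. The invocation increments the counter at start and decrements it at the end. Below is a minimal in-process sketch of that admission logic. The class and its `headroom` parameter are hypothetical. Across many Lambda instances the counter would have to live in an external store with an atomic conditional update, for example a DynamoDB conditional write, which this sketch does not show.

```python
import threading


class PriorityConcurrencyGate:
    """Hypothetical admission counter with headroom for high-priority work.

    Invocations call try_acquire() at start and release() at the end, so the
    in-flight count is always current. Low-priority work may only use slots
    below `limit - headroom`; high-priority work may use all `limit` slots.
    """

    def __init__(self, limit: int, headroom: int):
        self.limit = limit
        self.headroom = headroom
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self, high_priority: bool) -> bool:
        ceiling = self.limit if high_priority else self.limit - self.headroom
        with self._lock:  # check-and-increment must be atomic
            if self.in_flight >= ceiling:
                return False  # caller should defer, e.g. re-queue with backoff
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.in_flight -= 1
```

There is one design caveat. An invocation that crashes or times out before calling release() leaks a slot, so a shared version would need some way to expire stale entries.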
I don't think it's necessary to create a monitoring or throttling solution at all. You would need to build, test, and maintain something in addition to your core solution. Instead, two suggestions:
First, it sounds like the current design has one Lambda function doing too much. Decompose the Lambdas further, so you can split them into a UI/public Lambda and one or more Lambdas dedicated to the batch processes. Note that the default concurrency limit is shared by all functions in an account and Region rather than applied per function. The split therefore helps mainly by letting you limit the batch functions separately (for example, with reserved concurrency on them).
Second, request a service quota/limit increase
To raise the limit above 1,000 concurrent function executions, submit a request to the AWS Support Center by following the steps in our documentation. This feature is available in all regions where Lambda is available.
See AWS Lambda Raises Default Concurrent Execution Limits.
https://aws.amazon.com/about-aws/whats-new/2017/05/aws-lambda-raises-default-concurrent-execution-limit/
The limit management team is very flexible: when asked to raise a limit, they will generally raise it to any reasonable number your solution requires.
To request a limit increase, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html

bigstore increasing almost linearly Google Cloud

I use many APIs from Google Cloud. Recently I noticed that bigstore usage is gradually increasing every day. I am worried that if this continues, I won't be able to pay the bill.
However, I do not know how to check where this increase is coming from. Is there a way to see which Cloud Functions are causing the increased traffic?
The increase in bigstore traffic surprises me because I have cron jobs that run multiple times per day to store data in BigQuery. I have not changed these settings, so I would assume this traffic should not increase as shown on the chart.
One other explanation I can think of is that the amount of data I am storing has increased, which is indeed true every day. But why would that increase the traffic?
How can I check this?
There are two main data sources you should use:
GCP-wide billing export. This gives you an exact breakdown of your costs, which is important for targeting your effort where the cost to you is largest. It also provides some detail about what the usage is.
Enable access and storage logging. The access logs give you an exact accounting of incoming requests, down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
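As a minimal illustration of using the billing export, the sketch below sums cost per service and SKU to show where spending is concentrated. It assumes the export rows have already been flattened into dicts with hypothetical `service`, `sku`, and `cost` keys. The real export schema uses nested fields, so adapt the key lookups accordingly.

```python
from collections import defaultdict


def top_cost_drivers(rows, n=5):
    """Sum cost per (service, SKU) and return the n largest totals.

    `rows` is assumed to be an iterable of dicts with placeholder keys
    "service", "sku" and "cost" (not the export's exact schema).
    """
    totals = defaultdict(float)
    for row in rows:
        totals[(row["service"], row["sku"])] += row["cost"]
    # Largest totals first, so effort goes where the cost is largest.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Running this over successive days of export data shows whether the growth comes from storage, network, or operation SKUs before you dig into the access logs.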
In addition, if you have a snapshot of your bigstore, your storage charges will increase over time as you replace or even rename files. Where you once had two views of the same storage, each file that changes forks into two copies (one in the current view of your storage, one in the snapshot).
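Here is a back-of-the-envelope model of the snapshot effect, under hypothetical assumptions: the snapshot covers a fixed base size, a random fraction of files is rewritten each day, and any file changed at least once is stored twice. Under those assumptions the billed storage creeps toward twice the base size even if no new data is added.

```python
def storage_with_snapshot(base_gb: float, daily_change: float, days: int) -> float:
    """Billed storage after `days`, assuming a snapshot of `base_gb` is kept
    and each day a random `daily_change` fraction of files is rewritten.

    A file still matches the snapshot with probability (1 - daily_change)**days;
    every other file exists twice (current copy + snapshot copy).
    """
    diverged = base_gb * (1 - (1 - daily_change) ** days)
    return base_gb + diverged
```

With a 100 GB base and 5% of files changing per day, billed storage grows every day and approaches, but never exceeds, 200 GB in this model.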

How to relieve a rate-limited API?

We run a website that relies heavily on the Amazon Product Advertising API (APAA). When we experience a sudden spike in users, we hit the rate limit, and all functions relying on the APAA shut down for a while. What can we do to prevent that?
Obviously we have some basic caching in place, but the APAA doesn't allow us to cache data for very long, and APAA queries vary a lot, so there may be no cached data at all for a given query.
I think your only option is to retry the API calls until they succeed, but do so in a smart way. Unfortunately, that's what everybody who gets throttled does, and AWS expects people to handle it themselves.
You can implement exponential backoff and add jitter to prevent clustered calls. AWS has a great blog post about solutions for this kind of problem: https://www.awsarchitectureblog.com/2015/03/backoff.html
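Here is a minimal sketch of the retry approach, using the "full jitter" variant from the linked AWS post: sleep a random time between 0 and min(cap, base * 2^attempt). `ThrottledError` and `call_with_backoff` are hypothetical names. Map `ThrottledError` to whatever your APAA client raises on a throttling response, and tune `base`, `cap`, and `max_attempts` to the API's limits.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical exception for a throttled API response."""


def call_with_backoff(call, max_attempts=6, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random):
    """Retry `call` on ThrottledError using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller degrade gracefully
            # Full jitter: a random delay in [0, min(cap, base * 2**attempt)].
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
```

Injecting `sleep` and `rng` keeps the function testable. In production the defaults (`time.sleep` and the `random` module) are used. The random spread is what keeps many clients that were throttled at the same moment from retrying in lockstep.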