How to handle backpressure using Google Cloud Functions - concurrency

Using Google Cloud Functions, is there a way to manage execution concurrency the way AWS Lambda does? (https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html)
My intent is to design a function that consumes a file of tasks and publishes those tasks to a work queue (Pub/Sub). I want a second function that consumes tasks from the work queue (Pub/Sub) and executes each task.
The above could result in a large number of almost concurrent executions. My downstream consumer service is slow and cannot handle many concurrent requests at a time. In all likelihood, it would return an HTTP 429 response to try to slow down the producer.
Is there a way to limit the concurrency for a given Google Cloud Function the way it is possible to do with AWS?

This functionality is not available for Google Cloud Functions. Instead, since you are asking to handle the pace at which the system will open concurrent tasks, Task Queues is the solution.
Push queues dispatch requests at a reliable, steady rate. They guarantee reliable task execution. Because you can control the rate at which tasks are sent from the queue, you can control the workers' scaling behavior and hence your costs.
In your case, you can control the rate at which the downstream consumer service is called.
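To make the "steady rate" idea concrete, here is a minimal sketch of what a rate-limited dispatcher does conceptually. It is not the Task Queues API; `dispatch_at_fixed_rate`, `handler`, and `max_per_second` are hypothetical names used only for illustration, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Iterable


def dispatch_at_fixed_rate(
    tasks: Iterable[str],
    handler: Callable[[str], None],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call handler(task) for each task, never faster than max_per_second.

    Illustrative only: a managed push queue does this for you.
    Returns the number of tasks dispatched.
    """
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    next_slot = clock()
    dispatched = 0
    for task in tasks:
        # Wait until the next dispatch slot opens.
        wait = next_slot - clock()
        if wait > 0:
            sleep(wait)
        handler(task)
        dispatched += 1
        # If the handler was slow, measure the interval from now instead of
        # from the missed slot, so the rate is never exceeded.
        next_slot = max(next_slot, clock()) + interval
    return dispatched
```

With `max_per_second=2.0`, the slow downstream service would see at most two calls per second regardless of how many tasks are waiting.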

This is now possible with the current gcloud beta! You can set the maximum number of function instances that can run at once:
gcloud beta functions deploy FUNCTION_NAME --max-instances 10 FLAGS...
See docs https://cloud.google.com/functions/docs/max-instances

You can set the number of "Function invocations per second" with quotas. It's documented here:
https://cloud.google.com/functions/quotas#rate_limits
The documentation tells you how to increase it, but you can also decrease it to achieve the kind of throttling that you are looking for.

You can control the pace at which cloud functions are triggered by controlling the triggers themselves. For example, if you have set "new file creation in a bucket" as the trigger for your cloud function, then by controlling how many new files are created in that bucket you can manage concurrent execution.
Such solutions are not perfect, though, because sometimes a cloud function fails and gets restarted automatically (if you've configured your cloud function that way) without you having any control over it. In effect, the number of active cloud function instances will sometimes be higher than you planned.
What AWS is offering is a neat feature though.

Related

Software or managed service for AWS Lambda job scheduling

I have a relatively large number of tasks that need to be executed at certain intervals: hourly, daily, weekly, etc. These tasks are easily defined as AWS Lambda functions, and I can schedule them easily enough with Amazon EventBridge.
However, in many cases jobs can fail due to delayed or missing data or other micro services going down. Take, for example, a function that is configured to run every hour and process data from hour X to hour X+1 and serialize to some data store (the ETL use case). Suppose at 1am some service becomes unavailable and the job fails until engineering is able to address the issue at 10am, at which point the code for the lambda is updated.
The desired behavior would be for that job to pick up where it left off and quickly catch up and process data from 1am to 10am (sequentially).
It would be relatively straightforward to implement some state-tracking service manually, where interval successes and failures are tracked and can be checked and registered via simple API calls. My question is whether there is existing software for this sort of application/service. As far as I can tell, Apache Airflow can do this, but it also comes with significantly more complexity and overhead than is needed.
Two options come to mind:
Track state of your application with AWS Step Functions. You can implement coordination between Lambda functions, add parallel or sequential processing etc. Step Functions also support error handling and have built-in retry mechanisms.
Depending on the volume and velocity of data you ingest, you could go with Amazon SQS or Amazon Kinesis to stream the data to Lambda functions. With SQS, you could use retry for every message. If the message couldn't be processed, you can put it into Dead-Letter Queue (DLQ) for further investigation. Also, this approach is highly scalable and allows parallel execution of jobs.
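For reference, the manual state-tracking approach described in the question can be sketched in a few lines. This is only an illustration under stated assumptions: the checkpoint (end of the last successfully processed window) is assumed to be stored somewhere durable, and `pending_intervals`, `catch_up`, and `process` are hypothetical names, not part of any AWS service.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Interval = Tuple[datetime, datetime]


def pending_intervals(
    last_success_end: datetime,
    now: datetime,
    step: timedelta = timedelta(hours=1),
) -> List[Interval]:
    """Return every complete [start, end) window not yet processed, oldest first."""
    windows: List[Interval] = []
    start = last_success_end
    while start + step <= now:
        windows.append((start, start + step))
        start += step
    return windows


def catch_up(
    last_success_end: datetime,
    now: datetime,
    process: Callable[[datetime, datetime], None],
    step: timedelta = timedelta(hours=1),
) -> datetime:
    """Process pending windows sequentially and stop at the first failure.

    Returns the new checkpoint: the end of the last window that succeeded.
    The caller is assumed to persist this value for the next run.
    """
    checkpoint = last_success_end
    for start, end in pending_intervals(last_success_end, now, step):
        try:
            process(start, end)
        except Exception:
            # Leave the checkpoint where it is; the next run retries this window.
            break
        checkpoint = end
    return checkpoint
```

In the 1am to 10am outage example, the first run after the fix would see nine pending hourly windows and work through them in order.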

Running thousands of scheduled jobs in AWS on a regular cadence?

I'm architecting an application solution in AWS and am looking into options AWS has for running one-off jobs to run on a regular schedule.
For example, we have a task that needs to run every 5 minutes that does an API call to an external API, interprets the data and then possibly stores some new information in a database. This particular task is expected to run for 30 seconds or so and will need to be run every 5 minutes. Where this gets a little more complex is that we're running a multi-tenant application and this task needs to be performed for each tenant individually. It doesn't satisfy the user's requirements to have a single process do the specified task for each tenant in sequence. The task must be performed every x minutes (sometimes as often as every minute) and it must complete for each tenant as quickly as it takes to perform the task exactly 1 time. In other words, if there are, say, 200 tenants, a task must start for every one of them at midnight, and all 200 must complete in the time it takes to query the API and update the database for a single tenant.
To add to the complexity a bit, this is not the only task we will be running on a regular schedule for our tenants. In the end we could have dozens of unique tasks, each running for hundreds of tenants, resulting in thousands or tens of thousands of unique concurrent tasks.
I've looked into ECS Scheduled Tasks, which use CloudWatch Events (now EventBridge), but EventBridge has a limit of 300 rules per event bus. I think that means we're going to be out of luck if we need to have 10,000 rules (one for each task * the number of tenants), but I'm honestly not sure whether each account gets its own event bus or if that's divided up differently.
In any case, even if this did work, it's still not a very attractive option to me to have 10,000 different rules set up in the EventBridge. At least, it feels like it might be difficult to manage. To that end I'm now more so looking into just creating a single EventBridge rule per event type that will kick off a parent task, that in turn asynchronously kicks off as many asynchronous instances of a child task that is needed, one per tenant. This would limit our EventBridge rules to somewhere around a few dozen. Each one of these, when triggered, would asynchronously spawn a task for each tenant that can all run together. I'm not 100% sure on what type of object this will spawn, it wouldn't be a Lambda since that would easily cause us to hit the 1,000 concurrent Lambda function limit but it might be something like a Fargate ECS task that executes for a few seconds then goes away when it's completed.
I'd love to hear others' thoughts on these options, my current direction and any other options I'm currently missing.
You don't necessarily need to look at ECS for this, because 1,000 invocations of a Lambda at a time is only the default concurrency limit. That is something you can request an increase for in the Service Quotas console:
There is no maximum concurrency limit for Lambda functions. However, limit increases are granted only if the increase is required for your use case.
Source: AWS Support article.
Same goes for the 300 rules per event bus limit. That is also a default limit and can be increased upon request in the Service Quotas console.
Since you mentioned branching logic, I wonder if you've looked into AWS Step Functions? In particular, Express Workflows within Step Functions may suit the duration and rates of your executions.
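If you do stay with Lambda, the parent/child fan-out described in the question could look roughly like the sketch below. It assumes a boto3 Lambda client is passed in and uses its `invoke` call with `InvocationType="Event"` for asynchronous invocation; the function name `fan_out`, the child function name, and the payload fields `task_type` and `tenant_id` are hypothetical.

```python
import json
from typing import Any, Iterable


def fan_out(
    lambda_client: Any,
    child_function: str,
    task_type: str,
    tenant_ids: Iterable[str],
) -> int:
    """Asynchronously invoke the child function once per tenant.

    lambda_client is assumed to be boto3.client("lambda") or a compatible stub.
    Returns the number of invocations requested.
    """
    count = 0
    for tenant_id in tenant_ids:
        # Hypothetical payload shape; the child decides what to do with it.
        payload = {"task_type": task_type, "tenant_id": tenant_id}
        lambda_client.invoke(
            FunctionName=child_function,
            InvocationType="Event",  # async: does not wait for the child to finish
            Payload=json.dumps(payload).encode("utf-8"),
        )
        count += 1
    return count
```

One EventBridge rule per task type would trigger the parent, which keeps the rule count at a few dozen while the children run side by side.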

AWS Serverless: Force parallel lambda execution based on request or HTTP API parameters

Is there a way to force AWS to execute a Lambda request coming from an API Gateway resource in a certain execution environment? We're in a use-case where we use one codebase with various models that are 100-300mb, so on their own small enough to fit in the ephemeral storage, but too big to play well together.
Currently, a second invocation with a different model will use the existing (warmed up) lambda function, and run out of storage.
I'm hoping to attach something like a parameter to the request that forces lambda to create parallel versions of the same function for each of the models, so that we don't run over the 512 MB limit and optimize the cold-boot times, ideally without duplicating the function and having to maintain the function in multiple places.
I've tried to investigate Step Functions but I'm not sure if there's an option for parameter-based conditionality there. AWS suggests using EFS to circumvent the ephemeral storage limits, but from what I can find, using EFS will be a lot slower than reading from the ephemeral /tmp/ directory.
To my knowledge: no. You cannot control the execution environments. Only thing you can do is limit the concurrent executions.
So you never know, if it is a single Lambda serving all your events triggered from API Gateway or several running in parallel. You also have no control over which one of the execution environments is serving the next request.
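For completeness, the one lever mentioned above (limiting concurrent executions) can be set programmatically. A minimal sketch, assuming a boto3 Lambda client is passed in; `limit_concurrency` is a hypothetical wrapper name. Note that this caps how many environments run in parallel, it does not let you choose which environment serves a given request.

```python
from typing import Any


def limit_concurrency(lambda_client: Any, function_name: str, max_concurrent: int) -> Any:
    """Set reserved concurrency so at most max_concurrent executions run at once.

    lambda_client is assumed to be boto3.client("lambda") or a compatible stub.
    """
    if max_concurrent < 0:
        raise ValueError("max_concurrent must be zero or positive")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )
```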
If your issue is the /tmp directory limit for AWS Lambda, why not try EFS?

Best temporary storage option for count value (preferably not a DB but a service) on AWS while running a SQS triggered function?

The situation
Currently, I am using an Amazon SQS queue that triggers a Lambda function to process new messages upon arrival in the queue. Messages are moved to a DLQ (Dead-Letter Queue) upon failure.
In order to seed the SQS queue, I am using a CRON job that runs every day and inserts the available jobs into the queue.
I want to issue a summarizing alert/email once all the new jobs the CRON job inserted for the day have been processed, along with details about how many jobs succeeded, how many failed, and how many were originally issued that day.
The problem:
As the Lambda functions run separately, and the fact that I want to keep it that way, I was wondering what would be the best service to use in order to store the temporary count value (at least two out of the three counts are needed among the total, succeeding and failing counts)?
I was thinking about DynamoDB, but every DB seems to be an overkill for that, and won't be cost-effective either. S3 also doesn't seem to be the most practical/preferred for this type of solution. I can also use SQS (as its "storage" is somewhat designed for cases with relatively small data storage such as these) with an identifier "count" that will be updated by every Lambda function, but knowing which Lambda function was the last requires checking the whole queue, which seems like over-complicating that.
Any other AWS service that comes up to mind?
Here is a good listing of Storage Options in the AWS Cloud (2013, but includes some of the options available today as well).
AWS Systems Manager Parameter Store can be used as a 'mini-database'.
It requires AWS credentials to access (which would be available to the Lambda functions or whatever code you are running to perform this check) but has no operational cost.
From PutParameter - AWS Systems Manager:
Parameter Store offers a standard tier and an advanced tier for parameters. Standard parameters have a content size limit of 4 KB and can't be configured to use parameter policies. You can create a maximum of 10,000 standard parameters for each Region in an AWS account. Standard parameters are offered at no additional cost.
You could run into problems if multiple processes try to update the parameters simultaneously, but hopefully your use-case is pretty simple.
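A rough sketch of such a counter, assuming a boto3 SSM client is passed in and that the CRON job has already created the parameter with an initial value of "0". The function name `increment_counter` and the parameter name in the usage note are hypothetical. The read-modify-write below is exactly where the simultaneous-update problem comes from: it is not atomic.

```python
from typing import Any


def increment_counter(ssm_client: Any, name: str, amount: int = 1) -> int:
    """Read-modify-write a numeric counter stored as a String parameter.

    Not atomic: two Lambda invocations that read the same value at the same
    time will both write value + amount, and one increment is lost.
    ssm_client is assumed to be boto3.client("ssm") or a compatible stub.
    """
    response = ssm_client.get_parameter(Name=name)
    current = int(response["Parameter"]["Value"])
    updated = current + amount
    ssm_client.put_parameter(
        Name=name,
        Value=str(updated),
        Type="String",
        Overwrite=True,  # required to update an existing parameter
    )
    return updated
```

Each Lambda would call something like `increment_counter(ssm, "/jobs/succeeded")` after a successful run, and the summary step would read the counters back once the queue is drained.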

Serverless Task Scheduling on AWS

So our project was using Hangfire to dynamically schedule tasks but keeping in mind auto scaling of server instances we decided to do away with it. I was looking for cloud native serverless solution and decided to use CloudWatch Events with Lambda. I discovered later on that there is an upper limit on the number of Rules that can be created (100 per account) and that wouldn't scale automatically. So now I'm stuck and any suggestions would be great!
As per CloudWatch Events documentation you can request a limit increase.
100 per region per account. You can request a limit increase. For instructions, see AWS Service Limits.
Before requesting a limit increase, examine your rules. You may have multiple rules each matching to very specific events. Consider broadening their scope by using fewer identifiers in your Event Patterns in CloudWatch Events. In addition, a rule can invoke several targets each time it matches an event. Consider adding more targets to your rules.
If you're trying to create a serverless task scheduler one possible way could be:
CloudWatch Event that triggers a lambda function every minute.
Lambda function reads a DynamoDB table and decides which actions need to be executed at that time.
Lambda function could dispatch the execution to other functions or services.
So I decided to do as Diego suggested, use CloudWatch Events to trigger a Lambda every minute which would query DynamoDB to check for the tasks that need to be executed.
I had some concerns regarding the data that would be fetched from DynamoDB (duplicate items if an execution takes longer than 1 minute), so I decided to set the concurrency to 1 for that Lambda.
I also had some concerns regarding executing those tasks directly from that Lambda itself (timeouts and tasks at the end of a long list), so I'm pushing each task to SQS separately, and another Lambda triggered by SQS executes those tasks in parallel. So far the results look good; I'll keep updating this thread if anything comes up.
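For anyone wanting to try the same setup, the dispatcher Lambda described above boils down to something like this sketch. It assumes the task items have already been read from DynamoDB into a list of dicts and that a boto3 SQS client is passed in; the field name `next_run_epoch` and the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def select_due_tasks(tasks: Iterable[Dict[str, Any]], now_epoch: int) -> List[Dict[str, Any]]:
    """Keep only the tasks whose scheduled time has arrived."""
    return [task for task in tasks if task["next_run_epoch"] <= now_epoch]


def enqueue_due_tasks(
    sqs_client: Any,
    queue_url: str,
    tasks: Iterable[Dict[str, Any]],
    now_epoch: int,
) -> int:
    """Send each due task to SQS as its own message; return how many were sent.

    sqs_client is assumed to be boto3.client("sqs") or a compatible stub.
    A second Lambda subscribed to the queue executes the tasks in parallel.
    """
    due = select_due_tasks(tasks, now_epoch)
    for task in due:
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(task))
    return len(due)
```

Keeping the dispatcher this small is what makes a one-minute schedule with concurrency 1 workable: it only selects and enqueues, and the slow work happens in the SQS-triggered Lambda.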