Long-running AWS Lambda tasks with progress and cancellation - amazon-web-services

I have an application where I'm looking to offload the compute load to AWS, and I'm after some guidance on architecture. The user will initiate a main task, which contains ~100 computationally heavy sub-tasks that can be run in parallel.
I am thinking an appropriate solution is for the desktop app to hit an API Gateway endpoint to create a new task, which would then invoke many Lambdas, one for each sub-task. I would like each sub-task to have individual progress reporting, as well as the ability for the user to cancel the overall task. The user could also use the API to query the created task, or hit another endpoint to cancel it.
What's an appropriate architecture / set of services to invoke and manage these Lambda sub-tasks, access the intermediate progress information from each Lambda and the final result, and allow the user to request cancellation?

You may be interested in AWS Step Functions (https://aws.amazon.com/step-functions/) for orchestrating the sub-tasks and querying overall progress, possibly combined with DynamoDB (https://aws.amazon.com/dynamodb/) or some other data store to allow monitoring of the progress within individual sub-tasks.
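For illustration, here is a minimal sketch of what the DynamoDB side could look like: a sub-task Lambda that periodically writes its progress and checks a cancellation flag. The table name, key schema, and the load_work/process helpers are hypothetical placeholders, not anything prescribed by AWS:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("TaskProgress")  # hypothetical table

    def load_work(event):
        # hypothetical: derive the list of work units for this sub-task
        return event.get("chunks", [])

    def process(chunk):
        # hypothetical compute step
        pass

    def handler(event, context):
        task_id = event["task_id"]
        subtask_id = event["subtask_id"]
        chunks = load_work(event)

        for i, chunk in enumerate(chunks):
            # Stop early if the user has cancelled the parent task.
            meta = table.get_item(Key={"task_id": task_id, "subtask_id": "meta"})
            if meta.get("Item", {}).get("cancelled"):
                return {"status": "cancelled"}

            process(chunk)

            # Report intermediate progress (as a whole-number percentage).
            table.update_item(
                Key={"task_id": task_id, "subtask_id": subtask_id},
                UpdateExpression="SET progress = :p",
                ExpressionAttributeValues={":p": int(100 * (i + 1) / len(chunks))},
            )

        return {"status": "done"}

The cancel endpoint would then just set the cancelled flag on the task's "meta" item, and the progress-query endpoint would read all items for that task_id.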

Related

Is there a service or framework in Native AWS for task management?

I am looking for a service or framework in native AWS which, given a CSV file, creates a task, processes that task asynchronously, returns a task ID or job ID to the client, and notifies the client when the task is completed. Some requirements for this:
The client should be able to check the progress of the task by job ID at any time.
Processing of the entire task can take more than 15 minutes.
There should be a way for clients to see the reasons for failures.
All the business logic would be at the line-item level (this is the only thing the developer should care about).
Is there any built-in service or framework for that in native AWS? I know one can build this kind of service using some combination of SQS, Lambda, SNS and DynamoDB, but I am just asking whether there is an already available AWS offering that can do all of this.
The closest service to this concept is AWS Step Functions.
However, it would just be one component of a solution. You would still need to create the compute component by using Amazon EC2 or AWS Lambda. You would need to build the interface for users, add authentication, notifications, etc.
Bottom line: There is no AWS service that does what you describe. However, there are the building blocks if you wish to create one yourself.
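To make the "building blocks" concrete, here is a minimal sketch of starting a Step Functions execution per CSV and letting the client poll it by job ID; the state machine ARN below is a hypothetical placeholder:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    STATE_MACHINE = "arn:aws:states:us-east-1:123456789012:stateMachine:CsvProcessor"  # hypothetical

    def start_job(csv_s3_key):
        resp = sfn.start_execution(
            stateMachineArn=STATE_MACHINE,
            input=json.dumps({"csv_key": csv_s3_key}),
        )
        return resp["executionArn"]  # hand this back to the client as the job ID

    def job_status(execution_arn):
        # Status is RUNNING, SUCCEEDED, FAILED, TIMED_OUT or ABORTED; failure
        # details can be pulled from the execution history.
        return sfn.describe_execution(executionArn=execution_arn)["status"]

Step Functions also sidesteps the 15-minute constraint, since the long-running state lives in the state machine while each individual Lambda invocation stays short.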

How to handle backpressure using Google Cloud Functions

Using Google Cloud Functions, is there a way to manage execution concurrency the way AWS Lambda does? (https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html)
My intent is to design a function that consumes a file of tasks and publishes those tasks to a work queue (Pub/Sub). I want to have another function that consumes tasks from the work queue (Pub/Sub) and executes them.
The above could result in a large number of almost concurrent executions. My downstream consumer service is slow and cannot handle many concurrent requests at a time. In all likelihood, it would return HTTP 429 responses to try to slow down the producer.
Is there a way to limit the concurrency for a given Google Cloud functions the way it is possible to do it using AWS?
This functionality is not available for Google Cloud Functions. Instead, since you are asking to control the pace at which the system opens concurrent tasks, Task Queues are a solution.
Push queues dispatch requests at a reliable, steady rate. They guarantee reliable task execution. Because you can control the rate at which tasks are sent from the queue, you can control the workers' scaling behavior and hence your costs.
In your case, you can control the rate at which the downstream consumer service is called.
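As a minimal sketch using the legacy App Engine Task Queues API for Python (the queue name and worker URL are hypothetical; note the dispatch rate itself is configured on the queue, e.g. rate: 10/s in queue.yaml, not in code):

    from google.appengine.api import taskqueue

    def enqueue_tasks(tasks):
        for t in tasks:
            taskqueue.add(
                queue_name="downstream-calls",  # hypothetical queue
                url="/worker/process",          # hypothetical push handler
                params={"payload": t},
            )

The push queue then calls /worker/process at no more than the configured rate, which protects the slow downstream service.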
This is now possible with the current gcloud beta! You can set a maximum number of instances that can run at once:
    gcloud beta functions deploy FUNCTION_NAME --max-instances 10 FLAGS...
See docs https://cloud.google.com/functions/docs/max-instances
You can set the number of "Function invocations per second" with quotas. It's documented here:
https://cloud.google.com/functions/quotas#rate_limits
The documentation tells you how to increase it, but you can also decrease it to achieve the kind of throttling that you are looking for.
You can control the pace at which cloud functions are triggered by controlling the triggers themselves. For example, if you have set "new file creation in a bucket" as the trigger for your cloud function, then by controlling how many new files are created in that bucket you can manage concurrent execution.
Such solutions are not perfect, though, because sometimes a cloud function fails and gets restarted automatically (if you've configured it that way) without you having any control over it. In effect, the number of active cloud function instances will sometimes be higher than you planned.
What AWS is offering is a neat feature though.

AWS "Serverless" architecture for real time client-server messenging

If I understood the whole concept correctly, "serverless" architecture assumes that instead of using your own servers or containers, you should use a bunch of AWS services. Usually such an architecture includes Amazon API Gateway, a bunch of Lambda functions, and DynamoDB (or an alternative) for storing data and state, as Lambda can't keep state. And services such as EC2 don't participate in all this, because EC2 is a virtual server and using it diminishes all the benefits of a serverless architecture.
All this looks really cool, but I feel like I'm missing something important, because right now this seems not to be applicable to cases such as real-time applications.
Say I have two users online. One of them performs an action in the app, which triggers changes in the database, which in turn should trigger changes in the second user's app.
The conventional way to send some data or command from server to client is a websocket connection. But with a serverless architecture there seems to be no way to establish and maintain a websocket connection. So... where did I misunderstand the concept? Or, if I understood everything correctly, how do I implement the interaction between two users as described above?
Where did I misunderstand the concept?
Your observation is correct. It doesn't work out of the box using API Gateway and Lambda.
An applicable solution is to use AWS IoT - yes, another AWS service.
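Clients can subscribe to an MQTT topic over WebSockets via AWS IoT, and your backend can push messages to them. A minimal sketch of the server side, assuming a hypothetical per-user topic naming scheme:

    import json
    import boto3

    # In practice you would pass your account's IoT data endpoint via
    # endpoint_url when creating this client.
    iot = boto3.client("iot-data")

    def notify_user(user_id, change):
        # User 2's app would be subscribed to this topic over MQTT/WebSockets.
        iot.publish(
            topic="app/users/{}".format(user_id),  # hypothetical topic scheme
            qos=1,
            payload=json.dumps(change),
        )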
Serverless isn't just a matter of Lambda, API Gateway and DynamoDB; it's much bigger than that. One of the big advantages of serverless is the operational burden it takes off your plate: no more patching, no more capacity planning, no more config management. Those may seem trivial, but doing them well across a significant fleet of instances is complex, expensive and time-consuming. Another benefit is the economics: public cloud leverages utility billing, and with most AWS services you pay by the hour for whatever you have running, whether or not you actually use it, whereas Lambda bills per 100ms of execution. The cheapest EC2 instance running for a full month is about $10/month (double that for redundancy), while $20 of Lambda pricing gets you millions of invocations, so for most cases serverless is significantly cheaper.
Serverless isn't for everything, though; it has its limitations. For example, it's not meant for running long-lived server processes: you can't run nginx in Lambda, since Lambda is only meant to be a runtime environment for the programming languages it supports. It's also specifically meant for event-based workloads, which is perfect for microservice-based architectures: small, independent, discrete pieces of compute do their work and, when done, send an event to one or more other functions to do something else, returning a response if needed.
To address your concerns about realtime processing: depending on what your code is doing, your Lambda function could complete in less than 100ms or run all the way up to the 15-minute maximum timeout. There are strategies to optimize its duration, but in general Lambda is for short-lived work, which is conducive to realtime scenarios.
In your example with the two users interacting with the web app and the database, that could very easily be built using serverless technologies with one or two functions and a DynamoDB table. The total round-trip time could be as low as milliseconds, and at most seconds; it really all depends on your code and what it's doing. These would all be HTTP calls, so no websockets are needed. Think of a number of APIs calling each other, with your Lambda code as the orchestrator.
You might want to look at SNS (Simple Notification Service). In your example, if app user 2 is a subscriber to an SNS topic, then when app user 1 makes a change that triggers an SNS message, it will be pushed to the subscriber (app user 2). The message can be pushed over several supported push protocols (Amazon, Apple, Google, MS, Baidu) in addition to SMTP or SMS. The SNS message can be triggered by a Lambda function or directly from a DynamoDB stream after an update (a database trigger). It's up to the app developer to select a message protocol and format; the app only has to receive messages through its native channels. This may not exactly be millisecond-latency 'real-time', but it's fast enough for all but the most latency-sensitive applications.
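As a minimal sketch of the publishing side (the topic ARN is a hypothetical placeholder), the Lambda handling user 1's change could do:

    import json
    import boto3

    sns = boto3.client("sns")

    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:app-updates"  # hypothetical

    def on_change(change):
        # SNS fans the message out to every subscribed endpoint/protocol.
        sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(change))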
I've been working on an AWS serverless application for several months now, and am amazed at the variety of services available. The rate of improvement and the number of new features being added are enough to leave you out of breath.

How can I expose the status of an Amazon Lambda function in my web app?

I'm hoping to use Amazon Lambda to run some background tasks for my web app. These particular tasks will only need to run once for the app (not once per user), so I'd like any user to see in the UI if a task is already running, and I'd like to disable the UI that allows them to start that task again.
Does Lambda offer a way to check the status of a function to see if it is running? If not, what is the best way to persist this info to my web app? Am I taking the wrong approach here altogether?
Lambda functions are supposed to be stateless and keeping functions stateless enables AWS Lambda to rapidly launch as many copies of the function as needed to scale to the rate of incoming events. While AWS Lambda’s programming model is stateless, your code can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB.
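For example, here is a minimal sketch of that approach using a conditional write in DynamoDB, so that only one invocation can claim the task and the web app can show it as running (table and attribute names are hypothetical):

    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("BackgroundTasks")  # hypothetical

    def try_start(task_name):
        try:
            # Succeeds only if no item with this key exists yet.
            table.put_item(
                Item={"task": task_name, "status": "running"},
                ConditionExpression="attribute_not_exists(task)",
            )
            return True  # we claimed the task
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return False  # already running; disable the button in the UI
            raise

When the task finishes, the function deletes the item (or flips its status), and the web app simply polls this table to decide whether to disable the start button.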

AWS IAM forced delay to start using a recently created user

I am facing strange behavior in an application that uses AWS IAM to automatically create users and roles.
My sequence of operations is:
Send an action CreateUser;
Send an action CreateAccessKey for this created user;
Send an action GetUser for this created user to get the account ID (I need to do this because I only have the root key and secret);
Send an action CreateRole, with an AssumeRolePolicyDocument where the Principal is this created user.
When I execute step 4, I receive a MalformedPolicyDocument error (Invalid principal in policy: "AWS":"arn:aws:iam::123412341234:user/newuser").
But if I put a 15-second delay before step 4, it runs without any problem.
Is there a workflow where I don't need to stick with a fixed delay, like reading some IAM web service to check whether the user is ready to be used?
As outlined in my answer to Deterministically creating and tagging EC2 instances, the AWS APIs generally need to be treated as eventually consistent only.
Specifically, I mention there that it is reasonable to assume that each and every API action is operated entirely independently by AWS, i.e. each is a micro-service of its own. This explains why, even within a service like Amazon EC2 or, in your case, AWS Identity and Access Management (IAM), a resource state change caused by one API action isn't necessarily visible to (all) other API actions within that service right away - that's precisely what you are experiencing: even though the created user is already visible to one IAM API action (GetUser), it isn't yet visible to a different one (CreateRole).
The correct workflow to work around this inherent characteristic is to repeat the desired API call with an exponential backoff strategy until it succeeds (or a configured timeout is reached), which is good practice in asynchronous communication scenarios anyway. Several AWS SDKs now offer integrated support for retries with exponential backoff; this is usually applied transparently, but it can be tailored to specific scenarios if need be, e.g. to extend the default timeout for very high latency scenarios.
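A minimal sketch of such a retry loop for your step 4 (names are hypothetical; a production version would lean on the SDK's built-in retry configuration where possible):

    import json
    import time
    import boto3
    from botocore.exceptions import ClientError

    iam = boto3.client("iam")

    def create_role_with_retry(role_name, user_arn, max_attempts=8):
        policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
            }],
        }
        for attempt in range(max_attempts):
            try:
                return iam.create_role(
                    RoleName=role_name,
                    AssumeRolePolicyDocument=json.dumps(policy),
                )
            except ClientError as e:
                # Retry only the eventual-consistency error from the question.
                if e.response["Error"]["Code"] != "MalformedPolicyDocument":
                    raise
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
        raise RuntimeError("user {} still not visible to CreateRole".format(user_arn))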