Managing a 'dynamic' AWS Lambda workflow

Say I have a Lambda function called 'TestExecutor' which takes in an argument containing the ARNs of N 'Tests', which are also implemented as Lambda functions.
The workflow:
1. TestExecutor is invoked with a list of ARNs of various 'Tests'
2. TestExecutor calls each Test concurrently; each Lambda is expected to return a JSON
3. TestExecutor waits for each Test to complete. It consolidates all the JSONs received
4. Consolidated JSON is stored in DynamoDB/S3
Problem statement - What is the best way to create this kind of workflow in a Serverless manner?
I considered two AWS Services to manage this:
AWS Step Functions - My step function would need states for each possible 'Test' Lambda that can be executed. I want to give flexibility to the user to invoke any Lambda without needing to 'register' it in my Step function.
AWS SWF - Just seems a little overkill. Suffers from the same problem as above too.
So right now the best I can think of is doing this in a simple manner:
In my TestExecutor Lambda, I could create N threads for the N tests, with each thread invoking a particular Test's Lambda function and waiting for it to return a JSON. Once all executions have succeeded, the JSONs are consolidated and the result is stored in DynamoDB.
I'm not happy with this solution: it will be a little tricky to manually manage failures and retries of the Test Lambdas from within the TestExecutor Lambda. This is my first time trying something serverless, but it just seems like the wrong pattern. I'd like to get a nice top-down view of my workflow, and monitoring this seems like it would be messy and scattered since there's no formal link between TestExecutor and the Test Lambdas.
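For reference, the thread-per-Test approach described above could be sketched roughly as follows. This is a minimal sketch, not a recommendation: the function names (`run_tests`, `run_one`, `invoke_with_boto3`), the retry count, and the shape of the consolidated result are all assumptions made for illustration. The invoker is passed in as a parameter so the orchestration logic can be exercised without AWS.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def invoke_with_boto3(arn, payload):
    """Synchronously invoke one Test Lambda and return its parsed JSON result."""
    import boto3  # imported lazily so the orchestration logic can be tested offline

    # A session per call avoids sharing the default session across threads.
    client = boto3.Session().client("lambda")
    response = client.invoke(
        FunctionName=arn,
        InvocationType="RequestResponse",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = json.loads(response["Payload"].read())
    if "FunctionError" in response:  # present when the invoked function failed
        raise RuntimeError(f"{arn} failed: {body}")
    return body


def run_one(arn, payload, invoke, max_attempts):
    """Call a single Test, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return {"arn": arn, "ok": True, "result": invoke(arn, payload)}
        except Exception as exc:  # broad on purpose: any failure is a failed attempt
            last_error = exc
    return {"arn": arn, "ok": False, "error": str(last_error)}


def run_tests(arns, payload=None, invoke=invoke_with_boto3, max_attempts=3):
    """Fan out to every Test concurrently and consolidate the outcomes by ARN."""
    if not arns:
        return {}
    payload = payload or {}
    with ThreadPoolExecutor(max_workers=min(len(arns), 32)) as pool:
        outcomes = pool.map(lambda arn: run_one(arn, payload, invoke, max_attempts), arns)
        return {outcome["arn"]: outcome for outcome in outcomes}
```

The consolidated dictionary returned by `run_tests` is what would then be written to DynamoDB or S3. Note that TestExecutor stays running (and billed) for as long as the slowest Test takes, which is part of why this pattern feels awkward.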
Maybe I could create an SQS queue alongside each Test Lambda. For each ARN supplied to the TestExecutor, I could push a message to the corresponding queue. But what now? I'd have to create a 'Listener' Lambda for each Test that polls its queue every T seconds and then invokes the actual Test Lambda. This also sounds needlessly complex.
Would love to hear some advice! Cheers.

AWS SWF doesn't suffer from the same problem, as it doesn't require a Lambda function to be registered before it can be invoked.
The main limitation of SWF is that it is still not possible to run the decider process as a Lambda function, so you'll have to run it somewhere else. If you already have a host that can run it, implementing your use case using the AWS Flow Framework is pretty straightforward.

You could use the AWS SDK to generate a Step Functions state machine from those ARNs from within a Lambda function.
You would need some way to clean up afterwards and/or avoid duplicates, or the console would quickly get messy.
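One way this could look is sketched below: a function that builds an Amazon States Language definition with a single Parallel state and one branch per Test ARN. The state names (`RunTests`, `Test0`, ...) and the retry settings are assumptions chosen for illustration, not something the answer specifies.

```python
import json


def build_definition(test_arns):
    """Return an Amazon States Language document with one parallel branch per Test ARN."""
    if not test_arns:
        raise ValueError("a Parallel state needs at least one branch")
    branches = []
    for index, arn in enumerate(test_arns):
        state_name = f"Test{index}"  # hypothetical naming scheme
        branches.append({
            "StartAt": state_name,
            "States": {
                state_name: {
                    "Type": "Task",
                    "Resource": arn,
                    # Let Step Functions retry a failing Test instead of custom code.
                    "Retry": [{
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }],
                    "End": True,
                }
            },
        })
    definition = {
        "Comment": "Generated per request; one branch per Test Lambda",
        "StartAt": "RunTests",
        "States": {
            "RunTests": {"Type": "Parallel", "Branches": branches, "End": True},
        },
    }
    return json.dumps(definition)
```

The resulting string could be passed as `definition` to the boto3 `stepfunctions` client's `create_state_machine(name=..., definition=..., roleArn=...)`, and the machine removed later with `delete_state_machine(stateMachineArn=...)`. The output of a Parallel state is an array with one entry per branch, which gives the consolidated result.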

Related

What is the best way to implement AWS Lambda for async requests?

I have a lambda function that potentially many different APIs call to parse large chunks of data (potentially might take more than a few minutes) and store their results into their own separate S3.
In such case, is it better to have a copy of the same AWS Lambda function separately for each API or is it ok to have the same lambda function being called from many APIs?
The goal is to avoid queuing and have the function run asynchronously for each request.

I'm not an expert, so perhaps other answers will help more, but I don't see why it would make a difference, as long as the code needed to handle each separate caller isn't large enough to increase the cost of initializing an instance.
The reason I don't think it would make a difference is that Lambda will initialize a new instance if the function is invoked while it is still processing another event. Sharing one function is potentially better, because at times an already initialized instance from a previous request will be ready for the next one (although, again, I'm sure there are AWS experts who could confirm or deny this; contact AWS Support if you want an authoritative answer).
Source:
If you invoke the function again while the first event is being processed, Lambda initializes another instance, and the function processes the two events concurrently. As more events come in, Lambda routes them to available instances and creates new instances as needed.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html

Enable Function Chain Response for AWS API Gateway?

The official guide for AWS API Gateway shows how to use a Lambda function to respond to API calls from the gateway. But it only covers a single function, not the case where several functions are called one after another.
I can think of two possible solutions:
1. Use the AWS Step Functions service to bundle the function workflow.
2. Use one main function for orchestration.
Obviously, method 1 will incur extra fees, while method 2 needs a redundant function that runs for a long time.
Could you please give me any help?

If they must be completed one after another before you can return the result, use AWS Step Functions (like you said) to orchestrate this. Synchronous invocation of the other Lambda is also an option (though it's commonly an architectural smell and indicates other issues with your architecture - why are the Lambdas not combined if they need each other etc.)
For asynchronous invocation of the other function (fire and forget), either invoke the Lambda using invocation type Event or even better would be to have an SQS queue in between the Lambdas for improved resiliency.
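A fire-and-forget invocation with invocation type Event might look like the sketch below. The helper name `invoke_async` and the example function name are hypothetical; the client is passed in so the helper can be tested with a stand-in object.

```python
import json


def invoke_async(lambda_client, function_name, event):
    """Queue an event for another Lambda without waiting for its result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: the event is queued and the call returns
        Payload=json.dumps(event).encode("utf-8"),
    )
    # Asynchronous invocations are acknowledged with HTTP status 202 (accepted).
    if response["StatusCode"] != 202:
        raise RuntimeError(f"unexpected status {response['StatusCode']}")
    return response["StatusCode"]
```

Inside a Lambda this could be called as `invoke_async(boto3.client("lambda"), "next-step", {"orderId": 1})`. The caller learns only that the event was accepted, not whether the second function succeeded.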

What is the best way to internally call an AWS Lambda function within another Lambda function?

Seems a little inefficient the way it currently is:
response.body = {
user: await userService(userID), // calls a user service to get info on user
friends: await friendsService(userID) // calls a friends service to get info on friends for
}
Let's say the userService and friendsService are configured on different API Gateway endpoints.
Then wouldn't the network requests make this take longer than if I just packaged my entire backend into one zip file and uploaded it to AWS Lambda?
This seems very inefficient.
Is there a way to call other Lambdas without having to make a network request? I understand I could put the Lambdas/gateway in the same VPC as the main Gateway endpoint exposed to the internet, but isn't this expensive?
Is there any way to do this more efficiently?

You can call a Lambda function by using the AWS SDK (a LambdaClient object). So, for example assume you wrote two Lambda functions - funA and funB.
Next assume:
you want to call funB from funA
you wrote your Lambda function by using the Lambda Java runtime
You can use the Lambda Java API to invoke funB. There is no need to wrap either function in a RESTful call using API Gateway; you can use the AWS SDK directly. Here is the Java API example that shows how to invoke a Lambda function:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/lambda/src/main/java/com/example/lambda/LambdaInvoke.java
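The linked example is in Java. The same idea in Python with boto3 is sketched below, assuming funA runs on the Python runtime instead; the event field `userId` and the `lambda_client` parameter on the handler are illustrative additions, not part of the answer.

```python
import json


def call_fun_b(lambda_client, payload):
    """Invoke funB synchronously through the Lambda API and return its decoded result."""
    response = lambda_client.invoke(
        FunctionName="funB",
        InvocationType="RequestResponse",  # wait for funB to finish
        Payload=json.dumps(payload).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())
    if "FunctionError" in response:  # set when funB itself failed
        raise RuntimeError(f"funB failed: {result}")
    return result


def handler(event, context, lambda_client=None):
    """Entry point for funA; the client parameter exists only to ease testing."""
    if lambda_client is None:
        import boto3  # available by default in the Lambda Python runtime

        lambda_client = boto3.client("lambda")
    return {"fromFunB": call_fun_b(lambda_client, {"userId": event["userId"]})}
```

funA's execution role would need permission to invoke funB (`lambda:InvokeFunction`).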

@smac2020's answer suggests using the SDK, but that is of course a network call too. It just skips API Gateway and calls the AWS API directly.
I think the key point about Lambda is whether it scales well. Try to think about the algorithm in a different way. For example, you can create a pipeline where each step enriches your "state object" with additional data. You can use Step Functions or SQS to pass the requests between the steps, or you can make the client responsible for managing the data. You should try to avoid one function waiting for another: you are then paying for two Lambdas running, the caller and the callee.
If you are thinking "but microservices...", look at the design of the AWS API itself: you do not need the output of one service as an input to another. It takes some time to adapt and to look at problems from a different perspective. In your case I would consider whether the user list could, for example, live in the user object so that the calls can be merged (look at some NoSQL database design principles).
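To make the pipeline idea concrete, here is a small local sketch in which each step receives the state object and returns an enriched copy. The step names and the stubbed data are hypothetical; in a real deployment each step would be its own Lambda, and Step Functions or SQS would play the role of `run_pipeline`.

```python
def add_user(state):
    """First step: attach user details (stubbed here instead of a real lookup)."""
    return {**state, "user": {"id": state["userId"], "name": "placeholder"}}


def add_friends(state):
    """Second step: attach the friends list, relying only on what is already in the state."""
    return {**state, "friends": []}


def run_pipeline(state, steps):
    """Stand-in for the orchestrator: pass the state through each step in order."""
    for step in steps:
        state = step(state)
    return state
```

Because no step calls another, no function sits idle waiting for a downstream result.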

API Backend on AWS Lambda

Context:
I have a use case where my backend service should compute one or more features. Each feature is a simple piece of computation (it can be as simple as adding two numbers) that takes an input and returns an output value, which can be a boolean or a number. A client can request any number of features (1, 10, etc.), and each feature can have multiple versions.
Design:
Lambda function seems like a good choice, since it supports versioning and takes care of scaling. In my design, one Lambda will receive the request and then call further lambda functions in parallel (Say user asked for 12 features, Lambda function L1 will invoke 12 Lambda functions in parallel) synchronously, and return all computed feature values as one response (HTTP). This way, all features can be versioned in their own Lambda functions.
Questions:
Is it OK to call a Lambda function directly from another Lambda function? Is this a good use case for Lambda functions?
Thanks

I think Lambda would work just fine for your use case. For versioning, you could use the API versioning provided by API Gateway but I think that is a bit much for your case. Just create different functions.
Check out serverless.com. It is a solid framework and easy to get started with. It will take a lot of the work out of setting it up, plus you'll have your infrastructure as code.
Yes, it is okay to call Lambdas from other Lambdas, although there is not a 'clean' way to do that. On the other hand, Step Functions may be what you need: it supports chaining functions in a workflow. The previous Lambda is not 'calling' the next function so much as proceeding to the next step in the workflow. The Serverless Framework also supports this approach, and it can be configured in the serverless.yml config file.

Handle child lambda failures

We are trying Lambda for our ETL job, which is written in Clojure.
In our architecture, the scheduler triggers the parent Lambda, which then triggers 100 child Lambdas and a counter Lambda. Each child Lambda writes its data to S3 when it completes its work. The counter Lambda checks the number of files in S3: if there are 100, it combines all the files and saves the result to S3; otherwise it spawns a new counter Lambda and dies.
The happy path works fine, but if any child fails, the counter Lambda ends up in an infinite loop because there will never be 100 files.
Is there a proper way of spawning child Lambdas, monitoring them, and restarting or retrying only the one that fails?
Is there a good Clojure Lambda framework?

Process monitoring is not built into any Clojure Lambda libraries that I know of, so for this case I'd recommend taking a page out of the Erlang playbook (supervisor trees): to have a dependable distributed system, every actor needs a monitor. A decent approach would therefore be to have a watcher for each Lambda task. This can really simplify error handling, along the lines of the "let it crash" philosophy.
So this would leave you with this list of lambdas:
counters:
a watcher/restarter for the counter (you kind of already have this)
workers x100
supervisors x100
Each supervisor only checks for the presence of one particular file and restarts one particular Lambda if the file does not exist. This gets much easier if your process is idempotent, so you don't have to worry too much if a file is produced twice, though it's not too hard to check whether the Lambda a supervisor is watching is still running using the AWS API. The supervisor can be started by the thing it's supervising or by the thing that starts the rest of the system, whichever is easier for your codebase. You likely don't need to start the workers explicitly; the supervisor can do that.
The important part is to add CloudWatch or whatever your favourite eventing system is (mine is Riemann) so you can add alerts to know when you need to watch the watchers.
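The supervision logic described above can be sketched independently of AWS. The version below is written in Python rather than Clojure and handles all expected files in one pass instead of one supervisor per worker, which is a simplification. The names `supervise`, `list_keys`, and `restart_worker` are hypothetical: `list_keys` would wrap an S3 listing and `restart_worker` an asynchronous Lambda invocation. The attempt cap is an added assumption so that a permanently failing worker cannot cause an endless loop.

```python
def missing_outputs(expected_keys, existing_keys):
    """Return the expected output files that no worker has produced yet, in order."""
    present = set(existing_keys)
    return [key for key in expected_keys if key not in present]


def supervise(expected_keys, list_keys, restart_worker, attempts, max_attempts=3):
    """Run one supervision pass over the expected output files.

    attempts maps output key -> restarts so far and is updated in place.
    Returns (restarted, abandoned) so the caller can alert on abandoned work.
    """
    restarted, abandoned = [], []
    for key in missing_outputs(expected_keys, list_keys()):
        if attempts.get(key, 0) >= max_attempts:
            abandoned.append(key)  # give up and surface it instead of looping forever
            continue
        attempts[key] = attempts.get(key, 0) + 1
        restart_worker(key)
        restarted.append(key)
    return restarted, abandoned
```

Since Lambda invocations do not share memory, the `attempts` map would have to live somewhere durable (for example a DynamoDB item or an S3 object) between passes. A non-empty `abandoned` list is the natural place to raise an alert.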

There is an easy way to do this in AWS: AWS Step Functions. Step Functions provides a graphical console to arrange and visualize the components of your application as a series of steps. You can define steps using the AWS Step Functions console or API, a fluent Java API, or AWS CloudFormation templates.
Step Functions makes it simple to orchestrate AWS Lambda functions. Regardless of the language a function is written in, it manages all the Lambdas.
Step Functions is good for the following use cases:
Run sequence functions
Run functions in parallel
Select functions based on data
Retry the functions
try/catch/finally for functions
Running the code for hours
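Applied to the ETL job in the question, the parallel, retry, and try/catch items might translate into a definition like the one sketched here. It uses a Map state to run the worker once per input chunk, retries each worker, and only moves on to the combiner when every worker has succeeded. The state names, the `$.chunks` input path, and the retry values are assumptions for illustration.

```python
import json


def build_etl_definition(worker_arn, combiner_arn, max_attempts=3):
    """Return an Amazon States Language document: fan out workers, then combine."""
    worker = {
        "Type": "Task",
        "Resource": worker_arn,
        # Retry only the worker that failed, with exponential backoff.
        "Retry": [{
            "ErrorEquals": ["States.ALL"],
            "IntervalSeconds": 5,
            "MaxAttempts": max_attempts,
            "BackoffRate": 2.0,
        }],
        "End": True,
    }
    definition = {
        "StartAt": "RunWorkers",
        "States": {
            "RunWorkers": {
                "Type": "Map",
                "ItemsPath": "$.chunks",  # assumed input field: one entry per worker
                "MaxConcurrency": 0,      # 0 places no limit on parallel iterations
                "Iterator": {"StartAt": "Worker", "States": {"Worker": worker}},
                # Reached only if a worker still fails after all retries.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "ReportFailure"}],
                "Next": "Combine",
            },
            "Combine": {"Type": "Task", "Resource": combiner_arn, "End": True},
            "ReportFailure": {
                "Type": "Fail",
                "Error": "WorkerFailed",
                "Cause": "A worker exhausted its retries",
            },
        },
    }
    return json.dumps(definition)
```

With this shape the counter Lambda is no longer needed: the Combine step runs exactly once, after all workers finish, and a failed execution ends in a visible Fail state instead of a loop.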
