I am attempting to pass JSON data into my SageMaker model through a Lambda function. Currently, I am using a test model that makes relatively quick inferences and returns them to the Lambda function through the invoke_endpoint call. However, eventually a more advanced model will be implemented, which might take longer to produce inferences than a Lambda function can run for (15 minutes maximum). If I call invoke_endpoint in one Lambda function, can I return the response to another Lambda function that is invoked by the SageMaker endpoint response? Even better, can I shut down the current Lambda function after sending the data to SageMaker, and re-invoke it upon a response? I need to store the inference in DynamoDB, which is why I need a response (unless I can update the saved model to store inferences directly, in which case I need the Lambda function to not expect a response from invoke_endpoint). Sorry for my ignorance, I am a bit new to SageMaker.
When calling invoke_endpoint, the underlying model invocation must take less than 1 minute. If a single model execution needs more time, consider running the model in Lambda itself, through the SageMaker Training API (if its cold start is acceptable), or in a custom service. If the invocation is made up of several shorter calls, you can also chain multiple services together with Step Functions.
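For the case where the inference does finish within the limit, the flow described in the question (call the endpoint, then store the result in DynamoDB) might look roughly like the sketch below. The endpoint name, table name, the `request_id` key and the event shape are all assumptions made for illustration; the clients are passed in as arguments so the logic can be tried without AWS access.

```python
import json


def run_inference_and_store(sm_runtime, table, endpoint_name, request_id, payload):
    """Call a SageMaker endpoint synchronously and persist the result.

    sm_runtime: a boto3 "sagemaker-runtime" client (or a stand-in for tests).
    table:      a boto3 DynamoDB Table resource (or a stand-in for tests).
    """
    response = sm_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; assume the model returns JSON.
    result = json.loads(response["Body"].read())
    # Store the inference as a JSON string to sidestep DynamoDB's float handling.
    table.put_item(Item={"request_id": request_id, "inference": json.dumps(result)})
    return result


def lambda_handler(event, context):
    import boto3  # imported here so the helper above stays testable offline

    sm_runtime = boto3.client("sagemaker-runtime")
    table = boto3.resource("dynamodb").Table("inferences")  # hypothetical table
    return run_inference_and_store(
        sm_runtime,
        table,
        endpoint_name="my-endpoint",  # hypothetical endpoint name
        request_id=event["request_id"],  # assumed event shape
        payload=event["payload"],
    )
```

Keeping the AWS clients out of the helper's body is only a convenience for testing; it does not change the one-minute constraint mentioned above.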
I am writing a call flow in Amazon Connect. I am using Lex to get a date from the caller into a slot and then setting a call attribute in Connect equal to the value of the slot. I need to calculate how many years have passed between the date the caller provides and today.
Can this be done within Connect and if yes, how? Or do I need to write a Lambda function?
You would need to do this in a Lambda function, as there is no access to date/time functions or ad hoc programmatic mechanisms within the Amazon Connect contact flow blocks (actions). The contact flow blocks only provide a set of comparison operators to compare contact attributes or metrics within the blocks.
You could potentially invoke this lambda function from within Lex, so that the slot data is returned as the time difference that you need, or call it from the contact flow after you get the Lex slot data with the captured date. Either way, it would need to be done in lambda.
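As a rough sketch of the Lambda side when it is called from the contact flow: the function below counts the full years between the caller's date and today. It assumes the date arrives as an ISO `YYYY-MM-DD` string in a parameter named `callerDate`, and it returns the result under `yearsElapsed`; both names are made up for this example.

```python
from datetime import date


def full_years_between(start: date, end: date) -> int:
    """Number of complete years from start to end (like computing an age)."""
    years = end.year - start.year
    # Subtract one if the anniversary has not been reached yet this year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def lambda_handler(event, context):
    # Amazon Connect passes contact flow parameters under Details.Parameters.
    params = event.get("Details", {}).get("Parameters", {})
    caller_date = date.fromisoformat(params["callerDate"])  # hypothetical name
    years = full_years_between(caller_date, date.today())
    # Return a flat map of string values for use as contact attributes.
    return {"yearsElapsed": str(years)}
```

Comparing `(month, day)` tuples avoids dividing a day count by 365, which drifts across leap years.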
I need an approach in AWS Lambda to resolve an issue, please help.
What am I doing now:
Inside the Lambda handler function I am taking data from Athena and performing some logic, and also taking data from Kinesis and performing some logic. The Lambda handler is invoked every 20 seconds.
This is pseudo code:
import pandas as pd

def lambda_handler(event, context):
    query = ...  # query to get data from Athena
    df = pd.DataFrame(query)
    # Some processing logic that takes data from Kinesis
My problem is
The data that I take from Athena changes only once a day. So every time the Lambda handler is invoked, it unnecessarily queries Athena, which is inefficient.
What I need
I need some solution approach/code to "query Athena and put the result in a dataframe with global scope" so that each time the Lambda handler is triggered it will make use of the global variable.
There are no persistent global variables within Lambda itself. The only limited persistence of data that you can count on is through the AWS Lambda execution environment:
Objects declared outside of the function's handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We recommend adding logic in your code to check if a connection exists before creating a new one.
However, this is not reliable and is short-lived. Thus the only way to avoid querying Athena so often is to store the query results outside of the Lambda function.
Depending on the nature and amount of the data to be stored, common choices for persisting data between Lambda function invocations are S3, EFS, DynamoDB, SSM Parameter Store and ElastiCache.
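The two ideas can be combined: keep a copy at module level for as long as the execution environment happens to be reused, and reload it from the external store when the copy is missing or stale. The sketch below shows only that caching logic; `load_reference_data` is a hypothetical placeholder for whatever reads the stored Athena results (from S3, DynamoDB, and so on), and the one-day TTL is taken from the question.

```python
import time

# Module-level state survives only while this execution environment is reused.
_CACHE = {"value": None, "loaded_at": 0.0}
TTL_SECONDS = 24 * 60 * 60  # the source data changes once a day


def load_reference_data():
    """Hypothetical placeholder: read the stored query results from S3, etc."""
    return [{"id": 1, "value": "example"}]


def get_reference_data(loader=load_reference_data, now=time.time):
    """Return the cached data, reloading it on a cold start or after the TTL."""
    current = now()
    if _CACHE["value"] is None or current - _CACHE["loaded_at"] >= TTL_SECONDS:
        _CACHE["value"] = loader()
        _CACHE["loaded_at"] = current
    return _CACHE["value"]


def lambda_handler(event, context):
    data = get_reference_data()
    # ... processing logic that combines `data` with the Kinesis records ...
    return {"rows": len(data)}
```

This treats the in-memory copy purely as an optimization: a fresh environment simply pays for one reload, so correctness never depends on the container being reused.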
I need to do multiple queries on DynamoDB from my .NET Lambda function (like GetItem and Query using partition and sort keys). Which one is the best way?
Having subsequent queries in a single lambda.
Have to write separate Lambda for each query and call it from other lambda.
To use the step function.
It depends. It is fine to have multiple calls to DynamoDB in a single Lambda function as long as it is doing only one thing. For example, if you have a Lambda function serving a RESTful API resource update and you want to return an HTTP 404 Not Found, it is fine to call GetItem first and UpdateItem afterwards. The same applies if you're doing a batch update and a "Query using partition and sort keys".
As with methods, when you have more than one level of abstraction, your function is usually doing too much. Splitting up functions leads to reusability and easier testing. For example, if you want to update a resource and send an email (which requires a "Query using partition and sort keys"), you definitely don't want to do both in the same Lambda function. In this case, using a step function may be a good idea and save you some time but, in the end, it should not affect the decision of whether to have multiple Lambda functions.
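To illustrate the first case (GetItem followed by UpdateItem inside one function that does a single job), here is a minimal sketch. It is written in Python with boto3-style calls rather than .NET, and the key name `id` and attribute `name` are assumptions; the table object is passed in so the logic can be exercised without AWS.

```python
def update_resource(table, resource_id, new_name):
    """Update one resource, returning 404 if it does not exist.

    table: a boto3 DynamoDB Table resource (or a stand-in for tests).
    """
    existing = table.get_item(Key={"id": resource_id})
    if "Item" not in existing:
        return {"statusCode": 404, "body": "Not Found"}

    table.update_item(
        Key={"id": resource_id},
        # "name" is a DynamoDB reserved word, so it is aliased as #n.
        UpdateExpression="SET #n = :name",
        ExpressionAttributeNames={"#n": "name"},
        ExpressionAttributeValues={":name": new_name},
    )
    return {"statusCode": 200, "body": "Updated"}
```

Both calls serve the same responsibility (updating one resource), which is why they can reasonably share a function.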
I am receiving sensory data on AWS IoT and passing these values to a Lambda function using a rule. In the Lambda function which is coded in Python, I need to make a calculation based on the latest n values.
What is the best way of accessing previous parameters?
Each Lambda invocation is supposed to be stateless and not aware of previous invocations (there is container reuse, but you cannot rely on it).
If you need those values, then you have to persist them somewhere else, such as DynamoDB or Redis on ElastiCache.
Then, when you need to do your calculations, you can retrieve the past n-1 values and do your calculations.
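One possible shape for this with DynamoDB is sketched below: each reading is stored under the device's partition key with a timestamp sort key, and the newest n items are read back in descending order. The table layout (`device_id`, `ts`, `value`) and the averaging step are assumptions standing in for the real calculation.

```python
from decimal import Decimal


def record_and_average(table, device_id, timestamp_ms, value, n):
    """Store a new reading, then average the latest n readings for the device.

    table: a boto3 DynamoDB Table resource with partition key "device_id"
           and numeric sort key "ts" (an assumed schema).
    """
    # boto3 expects Decimal rather than float for DynamoDB numbers.
    table.put_item(
        Item={"device_id": device_id, "ts": timestamp_ms, "value": Decimal(str(value))}
    )
    response = table.query(
        KeyConditionExpression="device_id = :d",
        ExpressionAttributeValues={":d": device_id},
        ScanIndexForward=False,  # newest first, by sort key
        Limit=n,
        ConsistentRead=True,  # make sure the item just written is visible
    )
    values = [float(item["value"]) for item in response["Items"]]
    return sum(values) / len(values)
```

Because the new reading is written before the query, the result covers the current value plus the previous n-1 values.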
I am seeking advice on what's the best way to design this -
Use Case
I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.
Naive Approach
The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
A basic implementation would be to take my existing Lambda function, which is invoked any time my S3 bucket is written to, and have it check manually whether all the other files have been saved.
The Lambda function would know (look in Dynamo to determine what we're looking for) and query S3 to see if the other files are in. If so, use SNS to trigger my other lambda that will do the other work.
Edit: Another approach is have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.
Thoughts
I don't think this is a good idea. Mainly because Lambda functions should be lightweight, and polling the database from within the Lambda function to get the S3 keys of all the uploaded files and then checking in S3 if they are there - doing this each time seems ghetto and very repetitive.
What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).
Edit In response to mbaird's suggestions below-
Option 1 (SNS) This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principle. That is, the client uploads the files and sends a notification (via SNS) that its work is done.
Option 2 (Dynamo streams) So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case, results in a table update vs. a SNS notification (Option 1). This update would trigger the Lambda function, as opposed to notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.
In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D
Option 3 with S3 triggers My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementations we use) query Dynamo and both get back N as the completed uploads (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!
So Option 1 wins.
If you don't want the client program responsible for invoking the Lambda function directly, then would it be OK if it did something a bit more generic?
Option 1: (SNS) What if it simply notified an SNS topic that it had completed a batch of S3 uploads? You could subscribe your Lambda function to that SNS topic.
Option 2: (DynamoDB Streams) What if it simply updated the DynamoDB record with something like an attribute record.allFilesUploaded = true. You could have your Lambda function trigger off the DynamoDB stream. Since you are already creating a DynamoDB record via the client, this seems like a very simple way to mark the batch of uploads as complete without having to code in knowledge about what needs to happen next. The Lambda function could then check the "allFilesUploaded" attribute instead of having to go to S3 for a file listing every time it is called.
Alternatively, don't insert the DynamoDB record until all files have finished uploading, then your Lambda function could just trigger off new records being created.
Option 3: (continuing to use S3 triggers) If the client program can't be changed from how it works today, then instead of listing all the S3 files and comparing them to the list in DynamoDB each time a new file appears, simply update the DynamoDB record via an atomic counter. Then compare the resulting value against the size of the file list. Once the values are the same, you know all the files have been uploaded. The downside to this is that you need to provision enough capacity on your DynamoDB table to handle all the updates, which is going to increase your costs.
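A sketch of the atomic counter in Option 3 follows. It assumes the client-created record has a `batch_id` key and a `total_files` attribute, and that the counter lives in `uploaded_count`; those names are invented for the example. The increment and the read-back happen in a single UpdateItem call, so two concurrent invocations cannot both read N and write N+1.

```python
def record_upload(table, batch_id):
    """Atomically count one finished upload; return True when the batch is complete.

    table: a boto3 DynamoDB Table resource (or a stand-in for tests).
    """
    response = table.update_item(
        Key={"batch_id": batch_id},
        # ADD increments the number server-side, creating it at 1 if absent.
        UpdateExpression="ADD uploaded_count :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="ALL_NEW",  # return the item as it looks after the update
    )
    item = response["Attributes"]
    return item["uploaded_count"] == item["total_files"]
```

The S3-triggered function would call `record_upload` once per object event and start the follow-up work only when it returns True. If the same event could be delivered more than once, the count would need extra de-duplication, which this sketch does not attempt.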
Also, I agree with you that SWF is overkill for this task.