I am receiving sensory data on AWS IoT and passing these values to a Lambda function using a rule. In the Lambda function which is coded in Python, I need to make a calculation based on the latest n values.
What is the best way of accessing previous parameters?
Each Lambda invocation is supposed to be state-less and not aware of previous invocations (there's container reuse but you cannot rely on that).
If you need those, then you have to persist those parameters somewhere else like DynamoDB or Redis on Elasticache.
Then, when you need to do your calculations, you can retrieve the past n-1 values and do your calculations.
Related
I am attempting to pass json data into my sagemaker model through a lambda function. Currently, I am using a testing model that makes relatively quick inferences and returns them to the lambda function through the invoke_endpoint call. However, eventually a more advanced model will be implemented which might take longer than a lambda function can fun for (15 minutes maximum) to produce inferences. In the case that I call invoke_endpoint in one lambda function, can I return the response to another lambda function which is invoked by the sagemaker endpoint response? Even better, can I shut down the current lambda function after sending the data to sagemaker, and re-invoke it upon a response? I need to store the inference in DynamoDB, which is why I need a response (Unless I can update the saved model to store inferences directly, in which case I need the lambda function to not expect a response from invoke_endpoint). Sorry for my ignorance, I am a bit new to sagemaker.
When calling invoke_endpoint, the underlying model invocation must take less than 1 minute. If a single model execution needs more time to execute, consider running the model in Lambda itself, in SageMaker Training API (if its coldstart is acceptable) or in a custom service. If the invocation is made of several shorter calls you can also chain multiple services together with Step Functions.
I need an approach in AWS lambda to resolve a issue please help
What am I doing now:
Inside lambda handler function I am taking data from athena and performing some logic, also taking data from kinesis performing some logic. lambda handler is invoked every 20 sec
This is pseudo code:
def lambda_handler(event, context):
query = query to get data from athena
df = pd.DataFrame(query)
###Some processing logic from by taking data from kinesis###
My problem is
The data that I take from athena will change only once in a day. So every time when lambda handler is invoked it is unnecessarily querying to athena which is inefficient
What I need
I need some solution approach/code to "query athena and put in dataframe as global scope" so each time when lambda handler is triggered it will make use of global variable.
There are no persistent global variables within lambda itself. The only limited persistence of data that you can count for is through AWS Lambda execution environment:
Objects declared outside of the function's handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We recommend adding logic in your code to check if a connection exists before creating a new one.
However, this is not reliable and short lived. Thus the only way for you not to query Athena often, is to store the query results outside of lambda function.
Depending on the nature and amount of the data to be stored, a common choices to ensure persistence of the data between lambda function invocations are S3, EFS, DynamoDB, SSM Parameter Store and ElasticCache.
I need to store a variable containing a URL across invocations of my function, and to be able to modify this variable from inside the function. So in short I need to be able to send this URL to the function, and be able to retrieve it later. I've created a bucket I could store the URL in, but I'm having a real hard time understanding the documentation on how to write to and read from the bucket using lambda. Using the bucket or some other method, how would I store this piece of data? I'm using python.
There are several ways, which depend on exactly how the URL is shared, accessed, how often you read/write it.
For infrequent access, e.g., you lambda executes once a minute, you can store it in AWS Systems Manager Parameter Store.
For high frequencies and concurrent access, probably you should consider using DynamoDB.
S3 can also be used, but it will be the slowest and requires a bit of setup to read and write from your lambda. Access to parameter store is rather simple, as you can use get_parameter boto3 sdk.
I'm interested in seeing whether I can invoke an AWS Lambda when one of my DynamoDB tables grows to a certain size. Nothing in the DynamoDB Events/Triggers docs nor the Lambda Developer Guide suggests this is possible, but I find that hard to believe. Anyone ever deal with anything like this before?
You will have to do it manually.
I see two out-of-the box ways to achieve this though:
1) You can create a CloudWatch Event that runs every X min (replace X with whatever you think is necessary for your business case) to trigger your Lambda Function. Your function then needs to invoke the describeTable API and run a check against that value. Once it has run, you can disable the event since your table has reached the size you wanted to be notified about. This is the easiest and most cost effective since most of time your tables size will be lower than your predefined limit.
2) You could also use DynamoDB streams and invoke the describeTable API, but then your function would be triggered upon every new event in your table. This is cost ineffective and, in my opinion, overkilling.
I need to do multiple queries on dynamodb from my dotnet lamba function (Like GetItem and Query using partition and sort keys). Which one is the best way?
Having subsequent queries in a single lambda.
Have to write separate Lambda for each query and call it from other lambda.
To use the step function.
It depend. It is fine to have multiple calls to dynamodb in a single lambda function as long it is doing only one thing. For example, if you have a lambda function serving a restful API resource update and you want to give an HTTP 404 - NotFound, it is fine to call GetItem first and an UpdateItem later on. Same applies you're doing a batch update and "Query using partition and sort keys".
Similarly to methods, usually when you have more than one level of abstraction your function is usually doing too much. Splitting up functions leads to reusability and easier testing. For example, if you want to update a resource and send an email (which require "Query using partition and sort keys"), you definitely don't want to do it in the same lambda function. In this case, using a step function may be a good idea and save you some time but, in the end, should not matter for the discussion if you should have multiple lambda functions or not.