I'm currently running a CloudFormation stack with a number of elements to process video, including a call to Rekognition. I have most of it working but have a question about properly storing information as I go... so I can, in the end, write the Rekognition data for a video to a DynamoDB table.
Below I have the relevant parts of the stack, which are mostly inside of a Step Function passing this input event along:
sample_event = {
    "guid": "1234",
    "video": "video.mp4",
    "bucket": "my-bucket"
}
Current setup:
1. Write sample_event to a DynamoDB table, keyed by primary key 'guid', and pass sample_event along to the next step.
2. Rekognition-Trigger Lambda: Lambda function that runs start_label_detection() on 'video.mp4' in 'my-bucket', with an SNS topic set as the notification channel.
3. Rekognition-Collect Lambda: Lambda function (sits outside the Step Function) that is triggered by the SNS topic (several minutes later, for example), collects the JobId from the SNS message, and runs get_label_detection() with that JobId.
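For reference, steps 2 and 3 might look roughly like this Python sketch (the topic/role ARNs and handler names are placeholders). Note that start_label_detection also accepts a JobTag parameter, which Rekognition echoes back in the SNS completion message:

```python
import json

def build_start_kwargs(event, topic_arn, role_arn):
    # Build the start_label_detection arguments; JobTag carries the guid
    # so it comes back in the SNS completion message.
    return {
        "Video": {"S3Object": {"Bucket": event["bucket"], "Name": event["video"]}},
        "NotificationChannel": {"SNSTopicArn": topic_arn, "RoleArn": role_arn},
        "JobTag": event["guid"],
    }

def trigger_handler(event, context):
    import boto3  # imported lazily; needs AWS credentials at runtime
    rek = boto3.client("rekognition")
    # Placeholder ARNs; substitute your real topic and role.
    rek.start_label_detection(**build_start_kwargs(
        event, "arn:aws:sns:REGION:ACCOUNT:rek-topic",
        "arn:aws:iam::ACCOUNT:role/rek-role"))
    return event

def collect_handler(event, context):
    import boto3
    # The SNS completion message includes JobId, Status, and the JobTag set above.
    msg = json.loads(event["Records"][0]["Sns"]["Message"])
    rek = boto3.client("rekognition")
    labels = rek.get_label_detection(JobId=msg["JobId"])
    return msg.get("JobTag"), labels
```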
The above is working fine. I want to add step 4:
4. Write the Rekognition response to my DynamoDB table for the entry at "guid" = "1234", so my Dynamo item is updated to
{ "guid": "1234", "video": "video.mp4", "bucket": "my-bucket", "rek_data": {"Labels": [...]} }
So it seems to me that I essentially can't pass any data through Rekognition other than via the SNS topic. It also seems that in the second Lambda I shouldn't be querying by a non-primary-key attribute such as the JobId.
Is there a way to set up the second lambda function so that it is triggered by two (and only the correct two) SNS topics? Such as one to send the 'guid' and one to send the Rekognition data?
Or would it be more efficient to use two Dynamo tables, one to temporarily store the JobID and guid for later reference? Or is there a better way to do all of this?
Thanks!
Related
I am using a graphql API with AppSync that receives post requests from a lambda function that is triggered by AWS IoT with sensor data in the following JSON format:
{
    "scoredata": {
        "id": "240",
        "distance": 124,
        "timestamp": "09:21:11",
        "Date": "04/16/2022"
    }
}
The Lambda function uses this JSON object to perform a post request on the GraphQL API, and AppSync puts this data in DynamoDB to be stored. My issue is that whenever I parse the JSON object within my Lambda function to retrieve the id value, the id value does not match the id value stored in DynamoDB; AppSync is seemingly generating an id automatically.
Here is a screenshot of the request made to the GraphQL API, taken from CloudWatch:
Here is what DynamoDB is storing:
I would like to know why the id in DynamoDB is shown as "964a3cb2-1d3d-4f1e-a94a-9e4640372963" when the post request's id value is "240", and if there is anything I can do to fix this.
I can't tell for certain, but I'm guessing that the DynamoDB schema is autogenerating the id field on insert, using a UUID as the id type. An alternative would be to introduce a new property like score_id to store this incoming id.
If you are using Amplify, most likely the automatically generated request mapping templates identify the "id" field as a unique identifier to be generated at runtime.
I recommend taking a look at your VTL request template; you will most likely find something like this:
$util.qr($context.args.input.put("id", $util.defaultIfNull($ctx.args.input.id, $util.autoId())))
The self-generated id almost certainly comes from $util.autoId().
Some older versions of Amplify may have omitted the $util.defaultIfNull($ctx.args.input.id,... check and always overwritten the id by self-generating it.
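If the template does keep the defaultIfNull check, sending the id explicitly in the mutation input should preserve it. A minimal sketch of such a request from the Lambda side, assuming an API-key-authorized AppSync endpoint and a hypothetical createScoredata mutation (adjust all names to your actual schema):

```python
import json

def build_mutation(scoredata):
    # Hypothetical mutation name and fields; adjust to your AppSync schema.
    # The key point is that "id" is passed through in the input object.
    query = """
    mutation CreateScoredata($input: CreateScoredataInput!) {
      createScoredata(input: $input) { id distance timestamp Date }
    }"""
    return {"query": query, "variables": {"input": scoredata}}

def post_to_appsync(url, api_key, scoredata):
    from urllib import request  # stdlib only; no external HTTP library needed
    body = json.dumps(build_mutation(scoredata)).encode()
    req = request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "x-api-key": api_key,
    })
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```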
I am trying to use the 'newUUID()' AWS IoT function in the AWS SiteWise service (as part of an alarm action); it returns a random 16-byte UUID that I want to store in a DynamoDB table's partition key column.
With reference to the attached screenshot, in the 'PartitionKeyValue' field I am trying to use the value returned by the newUUID() function, which will be passed to DynamoDB as part of the action trigger.
Although this gives an error as follows:
"Invalid Request exception: Failed to parse expression due to: Invalid expression. Unrecognized function: newUUID".
I understand the error, but I am not sure how to solve this and use a random UUID generator. Kindly note that I do not want to use a timestamp, because multiple events could be triggered at the same moment and would then share the same timestamp.
Any ideas on how I can use this function, or any other information that helps me achieve the above?
The docs you refer to say that function is all lowercase: newuuid().
Perhaps that will work, but I believe that function is only available in IoT Core SQL Statements. I think with event notifications, you only have these expressions to work with, which is not much. Essentially, you need to get what you need from the alarm event itself.
You may need the alarm event to invoke Lambda, rather than directly write to DynamoDB. Your Lambda function can create a UUID and write the alarm record to DynamoDB using the SDKs.
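A minimal sketch of that Lambda, generating the key with Python's uuid module (the table name and attribute names here are assumptions about the alarm payload):

```python
import uuid

def make_item(alarm_event):
    # Generate a random UUID for the partition key, so simultaneous
    # alarm events never collide the way identical timestamps would.
    return {
        "id": {"S": str(uuid.uuid4())},
        "payload": {"S": str(alarm_event)},
    }

def handler(event, context):
    import boto3  # lazy import so the sketch is inspectable without AWS set up
    # "alarm-events" is a placeholder table name.
    boto3.client("dynamodb").put_item(TableName="alarm-events",
                                      Item=make_item(event))
```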
A Lambda needs to get all results from DynamoDB, perform some processing on each record, and trigger a Step Function workflow. Although DynamoDB returns paginated results, the Lambda will time out if there are too many pages to process within the 15-minute Lambda limit. Is there any workaround that keeps using Lambda, other than moving to Fargate?
Overview of Lambda
while True:
    l, nextToken = get list of records from DynamoDB
    for each record in l:
        perform some preprocessing, like reading a file and triggering a workflow
    if nextToken == None:
        break
I assume processing one record can fit inside the 15-minute lambda limit.
What you can do is make your original Lambda an orchestrator that calls a worker Lambda to process a single record or page.
Orchestrator Lambda
while True:
    l, nextToken = get list of records from DynamoDB
    for each record in l:
        call the worker lambda, passing the record as the event
    if nextToken == None:
        break
Worker Lambda
perform some preprocessing, like reading a file and triggering a workflow
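A runnable sketch of the orchestrator under these assumptions (the table and worker function names are placeholders). InvocationType="Event" makes each worker call asynchronous, so the orchestrator only spends its 15 minutes on the paging loop, not the per-record work:

```python
import json

def build_scan_kwargs(table_name, start_key=None, limit=25):
    # Pure helper: arguments for one paginated DynamoDB scan call.
    kwargs = {"TableName": table_name, "Limit": limit}
    if start_key is not None:
        kwargs["ExclusiveStartKey"] = start_key
    return kwargs

def orchestrator_handler(event, context):
    import boto3  # lazy import so the sketch can be read without AWS set up
    ddb = boto3.client("dynamodb")
    lam = boto3.client("lambda")
    start_key = None
    while True:
        resp = ddb.scan(**build_scan_kwargs("my-table", start_key))
        for item in resp["Items"]:
            # Fire-and-forget: each record becomes one worker invocation.
            lam.invoke(FunctionName="worker-lambda",
                       InvocationType="Event",
                       Payload=json.dumps(item).encode())
        start_key = resp.get("LastEvaluatedKey")
        if start_key is None:
            break
```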
You can use SQS to process these in rapid succession. You can even use it to run them more or less in parallel rather than sequentially.
Lambda reads from DynamoDB -> breaks each entry into a JSON object -> sends each JSON object to SQS -> which queues them out to multiple invoked Lambdas -> each of which is designed to handle one single entry and finish
Doing this allows you to split up long tasks that may take many hours across multiple Lambda invocations, by designing the second Lambda to handle only one iteration of the task and using SQS as your loop/iterator. You can configure SQS to send as fast as possible or one message at a time (though if you send one at a time you will have to manage the time-to-live and staleness settings of the messages in the queue).
In addition, if this is a regular thing where new items get added to the Dynamo table that then have to be processed, you should make use of DynamoDB Streams: every time a new item is added, that triggers a Lambda to fire on the new item, allowing you to run your workflow in real time as items are added.
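The Dynamo-to-SQS fan-out described above might be sketched like this (the queue URL and event shape are assumptions); send_message_batch accepts at most 10 entries per call, so the entries are chunked first:

```python
import json

def to_batches(records, size=10):
    # SQS send_message_batch accepts at most 10 entries per call,
    # so split the records into batches of that size.
    entries = [{"Id": str(i), "MessageBody": json.dumps(r)}
               for i, r in enumerate(records)]
    return [entries[i:i + size] for i in range(0, len(entries), size)]

def fan_out_handler(event, context):
    import boto3  # lazy import; needs AWS credentials at runtime
    sqs = boto3.client("sqs")
    records = event["records"]  # assumed shape: already-scanned Dynamo items
    for batch in to_batches(records):
        # Placeholder queue URL; each message triggers one worker Lambda.
        sqs.send_message_batch(
            QueueUrl="https://sqs.REGION.amazonaws.com/ACCOUNT/my-queue",
            Entries=batch)
```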
I have a use case where I want a scheduled Lambda to read from a DynamoDB table until there are no records left to process from its DynamoDB query. I don't want to run lots of instances of the Lambda, as each one will hit a REST endpoint and I don't want to overload this external service.
The reason I think I can't use DynamoDB Streams (please correct me if I am wrong here) is
that this DDB is where messages will be sent when a legacy service is down; the scheduled error-handler Lambda that reads them should not try to process them as soon as they are inserted, since the legacy service is likely still down. (Is it possible with Streams to update one row in the DB, say legacy_service = alive, and then trigger a Lambda ONLY for the rows where processed_status = false?)
I also don't want multiple instances of the Lambda running at the same time, as I don't want to throttle the legacy service.
I would like a scheduled Lambda that queries the DynamoDB table for all records with processed_status = false, with a query limit so it only retrieves a small batch (1 or 2 messages) and processes them (I have this part implemented already). When this Lambda finishes, I would like it to trigger again and again until there are no records left in the DDB with processed_status = false.
This can be done with recursive functions; there is a good tutorial here: https://labs.ebury.rocks/2016/11/22/recursive-amazon-lambda-functions/
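A sketch of that recursive pattern, where the Lambda re-invokes itself asynchronously until nothing is left to do. Here query_unprocessed and process_record are stand-ins for the parts the question says are already implemented:

```python
import json

BATCH_LIMIT = 2

def should_reinvoke(unprocessed_remaining):
    # Re-invoke only while unprocessed rows remain, so the chain stops itself.
    return unprocessed_remaining > 0

def query_unprocessed(limit):
    # Placeholder for the already-implemented query on processed_status = false;
    # returns (items_to_process, count_still_remaining).
    raise NotImplementedError

def process_record(item):
    # Placeholder for the existing per-message processing / REST call.
    raise NotImplementedError

def handler(event, context):
    import boto3  # lazy import; needs AWS credentials at runtime
    items, remaining = query_unprocessed(limit=BATCH_LIMIT)
    for item in items:
        process_record(item)  # hits the external endpoint, one batch at a time
    if should_reinvoke(remaining):
        # Async self-invocation: only one copy is ever running at a time.
        boto3.client("lambda").invoke(
            FunctionName=context.function_name,
            InvocationType="Event",
            Payload=json.dumps({"chained": True}).encode())
```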
I have created a rule to send incoming IoT messages to an S3 bucket.
The problem is that every time IoT receives a message, it is stored in a new file (with the same name) in S3, replacing the previous one.
I want this S3 file to keep all the earlier data rather than being overwritten each time a new message is stored.
How can I do that?
When you set up an IoT S3 rule action, you need to specify a bucket and a key. The key is what we might think of as a "path and file name". As the docs say, we can specify the key string by using a substitution template, which is just a fancy way of saying "build a path out of these pieces of information". When you are building your substitution template, you can reference fields inside the message as well as use a bunch of other functions.
In particular, look at the topic and timestamp functions, as well as some of the string manipulation functions.
Let's say your topic names are something like things/thing-id-xyz/location and you just want to store each incoming JSON message in a "folder" for the thing-id it came in from. You might specify a key like:
${topic(2)}/${timestamp()}.json
it would evaluate to something like:
thing-id-xyz/1481825251155.json
where the timestamp part is the time the message came in. That will be different for each message, and then the messages would not overwrite each other.
You can also specify parts of the message itself. Let's imagine our incoming messages look something like this:
{
"time": "2022-01-13T10:04:03Z",
"latitude": 40.803274,
"longitude": -74.237926,
"note": "Great view!"
}
Let's say you want to use the nice ISO date value you have in your data instead of the timestamp of the file. You could reference the time field no problem, like:
${topic(2)}/${time}.json
Now the file would be written as the key:
thing-id-xyz/2022-01-13T10:04:03Z.json
You should be able to find some combination of values that works for your needs, and that most importantly, is UNIQUE for each message so they don't overwrite each other in S3.
You can do it using AWS IoT SQL variable expressions. For example, use the following as the key: ${newuuid()}. This will create a new S3 object for each message received.
See more about SQL functions here: https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html
You can't do this with the S3 IoT Rule Action. You can get similar results using AWS Firehose, which will batch up several messages and write to one file. You will still end up with multiple files though.