Add a new item to a DynamoDB table using an AWS Lambda function each time the function is executed with CloudWatch - amazon-web-services

I'm trying to modify a DynamoDB table each time a Lambda function is executed.
Specifically, I created a simple Lambda function that returns a list of S3 bucket names, and this function runs every minute thanks to a CloudWatch rule.
However, as I said before, my goal is to also update a DynamoDB table each time the same function is executed. Specifically, I want to add a new item with the same attribute on every execution (so if the function is executed 1000 times, I want 1,000 items/rows).
However, I don't know how to do it. Any suggestions? Here's the code:
import json
import boto3

s3 = boto3.resource('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Table')

def lambda_handler(event, context):
    bucket_list = []
    for b in s3.buckets.all():
        print(b.name)
        bucket_list.append(b.name)
    response = "done"
    table.put_item(
        Item={
            "Update": response
        }
    )
    return {
        "statusCode": 200,
        "body": bucket_list
    }
Thank you in advance

Your problem is that PutItem overwrites existing items if they have the same key. So every time you try to insert Update=done, it just overwrites the same item.
The very first sentence of the documentation states:
Creates a new item, or replaces an old item with a new item.
So what you need to do is put something in your item that is unique, so that a new item is created instead of the old one being overwritten.
You could create a UUID or something like that, but I think it would be beneficial to use the time of execution. This way you could also see when your last execution was, etc.
from datetime import datetime

[...]

table.put_item(
    Item={
        "Update": response,
        "ProcessingTime": datetime.now().isoformat()
    }
)
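If you would rather go with the UUID idea mentioned above, a minimal sketch could look like the following (the ExecutionId attribute name is just an example; use whatever fits your table's key schema):

import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Table')

def lambda_handler(event, context):
    table.put_item(
        Item={
            "Update": "done",
            # A random UUID makes every item unique, so nothing is overwritten.
            "ExecutionId": str(uuid.uuid4())
        }
    )
    return {"statusCode": 200}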

Adding to what Jens stated, which is 100% correct.
You could use data from the event. The event will look something like this:
{
    "id": "cdc73f9d-aea9-11e3-9d5a-835b769c0d9c",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "123456789012",
    "time": "1970-01-01T00:00:00Z",
    "region": "us-west-2",
    "resources": [
        "arn:aws:events:us-west-2:123456789012:rule/ExampleRule"
    ],
    "detail": {}
}
The id value will be 100% unique, and the time value will be the time it was triggered.
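For example, here is a sketch of the handler from the question that stores both values (the ExecutionId and TriggerTime attribute names are just examples; adjust them to your table's key schema):

import boto3

s3 = boto3.resource('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Table')

def lambda_handler(event, context):
    bucket_list = [b.name for b in s3.buckets.all()]
    table.put_item(
        Item={
            "Update": "done",
            # Unique per scheduled invocation, so each run creates a new item.
            "ExecutionId": event["id"],
            # The time the CloudWatch / EventBridge rule fired (UTC).
            "TriggerTime": event["time"]
        }
    )
    return {
        "statusCode": 200,
        "body": bucket_list
    }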

Related

Lambda event filtering for DynamoDB trigger

Here is a modified version of the Event I am receiving in my handler for a Lambda function with a DynamoDB someTableName table trigger, which I logged using cargo lambda.
Event {
    records: [
        EventRecord {
            change: StreamRecord {
                approximate_creation_date_time: ___
                keys: {"id": String("___")},
                new_image: {
                    ....
                    "valid": Boolean(true),
                },
                ...
            },
            ...
            event_name: "INSERT",
            event_source: Some("aws:dynamodb"),
            table_name: None
        }
    ]
}
Goal: Correctly filter with event_name=INSERT && valid=false
I have tried a number of options, for example:
{"eventName": ["INSERT"]}
While the filter is added correctly, it does not trigger the Lambda when an item is inserted.
Q1) What am I doing incorrectly here?
Q2) Why is table_name returning None? The Lambda function is created with a specific table name as the trigger. The returned fields are Options (Some(_)), so I'm assuming it returns None when the table name is specified at Lambda creation, but that seems odd to me?
Q3) From AWS Management Console > Lambda > ... > Trigger Detail, I see the following (which is slightly different from my code mentioned above), where does "key" come from and what does it represent in the original Event?
Filters must follow the documented syntax for filtering in the Event Source Mapping between Lambda and DynamoDB Streams.
If you are entering the filter in the Lambda console:
{ "eventName": ["INSERT"], "dynamodb": { "NewImage": {"valid": { "BOOL" : [false]}} } }
The attribute name is actually eventName, so your filter should look like this:
{"eventName": ["INSERT"]}

Pass date or cron schedule from cloud watch events to step function input

Is there a way I can pass a date or a cron schedule as an input to my state machine, which is being triggered by a CloudWatch event? The CloudWatch event runs on a cron schedule, and I would like to pass that dynamically to the Step Function on a daily basis.
For example:
This gives some static input, but I want to give each day's date as input
resource "aws_cloudwatch_event_target" "target" {
rule = aws_cloudwatch_event_rule.samplerule.id
arn = aws_sfn_state_machine.samplemachine.id
role_arn = aws_iam_role.iam_for_sfn.arn
input = <<EOF
{
"operand1": "3",
"operand2": "5",
"operator": "add"
}
EOF
}
An alternative solution could be to use global access to the context object, as explained here, to get the execution start time of the Step Functions state machine.
You can then send it through the different states of your state machine like this:
"mystep1": {
"Type": "task",
"Parameters": {
"StartTime.$": "$$.Execution.StartTime"
}
}
Make sure to use the double $ to tell Step Functions that you're accessing the context object rather than the state input.
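On the receiving side, a minimal sketch of a Lambda task handler reading that parameter (StartTime is simply the key chosen in the Parameters block above):

from datetime import datetime

def lambda_handler(event, context):
    # "StartTime" is the key defined in the state's Parameters block.
    start_time_str = event["StartTime"]  # e.g. "2019-10-08T16:53:06.536Z"
    # Convert the ISO-8601 string into a datetime (handle the trailing "Z").
    start_time = datetime.fromisoformat(start_time_str.replace("Z", "+00:00"))
    return {"date": start_time.date().isoformat()}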
The input to your Lambda function from a Scheduled Event looks something like this:
{
    "id": "53dc4d37-cffa-4f76-80c9-8b7d4a4d2eaa",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "123456789012",
    "time": "2019-10-08T16:53:06Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:123456789012:rule/MyScheduledRule"
    ],
    "detail": {}
}
Using your choice of programming language, you can extract the time value and convert it from a string to a date object. From that time, you can get the date components you are looking for.
IMPORTANT: All scheduled events use UTC time zone and the minimum precision for schedules is 1 minute: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html
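For example, in Python, a sketch of extracting the date components from the scheduled event above:

from datetime import datetime, timezone

def lambda_handler(event, context):
    # "time" is an ISO-8601 UTC timestamp, e.g. "2019-10-08T16:53:06Z".
    trigger_time = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Pull out whatever date components you need.
    return {
        "date": trigger_time.date().isoformat(),  # "2019-10-08"
        "year": trigger_time.year,
        "month": trigger_time.month,
        "day": trigger_time.day
    }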

Using an EventBridge event pattern string in a lambda function

I have a lambda function using Python.
It's connected to an EventBridge rule that triggers every time there's a change in a Glue table.
The event pattern it's outputting looks something like this:
{
    "version": "0",
    "detail": {
        "databaseName": "flights-db",
        "typeOfChange": "UpdateTable",
        "tableName": "flightscsv"
    }
}
I want to get the tableName and databaseName values from this output into the function as a variable.
My Lambda function:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    return_statement = 'Boto3 version: ', boto3_version,\
        'Event output: ', event
    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
I was expecting to get the event pattern output from the event in my return statement but that's not the case.
When testing this function, the return output for event is:
{\"key1\": \"value1\", \"key2\": \"value2\", \"key3\": \"value3\"}
These keys and values are the ones defined in the test event for the function.
The EventBridge rule is defined like this:
How can I get the values from the event pattern to a variable?
Do I need to configure the test pattern to get the results into event?
EDIT:
Picture of log events for the table change event:
The event objects generated by CloudWatch (CW) Events / EventBridge (EB) are listed here. These events will be passed to your function when it is triggered by EB.
Your EB Event Pattern should be:
{
    "source": ["aws.glue"],
    "detail-type": ["Glue Data Catalog Table State Change"]
}
The above should match changes to any tables in your glue catalog. The event should be similar to the one below:
{
    "version": "0",
    "id": "2617428d-715f-edef-70b8-d210da0317a0",
    "detail-type": "Glue Data Catalog Table State Change",
    "source": "aws.glue",
    "account": "123456789012",
    "time": "2019-01-16T18:16:01Z",
    "region": "eu-west-1",
    "resources": [
        "arn:aws:glue:eu-west-1:123456789012:table/d1/t1"
    ],
    "detail": {
        "databaseName": "d1",
        "changedPartitions": [
            "[C.pdf, dir3]",
            "[D.doc, dir4]"
        ],
        "typeOfChange": "BatchCreatePartition",
        "tableName": "t1"
    }
}
Thus, to get tableName and databaseName your lambda function could be:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    print(event)
    table_name = event['detail']['tableName']
    database_name = event['detail']['databaseName']
    print(table_name, database_name)
    return_statement = {
        'boto3_version': boto3_version,
        'table_name': table_name,
        'database_name': database_name
    }
    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
For testing, you can set up a sample EB event in your Lambda test window:

AWS Step Function get all Range keys with a single primary key in DynamoDB

I'm building AWS Step Function state machines. My goal is to read all Items from a DynamoDB table with a specific value for the Hash key (username) and any (wildcard) Sort keys (order_id).
Basically something that would be done from SQL with:
SELECT username,order_id FROM UserOrdersTable
WHERE username = 'daffyduck'
I'm using Step Functions to get configurations for AWS Glue jobs from a DynamoDB table and then spawn a Glue job via step function's Map for each dynamodb item (with parameters read from the database).
"Read Items from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "UserOrdersTable",
"Key": {
"username": {
"S": "daffyduck"
},
"order_id": {
"S": "*"
}
}
},
"ResultPath": "$",
"Next": "Invoke Glue jobs"
}
But I can't get the state machine to read all order_ids for the user daffyduck in the Step Function task above. No output is displayed using the code above, apart from HTTP stats.
Is there a wildcard for order_id ? Is there another way of getting all order_ids? The query customization seems to be rather limited inside step functions:
https://docs.amazonaws.cn/amazondynamodb/latest/APIReference/API_GetItem.html#API_GetItem_RequestSyntax
Basically I'm trying to accomplish what can be done from the command line like so:
$ aws dynamodb query \
--table-name UserOrdersTable \
--key-condition-expression "Username = :username" \
--expression-attribute-values '{
":username": { "S": "daffyduck" }
}'
Any ideas? Thanks
I don't think that is possible with the Step Functions DynamoDB service integration yet.
It currently supports GetItem, PutItem, DeleteItem and UpdateItem, not Query or Scan.
For GetItem we need to pass the entire key (partition + range key):
For the primary key, you must provide all of the attributes. For
example, with a simple primary key, you only need to provide a value
for the partition key. For a composite primary key, you must provide
values for both the partition key and the sort key.
We need to write a Lambda function that queries DynamoDB and returns a map, and invoke that Lambda function from the step.
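A minimal sketch of such a Lambda function, assuming the table and attribute names from the question:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserOrdersTable')

def lambda_handler(event, context):
    # Query every item that shares the given partition key, regardless of sort key.
    response = table.query(
        KeyConditionExpression=Key('username').eq(event['username'])
    )
    # (A production version would also follow LastEvaluatedKey to paginate.)
    # Return the items so a Map state can iterate over them.
    return {'items': response['Items']}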

How to disable (or redirect) logging on an AWS Step Function that calls parallel Lambda functions

I'm running an AWS step function with parallel execution branches.
Each branch succeeds individually; however, the overall function fails with the following error:
States.DataLimitExceeded - The state/task returned a result with a size exceeding the maximum number of characters service limit.
I then found an article from AWS that describes this issue and suggests a work around:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html
That article says:
The Lambda invoke API includes logs in the response by default. Multiple Lambda invocations in a workflow can trigger States.DataLimitExceeded errors. To avoid this, include "LogType" = "None" as a parameter when you invoke your Lambda functions.
My question is: where exactly do I put it? I've tried putting it in various places in the state machine definition; however, I get the following error:
The field 'LogType' is not supported by Step Functions
That error seems contrary to the support article, so perhaps I'm doing it wrong!
Any advice is appreciated, thanks in advance!
Cheers
UPDATE 1 :
To be clear, this is a parallel function, with 26 parallel branches. Each branch has a small output as per the example below. The biggest item in this data is the LogResult, which (when base64 decoded) is just the billing info. I think this info multiplied by 26 has led to the error, so I just want to turn this LogResult off!!!
{
    "ExecutedVersion": "$LATEST",
    "LogResult": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
    "Payload": {
        "statusCode": 200,
        "body": {
            "signs": 63,
            "nil": ""
        }
    },
    "SdkHttpMetadata": {
        "HttpHeaders": {
            "Connection": "keep-alive",
            "Content-Length": "53",
            "Content-Type": "application/json",
            "Date": "Thu, 21 Nov 2019 04:00:42 GMT",
            "X-Amz-Executed-Version": "$LATEST",
            "X-Amz-Log-Result": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
            "x-amzn-Remapped-Content-Length": "0",
            "x-amzn-RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e",
            "X-Amzn-Trace-Id": "root=1-5dd60be1-47c4669ce54d5208b92b52a4;sampled=0"
        },
        "HttpStatusCode": 200
    },
    "SdkResponseMetadata": {
        "RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e"
    },
    "StatusCode": 200
}
I ran into exactly the same problem as you recently. You haven't said what your Lambdas are doing or returning; however, AWS documents the limits that tasks have within executions: https://docs.aws.amazon.com/step-functions/latest/dg/limits.html#service-limits-task-executions.
What I found was that my particular Lambda had an extremely long response, with tens of thousands of characters. Amending that so that the response from the Lambda was more reasonable got past the error in the step function.
I had the same problem a week ago.
The way I solved it is as follows:
You can define which portion of the result is transmitted to the next step.
For that you have to use
"OutputPath": "$.part2",
In your JSON input you have:
"part1": {
"portion1": {
"procedure": "Delete_X"
},
"portion2":{
"procedure": "Load_X"
}
},
"part2": {
"portion1": {
"procedure": "Delete_Y"
},
"portion2":{
"procedure": "Load_Y"
}
}
Once part1 is processed, you make sure that part1 (and the ResultPath related to it) is not sent in the output; just part2, which is needed for the following steps, is passed on.
You do that with: "OutputPath": "$.part2",
let me know if that helps
I got stuck on the same issue. Step Functions imposes a limit of 32,768 characters on the data that can be passed around between two states.
https://docs.aws.amazon.com/step-functions/latest/dg/limits.html
Maybe you need to break down your problem in a different way? That's what I did, because removing the log response gives you some headroom, but the solution will not scale past a certain limit.
I handle large data in my Step Functions by storing the result in an S3 bucket, and then having my State Machine return the path to the result-file (and a brief summary of the data or a status like PASS/FAIL).
The same could be done using a DB if that's more comfortable.
This way you won't have to modify your results' current format; you just pass the reference around instead of a huge amount of data, and the results are persisted for as long as you'd like to keep them.
The start of the Lambdas looks something like this to figure out if the input is from a file or plain data:
bucket_name = util.env('BUCKET_NAME')

if 'result_path' in input_data.keys():
    # Results are in a file that is referenced.
    try:
        result_path = input_data['result_path']
        result_data = util.get_file_content(result_path, bucket_name)
    except Exception as e:
        report.append(f'Failed to parse JSON from {result_path}: {e}')
else:
    # Results are just raw data, not a reference.
    result_data = input_data
Then at the end of the Lambda they will upload their results and return directions to that file:
import json
import boto3

def upload_results_to_s3(bucket_name, filename, result_data_to_upload):
    try:
        s3 = boto3.resource('s3')
        results_prefix = 'Path/In/S3/'
        results_suffix = '_Results.json'
        result_file_path = results_prefix + filename + results_suffix
        s3.Object(bucket_name, result_file_path).put(
            Body=json.dumps(result_data_to_upload, indent=2).encode('UTF-8')
        )
        return result_file_path
    except Exception as e:
        # Surface upload failures instead of silently swallowing them.
        print(f'Failed to upload results for {filename}: {e}')
        raise

# ... later, at the end of the handler:
result_path = upload_results_to_s3(bucket_name, filename, result_data_to_upload)
result_obj = {
    "result_path": result_path,
    "bucket_name": bucket_name
}
return result_obj
Then the next Lambda will have the first code snippet in it, in order to get the input from the file.
The Step Function Nodes look like this, where the Result will be result_obj in the python code above:
"YOUR STATE":
{
"Comment": "Call Lambda that puts results in file",
"Type": "Task",
"Resource": "arn:aws:lambda:YOUR LAMBDA ARN",
"InputPath": "$.next_function_input",
"ResultPath": "$.next_function_input",
"Next": "YOUR-NEXT-STATE"
}
Something you can do is add "emptyOutputPath": "" to your JSON:
"emptyOutputPath": "",
"part1": {
    "portion1": { "procedure": "Delete_X" },
    "portion2": { "procedure": "Load_X" }
},
"part2": {
    "portion1": { "procedure": "Delete_Y" },
    "portion2": { "procedure": "Load_Y" }
}
That will allow you to use "OutputPath": "$.emptyOutputPath", which is empty and will clear the ResultPath.
Hope that helps
Just following up on this issue to close the loop.
I basically gave up on using parallel Lambdas in favour of using SQS message queues instead.