Using an EventBridge event pattern string in a lambda function - amazon-web-services

I have a lambda function using Python.
It's connected to an EventBridge rule that triggers every time there's a change in a Glue table.
The event it's outputting looks something like this:
{
  "version": "0",
  "detail": {
    "databaseName": "flights-db",
    "typeOfChange": "UpdateTable",
    "tableName": "flightscsv"
  }
}
I want to get the tableName and databaseName values from this output into variables in the function.
My Lambda function:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    return_statement = 'Boto3 version: ', boto3_version, \
        'Event output: ', event
    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
I was expecting to get the event contents in my return statement, but that's not the case.
When testing this function, the return output for event is:
{\"key1\": \"value1\", \"key2\": \"value2\", \"key3\": \"value3\"}
These keys and values are defined like this in the test pattern for the function.
The EventBridge rule is defined like this:
How can I get the values from the event pattern into a variable?
Do I need to configure the test pattern to get the results into the event?
EDIT:
Picture of log events for the table change event:

The event objects generated by CloudWatch (CW) Events / EventBridge (EB) are listed here. These events are passed to your function when it gets triggered by EB.
Your EB Event Pattern should be:
{
  "source": ["aws.glue"],
  "detail-type": ["Glue Data Catalog Table State Change"]
}
The above should match changes to any table in your Glue catalog. The event should be similar to the one below:
{
  "version": "0",
  "id": "2617428d-715f-edef-70b8-d210da0317a0",
  "detail-type": "Glue Data Catalog Table State Change",
  "source": "aws.glue",
  "account": "123456789012",
  "time": "2019-01-16T18:16:01Z",
  "region": "eu-west-1",
  "resources": [
    "arn:aws:glue:eu-west-1:123456789012:table/d1/t1"
  ],
  "detail": {
    "databaseName": "d1",
    "changedPartitions": [
      "[C.pdf, dir3]",
      "[D.doc, dir4]"
    ],
    "typeOfChange": "BatchCreatePartition",
    "tableName": "t1"
  }
}
Thus, to get tableName and databaseName, your lambda function could be:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    print(event)

    table_name = event['detail']['tableName']
    database_name = event['detail']['databaseName']
    print(table_name, database_name)

    return_statement = {
        'boto3_version': boto3_version,
        'table_name': table_name,
        'database_name': database_name
    }

    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
For testing, you can set up a sample EB event in your Lambda test window:
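If you prefer to check the parsing logic locally instead, a minimal sketch could look like the one below; the sample event is a trimmed version of the Glue event shown above, and the __main__ wrapper is only for running it outside Lambda.

# Local test: invoke the handler above with a trimmed sample Glue event.
sample_event = {
    "version": "0",
    "detail-type": "Glue Data Catalog Table State Change",
    "source": "aws.glue",
    "detail": {
        "databaseName": "d1",
        "typeOfChange": "BatchCreatePartition",
        "tableName": "t1"
    }
}

if __name__ == "__main__":
    # Context is not used by the handler, so None is fine here.
    print(lambda_handler(sample_event, None))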

Related

Using Eventbridge to trigger Glue job but with delay

I want to create an EventBridge rule which triggers after a certain number of files are uploaded into an S3 bucket. For example: consider a certain prefix in the bucket that is empty (bucket/folder/[empty]), and the user needs to upload 5 files. Only after those five files are uploaded should EventBridge be triggered. I tried searching for a rule pattern, but was unable to find anything related to this. Currently I am using:
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["test-bucket-for-event"]
    },
    "object": {
      "key": [{
        "prefix": "folder/Latest/"
      }]
    }
  }
}
Can I specify numbers here, like greater than 5, etc.?
Or how do I configure that?
Help is appreciated.
Thanks
I created a Lambda function which is used to trigger the Glue job after a certain number of files are created:
import json
import boto3

def lambda_handler(event, context):
    bucket = "bucket-name"
    folder = "Folder/Subfolder/"

    s3 = boto3.client('s3')
    # List everything under the prefix.
    objs = s3.list_objects_v2(Bucket=bucket, Prefix=folder)

    # Print each key found under the prefix.
    for obj in objs['Contents']:
        print(obj['Key'])

    # Subtract 1 to ignore the folder placeholder object itself.
    file_count = objs['KeyCount'] - 1
    print(file_count)

    if file_count >= 5:
        print('the files are present, going to send notification')  # here add the glue trigger
    else:
        print("None")

Lambda calling update_item, error: Invalid UpdateExpression: Incorrect operand type for operator or function; operator or function: +, operand type: M

I'm using a Lambda function to work with a DynamoDB table. The function is written in Python, and I just used the one from an AWS tutorial (this is for a personal project to learn AWS). It's working great for "read" (get_item) and "create" (put_item), but "update" (update_item) is throwing this error, and I can't figure out why. I'm using a test query I built with NoSQL Workbench. When I hit "run" for the same query in Workbench, it works, and it also works when I call it from the command line. Whenever I go through the Lambda function it throws this error, but like I said, the function works fine for the other calls.
The lambda code:
import boto3
import json

def handler(event, context):
    '''Provide an event that contains the following keys:
      - operation: one of the operations in the operations dict below
      - tableName: required for operations that interact with DynamoDB
      - payload: a parameter to pass to the operation being performed
    '''
    operation = event['operation']

    if 'tableName' in event:
        dynamo = boto3.resource('dynamodb').Table(event['tableName'])

    operationsDict = {
        'create': lambda x: dynamo.put_item(**x),
        'read': lambda x: dynamo.get_item(**x),
        'update': lambda x: dynamo.update_item(**x),
        'delete': lambda x: dynamo.delete_item(**x),
        'list': lambda x: dynamo.scan(**x),
        'echo': lambda x: x,
        'ping': lambda x: 'pong'
    }

    if operation in operationsDict:
        return operationsDict[operation](event.get('payload'))
    else:
        raise ValueError('Unrecognized operation "{}"'.format(operation))
The test query:
{
  "operation": "update",
  "tableName": "CCDynamoDB",
  "payload": {
    "Key": {
      "id": {"S": "visitorCount"}
    },
    "UpdateExpression": "SET #7c680 = #7c680 + :7c680",
    "ExpressionAttributeNames": {"#7c680": "currentCount"},
    "ExpressionAttributeValues": {":7c680": {"N": "1"}}
  }
}
Thanks for your help!
You have a problem in your payload. Because the handler uses the DynamoDB Table resource, values must be plain Python types; a low-level attribute-value map like {"N": "1"} is serialized as a Map (type M), which is why the + operator complains. Try it like below. For a better understanding, I suggest taking a look at this document.
CCDynamoDB.update_item(
    Key={
        'id': 'visitorCount'
    },
    UpdateExpression='SET currentCount = :val1',
    ExpressionAttributeValues={
        ':val1': 1
    }
)
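If you still want the increment from your original expression, the same fix mapped back onto a test event for the generic handler could look roughly like this (a sketch; note the plain Python value instead of {"N": "1"}):

# Hypothetical test event for the generic handler above. The Table resource
# expects plain Python types in ExpressionAttributeValues.
test_event = {
    "operation": "update",
    "tableName": "CCDynamoDB",
    "payload": {
        "Key": {"id": "visitorCount"},
        "UpdateExpression": "SET currentCount = currentCount + :val1",
        "ExpressionAttributeValues": {":val1": 1}
    }
}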

Add a new item to a DynamoDB table using an AWS Lambda function each time the function is executed with CloudWatch

I'm trying to modify a DynamoDB table each time a Lambda function is executed.
Specifically, I created a simple Lambda function that returns a list of S3 bucket names, and this function runs each minute thanks to a CloudWatch rule.
However, as I said before, my goal is to also update a DynamoDB table each time the same function is executed. Specifically, I want to add a new item with the same attribute each time (so if the function is executed 1000 times, I want 1K items/rows).
However, I don't know how to do it. Any suggestions? Here's the code:
import json
import boto3

s3 = boto3.resource('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Table')

def lambda_handler(event, context):
    bucket_list = []
    for b in s3.buckets.all():
        print(b.name)
        bucket_list.append(b.name)

    response = "done"
    table.put_item(
        Item={
            "Update": response
        }
    )

    return {
        "statusCode": 200,
        "body": bucket_list
    }
Thank you in advance
Your problem is that PutItem overwrites existing items if they have the same key. So every time you try to insert Update=done, it just overwrites the same item.
The very first sentence of the documentation states:
Creates a new item, or replaces an old item with a new item.
So what you need to do is to put something in your item that is unique, so that a new item is created instead of the old one being overwritten.
You could create a UUID or something like that, but I think it would be beneficial to use the time of execution. This way you could see when your last execution was etc.
from datetime import datetime

[...]

table.put_item(
    Item={
        "Update": response,
        "ProcessingTime": datetime.now().isoformat()
    }
)
Adding to what Jens stated, which is 100% correct.
You could use data from the event. The event will look something like this:
{
  "id": "cdc73f9d-aea9-11e3-9d5a-835b769c0d9c",
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "account": "123456789012",
  "time": "1970-01-01T00:00:00Z",
  "region": "us-west-2",
  "resources": [
    "arn:aws:events:us-west-2:123456789012:rule/ExampleRule"
  ],
  "detail": {}
}
The id value will be 100% unique, and the time value will be the time it was triggered.
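Combining both suggestions, a minimal sketch (reusing the table and response variables from the question's handler) could be:

# Sketch: store the scheduled event's unique id and trigger time so each
# invocation writes a distinct item. This assumes the table's key schema
# uses one of these unique attributes (e.g. EventId as the partition key);
# otherwise put_item would still overwrite.
table.put_item(
    Item={
        "Update": response,
        "EventId": event["id"],
        "TriggerTime": event["time"]
    }
)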

How do I pass variables through AWS Codepipeline?

AWS CodePipeline orchestrates first lambda-A and then lambda-B, and I want to pass a variable from my lambda-A to my lambda-B.
In lambda-A I set the outputVariables when setting the job to success:
boto3.client("codepipeline").put_job_success_result(
    jobId=event["CodePipeline.job"]["id"],
    outputVariables={"FOO": "BAR"}
)
From the documentation I know that outputVariables are key-value pairs that can be made available to a downstream action.
CodePipeline then triggers lambda-B. How can I retrieve in lambda-B the variables I have set in the outputVariables in lambda-A?
In lambda-B's action configuration, under User parameters, enter the following syntax to ingest the variable created in the earlier action:
#{outputVariables.FOO}
Then you can unpack 'UserParameters' in the Lambda function. The incoming event looks like this:
{
  "CodePipeline.job": {
    "id": "EXAMPLE-e08a-4f06-b9ba-EXAMPLE",
    "accountId": "EXAMPLE87397",
    "data": {
      "actionConfiguration": {
        "configuration": {
          "FunctionName": "LambdaForCP-Python",
          "UserParameters": "5e2591fd79889dEXAMPLE5f33e2"
        }
      },
and you read it from 'event':
def lambda_handler(event, context):
    print(event)
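For instance, inside that handler you could pull out the resolved value like this (a sketch; the nesting follows the sample event above):

    # Navigate the CodePipeline job event down to UserParameters, which
    # holds the resolved value of #{outputVariables.FOO}, e.g. "BAR".
    config = event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]
    foo = config["UserParameters"]
    print(foo)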
This procedure is detailed in Step (f) here:
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-lambda-variables.html#lambda-variables-pipeline

Formatting for Firehose transformation output

I am using AWS Kinesis Firehose with a custom Data Transformation. The Lambda's written in Python 3.6 and returns strings that look like the following:
{
  "records": [
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    }
  ]
}
This Lambda is perfectly happy, and logs outputs that look like the above just before returning them to Firehose. However, the Firehose's S3 Logs then show an error:
Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed.
Looking at the examples for this spread across the web in JS and Java, it's not clear to me what I need to be doing differently; I'm quite confused.
If your data is a JSON object, you can try the following:
import base64
import json

def lambda_handler(event, context):
    output = []

    for record in event['records']:
        # your own business logic.
        json_object = {...}

        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(json_object).encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)

    return {'records': output}
The base64.b64encode function only works with bytes (b'xxx'), while the 'data' attribute of output_record needs a normal str ('xxx'), hence the .encode('utf-8') / .decode('utf-8') pair.
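As a quick illustration of that encode/decode round trip (the payload dict here is just a stand-in):

import base64
import json

payload = json.dumps({"example": 1})                       # str
encoded_bytes = base64.b64encode(payload.encode('utf-8'))  # bytes
data_field = encoded_bytes.decode('utf-8')                 # str, suitable for 'data'
print(data_field)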
I've found the same error using Node.js.
Reading the documentation http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html, my mistake was not base64-encoding the data field of every record.
I resolved it by doing this:
{
  recordId: record.recordId,
  result: 'Ok',
  data: new Buffer(JSON.stringify(data)).toString('base64')
}
You can check the code in my repo.
https://github.com/hixichen/golang_lamda_decode_protobuf_firehose