Getting Error when pre-processing data from Kinesis with Lambda

I have a use case where I have to filter incoming data from Kinesis Data Firehose based on the type of the event. I should write only certain events to S3 and ignore the rest. I am using a Lambda function to filter the records, with the following Python code:
import base64
import json

def lambda_handler(event, context):
    # TODO implement
    output = []
    for record in event['records']:
        payload = base64.b64decode(record["data"])
        payload_json = json.loads(payload)
        event_type = payload_json["eventPayload"]["operation"]
        if event_type == "create" or event_type == "update":
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload)}
            output.append(output_record)
        else:
            output_record = {
                'recordId': record['recordId'],
                'result': 'Dropped'}
            output.append(output_record)
        return {'records': output}
I am only trying to process "create" and "update" events and drop the rest. I started from the sample code in the AWS docs and built on it.
This is giving the following error:
{"attemptsMade":1,"arrivalTimestamp":1653289182740,"errorCode":"Lambda.MissingRecordId","errorMessage":"One or more record Ids were not returned. Ensure that the Lambda function returns all received record Ids.","attemptEndingTimestamp":1653289231611,"rawData":"some data","lambdaArn":"arn:$LATEST"}
I am not able to figure out what this error means or how to fix it.

Bug: the return statement needs to be outside of the for loop; this is the cause of the error. The function receives multiple recordIds, but because it returns during the first iteration, only one recordId is sent back. Unindent the return statement.
The data key must also be included in output_record, even when the event is being dropped. You can base64-encode the original payload with no transformation.
Additional context: event['records'] and output must be the same length (length validation), and each dictionary in output must have a recordId key whose value matches a recordId in event['records'] (recordId validation). A corrected sketch is shown after the reference below.
From AWS documentation:
The record ID is passed from Kinesis Data Firehose to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data transformation failure.
Reference: Amazon Kinesis Data Firehose Data Transformation
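Putting the fixes above together, a corrected handler might look like the following sketch (same "eventPayload"/"operation" layout as in the question; note that b64encode returns bytes, so the value is decoded to a string before being returned):

import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = base64.b64decode(record['data'])
        payload_json = json.loads(payload)
        event_type = payload_json["eventPayload"]["operation"]

        result = 'Ok' if event_type in ("create", "update") else 'Dropped'

        output.append({
            'recordId': record['recordId'],
            'result': result,
            # Return the original payload untransformed; a base64-encoded
            # 'data' field is included on every record, even dropped ones.
            'data': base64.b64encode(payload).decode('utf-8'),
        })

    # Return outside the loop so every received recordId is included.
    return {'records': output}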

Related

Dynamically Insert/Update Item in DynamoDB With Python Lambda using event['body']

I am working on a Lambda function that gets called from API Gateway and updates information in DynamoDB. I have half of this working really dynamically, and I'm a little stuck on updating. Here is what I'm working with:
A DynamoDB table with a partition key of guild_id
The dummy JSON I'm using:
{
    "guild_id": "126",
    "guild_name": "Posted Guild",
    "guild_premium": "true",
    "guild_prefix": "z!"
}
Finally, the Lambda code:
import json
import boto3

def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    table = client.Table("guildtable")
    itemData = json.loads(event['body'])
    guild = table.get_item(Key={'guild_id': itemData['guild_id']})

    # If Guild Exists, update
    if 'Item' in guild:
        table.update_item(Key=itemData)
        responseObject = {}
        responseObject['statusCode'] = 200
        responseObject['headers'] = {}
        responseObject['headers']['Content-Type'] = 'application/json'
        responseObject['body'] = json.dumps('Updated Guild!')
        return responseObject

    # New Guild, Insert Guild
    table.put_item(Item=itemData)
    responseObject = {}
    responseObject['statusCode'] = 200
    responseObject['headers'] = {}
    responseObject['headers']['Content-Type'] = 'application/json'
    responseObject['body'] = json.dumps('Inserted Guild!')
    return responseObject
The insert part is working wonderfully. How would I accomplish a similar approach with update_item? I want this to be as dynamic as possible so I can throw any JSON (within reason) at it and have it stored in the database. I also want the update method to handle fields that get added down the road.
I get the following error:
Lambda execution failed with status 200 due to customer function error: An error occurred (ValidationException) when calling the UpdateItem operation: The provided key element does not match the schema.
A "The provided key element does not match the schema" error means something is wrong with Key (= primary key). Your schema's primary key is guild_id: string. Non-key attributes belong in the AttributeUpdate parameter. See the docs.
Your itemdata appears to include non-key attributes. Also ensure guild_id is a string "123" and not a number type 123.
goodKey={"guild_id": "123"}
table.update_item(Key=goodKey, UpdateExpression="SET ...")
The docs have a full update_item example.
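To make the update as dynamic as the insert, one option (a sketch only, reusing the "guildtable" table and guild_id key from the question; upsert_guild is a hypothetical helper) is to build an UpdateExpression from whatever non-key attributes arrive in the request body:

import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("guildtable")  # table name taken from the question

def upsert_guild(item_data):
    # Key must contain only the partition key defined in the table schema.
    key = {"guild_id": item_data["guild_id"]}

    # Every other attribute goes into a generated UpdateExpression.
    attributes = {k: v for k, v in item_data.items() if k != "guild_id"}
    if not attributes:
        return

    update_expression = "SET " + ", ".join(
        f"#a{i} = :v{i}" for i in range(len(attributes))
    )
    expression_names = {f"#a{i}": name for i, name in enumerate(attributes)}
    expression_values = {f":v{i}": value for i, value in enumerate(attributes.values())}

    # update_item creates the item if it does not already exist (an upsert),
    # so this also covers the "new guild" case.
    table.update_item(
        Key=key,
        UpdateExpression=update_expression,
        ExpressionAttributeNames=expression_names,
        ExpressionAttributeValues=expression_values,
    )

def lambda_handler(event, context):
    upsert_guild(json.loads(event["body"]))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps("Upserted Guild!"),
    }

Because update_item performs an upsert, the get_item/put_item branch becomes optional with this approach.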

Error when calling Lambda UDF from Redshift

When calling a Python Lambda UDF from my Redshift stored procedure, I am getting the following error. Any idea what could be wrong?
ERROR: Invalid External Function Response
Detail:
-----------------------------------------------
error:    Invalid External Function Response
code:     8001
context:  Extra rows in external function response
query:    0
location: exfunc_data.cpp:330
process:  padbmaster [pid=8842]
-----------------------------------------------
My Python Lambda UDF looks as follows.
import json

def lambda_handler(event, context):
    # ...
    result = DoJob()
    # ...
    ret = dict()
    ret['results'] = result
    ret_json = json.dumps(ret)
    return ret_json
The above Lambda function is associated with an external function in Redshift named send_email_lambda. The permissions and invocation work without any issues. I am calling the Lambda function as follows.
select send_email_lambda('sender#company.com',
    'recipient1#company.com',
    'sample body',
    'sample subject');
Edit:
As requested, adding the event payload passed from Redshift to Lambda.
{
    "user": "awsuser",
    "cluster": "arn:aws:redshift:us-central-1:dummy:cluster:redshift-test-cluster",
    "database": "sample",
    "external_function": "lambda_send_email",
    "query_id": 178044,
    "request_id": "20211b87-26c8-6d6a-a256-1a8568287feb",
    "arguments": [
        [
            "sender#company.com",
            "user1#company.com,user2#company.com",
            "<html><h1>Hello Therer</h1><p>A sample email from redshift. Take care and stay safe</p></html>",
            "Redshift email lambda UDF",
            "None",
            "None",
            "text/html"
        ]
    ],
    "num_records": 1
}
A Lambda UDF can be passed multiple rows of data, so it could receive a request to send multiple emails. The code needs to loop through each entry of the top-level arguments array, then extract the values from the inner array for that row.
It then needs to return a results array that is the same length as the input array.
For your code, put the single result into an array with one entry and return the dictionary with that array inside it (see the sketch below).
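A minimal sketch of that shape (send_email here is a hypothetical stand-in for whatever DoJob() does per row, and the argument order simply follows the sample payload above):

import json

def send_email(sender, recipients, body, subject):
    # Hypothetical stand-in for the real per-row work (e.g. sending via SES).
    return "sent"

def lambda_handler(event, context):
    results = []
    # Redshift batches rows: event['arguments'] holds one inner list per input row.
    for row in event["arguments"]:
        sender, recipients, body, subject = row[0], row[1], row[2], row[3]
        results.append(send_email(sender, recipients, body, subject))

    # The results array must contain exactly one entry per input row, in order.
    return json.dumps({"results": results})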

Kinesis Stream not getting the logs

I am receiving CloudTrail logs in a Kinesis data stream. I am invoking a stream-processing Lambda function as described here. The final result that gets returned to the stream is then stored in an S3 bucket. As of now, the processing fails, and the following error file is created in the S3 bucket:
{"attemptsMade":4,"arrivalTimestamp":1619677225356,"errorCode":"Lambda.FunctionError","errorMessage":"Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed","attemptEndingTimestamp":1619677302684,
Adding in the Python lambda function here for reference:
import base64
import gzip
import json
import logging

# Setup logging configuration
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def unpack_kinesis_stream_records(event):
    # Decode and decompress each base64-encoded data element
    return [gzip.decompress(base64.b64decode(k["data"])).decode('utf-8') for k in event["records"]]

def decode_raw_cloud_trail_events(cloudTrailEventDataList):
    # Convert raw event data list
    eventList = [json.loads(e) for e in cloudTrailEventDataList]
    # Filter out non-DATA_MESSAGE records
    filteredEvents = [e for e in eventList if e["messageType"] == 'DATA_MESSAGE']
    # Convert each individual log event message
    events = []
    for f in filteredEvents:
        for e in f["logEvents"]:
            events.append(json.loads(e["message"]))
    logger.info("{0} Event Logs Decoded".format(len(events)))
    return events

def handle_request(event, context):
    # Log raw Kinesis stream records
    # logger.debug(json.dumps(event, indent=4))

    # Unpack Kinesis stream records
    kinesisData = unpack_kinesis_stream_records(event)
    # [logger.debug(k) for k in kinesisData]

    # Decode and filter events
    events = decode_raw_cloud_trail_events(kinesisData)

    ####### INTEGRATION CODE GOES HERE #########
    return f"Successfully processed {len(events)} records."

def lambda_handler(event, context):
    return handle_request(event, context)
Can anyone help me understand the problem here?
I believe you are using the Kinesis Data Firehose service and not a Kinesis data stream. The code you are using reads directly from a Kinesis data stream and processes CloudTrail events.
A Kinesis Data Firehose data-transformation Lambda function is different: Firehose sends the received CloudTrail events to the Lambda function, which processes/transforms them and must send them back to Firehose so that Firehose can deliver them to the destination S3 bucket.
Your Lambda function should return records in exactly the format Firehose expects, and each record must have a result status of Dropped, Ok, or ProcessingFailed. You can read more in the AWS docs; a rough sketch follows below.
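A rough sketch of that output contract, reusing the gzip/base64 decoding from the question (whether each record really is a gzipped CloudWatch Logs message depends on your delivery setup, so treat the parsing as an assumption):

import base64
import gzip
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        try:
            payload = gzip.decompress(base64.b64decode(record['data'])).decode('utf-8')
            message = json.loads(payload)

            if message.get('messageType') != 'DATA_MESSAGE':
                # Control messages are not delivered to S3.
                output.append({'recordId': record['recordId'], 'result': 'Dropped',
                               'data': record['data']})
                continue

            # Deliver the (optionally transformed) payload; 'data' must be base64-encoded.
            transformed = json.dumps(message).encode('utf-8')
            output.append({'recordId': record['recordId'], 'result': 'Ok',
                           'data': base64.b64encode(transformed).decode('utf-8')})
        except Exception:
            output.append({'recordId': record['recordId'], 'result': 'ProcessingFailed',
                           'data': record['data']})

    # Every received recordId must appear in the response.
    return {'records': output}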

boto3: can't find queue that was immediately created before

I create an SQS queue in boto3 and immediately look for it via sqs.list_queues, but it won't return anything.
When I input the SQS queue name into the console, it won't return anything until I input it a second time.
So does this mean I need to call list_queues twice? Why is this happening? Why doesn't AWS return a queue that was created immediately before?
import boto3

sqs = boto3.client('sqs')
myQ = sqs.create_queue(QueueName='just_created')
response = sqs.list_queues(
    QueueNamePrefix='just_created'
)
The response does not contain the usual array of QueueUrls.
Just like many AWS services, the SQS control plane is eventually consistent, meaning that it takes a while to propagate the data across its systems.
If you need the URL of the queue you just created, you can find it in the return value of the create_queue call.
The following operation creates an SQS queue named MyQueue.
response = client.create_queue(
    QueueName='MyQueue',
)
print(response)
Expected Output:
{
    'QueueUrl': 'https://queue.amazonaws.com/012345678910/MyQueue',
    'ResponseMetadata': {
        '...': '...',
    },
}
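Applied to the original snippet, the queue URL is therefore available immediately, without calling list_queues at all:

import boto3

sqs = boto3.client('sqs')
myQ = sqs.create_queue(QueueName='just_created')
queue_url = myQ['QueueUrl']  # available right away; no list_queues call needed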

Lambda function not triggered by dynamodb update: Key Error

I am attempting to load a simple transactions.txt file into an S3 bucket, where a Lambda function reads the file and populates DynamoDB tables for Customers and Transactions. This all works fine. However, I also have a Lambda function that is supposed to read the Transactions table as it gets populated, sum up the transaction totals by customer, and insert them into another DynamoDB table, TransactionTotal.
My TotalNotifier Lambda function throws a KeyError regarding NewImage. I believe the code is fine, and I have tried changing the stream view type from 'New and old images' to just 'New image' for the Transactions table, but I still encounter the same error.
from __future__ import print_function
import json, boto3

# Connect to SNS
sns = boto3.client('sns')
alertTopic = 'HighBalanceAlert'
snsTopicArn = [t['TopicArn'] for t in sns.list_topics()['Topics'] if t['TopicArn'].endswith(':' + alertTopic)][0]

# Connect to DynamoDB
dynamodb = boto3.resource('dynamodb')
transactionTotalTableName = 'TransactionTotal'
transactionsTotalTable = dynamodb.Table(transactionTotalTableName)

# This handler is executed every time the Lambda function is triggered
def lambda_handler(event, context):
    # Show the incoming event in the debug log
    print("Event received by Lambda function: " + json.dumps(event, indent=2))

    # For each transaction added, calculate the new Transactions Total
    for record in event['Records']:
        customerId = record['dynamodb']['NewImage']['CustomerId']['S']
        transactionAmount = int(record['dynamodb']['NewImage']['TransactionAmount']['N'])

        # Update the customer's total in the TransactionTotal DynamoDB table
        response = transactionsTotalTable.update_item(
            Key={
                'CustomerId': customerId
            },
            UpdateExpression="add accountBalance :val",
            ExpressionAttributeValues={
                ':val': transactionAmount
            },
            ReturnValues="UPDATED_NEW"
        )
Here is a sample error from the CloudWatch log:
'NewImage': KeyError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 30, in lambda_handler
customerId = record['dynamodb']['NewImage']['CustomerId']['S']
KeyError: 'NewImage'
To elaborate on Oluwafemi's comment, you're likely experiencing this error when receiving a REMOVE event. Regardless of whether your stream is set to new and old images or just new images, you won't receive a NewImage on a REMOVE event, since there is no new image. Check out the example events in the AWS docs.
A check on the value of record['eventName'] should solve the issue, as shown in the sketch below.
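For example, a minimal guard inside the existing loop (a sketch that keeps the question's attribute names) could look like this:

for record in event['Records']:
    # REMOVE events carry no NewImage, so skip them (or handle them separately).
    if record['eventName'] not in ('INSERT', 'MODIFY'):
        continue

    new_image = record['dynamodb']['NewImage']
    customerId = new_image['CustomerId']['S']
    transactionAmount = int(new_image['TransactionAmount']['N'])
    # ... update the TransactionTotal table as before ...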