Error when calling Lambda UDF from Redshift - amazon-web-services

When calling a Python Lambda UDF from my Redshift stored procedure, I am getting the following error. Any idea what could be wrong?
ERROR: Invalid External Function Response Detail:
-----------------------------------------------
error: Invalid External Function Response code:
8001 context: Extra rows in external function response query: 0
location: exfunc_data.cpp:330 process: padbmaster [pid=8842]
-----------------------------------------------
My Python Lambda UDF looks as follows.
def lambda_handler(event, context):
    #...
    result = DoJob()
    #...
    ret = dict()
    ret['results'] = result
    ret_json = json.dumps(ret)
    return ret_json
The above Lambda function is associated with an external function in Redshift named send_email_lambda. The permissions and invocation work without any issues. I am calling the Lambda function as follows.
select send_email_lambda('sebder#company.com',
                         'recipient1#company.com',
                         'sample body',
                         'sample subject');
Edit:
As requested, adding the event payload passed from Redshift to Lambda.
{
  "user": "awsuser",
  "cluster": "arn:aws:redshift:us-central-1:dummy:cluster:redshift-test-cluster",
  "database": "sample",
  "external_function": "lambda_send_email",
  "query_id": 178044,
  "request_id": "20211b87-26c8-6d6a-a256-1a8568287feb",
  "arguments": [
    [
      "sender#company.com",
      "user1#company.com,user2#company.com",
      "<html><h1>Hello Therer</h1><p>A sample email from redshift. Take care and stay safe</p></html>",
      "Redshift email lambda UDF",
      "None",
      "None",
      "text/html"
    ]
  ],
  "num_records": 1
}

It looks like a UDF can be passed multiple rows of data, so it could receive a request to send multiple emails. The code needs to loop through the top-level array and, for each entry, extract the values from the inner array.
It then needs to return a results array that is the same length as the input array.
For your code, that means putting the single result into an array with one entry and returning that array inside the dictionary.
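A minimal sketch of that shape, assuming DoJob (from the question) does the per-row work and accepts the row's column values:

import json

def lambda_handler(event, context):
    # Redshift batches the input rows into event['arguments'];
    # the response must contain exactly one result per input row.
    results = []
    for row in event['arguments']:
        # row is the list of column values for one call, e.g.
        # [sender, recipients, body, subject, ...]
        results.append(DoJob(*row))  # DoJob stands in for the real send logic

    ret = dict()
    ret['success'] = True
    ret['results'] = results  # len(results) == event['num_records']
    return json.dumps(ret)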

Related

Getting Error when pre-processing data from Kinesis with Lambda

I have a use case where I have to filter incoming data from Kinesis Firehose based on the type of the event. I should write only certain events to S3 and ignore the rest. I am using Lambda to filter the records, with the following Python code:
def lambda_handler(event, context):
    # TODO implement
    output = []
    for record in event['records']:
        payload = base64.b64decode(record["data"])
        payload_json = json.loads(payload)
        event_type = payload_json["eventPayload"]["operation"]
        if event_type == "create" or event_type == "update":
            output_record = {
                'recordId': record['recordId'],
                'result': 'Ok',
                'data': base64.b64encode(payload)}
            output.append(output_record)
        else:
            output_record = {
                'recordId': record['recordId'],
                'result': 'Dropped'}
            output.append(output_record)
        return {'records': output}
I am only trying to process "create" and "update" events and dropping the rest of the events. I got the sample code from AWS docs and built it from there.
This is giving the following error:
{"attemptsMade":1,"arrivalTimestamp":1653289182740,"errorCode":"Lambda.MissingRecordId","errorMessage":"One or more record Ids were not returned. Ensure that the Lambda function returns all received record Ids.","attemptEndingTimestamp":1653289231611,"rawData":"some data","lambdaArn":"arn:$LATEST"}
I am not able to figure out what this error means or how to fix it.
Bug: the return statement needs to be outside of the for loop. This is the cause of the error: the function receives multiple recordIds, but only one recordId is returned. Unindent the return statement.
The data key must be included in output_record even when the event is being dropped. You can base64-encode the original payload with no transformations.
Additional context: event['records'] and output must be the same length (length validation), and each dictionary in output must have a recordId key whose value matches a recordId in event['records'] (recordId validation).
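A sketch of the handler with both fixes applied (return outside the loop, data always present):

import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        payload = base64.b64decode(record['data'])
        event_type = json.loads(payload)['eventPayload']['operation']
        result = 'Ok' if event_type in ('create', 'update') else 'Dropped'
        output.append({
            'recordId': record['recordId'],  # must echo the incoming recordId
            'result': result,
            # original payload, untransformed, re-encoded as base64
            'data': base64.b64encode(payload).decode('utf-8')
        })
    # Return once, after every record has been processed
    return {'records': output}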
From AWS documentation:
The record ID is passed from Kinesis Data Firehose to Lambda during the invocation. The transformed record must contain the same record ID. Any mismatch between the ID of the original record and the ID of the transformed record is treated as a data transformation failure.
Reference: Amazon Kinesis Data Firehose Data Transformation

Dynamically Insert/Update Item in DynamoDB With Python Lambda using event['body']

I am working on a Lambda function that gets called from API Gateway and updates information in DynamoDB. I have half of this working really dynamically, and I'm a little stuck on updating. Here is what I'm working with:
A DynamoDB table with a partition key of guild_id.
The dummy JSON I'm using:
{
  "guild_id": "126",
  "guild_name": "Posted Guild",
  "guild_premium": "true",
  "guild_prefix": "z!"
}
Finally the lambda code:
import json
import boto3

def lambda_handler(event, context):
    client = boto3.resource("dynamodb")
    table = client.Table("guildtable")
    itemData = json.loads(event['body'])
    guild = table.get_item(Key={'guild_id': itemData['guild_id']})
    # If Guild Exists, update
    if 'Item' in guild:
        table.update_item(Key=itemData)
        responseObject = {}
        responseObject['statusCode'] = 200
        responseObject['headers'] = {}
        responseObject['headers']['Content-Type'] = 'application/json'
        responseObject['body'] = json.dumps('Updated Guild!')
        return responseObject
    # New Guild, Insert Guild
    table.put_item(Item=itemData)
    responseObject = {}
    responseObject['statusCode'] = 200
    responseObject['headers'] = {}
    responseObject['headers']['Content-Type'] = 'application/json'
    responseObject['body'] = json.dumps('Inserted Guild!')
    return responseObject
The insert part is working wonderfully. How would I accomplish a similar approach with update_item? I want this to be as dynamic as possible so I can throw any (reasonable) JSON at it and have it stored in the database. I also want the update method to handle fields added down the road.
I get the following error:
Lambda execution failed with status 200 due to customer function error: An error occurred (ValidationException) when calling the UpdateItem operation: The provided key element does not match the schema.
A "The provided key element does not match the schema" error means something is wrong with Key (= primary key). Your schema's primary key is guild_id: string. Non-key attributes belong in the AttributeUpdate parameter. See the docs.
Your itemdata appears to include non-key attributes. Also ensure guild_id is a string "123" and not a number type 123.
goodKey={"guild_id": "123"}
table.update_item(Key=goodKey, UpdateExpression="SET ...")
The docs have a full update_item example.
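A rough sketch of a dynamic update along those lines: the expression is built from whatever non-key attributes arrive in the body, so fields added later need no code changes (the placeholder names #k0/:v0 are illustrative):

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('guildtable')

def lambda_handler(event, context):
    itemData = json.loads(event['body'])
    guild_id = itemData.pop('guild_id')  # only the partition key goes in Key

    # Build a SET expression from the remaining attributes
    # (assumes at least one non-key attribute is present)
    update_expr = 'SET ' + ', '.join(f'#k{i} = :v{i}' for i in range(len(itemData)))
    names = {f'#k{i}': key for i, key in enumerate(itemData)}
    values = {f':v{i}': value for i, value in enumerate(itemData.values())}

    table.update_item(
        Key={'guild_id': guild_id},
        UpdateExpression=update_expr,
        ExpressionAttributeNames=names,
        ExpressionAttributeValues=values,
    )
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps('Updated Guild!')
    }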

How to mock multiple dynamodb tables using moto

I have a function create which uses 3 DynamoDB tables. How do I mock the three DynamoDB tables?
def create():
    # This function uses a DynamoDB table "x"
    # Then it calls the my_table() function

def my_table():
    # This function uses two DynamoDB tables, "y" and "z"
    # It returns a value which is used in the create() function
My test file has the following code:
@patch.dict(os.environ, {"DYNAMODB_TABLE": "x",
                         'second_TABLE': "y",
                         'Third_TABLE': "z"
                         })
def test_create():
    dynamodb_test()
    event = {...}  # my event values
    result = create(event)
    assert result == 200
def dynamodb_test():
    with mock_dynamodb2():
        dynamodb = boto3.client('dynamodb', region_name='us-east-1')
        dynamodb.create_table(
            TableName=os.environ["DYNAMODB_TABLE"],
            KeySchema=[
                {
                    'AttributeName': 'id',
                    'KeyType': 'HASH'
                }
            ],
            AttributeDefinitions=[
                {
                    'AttributeName': 'id',
                    'AttributeType': 'S'
                }
            ],
            ProvisionedThroughput={
                'ReadCapacityUnits': 1,
                'WriteCapacityUnits': 1
            }
        )
        yield dynamodb
Whenever I test the test_create() function using pytest, I get:
botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when
calling the Scan operation: The security token included in the request is expired
I think it is trying to access the actual AWS DynamoDB, but I want it to use the mocked DynamoDB. How can I achieve this?
Moto only works when two conditions are met:
The logic to be tested is executed inside a Moto-context
The Moto-context is started before any boto3-clients (or resources) are created
The Moto-context in your example, with mock_dynamodb2(), is localized to the dynamodb_test-function. After the function finishes, the mock is no longer active, and Boto3 will indeed try to access AWS itself.
Solution
The following test-function would satisfy both criteria:
@patch.dict(os.environ, {"DYNAMODB_TABLE": "x",
                         'second_TABLE': "y",
                         'Third_TABLE': "z"
                         })
# Initialize the mock here, so that it is effective for the entire test duration
@mock_dynamodb2
def test_create():
    dynamodb_test()
    event = {...}  # my event values
    # Ensure that any boto3-clients/resources created in the logic are initialized while the mock is active
    from ... import create
    result = create(event)
    assert result == 200

def dynamodb_test():
    # There is no need to start the mock-context again here, so create the table immediately
    dynamodb = boto3.client('dynamodb', region_name='us-east-1')
    dynamodb.create_table(...)
The test code you provided does not create tables y and z - if the logic expects them to exist, you'd of course have to create them manually as well (just like table x was created in dynamodb_test).
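For example, dynamodb_test could create all three tables (reusing the question's 'id' key schema, which may of course differ for the real y and z tables):

import os
import boto3

def dynamodb_test():
    dynamodb = boto3.client('dynamodb', region_name='us-east-1')
    for env_var in ("DYNAMODB_TABLE", "second_TABLE", "Third_TABLE"):
        dynamodb.create_table(
            TableName=os.environ[env_var],
            KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
            AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
            ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1}
        )
    # plain return (not yield), so the tables are created as soon as the function is called
    return dynamodb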
Documentation for the import quirk can be found here: http://docs.getmoto.org/en/latest/docs/getting_started.html#recommended-usage
I believe this post is almost identical to yours. You could try that approach, or utilize some of the other existing tools like localstack or dynamodb-local. The Python client for localstack, for example: https://github.com/localstack/localstack-python-client
EDIT:
I see your title says you want to use moto, but I don't see you importing any of the moto modules into your code. See the last snippet on this page and replace s3 with either dynamodb or dynamodb2 (whichever you are using).
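For instance, with the dynamodb2 variant the import and decorator would look like this:

from moto import mock_dynamodb2

@mock_dynamodb2
def test_create():
    ...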

Batch write to DynamoDB

I am trying to write data into DynamoDB using batch_writer in a Lambda function. I am using "A1" as the partition key for my DynamoDB table, and when I pass the following JSON input it works well.
{
  "A1": "001",
  "A2": {
    "B1": "100",
    "B2": "200",
    "B3": "300"
  }
}
When I try to send the following request I get an error.
{
  "A1": {
    "B1": "100",
    "B2": "200",
    "B3": "300"
  }
}
Error -
"errorMessage": "An error occurred (ValidationException) when calling the BatchWriteItem operation: The provided key element does not match the schema"
Is it possible to write this data into DynamoDB using a Lambda function, and what should I change in my code to do that?
My code:
def lambda_handler(event, context):
    with table.batch_writer() as batch:
        batch.put_item(event)
    return {"code": 200, "message": "Data added success"}
It's hard to say without seeing the table definition, but my bet is that "A1" is the primary key of type string. If you try setting it to a map, it will fail.
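Assuming the table really was created with A1 as a string (S) hash key, the difference looks like this ('my-table' is a placeholder name):

import boto3

table = boto3.resource('dynamodb').Table('my-table')  # placeholder table name

with table.batch_writer() as batch:
    # Matches a key schema where A1 is a string (S) hash key
    batch.put_item(Item={"A1": "001",
                         "A2": {"B1": "100", "B2": "200", "B3": "300"}})

    # Rejected with "The provided key element does not match the schema":
    # here A1 is a map, not a string
    batch.put_item(Item={"A1": {"B1": "100", "B2": "200", "B3": "300"}})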

AWS API Gateway only passing 1st variable to function, but lambda test passes all

I'm finding an odd issue where AWS passes the URL string parameters to a Lambda function properly, but there is a breakdown in API Gateway only when Lambda runs a Python handler function that calls
keywordSearch(keyword, page, RPP)
passing 3 variables to keywordSearch. In the Lambda IDE test, it works without issue and prints out all 3 variables, as seen in the logs:
InsideKeywordSearch, Vars=:
keyword:
bombing
page:
1
RPP:
10
But when I run an API Gateway test, the log shows that the variables are not passed into the function: there is no value for RPP or page, and only the keyword is passed. Am I not defining the function correctly? It works in Lambda, so why not from API Gateway?
Here is a snippet of the code.
Function Call
def handler(event, context):
    print('Inside Handler Funciton')
    keyword = event.get('search_keyword', None)
    id = event.get('id', None)
    RPP = event.get('RPP', 10)
    page = event.get('page', 1)
    # get event variables, if passed, and filter bad input
    print("keyword")
    print(keyword)
    print("id")
    print(id)
    print('RPP')
    print(RPP)
    print('page')
    print(page)
    if keyword is not None:
        return keywordSearch(keyword, page, RPP)
    elif id is not None:
        return idSearch(id)
    else:
        return ""
Function
def keywordSearch(keyword, page, RPP):
    print('InsideKeywordSearch, Vars=: ')
    print("keyword: ")
    print(keyword)
    print(" page: ")
    print(page)
    print(" RPP: ")
    print(RPP)
Lambda Logs shows
Function Logs:
6d Version: $LATEST
Inside Handler Funciton
keyword
bombing
id
None
RPP
10
page
1
InsideKeywordSearch, Vars=:
keyword:
bombing
page:
1
RPP:
10
[INFO] 2018-06-30T03:04:56.240Z 5dc7a2cc-7c12-11e8-8f39-f5112d2e976d SUCCESS: Connection to RDS mysql instance succeeded
API Gateway call shows
{
  "errorMessage": "unsupported operand type(s) for -: 'str' and 'int'",
  "errorType": "TypeError",
  "stackTrace": [
    [
      "/var/task/app.py",
      144,
      "handler",
      "return keywordSearch(keyword,page,RPP)"
    ],
    [
      "/var/task/app.py",
      93,
      "keywordSearch",
      "sql = f\"SELECT attackid, SUM(MATCH(attack_fulltext) AGAINST('%{keyword}%' IN BOOLEAN MODE)) as score FROM search_index WHERE MATCH(attack_fulltext) AGAINST('%{keyword}%' IN BOOLEAN MODE) GROUP BY attackid ORDER BY score DESC Limit { ((page-1)*RPP) },{(RPP)};\""
    ]
  ]
}
Which tells me that the variables are not being passed correctly, because the SQL string becomes invalid.
The issue was that API Gateway was passing the numbers in as strings. I had to fix the GET method's mapping template in the Integration Request as follows.
In order to pass variables from API Gateway to the back end as integers, the quotes around the variable values need to be removed. Also make sure the values are passed as integers from the client, otherwise you will get the same error mentioned above. In this case, I updated the body mapping template from:
{
    "search_keyword" : "$input.params('search_keyword')",
    "page": "$input.params('page')",
    "RPP": "$input.params('RPP')"
}
to:
{
    "search_keyword" : "$input.params('search_keyword')",
    "page": $input.params('page'),
    "RPP": $input.params('RPP')
}