Formatting for Firehose transformation output

I am using AWS Kinesis Firehose with a custom data transformation. The Lambda is written in Python 3.6 and returns strings that look like the following:
{
  "records": [
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    },
    {
      "recordId": "...",
      "result": "Ok",
      "data": "..."
    }
  ]
}
This Lambda is perfectly happy, and logs outputs that look like the above just before returning them to Firehose. However, the Firehose's S3 Logs then show an error:
Invalid output structure: Please check your function and make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed.
Looking at the examples for this spread across the web in JS and Java, it's not clear to me what I need to be doing differently; I'm quite confused.

If your data is a JSON object, you can try the following:
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # your own business logic.
        json_object = {...}
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(json.dumps(json_object).encode('utf-8')).decode('utf-8')
        }
        output.append(output_record)
    return {'records': output}
Note that base64.b64encode only works on bytes (b'xxx'), while the 'data' attribute of output_record needs a plain 'xxx' string; hence the .encode('utf-8') before encoding and the .decode('utf-8') on the result.
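For completeness, here is a minimal end-to-end sketch of a pass-through transformer, assuming the incoming records contain JSON. The transform function is a hypothetical placeholder for your own business logic; everything else follows the Firehose record contract shown above.

import base64
import json

def transform(payload):
    # Hypothetical placeholder for your own business logic.
    return payload

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Firehose delivers 'data' base64-encoded; decode and parse it.
        payload = json.loads(base64.b64decode(record['data']))
        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            # b64encode needs bytes and returns bytes, so encode the JSON
            # string first and decode the result back to a plain str.
            'data': base64.b64encode(
                json.dumps(transform(payload)).encode('utf-8')
            ).decode('utf-8')
        })
    return {'records': output}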

I've found the same error using Node.js.
Reading the documentation http://docs.aws.amazon.com/firehose/latest/dev/data-transformation.html, my mistake was not base64-encoding the data field of every record.
I resolved it by doing this:
{
  recordId: record.recordId,
  result: 'Ok',
  // Buffer.from replaces the deprecated new Buffer(...) constructor.
  data: Buffer.from(JSON.stringify(data)).toString('base64')
}

You can check the code in my repo.
https://github.com/hixichen/golang_lamda_decode_protobuf_firehose

Related

Using an EventBridge event pattern string in a lambda function

I have a lambda function using Python.
It's connected to an EventBridge rule that triggers every time there's a change in a Glue table.
The event it outputs looks something like this:
{
  "version": "0",
  "detail": {
    "databaseName": "flights-db",
    "typeOfChange": "UpdateTable",
    "tableName": "flightscsv"
  }
}
I want to get the tableName and databaseName values from this output into the function as a variable.
My Lambda function:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    return_statement = 'Boto3 version: ', boto3_version, \
        'Event output: ', event
    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
I was expecting to get the event pattern output from the event in my return statement but that's not the case.
When testing this function the return output for event is:
{\"key1\": \"value1\", \"key2\": \"value2\", \"key3\": \"value3\"}
These keys and values are what is defined in the test event configured for the function.
The EventBridge rule is defined like this:
How can I get the values from the event pattern to a variable?
Do I need to configure the test pattern to get the results into event?
EDIT:
Picture of log events for the table change event:
The event objects generated by CloudWatch Events / EventBridge (EB) are documented per service in the AWS docs. These events are passed to your function when it is triggered by EB.
Your EB Event Pattern should be:
{
  "source": ["aws.glue"],
  "detail-type": ["Glue Data Catalog Table State Change"]
}
The above should match changes to any table in your Glue catalog. The event should be similar to the one below:
{
  "version": "0",
  "id": "2617428d-715f-edef-70b8-d210da0317a0",
  "detail-type": "Glue Data Catalog Table State Change",
  "source": "aws.glue",
  "account": "123456789012",
  "time": "2019-01-16T18:16:01Z",
  "region": "eu-west-1",
  "resources": [
    "arn:aws:glue:eu-west-1:123456789012:table/d1/t1"
  ],
  "detail": {
    "databaseName": "d1",
    "changedPartitions": [
      "[C.pdf, dir3]",
      "[D.doc, dir4]"
    ],
    "typeOfChange": "BatchCreatePartition",
    "tableName": "t1"
  }
}
Thus, to get tableName and databaseName your lambda function could be:
import json
import boto3

def lambda_handler(event, context):
    boto3_version = boto3.__version__
    print(event)
    table_name = event['detail']['tableName']
    database_name = event['detail']['databaseName']
    print(table_name, database_name)
    return_statement = {
        'boto3_version': boto3_version,
        'table_name': table_name,
        'database_name': database_name
    }
    return {
        'statusCode': 200,
        'body': json.dumps(return_statement)
    }
For testing, you can set up a sample EB event in your Lambda test window:
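As a quick sanity check outside the console, you can also invoke the handler locally with a dict shaped like the sample event (a sketch; the detail values are taken from the example event above):

sample_event = {
    'version': '0',
    'detail-type': 'Glue Data Catalog Table State Change',
    'source': 'aws.glue',
    'detail': {
        'databaseName': 'd1',
        'typeOfChange': 'BatchCreatePartition',
        'tableName': 't1'
    }
}

# The handler above never touches the context, so None is fine here.
print(lambda_handler(sample_event, None))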

How can I create a bot in Amazon Lex for getting weather update?

I am trying to get weather updates. The Python code works well on its own, but I am unable to embed it into Amazon Lex; it shows a "received error response" error.
from botocore.vendored import requests
# using openweathermap api
api_address = 'http://api.openweathermap.org/data/2.5/weather?appid=__api_key_here__&q='
city = input("Enter city >> ")
url = api_address + city
json_data = requests.get(url).json()
formatted_data = json_data['weather'][0]['main']
desc_data = json_data['weather'][0]['description']
print(formatted_data)
print(desc_data)
# print(json_data)
First make sure the API call runs correctly in plain Python code.
Depending on the next state, you need to set the type to ElicitSlot or ElicitIntent.
If you are using Lambda as the backend for Lex, you need to send the response in the format below (see the Lex documentation on Lambda response formats):
{
  "dialogAction": {
    "type": "Close",
    "fulfillmentState": "Fulfilled",
    "message": {
      "contentType": "PlainText",
      "content": "Thanks, your pizza has been ordered."
    },
    "responseCard": {
      "version": integer-value,
      "contentType": "application/vnd.amazonaws.card.generic",
      "genericAttachments": [
        {
          "title": "card-title",
          "subTitle": "card-sub-title",
          "imageUrl": "URL of the image to be shown",
          "attachmentLinkUrl": "URL of the attachment to be associated with the card",
          "buttons": [
            {
              "text": "button-text",
              "value": "Value sent to server on button click"
            }
          ]
        }
      ]
    }
  }
}
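Applied to the weather example above, a minimal Lex V1 fulfillment handler might look like the sketch below. It assumes the bot has a slot named City (a hypothetical name; match it to your bot) and uses urllib.request from the standard library instead of the deprecated botocore.vendored requests:

import json
import urllib.request
from urllib.parse import quote

API_URL = 'http://api.openweathermap.org/data/2.5/weather?appid=__api_key_here__&q='

def lambda_handler(event, context):
    # 'City' is an assumed slot name from the bot's intent configuration.
    city = event['currentIntent']['slots']['City']
    with urllib.request.urlopen(API_URL + quote(city)) as resp:
        weather = json.loads(resp.read())['weather'][0]
    # Wrap the answer in the Lex V1 'Close' response format shown above.
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {
                'contentType': 'PlainText',
                'content': f"{weather['main']}: {weather['description']}"
            }
        }
    }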

How to disable (or redirect) logging on an AWS Step Function that calls parallel Lambda functions

I'm running an AWS step function with parallel execution branches.
Each branch succeeds individually, however the overall function fails with the following error:
States.DataLimitExceeded - The state/task returned a result with a size exceeding the maximum number of characters service limit.
I then found an article from AWS that describes this issue and suggests a work around:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html
That article says:
The Lambda invoke API includes logs in the response by default. Multiple Lambda invocations in a workflow can trigger States.DataLimitExceeded errors. To avoid this, include "LogType" = "None" as a parameter when you invoke your Lambda functions.
My question is where exactly do I put it? I've tried putting it various places in the state machine definition, however I get the following error:
The field 'LogType' is not supported by Step Functions
That error seems contrary to the support article, so perhaps I'm doing it wrong!
Any advice is appreciated, thanks in advance!
Cheers
UPDATE 1 :
To be clear, this is a parallel function, with 26 parallel branches. Each branch has a small output as per the example below. The biggest item in this data is the LogResult, which (when base64 decoded) is just the billing info. I think this info multiplied by 26 has led to the error, so I just want to turn this LogResult off!!!
{
  "ExecutedVersion": "$LATEST",
  "LogResult": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
  "Payload": {
    "statusCode": 200,
    "body": {
      "signs": 63,
      "nil": ""
    }
  },
  "SdkHttpMetadata": {
    "HttpHeaders": {
      "Connection": "keep-alive",
      "Content-Length": "53",
      "Content-Type": "application/json",
      "Date": "Thu, 21 Nov 2019 04:00:42 GMT",
      "X-Amz-Executed-Version": "$LATEST",
      "X-Amz-Log-Result": "U1RBUlQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUgVmVyc2lvbjogJExBVEVTVApFTkQgUmVxdWVzdElkOiBlODJjZTRkOS0zMjk2LTRlNDctYjcyZC1iYmEwMzI1YmM3MGUKUkVQT1JUIFJlcXVlc3RJZDogZTgyY2U0ZDktMzI5Ni00ZTQ3LWI3MmQtYmJhMDMyNWJjNzBlCUR1cmF0aW9uOiA3NzI5Ljc2IG1zCUJpbGxlZCBEdXJhdGlvbjogNzgwMCBtcwlNZW1vcnkgU2l6ZTogMTAyNCBNQglNYXggTWVtb3J5IFVzZWQ6IDEwNCBNQglJbml0IER1cmF0aW9uOiAxMTY0Ljc3IG1zCQo=",
      "x-amzn-Remapped-Content-Length": "0",
      "x-amzn-RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e",
      "X-Amzn-Trace-Id": "root=1-5dd60be1-47c4669ce54d5208b92b52a4;sampled=0"
    },
    "HttpStatusCode": 200
  },
  "SdkResponseMetadata": {
    "RequestId": "e82ce4d9-3296-4e47-b72d-bba0325bc70e"
  },
  "StatusCode": 200
}
I ran into exactly the same problem as you recently. You haven't said what your Lambdas are doing or returning; however, I found that AWS documents the limits that tasks have within executions: https://docs.aws.amazon.com/step-functions/latest/dg/limits.html#service-limits-task-executions
What I found was that my particular Lambda had an extremely long response, tens of thousands of characters. Trimming the Lambda's response to something more reasonable got past the error in the step function.
I had the same problem a week ago, and the way I solved it is as follows.
You can define which portion of the result is transmitted to the next step. For that you have to use:
"OutputPath": "$.part2",
In your JSON input you have:
"part1": {
  "portion1": {
    "procedure": "Delete_X"
  },
  "portion2": {
    "procedure": "Load_X"
  }
},
"part2": {
  "portion1": {
    "procedure": "Delete_Y"
  },
  "portion2": {
    "procedure": "Load_Y"
  }
}
Once part1 is processed, make sure that part1 (and the ResultPath related to it) is not sent in the output; only part2, which is needed by the following steps, is passed on. That is what "OutputPath": "$.part2" does.
Let me know if that helps.
I got stuck on the same issue. Step Functions imposes a limit of 32,768 characters on the data that can be passed between two states:
https://docs.aws.amazon.com/step-functions/latest/dg/limits.html
Maybe you need to break your problem down in a different way? That's what I did: removing the log response gives you some headroom, but the solution will not scale past a certain limit.
I handle large data in my Step Functions by storing the result in an S3 bucket, and then having my State Machine return the path to the result-file (and a brief summary of the data or a status like PASS/FAIL).
The same could be done using a DB if that's more comfortable.
This way you won't have to modify your results' current format; you can just pass the reference around instead of a huge amount of data, and the results are persisted for as long as you'd like.
The start of each Lambda looks something like this, to figure out whether the input is a file reference or plain data:
bucket_name = util.env('BUCKET_NAME')

if 'result_path' in input_data.keys():
    # Results are in a file that is referenced.
    try:
        result_path = input_data['result_path']
        result_data = util.get_file_content(result_path, bucket_name)
    except Exception as e:
        report.append(f'Failed to parse JSON from {result_path}: {e}')
else:
    # Results are just raw data, not a reference.
    result_data = input_data
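The util module here is the answerer's own helper; a minimal stand-in, assuming the results files contain JSON, might look like:

import json
import os

import boto3

def env(name):
    # Read required configuration from the Lambda environment variables.
    return os.environ[name]

def get_file_content(key, bucket_name):
    # Fetch an object from S3 and parse its body as JSON.
    body = boto3.resource('s3').Object(bucket_name, key).get()['Body']
    return json.loads(body.read())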
Then at the end of the Lambda they will upload their results and return directions to that file:
import json
import boto3

def upload_results_to_s3(bucket_name, filename, result_data_to_upload):
    try:
        s3 = boto3.resource('s3')
        results_prefix = 'Path/In/S3/'
        results_suffix = '_Results.json'
        result_file_path = results_prefix + filename + results_suffix
        s3.Object(bucket_name, result_file_path).put(
            Body=(bytes(json.dumps(result_data_to_upload, indent=2).encode('UTF-8')))
        )
        return result_file_path
    except Exception as e:
        # Re-raise so the state fails visibly instead of returning None.
        print(f'Failed to upload results to s3://{bucket_name}: {e}')
        raise

result_path = upload_results_to_s3(bucket_name, filename, result_data_to_upload)
result_obj = {
    "result_path": result_path,
    "bucket_name": bucket_name
}
return result_obj
Then the next Lambda will have the first code snippet in it, in order to get the input from the file.
The Step Function Nodes look like this, where the Result will be result_obj in the python code above:
"YOUR STATE":
{
"Comment": "Call Lambda that puts results in file",
"Type": "Task",
"Resource": "arn:aws:lambda:YOUR LAMBDA ARN",
"InputPath": "$.next_function_input",
"ResultPath": "$.next_function_input",
"Next": "YOUR-NEXT-STATE"
}
Something you can do is add "emptyOutputPath": "" to your JSON:
"emptyOutputPath": "",
"part1": {
  "portion1": { "procedure": "Delete_X" },
  "portion2": { "procedure": "Load_X" }
},
"part2": {
  "portion1": { "procedure": "Delete_Y" },
  "portion2": { "procedure": "Load_Y" }
}
That allows you to set "OutputPath": "$.emptyOutputPath", which is empty and will clear the ResultPath.
Hope that helps
Just following up on this issue to close the loop.
I basically gave up on using parallel Lambdas in favour of using SQS message queues instead.

Boto3 athena query without saving data to s3

I am trying to use boto3 to run a set of queries and don't want to save the data to S3. Instead, I just want to get the results and work with them. I am trying to do the following:
import boto3

client = boto3.client('athena')

response = client.start_query_execution(
    QueryString='''SELECT * FROM mytable limit 10''',
    QueryExecutionContext={
        'Database': 'my_db'
    },
    ResultConfiguration={
        'OutputLocation': 's3://outputpath',
    }
)
print(response)
print(response)
But here I don't want to give ResultConfiguration because I don't want to write the results anywhere. If I remove the ResultConfiguration parameter, I get the following error:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Missing required parameter in input: "ResultConfiguration"
So it seems that giving an S3 output location for writing is mandatory. Is there a way to avoid this and get the results only in the response?
The StartQueryExecution action indeed requires an S3 output location; the ResultConfiguration parameter is mandatory.
The alternative way to query Athena is through the JDBC or ODBC drivers. You should probably use that method if you don't want to store results in S3.
You will have to specify an S3 temp bucket location whenever running start_query_execution. However, you can then get the result set (a dict) by calling the get_query_results method with the query id.
The response (dict) will look like this:
{
  'UpdateCount': 123,
  'ResultSet': {
    'Rows': [
      {
        'Data': [
          {
            'VarCharValue': 'string'
          },
        ]
      },
    ],
    'ResultSetMetadata': {
      'ColumnInfo': [
        {
          'CatalogName': 'string',
          'SchemaName': 'string',
          'TableName': 'string',
          'Name': 'string',
          'Label': 'string',
          'Type': 'string',
          'Precision': 123,
          'Scale': 123,
          'Nullable': 'NOT_NULL'|'NULLABLE'|'UNKNOWN',
          'CaseSensitive': True|False
        },
      ]
    }
  },
  'NextToken': 'string'
}
For more information, see boto3 client doc: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_results
You can then delete all files in the S3 temp bucket you've specified.
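Putting those pieces together, here is a sketch of the full flow (bucket, prefix, and database names are placeholders): start the query, poll until it reaches a terminal state, fetch the rows, then delete the temp objects.

import time

import boto3

athena = boto3.client('athena')
s3 = boto3.resource('s3')

def run_query(sql, database, bucket):
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': f's3://{bucket}/athena-tmp/'}
    )['QueryExecutionId']

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(
            QueryExecutionId=query_id
        )['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(1)
    if state != 'SUCCEEDED':
        raise RuntimeError(f'Query {query_id} ended in state {state}')

    rows = athena.get_query_results(
        QueryExecutionId=query_id
    )['ResultSet']['Rows']

    # Delete the temporary result files Athena wrote to the bucket.
    s3.Bucket(bucket).objects.filter(Prefix='athena-tmp/').delete()
    return rows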
You still need to provide an S3 location as temporary storage for Athena to save the data, even though you want to process the data in Python. But you can page through the results as tuples using the pagination API, as sketched below. Hope that helps.
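A short sketch of that pagination approach (the query execution id is assumed to come from an earlier start_query_execution call):

import boto3

athena = boto3.client('athena')
query_id = '...'  # hypothetical: returned by start_query_execution

paginator = athena.get_paginator('get_query_results')
for page in paginator.paginate(QueryExecutionId=query_id):
    for row in page['ResultSet']['Rows']:
        # Each row holds its column values under 'Data'.
        print(tuple(col.get('VarCharValue') for col in row['Data']))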

Obtaining error while using amazon lex "Invalid Lambda Response: Received invalid response from Lambda"

I'm trying to develop a chatbot using AWS Lex. But unfortunately, I'm getting an error while building the chat on Lex. I'm using one intent and 2 slots. For some reason, when the lambda function is connected to the chat, the second value of the slot is saved as null. But when I run it in lambda as a test case, it's successful.
Right now, all I want to do is show a response message after the details of the slot is entered.
This is my code
public class LexBot implements RequestHandler<Map<String, Object>, Object> {

    @Override
    public Object handleRequest(Map<String, Object> input, Context context) {
        // LexRequest lexRequest = LexRequestFactory.createLexRequest(input);
        String content = "Request came from the bot: ";
        Message message = new Message("PlainText", content);
        DialogAction dialogAction = new DialogAction("Close", "Fullfiled", message);
        return new LexRespond(dialogAction);
    }
}
And this is the error I'm getting in AWS Lex:
An error has occurred: Invalid Lambda Response: Received invalid
response from Lambda: Can not construct instance of Message, problem:
content must not be blank at [Source:
{"dialogAction":{"type":"Close","message":{"contentType":"PlainText","some_respond_message":"Request
came from the bot: "}}}; line: 1, column: 122]
If you are using Amazon Lex V2, then Lex expects a different JSON response than Lex V1.
A sample Lambda response that is accepted by Lex:
{
  "sessionState": {
    "dialogAction": {
      "type": "Close"
    },
    "intent": {
      "confirmationState": "Confirmed",
      "name": "SearchProducts",
      "state": "Fulfilled"
    }
  },
  "messages": [
    {
      "contentType": "PlainText",
      "content": "Select from the list"
    }
  ]
}
Check here for the full response structure https://docs.aws.amazon.com/lexv2/latest/dg/lambda.html
According to the docs, below is the correct format for constructing the final response:
{
  "sessionAttributes": session_attributes,
  "dialogAction": {
    "type": "Close",
    "fulfillmentState": "Fulfilled",
    "message": {
      "contentType": "PlainText",
      "content": message
    }
  }
}
Use this format for constructing the response to avoid the error.
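In Python terms (the Java question above would use the equivalent POJOs), a small helper that builds this response might look like:

def close(session_attributes, message):
    # Build a Lex V1 'Close' response with a fulfilled state.
    return {
        'sessionAttributes': session_attributes,
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {
                'contentType': 'PlainText',
                'content': message
            }
        }
    }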
You spelled "fulfilled" incorrectly: you typed "Fullfiled", as pasted below:
DialogAction dialogAction = new DialogAction("Close", "Fullfiled", message);