DynamoDB Trigger Lambda Function Call Failed

I am trying to have events in a DynamoDB table trigger a Lambda function that moves the events into Kinesis Data Firehose. Kinesis then batches the files and sends them to an S3 bucket. The Lambda function I am using as the trigger fails.
This is the Lambda code for the trigger:
```
import json
import boto3

firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    resultString = ""
    for record in event['Records']:
        parsedRecord = parseRawRecord(record['dynamodb'])
        resultString = resultString + json.dumps(parsedRecord) + "\n"

    print(resultString)

    response = firehose_client.put_record(
        DeliveryStreamName="OrdersAuditFirehose",
        Record={
            'Data': resultString
        }
    )

def parseRawRecord(record):
    result = {}
    result["orderId"] = record['NewImage']['orderId']['S']
    result["state"] = record['NewImage']['state']['S']
    result["lastUpdatedDate"] = record['NewImage']['lastUpdatedDate']['N']
    return result
```
Edit: CloudWatch log added.
The goal is to get the Lambda function, triggered by events in the DynamoDB table, to move those events to Kinesis.
Edit 2: CloudWatch log added.

I'm going to post this as my initial answer, and will edit when you return with the exception from your Lambda Logs.
Edit
The issue is that you are looking for a key in a dict which does not exist:
result["lastUpdatedDate"] = record['NewImage']['lastUpdatedDate']['N']
lastUpdatedDate is not inside record['NewImage']. It may be useful to check the contents of the dict by logging it to your logs:
print(record)
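One way to make the parser tolerant of records that carry no NewImage (for example REMOVE events) or that are missing an attribute is to fall back to defaults with .get instead of indexing. This is only a sketch of that idea, not the original code:
```
def parseRawRecord(record):
    # REMOVE events have no NewImage, and attributes may be absent,
    # so fall back to empty dicts / None instead of raising KeyError.
    new_image = record.get('NewImage', {})
    return {
        'orderId': new_image.get('orderId', {}).get('S'),
        'state': new_image.get('state', {}).get('S'),
        'lastUpdatedDate': new_image.get('lastUpdatedDate', {}).get('N'),
    }
```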
There is no need to use Lambda when you want integration between DynamoDB and Firehose. Instead of DynamoDB Streams, you can use Kinesis Data Streams, which integrates directly with Firehose without the need for extra code.
DynamoDB -> Kinesis Stream -> Kinesis Firehose -> S3
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/kds.html
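If you go the Kinesis Data Streams route, the table's Kinesis destination can be enabled with a couple of API calls. A minimal boto3 sketch, where the table name and stream name are placeholders rather than values from the question:
```
import boto3

dynamodb = boto3.client('dynamodb')
kinesis = boto3.client('kinesis')

# Create the stream (or reuse an existing one) and wait until it is ACTIVE.
kinesis.create_stream(StreamName='orders-audit-stream', ShardCount=1)
kinesis.get_waiter('stream_exists').wait(StreamName='orders-audit-stream')
stream_arn = kinesis.describe_stream(
    StreamName='orders-audit-stream')['StreamDescription']['StreamARN']

# Point the DynamoDB table at the Kinesis stream; the Firehose delivery stream
# is then created with this stream (not Direct PUT) as its source.
dynamodb.enable_kinesis_streaming_destination(
    TableName='Orders',          # placeholder table name
    StreamArn=stream_arn,
)
```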
If you really want to use DynamoDB Streams, you can also avoid the Lambda code by using EventBridge Pipes:
DynamoDB -> EventBridge Pipe -> Kinesis Firehose -> S3
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes.html#pipes-targets
Both of the above solutions result in no-code delivery of DynamoDB events to Firehose.

Related

How to publish the result of a Lambda function to a cross-account Kinesis stream

Say I have two accounts, 111111111111 and 222222222222, and want to do the following:
(Lambda in 111111111111) -> (Kinesis in 222222222222)
Where Lambda function is a trigger for a data source (could be another Kinesis stream in 111111111111).
```
exports.handler = async (event, context) => {
  // data transformed here
  const result = event.records.map(record => {});
  return { data: result };
};
```
I am trying to format the data in 111111111111's Lambda function and then send it to 222222222222's Kinesis stream, but I couldn't find many resources on this.
I came across this SO post. IAM role aside, it seems like each invocation of the Lambda function needs to create a session with the 222222222222 account and create a Kinesis client in order to call PutRecord. This looks like a red flag to me, as I was thinking the Lambda function could just set up a cross-account destination with a resourceArn to send its result data to. What am I missing, and is there a better alternative?
This looks like a red flag to me, as I was thinking the Lambda function could just set up a cross-account destination with a resourceArn to send its result data to.
This is not a red flag. Cross-account IAM roles are how it is done for Kinesis, because Kinesis streams don't have resource-based policies. So you have to assume an IAM role from account 2 in your Lambda.
I'm not sure which resourceArn you are referring to. The only one I can think of is the resourceArn for Kinesis Data Analytics, which does not apply to Kinesis Data Streams.
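To make that concrete, here is a minimal sketch of the assume-role flow in Python/boto3 (the question uses Node, but the steps are the same); the role name, stream name, and payload are hypothetical placeholders:
```
import boto3

sts = boto3.client('sts')

def lambda_handler(event, context):
    # Assume a role in account 222222222222 that allows kinesis:PutRecord.
    creds = sts.assume_role(
        RoleArn='arn:aws:iam::222222222222:role/CrossAccountKinesisWriter',
        RoleSessionName='lambda-cross-account-put',
    )['Credentials']

    # Build a Kinesis client from the temporary credentials of account 2.
    kinesis = boto3.client(
        'kinesis',
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
    )

    for record in event.get('Records', []):
        kinesis.put_record(
            StreamName='stream-in-account-2',      # placeholder stream name
            Data=b'transformed payload goes here',
            PartitionKey='partition-key',
        )
```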

S3 notification configuration is ambiguously defined?

I am trying to use Terraform to connect two AWS lambda functions to an S3 event.
My configuration looks like this:
resource "aws_s3_bucket_notification" "collection_write" {
bucket = aws_s3_bucket.fruit.id
lambda_function {
lambda_function_arn = aws_lambda_function.apple.arn
events = [ "s3:ObjectCreated:*" ]
filter_prefix = "foo/"
}
lambda_function {
lambda_function_arn = aws_lambda_function.banana.arn
events = [ "s3:ObjectCreated:*" ]
filter_prefix = "foo/"
}
}
But AWS / Terraform doesn't like this:
Error: Error putting S3 notification configuration: InvalidArgument: Configuration is ambiguously defined. Cannot have overlapping suffixes in two rules if the prefixes are overlapping for the same event type.
status code: 400
How should I write this in Terraform?
Your Terraform is not wrong; the problem is that S3 does not allow two notification rules with overlapping prefixes and suffixes for the same event type, which is what the error message is saying. It is better to have the S3 event sent to an SNS topic which then triggers both Lambdas, achieving the same functionality.
You can also achieve triggering multiple Lambda functions via AWS SQS. SQS is a powerful and easy-to-use messaging queue for such use cases.
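For illustration only, here is the SNS fan-out sketched with boto3 (bucket name, topic name, and Lambda ARNs are placeholders); in Terraform the same setup maps onto aws_sns_topic, aws_sns_topic_subscription, aws_lambda_permission, and a single aws_s3_bucket_notification with a topic block:
```
import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')

# Create the topic; its access policy must also allow s3.amazonaws.com to publish,
# and each Lambda needs a permission allowing SNS to invoke it (omitted here).
topic_arn = sns.create_topic(Name='fruit-object-created')['TopicArn']

# A single S3 notification rule pointing at the topic, so there is no
# overlapping-rule conflict on the bucket.
s3.put_bucket_notification_configuration(
    Bucket='my-fruit-bucket',   # placeholder bucket name
    NotificationConfiguration={
        'TopicConfigurations': [{
            'TopicArn': topic_arn,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [{'Name': 'prefix', 'Value': 'foo/'}]}},
        }]
    },
)

# Both Lambdas subscribe to the topic, so every object-created event reaches both.
for lambda_arn in (
    'arn:aws:lambda:us-east-1:123456789012:function:apple',   # placeholder ARNs
    'arn:aws:lambda:us-east-1:123456789012:function:banana',
):
    sns.subscribe(TopicArn=topic_arn, Protocol='lambda', Endpoint=lambda_arn)
```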

How to copy state from my react-native app to my AWS Kinesis Firehose stream

With the necessary imports and configuration, it is quite simple to use Storage.put from AWS Amplify to write the text in this.state.appended to the S3 bucket mybucket using a function:
```
doComprehend = () => {
  Storage.put('apptext.txt', this.state.appended)
    .then(result => {
      console.log('result: ', result)
    })
    .catch(err => console.log('error: ', err));
}
```
How would I alter this function so that it sends the text in this.state.appended to an AWS Firehose stream instead?
Context of my problem: my RN app sends text to an AWS bucket as the apptext.txt file, which triggers a Lambda that calls AWS Comprehend and Comprehend Medical. It all works, but I can't append the results of the Lambda to the apptext.txt file inside the S3 bucket, because that's not possible...
I want to change it so that the text I'm interested in is first put into a Firehose stream, transformed there by the Lambda, and then saved to the S3 bucket.
I've looked at the Amplify docs. I've looked at aws-sdk-js. I tried npm firehoser. I can't figure out how to do it.

How to Trigger Glue ETL Pyspark job through S3 Events or AWS Lambda?

I'm planning to write certain jobs in AWS Glue ETL using PySpark, which I want to be triggered when a new file is dropped in an AWS S3 location, just like we do for triggering AWS Lambda functions using S3 events.
But I see only very limited options for triggering a Glue ETL script. Any help on this would be highly appreciated.
The following should work to trigger a Glue job from AWS Lambda. Configure the Lambda with the appropriate S3 bucket trigger, and assign IAM roles/permissions so that the Lambda can start the AWS Glue job on your behalf.
```
import boto3

print('Loading function')

def lambda_handler(_event, _context):
    glue = boto3.client('glue')
    gluejobname = "YOUR GLUE JOB NAME"

    try:
        runId = glue.start_job_run(JobName=gluejobname)
        status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        print(e)
        raise
```
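If the Glue job needs to know which object triggered it, the bucket and key from the S3 event can be passed as job arguments. A sketch assuming the standard S3 event shape; the --source_bucket/--source_key argument names are made up here and would be read in the Glue script via getResolvedOptions:
```
import boto3

def lambda_handler(event, _context):
    glue = boto3.client('glue')
    # S3 events deliver one or more records; take the bucket/key of the first.
    s3_info = event['Records'][0]['s3']
    glue.start_job_run(
        JobName='YOUR GLUE JOB NAME',
        Arguments={
            '--source_bucket': s3_info['bucket']['name'],
            '--source_key': s3_info['object']['key'],
        },
    )
```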

AWS Configure Kinesis Stream with DynamoDB Lambda

From following this question, AWS DynamoDB Stream into Redshift
DynamoDB --> DynamoDBStreams --> Lambda Function --> Kinesis Firehose --> Redshift.
How do I configure my Kinesis Firehose to pick up the Lambda function as its source?
I created a DynamoDB table (Purchase Sales) and added DynamoDB Streams. Then I configured the Lambda function to pick up the DynamoDB stream. My question is: how do I configure Kinesis Firehose to pick up the Lambda function as its source? I know how to configure a Lambda transformation, however I would like to use the Lambda as the source. I am not sure how to configure the Direct PUT source below.
Thanks,
Performed these steps:
In your case, you would stream DynamoDB to Redshift:
DynamoDB --> DynamoDBStreams --> Lambda Function --> Kinesis Firehose --> Redshift.
First, you need a Lambda function to handle the DynamoDB Stream. For each DynamoDB Stream event, use the Firehose PutRecord API to send the data to Firehose. From the example:
```
var firehose = new AWS.Firehose();
firehose.putRecord({
  DeliveryStreamName: 'STRING_VALUE', /* required */
  Record: { /* required */
    Data: new Buffer('...') || 'STRING_VALUE' /* Strings will be Base-64 encoded on your behalf */ /* required */
  }
}, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(data);               // successful response
});
```
Next, we have to know how the data is inserted into Redshift. From the Firehose documentation:
For data delivery to Amazon Redshift, Kinesis Firehose first delivers incoming data to your S3 bucket in the format described earlier. Kinesis Firehose then issues an Amazon Redshift COPY command to load the data from your S3 bucket to your Amazon Redshift cluster.
So, we need to know what data format will let the COPY command map the data into the Redshift schema, and we have to follow the data format requirements of the Redshift COPY command.
By default, the COPY command expects the source data to be character-delimited UTF-8 text. The default delimiter is a pipe character ( | ).
So, you could program the Lambda to take the DynamoDB stream event as input, transform it into a pipe (|) separated line record, and write it to Firehose:
```
var firehose = new AWS.Firehose();
firehose.putRecord({
  DeliveryStreamName: 'YOUR_FIREHOSE_NAME',
  Record: { /* required */
    Data: "RED_SHIFT_COLUMN_1_DATA|RED_SHIFT_COLUMN_2_DATA\n"
  }
}, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else console.log(data);               // successful response
});
```
Remember to add \n, as Firehose will not append a newline for you.
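If the Lambda is written in Python rather than Node, the equivalent with boto3 would look roughly like this; the attribute and column names are placeholders and must match the column order expected by your Redshift COPY command:
```
import boto3

firehose = boto3.client('firehose')

def lambda_handler(event, context):
    for record in event['Records']:
        new_image = record['dynamodb'].get('NewImage', {})
        # One pipe-separated line per stream record, newline-terminated.
        line = '|'.join([
            new_image.get('column_1', {}).get('S', ''),   # placeholder attributes
            new_image.get('column_2', {}).get('S', ''),
        ]) + '\n'
        firehose.put_record(
            DeliveryStreamName='YOUR_FIREHOSE_NAME',
            Record={'Data': line},
        )
```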