AWS Lambda function to save deleted DynamoDB data to S3

I have set up a Lambda function, a Kinesis Data Firehose delivery stream, and an S3 bucket to save deleted DynamoDB data to S3.
My Lambda function is written in C#:
var client = new AmazonKinesisFirehoseClient();
try
{
    context.Logger.LogInformation($"Write to Kinesis Firehose: {list.Count}");
    var request = new PutRecordBatchRequest
    {
        DeliveryStreamName = _kinesisStream,
        Records = new List<Amazon.KinesisFirehose.Model.Record>()
    };
    foreach (var item in list)
    {
        // Serialize each item and wrap it in a Firehose record.
        string json = JsonConvert.SerializeObject(item, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
        byte[] byteArray = UTF8Encoding.UTF8.GetBytes(ToLiteral(json));
        var record = new Amazon.KinesisFirehose.Model.Record
        {
            Data = new MemoryStream(byteArray)
        };
        request.Records.Add(record);
    }
    if (request.Records.Count > 0)
    {
        var response = await client.PutRecordBatchAsync(request);
        Console.WriteLine($"FailedPutCount: {response.FailedPutCount} status: {response.HttpStatusCode}");
    }
}
catch (Exception e)
{
    Console.WriteLine(e);
}
The "list" is a list of objects
There are some messages in Firehose logs:
"message": "Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed",
"errorCode": "Lambda.FunctionError"
I also see an error message in the S3 bucket:
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>TCG5YV3ZM3EQ4DWE</RequestId>
  <HostId>jRDkHxATNADXilsiy59IYkkechd6nqlyAEe0UDuN7qaNZS3zEIjblZJS9mGMktdCSb8AIFUam5I=</HostId>
</Error>
However, when I downloaded the error file, I saw the following:
{"attemptsMade":4,"arrivalTimestamp":1661897166462,"errorCode":"Lambda.FunctionError","errorMessage":"Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed","attemptEndingTimestamp":1661897241573,"rawData":"XXXXXXXXX"}
The "rawData" can be decoded to the original json string writing to firehose using
https://www.base64decode.org/
to decode it.
I have struggled for a couple of days. Please help.

I found my problem. The Lambda function that puts records to Firehose works. The problem was that I had enabled data transformation on the delivery stream and added a C# Lambda there, and that caused the data format issue: a transformation function has to return records in a specific format.
The solution was to disable data transformation.
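For anyone hitting the same error: a Firehose data-transformation Lambda has a fixed output contract. It must return every incoming record with the same recordId, a result of Ok, Dropped, or ProcessingFailed, and base64-encoded data; anything else produces the Lambda.FunctionError above. A minimal pass-through sketch, shown here in Node.js purely for illustration (the same contract applies to a C# transform):
exports.handler = async (event) => {
    // Echo every record back unchanged so Firehose accepts the batch.
    return {
        records: event.records.map((record) => ({
            recordId: record.recordId, // must match the incoming recordId
            result: 'Ok',              // 'Ok' | 'Dropped' | 'ProcessingFailed'
            data: record.data,         // base64-encoded payload, unchanged here
        })),
    };
};
A transform that returns anything else (for example, the raw objects the producer Lambda sends) triggers the "Check your function and make sure the output is in required format" error quoted above.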

Related

s3.CopyObject with new metadata failed

Problem
Hi, I want to add metadata (CacheControl: 'max-age=3600;s-maxage=3600') whenever a new file is uploaded to S3, so I wrote a Lambda function that is triggered by S3 PUT events. However, the CacheControl metadata is not added to the uploaded file even though the code throws no error. Could you help me? :(
(screenshot: https://i.stack.imgur.com/8MfbF.png)
My Lambda code is here:
async function retrieveNewFile(event){
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = {
        Bucket: bucket,
        Key: key
    };
    console.log('* OriginData: ' + JSON.stringify(params));
    return params;
}
async function addCacheControl(existedData){
    existedData.CopySource = existedData.Bucket + '/' + existedData.Key;
    existedData.CacheControl = 'max-age=3600;s-maxage=3600';
    //existedData['x-amz-metadata-directive'] = 'replace';
    console.log(existedData);
    await s3.copyObject(existedData).promise();
    return existedData;
}
** Note
I tried using 'putObject' instead of 'copyObject', but with 'putObject' the code ends up in a loop, because the PUT it performs triggers my Lambda again. (I cannot split the files into a separate directory for this, so I want to use 'copyObject' or something similar...)
I ended up modifying the function to use 's3.headObject' instead of 's3.copyObject' :)
async function checkHeaderExist(file){
    const header = await s3.headObject(file).promise();
    console.log(header);
    if (header.CacheControl){
        return 'exist';
    } else {
        return 'no exist';
    }
}
Looking around I found a similar post [1]. The solution could be to check whether the metadata has already been updated and only update it if it has not. This way, the first time the object is uploaded it triggers the Lambda, which puts a new object; the second time it is triggered it sees that the object is already updated and exits. A rough sketch follows the link below.
[1] AWS Lambda function and S3 - change metadata on an object in S3 only if the object changed
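Building on that idea, here is a rough sketch of the check-then-copy pattern with the AWS SDK for JavaScript v2 used in the question. The MetadataDirective: 'REPLACE' parameter (the SDK equivalent of the commented-out x-amz-metadata-directive header) is what makes S3 apply new headers when an object is copied onto itself, and the headObject guard keeps the PUT-triggered Lambda from looping. This is illustrative only, not a drop-in replacement for the original handler:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    // Skip objects that already carry the header, so the copy below
    // does not keep re-triggering this Lambda.
    const head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
    if (head.CacheControl) {
        return 'CacheControl already set';
    }

    await s3.copyObject({
        Bucket: bucket,
        Key: key,
        CopySource: `${bucket}/${key}`,
        CacheControl: 'max-age=3600;s-maxage=3600',
        MetadataDirective: 'REPLACE', // required to overwrite metadata on a self-copy
    }).promise();
    return 'CacheControl added';
};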

I am learning to create AWS Lambdas. I want to create a "chain": S3 -> 4 chained Lambdas -> RDS. I can't get the first Lambda to call the second

I have really tried everything. Surprisingly, Google does not have many answers when it comes to this.
When a certain .csv file is uploaded to a S3 bucket I want to parse it and place the data into a RDS database.
My goal is to learn the lambda serverless technology, this is essentially an exercise. Thus, I over-engineered the hell out of it.
Here is how it goes:
1. S3 trigger when the .csv is uploaded -> call Lambda (this part fully works).
2. AAA_Thomas_DailyOverframeS3CsvToAnalytics_DownloadCsv downloads the csv from S3 and finishes with essentially the plaintext of the file. It is then supposed to pass it to the next Lambda. The way I am trying to do this is by putting the second Lambda as its destination. The function works, but the second Lambda is never called and I don't know why.
3. AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv gets the plaintext as input and returns a JavaScript object with the parsed data.
4. AAA_Thomas_DailyOverframeS3CsvToAnalytics_DecryptRDSPass only connects to KMS, gets the encrypted RDS password, and passes it, along with the data it received as input, to the last Lambda.
5. AAA_Thomas_DailyOverframeS3CsvToAnalytics_PutDataInRds then finally puts the data in RDS.
I created a custom VPC with custom subnets, route tables, gateways, peering connections, etc. I don't know if this is relevant, but function 2 only has access to the S3 endpoint, 3 does not have any internet access whatsoever, 4 is the only one that has normal internet access (it's the only way to connect to KMS), and 5 only has access to the peered VPC which hosts the RDS.
This is the code of the first lambda:
// dependencies
const AWS = require('aws-sdk');
const util = require('util');
const s3 = new AWS.S3();
let region = process.env;
exports.handler = async (event, context, callback) =>
{
    var checkDates = process.env.CheckDates == "false" ? false : true;
    var ret = [];
    var checkFileDate = function(actualFileName)
    {
        if (!checkDates)
            return true;
        // getUTCMonth() is zero-based, so add 1 before zero-padding.
        var d = new Date();
        var month = d.getUTCMonth() + 1;
        var day = d.getUTCDate();
        var expectedFileName = 'Overframe_-_Analytics_by_Day_Device_' + d.getUTCFullYear() + '-' + (month.toString().length == 1 ? "0" + month : month) + '-' + (day.toString().length == 1 ? "0" + day : day);
        return expectedFileName == actualFileName.substr(0, expectedFileName.length);
    };
    for (var i = 0; i < event.Records.length; ++i)
    {
        var record = event.Records[i];
        try {
            if (record.s3.bucket.name != process.env.S3BucketName)
            {
                console.error('Unexpected notification, unknown bucket: ' + record.s3.bucket.name);
                continue;
            }
            if (!checkFileDate(record.s3.object.key))
            {
                console.error('Unexpected file, or date is not today\'s: ' + record.s3.object.key);
                continue;
            }
            const params = {
                Bucket: record.s3.bucket.name,
                Key: record.s3.object.key
            };
            var csvFile = await s3.getObject(params).promise();
            var allText = csvFile.Body.toString('utf-8');
            console.log('Loaded data:', {Bucket: params.Bucket, Filename: params.Key, Text: allText});
            ret.push(allText);
        } catch (error) {
            console.log("Couldn't download CSV from S3", error);
            return { statusCode: 500, body: error };
        }
    }
    // I've been randomly trying different ways to return the data, none works. The data itself is correct, I checked with console.log()
    const response = {
        statusCode: 200,
        body: { "Records": ret }
    };
    return ret;
};
(A screenshot showed how the Lambda was set up, in particular its destination configuration.)
I haven't posted on Stack Overflow in 7 years. That's how desperate I am. Thanks for the help.
Rather than getting each Lambda to call the next one, take a look at AWS Step Functions, the managed service for state machines, which can handle this workflow for you.
Each state's output is passed as input to the next function, with retry logic built in.
If you don't have much experience, AWS has a tutorial on setting up a state machine that chains Lambdas.
Using Step Functions also means you will not need to work around configuration issues such as Lambda timeouts, and it keeps your code modular, which makes the individual functions easier to test and issues easier to isolate. A minimal state machine sketch is shown below.
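Something along these lines (Amazon States Language; the region and account in the ARNs are placeholders and the state names are shortened):
{
  "StartAt": "DownloadCsv",
  "States": {
    "DownloadCsv":    { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:DownloadCsv",    "Next": "ParseCsv" },
    "ParseCsv":       { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ParseCsv",       "Next": "DecryptRDSPass" },
    "DecryptRDSPass": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:DecryptRDSPass", "Next": "PutDataInRds" },
    "PutDataInRds":   { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:PutDataInRds",   "End": true }
  }
}
Each task's output becomes the next task's input, and Retry/Catch blocks can be added per state.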
The execution role of every Lambda function whose destination is another Lambda function must have the lambda:InvokeFunction IAM permission in one of its attached IAM policies (an example policy follows the list below).
Here's a snippet from Lambda documentation:
To send events to a destination, your function needs additional permissions. Add a policy with the required permissions to your function's execution role. Each destination service requires a different permission, as follows:
Amazon SQS – sqs:SendMessage
Amazon SNS – sns:Publish
Lambda – lambda:InvokeFunction
EventBridge – events:PutEvents
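For the Lambda-to-Lambda destination in the question, that means the first function's execution role needs a policy along these lines (region and account ID are placeholders):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv"
    }
  ]
}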

How can I improve the Performance (response time) for the AWS Encryption SDK on decrypt

I have an AWS Lambda that is used to encrypt PII (Personal Identifying Information) using the AWS Encryption SDK before storing it in DynamoDB.
When retrieving the data from DynamoDB using a different Lambda to display to end users, the average time for each call to KMS is 9.48 seconds. This is averaged across roughly 2k requests, with individual requests ranging from ~5.1 seconds to ~14.5 seconds. The calls to KMS are being made asynchronously.
The total time from the first KMS call to the last is ~20 seconds.
We have considered using data key caching and read this AWS blog post about when to use it.
Our data may not come in frequently enough to take full advantage of caching, and I am trying to find other ways to improve the performance. (A sketch of what data key caching could look like follows the snippets below.)
Decrypt code snippet:
async function decryptWithKeyring(keyring: KmsKeyringNode, ciphertext: string, context: {}) {
    const b: Buffer = Buffer.from(ciphertext, 'base64');
    const { plaintext, messageHeader } = await decrypt(keyring, b);
    const { encryptionContext } = messageHeader;
    Object.entries(context).forEach(([key, value]) => {
        if (encryptionContext[key] !== value) {
            throw new Error('Encryption Context does not match expected values');
        }
    });
    return plaintext.toString();
}
Encrypt snippet:
async function encryptWithKeyring(keyring: KmsKeyringNode, value: any, context: any) {
    const { result } = await encrypt(keyring, value, { encryptionContext: context });
    return result.toString('base64');
}
The conversion to base64 was to facilitate storing in DynamoDB.
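If data key caching does turn out to help (it pays off when many items are encrypted and decrypted under the same data keys), the Encryption SDK for JavaScript provides a caching materials manager that can be passed to decrypt in place of the bare keyring. A hedged sketch, with names as I recall them from the @aws-crypto/client-node docs; verify the exact API and tune the limits to your own security requirements:
const {
    NodeCachingMaterialsManager,
    getLocalCryptographicMaterialsCache,
    decrypt,
} = require('@aws-crypto/client-node');

// Module-scope cache so warm Lambda invocations can reuse cached data keys.
// The capacity (100) is illustrative.
const cache = getLocalCryptographicMaterialsCache(100);

function buildCachingCmm(keyring) {
    // Wrap the existing KmsKeyringNode; these limits are examples, not recommendations.
    return new NodeCachingMaterialsManager({
        backingMaterials: keyring,
        cache,
        maxAge: 1000 * 60,           // cache entries live at most 60 seconds
        maxMessagesEncrypted: 100,
        maxBytesEncrypted: 100 * 1024,
    });
}

async function decryptWithCachingCmm(cachingCmm, ciphertextB64) {
    // decrypt accepts the caching materials manager wherever a keyring is accepted.
    const { plaintext } = await decrypt(cachingCmm, Buffer.from(ciphertextB64, 'base64'));
    return plaintext.toString();
}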
(An X-Ray trace map and a sampling of KMS trace data were attached as screenshots.)

Publish a JSON message to an AWS SNS topic using C#

I am trying to publish a JSON message to an AWS SNS topic from my C# application using the AWS SDK. It is populating the message in string format, and the message attribute field is not populated.
A code sample is below:
var snsClient = new AmazonSimpleNotificationServiceClient(accessId, secretrkey, RegionEndpoint.USEast1);
PublishRequest publishReq = new PublishRequest()
{
    TargetArn = topicARN,
    MessageStructure = "json",
    Message = JsonConvert.SerializeObject(message)
};
var msgAttributes = new Dictionary<string, MessageAttributeValue>();
var msgAttribute = new MessageAttributeValue();
msgAttribute.DataType = "String";
msgAttribute.StringValue = "123";
msgAttributes.Add("Objectcd", msgAttribute);
publishReq.MessageAttributes = msgAttributes;
PublishResponse response = snsClient.Publish(publishReq);
Older question, but answering as I came across it when dealing with a similar issue.
When you set MessageStructure to "json", the JSON must contain at least a top-level key of "default" whose value is a string.
So the JSON needs to look like:
{
    "default": "my message"
}
My solution looks something like this:
var messageDict = new Dictionary<string, object>();
messageDict["default"] = "my message";
PublishRequest publishReq = new PublishRequest()
{
    TargetArn = topicARN,
    MessageStructure = "json",
    Message = JsonConvert.SerializeObject(messageDict)
};
// if the payload is an object rather than a plain string, then:
messageDict["default"] = JsonConvert.SerializeObject(myMessageObject);
I'm using PublishAsync on v3 of the SDK.
From the documentation
https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/SNS/TPublishRequest.html
Message structure
Gets and sets the property MessageStructure.
Set MessageStructure to json if you want to send a different message for each protocol. For example, using one publish action, you can send a short message to your SMS subscribers and a longer message to your email subscribers. If you set MessageStructure to json, the value of the Message parameter must:
be a syntactically valid JSON object; and
contain at least a top-level JSON key of "default" with a value that is a string.
You can define other top-level keys that define the message you want to send to a specific transport protocol (e.g., "http").
Valid value: json
Great coincidence!
I was just busy writing a C# implementation to publish a message to SNS when I stumbled upon this post. Hopefully this helps you.
The messageBody argument we pass down to PublishMessageAsync is a string; it can be serialized JSON, for example.
public class SnsClient : ISnsClient
{
    private readonly IAmazonSimpleNotificationService _snsClient;
    private readonly SnsOptions _snsOptions; // You can inject any options you want here.

    public SnsClient(IOptions<SnsOptions> snsOptions, // I'm using the IOptions pattern as I have the TopicArn defined in appsettings.json
        IAmazonSimpleNotificationService snsClient)
    {
        _snsOptions = snsOptions.Value;
        _snsClient = snsClient;
    }

    public async Task<PublishResponse> PublishMessageAsync(string messageBody)
    {
        return await _snsClient.PublishAsync(new PublishRequest
        {
            TopicArn = _snsOptions.TopicArn,
            Message = messageBody
        });
    }
}
Also note that the above setup uses dependency injection, so it requires you to define an ISnsClient and register an instance when bootstrapping the application, something like the following:
services.TryAddSingleton<ISnsClient, SnsClient>();

AWS DataPipelineClient - listPipelines returns no records

I am trying to access my AWS DataPipelines using AWS Java SDK v1.7.5, but listPipelines is returning an empty list in the code below.
I have DataPipelines that are scheduled in the US East region, which I believe I should be able to list using the listPipelines method of the DataPipelineClient. I am already using the ProfilesConfigFile to authenticate and connect to S3, DynamoDB and Kinesis without a problem. I've granted the PowerUserAccess Access Policy to the IAM user specified in the config file. I've also tried applying the Administrator Access policy to the user, but it didn't change anything. Here's the code I'm using:
//Establish credentials for connecting to AWS.
File configFile = new File(System.getProperty("user.home"), ".aws/config");
ProfilesConfigFile profilesConfigFile = new ProfilesConfigFile(configFile);
AWSCredentialsProvider awsCredentialsProvider = new ProfileCredentialsProvider(profilesConfigFile, "default");

//Set up the AWS DataPipeline connection.
DataPipelineClient dataPipelineClient = new DataPipelineClient(awsCredentialsProvider);
Region usEast1 = Region.getRegion(Regions.US_EAST_1);
dataPipelineClient.setRegion(usEast1);

//List all pipelines we have access to.
ListPipelinesResult listPipelinesResult = dataPipelineClient.listPipelines(); //empty list returned here.
for (PipelineIdName p : listPipelinesResult.getPipelineIdList()) {
    System.out.println(p.getId());
}
Make sure to check whether there are more results - I've noticed the API sometimes returns only a few pipelines (the list could even be empty) but sets a flag indicating there are more results. You can retrieve them like this:
void listPipelines(DataPipelineClient dataPipelineClient, String marker) {
    ListPipelinesRequest request = new ListPipelinesRequest();
    if (marker != null) {
        request.setMarker(marker);
    }
    ListPipelinesResult listPipelinesResult = dataPipelineClient.listPipelines(request);
    for (PipelineIdName p : listPipelinesResult.getPipelineIdList()) {
        System.out.println(p.getId());
    }
    // Call recursively if there are more results:
    if (listPipelinesResult.getHasMoreResults()) {
        listPipelines(dataPipelineClient, listPipelinesResult.getMarker());
    }
}
}