s3.CopyObject with new metadata failed - amazon-web-services

Problem
Hi, I want to add metadata (CacheControl: 'max-age=3600;s-maxage=3600') whenever a new file is uploaded to S3, so I wrote a Lambda function triggered by the S3 PUT event. However, the CacheControl metadata is not added to the uploaded file even though the code runs without errors. Could you help me? :(
(Screenshot of the uploaded object's metadata: https://i.stack.imgur.com/8MfbF.png)
My Lambda code is here:
// Assumed setup (not shown in the original post):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function retrieveNewFile(event) {
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = {
        Bucket: bucket,
        Key: key
    };
    console.log('* OriginData: ' + JSON.stringify(params));
    return params;
}

async function addCacheControl(existedData) {
    existedData.CopySource = existedData.Bucket + '/' + existedData.Key;
    existedData.CacheControl = 'max-age=3600;s-maxage=3600';
    //existedData['x-amz-metadata-directive'] = 'replace';
    console.log(existedData);
    await s3.copyObject(existedData).promise();
    return existedData;
}
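For reference, the commented-out x-amz-metadata-directive line corresponds to the SDK's MetadataDirective parameter; without MetadataDirective: 'REPLACE', S3 copies the original metadata and ignores the CacheControl supplied in the request. A minimal sketch of the copy call with that parameter set (same bucket and key as above) would look like this:

async function addCacheControlWithReplace(existedData) {
    const params = {
        Bucket: existedData.Bucket,
        Key: existedData.Key,
        CopySource: existedData.Bucket + '/' + existedData.Key,
        // Note: the standard Cache-Control separator is a comma, e.g. 'max-age=3600, s-maxage=3600'
        CacheControl: 'max-age=3600;s-maxage=3600',
        MetadataDirective: 'REPLACE' // tell S3 to replace metadata instead of copying the original
    };
    await s3.copyObject(params).promise();
    return params;
}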
Note:
I tried using 'putObject' instead of 'copyObject', but with 'putObject' the code ends up in a loop, because the PUT it performs triggers my Lambda again. (I cannot split the directory for this, so I want to use 'copyObject' or something similar...)

I modified the function not to use only 's3.copyObject', but to also use 's3.headObject' :)
async function checkHeaderExist(file) {
    const header = await s3.headObject(file).promise();
    console.log(header);
    if (header.CacheControl) {
        return 'exist';
    } else {
        return 'no exist';
    }
}

Looking around, I found a similar post [1]. The solution could be to check whether the metadata has already been updated and only update it if it has not. This way, the first time the object is uploaded it triggers the Lambda and puts a new object; the second time it is triggered it sees that the object is already updated and the Lambda exits.
[1] AWS Lambda function and S3 - change metadata on an object in S3 only if the object changed
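One possible shape of that guard, assuming a handler that wires together the functions from the question (the handler wiring itself is illustrative, not from the original post):

exports.handler = async (event) => {
    const file = await retrieveNewFile(event);
    const headerExist = await checkHeaderExist(file);
    if (headerExist === 'exist') {
        // CacheControl is already set (this is the copy we made), so stop here to break the loop
        return;
    }
    await addCacheControlWithReplace(file);
};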

Related

AWS Lambda function to save DynamoDB deleted data to S3

I have set up a Lambda function, a Firehose delivery stream, and an S3 bucket to save deleted DynamoDB data to S3.
My lambda function was written in C#.
var client = new AmazonKinesisFirehoseClient();
try
{
    context.Logger.LogInformation($"Write to Kinesis Firehose: {list.Count}");
    var request = new PutRecordBatchRequest
    {
        DeliveryStreamName = _kinesisStream,
        Records = new List<Amazon.KinesisFirehose.Model.Record>()
    };
    foreach (var item in list)
    {
        var stringWrite = new StringWriter();
        string json = JsonConvert.SerializeObject(item, new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore });
        byte[] byteArray = UTF8Encoding.UTF8.GetBytes(ToLiteral(json));
        var record = new Amazon.KinesisFirehose.Model.Record
        {
            Data = new MemoryStream(byteArray)
        };
        request.Records.Add(record);
    }
    if (request.Records.Count > 0)
    {
        var response = await client.PutRecordBatchAsync(request);
        Console.WriteLine($"FailedPutCount: {response.FailedPutCount} status: {response.HttpStatusCode}");
    }
}
catch (Exception e)
{
    Console.WriteLine(e);
}
The "list" is a list of objects
There are some messages in Firehose logs:
"message": "Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed",
"errorCode": "Lambda.FunctionError"
I also see some error messages in the S3 bucket:
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>TCG5YV3ZM3EQ4DWE</RequestId>
  <HostId>jRDkHxATNADXilsiy59IYkkechd6nqlyAEe0UDuN7qaNZS3zEIjblZJS9mGMktdCSb8AIFUam5I=</HostId>
</Error>
However, when I downloaded the error file, I saw the following:
attemptsMade":4,"arrivalTimestamp":1661897166462,"errorCode":"Lambda.FunctionError","errorMessage":"Check your function and make sure the output is in required format. In addition to that, make sure the processed records contain valid result status of Dropped, Ok, or ProcessingFailed","attemptEndingTimestamp":1661897241573,"rawData":"XXXXXXXXX"
The "rawData" can be decoded to the original json string writing to firehose using
https://www.base64decode.org/
to decode it.
I have struggled for a couple of days. Please help.
I found my problem. The Lambda function that puts records to Firehose works. The problem was that I had enabled data transformation on the delivery stream and added my C# Lambda there, which caused the data format issue: a data transformation function needs to return data in a different format.
The solution is to disable the data transformation.
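For context, a Firehose data-transformation Lambda is expected to return each record with a recordId, a result of Ok, Dropped, or ProcessingFailed (the statuses named in the error message above), and base64-encoded data. A minimal Node.js sketch of that contract, if you did want to keep transformation enabled, would be roughly:

exports.handler = async (event) => {
    // Firehose passes records in event.records; each returned record must
    // echo back the recordId, a result status, and base64-encoded data.
    const records = event.records.map((record) => {
        const payload = Buffer.from(record.data, 'base64').toString('utf8');
        // ...transform the payload here if needed...
        return {
            recordId: record.recordId,
            result: 'Ok', // or 'Dropped' / 'ProcessingFailed'
            data: Buffer.from(payload, 'utf8').toString('base64')
        };
    });
    return { records };
};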

I am learning to create AWS Lambdas. I want to create a "chain": S3 -> 4 Chained Lambda()'s -> RDS. I can't get the first lambda to call the second

I really tried everything. Surprisingly, Google does not have many answers when it comes to this.
When a certain .csv file is uploaded to an S3 bucket, I want to parse it and place the data into an RDS database.
My goal is to learn the lambda serverless technology, this is essentially an exercise. Thus, I over-engineered the hell out of it.
Here is how it goes:
S3 Trigger when the .csv is uploaded -> call lambda (this part fully works)
AAA_Thomas_DailyOverframeS3CsvToAnalytics_DownloadCsv downloads the csv from S3 and finishes with essentially the plaintext of the file. It is then supposed to pass it to the next Lambda. The way I am trying to do this is by setting the second Lambda as its destination. The function works, but the second Lambda is never called and I don't know why.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv gets the plaintext as input and returns a javascript object with the parsed data.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_DecryptRDSPass only connects to KMS, gets the encrypted RDS password, and passes it, along with the data it received as input, to the last Lambda.
AAA_Thomas_DailyOverframeS3CsvToAnalytics_PutDataInRds then finally puts the data in RDS.
I created a custom VPC with custom subnets, route tables, gateways, peering connections, etc. I don't know if this is relevant, but function 2 only has access to the S3 endpoint, function 3 does not have any internet access whatsoever, function 4 is the only one that has normal internet access (it's the only way to connect to KMS), and function 5 only has access to the peered VPC which hosts the RDS.
This is the code of the first lambda:
// dependencies
const AWS = require('aws-sdk');
const util = require('util');
const s3 = new AWS.S3();
let region = process.env;

exports.handler = async (event, context, callback) =>
{
    var checkDates = process.env.CheckDates == "false" ? false : true;
    var ret = [];
    var checkFileDate = function(actualFileName)
    {
        if (!checkDates)
            return true;
        var d = new Date();
        var expectedFileName = 'Overframe_-_Analytics_by_Day_Device_' + d.getUTCFullYear() + '-' + (d.getUTCMonth().toString().length == 1 ? "0" + d.getUTCMonth() : d.getUTCMonth()) + '-' + (d.getUTCDate().toString().length == 1 ? "0" + d.getUTCDate() : d.getUTCDate());
        return expectedFileName == actualFileName.substr(0, expectedFileName.length);
    };
    for (var i = 0; i < event.Records.length; ++i)
    {
        var record = event.Records[i];
        try {
            if (record.s3.bucket.name != process.env.S3BucketName)
            {
                console.error('Unexpected notification, unknown bucket: ' + record.s3.bucket.name);
                continue;
            }
            if (!checkFileDate(record.s3.object.key))
            {
                console.error('Unexpected file, or date is not today\'s: ' + record.s3.object.key);
                continue;
            }
            const params = {
                Bucket: record.s3.bucket.name,
                Key: record.s3.object.key
            };
            var csvFile = await s3.getObject(params).promise();
            var allText = csvFile.Body.toString('utf-8');
            console.log('Loaded data:', {Bucket: params.Bucket, Filename: params.Key, Text: allText});
            ret.push(allText);
        } catch (error) {
            console.log("Couldn't download CSV from S3", error);
            return { statusCode: 500, body: error };
        }
    }
    // I've been randomly trying different ways to return the data, none works. The data itself is correct, I checked with console.log()
    const response = {
        statusCode: 200,
        body: { "Records": ret }
    };
    return ret;
};
(A screenshot, omitted here, showed how the Lambda was set up, and in particular its destination configuration.)
I haven't posted on Stackoverflow in 7 years. That's how desperate I am. Thanks for the help.
Rather than getting each Lambda to call the next one, take a look at AWS's managed state machine service, Step Functions, which can handle this workflow for you.
By defining inputs and outputs you can pass the output of one function to the next, with retry logic built in.
If you don't have much experience, AWS has a tutorial on setting up a Step Function that chains Lambdas.
By using this you also will not need to account for configuration issues such as Lambda timeouts. In addition, it allows your code to be more modular, which makes it easier to test individual functionality whilst also isolating issues.
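As a rough illustration, a state machine definition chaining the first two of your Lambdas could look like the sketch below (the ARNs are placeholders; by default the output of one Task state becomes the input of the next):

{
  "StartAt": "DownloadCsv",
  "States": {
    "DownloadCsv": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:AAA_Thomas_DailyOverframeS3CsvToAnalytics_DownloadCsv",
      "Next": "ParseCsv"
    },
    "ParseCsv": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv",
      "End": true
    }
  }
}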
The execution role of every Lambda function whose destinations include other Lambda functions must have the lambda:InvokeFunction IAM permission in one of its attached IAM policies.
Here's a snippet from Lambda documentation:
To send events to a destination, your function needs additional permissions. Add a policy with the required permissions to your function's execution role. Each destination service requires a different permission, as follows:
Amazon SQS – sqs:SendMessage
Amazon SNS – sns:Publish
Lambda – lambda:InvokeFunction
EventBridge – events:PutEvents
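For example, a policy statement on the first function's execution role granting that permission for the Lambda destination (the ARN is a placeholder) might look like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:AAA_Thomas_DailyOverframeS3CsvToAnalytics_ParseCsv"
    }
  ]
}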

s3.putObject(params).promise() does not upload file, but successfully executes then() callback

I have made a fairly long series of attempts to put a file in an S3 bucket, after which I have to update my model.
I have the following code (note that I have tried the commented lines too; it works neither with them nor without them).
The problem observed:
Everything in the first .then() block (successCallBack()) gets successfully executed, but I do not see the result of s3.putObject().
The bucket in question is public, with no access restrictions. It used to work with the sls offline option; then, because it was not working in AWS, I had to make a lot of changes and managed to make successCallBack() work, which does the database work successfully. However, the file upload still doesn't work.
Some questions:
While solving this, the real questions I am pondering / searching are,
Is a Lambda supposed to return something? I looked at the AWS docs but they have fragmented code snippets.
Putting await in front of s3.putObject(params).promise() does not help. I see samples both with and without await in front of calls that end in the AWS .promise() method, and I am not sure which ones are correct.
What is the correct way to accomplish chained async functions within one Lambda function?
UPDATE:
var myJSON = {}

const createBook = async (event) => {
    let bucketPath = "https://com.xxxx.yyyy.aa-bb-zzzzzz-1.amazonaws.com"
    let fileKey = //file key
    let path = bucketPath + "/" + fileKey;
    myJSON = {
        //JSON from headers
    }
    var s3 = new AWS.S3();
    let buffer = Buffer.from(event.body, 'utf8');
    var params = {Bucket: 'com.xxxx.yyyy', Key: fileKey, Body: buffer, ContentEncoding: 'utf8'};
    let putObjPromise = s3.putObject(params).promise();
    putObjPromise
        .then(successCallBack())
        .then(c => {
            console.log('File upload Success!');
            return {
                statusCode: 200,
                headers: { 'Content-Type': 'text/plain' },
                body: "Success!!!"
            }
        })
        .catch(err => {
            let str = "File upload / Creation error:" + err;
            console.log(str);
            return {
                statusCode: err.statusCode || 500,
                headers: { 'Content-Type': 'text/plain' },
                body: str
            }
        });
}

const successCallBack = async () => {
    console.log("Inside success callback - " + JSON.stringify(myJSON));
    const { myModel } = await connectToDatabase()
    console.log("After connectToDatabase")
    const book = await myModel.create(myJSON)
    console.log(msg);
}
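For reference, a minimal await-based sketch of the same flow (keeping the names from the snippet above, with the setup of fileKey, myJSON, s3 and params elided) might look like this, so the handler does not return before the promise chain settles:

const createBook = async (event) => {
    // ...build fileKey, myJSON, s3 and params exactly as above...
    try {
        await s3.putObject(params).promise();
        await successCallBack();
        return {
            statusCode: 200,
            headers: { 'Content-Type': 'text/plain' },
            body: "Success!!!"
        };
    } catch (err) {
        const str = "File upload / Creation error: " + err;
        console.log(str);
        return {
            statusCode: err.statusCode || 500,
            headers: { 'Content-Type': 'text/plain' },
            body: str
        };
    }
};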
Finally, I got this to work. My code already worked in the sls offline setup.
What was different on the AWS endpoint?
What I observed in the console was that my Lambda was set to run inside a VPC.
When I chose No VPC, it worked. I do not know if this is the best practice. There must be some security advantage to functions running inside a VPC.
I came across this huge explanation about VPC but I could not find anything related to S3.
The code posted in the question currently runs fine on AWS endpoint.
If the Lambda is running in a VPC then you would need a VPC endpoint to access a service outside the VPC, and S3 is outside the VPC. If security is an issue, then creating a VPC endpoint would solve the issue in a better way. Also, if security is an issue, then by adding a policy (or using the default AmazonS3FullAccess policy) to the role that the Lambda is using, the S3 bucket wouldn't need to be public.
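For example, a gateway VPC endpoint for S3 can be created from the CLI roughly like this (the VPC ID, route table ID, and region are placeholders for your own values):

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0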

delete folder from s3 nodejs

Hey guys, I was trying to delete a folder from S3 with stuff in it, but deleteObjects wasn't working, so I found this script online and it works great. My question is: why does it work? Why do you have to listObjects when deleting a folder on S3, and why can't I just pass it the object's name? Why doesn't it error when I attempt to delete the folder without listing the objects first?
First attempt (doesn't work):
var filePath2 = "templates/" + key + "/test/";
var toPush = { Key: filePath2 };
deleteParams.Delete.Objects.push(toPush);
console.log("deleteParams", deleteParams);
console.log("deleteParams.Delete", deleteParams.Delete);
const deleteResult = await s3.deleteObjects(deleteParams).promise();
console.log("deleteResult", deleteResult);
Keep in mind filePath2 is a folder that has other stuff in it. I get no error, the catch isn't triggered, and the result says "Deleted" followed by the folder name.
Second attempt (works):
async function deleteFromS3(bucket, path) {
    const listParams = {
        Bucket: bucket,
        Prefix: path
    };
    const listedObjects = await s3.listObjectsV2(listParams).promise();
    console.log("listedObjects", listedObjects);
    if (listedObjects.Contents.length === 0) return;
    const deleteParams = {
        Bucket: bucket,
        Delete: { Objects: [] }
    };
    listedObjects.Contents.forEach(({ Key }) => {
        deleteParams.Delete.Objects.push({ Key });
    });
    console.log("deleteParams", deleteParams);
    const deleteResult = await s3.deleteObjects(deleteParams).promise();
    console.log("deleteResult", deleteResult);
    if (listedObjects.IsTruncated && deleteResult)
        await deleteFromS3(bucket, path);
}
Then I call the function like so:
const result = await deleteFromS3(myBucketName, folderPath);
Folders do not exist in Amazon S3. It is a flat object storage system, where the filename (Key) for each object contains the full path.
While Amazon S3 does support the concept of a Common Prefix, which can make things appear as though they are in folders/directories, folders do not actually exist.
For example, you could run a command like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would work even if the folders do not exist! It is merely storing an object with a Key of folder1/folder2/foo.txt.
If you were then to delete that object, the 'folder' would disappear because no object has it as a path. That is because the folder never actually existed.
Sometimes people want an empty folder to appear, so they create a zero-length object with the same name as the folder, eg folder1/folder2/.
So, your first program did not work because it deleted the 'folder', which has nothing to do with deleting the content of the folder (since there is no concept of 'content' of a folder).
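To illustrate that last point, a zero-length "folder" marker can be created with a sketch like this (the bucket name is a placeholder):

await s3.putObject({
    Bucket: 'my-bucket',
    Key: 'folder1/folder2/', // trailing slash and empty body: appears as an empty folder in the console
    Body: ''
}).promise();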

Why does AWS S3 JavaScript Example Code save objects into a nameless subfolder?

I tried posting the following question on the AWS forum but I got the error message - "Your account is not ready for posting messages yet.", which is why I'm posting this here.
I am reading through the following example code for Amazon S3:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-example-photo-album-full.html
Whenever a new object is created, the example code nests the object within a nameless sub-folder like so:
function addPhoto(albumName) {
    var files = document.getElementById('photoupload').files;
    if (!files.length) {
        return alert('Please choose a file to upload first.');
    }
    var file = files[0];
    var fileName = file.name;
    // Why is the photo placed in a nameless subfolder (below)?
    var albumPhotosKey = encodeURIComponent(albumName) + '//';
    ...
Is there a particular reason / need for this?