Download file from Amazon S3 to lambda and extract it [duplicate] - amazon-web-services

Good day guys.
I have a simple question: how do I download an image from an S3 bucket into a Lambda function's temp folder for processing? Basically, I need to attach it to an email (this I can do when testing locally).
I have tried:
s3.download_file(bucket, key, '/tmp/image.png')
as well as (not sure which parameters will help me get the job done):
s3.getObject(params, (err, data) => {
  if (err) {
    console.log(err);
    const message = `Error getting object ${key} from bucket ${bucket}.`;
    console.log(message);
    callback(message);
  } else {
    console.log('CONTENT TYPE:', data.ContentType);
    callback(null, data.ContentType);
  }
});
Like I said, it's a simple question, but for some reason I can't find a solution.
Thanks!

You can get the image using the AWS S3 API, then write it to the /tmp folder using fs.
var params = { Bucket: "BUCKET_NAME", Key: "OBJECT_KEY" };

s3.getObject(params, function(err, data) {
  if (err) {
    console.error(err.code, "-", err.message);
    return callback(err);
  }
  fs.writeFile('/tmp/filename', data.Body, function(err) {
    if (err) {
      console.log(err.code, "-", err.message);
    }
    return callback(err);
  });
});
Out of curiosity, why do you need to write the file in order to attach it? It seems redundant to write the file to disk just so you can read it back from disk.
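For what it's worth, if the mail library accepts a Buffer for attachments (nodemailer does, via the attachment content field), a sketch like this skips the disk round-trip entirely; the library, function name and addresses here are my own assumptions, not from the original post:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// transporter is a nodemailer transport created elsewhere
async function emailImageFromS3(bucket, key, transporter) {
  const data = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  await transporter.sendMail({
    from: 'sender@example.com',        // placeholder addresses
    to: 'recipient@example.com',
    subject: 'Image from S3',
    attachments: [{
      filename: 'image.png',
      content: data.Body,              // Buffer straight from S3, no /tmp file needed
      contentType: data.ContentType
    }]
  });
}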

If you're writing it straight to the filesystem, you can also do it with streams. It may be a little faster and more memory-friendly, which matters in a memory-constrained environment like Lambda.
var AWS = require('aws-sdk');
var fs = require('fs');
var path = require('path');

var s3 = new AWS.S3();

var params = {
  Bucket: "mybucket",
  Key: "image.png"
};

var tempFileName = path.join('/tmp', 'downloadedimage.png');
var tempFile = fs.createWriteStream(tempFileName);

// Stream the object straight to disk instead of buffering it in memory
s3.getObject(params).createReadStream().pipe(tempFile);
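One caveat worth noting: Lambda may freeze the execution environment as soon as the handler returns, so you generally want to wait for the write stream to finish before using the file. A minimal sketch wrapping the same pipe in a promise (the helper name is mine):

// Resolves once the file is fully written, rejects if either stream fails
function downloadToTmp(s3, params, destPath) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(destPath);
    s3.getObject(params).createReadStream()
      .on('error', reject)   // S3 read error
      .pipe(file)
      .on('error', reject)   // filesystem write error
      .on('finish', () => resolve(destPath));
  });
}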

// Using Node.js version 10.0 or later and promises
// (the awaits below must run inside an async function, e.g. the Lambda handler)
const fsPromise = require('fs').promises;

try {
  const params = {
    Bucket: 's3Bucket',
    Key: 'file.txt',
  };
  const data = await s3.getObject(params).promise();
  await fsPromise.writeFile('/tmp/file.txt', data.Body);
} catch (err) {
  console.log(err);
}

I was having the same problem, and the issue was that I was using Runtime.NODEJS_12_X in my AWS Lambda.
When I switched over to NODEJS_14_X it started working for me :').
Also note that the /tmp prefix is required; the file will be written directly to /tmp/file.ext.
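If you'd rather not hardcode the path, Node's os.tmpdir() resolves to /tmp inside Lambda, so a small sketch like this keeps the code portable:

const os = require('os');
const path = require('path');

// On Lambda this evaluates to '/tmp/file.ext'
const localPath = path.join(os.tmpdir(), 'file.ext');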

Related

"InvalidParameterType" error for image files sent as blob to AWS Textract from external source

CURRENTLY
I am trying to get AWS Textract working for images supplied from a function in Google Apps Script, which are sent to a Lambda resolver. I am following the documentation at https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Textract.html#analyzeDocument-property
My Google Scripts code:
function googleFunction(id) {
  let file = DriveApp.getFileById(id);
  console.log("File is a " + file.getMimeType());
  let blob = file.getBlob();
  let params = {
    doc: blob,
  };
  var options = {
    method: "PUT",
    "Content-Type": "application/json",
    payload: JSON.stringify(params),
  };
  let response = UrlFetchApp.fetch("https://api-path/prod/resolver", options);
}
My Lambda resolver code:
"use strict";
const AWS = require("aws-sdk");
exports.handler = async (event) => {
let params = JSON.parse(event.body);
console.log("Parse as document...");
let textract = new AWS.Textract();
let doc = params["doc"];
let config = {
Document: {
Bytes: doc,
FeatureTypes: ["TABLES"],
}
};
textract.analyzeDocument(config, function (err, data) {
console.log("analyzing...");
if (err) {
console.log(err, err.stack);
}
// an error occurred
else {
console.log("data:" + JSON.stringfy(data));
} // successful response
});
};
ISSUE
File is successfully sent from Google Scripts to Lambda, but the following error is returned:
"errorType": "InvalidParameterType",
"errorMessage": "Expected params.Document.Bytes to be a string, Buffer, Stream, Blob, or typed array object"
Questions
Is there a way of verifying what the format of the doc variable is, to ensure it meets AWS Textract's requirements?
Can anyone see a possible cause for the errors being returned?
NOTES
Textract works fine when the same file is uploaded to an S3 bucket and supplied in the config using:
S3Object: { Bucket: 'bucket_name', Name: 'file_name' }
I have confirmed the file is a JPEG.
Got it working with two changes:
added getBytes() to the Google-side code
added Buffer.from() to the AWS-side code
getBytes() turns the blob into a plain byte array that survives JSON.stringify, and Buffer.from() rebuilds it into a Buffer, which is one of the types analyzeDocument accepts for Document.Bytes.
My Google Scripts code:
function googleFunction(id) {
  let file = DriveApp.getFileById(id);
  console.log("File is a " + file.getMimeType());
  let blob = file.getBlob().getBytes();
  let params = {
    doc: blob,
  };
  var options = {
    method: "PUT",
    "Content-Type": "application/json",
    payload: JSON.stringify(params),
  };
  let response = UrlFetchApp.fetch("https://api-path/prod/resolver", options);
}
My Lambda resolver code:
"use strict";
const AWS = require("aws-sdk");
exports.handler = async (event) => {
let params = JSON.parse(event.body);
console.log("Parse as document...");
let textract = new AWS.Textract();
let doc = params["doc"];
let config = {
Document: {
Bytes: Buffer.from(doc),
FeatureTypes: ["TABLES"],
}
};
textract.analyzeDocument(config, function (err, data) {
console.log("analyzing...");
if (err) {
console.log(err, err.stack);
}
// an error occurred
else {
console.log("data:" + JSON.stringfy(data));
} // successful response
});
};
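A side note of my own (not part of the accepted answer): serialising the raw byte array through JSON is fairly bulky, since every byte becomes a decimal number. If payload size matters, a base64 variant along these lines should also work; Utilities.base64Encode is the Apps Script helper assumed here:

// Google Apps Script side (assumed variant): send the blob as a base64 string
// let params = { doc: Utilities.base64Encode(file.getBlob().getBytes()) };

// Lambda side: decode the base64 string back into a Buffer for Textract
let doc = Buffer.from(params["doc"], "base64");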

How to set credentials in AWS SDK v3 JavaScript?

I am scouring the documentation, and it only provides pseudo-code of the credentials for v3 (e.g. const client = new S3Client(clientParams)).
How do I initialize an S3Client with the bucket and credentials to perform a getSignedUrl request? Any resources pointing me in the right direction would be most helpful. I've even searched YouTube, SO, etc., and I can't find any specific info on v3. Even the documentation and examples don't provide the actual code to use credentials. Thanks!
As an aside, do I have to include the fake folder structure in the filename, or can I just use the actual filename? For example: bucket/folder1/folder2/uniqueFilename.zip or uniqueFilename.zip
Here's the code I have so far. (Keep in mind I was returning the wasabiObjKey to ensure I was getting the correct file name. I am. It's the client, GetObjectCommand, and getSignedUrl that I'm having issues with.)
exports.getPresignedUrl = functions.https.onCall(async (data, ctx) => {
  const wasabiObjKey = `${data.bucket_prefix ? `${data.bucket_prefix}/` : ''}${data.uid.replace(/-/g, '_').toLowerCase()}${data.variation ? `_${data.variation.replace(/\./g, '').toLowerCase()}` : ''}.zip`

  const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3')

  const s3 = new S3Client({
    bucketEndpoint: functions.config().s3_bucket.name,
    region: functions.config().s3_bucket.region,
    credentials: {
      secretAccessKey: functions.config().s3.secret,
      accessKeyId: functions.config().s3.access_key
    }
  })

  const command = new GetObjectCommand({
    Bucket: functions.config().s3_bucket.name,
    Key: wasabiObjKey,
  })

  const { getSignedUrl } = require("@aws-sdk/s3-request-presigner")
  const url = getSignedUrl(s3, command, { expiresIn: 60 })

  return wasabiObjKey
})
There is a credential chain that provides credentials to your API calls from the SDK:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
Loaded from AWS Identity and Access Management (IAM) roles for Amazon EC2
Loaded from the shared credentials file (~/.aws/credentials)
Loaded from environment variables
Loaded from a JSON file on disk
Other credential-provider classes provided by the JavaScript SDK
You can embed the credentials inside your source code, but it's not the preferred way.
new S3Client(configuration: S3ClientConfig): S3Client
where S3ClientConfig contains a credentials property:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/modules/credentials.html
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

let client = new S3Client({
  region: 'ap-southeast-1',
  credentials: {
    accessKeyId: '',
    secretAccessKey: ''
  }
});

(async () => {
  const response = await client.send(new GetObjectCommand({ Bucket: "BucketNameHere", Key: "ObjectNameHere" }));
  console.log(response);
})();
Sample response:
'$metadata': {
  httpStatusCode: 200,
  requestId: undefined,
  extendedRequestId: '7kwrFkEp3lEnLU+OtxjrgdmS6gQmvPdbnqqR7I8P/rdFrUPBkdKYPYykWivuHPXCF1IHgjCIbe8=',
  cfId: undefined,
  attempts: 1,
  totalRetryDelay: 0
},
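Since the question was specifically about getSignedUrl, here is a minimal sketch using @aws-sdk/s3-request-presigner with placeholder bucket and key names. Note that getSignedUrl returns a promise, so it has to be awaited, which is what the asker's snippet above is missing.

const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const { getSignedUrl } = require("@aws-sdk/s3-request-presigner");

// Credentials are resolved through the provider chain described above
const client = new S3Client({ region: "ap-southeast-1" });

(async () => {
  const command = new GetObjectCommand({ Bucket: "BucketNameHere", Key: "ObjectNameHere" });
  // getSignedUrl resolves to a string; the URL below expires after 60 seconds
  const url = await getSignedUrl(client, command, { expiresIn: 60 });
  console.log(url);
})();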
Here's a simple approach I use (in Deno) for testing (in case you don't want to go the signedUrl approach and just let the SDK do the heavy lifting for you):
import { config as env } from 'https://deno.land/x/dotenv/mod.ts' // https://github.com/pietvanzoen/deno-dotenv
import { S3Client, ListObjectsV2Command } from 'https://cdn.skypack.dev/@aws-sdk/client-s3' // https://github.com/aws/aws-sdk-js-v3

const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY } = env()

// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/modules/credentials.html
const credentials = {
  accessKeyId: AWS_ACCESS_KEY_ID,
  secretAccessKey: AWS_SECRET_ACCESS_KEY,
}

// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/s3clientconfig.html
const config = {
  region: 'ap-southeast-1',
  credentials,
}

// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3client.html
const client = new S3Client(config)

export async function list() {
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/listobjectsv2commandinput.html
  const input = {
    Bucket: 'BucketNameHere'
  }
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/command.html
  const cmd = new ListObjectsV2Command(input)
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/listobjectsv2command.html
  return await client.send(cmd)
}

AWS s3 V3 Javascript SDK stream file from bucket (GetObjectCommand)

I've looked all over the AWS docs and Stack Overflow (I even went to page 4 of Google!) but I cannot for the life of me work out how to stream a file from S3. The docs for v3 are pretty useless and all the examples I find are for v2.
The send command that v3 uses only returns a promise, so how do I get a stream and pipe it instead of waiting for the whole file? (It needs to be piped into an encryption algorithm and then into a response stream.)
this.s3.send(
  new GetObjectCommand({
    Bucket: '...',
    Key: key,
  }),
);
I was able to upload fine by passing a stream as the body; is there something similar I have to do here?
uploadToAws(key) {
  const pass = new PassThrough();
  return {
    writeStream: pass,
    promise: this.s3.send(
      new PutObjectCommand({
        Bucket: '...',
        Key: key,
        Body: pass,
        ServerSideEncryption: '...',
        ContentLength: 37,
      }),
    ),
  };
}
Body from the GetObjectCommand is a readable stream (https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/getobjectcommandoutput.html#body).
So you can do:
const command = new GetObjectCommand({
  Bucket,
  Key,
});
const item = await s3Client.send(command);

item.Body.pipe(createWriteStream(fileName));
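If you also need to know when the pipe has finished (for example before returning from a Lambda handler), a sketch using Node's stream/promises pipeline (Node 15+) would look like this:

const { createWriteStream } = require("fs");
const { pipeline } = require("stream/promises");

// Resolves when the whole object has been written to disk,
// rejects if either the S3 read stream or the file write stream fails
const item = await s3Client.send(command);
await pipeline(item.Body, createWriteStream(fileName));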
For those landing here after googling because the AWS v3 SDK docs are missing details on the GetObjectCommandOutput interface: you can find the full GetObjectCommandOutput definition in the source, or under "modules" → https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/modules/getobjectoutput.html
Figured it out:
s3Client.send(command) returns a GetObjectCommandOutput.
const data: GetObjectCommandOutput = await s3Client.send(command)
data.Body is of type SdkStream<Readable | ReadableStream<any> | Blob | undefined> | undefined.
The undefined is for error cases; you check that there is no error like this:
if (!data.Body) // handle error
For the success case, you can get a ReadableStream like this:
const readableStream: ReadableStream = data.Body!.transformToWebStream()
In aws-sdk v2 there was createReadStream(); this seems to be the way in v3.
To pipe through the ReadableStream, use
readableStream.pipeTo() or readableStream.pipeThrough()
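For example, a sketch that writes that web stream to a file, assuming Node 17+ where stream.Writable.toWeb() exists to bridge back to a Node writable:

const { createWriteStream } = require("fs");
const { Writable } = require("stream");

// Bridge the web ReadableStream into a Node file stream; pipeTo() resolves when the stream is fully consumed
const fileStream = createWriteStream("/tmp/object.bin"); // hypothetical destination
await readableStream.pipeTo(Writable.toWeb(fileStream));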
Using transformToString() as per docs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_GetObject_section.html.
export const handler = async event => {
  try {
    // Retrieve the object from S3
    const data = await s3.getObject({ Bucket: BUCKET_NAME, Key: PATH_AND_FILE_NAME });
    // Set the content type of the response
    const contentType = data.ContentType;
    // Convert to a base64 string
    const streamToString = await data.Body?.transformToString("base64");
    // Return the object data in the response
    return {
      statusCode: 200,
      headers: {
        "Content-Type": contentType
      },
      body: streamToString,
      isBase64Encoded: true
    };
  } catch (error) {
    return {
      statusCode: 500,
      body: "An error occurred: " + error.message
    };
  }
};
In case you also run into a similar issue while using TypeScript, this helped me:
(object.Body as any).pipe(res);
I did this because, as mentioned by laconbass, the AWS v3 SDK docs are missing details on GetObjectCommandOutput.

Read S3 video file, process it with ffmpeg and upload to S3

I have a video stored in an S3 bucket with an authenticated-read ACL.
I need to read it and make a trailer with ffmpeg (Node.js).
Here's the code I use to generate the trailer:
exports.generatePreview = (req, res) => {
  const getParams = {
    Bucket: S3_CREDENTIALS.bucketName,
    Key: req.params.key
  }

  s3.getSignedUrl('getObject', getParams, (err, signedRequest) => {
    console.log(signedRequest, err, 'getSignedUrl')

    ffmpeg(new URL(signedRequest))
      .size('640x?')
      .aspect('4:3')
      .seekInput('3:00')
      .duration('0:30')
      .then(function (video) {
        s3.putObject({ Bucket: S3_CREDENTIALS.bucketName, key: 'preview_' + req.body.key, Body: video }, function (err, data) {
          console.log(err, data)
        })
      });
  });
}
Unfortunately, the constructor does not seem to accept a remote URL. If I try to execute an ffmpeg command line with the same signed URL (i.e. ffmpeg -i "https://[bucketname].s3.eu-west-1.amazonaws.com/[key.mp4]?[signedParams]" -vn -acodec pcm_s16le -ar 44100 -ac 2 video.wav),
the error I get is 'The input file does not exist' for the signed URL.
It also seems fs.readFileSync does not support https (I tried the request over http with the same result); fs.readFileSync(signedurl) gives the same error.
How can I overcome this issue?
If you're using node-ffmpeg this isn't possible, because that library only accepts a string pointing to a local path, but fluent-ffmpeg does support read streams, so give that a try.
For example (untested, just spitballing):
const ffmpeg = require('fluent-ffmpeg');
const stream = require('stream');

exports.generatePreview = (req, res) => {
  let params = { Bucket: S3_CREDENTIALS.bucketName, Key: req.params.key };

  // Retrieve object stream
  let readStream = s3.getObject(params).createReadStream();

  // Set up the ffmpeg process
  let ffmpegProcess = new ffmpeg(readStream)
    // Add your args here
    .toFormat('mp4');

  // Create a pass-through stream to receive the ffmpeg output
  let pt = new stream.PassThrough();

  ffmpegProcess.on('error', (err, stdout, stderr) => {
    // Handle errors here
  }).on('end', () => {
    // Processing is complete
  }).pipe(pt);

  // Reuse the same params object and set the Body to the stream
  params.Key = 'preview_' + req.body.key;
  params.Body = pt;

  // Upload and wait for the result
  s3.upload(params, (err, data) => {
    if (err)
      return console.error(err);
    console.log("done");
  });
};
This will have high memory requirements so if this is a Lambda function you might play around with retrieving only the first X bytes of the file and converting only that.
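For that partial-download idea, GetObject accepts a Range parameter, so a sketch like this (the byte count is arbitrary) would stream only the first 32 MB into ffmpeg:

// Only retrieve the first 32 MB of the object; Range is a standard GetObject parameter
let readStream = s3.getObject({
  Bucket: S3_CREDENTIALS.bucketName,
  Key: req.params.key,
  Range: 'bytes=0-33554431'
}).createReadStream();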

AWS S3 Bucket Upload using CollectionFS and cfs-s3 meteor package

I am using Meteor.js with an Amazon S3 bucket for uploading and storing photos. I am using the Meteorite packages collectionFS and aws-s3. I have set up my aws-s3 connection correctly and the images collection is working fine.
Client side event handler:
'click .submit': function(evt, templ) {
  var user = Meteor.user();
  var photoFile = $('#photoInput').get(0).files[0];
  if (photoFile) {
    var readPhoto = new FileReader();
    readPhoto.onload = function(event) {
      photodata = event.target.result;
      console.log("calling method");
      Meteor.call('uploadPhoto', photodata, user);
    };
  }
}
And my server side method:
'uploadPhoto': function uploadPhoto(photodata, user) {
  var tag = Random.id([10] + "jpg");
  var photoObj = new FS.File({name: tag});
  photoObj.attachData(photodata);
  console.log("s3 method called");
  Images.insert(photoObj, function (err, fileObj) {
    if (err) {
      console.log(err, err.stack);
    } else {
      console.log(fileObj._id);
    }
  });
}
The file that is selected is a .jpg image file, but upon upload I get this error in the server method:
Exception while invoking method 'uploadPhoto' Error: DataMan constructor received data that it doesn't support
It doesn't matter whether I directly pass the image file, attach it as data, or use the FileReader to read it as text/binary/string; I still get that error. Please advise.
OK, a few thoughts. I did some work with CollectionFS a few months ago, so check against the docs, because my examples may not be 100% correct.
Credentials should be set via environment variables, so your key and secret are available on the server only. Check this link for further reading.
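For instance, a server-only store definition could read them from process.env; this is just a sketch, and the option names (accessKeyId, secretAccessKey, bucket) are taken from the cfs-s3 README rather than from the question:

// server-only code: keys never reach the client bundle
var profileImagesStore = new FS.Store.S3("profileImages", {
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,         // assumed option names,
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY, // per the cfs-s3 README
  bucket: "my-bucket-name"                            // hypothetical bucket
});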
OK, first, here is some example code that works for me. Check yours for differences.
Template helper:
'dropped #dropzone': function(event, template) {
  addImagePreview(event);
}
The addImagePreview function:
function addImagePreview(event) {
  // Go through each file
  FS.Utility.eachFile(event, function(file) {
    // Some validation checks
    var reader = new FileReader();
    reader.onload = (function(theFile) {
      return function(e) {
        // e.target.result is the data URL produced by readAsDataURL below
        var fsFile = new FS.File(e.target.result);
        // Set metadata that is validated in the collection
        // (only the owning user can update/remove fsFile)
        fsFile.metadata = {owner: Meteor.userId()};
        PostImages.insert(fsFile, function (err, fileObj) {
          if (err) {
            console.log(err);
          }
        });
      };
    })(file);
    // Read in the image file as a data URL.
    reader.readAsDataURL(file);
  });
}
OK, your next point is validation. Validation can be done with allow/deny rules and with a filter on the FS.Collection. This way you can do all your validation AND insert via the client.
Example:
PostImages = new FS.Collection('profileImages', {
  stores: [profileImagesStore],
  filter: {
    maxSize: 3145728,
    allow: {
      contentTypes: ['image/*'],
      extensions: ['png', 'PNG', 'jpg', 'JPG', 'jpeg', 'JPEG']
    }
  },
  onInvalid: function(message) {
    console.log(message);
  }
});

PostImages.allow({
  insert: function(userId, doc) {
    return (userId && doc.metadata.owner === userId);
  },
  update: function(userId, doc, fieldNames, modifier) {
    return (userId === doc.metadata.owner);
  },
  remove: function(userId, doc) {
    return false;
  },
  download: function(userId) {
    return true;
  },
  fetch: []
});
Here you will find another example (link).
Another possible source of error is your AWS configuration. Have you done everything as written here?
Based on this post, it seems this error occurs when FS.File() is not constructed correctly, so maybe that is the first place to look.
A lot to read, but I hope this helps you :)