Fetching multiple objects with same prefix from s3 bucket - amazon-web-services

I have multiple folders in an S3 bucket and each folder contains some .txt files. Now I want to fetch just 10 .txt files from a given folder using the JavaScript API.
For example, the path is something like this:
s3bucket/folder1/folder2/folder3/id
The id folder is the one containing multiple .txt files, and there are multiple id folders inside folder3. I want to pass an id and get 10 S3 objects which have that id as prefix. Is this possible using listObjectsV2? How do I limit the response to just 10 objects?
s3bucket/folder1/folder2/folder3
├── id1
│   ├── obj1.txt
│   └── obj2.txt
├── id2
│   ├── obj3.txt
│   └── obj4.txt
└── id3
    ├── obj5.txt
    └── obj6.txt
So if I pass
var params = { Bucket: "s3bucket", Key: "folder1/folder2/folder3/id1" }
I should get obj1.txt and obj2.txt in response.

Which S3 method are you using? I suggest using listObjectsV2 to achieve your goal. A possible call might look like the following:
const s3 = new AWS.S3();

const { Contents } = await s3.listObjectsV2({
  Bucket: 's3bucket',
  Prefix: 'folder1/folder2/folder3/id1',
  MaxKeys: 10
}).promise();
To get the object values you need to call getObject on each Key.
const responses = await Promise.all((Contents || []).map(({ Key }) => (
  s3.getObject({
    Bucket: 's3bucket',
    Key
  }).promise()
)));
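If you then need the text of those .txt objects, each getObject response's Body is a Buffer, so a small follow-up sketch (assuming the files are UTF-8 encoded) could be:
const texts = responses.map(({ Body }) => Body.toString('utf-8'));
console.log(texts[0]); // contents of the first .txt object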

Related

Amazon S3 Node.JS method getObject() clarification on its parameter

I have this code for Amazon Lambda:
const file_stream = s3.getObject({ Bucket: bucket, Key: filename });
This line of code would be used inside an exports.handler = async (event) function.
Regarding the "Key" parameter, should it be just the filename (e.g. filename.ext), the full path to the file (e.g. https://link/to/a/file/filename.ext), or something else? (I am pretty new to AWS S3 and Lambda.)
Let's assume you have a bucket my-bucket and a file abc.txt inside a folder hierarchy like 2021/04/12.
Then you can do the following to get the object:
s3.getObject({ Bucket: 'my-bucket', Key: '2021/04/12/abc.txt' }).promise();
You can check the key value in the console as well: just click on the file you want to process and you can see the key value under the Properties / Object overview tab.
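Putting it together, a minimal sketch of a Lambda handler using that key (assuming the aws-sdk v2 client from the question and a UTF-8 text file) might look like:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  // Key is the object's full path inside the bucket, not a URL
  const data = await s3.getObject({
    Bucket: 'my-bucket',
    Key: '2021/04/12/abc.txt'
  }).promise();
  return data.Body.toString('utf-8'); // Body is a Buffer
};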

ListObjectsV2 - Get only folders in an S3 bucket

I am using AWS S3 JS SDK. I have folders within folders in my S3 bucket and I would like to list only folders at a certain level.
This is the structure:
bucket/folder1/folder2/folder3a/file1
bucket/folder1/folder2/folder3a/file2
bucket/folder1/folder2/folder3a/file3
bucket/folder1/folder2/folder3a/...
bucket/folder1/folder2/folder3b/file1
bucket/folder1/folder2/folder3b/file2
bucket/folder1/folder2/folder3b/file3
bucket/folder1/folder2/folder3b/...
bucket/folder1/folder2/folder3c/file1
bucket/folder1/folder2/folder3c/file2
bucket/folder1/folder2/folder3c/file3
bucket/folder1/folder2/folder3c/...
As you can see, at the level of folder 3, I have multiple folders and each of those folders contain multiple items. I don't care about the items. I just want to list the folder names at level 3. Is there a good way to do this?
The only way I found is to use ListObjectsV2. But this also gives me the files, which inflates the result set and forces me to do manual filtering afterwards. Is there a way to get just the folder names at the API level?
This article answers all my questions. https://realguess.net/2014/05/24/amazon-s3-delimiter-and-prefix/
The solution can be done using a combination of Prefix and Delimiter. In my example the parameters should contain the following:
const params = {
  Bucket: 'bucket',
  Prefix: 'folder1/folder2/',
  Delimiter: '/',
};
Be sure to not forget the slash at the end of the Prefix parameter.
The list of folders will be in the CommonPrefixes attribute of the response object.
To give you a real life example:
...
const params = {
  Bucket: bucketName,
  Prefix: prefix + '/',
  MaxKeys: 25,
  Delimiter: '/',
};

const command = new ListObjectsV2Command(params);
const results = await s3client.send(command);
const foldersList = results.CommonPrefixes;
...
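Each entry in CommonPrefixes is an object of the shape { Prefix: 'folder1/folder2/folder3a/' }, so if you only want the bare folder names you can strip the leading prefix and trailing slash yourself; a small sketch on top of the example above (not part of the original answer):
const folderNames = (results.CommonPrefixes || []).map(({ Prefix }) =>
  Prefix.slice((prefix + '/').length).replace(/\/$/, '')
);
// e.g. ['folder3a', 'folder3b', 'folder3c']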

Uploading item in Amazon s3 bucket from React Native with user's info

I am uploading an image on AWS S3 using React Native with AWS amplify for mobile app development. Many users use my app.
Whenever any user uploads an image to S3 through the mobile app, I want to get the user's ID along with that image, so that later I can recognise which image on S3 belongs to which user. How can I achieve this?
I am using AWS Cognito Auth for user registration / sign-in. I came to know that whenever a user is registered in AWS Cognito (for the first time), the user gets a unique ID in the pool. Can this user ID be passed along with the image whenever the user uploads one?
Basically I want to have some form of functionality so that I can track back to the user who uploaded the image on S3. This is because after the image is uploaded on S3, I later want to process this image and send the result back ONLY to the user of the image.
You can store the data in S3 in a structure similar to the one below:
users/
  123userId/
    image1.jpeg
    image2.jpeg
  anotherUserId456/
    image1.png
    image2.png
Then, if you need all files from a given user, you can use the ListObjects API in an S3 Lambda - docs here
// for a Lambda function
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const objects = await s3.listObjectsV2({
  Bucket: 'STRING_VALUE', /* required */
  Prefix: 'STRING_VALUE',
  Delimiter: 'STRING_VALUE',
  EncodingType: 'url',
  ExpectedBucketOwner: 'STRING_VALUE',
  ContinuationToken: 'STRING_VALUE', // listObjectsV2 uses ContinuationToken instead of Marker
  MaxKeys: 1000,
  RequestPayer: 'requester'
}).promise();

// the object summaries are in the Contents array of the response
objects.Contents.forEach(item => {
  console.log(item);
});
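For the folder structure above, in practice you would just point Prefix at the user's folder; a hedged example (the bucket name is hypothetical):
// list only the files belonging to user 123userId
const userObjects = await s3.listObjectsV2({
  Bucket: 'my-uploads-bucket',
  Prefix: 'users/123userId/'
}).promise();
userObjects.Contents.forEach(({ Key }) => console.log(Key));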
Or, if you are using an S3 Lambda trigger, you can parse the userId from the "key" / filename in the event received by the S3 Lambda (in case you used the structure above).
{
  "key": "public/users/e1e0858f-2ea1-90f892b68e0c/item.jpg",
  "size": 269582,
  "eTag": "db8aafcca5786b62966073f59152de9d",
  "sequencer": "006068DC0B344DA9E9"
}
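For illustration, a minimal sketch of that parsing inside an S3-trigger Lambda, assuming the public/users/<userId>/<file> layout from above:
exports.handler = async (event) => {
  for (const record of event.Records) {
    // object keys arrive URL-encoded in the S3 event
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    const [, , userId] = key.split('/'); // ['public', 'users', '<userId>', 'item.jpg']
    console.log('Upload from user', userId, 'key:', key);
  }
};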
Another option is to write the "userId" into the metadata of the file that will be uploaded to S3.
You can pass the "sub" property from Cognito's currently logged-in user, so in the S3 Lambda trigger function you will get the userId from the metadata.
import Auth from "@aws-amplify/auth";

const user = await Auth.currentUserInfo();
const userId = user.attributes.sub;
return userId;

// use userId from Cognito and put it into custom metadata
import { Storage } from "aws-amplify";

const userId = "userIdHere";
const filename = "filename"; // or use uuid()
const ref = `public/users/${userId}/${filename}`;
const response = await Storage.put(ref, blob, {
  contentType: "image/jpeg",
  metadata: { userId: userId },
});
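On the receiving side, the custom metadata is not included in the S3 trigger event itself; one way to read it in the Lambda is headObject (a hedged sketch, assuming the aws-sdk v2 client):
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  const { bucket, object } = event.Records[0].s3;
  const key = decodeURIComponent(object.key.replace(/\+/g, ' '));
  const head = await s3.headObject({ Bucket: bucket.name, Key: key }).promise();
  // custom metadata keys come back lowercased, without the x-amz-meta- prefix
  const userId = head.Metadata.userid;
  console.log('Uploaded by', userId);
};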
AWS Amplify can do all of the above automatically (create folder structures, etc.) if you do not need any special structure for how files are stored - docs here.
You only need to configure Storage ('globally' or per action) with the "level" property.
Storage.configure({ level: 'private' });

await Storage.put(ref, blob, {
  contentType: "image/jpeg",
  metadata: { userId: userId },
});

// or set the level only for a given action
const ref = "userCollection";
await Storage.put(ref, blob, {
  contentType: "image/jpeg",
  metadata: { userId: userId },
  level: "private"
});
So, for example, if you use level "private", file "124.jpeg" will be stored in S3 at
"private/us-east-1:6419087f-d13e-4581-b72e-7a7b32d7c7c1/userCollection/124.jpeg"
However, as you can see, "us-east-1:6419087f-d13e-4581-b72e-7a7b32d7c7c1" looks different from the "sub" in Cognito (the "sub" property does not contain a region).
The related discussion is here, also with a few workarounds, but basically you need to decide on your own how you will manage user identification in your project: whether you use "sub" everywhere as the userId, or go with another ID (I think it is called identityId) and consider that the userId.
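For completeness, a hedged sketch of how you can read both identifiers with Amplify Auth (the identityId is what shows up in the protected/private S3 path, while "sub" comes from the user pool attributes):
import Auth from "@aws-amplify/auth";

const credentials = await Auth.currentCredentials();
const identityId = credentials.identityId; // e.g. "us-east-1:6419087f-..."

const user = await Auth.currentUserInfo();
const sub = user.attributes.sub; // the Cognito user pool "sub"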
PS: If you are using React Native, I guess you will go with push notifications for sending updates from the backend. If that is the case, I was doing something similar ("moderation control"), so I added another Lambda function, Cognito's Post-Confirmation Lambda, that creates the user in DynamoDB with the ID of Cognito's "sub" property.
The user can then save the device token needed for push notifications, so when AWS Rekognition finished detection on the image that the user uploaded, I queried DynamoDB and used SNS to send the notification to the end user.

delete folder from s3 nodejs

Hey guys, I was trying to delete a folder from S3 with stuff in it, but deleteObjects wasn't working, so I found this script online and it works great. My question is: why does it work? Why do you have to listObjects when deleting a folder on S3? Why can't I just pass it the object's name? And why doesn't it error when I attempt to delete the folder without listing the objects first?
first attempt (doesn't work)
var filePath2 = "templates/" + key + "/test/";
var toPush = { Key: filePath2 };
deleteParams.Delete.Objects.push(toPush);
console.log("deleteParams", deleteParams);
console.log("deleteParams.Delete", deleteParams.Delete);
const deleteResult = await s3.deleteObjects(deleteParams).promise();
console.log("deleteResult", deleteResult);
Keep in mind that filePath2 is a folder that has other stuff in it. I get no error, the catch isn't triggered, and the response says "deleted" and then the folder name.
second attempt (works)
async function deleteFromS3(bucket, path) {
  const listParams = {
    Bucket: bucket,
    Prefix: path
  };
  const listedObjects = await s3.listObjectsV2(listParams).promise();
  console.log("listedObjects", listedObjects);

  if (listedObjects.Contents.length === 0) return;

  const deleteParams = {
    Bucket: bucket,
    Delete: { Objects: [] }
  };
  listedObjects.Contents.forEach(({ Key }) => {
    deleteParams.Delete.Objects.push({ Key });
  });
  console.log("deleteParams", deleteParams);

  const deleteResult = await s3.deleteObjects(deleteParams).promise();
  console.log("deleteResult", deleteResult);

  if (listedObjects.IsTruncated && deleteResult)
    await deleteFromS3(bucket, path);
}
Then I call the function like so:
const result = await deleteFromS3(myBucketName, folderPath);
Folders do not exist in Amazon S3. It is a flat object storage system, where the filename (Key) for each object contains the full path.
While Amazon S3 does support the concept of a Common Prefix, which can make things appear as though they are in folders/directories, folders do not actually exist.
For example, you could run a command like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would work even if the folders do not exist! It is merely storing an object with a Key of folder1/folder2/foo.txt.
If you were then to delete that object, the 'folder' would disappear because no object has it as a path. That is because the folder never actually existed.
Sometimes people want an empty folder to appear, so they create a zero-length object with the same name as the folder, e.g. folder1/folder2/.
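For illustration (a hedged sketch, not from the original answer), such a zero-length 'folder' marker could be created with the same aws-sdk v2 client used in the question:
await s3.putObject({ Bucket: 'my-bucket', Key: 'folder1/folder2/', Body: '' }).promise();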
So, your first program did not work because it deleted the 'folder', which has nothing to do with deleting the content of the folder (since there is no concept of 'content' of a folder).

Writing a single file to multiple s3 buckets with gulp-awspublish

I have a simple single-page app, that is deployed to an S3 bucket using gulp-awspublish. We use inquirer.js (via gulp-prompt) to ask the developer which bucket to deploy to.
Sometimes the app may be deployed to several S3 buckets. Currently, we only allow one bucket to be selected, so the developer has to gulp deploy for each bucket in turn. This is dull and prone to error.
I'd like to be able to select multiple buckets and deploy the same content to each. It's simple to select multiple buckets with inquirer.js/gulp-prompt, but not simple to generate arbitrary multiple S3 destinations from a single stream.
Our deploy task is based upon generator-webapp's S3 recipe. The recipe suggests gulp-rename to rewrite the path to write to a specific bucket. Currently our task looks like this:
gulp.task('deploy', ['build'], () => {
  // get AWS creds
  if (typeof(config.awsCreds) !== 'object') {
    return console.error('No config.awsCreds settings found. See README');
  }

  var dirname;

  const publisher = $.awspublish.create({
    key: config.awsCreds.key,
    secret: config.awsCreds.secret,
    bucket: config.awsCreds.bucket
  });

  return gulp.src('dist/**/*.*')
    .pipe($.prompt.prompt({
      type: 'list',
      name: 'dirname',
      message: 'Using the ‘' + config.awsCreds.bucket + '’ bucket. Which hostname would you like to deploy to?',
      choices: config.awsCreds.dirnames,
      default: config.awsCreds.dirnames.indexOf(config.awsCreds.dirname)
    }, function (res) {
      dirname = res.dirname;
    }))
    .pipe($.rename(function(path) {
      path.dirname = dirname + '/dist/' + path.dirname;
    }))
    .pipe(publisher.publish())
    .pipe(publisher.cache())
    .pipe($.awspublish.reporter());
});
It's hopefully obvious, but config.awsCreds might look something like:
awsCreds: {
  dirname: 'default-bucket',
  dirnames: ['default-bucket', 'other-bucket', 'another-bucket']
}
Gulp-rename rewrites the destination path to use the correct bucket.
We can select multiple buckets by using "checkbox" instead of "list" for the gulp-prompt options, but I'm not sure how to then deliver it to multiple buckets.
In a nutshell, if $.prompt returns an array of strings instead of a string, how can I write the source to multiple destinations (buckets) instead of a single bucket?
Please keep in mind that gulp.dest() is not used -- only gulp.awspublish() -- and we don't know how many buckets might be selected.
Never used S3, but if I understand your question correctly a file js/foo.js should be renamed to default-bucket/dist/js/foo.js and other-bucket/dist/js/foo.js when the checkboxes default-bucket and other-bucket are selected?
Then this should do the trick:
// additionally required modules
var path = require('path');
var through = require('through2').obj;

gulp.task('deploy', ['build'], () => {
  if (typeof(config.awsCreds) !== 'object') {
    return console.error('No config.awsCreds settings found. See README');
  }

  var dirnames = []; // array for selected buckets

  const publisher = $.awspublish.create({
    key: config.awsCreds.key,
    secret: config.awsCreds.secret,
    bucket: config.awsCreds.bucket
  });

  return gulp.src('dist/**/*.*')
    .pipe($.prompt.prompt({
      type: 'checkbox', // use checkbox instead of list
      name: 'dirnames', // use different result name
      message: 'Using the ‘' + config.awsCreds.bucket +
               '’ bucket. Which hostname would you like to deploy to?',
      choices: config.awsCreds.dirnames,
      default: config.awsCreds.dirnames.indexOf(config.awsCreds.dirname)
    }, function (res) {
      dirnames = res.dirnames; // store array of selected buckets
    }))
    // use through2 instead of gulp-rename
    .pipe(through(function(file, enc, done) {
      dirnames.forEach((dirname) => {
        var f = file.clone();
        f.path = path.join(f.base, dirname, 'dist',
                           path.relative(f.base, f.path));
        this.push(f);
      });
      done();
    }))
    .pipe(publisher.publish()) // publish the cloned files to S3
    .pipe(publisher.cache())
    .pipe($.awspublish.reporter());
});
Notice the comments where I made changes from the code you posted.
What this does is use through2 to clone each file passing through the stream. Each file is cloned as many times as there were bucket checkboxes selected and each clone is renamed to end up in a different bucket.