ListObjectsV2 - Get only folders in an S3 bucket - amazon-web-services

I am using AWS S3 JS SDK. I have folders within folders in my S3 bucket and I would like to list only folders at a certain level.
This is the structure:
bucket/folder1/folder2/folder3a/file1
bucket/folder1/folder2/folder3a/file2
bucket/folder1/folder2/folder3a/file3
bucket/folder1/folder2/folder3a/...
bucket/folder1/folder2/folder3b/file1
bucket/folder1/folder2/folder3b/file2
bucket/folder1/folder2/folder3b/file3
bucket/folder1/folder2/folder3b/...
bucket/folder1/folder2/folder3c/file1
bucket/folder1/folder2/folder3c/file2
bucket/folder1/folder2/folder3c/file3
bucket/folder1/folder2/folder3c/...
As you can see, at the level of folder 3, I have multiple folders and each of those folders contain multiple items. I don't care about the items. I just want to list the folder names at level 3. Is there a good way to do this?
The only way I found is to use ListObjectsV2. But this gives me also the files which inflates the results set and I would need to do a manual filtering afterwards. Is there a way to get just the folder names at the API level?

This article answers all my questions. https://realguess.net/2014/05/24/amazon-s3-delimiter-and-prefix/
The solution can be done using the combination of prefix and delimiter. In my examples the parameters should contain the following:
const params = {
Bucket: 'bucket',
Prefix: 'folder1/folder2/',
Delimiter: '/',
};
Be sure to not forget the slash at the end of the Prefix parameter.
The list of folders will be in the CommonPrefixes attribute of the response object.
To give you a real life example:
...
const params = {
Bucket: bucketName,
Prefix: prefix + '/',
MaxKeys: 25,
Delimiter: '/',
};
const command = new ListObjectsV2Command(params);
const results = await s3client.send(command);
const foldersList = results.CommonPrefixes;
...

Related

Fetching multiple objects with same prefix from s3 bucket

I have multiple folders in an s3 bucket and each folder contains some .txt files. Now I want to fetch just 10 .txt files from a given folder using javascript API.
For eg: the path is something like this
s3bucket/folder1/folder2/folder3/id
Now folder id is the one containing multiple .txt files. There are multiple id folders inside folder3. I want to pass id and get 10 s3 objects which have id as prefix. Is this possible using listObjectsV2? How do I limit the response to just 10 objects.
____obj1.txt
______id1----|____obj2.txt
| _____obj3.txt
|_____ id2---|____obj4.txt
s3bucket/folder1/folder2/folder3-| ____obj5.txt
|_____ id3---|____obj6.txt
So if I pass
var params= {Bucket:"s3bucket",Key:"folder1/folder2/folder3/id1"}
I should get obj1.txt and obj2.txt in response.
Which S3 method are you using? I suggest to use listObjectsV2 to achieve your goal. A possible call might look as the following
const s3 = new AWS.S3();
const { Contents } = await s3.listObjectsV2({
Bucket: 's3bucket',
Prefix: 'folder1/folder2/folder3/id1',
MaxKeys: 10
}).promise();
To get the object values you need to call getObject on each Key.
const responses = await Promise.all((Contents || []).map(({ Key }) => (
s3.getObject({
Bucket: 's3bucket',
Key
}).promise()
)));

delete folder from s3 nodejs

Hey guys I was trying to delete a folder from s3 with stuff in it but deleteObjects wasn't working so I found this script online and it works great my question is why does it work? Why do you have to listObjects when deleting a folder on s3 why cant I just pass it the objects name? Why doesn't It error when I attempt to delete the folder without listing the objects first.
first attempt (doesnt work)
var filePath2 = "templates/" + key + "/test/";
var toPush = { Key: filePath2 };
deleteParams.Delete.Objects.push(toPush);
console.log("deleteParams", deleteParams);
console.log("deleteParams.Delete", deleteParams.Delete);
const deleteResult = await s3.deleteObjects(deleteParams).promise();
console.log("deleteResult", deleteResult);
keep in mind folderPath2 is a folder that has other stuff in it I get no error but yet the catch isn't triggered and it says deleted and than the folder name.
second attempt (works)
async function deleteFromS3(bucket, path) {
const listParams = {
Bucket: bucket,
Prefix: path
};
const listedObjects = await s3.listObjectsV2(listParams).promise();
console.log("listedObjects", listedObjects);
if (listedObjects.Contents.length === 0) return;
const deleteParams = {
Bucket: bucket,
Delete: { Objects: [] }
};
listedObjects.Contents.forEach(({ Key }) => {
deleteParams.Delete.Objects.push({ Key });
});
console.log("deleteParams", deleteParams);
const deleteResult = await s3.deleteObjects(deleteParams).promise();
console.log("deleteResult", deleteResult);
if (listedObjects.IsTruncated && deleteResult)
await deleteFromS3(bucket, path);
}
than I call the function like so
const result = await deleteFromS3(myBucketName, folderPath);
Folders do not exist in Amazon S3. It is a flat object storage system, where the filename (Key) for each object contains the full path.
While Amazon S3 does support the concept of a Common Prefix, which can make things appear as though they are in folders/directories, folders do not actually exist.
For example, you could run a command like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would work even if the folders do not exist! It is merely storing an object with a Key of folder1/folder2/foo.txt.
If you were then to delete that object, the 'folder' would disappear because no object has it as a path. That is because the folder never actually existed.
Sometimes people want an empty folder to appear, so they create a zero-length object with the same name as the folder, eg folder1/folder2/.
So, your first program did not work because it deleted the 'folder', which has nothing to do with deleting the content of the folder (since there is no concept of 'content' of a folder).

Digital Ocean Spaces | Add expire date for files with s3cmd

I try add expire days to a file and bucket but I have this problem:
sudo s3cmd expire s3://<my-bucket>/ --expiry-days=3 expiry-prefix=backup
ERROR: Error parsing xml: syntax error: line 1, column 0
ERROR: not found
ERROR: S3 error: 404 (Not Found)
and this
sudo s3cmd expire s3://<my-bucket>/<folder>/<file> --expiry-day=3
ERROR: Parameter problem: Expecting S3 URI with just the bucket name set instead of 's3:////'
How to add expire days in DO Spaces for a folder or file by using s3cmd?
Consider configuring Bucket's Lifecycle Rules
Lifecycle rules can be used to perform different actions on objects in a Space over the course of their "life." For example, a Space may be configured so that objects in it expire and are automatically deleted after a certain length of time.
In order to configure new lifecycle rules, send a PUT request to ${BUCKET}.${REGION}.digitaloceanspaces.com/?lifecycle
The body of the request should include an XML element named LifecycleConfiguration containing a list of Rule objects.
https://developers.digitalocean.com/documentation/spaces/#get-bucket-lifecycle
The expire option is not implemented on Digital Ocean Spaces
Thanks to Vitalii answer for pointing to API.
However API isn't really easy to use, so I've done it via NodeJS script.
First of all, generate your API keys here: https://cloud.digitalocean.com/account/api/tokens
And put them in ~/.aws/credentials file (according to docs):
[default]
aws_access_key_id=your_access_key
aws_secret_access_key=your_secret_key
Now create empty NodeJS project, run npm install aws-sdk and use following script:
const aws = require('aws-sdk');
// Replace with your region endpoint, nyc1.digitaloceanspaces.com for example
const spacesEndpoint = new aws.Endpoint('fra1.digitaloceanspaces.com');
// Replace with your bucket name
const bucketName = 'myHeckingBucket';
const s3 = new aws.S3({endpoint: spacesEndpoint});
s3.putBucketLifecycleConfiguration({
Bucket: bucketName,
LifecycleConfiguration: {
Rules: [{
ID: "autodelete_rule",
Expiration: {Days: 30},
Status: "Enabled",
Prefix: '/', // Unlike AWS in DO this parameter is required
}]
}
}, function (error, data) {
if (error)
console.error(error);
else
console.log("Successfully modified bucket lifecycle!");
});

Upload subdirectory transferutility S3

I want to upload all the files/folder inside a directory to a S3 bucket. I want to upload including the all the files in all the sub directories. I thought of using TransferUtility to do this. Though the link here says that 'By default, Amazon S3 only uploads the files at the root of the specified directory. You can, however, specify to recursively upload files in all the subdirectories.' but I couldnt find a way to do this. I coundnt find any property where I can mention to include all the sub directories. I tried using SearchOption = System.IO.SearchOption.AllDirectories and SearchPattern = "*" to achieve this, but still it uploaded only the files in the top most directory. Please help me in this. Thanks.
I'm using the below code,
TransferUtility directoryTransferUtility = new TransferUtility(s3Client);
TransferUtilityUploadDirectoryRequest uRequest = new TransferUtilityUploadDirectoryRequest()
{
Directory = dirPath,
BucketName = bucketName,
SearchOption = System.IO.SearchOption.AllDirectories,
SearchPattern = "*"
};
directoryTransferUtility.UploadDirectory(dirPath, bucketName);
This is what worked for me: I set the options on the UploadDirectory method and used "*.*" as the search pattern.
directoryTransferUtility.UploadDirectory(dirPath,
bucketName,
"*.*",
SearchOption.AllDirectories);

Writing a single file to multiple s3 buckets with gulp-awspublish

I have a simple single-page app, that is deployed to an S3 bucket using gulp-awspublish. We use inquirer.js (via gulp-prompt) to ask the developer which bucket to deploy to.
Sometimes the app may be deployed to several S3 buckets. Currently, we only allow one bucket to be selected, so the developer has to gulp deploy for each bucket in turn. This is dull and prone to error.
I'd like to be able to select multiple buckets and deploy the same content to each. It's simple to select multiple buckets with inquirer.js/gulp-prompt, but not simple to generate arbitrary multiple S3 destinations from a single stream.
Our deploy task is based upon generator-webapp's S3 recipe. The recipe suggests gulp-rename to rewrite the path to write to a specific bucket. Currently our task looks like this:
gulp.task('deploy', ['build'], () => {
// get AWS creds
if (typeof(config.awsCreds) !== 'object') {
return console.error('No config.awsCreds settings found. See README');
}
var dirname;
const publisher = $.awspublish.create({
key: config.awsCreds.key,
secret: config.awsCreds.secret,
bucket: config.awsCreds.bucket
});
return gulp.src('dist/**/*.*')
.pipe($.prompt.prompt({
type: 'list',
name: 'dirname',
message: 'Using the ‘' + config.awsCreds.bucket + '’ bucket. Which hostname would you like to deploy to?',
choices: config.awsCreds.dirnames,
default: config.awsCreds.dirnames.indexOf(config.awsCreds.dirname)
}, function (res) {
dirname = res.dirname;
}))
.pipe($.rename(function(path) {
path.dirname = dirname + '/dist/' + path.dirname;
}))
.pipe(publisher.publish())
.pipe(publisher.cache())
.pipe($.awspublish.reporter());
});
It's hopefully obvious, but config.awsCreds might look something like:
awsCreds: {
dirname: 'default-bucket',
dirnames: ['default-bucket', 'other-bucket', 'another-bucket']
}
Gulp-rename rewrites the destination path to use the correct bucket.
We can select multiple buckets by using "checkbox" instead of "list" for the gulp-prompt options, but I'm not sure how to then deliver it to multiple buckets.
In a nutshell, if $.prompt returns an array of strings instead of a string, how can I write the source to multiple destinations (buckets) instead of a single bucket?
Please keep in mind that gulp.dest() is not used -- only gulp.awspublish() -- and we don't know how many buckets might be selected.
Never used S3, but if I understand your question correctly a file js/foo.js should be renamed to default-bucket/dist/js/foo.js and other-bucket/dist/js/foo.js when the checkboxes default-bucket and other-bucket are selected?
Then this should do the trick:
// additionally required modules
var path = require('path');
var through = require('through2').obj;
gulp.task('deploy', ['build'], () => {
if (typeof(config.awsCreds) !== 'object') {
return console.error('No config.awsCreds settings found. See README');
}
var dirnames = []; // array for selected buckets
const publisher = $.awspublish.create({
key: config.awsCreds.key,
secret: config.awsCreds.secret,
bucket: config.awsCreds.bucket
});
return gulp.src('dist/**/*.*')
.pipe($.prompt.prompt({
type: 'checkbox', // use checkbox instead of list
name: 'dirnames', // use different result name
message: 'Using the ‘' + config.awsCreds.bucket +
'’ bucket. Which hostname would you like to deploy to?',
choices: config.awsCreds.dirnames,
default: config.awsCreds.dirnames.indexOf(config.awsCreds.dirname)
}, function (res) {
dirnames = res.dirnames; // store array of selected buckets
}))
// use through2 instead of gulp-rename
.pipe(through(function(file, enc, done) {
dirnames.forEach((dirname) => {
var f = file.clone();
f.path = path.join(f.base, dirname, 'dist',
path.relative(f.base, f.path));
this.push(f);
});
done();
}))
.pipe(publisher.cache())
.pipe($.awspublish.reporter());
});
Notice the comments where I made changes from the code you posted.
What this does is use through2 to clone each file passing through the stream. Each file is cloned as many times as there were bucket checkboxes selected and each clone is renamed to end up in a different bucket.