I need to share ~10K files with ~10K people (one-to-one matching) and I would like to use Amazon S3 for this (by giving public access to these files).
The caveat is that I do not want anyone to be able to download all these files together. What are the right permissions for this.
Currently, my plan is:
Create non-public buckets (foo)
Name each file with a long string so one cannot guess the link (bar)
Make all files public
Share links of the form https://foo.s3.amazonaws.com/bar
It seems that by having a non-public bucket, I ensure that no one can list files in my bucket and hence won't be able to guess names of the files inside. Is it correct?
I would approach this using pre-signed urls as this allows you to grant access on an object-level, even when your bucket and objects are kept private. This means that the only way to access an object in the bucket is by using the link you provide to each individual user.
Therefore to follow best practice, you should block all public access and make all objects private. This will prevent anyone from listing bucket objects.
To automate this, you could upload the files naming them after each user, or some other identifying string like an id number. You can then generate a presigned url giving the user a limited time to retrieve the file without granting them access to the bucket as a whole with some kind of loop.
I use bash so that's the example I'll give but there's probably a similar powershell solution for this too.
The easiest way to do this would be with the aws-cli:
aws s3 presign s3://<YOUR-BUCKET-NAME>/<userIdNumber>.file \
--expires-in 604800
Put all of your userid's or whatever you've used to identify your user's files in a text file and loop over them with bash to generate all your presigned url's like so:
Contents of users.txt:
user1
user2
user3
user4
user5
The loop:
for i in $(cat users.txt) ;
do
echo "$i" ;
aws s3 presign "s3://my-bucket/$i.file" --expires-in 604800 ;
done
This should spit out a list of usernames with a url below each user. Just send the link to each user and they will be able to get their document.
Related
I Have a bucket with 3 million objects. I Even don't know how many folders are there in my S3 bucket and even don't know the names of folders in my bucket.I want to show only list of folders of AWS s3. Is there any way to get list of all folders ?
I would use AWS CLI for this. To get started - have a look here.
Then it is a matter of almost standard linux commands (ls):
aws s3 ls s3://<bucket_name>/path/to/search/folder/ --recursive | grep '/$' > folders.txt
where:
grep command just reads what aws s3 ls command has returned and searches for entries with ending /.
ending > folders.txt saves output to a file.
Note: grep (if I'm not wrong) is unix only utility command. But I believe, you can achieve this on windows as well.
Note 2: depending on the number of files there this operation might (will) take a while.
Note 3: usually in systems like AWS S3, term folder is there only for user to maintain visual similarity with standard file systems however inside it does treat it as a part of a key. You can see in your (web) console when you filter by "prefix".
Amazon S3 buckets with large quantities of objects are very difficult to use. The API calls that list bucket contents are limited to returning 1000 objects per API call. While it is possible to request 'folders' (by using Delimiter='/' and looking at CommonPrefixes), this would take repeated calls to obtain the hierarchy.
Instead, I would recommend using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You can then play with that CSV file from code (or possibly Excel? Might be too big?) to obtain your desired listings.
Just be aware that doing anything on that bucket will not be fast.
I have an Amazon S3 server filled with multiple buckets, each bucket containing multiple subfolders. There are easily 50,000 files in total. I need to generate an excel sheet that contains the path/url of each file in each bucket.
For eg, If I have a bucket called b1, and it has a file called f1.txt, I want to be able to export the path of f1 as b1/f1.txt.
This needs to be done for every one of the 50,000 files.
I have tried using S3 browsers like Expandrive and Cyberduck, however they require you to select each and every file to copy their urls.
I also tried exploring the boto3 library in python, however I did not come across any in built functions to get the file urls.
I am looking for any tool I can use, or even a script I can execute to get all the urls. Thanks.
Do you have access to the aws cli? aws s3 ls --recursive {bucket} will list all nested files in a bucket.
Eg this bash command will list all buckets, then recursively print all files in each bucket:
aws s3 ls | while read x y bucket; do aws s3 ls --recursive $bucket | while read x y z path; do echo $path; done; done
(the 'read's are just to strip off uninteresting columns).
nb I'm using v1 CLI.
What you should do is have a look again at boto3 documentation as it is what you are looking for. It is fairly simple to do what you are asking but may take you a bit of reading if you are new to it. Since there is multiple steps involved I will try to steer you in the right direction.
In boto3 for S3 the method you are looking for is list_objects_v2(). This will give you the 'Key' or object path of every object. You will notice that it will return the entire json blob for each object. Since you only are interested in the Key, you can target this just the same way you would access Key/Values in a dict. E.g. list_objects_v2()['Contents'][0]['Key'] should return only object path of the very first object.
If you've got that working the next step is to try to loop and get all values. You can either use a for loop to do this or there is an awesome python package I regularly use called jmespath - https://jmespath.org/
Here is how you can retrieve all object paths up to 1000 objects in one line.
import jmespath
bucket_name='im-a-bucket'
s3_client = boto3.client('s3')
bucket_object_paths = jmespath.search('Contents[*].Key', s3_client.list_objects_v2(Bucket=bucket_name))
Now since your buckets may have more than 1000 objects, you will need to use the paginator to do this. Have a look at this to understand it.
How to get more than 1000 objects from S3 by using list_objects_v2?
Basically the way it works is only 1000 objects can be returned. To overcome this we use a paginator which allows you to return the entire result and treats the limit of 1000 as a pagination so you just need to also use it within a for loop to get all the results you are looking for.
Once you get this working for one bucket, store the result in a variable which will be of type list and repeat for the rest of the buckets. Once you have all this data you could easily just copy paste it into an excel sheet or use python to do it. (Haven't tested the code snippets but they should work).
Amazon s3 inventory can help you with this use case.
Do evaluate that option. refer: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
I tried to use DigitalOcean Spaces which is like AWS S3 to give certain users the ability to view a file like a video. I only can give them a custom link (to one file, not a hole direcotry) with a defined period of time to view .
I would like to know what is the best practice in the cloud, how to share files privately only to registred users.
You can create a private S3 bucket and using the SDK, you can create pre-signed URL for a file and set the expiry time of the link.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
I have a s3 bucket which is mapped to a domian say xyz.com . When ever a user register on xyz.com a file is created and stored in s3 bucket. Now i have 1000 of files in s3 and I want to replace some text in those files. All files have common name in start ex abc-{rand}.txt
The safest way of doing this would be to regenerate them again through the same process you originally used.
Personally I would try to avoid find and replace as it could lead to modifying parts that you did not intend.
Run multiple generations in parallel and override the existing files. This will ensure the files you generate will match your expectation and will not need to be modified again.
As a suggestion enable versioning before any of these interactions if you want the ability to rollback quickly in a scenario where it needs to be reverted.
Sadly, you can't do this in place in S3. You have to download them, change their content and re-upload.
This is because S3 is an object storage system, not regular file system.
To simply working with S3 files, you can use third part tool s3fs-fuse. The tool will make the S3 appear like a filesystem on your os.
I have set up a public bucket in S3 and copied multiple objects into it. In this case they are jpeg photos.
I want to share all these objects with anonymous public users (friends), but I want to send them one static website address for the bucket and for the objects to show up as a list (or at least show all the images) when they click on that one address link.
Is this possible to display the objects this way using S3 to public users who don't have an S3 account?
The alternative I know of is to send them a unique link to each of the objects in the bucket (which would take forever!).
Any advice would be helpful.
S3 doesn't have anything built-in to do a "directory index" like nginx and Apache can do. It can be done with AWS Lambda, though.
I built a rudimentary image index with lambda, you might be able to adapt it to solve your problem.
yes.
you can host an static webpage inside a s3 bucket: http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html
just generate a static html page with links to all the photos, upload it in the bucket, set the bucket to serve as a static webpage and give the link to it.
Or, for the extra lazy :) https://github.com/rgrp/s3-bucket-listing
Thanks for your answers, they helped me to find a really simple solution. On a different forum I found someone has written some script and put it in a link that you just upload straight into your bucket and that puts all the objects into a simple list...... genius!
This is the link:
http://regexp.s3.amazonaws.com/list.html
So for the less techy people (like me) you literally upload that link above into your bucket. Even if you haven't downloaded it onto your PC, just copy and paste it into the upload file path.
When I uploaded it, the file appeared in the S3 bucket as list.html
Make sure the file is readable and you've set the ACL appropriately. And make sure your bucket has a policy that allows anyone to access it.
Your bucket objects(content) are then shown at the url link below.
http://<your bucket name>.s3.amazonaws.com/list.html
Where <your bucket name> is written above, replace that part with just the name of your bucket.
And you should be able to click on that link and see the list of objects in your bucket. Once you get your head around it, it is actually very simple.