Using S3 for User Image Content: Single or multiple buckets? - django

What's the best practice for using S3 to store image uploads from users, in terms of a single bucket or multiple buckets for different purposes? The use case is a B2B application.

There is no limit to the amount of data you can store in an Amazon S3 bucket. Therefore you could, in theory, simply use one bucket for everything. (However, if you want data in multiple regions, then you would need to use a separate bucket per region.)
To best answer your question, you would need to think about how data is accessed:
If you are controlling access for IAM Users, then giving each user a separate folder makes access control easy using IAM Policy Elements: Variables and Tags.
If you are controlling access for application users, then users will authenticate to an application, which will determine their access to objects. The application can then generate Amazon S3 pre-signed URLs to grant access to specific objects (see the sketch below), so separation by bucket/folder is less important.
If the data is managed by different Admins/Developers, it is a good idea to keep the data in separate buckets to simplify access permissions (eg keeping HR data separate from customer data).
Basically, as long as you have a good reason to separate the data (eg test vs prod, different apps, different admins), then use separate buckets. But, for a single app, it might make better sense to use a single bucket.
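As a rough sketch of the pre-signed URL approach mentioned above (the bucket name, key layout, and expiry are placeholders, not anything from the question), the application hands out a time-limited link only after it has checked the user's permissions itself:

```python
import boto3

s3 = boto3.client("s3")

def get_image_url(bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a time-limited URL for one object, after the application
    has already verified that the requesting user may see it."""
    return s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds
    )

# Hypothetical usage: serve one tenant's image out of a shared bucket
url = get_image_url("my-app-uploads", "tenant-42/images/logo.png")
```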

I believe it's the same in terms of performance and availability. As for splitting content by purpose: it's probably OK to use a single bucket as long as the content is split into different folders (paths).
We used to have one bucket for user-uploaded content and another one for static (CSS/JS/IMG) files that were auto-generated.


S3 buckets composition

I'm developing a CMS which runs as a single instance but serves multiple websites for different users. This CMS needs to store files in storage. Each website can have anywhere from a few images to thousands of objects. Currently we serve around 5 websites, but plan to have hundreds, so it must scale easily.
Now I'm thinking about two possible ways to go. I want to use S3 for storage.
The first solution is to have a single bucket for all files in my app.
The second solution is to have one bucket for each website.
According to the AWS docs, S3 can handle a "virtually unlimited amount of bytes", so I think the first solution could work well, but I'm thinking about other aspects:
Isn't it just cleaner to have one bucket for each website? Is it better for maintenance?
Which solution is more secure, if either? Are there any security concerns to be aware of?
Is the same applicable to other S3-like services like Minio or DigitalOcean Spaces?
Thank you very much for your answers.
I'd go for solution 1.
From a technical perspective there really is virtually no limit to the number of objects you can put in a bucket; S3 is built for extreme scale. For 5 websites option 2 might sound tempting, but that doesn't scale very well.
There's a soft limit (i.e. you can raise it) of 100 buckets per account, which is an indication that using hundreds of buckets is probably an anti-pattern. Also, securing hundreds of buckets is not easier than securing one bucket.
Concerning security: You can be very granular with bucket policies in S3 if you need that. You can also choose how you want to encrypt each object individually if that is a requirement. Features like pre-signed URLs can help you grant temporary access to specific objects in S3.
If your goal is to serve static content to end users, you'll have to either make the objects publicly readable, use the aforementioned pre-signed URLs or set up CloudFront as a CDN in front of your bucket.
I don't know how this relates to S3-like services.
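If you go with solution 1, a common layout (this is only a sketch; the bucket name and key scheme are placeholders) is to prefix every key with the website's identifier, so each site's objects stay grouped and are easy to list, delete, or scope in a policy:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "cms-content"  # placeholder bucket name

def upload_site_file(website_id: str, relative_path: str, local_path: str) -> str:
    """Store a file under a per-website prefix, e.g. 'site-123/img/header.jpg'."""
    key = f"{website_id}/{relative_path}"
    s3.upload_file(local_path, BUCKET, key)
    return key

# Hypothetical usage for one of the CMS websites
upload_site_file("site-123", "img/header.jpg", "/tmp/header.jpg")
```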

How to limit the amount of data stored by user via S3 buckets on AWS?

I'm creating a platform whereby users upload data to us. We check the data to make sure it's safe and correctly formatted, and then store the data in buckets, tagging by user.
The size of the data upload is normally around 100MB. This is large enough to be concerning.
I'm worried about cases where certain users may try to store an unreasonable amount of data on the platform, i.e. they make 1000s of transactions within a short period of time.
How do cloud service providers allow site admins to monitor the amount of data stored per user?
How is this actively monitored?
Any direction/insight appreciated. Thank you.
Amazon S3 does not have a mechanism for limiting data storage per "user".
Instead, your application will be responsible for managing storage and for defining the concept of an "application user".
This is best done by tracking data files in a database, including filename, owner, access permissions, metadata and lifecycle rules (eg "delete after 90 days"). Many applications allow files to be shared between users, such as a photo sharing website where you can grant view-access of your photos to another user. This is outside the scope of Amazon S3 as a storage service, and should be handled within your own application.
If such data is maintained in a database, it would be trivial to run a query to identify "data stored per user". I would recommend against storing such information as metadata against the individual objects in Amazon S3 because there is no easy way to query metadata across all objects (eg list all objects associated with a particular user).
Bottom line: Amazon S3 will store your data and keep it secure. However, it is the responsibility of your application to manage activities related to users.
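As a minimal sketch of that application-side bookkeeping (the table layout and the SQLite choice are illustrative assumptions, not anything prescribed above), the application records each upload's size and owner and can then answer "data stored per user" with a simple query:

```python
import sqlite3

db = sqlite3.connect("uploads.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS uploads (
        s3_key     TEXT PRIMARY KEY,
        owner_id   TEXT NOT NULL,
        size_bytes INTEGER NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def record_upload(s3_key: str, owner_id: str, size_bytes: int) -> None:
    # Called by the application after every successful upload to S3
    db.execute(
        "INSERT OR REPLACE INTO uploads (s3_key, owner_id, size_bytes) VALUES (?, ?, ?)",
        (s3_key, owner_id, size_bytes),
    )
    db.commit()

def storage_used(owner_id: str) -> int:
    """Total bytes currently stored by one application user."""
    row = db.execute(
        "SELECT COALESCE(SUM(size_bytes), 0) FROM uploads WHERE owner_id = ?",
        (owner_id,),
    ).fetchone()
    return row[0]
```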

S3 what is recommended hierarchy for storing objects?

I've been playing around with Amazon S3 and I wonder why I would need to use multiple buckets. I just thought I would name my objects according to the hierarchy they belong to, eg. blog/articles/2016/08/article-title.jpg, and store them all in one bucket. The folders would effectively be created in this case. Or is there any reason why I would need multiple buckets to store uploaded files?
And if so, what is the proper design for having multiple buckets? Let's say I need to categorise files by type, year and month. I suppose I can't have buckets in a bucket.
AWS guidance in S3 Bucket Restrictions and Limitations states:
There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets.
I would keep it simple, and store that type of asset data in a single bucket, perhaps divided up into a few 'top level' key name prefixes (folders) such as images, scripts, etc.
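To illustrate how those key name prefixes behave like folders (the bucket name is a placeholder; the prefix reuses the blog/articles/... example from the question), listing with a Delimiter returns the "sub-folders" as CommonPrefixes and the objects directly under the prefix as Contents:

```python
import boto3

s3 = boto3.client("s3")

resp = s3.list_objects_v2(
    Bucket="my-app-bucket",        # placeholder bucket name
    Prefix="blog/articles/2016/",  # treat this key prefix as a "folder"
    Delimiter="/",                 # group deeper keys instead of listing them all
)

for sub in resp.get("CommonPrefixes", []):  # e.g. blog/articles/2016/08/
    print("folder:", sub["Prefix"])
for obj in resp.get("Contents", []):        # objects directly under the prefix
    print("object:", obj["Key"], obj["Size"])
```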

Is it better to have multiple S3 buckets or one bucket with subfolders?

Is it better to have multiple S3 buckets per category of uploads, or one bucket with subfolders, OR a linked S3 bucket? I know for sure there will be more user images than there will be profile pics, and that there is a 5TB limit per bucket and 100 buckets per account. I'm doing this using the aws boto library and https://github.com/amol-/depot
In which of the following ways should I structure my folders?
/app_bucket
/profile-pic-folder
/user-images-folder
OR
profile-pic-bucket
user-images-bucket
OR
/app_bucket_1
/app_bucket_2
The last one implies that it's really a 10TB bucket where a new bucket is created when the files within bucket_1 exceed 5TB. But all uploads will be read as if in one bucket. Or is there a better way of doing what I'm trying to do? Many thanks!
I'm not sure if this is correct... 100 buckets per account?
https://www.reddit.com/r/aws/comments/28vbjs/requesting_increase_in_number_of_s3_buckets/
Yes, there is actually a 100 bucket limit per account. I asked an architect at an AWS event about the reason for this. He said it is to avoid people hosting unlimited static websites on S3, as they think this may be abused. But you can apply for an increase.
By default, you can create up to 100 buckets in each of your AWS accounts. If you need additional buckets, you can increase your bucket limit by submitting a service limit increase.
Source: http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
Also, please note that there are actually no folders in S3, just a flat file structure:
Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
Source: http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
Finally, the 5TB limit only applies to a single object. There is no limit on the number of objects or total size of the bucket.
Q: How much data can I store?
The total volume of data and number of objects you can store are unlimited.
Source: https://aws.amazon.com/s3/faqs/
Also the documentation states there is no performance difference between using a single bucket or multiple buckets so I guess both option 1 and 2 would be suitable for you.
Hope this helps.
Simpler Permissions with Multiple Buckets
If the images are used for different use cases, using multiple buckets will simplify the permissions model, since you can give clients/users bucket-level permissions instead of directory-level permissions.
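As a rough illustration of that difference (the bucket and prefix names below are invented), a bucket-level grant uses a single bucket-wide Resource ARN, while a "directory-level" grant in a shared bucket has to scope the ARN to a key prefix:

```python
# Bucket-level grant: the client may touch anything in its own bucket.
bucket_level_statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::profile-pics-bucket/*",
}

# Prefix ("directory") level grant inside a shared bucket: same actions,
# but the Resource ARN is scoped to one key prefix.
prefix_level_statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::app-bucket/profile-pics/*",
}
```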
2-way doors and migrations
On a similar note, using 2 buckets is more flexible down the road.
1 to 2:
If you switch from 1 bucket to 2, you now have to move all clients to the new set-up. You will need to update permissions for all clients, which can require IAM policy changes for both you and the client. Then you can move your clients over by releasing a new client library during the transition period.
2 to 1:
If you switch from 2 buckets to 1 bucket, your clients will already have access to the 1 bucket. All you need to do is update the client library and move your clients onto it during the transition period.
*If you don't have a client library, then code changes are required in both cases for the clients.

Can I meter or set a size limit on an S3 folder?

I'd like to set up a separate s3 bucket folder for each of my mobile app users for them to store their files. However, I also want to set up size limits so that they don't use up too much storage. Additionally, if they do go over the limit I'd like to offer them increased space if they sign up for a premium service.
Is there a way I can set folder file size limits through S3 configuration or an API? If not, would I have to use the APIs somehow to calculate folder size on every upload? I know that there is the DevPay feature in Amazon, but it might be a hassle for users to sign up with Amazon if they want to just use a small amount of free space.
There does not appear to be a way to do this, probably at least in part because there is actually no such thing as "folders" in S3. There is only the appearance of folders.
Amazon S3 does not have concept of a folder, there are only buckets and objects. The Amazon S3 console supports the folder concept using the object key name prefixes.
— http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
All of the keys in an S3 bucket are actually in a flat namespace, with the / delimiter used as desired to conceptually divide objects into logical groupings that look like folders, but it's only a convenient illusion. It seems impossible that S3 would have a concept of the size of a folder, when it has no actual concept of "folders" at all.
If you don't maintain an authoritative database of what's been stored by clients (which suggests that all uploads should pass through an app server rather than going directly to S3, which is the only approach that makes sense to me at all), then your only alternative is to poll S3 to discover what's there. An imperfect shortcut would be for your application to read the S3 bucket logs to discover what has been uploaded, but that logging is only provided on a best-effort basis. It should be reliable but is not guaranteed to be perfect.
This service provides a best effort attempt to log all access of objects within a bucket. Please note that it is possible that the actual usage report at the end of a month will slightly vary.
Your other option is to develop your own service that sits between users and Amazon S3, that monitors all requests to your buckets/objects.
— http://aws.amazon.com/articles/1109#13
Again, having your app server mediate all requests seems to be the logical approach, and would also allow you to detect immediately (as opposed to "discover later") that a user had exceeded a threshold.
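A minimal sketch of that mediation approach (the bucket name, quota, and function shape are assumptions for illustration): the app server checks the user's recorded usage before it hands back a pre-signed upload URL, so an over-quota upload is refused up front rather than discovered later.

```python
import boto3

s3 = boto3.client("s3")
FREE_TIER_LIMIT = 500 * 1024 * 1024  # hypothetical 500 MB quota per user

def request_upload(user_id: str, key: str, size_bytes: int, current_usage: int) -> str:
    """Return a pre-signed PUT URL only if the upload keeps the user under quota.
    `current_usage` would come from your own usage database."""
    if current_usage + size_bytes > FREE_TIER_LIMIT:
        raise PermissionError("Storage quota exceeded; offer the premium plan here.")
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "user-files", "Key": f"{user_id}/{key}"},  # placeholder bucket
        ExpiresIn=300,  # URL valid for 5 minutes
    )
```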
I would maintain a separate database in the cloud to hold each user's total storage usage count. It's easy to manage the count via S3 event notifications (e.g. ObjectCreated / ObjectRemoved), which can trigger a Lambda that in turn writes to a DB.
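A rough sketch of that event-driven variant, assuming the bucket's ObjectCreated notifications are wired to a Lambda function, that keys start with a per-user prefix such as user-42/..., and that usage totals live in a hypothetical DynamoDB table keyed on user_id:

```python
import urllib.parse
import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "user-storage-usage"  # hypothetical table with partition key "user_id"

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated notifications; adds each new object's
    size to the owning user's running total."""
    for record in event["Records"]:
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        user_id = key.split("/", 1)[0]  # assumes keys look like "user-42/photos/cat.jpg"
        dynamodb.update_item(
            TableName=TABLE,
            Key={"user_id": {"S": user_id}},
            UpdateExpression="ADD total_bytes :n",
            ExpressionAttributeValues={":n": {"N": str(size)}},
        )
```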