Amazon S3: Multiple Custom Domains Without CloudFront

Is it possible to set up Amazon's Simple Storage Service (S3) to use custom domains (storage-01.example.com, storage-02.example.com, storage-03.example.com, ...) without using CloudFront? I don't really care about having an 'edge' network, but I do want browsers to make parallel requests for assets. Thanks!

No, unless you duplicate your keys into multiple S3 buckets. This is because S3 uses the Host header value as a reference to the bucket.
I guess you could be sneaky and take advantage of the different URL styles, but it's a horrible suggestion and I would never implement it:
http://www.mybucketdomain.com/foo.jpg
http://www.mybucketdomain.com.s3.amazonaws.com/foo.jpg
http://s3.amazonaws.com/www.mybucketdomain.com/foo.jpg
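
If you do go the multiple-bucket route, duplicating the keys can be scripted. A rough sketch with boto3 (the bucket names are just the ones implied by the question, and it assumes the mirror buckets already exist):

```python
# Hypothetical sketch: duplicate every object from a source bucket into the
# per-domain buckets, so storage-01.example.com, storage-02.example.com, ...
# (each a CNAME to its own bucket) all serve the same assets.
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "storage-01.example.com"  # assumed names from the question
MIRROR_BUCKETS = ["storage-02.example.com", "storage-03.example.com"]

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        for target in MIRROR_BUCKETS:
            # Server-side copy; no data passes through the machine running this.
            s3.copy_object(
                Bucket=target,
                Key=obj["Key"],
                CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            )
```

Each custom domain would then be a CNAME pointing at its own bucket, so browsers treat them as separate hosts and fetch assets in parallel.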

Related

S3 buckets composition

I'm developing a CMS that runs as a single instance but serves multiple websites for different users. The CMS needs to store files. Each website may have just a few images, or thousands of objects. Currently we serve around 5 websites, but we plan to have hundreds, so it must scale easily.
Now I'm thinking about two possible ways to go. I want to use S3 for storage.
The first solution is to have a single bucket for all files in my app.
The second solution is to have one bucket for each website.
According to the AWS docs, S3 can handle a "virtually unlimited amount of bytes", so I think the first solution could work well, but I'm wondering about other aspects:
Isn't it just cleaner to have one bucket for each website? Is it better for maintenance?
Is one solution more secure than the other? Are there security concerns to be aware of?
Does the same apply to other S3-like services such as MinIO or DigitalOcean Spaces?
Thank you very much for your answers.
I'd go for solution 1.
From a technical perspective there really is virtually no limit to the number of objects you can put in a bucket - S3 is built for extreme scale. For 5 websites option 2 might sound tempting, but it doesn't scale very well.
There's a soft limit (i.e. one you can raise) of 100 buckets per account, which is an indication that using hundreds of buckets is probably an anti-pattern. Also, securing hundreds of buckets is not easier than securing one bucket.
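
In practice, solution 1 usually means giving each website its own key prefix inside the one bucket. A minimal sketch of that layout with boto3 (the bucket, site, and function names are made up for illustration):

```python
# Minimal sketch of solution 1: one bucket, with each website's files kept
# under its own key prefix ("folder"). Names are assumptions, not a standard.
import boto3

s3 = boto3.client("s3")
BUCKET = "cms-assets"  # hypothetical single bucket for the whole CMS

def upload_site_file(site_id: str, filename: str, body: bytes) -> str:
    key = f"{site_id}/{filename}"  # e.g. "site-42/header.jpg"
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    return key

def list_site_files(site_id: str):
    # Listing by prefix keeps each website's objects logically separated.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=f"{site_id}/"):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```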
Concerning security: You can be very granular with bucket policies in S3 if you need that. You can also choose how you want to encrypt each object individually if that is a requirement. Features like pre-signed URLs can help you grant temporary access to specific objects in S3.
If your goal is to serve static content to end users, you'll have to either make the objects publicly readable, use the aforementioned pre-signed URLs or set up CloudFront as a CDN in front of your bucket.
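
For the pre-signed URL option, the call is a one-liner in most SDKs. A hedged boto3 sketch (bucket and key names are hypothetical):

```python
# Granting temporary read access to a single object with a pre-signed URL.
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "cms-assets", "Key": "site-42/header.jpg"},
    ExpiresIn=3600,  # URL is valid for one hour
)
print(url)  # hand this to the browser; no AWS credentials needed client-side
```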
I don't know how this relates to S3-like services.

Difference between Amazon S3 cross-region replication and CloudFront

After reading some AWS documentation, I am wondering what the difference is between these approaches if I want to deliver content (JS, CSS, images, and API requests) in Asia (including China), the US, and the EU.
1. Store my images and static files in S3 in a US region and set up cross-region replication to EU and Asia (Japan or Singapore) regions so they stay in sync with the US bucket.
2. Store my images and static files in S3 in a US region and set up a CloudFront CDN to cache my content in different locations after the initial request.
3. Do both of the above (if there is a significant performance improvement).
What is the most cost-effective solution if I need to achieve global deployment? And how do I make requests from China consistent and stable? (I tried CloudFront + S3 (us-west); it's fast, but the performance is not consistent.)
PS. In the early stage I don't expect too many user requests, but users are spread globally and I want them to have a similar experience. The majority of my content is panorama images; I'd expect to load ~30 MB of data (10 high-res images) sequentially on each visit.
Cross region replication will copy everything in a bucket in one region to a different bucket in another region. This is really only for extra backup/redundancy in case an entire AWS region goes down. It has nothing to do with performance. Note that it replicates to a different bucket, so you would need to use different URLs to access the files in each bucket.
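
For completeness, here is roughly what turning on CRR looks like with boto3. The bucket names and IAM role ARN are placeholders, and both buckets need versioning enabled first, so treat this as a sketch rather than a drop-in configuration:

```python
# Rough sketch of enabling cross-region replication on an existing bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-assets-us",  # source bucket (assumed name)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",  # placeholder ARN
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",  # replicate all keys
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::my-assets-ap"},
            }
        ],
    },
)
# Note: replicated objects live in a *different* bucket, so clients in Asia
# would fetch from the my-assets-ap endpoint rather than the US one -- the
# application has to pick the right URL itself.
```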
CloudFront is a Content Delivery Network. S3 is simply a file storage service. Serving a file directly from S3 can have performance issues, which is why it is a good idea to put a CDN in front of S3. It sounds like you definitely need a CDN, and it sounds like you have tested CloudFront and are unimpressed. It also sounds like you need a CDN with a larger presence in China.
There is no reason you have to choose CloudFront as your CDN just because you are using other AWS services. You should look at other CDN services and see what their edge networks look like. Given your requirements I would highly recommend you take a look at CloudFlare. They have quite a few edge network locations in China.
Another option might be to use a CDN that you can actually push your files to. I've used this feature in the past with MaxCDN. You would push your files to the CDN via FTP, and the files would automatically be pushed to all edge network locations and cached until you push an update. For your use case of large image downloads, this might provide a more performant caching mechanism. MaxCDN doesn't appear to have a large China presence though, and the bandwidth charges would be more expensive than CloudFlare.
If you want to serve files in S3 buckets to users all around the world, you might consider using S3 Transfer Acceleration. It can be used when you either upload to or download from your S3 bucket. You could also try AWS Global Accelerator.
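
As a rough illustration of the Transfer Acceleration route with boto3 (bucket and file names are assumptions): you enable acceleration on the bucket once, then point a client at the accelerate endpoint:

```python
# Sketch: using S3 Transfer Acceleration from boto3.
import boto3
from botocore.config import Config

s3 = boto3.client("s3")
# One-time: enable transfer acceleration on the bucket.
s3.put_bucket_accelerate_configuration(
    Bucket="my-assets-us",
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client configured to use the accelerate endpoint
# (bucketname.s3-accelerate.amazonaws.com) for uploads/downloads.
accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
accel.upload_file("panorama-01.jpg", "my-assets-us", "panos/panorama-01.jpg")
```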
CloudFront's job is to cache content at hundreds of caches ("edge locations") around the world, making them more quickly accessible to users around the world. By caching content at locations close to users, users can get responses to their requests more quickly than they otherwise would.
S3 Cross-Region Replication (CRR) simply copies an S3 bucket from one region to another. This is useful for backing up data, and it also can be used to speed up content delivery for a particular region. Unlike CloudFront, CRR supports real-time updating of bucket data, which may be important in situations where data needs to be current (e.g. a website with frequently-changing content). However, it's also more of a hassle to manage than CloudFront is, and more expensive on a multi-region scale.
If you want to achieve global deployment in a cost-effective way, then CloudFront would probably be the better of the two, except in the special situation outlined in the previous paragraph.

Possible to allow client uploads to S3 over HTTPS AND have a CNAME alias for the bucket?

OK, so I have an Amazon S3 bucket to which I want to allow users to upload files directly from the client over HTTPS.
In order to do this it became apparent that I would have to change the bucket name from a format using periods to a format using dashes. So :
my.bucket.com
became :
my-bucket-com
This is required due to a limitation of HTTPS: S3's wildcard certificate (*.s3.amazonaws.com) only covers a single subdomain level, so bucket names containing periods fail certificate validation on the virtual-hosted-style endpoint.
So everything is peachy, except now I'd like to allow access to those files while hiding the fact that they are being stored on Amazon S3.
The obvious choice seems to be to use Route 53 zone configuration records to add a CNAME record to point my url at the bucket, given that I already have the 'bucket.com' domain :
my.bucket.com > CNAME > my-bucket-com.s3.amazonaws.com
However, I now seem to have hit another limitation, in that Amazon seems to insist that the name of the CNAME record must match the bucket name exactly, so the above example will not work.
My temporary solution is to use a reverse proxy on an EC2 instance while traffic volumes are low. But this is not a good long-term solution, as it means all S3 access is funneled through the proxy server, causing extra server load and data transfer charges. Not to mention the solution really isn't scalable once traffic volumes start to increase.
So is it possible to achieve both of my goals above or are they mutually exclusive?
If I want to be able to upload directly from clients over https, I can't then hide the S3 url from end users accessing that content and vice versa?
Well, there simply doesn't seem to be a straightforward way of achieving this.
There are two possible solutions:
1.) Put your S3 bucket behind Amazon CloudFront - but this does incur a lot more charges, albeit with the added benefit of lower-latency regional access to your content.
2.) The solution we will go with is simply to split the bucket into two.
One bucket for uploads from HTTPS clients (my-bucket-com), and one for CNAME-aliased access to that content (my.bucket.com). This keeps costs down, although it involves extra steps to organise the content before it can be accessed.
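
For the upload half of solution 2, a pre-signed HTTPS PUT URL keeps the client-to-S3 upload direct. A hedged boto3 sketch using the dash-named bucket from the question (the object key is made up):

```python
# The server hands the client a pre-signed HTTPS PUT URL for the dash-named
# bucket; the client then uploads directly to S3 without touching our servers.
import boto3

s3 = boto3.client("s3")

upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket-com", "Key": "uploads/photo.jpg"},
    ExpiresIn=900,  # client has 15 minutes to complete the upload
)
# The client PUTs the file body to upload_url over HTTPS; because the bucket
# name has no periods, the *.s3.amazonaws.com certificate validates cleanly.
```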

Amazon AWS: DynamoDB requirements

Objective: From an iPhone app, I would like users to store objects in DynamoDB and have fine-grained access control for the objects using IAM with a Token Vending Machine (TVM).
The objects will contain only strings, no images/file storage -- I'm thinking I won't need S3?
Question: Since there is no server-side application, do I still need an EC2 instance? Which AWS services will I need in order to accomplish my objective?
You can use either DynamoDB or S3, and neither requires an EC2 instance; there is no dependency.
If it were me, I'd first see if I could get what I wanted done in S3 (because you mentioned it as a possibility), and then go to DynamoDB if I couldn't (e.g. if I wanted to be able to run aggregation queries across my data set). S3 will be cheaper and, depending on what you are doing, may even be faster, and it would let you globally distribute the stored data through CloudFront easily, which may be beneficial if you have a globally diverse user base.
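
If you do go the DynamoDB route the question asks about, storing string-only items is straightforward. A minimal boto3 sketch (the table name and key schema are assumptions; an iPhone app would make the equivalent calls through the AWS mobile SDK using temporary credentials from the TVM):

```python
# Minimal sketch of storing string-only items in DynamoDB.
# Assumes a table "UserObjects" with partition key "UserId" and sort key
# "ObjectId" -- names invented for illustration.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserObjects")

table.put_item(
    Item={
        "UserId": "user-123",
        "ObjectId": "note-1",
        "Body": "just a string payload, no binary data",
    }
)

item = table.get_item(Key={"UserId": "user-123", "ObjectId": "note-1"})["Item"]
print(item["Body"])
```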

Can I meter or set a size limit on an S3 folder?

I'd like to set up a separate S3 bucket folder for each of my mobile app users for them to store their files. However, I also want to set up size limits so that they don't use up too much storage. Additionally, if they do go over the limit I'd like to offer them increased space if they sign up for a premium service.
Is there a way I can set folder size limits through S3 configuration or the API? If not, would I have to use the APIs somehow to calculate the folder size on every upload? I know Amazon has the DevPay feature, but it might be a hassle for users to sign up with Amazon if they just want to use a small amount of free space.
There does not appear to be a way to do this, probably at least in part because there is actually no such thing as "folders" in S3. There is only the appearance of folders.
Amazon S3 does not have concept of a folder, there are only buckets and objects. The Amazon S3 console supports the folder concept using the object key name prefixes.
— http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
All of the keys in an S3 bucket are actually in a flat namespace, with the / delimiter used as desired to conceptually divide objects into logical groupings that look like folders, but it's only a convenient illusion. It seems impossible that S3 would have a concept of the size of a folder, when it has no actual concept of "folders" at all.
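
That's why a "folder size" has to be computed by listing every key under the prefix and summing the sizes. A small boto3 sketch (bucket and prefix names are illustrative):

```python
# Compute the total size of all objects under a key prefix ("folder").
import boto3

s3 = boto3.client("s3")

def prefix_size_bytes(bucket: str, prefix: str) -> int:
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total

used = prefix_size_bytes("app-user-files", "user-123/")
print(f"user-123 is using {used / 1024 / 1024:.1f} MiB")
```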
If you don't maintain an authoritative database of what's been stored by clients (which suggests that all uploads should pass through an app server rather than going directly to S3, which is the only approach that makes sense to me at all) then your only alternative is to poll S3 to discover what's there. An imperfect shortcut would be for your application to read the S3 bucket logs to discover what had been uploaded, but that is only provided on a best-effort basis. It should be reliable but is not guaranteed to be perfect.
This service provides a best effort attempt to log all access of objects within a bucket. Please note that it is possible that the actual usage report at the end of a month will slightly vary.
Your other option is to develop your own service that sits between users and Amazon S3, that monitors all requests to your buckets/objects.
— http://aws.amazon.com/articles/1109#13
Again, having your app server mediate all requests seems to be the logical approach, and would also allow you to detect immediately (as opposed to "discover later") that a user had exceeded a threshold.
I would maintain a separate database in the cloud to hold each user's total storage usage. It's easy to manage the count via S3 event notifications, which can trigger a Lambda function that in turn writes to a DB.
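
A hedged sketch of that Lambda: it is triggered by S3 event notifications on object creation and adds each new object's size to a per-user counter in DynamoDB (the table name and the "user-id/..." key layout are assumptions):

```python
# Lambda handler for S3 ObjectCreated event notifications. Adds each new
# object's size to a per-user running total in a DynamoDB table.
import boto3

table = boto3.resource("dynamodb").Table("UserStorageUsage")  # hypothetical table

def handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]   # e.g. "user-123/photo.jpg"
        size = record["s3"]["object"]["size"]
        # Note: keys in event notifications may be URL-encoded in practice.
        user_id = key.split("/", 1)[0]
        # Atomically add the new object's size to the user's running total.
        table.update_item(
            Key={"UserId": user_id},
            UpdateExpression="ADD BytesUsed :s",
            ExpressionAttributeValues={":s": size},
        )
```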