Currently my app uploads images to a bucket in APSE1 (ap-southeast-1, Singapore), and since my app is mostly used in Southeast Asia, everything is pretty fast. How can I support multiple regions? Say I also want people in the US to use my app: right now their uploads will be slow because the bucket is in Singapore. I know S3 can replicate data across regions, but how can I detect the user's location and generate a presigned upload URL for the bucket closest to that user? Right now I hardcode Singapore... Any ideas? Thanks!
Your best bet is probably to put a CloudFront distribution in front of the bucket.
That said, have you benchmarked this? My understanding is that latency matters much more for a flurry of small requests than for one or a couple of large image uploads.
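One way to stop hardcoding Singapore, sketched under assumptions: if requests reach your API through CloudFront, the distribution can be configured to forward the `CloudFront-Viewer-Country` header, and the backend can map that country to a bucket before generating the presigned upload URL. The bucket names, the country table and `pick_upload_target` are hypothetical. Note that uploads then land in different buckets, so your app has to remember which bucket each object went to.

```python
from typing import Mapping

# Hypothetical buckets, one per region you operate in.
UPLOAD_BUCKETS = {
    "ap-southeast-1": "myapp-uploads-apse1",
    "us-east-1": "myapp-uploads-use1",
}

# Hypothetical country (ISO 3166-1 alpha-2) -> region table; extend as needed.
COUNTRY_TO_REGION = {
    "SG": "ap-southeast-1", "MY": "ap-southeast-1", "ID": "ap-southeast-1",
    "TH": "ap-southeast-1", "VN": "ap-southeast-1", "PH": "ap-southeast-1",
    "US": "us-east-1", "CA": "us-east-1",
}

DEFAULT_REGION = "ap-southeast-1"  # today's behaviour: everything goes to Singapore


def pick_upload_target(headers: Mapping[str, str]) -> tuple:
    """Return (region, bucket) for this request, falling back to Singapore."""
    # Header names are case-insensitive in HTTP, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    country = lowered.get("cloudfront-viewer-country", "").upper()
    region = COUNTRY_TO_REGION.get(country, DEFAULT_REGION)
    return region, UPLOAD_BUCKETS[region]
```

The returned region and bucket would then be passed to whatever generates the presigned URL (for example an S3 client created for that region).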
I'm developing a CMS which runs as a single instance but serves multiple websites belonging to different users. The CMS needs to store files. Each website can have just a few images or thousands of objects. Currently we serve around 5 websites, but plan to have hundreds, so it must scale easily.
Now I'm thinking about two possible ways to go. I want to use S3 for storage.
1. Have a single bucket for all files in my app.
2. Have one bucket for each website.
According to the AWS docs, S3 can handle a "virtually unlimited amount of bytes", so I think the first solution could work well, but I'm thinking about other aspects:
Isn't it just cleaner to have one bucket for each website? Is it better for maintenance?
Which solution is more secure, if either? Are there security concerns to care about?
Does the same apply to other S3-like services such as MinIO or DigitalOcean Spaces?
Thank you very much for your answers.
I'd go for solution 1.
From a technical perspective there really is virtually no limit to the number of objects you can put in a bucket; S3 is built for extreme scale. For 5 websites option 2 might sound tempting, but it doesn't scale very well.
There's a default soft limit on the number of buckets per account (historically 100; you can request an increase), which is an indication that using hundreds of buckets is probably an anti-pattern. Also, securing hundreds of buckets is not easier than securing one bucket.
Concerning security: You can be very granular with bucket policies in S3 if you need that. You can also choose how you want to encrypt each object individually if that is a requirement. Features like pre-signed URLs can help you grant temporary access to specific objects in S3.
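To illustrate the single-bucket approach, here is a sketch that keeps each website under its own key prefix and builds an IAM policy document restricting a principal to that prefix (adding a `Principal` would turn the statements into bucket-policy statements). The bucket name, the `sites/<site_id>/` layout and both helper names are hypothetical.

```python
BUCKET = "my-cms-assets"  # hypothetical bucket shared by all websites


def object_key(site_id: str, filename: str) -> str:
    """Store every website's files under its own prefix in the one bucket."""
    if not site_id or "/" in site_id:
        raise ValueError("site_id must be a non-empty single path segment")
    return f"sites/{site_id}/{filename.lstrip('/')}"


def site_policy(site_id: str) -> dict:
    """IAM policy document that only grants access to one website's prefix."""
    prefix = f"sites/{site_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read/write/delete objects, but only under this site's prefix.
                "Sid": "ObjectAccessWithinSitePrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{BUCKET}/{prefix}*",
            },
            {
                # Listing is a bucket-level action, so limit it via s3:prefix.
                "Sid": "ListOnlySitePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{BUCKET}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }
```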
If your goal is to serve static content to end users, you'll have to either make the objects publicly readable, use the aforementioned pre-signed URLs or set up CloudFront as a CDN in front of your bucket.
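A pre-signed URL is the object URL plus a Signature Version 4 signature in the query string. In practice you would let an SDK build it (e.g. boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=...)`); the standard-library sketch below only illustrates what gets computed, assuming virtual-hosted-style URLs, an unsigned payload and `host` as the only signed header. `presign_s3_url` is a hypothetical helper.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _sign(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 signing-key derivation."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_url(method: str, bucket: str, key: str, region: str,
                   access_key: str, secret_key: str, expires: int = 3600,
                   now: Optional[datetime.datetime] = None) -> str:
    """Build a SigV4 query-string pre-signed URL (virtual-hosted style)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    canonical_uri = "/" + quote(key, safe="/-_.~")  # S3: encode key once, keep '/'
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: URI-encoded, sorted by parameter name.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        method,
        canonical_uri,
        canonical_query,
        f"host:{host}\n",  # canonical headers block ends with a newline
        "host",  # signed headers
        "UNSIGNED-PAYLOAD",  # body is not part of the signature
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

For example, `presign_s3_url("GET", "my-bucket", "images/1.jpg", "ap-southeast-1", ACCESS_KEY, SECRET_KEY, expires=900)` (placeholder credentials) yields a link that grants read access to that one object for 15 minutes.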
I don't know how this relates to S3-like services.
Here's my situation and my goal(s):
I have a SaaS where users (globally) can upload audio files. These audio files are then later streamed (via HTML5 <audio>) to potentially anyone in the world. Currently, the only bucket hosting files is in us-west-2, which is obviously problematic when EU customers upload files, and EU users stream audio.
How can I have AWS:
Serve up audio files to a user, using the appropriate region based on their geographical location
Receive uploads using the S3 bucket (region) closest to the user uploading files
I thought maybe CloudFront would do the trick, but AFAIK, CloudFront requires a file to be downloaded once before it actually caches it, and that won't work for my SaaS. A common use case is that someone in the US might upload an important audio file for someone in Germany to listen to. I would need that person in Germany to experience as fast a streaming experience as possible, and currently I'm getting complaints of slow load times and choppy audio.
S3 cross-region replication might make sense (replicating to eu-central-1 as a good starting point, to cover customers in Scandinavia, other European countries, and the UK), but I'm not sure how to make a single S3 URL pull the file from a specific bucket based on the user's geographical location.
What's the best solution here, and how do I execute it?
To improve file upload performance, you can use Amazon S3 Transfer Acceleration which enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
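Transfer Acceleration is enabled per bucket and then used through a separate hostname, `<bucket>.s3-accelerate.amazonaws.com` (or `<bucket>.s3-accelerate.dualstack.amazonaws.com` for IPv6). A small sketch of that URL shape follows; the helper name is hypothetical. With boto3 you would instead create the client with `botocore.config.Config(s3={"use_accelerate_endpoint": True})` so that pre-signed upload URLs point at the accelerated endpoint.

```python
from urllib.parse import quote


def accelerated_url(bucket: str, key: str, dualstack: bool = False) -> str:
    """Return the Transfer Acceleration URL for an object in `bucket`."""
    # Transfer Acceleration needs DNS-compliant bucket names without dots.
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket names containing '.'")
    host = ("s3-accelerate.dualstack.amazonaws.com" if dualstack
            else "s3-accelerate.amazonaws.com")
    return f"https://{bucket}.{host}/{quote(key, safe='/-_.~')}"
```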
To improve file download performance, you need to use AWS CloudFront caching. Since CloudFront only caches content from the first request onwards, if you need to improve even the first request's performance in each region, you can pre-populate the cache by requesting the URL from different regions ahead of time (each request only fills the edge location that served it).
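A rough sketch of such cache warming, assuming the objects are reachable through a CloudFront distribution: fetch each new URL and check CloudFront's `X-Cache` response header (`Hit from cloudfront` or `Miss from cloudfront`). Because a request only fills the edge location that served it, this would have to run from machines in each area you care about, for example a small scheduled job per region; that deployment, the `warm` name and the injectable `opener` parameter are assumptions.

```python
import urllib.request
from typing import Callable


def warm(url: str, opener: Callable = urllib.request.urlopen) -> str:
    """Fetch `url` once and return CloudFront's X-Cache header ("" if absent)."""
    req = urllib.request.Request(url, method="GET")
    with opener(req) as resp:
        resp.read()  # download the whole body, as a real viewer would
        return resp.headers.get("X-Cache", "")
```

Running it twice from the same place should show a `Miss` followed by a `Hit` if the object is cacheable.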
After reading some AWS documentation, I am wondering what the difference is between these approaches if I want to deliver content (JS, CSS, images and API requests) in Asia (including China), the US and the EU.
1. Store my images and static files in S3 in a US region and set up cross-region replication to EU and Asia (Japan or Singapore) buckets to stay in sync with the US bucket.
2. Store my images and static files in S3 in a US region and set up a CloudFront CDN to cache my content in different locations after the initial request.
3. Do both of the above (if there is a significant performance improvement).
What is the most cost-effective solution for a global deployment? And how can I make requests from China consistent and stable (I tried CloudFront + S3 (us-west); it's fast, but the performance is not consistent)?
PS. In the early stage I don't expect many user requests, but users are spread globally and I want them to have a similar experience. The majority of my content is panorama images; I'd expect each visit to load ~30MB of data (10 high-res images) sequentially.
Cross region replication will copy everything in a bucket in one region to a different bucket in another region. This is really only for extra backup/redundancy in case an entire AWS region goes down. It has nothing to do with performance. Note that it replicates to a different bucket, so you would need to use different URLs to access the files in each bucket.
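For reference, replication is configured per source bucket. A sketch that builds one rule per destination bucket, in the shape boto3's `put_bucket_replication(Bucket=..., ReplicationConfiguration=...)` expects; the role ARN and bucket names are hypothetical. Versioning must be enabled on the source and every destination, only objects written after the rule is enabled are replicated, and each replica keeps its own bucket URL.

```python
def replication_config(role_arn: str, dest_buckets: list) -> dict:
    """Build an S3 replication configuration with one rule per destination."""
    rules = []
    for priority, bucket in enumerate(dest_buckets, start=1):
        rules.append({
            "ID": f"replicate-to-{bucket}",
            "Status": "Enabled",
            "Priority": priority,  # must be unique across rules
            "Filter": {"Prefix": ""},  # empty prefix = the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": f"arn:aws:s3:::{bucket}"},
        })
    return {"Role": role_arn, "Rules": rules}
```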
CloudFront is a Content Delivery Network. S3 is simply a file storage service. Serving a file directly from S3 can have performance issues, which is why it is a good idea to put a CDN in front of S3. It sounds like you definitely need a CDN, and it sounds like you have tested CloudFront and are unimpressed. It also sounds like you need a CDN with a larger presence in China.
There is no reason you have to choose CloudFront as your CDN just because you are using other AWS services. You should look at other CDN services and see what their edge networks look like. Given your requirements, I would highly recommend you take a look at Cloudflare. They have quite a few edge network locations in China.
Another option might be to use a CDN that you can actually push your files to. I've used this feature in the past with MaxCDN. You would push your files to the CDN via FTP, and the files would automatically be pushed to all edge network locations and cached until you push an update. For your use case of large image downloads, this might provide a more performant caching mechanism. MaxCDN doesn't appear to have a large China presence though, and the bandwidth charges would be more expensive than CloudFlare.
If you want to serve the files in your S3 buckets all around the world, you may consider S3 Transfer Acceleration. It can be used both when uploading to and when downloading from your S3 bucket. Alternatively, you may try AWS Global Accelerator.
CloudFront's job is to cache content at hundreds of caches ("edge locations") around the world, making them more quickly accessible to users around the world. By caching content at locations close to users, users can get responses to their requests more quickly than they otherwise would.
S3 Cross-Region Replication (CRR) simply copies an S3 bucket's objects from one region to another. This is useful for backing up data, and it can also speed up content delivery for a particular region. Unlike CloudFront, which keeps serving a cached copy until it expires or is invalidated, CRR copies new and updated objects to the replica shortly after they are written (replication is asynchronous, typically completing within minutes), which may be important when data needs to be current (e.g. a website with frequently changing content). However, it's also more of a hassle to manage than CloudFront, and more expensive at multi-region scale.
If you want to achieve global deployment in a cost-effective way, then CloudFront would probably be the better of the two, except in the special situation outlined in the previous paragraph.
Here's my situation: users upload their files into an S3 bucket, and I'm using CloudFront to deliver the image files.
But I found that when a user uploads multiple images (around 4~5 pictures, all new uploads, NOT UPDATES), at least one image can't be displayed immediately through CloudFront (AccessDenied error); after around 10 minutes the image shows up correctly. Meanwhile, if I use a VPN connection located elsewhere, the image shows up immediately, so I believe the problem lies with my local CloudFront edge location.
My S3 bucket is located in the US West 2 region (oregon).
Is this situation common? I did a Google search, but I didn't see anyone complaining about the same problem, so I'm worrying if I did something wrong.
Since the Heroku file system is ephemeral, I am planning on using AWS for the static assets of my Django project on Heroku.
I am seeing two conflicting articles. This one says to use S3:
https://devcenter.heroku.com/articles/s3
while the one below says S3 has disadvantages and recommends the CloudFront CDN instead:
https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn
Many developers make use of Amazon’s S3 service for serving static assets that have been uploaded previously, either manually or by some form of build process. Whilst this works, this is not recommended as S3 was designed as a file storage service and not for optimal delivery of files under load. Therefore, serving static assets from S3 is not recommended.
Amazon CloudFront is a Content Delivery Network (CDN) that integrates with other Amazon Web Services like S3 and gives us an easy way to distribute content to end users with low latency and high data transfer speeds.
CloudFront makes your static files available from data centers around the world (called edge locations). When a visitor requests a file from your website, he or she is invisibly routed to a copy of the file at the nearest edge location (AWS now has hundreds of edge locations spread across the world), which results in faster download times than if the visitor had accessed the content from an S3 bucket in a single region.
So if your user base is spread across the world, CloudFront is the better option; if your users are localized, you won't see much difference between CloudFront and S3 (but in that case you need to choose the right location for your S3 bucket: US East, US West, Asia Pacific, EU, South America, etc.).
[Figure: Comparative features of Amazon S3 and CloudFront]
My recommendation is to use CloudFront on top of Whitenoise. You will be serving the static assets directly from your Heroku app, but CloudFront as the CDN will take over once you reach scale.
Whitenoise radically simplifies build processes and removes the need for convoluted caching headers.
Read http://whitenoise.evans.io/en/latest/ for the full manifesto.
(Note that Whitenoise is relevant only for static assets bundled with your app, not for user-uploaded files, which still require S3 for proper storage. You'd still want to use CF though.)
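A minimal `settings.py` excerpt for the Whitenoise + CloudFront setup, following the pattern in the Whitenoise documentation and assuming Django 4.2+ (which uses the `STORAGES` setting). `DJANGO_STATIC_HOST` would hold your CloudFront distribution's domain, with the Heroku app configured as its origin; when it is unset, static files are served straight from the app. `BASE_DIR`, `STATIC_ROOT` and the rest of the settings are omitted.

```python
# settings.py (excerpt)
import os

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # Whitenoise should come directly after SecurityMiddleware.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
]

# e.g. "https://d111111abcdef8.cloudfront.net" (hypothetical distribution)
STATIC_HOST = os.environ.get("DJANGO_STATIC_HOST", "")
STATIC_URL = STATIC_HOST + "/static/"

STORAGES = {
    # User uploads still belong in S3 (see the note above); this is a placeholder.
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    # Hashed, compressed file names let CloudFront cache them for a long time.
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
```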
Actually, you should use both.
CloudFront only acts as a CDN, which basically means it caches resources in edge locations all over the world. In order for this to work, it has to initially download those resources from an origin location, whenever they expire or don't yet exist.
A CloudFront distribution's origin can be an S3 bucket or a custom HTTP origin (such as an EC2 instance). In your case, you should store your assets in S3 and connect the bucket to a CloudFront distribution. Use the CloudFront links for actually serving the assets, and S3 for storage.
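Sketched as Django settings, assuming django-storages 1.14+ and Django 4.2+: files are stored in S3, and `AWS_S3_CUSTOM_DOMAIN` makes the generated URLs point at a CloudFront distribution whose origin is that bucket. The bucket name, region, domain and the `media`/`static` prefixes are placeholders.

```python
# settings.py (excerpt)
import os

AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "my-app-assets")
AWS_S3_REGION_NAME = os.environ.get("AWS_S3_REGION_NAME", "us-west-2")
# Hypothetical CloudFront domain; generated file URLs use this host, not S3's.
AWS_S3_CUSTOM_DOMAIN = os.environ.get("CLOUDFRONT_DOMAIN", "d111111abcdef8.cloudfront.net")
# Serve plain (unsigned) URLs through CloudFront.
AWS_QUERYSTRING_AUTH = False

STORAGES = {
    # User uploads go under media/ in the bucket.
    "default": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {"location": "media"},
    },
    # collectstatic uploads static assets under static/.
    "staticfiles": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {"location": "static"},
    },
}

MEDIA_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/media/"
STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/static/"
```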
This will ensure the best possible performance, as well as correct and scalable load handling.
Hope this helps, let me know if you need additional info in the comments section.