Digital Ocean Spaces CDN - duplicate files to all regions automatically - digital-ocean

I've set up a single DO 'Space' in the Europe region and I want to create identical spaces across all the other available regions. Ideally I would be able to just upload a file once, to one of the buckets, and it would automatically duplicate and sync with all the other regions. The end goal is a global CDN of sorts.
I haven't been able to find any solutions specific to DO which aren't just upload to all regions manually.

Digital Ocean doesn't have a built-in way to replicate a bucket across regions. You'll have to write a script to do this yourself.

Related

S3 buckets composition

I'm developing CMS which runs as a single instance, but serves multiple websites of different users. This CMS needs to store files in storage. Each website can have either few images but also thousands of objects. Currently we serve around 5 websites, but plan to have hundreds, so it must scale easily.
Now I'm thinking about two possible ways to go. I want to use S3 for storage.
solution is to have single bucket for all files in my app
solution is to have one bucket for each website.
According to AWS docs, S3 can handle "virtually unlimited amount of bytes", so I think first solution could work well, but I'm thinking about other aspects:
Isn't it just cleaner to have one bucket for each website? Is it better for maintance?
Which solution is more secure, if so? Are there some security concerns to care about?
Is same applicable to other S3-like services like Minio or DigitalOcean Spaces?
Thank you very much for your answers.
I'd go for solution 1.
From a technical perspective there really is virtually no limit to the amounts of objects you can put in a bucket - S3 is built for extreme scale. For 5 websites option 2 might sound tempting, but that doesn't scale very well.
There's a soft-limit (i.e. you can raise it) of 100 buckets per region or per account, which is an indication that using hundreds of buckets is probably an anti-pattern. Also securing 100s of buckets is not easier than securing one bucket.
Concerning security: You can be very granular with bucket policies in S3 if you need that. You can also choose how you want to encrypt each object individually if that is a requirement. Features like pre-signed URLs can help you grant temporary access to specific objects in S3.
If your goal is to serve static content to end users, you'll have to either make the objects publicly readable, use the aforementioned pre-signed URLs or set up CloudFront as a CDN in front of your bucket.
I don't know how this relates to S3-like services.

How to optimize download speeds from AWS S3 bucket?

We keep user-specific downloadable files on AWS S3 buckets in N.Virginia region. Our clients download the files from these buckets all over the world. Files size ranges from 1-20 GB. For larger files, clients in non-US location face and complain about slow downloads or interrupted downloads. How can we optimize these downloads?
We are thinking about the following approaches:
Accelerated downloads (higher costs)
use of CloudFront CDN with S3 origin (Since our downloads are of different files, each file being downloaded just once or twice, will CDN help since, for 1st time, it will fetch data from US bucket only)
Use of akamai as CDN (same concern as of CloudFront, only thing is we have a better price deal with akamai at org level)
Depending on the user's location (we know where the download will happen), we can keep the file in the specific bucket which was created at that aws region.
So, I want recommendations in terms of cost+download speed. Which may be a better option to explore further?
As each file will only be downloaded a few times, you won't benefit from CloudFront's caching, because the likelihood that the download requests all hit the same CloudFront node and that this node hasn't evicted the file from its cache yet, are probably near zero, especially for such large files.
On the other hand you gain something else by using CloudFront or S3 Transfer Acceleration (the latter one being essentially the same as the first one without caching): The requests enter AWS' network already at the edge, so you can avoid using congested networks from the location of the user to the location of your S3 bucket, which is usually the main reason for slow and interrupted downloads.
Storing the data depending on the users location would improve the situation as well, although CloudFront edge locations are usually closer to a user than the next AWS region with S3. Another reason for not distributing the files to different S3 buckets depending on the users location is the management overhead: You need to manage multiple S3 buckets, store each file in the correct bucket and point each user to the correct bucket. While storing could be simplified by using S3 Replication (you could use a filter to only replicate objects to a specific target bucket meant for this bucket), the overhead with managing multiple endpoints for multiple customers remains. Also while you state that you know the location of the customers, what happens if a customer does change its location and suddenly wants to download an object which is now stored on the other side of the world? You'd have the same problem again.
In your situation I'd probably choose option 2 and set up CloudFront in front of S3. I'd prefer CloudFront over S3 Transfer Acceleration, as it gives you more flexibility: You can use your own domain with HTTPS, you can later on reconfigure origins when the location of the files changes, etc. Depending on how far you want to go you could even combine that with S3 replication and have multiple origins for your CloudFront distribution to direct requests for different files to S3 buckets in different regions.
Which solution to choose depends on your use case and constraints. One constraint seems to be cost for you, another one could for example be the maximum file size of 20GB supported by CloudFront, if you have files to distribute larger than that.

Add whole S3 images bucket to Amazon Rekognition collection

In AWS CLI, I can add to a collection only a single image at a time.
Is there any way to add the whole S3 bucket to a collection?
The IndexFaces() API call accepts only one image at a time, but can index up to 100 faces from that image.
If you wish to add faces from multiple images (eg a whole bucket or folder), you would need to call IndexFaces() multiple times (once per image). This would involve a call to Amazon S3 to list the files, then a loop to call IndexFaces().
It would be relatively simple in a scripting language like Python.

Difference between s3 bucket vs host files for Amazon Cloud Front

Background
We have developed an e-commerce application where I want to use CDN to improve the speed of the app and also to reduce the load on the host.
The application is hosted on an EC2 server and now we are going to use Cloud Front.
Questions
After reading a lot of articles and documents, I have created a distribution for my sample site. After doing all the experience I have come to know the following things. I want to be sure if am right about these points or not.
When we create a distribution it takes all the accessible data from the given origin path. We don't need to copy/ sync our files to cloud front.
We just have to change the path of our application according to this distribution CNAME (if cname is given).
There is no difference between placing the images/js/CSS files on S3 or on our own host. Cloud Front will just take them by itself.
The application will have thousands of pictures of the products, should we place them on S3 or its ok if they are on the host itself? Please share any good article to understand the difference of both the techniques.
Because if S3 is significantly better then I'll have to make a program to sync all such data on S3.
Thanks for the help.
Some reasons to store the images on Amazon S3 rather than your own host (and then serve them via Amazon CloudFront):
Less load on your servers
Even though content is cached in Amazon CloudFront, your servers will still be hit with requests for the first access of each object from every edge location (each edge location maintains its own cache), repeated every time that the object expires. (Refreshes will generate a HEAD request, and will only re-download content that has changed or been flushed from the cache.)
More durable storage
Amazon S3 keeps copies of your data across multiple Availability Zones within the same Region. You could also replicate data between your servers to improve durability but then you would need to manage the replication and pay for storage on every server.
Lower storage cost
Storing data on Amazon S3 is lower cost than storing it on Amazon EBS volumes. If you are planning on keeping your data in both locations, then obviously using S3 is more expensive but you should also consider storing it only on S3, which makes it lower cost, more durable and less for you to backup on your server.
Reasons to NOT use S3:
More moving parts -- maintaining code to move files to S3
Not as convenient as using a local file system
Having to merge log files from S3 and your own servers to gather usage information

Difference between Amazon S3 cross region replication and Cloudfront

After reading some AWS documentations, I am wondering what's the difference between these different use cases if I want to delivery (js, css, images and api request) content in Asia (including China), US, and EU.
Store my images and static files on S3 US region and setup EU and Asia(Japan or Singapore) cross region replication to sync with US region S3.
Store my images and static files on S3 US region and setup cloudfront CDN to cache my content in different locations after initial request.
Do both above (if there is significant performance improvement).
What is the most cost effective solution if I need to achieve global deployment? And how to make request from China consistent and stable (I tried cloudfront+s3(us-west), it's fast but the performance is not consistent)?
PS. In early stage, I don't expect too many user requests, but users spread globally and I want them to have similar experience. The majority of my content are panorama images which I'd expect to load ~30MB (10 high res images) data sequentially in each visit.
Cross region replication will copy everything in a bucket in one region to a different bucket in another region. This is really only for extra backup/redundancy in case an entire AWS region goes down. It has nothing to do with performance. Note that it replicates to a different bucket, so you would need to use different URLs to access the files in each bucket.
CloudFront is a Content Delivery Network. S3 is simply a file storage service. Serving a file directly from S3 can have performance issues, which is why it is a good idea to put a CDN in front of S3. It sounds like you definitely need a CDN, and it sounds like you have tested CloudFront and are unimpressed. It also sounds like you need a CDN with a larger presence in China.
There is no reason you have to chose CloudFront as your CDN just because you are using other AWS services. You should look at other CDN services and see what their edge networks looks like. Given your requirements I would highly recommend you take a look at CloudFlare. They have quite a few edge network locations in China.
Another option might be to use a CDN that you can actually push your files to. I've used this feature in the past with MaxCDN. You would push your files to the CDN via FTP, and the files would automatically be pushed to all edge network locations and cached until you push an update. For your use case of large image downloads, this might provide a more performant caching mechanism. MaxCDN doesn't appear to have a large China presence though, and the bandwidth charges would be more expensive than CloudFlare.
If you want to serve your files in S3 buckets to all around the world, then I believe you may consider using S3 Transfer acceleration. It can be used in cases where you either upload to or download from your S3 bucket . Or you may also try AWS Global Accelerator
CloudFront's job is to cache content at hundreds of caches ("edge locations") around the world, making them more quickly accessible to users around the world. By caching content at locations close to users, users can get responses to their requests more quickly than they otherwise would.
S3 Cross-Region Replication (CRR) simply copies an S3 bucket from one region to another. This is useful for backing up data, and it also can be used to speed up content delivery for a particular region. Unlike CloudFront, CRR supports real-time updating of bucket data, which may be important in situations where data needs to be current (e.g. a website with frequently-changing content). However, it's also more of a hassle to manage than CloudFront is, and more expensive on a multi-region scale.
If you want to achieve global deployment in a cost-effective way, then CloudFront would probably be the better of the two, except in the special situation outlined in the previous paragraph.