I am considering S3 as backup storage for a primary Redis DB.
I would like to be able to archive rarely used data out of Redis and into S3. This, however, raises the question of how quick an S3 Select is. Is it quick enough, for example, to respond to a POST request on Apache?
The data I would be storing is JSON files containing 5 or 6 values for each minute of the day, so each file is unlikely to be larger than a few megabytes and will consist of 1,440 objects (one per minute of the day). Can anyone share their experience with the latency of a Select against data like this?
I am putting a test setup together now, but I don't want to sink time into it if the response times are routinely 5 seconds, for example.
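For reference, the kind of query I have in mind looks roughly like this (a boto3 sketch; the bucket, key, and field names are made up, and it assumes the file is stored as JSON Lines with one object per minute):

```python
import boto3

s3 = boto3.client("s3")

# Bucket, key, and field names here are placeholders for illustration.
response = s3.select_object_content(
    Bucket="redis-archive",
    Key="2020/01/03/day.json",
    ExpressionType="SQL",
    Expression="SELECT * FROM S3Object s WHERE s.minute = 725",
    InputSerialization={"JSON": {"Type": "LINES"}},
    OutputSerialization={"JSON": {}},
)

# The response is an event stream; matching records come back in chunks.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```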
I am working on a project where a photographer uploads a set of high-resolution pictures (20 to 40 pictures). I am going to store each picture twice: one original and one with a watermark. On the platform, only the watermarked pictures will be displayed. The user will be able to buy pictures, and the selected ones (the originals) will be sent by email.
bucket-name: photoshoot-fr
main-folder(s): YYYY-MM-DD-HH_MODEL-NAME example: 2020-01-03_Melania-Doul
I am not sure whether I should have two different folders inside the previous folder, original and protected. Both folders would contain the exact same pictures with the same ID, but one would store the original pictures and the other the protected (watermarked) pictures. Is there a better bucket design?
N.B.: it's a personal project, but there are multiple photographers, and each photographer is going to upload 2-3 sets of photos every day. Each main folder is going to be deleted after 2 months.
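For concreteness, here is a sketch of the layout I'm considering (boto3; the bucket and folder names are the examples above, and the picture ID is made up):

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "photoshoot-fr"
shoot = "2020-01-03_Melania-Doul"   # one main folder per set of photos
picture_id = "IMG_0042"             # hypothetical picture ID

# Same picture ID under both prefixes: one original, one watermarked copy.
s3.upload_file(f"{picture_id}.jpg", BUCKET, f"{shoot}/original/{picture_id}.jpg")
s3.upload_file(f"{picture_id}_watermarked.jpg", BUCKET, f"{shoot}/protected/{picture_id}.jpg")
```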
I have around 6,000 videos that need to be either deleted or moved to a specific project. However, I only get around 1,000 API calls before I am rate limited. Is there any way to send all the videos to be deleted, or all the videos to be moved, in a single API call?
Batch requests are only possible on a select number of endpoints; deleting videos is not one of those endpoints. You'll need to distribute those 6000 video deletion requests over a period of time to avoid rate limit bans.
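A minimal sketch of spacing the calls out, assuming the limit is roughly 1,000 calls per hour and using a hypothetical delete_video() wrapper around whatever client you are calling:

```python
import time

def delete_video(video_id):
    """Hypothetical wrapper around the actual API client call you use."""
    ...

video_ids: list[str] = []        # fill with the ~6,000 video IDs to delete

CALLS_PER_WINDOW = 900           # assumption: stay safely under the ~1,000-call limit
WINDOW_SECONDS = 60 * 60         # assumption: the limit resets hourly

for i, video_id in enumerate(video_ids, start=1):
    delete_video(video_id)
    if i % CALLS_PER_WINDOW == 0:
        time.sleep(WINDOW_SECONDS)   # pause until the rate-limit window resets
```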
Let's say I upload 10,000 documents to CloudSearch. CloudSearch takes some time to index them, and I already have another 10,000 documents lined up to be uploaded. The problem is that my ingestion flow checks whether any document in the second batch already exists in my domain; if it does, it merges the documents and then uploads the result. However, if indexing is still in progress, my ingestion flow might miss some records, and they will be overwritten by the second batch.
How can I solve this problem?
Can I know if the first batch has finished indexing before I start ingestion of the second batch?
We have lots of images stored in AWS S3. We plan to provide thumbnails for users to preview. If I store them in S3, I can only retrieve them one by one, which is not efficient. Should I store them in a database instead? (I need to query my database to decide which set of thumbnails to show the user.)
The best answer depends on the usage pattern of the images.
For many applications, S3 would be the best choice for the simple reason that you can easily use S3 as an origin for CloudFront, Amazon's CDN. By using CloudFront (or indeed any CDN), the images are hosted physically around the world and served from the fastest location for a given user.
With S3, you would not retrieve the images at all. You would simply use S3's URL in your final HTML page (or the CloudFront URL, if you go that route).
If you serve images from the database, that increases resource consumption on the DB (more IO, more CPU, and some RAM used to cache image queries that is not available to cache other queries).
No matter which route you go, pre-create the thumbnail rather than producing it on the fly. Storage space is cheap, and the delay to fetch the image (from S3 or the DB), process it, then re-serve the thumbnail will degrade the user experience. Additionally, if you create the thumbnail on the fly, you cannot benefit from a CDN.
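Here is a minimal sketch of pre-creating the thumbnail at upload time (Pillow plus boto3; the bucket name and key layout are placeholders):

```python
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "my-image-bucket"   # placeholder bucket name

def upload_with_thumbnail(local_path: str, image_id: str) -> None:
    # Store the full-size image.
    s3.upload_file(local_path, BUCKET, f"images/{image_id}.jpg")

    # Create the thumbnail once, at upload time, rather than on the fly.
    img = Image.open(local_path).convert("RGB")
    img.thumbnail((256, 256))            # resizes in place, keeping the aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    buf.seek(0)
    s3.upload_fileobj(buf, BUCKET, f"thumbnails/{image_id}.jpg")
```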
If I store them in S3, I can only retrieve them one by one, which is not efficient.
No, it only looks inefficient because of the way you are using it.
S3 is massively parallel. It can serve your image to tens of thousands of simultaneous users without breaking a sweat. It can serve hundreds of images to the same user in parallel, so you can load 100 images in roughly the same time it takes to load one. So why is your page slow?
Your browser is trying to be a good citizen and only pulls 2-4 images from a site at a time. This "serialization" is what is slowing you down and causing the bottleneck.
You can trick the browser by hosting assets on multiple domains. This is called "domain sharding". You can do it with multiple buckets (put images into 4 different buckets, depending on the last digit of their ID). Alternatively, you can do it with CloudFront: http://abhishek-tiwari.com/post/CloudFront-design-patterns-and-best-practices/
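A rough sketch of the multiple-bucket variant, with made-up bucket names and the shard chosen from the last digit of the image ID:

```python
# Four made-up buckets; the browser treats each hostname separately,
# so requests to different buckets are parallelized.
SHARD_BUCKETS = ["images-0", "images-1", "images-2", "images-3"]

def thumbnail_url(image_id: int) -> str:
    last_digit = image_id % 10                              # last digit of the ID
    bucket = SHARD_BUCKETS[last_digit % len(SHARD_BUCKETS)]
    return f"https://{bucket}.s3.amazonaws.com/thumbnails/{image_id}.jpg"

print(thumbnail_url(1007))   # -> https://images-3.s3.amazonaws.com/thumbnails/1007.jpg
```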
As a best practice, you should store your static assets in S3 and save a reference to them in your database.
In your particular case, you can save the filename or URL of the image file in a database that you can query depending on your business logic.
This gives you a reference to all the images, which you can then fetch from S3 and display to your users.
This also makes it easy to swap a thumbnail reference when needed. For example, if you are running an e-commerce site, you can point the thumbnail reference at a new product image without much effort.
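A minimal sketch of that pattern, using SQLite and made-up table, column, and bucket names:

```python
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS images (product_id TEXT PRIMARY KEY, s3_key TEXT NOT NULL)"
)

# Store only the reference (the S3 key), never the image bytes themselves.
conn.execute(
    "INSERT OR REPLACE INTO images (product_id, s3_key) VALUES (?, ?)",
    ("prod-123", "thumbnails/IMG_0042.jpg"),
)
conn.commit()

# At render time, look up the key and build the URL the page will use.
(s3_key,) = conn.execute(
    "SELECT s3_key FROM images WHERE product_id = ?", ("prod-123",)
).fetchone()
thumbnail_url = f"https://my-image-bucket.s3.amazonaws.com/{s3_key}"   # placeholder bucket
```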
I hope this helps.