In my scenario the CDN requests are for images on the article pages. I want to make sure they exist before rendering; if one doesn't exist, we show a generic one. An example is https://cloudfront.abc.com/CDNSource/teasers/56628.jpg; there is one image per article.
The problem occurs when the synchronous requests hold up page execution for many minutes and the load balancer then times out the request. It looks like this is caused by making HTTP requests with no HTTP timeout.
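To illustrate, this is roughly the kind of check involved, as a minimal sketch assuming Python and the requests library (the generic fallback path is a placeholder):

```python
import requests

GENERIC_IMAGE = "/static/images/generic-teaser.jpg"  # placeholder fallback path

def teaser_image_url(article_id):
    """Return the CDN teaser URL if it exists, otherwise a generic image.

    A short timeout keeps a slow or unresponsive CDN from holding up page
    rendering (the original code made HTTP requests with no timeout at all).
    """
    url = f"https://cloudfront.abc.com/CDNSource/teasers/{article_id}.jpg"
    try:
        # HEAD avoids downloading the image body just to test existence.
        response = requests.head(url, timeout=2)
        if response.status_code == 200:
            return url
    except requests.RequestException:
        pass  # timeout, connection error, DNS failure, etc.
    return GENERIC_IMAGE
```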
Currently my resources are in S3, so perhaps a cron job could sync the data hourly to the webservers, with each webserver taking a copy of the S3 bucket when it is built. For that solution, though, I think I'd need an EBS volume sized to our total image footprint, and I'm not sure what that would be. Can anyone guide me on how to calculate it?
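One way to get that number is simply to sum the object sizes in the bucket; a rough sketch assuming boto3, with hypothetical bucket and prefix names:

```python
import boto3

def bucket_total_size(bucket_name, prefix=""):
    """Sum the size of every object under a prefix, to estimate how large
    an EBS volume a full copy of the bucket would need."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            total_bytes += obj["Size"]
    return total_bytes

# Hypothetical bucket and prefix, for illustration only.
size = bucket_total_size("my-article-images", "CDNSource/teasers/")
print(f"{size / 1024 ** 3:.1f} GiB")
```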
I had tried EFS for session storage previously but found its cost far too high for us to use in production (over $15,000/mo). Please also advise if there is a better solution for this.
In order to learn how to connect a backend to AWS, I am writing a simple notepad application. On the frontend it uses Editor.js as an alternative to a traditional WYSIWYG editor. I am wondering how best to synchronise the images uploaded by a user.
To upload images from disk, I use the following plugin: https://github.com/editor-js/image
In the tool's configuration, I provide the server's API endpoint for uploading the image. In response, the server has to send back the URL of the saved file. My server saves the data to S3 and returns the link.
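For reference, a rough sketch of what such an endpoint can look like, assuming Flask and boto3; the bucket name, route and key scheme are made up, and the success/file.url response shape is what the image tool expects back:

```python
import uuid

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "my-notepad-uploads"  # placeholder bucket name

@app.route("/upload-image", methods=["POST"])
def upload_image():
    # The image plugin posts the file under the "image" form field by default.
    file = request.files["image"]
    key = f"uploads/{uuid.uuid4()}-{file.filename}"
    s3.upload_fileobj(file, BUCKET, key, ExtraArgs={"ContentType": file.mimetype})
    url = f"https://{BUCKET}.s3.amazonaws.com/{key}"
    # Response format the Editor.js image tool expects.
    return jsonify({"success": 1, "file": {"url": url}})
```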
But what if someone, for example, adds and removes the same file over and over again? Each time there will be a new request to AWS.
And here is the main part of the question: should I optimize this somehow in practice? I'm thinking of saving the files temporarily on my server first and only synchronizing with AWS from time to time. How is this done in practice? I would be very grateful if you could share any tips or resources that I may have missed.
I am sorry for possible mistakes in my English; I'm doing my best.
Thank you for your help!
I think you should upload them to S3 as soon as they are available. This way you ensure their availability and resilience against failure of your instance. S3 stores files across multiple availability zones (AZs), ensuring reliable long-term storage. An instance, on the other hand, operates within only one AZ, and if something happens to it, all the data on the instance is lost. So you could potentially lose an entire batch of images if you hold back the uploads.
In addition to that, S3 has virtually unlimited capacity, so you are not risking any storage shortage. When you keep them in batches on an instance, depending on the image sizes, there may be a scenario where you simply run out of space.
Finally, the good practice of developing apps on AWS is to make them stateless. This means that your instances should be considered disposable and interchangeable at any time. This is achieved by not storing any user data on the instances. This enables you to auto-scale your application and makes it fault tolerant.
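To make that concrete, one common stateless pattern is to hand the browser a presigned URL so it uploads straight to S3 and the file never touches the instance at all; a minimal sketch assuming boto3 and a placeholder bucket:

```python
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-notepad-uploads"  # placeholder bucket name

def presigned_upload_url(filename, content_type):
    """Return a short-lived URL the browser can PUT the image to directly,
    keeping the application instance free of user data."""
    key = f"uploads/{uuid.uuid4()}-{filename}"
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": key, "ContentType": content_type},
        ExpiresIn=300,  # valid for 5 minutes
    )
    return {
        "upload_url": upload_url,
        "public_url": f"https://{BUCKET}.s3.amazonaws.com/{key}",
    }
```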
Based on the discussions here, I made my Google Cloud Storage image public, but it is still taking a long time for TTFB. Any idea why? How can I reduce the TTFB when calling Google Cloud Storage? My URL and a snapshot of what I see in the developer tools are given below.
Public image URL
OK, now I understand your question. Your concern is how to reduce the TTFB when requesting an image from Google Cloud Storage. There is no magic way to reduce the TTFB to 0; that is close to impossible. Time to First Byte is how long the browser has to wait before it starts receiving data. For the specific case of Google Cloud Storage it is, broadly, the time between you requesting an image, that request being delivered to the Google server where your image is stored, and that server finding the image and starting to send it back to you.
This depends on two main factors:
The speed at which the request travels to and from the server. This depends on the speed of your connection and the distance between you and the server. Fetching an image from the USA is not the same as fetching it from India; the two will give you very different TTFBs.
You can see this in the example below, where I fetch the same image from two different buckets with a public policy. For reference, I'm in Europe.
Here is my result calling the image from a bucket in Europe:
And here is my result calling the image from India:
As you can see, my download time doesn't increase much, while my TTFB is doubled.
The second factor is how quickly the server processes the request. In this case you don't have much influence, since you are requesting the image directly from Google Cloud Storage and can't modify the code. The only way to influence it is by removing work from the request. Making the image public helps here because the server no longer has to check credentials or permissions; it just sends you back the image.
So, in conclusion, there is not much you can do here to reduce the TTFB beyond selecting a bucket location closer to your users and improving your connection speed.
I found this article really useful; it could help you better understand TTFB and how to interpret its measurement.
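If you want to measure TTFB yourself outside the browser's developer tools, here is a small sketch using Python's requests library (the object URLs are placeholders):

```python
import time

import requests

def time_to_first_byte(url):
    """Rough TTFB: time from sending the request until the first byte of
    the response body arrives (includes connection setup)."""
    start = time.monotonic()
    with requests.get(url, stream=True) as response:
        next(response.iter_content(chunk_size=1))  # block until the first byte
        return time.monotonic() - start

# Placeholder objects in two different bucket locations.
for url in (
    "https://storage.googleapis.com/my-eu-bucket/sample.jpg",
    "https://storage.googleapis.com/my-asia-bucket/sample.jpg",
):
    print(url, f"{time_to_first_byte(url) * 1000:.0f} ms")
```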
Thanks, I have moved my bucket to a location nearer to my users. That has reduced the time.
For my own education, I'm currently trying to achieve the fastest site speed possible for a landing page. I have hosted it once on SiteGround with all speed optimizations on (minify, Level 3 SuperCacher (Memcached), lazy loading, Cloudflare).
I have set up a 99%-identical site on AWS (100% is not possible since SG has its own optimizer).
I assumed AWS would be faster, but when I look at the developer toolbar, Pingdom or GTmetrix, SG wins. The reason is that all files have a longer waiting time. I know the difference is minimal, but since I want to achieve maximum speed I'm wondering what the reason is. I tried a bigger instance, but changing from t2.micro to m4.16xlarge didn't make a difference; I'm not sure it would anyway without visitors.
This is the loading time on Siteground:
This is on my AWS Site:
The difference is that the JS and CSS files on SG get loaded in 20-30ms and the files on AWS in 40-50ms.
The last options I could try would be Varnish Cache or moving to Lightsail, but I'm not sure if that will help.
I have a web app which serves media files (in other words, pretty large files) with public access. The files are hosted on S3. I'm wondering if AWS offers any kind of abuse protection, for example detecting or preventing download hogs via some type of rate limiting. A scenario might be a single source re-downloading the same content repeatedly. I was hoping there might be some mechanism to detect that behavior and either take preventative action or notify me.
I've looked through the AWS docs and don't see anything, but perhaps I'm not looking in the right place.
How do folks who host files which are available publicly handle this?
S3 is mostly a file storage service, with elementary web server capabilities. I would highly recommend you place a CDN between your end users and S3. A good CDN will provide protection from the sort of abuse you are talking about, while also serving the files to the user more quickly.
If you are mostly worried about how the abuse will affect your bills (and they can get very large, so it's good to be concerned about this), I would suggest setting up billing alerts on your account that trigger when certain thresholds are reached.
I have a set of stepped alarms on my account so that I know when spending hits 25%, 50%, 75% and 100% of what I budget each month. That way, for example, if an alarm tells me I have used 25% of my budget in the first two days of the month, I know I had better look into it.
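Those alarms can be set up in the billing console, but as a sketch of the same stepped-threshold idea using boto3 (the SNS topic ARN and the budget figure are placeholders; billing metrics are only published in us-east-1):

```python
import boto3

# Billing metrics live only in us-east-1.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

MONTHLY_BUDGET_USD = 400  # placeholder monthly budget
SNS_TOPIC = "arn:aws:sns:us-east-1:123456789012:billing-alerts"  # placeholder

for pct in (25, 50, 75, 100):
    cloudwatch.put_metric_alarm(
        AlarmName=f"billing-{pct}-percent",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,  # estimated charges update roughly every 6 hours
        EvaluationPeriods=1,
        Threshold=MONTHLY_BUDGET_USD * pct / 100,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[SNS_TOPIC],
    )
```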
I have a PHP-based application on one Amazon instance for uploading and transcoding audio files. The application first uploads the file, then transcodes it, and finally puts it in an S3 bucket. At the moment the application shows the progress of the upload and transcoding via repeated AJAX requests that monitor the file size in a temporary folder.
I keep wondering what happens if users rush to my service tomorrow and I need to scale it in AWS in whatever way possible.
A: What will happen to my upload and transcoding technique?
B: If I add more instances, does that mean I will have different files in different temporary conversion folders in different physical places?
C: If I want to poll the file size by AJAX from http://www.example.com/filesize until the process finishes, do I need the real address of each EC2 instance (I mean IP or DNS), or access to all of the instances' folders (or one shared folder)?
D: When we scale, what happens to the temporary folder? Is it correct that all the instances, apart from their own LAMP stack, point to one root folder on the main instance?
I have some basic knowledge of scaling with other hosting approaches, but with Amazon these questions are on my mind.
Thanks for any advice.
It is difficult to answer your questions without knowing considerably more about your application architecture, but given that you're using temporary files, here's a guess:
Your ability to scale depends entirely on your architecture, and of course having a wallet deep enough to pay.
Yes. If you're generating temporary files on individual machines, they won't be stored in a shared place the way you currently describe it.
Yes. You need some way to know where the files are stored. You might be able to get around this with an ELB stickiness policy (i.e. traffic through the ELB gets routed to the same instance), but stickiness policies are kind of a pain and won't necessarily solve your problem.
Not quite sure what the question is here.
As it sounds like you're in the early days of your application, give this tutorial and this tutorial a peek. The first one describes a thumbnailing service built on Amazon SQS, the second a video processing one. They'll help you design with best AWS practices in mind, and help you avoid many of the issues you're worried about now.
One way you could get around scaling and session stickiness is to have the transcoding process update a database with its current progress. Any returning user checks the database to see the progress of their upload. There is no need to keep track of where the transcoding is taking place, since the progress is stored in a single place.
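As a sketch of that approach, assuming a DynamoDB table keyed by a job ID (table and attribute names are made up for illustration): the transcoding worker writes its progress, and whichever instance happens to serve the AJAX poll just reads it back.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
jobs = dynamodb.Table("transcode-jobs")  # hypothetical table, partition key: job_id

def report_progress(job_id, percent):
    """Called by the transcoding worker as it processes the file."""
    jobs.update_item(
        Key={"job_id": job_id},
        UpdateExpression="SET progress = :p",
        ExpressionAttributeValues={":p": percent},
    )

def get_progress(job_id):
    """Called by whichever web instance receives the AJAX poll."""
    item = jobs.get_item(Key={"job_id": job_id}).get("Item")
    return item["progress"] if item else 0
```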
However, as Christopher said, we don't really know anything about your application; any advice we give is really looking from the outside in, and we don't have a good idea of what would be easiest for you to do. This seems like a pretty simple solution, but I could be missing something because I don't know anything about your application or architecture.