I want to develop an app for a friend's small business that will store and serve media files. However, I'm afraid of a piece of media going viral, or of getting DDoS'd. The bill could go up quite easily with a service like S3, and I really want to avoid surprise expenses like that. Ideally I'd like some kind of max-bandwidth limit.
Now, a solution for this with S3 has been posted here.
But it does require quite a few steps, so I'm wondering if there is a cloud storage solution that makes this simpler, i.e. where I don't need to create a custom microservice. I've talked to DigitalOcean support and they don't support this either.
So in the interest of saving time, and perhaps to help anyone else who finds themselves in a similar dilemma, I want to ask this question here; I hope that's okay.
Thanks!
Not an out-of-the-box solution, but you could:
Keep the content private
When rendering a web page that contains the file or links to the file, have your back-end generate an Amazon S3 pre-signed URL to grant time-limited access to the object
The back-end could keep track of the "popularity" of the file and, if it exceeds a certain rate (e.g. 1,000 requests over 15 minutes), instead point to a small file with a "please try later" message; a rough sketch of this is below
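As an illustration only (not a drop-in implementation), the back-end piece could look roughly like this in Python with boto3. The bucket name, fallback object and threshold are placeholders, and the request counter is assumed to come from wherever you track it (e.g. Redis INCR with a TTL):

```python
# Rough sketch only (Python + boto3). Bucket name, fallback object and the
# 1,000-per-15-minutes threshold are placeholders.
import boto3

s3 = boto3.client("s3")

BUCKET = "my-media-bucket"              # hypothetical bucket
FALLBACK_KEY = "please-try-later.txt"   # small "please try later" object
LIMIT = 1000                            # max requests per window
WINDOW_SECONDS = 15 * 60                # 15-minute window (enforced by whatever counts requests)

def media_url(key: str, requests_in_window: int) -> str:
    # Point at the real object while it's under the limit, otherwise at the
    # fallback object; either way the URL only works for a few minutes.
    target = key if requests_in_window < LIMIT else FALLBACK_KEY
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": target},
        ExpiresIn=300,  # URL valid for 5 minutes
    )
```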
My problem:
I want to stream media I record on the client (TypeScript code) to my AWS storage (services like YouTube / Twitch / Zoom / Google Meet can record live and save the recording to their cloud; some of them are even host-failure tolerant and still create a file if the host disconnects).
I want each stream to have a different file name so that future triggers can work off it.
I tried to save the stream into S3, but maybe there are better-suited storage solutions for my problem.
What services I tried:
S3: I tried to stream directly into S3, but it doesn't really support updating files.
I tried multipart uploads, but they are not host-failure tolerant.
I tried to upload each part separately and have a Lambda merge them (yes, it is very dirty and resource-consuming), but I sometimes had ordering problems (see the sketch after this list).
Kinesis-Video: I tried to use kinesis-video but couldn't enable the saving feature with the SDK.
Trying it by hand, I saw that it saves a new file after a period of time or once a certain size is reached, so it may not be the solution I want.
Amazon IVS: I tried it because Twitch recommends it, although it is well beyond my requirements.
I couldn't find a code example of what I want to do with the SDK (only manual, console-based examples).
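For reference, the multipart attempt mentioned above was roughly along these lines; this is a Python/boto3 sketch rather than my actual TypeScript client code, and the bucket/key names are made up. Part ordering is controlled by PartNumber, but nothing is readable until the upload is completed, which is the host-failure problem:

```python
# Sketch only: S3 multipart upload with explicit part numbers (boto3).
# Bucket and key names are hypothetical. PartNumber keeps ordering
# deterministic, but if the uploader dies before complete_multipart_upload
# the object never appears, so this is still not host-failure tolerant.
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-recordings", "stream-2024-01-01.webm"  # hypothetical

upload = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
parts = []

def upload_chunk(chunk: bytes, part_number: int) -> None:
    # Every part except the last must be at least 5 MiB.
    resp = s3.upload_part(
        Bucket=BUCKET,
        Key=KEY,
        UploadId=upload["UploadId"],
        PartNumber=part_number,  # 1-based; S3 assembles parts in this order
        Body=chunk,
    )
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

def finish() -> None:
    s3.complete_multipart_upload(
        Bucket=BUCKET,
        Key=KEY,
        UploadId=upload["UploadId"],
        MultipartUpload={"Parts": sorted(parts, key=lambda p: p["PartNumber"])},
    )
```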
Questions
Am I looking at the right services?
What can I do with the AWS-SDK to make it work?
Is there a good place with code examples for future problems? Or maybe a way to search for solutions?
Thank you for your help.
There are cases in a project where I'd like to store images on a model.
For example:
Company Logos
Profile Pictures
Programming Languages
Etc.
Recently I've been using AWS S3 for file storage (primarily hosting on Heroku) via ImageField uploads.
I feel like there's a better way to store files than what I've been doing.
For some things (like the examples above), I think it would make sense to just get an image URL from a more publicly available source rather than taking up space in my own database.
For the experts in the Django community who have built and deployed really professional projects, do you typically store files directly into the Django media folder via ImageField?
Or do you normally use a URLField and pull a URL from an API or an image link from the web (e.g. right-click any image on Google Images, copy the image URL, and paste it)?
Bonus: What does your image storing setup look like?
Hope this makes sense.
Thanks in advance!
The standard is what you've described: using something like AWS S3 to store the actual image and handling the URL in your database. Here are a few reasons why:
It's cheap. Like, really cheap.
Instead of making your web server serve the files, you're offloading that onto the client (e.g. their browser grabbing the file from S3)
If you're using an ephemeral system (like Heroku), your only option is to use something like S3.
Control. Sure, you can pull an image link from somewhere else that isn't managed by you. But this does not scale. What happens if that server goes offline? What if they take that image down? This way, you control what happens to the objects.
An example of a decently large internet company, though not large enough to run its own infrastructure (the way Facebook/Instagram, Google, etc. do), is VSCO. They take in a decent number of photo uploads every day and handle them with AWS.
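In Django terms, a common way to wire this up is django-storages with its S3 backend. A minimal sketch follows; the bucket name, region and model are placeholders, and on newer Django versions the STORAGES setting replaces DEFAULT_FILE_STORAGE:

```python
# settings.py -- a common django-storages + S3 setup; all values are placeholders.
# Requires: pip install django-storages boto3
INSTALLED_APPS = [
    # ... your other apps ...
    "storages",
]

# On Django 4.2+ the STORAGES setting replaces DEFAULT_FILE_STORAGE.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-app-media"   # your bucket
AWS_S3_REGION_NAME = "us-east-1"           # your region
AWS_QUERYSTRING_AUTH = False               # plain public URLs instead of signed ones

# models.py -- the model keeps using ImageField; only the file's S3 key/URL
# ends up in your database, while the bytes live in the bucket.
from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=200)
    logo = models.ImageField(upload_to="logos/")
```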
I am relatively new to Amazon Web Services. A problem came up while I was coding my new web app. I am currently storing profile pictures in an S3 bucket.
I don’t want these profile pictures to be seen by the public, only authorized members. So I have a php file like this:
This php file executes getObject and sends out a header to show the picture, but only if the user is allowed to see it. I query the database and also check the session to make sure that the currently logged-in user has access to the picture. All is working fine, but it takes around 500 milliseconds for the GET request to execute, even on small files (40 KB). On bigger files it gets even longer, and the same happens if I embed the php file in an img tag multiple times with different query string values.
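The PHP file itself isn't shown here; purely as an illustration of the pattern it follows (check authorization, then proxy the private object), here is a rough sketch in Python with boto3 and Flask. The bucket name, route and authorization check are placeholders, not my actual code:

```python
# Illustration only: the "check auth, then proxy the private object" pattern
# sketched in Python with boto3 and Flask. Bucket name, route and the
# authorization check are placeholders, not the original PHP.
import boto3
from flask import Flask, Response, abort, session

app = Flask(__name__)
app.secret_key = "change-me"  # needed for session support
s3 = boto3.client("s3")
BUCKET = "profile-pictures"   # hypothetical bucket

def user_may_view(user_id, picture_key) -> bool:
    # Stand-in for the database + session check described above.
    return user_id is not None

@app.route("/picture/<key>")
def picture(key):
    if not user_may_view(session.get("user_id"), key):
        abort(403)
    obj = s3.get_object(Bucket=BUCKET, Key=key)  # the slow call in question
    return Response(obj["Body"].read(),
                    mimetype=obj.get("ContentType", "image/jpeg"))
```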
I need to mention that I'm testing this in a localhost environment with an Apache web server.
Could the problem be that getObject is optimized to be run from an EC2 instance, and that if I tested this on EC2 the response time would be much better?
My S3 bucket is in the London region, and I'm testing from Hungary with a good internet connection, so I'm not sure if this response time is what I should expect here.
I read that other people had similar issues, but from my understanding the time it takes S3 to transfer files to an EC2 instance should be minimal, as they are all in the cloud and the latency between these and all the other AWS services should be minimal (at least if they are in the same region).
Please don’t tell me in comments that I should just make my bucket public and embed the direct link to the file as it is not a viable option for obvious reasons. I also don’t want to generate pre-signed urls for various reasons.
I also tested this without querying the database and essentially the only logic in my code is to get the object and show it to the user. Even with this I get 400+ milliseconds response time.
I also tried using doesObjectExist() and I still need to wait around 300-400 milliseconds for that to give me a response.
UPDATE
I tested it on my EC2 instance and got a much better response time. I tested it with multiple files and all is fine. It seems that if you call getObject from localhost, the time it takes to connect to S3 and fetch the data multiplies.
Thank you for the answers!
I have a web app which serves media files (in other words, pretty large files) with public access. The files are hosted on S3. I'm wondering if AWS offers any kind of abuse protection, for example detection or prevention of download hogs via some type of rate limiting. A scenario might be a single source re-downloading the same content repeatedly. I was hoping there might be some mechanism to detect that behavior and either take preventative action or notify me.
I'm looking at AWS docs and don't see anything but perhaps I'm not looking smartly enough.
How do folks who host files which are available publicly handle this?
S3 is mostly a file storage service, with elementary web server capabilities. I would highly recommend you place a CDN between your end users and S3. A good CDN will provide protection from the sort of abuse you are talking about, while also serving the files to the user more quickly.
If you are mostly worried about how the abuse will affect your bills (and they can get very large, so it's good to be concerned about this), I would suggest that you put some billing alerts on your account that alarm when certain thresholds are reached.
I have step alarms set on my account so that I know when spending hits 25%, 50%, 75% and 100% of what I budget each month. That way, for example, if I hit an alarm that tells me I have used 25% of my budget in the first two days of the month, I know I had better look into it.
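If you'd rather script those step alarms than click them together in the console, a sketch with boto3 might look like this. The budget figure and SNS topic ARN are placeholders; note that billing metrics only exist in us-east-1 and "Receive Billing Alerts" must be enabled on the account first:

```python
# Sketch: creating 25/50/75/100% step alarms with boto3 and CloudWatch's
# AWS/Billing metric. Budget and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

MONTHLY_BUDGET_USD = 100.0  # hypothetical monthly budget
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:billing-alerts"  # placeholder

for pct in (25, 50, 75, 100):
    cloudwatch.put_metric_alarm(
        AlarmName=f"billing-{pct}-percent-of-budget",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=6 * 60 * 60,  # the billing metric only updates a few times a day
        EvaluationPeriods=1,
        Threshold=MONTHLY_BUDGET_USD * pct / 100,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[SNS_TOPIC_ARN],
    )
```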
I have a PHP-based application on one Amazon instance for uploading and transcoding audio files. The application first uploads the file, then transcodes it, and finally puts it in an S3 bucket. At the moment the application shows the progress of file uploading and transcoding based on repeated ajax requests that monitor the file size in a temporary folder.
I keep wondering what happens if users rush to my service tomorrow and I need to scale it in AWS by whatever means possible.
A: What will happen to my upload and transcoding technique?
B: If I add more instances, does it mean I will have different files in different temporary conversion folders in different physical places?
C: If I want to poll the file size by ajax from http://www.example.com/filesize until the process finishes, do I need the real address of each EC2 instance (I mean IP / DNS), or all of the instances' folders (or folder)?
D: When we scale, what happens to the temporary folder? Is it correct that all of the instances, apart from their own LAMP stacks, point to one root folder on the main instance?
I have some basic knowledge of scaling with other hosting techniques, but with Amazon these questions are on my mind.
Thanks for the advice.
It is difficult to answer your questions without knowing considerably more about your application architecture, but given that you're using temporary files, here's a guess:
A: Your ability to scale depends entirely on your architecture, and of course on having a wallet deep enough to pay.
B: Yes. If you're generating temporary files on individual machines, they won't be stored in a shared place the way you currently describe it.
C: Yes. You need some way to know where the files are stored. You might be able to get around this with an ELB stickiness policy (i.e. traffic through the ELB gets routed to the same instance), but they are kind of a pain and won't necessarily solve your problem.
D: Not quite sure what the question is here.
As it sounds like you're in the early days of your application, give this tutorial and this tutorial a peek. The first one describes a thumbnailing service built on Amazon SQS, the second a video processing one. They'll help you design with best AWS practices in mind, and help you avoid many of the issues you're worried about now.
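The queue-based decoupling those tutorials describe boils down to something like the sketch below (Python/boto3). The queue URL and message fields are placeholders, not taken from the tutorials; the point is that the web tier and the worker tier only share S3 and a queue, so each can scale independently:

```python
# Sketch of SQS-based decoupling between a web tier and transcoding workers.
# Queue URL and message shape are placeholders.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # placeholder

def enqueue_job(bucket: str, key: str) -> None:
    # Web tier: store the uploaded file in S3, then enqueue a transcode job.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": bucket, "key": key}),
    )

def work_forever(transcode) -> None:
    # Worker tier (any number of instances): pull jobs and transcode them.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            transcode(job["bucket"], job["key"])  # do the actual work
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```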
One way you could get around scaling and session stickiness is to have the transcoding update a database with the current progress. Any returning user checks the database to see the progress of their upload. There is no need to keep track of where the transcoding is taking place, since the progress gets stored in a single place.
However, like Christopher said, we don't really know anything about your application; any advice we give is really looking from the outside in, and we don't have a good idea of what would be the easiest thing for you to do. This seems like a pretty simple solution, but I could be missing something because I don't know anything about your application or architecture.
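For what it's worth, the progress-in-the-database idea above is only a few lines. Here's a minimal sketch; sqlite3 is just a stand-in so the example runs locally, whereas in your setup it would be whatever shared database the app already uses, and the table/column names are made up:

```python
# Minimal sketch of shared progress reporting. sqlite3 is only a stand-in;
# in practice this would be the application's shared database (e.g. MySQL/RDS).
import sqlite3
from typing import Optional

conn = sqlite3.connect("progress.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS transcode_progress ("
    " job_id TEXT PRIMARY KEY, percent INTEGER, updated_at TEXT)"
)

def report_progress(job_id: str, percent: int) -> None:
    # Called by whichever instance happens to be doing the transcoding.
    conn.execute(
        "INSERT OR REPLACE INTO transcode_progress (job_id, percent, updated_at)"
        " VALUES (?, ?, datetime('now'))",
        (job_id, percent),
    )
    conn.commit()

def get_progress(job_id: str) -> Optional[int]:
    # Called by whichever instance serves the user's ajax polling request.
    row = conn.execute(
        "SELECT percent FROM transcode_progress WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None
```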