Multiple organizations/accounts and dev environments for S3 buckets (amazon-web-services)

As per best practice, AWS resources should be separated per account (prod, stage, ...), and it's also good to give devs their own accounts with defined limits (budget, region, ...).
I'm now wondering how I can create a fully working dev environment, especially when it comes to S3 buckets.
Most of the services are pay-per-use, so it's totally fine to spin up some Lambdas, SQS queues, etc. and use the real services for dev.
Now to the real question: what should be done with static assets like pictures, downloads, and so on which are stored in S3 buckets?
Duplicating those buckets for every dev/environment could get expensive, as you pay for storage and/or data transfer.
What I thought of was giving the dev's S3 bucket a redirect rule: when a file is not found (i.e. a 404) in the dev bucket, it redirects to the prod bucket so that images etc. are retrieved from there.
I have tested this and it works pretty well, but it only solves part of the problem.
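For reference, such a routing rule can be set with boto3 roughly like this (a minimal sketch, assuming static website hosting on the dev bucket; the bucket name "my-dev-assets" and the prod host "prod-assets.example.com" are placeholders, not our real names):

import boto3

s3 = boto3.client("s3")

# Redirect 404s on the dev website endpoint to the prod bucket's endpoint,
# so assets missing in dev fall through to production.
s3.put_bucket_website(
    Bucket="my-dev-assets",  # placeholder dev bucket
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [
            {
                "Condition": {"HttpErrorCodeReturnedEquals": "404"},
                "Redirect": {
                    "HostName": "prod-assets.example.com",  # placeholder prod endpoint
                    "Protocol": "https",
                    "HttpRedirectCode": "302",
                },
            }
        ],
    },
)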
The other part is: how do you replace those files in a convenient way?
Currently, static assets and downloads are also in our Git repo (maybe not the best idea after all ... how do you handle file changes that should go live with new features? Right now it's convenient to have them in Git as well), and when someone changes something they push it and it gets deployed to prod.
We could of course sync the dev's S3 bucket back to the prod bucket with the newly uploaded files, but how can this be combined with merge requests for a good CI/CD experience?
What are your solutions for giving every dev their own S3 buckets so that they can spin up a completely working dev environment with everything available to them?

My experience is that you don't want to complicate things just to save a few dollars. S3 costs are pretty cheap, so if you're just talking about website assets, like HTML, CSS, JavaScript, and some images, then you're probably going to spend more time creating, managing, and troubleshooting a solution than you'll save. Time is, after all, your most precious resource.
If you do have large items that need to be stored to make your system work, then maybe put a lifecycle policy on the dev bucket that deletes those large items after some reasonable amount of time. If/when a dev needs such an object, they can retrieve it again from its source and upload it again manually. You could write a script to do that pretty easily.
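As a rough sketch of that idea with boto3 (bucket name, prefix, and the 7-day window are all illustrative placeholders), a lifecycle rule that expires large objects under a given prefix could look like this:

import boto3

s3 = boto3.client("s3")

# Expire objects under the "large-assets/" prefix after 7 days;
# devs can re-fetch and re-upload them from the source when needed.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-dev-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-large-dev-assets",
                "Status": "Enabled",
                "Filter": {"Prefix": "large-assets/"},
                "Expiration": {"Days": 7},
            }
        ]
    },
)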

Related

Upload, compress and serve video with AWS

I am creating a mobile app as my own project. One of the functionalities of the application is uploading video files (up to 500 MB) and watching the videos uploaded by other users.
I have been thinking about various server solutions, and many people have many different opinions. Unfortunately, it is hard for me to find someone among my friends who knows the topic well and would be able to advise. To start with, I think it makes sense to use AWS (but I've never done it), and I would like to ask you for advice, if I may.
In step one, I upload a video file to AWS S3 via the application.
AWS MediaConvert compresses the video file; the old one is removed and replaced with the new one (Elastic Transcoder is very expensive).
In the application, I can paste direct links to S3 which I can use to serve the videos.
As far as I understand, I don't need any services other than AWS S3 and AWS MediaConvert.
Or maybe my thinking is wrong and using Amazon for this does not make sense?
Thanks!
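Not an answer to the service choice itself, but to illustrate step one: a backend could hand the app a pre-signed upload URL so the video goes straight to S3. A minimal boto3 sketch, where the bucket name and key are placeholder assumptions:

import boto3

s3 = boto3.client("s3")

# Generate a time-limited URL the mobile app can PUT the video file to directly.
upload_url = s3.generate_presigned_url(
    ClientMethod="put_object",
    Params={"Bucket": "my-video-uploads", "Key": "raw/user123/video.mp4"},  # placeholders
    ExpiresIn=3600,  # one hour
)
print(upload_url)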

Cloud File Storage with Bandwidth Limits

I want to develop an app for a friend's small business that will store/serve media files. However, I'm afraid of a piece of media going viral, or of getting DDoS'd. The bill could go up quite easily with a service like S3, and I really want to avoid surprise expenses like that. Ideally I'd like some kind of max-bandwidth limit.
Now, a solution for this with S3 has been posted here.
But it does require quite a few steps, so I'm wondering if there is a cloud storage solution that makes this simpler, i.e. where I don't need to create a custom microservice. I've talked to support at DigitalOcean and they don't support this either.
So in the interest of saving time, and perhaps for anyone else who finds themselves in a similar dilemma, I want to ask this question here; I hope that's okay.
Thanks!
Not an out-of-the-box solution, but you could:
Keep the content private
When rendering a web page that contains the file or links to the file, have your back-end generate a pre-signed Amazon S3 URL to grant time-limited access to the object
The back-end could keep track of the "popularity" of the file and, if it exceeds a certain rate (e.g. 1,000 requests over 15 minutes), it could instead point to a small file with a message of "please try later"
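A rough sketch of that idea in boto3 (the bucket name, the in-memory per-process counter, and the 1,000-per-15-minutes threshold are all illustrative assumptions, not a production design):

import time
from collections import defaultdict

import boto3

s3 = boto3.client("s3")

WINDOW_SECONDS = 15 * 60
MAX_REQUESTS = 1000
_hits = defaultdict(list)  # key -> recent request timestamps (in-memory, per process)

def media_url(key, bucket="my-media-bucket"):  # placeholder bucket
    now = time.time()
    # Drop timestamps outside the window and record this request.
    _hits[key] = [t for t in _hits[key] if now - t < WINDOW_SECONDS]
    _hits[key].append(now)

    # Too popular? Point to a small static "please try later" object instead.
    if len(_hits[key]) > MAX_REQUESTS:
        key = "please-try-later.html"

    # Time-limited access to the otherwise private object.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=300,  # five minutes
    )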

Videos written with moviepy on amazon aws S3 are empty

I am working on processing a dataset of large videos (~100 GB) for a collaborative project. To make it easier to share data and results, I am keeping all of the videos remotely in an Amazon S3 bucket, and processing them by mounting the bucket on an EC2 instance.
One of the processing steps I am trying to do involves cropping the videos, and rewriting them into smaller segments. I am doing this with moviepy, splitting the video with the subclip method and calling:
subclip.write_videofile("PathtoS3Bucket" + VideoName.split('.')[0] + 'part' + str(segment) + '.mp4', codec='mpeg4', bitrate="1500k", threads=2)
I found that when the videos are too large (parameters set as above) calls to this function will sometimes generate empty files in my S3 bucket (~10% of the time). Does anyone have insight into features of moviepy/ffmpeg/S3 that would lead to this?
It is recommended not to use tools such as s3fs because these merely simulate a file system, whereas Amazon S3 is an object storage system.
It is generally better to create files locally, then copy them to S3 using standard API calls.
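A minimal sketch of that pattern with moviepy and boto3 (the local paths, bucket name, key, and the 60-second cut are placeholder assumptions): write the segment to local/instance storage first, then upload the finished file to S3.

import boto3
from moviepy.editor import VideoFileClip

s3 = boto3.client("s3")

# Cut the segment from a locally available copy of the source video.
clip = VideoFileClip("/tmp/source_video.mp4")  # placeholder local path
subclip = clip.subclip(0, 60)  # e.g. the first 60 seconds

# Write the clip to local disk first (not to the s3fs mount) ...
local_path = "/tmp/video_part1.mp4"
subclip.write_videofile(local_path, codec="mpeg4", bitrate="1500k", threads=2)

# ... then copy the finished file to S3 with a standard API call.
s3.upload_file(local_path, "my-video-bucket", "processed/video_part1.mp4")  # placeholder bucket/key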

Sync data between EC2 instances

While I'm looking to move our servers to AWS, I'm trying to figure out how to sync data between our web nodes.
I would like to mount a disk on every web node and have a local cache of the entire share.
Are there any preferred ways to do this?
Sounds like you should consider storing your files on S3 in the first place and, if performance is key, having a sync job that pulls copies of the files down to your EC2 instances. S3 is fast, durable, and cheap - maybe even fast enough without keeping a local cache - but if you do indeed need a local copy, there are tools such as the AWS CLI and other third-party tools.
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
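If you'd rather script it than shell out to the CLI, a very simplified pull-style sync with boto3 might look like the sketch below (bucket, prefix, and local directory are placeholders; it only fetches objects missing locally and does not compare sizes or timestamps the way "aws s3 sync" does):

import os
import boto3

s3 = boto3.client("s3")

def pull_missing(bucket="my-shared-bucket", prefix="shared/", local_root="/var/www/shared"):
    """Download any object under the prefix that is not yet cached locally."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            rel = obj["Key"][len(prefix):]
            if not rel:
                continue  # skip the prefix "folder" entry itself
            local_path = os.path.join(local_root, rel)
            if os.path.exists(local_path):
                continue  # already cached; a fuller sync would also compare size/mtime
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, obj["Key"], local_path)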
Depending on what you are trying to sync - take a look at
http://aws.amazon.com/elasticache/
It is an extremely fast and efficient method for sharing data.
One really easy solution is to install the Dropbox sync client on both machines and keep your files in Dropbox. This is by far the easiest!
With this approach, you can "load" data onto the machines by adding files to your Dropbox account externally (without even touching an AWS service) - from another machine or even from the Dropbox browser interface.

S3 Incremental Backups

I am currently using S3 to store large quantities of account-level data such as images, text files, and other forms of durable content that users upload in my application. I am looking to take an incremental snapshot of this data (once per week) and ship it off to another S3 bucket. I'd like to do this in order to protect against accidental data loss, e.g. one of our engineers accidentally deleting a chunk of data in the S3 browser.
Can anyone suggest some methodology for achieving this? Would we need to host our own backup application on an EC2 instance? Is there an application that will handle this out of the box? The data can go into S3 Glacier and doesn't need to be readily accessible, it's more of an insurance policy than anything else.
EDIT 1
I believe switching on versioning may be the answer (continuing to research this):
http://docs.amazonwebservices.com/AmazonS3/latest/dev/Versioning.html
EDIT 2
For others looking for answers to this question, there is a good thread on ServerFault that I only came across later:
https://serverfault.com/questions/9171/aws-s3-bucket-backups
Enabling versioning on your bucket is the right solution. It can be used to protect against both accidental deletes and accidental overwrites.
There's a question on the S3 FAQ, under "Data Protection", that discusses exactly this issue (accidental deletes/overwrites): http://aws.amazon.com/s3/faqs/#Why_should_I_use_Versioning
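For completeness, turning versioning on is a single call with boto3 (a sketch; the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# Keep every version of every object so accidental deletes/overwrites are recoverable.
s3.put_bucket_versioning(
    Bucket="my-account-data-bucket",  # placeholder
    VersioningConfiguration={"Status": "Enabled"},
)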