Choosing a hosting platform that allows file and directory creation - amazon-web-services

I am trying to launch a project where my server generates user files and directories. Since Heroku doesn't allow that, I am trying to find the platform that best fits my needs without changing a bunch of my code.
My Node server stores data in Firebase, along with some files on the server itself. I realize this is not best practice, but it is what it is for now.
What would you recommend?

You can store your objects in S3. Do not keep files only on the VM itself, because they can be lost if the instance fails.

Depending on your needs, an EBS volume would be a good start. It is designed to be redundant, and the chance of losing data is very small. The advantage is that it can outlive the instance: the data survives a stop, and the volume survives termination unless it is set to be deleted with the instance.
The newer EFS is very fast and can be mounted to multiple machines, much like an NFS file system. It is redundant across availability zones and will also survive a machine stop/termination.
S3 is an object store and isn't really meant for file system I/O. It can easily store files but it doesn't have nearly the performance of either EBS or EFS. It lives on after machine termination - indeed, it can be accessed with HTTP when properly configured.
Ultimately, you can create files normally on an EC2 instance with instance store, EBS, or EFS. Instance store data is lost if you terminate or even stop the instance. Be careful with that: you can easily lose a lot of data when it sits on instance store and is not properly backed up.
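Since the goal is to avoid changing much code, one approach is to keep the file-writing logic as it is and only make the base directory configurable, so it can point at wherever the EBS or EFS volume is mounted. The sketch below is in Python for brevity (the same idea carries over to a Node server); the environment variable name `USER_FILES_ROOT` and the default folder are assumptions for illustration, not anything AWS defines.

```python
import os
from pathlib import Path

# Hypothetical variable name: set it to the mount point of the EBS or EFS volume.
STORAGE_ENV_VAR = "USER_FILES_ROOT"


def storage_root() -> Path:
    """Return the base directory for user files, defaulting to a local folder."""
    return Path(os.environ.get(STORAGE_ENV_VAR, "./user-files")).resolve()


def user_dir(root: Path, user_id: str) -> Path:
    """Create (if needed) and return the directory for one user under root."""
    root = root.resolve()
    target = (root / user_id).resolve()
    # Refuse IDs such as "../other" that would escape the storage root.
    if root not in target.parents:
        raise ValueError(f"invalid user id: {user_id!r}")
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With this in place, moving from local disk to a mounted volume would be a configuration change rather than a code change, for example by calling `user_dir(storage_root(), "alice")` after exporting the variable.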

Related

Syncing remote folders from several machines to one AWS instance

I have 3 AWS P instances processing some heavy stuff and saving results to the relevant /home/user/folder.
I also have a main server with the same folder, where I want to collect the results from those 3 instances.
Each instance works on its own part of the whole task, and their results go to subfolders that do not overlap.
The instances have 2 TB each, so I would like to get results from each instance as soon as they appear.
This way, when a job is done, I won't spend half a day copying results to the main server.
I think one way of solving this is running something like this on each instance:
*/30 * * * * rsync -a /home/user/folder/ ubuntu@1.1.1.1:/home/user/folder/
Are there smarter ways of achieving the same result, given that all of the instances are on AWS?
I also thought about (1) detachable storage and (2) storing on S3, but being new to AWS I might overlook some hidden pitfalls in such workflows, especially when it comes to terabytes of data and expensive instances.
How do you collect processed data from remote instances?
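I would consider using the rclone tool, which can easily be configured for a shared S3 bucket. Just be aware of the difference between copy and sync modes: sync makes the destination match the source, which includes deleting destination files that are missing from the source. It can reach several gigabits per second of throughput, depending on your instance type.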
Link for the project: rclone.org
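To get a feel for why copying incrementally matters, here is a rough back-of-the-envelope calculation of how long 2 TB takes at a sustained rate. The rates of 1, 5, and 10 Gbit/s are illustrative assumptions, not measurements of any particular instance type or tool.

```python
def transfer_hours(size_bytes: float, throughput_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained throughput."""
    return size_bytes * 8 / throughput_bits_per_s / 3600


TWO_TB = 2e12  # 2 TB per instance, in decimal units

# Assumed sustained rates; real throughput depends on instance type and tool.
for gbit in (1, 5, 10):
    hours = transfer_hours(TWO_TB, gbit * 1e9)
    print(f"{gbit:>2} Gbit/s: {hours:.1f} h")
```

Under these assumptions a full copy takes about 4.4 hours at 1 Gbit/s and under half an hour at 10 Gbit/s, so syncing while the job runs mostly hides that cost.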
My thoughts on some of the options mentioned in OP and comments, as well as some other ones I thought of:
EFS: create an EFS and mount it as an NFS drive on all the instances. It's the easiest but probably costs the most.
s3fs: have all the instances mount the same S3 bucket using s3fs. This is likely the least expensive solution. You also don't need to worry about running out of disk space. The downside is that the performance will not be as good as with mounted NFS drives.
EBS volumes: attach an EBS volume to each worker instance for them to write the results to. When they are done, detach the volumes and attach them to the main server. This will be the fastest and still cheaper than EFS. If you can't or won't do all the detaching/attaching manually you'll need to write some scripts.
Old school NFS shares: there is nothing wrong with a plain vanilla NFS setup without any of those fancy AWS acronyms. :-)
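For the EBS option, the detach/attach step could be scripted roughly as below. This is only a sketch: it builds AWS CLI calls (`detach-volume`, `wait volume-available`, `attach-volume`) and prints them unless `dry_run` is turned off. The volume ID, instance ID, and device name in the usage line are placeholders, and it assumes the filesystem was already unmounted on the worker and that the volume and the main server are in the same Availability Zone.

```python
import subprocess
from typing import List


def move_volume_commands(volume_id: str, target_instance_id: str,
                         device: str = "/dev/sdf") -> List[List[str]]:
    """Return the AWS CLI calls that move one EBS volume to another instance."""
    return [
        # Detach from the worker (unmount the filesystem there first).
        ["aws", "ec2", "detach-volume", "--volume-id", volume_id],
        # Block until the volume is free to be attached elsewhere.
        ["aws", "ec2", "wait", "volume-available", "--volume-ids", volume_id],
        # Attach to the main server under the chosen device name.
        ["aws", "ec2", "attach-volume", "--volume-id", volume_id,
         "--instance-id", target_instance_id, "--device", device],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

A dry run such as `run(move_volume_commands("vol-EXAMPLE", "i-EXAMPLE"))` only prints the three commands, which makes it easy to review them before executing anything.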

Creating a persistent link to an EFS drive on a Windows EC2 server

I have created a Windows EC2 instance on AWS, and I have loaded it up with all of my needed software. My intention is to use this instance to create an image, so that I can (in the very near future) load up a much more powerful instance type using this image, and run a bunch of computations.
However, I also need to have a centralized location to store data. So, I created an EFS drive on AWS, and now I am trying to connect my instance to the EFS using a symbolic link that will persist to every other instance I load up in the future. I want to eventually have an army of instances, all of which use the centralized EFS drive as their primary storage device so that they can all load and save data, which can then be used by other instances.
I've been running Google searches all morning, but I'm coming up empty on how to do this. Any resources or tips would be greatly appreciated.
Thanks!
EFS is basically a managed NFS server. To mount it on a Windows instance you would need to find an NFS client for Windows, and note that AWS does not officially support EFS on Windows instances.
An alternative would be to mount the EFS on a Linux-based instance and export the file system using Samba, which could then be mounted on your Windows instances. Doing this, you would lose a lot of the benefits of EFS (your Linux instance is a single point of failure and, for high-bandwidth requirements, a bottleneck), but it might be possible.
You don't say what you are trying to accomplish, but I would suggest designing a solution that would pull data from S3 as needed. That would also allow you to run multiple instances in parallel.

Which AWS services and specs should I best use for a file sharing web system?

I'm building a web system where 100-150 users will keep uploading/downloading ~10 GB total worth of audio files every day (an average of 150 total uploads and 250 total downloads per day).
I'm still trying to read about the whole AWS ecosystem and I need help with the following:
For file storage, should I use S3 or EBS volumes mounted to an EC2 instance? From what I read, S3 is much cheaper and more scalable than EBS, but it's also slower. Is the speed difference really that large or noticeable for my use case? What are the advantages of a mounted EBS volume vs. S3?
What would be the best EC2 instance type for my use case? (i.e. frequent uploads and downloads) Will the General Purpose ones (T2, M4 etc) be enough to handle that load? (see above)
I can provide more info on my requirements/use cases if needed. Thanks!
Start with S3. S3 is a web API for putting and retrieving huge amounts of data, whereas EBS is a block device that you attach to a single instance and format with a filesystem. S3 will be more scalable from a data warehousing perspective, and in terms of access from multiple concurrent instances (should you do that in the future). Only use EBS if you actually need a filesystem for some reason. It doesn't sound like you do.
From there, you can look into some data archiving if you end up having huge amounts of data that doesn't need to be regularly available, to save some money.
Yes, use a T2 to start. You should design your system so that the instance type doesn't really matter, though, and so that you can easily tear down and replace instances. Using S3 helps with that pattern. You still need to figure out how you will deploy and configure your application on newly launched instances. You should /assume/ that your instance will go down, disappear, etc., so you should be able to fail over to another one on demand.
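A quick calculation from the numbers in the question suggests why a small instance is a reasonable starting point. It assumes the ~10 GB per day covers uploads and downloads together and that traffic is spread evenly over the day; real traffic will be burstier, so treat the result as a lower bound on peak load.

```python
def average_mbit_per_s(bytes_per_day: float) -> float:
    """Average throughput in Mbit/s if traffic were spread evenly over a day."""
    return bytes_per_day * 8 / 86_400 / 1e6


def mean_transfer_mb(bytes_per_day: float, transfers_per_day: int) -> float:
    """Average size of one upload or download, in MB."""
    return bytes_per_day / transfers_per_day / 1e6


daily_bytes = 10e9          # ~10 GB per day, from the question
transfers = 150 + 250       # uploads plus downloads per day, from the question

print(f"{average_mbit_per_s(daily_bytes):.2f} Mbit/s on average")
print(f"{mean_transfer_mb(daily_bytes, transfers):.0f} MB per transfer on average")
```

That works out to under 1 Mbit/s on average and about 25 MB per transfer, which is a light load by any measure.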

Best way to store shared files between ec2 instances

My website supports uploading images by the users. I'm trying to figure out what is the best strategy to save those files given that I have more than one ec2 instance running. Amazon Elastic File System sounds perfect but it's still in preview mode. What is the best alternative?
You almost certainly want to use S3 to share images between EC2 instances, unless you have some very unusual circumstances that won't allow it.
Best to not store any user data on the instance itself if you can avoid it; makes it easier to scale and to recover from crashes. S3 is a perfect super-redundant place to keep 'stuff' that costs next to nothing.
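One detail that helps when several instances accept uploads is to derive the object key from the upload itself rather than from anything local to the instance. The sketch below only builds the key; the actual upload would go through the AWS SDK and is not shown. The `uploads/` prefix and the key layout are assumptions chosen for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def image_key(user_id: str, filename: str, content: bytes) -> str:
    """Build a bucket key that does not depend on which instance took the upload."""
    # Hashing the content gives the same key for the same image on every instance.
    digest = hashlib.sha256(content).hexdigest()
    # Keep only the lower-cased extension of the original file name.
    suffix = PurePosixPath(filename).suffix.lower()
    return f"uploads/{user_id}/{digest}{suffix}"
```

Because the key depends only on the user and the file content, any instance can compute it and find the image later, and re-uploading the same file overwrites one object instead of creating duplicates.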

Amazon instance store

As far as I understand, a newly created Amazon instance uses the ephemeral data store by default, unless an EBS store is configured.
After stopping an instance that uses the ephemeral data store, I will lose all data. Is that correct?
I noticed that an EBS store was created automatically for my instance. I created a few files in the home directory, but these files were not deleted after a reboot. So where is ephemeral data stored?
I want to install a database on an Amazon host. Should I worry about data loss with the default setup, and what is the common configuration? For example:
Create instance
Install and configure database on ephemeral data store
Make AMI
Create EBS store and configure database to use it as storage
After stopping an instance that uses the ephemeral data store, I will lose all data. Is that correct?
To be specific, after you terminate or stop a node, any data on instance-specific storage will be lost. A reboot is different, and your data is intact in those cases. I am using these terms to match the terms in the AWS console.
To confuse matters slightly, some EBS-backed nodes also have some instance-specific storage. All instance-storage nodes are 100% instance-backed, though. So you really need to understand whether your data is hitting an EBS disk or instance-local storage.
I noticed that an EBS store was created automatically for my instance. I created a few files in the home directory, but these files were not deleted after a reboot. So where is ephemeral data stored?
Several points here:
For an EBS-backed instance, your /home partition is on the EBS root device, and hence data will persist provided the volume exists.
Again a reboot wouldn't delete your data even if you had an instance-storage node, but it sounds like you chose an EBS-backed node.
If you had instead created these files in /mnt, then stopped your instance and later started it again, you might have lost them. Again it depends exactly which ec2 node type you're running.
Regarding your last point - I would recommend that you just make sure your data is being stored on some EBS backed disk. Whether that is your root device or a separate EBS volume is up to you and depends on your specific needs.
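To check where a given directory actually lives, you can look up which mounted device holds it. The sketch below parses text in the `/proc/mounts` layout and returns the most specific mount for a path. The device names in the sample are made up; whether a given device is EBS or instance store still has to be confirmed against the instance's block device mapping in the console.

```python
import os
from typing import Optional, Tuple


def find_mount(path: str, mounts_text: str) -> Optional[Tuple[str, str]]:
    """Return (device, mount_point) of the filesystem that holds path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        inside = (mount_point == "/" or path == mount_point
                  or path.startswith(mount_point + "/"))
        # Prefer the longest matching mount point, which is the most specific one.
        if inside and (best is None or len(mount_point) > len(best[1])):
            best = (device, mount_point)
    return best


# Illustrative mount table; on a real Linux instance read /proc/mounts instead.
SAMPLE_MOUNTS = """\
/dev/xvda1 / ext4 rw 0 0
/dev/xvdb /mnt ext3 rw 0 0
"""

print(find_mount("/home/user/db", SAMPLE_MOUNTS))
print(find_mount("/mnt/db", SAMPLE_MOUNTS))
```

On a real instance you would pass `open("/proc/mounts").read()` as the second argument and point the first at your database's data directory.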
I want to install a database on an Amazon host.
You should give some thought to not installing and maintaining your own database. Doing so is complex, error-prone, and can be quite time-consuming.
A better option for most folks is a turnkey database solution like RDS. This is a performant database that you don't have to really think about - it'll just work. RDS isn't for everyone, as there are some restrictive permission issues, but generally speaking it's great. I use it every day.
You can run databases on top of EBS and it'll work just fine. But you are taking on the job of a database admin at that point, and need to worry about all the complexity that comes with it. In my opinion, it is better to focus your time and energy on things like database schema, queries, and other aspects of your business.