I have an application that is very I/O-sensitive that I want to host on AWS Lambda. I want to use the up to 10 GB of memory that Lambda functions can now have, create a tmpfs, and serve my incoming requests from data held there.
Unfortunately, I am not able to figure out a way to do this.
Among other things, I have tried to create a tmpfs and mount it both from my Dockerfile and by invoking the mount from Python, but both approaches look like they need root access.
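Concretely, what I tried (whether run from the Dockerfile or via a subprocess call in Python) boils down to a mount like the following, which is refused without root; the size and mount point here are just examples:
# create a 2 GB tmpfs and mount it where the handler can read it
mkdir -p /mnt/cache
mount -t tmpfs -o size=2g tmpfs /mnt/cache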
Would anyone know how to accomplish this?
Related
Is there a way to switch from an AWS managed key to a customer managed key (CMK) for an already existing EFS?
The EFS was created with the key provided by AWS (aws/elasticfilesystem), but because of a security audit we have to use a CMK.
Unfortunately, you cannot change the key for an existing EFS. Disabling or deleting the AWS managed key will lead to the loss of your filesystem.
But you have several options to work around this. The first one I see is to create a new EFS encrypted with a CMK, mount it on a host which also has the old EFS mounted, and copy all your files over using rsync or a similar tool, then switch when the synchronisation is finished. Depending on how much data you have, this can take a while and cost money.
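A rough sketch of that first option, assuming both filesystems are mounted on the same host with amazon-efs-utils (the filesystem IDs and mount points are placeholders):
# mount the old filesystem and the new CMK-encrypted one
sudo mount -t efs fs-OLDID:/ /mnt/efs-old
sudo mount -t efs fs-NEWID:/ /mnt/efs-new
# copy everything, preserving ownership, permissions and links
sudo rsync -aHAX --numeric-ids /mnt/efs-old/ /mnt/efs-new/
# run the rsync again just before switching over to pick up the last changes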
I also found a similar procedure that uses Data Pipeline and seems to do the same thing, but packaged entirely by AWS.
To be honest, I have never used this tool. You can find more information here: https://docs.aws.amazon.com/efs/latest/ug/alternative-efs-backup.html
The second option is to use AWS Backup. Create an "on-demand backup" of your EFS. When the backup is done, create a restore job targeting a new filesystem which will use your CMK. What I don't like about this method is that AWS Backup will create a directory inside the root of the filesystem, which I think is kind of dirty:
root@ip-172-31-16-39:/data1# df -h .
Filesystem Size Used Avail Use% Mounted on
fs-fc09d4c8.efs.eu-west-1.amazonaws.com:/ 8.0E 0 8.0E 0% /data1
root@ip-172-31-16-39:/data1#
root@ip-172-31-16-39:/data1# ls -l
total 4
drwxr-xr-x 3 root root 6144 May 14 17:55 aws-backup-restore_2021-05-14T19-03-08-145Z
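For completeness, the second option can also be driven from the CLI; here is a rough sketch where the vault name, ARNs and restore metadata are placeholders you would have to adapt (double-check the EFS restore metadata keys against the AWS Backup documentation):
# take the on-demand backup of the existing filesystem
aws backup start-backup-job \
  --backup-vault-name Default \
  --resource-arn arn:aws:elasticfilesystem:eu-west-1:111122223333:file-system/fs-OLDID \
  --iam-role-arn arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole
# once the recovery point exists, restore it into a new filesystem encrypted with the CMK
aws backup start-restore-job \
  --recovery-point-arn arn:aws:backup:eu-west-1:111122223333:recovery-point:EXAMPLE \
  --iam-role-arn arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole \
  --metadata newFileSystem=true,Encrypted=true,KmsKeyId=arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE,PerformanceMode=generalPurpose,CreationToken=cmk-restore,file-system-id=fs-OLDID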
[1]. https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-encryption.html
I have 3 AWS P instances processing some heavy stuff and saving results to the relevant /home/user/folder.
I also have a main server with the same folder where I want to collect the results from those 3 instances.
Each instance works on its own part of the whole task, and their results go to non-overlapping subfolders.
The instances have 2 TB of storage each, so I would like to pull results from each instance as soon as they appear.
That way, when a job is done, I won't spend half a day copying results to the main server.
I think one way of solving this is running something like this on each instance:
*/30 * * * * rsync /home/user/folder ubuntu@1.1.1.1:/home/user/folder
Are there any other, smarter ways of achieving the same result, given that all of the instances are on AWS?
I also thought about (1) detachable storage and (2) storing on S3 but being new to AWS I might overlook some hidden pitfalls in such workflows, especially when it comes to terabytes of data and expensive instances.
How do you collect processed data from remote instances?
I would consider using the rclone tool, which can easily be configured for a shared S3 bucket. Just be aware of the difference between copy and sync mode. It can reach several gigabits of throughput depending on your instance type.
Link for the project: rclone.org
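A rough sketch of how that could look, assuming the instances have a role that can write to a shared bucket (the remote, bucket and path names are placeholders):
# define an S3 remote that picks up credentials from the instance role
rclone config create s3remote s3 provider AWS env_auth true region eu-west-1
# on each worker, push results as they appear (e.g. from cron)
rclone copy /home/user/folder s3remote:my-results-bucket/instance-1 --transfers 16
# on the main server, pull everything back down
rclone copy s3remote:my-results-bucket /home/user/folder --transfers 16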
My thoughts on some of the options mentioned in OP and comments, as well as some other ones I thought of:
EFS: create an EFS and mount it as an NFS drive on all the instances. It's the easiest but probably costs the most.
s3fs: have all the instances mount the same S3 bucket using s3fs (see the sketch after this list). This is likely the most inexpensive solution, and you also don't need to worry about running out of disk space. The downside is that the performance is not going to be that good compared to mounted NFS drives.
EBS volumes: attach an EBS volume to each worker instance for them to write the results to. When they are done, detach the volumes and attach them to the main server. This will be the fastest and still cheaper than EFS. If you can't or won't do all the detaching/attaching manually you'll need to write some scripts.
Old school NFS shares: there is nothing wrong with a plain vanilla NFS setup without any of those fancy AWS acronyms. :-)
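A rough sketch of the s3fs option above, assuming an instance role with access to the bucket (the bucket name and mount point are placeholders):
# install s3fs and mount the shared bucket on every instance
sudo apt-get install -y s3fs
sudo mkdir -p /mnt/results
sudo s3fs my-results-bucket /mnt/results -o iam_role=auto -o allow_other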
I'm building a web system where 100-150 users will keep uploading/downloading ~10 GB total worth of audio files every day (an average of 150 total uploads and 250 total downloads per day).
I'm still trying to read up on the whole AWS ecosystem and I need help with the following:
For file storage, should I use S3 or EBS volumes mounted to an EC2 instance? From what I read, S3 is much cheaper and more scalable than EBS, but it's also slower. Is the speed difference really that huge or noticeable for my use case? What are the advantages of a mounted EBS volume vs. S3?
What would be the best EC2 instance type for my use case? (i.e. frequent uploads and downloads) Will the General Purpose ones (T2, M4 etc) be enough to handle that load? (see above)
I can provide more info on my requirements/use cases if needed. Thanks!
Start with S3. S3 is a web API for putting and retrieving huge amounts of data, whereas EBS is a block device attached to a single instance. S3 will be more scalable from a data warehousing perspective, and in terms of access from multiple concurrent instances (should you do that in the future). Only use EBS if you actually need a filesystem for some reason. It doesn't sound like you do.
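To make the S3 route concrete, uploads and downloads are just API calls; a rough sketch with the CLI, where the bucket and key names are placeholders:
# upload an audio file
aws s3 cp recording-0001.mp3 s3://my-audio-bucket/uploads/recording-0001.mp3
# download it again
aws s3 cp s3://my-audio-bucket/uploads/recording-0001.mp3 .
# or hand the client a time-limited link so the bytes never pass through your EC2 instance
aws s3 presign s3://my-audio-bucket/uploads/recording-0001.mp3 --expires-in 3600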
From there, you can look into some data archiving if you end up having huge amounts of data that doesn't need to be regularly available, to save some money.
Yes, use a t2 to start. Though, you should design your system so that it doesn't really matter, and you can easily tear down/replace instances. Using S3 helps with that pattern. You still need to figure out how you will deploy and configure your application on newly launched instances, though. You should /assume/ that your instance will go down, disappear, etc. So, you should be able to fail over to another one on demand.
I'm wondering what the best way is to process huge amounts of images stored in AWS S3 buckets from an EC2 instance located in the same availability zone.
Should I download the images that I need each time I have to process them, delete them when I'm done, and do the same thing every time I need to do some processing?
Or is there a better way, like mounting the S3 bucket into the EC2 instance? I have seen tools like FUSE for mounting, but I am not sure if this is the best way of processing the data.
First of all, note that any EC2 instance can be killed, so keep your data and results in durable storage, like S3.
If you fetch the whole image into memory and then process it, I can't see a need to write it to disk first. On the other hand, if the images are quite big, you could end up fetching the same parts many times. So there is no easy answer, at least not without more information.
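As a rough sketch of the in-memory approach, you can stream an object straight from S3 through your processing step without it ever touching disk (my-processor and the bucket/key names are placeholders):
# '-' makes the CLI stream to stdout / read from stdin
aws s3 cp s3://my-image-bucket/input/img-0001.jpg - \
  | my-processor \
  | aws s3 cp - s3://my-image-bucket/output/img-0001.out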
You can also look at MapReduce-style solutions and how they deal with keeping data close to the processing unit; Spark, for example, is able to process things in memory.
As for mounting resources: there are other options, like Elastic File System or Elastic Block Store, that can be mounted.
My company is looking for a solution for file sharing via FTP - currently, we share one server for client/admin FTP file sharing and serving multiple sites, and are looking to split off our roles so that we have one server dedicated to FTP and one for serving websites.
I have tried to find a good solution with AWS, but cannot find any detailed information regarding EBS and EC2 servers, and whether an EC2 instance type will be able to handle FTP storage. For example, a t2.nano instance seems ideal with 1 vCPU and minimal RAM, but I see no information regarding EBS storage limits.
We need around 500GiB at most, and will have transfers happening daily in the neighborhood of 1GiB in and out. We don't need to run a database or http server. We may run services for file cleanup in the background weekly.
EDIT:
I mis-worded the question, which stemmed from a fundamental lack of understanding of AWS EC2 and EBS that I have since cleared up. I know EC2 can run FTP services; the question was more about a cost-effective solution with dynamic storage. Thanks for the input!
As others here on SO will tell you: don't bother with EBS. It can be made to work but does not make much sense in the long run. It's also more expensive and trickier to operate (backups/disaster recovery/having multiple ftp server machines).
Go with S3 for storing your files and use something that can expose S3 over FTP (like s3fs).
See:
http://resources.intenseschool.com/amazon-aws-howto-configure-a-ftp-server-using-amazon-s3/
Setting up FTP on Amazon Cloud Server
http://cloudacademy.com/blog/s3-ftp-server/
If FTP is not a strong requirement you can also look at migrating people to using S3 directly (either initially or after you do the setup and give them the option of both FTP and S3 directly)
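A rough sketch of the s3fs route, assuming the FTP server's root lives on the mount (the bucket name, mount point and mount options are placeholders to adapt):
# mount the bucket where the FTP tree should live
sudo mkdir -p /srv/ftp
sudo s3fs my-ftp-bucket /srv/ftp -o iam_role=auto -o allow_other -o use_cache=/tmp/s3fs
# or make it permanent via /etc/fstab
echo 's3fs#my-ftp-bucket /srv/ftp fuse _netdev,iam_role=auto,allow_other,use_cache=/tmp/s3fs 0 0' | sudo tee -a /etc/fstab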
This question is among the most common on SO for AWS: you can install an FTP server on any EC2 instance type.
You won't run into a practical size limit with EBS (a single gp2/gp3 volume can be as large as 16 TiB) and you can always increase the storage later if you need to, so the best rule is: start low and increase when needed.
The only point to mention is that network performance depends on the instance type, so if you care about speed, a t2.nano (low network performance) might not be sufficient.
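And if you do start low, growing the disk later is only a couple of commands; a rough sketch, where the volume ID, device name and filesystem are placeholders that depend on your setup:
# grow the EBS volume itself
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 500
# then grow the partition and the filesystem from inside the instance
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1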