How to mount EFS on a Lambda function? - amazon-web-services

I need to run a periodic cleanup on my EFS drive (which is being shared by multiple autoscaling EC2 instances). The cleanup involves deleting files/folders that meet a certain criterion (date/size etc.).
I imagined AWS Lambda would be the perfect solution for this task: trigger the function periodically, have it mount the shared drive, and run the cleanup. But it seems that Lambda only supports creating/polling the disk for its type, modifying its mount point, etc.
Is there any alternative to accomplish this task?

So far I've found that while direct file operations aren't supported by Lambda, it can spin up an EC2 instance, which can run a startup script to do the cleanup and then shut down.
While this solution is rather clunky, I do not see any alternative.
Lambda support for EFS seems to be a long-standing demand:
Why can't you mount EFS to Lambda?
Can EFS be mounted from the Lambda environment

AWS has released Lambda filesystem support. See these details for configuration information, including CloudFormation and SAM templates. The file system and the Lambda function must be in the same region, and the function must be attached to the VPC, though it may be in a different account.
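With the file system attached, the periodic cleanup from the question becomes ordinary file I/O inside the handler. The following is a minimal sketch, not a definitive implementation: the mount path `/mnt/efs` and the 30-day retention period are assumptions chosen for illustration, and only regular files are deleted (empty folders are left in place).

```python
import os
import time

# Assumption: the EFS access point is mounted at this local path in the
# function configuration (Lambda mount paths live under /mnt/).
MOUNT_PATH = "/mnt/efs"
MAX_AGE_DAYS = 30  # hypothetical retention period


def delete_old_files(root, max_age_days, now=None):
    """Delete regular files under root older than max_age_days; return their paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so a broken link cannot raise here.
            if os.lstat(path).st_mtime < cutoff:
                os.remove(path)
                deleted.append(path)
    return deleted


def lambda_handler(event, context):
    # Invoked on a schedule; reports how many files were removed.
    deleted = delete_old_files(MOUNT_PATH, MAX_AGE_DAYS)
    return {"deleted_count": len(deleted)}
```

A size-based criterion would follow the same pattern by comparing `os.lstat(path).st_size` instead of the modification time.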

What about mounting your EFS on an EC2 instance and using Lambda to SSH into that instance and do the cleanup? For example, you can use the Python Fabric library to SSH into the instance.

The EC2 solution does not require Lambda at all. You can add an Auto Scaling group with a scheduled action that starts an instance once per week and shuts it down afterwards. All the required work can be done through user data or an auto-run shell script on the EC2 instance.

Related

What's the most efficient way to export files from EC2 to S3 on timed intervals?

I'm working on a problem at the moment where I want to export a file from an EC2 instance running a Windows AMI to an S3 bucket at four-hour intervals. Currently, the architecture I'm thinking of is as follows.
1. CloudWatch Events rule using scheduled trigger
2. Rule triggers Lambda function to run
3. Lambda function would use some form of the AWS CLI on the Windows EC2 instance to extract (sync, cp, etc.) the file
4. File is placed in the S3 bucket
Does anyone see a path that's more efficient than this one? I want to ensure that I'm handling this in the most straightforward manner. Thanks in advance for any input!
It is quite difficult to have external code (eg an AWS Lambda function) cause something to execute on a Windows computer. You could use Systems Manager Run Command, but that's a rather complex solution.
It would be much simpler to have the Windows computer push the files to Amazon S3:
Create a scheduled task in Windows
Use aws s3 cp or aws s3 sync to copy the files to Amazon S3
Done!
Your solution seems solid. Alternatively, you may want to write a daemon-like service (background process) that runs on each EC2 instance and transfers the data from that instance to S3. What I like about your solution is how easily you can control the scheduling centrally. For my distributed solution you can have the processes read from a central config, but that seems more complicated than the CloudWatch/Lambda solution.
For the EC2 process solution, this may be useful:
How to mount Amazon S3 Bucket as a Windows Drive, but it should be easy (and more scalable) to just use the AWS SDK instead to talk to S3
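As a rough sketch of the SDK route mentioned above, the script below uploads one file with the boto3 S3 client's `upload_file` method. The file path, bucket name, and `exports` prefix are hypothetical placeholders, and the helper names are introduced here for illustration only.

```python
import os


def build_object_key(local_path, prefix="exports"):
    """Return the S3 key for a local file: '<prefix>/<file name>'."""
    return f"{prefix}/{os.path.basename(local_path)}"


def upload_export(s3_client, local_path, bucket, prefix="exports"):
    """Upload one file with the given S3 client and return the key used."""
    key = build_object_key(local_path, prefix)
    s3_client.upload_file(local_path, bucket, key)
    return key


def main():
    # boto3 is imported here so the helpers above stay usable without it.
    import boto3

    # Hypothetical path and bucket name; replace with real values.
    upload_export(boto3.client("s3"), r"C:\data\report.csv", "example-export-bucket")
```

A Windows scheduled task that runs every four hours could invoke `main()`; the instance would need credentials that allow writing to the bucket, for example through an instance role.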

Accessing files in EC2 from Lambda

I have a few EC2 servers in AWS. Whenever disk usage exceeds a limit, I want to delete some files (maybe the logs folder) on the EC2 instance automatically. I am planning to use Lambda and CloudWatch for this. Can I use Lambda to interact with EC2? If that is not possible, what is an alternative approach to achieve this?
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
AWS Lambda can reach your instances if they are reachable from the internet. If they are not, it is possible to give Lambda access using a NAT or instance gateway in your VPC.
The problem is that access to your instance does not mean access to the instance's filesystem. To delete the files from Lambda, you have two alternatives:
1. Configure a network filesystem service on your instances and connect to it from your Lambda function. On Windows you would just "share" your disks, but then you would need an SMB library in your Lambda code, since I don't think Lambda has native SMB support. Keep in mind that your security team will scream out loud when you propose this alternative.
2. Create an "agent" on your EC2 instances, keep it running as a Windows Service, and call this agent from your Lambda function. In that case, the Lambda function starts the agent, which is responsible for the file deletion.
Another option is to follow Ramesh's suggestion: create a PowerShell script and configure a cron job. To make this easier, you can create an image containing the PowerShell script and use that image to launch each instance. The same approach applies to the "agent" option among the Lambda alternatives.
I think that, in any case, you will need to change something in your 150 servers. Using a customized image can help you to simplify this a little bit, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside an EC2 VM unless you expose those files publicly by some other method.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, answer is No, you cannot access those files.
However, as a solution to your problem, you can use a scheduled job to clean up your files depending on your usage. You can use a service or a cron job.
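The scheduled cleanup suggested in these answers might look like the sketch below, which deletes the oldest files in a directory until disk usage falls under a threshold. The 90% threshold and the function names are assumptions for illustration; the script only looks at files directly inside the given directory.

```python
import os
import shutil


def used_fraction(path):
    """Return the used share of the filesystem holding path, from 0.0 to 1.0."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def oldest_first(directory):
    """List regular files directly inside directory, oldest modification first."""
    entries = [e for e in os.scandir(directory) if e.is_file(follow_symlinks=False)]
    return [e.path for e in sorted(entries, key=lambda e: e.stat().st_mtime)]


def free_space(directory, threshold=0.9, usage_fn=used_fraction):
    """Delete the oldest files in directory until usage drops below threshold."""
    removed = []
    for path in oldest_first(directory):
        # Re-check usage before every deletion so we stop as early as possible.
        if usage_fn(directory) < threshold:
            break
        os.remove(path)
        removed.append(path)
    return removed
```

A cron entry or Scheduled Task could call `free_space` on the logs folder; the `usage_fn` parameter exists only so the stopping rule can be exercised without filling a real disk.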

AWS Lambda run command on EC2 instance and get result

I have an EC2 instance that is running a few processes. I also have a Lambda script that is triggered through various means. I would like this Lambda script to talk to my EC2 instance and get a list of running processes from it (Essentially run ps aux on the EC2 box, and read the output).
Now this is easy enough with just one instance and its instance-id. Just SSH in, run the command, get the output, and be on my way. However, I would like to scale this to multiple EC2 instances, for which only the instance-id is known and SSH keys may not be given.
Is such a configuration possible with Lambda and Boto (or other libraries)? Or do I just have to run a microserver on each of my instances that will reply with the given information (something I'm really trying to avoid)?
You can do this easily with AWS Systems Manager - Run Command
AWS Systems Manager provides you safe, secure remote management of your instances at scale without logging into your servers, replacing the need for bastion hosts, SSH, or remote PowerShell.
Specifically:
Use the send-command API from Lambda function to get list of all processes on a group of instances. You can do this by providing a list of instances or even a tag query
You can also use CloudWatch Events to trigger a Run Command directly
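A sketch of the first option, calling Run Command from a Lambda function through the boto3 SSM client, is shown below. It assumes the instances run the SSM Agent with an instance profile that permits Systems Manager; the event shape (`instance_ids`) and the polling parameters are hypothetical.

```python
import time


def send_shell_command(ssm_client, instance_ids, command):
    """Start a Run Command invocation and return its command ID."""
    response = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",  # AWS-managed document for shell commands
        Parameters={"commands": [command]},
    )
    return response["Command"]["CommandId"]


def wait_for_output(ssm_client, command_id, instance_id, attempts=10, delay=2.0):
    """Poll one instance's invocation until it finishes; return (status, stdout)."""
    for _ in range(attempts):
        result = ssm_client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if result["Status"] not in ("Pending", "InProgress", "Delayed"):
            return result["Status"], result["StandardOutputContent"]
        time.sleep(delay)
    raise TimeoutError(f"command {command_id} still running on {instance_id}")


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers can be exercised without it

    ssm = boto3.client("ssm")
    instance_ids = event["instance_ids"]  # hypothetical event shape
    command_id = send_shell_command(ssm, instance_ids, "ps aux")
    return {i: wait_for_output(ssm, command_id, i)[1] for i in instance_ids}
```

Two caveats for real use: polling immediately after `send_command` can fail because the invocation is not yet registered, so a short initial wait or a retry is advisable, and the API truncates long output. To target instances by tag instead of by ID, `send_command` accepts a `Targets` list in place of `InstanceIds`.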
I don't think there is something available out of the box for this scenario.
Instead of querying, try an alternate approach. Install an agent on all ec2 instances, which reports the required information to a central service or probably a DynamoDB table, with HashKey as InstanceId.
You may want to bake this script into the AMI itself as a cron job (executed hourly, perhaps).
With this implementation, you reduce the complexity of managing and running a separate web service on each EC2 instance.
Query the DynamoDB table on demand. There will be a lag, as data may not be real time, but you can always reduce the CRON interval per your needs.
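The reporting agent described above could be sketched as follows. The attribute names `ReportedAt` and `Processes` are assumptions; only `InstanceId` as the hash key comes from the answer. The `table` argument is expected to be a boto3 DynamoDB table resource.

```python
import subprocess
import time


def list_processes():
    """Return the output lines of `ps aux` on this machine."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def build_item(instance_id, processes, now=None):
    """Build the table item; InstanceId is the hash key."""
    return {
        "InstanceId": instance_id,
        "ReportedAt": int(time.time() if now is None else now),
        "Processes": processes,
    }


def report(table, instance_id, processes=None):
    """Write a process list (by default the live one) to the table."""
    if processes is None:
        processes = list_processes()
    item = build_item(instance_id, processes)
    table.put_item(Item=item)
    return item
```

The cron job would call something like `report(boto3.resource("dynamodb").Table("InstanceProcesses"), instance_id)`, where the table name is hypothetical. Because each report overwrites the previous item for that instance, the table always holds the latest snapshot; note that a very long process list could approach DynamoDB's item size limit.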
As Yeshodhan mentioned, there is no direct approach for this.
However, there is one more approach: save your private key file to an S3 bucket, create a Lambda function, and use the Python Fabric module to log in to the remote machines from the Lambda function and execute commands.
This approach is possible, but I highly recommend launching a separate machine and using a configuration management system (preferably Ansible) to get the results from the remote machines.

EC2 Event [Running] + Lambda Function

What I need to do is this: when an EC2 instance is launched, a Lambda function (or something else) installs a script on the host to monitor memory and disk usage.
I'm trying to work out how to do that. Can anyone give me an idea?
You don't need a lambda. Pass your install script as user data.
See: Running Commands on Your Linux Instance at Launch
It appears that your requirement is to monitor Memory and Disk usage from an Amazon EC2 instance. I will assume that you want to monitor it via Amazon CloudWatch.
Amazon CloudWatch provides default metrics for EC2 instances including CPU utilization, network traffic and disk access. These metrics are visible from the hypervisor. However, CloudWatch cannot see 'inside' the EC2 instance, so it is necessary to run scripts from within the instance to track things like free memory and free disk space. The scripts talk to the operating system to retrieve these metrics, which is why they have to run 'within' the instance.
Some standard monitoring scripts are available for Linux instances: Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances
You can, of course, write your own scripts to send custom metrics to CloudWatch. Once installed, the scripts will run automatically when the instance is restarted.
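As a hedged illustration of such a custom script, the sketch below reads memory usage from `/proc/meminfo` (Linux only) and disk usage from the standard library, then publishes both through the boto3 CloudWatch client's `put_metric_data` call. The namespace `Custom/System` and the metric names are assumptions, not established conventions.

```python
import shutil


def memory_used_percent(meminfo_text):
    """Compute used memory (%) from the contents of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            values[name] = int(rest.split()[0])  # first field is the number
    total, available = values["MemTotal"], values["MemAvailable"]
    return 100.0 * (total - available) / total


def disk_used_percent(path="/"):
    """Return used disk space (%) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_metric_data(instance_id, mem_percent, disk_percent):
    """Build the MetricData list for put_metric_data."""
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    return [
        {"MetricName": "MemoryUsedPercent", "Dimensions": dimensions,
         "Value": mem_percent, "Unit": "Percent"},
        {"MetricName": "DiskUsedPercent", "Dimensions": dimensions,
         "Value": disk_percent, "Unit": "Percent"},
    ]


def publish(cloudwatch_client, instance_id, namespace="Custom/System"):
    """Collect both metrics on this machine and send them to CloudWatch."""
    with open("/proc/meminfo") as f:
        mem = memory_used_percent(f.read())
    data = build_metric_data(instance_id, mem, disk_used_percent("/"))
    cloudwatch_client.put_metric_data(Namespace=namespace, MetricData=data)
```

Run from cron every few minutes with `boto3.client("cloudwatch")`, this would produce two per-instance custom metrics that alarms can watch.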
If you wish to install these scripts (or your own scripts) on new EC2 instances, there are a couple of methods:
Install the scripts on one instance, then create an Amazon Machine Image (AMI) of that instance that contains a copy of the disk. You can then launch new instances using that AMI, and the scripts will already be installed on the new instances.
Launch the instance(s) with a User Data script to install the monitoring script. Any script passed through User Data will automatically be run the first time that the instance is started.
When you are using a scaling group you must specify a LaunchConfig.
Part of the LaunchConfig is the user-data script which is executed when the instance boots.
This can be also easily done from CloudFormation scripts if that is what you use to create the new EC2 VM.
You can find here samples of scripts.

Boot strapping AWS auto scale instances

We are discussing with a client how to bootstrap auto-scaled AWS instances. Essentially, an instance comes up with hardly anything on it. It has a generic startup script that asks somewhere "what am I supposed to do next?"
I'm thinking we can use Amazon tags and have the instance itself ask AWS, using the awscli tool set, to find out its role. This could give Puppet info, environment info (dev/stage/prod, for example), and so on. This should be doable with just the DescribeTags privilege. I'm facing resistance, however.
I am looking for suggestions on how a fresh AWS instance can find out about its own purpose, whether from AWS or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data meant to solve this problem. User Data executes a shell script to perform provisioning functions on new instances. A typical pattern is to use the User Data to download or clone a configuration management source repository, such as Chef, Puppet, or Ansible, and run it locally on the box to perform more complete provisioning.
As @e-j-brennan states, it's also common to prebundle an AMI that has already been provisioned. This approach is faster since no provisioning needs to happen at boot time, but is perhaps less flexible since the instance isn't customized.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
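To make the tag-based idea from the question concrete, here is a minimal sketch in which an instance looks up its own ID from the instance metadata service (using the IMDSv2 token flow) and then reads its tags through the EC2 `DescribeTags` API. The tag keys `Role` and `Environment` in the usage note are hypothetical.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest"


def fetch_instance_id(timeout=2):
    """Ask the instance metadata service (IMDSv2) for this instance's ID."""
    token_request = urllib.request.Request(
        f"{METADATA_URL}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    id_request = urllib.request.Request(
        f"{METADATA_URL}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(id_request, timeout=timeout) as response:
        return response.read().decode()


def fetch_tags(ec2_client, instance_id):
    """Return the instance's tags as a plain dict (needs ec2:DescribeTags)."""
    response = ec2_client.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )
    return {tag["Key"]: tag["Value"] for tag in response["Tags"]}
```

A startup script could then do `tags = fetch_tags(boto3.client("ec2"), fetch_instance_id())` and branch on `tags.get("Role")` or `tags.get("Environment")` to decide which configuration to apply.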
An instance doesn't have to come up with 'hardly anything on it' though. You can/should build your own custom AMI (Amazon machine image), with any and all software you need to have running on it, and when you need to auto-scale an instance, you boot it from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend using AWS Elastic Beanstalk for creating specific instances. It makes this easier, since it creates the Auto Scaling groups and launch configurations (bootup code) for you, and you can edit them later. You also pay only for the EC2 instances, and you can manage most things from the Beanstalk console.