AWS Lambda run command on EC2 instance and get result

I have an EC2 instance that is running a few processes. I also have a Lambda script that is triggered through various means. I would like this Lambda script to talk to my EC2 instance and get a list of running processes from it (Essentially run ps aux on the EC2 box, and read the output).
Now this is easy enough with just one instance and its instance-id. Just SSH in, run the command, get the output, and be on my way. However, I would like to scale this to multiple EC2 instances, for which only the instance-id is known and SSH keys may not be given.
Is such a configuration possible with Lambda and Boto (or other libraries)? Or do I just have to run a microserver on each of my instances that will reply with the given information (something I'm really trying to avoid)?

You can do this easily with AWS Systems Manager Run Command.
AWS Systems Manager provides you safe, secure remote management of your instances at scale without logging into your servers, replacing the need for bastion hosts, SSH, or remote PowerShell.
Specifically:
Use the send-command API from a Lambda function to get the list of processes on a group of instances. You can target them by providing a list of instance IDs or a tag query.
You can also use CloudWatch Events to trigger Run Command directly.
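As a rough illustration of the send-command approach, here is a minimal sketch of a Lambda function that asks Run Command to execute ps aux on every managed instance carrying a given tag. It assumes the instances run the SSM agent and that the Lambda role is allowed to call ssm:SendCommand. The tag key/value ("Role"/"worker") and the event field names are hypothetical placeholders, not something from the question.

```python
import json


def send_ps_aux(ssm, tag_key, tag_value):
    """Ask Run Command to execute `ps aux` on all managed instances with the tag.

    `ssm` is a boto3 SSM client (or any object exposing the same send_command).
    Returns the command ID, which is needed later to fetch each instance's output.
    """
    response = ssm.send_command(
        Targets=[{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["ps aux"]},
        Comment="List running processes",
    )
    return response["Command"]["CommandId"]


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; it is imported here so the
    # helper above can be exercised without AWS credentials.
    import boto3

    ssm = boto3.client("ssm")
    # "Role" / "worker" are hypothetical defaults for illustration only.
    command_id = send_ps_aux(
        ssm,
        event.get("tag_key", "Role"),
        event.get("tag_value", "worker"),
    )
    return {"statusCode": 200, "body": json.dumps({"command_id": command_id})}
```

The command runs asynchronously, so the handler only returns the command ID; the output for each instance has to be fetched in a later call once the command has finished.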

I don't think there is something available out of the box for this scenario.
Instead of querying, try an alternate approach. Install an agent on all EC2 instances that reports the required information to a central service, or to a DynamoDB table with InstanceId as the hash key.
You may want to bake this script into the AMI itself as a cron job (executed hourly, perhaps).
With this implementation, you reduce the complexity of managing and running a separate web service on each EC2 instance.
Query the DynamoDB table on demand. There will be a lag, as data may not be real time, but you can always reduce the CRON interval per your needs.
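A minimal sketch of what such a reporting script might look like, assuming a DynamoDB table whose hash key is InstanceId and an instance role that permits dynamodb:PutItem. The attribute names ReportedAt and Processes are made up for this example.

```python
import subprocess
import time


def collect_processes():
    """Return the output of `ps aux` as a list of lines (header included)."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def build_item(instance_id, process_lines, now=None):
    """Build the DynamoDB item; InstanceId is the table's hash key.

    ReportedAt and Processes are hypothetical attribute names.
    """
    timestamp = now if now is not None else time.time()
    return {
        "InstanceId": instance_id,
        "ReportedAt": int(timestamp),  # integer: boto3 rejects Python floats
        "Processes": process_lines,
    }


def report(table, instance_id):
    """Write one snapshot; `table` is a boto3 DynamoDB Table resource."""
    item = build_item(instance_id, collect_processes())
    table.put_item(Item=item)
    return item
```

From cron, this could be called as report(boto3.resource("dynamodb").Table("InstanceProcesses"), instance_id), where the table name is again a placeholder. Because each put overwrites the previous item for that InstanceId, the table always holds the latest snapshot per instance. DynamoDB items are limited to 400 KB, so a very long process list may need trimming.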

Like Yeshodhan mentioned, there is no direct approach for this.
However, there is one more approach.
1) Save your private key file to an S3 bucket, create a Lambda function, and use the Python Fabric module to log in to the remote machines from the Lambda function and execute commands.
The above-mentioned approach is possible, but I highly recommend launching a separate machine and using a configuration management system (preferably Ansible) to get the results from the remote machines.

Related

How to get GPU information from multiple EC2 instances?

I have multiple shared EC2 instances with GPUs used by a team, and I want to create a streamlined process for checking which machine has free GPUs. I use ssh to access the instances and can find the relevant GPU information using
nvidia-smi
However, since I have no experience here, how can I go about developing a way to check for free GPUs in a given list of instances? So far I can only think of ssh-ing into each one and collecting the information in one place, but I would like a better way to do it.
You can configure all your instances to be managed by AWS Systems Manager (SSM) and this will allow you to run commands on a fleet of instances using the SSM Document AWS-RunShellScript. SSM allows you to remotely execute shell commands on managed instances without having to manually log in and execute the way you would with SSH. This doc should get you started.
You might also want to look into SSM Automation.
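Whichever way the command reaches the instances, the output still has to be turned into a "which GPUs are free" answer. The sketch below assumes nvidia-smi is run with its CSV query options instead of the default table, and that "free" means low memory use and low utilization. The thresholds and function names are arbitrary choices for illustration.

```python
import csv
import io

# Command to pass to AWS-RunShellScript (or run over ssh) on each instance.
GPU_QUERY = (
    "nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu "
    "--format=csv,noheader,nounits"
)


def parse_gpu_csv(output):
    """Parse the CSV printed by GPU_QUERY into a list of dicts, one per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(output), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        try:
            index, mem_used, mem_total, util = (int(f.strip()) for f in row)
        except ValueError:
            continue  # e.g. a field reported as "[N/A]"
        gpus.append(
            {
                "index": index,
                "memory_used_mib": mem_used,
                "memory_total_mib": mem_total,
                "utilization_pct": util,
            }
        )
    return gpus


def free_gpus(gpus, max_memory_mib=100, max_utilization_pct=5):
    """Return indexes of GPUs that look idle under the given thresholds."""
    return [
        g["index"]
        for g in gpus
        if g["memory_used_mib"] <= max_memory_mib
        and g["utilization_pct"] <= max_utilization_pct
    ]
```

Running parse_gpu_csv on the output collected from each instance and then free_gpus on the result gives a per-instance list of idle GPU indexes that can be merged into one report.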

AWS System Design on Preconfigured EC2s

I have an AWS workflow which is as follows:
API call -> Lambda Function (Paramiko Remote Connect) -> EC2 -> output
Basically, I have an API call, which triggers a lambda function. Within the lambda function, I remote connect to a preconfigured EC2 instance using Python Paramiko, run some commands on the ec2 instance, and then return the output. I have two main concerns with this design: 1.) latency and 2.) scalability.
For Latency:
When I call the API, it takes 8-9 seconds to run, but if I were to run the job directly on the EC2 instance, it would take 1-2 seconds. Do ssh_client.connect() and ssh_client.exec_command() significantly increase the runtime? Also, I am implementing this on a free-tier t2.micro Ubuntu 18.04 EC2 instance. Would using a paid instance type make a difference in runtime?
For Scalability:
I am sure AWS has a solution for this, but suppose that there are several simultaneous API calls. I am sure that I can't have only 1 available EC2 instance to run the job. Should I have multiple EC2 instances preconfigured and use a load-balancer? What AWS features can I use to scale this system?
If anything is unclear, please ask and I will elaborate.
Rather than using Paramiko, the more "cloud-friendly" method of running commands on an EC2 instance would be to use AWS Systems Manager Run Command, which uses an agent to run commands on the instance. It can even run commands on multiple instances, and also on on-premises computers that have the agent installed.
Another design choice is to push a "job" message to an Amazon SQS queue. The worker instances can poll the SQS queue asking for work. When they receive a message, they can perform the work. This is more of an asynchronous model because the main system does not 'wait' for the job to finish, so it needs a return path to provide the results (e.g. another SQS queue). However, it is highly scalable and more resilient, with no load balancer required. This is a common design pattern.
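The queue-based pattern can be sketched roughly as follows: one iteration of a worker that long-polls a job queue, does the work, posts the result to a second queue, and only then deletes the job message. The function name, the handle_job callback, and the two queue URLs are hypothetical; sqs is expected to be a boto3 SQS client.

```python
def process_one_job(sqs, job_queue_url, result_queue_url, handle_job,
                    wait_seconds=20):
    """Poll the job queue once; return True if a job was processed.

    `handle_job` is a caller-supplied function that takes the message body
    (a string) and returns the result as a string.
    """
    response = sqs.receive_message(
        QueueUrl=job_queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,  # long polling: wait for work to arrive
    )
    messages = response.get("Messages", [])
    if not messages:
        return False

    message = messages[0]
    result = handle_job(message["Body"])

    # Return path: publish the result before acknowledging the job.
    sqs.send_message(QueueUrl=result_queue_url, MessageBody=result)

    # Deleting last means a crash mid-job lets the message reappear
    # after its visibility timeout, so another worker can retry it.
    sqs.delete_message(
        QueueUrl=job_queue_url,
        ReceiptHandle=message["ReceiptHandle"],
    )
    return True
```

A worker instance would call this in a loop, for example while True: process_one_job(boto3.client("sqs"), JOB_URL, RESULT_URL, run_job). Adding instances behind the same queue increases throughput without any load balancer, since each message is handed to one worker at a time.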

Accessing files in EC2 from Lambda

I have a few EC2 servers in AWS. Whenever the disk usage exceeds a limit, I want to delete some files (maybe the logs folder) on the EC2 instance automatically. I am planning to use Lambda and CloudWatch for this. Can I use Lambda to interact with EC2? If that is not possible, what is an alternate approach to achieve this?
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
AWS Lambda can reach your instances if they are reachable over the internet. If they are not, it is possible to give Lambda access using a NAT or internet gateway in your VPC.
The problem is that network access to your instance does not mean access to the instance's filesystem. To delete the files from Lambda, there are two alternatives:
1) Configure a network filesystem service on your instances and connect to it from your Lambda function. On Windows you would just "share" your disks, but then you would need an SMB library in your Lambda code, since I don't think Lambda has native SMB support. Keep in mind that your security team will scream out loud when you propose this alternative.
2) Create an "agent" on your EC2 instances, keep it running as a Windows service, and call it from your Lambda function. In that case, the Lambda function triggers the agent, which is responsible for the file deletion.
Another option is to follow Ramesh's suggestion: create a PowerShell script and schedule it to run periodically. To make this easier, you can create an image that contains the PowerShell script and use that image to launch each instance. The same applies to the "agent" solution among the Lambda alternatives.
I think that, in any case, you will need to change something in your 150 servers. Using a customized image can help you to simplify this a little bit, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside an EC2 VM unless you expose them publicly by some other means.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, answer is No, you cannot access those files.
However, as a solution to your problem, you can use a scheduled job to clean up your files depending on your usage. You can use a service or a cron job.
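A scheduled cleanup job of this kind could look roughly like the sketch below, which deletes old files from one directory only when the disk holding it is above a usage threshold. The 80% threshold, the 7-day age limit, and the function names are illustrative assumptions. It only removes regular files directly inside the directory and does not recurse.

```python
import os
import shutil
import time


def disk_usage_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def delete_old_files(directory, max_age_days, now=None):
    """Delete regular files in `directory` older than `max_age_days`.

    Returns the sorted names of the deleted files.
    """
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * 86400
    with os.scandir(directory) as entries:
        candidates = list(entries)  # snapshot before deleting anything
    deleted = []
    for entry in candidates:
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            deleted.append(entry.name)
    return sorted(deleted)


def cleanup_if_needed(directory, threshold_percent=80.0, max_age_days=7):
    """Run the cleanup only when disk usage has reached the threshold."""
    if disk_usage_percent(directory) < threshold_percent:
        return []
    return delete_old_files(directory, max_age_days)
```

Scheduled from cron or Task Scheduler, a call such as cleanup_if_needed("/var/log/myapp") (a placeholder path) keeps the decision local to the instance, so nothing outside the machine needs filesystem access.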

Amazon EC2 - how to get list of processes running on instances via AWS API?

How do I get a list of processes running on Amazon EC2 instances via the AWS API?
This could be accomplished by using the Amazon EC2 Systems Manager Run Command, which uses an agent installed on EC2 instances to run remote commands.
It takes a bit of configuration, but allows you to run commands on potentially hundreds of instances with one command.
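To make the flow concrete, here is a hedged sketch using boto3: send a shell command to a list of instance IDs, then poll until each instance reports a final status and collect its output. The function name, polling interval, and timeout are illustrative choices. A production version would also need to handle the error that can occur if an invocation is polled before it has been registered.

```python
import time

# Statuses after which an invocation will not change any more.
FINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def run_command_and_collect(ssm, instance_ids, command,
                            poll_seconds=2, timeout_seconds=60,
                            sleep=time.sleep):
    """Run `command` on each instance and return {instance_id: result}.

    `ssm` is a boto3 SSM client. Instances that have not finished within
    `timeout_seconds` are simply missing from the returned dict.
    """
    response = ssm.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [command]},
    )
    command_id = response["Command"]["CommandId"]

    pending = set(instance_ids)
    results = {}
    waited = 0
    while pending and waited < timeout_seconds:
        sleep(poll_seconds)  # give the agent time before the first poll
        waited += poll_seconds
        for instance_id in sorted(pending):
            invocation = ssm.get_command_invocation(
                CommandId=command_id, InstanceId=instance_id
            )
            if invocation["Status"] in FINAL_STATUSES:
                results[instance_id] = {
                    "status": invocation["Status"],
                    "stdout": invocation["StandardOutputContent"],
                    "stderr": invocation["StandardErrorContent"],
                }
                pending.discard(instance_id)
    return results
```

For example, run_command_and_collect(boto3.client("ssm"), ["i-0123456789abcdef0"], "ps aux") would return the process listing for that (placeholder) instance. The output returned inline by Run Command is truncated for long results, so very large listings may be better written to S3 via the command's output options.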
This isn't possible. The EC2 API doesn't provide any actions to perform operations or retrieve data from the operating system layer.

Boot strapping AWS auto scale instances

We are discussing at a client how to bootstrap auto-scaled AWS instances. Essentially, an instance comes up with hardly anything on it. It has a generic startup script that asks somewhere "what am I supposed to do next?"
I'm thinking we can use Amazon tags, and have the instance itself ask AWS, using the awscli tool set, to find out its role. This could give Puppet info, environment info (dev/stage/prod, for example), and so on. This should be doable with just the DescribeTags privilege. I'm facing resistance, however.
I am looking for suggestions on how a fresh AWS instance can find out about its own purpose, whether from AWS or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data meant to solve this problem. User Data executes a shell script to perform provisioning functions on new instances. A typical pattern is to use the User Data to download or clone a configuration management source repository, such as Chef, Puppet, or Ansible, and run it locally on the box to perform more complete provisioning.
As @e-j-brennan states, it's also common to prebundle an AMI that has already been provisioned. This approach is faster since no provisioning needs to happen at boot time, but is perhaps less flexible since the instance isn't customized.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
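The tag-based idea from the question can be sketched as follows: the instance reads its own ID from the instance metadata service, then looks up its tags with DescribeTags. This assumes the instance role allows ec2:DescribeTags and that the metadata service is reachable using the token-based (IMDSv2) flow. The helper names and the "Role" tag in the usage note are hypothetical.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest"


def get_instance_id(timeout=2):
    """Fetch this instance's ID from the instance metadata service (IMDSv2)."""
    token_request = urllib.request.Request(
        f"{METADATA_URL}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    id_request = urllib.request.Request(
        f"{METADATA_URL}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(id_request, timeout=timeout) as response:
        return response.read().decode()


def get_instance_tags(ec2, instance_id):
    """Return the instance's tags as a dict; `ec2` is a boto3 EC2 client."""
    response = ec2.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )
    return {tag["Key"]: tag["Value"] for tag in response["Tags"]}
```

A startup script could then do tags = get_instance_tags(boto3.client("ec2"), get_instance_id()) and branch on something like tags.get("Role") to decide which provisioning step to run next.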
An instance doesn't have to come up with 'hardly anything on it' though. You can/should build your own custom AMI (Amazon machine image), with any and all software you need to have running on it, and when you need to auto-scale an instance, you boot it from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend using AWS Elastic Beanstalk to create specific instances. This makes things easier, since it creates the Auto Scaling groups and Launch Configurations (boot-up code) for you, which you can edit later. Also, you only pay for the EC2 instances, and you can manage most things from the Beanstalk console.