AWS System Design on Preconfigured EC2s

I have an AWS workflow which is as follows:
API call -> Lambda Function (Paramiko Remote Connect) -> EC2 -> output
Basically, I have an API call that triggers a Lambda function. Within the Lambda function, I remote-connect to a preconfigured EC2 instance using Python Paramiko, run some commands on the EC2 instance, and then return the output. I have two main concerns with this design: 1) latency and 2) scalability.
For Latency:
When I call the API, it takes 8-9 seconds to run, but if I were to run the job directly on the EC2 instance, it would take 1-2 seconds. Do the ssh_client.connect() and ssh_client.exec_command() calls cause significantly increased runtime? Also, I am implementing this on a t2.micro Ubuntu 18.04 free-tier EC2 instance. Would using a paid instance type make a difference in runtime?
For Scalability:
I am sure AWS has a solution for this, but suppose that there are several simultaneous API calls. I am sure that I can't have only 1 available EC2 instance to run the job. Should I have multiple EC2 instances preconfigured and use a load-balancer? What AWS features can I use to scale this system?
If anything is unclear, please ask and I will elaborate.

Rather than using Paramiko, the more "cloud-friendly" method of running commands on an EC2 instance would be to use AWS Systems Manager Run Command, which uses an agent to run commands on the instance. It can even run commands on multiple instances, and also on on-premises computers that have the agent installed.
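A minimal boto3 sketch of this approach, assuming the instance is already registered with Systems Manager (the instance ID and command below are placeholders):

```python
import boto3

ssm = boto3.client("ssm")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder managed-instance ID

# Send a shell command to the managed instance.
response = ssm.send_command(
    InstanceIds=[INSTANCE_ID],
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["uptime"]},
)
command_id = response["Command"]["CommandId"]

# Wait for the invocation to finish, then fetch its output.
ssm.get_waiter("command_executed").wait(CommandId=command_id, InstanceId=INSTANCE_ID)
output = ssm.get_command_invocation(CommandId=command_id, InstanceId=INSTANCE_ID)
print(output["StandardOutputContent"])
```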
Another design choice is to push a "job" message to an Amazon SQS queue. The worker instances can poll the SQS queue asking for work. When they receive a message, they can perform the work. This is more of an asynchronous model because the main system does not 'wait' for job to finish, so it needs a return path to provide the results (eg another SQS queue). However, it is highly scalable and more resilient, with no load balancer required. This is a common design pattern.
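A rough boto3 sketch of that queue-based pattern; the queue URLs and job fields are placeholders, not part of the original design:

```python
import json
import boto3

sqs = boto3.client("sqs")
JOB_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/job-queue"        # placeholder
RESULT_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/result-queue"  # placeholder

# Producer side (e.g. the Lambda behind the API): push a job message.
sqs.send_message(
    QueueUrl=JOB_QUEUE_URL,
    MessageBody=json.dumps({"job_id": "42", "command": "run-report"}),
)

# Worker side (on an EC2 instance): long-poll for work, do it, return the result.
while True:
    resp = sqs.receive_message(QueueUrl=JOB_QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])
        result = f"finished {job['job_id']}"  # placeholder for the real work
        sqs.send_message(
            QueueUrl=RESULT_QUEUE_URL,
            MessageBody=json.dumps({"job_id": job["job_id"], "result": result}),
        )
        sqs.delete_message(QueueUrl=JOB_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```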

Related

How to run cron job only on single instance in AWS AutoScaling?

I have scheduled 2 cronjobs for my application.
My application server is in an Auto Scaling group, and I keep a minimum of 2 instances for high availability. Everything is working fine, but the cron jobs run multiple times because there are 2 instances in the Auto Scaling group.
I cannot limit the group to 1 instance because my application is already in production and I prefer to keep it highly available.
How can I limit the cron job to run on a single instance? Or should I use other services such as AWS Lambda or AWS Elastic Beanstalk?
Firstly, you should consider whether running the crons on these instances is suitable at all. If you're trying to keep the service highly available and it is used directly by customers, what impact will the crons have on its performance?
Perhaps consider using a separate Auto Scaling group or instance with a total of 1 instance to run these crons. You could launch the instance or scale up the Auto Scaling group just before the cron needs to run, then automate the shutdown after it has completed.
Otherwise you would need to consider using a locking mechanism in your script: the script writes a lock to confirm that a run is in progress, and at the beginning of each run it checks whether a lock is already held and exits if so. To further reduce the chance of a collision between multiple servers, consider adding jitter (a random number of seconds of sleep) to the start of your script. A sketch of a DynamoDB-based lock appears after the list below.
Suitable technologies for writing a lock are below:
DynamoDB using strongly consistent reads.
EFS for a Linux application, or FSX for a Windows application.
S3 using strong consistency.
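A minimal sketch of the DynamoDB option, assuming a table (here called cron_locks, a placeholder) with a string partition key named lock_id; the conditional put succeeds only for the first instance that tries to take the lock:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
TABLE = "cron_locks"  # placeholder table with partition key "lock_id" (string)

def acquire_lock(lock_id: str) -> bool:
    """Return True if this process won the lock, False if another holder exists."""
    try:
        dynamodb.put_item(
            TableName=TABLE,
            Item={"lock_id": {"S": lock_id}},
            ConditionExpression="attribute_not_exists(lock_id)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise

# e.g. lock ID made of the job name plus the scheduled date, so each run gets its own lock
if acquire_lock("nightly-report-2024-01-01"):
    pass  # run the cron job here; only one instance reaches this branch
```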
The solutions suggested by Chris Williams sound reasonable if using a Lambda function is not an option.
One way to simulate a cron job is by using CloudWatch Events (now known as EventBridge) in conjunction with AWS Lambda.
First you need to write a Lambda function with the code that needs to be executed on a schedule. EventBridge schedule expressions support cron syntax.
You can then use a schedule expression with EventBridge/CloudWatch Events in the same way as a crontab entry, and set the Lambda function as the target.
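A rough boto3 illustration of wiring this up; the rule name, cron expression, and Lambda ARN are placeholders:

```python
import boto3

events = boto3.client("events")

# Run every day at 02:15 UTC (the cron expression is just an example).
events.put_rule(
    Name="nightly-cron",
    ScheduleExpression="cron(15 2 * * ? *)",
)

# Point the rule at the Lambda function that contains the cron logic.
# (The Lambda also needs a resource-based permission allowing events.amazonaws.com to invoke it.)
events.put_targets(
    Rule="nightly-cron",
    Targets=[{
        "Id": "cron-lambda",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:nightly-cron",
    }],
)
```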
You can enable termination protection on one of the instances. Attach the necessary role and permissions for Systems Manager. Once the instance appears as a managed instance under Systems Manager, you can create a scheduled event in CloudWatch to run SSM documents. If you are running a bash script, convert it to an SSM document and set that document as the target, or use the shell script document (AWS-RunShellScript) to run the commands.

How to design a server monitoring system running on AWS

I am building some form of a monitoring agent application that is running on AWS EC2 machines.
I need to be able to send commands to the agent running on a specific EC2 instance and only an agent running on that instance should pick it up and act on it. New EC2 instances can come and go at any point in time.
I can use Kinesis and push all commands for all instances there, and the agents can pick up the ones targeted at them. The problem with this is that agents will have to receive a lot of commands that are not for them and filter them out.
I could also use one SQS queue per instance, but that would require creating/deleting a queue every time a new instance is provisioned.
Would like to hear if there are already proven solutions for a similar scenario.
There already is a fully functional feature provided by AWS. I would rather use that one as opposed to reinventing the wheel, as it is a robust, well-integrated, and proven solution that’s being leveraged by thousands of AWS customers to gain operational insights into their instance fleets:
AWS Systems Manager Agent (SSM Agent) is a piece of software that can be installed and configured on an EC2 instance (and it’s pre-installed on many of the default AMIs, including both versions of Amazon Linux, Ubuntu, and various versions of Windows Server). SSM Agent makes it possible to update, manage, and configure these resources. The agent processes requests from the Systems Manager service in the AWS Cloud, and then runs them as specified in the request. SSM Agent then sends status and execution information back to the Systems Manager service by using the Amazon Message Delivery Service.
You can learn more about AWS Systems Manager and the breadth and depth of functionality it provides in the AWS Systems Manager documentation.
Have you considered using Simple Notification Service (SNS)? Each new EC2 instance could subscribe to a topic (e.g. via HTTP), and subscriptions from terminated instances could be removed.
That way the topic would stay constant regardless of EC2 rotation.
It might be worth noting that SNS supports subscription filters, so it can decide which messages are delivered to which endpoint.
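A sketch of that idea with boto3, where each agent subscribes with a filter policy on a message attribute carrying the instance ID; the topic ARN, endpoint, and attribute name are placeholders:

```python
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:agent-commands"  # placeholder

# On instance startup: subscribe this instance's HTTP endpoint and only
# accept messages whose "instance_id" attribute matches this instance.
sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol="http",
    Endpoint="http://ec2-198-51-100-1.compute-1.amazonaws.com/commands",  # placeholder
    Attributes={"FilterPolicy": json.dumps({"instance_id": ["i-0123456789abcdef0"]})},
)

# Sender side: publish a command targeted at one specific instance.
sns.publish(
    TopicArn=TOPIC_ARN,
    Message=json.dumps({"command": "collect-metrics"}),
    MessageAttributes={
        "instance_id": {"DataType": "String", "StringValue": "i-0123456789abcdef0"},
    },
)
```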
In my view, AWS SWF could be an option here, since Amazon SWF coordinates work across distributed application components and provides SDKs for various platforms. Refer to the official FAQs for a more in-depth understanding: https://aws.amazon.com/swf/faqs/
Not entirely clear what the volume of the monitoring system messages will be.
But the architecture requirements described sound to me as follows:
The agents on the EC2 instances are (constantly?) polling some centralized service, which is a poll-based architecture.
Messages are sent to a specific, predetermined EC2 instance, which is a push-based architecture.
To support both options without significant filtering of the messages, I suggest you try using an intermediate pub/sub system such as Kafka, which can be run as a managed service on AWS with MSK.
Then, to differentiate between the instances, create a Kafka topic named after the EC2 instance ID.
This gives each instance a unique topic, so it knows to read its own messages from the topic named after its instance ID.
You can also push producer messages to a specific EC2 instance by sending them to the topic in the cluster named after that instance's ID.
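A minimal kafka-python sketch of this topic-per-instance idea; the broker list and instance ID are placeholders, and kafka-python is just one client library choice for an MSK cluster:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = ["b-1.example-msk.amazonaws.com:9092"]  # placeholder broker list
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder; in practice read from instance metadata

# Central service: push a command onto the topic named after the target instance.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(INSTANCE_ID, {"command": "collect-metrics"})
producer.flush()

# Agent on the instance: consume only from its own topic.
consumer = KafkaConsumer(
    INSTANCE_ID,
    bootstrap_servers=BOOTSTRAP,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    command = message.value  # act on the command here
```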
Since there are many EC2 instances coming and going, you will end up with many topics. To handle the volume of topics, you can emit a CloudWatch notification on each EC2 termination event and check CloudWatch to see which EC2 instances were terminated, so their topics can be deleted.
Alternatively, you can trigger a Lambda directly on the EC2 termination event and log it by writing a file named after the instance ID to an S3 bucket, which a second Lambda can watch to delete old instances' topics from the Kafka cluster when their instance IDs appear there.

AWS "Lambda" Needing More CPU/RAM - Trigger EC2 "Job?"

I have a daily process that needs to digest a tremendous amount of data from two external sources. It normally requires around 28 GB of RAM and a decent amount of processing power. Due to this, an AWS Lambda won't work.
In the meantime, I've been running the process on an EC2 instance. In order to save resources, I've attempted to start the instance using a CloudWatch event. Since no event exists for "StartEC2," I'm kicking off an AWS Lambda instead, which in turn starts the EC2 instance using Amazon's SDK libraries.
All of this is extremely cumbersome, and I've been looking for a library or pattern that can do what I want. Essentially, I need to start an EC2 instance on a cron/event, deliver a unit of work to it (shell script, Java app, whatever), have it run it, then shut it down.
I'd love any suggestions for accomplishing this.
Look into AWS Systems Manager (SSM): you can create an Automation document that launches the instance, runs any custom scripts or tasks, and shuts it down again when you're done. You can trigger the SSM Automation on a cron schedule via CloudWatch Events.
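As a hedged sketch, the Lambda (or CloudWatch Events target) that kicks off such a run could be as small as the following; the Automation document name and its parameter are hypothetical, since the start/run/stop steps would live in the document itself:

```python
import boto3

ssm = boto3.client("ssm")

def lambda_handler(event, context):
    # Start a custom Automation document (name and parameter are hypothetical)
    # that launches/starts the instance, runs the job, and stops it again.
    ssm.start_automation_execution(
        DocumentName="StartRunAndStopInstance",
        Parameters={"InstanceId": ["i-0123456789abcdef0"]},  # placeholder instance ID
    )
```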
You may also want to consider AWS Batch for this type of workload.

Accessing files in EC2 from Lambda

I have a few EC2 servers in AWS. Whenever disk usage exceeds a limit, I want to automatically delete some files (perhaps the logs folder) on the EC2 instance. I am planning to use Lambda and CloudWatch for this. Can I use Lambda to interact with EC2? If that is not possible, what is an alternative approach to achieve this functionality?
This is not an appropriate use-case for an AWS Lambda function.
AWS Lambda is suitable for tasks where compute is required in response to an event. Your use-case, however, is to manipulate information on an EC2 instance, which does not need cloud compute.
You could run a script on each computer, triggered by a Scheduled Task.
Alternatively, you could use the Systems Manager Run Command (also known as the EC2 Run Command), which allows you to run commands on multiple Amazon EC2 instances and view the results. This could be used to trigger a local script, or it could pass the whole command to run (including the script). It is purpose-built for the type of task you describe.
AWS Lambda can reach your instances if they are reachable over the internet. If they are not, you can give the Lambda function access by using a NAT instance or gateway in your VPC.
The problem is that access to your instance does not mean access to the instance's filesystem. To delete the files from Lambda you have two alternatives:
Configure a network filesystem service on your instances and connect to it from your Lambda function. On Windows you would just "share" your disks, but in that case you would need an SMB library in your Lambda code, since Lambda, as far as I know, has no native SMB support. Just keep in mind that your security people will scream out loud when you propose this alternative.
Create an "agent" on your EC2 instances, keep it running as a Windows service, and call this agent from your Lambda function. In that case, the Lambda starts the execution of the agent, which is responsible for the file deletion.
Another option is to follow Ramesh's suggestion: create a PowerShell script and configure a scheduled job. To make this easier, you can bake the PowerShell script into an image and use that image to initialize each instance. The same approach would also work for the "agent" in the Lambda alternatives above.
I think that, in any case, you will need to change something on your 150 servers. Using a customized image can help simplify this a little, but you will not get a solution without some changes.
According to the following thread, you cannot access files inside an EC2 VM unless you expose them using some other mechanism.
AWS Forum
Quoting from the forum
If you are talking about the underlying EC2 instance, answer is No, you cannot access those files.
However, as a solution to your problem, you can use a scheduled job to clean up your files depending on your usage. You can use a service or a cron job.

AWS Lambda run command on EC2 instance and get result

I have an EC2 instance that is running a few processes. I also have a Lambda script that is triggered through various means. I would like this Lambda script to talk to my EC2 instance and get a list of running processes from it (Essentially run ps aux on the EC2 box, and read the output).
Now this is easy enough with just one instance and its instance-id. Just SSH in, run the command, get the output, and be on my way. However, I would like to scale this to multiple EC2 instances, for which only the instance-id is known and SSH keys may not be given.
Is such a configuration possible with Lambda and Boto (or other libraries)? Or do I just have to run a microserver on each of my instances that will reply with the given information (something I'm really trying to avoid)?
You can do this easily with AWS Systems Manager - Run Command
AWS Systems Manager provides you safe, secure remote management of your instances at scale without logging into your servers, replacing the need for bastion hosts, SSH, or remote PowerShell.
Specifically:
Use the send-command API from your Lambda function to get the list of processes on a group of instances. You can do this by providing a list of instance IDs or even a tag query.
You can also use CloudWatch Events to trigger a Run Command directly
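A rough boto3 sketch of the tag-targeted call from inside a Lambda function; the tag key/value are illustrative, and a real function would poll rather than assume the invocations have finished:

```python
import boto3

ssm = boto3.client("ssm")

def lambda_handler(event, context):
    # Run "ps aux" on every managed instance carrying a given tag (tag is a placeholder).
    response = ssm.send_command(
        Targets=[{"Key": "tag:Role", "Values": ["worker"]}],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["ps aux"]},
    )
    command_id = response["Command"]["CommandId"]

    # Collect the per-instance output of the command invocations.
    invocations = ssm.list_command_invocations(CommandId=command_id, Details=True)
    return {
        inv["InstanceId"]: inv["CommandPlugins"][0]["Output"]
        for inv in invocations["CommandInvocations"]
    }
```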
I don't think there is something available out of the box for this scenario.
Instead of querying, try an alternative approach: install an agent on all EC2 instances that reports the required information to a central service or, say, a DynamoDB table with the instance ID as the hash key.
You may want to bake this script into the AMI itself as a cron job (executed hourly, perhaps).
With this implementation, you reduce the complexity of managing and running a separate web service on each EC2 instance.
Query the DynamoDB table on demand. There will be a lag, as the data may not be real time, but you can always reduce the cron interval to suit your needs.
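A minimal sketch of such an agent-side cron script, assuming a DynamoDB table (named instance_processes here, a placeholder) keyed on InstanceId; IMDSv1 is shown for brevity, while IMDSv2 would require a session token:

```python
import subprocess
import urllib.request

import boto3

TABLE = "instance_processes"  # placeholder table keyed on "InstanceId"

# Look up this instance's ID from the instance metadata service.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

# Capture the process list and write it to DynamoDB for on-demand querying.
processes = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
boto3.resource("dynamodb").Table(TABLE).put_item(
    Item={"InstanceId": instance_id, "Processes": processes}
)
```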
As Yeshodhan mentioned, there is no direct approach for this.
However, there is one more approach.
1) Save your private key file to an S3 bucket, create a Lambda function, and use the Python Fabric module to log in to the remote machines from the Lambda function and execute commands.
The above-mentioned approach is possible, but I would highly recommend launching a separate machine, using a configuration management system (preferably Ansible), and getting the results from the remote machines that way.
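For completeness, a rough sketch of the Fabric approach described in option 1 above, using the Fabric 2.x API; the bucket, key, hostname, and user are placeholders:

```python
import boto3
from fabric import Connection  # Fabric 2.x API

# Fetch the private key from S3 into Lambda's writable /tmp directory
# (bucket and object names are placeholders).
boto3.client("s3").download_file("my-keys-bucket", "ec2-key.pem", "/tmp/ec2-key.pem")

# SSH into the instance and capture the process list.
conn = Connection(
    host="ec2-198-51-100-1.compute-1.amazonaws.com",  # placeholder hostname
    user="ubuntu",
    connect_kwargs={"key_filename": "/tmp/ec2-key.pem"},
)
result = conn.run("ps aux", hide=True)
print(result.stdout)
```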