automatically start an EC2 instance when a request is made - amazon-web-services

I have a website that I access very infrequently (sometimes two or three times a day, sometimes not at all).
I would like to automatically shut down the instance as soon as there is no traffic (this is possible with a CloudWatch alarm).
The issue is that I would like to start the instance again as soon as a request hits the website (I don't mind at all having to wait for the instance to come back online).
Is there any way of doing that? If yes, how would that work technically?

DISCLAIMER: these are only theoretical thoughts.
Main idea: a landing page hosted on S3 (static website); visiting this page triggers a Lambda function that launches the instance.
More details:
an Amazon S3 static website with a landing page + javascript to:
make a call to Amazon API Gateway
test if the instance is ready
redirect to proper page when everything is ready
maybe some URL/JavaScript tricks to prevent the instance from being started by crawlers, bots, scanners, etc.
an Amazon API Gateway endpoint: used only to trigger the Lambda function
an AWS Lambda function that launches your instance if it is not running
Depending on your needs, you can try to go serverless like here: https://gofore.com/en/going-serverless-with-amazon-s3-and-lambda/
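A minimal sketch of that Lambda function, assuming a single known instance (the ID below is a placeholder) and an EC2 client that is injectable so the logic can be exercised without AWS:

```python
import json

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical; use your instance's ID

def handler(event, context, ec2=None):
    # ec2 is injectable for testing; real invocations build a boto3 client
    if ec2 is None:
        import boto3
        ec2 = boto3.client("ec2")
    state = ec2.describe_instances(InstanceIds=[INSTANCE_ID])[
        "Reservations"][0]["Instances"][0]["State"]["Name"]
    if state == "stopped":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        state = "pending"  # StartInstances moves a stopped instance to 'pending'
    # CORS header so the S3-hosted landing page (a different origin) can poll
    return {"statusCode": 200,
            "headers": {"Access-Control-Allow-Origin": "*"},
            "body": json.dumps({"state": state})}
```

The landing page's JavaScript would call this endpoint periodically and redirect once the reported state is "running".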

You can use an Auto Scaling group. Set the scaling rules as per your needs and that's it.
When scaling up you can use 'Add' 1 instance, and when scaling down you can use 'Set to' 0 instances.
However, keep in mind that when there are 0 instances and a request comes in, that request will not be served; it can only trigger the scale-up so that subsequent requests can be served after the instance has warmed up.

Related

AWS create an EC2 instance when accessed

I currently have an EC2 instance that is on constantly, but has very low usage. I wish to shut down the instance when not in use and restart the instance when it is accessed.
It can be turned off automatically with CloudWatch Alarms fairly easily (tutorial here: https://successengineer.medium.com/how-to-automatically-turn-off-your-ec2-instance-in-2021-b73374e51090 ).
What I cannot find is a way to start the instance automatically when there is an incoming request, given that there are no other instances active.
Currently users arrive by redirection from a Route 53 record (policy: simple, type: A).
The Route 53 record can be changed to make this possible.
If the request is sent to the EC2 instance via an A-Record, then it is not possible to 'activate' the EC2 instance. This is because the request is being sent to the IP address of an instance that is turned off.
One option would be to provide a way for users to Start the instance, and after 1-2 minutes they could access the instance. Some ways to trigger the instance to start are:
Use the AWS CLI to call StartInstances() (but this requires AWS credentials)
Use a web browser to access a URL, which triggers an AWS Lambda function, which calls StartInstances()
If you do not want users to first Start the instance and wait until it is available, you'll need to send the requests to something that is not the EC2 instance. That 'something' could then check that the instance is running (and Start the instance if it is not running), and then forward the request to the instance. Since you are probably doing this to save money, you presumably don't want to use something that is "always on" to do this task.
You should be able to create an AWS Lambda function with a web-accessible URL, so that requests are sent to the Lambda function instead of the instance. The Lambda function would confirm that the instance is running and then forward the request to the instance, and then pass back the response to the user. OR, it could return an HTTP redirect to send the user to the instance. (So, the URL will change between the original request and where the response comes from.)
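A sketch of the redirect variant, assuming a Lambda function URL, a placeholder instance ID and site address, and an injectable EC2 client for illustration:

```python
INSTANCE_ID = "i-0123456789abcdef0"    # hypothetical
SITE_URL = "https://www.example.com/"  # hypothetical address served by the instance

def handler(event, context, ec2=None):
    # ec2 is injectable for testing; real invocations build a boto3 client
    if ec2 is None:
        import boto3
        ec2 = boto3.client("ec2")
    state = ec2.describe_instances(InstanceIds=[INSTANCE_ID])[
        "Reservations"][0]["Instances"][0]["State"]["Name"]
    if state == "running":
        # instance is up: redirect the browser to it (so the URL changes)
        return {"statusCode": 302, "headers": {"Location": SITE_URL}}
    if state == "stopped":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
    # instance still booting: ask the browser to retry shortly
    return {"statusCode": 503,
            "headers": {"Retry-After": "30"},
            "body": "Instance is starting; please refresh in a minute."}
```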
Bottom line: There is no easy way to achieve what you want. You could use auto-scaling to handle variable loads, but that assumes that at least one instance is always running. If an instance is off, it cannot respond to requests sent to the instance.

How can I stop and start EC2 automatically when it is not responding on HTTP

I have an EC2 instance running an HTTP server; I want to stop and start it automatically when it has not been responding on the HTTP port for 2 minutes.
What is the best way to implement this on AWS without using an Auto Scaling group or an Elastic Load Balancer (ELB)?
As mentioned, I don't need to create a new instance, just stop and start the existing one.
First, instead of stopping and starting the instance, consider restarting the service itself with monit or another monitoring tool, because restarting the instance takes time and is not a good idea.
But if you are worried about the instance itself going down, you can configure auto recovery (https://aws.amazon.com/blogs/aws/new-auto-recovery-for-amazon-ec2/).
Another, more custom approach: inside the instance, do a simple hello check using curl, store the response in a log, and schedule the check with cron. Sync the log to CloudWatch, plot a metric from the logs, and configure an alarm if the metric count stays below a threshold for 2 minutes. Then write a Lambda function that restarts the instance and associate it with the alarm (https://aws.amazon.com/premiumsupport/knowledge-center/start-stop-lambda-cloudwatch/). Since you mentioned you have a single instance, this approach will work; with more than one instance you would need to handle namespaces. But again, restarting the instance is not a good idea.
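The alarm-triggered Lambda could be sketched as below; the instance ID is a placeholder, and the client is injectable for illustration. Note that a stop/start cycle changes the public IP unless an Elastic IP is attached, whereas a plain reboot would not:

```python
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical; use your instance's ID

def restart_handler(event, context, ec2=None):
    # ec2 is injectable for testing; real invocations build a boto3 client
    if ec2 is None:
        import boto3
        ec2 = boto3.client("ec2")
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    # block until fully stopped: StartInstances has no effect while 'stopping'
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
```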
I am using a Route 53 health check which, when triggered, sends a notification to an SNS topic that triggers a Lambda function that reboots the server.

AWS Lambda Provisioning Business Logic

In AWS Lambda, there is no provisioning done by us. But I was wondering how AWS Lambda provisions machines to serve requests. Does it create an EC2 server for each request, execute the request, and then kill the server? Or does it keep some EC2 servers always on to serve requests by executing the Lambda function? If it is the former, I would expect that to hurt the performance of AWS Lambda. Can anyone guide me on this?
Lambdas run inside a Docker-like container, on EC2 servers (using Firecracker) that are highly, highly optimized. AWS has thousands of servers running full time to serve all of the Lambda functions that are running.
A cold start Lambda (one that's never been run before) starts up in a few seconds, depending on how big it is. An EC2 server takes 30+ seconds to startup. If it had to startup an EC2 server, you'd never be able to use a Lambda through API gateway (because API Gateway has a 30 second timeout). But obviously you can.
If you want your Lambdas to startup super duper fast (100ms), use Provisioned Concurrency.
AWS Lambda is known to reuse resources. It will not create an EC2 server for each request, so that is not a performance concern.
But you should note that the disk space provided for your function is sometimes not cleaned up properly, as some users have reported.
You can read more on the execution life cycle of Lambda here: https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html

What's the best method for creating a scheduler for running EC2 instances?

I want to create a web app for my organization where users can schedule in advance at what times they'd like their EC2 instances to start and stop (like creating events in a calendar), and those instances will be automatically started or stopped at those times. I've come across four different options:
AWS Datapipeline
Cron running on EC2 instance
Scheduled scaling of Auto Scaling Group
AWS Lambda scheduled events
It seems to me that I'll need a database to store the user's scheduled times for autostarting and autostopping an instance, and that I'll have to pull that data from the database regularly (to make sure that's the latest updated schedule). Which would be the best of the four above options for my use case?
Edit: Auto Scaling only seems to be for launching and terminating instances, so I can rule that out.
Simple!
Ask users to add a tag to their instance(s) indicating when they should start and stop (figure out some format so they can easily specify Mon-Fri or Every Day)
Create an AWS Lambda function that scans instances for their tags and starts/stops them based upon the tag content
Create an Amazon CloudWatch Event rule that triggers the Lambda function every 15 minutes (or whatever resolution you want)
You can probably find some sample code if you search for AWS Stopinator.
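A sketch of such a "stopinator" scan, under an assumed tag convention ('AutoStart'/'AutoStop' tags holding a UTC time like '08:00', aligned to the rule's 15-minute resolution); the client is injectable for illustration:

```python
from datetime import datetime, timezone

def scan_and_toggle(now=None, ec2=None):
    # Tag convention is an assumption: 'AutoStart' and 'AutoStop' hold a
    # UTC time like '08:00'. ec2 is injectable for testing; real runs
    # build a boto3 client.
    if ec2 is None:
        import boto3
        ec2 = boto3.client("ec2")
    hhmm = (now or datetime.now(timezone.utc)).strftime("%H:%M")
    to_start, to_stop = [], []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                state = inst["State"]["Name"]
                if tags.get("AutoStart") == hhmm and state == "stopped":
                    to_start.append(inst["InstanceId"])
                elif tags.get("AutoStop") == hhmm and state == "running":
                    to_stop.append(inst["InstanceId"])
    if to_start:
        ec2.start_instances(InstanceIds=to_start)
    if to_stop:
        ec2.stop_instances(InstanceIds=to_stop)
    return to_start, to_stop
```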
Take a look at ParkMyCloud if you're looking for an external SaaS app that can help your users easily schedule (or override that schedule) your EC2, RDS, and ASG instances. It also connects to SSO, provides an API, and shows you all of your resources across regions/accounts/clouds. There's a free trial available if you want to test it out.
Disclosure: I work for ParkMyCloud.

How to make an HTTP call reach all instances behind an Amazon AWS load balancer?

I have a web app which runs behind an Amazon AWS Elastic Load Balancer with 3 instances attached. The app has a /refresh endpoint to reload reference data. It needs to be run whenever new data is available, which happens several times a week.
What I have been doing is assigning public address to all instances, and do refresh independently (using ec2-url/refresh). I agree with Michael's answer on a different topic, EC2 instances behind ELB shouldn't allow direct public access. Now my problem is how can I make elb-url/refresh call reaching all instances behind the load balancer?
And it would be nice if I can collect HTTP responses from multiple instances. But I don't mind doing the refresh blindly for now.
One of the ways I'd solve this problem is by:
writing the data to an AWS S3 bucket
triggering an AWS Lambda function automatically from the S3 write
using the AWS SDK to identify the instances attached to the ELB from the Lambda function, e.g. using boto3 from Python or the AWS Java SDK
calling /refresh on the individual instances from the Lambda
ensuring when a new instance is created (due to autoscaling or deployment), it fetches the data from the s3 bucket during startup
ensuring that the private subnets the instances are in allows traffic from the subnets attached to the Lambda
ensuring that the security groups attached to the instances allow traffic from the security group attached to the Lambda
the key wins of this solution are
the process is fully automated from the instant the data is written to s3,
avoids data inconsistency due to autoscaling/deployment,
simple to maintain (you don't have to hardcode instance ip addresses anywhere),
you don't have to expose instances outside the VPC
highly available (AWS ensures the Lambda is invoked on s3 write, you don't worry about running a script in an instance and ensuring the instance is up and running)
hope this is useful.
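The discovery-and-refresh steps above could be sketched as follows, using the target-group API; the ARN is a placeholder, and the clients and fetch function are injectable for illustration. The Lambda must run inside the VPC so the private addresses are reachable:

```python
import urllib.request

TARGET_GROUP_ARN = ("arn:aws:elasticloadbalancing:eu-west-1:"
                    "123456789012:targetgroup/web/0123456789abcdef")  # hypothetical

def refresh_all(elbv2=None, ec2=None, fetch=None):
    # Clients and fetch are injectable for testing; real runs use boto3
    # and urllib over the VPC's private network.
    if elbv2 is None or ec2 is None:
        import boto3
        elbv2 = elbv2 or boto3.client("elbv2")
        ec2 = ec2 or boto3.client("ec2")
    if fetch is None:
        fetch = lambda url: urllib.request.urlopen(url, timeout=10).status
    # 1. instance IDs registered with the target group
    ids = [t["Target"]["Id"]
           for t in elbv2.describe_target_health(
               TargetGroupArn=TARGET_GROUP_ARN)["TargetHealthDescriptions"]]
    # 2. resolve their private IPs and call /refresh on each one directly
    statuses = {}
    for reservation in ec2.describe_instances(InstanceIds=ids)["Reservations"]:
        for inst in reservation["Instances"]:
            url = "http://{}/refresh".format(inst["PrivateIpAddress"])
            statuses[inst["InstanceId"]] = fetch(url)
    return statuses
```

Returning the per-instance statuses also covers the "collect HTTP responses" wish in the question.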
While this may not be possible given the constraints of your application and circumstances, it's worth noting that best-practice application architecture for instances running behind an AWS ELB (particularly if they are part of an Auto Scaling group) is to ensure that the instances are not stateful.
The idea is to make it so that you can scale out by adding new instances, or scale-in by removing instances, without compromising data integrity or performance.
One option would be to change the application to store the results of the reference-data reload in an off-instance data store, such as a cache or database (e.g. ElastiCache or RDS), instead of in-memory.
If the application was able to do that, then you would only need to hit the refresh endpoint on a single server - it would reload the reference data, do whatever analysis and manipulation is required to store it efficiently in a fit-for-purpose way for the application, store it to the data store, and then all instances would have access to the refreshed data via the shared data store.
While there is a latency increase from adding a round-trip to a data store, it is often well worth it for the consistency of the application. Under your current model, if one server lags behind the others in refreshing the reference data and the ELB is not using sticky sessions, requests via the ELB will return inconsistent data depending on which server they are allocated to.
You can't make these requests through the load balancer, so you will have to open up the security group of the instances to allow incoming traffic from sources other than the ELB. That doesn't mean you need to open it to all direct traffic, though. You could simply whitelist an IP address in the security group to allow requests from your specific computer.
If you don't want to add public IP addresses to these servers then you will need to run something like a curl command on an EC2 instance inside the VPC. In that case you would only need to open the security group to allow traffic from some server (or group of servers) that exist in the VPC.
I solved it differently, without opening up new traffic in security groups or resorting to external resources like S3. It's flexible in that it will dynamically notify instances added through ECS or ASG.
The ELB's Target Group offers a periodic health check to ensure that the instances behind it are live. This is a URL that your server responds on. The endpoint can include a timestamp parameter of the most recent configuration. Every server in the TG will receive the health-check ping within the configured interval. If the parameter in the ping changes, it signals a refresh.
A URL may look like:
/is-alive?last-configuration=2019-08-27T23%3A50%3A23Z
Above I passed a UTC timestamp of 2019-08-27T23:50:23Z
A service receiving the request will check if the in-memory state is at least as recent as the timestamp parameter. If not, it will refresh its state and update the timestamp. The next health-check will result in a no-op since your state was refreshed.
Implementation notes
If refreshing the state can take more time than the interval window or the TG health timeout, you need to offload it to another thread to prevent concurrent updates or outright service disruption, as the health checks need to return promptly. Otherwise the node will be considered offline.
If you are using the traffic port for this purpose, make sure the URL is secured by making it impossible to guess. Anything publicly exposed can be subject to a DoS attack.
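The handler logic above could be sketched like this, with the refresh offloaded to a background thread as described (the actual data reload is left as a placeholder):

```python
import threading
from datetime import datetime, timezone

class ReferenceData:
    """In-memory state plus the /is-alive check sketched above."""

    def __init__(self):
        self.loaded_at = datetime.min.replace(tzinfo=timezone.utc)
        self._lock = threading.Lock()
        self._refreshing = False

    def _do_refresh(self, requested):
        # placeholder: reload the actual reference data here
        with self._lock:
            self.loaded_at = requested
            self._refreshing = False

    def is_alive(self, last_configuration):
        # last_configuration: ISO-8601 timestamp from the health-check URL
        # (use '+00:00' rather than 'Z' on Python < 3.11)
        requested = datetime.fromisoformat(last_configuration)
        with self._lock:
            stale = self.loaded_at < requested and not self._refreshing
            if stale:
                self._refreshing = True
        if stale:
            # refresh off-thread so the health check returns promptly
            threading.Thread(target=self._do_refresh, args=(requested,)).start()
        return 200  # always report healthy; staleness only triggers a refresh
```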
As you are using S3 you can automate your task by using the ObjectCreated notification for S3.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification.html
You can install the AWS CLI and write a simple Bash script that monitors for that ObjectCreated notification. Start a cron job that looks for the S3 notification for creation of a new object.
Set up a condition in that script to curl http://127.0.0.1/refresh when it detects a new object created in S3; then you don't have to do the refresh manually each time.
I personally like the answer by redoc, but wanted to give another alternative for anyone who is interested, which is a combination of his and the accepted answer. Using S3 object-creation events, you can trigger a Lambda, but instead of discovering the instances and calling them, which requires the Lambda to be in the VPC, you could have the Lambda use SSM (aka Systems Manager) to execute commands via a PowerShell or Bash document on EC2 instances that are targeted via tags. The document would then call 127.0.0.1/refresh like the accepted answer does. The benefit of this is that your Lambda doesn't have to be in the VPC, and your EC2 instances don't need inbound rules to allow traffic from the Lambda. The downside is that it requires the instances to have the SSM agent installed, which sounds like more work than it really is: there are AWS AMIs already optimized with the SSM agent, and installing it yourself in the user data is very simple. Another potential downside, depending on your use case, is that SSM uses an exponential ramp-up for simultaneous executions: if you're targeting 20 instances, it runs 1, then 2 at once, then 4 at once, then 8, until they are all done or it reaches the maximum you set. This is because of its built-in error recovery; it doesn't want to destroy all your stuff if something is wrong, like slowly putting your weight on some ice.
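A sketch of the SSM-based approach; the tag key/value are assumptions, AWS-RunShellScript is the built-in SSM document for shell commands, and the client is injectable for illustration:

```python
def refresh_via_ssm(ssm=None):
    # Run the local refresh call on every instance tagged App=web.
    # The tag is an assumption; ssm is injectable for testing, and real
    # runs build a boto3 client.
    if ssm is None:
        import boto3
        ssm = boto3.client("ssm")
    resp = ssm.send_command(
        Targets=[{"Key": "tag:App", "Values": ["web"]}],
        DocumentName="AWS-RunShellScript",  # built-in SSM shell document
        Parameters={"commands": ["curl -s http://127.0.0.1/refresh"]},
        MaxConcurrency="50%",  # override the default exponential ramp-up
        MaxErrors="1",
    )
    return resp["Command"]["CommandId"]
```

Setting MaxConcurrency explicitly is how you trade the cautious default ramp-up for faster fan-out.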
You could make the call multiple times in rapid succession to call all the instances behind the Load Balancer. This would work because the AWS Load Balancers use round-robin without sticky sessions by default, meaning that each call handled by the Load Balancer is dispatched to the next EC2 Instance in the list of available instances. So if you're making rapid calls, you're likely to hit all the instances.
Another option is that if your EC2 instances are fairly stable, you can create a Target Group for each EC2 Instance, and then create a listener rule on your Load Balancer to target those single instance groups based on some criteria, such as a query argument, URL or header.