https://companyA.acme.org/custom/api/endpoint1 is hosted on a dedicated ec2 instance
https://companyB.acme.org/another/custom/apiendpoint is hosted on a dedicated ec2 instance
(both ec2 instance are using the same core app, each customer can customize the catalog of API endpoints)
Most of the time those ec2 instances are idle, so we secretly want to stop them, but we don't want the customer to care about the instance being up or not.
We can accept a 2 sec delay on response timing when instance needs to be wake up before answering the customer API call
My idea is to intercept all incoming HTTP request and buffer them before routing + forwarding.
I need a delay to be able to check if a backend matching the subdomain is up or not and wake it up if it is down.
Anyone knows any existing proxy / load balancing solution able to buffer / queue HTTP requests, then allows to do some custom magic (in order to launch the right ec2 instance), then forward the request based on Origin/Referer ?
(the answer being probably every existing proxy for the last part )
I was thinking about the following:
NGINX in front of everyone (point all route53 subdomains to this NGINX)
Catch AWS event when someone is calling https://companyA.acme.com/custom/api/endpoint2
Trigger AWS lambda that will starts corresponding ec2 host
But I am not sure on how NGINX will handle the request buffering / forwarding while I start the ec2 host.
Bonus question : how not to waste any time forwarding the request in case the backend is already up ?
Related
I am creating a scraper that calls a public API. I want to have a pool of proxy IP's that I can use to send each request from. My request interval should be around 10s, and each IP is allowed to send a request every minute. So having a pool of 6 ip's would be sufficient.
My setup is a EC2 server that send a request to API Gateway which calls the lambda function.
Is there a way for me to set this up so that each lambda request is send from a different ip
So let's assume I start polling from 08:00:00
08:00:00 - 0.0.0.1
08:00:10 - 0.0.0.2
08:00:20 - 0.0.0.3
08:00:30 - 0.0.0.4
08:00:40 - 0.0.0.5
08:00:50 - 0.0.0.6
08:01:00 - 0.0.0.1
etc.
or am I completely looking in the wrong direction and is there a different way for me to solve this problem?
The outbound requests from Lambda will come from different IP addresses as long as they're run on different instances of Lambda. You can have Lambda keep a minimum number of instances in a read to run state, for additional cost. But then you don't have a mechanism to route a specific request to a specific Lambda instance.
Since you're running an EC2 instance, you could bind multiple public IP addresses to it and then route each outbound request to the next one in turn.
I've a microservice application that has multiple instances running in ASG. All these applications maintains some internal state. This application exposes Actuator endpoints to refresh it's state. I've some applications which are running on-prem. The scenario is, On some event, I want to call those Actuator endpoints of applications running in AWS to refresh their state. The problem is, If I call LoadBalanced url, then call would go to only one instance. So, I'm thinking of below solutions.
Use SQS and let on-prem ap publish and AWS app consume that message. But here also, only one instance will receive the message.
Use SNS but listeners are http/s based so URL would remain same so I think only one instance would receive the message. (AFAIK)
Any other solution? Please suggest.
Thanks
Use SNS but listeners are http/s based so URL would remain same so I
think only one instance would receive the message. (AFAIK)
When using SNS each server would subscribe to the SNS topic, and when each server subscribes it would provide SNS with its direct HTTP(s) URL (not the load balancer URL). When SNS receives a message it would send it to each server that is currently subscribed. I'm not sure SNS will submit the request to the actuator endpoint in the correct format that your application needs though.
There are likely several solutions you could consider, including ones that won't require a code change. Such as establishing a VPN connection between your on-premise applications and the VPC that contains your ASGs, which would allow you to invoke each machine's refresh endpoint by it's unique private ip address.
However, more simply, if you're using an AWS Classic ELB or ALB, than repeated calls to the load balancer url should hit each machine running your application if enough calls to the refresh endpoint are made.
Although this may not meet your use case, say if you must strictly limit refresh calls to 1 time per endpoint. You'd have to experiment with your software and the load balancer's round-robin behavior.
We are using AWS classic ELB for our service and our service can only serve x number of requests at a time. If the number of requests are greater than x then we do not want to route those requests to the instance and neither do we want to lose those requests. We would like to limit the number of connections to the instances registered with the ELB. Is there some ELB setting to configure max connections to instances?
Another solution I could find was to use ELB connection draining but based on the ELB doc [1] , using connection draining will mark the instance as OutofService after serving in-flight requests. Does that mean the instance will be terminated and de-registered from ELB after in-flight requests are served? We do not want to terminate and de-register the instances, we just want to limit the number of connections to the instances. Any solutions?
[1] http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html
ELB is more meant to spread traffic evenly across instances registered for it. If you have more traffic, you throw up more instances to deal with it. This is generally why a load balancer is matched with an auto scaling group. The Auto Scaling Group will look at set constraints and based on that either spins up more instances or pulls them down (ie. your traffic starts to slow down).
Connection draining is more meant for pulling traffic from bad instances so it doesn't get lost. Bad instances mean they aren't passing health checks because something on the instance is broken. ELB by itself doesn't terminate instances, that's another part of what the Auto Scaling Group is meant to do (basically terminate the bad instance and spin up a new instance to replace it). All ELB does is stop sending traffic to it.
It appears your situation is:
Users are sending API requests to your Load Balancer
You have several instances associated with your Load Balancer to process those requests
You do not appear to be using Auto Scaling
You do not always have sufficient capacity to respond to incoming requests, but you do not want to lose any of the requests
In situations where requests come at a higher rate than you can process them, you basically have three choices:
You could put the messages into a queue and consume them when capacity is available. You could either put everything in a queue (simple), or only use a queue when things are too busy (more complex).
You could scale to handle the load, either by using Auto Scaling to add additional Amazon EC2 instances or by using AWS Lambda to process the requests (Lambda automatically scales).
You could drop requests that you are unable to process. Unless you have implemented a queue, this is going to happen at some point if requests rise above your capacity to process them.
The best solution is to use AWS Lambda functions rather than requiring Amazon EC2 instances. Lambda can tie directly to AWS API Gateway, which can front-end the API requests and provide security, throttling and caching.
The simplest method is to use Auto Scaling to increase the number of instances to try to handle the volume of requests you have arriving. This is best when there are predictable usage patterns, such as high loads during the day and less load at night. It is less useful when spikes occur in short, unpredictable periods.
To fully guarantee no loss of requests, you would need to use a queue. Rather than requests going directly to your application, you would need an initial layer that receives the request and pushes it into a queue. A backend process would then process the message and return a result that is somehow passed back as a response. (It's more difficult providing responses to messages passed via a queue because there is a disconnect between the request and the response.)
AWS ELB is practically no limit to get request. If your application handle only 'N' connection, Please go with multiple servers behind the ELB and set ELB health check URL will be your application URL. Once your application not able to respond the request, ELB automatically forward your request to another server which is behind ELB. So that you are not going to miss any request.
I'm planning to host some private services on AWS. These services will only be used by me and possibly 1-2 other people, but never more.
I would like the single EC2 instance that I'm using to only run when I'm using it. I don't want to manually start and stop it on the AWS console.
Ideally, I would need to things to happen:
Automatically shut down the single EC2 instance, if there have been no requests for the past hour.
Automatically start the instance, when there is an incoming request, i.e. I visit the URL of a service I'm hosting.
I've been able to configure a load balancer to shutdown the instance depending on incoming network traffic. I guess this is ok, if there is no better solution that can filter by IP region or number of incoming connections.
However, I can't figure out how to automatically launch an instance, just by visiting the corresponding URL. Is this even possible?
I could probably write a simple bash script to launch an instance, but I would prefer it to be automated.
There is one thing of scaling that I yet do not understand. Assume a simple scenario ELB -> EC2 front-end -> EC2 back-end
When there is high traffic new front-end instances are created, but, how is the connection to the back-end established?
How does the back-end application keep track of which EC2 it is receiving from, so that it can respond to the right end-user?
Moreover, what happen if a connection was established from one of the automatically created instances, and then the traffic is low again and the instance is removed.. the connection to the end-user is lost?
FWIW, the connection between the servers is through WebSocket.
Assuming that, for example, your ec2 'front-ends' are web-servers, and your back-end is a database server, when new front-end instances are spun up they must either be created from a 'gold' AMI that you previously setup with all the required software and configuration information, OR as part of the the machine starting up it must install all of your customizations (either approach is valid). with either approach they will know how to find the back-end server, either by ip address or perhaps a DNS record from the configuration information on the newly started machine.
You don't need to worry about the backend keeping track of the clients - every client talking to the back-end will have an IP address and TCPIP will take care of that handshaking for you.
As far as shutting down instances, you can enable connection draining to make sure existing conversations/connections are not lost:
When Connection Draining is enabled and configured, the process of
deregistering an instance from an Elastic Load Balancer gains an
additional step. For the duration of the configured timeout, the load
balancer will allow existing, in-flight requests made to an instance
to complete, but it will not send any new requests to the instance.
During this time, the API will report the status of the instance as
InService, along with a message stating that “Instance deregistration
currently in progress.” Once the timeout is reached, any remaining
connections will be forcibly closed.
https://aws.amazon.com/blogs/aws/elb-connection-draining-remove-instances-from-service-with-care/