Notify all EC2 instances running in an ASG

I have a microservice application with multiple instances running in an ASG. Each instance maintains some internal state, and the application exposes Actuator endpoints to refresh that state. I also have some applications running on-prem. The scenario is: on some event, I want to call the Actuator endpoints of the applications running in AWS so they refresh their state. The problem is that if I call the load-balanced URL, the call goes to only one instance. So I'm considering the solutions below.
Use SQS and let the on-prem app publish and the AWS app consume the message. But here too, only one instance will receive the message.
Use SNS, but its HTTP/S listeners would all share the same load-balanced URL, so I think only one instance would receive the message. (AFAIK)
Any other solution? Please suggest.
Thanks

Use SNS, but its HTTP/S listeners would all share the same load-balanced URL, so I think only one instance would receive the message. (AFAIK)
When using SNS, each server would subscribe to the SNS topic, and when it subscribes it would provide SNS with its direct HTTP(S) URL (not the load balancer URL). When SNS receives a message, it sends it to each server that is currently subscribed. I'm not sure SNS will submit the request to the Actuator endpoint in the exact format your application needs, though.
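As a minimal sketch of that per-instance subscription, assuming boto3, IMDSv2 for discovering the instance's own address, and a hypothetical topic ARN and receiver port/path; note that SNS must be able to reach the endpoint, and the endpoint must confirm the subscription by fetching the SubscribeURL from the SubscriptionConfirmation message SNS sends first:

```python
# Each instance runs this at startup to subscribe its own direct URL
# (not the load balancer URL) to the topic. Topic ARN, port, and path
# are hypothetical; the endpoint must be reachable by SNS.
import urllib.request
import boto3

# Fetch this instance's public hostname from the EC2 metadata service (IMDSv2).
token_req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token", method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
token = urllib.request.urlopen(token_req).read().decode()
host_req = urllib.request.Request(
    "http://169.254.169.254/latest/meta-data/public-hostname",
    headers={"X-aws-ec2-metadata-token": token})
hostname = urllib.request.urlopen(host_req).read().decode()

sns = boto3.client("sns", region_name="us-east-1")
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:refresh-topic",  # hypothetical
    Protocol="http",
    Endpoint=f"http://{hostname}:8080/refresh-hook",  # hypothetical receiver
)
```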

There are likely several solutions you could consider, including ones that don't require a code change, such as establishing a VPN connection between your on-premises applications and the VPC that contains your ASGs, which would let you invoke each machine's refresh endpoint by its unique private IP address.
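For illustration, once the instances are reachable (e.g. over that VPN), enumerating the ASG's members and calling each refresh endpoint could look roughly like this; the ASG name, port, and Actuator path are assumptions:

```python
# A minimal sketch, assuming boto3, network reachability to the private IPs
# (e.g. via the VPN), and a hypothetical ASG name and refresh endpoint.
import urllib.request
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

groups = asg.describe_auto_scaling_groups(AutoScalingGroupNames=["my-service-asg"])
instance_ids = [i["InstanceId"]
                for g in groups["AutoScalingGroups"]
                for i in g["Instances"]]

for reservation in ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]:
    for instance in reservation["Instances"]:
        ip = instance["PrivateIpAddress"]
        # Spring's refresh endpoint expects a POST; port and path are assumptions.
        req = urllib.request.Request(f"http://{ip}:8080/actuator/refresh",
                                     data=b"", method="POST")
        urllib.request.urlopen(req, timeout=5)
```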
However, more simply, if you're using an AWS Classic ELB or ALB, then repeated calls to the load balancer URL should hit each machine running your application, provided enough calls to the refresh endpoint are made.
This may not meet your use case, though, say if you must strictly limit refresh calls to one per instance. You'd have to experiment with your software and the load balancer's round-robin behavior.
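As a rough sketch of that experiment (the URL and instance count are hypothetical, and this raises the odds of reaching every instance rather than guaranteeing it):

```python
# Overshooting the instance count makes it likely that round-robin reaches
# every instance at least once; it does NOT guarantee exactly-once refreshes.
import urllib.request

LB_URL = "http://my-elb-123456.us-east-1.elb.amazonaws.com/actuator/refresh"  # hypothetical
ASG_SIZE = 4  # hypothetical current instance count

for _ in range(ASG_SIZE * 3):
    req = urllib.request.Request(LB_URL, data=b"", method="POST")
    urllib.request.urlopen(req, timeout=5)
```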

Related

How to buffer/delay incoming HTTP requests until a down backend wakes up?

https://companyA.acme.org/custom/api/endpoint1 is hosted on a dedicated EC2 instance
https://companyB.acme.org/another/custom/apiendpoint is hosted on a dedicated EC2 instance
(Both EC2 instances use the same core app; each customer can customize their catalog of API endpoints.)
Most of the time those EC2 instances are idle, so we secretly want to stop them, but we don't want the customer to care about whether the instance is up or not.
We can accept a 2-second delay in response time when an instance needs to be woken up before answering the customer's API call.
My idea is to intercept all incoming HTTP requests and buffer them before routing and forwarding.
I need a delay so I can check whether a backend matching the subdomain is up, and wake it up if it is down.
Does anyone know an existing proxy / load-balancing solution able to buffer/queue HTTP requests, then allow some custom magic (in order to launch the right EC2 instance), then forward the request based on Origin/Referer?
(The answer to the last part is probably every existing proxy.)
I was thinking about the following:
NGINX in front of everything (point all Route 53 subdomains to this NGINX)
Catch an AWS event when someone calls https://companyA.acme.com/custom/api/endpoint2
Trigger an AWS Lambda that will start the corresponding EC2 host
But I am not sure how NGINX will handle the request buffering/forwarding while I start the EC2 host.
Bonus question: how do I avoid wasting any time forwarding the request when the backend is already up?
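To make the Lambda step concrete, here is a minimal sketch; the subdomain-to-instance mapping, the event shape (assuming whatever triggers the function passes along the request's Host header), and the use of a waiter are all assumptions:

```python
# A minimal sketch; a real function would also need a timeout budget for the
# wait, since Lambda invocations are time-limited.
import boto3

ec2 = boto3.client("ec2")

# Hypothetical mapping of customer subdomains to their dedicated instances.
INSTANCES = {
    "companyA.acme.org": "i-0123456789abcdef0",
    "companyB.acme.org": "i-0fedcba9876543210",
}

def handler(event, context):
    instance_id = INSTANCES[event["host"]]  # assumption: event carries the Host header
    state = ec2.describe_instances(InstanceIds=[instance_id]) \
               ["Reservations"][0]["Instances"][0]["State"]["Name"]
    if state != "running":
        ec2.start_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    # Regarding the bonus question: when the instance is already running,
    # no start/wait happens, so almost no time is wasted before forwarding.
    return {"instanceId": instance_id, "previousState": state}
```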

How can a Kubernetes load balancer consume queue messages (SQS or other) and pass them to a pod?

I am planning on using AWS SQS to receive messages from a server and then instantly have the Kubernetes load balancer consume them and pass each message to one of the pods.
My biggest concern is how the load balancer can be triggered by AWS SQS.
Is this possible to do?
If yes, in what way?
To expand on #Marcin's comment:
There is no integration between the Elastic Load Balancer implementations and SQS, in either direction. Part of the reason is that they both implement a pattern that requires an outside trigger before they do anything:
To consume a message from SQS, the consumer needs to actively poll SQS for work using the ReceiveMessage API call.
For the load balancer to serve traffic there needs to be a request from the outside it can respond to.
To get an integration between two passive/reactive services, you need an active component in between. You could build, for example, a fleet of containers or a Lambda function. Lambda can be triggered via SQS (under the hood, Lambda polls SQS) and could subsequently send a request to your ALB or whichever load balancer you choose.
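A minimal sketch of that active component, assuming a Lambda function with an SQS trigger and a hypothetical ALB URL:

```python
# With an SQS trigger, Lambda polls the queue for you and invokes this handler
# with a batch of records; each record's body is relayed to the ALB.
import urllib.request

ALB_URL = "http://my-alb-123456.us-east-1.elb.amazonaws.com/process"  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        req = urllib.request.Request(
            ALB_URL,
            data=record["body"].encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req, timeout=10)
```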

Web hook listener in AWS Lambda

I am writing a simple monitoring system for one of our existing production systems. The system being monitored is an SMPP gateway. The basic requirement is to send a message to the SMPP gateway at a given frequency and receive the message back via a web hook. This is to ensure that the SMPP gateway is functioning as expected; otherwise, email alarms are triggered.
This is the flow of my program:
Connect to SMPP gateway
Start a web hook listener on a new thread (server)
Send a test message
Listen for incoming web hooks and notify the parent thread via events
If message web hook was received, exit gracefully, else trigger email alarm.
I have implemented this system in AWS Lambda and assigned an Elastic IP by placing the Lambda function inside a VPC. I am able to send the message to the SMPP gateway, and the gateway attempts to respond via the web hook. Unfortunately, the server can't reach the web hook listener via the specified Elastic IP. I searched around and found that one way to implement a web hook listener in AWS Lambda is with an API Gateway trigger. That is of no use here, because it would not guarantee that the same Lambda instance which sent the message via SMPP receives the web hook request.
So my question is, is it possible to run a web hook listener in AWS Lambda and receive requests via an attached elastic IP?
No, it is not possible to run a web hook listener in AWS Lambda and receive requests via an attached elastic IP.
Lambda functions inside a VPC make outbound requests to the Internet using an Elastic IP attached to a NAT Gateway, via an ENI associated with the container host. Neither the ENI nor the EIP is exclusively bound to one single Lambda invocation. Lambda functions are technically allowed to listen for inbound connections... but those connections will never arrive via the ENI, and the NAT Gateway is specifically designed not to allow connections initiated from outside to make their way back in. So there are at least two layers of the design that prevent what you are attempting from being done this way.

Is any AWS service suitable for sending real-time updates to the browser?

I'm developing a stocks app and have to keep users' browsers updated with pricing changes.
I don't need access to past data; the browser just has to get the current data whenever it changes.
Is it possible to filter a DynamoDB stream and expose an endpoint (behind API Gateway) that could be used with a JavaScript EventSource?
I realize this is not using Server-Sent Events, but AWS just announced serverless WebSockets for API Gateway. Pricing is based on minutes connected and the number of messages sent.
Product Launch Article: https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-api-gateway-launches-support-for-websocket-apis/
Documentation: https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html
Pricing: https://aws.amazon.com/api-gateway/pricing/
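For a flavor of the server side, pushing a price update to a connected browser through the WebSocket API's management endpoint might look like this; the endpoint URL is hypothetical, and the connection id is the one handed to your backend on the $connect route:

```python
# A minimal sketch using the API Gateway Management API to push a message
# down an existing WebSocket connection.
import json
import boto3

client = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.us-east-1.amazonaws.com/prod",  # hypothetical
)

def push_price(connection_id: str, symbol: str, price: float) -> None:
    client.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps({"symbol": symbol, "price": price}).encode(),
    )
```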
API Gateway is a store-and-forward service. It collects the response from whatever the back-end may happen to be (Lambda, an HTTP server, etc.) and then returns it en bloc to the browser -- it doesn't stream the response, so it would not be suited for use as an EventSource.
AWS doesn't currently have a managed service offering that is obviously suited to this use case... you'd need a server (or more than one) on EC2, consuming the data stream and relaying it back to the connected browsers.
Assuming that running EC2 servers is an acceptable option, you then need HTTPS and load balancing. An Application Load Balancer supports WebSockets, so it might also support an EventSource. A Classic ELB in TCP (not HTTP) mode should support an EventSource without a problem, though it might not correctly signal to the back-end when the browser connection is lost. Both of those balancers can also offload HTTPS for you. A Network Load Balancer would definitely work for balancing an EventSource, but your instances would need to provide the HTTPS themselves, since NLB doesn't offload it for you.
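For reference, a bare-bones EventSource (SSE) endpoint that such an EC2 instance could serve might look like this; stdlib only, and the tick data is made up:

```python
# A minimal SSE endpoint sketch: streams one hypothetical price tick per
# second in the text/event-stream format a browser EventSource expects.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        while True:  # ends with BrokenPipeError when the browser disconnects
            self.wfile.write(b'data: {"AAPL": 123.45}\n\n')
            self.wfile.flush()
            time.sleep(1)

HTTPServer(("", 8080), SSEHandler).serve_forever()
```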
A somewhat unorthodox alternative might actually be AWS IoT, which has built-in WebSocket support... not the same as EventSource, of course, but a streaming connection nonetheless... in such an environment, I suppose each browser user could be an addressable "thing."

Limit number of connections to instances with AWS ELB

We are using an AWS Classic ELB for our service, and our service can only serve x requests at a time. If the number of requests exceeds x, we do not want to route those requests to the instances, but neither do we want to lose them. We would like to limit the number of connections to the instances registered with the ELB. Is there an ELB setting to configure the maximum connections per instance?
Another solution I found was ELB connection draining, but based on the ELB docs [1], connection draining marks the instance as OutOfService after serving in-flight requests. Does that mean the instance will be terminated and de-registered from the ELB after in-flight requests are served? We do not want to terminate and de-register the instances; we just want to limit the number of connections to them. Any solutions?
[1] http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html
ELB is meant to spread traffic evenly across the instances registered with it. If you have more traffic, you spin up more instances to deal with it. This is why a load balancer is generally paired with an Auto Scaling group: the Auto Scaling group watches the constraints you set and, based on those, either spins up more instances or pulls them down (e.g., when your traffic slows).
Connection draining is meant for pulling traffic from bad instances so requests don't get lost. "Bad" means the instance is failing health checks because something on it is broken. ELB by itself doesn't terminate instances; that's another job of the Auto Scaling group (terminate the bad instance and spin up a replacement). All ELB does is stop sending traffic to it.
It appears your situation is:
Users are sending API requests to your Load Balancer
You have several instances associated with your Load Balancer to process those requests
You do not appear to be using Auto Scaling
You do not always have sufficient capacity to respond to incoming requests, but you do not want to lose any of the requests
In situations where requests come at a higher rate than you can process them, you basically have three choices:
You could put the messages into a queue and consume them when capacity is available. You could either put everything in a queue (simple), or only use a queue when things are too busy (more complex).
You could scale to handle the load, either by using Auto Scaling to add additional Amazon EC2 instances or by using AWS Lambda to process the requests (Lambda automatically scales).
You could drop requests that you are unable to process. Unless you have implemented a queue, this is going to happen at some point if requests rise above your capacity to process them.
The best solution is to use AWS Lambda functions rather than requiring Amazon EC2 instances. Lambda can tie directly to AWS API Gateway, which can front-end the API requests and provide security, throttling and caching.
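For illustration, a minimal Lambda handler behind API Gateway's proxy integration could look like this; the processing is a hypothetical echo:

```python
# The event/response shapes follow API Gateway's Lambda proxy integration
# contract; the actual request processing here is a placeholder.
import json

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"echo": body}),  # hypothetical processing
    }
```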
The simplest method is to use Auto Scaling to increase the number of instances to try to handle the volume of requests you have arriving. This is best when there are predictable usage patterns, such as high loads during the day and less load at night. It is less useful when spikes occur in short, unpredictable periods.
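For the predictable day/night pattern, scheduled scaling actions are one option; a sketch with a hypothetical ASG name and UTC cron expressions:

```python
# Two scheduled actions: scale out for the daytime peak, scale in at night.
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")

asg.put_scheduled_update_group_action(
    AutoScalingGroupName="my-service-asg",  # hypothetical
    ScheduledActionName="scale-out-morning",
    Recurrence="0 8 * * *",  # 08:00 UTC daily
    MinSize=4, MaxSize=10, DesiredCapacity=6,
)
asg.put_scheduled_update_group_action(
    AutoScalingGroupName="my-service-asg",
    ScheduledActionName="scale-in-night",
    Recurrence="0 20 * * *",  # 20:00 UTC daily
    MinSize=1, MaxSize=4, DesiredCapacity=2,
)
```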
To fully guarantee no loss of requests, you would need to use a queue. Rather than requests going directly to your application, you would need an initial layer that receives each request and pushes it into a queue. A backend process would then process the message and return a result that is somehow passed back as a response. (It's more difficult to provide responses to messages passed via a queue, because there is a disconnect between the request and the response.)
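A minimal sketch of that queue pattern, with a hypothetical queue name and a placeholder processing function:

```python
# The front layer enqueues instead of calling the app; a backend worker drains
# the queue at whatever pace capacity allows, so no request is dropped.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="incoming-requests")["QueueUrl"]

def handle(body: str) -> None:
    print("processing", body)  # hypothetical processing

# Producer side: enqueue the request.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"action": "process", "id": 42}')

# Consumer side: long-poll and process when capacity is available.
while True:
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```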
AWS ELB has practically no limit on incoming requests. If your application can handle only N connections, put multiple servers behind the ELB and set the ELB health check URL to your application's URL. Once an instance can no longer respond to requests, the ELB automatically forwards requests to another server behind it, so you won't miss any requests.