Message all instances behind load balancer - amazon-web-services

I need to notify all machines behind a load balancer when something happens.
For example, I have machines behind a load balancer which cache data, and if the data changes I want to notify the machines so they can dump their caches.
I feel as if I'm missing something as it seems I might be overcomplicating how I talk to all the machines behind my load balancer.
--
Options I've considered
SNS
The problem here is that each individual machine would need to be publicly accessible over HTTPS.
SNS Straight to Machines
Machines would subscribe themselves with their EC2 URL with SNS on startup. To achieve this I'd need to either
open those machines up to HTTP from anywhere (not just the load balancer), or
create a security group which lets the SNS IP ranges into the machines over HTTPS.
This security group could be static (the IPs don't appear to have changed since ~2014, from what I can gather).
I could create a scheduled Lambda which updates this security group from the JSON file provided by AWS if I wanted to ensure the list was always up to date.
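A sketch of what that scheduled Lambda could look like. The security group ID and region are made up, and filtering the AMAZON service prefixes for the region is an assumption; check which prefixes SNS actually publishes from before relying on this.

```python
import json
import urllib.request

# Hypothetical values -- substitute your own.
SECURITY_GROUP_ID = "sg-0123456789abcdef0"
REGION = "eu-west-1"
IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def diff_cidrs(current, desired):
    """Return (to_add, to_remove) so that `current` becomes `desired`."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)


def lambda_handler(event, context):
    import boto3  # deferred import; available in the Lambda Python runtime

    # 1. Fetch the published AWS IP ranges and pick the prefixes we care
    #    about (the AMAZON/region filter here is an assumption).
    with urllib.request.urlopen(IP_RANGES_URL) as resp:
        prefixes = json.load(resp)["prefixes"]
    desired = {p["ip_prefix"] for p in prefixes
               if p["service"] == "AMAZON" and p["region"] == REGION}

    # 2. Read the HTTPS ingress rules currently on the security group.
    ec2 = boto3.client("ec2")
    group = ec2.describe_security_groups(
        GroupIds=[SECURITY_GROUP_ID])["SecurityGroups"][0]
    current = {r["CidrIp"] for perm in group["IpPermissions"]
               if perm.get("FromPort") == 443
               for r in perm.get("IpRanges", [])}

    # 3. Apply only the difference, so unrelated rules stay untouched.
    to_add, to_remove = diff_cidrs(current, desired)
    for cidrs, call in ((to_add, ec2.authorize_security_group_ingress),
                        (to_remove, ec2.revoke_security_group_ingress)):
        if cidrs:
            call(GroupId=SECURITY_GROUP_ID,
                 IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443,
                                 "ToPort": 443,
                                 "IpRanges": [{"CidrIp": c} for c in cidrs]}])
```

Computing a diff rather than revoking everything and re-adding keeps any manually added rules on the group intact.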
SNS via LB with fanout
The load balancer URL would be subscribed to SNS. When a notification is received one of the machines would receive it.
The machine would use the AWS API to look up the Auto Scaling group it belongs to, find the other machines attached to the same load balancer, and then send those machines the same message using their internal URLs.
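A rough sketch of that fan-out step with boto3. The internal port and path are made up; the notification payload format is whatever your nodes agree on.

```python
import json
import urllib.request


def build_peer_urls(private_ips, port=8080, path="/cache/invalidate"):
    """Turn peers' private IPs into internal notification URLs
    (port and path are hypothetical)."""
    return [f"http://{ip}:{port}{path}" for ip in private_ips]


def discover_peers(instance_id):
    """Find the other instances in this instance's Auto Scaling group."""
    import boto3  # deferred so the module imports without the AWS SDK

    autoscaling = boto3.client("autoscaling")
    ec2 = boto3.client("ec2")

    # Which ASG does this instance belong to?
    me = autoscaling.describe_auto_scaling_instances(
        InstanceIds=[instance_id])["AutoScalingInstances"][0]
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[me["AutoScalingGroupName"]])["AutoScalingGroups"][0]
    peer_ids = [i["InstanceId"] for i in group["Instances"]
                if i["InstanceId"] != instance_id]
    if not peer_ids:
        return []

    # Resolve instance IDs to private IPs.
    reservations = ec2.describe_instances(InstanceIds=peer_ids)["Reservations"]
    return [inst["PrivateIpAddress"]
            for r in reservations for inst in r["Instances"]]


def fan_out(message, urls):
    """Forward the received message to every peer over the internal network."""
    body = json.dumps(message).encode()
    for url in urls:
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=2)
```

The instance ID itself is available from the EC2 instance metadata service at startup.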
SQS with fanout
Each machine would be a queue worker, one would receive the message and forward on to the other machines in the same way as the SNS fanout described above.
Redis PubSub
I could set up a Redis cluster which each node subscribes to and receives the updates. This seems a costly option given the task at hand (especially given I'm operating in many regions and AZs).
Websocket MQTT Topics
Each node would subscribe to an MQTT topic and receive the update that way. Not every region I use supports IoT Core yet, so I'd need to either host my own broker in each region or have every region connect to its nearest supported region (or even a single region). I'm not sure about the stability of this, but it seems like it might be a good option.
I suppose a 3rd party websocket service like Pusher or something could be used for this purpose.
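A minimal sketch of the subscriber side using the paho-mqtt client. The broker host, port, and topic name are all hypothetical; with AWS IoT Core you would use its TLS endpoint and certificate-based auth instead of a plain connection.

```python
TOPIC = "cache/invalidate"  # hypothetical topic name


def parse_invalidation(payload: bytes) -> str:
    """Messages are assumed to carry just the cache key as UTF-8 text."""
    return payload.decode("utf-8")


def run(broker_host="mqtt.example.internal", port=1883):
    # Deferred third-party import: `pip install paho-mqtt`.
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        key = parse_invalidation(msg.payload)
        print(f"dumping cache entry: {key}")  # replace with real cache logic

    client = mqtt.Client()  # paho-mqtt 1.x style constructor
    client.on_message = on_message
    client.connect(broker_host, port)
    client.subscribe(TOPIC, qos=1)
    client.loop_forever()  # blocks; run on a background thread in practice
```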
Polling for updates
Each node contains x cached items; I would have to poll for each item individually, or build some means of determining which items have changed into a bulk request.
This seems excessive though. Hypothetically, with 50 items at a polling interval of 10 seconds:
6 requests per item per minute
6 * 50 * 60 * 24 = 432,000 requests per day to some web service/Lambda etc. That just seems a bad option for this use case when most of those requests will report that nothing has changed. A push/subscribe model seems better than a pull/get model.
I could also use long polling perhaps?
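The arithmetic above, spelled out:

```python
# Hypothetical workload from the question: 50 cached items, each polled
# every 10 seconds.
items = 50
poll_interval_s = 10

requests_per_item_per_minute = 60 // poll_interval_s          # 6
requests_per_day = requests_per_item_per_minute * items * 60 * 24
print(requests_per_day)  # 432000, most of them empty "no change" responses
```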
Dynamodb streams
The change which would cause a cache clear is made in a global DynamoDB table (not owned by or known to this service), so I could perhaps allow access to read the stream from that table in every region and listen for changes via that route. That couples the two services pretty tightly though, which I'm not keen on.
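For illustration, a sketch of a Lambda wired to that stream which republishes changed keys as invalidation messages. The topic ARN is made up, and the key handling assumes the other service's schema, which is exactly the coupling the question is wary of.

```python
# Hypothetical SNS topic the cache nodes would be notified through.
INVALIDATION_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:cache-invalidation"


def keys_from_records(records):
    """Pull the key values out of DynamoDB stream records."""
    keys = []
    for record in records:
        if record.get("eventName") in ("INSERT", "MODIFY", "REMOVE"):
            dynamo_keys = record["dynamodb"]["Keys"]
            # Stream records use the DynamoDB wire format, e.g. {"S": "item-1"}.
            keys.extend(v for attr in dynamo_keys.values()
                        for v in attr.values())
    return keys


def lambda_handler(event, context):
    import boto3  # deferred third-party import

    sns = boto3.client("sns")
    for key in keys_from_records(event["Records"]):
        sns.publish(TopicArn=INVALIDATION_TOPIC_ARN, Message=key)
```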

Related

AWS - Server to Server pub-sub

We have a system that uses Redis pub/sub to communicate between different parts of the system. To keep it simple, we used pub/sub channels to implement various things. On both ends (producer and consumer) we have servers containing code that I see no way to convert into Lambda functions.
We are migrating to AWS and, among other changes, are trying to replace the use of Redis with a managed pub/sub solution. The required solution is fairly simple: a managed broker that allows one node to publish a message and zero or more other nodes to subscribe for its reception.
It seems impossible to achieve this with any of the available solutions:
Kinesis - It is a streaming solution for data ingestion (similar to Apache Pulsar)
SNS - From the documentation, it looks like exactly what we need until we realize that there is no solution to connect a server (not a Lambda) unless with a custom HTTP endpoint.
EventBridge - Same issue as with SNS
SQS - It is a queue, not a pub/sub.
Amazon MQ / RabbitMQ - It is a queue, not a pub/sub. It is also not a SaaS solution, but rather an installation on nodes you own.
I see no reason to remove a feature such as subscribing from a server, which is why I was sure it would be present in one or more of the available solutions. But we went through the documentation and attempted to consume from SNS and EventBridge without success. Are we missing something? How can we achieve what we need?
Example
Assume we have an API server layer, deployed on ECS with a load balancer in front. The API has 2 endpoints, a PUT to update a document, and an SSE to listen for updates on documents.
Assuming a simple round-robin load balancer, an update for document1 may occur on node1 where a client may have an ongoing SSE request for the same document on node2. This can be done with a Redis backbone; node1 publishes on document1 topic and node2 is subscribed to the same topic. This solution is fast and efficient (in this case at-most-once delivery is perfectly acceptable).
As this is just an example, we will not consider WebSocket pub/sub APIs or other ready-made solutions for this specific use case.
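For reference, the Redis backbone in the example boils down to something like this with redis-py (the channel naming convention is an assumption):

```python
def channel_for(document_id):
    """One pub/sub channel per document (naming is hypothetical)."""
    return f"document:{document_id}"


def publish_update(redis_client, document_id, payload):
    """Called on the node that handled the PUT (node1 in the example).
    Fire-and-forget: at-most-once delivery, as the question accepts."""
    redis_client.publish(channel_for(document_id), payload)


def listen_for_updates(redis_client, document_id):
    """Called on the node holding the SSE connection (node2). Yields each
    update so the HTTP layer can write it out as an SSE event."""
    pubsub = redis_client.pubsub()
    pubsub.subscribe(channel_for(document_id))
    for message in pubsub.listen():
        if message["type"] == "message":  # skip subscribe confirmations
            yield message["data"]
```

The `redis_client` would be a `redis.Redis(...)` instance from redis-py (`pip install redis`); it is passed in here so the functions stay testable.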
Lambda
The subscriber side cannot be a Lambda. With two distinct contexts involved (the SSE HTTP request and the SNS event), two distinct Lambdas would fire, with no way to 'stitch' them together.
SNS + SQS
We hesitate to use SQS in conjunction with SNS, as that solution would add a lot of unneeded complexity:
The number of nodes is not known in advance and they scale, requiring an automated system to increase/reduce the number of SQS queues.
Persistence is not required
Additional latency is introduced
Additional infrastructure cost
HTTP Endpoint
This is the closest thing to a programmatic subscription but suffers from similar issues to the SNS-SQS solution:
The number of nodes is unknown, requiring endpoint subscriptions to be added automatically.
Either we expose one endpoint for each node, or we need a particular configuration on the load balancer to route the message to the appropriate node.
Additional API endpoints must be exposed, maintained, and secured.

Beats installation on AWS-ec2 to send to on-prem ELK

I have to set up JBoss on AWS EC2 Windows servers, and this will scale up as required. We are using ELK for infrastructure monitoring, for which we will be installing Beats here to send the data to an on-prem Logstash. There we onboard the servers with their hostname and IP.
Now the problem is: in the case of autoscaling, how can we achieve this?
Please advise.
Thanks,
Abhishek
If you create one EC2 instance and make an AMI of it so that autoscaling launches from that image, the config can be part of it.
If by onboarding you mean adding it to the allowed list, you could use Direct Connect or a VPC with a custom CIDR block defined, and add that subnet to the allowed list in advance.
AFAIK you need to change the Logstash config file on disk to add new hosts, and it should notice the updated config automatically and "just work".
I would suggest a local script that can read/write the config file and that polls an SQS queue, "listening" for autoscaling events. You can have your ASG send SNS messages when it scales, and subscribe an SQS queue to receive them. Messages will be retained for up to 14 days, and there are options to add delays if required. The message you receive from SQS will indicate the region, instance ID and operation (launched or terminated), from which you can look up the IP address/hostname to add to or remove from the config file (the message should be deleted from the queue once processed successfully). Editing the config file is just simple string operations to locate the right line and insert the new one. This approach only requires outbound HTTPS access for your local script plus some IAM permissions, but there is a (probably trivial) cost implication.
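A sketch of that local script, assuming the standard ASG notification format delivered via SNS into SQS. The queue URL, config path, and one-host-per-line config format are all made up.

```python
import json

# Hypothetical values -- substitute your own.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/asg-events"
CONFIG_PATH = "/etc/logstash/hosts.conf"


def apply_event(lines, event_type, host):
    """Add or remove a host line; kept pure so it is easy to test."""
    entry = f"host {host}"
    if "LAUNCH" in event_type and entry not in lines:
        return lines + [entry]
    if "TERMINATE" in event_type:
        return [line for line in lines if line != entry]
    return lines


def resolve_private_ip(instance_id):
    import boto3  # deferred third-party import

    reservations = boto3.client("ec2").describe_instances(
        InstanceIds=[instance_id])["Reservations"]
    return reservations[0]["Instances"][0]["PrivateIpAddress"]


def poll_forever():
    import boto3

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)  # long poll
        for msg in resp.get("Messages", []):
            # SNS wraps the ASG notification, so the body is JSON-in-JSON.
            notification = json.loads(json.loads(msg["Body"])["Message"])
            event_type = notification["Event"]   # e.g. autoscaling:EC2_INSTANCE_LAUNCH
            host = resolve_private_ip(notification["EC2InstanceId"])
            with open(CONFIG_PATH) as f:
                lines = f.read().splitlines()
            with open(CONFIG_PATH, "w") as f:
                f.write("\n".join(apply_event(lines, event_type, host)) + "\n")
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```

Deleting the message only after the file write succeeds means a crash mid-update leaves the message on the queue for a retry.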
Another option is a UserData script that's executed on each instance at startup (part of the Launch Template of your Auto Scaling group). Exactly how it might communicate with your on-prem setup depends on your architecture/capabilities; anything's possible. You could write a simple web service to manage the config file and have the instances call it, but that's a lot more effort and somewhat risky in my opinion.
FYI: if you use SQS, look at long polling if you're checking the queues frequently or want messages to propagate as quickly as possible (TL;DR: faster and cheaper than polling more than twice a minute). It's good practice to use a dead-letter queue with SQS; messages that get retrieved but not removed from the queue end up there. Set up alarms on the queue and the dead-letter queue to alert you via email if messages are failing to be processed or aren't getting picked up in a sensible time (i.e. your script has crashed, etc.).

Which AWS SaaS (or combination) to reliably send outgoing HTTP?

I am looking to replace a hand-rolled service which reads messages off a queue and then sends them to outside endpoints via HTTP (basically outgoing webhooks).
I have been looking into SNS, but it feels kind of like trying to fit a square peg into a round hole.
I think I could pull it off rolling out my own HTTP sender in Lambda and marrying it with SQS.
But is there any SaaS product in AWS that does it for me without need for custom code?
As said in the comments, there is no "turnkey" solution without the need for a bit of coding.
Depending on the bandwidth / responsiveness / cost your application requires, I would go with one of these two approaches:
SQS with Lambda : scalability from 0 to n virtual servers (no activity = no server = no $)
Elastic Beanstalk worker tiers : scalability from 1 to n virtual servers
SQS with Lambda
An SQS queue attached to a Lambda function seems a simple solution to me.
See https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
One advantage is that you may log useful info from your Lambda function, in addition to making the HTTP call to your outside endpoints.
With a framework like Serverless, it may be straightforward to set up.
See https://serverless.com/blog/aws-lambda-sqs-serverless-integration/
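A minimal sketch of such a Lambda handler, assuming each queue message carries a JSON body with `url` and `payload` keys (that message format is an assumption, not anything SQS mandates):

```python
import json
import urllib.request


def parse_webhook(body):
    """Each queue message is assumed to be JSON with `url` and `payload` keys."""
    msg = json.loads(body)
    return msg["url"], json.dumps(msg["payload"]).encode()


def lambda_handler(event, context):
    # With the SQS event-source mapping, each invocation gets a batch of
    # messages under event["Records"].
    for record in event["Records"]:
        url, payload = parse_webhook(record["body"])
        req = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"})
        # An exception here fails the batch, returning the messages to the
        # queue for retry (and eventually a dead-letter queue if configured).
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(f"delivered to {url}: HTTP {resp.status}")
```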
Elastic Beanstalk Worker with SQS Daemon
You may also take a look at Elastic Beanstalk worker environments. There is a turnkey worker environment with an SQS daemon included.
See https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-daemon

How to handle websocket connections on load balanced servers

Our .NET Core web app currently accepts websocket connections and pushes data out to clients on certain events (edit, delete, create of some of our entities).
We would like to load balance this application now but foresee a problem in how we handle the socket connections. Basically, if I understand correctly, only the node that handles a specific event will push data out to its clients and none of the clients connected to the other nodes will get the update.
What is a generally accepted way of handling this problem? The best way I can think of is to also send that same event to all nodes in a cluster so that they can also update their clients. Is this possible? How would I know about the other nodes in the cluster?
This will be hosted in AWS.
You need to distribute the event to all nodes in the cluster, so that they can each push the update out to their websocket clients. A common way to do this on AWS is to use SNS to distribute the event to all nodes. You could also use ElastiCache Redis Pub/Sub for this.
As an alternative to SNS or Redis, you could use a Kinesis Stream. But before going to that link, read about Apache Kafka, because the AWS docs don't do a good job of explaining why you'd use Kinesis for anything other than log ingest.
To summarize: Kinesis is a "persistent transaction log": everything that you write to it is stored for some amount of time (by default a day, but you can pay for up to 7 days) and is readable by any number of consumers.
In your use case, each worker process would start reading at the then-current end of the stream, and continue reading (and distributing events) until shut down.
The main issue that I have with Kinesis is that there's no "long poll" mechanism like there is with SQS. A given read request may or may not return data. What it does tell you is whether you're currently at the end of the stream; if not, you have to keep reading until you are. And, of course, Amazon will throttle you if you read too fast. As a result, your code tends to need sleeps.
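The read loop described above might look like this. The stream name is hypothetical, and the Kinesis client is passed in so the loop can be exercised with a stub; a real caller would pass `boto3.client("kinesis")` and a shard ID from `list_shards`.

```python
import time

STREAM_NAME = "cache-events"  # hypothetical


def consume(kinesis, shard_id, handle_record, idle_sleep=1.0):
    """Read a shard from its current end and dispatch each record."""
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM_NAME, ShardId=shard_id,
        ShardIteratorType="LATEST")["ShardIterator"]
    while iterator:
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in resp["Records"]:
            handle_record(record["Data"])
        iterator = resp["NextShardIterator"]
        # MillisBehindLatest == 0 means we're caught up; back off so we
        # don't get throttled on empty reads (the "sleeps" noted above).
        if resp.get("MillisBehindLatest", 0) == 0:
            time.sleep(idle_sleep)
```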

Elastic Load Balancer forwarding request to each EC2 instance

Is it possible for AWS elastic load balancer to forward the incoming request to each of the ec2 instances behind it ?
You can accomplish this in several ways, and the answers can be very long, but my first recommendation would be to bring up another EC2 instance running, for example, Apache ZooKeeper. Every other node (the ones you need to "notify") would then run a ZooKeeper client, in effect subscribing to an event such as "log changed". Whenever you need to change the log level, you would (manually or automatically) trigger a "log changed" event on your ZooKeeper node. There are a lot of examples, use cases and code samples on the ZooKeeper project page that might help you get started.
The reason I recommended ZooKeeper is that it could serve as a central configuration point (not only log level) for your nodes in the future.
For "Command and Control" types of events, you're likely going to want a different mechanism.
You could take the "an SQS queue for each server" approach: whichever server gets the web request pushes it to each server's queue, and servers periodically poll their own queue for C&C operations. This gives you guaranteed-delivery semantics, which are quite important for C&C operations.
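A sketch of the fan-out side of that approach. The queue-name prefix convention is an assumption; each server would create its own `commands-<instance-id>` queue at boot and poll it.

```python
QUEUE_PREFIX = "commands-"  # hypothetical per-server queue naming convention


def fan_out_command(sqs, message):
    """Push `message` to every per-server queue sharing the prefix.

    `sqs` is a boto3 SQS client (passed in so this is easy to stub).
    Returns the queue URLs that were messaged.
    """
    resp = sqs.list_queues(QueueNamePrefix=QUEUE_PREFIX)
    urls = resp.get("QueueUrls", [])  # key is absent when nothing matches
    for url in urls:
        sqs.send_message(QueueUrl=url, MessageBody=message)
    return urls
```

Stale queues for terminated servers would accumulate, so you would also want a cleanup step (e.g. on ASG termination notifications, as in the Beats answer above).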
Instead of SQS, a database could be used to accomplish (mostly) the same thing. The DB approach is nice as it could also give you audit history which may (or may not) be important.