In a high-throughput transaction system, we route transactions to instances in an instance group based on some condition, to ensure that related transactions are processed one after the other. For example, a routing rule might say that transactions with 'cancel' in the data are routed to instance C while 'new' transactions are routed to instance A. This matters for some of our business logic.
However, in the serverless world we cannot target a named instance because we don't know where or how it is running. How do we implement this kind of logic in that case, or does it go against the serverless paradigm?
You can publish the event messages to Pub/Sub with an ordering key. That way, the messages are delivered in order even if they aren't processed on the same instance.
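A minimal sketch of that approach with the Python client, assuming a project "my-project" and a topic "transactions" (both placeholders); note the subscription also needs message ordering enabled:

    from google.cloud import pubsub_v1

    # The publisher must opt in to message ordering.
    publisher = pubsub_v1.PublisherClient(
        publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
    )
    topic_path = publisher.topic_path("my-project", "transactions")  # placeholders

    # Messages sharing an ordering key are delivered in publish order,
    # regardless of which instance ends up processing them.
    for payload in (b'{"id": 1, "action": "cancel"}', b'{"id": 2, "action": "cancel"}'):
        publisher.publish(topic_path, payload, ordering_key="cancel")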
I need to notify all machines behind a load balancer when something happens.
For example, I have machines behind a load balancer which cache data, and if the data changes I want to notify the machines so they can dump their caches.
I feel as if I'm missing something and that I'm overcomplicating how I talk to all the machines behind my load balancer.
--
Options I've considered
SNS Straight to Machines
Machines would subscribe themselves with their EC2 URL to SNS on startup. The problem with this is that each individual machine would need to be publicly accessible over HTTPS. To achieve this I'd need to either
open those machines up to http from anywhere (not just the load balancer)
create a security group which lets SNS IP ranges into the machines over HTTPS.
This security group could be static (the IPs don't appear to have changed since ~2014 from what I can gather)
I could create a scheduled Lambda which updates this security group from the JSON file provided by AWS if I wanted to ensure this list was always up to date.
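A rough sketch of what that scheduled Lambda might look like; the security group ID and the service filter on ip-ranges.json are placeholders, and a real version would diff against the existing rules and respect the per-group rule limit:

    import json
    import urllib.request

    import boto3

    SG_ID = "sg-0123456789abcdef0"  # hypothetical security group

    def handler(event, context):
        # Pull the published AWS IP ranges.
        with urllib.request.urlopen("https://ip-ranges.amazonaws.com/ip-ranges.json") as r:
            data = json.load(r)

        # Placeholder filter -- pick out whichever ranges actually apply to SNS delivery.
        cidrs = sorted({p["ip_prefix"] for p in data["prefixes"] if p["service"] == "AMAZON"})

        # Naive version: just authorize HTTPS from each range. A real version would
        # diff against current rules and mind the per-group rule limit.
        ec2 = boto3.client("ec2")
        ec2.authorize_security_group_ingress(
            GroupId=SG_ID,
            IpPermissions=[{
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": c} for c in cidrs],
            }],
        )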
SNS via LB with fanout
The load balancer URL would be subscribed to SNS. When a notification arrives, one of the machines behind the load balancer receives it.
That machine would then use the AWS API to look at the Auto Scaling group it belongs to, find the other machines attached to the same load balancer, and send them the same message via their internal URLs.
SQS with fanout
Each machine would be a queue worker; one would receive the message and forward it on to the other machines in the same way as the SNS fanout described above.
Redis PubSub
I could set up a Redis cluster which each node subscribes to and receives the updates. This seems a costly option given the task at hand (especially given I'm operating in many regions and AZs).
Websocket MQTT Topics
Each node would subscribe to an MQTT topic and receive updates that way. Not every region I use supports IoT Core yet, so I'd need to either host my own broker in each region or have every region connect to its nearest supported (or even a single) region. I'm not sure about the stability of this, but it seems like it might be a good option.
I suppose a third-party WebSocket service like Pusher could also be used for this purpose.
Polling for updates
Each node caches x items; I would either have to poll for each item individually or build some bulk request that reports which items have changed.
This seems excessive though. Hypothetically, with 50 items polled every 10 seconds:
6 requests per item per minute
6 * 50 * 60 * 24 = 432,000 requests per day to some web service/Lambda etc.
That just seems a bad option for this use case when most of those requests will report that nothing has changed. A push/subscription model seems better than a pull/get model.
I could also use long polling perhaps?
DynamoDB Streams
The change that would cause a cache clear is made in a global DynamoDB table (not owned by or known to this service), so I could perhaps allow read access to that table's stream in every region and listen for changes via that route. That couples the two services pretty tightly though, which I'm not keen on.
I have a live production system with a Google Cloud SQL Postgres instance. The application will soon undergo a long-running database schema modification to accommodate a change to the way the business operates. We've got a deployment plan that allows the business to continue to operate during the schema change: it essentially pauses replication to our read replica and queues up API requests that would mutate the database, for replay after the schema change is complete. Once the deployment is complete, the last step is to un-pause replication. But while the read replica is catching up, the schema changes will lock tables, causing a lot of failing read requests. So before we un-pause read replication, we're going to divert all API DB queries to the main instance, which will have just finished the schema changes. So far so good, but I can't find a way to programmatically tell when the read replica is done catching up, so that we can again split our DB queries with writes going to the main instance and reads going to the replica.
Is there a Pub/Sub topic or metric stream our application could subscribe to that would fire when replication catches up? I would also be happy with something that reports the replication lag as a transaction count (or time) that the application could consume, so that when the trailing average drops below a threshold it switches back to reading from the replica. The least desirable but still acceptable option would be continuously polling an API or metric stream.
I know I can do this directly by querying the replica database itself for replication status, but that means we have to implement custom traffic routing in our application. Currently the framework we use lets us route DB traffic in config. I know there should be metrics available from Cloud SQL, but I cannot find them.
I know this doesn't fully answer your question, but maybe you'll be able to use it. It seems you might be interested in Cloud Monitoring and this metric:
database/mysql/replication/seconds_behind_master
According to the reference, it reflects the lag of the replica behind the master.
Either that or database/replication/replica_lag should work. I don't think you can get this through Pub/Sub, though. In any case, you should take a look at the reference, as it lists all the available metrics.
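If polling the metric turns out to be acceptable, a small sketch with the Cloud Monitoring Python client could look like this; the project, replica database_id, window, and threshold are all placeholders:

    import time

    from google.cloud import monitoring_v3

    PROJECT_NAME = "projects/my-project"  # placeholder
    FILTER = (
        'metric.type = "cloudsql.googleapis.com/database/replication/replica_lag" '
        'AND resource.labels.database_id = "my-project:my-replica"'  # placeholder
    )

    client = monitoring_v3.MetricServiceClient()

    def average_replica_lag(window_seconds=300):
        """Average replica lag (seconds) over the last window, or None if no data."""
        now = int(time.time())
        interval = monitoring_v3.TimeInterval(
            {"start_time": {"seconds": now - window_seconds}, "end_time": {"seconds": now}}
        )
        series = client.list_time_series(
            request={
                "name": PROJECT_NAME,
                "filter": FILTER,
                "interval": interval,
                "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
            }
        )
        points = [p.value.double_value for s in series for p in s.points]
        return sum(points) / len(points) if points else None

    lag = average_replica_lag()
    if lag is not None and lag < 5:  # threshold is a placeholder
        print("Replica has caught up; safe to route reads back to it.")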
I have been given the business logic of:
A customer makes a request for services through a third-party gateway GUI to an EC2 instance
Processing for some time (15 hr)
Data retrieval
Currently this is implemented by statically giving each user an EC2 instance to handle their requests. (This instance actually creates some sub-instances to process the data in parallel.)
What should happen is that for each request, an EC2 instance is fired off automatically.
In the long term, I was thinking this should be done using SWF (given the use of sub-processes); however, I wondered whether, as a quick and dirty solution, using Auto Scaling with the correct settings is worth pursuing.
Any thoughts?
you can "trick" autoscalling to spin up instances based on metrics:
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/policy_creating.html
So on each request, increment a metric, and decrement the metric when the process completes. Drive the Auto Scaling group off that metric.
Use Step Adjustments to control the number of instances: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html#as-scaling-types
Interesting challenge: binding customers to specific EC2 instances. Do you have a hard requirement to give each customer their own instance? It sounds like auto scaling is actually better suited to the parallel processing of the actual data, not to request routing. You may get away with having a fixed number of machines for this and/or scaling based on traffic, not the number of customers.
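A tiny sketch of the metric side of that idea, assuming a hypothetical "MyApp/Processing" namespace and "PendingJobs" metric; the step-scaling policy itself would be configured on the Auto Scaling group to act on this metric:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def publish_pending_jobs(count):
        """Report the current number of in-flight requests to CloudWatch."""
        cloudwatch.put_metric_data(
            Namespace="MyApp/Processing",  # hypothetical namespace
            MetricData=[{"MetricName": "PendingJobs", "Value": count, "Unit": "Count"}],
        )

    # Call this with the current in-flight count whenever a request starts or
    # finishes, e.g. publish_pending_jobs(3); a step-scaling policy on
    # PendingJobs then grows or shrinks the Auto Scaling group.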
I'm basically just looking for a starting point here. I have an app which needs the ability to update certain data in real time. For instance, a user can specify that she wants X to happen exactly 24 hours from the current time. I want to implement a framework for notifying this end-user, and any other relevant end-users, after 24 hours that the event has occurred. Can anyone provide a high-level explanation of which AWS services to use and how to combine them to achieve this sort of framework? I think it involves some combination of SNS and SQS, but I'm not sure they're relevant, since I don't need to send a message or notification so much as an update that some data has changed. If it's relevant, I'm currently using RDS with a MySQL database and Cognito for establishing user identities. Thanks!
I think it's most likely a combination of SNS and an EC2 instance, plus your existing database (and optionally SQS).
SNS can take care of the 'push' notification to a mobile device, but you can't schedule things to happen in the future (except for a few minutes).
Off the top of my head I would say the database keeps a list of what needs to be pushed, when it needs to be pushed and to whom.
The EC2 instance has a cron job of some sort that polls at some interval, running queries against your database to find 'things that need to be pushed now'.
If something needs to get pushed, the cron job uses SNS to do the push. That could either be just a message (hey, you need to get new data), or, if the data is small enough, you could send the data within the message itself.
If you wanted to add a bit of scaling capability, the cron job that finds items to be pushed could, instead of sending out the SNS notifications itself, add a message to an SQS queue (i.e. work to be done), and you could use as many EC2 instances as you needed to query the SQS queue and send out the SNS notifications in parallel.
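A rough sketch of that cron job, assuming a hypothetical scheduled_events table in the existing MySQL database (queried here with pymysql) and a placeholder SNS topic ARN:

    import boto3
    import pymysql

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:scheduled-events"  # placeholder

    def push_due_events():
        # Connection details and schema are assumptions for illustration.
        conn = pymysql.connect(host="mydb.example.com", user="app",
                               password="secret", database="app")
        try:
            cur = conn.cursor()
            cur.execute("SELECT id, payload FROM scheduled_events "
                        "WHERE due_at <= NOW() AND pushed = 0")
            for event_id, payload in cur.fetchall():
                # Push either a "go fetch new data" nudge or the data itself.
                sns.publish(TopicArn=TOPIC_ARN, Message=payload)
                cur.execute("UPDATE scheduled_events SET pushed = 1 WHERE id = %s",
                            (event_id,))
            conn.commit()
        finally:
            conn.close()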
Is it possible for an AWS Elastic Load Balancer to forward an incoming request to each of the EC2 instances behind it?
You can accomplish this in several ways, and the answers could be very long, but my first recommendation would be to bring up another EC2 instance running, for example, Apache Zookeeper. Every other node (the ones you need to "notify") would then run a Zookeeper client that subscribes to a "log changed" event. Whenever you need to change the log level, you would (manually or automatically) trigger a "log changed" event on your Zookeeper node. There are a lot of examples, use cases, and code samples on the Zookeeper project page that might help you get started.
The reason I recommended Zookeeper is that it could also serve as a central configuration point (not only for log levels) for your nodes in the future.
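As a sketch of what the client side could look like with the kazoo library (the host, znode path, and the reaction are placeholders):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zookeeper.internal:2181")  # placeholder host
    zk.start()
    zk.ensure_path("/config/log_level")

    @zk.DataWatch("/config/log_level")
    def on_log_level_change(data, stat):
        # Fired on connect and whenever the znode's data changes.
        if data is not None:
            print("log level changed to", data.decode())
            # reconfigure the local logger (or dump a local cache) here

    # To trigger the event from anywhere with access to the ensemble:
    #   zk.set("/config/log_level", b"DEBUG")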
For "Command and Control" types of events, you're likely going to want a different mechanism.
You could take the "an SQS queue for each server" approach: whichever server gets the web request pushes the command to each server's queue, and servers periodically poll their own queue for C&C operations. This gives you guaranteed-delivery semantics, which are quite important for C&C operations.
Instead of SQS, a database could be used to accomplish (mostly) the same thing. The DB approach is nice in that it could also give you an audit history, which may (or may not) be important.
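A sketch of the queue-per-server fan-out with boto3; the queue-name prefix and the handler are assumptions:

    import boto3

    sqs = boto3.client("sqs")

    def broadcast_command(command):
        """Called by whichever server received the web request."""
        queues = sqs.list_queues(QueueNamePrefix="cc-server-").get("QueueUrls", [])
        for url in queues:
            sqs.send_message(QueueUrl=url, MessageBody=command)

    def poll_own_queue(queue_url, handle):
        """Each server long-polls only its own queue and applies C&C operations."""
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])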