I have a web service (A), which basically creates an object in the Redis cache and stores some user state in it. I'm trying to create a timing model where, after a certain expiry time, I can get this user state, do some operations on it and then remove it from the cache.
I investigated the Redis pub/sub model. I can publish an expiry stamp on the key of the object and create another web service (B), which subscribes to that key's expiry.
However, there is a possibility that I'll need to scale the web service (B). My question is if I scale horizontally service (B), would each instance receive that expiry event from Redis for the same object?
If yes, how do I make sure I don't run into a race condition between multiple instances while doing operations on the subscribed event value?
Please suggest if there are any better ways to achieve this timing model on the server.
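For context, the approach being described is Redis keyspace notifications, which publish expired key names on a well-known channel. One caveat: by the time the expired event fires, the value is already deleted, so a common pattern is to keep the state under a durable key and expire a separate short-TTL "shadow" key whose expiry triggers the event. A minimal sketch (the function names and lock TTL are illustrative assumptions, not a definitive design):

```python
def expiry_channel(db: int = 0) -> str:
    # Redis keyspace notifications publish expired key names on this
    # channel; the server's notify-keyspace-events setting must
    # include "Ex" (expired events) for it to fire.
    return f"__keyevent@{db}__:expired"

def process_user_state(key: str) -> None:
    # Placeholder for the post-expiry work on the user state.
    print(f"processing state for {key}")

def handle_expirations(redis_url: str = "redis://localhost:6379/0") -> None:
    # Hypothetical worker loop for service (B). Pub/sub is fan-out, so
    # EVERY subscribed instance receives every expiry event; a SET NX
    # lock lets exactly one instance claim each key.
    import redis  # third-party client (pip install redis), assumed available

    r = redis.Redis.from_url(redis_url)
    p = r.pubsub()
    p.subscribe(expiry_channel(0))
    for msg in p.listen():
        if msg["type"] != "message":
            continue
        key = msg["data"].decode()
        # Only the instance that wins this lock processes the key.
        if r.set(f"lock:{key}", "1", nx=True, ex=30):
            process_user_state(key)
```

This answers the scaling question directly: yes, every instance of (B) receives the event, and the `SET NX` lock is what prevents the race.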
I have a live production system with a Google Cloud SQL Postgres instance. The application will soon be undergoing a long-running database schema modification to accommodate a change to the way the business operates. We've got a deployment plan that will allow the business to continue to operate during the schema change, which essentially pauses replication to our read replica and queues up API requests that would mutate the database for replay after the schema change is complete. Once the deployment is complete, the last step is to un-pause replication. But while the read replica is catching up, the schema changes will lock tables, causing a lot of failing read requests. So before we un-pause the read replication, we're going to divert all API db queries to the main instance, which will have just finished the schema changes. So far so good, but I can't find a way to programmatically tell when the read replica is done catching up, so we can split our DB queries with writes going to the main instance and reads going to the replica.
Is there a PubSub topic or metric stream our application could subscribe to which would fire when replication catches up? I would also be happy with something that reports replication lag transaction count (or time) which the application could receive and when the trailing average comes below threshold, it switches over to reading from the replica again. The least desirable but still okay option would be continuous polling of an API or metric stream.
I know I can do this directly by querying the replica database itself for replication status, but that means we have to implement custom traffic directing in our application. Currently the framework we use allows us to route DB traffic in config. I know there should be metrics that are available from CloudSQL, but I cannot find them.
I know this doesn't fully answer your question, but maybe you'll be able to use it. It seems you might be interested in Cloud Monitoring and this metric:
database/mysql/replication/seconds_behind_master
According to the reference it reflects the lag of the replica behind the master.
Either that, or database/replication/replica_lag should work (and since your instance is Postgres, the latter is the one that applies). I don't think you can get this through Pub/Sub, though. In any case, take a look at the reference, as it contains all the available metrics.
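A sketch of the "least desirable but still okay" polling option from the question, reading the replica_lag metric via the Cloud Monitoring client library and applying the trailing-average threshold you described (the threshold value and function names are assumptions; this assumes the google-cloud-monitoring package and Application Default Credentials):

```python
def replica_caught_up(lag_samples, threshold_seconds=5.0):
    # Trailing-average check from the question: switch reads back to
    # the replica once the average lag drops below the threshold.
    if not lag_samples:
        return False
    return sum(lag_samples) / len(lag_samples) < threshold_seconds

def fetch_replica_lag(project_id, minutes=5):
    # Polls Cloud Monitoring for recent replica_lag samples.
    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        end_time={"seconds": int(now)},
        start_time={"seconds": int(now - minutes * 60)},
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": 'metric.type = "cloudsql.googleapis.com/database/replication/replica_lag"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    return [point.value.double_value
            for series in results for point in series.points]
```

Your switchover job could call `fetch_replica_lag` on a timer and flip the config-based routing once `replica_caught_up` returns True.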
I have my application caching some data on cloud memory store. The application has multiple instances running on the same region. AppInstanceA caches to MemStoreA and AppInstanceB caches to MemStoreB.
A particular user action from the app should perform cache evictions.
Is there an option in GCP to evict the entries on both MemStoreA and MemStoreB regardless of which app instance the action is triggered from?
Thanks
You can use PubSub for this.
Create a topic
Publish in the topic when you have a key to invalidate
Create 1 subscription per Memorystore instance
Plug in 1 function (the same function each time) per subscription, with an environment variable that specifies which instance to target
This way, the functions are triggered in parallel and you can expect the key to be invalidated in all Memorystore instances at roughly the same time.
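The steps above might look roughly like this as a Pub/Sub-triggered Cloud Function (the message shape, `REDIS_HOST` variable name, and function names are assumptions for illustration):

```python
import base64
import json
import os

def make_invalidation_message(key: str) -> bytes:
    # Payload the app publishes to the shared invalidation topic.
    return json.dumps({"key": key}).encode()

def invalidate_cache(event, context=None):
    # Hypothetical Pub/Sub-triggered Cloud Function. Deploy one copy
    # per Memorystore instance (one per subscription), each with
    # REDIS_HOST pointing at its own instance.
    import redis  # third-party client, assumed available

    payload = json.loads(base64.b64decode(event["data"]).decode())
    r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
    r.delete(payload["key"])
```

The publishing side is a single `publish` call from any app instance; the fan-out across instances is handled entirely by the subscriptions.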
I have an application where I'm looking to offload the compute load to AWS, and am after some guidance on architecture. The user will initiate a main task, which contains ~100 computationally-heavy sub-tasks which can be run in parallel.
I am thinking an appropriate solution is for the desktop app to hit an API gateway endpoint to create a new task, which would then invoke many Lambdas, one for each sub-task. I would like each sub-task to have individual progress reporting, as well as the ability for the user to cancel the overall task. The user could also use the API to query the created task / hit another endpoint to cancel it.
What's an appropriate architecture / service(s) to invoke and manage these Lambda sub-tasks, access intermediate progress information from each Lambda and the final result, and allow the user to request cancellation?
You may be interested in AWS Step Functions (https://aws.amazon.com/step-functions/) for the querying and orchestration of the overall progress, and possibly use DynamoDB (https://aws.amazon.com/dynamodb/) or some other data store to allow for monitoring the progress within individual sub tasks.
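For the DynamoDB side of that suggestion, one common shape is a table keyed by (task_id, subtask_id): each sub-task Lambda writes its own progress row, the GET endpoint queries by task_id, and cancellation is a flag the sub-tasks poll. A sketch under those assumptions (table schema, row 0 convention, and function names are all hypothetical):

```python
def progress_item(task_id: str, subtask_id: int, pct: int) -> dict:
    # Shape of a hypothetical DynamoDB item keyed by (task_id, subtask_id).
    return {"task_id": task_id, "subtask_id": subtask_id, "progress_pct": pct}

def report_progress(table_name, task_id, subtask_id, pct):
    # Each sub-task Lambda writes its own progress row; the API serving
    # GET /tasks/{id} can then Query by task_id. Assumes boto3.
    import boto3
    boto3.resource("dynamodb").Table(table_name).put_item(
        Item=progress_item(task_id, subtask_id, pct)
    )

def is_cancelled(table_name, task_id):
    # Sub-tasks periodically check a cancellation flag that the cancel
    # endpoint sets on the task's metadata row (row 0 by convention here).
    import boto3
    resp = boto3.resource("dynamodb").Table(table_name).get_item(
        Key={"task_id": task_id, "subtask_id": 0}
    )
    return resp.get("Item", {}).get("cancelled", False)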
I have a system that consists of one central server, many mobile clients and many worker servers. Each worker server has its own database and may run on the customer's infrastructure (when they purchase the on-premise installation).
In my current design, mobile clients send updates to the central server, which updates its database. The worker servers periodically pull from the central server to get updated information. This "pull model" creates a lot of requests and is still not sufficient, because workers often work with outdated information.
I want a "push model", where the central server can "post" updates to "somewhere", which persist the last version of the data. Then workers can "subscribe" to this "somewhere" and be always up-to-date.
The main problems are:
A worker server may be offline when an update happens. When it comes back online, it should receive the updates it missed.
A new worker server may be created and will need to get the up-to-date data, even data that was posted before it existed.
A bonus point:
Not needing to manage this "somewhere" myself. My application is deployed on AWS, so if there's any combination of services I can use to achieve this, that would be great. Everything I found has limited-time data retention.
The problems with a push model are:
If clients are offline, the central system would need a retry mechanism, which would generate many more requests than a pull model
The clients might be behind firewalls, so cannot receive the message
It is not scalable
A pull model is much more efficient:
Clients should retrieve the latest data when they start, and also at regular intervals
New clients simply connect to the central server -- no need to update the central server with a list of clients (depending upon your security needs)
It is much more scalable
There are several options for serving traffic to pull requests:
Via an API call, powered by AWS API Gateway. You would then need an AWS Lambda function or a web server to handle the request.
Directly from DynamoDB (but the clients would require access credentials)
From an Amazon S3 bucket
Using an S3 bucket has many advantages: Highly scalable, a good range of security options (public; via credentials; via pre-signed URLs), no servers required.
Simply put the data in an S3 bucket and have the clients "pull" the data. You could have one set of files for "every" client, and a specific file for each individual client, thereby enabling individual configuration. Just think of S3 as a very large key-value datastore.
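The shared-file-plus-per-client-file layout described above could be sketched like this on the worker side (bucket layout, key names, and function names are assumptions; missing per-client files simply mean the shared data applies):

```python
def config_keys(client_id: str) -> list:
    # Pull order: the shared file first, then the client-specific
    # override, so per-client values win when both exist.
    return ["config/all-clients.json", f"config/clients/{client_id}.json"]

def pull_latest(bucket, client_id):
    # Worker servers call this on startup and on a timer (the "pull"
    # part of the pull model). Assumes boto3 and read credentials
    # (or pre-signed URLs) for the bucket.
    import json
    import boto3

    s3 = boto3.client("s3")
    merged = {}
    for key in config_keys(client_id):
        try:
            obj = s3.get_object(Bucket=bucket, Key=key)
            merged.update(json.loads(obj["Body"].read()))
        except s3.exceptions.NoSuchKey:
            continue  # no per-client override; shared data applies
    return merged
```

Because S3 keeps the latest version of each object indefinitely, this also solves the two stated problems: an offline worker catches up on its next pull, and a brand-new worker gets everything on its first pull.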
I'm basically just looking for a starting point here. I have an app which needs to include the ability to update certain data in real time. For instance, the user has the ability to specify that she wants X to happen exactly 24 hours from the current time. I want to implement a framework for notifying this end-user and any other relevant end-users, once the 24 hours have passed, that the event has occurred. Can anyone provide me with a high-level explanation of which AWS services to use and how to combine them in order to achieve this sort of framework? I think it includes some combination of SNS and SQS, but I'm not sure if these are relevant, since I don't need to send a message or notification so much as an update that some data has changed. If it's relevant, I'm currently using RDS with a MySQL database and Cognito for establishing user identities. Thanks!
I think it's most likely a combination of SNS and an EC2 instance, plus your existing database (and optionally SQS).
SNS can take care of the 'push' notification to a mobile device, but you can't schedule things to happen far in the future (SQS message delays, for example, max out at 15 minutes).
Off the top of my head I would say the database keeps a list of what needs to be pushed, when it needs to be pushed and to whom.
The EC2 instance has a cron job of some sort that polls on some interval, running queries against your database to find 'things that need to be pushed now'.
If something needs to be pushed, the cron job uses SNS to do the push. That could be just a notification (hey, you need to fetch new data), or, if the data is small enough, you could send the data within the message itself.
If you wanted to add a bit of scaling capability, the cron job that finds items to be pushed could, instead of sending out the SNS notifications itself, add a message to an SQS queue (i.e. work to be done), and you could use as many EC2 instances as you need to consume the SQS queue and send out the SNS notifications in parallel.
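The cron-driven poller described above might be sketched like this (the row shape, topic ARN, and function names are assumptions; the database query is replaced by an in-memory list for illustration):

```python
import datetime

def due_items(items, now):
    # Pure selection step: rows whose scheduled push time has arrived.
    # In practice this would be a WHERE push_at <= NOW() query against
    # the table of pending pushes.
    return [i for i in items if i["push_at"] <= now]

def push_due(db_rows, topic_arn):
    # Run from cron: find due rows and fan each one out via SNS (or,
    # for the scaled-out variant, enqueue to SQS instead and let worker
    # instances do the SNS publish). Assumes boto3 and AWS credentials.
    import json
    import boto3

    sns = boto3.client("sns")
    now = datetime.datetime.utcnow()
    for item in due_items(db_rows, now):
        sns.publish(
            TopicArn=topic_arn,
            Message=json.dumps({"user_id": item["user_id"],
                                "event": "data_changed"}),
        )
```

After a successful publish you'd mark the row as sent (or delete it) so the next cron run doesn't push it again.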