Django Channels with Redis: is delivery ensured when multiple consumers receive a message at different times? - django

Does this need to be implemented or is it in Channels already?
If I have a channel group with multiple consumers subscribed to it and one consumer is sent the message, is the message lost to the rest of the consumers, or does the message persist until all consumers have seen it?
Or does the message persist for a set time until it expires, regardless of whether consumers have seen it?

The Group object manages delivery to all consumers (where possible) and message expiry. But note that delivery is not ensured.
From the documentation:
Channels implements this abstraction as a core concept called Groups ...
[Groups] also automatically manage expiry of the group members - when the channel starts having messages expire on it due to non-consumption, we go in and remove it from all the groups it’s in as well ...
One thing channels do not do, however, is guarantee delivery. If you need certainty that tasks will complete, use a system designed for this with retries and persistence (e.g. Celery)
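To make the quoted behavior concrete, here is a toy in-memory model. It is not the Channels API: the class name, the `now` parameter, and the `expiry`/`capacity` defaults are all made up for illustration. It only assumes what the answer states: a group send puts one copy on each member channel, every copy expires on its own, and nothing is retried.

```python
import time
from collections import defaultdict, deque


class ToyChannelLayer:
    """Illustrative model only; names are hypothetical, not the Channels API."""

    def __init__(self, expiry=60.0, capacity=100):
        self.expiry = expiry        # seconds a copy may wait unread
        self.capacity = capacity    # max queued copies per channel
        self.groups = defaultdict(set)
        self.channels = defaultdict(deque)

    def group_add(self, group, channel):
        self.groups[group].add(channel)

    def group_send(self, group, message, now=None):
        now = time.monotonic() if now is None else now
        for channel in self.groups[group]:
            pending = self.channels[channel]
            if len(pending) >= self.capacity:
                continue  # channel is full: this copy is dropped, no retry
            pending.append((now + self.expiry, message))

    def receive(self, channel, now=None):
        now = time.monotonic() if now is None else now
        pending = self.channels[channel]
        while pending:
            deadline, message = pending.popleft()
            if deadline >= now:
                return message
            # otherwise the copy expired unread and is discarded
        return None
```

In this model, a consumer that reads promptly gets its copy, while a slow consumer in the same group can miss the same message entirely. That is the "delivery is not ensured" part.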

Related

AWS SQS Selective Polling Pattern

I have a system where I publish updates to a shared topic meant for specific consumers.
I noticed messages getting stuck in the queue due to a lack of selective listening in SQS consumers, so messages are being hijacked.
Example:
Given: Message{destination: A, payload: 1234}
Given: ConsumerA, & ConsumerB
I expect Message to be processed by ConsumerA. However, it gets hijacked by ConsumerB continuously. ConsumerB receives the message, then refuses to process it since the destination field doesn't match, so the visibility timeout expires and the message is put back on the queue. But due to the nature of SQS, ConsumerB has an equal chance of picking up the message again.
My question is, what patterns are used to solve this type of issue?
I'm considering creating a queue per consumer, but it has drawbacks specific to the system I'm working on.
If I could only listen for messages with matching attributes, problem solved, but that's seemingly not the case.
Is there any other way?
Sharing a single Amazon SQS queue is not an appropriate architecture for your use-case.
If you want your consumers to be able to 'request' a message from a particular subset, you should either use separate SQS queues or use a database. You could even store objects in Amazon S3 as a form of noSQL database.
Having consumers grab messages and then 'send them back' to the queue is not compatible with the design of the Amazon SQS service.

SQS Queues/ Visibility Timeouts/ message groups

I am new to AWS and am trying to understand SQS. I have gone through a few trainings, but I could not get answers to some questions in the discussion forum, so I am repeating them here. I know that a few of the questions below have obvious answers and are therefore more rhetorical. My confusion stems from the fact that my current understanding leads me to conflicting answers to the follow-up questions that come to mind after the obvious ones, which undermines my confidence in whatever I think I understand.
If I have a Standard queue named MyQueue and there are 100 messages, and if there are 2 completely separate applications (as consumers; note they are not a consumer group of the same applications like you have in Kafka; instead they are 2 separate applications) for this queue, then the consumers may receive
(i) out of order messages and
(ii) multiple copies of the messages
Neither of my applications needs to care about the order of the messages. But for the sake of the question, let's say we have perfect in-order delivery, no duplicate copies, and no network issues, and both consumers finish processing each message well within the Visibility Timeout window.
Q1: Will both applications individually receive 100 messages each, or will a message that is made available to one consumer never be delivered to the other consumer? If the latter is true (with no network issues, out-of-order delivery, or multiple deliveries), then:
Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
Is the consumer supposed to delete the message from the queue after processing? So, if a message is picked up by a processor, and it goes into visibility timeout while the processing happens and then is not deleted by the consumer even after the processing is complete before the visibility timeout, then will the message appear back for other consumers possibly to consume it? If that is the case, then won't the same thing apply to a FIFO queue as well?
Other Questions:
Q2: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue? If it is also applicable to a FIFO Queue, which promises exactly-once delivery, then if the Visibility Timeout expires before the consumer finishes processing a message, the message reappears in the queue only to be delivered again, thereby going back to at-least-once processing. Can someone confirm?
Q3: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?
Q: Will both the applications individually receive 100 messages each?
A consumer can request up to 10 messages per API call. These will become 'invisible' and will not be provided to other consumers. (Well, there actually is a small possibility that a message might be provided to multiple consumers. It is rare, but it can happen. If this is bad for your use-case, then you should track the messages in a database to ensure they are only processed once each.)
Q: Is SNS-SQS fanout the way to ensure that the same message is processed by multiple consumers?
It is very strange to want a single message consumed by 'multiple consumers'. The normal desire is to process each message once. If you do want a message processed by multiple consumers then, yes, you could send the message to SNS, which could then send it to multiple queues.
Q: Is the consumer supposed to delete the message from the queue after processing?
Yes. Amazon SQS does not know when a message is processed. The consumer must delete the message via the ReceiptHandle provided when the message was received. If a message times-out and another consumer receives it, SQS will provide a different ReceiptHandle so it knows which process requested the delete.
This also applies to FIFO queues.
Q: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. If the visibility timeout expires, the message will be provided to another consumer. The "exactly once delivery" avoids the rare situation mentioned above when a message in a Standard queue might be provided more than once. However, if visibility times-out, even in a FIFO queue, then it will intentionally be visible on the queue again.
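A rough sketch of the receive / visibility timeout / delete cycle described above. This is a toy model and not the SQS API: `ToyQueue`, the `now` argument, and the `rh-N` handle format are invented for illustration, and having a stale handle delete nothing is a simplification chosen for the model.

```python
import itertools


class ToyQueue:
    """Toy model of visibility timeout and receipt handles; not the SQS API."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        # message id -> [body, invisible_until, latest receipt handle]
        self.messages = {}
        self._ids = itertools.count(1)
        self._handles = itertools.count(1)

    def send(self, body):
        self.messages[next(self._ids)] = [body, 0, None]

    def receive(self, now, max_messages=10):
        batch = []
        for entry in self.messages.values():
            if len(batch) == max_messages:
                break
            if entry[1] <= now:  # message is currently visible
                entry[1] = now + self.visibility_timeout  # hide it
                entry[2] = f"rh-{next(self._handles)}"    # new handle per receive
                batch.append((entry[0], entry[2]))
        return batch

    def delete(self, receipt_handle):
        # Only the most recent handle for a message deletes it in this model.
        for message_id, entry in self.messages.items():
            if entry[2] == receipt_handle:
                del self.messages[message_id]
                return True
        return False
```

If a consumer receives a message and never calls `delete`, the message simply shows up again in a later `receive` once the timeout has passed, with a different handle, for whichever consumer happens to ask.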
Q: What are multiple message Groups within a FIFO Queue? Are they like partitions of a queue?
A message group is a way of grouping messages that must be delivered in-order.
Let's say there are two message groups, A and B, and they send messages in this order: A1, B1, A2, B2
Message B1 can be provided even if A1 is not yet deleted. However, message A2 will not be provided until A1 is deleted. Think of them as 'mini-queues'. This allows processing of lots of messages that are unrelated, without having to wait for all previous messages to be deleted.
See: Using the Amazon SQS Message Group ID - Amazon Simple Queue Service
Q1: Will both applications individually receive 100 messages each, or will a message that is made available to one consumer never be delivered to the other consumer?
Neither of these is quite accurate.
Standard queues never intentionally deliver a message more than once. Messages may occasionally be delivered more than once, but this is the exception. It is an artifact of SQS being a distributed system: situations can arise where, for example, the queue has a message stored in multiple replicas and, due to an internal failure, the state of that message is not known to all replicas.
If a message is inadvertently delivered more than once, it could be to multiple consumers or the same consumer. The consumer "connections" to SQS are actually stateless, resetting each time a list of messages is delivered, so SQS does not have a sense of which consumer it delivered each message to.
Consumers delete their messages after processing; otherwise the visibility timeout expires and the messages are delivered again and again -- to whichever consumer the luck of the draw delivers them to, each time. As noted, SQS has no concept of consumer identity or state. (In high volume applications, a single consumer may actually have multiple connections to SQS, all receiving messages in parallel, because the network round-trips and cycle of receive/delete will otherwise limit a single consumer to a few hundred messages per second. Whether these connections are handled using asynchronous I/O, threads, etc., is unimportant to SQS, which doesn't care which consumer is on a given connection.)
If you want all messages sent to all consumers, you need fan-out from SNS to SQS.
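The shape of that fan-out, as a toy sketch: one topic copies every published message into one queue per application, and each application then drains only its own queue. `ToyTopic` and the plain `deque` queues are stand-ins invented here, not SNS or SQS calls.

```python
from collections import deque


class ToyTopic:
    """Stand-in for a topic that copies each message to every subscribed queue."""

    def __init__(self):
        self.subscribed_queues = []

    def subscribe(self, target_queue):
        self.subscribed_queues.append(target_queue)

    def publish(self, message):
        for target_queue in self.subscribed_queues:
            target_queue.append(message)  # each queue gets its own copy


topic = ToyTopic()
app_one_queue, app_two_queue = deque(), deque()
topic.subscribe(app_one_queue)
topic.subscribe(app_two_queue)

for n in range(100):
    topic.publish(f"message-{n}")
```

After this runs, both applications have 100 messages waiting, and deleting a message from one queue has no effect on the other.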
Q2: Is the Visibility timeout applicable to both Standard Queue and FIFO Queue?
Yes. Because (noted above) the connection to SQS is not a persistent, stateful connection, SQS uses visibility timeout as the indication that a consumer has lost the message or failed ungracefully, so the message needs to be made accessible again. (Dead letter queues prevent this from happening endlessly, moving a message to a different queue, since repeated failures indicate a problem with a consumer, or a "poison pill" message.)
FIFO queues retain in-order delivery, here, and you could argue that they revert to "at least once" delivery, but the idea is that this should never happen. If it does, then your visibility timeout is too short or your consumer is crashing or otherwise misplacing messages.
Q3: What are multiple message Groups within a FIFO Queue?
Message groups allow FIFO queues to support in-order, parallel processing of groups of messages whose ordering relative to each other across group boundaries doesn't matter. Messages are delivered in order, within each group.
In a FIFO queue, if all messages are sent with the same group ID, then only one consumer can be working at a time.
In-order delivery (simple illustration) means that message 2 will not be delivered to any consumer until message 1 has been received and deleted -- finished -- by a consumer. In order delivery includes all processing (not merely the initial "delivery"). Or if 20 messages in the queue have the same group ID and two consumers request 10 messages each, one consumer gets 10 and the other gets nothing -- yet -- because those second 10 messages have to be sequestered, until the first 10 have been processed (else we are no longer "in order").
In the 20 messages scenario, if 14 were in group A and 6 were in group B, one consumer would receive A1-A10, A11-A14 would be sequestered until A1-A10 were complete, but while the first consumer is busy, another consumer could have B1-B6 at the same time.
Note again that there is no consumer affinity. If A1-A10 and B1-B6 were deleted at the same instant, A11-A14 would next be delivered to one consumer, but not necessarily the one that handled A1-A10.
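The 14 + 6 scenario can be played through with a small model. This is a simplification written for illustration (the class and method names are hypothetical, and the real service's batching rules are more involved): a group is skipped while any of its messages are still in flight, and a receive returns messages from one group only.

```python
from collections import deque


class ToyFifoQueue:
    """Simplified model of FIFO message groups; not the SQS API."""

    def __init__(self):
        self.groups = {}     # group id -> deque of pending bodies (insertion ordered)
        self.in_flight = {}  # group id -> set of received-but-not-deleted bodies

    def send(self, group_id, body):
        self.groups.setdefault(group_id, deque()).append(body)

    def receive(self, max_messages=10):
        for group_id, pending in self.groups.items():
            # A group is locked while any of its messages are unfinished.
            if pending and not self.in_flight.get(group_id):
                count = min(max_messages, len(pending))
                batch = [pending.popleft() for _ in range(count)]
                self.in_flight[group_id] = set(batch)
                return group_id, batch
        return None, []

    def delete(self, group_id, body):
        self.in_flight[group_id].discard(body)
```

With 14 messages in group A and 6 in group B, the first `receive()` returns A1-A10, the second returns B1-B6, and a third returns nothing until every one of A1-A10 has been deleted. Note that `receive()` takes no consumer argument at all, which mirrors the lack of consumer affinity.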

How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and then have a streaming service make those available through gRPC streaming API.
Cloud Pub/Sub seems well suited for the job; however, scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down, which seems unnecessarily complicated and not what the platform was intended for.
On the other hand Kafka seems to work well for something like this but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow for streaming towards a frontend (which may not have access to the underlying infrastructure).
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
I think your original idea was better than ephemeral subscriptions tbh. I mean, ephemeral subscriptions work, but they feel totally unnatural. It depends on what your requirements are, though. For example, do clients only need to receive messages while they're connected, or do they all need to get all messages?
Only While Connected
Your original idea was better imo. What I probably would have done is to create a gRPC stream service that clients could connect to. The implementation is essentially an observer pattern. The consumer will receive a message and then iterate through the subscribers to do a "Send" to all of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive since clients are sticky to whatever instance they've connected to.
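A minimal sketch of that observer collection, leaving out the gRPC and Pub/Sub wiring. The names are hypothetical: `on_message` is whatever your message-queue consumer calls for each event, and each registered client gets a `queue.Queue` that its stream handler would drain and write to the wire.

```python
import queue


class Broadcaster:
    """Observer collection shared by one service instance (illustrative names)."""

    def __init__(self):
        self.subscribers = {}  # client id -> outbox drained by that client's stream

    def register(self, client_id):
        outbox = queue.Queue()
        self.subscribers[client_id] = outbox
        return outbox

    def unregister(self, client_id):
        self.subscribers.pop(client_id, None)

    def on_message(self, message):
        # Copy the values so a client disconnecting mid-loop does not break iteration.
        for outbox in list(self.subscribers.values()):
            outbox.put(message)
```

A stream handler would call `register` when the client connects, loop on `outbox.get()` while the stream is open, and call `unregister` in a `finally` block when it closes. Messages that arrive while a client is not registered are simply never seen by it.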
Everyone always gets the message, even if eventually
The concept is similar to the above, but the client doesn't implicitly un-register from the observer on disconnect. Instead, it would register and un-register explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. Then the consumer's broadcast logic is slightly different. Now it iterates through the list and says "if online, then send, else queue", and sends the message to an ephemeral queue (that belongs to the client). Then your 'on connect' logic will send all messages that are in the queue to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ. I think you'll have to do a bit of managing of whether or not it's ok to delete a queue, though. For example, never delete the queue unless the client explicitly unsubscribes or has been inactive for a long time. Fail to do that, and the whole inbox idea falls apart.
The selected answer above is most similar to what I'm describing here in that the subscription is the queue. If I did this, then I'd probably implement it as an internal bus instead of an observer (since it would be unnecessary) - You create a consumer on demand for a connecting client that literally just forwards the message. The message consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by pubsub. If you find yourself in that position, then you can unshackle that constraint by implementing the pattern above. It's basically the same pattern but you shift the responsibility over to your infra where the only constraint is your own resources.
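Roughly, the "if online, then send, else queue" logic could look like the sketch below. Everything here is an assumption made for illustration: the inboxes are in-memory deques rather than real broker queues, `send` is any callable that writes to the client's stream, and the code is single-threaded, so a real service would need locking around connect and broadcast.

```python
from collections import defaultdict, deque


class InboxBroadcaster:
    """Explicit subscribe/unsubscribe with a per-client inbox (illustrative)."""

    def __init__(self):
        self.subscribed = set()
        self.online = {}                   # client id -> send callable
        self.inboxes = defaultdict(deque)  # client id -> messages held while offline

    def subscribe(self, client_id):
        self.subscribed.add(client_id)

    def unsubscribe(self, client_id):
        # The only place an inbox is deleted.
        self.subscribed.discard(client_id)
        self.online.pop(client_id, None)
        self.inboxes.pop(client_id, None)

    def on_connect(self, client_id, send):
        backlog = self.inboxes.pop(client_id, deque())
        while backlog:
            send(backlog.popleft())  # drain the inbox before going live
        self.online[client_id] = send

    def on_disconnect(self, client_id):
        self.online.pop(client_id, None)  # stays subscribed, just offline

    def on_message(self, message):
        for client_id in self.subscribed:
            send = self.online.get(client_id)
            if send is not None:
                send(message)
            else:
                self.inboxes[client_id].append(message)
```

An inactivity cutoff for abandoned inboxes is left out here; it would be one more place that calls `unsubscribe`.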
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.
(note: arrows in diagram are in the direction of dependency, not data flow)

How is Google Cloud Pub/Sub avoiding clock skew

I am looking into ways to order list of messages from google cloud pub/sub. The documentation says:
Have a way to determine from all messages it has currently received whether or not there are messages it has not yet received that it needs to process first.
...is possible by using Cloud Monitoring to keep track of the pubsub.googleapis.com/subscription/oldest_unacked_message_age metric. A subscriber would temporarily put all messages in some persistent storage and ack the messages. It would periodically check the oldest unacked message age and check against the publish timestamps of the messages in storage. All messages published before the oldest unacked message are guaranteed to have been received, so those messages can be removed from persistent storage and processed in order.
I tested it locally and this approach seems to be working fine.
I have one gripe with it however, and this is not something easily testable by myself.
This solution relies on server-side assigned (by google) publish_time attribute. How does Google avoid the issues of skewed clocks?
If my producer publishes messages A and then immediately B, how can I be sure that A.publish_time < B.publish_time is true? Especially considering that the same documentation page mentions internal load-balancers in the architecture of the solution. Is Google Pub/Sub using atomic clocks to synchronize time on the very first machines which see messages and enrich those messages with the current time?
There is an implicit assumption in the recommended solution that the clocks on all the servers are synchronized. But the documentation never explains if that is true or how it is achieved so I feel a bit uneasy about the solution. Does it work under very high load?
Notice I am only interested in relative order of confirmed messages published after each other. If two messages are published simultaneously, I don't care about the order of them between each other. It can be A, B or B, A. I only want to make sure that if B is published after A is published, then I can sort them in that order on retrieval.
Is the aforementioned solution only "best-effort" or are there actual guarantees about this behavior?
There are two sides to ordered message delivery: establishing an order of messages on the publish side and having an established order of processing messages on the subscribe side. The document to which you refer is mostly concerned with the latter, particularly when it comes to using oldest_unacked_message_age. When using this method, one can know that if message A has a publish timestamp that is less than the publish timestamp for message B, then a subscriber will always process message A before processing message B. Essentially, once the order is established (via publish timestamps), it will be consistent. This works if it is okay for the Cloud Pub/Sub service itself to establish the ordering of messages.
Publish timestamps are not synchronized across servers and so if it is necessary for the order to be established by the publishers, it will be necessary for the publishers to provide a timestamp (or sequence number) as an attribute that is used for ordering in the subscriber (and synchronized across publishers). The subscriber would sort message by this user-provided timestamp instead of by the publish timestamp. The oldest_unacked_message_age will no longer be exact because it is tied to the publish timestamp. One could be more conservative and only consider messages ordered that are older than oldest_unacked_message_age minus some delta to account for this discrepancy.
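A sketch of the subscriber-side buffering with the conservative delta. The names are hypothetical, the buffer is in memory rather than in persistent storage, and fetching `oldest_unacked_message_age` from Cloud Monitoring is left out: its value is just passed in as seconds. `timestamp` can be either the publish time or the user-provided ordering attribute, as epoch seconds.

```python
import heapq


class OrderedReleaseBuffer:
    """Hold acked messages until they are older than the unacked watermark."""

    def __init__(self, safety_delta=0.0):
        self.safety_delta = safety_delta  # extra seconds of slack (an assumption)
        self._heap = []                   # (timestamp, arrival counter, payload)
        self._counter = 0                 # tie-breaker so payloads are never compared

    def add(self, timestamp, payload):
        heapq.heappush(self._heap, (timestamp, self._counter, payload))
        self._counter += 1

    def release(self, now, oldest_unacked_age):
        # Anything stamped before this point is assumed to have been received.
        watermark = now - oldest_unacked_age - self.safety_delta
        ready = []
        while self._heap and self._heap[0][0] < watermark:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

Calling `release` periodically returns the messages that are safe to process, already sorted by timestamp. A larger `safety_delta` trades latency for tolerance of timestamp disagreement.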
Google Cloud Pub/Sub does not guarantee that events are delivered to consumers in the order they were produced. The reason is that Google Cloud Pub/Sub itself runs on a cluster of nodes, so event B can possibly reach the consumer before event A. To ensure ordering, you have to make changes on both the producer and the consumer to identify the order of events. Here is the relevant section from the docs.

Chat bots: ensuring serial processing of messages on a per-conversation basis in clustered environment

In the context of writing a Messenger chat bot in a cloud environment, I'm facing some concurrency issues.
Specifically, I would like to ensure that incoming messages from the same conversation are processed one after the other.
As a constraint, I'm processing the messages with workers in a Cloud environment (i.e the worker pool is of variable size and worker instances are potentially short-lived and may crash). Also, low latency is important.
So abstracting a little, my requirements are:
I have a stream of incoming messages
each of these messages has a 'topic key' (the conversation id)
the set of topics is not known ahead-of-time and is virtually infinite
I want to ensure that messages of the same topic are processed serially
on a cluster of potentially ephemeral workers
if possible, I would like reliability guarantees e.g making sure that each message is processed exactly once.
My questions are:
Is there a name for this concurrency scenario?
Are there technologies (message brokers, coordination services, etc.) which implement this out of the box?
If not, what algorithms can I use to implement this on top of lower-level concurrency tools? (distributed locks, actors, queues, etc.)
I don't know of a widely-accepted name for the scenario, but a common strategy to solve that type of problem is to route your messages so that all messages with the same topic key end up at the same destination. A couple of technologies that will do this for you:
With Apache ActiveMQ, HornetQ, or Apache ActiveMQ Artemis, you could use your topic key as the JMSXGroupId to ensure all messages with the same topic key are processed in-order by the same consumer, with failover
With Apache Kafka, you could use your topic key as the partition key, which will also ensure all messages with the same topic key are processed in-order by the same consumer
Some message broker vendors refer to this requirement as Message Grouping, Sticky Sessions, or Sticky Message Load Balancing.
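The core of the routing idea is just a stable hash of the topic key. A minimal sketch, assuming a fixed number of partitions (or worker slots); the function name is made up, and this is not the hash any particular broker actually uses:

```python
import hashlib


def partition_for(topic_key: str, num_partitions: int) -> int:
    """Map a conversation id to a partition; same key always gives the same result."""
    digest = hashlib.sha256(topic_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

`hashlib` is used instead of the built-in `hash()` because string hashing in Python is randomized per process, so different workers would disagree about where a key belongs. Changing `num_partitions` remaps most keys, which is one reason brokers manage partition assignment for you.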
Another common strategy on messaging systems with weaker delivery/ordering guarantees (like Amazon SQS) is to simply include a sequence number in the message and leave it up to the destination to resequence and request redelivery of missing messages as needed.
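A resequencer for one conversation might look like this sketch. It assumes the producer stamps each message with a gapless sequence number starting at 1; the class and method names are hypothetical, and how you ask for redelivery of the gaps reported by `missing()` depends on your system.

```python
class Resequencer:
    """Buffer out-of-order messages and hand them over in sequence order."""

    def __init__(self, first_sequence=1):
        self.next_sequence = first_sequence
        self.pending = {}  # sequence number -> payload waiting for earlier ones

    def accept(self, sequence, payload):
        """Return the payloads that became deliverable, in order."""
        if sequence < self.next_sequence or sequence in self.pending:
            return []  # duplicate delivery: ignore it
        self.pending[sequence] = payload
        ready = []
        while self.next_sequence in self.pending:
            ready.append(self.pending.pop(self.next_sequence))
            self.next_sequence += 1
        return ready

    def missing(self):
        """Sequence numbers we are still waiting for below the highest one seen."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_sequence, highest) if s not in self.pending]
```

Dropping duplicates in `accept` also covers the at-least-once case, where the same message shows up twice.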
I think you can solve this with a queue and a set. Put every message object in a queue and process the queue first in, first out. When adding a message to the queue, add its topic name to the set; when taking a message out for processing, remove its topic name from the set.
So if a topic is already in the set, don't add another message object for the same topic to the queue.
I hope this will help you. All the best :)