Redeliver unprocessed EventHub messages in IEventProcessor.ProcessEventsAsync - azure-eventhub

In IEventProcessor.ProcessEventsAsync I want to write events to a persistent store. It's possible that this store is unavailable and messages cannot be persisted. How can I mark these messages to be redelivered later?
The store may be down for only a few hours, but until it's back up every message is affected and cannot be persisted.

I don't think you can mark a particular event for redelivery in Event Hubs, unlike with a Service Bus queue. However, Event Hubs does provide a retention policy and an offset for each event, which make it possible to reprocess an old event. You can read more in the "checkpointing" section of this document: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-overview/
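The question is about the .NET IEventProcessor, but the principle is SDK-agnostic; as a rough sketch with the Python azure-eventhub client (the connection details and the persist() helper are placeholders): checkpoint only after the event has actually been persisted, so everything after the last checkpoint gets re-read when processing restarts.

    from azure.eventhub import EventHubConsumerClient

    def persist(event):
        # Hypothetical helper: writes the event to the downstream store,
        # raising if the store is unavailable.
        ...

    def on_event(partition_context, event):
        try:
            persist(event)
        except Exception:
            # Store is down: do NOT checkpoint. The recorded offset stays at the
            # last successfully persisted event, so this event (and later ones)
            # will be re-read when the processor restarts or the partition is
            # re-claimed, as long as they are still within the retention period.
            return
        # Advance the checkpoint only once the event is safely persisted.
        # (In practice you would also pass a checkpoint_store to the client
        # so that checkpoints survive restarts.)
        partition_context.update_checkpoint(event)

    client = EventHubConsumerClient.from_connection_string(
        "<connection-string>", consumer_group="$Default", eventhub_name="<hub-name>"
    )
    with client:
        client.receive(on_event=on_event, starting_position="-1")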

Adding to Tyler's response, I suppose you could use some kind of "poison message"/dead-letter queue approach. Event Hubs does not have that functionality built in, but Service Bus queues do.
In any case, I think this has to be a programmatic approach, not something the backend does for you.
There is a good article about a slightly different scenario, but the approach is similar to what I mean:
https://www.dougv.com/2015/07/handling-poison-messages-in-an-azure-service-bus-queue/
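As a rough illustration of that poison-message idea (the queue name and connection string are placeholders), the processor could park events it cannot persist on a separate Service Bus queue and have a scheduled job drain that queue once the store is healthy again; something like this with the Python azure-servicebus client:

    from azure.servicebus import ServiceBusClient, ServiceBusMessage

    def park_for_retry(event_body: bytes) -> None:
        # Hypothetical application-owned "retry"/dead-letter queue; a scheduled
        # job re-reads it and retries persistence once the store is back up.
        with ServiceBusClient.from_connection_string("<sb-connection-string>") as sb:
            with sb.get_queue_sender(queue_name="events-retry") as sender:
                sender.send_messages(ServiceBusMessage(event_body))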

Related

AWS SQS Selective Polling Pattern

I have a system where I publish updates to a shared topic meant for specific consumers.
I noticed messages getting stuck in the queue because SQS consumers cannot listen selectively, so messages are being hijacked by the wrong consumer.
Example:
Given: Message{destination: A, payload: 1234}
Given: ConsumerA and ConsumerB
I expect the Message to be processed by ConsumerA. However, it gets hijacked by ConsumerB continuously. ConsumerB receives the message, then refuses to process it since the destination field doesn't match, which causes the visibility timeout to expire and the message to be put back on the queue... but due to the nature of SQS, ConsumerB has an equal chance of picking the message up again.
My question is, what patterns are used to solve this type of issue?
I'm considering creating a queue per consumer, but that has drawbacks specific to the system I'm working on.
If I could only listen for messages with matching attributes, problem solved, but that's seemingly not the case.
Is there any other way?
Sharing a single Amazon SQS queue is not an appropriate architecture for your use-case.
If you want your consumers to be able to 'request' a message from a particular subset, you should either use separate SQS queues or use a database. You could even store objects in Amazon S3 as a form of NoSQL database.
Having consumers grab messages and then 'send them back' to the queue is not compatible with the design of the Amazon SQS service.
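A minimal sketch of the queue-per-consumer option, assuming boto3 and illustrative queue URLs: the producer routes each message to the queue owned by its intended consumer at send time, so nothing is ever "hijacked" at receive time.

    import boto3

    sqs = boto3.client("sqs")

    # Hypothetical per-consumer queues; each consumer polls only its own queue.
    QUEUES = {
        "A": "https://sqs.us-east-1.amazonaws.com/123456789012/updates-consumer-a",
        "B": "https://sqs.us-east-1.amazonaws.com/123456789012/updates-consumer-b",
    }

    def publish(destination: str, payload: str) -> None:
        # Route by the destination field instead of filtering after delivery.
        sqs.send_message(QueueUrl=QUEUES[destination], MessageBody=payload)

    publish("A", "1234")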

Is it possible to selectively read from AWS SQS?

I have a use case: I want to read from SQS at all times, except when another event happens.
For instance, I have football news going into SQS as messages. I want to retrieve them at all times, except when live matches are happening.
Is there any way to read from the queue except while another event is in progress?
I scrolled the docs and Stack Overflow, but I don't see a solution.
COMMENT: I have a small and weak service, and because of technical limitations I cannot scale it up (memory/CPU, etc.), but I still want the two "conflicting" flows to live in the same service. They are both supposed to communicate with the same API, and I don't want them to send conflicting requests.
Is there a way to do it, or will I have to write a custom communicator with SQS?
You can't select which messages you want to read from SQS and which you'd rather not - there is no filtering in SQS.
If you have messages that need to be processed at all times and others that need to be processed only sometimes or in batches, you should put them in separate queues and read from them separately.
You don't say anything about the infrastructure that reads from the queue, but if it's a process on EC2, you could just stop it while live matches are happening and restart it later. SQS is built for asynchronous messaging and will store the messages for up to 14 days (depending on your configuration) until a consumer is available to read them.
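For example (a sketch assuming boto3, an illustrative queue URL, and a hypothetical live_match_in_progress() check), the consumer can simply stop polling while a match is on; SQS keeps the messages until it resumes:

    import time
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/football-news"  # placeholder

    def live_match_in_progress() -> bool:
        # Hypothetical check: a schedule API, a flag in a database, etc.
        return False

    def handle(body: str) -> None:
        # Hypothetical processing of a news item.
        print(body)

    while True:
        if live_match_in_progress():
            time.sleep(60)            # stop polling; messages stay in the queue
            continue
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])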

How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and a streaming service then makes those available through a gRPC streaming API.
Cloud Pub/Sub seems well suited for the job; however, scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down, which seems unnecessarily complicated and not what the platform was intended for.
On the other hand, Kafka seems to work well for something like this, but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow streaming towards a frontend (which may not have access to the underlying infrastructure).
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
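For reference, a sketch of creating such an ephemeral-style subscription with the google-cloud-pubsub Python client, with both knobs set low (project, topic, and subscription names are placeholders):

    from google.cloud import pubsub_v1
    from google.protobuf import duration_pb2

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path("my-project", "my-topic")
    sub_path = subscriber.subscription_path("my-project", "ephemeral-sub-1")

    subscriber.create_subscription(
        request={
            "name": sub_path,
            "topic": topic_path,
            # Delete the subscription after 1 day without an active subscriber.
            "expiration_policy": pubsub_v1.types.ExpirationPolicy(
                ttl=duration_pb2.Duration(seconds=24 * 60 * 60)
            ),
            # Keep unacknowledged messages for at most 10 minutes.
            "message_retention_duration": duration_pb2.Duration(seconds=10 * 60),
        }
    )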
I think your original idea was better than ephemeral subscriptions, to be honest. It works, but it feels unnatural. It depends on what your requirements are: for example, do clients only need to receive messages while they're connected, or do they all need to get all messages?
Only While Connected
Your original idea was better, in my opinion. What I probably would have done is create a gRPC streaming service that clients can connect to. The implementation is essentially an observer pattern: the consumer receives a message and then iterates through the subscribers, doing a "Send" to each of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive, since clients are sticky to whatever instance they've connected to.
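A bare-bones sketch of that observer collection in Python (names are illustrative and the gRPC plumbing is omitted): the streaming handler calls register() when a client connects, reads from the returned queue to send messages down the stream, and calls unregister() on disconnect, while the Pub/Sub consumer calls broadcast() for every message it receives.

    import queue
    import threading

    class Broadcaster:
        """Fan a consumed message out to every currently connected client."""

        def __init__(self):
            self._lock = threading.Lock()
            self._clients = {}          # client_id -> per-client outbound queue

        def register(self, client_id):
            q = queue.Queue()
            with self._lock:
                self._clients[client_id] = q
            return q

        def unregister(self, client_id):
            with self._lock:
                self._clients.pop(client_id, None)

        def broadcast(self, message):
            # Called by the message consumer for every message it receives.
            with self._lock:
                targets = list(self._clients.values())
            for q in targets:
                q.put(message)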
Everyone always gets the message, eventually
The concept is similar to the above, but the client doesn't implicitly unregister from the observer on disconnect. Instead, it registers and unregisters explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. The consumer's broadcast logic is then slightly different: it iterates through the list and says "if online, then send, else queue", sending the message to an ephemeral queue that belongs to the client. Your 'on connect' logic then sends all messages that are in the queue to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ, but you'll have to do a bit of managing around whether or not it's OK to delete a queue: for example, never delete the queue unless the client explicitly unsubscribes or has been inactive for too long. Fail to do that, and the whole inbox idea falls apart.
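And a sketch of that "inbox" variant (again purely illustrative and in-process, ignoring durability): clients subscribe and unsubscribe explicitly, a disconnect only marks them offline, and messages that arrive while they are offline are queued and flushed on reconnect.

    from collections import deque

    class InboxBroadcaster:
        """Observer variant where disconnected clients accumulate an inbox."""

        def __init__(self):
            self.online = {}      # client_id -> send callable for connected clients
            self.inbox = {}       # client_id -> pending messages (the "inbox")

        def subscribe(self, client_id):
            self.inbox.setdefault(client_id, deque())

        def unsubscribe(self, client_id):
            # Only an explicit unsubscribe drops the inbox.
            self.online.pop(client_id, None)
            self.inbox.pop(client_id, None)

        def connect(self, client_id, send):
            # Flush anything that arrived while the client was offline.
            pending = self.inbox[client_id]
            while pending:
                send(pending.popleft())
            self.online[client_id] = send

        def disconnect(self, client_id):
            # The client stays subscribed, it just goes offline.
            self.online.pop(client_id, None)

        def broadcast(self, message):
            for client_id, pending in self.inbox.items():
                send = self.online.get(client_id)
                if send:
                    send(message)
                else:
                    pending.append(message)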
The selected answer above is most similar to what I'm describing here, in that the subscription is the queue. If I did this, I'd probably implement it as an internal bus instead of an observer (since the observer would be unnecessary): you create a consumer on demand for each connecting client that literally just forwards the message, and that consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by Pub/Sub. If you find yourself in that position, you can remove that constraint by implementing the pattern above. It's basically the same pattern, but you shift the responsibility over to your infrastructure, where the only constraint is your own resources.
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.
(note: arrows in diagram are in the direction of dependency, not data flow)

Getting an SQS message state in boto

I have an SQS producer and lots of consumers. It would be very helpful to know whether the producer can tell if a particular message has been deleted by a consumer or not. Is there a way of doing this? I'm currently using boto 2.6.0.
As far as I know, SQS does not provide any mechanism to be notified when a message is deleted. So, I think if you want to know when messages are deleted, you will have to keep track of that separately by keeping a database of message ids and having consumers tell you the message id of any messages they have deleted.
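For example (a sketch using boto3 rather than the boto 2.6 API in the question, with a hypothetical DynamoDB tracking table and queue URL), each consumer records the ID of every message it deletes, and the producer queries that table to see what has been processed:

    import boto3

    sqs = boto3.client("sqs")
    tracking = boto3.resource("dynamodb").Table("processed-messages")  # hypothetical table

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"  # placeholder

    def handle(body: str) -> None:
        # Hypothetical processing of the message body.
        print(body)

    def consume_once() -> None:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
        for msg in resp.get("Messages", []):
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            # Tell the producer (via the shared table) that this message is gone.
            tracking.put_item(Item={"MessageId": msg["MessageId"]})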
As far as I know you can't track a message in a queue. Depending on your goal, you could try the following things:
Monitoring
Write the results of a job to a logfile, and maybe use something like Logstash with Kibana. If you get creative, you might even fire things straight into something like Elasticsearch or SimpleDB.
Callback
The receivers could fire some kind of "callback" to the processor, or to any other process, that updates the message's state in, for example, a database or a cache.
Keep in mind that this means that as you scale up your receivers, your processor has to scale up as well. Also keep an eye on your indexes and make sure your status-update "write" is fast.

Should WS-notification be used to just notify or should the data also be transmitted with the payload

Should WS-Notification be used just to notify, or should the data also be transmitted in the payload to save an extra call(back)?
Use Case:
A customer's record has changed. We need to notify other systems, so we send a notification.
Scenario 1.
Send the notification with the customer record changes included. This could be bad, since each listening system might take a different action and may or may not need the customer record.
Scenario 2.
Just send the notification. This means each listening system will have to "react" in some way; the responsibility is on the listening system.
There are two methods: pub/sub push and pub/sub pull.
Pub/sub push pushes out the full data.
Pub/sub pull sends just enough data for the target app to call back and request the full data. This gives better control over the information passed than the push method.
The push method is the easiest to implement.
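To make the difference concrete (field names are purely illustrative), a push-style notification carries the changed record itself, while a pull-style notification carries just enough for the consumer to call back for it:

    # Push style: the notification carries the full changed record.
    push_notification = {
        "event": "customer.updated",
        "customer": {"id": "12345", "name": "Jane Doe", "email": "jane@example.com"},
    }

    # Pull style: the notification only identifies what changed; interested
    # consumers call back (e.g. GET /customers/12345) for the full record.
    pull_notification = {
        "event": "customer.updated",
        "customerId": "12345",
        "href": "https://example.com/customers/12345",
    }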
Pub-sub kind of implies that the notification consumers are already interested in the topic in question by virtue of the fact that they have subscribed. However, as you say, they may not need to respond. So if you consider the notification to be a true event then the notifying system is saying, "here is a notification that my state has changed". If the notification consumer is interested it can use request-response to get that new state. This would be more flexible and lightweight.
Notifications are inherently more event-oriented, and therefore using them to push state should be considered carefully, particularly because with pub/sub you seldom know how many subscribers you have at run time; capacity planning can then be difficult, and peak load spikes are not uncommon.
So keep the notifications lightweight. Let the consumers decide whether to act on the event. You're on your way to a true EDA!