Consumer group load-balanced readers - azure-eventhub

I am doing a POC with Event Hubs. I have multiple instances of a consumer and expect only one instance to receive each event. I tried both EventHubClient and EventProcessorHost; is there any way I can make this possible? Kafka has similar support for load-balanced consumers based on consumer groups (https://kafka.apache.org/intro.html#intro_consumers). I always get an error saying:
New receiver with higher epoch of '4' is created hence current receiver with epoch '3' is getting disconnected.

The issue was that the host name I registered was the same for every instance. I have now changed it to a different host name per instance, using a random UUID as the host name when constructing the EventProcessorHost.
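For reference, the current-generation SDKs achieve this Kafka-style balancing without hand-rolled host names: every processor instance coordinates partition ownership through a shared checkpoint store, and partitions are spread across all instances in the same consumer group. A minimal sketch with the Python SDK (azure-eventhub v5 plus the azure-eventhub-checkpointstoreblob package); all connection strings and names are placeholders:

from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

# A shared blob container is how the instances coordinate partition ownership.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<storage-connection-string>", container_name="checkpoints")

client = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>",
    consumer_group="$Default",
    eventhub_name="<hub-name>",
    checkpoint_store=checkpoint_store)

def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())
    partition_context.update_checkpoint(event)  # record progress for takeover

# Run the same code on every instance; the service and checkpoint store
# balance partitions across them, so each event goes to one instance only.
with client:
    client.receive(on_event=on_event, starting_position="-1")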

Related

ReadEventsAsync got EventHubsException(ConsumerDisconnected) intermittently

I am using the EventHubConsumerClient.ReadEventsAsync method to read events from an Event Hub. It works perfectly when I use the default event hub. However, when I route to a new event hub I get EventHubsException(ConsumerDisconnected) from time to time. The documentation says this happens when "A client was forcefully disconnected from an Event Hub instance. This typically occurs when another consumer with higher OwnerLevel asserts ownership over the partition and consumer group." I get this exception almost every time; only occasionally does it work. Does anyone know how to resolve this, or is there a better way to read messages from an Event Hub? I don't want to use EventProcessorClient, since it requires a BlobContainerClient.
For the code, I followed the sample:
await using var consumerClient = new EventHubConsumerClient(
    EventHubConsumerClient.DefaultConsumerGroupName,
    eventHubConnectionString,
    eventHubName);

await foreach (PartitionEvent partitionEvent in consumerClient.ReadEventsAsync(cancelToken))
{
    ...
}
The error that you're seeing is very specific to a single scenario: another client has opened an AMQP link to one of the partitions you're reading from and has requested that the Event Hubs service give it exclusive access. This results in the Event Hubs service terminating your link with an AMQP error code of Stolen, which the Event Hubs SDK translates into the form that you're seeing. (source)
These requests for exclusive access are enforced on a consumer group level. In your snippet, you're using the default consumer group, which is apparently also used by other consumers. As a best practice, I'd recommend that you create a unique consumer group for each application that is reading from the Event Hub - unless you specifically want them to interact.
In your case, your client is not requesting exclusive access, so anyone that is will take precedence. If you were to create a new consumer group and use that to configure your client, I would expect your disconnect errors to stop.
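For illustration, the same change in the Python SDK (in the C# snippet above it is the first constructor argument, replacing EventHubConsumerClient.DefaultConsumerGroupName). The group name here is a placeholder; consumer groups have to be created on the Event Hub first (portal, ARM, or CLI):

from azure.eventhub import EventHubConsumerClient

# Read with a consumer group dedicated to this application instead of
# sharing "$Default" with unknown other readers.
client = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>",
    consumer_group="my-app-group",  # hypothetical name, must already exist
    eventhub_name="<hub-name>")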

django channels redis: multiple consumers receive message at different times - ensured?

Does this need to be implemented or is it in Channels already?
If I have a channel group with multiple consumers subscribed to it and one consumer is sent the message, is the message lost to the rest of the consumers, or does it persist until all consumers have seen it?
Or does the message persist only until its expiry time, regardless of whether consumers have seen it?
The Group object manages delivery to all consumers (where possible) and message expiry. But note that delivery is not guaranteed.
From the documentation:
Channels implements this abstraction as a core concept called Groups ...
[Groups] also automatically manage expiry of the group members - when the channel starts having messages expire on it due to non-consumption, we go in and remove it from all the groups it’s in as well ...
One thing channels do not do, however, is guarantee delivery. If you need certainty that tasks will complete, use a system designed for this with retries and persistence (e.g. Celery)
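To make those semantics concrete, here is a minimal sketch with the current Channels API (the NotifyConsumer class and the "notify" group name are made up for the example). group_send is fire-and-forget: members still in the group each get a copy, anyone already expired or discarded simply misses it, and nothing is retried.

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer
from channels.layers import get_channel_layer

class NotifyConsumer(WebsocketConsumer):
    def connect(self):
        # Join the group; membership lapses if this channel stops consuming.
        async_to_sync(self.channel_layer.group_add)("notify", self.channel_name)
        self.accept()

    def disconnect(self, code):
        async_to_sync(self.channel_layer.group_discard)("notify", self.channel_name)

    def notify_message(self, event):
        # Handles events sent to the group with {"type": "notify.message"}.
        self.send(text_data=event["text"])

# Anywhere in the project: broadcast to every *current* group member.
async_to_sync(get_channel_layer().group_send)(
    "notify", {"type": "notify.message", "text": "hello"})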

Akka design: How to add/remove routee from cluster aware router dynamically

I have the following use case and I am not sure if the Akka toolkit provides this out of the box:
I have a number of nodes (instances/machines) that can each run a finite number of long-running tasks in the background and cannot accept more work while at max capacity.
Each instance can only process 50 tasks.
All instances are behind a load balancer.
Each task can respond to messages from the client who initiated the task, since the client sends the messages via the load balancer the instances need to route it to the correct instance that handles the task.
Initially I tried cluster sharding, but there doesn't seem to be a way to cap the maximum number of shard regions/actors per node (= the number of tasks).
Then I tried a cluster-aware router, which acts as a guard for accepting or rejecting work. This seems to work reasonably well; one problem is that once a routee reaches capacity I need to remove it from the router and add it back once it has capacity again.
Is there something out of the box that supports this use case, or should I carry on with the routing option, and if so, how can I achieve this?
I'll update the description if you have further questions or something is unclear.
Your scenario sounds like a good fit for the work-pulling pattern. The gist of this pattern is:
A master actor coordinates units of work among a number of worker actors.
Workers register themselves to the master, meaning that workers can be added or removed dynamically.
When the master receives work to be done, the master notifies the workers that work is available. Workers pull units of work when they're ready, do what needs to be done with their respective units of work, then ask the master for more work when they're finished.
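Akka implementations are linked below. As a language-agnostic illustration of the pull shape (deliberately not Akka code), here is the same idea sketched in Python with a bounded queue: the cap plays the role of your per-node task limit, and workers take work only when they are free.

import asyncio

async def worker(name: str, queue: asyncio.Queue) -> None:
    # A worker pulls the next unit of work only when it is idle, so it is
    # never pushed more than it can handle.
    while True:
        job = await queue.get()
        await asyncio.sleep(job)  # stand-in for the long-running task
        print(f"{name} finished job {job}")
        queue.task_done()

async def main() -> None:
    queue = asyncio.Queue(maxsize=50)  # cap = max pending tasks per node
    workers = [asyncio.create_task(worker(f"w{i}", queue)) for i in range(3)]
    for job in (1, 2, 1):
        await queue.put(job)  # blocks (backpressure) once the cap is hit
    await queue.join()  # wait until every queued job is done
    for w in workers:
        w.cancel()

asyncio.run(main())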
To learn more about this pattern, read the following (the first two links are listed in the Akka documentation):
The original post (by Derek Wyatt): http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
A follow-on post (by Michael Pollmeier): http://www.michaelpollmeier.com/akka-work-pulling-pattern
An application of the pattern in a clustered environment with a cluster-aware router (by Ryan Tanner): https://www.conspire.com/blog/2013/10/akka-at-conspire-part-5-the-importance-of/

Difference between actor pools and groups in Akka

I'm just starting with Akka and I'm trying to understand the difference between actor pools and groups, and when to use which. The docs briefly say that a group's routees are not created by the router, so does that mean they don't have a master?
In the situation below, can you route messages directly from one worker group (or pool?) to another without sending them via the master?
About the difference:
Sometimes, rather than having the router actor create its routees, it is desirable to create routees separately and provide them to the router for its use. You can do this by passing in the paths of the routees to the router's configuration. Messages will be sent with ActorSelection to these paths.
So the difference is that in the case of a "pool", your workers are created (and supervised) automatically by the pool. In the case of a "group", you have to first create the actors yourself and then pass a list of their paths (which will be used in an ActorSelection) into the master:
import akka.routing.RoundRobinGroup

val router: ActorRef = // the group's master, but not the workers' supervisor
  context.actorOf(RoundRobinGroup(List("/user/workers/w1", "/user/workers/w2", "/user/workers/w3")).props(), "router4")
So both have a master actor (the router), but in the second case the workers are created manually by another actor (or actors), which therefore supervises them by default (if they are not top-level, of course) and receives their lifecycle messages. As a result you have three kinds of actors here: the master, the supervisor(s), and the workers.
About "direct" routing: every group/pool has its own synthetic master actor, so when you send a message to the group, it always goes to the master first. But if you know the address of a group member (like "/user/workers/w1" in the example above), nothing stops you from sending a message directly to that worker.

Life of redis connection/pipeline?

I am creating a Redis pipeline as below in Python:
rPipe = redis.Redis(...).pipeline()
The variable rPipe is defined in the __init__ of a class.
The functions in the class execute set and get commands via rPipe when called by the user:
rPipe.set(...)
rPipe.execute()
But as I understand it, Redis connections are closed by the Redis server automatically, so how long will my rPipe be valid once I have created the object?
Under normal conditions (e.g. unless you're hitting the limit on max number of clients or max buffer size, or if your client sets a specific timeout) Redis doesn't close client connections automatically.
Pipelines in Redis are a simple way to group commands together, send them to the server all at once, and then receive all the replies in a single step.
Assuming you're using the redis-py library (though the same reasoning should hold for any well-designed client), the commands are packed and sent to Redis only when you call execute() on a pipeline object. The state of the pipeline object is then reset, and it can be safely reused by the client.
As a side note, if using redis-py, consider that pipelined commands are wrapped in a MULTI/EXEC transaction by default, which is not always desirable.
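A short sketch of that lifecycle with redis-py (the keys are arbitrary): commands are buffered client-side, execute() ships them in one round trip and resets the pipeline, and the same object can then be reused indefinitely.

import redis

r = redis.Redis(host="localhost", port=6379)
pipe = r.pipeline()  # transaction=True by default: wrapped in MULTI/EXEC

pipe.set("k1", "v1")  # buffered locally, nothing sent yet
pipe.get("k1")
print(pipe.execute())  # one round trip; returns [True, b'v1']; pipe resets

pipe.set("k2", "v2")  # the same pipeline object is safe to reuse
pipe.execute()

pipe_plain = r.pipeline(transaction=False)  # pipelining without MULTI/EXEC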