Behavior of Akka pub-sub when no subscriber is registered

We are planning to use Akka pub-sub in our project. I have the following two questions regarding its behavior.
What happens if the publisher starts sending messages before any actor has subscribed? Will messages published before any subscriber exists be discarded silently?
What happens if the subscriber actor dies, leaving no subscribers at all? Will the messages sent by the publisher be accumulated somewhere, or will they be discarded by the pub-sub system?

Message routing is decided on the spot: no subscribers, no sending. Buffering messages arbitrarily within the toolkit would only lead to surprising out-of-memory failures. If you want buffering, you will have to do it explicitly.
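To make that concrete, here is a minimal sketch, assuming Distributed Pub/Sub from akka-cluster-tools (the question could equally mean the EventStream, where the same rule applies). The topic name and actor class are illustrative: subscribe before anything is published, because anything published earlier is simply gone.

    import akka.actor.Actor
    import akka.cluster.pubsub.{DistributedPubSub, DistributedPubSubMediator}

    class EventListener extends Actor {
      // Subscribe as early as possible; anything published to "events" before this
      // subscription is registered is dropped by the mediator, not buffered.
      private val mediator = DistributedPubSub(context.system).mediator
      mediator ! DistributedPubSubMediator.Subscribe("events", self)

      def receive = {
        case DistributedPubSubMediator.SubscribeAck(_) =>
          () // from this point on, messages published to "events" will reach this actor
        case event =>
          println(s"got event: $event") // handle the event
      }
    }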

If no one is subscribed, the messages simply aren't delivered to anyone. But if you set things up right, that won't be an issue.
For the second question, it depends on what you mean by dying. Actor death means the actor is gone for good, which is different from an actor failing and being restarted. If the actor is restarted via fault handling in its supervisor, it keeps its mailbox, so nothing is lost. Only if the actor is completely stopped (not restarted) does it lose its mailbox.
So, with a good fault-handling scheme, you can preserve your messages across actor failure for most use cases. Just keep your listeners higher up in the actor hierarchy and push risky things that are likely to fail, such as I/O, down to the bottom. Look into the error kernel pattern to see what I mean; that way, failure usually won't climb high enough to cost you mailboxes.
Since it's just a restart, all messages the actor is subscribed for will still end up in its mailbox, waiting for it.
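A rough sketch of that error-kernel idea (the class and exception names are illustrative, not from the question): keep the subscriber near the top of the hierarchy and delegate failure-prone I/O to a child, which gets restarted with its mailbox intact when it throws.

    import java.io.IOException
    import akka.actor.{Actor, OneForOneStrategy, Props}
    import akka.actor.SupervisorStrategy.Restart

    class RiskyIoWorker extends Actor {
      def receive = {
        case job => // do the I/O that might throw IOException
      }
    }

    class Subscriber extends Actor {
      // If the child fails with an IOException it is restarted, not stopped,
      // so the messages sitting in its mailbox are preserved.
      override val supervisorStrategy = OneForOneStrategy() {
        case _: IOException => Restart
      }

      private val worker = context.actorOf(Props[RiskyIoWorker], "worker")

      def receive = {
        case msg => worker forward msg
      }
    }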


How to stream events with GCP platform?

I am looking into building a simple solution where producer services push events to a message queue and then have a streaming service make those available through gRPC streaming API.
Cloud Pub/Sub seems well suited for the job; however, scaling the streaming service means that each copy of that service would need to create its own subscription and delete it before scaling down, which seems unnecessarily complicated and not what the platform was intended for.
On the other hand Kafka seems to work well for something like this but I'd like to avoid having to manage the underlying platform itself and instead leverage the cloud infrastructure.
I should also mention that the reason for having a streaming API is to allow streaming to a frontend (which may not have access to the underlying infrastructure).
Is there a better way to go about doing something like this with the GCP platform without going the route of deploying and managing my own infrastructure?
If you essentially want ephemeral subscriptions, then there are a few things you can set on the Subscription object when you create a subscription:
Set the expiration_policy to a smaller duration. When a subscriber is not receiving messages for that time period, the subscription will be deleted. The tradeoff is that if your subscriber is down due to a transient issue that lasts longer than this period, then the subscription will be deleted. By default, the expiration is 31 days. You can set this as low as 1 day. For pull subscribers, the subscribers simply need to stop issuing requests to Cloud Pub/Sub for the timer on their expiration to start. For push subscriptions, the timer starts based on when no messages are successfully delivered to the endpoint. Therefore, if no messages are published or if the endpoint is returning an error for all pushed messages, the timer is in effect.
Reduce the value of message_retention_duration. This is the time period for which messages are kept in the event a subscriber is not receiving messages and acking them. By default, this is 7 days. You can set it as low as 10 minutes. The tradeoff is that if your subscriber disconnects or gets behind in processing messages by more than this duration, messages older than that will be deleted and the subscriber will not see them.
Subscribers that cleanly shut down could probably just call DeleteSubscription themselves so that the subscription goes away immediately, but for ones that shut down unexpectedly, setting these two properties will minimize the time for which the subscription continues to exist and the number of messages (that will never get delivered) that will be retained.
Keep in mind that Cloud Pub/Sub quotas limit one to 10,000 subscriptions per topic and per project. Therefore, if a lot of subscriptions are created and either active or not cleaned up (manually, or automatically after expiration_policy's ttl has passed), then new subscriptions may not be able to be created.
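As a rough illustration (written in Scala against the Java client library to match the other snippets here; the project, topic, and subscription names are made up, and exact builder methods may differ slightly between client versions), creating a short-lived subscription with both properties tightened might look like:

    import com.google.cloud.pubsub.v1.SubscriptionAdminClient
    import com.google.protobuf.Duration
    import com.google.pubsub.v1.{ExpirationPolicy, Subscription, SubscriptionName, TopicName}

    val client = SubscriptionAdminClient.create()
    val subscription = Subscription.newBuilder()
      .setName(SubscriptionName.of("my-project", "streamer-instance-1").toString)
      .setTopic(TopicName.of("my-project", "events").toString)
      // delete the subscription after 1 day without an active subscriber
      .setExpirationPolicy(ExpirationPolicy.newBuilder()
        .setTtl(Duration.newBuilder().setSeconds(24L * 60 * 60)))
      // keep unacknowledged messages for at most 10 minutes
      .setMessageRetentionDuration(Duration.newBuilder().setSeconds(10L * 60))
      .build()
    client.createSubscription(subscription)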
Honestly, I think your original idea was better than ephemeral subscriptions. The ephemeral approach works, but it feels unnatural. The right choice depends on your requirements: do clients only need to receive messages while they're connected, or do they all need to get every message?
Only While Connected
Your original idea was better imo. What I probably would have done is to create a gRPC stream service that clients could connect to. The implementation is essentially an observer pattern. The consumer will receive a message and then iterate through the subscribers to do a "Send" to all of them. From there, any time a client connects to the service, it just registers itself with that observer collection and unregisters when it disconnects. Horizontal scaling is passive since clients are sticky to whatever instance they've connected to.
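A bare-bones sketch of that observer collection (names and types are illustrative; in practice the registry would hold gRPC stream observers rather than a toy trait):

    import scala.collection.concurrent.TrieMap

    trait ClientStream { def send(msg: String): Unit } // stands in for a gRPC stream observer

    object Broadcaster {
      private val connected = TrieMap.empty[String, ClientStream]

      // called when a client opens / closes its gRPC stream
      def register(clientId: String, stream: ClientStream): Unit = connected.put(clientId, stream)
      def unregister(clientId: String): Unit = connected.remove(clientId)

      // called by the Pub/Sub consumer for every message it pulls
      def onMessage(msg: String): Unit = connected.values.foreach(_.send(msg))
    }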
Everyone always gets the message, eventually
The concept is similar to the above, but the client doesn't implicitly unregister from the observer on disconnect. Instead, it registers and unregisters explicitly (through a method/command designed to do so). Modify the 'on disconnected' logic to tell the observer list that the client has gone offline. The consumer's broadcast logic is then slightly different: it iterates through the list and says "if online, then send, else queue", sending the message to an ephemeral queue that belongs to the client. Your 'on connect' logic then sends all queued messages to the client before informing the consumer that it's back online. Basically an inbox. Setting up ephemeral, self-deleting queues is really easy in most products like RabbitMQ. You will have to manage whether or not it's OK to delete a queue, though; for example, never delete the queue unless the client explicitly unsubscribes or has been inactive for too long. Fail to do that, and the whole inbox idea falls apart.
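A sketch of that variant (again with illustrative names): the broadcast branches on whether the client is currently online, and the reconnect path drains the client's inbox before resuming live delivery.

    import scala.collection.mutable

    trait ClientStream { def send(msg: String): Unit }

    object InboxBroadcaster {
      private val online  = mutable.Map.empty[String, ClientStream]
      private val inboxes = mutable.Map.empty[String, mutable.Queue[String]]

      // explicit subscription: creates the client's inbox
      def subscribe(clientId: String): Unit = synchronized {
        inboxes.getOrElseUpdate(clientId, mutable.Queue.empty)
      }

      def onMessage(msg: String): Unit = synchronized {
        inboxes.foreach { case (clientId, inbox) =>
          online.get(clientId) match {
            case Some(stream) => stream.send(msg)   // online: deliver immediately
            case None         => inbox.enqueue(msg) // offline: park in the inbox
          }
        }
      }

      // on reconnect, drain the inbox first, then resume live delivery
      def onConnect(clientId: String, stream: ClientStream): Unit = synchronized {
        inboxes.get(clientId).foreach(inbox => while (inbox.nonEmpty) stream.send(inbox.dequeue()))
        online(clientId) = stream
      }

      def onDisconnect(clientId: String): Unit = synchronized { online.remove(clientId) }
    }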
The selected answer above is most similar to what I'm describing here, in that the subscription is the queue. If I did this, I'd probably implement it as an internal bus instead of an observer (since that would be unnecessary): you create a consumer on demand for a connecting client that literally just forwards the message, and the consumer subscribes and unsubscribes based on whether or not the client is connected. As Kamal noted, you'll run into problems if your scale exceeds the maximum number of subscriptions allowed by Pub/Sub. If you find yourself in that position, you can remove that constraint by implementing the pattern above; it's basically the same pattern, but the responsibility shifts to your own infrastructure, where the only constraint is your own resources.
gRPC makes this mechanism pretty easy. Alternatively, for web, if you're on a Microsoft stack, then SignalR makes this pretty easy too. Clients connect to the hub, and you can publish to all connected clients. The consumer pattern here remains mostly the same, but you don't have to implement the observer pattern by hand.
(note: arrows in diagram are in the direction of dependency, not data flow)

Does a new actor instance get created when there are too many messages?

I recently learned about Akka, but there is one idea I can't grasp.
My question is: if there are too many messages in the queue, will a new actor be created?
In many frameworks, when an HTTP request arrives and the framework finds that the current "worker" is busy, it creates another "worker" to process the new message in another thread.
But it seems Akka doesn't work this way; there is only one actor instance.
So I think a busy actor will block its queue, which will hurt throughput and performance. Am I correct?
Each actor stores its messages in a mailbox.
http://doc.akka.io/docs/akka/current/scala/mailboxes.html
The default mailbox is unbounded and non-blocking. If your actor cannot process messages quickly enough, its mailbox balloons in size and consumes increasing amounts of RAM. You can configure Akka to use a bounded, blocking mailbox which will block the sender when over capacity.
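For reference, a bounded mailbox is configured rather than coded. A minimal sketch (the capacity and timeout values are arbitrary, and Worker and system are placeholders) is a snippet in application.conf plus one line to attach it:

    # application.conf
    bounded-mailbox {
      mailbox-type = "akka.dispatch.BoundedMailbox"
      mailbox-capacity = 1000
      mailbox-push-timeout-time = 10s
    }

    // in code: give an actor the bounded mailbox (system and Worker assumed to exist)
    val worker = system.actorOf(Props[Worker].withMailbox("bounded-mailbox"), "worker")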
If you would like to dynamically manage a pool of actors, look into Routing strategies.
http://doc.akka.io/docs/akka/2.4.1/scala/routing.html
You can create a Router Actor that receives messages and passes them to routee actors. The Router also manages the routee pool and can dynamically generate routees as needed.
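A small sketch of such a pool (Worker is a placeholder): with a resizer attached, the router adds or removes routees depending on how busy they are, which is the closest Akka comes to "spin up another worker when the current ones are busy".

    import akka.actor.{Actor, ActorSystem, Props}
    import akka.routing.{DefaultResizer, RoundRobinPool}

    class Worker extends Actor {
      def receive = { case msg => println(s"processing $msg") }
    }

    val system  = ActorSystem("demo")
    val resizer = DefaultResizer(lowerBound = 2, upperBound = 15)

    // Messages sent to workerRouter are distributed over 2..15 routees; the resizer
    // grows and shrinks the pool based on how busy the routees are.
    val workerRouter = system.actorOf(
      RoundRobinPool(nrOfInstances = 5, resizer = Some(resizer)).props(Props[Worker]),
      "workerRouter")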
Also, if using Future and callback asynchronous execution, your actors will not block on http requests.
TL;DR:
If you send messages faster than your actor can process them, your application will eventually run out of memory or start dropping messages, depending on the mailbox you use.
Longer answer:
As I understand it, every Akka actor has a queue (its mailbox) associated with it, which holds all the messages it receives.
If you send messages to this actor faster than it can process them, the queue will eventually overflow, since queued messages are kept in RAM.
Akka will not spawn another actor on the fly, because messages in the queue are processed in order, and that ordering would be broken if more than one consumer existed.
I would suggest you take a look at Akka Streams, this is a higher level API built on top of actors, and guards you against this kind of thing by providing backpressure throughout your system. This means that if the actor you're sending messages to is slower than whoever is producing the messages, the consumer will ask the producer to slow down, and will not overflow your Actor's queue.
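A minimal backpressure sketch (assuming Akka 2.6+, where the implicit ActorSystem provides the materializer): the slow sink only signals demand as it finishes each element, so the source can never flood a mailbox.

    import akka.actor.ActorSystem
    import akka.stream.scaladsl.{Sink, Source}

    implicit val system: ActorSystem = ActorSystem("streams-demo")

    Source(1 to 1000000)
      .map(_ * 2)
      .runWith(Sink.foreach { n =>
        Thread.sleep(1) // simulate a slow consumer; upstream is throttled automatically
        println(n)
      })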

Use case for Akka PoisonPill

According to the Akka docs for PoisonPill:
You can also send an actor the akka.actor.PoisonPill message, which will stop the actor when the message is processed. PoisonPill is enqueued as ordinary messages and will be handled after messages that were already queued in the mailbox.
Although the usefulness/utility of such a feature may be obvious to an Akka Guru, to a newcomer, this sounds completely useless/reckless/dangerous.
So I ask: What's the point of this message and when would one ever use it, for any reason?!?
We use a pattern called disposable actors:
A new temporary actor is created for each application request.
This actor may create some other actors to do some work related to the request.
Processed result is sent back to client.
All temporary actors related to this request are killed. That's the place where PoisonPill is used.
Creating an actor implies very low overhead (about 300 bytes of RAM), so it's quite a good practice.
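A stripped-down sketch of that disposable-actor pattern (all names are illustrative): the per-request actor replies to the client and then poison-pills itself, so it stops only after whatever is already in its mailbox has been processed, and its temporary children stop with it.

    import akka.actor.{Actor, ActorRef, PoisonPill, Props}

    class SubTask extends Actor {
      def receive = {
        case work => sender() ! s"partial result for $work"
      }
    }

    // One instance is created per application request and thrown away afterwards.
    class RequestHandler(client: ActorRef) extends Actor {
      // temporary helpers created for this one request
      private val helpers = Seq(context.actorOf(Props[SubTask]), context.actorOf(Props[SubTask]))

      def receive = {
        case request: String =>
          helpers.foreach(_ ! request)
          // ... in reality, collect the helpers' replies asynchronously, then:
          client ! s"done: $request"
          self ! PoisonPill // handled after anything already queued; children stop with the parent
      }
    }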

Actor model with Akka.NET: how to prevent sending messages to dead actors

I am using Akka.NET to implement an actor system in which some actors are created on demand and are deleted after a configurable idle period (I use Akka's "ReceiveTimeout" mechanism for this). Each of these actors is identified by a key, and there should not exist two actors with the same key.
These actors are currently created and deleted by a common supervisor. The supervisor can be asked to return a reference to the actor matching a given key, either by returning an existing one or creating a new one, if an actor with this key doesn't exist yet. When an actor receives the "ReceiveTimeout" message, it notifies the supervisor who in turn kills it with a "PoisonPill".
I have an issue when sending a message to one of these actors right after it has been deleted. I noticed that sending a message to a dead actor doesn't generate an exception. Worse, when sending an "Ask" message, the sender remains blocked, waiting indefinitely (or until a timeout) for a response that it will never receive.
I first thought about Akka's "DeathWatch" mechanism to monitor an actor's lifecycle. But, if I'm not mistaken, the "Terminated" message sent by the dying actor will be received by the monitoring actor asynchronously, just like any other message, so the problem may still occur between the target actor's death and the reception of its "Terminated" message.
To solve this problem, I made it so that anyone asking the supervisor for a reference to such an actor has to send a "close session" message to the supervisor to release the actor when it is no longer needed (this is done transparently by a disposable "ActorSession" object). As long as there are any open sessions on an actor, the supervisor will not delete it.
I suppose that this situation is quite common and am therefore wondering if there isn't a simpler pattern to follow to address this kind of problem. Any suggestion would be appreciated.
I have an issue when sending a message to one of these actors right after it has been deleted. I noticed that sending a message to a dead actor doesn't generate an exception.
This is by design. You will never receive an exception upon attempting to send a message; it will simply be routed to DeadLetters and logged. There are a lot of reasons for this that I won't get into here, but the bottom line is that this is intended behavior.
DeathWatch is the right tool for this job, but as you point out - you might receive a Terminated message after you already sent a message to that actor.
A simpler pattern than tracking open / closed sessions is to simply use acknowledgement / reply messages from the recipient using Ask + Wait + a hard timeout. The downside of course is that if your recipient actor has a lot of long-running operations then you might block for a long period of time inside the sender.
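A sketch of that acknowledgement approach, written here in JVM Akka/Scala syntax to match the other snippets (Akka.NET's Ask with a timeout is the direct equivalent): the caller gets either the reply or a failure within a bounded time, instead of waiting forever on a dead actor.

    import akka.actor.ActorRef
    import akka.pattern.ask
    import akka.util.Timeout
    import scala.concurrent.Await
    import scala.concurrent.duration._

    implicit val timeout: Timeout = Timeout(3.seconds)

    def askOrFail(target: ActorRef, msg: Any): Any =
      // Throws if the recipient never replies (for example because it has been stopped),
      // so the caller blocks for at most 3 seconds.
      Await.result(target ? msg, timeout.duration)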
The other option you can go with is to redesign your recipient actor to act as a state machine and have a soft-terminated or terminating state that it uses to drain connections / references with potential senders. That way the original actor can still reply and accept messages, but let callers know that it's no longer available to do work.
I solved this problem with entity actors created through Akka's Cluster Sharding mechanism:
If the state of the entities are persistent you may stop entities that are not used to reduce memory consumption. This is done by the application specific implementation of the entity actors for example by defining receive timeout (context.setReceiveTimeout). If a message is already enqueued to the entity when it stops itself the enqueued message in the mailbox will be dropped. To support graceful passivation without losing such messages the entity actor can send ShardRegion.Passivate to its parent Shard. The specified wrapped message in Passivate will be sent back to the entity, which is then supposed to stop itself. Incoming messages will be buffered by the Shard between reception of Passivate and termination of the entity. Such buffered messages are thereafter delivered to a new incarnation of the entity.
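In code, that passivation handshake looks roughly like this inside the entity actor (the timeout value is arbitrary):

    import scala.concurrent.duration._
    import akka.actor.{Actor, PoisonPill, ReceiveTimeout}
    import akka.cluster.sharding.ShardRegion

    class Entity extends Actor {
      context.setReceiveTimeout(2.minutes)

      def receive = {
        case ReceiveTimeout =>
          // Ask the parent Shard to passivate us; it buffers messages that arrive
          // until we stop, then redelivers them to the next incarnation.
          context.parent ! ShardRegion.Passivate(stopMessage = PoisonPill)
        case msg =>
          // handle entity messages
      }
    }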

Why would an Akka actor send a reply to itself?

I have an integration test which sends a lot of messages to a remote Akka (2.0.5) actor. After each test run, the remote actor tree is restarted. After 43 successful test runs, according to the debug-level log messages, the remote actor started to send replies to itself, which obviously caused the test to fail.
Why might this happen?
There is only one place in the codebase where I send this type of message, and it clearly says
sender ! generateTheMessage()
I figured out why it's happening in my particular case. There are actually two actors involved here:
A -> B
A initially queues up messages until the system is initialised. Then it sends the queued up messages on to B, and forwards all further messages to B as soon as they arrive.
The problem is that when it forwards the queued up messages, the original sender information has been lost and so A becomes the sender. So the reply from B goes back to A and is forwarded back to B again. I didn't initially realise the latter forwarding was happening, because I hadn't enabled logging for the forwarding.
So it's a race condition. If the system comes up quickly enough everything is OK, but if not, some initial replies will be misdirected.
How I fixed this was to pair up the sender with each queued message, and resend each queued message using the Java API, which allows specifying the sender explicitly.
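A sketch of that fix (SystemInitialised and the class names are illustrative): queue each message together with its original sender, then replay it with an explicit sender so that B's replies go back to the real client rather than to A.

    import akka.actor.{Actor, ActorRef}

    case object SystemInitialised

    class BufferingFrontend(target: ActorRef) extends Actor {
      private var initialised = false
      private var queued = Vector.empty[(Any, ActorRef)]

      def receive = {
        case SystemInitialised =>
          initialised = true
          // Replay with the original sender so replies are not misdirected to this actor.
          queued.foreach { case (msg, origin) => target.tell(msg, origin) }
          queued = Vector.empty
        case msg if initialised =>
          target forward msg // forward also preserves the original sender
        case msg =>
          queued :+= ((msg, sender()))
      }
    }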