How to use Akka as a replication mechanism

I'm new to Akka and intend to use it in my new project as a data replication mechanism.
In this scenario, there is a master server and a replica server. The replica should contain the same data as the master. Each time a data change occurs in the master, it sends an update message to the replica server. Here the master server is the Sender, and the replica server is the Receiver.
But after digging the docs I'm still not sure how to satisfy the following use cases:
When the receiver crashes, the sender should pile up messages to send; no messages should be lost. It should be able to reconnect to the receiver later and continue from the last successful message.
When the sender crashes, it should restart, and no messages sent around the restart should be lost.
Messages are processed in the same order they were sent.
So my question is, how do I configure Akka to create a sender and a receiver that can do this?
I'm not sure whether an actor with a DurableMessageBox could solve this. If it could, how can I simulate the above situations for testing?
Update:
After reading the docs Victor pointed to, I now understand that what I wanted was the once-and-only-once pattern, which is extremely costly.
In the akka docs it says
Actual transports may provide stronger semantics, but at-most-once is the semantics you should expect. The alternatives would be once-and-only-once, which is extremely costly, or at-least-once which essentially requires idempotency of message processing, which is a user-level concern.
So in order to achieve guaranteed delivery, I may need to turn to some other MQ solution (for example Kafka), or try to implement once-and-only-once on top of DurableMessageBox and see whether the complexity can be reduced for my specific use case.
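To make the trade-off concrete, here is a minimal, language-neutral sketch (written in Python; all names are hypothetical, not an Akka API) of the usual alternative: at-least-once delivery made safe by an idempotent, order-checking receiver. The sender buffers and redelivers until acknowledged, and the receiver deduplicates by sequence number, so redeliveries after a crash are harmless and ordering is preserved.

```python
class Sender:
    """Retries every message until the receiver acknowledges it (at-least-once)."""
    def __init__(self, receiver):
        self.receiver = receiver
        self.outbox = []          # durable buffer: piles up while receiver is down
        self.next_seq = 0

    def send(self, payload):
        self.outbox.append((self.next_seq, payload))
        self.next_seq += 1

    def flush(self):
        # Redeliver everything unacknowledged, in order.
        still_pending = []
        for seq, payload in self.outbox:
            if not self.receiver.deliver(seq, payload):   # no ack -> keep it
                still_pending.append((seq, payload))
        self.outbox = still_pending

class Receiver:
    """Idempotent: applies each sequence number at most once, in order."""
    def __init__(self):
        self.up = True
        self.applied = []         # the replicated state
        self.expected_seq = 0

    def deliver(self, seq, payload):
        if not self.up:
            return False          # crashed: no ack, so the sender keeps the message
        if seq < self.expected_seq:
            return True           # duplicate redelivery: ack it, but don't re-apply
        if seq > self.expected_seq:
            return False          # out of order: refuse, preserving ordering
        self.applied.append(payload)
        self.expected_seq += 1
        return True

r = Receiver()
s = Sender(r)
s.send("update-1"); s.send("update-2")
r.up = False
s.flush()                         # receiver down: messages pile up in the outbox
r.up = True
s.flush(); s.flush()              # redelivery after reconnect; duplicates ignored
print(r.applied)                  # -> ['update-1', 'update-2']
```

In a real deployment the outbox and `expected_seq` would have to live on durable storage so they survive sender and receiver restarts; that persistence is exactly what a message queue like Kafka gives you out of the box.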

You'd need to write your own remoting that utilizes the durable subscriber pattern, as Akka's message-send guarantees are less strict than what you are going for: http://doc.akka.io/docs/akka/2.0/general/message-send-semantics.html
Cheers,
√

Related

SelectChildName messages from RemoteDeadLetterActorRef

I have two actor systems that communicate via akka remoting.
When I take a look into the JVM heap I am seeing (too) many instances of akka.dispatch.Envelope containing SelectChildName messages from akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.
The retained heap of these messages is pretty large and causes memory problems.
What is the purpose of these SelectChildName messages? Is there a way to avoid them?
FYI This seems to relate with Disassociation errors that occur between the two actor systems.
Thanks,
Michail
SelectChildName messages are used by Akka Remoting to resolve a remote actor. If you see a lot of them, there is a chance you are interacting directly with an ActorSelection, instead of an ActorRef.
Every time you send a message to an ActorSelection, for example (these are taken from the docs)
val selection = context.actorSelection("akka.tcp://actorSystemName#10.0.0.1:2552/user/actorName")
selection ! "Pretty awesome feature"
the - possibly remote - actor is resolved, and that involves an exchange of SelectChildName messages by the underlying Akka infrastructure.
If that's the case, try to use ActorRefs directly. You can obtain one from an ActorSelection by using the resolveOne method.
Citing the docs again:
It is always preferable to communicate with other Actors using their ActorRef instead of relying upon ActorSelection. Exceptions are:
- sending messages using the At-Least-Once Delivery facility
- initiating first contact with a remote system

Akka.net load balancing and span out processing

I am looking to build a system that can process a stream of requests, each needing a long processing time, say 5 minutes. My goal is to speed up request processing with a minimal resource footprint, even when requests arrive in bursts.
I could use something like a service bus to queue the requests and have multiple processes (i.e. actors in Akka) that subscribe for a message and start processing. I could also have a watchdog that looks at the queue length in the service bus and creates more actors/actor systems or stops a few.
If I want to do the same in an actor system like Akka.NET, how can this be done? Say something like this:
I may want to spin up/stop new Remote Actor systems based on my request queue length
Send the message to any one of the available actors that can start processing, without the sender having to check who has the bandwidth to process it.
Messages should not be lost, and if an actor fails, the message should be passed to the next available actor.
Can this be done with Akka.NET, or is this not a valid use case for the actor system? Can someone please share some thoughts or point me to resources where I can get more details?
I may want to spin up/stop new Remote Actor systems based on my request queue length
This is not supported out of the box by Akka.Cluster. You would have to build something custom for it.
However Akka .NET has pool routers which are able to resize automatically according to configurable parameters. You may be able to build something around them.
Send the message to any one of the available actor who can start processing without having to check who has the bandwidth to process on the sender side.
If you look at Akka .NET Routers, there are various strategies that can be used to assign work. SmallestMailbox is probably the closest to what you're after.
Messages should not be lost, and if the actor fails, it should be passed to next available actor.
Akka .NET supports At Least Once Delivery. Read more about it in the docs or at the Petabridge blog.
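The "not lost, handed to the next available actor" behaviour those docs describe can be sketched independently of any framework. This is a Python stand-in with hypothetical names, not Akka .NET code: failed work simply goes back on the queue and is redelivered to whichever worker is next.

```python
from collections import deque

def dispatch(jobs, workers):
    """Hand each job to the workers in turn; if a worker raises, requeue the
    job so the next worker gets it. Returns {job: name_of_worker_that_succeeded}.
    (A real system would cap retries instead of looping forever.)"""
    queue = deque(jobs)
    done = {}
    i = 0
    while queue:
        job = queue.popleft()
        worker = workers[i % len(workers)]
        i += 1
        try:
            worker(job)
            done[job] = worker.__name__
        except Exception:
            queue.append(job)     # message is not lost: it will be redelivered
    return done

def flaky(job):
    # Hypothetical worker that always crashes.
    raise RuntimeError("worker crashed")

def steady(job):
    pass                          # processes successfully

result = dispatch(["a", "b"], [flaky, steady])
print(result)                     # -> {'b': 'steady', 'a': 'steady'}
```

In Akka .NET terms the queue plus redelivery is roughly what the At Least Once Delivery facility gives you, and the round-robin worker choice is what a pool router does; the sketch just makes the bookkeeping visible.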
While you may achieve some of your goals with Akka cluster, I wouldn't advise it. Your requirements clearly show that your concerns are oriented around:
Reliable message delivery (where service buses and message queues are better option). There are a lot of solutions here, depending on your needs i.e. MassTransit, NServiceBus or queues (RabbitMQ).
Scaling workers (which is an infrastructure problem and is not solved by actor frameworks themselves). From what you've said, you don't even need a cluster.
You could use Akka for building the message-processing logic, like the workers. But as I said, you don't need it if your goal is to replace an existing service bus.

How to setup a ZMQ PUB/SUB pattern to serve only for pre-authorized subscriber(s)

How can I implement, or "hack", the PUB-SUB pattern to get the ability to publish only to authorized subscribers, disconnect unauthorized subscribers, etc.?
I googled this problem, but all the answers boil down to setting a subscribe filter on the subscriber side.
But I want, as I said, to publish my updates from PUB only to those clients that have passed authorization, or that hold some secret key received over REQ-REP.
Thanks for any ideas.
Read Chapter 5 of The Guide, specifically the section called "Pros and Cons of Pub-Sub".
There are many problems with what you're trying to accomplish in the way you're trying to accomplish it (but there are solutions, if you're willing to change your architecture).
Presumably you need the PUB socket to be generally accessible to the world, whether that's the world at large or just a world consisting of some sockets which are authorized and some sockets which are not. If not, you can just control access (via firewall) to the PUB socket itself to only authorized machines/sockets.
When a PUB socket receives a new connection, it doesn't know whether the subscriber is authorized or not. PUB cannot receive actual communication from SUB sockets, so there's no way for the SUB socket to communicate its authorization directly. XPUB/XSUB sockets break this limitation, but it won't help you (see below).
No matter how you communicate a SUB socket's authorization to a PUB socket, I'm not aware of any way for the PUB socket to kill or ignore the SUB socket's connection if it is not authorized. This means that an untrusted SUB socket can subscribe ALL ('') and receive all messages from the PUB socket, and the PUB socket can't do anything about it. If you trust the SUB socket to police itself (you create the connecting socket and control the machines it's deployed on), then you have options to just subscribe to a "control" topic, send an authorization, and have the PUB socket feed back the channels/topics that you are allowed to subscribe to.
So, this pretty much kills it for achieving general security in a PUB/SUB paradigm that is publicly accessible.
Here are your options:
Abandon PUB/SUB - The only way you can control exactly which peer you send to every single time on the sending side (that I'm aware of) is with a ROUTER socket. If you use ROUTER/DEALER, the DEALER socket can send its authorization, the ROUTER socket stores that with its ID, and when something needs to be sent out, it just finds all connected sockets that are authorized and sends it, sequentially, to each of them. Whether this is feasible or not depends on the number of sockets and the workload (size and number of messages).
Encrypt your messages - You've already said this is your last resort, but it may be the only feasible answer. As I said above, any SUB socket that can access your PUB socket can just subscribe to ALL ('') messages being sent out, with no oversight. You cannot effectively hide your PUB socket address/port, you cannot hide any messages being sent out over that PUB socket, but you can hide the content of those messages with encryption. Proper method of key sharing depends on your situation.
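The ROUTER-side bookkeeping from the first option can be sketched without any actual sockets. This is an illustrative Python simulation (all names hypothetical, no ZeroMQ calls): the sending side keeps an identity-to-authorization map and only ever addresses peers that presented a valid credential, which is exactly the control a plain PUB socket cannot give you.

```python
SECRET = "s3cret"                 # in practice, exchanged over REQ-REP beforehand

class Router:
    """Mimics ROUTER-side bookkeeping: explicit, per-peer addressed sends."""
    def __init__(self):
        self.peers = {}           # identity -> (authorized, inbox)

    def connect(self, identity, credential):
        # Store the authorization outcome alongside the peer's identity.
        self.peers[identity] = (credential == SECRET, [])

    def publish(self, msg):
        # Unlike PUB, we choose every recipient: unauthorized peers get nothing.
        for identity, (ok, inbox) in self.peers.items():
            if ok:
                inbox.append(msg)

    def inbox(self, identity):
        return self.peers[identity][1]

r = Router()
r.connect("alice", "s3cret")      # authorized subscriber
r.connect("mallory", "wrong")     # unauthorized: connected, but never addressed
r.publish("market update")
print(r.inbox("alice"))           # -> ['market update']
print(r.inbox("mallory"))         # -> []
```

The cost, as noted above, is that the sender now does per-peer sends and per-peer state, so feasibility depends on how many peers and messages you have.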
As Jason has shown you an excellent review on why ( do not forget to add a +1 to his remarkable answer, ok? ), let me add my two cents on how:
Q: How?
A: Forget about PUB/SUB archetype and create a case-specific one
Yes. ZeroMQ is rather a very powerful can-do toolbox, than a box-of-candies you are forbidden to taste and choose from to assemble your next super-code.
This way your code retains the power to set both controls and measures for otherwise uncontrollable SUB-side code behaviour.
Creating one's own, composite, layered messaging solution is the very power ZeroMQ brings to your designs. There you realise you are the master of distributed system design. Besides the academic examples, no one uses the plain primitive-behaviour-archetypes, but typically composes more robust and reality-proof composite messaging patterns for the production-grade solutions.
There is no simple one-liner to make your system use-case work.
While it need not answer all your details, you may want to read the remarks on managing PUB/SUB connections and on ZeroMQ authorisation measures.

Akka - keeping reference to client

All,
I'm dealing with what seems like a simple case, but one that poses some design challenges:
1. There's a local actor system with a client, which reaches out to a remote system that runs the bulk of the business logic.
2. The remote system will have a fixed IP address, port, etc.; therefore, one can use the context.actorSelection(uri) strategy to get hold of the ActorRef for the current incarnation of the actor (or a group of routees behind a router).
3. The remote system, being a server, shouldn't be in the business of knowing the location of the client.
Given this, it's pretty straightforward to propagate messages from the client to the server, process them, and send a message back to the client. Even if there are several steps, one can propagate the responses through the hierarchy until one reaches the top-level remote actor that the client called, which will know who the sender was on the client side.
Let's say on the server side, we have a Master actor that has a router of Worker actors. You can have the worker respond directly to the client, since the message received from the Client by the Master can be sent to the Worker via the router as "router.tell(message, sender)" instead of "router ! message." Of course, you can also propagate responses from the Worker to the Master and then to the Client.
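The forwarding trick can be sketched outside Akka too. This is a toy Python model (hypothetical names, synchronous "actors", not an Akka API): because the master passes the original sender along with the message, the worker's reply skips the master entirely and lands directly at the client.

```python
class Actor:
    def __init__(self, name):
        self.name = name
        self.received = []
    def tell(self, message, sender):
        self.received.append(message)
        self.on_receive(message, sender)
    def on_receive(self, message, sender):
        pass

class Master(Actor):
    def __init__(self, worker):
        super().__init__("master")
        self.worker = worker
    def on_receive(self, message, sender):
        # Equivalent of router.tell(message, sender): forward, keeping
        # the client (not the master) as the apparent sender.
        self.worker.tell(message, sender)

class Worker(Actor):
    def on_receive(self, message, sender):
        sender.tell(f"done: {message}", self)   # reply goes straight to client

client = Actor("client")
worker = Worker("worker")
master = Master(worker)
master.tell("job", client)
print(client.received)            # -> ['done: job']
```

The master never sees the reply, which is exactly the behaviour you get with router.tell(message, sender) versus router ! message.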
However, let's say the Worker throws an exception. If its parent (the Master) is its supervisor and it handles the Workers' failures, the Master can do the usual Restart/Resume/Stop. But let's say that we also want to notify the Client of the failure, e.g. for UI updating purposes. Even if we handle the failure of the Worker via the Master's SupervisorStrategy, we won't know who the original caller was (the Client) that had the processing request payload at the time when the Master intercepted the Worker's failure.
Here's a diagram
Client (local) -> Master (remote) -> Router (remote) -> Worker (remote)
Worker throws an exception, Master handles it. Now the Master can restart the Worker, but it doesn't know which Client to notify, in case there are several, their IP addresses change, etc.
If there's one Client and the Client has a host/port/etc. known to the server, then one could use context.actorSelection(uri) to look up the client and send it a message. However, with the server not being in the business of knowing where the Client is coming from (#3), this shouldn't be a requirement.
One obvious solution to this is to propagate messages from the Client to the Worker with the Client's ActorRef in the payload, in which case the Master would know about whom to send the failure notification to. It seems ugly, though. Is there a better way?
I suppose the Client can have the Workers on DeathWatch, but the Client shouldn't really have to know the details of the actor DAG on the server. So, I guess I'm coming back to the issue of whether the message sent from the Client should contain not just the originally intended payload, but also the ActorRef of the Client.
Also, this brings up another point. Akka's "let it crash" philosophy suggests that the actor's supervisor should handle the actor's failures. However, if we have a Client, a Master (with a router) and a Worker, and the Worker fails, the Master can restart it - but it would have to tell the Client that something went wrong. In such a case, the Master would have to correlate the messages from the Client to the Workers to let the Client know about the failure. Another approach is to send the ActorRef of the Client along with the payload to the Worker, which would allow the Worker to use the standard try/catch approach: intercept a failure, send a message to the Client before failing, and then throw an exception that would be handled by the Master. However, this seems against Akka's general philosophy. Would Akka Persistence help in this case, since Processors track message IDs?
Thanks in advance for your help!
Best,
Marek
Quick answer, use this:
def preRestart(reason: Throwable, message: Option[Any]): Unit
More elaborate answer that gives no easy answers (as I struggle with this myself):
There are several ideas on how you can achieve what you need.
You asked whether the worker should answer the client or the master. Well, that depends.
Let's assume the client sends you some work W1 and you pass it to the worker. The worker fails. Now the question is: was that work important? If so, the master should still hold the reference to W1, as it will probably retry the attempt in the near future. Maybe it was some data that should be persisted, and the connection to the database was lost for a second?
If the work was not important, you may just set a timeout on the client indicating the operation was unsuccessful, and you're done. This way you will lose the exception details. But maybe that does not matter? You only want to check the logs afterwards, and you just give a '500 Server Error' response.
This is not as easy to answer as it seems at first.
One possibility is to side-step most of this complexity by changing your approach. This may or may not be feasible for your use case.
For example, if the Exception can be anticipated and it is not of a sort that requires a restart of the worker actor, then don't let the master supervisor handle it. Simply build an appropriate response for that Exception (possibly the Exception itself, or something containing the Exception), and send that to the client as a normal response message. You could send a Scala Try message, for example, or create whatever messages make sense.
Of course, there are non-expected Exceptions, but in this case, the actor dealing with the UI can simply time-out and return a general error. Since the exception is unexpected, you probably wouldn't be able to do better than a general UI error anyway (e.g. a 500 error if the UI is HTTP-based), even if the exception was propagated to that layer. One downside of course is that the timeout will take longer to report the problem to the UI than if the error was propagated explicitly.
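The "anticipated exception as a normal response" idea can be sketched like this (Python stand-in for a Scala Try, with hypothetical names): the worker catches the failures it expects and returns them as ordinary data, so the client is informed without the supervisor ever getting involved.

```python
def handle_request(payload):
    """Worker logic: anticipated failures become ordinary response messages,
    so the supervisor never has to correlate them back to the client."""
    try:
        if not payload:
            raise ValueError("empty request")   # an anticipated failure
        return ("success", payload.upper())     # normal processing result
    except ValueError as e:
        return ("failure", str(e))              # sent to the client as data

print(handle_request("order#42"))  # -> ('success', 'ORDER#42')
print(handle_request(""))          # -> ('failure', 'empty request')
```

Unanticipated exceptions would still escape this handler and reach the supervisor, which is exactly the split described above: expected errors as messages, unexpected ones as crashes plus a client-side timeout.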
Lastly, I don't think there is anything wrong at all with sending ActorRefs as part of the payload, to handle this case from within the master actor as you suggested. I believe ActorRef was designed explicitly with the intent of sending them between actors (including remote actors). From the ScalaDoc of ActorRef:
Immutable and serializable handle to an actor, which may or may not reside on the local host or inside the same ActorSystem.
...
ActorRefs can be freely shared among actors by message passing.

Designing an architecture for exchanging data between two systems

I've been tasked with creating an intermediate layer which needs to exchange data (over HTTP) between two independent systems (e.g. Receiver <=> Intermediate Layer (IL) <=> Sender). Receiver and Sender both expose a set of APIs via Web Services. Every time a transaction occurs in the Sender system, the IL should know about it (I'm thinking of creating a Windows Service which constantly polls the Sender), massage the data, then deliver it to the Receiver. The IL can temporarily store the data in a SQL database until it is transferred to the Receiver. I have the following questions:
Can WCF (haven't used it a lot) be used to talk to the Sender and Receiver (both expose web services)?
How do I ensure guaranteed delivery?
How do I ensure security of the messages over the Internet?
What are best practices for handling concurrency issues?
What are best practices for error handling?
How do I ensure reliability of the data (data is not tampered along the way)
How do I ensure the receipt of the data back to the Sender?
What are the constraints that I need to be aware of?
I need to implement this on the MS platform using a custom .NET solution. I was told not to use any middleware like BizTalk. The receiver is an SFDC instance, if that matters.
Any pointers are greatly appreciated. Thank you.
A Windows Service that orchestrates the exchange sounds fine.
Yes WCF can deal with traditional Web Services.
How do I ensure guaranteed delivery?
To ensure delivery you can use TransactionScope to handle the passing of data between the Receiver <=> Intermediate Layer and the Intermediate Layer <=> Sender, but I wouldn't try to do them together.
You might want to consider some sort of queuing mechanism to send the data to the receiver; I guess I'm thinking more of a logical queue rather than an actual queuing component. A workflow framework could also be an option.
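One common shape for that "logical queue" is an outbox table in the IL's SQL database. Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the real database (all table and column names are hypothetical): each transaction is recorded locally first, and a row is marked delivered only after the receiver accepts it, so a failed send is simply retried on the next run.

```python
import sqlite3

# In-memory database stands in for the IL's SQL store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE outbox ("
           "id INTEGER PRIMARY KEY, payload TEXT, delivered INTEGER DEFAULT 0)")

def record(payload):
    with db:                      # one committed transaction per recorded item
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def deliver_pending(send):
    """Push every undelivered row to the receiver, oldest first; rows that
    fail to send stay undelivered and are retried on the next run."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE delivered = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        try:
            send(payload)
            with db:              # mark delivered only after a successful send
                db.execute("UPDATE outbox SET delivered = 1 WHERE id = ?",
                           (row_id,))
        except ConnectionError:
            break                 # receiver down: stop and retry next run

sent = []
record("invoice-1"); record("invoice-2")

def unavailable(payload):
    raise ConnectionError         # first attempt: receiver is down

deliver_pending(unavailable)      # nothing delivered, nothing lost
deliver_pending(sent.append)      # receiver back: both rows go through, in order
print(sent)                       # -> ['invoice-1', 'invoice-2']
```

The same pattern maps directly to a Windows Service polling the table on a timer, with TransactionScope wrapping the send-then-mark-delivered step on the .NET side.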
Make sure you have good logging / auditing in place; make sure it's rock solid, has the right information, and is easy to read. Assuming you write a service, it will execute without supervision, so the operational / support aspects are more demanding.
Think about scenarios:
How do you manage failed deliveries?
What happens if the receiver (or sender) is unavailable for periods of time (and how long is that?); for example, do you need to "escalate" to an operator via email?
How do I ensure security of the messages over the Internet?
HTTPS. Assuming other existing clients make calls to the Web Services how do they ensure security? (I'm thinking encryption).
What are best practices for handling concurrency issues?
Hmm, probably a separate question. You should be able to find information on that easily enough. How much data are we talking about? At what sort of frequency? How many instances of the Windows Service were you thinking of having? If one is enough, why would concurrency be an issue?
What are best practices for error handling?
Same as for concurrency, but I can offer some pointers:
Use an established logging framework, I quite like MS EntLibs but there are others (re-using whatever's currently used is probably going to make more sense - if there is anything).
Remember that execution is unattended so ensure information is complete, clear and unambiguous. I'd be tempted to log more and dial it down once a level of comfort is reached.
Use a top-level handler to ensure nothing gets lost, but don't be afraid to log deep in the application where you can still get useful context (like the metadata of the data being sent / received).
How do I ensure the receipt of the data back to the Sender?
Include it (sending the receipt) as a step that is part of the transaction.
On a different angle - have a look on CodePlex for ESB type libraries, you might find something useful: http://www.codeplex.com/site/search?query=ESB&ac=8
For example ESBasic, which seems to be a class library you could reuse.