Akka concurrent message processing while preserving message order

In our project, we publish and consume messages to/from a JMS broker. We have a PL/SQL producer and a Java consumer. The problem, however, is that the producer is 10 times faster than the consumer. Therefore we want to change the consumer to use multiple threads while reading and processing the messages.
But we need to preserve the order of the messages as well. That is, the messages must be sent to the target system in the order they were published to the JMS broker. I'm new to Akka and I'm trying to understand its features. Can we achieve that using Akka dispatchers?

Assuming you want to parallelize the consumption inside a single JVM instance, what you describe is a good case for Akka Streams. This can also be solved with actors, but you risk running out of memory if the producer is too fast, because you'll need to queue the results for reordering.
Akka Streams handles this problem by introducing backpressure. If the consumer can't keep up with the producer, it signals this upstream, and the upstream stages reduce their rate accordingly. Akka Streams can also preserve the order of the messages.
Akka Streams is at version 1.0, so it's not yet as battle-hardened as core Akka, but it's built on Akka and comes from the Akka team, so it should be good and keep improving. Also, the documentation is not yet organized as well as it could be.
It's also important to mention that Akka Streams, while implemented using Akka, is quite a different paradigm from the actor model or composing Futures. It's based on stream processing, so you'll have to adjust the way you think about your programs, which might be an issue for some teams.
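To make the memory concern concrete: with plain actors, each worker finishes at its own pace, so whatever forwards results to the target system has to hold back anything that arrives early until the gaps before it are filled. Below is a minimal sketch of such a reorder buffer, assuming every message is tagged with a sequence number (`seq`) at consumption time; that tagging is an assumption, not something JMS or Akka does for you, and `ReorderBuffer` is a hypothetical name.

```scala
import scala.collection.mutable

// Hypothetical reorder buffer: releases results strictly in sequence order.
// Results that complete "too early" wait in `pending`; this is the queue that
// keeps growing when the producer outpaces the consumer.
final class ReorderBuffer[A](firstSeq: Long = 0L) {
  private var nextSeq = firstSeq
  private val pending = mutable.Map.empty[Long, A]

  /** Accepts one completed result; returns every result that can now be emitted in order. */
  def offer(seq: Long, value: A): List[A] = {
    pending.update(seq, value)
    val ready = List.newBuilder[A]
    // Drain the contiguous run starting at the next expected sequence number.
    while (pending.contains(nextSeq)) {
      ready += pending.remove(nextSeq).get
      nextSeq += 1
    }
    ready.result()
  }

  /** Number of results currently held back waiting for earlier ones. */
  def buffered: Int = pending.size
}
```

For example, `offer(1, "b")` and `offer(2, "c")` both return an empty list because message 0 is still missing; `offer(0, "a")` then releases `List(a, b, c)` at once. Nothing in this buffer limits how large `pending` can get, which is exactly the problem backpressure addresses.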
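In Akka Streams the operator for this is `mapAsync(parallelism)`, which runs up to `parallelism` futures at once but emits results in upstream order (`mapAsyncUnordered` drops that guarantee). The contract can be illustrated without Akka; the plain-Scala sketch below keeps at most `parallelism` elements in flight, stops pulling input while the window is full, and emits results in input order. The helper name `mapAsyncOrdered` and its blocking `Await` strategy are illustrative assumptions, not how Akka implements the operator.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object OrderedParallel {
  /** Runs `f` on up to `parallelism` elements concurrently, emitting results in input order. */
  def mapAsyncOrdered[A, B](input: Iterator[A], parallelism: Int)(f: A => B)
                           (implicit ec: ExecutionContext): List[B] = {
    require(parallelism > 0, "parallelism must be positive")
    val inFlight = mutable.Queue.empty[Future[B]] // oldest element at the head
    val out = List.newBuilder[B]
    input.foreach { a =>
      if (inFlight.size == parallelism) {
        // Window full: no more input is pulled (a crude form of backpressure).
        // Wait for the OLDEST element, not whichever happens to finish first.
        out += Await.result(inFlight.dequeue(), Duration.Inf)
      }
      inFlight.enqueue(Future(f(a)))
    }
    // Drain the remaining futures, still in input order.
    while (inFlight.nonEmpty) out += Await.result(inFlight.dequeue(), Duration.Inf)
    out.result()
  }
}
```

Even when later elements finish first, they are only emitted after every earlier element, and the number of started-but-unemitted elements never exceeds `parallelism`, so memory use stays bounded.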

Related

Use Akka Streams or Akka actors for dynamic pipeline creation?

I have to implement a process that polls messages from a queue and forwards the data to an HTTP endpoint. Normally I would think the best approach for this is Akka Streams (with Akka HTTP), because it handles backpressure etc. for you. The problem is that I have to create several of these pipelines on demand at runtime (triggered by an HTTP call), each with a different configuration. Another requirement is that it should be possible to stop one of those pipelines at runtime and restart it with another configuration.
I'm currently not sure whether it is really possible to dynamically spin up new parallel Akka Streams pipelines at runtime (and also remove and restart some of them). To me it currently sounds better to use Akka actors for this (a queue consumer actor and an Akka HTTP actor, so I could create a pair dynamically at runtime), but then I also have to implement backpressure etc. manually.
Do you think it would be possible to build this with Akka Streams, or would it be a hassle?
Thanks in advance!
You can't reroute a stream once it's materialized and running. But since you configure streams programmatically, it is very easy to define new streams dynamically and run them. Everything depends on the specific use case, but I'd go for a single coordinator actor that receives queue messages and dispatches them to the appropriate stream. The same coordinator would receive configuration change requests, so it can configure new streams, remove old ones, etc.
Don't wire actors together manually; it's hard to follow and maintain.
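A minimal sketch of the coordinator idea, with Akka left out so it stays self-contained: one object owns the map from pipeline id to its current configuration, routes each queue message to the matching pipeline, and handles start/stop/reconfigure requests. `Coordinator`, `PipelineConfig`, and the command names are hypothetical. In a real system each entry would hold a materialized stream plus a way to stop it (for example an Akka Streams `KillSwitch`) rather than just a config, and the coordinator would be an actor processing these commands one at a time.

```scala
// Hypothetical configuration for one queue-to-HTTP pipeline.
final case class PipelineConfig(endpoint: String)

// Hypothetical commands the coordinator receives.
sealed trait Command
final case class Start(id: String, config: PipelineConfig) extends Command
final case class Stop(id: String) extends Command
final case class Deliver(id: String, payload: String) extends Command

/** Immutable coordinator state: running pipelines and where delivered payloads went. */
final case class Coordinator(
    running: Map[String, PipelineConfig] = Map.empty,
    sent: Vector[(String, String)] = Vector.empty, // (endpoint, payload); stands in for the HTTP call
    dropped: Vector[Deliver] = Vector.empty        // messages for pipelines that are not running
) {
  def handle(cmd: Command): Coordinator = cmd match {
    // Starting an id that is already running replaces its config ("restart with another configuration").
    case Start(id, config) => copy(running = running.updated(id, config))
    case Stop(id)          => copy(running = running - id)
    case d @ Deliver(id, payload) =>
      running.get(id) match {
        case Some(cfg) => copy(sent = sent :+ (cfg.endpoint -> payload))
        case None      => copy(dropped = dropped :+ d)
      }
  }
}
```

Keeping all pipeline bookkeeping in one place means a reconfiguration can never race with message routing, which is the main reason to prefer a single coordinator over manually wired actor pairs.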

Communication between bounded contexts in akka cluster

I'm struggling with proper design of communication between 2 separate akka microservices/bounded contexts in akka cluster.
Let's say we have 2 microservices per node in the cluster.
Both microservices are based on akka.
It's clear to me that communication within a particular bounded context will be handled by sending messages from actor to actor, or from an actor on node1 to an actor on node2 (if necessary).
Q: But is it OK to use similar communication between separate Akka applications? E.g. boundedContext1.actor --message--> boundedContext2.actor
Or should it be done via much clearer boundaries: in BC1 raise an event, publish it to a broker, and read the event in BC2?
Edit: currently we've implemented a service registry and we're publishing events to the service registry via Akka Streams.
I think there is no universal answer here.
If your BCs are simple enough, you can keep them in one application, and even in one project/library, with very weak boundaries, i.e. just placing them in separate namespaces and providing an API for other BCs.
But if your BCs become more complex and more independent, and require their own deployment cycles, then it is definitely better to build stronger boundaries: separate microservices that communicate through a message broker.
So my answer is that you should just "feel" the right way according to your particular needs. If you don't "feel" it, then follow the KISS principle and start with the simpler way, i.e. Akka's built-in communication. If your BCs become more complex in the future, you will have to refactor them, but then that decision will be justified rather than being unnecessary overhead.
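One way to keep the KISS option cheap to refactor later, sketched under assumptions: the bounded contexts talk only through a small event-publishing interface, and the first implementation simply delivers in-process. If the contexts are later split into separate services, only the class behind `EventPublisher` needs to change (for example to one that writes to a broker), not the calling code. `EventPublisher`, `InProcessPublisher`, `OrderService`, `Billing`, and `OrderPlaced` are hypothetical names chosen for illustration.

```scala
// Events that bounded context 1 exposes to the outside world (hypothetical example).
sealed trait OrderEvent
final case class OrderPlaced(orderId: String, amount: BigDecimal) extends OrderEvent

// The boundary: BC1 only knows it can publish events, not who consumes them or how.
trait EventPublisher {
  def publish(event: OrderEvent): Unit
}

// First implementation: plain in-process delivery to registered subscribers.
// A later broker-backed implementation would replace only this class.
final class InProcessPublisher extends EventPublisher {
  private var subscribers = Vector.empty[OrderEvent => Unit]
  def subscribe(handler: OrderEvent => Unit): Unit = synchronized { subscribers :+= handler }
  def publish(event: OrderEvent): Unit = synchronized(subscribers).foreach(_(event))
}

// BC1: raises events through the boundary.
final class OrderService(publisher: EventPublisher) {
  def placeOrder(id: String, amount: BigDecimal): Unit = publisher.publish(OrderPlaced(id, amount))
}

// BC2: reacts to BC1's events without referencing BC1's internals.
final class Billing {
  private var invoiced = Vector.empty[String]
  def onEvent(event: OrderEvent): Unit = event match {
    case OrderPlaced(id, _) => synchronized { invoiced :+= id }
  }
  def invoices: Vector[String] = synchronized(invoiced)
}
```

The published event types then become the contract between the two contexts, whichever transport ends up carrying them.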

SelectChildName messages from RemoteDeadLetterActorRef

I have two actor systems that communicate via akka remoting.
When I take a look into the JVM heap I am seeing (too) many instances of akka.dispatch.Envelope containing SelectChildName messages from akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.
The retained heap of these messages is pretty large and causes memory problems.
What is the purpose of these SelectChildName messages? Is there a way to avoid them?
FYI, this seems to be related to disassociation errors that occur between the two actor systems.
Thanks,
Michail
SelectChildName messages are used by Akka Remoting to resolve a remote actor. If you see a lot of them, there is a chance you are interacting directly with an ActorSelection, instead of an ActorRef.
Every time you send a message to an ActorSelection, as in this example taken from the docs,
```scala
val selection = context.actorSelection("akka.tcp://actorSystemName@10.0.0.1:2552/user/actorName")
selection ! "Pretty awesome feature"
```
the (possibly remote) actor is resolved, which involves the underlying Akka infrastructure exchanging SelectChildName messages.
If that's the case, try to use ActorRefs directly. You can obtain one from an ActorSelection with the resolveOne method.
Citing the docs again:
It is always preferable to communicate with other Actors using their ActorRef instead of relying upon ActorSelection. Exceptions are:
sending messages using the At-Least-Once Delivery facility
initiating first contact with a remote system
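The cost difference can be modeled without Akka: sending through a selection pays for a path lookup on every message, while a resolved reference pays for it once. In this simplified sketch `Registry`, `Selection`, and `Ref` are hypothetical stand-ins, and the `lookups` counter plays the role of the `SelectChildName` traffic. In real Akka the one-time step is `selection.resolveOne(timeout)`, which returns a `Future[ActorRef]` that you then keep and reuse.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an actor reference: delivering to it needs no lookup.
final class Ref(val path: String) {
  val inbox = mutable.ArrayBuffer.empty[String]
  def !(msg: String): Unit = { inbox += msg }
}

// Hypothetical registry; `lookups` counts path resolutions (the SelectChildName analogue).
final class Registry(refs: Map[String, Ref]) {
  var lookups = 0
  def resolve(path: String): Option[Ref] = { lookups += 1; refs.get(path) }
}

// A selection re-resolves its path on every send, like sending to an ActorSelection.
final class Selection(registry: Registry, path: String) {
  def !(msg: String): Unit = registry.resolve(path).foreach(_ ! msg)
  // Resolve once, then reuse the returned Ref for all further messages.
  def resolveOne(): Option[Ref] = registry.resolve(path)
}
```

With 100 messages sent through the selection the registry performs 100 lookups; after one `resolveOne()`, another 100 messages sent to the returned `Ref` add just that single lookup.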

Akka.NET load balancing and span out processing

I am looking to build a system that can process a stream of requests that each need a long processing time, say 5 minutes each. My goal is to speed up request processing with a minimal resource footprint, given that requests can at times arrive in bursts.
I can use something like a service bus to queue the requests and have multiple processes (a.k.a. actors in Akka) that subscribe for a message and start processing. I can also have a watchdog that monitors the queue length in the service bus and creates more actors/actor systems or stops a few.
If I want to do the same with an actor system like Akka.NET, how can this be done? For example:
I may want to spin up/stop new Remote Actor systems based on my request queue length
Send the message to any one of the available actors that can start processing, without the sender having to check who has the capacity to process it.
Messages should not be lost, and if an actor fails, its message should be passed to the next available actor.
Can this be done with Akka.NET, or is this not a valid use case for an actor system? Can someone please share some thoughts or point me to resources where I can get more details?
I may want to spin up/stop new Remote Actor systems based on my request queue length
This is not supported out of the box by Akka.Cluster. You would have to build something custom for it.
However, Akka.NET has pool routers that can resize automatically according to configurable parameters. You may be able to build something around them.
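The watchdog idea from the question can be expressed as a simple sizing rule. This sketch is not Akka.NET's resizer algorithm (which works from routee pressure with its own parameters); it only illustrates a queue-length-driven rule clamped between a lower and an upper bound, of the kind a custom watchdog might apply. `desiredWorkers`, `delta`, and `messagesPerWorker` are hypothetical names.

```scala
object Sizing {
  /**
   * Hypothetical sizing rule: one worker per `messagesPerWorker` queued requests
   * (rounded up), never below `lower` and never above `upper`.
   */
  def desiredWorkers(queueLength: Int, messagesPerWorker: Int, lower: Int, upper: Int): Int = {
    require(messagesPerWorker > 0 && lower >= 0 && lower <= upper, "invalid bounds")
    val needed = (queueLength + messagesPerWorker - 1) / messagesPerWorker // ceiling division
    math.max(lower, math.min(upper, needed))
  }

  /** Positive: number of workers to start; negative: number of workers to stop. */
  def delta(current: Int, queueLength: Int, messagesPerWorker: Int, lower: Int, upper: Int): Int =
    desiredWorkers(queueLength, messagesPerWorker, lower, upper) - current
}
```

For example, with 12 queued requests, 5 requests per worker and bounds 1..10, the rule asks for 3 workers; if 8 are running, `delta` returns -5, i.e. stop five. With 5-minute jobs, `messagesPerWorker = 1` may be the more sensible setting.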
Send the message to any one of the available actors that can start processing, without the sender having to check who has the capacity to process it.
Akka.NET routers offer various strategies for assigning work; SmallestMailbox is probably the closest to what you're after.
Messages should not be lost, and if an actor fails, its message should be passed to the next available actor.
Akka.NET supports at-least-once delivery. You can read more about it in the docs or on the Petabridge blog.
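The idea behind SmallestMailbox routing, reduced to plain Scala: the router, not the sender, picks the routee with the fewest pending messages. This is a simplified model; the real routing logic also prefers idle routees and gives remote routees (whose mailbox size is unknown) the lowest priority. `Worker` and `SmallestMailboxModel.route` are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical worker with a visible mailbox.
final class Worker(val name: String) {
  val mailbox = mutable.Queue.empty[String]
}

object SmallestMailboxModel {
  /** Enqueues `msg` at the worker with the fewest pending messages; on ties the first worker wins. */
  def route(workers: Seq[Worker], msg: String): Worker = {
    val target = workers.minBy(_.mailbox.size)
    target.mailbox.enqueue(msg)
    target
  }
}
```

The sender only ever calls `route`; which worker has spare capacity is decided in one place, which is what the question asks for.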
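The core bookkeeping behind at-least-once delivery, as a plain-Scala model: every outgoing message gets a delivery id and stays in an "unconfirmed" table until a worker confirms it; anything still unconfirmed when the redelivery tick fires is sent again. Here the retry goes to the next worker in the list, which covers the "pass it to the next available actor" requirement. Akka's own at-least-once delivery support also persists this table so that it survives a crash of the sender; this sketch keeps it in memory only, and `Deliverer`, `confirm`, and `redeliver` are hypothetical names. Because redelivery can produce duplicates, workers must process messages idempotently.

```scala
import scala.collection.mutable

/** Hypothetical in-memory model of at-least-once delivery with failover between workers. */
final class Deliverer(workers: Vector[String]) {
  require(workers.nonEmpty, "need at least one worker")
  private var nextId = 0L
  // deliveryId -> (payload, worker it was last sent to)
  private val unconfirmed = mutable.LinkedHashMap.empty[Long, (String, String)]
  // Every send, including redeliveries, is logged as (deliveryId, worker, payload).
  val sendLog = mutable.ArrayBuffer.empty[(Long, String, String)]

  private def send(id: Long, payload: String, worker: String): Unit = {
    unconfirmed.update(id, (payload, worker))
    sendLog += ((id, worker, payload))
  }

  /** Sends a new message (round-robin for the first attempt) and returns its delivery id. */
  def deliver(payload: String): Long = {
    val id = nextId
    nextId += 1
    send(id, payload, workers((id % workers.size).toInt))
    id
  }

  /** Called when a worker acknowledges; the message will not be resent. */
  def confirm(id: Long): Unit = unconfirmed.remove(id)

  /** Redelivery tick: resend everything still unconfirmed to the worker after the last one tried. */
  def redeliver(): Unit =
    unconfirmed.toList.foreach { case (id, (payload, lastWorker)) =>
      val next = workers((workers.indexOf(lastWorker) + 1) % workers.size)
      send(id, payload, next)
    }

  def pending: Set[Long] = unconfirmed.keySet.toSet
}
```

If `w1` never confirms `job-a`, the next tick hands it to `w2`; once confirmed, it is never sent again.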
While you may achieve some of your goals with Akka.Cluster, I wouldn't advise it. Your requirements make it clear that your concerns are about:
Reliable message delivery (where service buses and message queues are the better option). There are many solutions here depending on your needs, e.g. MassTransit, NServiceBus, or plain queues such as RabbitMQ.
Scaling workers (which is an infrastructure problem and is not solved by actor frameworks themselves). From what you've said, you don't even need a cluster.
You could use Akka to build the message-processing logic, such as the workers. But as I said, you don't need it if your goal is to replace an existing service bus.

Performing an asynchronous transformation within a Kafka Stream

Assume I have two Kafka topics, A and B. I am trying to develop a system that pulls records from A, applies a transformation to each record, then publishes the transformed records to B. In this case, the transformation involves calling a REST endpoint over HTTP.
Being relatively new to Kafka, I was glad to see that the Kafka Streams project already solves this type of problem (consume-transform-publish). Unfortunately, I discovered that transformations in Kafka Streams are blocking operations. Instinctively, I prefer to call HTTP endpoints in a non-blocking, asynchronous manner.
Does this mean that Kafka Streams will not work in this situation? Does this mean that I must revert back to calling the REST endpoint in a blocking manner? Is this even an acceptable pattern for Kafka Streams? Stream-based data processing is still relatively new to me, so I am not entirely familiar with its concurrency models.
Update: after looking into this further, I am not sure that this is the right answer...
I am new to Kafka and Kafka Streams (hereafter referred to as "Kafka"), but having encountered and considered similar questions, here is my perspective:
Kafka has two salient features:
All parallelism is achieved through the partitioning of topics
Within a partition of a topic, processing is strongly ordered, one-at-a-time.
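Both properties follow from how records are assigned to partitions. In the simplified model below, records with the same key always land in the same partition, so their relative order is kept, while different partitions can be processed in parallel. Kafka's default partitioner actually hashes the serialized key bytes with murmur2; this sketch substitutes `String.hashCode` purely for illustration, and `PartitionModel`, `partitionFor`, and `assign` are hypothetical helpers.

```scala
object PartitionModel {
  /** Hypothetical partitioner: same key -> same partition (Kafka itself uses murmur2 on the key bytes). */
  def partitionFor(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  /** Splits an ordered (key, value) stream into per-partition streams, preserving order within each. */
  def assign(records: Seq[(String, String)], numPartitions: Int): Map[Int, Vector[(String, String)]] =
    records.foldLeft(Map.empty[Int, Vector[(String, String)]]) { case (acc, rec @ (key, _)) =>
      val p = partitionFor(key, numPartitions)
      acc.updated(p, acc.getOrElse(p, Vector.empty) :+ rec)
    }
}
```

All records for key `k1` end up in one partition in their original order; records for other keys may share that partition or go elsewhere, but nothing orders records across partitions.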
Many really nice properties fall out of these features. For example, stream-based "transactions" are, I think, one of the coolest.
But whether these properties are actually "features" in the sense that you want them, of course, depends on the application. If you don't want strongly ordered processing with parallelism based on topic partitioning, then you might not want to be using Kafka for that application.
So, with regard to:
Does this mean that Kafka Streams will not work in this situation?
It will work, but increased parallelism is achieved through increased partitioning.
Does this mean that I must revert back to calling the REST endpoint in a blocking manner?
Yes, I think it does, but I'm not sure why that would be a "reversion". Personally, that's what I like about Kafka: blocking code is simpler. If I want more parallelism, I can run more threads. There's no shared state, after all.
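A sketch of that threading model in plain Scala: one task per partition, each calling a blocking function record by record, so order holds within a partition while partitions proceed in parallel. `PerPartitionWorkers` and `callEndpoint` are hypothetical; `callEndpoint` stands in for the blocking REST call. In Kafka Streams itself the corresponding knob is the number of stream threads (`num.stream.threads`), and useful parallelism is bounded by the number of input partitions.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object PerPartitionWorkers {
  /**
   * Processes each partition on its own thread. Inside a partition, records are handled
   * strictly one at a time with a plain blocking call, so their order is preserved.
   */
  def run[A, B](partitions: Map[Int, Vector[A]])(callEndpoint: A => B): Map[Int, Vector[B]] = {
    val pool = Executors.newFixedThreadPool(math.max(1, partitions.size))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val perPartition = partitions.toList.map { case (p, records) =>
        Future(p -> records.map(callEndpoint)) // blocking calls, sequential within partition p
      }
      Await.result(Future.sequence(perPartition), Duration.Inf).toMap
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

The code inside each partition stays ordinary sequential code; scaling out means adding partitions (and threads), not making the per-record logic asynchronous.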