We implemented Akka clustering with cluster sharding for one of our use cases.
For load testing, we created 1000 entity actors on one node (the Cluster Node) through cluster sharding.
We send messages to those entity actors from a Proxy Node (a ShardRegion proxy running on another node).
What we did:
akka {
  remote {
    netty.tcp {
      hostname = "x.x.x.x"
      port = 255x
    }
  }
}
akka.cluster {
  remember-entities = on
  remember-entities-store = ddata
  distributed-data.durable.keys = []
}
Created a dispatcher with 1000 threads and assigned it to the entity actors (which run on the Cluster Node).
Created a Java program that spawns 100 threads. Each thread sends messages to 10 actors sequentially, one at a time, through the ShardRegion proxy from the Proxy Node to the Cluster Node.
For each message, the sender thread waits for an acknowledgement from the entity actor. Only then is the next message sent.
So at most 100 messages are in flight at any time.
When I send 10 KB messages from these 100 parallel threads to the 1000 entity actors, the acknowledgement from the entity actor arrives in under 40 ms.
But when I send 100 KB messages the same way, the acknowledgement takes 150 to 200 ms per message.
I know large messages take more time than small ones.
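Because every sender thread waits for its acknowledgement before sending again, the test is a closed loop, so the message rate is roughly the number of senders divided by the round-trip time. The sketch below only works through that arithmetic with the figures quoted above. It takes 40 ms and 200 ms as the round-trip times and assumes 1 KB = 1024 bytes; the class and method names are made up for illustration.

```java
class ClosedLoopThroughput {

    // Messages per second when `senders` threads each wait `roundTripMs`
    // for an acknowledgement before sending the next message.
    static double messagesPerSecond(int senders, double roundTripMs) {
        return senders * 1000.0 / roundTripMs;
    }

    // Payload bytes per second implied by that message rate.
    static double bytesPerSecond(int senders, double roundTripMs, int payloadBytes) {
        return messagesPerSecond(senders, roundTripMs) * payloadBytes;
    }

    public static void main(String[] args) {
        int senders = 100;        // parallel sender threads, from the test setup
        int small = 10 * 1024;    // 10 KB payload (assuming 1 KB = 1024 bytes)
        int large = 100 * 1024;   // 100 KB payload
        System.out.println("10 KB:  " + messagesPerSecond(senders, 40) + " msg/s, "
                + bytesPerSecond(senders, 40, small) / 1e6 + " MB/s");
        System.out.println("100 KB: " + messagesPerSecond(senders, 200) + " msg/s, "
                + bytesPerSecond(senders, 200, large) / 1e6 + " MB/s");
    }
}
```

On these figures the 10 KB run works out to 2500 msg/s (about 25.6 MB/s of payload) and the 100 KB run to 500 msg/s (about 51.2 MB/s), so the larger messages are already moving roughly twice as many bytes per second over the same connection.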
Some blogs and similar questions suggest increasing these settings:
akka {
  remote {
    netty.tcp {
      # Sets the send buffer size of the Sockets,
      # set to 0b for platform default
      send-buffer-size = 2MiB
      # Sets the receive buffer size of the Sockets,
      # set to 0b for platform default
      receive-buffer-size = 2MiB
    }
  }
}
Even after increasing these values from 200 KB to 2 MB, 10 MB and 20 MB, there is no performance gain.
I put some debug logging in the EndpointWriter actor and saw something strange: even with a 2 MB buffer size, when a large number of messages is sent to the shard region, the buffer in the EndpointWriter grows, but it writes to the association handle one message at a time. I get a log line for each message written to the association handle (the same association handle object ID on every write).
So is the write sequential?
If so, how are the send and receive buffers used in this case?
Someone suggested that increasing the shard count would help, but even after increasing it there is no performance gain.
Is there a misconfiguration on my side, or a setting I have missed?
The Cluster Node has 1000 entity actors split across 3 shards.
The Proxy Node has 100 parallel threads that send messages to the Cluster Node.


Distributing tasks over HTTP using SpringBoot non-blocking WebClient (performance question?)

I have a large number of tasks, N, that need to be distributed to multiple HTTP worker nodes via a load balancer. There are multiple nodes, n, and across all nodes combined we have a max-concurrency setting, x.
N > x > n
Each node can run tasks in multiple threads. The mean time per task is about 50 seconds to 1 minute. We use WebClient to distribute the tasks and receive a Mono response from the workers.
There is a distributor, and the process is designed as follows:
1. Remove a task from the queue.
2. Send the task via a POST request using WebClient and subscribe immediately with a subscriber instance.
3. Halt new subscriptions when the number of concurrent subscriptions reaches x.
4. When any of the distributed tasks completes, it calls the on-accept(T) method of the subscriber.
5. If the task queue is not empty, remove and send the next task (the (x+1)th task).
6. Keep track of the total number of completed tasks.
7. If all tasks are completed and the queue is empty, mark the CompletableFuture object as complete.
8. Exit.
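The steps above can be sketched in plain Java as follows. This is only an illustration of the bounded-concurrency loop, not the actual distributor: the sendTask function is a hypothetical stand-in for the WebClient POST (it returns a CompletableFuture instead of a Mono), the worker nodes are simulated with a small thread pool, and all class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class BoundedDistributor {

    private final Queue<String> queue;
    private final int total;
    // Hypothetical stand-in for "POST the task with WebClient and get a response back".
    private final Function<String, CompletableFuture<String>> sendTask;
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger peakInFlight = new AtomicInteger();
    private final CompletableFuture<Integer> done = new CompletableFuture<>();

    BoundedDistributor(List<String> tasks, Function<String, CompletableFuture<String>> sendTask) {
        this.queue = new ConcurrentLinkedQueue<>(tasks);
        this.total = tasks.size();
        this.sendTask = sendTask;
    }

    // Steps 2-3: open at most maxConcurrency subscriptions up front.
    CompletableFuture<Integer> run(int maxConcurrency) {
        if (total == 0) {
            done.complete(0);
        }
        for (int i = 0; i < maxConcurrency; i++) {
            sendNext();
        }
        return done;
    }

    private void sendNext() {
        String task = queue.poll();                        // step 1
        if (task == null) {
            return;                                        // queue drained
        }
        peakInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        // Step 4: the callback fires when the task finishes (a failure counts as finished here).
        sendTask.apply(task).whenCompleteAsync((result, error) -> {
            inFlight.decrementAndGet();
            if (completed.incrementAndGet() == total) {    // steps 6-7
                done.complete(total);
            } else {
                sendNext();                                // step 5
            }
        });
    }

    // Small simulation: fake tasks that take 5 ms each on a pool standing in for the workers.
    static int[] simulate(int taskCount, int maxConcurrency) {
        ExecutorService workers = Executors.newFixedThreadPool(10);
        try {
            List<String> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add("task-" + i);
            }
            BoundedDistributor distributor = new BoundedDistributor(tasks,
                    task -> CompletableFuture.supplyAsync(() -> {
                        try {
                            Thread.sleep(5);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        return task + ":ok";
                    }, workers));
            int completedTasks = distributor.run(maxConcurrency).join();
            return new int[] { completedTasks, distributor.peakInFlight.get() };
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] result = simulate(50, 5);
        System.out.println("completed=" + result[0] + " peakInFlight=" + result[1]);
    }
}
```

In this sketch each in-flight task is just a pending future plus a callback, not a blocked thread, and the number in flight never exceeds the limit passed to run.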
The above process works fine. It was tested with N=5000, n=10 and x=25.
What I am unsure about is that this design always keeps x concurrent subscriptions open. As soon as one ends we create another, until all tasks are completed. What is the impact of this in a large-scale production environment? If the number of concurrent subscriptions grows (x > 10,000) through the HTTP(S) load balancer, will that seriously affect performance and network latency? Our expected production volume will be something like below:
I would be grateful if someone with Reactor and WebClient expertise could comment on this approach. Our main concern is having too many concurrent subscriptions.

Akka streams: Alpakka not using more than one CPU core

We have created an Alpakka stream that consumes Kafka messages from a topic and then processes them. The messages are processed in parallel using mapAsyncUnordered with a configured parallelism. The Kafka lag for the consumer keeps increasing, but the application uses only 1 CPU core. I have changed the default dispatcher to one that uses a fork-join executor, expecting it to use more than one CPU core. The application runs on a machine with 32 cores.
Please find the configured settings below:
akka.kafka.consumer.use-dispatcher = ""
Consumer stream code:
Consumer.DrainingControl<Done> control =
    Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
        .buffer(500, OverflowStrategy.backpressure())
        // De-serialize the response from JSON to a Java object
        .mapAsyncUnordered(5, /* deserialize the output */)
        .mapAsyncUnordered(5, /* process it and perform some calculations */)
        .mapAsyncUnordered(5, /* do something and return the consumer offset */)
        // Commit the offset
        .toMat(Committer.sink(committerSettings.withMaxBatch(100)), Consumer::createDrainingControl)
        .run(materializer);
The stream runs in an Akka cluster, and the load is balanced across nodes by using the same consumer group ID. The application also has a typed actor system that triggers the requests, with a group router that shares the load across the cluster. Each triggered request is sent to a microservice as a Kafka message, and the response comes back as a Kafka message that is processed by the stream. These messages do not need to be processed in order, hence the use of mapAsyncUnordered.
I tried increasing the parallelism to as much as 100 but saw no change.
Thanks in advance
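One thing that may be worth checking (an assumption, since the bodies of the three stages are not shown): mapAsyncUnordered calls the supplied function on the stream's own thread and only gets parallelism from the CompletionStage that function returns. If the function does the work inline and returns an already completed future, every element is processed on that one thread regardless of the parallelism value. The plain-Java sketch below (no Akka involved; class and method names are hypothetical) shows the difference by recording which threads actually run the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InlineVsAsync {

    // Simulated processing step; records the name of the thread that ran it.
    static int work(int x, Set<String> threads) {
        threads.add(Thread.currentThread().getName());
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return x * 2;
    }

    // The work runs before the future is created, i.e. on the calling thread.
    static Set<String> runInline(int items) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < items; i++) {
            futures.add(CompletableFuture.completedFuture(work(i, threads)));
        }
        futures.forEach(CompletableFuture::join);
        return threads;
    }

    // The work is handed to a pool, so the caller only creates futures.
    static Set<String> runOnPool(int items, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Set<String> threads = ConcurrentHashMap.newKeySet();
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < items; i++) {
                final int n = i;
                futures.add(CompletableFuture.supplyAsync(() -> work(n, threads), pool));
            }
            futures.forEach(CompletableFuture::join);
            return threads;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline threads: " + runInline(8));
        System.out.println("pool threads:   " + runOnPool(8, 4));
    }
}
```

If the stages in the stream look like the inline variant, moving the work into a future that runs on a dedicated executor would be the first thing to try.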

Akka cluster/remoting dead letters on high volume, slower subscriber

I am trying to understand which layer of Akka clustering sends occasional messages to dead letters when the volume increases or when all receiving actors are busy, and how to tune it to eliminate this behavior.
Here is the basic topology: 2 nodes. Node1 consists of a set of actors (let's call them publishing actors) and an Akka cluster-aware router. The publishing actors publish messages to the router (RoundRobin), which in turn routes them to Node2, which consists of worker actors (let's call them subscriber actors) that receive the messages, do some work and ack back to the publishers.
Observations: under a high rate of published messages (well, not that high for Akka: 10K in 10 seconds), when the subscriber workers are busy, I see occasional dead letters from both sides (from the publishing actors and from the subscriber actors acking back). The dead letter rate was almost 30-40%, but after profiling, noticing thread starvation, and configuring a separate dispatcher for the cluster and a PinnedDispatcher for the subscriber workers, we were able to reduce it to 1-2%. It is worth noting that the high dead letter rate was observed when the default dispatcher with a fork-join thread pool was used and the number of actors was higher than the number of threads, and a much lower rate when the number of actors was lower than the number of threads. This led us to the conclusion that the fork-join pool is also being used by other Akka system processing. RAM, GC and CPU look under control. The actors use the default unbounded mailbox, so it cannot be related to a full buffer. As far as I know, Akka does not manage back pressure.
Of course we understand that Akka does not guarantee delivery and that we have to implement our own retry logic. The main goal here is to understand what is causing the dead letters: is it happening in Akka remoting or in the Netty transport layer, and are there timeouts that can be tuned and configured?
I have spent a good chunk of time profiling, adding extra logging, and capturing and logging dead letters, but did not get any clue about the actual cause.
Any hints, things to try or additional information would be greatly appreciated.
Here is the config we added:
cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-max = 4
  }
}
# used by worker
worker-pinned-dispatcher {
  executor = "thread-pool-executor"
  type = PinnedDispatcher
}

Spring cloud SQS - Polling interval

I am listening to an AWS SQS queue using Spring Cloud as follows:
@SqsListener(value = "${}", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void queueListener(String message, @Headers Map<String, Object> sqsHeaders) {
    // code
}
Spring config:
max-number-of-messages="10" wait-time-out="20" visibility-timeout="3600"
amazon-sqs="awsSqsClient" />
public AmazonSQSAsyncClient awsSqsClient() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    return new AmazonSQSAsyncClient(new DefaultAWSCredentialsProviderChain(), executorService);
}
This works fine.
As the code above shows, the SQS client is configured with 10 threads to process these messages. This also works fine: at any point in time, at most 10 messages are processed.
The issue is that I couldn't figure out a way to control the polling interval. By default, Spring polls only once all threads are free.
For example:
Around 3 messages are delivered to the queue.
Spring polls the queue and gets 3 messages.
The 3 messages are processed; each message takes roughly 20 minutes.
In the meantime, around 25 more messages are delivered to the queue. Spring does NOT poll the queue until all 3 messages delivered earlier have completed. Essentially, in the example above, Spring polls again only after 20 minutes even though 7 threads are still free!
Any idea how we can control this polling? That is, polling should start whenever any thread is free, rather than waiting until all threads are free.
Your listener can load messages into your Spring app and submit them to another thread pool along with Acknowledgement and Visibility objects (if you want to control both).
Once messages are submitted to this thread pool, your listener can load more data. You can control the concurrency by adjusting thread pool settings.
Your listener's method signature will be similar to the one below:
@SqsListener(value = "${queueName}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void listen(YourCustomPOJO pojo,
                   @Headers Map<String, Object> headers,
                   Acknowledgment acknowledgment,
                   Visibility visibility) throws Exception {
    // ... send pojo to a worker thread and return
}
A worker thread will then acknowledge the successful processing.
Make sure your message visibility timeout is set to a value greater than your longest processing time (use a timeout to limit execution time).
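A minimal sketch of that hand-off in plain Java, so it runs without Spring or AWS on the classpath. The Ack interface is a hypothetical stand-in for the Acknowledgment argument, and the pool size, queue capacity and caller-runs policy (which makes the listener thread do the work itself when the pool is saturated, instead of piling up messages in memory) are illustrative choices of mine rather than anything Spring Cloud AWS prescribes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ListenerHandoff {

    // Hypothetical stand-in for the Acknowledgment object passed to the listener.
    interface Ack {
        void acknowledge();
    }

    private final ExecutorService workers;

    ListenerHandoff(int threads, int queueCapacity) {
        // Bounded queue plus caller-runs: when all workers and the queue are full,
        // the submitting (listener) thread processes the message itself.
        this.workers = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // What the body of the listener method would do: hand off and return.
    void listen(String message, Ack ack) {
        workers.execute(() -> {
            try {
                process(message);
                ack.acknowledge();   // acknowledge only after successful processing
            } catch (Exception e) {
                // No acknowledgement: the message becomes visible again
                // once its visibility timeout expires.
            }
        });
    }

    // Simulated processing; messages starting with "bad" fail.
    void process(String message) throws Exception {
        if (message.startsWith("bad")) {
            throw new IllegalStateException("simulated failure");
        }
        Thread.sleep(10);
    }

    void shutdownAndWait() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Sends five good messages and one failing one; returns how many were acknowledged.
    static int demo() {
        ListenerHandoff handoff = new ListenerHandoff(4, 8);
        AtomicInteger acked = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            handoff.listen("ok-" + i, acked::incrementAndGet);
        }
        handoff.listen("bad-1", acked::incrementAndGet);
        handoff.shutdownAndWait();
        return acked.get();
    }

    public static void main(String[] args) {
        System.out.println("acknowledged=" + demo());
    }
}
```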

Monitor system for Akka cluster

It's very difficult to keep track of the state of all actors in an Akka cluster. I've been searching the internet for a good system for monitoring an Akka cluster. However, most of the results were systems that monitor JVM stats. I am curious whether there is a system I can use to monitor the following:
Which actors are active, their states and all other attributes, i.e. connect time, role, path, host, etc.
The status of all active shard regions and their shards
The messages buffered in Akka (pending messages)
The dead letter mailbox
The status of the coordinators
You could have an observing actor send messages to the actors whose state you want to see, asking them to send back a snapshot of their state.
You can use Agents to some extent as well, but I don't think they are distributed.
If you are looking for a common framework for this, I would suggest bundling this behavior into a trait. I don't really know what that would look like, because it depends a lot on how you envision this behavior working, e.g. whether all the messages sent back to the observer can be of the same case class or not.
Kamon can gather whatever metrics you want, for instance:
import kamon.Kamon
import scala.concurrent.duration._

val myHistogram = Kamon.metrics.histogram("my-histogram")
val myCounter = Kamon.metrics.counter("my-counter")
val myMMCounter = Kamon.metrics.minMaxCounter("my-mm-counter", refreshInterval = 500.milliseconds)
val myTaggedHistogram = Kamon.metrics.histogram("my-tagged-histogram", tags = Map("algorithm" -> "X"))
It also supports several backends as a datastore for these metrics.