MQTTnet slow performance with 10K clients

I ran a few performance tests with MQTTnet and got unexpected results.
I'm looking for help to determine whether there are configuration changes I missed, or whether this is an MQTTnet implementation limitation.
Environment:
I have 3 processes on one box: Sender -> Server -> Subscriber
Tests:
Messages        | Sender                              | Server | Subscriber | Time
100K            | 1 instance sends 100K messages      | 1      | 1 instance | 30 sec
100K (1000*100) | 1K instances send 100 messages each | 1      | 1 instance | 32 sec
100K (10000*10) | 10K instances send 10 messages each | 1      | 1 instance | 340 sec
I see that moving from 1 sender instance to 1000 does not significantly impact performance, but moving to 10K senders (one process with 10K client instances) degrades performance significantly.
In the last test all messages were delivered to the server within 50 seconds; only after that did the server forward them to the subscriber.
After those 50 seconds the CPU dropped to 25%.
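For reference, a minimal sketch of what such a 10K-sender harness could look like (this is an assumption about the test shape, not the original test code; the client IDs and payload are illustrative):

var factory = new MqttFactory();
var senders = Enumerable.Range(0, 10_000).Select(async i =>
{
    // One client instance per sender, all inside a single process.
    var client = factory.CreateMqttClient();
    await client.ConnectAsync(new MqttClientOptionsBuilder()
        .WithClientId($"sender-{i}")   // illustrative client id
        .WithTcpServer("xxxx", 8883)
        .Build());
    for (var m = 0; m < 10; m++)
    {
        await client.PublishAsync(new MqttApplicationMessageBuilder()
            .WithTopic("my/topic")
            .WithPayload("test")       // illustrative payload
            .Build());
    }
});
await Task.WhenAll(senders);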
Server start
var optionsBuilder = new MqttServerOptionsBuilder()
    .WithConnectionBacklog(10000)
    .WithDefaultEndpointPort(8883)
    .WithDefaultEndpointBoundIPAddress(IPAddress.Parse("xxx"))
    .WithApplicationMessageInterceptor(context =>
    {
        // Count every message that reaches the server.
        Interlocked.Increment(ref cnt);
    })
    .WithMaxPendingMessagesPerClient(100);

mqttServer = new MqttFactory().CreateMqttServer();
await mqttServer.StartAsync(optionsBuilder.Build());
Client Start
var options = new MqttClientOptionsBuilder()
    .WithClientId(name)
    .WithTcpServer("xxxx", 8883)
    .Build();

var factory = new MqttFactory();
mqttClient = factory.CreateMqttClient();
mqttClient.UseApplicationMessageReceivedHandler(e =>
{
    // Count every message the subscriber receives.
    Interlocked.Increment(ref cnt);
});

// For the receiver:
mqttClient.UseConnectedHandler(async e =>
{
    Console.WriteLine("### CONNECTED WITH SERVER ###");
    await mqttClient.SubscribeAsync(new TopicFilterBuilder().WithTopic("my/topic").Build());
    Console.WriteLine("### SUBSCRIBED ###");
});
MQTTnet version: 3.0.11
OS: Windows 10
Why do I observe such significant degradation with 10K clients, while there is almost no degradation with 1000 clients?
Did I miss some configuration?

Related

AWS S3 Client throws "Operator called default onErrorDropped\njava.util.concurrent.CompletionException"

I am using the AWS S3 AsyncClient below from my Spring WebFlux application. As part of performance testing I had to introduce an eventLoopGroup in the S3 client config below to keep S3 operations working when the request rate was really high, with 50 concurrent users triggering 20 requests each and 5 documents to upload to S3 per request. Before this I was getting the error below:
reactor.core.publisher.Operators : Operator called default onErrorDropped\njava.util.concurrent.CompletionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.\nConsider taking any of the following actions to mitigate the issue: increase max connections, increase acquire timeout, or slowing the request rate.\nIncreasing the max connections can increase client throughput (unless the network interface is already fully utilized), but can eventually start to hit operation system limitations on the number of file descriptors used by the process
Introducing an eventLoopGroup with numberOfThreads set to 15 drastically reduced the error count, from around 1000 to 6. However, once a request fails, the subsequent requests also fail with the same error. I was assuming that after some time the used threads would be returned to the thread pool and become available again, but I have to restart the Spring WebFlux application to get rid of the errors. Kindly let me know whether the changes I have made below are correct.
@Bean
public S3AsyncClient s3client() {
    SdkAsyncHttpClient httpClient =
        NettyNioAsyncHttpClient.builder()
            .readTimeout(Duration.ofSeconds(30))
            .writeTimeout(Duration.ZERO)
            .maxConcurrency(300)
            .eventLoopGroup(SdkEventLoopGroup.builder().numberOfThreads(15).build())
            .connectionAcquisitionTimeout(Duration.ofSeconds(30))
            .build();

    S3Configuration serviceConfiguration =
        S3Configuration.builder()
            .checksumValidationEnabled(false)
            .chunkedEncodingEnabled(true)
            .build();

    S3AsyncClientBuilder s3AsyncClientBuilder =
        S3AsyncClient.builder()
            .httpClient(httpClient)
            .region(Region.AP_SOUTHEAST_2)
            .serviceConfiguration(serviceConfiguration);

    return s3AsyncClientBuilder.build();
}

Actor cluster slowness when large messages are produced by a ShardRegion proxy

We implemented Akka clustering with Cluster Sharding for a use case.
While load testing it, we created 1000 entity actors on one node via Cluster Sharding (the cluster node),
and we send messages to those entity actors from a Shard Region Proxy on another node (the proxy node).
What we did:
Using:
akka {
  remote {
    netty.tcp {
      hostname = "x.x.x.x"
      port = 255x
    }
  }
}
akka.cluster {
  sharding {
    remember-entities = on
    remember-entities-store = ddata
    distributed-data.durable.keys = []
  }
}
We created a dispatcher with 1000 threads and assigned it to the entity actors (on the cluster node).
We created a Java program which spawns 100 threads, and each thread produces messages to 10 actors sequentially, one by one, through the ShardRegion Proxy from the proxy node to the cluster node.
For each message we wait for an acknowledgement from the entity actor to the sender thread; only after that is the next message produced.
So at any given time 100 messages can be in flight in parallel, as sketched below.
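A minimal sketch of that send/ack pattern (written in C# for illustration rather than the original Java; SendAndAwaitAckAsync is a hypothetical stand-in for the ask/acknowledgement round trip through the ShardRegion proxy):

var payload = new byte[100 * 1024]; // e.g. a 100 KB test message

// 100 concurrent workers; each one sends to its 10 entity actors strictly
// one at a time, awaiting the ack before producing the next message, so at
// most 100 messages are ever in flight at once.
var workers = Enumerable.Range(0, 100).Select(async worker =>
{
    for (var i = 0; i < 10; i++)
    {
        var entityId = worker * 10 + i;
        await SendAndAwaitAckAsync(entityId, payload); // hypothetical ask/ack helper
    }
});
await Task.WhenAll(workers);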
When I produce 10 KB messages with these 100 parallel threads to the 1000 entity actors, we get the acknowledgement from the entity actor pretty fast, like under 40 ms.
But when I send 100 KB messages the same way, the acknowledgement takes 150 or even 200 ms for each message.
I know a huge message will take more time than a small one.
As I read in some blogs and other similar questions, they say to increase these configurations:
akka {
  remote {
    netty.tcp {
      # Sets the send buffer size of the Sockets,
      # set to 0b for platform default
      send-buffer-size = 2MiB
      # Sets the receive buffer size of the Sockets,
      # set to 0b for platform default
      receive-buffer-size = 2MiB
    }
  }
}
Even after increasing this config from 200KB to 2MB, 10MB, and 20MB, there is no performance gain.
I put some debug logging in the EndpointWriter actor and saw a strange thing: even though I have a 2 MB buffer, when a huge number of messages is sent to the shard region, the buffer in the EndpointWriter keeps growing, but it writes into the AssociationHandle one by one. I get a log line for each message written to the AssociationHandle (the same AssociationHandle object ID on each write).
So is it sequential?
And how are the send and receive buffers used in this case?
Someone said increasing the shard count would help, but even after increasing it there is no performance gain.
Is there any misconfiguration I made, or any configuration I missed?
NOTE:
The cluster node has 1000 entity actors, split into 3 shards.
The proxy node has 100 parallel threads which produce messages to the cluster node.

Event Hub: send events to random partitions, but exactly one partition each

I have an Event Hub publisher, but it is duplicating messages across random partitions multiple times. For the huge number of incoming messages, I want each message to go to a random but exactly one partition, from which the consumer gets the data.
How do I do that? This setup is causing messages to be duplicated.
EventHubProducerClientOptions producerClientOptions = new EventHubProducerClientOptions
{
    RetryOptions = new EventHubsRetryOptions
    {
        Mode = EventHubsRetryMode.Exponential,
        MaximumRetries = 30,
        TryTimeout = TimeSpan.FromSeconds(5),
        Delay = TimeSpan.FromSeconds(10),
        MaximumDelay = TimeSpan.FromSeconds(15),
    }
};

using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

// Add events to the batch. An event is represented by a collection of bytes and metadata.
eventBatch.TryAdd(eventMessage);

string logInfo = $"[PUBLISHED - [{EventId}]] =======> {message}";
logger.LogInformation(logInfo);

// Use the producer client to send the batch of events to the event hub.
await producerClient.SendAsync(eventBatch);
Your code sample is publishing your batch to the Event Hubs gateway, where events will be routed to a partition. For a successful publish operation, each event will be sent to one partition only.
"Successful" is the key in that phrase. You're configuring your retry policy with a TryTimeout of 5 seconds and allowing 30 retries. The duplication that you're seeing is most likely caused by your publish request timing out due to the very short interval, being successfully received by the service, but leaving the service unable to acknowledge success. This will cause the client to consider the operation a failure and retry.
By default, the TryTimeout interval is 60 seconds. I'm not sure why you've chosen to restrict the timeout to such a small value, but I'd strongly advise considering changes. Respectfully, unless you've done profiling and measuring to prove that you need to make changes, I'd advise using the default values for retries in their entirety.
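For comparison, a minimal sketch that simply keeps the default retry options (including the 60-second TryTimeout); the connection string and hub name are placeholders:

// No EventHubsRetryOptions override: the defaults apply.
await using var producerClient = new EventHubProducerClient("<<CONNECTION_STRING>>", "<<EVENT_HUB_NAME>>");

using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();
eventBatch.TryAdd(new EventData(BinaryData.FromString(message)));

// With the default 60-second TryTimeout the service has time to acknowledge a
// successful publish, so the client is far less likely to retry (and duplicate)
// a batch that was already accepted.
await producerClient.SendAsync(eventBatch);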

Concurrency and Ultimate Thread Group setup

I want to use the Concurrency Thread Group, so I'm using this configuration.
What I'm expecting is to send 10 requests in 5 seconds and hold them for 1 second, but the result after running my script is this: more than 10 HTTP requests are sent.
How can I control it so that only 10 requests are sent?
Thank you.
Similar behaviour happens with the Ultimate Thread Group.
You're not sending 10 requests in 5 seconds, you're launching 5 threads (virtual users) in 5 seconds, to wit JMeter will add 2 virtual users each second for 5 seconds and then hold the load for 1 second.
The actual number of requests which will be made depends on your application response time, higher response time - less requests, lower response time - more requests.
If you want to send exactly 10 requests in 5 seconds, evenly distributed, go for the following configuration:
- Normal Thread Group with users * loops = 10, to wit:
  - 10 users - 1 loop
  - 5 users - 2 loops
  - etc.
- Throughput Controller in Total Executions mode with Throughput set to 10
- HTTP Request
- Throughput Shaping Timer configured to send 2 requests per second

Can SQS scale up to 1,000,000 queues for a single account?

I need a messaging service that allows me to create a channel for each user in order to facilitate real-time notifications. If I have somewhere between 100,000 and 1 million users, does it make sense to create an SQS queue for each of these users?
According to the SQS pricing documentation it would only cost $0.40 to create 1 million queues, but would I run into scaling problems?
Also, is there a way to set an expiration date on a queue? If a user deletes their account, then their queue no longer needs to exist.
Creating queues is not an issue here; polling, or even long polling, all those queues is what gets really expensive. To process notifications in near real time, you would need to poll every queue, all 1M of them, say every 5 seconds.
Based on SQS pricing, requests after the free tier cost $0.40 per million, i.e. $0.0000004 per request.
That means you would be calling the ReceiveMessage API about:
1,000,000 queues * 17,280 polls per queue per day (86,400 seconds in a day / 5 seconds) = 17,280,000,000 times,
which is about $6,912 per day in the worst case.
You need to architect the solution in a better way.
You need to architect the solution in a better way.
"a channel for each user in order to facilitate real-time notifications" - you don't need a dedicated queue per user for this - you can do this with one main messaging queue, and depending on your traffic patterns, probably a few overflow queues to deal with ultra-high-traffic users.
"One queue" you say? "How on earth will that scale to 1M users?"
The number of users doesn't matter. What matters is that your message consumption can keep up with message production (incoming messages per second). If you can do that, it'll seem like realtime to your users.
Message consumption can scale to as high as you're willing to spend - just spawn up a thread to handle each incoming message (use a thread pool!)
of course, you'll need to limit each host to X processing threads, based on how many it can handle (hence 'as high as you're willing')
the overflow queues are to keep costs under control - if you're scaled to handle 10K messages per second, you don't want a user to come along and send you 1M messages per second, knocking out your service and the rest of your customers - throttle them to some reasonable limit, and process the rest of those messages at a lower priority.
"But... millions." - Yes. SQS can handle a lot. And a single multi-tenant queue will scale much better in terms of architecture and cost than multiple single-tenant channels ever will.
Most AWS resource volumes are limited, and while I'm not finding any account limit on the number of queues, I may have missed it, or it may just not be published. I definitely wouldn't be excited about the queue-per-notification-destination architecture you're pitching here if my co-workers brought it to me. I would be concerned about the cost of putting the same notification into all the listeners' queues and then reading them back out.
What you're describing sounds more like pub/sub. Or, if you want better delivery guarantees, maybe a stream like Kinesis or Kafka. I've also heard of folks using Redis to implement this kind of thing.
Could you potentially design a queue consumer that pauses after a certain period of idle time to prevent unnecessary API calls?
Something like:
const AWS = require('aws-sdk');

// Set the AWS region
AWS.config.update({ region: 'YOUR_REGION' });

// Set the parameters for the consumer
const IDLE_TIMEOUT = 300; // stop polling after 300 seconds of idle time
const POLLING_INTERVAL = 10; // poll the queue every 10 seconds

// Create an SQS client
const sqs = new AWS.SQS();

// Set a flag to control the polling loop
let isPolling = false;

function startPolling() {
  isPolling = true;
}

function stopPolling() {
  isPolling = false;
}

async function processMessage(message) {
  // Do something with the message here
  console.log(message);
}

async function pollQueue() {
  // Set the idle timer to 0
  let idleTimer = 0;
  while (isPolling) {
    // Check if the idle timer has reached the timeout
    if (idleTimer > IDLE_TIMEOUT) {
      stopPolling();
      break;
    }
    // Poll the queue for messages (long polling for up to POLLING_INTERVAL seconds)
    const params = {
      QueueUrl: 'YOUR_QUEUE_URL',
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: POLLING_INTERVAL,
    };
    const data = await sqs.receiveMessage(params).promise();
    // Get the messages from the response
    const messages = data.Messages || [];
    // Process the messages
    for (const message of messages) {
      await processMessage(message);
      // Delete the message from the queue
      const deleteParams = {
        QueueUrl: 'YOUR_QUEUE_URL',
        ReceiptHandle: message.ReceiptHandle,
      };
      await sqs.deleteMessage(deleteParams).promise();
      // Reset the idle timer whenever a message is processed
      idleTimer = 0;
    }
    // Increment the idle timer
    idleTimer += POLLING_INTERVAL;
    // Sleep for the polling interval before the next poll
    await new Promise((resolve) => setTimeout(resolve, POLLING_INTERVAL * 1000));
  }
}

// Start the polling loop
startPolling();
pollQueue();
This way you could only activate the consumer after some process begins, and avoid constant polling of the queue when the service is inactive.