How to allow more concurrent client connections with Netty?

How to allow more concurrent client connections with Netty? - concurrency

First, thanks all the Netty contributors for the great library. I have been happily using it for several weeks.
Recently, I started to load test my system but now I'm experiencing some scalability problem with Netty. I tried to fork as many simultaneous Netty clients as possible to connect to a Netty server. For small number of clients (<50), the system just works fine. However, for large number of clients (>100), I find the client side always prompts the "ClosedChannelException":
java.nio.channels.ClosedChannelException
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$1.operationComplete(NioClientSocketPipelineSink.java:157)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
at org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
at org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:351)
at org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:188)
at org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:146)
at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:592)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.close(NioClientSocketPipelineSink.java:415)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:379)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:299)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
I am wondering how to make Netty support more simultaneous client connections, such as 10K. I am using the newest version of Netty. Following is the testing scenario:
Each client sends a four letter string to the server and the server handler does nothing upon receiving the string. Each of the server and the clients is running on a high performance machine with eight-core and 16GB memory. The two machines are connected by a Gigabyte network.
Do you have any hints?

1) You can tweak the connectTimeout in the client bootstrap to make sure there is no network/server issues
clientBootStrap.setOption("connectTimeoutMillis", optimumTimout);
2) By setting the backlog value in the Netty server, you can increase the queue of incoming connection size, so clients will have better chance of connecting to the server
serverBootStrap.setOption("backlog", 1000);
3) You have said that your application is creating many connections simultaneously, Client Boss thread may lag behind, if the application is connecting too fast.
Netty 3.2.7 Final allows to set more than one Client Boss thread in NioClientSocketChannelFactory constructor to avoid this issue.

Related

Strategy for Asynchronous database access with Qt5 SQL

I need to create a server in Qt C++ with QTcpServer which can handle so many requests at the same time. nearly more than 1000 connections and all these connection will constantly need to use database which is MariaDB.
Before it can be deployed on main servers, It needs be able to handle 1000 connections with each connection Querying data as fast it can on 4 core 1 Ghz CPU with 2GB RAM Ubuntu virtual machine running on cloud. MySQL database is hosted on some other server which more powerful
So how can I implement this ? after googling around, I've come up with following options
1. Create A new QThread for each SQL Query
2. Use QThreadPool for new SQL Query
For the fist one, it might will create so many Threads and it might slow down system cause of so many context switches.
For second one,after pool becomes full, Other connections have to wait while MariaDB is doing its work. So what is the best strategy ?

Sorry for bad english.
1) Exclude.
2) Exclude.
3) Here first always doing work qt. Yes, connections (tasks for connections) have to wait for available threads, but you easy can add 10000 tasks to qt threadpool. If you want, configure max number of threads in pool, timeouts for tasks and other. Ofcourse your must sync shared data of different threads with semaphore/futex/mutex and/or atomics.
Mysql (maria) it's server, and this server can accept many connections same time. This behaviour equally what you want for your qt application. And mysql it's just backend with data for your application.
So your application it's server. For simple, you must listen socket for new connections and save this clients connections to vector/array and work with each client connection. Always when you need something (get data from mysql backend for client (yeah, with new, separated for each client, onced lazy connection to mysql), read/write data from/to client, close connection, etc.) - you create new task and add this task to threadpool.
This is very simple explanation but hope i'm helped you.

Consider for my.cnf [mysqld] section
thread_handling=pool-of-threads
Good luck.

gRPC C++ client blocking when attempting to connect channel on unreachable IP

I'm trying to enhance some client C++ code using gRPC to support failover between 2 LAN connections.
I'm unsure if I found a bug in gRPC, or more likely that I'm doing something wrong.
Both the server and client machines are on the same network with dual LAN connections, I'll call LAN-A and LAN-B.
The server is listening on 0.0.0.0:5214, so accepts connections on both LANs.
I tried creating the channel on the client with both IPs, and using various load balancing options, ex:
string all_endpoints = "ipv4:172.24.1.12:5214,10.23.50.123:5214";
grpc::ChannelArguments args;
args.SetLoadBalancingPolicyName("pick_first");
_chan = grpc::CreateCustomChannel(all_endpoints,
grpc::InsecureChannelCredentials(),
args);
_stub = std::move(Service::NewStub(_chan));
When I startup the client and server with all LAN connections functioning, everything works perfectly. However, if I kill one of the connections or startup the client with one of the connections down, gRPC seems to be blocking forever on that subchannel. I would expect it to use the subchannel that is still functioning.
As an experiment, I implemented some code to only try to connect on 1 channel (the non-functioning one in this case), and then wait 5 seconds for a connection. If the deadline is exceeded, then we create a new channel and stub.
if(!_chan->WaitForConnected(std::chrono::system_clock::now() +
std::chrono::milliseconds(5000)))
{
lan_failover();
}
The stub is a unique_ptr so should be destroyed, the channel is a shared_ptr. What I see is that I can successfully connect on my new channel but when my code returns, gRPC ends up taking over and indefinitely blocking on what appears to be trying to connect on the old channel. I would expect gRPC would be closing/deleting this no longer used channel. I don't see any functions available in the cpp version that I can call on the channel or globally that would for the shutdown/closure of the channel.
I'm at a loss on how to get gRPC to stop trying to connect on failed channels, any help would be greatly appreciated.
Thanks!
Here is some grpc debug output I see when I startup with the first load balancing implementation I mention, and 1 of the 2 LANs is not functioning (blocking forever):
https://pastebin.com/S5s9E4fA

You can enable keepaliaves. Example usage: https://github.com/grpc/grpc/blob/master/test/cpp/end2end/flaky_network_test.cc#L354-L358

Just wanted to let anyone know the problem wasn't with gRCP, but the way our systems were configured with a SAN that was being written too. A SAN was mounted through the LAN connection I was using the test failover through and the process was actually blocking because it was trying to access that SAN. The stack trace was misleading because it showed the gRPC thread.

Got an error reading communication packets in Google Cloud SQL

From 31th March I've got following error in Google Cloud SQL:
Got an error reading communication packets.
I have been using Google Cloud SQL for 2 years, but never faced with such problem.
I'm very worried about it.
This is detail error message:
textPayload: "2019-04-29T17:21:26.007574Z 203385 [Note] Aborted connection 203385 to db: {db_name} user: {db_username} host: 'cloudsqlproxy~{private ip}' (Got an error reading communication packets)"

While it is true that this error message often occurs after a maintenance period, it isn't necessarily a cause for concern as this is a known behavior by MySQL.
Possible explanations about why this issue is happening are :
The large increase of connection requests to the instance, with the
number of active connections increasing over a short period of time.
The freezing / unavailability of the instance can also occur due to
the burst of connections happening in a very short time interval. It
is observed that this freezing always happens with an increase of
connection requests. This increase in connections causes the
instance to be overloaded and hence unavailable to respond to
further connection requests until the number of connections
decreases or the instance stabilizes.
The server was too busy to accept new connections.
There were high rates of previous connections that were not closed
correctly.
The client terminated it abnormally.
readTimeout setting being set too low in the MySQL driver.
In an excerpt from the documentation, it is stated that:
There are many reasons why a connection attempt might not succeed.
Network communication is never guaranteed, and the database might be
temporarily unable to respond. Make sure your application handles
broken or unsuccessful connections gracefully.
Also a low Cloud SQL Proxy version can be the reason for such
incident issues. Possible upgrade to the latest version of (v1.23.0)
can be a troubleshooting solution.
IP from where you are trying to connect, may not be added to the
Authorized Networks in the Cloud SQL instance.
Some possible workaround for this issue, depending which is your case could be one of the following:
In the case that the issue is related to a high load, you could
retry the connection, using an exponential backoff to prevent
from sending too many simultaneous connection requests. The best
practice here is to exponentially back off your connection requests
and add randomized backoffsto avoid throttling, and potentially
overloading the instance. As a way to mitigate this issue in the
future, it is recommended that connection requests should be
spaced-out to prevent overloading. Although, depending on how you
are connecting to Cloud SQL, exponential backoffs may already be in
use by default with certain ORM packages.
If the issue could be related to an accumulation of long-running
inactive connections, you would be able to know if it is your case
using show full processliston your database looking for
the connections with high Time or connections where Command is
Sleep.
If this is your case you would have a few possible options:
If you are not using a connection pool you could try to update the client application logic to properly close connections immediately at the end of an operation or use a connection pool to limit your connections lifetime. In particular, it is ideal to manage the connection count by using a connection pool. This way unused connections are recycled and also the number of simultaneous connection requests can be limited through the use of the maximum pool size parameter.
If you are using a connecting pool, you could return the idle connections to the pool immediately at the end of an operation and set a shorter timeout by adjusting wait_timeout or interactive_timeoutflag values. Set CloudSQL wait_timeout flag to 600 seconds to force refreshing connections.
To check the network and port connectivity once -
Step 1. Confirm TCP connectivity on port 3306 with tcptraceroute or
netcat.
Step 2. If [Step 1] succeeded then try to check if there are any
errors in using mysql client to check timeout/error.
When the client might be terminating the connection abruptly you
could check for:
If the MySQL client or mysqld server are receiving a packet bigger
than max_allowed_packet bytes, or the client receiving a packet
too large message,if it so you could send smaller packets or
increase the max_allowed_packet flag value on both client
and server. If there are transactions that are not being properly
committed using both "begin" and "commit", there is the need to
update the client application logic to properly commit the
transaction.
There are several utilities that I think will be helpful here,
if you can install mtr and the tcpdump utilities to
monitor the packets during these connection-increasing events.
It is strongly recommended to enable the general_log in the
database flags. Another suggestion is to also enable the slow_query
database flag and output to a file. Also have a look at this
GitHub issue comment and go through the list of additional
solutions proposed for this issue here

This error message indicates a connection issue, either because your application doesn't terminate connections properly or because of a network issue.
As suggested in these troubleshooting steps for MySQL or PostgreSQL instances from the GCP docs, you can start debugging by checking that you follow best practices for managing database connections.

Can Winsock connections randomly fail?

I have a blocking client/server connected locally via Winsock. The client uses firefox to retrieve data from websites, passing certain data along to the server for extra processing. The server always responds, and the processing can take anywhere from 1/10th second to a few minutes. The client has no winsock connection to anything but the server; all web data is retrieved to hard-drive via firefox.
This setup works quite well until, seemingly randomly, the client's recv returns -1 (SOCKET_ERROR) with error code 10054 (WSAECONNRESET). This means the server supposedly terminated connection, but the server is actually still waiting to recv as if nothing is wrong. The connection has failed in this way as early as 5 minutes in or after working for as long as about an hour and a half. The client sends about 10 different types of requests to the server, and failure has occurred on a variety of them. The frequency of requests is roughly constant, probably an average of 10-15 a minute. When the connection breaks, neither computer experiences internet problems and remote desktop does not disconnect.
Initially I thought memory leaks, but after extensive debugging I am reasonably certain no more exist. Firefox is engaged in considerable HTTP traffic at times, so I thought maybe that could be filling available socket bufferspace or something -- seems doubtful but at this point I'm really not sure. So, could it be more memory leaks, maybe a hidden buffer overrun, too much web traffic? What is causing my Winsock app to randomly fail?

Sounds like a firewall at work.
Many firewalls are configured to terminate idle connections (i.e. open TCP sessions on which no data is transferred for awhile). Especially if it's an HTTP connection, which are typically not persistent.

TCP/IP and designing networking application

i'm reading about way to implemnt client-server in the most efficient manner, and i bumped into that link :
http://msdn.microsoft.com/en-us/library/ms740550(VS.85).aspx
saying :
"Concurrent connections should not exceed two, except in special purpose applications. Exceeding two concurrent connections results in wasted resources. A good rule is to have up to four short lived connections, or two persistent connections per destination "
i can't quite get what they mean by 2... and what do they mean by persistent?
let's say i have a server who listens to many clients , whom suppose to do some work with the server, how can i keep just 2 connections open ?
what's the best way to implement it anyway ? i read a little about completion port , but couldn't find a good examples of code, or at least a decent explanation.
thanks

Did you read the last sentence:
A good rule is to have up to four
short lived connections, or two
persistent connections per
destination.
Hard to say from the article, but by destination I think they mean client. This isn't a very good article.

A persistent connection is where a client connects to the server and then performs all its actions without ever dropping the connection. Even if the client has periods of time when it does not need the server, it maintains its connection to the server ready for when it might need it again.
A short lived connection would be one where the client connects, performs its action and then disconnects. If it needs more help from the server it would re-connect to the server and perform another single action.
As the server implementing the listening end of the connection, you can set options in the listening TCP/IP socket to limit the number of connections that will be held at the socket level and decide how many of those connections you wish to accept - this would allow you to accept 2 persistent connections or 4 short lived connections as required.

What they mean by, "persistent," is a connection that is opened, and then held open. It's pretty common problem to determine whether it's more expensive to tie up resources with an "always on" connection, or suffer the overhead of opening and closing a connection every time you need it.
It may be worth taking a step back, though.
If you have a server that has to listen for requests from a bunch of clients, you may have a perfect use case for a message-based architecture. If you use tightly-coupled connections like those made with TCP/IP, your clients and servers are going to have to know a lot about each other, and you're going to have to write a lot of low-level connection code.
Under a message-based architecture, your clients could place messages on a queue. The server could then monitor that queue. It could take messages off the queue, perform work, and place the responses back on the queue, where the clients could pick them up.
With such a design, the clients and servers wouldn't have to know anything about each other. As long as they could place properly-formed messages on the queue, and connect to the queue, they could be implemented in totally different languages, and run on different OS's.
Messaging-oriented-middleware like Apache ActiveMQ and Weblogic offer API's you could use from C++ to manage and use queues, and other messaging objects. ActiveMQ is open source, and Weblogic is sold by Oracle (who bought BEA). There are many other great messaging servers out there, so use these as examples, to get you started, if messaging sounds like it's worth exploring.

I think key words are "per destination". Single tcp connection tries to accelerate up to available bandwidth. So if you allow more connections to same destination, they have to share same bandwidth.
This means that each transfer will be slower than it could be and server has to allocate more resources for longer time - data structures for each connection.
Because establishing tcp connection is "time consuming", it makes sense to allow establish second connection in time when you are serving first one, so they are overlapping each other. for short connections setup time could be same as for serving the connection itself (see poor performance example), so more connections are needed for filling all bandwidth effectively.
(sorry I cannot post hyperlinks yet)
here msdn.microsoft.com/en-us/library/ms738559%28VS.85%29.aspx you can see, what is poor performance.
here msdn.microsoft.com/en-us/magazine/cc300760.aspx is some example of threaded server what performs reasonably well.
you can limit number of open connections by limiting number of accept() calls. you can limit number of connections from same source just by canceling connection when you find out, that you allready have more then two connections from this location (just count them).
For example SMTP works in similar way. When there are too many connections, it returns 4xx code and closes your connection.
Also see this question:
What is the best epoll/kqueue/select equvalient on Windows?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js