gRPC C++ client blocking when attempting to connect channel on unreachable IP - c++

I'm trying to enhance some client C++ code using gRPC to support failover between 2 LAN connections.
I'm unsure if I found a bug in gRPC, or more likely that I'm doing something wrong.
Both the server and client machines are on the same network with dual LAN connections, I'll call LAN-A and LAN-B.
The server is listening on 0.0.0.0:5214, so accepts connections on both LANs.
I tried creating the channel on the client with both IPs, and using various load balancing options, ex:
string all_endpoints = "ipv4:172.24.1.12:5214,10.23.50.123:5214";
grpc::ChannelArguments args;
args.SetLoadBalancingPolicyName("pick_first");
_chan = grpc::CreateCustomChannel(all_endpoints,
grpc::InsecureChannelCredentials(),
args);
_stub = std::move(Service::NewStub(_chan));
When I startup the client and server with all LAN connections functioning, everything works perfectly. However, if I kill one of the connections or startup the client with one of the connections down, gRPC seems to be blocking forever on that subchannel. I would expect it to use the subchannel that is still functioning.
As an experiment, I implemented some code to only try to connect on 1 channel (the non-functioning one in this case), and then wait 5 seconds for a connection. If the deadline is exceeded, then we create a new channel and stub.
if(!_chan->WaitForConnected(std::chrono::system_clock::now() +
std::chrono::milliseconds(5000)))
{
lan_failover();
}
The stub is a unique_ptr so should be destroyed, the channel is a shared_ptr. What I see is that I can successfully connect on my new channel but when my code returns, gRPC ends up taking over and indefinitely blocking on what appears to be trying to connect on the old channel. I would expect gRPC would be closing/deleting this no longer used channel. I don't see any functions available in the cpp version that I can call on the channel or globally that would for the shutdown/closure of the channel.
I'm at a loss on how to get gRPC to stop trying to connect on failed channels, any help would be greatly appreciated.
Thanks!
Here is some grpc debug output I see when I startup with the first load balancing implementation I mention, and 1 of the 2 LANs is not functioning (blocking forever):
https://pastebin.com/S5s9E4fA

You can enable keepaliaves. Example usage: https://github.com/grpc/grpc/blob/master/test/cpp/end2end/flaky_network_test.cc#L354-L358

Just wanted to let anyone know the problem wasn't with gRCP, but the way our systems were configured with a SAN that was being written too. A SAN was mounted through the LAN connection I was using the test failover through and the process was actually blocking because it was trying to access that SAN. The stack trace was misleading because it showed the gRPC thread.

Related

Checking if a program is running on local network

I want to write a simple program in c++ that use tcp socket to communicate with the same program on another computer in lan.
To create the tcp socket I could make the user write the ip and the port to make the connection. But I also need to be able to autodetect in the local area network if there is any computer also running the program.
My idea was:
when the program is autodetecting for available connection in lan, it will send all ips a message via udp to a specific port, meanwhile it will also keep listening to a port waiting to eventual answer.
when the program on the other computer is opened for lan connection, it will keep listening to the a port in case another computer is trying to detect, then it will send also via udp the response messagee notifying the possibility of connection.
All the security system is another problem for which I don't need answer now.
// Client 1:
// Search for all ips in local network
// create udp socket
// send check message
// thread function listening for answers
// if device found than show to menu
// continue searching process
// Client 2 (host) :
// user enable lan connection
// create udp socket
// thread function listening for detection requests
// if request structure is right send back identification message
// continue listening for request
My question - Is there a more efficient or standard way to do something like that?
Testing whether another computer is listening on a given port is what hackers do all day to try to take over the world...
When writing a software like you describe, though, you want to specify the IP and port information. A reason to search and automatically find a device would be if you are implementing a printer, for example. In that case, as suggested by Hero, you could use broadcasting. However, in that case, you use UDP (because TCP does not support that feature).
The software on one side must have a server, which in TCP parlance means a listen() call followed by an accept() until a connection materialized.
The client on the other side can then attempt a connect(). If the connect works, then the software on the other side is up and running.
If you need both to be able to attempt a connection, then both must implement the client and server (which is doable if you use ppoll() [or the old select()] you know which event is happening and can act on it, no need for threads or fork()).
On my end, I wrote the eventdispatcher library to do all those things under the hood. I also want many computers to communicate between each others, so I have a form of RPC service I call communicatord. This service is at the same time a client and a server. It listens on a port and tries to connect to other systems. If the other system has a lower IP address, it is considered a server. Otherwise, it is viewed as a client and I disconnect after sending a GOSSIP message. That way the client (larger IP address) can in turn connect to the server. This communicator service allows all my other services to communicate without having to re-implement the communication layer between computer over and over again.

Client doesn't detect Server disconnection

In my application (c++) I have a service exposed as:
grpc foo(stream Request) returns (Reply) { }
The issue is that when the server goes down (CTRL-C) the stream on the client side keeps going indeed the
grpc::ClientWriter::Write
doesn't return false. I can confirm that using netstat I don't see any connection between the client and the server (apart a TIME_WAIT one that after a while goes away) and the client keeps calling that Write without errors.
Is there a way to see if the underlying connection is still up instead to rely on the Write return value ? I use grpc version 1.12
update
I discovered that the underlying channel goes in status IDLE but still the ClientWriter::Write doesn't report the error, I don't know if this is intended. During the streaming I'm trying now to reestablish a connection with the server every time the channel status is not GRPC_CHANNEL_READY
This could happen in a few scenarios but the most common element is a connection issue. We have KEEPALIVE support in gRPC to tackle exactly this issue. For C++, please refer to https://github.com/grpc/grpc/blob/master/doc/keepalive.md on how to set this up. Essentially, endpoints would send pings at certain intervals and expect a reply within a certain timeframe.

Apache Thrift: Terminate Connection from the Server

I am using thrift to provide an interface between a device and a management console. It is possible for there to be up to 4 active connections to the device at one time, and I have this working using a TThreadPool server.
The issue arises around client disconnections; If a client disconnects correctly, there is no issue, however if one does not (i.e. the client crashes out or doesn't call client->close()) then the server seems to keep that clients thread alive. This means that when the next connection attempt is made, the client hangs, as the server has used up its allocated thread pool so cannot service the new request.
I haven't been able to find any standard, public mechanism by which the server can stop, and hence free up, a clients thread if that client has not used the interface for a set time period?
Is there a standard way to facilitate this in thrift?
Set the receive/send timeout on the server socket might help. Server will close the connection on timeout.
https://github.com/apache/thrift/blob/129f332d72facda5d06f87e2b4e5e08bea0b6b44/lib/cpp/src/thrift/transport/TServerSocket.h#L103
void setSendTimeout(int sendTimeout);
void setRecvTimeout(int recvTimeout);

Winsock select function returning different values

I am working on project having client server architecture. select function returns different value in different scenarios Followings are the details
Scenario 1:
When i install my server at my machine, stop all the corresponding services, my client goes to DC state and now return value of select is 1 and read_mask.fd_count is also 1.
Scenario 2:
When i connect to remote server (abc.com) and disconnect my wireless connection. in this case the same function returns 0 also read_mask.fd_count is 0. i tried changing timeout variable value from ten ms to 50 sec. cant figure out the problem.
Any help will be appreciated
When you shot down the server you cause the network stack to shutdown the connection. Furtehr connection request are refused. The select indicates that there's something and the the recv() returns 0 to indicate forcibly closed.
When you pull the wireless cable out of the plug then the client gets neither the shutdown nor the connection request. You wait for any timeout to detect the not available server.
In a real world application you should implement a kind of heartbeat in you protocol that allows to detect the "disconnected state" in the second scenario.
Edit: If your Winsock implementation supports SO_KEEPALIVE_VALS, you can also configure this to detect the lost connectivity. See also: SO_KEEPALIVE.

Multithreaded Server Issue

I am writing a server in linux that is supposed to serve an API.
Initially, I wanted to make it Multi-threaded on a single port, meaning that I'd have multiple threads working on various request received on a single port.
One of my friends told me that it not the way it is supposed to work. He told me that when a request is received, I first have to follow a Handshake procedure, create a thread that is listening to some other port dedicated to the request and then redirect the requested client to the new port.
Theoretically, it's very interesting but I could not find any information on how to implement the handshake and do the redirection. Can someone help?
If I'm not wrong in interpreting your responses, once I create a multithreaded server with a main thread listening to a port, and creates a new thread to handle requests, I'm essentially making it multithreaded on a single port?
Consider the scenario where I get a large number of requests every second. Isn't it true that every request on the port should now wait for the "current" request to complete? If not, how would the communication still be done: Say a browser sends a request, so the thread handling this has to first listen to the port, block it, process it, respond and then unblock it.
By this, eventhough I'm having "multithreads" , all I'm using is one single thread at a time apart from the main thread because the port is being blocked.
What your friend told you is similar to passive FTP - a client tells the server that it needs a connection, the server sends back the port number and the client creates a data connection to that port.
But all you wanted to do is a multithreaded server. All you need is one server socket listening and accepting connections on a given port. As soon as the automatic TCP handshake is finished, you'll get a new socket from the accept function - that socket will be used for communication with the client that has just connected. So now you only have to create a new thread, passing that client socket to the thread function. In your server thread, you will then call accept again in order to accept another connection.
TCP/IP does the handshake, if you can't think of any reason to do a handshake than your application does not demand it.
An example of an application specific handshake could be for user authentication.
What your colleague is suggesting sounds like the way FTP works. This is not a good thing to do -- the internet these days is more or less used for protocols which use a single port, and having a command port is bad. One of the reasons is because statefull firewalls aren't designed for multi-port applications; they have to be extended for each individual application that does things this way.
Look at ASIO's tutorial on async TCP. There one part accept connections on TCP and spawns handlers that each communicate with a single client. That's how TCP-servers usually work (including HTTP/web, the most common tcp protocol.)
You may disregard the asynchronous stuff of ASIO if you're set on creating a thread per connection. It doesn't apply to your question. (Going fully async and have one worker-thread per core is nice, but it might not integrate well with the rest of your environment.)