I am having a problem with the gRPC C++ client making calls against Google Cloud Bigtable. The calls occasionally hang, and they only return if a call deadline is set. There is an issue filed with the gRPC team: https://github.com/grpc/grpc/issues/6278, with a stack trace and a piece of the gRPC tracing log provided.
The call that hangs most often is the ReadRows streaming read call. I have seen the MutateRow call hang a few times as well, but that is rather rare.
gRPC tracing shows that some response is coming back from the server; however, that response seems to be insufficient for the gRPC client to proceed.
I have spent a fair amount of time debugging the code; no obvious problems have been found on the client side so far, and no memory corruption has been seen. This is a single-threaded application making one call at a time, so client-side concurrency is not a suspect. The client runs on a Google Compute Engine box, so the network is likely not an issue either. The gRPC client is kept up to date with the GitHub repository mainline.
Any suggestions would be appreciated, and debugging ideas would be great as well. Valgrind, gdb, and reducing the application to a subset with reproducible results have not helped so far; I have not been able to find out what the problem is. It is random and shows up only occasionally.
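For reference, this is roughly how the hanging call is made and how the deadline is set (a simplified sketch; the stub and message names follow the Bigtable v2 proto, and the 30-second deadline is arbitrary):

```cpp
#include <chrono>
#include <grpcpp/grpcpp.h>  // older trees spell this <grpc++/grpc++.h>
#include "google/bigtable/v2/bigtable.grpc.pb.h"  // generated from the proto

// `stub` is a std::unique_ptr<google::bigtable::v2::Bigtable::Stub> and
// `request` a prepared ReadRowsRequest; both assumed to exist already.
void ReadWithDeadline(google::bigtable::v2::Bigtable::Stub* stub,
                      const google::bigtable::v2::ReadRowsRequest& request) {
    grpc::ClientContext context;
    context.set_deadline(std::chrono::system_clock::now() +
                         std::chrono::seconds(30));
    auto reader = stub->ReadRows(&context, request);
    google::bigtable::v2::ReadRowsResponse response;
    while (reader->Read(&response)) {
        // Process the chunk. This Read() is where the hang is observed;
        // without the deadline it never returns.
    }
    grpc::Status status = reader->Finish();
}
```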
Additional note on May 17, 2016
There was a suggestion that retries may help deal with the issue.
Unfortunately, retries do not work very well for us because we would have to carry them over into the application logic. We can easily retry updates, i.e. MutateRow calls, and we do; these are not streaming calls and are easy to retry. However, once the application has begun iterating over the results of a DB query, if the call fails, retrying means the application must re-issue the query and start iterating over the results from the beginning, which is problematic. We could consider a change that makes our applications read the whole result set at once, so that iteration happens in memory at the application level and retries can be handled there. But that is problematic for all kinds of reasons, such as memory footprint and application latency: we want to process DB query results as soon as they arrive, not once all of them are in memory. There is also the timeout added to the call latency whenever a call hangs. So retries of query-result iteration are costly to such a degree that they are not practical.
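For what it's worth, retrying the unary calls is the easy part; a minimal sketch of a MutateRow retry wrapper (the attempt count and per-attempt deadline are illustrative):

```cpp
#include <chrono>
#include <grpcpp/grpcpp.h>
#include "google/bigtable/v2/bigtable.grpc.pb.h"  // generated from the proto

// Each attempt needs a fresh ClientContext: contexts are single-use.
grpc::Status MutateRowWithRetry(
        google::bigtable::v2::Bigtable::Stub& stub,
        const google::bigtable::v2::MutateRowRequest& request,
        google::bigtable::v2::MutateRowResponse* response) {
    constexpr int kMaxAttempts = 3;
    grpc::Status status;
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        grpc::ClientContext context;
        context.set_deadline(std::chrono::system_clock::now() +
                             std::chrono::seconds(10));
        status = stub.MutateRow(&context, request, response);
        if (status.ok()) break;  // a real version would also stop on
                                 // non-retriable status codes
    }
    return status;
}
```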
We've experienced hanging issues with gRPC in various languages. The gRPC team is investigating.
Related
I have a fairly complex application with heavy IO: it includes ffmpeg and does video transcoding (in software; no hardware acceleration is available).
This might be unimportant, but I wanted to mention it.
All video-transcoding functions run on their own std::thread and use libev for IO management.
Hardware details:
CPU architecture is ARM
OS is Debian.
Now, I'm trying to use ping to check whether a particular server is available. I took ping's open-source code and included it in the application. This runs on a completely separate std::thread.
Problem:
If no video transcoding is in progress, the ping function works as intended.
However, when a CPU-intensive transcode is running, the ping function returns with a timeout 99% of the time.
I suspect some kind of IO starvation, so I dug into ping's source:
I found that ping uses the old select call to detect whether I/O is available. I was almost sure this was causing the problem, so I refactored the code: I dropped select and put libev into action.
Unfortunately, the starvation stays the same.
I had almost accepted this, since the video transcoding really does put a huge load on the CPU (70-80%).
But if I run my application in SSH session #A and run ping from another SSH session #B, my application does the transcoding and there is not a single lost ping packet in session #B.
Conclusion:
The hardware seems capable of running my heavy application in parallel with ping.
Question:
I am really curious about two things:
1. Is there a per-process limit on Linux on how many IO operations a process can perform? (I assume there is, but how can I find this limit? How can I raise it? How can I check current usage?)
2. If there is no problem with an IO limit, what else can cause this kind of starvation-like problem between std::threads? (Ping's thread does not seem to be blocked, since it does receive the timeout; it just never gets a free IO operation. More precisely: ping actually CAN send packets out, but the replies do not seem to arrive. I am almost sure the ping replies are coming back and that my application simply never gets a green light for the READ operation.)
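For reference, the only per-process IO limit I know how to inspect is the open-file-descriptor limit, via getrlimit/setrlimit (minimal sketch below; current usage can be counted under /proc/self/fd):

```cpp
#include <sys/resource.h>
#include <cstdio>

int main() {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        std::printf("fd limit: soft=%llu hard=%llu\n",
                    (unsigned long long)lim.rlim_cur,
                    (unsigned long long)lim.rlim_max);

    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        std::perror("setrlimit");
}
```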
I have a large C++ function on Linux that calls a whole lot of other functions, making up an algorithm. At various points, given certain bad inputs, the algorithm can get "stuck" and run forever. Adding a timeout seems appropriate, since all potential "stuck" points cannot be predicted. But despite scouring the Internet for timeout examples, I have only found how to apply a timeout when the thing being timed is a separate thread or is reading input. My code is single-threaded and does not touch file descriptors, so I have had no luck. Do I basically have no choice but to thread it?
I am not sure about your exact situation; server and embedded applications often run in the background for years without stopping. One option is to let your program run in the background and log to a file (or the screen) periodically. If you really want to stop the program after a certain time, you can use the timeout command or a script to kill it after that time, e.g. `timeout 15s your-prog`.
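If the kill has to happen from inside the program rather than from a wrapper, there is a minimal in-process equivalent (a sketch; `run_algorithm` is a hypothetical stand-in for your function, and SIGALRM's default action terminates the process):

```cpp
#include <unistd.h>

// Hypothetical stand-in for the real, possibly-stuck computation.
void run_algorithm() { /* ... */ }

int main() {
    // In-process equivalent of `timeout 15s your-prog`: the kernel delivers
    // SIGALRM after 15 seconds, and its default action terminates the
    // process even if the algorithm is stuck in an endless loop.
    alarm(15);
    run_algorithm();
    alarm(0);  // finished in time; cancel the pending alarm
}
```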
I have a server program that should run around the clock. If I want to change some of its parameters, is there any way other than shutting it down and restarting it?
There are quite a few ways of doing this, including, but almost certainly not limited to:
1. You can maintain the parameters in a separate file so that the program will periodically check that file and update its internal information.
2. Similar to (1), but you can send some sort of signal to the application to get it to immediately re-read the file (there is a sketch of this below).
3. You can do either (1) or (2) using shared memory rather than a configuration file.
4. You can have your program sit at the server end of an IPC conversation, so that a client can open a connection to it and provide new parameters. This could be anything from a simple message queue to a full-blown HTTP server and associated pages.
Of course, all of these tend to need a fair amount of work in your program to get it to look for the new information.
You should take that into account when making your decision. By far the quickest solution to implement is to just (cleanly) kill off the process at something like 11:55pm then immediately restart it. It's simpler because your code probably already has the ability to load the information on startup, so this could be a simple cron one-liner.
Some people speak of laziness as a bad thing but that's not always the case :-)
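As a concrete illustration of option (2), here is a minimal sketch of the signal-triggered re-read; the config path and the two helper functions are placeholders:

```cpp
#include <csignal>
#include <cstdio>

volatile std::sig_atomic_t reload_requested = 0;

extern "C" void on_sighup(int) { reload_requested = 1; }

// Hypothetical placeholders for the real loader and unit of work.
void load_config(const char* path) { std::printf("loading %s\n", path); }
void serve_one_request() { /* ... */ }

int main() {
    std::signal(SIGHUP, on_sighup);
    load_config("/etc/myserver.conf");  // illustrative path
    for (;;) {
        if (reload_requested) {          // triggered by `kill -HUP <pid>`
            reload_requested = 0;
            load_config("/etc/myserver.conf");
        }
        serve_one_request();
    }
}
```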
If the server maintains many live connections from clients, restarting the server process is the last option you should consider. Besides reloading configuration files, inserting a proxy process between the clients and the server is another way.
The proxy process is responsible for two things:
a. Maintaining the connections from clients and forwarding packets to the server for handling.
b. Judging whether the current server process (server A) is alive and, if it is not, switching to another server (server B) automatically.
Then you can change parameters by restarting a server without worrying about interrupting clients, since there are always two (or more) servers running (a minimal liveness check is sketched below).
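A minimal version of the liveness check in (b) can be a plain TCP connect attempt against the primary (a sketch; addresses and ports are illustrative):

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if a TCP connection to the given server can be established.
// The proxy would call this periodically and fail over when it returns false.
bool server_alive(const char* ip, std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return false;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    bool ok = connect(fd, reinterpret_cast<sockaddr*>(&addr),
                      sizeof(addr)) == 0;
    close(fd);
    return ok;
}

// Usage: forward to server A while it answers, otherwise to server B.
// const char* backend = server_alive("10.0.0.1", 9000) ? "10.0.0.1"
//                                                      : "10.0.0.2";
```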
I have a remote server that handles various commands, one of which is an event-fetching method.
The event fetch returns right away if there are one or more events in the queue ready for processing. If the event queue is empty, the method does not return until a timeout of a few seconds, so I don't run into any HTTP/socket timeouts. The moment an event becomes available, the method returns right away. This way the client only ever makes connections to the server, and the server never has to make connections to the client.
This event mechanism works nicely. I'm using the boost library to handle queues, event notifications, etc.
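For context, the fetch logic boils down to the following (a simplified sketch using the std equivalents of the Boost primitives I actually use; the five-second timeout and the string payload are placeholders):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

std::mutex m;
std::condition_variable cv;
std::queue<std::string> events;  // producers push here and notify `cv`

// Return immediately if events are queued; otherwise block up to 5 seconds.
std::vector<std::string> fetch_events() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait_for(lock, std::chrono::seconds(5),
                [] { return !events.empty(); });
    std::vector<std::string> out;
    while (!events.empty()) {
        out.push_back(events.front());
        events.pop();
    }
    return out;  // empty on timeout
}
```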
Here's the problem: while the server is holding back on returning from the event fetch method, I can't issue any other commands.
In the source, in XmlRpcDispatch.cpp, I see that the "work" method is a simple loop around a blocking call to "select".
It seems that while one method call is being handled, no other requests are processed.
Question: am I missing something, or is xmlrpc++ unable to handle multiple requests asynchronously? Does anyone know of a better XML-RPC library for C++? I don't suppose the Boost library has a component that lets me issue remote commands?
I actually don't care about the XML or the over-HTTP part; I simply need to issue (asynchronous) commands over TCP in any shape or form.
I look forward to any input anyone might offer.
I had some problems with XML-RPC as well, and investigated many solutions such as gSOAP and XMLRPC++, but in the end I gave up and wrote the whole HTTP+XML-RPC stack from scratch using Boost.ASIO and TinyXML++ (later I swapped TinyXML++ for expat). It wasn't really that much work; I did it myself in about a week, starting from scratch and ending up with many RPC calls fully implemented.
Boost.ASIO gave great results. It is, as its name says, totally async, with excellent performance and little overhead, which was very important to me because it was running in an embedded environment (MIPS).
Later, and this might be your case, I changed XML to Google's Protocol Buffers and was even happier. Its API, as well as its message containers, are all type-safe (i.e. you send an int or a float, and it never gets converted to a string and back, as is the case with XML), and once you get the hang of it, which doesn't take very long, it's a very productive solution.
My recommendation: if you can ditch XML, go with Boost.ASIO + protobuf. If you need XML: Boost.ASIO + expat.
Doing this stuff from scratch is really worth it.
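To give an idea of the shape of the code, here is a minimal asynchronous TCP send with Boost.ASIO (a sketch against a recent Boost, where io_context replaced the older io_service; the host, port, and command string are placeholders):

```cpp
#include <boost/asio.hpp>
#include <iostream>
#include <string>

using boost::asio::ip::tcp;

int main() {
    boost::asio::io_context io;
    tcp::socket socket(io);
    tcp::resolver resolver(io);

    // Connect, then write a command; the handler chain keeps the
    // io_context busy without ever blocking the calling thread.
    boost::asio::async_connect(
        socket, resolver.resolve("localhost", "5555"),
        [&](const boost::system::error_code& ec, const tcp::endpoint&) {
            if (ec) { std::cerr << "connect: " << ec.message() << "\n"; return; }
            // `static` keeps the buffer alive for the async operation.
            static const std::string cmd = "FETCH_EVENTS\n";
            boost::asio::async_write(
                socket, boost::asio::buffer(cmd),
                [](const boost::system::error_code& ec, std::size_t) {
                    if (ec) std::cerr << "write: " << ec.message() << "\n";
                });
        });

    io.run();  // runs until all outstanding async operations complete
}
```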
Given: multithreaded (~20 threads) C++ application under RHEL 5.3.
When testing under load, top shows CPU usage jumping around in the 10-40% range every second.
The design is mostly pretty simple: most of the threads implement the active object design pattern. Each thread has a thread-safe queue, requests from other threads are pushed onto the queue, and the thread polls the queue and processes incoming requests. A processed request causes a new request to be pushed to the next processing thread (roughly as in the sketch below).
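Roughly, each such thread looks like this (a stripped-down sketch of the pattern, not our actual code, which is ACE-based; shutdown handling is omitted):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Each stage owns a thread that drains its queue; other stages call post().
class ActiveObject {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    std::thread worker_{[this] { run(); }};

    void run() {
        for (;;) {  // note: no shutdown path; a real version would join
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !q_.empty(); });
                task = std::move(q_.front());
                q_.pop();
            }
            task();  // may post a follow-up request to the next stage
        }
    }

public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(task));
        }
        cv_.notify_one();
    }
};
```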
The process has several TCP/UDP connections, over each of which data is sent and received under high load.
I know I have not provided sufficient data. This is a pretty big application, and I'm not familiar with all of its parts. It was ported from Windows to Linux over the ACE library (used for the networking part).
Supposing the problem is in the application and not an external one, what techniques/tools/approaches can be used to discover it? For example, I suspect it may be caused by mutex contention.
I faced a similar problem some time back, and here are the steps that helped me.
1) Start with strace to see where the application is spending its time in system calls.
2) Use OProfile to profile both the application and the kernel.
3) If you are using an SMP system, look at the NUMA settings; in my case they caused havoc. /proc/<pid of your app>/numa_maps gives a quick look at how memory is being accessed; NUMA misses can cause the jumps you are seeing.
4) You mentioned TCP connections in your app. Check that the MTU size is set to the right value and, depending on the type of data being transferred, enable or disable Nagle's delay appropriately (see the sketch below).
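For (4), turning Nagle's algorithm off on a latency-sensitive connection is a single setsockopt call (a sketch; whether you want it on or off depends on your traffic pattern):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm so small writes are sent immediately instead
// of being coalesced; leave it enabled for bulk transfers.
void disable_nagle(int sock_fd) {
    int flag = 1;
    setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```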