Console output in multi-threaded applications - C++

When developing applications, I usually print to the console to get useful debugging/tracing information. The application I am working on now is multi-threaded, and sometimes I see the output of my printf calls overlapping.
I tried synchronizing console access with a mutex, but I ended up slowing down and blocking the app. How can I solve this?
I am aware of multi-threaded logging libraries, but since I log so much, using them slows my app down a bit.
I was thinking of the following idea: instead of logging within my application, why not log outside of it? I would send the logging information over a socket to a second process that actually prints it on the screen.
Are you aware of any library already doing this?
I use Linux/gcc.
thanks
afg

You have 3 options. In increasing order of complexity:
1. Use a single, simple mutex, shared by all threads, around each print.
2. Send all the output to a single thread that does nothing but the logging.
3. Send all the output to a separate logging application.
Under most circumstances, I would go with #2. #1 is fine as a starting point, but in all but the most trivial applications you can run into problems serializing the application. #2 is still very simple, and simple is a good thing, but it is also quite scalable. You still end up doing the processing in the main application, but for the vast majority of applications you gain nothing by spinning this off to its own dedicated application.
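For illustration, here is a minimal C++11 sketch of option #2 (the names and structure are mine, not from any particular library): producer threads enqueue formatted lines under a brief lock, and a single worker thread owns the console.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class Logger {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::string> q;
    bool done = false;
    std::thread worker;   // declared last, so it starts after the other members

    void run() {
        std::unique_lock<std::mutex> lk(m);
        while (!done || !q.empty()) {
            cv.wait(lk, [this] { return done || !q.empty(); });
            while (!q.empty()) {
                std::string line = std::move(q.front());
                q.pop();
                lk.unlock();                        // print outside the lock
                std::fputs(line.c_str(), stdout);
                lk.lock();
            }
        }
    }

public:
    Logger() : worker(&Logger::run, this) {}
    ~Logger() {
        { std::lock_guard<std::mutex> lg(m); done = true; }
        cv.notify_one();
        worker.join();                              // flushes remaining lines
    }
    // Called from any thread; only the queue push is serialized.
    void log(std::string line) {
        { std::lock_guard<std::mutex> lg(m); q.push(std::move(line)); }
        cv.notify_one();
    }
};
```

Producers pay only for a quick queue push (e.g. `logger.log("frame 42 rendered\n");`), so the console I/O itself never serializes the worker threads.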
Number 3 is what you're going to do in performance-critical server-type applications, but the minimal performance gain you get with this approach is (1) very difficult to achieve, (2) very easy to screw up, and (3) not the only or even the most compelling reason people take this approach. Rather, people typically take it when they need the logging service to be separated from the applications using it.

Which OS are you using?
Not sure about specific libraries, but one of the classical approaches to this sort of problem is to use a logging queue, worked by a writer thread whose job is purely to write the log file.
You need to be aware, with either a threaded or a multi-process approach, that the write queue may back up, meaning it needs to be managed, either by discarding entries or by slowing down your application (which is obviously easier with the threaded approach).
It's also common to have some way of categorising your logging output, so that one section of your code can log at a high level whilst another logs at a much lower level. This makes it much easier to manage the amount of output being written to files, and offers you the option of shipping the code with the logging in it but turned off, so that it can be switched on for fault diagnosis once installed.
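In code, that categorisation can be as simple as a level threshold checked before anything is written; a minimal sketch (all names here are illustrative, not from a particular library):

```cpp
#include <cstdio>

// Illustrative log levels: lower value = more important.
enum class LogLevel { Error = 0, Warn, Info, Debug };

// Runtime threshold: ship with Warn, raise to Debug for fault diagnosis.
static LogLevel g_threshold = LogLevel::Warn;

#define LOG(level, msg) \
    do { if ((level) <= g_threshold) std::fprintf(stderr, "%s\n", (msg)); } while (0)

// Usage: LOG(LogLevel::Debug, "parser entered state 3");  // dropped at Warn
```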

As far as I know, a critical section (on Windows) has less overhead than a mutex; see the documentation on critical sections and their usage.
If you use gcc, you could also use atomic accesses.
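Atomics won't stop whole printf lines from interleaving, but they do suit the counters and flags around your logging. A small sketch using C++11 std::atomic (gcc also exposes the equivalent __sync/__atomic built-ins):

```cpp
#include <atomic>

// A relaxed atomic counter: safe to bump from any thread, no mutex needed.
std::atomic<long> g_lines_logged{0};

void count_log_line() {
    g_lines_logged.fetch_add(1, std::memory_order_relaxed);
    // gcc built-in equivalent on a plain long: __sync_fetch_and_add(&n, 1);
}
```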

Frankly, a mutex is the only way you really want to do this, and it's always going to be slow in your case because you're using so many print statements. So the answer to your question is: don't use so many printf statements; that's your problem to begin with.
Okay, so your current solution puts the mutex around the print itself? Instead, put the mutex around a message queue that another thread drains and prints; that still has a potential bottleneck, but I think it will be faster. In other words, use an active logging thread that waits for incoming messages to print. The networking solution could work too, but it requires more work; try this first.

What you can do is have one queue per thread, and have the logging thread routinely go through each of these and post the messages somewhere.
This is fairly easy to set up, and the amount of contention can be very low (just a pointer swap or two, which can be done without locking anything).
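A sketch of that idea (names are mine; a fully lock-free variant would atomically exchange buffer pointers, but even this mutexed swap holds the lock only for an instant):

```cpp
#include <mutex>
#include <string>
#include <vector>

// One of these per producer thread. Producers append to their own buffer;
// the logging thread steals the whole buffer in a single swap.
struct ThreadLog {
    std::mutex m;
    std::vector<std::string> buf;

    void push(std::string line) {          // called by the owning thread
        std::lock_guard<std::mutex> lk(m);
        buf.push_back(std::move(line));
    }

    std::vector<std::string> drain() {     // called by the logging thread
        std::vector<std::string> out;
        std::lock_guard<std::mutex> lk(m);
        out.swap(buf);                      // contention is just this swap
        return out;
    }
};
```

The logging thread loops over every ThreadLog, drains each one, and writes the collected lines out at its leisure.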

Thread per connection vs Reactor pattern (with a thread pool)?

I want to write a simple multiplayer game as part of my C++ learning project.
So I thought, since I am at it, I would like to do it properly, as opposed to just getting-it-done.
If I understood correctly: Apache uses a thread-per-connection architecture, while nginx uses an event loop and then dedicates a worker to each incoming connection. I guess nginx is wiser, since it supports a higher concurrency level. Right?
I have also come across this clever analogy, but I am not sure if it applies to my situation. The analogy also seems very idealistic; I have rarely seen my computer run at 100% CPU (even with an umptillion Chrome tabs open, plus Photoshop and what-not running simultaneously).
Also, I have come across an SO post (somehow it vanished from my history) where a user asked how many threads they should use, and one of the answers was that it's perfectly acceptable to have around 700, or even up to 10,000 threads. That question was about the JVM, though.
So, let's estimate a fictional user base of around 5,000 users. Which approach would be the "most concurrent" one?
1. A reactor pattern running everything in a single thread.
2. A reactor pattern with a thread pool (approximately how big do you suggest the thread pool should be?).
3. Creating a thread per connection and then destroying the thread when the connection closes.
I admit option 2 sounds like the best solution to me, but I am very green in all of this, so I might be a bit naive and missing some obvious flaw. Also, it sounds like it could be fairly difficult to implement.
PS: I am considering using POCO C++ Libraries. Suggesting any alternative libraries (like boost) is fine with me. However, many say POCO's library is very clean and easy to understand. So, I would preferably use that one, so I can learn about the hows of what I'm using.
Reactive applications certainly scale better when they are written correctly. This means:
Never blocking in a reactive thread:
Any blocking will seriously degrade the performance of your server. You typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
No mutexes, since these can block; that means no shared mutable state. If you require shared state, you will have to wrap it in an actor or similar so that only one thread has access to it.
All work in the reactive threads should be CPU-bound.
All I/O has to be asynchronous, or be performed in a different thread pool with the results fed back into the reactor.
This means using either futures or callbacks to process replies; this style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small
To maintain the responsiveness of the server, all tasks in the reactor must be small (bounded in time).
On an 8-core machine you cannot allow 8 long tasks to arrive at the same time, because no other work will start until they are complete.
If a task could take a long time, it must be broken up (cooperative multitasking).
Tasks in reactive applications are scheduled by the application not the operating system, that is why they can be faster and use less memory. When you write a Reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.
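To make that concrete, here is a toy single-threaded reactor (illustrative only): tasks are plain callables, and a long job cooperates by re-posting its continuation instead of hogging the loop.

```cpp
#include <deque>
#include <functional>

std::deque<std::function<void()>> g_tasks;

void post(std::function<void()> t) { g_tasks.push_back(std::move(t)); }

// The reactor loop: the application, not the OS, decides what runs next.
void run() {
    while (!g_tasks.empty()) {
        auto task = std::move(g_tasks.front());
        g_tasks.pop_front();
        task();   // must be small and CPU-bound; never block in here
    }
}

// Cooperative multitasking: work on a bounded slice, then re-post the rest.
void process_range(int from, int end) {
    int to = (from + 1000 < end) ? from + 1000 : end;
    // ... process items [from, to) ...
    if (to < end) post([=] { process_range(to, end); });
}
```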
I am a big fan of reactive architectures, but they come with costs. I am not sure I would write my first C++ application as reactive; I normally try to learn one thing at a time.
If you decide to use a reactive architecture, use a good framework that will help you design and structure your code, or you will end up with spaghetti. Things to look for are:
What is the unit of work?
How easy is it to add new work? Can it only come in from an external event (e.g. a network request)?
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?
I cannot recommend a C++ library for this; I now do my server development in Scala and Akka, which provide all of this along with an excellent composable futures library that keeps the code clean.
Best of luck learning C++, and with whichever choice you make.
Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.
http://www.kegel.com/c10k.html
The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost.Thread plus Boost.Asio. You could also try the C++11 std::thread library and std::mutex (but Boost.Asio is better than mutexes in a lot of cases; just always call back on the same thread and you don't need protected regions). Stay away from std::future, because it's broken:
http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/
The optimal number of threads in the thread pool is one thread per CPU core: 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your thread-pool threads might sometimes call blocking operations.
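In C++11 terms, that sizing rule looks roughly like this (the size of the cushion is just an illustrative choice):

```cpp
#include <thread>

unsigned pool_size() {
    unsigned cores = std::thread::hardware_concurrency(); // may report 0
    if (cores == 0) cores = 8;  // fallback guess when the runtime can't tell
    return cores + 2;           // small cushion for occasionally blocking tasks
}
```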
FWIW, POCO has supported option 2 (ParallelReactor) since version 1.5.1.
I think option 2 is the best one. As for tuning the pool size, I think the pool should be adaptive: able to spawn more threads (up to some hard limit) and to remove excess threads in times of low activity.
As the analogy you linked to (and its comments) suggests, this is somewhat application-dependent. What you are building here is a game server, so let's analyze that.
Game servers (generally) do a lot of I/O and relatively little computation, so they are far from 100%-CPU applications.
On the other hand, they also usually change values in some database (a "game world" model). All players create reads and writes to this database, which is exactly the intersection problem in the analogy.
So while you may gain something from handling the I/O in separate threads, you will also lose something from having separate threads accessing the same database and waiting on its locks.
So either option 1 or 2 is acceptable in your situation. For scalability reasons, I would not recommend option 3.

C++ Server - To Thread or not to Thread?

I'm working on a game server, written in C++, and I'm trying to decide how many threads to use and which tasks to thread. The basic server skeleton consists of keyboard I/O and output to a console, accepting incoming connections, making outgoing connections, and doing the game "stuff".
What I'd like to know is which things should be given a separate thread. Should each connection have its own thread? I know this varies by project, but I would like it to support a pretty decent number of players (somewhere in the hundreds, if possible).
The standard answer should always be: Try it the simplest way first, and only look for ways to improve performance if the simple way isn't good enough. However, re-architecting a large C++ program can be a painful experience, so some guesses about performance in advance may be appropriate.
Theoretically, hundreds of threads are probably OK on modern machines. The NPTL implementation for Linux was tested with tens of thousands of threads, as I recall. If that's the easiest way for you to implement, it may be the right answer.
However, high-performance web servers and similar typically use event-driven models instead. Consider a library like libevent. I'm sure there are C++ libraries for the same purpose.
I personally believe that languages without first-class continuations, or at least coroutines, are poor choices for this kind of work, but the C language family is how we get work done today, so off we go. :-)
A good solution could be to use a thread pool.
The idea is to let the main thread dispatch connections evenly across a fixed number of threads.
With a good design, you can easily set the number of threads at runtime.
You can find more information here.
Creating more threads than you have CPU cores is not productive; adding too many threads decreases performance because of the time taken to switch between them.
For example, when compiling a large project (not exactly the same thing, but the reasoning holds in both cases), it's often recommended to use no more threads than the number of CPU cores + 1.
A very common technique is to have the game server run one thread that monitors several connections (i.e. sockets) using select. When data is available, grab it and enqueue it in a producer/consumer model for the game engine to pick up.
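A rough sketch of that network thread (POSIX select; `enqueue_for_engine` is a hypothetical hand-off into whatever producer/consumer queue you choose):

```cpp
#include <cstddef>
#include <sys/select.h>
#include <unistd.h>
#include <vector>

// Hypothetical hand-off into the game engine's producer/consumer queue.
void enqueue_for_engine(int fd, const char* data, size_t len);

void network_loop(const std::vector<int>& sockets) {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        int maxfd = -1;
        for (int fd : sockets) {
            FD_SET(fd, &readable);
            if (fd > maxfd) maxfd = fd;
        }
        // Block until at least one socket has data.
        if (select(maxfd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
            continue; // interrupted or error; real code would inspect errno
        for (int fd : sockets) {
            if (FD_ISSET(fd, &readable)) {
                char buf[4096];
                ssize_t n = read(fd, buf, sizeof buf);
                if (n > 0)
                    enqueue_for_engine(fd, buf, static_cast<size_t>(n));
                // n == 0 means the peer disconnected; handle that here
            }
        }
    }
}
```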
This is by no means the be-all-end-all implementation, but it should be enough to get you started. Sounds like a cool project. Good luck!
If you set up the connections and use them in a manner that causes the thread to block waiting on I/O, then you should be able to service all of the connections and the keyboard on one thread. You may not want to put the console output on that same thread, as I've seen cases (on Windows, at least) where the speed of writing to the console is actually a bottleneck (i.e. if the console window is minimized, the process runs considerably faster).
If the work of your game engine parallelizes well, then you probably want to use as many threads as there are CPUs, less one (for the OS and the other two threads). If you expect the client to run on the same machine, the server will want to detect that and scale back the number of threads it uses.

VNC viewer implementation

Our team is implementing a VNC viewer (=VNC client) on Windows. The protocol (called RFB) is stateful, meaning that the viewer has to read 1 byte, see what it is, then read either 3 or 10 bytes more, parse them, and so on.
We've decided to use asynchronous sockets and a single (UI) thread. Consequently, there are 2 ways to go:
1) State machine: if we get a block on a socket read, just remember the current state and quit. Later on, a socket notification will arrive and the interrupted logic will resume from the proper stage.
2) Inner message loop: once we determine that reading from the socket would block, we enter an inner message loop and spin there until all the necessary data is finally received.
Either way, the UI is not frozen while a read blocks.
As experience showed, the second approach is bad, because any message can arrive while we're in the inner message loop. I cannot tell the full story here, but it simply is not reliable enough. Crashes and kludges.
The first option seems to be quite acceptable, but it is not easy to program in such a style. One has to remember the state of an algorithm and values of all the local variables required for further processing.
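For what it's worth, the state often compresses to less than it first appears; progress lives in a few members rather than in local variables. A sketch of the pattern (the non-blocking read and the message lengths are illustrative stand-ins, not real RFB details):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-blocking read: returns bytes delivered, 0 if it would block.
size_t try_read(unsigned char* dst, size_t want);

class RfbReader {
    enum State { WantType, WantBody };
    State state = WantType;
    unsigned char type = 0;
    std::vector<unsigned char> body;
    size_t have = 0, need = 0;

    // Illustrative only; real RFB lengths depend on the message type.
    static size_t body_length_for(unsigned char t) { return t == 0 ? 3 : 10; }

public:
    // Call from each socket notification. Because progress is stored in
    // members, a read that would block simply returns and resumes next time.
    void on_readable() {
        for (;;) {
            if (state == WantType) {
                if (try_read(&type, 1) == 0) return;   // would block
                need = body_length_for(type);
                body.assign(need, 0);
                have = 0;
                state = WantBody;
            } else {
                have += try_read(&body[have], need - have);
                if (have < need) return;               // would block
                // ...parse and dispatch the complete message here...
                state = WantType;
            }
        }
    }
};
```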
It is quite possible to use multiple threads, but we thought the problems in that case would be even harder: synchronization of frame-buffer access, general multi-threading issues, etc. Moreover, even in that variant it seems necessary to use asynchronous sockets as well.
So, which way is best, in your opinion?
The problem is quite a general one: organizing asynchronous communication over stateful protocols.
Edit 1: We use C++ and MFC as UI framework.
I've done a few parallel computing projects and it seems that MPI (Message Passing Interface) might be helpful to your VNC project. You're probably not so interested in the parallel computing power provided by MPI, but you may want to use the simplified socket-like interface for asynchronous communication over a network.
http://www.open-mpi.org/
You can find other implementations of MPI and plenty of usage examples via Google.
Don't bother with CSocket; you'll move to CAsyncSocket in the end because of the extra control you get (interrupting, shutting down, etc.). I'd also recommend using a separate thread to manage the communication. It adds complexity, but keeping the UI responsive should be a top priority.
I think you will find that your design will be simplified greatly by using a separate thread to handle a blocking socket.
The main reason for this is that you don't need to spin and wait. The UI remains responsive, while the network thread blocks when it has nothing to do and wakes up when it does. You are effectively offloading a large portion of your overhead to the OS.
Remember, RFB does not require a whole lot of state information to work, because client-to-server messages are short; there is nothing requiring you to receive a framebuffer update before you send your next pointer input.
My point is that messages in RFB can be intermixed; the server will work on your schedule.
Now, Windows provides easy-to-use synchronization APIs that, while not always the most efficient, are more than enough for your purposes and will make it easy to get a proof of concept going.
Take a look at Windows Synchronization, and specifically critical sections.
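A minimal example of the critical-section route (standard Win32 calls; the logging function is just for illustration):

```cpp
#include <windows.h>
#include <cstdio>

// Critical sections are cheaper than kernel mutexes under low contention:
// the uncontended path is a user-mode interlocked operation, not a syscall.
CRITICAL_SECTION g_cs;

void init_logging()  { InitializeCriticalSection(&g_cs); }
void close_logging() { DeleteCriticalSection(&g_cs); }

void log_line(const char* msg) {
    EnterCriticalSection(&g_cs);
    std::printf("%s\n", msg);    // only one thread prints at a time
    LeaveCriticalSection(&g_cs);
}
```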
Just my 2 cents; I've implemented both a VNC server and a client on Windows, and these were my impressions.

Asynchronous thread-safe logging in C++

I'm looking for a way to do asynchronous, thread-safe logging in my C++ project, ideally to one file. I'm currently using cerr and clog for the task, but since they are synchronous, execution pauses briefly every time something is logged. It's a relatively graphics-heavy app, so this kind of thing is quite annoying.
The new logger should use asynchronous I/O to get rid of these pauses. Thread-safety would also be desirable as I intend to add some basic multithreading soon.
I considered a one-file-per-thread approach, but that seemed like it would make managing the logs a nightmare. Any suggestions?
I noticed this thread is over a year old, but maybe the asynchronous logger I wrote could be of interest.
http://www.codeproject.com/KB/library/g2log.aspx
G2log uses a protected message queue to forward log entries to a background worker that does the slow disk accesses.
I have tried it with a lock-free queue, which increased the average time for a LOG call but decreased the worst-case time; however, I am using the protected queue now, as it is cross-platform. It's tested on Windows/Visual Studio 2010 and Ubuntu 11.10/gcc 4.6.
It's released as public domain so you can do with it what you want with no strings attached.
This is VERY possible and practical. How do I know? I wrote exactly that at my last job. Unfortunately (for us), they now own the code. :-) Sadly, they don't even use it.
I intend on writing an open source version in the near future. Meanwhile, I can give you some hints.
I/O manipulators are really just function names. You can implement them for your own logging class so that your logger is cout/cin compatible.
Your manipulator functions can tokenize the operations and store them into a queue.
A thread can be blocked on that queue waiting for chunks of log to come flying through. It then processes the string operations and generates the actual log.
This is intrinsically thread compatible since you are using a queue. However, you still would want to put some mutex-like protection around writing to the queue so that a given log << "stuff" << "more stuff"; type operation remains line-atomic.
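A sketch of that line-atomic trick (the names are mine): each statement accumulates into a private stream, and only the finished line is enqueued, from the temporary's destructor.

```cpp
#include <sstream>
#include <string>

// Hypothetical hand-off to the queue drained by the logging thread.
void enqueue_log_line(std::string line);

class LogLine {
    std::ostringstream os;
public:
    template <typename T>
    LogLine& operator<<(const T& value) { os << value; return *this; }

    // The whole statement is enqueued at once, so concurrent statements
    // may interleave between lines but never within one.
    ~LogLine() { enqueue_log_line(os.str()); }
};

// Usage: LogLine() << "stuff " << 42 << " more stuff";
```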
Have fun!
I think the proper approach is not one file per thread, but one thread per file. If any one file (or resource in general) in your system is only ever accessed by one thread, thread-safe programming becomes much easier.
So why not make Logger a dedicated thread (or several threads, one per file, if you're logging different things to different files)? In all other threads, writing to the log would place the message on the input queue of the appropriate Logger thread, which gets to it after it finishes writing the previous message. All it takes is a mutex to protect the queue from adding an event while the Logger is removing one, and a condvar for the Logger to wait on when its queue is empty.
Have you considered using a logging library?
There are several available; I discovered Pantheios recently, and it really seems quite incredible.
It's more of a front-end logger; you can customize which back-end is used. It can interact with ACE or log4cxx, for example, and it seems really easy to use and configure. The main advantage is that it uses type-safe operators, which is always great.
If you just want a barebone logging library:
ACE
log4c*
Boost.Log
Pick any :)
I should note that it's possible to implement lock-free queues in C++ and that they are great for logging.
I had the same issue, and I believe I have found the perfect solution. I present to you a single-header library called loguru: https://github.com/emilk/loguru
It's simple to use, portable, configurable, macro-based, and by default doesn't #include anything (for those sweet, sweet compile times).

Detecting application hang

I have a very large, complex (million+ LOC) Windows application written in C++. We receive a handful of reports every day that the application has locked up, and must be forcefully shut down.
While we have extensive crash reporting in place, I would like to expand it to cover these hang scenarios; even with heavy logging, we have not been able to track down root causes for some of them. We can clearly see where activity stopped, but not why it stopped, even after evaluating the output of all threads.
The problem is detecting when a hang occurs. So far, the best I can come up with is a watchdog thread (as we have evidence that background threads are continuing to run w/out issues) which periodically pings the main window with a custom message, and confirms that it is handled in a timely fashion. This would only capture GUI thread hangs, but this does seem to be where the majority of them are occurring. If a reply was not received within a configurable time frame, we would capture a memory and stack dump, and give the user the option of continuing to wait or restarting the app.
Does anyone know of a better way to do this than such a periodic polling of the main window in this way? It seems painfully clumsy, but I have not seen alternatives that will work on our platforms -- Windows XP, and Windows 2003 Server. I see that Vista has much better tools for this, but unfortunately that won't help us.
Suffice it to say that we have done extensive diagnostics on this and have been met with only limited success. Note that attaching windbg in real-time is not an option, as we don't get the reports until hours or days after the incident. We would be able to retrieve a memory dump and log files, but nothing more.
Any suggestions beyond what I'm planning above would be appreciated.
The answer is simple: SendMessageTimeout!
Using this API you can send a message to a window and wait for a timeout before continuing; if the application responds before the timeout, it is still running; otherwise, it is hung.
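A sketch of a watchdog check built on it (WM_NULL is the conventional no-op message for this; the timeout value is arbitrary):

```cpp
#include <windows.h>

// Returns true if the window's owning thread pumped a reply in time.
bool gui_thread_responding(HWND mainWindow, DWORD timeoutMs = 5000) {
    DWORD_PTR result = 0;
    LRESULT ok = SendMessageTimeout(mainWindow, WM_NULL, 0, 0,
                                    SMTO_ABORTIFHUNG, timeoutMs, &result);
    return ok != 0;   // zero means timeout or failure: likely hung
}
```

The watchdog thread can call this periodically and trigger the memory/stack dump when it fails.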
One option is to run your program under your own "debugger" all the time. Some programs, such as GetRight, do this for copy protection, but you can also do it to detect hangs. Essentially, you include in your program some code to attach to a process via the debugging API and then use that API to periodically check for hangs. When the program first starts, it checks if there's a debugger attached to it and, if not, it runs another copy of itself and attaches to it - so the first instance does nothing but act as the debugger and the second instance is the "real" one.
How you actually check for hangs is another whole question, but with access to the debugging API there should be some way to check reasonably efficiently whether the stack has changed or not (i.e. without loading all the symbols). Still, you might only need to do this every few minutes or so, so even if it's not efficient it might be OK.
It's a somewhat extreme solution, but should be effective. It would also be quite easy to turn this behaviour on and off - a command-line switch will do or a #define if you prefer. I'm sure there's some code out there that does things like this already, so you probably don't have to do it from scratch.
A suggestion:
Assuming that the problem is due to locking, you could dump your mutex & semaphore states from a watchdog thread. With a little bit of work (tracing your call graph), you can determine how you've arrived at a deadlock, which call paths are mutually blocking, etc.
While crash-dump analysis seems to provide a way of identifying the problem, in my experience it rarely bears much fruit, since it lacks sufficient unambiguous detail about what happened just before the crash. Even with the tool you propose, it would provide little more than circumstantial evidence of what happened. I bet the cause is unprotected shared data, so a lock trace wouldn't show it.
The most productive way of finding this, in my experience, is distilling the application's logic to its essence and identifying where conflicts must be occurring. How many threads are there? How many are GUI? At how many points do the threads interact? Yep, this is good old desk checking. Leading suspect interactions can be identified in a day or two; then it's just a matter of convincing a small group of skeptics that the suspect interaction is real.