Do Asynchronous Loggers really help in performance? - c++

We know that synchronous logging, writes the log message to the file and then continues to the program execution. Asynchronous loggers queues the log messages and writes them in a separate thread. I'm starting to implement Log4CPlus in my Project and couple of things came to my mind.
I can't initialize more LogObjects, because that will open more file handles and we don't need that. (I Know we should use Feature based logging objects, example for UploadLogObj,DownloadLogOb,WebReqLogObj,AuthLogObj,etc). Hope each and every addition of log object may increase logging threads too.
Still for argument sake, if i use a Single Log Object and push log messages from Multiple Threads, i suppose there must be some mutex lock to prevent writing to the message queue. My Question won't this mutex lock slow down the process, won't it create performance issue ..?
I'm just wondering how Asynchronous loggers work, i can look into the code, that's one way. But Hope the answers will be enlightening to a lot of people.

Yes, the mutex will slow down the process a bit, but if you are logging from multiple threads to the same destination you will need some form of synchronization anyway, since you don't want lines from different threads to be mixed up.
In the end it's a matter of deciding where to synchronize, not if. With asynchronous logging this happens when the object to be logged is pushed to the queue of the logging thread. In the synchronous case probably at the time the line is written (though it depends on the implementation).
In the first case the time spent inside the mutex will be much shorter and predictable, since no disk flushes happens while in the mutex. This means that you may have less performance degradation and better scaling than in the second case (plus the time that you didn't spend writing the actual data, because the other thread is taking care of it).
If you don't have a lot of threads competing for the mutex anyway it won't a problem. I had the chance to write and use an asynchronous logger for a real-time system some time ago, and we reached disk-bandwidth related issues long before sychronization issues.
One downside of asynchronous logging is more memory related: since you need to pass the data to be logged around you need to be careful and avoid unneeded allocations/deallocations.

Mutex lock takes something like 40-60 nanoseconds (if mutex is not locked by another thread) on modern hardware. This is nothing comparing to IO operation which is theoretically can write file to a slow HDD or network drive for a few seconds.
Lock-free is a different thing - in this case you don't even have mutexes. However, there is price for it - you'll have to write a more complicated code.

Related

Dumping threads' Log Data into Common buffer

My program has different threads and one common logging thread will be running.
All my threads have to dump some logging data into a buffer in logging thread. The logging thread in-turn will write into log file once the buffer reaches some size.
How can i write into the common buffer without affecting the performance of the running threads.? I am thinking of some way without much overhead instead of using mutex or any other sync mechanisms.
You may find that the performance penalties of using a mutex are low enough that it's just not worth the hassle of trying to make a Multi-producer, single consumer queue. However this question asks about such things and there are a few suggestions provided.

Does endless While loop take up CPU resources?

From what I understand, you write your Linux Daemon that listens to a request in an endless loop.
Something like..
int main() {
while(1) {
//do something...
}
}
ref: http://www.thegeekstuff.com/2012/02/c-daemon-process/
I read that sleeping a program makes it go into waiting mode so it doesn't eat up resources.
1.If I want my daemon to check for a request every 1 second, would the following be resource consuming?
int main() {
while(1) {
if (request) {
//do something...
}
sleep(1)
}
}
2.If I were to remove the sleep, does it mean the CPU consumption will go up 100%?
3.Is it possible to run an endless loop without eating resources? Say..if it does nothing but just loops itself. Or just sleep(1).
Endless loops and CPU resources is a mystery to me.
Is it possible to run an endless loop without eating resources? Say..if it does nothing but just loops itself. Or just sleep(1).
There ia a better option.
You can just use a semaphore, which remains blocked at the begining of loop and you can signal the semaphore whenever you want the loop to execute.
Note that this will not eat any resources.
The poll and select calls (mentioned by Basile Starynkevitch in a comment) or a semaphore (mentioned by Als in an answer) are the correct ways to wait for requests, depending on circumstances. On operating systems without poll or select, there should be something similar.
Neither sleep, YieldProcessor, nor sched_yield are proper ways to do this, for the following reasons.
YieldProcessor and sched_yield merely move the process to the end of the runnable queue but leave it runnable. The effect is that they allow other processes at the same or higher priority to execute, but, when those processes are done (or if there are none), then the process that called YieldProcessor or sched_yield continues to run. This causes two problems. One is that lower priority processes still will not run. Another is that this causes the processor to be always running, using energy. We would prefer the operating system to recognize when no process needs to be running and to put the processor into a low-power state.
sleep may permit this low-power state, but it plays a guessing game about how long it will be until the next request comes in, it wakes the processor repeatedly when there is no need, and it makes the process less responsive to requests, since the process will continue sleeping until the expiration of the requested time even if there is a request to be serviced.
The poll and select calls are designed for exactly this situation. They tell the operating system that this process wants to service a request coming in on one of its I/O channels but otherwise has no work to do. This allows the operating system to mark the process as not runnable and to put the processor in a low-power state if suitable.
Using a semaphore provides the same behavior, except that the signal to wake the process comes from another process raising the semaphore instead of activity arising in an I/O channel. Semaphores are suitable when the signal to do some work arrives in this way; simply use whichever of poll or a semaphore is more appropriate for your situation.
The criticism that poll, select, or a semaphore causes a kernel-mode call is irrelevant, because the other methods also cause kernel-mode calls. A process cannot sleep on its own; it has to call the operating system to request it. Similarly, YieldProcessor and sched_yield make requests to the operating system.
The short answer is yes -- removing sleep gives 100% CPU -- but the answer does depend on some additional details. It consumes all CPU it can get, unless...
The loop body is trivial, and optimised away.
The loop contains a blocking operation (like a file or network operation). The link you provide suggests to avoid this, but it is often a good idea to block until something relevant happens.
EDIT : For your scenario, I support the suggestion made by #Als.
EDIT 2: I expect this answer has received a -1 because I claim blocking operations can actually be a good idea. [If you -1, you should leave a motivation in a comment so that we all may learn something.]
Current popular thinking is that non-block (event-based) IO is good and blocking is bad. This view is oversimplified because it assumes all software that performs IO can improve throughput by using non-blocking operations.
What? Am I really suggesting that using non-blocking IO can actually reduce throughput? Yes it can. When a process serves a single activity it is actually better to use blocking IO because blocking IO only burns resources that have already been paid for in the existence of the process.
In contrast, non-blocking IO can carry a greater fixed overhead than simple blocking IO. If the process isn't able to supply additional IO that can be interleaved, then there is nothing gained by paying for non-blocking setup. (In practice, the greatest cost of innapropriate non-blocking IO is simply in the added code complexity. Beyond that, this topic is largely a thought exercise.)
Under blocking IO we rely upon the operating system to schedule those processes that can make progress. That's what the OS is designed to do.
Under non-blocking IO we have greater setup costs but can share the resources of the process and its threads between interleaved work. The non-blocking IO is therefor ideal for any process that serves multiple independent activities, such as a web server. The throughput gained is vastly superior to the fixed cost overheads of non-blocking IO.

fastest way to wake up a thread without using condition variable

I am trying to speed up a piece of code by having background threads already setup to solve one specific task. When it is time to solve my task I would like to wake up these threads, do the job and block them again waiting for the next task. The task is always the same.
I tried using condition variables (and mutex that need to go with them), but I ended up slowing my code down instead of speeding it up; mostly it happened because the calls to all needed functions are very expensive (pthread_cond_wait/pthread_cond_signal/pthread_mutex_lock/pthread_mutex_unlock).
There is no point in using a thread pool (that I don't have either) because it is a too generic construct; here I want to address only my specific task. Depending on the implementation I would also pay a performance penalty for the queue.
Do you have any suggestion for a quick wake-up without using mutex or con_var?
I was thinking in setup threads like timers reading an atomic variable; if the variable is set to 1 the threads will do the job; if it is set to 0 they will go to sleep for few microseconds (I would start with microsecond sleep since I would like to avoid using spinlocks that might be too expensive for the CPU). What do you think about it? Any suggestion is very appreciated.
I am using Linux, gcc, C and C++.
These functions should be fast. If they are taking a large fraction of your time, it is quite possible that you are trying to switch threads too often.
Try buffering up a work queue, and send the signal once a significant amount of work has accumulated.
If this is impossible due to dependencies between the tasks, then your application is not amenable to multithreading at all.
In order to gain performance in a multithreaded application, spawn as many threads as there are CPUs, not a separate thread for each task. Otherwise you end up with a lot of overhead from context switching.
You may also consider making your algorithm more linear (i.e. by using non-blocking calls).

Impact of hundreds of idle threads

I am considering the use of potentially hundreds of threads to implement tasks that manage devices over a network.
This is a C++ application running on a powerpc processor with a linux kernel.
After an initial phase when each task does synchronization to copy data from the device into the task, the task becomes idle, and only wakes up when it receives an alarm, or needs to change some data (configuration), which is rare after the start phase. Once all tasks reach the "idle" phase, I expect that only a few per second will need to wake.
So, my main concern is, if I have hundreds of threads will they have a negative impact on the system once they become idle?
Thanks.
amso
edit:
I'm updating the question based on the answers that I got. Thanks guys.
So it seems that having a ton of threads idling (IO blocked, waiting, sleeping, etc), per se , will not have an impact on the system in terms of responsiveness.
Of course, they will spend extra money for each thread's stack and TLS data but that's okay as long as we throw more memory at the thing (making it more €€€)
But then, other issues have to be accounted for. Having 100s of threads waiting will likely increase memory usage on the kernel, due to the need of wait queues or other similar resources. There's also a latency issue, which looks non-deterministic. To check the responsiveness and memory usage of each solution one should measure it and compare.
Finally, the whole idea of hundreds of threads that will be mostly idling may be modeled like a thread pool. This reduces a bit of code linearity but dramatically increases the scalability of the solution and with propper care can be easily tunable to adjust the compromise between performance and resource usage.
I think that's all. Thanks everyone for their input.
--
amso
Each thread has overhead - most importantly each one has its own stack and TLS. Performance is not that much of a problem since they will not get any time slices unless they actually do anything. You may still want to consider using thread pools.
Chiefly they will use up address space and memory for stacks; once you get, say, 1000 threads, this gets quite significant as I've seen that 10M per thread is typical for stacks (on x86_64). It is changable, but only with care.
If you have a 32-bit processor, address space will be the main limitation once you hit 1000s of threads, you can easily exhaust the AS.
They use up some kernel memory, but probably not as much as userspace.
Edit: of course threads share address space with each other only if they are in the same process; I am assuming that they are.
I'm not a Linux hacker, but assuming that Linux's thread scheduling is similar to Windows'...
Yes, of course the will be some impact. Every bit of memory you consume will potentially have some impact.
However, in a time-sliced environment, threads that are in a Wait/Sleep/Join state will not consume CPU cycles until they are awoken.
I would be worried about offering 1:1 thread-connections mappings, if nothing else because it leaves you rather exposed to denial of service attacks. (pthread_create() is a fairly expensive operation compared to just a call to accept())
EboMike has already answered the question directly - provided threads are blocked and not busy-waiting then they won't consume much in the way of resources although they will occupy memory and swap for all the per-thread state.
I'm learning the basics of the kernel now. I can't give you a specific answer yet; I'm still a noob... but here are some things for you to chew on.
Linux implements each POSIX thread as a unique process. This will create overhead as others have mentioned. In addition to this, your waiting model appears flawed any way you do it. If you create one conditional variable for each thread, then I think (based off of my interpretation of the website below) that you'll actually be expending a lot of kernel memory, as each thread would be placed into its own wait queue. If instead you break your threads up for each group of X threads to share a conditional variable, then you've got problems as well because every time the variable signals, you must wake up _EVERY_DARN_PROCESS_ in that variable's wait queue.
I also assume that you will need to do some object sharing an synchronization. In this case, your code may get slower because of the need to wake up all processes waiting on a resource, as I mentioned earlier.
I know this wasn't much help, but as I said, I'm a kernel noob. Hope it helped a little.
http://book.chinaunix.net/special/ebook/PrenticeHall/PrenticeHallPTRTheLinuxKernelPrimer/0131181637/ch03lev1sec7.html
I'm not sure what "device" you are talking about, but if it's a file descriptor, I'd suggest that you look at starting to migrate to using either poll or epoll (Id suggest the latter given the description of how active you expect each file descriptor to be). That way, you could use one process which would be responsible for all the fds.

Is there a way to abort an SQLite call?

I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a lot of time to complete (due to the heavy I/O load).
I wonder if there's a possibility to abort such a call. I would also be glad if there was an ability to do some background processing in the middle of the call within the same thread (since most of the time is spent in waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle for instance) before every invocation of either ReadFile/WriteFile, and return an error code appropriately. And in order to allow the background processing the file should be opened in the overlapped mode (this enables asynchronous ReadFile/WriteFile).
Is there a chance that interruption of WriteFile may in some circumstances leave the database in the inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for any error of any kind. But I'd like to hear more opinions about this.
Also, did someone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonders how (and why) one expects to do some background processing during the I/O within the same thread.
Unfortunately not many people are familiar with so-called "Overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
Using it one issues an I/O operation asynchronously, and the calling thread is not blocked. Then one receives the I/O completion status using one of the completion mechanisms: waitable event, new routine queued into the APC, or the completion port.
Using this technique one doesn't have to create extra threads. Actually the only real legitimation for creating threads is when your bottleneck is the computation time (i.e. CPU load), and the machine has several CPUs (or cores).
And creating a thread just to let it be blocked by the OS most of the time - this doesn't make sense. This leads to the unjustified waste of the OS resources, complicates the program (need for synchronization and etc.).
Unfortunately not all the libraries/APIs allow asynchronous mode of operation, thus making creating extra threads the necessarily evil.
EDIT2:
I've already found the solution, thansk to ereOn.
For all those who nevertheless insist that it's not worth doing things "in background" while "waiting" for the I/O to complete using overlapped I/O. I disagree, and I think there's no point to argue about this. At least this is not related to the subject.
I'm a Windows programmer (as you may noticed), and I have a very extensive experience in all kinds of multitasking. Plus I'm also a driver writer, so that I also know how things work "behind the scenes".
I know that it's a "common practice" to create several threads to do several things "in parallel". But this doesn't mean that this is a good practice. Please allow me not to follow the "common practice".
I don't understand why you want the interruption to come from the same thread and I even don't understand how that would be possible: if the current thread is blocked, waiting for some IO, you can't execute any other code. (Yeah, that's what "blocked" means)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this, obviously, involves that the call is made from another thread.
By default, SQLite is threadsafe. It sounds to me like the easiest thing to do would be to start the Sqlite command on a background thread, and let SQLite to the necessary locking to have that work.
From your perspective then, the sqlite call looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as e.g. using a loop including interruptible sleep and a bit of occasional background processing (e.g. to update a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.