I have a performance issue where clients are creating hundreds of a particular kind of object "Foo" in my C++ application's DOM. Each Foo instance has its own asynchronous work queue with its own thread. Obviously, that doesn't scale.
I need to share threads amongst work queues, and I don't want to re-invent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool. The work that needs to be done to process each queue item involves making COM calls in the multi-threaded COM apartment. The documentation for the XP thread pool says that it is okay to call CoInitializeEx() with the MTA apartment in the thread worker function callback. I've written a test app and verified that this works. I made the app run 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the WorkItem callback function. It takes 35 seconds with the CoInit* calls and 5 seconds without them. That's way too much overhead for my application. Since the thread pool is per-process and 3rd-party code runs in my process, I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?
Have you verified what is taking so long? i.e. is it the call to CoInitializeEx()? You definitely don't need to call CoInitialize once per task. You also don't say how many threads you spawn, i.e. if your running on a dual core and your work is CPU intensive don't expect more than a 2x speedup, and if your work isn't CPU intensive then it's waiting on some resource (memory, disk, net) and speedups will be similarly constrained, perhaps made worse if there is a lock being held for that resource.
If you can use Visual Studio 2010 take a look at the Parallel Pattern Library and Asynchronous Agents Library, there are a couple tools that can help make this take less code to write.
If you can't you can at least try placing a token in TLS that represents whether COM has been initialized on that thread and use the presence of this token to bypass your calls to CoInitialize when they aren't needed.
I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Windows will clean up if a thread exits without calling CoUninitialize, we know this works because if it didn't there would be no cleanup when threads crash or are aborted.
So the only way this hack could cause a problem is of someone was trying to queue work items that needed an STA apartment, which seem unlikely.
I'd be tempted to go for it.
Related
I would like to have your opinion for this general technical concept. (I am working on microsoft windows OS)
There is a Process, this process creates multiple threads for different tasks.
Main process: it is a windows service written by C# code.
There are several threads that are create inside the main process: Thread_01, Thread_02, ...
Inside Thread_01: There is a Wrapper dll written in managed C++ to consume DLL_01. (DLL_01 is a dll written by me in native C++ code, that provides some APIs: Add, Remove, Connect)
Add and Remove can run very fast, but Connect may take more than 10 seconds and blocks the caller until it finishes.
I am thinking to use std::async to do the Connect function code, and send the result through a callback to the caller (main process).
Is it a good approach? I heard we cannot create or it is better not to create any thread inside inner threads, is it true? If so, how about std::async ?
Any recommendation is appreciated.
Thanks in advance,
None of what you describe makes the use of threads inacceptable for your code.
As usual, threads have issues that need to be cared for:
Data races due to access to shared data.
Problems of ownership of resources is now not just "Who own what?" but "Who and when owns what?".
When a thread is blocked and you want to abort this operation, how do you cancel it without causing issues down the line? In your case, you must avoid calling the callback, when the receiver doesn't exist any more.
Concerning your approach of using a callback, consider std::future<> instead. This takes care of a few of the issues above, though some are only shifted to the caller instead.
Suppose I have a multi-threaded program in C++11, in which each thread controls the behavior of something displayed to the user.
I want to ensure that for every time period T during which one of the threads of the given program have run, each thread gets a chance to execute for at least time t, so that the display looks as if all threads are executing simultaneously. The idea is to have a mechanism for round robin scheduling with time sharing based on some information stored in the thread, forcing a thread to wait after its time slice is over, instead of relying on the operating system scheduler.
Preferably, I would also like to ensure that each thread is scheduled in real time.
In case there is no way other than relying on the operating system, is there any solution for Linux?
Is it possible to do this? How?
No that's not cross-platform possible with C++11 threads. How often and how long a thread is called isn't up to the application. It's up to the operating system you're using.
However, there are still functions with which you can flag the os that a special thread/process is really important and so you can influence this time fuzzy for your purposes.
You can acquire the platform dependent thread handle to use OS functions.
native_handle_type std::thread::native_handle //(since C++11)
Returns the implementation defined underlying thread handle.
I just want to claim again, this requires a implementation which is different for each platform!
Microsoft Windows
According to the Microsoft documentation:
SetThreadPriority function
Sets the priority value for the specified thread. This value, together
with the priority class of the thread's process determines the
thread's base priority level.
Linux/Unix
For Linux things are more difficult because there are different systems how threads can be scheduled. Under Microsoft Windows it's using a priority system but on Linux this doesn't seem to be the default scheduling.
For more information, please take a look on this stackoverflow question(Should be the same for std::thread because of this).
I want to ensure that for every time period T during which one of the threads of the given program have run, each thread gets a chance to execute for at least time t, so that the display looks as if all threads are executing simultaneously.
You are using threads to make it seem as though different tasks are executing simultaneously. That is not recommended for the reasons stated in Arthur's answer, to which I really can't add anything.
If instead of having long living threads each doing its own task you can have a single queue of tasks that can be executed without mutual exclusion - you can have a queue of tasks and a thread pool dequeuing and executing tasks.
If you cannot, you might want to look into wait free data structures and algorithms. In a wait free algorithm/data structure, every thread is guaranteed to complete its work in a finite (and even specified) number of steps. I can recommend the book The Art of Multiprocessor Programming where this topic is discussed in length. The gist of it is: every lock free algorithm/data structure can be modified to be wait free by adding communication between threads over which a thread that's about to do work makes sure that no other thread is starved/stalled. Basically, prefer fairness over total throughput of all threads. In my experience this is usually not a good compromise.
I have a problem with long running boost::regex_match(...) invocation in a threaded process environment. But it could be another lib (API call) having the same problem.
Is there a generic way to set up a watchdog for such?
For non-threaded process alarm() can be used to detect timeout.
However, signals don't play nicely with threads. I can avoid direct use of alarm() in the thread and delegate timer mgt. to a dedicated separate thread and let that one use pthread_kill(...) to address the correct threads (this is just an idea - i didn't yet verify that part).
However, also this only interrupts and detects the situation, but cannot gracefully stop boost::regex_match(...).
I played around with Throwing an exception from within a signal handler using sigsetjmp() and siglongjmp() for the thread using boost::regex_match(..).
But it causes memory leaks in boost::regex_match(...) becausesiglongjmp()` bypasses destructors.
How can i gracefully stop a 3rd party API call - presuming that it's implemented exception safe?
Or does it have to be supported by some "stoppable" feature actively implemented in the 3rd party API? (is there some for the boost library?)
Maybe some strange idea, but:
Code can be implemented to be "thread-safe" and/or "exception-safe".
Would it be an option to define "longjmp-safe"? This could be done by passing an additional token to a lib to let is associate all resource allocations to that token. After longjmp() the client SW could ask the API separately to release those resources.
simpler maybe would just be some central init()/release() or register()/unregister() API call, by which the API could clean-up itself.
In a case where you have to:
monitor exceeding execution time
stop execution of processing
you should simply think for tasks instead of threads.
Using threads is something which sounds like "state of the art" but in practice tasks are very often the better way of implementation. Especially for controlling memory leeks in "undefined" end of execution, confine unwanted memory excess and control stack overruns etc.
In the case you have mentioned I tend to implement that as tasks. IPC works well on all known platforms but is not portable. If portability is no problem, changing to a task based solution is not a big deal.
A hanging task can be killed by a os call and all locks, memory and other resources like ipc/shared memory/pipes etc. will be removed automatically. So this fits much better to your problem and it did not depend on your external and maybe unchangeable third party components.
I wanted to Discuss the Design and technical issue/challenges related with multi threaded application.
Issue I faced
1.I came across the situation where there is multiple thread is using the shared function/variable crash the application, so proper guard is required on that occasion.
2. State Machine and Multi thread-
There are several point one should remember before delve in to the multi thread application.
There can issue related to 1. Memory 2. Handle 3. Socket etc.
please share your experience on the following point
what are the common mistake one do in the multi threaded application
Any specific issue related to multi threaded.
Should we pass data by value or by referen in the thread function.
Well, there are so many...
1) Shared functions/procedures - they are just code and, unless the code modifies itself, there can be no problem. Local variables are no problem because each thread calls on a separate stack, (amost by definition:). Any other data can an issue and may need protection. 99.99% of all household API calls on multiTasking OS are thread-safe, again, almost by definition. Another poster has already warned about thread-local storage...
2) State machines. Can be a little awkward. You can easly lock all the events firing into the SM, so ensuring the integrity of the state, but you must not make blocking calls from inside the SM while it is locked, (might seem obvious, but I have done this.. once :).
I occasionally run state-machines from one thread only, queueing event objects to it. This moves the locking to the input queue and means that the SM is somewhat easier to debug. It also means that the thread running the SM can implement timeouts on an internal delta queue and so itself fire timeout calls to the objects on the delta queue, (classic example: TCP server sockets with connection timeouts - thousands of socket objects that each need an independent timeout).
3) 'Should we pass data by value or by referen in the thread function.'. Not sure what you mean, here. Most OS allow one pointer to be passed on thread creation - do with it what you will. You could pass it an event it should signal on work completion or a queue object upon which it is to wait for work requests. After creation, you need some form of inter-thread comms to send requests and get results, (unless you are going to use the direct 'read/write/waitForExit' mechanism - AV/deadlock/noClose generator).
I usually use a simple semaphore/CS producer-consumer queue to send/receive comms objects between worker threads, and the PostMessage API to send them to a UI thread. Apart from the locking in the queue, I don't often need any more locking. You have to try quite hard to deadlock a threaded system based on message-passing and things like thread pools become trivial - just make [no. of CPU] threads and pass each one the same queue to wait on.
Common mistakes. See the other posters for many, to which I would add:
a) Reading/writing directly to thread fields to pass parameters and return results, (esp. between UI threads and 'worker' threads), ie 'Create thread suspended, load parameters into thread fields, resume thread, wait on thread handle for exit, read results from thread fields, free thread object'. This causes performance hit from continually creating/terminating/destroying threads and often forces the developer to ensure that thread are terminated when exiting an app to prevent AV/216/217 exceptions on close. This can be very tricky, in some cases impossible because a few API's block with no way of unblocking them. If developers would stop this nasty practice, there would be far fewer app close problems.
b) Trying to build multiThreaded apps in a procedural fashion, eg. trying to wait for results from a work thread in a UI event handler. Much safer to build a thread request object, load it with parameters, queue it to a work thread and exit the event handler. The thread can get the object, do work, put results back into the object and, (on Windows, anyway), PostMessage the object back. A UI message-handler can deal with the results and dispose of the object, (or recycle, reuse:). This approach means that, since the UI and worker are always operating on different data that can outlive them both, no locking and, (usually), no need to ensure that the work thread is freed when closing the app, (problems with this are ledgendary).
Rgds,
Martin
The biggest issue people face in multi threading applications are race conditions, deadlocks and not using semaphores of some sort to protect globally accessible variables.
You are facing these problems when using thread locks.
Deadlock
Priority Inversion
Convoying
“Async-signal-safety”
Kill-tolerant availability
Preemption tolerance
Overall performance
If you want to look at more advanced threading techniques you can look at the lock free threading, where many threads work on the same problem in case they are waiting.
Deadlocks, memory corruption (of shared resources) due to lack of proper synchronization, buffer overflow (even that can be occured due to memory corruption), improper usage of thread local storage are the most common things
Also it depends on under which platform and technology you're using to implement the thread. For e.g. in Microsoft Windows, if you use MFC objects, several MFC objects are not really shareable across threads because they're heavily rely on thread local storage (e.g CSocket, CWnd classes etc.)
The Windows and Solaris thread APIs both allow a thread to be created in a "suspended" state. The thread only actually starts when it is later "resumed". I'm used to POSIX threads which don't have this concept, and I'm struggling to understand the motivation for it. Can anyone suggest why it would be useful to create a "suspended" thread?
Here's a simple illustrative example. WinAPI allows me to do this:
t = CreateThread(NULL,0,func,NULL,CREATE_SUSPENDED,NULL);
// A. Thread not running, so do... something here?
ResumeThread(t);
// B. Thread running, so do something else.
The (simpler) POSIX equivalent appears to be:
// A. Thread not running, so do... something here?
pthread_create(&t,NULL,func,NULL);
// B. Thread running, so do something else.
Does anyone have any real-world examples where they've been able to do something at point A (between CreateThread & ResumeThread) which would have been difficult on POSIX?
To preallocate resources and later start the thread almost immediately.
You have a mechanism that reuses a thread (resumes it), but you don't have actually a thread to reuse and you must create one.
It can be useful to create a thread in a suspended state in many instances (I find) - you may wish to get the handle to the thread and set some of it's properties before allowing it to start using the resources you're setting up for it.
Starting is suspended is much safer than starting it and then suspending it - you have no idea how far it's got or what it's doing.
Another example might be for when you want to use a thread pool - you create the necessary threads up front, suspended, and then when a request comes in, pick one of the threads, set the thread information for the task, and then set it as schedulable.
I dare say there are ways around not having CREATE_SUSPENDED, but it certainly has its uses.
There are some example of uses in 'Windows via C/C++' (Richter/Nasarre) if you want lots of detail!
There is an implicit race condition in CreateThread: you cannot obtain the thread ID until after the thread started running. It is entirely unpredictable when the call returns, for all you know the thread might have already completed. If the thread causes any interaction in the rest of that process that requires the TID then you've got a problem.
It is not an unsolvable problem if the API doesn't support starting the thread suspended, simply have the thread block on a mutex right away and release that mutex after the CreateThread call returns.
However, there's another use for CREATE_SUSPENDED in the Windows API that is very difficult to deal with if API support is lacking. The CreateProcess() call also accepts this flag, it suspends the startup thread of the process. The mechanism is identical, the process gets loaded and you'll get a PID but no code runs until you release the startup thread. That's very useful, I've used this feature to setup a process guard that detects process failure and creates a minidump. The CREATE_SUSPEND flag allowed me to detect and deal with initialization failures, normally very hard to troubleshoot.
You might want to start a thread with some other (usually lower) priority or with a specific affinity mask. If you spawn it as usual it can run with undesired priority/affinity for some time. So you start it suspended, change the parameters you want, then resume the thread.
The threads we use are able to exchange messages, and we have arbitrarily configurable priority-inherited message queues (described in the config file) that connect those threads. Until every queue has been constructed and connected to every thread, we cannot allow the threads to execute, since they will start sending messages off to nowhere and expect responses. Until every thread was constructed, we cannot construct the queues since they need to attach to something. So, no thread can be allowed to do work until the very last one was configured. We use boost.threads, and the first thing they do is wait on a boost::barrier.
I stumbled with a similar problem once upon I time. The reasons for suspended initial state are treated in other answer.
My solution with pthread was to use a mutex and cond_wait, but I don't know if it is a good solution and if can cover all the possible needs. I don't know, moreover, if the thread can be considered suspended (at the time, I considered "blocked" in the manual as a synonim, but likely it is not so)