Should a win32 program always be multi-threaded - c++

Right now I am writing a win32 / opengl application that has 2 threads per window. One thread deals with opengl drawing, the other deals with the windows message events. My question is, should I only use 1 thread for ALL windows messages? Will that cause problems such as windows not responding occasionally?
I'm using multiple windows message loops, all on different threads. It seems to me that the message loop was designed for 1 thread, and only one appearance in a process. Is this correct?

should I only use 1 thread for ALL windows messages?
You may, or may not. The OS does not enforce it, but your GUI framework may.
Will that cause problems such as windows not responding occasionally?
It will not, in itself, cause that problem. Poorly-responding message-loops are usually caused by performing too much work in wndprocs/event-handlers for windows that handle messages from the OS UI drivers, or actually waiting in them for something instead of returning to the GetMessage call in a timely manner. The OS detects that messages from KB etc. are not getting handled, and tends to ghost the window and generally moan about a 'Not responding' application.
If a WMQ is used to communicate with a thread that does not process UI messages, for example with messages numbered WM_APP and upwards, the OS will take no action if the thread handling such messages performs lengthy and/or blocking actions before getting back to its GetMessage call.
It seems to me that the message loop was designed for 1 thread, and
only one appearance in a process. Is this correct?
No, it is not.
Windows Message Queues, and associated GetMessage() loops, can be, and often are, used to communicate between threads of a process. WMQ are specialised producer-consumer queues, primarily designed to communicate GUI messages. As such, they have constraints on message format, and only one thread can wait on a queue, but WMQ can be used to communicate between non-GUI threads.
It is correct that windows are bound to the threads that create them, and that many GUI frameworks are designed/written in such a way that it is not safe to use them from multiple threads, but many Windows message queues and message-handlers in one process are certainly possible.

This is implementation dependent. Multi-threading often makes sense if two independent tasks should not disturb each other's program flow. However, there are other ways to achieve similar things too. E.g. you could use timers to break the flow for a couple of ms and execute what the other thread would do.
If you notice that your window event thread is reaching its limits, then my first thought would be that it is probably not only handling events but also doing larger calculations. In that case it would make sense to start a new thread.
Edit: I am not a Windows pro. But nearly all implementations I know use only one thread (loop) for the event system. Qt has some elegant ways to avoid stalls and to extend the event system by spawning new threads in different ways. It also supports Signals/Slots in combination with timers. Maybe you would be interested in using it.

Related

Run function periodically in Cap'n Proto RPC server

I have a Cap'n Proto RPC server that runs some OpenGL commands in a window. I am not interested in the window's events at all, but in order to avoid getting killed on Windows I need to poll events once a second or so. How can I do this in a simple fashion?
I have read that you can make your own EventPort, but I couldn't figure out how to actually use EventPorts. It might also be overkill when I'm not actually interested in the events. I would like to prioritize RPC events over polling the window if possible.
Using something other than EZ-rpc is not a downside, as I want to move to shared memory communication later on.
So, there's this critical flaw in Windows event handling: The best way to handle network I/O, especially with many connections, is via I/O Completion Ports (IOCP). However, unfortunately, Windows provides no way for a thread to wait on IOCP events and GUI events in the same thread. This seems to be a serious design flaw in the Win32 API, yet it's been this way for decades. Weirder still, the internal NT kernel APIs do in fact support an alternative (specifically, they allow I/O completion events to be delivered via APC) but Microsoft hasn't made these APIs public, so applications that use them could break in a future version of Windows.
As a result, there are essentially two ways to design a program that simultaneously does network I/O and implements a GUI:
1. Use a MsgWaitForMultipleObjectsEx-based event loop instead of IOCP. You will be limited to no more than 64 connections, and the event loop will be relatively inefficient.
2. Have separate threads for network and GUI.
For your use case, it sounds like #1 would probably be fine, but there's another problem: The KJ event loop library (used by Cap'n Proto) doesn't implement this case yet. It only implements IOCP-based networking. There's a class Win32WaitObjectThreadPool defined in kj/async-win32.h meant to handle the GUI event loop approach... but at present it is not implemented. (PRs are welcome if you'd like to contribute!)
If you truly don't care about handling GUI events in a timely fashion, then perhaps a hack would work: You could use kj::Timer to create a loop that waits for a second, then checks the Win32 GUI event queue, then waits again, and so on. This is really ugly but would probably be easy to implement. I'm not sure if kj::Timer is exposed via EZ-rpc, so you may have to go to lower-level building blocks like kj::setupAsyncIo() instead.

Win32 API deadlocks while using different threads

I am experiencing deadlocks while trying to use the WIN32 API from an additional thread. The additional thread is needed in my application to improve the frame rate. It actually helps, however, I get deadlocks in almost all of the system functions:
::ShowWindow
::MoveWindow
::UpdateWindow
I know that ShowWindow(), for example, may be replaced with ShowWindowAsync(), and that does solve the problem; however, there are no such alternatives for MoveWindow() and UpdateWindow().
Has anyone experienced these problems, and what is the solution?
Thanks!
The term "deadlock" describes a very specific thing, two threads waiting for access to a resource that is locked by the other. There is no indication that this is what is happening in your case (or is there?), so what exactly is it that you are experiencing? Also, what exactly is it that you want to achieve with multithreading?
In any case, keep the UI in a single thread, use SendMessage() & Co to notify that thread of any events occurring in background threads. Alternatively, you can also use a timer to poll for certain state changes. That way, you are on the safe side and your application shouldn't lock up (at least not because of using the UI from different threads).
To be a bit more precise, you have to keep the message loop for a window and all its child windows in a single thread. You can create multiple windows and handle each of them from their own thread, but don't mix calls. In practice, this distinction isn't important though, because few applications create multiple windows (and no, e.g. a message box or other dialogs don't count).
All the API functions that you refer to have in common that they send(!) some message to the target window. UpdateWindow is probably the most obvious, because it needs to send WM_PAINT. Notice also that it "sends" the message and doesn't post to the queue (for UpdateWindow, the MSDN documentation calls this out explicitly, for the others it may be less obvious).
Also notice that windows have thread affinity as alluded to in some of the comments. Among other things this means that messages to that window are only ever received/dispatched on one thread. If you send a message to a window of another thread, the operating system is left with the task to determine when it should dispatch that message (i.e. call the window procedure). This (dispatching incoming sent messages) only happens during certain API calls during which it can be assumed to be safe to invoke the window procedure with a random message. The relevant times are during GetMessage and PeekMessage*.
So if your window owning thread (also called UI thread) is pumping messages normally, incoming sent messages are also quickly dispatched. From your question it seems however, that your UI thread is currently busy. If the second thread then invokes one of said functions, then it will block until the first thread provides a chance to have the sent messages dispatched.
As others have said, it is usually a good idea to keep user interface code on one dedicated UI thread (although exceptions - as always - prove the rule). And it is definitely necessary (for a good user experience) to have window owning threads be responsive to messages at all times. If your UI thread also has to wait on some synchronization objects, you may find MsgWaitForMultipleObjects helpful.
*the list might not be complete.
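The difference between "sending" and "posting" described above can be sketched without any Win32 calls. The following is a minimal, portable illustration and not the Windows implementation: OwnerQueue, post, send, pump and sendFromSecondThread are hypothetical names, and std::promise stands in for whatever mechanism the OS uses to hand the window procedure's result back to the sender.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for the queue of a window-owning thread.
class OwnerQueue {
public:
    using Handler = std::function<int(int)>;

    // "Post": enqueue the message and return immediately.
    void post(int msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(Item{msg, nullptr});
    }

    // "Send": enqueue the message, then block until the owner has handled it.
    int send(int msg) {
        auto reply = std::make_shared<std::promise<int>>();
        std::future<int> result = reply->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(Item{msg, reply});
        }
        return result.get();  // waits here until the owner calls pump()
    }

    // Owner thread only: dispatch everything queued so far.
    // Returns the number of messages handled.
    int pump(const Handler& handler) {
        std::queue<Item> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);
        }
        int handled = 0;
        while (!batch.empty()) {
            Item item = std::move(batch.front());
            batch.pop();
            const int result = handler(item.msg);
            if (item.reply) item.reply->set_value(result);  // wake the sender
            ++handled;
        }
        return handled;
    }

private:
    struct Item {
        int msg;
        std::shared_ptr<std::promise<int>> reply;  // null for posted messages
    };
    std::mutex mutex_;
    std::queue<Item> pending_;
};

// A second thread sends a message; it stays blocked until this (owner)
// thread reaches its pump loop and dispatches the message.
int sendFromSecondThread(int msg) {
    OwnerQueue queue;
    std::future<int> reply =
        std::async(std::launch::async, [&queue, msg] { return queue.send(msg); });
    while (queue.pump([](int m) { return m * 2; }) == 0) {
        std::this_thread::yield();
    }
    return reply.get();
}
```

If the owner thread never gets back to pump(), the sender never returns, which is the hang the question describes. A posted message has no such dependency on the caller's side.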

Design and Technical issue in Multi Threaded Application

I wanted to discuss the design and technical issues/challenges related to multi-threaded applications.
Issues I faced
1. I came across a situation where multiple threads using a shared function/variable crashed the application, so a proper guard is required in that case.
2. State machines and multiple threads.
There are several points one should remember before delving into a multi-threaded application.
There can be issues related to 1. memory 2. handles 3. sockets etc.
Please share your experience on the following points:
What are the common mistakes one makes in a multi-threaded application?
Are there any specific issues related to multi-threading?
Should we pass data by value or by reference in the thread function?
Well, there are so many...
1) Shared functions/procedures - they are just code and, unless the code modifies itself, there can be no problem. Local variables are no problem because each thread calls on a separate stack, (almost by definition:). Any other data can be an issue and may need protection. 99.99% of all household API calls on a multitasking OS are thread-safe, again, almost by definition. Another poster has already warned about thread-local storage...
2) State machines. Can be a little awkward. You can easily lock all the events firing into the SM, so ensuring the integrity of the state, but you must not make blocking calls from inside the SM while it is locked, (might seem obvious, but I have done this.. once :).
I occasionally run state-machines from one thread only, queueing event objects to it. This moves the locking to the input queue and means that the SM is somewhat easier to debug. It also means that the thread running the SM can implement timeouts on an internal delta queue and so itself fire timeout calls to the objects on the delta queue, (classic example: TCP server sockets with connection timeouts - thousands of socket objects that each need an independent timeout).
3) 'Should we pass data by value or by reference in the thread function.'. Not sure what you mean, here. Most OS allow one pointer to be passed on thread creation - do with it what you will. You could pass it an event it should signal on work completion or a queue object upon which it is to wait for work requests. After creation, you need some form of inter-thread comms to send requests and get results, (unless you are going to use the direct 'read/write/waitForExit' mechanism - AV/deadlock/noClose generator).
I usually use a simple semaphore/CS producer-consumer queue to send/receive comms objects between worker threads, and the PostMessage API to send them to a UI thread. Apart from the locking in the queue, I don't often need any more locking. You have to try quite hard to deadlock a threaded system based on message-passing and things like thread pools become trivial - just make [no. of CPU] threads and pass each one the same queue to wait on.
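The "state machine on one thread, fed by a queue" arrangement from point 2 might look roughly like this. It is a sketch under assumptions: the Event and State values are made up for illustration, and a std::mutex plus std::condition_variable stands in for the semaphore/critical-section queue mentioned above. The only lock is inside the queue; the transition function itself runs on a single thread and needs none.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical events and states, purely for illustration.
enum class Event { Connect, Data, Close, Stop };
enum class State { Idle, Connected, Closed };

// Producer-consumer queue: all locking lives here.
class EventQueue {
public:
    void push(Event e) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push(e);
        }
        ready_.notify_one();
    }

    // Blocks until an event is available.
    Event pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !events_.empty(); });
        const Event e = events_.front();
        events_.pop();
        return e;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Event> events_;
};

// Transition function: called from the state-machine thread only.
State step(State s, Event e) {
    switch (s) {
    case State::Idle:      return e == Event::Connect ? State::Connected : s;
    case State::Connected: return e == Event::Close ? State::Closed : s;
    case State::Closed:    return s;
    }
    return s;
}

// Runs the machine on its own thread while the caller queues events to it.
State runMachine(const std::vector<Event>& script) {
    EventQueue queue;
    State state = State::Idle;
    std::thread machine([&queue, &state] {
        for (Event e = queue.pop(); e != Event::Stop; e = queue.pop()) {
            state = step(state, e);
        }
    });
    for (Event e : script) queue.push(e);
    queue.push(Event::Stop);
    machine.join();  // after join, reading state is safe
    return state;
}
```

Because events are applied strictly in queue order by one thread, the machine can be debugged like single-threaded code, whatever number of threads push into the queue.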
Common mistakes. See the other posters for many, to which I would add:
a) Reading/writing directly to thread fields to pass parameters and return results, (esp. between UI threads and 'worker' threads), ie 'Create thread suspended, load parameters into thread fields, resume thread, wait on thread handle for exit, read results from thread fields, free thread object'. This causes a performance hit from continually creating/terminating/destroying threads and often forces the developer to ensure that threads are terminated when exiting an app to prevent AV/216/217 exceptions on close. This can be very tricky, in some cases impossible, because a few APIs block with no way of unblocking them. If developers would stop this nasty practice, there would be far fewer app close problems.
b) Trying to build multithreaded apps in a procedural fashion, eg. trying to wait for results from a work thread in a UI event handler. Much safer to build a thread request object, load it with parameters, queue it to a work thread and exit the event handler. The thread can get the object, do work, put results back into the object and, (on Windows, anyway), PostMessage the object back. A UI message-handler can deal with the results and dispose of the object, (or recycle, reuse:). This approach means that, since the UI and worker are always operating on different data that can outlive them both, there is no locking and, (usually), no need to ensure that the work thread is freed when closing the app, (problems with this are legendary).
Rgds,
Martin
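As a rough illustration of point b), here is a portable sketch of a request object that travels to a worker and back. WorkRequest, Channel, workerLoop and roundTrip are hypothetical names, and a second queue plays the role that PostMessage to the UI thread would play on Windows. Ownership of the object moves with it, so the two threads never touch the same request at the same time.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical request object: parameters go in, results come back out.
struct WorkRequest {
    std::string input;
    std::size_t result = 0;
};
using RequestPtr = std::unique_ptr<WorkRequest>;

// Minimal blocking queue that moves items from one thread to another.
template <typename T>
class Channel {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Worker: takes a request, fills in the result, hands the object back.
// A null pointer is used here as the shutdown signal.
void workerLoop(Channel<RequestPtr>& toWorker, Channel<RequestPtr>& toUi) {
    while (RequestPtr request = toWorker.pop()) {
        request->result = request->input.size();  // the "work"
        toUi.push(std::move(request));
    }
}

// Plays the UI side: build a request, queue it, later receive it completed.
std::size_t roundTrip(const std::string& text) {
    Channel<RequestPtr> toWorker;
    Channel<RequestPtr> toUi;
    std::thread worker(workerLoop, std::ref(toWorker), std::ref(toUi));

    auto request = std::make_unique<WorkRequest>();
    request->input = text;
    toWorker.push(std::move(request));

    // A real UI would return to its message loop here and receive the
    // completed object as a posted message; this demo simply waits for it.
    RequestPtr done = toUi.pop();

    toWorker.push(nullptr);
    worker.join();
    return done->result;
}
```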
The biggest issue people face in multi threading applications are race conditions, deadlocks and not using semaphores of some sort to protect globally accessible variables.
You are facing these problems when using thread locks.
Deadlock
Priority Inversion
Convoying
“Async-signal-safety”
Kill-tolerant availability
Preemption tolerance
Overall performance
If you want to look at more advanced threading techniques, you can look at lock-free programming, where threads keep making progress on the shared problem instead of blocking while they wait for a lock.
Deadlocks, memory corruption (of shared resources) due to lack of proper synchronization, buffer overflows (which can themselves be caused by memory corruption) and improper usage of thread-local storage are the most common things.
It also depends on the platform and technology you're using to implement the threads. For example, in Microsoft Windows, if you use MFC objects, several MFC objects are not really shareable across threads because they rely heavily on thread-local storage (e.g. the CSocket and CWnd classes etc.)

Proper message queue usage in POSIX

I'm quite bewildered by the use of message queues in a realtime OS. The code that I was given seems to use message queues down to the bone: even passing variables to another class object is done through an MQ. I have always thought of MQs as something used for IPC. The question is: what is a proper use of a message queue?
In realtime OS environments you often face the problem that you have to guarantee execution of code at a fixed schedule. E.g. you may have a function that gets called exactly each 10 milliseconds. Not earlier, not later.
To guarantee such hard timing constraints you have to write code that must not block the time critical code under any circumstances.
The POSIX thread synchronization primitives cannot be used here.
You must never lock a mutex or acquire a semaphore from time critical code because a different process/thread may already have it locked. However, often you are allowed to unblock some other thread from time critical code (e.g. releasing a semaphore is okay).
In such environments message queues are a nice choice to exchange data because they offer a clean way to pass data from one thread to another without ever blocking.
Using queues to just set variables may sound like overkill, but it is very good software design. If you do it that way you have a well-defined interface to your time critical code.
Also it helps to write deterministic code because you'll never run into the problem of race-conditions. If you set variables via message-queues you can be sure that the time critical code sees the messages in the same order as they have been sent. When mixing direct memory access and messages you can't guarantee this.
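To make "pass data without ever blocking" concrete, here is a sketch of a fixed-size single-producer/single-consumer queue built on C++ atomics. This is a swapped-in technique chosen for illustration, not the POSIX or RTOS message queue API the question refers to, and SpscQueue, tryPush and tryPop are hypothetical names. It assumes exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Fixed-capacity queue for exactly one producer and one consumer thread.
// Neither side ever waits: a full or empty queue is reported to the caller.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Producer side. Returns false instead of blocking when the queue is full.
    bool tryPush(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % kSlots;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns false instead of blocking when the queue is empty.
    bool tryPop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = slots_[tail];
        tail_.store((tail + 1) % kSlots, std::memory_order_release);
        return true;
    }

private:
    // One slot is left unused so that "full" and "empty" look different.
    static constexpr std::size_t kSlots = Capacity + 1;
    std::array<T, kSlots> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

The time critical side calls tryPop (or tryPush) once per cycle and carries on whatever the answer is. Messages come out in the order they went in, which is the ordering guarantee described above.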
Message Queues are predominantly used as an IPC mechanism, whenever there needs to be an exchange of data between two different processes. However, sometimes Message Queues are also used for thread context switching. For example:
You register some callback with a software layer which sits on top of driver. The callback is returned to you in the context of the driver. It is a thread spawned by the driver. Now you cannot hog this thread of driver by doing a lot of processing in it. So one may add the data returned in callback in a message Queue, which has application threads blocked on it for performing the processing on the data.
I don't see why one should use Message Queues to replace just normal function calls.

More threads, better performance?

When I write a message driven app. much like a standard windows app only that it extensively uses messaging for internal operations, what would be the best approach regarding to threading?
As I see it, there are basically three approaches (if you have any other setup in mind, please share):
Having a single thread process all of the messages.
Having separate threads for separate message types (General, UI, Networking, etc...)
Having multiple threads that share and process a single message queue.
So, would there be any significant performance differences between the three?
Here are some general thoughts:
Obviously, the last two options benefit from a situation where there's more than one processor. Plus, if any thread is waiting for an external event, other threads can still process unrelated messages. But ignoring that, seems that multiple threads only add overhead (Thread switches, not to mention more complicated sync situations).
And another question: Would you recommend to implement such a system upon the standard Windows messaging system, or to implement a separate queue mechanism, and why?
The specific choice of threading model should be driven by the nature of the problem you are trying to solve. There isn't necessarily a single "correct" approach to designing the threading model for such an application. However, if we adopt the following assumptions:
messages arrive frequently
messages are independent and don't rely too heavily on shared resources
it is desirable to respond to an arriving message as quickly as possible
you want the app to scale well across processing architectures (i.e. multicore/multi-CPU systems)
scalability is the key design requirement (e.g. more messages at a faster rate)
resilience to thread failure / long operations is desirable
In my experience, the most effective threading architecture would be to employ a thread pool. All messages arrive on a single queue, multiple threads wait on the queue and process messages as they arrive. A thread pool implementation can model all three thread-distribution examples you have.
#1 Single thread processes all messages => thread pool with only one thread
#2 Thread per N message types => thread pool with N threads, each thread peeks at the queue to find appropriate message types
#3 Multiple threads for all messages => thread pool with multiple threads
The benefit of this design is that you can scale the number of threads in the pool in proportion to the processing environment or the message load. The number of threads can even scale at runtime to adapt to the realtime message load being experienced.
There are many good thread pooling libraries available for most platforms, including .NET, C++/STL, Java, etc.
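A bare-bones version of such a pool, written with only the C++ standard library, could look like the sketch below. ThreadPool, post and processMessages are hypothetical names, and a real library would add error handling, priorities and runtime resizing. The thread count is the only thing that changes between option #1 (one thread) and option #3 (several threads).

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// All messages arrive on one queue; every pool thread waits on that queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t threadCount) {
        for (std::size_t i = 0; i < threadCount; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }

    // Lets the workers drain the remaining messages, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a message; here a message is simply a callable to execute.
    void post(std::function<void()> message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> message;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                message = std::move(queue_.front());
                queue_.pop();
            }
            message();  // run outside the lock so other threads can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Processes messageCount independent messages on threadCount threads and
// returns how many were handled.
int processMessages(std::size_t threadCount, int messageCount) {
    std::atomic<int> processed{0};
    {
        ThreadPool pool(threadCount);
        for (int i = 0; i < messageCount; ++i) {
            pool.post([&processed] { ++processed; });
        }
    }  // the destructor returns once the queue has been drained
    return processed.load();
}
```

processMessages(1, n) corresponds to the single-threaded option and processMessages(4, n) to a shared queue with four threads. The posting code is identical in both cases.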
As to your second question, whether to use the standard Windows message dispatch mechanism: this mechanism comes with significant overhead and is really only intended for pumping messages through a Windows application's UI loop. Unless this is the problem you are trying to solve, I would advise against using it as a general message dispatching solution. Furthermore, Windows messages carry very little data - it is not an object-based model. Each Windows message has a code and two pointer-sized parameters (WPARAM and LPARAM). This may not be enough to base a clean messaging model on. Finally, the Windows message queue is not designed to handle cases like queue saturation, thread starvation, or message re-queuing; these are cases that often arise in implementing a decent message queuing solution.
We can't tell you much for sure without knowing the workload (ie, the statistical distribution of events over time) but in general
single queue with multiple servers is at least as fast, and usually faster, so 1,3 would be preferable to 2.
multiple threads in most languages add complexity because of the need to avoid contention and multiple-writer problems
long duration processes can block processing for other things that could get done quicker.
So horseback guess is that having a single event queue, with several server threads taking events off the queue, might be a little faster.
Make sure you use a thread-safe data structure for the queue.
It all depends.
For example:
Events in a GUI queue are best done by a single thread as there is an implied order in the events, thus they need to be done serially. Which is why most GUI apps have a single thread to handle events, though potentially multiple threads to create them (and it does not preclude the event thread from creating a job and handing it off to a worker pool (see below)).
Events on a socket can potentially be done in parallel (assuming HTTP) as each request is stateless and can thus be done independently (OK, I know that is oversimplifying HTTP).
Work jobs where each job is independent and placed on a queue. This is the classic case for using a set of worker threads. Each thread does a potentially long operation independently of the other threads. On completion it comes back to the queue for another job.
In general, don't worry about the overhead of threads. It's not going to be an issue if you're talking about merely a handful of them. Race conditions, deadlocks, and contention are a bigger concern, and if you don't know what I'm talking about, you have a lot of reading to do before you tackle this.
I'd go with option 3, using whatever abstractions my language of choice offers.
Note that there are two different performance goals, and you haven't stated which you are targeting: throughput and responsiveness.
If you're writing a GUI app, the UI needs to be responsive. You don't care how many clicks per second you can process, but you do care about showing some response within a 10th of a second or so (ideally less). This is one of the reasons it's best to have a single thread devoted to handling the GUI (other reasons have been mentioned in other answers). The GUI thread needs to basically convert windows messages into work-items and let your worker queue handle the heavy work. Once the worker is done, it notifies the GUI thread, which then updates the display to reflect any changes. It does things like painting a window, but not rendering the data to be displayed. This gives the app a quick "snappiness" that is what most users want when they talk about performance. They don't care if it takes 15 seconds to do something hard, as long as when they click on a button or a menu, it reacts instantly.
The other performance characteristic is throughput. This is the number of jobs you can process in a specific amount of time. Usually this type of performance tuning is only needed on server type applications, or other heavy-duty processing. This measures how many webpages can be served up in an hour, or how long it takes to render a DVD. For these sorts of jobs, you want to have 1 active thread per CPU. Fewer than that, and you're going to be wasting idle clock cycles. More than that, and the threads will be competing for CPU time and tripping over each other. Take a look at the second graph in this DDJ article for the trade-off you're dealing with. Note that the ideal thread count is higher than the number of available CPUs due to things like blocking and locking. The key is the number of active threads.
A good place to start is to ask yourself why you need multiple threads.
The well-thought-out answer to this question will lead you to the best answer to the subsequent question, "how should I use multiple threads in my application?"
And that must be a subsequent question; not a primary question. The first question must be why, not how.
I think it depends on how long each thread will be running. Does each message take the same amount of time to process? Or will certain messages take a few seconds, for example? If I knew that Message A was going to take 10 seconds to complete I would definitely use a new thread, because why would I want to hold up the queue for a long running task...
My 2 cents.
I think option 2 is the best. Having each thread do independent tasks would give you the best results. The 3rd approach can cause more delays if multiple threads are doing some I/O operation like disk reads, reading common sockets and so on.
Whether to use the Windows messaging framework for processing requests depends on the workload each thread would have. I think Windows restricts the number of messages that can be queued to at most 10000. For most cases this should not be an issue. But if you have lots of messages to be queued this might be something to take into consideration.
A separate queue gives better control in the sense that you may reorder it the way you want (maybe depending on priority).
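As a small illustration of that kind of control, a home-grown queue can hand out messages by priority instead of arrival order. This is only a sketch: Message, LowerUrgency and PriorityMessageQueue are hypothetical names, and "larger number means more urgent" is an assumption made for the example.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Message {
    int priority;            // assumption: larger means more urgent
    unsigned long sequence;  // arrival order, used to break ties
    std::string text;
};

// Ordering for std::priority_queue: true when a should come out after b.
struct LowerUrgency {
    bool operator()(const Message& a, const Message& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;  // same priority: earlier arrival first
    }
};

class PriorityMessageQueue {
public:
    void push(int priority, std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        heap_.push(Message{priority, nextSequence_++, std::move(text)});
    }

    // Non-blocking: returns false when the queue is empty.
    bool tryPop(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (heap_.empty()) return false;
        out = heap_.top().text;
        heap_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::priority_queue<Message, std::vector<Message>, LowerUrgency> heap_;
    unsigned long nextSequence_ = 0;
};
```

With this ordering a priority 5 message pushed after two priority 1 messages is still popped first, and the two priority 1 messages keep their arrival order.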
Yes, there will be performance differences between your choices.
(1) introduces a bottle-neck for message processing
(3) introduces locking contention because you'll need to synchronize access to your shared queue.
(2) is starting to go in the right direction... though a queue for each message type is a little extreme. I'd probably recommend starting with a queue for each model in your app and adding queues where it makes sense to do so for improved performance.
If you like option #2, it sounds like you would be interested in implementing a SEDA architecture. It is going to take some reading to understand what is going on, but I think the architecture fits well with your line of thinking.
BTW, Yield is a good C++/Python hybrid implementation.
I'd have a thread pool servicing the message queue, and make the number of threads in the pool easily configurable (perhaps even at runtime). Then test it out with expected load.
That way you can see what the actual correlation is - and if your initial assumptions change, you can easily change your approach.
A more sophisticated approach would be for the system to introspect its own performance traits and adapt its use of resources, threads in particular, as it goes. Probably overkill for most custom application code, but I'm sure there are products out there that do that.
As for the windows events question - I think that's probably an application specific question that there is no right or wrong answer to in the general case. That said, I usually implement my own queue as I can tailor it to the specific characteristics of the task at hand. Sometimes that might involve routing events via the windows message queue.