To Marshal or Not to Marshal - c++

I could not find a definitive answer to the following question: if a COM class is thread safe, that is, it is marked with Both or Free, do I really need to marshal its object interface to pass it to another thread in the same process? I am not asking about the case where both threads belong to the MTA; I am asking about the case where each thread belongs to its own STA.
I know about the rule to marshal interfaces between threads which belong to different apartments. My question is what happens if I pass a raw interface pointer to a thread in a different apartment and each thread calls methods on the object, which is thread safe?
In my experience it works fine. My question is whether this is only a matter of time, dangerous, and could lead to a crash for some reason, or whether it is completely safe and the rule is just nice to have?

TL;DR - always marshal... always.
Why? COM then knows about it and will do the right thing...
... do I really need to marshal its object interface to pass it to another thread in the same process?
Yes. Always.
The COM rule here is that accessing a COM object must always be done in the same apartment (read: on the same thread, for STAs) as it was created in. If you subvert this (even if it appears to work), you can run into a deadlock between COM calls because objects in separate apartments end up waiting on each other.
If COM sees that the source and target apartments of the marshal are both the MTA, it won't impose any overhead. It will also be able to manage the callbacks to other apartments as required.
... if a COM class is thread safe, that is it is marked with Both or Free...
What this means is that the object can be used in either apartment type. It is at the point of creation that the apartment in which it will live is decided.
According to my experience it works fine, my question is if it is a matter of time, dangerous and leading to crash because of any reason, or it's completely safe and just nice to have rule?
Subverting the COM threading model generally ends in tears - quite possibly years after the initial offence. It is a ticking time bomb. Don't do it.
As noted in the comments, there is CoCreateFreeThreadedMarshaler, but as mentioned in the remarks in the linked documentation, it requires "... a calculated violation of the rules of COM...", and does hint at a non-general or narrow band of applicability.

Related

Cancelling arbitrary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions using boost::this_thread::interrupt_point, but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock and hard place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the existing 2 answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints" since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is usually the case, as the state is 1 or 2 atomic<inbuilt-type>). Usually the internal state is queried from an external db (similar architecture to cookie + web-server, and the tab/browser can be closed anytime)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
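As a rough illustration of what such cooperation can look like, here is a minimal C++14 sketch. The names CancelToken and chunked_job are hypothetical and not part of any library mentioned above. It assumes the task can be split into bounded chunks and is willing to poll a flag between them, which is exactly what an opaque third-party job does not offer.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical cancellation token shared between the pool and the task.
struct CancelToken {
    std::atomic<bool> requested{false};
    void cancel() { requested.store(true, std::memory_order_relaxed); }
    bool cancelled() const { return requested.load(std::memory_order_relaxed); }
};

// Hypothetical cooperative job: it does its work in bounded chunks and
// polls the token between chunks. Returns the number of chunks completed.
inline std::uint64_t chunked_job(const CancelToken& token, std::uint64_t max_chunks) {
    std::uint64_t done = 0;
    while (done < max_chunks && !token.cancelled()) {
        // ... one bounded unit of real work would go here ...
        ++done;
    }
    return done;  // the task returns normally, so its own cleanup still runs
}
```

The cancellation latency is bounded by the length of one chunk, so a "within 0.1 seconds" guarantee only holds if every chunk is known to be shorter than that.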
Cancelling a non cooperative, not designed to be cancelled component is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the components should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
the modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifying shared objects; after cancellation, such modifications would be left partial, ineffective, or incoherent if the thread was stopped in the middle of an operation.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that; but then they wouldn't be called threads!
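Taking that last remark literally, one option on BSD and Linux is to run the opaque job in a child process, where the kernel owns and reclaims the resources. The following is only a sketch under stated assumptions: launch_isolated, cancel_isolated and spin_forever are hypothetical names, the job needs no result channel, and the parent is effectively single-threaded at the moment of fork(). In a multi-threaded parent, a real design would more likely fork and exec a dedicated worker binary and pass results back through a pipe or shared memory, since an atomic<shared_ptr> output does not cross a process boundary.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical signature for an opaque, non-cooperative job.
using OpaqueJob = void (*)();

// Runs the job in a child process. Returns the child's pid, or -1 if fork failed.
inline pid_t launch_isolated(OpaqueJob job) {
    pid_t pid = fork();
    if (pid == 0) {   // child process
        job();
        _exit(0);     // leave without running the parent's atexit handlers
    }
    return pid;       // parent process (or -1 on failure)
}

// Kills the child and reaps it. Returns true if it was ended by SIGKILL,
// false if it had already finished on its own or could not be reaped.
inline bool cancel_isolated(pid_t pid) {
    assert(pid > 0);
    kill(pid, SIGKILL);   // cannot be blocked or ignored by the job
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}

// Stand-in for a job that never checks for cancellation.
inline void spin_forever() {
    for (;;) pause();
}
```

The kernel releases the child's memory, file descriptors and process-private locks on exit, which is the externally managed ownership the list above asks for. Anything the job shares with the outside world (files, databases, shared memory) still has to tolerate being abandoned halfway.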

COM initialization and cleanup appropriate at the function-level granularity?

Consider writing a reusable custom function that inside its body creates COM objects and calls methods to some COM interfaces. For this to work properly, CoInitializeEx and the matching CoUninitialize APIs must be called.
Calling those COM initialization and cleanup APIs inside the function's body would hide a COM implementation detail to the caller, and would remove a burden from the caller as well.
But is calling CoInitializeEx and the matching CoUninitialize inside function's body considered a good coding practice?
Would calling those COM init/cleanup functions at the function-granularity level imply too much overhead for each function call?
Are there other drawbacks in this design?
It is a terrible practice and fundamentally wrong. What matters a great deal is the value for the 2nd argument (dwCoInit). It must be COINIT_APARTMENTTHREADED, often abbreviated to STA, or COINIT_MULTITHREADED (MTA). This is a promise that you make, cross-your-heart-hope-to-die style. If you break the promise then the program will die. Usually by deadlocking, not getting expected events or having unacceptably slow perf.
When you select STA then you promise that the thread is well-behaved and can support COM components that are not thread-safe. Fulfilling that promise requires that the thread pumps a message loop and never blocks. The common behavior of a thread that supports a GUI for example. The vast majority of COM components are not thread-safe.
When you select MTA then you don't promise any support at all. The component must now fend for itself to keep itself thread-safe. Often done automatically by having the COM infrastructure create a thread by itself to give the component a safe home. A further detail that you need to take care of is marshaling the interface pointer, which requires the CoMarshalInterThreadInterfaceInStream() helper function or the more convenient IGlobalInterfaceTable interface. This ensures that a proxy is created that takes care of the required thread context switch.
MTA sounds convenient, but not without consequences: a simple property getter call can take as much as 10,000 times longer. Overhead imposed by the thread context switches and the need to copy any arguments and the return value across stack frames. And marshaling the interface pointer can easily fail; authors of COM components often don't provide the necessary proxy/stub, or they intentionally omitted it because it is just plain too difficult or expensive to copy the data.
Key point is that the choice between STA and MTA can never be made by a library. It does not know beans about the thread, it did not create that thread. And cannot possibly know if the thread pumps a message loop or blocks. That's all code that is entirely out of the library's reach. Otherwise the exact reason that the COM infrastructure needs to know this as well, it likewise cannot make assumptions about the thread.
The choice must be made by the code that created and initialized the thread, invariably the app itself. Unless the library creates a thread for the purpose of making the calls safe. But then with the consequence of code always being slow. You remind the caller of your library that he didn't get it right by returning the inevitable CO_E_NOTINITIALIZED error code.
Fwiw, this is something you see back in the .NET Framework. The CLR always calls CoInitializeEx() before a thread can execute any managed code. Still a choice that must be made by the programmer of the app, or more typically the project template, done with the [STAThread] attribute on Main() or the Thread.SetApartmentState() call for a worker thread.

STA (Single Threaded Apartment) COM Object - Spawn worker threads?

Is it a bad thing to spawn worker threads in your STA COM object (ie. COM object creates a thread to perform a task)? I think, the answer is - that depends!
For example in my case:
The worker threads that I am using will not interfere/access COM or COM Services.
Reason why I am asking this is because by STA COM definition STA can only house one thread. Spawning multiple threads kind of goes against this principle unless the worker threads and the work they do NOT interfere/deal with COM/COM services.
In this case I am thinking this is perfectly fine and in my opinion the worker threads should not be considered by COM as part of the logical STA.
What are your thoughts on this?
No, that's not a bad thing. Apartments explicitly exist to help you get multi-threaded code working. An STA thread is a safe home for a COM server that's not thread-safe; COM's apartment threading model ensures that it is always used in a thread-safe way. All you have to do is marshal the interface pointer you want to use in the worker thread (IGlobalInterfaceTable for example) and you can call the methods without doing anything special.
This doesn't come for free of course, there's overhead involved in marshaling the call. How much depends on how responsive the STA thread is when it pumps its message loop. If you intended to create the worker thread explicitly to use that COM server in a multi-threaded way then of course you'll not be ahead, you made it slower.
Don't let the worker threads use COM in any way, and you should be fine. This means you can't call COM objects in the worker and you can't call COM runtime APIs from the worker... either directly or indirectly.
The important thing to realize is that any new threads you create are new threads in their own right; it actually doesn't matter at all which thread created them. The two things that matter are: (1) that those new threads themselves call CoInitializeEx and either get their own STA each, or share an MTA together, and (2) any COM object pointers you transfer between threads get marshaled appropriately. Do not ever just pass a COM object pointer from one thread to another in a global variable; instead use the GIT or CoMarshalInterThreadInterfaceInStream as appropriate.
(One exception to this: you can pass COM pointers freely between MTA threads; but only once that pointer has been appropriately marshaled into the MTA in the first place.)
Also, you need to be very aware of where objects live and what their affinities are. If you create an object on an STA thread, and marshal a pointer to another thread, then the typical case is that the object will still live on that original STA thread with calls returning to that thread, unless you take specific steps to specify otherwise. (Things to watch for here: what the object's threading model is, and whether it 'aggregates the free-threaded marshaller'.)
So it's not a bad thing; but be sure that you do it appropriately. For example, you might think that using two threads might be more efficient; but then later on realize that a lot of time is being spent by that worker thread calling back to the object on the original thread, giving you worse performance than a single-threaded case. So you need to think out your threads and object strategy carefully first.
(Having said all of that, you can of course spin up as many threads as you want that don't call CoInitialize, so long as they don't use COM or COM objects in any way; if those threads do need to somehow communicate with the threads that do use COM, it's up to you to manage that communication using any 'classic' IPC mechanism of your choice - eg. messages, globals, etc.)

COM calls from multiple threads

If I call the same COM function from multiple threads to an in proc COM Dll, how thread safe is that?
Do all my objects in the COM DLL also need to be thread safe for this to work reliably?
COM takes care of threading on behalf of the COM server. The server publishes the kind of threading it supports with the ThreadingModel registry value. Very common settings are Apartment or Both. Free is very rare. A missing value selects the legacy single-threaded model, where the object is created on the process's main STA.
COM requires a single-threaded apartment (STA) for apartment threaded servers. If you don't provide one (CoInitialize/Ex call) then it will create a dedicated thread for the server. A hard requirement for an STA thread is that it also pumps a Windows message loop. The message loop is the mechanism by which COM automatically marshals a method call from one thread to another.
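To make the message-loop mechanism a bit more concrete, here is a portable analogy in standard C++. It is not COM code and uses no COM API. ApartmentThread is a hypothetical class in which one owner thread drains a queue of calls, and every other thread has to post its call to that queue and wait for the result.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for an STA: a single owner thread "pumps" a queue.
class ApartmentThread {
public:
    ApartmentThread() : worker_([this] { pump(); }) {}

    ~ApartmentThread() {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    // "Marshals" a call: queues it for the owner thread and blocks until it ran.
    // Must not be called from the owner thread itself, or it waits forever.
    int call(std::function<int()> fn) {
        std::packaged_task<int()> task(std::move(fn));
        std::future<int> result = task.get_future();
        { std::lock_guard<std::mutex> lock(m_); queue_.push(std::move(task)); }
        cv_.notify_one();
        return result.get();
    }

    std::thread::id owner() const { return worker_.get_id(); }

private:
    // Analogue of the message loop: take one queued call at a time and run it.
    void pump() {
        for (;;) {
            std::packaged_task<int()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down, nothing left to run
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<int()>> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last: it starts using the members above at once
};
```

Because calls run one at a time on the owner thread, the wrapped object never sees concurrent access. The cost is also visible: if the owner thread is busy or blocked and stops draining the queue, every caller waits, which is the deadlock scenario described next.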
So, the general answer to your question is, yes, it normally is thread-safe. There are still things that can go wrong. Deadlock is possible when a call is made from a worker thread but the STA thread isn't pumping. Or the server could be fibbing about the ThreadingModel it registered. Not uncommon with servers implemented in .NET. They get registered as Both, but there are few .NET classes that are actually thread-safe.
See this very detailed article. Basically COM will take care of synchronization - you don't need to bother. However in certain cases the consumer can experience significant slowdown because of synchronization.
Depends upon the COM object's threading model. If it's free threaded then you are responsible for thread safety. If it's in a single threaded apartment then you can only call it from one thread, and if it's in a multithreaded apartment, then you can, but as always you have to consider the implications for the object's state. There is a very good answer on SO, "Could you explain STA and MTA?", explaining this.

What's the best way of ensuring valid object lifespan when using Boost.Asio?

Been playing a lot with Boost.Asio of late. I like the library a lot since it offers a fantastic way to squeeze performance out of today's multicore systems.
A question I have asked myself a few times, and I thought worth throwing out there regards object lifespan / ownership when making async calls with Asio.
The problem I've come across repeatedly is that you quite often have to "expire" an object that still has async callbacks pending against it. If that object goes out of scope before the callback is invoked things inevitably go bang.
To combat this I've taken to using the boost::enable_shared_from_this template as a base class for most asio based classes. This works OK but it's a little burdensome: usually this also means protecting the constructor and adding a factory method to the class to ensure that all instances are created inside a shared_ptr.
I just wanted to know how other people had tackled this problem. Am I going about this the best way? Or have I got my Asio.Foo all wrong?
Discuss... :)
Using boost::enable_shared_from_this is pretty much the way to do it. Additionally, look at using boost::weak_ptr if you need references to the object that should not preserve the object if they are the only references which remain.
A good example of using weak_ptr: I use enable_shared_from_this in my socket class which utilizes boost::asio. The boost::asio framework is the only thing that stores persistent references to the object, via read and write handlers. Thus, when the socket's destructor is called, I know that the socket is closed and I can "do stuff" in a handler to clean up for that closed socket. The application which uses the socket only has a weak_ptr reference to it, which it promotes to a shared_ptr when it wants to work with the socket (usually to write to it). That promotion can be checked for failure in case the socket went away, although the socket's close handler usually cleans up all the weak_ptr references appropriately before that even happens.
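A minimal sketch of that arrangement, using the std:: equivalents of the Boost smart pointers and leaving Asio out entirely so it stays self-contained. Connection, keep_alive and try_write are hypothetical names; keep_alive stands in for the shared_ptr that a pending read or write handler would capture.

```cpp
#include <memory>
#include <string>

// Hypothetical socket-like class. The private constructor plus factory method
// guarantees every instance is owned by a shared_ptr, which shared_from_this needs.
class Connection : public std::enable_shared_from_this<Connection> {
public:
    static std::shared_ptr<Connection> create() {
        return std::shared_ptr<Connection>(new Connection());
    }

    // The reference an async handler would capture to keep the object alive
    // until the handler has been invoked.
    std::shared_ptr<Connection> keep_alive() { return shared_from_this(); }

    void write(const std::string& data) { sent_ += data; }
    const std::string& sent() const { return sent_; }

private:
    Connection() = default;
    std::string sent_;
};

// Application side: holds only a weak reference and promotes it before use.
// Returns false if the connection has already gone away.
inline bool try_write(const std::weak_ptr<Connection>& weak, const std::string& data) {
    if (std::shared_ptr<Connection> conn = weak.lock()) {
        conn->write(data);
        return true;
    }
    return false;
}
```

In this sketch, once the last handler reference is dropped the object is destroyed, and any later try_write fails cleanly instead of touching freed memory.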
That kind of thing isn't limited to Asio. I recently wrote a thread-pool class (using Boost::Thread) that had pretty much the same problem -- the threads would call the thread-pool class that created them to see what task they had to do next, using a plain pointer to it, and if the thread-pool class were destroyed with a child thread still running, the program would crash. I dealt with it by calling interrupt on each of the threads in the thread-pool destructor, then waiting for all of them to exit before letting the destructor return.
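The same idea can be sketched with the standard library alone. This is an assumption-laden simplification of what is described above: TinyPool is a hypothetical class, and instead of Boost.Thread's interrupt it uses a stop flag, lets already queued jobs finish, and then joins every worker before the destructor returns.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool whose destructor outlives all of its workers.
class TinyPool {
public:
    explicit TinyPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~TinyPool() {
        { std::lock_guard<std::mutex> lock(m_); stopping_ = true; }
        cv_.notify_all();
        // Workers hold a plain `this` pointer, so wait for every one of them
        // to exit before any member is destroyed.
        for (auto& w : workers_) w.join();
    }

    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Note that the destructor blocks for as long as the longest running job takes, which ties straight back to the cancellation problem discussed earlier.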
If I understand your shared-pointer solution, it seems to be doing the same general thing -- ensuring that the item can't be destroyed until it's no longer needed. An aesthetically pleasing solution too. I don't see any better answer to this kind of problem.