Thread safety of Matlab engine API - c++

I have discovered through trial and error that the MATLAB engine API is not completely thread-safe.
Does anyone know the rules?
Discovered through trial and error:
On Windows, the connection to MATLAB is via COM, so the COM Apartment threading rules apply. All calls must occur in the same thread, but multiple connections can occur in multiple threads as long as each connection is isolated.
From the answers below, it seems that this is not the case on UNIX, where calls can be made from multiple threads as long as the calls are made serially.

From the documentation:
MATLAB libraries are not thread-safe. If you create multithreaded applications, make sure only one thread accesses the engine application.

When I first started using the engine, I didn't run across any documentation on thread safety, so I assumed that it was not thread-safe.
I use a C++ class to synchronize access to an engine instance. For more parallel processing designs, I instantiate multiple instances of the engine class.
(edit) I'm using MATLAB R14 on Solaris. I open the engine using the 'engOpen' call, and close it using 'engClose'. My platform does not crash when engClose is called from a different thread than the one that called engOpen.
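To illustrate the serialized-access approach (the UNIX case above; on Windows the COM apartment rules would additionally pin all calls to one thread), here is a minimal sketch of such a synchronization class, assuming the standard Engine C API (engOpen, engEvalString, engClose). The class name and locking granularity are illustrative, not the actual code referred to above.

    // Sketch only: serialize all access to one Engine* behind a mutex.
    #include <mutex>
    #include <stdexcept>
    #include <string>
    #include "engine.h"   // MATLAB Engine C API

    class EngineSession {
    public:
        EngineSession() {
            ep_ = engOpen("");                     // start or connect to a MATLAB session
            if (!ep_) throw std::runtime_error("engOpen failed");
        }
        ~EngineSession() { engClose(ep_); }

        // Every call into the engine takes the same lock, so only one
        // thread talks to this Engine instance at a time.
        int eval(const std::string& cmd) {
            std::lock_guard<std::mutex> lock(mtx_);
            return engEvalString(ep_, cmd.c_str());
        }

    private:
        Engine*    ep_;
        std::mutex mtx_;
    };

For more parallel designs, several such objects can be instantiated, each wrapping its own engine instance.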

From a user's perspective, Matlab's interpreter is purely single-threaded. To be safe, you probably need to make all access to the engine from a single thread.
Note that internally, Matlab uses plenty of threads. There are GUI threads, and in the last few versions, the interpreter can use multiple threads behind the scenes. But, the interpreter is semantically equivalent to a single-threaded interpreter (with interrupts).

You can use engOpenSingleUse instead of engOpen so that each thread works with its own separate engine instance. (Windows only)
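As a hedged sketch of that idea (Windows only, and assuming the documented engOpenSingleUse signature), each worker thread can open its own private engine instance so no Engine* is ever shared:

    // Sketch only: one dedicated MATLAB instance per thread via engOpenSingleUse.
    #include <thread>
    #include <vector>
    #include "engine.h"

    void worker() {
        int status = 0;
        Engine* ep = engOpenSingleUse(nullptr, nullptr, &status);  // private instance
        if (!ep) return;
        engEvalString(ep, "x = magic(4);");   // this thread owns ep exclusively
        engClose(ep);
    }

    int main() {
        std::vector<std::thread> pool;
        for (int i = 0; i < 2; ++i) pool.emplace_back(worker);
        for (auto& t : pool) t.join();
    }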

Related

Is it correct to use std::async for background tasks inside an internal thread (not from the main process's thread)?

I would like your opinion on this general technical approach. (I am working on the Microsoft Windows OS.)
There is a process, and this process creates multiple threads for different tasks.
Main process: a Windows service written in C#.
Several threads are created inside the main process: Thread_01, Thread_02, ...
Inside Thread_01: there is a wrapper DLL written in managed C++ to consume DLL_01. (DLL_01 is a DLL I wrote in native C++ that provides some APIs: Add, Remove, Connect.)
Add and Remove run very fast, but Connect may take more than 10 seconds and blocks the caller until it finishes.
I am thinking of using std::async to run the Connect function and send the result through a callback to the caller (the main process).
Is this a good approach? I have heard that you cannot, or at least should not, create threads from inside worker threads. Is that true? If so, what about std::async?
Any recommendation is appreciated.
Thanks in advance,
None of what you describe makes the use of threads unacceptable for your code.
As usual, threads have issues that need to be cared for:
Data races due to access to shared data.
Resource ownership is no longer just a question of "Who owns what?" but "Who owns what, and when?".
When a thread is blocked and you want to abort that operation, how do you cancel it without causing issues down the line? In your case, you must avoid calling the callback when the receiver no longer exists.
Concerning your approach of using a callback, consider std::future<> instead. This takes care of a few of the issues above, though some are only shifted to the caller instead.
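A minimal sketch of the std::future approach, assuming a blocking native Connect(); the names here are placeholders for the real DLL_01 API, not its actual signatures:

    // Sketch only: run the slow Connect on another thread and collect the result later.
    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>

    bool Connect() {                                   // placeholder for the blocking API
        std::this_thread::sleep_for(std::chrono::seconds(10));
        return true;
    }

    int main() {
        // std::launch::async guarantees a separate thread rather than deferred execution.
        std::future<bool> result = std::async(std::launch::async, Connect);

        while (result.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
            // The caller stays responsive here instead of blocking for 10+ seconds.
        }
        std::cout << "Connect returned " << std::boolalpha << result.get() << '\n';
    }

Compared with a callback, the caller decides when to consume the result, which sidesteps the "receiver no longer exists" problem mentioned above.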

Can fibers migrate between threads?

Can a fiber created in thread A switch to another fiber created in thread B? To make the question more specific: some operating systems implement fibers natively (Windows fibers),
while others require you to implement them yourself (using setjmp/longjmp on Linux, etc.).
Libcoro, for example, wraps this all up in a single API (on Windows it's just a wrapper for native fibers, on Linux it implements them itself, etc.).
So, if it's possible to migrate fibers between threads, can you give me an example usage on Windows (or Linux) in C/C++?
I found something about fiber migration in the Boost library documentation, but it's not specific enough about its implementation and platform dependence. I still want to understand how to do it myself using only Windows fibers, for example (or using Libcoro on Linux).
If it's not possible in a general way, why so?
I understand that fibers are meant to be used as lightweight threads for cooperative multitasking over a single thread, they have cheap context switching compared to regular threads, and they simplify the programming.
An example usage is a system with several threads, each having several fibers doing some kind of work hierarchy on their parent thread (never leaving the parent thread).
Even though it's not the intended use, I still want to learn how to do it if it's possible in a general way, because I think I can optimize the workload in my job system by migrating fibers between threads.
The mentioned boost.fiber uses boost.context (callcc/continuation) to implement context switching.
Up to boost-1.64, callcc was implemented in assembler only; boost-1.65 lets you choose between assembler, Windows Fibers (Windows), or ucontext (POSIX where available; a deprecated POSIX API).
The assembler implementation is faster than the other two (by 2 orders of magnitude compared to ucontext).
boost.fiber uses callcc to implement lightweight threads/fibers - the library provides fiber schedulers that allow fibers to be migrated between threads.
For instance, one of the provided schedulers steals fibers from other threads if its own run queue runs out of work (fibers that are ready, i.e. that can be resumed).
(So you can even choose the Windows Fibers implementation and still have fibers migrated between threads.)
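A hedged sketch of what that looks like with boost.fiber's work-stealing scheduler (thread and fiber counts are arbitrary; this follows the pattern of the library's examples rather than any particular production setup):

    // Sketch only: fibers launched on the main thread may be stolen (migrated)
    // by idle worker threads that share the same work_stealing scheduler.
    #include <boost/fiber/all.hpp>
    #include <boost/fiber/algo/work_stealing.hpp>
    #include <cstdint>
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    static constexpr std::uint32_t kThreads = 4;   // threads sharing one fiber pool
    static boost::fibers::mutex              mtx;
    static boost::fibers::condition_variable cv;
    static bool done = false;

    int main() {
        // Every participating thread installs the same work-stealing algorithm.
        boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(kThreads);

        std::vector<std::thread> workers;
        for (std::uint32_t i = 1; i < kThreads; ++i) {
            workers.emplace_back([] {
                boost::fibers::use_scheduling_algorithm<boost::fibers::algo::work_stealing>(kThreads);
                std::unique_lock<boost::fibers::mutex> lk(mtx);
                cv.wait(lk, [] { return done; });   // suspends this thread's main fiber;
            });                                     // stolen fibers run on it meanwhile
        }

        std::vector<boost::fibers::fiber> fibers;
        for (int i = 0; i < 16; ++i)
            fibers.emplace_back([i] {               // any worker thread may resume these
                std::cout << "fiber " << i << " ran on thread "
                          << std::this_thread::get_id() << '\n';
            });
        for (auto& f : fibers) f.join();

        { std::unique_lock<boost::fibers::mutex> lk(mtx); done = true; }
        cv.notify_all();
        for (auto& t : workers) t.join();
    }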

Game Engine Multithreading with Lua

I'm designing the threading architecture for my game engine, and I have reached a point where I am stumped.
The engine is partially inspired by Grimrock's engine, where they put as much as they could into LuaJIT, with some things, including low level systems, written in C++.
This seemed like a good plan, given that LuaJIT is easy to use, and I can continue to add API functions in C++ and expand it further. Faster iteration is nice, the ability to have a custom IDE attached to the game and edit the code while it runs is an interesting option, and serializing from Lua is also easy.
But I am stumped on how to go about adding threading. I know Lua has coroutines, but that is not true threading; it's basically to keep Lua from stalling as it waits for code that takes too long.
I originally had in mind to have the main thread running in Lua and calling C++ functions which are dispatched to the scheduler, but I can't find enough information on how Lua handles this. I do know that when Lua calls a C++ function the call runs outside of the Lua state, so theoretically it may be possible.
I also don't know whether, if Lua makes such a call that is not supposed to return anything, it will still block on the function until it's done.
And I'm not sure whether the task scheduler should run in the main thread, or whether it should simply be a set of worker threads pulling work from a queue, which would basically mean that, instead of everything running at once, the workers wait for the game state update before doing anything.
Does anyone have any ideas, or suggestions for threading?
In general, a single lua_State * is not thread-safe. It's written in pure C and meant to go very fast. It's not safe to let exceptions propagate through it either. There are no locks in there and no way for it to protect itself.
If you want to run multiple Lua scripts simultaneously in separate threads, the most straightforward way is to call luaL_newstate() separately in each thread, initialize each of them, and load and run scripts in each of them. They can talk to the C++ side safely as long as your callbacks use locks where necessary. At least, that's how I would try to do it.
There are various things you could do to speed it up. For instance, if you are loading copies of a single script in each of the threads, you could compile it to Lua bytecode before you launch any of the threads, then put the buffer into shared memory, and have each state load the shared bytecode without modifying it. That's most likely an unnecessary optimization, though, depending on your application.
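A minimal sketch of the one-state-per-thread approach, using the plain Lua C API; the script name, the callback, and the shared mutex are purely illustrative:

    // Sketch only: each thread owns a private lua_State, so Lua itself needs no locking.
    // Shared C++ state touched from callbacks still needs a lock.
    #include <lua.hpp>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::mutex g_shared_mutex;                       // protects shared C++ engine state

    int l_report(lua_State* L) {                     // C function callable from Lua
        std::lock_guard<std::mutex> lock(g_shared_mutex);
        // ... touch shared C++ state here ...
        return 0;                                    // no return values to Lua
    }

    void run_script(const char* path) {
        lua_State* L = luaL_newstate();              // private interpreter for this thread
        luaL_openlibs(L);
        lua_register(L, "report", l_report);         // expose the locked callback
        luaL_dofile(L, path);                        // load and run the script
        lua_close(L);
    }

    int main() {
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i)
            threads.emplace_back(run_script, "worker.lua");   // hypothetical script name
        for (auto& t : threads) t.join();
    }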

Handling GUI thread in a program using OpenMP

I have a C++ program that performs some lengthy computation in parallel using OpenMP. Now that program also has to respond to user input and update some graphics. So far I've been starting my computations from the main / GUI thread, carefully balancing the workload so that it is neither so short that the OpenMP threading overhead dominates nor so long that the GUI becomes unresponsive.
Clearly I'd like to fix that by running everything concurrently. As far as I can tell, OpenMP 2.5 doesn't provide a good mechanism for doing this; I assume it wasn't intended for this type of problem. I also wouldn't want to dedicate an entire core to the GUI thread; it just needs <10% of one for its work.
I thought maybe separating the computation into a separate pthread which launches the parallel constructs would be a good way of solving this. I coded this up but had OpenMP crash when invoked from the pthread, similar to this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36242 . Note that I was not trying to launch parallel constructs from more than one thread at a time, OpenMP was only used in one pthread throughout the program.
It seems I can neither use OpenMP to schedule my GUI work concurrently nor use pthreads to have the parallel constructs run concurrently. I was thinking of just handling my GUI work in a separate thread, but that happens to be rather ugly in my case and might actually not work due to various libraries I use.
What's the textbook solution here? I'm sure others have used OpenMP in a program that needs to concurrently deal with a GUI / networking etc., but I haven't been able to find any information using Google or the OpenMP forum.
Thanks!
There is no textbook solution. The textbook application for OpenMP is non-interactive programs that read input files, do heavy computation, and write output files, all using the same thread pool of size ~ #CPUs in your supercomputer. It was not designed for concurrent execution of interactive and computation code and I don't think interop with any threads library is guaranteed by the spec.
Leaving theory aside, you seem to have encountered a bug in the GCC implementation of OpenMP. Please file a bug report with the GCC maintainers and for the time being, either look for a different compiler or run your GUI code in a separate process, communicating with the OpenMP program over some IPC mechanism. (E.g., async I/O over sockets.)
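One very simple flavor of that process separation, sketched under the assumption of a POSIX system and a hypothetical "omp_worker" program that contains all the OpenMP code (a real GUI would read the pipe asynchronously or from its event loop rather than in a blocking loop):

    // Sketch only: keep the GUI process free of OpenMP by running the computation
    // in a child process and reading its results over a pipe.
    #include <cstdio>
    #include <iostream>

    int main() {
        FILE* child = popen("./omp_worker --chunk 42", "r");  // hypothetical worker binary
        if (!child) return 1;

        char line[256];
        while (std::fgets(line, sizeof line, child)) {
            std::cout << "result: " << line;   // hand results to the GUI as they arrive
        }
        pclose(child);
    }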

QueueUserWorkItem with COM in C++

I have a performance issue where clients are creating hundreds of a particular kind of object "Foo" in my C++ application's DOM. Each Foo instance has its own asynchronous work queue with its own thread. Obviously, that doesn't scale.
I need to share threads amongst work queues, and I don't want to re-invent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool. The work that needs to be done to process each queue item involves making COM calls in the multi-threaded COM apartment. The documentation for the XP thread pool says that it is okay to call CoInitializeEx() with the MTA apartment in the thread worker function callback. I've written a test app and verified that this works. I made the app run 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the WorkItem callback function. It takes 35 seconds with the CoInit* calls and 5 seconds without them. That's way too much overhead for my application. Since the thread pool is per-process and 3rd-party code runs in my process, I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?
Have you verified what is taking so long? I.e., is it the call to CoInitializeEx()? You definitely don't need to call CoInitialize once per task. You also don't say how many threads you spawn; if you're running on a dual core and your work is CPU-intensive, don't expect more than a 2x speedup, and if your work isn't CPU-intensive then it's waiting on some resource (memory, disk, network) and speedups will be similarly constrained, perhaps made worse if a lock is being held for that resource.
If you can use Visual Studio 2010 take a look at the Parallel Pattern Library and Asynchronous Agents Library, there are a couple tools that can help make this take less code to write.
If you can't, you can at least try placing a token in TLS that records whether COM has been initialized on that thread, and use the presence of this token to skip your calls to CoInitialize when they aren't needed.
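A hedged sketch of that TLS idea, using C++11 thread_local instead of raw TlsAlloc; the work body is illustrative, and the missing CoUninitialize is deliberate, per the discussion below:

    // Sketch only: initialize COM at most once per pool thread inside the
    // QueueUserWorkItem callback, and skip the call on subsequent work items.
    #include <windows.h>
    #include <objbase.h>

    DWORD WINAPI WorkItem(LPVOID /*context*/) {
        thread_local bool com_ready = false;          // one flag per pool thread
        if (!com_ready) {
            if (SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)))
                com_ready = true;                     // later items on this thread skip it
        }

        // ... make the MTA COM calls for this queue item here ...

        // Deliberately no CoUninitialize: the thread keeps its MTA registration
        // for the next work item (see the discussion of cleanup below).
        return 0;
    }

    // Queued as usual: QueueUserWorkItem(WorkItem, nullptr, WT_EXECUTEDEFAULT);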
I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Windows will clean up if a thread exits without calling CoUninitialize, we know this works because if it didn't there would be no cleanup when threads crash or are aborted.
So the only way this hack could cause a problem is if someone were trying to queue work items that needed an STA apartment, which seems unlikely.
I'd be tempted to go for it.