How to let Matlab keep a MEX session alive - C++

My question is about how to program Matlab and my C++ code so that they can interact. To be more specific, I have a C++ program that processes data, creates an object, derives statistics of that object, and writes them to a MAT-file. I then load the file in Matlab for further analysis and visualization.
However, the time it takes to process the data and create the object is enormous, while the time to derive a statistic is negligible. On the other hand, there are many statistics and different combinations of them, and it is difficult to anticipate which combinations we are going to use. So I hope I can run the "statistics" part interactively many times without repeating the data processing.
My question is: can I ask Matlab to:
1. call the C++ code;
2. after processing the data and creating the object, keep that object "alive" in memory;
3. call the C++ code again to load a statistic into my workspace;
4. repeat step 3 with different statistics?
Thanks

A further option may be to create a C++ class instance in your MEX function and return a pointer to it to MATLAB, passing the pointer to any subsequent calls. If you use this approach, you should also create a MATLAB handle class wrapper for it, so that you can clean up memory properly in its destructor. Here is a post where the poster was advised to do just that, and this is an example of the method on the MathWorks File Exchange.
The applicability of this method depends on the complexity of the problem. I would personally only go down this route if the problem is intractably complex with other approaches (e.g., you need to use a C++ class from some library and the instance must stay alive between calls, or global variables won't do the trick because you need to keep track of many instances, which is naturally best represented by an array of C++ objects where you can properly separate your concerns).
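A minimal sketch of the pointer-as-handle idea, written as plain C++ so it compiles without `mex.h`. The names `DataObject`, `HandleBox`, `toHandle`, `fromHandle`, `destroyHandle`, and the signature value are hypothetical, not part of the MEX API or of the File Exchange submission. In a real MEX file the `uint64_t` would be returned to MATLAB in a scalar created with `mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL)` and read back from `prhs` on later calls.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the expensive object built from the raw data.
struct DataObject {
    std::vector<double> values;
    double mean() const {
        double sum = 0.0;
        for (double v : values) sum += v;
        return values.empty() ? 0.0 : sum / values.size();
    }
};

// Owns the object and carries a signature for a best-effort validity check.
struct HandleBox {
    static constexpr std::uint32_t kSignature = 0xC0FFEE42u;  // arbitrary value
    std::uint32_t signature = kSignature;
    DataObject* object;
    explicit HandleBox(DataObject* obj) : object(obj) {}
    HandleBox(const HandleBox&) = delete;             // never copied
    HandleBox& operator=(const HandleBox&) = delete;
    ~HandleBox() { delete object; }
};

// Turn a newly created object into an integer handle for MATLAB to hold.
std::uint64_t toHandle(DataObject* obj) {
    return reinterpret_cast<std::uint64_t>(new HandleBox(obj));
}

// Recover the object from a handle passed back in a later call.
DataObject* fromHandle(std::uint64_t handle) {
    auto* box = reinterpret_cast<HandleBox*>(handle);
    if (box == nullptr || box->signature != HandleBox::kSignature)
        throw std::runtime_error("invalid handle");
    return box->object;
}

// Called from the MATLAB handle class's delete method.
void destroyHandle(std::uint64_t handle) {
    fromHandle(handle);                           // throws if obviously invalid
    delete reinterpret_cast<HandleBox*>(handle);  // also deletes the object
}
```

The MATLAB handle class would store the handle in a property, pass it to every statistics call, and call the destroy command from its `delete` method. The signature check catches null or random handles; it cannot reliably detect a handle that was already destroyed.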

One way to accomplish this is to declare the variables that you want to access again as globals in your C++ MEX code. These variables stay in memory, and you can access them again on later calls to your MEX function, until you clear that MEX function or close the Matlab session. I used global variables for this purpose and it worked just fine for me.
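A sketch of the global-variable approach, again in plain C++ so it compiles without `mex.h`. The command strings and the functions `handleCall`, `loadAndProcess`, and `releaseState` are hypothetical; in a real MEX file `handleCall` would be the body of `mexFunction`, and `mexAtExit(releaseState)` could be used to free the data when the MEX file is cleared.

```cpp
#include <algorithm>
#include <memory>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// File-scope state: lives as long as the MEX file stays loaded in MATLAB.
static std::unique_ptr<std::vector<double>> g_data;
static int g_processCount = 0;  // how many times the expensive step ran

// Hypothetical stand-in for the slow data-processing step.
static std::vector<double> loadAndProcess() {
    ++g_processCount;
    return {4.0, 8.0, 15.0, 16.0, 23.0, 42.0};
}

// One "call" from MATLAB: build the object once, then answer cheap queries.
double handleCall(const std::string& command) {
    if (command == "process") {
        if (!g_data)
            g_data = std::make_unique<std::vector<double>>(loadAndProcess());
        return static_cast<double>(g_data->size());
    }
    if (!g_data) throw std::runtime_error("call 'process' first");
    if (command == "sum")
        return std::accumulate(g_data->begin(), g_data->end(), 0.0);
    if (command == "max")
        return *std::max_element(g_data->begin(), g_data->end());
    throw std::runtime_error("unknown command: " + command);
}

// Frees the cached object; suitable for registration with mexAtExit.
void releaseState() { g_data.reset(); }
```

Repeated `"process"` calls reuse the cached object, so only the cheap statistics are recomputed.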
Another option is to use persistent variables. From the documentation:
Persistent variables are similar to global variables because the MATLAB® software creates permanent storage for both. They differ from global variables in that persistent variables are known only to the function in which they are declared. This prevents persistent variables from being changed by other functions or from the MATLAB command line.
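In C++ MEX code, the closest analogue to a MATLAB persistent variable is a function-local `static`: it keeps its value between calls but is visible only inside the function that declares it. A minimal sketch; `cachedTable`, its contents, and the `g_tableBuilds` counter (there only to show the build runs once) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Counts how often the expensive build ran (for demonstration only).
int g_tableBuilds = 0;

const std::vector<double>& cachedTable() {
    // Initialized on the first call only; later calls reuse it. Unlike a
    // global, 'table' cannot be reached from any other function.
    static const std::vector<double> table = [] {
        ++g_tableBuilds;
        std::vector<double> t(8);
        for (std::size_t i = 0; i < t.size(); ++i)
            t[i] = std::sqrt(static_cast<double>(i));
        return t;
    }();
    return table;
}
```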

Related

C++ Multithreading objects from library with static variables

I created several "manager" objects of a library, each with different parameters. Every cycle, a manager is fed a data set, runs calculations, and writes the results into a data structure. I have to run all managers on the same data set as fast as possible, so I created a thread pool to distribute the data to all managers so that they can run concurrently. Each manager has access to its own result data structure, so I thought this would be thread safe.
However, I later found out that several classes in this library, which are used by the managers, have static member variables, which (I believe) cause segmentation faults; the segmentation faults originate from the library, not my code (checked).
My question is: is it possible to work around this? This will probably sound stupid, but is it possible to force each manager to use its own copy of the library and thus circumvent the static issue? I am processing ~20-50k data sets per second, so I cannot afford overhead. Using forks would be very painful and in my case could create unwanted overhead.
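Why separate manager objects can still collide: a static data member is a single variable shared by every instance in the process, so giving each manager its own result structure does not isolate it. A toy illustration with a hypothetical `LibraryHelper` class:

```cpp
// Hypothetical library class with a static scratch variable.
class LibraryHelper {
public:
    static int scratch;   // one copy for the whole process
    int perInstance = 0;  // one copy per object
    void compute(int input) {
        scratch = input;      // every instance writes the same variable
        perInstance = input;  // each instance writes its own member
    }
};

int LibraryHelper::scratch = 0;
```

If two instances call `compute` on different threads, the writes to `scratch` are a data race even though each instance owns its own `perInstance`.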
Thanks for any advice!

Sharing pointer across programs in C++

This is related to a previous post:
Allocating a large memory block in C++
I would like a single C++ server running that generates a giant matrix M. Then, on the same machine, I would like to run other programs that can contact this server and get the memory address of M. M is read only, and the server creates it once. I should be able to spawn a client ./test, and this program should be able to get read-only access to M. The server should always be running, but I can run other programs like ./test at any time.
I don't know much about C++ or operating systems; what is the best way to do this? Should I use POSIX threads? The matrix is of a primitive type (double, float, etc.), and all programs know its dimensions. The client programs require the entire matrix, so I don't want the latency of copying memory from the server to the client; I just want to share that pointer directly. What are my best options?
One mechanism of inter-process communication you could definitely use for sharing direct access to your matrix M is shared memory. It means that the OS lets multiple processes access a shared segment of memory, as if it were in their own address space, by mapping it into each process that requests it. A solution that answers all your requirements, and is also cross-platform, is boost::interprocess. It is a thin portable layer that wraps all of the necessary OS calls. See a working example right here in the docs.
Essentially, your server process just needs to create an object of type boost::interprocess::shared_memory_object, passing the constructor a name for the shared segment. Calling its truncate() method sets the size of the segment, and mapping it with a boost::interprocess::mapped_region makes it visible in the server's address space. From then on, any other process can create an object of the same type with the same name and map it in the same way. Now it too has access to the exact same memory. No copies involved.
If for some reason you are unable to use the portable Boost libraries, or for some other reason want to restrict the supported platform to Linux, use the POSIX API around the mmap() function. Here's the Linux man page. Usage is not far from the Boost approach described above: you create the named segment with shm_open() and size it with ftruncate(). From there onwards you obtain a pointer to the mapped space by calling mmap(). In simpler cases where you'll only be sharing between parent and child processes, you can use this code example from this very website.
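A minimal sketch of the POSIX route on Linux: the server creates, sizes, maps, and fills a named segment, and a client maps the same name read-only. The segment name `/matrix_M`, the 4x3 dimensions, and the helper names are assumptions for illustration; error handling is reduced to throwing exceptions. With older glibc versions you may need to link with `-lrt` for `shm_open`.

```cpp
#include <fcntl.h>     // O_CREAT, O_RDWR, O_RDONLY
#include <sys/mman.h>  // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>    // ftruncate, close
#include <cstddef>
#include <stdexcept>

constexpr const char* kName = "/matrix_M";  // hypothetical segment name
constexpr std::size_t kRows = 4, kCols = 3;
constexpr std::size_t kBytes = kRows * kCols * sizeof(double);

// Server side: create the segment, set its size, map it writable, fill M.
double* createMatrix() {
    int fd = shm_open(kName, O_CREAT | O_RDWR, 0600);
    if (fd == -1) throw std::runtime_error("shm_open failed");
    if (ftruncate(fd, kBytes) == -1) {
        close(fd);
        throw std::runtime_error("ftruncate failed");
    }
    void* p = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    double* m = static_cast<double*>(p);
    for (std::size_t i = 0; i < kRows * kCols; ++i) m[i] = static_cast<double>(i);
    return m;
}

// Client side: open the same name read-only and map it; nothing is copied.
const double* openMatrixReadOnly() {
    int fd = shm_open(kName, O_RDONLY, 0);
    if (fd == -1) throw std::runtime_error("shm_open failed");
    void* p = mmap(nullptr, kBytes, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    return static_cast<const double*>(p);
}

// Unmap; the server also removes the name once M is no longer needed.
void releaseMatrix(const double* m, bool unlinkName) {
    munmap(const_cast<double*>(m), kBytes);
    if (unlinkName) shm_unlink(kName);
}
```

Each process gets its own virtual address for the segment, which is why the name, not the pointer value, is what gets shared.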
Of course, no matter which approach you take, when using a shared resource make sure to synchronize reads and writes properly so as to avoid any race condition, as you would in the scenario of multiple threads of the same process.
Of course other programs cannot access the matrix as long as it is in the "normal" process memory.
Without questioning the design approach: yes, you have to use shared memory. Look up functions like shmget(), shmat(), etc. Then you don't need to pass the pointer to another program (which would not work anyway, since each process has its own virtual address space); you simply use the same file path in ftok() everywhere to get access to the shared memory.
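A sketch of the System V calls mentioned above. `ftok()` turns an existing file path plus a project id into the same key in every process; the path `/tmp`, the id `'M'`, the element count, and the helper names are placeholder assumptions, and the path must exist. The segment persists until it is removed with `shmctl(..., IPC_RMID, ...)`.

```cpp
#include <sys/ipc.h>  // ftok, IPC_CREAT, IPC_RMID
#include <sys/shm.h>  // shmget, shmat, shmdt, shmctl, SHM_RDONLY
#include <cstddef>
#include <stdexcept>

constexpr const char* kKeyPath = "/tmp";  // any path that exists for all processes
constexpr int kProjectId = 'M';
constexpr std::size_t kCount = 12;        // number of doubles in M

// Server and clients derive the same key, so no pointer is passed around.
int matrixSegmentId(bool create) {
    key_t key = ftok(kKeyPath, kProjectId);
    if (key == -1) throw std::runtime_error("ftok failed");
    int flags = create ? (IPC_CREAT | 0600) : 0;
    int id = shmget(key, kCount * sizeof(double), flags);
    if (id == -1) throw std::runtime_error("shmget failed");
    return id;
}

// Attach the segment at an address chosen by the OS; readOnly uses SHM_RDONLY.
double* attachMatrix(int id, bool readOnly) {
    void* p = shmat(id, nullptr, readOnly ? SHM_RDONLY : 0);
    if (p == reinterpret_cast<void*>(-1)) throw std::runtime_error("shmat failed");
    return static_cast<double*>(p);
}
```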

What is the right way to use QuantLib from multiple threads?

I haven't been able to find any documentation explicitly describing QuantLib's thread-safety properties (or the absence of them!). The QuantLib configuration documentation lists a number of compile-time options related to thread safety, from which I infer that, by default, QuantLib is not entirely thread-safe.
In particular, there are:
QL_ENABLE_SESSIONS - "If defined, singletons will return different instances for different sessions. You will have to provide and link with the library a sessionId() function in namespace QuantLib, returning a different session id for each session. Undefined by default."
QL_ENABLE_THREAD_SAFE_OBSERVER_PATTERN - "If defined, a thread-safe (but less performant) version of the observer pattern will be used. You should define it if you want to use QuantLib via the SWIG layer within the JVM or .NET eco system or any environment with an async garbage collector. Undefined by default."
QL_ENABLE_SINGLETON_THREAD_SAFE_INIT - "Define this to make Singleton initialization thread-safe. Undefined by default. Not compatible with multiple sessions."
Which options should I use, and what other steps should I take, if I want to use QuantLib:
From multiple threads, but never at the same time (eg only when holding a global lock)?
From multiple threads at the same time, but not sharing any objects between them?
From multiple threads at the same time, sharing objects between them?
The natural structure for my application is a directed acyclic graph, with a constant stream of market data entering at one end, being used to compute and update various objects, and producing a stream of estimated prices leaving at the other end. I would very much like to be able to have multiple cores working in parallel, as some calculations take a long time.
The application will mostly be written in Java, with minimal parts in C++ to interface with QuantLib. I am not planning to use the SWIG wrapper. I am happy to do memory management of QuantLib objects without help from Java's garbage collector.
EDIT! If you decide to set any of these options, then on Unix, do it with the corresponding flag to ./configure:
--enable-sessions
--enable-thread-safe-observer-pattern
--enable-thread-safe-singleton-init
The answer from SmallChess is not far from the truth. There are almost no locks or safety nets in QuantLib, so most people use multiprocessing if they need to distribute calculations over processors, and with good reason.
For those who want a bit more insight, and not as an endorsement of using multi-threading in QuantLib:
whatever else you do, if possible, enable the configuration switches that give you some safety, such as the one for thread-safe initialization of singletons (with a caveat, see below);
you might have multiple threads running at once if they don't share any objects, and if they don't try to modify globals such as the evaluation date (look for classes inheriting from Singleton for the list of globals).
if you need different evaluation dates for different threads, you can use another compilation switch to build QuantLib so that the singletons are not actually singletons, but there's an instance per thread. Caveat: this switch is not compatible with thread-safe initialization of singletons. You still shouldn't share objects between threads.
if you want to share objects, you might be in for more trouble than it's worth. The problems are: (1) any change to the underlying data of, say, a curve will trigger a recalculation; and (2) the recalculations (such as the bootstrap of a curve) are not executed right away, but only when needed, i.e., when some curve method is called. This means that you must keep the various steps separate: first, set the values of any quotes and make sure that there aren't any further changes; then, go around the curves and trigger recalculation, for instance by asking a discount factor at some date; finally, pass the curves to the instruments and price them. Changing a value during the calculations will result in a bootstrap being done in the middle of them; and not triggering full construction before calculations might lead to two instruments triggering two simultaneous bootstraps, which wouldn't end well for any concerned parties.
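To make the required ordering concrete, here is a toy model of lazy recalculation in plain C++. It is not QuantLib's API: `Quote`, `LazyCurve`, `discount`, and the simple-interest formula are hypothetical stand-ins for a quote, a bootstrapped curve, and the observer notification. It shows why the curve should be forced to recalculate before several pricers read it; otherwise the first reader performs the "bootstrap", and two concurrent readers would both try to.

```cpp
class LazyCurve;

// Toy market quote that notifies one dependent curve when it changes.
class Quote {
public:
    explicit Quote(double v) : value_(v) {}
    void setValue(double v);  // defined after LazyCurve
    double value() const { return value_; }
    void attach(LazyCurve* c) { observer_ = c; }
private:
    double value_;
    LazyCurve* observer_ = nullptr;
};

// Toy curve: marks itself dirty on notification, recomputes on first use.
class LazyCurve {
public:
    explicit LazyCurve(Quote& q) : quote_(q) { q.attach(this); }
    void update() { calculated_ = false; }  // called by Quote::setValue
    double discount(double t) {
        if (!calculated_) {  // the "bootstrap" happens lazily, here
            rate_ = quote_.value();
            ++bootstraps;
            calculated_ = true;
        }
        return 1.0 / (1.0 + rate_ * t);  // simple-interest toy formula
    }
    int bootstraps = 0;
private:
    Quote& quote_;
    double rate_ = 0.0;
    bool calculated_ = false;
};

void Quote::setValue(double v) {
    value_ = v;
    if (observer_) observer_->update();
}
```

Usage follows the three phases: set all quotes, call `discount` once to force the calculation, then let the pricers read. A `setValue` during phase three marks the curve dirty again, and the next reader recalculates in the middle of pricing.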
As I said, it's probably more trouble than it's worth. Ideally, don't share objects between threads and don't touch the globals. Otherwise, prefer multiprocessing.
Unfortunately, QuantLib is not thread safe. None of the options you have will help you. QuantLib is a free project; its focus is on the actual mathematical modelling, not on computational optimisations such as thread safety.
You should definitely wrap QuantLib in a process. Multithreading is not encouraged for QuantLib unless you absolutely know what you're doing and have checked the relevant source code.

cflock on application variables that rarely change

We currently have a series of variables that are loaded into the application scope that rarely change.
By rarely change, I mean that they are strings like phone numbers, or simple text values that appear on a website and may change once a week or once a month.
Since we are only reading these variables and they rarely change, is there any requirement to encapsulate these inside a cflock?
I think it would be a lot of coding overhead to wrap these variables inside a cflock, as the template may contain upwards of 20 instances of these static variables.
Any advice on this greatly appreciated
Personally I would say you do not need to. These variables are essentially constants.
However, you need to assess this yourself. You need to answer the question, 'what would be the ramifications of these variables being read with stale data?'
This means: if, as in your example, the wrong phone number is shown for a request, is that a disaster? If that is a problem you can live with, then you need make no changes. If, however, some variables are used in calculations, or would cause unacceptable problems if they were stale, then you will need to lock access to those. In this way you can focus your efforts where they are needed and minimise the additional work.
As an aside if you do need to lock any variables then a good pattern to use is to store them inside a CFC instance that is stored in application scope. This way you can handle all the locking in the CFC and your calling code remains simple.
Depending on the version of ACF, Railo, etc. you are using, I would suggest that data like this might be better stored in the cache than in the application scope. The cache can also persist through restarts and could be a more efficient way to go.
Take a look at the cachePut, cacheGet, cacheDelete, etc. functions in the documentation. I believe this functionality was added in CF9 and Railo 3.2.
Taking it one step further you could simply cache the entire output that uses them for X time as well, so that each time that part is loaded it only has to load one thing from the cache instead of the twenty or so times you mention.
If you are going to store them in the application scope then you only really need to have the cflock around the part of the code that updates them and lock it at the application level. That way anything wanting to read them will have to wait for it to finish updating them before it can read them anyway as the update thread will have a lock on the application scope.

Mixed-Mode Process vs. Managed-to-Unmanaged IPC

I am trying to come up with design candidates for a current project that I am working on. Its client interface is based on WCF Services exposing public methods and callbacks. Requests are routed all the way to C++ libraries (that use Boost) that perform calculations, operations, etc.
The current scheme is based on a WCF Service talking to a separate native C++ process via IPC.
To make things a little simpler, there is a recommendation around here to go mixed-mode (i.e. to have a single .NET process which loads the native C++ layer inside it, most likely communicating to it via a very thin C++/CLI layer). The main concern is whether garbage collection or other .NET aspects would hinder the performance of the unmanaged C++ part of the process.
I started looking up the concepts of safe points and GC helper methods (e.g. KeepAlive(), etc.), but I couldn't find any direct discussion or benchmarks about this. From what I understand so far, one of the safe points is when a thread is executing unmanaged code, and in this case garbage collection does not suspend that thread (is this correct?) to perform the cleanup.
I guess the main question I have is there a performance concern on the native side when running these two types of code in the same process vs. having separate processes.
If you have a thread that has never executed any managed code, it will not be frozen during .NET garbage collection.
If a thread which uses managed code is currently running in native code, the garbage collector won't freeze it, but instead mark the thread to stop when it next reaches managed code. However, if you're thinking of a native dispatch loop that doesn't return for a long time, you may find that you're blocking the garbage collector (or leaving stuff pinned causing slow GC and fragmentation). So I recommend keeping your threads performing significant tasks in native code completely pure.
Making sure that the compiler isn't silently generating MSIL for some standard C++ code (thereby making it execute as managed code) is a bit tricky. But in the end you can accomplish this with careful use of #pragma managed(push, off).
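A sketch of how the pragma is typically placed in a file compiled with `/clr`; the function `sumOfSquares` is a hypothetical example of a hot native routine. Other compilers, and MSVC without `/clr`, typically just warn about the pragma and ignore it, so the snippet also compiles as ordinary C++.

```cpp
#include <cstddef>

// Code between push/off and pop is compiled to native instructions, even
// when this translation unit is built with /clr.
#pragma managed(push, off)

double sumOfSquares(const double* data, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += data[i] * data[i];
    return total;
}

#pragma managed(pop)
```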
It is very easy to get a mixed mode application up and running, however it can be very hard to get it working well.
I would advise thinking carefully before choosing that design - in particular about how you layer your application and the sort of lifetimes you expect for your unmanaged objects. A few thoughts from past experiences:
C++ object lifetime - by architecture.
Use C++ objects briefly in local scope then dispose of them immediately.
It sounds obvious but is worth stating: C++ objects are unmanaged resources that are designed to be used as unmanaged resources. Typically they expect deterministic creation and destruction, often making extensive use of RAII. This can be very awkward to control from a managed program. The IDisposable pattern exists to try and solve this. This can work well for short-lived objects but is rather tedious and difficult to get right for long-lived objects. In particular, if you start making unmanaged objects members of managed classes rather than things that live in function scope only, very quickly every class in your program has to be IDisposable, and suddenly managed programming becomes harder than unmanaged programming.
The GC is too aggressive.
It is always worth remembering that when we talk about managed objects going out of scope, we mean in the eyes of the IL compiler/runtime, not the language that you are reading the code in. If an unmanaged object is kept around as a member and a managed object is designed to delete it, things can get complicated. If your dispose pattern is not complete from top to bottom of your program, the GC can get rather aggressive. Say, for example, you write a managed class which deletes an unmanaged object in its finaliser, and the last thing you do with the managed object is access the unmanaged pointer to call a method. Then the GC may decide that the middle of that unmanaged call is a great time to collect the managed object. Suddenly your unmanaged object is deleted mid method call.
The GC is not aggressive enough.
If you are working within address constraints (e.g. you need a 32 bit version) then you need to remember that the GC holds on to memory unless it thinks it needs to let go. Its only input to these thoughts is the managed world. If the unmanaged allocator needs space there is no connection to the GC. An unmanaged allocation can fail simply because the GC hasn't collected objects that are long out of scope. There is a memory pressure API but again it is only really usable/useful for quite simple designs.
Buffer copying. You also need to think about where to allocate any large memory blocks. Managed blocks can be pinned to look like unmanaged blocks. Unmanaged blocks can only ever be copied if they need to look like managed blocks. However when will that large managed block actually get released?