FFTW reentrancy in plug-in based programs

I'm developing a cross-platform application (Win / Mac / Linux). This application loads plug-ins that I don't control as dynamic libraries, which may do various things, mostly audio and image processing.
Some of these plug-ins may use FFTW as part of their implementation details. (This is not a hypothetical case - I already have three of those.)
But FFTW's fftw_plan family of functions is not reentrant per the docs - only one thread may be inside the planner at a time. The problem is that some of the plug-ins I load may call fftw_plan deep inside some thread that they create themselves.
Is there something I can do to make sure that things still work in that case, or should I just accept that this will end up crashing? (Putting each plug-in in its own process is sadly not an acceptable solution for me.)

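It turns out that FFTW provides the function void fftw_make_planner_thread_safe(void), which makes the planner safe to call from several threads, so plug-ins can create plans in threads of their own. (Executing an existing plan with fftw_execute is already thread-safe.)
Calling it once at the beginning of the program is enough. Per the FFTW documentation, it is available from FFTW 3.3.5 onward and requires linking against the FFTW threads library.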
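As a rough illustration of what a planner lock of this kind amounts to, the sketch below serializes calls to a deliberately non-thread-safe stand-in planner with one process-wide mutex. ToyPlanner, make_plan_serialized and plan_from_threads are hypothetical names, not FFTW API; in the real program the only line needed is a call to fftw_make_planner_thread_safe() before any plug-in is loaded.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a non-reentrant planner: it mutates shared state without any
// synchronization of its own. (Hypothetical; this is not FFTW code.)
struct ToyPlanner {
    int plans_created = 0;
    int make_plan() { return ++plans_created; }
};

ToyPlanner g_planner;
std::mutex g_planner_lock;

// Every planning call goes through one process-wide lock, so only one
// thread is ever inside the planner.
int make_plan_serialized() {
    std::lock_guard<std::mutex> guard(g_planner_lock);
    return g_planner.make_plan();
}

// Simulates several plug-in threads planning concurrently and returns how
// many plans were created by this call.
int plan_from_threads(int thread_count, int plans_per_thread) {
    assert(thread_count > 0 && plans_per_thread > 0);
    int before = 0;
    {
        std::lock_guard<std::mutex> guard(g_planner_lock);
        before = g_planner.plans_created;
    }
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([plans_per_thread] {
            for (int i = 0; i < plans_per_thread; ++i) make_plan_serialized();
        });
    }
    for (std::thread& w : workers) w.join();
    std::lock_guard<std::mutex> guard(g_planner_lock);
    return g_planner.plans_created - before;
}
```

Calling plan_from_threads(4, 1000) should report 4000 plans, because no increment is lost. The lock only covers planning, so executing plans stays as parallel as it was before.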

Related

Capping allocated memory in multi-threaded C++ library

I've developed a library in C++ that allows multi-threaded usage. I want to support an option for the caller to specify a cap on the memory allocated by a given thread. (We can ignore the case of one thread allocating memory and others using it.)
Possibly making this more complicated is that my library uses various open source components (boost, ICU, etc), some of which are statically linked and others dynamically.
One option I've been looking into is overriding the allocation functions (new/delete/etc) to do the bookkeeping per thread ID. Natural concerns come up around the bookkeeping: performance, etc.
But an even bigger question/concern is whether this approach will work with the open source components without code changes to them?
I can't seem to find pre-existing solutions for this, though it seems to me like it's not very unusual.
Any suggestions on this approach, or another approach?
EDIT: More background: The library can allocate a significantly large range of memory per calling thread depending on the input provided (i.e. KBs to GBs).
So the goal of this request is to (more gracefully & deterministically) support running in RAM-constrained environments. This is not for a hard-real-time environment with strict memory limits--it's to support a number of concurrent threads which each have a "safe" allocation cap to avoid engaging the page/swap file.
Basic example use case: a system with 32GB RAM, 20GB free, the application using my library may configure itself to use a max of 10 threads and configure the library to use a max of 1GB per thread.
Upon hitting the cap the current thread's call into the library will cease further work and return a suitable error. (The code is already fully RAII so unwinding cleanly is easy.)
BTW I found some interesting content on the web already, sadly none provide a lot of hope for a "simple & effective" solution. But this one is especially insightful.
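For illustration, the per-thread bookkeeping described in the question might look roughly like the sketch below. It replaces the global operator new/delete and keeps a thread_local byte counter; the memcap namespace and everything in it are hypothetical names. It is only a sketch under the question's own assumption that memory is freed by the thread that allocated it. It also only sees allocations that go through operator new: a component that calls malloc directly, or a dynamically linked library that binds to a different operator new, would bypass the counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

namespace memcap {  // hypothetical names, for illustration only

// Bytes currently allocated by this thread, and this thread's cap.
thread_local std::size_t in_use = 0;
thread_local std::size_t cap = std::numeric_limits<std::size_t>::max();

// Header placed in front of each block so operator delete knows its size.
// The union keeps the user pointer aligned for any fundamental type.
union Header {
    std::size_t size;
    std::max_align_t align;
};

// Lets the calling thread allocate `bytes` more than it currently holds.
inline void allow_additional(std::size_t bytes) {
    assert(bytes <= std::numeric_limits<std::size_t>::max() - in_use);
    cap = in_use + bytes;
}

}  // namespace memcap

void* operator new(std::size_t size) {
    using namespace memcap;
    const std::size_t limit =
        std::numeric_limits<std::size_t>::max() - sizeof(Header);
    // Refuse the request if it would push this thread past its cap.
    if (size > limit || in_use > cap || size > cap - in_use)
        throw std::bad_alloc();
    void* raw = std::malloc(sizeof(Header) + size);
    if (!raw) throw std::bad_alloc();
    static_cast<Header*>(raw)->size = size;
    in_use += size;
    return static_cast<char*>(raw) + sizeof(Header);
}

void operator delete(void* p) noexcept {
    using namespace memcap;
    if (!p) return;
    Header* h = reinterpret_cast<Header*>(static_cast<char*>(p) - sizeof(Header));
    // Clamp instead of wrapping if another thread frees our block.
    in_use -= std::min(in_use, h->size);
    std::free(h);
}

void operator delete(void* p, std::size_t) noexcept { operator delete(p); }
```

A library entry point could call memcap::allow_additional(1u << 30) on entry and catch std::bad_alloc at its boundary to return the "suitable error" mentioned above. The header costs some extra memory per allocation, which is part of the performance concern raised in the question.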

Dynamic Linking ~ Limiting a DLL's system access

I know the question might seem a little vague but I will try to explain as clearly as I can.
In C++ there is a way to dynamically link code to your already running program. I am thinking about creating my own plugin system (For learning/research purposes) but I'd like to limit the plugins to specific system access for security purposes.
I would like to give the plugins limited access to, for example, disk writing, such that a plugin can only call functions from the API I pass from my application (and write through my predefined interface). Is there a way to enforce this kind of behaviour from the application side?
If not: are there other languages that support secure dynamically linked modules?

You should think about writing a plugin container (a sandbox) and coordinating everything through it. Also make sure to drop any privileges you do not need inside the container process before running the plugin. Because the container is a separate process, you can run it as a dedicated user rather than the user who started the application; once you restrict that user, the process is restricted automatically. Using a dedicated user per process is the most common and easiest approach. It is also the only cross-platform way to limit a process: the same method works even on Windows.
Limiting access to shared resources that the OS provides, such as disk, RAM, or CPU, depends heavily on the OS, and you have not said which OS you target. While it is doable on most OSes, Linux is the prime choice because it was written with multi-seat and server use cases in mind. For example, on Linux you can use cgroups to limit CPU or RAM for each process, so you only need to apply them to your plugin container process. There is the blkio controller to control disk I/O, and you can still use the traditional quota mechanism to limit the per-user (and so, with a dedicated user, per-process) share of disk space.
Supporting plugins is an involved process, and the best way to start is to read code that does some of it. The Chromium sandbox is the best place I can suggest: it is very cleanly written and has nice documentation. Fortunately the code is not very big.
If you prefer less involvement with cgroups directly, there is an even easier mechanism for limiting resources: Docker. It is fairly new, but it abstracts away the low-level OS constructs so that you can contain applications easily without running them in virtual machines.

To block some calls, a first idea is to hook the system calls that are forbidden, along with any other API calls you don't want. You can also hook the dynamic-linking calls to prevent your plugins from loading other DLLs, and hook the disk read/write APIs to block reads and writes.
Take a look at this, it may give you an idea to how can you forbid function calls.
You can also try to sandbox your plugins: look at some open-source sandboxes and understand how they work. It should help you.

In this case you really have to sandbox the environment in which the DLL runs. Building such a sandbox is not easy at all, and it is something you probably do not want to do yourself. System calls can be hidden in strings or generated through metaprogramming at execution time, so they are hard to detect by just analysing the binary. Luckily, people have already built solutions, for example Google's Native Client project, whose goal is to allow C++ code to run safely in the browser. If it is safe enough for a browser, it is probably safe enough for you, and it might work outside of the browser.

Multithreading at boot time?

We are developing a very low-level application that runs before the OS boots; in fact, a boot application.
The question is how we should utilize CPU cores/threads.
How many threads could we run?
Is it possible at all? Is there any link or tutorial?

Since you're talking about threading before booting the OS, I'm going to assume that no kernel is available to you yet. That means no system calls, so no fork() or clone(). For the purpose of this answer, however, I'm also going to assume that you have already set up the A20-gate, a GDT, either protected (for IA-32) or long (for x86-64) mode, and so on. If you don't know what these are, we probably shouldn't be talking about threads before booting to begin with.
There are opcodes and tricks you can use to let your processor use other cores, thus implementing threading quite directly. You can find all of these in the Intel x86 manuals here: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf (You are working on x86, aren't you? You obviously need a different set of manuals if you're on a different architecture.)
The reason there are no tutorials for something like this is, quite frankly, that it's not very useful. The entire point of setting things up before loading the kernel into memory is to make it easier to load the kernel into memory. Threading does not exactly contribute to this goal. It would be advisable to simply let the kernel deal with such low-level implementation requirements, so that you can then use the fork() and clone() system calls for all your threading needs.
EDIT: Good correction by Sinn: fork() creates a new process, which of course isn't actually threading.

IPC between Qt and C/C++

I need to send/receive data between two processes. One of them will be using Qt (4 or 5).
That process will be running all the time (like a background process).
The other process will be launched and then it should be able to send argv to the first process and receive some answer from it.
The second process must start up as fast as possible, so using QtCore is kind of a last resort. Meaning I need it to be as small and fast as possible, so I'd need to use plain C/C++ without any external libraries.
Any ideas how it could be done?
If that's not possible, I'll have to use QtCore in the second process. Do you know how much slower it would be because of QtCore vs plain C/C++ (in terms of startup time)?
Regards
EDIT:
I can't use D-Bus (QtDBus) as this must be Mac/Linux/Windows compatible.

If it needs to be fully cross-platform, your best bet is likely named (local) sockets or named pipes, which are available in some form on each platform. This should take you to the information you need for the socket setup. You'll still need some socket-handling code in your pure C++ application, but it should carry significantly less overhead than QtCore and QtNetwork.
You could also do it with shared memory, but I prefer the socket method for simplicity.
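A minimal sketch of the socket approach, assuming a POSIX system (Linux/Mac) and a Unix domain socket; the Windows side would need a named-pipe equivalent, which is not shown. All function names here are hypothetical. The convention that a message ends when the sender half-closes the connection is also just an assumption to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Fills a sockaddr_un for the given filesystem path.
static sockaddr_un make_address(const std::string& path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    assert(path.size() < sizeof(addr.sun_path));
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
    return addr;
}

// Long-running process: create the listening socket. Returns -1 on failure.
int listen_on(const std::string& path) {
    ::unlink(path.c_str());  // remove a stale socket file from an earlier run
    int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_un addr = make_address(path);
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, 8) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Short-lived process: connect to the listener. Returns -1 on failure.
int connect_to(const std::string& path) {
    int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_un addr = make_address(path);
    if (::connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Sends the whole message, then half-closes so the peer sees end-of-message.
bool send_message(int fd, const std::string& msg) {
    std::size_t sent = 0;
    while (sent < msg.size()) {
        ssize_t n = ::write(fd, msg.data() + sent, msg.size() - sent);
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return ::shutdown(fd, SHUT_WR) == 0;
}

// Reads until the peer half-closes its side.
std::string receive_message(int fd) {
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = ::read(fd, buf, sizeof(buf))) > 0)
        out.append(buf, static_cast<std::size_t>(n));
    return out;
}
```

The launcher would join its argv into one string, call connect_to and send_message, then print whatever receive_message returns. The Qt process would accept() on the descriptor from listen_on, or use its own local-socket classes on the same path.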

How does libc work?

I'm writing a MIPS32 emulator and would like to make it possible to use the whole Standard C Library (maybe with the GNU extensions) when compiling C programs with gcc.
As I understand it at this point, I/O is handled by syscalls on the MIPS32 architecture. To successfully run a program using libc/glibc, how can I tell which syscalls I need to emulate (without trial and error)?
Edit: See this for an example of what I mean by syscalls.
(You can check out the project here if you are interested, any feedback is welcome. Keep in mind that it's in a very early stage)

Very Short Answer
Read the much longer answer.
Short Answer
If you intend to provide a custom libc that uses some feature of your emulator to have the host OS execute your system calls, you have to implement all of them.
Much Longer Answer
Step back for a minute and look at the way things are typically layered in a real (non-emulated) system:
1. The peripherals have some I/O interface (e.g., numbered ports or memory mapping) that the CPU can tickle to make them do whatever they do.
2. The CPU runs software that understands how to manipulate the hardware. This can be a single-purpose program or an operating system that runs other programs. Since libc is in the picture, let's assume there's an OS and that it's something Unix-y.
3. Userspace programs run by the OS use a defined interface between themselves and the OS to ask for certain "system" functions to be carried out.
What you're trying to accomplish takes place between layers 3 and 2, where a function in libc or user code does whatever the OS defines as triggering a system call. This opens up numerous cans of worms:
What the OS defines as triggering a system call differs from OS to OS and (rarely) between versions of the same OS. This problem is mitigated on "real" systems by providing a dynamically-linkable libc that takes care of hiding those details. That aside, if you have a MIPS32 binary you want to run, does it use a system call convention that your emulator supports?
You would need to provide a custom libc that does something your emulator can recognize as making a particular system call and carry it out. Any program you wish to run will have to be cross-compiled to MIPS32 and statically linked with it, as would any other libraries the program requires (libm comes to mind). Alternately, your emulator package will need to provide a simulation of a dynamic linker plus dynamically-linkable copies of all required libraries, because opening those on the host won't work. If you have enough source to recompile the program from scratch, porting might be better than emulation.
Any code that makes assumptions about paths to files on a particular system or other assumptions about what they'll find in certain devices (which are themselves files) won't run correctly.
If you're providing layer 2, you're signing yourself up to provide a complete, correct simulation of the behavior of one particular version of an entire operating system. Some calls like read() and write() would be easy to deal with; others like fork(), uselib() and ioctl() would be much more difficult. There also isn't necessarily a one-to-one mapping of calls and behaviors your program uses with those your host OS provides. All of this assumes the host is Unix and the target program is, too. If the target is compiled for some other environment, all bets are off.
That last point is why most emulators provide just a CPU and the hardware behaviors of some target system (i.e., everything in layer 1). With those in place, you can run an original system's boot ROM, OS and user programs, all unaltered. There are a number of existing MIPS32 emulators that do just this and can run unaltered versions of the operating systems that ran on the hardware they emulate.
HTH and best of luck on your project.

Most of the ISO standard C library can be written in straight C. Only a few portions need access to lower level OS functionality.
At a minimum, you'll need to emulate basic I/O at the block or character level for fopen, fread, and fwrite. You could take the Unix approach, though, and implement those on top of the lower-level open, read, and write calls.
And you'll have to manage dynamic memory allocation for malloc and free.
And setjmp and longjmp, which need access to the execution stack.
Also time and the signal.h functions.
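From the emulator's side, backing the library's I/O with a few low-level calls could be sketched as below. This assumes the guest uses the Linux o32 convention as I understand it (syscall number in $v0, arguments in $a0-$a2, result in $v0, error flag in $a3, numbers starting at 4000); a different target OS would use other numbers. Machine, handle_syscall, the flat memory model and the console string are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical emulator state: 32 general registers and flat guest memory.
struct Machine {
    std::uint32_t regs[32] = {};
    std::vector<std::uint8_t> memory;
    bool halted = false;
    int exit_code = 0;
};

// Register indices in the MIPS o32 calling convention.
constexpr int V0 = 2, A0 = 4, A1 = 5, A2 = 6, A3 = 7;

// Assumption: Linux/MIPS o32 syscall numbers and errno values.
constexpr std::uint32_t kSysExit = 4001, kSysWrite = 4004;
constexpr std::uint32_t kEBADF = 9, kEFAULT = 14, kENOSYS = 89;

// Called when the guest executes the `syscall` instruction. Guest output on
// stdout/stderr is appended to `console` instead of touching the host.
void handle_syscall(Machine& m, std::string& console) {
    assert(!m.halted);
    auto fail = [&m](std::uint32_t err) { m.regs[V0] = err; m.regs[A3] = 1; };
    switch (m.regs[V0]) {
    case kSysExit:
        m.exit_code = static_cast<int>(m.regs[A0]);
        m.halted = true;
        break;
    case kSysWrite: {
        const std::uint32_t fd = m.regs[A0], addr = m.regs[A1], len = m.regs[A2];
        if (fd != 1 && fd != 2) { fail(kEBADF); break; }
        // Reject buffers that fall outside guest memory.
        if (addr > m.memory.size() || len > m.memory.size() - addr) {
            fail(kEFAULT);
            break;
        }
        console.append(reinterpret_cast<const char*>(m.memory.data()) + addr, len);
        m.regs[V0] = len;  // bytes written
        m.regs[A3] = 0;    // success
        break;
    }
    default:
        fail(kENOSYS);  // unknown call: report "not implemented" to the guest
    }
}
```

Reporting "not implemented" for unknown numbers and logging them is one way to find out which calls a given libc build actually issues, though it does not remove the trial and error the question hoped to avoid.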

I don't know exactly how MIPS works, but on Win32, OS calls have to be explicitly imported into a process via the DLL/EXE import table. There could be something similar in the executable format used by the MIPS system.

The usual approach is to emulate not only the CPU, but also a representative set of standard peripherals. Then you start an operating system in your emulator, which comes with a libc and hardware drivers included. Libc will invoke the OS's drivers, which invoke the virtual hardware in your emulator. For a popular example, see DOSBox.
The other interpretation of your question is that you don't want to write a full emulator, but rather a binary compatibility layer that lets you execute MIPS32 binaries on a non-MIPS32 system. A popular example of that is Mac OS X (Intel), which can also execute PowerPC applications.
In the latter scenario you need to emulate either the OS's ABI (application binary interface) or maybe you can get away with libc's ABI. In both cases you need to implement stub code running on the emulator and proxy code running on the host:
The stub serializes the function call arguments
...and transmits them from emulator memory to host memory using some special virtual instructions
The proxy needs to patch the arguments (endianness, integer length, address space ...)
...and executes the function call on the host system
The proxy then patches and serializes the outgoing function arguments (return values and out-parameters)
...and transmits them back to the stub
...which returns the data to the caller
Most calls will not be able to work with a generic stub/proxy, but need specific solutions.
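The "patch the arguments" step can be sketched for the simplest case, 32-bit integers, assuming a big-endian guest (MIPS can run in either byte order, so this is only one possibility). load_be32, store_be32, host_add and proxy_add are hypothetical names; host_add stands in for whatever host function the proxy forwards to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a 32-bit big-endian word from guest memory. Works the same on
// little- and big-endian hosts because it assembles the value byte by byte.
std::uint32_t load_be32(const std::vector<std::uint8_t>& mem, std::size_t addr) {
    assert(addr + 4 <= mem.size());
    return (static_cast<std::uint32_t>(mem[addr]) << 24) |
           (static_cast<std::uint32_t>(mem[addr + 1]) << 16) |
           (static_cast<std::uint32_t>(mem[addr + 2]) << 8) |
           static_cast<std::uint32_t>(mem[addr + 3]);
}

// Writes a 32-bit value to guest memory in big-endian byte order.
void store_be32(std::vector<std::uint8_t>& mem, std::size_t addr, std::uint32_t value) {
    assert(addr + 4 <= mem.size());
    mem[addr] = static_cast<std::uint8_t>(value >> 24);
    mem[addr + 1] = static_cast<std::uint8_t>(value >> 16);
    mem[addr + 2] = static_cast<std::uint8_t>(value >> 8);
    mem[addr + 3] = static_cast<std::uint8_t>(value);
}

// Stand-in for a real host library function.
static std::uint32_t host_add(std::uint32_t a, std::uint32_t b) { return a + b; }

// Proxy: unpacks two arguments the stub serialized at args_addr, calls the
// host function with native integers, and writes the result back for the stub.
void proxy_add(std::vector<std::uint8_t>& mem, std::size_t args_addr,
               std::size_t result_addr) {
    const std::uint32_t a = load_be32(mem, args_addr);
    const std::uint32_t b = load_be32(mem, args_addr + 4);
    store_be32(mem, result_addr, host_add(a, b));
}
```

Pointers, structs and strings need more than a byte swap, since the proxy has to translate guest addresses and layouts as well, which is why most calls end up needing their own handling.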
Good luck!
