Boost mmap performance vs native memory maps - c++

I will be writing a benchmarking tool that tests a mix of IOPS and bandwidth of a disk system, and I plan to use file-backed memory maps for the IO. Because the tool needs to run on both POSIX and WinNT platforms, I can't just use plain old mmap. Also, from what I understand, you have to madvise the Linux kernel that the whole file will be accessed sequentially? Which brings me to Boost memory maps. Are Boost memory maps likely to give me similar performance on similar hardware with similar quality drivers on Windows, Linux and Mac OS X? Has anyone benchmarked Boost mmaps across systems?

I would suspect that there would be no performance difference, because Boost is merely providing a platform agnostic wrapper facility, but I would suggest you test it under your specific circumstances.
Also, Windows NT platforms provide a memory mapping facility -- it's not like memory mapping is a Linux specific feature. For Windows, you'll want CreateFile, CreateFileMapping and MapViewOfFile. The Windows library differs in that creation of the mapping machinery is separate from creating a mapped view. Otherwise the functionality is equivalent. Oh, just like on POSIX, you need to clean up, in this case with UnmapViewOfFile on the views and CloseHandle on the file mapping and file handle.
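To make the comparison concrete, here is a minimal sketch of the two native paths behind one small class. The name ReadOnlyMapping is made up for illustration and is not part of Boost or either OS API. It assumes an existing, non-empty file, and the Windows branch has not been benchmarked here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#  include <windows.h>
#else
#  include <fcntl.h>
#  include <sys/mman.h>
#  include <sys/stat.h>
#  include <unistd.h>
#endif

// Hypothetical RAII wrapper that maps an existing, non-empty file read-only.
class ReadOnlyMapping {
public:
    explicit ReadOnlyMapping(const std::string& path) {
#ifdef _WIN32
        file_ = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file_ == INVALID_HANDLE_VALUE) fail("CreateFile failed");
        LARGE_INTEGER size;
        if (!GetFileSizeEx(file_, &size)) fail("GetFileSizeEx failed");
        size_ = static_cast<std::size_t>(size.QuadPart);
        // Windows step 1: create the mapping object for the whole file.
        mapping_ = CreateFileMappingA(file_, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (mapping_ == nullptr) fail("CreateFileMapping failed");
        // Windows step 2: map a view of that object into the address space.
        data_ = MapViewOfFile(mapping_, FILE_MAP_READ, 0, 0, 0);
        if (data_ == nullptr) fail("MapViewOfFile failed");
#else
        const int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) fail("open failed");
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); fail("fstat failed"); }
        size_ = static_cast<std::size_t>(st.st_size);
        // POSIX does both steps in one call; a zero-length mapping is rejected.
        void* p = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) fail("mmap failed");
        data_ = p;
#endif
    }
    ~ReadOnlyMapping() { release(); }
    ReadOnlyMapping(const ReadOnlyMapping&) = delete;
    ReadOnlyMapping& operator=(const ReadOnlyMapping&) = delete;

    const void* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    // Undo whatever has been set up so far, in reverse order.
    void release() {
#ifdef _WIN32
        if (data_) UnmapViewOfFile(data_);
        if (mapping_) CloseHandle(mapping_);
        if (file_ != INVALID_HANDLE_VALUE) CloseHandle(file_);
        mapping_ = nullptr;
        file_ = INVALID_HANDLE_VALUE;
#else
        if (data_) munmap(data_, size_);
#endif
        data_ = nullptr;
    }
    [[noreturn]] void fail(const char* what) {
        release();
        throw std::runtime_error(what);
    }

#ifdef _WIN32
    HANDLE file_ = INVALID_HANDLE_VALUE;
    HANDLE mapping_ = nullptr;
#endif
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Once the view exists, reads go through an ordinary pointer on both platforms, which is why a wrapper like this (or Boost's) should not add measurable cost to the IO itself. Still, measure it on your own hardware.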

Related

Are there cross-platform ways to promote prefetching for reading a large Boost memory mapped_file?

I have a C++ application for Windows and Linux (e.g. Boost 1.53.2 on Amazon Linux) that is using boost::iostreams::mapped_file (i.e. a memory mapped file). The documentation doesn't mention "prefetch".
The application needs to sequentially read through large read-only files rapidly. Sometimes these files will be larger than available memory. So loading the whole file into memory at once may not be an option. But in all cases, processing will proceed from the beginning to the end sequentially.
It would be helpful if prefetching of upcoming pages happens in a way that keeps ahead of the processing of the pages (i.e. upcoming pages in memory before they are needed), yet not so far ahead that not-yet-processed pages are dropped from memory to make room.
I'm wondering if there are helpful cross-platform ways (Windows and Linux) to give hints or direction or otherwise promote the automated prefetching of the pages that will be needed in the not distant future. I expect the OS might do this to some degree automatically, but I am wondering if there is a convenient technique I should be using to improve on whatever is the default behavior.
Thanks in advance!
Not sure how portable, but I included fadvise and madvise in this answer:
boost file_mapping performance
Seems there are some good pointers for non-POSIX windows here: What is fadvise/madvise equivalent on windows?
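For what it's worth, here is a rough sketch of what such hints could look like when applied to the pointer you get from a mapping. The helper names (advise_sequential, prefetch_window, sum_with_prefetch) are made up for illustration. The POSIX branch uses madvise with MADV_SEQUENTIAL and MADV_WILLNEED. The Windows branch uses PrefetchVirtualMemory, which as far as I know needs Windows 8 or later, and I have not measured it. These are hints only: the kernel is free to ignore them, and the chunk size needs tuning per system.

```cpp
#include <cstddef>

#ifdef _WIN32
#  include <windows.h>   // PrefetchVirtualMemory needs _WIN32_WINNT >= 0x0602
#else
#  include <sys/mman.h>
#  include <unistd.h>
#endif

// Hypothetical helper: tell the OS the whole mapping will be read front to back.
// Returns true if the hint was accepted.
bool advise_sequential(void* base, std::size_t size) {
#ifdef _WIN32
    (void)base; (void)size;
    return false;  // this sketch applies no whole-mapping hint on Windows
#else
    return madvise(base, size, MADV_SEQUENTIAL) == 0;
#endif
}

// Hypothetical helper: hint that [offset, offset + length) will be needed soon.
// base must be the page-aligned start of the mapping.
bool prefetch_window(const void* base, std::size_t mapped_size,
                     std::size_t offset, std::size_t length) {
    if (offset >= mapped_size) return false;
    if (length > mapped_size - offset) length = mapped_size - offset;
    char* bytes = const_cast<char*>(static_cast<const char*>(base));
#ifdef _WIN32
    WIN32_MEMORY_RANGE_ENTRY range;
    range.VirtualAddress = bytes + offset;
    range.NumberOfBytes = length;
    return PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0) != 0;
#else
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t aligned = offset - (offset % page);  // madvise wants a page-aligned start
    return madvise(bytes + aligned, length + (offset - aligned), MADV_WILLNEED) == 0;
#endif
}

// Example consumer: sums the bytes chunk by chunk, asking for the next chunk
// just before processing the current one, so the hint stays one chunk ahead.
unsigned long long sum_with_prefetch(const unsigned char* data, std::size_t size,
                                     std::size_t chunk) {
    if (chunk == 0) chunk = 1;
    unsigned long long total = 0;
    for (std::size_t pos = 0; pos < size; pos += chunk) {
        prefetch_window(data, size, pos + chunk, chunk);  // ignored past the end
        const std::size_t end = (size - pos < chunk) ? size : pos + chunk;
        for (std::size_t i = pos; i < end; ++i) total += data[i];
    }
    return total;
}
```

Prefetching only one chunk ahead is the point of the window: it keeps upcoming pages warm without asking for so much that unprocessed pages get evicted again.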

Can boost do shared memory between processes that are on different CPUs

If I have a multiprocessor machine with 2 CPUs, with one process running on CPU 1 and another process on CPU 2, can boost::interprocess shared memory be used between them? If so, how is that implemented? I couldn't find any documentation about it in the Boost docs.
Yes, if you're on either an SMP or standard NUMA system.
Maybe not if some of your CPUs are running on daughter boards or similar.
The OS and underlying hardware platform (which you haven't told us) control this, and you should be able to ask a question specific to that OS/platform in an appropriate forum if you're still not sure.
If you're not sure which of the above two cases is relevant, it's almost certain you're on a general-purpose platform and it will all work. Note, however, that Boost may not expose NUMA affinity control if you want to choose which node pages are allocated on.
This is a feature of the platform rather than of Boost. Boost uses the shared memory or memory-mapped file support that the operating system provides.
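To illustrate that this is an OS facility: the POSIX-only sketch below shares one int between a parent and a forked child, and nothing in it mentions CPUs at all. The function name is made up, and it uses an anonymous shared mapping for brevity. Boost.Interprocess goes through named objects instead, so treat this as an analogy rather than a description of how Boost is implemented.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the child process writes a value into shared memory and
// the parent reads it back. Returns the value read, or -1 on error.
int share_int_with_child(int value_written_by_child) {
    // MAP_SHARED makes the page visible to both processes after fork().
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    const pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = value_written_by_child;  // the scheduler picks the CPU, not us
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);  // the child has finished writing once this returns
    const int result = *shared;
    munmap(mem, sizeof(int));
    return result;
}
```

Whether the two processes land on the same core or on different sockets is decided by the scheduler; cache coherence in the hardware keeps the shared page consistent either way.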

Fastest IPC method on Windows 7

What is the fastest possible Interprocess Communication (IPC) method on Windows 7? We would like to share only memory blocks (two-way).
Is it ReadProcessMemory or something else?
We would like to use plain C but, for example, what does Boost library use for IPC?
ReadProcessMemory shouldn't even be listed as an IPC method; yes, it can be used as such, but it exists mainly for debugging purposes (if you check its reference, it's under the category "Debugging functions"), and it's surely slower than "real" shared memory because it copies the memory of a process into the specified buffer, while real shared memory doesn't have this overhead.
The full list of IPC methods supported by Windows is available on the MSDN; still, if you just have two applications that want to share a memory block, you should create a named memory-mapped file (backed by the paging file) with CreateFileMapping/MapViewOfFile, that should be the most straightforward and fastest method. The details of file mapping are described on its page on MSDN.
The relevant Boost IPC classes can act as a thin wrapper around shared memory; AFAIK they only encapsulate the calls to the relevant system-specific APIs, and in the end you get the usual pointer to the shared memory block, so operation should be as fast as using the native APIs.
Because of this I advise you to use Boost.Interprocess, since it's portable, C++-friendly (it provides RAII semantics) and does not give you any performance penalty after the shared memory block has been created (it can provide additional functionalities on shared memory, but they are all opt-in - if you just want shared memory you get just it).

C++ multiple processes?

I've got a project that consists of two processes and I need to pass some data between them in a fast and efficient manner.
I'm aware that I could use sockets to do this using TCP, even though both processes will always exist on the same computer, however this does not seem to be a very efficient solution.
I see lots of information about using "pipes" on Linux. However I primarily want this for Windows and Linux (preferably via a cross platform library), ideally in a type safe, non-blocking manner.
Another important thing is I need to support multiple instances of the whole application (i.e. both processes), each with their own independent copy of the communication objects.
Also is there a cross platform way to spawn a new process?
Take a look at Boost.Interprocess
Boost.Interprocess simplifies the use of common interprocess communication and synchronization mechanisms and offers a wide range of them:
Shared memory.
Memory-mapped files.
Semaphores, mutexes, condition variables and upgradable mutex types to place them in shared memory and memory mapped files.
Named versions of those synchronization objects, similar to UNIX/Windows sem_open/CreateSemaphore API.
File locking.
Relative pointers.
Message queues.
Boost.Interprocess also offers higher-level interprocess mechanisms to allocate dynamically portions of a shared memory or a memory mapped file (in general, to allocate portions of a fixed size memory segment). Using these mechanisms, Boost.Interprocess offers useful tools to construct C++ objects, including STL-like containers, in shared memory and memory mapped files:
Dynamic creation of anonymous and named objects in a shared memory or memory mapped file.
STL-like containers compatible with shared memory/memory-mapped files.
STL-like allocators ready for shared memory/memory-mapped files implementing several memory allocation patterns (like pooling).
Boost.Interprocess has been tested in the following compilers/platforms:
Visual 7.1 Windows XP
Visual 8.0 Windows XP
GCC 4.1.1 MinGW
GCC 3.4.4 Cygwin
Intel 9.1 Windows XP
GCC 4.1.2 Linux
GCC 3.4.3 Solaris 11
GCC 4.0 MacOs 10.4.1
For IPC, Windows supports named pipes just like Linux does, except that the pipe names follow a different format, owing to the difference in path formats between the two operating systems. This is something that you could overcome with simple preprocessor defines. Both operating systems also support non-blocking IO on pipes. Note that IO multiplexing with select() works on pipes only on the POSIX side; on Windows, select() accepts only sockets, so pipes there are usually multiplexed with overlapped IO instead.
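Here is a sketch of the POSIX half only: non-blocking mode via fcntl plus a readiness wait via select(). The helper names are made up. It works on any pipe descriptor, whether it comes from pipe() or from opening a FIFO created with mkfifo(). The Windows side would go through CreateNamedPipe and is not shown.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: switch a descriptor to non-blocking mode.
bool set_nonblocking(int fd) {
    const int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: wait up to timeout_ms for data, then read what is there.
// Returns an empty string on timeout, error or end of file.
std::string read_when_ready(int fd, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // select() reports whether a read on fd would succeed without blocking.
    if (select(fd + 1, &readable, nullptr, nullptr, &tv) <= 0) return std::string();

    char buf[256];
    const ssize_t n = read(fd, buf, sizeof(buf));
    return n > 0 ? std::string(buf, static_cast<std::size_t>(n)) : std::string();
}
```

With several pipes you would add each descriptor to the same fd_set and pass the highest one plus one as the first argument.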
Plain old TCP should work fairly efficiently; as I understand it, modern OS's will detect when both ends of a TCP connection are located on the same machine, and will internally route that data through a fast, lightweight (pipe-like) mechanism rather than through the ordinary TCP stack.
So if you already have code that works over TCP, I say stick with that and avoid spending a lot of extra development time for not much payoff.
It may be overkill, but you could use the Apache Portable Runtime; here are the thread and process functions.

Interprocess communication between 32- and 64-bit apps on Windows x64

We'd like to support some hardware that has recently been discontinued. The driver for the hardware is a plain 32-bit C DLL. We don't have the source code, and (for legal reasons) are not interested in decompiling or reverse engineering the driver.
The hardware sends tons of data quickly, so the communication protocol needs to be pretty efficient.
Our software is a native 64-bit C++ app, but we'd like to access the hardware via a 32-bit process. What is an efficient, elegant way for 32-bit and 64-bit applications to communicate with each other (that, ideally, doesn't involve inventing a new protocol)?
The solution should be in C/C++.
Update: several respondents asked for clarification whether this was a user-mode or kernel-mode driver. Fortunately, it's a user-mode driver.
If this is a real driver (kernel mode), you're SOL. Vista x64 doesn't allow installing unsigned drivers. If this is just a user-mode DLL, you can get a fix by using any of the standard IPC mechanisms. Pipes, sockets, out-of-proc COM, roughly in that order. It all operates on bus speeds so as long as you can buffer enough data, the context switch overhead shouldn't hurt too much.
I would just use sockets. It would allow you to use it over IP if you need it in the future, and you won't be tied down to one messaging API. If in the future you wish to implement this on another OS or language, you can.
This article might be of interest. It discusses the problem and then suggests using COM as a solution. I'm not a big fan of COM but given its ubiquity in the Windows universe, it's possible that it might be efficient enough. You will probably want to architect your solution so that you can batch data (you don't want to do one COM call for each item of data).
Elegant? C++? DCOM/RPC calls to yourself might work, or you could create a named pipe and use that to talk between the two processes (maybe create a "CMessage class" or something), though watch out for different structure alignment between x86 and x64.
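On the alignment warning: one way to avoid surprises is to build the message from fixed-width fields only and let the compiler check the layout in both the 32-bit and the 64-bit build. The struct and function names below are hypothetical, and the sketch assumes both processes run on the same machine, so byte order is not an issue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire format shared by the 32-bit and the 64-bit process.
// No pointers, size_t or long: their sizes differ between the two builds.
// Fields are ordered so that natural alignment leaves no padding.
struct SampleMessage {
    std::uint64_t timestamp_us;
    std::uint32_t channel;
    std::uint32_t payload_size;
    std::uint8_t  payload[48];
};

// Compile both sides with these checks; a layout mismatch then fails the build
// instead of corrupting data at run time.
static_assert(sizeof(SampleMessage) == 64, "unexpected message size");
static_assert(offsetof(SampleMessage, channel) == 8, "unexpected channel offset");
static_assert(offsetof(SampleMessage, payload_size) == 12, "unexpected size offset");
static_assert(offsetof(SampleMessage, payload) == 16, "unexpected payload offset");

// Copy a message into a raw byte buffer (pipe, socket or shared memory).
void encode(const SampleMessage& m, unsigned char* out) {
    std::memcpy(out, &m, sizeof(m));
}

// Copy a message back out of a raw byte buffer.
SampleMessage decode(const unsigned char* in) {
    SampleMessage m;
    std::memcpy(&m, in, sizeof(m));
    return m;
}
```

The same header can then be included by the 32-bit process that loads the driver DLL and by the 64-bit application.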
If the driver does turn out to be a real driver, nobugz is almost right: you're going to have to work a lot harder, but you're not completely SOL. One solution is to install 32-bit Windows on some other machine (or virtual machine) and then use some form of RPC, such as sockets (as suggested by Pyrolistical) or UDP or MQ or even Tibco Rendezvous (which claims to support very high throughput in order to handle the volumes of data generated by the financial markets -- at least that's what I remember from back in the old days).
A memory-mapped file shared by both sides would have the same contents. The OS will have to do some interesting pointer stuff to make it happen, but quite likely will be able to set up the 2 views in such a way that you're not physically copying memory around. Zero copies is about as good as it gets.