How to choose a fixed address for shared memory mapping - c++

I would like to use shared memory between several processes, and would like to be able to keep using raw pointers (and STL containers).
For this purpose, I am using shared memory mapped at a fixed address:
#include <boost/interprocess/managed_shared_memory.hpp>

segment = new boost::interprocess::managed_shared_memory(
    boost::interprocess::open_or_create,
    "MySegmentName",
    1048576,                   // segment size in bytes
    (void *)0x400000000LL      // fixed mapping address
);
What is a good strategy for choosing this fixed address? For example, should I just use a pretty high number to reduce the chance that I run out of heap space?

This is a hard problem. If you are forking a single program to create children, and only the parent and the children will use the memory segment, just be sure to map it before you fork. The children will automatically inherit the mapping from their parent and there's no need to use a fixed address.
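A minimal sketch of that approach, assuming POSIX mmap and fork (my example, not from the question):

#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    // MAP_SHARED makes writes visible across the fork; MAP_ANONYMOUS (widely
    // supported, though not specified by POSIX) avoids needing a backing file.
    void *mem = mmap(nullptr, 1048576, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;

    int *counter = static_cast<int *>(mem);
    *counter = 0;

    if (fork() == 0) {      // child inherits the mapping at the same address
        *counter = 42;      // raw pointers remain valid in the child
        _exit(0);
    }
    wait(nullptr);
    std::printf("%d\n", *counter);  // parent sees 42
}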
If you aren't, then the first thing to consider is whether you really need to use raw STL containers instead of the boost interprocess containers. That you're already using boost interprocess to allocate the shared memory segment suggests you don't have any problem using boost, so the only advantage I can think of to using STL containers would be so you don't have to port existing code. Keep in mind that for it to work with fixed addresses, the containers and what they contain pointers to (assuming you're working with containers of pointers) will need to be kept in the shared memory space.
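For comparison, a sketch of the boost.interprocess container alternative: a vector whose elements live in the shared segment, usable without a fixed address (this follows the documented boost pattern; the segment and vector names are taken from the question).

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>

namespace bip = boost::interprocess;

typedef bip::allocator<int, bip::managed_shared_memory::segment_manager> ShmAllocator;
typedef bip::vector<int, ShmAllocator> ShmVector;

int main() {
    bip::managed_shared_memory segment(bip::open_or_create, "MySegmentName", 1048576);
    const ShmAllocator alloc(segment.get_segment_manager());
    // find_or_construct gives every process the same named vector.
    ShmVector *v = segment.find_or_construct<ShmVector>("MyVector")(alloc);
    v->push_back(1);
}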
If you're certain that it's what you want, you'll have to figure out some method for them to negotiate an address. Keep in mind that the OS is allowed to reject your desired fixed memory address. It will reject an address if the page at that address has already been mapped into memory or allocated. Because different programs will have allocated different amounts of memory at different times, which pages are available and which are unavailable will vary across your programs.
So you need for the programs to gain consensus on a memory address. This means that several addresses might have to be tried and rejected. If it's possible that sometime after startup a new program will become interested, the search for consensus will have to start over again. The algorithm would look something like this:
1. Program A proposes memory address X to all other programs.
2. The other programs respond with true or false to indicate whether the memory mapping at address X succeeded.
3. If program A receives any false responses, go to step 1.
4. Program A sends a message to the other programs letting them know the address has been validated and may be used.
5. If a new app becomes interested in the data, it must notify program A that it would like an address.
6. Program A then has to tell all the other programs to stop using the data, and go to step 1.
To come up with addresses for A to propose, you could have A map a non-fixed memory segment, see what address it's mapped at, and propose that address. If it's unsatisfactory, map another segment and propose it instead. You will need to unmap the segments at some point, but you can't unmap them right away, because if you unmap and then remap a segment of the same size, chances are the OS will give you the same address back over and over. Keep in mind that you may never reach consensus; there's no guarantee that a large enough segment is free at a common location across all the processes. This could happen if your programs all independently use almost all memory, say if they are backed by a ton of swap (though if you care enough about performance to use shared memory, hopefully you are avoiding swap).
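A hypothetical sketch of how program A might generate proposals (the function names are mine): map a throwaway non-fixed segment, propose the address the OS chose, and keep the probe mapped until consensus so retries don't hand back the same address.

#include <sys/mman.h>
#include <cstddef>
#include <vector>

void *propose_candidate(std::size_t size, std::vector<void *> &held) {
    void *addr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) return nullptr;
    held.push_back(addr);   // hold it so the next probe lands elsewhere
    return addr;            // propose this address to the other programs
}

void release_candidates(std::size_t size, std::vector<void *> &held) {
    for (void *p : held) munmap(p, size);  // only safe once consensus is reached
    held.clear();
}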
All of the above assumes you're in a relatively constrained address space. If you're on 64-bit, this could work. Most computers' RAM + swap will be far less than what 64 bits allow, so you could map the memory at a very far out fixed address that all processes are unlikely to have mapped already. I suggest at least 2^48, since current 64-bit x86 processors don't reach beyond that range (despite pointers being 64 bits, you can only plug in as much RAM as 48 bits allow, still a ton at the time of this writing). That said, there's no reason a smart heap allocator couldn't take advantage of the vastness of the address space to reduce its bookkeeping work, so to be truly robust you would still need to build consensus. Keep in mind that you will at least want the address to be configurable -- even if we don't have that much memory anytime soon, between now and then someone else might have the same idea and pick your address.
To do the bidirectional communication you could use any of sockets, pipes, or another shared memory segment. Your OS may provide other forms of IPC. But strongly consider that you are probably now introducing more complexity than you would have to deal with if you just used the boost interprocess containers ;)

Read the address from a configuration file. That will allow easy experimentation, and make it easy to change the address as the circumstances change.
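For example (a sketch; the file name and hex format are assumptions):

#include <fstream>
#include <string>

void *read_mapping_address(const char *path) {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) return nullptr;
    // e.g. a file containing "0x400000000" or "400000000"
    return reinterpret_cast<void *>(std::stoull(line, nullptr, 16));
}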

Don't use hard-coded absolute addresses for the shared memory area, for security reasons, even when you don't use forks or threads. Doing so bypasses all ASLR protections: it gives any attacker predictable locations in the process's address space, and it is pretty easy to search a binary for such hard-coded pointers.
You've been chosen by http://reversingonwindows.blogspot.sg/2013/12/hardcoded-pointers.html as an example of how to make software less secure by bypassing ASLR.
The second bad example is in the boost library.
The address space needs to be negotiated between the communicating parties at run time.

My solution:
The initialising program lets the system select an appropriate segment address. This address is written to disk and retrieved for use by subsequent programs as required.
Caveats:
I am using 64-bit Fedora 21 with KDevelop 4.7 and find that void* is 64 bits long. Writing the segment head address to disk involves sprintf(bu, "%p", pointer); and writing a text file. Recovery reads this file and decodes the hex number as a long long value, which is returned to the caller and cast back to (void*).
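A minimal sketch of that round trip (the file name is an assumption, and strtoull stands in for the long long decoding):

#include <cstdio>
#include <cstdlib>

void save_segment_address(void *pointer) {
    char bu[32];
    std::sprintf(bu, "%p", pointer);               // e.g. "0x400000000"
    if (std::FILE *f = std::fopen("segment_address.txt", "w")) {
        std::fputs(bu, f);
        std::fclose(f);
    }
}

void *load_segment_address() {
    char bu[32] = {0};
    if (std::FILE *f = std::fopen("segment_address.txt", "r")) {
        std::fgets(bu, sizeof bu, f);
        std::fclose(f);
    }
    // decode the hex number and hand it back as void*
    return reinterpret_cast<void *>(std::strtoull(bu, nullptr, 16));
}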
I have also found that grouping all the access routines into a single folder above the level of the individual processes (each a project in its own right) has helped save my sanity, at the expense of a single aberrant '#include' in the process files.

Related

Does Windows 10 protect you from accessing memory that another program is using?

The following C++ code works:
int *p = new int;
p[1000] = 12;
Meaning I access a memory location that is sizeof (int) * 1000 bytes away from p.
What I was thinking is that maybe Windows or some other program is currently using the memory location &p[1000] for something. If I tried to set p[1000] to a new value, then another program, or even Windows itself, might crash because I changed an important variable it was keeping at that location.
Since C++ doesn't forbid this, I was wondering if at least Windows has some sort of protection against a program using a memory location currently used by someone else.
On Windows (and all other modern consumer operating systems) writing to a memory address you don't own will not directly affect memory belonging to any other process.
However, the operating system might be using that memory to provide essential services to your program, or the address might not be valid at all, so overwriting an address you don't own could cause your program to crash or behave in an unexpected way, either immediately or at some unpredictable point in the future. Google "undefined behavior" for more discussion of why this is a Bad Thing.
In the case of Windows, I have a vague recollection that the GUI uses some user-mode shared memory (for efficiency) so if you are really unlucky then writing to the wrong address might cause other GUI programs to malfunction, or perhaps even the entire GUI to become unresponsive, which would look very similar to an operating system hang from the user's point of view. I don't think I've ever seen that happen, though, so perhaps my information is out of date, or there are protective mechanisms in place to make this scenario less likely. (This does not represent a security vulnerability, because it only affects the user's other programs, and a malicious program could achieve the same effect in any number of other, more reliable ways.)
Memory is organized into PAGES. Each process sees a logical address space consisting of pages numbered 0-N.
The logical address space is divided into two ranges: user space and system space.
Each process has its own unique user space and all processes share the same system space. Your user space page 10 maps to a different physical location than some other process's user space page 10 (in most cases).
Memory in the system space is protected from user mode access. The only way to write to it is to switch to kernel mode. The operating system limits how you can do that to calls to specific system services. So, absent bugs (but we are talking M$ here) you should not be able to modify the system space willy-nilly.
It is possible for two applications to map memory in such a way that they are sharing memory locations in user mode. In that case, you can screw things up doing the type of thing you are illustrating. However, you have to explicitly map the memory in both processes.
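For reference, a sketch of that explicit route on Windows (my example): both processes open the same named mapping, though each may still see it at a different address.

#include <windows.h>

void *open_shared_block(const char *name, DWORD size) {
    // Backed by the page file; the same name yields the same object in
    // every process that calls this.
    HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                  PAGE_READWRITE, 0, size, name);
    if (!h) return nullptr;
    return MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, size);
}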
Every process has its own address space. This space is mapped to physical addresses, and the mappings of different processes don't overlap (unless they explicitly share memory, as described above).

Is it possible to partially free dynamically-allocated memory on a POSIX system?

I have a C++ application where I sometimes require a large buffer of POD types (e.g. an array of 25 billion floats) to be held in memory at once in a contiguous block. This particular memory organization is driven by the fact that the application makes use of some C APIs that operate on the data. Therefore, a different arrangement (such as a list of smaller chunks of memory like std::deque uses) isn't feasible.
The application has an algorithm that is run on the array in a streaming fashion; think something like this:
std::vector<float> buf(<very_large_size>);
for (size_t i = 0; i < buf.size(); ++i) do_algorithm(buf[i]);
This particular algorithm is the conclusion of a pipeline of earlier processing steps that have been applied to the dataset. Therefore, once my algorithm has passed over the i-th element in the array, the application no longer needs it.
In theory, therefore, I could free that memory in order to reduce my application's memory footprint as it chews through the data. However, doing something akin to a realloc() (or a std::vector<T>::shrink_to_fit()) would be inefficient because my application would have to spend its time copying the unconsumed data to the new spot at reallocation time.
My application runs on POSIX-compliant operating systems (e.g. Linux, OS X). Is there any interface by which I could ask the operating system to free only a specified region from the front of the block of memory? This would seem to be the most efficient approach, as I could just notify the memory manager that, for example, the first 2 GB of the memory block can be reclaimed once I'm done with it.
If your entire buffer has to be in memory at once, then you probably will not gain much from freeing it partially later.
The main point of this post is basically to tell you NOT to do what you want to do, because the OS will not unnecessarily keep your application's memory in RAM if it's not actually needed. This is the difference between "resident memory usage" and "virtual memory usage": "resident" is what is currently used and in RAM, "virtual" is the total memory usage of your application. And as long as your swap partition is large enough, "virtual" memory is pretty much a non-issue. [I'm assuming here that your system will not run out of virtual memory space, which is true in a 64-bit application, as long as you are not using hundreds of terabytes of virtual space!]
If you still want to do that, and want to have some reasonable portability, I would suggest building a "wrapper" that behaves kind of like std::vector and allocates lumps of some megabytes (or perhaps a couple of gigabytes) of memory at a time, and then something like:
for (size_t i = 0; i < buf.size(); ++i) {
    do_algorithm(buf[i]);
    buf.done(i);
}
The done method will simply check if the value of i is (one element) past the end of the current lump, and free that lump. [This should inline nicely, and produce very little overhead on the average loop - assuming elements are actually used in linear order, of course].
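A minimal sketch of one way to build such a wrapper (the class name and chunk size are my assumptions; since the question requires the whole buffer to be contiguous, this version maps everything upfront and hands whole chunks back to the OS with munmap as they are consumed):

#include <sys/mman.h>
#include <cstddef>
#include <new>

class StreamingBuffer {
public:
    explicit StreamingBuffer(std::size_t count)
        : size_(count), chunk_((64u << 20) / sizeof(float)), freed_(0) {
        data_ = static_cast<float *>(mmap(nullptr, size_ * sizeof(float),
                                          PROT_READ | PROT_WRITE,
                                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        if (data_ == MAP_FAILED) throw std::bad_alloc();
    }
    ~StreamingBuffer() { munmap(data_, size_ * sizeof(float)); } // tail + remainder

    float &operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

    // After element i is consumed, unmap every chunk that now lies wholly
    // behind the cursor. The chunk size (64 MiB) is a multiple of the page
    // size, so the addresses and lengths passed to munmap stay page-aligned.
    void done(std::size_t i) {
        std::size_t consumed = (i + 1) / chunk_;
        while (freed_ < consumed) {
            munmap(data_ + freed_ * chunk_, chunk_ * sizeof(float));
            ++freed_;
        }
    }

private:
    float *data_;
    std::size_t size_, chunk_, freed_;
};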
I'd be very surprised if this gains you anything, unless do_algorithm(buf[i]) takes quite some time (certainly many seconds, probably many minutes or even hours). And of course, it's only going to help if you actually have something else useful to do with that memory. And even then, the OS will reclaim memory that isn't actively used by swapping it out to disk, if the system is short of memory.
In other words, if you allocate 100GB, fill it, and then leave it sitting untouched, it will eventually ALL be on the hard disk rather than in RAM.
Further, it is not at all unusual for the heap in the application to retain freed memory, and for the OS not to get the memory back until the application exits - and certainly, if only parts of a larger allocation are freed, the runtime will not release it until the whole block has been freed. So, as stated at the beginning, I'm not sure how much this will actually help your application.
As with everything regarding "tuning" and "performance improvements", you need to measure and compare a benchmark, and see how much it helps.
Is it possible to partially free dynamically-allocated memory on a POSIX system?
You cannot do it using malloc()/realloc()/free().
However, you can do it in a semi-portable way using mmap() and munmap(). The key point is that if you munmap() some page, malloc() can later use that page:
create an anonymous mapping using mmap();
subsequently call munmap() for regions that you don't need anymore.
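A condensed sketch of those two steps (the sizes are examples):

#include <sys/mman.h>
#include <cstddef>

int main() {
    const std::size_t total = 100ull << 30;   // reserve 100 GiB of address space
    void *base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return 1;

    // ... stream through and consume the first 2 GiB ...

    munmap(base, 2ull << 30);   // return the first 2 GiB to the OS; that part
                                // of the range is no longer valid to touch
}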
The portability issues are:
POSIX doesn't specify anonymous mappings. Some systems provide MAP_ANONYMOUS or MAP_ANON flag. Other systems provide special device file that can be mapped for this purpose. Linux provides both.
I don't think that POSIX guarantees that when you munmap() a page, malloc() will be able to use it. But I think it'll work on all systems that have mmap()/munmap().
Update
If your memory region is so large that most pages will surely be written to swap, you will not lose anything by using file mappings instead of anonymous mappings. File mappings are specified in POSIX.
If you can do without the convenience of std::vector (which won't give you much in this case anyway, because you'll never want to copy / return / move that beast), you can do your own memory handling. Ask the operating system for entire pages of memory (via mmap) and return them as appropriate (using munmap). You can tell mmap via its first argument and the optional MAP_FIXED flag to map the page at a particular address (which you must ensure is not otherwise occupied, of course) so you can build up an area of contiguous memory. If you allocate the entire memory upfront, then this is not an issue and you can do it with a single mmap and let the operating system choose a convenient place to map it. In the end, this is what malloc does internally. For platforms that don't have sys/mman.h, it's not difficult to fall back to using malloc if you can live with the fact that on those platforms, you won't return memory early.
I'm suspecting that if your allocation sizes are always multiples of the page size, realloc will be smart enough not to copy any data. You'd have to try this out and see if it works (or consult your malloc's documentation) on your particular target platform, though.
mremap is probably what you need. As long as you're shifting whole pages, you can do a super fast realloc (actually the kernel would do it for you).
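A Linux-specific sketch (mremap is not in POSIX). Note that mremap resizes a mapping at its end, so for releasing the front of a block a plain munmap on the leading pages is the simpler tool:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <cstddef>

void *shrink_in_place(void *addr, std::size_t old_size, std::size_t new_size) {
    // Both sizes should be multiples of the page size; no data is copied.
    return mremap(addr, old_size, new_size, 0);   // MAP_FAILED on error
}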

Share pointer between subprocesses

I have a 64 bit application that creates 2 subprocesses (32 bit) via a popen2 implementation. Everything is written in C++.
I need the 2 subprocesses to access the same object in memory and I don't have a good idea about how to do this.
If I understand correctly, each subprocess will have a different memory map and therefore I can't just pass a memory address between the two.
Additional information: The target platform is Mac, but I'm looking for an answer that is as platform-independent as possible. Mac-specific answers are fine; I probably won't use this approach on other platforms. I simply don't know enough about using threads; I came down this route because the subprocesses must be 32-bit.
You can use the shared memory concept: you allocate (using OS services) memory that will be visible to both subprocesses.
As the wiki recommends, you can use boost.interprocess to work with shared memory at a platform-independent level.
It's a difficult problem.
You are correct that each process has its own address space. Objects created by one process can't be accessed by another process.
It is possible to use shared memory, and place the objects there. One complication is that in general the shared memory segment will be mapped at different addresses in each process's address space. This means that you can't store raw pointers inside those objects. This can be alleviated by working with indices (or offsets) instead of pointers.
Furthermore, if process A is 32-bit and process B is 64-bit, primitive types such as long can have different widths. Thus, when sharing data in such a scenario, you need to use fixed-width types such as int32_t.
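For illustration, a hypothetical layout that is safe to share between the 32- and 64-bit processes: only fixed-width fields (so the struct has identical size and alignment in both), and indices instead of pointers.

#include <cstdint>

struct SharedNode {
    int32_t value;
    int32_t next;        // index of the next node in 'nodes', -1 for "null"
};

struct SharedState {
    int32_t head;        // index into 'nodes', -1 if the list is empty
    int32_t node_count;
    SharedNode nodes[1024];
};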
One final complication is synchronization: if a process can modify an object while another process is reading or modifying it, you'll need to introduce inter-process synchronization.

Mapping of several big files into memory

In our application we have to be able to map several (i.e. maybe up to 4) files into memory (via mapViewOfFile). For a long time this has not been a problem, but as the files were getting bigger and bigger over the last years, now memory fragmentation prevents us from mapping those big files (files will be about 200 MB). The problem may already exist if no other files are loaded at that moment.
I am now looking for a way to make sure that the mapping always succeeds. Therefore I wanted to reserve a block of memory at program start, used only for the mapping, which would therefore suffer much less from fragmentation.
My first approach was to HeapCreate a private heap; I would then HeapAlloc a block of memory large enough to hold the mapping for one file and then use MapViewOfFileEx with the address of that block. Of course the address would have to match the memory allocation granularity. But the mapping still failed with error code ERROR_INVALID_ADDRESS (487).
Next I tried the same thing with VirtualAlloc. My understanding was that when I pass the parameter MEM_RESERVE, I would then be able to use that memory for whatever I wanted, e.g. to map a view of a file. But I found out that this is not possible (same error code as above) until I completely free the whole block with VirtualFree again. Therefore there would be no reserved memory left for the next files.
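(The usual dance this implies, as a sketch of my own rather than code from the question: reserve one block per file up front, then release just that block immediately before mapping into it. The release-to-map gap is still a race if other threads allocate concurrently.)

#include <windows.h>

// At program start, one reservation per expected file:
//   void *block = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);

void *map_into_reserved(HANDLE hMapping, void *block, SIZE_T size) {
    // MapViewOfFileEx requires the target range to be completely unreserved.
    VirtualFree(block, 0, MEM_RELEASE);
    return MapViewOfFileEx(hMapping, FILE_MAP_READ, 0, 0, size, block);
}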
I'm already using the low fragmentation heap feature and it is of nearly no use to us. Rewriting our code to use only smaller views of the files is not an option at the moment. I also took a look at this post: Can address space be recycled for multiple calls to MapViewOfFileEx without chance of failure? but didn't find it very useful, and was hoping for another possibility.
Do you have any suggestions what I can do or where my design may be wrong?
Thank you.
Well, the documentation for MapViewOfFileEx is clear: "The suggested address is used to specify that a file should be mapped at the same address in multiple processes. This requires the region of address space to be available in all involved processes. No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc"
The low fragmentation heap is intended to prevent even relatively small allocations from failing. I.e. it avoids 1-byte holes so 2-byte allocations will remain possible for longer. Your allocations are not small by 32-bit standards.
Realistically, this is going to hurt. If you really really need it, reimplement memory mapped files. All the necessary functions are available. Use a vectored exception handler to page in the source, and use QueryWorkingSet to figure out if pages are dirty.

How to get the amount of virtual memory available in C++?

I would like to map a file into memory using the mmap function and would like to know if the amount of virtual memory on the current platform is sufficient to map a huge file. On a 32-bit system I cannot map a file larger than 4 GB.
Would std::numeric_limits<size_t>::max() give me the amount of addressable memory or is there any other type that I should test (off_t or something else)?
As Lie Ryan has pointed out in his comment, "virtual memory" is misused here. The question, however, holds: there is a type associated with a pointer, and it has a maximum value that defines the upper limit of what you can possibly address on your system. What is this type? Is it size_t or perhaps ptrdiff_t?
size_t is only required to be big enough to store the biggest possible single contiguous object. That may not be the same as the size of the address space (on systems with a segmented memory model, for example)
However, on common platforms with a flat memory space, the two are equal, and so you can get away with using size_t in practice if you know the target CPU.
Anyway, this doesn't really tell you anything useful. Sure, a 32-bit CPU has a 4GB memory space, and so size_t is a 32-bit unsigned integer. But that says nothing about how much you can allocate. Some part of the memory space is used by the OS. And some parts are already used by your own application: for mapping the executable into memory (as well as any dynamic libraries it may use), for each thread's stack, allocated memory on the heap and so on.
So no, tricks such as taking the size of size_t tell you a little bit about the address space you're running in, but nothing very usable. You can ask the OS how much memory is in use by your process and other metrics, but again, that doesn't really help you much. It is possible for a process to use just a couple of megabytes, but have that spread out over so many small allocations that it's impossible to find a contiguous block of memory larger than, say, 100MB. And so, on a 32-bit machine, with a process that uses nearly no memory, you'd be unlikely to make such an allocation. (And even if the OS had a magical WhatIsTheLargestPossibleMemoryAllocationICanMake() API, that still wouldn't help you: it would tell you what was true a moment ago, with no guarantee that the answer would still be valid by the time you tried to map the file.)
So really, the best you can do is try to map the file, and see if it fails.
You can use GlobalMemoryStatusEx and VirtualQueryEx if you are coding against Win32.
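A Win32 sketch of the first of those calls:

#include <windows.h>
#include <cstdio>

int main() {
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);   // required before the call
    if (GlobalMemoryStatusEx(&status)) {
        std::printf("total virtual address space: %llu bytes\n",
                    (unsigned long long)status.ullTotalVirtual);
        std::printf("free virtual address space:  %llu bytes\n",
                    (unsigned long long)status.ullAvailVirtual);
    }
}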
Thing is, the size of a pointer tells you nothing about how much of that "address space" is actually available to you, i.e. can be mapped as a single contiguous chunk.
It's limited by:
the operating system. It may choose to only make a subset of the theoretically-possible address range available to you, because mappable memory is needed for OS-own purposes (like, say, making the graphics card framebuffer visible, and of course for use by the OS itself).
configurable limits. On Linux / UNIX, the "ulimit" command or the setrlimit() system call allows restricting the maximum size of an application's address space in various ways, and Windows has similar options through registry parameters.
the history of the application. If the application uses memory mapping extensively, the address space can fragment limiting the maximum size of "available" contiguous virtual addresses.
the hardware platform. Some CPUs have address spaces with "holes"; an example of that is 64-bit x86, where pointers are only valid if they're between 0x0..0x7fffffffffff or 0xffff800000000000..0xffffffffffffffff. I.e. you have 2x128TB instead of the full 16EB. Think of it as 48-bit "signed" pointers ...
Finally, don't confuse "available memory" and "available address space". There's a difference between doing a malloc(someBigSize) and a mmap(..., someBigSize, ...) because the former might require availability of physical memory to accommodate the request while the latter usually only requires availability of a large-enough free address range.
For UNIX platforms, part of the answer is to use getrlimit(RLIMIT_AS) as this gives the upper bound for the current invocation of your application - as said, the user and/or admin can configure this. You're guaranteed that any attempt to mmap areas larger than that will fail.
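For example (a sketch):

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            std::puts("no address-space limit set");
        else
            std::printf("address space limited to %llu bytes\n",
                        (unsigned long long)rl.rlim_cur);
    }
}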
Re your rephrased question, "upper limit of what you can possibly address on your system": it's somewhat misleading, because the answer is hardware-architecture specific. There are 64-bit architectures out there (x64, sparc) whose MMU happily allows (uintptr_t)(-1) as a valid address, i.e. you can map something into the last page of a 64-bit address space. Whether the operating system allows an application to do so is again an entirely different question ...
For user applications, the "high mark" isn't (always) fixed a priori. It's tunable on e.g. Solaris or Linux. That's where getrlimit(RLIMIT_AS) comes in.
Note that again, by specification, there'd be nothing to prevent a (weird) operating system design to choose e.g. putting application stacks and heaps at "low" addresses while putting code at "high" addresses, on a platform with address space holes. You'd need full 64bit pointers there, can't make them any smaller, but there could be an arbitrary number of "inaccessible / invalid" ranges which never are made available to your app.
You can try sizeof(int*). This will give you the length (in bytes) of a pointer on the target platform. Thus, you can find out how big the addressable space is.