How to get the amount of virtual memory available in C++?

I would like to map a file into memory using the mmap function, and would like to know whether the amount of virtual memory on the current platform is sufficient to map a huge file. On a 32-bit system I cannot map a file larger than 4 GB.
Would std::numeric_limits<size_t>::max() give me the amount of addressable memory or is there any other type that I should test (off_t or something else)?
As Lie Ryan has pointed out in his comment, the term "virtual memory" is misused here. The question, however, holds: there is a type associated with a pointer, and its maximum value defines the upper limit of what you can possibly address on your system. What is this type? Is it size_t or perhaps ptrdiff_t?

size_t is only required to be big enough to store the size of the biggest possible single contiguous object. That may not be the same as the size of the address space (on systems with a segmented memory model, for example).
However, on common platforms with a flat memory space, the two are equal, and so you can get away with using size_t in practice if you know the target CPU.
Anyway, this doesn't really tell you anything useful. Sure, a 32-bit CPU has a 4GB memory space, and so size_t is a 32-bit unsigned integer. But that says nothing about how much you can allocate. Some part of the memory space is used by the OS. And some parts are already used by your own application: for mapping the executable into memory (as well as any dynamic libraries it may use), for each thread's stack, allocated memory on the heap and so on.
So no, tricks such as taking the size of size_t tell you a little bit about the address space you're running in, but nothing very usable. You can ask the OS how much memory is in use by your process, and other metrics, but again, that doesn't really help you much. It is possible for a process to use just a couple of megabytes, but have them spread out over so many small allocations that it's impossible to find a contiguous block of memory larger than, say, 100MB. So even on a 32-bit machine, with a process that uses nearly no memory, you might be unable to make such an allocation. (And even if the OS had a magical WhatIsTheLargestPossibleMemoryAllocationICanMake() API, that still wouldn't help you. It would tell you what was possible a moment ago. You have no guarantee that the answer would still be valid by the time you tried to map the file.)
So really, the best you can do is try to map the file, and see if it fails.

You can use GlobalMemoryStatusEx and VirtualQueryEx if you are coding for Win32.
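For instance, a minimal sketch of the GlobalMemoryStatusEx call (error handling omitted):

#include <windows.h>
#include <cstdio>

int main() {
    MEMORYSTATUSEX ms = { sizeof(ms) };   // dwLength must be set before the call
    if (GlobalMemoryStatusEx(&ms)) {
        printf("total virtual: %llu bytes\n", (unsigned long long)ms.ullTotalVirtual);
        printf("avail virtual: %llu bytes\n", (unsigned long long)ms.ullAvailVirtual);
    }
    return 0;
}

Note that ullAvailVirtual is the total unreserved address space of your process, not the largest contiguous block you could map.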

Thing is, the size of a pointer tells you nothing about how much of that "address space" is actually available to you, i.e. can be mapped as a single contiguous chunk.
It's limited by:
the operating system. It may choose to make only a subset of the theoretically possible address range available to you, because mappable address space is needed for the OS's own purposes (like, say, making the graphics card framebuffer visible, and of course for use by the OS itself).
configurable limits. On Linux / UNIX, the "ulimit" command and the setrlimit() system call allow restricting the maximum size of an application's address space in various ways, and Windows has similar options through registry parameters.
the history of the application. If the application uses memory mapping extensively, the address space can fragment, limiting the maximum size of "available" contiguous virtual addresses.
the hardware platform. Some CPUs have address spaces with "holes"; an example of that is 64-bit x86, where pointers are only valid if they're between 0x0 and 0x00007fffffffffff, or between 0xffff800000000000 and 0xffffffffffffffff. I.e. you have 2x128TB instead of the full 16EB. Think of it as 48-bit "signed" pointers ...
Finally, don't confuse "available memory" and "available address space". There's a difference between doing a malloc(someBigSize) and a mmap(..., someBigSize, ...) because the former might require availability of physical memory to accommodate the request while the latter usually only requires availability of a large-enough free address range.
For UNIX platforms, part of the answer is to use getrlimit(RLIMIT_AS) as this gives the upper bound for the current invocation of your application - as said, the user and/or admin can configure this. You're guaranteed that any attempt to mmap areas larger than that will fail.
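A minimal sketch of that query (POSIX only):

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("address space: unlimited\n");
        else
            printf("address space limit: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur);
    }
    return 0;
}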

Re your rephrased question about the "upper limit of what you can possibly address on your system": it's somewhat misleading, because the answer is hardware-architecture specific. There are 64-bit architectures out there (x64, SPARC) whose MMU happily allows (uintptr_t)(-1) as a valid address, i.e. you can map something into the last page of a 64-bit address space. Whether the operating system allows an application to do so is, again, an entirely different question ...
For user applications, the "high mark" isn't (always) fixed a priori. It's tunable on e.g. Solaris or Linux. That's where getrlimit(RLIMIT_AS) comes in.
Note again that, by specification, there'd be nothing to prevent a (weird) operating system design from e.g. putting application stacks and heaps at "low" addresses while putting code at "high" addresses, on a platform with address space holes. You'd need full 64-bit pointers there (you can't make them any smaller), but there could be an arbitrary number of "inaccessible / invalid" ranges that are never made available to your app.

You can try sizeof(int*). This will give you the size (in bytes) of a pointer on the target platform, so you can find out how big the theoretically addressable space is.
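For example (keeping in mind the caveats in the other answers: this is the theoretical pointer width, not what you can actually use):

#include <climits>
#include <cstdio>

int main() {
    printf("%zu address bits\n", sizeof(void*) * CHAR_BIT);
    return 0;
}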

Related

Memory allocation in MSVC++ 2019

I have a question regarding the memory allocation, particularly when using MSVC2019.
I have a C++ program compiled to x64.
While debugging I saw that allocations result in very high pointer addresses, pointing into locations above the first 4 GB of the address space (the 32-bit range). If I check the program in the Task Manager, I see it is using only around 30-50 MB of memory.
What is the reason that the variables are not allocated in the lower part of the virtual memory space when practically the whole address space under 4GB is unused?
I would expect allocation to start from low addresses and, until the first 4 GB are used, to see no addresses above that.
Why is this interesting for me:
I have a big piece of software containing more than 15 years of old C++ code that was not everywhere prepared for 64-bit: in many places it casts pointers to 32-bit types, which truncates them. Most probably the original authors assumed pointers were 32-bit. In practice that assumption would also hold when compiled for 64-bit, since the program's memory usage never grows over 4 GB. And it seems that when compiled with compilers from 2010 this problem does not appear; probably, at that time, memory allocations returned addresses in the first 4 GB block even when compiled for x64.
My question is:
Can this allocation strategy be influenced somehow in MSVC++ 2019? E.g. to instruct the compiler/linker/memory manager to prefer allocations in the first 32-bit space until it is exhausted? Or to set a size limit for the virtual address space offered by the memory manager? E.g. by setting it to 2 GB I could guarantee there will never be any pointer pointing to an allocated block over 4 GB. That way, the old code would survive the cast operations that assume a pointer is 32-bit.
I have already tried setting NO for high memory awareness in the linker options, and checked the heap parameters, but none of them helped.
Thank you!
If your program assumes pointers will be 32-bit, you will just have to compile for 32-bit until you get proper declarations in place, using #ifdef to check what you are compiling for.
Just pick x86 instead of x64 from the dropdown as a workaround until you modernize your legacy code.
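A minimal sketch of such a guard, assuming it goes into a common header of the legacy code:

#include <cstdint>

// Fail the build loudly if someone compiles this legacy code for a
// target where pointers no longer fit into 32 bits.
static_assert(sizeof(void*) == sizeof(std::uint32_t),
              "this code assumes 32-bit pointers; build for x86");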
There's more you can do with a big address space, and since the OS maps these addresses onto portions of physical memory anyway, the compiler simply chose to reap the benefits of keeping different portions of the address space apart for different purposes.
There are ways to create custom heaps and to allocate things in a specific address range if that space is available; however, working these into the code would likely take just as long, and would be going backwards compared to properly allocating correct sizes.
Welcome to the world of virtual memory! To dynamically allocate memory, the standard library kindly asks the kernel to provide it, and only the kernel is responsible for the virtual addresses given to the program. As each process has its own virtual address translation, multiple processes can be given the same virtual addresses.
As a programmer, you should never have to worry about that. Use the memory addresses that the kernel has given to you and keep on. If you have to use legacy code assuming that a pointer cannot exceed 32 bits, you should simply not compile it in 64-bit mode but only in 32-bit mode.

Is it possible to partially free dynamically-allocated memory on a POSIX system?

I have a C++ application where I sometimes require a large buffer of POD types (e.g. an array of 25 billion floats) to be held in memory at once in a contiguous block. This particular memory organization is driven by the fact that the application makes use of some C APIs that operate on the data. Therefore, a different arrangement (such as a list of smaller chunks of memory like std::deque uses) isn't feasible.
The application has an algorithm that is run on the array in a streaming fashion; think something like this:
std::vector<float> buf(<very_large_size>);
for (size_t i = 0; i < buf.size(); ++i) do_algorithm(buf[i]);
This particular algorithm is the conclusion of a pipeline of earlier processing steps that have been applied to the dataset. Therefore, once my algorithm has passed over the i-th element in the array, the application no longer needs it.
In theory, therefore, I could free that memory in order to reduce my application's memory footprint as it chews through the data. However, doing something akin to a realloc() (or a std::vector<T>::shrink_to_fit()) would be inefficient because my application would have to spend its time copying the unconsumed data to the new spot at reallocation time.
My application runs on POSIX-compliant operating systems (e.g. Linux, OS X). Is there any interface by which I could ask the operating system to free only a specified region from the front of the block of memory? This would seem to be the most efficient approach, as I could just notify the memory manager that, for example, the first 2 GB of the memory block can be reclaimed once I'm done with it.
If your entire buffer has to be in memory at once, then you probably will not gain much from freeing it partially later.
The main point of this post is basically to NOT tell you to do what you want to do, because the OS will not unnecessarily keep your application's memory in RAM if it's not actually needed. This is the difference between "resident memory usage" and "virtual memory usage". "Resident" is what is currently used and in RAM, "virtual" is the total memory usage of your application. And as long as your swap partition is large enough, "virtual" memory is pretty much a non-issue. [I'm assuming here that your system will not run out of virtual memory space, which is true in a 64-bit application, as long as you are not using hundreds of terabytes of virtual space!]
If you still want to do that, and want to have some reasonable portability, I would suggest building a "wrapper" that behaves kind of like std::vector and allocates lumps of some megabytes (or perhaps a couple of gigabytes) of memory at a time, and then something like:
for (size_t i = 0; i < buf.size(); ++i) {
do_algorithm(buf[i]);
buf.done(i);
}
The done method will simply check if the value of i is (one element) past the end of the current chunk and, if so, free that chunk. [This should inline nicely, and produce very little overhead on the average loop iteration - assuming elements are actually used in linear order, of course.]
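A minimal sketch of such a wrapper (ChunkedBuffer is a hypothetical name; the chunk size and error handling are simplified):

#include <cstddef>
#include <cstdlib>
#include <vector>

class ChunkedBuffer {
    static const size_t kChunk = 1 << 20;   // elements per chunk (assumed)
    std::vector<float*> chunks_;
    size_t size_;
public:
    explicit ChunkedBuffer(size_t n) : size_(n) {
        for (size_t i = 0; i < n; i += kChunk)
            chunks_.push_back(static_cast<float*>(std::malloc(kChunk * sizeof(float))));
    }
    ~ChunkedBuffer() { for (float* c : chunks_) std::free(c); }
    size_t size() const { return size_; }
    float& operator[](size_t i) { return chunks_[i / kChunk][i % kChunk]; }
    // Free a chunk as soon as i is its last element, i.e. the scan is done with it.
    void done(size_t i) {
        if (i % kChunk == kChunk - 1 || i + 1 == size_) {
            std::free(chunks_[i / kChunk]);
            chunks_[i / kChunk] = nullptr;
        }
    }
};

Note this gives up the contiguity the question asked for, which is the trade-off of this whole approach; and as discussed below, free() may not return the pages to the OS anyway.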
I'd be very surprised if this gains you anything, unless do_algorithm(buf[i]) takes quite some time (certainly many seconds, probably many minutes or even hours). And of course, it's only going to help if you actually have something else useful to do with that memory. And even then, the OS will reclaim memory that isn't actively used by swapping it out to disk, if the system is short of memory.
In other words, if you allocate 100GB, fill it, and leave it sitting untouched, it will eventually ALL be on the hard disk rather than in RAM.
Further, it is not at all unusual that the heap in the application retains freed memory, and that the OS does not get the memory back until the application exits - and certainly, if only part of a larger allocation is freed, the runtime will not release it until the whole block has been freed. So, as stated at the beginning, I'm not sure how much this will actually help your application.
As with everything regarding "tuning" and "performance improvements", you need to measure and compare a benchmark, and see how much it helps.
Is it possible to partially free dynamically-allocated memory on a POSIX system?
You can not do it using malloc()/realloc()/free().
However, you can do it in a semi-portable way using mmap() and munmap(). The key point is that if you munmap() some page, malloc() can later use that page:
create an anonymous mapping using mmap();
subsequently call munmap() for regions that you don't need anymore.
The portability issues are:
POSIX doesn't specify anonymous mappings. Some systems provide a MAP_ANONYMOUS or MAP_ANON flag. Other systems provide a special device file that can be mapped for this purpose. Linux provides both.
I don't think that POSIX guarantees that when you munmap() a page, malloc() will be able to use it. But I think it'll work on all systems that have mmap()/munmap().
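A minimal sketch of the scheme (Linux-flavoured; MAP_ANONYMOUS, the buffer size, and the fixed split point are assumptions for illustration):

#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const size_t total = 1024 * page;   // hypothetical buffer size
    void* buf = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    // ... stream through the first half of the data ...
    // Hand the consumed front half back to the OS; the tail stays mapped.
    if (munmap(buf, 512 * page) != 0) perror("munmap");
    // Eventually unmap the surviving tail as well.
    munmap(static_cast<char*>(buf) + 512 * page, total - 512 * page);
    return 0;
}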
Update
If your memory region is so large that most pages surely will be written to swap, you will not lose anything by using file mappings instead of anonymous mappings. File mappings are specified in POSIX.
If you can do without the convenience of std::vector (which won't give you much in this case anyway because you'll never want to copy / return / move that beast), you can do your own memory handling. Ask the operating system for entire pages of memory (via mmap) and return them as appropriate (using munmap). You can tell mmap via its first argument and the optional MAP_FIXED flag to map the page at a particular address (which you must ensure is not otherwise occupied, of course) so you can build up an area of contiguous memory. If you allocate the entire memory upfront, then this is not an issue and you can do it with a single mmap and let the operating system choose a convenient place to map it. In the end, this is what malloc does internally. For platforms that don't have sys/mman.h, it's not difficult to fall back to using malloc if you can live with the fact that on those platforms, you won't return memory early.
I'm suspecting that if your allocation sizes are always multiples of the page size, realloc will be smart enough not to copy any data. You'd have to try this out and see if it works (or consult your malloc's documentation) on your particular target platform, though.
mremap is probably what you need (note that it's Linux-specific, not POSIX, so it isn't available on OS X). As long as you're shifting whole pages, you can do a super fast realloc (actually the kernel does it for you).
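A sketch of that call (again, Linux-only); note that it trims the tail of a mapping in place, while trimming the front would be a munmap() of the leading pages instead:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // mremap is a GNU extension
#endif
#include <sys/mman.h>
#include <cstddef>

// Shrink a mapping in place without copying; returns the (unchanged)
// address on success, MAP_FAILED on error. Sizes must be page multiples.
void* shrink_mapping(void* p, size_t old_size, size_t new_size) {
    return mremap(p, old_size, new_size, 0);
}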

Get the maximum adressable memory space on a Win32 system

Is there a way on Win32 systems to programmatically get the full size of the OS's addressable memory space, using the Win32 API (or any accessible DLL that would be installed on a >=XP system)? I know about GetPerformanceInfo and GlobalMemoryStatusEx, but the former only seems to deal with physical memory, and the latter pertains to memory addressable by my program, not the OS; since my program must be x86 and might be run on an x64 system, there is no guarantee this will even be in the ballpark.
Note: I'd prefer, but don't need, an exact size. I just need a "really good guess."
GetPhysicallyInstalledSystemMemory can get the physical limit.
GetNativeSystemInfo can retrieve the highest user virtual address the system can access.
Do either of those satisfy your requirement?
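For instance, a minimal sketch of both calls:

#include <windows.h>
#include <cstdio>

int main() {
    SYSTEM_INFO si;
    GetNativeSystemInfo(&si);
    printf("user address range: %p .. %p\n",
           si.lpMinimumApplicationAddress, si.lpMaximumApplicationAddress);
    ULONGLONG physKB = 0;
    if (GetPhysicallyInstalledSystemMemory(&physKB))
        printf("installed RAM: %llu KB\n", physKB);
    return 0;
}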

How to choose a fixed address for shared memory mapping

I would like to use shared memory between several processes, and would like to be able to keep using raw pointers (and stl containers).
For this purpose, I am using shared memory mapped at a fixed address:
segment = new boost::interprocess::managed_shared_memory(
boost::interprocess::open_or_create,
"MySegmentName",
1048576, // alloc size
(void *)0x400000000LL // fixed address
);
What is a good strategy for choosing this fixed address? For example, should I just use a pretty high number to reduce the chance that I run out of heap space?
This is a hard problem. If you are forking a single program to create children, and only the parent and the children will use the memory segment, just be sure to map it before you fork. The children will automatically inherit the mapping from their parent and there's no need to use a fixed address.
If you aren't, then the first thing to consider is whether you really need to use raw STL containers instead of the boost interprocess containers. That you're already using boost interprocess to allocate the shared memory segment suggests you don't have any problem using boost, so the only advantage I can think of to using STL containers would be so you don't have to port existing code. Keep in mind that for it to work with fixed addresses, the containers and what they contain pointers to (assuming you're working with containers of pointers) will need to be kept in the shared memory space.
If you're certain that it's what you want, you'll have to figure out some method for them to negotiate an address. Keep in mind that the OS is allowed to reject your desired fixed memory address. It will reject an address if the page at that address has already been mapped into memory or allocated. Because different programs will have allocated different amounts of memory at different times, which pages are available and which are unavailable will vary across your programs.
So you need for the programs to gain consensus on a memory address. This means that several addresses might have to be tried and rejected. If it's possible that sometime after startup a new program will become interested, the search for consensus will have to start over again. The algorithm would look something like this:
Program A proposes memory address X to all other programs.
The other programs respond with true or false to indicate whether the memory mapping at address X succeeded.
If program A receives any false responses, goto #1.
Program A sends a message to the other programs letting them know the address has been validated and may be used.
If a new app becomes interested in the data, it must notify program A it would like an address.
Program A then has to tell all the other programs to stop using the data and goto #1.
To come up with what addresses A should propose, you could have A map a non-fixed memory segment, see what address it's mapped at, and propose that address. If it's unsatisfactory, map another segment and propose it instead. You will need to unmap the segments at some point, but you can't unmap them right away because if you unmap then remap a segment of the same size chances are the OS will give you the same address back over and over. Keep in mind that you may never reach consensus; there's no guarantee that there's a large enough segment at a common location across all the processes. This could happen if your programs all independently use almost all memory, say if they are backed up by a ton of swap (though if you care enough about performance to use shared memory hopefully you are avoiding swap).
All of the above assumes you're in a relatively constrained address space. If you're on 64-bit, this could work. Most computers' RAM + swap will be far less than what's allowed by 64 bits, so you could map the memory at a very far-out fixed address that all processes are unlikely to have mapped already. I suggest at least 2^48, since current 64-bit x86 processors don't reach beyond that range (despite pointers being 64 bits, you can only plug in as much RAM as allowed by 48 bits, still a ton at the time of this writing). Although there's no reason a smart heap allocator couldn't take advantage of the vastness of the address space to reduce its bookkeeping work, so to be truly robust you would still need to build consensus. Keep in mind that you will at least want the address to be configurable - even if we don't have that much memory anytime soon, between now and then someone else might have the same idea and pick your address.
To do the bidirectional communication you could use any of sockets, pipes, or another shared memory segment. Your OS may provide other forms of IPC. But strongly consider that you are probably now introducing more complexity than you would have to deal with if you just used the boost interprocess containers ;)
Read the address from a configuration file. That will allow easy experimentation, and make it easy to change the address as the circumstances change.
Don't use hard-coded absolute addresses for the shared memory area, for security reasons, even when you don't use forks or threads. This bypasses all ASLR protections and gives any attacker predictable locations in the process's address space. It is pretty easy to search for such hard-coded pointers in a binary.
You've been chosen by http://reversingonwindows.blogspot.sg/2013/12/hardcoded-pointers.html as an example of how to make software less secure by bypassing ASLR.
The 2nd bad example is in the boost library.
The address space needs to be negotiated between the communicating parties at run-time.
My solution:
The initialising program allows the system to select an appropriate segment address. This address is written to disc and retrieved for use by subsequent programs as required.
Caveats:
I am using 64-bit Fedora 21 with KDevelop 4.7 and find that 'void*' is 64 bits long. Writing the segment head address to disc involves sprintf(bu, "%p", pointer); and writing a text file.
Recovery reads this file and decodes the hex number as a 'long long' value, which is returned to the caller where it is cast to (void*).
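In code, the write/read pair looks something like this (the file name is hypothetical; strictly speaking, %p round-trips are only guaranteed by the standard within one program execution, but glibc accepts them across processes):

#include <cstdio>

// Writer: record where the system placed the segment.
void save_address(void* p) {
    if (FILE* f = std::fopen("segment_address.txt", "w")) {
        std::fprintf(f, "%p\n", p);
        std::fclose(f);
    }
}

// Reader: recover the same address in a subsequent program.
void* load_address() {
    void* p = nullptr;
    if (FILE* f = std::fopen("segment_address.txt", "r")) {
        std::fscanf(f, "%p", &p);
        std::fclose(f);
    }
    return p;
}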
I have also found that grouping all the access routines into a single folder above the level of the individual processes (each a project in its own right) has helped save my sanity, at the expense of a single aberrant '#include' in the process files.

Information about PTE's (Page Table Entries) in Windows

In order to find buffer overflows more easily I am changing our custom memory allocator so that it allocates a full 4 KB page instead of only the wanted number of bytes. Then I change the page protection and size so that if the caller writes before or after its allocated piece of memory, the application immediately crashes.
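(For context, a typical shape for such a guard-page allocator on Windows looks something like the sketch below; this is not the poster's actual code, and alignment of the returned pointer is ignored for brevity.)

#include <windows.h>

// One region for the data, plus one inaccessible guard page right
// behind it, so a write past the end of the block faults immediately.
void* guarded_alloc(size_t bytes) {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const size_t page = si.dwPageSize;
    const size_t pages = (bytes + page - 1) / page;
    char* base = static_cast<char*>(VirtualAlloc(
        nullptr, (pages + 1) * page, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (!base) return nullptr;
    DWORD old;
    VirtualProtect(base + pages * page, page, PAGE_NOACCESS, &old);
    // Right-align the block against the guard page to catch overruns.
    return base + pages * page - bytes;
}

Note that each VirtualAlloc reservation is rounded up to the 64 KB allocation granularity, which is one reason this approach burns through address space so quickly.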
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory. This has two causes:
since every allocation needs 4 KB, we probably reach the 2 GB limit very soon. This problem could be solved if I made a 64-bit executable (I haven't tried it yet).
even when I only need a few hundred megabytes, the allocations fail at a certain moment.
The second problem is the biggest one, and I think it's related to the maximum number of PTE's (page table entries, which store information on how Virtual Memory is mapped to physical memory, and whether pages should be read-only or not) you can have in a process.
My questions (or a cry-for-tips):
Where can I find information about the maximum number of PTE's in a process?
Is this different (higher) for 64-bit systems/applications or not?
Can the number of PTE's be configured in the application or in Windows?
Thanks,
Patrick
PS. A note for those who will try to argue that you shouldn't write your own memory manager:
My application is rather specific so I really want full control over memory management (can't give any more details)
Last week we had a memory overwrite which we couldn't find using the standard C++ allocator and the debugging functionality of the C/C++ run time (it only said "block corrupt" minutes after the actual corruption)
We also tried standard Windows utilities (like GFLAGS, ...) but they slowed down the application by a factor of 100, and couldn't find the exact position of the overwrite either
We also tried the "Full Page Heap" functionality of Application Verifier, but then the application doesn't start up either (probably also running out of PTE's)
There is what I thought was a great series of blog posts by Mark Russinovich on TechNet called "Pushing the limits of Windows...":
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
It has a few articles on virtual memory, paged and nonpaged memory, physical memory and others.
He mentions little utilities he uses to take measurements of a system's resources.
Hopefully you will find your answers there.
A shotgun approach is to allocate those isolated 4KB entries at random. This means that you will need to rerun the same tests, with the same input repeatedly. Sometimes it will catch the error, if you're lucky.
A slightly smarter approach is to use another algorithm than just random - e.g. make it dependent on the call stack whether an allocation is isolated. Do you trust std::string users, for instance, and suspect raw malloc use?
Take a look at the implementation of OpenBSD malloc. Much of the same ideas (and more) implemented by very skilled folk.
In order to find more easily buffer overflows I am changing our custom memory allocator so that it allocates a full 4KB page instead of only the wanted number of bytes.
This has already been done. Application Verifier with PageHeap.
Info on PTEs and the Memory architecture can be found in Windows Internals, 5th Ed. and the Intel Manuals.
Is this different (higher) for 64-bit systems/applications or not?
Of course. 64bit Windows has a much larger address space, so clearly more PTEs are needed to map it.
Where can I find information about the maximum number of PTE's in a process?
This is not as important as the maximum amount of user address space available in a process. (The number of PTEs is that number divided by the page size.)
This is 2GB on 32 bit Windows and much bigger on x64 Windows. (The actual number varies, but it's "big enough").
Problem is that although I have enough memory, the application never starts up completely because it runs out of memory.
Are you a) leaking memory? b) using horribly inefficient algorithms?