Contiguous Memory Allocation

In an application I have to allocate two buffers of 480 MB each. The memory is allocated with HeapAlloc. The application works fine on systems where few other applications are running. But on systems where other applications are also running, the allocation fails because no contiguous memory is available, even though enough non-contiguous memory is free.
I need help allocating two 480 MB buffers even when only non-contiguous memory is available.

The situation you describe is not possible in a full featured OS which gives each process its own address space. It doesn't matter how many other applications are running, they won't affect contiguity of the free address space in your process. And virtual memory can map discontiguous physical memory addresses to a contiguous range in virtual address space.
Only in an embedded system without a memory management unit could the existence of other tasks cause your program to suffer memory fragmentation.
HeapAlloc() suggests Windows, which does give a separate address space to each process. The most likely explanation there is that your private address space is fragmented by libraries (DLLs) loading in scattered locations. You can rebase the libraries you use to avoid this and provide larger contiguous blocks of address space.

You can use VirtualAlloc with MEM_LARGE_PAGES included in flAllocationType. This enables large-page support; note that you must call GetLargePageMinimum to check that the system supports large pages (it returns 0 if it does not).
Also note that this is likely to be slow as this page details.
Large-page memory regions may be difficult to obtain after the system has been running for a long time because the physical space for each large page must be contiguous, but the memory may have become fragmented. Allocating large pages under these conditions can significantly affect system performance. Therefore, applications should avoid making repeated large-page allocations and instead allocate all large pages one time, at startup.

Use VirtualAlloc. The underlying memory that backs the virtual pages need not be contiguous, and you always have your full virtual address space (2 GB on a 32-bit system; I think 8 or 16 TB on Windows x64, I can't remember). A heap used through HeapAlloc can become fragmented (through your own process's use, not that of others). Your address space can also become fragmented, so try allocating early in your application. I actually don't recommend HeapAlloc for anything; you can just use new and delete (which call malloc and free). For large blocks like yours, malloc will call VirtualAlloc on Windows.

Related

What part of the process virtual memory does Windows Task Manager display

My question is a bit naive. I'm willing to have an overview as simple as possible and couldn't find any resource that made it clear to me. I am a developer and I want to understand what exactly is the memory displayed in the "memory" column by default in Windows Task Manager:
To make things a bit simpler, let's forget about the memory the process shares with other processes, and imagine the shared memory is negligible. Also I'm focussed on the big picture and mainly care for things at GB level.
As far as I know, the memory reserved by the process called "virtual memory", is partly stored in the main memory (RAM), partly on the disk. The system decides what goes where. The system basically keeps in RAM the parts of the virtual memory that is accessed sufficiently frequently by the process. A process can reserve more virtual memory than RAM available in the computer.
From a developer point of view, the virtual memory may only be partially allocated by the program through its own memory manager (with malloc() or new X() for example). I guess the system has no awareness of what part of the virtual memory is allocated since this is handled by the process in a "private" way and depends on the language, runtime, compiler... Q: Is this correct?
My hypothesis is that the memory displayed by the task manager is essentially the part of the virtual memory being stored in RAM by the system. Q: Is it correct? And is there a simple way to know the total virtual memory reserved by the process?

Memory on Windows is... extremely complicated, and asking 'how much memory does my process use' is effectively a nonsensical question. To answer your questions, let's get a little background first.
Memory on windows is allocated via ptr = VirtualAlloc(..., MEM_RESERVE, ...) and committed later with VirtualAlloc(ptr+n, MEM_COMMIT, ...).
Any reserved memory just uses up address space and so isn't interesting. Windows will let you MEM_RESERVE terabytes of memory just fine. Committing the memory does use up resources but not in the way you'd think. When you call commit windows does a few sums and basically works out (total physical ram + total swap - current commit) and lets you allocate memory if there's enough free. BUT the windows memory manager doesn't actually give you physical ram until you actually use it.
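As a rough sketch of the reserve-then-commit pattern described above: reserve a large range of address space first, then commit only the part you are about to use. The helper names reserve_region, commit_range and release_region are made up for illustration, and the non-Windows branch (mmap/mprotect) is only an assumed stand-in so the sketch also builds outside Windows; it is not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Reserve `size` bytes of address space without backing them with storage.
// Returns nullptr on failure. (Hypothetical helper name.)
void* reserve_region(std::size_t size) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
#else
    // Assumption: an inaccessible anonymous mapping as a rough POSIX analogue.
    void* p = mmap(nullptr, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Make `size` bytes starting at `addr` readable and writable.
// `addr` must be page-aligned and lie inside a region from reserve_region().
bool commit_range(void* addr, std::size_t size) {
    assert(addr != nullptr);
#ifdef _WIN32
    return VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE) != nullptr;
#else
    return mprotect(addr, size, PROT_READ | PROT_WRITE) == 0;
#endif
}

// Give the whole reserved region back to the operating system.
void release_region(void* base, std::size_t size) {
#ifdef _WIN32
    (void)size;  // MEM_RELEASE requires a size of 0
    VirtualFree(base, 0, MEM_RELEASE);
#else
    munmap(base, size);
#endif
}
```

Only the committed window counts against the commit limit; the rest of the reservation just occupies address space, and touching it before committing it faults.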
Later, however, if windows is tight for physical RAM it'll swap some of your RAM out to disk (it may compress it and also throw away unused pages, throw away anything directly mapped from a file and other optimisations). This means your total commit and total physical ram usage for your program may be wildly different. Both numbers are useful depending on what you're measuring.
There's one last large caveat - memory that is shared. When you load DLLs the code, the read-only memory [and even maybe the read/write section but this is COW'd] can be shared with other programs. This means that your app requires that memory but you cannot count that memory against just your app - after all it can be shared and so doesn't take up as much physical memory as a naive count would think.
(If you are writing a game or similar you also need to count GPU memory but I'm no expert here)
All of the above goodness is normally wrapped up by the heap the application uses and you see none of this - you ask for and use memory. And it's just as optimal as possible.
You can see this by going to the Details tab and looking at the various options - commit size and working set are really useful. If you just look at the main window in Task Manager and it has a single value, I'd hope you understand now that a single value for memory used has to be some kind of compromise, as it's not a question that makes sense.
Now to answer your questions
Firstly, the OS knows exactly how much memory your app has reserved and how much it has committed. What it doesn't know is whether the heap implementation you (or more likely the CRT) are using has kept some freed memory around that it hasn't released back to the operating system. Heaps often do this as an optimisation - asking for memory from the OS and freeing it back to the OS is a fairly expensive operation (and can only be done in large chunks known as pages), and so most of them keep some around.
Second question: Don't use that value; go to Details and use the values there, as only you know what you actually want to ask.
EDIT:
For your comment, yes, but this depends on the size of the allocation. If you allocate a large block of memory (say >= 1MB) then the heap in the CRT generally defers the allocation directly to the operating system, and so freeing individual blocks will actually free them. For small allocations the heap in the CRT asks for pages of memory from the operating system and then subdivides them to hand out as allocations. So if you then free every other one of those you'll be left with holes - and the heap cannot give those holes back to the OS, as the OS generally only works in whole pages. So anything you see in Task Manager will show that all the memory is still used. Remember this memory isn't lost or leaked; it's just effectively pooled and will be used again if allocations ask for that size. If you care about this memory you can use the CRT heap statistics family of functions to keep an eye on it - specifically _CrtMemDumpStatistics.

Memory usage in C++ program, as reported by Gnome resource monitor: confusion

I am looking at the memory consumed by my app to make sure I am not allocating too much, and am confused as to what Gnome Resource Monitor is showing me. I have used the following pieces of code to allocate memory in two separate apps that are otherwise identical; they contain nothing other than this code and a scanf() call to pause execution whilst I grab the memory usage:
malloc(1024 * 1024 * 100);
and
char* p = new char[1024*1024*100];
The following image shows the memory usage of my app before and after each of these lines:
Now, I have read a lot (but obviously not enough) about memory usage (including this SO question), and am having trouble differentiating between writeable memory and virtual memory. According to the linked question,
"Writeable memory is the amount of address space that your process has
allocated with write privileges"
and
"Virtual memory is the address space that your application has
allocated"
1) If I have allocated memory myself, surely it has write privileges?
2) The linked question also states (regarding malloc)
"...which won't actually allocate any memory. (See the rant at the end
of the malloc(3) page for details.)"
I don't see any "rant", and my images show the virtual memory has increased! Can someone explain this please?
3) If I have purely the following code:
char* p = new char[100];
...the resource monitor shows that both Memory and Writeable Memory have increased by 8KB - the same as when I was allocating a full one megabyte! - with Virtual memory increasing by 0.1. What is happening here?
4) What column should I be looking at in the resource monitor to see how much memory my app is using?
Thanks very much in advance for participation, and sorry if have been unclear or missed anything that could have led me to find answers myself.

A more precise way to understand on Linux the memory usage of a running process is to use the proc(5) file system.
So, if your process pid is 1234, try
cat /proc/1234/maps
Notice that each process has its own address space in virtual memory. That address space can be changed by mmap(2) and other syscalls(2). For efficiency reasons, malloc(3) and free avoid making too many of these syscalls and prefer to reuse previously freed memory zones. So when your program frees (or, in C++, deletes) a memory chunk, that chunk is often marked as reusable but is not released back to the kernel (e.g. by munmap). Likewise, if you malloc only 100 bytes, your libc is allowed to request, say, a whole megabyte using mmap (the next time you call malloc for e.g. 200 bytes, it will use part of that megabyte).
See also http://linuxatemyram.com/ and Advanced Linux Programming (and this question about memory overcommit)
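To see the difference between address space and resident memory from inside the program, one option is to read /proc/self/status before and after an allocation. The sketch below is only an illustration: read_status_kb, snapshot and touch_pages are hypothetical helper names, it assumes a Linux /proc file system, and it relies on the VmSize and VmRSS fields documented in proc(5).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Return the value in kB of a field such as "VmSize" or "VmRSS" from
// /proc/self/status, or -1 if the field is not found.
long read_status_kb(const std::string& field) {
    std::ifstream status("/proc/self/status");
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream rest(line.substr(prefix.size()));
            long kb = -1;
            rest >> kb;
            return kb;
        }
    }
    return -1;
}

struct MemorySnapshot {
    long vm_kb;   // total virtual address space (VmSize)
    long rss_kb;  // part currently resident in RAM (VmRSS)
};

MemorySnapshot snapshot() {
    return {read_status_kb("VmSize"), read_status_kb("VmRSS")};
}

// Write one byte per 4 KiB step so the kernel has to back every page with RAM.
void touch_pages(volatile char* p, std::size_t size, std::size_t step = 4096) {
    assert(p != nullptr);
    for (std::size_t i = 0; i < size; i += step) {
        p[i] = 1;
    }
}
```

Taking a snapshot before malloc, after malloc, and after touch_pages should typically show VmSize growing at the malloc call, while VmRSS only grows once the pages are written.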

The classes of memory reported by the Gnome resource monitor (and in fact, the vast majority of resource reporting tools) are not simply separate classes of memory - there is overlap between them because they are reporting on different characteristics of the memory. Some of those different characteristics include:
virtual vs physical - all memory in a process's address space on modern operating systems is virtual; that virtual address space is mapped to actual physical memory by the hardware capabilities of the CPU; how that mapping is done is a complex topic in itself, with a lot of differences between different architectures
memory access permissions - memory can be readable, writable, or executable, or any combination of the three (in theory - some combinations don't really make sense and so may actually not be allowed by hardware and/or software, but the point is that these permissions are treated separately)
resident vs non-resident - with a virtual memory system, much of the address space of a process may not actually be currently mapped to real physical memory, for a variety of reasons - it may not have been allocated yet; it may be part of the binary or one of the libraries, or even a data segment that has not yet been loaded because the program has not called for it yet; it may have been swapped out to a swap area to free up physical memory for a different program that needed it
shared vs private - parts of a process's virtual address space that are read-only (for example, the actual code of the program and most of the libraries) may be shared with other processes that use the same libraries or program - this is a big advantage for overall memory usage, as having 37 different xterm instances running does not mean that the code for xterm needs to be loaded 37 different times into memory - all the processes can share one copy of the code
Because of these, and a few other factors (IPC shared memory, memory-mapped files, physical devices that have memory regions mapped in hardware, etc.), determining the actual memory in use by any single process, or even the entire system, can be complicated.

Dynamic allocation in uClinux

I'm new to embedded development, and the big difference I see between traditional Linux and uClinux is that uClinux lacks the MMU.
From this article:
Without VM, each process must be located at a place in memory where it can be run. In the simplest case, this area of memory must be contiguous. Generally, it cannot be expanded as there may be other processes above and below it. This means that a process in uClinux cannot increase the size of its available memory at runtime as a traditional Linux process would.
To me, this sounds like all data must reside on the stack, and that heap allocation is impossible, meaning malloc() and/or "new" are out of the question... is that accurate? Perhaps there are techniques/libraries which allow for managing a "static heap" (i.e. a stack based area from which "dynamic" allocations can be requested)?
Or am I over thinking it? Or over simplifying it?

Under regular Linux, the programmer does not need to deal with physical resources. The kernel takes care of this, and a user space process sees only its own address space. As the stack grows, or malloc-type requests are made, the kernel will map free memory into the process's virtual address space.
In uClinux, the programmer must be more concerned with physical memory. The MMU and VM are not available, and all address space is shared with the kernel. When a user space program is loaded, the process is allocated physical memory pages for the text, stack, and variables. The process's program counter, stack pointer, and data/bss table pointers are set to physical memory addresses. Heap allocations (via malloc-type calls) are made from the same pool.
You will not have to get rid of heap allocation in programs. You will need to be concerned with some new issues. Since the stack cannot grow via virtual memory, you must size it correctly during linking to prevent stack overflows. Memory fragmentation becomes an issue because there's no MMU to consolidate smaller free pages. Errant pointers become more dangerous because they can now cause unintended writes to anywhere in physical memory.
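If you do want something like the "static heap" mentioned in the question, one common approach is a bump allocator over a fixed pool whose size is decided at build time. The sketch below is only an illustration under that assumption: StaticArena and g_pool are made-up names, nothing here is required by or specific to uClinux, and individual blocks cannot be freed, only the whole arena reset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal bump allocator over caller-provided storage.
class StaticArena {
public:
    StaticArena(unsigned char* storage, std::size_t capacity)
        : base_(storage), capacity_(capacity), used_(0) {}

    // Returns `size` bytes aligned to `align` (a power of two),
    // or nullptr if the pool does not have enough room left.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(base_) + used_;
        const std::uintptr_t aligned =
            (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - start);
        const std::size_t remaining = capacity_ - used_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;
        }
        used_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Discard every allocation at once; there is no per-block free.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_;
};

// Hypothetical fixed pool; it lives in .bss, so it cannot fragment the heap.
alignas(std::max_align_t) static unsigned char g_pool[4096];
```

Because the pool never grows and never interleaves with other allocations, it sidesteps the fragmentation concern above at the cost of a hard upper bound that you have to size correctly yourself.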

It's been a while since I've worked with uClinux (it was before it was integrated into the main tree), but I thought malloc was still available as part of the C library. There was a much higher chance of doing Very Bad Things (tm) in memory since the heap wasn't isolated, but it was possible.

Yes, you can use malloc in user-space applications on uClinux, but you have to increase the stack size of the application before running it (the stack size is static), so that malloc gets the space it needs when it runs.
For example, for uClinux on ARM Cortex, the ARM toolchain provides a command to display and change the stack size recorded in the application binary; you can then transfer the binary to your embedded system and run it:
arm-uclinuxeabi-flthdr

How much memory should you be able to allocate?

Background: I am writing a C++ program working with large amounts of geodata, and wish to load large chunks to process at a single go. I am constrained to working with an app compiled for 32 bit machines. The machine I am testing on is running a 64 bit OS (Windows 7) and has 6 gig of ram. Using MS VS 2008.
I have the following code:
byte* pTempBuffer2[3];
try
{
    //size_t nBufSize = nBandBytes*m_nBandCount;
    pTempBuffer2[0] = new byte[nBandBytes];
    pTempBuffer2[1] = new byte[nBandBytes];
    pTempBuffer2[2] = new byte[nBandBytes];
}
catch (const std::bad_alloc&)
{
    // If we didn't get the memory just don't buffer and we will get data one
    // piece at a time.
    return;
}
I was hoping that I would be able to allocate memory until the app reached the 4 gigabyte limit of 32-bit addressing. However, when nBandBytes is 466,560,000, new throws std::bad_alloc on the second allocation. At this stage, the working set (memory) value for the process is 665,232 K. So I don't seem to be able to get even a gig of memory allocated.
There has been some mention of a 2 gig limit for applications in 32 bit Windows which may be extended to 3 gig with the /3GB switch for win32. This is good advice under that environment, but not relevant to this case.
How much memory should you be able to allocate under the 64 bit OS with a 32 bit application?

As much as the OS wants to give you. By default, Windows lets a 32-bit process have 2GB of address space. And this is split into several chunks. One area is set aside for the stack, others for each executable and dll that is loaded. Whatever is left can be dynamically allocated, but there's no guarantee that it'll be one big contiguous chunk. It might be several smaller chunks of a couple of hundred MB each.
If you compile with the LargeAddressAware flag, 64-bit Windows will let you use the full 4GB address space, which should help a bit, but in general you shouldn't assume that the available memory is contiguous (you should be able to work with multiple smaller allocations rather than a few big ones), and you should compile it as a 64-bit application if you need a lot of memory.
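As a minimal sketch of working with multiple smaller allocations instead of a few big ones, a buffer can be split into fixed-size chunks that are allocated separately and addressed through a single logical index. ChunkedBuffer is a hypothetical class for illustration, and the chunk size is an assumption you would tune to your data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A logically contiguous byte buffer stored as separately allocated chunks,
// so no single allocation needs a large contiguous address range.
class ChunkedBuffer {
public:
    ChunkedBuffer(std::size_t total_bytes, std::size_t chunk_bytes)
        : size_(total_bytes), chunk_bytes_(chunk_bytes) {
        assert(chunk_bytes > 0);
        const std::size_t chunk_count =
            (total_bytes + chunk_bytes - 1) / chunk_bytes;
        chunks_.reserve(chunk_count);
        for (std::size_t i = 0; i < chunk_count; ++i) {
            // Each chunk is zero-initialised; std::bad_alloc propagates
            // to the caller if any single chunk cannot be allocated.
            chunks_.push_back(std::make_unique<unsigned char[]>(chunk_bytes));
        }
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Map a logical byte index to (chunk, offset within chunk).
    unsigned char& operator[](std::size_t i) {
        assert(i < size_);
        return chunks_[i / chunk_bytes_][i % chunk_bytes_];
    }

private:
    std::size_t size_;
    std::size_t chunk_bytes_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

For example, ChunkedBuffer band(466560000, 16 * 1024 * 1024) would need only 16 MB holes in the address space rather than one 445 MB hole, at the cost of an extra division per access.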

On 32-bit Windows, a normal process can use at most 2 GB, but with the /3GB switch it can reach 3 GB (for Windows 2003).
But in your case I think you are allocating contiguous memory, and so the exception occurred.

You can allocate as much memory as your page file will let you - even without the /3GB switch, you can allocate 4GB of memory without much difficulty.
Read this article for a good overview of how to think about physical memory, virtual memory, and address space (all three are different things). In a nutshell, you have exactly as much physical memory as you have RAM, but your app really has no interaction with that physical memory at all - it's just a convenient place to store the data that is in your virtual memory. Your virtual memory is limited by the size of your pagefile, and the amount your app can use is limited by how much other apps are using (although you can allocate more, providing you don't actually use it). Your address space in the 32-bit world is 4GB. Of those, 2GB are allocated to the kernel (or 1GB if you use the /3GB switch). Of the 2GB that are left, some is going to be used up by your stack, some by the program you are currently running (and all the DLLs, etc.). It's going to get fragmented, and you are only going to be able to get so much contiguous space - this is where your allocation is failing. But since that address space is just a convenient way to access the virtual memory you have allocated for you, it's possible to allocate much more memory, and bring chunks of it into your address space a few at a time.
Raymond Chen has an example of how to allocate 4GB of memory and map part of it into a section of your address space.
Under 32-bit Windows, the maximum allocatable is 16TB and 256TB in 64 bit Windows.
And if you're really into how memory management works in Windows, read this article.

During the ElephantsDream project the Blender Foundation with Blender 3D had similar problems (though on Mac). Can't include the link but google: blender3d memory allocation problem and it will be the first item.
The solution involved File Mapping. Haven't tried it myself but you can read up on it here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx

With nBandBytes at 466,560,000, you are trying to allocate 1.4 GB in total across the three buffers. A 32-bit app typically only has access to 2 GB of address space (more if you boot with /3GB and the executable is marked as large address aware). You may be hard pressed to find that many blocks of contiguous address space for your large chunks of memory.
If you want to allocate gigabytes of memory on a 64-bit OS, use a 64-bit process.

You should be able to allocate a total of about 2GB per process. This article (PDF) explains the details. However, you probably won't be able to get a single, contiguous block that is even close to that large.

Even if you allocate in smaller chunks, you couldn't get the memory you need, especially if the surrounding program has unpredictable memory behavior, or if you need to run on different operating systems. In my experience, the heap space on a 32-bit process caps at around 1.2GB.
At this amount of memory, I would recommend manually writing to disk. Wrap your arrays in a class that manages the memory and writes to temporary files when necessary. Hopefully the characteristics of your program are such that you could effectively cache parts of that data without hitting the disk too much.

Sysinternals VMMap is great for investigating virtual address space fragmentation, which is probably limiting how much contiguous memory you can allocate. I recommend setting it to display free space, then sorting by size to find the largest free areas, then sorting by address to see what is separating the largest free areas (probably rebased DLLs, shared memory regions, or other heaps).
Avoiding extremely large contiguous allocations is probably for the best, as others have suggested.
Setting LARGE_ADDRESS_AWARE=YES (as jalf suggested) is good, as long as the libraries that your application depends on are compatible with it. If you do so, you should test your code with the AllocationPreference registry key set to enable top-down virtual address allocation.

Deallocation doesn't free memory in Windows/C++ Application

My Windows/C++ application allocates ~1Gb of data in memory with the operator new and processes this data. After processing the data is deleted.
I noticed that if I run the processing again without exiting the application, the second call to operator new to allocate ~1Gb of data fails.
I would expect Windows to deliver the memory back. Could this be managed in a better way with some other Win32 calls etc.?

I don't think this is a Windows problem. Check if you used delete or delete[] correctly. Perhaps it would help if you post the code that is allocating/freeing the memory.

In most runtime environments memory allocated to an application from the operating system remains in the application, and is seldom returned back to the operating system. Freeing a memory block allows you to reuse the block from within the application, but does not free it to the operating system to make it available to other applications.
Microsoft's C runtime library tries to return memory back to the operating system by having _heapmin_region call _heap_free_region or _free_partial_region which call VirtualFree to release data to the operating system. However, if whole pages in the corresponding region are not empty, then they will not be freed. A common cause of this is the bookkeeping information and storage caching of C++ containers.

This could be due to memory fragmentation (in reality, address space fragmentation), where various factors have contributed to your program address space not having a 1gb contiguous hole available. In reality, I suspect a bug in your memory management (sorry) - have you run your code through leak detection?

Since you are using very large memory blocks, you should consider using VirtualAlloc() and VirtualFree(), as they allow you to allocate and free pages directly, without the overhead (in memory and time) of interacting with a heap manager.
Since you are using C++, it is worth noting that you can construct C++ objects in the memory you allocate this way by using placement new.
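A minimal sketch of that combination, assuming a trivial Sample type invented for illustration: raw pages come from VirtualAlloc, objects are constructed in them with placement new, and destructors are called explicitly before the pages are released. The helper names are hypothetical, and the non-Windows branch uses malloc/free purely as a stand-in so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>
#endif

// Obtain raw, uninitialised memory directly from the OS (hypothetical helper).
void* raw_pages_alloc(std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    return std::malloc(bytes);  // stand-in only, not a page-level allocation
#endif
}

void raw_pages_free(void* p) {
#ifdef _WIN32
    VirtualFree(p, 0, MEM_RELEASE);
#else
    std::free(p);
#endif
}

// Example payload type, made up for this sketch.
struct Sample {
    int id;
    double value;
    Sample(int i, double v) : id(i), value(v) {}
};

// Construct `count` Sample objects in place inside `raw`.
Sample* construct_samples(void* raw, std::size_t count) {
    assert(raw != nullptr);
    Sample* first = static_cast<Sample*>(raw);
    for (std::size_t i = 0; i < count; ++i) {
        new (first + i) Sample(static_cast<int>(i), 0.5 * static_cast<double>(i));
    }
    return first;
}

// Placement new has no matching delete: run destructors by hand, in reverse.
void destroy_samples(Sample* first, std::size_t count) {
    for (std::size_t i = count; i > 0; --i) {
        first[i - 1].~Sample();
    }
}
```

The order matters: destroy the objects first, then free the pages; calling delete on memory obtained this way would be an error.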

This problem is almost certainly memory fragmentation. On 32-bit Windows the largest contiguous region you can allocate is about 1.1GB (because the various DLLs loaded into your process prevent a larger contiguous range from being available). If, after you deallocate, another memory allocation (or a DLL load, or a memory-mapped file) ends up in the middle of your previous 1GB region, then there will no longer be a 1GB region available for your next call to new to allocate 1GB. Thus it will fail.
You can visualize this process using VM Validator.
