I'm working on Windows kernel mode driver development.
I believe the way we allocate dynamic memory on the heap in user space is not applicable in kernel mode.
I would like to know the kernel-mode equivalent of the following piece of user-space code:
MsgAttriPtr = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, ulMessageAttributeBufSize);
HeapFree(GetProcessHeap(), 0, MsgAttriPtr);
In the code above I'm allocating ulMessageAttributeBufSize bytes on the heap and initializing the block with zeros; then I'm using HeapFree() to de-allocate/free that memory.
Thanks.
Also, please let me know which Windows native API headers to include for the functions we are going to use for kernel-mode memory allocation.
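From the documentation I've read so far, I'm guessing the kernel-mode version looks something like the sketch below (ExAllocatePoolWithTag and ExFreePoolWithTag are declared in wdm.h, which ntddk.h pulls in; 'AgsM' is just a made-up pool tag and the helper names are mine; newer WDKs also offer ExAllocatePool2, which can zero the buffer for you), but please correct me if this is wrong:

#include <ntddk.h>   // or <wdm.h>; declares the Ex* pool routines and RtlZeroMemory

// Hypothetical helper mirroring the user-mode snippet; 'AgsM' is an arbitrary pool tag.
PVOID AllocateZeroedMessageAttributes(SIZE_T ulMessageAttributeBufSize)
{
    PVOID MsgAttriPtr = ExAllocatePoolWithTag(NonPagedPoolNx,   // or NonPagedPool on pre-Win8 targets
                                              ulMessageAttributeBufSize,
                                              'AgsM');
    if (MsgAttriPtr != NULL)
    {
        // Pool memory is not zeroed for you, unlike HEAP_ZERO_MEMORY.
        RtlZeroMemory(MsgAttriPtr, ulMessageAttributeBufSize);
    }
    return MsgAttriPtr;
}

VOID FreeMessageAttributes(PVOID MsgAttriPtr)
{
    if (MsgAttriPtr != NULL)
    {
        ExFreePoolWithTag(MsgAttriPtr, 'AgsM');   // the HeapFree counterpart
    }
}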
Related
I have a C++ application (gcc 4.9.1, glibc 2.17) running on Linux (Centos 7). It uses various third-party libraries, notably Boost 1.61. As the application runs, I can watch its memory usage increasing steadily via htop's VIRT and RES columns, or the ps command, etc. If I let it go long enough, it will use enormous amounts of that memory and swamp the box.
Sounds like a leak, but it passes valgrind with only a few bytes leaked, all in places I'd expect. Debug print messages indicate program flow is as expected.
Digging further via the debugger, I find that most of this memory is still in use when __run_exit_handlers is invoked at the end of main. I can step through various calls to free as it works through the global destructor chain. After it finishes those, I observe only a minimal downward change in the apparent memory usage. Then, finally it calls _exit(), and only then is the memory restored to the operating system, all at once.
Can anyone offer me additional tips on how to proceed debugging this? Why won't my program give that memory back?
Everything here is based on the GNU libc (glibc) implementation of malloc running on Linux.
The test program below does not seem to give any memory back to the system after freeing it (strace does not show brk/sbrk calls that return memory to the kernel):
#include <cstdlib>

int main()
{
    static const int N = 5000000;
    static void *arr[N];
    // allocate roughly 5 GB in 1 KB chunks
    for (int i = 0; i < N; i++)
        arr[i] = std::malloc(1024);
    // free in reverse order to simplify the allocator's job
    for (int i = N - 1; i >= 0; i--)
        std::free(arr[i]);
}
It looks like glibc does not give memory back at all. According to the mallopt(3) man page, the parameter M_TRIM_THRESHOLD is responsible for giving memory back. By default it is 128 KB, while the test program allocates and frees about 5 GB of memory, so some other detail of the malloc implementation seems to prevent it from releasing the memory.
At the moment I can recommend the following solutions:
If you can, try calling malloc_trim once in a while, or after freeing a lot of memory. This should force trimming and give memory back to the OS using MADV_DONTNEED (see the sketch after this list).
Avoid allocating a large number of small objects with malloc or operator new; instead, allocate them from a memory pool whose size is greater than M_MMAP_THRESHOLD, and try destroying that pool afterwards if the program logic allows it. Memory chunks larger than M_MMAP_THRESHOLD are immediately released back to the OS.
Same as the previous one, but should be faster: allocate memory pools for small objects using mmap and release the memory back to the OS with madvise and MADV_DONTNEED/MADV_FREE.
Try using another allocator that might take advantage of MADV_FREE to return memory back to the system (jemalloc?).
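A minimal sketch of the malloc_trim and mmap/madvise suggestions above, assuming glibc on Linux; the function name release_memory_example and the 64 MB pool size are purely illustrative:

#include <cstddef>
#include <malloc.h>     // malloc_trim (glibc extension)
#include <sys/mman.h>   // mmap, munmap, madvise

void release_memory_example()
{
    // After freeing a large amount of malloc'ed memory, ask glibc to hand
    // the unused top of the heap back to the kernel (pad = 0: trim everything it can).
    malloc_trim(0);

    // Alternatively, manage small objects inside an mmap'ed pool whose pages
    // can be returned explicitly, independently of the malloc heap.
    const std::size_t poolSize = 64u * 1024 * 1024;   // 64 MB, arbitrary
    void *pool = mmap(nullptr, poolSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool != MAP_FAILED) {
        // ... carve small objects out of 'pool' here ...
        madvise(pool, poolSize, MADV_DONTNEED);  // physical pages go back to the OS
        munmap(pool, poolSize);                  // or drop the mapping entirely
    }
}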
I have found this old (2006) ticket in glibc's bugzilla. It says that free never returns memory to the kernel unless malloc_trim is called.
Newer versions of free do have code that runs an internal systrim function, which should trim the top of the arena, but I wasn't able to make it work.
You can profile your memory allocation using valgrind --tool=massif ./executable
Check out the documentation at http://valgrind.org/docs/manual/ms-manual.html
Then, once you have the profiling data, you can apply memory pools and other techniques. Since you already use Boost, you will find several such tools there (e.g. Boost.Pool).
I am writing a simple C++ program on Mac OS. I have just
int main()
{
int *n = new int[50000000];
}
I launch this program in lldb and put a breakpoint at the line where n is allocated. Then I launch top in another tab and see that the memory usage is 336K pre-allocation. When I step over that line in lldb (the n command), so that the allocation for n happens, I expect my memory usage to go up. However, top shows me the same amount of memory used by my program. What could be the reason for this? I am trying to understand how memory allocation happens in C++, which is why I am doing this.
I have not exited the scope of main: when I check top again, I am sitting at the closing curly brace of main.
The top command shows the process stats as viewed by the operating system. It shows how much memory was allocated to the process, but not how much of this memory is effectively in use. It's not accurate for monitoring memory allocation.
Memory allocation on the heap/free store is implementation dependent in C++, but it's usually not mapped one-to-one to OS allocation calls. For performance reasons (calls into the OS are slower than calls within your userland code), memory is obtained from the OS in larger chunks:
When the C++ runtime starts, it usually allocates some memory from the OS in order to set up the standard library objects it needs and to initialize the free store, so it can quickly satisfy allocation requests.
Only when this initial memory is exhausted will the standard library allocate more memory from the operating system.
And that allocation is again done in larger chunks, so that not every new results in an OS call.
From your observations, I guess that this initial allocation is larger than 50 MB. Try with a much larger value to see the difference.
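For example, a rough sketch (the 400 MB size is arbitrary, just large enough to exceed any initial runtime allocation); the resident-memory column in top should only move noticeably once the pages are actually written:

#include <cstddef>
#include <cstring>
#include <unistd.h>

int main()
{
    const std::size_t size = 400u * 1024 * 1024;   // 400 MB, arbitrary
    char *p = new char[size];      // virtual size grows, resident size barely moves yet
    sleep(10);                     // check top now
    std::memset(p, 0, size);       // pages get touched and committed: resident size jumps
    sleep(10);                     // check top again
    delete[] p;
}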
If you want to track memory consumption more precisely, you need profiling tools, for example valgrind (massif) or the heap command on macOS.
This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); //shows an increase in mem usage of 1GB
But we have to write to all of the memory; simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal? Has the memory actually been fully allocated? I'm not sure if it's the tools we are using (top and free -m) that are displaying incorrect values, or whether there is something clever going on in the compiler, the runtime and/or the kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that new[] allocated the memory immediately. Does the C++ runtime delay the actual allocation until later, when the memory is accessed? In that case, can an out-of-memory exception be deferred until well after the actual allocation, when the memory is accessed?
As it is, this is not a problem for us, but it would be nice to know why it is occurring the way it is!
Cheers!
Edit:
I don't want to be told how we should be using vectors, or that this isn't OO / C++ / the current way of doing things, etc. I just want to know why this is happening the way it is, rather than get suggestions for alternative ways of trying it.
When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
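If you want to watch this from inside the program, a small sketch (assuming Linux; the 1 GB size just mirrors the question):

#include <cstdio>
#include <unistd.h>

int main()
{
    std::printf("pid %d: check /proc/%d/maps and top before allocating\n",
                (int)getpid(), (int)getpid());
    sleep(15);

    char *p = new char[1000000000];   // 1 GB of address space is reserved here
    std::printf("allocated: the 1 GB range now shows up in maps, but resident memory in top does not grow\n");
    sleep(15);

    p[0] = 'a';                       // touch just one page; resident memory grows by about one page only
    delete[] p;
}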
Please look up overcommit. By default, Linux doesn't reserve memory until it is accessed, and if you end up needing more memory than is available, you don't get an error; instead, a random process gets killed. You can control this behavior with /proc/sys/vm/*.
IMO, overcommit should be a per-process setting, not a global one, and the default should be no overcommit.
About the second half of your question:
The language standard doesn't allow any delay in throwing a bad_alloc. That must happen as the alternative to new[] returning a pointer; it cannot happen later!
Some OSes might try to overcommit memory allocations and fail later. That is not conforming to the C++ language standard.
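To make the standard's position concrete, a minimal sketch: a failure that the language reports must surface as bad_alloc at the new expression itself; with overcommit, running out of memory later shows up as an OOM kill, outside the language's control.

#include <iostream>
#include <new>

int main()
{
    try {
        char *p = new char[1000000000];
        // Reaching this point means the allocation "succeeded" as far as the
        // language is concerned, even if the OS has not committed the pages yet.
        p[0] = 'a';
        delete[] p;
    } catch (const std::bad_alloc &) {
        // Per the standard, an allocation failure must be reported here, not later.
        std::cerr << "allocation failed\n";
    }
}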
An application's virtual bytes grow to twice its private bytes.
Does this indicate a memory leak? Bad application design?
The OS is 32-bit.
Any thoughts are welcome.
The application is a stream database.
Fragmentation.
If you allocate the following chunks of memory:
16KB
8KB
16KB
and you then free the 8 KB chunk, your application will have 32 KB of private bytes but 40 KB of virtual memory, which corresponds to the highest virtual memory address that has ever been in use by your process (ignoring the other memory parts for the sake of simplicity).
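The same pattern in code, just to illustrate (whether the freed 8 KB is actually reusable, and how it shows up in the counters, depends on the allocator):

#include <cstdlib>

int main()
{
    void *a = std::malloc(16 * 1024);   // 16 KB
    void *b = std::malloc(8 * 1024);    //  8 KB
    void *c = std::malloc(16 * 1024);   // 16 KB

    std::free(b);   // leaves an 8 KB hole between a and c: private bytes can drop,
                    // but the address range (virtual bytes) stays at its high-water mark
    std::free(a);
    std::free(c);
}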
Consider (if possible) using another memory manager. Some alternatives are:
The Windows low-fragmentation heap (see http://msdn.microsoft.com/en-us/library/aa366750%28VS.85%29.aspx for more info)
Doug Lea's open-source memory manager (dlmalloc)
Commercial alternatives like Hoard
A fourth alternative is to write your own memory manager. It's not that easy, but if done right, it can have quite some benefits. Especially for certain niche or special applications, writing your own memory manager can be useful.
An application's virtual bytes grow to twice its private bytes.
If the application allocates only heap memory, then to me this would be a sign that the application allocates a lot of memory but never actually touches it. For example:
void *p = malloc( 16u<<20 );
would eat up 16 MB of virtual memory, but as long as the application doesn't do anything with that memory block, the OS won't even attempt to map the virtual memory to RAM. The simplest way to force the actual allocation of private memory is to memset() it:
void *p = malloc( 16u<<20 );
memset( p, 0, 16u<<20 );
Does this indicate a memory leak? Bad application design?
Or both. Or neither.
The longer variant of the response: unknown; it depends on what memory the application allocates, what other resources the application uses, the OS, the hardware platform, etc.
If unsure, use memory leak analysis tools to investigate, e.g. valgrind. Read up on SO for more information on memory leak analysis in C++.
Memory allocation has overhead to store management information about what was allocated. If you're allocating very small buffers the extra information can be a significant percentage of the total. That might be what you're seeing.
One possibility is that you set a large stack reserve size for your threads with the linker option /STACK:reserve_bytes and then start a lot of threads.
For example, if you have an ATL service, it automatically starts 4*numberOfCores apartment message dispatching threads by default. Compile and link such a service with /STACK:12000000 (12 megabytes), then run it on a 16-core server and it will start 64 threads, each with a 12MB stack, immediately consuming 768MB of virtual address space, although the actual committed memory may be much lower.
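A minimal sketch of that effect (the numbers mirror the example above; STACK_SIZE_PARAM_IS_A_RESERVATION makes the stack-size argument the reserved size rather than the initial commit, and thread handles are leaked for brevity):

#include <windows.h>

DWORD WINAPI Worker(LPVOID)          // trivial thread body, just sleeps
{
    Sleep(60 * 1000);
    return 0;
}

int main()
{
    // 64 threads, each reserving 12 MB of address space up front
    // while committing only a few pages of it.
    for (int i = 0; i < 64; i++)
        CreateThread(NULL, 12 * 1024 * 1024, Worker, NULL,
                     STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
    Sleep(60 * 1000);                // watch virtual bytes vs. private bytes here
    return 0;
}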
Simple question: I'm writing a program that needs to open huge image files (8k x 8k), but I'm a little confused about how to initialize the huge arrays that hold the images in C++.
I've been trying something like this:
long long SIZE = 8092*8092; ///8096*8096
double* array;
array = (double*) malloc(sizeof(double) * SIZE);
if (array == NULL)
{
fprintf(stderr,"Could not allocate that much memory");
}
But sometimes my NULL check does not catch that the array was not allocated. Any idea why?
Also, I can't initialize more than 2 or 3 such arrays, even when running on an x64 machine with 12 GB of RAM. Any idea why?
I would really rather not have to work with sections of the array instead. Any help is welcome.
Thanks.
You're not running into an array size problem. 8K*8K is merely 64M, and even 64M doubles (sizeof == 8) are not an issue; that would require a mere 512 MB. Now, a 32-bit application (no matter where it's running) should be able to allocate a few of them. Not 8, because the OS typically needs to reserve some of the address space for itself (often slightly over 2 GB), and sometimes not even 3 when memory is fragmented.
The behavior of "malloc failed but didn't return NULL" comes from Linux's default memory overcommit configuration; it can be disabled with # echo 2 > /proc/sys/vm/overcommit_memory
malloc() does not initialize memory, it just reserves it. You will have to initialize it explicitly, e.g. via memset() from string.h:
array = (double*) malloc(SIZE * sizeof(double));
if (array) memset(array, 0, SIZE * sizeof(double));
However, in C++ you should use new instead of malloc. Note that plain new throws std::bad_alloc on failure instead of returning NULL, so if you want to test the pointer, use the nothrow form (declared in <new>):
double* array = new (std::nothrow) double[SIZE];
if (!array) {
    cerr << "Could not allocate that much memory" << endl;
} else {
    for (long long i = 0; i < SIZE; i++) array[i] = 0.0;
}
Regarding size: each such array is 512 MB. Are you positively sure you need double precision (which means the image has 64-bit pixel depth)? Maybe a float would suffice? That would halve the memory footprint.
You might be running into a 2GB per-process address space limit if you are running a 32bit operating system. With a few hundred MBs of system libs and other stuff, and 2 or 3 arrays of 512MB each, that will give 2GB easily. A 64bit OS would help you there.
Are you compiling your application as a 32-bit application (the default in Visual Studio, if that's what you're using), or as a 64-bit application? You shouldn't have troubles if you build it as a 64-bit app.
malloc only allocates (reserves memory and returns a pointer); calloc allocates and also initializes (writes zeros to that memory).
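A minimal sketch of the difference, reusing the question's SIZE:

#include <cstdio>
#include <cstdlib>

int main()
{
    const long long SIZE = 8092LL * 8092LL;

    // malloc only reserves: the contents are indeterminate until you write them.
    double *a = (double*) std::malloc(SIZE * sizeof(double));

    // calloc reserves the same amount and also zero-fills it for you.
    double *b = (double*) std::calloc(SIZE, sizeof(double));

    if (!a || !b)
        std::fprintf(stderr, "Could not allocate that much memory\n");

    std::free(a);
    std::free(b);
}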
It seems that you have no contiguous memory block of that size (~500 MB) in the C runtime heap. Instead of copying the file into memory, try mapping the image into the process's address space. You could map only the necessary parts of the file.
Just as a side note: although you don't want to bother with the whole image not being in memory at once, there are good reasons not to keep it all there. Maybe think about an abstraction that keeps only the currently needed chunk in memory; the program code can then be written as though it were ignorant of the memory issues.
I would really rather not have to work with sections of the array instead. Any help is welcome.
Have you looked into memory-mapped files?
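For instance, a minimal POSIX sketch of that idea ("image.raw" is a placeholder file name; on Windows the equivalent would be CreateFileMapping/MapViewOfFile). Only the pages you actually touch are read in, and no contiguous heap block is needed:

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("image.raw", O_RDONLY);          // placeholder file name
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); close(fd); return 1; }

    // Map the whole file; the kernel pages the data in on demand.
    void *img = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (img == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

    const unsigned char *pixels = static_cast<const unsigned char *>(img);
    std::printf("first byte: %u\n", pixels[0]);    // touching a page faults it in

    munmap(img, st.st_size);
    close(fd);
    return 0;
}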
Yep, sounds a lot like heap fragmentation, as Kirill pointed out. See also: How to avoid heap fragmentation?
I suggest using compression: decompress only the part you need to process in your code, and compress it again once you are done with that part.
Second proposal: write code that overloads the pointer operators "operator+" and "operator-" so that you can use non-contiguous memory buffers. Using several smaller buffers makes your code more stable than one large contiguous buffer. I have done this and written some of that operator overloading; see http://code.google.com/p/effoaddon/source/browse/trunk/devel/effo/codebase/addons/mem/include/mcur_i.h for an example. When I tested malloc()ing 47 GB of system memory on an x86_64 box, I allocated just 1 GB per malloc() call, so 47 memory blocks in total. EDIT: when I instead tried to allocate as much as possible with a single malloc(), I only got about 30 GB on a 48 GB system, i.e. less than 70%. That's because the larger the buffer requested per malloc(), the more management memory the system/libc consumes; note that I called mlock() to prevent the allocated memory from being swapped out to disk.
Third proposal: try POSIX file mapping, mapping each image into memory.
Btw: calling malloc() is more robust than new even in C++, because under memory pressure new is prone to throwing exceptions instead of returning NULL.