Also check realloc() if shrinking allocated size of memory?

When you call realloc() you should check whether the call failed before assigning the returned pointer to the pointer you passed in; otherwise a failure would overwrite (and leak) your only pointer to the original block...
I've always followed this rule.
Now is it necessary to follow this rule when you know for sure the memory will be truncated and not increased?
I've never ever seen it fail. Just wondered if I could save a couple instructions.

realloc may, at its discretion, copy the block to a new address regardless of whether the new size is larger or smaller. This may be necessary if the malloc implementation requires a new allocation to "shrink" a memory block (e.g. if the new size requires placing the memory block in a different allocation pool). This is noted in the glibc documentation:
In several allocation implementations, making a block smaller sometimes necessitates copying it, so it can fail if no other space is available.
Therefore, you must always check the result of realloc, even when shrinking. It is possible that realloc has failed to shrink the block because it cannot simultaneously allocate a new, smaller block.
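For illustration, a shrink that follows this rule might look like the sketch below; shrink_buffer is a hypothetical helper, and the temporary-pointer pattern is the point:

#include <stdlib.h>

/* Hypothetical helper: shrink a buffer but keep the old block if
   realloc fails. Assumes new_size is nonzero. */
char *shrink_buffer(char *buf, size_t new_size) {
    char *tmp = realloc(buf, new_size);
    if (tmp == NULL)
        return buf;   /* realloc failed; buf is still valid at its old size */
    return tmp;
}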

Even if you realloc (read carefully realloc(3) and about POSIX realloc please) to a smaller size, the underlying implementation is doing the equivalent of malloc (of the new smaller size), followed by a memcpy (from the old to the new zone), then free (of the old zone). Or it may do nothing... (e.g. because some crude malloc implementations maintain a limited set of sizes - like powers of two or 3 times powers of two - and the old and new size requirements fit in the same size class....)
That malloc can fail. So realloc can still fail.
Actually, I usually don't recommend using realloc for that reason: just do the malloc, memcpy, free yourself.
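A rough sketch of that do-it-yourself approach, under the assumption that only new_size bytes need to survive (shrink_copy is a name chosen here for illustration):

#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a shrinking realloc: allocate the smaller
   block yourself, copy the surviving bytes, then free the old block. */
void *shrink_copy(void *old, size_t new_size) {
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;                  /* old block is untouched and usable */
    memcpy(fresh, old, new_size);     /* copy only what still fits */
    free(old);
    return fresh;
}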
Indeed, dynamic heap memory functions like malloc rarely fail. But when they do, chaos may happen if you don't handle that. On Linux and some other POSIX systems you could use setrlimit(2) with RLIMIT_AS - e.g. via the bash ulimit builtin - to lower the limits for testing purposes.
You might want to study the source code of C memory management implementations. For example MUSL libc (for Linux) is very readable code. On Linux, malloc is often built above mmap(2) (the C library may allocate a large chunk of memory using mmap, then manage smaller used and freed memory zones inside it).

delete[] does not free memory of array returned by function [duplicate]

I am observing the following behavior in my test program:
I am doing malloc() for 1 MB and then free() it after sleep(10). I am doing this five times. I am observing memory consumption in top while the program is running.
Once free()-d, I am expecting the program's virtual memory (VIRT) consumption to be down by 1 MB. But actually it isn't. It stays stable. What is the explanation for this behavior? Does malloc() do some reserve while allocating memory?
Once free()-d, I am expecting program's virtual memory (VIRT) consumption to be down by 1MB.
Well, this is not guaranteed by the C standard. It only says, once you free() the memory, you should not be accessing that any more.
Whether the memory block is actually returned to the available memory pool or kept aside for future allocations is decided by the memory manager.
The C standard doesn't require the implementer of malloc and free to return the memory to the OS directly. So different C library implementations will behave differently. Some of them might give it back directly and some might not. In fact, the same implementation will also behave differently depending on the allocation sizes and patterns.
This behavior, of course, is for good reasons:
It is not always possible. OS-level memory allocations are usually done in pages (4 KB, 4 MB, or ... sizes at once). And if a small part of a page is still being used after another part has been freed, then the page cannot be given back to the operating system until that part is also freed.
Efficiency. It is very likely that an application will ask for memory again. So why give it back to the OS and ask for it again soon after? (Of course, there is probably a limit on the size of the memory kept.)
In most cases, you are not accountable for the memory you free if the implementation decided to keep it (assuming it is a good implementation). Sooner or later it will be reallocated or returned to the OS. Hence, optimizing for memory usage should be based on the amount you have malloc-ed and you haven't free-d. The case where you have to worry about this, is when your allocation patterns/sizes start causing memory fragmentation which is a very big topic on its own.
If you are, however, on an embedded system and the amount of memory available is limited and you need more control over when/how memory is allocated and freed then you need to ask for memory pages from the OS directly and manage it manually.
Edit: I did not explain why you are not accountable for memory you free.
The reason is that, on a modern OS, allocated memory is virtual. Meaning that if you allocate 512 MB on a 32-bit system or 10 TB on a 64-bit system, as long as you don't read or write to that memory, it will not reserve any physical space for it. Actually, it will only reserve physical memory for the pages you touch from that big block and not the entire block. And after "a while of not using that memory", its contents will be copied to disk and the underlying physical memory will be used for something else.
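A rough way to observe this yourself (behavior varies by OS and allocator, so treat this as a sketch): run the program below and watch VIRT versus RES in top during each sleep.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    size_t len = 512 * 1024 * 1024;       /* 512 MB */
    char *p = malloc(len);
    if (p == NULL)
        return 1;
    puts("allocated, not yet touched");   /* VIRT jumps; RES stays small */
    sleep(10);
    memset(p, 1, len);                    /* touching pages commits them */
    puts("touched every page");           /* RES now grows by ~512 MB */
    sleep(10);
    free(p);
    return 0;
}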
This is very dependent on the actual malloc implementation in use.
Under Linux, there is a threshold (MMAP_THRESHOLD) to decide where the memory for a given malloc() request comes from.
If the requested amount is below or equal to MMAP_THRESHOLD, the request is satisfied either from the so-called "free list", if any memory blocks have already been free()d, or by raising the "break line" of the program (i.e. the end of the data segment) and using the memory made available that way.
On free(), the freed memory block is added to the free list. If there is enough free memory at the very end of the data segment, the break line (mentioned above) is moved back to shrink the data segment, returning the excess memory to the OS.
If the requested amount exceeds MMAP_THRESHOLD, a separate memory block is requested from the OS, and it is returned to the OS again during free().
See also https://linux.die.net/man/3/malloc for details.
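To see this split for yourself, one option (assuming glibc defaults) is to run a tiny program under strace and watch which system call services each request:

#include <stdlib.h>

/* Run as: strace -e brk,mmap,munmap ./a.out
   Thresholds are glibc defaults and may differ on your system. */
int main(void) {
    void *small = malloc(4096);          /* typically brk / free list */
    void *large = malloc(1024 * 1024);   /* above MMAP_THRESHOLD: mmap */
    free(large);                         /* munmap: returned to the OS */
    free(small);                         /* usually just joins the free list */
    return 0;
}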

Why don't memory allocators actively return freed memory to the OS?

Yes, this might be the third time you see this code, because I asked two other questions about it (this and this).
The code is fairly simple:
#include <vector>

int main() {
    std::vector<int> v;
}
Then I build and run it with Valgrind on Linux:
g++ test.cc && valgrind ./a.out
==8511== Memcheck, a memory error detector
...
==8511== HEAP SUMMARY:
==8511== in use at exit: 72,704 bytes in 1 blocks
==8511== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==8511==
==8511== LEAK SUMMARY:
==8511== definitely lost: 0 bytes in 0 blocks
==8511== indirectly lost: 0 bytes in 0 blocks
==8511== possibly lost: 0 bytes in 0 blocks
==8511== still reachable: 72,704 bytes in 1 blocks
==8511== suppressed: 0 bytes in 0 blocks
...
==8511== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Here, Valgrind reports there is no memory leak, even though there is 1 alloc and 0 free.
The answer here points out that the allocator used by the C++ standard library doesn't necessarily return the memory to the OS - it might keep it in an internal cache.
Question is:
1) Why keep them in an internal cache? If it is for speed, how is it faster? Yes, the OS needs to maintain a data structure to keep track of memory allocations, but so does the maintainer of this cache.
2) How is this implemented? Because my program a.out has already terminated, there is no other process maintaining this memory cache - or is there one?
Edit: for question (2) - Some answers I've seen suggest "the C++ runtime"; what does that mean? If the "C++ runtime" is the C++ library, the library is just a bunch of machine code sitting on disk; it is not a running process - the machine code is either linked into my a.out (static library, .a) or loaded at runtime (shared object, .so) into the process of a.out.
Clarification
First, some clarification. You asked: ... my program a.out has already terminated, there is no other process maintaining this memory cache - or is there one?
Everything we are talking about is within the lifetime of a single process: the process always returns all allocated memory when it exits. There is no cache that outlives the process1. The memory is returned even without any help from the runtime allocator: the OS simply "takes it back" when the process is terminated. So there is no system-wide leak possible from terminated applications with normal allocations.
Now what Valgrind is reporting is memory that is in use at the moment the process terminated, but before the OS cleans everything up. It works at the runtime library level, and not at the OS level. So it's saying "Hey, when the program finished, there were 72,000 bytes that hadn't been returned to the runtime" but an unstated implication is that "these allocations will be cleaned up shortly by the OS".
The Underlying Questions
The code and Valgrind output shown don't really correlate well with the titular question, so let's break them apart. First we'll just try to answer the questions you asked about allocators: why they exist and why they generally don't immediately return freed memory to the OS, ignoring the example.
You asked:
1) Why keep them in an internal cache? If it is for speed, how is it faster? Yes, the OS needs to maintain a data structure to keep track of memory allocations, but so does the maintainer of this cache.
This is sort of two questions in one: one is why bother having a userland runtime allocator at all, and then the other one is (perhaps?) why don't these allocators immediately return memory to the OS when it is freed. They are related, but let's tackle them one at a time.
Why Runtime Allocators Exist
Why not just rely on the OS memory allocation routines?
Many operating systems, including most Linux and other Unix-like operating systems, simply don't have an OS system call to allocate and free arbitrary blocks of memory. Unix-alikes offer brk which only grows or shrinks one contiguous block of memory - you have no way to "free" arbitrary earlier allocations. They also offer mmap which allows you to independently allocate and free chunks of memory, but these allocate at a PAGE_SIZE granularity, which on Linux is 4096 bytes. So if you want to request 32 bytes, you'll have to waste 4096 - 32 == 4064 bytes if you don't have your own allocator. On these operating systems you practically need a separate memory allocation runtime which turns these coarse-grained tools into something capable of efficiently allocating small blocks, as sketched below.
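To make this concrete, here is a deliberately crude sketch of such a layer: a bump allocator that grabs one large region from the kernel via mmap and then carves out small blocks itself (no free support, Linux-flavored mmap flags assumed):

#include <stddef.h>
#include <sys/mman.h>

static char  *pool;
static size_t pool_used, pool_size;

void *bump_alloc(size_t n) {
    if (pool == NULL) {
        pool_size = 1 << 20;   /* one 1 MB request to the kernel */
        pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pool == MAP_FAILED) {
            pool = NULL;
            return NULL;
        }
    }
    n = (n + 15) & ~(size_t)15;     /* keep 16-byte alignment */
    if (pool_used + n > pool_size)
        return NULL;                /* a real allocator would mmap more */
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}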
Windows is a bit different. It has the HeapAlloc call, which is part of the "OS" and does offer malloc-like capabilities of allocating and freeing arbitrarily sized chunks of memory. With some compilers then, malloc is just implemented as a thin wrapper around HeapAlloc (the performance of this call has improved greatly in recent Windows versions, making this feasible). Still, while HeapAlloc is part of the OS it isn't implemented in the kernel - it is also mostly implemented in a user-mode library, managing a list of free and used blocks, with occasional kernel calls to get chunks of memory from the kernel. So it is mostly malloc in another disguise and any memory it is holding on to is also not available to any other processes.
Performance! Even if there were appropriate kernel-level calls to allocate arbitrary blocks of memory, the simple overhead of a round trip to the kernel is usually hundreds of nanoseconds or more. A well-tuned malloc allocation or free, on the other hand, is often only a dozen instructions and may complete in 10 ns or less. On top of that, system calls can't "trust their input" and so must carefully validate parameters passed from user-space. In the case of free this means that it must check that the user passed a pointer which is valid! Most runtime free implementations simply crash or silently corrupt memory when passed an invalid pointer, since there is no responsibility to protect a process from itself.
Closer link to the rest of the language runtime. The functions you use to allocate memory in C++, namely new, malloc and friends, are part of, and defined by, the language. It is then entirely natural to implement them as part of the runtime that implements the rest of the language, rather than the OS, which is for the most part language-agnostic. For example, the language may have specific alignment requirements for various objects, which can best be handled by language-aware allocators. Changes to the language or compiler might also imply necessary changes to the allocation routines, and it would be a tough call to hope for the kernel to be updated to accommodate your language features!
Why Not Return Memory to the OS
Your example doesn't show it, but if you wrote a different test you would probably find that after allocating and then freeing a bunch of memory, your process's resident set size and/or virtual size as reported by the OS might not decrease after the free. That is, it seems like the process holds on to the memory even though you have freed it. This is in fact true of many malloc implementations. First, note that this is not a leak per se - the unreturned memory is still available to the process that allocated it, even if not to other processes.
Why do they do that? Here are some reasons:
The kernel API makes it hard. For the old-school brk and sbrk system calls, it simply isn't feasible to return freed memory unless it happens to be at the end of the very last block allocated from brk or sbrk. That's because the abstraction offered by these calls is a single large contiguous region that you can only extend from one end. You can't hand back memory from the middle of it. Rather than trying to support the unusual case where all the freed memory happens to be at the end of the brk region, most allocators don't even bother.
The mmap call is more flexible (and this discussion generally applies also to Windows where VirtualAlloc is the mmap equivalent), allowing you to at least return memory at a page granularity - but even that is hard! You can't return a page until all allocations that are part of that page are freed. Depending on the size and allocation/free pattern of the application that may be common or uncommon. A case where it works well is for large allocations - greater than a page. Here you're guaranteed to be able to free most of the allocation if it was done via mmap and indeed some modern allocators satisfy large allocations directly from mmap and free them back to the OS with munmap. For glibc (and by extension the C++ allocation operators), you can even control this threshold:
M_MMAP_THRESHOLD
For allocations greater than or equal to the limit specified
(in bytes) by M_MMAP_THRESHOLD that can't be satisfied from
the free list, the memory-allocation functions employ mmap(2)
instead of increasing the program break using sbrk(2).
Allocating memory using mmap(2) has the significant advantage
that the allocated memory blocks can always be independently
released back to the system. (By contrast, the heap can be
trimmed only if memory is freed at the top end.) On the other
hand, there are some disadvantages to the use of mmap(2):
deallocated space is not placed on the free list for reuse by
later allocations; memory may be wasted because mmap(2)
allocations must be page-aligned; and the kernel must perform
the expensive task of zeroing out memory allocated via
mmap(2). Balancing these factors leads to a default setting
of 128*1024 for the M_MMAP_THRESHOLD parameter.
So by default allocations of 128K or more will be allocated by the runtime directly from the OS and freed back to the OS on free. So sometimes you will see the behavior you might have expected to always be the case.
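On glibc you can experiment with this knob via mallopt; a minimal sketch (mallopt and M_MMAP_THRESHOLD are glibc extensions, declared in <malloc.h>):

#include <malloc.h>
#include <stdlib.h>

int main(void) {
    /* Lower the threshold to 64 KB so allocations of 64 KB and up are
       served by mmap and handed straight back to the OS on free. */
    mallopt(M_MMAP_THRESHOLD, 64 * 1024);

    void *p = malloc(256 * 1024);   /* now satisfied via mmap */
    free(p);                        /* munmap: returned immediately */
    return 0;
}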
Performance! Every kernel call is expensive, as described in the earlier list. Memory that is freed by a process will often be needed again shortly to satisfy another allocation. Rather than trying to return it to the OS, a relatively heavyweight operation, why not just keep it around on a free list to satisfy future allocations? As pointed out in the man page entry, this also avoids the overhead of zeroing out all the memory returned by the kernel. It also gives the best chance of good cache behavior, since the process is continually re-using the same region of the address space. Finally, it avoids TLB flushes which would be imposed by munmap (and possibly by shrinking via brk).
The "problem" of not returning memory is the worst for long-lived processes that allocate a bunch of memory at some point, free it and then never allocate that much again. I.e., processes whose allocation high-water mark is larger than their long term typical allocation amount. Most processes just don't follow that pattern, however. Processes often free a lot of memory, but allocate at a rate such that their overall memory use is constant or perhaps increasing. Applications that do have the "big then small" live size pattern could perhaps force the issue with malloc_trim.
Virtual memory helps mitigate the issue. So far I've been throwing around terms like "allocated memory" without really defining what they mean. If a program allocates and then frees 2 GB of memory and then sits around doing nothing, is it wasting 2 GB of actual DRAM plugged into your motherboard somewhere? Probably not. It is using 2 GB of virtual address space in your process, sure, but virtual address space is per-process, so that doesn't directly take anything away from other processes. If the process actually wrote to the memory at some point, it would be allocated physical memory (yes, DRAM) - but after freeing it, you are - by definition - no longer using it. At this point the OS may reclaim those physical pages for use by someone else.
Now this still requires that you have swap to absorb the dirty not-used pages, but some allocators are smart: they can issue a madvise(..., MADV_DONTNEED) call which tells the OS "this range doesn't have anything useful, you don't have to preserve its contents in swap". It still leaves the virtual address space mapped in the process and usable later (zero-filled), and so it's more efficient than munmap and a subsequent mmap, and it avoids pointlessly writing freed memory regions out to swap.2
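In allocator code that trick boils down to something like the following sketch (addr and len must be page-aligned for it to behave as intended):

#include <sys/mman.h>

/* Keep the mapping but tell the kernel the contents are disposable.
   The next touch sees zero-filled pages instead of swapped-in data. */
void release_pages(void *addr, size_t len) {
    madvise(addr, len, MADV_DONTNEED);
}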
The Demonstrated Code
As pointed out in this answer your test with vector<int> isn't really testing anything because an empty, unused std::vector<int> v won't even create the vector object as long as you are using some minimal level of optimization. Even without optimization, no allocation is likely to occur because most vector implementations allocate on first insertion, and not in the constructor. Finally, even if you are using some unusual compiler or library that does an allocation, it will be for a handful of bytes, not the ~72,000 bytes Valgrind is reporting.
You should do something like this to actually see the impact of a vector allocation:
#include <vector>

volatile std::vector<int> *sink;

int main() {
    std::vector<int> v(12345678);
    sink = &v;
}
That results in actual allocation and de-allocation. It isn't going to change the Valgrind output, however, since the vector allocation is correctly freed before the program exits, so there is no issue as far as Valgrind is concerned.
At a high level, Valgrind basically categorizes things into "definite leaks" and "not freed at exit". The former occur when the program no longer has a reference to a pointer to memory that it allocated. It cannot free such memory and so has leaked it. Memory which hasn't been freed at exit may be a "leak" - i.e., objects that should have been freed, but it may also simply be memory that the developer knew would live the length of the program and so doesn't need to be explicitly freed (because of order-of-destruction issues for globals, especially when shared libraries are involved, it may be very hard to reliably free memory associated with global or static objects even if you wanted to).
1 In some cases some deliberately special allocations may outlive the process, such as shared memory and memory mapped files, but that doesn't relate to plain C++ allocations and you can ignore it for the purposes of this discussion.
2 Recent Linux kernels also have the Linux-specific MADV_FREE which seems to have similar semantics to MADV_DONTNEED.

mmap: enforce 64K alignment

I'm porting a project written (by me) for Windows to mobile platforms.
I need an equivalent of VirtualAlloc (+friends), and the natural one is mmap. However there are 2 significant differences.
Addresses returned by VirtualAlloc are guaranteed to be multiples of the so-called allocation granularity (dwAllocationGranularity). Not to be confused with the page size, this number is arbitrary, and on most Windows systems it is 64K. In contrast, an address returned by mmap is only guaranteed to be page-aligned.
The reserved/allocated region may be freed at once by a call to VirtualFree, and there's no need to pass the allocation size (that is, the size used in VirtualAlloc). In contrast, munmap must be given the exact size of the region to be unmapped, i.e. it frees the given number of memory pages without regard to how they were allocated.
This imposes problems for me. While I could live with (2), (1) is a real problem. I don't want to get into details, but assuming a much smaller allocation granularity, such as 4K, would lead to serious efficiency degradation. This is related to the fact that my code needs to put some information at every granularity boundary within the allocated regions, which imposes "gaps" within the contiguous memory region.
I need to solve this. I can think of pretty naive methods, such as allocating enlarged regions so that they can be 64K-aligned and still have adequate size, or alternatively reserving huge regions of virtual address space and then allocating properly-aligned memory regions within them (i.e. implementing a sort of aligned heap). But I wonder if there are alternatives - special APIs, maybe some flags, secret system calls or whatever.
(1) is actually quite easy to solve. As you note, munmap takes a size parameter, and this is because munmap is capable of partial deallocation. You can thus allocate a chunk of memory that is bigger than what you need, then deallocate the parts that aren't aligned.
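A sketch of that over-allocate-and-trim approach for a 64K granularity (Linux-style mmap flags assumed; size is assumed to be a multiple of the page size):

#include <stdint.h>
#include <sys/mman.h>

#define GRANULARITY (64 * 1024)

void *mmap_aligned_64k(size_t size) {
    size_t padded = size + GRANULARITY;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t aligned = ((uintptr_t)raw + GRANULARITY - 1)
                        & ~(uintptr_t)(GRANULARITY - 1);
    size_t head = aligned - (uintptr_t)raw;
    size_t tail = padded - head - size;
    if (head)
        munmap(raw, head);                     /* drop the unaligned front */
    if (tail)
        munmap((char *)aligned + size, tail);  /* drop the excess back */
    return (void *)aligned;
}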
If possible, use posix_memalign (which despite the name allocates rather than aligns memory); this allows you to specify an alignment, and the memory allocated can be released using free. You would need to check whether posix_memalign is provided by your target platform.
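If it is available, the posix_memalign route is considerably simpler; a minimal sketch:

#include <stdlib.h>

void *alloc_aligned_64k(size_t size) {
    void *p = NULL;
    /* posix_memalign returns an error code rather than setting errno. */
    if (posix_memalign(&p, 64 * 1024, size) != 0)
        return NULL;
    return p;   /* release with free(p) */
}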

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (a vector of shared_ptr to objects, see below) is not fully reclaimed when the parent container goes out of scope, and can't be reused except for small objects.
The minimal example: when the program starts, I can allocate a single continuous block of 1.5 GB without problem. After I use the memory somewhat (by creating and destructing a number of small objects), I can no longer do the big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>

using namespace std;

class BigClass
{
private:
    double a[10000];
};

void TestMemory() {
    cout << "Performing TestMemory" << endl;
    vector<shared_ptr<BigClass>> list;
    for (int i = 0; i < 10000; i++) {
        shared_ptr<BigClass> p(new BigClass());
        list.push_back(p);
    }
}

void TestBigBlock() {
    cout << "Performing TestBigBlock" << endl;
    char* bigBlock = new char[1024 * 1024 * 1536];
    delete[] bigBlock;
}

int main() {
    TestBigBlock();
    TestMemory();
    TestBigBlock();
}
Problem also repeats if using plain pointers with new/delete or malloc/free in cycle, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 bytes (regardless of the number of times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back - this also has the happy side-effect of being very fast compared to deleting a zillion blocks in a loop.
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure (regarding the memory leak point), I ran this program under Valgrind, and it did not give any memory leak information in the report.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my response to your query/doubt asked in the original question.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of number of times I call it).
Yes, the real culprit is the hidden fragmentation introduced during the TestMemory() function. To explain fragmentation, I have taken this snippet from Wikipedia:
"Memory fragmentation occurs when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application. For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The above paragraph explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) lead to memory fragmentation, but its end impact (i.e. the 1.5 GB allocation failing) will vary greatly between systems, as different OSes/heap managers have different strategies and implementations.
As an example, your program ran perfectly fine on my machine (Linux), yet you encountered a memory allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in the task manager is not directly proportional to our memory allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate some dynamic memory (using new/malloc) but do not write to or initialize those memory regions, they do not go into the committed state, and hence the VM size is not impacted. The VM size depends on many other factors and is a bit complicated, so we should not rely completely on it when reasoning about the dynamic memory allocation of our program.
As a consequence, there's no free VM region big enough to hold 1.5 GB.
Yes, due to fragmentation, there is no contiguous 1.5 GB block of memory. Note that the total remaining (free) memory may well be more than 1.5 GB, but it is in a fragmented state, so there is no single big contiguous block.
But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" CRT does to minimize OS calls?
I have explained why this may happen even though you have freed all your memory. In order to fulfil a user program's request, the OS calls into its virtual memory manager and tries to allocate the memory which will be used by the heap memory manager. But whether additional memory can be grabbed depends on many other complex factors which are not very easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse allocated memory rather than allocating and freeing frequently. Certain patterns (such as allocations of particular sizes in a particular order) can leave the overall memory in a fragmented state. A substantial design change in your program may be needed to reduce memory fragmentation. This is a complex topic, and understanding the complete root cause of such things requires internal knowledge of the memory manager.
However, tools do exist on Windows-based systems, though I am not very familiar with them. I found one excellent SO post regarding which tools (on Windows) can be used to understand and check the fragmentation status of your program yourself:
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime acquires a bulk block of memory from the OS once, and each new you call is then satisfied from that bulk block. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to that memory for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size just means moving where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one tiny object in that space will prevent its shrinking. On Linux you can look up the function brk (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general-purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS, since it's guaranteed that nothing else is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1) TestBigBlock allocates and deletes an array; assume this uses a special memory block, so the memory is returned to the OS.
2) TestMemory extends the heap for all the small objects, and never returns any heap to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is assigned to the application.
3) TestBigBlock now fails, since although it would use a special memory block, it shares the overall address space with the heap, and there just isn't enough left after step 2 is complete.

C and C++: Freeing PART of an allocated pointer

Let's say I have a pointer allocated to hold 4096 bytes. How would one deallocate the last 1024 bytes in C? What about in C++? What if, instead, I wanted to deallocate the first 1024 bytes, and keep the rest (in both languages)? What about deallocating from the middle (it seems to me that this would require splitting it into two pointers, before and after the deallocated region).
Don't try and second-guess memory management. It's usually cleverer than you ;-)
The only thing you can achieve is the first scenario, to 'deallocate' the last 1K:
char *foo = malloc(4096);
char *tmp = realloc(foo, 4096 - 1024);
if (tmp != NULL)
    foo = tmp;   /* keep the old block if realloc failed */
However, even in this case, there is NO GUARANTEE that "foo" will be unchanged. Your entire 4K may be freed, and realloc() may move your memory elsewhere, thus invalidating any pointers to it that you may hold.
This is valid for both C and C++ - however, use of malloc() in C++ is a bad code smell, and most folk would expect you to use new to allocate storage. And memory allocated with new cannot be realloc()ed - or at least, not in any kind of portable way. STL vectors would be a much better approach in C++.
If you have n bytes of mallocated memory, you can realloc m bytes (where m < n) and thus throw away the last n-m bytes.
To throw away from the beginning, you can malloc a new, smaller buffer and memcpy the bytes you want and then free the original.
The latter option is also available using C++ new and delete. It can also emulate the first realloc case.
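For example, "freeing the first 1024 bytes" has to be emulated with a copy; drop_head below is a name chosen for illustration:

#include <stdlib.h>
#include <string.h>

/* Copy the surviving tail into a fresh, smaller block, then free the
   original. There is no way to do this in place with malloc/free. */
char *drop_head(char *block, size_t old_size, size_t head) {
    size_t new_size = old_size - head;
    char *tail = malloc(new_size);
    if (tail == NULL)
        return NULL;   /* original block left intact */
    memcpy(tail, block + head, new_size);
    free(block);
    return tail;
}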
You don't have "a pointer allocated to hold 4096 bytes", you have a pointer to an allocated block of 4096 bytes.
If your block was allocated with malloc(), realloc() will allow you to reduce or increase the size of the block. The start address of the block won't necessarily stay the same, though.
You can't change the start address of a malloc'd memory block, which is really what your second scenario is asking. There's also no way to split a malloc'd block.
This is a limitation of the malloc/calloc/realloc/free API -- and implementations may rely on these limitations (for example, keeping bookkeeping information about the allocation immediately before the start address, which would make moving the start address difficult.)
Now, malloc isn't the only allocator out there -- your platform or libraries might provide other ones, or you could write your own (which gets memory from the system via malloc, mmap, VirtualAlloc or some other mechanism) and then hands it out to your program in whatever fashion you desire.
For C++, if you allocate memory with std::malloc, the information above applies. If you're using new and delete, you're allocating storage for and constructing objects, and so changing the size of an allocated block doesn't make sense -- objects in C++ are a fixed size.
You can make it shorter with realloc(). I don't think the rest is possible.
You can use realloc() to apparently make the memory shorter. Note that for some implementations such a call will actually do nothing. You can't free the first bit of the block and retain the last bit.
If you find yourself needing this kind of functionality, you should consider using a more complex data structure. An array is not the correct answer to every programming problem.
http://en.wikipedia.org/wiki/New_(C%2B%2B)
SUMMARY: In contrast to C's realloc, it is not possible to directly reallocate memory allocated with new[]. To extend or reduce the size of a block, one must allocate a new block of adequate size, copy over the old memory, and delete the old block. The C++ standard library provides a dynamic array that can be extended or reduced, in its std::vector template.