What can "new_handler" be used for in C++ besides garbage collection? - c++

C++ programs can define and set a new_handler() that allocation functions like operator new() call when they cannot allocate the requested memory.
One use of a custom new_handler() is dealing with C++ implementations that don't throw an exception on allocation failure. Another is initiating garbage collection on systems that implement it.
What other uses of a custom new_handler() are there?

In a similar vein to the garbage collection application, you can use the new handler to free up any cached data you may be keeping.
Say that you're caching some resource data read from disk or the intermediate results of some calculation. This is data that you can recreate at any time, so when the new handler gets called (signifying that you've run out of heap), you can free the memory for the cached data, then return from the new handler. Hopefully, new will now be able to make the allocation.
In many cases, virtual memory can serve the same purpose: You can simply let your cached data get swapped out to disk -- if you've got enough virtual address space. On 32-bit systems, this isn't really the case any more, so the new handler is an interesting option. Many embedded systems will face similar constraints.
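A minimal sketch of that cache-releasing pattern, assuming the recreatable data lives in a container the handler is allowed to throw away:

#include <new>
#include <vector>

std::vector<char> cache;                    // stand-in for recreatable cached data

bool release_some_cached_data()             // frees the cache, reports whether anything was freed
{
    if (cache.empty()) return false;
    cache.clear();
    cache.shrink_to_fit();
    return true;
}

void cache_releasing_handler()
{
    if (!release_some_cached_data())        // nothing left to release
        std::set_new_handler(nullptr);      // give up: let operator new throw std::bad_alloc
    // simply returning makes operator new retry the allocation
}

int main()
{
    std::set_new_handler(cache_releasing_handler);
    // ... normal program logic, filling and using `cache` ...
}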

On most of the servers I've worked on, the new_handler freed a pre-allocated block (so future calls to new wouldn't fail) before logging a message (the logger used dynamic memory) and aborting. This ensured that the out-of-memory error was correctly logged, instead of the process just "disappearing" with an error message to cerr, which was connected to /dev/null.
In applications such as editors, it may be possible to spill parts of some buffers to disk and then continue: if the new_handler returns, operator new is supposed to retry the allocation, and if the new_handler has freed up enough memory, the allocation may succeed (and if it doesn't, the new_handler will be called again, to maybe free up even more).
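A rough sketch of that server pattern; the reserve size and the use of syslog() are illustrative, not prescriptive:

#include <new>
#include <cstdlib>
#include <syslog.h>

static char* reserve = nullptr;              // emergency block set aside at startup

void out_of_memory_handler()
{
    delete[] reserve;                        // give the logger some room to work with
    reserve = nullptr;
    syslog(LOG_CRIT, "out of memory, aborting");
    std::abort();
}

int main()
{
    reserve = new char[256 * 1024];          // size chosen arbitrarily for the sketch
    std::set_new_handler(out_of_memory_handler);
    // ... server logic ...
}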

I've never used it for anything. Too many OSes will grant virtual memory and then SIGSEGV (or similar) if they can't actually provide it later, so it's not a good idea to build a system that relies on being memory-exhaustion tolerant: it's often outside C++'s hands.
Still, if you developed for a system where it could (or must) be relied upon, I can easily imagine a situation where some real-time data was being streamed into a queue in your process, and you were processing it and writing/sending out results as fast as you could (e.g. video hardware streaming video for recompression to disk/network). If you got to the stage where you couldn't store any more, you'd simply have to drop some, but how would you know when it got that bad? Setting an arbitrary limit would be kind of silly, especially if your software was for an embedded environment/box that only existed to do this task. And you probably shouldn't use a function like this casually on a system with any kind of hard-disk-based swap, as once you're into swap your throughput rates would be miserable ever after.
But, past the caveats, it might be useful to drop packets for a while until you catch up. Perhaps dropping every Nth frame throughout the queued buffer would be less visible than dropping a chunk at the back or front of the queue. Whatever the policy, discarding data from the queue could be a sane application-level use (as distinct from an intra-memory-subsystem use) for something like this.

Related

Is it practical to delete all heap-allocated memory after you have finished using it?

Are there any specific situations in which it would not be practical nor necessary to delete the heap-allocated memory when you are done using it? Or does not deleting it always affect programs to a large extent?
In a few cases, I've had code that allocated lots of stuff on the heap. A typical run of the program took at least a few hours, and with larger data sets, that could go up to a couple of days or so. When it finished and you exited the program, all the destructors ran, and freed all the memory.
That led to a bit of a problem though. Especially after a long run (which allocated many blocks on the heap) it could take around five minutes for all the destructors to run.
So, I rewrote some destructors to do nothing--not even free the memory the object had allocated.
The program had a pretty simple memory usage pattern, so everything it allocated remained in use until you shut it down. Disabling the destructors so they no longer released the memory that had been allocated reduced the time to shut down the program from ~5 minutes to what appeared instant (but was still actually pretty close to 100 ms).
That said, this is really only rarely an option. The vast majority of the time, a program should clean up after itself. With well written code it's usually pretty trivial anyway.
Are there any specific situations in which it would not be practical nor necessary to delete the heap-allocated memory when you are done using it?
Yes.
In certain types of telecomm embedded systems I have seen:
1) An operator-commanded software revision update can also perform (or remind the user to perform) a software reset as the last step of the upgrade. This is not a power bounce, and (typically) the associated hardware continues to run.
Note: there are two (or more) kinds of revision updates: 1) processor code; and 2) firmware (for the FPGAs, typically stored in EPROM).
In this case, there need not be a delete of long-term heap-allocated memory. The embedded software I am familiar with has many new'd data structures that last the life of the code. A software reset is the user-commanded end of life, and the memory is zeroed at system startup (not shutdown). No dtors are used at this point, either.
There is often a customer requirement about the upper limit on how long a system reboot takes. The time starts when the customer wants ... perhaps at the start of the download of a new revision ... so a fast reset can help achieve that additional requirement.
2) I have worked on (embedded telecom) systems with a 'Watchdog' feature to detect certain inconsistencies (including thread 'hangs'). This failure mechanism generates a log entry in some persistent store (such as battery-backed static RAM, EPROM, or a file system).
The log entry is evidence of some 'self-detected' inconsistency.
Any attempt to delete heap memory would be suspect, as the inconsistency might have already corrupted the system. This reset is not user-commanded, but may have site policy based controls. A fast reset is also desired here to restore functionality when the reset occurs with no user at the console.
Note:
IMHO, The most useful "development features" for embedded system (none of which trigger heap clean up efforts) are :
a) a soft-reset switch (fairly common availability) - reboots the processor with no impact to the hw that the software controls/monitors. Is used often.
b) a hard-reset switch (availability rare) - power bounces the card, both the processor and the equipment it controls, without impact to the rest of the cards in the shelf. (Unknown utility.)
c) a shelf-reset switch (sometimes the shelf has its own switch) - power bounces the shelf and all cards, processors and equipment within. This is seldom used (except for system startup issues), but the alternative is to clumsily 'pull the power plug'.
d) computer control of these three switches - I've never seen it.
Are there any specific situations in which it would not be practical nor necessary to delete the heap-allocated memory when you are done using it?
Any heap memory you allocate and never free will remain allocated until your process exits. During that time, no other program will be able to use that portion of the computer's RAM for any purpose.
So the question is, will that cause a problem? The answer will depend on a number of variables:
How much RAM has your process allocated?
How much RAM does the computer have physically installed and available for other programs to use?
How long will your process continue running (and thus holding on to that memory) for?
If your program is of the type that runs, does its thing, and then exits (more or less) immediately, then there's likely no problem with it "leaking" memory, since the leaked memory will be reclaimed by the OS when your process exits. (Note: some very primitive/embedded/old OSes may not reclaim the resources of an exited process, so make sure yours does -- that said, almost all commonly used modern OSes do.)
If your program is of the type that can continue running indefinitely, on the other hand, then memory leaks are going to be a problem, because if the program keeps allocating memory and never freeing it, eventually it will eat up all of the computer's available RAM and then bad things will start to happen.
In general, there is no reason why you should ever have to leak memory in a modern C++ program -- smart pointers (e.g. std::unique_ptr and std::shared_ptr) are there specifically to make memory-leaks easy to avoid, and they are easier to use than the old/leak-prone raw C pointers, so there's no reason not to use them.
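For illustration, a short example of the point above (nothing here is ever explicitly deleted, yet nothing leaks):

#include <memory>
#include <vector>

struct Widget { int value = 0; };

int main()
{
    auto sole   = std::make_unique<Widget>();    // exactly one owner
    auto shared = std::make_shared<Widget>();    // reference-counted ownership

    std::vector<std::unique_ptr<Widget>> widgets;
    widgets.push_back(std::make_unique<Widget>());

    // No delete anywhere: every Widget is freed automatically when its
    // owner goes out of scope, so this path cannot leak.
}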

Shouldn't malloc be asynchronous?

Am I correct to assume that when a process calls malloc, there may be I/O involved (swapping caches out etc) to make memory available, which in turn implies it can block considerable time? Thus, shouldn't we have two versions of malloc in linux, one say "fast_malloc" which is suitable for obtaining smaller chunks & guaranteed not to block (but may of course still fail with OUT_OF_MEMORY) and another async_malloc where we could ask for arbitrary-size space but require a callback?
Example: if I need a smaller chunk of memory to make room for an item in the linked-list, I may prefer the traditional inline malloc knowing the OS should be able to satisfy it 99.999% of the time or just fail. Another example: if I'm a DB server trying to allocate a sizable chunk to put indexes in it I may opt for the async_malloc and deal with the "callback complexity".
The reason I brought this up is that I'm looking to create highly concurrent servers handling hundreds of thousands of web requests per second, and I generally avoid threads for handling the requests. Put another way, any time I/O occurs I want it to be asynchronous (say, libevent based). Unfortunately, I'm realizing most C APIs lack proper support for concurrent use. For example, the ubiquitous MySQL C library is entirely blocking, and that's just one library my servers use extensively. Again, I can always simulate non-blocking by offloading to another thread, but that's nowhere near as cheap as waiting for the result via a completion callback.
As kaylum said in a comment:
Calling malloc will not inherently cause more IO. Perhaps you are confusing use of the memory returned versus just allocating the memory to you. Just because you ask for 100MB does not mean that malloc will immediately trigger 100MB of swapping. That only happens when you access the memory.
If you want to protect against long delays for swapping, etc. during subsequent access to the allocated memory, you can call mlock on it in a separate thread (so your process isn't stalled waiting for mlock to complete). Once mlock has succeeded, the memory is physically instantiated and cannot be swapped out until munlock.
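A sketch of that idea (error handling omitted; the 100 MB figure is arbitrary):

#include <sys/mman.h>
#include <cstdlib>
#include <thread>

int main()
{
    const std::size_t len = 100u * 1024 * 1024;
    void* buf = std::malloc(len);

    // Pin (and fault in) the pages off the main thread so the caller
    // isn't stalled while the kernel does the work.
    std::thread pin([buf, len] { mlock(buf, len); });

    // ... main thread continues doing non-blocking work ...

    pin.join();
    // buf is now resident and won't be swapped out until munlock()/exit.
    munlock(buf, len);
    std::free(buf);
}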
Remember that a call to malloc() does not necessarily result in your program asking the OS for more memory. It's down to the C runtime's implementation of malloc().
For glibc malloc() merely (depending on how much you're asking for) returns a pointer to memory that the runtime has already got from the OS. Similarly free() doesn't necessarily return memory to the OS. It's a lot faster that way. I think glibc's malloc() is thread safe too.
Interestingly this gives C, C++ (and everything built on top) the same sort of properties normally associated with languages like Java and C#. Arguably building a runtime like Java or C# on top of a runtime like glibc means that there's actually more work than necessary going on to manage memory... Unless they're not using malloc() or new at all.
There's various allocators out there, and you can link whichever one you want into your program regardless of what your normal C runtime provides. So even on platforms like *BSD (which are typically far more traditional in their memory allocation approach, asking the OS each and every time you call malloc() or new) you can pull off the same trick.
Put another way, anytime I/O occurs I want it to be asynchronous (say libevent based).
I have bad news for you. Any time you access memory you risk blocking for I/O.
malloc itself is quite unlikely to block because the system calls it uses just create an entry in a data structure that tells the kernel "map in some memory here when it's accessed". This means that malloc will only block when it needs to go down to the kernel to map more memory and either the kernel is out of memory so that it itself has to wait for allocating its internal data structure (you can wait for quite a while then) or you use mlockall. The actual allocating of memory that can cause swapping doesn't happen until you touch memory. And your own memory can be swapped out at any time (or your program text could be paged out) and you have pretty much no control over it.

Allocating memory that can be freed by the OS if needed

I'm writing a program that generates thumbnails for every page in a large document. For performance reasons I would like to keep the thumbnails in memory for as long as possible, but I would like the OS to be able to reclaim that memory if it decides there is another more important use for it (e.g. the user has started running a different application.)
I can always regenerate the thumbnail later if the memory has gone away.
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++.
EDIT: Just to clarify, rather than being notified when memory is low or regularly monitoring the system's amount of memory, I'm thinking more along the lines of allocating memory and then "unlocking" it when it's not in use. The OS can then steal unlocked memory if needed (even for disk buffers if it thinks that would be a better use of the memory) and all I have to do as a programmer is just "lock" the memory again before I intend to use it. If the lock fails I know the memory has been reused for something else so I need to regenerate the thumbnail again, and if the lock succeeds I can just keep using the data from before.
The reason is I might be displaying maybe 20 pages of a document on the screen, but I may as well keep thumbnails of the other 200 or so pages in case the user scrolls around a bit. But if they go do something else for a while, that memory might be better used as a disk cache or for storing web pages or something, so I'd like to be able to tell the OS that it can reuse some of my memory if it wants to.
Having to monitor the amount of free system-wide memory may not achieve the goal (my memory will never be reclaimed to improve disk caching), and getting low-memory notifications will only help in emergencies. I was hoping that by having a lock/unlock method, this could be achieved in more of a lightweight way and benefit the performance of the system in a non-emergency situation.
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++
For Windows, at least, you can register for a memory resource notification.
HANDLE WINAPI CreateMemoryResourceNotification(
    _In_ MEMORY_RESOURCE_NOTIFICATION_TYPE NotificationType
);

NotificationType is one of:
    LowMemoryResourceNotification  - available physical memory is running low.
    HighMemoryResourceNotification - available physical memory is high.
Just be careful responding to both events. You might create a feedback loop (memory is low, release the thumbnails! and then memory is high, make all the thumbnails!).
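A sketch of how the low-memory notification might be consumed (untested; polling is shown here, but you can also WaitForSingleObject on the handle):

#include <windows.h>

int main()
{
    HANDLE low = CreateMemoryResourceNotification(LowMemoryResourceNotification);

    BOOL memory_is_low = FALSE;
    if (QueryMemoryResourceNotification(low, &memory_is_low) && memory_is_low)
    {
        // Physical memory is running low: drop cached thumbnails or
        // other recreatable data here.
    }

    CloseHandle(low);
}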
In AIX, there is a signal SIGDANGER that is sent to applications when available memory is low. You may handle this signal and free some memory.
There has been discussion among Linux people about implementing this feature in Linux, but AFAIK it is not yet implemented. Maybe they think that applications should not care about low-level memory management, and that it can be handled transparently in the OS via swapping.
The POSIX standard has a function, posix_madvise, that might be used to mark an area of memory as less important. The POSIX_MADV_DONTNEED advice specifies that the application expects it will not access the specified range in the near future.
Unfortunately, the current Linux implementation will immediately free the memory range when posix_madvise is called with this advice.
So there's no portable solution to your question.
However, on almost every OS you are able to read the current available memory via some OS interface. So you can routinely read such value and manually free memory when available memory in OS is low.
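For completeness, this is roughly what a posix_madvise call looks like (with the Linux caveat above in mind, i.e. the range may simply be discarded on the spot):

#include <sys/mman.h>
#include <cstdlib>

int main()
{
    const std::size_t len = 4 * 1024 * 1024;
    void* cache = std::aligned_alloc(4096, len);   // madvise wants page-aligned memory

    // ... fill `cache` with recreatable data ...

    // Hint that we won't need this range in the near future.
    posix_madvise(cache, len, POSIX_MADV_DONTNEED);

    std::free(cache);
}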
There's nothing special you need to do. The OS will remove things from memory if they haven't been used recently automatically. Some OSes have platform-specific ways to improve this, but generally, nothing special is needed.
This question is very similar and has answers that cover things not covered here.
Allocating "temporary" memory (in Linux)
This shouldn't be too hard to do, because this is exactly what the page cache does: it uses unused memory to cache the hard disk. In theory, someone could write a filesystem such that when you read from a certain file, it calculated something, and the page cache would cache it automatically.
All the basics of automatically freed cache space are already there in any OS with a disk cache, and it's hard to imagine there not being an API for something that would make such a huge difference, especially in things like mobile web browsers.

What's the graceful way of handling out of memory situations in C/C++?

I'm writing a caching app that consumes large amounts of memory. Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.
If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?
EDIT: Ok, perhaps I should clarify the question. If malloc or new returns a NULL or 0L value, then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?
EDIT2: I've just realised that a call to "new" can throw an exception. This could be caught at a higher level, so I can perhaps gracefully exit further up. At that point, it may even be possible to recover depending on how much memory is freed. At the very least I should hopefully be able to log something by that point. So while I have seen code that checks the value of a pointer after new, doing so is unnecessary in C++; in C, however, you should check the return value of malloc.
Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out of memory conditions, only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be free'd -- which in fact should be almost all the memory used by your program.
The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens though, there's usually enough smaller chunks of memory remaining which will allow you to gracefully handle the failure.
Stack unwinding is your friend ;)
EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should have your functions free their internal structures manually when out-of-memory conditions are found; not doing so is a memory leak.
EDIT2: Example:
#include <iostream>
#include <vector>

void DoStuff()
{
    std::vector<int> data;
    // insert a whole crapload of stuff into data here.
    // Assume std::vector::push_back does the actual throwing,
    // i.e. data.resize(SOME_LARGE_VALUE_HERE);
}

int main()
{
    try
    {
        DoStuff();
        return 0;
    }
    catch (const std::bad_alloc& ex)
    {   // Observe that the local variable `data` no longer exists here.
        std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
                     "get a bigger hard disk) for that calculation!";
        return -1;
    }
}
EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.
Doesn't this question make assumptions regarding overcommitted memory?
I.e., an out of memory situation might not be recoverable! Even if you have no memory left, calls to malloc and other allocators may still succeed until the program attempts to use the memory. Then, BAM!, some process gets killed by the kernel in order to satisfy memory load.
I don't have any specific experience on Linux, but I spent a lot of time working in video games on games consoles, where running out of memory is verboten, and on Windows-based tools.
On a modern OS, you're most likely to run out of address space. Running out of memory, as such, is basically impossible. So just allocate a large buffer, or buffers, on startup, in order to hold all the data you'll ever need, whilst leaving a small amount for the OS. Writing random junk to these regions would probably be a good idea in order to force the OS to actually assign the memory to your process. If your process survives this attempt to use every byte it's asked for, there's some kind of backing now reserved for all of this stuff, so now you're golden.
Write/steal your own memory manager, and direct it to allocate from these buffers. Then use it, consistently, in your app, or take advantage of gcc's --wrap option to forward calls from malloc and friends appropriately. If you use any libraries that can't be directed to call into your memory manager, junk them, because they'll just get in your way. Lack of overridable memory management calls is evidence of deeper-seated issues; you're better off without this particular component. (Note: even if you're using --wrap, trust me, this is still evidence of a problem! Life is too short to use those libraries that don't let you overload their memory management!)
Once you run out of memory, OK, you're screwed, but you've still got that space you left free before, so if freeing up some of the memory you've asked for is too difficult you can (with care) call system calls to write a message to the system log and then terminate, or whatever. Just make sure to avoid calls to the C library, because they'll probably try to allocate some memory when you least expect it -- programmers who work with systems that have virtualised address spaces are notorious for this kind of thing -- and that's the very thing that has caused the problem in the first place.
This approach might sound like a pain in the arse. Well... it is. But it's straightforward, and it's worth putting in a bit of effort for that. I think there's a Kernighan-and/or-Ritchie quote about this.
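As a concrete (if toy) illustration of the --wrap idea: link with -Wl,--wrap=malloc -Wl,--wrap=free and the linker reroutes every malloc/free into your own functions. The bump allocator below is deliberately simplistic and merely stands in for whatever memory manager you wrote or stole:

#include <cstddef>
#include <cstdint>

// A 64 MB buffer grabbed up front; the size is purely illustrative.
static std::uint8_t arena[64u * 1024 * 1024];
static std::size_t  arena_used = 0;

extern "C" void* __wrap_malloc(std::size_t n)
{
    n = (n + 15) & ~std::size_t(15);             // keep 16-byte alignment
    if (arena_used + n > sizeof(arena))
        return nullptr;                          // out of arena: fail loudly, no surprises later
    void* p = arena + arena_used;
    arena_used += n;
    return p;
}

extern "C" void __wrap_free(void*)
{
    // The toy arena never reclaims; a real manager would.
}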
If your application is likely to allocate large blocks of memory and risks hitting the per-process or VM limits, waiting until an allocation actually fails is a difficult situation from which to recover. By the time malloc returns NULL or new throws std::bad_alloc, things may be too far gone to reliably recover. Depending on your recovery strategy, many operations may still require heap allocations themselves, so you have to be extremely careful on which routines you can rely.
Another strategy you may wish to consider is to query the OS and monitor the available memory, proactively managing your allocations. This way you can avoid allocating a large block if you know it is likely to fail, and will thus have a better chance of recovery.
Also, depending on your memory usage patterns, using a custom allocator may give you better results than the standard built-in malloc. For example, certain allocation patterns can actually lead to memory fragmentation over time, so even though you have free memory, the heap arena may not have an available block of the right size. A good example of this is Firefox, which switched to jemalloc and saw a great increase in memory efficiency.
I don't think that capturing the failure of malloc or new will gain you much in your situation. Linux allocates large chunks of virtual pages in malloc by means of mmap. Because of this you may find yourself in a situation where you have allocated much more virtual memory than you have (real + swap).
The program will then fail much later, with a segfault (SIGSEGV), when you write to the first page for which there isn't any place in swap. In theory you could test for such situations by writing a signal handler and then dirtying all the pages that you allocate.
But usually this will not help much either, since your application will be in a very bad state long before that: constantly swapping, effectively computing with your hard disk...
It's possible for writes to the syslog to fail in low memory conditions: there's no way to know that for every platform without looking at the source for the relevant functions. They could need dynamic memory to format strings that are passed in, for instance.
Long before you run out of memory, however, you'll start paging stuff to disk. And when that happens, you can forget any performance advantages from caching.
Personally, I'm convinced by the design behind Varnish: the operating system offers services to solve a lot of the relevant problems, and it makes sense to use those services (minor editing):
So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management ...
Squid creates an HTTP object in RAM and it gets used some times rapidly after creation. Then after some time it gets no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something, and the kernel decides to push those unused pages of memory out to swap space and use the (cache) RAM more sensibly for some data which is actually used by a program. This, however, is done without Squid knowing about it. Squid still thinks that these HTTP objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive. ...
After some time, Squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for more busy data. So Squid goes out, creates a file and then it writes the http objects to the file.
Here we switch to the high-speed camera: Squid calls write(2), the address it gives is a "virtual address" and the kernel has it marked as "not at home". ...
The kernel tries to find a free page, if there are none, it will take a little used page from somewhere, likely another little used Squid object, write it to the paging ... space on the disk (the "swap area") when that write completes, it will read from another place in the paging pool the data it "paged out" into the now unused RAM page, fix up the paging tables, and retry the instruction which failed. ...
So now Squid has the object in a page in RAM and written to the disk two places: one copy in the operating system's paging space and one copy in the filesystem. ...
Here is how Varnish does it:
Varnish allocate some virtual memory, it tells the operating system to back this memory with space from a disk file. When it needs to send the object to a client, it simply refers to that piece of virtual memory and leaves the rest to the kernel.
If/when the kernel decides it needs to use RAM for something else, the page will get written to the backing file and the RAM page reused elsewhere.
When Varnish next time refers to the virtual memory, the operating system will find a RAM page, possibly freeing one, and read the contents in from the backing file.
And that's it. Varnish doesn't really try to control what is cached in RAM and what is not, the kernel has code and hardware support to do a good job at that, and it does a good job.
You may not need to write caching code at all.
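The mechanism Varnish relies on is essentially a file-backed mmap. A bare-bones sketch (error handling omitted; the file name and size are placeholders):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

int main()
{
    const std::size_t size = 1u << 30;           // 1 GB backing store
    int fd = open("cache.bin", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, size);

    // The kernel decides which parts of this region live in RAM and
    // which get paged out to cache.bin; the application just uses it.
    char* cache = static_cast<char*>(
        mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // ... read and write objects anywhere inside [cache, cache + size) ...

    munmap(cache, size);
    close(fd);
}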
As has been stated, exhausting memory means that all bets are off. IMHO the best method of handling this situation is to fail gracefully (as opposed to simply crashing!). Your cache could allocate a reasonable amount of memory on instantiation. The size of this memory would equate to an amount that, when freed, will allow the program to terminate reasonably. When your cache detects that memory is becoming low then it should release this memory and instigate a graceful shutdown.
I'm writing an caching app that consumes large amounts of memory.
Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.
If you are writing a daemon that should run 24/7/365, then you should not use dynamic memory management: preallocate all the memory in advance and manage it using some slab allocator/memory pool mechanism. That will also protect you against heap fragmentation.
If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?
It should not. This is partly why syslog exists as a syscall: so that an application can report an error independent of its internal state.
If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?
In these situations I generally try to handle the error condition properly, applying the general error-handling rules. If the error happens during initialization, terminate with an error: it is probably a configuration error. If the error happens during request processing, fail the request with an out-of-memory error.
For plain heap memory, malloc() returning 0 generally means one of two things:
you have exhausted the heap, and unless your application frees some memory, further malloc()s won't succeed; or
the allocation size is wrong: it is a quite common coding error to mix signed and unsigned types when calculating a block size. If the size mistakenly ends up negative and is passed to malloc(), where a size_t is expected, it becomes a very large number.
So in some sense it is also not wrong to abort() to produce a core file which can be analyzed later to see why malloc() returned 0. Though I prefer to (1) include the attempted allocation size in the error message and (2) try to proceed further. If the application later crashes due to some other memory problem down the road (*), it will produce a core file anyway.
(*) From my experience of making software with dynamic memory management resilient to malloc() errors, I see that malloc() often doesn't return 0 reliably: the first attempts return 0, then a later malloc() succeeds and returns a valid pointer, but the first access to the pointed-to memory crashes the application. This is my experience on both Linux and HP-UX, and I have seen a similar pattern on Solaris 10 too, so the behavior isn't unique to Linux. To my knowledge the only way to make an application 100% resilient to memory problems is to preallocate all memory in advance. That is mandatory for mission-critical, safety, life-support and carrier-grade applications; they are not allowed dynamic memory management past the initialization phase.
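A tiny illustration of the signed/unsigned size mistake mentioned above:

#include <cstdlib>

int main()
{
    int header  = 64;
    int payload = 16;
    int size    = payload - header;          // -48, thanks to a logic error
    // The negative int converts to a huge size_t on the way into malloc(),
    // so the call fails and returns NULL instead of crashing right here.
    void* p = std::malloc(size);
    return p == nullptr;                     // almost certainly 1
}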
I don't know why many of the sensible answers are voted down. In most server environments, running out of memory means that you have a leak somewhere, and that it makes little sense to 'free some memory and try to go on'. The nature of C++ and especially the standard library is that it requires allocations all the time. If you are lucky, you might be able to free some memory and execute a clean shutdown, or at least emit a warning.
It is however far more likely that you won't be able to do a thing, unless the allocation that failed was a huge one, and there is still memory available for 'normal' things.
Dan Bernstein is one of the very few guys I know that can implement server software that operates in memory constrained situations.
For most of the rest of us, we should probably design our software that it leaves things in a useful state when it bails out because of an out of memory error.
Unless you are some kind of brain surgeon, there isn't a lot else to do.
Also, very often you won't even get a std::bad_alloc or anything like that: you'll just get a pointer back from malloc/new and only die when you actually try to touch all of that memory. This can be prevented by turning off overcommit in the operating system, but still.
Don't count on being able to deal with the SIGSEGV when you touch memory that the kernel hoped you wouldn't. I'm not quite sure how this works on the Windows side of things, but I bet they do overcommit too.
All in all, this is not one of C++'s strong spots.

Any reason to overload global new and delete?

Unless you're programming parts of an OS or an embedded system, are there any reasons to do so? I can imagine that for some particular classes that are created and destroyed frequently, overloading memory management functions or introducing a pool of objects might lower the overhead, but doing these things globally?
Addition
I've just found a bug in an overloaded delete function - memory wasn't always freed. And that was in a not-so-memory-critical application. Also, disabling these overloads decreases performance by only ~0.5%.
We overload the global new and delete operators where I work for many reasons:
pooling all small allocations -- decreases overhead, decreases fragmentation, can increase performance for small-alloc-heavy apps
framing allocations with a known lifetime -- ignore all the frees until the very end of this period, then free all of them together (admittedly we do this more with local operator overloads than global)
alignment adjustment -- to cacheline boundaries, etc
alloc fill -- helping to expose usage of uninitialized variables
free fill -- helping to expose usage of previously deleted memory
delayed free -- increasing the effectiveness of free fill, occasionally increasing performance
sentinels or fenceposts -- helping to expose buffer overruns, underruns, and the occasional wild pointer
redirecting allocations -- to account for NUMA, special memory areas, or even to keep separate systems separate in memory (for e.g. embedded scripting languages or DSLs)
garbage collection or cleanup -- again useful for those embedded scripting languages
heap verification -- you can walk through the heap data structure every N allocs/frees to make sure everything looks ok
accounting, including leak tracking and usage snapshots/statistics (stacks, allocation ages, etc)
The idea of new/delete accounting is really flexible and powerful: you can, for example, record the entire callstack for the active thread whenever an alloc occurs, and aggregate statistics about that. You could ship the stack info over the network if you don't have space to keep it locally for whatever reason. The types of info you can gather here are only limited by your imagination (and performance, of course).
We use global overloads because it's convenient to hang lots of common debugging functionality there, as well as make sweeping improvements across the entire app, based on the statistics we gather from those same overloads.
We still do use custom allocators for individual types too; in many cases the speedup or capabilities you can get by providing custom allocators for e.g. a single point-of-use of an STL data structure far exceeds the general speedup you can get from the global overloads.
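To make the shape of such an overload concrete, here is a stripped-down sketch combining byte accounting with alloc fill and free fill; the header layout, fill values and counter are arbitrary choices for illustration, not how any particular engine does it:

#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

static std::atomic<std::size_t> g_live_bytes{0};
constexpr std::size_t kHeader = alignof(std::max_align_t);   // keeps user data aligned

void* operator new(std::size_t n)
{
    void* block = std::malloc(n + kHeader);
    if (!block) throw std::bad_alloc();
    *static_cast<std::size_t*>(block) = n;                   // stash the size
    void* user = static_cast<char*>(block) + kHeader;
    std::memset(user, 0xCD, n);                              // alloc fill: expose uninitialized reads
    g_live_bytes += n;
    return user;
}

void operator delete(void* user) noexcept
{
    if (!user) return;
    void* block = static_cast<char*>(user) - kHeader;
    std::size_t n = *static_cast<std::size_t*>(block);
    std::memset(user, 0xDD, n);                              // free fill: expose use-after-free
    g_live_bytes -= n;
    std::free(block);
}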
Take a look at some of the allocators and debugging systems that are out there for C/C++ and you'll rapidly come up with these and other ideas:
valgrind
electricfence
dmalloc
dlmalloc
Application Verifier
Insure++
BoundsChecker
...and many others... (the gamedev industry is a great place to look)
(One old but seminal book is Writing Solid Code, which discusses many of the reasons you might want to provide custom allocators in C, most of which are still very relevant.)
Obviously if you can use any of these fine tools you will want to do so rather than rolling your own.
There are situations in which it is faster, easier, less of a business/legal hassle, nothing's available for your platform yet, or just more instructive: dig in and write a global overload.
The most common reason to overload new and delete are simply to check for memory leaks, and memory usage stats. Note that "memory leak" is usually generalized to memory errors. You can check for things such as double deletes and buffer overruns.
The uses after that are usually memory-allocation schemes, such as garbage collection, and pooling.
All other cases are just specific things, mentioned in other answers (logging to disk, kernel use).
In addition to the other important uses mentioned here, like memory tagging, it's also the only way to force all allocations in your app to go through fixed-block allocation, which has enormous implications for performance and fragmentation.
For example, you may have a series of memory pools with fixed block sizes. Overriding global new lets you direct all 61-byte allocations to, say, the pool with 64-byte blocks, all 768-1024 byte allocations to the 1024-byte-block pool, all those above that to the 2048-byte-block pool, and anything larger than 8 kB to the general ragged heap.
Because fixed-block allocators are much faster and less prone to fragmentation than allocating willy-nilly from the heap, this lets you force even crappy third-party code to allocate from your pools and not poop all over the address space.
This is done often in systems which are time- and space-critical, such as games. 280Z28, Meeh, and Dan Olson have described why.
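A sketch of just the routing decision described above; the thresholds mirror the example, the pools themselves are assumed to exist elsewhere, and an overridden global operator new would call something like pool_alloc(size_class_for(n)), falling back to the heap when 0 is returned:

#include <cstddef>

// Map a requested size to the fixed block size that should serve it.
std::size_t size_class_for(std::size_t n)
{
    if (n <= 64)       return 64;            // e.g. the 61-byte request above
    if (n <= 1024)     return 1024;          // covers the 768-1024 byte range
    if (n <= 2048)     return 2048;
    if (n <= 8 * 1024) return 8 * 1024;
    return 0;                                // 0 = send it to the general ragged heap
}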
UnrealEngine3 overloads global new and delete as part of its core memory management system. There are multiple allocators that provide different features (profiling, performance, etc.) and they need all allocations to go through it.
Edit: For my own code, I would only ever do it as a last resort. And by that I mean I would almost positively never use it. But my personal projects are obviously much smaller/very different requirements.
Some realtime systems overload them to avoid them being used after init.
Overloading new & delete makes it possible to add a tag to your memory allocations. I tag allocations per system or control or by middleware. I can view, at runtime, how much each uses. Maybe I want to see the usage of a parser separated from the UI or how much a piece of middleware is really using!
You can also use it to put guard bands around the allocated memory. If/when your app crashes you can take a look at the address. If you see the contents as "0xABCDABCD" (or whatever you choose as guard) you are accessing memory you don't own.
Perhaps after calling delete you can fill this space with a similarly recognizable pattern.
I believe Visual Studio does something similar in debug builds. Doesn't it fill uninitialized memory with 0xCDCDCDCD?
Finally, if you have fragmentation issues you could use it to redirect to a block allocator. I am not sure how often this is really a problem, though.
You need to overload them when the call to new and delete doesn't work in your environment.
For example, in kernel programming, the default new and delete don't work as they rely on user mode library to allocate memory.
From a practical standpoint it may just be better to override malloc on a system library level, since operator new will probably be calling it anyway.
On linux, you can put your own version of malloc in place of the system one, as in this example here:
http://developers.sun.com/solaris/articles/lib_interposers.html
In that article, they are trying to collect performance statistics. But you may also detect memory leaks if you also override free.
Since you are doing this in a shared library with LD_PRELOAD, you don't even need to recompile your application.
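A minimal sketch of such an interposer, counting allocations. Build it as a shared library (something like g++ -shared -fPIC -ldl) and run the target with LD_PRELOAD pointing at the resulting .so; note that real interposers also have to deal with dlsym() itself allocating, which this sketch ignores:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE                      // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <atomic>
#include <cstddef>

static std::atomic<unsigned long> g_malloc_calls{0};

extern "C" void* malloc(std::size_t n)
{
    // Look up the "real" malloc the first time we're called.
    static void* (*real_malloc)(std::size_t) =
        reinterpret_cast<void* (*)(std::size_t)>(dlsym(RTLD_NEXT, "malloc"));
    ++g_malloc_calls;
    return real_malloc(n);
}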
I've seen it done in a system that for 'security'* reasons was required to write over all memory it used on de-allocation. The approach was to allocate an extra few bytes at the start of each block of memory which would contain the size of the overall block which would then be overwritten with zeros on delete.
This had a number of problems as you can probably imagine but it did work (mostly) and saved the team from reviewing every single memory allocation in a reasonably large, existing application.
Certainly not saying that it is a good use but it is probably one of the more imaginative ones out there...
* sadly it wasn't so much about actual security as the appearance of security...
Photoshop plugins written in C++ should override operator new so that they obtain memory via Photoshop.
I've done it with memory mapped files so that data written to the memory is automatically also saved to disk.
It's also used to return memory at a specific physical address if you have memory mapped IO devices, or sometimes if you need to allocate a certain block of contiguous memory.
But 99% of the time it's done as a debugging feature to log how often, where, when memory is being allocated and released.
It's actually pretty common for games to allocate one huge chunk of memory from the system and then provide custom allocators via overloaded new and delete. One big reason is that consoles have a fixed memory size, making both leaks and fragmentation large problems.
Usually (at least on a closed platform) the default heap operations come with a lack of control and a lack of introspection. For many applications this doesn't matter, but for games to run stably in fixed-memory situations the added control and introspection are both extremely important.
It can be a nice trick for your application to be able to respond to low-memory conditions with something other than a random crash. To do this your new can be a simple proxy to the default new that catches its failures, frees up some stuff and tries again.
The simplest technique is to reserve a blank block of memory at start-up time for that very purpose. You may also have some cache you can tap into - the idea is the same.
When the first allocation failure kicks in, you still have time to warn your user about the low memory conditions ("I'll be able to survive a little longer, but you may want to save your work and close some other applications"), save your state to disk, switch to survival mode, or whatever else makes sense in your context.
The most common use case is probably leak checking.
Another use case is when you have specific requirements for memory allocation in your environment which are not satisfied by the standard library you are using, like, for instance, you need to guarantee that memory allocation is lock free in a multi threaded environment.
As many have already stated this is usually done in performance critical applications, or to be able to control memory alignment or track your memory. Games frequently use custom memory managers, especially when targeting specific platforms/consoles.
Here is a pretty good blog post about one way of doing this and some reasoning.
An overloaded new operator also enables programmers to squeeze some extra performance out of their programs. For example, in a class, to speed up the allocation of new nodes, a list of deleted nodes can be maintained so that their memory is reused when new nodes are allocated. In this case, the overloaded delete operator adds nodes to the list of deleted nodes, and the overloaded new operator allocates memory from this list rather than from the heap, speeding up allocation. Memory from the heap is used only when the list of deleted nodes is empty.
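A minimal sketch of that free-list idea for a single Node type; it assumes Node is never derived from and reuses the node's own storage for the free-list link, which is the classic trick:

#include <cstddef>
#include <new>

class Node
{
public:
    int   value = 0;
    Node* next  = nullptr;                      // also reused as the free-list link

    void* operator new(std::size_t n)
    {
        if (n == sizeof(Node) && free_list)     // reuse a previously deleted node
        {
            void* p = free_list;
            free_list = free_list->next;
            return p;
        }
        return ::operator new(n);               // otherwise go to the heap
    }

    void operator delete(void* p) noexcept
    {
        Node* node = static_cast<Node*>(p);     // push the raw storage onto the free list
        node->next = free_list;
        free_list  = node;
    }

private:
    static Node* free_list;
};

Node* Node::free_list = nullptr;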