Related
I've run into memory leaks many times. Usually when I'm malloc-ing like there's no tomorrow, or dangling FILE *s like dirty laundry. I generally assume (read: hope desperately) that all memory is cleaned up at least when the program terminates. Are there any situations where leaked memory won't be collected when the program terminates, or crashes?
If the answer varies widely from language-to-language, then let's focus on C(++).
Please note hyperbolic usage of the phrase, 'like there's no tomorrow', and 'dangling ... like dirty laundry'. Unsafe* malloc*ing can hurt the ones you love. Also, please use caution with dirty laundry.
No. Operating systems free all resources held by processes when they exit.
This applies to all resources the operating system maintains: memory, open files, network connections, window handles...
That said, if the program is running on an embedded system without an operating system, or with a very simple or buggy operating system, the memory might be unusable until a reboot. But if you were in that situation you probably wouldn't be asking this question.
The operating system may take a long time to free certain resources. For example the TCP port that a network server uses to accept connections may take minutes to become free, even if properly closed by the program. A networked program may also hold remote resources such as database objects. The remote system should free those resources when the network connection is lost, but it may take even longer than the local operating system.
The C Standard does not specify that memory allocated by malloc is released when the program terminates. This done by the operating system and not all OSes (usually these are in the embedded world) release the memory when the program terminates.
As all the answers have covered most aspects of your question w.r.t. modern OSes, but historically, there is one that is worth mentioning if you have ever programmed in the DOS world. Terminant and Stay Resident (TSR) programs would usually return control to the system but would reside in memory which could be revived by a software / hardware interrupt. It was normal to see messages like "out of memory! try unloading some of your TSRs" when working on these OSes.
So technically the program terminates, but because it still resides on memory, any memory leak would not be released unless you unload the program.
So you can consider this to be another case apart from OSes not reclaiming memory either because it's buggy or because the embedded OS is designed to do so.
I remember one more example. Customer Information Control System (CICS), a transaction server which runs primarily on IBM mainframes is pseudo-conversational. When executed, it processes the user entered data, generates another set of data for the user, transferring to the user terminal node and terminates. On activating the attention key, it again revives to process another set of data. Because the way it behaves, technically again, the OS won't reclaim memory from the terminated CICS Programs, unless you recycle the CICS transaction server.
Like the others have said, most operating systems will reclaim allocated memory upon process termination (and probably other resources like network sockets, file handles, etc).
Having said that, the memory may not be the only thing you need to worry about when dealing with new/delete (instead of raw malloc/free). The memory that's allocated in new may get reclaimed, but things that may be done in the destructors of the objects will not happen. Perhaps the destructor of some class writes a sentinel value into a file upon destruction. If the process just terminates, the file handle may get flushed and the memory reclaimed, but that sentinel value wouldn't get written.
Moral of the story, always clean up after yourself. Don't let things dangle. Don't rely on the OS cleaning up after you. Clean up after yourself.
This is more likely to depend on operating system than language. Ultimately any program in any language will get it's memory from the operating system.
I've never heard of an operating system that doesn't recycle memory when a program exits/crashes. So if your program has an upper bound on the memory it needs to allocate, then just allocating and never freeing is perfectly reasonable.
If the program is ever turned into a dynamic component ("plugin") that is loaded into another program's address space, it will be troublesome, even on an operating system with tidy memory management. We don't even have to think about the code being ported to less capable systems.
On the other hand, releasing all memory can impact the performance of a program's cleanup.
One program I was working on, a certain test case required 30 seconds or more for the program to exit, because it was recursing through the graph of all dynamic memory and releasing it piece by piece.
A reasonable solution is to have the capability there and cover it with test cases, but turn it off in production code so the application quits fast.
All operating systems deserving the title will clean up the mess your process made after termination. But there are always unforeseen events, what if it was denied access somehow and some poor programmer did not foresee the possibility and so it doesn't try again a bit later?
Always safer to just clean up yourself IF memory leaks are mission critical - otherwise not really worth the effort IMO if that effort is costly.
Edit:
You do need to clean up memory leaks if they are in place where they will accumulate, like in loops. The memory leaks I speak of are ones that build up in constant time throughout the course of the program, if you have a leak of any other sort it will most likely be a serious problem sooner or later.
In technical terms if your leaks are of memory 'complexity' O(1) they are fine in most cases, O(logn) already unpleasant (and in some cases fatal) and O(N)+ intolerable.
Shared memory on POSIX compliant systems persists until shm_unlink is called or the system is rebooted.
If you have interprocess communication, this can lead to other processes never completing and consuming resources depending on the protocol.
To give an example, I was once experimenting with printing to a PDF printer in Java when I terminated the JVM in the middle of a printer job, the PDF spooling process remained active, and I had to kill it in the task manager before I could retry printing.
I've run into memory leaks many times. Usually when I'm malloc-ing like there's no tomorrow, or dangling FILE *s like dirty laundry. I generally assume (read: hope desperately) that all memory is cleaned up at least when the program terminates. Are there any situations where leaked memory won't be collected when the program terminates, or crashes?
If the answer varies widely from language-to-language, then let's focus on C(++).
Please note hyperbolic usage of the phrase, 'like there's no tomorrow', and 'dangling ... like dirty laundry'. Unsafe* malloc*ing can hurt the ones you love. Also, please use caution with dirty laundry.
No. Operating systems free all resources held by processes when they exit.
This applies to all resources the operating system maintains: memory, open files, network connections, window handles...
That said, if the program is running on an embedded system without an operating system, or with a very simple or buggy operating system, the memory might be unusable until a reboot. But if you were in that situation you probably wouldn't be asking this question.
The operating system may take a long time to free certain resources. For example the TCP port that a network server uses to accept connections may take minutes to become free, even if properly closed by the program. A networked program may also hold remote resources such as database objects. The remote system should free those resources when the network connection is lost, but it may take even longer than the local operating system.
The C Standard does not specify that memory allocated by malloc is released when the program terminates. This done by the operating system and not all OSes (usually these are in the embedded world) release the memory when the program terminates.
As all the answers have covered most aspects of your question w.r.t. modern OSes, but historically, there is one that is worth mentioning if you have ever programmed in the DOS world. Terminant and Stay Resident (TSR) programs would usually return control to the system but would reside in memory which could be revived by a software / hardware interrupt. It was normal to see messages like "out of memory! try unloading some of your TSRs" when working on these OSes.
So technically the program terminates, but because it still resides on memory, any memory leak would not be released unless you unload the program.
So you can consider this to be another case apart from OSes not reclaiming memory either because it's buggy or because the embedded OS is designed to do so.
I remember one more example. Customer Information Control System (CICS), a transaction server which runs primarily on IBM mainframes is pseudo-conversational. When executed, it processes the user entered data, generates another set of data for the user, transferring to the user terminal node and terminates. On activating the attention key, it again revives to process another set of data. Because the way it behaves, technically again, the OS won't reclaim memory from the terminated CICS Programs, unless you recycle the CICS transaction server.
Like the others have said, most operating systems will reclaim allocated memory upon process termination (and probably other resources like network sockets, file handles, etc).
Having said that, the memory may not be the only thing you need to worry about when dealing with new/delete (instead of raw malloc/free). The memory that's allocated in new may get reclaimed, but things that may be done in the destructors of the objects will not happen. Perhaps the destructor of some class writes a sentinel value into a file upon destruction. If the process just terminates, the file handle may get flushed and the memory reclaimed, but that sentinel value wouldn't get written.
Moral of the story, always clean up after yourself. Don't let things dangle. Don't rely on the OS cleaning up after you. Clean up after yourself.
This is more likely to depend on operating system than language. Ultimately any program in any language will get it's memory from the operating system.
I've never heard of an operating system that doesn't recycle memory when a program exits/crashes. So if your program has an upper bound on the memory it needs to allocate, then just allocating and never freeing is perfectly reasonable.
If the program is ever turned into a dynamic component ("plugin") that is loaded into another program's address space, it will be troublesome, even on an operating system with tidy memory management. We don't even have to think about the code being ported to less capable systems.
On the other hand, releasing all memory can impact the performance of a program's cleanup.
One program I was working on, a certain test case required 30 seconds or more for the program to exit, because it was recursing through the graph of all dynamic memory and releasing it piece by piece.
A reasonable solution is to have the capability there and cover it with test cases, but turn it off in production code so the application quits fast.
All operating systems deserving the title will clean up the mess your process made after termination. But there are always unforeseen events, what if it was denied access somehow and some poor programmer did not foresee the possibility and so it doesn't try again a bit later?
Always safer to just clean up yourself IF memory leaks are mission critical - otherwise not really worth the effort IMO if that effort is costly.
Edit:
You do need to clean up memory leaks if they are in place where they will accumulate, like in loops. The memory leaks I speak of are ones that build up in constant time throughout the course of the program, if you have a leak of any other sort it will most likely be a serious problem sooner or later.
In technical terms if your leaks are of memory 'complexity' O(1) they are fine in most cases, O(logn) already unpleasant (and in some cases fatal) and O(N)+ intolerable.
Shared memory on POSIX compliant systems persists until shm_unlink is called or the system is rebooted.
If you have interprocess communication, this can lead to other processes never completing and consuming resources depending on the protocol.
To give an example, I was once experimenting with printing to a PDF printer in Java when I terminated the JVM in the middle of a printer job, the PDF spooling process remained active, and I had to kill it in the task manager before I could retry printing.
I haven't been able to create a Qt GUI app that didn't have over 1K 'definitely lost' bytes in valgrind. I have experimented with this, making minimal apps that just show one QWidget, that extend QMainWindow; that just create a QApplication object without showing it or without executing it or both, but they always leak.
Trying to figure this out I have read that it's because X11 or glibc has bugs, or because valgrind gives false positives. And in one forum thread it seemed to be implied that creating a QApplication-object in the main function and returning the object's exec()-function, as is done in tutorials, is a "simplified" way to make GUIs (and not necessarily good, perhaps?).
The valgrind output does indeed mention libX11 and libglibc, and also libfontconfig. The rest of the memory losses, 5 loss records, occurs at ??? in libQtCore.so during QLibrary::setFileNameAndVersion.
If there is a more appropriate way to create GUI apps that prevents even just some of this from happening, what is it?
And if any of the valgrind output is just noise, how do I create a suppression file that suppresses the right things?
EDIT: Thank you for comments and answers!
I'm not worrying about the few lost kB themselves, but it'll be easier to find my own memory leaks if I don't have to filter several screens of errors but can normally get an "OK" from valgrind. And if I'm going to suppress warnings, I'd better know what they are, right?
Interesting to see how accepted leaks can be!
It is not uncommon for large-scale multi-thread-capable libraries such as QT, wxWidgets, X11, etc. to set up singleton-type objects that initialize once when a process is started and then make no attempt to effort to clean up the allocation when the process shuts down.
I can assure you that anything "leaked" from a function such as QLibrary::setFileNameAndVersion() has been left so intentionally. The bits of memory left behind by X11/glibc/fontConfig are probably not bugs either.
It could be seen as bad coding practice or etiquette, but it can also greatly simplify certain types of tasks. Operating systems these days offer a very strong guarantee for cleaning up any memory or resources left open by a process when its killed (either gracefully or by force), and if the allocation in question is very likely to be needed for the duration of the application, including shutdown procedures -- and various core components of QT would qualify -- then it can be a boon to performance to have the library set up some memory allocations as soon as it is loaded/initialized, and allow those to persist indefinitely. Among other things, this allows the memory to be present for use by any other C++ destructors that might reference that memory.
Since those allocations are only set up once, and from one point in the code, there is no risk of a meaningful memory leak. Its just memory that belongs to the process and is thus cleaned up when the process is closed by the operating system.
Conclusion: if the memory leak isn't in your code, and it doesn't appear to get significantly larger over time (and by significant these days, think megabytes), and/or is clearly orginating from first-time initialization setup code that is only ever invoked once within your app, then don't worry about it. It is probably intentional.
One way to test this can be to run your code inside a loop, and vary the number of iterations. If the difference between allocs and frees is independent on the number of iterations, you are likely to be safe.
I know some Java and am right now trying C++ instead and apparently in C++ you can do things like declare an int array of size 6, then change the 10th element of that array, which I'm understanding to be simply the 4th byte after the end of the section of memory that was allocated for the 6-integer array.
So my question is, if I'm careless is it possible to accidentally alter memory in my C++ program that is being used by other programs on my system? Is there an actual risk of seriously messing something up this way? I mean I know you can just restart your computer and clear the memory if you have to, but if I don't do that, there could be some lasting damage.
It depends on your system. Formally, an out of bounds access is
undefined behavior. On a modern general purpose system, each user
process has its own address space, and one process can't modify, or even
read, that of another process (barring shared memory), so unless you're
writing kernel code, you shouldn't be able to break anything outside of
your own process (and non-kernel code can't normally do physical IO, so
I don't see how anything in the hardware could break).
If you're writing kernel code, however, or working on an embedded
processor with no memory mapping or protection, you can literally
destroy hardware with an out of bounds write; if the program is
controlling something like a nuclear power plant, you can even destroy a
lot more than just the machine your code is running on.
Each process is given its own virtual address space, so naturally processes don't see each others memory. Don't forget that even a buffer overrun that is local to your program can have dire consequences - the overrun may cause the program to misbehave and do something that has lasting effect (like deleting all files for example).
This depends on what operating system and environment you are in:
Normal OS (Windows, Linux etc) userspace program: You can only mess up your own process memory. However having really bad luck this can be enough. Imagine for example that you make a call to some function that deletes files. If your memory is corrupted at the time of the call, the parameters to the function might be messed up to mean deletion of something else than you intended. As long as you keep from calling delete file routines etc. in the programs where you test memory handling, this risk is non-existent.
Normal OS, kernel space device driver: You can access system memory and the memory of the currently running process, possibly destroying everything.
Simple embedded OS without memory protection: You can access everything and destroy anything.
Legacey OS without memory protection (Win 3.X, MS-DOS): You can access everyting and destroy anything.
Every program runs in its own address space and one program cannot access (read / modify) any other programs address space (this is a memory management technique called paging).
If you try to access an address in memory that your program cannot read it will cause a segmentation or page fault and your program will crash.
In answer to your question, no permanent damage will be caused.
I'm not sure modern OS (especially win7) allows you to do that. The OS will block the buffer overrun action as you've described
Back in the day (DOS times) some viruses would try to damage hardware by directly programming video or hard drive controller, but even then it was not easy or sure thing. Modern hardware and OSs make it practicaly impossible for user level applications to damage hardware. So program away :) you wont break anything.
There is a different possibility. Having a buffer overrun might allow ill reputed people to exploit that bug and execute arbitrary code in your clients computer. I guess thats bad enough. And the most dangerous part about overrun is that you may not find it even after a seriously excessive test.
Let's say we have an SDK in C++ that accepts some binary data (like a picture) and does something. Is it not possible to make this SDK "crash-proof"? By crash I primarily mean forceful termination by the OS upon memory access violation, due to invalid input passed by the user (like an abnormally short junk data).
I have no experience with C++, but when I googled, I found several means that sounded like a solution (use a vector instead of an array, configure the compiler so that automatic bounds check is performed, etc.).
When I presented this to the developer, he said it is still not possible.. Not that I don't believe him, but if so, how is language like Java handling this? I thought the JVM performs everytime a bounds check. If so, why can't one do the same thing in C++ manually?
UPDATE
By "Crash proof" I don't mean that the application does not terminate. I mean it should not abruptly terminate without information of what happened (I mean it will dump core etc., but is it not possible to display a message like "Argument x was not valid" etc.?)
You can check the bounds of an array in C++, std::vector::at does this automatically.
This doesn't make your app crash proof, you are still allowed to deliberately shoot yourself in the foot but nothing in C++ forces you to pull the trigger.
No. Even assuming your code is bug free. For one, I have looked at many a crash reports automatically submitted and I can assure you that the quality of the hardware out there is much bellow what most developers expect. Bit flips are all too common on commodity machines and cause random AVs. And, even if you are prepared to handle access violations, there are certain exceptions that the OS has no choice but to terminate the process, for example failure to commit a stack guard page.
By crash I primarily mean forceful termination by the OS upon memory access violation, due to invalid input passed by the user (like an abnormally short junk data).
This is what usually happens. If you access some invalid memory usually OS aborts your program.
However the question what is invalid memory... You may freely fill with garbage all the memory in heap and stack and this is valid from OS point of view, it would not be valid from your point of view as you created garbage.
Basically - you need to check the input data carefully and relay on this. No OS would do this for you.
If you check your input data carefully you would likely to manage the data ok.
I primarily mean forceful termination
by the OS upon memory access
violation, due to invalid input passed
by the user
Not sure who "the user" is.
You can write programs that won't crash due to invalid end-user input. On some systems, you can be forcefully terminated due to using too much memory (or because some other program is using too much memory). And as Remus says, there is no language which can fully protect you against hardware failures. But those things depend on factors other than the bytes of data provided by the user.
What you can't easily do in C++ is prove that your program won't crash due to invalid input, or go wrong in even worse ways, creating serious security flaws. So sometimes[*] you think that your code is safe against any input, but it turns out not to be. Your developer might mean this.
If your code is a function that takes for example a pointer to the image data, then there's nothing to stop the caller passing you some invalid pointer value:
char *image_data = malloc(1);
free(image_data);
image_processing_function(image_data);
So the function on its own can't be "crash-proof", it requires that the rest of the program doesn't do anything to make it crash. Your developer also might mean this, so perhaps you should ask him to clarify.
Java deals with this specific issue by making it impossible to create an invalid reference - you don't get to manually free memory in Java, so in particular you can't retain a reference to it after doing so. It deals with a lot of other specific issues in other ways, so that the situations which are "undefined behavior" in C++, and might well cause a crash, will do something different in Java (probably throw an exception).
[*] let's face it: in practice, in large software projects, "often".
I think this is a case of C++ codes not being managed codes.
Java, C# codes are managed, that is they are effectively executed by an Interpreter which is able to perform bound checking and detect crash conditions.
With the case of C++, you need to perform bound and other checking yourself. However, you have the luxury of using Exception Handling, which will prevent crash during events beyond your control.
The bottom line is, C++ codes themselves are not crash proof, but a good design and development can make them to be so.
In general, you can't make a C++ API crash-proof, but there are techniques that can be used to make it more robust. Off the top of my head (and by no means exhaustive) for your particular example:
Sanity check input data where possible
Buffer limit checks in the data processing code
Edge and corner case testing
Fuzz testing
Putting problem inputs in the unit test for regression avoidance
If "crash proof" only mean that you want to ensure that you have enough information to investigate crash after it occurred solution can be simple. Most cases when debugging information is lost during crash resulted from corruption and/or loss of stack data due to illegal memory operation by code running in one of threads. If you have few places where you call library or SDK that you don't trust you can simply save the stack trace right before making call into that library at some memory location pointed to by global variable that will be included into partial or full memory dump generated by system when your application crashes. On windows such functionality provided by CrtDbg API.On Linux you can use backtrace API - just search doc on show_stackframe(). If you loose your stack information you can then instruct your debugger to use that location in memory as top of the stack after you loaded your dump file. Well it is not very simple after all, but if you haunted by memory dumps without any clue what happened it may help.
Another trick often used in embedded applications is cycled memory buffer for detailed logging. Logging to the buffer is very cheap since it is never saved, but you can get idea on what happen milliseconds before crash by looking at content of the buffer in your memory dump after the crash.
Actually, using bounds checking makes your application more likely to crash!
This is good design because it means that if your program is working, it's that much more likely to be working /correctly/, rather than working incorrectly.
That said, a given application can't be made "crash proof", strictly speaking, until the Halting Problem has been solved. Good luck!