Is there anything that can be done to track the size of every container in my C++ program? I have a program that has to run in real time to actually work, and its memory grows over time, probably because of STL containers that I have forgotten to erase entries from when I am done with them.
They aren't proper memory leaks, but there are plenty of containers into which I write data, do stuff with it for a while, erase entries (which, it seems, I am not doing consistently) and move on. I want to see if I can track down which containers I have forgotten to clean up. I apologize for what I'm sure must be a duplicate question, but I could not for the life of me find an answer.
Limitations:
I have to run my program in real time, or the slowdown will drastically change the course of the program.
I don't have an IDE set up to run my code.
It has to run on Linux.
I am sadly limited to those constraints and can't change them. Any ideas short of grepping for each instance of vector etc. and putting a print statement in the code?
One interesting approach that is potentially quite powerful and performant, but a non-trivial amount of work, is to write a custom allocator. Start with the minimal allocator described here: http://en.cppreference.com/w/cpp/concept/Allocator. You can then add two key things: first, on construction, get a backtrace (e.g. https://panthema.net/2008/0901-stacktrace-demangled/; there are many resources online for this) and log it along with the this pointer; second, on every allocate and deallocate call, log the size of the allocation or deallocation along with the this pointer.
Ultimately, your log file will contain all the allocation and deallocation sizes mapped by pointer, plus a backtrace that will let you figure out which object in your code each pointer corresponds to. You can add additional logging statements to get a sense of where these are occurring relative to program control flow. Then you can easily write a Python script that keeps running sums of allocations and deallocations to see how big your containers are at various points in your program.
Once you write the custom allocator you'll of course need to use it everywhere, which is slightly annoying but not too difficult. Preferably factor out your container types into typedefs in a central header file so it's easy to change back to the standard allocator afterwards.
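A minimal sketch of what such a logging allocator could look like (C++11; the fprintf calls and the tracked_vector typedef are placeholders for whatever logging and naming scheme you prefer, and the backtrace capture from the link above is omitted):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Minimal allocator that logs every allocation/deallocation together with
    // the owning allocator's `this` pointer. fprintf stands in for real logging.
    template <class T>
    struct LoggingAllocator {
        using value_type = T;

        LoggingAllocator() noexcept {
            // A real version would also capture and log a backtrace here.
            std::fprintf(stderr, "allocator %p constructed\n", (void*)this);
        }
        template <class U>
        LoggingAllocator(const LoggingAllocator<U>&) noexcept {}

        T* allocate(std::size_t n) {
            std::fprintf(stderr, "allocator %p: +%zu bytes\n", (void*)this, n * sizeof(T));
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t n) noexcept {
            std::fprintf(stderr, "allocator %p: -%zu bytes\n", (void*)this, n * sizeof(T));
            ::operator delete(p);
        }
    };

    template <class T, class U>
    bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
    template <class T, class U>
    bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

    // Central alias so it is easy to switch back to std::allocator afterwards.
    template <class T>
    using tracked_vector = std::vector<T, LoggingAllocator<T>>;

A small script can then pair the '+' and '-' lines by allocator address and keep running sums to spot the containers that only ever grow.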
Thank you all for the suggestions. You are probably right that the only way is to actually edit the allocator and add my own code.
In the end, since Valgrind was too slow, I tried gperftools, which was also slow but at least fast enough to give me a bit of a clue about what is happening in my code. Thanks all for the suggestions.
This is a standard hacking case: a hack injects into a running process and overwrites process memory using the WriteProcessMemory call. In games this is something you do not want, because it allows the hacker to change parts of the game and give himself an advantage.
It is possible to force the user to run a third-party program along with the game, and I would need to know the best way to prevent such injection. I already tried the function EnumProcessModules, which lists all of a process's DLLs, with no success. It seems to me that the hacks inject directly into process memory (the end of the stack?), so they go undetected. At the moment I have come down to a few options.
Create a blacklist of files, file patterns, process names and memory patterns of the best-known public hacks and scan for them from the program. The problem with this is that I would need to maintain the blacklist and ship program updates covering all available hacks. I also found this useful answer, Detecting memory access to a process, but it is possible that some existing DLL already uses those calls, so there could be false positives.
Use ReadProcessMemory to monitor changes at well-known memory offsets (hacks usually use the same offsets to achieve something). I would need to run a few hacks, monitor their behaviour and collect samples of hack behaviour compared to a normal run.
Would it be possible to somehow rearrange the process memory after it starts? Maybe just pushing the process memory down the stack could confuse the hack.
This is an example of the hack call:
    WriteProcessMemory(phandler, 0xsomeoffset, &datatowrite, ...);
So unless the hack is smart enough to search for the actual start of the process, that alone would already be a great success. I wonder if there is a system call that could relocate the memory or somehow insert some null data in front of the stack.
So, what would be the best way to go about this? It is a really interesting and dark area of programming, so I would like to hear as many interesting ideas as possible. The goal is to either prevent the hack from working or to detect it.
Best regards
From time to time, compute the hash or CRC of the application's image as stored in memory and compare it with a known-good hash or CRC.
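A rough sketch of that idea on Windows (illustrative only; GetModuleInformation needs Psapi.lib, and a real integrity check would hash only the read-only code sections, since writable data changes constantly):

    #include <windows.h>
    #include <psapi.h>    // GetModuleInformation; link with Psapi.lib
    #include <cstddef>
    #include <cstdint>

    // Tiny stand-in checksum (FNV-1a); use a proper CRC32 or hash in practice.
    static uint32_t checksum(const unsigned char* p, std::size_t n) {
        uint32_t h = 2166136261u;
        for (std::size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 16777619u; }
        return h;
    }

    uint32_t checksum_own_image() {
        MODULEINFO mi = {};
        GetModuleInformation(GetCurrentProcess(), GetModuleHandle(NULL),
                             &mi, sizeof(mi));
        // NOTE: this hashes the whole mapped image for brevity; restrict it to
        // the code section in real code so legitimate data writes don't alarm.
        return checksum(static_cast<const unsigned char*>(mi.lpBaseOfDll),
                        mi.SizeOfImage);
    }

Compare the result on a timer against a value computed at startup (or shipped with the game); a mismatch means someone patched the code in memory.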
Our service (http://activation-cloud.com) provides the ability to check the integrity of an application against a signature stored in a database.
I'm writing a program (a theorem prover as it happens) whose memory requirement is "as much as possible, please"; that is, it can always do better by using more memory, for practical purposes without upper bound, so what it actually needs to do is use just as much memory as is available, no more and no less. I can figure out how to prioritize data to delete the lowest value stuff when memory runs short; the problem I'm trying to solve is how to tell when this is happening.
Ideally I would like a system call that returns "how much memory is left" or "are we out of memory yet?"; as far as I can tell, no such thing exists?
Of course, malloc can signal out of memory by returning 0 and new can call a handler; these aren't ideal signals, but would be better than nothing. A problem, however, is that I really want to know when physical memory is running out, so I can avoid going deep into swap and thereby making everything grind to a halt; I don't suppose there's any way to ask "are we having to swap yet?" or tell the operating system "don't swap on my account, just fail my requests if it comes to that"?
Another approach would be to find out how much RAM is in the machine, and monitor how much memory the program is using at the moment. As far as I know, there is generally no way to tell the former? I also get the impression there is no reliable way to tell the latter except by wrapping malloc/free with a bookkeeper function (which is then more problematic in C++).
Are there any approaches I'm missing?
The ideal would be a portable solution, but I suspect that's not going to happen. Failing that, a solution that works on Windows and another one that works on Unix would be nice. Failing that, I could get by with a solution that works on Windows and another one that works on Linux.
I think the most useful and flexible way to use all the memory available is to let the user specify how much memory to use.
Let the user write it in a config file or through an interface, then create an allocator (or something similar) that will not provide more than this memory.
That way, you don't have to gather statistics about the current computer, as those will always be biased by the fact that the OS may be running other programs as well, not to mention the way the OS manages its caches, or how 32- vs 64-bit address spaces limit your allocations, etc.
In the end, human intelligence (assuming the user knows the context of use) is cheaper to implement when it is provided by the user.
To find out how much system memory is still unused, under Linux you can parse the file /proc/meminfo and look for a line starting with "MemFree:". Under Windows you can use GlobalMemoryStatusEx http://msdn.microsoft.com/en-us/library/aa366589%28VS.85%29.aspx
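For example, a small sketch of the Linux side, parsing /proc/meminfo (on newer kernels the MemAvailable line is usually a better estimate than MemFree, but the parsing is identical):

    #include <fstream>
    #include <sstream>
    #include <string>

    // Returns free memory in kilobytes as reported by /proc/meminfo,
    // or -1 if the line could not be found.
    long free_memory_kb() {
        std::ifstream meminfo("/proc/meminfo");
        std::string line;
        while (std::getline(meminfo, line)) {
            if (line.compare(0, 8, "MemFree:") == 0) {
                std::istringstream fields(line.substr(8));
                long kb = -1;
                fields >> kb;            // the value is followed by " kB"
                return kb;
            }
        }
        return -1;
    }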
Relying on malloc to return 0 when no memory is available might cause problems on Linux, because Linux overcommits memory allocations. malloc will usually return a valid pointer (unless the process is out of virtual address space), but accessing the memory it points to may trigger the "OOM killer", a mechanism that kills your process or another process on the system. The system administrator can tune this behavior.
The best solution I can think of might be to query how many page faults have occurred within, say, the last second. If there's a lot of swapping going on, you should probably release some memory, and if not, you can try allocating more memory.
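On Linux, one way to approximate "how much swapping happened in the last second" is to sample the process's major page-fault counter with getrusage and watch how fast it grows; a sketch, assuming the fault rate is an acceptable proxy for swapping:

    #include <sys/resource.h>

    // Major page faults (faults that required disk I/O) since process start.
    long major_faults_so_far() {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_majflt;
    }

    // Call roughly once a second; a large delta suggests the process is being
    // swapped and it is time to release some memory.
    long major_faults_last_interval() {
        static long previous = 0;
        long current = major_faults_so_far();
        long delta = current - previous;
        previous = current;
        return delta;
    }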
On Windows, WMI can probably give you some statistics you can use.
But it's a tough problem, since there is no hard limit you can ask the OS for and then stay below. You can keep allocating memory far beyond the point where you've run out of physical memory, which just means you'll cripple your process with excessive swapping.
So the best you can really do is some kind of approximation.
You can keep allocating memory beyond the point where it is useful to do so, i.e. past the point where the OS has to swap or page out important things. The trouble is that it is not necessarily easy to tell where that point is.
Also, if your task does any (significant) IO, you will need to have some left for the OS buffers.
I recommend just examining how much memory there is in the machine, then allocating an amount as a function of that (a proportion, or leaving some amount free, etc.).
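On Linux, querying the machine's physical RAM for that purpose could look like the sketch below (uses sysconf; _SC_PHYS_PAGES is a glibc extension rather than strict POSIX, so treat it as an assumption):

    #include <unistd.h>

    // Total physical RAM in bytes, via sysconf (works on Linux/glibc).
    long long total_physical_ram() {
        long pages = sysconf(_SC_PHYS_PAGES);
        long page_size = sysconf(_SC_PAGESIZE);
        return static_cast<long long>(pages) * page_size;
    }

    // Example policy: budget 75% of physical RAM for the prover's data.
    // long long budget = total_physical_ram() * 3 / 4;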
Good afternoon all,
What I'm trying to accomplish: I'd like to implement an extension to a C++ unit test fixture to detect if the test allocates memory and doesn't free it. My idea was to record allocation levels or free memory levels before and after the test. If they don't match then you're leaking memory.
What I've tried so far: I've written a routine that reads /proc/self/stat to get the VM size and resident set size. Resident set size seems like what I need, but it's obviously not right: it changes between successive calls to the function even when no memory is allocated. I believe it reflects cached memory in use rather than what is actually allocated. It also changes in 4k increments, so it's too coarse to be of any real use.
I can get the stack size by allocating a local and saving its address. Are there any problems with doing this?
Is there a way to get the real free or allocated memory on Linux?
Thanks
Your best bet may actually be to use a tool specifically designed for the job of finding memory leaks. I have personal experience with Electric Fence, which is easy to use and seems to do the job nicely (not sure how well it handles C++). Dmalloc is also recommended by others.
For sure though, everyone seems to like Valgrind, which can do just about anything and even has front-ends (though anything that has a front-end built for it means that it probably isn't the simplest thing in the world). If the KDE folks can recommend it, it must be able to handle just about anything. (I'm not saying anything bad about KDE, just that it is a very large C++ codebase, so if Valgrind can handle KDE software, it must have something going for it. Though I don't have personal experience with it as Electric Fence was always enough for me)
I'd have to agree with those suggesting Valgrind and similar, but if the run-time overhead is too great, one option may be to use the mallinfo() call to retrieve statistics on currently allocated memory, and check whether uordblks is nonzero.
Note that this will have to be run before global destructors are called - so if you have any allocations that are cleaned up there, this will register a false positive. It also won't tell you where the allocation is made - but it's a good first pass to figure out which test cases need work.
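A sketch of that mallinfo() check (glibc-specific; on recent glibc, mallinfo2() is preferred because mallinfo's counters are plain ints and can overflow):

    #include <malloc.h>

    // Bytes currently allocated via malloc/new, per glibc's allocator stats.
    static long allocated_bytes() {
        struct mallinfo mi = mallinfo();
        return mi.uordblks;
    }

    // In the fixture: record before the test, compare after.
    //   long before = allocated_bytes();
    //   run_test();
    //   long leaked = allocated_bytes() - before;   // nonzero => suspect test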
Don't look to the OS for allocation info. The C library manages memory internally and only asks the OS for more RAM in chunks (4 KB in your case). In most cases, it is never released back to the OS, so you can't really check anything there.
You'll have to patch malloc() and free() to get the info you need.
Or, use Valgrind.
Not a direct answer, but you could redefine the global ::operator new and ::operator delete and, internally (via a singleton or global object), keep track of allocated and deallocated memory.
Edit: If this is a personal, DIY project then cool. But if it's for something critical, you can always jump onto one of the many leak-detection libraries/programs available; a quick Google search should suffice.
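For illustration, a minimal sketch of such replacement operators with a global byte counter (plain forms only; a complete version would also replace the aligned and nothrow overloads):

    #include <atomic>
    #include <cstddef>
    #include <cstdlib>
    #include <new>

    static std::atomic<long long> g_live_bytes{0};      // bytes currently allocated
    static constexpr std::size_t kHeader = alignof(std::max_align_t);

    void* operator new(std::size_t size) {
        // Reserve an aligned header in front of each block to remember its size.
        void* raw = std::malloc(size + kHeader);
        if (!raw) throw std::bad_alloc();
        *static_cast<std::size_t*>(raw) = size;
        g_live_bytes += static_cast<long long>(size);
        return static_cast<char*>(raw) + kHeader;
    }

    void operator delete(void* p) noexcept {
        if (!p) return;
        void* raw = static_cast<char*>(p) - kHeader;
        g_live_bytes -= static_cast<long long>(*static_cast<std::size_t*>(raw));
        std::free(raw);
    }

    long long currently_allocated_bytes() { return g_live_bytes.load(); }

The array forms forward to these by default, so new[]/delete[] are tracked as well.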
google-perftools can be used in your test code.
I'm working on a project that is supposed to be used from the command line with the following syntax:
program-name input-file
The program is supposed to process the input, compute some stuff and spit out results on stdout.
My language of choice is C++ for several reasons I'm not willing to debate. The computation phase will be highly symbolic (think compiler) and will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.
I'm wondering if it is acceptable to forget about freeing memory, given that I expect the entire computation to consume less than the available memory and that the OS is free to reclaim all the memory in one step after the program finishes (assume the program terminates in seconds). What are your feelings about this?
As a backup plan, if ever my project will require to run as a server or interactively, I figured that I can always refit a garbage collector into the source code. Does anyone have experience using garbage collectors for C++? Do they work well?
It shouldn't cause any problems in the specific situation described in the question.
However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.
Sometimes not deallocating memory is the right thing to do.
I used to write compilers. After building the parse tree and traversing it to write the intermediate code, we would simply exit. Deallocating the tree would have:
added a bit of slowness to the compiler, which we of course wanted to be as fast as possible;
taken up code space;
taken time to code and test the deallocators;
violated the "no code executes better than 'no code'" dictum.
HTH! FWIW, this was "back in the day" when memory was non-virtual and minimal, the boxes were much slower, and the first two were non-trivial considerations.
My feeling would be something like "WTF!!!"
Look at it this way:
You chose a programming language that does not include a garbage collector, and we are not allowed to ask why.
You are basically stating that you are too lazy to care about freeing the memory.
Well, WTF again. Laziness isn't a good reason for anything, least of all for playing around with memory without freeing it.
Just free the memory. Not doing so is bad practice; the scenario may change, and there can be a million reasons why you might need that memory freed, while the only reason for not doing it is laziness. Don't pick up bad habits: get used to doing things right, and you'll tend to do them right in the future.
Not deallocating memory should not be a problem, but it is bad practice.
Joel Coehoorn is right:

It shouldn't cause any problems. However, it's not exactly normal. Static analysis tools will complain about it. Most importantly, it builds bad habits.
I'd also like to add that thinking about deallocation as you write the code is probably a lot easier than trying to retrofit it afterwards. So I would probably make it deallocate memory; you don't know how your program might be used in future.
If you want a really simple way to free memory, look at the "pools" concept that Apache uses.
Well, I think that it's not acceptable. You've already alluded to potential future problems yourself. Don't think they're necessarily easy to solve.
Things like "... given that I expect the entire computation to consume less ..." are famous last words. Similarly, retrofitting code with some feature later is one of those things everyone talks about and nobody ever does.
Not deallocating memory might sound good in the short run but can potentially create a huge load of problems in the long run. Personally, I just don't think that's worth it.
There are two strategies. Either you build the GC design in from the very beginning. It's more work, but it will pay off. For a lot of small objects it might pay to use a pool allocator and just keep track of the memory pool. That way, you can keep track of the memory consumption and simply avoid a lot of problems that similar code, but without an allocation pool, would create.
Or you use smart pointers throughout the program from the beginning. I actually prefer this method even though it clutters the code. One solution is to rely heavily on templates, which takes out a lot of redundancy when referring to types.
Take a look at projects such as WebKit. Their computation phase resembles yours since they build parse trees for HTML. They use smart pointers throughout their program.
Finally: “It’s a question of style … Sloppy work tends to be habit-forming.”
– Silk in Castle of Wizardry by David Eddings.
will use pretty complex dynamically allocated data structures. In particular, it's not amenable to RAII style programming.
I'm almost sure that's an excuse for lazy programming. Why can't you use RAII? Is it because you don't want to keep track of your allocations, so there is no pointer to them that you keep? If so, how do you expect to use the allocated memory? There is always a pointer to it somewhere.
Is it because you don't know when it should be released? Leave the memory in RAII objects, each one referenced by something, and they'll all free each other in a trickle-down fashion when the containing object gets freed. This is particularly important if you want to run it as a server one day: each iteration of the server effectively runs one 'master' object that holds all the others, so you can just delete it and all the memory disappears. It also saves you from having to retrofit a GC later.
Is it because all your memory is allocated and kept in-use all the time, and only freed at the end? If so see above.
If you really, really cannot think of a design that does not leak memory, at least have the decency to use a private heap. Destroy that heap before you quit and you'll already have a better design, if a slightly 'hacky' one.
There are instances where memory leaks are ok - static variables, globally initialised data, things like that. These aren't generally large though.
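For what it's worth, a tiny sketch of the 'master object' idea mentioned above (the names are invented for illustration): one object owns everything by value, so destroying it at the end of a run or of a server iteration frees all the memory.

    #include <map>
    #include <string>
    #include <vector>

    // Hypothetical example: everything the computation needs lives inside one
    // object; when it goes out of scope, all of its memory is released.
    struct Computation {
        std::vector<std::string> symbols;
        std::map<std::string, std::vector<int>> tables;
        // ... more owned-by-value containers or unique_ptrs ...
    };

    void handle_one_input(/* input */) {
        Computation work;            // the 'master' object for this iteration
        // fill work.symbols and work.tables, run the computation, print results
    }                                // everything freed here, automatically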
Reference counting smart pointers like shared_ptr in boost and TR1 could also help you manage your memory in a simple manner.
The drawback is that you have to wrap every pointer that uses these objects.
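For instance, with the standard shared_ptr (the boost/TR1 versions are used the same way):

    #include <memory>
    #include <vector>

    struct Node {
        int value = 0;
        std::vector<std::shared_ptr<Node>> children;   // shared ownership
    };

    int main() {
        auto root = std::make_shared<Node>();
        root->children.push_back(std::make_shared<Node>());
        // No explicit delete anywhere: when 'root' goes out of scope and the
        // last reference to each Node disappears, the whole tree is freed.
    }

One thing to watch for with reference counting is cycles; back-pointers should be weak_ptr (or raw pointers) so the count can actually reach zero.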
I've done this before, only to find that, much later, I needed the program to be able to process several inputs without separate commands, or that the guts of the program were so useful that they needed to be turned into a library routine that could be called many times from within another program that was not expected to terminate. It was much harder to go back later and re-engineer the program than it would have been to make it leak-less from the start.
So, while it's technically safe as you've described the requirements, I advise against the practice since it's likely that your requirements may someday change.
If the run time of your program is very short, it should not be a problem. However, being too lazy to free what you allocate and losing track of what you allocate are two entirely different things. If you have simply lost track, it's time to ask yourself whether you actually know what your code is doing to the computer.
If you are just in a hurry or lazy and the lifetime of your program is small in relation to what it actually allocates (i.e. allocating 10 MB per second is not small if the program runs for 30 seconds), then you should be OK.
The only 'noble' argument about freeing allocated memory arises when a program exits: should one free everything just to keep Valgrind from complaining about leaks, or just let the OS do it? That depends entirely on the OS and on whether your code might become a library rather than a short-running executable.
Leaks during run time are generally bad, unless you know your program will run for a short time and will not cause other programs, far more important than yours as far as the OS is concerned, to skid into heavy paging.
What are your feelings about this?
Some O/Ses might not reclaim the memory, but I guess you're not intending to run on those O/Ses.
As a backup plan, if ever my project will require to run as a server or interactively, I figured that I can always refit a garbage collector into the source code.
Instead, I figure you can spawn a child process to do the dirty work, grab the output from the child process, let the child process die as soon as possible after that and then expect the O/S to do the garbage collection.
I have not personally used it, but since you are starting from scratch you may wish to consider the Boehm-Demers-Weiser conservative garbage collector.
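Typical usage is roughly as follows (a sketch, assuming the collector is installed and you link with -lgc; C++ code usually also uses gc_cpp.h or gc_allocator, not shown here):

    #include <gc.h>       // Boehm-Demers-Weiser collector, link with -lgc
    #include <cstdio>

    int main() {
        GC_INIT();                                 // initialize the collector
        for (int i = 0; i < 1000000; ++i) {
            void* p = GC_MALLOC(1024);             // collected automatically,
            (void)p;                               // no free() calls needed
        }
        std::printf("GC heap size: %lu\n",
                    (unsigned long)GC_get_heap_size());
        return 0;
    }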
The answer really depends on how large your program will be and what performance characteristics it needs to exhibit. If you never deallocate memory, your process's memory footprint will be much larger than it would otherwise be. Depending on the system, this could cause a lot of paging and slow down performance for you or for other applications on the system.
Beyond that, what everyone above says is correct. It probably won't cause harm in the short term, but it's a bad practice that you should avoid. You'll never be able to reuse the code. Trying to retrofit a GC afterwards will be a nightmare: just think about going to every place you allocate memory and trying to retrofit it without breaking anything.
One more reason to avoid doing this: reputation. If you fail to deallocate, everyone who maintains the code will curse your name and your rep in the company will take a hit. "Can you believe how dumb he was? Look at this code."
If it is non-trivial for you to determine where to deallocate the memory, I would be concerned that other aspects of the data structure manipulation may not be fully understood either.
Apart from the fact that the OS (kernel and/or C/C++ library) can choose not to free the memory when execution ends, your application should always free allocated memory properly as a matter of good practice. Why? Suppose you decide to extend that application or reuse the code; you'll quickly get into trouble if the code you previously wrote hogs memory unnecessarily after finishing its job. It's a recipe for memory leaks.
In general, I agree it's a bad practice.
For a one-shot program it can be OK, but it kind of looks like you don't know what you are doing.
There is one solution to your problem, though: use a custom allocator that preallocates larger blocks from malloc, and then, after the computation phase, instead of freeing all the little blocks from your custom allocator, just release the larger preallocated blocks of memory. Then you don't need to keep track of all the objects you need to deallocate and when. One guy who also wrote a compiler explained this approach to me many years ago, so if it worked for him, it will probably work for you as well.
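A minimal sketch of that arena idea (bump-allocate out of large malloc'd blocks, never free individual objects, release the blocks wholesale after the computation; it assumes no single allocation exceeds the block size):

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    // Small bump-pointer arena: individual objects are never freed; all blocks
    // are released at once when the computation phase is over.
    class Arena {
    public:
        void* allocate(std::size_t n) {
            // Round up so every object stays suitably aligned.
            n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
            if (blocks_.empty() || used_ + n > kBlockSize) {
                blocks_.push_back(static_cast<char*>(std::malloc(kBlockSize)));
                used_ = 0;
            }
            void* p = blocks_.back() + used_;
            used_ += n;
            return p;
        }
        void release_all() {                // called once, after the computation
            for (char* b : blocks_) std::free(b);
            blocks_.clear();
            used_ = 0;
        }
        ~Arena() { release_all(); }
    private:
        static constexpr std::size_t kBlockSize = 1 << 20;   // 1 MiB blocks
        std::vector<char*> blocks_;
        std::size_t used_ = 0;
    };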
Try to use automatic variables in methods so that they will be freed automatically from the stack.
The only useful reason not to free heap memory is to save the tiny amount of computational power used by free(). You might lose any advantage if page faults become an issue due to large virtual memory needs with small physical memory resources. Some factors to consider are:
Are you allocating a few huge chunks of memory or many small chunks?
Does the memory need to be locked into physical memory?
Are you absolutely positive the code and memory needed will fit into 2 GB on a Win32 system, including memory holes and padding?
That's generally a bad idea. You might encounter cases where the program tries to consume more memory than is available. Plus, you risk being unable to start several copies of the program.
You can still do this if you don't care about the mentioned issues.
When you exit a program, the memory it allocated is automatically returned to the system, so you can get away with not deallocating the memory you allocated.
But deallocation becomes necessary when you move to bigger programs, such as an OS or embedded systems, where the program is meant to run forever and hence even a small memory leak can be harmful.
Hence it is always recommended to deallocate the memory you have allocated.