Force garbage collection/compaction with malloc() - C++

I have a C++ program that benchmarks various algorithms on input arrays of different length. It looks more or less like this:
# (1)
for k in range(4, 20):
    # (2)
    input = generate 2**k random points
    for variant in variants:
        benchmark the following call:
            run variant on input array
# (3)
Is it possible to reset the whole heap management at (2) to the state it had at (1)? All heap memory allocated during the run is guaranteed to have been freed by (3).
I am using g++ 4.3 on Linux.
Edit: I understand that there is no real garbage collection in C/C++. What I want is to force the memory allocator to coalesce the adjacent empty chunks in its free list at (2).

If you want the test runs to start from the same heap state, you can run each of them in its own process created by fork().
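As a minimal sketch of that idea (run_benchmark is a hypothetical stand-in for your own benchmark code): each child inherits exactly the heap state the parent had at the moment of fork(), so as long as the parent itself allocates nothing in the loop, every run starts from identical conditions.

#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

void run_benchmark(int k, int variant);   // hypothetical: your benchmark code

void benchmark_in_child(int k, int variant) {
    pid_t pid = fork();
    if (pid == 0) {                       // child: heap state as of the fork
        run_benchmark(k, variant);
        _exit(EXIT_SUCCESS);              // terminate without running cleanup
    }
    int status;
    waitpid(pid, &status, 0);             // parent: serialize the runs
}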

I think there's a simple solution to your problem: move the outer loop out of your application and into a shell script or another driver program, and pass k (and any other parameters) to the benchmarked app on the command line. This way you can be sure that all executions start from identical conditions.
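On the app side, that might look like the sketch below (the bench name and usage line are just examples); the driver can then be as simple as a shell loop such as for k in $(seq 4 19); do ./bench $k; done.

#include <cstddef>
#include <cstdlib>
#include <iostream>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: bench <k>\n";
        return EXIT_FAILURE;
    }
    int k = std::atoi(argv[1]);
    std::size_t n = std::size_t(1) << k;   // 2**k input points
    // ... generate n random points, run each variant, print timings ...
    (void)n;                               // placeholder for the real work
    return EXIT_SUCCESS;
}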

There is no way of doing this in Standard C++ short of implementing your own versions of new and delete with their own heap management. An alternative is not to use raw arrays but std::vector instead; you can then supply a custom allocator to do the heap management.
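For reference, a minimal custom allocator needs only a handful of members (this assumes a C++11 compiler, so newer than the g++ 4.3 in the question, and all names here are hypothetical); a real one could carve its memory out of a private arena that you discard between runs.

#include <cstdlib>
#include <new>
#include <vector>

template <class T>
struct BenchAllocator {
    using value_type = T;
    BenchAllocator() = default;
    template <class U> BenchAllocator(const BenchAllocator<U>&) {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T)))   // swap in arena logic here
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};
template <class T, class U>
bool operator==(const BenchAllocator<T>&, const BenchAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const BenchAllocator<T>&, const BenchAllocator<U>&) { return false; }

// usage: std::vector<int, BenchAllocator<int>> input;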

What do you mean? There is no garbage collection in C, and certainly no compaction.
To "reset the state of the heap", you have to call free() for every malloc() call. And as I understand your code, you do that already.
Compaction is pretty much impossible. Unlike higher-level languages like Java or C#, you cannot change the address of an object, because any pointers to it would be invalidated.

There's no automatic way; you have to manually delete whatever is on the heap to get back to the state of (1).

There are a few pieces of garbage collection code out there. Look at perl/python/lua/ruby/mono/parrot/boehm/pike/slate/self/io etc.
Also look at alloca() and dynamic arrays. Also consider using structs to implement your own destructors, or using GCC attributes to call free() when a variable goes out of scope.

Related

Should I free long-lived memory that would normally be freed at the very end of the program?

I am currently writing a library that parses some structured binary data into a set of objects. These objects are expected to outlive any user code, and would normally be freed at or after the end of the main function.
I am using shared (and weak) pointers to manage the memory of each object, but it is causing a lot of added complexity to the program, and raises structural issues that I will not get into in this particular question.
Considering that:
traversing the entirety of the binary data is expensive and I cannot afford to do it more than one time,
each visited entry is used to build an object, that then gets registered (i.e. added into the set),
entries in the binary data may rely on other entries that appear later; such an entry gets parsed immediately when first referenced, and registered when it is visited again,
duplicate entries may appear at any moment, but I need to merge those duplicates into one instance (and update any pointer referencing those duplicates to the new merged entry) before registration,
every single one of those objects is guaranteed to be of one of many POD types deriving from a common class, so nothing except memory needs to be cleaned up,
the resulting program will run on a modern OS (or, at least, one that reclaims memory from dead processes),
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
What would be the best course of action?
If you're writing reusable code, you need to at least provide the option of cleaning up. What if some program uses your library for one operation, and then continues running? It's not safe to assume that the process exits immediately after your library's task is complete.
The other answers cover the general and standard approach: in an ideal world, yes, you'd clean up your memory, because it makes the code more generic and more reusable and helps with tooling. As others have said, std::unique_ptr for owning pointers and raw pointers for non-owning pointers should work well.
There are a couple of more specialized approaches that may or may not be useful:
Use a pool allocator (such as Boost.Pool, or roll your own) to allocate a bunch of memory up front then dole out pieces of it for your objects. You can then free every object at once by deleting the pool.
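As a rough illustration of the pool/region idea (this is a hand-rolled sketch, not Boost.Pool's actual interface): every object borrows memory from the arena, and destroying the arena frees them all at once. Destructors are never run, which is fine here since the question says the objects are POD-like.

#include <cstddef>
#include <new>
#include <vector>

class Arena {
    std::vector<char*> blocks_;
    char* cur_ = nullptr;
    std::size_t left_ = 0;
    static const std::size_t kBlock = 1 << 20;          // 1 MiB chunks
public:
    void* allocate(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) & ~(a - 1);                     // keep alignment
        if (n > left_) {                                // need a fresh chunk
            left_ = n > kBlock ? n : kBlock;
            cur_ = new char[left_];
            blocks_.push_back(cur_);
        }
        void* p = cur_;
        cur_ += n;
        left_ -= n;
        return p;
    }
    ~Arena() {                                          // one shot frees all
        for (std::size_t i = 0; i < blocks_.size(); ++i) delete[] blocks_[i];
    }
};

// usage: Entry* e = new (arena.allocate(sizeof(Entry))) Entry(...);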
Intentionally not freeing memory is occasionally a valid technique. See, e.g., "Increasing Compiler Performance by Over 75%", by Walter Bright. Of course, a compiler is a specialized problem domain, and Walter Bright is probably one of the top compiler developers alive, so techniques that work for his problem domain shouldn't be blindly applied elsewhere.
the resulting program will run on a modern OS (or, at least, one that reclaims memory from dead processes)
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
If you take this approach, then anyone who uses your library and then uses valgrind to try to detect memory leaks in their program will report massive leaks coming from your library and complain to you about it, so if I were you I definitely would not do this.
If you are writing a library then you should provide a cleanup function that frees all memory that you allocated.
A practical example of why this is useful is if a Windows DLL uses your library. When the library is loaded, static data is initialized. When the library is unloaded, static data is cleared. If your library has some global pointers to memory that is never freed, then load-unload cycles of the DLL will leak memory.
If the objects are all of the same type, then rather than allocating each one independently, you could just put them all into a vector and have them refer to each other by index number instead of using pointers. The vector's built-in memory management takes care of allocating space as needed, and when you're done with the objects, you can just destroy the vector to deallocate them all at once. (Note that vector::clear() doesn't actually free the memory, though it does make it available to store a new set of objects in the vector.)
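A tiny sketch of the index-based scheme (Entry and its field names are hypothetical):

#include <cstddef>
#include <string>
#include <vector>

struct Entry {
    std::string name;
    std::size_t parent;       // index into the same vector, not a pointer
};

int main() {
    std::vector<Entry> entries;
    entries.push_back(Entry{"root", 0});
    entries.push_back(Entry{"child", 0});   // refers to entries[0] by index
    // ... build and use the object graph ...
}   // destroying the vector releases every Entry at once

A nice side effect: indices stay valid even when the vector reallocates, which is exactly where raw pointers into the vector would dangle.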
If your objects aren't all the same type, you'll want to look into the more general concept of region-based memory management. As above, the idea is that you can allocate all your objects in a relatively small number of memory chunks (possibly just one), which can be freed later without having to track all the individual objects allocated within.
If your ownership and lifetimes are clear I suggest you use unique_ptr for the owning pointers and raw pointers for the non-owning pointers. It should be less complex than shared_ptr and weak_ptr whilst still managing memory automatically.
I don't think not managing memory at all is an option. But using smart pointers to express ownership is not just about good memory management; it also makes the code easier to reason about.
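A minimal sketch of that split, with hypothetical names: the registry owns everything through unique_ptr and hands out raw pointers, which are understood to be non-owning.

#include <memory>
#include <vector>

struct Node { Node* linked = nullptr; };     // hypothetical parsed object;
                                             // `linked` is a non-owning view

class Registry {
    std::vector<std::unique_ptr<Node>> nodes_;   // sole owner of every Node
public:
    Node* add() {
        nodes_.push_back(std::unique_ptr<Node>(new Node));
        return nodes_.back().get();          // caller must not delete this
    }
};  // destroying the Registry frees every Node exactly once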
Try to think of future maintenance work. Suppose your code needs to be broken up, or other work needs to happen after it. In that case you're opening yourself up to leaks or to being a resource hog later down the line.
Cleaning up (or being able to do so) is good. It may seem obvious now that an application should work with a single structured binary dataset throughout its entire lifetime, but you'll start kicking yourself once you realize you need to write an application that needs to reset half-way through and start over with another dataset.
(a related thing that's easy to overlook is that an application may need to work with two completely independent datasets at the same time, so try not to design your library to exclude that use case!)
That said, I think you may be focusing too much on the extremes. Code that shouldn't participate in memory management can use raw pointers, and this is reasonable when there is no risk of these pointers outliving your structured dataset in memory.
However, that doesn't mean that code that does participate in memory management needs to use raw pointers too. You can use smart pointers to manage your data structures even if you are passing raw pointers out to the user.
That aside, keep in mind that, in my experience, pointers are usually the wrong semantics: most use cases are served most naturally by reference or value semantics, which means you should be passing around raw references, or lightweight wrapper classes that have reference or value semantics but are implemented as containing a pointer to the actual data. Or even a copy of the actual data, if appropriate.

Best way of dynamically assigning variables in a C++ interpreter

I'm currently working on an interpreter I created that uses a pseudo c++ syntax. I am looking for the best way of storing variables created by the interpreter.
Currently I am using dynamic arrays that store pointers to those variables, but surely there's a better way? Maybe some sort of inline assembler code to control a memory block?
I'm not too concerned about portability as I am willing to rewrite those pieces of code for every major OS. I am simply looking for a way to create a memory block without it being locked to a single type. For my current testing I am using the MingW compiler on Windows.
Any ideas will be greatly appreciated.
I'd say that what you can do in an interpreter depends very much on how your language works. Provided that this is a true interpreter and you don't have any precompile step, you'd typically have two sorts of allocations: stack and heap allocations. If you support allocating things on the stack, you should implement this as a stack in your interpreter.
Use a vector<char> as a stack buffer. Keep track of each scope being entered, place a marker on the stack. When you encounter a stack allocated variable, grow the stack to accommodate the new local variable. Use placement new to initialize the object if that is required.
Add it to some sort of dictionary to match the variable name to the memory space so that your code knows where to find the name given the context. Like a symbol table really, only kept at run-time.
Once you encounter a scope end you will pop the stack of all the locally allocated symbols and call destructors if necessary. Also remove all the entries from the symbol table, since they are no longer in scope. This way you're avoiding the heap allocation entirely for objects that aren't used on the heap.
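Putting those pieces together, a rough sketch might look like this (all names are hypothetical, and for brevity only one variable type is handled; a real interpreter would record each slot's type so it can run the right destructor when the scope is popped, and would reserve the buffer up front or use offsets everywhere, since growing a vector<char> can move it and invalidate raw pointers into it):

#include <cstddef>
#include <map>
#include <new>
#include <string>
#include <vector>

class InterpreterStack {
    std::vector<char> buf_;                       // the interpreter's "stack"
    std::vector<std::size_t> marks_;              // where each scope began
    std::map<std::string, std::size_t> symbols_;  // name -> offset into buf_
public:
    void enterScope() { marks_.push_back(buf_.size()); }

    double* declare(const std::string& name) {    // stack-allocate a local
        std::size_t off = buf_.size();
        buf_.resize(off + sizeof(double));        // grow the stack
        symbols_[name] = off;
        return new (&buf_[off]) double(0.0);      // placement-new in place
    }

    double* lookup(const std::string& name) {
        return reinterpret_cast<double*>(&buf_[symbols_[name]]);
    }

    void leaveScope() {                           // pop all locals at once
        buf_.resize(marks_.back());               // (run destructors and drop
        marks_.pop_back();                        //  symbol entries here too)
    }
};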
You don't need inline assembler code to do that. You can use a vector<char> as buffer and construct your objects within that using placement new. Note that with this technique you enter the realm of manual allocation management, which brings with it a host of problems, like dealing with fragmentation.
Easiest solution I've found is std::map<std::string, Variant>. The string stores the variable name, the Variant is a typedef for a boost::variant<all-interpreter-types>. That allows for code as easy as globals["foo"]=1; (sets interpreter variable foo to int, 1).
Sure, you could write your own code to do roughly the same, but then you have to worry about memory.
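A compact sketch of that store, using std::variant (the C++17 standard equivalent of boost::variant) and an example set of interpreter types:

#include <iostream>
#include <map>
#include <string>
#include <variant>

using Value = std::variant<int, double, std::string>;  // example type set

int main() {
    std::map<std::string, Value> globals;
    globals["foo"] = 1;                    // interpreter variable foo = int 1
    globals["bar"] = std::string("hi");    // same map, different runtime type
    std::cout << std::get<int>(globals["foo"]) << '\n';   // prints 1
}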

Why should C++ programmers minimize use of 'new'?

I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is claimed automatically at the end of the scope. As soon as execution of the current code block (delimited using {}) is completed, memory for all variables in that block is automatically collected. This is also the moment when destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from a function and keep the allocated memory block.
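A sketch of that readfile function: the returned std::string owns heap memory, so the file contents safely outlive the function's stack frame.

#include <fstream>
#include <sstream>
#include <string>

std::string readfile(const std::string& path) {
    std::ifstream in(path.c_str());
    std::ostringstream contents;
    contents << in.rdbuf();     // size unknown at compile time: the string's
    return contents.str();      // heap buffer grows as needed
}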
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
int main(int argc, char* argv[])
{
    std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main(int argc, char* argv[])
{
    std::string* program = new std::string(argv[0]); // Bad!
    delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("foo_bar");
}

Line::~Line() {
    delete mString;
}
is actually a lot riskier to use than the following one:
class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "foo_bar";
    // note: there is a cleaner way to write this.
}
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main()
{
    Line l1;
    Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory and both will be released at the end of the program.
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
    Line borders[4];
};
Then
int main()
{
    Table table;
}
allocates four std::string instances, four Line instances, one Table instance and all the strings' contents, and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate space—on the stack—for every local scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" returns early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions (typically as a reference or const-reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis)). And so forth.
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must be eventually deleted lest they leak. The destructor won't be called, memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e. on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is auto-freed upon function return.
Smart pointers like unique_ptr, shared_ptr solve the dangling reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; there can be a performance impact for overusing new. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns - you cannot pass a reference to those back to the caller, only by copying, returning or moving by value.
C++ doesn't employ any memory manager of its own. Other languages like C# and Java have a garbage collector to handle the memory.
C++ implementations typically use operating system routines to allocate the memory, and too much new/delete could fragment the available memory.
With any application, if the memory is frequently being used, it's advisable to preallocate it and release it when no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within the scope of a function is a proven technique.
The downside of using stack objects is that it creates multiple copies of objects when returning them, passing them to functions, and so on. However, smart compilers are well aware of these situations and have been optimized well for performance.
It's really tedious in C++ if memory is allocated and released in two different places. The responsibility for release is always a question, and mostly we rely on some commonly accessible pointers, stack objects (as much as possible) and techniques like auto_ptr (RAII objects).
The best thing is that you have control over the memory; the worst thing is that you will not have any control over the memory if you employ improper memory management in the application. The crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for doing as few new's as possible are missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or it may already have a suitable memory location ready; we don't know. If your program needs to have consistent and predictable execution time (like in a real-time system or game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization
Yes, you heard me. Your OS needs to make sure your page tables are consistent and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads you are actually serialising your threads (I've done this with 32 CPUs, each hitting on new to get a few hundred bytes each, ouch! That was a royal p.i.t.a. to debug.)
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
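A self-contained illustration with stand-in types (T1, T2 and foo mirror the example above):

#include <memory>

struct T1 {};
struct T2 {};
void foo(std::shared_ptr<T1>, std::shared_ptr<T2>) {}

int main() {
    // No window exists in which a raw pointer lives outside a shared_ptr,
    // so an exception from either construction cannot leak the other object.
    foo(std::make_shared<T1>(), std::make_shared<T2>());
}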
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, in this case ensuring that each call to new() must be immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that, by the time the second new() is called, it is guaranteed that the first object has already been wrapped in its smart pointer, thus preventing any leaks in case an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to @Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
    File foo = File("foo.dat");
    // Do things
}
Now, observe that when you fall off that block after the end-brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collection to find it.
Had you written
{
    File * foo = new File("foo.dat");
you would want to match it explicitly with
    delete foo;
}
or even better, allocate your File * as a "smart pointer". If you aren't careful about that it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File is something like:
class File {
private:
    FileImpl * fd;
public:
    File(String fn) { fd = new FileImpl(fn); }
};

Then FileImpl will still be allocated on the heap.
And yes, you'd better be sure to have
~File(){ delete fd ; }
in the class as well; without it, you'll leak memory from the heap even if you didn't apparently allocate on the heap at all.
new() shouldn't be used as little as possible. It should be used as carefully as possible. And it should be used as often as necessary as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model then there's no need to use new(), with the associated delete() and checking of NULL pointers.
In the case where you have lots of short-lived objects, allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope then new() is the right answer. Just make sure that you pay attention to when and how you call delete() and the possibilities of NULL pointers, using deleted objects and all of the other gotchas that come with the use of pointers.
When you use new, objects are allocated to the heap. It is generally used when you anticipate expansion. When you declare an object such as,
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens up the potential for memory leaks. Objects placed on the stack are not prone to memory leaks!
One notable reason to avoid overusing the heap is for performance -- specifically involving the performance of the default memory management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order leads not only to memory fragmentation, but it also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve, allowing you to mitigate the inherent disadvantages of traditional heap implementations, while still allowing you to use the heap as necessary.
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous. (int *i; i = new int[9999];? really? int i[9999]; is much clearer.) I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects don't typically take up much space in memory (one byte per character, in a string). And if they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusion and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable - it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements. So the original poster's code actually has absolutely no need for new.
I think the poster meant to say "You do not have to allocate everything on the heap" rather than on the stack.
Basically, objects are allocated on the stack (if the object size allows, of course) because of the cheap cost of stack allocation, as opposed to heap-based allocation, which involves quite some work by the allocator and adds verbosity because you then have to manage the data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1,ctor_arg2} for a constructor with arguments). That's why usually you just have no reason to want to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values the number of which is not known at compile time.
Due to limitations of C++ implementations on common machines: to prevent a stack overflow caused by allocating too much space for values created the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to new, but ensure that memory gets released where the "pointer" goes out of scope.
and for this reason, it is an official item in the C++ Core Guidelines to avoid explicit new and delete: Guideline R.11.
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is that the library we are using provides value semantics and makes dynamic allocation unnecessary; std::string is a good example.
For object-oriented code, however, using a pointer (which means using new to create the object beforehand) is a must. In order to simplify the complexity of resource management, we have dozens of tools to make it as simple as possible, such as smart pointers. The object-based paradigm or generic paradigm assumes value semantics and requires little or no new, just as the posters elsewhere stated.
Traditional design patterns, especially those mentioned in GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest-to-read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar: it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have already been encoded into various classes. Furthermore, if you need a usage pattern for which there isn't already a standard class, you can write your own class that encodes it!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point in addition to all the above correct answers: it depends on what sort of programming you are doing. Kernel development on Windows, for example: the stack is severely limited, and you might not be able to take page faults as you can in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.

special mode for no free() on delete's? C++

I know this will sound weird, but I need my app to run fast and it does a lot of new and delete. Every function calls new and passes the pointer back, except for the ones pushing a pointer onto a list or deque.
At the end of the main loop the program goes across all of that memory and deletes it (unless I forgot to delete some of it). I am not exaggerating. Is there a mode that allows my code to allocate objects with new but that doesn't release memory on delete, instead just marking it as unused so the next new for that struct will reuse it instead of doing a full allocation?
I imagine that would boost performance. The program isn't fully done, so I can't benchmark yet, but I am sure I'd see a boost, and if this were automatic, then great. Is there such a mode or flag I can use?
I am using gcc (linux, win) and MSVC2010(win).
Try object pooling via Boost - http://www.boost.org/doc/libs/1_44_0/libs/pool/doc/index.html
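A quick sketch of what using Boost's object pool looks like (Particle is a hypothetical stand-in for one of your frequently allocated structs):

#include <boost/pool/object_pool.hpp>

struct Particle { double x, y, z; };      // hypothetical hot-path type

int main() {
    boost::object_pool<Particle> pool;
    Particle* p = pool.construct();       // served from the pool, not malloc
    pool.destroy(p);                      // slot is marked unused for reuse
}   // the pool destructor releases all remaining memory in one go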
What do you mean by "end of the main loop" - after the loop finishes, or just before it repeats?
If the former, then you can safely leave memory allocated when your process exits, although it isn't recommended. The OS will recover it, probably faster than you'd do by deleting each object. Destructors won't be called (so if they do anything important other than freeing resources associated with the process, then don't do this). Debugging tools will tell you that you have memory leaks, which isn't very satisfactory, but it works on the OSes you name.
If the latter, then "marking the memory unused so that the next new will use it" is exactly what delete does (well, after destructors). Some special-purpose memory allocators are faster than general-purpose allocators, though. You could try using a memory pool allocator instead of the default new/delete, if you have a lot of objects of the same size.
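To make that last point concrete, here is a rough sketch of a class-specific free list (names hypothetical): operator delete just marks the block unused, and the next operator new for that struct reuses it instead of doing a full allocation, which is exactly the behaviour asked for.

#include <cstddef>
#include <new>

struct Obj {
    double payload[4];                        // whatever the struct holds

    static void* operator new(std::size_t size) {
        if (freeList) {                       // reuse a block marked unused
            void* p = freeList;
            freeList = freeList->next;
            return p;
        }
        return ::operator new(size);          // fall back to the real heap
    }
    static void operator delete(void* p) {
        if (!p) return;
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = freeList;                // mark unused; keep the memory
        freeList = node;
    }

    struct FreeNode { FreeNode* next; };
    static FreeNode* freeList;
};
Obj::FreeNode* Obj::freeList = 0;

This sketch assumes sizeof(Obj) >= sizeof(FreeNode) and never returns memory to the system; a production version would also need to handle alignment and, if it matters, thread safety.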
"I imagine that would boost performance"
Unfortunately we can't get performance boosts just by imagining them ;-p Write the code first, measure performance, then worry about changing your allocation once you know what you're up against. "Faster" is pretty much useless if the boring, simple version of your code is already "easily fast enough". You can usually change your allocation mechanism without significant changes to the rest of your code, so you don't have to worry about it in up-front design.
What you are describing is what malloc and co. usually do: keeping memory around and reallocating it for similar-sized allocations.
I believe what you are looking for is "placement new".
Use operator new to allocate raw memory for the object only once, and keep the pointer. Later on, just reuse it as follows:
#include <new> // required for placement new

Type* ptr = static_cast<Type*>(operator new(sizeof(Type))); // allocate once; store this pointer
Type* next_ptr = new (ptr) Type(); // construct in place; no allocation happens here
Manually call the destructor instead of delete:
next_ptr->~Type(); // destroys the object but keeps the memory for reuse
Since no memory allocation happens, this should definitely be fast; "how fast" I am not sure.
Using a memory pool is what you are looking to achieve.
You could also use a few of the Windows heap allocation methods, and instead of freeing each individual allocation, you could just free the entire heap all at once. Though if you are using a memory profiling tool (like BoundsChecker) it will think it's a problem.

memory management issues in C++

I would like to know the common memory management issues associated with C and C++, and how we can debug these errors.
Here are a few I know:
1) uninitialized variable use
2) delete a pointer two times
3) writing array out of bounds
4) failing to deallocate memory
5) race conditions
1) malloc can return a NULL pointer; check for it. (In C++ you also need to cast the void* result to the pointer type you want.)
2) for strings, you need to allocate an extra byte for the terminating NUL character.
3) double pointers.
4) delete doesn't pair with malloc, and free doesn't pair with new; never mix the two families.
5) check what the function actually returns (its return code) on failure, and free any already-allocated memory if it fails.
6) check the size when allocating memory, e.g. malloc(strlen(s) + 1).
7) check how you pass a double pointer (**ptr) to a function.
8) check data sizes to avoid undefined behaviour in function calls.
9) handle failure to allocate memory.
Use RAII (Resource Acquisition Is Initialization). You should almost never be using new and delete directly in your code.
Preemptively preventing these errors in the first place:
1) Turn warnings up to error level to catch the uninitialized-variable errors. Compilers will frequently issue such warnings, and by having them treated as errors you'll be forced to fix the problem.
2) Use smart pointers. You can find good versions of such things in Boost.
3) Use vectors or other STL containers. Don't use arrays unless you're using one of the Boost variety.
4) Again, use a container object or smart pointer to handle this issue for you.
5) Use immutable data structures everywhere you can and place locks around modification points for shared mutable objects.
Dealing with legacy applications
1) Same as above.
2) Use integration tests to see how different components of your application play out. This should find many cases of such errors. Seriously consider having a formal peer review done by another group writing a different segment of the application who would come into contact with your naked pointers.
3) You can overload the new operator so that it allocates one extra word before and after an object. These bytes should then be filled with some easily identifiable value such as 0xDEADBEEF. All you then have to do is check the bytes before and after the object to see if and when your memory is being corrupted by such errors (see the sketch after this list).
4) Track your memory usage by running various components of your application many times. If your memory grows, check for missing deallocations.
5) Good luck. Sorry, but this is one of those things that can work 99.9% of the time and then, boom! The customer complains.
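Here is a rough sketch of the guard idea from point 3 (the names and the fixed-size API are simplifications; a real version would stash the block size in a header instead of making the caller pass it back, and would check malloc's result):

#include <cassert>
#include <cstdlib>
#include <cstring>

static const unsigned GUARD = 0xDEADBEEF;   // easily identifiable sentinel

void* debug_alloc(std::size_t n) {
    char* raw = static_cast<char*>(std::malloc(n + 2 * sizeof(GUARD)));
    std::memcpy(raw, &GUARD, sizeof(GUARD));                      // front guard
    std::memcpy(raw + sizeof(GUARD) + n, &GUARD, sizeof(GUARD));  // back guard
    return raw + sizeof(GUARD);              // hand out the middle
}

void debug_free(void* p, std::size_t n) {
    char* raw = static_cast<char*>(p) - sizeof(GUARD);
    assert(std::memcmp(raw, &GUARD, sizeof(GUARD)) == 0);         // underrun?
    assert(std::memcmp(raw + sizeof(GUARD) + n,
                       &GUARD, sizeof(GUARD)) == 0);              // overrun?
    std::free(raw);
}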
In addition to all already said, use valgrind or Bounds Checker to detect all of these errors in your program (except race conditions).
The best technique I know of is to avoid doing pointer operations and dynamic allocation directly. In C++, use reference parameters in preference to pointers. Use STL objects rather than rolling your own lists and containers. Use std::string instead of char *. Failing all that, take Rob K's advice and use RAII wherever you need to do allocations.
For C, there are some similar things you can try to do, but you are pretty much doomed. Get a copy of Lint and pray for mercy.
Use a good compiler and set warning level to max
Wrap new/malloc and delete/free and bookkeep all allocations/deallocations
Replace raw arrays with an array class that does bounds checking (or use std::vector) (harder to do in C)
See 2.
This is hard; there exist some special debuggers, such as Jinx, that specialize in this, but I don't know how good they are.
Make sure you understand when to place objects on the heap and when on the stack. As a general rule, only put objects on the heap if you must; this will save you lots of trouble. Learn the STL and use the containers provided by the standard library.
Take a look at my earlier answer to "Any reason to overload global new and delete?" You'll find a number of things here that will help with early detection and diagnosis, as well as a list of helpful tools. Most of the tools and techniques can be applied to either C or C++.
It's worth noting that valgrind's memcheck will spot 4 of your items, and helgrind may help spot the last (data races).
One common pattern I use is the following.
I keep the following three private variables in all allocator classes:
size_t news_;
size_t deletes_;
size_t in_use_;
In the allocator constructor, all of these three are initialized to 0.
Then on,
whenever allocator does a new, it increments news_, and
whenever allocator does a delete, it increments deletes_
Based on that, I put a lot of asserts in the allocator code, such as:
assert( news_ - deletes_ == in_use_ );
This works very well for me.
Addition: I place the assert as a precondition and postcondition on all non-trivial methods of the allocator. If the assert blows, then I know I am doing something wrong. If the assert does not blow, with all the testing I can do, then I get reasonably sufficient confidence about the memory management correctness of my program.
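Condensed into code, the scheme might look like this sketch (here in_use_ is derived from an independent record of live blocks, so the assert can actually fire when the bookkeeping, or a caller, goes wrong):

#include <cassert>
#include <cstddef>
#include <set>

class CountingAllocator {
    std::size_t news_;
    std::size_t deletes_;
    std::set<void*> live_;      // independent record of outstanding blocks
    void check() const { assert(news_ - deletes_ == live_.size()); }
public:
    CountingAllocator() : news_(0), deletes_(0) {}

    void* allocate(std::size_t n) {
        check();                             // precondition
        void* p = ::operator new(n);
        ++news_;
        live_.insert(p);
        check();                             // postcondition
        return p;
    }
    void deallocate(void* p) {
        check();
        std::size_t erased = live_.erase(p);
        assert(erased == 1);                 // fires on double/foreign free
        (void)erased;
        ++deletes_;
        ::operator delete(p);
        check();
    }
};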
1) uninitialized variable use
Automatically detected by the compiler (turn warnings to full and treat warnings as errors).
2) delete a pointer two times
Don't use RAW pointers. All pointers should be inside either a smart pointer or some form of RAII object that manages the lifetime of the pointer.
3) writing array out of bounds
Don't do it. That's a logic bug.
You can mitigate it by using a container and a method that throws on out-of-bounds access (vector::at()); see the example after this list.
4) failing to deallocate memory
Don't use RAW pointers. See (2) above.
5) race conditions
Don't allow them. Allocate resources in priority order to avoid conflicting locks, then lock objects when there is a potential for multiple write access (or read access when it is important).
One you forgot:
6) dereferencing a pointer after it has been freed.
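For point 3, here is the checked-access behaviour in a self-contained example:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(3, 0);
    // v[10] = 1;            // operator[]: unchecked, silently corrupts memory
    try {
        v.at(10) = 1;        // at(): bounds-checked, throws instead
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}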
So far everyone seems to be answering "how to prevent", not "how to debug".
Assuming you're working with code which already has some of these issues, here are some ideas on debugging.
uninitialized variable use
The compiler can detect a lot of this. Initializing RAM to a known value helps in debugging those that escape. In our embedded system, we do a memory test before we leave the bootloader, which leaves all the RAM set to 0x5555. This turns out to be quite useful for debugging: when an integer == 21845, we know it was never initialized.
delete a pointer two times
Visual Studio should detect this at runtime. If you suspect this is happening on other systems, you can debug it by routing deletions through custom code, something like:
// heuristic double-free detector: poison each freed block with its own address
void debug_delete(void* p) { assert(*(void**)p != p); *(void**)p = p; free(p); }
writing array out of bounds
Visual Studio should detect this at runtime. In other systems, add your own sentinels
int headZONE = 0xDEAD;   // sentinel before the array
int array[whatever];
int tailZONE = 0xDEAD;   // sentinel after the array

// add this line to check for overruns
// - place it using binary search to zero in on the trouble spot
assert(headZONE == tailZONE && tailZONE == 0xDEAD);
failing to deallocate memory
Watch the heap growth. Record the free heap size before and after points which create and destroy objects; look for unexpected changes. Possibly write your own wrapper around the memory functions to track blocks.
race conditions
aaargh. Make sure you have a logging system with accurate timestamping.