Realloc creates dangling pointers? - c++

I wrote some array code that allocates memory and each value in the array is a type. I then have another array that consists of in to the first array for references.
Both arrays can grow. It uses realloc. Because the 2nd array contains pointers in to the first, they are surely not updated when the first array changes(I don't do it manually and there is no GC). Surely all the pointers in the 2nd array are invalid! (they point to memory that was free'ed by realloc).
This is the case right?
This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
What is the standard solution? Don't use "global" pointers? Using pointers to pointers to pointers? I think I could make the 2nd array use **'s and could probably get things to work.
In a MT environment, things are even worse. Local pointers access may be moved in the middle, then the memory changed, and the local pointer is now wrong. (Which, of course might be solved by preventing the moves by lock, etc...)
Go with functional programming?

Yes, the realloc can invalidate your references. If there is no continuous space for relocating your array will be moved.
Consider using a container as the std::deque.

1) This is the case right?
Yes.
2) This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
Yes.
3) What is the standard solution?
You design your application such that the life-times of your objects is well defined so that you do not refer to them after they are no longer required.
4) In a MT environment, things are even worse. Local pointers access may be moved in the middle, then the memory changed, and the local pointer is now wrong. (Which, of course might be solved by preventing the moves by lock, etc...)
Obviously you should never use a pointer that no-longer pointers to its resource. Managing shared resources in a MT environment is non trivial and there are a whole bunch of tools and techniques to achieve it.
5) Go with functional programming?
It is always advisable to avoid pointers if you can.
Without a specific problem it is hard to give a specific solution. But in order to achieve "not pointing at disappeared resources" we have various tools to employ. We have smart pointers, we have containers and we have value semantics. We need to understand how to use all of those but also we need to design with object lifetime in mind as a major consideration.
Object life-time should always be an important factor. However some languages (like Java for instance) mitigate against bad-design by providing a "safer" environment. C++, on the other hand, is rather less forgiving. However it does have a whole bunch of sophisticated tools for the task. That means a steeper learning curve but more efficiency and better control.

Related

Pointer pointing to deleted stack memory

I believe what I have just experienced is called "undefined behavior", but I'm not quite sure. Basically, I had an instance declared in an outer scope that holds addresses of a class. In the inner level I instantiated an object on the stack and stored the address of that instance into the holder.
After the inner scope had escaped, I checked to see if I could still access methods and properties of the removed instance. To my surprise it worked without any problem.
Is there a simple way to combat this? Is there a way I can clear deleted pointers from the list?
example:
std::vector<int*> holder;
{
int inside = 12;
holder.push_back(&inside);
}
cout << "deleted variable:" << holder[0] << endl;
Is there a simple way to combat this?
Sure, there are a number of ways to avoid this sort of problem.
The easiest way would be to not use pointers at all -- pass objects by value instead. i.e. In your example code, you could use a std::vector<int> instead of a std::vector<int *>.
If your objects are not copy-able for some reason, or are large enough that you think it will be too expensive to make copies of them, you could allocate them on the heap instead, and manage their lifetimes automatically using shared_ptr or unique_ptr or some other smart-pointer class. (Note that passing objects by value is more efficient than you might think, even for larger objects, since it avoids having to deal with the heap, which can be expensive... and modern CPUs are most efficient when dealing with contiguous memory. Finally, modern C++ has various optimizations that allow the compiler to avoid actually doing a data copy in many circumstances)
In general, retaining pointers to stack objects is a bad idea unless you are 100% sure that the pointer's lifetime will be a subset of the lifetime of the stack object it points to. (and even then it's probably a bad idea, because the next programmer who takes over the code after you've moved on to your next job might not see this subtle hazard and is therefore likely to inadvertently introduce dangling-pointer bugs when making changes to the code)
After the inner scope had escaped, I checked to see if I could still
access methods and properties of the removed instance. To my surprise
it worked without any problem.
That can happen if the memory where the object was hasn't been overwritten by anything else yet -- but definitely don't rely on that behavior (or any other particular behavior) if/when you dereference an invalid pointer, unless you like spending a lot of quality time with your debugger chasing down random crashes and/or other odd behavior :)
Is there a way I can clear deleted pointers from the list?
In principle, you could add code to the objects' destructors that would go through the list and look for pointers to themselves and remove them. In practice, I think that is a poor approach, since it uses up CPU cycles trying to recover from an error that a better design would not have allowed to be made in the first place.
Btw this is off topic but it might interest you that the Rust programming language is designed to detect and prevent this sort of error by catching it at compile-time. Maybe someday C++ will get something similar.
There is no such thing as deleted pointer. Pointer is just a number, representing some address in your process virtual address space. Even if stack frame is long gone, memory, that was holding it is still available, since it was allocated when thread started, so technically speaking, it is still a valid pointer, valid in terms, that you could dereference it and get something. But since object it was pointing is already gone, valid term will be dangling pointer. Moral is that if you have pointer to the object in the stack frame, there is no way to determine is it valid or not, not even using functions like IsBadReadPtr (Win32 API just for example). The best way to prevent such situations is avoid returning and storing pointers to the stack objects.
However, if you wish to track your heap allocated memory and automatically deallocate it after it is no longer used, you could utilize smart pointers (std::shared_ptr, boost::shared_ptr, etc).

Compacting garbage collector implementation in C++0x

I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to it. I've been thinking about either pointer-to-pointer in the pointer type itself, or, the collector maintains a list of pointers that point to each object so that they can be modified, removing the need for a double de-ref when accessing the pointer but adding some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straight forward about grafting on extra GC to C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a gc in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly and the situation is that it is unlikely to ever exist because it has limited utility..
What you actually need is a "re-addres-me" operator. What happens is that you do not actually move objects around. You just use mmap to change the object address. This is much faster, and, in effect, it is using the VM features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do in C++ efficiently: you'd have to move to a temporary first. In C, it is much easier, you can use memmove. At some stage all the pointers to or into the moved objects have to be adjusted.
Using handles does not solve this problem, it just reduces the problem from arbitrary sized objects to constant sized ones: these are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handle from the array randomly .. you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
Now: the key point is: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see an value that looks like a pointer but happens to be an integer: some object won't be collected, but this is usually no big deal. But for compaction you have to adjust the pointers but you'd better not change that integer: so you have to know for sure when something is a pointer, so your collector has to be precise: the shape must be known.
In Ocaml this is relatively simple: everything is either a pointer or integer and the low bit is used at run time to tell. Objects pointed at have a code telling the type, and there are only a few types: either a scalar (don't scan it) or an aggregate (scan it, it only contains integers or pointers).
This is a pretty straight-forward question so here's a straight-forward answer:
Mark-and-sweep (and occasionally mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (avoiding double de-refs). It's also very easy to implement. Since you're not worried about collection performance impact (mark-and-sweep tends to freeze up the process in a nondeterministically), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
You could implement pointer updates without using double indirection by using techniques like a shadow stack but this will be slow and very error prone if you're writing this C++ code by hand.

Raw Pointer Management in C++

I have a piece of performance critical code. The class and objects are fairly large, hence, it would be stored as pointers in the STL container. Problems arise when the pointers to objects need to be stored in multiple different containers based on some logic. It is very messy in handling the ownership of the object as I couldn't isolate the ownership of the object to a single containers(which I could just delete from the single container). Other than using smart pointer (since it is performance critical and smart pointer might affects the performance), what could I do?
Thanks.
You are asking for impossible - from one point, you are asking for exceptional performance, such that smart pointers you claim cannot provide, and you also happen to ask for safety and tidiness. Well, one comes at the cost of another, actually. You could, of course try to write your own shared pointer, which would be more lightweight than boost's, but still provide the basic functionality. Incidentally, have you actually tried boost::shared_ptr? Did it actually slow down the performance?
Your question is very awkward: you ask for performance with a messy logic ?
shared_ptr have incredible performance, really, and though you can get better it is probably your best choice: it works.
You could look up another Boost smart pointer though: boost::intrusive_ptr.
This is done at the cost of foregoing: weak_ptr and in exchange allows to have a single memory allocation for both the counter and the object. Packing up the two yields a SMALL increase in performance.
If you do not have cyclic reference, try and check it out, it might be what you were looking for.
Intrusive smart pointers are typically more efficient than normal smart pointers, yet easier than dumb pointers. For instance, check out boost::intrusive_ptr<T>
If the objects don't refer to one another, manual reference counting might be worth trying. There would be more cost when adding to or removing from a list, but no overhead to the actual object access. (This can be painful to diagnose when you get it wrong, so I recommend being careful to get it right.)
If there is dead time between operations, consider a sort of garbage collect. Maintain the list of all objects (an intrusive list would probably do). When you have time to spare, cross-reference this against the other lists; any objects that aren't in a list can get deleted. You don't need an extra array to do this (just a global counter and a last-seen counter on each object) so it could be reasonably efficient.
Another option would be to use a smart pointer that provides access to the underlying pointer. If you're trying to avoid the overhead of calling an overloaded operator-> a lot then this might be worth trying. Store the smart pointers in the lists (this does the lifetime management for you) and then when running through the objects you can retrieve the raw pointer for each and operate using that (so you don't incur the overhead of any overloaded operator-> etc.). For example:
std::vector<smart_ptr<T> > objects;
if(!objects.empty()) {
smart_ptr<T> *objects_raw=&objects[0];
for(size_t n=objects.size(),i=0;i<n;++i) {
T *object=objects_raw[i].get_ptr();
// do stuff
}
}
This is the sort of approach I prefer personally. Long-term storage gets a smart pointer, short-term storage gets a plain pointer. The object lifetimes are easy to manage, and you don't fall foul of 1,000,000 tiny overheads (more important for keeping the debug build runnable than it is for the release build, but it can easily add to wasted time nonetheless).

Why don't some languages allow declaration of pointers?

I'm programming in C++ right now, and I love using pointers. But it seems that other, newer languages like Java, C#, and Python don't allow you to explicitly declare pointers. In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?
Pointers aren't bad, they are just easy to get wrong. In newer languages they have found ways of doing the same things, but with less risk of shooting yourself in the foot.
There is nothing wrong with pointers though. Go ahead and love them.
Toward your example, why would you want both x and y pointing to the same memory? Why not just always call it x?
One more point, pointers mean that you have to manage the memory lifetime yourself. Newer languages prefer to use garbage collection to manage the memory and allowing for pointers would make that task quite difficult.
I'll start with one of my favorite Scott Meyers quotes:
When I give talks on exception handling, I teach people two things:
POINTERS ARE YOUR ENEMIES, because they lead to the kinds of problems that auto_ptr is designed to eliminate.
POINTERS ARE YOUR FRIENDS, because operations on pointers can't throw.
Then I tell them to have a nice day :-)
The point is that pointers are extremely useful and it's certainly necessary to understand them when programming in C++. You can't understand the C++ memory model without understanding pointers. When you are implementing a resource-owning class (like a smart pointer, for example), you need to use pointers, and you can take advantage of their no-throw guarantee to write exception-safe resource owning classes.
However, in well-written C++ application code, you should never have to work with raw pointers. Never. You should always use some layer of abstraction instead of working directly with pointers:
Use references instead of pointers wherever possible. References cannot be null and they make code easier to understand, easier to write, and easier to code review.
Use smart pointers to manage any pointers that you do use. Smart pointers like shared_ptr, auto_ptr, and unique_ptr help to ensure that you don't leak resources or free resources prematurely.
Use containers like those found in the standard library for storing collections of objects instead of allocating arrays yourself. By using containers like vector and map, you can ensure that your code is exception safe (meaning that even when an exception is thrown, you won't leak resources).
Use iterators when working with containers. It's far easier to use iterators correctly than it is to use pointers correctly, and many library implementations provide debug support for helping you to find where you are using them incorrectly.
When you are working with legacy or third-party APIs and you absolutely must use raw pointers, write a class to encapsulate usage of that API.
C++ has automatic resource management in the form of Scope-Bound Resource Management (SBRM, also called Resource Acquisition is Initialization, or RAII). Use it. If you aren't using it, you're doing it wrong.
Pointers can be abused, and managed languages prefer to protect you from potential pitfalls. However, pointers are certainly not bad - they're an integral feature of the C and C++ languages, and writing C/C++ code without them is both tricky and cumbersome.
A true "pointer" has two characteristics.
It holds the address of another object (or primitive)
and exposes the numeric nature of that address so you can do arithmetic.
Typically the arithmetic operations defined for pointers are:
Adding an integer to a pointer into an array, which returns the address of another element.
Subtracting two pointers into the same array, which returns the number of elements in-between (inclusive of one end).
Comparing two pointers into the same array, which indicates which element is closer to the head of the array.
Managed languages generally lead you down the road of "references" instead of pointers. A reference also holds the address of another object (or primitive), but arithmetic is disallowed.
Among other things, this means you can't use pointer arithmetic to walk off the end of an array and treat some other data using the wrong type. The other way of forming an invalid pointer is taken care of in such environments by using garbage collection.
Together this ensures type-safety, but at a terrible loss of generality.
I try to answer OP's question directly:
In other words, you can't write both
int x and int * y, and have x be a
value while y is a pointer, in any of
those languages. What is the reasoning
behind this?
The reason behind this is the managed memory model in these languages. In C# (or Python, or Java,...) the lifetime of resources and thus usage of memory is managed automatically by the underlying runtime, or by it's garbage collector, to be precise. Briefly: The application has no control over the location of a resource in memory. It is not specified - and is even not guaranteed to stay constant during the lifetime of a resource. Hence, the notion of pointer as 'a location of something in virtual or physical memory' is completely irrelevant.
As someone already mentioned, pointers can, and actually, will go wrong if you have a massive application. This is one of the reasons we sometimes see Windows having problems due to NULL pointers created! I personally don't like pointers because it causes terrible memory leak and no matter how good you manage your memory, it will eventually hunt you down in some ways. I experienced this a lot with OpenCV when working around image processing applications. Having lots of pointers floating around, putting them in a list and then retrieving them later caused problems for me. But again, there are good sides of using pointers and it is often a good way to tune up your code. It all depends on what you are doing, what specs you have to meet, etc.
Pointers aren't bad. Pointers are difficult. The reason you can't have int x and int * y in Java, C#, etc., is that such languages want to put you away from solving coding problems (which are eventually subtle mistakes of yours), and they want to bring you closer to producing solutions to your project. They want to give you productivity.
As I said before, pointers aren't bad, they're just diffucult. In a Hello World programs pointers seems to be a piece of cake. Nevertheless when the program start growing, the complexity of managing pointers, passing the correct values, counting the objects, deleting the pointers, etc. start to get complex.
Now, from the point of view of the programmer (the user of the language, in this case you) this would lead to another problem that will become evident over the time: the longer it takes you don't read the code, the more difficult it will become to understand it again (i.e. projects from past years, months or event days). Added to this the fact that sometimes pointers use more than one level of indirection (TheClass ** ptr).
Last but not least, there are topics when the pointers application is very useful. As I mention before, they aren't bad. In Algorithms field they are very useful because in mathemical context you can refer to a specific value using a simple pointer (i.e. int *) and change is value without creating another object, and ironically it becomes easier to implement the algorithms with the use of pointers rather than without it.
Reminder: Each time when you ask why pointers or another thing is bad or is good, instead try to think in the historical context around when such topic or technology emerged and the problem that they tried to solve. In this case, pointers where needed to access to the memory of the PDP computers at Bell laboratories.

Are there any reasons not to use Boost::shared_ptrs?

I've asked a couple questions (here and here) about memory management, and invariably someone suggests that I use boost::shared_ptrs.
Given how useful they seem to be, I'm seriously considering switching over my entire application to use boost::shared_ptrs.
However, before I jump in with both feet and do this, I wanted to ask -- Has anyone had any bad experiences with boost::shared_ptrs? Is there some pitfall to using them that I need to watch out for?
Right now, they seem almost too good to be true - taking care of most of my garbage collection concerns automatically. What's the downside?
The downside is they're not free. You especially shouldn't use shared_ptr/shared_array when scoped_ptr/scoped_array (or plain old stack allocation) will do. You'll need to manually break cycles with weak_ptr if you have any. The vector question you link to is one case where I might reach for a shared_ptr, the second question I would not. Not copying is a premature optimization, especially if the string class does it for you already. If the string class is reference counted, it will also be able to implement COW properly, which can't really be done with the shared_ptr<string> approach. Using shared_ptr willy-nilly will also introduce "interface friction" with external libraries/apis.
Boost shared pointers or any other technique of memory management in C++ is not a panacea. There is no substitution for careful coding. If you dive into using boost::shared_ptr be aware of object ownership and avoid circular references. You are going to need to explicitly break cycles or use boost::weak_ptr where necessary.
Also be careful to always use boost::shared_ptr for an instance from the moment it is allocated. That way you are sure you won't have dangling references. One way you can ensure that is to use factory methods that return your newly created object in a shared_ptr.
typedef boost::shared_ptr<Widget> WidgetPtr;
WidgetPtr myWidget = Widget::Create();
I use shared_ptr's often.
Since Shared_ptr's are copied by-value, you can incur the cost of copying both the pointer value and a reference count, but if boost::intrusive_ptr is used, the reference count must be added to your class, and there is no additional overhead above that of using a raw pointer.
However, in my experience, more than 99% of the time, the overhead of copying boost::shared_ptr instances throughout your code is insignificant. Usually, as C. A. R. Hoare noted, premature optimization is pointless - most of the time other code will use significantly more time than the time to copy small objects. Your mileage may vary. If profiling show the copying is an issue, you can switch to intrusive pointers.
As already noted above, cycles must be broken by using a weak_ptr, or there will be a memory leak. This will happen with data structures such as some graphs, but if, for example, you are making a tree structure where the leaves never point backwards, you can just use shared_pointers for nodes of the tree without any issues.
Using shared_ptr's properly greatly simplifies code, makes it easier to read, and easier to maintain. In many cases using them is the right choice.
Of course, as already mentioned, in some cases, using scoped_ptr (or scoped_array) is the right choice. If the pointee isn't being shared, don't use shared pointers!
Finally, the most recent C++ standard provides the std::tr1::shared_ptr template, which is now on most platforms, although I don't think there is an intrusive pointer type for tr1 (or rather, there might be, but I have not heard of it myself).
Dynamic memory overhead (i.e., extra allocations) plus all the overhead associated with reference counted smart pointers.