Why don't some languages allow declaration of pointers? - c++

I'm programming in C++ right now, and I love using pointers. But it seems that other, newer languages like Java, C#, and Python don't allow you to explicitly declare pointers. In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?

Pointers aren't bad, they are just easy to get wrong. Newer languages have found ways of doing the same things with less risk of shooting yourself in the foot.
There is nothing wrong with pointers though. Go ahead and love them.
Regarding your example: why would you want both x and y pointing to the same memory? Why not just always call it x?
One more point: pointers mean that you have to manage the memory's lifetime yourself. Newer languages prefer to use garbage collection to manage memory, and allowing pointers would make that task quite difficult.

I'll start with one of my favorite Scott Meyers quotes:
When I give talks on exception handling, I teach people two things:
POINTERS ARE YOUR ENEMIES, because they lead to the kinds of problems that auto_ptr is designed to eliminate.
POINTERS ARE YOUR FRIENDS, because operations on pointers can't throw.
Then I tell them to have a nice day :-)
The point is that pointers are extremely useful and it's certainly necessary to understand them when programming in C++. You can't understand the C++ memory model without understanding pointers. When you are implementing a resource-owning class (like a smart pointer, for example), you need to use pointers, and you can take advantage of their no-throw guarantee to write exception-safe resource owning classes.
However, in well-written C++ application code, you should never have to work with raw pointers. Never. You should always use some layer of abstraction instead of working directly with pointers:
Use references instead of pointers wherever possible. References cannot be null and they make code easier to understand, easier to write, and easier to code review.
Use smart pointers to manage any pointers that you do use. Smart pointers like shared_ptr, auto_ptr, and unique_ptr help to ensure that you don't leak resources or free resources prematurely.
Use containers like those found in the standard library for storing collections of objects instead of allocating arrays yourself. By using containers like vector and map, you can ensure that your code is exception safe (meaning that even when an exception is thrown, you won't leak resources).
Use iterators when working with containers. It's far easier to use iterators correctly than it is to use pointers correctly, and many library implementations provide debug support for helping you to find where you are using them incorrectly.
When you are working with legacy or third-party APIs and you absolutely must use raw pointers, write a class to encapsulate usage of that API.
C++ has automatic resource management in the form of Scope-Bound Resource Management (SBRM, also called Resource Acquisition Is Initialization, or RAII). Use it. If you aren't using it, you're doing it wrong.
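For illustration, here is a minimal sketch of what that looks like in practice (Widget and render_all are made-up names):

#include <memory>
#include <vector>

struct Widget { void draw() {} };

void render_all()
{
    // The vector owns the Widgets through unique_ptr: no manual delete.
    std::vector<std::unique_ptr<Widget>> widgets;
    widgets.push_back(std::unique_ptr<Widget>(new Widget));

    for (auto& w : widgets)
        w->draw();          // observe; ownership stays with the container
}   // scope exit destroys everything, even if an exception was thrown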

Pointers can be abused, and managed languages prefer to protect you from potential pitfalls. However, pointers are certainly not bad - they're an integral feature of the C and C++ languages, and writing C/C++ code without them is both tricky and cumbersome.

A true "pointer" has two characteristics.
It holds the address of another object (or primitive)
and exposes the numeric nature of that address so you can do arithmetic.
Typically the arithmetic operations defined for pointers are:
Adding an integer to a pointer into an array, which returns the address of another element.
Subtracting two pointers into the same array, which returns the number of elements in-between (inclusive of one end).
Comparing two pointers into the same array, which indicates which element is closer to the head of the array.
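A minimal sketch of those three operations (standard C++, nothing hypothetical):

#include <cstddef>

int main()
{
    int a[5] = {10, 20, 30, 40, 50};
    int* p = a;                 // address of a[0]
    int* q = p + 3;             // add an integer: address of a[3]
    std::ptrdiff_t n = q - p;   // subtract two pointers: 3 elements apart
    bool closer = p < q;        // compare: p is closer to the head (true)
    (void)n; (void)closer;
}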
Managed languages generally lead you down the road of "references" instead of pointers. A reference also holds the address of another object (or primitive), but arithmetic is disallowed.
Among other things, this means you can't use pointer arithmetic to walk off the end of an array and treat some other data using the wrong type. The other way of forming an invalid pointer is taken care of in such environments by using garbage collection.
Together this ensures type-safety, but at a terrible loss of generality.

I'll try to answer the OP's question directly:
In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?
The reason behind this is the managed memory model in these languages. In C# (or Python, or Java, ...) the lifetime of resources, and thus their usage of memory, is managed automatically by the underlying runtime, or by its garbage collector, to be precise. Briefly: the application has no control over the location of a resource in memory. It is not specified, and is not even guaranteed to stay constant during the lifetime of a resource. Hence, the notion of a pointer as 'a location of something in virtual or physical memory' is completely irrelevant.

As someone already mentioned, pointers can, and often will, go wrong in a massive application. This is one of the reasons we sometimes see Windows having problems caused by NULL pointers! I personally don't like pointers because they invite terrible memory leaks, and no matter how well you manage your memory, they will eventually hunt you down in some way. I experienced this a lot with OpenCV when working on image processing applications: having lots of pointers floating around, putting them in a list, and then retrieving them later caused problems for me. But again, there are good sides to using pointers, and they are often a good way to tune your code. It all depends on what you are doing, what specs you have to meet, and so on.

Pointers aren't bad. Pointers are difficult. The reason you can't have int x and int * y in Java, C#, etc., is that such languages want to keep you away from chasing coding problems (which are ultimately subtle mistakes of your own) and bring you closer to producing solutions for your project. They want to give you productivity.
As I said before, pointers aren't bad, they're just difficult. In a Hello World program, pointers seem like a piece of cake. But when the program starts growing, the complexity of managing pointers, passing the correct values, counting the objects, and deleting the pointers starts to pile up.
Now, from the point of view of the programmer (the user of the language, in this case you), this leads to another problem that becomes evident over time: the longer you go without reading the code, the more difficult it becomes to understand it again (i.e., projects from past years, months, or even days). Add to this the fact that pointers sometimes use more than one level of indirection (TheClass ** ptr).
Last but not least, there are areas where pointers are very useful. As I mentioned before, they aren't bad. In the field of algorithms they shine, because in a mathematical context you can refer to a specific value using a simple pointer (i.e., an int *) and change its value without creating another object; ironically, it is often easier to implement an algorithm with pointers than without them.
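A tiny sketch of that last point (the names are made up): modifying a value in place through a plain int*, with no extra object created:

#include <cstdio>

// Increment the pointee in place; the caller's variable is modified.
void bump(int* value)
{
    ++*value;
}

int main()
{
    int hits = 0;
    bump(&hits);
    std::printf("%d\n", hits);  // prints 1
}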
Reminder: whenever you ask whether pointers or anything else are bad or good, try instead to think of the historical context in which the topic or technology emerged and the problem it tried to solve. In this case, pointers were needed to access the memory of the PDP computers at Bell Labs.

Related

Realloc creates dangling pointers?

I wrote some array code that allocates memory, and each value in the array is of some type. I then have another array that consists of pointers into the first array, for references.
Both arrays can grow. It uses realloc. Because the second array contains pointers into the first, they are surely not updated when the first array changes (I don't do it manually, and there is no GC). Surely all the pointers in the second array become invalid! (They point to memory that was freed by realloc.)
This is the case right?
This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
What is the standard solution? Don't use "global" pointers? Use pointers to pointers to pointers? I think I could make the second array use **'s and could probably get things to work.
In an MT environment, things are even worse. A local pointer access may be interrupted mid-way, the memory moved, and the local pointer is now wrong. (Which, of course, might be solved by preventing the moves with a lock, etc...)
Go with functional programming?
Yes, realloc can invalidate your pointers. If there is no contiguous space to grow the array in place, it will be moved.
Consider using a container such as std::deque.
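A minimal sketch of why std::deque helps here (assuming you only grow at the ends): unlike a reallocating array, pointers and references to existing elements remain valid after push_back:

#include <cassert>
#include <deque>

int main()
{
    std::deque<int> values;
    values.push_back(1);
    int* first = &values[0];    // like a pointer into the "first array"

    for (int i = 0; i < 10000; ++i)
        values.push_back(i);    // never relocates existing elements

    assert(*first == 1);        // still valid; realloc or vector could have moved it
}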
1) This is the case right?
Yes.
2) This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
Yes.
3) What is the standard solution?
You design your application such that the lifetimes of your objects are well defined, so that you do not refer to them after they are no longer required.
4) In an MT environment, things are even worse. A local pointer access may be interrupted mid-way, the memory moved, and the local pointer is now wrong. (Which, of course, might be solved by preventing the moves with a lock, etc...)
Obviously you should never use a pointer that no longer points to its resource. Managing shared resources in an MT environment is non-trivial, and there are a whole bunch of tools and techniques to achieve it.
5) Go with functional programming?
It is always advisable to avoid pointers if you can.
Without a specific problem it is hard to give a specific solution. But in order to achieve "not pointing at disappeared resources" we have various tools to employ: smart pointers, containers, and value semantics. We need to understand how to use all of them, but we also need to design with object lifetime in mind as a major consideration.
Object lifetime should always be an important factor. However, some languages (like Java, for instance) mitigate bad design by providing a "safer" environment. C++, on the other hand, is rather less forgiving. It does, however, have a whole bunch of sophisticated tools for the task. That means a steeper learning curve, but more efficiency and better control.

using non-smart pointers in modern C++

Short Version:
Is there any acceptable reason for using non-smart pointers in modern C++?
Long Version:
We have a huge product that contains lots of old C++ code, and now we are trying to refactor it into the modern C++ era. Along with all the old-fashioned code, there is a huge amount of pointer passing (mostly with SAL annotations to provide some sense of security), and I was wondering if we should change all of them to smart pointers or maybe leave some of them as is.
Trying to convert some of this code, I ended up with code that is arguably overusing smart pointers.
So the question is: is there such a thing as over using smart pointers?
Or in other words: is there any acceptable scenario for non-smart pointers these days?
Smart pointers (unique_ptr and shared_ptr) should be OWNING pointers (i.e., responsible for destruction of the object). The bottom line of using them is that any object created by new should be shoved into a unique_ptr ASAP, to prevent memory leaks. After that, the unique_ptr should end up being moved:
either into a shared_ptr if ownership should be shared,
or into a unique_ptr if ownership is determined by a scope (of a block, or the lifetime of an object).
Calls to release() should be rare. If your code passes non-owning pointers around, these should be:
raw pointers if they may be null (obtained by get()),
references if they may not be null (obtained by dereferencing),
unique_ptrs by value if the aim of the call is transferring ownership (in which case you'll need to move them).
Factory functions should return unique_ptrs by value, because then, if you don't assign the return value of the factory, the object is immediately de-allocated.
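A minimal sketch of those rules (Widget and the function names are invented for illustration):

#include <memory>
#include <utility>

struct Widget { void draw() {} };

// Factory: returns ownership by value.
std::unique_ptr<Widget> make_widget()
{
    return std::unique_ptr<Widget>(new Widget);
}

void maybe_draw(Widget* w)          // non-owning, may be null
{
    if (w) w->draw();
}

void always_draw(Widget& w)         // non-owning, never null
{
    w.draw();
}

void consume(std::unique_ptr<Widget> w) {}  // sink: takes ownership

int main()
{
    std::unique_ptr<Widget> w = make_widget();
    maybe_draw(w.get());            // observe, tolerating null
    always_draw(*w);                // observe, null impossible
    consume(std::move(w));          // ownership transferred; w is now empty
}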
And check out Ali's answer with its links to some philosophical points of handling legacy code (which I totally agree with).
Short Version:
Is there any acceptable reason for using non-smart
pointers in modern C++?
Short answer:
Definitely, if they only serve for observation, that is, if they don't own the pointee. However, try to use references instead of pointers even in this case; use pointers only if you really need the option of null (initialize with nullptr, then reassign later, for example).
Long Version:
we have a huge product that contains lots of old C++ code and now we are trying to refactor it to the modern C++ era. [...]
Long answer:
As I am reading these lines this answer comes to mind:
Advice on working with legacy code
I wish I could upvote this answer more than once. I would quote: "[...] for each re-factor we have made we can justify 'this specific change will make the actual task we are doing now easier'. Rather than 'this is now cleaner for future work'."
Long story short, don't do big refactoring unless you really need to.
So the question is: is there such a thing as over using smart pointers?
In my opinion, std::shared_ptr is overused. It is very comfortable to use, and it gives you the illusion that you don't have to think about ownership issues. But that is not the whole picture. I fully agree with Sean Parent: "a shared pointer is as good as a global variable." Shared pointers can also introduce very difficult ownership issues, etc.
On the other hand, if you need to allocate something on the heap, use a unique_ptr. You can't overuse it, if you really need heap allocation. In my experience, using unique_ptr also leads to cleaner and easier to understand code, as the ownership issues become self-explanatory.
Interesting talks from Sean Parent on how to avoid / reduce the usage of pointers are:
Inheritance Is The Base Class of Evil
Value Semantics and Concepts-based Polymorphism
Hope this helps.
Yes, raw pointers still have a use as an "optional reference". I.e., a T* is similar to a T&. Neither implies ownership, but a T* can be nullptr.
Check out the talks here: http://channel9.msdn.com/Events/GoingNative/2013 (especially Stroustrup's talk).
The short answer is no, assuming "modern C++" means >= C++11.
The long answer is that this wasn't always the case, and trying to restructure a large project is almost always hard. The way we think about problems is constrained by the tools we have to solve them. There are many cases in such a refactoring where it simply makes more sense to use pointers than to try to re-express basic logic to be class- and smart-pointer-friendly. I think it's less a case of smart pointers being overused and more a case of classes being underused. YMMV ;-)
Of course there are use cases for raw pointers in modern C++:
interfaces that must be compilable as pure C (although the implementation itself may use C++ features that are not also C features, like classes or smart pointers)
code that is extremely low level, so low level that even the simplest smart pointer proves too heavyweight
Of course those are rather rare cases, and for by far the most pointer use cases smart pointers should be just fine for new code, BUT:
If the existing code works fine with raw pointers, why invest the time to rewrite it and risk adding bugs by converting it to a smart-pointer-using version?
Don't refactor code that is working fine just because newer code would better follow modern programming standards. These standards don't exist for their own sake, but to make working with code easier, so don't do refactoring that will cost you more time than it can save you in the future.
That means: if it will cost you more time to check which of those pointers can safely be converted into smart pointers and which can't, and to hunt the bugs your refactoring may have introduced, than it will save you on future maintenance work, then simply don't refactor and let the code stay as it is.
If the refactoring will save more time than it costs for only some parts of the code base, then consider refactoring only those parts.

Compacting garbage collector implementation in C++0x

I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to them. I've been thinking about either a pointer-to-pointer in the smart pointer type itself, or having the collector maintain a list of the pointers that point to each object so they can be modified, which removes the need for a double dereference when accessing the object but adds some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straightforward about grafting extra GC onto C++, let alone a compacting collector. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a GC in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly, and the situation is that it is unlikely to ever exist because it has limited utility.
What you actually need is a "re-address-me" operator. What happens is that you do not actually move objects around; you just use mmap to change the object's address. This is much faster and, in effect, uses the VM's features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do efficiently in C++: you'd have to move through a temporary first. In C it is much easier: you can use memmove. At some stage all the pointers to or into the moved objects have to be adjusted.
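For reference, the C-level overlapping move looks like this (a minimal sketch):

#include <cstring>

int main()
{
    char buf[] = "abcdef";
    // Shift "bcdef" one slot left. memmove is specified to handle
    // overlapping regions; memcpy on overlapping regions is undefined.
    std::memmove(buf, buf + 1, 6);  // 6 bytes includes the terminating '\0'
}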
Using handles does not solve this problem; it just reduces it from arbitrarily sized objects to constant-sized ones: these are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handles from the array randomly, you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
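A rough sketch of what such a shape descriptor could look like (illustrative only; this is not the actual Felix implementation):

#include <cstddef>

// Describes a type to a precise collector: where its GC pointers live,
// how to finalise an instance, and whether it may be moved with memmove.
struct Shape {
    std::size_t        size;             // object size in bytes
    std::size_t        n_offsets;        // number of GC pointers inside
    const std::size_t* pointer_offsets;  // byte offsets of those pointers
    void             (*finalise)(void*); // by default, calls the destructor
    bool               movable;          // safe to memmove?
};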
Now, the key point: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage when you see a value that looks like a pointer but happens to be an integer: some object won't be collected, which is usually no big deal. But for compaction you have to adjust the pointers, and you'd better not change that integer. So you have to know for sure when something is a pointer; your collector has to be precise: the shape must be known.
In OCaml this is relatively simple: everything is either a pointer or an integer, and the low bit is used at run time to tell which. Objects pointed at have a code telling the type, and there are only a few kinds: either a scalar (don't scan it) or an aggregate (scan it; it contains only integers or pointers).
This is a pretty straightforward question, so here's a straightforward answer:
Mark-and-sweep (and occasionally mark-and-compact, to avoid heap fragmentation) is the fastest when it comes to allocation and access (it avoids double dereferences). It's also very easy to implement. Since you're not worried about collection performance impact (mark-and-sweep tends to freeze up the process nondeterministically), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
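To make "just a pointer bump" concrete, here is a minimal sketch of the nursery allocation path (alignment and collection triggering omitted):

#include <cstddef>

// Nursery allocation: a bounds check and a pointer increment.
struct Nursery {
    char* next;   // next free byte
    char* limit;  // one past the end of the nursery

    void* allocate(std::size_t bytes)
    {
        if (static_cast<std::size_t>(limit - next) < bytes)
            return nullptr;  // full: time for a minor collection
        void* result = next;
        next += bytes;       // the "pointer bump"
        return result;
    }
};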
You could implement pointer updates without double indirection by using techniques like a shadow stack, but this will be slow and very error-prone if you're writing the C++ code by hand.

pros and cons of smart pointers

I came to know that smart pointers are used for resource management and support RAII.
But what are the corner cases in which a smart pointer doesn't seem so smart, and what should be kept in mind when using one?
Smart pointers don't help against loops in graph-like structures.
For example, object A holds a smart pointer to object B, and object B holds one back to object A. If you release all pointers to both A and B before disconnecting A from B (or B from A), both A and B will hold each other and form a happy memory leak.
Garbage collection could help against that: it could see that both objects are unreachable and free them.
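A minimal sketch of such a cycle (std::shared_ptr here; the same applies to boost::shared_ptr). The usual fix, shown in the comment, is to make one side a weak_ptr:

#include <memory>

struct B;

struct A {
    std::shared_ptr<B> b;
};

struct B {
    std::shared_ptr<A> a;   // cycle: B keeps A alive, A keeps B alive
    // std::weak_ptr<A> a;  // the usual fix: break the cycle
};

int main()
{
    std::shared_ptr<A> a(new A);
    std::shared_ptr<B> b(new B);
    a->b = b;
    b->a = a;
}   // use counts never reach zero: neither A nor B is destroyed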
I would like to mention performance limitations. Smart pointers usually use atomic operations (such as InterlockedIncrement in Win32 API) for reference counting. These functions are significantly slower than plain integer arithmetic.
In most cases, this performance penalty is not a problem, just make sure you don't make too many unnecessary copies of the smart pointer object (prefer to pass the smart pointer by reference in function calls).
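A small sketch of that advice (Widget is a made-up type): pass by const reference when the callee does not need to share ownership:

#include <memory>

struct Widget {};

// No copy: no atomic increment/decrement of the reference count.
void observe(const std::shared_ptr<Widget>& w) { /* use *w */ }

// Copies the shared_ptr: one atomic increment and decrement per call.
void observe_by_value(std::shared_ptr<Widget> w) {}

int main()
{
    std::shared_ptr<Widget> w(new Widget);
    observe(w);
    observe_by_value(w);
}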
This is quite interesting: Smart Pointers.
It's a sample chapter from "Modern C++ Design" by Andrei Alexandrescu.
Watch out at the transitions - when assigning between raw and smart pointers. Bad smart pointers - like _com_ptr_t - make it worse by allowing implicit conversions. Most errors happen at the transition.
Watch out for cycles - as mentioned, you need weak pointers to break the cycles. However, in a complex graph that's not always easy to do.
Too much choice - most libraries offer different implementations with different advantages/drawbacks. Unfortunately, most of the time these variants are not compatible, which becomes a problem when mixing libraries (say, LibA uses Loki, LibB uses Boost). Having to plan ahead for enable_shared_from_this sucks; having to decide naming conventions between intrusive_ptr, shared_ptr, and weak_ptr for a bunch of objects sucks.
For me, the single most important advantage of shared_ptr (or anything of similar functionality) is that it is coupled to its destruction policy at creation. Both C++ and Win32 offer so many ways of getting rid of things it's not even funny. Coupling at construction time (without affecting the actual type of the pointer) means I have both policies in one place.
Besides the technical limitations already mentioned (circular dependencies), I'd say the most important thing about smart pointers is to remember that they are still a workaround for getting heap-allocated objects deleted.
Stack allocation is the best option for most cases - along with the use of references - to manage the lifetime of objects.
The following article is a very interesting paper
Smart Pointers: Can't Live With 'Em, Can't Live Without 'Em
Here are a few things:
There's no definite entity that destroys the object. Often it is desirable to know exactly when and how an object is being destroyed.
Circular dependencies - if they exist, you'll have a memory leak. That's usually where weak_ptr comes to the rescue.
There is a problem with reference counting in certain types of data structures that have cycles. There can also be problems with accessing smart pointers from multiple threads; concurrent access to reference counts can cause problems. There's a utility in Boost called atomic.hpp that can mitigate this problem.
Many people run into problems when using smart pointers mixed with raw pointers (to the same objects). A typical example is when interacting with an API that uses raw pointers.
For example, boost::shared_ptr has a .get() function that returns the raw pointer. Good functionality if used with care, but many people seem to trip over it.
IMHO it's an example of a "leaky abstraction".
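A minimal sketch of that trap (legacy_take_ownership is a hypothetical C-style API that deletes what you pass it):

#include <boost/shared_ptr.hpp>

// Hypothetical legacy API that takes ownership and deletes the pointer.
void legacy_take_ownership(int* p) { delete p; }

int main()
{
    boost::shared_ptr<int> sp(new int(42));
    legacy_take_ownership(sp.get());  // BUG: the API deletes the int...
}                                     // ...then shared_ptr deletes it again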
Raymond Chen is notoriously ambivalent about smart pointers. There are issues about when the destructor actually runs (please note, the destructor runs at a well-defined time in a well-defined order; it's just that once in a while you'll forget that it's after the last line in your function).
Also remember that "smart pointer" is a pretty big category. I include std::vector in that category (a std::vector is essentially a smart array).

Once you've adopted boost's smart pointers, is there any case where you use raw pointers?

I'm curious: as I begin to adopt more of the Boost idioms and what appear to be best practices, I wonder at what point my C++ will even remotely resemble the C++ of yesteryear, often found in typical examples and in the minds of those who've not been introduced to "Modern C++"?
I almost never use shared_ptr, because I avoid shared ownership in general. Instead, I use something like boost::scoped_ptr to "own" an object, while all other references to it are raw pointers. Example:
boost::scoped_ptr<SomeType> my_object(new SomeType);
some_function(my_object.get());
But some_function will deal with a raw pointer:
void some_function(SomeType* some_obj)
{
    assert(some_obj);      // non-owning: the caller guarantees a valid object
    some_obj->whatever();
}
Just a few off the top of my head:
Navigating around in memory-mapped files.
Windows API calls where you have to over-allocate (like a LPBITMAPINFOHEADER).
Any code where you're munging around in arbitrary memory (VirtualQuery() and the like).
Just about any time you're using reinterpret_cast<> on a pointer.
Any time you use placement-new.
The common thread here is "any situation in which you need to treat a piece of memory as something other than a resource over which you have allocation control".
These days I've pretty much abandoned all use of raw pointers. I've even started looking through our code base for places where raw pointers were used and switched them to a smart pointer variant. It's amazing how much code I've been able to delete by doing this simple act. There is so much code wasted on lifetime management of raw C++ pointers.
The only places where I don't use smart pointers are a couple of interop scenarios with other code bases I don't have control over.
I find the primary difference between 'modern' C++ and the old* stuff is careful use of class invariants and encapsulation. Well organised code tends naturally to have fewer pointers flying around. I'm almost as nervous swimming in shared_ptrs as I would be in news and deletes.
I'm looking forward to unique_ptr in C++0x. I think that will tidy away the few (smart) pointers that do still roam the wild.
*still unfortunately very common
Certainly any time you're dealing with a legacy library or API you'll need to pass a raw pointer, although you'll probably just extract it from your smart pointer temporarily.
In fact it is always safe to pass a raw pointer to a function, as long as the function does not try to keep a copy of the pointer in a global or member variable, or try to delete it. With these restrictions in place, the function cannot affect the lifetime of the object, and the only reason for a smart pointer is to manage the object lifetime.
I still use regular pointers in resource-sensitive code or other code that needs a tiny footprint, such as certain exception classes, where I cannot assume that any data is valid and must also assume that I am running out of memory.
Managed memory is almost always superior to raw otherwise, because it means that you don't have to deal with deleting it at the right place, but still have great control over the construction and destruction points of your pointers.
Oh, and there's one other place to use raw pointers:
boost::shared_ptr<int> ptr(new int);
I still use raw pointers on devices that have memory-mapped I/O, such as embedded systems, where having a smart pointer doesn't really make sense because you will never need, or be able, to delete it.
If you have circular data structures, e.g., A points to B and B points back to A, you can't naively use smart pointers for both A and B, since then the objects will only be freed with extra work. To free the memory, you have to manually clear the smart pointers, which is about as bad as the deletes the smart pointers were supposed to get rid of.
You might think this doesn't happen very often, but suppose you have a Parent object that holds smart pointers to a bunch of Child objects. Somewhere along the way someone needs to look up the Parent for a Child, so they add a smart pointer member to Child that points back to the parent. Silently, memory is no longer freed.
Some care is required. Smart pointers are not equivalent to garbage collection.
I'm writing C++ that has to co-exist with Objective C (using Objective C++ to bridge).
Because C++ objects declared as part of Objective-C++ classes don't have their constructors or destructors called, you can't really hold them there in smart pointers.
So I tend to use raw pointers, although often with boost::intrusive_ptr and an internal ref count.
Not that I would do it, but you need raw pointers to implement, say, a linked list or a graph. But it would be much smarter to use std::list<> or the Boost Graph Library.