In Fortran you can declare a target array and a pointer, with no memory allocated to the pointer itself:
real(kind=jp), target :: bt(100,100)
real(kind=jp), pointer :: pt(:,:)
But then you can allocate memory to the pointer pt:
allocate(pt(100,100))
My question is: what are the pros and cons? As far as I can see, allocating memory to the pointer defeats the purpose of a pointer and uses up more memory. Granted, my knowledge of pointers is limited, so if someone could explain to me what is going on here, I would be grateful.
I'm using a model with mixed FORTRAN 77 and Fortran 90 code, plus, I'm compiling the code using Intel compilers.
As Vladimir F said in a comment to another answer, there may well be some confusion as to what the example code is doing. Although this isn't a broad answer covering everything about pointers (which is already partly covered), I'll comment briefly.
real(kind=jp), target :: bt(100,100)
real(kind=jp), pointer :: pt(:,:)
The pointer pt is not at this point pointing to anything: it has undefined association status. That you have the two objects declared together, one with the target attribute and one with the pointer attribute, doesn't signify anything.
To associate pt with the target bt, pointer assignment is required:
pt => bt
In this way, one can go ahead and do things to bt through the pointer pt without any extra memory allocation (there will be some overhead associated with the pointer variable, but let's ignore that).
Yes, one also could do pointer allocation as in the question
allocate(pt(100,100))
but this newly minted lump of memory has nothing to do with bt.
As Alexander Vogt said, these days there are fewer reasons to want to do that sort of thing (using allocatable arrays instead is a good thing to consider).
In summary, for "what are the pros and cons?": pointer assignment and allocation do totally different things. Choose whichever is appropriate.
Allocatable pointers as you use them were widely in use before dummy arguments could be allocatable (Fortran 2003). This way you could pass an array and allocate it in another routine, which is immensely helpful for input routines where you don't know the input size before-hand.
Allocated pointers behave very similarly to their allocatable counterparts, but are not automatically deallocated. Some say that using pointers this way is detrimental to performance, but I have never experienced this myself (it is probably compiler specific). One issue with pointers, though, is possible aliasing when they are used as dummy arguments.
Nowadays, I would not use pointers when I could use an allocatable array instead. This has many benefits; key among them are automatic deallocation and better readability of the code. (If you like, you can also use automatic allocation of the LHS, but I usually turn that off.)
For OOP, pointers are essential and I use them a lot, e.g. for linked lists, trees, etc. If you have nested derived types, it is also quite elegant to use pointers to reduce de-referencing.
I wrote some array code that allocates memory, where each value in the array is a type. I then have another array that consists of pointers into the first array, used as references.
Both arrays can grow. The code uses realloc. Because the 2nd array contains pointers into the first, they are surely not updated when the first array changes (I don't do it manually and there is no GC). Surely all the pointers in the 2nd array are then invalid! (They point to memory that was freed by realloc.)
This is the case right?
This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
What is the standard solution? Don't use "global" pointers? Using pointers to pointers to pointers? I think I could make the 2nd array use **'s and could probably get things to work.
In a MT environment, things are even worse. Local pointers access may be moved in the middle, then the memory changed, and the local pointer is now wrong. (Which, of course might be solved by preventing the moves by lock, etc...)
Go with functional programming?
Yes, realloc can invalidate your references. If there is no contiguous space to grow the block in place, your array will be moved.
Consider using a container as the std::deque.
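A minimal sketch of two common ways around the dangling-pointer problem, assuming the code can be written in C++ (as the std::deque suggestion implies). The names Item, index_survives_growth and deque_pointer_survives_growth are made up for illustration. The first function stores positions instead of addresses; the second relies on the fact that std::deque does not relocate existing elements when you insert at either end (inserting or erasing in the middle still invalidates pointers).

```cpp
#include <cstddef>
#include <deque>
#include <vector>

struct Item { int value; };

// Option 1: the "second array" stores indices, not raw pointers.
// An index stays meaningful even if the vector reallocates and moves.
bool index_survives_growth() {
    std::vector<Item> items{{10}, {20}, {30}};
    std::vector<std::size_t> refs{2, 0};   // positions in `items`
    for (int i = 0; i < 10000; ++i) {
        items.push_back(Item{i});          // may reallocate many times
    }
    return items[refs[0]].value == 30 && items[refs[1]].value == 10;
}

// Option 2: std::deque keeps existing elements in place when growing
// at the ends, so pointers and references to them remain valid.
bool deque_pointer_survives_growth() {
    std::deque<Item> items{{10}, {20}, {30}};
    Item* third = &items[2];
    for (int i = 0; i < 10000; ++i) {
        items.push_back(Item{i});
    }
    return third == &items[2] && third->value == 30;
}
```

Indices cost one extra addition per access but work with plain realloc-style arrays too, which may matter if the code has to stay in C.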
1) This is the case right?
Yes.
2) This seems like it would make persistent pointers to blocks of memory that may move very dangerous?
Yes.
3) What is the standard solution?
You design your application such that the lifetimes of your objects are well defined, so that you never refer to an object after it has been destroyed or moved.
4) In a MT environment, things are even worse. Local pointers access may be moved in the middle, then the memory changed, and the local pointer is now wrong. (Which, of course might be solved by preventing the moves by lock, etc...)
Obviously you should never use a pointer that no longer points to its resource. Managing shared resources in a MT environment is non-trivial, and there are a whole bunch of tools and techniques to achieve it.
5) Go with functional programming?
It is always advisable to avoid pointers if you can.
Without a specific problem it is hard to give a specific solution. But in order to achieve "not pointing at disappeared resources" we have various tools to employ. We have smart pointers, we have containers and we have value semantics. We need to understand how to use all of those but also we need to design with object lifetime in mind as a major consideration.
Object lifetime should always be an important factor. However, some languages (like Java, for instance) mitigate bad design by providing a "safer" environment. C++, on the other hand, is rather less forgiving. It does, however, have a whole bunch of sophisticated tools for the task. That means a steeper learning curve but more efficiency and better control.
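As one illustration of "not pointing at disappeared resources", here is a small sketch using std::shared_ptr as the owner and std::weak_ptr as a non-owning observer that can be checked before use. Resource, describe and observe_after_owner_is_gone are hypothetical names chosen for this example only.

```cpp
#include <memory>
#include <string>

struct Resource { std::string name; };

// Returns the resource name if it is still alive, "<expired>" otherwise.
// lock() yields an empty shared_ptr once the last owner is gone.
std::string describe(const std::weak_ptr<Resource>& observer) {
    if (std::shared_ptr<Resource> alive = observer.lock()) {
        return alive->name;
    }
    return "<expired>";
}

std::string observe_after_owner_is_gone() {
    std::weak_ptr<Resource> observer;
    {
        auto owner = std::make_shared<Resource>(Resource{"config"});
        observer = owner;   // does not extend the lifetime
    }                       // owner destroyed here, Resource freed
    return describe(observer);
}
```

A raw pointer in the observer's place would simply dangle; the weak_ptr turns that into a condition the code can test.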
The std::get_temporary_buffer returns a std::pair holding a pointer to the beginning of the allocated storage and the number of objects allocated, and the only purpose of its counterpart: std::return_temporary_buffer is to deallocate memory previously allocated with std::get_temporary_buffer.
Both functions live in the <memory> header, whose main purpose is to provide tools that enhance memory management (as its name implies) and make it safer.
On the safety side, the <memory> header also provides the smart pointers, which manage memory in a RAII-like manner and hence make memory management exception safe.
C++14 also added the std::make_unique helper function, so we can avoid using raw pointers in many cases nowadays.
With all these efforts to reduce the use of raw pointers, realizing that std::get_temporary_buffer returns a raw pointer instead of a smart pointer is pretty confusing. That's why I want to ask:
Is there any reason for std::get_temporary_buffer to return a raw pointer instead of returning a smart one?
If there's a reason for this "old fashioned" way of allocating and deallocating memory manually, what goal does it serve that cannot be achieved with smart pointers?
The simple answer is that std::get_temporary_buffer was created before smart pointers were standardized, and changing the return value of std::get_temporary_buffer in C++11 would have broken code that depended on it, which is absolutely unacceptable for the C++ standard library.
Now, why haven't they standardized a new smart pointer equivalent?
Well, maybe no one was interested in having one. Personally, I find it weird to have one smart pointer own many objects. If you need a smart array, use std::vector.
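For comparison, here is a rough sketch of what an owning scratch buffer can look like with tools that are already in the standard library. It deliberately does not call std::get_temporary_buffer; FreeDeleter, make_scratch and sum_first_n are hypothetical helpers, and the malloc-based version assumes a trivially constructible element type such as int.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Deleter that hands raw storage back to std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Owning, uninitialised scratch buffer of n ints.
// The returned pointer is empty if the allocation fails.
std::unique_ptr<int[], FreeDeleter> make_scratch(std::size_t n) {
    return std::unique_ptr<int[], FreeDeleter>(
        static_cast<int*>(std::malloc(n * sizeof(int))));
}

// The std::vector route: storage is released automatically on return.
int sum_first_n(std::size_t n) {
    std::vector<int> scratch(n);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<int>(i) + 1;
        total += scratch[i];
    }
    return total;
}
```

Either form releases the memory on every exit path, including exceptions, which is the part the raw-pointer interface leaves to the caller.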
If you look at the docs for the old SGI STL implementations of get_temporary_buffer et al, they say...
Note: get_temporary_buffer and return_temporary_buffer are only provided for backward compatibility. If you are writing new code, you should instead use the temporary_buffer class.
That effectively acknowledges the desirability of better automated management. GCC added temporary_buffer as an extension (see here), but it never made it into the Standard. Long and short of it is that it's just not that useful, so having a better interface won't have been a priority. The whole notion of the OS guessing at whether it should give you all the requested memory or some smaller amount flies in the face of the optimistic memory allocation strategies used by most modern Operating Systems, and once you get multiple calls requesting more than the easily available memory, being too generous with the first leaves the others a bit starved: just not a very practical notion.
If you care, you could submit a proposal for a later C++ Standard....
I saw some posts about implementing GC in C, and some people said it's impossible because C is weakly typed. I want to know how to implement GC in C++.
I want some general idea about how to do it. Thank you very much!
This is a Bloomberg interview question my friend told me. He did badly at that time. We want to know your ideas about this.
Garbage collection is a difficult topic in both C and C++ for a few reasons:
Pointers can be typecast to integers and vice-versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it. A garbage collector has to be careful not to think a block is unreachable when indeed it still can be reached.
Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that if someone was doing something tricky with typecasting to integers that you correctly updated the integer if you moved a block of memory around.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might be allocated there. This is especially true for the standard library containers, which often make use of std::allocator to use this trick for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory either from the built-in freestore (malloc/free or new/delete), or from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer (released with return_temporary_buffer). The programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.
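To make the "any pointer-sized value could be a pointer" assumption concrete, here is a toy sketch of a conservative mark step. It is not how any particular collector is implemented; Block and conservative_mark are invented for the example, and the "roots" are just a list of machine words standing in for the stack and registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A heap block known to the (hypothetical) collector.
struct Block {
    std::uintptr_t start;
    std::size_t size;
    bool marked;
};

// Conservative marking: every root word whose value falls anywhere
// inside a block keeps that block alive, whether the word is really
// a pointer or just an integer with an unlucky value.
// Returns the number of blocks newly marked.
std::size_t conservative_mark(const std::vector<std::uintptr_t>& root_words,
                              std::vector<Block>& blocks) {
    std::size_t newly_marked = 0;
    for (std::uintptr_t word : root_words) {
        for (Block& b : blocks) {
            if (!b.marked && word >= b.start && word - b.start < b.size) {
                b.marked = true;
                ++newly_marked;
            }
        }
    }
    return newly_marked;
}
```

Note that the check accepts interior addresses as well as block starts, and that a pointer hidden on disk or inside an encoded integer would never show up in root_words at all, which is exactly the unsoundness described above.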
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.
Look into the Boehm Garbage Collector.
C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.
There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best known. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could in fact be collected, but "conservative" means playing safe.
Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
Of course this is a very exceptional special case.
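For the curious, a sketch of such a list, assuming a flat address space where pointers round-trip through std::uintptr_t. Each node stores only address(next) minus address(prev), computed with wrap-around unsigned arithmetic, and a missing neighbour counts as address 0. Node, link_all, walk_forward and walk_backward are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {
    int value;
    std::uintptr_t link;   // address(next) - address(prev), modulo 2^N
};

inline std::uintptr_t addr(const Node* n) {
    return n ? reinterpret_cast<std::uintptr_t>(n) : 0;
}

inline const Node* node_at(std::uintptr_t a) {
    return a ? reinterpret_cast<const Node*>(a) : nullptr;
}

// Link the elements of `nodes` in order. The vector must not be
// resized afterwards, since the links encode element addresses.
void link_all(std::vector<Node>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const Node* prev = i > 0 ? &nodes[i - 1] : nullptr;
        const Node* next = i + 1 < nodes.size() ? &nodes[i + 1] : nullptr;
        nodes[i].link = addr(next) - addr(prev);
    }
}

// Forward: next = link + address(prev).
std::vector<int> walk_forward(const Node* head) {
    std::vector<int> out;
    std::uintptr_t prev = 0;
    for (const Node* cur = head; cur != nullptr;) {
        out.push_back(cur->value);
        std::uintptr_t next = cur->link + prev;
        prev = addr(cur);
        cur = node_at(next);
    }
    return out;
}

// Backward: prev = address(next) - link.
std::vector<int> walk_backward(const Node* tail) {
    std::vector<int> out;
    std::uintptr_t next = 0;
    for (const Node* cur = tail; cur != nullptr;) {
        out.push_back(cur->value);
        std::uintptr_t prev = next - cur->link;
        next = addr(cur);
        cur = node_at(prev);
    }
    return out;
}
```

No field in any node holds the address of another node, so a collector scanning the nodes for pointer-like bit patterns finds nothing to follow.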
More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.
Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).
You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often times you can avoid dynamic heap allocation altogether.
For example:
char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;
The danger with the above is that if anything throws between the new and the delete, your delete will not be executed. Something like the above could easily be replaced with safer "garbage collected" code:
std::vector<char> charArray(128);
// do some stuff with characters
Bloomberg has notoriously irrelevant interview questions from a practical coding standpoint. Like most interviewers, though, they are more concerned with how you think and your communication skills than with the actual solution.
You can read about the shared_ptr class template.
It implements a simple reference-counting garbage collector.
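A small sketch of that reference counting in action. Tracker and destroyed_when_last_owner_leaves are names invented for the example; the counter is only there so the destruction can be observed. Keep in mind that plain reference counting does not reclaim cycles of shared_ptr that point at each other.

```cpp
#include <memory>

// Bumps an external counter when destroyed, so we can observe it.
struct Tracker {
    explicit Tracker(int& counter) : destroyed(counter) {}
    ~Tracker() { ++destroyed; }
    int& destroyed;
};

bool destroyed_when_last_owner_leaves() {
    int destroyed = 0;
    {
        auto first = std::make_shared<Tracker>(destroyed);
        {
            std::shared_ptr<Tracker> second = first;   // two owners
            if (first.use_count() != 2) return false;
        }                                              // back to one owner
        if (first.use_count() != 1 || destroyed != 0) return false;
    }                                                  // no owners: destructor runs
    return destroyed == 1;
}
```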
If you want a real garbage collector, you can overload the new operator.
Create a struct similar to shared_ptr, call it Object.
This will wrap the new object created. Now with overloading its operators, you can control the GC.
All you need to do now, is just implement one of the many GC algorithms
The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page) for a good overview of how one might write a garbage collector in C or C++.
I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to it. I've been thinking about either pointer-to-pointer in the pointer type itself, or, the collector maintains a list of pointers that point to each object so that they can be modified, removing the need for a double de-ref when accessing the pointer but adding some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straight forward about grafting on extra GC to C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a gc in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly, and it is unlikely ever to exist because it has limited utility.
What you actually need is a "re-address-me" operator. What happens is that you do not actually move objects around. You just use mmap to change the object address. This is much faster, and, in effect, it is using the VM features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do in C++ efficiently: you'd have to move to a temporary first. In C, it is much easier, you can use memmove. At some stage all the pointers to or into the moved objects have to be adjusted.
Using handles does not solve this problem; it just reduces the problem from arbitrary-sized objects to constant-sized ones. These are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handles from the array at random, you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
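The following is a toy sketch of that idea, not Felix's actual API: Shape, Pair, pair_shape and compact are hypothetical, the arena holds objects of a single type, and it assumes a flat address space in which all object pointers share one representation. Objects are copied with memcpy and the shape's offset list is then used to rewrite every pointer that referred into the old arena.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical shape descriptor: object size plus the byte offsets
// of the fields that hold (GC-managed) pointers.
struct Shape {
    std::size_t size;
    std::vector<std::size_t> pointer_offsets;
};

struct Pair {
    int value;
    Pair* other;
};

inline Shape pair_shape() {
    return Shape{sizeof(Pair), {offsetof(Pair, other)}};
}

// Move `count` objects of the given shape from `from` to `to`, then
// adjust every pointer field that pointed to or into the old arena.
void compact(const void* from, void* to, std::size_t count, const Shape& shape) {
    const std::size_t bytes = count * shape.size;
    std::memcpy(to, from, bytes);

    const std::uintptr_t old_base = reinterpret_cast<std::uintptr_t>(from);
    const std::uintptr_t new_base = reinterpret_cast<std::uintptr_t>(to);
    unsigned char* dest = static_cast<unsigned char*>(to);

    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t offset : shape.pointer_offsets) {
            unsigned char* slot = dest + i * shape.size + offset;
            void* target = nullptr;
            std::memcpy(&target, slot, sizeof target);   // read the pointer field
            const std::uintptr_t t = reinterpret_cast<std::uintptr_t>(target);
            if (t >= old_base && t - old_base < bytes) { // pointed into old arena
                target = reinterpret_cast<void*>(new_base + (t - old_base));
                std::memcpy(slot, &target, sizeof target);
            }
        }
    }
}
```

The important property is that only the listed offsets are touched: the int field is never inspected, so an integer that happens to look like an address is left alone.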
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
Now, the key point is: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see a value that looks like a pointer but happens to be an integer: some object won't be collected, but this is usually no big deal. But for compaction you have to adjust the pointers, and you'd better not change that integer. So you have to know for sure when something is a pointer, which means your collector has to be precise: the shape must be known.
In Ocaml this is relatively simple: everything is either a pointer or integer and the low bit is used at run time to tell. Objects pointed at have a code telling the type, and there are only a few types: either a scalar (don't scan it) or an aggregate (scan it, it only contains integers or pointers).
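The low-bit trick can be sketched in C++ as below. This only mirrors the general idea, not the OCaml runtime's actual code; Value and the box/unbox helpers are invented names. It assumes pointers are at least 2-byte aligned (so their low bit is 0), and that signed right shift is arithmetic, which is guaranteed from C++20 and is what mainstream compilers do before that.

```cpp
#include <cstdint>

// One machine word that is either a tagged integer or a pointer.
using Value = std::uintptr_t;

// Integers are shifted left and get a low bit of 1.
inline Value box_int(std::intptr_t n) {
    return (static_cast<Value>(n) << 1) | 1u;
}

inline bool is_int(Value v) { return (v & 1u) != 0; }

inline std::intptr_t unbox_int(Value v) {
    return static_cast<std::intptr_t>(v) >> 1;
}

// Aligned pointers already have a low bit of 0, so they are stored as-is.
inline Value box_ptr(void* p) { return reinterpret_cast<Value>(p); }

inline void* unbox_ptr(Value v) { return reinterpret_cast<void*>(v); }
```

A collector scanning a word only has to test is_int to know whether it must follow (and possibly rewrite) the value, at the price of one bit of integer range.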
This is a pretty straight-forward question so here's a straight-forward answer:
Mark-and-sweep (and occasionally mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (avoiding double de-refs). It's also very easy to implement. Since you're not worried about the performance impact of collection (mark-and-sweep tends to freeze the process at nondeterministic times), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
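As a rough idea of how little code the basic algorithm needs, here is a toy mark-and-sweep over a heap whose objects refer to each other by index. It is a sketch under simplifying assumptions, not taken from the pages linked above: Object and ToyHeap are invented, roots are passed in explicitly, and "freeing" just flags the slot as dead.

```cpp
#include <cstddef>
#include <vector>

struct Object {
    std::vector<std::size_t> children;   // indices of referenced objects
    bool marked = false;
    bool live = true;                    // false once swept
};

class ToyHeap {
public:
    std::size_t allocate() {
        objects_.push_back(Object{});
        return objects_.size() - 1;
    }

    void add_reference(std::size_t from, std::size_t to) {
        objects_[from].children.push_back(to);
    }

    bool is_live(std::size_t id) const { return objects_[id].live; }

    // Runs one full collection and returns how many objects were reclaimed.
    std::size_t collect(const std::vector<std::size_t>& roots) {
        // Mark: depth-first traversal from the roots with an explicit stack.
        std::vector<std::size_t> pending(roots.begin(), roots.end());
        while (!pending.empty()) {
            Object& obj = objects_[pending.back()];
            pending.pop_back();
            if (!obj.live || obj.marked) continue;
            obj.marked = true;
            pending.insert(pending.end(), obj.children.begin(), obj.children.end());
        }
        // Sweep: anything live but unmarked is unreachable.
        std::size_t reclaimed = 0;
        for (Object& obj : objects_) {
            if (obj.live && !obj.marked) {
                obj.live = false;
                obj.children.clear();
                ++reclaimed;
            }
            obj.marked = false;          // reset for the next cycle
        }
        return reclaimed;
    }

private:
    std::vector<Object> objects_;
};
```

Unlike reference counting, an unreachable cycle is reclaimed here, because reachability is decided from the roots rather than from per-object counts.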
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
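A minimal sketch of what "just a pointer bump" means. Nursery is a hypothetical class; it hands out raw storage only (no constructors are run), it assumes the alignment argument is a power of two, and it assumes the backing vector's storage is itself aligned at least as strictly as the requested alignment. A real nursery would evacuate survivors to an older generation before calling reset.

```cpp
#include <cstddef>
#include <vector>

class Nursery {
public:
    explicit Nursery(std::size_t bytes) : storage_(bytes), top_(0) {}

    // Allocation is an align-up, a bounds check, and an addition.
    // Returns nullptr when the nursery is full.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        const std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start > storage_.size() || size > storage_.size() - start) {
            return nullptr;
        }
        top_ = start + size;
        return storage_.data() + start;
    }

    // Discards everything at once, as a minor collection would
    // after copying out the survivors.
    void reset() { top_ = 0; }

    std::size_t used() const { return top_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t top_;
};
```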
You could implement pointer updates without using double indirection by using techniques like a shadow stack but this will be slow and very error prone if you're writing this C++ code by hand.
I'm programming in C++ right now, and I love using pointers. But it seems that other, newer languages like Java, C#, and Python don't allow you to explicitly declare pointers. In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?
Pointers aren't bad, they are just easy to get wrong. In newer languages they have found ways of doing the same things, but with less risk of shooting yourself in the foot.
There is nothing wrong with pointers though. Go ahead and love them.
Toward your example, why would you want both x and y pointing to the same memory? Why not just always call it x?
One more point, pointers mean that you have to manage the memory lifetime yourself. Newer languages prefer to use garbage collection to manage the memory and allowing for pointers would make that task quite difficult.
I'll start with one of my favorite Scott Meyers quotes:
When I give talks on exception handling, I teach people two things:
POINTERS ARE YOUR ENEMIES, because they lead to the kinds of problems that auto_ptr is designed to eliminate.
POINTERS ARE YOUR FRIENDS, because operations on pointers can't throw.
Then I tell them to have a nice day :-)
The point is that pointers are extremely useful and it's certainly necessary to understand them when programming in C++. You can't understand the C++ memory model without understanding pointers. When you are implementing a resource-owning class (like a smart pointer, for example), you need to use pointers, and you can take advantage of their no-throw guarantee to write exception-safe resource owning classes.
However, in well-written C++ application code, you should never have to work with raw pointers. Never. You should always use some layer of abstraction instead of working directly with pointers:
Use references instead of pointers wherever possible. References cannot be null and they make code easier to understand, easier to write, and easier to code review.
Use smart pointers to manage any pointers that you do use. Smart pointers like shared_ptr, auto_ptr, and unique_ptr help to ensure that you don't leak resources or free resources prematurely.
Use containers like those found in the standard library for storing collections of objects instead of allocating arrays yourself. By using containers like vector and map, you can ensure that your code is exception safe (meaning that even when an exception is thrown, you won't leak resources).
Use iterators when working with containers. It's far easier to use iterators correctly than it is to use pointers correctly, and many library implementations provide debug support for helping you to find where you are using them incorrectly.
When you are working with legacy or third-party APIs and you absolutely must use raw pointers, write a class to encapsulate usage of that API.
C++ has automatic resource management in the form of Scope-Bound Resource Management (SBRM, also called Resource Acquisition is Initialization, or RAII). Use it. If you aren't using it, you're doing it wrong.
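A short sketch of the "encapsulate the legacy API" advice combined with RAII. The legacy_open / legacy_close functions below are stand-ins written for this example, not a real library; the only point is that the raw pointer they hand out is wrapped immediately in a std::unique_ptr with a custom deleter.

```cpp
#include <memory>

// --- Stand-in for a legacy C-style API (hypothetical) ---
struct LegacyHandle { int id; };

inline int& open_handle_count() {
    static int count = 0;
    return count;
}

inline LegacyHandle* legacy_open(int id) {
    ++open_handle_count();
    return new LegacyHandle{id};
}

inline void legacy_close(LegacyHandle* h) {
    --open_handle_count();
    delete h;
}

// --- RAII wrapper: the handle is closed when the owner goes out of scope ---
struct LegacyCloser {
    void operator()(LegacyHandle* h) const noexcept { legacy_close(h); }
};

using Handle = std::unique_ptr<LegacyHandle, LegacyCloser>;

inline Handle open_handle(int id) {
    return Handle(legacy_open(id));
}
```

Application code then only ever sees Handle, and legacy_close runs on every exit path, including when an exception propagates.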
Pointers can be abused, and managed languages prefer to protect you from potential pitfalls. However, pointers are certainly not bad - they're an integral feature of the C and C++ languages, and writing C/C++ code without them is both tricky and cumbersome.
A true "pointer" has two characteristics.
It holds the address of another object (or primitive)
and exposes the numeric nature of that address so you can do arithmetic.
Typically the arithmetic operations defined for pointers are:
Adding an integer to a pointer into an array, which returns the address of another element.
Subtracting two pointers into the same array, which returns the number of elements in-between (inclusive of one end).
Comparing two pointers into the same array, which indicates which element is closer to the head of the array.
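The three operations look like this in C++ (ArithmeticDemo and pointer_arithmetic_demo are names made up for the illustration):

```cpp
#include <cstddef>

struct ArithmeticDemo {
    int element;              // value reached by adding an integer
    std::ptrdiff_t distance;  // result of subtracting two pointers
    bool first_is_earlier;    // result of comparing two pointers
};

inline ArithmeticDemo pointer_arithmetic_demo() {
    int data[5] = {10, 20, 30, 40, 50};
    int* p = data + 1;        // address of data[1]
    int* q = data + 4;        // address of data[4]
    return ArithmeticDemo{
        *(p + 2),             // data[3], i.e. 40
        q - p,                // 3 elements apart
        p < q                 // true: p is closer to the head
    };
}
```

All three are only defined while both pointers stay within the same array (or one past its end); nothing in the language stops you from stepping outside it.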
Managed languages generally lead you down the road of "references" instead of pointers. A reference also holds the address of another object (or primitive), but arithmetic is disallowed.
Among other things, this means you can't use pointer arithmetic to walk off the end of an array and treat some other data using the wrong type. The other way of forming an invalid pointer is taken care of in such environments by using garbage collection.
Together this ensures type-safety, but at a terrible loss of generality.
I'll try to answer the OP's question directly:
In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?
The reason behind this is the managed memory model in these languages. In C# (or Python, or Java, ...) the lifetime of resources, and thus the usage of memory, is managed automatically by the underlying runtime, or by its garbage collector, to be precise. Briefly: the application has no control over the location of a resource in memory. The location is not specified, and is not even guaranteed to stay constant during the lifetime of a resource. Hence, the notion of a pointer as 'the location of something in virtual or physical memory' is completely irrelevant.
As someone already mentioned, pointers can, and actually will, go wrong in a massive application. This is one of the reasons we sometimes see Windows having problems caused by NULL pointers! I personally don't like pointers because they cause terrible memory leaks, and no matter how well you manage your memory, they will eventually hunt you down in some way. I experienced this a lot with OpenCV when working on image processing applications. Having lots of pointers floating around, putting them in a list and then retrieving them later caused problems for me. But again, there are good sides to using pointers, and they are often a good way to tune up your code. It all depends on what you are doing, what specs you have to meet, etc.
Pointers aren't bad. Pointers are difficult. The reason you can't have int x and int * y in Java, C#, etc., is that such languages want to keep you away from solving coding problems (which eventually come down to subtle mistakes of your own) and bring you closer to producing solutions for your project. They want to give you productivity.
As I said before, pointers aren't bad, they're just difficult. In a Hello World program pointers seem to be a piece of cake. Nevertheless, when the program starts growing, managing pointers, passing the correct values, counting the objects, deleting the pointers, etc. starts to get complex.
Now, from the point of view of the programmer (the user of the language, in this case you) this leads to another problem that becomes evident over time: the longer you go without reading the code, the more difficult it becomes to understand it again (e.g. projects from past years, months or even days). Add to this the fact that pointers sometimes use more than one level of indirection (TheClass ** ptr).
Last but not least, there are areas where pointers are very useful. As I mentioned before, they aren't bad. In the field of algorithms they are very useful because, in a mathematical context, you can refer to a specific value using a simple pointer (e.g. int *) and change its value without creating another object; ironically, it becomes easier to implement the algorithms with pointers than without them.
Reminder: each time you ask why pointers or anything else is bad or good, try instead to think about the historical context in which the topic or technology emerged and the problem it tried to solve. In this case, pointers were needed to access the memory of the PDP computers at Bell Laboratories.