Detect dynamically allocated object? - c++

Can I check if an object (passed by pointer or reference) is dynamically allocated?
Example:
T t;
T* pt = new T();
is_tmp(&t); // false
is_tmp(pt); // true
Context
I perfectly realise this smells like bad design, and as a matter of fact it is, but I am trying to extend code I cannot (or should not) modify (of course I blame code that isn't mine ;) ). It calls a method (which I can override) that will delete the passed object among other things that are only applicable to dynamically allocated objects. Now, I want to check whether I have something that is okay to be deleted or if it is a temporary.
I will never pass a global (or static) variable, so I leave this undefined, here.

Not portably. Under Solaris or Linux on a PC (at least 32 bit Linux),
the stack is at the very top of available memory, so you can compare the
address passed in to the address of a local variable: if the address
passed in is higher than that of the local variable, the object it
points to is either a local variable or a temporary, or a part of a
local variable or temporary. This technique, however, invokes undefined
behavior right and left—it just happens to work on the two
platforms I mention (and will probably work on all platforms where the
stack is at the top of available memory and grows down).
FWIW: you can also check for statics on these machines. All statics are
at the bottom of memory, and the linker inserts a symbol end at the
end of them. So declare an external data (of any type) with this name,
and compare the address with it.
With regards to possibly deleting the object, however... just knowing
that the object is not on the heap (nor is a static) is not enough. The
object might be a member of a larger dynamically allocated object.

In general, as DeadMG said, there's no way you can tell from a pointer where it comes from. However, as a debugging or porting or analyzing measure, you could add a member operator new to your class which tracks dynamic allocations (provided nobody uses the explicit global ::new -- that includes containers, I'm afraid). You could then build up a set<T*> of dynamically allocated memory and search in there.
That's not at all suitable for any sort of serious application, but perhaps this can help you track where things are coming from. You can even add debug messages with line numbers to your operator.
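A minimal sketch of that tracking idea, for debugging only. The class name Tracked and the helpers live() and is_dynamic() are made up for illustration. It assumes single-threaded use and only sees objects created with plain new on this class: arrays (operator new[]), placement new, and objects embedded in a larger heap object are not covered.

```cpp
#include <cstddef>
#include <new>
#include <set>

// Hypothetical class that records every address handed out by its own operator new.
class Tracked {
public:
    // Class-specific operator new: allocate as usual, then remember the address.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        try {
            live().insert(p);
        } catch (...) {
            ::operator delete(p);  // do not leak if the bookkeeping fails
            throw;
        }
        return p;
    }

    // Class-specific operator delete: forget the address, then release the memory.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        live().erase(p);
        ::operator delete(p);
    }

    // True only if p was returned by Tracked::operator new and not yet deleted.
    static bool is_dynamic(const Tracked* p) {
        return live().count(p) != 0;
    }

private:
    // The set itself allocates through the global allocator, so there is no recursion.
    static std::set<const void*>& live() {
        static std::set<const void*> addresses;
        return addresses;
    }
};
```

With this in place, is_dynamic(&t) is false for a local Tracked t and true for a pointer obtained from new Tracked().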

No, it's impossible to know. You should fix the bug. At the very least, you can use a smart pointer (like shared_ptr) and give it a custom deleter that does nothing if you don't want the object to be deleted.
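A small sketch of the no-op deleter idea, assuming the code that consumes the object can be made to accept a std::shared_ptr. Widget, non_owning() and owning() are hypothetical names used only for illustration.

```cpp
#include <memory>

// Hypothetical payload type.
struct Widget {
    int value = 0;
};

// Wraps an object the caller still owns. The deleter does nothing,
// so the Widget is left alone when the last shared_ptr goes away.
std::shared_ptr<Widget> non_owning(Widget& w) {
    return std::shared_ptr<Widget>(&w, [](Widget*) {});
}

// Ordinary owning pointer: the Widget is destroyed with the last shared_ptr.
std::shared_ptr<Widget> owning() {
    return std::make_shared<Widget>();
}
```

The receiver treats both pointers the same way. The non-owning one is only safe while the wrapped object is still alive.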

If you have access to the dynamic memory allocator code itself, you could scan the internal structure and see if the current pointer is in its allocated list/stack/area or however it is being stored. Quite often they are stored as linked list style structs and it wouldn't be too hard to scan for your var's address.

In my opinion it should be possible,
because you can check whether the memory is on the heap or on the stack.
This is going to be highly platform-dependent code.
First you have to get the range of the heap, and then you have to check whether the passed memory address is in this range...
(sounds simple, but the first step is probably tricky :-) )

Related

A pointer variable in C++ points only to variables in the heap?

According to Warford's "Computer Systems" (4th ed), "When you declare a global or local variable, you specify its type. For example, you can specify the type to be an integer, or a character, or an array. Similarly, when you declare a pointer, you must declare that it points to some type. The pointer itself can be global or local. The value to which it points, however, resides in the heap and is neither global nor local."
I know that you can create local variables in the stack, global variables in a fixed location, and dynamically allocated variables in the heap. Can pointers only point to dynamically allocated variables then? Why can't a pointer point to global or local variables?
Thanks
The value to which it points, however, resides in the heap and is neither global nor local.
That is incorrect.
Can pointers only point to dynamically allocated variables then?
No.
Why can't a pointer point to global or local variables?
They sure can.
You might want to read the book more carefully to see whether the statement in book was made with some additional context. If not, it's time to abandon the book and learn from a different one.
The pointer points to the place in memory where the variable is located. I don't think you should care about where that is. You can make a pointer to absolutely any address, even one outside your program (however, this is dangerous and may crash your app). A pointer is just a number, an address in memory. There are many cases where you point to a global or local variable, for example when you want to pass a class instance or an array to a function.
A pointer can point to local variables or global variables (or fields inside them). But you might prefer references in that case.
A pointer can also point to memory obtained by some ad-hoc mechanism. For instance, the ::operator new on Linux would indirectly call mmap(2) (or similar system calls) which grows the virtual address space of your process. Or in some embedded processors you have special primitives to get some memory (e.g. switch memory banks), etc...
A pointer can also point to some address outside of your virtual address space. Then dereferencing it is undefined behavior and often gives a segmentation fault.
However, it is often (but not always) considered bad taste to point to local or global data. It is often simpler to code with the convention that pointers should be to heap allocated memory. Read also about smart pointers (notably std::shared_ptr) and reference counting.
(there are some excellent reasons to use pointers to local data on the call stack or to global data, but that should be done with caution and care; I leave you to find out more about them, perhaps by studying existing C++ source code - e.g. of free software)
Read also about memory leaks and dangling pointers. You should avoid both of them. On Linux and POSIX systems, learn about valgrind (a tool to help hunting memory related bugs).
The terminology and concepts of garbage collection are useful to know.
A common style might be (for pointers stored in fields of classes or structures) to allocate memory for them with new in constructors and to release with delete in destructors.
Pointers can point to any object, so the statement "The value to which it points, however, resides in the heap" is at best incomplete and in fact wrong. A pointer can point to any object, regardless of whether it resides on a "stack", on a "heap" or in some "fixed" memory, and regardless of whether it is a "global variable" or a "local variable". Note that all these terms are imprecise and not even used in the standard in the context of objects or pointers. The standard just talks about the storage duration of objects (automatic, static, dynamic), and about the scope (i.e. visibility) of the variable name (block, function, namespace, global namespace,...).
So the only important thing is that a pointer can point to any object, regardless of its storage duration, but it must not be "used" (i.e. dereferenced) before the lifetime of the object it points to has begun or after the lifetime has ended. Otherwise, the behaviour of your program is undefined.
BTW: regardless of how old your book is, there has not been any C++ standard where the statement "points always to the heap" has been true.
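To make that concrete, here is a short sketch in which the same pointer type refers to objects of static, automatic and dynamic storage duration in turn. The names global_counter, read_through() and sum_of_three() are invented for the example.

```cpp
#include <memory>

int global_counter = 1;  // static storage duration

// Accepts a pointer to any int, wherever that int lives.
int read_through(const int* p) {
    return *p;
}

int sum_of_three() {
    int local = 2;                         // automatic storage duration
    auto heap = std::make_unique<int>(3);  // dynamic storage duration
    return read_through(&global_counter)
         + read_through(&local)
         + read_through(heap.get());
}
```

All three dereferences are valid because each object is still within its lifetime when it is read.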

Pointer pointing to deleted stack memory

I believe what I have just experienced is called "undefined behavior", but I'm not quite sure. Basically, I had an instance declared in an outer scope that holds addresses of a class. In the inner level I instantiated an object on the stack and stored the address of that instance into the holder.
After the inner scope had escaped, I checked to see if I could still access methods and properties of the removed instance. To my surprise it worked without any problem.
Is there a simple way to combat this? Is there a way I can clear deleted pointers from the list?
example:
std::vector<int*> holder;
{
int inside = 12;
holder.push_back(&inside);
}
cout << "deleted variable:" << holder[0] << endl;
Is there a simple way to combat this?
Sure, there are a number of ways to avoid this sort of problem.
The easiest way would be to not use pointers at all -- pass objects by value instead. i.e. In your example code, you could use a std::vector<int> instead of a std::vector<int *>.
If your objects are not copy-able for some reason, or are large enough that you think it will be too expensive to make copies of them, you could allocate them on the heap instead, and manage their lifetimes automatically using shared_ptr or unique_ptr or some other smart-pointer class. (Note that passing objects by value is more efficient than you might think, even for larger objects, since it avoids having to deal with the heap, which can be expensive... and modern CPUs are most efficient when dealing with contiguous memory. Finally, modern C++ has various optimizations that allow the compiler to avoid actually doing a data copy in many circumstances)
In general, retaining pointers to stack objects is a bad idea unless you are 100% sure that the pointer's lifetime will be a subset of the lifetime of the stack object it points to. (and even then it's probably a bad idea, because the next programmer who takes over the code after you've moved on to your next job might not see this subtle hazard and is therefore likely to inadvertently introduce dangling-pointer bugs when making changes to the code)
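A sketch of the two alternatives, based on the code from the question. The function names collect_by_value() and collect_owned() are made up; the second variant assumes C++14 for std::make_unique.

```cpp
#include <memory>
#include <vector>

// Option 1: store the value itself, so nothing can dangle.
std::vector<int> collect_by_value() {
    std::vector<int> holder;
    {
        int inside = 12;
        holder.push_back(inside);  // copies the value into the vector
    }
    return holder;  // still holds 12 after the inner scope has ended
}

// Option 2: allocate on the heap and let unique_ptr manage the lifetime.
std::vector<std::unique_ptr<int>> collect_owned() {
    std::vector<std::unique_ptr<int>> holder;
    {
        holder.push_back(std::make_unique<int>(12));
    }
    return holder;  // each int lives exactly as long as its vector element
}
```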
After the inner scope had escaped, I checked to see if I could still
access methods and properties of the removed instance. To my surprise
it worked without any problem.
That can happen if the memory where the object was hasn't been overwritten by anything else yet -- but definitely don't rely on that behavior (or any other particular behavior) if/when you dereference an invalid pointer, unless you like spending a lot of quality time with your debugger chasing down random crashes and/or other odd behavior :)
Is there a way I can clear deleted pointers from the list?
In principle, you could add code to the objects' destructors that would go through the list and look for pointers to themselves and remove them. In practice, I think that is a poor approach, since it uses up CPU cycles trying to recover from an error that a better design would not have allowed to be made in the first place.
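For completeness, a rough sketch of what such a self-unregistering object could look like. Node and its registry parameter are hypothetical, and the sketch assumes the registry outlives every Node registered in it.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical object that adds itself to a list of pointers on construction
// and removes itself again on destruction.
class Node {
public:
    explicit Node(std::vector<Node*>& registry) : registry_(registry) {
        registry_.push_back(this);
    }

    ~Node() {
        // Linear scan on every destruction: this is the CPU cost mentioned above.
        registry_.erase(std::remove(registry_.begin(), registry_.end(), this),
                        registry_.end());
    }

    // Copies would not be registered, so copying is disabled.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

private:
    std::vector<Node*>& registry_;
};
```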
Btw this is off topic but it might interest you that the Rust programming language is designed to detect and prevent this sort of error by catching it at compile-time. Maybe someday C++ will get something similar.
There is no such thing as a deleted pointer. A pointer is just a number representing some address in your process's virtual address space. Even when the stack frame is long gone, the memory that held it is still mapped, since it was allocated when the thread started. So technically the pointer is still "valid" in the sense that you could dereference it and get something back, but doing so is undefined behavior: the object it pointed to is already gone, and the correct term is dangling pointer. The moral is that if you have a pointer to an object in a stack frame, there is no way to determine whether it is still valid, not even with functions like IsBadReadPtr (Win32 API, just as an example). The best way to prevent such situations is to avoid returning and storing pointers to stack objects.
However, if you wish to track your heap allocated memory and automatically deallocate it after it is no longer used, you could utilize smart pointers (std::shared_ptr, boost::shared_ptr, etc).

Objects of com from a type library heap or stack

When we create objects from a type library for instance
SomeClassPtr some_obj(__uuidof(SomeImplementation));
is some_obj created on a heap or stack ? I mean is it like
SomeClassPtr *some_obj = new SomeImplementation();
Wrong way to think about it. Not the stack.
But that's where the guessing ends. This COM object could live in a different process. Or on a machine half-way around the world. All you got is an interface pointer, what it points to you don't know. Could be the actual object allocated on the heap. Could be a proxy that talks to a stub located somewhere else. Anywhere else. That's a feature, avoid caring about it.
In this case, your "pointer" (SomeClassPtr) is pointing to a block of memory which will be heap allocated.
However, it's not necessarily performing the heap allocation, as it's actually a reference counted type which handles the allocation and deallocation (via IUnknown::AddRef and IUnknown::Release). This means it may be acquiring and incrementing the reference count of an object which was previously allocated, depending on the type stored in the COM pointer.
Actually, it's a little weirder than that.
Strictly speaking, the first example is a stack variable, and the second example is a heap variable (assuming new hasn't been overridden).
The weird part happens because, by the nature of COM, you don't get to know what or where or if any other allocation happens (if the result is just an AddRef, there might not be one, for example); in particular, for allocations that need to cross thread/process boundaries (which will always be the case for out of process servers), the default implementation of IMalloc::Alloc (CoTaskMemAlloc()) is to allocate from one of the heaps of the process in which the COM object is actually created/instantiated.
As pointed out elsewhere, that's a feature; you shouldn't need to care.

C++ Pointing to classes

I'm going through a C++ book at the moment and i'm slightly confused about pointing to classes.
Earlier in the book the examples used classes and methods in this way:
Calculator myCalc;
myCalc.launch();
while( myCalc.run() ){
myCalc.readInput();
myCalc.writeOutput();
}
However, now it's changed to doing it this way:
Calculator* myCalc = new Calculator;
myCalc -> launch();
while( myCalc -> run() ){
myCalc -> readInput();
myCalc -> writeOutput();
}
And I can't seem to find an explanation in there as to WHY it is doing it this way.
Why would I want to point to a class in this way, rather than use the standard way of doing it?
What is the difference? And what circumstances would one or the other be preferable?
Thank you.
First, you are not pointing to the class, but to an instance of the class, also called an object. (Pointing to classes is not possible in C++, one of its flaws if you'd ask me).
The difference is the place where the object is allocated. When you're doing:
Calculator myCalc;
The whole object is created on the stack. The stack is the storage for local variables, nested calls and so on, and is often limited to a few megabytes or less (for example, 1 MB by default on Windows). On the other hand, allocations on the stack are faster, as no memory manager call is involved.
When you do:
Calculator *myCalc;
Not much happens, except that a Pointer is allocated on the stack. A pointer is usually 4 or 8 bytes in size (32bit vs. 64bit architectures) and only holds a memory address. You have to allocate an object and make the pointer point to it by doing something like:
myCalc = new Calculator;
which can also be combined into one line, as shown in your example. Here, the object is allocated on the heap, which is approximately as large as your physical memory (leaving swap space and architectural limitations unconsidered), so you can store way more data there. But it is slower, as the memory manager needs to kick in and find a spare place on the heap for your object, or even needs to get more memory from the operating system. Now the pointer myCalc contains the memory address of the object, so it can be used with the * and the -> operators.
Also you cannot pass pointers or references to objects on the stack outside their scope, as the stack will get cleaned when the scope ends (i.e. at the end of a function for example), thus the object becomes unavailable.
Oh and nearly forgot to mention. Objects on the heap are not automatically destroyed, so you have to delete them manually like this*:
delete myCalc;
So to sum it up: For small, short living objects which are not to leave their scope, you can use stack based allocation, while for larger, long living objects the heap is usually the better place to go.
*: Well, ideally, not like that. Use a smart pointer, like std::unique_ptr.
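Here is a sketch of the two styles side by side, with std::unique_ptr taking care of the delete. The Calculator below is a hypothetical stand-in for the one in the book (its run() simply returns true three times), and std::make_unique assumes C++14.

```cpp
#include <memory>

// Hypothetical stand-in for the book's Calculator class.
class Calculator {
public:
    void launch() { remaining_ = 3; }
    bool run() { return remaining_-- > 0; }  // true three times after launch()

private:
    int remaining_ = 0;
};

// Stack-based: the object is destroyed automatically at the end of the function.
int loop_with_stack_object() {
    Calculator myCalc;
    myCalc.launch();
    int iterations = 0;
    while (myCalc.run()) ++iterations;
    return iterations;
}

// Heap-based: unique_ptr deletes the object when it goes out of scope.
int loop_with_unique_ptr() {
    auto myCalc = std::make_unique<Calculator>();
    myCalc->launch();
    int iterations = 0;
    while ((*myCalc).run()) ++iterations;  // (*p).run() is the same as p->run()
    return iterations;
}
```

Both functions do the same work; only where the Calculator lives and who cleans it up differ.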
You use the dot (.) when your variable is an instance or reference of the class while you use -> if your variable is a pointer to an instance of a class.
They are both part of the C++ standard, but there is a core difference. In the first way, your object lives on the stack (which is where functions and local variables are stored, and removed after they are no longer used). When you instead declare your variable type as a pointer, you are only storing a pointer on the stack, and the object itself is going on the heap.
When you use a local variable on the stack, the memory is automatically taken care of by C++. When the object is on the heap, you have to get the memory with new and free it with delete.
In the stack example your code uses . to call methods. To call methods through a pointer, C++ provides a shortcut: obj->method(), which is equivalent to (*obj).method().
Remember, when you use new, always use delete.
Both are standard. One is not preferred over the other.
The first one is typical of local variables that you declare and use in a narrow scope.
The pointer method allows you to dynamically allocate memory and assign it to a pointer type; that's what the "star" notation means. These can be passed out of a method or assigned to a member variable, living on after a method is exited.
But you have to be aware that you are also responsible for cleaning up that memory when you're done with the object the pointer refers to. If you don't, you may eventually exhaust the memory of a long-running application with a "memory leak".
Other than the obvious difference in notation/syntax, pointers are generally useful when passing data into a function.
void myFunc(Calculator *c) {
...
}
is usually preferred over
void myFunc(Calculator c) {
...
}
since the second requires that a copy be made of the calculator. A pointer only contains the location of what is being pointed to, so it only refers to another spot in memory instead of containing the data itself. Another good use is for strings: imagine reading a text file and calling functions to process the text; each function would make a copy of the string if it were passed by value rather than by pointer. A pointer is either 4 or 8 bytes depending on the machine's architecture, so it can save a lot of time and memory when passing it to functions.
In some case though it may be better to work with a copy. Maybe you just want to return an altered version like so
Calculator myFunc(Calculator c) {
...
}
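The three ways of passing can be sketched like this. The one-field Calculator and the function names are hypothetical; the reference version is included because it is the usual C++ alternative to a pointer that can never be null.

```cpp
// Hypothetical minimal Calculator with a single piece of state.
struct Calculator {
    int total = 0;
};

// Pointer parameter: no copy, modifies the caller's object, may be null.
void add_via_pointer(Calculator* c, int n) {
    if (c != nullptr) c->total += n;
}

// Reference parameter: no copy, modifies the caller's object, cannot be null.
void add_via_reference(Calculator& c, int n) {
    c.total += n;
}

// Value parameter: works on a copy and returns the altered version.
Calculator add_to_copy(Calculator c, int n) {
    c.total += n;
    return c;
}
```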
One of the important things about pointers is the "new" keyword. It is not the only way to obtain a pointer, but it is the easiest way in C++. You can also use the function malloc(), but that is more for structs and C IMO (malloc() does not run constructors), though I have seen both ways.
Speaking of C: pointers can also be used for arrays. I think you can still only declare the size of a built-in array at compile time in C++ too, but I could be mistaken. You could use the following, I believe
Calculator *c;
....
Calculator d = c[index];
So now you have an array which can make it quite ambiguous IMO.
I think that covers just about all I know and in the example provided I do not think there is any difference between the two snippets you provided.
First of all, you are not pointing to a class, you are pointing to an instance (or object) of that class. In some other languages, classes are actually objects too :-)
The example is just that, an example. Most likely you wouldn't use pointers there.
Now, what IS a pointer? A pointer is just a tiny little thing that points to the real thing. Like the nametag on a doorbell -- it shows your name, but it's not actually you. However, because it is not you, you can actually have multiple buttons with your name on it in different locations.
This is one reason for using pointers: if you have one object, but you want to keep pointers to that object in various places. I mean, the real world has tons of "pointers" to you in all sorts of places; it shouldn't be too difficult to imagine that programs might need similar things inside their data.
Pointers are also used to avoid having to copy the object around, which can be an expensive operation. Passing a pointer to functions is much cheaper. Plus, it allows functions to modify the object (note that technically, C++ "references" are pointers as well, it's just a little less obvious and they are more limited).
In addition, objects allocated with "new" will stay around until they are deallocated with "delete". Thus, they don't depend on scoping -- they don't disappear when the function around them finishes, they only disappear when they are told to get lost.
Plus, how would you make a "bag with fruit"? You allocate a "bag" object. Then you allocate a "fruit" object, and you set a pointer inside the bag object to point to the fruit object, indicating that the bag is supposed to contain that fruit. The fruit might also get a pointer to the bag object, just so code working on the fruit can also get to the bag. You can also allocate another "fruit" object, and establish a chain of pointers: each "fruit" could have a single "next" pointer that points to the "next" fruit, so you can put an arbitrary number of fruits into the bag: the bag contains a pointer to the first fruit, and each fruit contains a pointer to another fruit. So you get a whole chain of fruits.
( This is a simple "container"; there are several such classes that "contain" an arbitrary number of objects ).
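The bag of fruit could be sketched as a singly linked list like the one below. Bag, Fruit, add() and count() are invented names, and std::unique_ptr is used for the "next" pointers (an assumption on my part) so that the chain cleans itself up.

```cpp
#include <memory>
#include <string>
#include <utility>

// Each fruit owns a pointer to the next fruit in the chain.
struct Fruit {
    std::string name;
    std::unique_ptr<Fruit> next;
};

// The bag only holds a pointer to the first fruit.
class Bag {
public:
    // Puts a new fruit at the front of the chain.
    void add(std::string name) {
        auto fruit = std::make_unique<Fruit>();
        fruit->name = std::move(name);
        fruit->next = std::move(first_);
        first_ = std::move(fruit);
    }

    // Walks the chain by following the "next" pointers.
    int count() const {
        int n = 0;
        for (const Fruit* f = first_.get(); f != nullptr; f = f->next.get()) ++n;
        return n;
    }

private:
    std::unique_ptr<Fruit> first_;
};
```

In real code std::vector or std::list would normally do this job; the sketch only shows how pointers form the chain.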
It's actually not that simple to come up with descriptions of when or why pointers are used; usually there'll just be situations where you'll need them. It's much easier to see their usefulness when you run into such a situation. Like "why is an umbrella useful" -- once you step into the pouring rain outside, the usefulness of an umbrella will become obvious.
One use would be if the variable myCalc has a very long lifetime. You can create it when you need it with new and remove it when done with delete. Then you don't have to worry about carrying it around at times when it's not needed and would only take up space. Or you can reinitialise it at will when needed, etc.
Or when you have a very big class, it's common practice to use new to allocate it on the heap rather than the stack. This is a leftover from the days when stack space was scarce and the heap was larger, so heap space was cheaper.
Or, of course, the most common use, allocating a dynamic array. myCalc = new Calculator[x]; to create x new calculators. You can't do this with static variables if you don't know beforehand how large x is; how many objects you're going to create.
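When x is only known at run time, std::vector does the same dynamic allocation as new Calculator[x] and also handles the matching delete[]. A minimal sketch, with a hypothetical Calculator that only carries an id:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal Calculator.
struct Calculator {
    int id = 0;
};

// Creates x calculators, where x is only known at run time.
std::vector<Calculator> make_calculators(std::size_t x) {
    std::vector<Calculator> calcs(x);  // allocates x default-constructed objects
    for (std::size_t i = 0; i < x; ++i) {
        calcs[i].id = static_cast<int>(i);
    }
    return calcs;  // the storage is released when the vector is destroyed
}
```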

Is there any heap compaction in C++?

I have a notion that C++ runtime doesn't do any heap compaction which means that the address of an object created on heap never changes. I want to confirm if this is true and also if it is true for every platform (Win32, Mac, ...)?
The C++ standard says nothing about a heap, nor about compaction. However, it does require that if you take the address of an object, that address stays the same throughout the object's lifetime.
A C++ implementation could do some kind of heap compaction and move objects around behind the scenes. But then the "addresses" it returns to you when you use the address-of operator are not actually memory addresses but some other kind of mapping.
In other words, yes, it is safe to assume that addresses in C++ stay the same while the object you're taking the address of lives.
What happens behind the scenes is unknown. It is possible that the physical memory addresses change (although common C++ compilers wouldn't do this, it might be relevant for compilers targeting various forms of bytecode, such as Flash), but the addresses that your program sees are going to behave nicely.
The standard does not specify it, but then the standard does not specify a heap. This is entirely dependent on your implementation. However, there is nothing stopping an implementation from compacting unused memory while maintaining the same addresses for objects in use.
You are right, it does not change. Pages can be moved around in physical memory, but the virtual memory system (the page tables, whose entries the Translation Lookaside Buffer caches) hides all that from you.
I'm unaware of any C++ implementation that will move allocated objects around. I suppose it might be technically permitted by the standard (though I'm not 100% sure about that), but remember that the standard must allow a pointer to be cast to a large enough integral type and back again and still be a valid pointer. So an implementation that could move dynamically allocated objects around would have to be able to deal with the admittedly unlikely series of events where:
a pointer is cast to an intptr_t
that value is transformed somehow (xor'ed with some value), so the runtime can't detect that it's a pointer to a particular object
the object gets moved due to compaction
the intptr_t gets transformed back into its original value, and
cast back to a pointer to the object type
The implementation would need to ensure that the pointer from that last step points to the moved object's new location.
I suppose using double indirection for pointers might allow an implementation to deal with this, but I'm unaware of any implementation that does anything like this.
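The sequence of events described above looks roughly like this in code. round_trip() and the mask parameter are made up for the example, and std::uintptr_t is used instead of intptr_t so the XOR is done on an unsigned value; both types are optional in the standard but available on mainstream platforms.

```cpp
#include <cstdint>

// Hides a pointer inside an integer, scrambles it, restores it and converts back.
int* round_trip(int* p, std::uintptr_t mask) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);  // pointer to integer
    bits ^= mask;  // now the runtime cannot recognise this as a pointer
    // ... a compacting runtime would have to avoid moving the object here ...
    bits ^= mask;  // transformed back to the original value
    return reinterpret_cast<int*>(bits);  // integer back to pointer
}
```

Because the restored integer equals the original one, the result has to point at the same object, which is what makes moving objects behind the program's back so awkward.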
Under normal circumstances when you're using the system compiler's default runtimes, you can safely assume that pointers will not be invalidated by the runtime.
If you are not using the default memory managers, but a 3rd-party memory manager instead, it completely depends on the runtime and memory manager you are using. While C++ objects do not generally get moved around in memory by the memory manager, you can write a memory manager that compacts free space and you could conceivably write one that would move allocated objects around to maximise free space as well.