Tracking automatic variable lifetime? - c++

This may not be possible, but I figured I'd ask...
Is there any way anyone can think of to track whether or not an automatic variable has been deleted without modifying the class of the variable itself? For example, consider this code:
const char* pStringBuffer;
{
std::string sString( "foo" );
pStringBuffer = sString.c_str();
}
Obviously, after the block, pStringBuffer is a dangling pointer which may or may not be valid. What I would like is a way to have a wrapper class which contains pStringBuffer (with a casting operator for const char*), but asserts that the variable it's referencing is still valid. By changing the type of the referenced variable I can certainly do it (boost shared_ptr/weak_ptr, for example), but I would like to be able to do it without imposing restrictions on the referenced type.
Some thoughts:
I'll probably need to change the assignment syntax to include the referenced variable (which is fine)
I might be able to look at the stack pointer to detect if my wrapper class was allocated "later" than the referenced class, but this seems hacky and not standard (C++ doesn't define stack behavior). It could work, though.
Thoughts / brilliant solutions?

In general, it's simply not possible from within C++ as pointers are too 'raw'. Also, looking to see if you were allocated later than the referenced class wouldn't work, because if you change the string, then the c_str pointer may well change.
In this particular case, you could check to see if the string is still returning the same value for c_str. If it is, you are probably still valid and if it isn't then you have an invalid pointer.
As a debugging tool, I would advise using an advanced memory tracking system, like valgrind (available only for linux I'm afraid. Similar programs exist for windows but I believe they all cost money. This program is the only reason I have linux installed on my mac). At the cost of much slower execution of your program, valgrind detects if you ever read from an invalid pointer. While it isn't perfect, I've found it detects many bugs, in particular ones of this type.

One technique you may find useful is to replace the new/delete operators with your own implementations which mark the memory pages used (allocated by your operator new) as non-accessible when released (deallocated by your operator delete). You will need to ensure that the memory pages are never re-used however so there will be limitations regarding run-time length due to memory exhaustion.
If your application accesses memory pages once they've been deallocated, as in your example above, the OS will trap the attempted access and raise an error. It's not exactly tracking per se as the application will be halted immediately but it does provide feedback :-)
This technique is applicable in narrow scenarios and won't catch all types of memory abuses but it can be useful. Hope that helps.

You could make a wrapper class that works in the simple case you mentioned. Maybe something like this:
X<const char*> pStringBuffer;
{
std::string sString( "foo" );
Trick trick(pStringBuffer).set(sString.c_str());
} // trick goes out of scope; its destructor marks pStringBuffer as invalid
But it doesn't help more complex cases:
X<const char*> pStringBuffer;
{
std::string sString( "foo" );
{
Trick trick(pStringBuffer).set(sString.c_str());
} // trick goes out of scope; its destructor marks pStringBuffer as invalid
}
Here, the invalidation happens too soon.
Mostly you should just write code which is as safe as possible (see: smart pointers), but no safer (see: performance, low-level interfaces), and use tools (valgrind, Purify) to make sure nothing slips through the cracks.

Given "pStringBuffer" is the only part of your example existing after sString goes out of scope, you need some change to it, or a substitute, that will reflect this. One simple mechanism is a kind of scope guard, with scope matching sString, that affects pStringBuffer when it is destroyed. For example, it could set pStringBuffer to NULL.
To do this without changing the class of "the variable" can only be done in so many ways:
Introduce a distinct variable in the same scope as sString (to reduce verbosity, you might consider a macro to generate the two things together). Not nice.
Wrap with a template ala X sString: it's arguable whether this is "modifying the type of the variable"... the alternative perspective is that sString becomes a wrapper around the same variable. It also suffers in that the best you can do is have templated constructor pass arguments to wrapped constructors up to some finite N arguments.
Neither of these help much as they rely on the developer remembering to use them.
A much better approach is to make "const char* pStringBuffer" simply "std::string some_meaningful_name", and assign to it as necessary. Given reference counting, it's not too expensive 99.99% of the time.

Related

Should I reset primitive member variable in destructor?

Please see following code,
class MyClass
{
public:
int i;
MyClass()
{
i = 10;
}
};
MyClass* pObj = nullptr;
int main()
{
{
MyClass obj;
pObj = &obj;
}
while (1)
{
cout << pObj->i; //pObj is dangling pointer, still no crash.
Sleep(1000);
}
return 0;
}
obj will die once it comes out of scope. But I tested in VS 2017, I see no crash even after I use it.
Is it good practice to reset int member varialbe i?
Accessing a member after an object got destroyed is undefined behavior. It may seem like a good to set members in a destructor to a predictable and most likely unexpected value, e.g., a rather large value or a value with specific bit pattern making it easy to recognize the value in a debugger.
However, this idea is flawed and dwarved by the system:
All classes would need to play along and instead of concentrating on creating correct code developers would spent time (both development time as well as run-time) making pointless change.
Compilers happen get rather smart and can detect that changes in destructor are not needed. Since a correct program cannot detect whether the change was made they may not make the change at all. This effect is an actual issue for security applications where, e.g., a password should be erased from memory so it cannot be read (using some non-portable mean).
Even if the value gets set to a specific value, memory gets reused and the values get overwritten. Especially with objects on the stack it is most likely that the memory is used for something else before you see the bad value in a debugger.
Even when resetting values you would necessarily see a "crash": a crash is caused by something being setup to protect against something invalid. In your example you are accessing an int on the stack: the stack will remain accessible from a CPU point of view and at best you'd get an unexpected value. Use of unusual pointer values typically leads to a crash because the memory management system tries to access a location which isn't mapped but even that isn't guaranteed: on a busy 32 bit system pretty much all memory may be in use. That is, trying to rely on undefined behavior being detect is also futile.
Correspondingly, it is much better to use good coding practices which avoid dangling references right away and concentrate on using these. For example, I'm always initializing members in the member initializer list, even in the rare cases they end up getting changed in the body of the constructor (i.e., you'd write your constructor as MyClass(): i() {}).
As a debugging tool it may be reasonable to replace the allocation functions (ideally the allocator object but potentially the global operator new()/operator delete() and family with a version which doesn't quickly hand out released memory and instead fills the released memory with a predictable pattern. Since these actions slow down the program you'd only use this code in a debug build but it is relatively simple to implement once and easy to enable/disable centrally it may be worth the effort. In practice I don't think even such a system pays off as use of managed pointers and proper design of ownership and lifetime avoid most errors due to dangling references.
The behaviour of code you gave is undefined. Partial case of undefined behaviour is working as expected, so here is nothing strange that the code works. Code can work now and it can broke anyway at any time depending of compiler version, compiler options, stack content and a moon phase.
So first and most important is to avoid dangling pointers (and all other kinds of undefined behaviour) everywhere.
What about clearing variables in destructor, I found a best practice:
Follow coding rules saving me from mistakes of access to unallocated or destroyed objects. I cannot describe it in a few words but rules are pretty common (see here and anywhere).
Analyze code by humans (code review) or by statical analyzers (like cppcheck or PVS-Studio or another) to avoid cases similar to one you described above.
Do not call delete manually, better use scoped_ptr or similar object lifetime managers. When delete is reasonable, I usually (usually) set pointer to nullptr after deletion to keep myself from mistakes.
Use pointers as rare as it possible. References are preferred.
When objects of my class used outside and I suspect that somebody can access it after deletion I can put signature field inside, set it to something like 0xDEAD in destructor and check at enter or every public method. Here be careful to not slow down your code to unacceptable speed.
After all of this setting i from your example to 0 or -1 is redundant. As for me it's not a thing you should focus your attention.

Which is more efficient memory wise: static functions, or functions of an object that is deleted right away?

The question as what the title says, static functions? or functions of an object that is deleted right away?
I know that in a real situation the difference is completely unnoticable but i would still like to know which is more efficient in saving memory. I really don't mind the overhead given by the "new" and "delete" command.
MyClass::staticFunction();
or...
myObject = new MyClass;
myObject->normalFunction();
delete myObject;
edit: the second code might as well be MyClass().normalFunction(); silly...
there are a few things to consider here;
there will only be one instance of the myObject, and it is only used ONCE in the application.
after usage, it is deleted right away because it is not needed.
one would ask, why is this even in a class? why not just put the function where it is used with temporary variables? the answer is encapsulation and readability. i do believe static functions use the same resources as global functions since in fact, they really are global functions that enjoy class scope. the only reason i have to put it in it's own class is to make my code more readable, and encapsulation.
As it stands, none of this makes any sense. The real solution would be to provide a free function (a function at namespace scope), because this is what free functions are there for.
Oh, and since you asked: If calling this one single function has noticeable overhead in your code, then you will only find out about it through careful profiling. Profiling is also what would answer your question which way is faster.
But first make sure your code is easy to read and well maintainable. Optimizing this then will be much easier than fixing prematurely micro-"optimized" code. The only early optimizations you should employ are those that result in optimal data structures and algorithms.
(Note that new and delete will very likely have far greater overhead than what the function actually does, let alone calling it.)
in fact, they really are global functions that enjoy class scope
I think that is spot on.
It doesn't make much of a difference. However, the following would make a lot more sense:
{
MyClass myObject;
myObject.normalFunction();
}
Or even,
MyClass().normalFunction();
Why would you bother creating a heap-allocated instance of an object that doesn't even matter?
We can't say for certain without trying it on a specific platform (or knowing the details of that platform), but we can probably say that the static one will not be slower than the one with new & delete.
There is no overhead to calling non-virtual member function versus class static function versus a free function since binding is resolved at compile-time. The only difference is that member functions get one extra argument for this pointer, but with static and free function you have to pass the object somehow, so it's the same.
If the code is complex enough that you think it needs to be in its own class, then that suggests multiple methods and state stored in the object. You're asking to compare that to using multiple functions, and by implication state passed as arguments. If there's a lot of shared state between the methods, using an object is reasonable. If there's not, using multiple functions is reasonable.
For scoping you may just prefer using namespace.
On the other hand, since you're asking for memory efficience, I can see one reason why you may prefer the object. If you're going to have a consider amount of memory allocated, encapsulating it in the object members sounds like a way to free them afterwards.

C++: Safe to use locals of caller in function?

I think it's best if I describe the situation using a code example:
int MyFuncA()
{
MyClass someInstance;
//<Work with and fill someInstance...>
MyFuncB( &someInstance )
}
int MyFuncB( MyClass* instance )
{
//Do anything you could imagine with instance, *except*:
//* Allowing references to it or any of it's data members to escape this function
//* Freeing anything the class will free in it's destructor, including itself
instance->DoThis();
instance->ModifyThat();
}
And here come my straightforward questions:
Is the above concept guranteed, by C and C++ standards, to work as expected? Why (not)?
Is this considered doing this, sparingly and with care, bad practice?
Is the above concept guranteed, by C and C++ standards, to work as expected? Why (not)?
Yes, it will work as expected. someInstance is available through the scope of MyFuncA. The call to MyFuncB is within that scope.
Is this considered doing this, sparingly and with care, bad practice?
Don't see why.
I don't see any problem in actually using the pointer you were passed to call functions on the object. As long as you call public methods of MyClass, everything remains valid C/C++.
The actual instance you create at the beginning of MyFuncA() will get destroyed at the end of MyFuncA(), and you are guaranteed that the instance will remain valid for the whole execution of MyFuncB() because someInstance is still valid in the scope of MyFuncA().
Yes it will work. It does not matter if the pointer you pass into MyFuncB is on the stack or on the heap (in this specific case).
In regards for the bad practice part you can probably argue both ways. In general it's bad I think because if for any reason any object which is living outside of MyFuncA gets hold of the object reference then it will die a horrible death later on and cause sometime very hard to track bugs. It rewally depends how extensive the usage of the object becomes in MyFuncB. Especially when it starts involving another 3rd class it can get messy.
Others have answered the basic question, with "yeah, that's legal". And in the absence of greater architecture it is hard to call it good or bad practice. But I'll try and wax philosophical on the broader question you seem to be picking up about pointers, object lifetimes, and expectations across function calls...
In the C++ language, there's no built-in way to pass a pointer to a function and "enforce" that it won't stow that away after the call is complete. And since C++ pointers are "weak references" by default, the objects pointed to may disappear out from under someone you pass it to.
But explicitly weak pointer abstractions do exist, for instance in Qt:
http://doc.qt.nokia.com/latest/qweakpointer.html
These are designed to specifically encode the "paranoia" to the recipient that the object it is holding onto can disappear out from under it. Anyone dereferencing one sort of realizes something is up, and they have to take the proper cautions under the design contract.
Additionally, abstractions like shared pointer exist which signal a different understanding to the recipient. Passing them one of those gives them the right to keep the object alive as long as they want, giving you something like garbage collection:
http://doc.qt.nokia.com/4.7-snapshot/qsharedpointer.html
These are only some options. But in the most general sense, if you come up with any interesting invariant for the lifetimes of your object...consider not passing raw pointers. Instead pass some pointer-wrapping class that embodies and documents the rules of the "game" in your architecture.
(One of major the reasons to use C++ instead of other languages is the wealth of tools you have to do cool things like that, without too much runtime cost!)
i don't think there should be any problem with that barring, as you say, something that frees the object, or otherwise trashes its state. i think whatever unexpected things happen would not have anything to do with using the class this way. (nothing in life is guaranteed of course, but classes are intended to be passed around and operated on, whether it's a local variable or otherwise i do not believe is relevant.)
the one thing you would not be able to do is keep a reference to the class after it goes out of scope when MyFuncA() returns, but that's just the nature of the scoping rules.

Best practice when calling initialize functions multiple times?

This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
now When I would call the initialize function twice, an memory leak would happen (right?). So I can prevent this is multiple ways:
ignore the call (just check wether I am initialized, and if I am don't do anything)
Throw an error
automatically "cleanup" the code and then reinitialize the thing.
Now what is generally the "best" method, which helps keeping my code manegeable in the future?
EDIT: thank you for the answers so far. However I'd like to know how people handle this is a more generic way. - How do people handle "simple" errors which can be ignored. (like, calling the same function twice while only 1 time it makes sense).
You're the only one who can truly answer the question : do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow ?
If the initialize function can be called multiple times : just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times : I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)
The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypass the default construction issue, at the cost of heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
I think that this could be a scenario where the Singleton pattern could be applied.

Know what references an object

I have an object which implements reference counting mechanism. If the number of references to it becomes zero, the object is deleted.
I found that my object is never deleted, even when I am done with it. This is leading to memory overuse. All I have is the number of references to the object and I want to know the places which reference it so that I can write appropriate cleanup code.
Is there some way to accomplish this without having to grep in the source files? (That would be very cumbersome.)
A huge part of getting reference counting (refcounting) done correctly in C++ is to use Resource Allocation Is Initialization so it's much harder to accidentally leak references. However, this doesn't solve everything with refcounts.
That said, you can implement a debug feature in your refcounting which tracks what is holding references. You can then analyze this information when necessary, and remove it from release builds. (Use a configuration macro similar in purpose to how DEBUG macros are used.)
Exactly how you should implement it is going to depend on all your requirements, but there are two main ways to do this (with a brief overview of differences):
store the information on the referenced object itself
accessible from your debugger
easier to implement
output to a special trace file every time a reference is acquired or released
still available after the program exits (even abnormally)
possible to use while the program is running, without running in your debugger
can be used even in special release builds and sent back to you for analysis
The basic problem, of knowing what is referencing a given object, is hard to solve in general, and will require some work. Compare: can you tell me every person and business that knows your postal address or phone number?
One known weakness of reference counting is that it does not work when there are cyclic references, i.e. (in the simplest case) when one object has a reference to another object which in turn has a reference to the former object. This sounds like a non-issue, but in data structures such as binary trees with back-references to parent nodes, there you are.
If you don't explicitly provide for a list of "reverse" references in the referenced (un-freed) object, I don't see a way to figure out who is referencing it.
In the following suggestions, I assume that you don't want to modify your source, or if so, just a little.
You could of course walk the whole heap / freestore and search for the memory address of your un-freed object, but if its address turns up, it's not guaranteed to actually be a memory address reference; it could just as well be any random floating point number, of anything else. However, if the found value lies inside a block a memory that your application allocated for an object, chances improve a little that it's indeed a pointer to another object.
One possible improvement over this approach would be to modify the memory allocator you use -- e.g. your global operator new -- so that it keeps a list of all allocated memory blocks and their sizes. (In a complete implementation of this, operator delete would have remove the list entry for the freed block of memory.) Now, at the end of your program, you have a clue where to search for the un-freed object's memory address, since you have a list of memory blocks that your program actually used.
The above suggestions don't sound very reliable to me, to be honest; but maybe defining a custom global operator new and operator delete that does some logging / tracing goes in the right direction to solve your problem.
I am assuming you have some class with say addRef() and release() member functions, and you call these when you need to increase and decrease the reference count on each instance, and that the instances that cause problems are on the heap and referred to with raw pointers. The simplest fix may be to replace all pointers to the controlled object with boost::shared_ptr. This is surprisingly easy to do and should enable you to dispense with your own reference counting - you can just make those functions I mentioned do nothing. The main change required in your code is in the signatures of functions that pass or return your pointers. Other places to change are in initializer lists (if you initialize pointers to null) and if()-statements (if you compare pointers with null). The compiler will find all such places after you change the declarations of the pointers.
If you do not want to use the shared_ptr - maybe you want to keep the reference count intrinsic to the class - you can craft your own simple smart pointer just to deal with your class. Then use it to control the lifetime of your class objects. So for example, instead of pointer assignment being done with raw pointers and you "manually" calling addRef(), you just do an assignment of your smart pointer class which includes the addRef() automatically.
I don't think it's possible to do something without code change. With code change you can for example remember the pointers of the objects which increase reference count, and then see what pointer is left and examine it in the debugger. If possible - store more verbose information, such as object name.
I have created one for my needs. You can compare your code with this one and see what's missing. It's not perfect but it should work in most of the cases.
http://sites.google.com/site/grayasm/autopointer
when I use it I do:
util::autopointer<A> aptr=new A();
I never do it like this:
A* ptr = new A();
util::autopointer<A> aptr = ptr;
and later to start fulling around with ptr; That's not allowed.
Further I am using only aptr to refer to this object.
If I am wrong I have now the chance to get corrections. :) See ya!