I've recently started appreciating std::auto_ptr and now I read that it will be deprecated. I started using it for two situations:
Return value of a factory
Communicating ownership transfer
Examples:
// Exception safe and makes it clear that the caller has ownership.
std::auto_ptr<Component> ComponentFactory::Create() { ... }
// The receiving method/function takes ownership of the pointer. Zero ambiguity.
void setValue(std::auto_ptr<Value> inValue);
Despite the problematic copy semantics I find auto_ptr useful. And there doesn't seem to be an alternative for the above examples.
Should I just keep using it and later switch to std::unique_ptr? Or is it to be avoided?
It is so very, very useful, despite its flaws, that I'd highly recommend just continuing to use it and switching to unique_ptr when it becomes available.
::std::unique_ptr requires a compiler that supports rvalue references which are part of the C++0x draft standard, and it will take a little while for there to be really wide support for it. And until rvalue references are available, ::std::auto_ptr is the best you can do.
Having both ::std::auto_ptr and ::std::unique_ptr in your code might confuse some people. But you should be able to search for ::std::auto_ptr and replace it with ::std::unique_ptr when you decide to change it. You may get compiler errors if you do that, but they should be easily fixable. The top rated answer to this question about replacing ::std::auto_ptr with ::std::unique_ptr has more details.
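For what it's worth, here is a rough sketch of how the two situations from the question might look once unique_ptr is available. Component and Value are stand-ins for the question's types, and Holder, CreateComponent and ownership_demo are names I made up for illustration; treat it as one possible shape, not the definitive migration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical types standing in for the question's Component and Value.
struct Component { int id = 7; };
struct Value { int data = 42; };

// Factory: the return type says the caller owns the result.
std::unique_ptr<Component> CreateComponent() {
    return std::unique_ptr<Component>(new Component());
}

// Sink: taking the pointer by value says the callee takes ownership.
class Holder {
public:
    void setValue(std::unique_ptr<Value> inValue) { value_ = std::move(inValue); }
    int data() const { return value_ ? value_->data : -1; }
private:
    std::unique_ptr<Value> value_;
};

// Returns true if ownership ended up where the signatures promise.
bool ownership_demo() {
    std::unique_ptr<Component> c = CreateComponent();
    std::unique_ptr<Value> v(new Value());
    Holder h;
    h.setValue(std::move(v));   // explicit transfer; a plain copy would not compile
    return c->id == 7 && v == nullptr && h.data() == 42;
}
```

The main visible difference from auto_ptr is that the transfer at the call site has to be spelled out with std::move.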
Deprecated doesn't mean it's going away, just that there will be a better alternative.
I'd suggest keeping using it in current code, but using the new alternative in new code (new programs or modules, not small changes to current code). Consistency is important.
I'd suggest you go use boost smart pointers.
I was told recently in a job interview that their project aims to build the smallest possible binary for their application (it runs embedded), so I would not be able to use things such as templates or smart pointers, since these would increase the binary size. They seemed to imply that using things from std would generally be a no-go (though not in all cases).
After the interview, I tried to research online which features of the standard library cause large binary sizes, and I could find basically nothing on this. Is there a way to quantify the size impact of using certain features (without needing to code 100 smart pointers in a code base vs. self-managed memory, for example)?
This question probably deserves more attention than it’s likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect, specifically a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.
This is also why you can’t find a definitive answer from a good old fashioned google search. Because if you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”
So let me approach this from a slightly different angle. I’ve both worked for, and interviewed at companies that enforced these kinds of restrictions, I’ve even voluntarily enforced them myself. It really comes down to this. When you’re running an engineering organization with more than one person with plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing “bad things” without knowing they’re doing “bad things”.
How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn’t really a “bad” enough thing to rigorously enforce. On a tiny embedded system, it probably is.
C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident, it’s the whole point of meta-programming, and I doubt anyone would challenge it, in fact it’s one of the strengths of the language.
So then coming back to the organizational challenges, if your primary optimization variable is code space, you probably don’t want to allow people to use features that make it trivial to generate code that isn’t obvious. Some people will use that feature responsibly and some people won’t, but you have to standardize around the least common denominator. A C compiler is very simple. Yes you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.
(Partially extracted from comments I wrote earlier)
I don't think there is a comprehensive answer. A lot also depends on the specific use case and needs to be judged on a case-by-case basis.
Templates
Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, then the function you avoided templating may itself become bigger in code size, simply because indirect function calls take several instructions and remove optimization potential.
Another aspect where they can at least not hurt is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extent.
This bare-bones vector type shows what I mean:
class VectorBase
{
protected:
void **start, **end, **capacity;
void push_back(void*);
void* at(std::size_t i);
void clear(void (*cleanup_function)(void*));
};
template<class T>
class Vector: public VectorBase
{
public:
void push_back(T* value)
{ this->VectorBase::push_back(value); }
T* at(std::size_t i)
{ return static_cast<T*>(this->VectorBase::at(i)); }
~Vector()
{ clear(+[](void* object) { delete static_cast<T*>(object); }); }
};
By carefully moving as much code as possible into the non-templated base, the template itself can focus on type safety and on providing the necessary indirections, without emitting any code that wouldn't have been there anyway.
(Note: This is just meant as a demonstration of type erasure, not an actually good vector type)
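To make the idea concrete, here is a minimal compilable sketch of the same pattern. It is not the vector above: I'm assuming std::vector<void*> as storage instead of the raw void** triple, and the names PtrVectorBase, PtrVector and Tracked are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-templated base: all real work lives here and is compiled once.
class PtrVectorBase {
protected:
    std::vector<void*> items_;   // assumption: std::vector<void*> as storage
    void push_back_raw(void* p);
    void* at_raw(std::size_t i) const;
    void clear_raw(void (*cleanup)(void*));
public:
    std::size_t size() const { return items_.size(); }
};

void PtrVectorBase::push_back_raw(void* p) { items_.push_back(p); }
void* PtrVectorBase::at_raw(std::size_t i) const { return items_.at(i); }
void PtrVectorBase::clear_raw(void (*cleanup)(void*)) {
    for (void* p : items_) cleanup(p);
    items_.clear();
}

// Thin typed wrapper: only casts and forwarding.
template <class T>
class PtrVector : public PtrVectorBase {
    static void destroy(void* p) { delete static_cast<T*>(p); }
public:
    PtrVector() = default;
    PtrVector(const PtrVector&) = delete;              // owning container: no copies
    PtrVector& operator=(const PtrVector&) = delete;
    ~PtrVector() { clear_raw(&destroy); }
    void push_back(T* value) { push_back_raw(value); } // takes ownership
    T* at(std::size_t i) const { return static_cast<T*>(at_raw(i)); }
};

// Hypothetical element type that counts live instances.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the number of Tracked objects still alive after the vector is gone.
int type_erasure_demo() {
    {
        PtrVector<Tracked> v;
        v.push_back(new Tracked());
        v.push_back(new Tracked());
        assert(v.size() == 2 && Tracked::alive == 2);
    }
    return Tracked::alive;
}
```

Each additional PtrVector<T> instantiation should only add the small destroy function and a few inlinable forwarders; whether that actually holds for a given toolchain is something to check in the linker map.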
Smart pointers
When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer does it manually doesn't really matter.
The main issue that I see with those is that the programmer is better at reasoning about code and avoiding dead code. For example even after a unique_ptr has been moved away, the destructor of the pointer still has to emit code. A programmer knows that the value is NULL, the compiler often doesn't.
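A small sketch of the moved-from case, assuming a hypothetical sink function consume and a made-up type Foo:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Foo { int value = 1; };

// Hypothetical sink that takes ownership of its argument.
int consume(std::unique_ptr<Foo> p) { return p->value; }

bool moved_from_is_null() {
    std::unique_ptr<Foo> p(new Foo());
    int v = consume(std::move(p));
    // p is guaranteed to be null here, but its destructor still runs at the
    // closing brace. Unless the optimizer can prove p is null, the
    // "if not null, delete" check stays in the generated code.
    return v == 1 && p == nullptr;
}
```

With a raw pointer the programmer would simply write nothing after the call.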
Another issue comes up with calling conventions. Objects with non-trivial destructors are usually passed in memory on the stack rather than in registers, even if you declare them pass-by-value. The same goes for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz) simply because the pointers have to be put on and taken off the stack.
Even more egregiously, the calling convention used for example on Linux makes the caller clean up parameters instead of the callee. That means if a function accepts a complex object like a smart pointer by value, a call to the destructor for that parameter is replicated at every call site, instead of putting it once inside the function. Especially with unique_ptr this is so stupid because the function itself may know that the object has been moved away and the destructor is superfluous; but the caller doesn't know this (unless you have LTO).
Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting and weak pointers? What indirection is used for destruction? Do you really need two raw pointers per shared pointer, or can the reference counter be accessed through the shared object?
Exceptions, RTTI
Generally avoided and removed via compiler flags.
Library components
On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest any such project use continuous integration and track the code size as a metric.
For example I once added a small feature, I don't remember which, and in its error handling it used std::stringstream. That pulled in the entire iostream library. The resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
Move constructors and destructors
It's a shame that C++'s move semantics aren't the same as for example Rust's where objects can be moved with a simple memcpy and then "forgetting" their original location. In C++ the destructor for a moved object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
Qt for example accounts for such simple cases in its meta type system.
Let's say I have a class with a method that returns a shared_ptr.
What are the possible benefits and drawbacks of returning it by reference or by value?
Two possible clues:
Early object destruction. If I return the shared_ptr by (const) reference, the reference counter is not incremented, so I incur the risk of having the object deleted when it goes out of scope in another context (e.g. another thread). Is this correct? What if the environment is single-threaded, can this situation happen as well?
Cost. Pass-by-value is certainly not free. Is it worth avoiding it whenever possible?
Thanks everybody.
Return smart pointers by value.
As you've said, if you return it by reference, you won't properly increment the reference count, which opens up the risk of deleting something at the improper time. That alone should be enough reason to not return by reference. Interfaces should be robust.
The cost concern is nowadays moot thanks to return value optimization (RVO), so you won't incur an increment-increment-decrement sequence or something like that in modern compilers. So the best way to return a shared_ptr is to simply return by value:
shared_ptr<T> Foo()
{
return shared_ptr<T>(/* acquire something */);
}
This is a dead-obvious RVO opportunity for modern C++ compilers. I know for a fact that Visual C++ compilers implement RVO even when all optimizations are turned off. And with C++11's move semantics, this concern is even less relevant. (But the only way to be sure is to profile and experiment.)
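One way to check the "no extra increments" claim on your own compiler is to look at use_count() right after the call. A minimal sketch, with a made-up Widget type and std::make_shared as the acquisition step:

```cpp
#include <cassert>
#include <memory>

struct Widget { int n = 3; };

// Return by value, as recommended above.
std::shared_ptr<Widget> make_widget() {
    return std::make_shared<Widget>();
}

long count_after_return() {
    std::shared_ptr<Widget> w = make_widget();
    return w.use_count();   // 1: no additional owner exists after the return
}
```

Whether the return is elided or moved, the caller ends up as the only owner; measuring the actual cost still requires profiling.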
If you're still not convinced, Dave Abrahams has an article that makes an argument for returning by value. I reproduce a snippet here; I highly recommend that you go read the entire article:
Be honest: how does the following code make you feel?
std::vector<std::string> get_names();
...
std::vector<std::string> const names = get_names();
Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.
Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid needless copies:
get_names(std::vector<std::string>& out_param );
...
std::vector<std::string> names;
get_names( names );
Unfortunately, this approach is far from ideal.
The code grew by 150%
We’ve had to drop const-ness because we’re mutating names.
As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
We no longer have strict value semantics for names.
But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x).
Regarding any smart pointer (not just shared_ptr), I don't think it's ever acceptable to return a reference to one, and I would be very hesitant to pass them around by reference or raw pointer. Why? Because you cannot be certain that it will not be shallow-copied via a reference later. Your first point defines the reason why this should be a concern. This can happen even in a single-threaded environment. You don't need concurrent access to data to put bad copy semantics in your programs. You don't really control what your users do with the pointer once you pass it off, so don't encourage misuse by giving your API users enough rope to hang themselves.
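Here is a single-threaded sketch of the difference. Cache, item_ref, item and drop are hypothetical names; the example deliberately stays within defined behavior by keeping the Cache objects alive, so all the reference can do is observe that the object is gone.

```cpp
#include <cassert>
#include <memory>

class Cache {
public:
    Cache() : item_(std::make_shared<int>(5)) {}
    const std::shared_ptr<int>& item_ref() const { return item_; }  // no count bump
    std::shared_ptr<int> item() const { return item_; }             // caller shares ownership
    void drop() { item_.reset(); }
private:
    std::shared_ptr<int> item_;
};

bool reference_loses_object() {
    Cache a, b;
    const std::shared_ptr<int>& by_ref = a.item_ref();
    std::shared_ptr<int> by_val = b.item();
    a.drop();   // the int owned by a is deleted; by_ref now sees an empty pointer
    b.drop();   // by_val still keeps b's int alive
    return by_ref == nullptr && by_val && *by_val == 5 && by_val.use_count() == 1;
}
```

If the caller had cached a raw int* obtained through by_ref before a.drop(), it would now be dangling, with no threads involved.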
Secondly, look at your smart pointer's implementation, if possible. Construction and destruction should be darn close to negligible. If this overhead isn't acceptable, then don't use a smart pointer! But beyond this, you will also need to examine the concurrency architecture that you've got, because mutually exclusive access to the mechanism that tracks the uses of the pointer is going to slow you down more than mere construction of the shared_ptr object.
Edit, 3 years later: with the advent of the more modern features in C++, I would tweak my answer to be more accepting of cases when you've simply written a lambda that never lives outside of the calling function's scope, and isn't copied somewhere else. Here, if you wanted to save the very minimal overhead of copying a shared pointer, it would be fair and safe. Why? Because you can guarantee that the reference will never be mis-used.
I've read a reasonable amount in decent textbooks about the auto_ptr class. While I understand what it is, and how it gets you around the problem of getting exceptions in places like constructors, I am having trouble figuring out when someone would actually use it.
An auto_ptr can only hold a single type (no array new[] initialization is supported). It changes ownership when you pass it into functions or try and duplicate it (it's not a reference counting smart pointer).
What is a realistic usage scenario for this class given its limitations? It seems like most of the textbook examples of its use are reaching because there isn't even a reason to be using a pointer over a stack variable in most of the cases...
Anyway, I'll stop my rant - but if you can provide a short example/description or a link to a good usage scenario for this I'd be grateful. I just want to know where I should use it in practice in case I come across the situation - I like to practice what I learn so I remember it.
I'll give you a short example for a good usage. Consider this:
auto_ptr<SomeResource> some_function() {
auto_ptr<SomeResource> my_ptr = get_the_resource();
function_that_throws_an_exception();
return my_ptr;
}
Without a smart pointer, the function that raises an exception would cause your pointer to be lost, and the object pointed to would never be deleted. With auto_ptr this can't happen: when my_ptr leaves the frame it was created in, its destructor deletes the object, unless ownership has been transferred first (for example by return).
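If you want to see this happen, here is a sketch that counts live objects. Note that I've swapped in std::unique_ptr, since auto_ptr no longer compiles under recent standards; the cleanup-on-unwind behavior being shown is the same. The alive counter and leaked_after_throw are additions for the demo.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct SomeResource {
    static int alive;               // number of live instances, for the demo only
    SomeResource() { ++alive; }
    ~SomeResource() { --alive; }
};
int SomeResource::alive = 0;

void function_that_throws_an_exception() { throw std::runtime_error("boom"); }

std::unique_ptr<SomeResource> some_function() {
    std::unique_ptr<SomeResource> my_ptr(new SomeResource());
    function_that_throws_an_exception();   // stack unwinding destroys my_ptr
    return my_ptr;                         // never reached in this sketch
}

// Returns how many SomeResource objects are still alive after the exception.
int leaked_after_throw() {
    try {
        some_function();
    } catch (const std::runtime_error&) {
        // the resource was already released during unwinding
    }
    return SomeResource::alive;
}
```

With a raw pointer in some_function, the count would stay at 1.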
auto_ptr has been deprecated in the now finalized C++11 standard. Some of the replacements are already available through TR1 or the Boost libraries. Examples are shared_ptr and unique_ptr (scoped_ptr in boost).
I'd like to know what is considered nowadays the best practice when returning a pointer to a polymorphic object from a function, for example when using factories. If I transfer the ownership, should I return boost::unique_ptr<Interface>? What should I return if I don't transfer the ownership (e.g. returning a reference to a member)? Is there an alternative, non boost-based way which is also commonly used? Thanks.
EDIT: it is supposed to be C++03 compatible, with a possibility to easily upgrade to 0x
EDIT2: Please note I'm explicitly asking about common approaches, best practices, and not just "a way to do this". A solution implying a conditional search-and-replace over the codebase in future does not look like a good practice, does it?
EDIT3: Another point about auto_ptr is that it is deprecated, however neat it is, so it looks strange to advertise its usage at the interface level. Then, someone unaware will put the returned pointer into an STL container, and so on and so forth. So if you know another somewhat common solution, you are very welcome to add an answer.
Use ::std::auto_ptr for now, and when C++0x is available, then switch to ::std::unique_ptr. At least in the factory case where you are handing ownership back to the caller.
Yes ::std::auto_ptr has problems and is ugly. Yes, it's deprecated in C++0x. But that is the recommended way to do it. I haven't examined ::boost::unique_ptr, but without move semantics I don't see that it can do any better than ::std::auto_ptr.
I prefer the idea of upgrading by doing a search and replace, though there are some unusual cases in which that won't have the expected result. Fortunately, these cases generate compiler errors:
::std::auto_ptr<int> p(new int);
::std::auto_ptr<int> p2 = p;
will have to become at least like this
::std::unique_ptr<int> p(new int);
::std::unique_ptr<int> p2 = ::std::move(p);
I prefer search and replace because I find that using macros and typedefs for things like this tends to make things more obscure and difficult to understand later. A search and replace of your codebase can be applied selectively if need be (::std::auto_ptr won't go away in C++0x, it's just deprecated) and leaves your code with clear and obvious intent.
As for what's 'commonly' done, I don't think the problem has been around for long enough for there to be a commonly accepted method of handling the changeover.
I've learned in College that you always have to free your unused Objects but not how you actually do it. For example structuring your code right and so on.
Are there any general rules on how to handle pointers in C++?
I'm currently not allowed to use boost. I have to stick to pure C++ because the framework I'm using forbids any use of generics.
I have worked with the embedded Symbian OS, which had an excellent system in place for this, based entirely on developer conventions.
Only one object will ever own a pointer. By default this is the creator.
Ownership can be passed on. To indicate passing of ownership, the object is passed as a pointer in the method signature (e.g. void Foo(Bar *zonk);).
The owner will decide when to delete the object.
To pass an object to a method just for use, the object is passed as a reference in the method signature (e.g. void Foo(Bar &zonk);).
Non-owner classes may store references (never pointers) to objects they are given only when they can be certain that the owner will not destroy it during use.
Basically, if a class simply uses something, it uses a reference. If a class owns something, it uses a pointer.
This worked beautifully and was a pleasure to use. Memory issues were very rare.
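A rough sketch of how these conventions might look in plain C++ without templates. Owner, Adopt and Inspect are names I made up; this is not Symbian API, just the pointer-means-ownership, reference-means-use rule applied to a toy class.

```cpp
#include <cassert>

struct Bar {
    static int alive;   // live-instance counter, for the demo only
    int value;
    Bar() : value(10) { ++alive; }
    ~Bar() { --alive; }
};
int Bar::alive = 0;

class Owner {
public:
    Owner() : bar_(nullptr) {}
    ~Owner() { delete bar_; }                          // the owner decides when to delete
    Owner(const Owner&) = delete;                      // exactly one owner
    Owner& operator=(const Owner&) = delete;
    void Adopt(Bar* bar) { delete bar_; bar_ = bar; }  // pointer parameter: ownership passes in
private:
    Bar* bar_;
};

// Reference parameter: the function only uses the object.
int Inspect(Bar& bar) { return bar.value; }

// Returns the number of Bar objects alive after the owner has gone away.
int convention_demo() {
    {
        Owner owner;
        Bar* bar = new Bar();   // the creator owns it by default
        int v = Inspect(*bar);  // just use: no ownership change
        owner.Adopt(bar);       // ownership handed over; do not delete bar here
        assert(v == 10 && Bar::alive == 1);
    }
    return Bar::alive;
}
```

The convention is enforced only by discipline and code review; nothing stops a caller from deleting bar after Adopt.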
Rules:
Wherever possible, use a smart pointer. Boost has some good ones.
If you can't use a smart pointer, null out your pointer after deleting it.
Never work anywhere that won't let you use rule 1.
If someone disallows rule 1, remember that if you grab someone else's code, change the variable names and delete the copyright notices, no-one will ever notice. Unless it's a school project, where they actually check for that kind of shenanigans with quite sophisticated tools. See also, this question.
I would add another rule here:
Don't new/delete an object when an automatic object will do just fine.
We have found that programmers who are new to C++, or programmers coming over from languages like Java, seem to learn about new and then obsessively use it whenever they want to create any object, regardless of the context. This is especially pernicious when an object is created locally within a function purely to do something useful. Using new in this way can be detrimental to performance and can make it all too easy to introduce silly memory leaks when the corresponding delete is forgotten. Yes, smart pointers can help with the latter but it won't solve the performance issues (assuming that new/delete or an equivalent is used behind the scenes). Interestingly (well, maybe), we have found that delete often tends to be more expensive than new when using Visual C++.
Some of this confusion also comes from the fact that functions they call might take pointers, or even smart pointers, as arguments (when references would perhaps be better/clearer). This makes them think that they need to "create" a pointer (a lot of people seem to think that this is what new does) to be able to pass a pointer to a function. Clearly, this requires some rules about how APIs are written to make calling conventions as unambiguous as possible, which are reinforced with clear comments supplied with the function prototype.
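To illustrate the rule, a small side-by-side sketch with a made-up Parser type. Both functions compute the same thing; only the first one touches the heap.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type that is only needed locally.
struct Parser {
    int parse(const std::string& s) const { return static_cast<int>(s.size()); }
};

// Unnecessary heap allocation, and a leak if parse() ever throws.
int with_new(const std::string& s) {
    Parser* p = new Parser();
    int n = p->parse(s);
    delete p;
    return n;
}

// Automatic object: no allocation, destroyed at the closing brace.
int with_automatic(const std::string& s) {
    Parser p;
    return p.parse(s);
}
```

If an API wants a pointer, &p on the automatic object is usually all that is needed, provided the callee does not keep the pointer beyond the call.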
In the general case (resource management, where resource is not necessarily memory), you need to be familiar with the RAII pattern. This is one of the most important pieces of information for C++ developers.
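As a minimal, template-free sketch of RAII: the resource here is a fake handle counter (acquire_handle and release_handle are invented for the example and stand in for a file, lock or socket API).

```cpp
#include <cassert>

// Hypothetical resource API standing in for a file, lock, or socket.
static int g_open_handles = 0;
int  acquire_handle() { return ++g_open_handles; }
void release_handle(int) { --g_open_handles; }

// The guard acquires in its constructor and releases in its destructor.
class HandleGuard {
public:
    HandleGuard() : handle_(acquire_handle()) {}
    ~HandleGuard() { release_handle(handle_); }
    HandleGuard(const HandleGuard&) = delete;
    HandleGuard& operator=(const HandleGuard&) = delete;
    int get() const { return handle_; }
private:
    int handle_;
};

// Returns the number of handles still open after an early exit by exception.
int open_handles_after_scope() {
    try {
        HandleGuard guard;
        throw 1;   // early exit: the destructor still runs
    } catch (int) {
    }
    return g_open_handles;
}
```

The same shape works for anything that has a matching acquire/release pair.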
In general, avoid allocating from the heap unless you have to. If you have to, use reference counting for objects that are long-lived and need to be shared between diverse parts of your code.
Sometimes you need to allocate objects dynamically, but they will only be used within a certain span of time. For example, in a previous project I needed to create a complex in-memory representation of a database schema -- basically a complex cyclic graph of objects. However, the graph was only needed for the duration of a database connection, after which all the nodes could be freed in one shot. In this kind of scenario, a good pattern to use is something I call the "local GC idiom." I'm not sure if it has an "official" name, as it's something I've only seen in my own code, and in Cocoa (see NSAutoreleasePool in Apple's Cocoa reference).
In a nutshell, you create a "collector" object that keeps pointers to the temporary objects that you allocate using new. It is usually tied to some scope in your program, either a static scope (e.g. -- as a stack-allocated object that implements the RAII idiom) or a dynamic one (e.g. -- tied to the lifetime of a database connection, as in my previous project). When the "collector" object is freed, its destructor frees all of the objects that it points to.
Also, like DrPizza I think the restriction to not use templates is too harsh. However, having done a lot of development on ancient versions of Solaris, AIX, and HP-UX (just recently - yes, these platforms are still alive in the Fortune 50), I can tell you that if you really care about portability, you should use templates as little as possible. Using them for containers and smart pointers ought to be ok, though (it worked for me). Without templates the technique I described is more painful to implement. It would require that all objects managed by the "collector" derive from a common base class.
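Here is roughly what the non-template variant could look like, with the common base class mentioned above. Collectable, Collector, track and Node are names invented for this sketch, and I'm assuming std::vector is acceptable as the internal container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Common base class, needed because the collector is not a template.
class Collectable {
public:
    virtual ~Collectable() {}
};

class Collector {
public:
    Collector() {}
    ~Collector() {
        // Free everything in one shot when the collector goes away.
        for (std::size_t i = 0; i < objects_.size(); ++i) delete objects_[i];
    }
    Collector(const Collector&) = delete;
    Collector& operator=(const Collector&) = delete;
    // Takes ownership of obj and hands the same pointer back for convenience.
    // (If push_back throws, obj leaks; a real version should guard against that.)
    Collectable* track(Collectable* obj) { objects_.push_back(obj); return obj; }
private:
    std::vector<Collectable*> objects_;
};

// Hypothetical graph node; nodes point at each other but do not own each other.
struct Node : Collectable {
    static int alive;
    Node* next;
    Node() : next(nullptr) { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Returns the number of nodes alive after the collector's scope has ended.
int nodes_alive_after_collector() {
    {
        Collector pool;
        Node* a = static_cast<Node*>(pool.track(new Node()));
        Node* b = static_cast<Node*>(pool.track(new Node()));
        a->next = b;
        b->next = a;   // a cycle is fine here, unlike with reference counting
        assert(Node::alive == 2);
    }
    return Node::alive;
}
```

The static_cast at each call site is the "more painful" part; a templated track would remove it.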
G'day,
I'd suggest reading the relevant sections of "Effective C++" by Scott Meyers. Easy to read and he covers some interesting gotchas to trap the unwary.
I'm also intrigued by the lack of templates. So no STL or Boost. Wow.
BTW Getting people to agree on conventions is an excellent idea. As is getting everyone to agree on conventions for OOD. BTW The latest edition of Effective C++ doesn't have the excellent chapter about OOD conventions that the first edition had which is a pity, e.g. conventions such as public virtual inheritance always models an "isa" relationship.
Rob
When you have to manage memory manually, make sure you call delete in the same scope/function/class/module, whichever applies first, e.g.:
Let the caller of a function allocate the memory that is filled by it; do not return new'ed pointers.
Always call delete in the same exe/dll as you called new in, because otherwise you may have problems with heap corruptions (different incompatible runtime libraries).
You could derive everything from some base class that implements smart-pointer-like functionality (using ref()/unref() methods and a counter).
All points highlighted by @Timbo are important when designing that base class.
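A bare-bones sketch of such a base class, with no templates involved. RefCounted, count() and Texture are names I picked for the example; the counter is a plain int, so this version is not thread-safe.

```cpp
#include <cassert>

class RefCounted {
public:
    RefCounted() : count_(1) {}   // the creator holds the first reference
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
    void ref() { ++count_; }
    void unref() { if (--count_ == 0) delete this; }
    int count() const { return count_; }
protected:
    virtual ~RefCounted() {}      // protected: destruction only through unref()
private:
    int count_;                   // plain int: not thread-safe in this sketch
};

// Hypothetical shared object that counts live instances.
class Texture : public RefCounted {
public:
    static int alive;
    Texture() { ++alive; }
protected:
    ~Texture() { --alive; }
};
int Texture::alive = 0;

// Returns the number of Texture objects alive after the last unref().
int refcount_demo() {
    Texture* t = new Texture();   // count == 1
    t->ref();                     // a second user, count == 2
    t->unref();                   // count == 1, still alive
    assert(Texture::alive == 1 && t->count() == 1);
    t->unref();                   // count == 0, the object deletes itself
    return Texture::alive;
}
```

Every ref() needs a matching unref(), and the object must be created with new; without a smart pointer wrapper, both are left to convention.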