Prevent memory leaks and easy/idiomatic programming in C++ - c++

Is there a compromise between managing memory and easy/idiomatic programming in C++?
For example, let's say I have the following classes. We have an Item that someone may purchase, and a ShoppingList of those Items. Let's just say Item and ShoppingList are big data structures with more features than shown and that we would not want to pass them by value normally (but they are stripped down to the basics features in this example).
class Item {
private:
std::string name;
float cost;
public:
Item(std::string name, float cost) : name{name}, cost{cost} { }
std::string toString(void) {
return name + " = " + std::to_string(cost);
}
~Item() { std::cout << "~Item()" << std::endl; }
};
class ShoppingList {
private:
std::vector<Item *> items;
public:
ShoppingList() {}
void addItem(Item &item) {
items.push_back(&item);
}
std::vector<Item *> getItems() {
return items;
}
~ShoppingList() {
std::cout << "~ShoppingList()" << std::endl;
}
};
Now, I want to be able to conveniently do something like the following (inline the argument).
ShoppingList shoppingList{};
shoppingList.add(Item{"soap", 1.99}); // no
C++17 will not me allow to try this either.
ShoppingList shoppingList{};
shoppingList.add(&Item{"soap", 1.99}); // no
shoppingList.add(new Item{"soap", 1.99}); // no
Instead, I have to do something like this.
ShoppingList shoppingList{};
Item soap{"soap", 1.99};
shoppingList.add(soap);
The above is "ok" for adding 1 element, but things get "worse" when I try to use a for loop. The item created on each loop ends with the same address and I do not end up with the intended effect.
ShoppingList shoppingList{};
auto products = std::vector<std::string>{"brush", "comb"};
for (auto product : products) {
Item item{product, 0.99};
shoppingList.addItem(item);
}
It seems unreasonable, to me, to do the following.
ShoppingList shoppingList{};
Item item1{"item1", 0.99};
Item item2{"item2", 0.99};
Item item3{"item3", 0.99};
// and so on
shoppingList.addItem(item1);
shoppingList.addItem(item2);
shoppingList.addItem(item3);
// and so on
There's just so many ways of thinking about how to make it easier to use the code. One way would be to use raw pointers. So I may change the method signature of ShoppingList.addItem(...) to as follows.
void addItem(Item *item);
Then I can do some less difficult use of the objects as follows.
ShoppingList shoppingList{};
shoppingList.addItem(new Item{"item1", 0.99});
shoppingList.addItem(new Item{"item2", 0.99});
shoppingList.addItem(new Item{"item3", 0.99});
With this new method signature, for loops should work.
ShoppingList shoppingList{};
auto products = std::vector<std::string>{"brush", "comb"};
for (auto product : products) {
shoppingList.addItem(new Item{product, 0.99});
}
Ok, so this approach makes it easier to use/write the code, but valgrind analysis says Leak_DefintelyLost where I start using new.
Another approach is smart pointers. So let's wrap the objects into smart pointers; in particular shared_pointer. Here's the new ShoppingList using smart pointers.
class ShoppingList {
private:
std::vector<std::shared_ptr<Item>> items;
public:
ShoppingList() {}
void addItem(std::shared_ptr<Item> item) {
items.push_back(item);
}
std::vector<std::shared_ptr<Item>> getItems() {
return items;
}
~ShoppingList() {
std::cout << "~ShoppingList()" << std::endl;
}
};
Usage will look like the following.
ShoppingList shoppingList{};
shoppingList.addItem(std::make_shared<Item>("soap", 32.0));
shoppingList.addItem(std::make_shared<Item>("shampoo", 32.0));
valgrind does not detect a single memory leak issue.
The only problem I have with using shared_ptr is verbosity and nested types. This declaration looks bad to me. That is a lot of : and < with >.
std::vector<std::shared_ptr<Item>> items;
In one situation where I had a map of maps of big data structures, I considered using shared pointers and ended up with a declaration as follows. If another coder looks at what the intention is (or if I myself come back later to troubleshoot), what is the intention here?
std::map<std::string, std::map<std::string, std::vector<std::shared_ptr<SomeClass>>>> dataBag;
It seems to me that the : and < and > are polluting the code and intent all for the purpose of managing memory leaks. I am not sure what's good or bad or idiomatic to write C++17 while being concise but also preventing memory leaks boostrapped by the language constructs.
Now, I did encounter the idea of RAII. But mocking some classes with that approach (raw pointers only), valgrind still shows Leak_DefinitelyLost.
Any general guidance on I suppose what I would refer to as idiomatic C++ that is memory leakage sensitive/preventative would be helpful. I was going to go down the road of creating factories to store all the pointers that I will create across my API, but I realized that approach would cause the memory to continue to build up even when there are no longer references to some pointers (and smart pointers already handle that part).
At this point, I am almost resigned to the acceptance of living with smart pointers (though there may be diamond and colon operators all over the place) to avoid memory leaks.

One way or another, if you want a collection of items, you're going to have to create each item in the collection. Storing pointers to items mostly helps if they're really so tremendously huge that copying them is unacceptable, even if it happens only a few times, such as when a vector reallocates its memory.
As for passing by reference: right now, you're defining the parameter as an lvalue reference. That means the reference can only refer to an lvalue. Thus the requirement to create a named object (an lvalue) then pass it.
You apparently don't want that. In your case, you have a couple of choices. One is a reference to const. Unlike a (non-const) lvalue reference, this can bind to a temporary object.
Another choice would be to use an rvalue reference. An rvalue reference has the advantage that it can bind to an rvalue (such as a temporary object), and also that it's "aware" that it's dealing with an rvalue, so it can "steal" the contents of that object. This is particularly useful if the object mostly contains a pointer to the actual data, in which case you can just copy the pointer (and modify the existing object so it won't destroy the data when it's destroyed).
In most cases, you're best off starting with the simplest code that could work. Create the vector of objects, and live with the fact that they'll get copied. Sometime later, if you find that you have a real bottleneck from copying those objects as that vector is reallocated, it's soon enough to change your code--such as to store std::unique_ptrs instead of storing objects directly. But that's not your first choice. Chances are pretty good that when you profile the code, you'll find that copying those objects isn't a bottleneck at all, so putting work into keeping them from being copied would have been wasted.

A long time ago I get c++ advice.
Never use raw pointers.
As soon as you are not low-level library developer. You don't need raw pointers. Don't need. Period.

Related

c++ return structures and vectors optimally

I am reading a lot of different things on C++ optimization and I am getting quite mixed up. I would appreciate some help. Basically, I want to clear up what needs to be a pointer or not when I am passing vectors and structures as parameters or returning vectors and structures.
Say I have a Structure that contains 2 elements: an int and then a vector of integers. I will be creating this structure locally in a function, and then returning it. This function will be called multiple times and generate a new structure every time. I would like to keep the last structure created in a class member (lastStruct_ for example). So before returning the struct I could update lastStruct_ in some way.
Now, what would be the best way to do this, knowing that the vector in the structure can be quite large (would need to avoid copies). Does the vector in the struct need to be a pointer ? If I want to share lastStruct_ to other classes by creating a get_lastStruct() method, should I return a reference to lastStruct_, a pointer, or not care about that ? Should lastStruct_ be a shared pointer ?
This is quite confusing to me because apparently C++ knows how to avoid copying, but I also see a lot of people recommending the use of pointers while others say a pointer to a vector makes no sense at all.
struct MyStruct {
std::vector<int> pixels;
int foo;
}
class MyClass {
MyStruct lastStruct_;
public:
MyStruct create_struct();
MyStruct getLastStruct();
}
MyClass::create_struct()
{
MyStruct s = {std::vector<int>(100, 1), 1234};
lastStruct_ = s;
return s;
}
MyClass::getLastStruct()
{
return lastStruct_;
}
If the only copy you're trying to remove is the one that happen when you return it from your factory function, I'd say containing the vector directly will be faster all the time.
Why? Two things. Return Value Optimisation (RVO/NRVO) will remove any need for temporaries when returning. This is enough for almost all cases.
When return value optimisation don't apply, move semantics will. returning a named variable (eg: return my_struct;) will do implicit move in the case NRVO won't apply.
So why is it always faster than a shared pointer? Because when copying the shared pointer, you must dereference the control block to increase the owner count. And since it's an atomic operation, the incrementation is not free.
Also, using a shared pointer brings shared ownership and non-locality. If you were to use a shared pointer, use a pointer to const data to bring back value semantics.
Now that you added the code, it's much clearer what you're trying to do.
There's no way around the copy here. If you measure performance degradation, then containing a std::shared_ptr<const std::vector<int>> might be the solution, since you'll keep value semantic but avoid vector copy.

Reasons to return reference to std::unique_ptr

I wonder if there are any legitimate reasons to return unique pointers by reference in C++, i.e. std::unique_ptr<T>&?
I've never actually seen that trick before, but the new project I've got seems to use this pattern a lot. From the first glance, it just effectively breaks / circumvents the "unique ownership" contract, making it impossible to catch the error in compile-time. Consider the following example:
class TmContainer {
public:
TmContainer() {
// Create some sort of complex object on heap and store unique_ptr to it
m_time = std::unique_ptr<tm>(new tm());
// Store something meaningful in its fields
m_time->tm_year = 42;
}
std::unique_ptr<tm>& time() { return m_time; }
private:
std::unique_ptr<tm> m_time;
};
auto one = new TmContainer();
auto& myTime = one->time();
std::cout << myTime->tm_year; // works, outputs 42
delete one;
std::cout << myTime->tm_year; // obviously fails at runtime, as `one` is deleted
Note that if we'd returned just std::unique_ptr<tm> (not a reference), it would raise a clear compile-time error, or would force use to use move semantics:
// Compile-time error
std::unique_ptr<tm> time() { return m_time; }
// Works as expected, but m_time is no longer owned by the container
std::unique_ptr<tm> time() { return std::move(m_time); }
I suspect that a general rule of thumb is that all such cases warrant use of std::shared_ptr. Am I right?
There are two use cases for this and in my opinion it is indicative of bad design. Having a non-const reference means that you can steal the resource or replace it without having to offer separate methods.
// Just create a handle to the managed object
auto& tm_ptr = tm_container.time();
do_something_to_tm(*tm_ptr);
// Steal the resource
std::unique_ptr<TmContainer> other_tm_ptr = std::move(tm_ptr);
// Replace the managed object with another one
tm_ptr = std::make_unique<TmContainer>;
I strongly advocate against these practices because they are error prone and less readable. It's best to offer an interface such as the following, provided you actually need this functionality.
tm& time() { return *m_time; }
std::unique_ptr<tm> release_time() { return {std::move(m_time)}; }
// If tm is cheap to move
void set_time(tm t) { m_time = make_unique<tm>(std::move(t)); }
// If tm is not cheap to move or not moveable at all
void set_time(std::unique_ptr t_ptr) { m_time = std::move(t_ptr); }
This was too long of a comment. I don't have a good idea for requested use case. The only thing I imagine is some middle ware for a utility library.
The question is what do you want and need to model. What are the semantics. Returning reference does not anything useful that I know.
The only advantage in your example is no explicit destructor. If you wanted to make exact mirror of this code it'd be a raw pointer IMHO. What exact type one should use depends on the exact semantics that they want to model.
Perhaps the intention behind the code you show is to return a non-owning pointer, which idiomatically (as I have learned) is modeled through a raw pointer. If you need to guarantee the object is alive, then you should use shared_ptr but bear it mind, that it means sharing ownership - i.e. detaching it's lifetime from the TmContianer.
Using raw pointer would make your code fail the same, but one could argue, that there is no reason for explicit delete as well, and lifetime of the object can be properly managed through scopes.
This is of course debatable, as semantics and meaning of words and phrases is, but my experience says, that's how c++ people write, talk and understand the pointers.
std::unique_ptr does not satisfy the requirements of CopyConstructible or CopyAssignable as per design.
So the object has to be returned as a reference, if needed.

C++ Vectors, Lists, Pointers, and Confusion

I have an Entity class with a variety of subclasses for particular types of entities. The objects of the subclasses have various relationships with each other, so I was using Entity pointers and static_casts to describe these relationships and to allow entities to access information about the other entities they have relationships with.
However, I found that using raw pointers was a source of a lot of difficulty & hard to debug errors. For instance, I wasted several hours before realizing I was making pointers to an object before copying it into a vector, which invalidated the pointers.
I'm using vectors and lists & stl containers like that to manage all of my memory - I am really trying to avoid worrying about memory management and low-level issues.
Say Cow is a subclass of Entity, and I have a vector of Cows (or some data structure of Cows). This vector might move itself around in memory if it needs to resize or something. I don't really want to care about those details, but I know they might invalidate my pointers. So if a Pig has a relationship with a cow and the cow vector is resized, the Pig loses track of where its friend is located in memory & has a dangerous incorrect pointer.
Here are my questions, although I think someone experienced might be able to help me more just by reading the situation...
Should I try to slip something into the constructor/destructor of my Entity objects so that they automatically invalidate relationships other entities have with them when they are deleted or update memory locations if they are moved? If this is a good approach, any help on the semantics would be useful. I have been writing all of my classes with no constructors or destructors at all so that they are extremely easy to use in things like vectors.
Is the better option to just keep track of the containers & manually update the pointers whenever I know the containers are going to move? So the strategy there would be to iterate through any container that may have moved right after it moves and update all of the pointers immediately.
As I understand it, you're doing something like this:
cow florence, buttercup;
std::vector<cow> v = { florence, buttercup };
cow* buttercup_ptr = &v[1];
// do something involving the cow*
And your buttercup_ptr is getting invalidated because, eg. florence was removed from the vector.
It is possible to resolve this by using smart pointers, like this:
std::shared_ptr<cow> florence = std::make_shared<cow>();
std::vector<std::shared_ptr<cow>> v;
v.push_back(florence);
You can now freely share florence... regardles off how the vector gets shuffled around, it remains valid.
If you want to let florence be destroyed when you pop her out of the vector, that presents a minor problem: anyone holding a copy of the shared_ptr will prevent this particular cow instance being cleaned up. You can avoid that by using a weak_ptr.
std::weak_ptr<cow> florence_ref = florence;
In order to use a weak_ptr you call lock() on it to convert it to a full shared_ptr. If the underlying cow instance was already destroyed because it had no strong references, an excetion is throw (you can also call expired() to check this before lock(), of course!)
The other possibility (as I suggested in my comment on the original question) is to use iterators pointing into the container holding the cow instances. This is a little stranger, but iterators are quite like pointers in many ways. You just have to be careful in your choice of container, to ensure your iterators aren't going to be invalidate (which mean syou can't use a vector for this purpose);
std::set<cow> s;
cow florence;
auto iter = s.insert(florence);
// do something with iter, and not worry if other cows are added to
// or removed from s
I don't think I'd necessarily advise this option, though I'm sure it would work. Using smart pointers is a better way to show future maintainance programmers your intent and has the advantage that getting dangling references is much harder.
You have a misconception about pointers. A pointer is (for the sake of this discussion!) just an address. Why would copying an address invalidate anything? Let's say your object is created in memory at address 12345. new therefore gives you a pointer whose value is 12345:
Entity *entity = new Cow; // object created at address 12345
// entity now has value 12345
Later, when you copy that pointer into a vector (or any other standard container), you copy nothing but the value 12345.
std::vector<Entity *> entities;
entities.push_back(entity);
// entities[0] now has value 12345, entity does not change and remains 12345
In fact, there's no difference with regards to this compared to, say, a std::vector<int>:
int i = 12345;
std::vector<int> ints;
ints.push_back(i);
// ints[0] now has value 12345, i does not change and remains 12345
When the vector is copied, the values inside do not change, either:
std::vector<int> ints2 = ints;
std::vector<Entity *> entities2 = entities;
// i, ints[0] and ints[0] are 12345
// entity, entities[0] and entities[0] are 12345
So if a Pig has a relationship with a cow and the cow vector is
resized, the Pig loses track of where its friend is located in memory
& has a dangerous incorrect pointer.
This fear is unfounded. The Pig does not lose track of its friend.
The dangerous thing is keeping a pointer to a deleted object:
delete entity;
entities[0]->doSometing(); // undefined behaviour, may crash
How to solve this problem is subject of thousands of questions, websites, blogs and books in C++. Smart pointers may help you, but you should first truly understand the concept of ownership.
My two cent. The code below is a working example (C++11, translates with g++ 4.8.2) with smart pointers and factory function create. Should be quite failsafe. (You can always destroy something in C++ if you really want to.)
#include <memory>
#include <vector>
#include <algorithm>
#include <iostream>
#include <sstream>
class Entity;
typedef std::shared_ptr<Entity> ptr_t;
typedef std::vector<ptr_t> animals_t;
class Entity {
protected:
Entity() {};
public:
template<class E, typename... Arglist>
friend ptr_t create(Arglist... args) {
return ptr_t(new E(args...));
}
animals_t my_friends;
virtual std::string what()=0;
};
class Cow : public Entity {
double m_milk;
Cow(double milk) :
Entity(),
m_milk(milk)
{}
public:
friend ptr_t create<Cow>(double);
std::string what() {
std::ostringstream os;
os << "I am a cow.\nMy mother gave " << m_milk << " liters.";
return os.str();
}
};
class Pig : public Entity {
Pig() : Entity() {}
public:
friend ptr_t create<Pig>();
std::string what() {
return "I am a pig.";
}
};
int main() {
animals_t animals;
animals.push_back(create<Cow>(1000.0));
animals.push_back(create<Pig>());
animals[0]->my_friends.push_back(animals[1]);
std::for_each(animals.begin(),animals.end(),[](ptr_t v){
std::cout << '\n' << v->what();
std::cout << "\nMy friends are:";
std::for_each(v->my_friends.begin(),v->my_friends.end(),[](ptr_t f){
std::cout << '\n' << f->what();
});
std::cout << '\n';
});
}
/*
Local Variables:
compile-command: "g++ -std=c++11 test.cc -o a.exe && ./a.exe"
End:
*/
By declaring the Entity constructor as protected one forces that derived objects of type Entity must be created through the factory function create.
The factory makes sure that objects always to into smart pointers (in our case std::shared_ptr).
It is written as template to make it re-usable in the subclasses. More precisely it is a variadic template to allow constructors with any numbers of arguments in the derived classes. In the modified version of the program one passes the milk production of the mother cow to the newly created cow as a constructor argument.

C++ best practice: Returning reference vs. object

I'm trying to learn C++, and trying to understand returning objects. I seem to see 2 ways of doing this, and need to understand what is the best practice.
Option 1:
QList<Weight *> ret;
Weight *weight = new Weight(cname, "Weight");
ret.append(weight);
ret.append(c);
return &ret;
Option 2:
QList<Weight *> *ret = new QList();
Weight *weight = new Weight(cname, "Weight");
ret->append(weight);
ret->append(c);
return ret;
(of course, I may not understand this yet either).
Which way is considered best-practice, and should be followed?
Option 1 is defective. When you declare an object
QList<Weight *> ret;
it only lives in the local scope. It is destroyed when the function exits. However, you can make this work with
return ret; // no "&"
Now, although ret is destroyed, a copy is made first and passed back to the caller.
This is the generally preferred methodology. In fact, the copy-and-destroy operation (which accomplishes nothing, really) is usually elided, or optimized out and you get a fast, elegant program.
Option 2 works, but then you have a pointer to the heap. One way of looking at C++ is that the purpose of the language is to avoid manual memory management such as that. Sometimes you do want to manage objects on the heap, but option 1 still allows that:
QList<Weight *> *myList = new QList<Weight *>( getWeights() );
where getWeights is your example function. (In this case, you may have to define a copy constructor QList::QList( QList const & ), but like the previous example, it will probably not get called.)
Likewise, you probably should avoid having a list of pointers. The list should store the objects directly. Try using std::list… practice with the language features is more important than practice implementing data structures.
Use the option #1 with a slight change; instead of returning a reference to the locally created object, return its copy.
i.e. return ret;
Most C++ compilers perform Return value optimization (RVO) to optimize away the temporary object created to hold a function's return value.
In general, you should never return a reference or a pointer. Instead, return a copy of the object or return a smart pointer class which owns the object. In general, use static storage allocation unless the size varies at runtime or the lifetime of the object requires that it be allocated using dynamic storage allocation.
As has been pointed out, your example of returning by reference returns a reference to an object that no longer exists (since it has gone out of scope) and hence are invoking undefined behavior. This is the reason you should never return a reference. You should never return a raw pointer, because ownership is unclear.
It should also be noted that returning by value is incredibly cheap due to return-value optimization (RVO), and will soon be even cheaper due to the introduction of rvalue references.
passing & returning references invites responsibilty.! u need to take care that when you modify some values there are no side effects. same in the case of pointers. I reccomend you to retun objects. (BUT IT VERY-MUCH DEPENDS ON WHAT EXACTLY YOU WANT TO DO)
In ur Option 1, you return the address and Thats VERY bad as this could lead to undefined behaviour. (ret will be deallocated, but y'll access ret's address in the called function)
so use return ret;
It's generally bad practice to allocate memory that has to be freed elsewhere. That's one of the reasons we have C++ rather than just C. (But savvy programmers were writing object-oriented code in C long before the Age of Stroustrup.) Well-constructed objects have quick copy and assignment operators (sometimes using reference-counting), and they automatically free up the memory that they "own" when they are freed and their DTOR automatically is called. So you can toss them around cheerfully, rather than using pointers to them.
Therefore, depending on what you want to do, the best practice is very likely "none of the above." Whenever you are tempted to use "new" anywhere other than in a CTOR, think about it. Probably you don't want to use "new" at all. If you do, the resulting pointer should probably be wrapped in some kind of smart pointer. You can go for weeks and months without ever calling "new", because the "new" and "delete" are taken care of in standard classes or class templates like std::list and std::vector.
One exception is when you are using an old fashion library like OpenCV that sometimes requires that you create a new object, and hand off a pointer to it to the system, which takes ownership.
If QList and Weight are properly written to clean up after themselves in their DTORS, what you want is,
QList<Weight> ret();
Weight weight(cname, "Weight");
ret.append(weight);
ret.append(c);
return ret;
As already mentioned, it's better to avoid allocating memory which must be deallocated elsewhere. This is what I prefer doing (...these days):
void someFunc(QList<Weight *>& list){
// ... other code
Weight *weight = new Weight(cname, "Weight");
list.append(weight);
list.append(c);
}
// ... later ...
QList<Weight *> list;
someFunc(list)
Even better -- avoid new completely and using std::vector:
void someFunc(std::vector<Weight>& list){
// ... other code
Weight weight(cname, "Weight");
list.push_back(weight);
list.push_back(c);
}
// ... later ...
std::vector<Weight> list;
someFunc(list);
You can always use a bool or enum if you want to return a status flag.
Based on experience, do not use plain pointers because you can easily forget to add proper destruction mechanisms.
If you want to avoid copying, you can go for implementing the Weight class with copy constructor and copy operator disabled:
class Weight {
protected:
std::string name;
std::string desc;
public:
Weight (std::string n, std::string d)
: name(n), desc(d) {
std::cout << "W c-tor\n";
}
~Weight (void) {
std::cout << "W d-tor\n";
}
// disable them to prevent copying
// and generate error when compiling
Weight(const Weight&);
void operator=(const Weight&);
};
Then, for the class implementing the container, use shared_ptr or unique_ptr to implement the data member:
template <typename T>
class QList {
protected:
std::vector<std::shared_ptr<T>> v;
public:
QList (void) {
std::cout << "Q c-tor\n";
}
~QList (void) {
std::cout << "Q d-tor\n";
}
// disable them to prevent copying
QList(const QList&);
void operator=(const QList&);
void append(T& t) {
v.push_back(std::shared_ptr<T>(&t));
}
};
Your function for adding an element would make use or Return Value Optimization and would not call the copy constructor (which is not defined):
QList<Weight> create (void) {
QList<Weight> ret;
Weight& weight = *(new Weight("cname", "Weight"));
ret.append(weight);
return ret;
}
On adding an element, the let the container take the ownership of the object, so do not deallocate it:
QList<Weight> ql = create();
ql.append(*(new Weight("aname", "Height")));
// this generates segmentation fault because
// the object would be deallocated twice
Weight w("aname", "Height");
ql.append(w);
Or, better, force the user to pass your QList implementation only smart pointers:
void append(std::shared_ptr<T> t) {
v.push_back(t);
}
And outside class QList you'll use it like:
Weight * pw = new Weight("aname", "Height");
ql.append(std::shared_ptr<Weight>(pw));
Using shared_ptr you could also 'take' objects from collection, make copies, remove from collection but use locally - behind the scenes it would be only the same only object.
All of these are valid answers, avoid Pointers, use copy constructors, etc. Unless you need to create a program that needs good performance, in my experience most of the performance related problems are with the copy constructors, and the overhead caused by them. (And smart pointers are not any better on this field, I'd to remove all my boost code and do the manual delete because it was taking too much milliseconds to do its job).
If you're creating a "simple" program (although "simple" means you should go with java or C#) then use copy constructors, avoid pointers and use smart pointers to deallocate the used memory, if you're creating a complex programs or you need a good performance, use pointers all over the place, and avoid copy constructors (if possible), just create your set of rules to delete pointers and use valgrind to detect memory leaks,
Maybe I will get some negative points, but I think you'll need to get the full picture to take your design choices.
I think that saying "if you're returning pointers your design is wrong" is little misleading. The output parameters tends to be confusing because it's not a natural choice for "returning" results.
I know this question is old, but I don't see any other argument pointing out the performance overhead of that design choices.

Can I have polymorphic containers with value semantics in C++?

As a general rule, I prefer using value rather than pointer semantics in C++ (ie using vector<Class> instead of vector<Class*>). Usually the slight loss in performance is more than made up for by not having to remember to delete dynamically allocated objects.
Unfortunately, value collections don't work when you want to store a variety of object types that all derive from a common base. See the example below.
#include <iostream>
using namespace std;
class Parent
{
public:
Parent() : parent_mem(1) {}
virtual void write() { cout << "Parent: " << parent_mem << endl; }
int parent_mem;
};
class Child : public Parent
{
public:
Child() : child_mem(2) { parent_mem = 2; }
void write() { cout << "Child: " << parent_mem << ", " << child_mem << endl; }
int child_mem;
};
int main(int, char**)
{
// I can have a polymorphic container with pointer semantics
vector<Parent*> pointerVec;
pointerVec.push_back(new Parent());
pointerVec.push_back(new Child());
pointerVec[0]->write();
pointerVec[1]->write();
// Output:
//
// Parent: 1
// Child: 2, 2
// But I can't do it with value semantics
vector<Parent> valueVec;
valueVec.push_back(Parent());
valueVec.push_back(Child()); // gets turned into a Parent object :(
valueVec[0].write();
valueVec[1].write();
// Output:
//
// Parent: 1
// Parent: 2
}
My question is: Can I have have my cake (value semantics) and eat it too (polymorphic containers)? Or do I have to use pointers?
Since the objects of different classes will have different sizes, you would end up running into the slicing problem if you store them as values.
One reasonable solution is to store container safe smart pointers. I normally use boost::shared_ptr which is safe to store in a container. Note that std::auto_ptr is not.
vector<shared_ptr<Parent>> vec;
vec.push_back(shared_ptr<Parent>(new Child()));
shared_ptr uses reference counting so it will not delete the underlying instance until all references are removed.
I just wanted to point out that vector<Foo> is usually more efficient than vector<Foo*>. In a vector<Foo>, all the Foos will be adjacent to each other in memory. Assuming a cold TLB and cache, the first read will add the page to the TLB and pull a chunk of the vector into the L# caches; subsequent reads will use the warm cache and loaded TLB, with occasional cache misses and less frequent TLB faults.
Contrast this with a vector<Foo*>: As you fill the vector, you obtain Foo*'s from your memory allocator. Assuming your allocator is not extremely smart, (tcmalloc?) or you fill the vector slowly over time, the location of each Foo is likely to be far apart from the other Foos: maybe just by hundreds of bytes, maybe megabytes apart.
In the worst case, as you scan through a vector<Foo*> and dereferencing each pointer you will incur a TLB fault and cache miss -- this will end up being a lot slower than if you had a vector<Foo>. (Well, in the really worst case, each Foo has been paged out to disk, and every read incurs a disk seek() and read() to move the page back into RAM.)
So, keep on using vector<Foo> whenever appropriate. :-)
Yes, you can.
The boost.ptr_container library provides polymorphic value semantic versions of the standard containers. You only have to pass in a pointer to a heap-allocated object, and the container will take ownership and all further operations will provide value semantics , except for reclaiming ownership, which gives you almost all the benefits of value semantics by using a smart pointer.
You might also consider boost::any. I've used it for heterogeneous containers. When reading the value back, you need to perform an any_cast. It will throw a bad_any_cast if it fails. If that happens, you can catch and move on to the next type.
I believe it will throw a bad_any_cast if you try to any_cast a derived class to its base. I tried it:
// But you sort of can do it with boost::any.
vector<any> valueVec;
valueVec.push_back(any(Parent()));
valueVec.push_back(any(Child())); // remains a Child, wrapped in an Any.
Parent p = any_cast<Parent>(valueVec[0]);
Child c = any_cast<Child>(valueVec[1]);
p.write();
c.write();
// Output:
//
// Parent: 1
// Child: 2, 2
// Now try casting the child as a parent.
try {
Parent p2 = any_cast<Parent>(valueVec[1]);
p2.write();
}
catch (const boost::bad_any_cast &e)
{
cout << e.what() << endl;
}
// Output:
// boost::bad_any_cast: failed conversion using boost::any_cast
All that being said, I would also go the shared_ptr route first! Just thought this might be of some interest.
While searching for an answer to this problem, I came across both this and a similar question. In the answers to the other question you will find two suggested solutions:
Use std::optional or boost::optional and a visitor pattern. This solution makes it hard to add new types, but easy to add new functionality.
Use a wrapper class similar to what Sean Parent presents in his talk. This solution makes it hard to add new functionality, but easy to add new types.
The wrapper defines the interface you need for your classes and holds a pointer to one such object. The implementation of the interface is done with free functions.
Here is an example implementation of this pattern:
class Shape
{
public:
template<typename T>
Shape(T t)
: container(std::make_shared<Model<T>>(std::move(t)))
{}
friend void draw(const Shape &shape)
{
shape.container->drawImpl();
}
// add more functions similar to draw() here if you wish
// remember also to add a wrapper in the Concept and Model below
private:
struct Concept
{
virtual ~Concept() = default;
virtual void drawImpl() const = 0;
};
template<typename T>
struct Model : public Concept
{
Model(T x) : m_data(move(x)) { }
void drawImpl() const override
{
draw(m_data);
}
T m_data;
};
std::shared_ptr<const Concept> container;
};
Different shapes are then implemented as regular structs/classes. You are free to choose if you want to use member functions or free functions (but you will have to update the above implementation to use member functions). I prefer free functions:
struct Circle
{
const double radius = 4.0;
};
struct Rectangle
{
const double width = 2.0;
const double height = 3.0;
};
void draw(const Circle &circle)
{
cout << "Drew circle with radius " << circle.radius << endl;
}
void draw(const Rectangle &rectangle)
{
cout << "Drew rectangle with width " << rectangle.width << endl;
}
You can now add both Circle and Rectangle objects to the same std::vector<Shape>:
int main() {
std::vector<Shape> shapes;
shapes.emplace_back(Circle());
shapes.emplace_back(Rectangle());
for (const auto &shape : shapes) {
draw(shape);
}
return 0;
}
The downside of this pattern is that it requires a large amount of boilerplate in the interface, since each function needs to be defined three times.
The upside is that you get copy-semantics:
int main() {
Shape a = Circle();
Shape b = Rectangle();
b = a;
draw(a);
draw(b);
return 0;
}
This produces:
Drew rectangle with width 2
Drew rectangle with width 2
If you are concerned about the shared_ptr, you can replace it with a unique_ptr.
However, it will no longer be copyable and you will have to either move all objects or implement copying manually.
Sean Parent discusses this in detail in his talk and an implementation is shown in the above mentioned answer.
Take a look at static_cast and reinterpret_cast
In C++ Programming Language, 3rd ed, Bjarne Stroustrup describes it on page 130. There's a whole section on this in Chapter 6.
You can recast your Parent class to Child class. This requires you to know when each one is which. In the book, Dr. Stroustrup talks about different techniques to avoid this situation.
Do not do this. This negates the polymorphism that you're trying to achieve in the first place!
Most container types want to abstract the particular storage strategy, be it linked list, vector, tree-based or what have you. For this reason, you're going to have trouble with both possessing and consuming the aforementioned cake (i.e., the cake is lie (NB: someone had to make this joke)).
So what to do? Well there are a few cute options, but most will reduce to variants on one of a few themes or combinations of them: picking or inventing a suitable smart pointer, playing with templates or template templates in some clever way, using a common interface for containees that provides a hook for implementing per-containee double-dispatch.
There's basic tension between your two stated goals, so you should decide what you want, then try to design something that gets you basically what you want. It is possible to do some nice and unexpected tricks to get pointers to look like values with clever enough reference counting and clever enough implementations of a factory. The basic idea is to use reference counting and copy-on-demand and constness and (for the factor) a combination of the preprocessor, templates, and C++'s static initialization rules to get something that is as smart as possible about automating pointer conversions.
I have, in the past, spent some time trying to envision how to use Virtual Proxy / Envelope-Letter / that cute trick with reference counted pointers to accomplish something like a basis for value semantic programming in C++.
And I think it could be done, but you'd have to provide a fairly closed, C#-managed-code-like world within C++ (though one from which you could break through to underlying C++ when needed). So I have a lot of sympathy for your line of thought.
Just to add one thing to all 1800 INFORMATION already said.
You might want to take a look at "More Effective C++" by Scott Mayers "Item 3: Never treat arrays polymorphically" in order to better understand this issue.
I'm using my own templated collection class with exposed value type semantics, but internally it stores pointers. It's using a custom iterator class that when dereferenced gets a value reference instead of a pointer. Copying the collection makes deep item copies, instead of duplicated pointers, and this is where most overhead lies (a really minor issue, considered what I get instead).
That's an idea that could suit your needs.