Related
Is there a compromise between managing memory and easy/idiomatic programming in C++?
For example, let's say I have the following classes. We have an Item that someone may purchase, and a ShoppingList of those Items. Let's just say Item and ShoppingList are big data structures with more features than shown and that we would not want to pass them by value normally (but they are stripped down to the basics features in this example).
class Item {
private:
std::string name;
float cost;
public:
Item(std::string name, float cost) : name{name}, cost{cost} { }
std::string toString(void) {
return name + " = " + std::to_string(cost);
}
~Item() { std::cout << "~Item()" << std::endl; }
};
class ShoppingList {
private:
std::vector<Item *> items;
public:
ShoppingList() {}
void addItem(Item &item) {
items.push_back(&item);
}
std::vector<Item *> getItems() {
return items;
}
~ShoppingList() {
std::cout << "~ShoppingList()" << std::endl;
}
};
Now, I want to be able to conveniently do something like the following (inline the argument).
ShoppingList shoppingList{};
shoppingList.add(Item{"soap", 1.99}); // no
C++17 will not me allow to try this either.
ShoppingList shoppingList{};
shoppingList.add(&Item{"soap", 1.99}); // no
shoppingList.add(new Item{"soap", 1.99}); // no
Instead, I have to do something like this.
ShoppingList shoppingList{};
Item soap{"soap", 1.99};
shoppingList.add(soap);
The above is "ok" for adding 1 element, but things get "worse" when I try to use a for loop. The item created on each loop ends with the same address and I do not end up with the intended effect.
ShoppingList shoppingList{};
auto products = std::vector<std::string>{"brush", "comb"};
for (auto product : products) {
Item item{product, 0.99};
shoppingList.addItem(item);
}
It seems unreasonable, to me, to do the following.
ShoppingList shoppingList{};
Item item1{"item1", 0.99};
Item item2{"item2", 0.99};
Item item3{"item3", 0.99};
// and so on
shoppingList.addItem(item1);
shoppingList.addItem(item2);
shoppingList.addItem(item3);
// and so on
There's just so many ways of thinking about how to make it easier to use the code. One way would be to use raw pointers. So I may change the method signature of ShoppingList.addItem(...) to as follows.
void addItem(Item *item);
Then I can do some less difficult use of the objects as follows.
ShoppingList shoppingList{};
shoppingList.addItem(new Item{"item1", 0.99});
shoppingList.addItem(new Item{"item2", 0.99});
shoppingList.addItem(new Item{"item3", 0.99});
With this new method signature, for loops should work.
ShoppingList shoppingList{};
auto products = std::vector<std::string>{"brush", "comb"};
for (auto product : products) {
shoppingList.addItem(new Item{product, 0.99});
}
Ok, so this approach makes it easier to use/write the code, but valgrind analysis says Leak_DefintelyLost where I start using new.
Another approach is smart pointers. So let's wrap the objects into smart pointers; in particular shared_pointer. Here's the new ShoppingList using smart pointers.
class ShoppingList {
private:
std::vector<std::shared_ptr<Item>> items;
public:
ShoppingList() {}
void addItem(std::shared_ptr<Item> item) {
items.push_back(item);
}
std::vector<std::shared_ptr<Item>> getItems() {
return items;
}
~ShoppingList() {
std::cout << "~ShoppingList()" << std::endl;
}
};
Usage will look like the following.
ShoppingList shoppingList{};
shoppingList.addItem(std::make_shared<Item>("soap", 32.0));
shoppingList.addItem(std::make_shared<Item>("shampoo", 32.0));
valgrind does not detect a single memory leak issue.
The only problem I have with using shared_ptr is verbosity and nested types. This declaration looks bad to me. That is a lot of : and < with >.
std::vector<std::shared_ptr<Item>> items;
In one situation where I had a map of maps of big data structures, I considered using shared pointers and ended up with a declaration as follows. If another coder looks at what the intention is (or if I myself come back later to troubleshoot), what is the intention here?
std::map<std::string, std::map<std::string, std::vector<std::shared_ptr<SomeClass>>>> dataBag;
It seems to me that the : and < and > are polluting the code and intent all for the purpose of managing memory leaks. I am not sure what's good or bad or idiomatic to write C++17 while being concise but also preventing memory leaks boostrapped by the language constructs.
Now, I did encounter the idea of RAII. But mocking some classes with that approach (raw pointers only), valgrind still shows Leak_DefinitelyLost.
Any general guidance on I suppose what I would refer to as idiomatic C++ that is memory leakage sensitive/preventative would be helpful. I was going to go down the road of creating factories to store all the pointers that I will create across my API, but I realized that approach would cause the memory to continue to build up even when there are no longer references to some pointers (and smart pointers already handle that part).
At this point, I am almost resigned to the acceptance of living with smart pointers (though there may be diamond and colon operators all over the place) to avoid memory leaks.
One way or another, if you want a collection of items, you're going to have to create each item in the collection. Storing pointers to items mostly helps if they're really so tremendously huge that copying them is unacceptable, even if it happens only a few times, such as when a vector reallocates its memory.
As for passing by reference: right now, you're defining the parameter as an lvalue reference. That means the reference can only refer to an lvalue. Thus the requirement to create a named object (an lvalue) then pass it.
You apparently don't want that. In your case, you have a couple of choices. One is a reference to const. Unlike a (non-const) lvalue reference, this can bind to a temporary object.
Another choice would be to use an rvalue reference. An rvalue reference has the advantage that it can bind to an rvalue (such as a temporary object), and also that it's "aware" that it's dealing with an rvalue, so it can "steal" the contents of that object. This is particularly useful if the object mostly contains a pointer to the actual data, in which case you can just copy the pointer (and modify the existing object so it won't destroy the data when it's destroyed).
In most cases, you're best off starting with the simplest code that could work. Create the vector of objects, and live with the fact that they'll get copied. Sometime later, if you find that you have a real bottleneck from copying those objects as that vector is reallocated, it's soon enough to change your code--such as to store std::unique_ptrs instead of storing objects directly. But that's not your first choice. Chances are pretty good that when you profile the code, you'll find that copying those objects isn't a bottleneck at all, so putting work into keeping them from being copied would have been wasted.
A long time ago I get c++ advice.
Never use raw pointers.
As soon as you are not low-level library developer. You don't need raw pointers. Don't need. Period.
Suppose I have a class Widget with a container data member d_members, and another container data member d_special_members containing pointers to distinguished elements of d_members. The special members are determined in the constructor:
#include <vector>
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
};
What is the best way to implement the copy constructor and operator=() for such a class?
The d_special_members in the copy should point to the copy of d_members.
Is it necessary to repeat the work that was done in the constructor? I hope this can be avoided.
I would probably like to use the copy-and-swap idiom.
I guess one could use indices instead of pointers, but in my actual use case d_members has a type like std::vector< std::pair<int, int> > (and d_special_members is still just std::vector<int*>, so it refers to elements of pairs), so this would not be very convenient.
Only the existing contents of d_members (as given at construction time) are modified by the class; there is never any reallocation (which would invalidate the pointers).
It should be possible to construct Widget objects with d_members of arbitrary size at runtime.
Note that the default assignment/copy just copies the pointers:
#include <iostream>
using namespace std;
int main()
{
Widget w1({ 1, 2, 3, 4, 5 });
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
Widget w2 = w1;
*w2.d_special_members[0] = 3;
cout << "First special member of w1: " << *w1.d_special_members[0] << "\n";
}
yields
First special member of w1: 2
First special member of w1: 3
What you are asking for is an easy way to maintain associations as data is moved to new memory locations. Pointers are far from ideal for this, as you have discovered. What you should be looking for is something relative, like a pointer-to-member. That doesn't quite apply in this case, so I would go with the closest alternative I see: store indices into your sub-structures. So store an index into the vector and a flag indicating the first or second element of the pair (and so on, if your structure gets even more complex).
The other alternative I see is to traverse the data in the old object to figure out which element a given special pointer points to -- essentially computing the indices on the fly -- then find the corresponding element in the new object and take its address. (Maybe you could use a calculation to speed this up, but I'm not sure that would be portable.) If there is a lot of lookup and not much copying, this might be better for overall performance. However, I would rather maintain the code that stores indices.
The best way is to use indices. Honestly. It makes moves and copies just work; this is a very useful property because it's so easy to get silently wrong behavior with hand written copies when you add members. A private member function that converts an index into a reference/pointer does not seem very onerous.
That said, there may still be similar situations where indices aren't such a good option. If you, for example have a unordered_map instead of a vector, you could of course still store the keys rather than pointers to the values, but then you are going through an expensive hash.
If you really insist on using pointers rather that indices, I'd probably do this:
struct Widget
{
std::vector<int> d_members;
std::vector<int*> d_special_members;
Widget(std::vector<int> members) : d_members(members)
{
for (auto& member : d_members)
if (member % 2 == 0)
d_special_members.push_back(&member);
}
Widget(const Widget& other)
: d_members(other.d_members)
, d_special_members(new_special(other))
{}
Widget& operator=(const Widget& other) {
d_members = other.d_members;
d_special_members = new_special(other);
}
private:
vector<int*> new_special(const Widget& other) {
std::vector<int*> v;
v.reserve(other.d_special_members.size());
std::size_t special_index = 0;
for (std::size_t i = 0; i != d_members.size(); ++i) {
if (&other.d_members[i] == other.d_special_members[special_index]) {
v.push_back(&d_members[i});
++special_index;
}
}
return v;
}
};
My implementation runs in linear time and uses no extra space, but exploits the fact (based on your sample code) that there are no repeats in the pointers, and that the pointers are ordered the same as the original data.
I avoid copy and swap because it's not necessary to avoid code duplication and there just isn't any reason for it. It's a possible performance hit to get strong exception safety, that's all. However, writing a generic CAS that gives you strong exception safety with any correctly implemented class is trivial. Class writers should usually not use copy and swap for the assignment operator (there are, no doubt, exceptions).
This work for me for vector of pairs, though it's terribly ugly and I would never use it in real code:
std::vector<std::pair<int, int>> d_members;
std::vector<int*> d_special_members;
Widget(const Widget& other) : d_members(other.d_members) {
d_special_members.reserve(other.d_special_members.size());
for (const auto p : other.d_special_members) {
ptrdiff_t diff = (char*)p - (char*)(&other.d_members[0]);
d_special_members.push_back((int*)((char*)(&d_members[0]) + diff));
}
}
For sake of brevity I used only C-like type cast, reinterpret_cast would be better. I am not sure whether this solution does not result in undefined behavior, in fact I guess it does, but I dare to say that most compilers will generate a working program.
I think using indexes instead of pointers is just perfect. You don't need any custom copy code then.
For convenience you may want to define a member function converting the index to actual pointer you want. Then your members can be of arbitrary complexity.
private:
int* getSpecialMemberPointerFromIndex(int specialIndex)
{
return &d_member[specialIndex];
}
I'm working on a program that stores a vital data structure as an unstructured string with program-defined delimiters (so we need to walk the string and extract the information we need as we go) and we'd like to convert it to a more structured data type.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself. The length of the string will always be known at allocation time. We've determined through testing that doubling the number of allocations required for each of these data types is an unnacceptable cost. Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation? If we were using cstrings I'd just have a char * in the struct and point it to the end of the struct after allocating a block big enough for the struct and string, but we'd prefer std::string if possible.
Most of my experience is with C, so please forgive any C++ ignorance displayed here.
If you have such rigorous memory needs, then you're going to have to abandon std::string.
The best alternative is to find or write an implementation of basic_string_ref (a proposal for the next C++ standard library), which is really just a char* coupled with a size. But it has all of the (non-mutating) functions of std::basic_string. Then you use a factory function to allocate the memory you need (your struct size + string data), and then use placement new to initialize the basic_string_ref.
Of course, you'll also need a custom deletion function, since you can't just pass the pointer to "delete".
Given the previously linked to implementation of basic_string_ref (and its associated typedefs, string_ref), here's a factory constructor/destructor, for some type T that needs to have a string on it:
template<typename T> T *Create(..., const char *theString, size_t lenstr)
{
char *memory = new char[sizeof(T) + lenstr + 1];
memcpy(memory + sizeof(T), theString, lenstr);
try
{
return new(memory) T(..., string_ref(theString, lenstr);
}
catch(...)
{
delete[] memory;
throw;
}
}
template<typename T> T *Create(..., const std::string & theString)
{
return Create(..., theString.c_str(), theString.length());
}
template<typename T> T *Create(..., const string_ref &theString)
{
return Create(..., theString.data(), theString.length());
}
template<typename T> void Destroy(T *pValue)
{
pValue->~T();
char *memory = reinterpret_cast<char*>(pValue);
delete[] memory;
}
Obviously, you'll need to fill in the other constructor parameters yourself. And your type's constructor will need to take a string_ref that refers to the string.
If you are using std::string, you can't really do one allocation for both structure and string, and you also can't make the allocation of both to be one large block. If you are using old C-style strings it's possible though.
If I understand you correctly, you are saying that through profiling you have determined that the fact that you have to allocate a string and another data member in your data structure imposes an unacceptable cost to you application.
If that's indeed the case I can think of a couple solutions.
You could pre-allocate all of these structures up front, before your program starts. Keep them in some kind of fixed collection so they aren't copy-constructed, and reserve enough buffer in your strings to hold your data.
Controversial as it may seem, you could use old C-style char arrays. It seems like you are fogoing much of the reason to use strings in the first place, which is the memory management. However in your case, since you know the needed buffer sizes at start up, you could handle this yourself. If you like the other facilities that string provides, bear in mind that much of that is still available in the <algorithm>s.
Take a look at Variable Sized Struct C++ - the short answer is that there's no way to do it in vanilla C++.
Do you really need to allocate the container structs on the heap? It might be more efficient to have those on the stack, so they don't need to be allocated at all.
Indeed two allocations can seem too high. There are two ways to cut them down though:
Do a single allocation
Do a single dynamic allocation
It might not seem so different, so let me explain.
1. You can use the struct hack in C++
Yes this is not typical C++
Yes this requires special care
Technically it requires:
disabling the copy constructor and assignment operator
making the constructor and destructor private and provide factory methods for allocating and deallocating the object
Honestly, this is the hard-way.
2. You can avoid allocating the outer struct dynamically
Simple enough:
struct M {
Kind _kind;
std::string _data;
};
and then pass instances of M on the stack. Move operations should guarantee that the std::string is not copied (you can always disable copy to make sure of it).
This solution is much simpler. The only (slight) drawback is in memory locality... but on the other hand the top of the stack is already in the CPU cache anyway.
C-style strings can always be converted to std::string as needed. In fact, there's a good chance that your observations from profiling are due to fragmentation of your data rather than simply the number of allocations, and creating an std::string on demand will be efficient. Of course, not knowing your actual application this is just a guess, and really one can't know this until it's tested anyways. I imagine a class
class my_class {
std::string data() const { return self._data; }
const char* data_as_c_str() const // In case you really need it!
{ return self._data; }
private:
int _type;
char _data[1];
};
Note I used a standard clever C trick for data layout: _data is as long as you want it to be, so long as your factory function allocates the extra space for it. IIRC, C99 even gave a special syntax for it:
struct my_struct {
int type;
char data[];
};
which has good odds of working with your C++ compiler. (Is this in the C++11 standard?)
Of course, if you do do this, you really need to make all of the constructors private and friend your factory function, to ensure that the factory function is the only way to actually instantiate my_class -- it would be broken without the extra memory for the array. You'll definitely need to make operator= private too, or otherwise implement it carefully.
Rethinking your data types is probably a good idea.
For example, one thing you can do is, rather than trying to put your char arrays into a structured data type, use a smart reference instead. A class that looks like
class structured_data_reference {
public:
structured_data_reference(const char *data):_data(data) {}
std::string get_first_field() const {
// Do something interesting with _data to get the first field
}
private:
const char *_data;
};
You'll want to do the right thing with the other constructors and assignment operator too (probably disable assignment, and implement something reasonable for move and copy). And you may want reference counted pointers (e.g. std::shared_ptr) throughout your code rather than bare pointers.
Another hack that's possible is to just use std::string, but store the type information in the first entry (or first several). This requires accounting for that whenever you access the data, of course.
I'm not sure if this exactly addressing your problem. One way you can optimize the memory allocation in C++ by using a pre-allocated buffer and then using a 'placement new' operator.
I tried to solve your problem as I understood it.
unsigned char *myPool = new unsigned char[10000];
struct myStruct
{
myStruct(char* aSource1, char* aSource2)
{
original = new (myPool) string(aSource1); //placement new
data = new (myPool) string(aSource2); //placement new
}
~myStruct()
{
original = NULL; //no deallocation needed
data = NULL; //no deallocation needed
}
string* original;
string* data;
};
int main()
{
myStruct* aStruct = new (myPool) myStruct("h1", "h2");
// Use the struct
aStruct = NULL; // No need to deallocate
delete [] myPool;
return 0;
}
[Edit] After, the comment from NicolBolas, the problem is bit more clear. I decided to write one more answer, eventhough in reality it is not that much advantageous than using a raw character array. But, I still believe that this is well within the stated constraints.
Idea would be to provide a custom allocater for the string class as specified in this SO question.
In the implementation of the allocate method, use the placement new as
pointer allocate(size_type n, void * = 0)
{
// fail if we try to allocate too much
if((n * sizeof(T))> max_size()) { throw std::bad_alloc(); }
//T* t = static_cast<T *>(::operator new(n * sizeof(T)));
T* t = new (/* provide the address of the original character buffer*/) T[n];
return t;
}
The constraint is that for the placement new to work, the original string address should be known to the allocater at run time. This can be achieved by external explicit setting before the new string member creation. However, this is not so elegant.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself.
I have a feeling that may you are not exploiting C++'s type-system to its maximum potential here. It looks and feels very C-ish (that is not a proper word, I know). I don't have concrete examples to post here since I don't have any idea about the problem you are trying to solve.
Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?
I believe that you are worrying about the structure allocation followed by a copy of the string to the structure member? This ideally shouldn't happen (but of course, this depends on how and when you are initializng the members). C++11 supports move construction. This should take care of any extra string copies that you are worried about.
You should really, really post some code to make this discussion worthwhile :)
a vital data structure as an unstructured string with program-defined delimiters
One question: Is this string mutable? If not, you can use a slightly different data-structure. Don't store copies of parts of this vital data structure but rather indices/iterators to this string which point to the delimiters.
// assume that !, [, ], $, % etc. are your program defined delims
const std::string vital = "!id[thisisdata]$[moredata]%[controlblock]%";
// define a special struct
enum Type { ... };
struct Info {
size_t start, end;
Type type;
// define appropriate ctors
};
// parse the string and return Info obejcts
std::vector<Info> parse(const std::string& str) {
std::vector<Info> v;
// loop through the string looking for delims
for (size_t b = 0, e = str.size(); b < e; ++b) {
// on hitting one such delim create an Info
switch( str[ b ] ) {
case '%':
...
case '$;:
// initializing the start and then move until
// you get the appropriate end delim
}
// use push_back/emplace_back to insert this newly
// created Info object back in the vector
v.push_back( Info( start, end, kind ) );
}
return v;
}
According to what I found on the net, you should use references as much as you can, except when you must use a pointer. As far as I understood, the only reasons to use a pointer are:
when dealing with raw memory or
when you may have to return null or
when you are dealing with the returned newly allocated object from new().
The first cannot really be handled. It's intrinsic in the task.
The second can be handled by a Null object pattern, and return a reference to that Null object. It requires that you define a Null object per each class instance you want to be able to refer to as Null.
The third can be handled with a smart pointer.
What I am wondering is: is it possible to "go java in C++" and program completely using references disregarding pointers altogether ? Would it be possible to define, by language, a null object in C++ like in Java ?
Sorry if the question is badly stated or a FAQ, I am very rusty on C++ and I have to start a new project in it, so I need to re-learn a few new things and get used to think in a different way.
You cannot reseat a Reference but you can reseat a Pointer to point to new variables, this behavior cannot be simulated anyway in references(References always remain bound to the variable they are initialized to) and the convenience which pointers provide in achieving this would practically make C++ without pointers and only references virtually impossible .
What would be needed? Rebindable references - aka pointers. References are, once bound, not able to change their referee.
struct Anchor{ /*some data*/ };
struct Sprite{
void set_anchor(Anchor const& a){ _anchor = &a; }
Anchor const* _anchor;
};
struct Entity{
Anchror _anchor;
};
With something like this, you can just reposition a Sprite on the screen by changing its anchor. How would you do that with references?
You can't reseat a reference, and you can't have a null reference. What Java calls references are actually pointers. And I don't know why you would want to avoid them. References (in C++) have a specific role: they allow using the same syntax as pass/return by value, without the overhead; they can also be used for inout parameters and for exposing the internals of an object (e.g. an operator[] will often return a reference). Other than as function parameters or return values, they should be fairly rare.
The Null Object Pattern is really bad, just so you know. It's not any sort of far-reaching solution.
Pointers are needed to be rebindable, and that's it. Also, Java's "references" are actually C++'s "pointers". C++'s "references" have no Java equivalent.
I have reassigned C++ references in the past using code something like this:
struct EmptyType {};
const EmptyType Null;
template<typename DataT>
class DynamicReferenceT
{
union
{
DataT & _ref;
DataT * _ptr;
};
public:
DynamicReferenceT(DataT & r) : _ref(r) { }
void operator=(DataT & r) { _ptr = &r; }
void operator=(const EmptyType &) { _ptr = NULL; }
DataT & operator()() { return _ref; }
};
class SomeClass
{
int _value;
public:
SomeClass(int val) : _value(val) {}
int value() { return _value; }
};
int _tmain(int argc, _TCHAR* argv[])
{
SomeClass objA1(100), objA2(200), objA3(300);
// initially define the dynamic ref
DynamicReferenceT<SomeClass> refObj(objA1);
cout << "refObj = " << refObj().value() << endl;
// reassign the dynamic ref
refObj = objA2;
cout << "refObj = " << refObj().value() << endl;
refObj = objA3;
cout << "refObj = " << refObj().value() << endl;
// assign "null" to reference
refObj = Null;
return 0;
}
// output
100
200
300
You can't overload the . operator in C++, so I overload the () operator to "dereference" the dynamic reference.
This is just a toy example -- you should further flesh out the implementation to handle deletion of objects and copying.
Update
I concede to the C++ community here that this was a very poor C++ example on my part and really stands as a good example of how to abuse C++. I don't actually promote such constructs in practice save for the one case I mentioned where I had to redefine references in a very, very bad legacy C++ library I inherited some years ago (at a dot-com who shall remain nameless).
As a general rule, I prefer using value rather than pointer semantics in C++ (ie using vector<Class> instead of vector<Class*>). Usually the slight loss in performance is more than made up for by not having to remember to delete dynamically allocated objects.
Unfortunately, value collections don't work when you want to store a variety of object types that all derive from a common base. See the example below.
#include <iostream>
using namespace std;
class Parent
{
public:
Parent() : parent_mem(1) {}
virtual void write() { cout << "Parent: " << parent_mem << endl; }
int parent_mem;
};
class Child : public Parent
{
public:
Child() : child_mem(2) { parent_mem = 2; }
void write() { cout << "Child: " << parent_mem << ", " << child_mem << endl; }
int child_mem;
};
int main(int, char**)
{
// I can have a polymorphic container with pointer semantics
vector<Parent*> pointerVec;
pointerVec.push_back(new Parent());
pointerVec.push_back(new Child());
pointerVec[0]->write();
pointerVec[1]->write();
// Output:
//
// Parent: 1
// Child: 2, 2
// But I can't do it with value semantics
vector<Parent> valueVec;
valueVec.push_back(Parent());
valueVec.push_back(Child()); // gets turned into a Parent object :(
valueVec[0].write();
valueVec[1].write();
// Output:
//
// Parent: 1
// Parent: 2
}
My question is: Can I have have my cake (value semantics) and eat it too (polymorphic containers)? Or do I have to use pointers?
Since the objects of different classes will have different sizes, you would end up running into the slicing problem if you store them as values.
One reasonable solution is to store container safe smart pointers. I normally use boost::shared_ptr which is safe to store in a container. Note that std::auto_ptr is not.
vector<shared_ptr<Parent>> vec;
vec.push_back(shared_ptr<Parent>(new Child()));
shared_ptr uses reference counting so it will not delete the underlying instance until all references are removed.
I just wanted to point out that vector<Foo> is usually more efficient than vector<Foo*>. In a vector<Foo>, all the Foos will be adjacent to each other in memory. Assuming a cold TLB and cache, the first read will add the page to the TLB and pull a chunk of the vector into the L# caches; subsequent reads will use the warm cache and loaded TLB, with occasional cache misses and less frequent TLB faults.
Contrast this with a vector<Foo*>: As you fill the vector, you obtain Foo*'s from your memory allocator. Assuming your allocator is not extremely smart, (tcmalloc?) or you fill the vector slowly over time, the location of each Foo is likely to be far apart from the other Foos: maybe just by hundreds of bytes, maybe megabytes apart.
In the worst case, as you scan through a vector<Foo*> and dereferencing each pointer you will incur a TLB fault and cache miss -- this will end up being a lot slower than if you had a vector<Foo>. (Well, in the really worst case, each Foo has been paged out to disk, and every read incurs a disk seek() and read() to move the page back into RAM.)
So, keep on using vector<Foo> whenever appropriate. :-)
Yes, you can.
The boost.ptr_container library provides polymorphic value semantic versions of the standard containers. You only have to pass in a pointer to a heap-allocated object, and the container will take ownership and all further operations will provide value semantics , except for reclaiming ownership, which gives you almost all the benefits of value semantics by using a smart pointer.
You might also consider boost::any. I've used it for heterogeneous containers. When reading the value back, you need to perform an any_cast. It will throw a bad_any_cast if it fails. If that happens, you can catch and move on to the next type.
I believe it will throw a bad_any_cast if you try to any_cast a derived class to its base. I tried it:
// But you sort of can do it with boost::any.
vector<any> valueVec;
valueVec.push_back(any(Parent()));
valueVec.push_back(any(Child())); // remains a Child, wrapped in an Any.
Parent p = any_cast<Parent>(valueVec[0]);
Child c = any_cast<Child>(valueVec[1]);
p.write();
c.write();
// Output:
//
// Parent: 1
// Child: 2, 2
// Now try casting the child as a parent.
try {
Parent p2 = any_cast<Parent>(valueVec[1]);
p2.write();
}
catch (const boost::bad_any_cast &e)
{
cout << e.what() << endl;
}
// Output:
// boost::bad_any_cast: failed conversion using boost::any_cast
All that being said, I would also go the shared_ptr route first! Just thought this might be of some interest.
While searching for an answer to this problem, I came across both this and a similar question. In the answers to the other question you will find two suggested solutions:
Use std::optional or boost::optional and a visitor pattern. This solution makes it hard to add new types, but easy to add new functionality.
Use a wrapper class similar to what Sean Parent presents in his talk. This solution makes it hard to add new functionality, but easy to add new types.
The wrapper defines the interface you need for your classes and holds a pointer to one such object. The implementation of the interface is done with free functions.
Here is an example implementation of this pattern:
class Shape
{
public:
template<typename T>
Shape(T t)
: container(std::make_shared<Model<T>>(std::move(t)))
{}
friend void draw(const Shape &shape)
{
shape.container->drawImpl();
}
// add more functions similar to draw() here if you wish
// remember also to add a wrapper in the Concept and Model below
private:
struct Concept
{
virtual ~Concept() = default;
virtual void drawImpl() const = 0;
};
template<typename T>
struct Model : public Concept
{
Model(T x) : m_data(move(x)) { }
void drawImpl() const override
{
draw(m_data);
}
T m_data;
};
std::shared_ptr<const Concept> container;
};
Different shapes are then implemented as regular structs/classes. You are free to choose if you want to use member functions or free functions (but you will have to update the above implementation to use member functions). I prefer free functions:
struct Circle
{
const double radius = 4.0;
};
struct Rectangle
{
const double width = 2.0;
const double height = 3.0;
};
void draw(const Circle &circle)
{
cout << "Drew circle with radius " << circle.radius << endl;
}
void draw(const Rectangle &rectangle)
{
cout << "Drew rectangle with width " << rectangle.width << endl;
}
You can now add both Circle and Rectangle objects to the same std::vector<Shape>:
int main() {
std::vector<Shape> shapes;
shapes.emplace_back(Circle());
shapes.emplace_back(Rectangle());
for (const auto &shape : shapes) {
draw(shape);
}
return 0;
}
The downside of this pattern is that it requires a large amount of boilerplate in the interface, since each function needs to be defined three times.
The upside is that you get copy-semantics:
int main() {
Shape a = Circle();
Shape b = Rectangle();
b = a;
draw(a);
draw(b);
return 0;
}
This produces:
Drew rectangle with width 2
Drew rectangle with width 2
If you are concerned about the shared_ptr, you can replace it with a unique_ptr.
However, it will no longer be copyable and you will have to either move all objects or implement copying manually.
Sean Parent discusses this in detail in his talk and an implementation is shown in the above mentioned answer.
Take a look at static_cast and reinterpret_cast
In C++ Programming Language, 3rd ed, Bjarne Stroustrup describes it on page 130. There's a whole section on this in Chapter 6.
You can recast your Parent class to Child class. This requires you to know when each one is which. In the book, Dr. Stroustrup talks about different techniques to avoid this situation.
Do not do this. This negates the polymorphism that you're trying to achieve in the first place!
Most container types want to abstract the particular storage strategy, be it linked list, vector, tree-based or what have you. For this reason, you're going to have trouble with both possessing and consuming the aforementioned cake (i.e., the cake is lie (NB: someone had to make this joke)).
So what to do? Well there are a few cute options, but most will reduce to variants on one of a few themes or combinations of them: picking or inventing a suitable smart pointer, playing with templates or template templates in some clever way, using a common interface for containees that provides a hook for implementing per-containee double-dispatch.
There's basic tension between your two stated goals, so you should decide what you want, then try to design something that gets you basically what you want. It is possible to do some nice and unexpected tricks to get pointers to look like values with clever enough reference counting and clever enough implementations of a factory. The basic idea is to use reference counting and copy-on-demand and constness and (for the factor) a combination of the preprocessor, templates, and C++'s static initialization rules to get something that is as smart as possible about automating pointer conversions.
I have, in the past, spent some time trying to envision how to use Virtual Proxy / Envelope-Letter / that cute trick with reference counted pointers to accomplish something like a basis for value semantic programming in C++.
And I think it could be done, but you'd have to provide a fairly closed, C#-managed-code-like world within C++ (though one from which you could break through to underlying C++ when needed). So I have a lot of sympathy for your line of thought.
Just to add one thing to all 1800 INFORMATION already said.
You might want to take a look at "More Effective C++" by Scott Mayers "Item 3: Never treat arrays polymorphically" in order to better understand this issue.
I'm using my own templated collection class with exposed value type semantics, but internally it stores pointers. It's using a custom iterator class that when dereferenced gets a value reference instead of a pointer. Copying the collection makes deep item copies, instead of duplicated pointers, and this is where most overhead lies (a really minor issue, considered what I get instead).
That's an idea that could suit your needs.