How to efficiently restore an STL container to old values - C++

I have a large class representing a graph. This class contains several containers (vector and set) of complex types.
During search I need to modify the graph to avoid loops in the results.
Since I have to run many searches, I need to restore the class to its original state very often.
Currently I am simply assigning the saved containers back to the modified ones:
void Graph::restore(){
mEdges=mSafeEdges; //std::vector<Edge> Edge has no heap based data
mNodes=mSafeNodes; //std::vector<GraphNode> A Graph Node contains std::set<int>
}
As I said, edges and nodes are complex, with each node containing e.g. a set.
Each pair of containers has equal size. Profiling my code showed that this simple restore function is the major bottleneck of the program, taking approx 6 ms per run: the edge vector takes 1.5 ms to copy and the nodes 4.5 ms. Is there a better, faster way to copy containers of complex types, or at least to copy the edge vector?

If you only modify some of the objects on each search then you could see whether copy-on-write helps.
template<typename T>
class Cow {
std::shared_ptr<T> owned;
const T* non_owned;
public:
explicit Cow(const T& n) : non_owned(&n) { }
const T& get() const { return *non_owned; }
T& copy() {
if (!owned) {
owned = std::make_shared<T>(*non_owned);
non_owned = owned.get();
}
return *owned;
}
};
Then replace mEdges and mNodes with containers of Cow<Edge> and Cow<GraphNode> (or only do it for nodes, since that's the more expensive type to copy).
You'd have to modify the search logic to work with the wrapper type (or give it a conversion operator to const T&) and then explicitly add a call to copy() when you want a modifiable object, but you would avoid making copies of objects unless necessary.

Related

Intrinsic benefit of using a LinkedList class to point to head Node vs. just using Node*

We can define a LinkedListNode as below:
template <typename T>
struct LinkedListNode {
T val;
LinkedListNode* next;
LinkedListNode() : val{}, next(nullptr) {}
LinkedListNode(T x) : val{ x }, next(nullptr) {}
LinkedListNode(T x, LinkedListNode* next) : val{ x }, next(next) {}
};
If we want to define a function that takes a "Linked List", we have two options. First, we could pass a LinkedListNode* to the function.
template <typename T>
int func(LinkedListNode<T>* node);
Second, we could define a LinkedList class that holds a pointer to the "head" node. Then we could define a function that takes a LinkedList.
template <typename T>
struct LinkedList {
LinkedListNode<T>* head;
// other member functions
};
template <typename T>
int func(LinkedList<T>& llist);
One reason the second appears preferable is that it might allow better encapsulation of functions that modify a "Linked List". For example, a FindMax that takes a LinkedListNode* might fit better as a member function of LinkedList than as a member function of LinkedListNode.
What concrete reasons are there to prefer one over the other? I'm especially interested in reasons you might prefer to just use LinkedListNode*s.
I think before you even choose to use a singly linked list, you should have some reason to use it over plain std::vector. You need actual benchmarks that show that a singly linked list would improve performance in the particular application you have in mind; you'd be surprised how often it makes things worse, not better. Hint: theoretical computational complexity is orthogonal to memory access patterns, and on modern CPUs the memory access patterns determine performance - most computation is essentially free, in that it takes no extra time: it gets hidden under all the cache misses.
Then you should have a reason not to use std::forward_list. But maybe you need intrusive linked lists: then make a case for not using boost::intrusive::slist<T> or a similar existing and well tested library type.
If you're still going forward with your own implementation, then the very first step would be to use std::unique_ptr as the owning pointer for child nodes, instead of manual memory management - that way it'll be very easy to show that no memory is being leaked - the code becomes correct by construction and memory leaks require extra effort vs. happening by omission.
In other words: don't reinvent the wheel unless you have a well articulated reason for that. Of course, you can implement linked lists all you want as an exercise, but be aware that you're most likely implementing a container that you'll make the least use of - so I'd argue that you'd learn a lot more about how C++ works by implementing e.g. a vector/array container.
If you do use std::unique_ptr, or even manual memory management, you're likely to run into the destructor stack explosion pitfall. Consider
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = nullptr;
~LinkedListNode2() { delete next; }
};
In both cases, the destructor gets invoked recursively, and if the list is sufficiently long, you'll run out of stack. Recursion is also usually less efficient than a loop. To prevent that, you must never deallocate a node whose next pointer is still non-null.
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
assert(!next);
}
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = {};
~LinkedListNode2() {
using std::swap;
LinkedListNode2* node = {};
swap(node, next);
while (node) {
LinkedListNode2* tmp = {};
swap(tmp, node);
assert(!node);
swap(node, tmp->next);
assert(!tmp->next);
delete tmp;
}
assert(!next);
}
};
Smart pointers make the code much simpler. I wrote the raw pointer version with swaps to make it easier to show that no memory is leaking: a swap used correctly never "loses" a value.
For example, a FindMax that takes a LinkedListNode*
That's again reinventing the wheel. In C++, the idiom for "finding a maximum element" is std::max_element from #include <algorithm>. You should leverage the algorithms that the standard library provides (and any others you may need, e.g. from Boost or header-only libraries).
To do that, you need an iterator for the list. It will be, by necessity, a LegacyForwardIterator. Here, "is a" has a strict technical meaning: it's a concise way of saying "your iterator will fulfill the concept of and abide by the contract of LegacyForwardIterator".
Such an iterator would look very roughly as follows:
template <typename T>
class LinkedListNode1 {
std::unique_ptr<LinkedListNode1> next;
template <typename V> class iterator_impl {
LinkedListNode1 *node = {};
using const_value_type = std::add_const_t<V>;
using non_const_value_type = std::remove_const_t<V>;
public:
using value_type = V;
using reference = V&;
using pointer = V*;
iterator_impl() = default;
template <typename VO>
iterator_impl(const iterator_impl<VO> &o) : node(o.operator->()) {}
explicit iterator_impl(LinkedListNode1 *node) : node(node) {}
auto *operator->() const { return node; }
pointer operator&() const { return &(node->val); }
reference operator*() const { return node->val; }
iterator_impl &operator++() { node = node->next.get(); return *this; }
iterator_impl operator++(int) {
auto retval = *this;
this->operator++();
return retval;
}
bool operator==(const iterator_impl &o) const { return node == o.node; }
bool operator!=(const iterator_impl &o) const { return node != o.node; }
};
public:
T val;
using iterator = iterator_impl<T>;
using const_iterator = iterator_impl<const T>;
The next pointer can be made private. Then, the basic functionality would include:
LinkedListNode1() = default;
LinkedListNode1(const T &val) : val(val) {}
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
}
iterator begin() { return iterator(this); }
iterator end() { return {}; }
const_iterator begin() const { return const_iterator(this); }
const_iterator end() const { return {}; }
const_iterator cbegin() const { return const_iterator(this); }
const_iterator cend() const { return {}; }
iterator insert_after(const_iterator pos, const T& value) {
auto next = std::make_unique<LinkedListNode1>();
next->val = value;
auto retval = iterator(next.get());
next->next = std::move(pos->next); // splice: keep the rest of the list attached
pos->next = std::move(next);
return retval;
}
One would use insert_after to extend the list. Other such methods would need to be added, of course.
Then, we'd probably also want to support initializer lists:
LinkedListNode1(std::initializer_list<T> init) {
auto src = init.begin();
if (src == init.end()) return;
val = *src++;
for (auto dst = iterator(this); src != init.end(); ++src)
dst = insert_after(dst, *src);
}
};
Now you can pre-populate the list with an initializer list, iterate it using range-for, and use it with standard algorithms:
#include <algorithm>
#include <cassert>
#include <iostream>
int main() {
LinkedListNode1<int> list{1, 3, 2};
for (auto const &val : list)
std::cout << val << '\n';
assert(*std::max_element(list.begin(), list.end()) == 3);
}
But now we come to the most important question:
What concrete reasons are there to prefer one over the other
The default - the starting point - is to provide a container, since that's the abstraction we deal with: the "thing" that you think of is a linked list, not a list node. The data structure you learn of is, again, a linked list. And for a good reason: The node type is an implementation detail, so you'd need to come up with application-specific reasons for exposing the node type, and any argument made must stand up to the scrutiny when faced with iterators. Do you really need to expose those nodes, or is what you actually want just a convenient way to iterate over the items stored in the collection, perhaps split the list, etc? Node access is not necessary for any of it. It's all a solved problem, as you'll learn by reading the documentation of std::forward_list.
You'd also want to consider allocator support. I'd not worry about the C++98 allocators, but the polymorphic allocators are (finally!) actually usable, so you'd want to implement those (c.f. std::pmr::polymorphic_allocator and the std::pmr namespace in general).
For full functionality, you'd pretty much need to add most of std::forward_list's methods and constructors. So it's a bit of work, and there are lots of details to make it work well no matter the value type. And thus we come full circle: real containers that are meant to be useful without worrying about low-level details are lots of work, but they are a joy to use - and they look nothing like most textbook "teaching" code.
A linked list is often used when teaching data structures - true. Yet most C++ books used in teaching are woefully inadequate in demonstrating what a modern, fully functional data structure/container entails - they can't even get that right for something as "simple" as a singly linked list.
The gap between a C-like singly linked list - exactly what you started with in the question - and a singly linked list C++ container is on the order of a couple thousand lines of code and tests. That's what they don't usually teach, and that's where the most important bits really are: they are the difference between toy code, and production code.
Even without tests, a fully functional singly linked list container is ~500 lines without polymorphic allocator support, and probably at least double that with such support, and tests would double the code size several times - although if you were clever about it, you could reuse a lot of the tests used by various STL implementations :)
And, by the way: a decent implementation of a linked list in C won't force you to manually deal with nodes either. The list itself - the container - will be an abstract data type with a bunch of functions that provide the functionality, and with some abstraction for iterators as well (even though they'll be just pointers in some type-safe disguise). This is again the difference between teaching code and easy-to-use-correctly, hard-to-use-incorrectly production code. One example I can think of right now is the stretchy buffers, as implemented in the Bitwise "ion" project. This is a link to a video where those are implemented live, and they serve as a decent example of how abstractions work in C (and also how you definitely shouldn't be writing this in C++ - C and C++ are different languages!).
Defining an actual LinkedList type allows you to directly support operations that would be relatively difficult to support by just passing around a pointer to a node.
One comment has already mentioned storing the size of the linked list as a member, so you can have a function return the size of the linked list in constant time. That is a useful thing to do, but I think it only hints at the real point, which (in my opinion) is having things that apply to the linked list as a whole, rather than just operations on individual nodes.
In C++, one obvious possibility here is having a destructor that properly destroys a complete linked list when it goes out of scope.
int foo() {
LinkedList a;
// code that uses `a`
} // <-- here `a` goes out of scope, and should be destroyed
One of the big features of C++ as a whole is deterministic destruction, and its support for that is based on destructors that run when objects go out of scope.
With a linked list, you'd (at least normally) plan on all the nodes in the linked list being allocated dynamically. If you just use a pointer to node, it'll be up to you to keep track of when you no longer need/want a particular linked list, and manually destroy all the nodes in the linked list when it's no longer needed.
By creating a linked-list class, you get the benefit of deterministic destruction, so you no longer need to keep track of when a list is no longer needed--the compiler tracks that for you, and when it goes out of scope, it gets destroyed automatically.
I'd also expect a linked list to support copy construction, move construction, copy assignment, and move assignment--and probably also a few things like comparison (at least for in/equality, and possibly ordering). And all of these require a fair amount of manual intervention if you decide to implement your linked list as a pointer to a node, instead of having an actual linked list class.
As such, I'd say if you really want to use C++ (even close to how it's intended to work) creating a class to encapsulate your linked list is an absolute necessity. As long as you're just passing around pointers to nodes, what you're writing is fundamentally C (even if it may use some features specific to C++ so a C compiler won't accept it).

Overloading push_back() in vector to allow non-duplicate elements

Can we overload the push_back() method in std::vector to allow non-duplicate elements? I know std::set and std::unordered_set are supposed to avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores the elements in no particular order. I need to retrieve the elements in the order they are inserted, while ensuring duplicate elements are not inserted.
Edit: There's a possible duplicate for this question here. The best solution in that duplicate proposes an auxiliary data structure and another custom method "add". This doesn't look good to me since (even if I put it in separate documentation) the users inserting data into the std::vector rarely refer to the documentation for any custom functions. If there's no efficient way though, this can be a last resort.
Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it. This can cause object slicing (google "C++ object slicing"). You also need to keep in mind that vector does not use virtual functions. That means you cannot override member functions. You can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
So in the end, you'd need to add a push_back_unique() function instead. But that in turn means the job can be served by a simple free function instead. So inheriting from vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique. Other code might use push_back() instead somewhere.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>
template<typename T>
class UniqueVector
{
public:
void push_back(T&& elem)
{
if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
vec_.push_back(std::forward<T>(elem));
}
}
const T& operator[](size_t index) const
{
return vec_[index];
}
auto begin() const
{
return vec_.cbegin();
}
auto end() const
{
return vec_.cend();
}
private:
std::vector<T> vec_;
};
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
void assign_if_unique(size_t index, T&& value)
{
if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
vec_[index] = std::forward<T>(value);
}
}
This is a minimal example. You should obviously add the functions you actually want. Like size(), empty(), and whatever else you need.
You should first define a free function1 to implement your feature:
template<class T>
std::vector<T>&
push_back_unique(std::vector<T>& dest, T const& src)
{ /* ... */ }
If you use this a lot, and if make sense regarding your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.

Programming a simple object oriented graph in C++

I am really trying to be a better programmer, and to make more modular, organized code.
As an exercise, I was trying to make a very simple Graph class in C++ with STL. In the code below, my Node object does not compile because the commented line results in a reference to a reference in STL.
#include <set>
class KeyComparable
{
public:
int key;
};
bool operator <(const KeyComparable & lhs, const KeyComparable & rhs)
{
return lhs.key < rhs.key;
}
class Node : public KeyComparable
{
public:
// the following line prevents compilation
// std::set<Node &> adjacent;
};
I would like to store the edges in a set (by key) because it allows fast removal of edges by key. If I were to store list<Node*>, that would work fine, but it wouldn't allow fast deletion by key.
If I use std::set<Node>, changes made through an edge will only change the local copy (not actually the adjacent Node). If I use std::set<Node*>, I don't believe the < operator will work because it will operate on the pointers themselves, and not the memory they index.
I considered wrapping references or pointers in another class, possibly my KeyComparable class (according to the linked page, this is how boost handles it).
Alternatively, I could store a std::list<Node*> and a std::map<int, iterator> of locations in the std::list. I'm not sure if the iterators will stay valid as I change the list.
Ages ago, everything here would just be pointers and I'd handle all the data structures manually. But I'd really like to stop programming C-style in every language I use, and actually become a good programmer.
What do you think is the best way to handle this problem? Thanks a lot.
As you have deduced, you can't store references in STL containers because one of the requirements of items stored is that they be assignable. It's the same reason why you can't store arrays in STL containers. You also can't overload operators unless at least one operand is of a user-defined type, which makes it appear that you can't do custom comparisons if you store pointers in an STL class...
However, you can still use std::set with pointers if you give set a custom comparer functor:
struct NodePtrCompare {
bool operator()(const Node* left, const Node* right) const {
return left->key < right->key;
}
};
std::set<Node*, NodePtrCompare> adjacent;
And you still get fast removals by key like you want.

C++ tree of pointers template question

I've just come across a nice STL-like tree container class written by Kasper Peeters:
http://tree.phi-sci.com/
However, because it's STL-like, it's geared towards having a single class type in the tree, i.e. template <class T>. The trouble is that, like STL lists, it suffers from the polymorphic class problem: objects in the tree that are pointers to heap-based objects (like pointers to a base class) aren't destroyed when nodes are deleted.
So, my options:
1: Use a tree of boost::shared_ptr, although this is more expensive/overkill than I'd like.
2: Write a little pointer wrapper like the one I've written below. The idea being that it wraps a pointer which, when it goes out of scope, deletes its pointer. It's not ref counted; it just guarantees the pointer's destruction.
template<class T>
class TWSafeDeletePtr
{
private:
T *_ptr;
public:
TWSafeDeletePtr() : _ptr(NULL) {}
TWSafeDeletePtr(T *ptr) : _ptr(ptr) {}
~TWSafeDeletePtr()
{
delete _ptr;
}
T &operator=(T *ptr)
{
assert(!_ptr);
delete _ptr;
_ptr=ptr;
return *ptr;
}
void set(T *ptr)
{
*this=ptr;
}
T &operator*() const { return *_ptr; }
T *operator->() const { return _ptr; }
};
3: Write my own allocator which allocates the node objects from a pool in the allocate() and deletes the pointed to memory in the deallocate().
4: Specialise the code to make a tree of pointers, avoiding the initial allocation and copy construction, plus innately knowing how to delete the pointed-to data.
I already have option 2 working, but I'm not really happy with it, because I have to actually insert an empty ptr to begin with, then set() the pointer when the insert returns an iterator. This is because the tree uses copy construction, and hence the temporary object passed on the stack will ultimately delete the pointer when it goes out of scope. So I set the pointer upon return. It works, it's hidden, but I don't like it.
Option 3 is looking like the best candidate, however I thought I'd ask if anyone else has already done this, or has any other suggestions?
Ok, I've decided to go with option 1 (a tree of shared_ptrs), mainly because it uses standard libraries, but also because the extra refcount per node won't break the bank. Thanks for the replies everyone :)
Cheers,
Shane
I don't like the allocator version, because allocators are supposed to allocate memory, not construct objects. So there's no guarantee that the number of requested allocations to the allocator matches the number of objects to be constructed; it would depend on the implementation whether you get away with it.
The tree calls the copy constructor on an inserted or appended value after the allocator has allocated the memory for it, so you would be hard pressed to write something which worked with polymorphic objects - alloc_.allocate doesn't know the runtime type of x before the constructor is called (lines 886 on):
tree_node* tmp = alloc_.allocate(1,0);
kp::constructor(&tmp->data, x);
Also, looking at the code, it doesn't seem to use assignment at all, and your wrapper only supplies the default copy constructor, so I can't see any of your suggested mechanisms working - when a node is assigned a new value, the value it already holds is destroyed and replaced with this code (lines 1000 on):
template <class T, class tree_node_allocator>
template <class iter>
iter tree<T, tree_node_allocator>::replace(iter position, const T& x)
{
kp::destructor(&position.node->data);
kp::constructor(&position.node->data, x);
return position;
}
your smart pointer would destroy its referent when its destructor is called here; you may get away with it by passing a reference-counted pointer instead (so x doesn't destroy its referent until it goes out of scope, rather than the position.node->data destructor destroying it).
If you want to use this tree, then I would either use it as an index into data owned elsewhere rather than the tree owning the data, or stick with the shared_ptr approach.
[Edit] Shane has chosen to go with the boost::shared_ptr solution and has pointed out that he needs to store polymorphic base pointers. Should memory/processing efficiency ever become a concern (after profiling, of course), consider a base pointer wrapper with safe copying behavior (e.g., deep-copying the pointee through a clone method) and the fast swap idiom shown in #5. This would be similar to suggested solution #2, but safe and without making assumptions on the data type being used (ex: trying to use auto_ptr with containers).
I think you should consider option #5.
1: Use a tree of boost::shared_ptr,
although this is more
expensive/overkill than I'd like.
First of all, do you realize that any linked structure like std::list, std::set, std::map requires a separate memory allocation/deallocation per node but doesn't require copying nodes to do things like rebalance the tree? The only time the reference counter will amount to any processing overhead is when you insert to the tree.
2: Write a little pointer wrapper like
the one I've written below. The idea
being that it wraps a pointer, which
when it goes out of scope, deletes its
pointer. It's not ref counted, it's
just guarantees the pointer
destruction.
For this tree you might be able to get away with it since you have the tree implementation, but it's a heavy assumption. Consider at least making the pointer wrapper non-copyable so that you'll get a compiler error if you ever try to use it for something which does copy node elements.
3: Write my own allocator which
allocates the node objects from a pool
in the allocate() and deletes the
pointed to memory in the deallocate().
If it's an STL-compliant memory allocator, it should not be making such assumptions about the memory contents in deallocate. Nevertheless, writing a fast memory allocator which can assume fixed allocation sizes can really speed up any linked structure. Writing a fast memory allocator that consistently outperforms malloc/free is non-trivial work, however, and there are issues to consider like memory alignment.
4: Specialise the code to make a tree
of pointers, avoiding the initial
allocation and copy construction, plus
innately knowing how to delete the
pointed-to data.
Making a wrapper for the tree is probably the most robust solution. You'll have full control over when to insert and remove elements (nodes) and can do whatever you like in the mean time.
Option #5: just store the element directly in the tree and focus on making the element fast.
This is your best bet if you ask me. Instead of map<int, ExpensiveElement*> or map<int, shared_ptr<ExpensiveElement> >, consider simply map<int, ExpensiveElement>.
After all, you obviously want the tree to be the memory manager (deleting a node deletes the element). That happens naturally when we avoid the indirection of a pointer to the element.
However, your concern seems to be the overhead of the copy-in policy of insert (copy ctor overhead on ExpensiveElement). No problem! Just use operator[] instead of insert:
map<int, ExpensiveElement> my_map;
// default constructs ExpensiveElement
// and calls ExpensiveElement::do_something().
// No copies of ExpensiveElement are made!
my_map[7].do_something();
Tada! No copying, no need to worry about proper destruction, and no memory allocation/deallocation overhead per element.
If default constructing ExpensiveElement won't suffice, then consider making default construction super cheap (practically free) and implement a swap method.
map<int, ExpensiveElement> my_map;
// construct an ExpensiveElement and swap it into the map
// this avoids redundant work and copying and can be very
// efficient if the default constructor of ExpensiveElement
// is cheap to call
ExpensiveElement element(...);
my_map[7].swap(element);
To make the default construction super cheap and allow for a swap method, you could implement a fast pimpl on ExpensiveElement. You can make it so the default ctor doesn't even allocate the pimpl, making it a null pointer, while the swap method swaps the two pimpls of ExpensiveElement. Now you have super cheap default construction and a way to swap properly constructed ExpensiveElements into the map, avoiding the redundancy of deep copies all together.
What if ExpensiveElement cannot have a default ctor?
Then make a wrapper which does. The approach can be similar to the pointer wrapper you suggested, except it will be a complete class with valid (safe) copying behavior. The copy ctor can deep copy the pointee, for example, if reference counting is undesired. The difference may sound subtle, but this way it's a very safe and complete solution which doesn't make assumptions about how the tree is implemented; safe and general like boost::shared_ptr but without the reference counting. Just provide a swap method as your one and only means to shallow swap data without requiring a deep copy and use it to swap things into the tree.
What if we need to store polymorphic base pointers?
See answer immediately above and modify the wrapper to call something like clone (prototype pattern) to deep copy the pointee.
First of all, you could benefit from move semantics here, if you have access to C++0x.
Otherwise, the Boost Pointer Container library has solved the issue of the STL containers of pointers by... recoding it all.
Another issue with containers of pointers that you did not mention is the copy of the container. In your case the original container and its copy would both point to the same objects, so the copy is not independent of the original.
You can of course alleviate this by writing a proxy class which wraps the pointer and provides deep-copying semantics (a clone method on the wrapped object). But you will then copy the data more often than if the container were pointer-aware... it's less work though.
#include <memory>  // std::auto_ptr / std::unique_ptr constructors below
#include <utility> // std::move
/// Requirement: T is a model of Cloneable
template <class T>
class Proxy
{
template <class> friend class Proxy;
public:
// Constructor
Proxy(): mPointer(0) {}
explicit Proxy(T* t): mPointer(t) {}
template <class U>
explicit Proxy(std::auto_ptr<U> t): mPointer(t.release()) {}
template <class U>
explicit Proxy(std::unique_ptr<U> t): mPointer(t.release()) {}
// Copy Constructor
Proxy(Proxy const& rhs):
mPointer(rhs.mPointer ? rhs.mPointer->clone() : 0) {}
template <class U>
Proxy(Proxy<U> const& rhs):
mPointer(rhs.mPointer ? rhs.mPointer->clone() : 0) {}
// Assignment Operator
Proxy& operator=(Proxy const& rhs)
{
Proxy tmp(rhs);
this->swap(tmp);
return *this;
}
template <class U>
Proxy& operator=(Proxy<U> const& rhs)
{
Proxy tmp(rhs);
this->swap(tmp);
return *this;
}
// Move Constructor
Proxy(Proxy&& rhs): mPointer(rhs.release()) {}
template <class U>
Proxy(Proxy<U>&& rhs): mPointer(rhs.release()) {}
// Move assignment
Proxy& operator=(Proxy&& rhs)
{
Proxy tmp(std::move(rhs));
this->swap(tmp);
return *this;
}
template <class U>
Proxy& operator=(Proxy<U>&& rhs)
{
Proxy tmp(std::move(rhs));
this->swap(tmp);
return *this;
}
// Destructor
~Proxy() { delete mPointer; }
void swap(Proxy& rhs)
{
T* tmp = rhs.mPointer;
rhs.mPointer = mPointer;
mPointer = tmp;
}
T& operator*() { return *mPointer; }
T const& operator*() const { return *mPointer; }
T* operator->() { return mPointer; }
T const* operator->() const { return mPointer; }
T* release() { T* tmp = mPointer; mPointer = 0; return tmp; }
private:
T* mPointer;
};
// Free functions
template <class T>
void swap(Proxy<T>& lhs, Proxy<T>& rhs) { lhs.swap(rhs); }
Note that as well as providing deep-copying semantics, it provides deep-constness. This may or may not be to your taste.
It would also be good taste to provide equivalent to static_cast and dynamic_cast operations, this is left as an exercise to the reader ;)
It seems that the cleanest solution would be a container adaptor in the style of Boost Pointer Container. This would smooth the syntax a lot as well. However, writing such an adaptor is tedious and repetitive, as you would have to "lift" typedefs and repeat every member function of the class that is to be adapted.
It looks like option 1 is probably the best. shared_ptr is very common and most people should know how it works. Is there a problem with the syntax map_obj[key].reset(new ValueType);?
Unless you have measurement data that your wrapper for option 2 is a significant savings in use compared to shared_ptr, shared_ptr seems safer since people will know about its semantics right away.
Option three seems complex for what it provides. You need to implement the allocate/construct and deallocate/destruct pairs, and make sure that if a node is copied around it will not have deletion problems.
Option four is probably the second-best option. I wouldn't suggest using it unless profiling shows that the shared_ptr operations really are that expensive though, since this requires reinventing code that's already been written and debugged in the standard library.
I've decided to go with option 1 (a tree of shared_ptrs), mainly because it uses standard libraries, but also because the extra refcount per node won't break the bank. Thanks for the replies everyone :)
1.
I already have option 1 working, but I'm not really happy with it, because I have to actually insert an empty ptr to begin with, then set() the pointer when the insert returns an iterator. This is because the tree uses copy construction, and hence the temporary object passed on the stack will ultimately delete the pointer when it goes out of scope. So I set the pointer upon return. It works, it's hidden, but I don't like it.
As long as at least one copy of the same shared_ptr exists, the pointed-to object won't be destroyed, so the problem you write about does not arise.
2. Your class is pointless. Use shared_ptr.
3. The allocator would have to know what kind of object to create when asked for a piece of bytes; this is not a good solution.
4. Too many complications.
I suggest solution 1.

Can I use the [] operator in C++ to create virtual arrays

I have a large code base, originally C ported to C++ many years ago, that is operating on a number of large arrays of spatial data. These arrays contain structs representing point and triangle entities that represent surface models. I need to refactor the code such that the specific way these entities are stored internally varies for specific scenarios. For example if the points lie on a regular flat grid, I don't need to store the X and Y coordinates, as they can be calculated on the fly, as can the triangles. Similarly, I want to take advantage of out of core tools such as STXXL for storage. The simplest way of doing this is replacing array access with put and get type functions, e.g.
point[i].x = XV;
becomes
Point p = GetPoint(i);
p.x = XV;
PutPoint(i,p);
As you can imagine, this is a very tedious refactor on a large code base, prone to all sorts of errors en route. What I'd like to do is write a class that mimics the array by overloading the [] operator. As the arrays already live on the heap, and move around with reallocs, the code already assumes that references into the array such as
point *p = point + i;
may not be used. Is this class feasible to write? For example, writing the methods below in terms of the [] operator:
void MyClass::PutPoint(int Index, Point p)
{
if (m_StorageStrategy == RegularGrid)
{
int xoffs,yoffs;
ComputeGridFromIndex(Index,xoffs,yoffs);
StoreGridPoint(xoffs,yoffs,p.z);
} else
m_PointArray[Index] = p;
}
Point MyClass::GetPoint(int Index)
{
if (m_StorageStrategy == RegularGrid)
{
int xoffs,yoffs;
ComputeGridFromIndex(Index,xoffs,yoffs);
return GetGridPoint(xoffs,yoffs); // GetGridPoint returns Point
} else
return m_PointArray[Index];
}
My concern is that all the array classes I've seen tend to pass by reference, whereas I think I'll have to pass structs by value. I think it should work, but other than performance, can anyone see any major pitfalls with this approach? N.B. the reason I have to pass by value is to get
point[a].z = point[b].z + point[c].z
to work correctly where the underlying storage type varies.
You should not need to pass the array by value. For mutating the values in the array, you want two versions of operator[], one which returns a reference (to mutate) and one a const reference.
There is no reason in principle not to use operator[], as long as you do not need to vary the type of the storage at run time - there are no virtual operators, so you would need a named function if you want runtime polymorphism. In that case, you can create a simple struct which adapts the operator calls to function calls (though it rather depends on the storage API - if the code assumes that assigning to the point's member variables changes the stored data, you might have to make the point type a template variable too so this can be overridden).
Looking at your sample code, it has a test for the storage strategy. Do not do this. Either use OO and have your storage object implement a common virtual interface, or (probably better) use template programming to vary the storage mechanism.
If you look at the guarantees made by std::vector (in more recent C++ standards), then it is possible to have something which has dynamic storage and allows use of pointer arithmetic, though that requires contiguous storage. Given that some of your values are created on the fly, it is probably not worth placing that restriction on your implementations, but the constraint itself does not prevent use of operator[].
What you want is possible, but as you need write access as well, the result will sometimes be a little more complex. The idea is that the accessor returns not direct "Point write access", but rather a temporary copy, which performs the write-back once the copy goes out of scope.
Following code fragment tries to outline the solution:
class PointVector
{
MyClass container_;
public:
class PointExSet: public Point
{
MyClass &container_;
int index_;
public:
PointExSet(MyClass &container, int index)
:Point(container.GetVector(index)),container_(container),index_(index)
{
}
~PointExSet()
{
container_.PutVector(index_) = *this;
}
};
PointExSet operator [] (int i)
{
return PointExSet(container_,i);
}
};
It is not as nice as you would probably hope it to be, but I am afraid you cannot get a much better solution in C++.
To have full control over operations on the array, operator[] should return a special object (invented long ago and called a "cursor") that will handle operations for you.
As an example:
class Container; // forward declaration
class XCursor
{
public:
XCursor(Container* _container, int _i)
: container(_container), i(_i) {}
XCursor& operator = (const XCursor& xc)
{
// conceptually: (*container)[i].x = (*xc.container)[xc.i].x;
// in practice, read x from xc's container and write it into ours,
// using whatever the underlying storage strategy requires
// (or do whatever you want over x)
return *this;
}
Container* container;
int i;
};
class PointCursor
{
public:
PointCursor(Container* _container, int _i)
: container(_container), i(_i),
//initialize subcursor
x(_container, _i) {}
//subcursor
XCursor x;
private:
Container* container;
int i;
};
class Container
{
public:
PointCursor operator [] (int i)
{
return PointCursor(this, i);
}
};
//usage
my_container[i].x = their_container[j].x; //calls XCursor::operator = ()
After reading the above answers, I decided that Pete's answer with two versions of operator[] was the best way forward. To handle the morphing between types at run-time I created a new array template class that takes four parameters, as follows:
template<class TYPE, class ARG_TYPE,class BASE_TYPE, class BASE_ARG_TYPE>
class CMorphArray
{
public:
int GetSize() { return m_BaseData.GetSize(); }
BOOL IsEmpty() { return m_BaseData.IsEmpty(); }
// Accessing elements
const TYPE& GetAt(int nIndex) const;
TYPE& GetAt(int nIndex);
void SetAt(int nIndex, ARG_TYPE newElement);
const TYPE& ElementAt(int nIndex) const;
TYPE& ElementAt(int nIndex);
// Potentially growing the array
int Add(ARG_TYPE newElement);
// overloaded operator helpers
const TYPE& operator[](int nIndex) const;
TYPE& operator[](int nIndex);
CBigArray<BASE_TYPE, BASE_ARG_TYPE> m_BaseData;
private:
CBigArray<TYPE, ARG_TYPE> m_RefCache;
CBigArray<int, int&> m_RefIndex;
CBigArray<int, int&> m_CacheIndex;
virtual void Convert(BASE_TYPE,ARG_TYPE) = 0;
virtual void Convert(TYPE,BASE_ARG_TYPE) = 0;
void InitCache();
TYPE& GetCachedElement(int nIndex);
};
The main data storage is in m_BaseData, which holds the data in its native format, which can vary in type as discussed. m_RefCache is a secondary array that caches elements in the expected format, and the GetCachedElement function uses the virtual Convert functions to translate the data as it is moved in and out of the cache. The cache needs to be at least as big as the number of simultaneous references that can be active at any one time, but in my case it will probably benefit from being bigger, as that reduces the number of conversions required. While Alsk's cursor implementation probably would have worked well, the solution given requires fewer object copies and temporary variables, and ought to afford slightly better performance, which is important in this case.
Apologies to all you STL fans for the older MFC look and feel; the rest of the project is MFC so it makes more sense in this case. The CBigArray was the result of a related Stack Overflow question that became the basis of my large array handling. I hope to finish the implementation today and test tomorrow. If it all goes belly up on me, I'll edit this post accordingly.