I am really trying to be a better programmer, and to make more modular, organized code.
As an exercise, I was trying to make a very simple Graph class in C++ with STL. In the code below, my Node object does not compile because the commented line results in a reference to a reference in STL.
#include <set>
class KeyComparable
{
public:
int key;
};
bool operator <(const KeyComparable & lhs, const KeyComparable & rhs)
{
return lhs.key < rhs.key;
}
class Node : public KeyComparable
{
public:
// the following line prevents compilation
// std::set<Node &> adjacent;
};
I would like to store the edges in a set (by key) because it allows fast removal of edges by key. If I were to store list<Node*>, that would work fine, but it wouldn't allow fast deletion by key.
If I use std::set<Node>, changes made through an edge will only change the local copy (not actually the adjacent Node). If I use std::set<Node*>, I don't believe the < operator will work because it will operate on the pointers themselves, and not the memory they index.
I considered wrapping references or pointers in another class, possibly my KeyComparable class (according to the linked page, this is how boost handles it).
Alternatively, I could store a std::list<Node*> and a std::map<int, iterator> of locations in the std::list. I'm not sure if the iterators will stay valid as I change the list.
Ages ago, everything here would just be pointers and I'd handle all the data structures manually. But I'd really like to stop programming C-style in every language I use, and actually become a good programmer.
What do you think is the best way to handle this problem? Thanks a lot.
As you have deduced, you can't store references in STL containers because one of the requirements on stored items is that they be assignable. It's the same reason why you can't store arrays in STL containers. You also can't overload operators unless at least one operand is a user-defined type, which makes it appear that you can't do custom comparisons if you store pointers in an STL container...
However, you can still use std::set with pointers if you give set a custom comparer functor:
struct NodePtrCompare {
bool operator()(const Node* left, const Node* right) const {
return left->key < right->key;
}
};
std::set<Node*, NodePtrCompare> adjacent;
And you still get fast removals by key like you want.
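To make the "removal by key" part concrete, here is a minimal sketch. It assumes C++14 or later and uses a transparent comparator (the `is_transparent` member type) so that `find` accepts a bare int key; `NodePtrKeyCompare`, `AdjacencySet`, and `remove_edge` are names made up for this illustration, and `Node` is simplified to a plain struct.

```cpp
#include <cassert>
#include <set>

// Simplified stand-in for the question's Node (assumption: key is a plain int).
struct Node {
    int key;
    int payload;
};

// Hypothetical comparator: orders Node* by key and can also compare against an int.
struct NodePtrKeyCompare {
    using is_transparent = void;  // enables heterogeneous find() since C++14
    bool operator()(const Node* l, const Node* r) const { return l->key < r->key; }
    bool operator()(const Node* l, int r) const { return l->key < r; }
    bool operator()(int l, const Node* r) const { return l < r->key; }
};

using AdjacencySet = std::set<Node*, NodePtrKeyCompare>;

// Removes the edge to the node with the given key in O(log N).
// Returns true if such an edge existed.
bool remove_edge(AdjacencySet& adjacent, int key) {
    auto it = adjacent.find(key);
    if (it == adjacent.end()) return false;
    adjacent.erase(it);
    return true;
}
```

Changes made through the stored pointer reach the real node, which was the concern with std::set<Node>. The one thing to avoid is changing `key` while the node is in a set, since that would silently break the ordering.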
Related
We can define a LinkedListNode as below:
template <typename T>
struct LinkedListNode {
T val;
LinkedListNode* next;
LinkedListNode() : val{}, next(nullptr) {}
LinkedListNode(T x) : val{ x }, next(nullptr) {}
LinkedListNode(T x, LinkedListNode* next) : val{ x }, next(next) {}
};
If we want to define a function that takes a "Linked List", we have two options. First, we could pass a LinkedListNode* to the function.
template <typename T>
int func(LinkedListNode<T>* node);
Second, we could define a LinkedList class that holds a pointer to the "head" node. Then we could define a function that takes a LinkedList.
template <typename T>
struct LinkedList {
LinkedListNode<T>* head;
// other member functions
};
template <typename T>
int func(LinkedList<T>& llist);
One reason the second appears preferable is that it might allow better encapsulation of functions that modify a "Linked List". For example, a FindMax that takes a LinkedListNode* might better fit as a member function of LinkedList than as a member function of LinkedListNode.
What concrete reasons are there to prefer one over the other? I'm especially interested in reasons you might prefer to just use LinkedListNode*s.
I think before you even choose to use a singly linked list, you should have some reason to use it over plain std::vector. You need actual benchmarks that show that a singly linked list would improve performance in the particular application you have in mind; you'd be surprised how often it makes things worse, not better. Hint: theoretical computational complexity is orthogonal to memory access patterns, and on modern CPUs the memory access patterns determine performance - most computation is essentially free, in that it takes no extra time: it gets hidden under all the cache misses.
Then you should have a reason not to use std::forward_list. But maybe you need intrusive linked lists: then make a case for not using boost::intrusive::slist<T> or a similar existing and well tested library type.
If you're still going forward with your own implementation, then the very first step would be to use std::unique_ptr as the owning pointer for child nodes, instead of manual memory management - that way it'll be very easy to show that no memory is being leaked - the code becomes correct by construction and memory leaks require extra effort vs. happening by omission.
In other words: don't reinvent the wheel unless you have a well articulated reason for that. Of course, you can implement linked lists all you want as an exercise, but be aware that you're most likely implementing a container that you'll make the least use of - so I'd argue that you'd learn a lot more about how C++ works by implementing e.g. a vector/array container.
If you do use std::unique_ptr, or even manual memory management, you're likely to run into the destructor stack explosion pitfall. Consider
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = nullptr;
~LinkedListNode2() { delete next; }
};
In both cases, the destructor gets invoked recursively, and if the list is sufficiently long, you'll run out of stack. Recursion is also usually less efficient than a loop. To prevent that, you must never deallocate a node whose next is non-null.
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
assert(!next);
}
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = {};
~LinkedListNode2() {
using std::swap;
LinkedListNode2* node = {};
swap(node, next);
while (node) {
LinkedListNode2* tmp = {};
swap(tmp, node);
assert(!node);
swap(node, tmp->next);
assert(!tmp->next);
delete tmp;
}
assert(!next);
}
};
Smart pointers make the code much simpler. I wrote the raw pointer version with swaps to make it easier to show that no memory is leaking: a swap used correctly never "loses" a value.
For example, a FindMax that takes a LinkedListNode*
That's again reinventing the wheel. In C++, the idiom for "finding a maximum element" is std::max_element from #include <algorithm>. You should leverage the algorithms that the standard library provides (and any others you may need, e.g. from Boost or header-only libraries).
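As a small illustration of that idiom, here is a sketch that runs std::max_element over a std::forward_list. The helper name `max_or` and its fallback parameter are assumptions made for this example, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>

// Returns the largest value in the list, or `fallback` if the list is empty.
// std::max_element only needs forward iterators, which forward_list provides.
int max_or(const std::forward_list<int>& values, int fallback) {
    auto it = std::max_element(values.begin(), values.end());
    return it == values.end() ? fallback : *it;
}
```

The empty-list check matters: std::max_element returns the end iterator for an empty range, and dereferencing that would be undefined behaviour.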
To do that, you need an iterator for the list. It will be, by necessity, a LegacyForwardIterator. Here, "is a" has a strict technical meaning: it's a concise way of saying "your iterator will fulfill the concept of and abide by the contract of LegacyForwardIterator".
Such an iterator would look very roughly as follows:
template <typename T>
class LinkedListNode1 {
std::unique_ptr<LinkedListNode1> next;
template <typename V> class iterator_impl {
LinkedListNode1 *node = {};
using const_value_type = std::add_const_t<V>;
using non_const_value_type = std::remove_const_t<V>;
public:
using value_type = V;
using reference = V&;
using pointer = V*;
iterator_impl() = default;
template <typename VO>
iterator_impl(const iterator_impl<VO> &o) : node(o.operator->()) {}
explicit iterator_impl(LinkedListNode1 *node) : node(node) {}
auto *operator->() const { return node; }
pointer operator&() const { return &(node->val); }
reference operator*() const { return node->val; }
iterator_impl &operator++() { node = node->next.get(); return *this; }
iterator_impl operator++(int) {
auto retval = *this;
this->operator++();
return retval;
}
bool operator==(const iterator_impl &o) const { return node == o.node; }
bool operator!=(const iterator_impl &o) const { return node != o.node; }
};
public:
T val;
using iterator = iterator_impl<T>;
using const_iterator = iterator_impl<const T>;
The next pointer can be made private. Then, the basic functionality would include:
LinkedListNode1() = default;
LinkedListNode1(const T &val) : val(val) {}
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
}
iterator begin() { return iterator(this); }
iterator end() { return {}; }
const_iterator begin() const { return const_iterator(this); }
const_iterator end() const { return {}; }
const_iterator cbegin() const { return const_iterator(this); }
const_iterator cend() const { return {}; }
iterator insert_after(const_iterator pos, const T& value) {
auto node = std::make_unique<LinkedListNode1>();
node->val = value;
node->next = std::move(pos->next); // keep whatever followed pos
auto retval = iterator(node.get());
pos->next = std::move(node);
return retval;
}
One would use insert_after to extend the list. Other such methods would need to be added, of course.
Then, we'd probably also want to support initializer lists:
LinkedListNode1(std::initializer_list<T> init) {
auto src = init.begin();
if (src == init.end()) return;
val = *src++;
for (auto dst = iterator(this); src != init.end(); ++src)
dst = insert_after(dst, *src);
}
};
Now you can pre-populate the list with an initializer list, iterate it using range-for, and use it with standard algorithms:
#include <algorithm>
#include <cassert>
#include <iostream>
int main() {
LinkedListNode1<int> list{1, 3, 2};
for (auto const &val : list)
std::cout << val << '\n';
assert(*std::max_element(list.begin(), list.end()) == 3);
}
But now we come to the most important question:
What concrete reasons are there to prefer one over the other
The default - the starting point - is to provide a container, since that's the abstraction we deal with: the "thing" that you think of is a linked list, not a list node. The data structure you learn of is, again, a linked list. And for a good reason: The node type is an implementation detail, so you'd need to come up with application-specific reasons for exposing the node type, and any argument made must stand up to the scrutiny when faced with iterators. Do you really need to expose those nodes, or is what you actually want just a convenient way to iterate over the items stored in the collection, perhaps split the list, etc? Node access is not necessary for any of it. It's all a solved problem, as you'll learn by reading the documentation of std::forward_list.
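For instance, splitting a list needs no node access at all with std::forward_list. The following is only a sketch; the function name `split_after` and its parameters are assumptions for this example, built on the standard `before_begin` and `splice_after` members.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>

// Keeps the first `n` elements in `src` and returns the rest as a new list.
// No element is copied: splice_after only relinks nodes.
std::forward_list<int> split_after(std::forward_list<int>& src, std::size_t n) {
    std::forward_list<int> tail;
    auto pos = src.before_begin();
    // Advance to the last element that stays in `src` (or stop at the end).
    for (std::size_t i = 0; i < n && std::next(pos) != src.end(); ++i) ++pos;
    // Moves the open range (pos, end) from `src` to the front of `tail`.
    tail.splice_after(tail.before_begin(), src, pos, src.end());
    return tail;
}
```

If `n` is at least the length of the list, nothing is moved and the returned list is empty.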
You'd also want to consider allocator support. I'd not worry about the C++98 allocators, but the polymorphic allocators are (finally!) actually usable, so you'd want to implement those (c.f. std::pmr::polymorphic_allocator and the std::pmr namespace in general).
For full functionality, you'd pretty much need to add most of std::forward_list's methods and constructors. So it's a bit of work, and there are lots of details to make it work well no matter the value type. And thus we come full circle: real containers that are meant to be useful without worrying about low-level details are lots of work, but they are a joy to use - and they look nothing like most textbook "teaching" code.
A linked list is often used when teaching data structures - true. Yet most C++ books used in teaching are woefully inadequate in demonstrating what a modern, fully functional data structure/container entails - they can't even get that right for something as "simple" as a singly linked list.
The gap between a C-like singly linked list - exactly what you started with in the question - and a singly linked list C++ container is on the order of a couple thousand lines of code and tests. That's what they don't usually teach, and that's where the most important bits really are: they are the difference between toy code, and production code.
Even without tests, a fully functional singly linked list container is ~500 lines without polymorphic allocator support, and probably at least double that with such support, and tests would double the code size several times - although if you were clever about it, you could reuse a lot of the tests used by various STL implementations :)
And, by the way: a decent implementation of a linked list in C won't force you to manually deal with nodes either. The list itself - the container - will be an abstract data type with a bunch of functions that provide the functionality, and with some abstraction for iterators as well (even though they'll be just pointers in some type-safe disguise). This is again the difference between teaching code and easy-to-use-correctly and hard-to-use-incorrectly production code. One example I can think of right now is the stretchy buffers, as implemented in the Bitwise ion project. This is a link to a video where those are implemented live, and they serve as a decent example of how abstractions work in C (and also how you definitely shouldn't be writing this in C++ - C and C++ are different languages!).
Defining an actual LinkedList type allows you to directly support operations that would be relatively difficult to support by just passing around a pointer to a node.
One comment has already mentioned storing the size of the linked list as a member, so you can have a function return the size of the linked list in constant time. That is a useful thing to do, but I think it only hints at the real point, which (in my opinion) is having things that apply to the linked list as a whole, rather than just operations on individual nodes.
In C++, one obvious possibility here is having a destructor that properly destroys a complete linked list when it goes out of scope.
void foo() {
LinkedList a;
// code that uses `a`
} // <-- here `a` goes out of scope, and should be destroyed
One of the big features of C++ as a whole is deterministic destruction, and its support for that is based on destructors that run when objects go out of scope.
With a linked list, you'd (at least normally) plan on all the nodes in the linked list being allocated dynamically. If you just use a pointer to node, it'll be up to you to keep track of when you no longer need/want a particular linked list, and manually destroy all the nodes in the linked list when it's no longer needed.
By creating a linked-list class, you get the benefit of deterministic destruction, so you no longer need to keep track of when a list is no longer needed--the compiler tracks that for you, and when it goes out of scope, it gets destroyed automatically.
I'd also expect a linked list to support copy construction, move construction, copy assignment, and move assignment--and probably also a few things like comparison (at least for in/equality, and possibly ordering). And all of these require a fair amount of manual intervention if you decide to implement your linked list as a pointer to a node, instead of having an actual linked list class.
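To give a feel for what that involves, here is a deliberately small sketch of a list class with a destructor, copy and move construction, assignment, and equality. The class name `IntList` and its members are invented for this example, it stores only ints, and it leaves out iterators and most of what a real container needs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal singly linked list of ints (illustration only).
class IntList {
    struct Node {
        int val;
        std::unique_ptr<Node> next;
    };
    std::unique_ptr<Node> head_;
    std::size_t size_ = 0;

public:
    IntList() = default;
    IntList(const IntList& other) {
        // Copy node by node, appending at the tail to preserve order.
        std::unique_ptr<Node>* tail = &head_;
        for (const Node* n = other.head_.get(); n; n = n->next.get()) {
            *tail = std::make_unique<Node>(Node{n->val, nullptr});
            tail = &(*tail)->next;
        }
        size_ = other.size_;
    }
    IntList(IntList&& other) noexcept
        : head_(std::move(other.head_)), size_(std::exchange(other.size_, 0)) {}
    // Copy-and-swap: the by-value parameter handles both copy and move assignment.
    IntList& operator=(IntList other) {
        swap(other);
        return *this;
    }
    ~IntList() { clear(); }

    void swap(IntList& other) noexcept {
        std::swap(head_, other.head_);
        std::swap(size_, other.size_);
    }
    void clear() noexcept {
        // Unlink one node at a time so destruction never recurses down the chain.
        while (head_) head_ = std::move(head_->next);
        size_ = 0;
    }
    void push_front(int v) {
        head_ = std::make_unique<Node>(Node{v, std::move(head_)});
        ++size_;
    }
    std::size_t size() const { return size_; }

    friend bool operator==(const IntList& a, const IntList& b) {
        const Node* x = a.head_.get();
        const Node* y = b.head_.get();
        for (; x && y; x = x->next.get(), y = y->next.get())
            if (x->val != y->val) return false;
        return !x && !y;  // equal only if both ran out together
    }
};
```

Every one of these members operates on the list as a whole, which is exactly what a bare node pointer cannot express.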
As such, I'd say if you really want to use C++ (even close to how it's intended to work) creating a class to encapsulate your linked list is an absolute necessity. As long as you're just passing around pointers to nodes, what you're writing is fundamentally C (even if it may use some features specific to C++ so a C compiler won't accept it).
I have an object, which can be identified by its name, and I want to place it in one of the STL containers.
class MyClass {
public:
//getters and setters, other functions
private:
std::string name;
//other member variables
};
So at first I thought that the usage of map-like structures is irrelevant in my case, because in those structures the identifier (the key) is separated from the class itself. Using a map I must return the name variable and copy it "outside" the class (waste of memory and illogical, breaking OOP rules).
My next shot was the usage of set-like structures. In this case I only have the key field, where I load my whole object. Using this method I must overload my <, > and == operators in order to use an object as key. I can even make a functor for hashing if I use unordered_set, it works just fine. The problem here is that I cannot use the container functions as I would with a map. This is valid mapInstance.find("example"), and this is not setInstance.find("example"). I must create an object with the member variable name set to "example" and pass it to the find() function. The problem with this solution is that the other member variables in my class are duplicated and not used. I even tried overloading the <, > and == operators for std::string and MyClass classes, which works fine if I use them like this stringInstance < MyClassInstance, but the container functions are unusable (I even tried to overload the functor to work with a string with no success).
Can you suggest a simple way (or any way) to solve this problem with std::set or std::map (or maybe others)?
In std::map the key cannot be a reference (as far as I know), and I don't know how to resolve it with std::set.
Note: The problem with storing a pointer in a map's key field is that if we change our mind and use unordered_map instead of map, the hash will be calculated based on the pointer, not based on the string (the hash function can be overridden, but it seems very complicated for a simple task).
Thanks for your help!
You should ask yourself what your requirements for your container are.
Things to consider are:
How many objects will usually be in the container?
What are the memory constraints?
How often will objects be searched for, and what complexity is acceptable?
A std::map has some requirements which might conflict with your class. E.g. the key is not allowed to be changed once an element is added to the map. However, your class might change the name at any time. From this consideration it should become clear that a std::map cannot work with a reference to a string as the key.
In the simplest case, you might consider using a std::list and std::find_if with a predicate to check for a special name. This would have O(n) complexity.
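A minimal sketch of that simplest case follows. The accessor `name()` and the helper `find_by_name` are assumptions for this example, since the question only shows a private `name` member.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <utility>

class MyClass {
public:
    explicit MyClass(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }  // assumed getter
private:
    std::string name_;
};

// Linear search by name; returns nullptr if no element matches.
const MyClass* find_by_name(const std::list<MyClass>& items, const std::string& name) {
    auto it = std::find_if(items.begin(), items.end(),
                           [&](const MyClass& m) { return m.name() == name; });
    return it == items.end() ? nullptr : &*it;
}
```

No temporary MyClass is built for the lookup, and the name lives only inside the object, at the cost of an O(n) scan.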
Sometimes you have to build upon what's given to get the desired results. I'd suggest that what you're looking for is a specialized adapter for your use case.
Here's a very basic implementation that you can build upon. In many cases, the cache locality you get from using std::vector will give better performance than using a standard container that provides a similar feature set.
Given a simple type and some helper operators:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
struct Obj
{
int key;
std::string name;
};
bool operator<(const Obj& lhs, const Obj& rhs)
{
return lhs.key < rhs.key;
}
bool operator==(const Obj& lhs, int rhs)
{
return lhs.key == rhs;
}
bool operator<(const Obj& lhs, int rhs)
{
return lhs.key < rhs;
}
We can devise a simple flat_map class that provides the basic lookup complexity of a map, but will meet your requirement of finding objects by key without constructing a value type (like you need to with a set). Insertion is more expensive than in a map, but lookups are of similar complexity. If lookups occur much more frequently than insertions (which is the case most of the time), this can work well.
class flat_map
{
public:
using container_type = std::vector<Obj>;
// insert object into the map, keeping the vector sorted
// O(N): elements after the insertion point are shifted
void insert(Obj&& obj)
{
auto pos = std::upper_bound(container_.begin(), container_.end(), obj);
container_.insert(pos, std::move(obj));
}
// find with O(log N) complexity
container_type::iterator find(int key)
{
auto it = std::lower_bound(container_.begin(), container_.end(), key);
if(it != container_.end() && *it == key)
return it;
return container_.end();
}
private:
container_type container_;
};
Example usage:
int main()
{
flat_map obj;
obj.insert({1, "one"});
obj.insert({2, "two"});
obj.insert({3, "three"});
auto it = obj.find(2);
std::cout << it->key << ' ' << it->name << '\n';
}
You can extend the flat_map adapter in any way you see fit: adding the required overloads to meet your requirements, templatizing the parameters (allocator, comparisons, storage types, etc.).
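Applied to the original question, the same idea can be keyed on the name stored inside the object. This is only a sketch under assumptions: `Named`, `NameIndex`, and the insert-or-overwrite behaviour are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Named {
    std::string name;  // the key lives inside the object
    int payload;
};

// Hypothetical sorted-vector index that looks objects up by their own name.
class NameIndex {
    std::vector<Named> items_;  // kept sorted by name
    static bool less_name(const Named& n, const std::string& key) { return n.name < key; }

public:
    // Inserts, or overwrites an element with the same name. O(N) due to shifting.
    void insert(Named n) {
        auto it = std::lower_bound(items_.begin(), items_.end(), n.name, less_name);
        if (it != items_.end() && it->name == n.name)
            *it = std::move(n);
        else
            items_.insert(it, std::move(n));
    }
    // O(log N) lookup by string, without constructing a Named.
    const Named* find(const std::string& name) const {
        auto it = std::lower_bound(items_.begin(), items_.end(), name, less_name);
        return (it != items_.end() && it->name == name) ? &*it : nullptr;
    }
    std::size_t size() const { return items_.size(); }
};
```

Pointers returned by `find` are invalidated by a later `insert`, which is the usual trade-off of vector-backed storage.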
For the sake of presenting my question, let's assume I have a set of pointers (same type)
{p1, p2, ..., pn}
I would like to store them in multiple containers as I need different access strategies to access them. Suppose I want to store them in two containers, a linked list and a hash table. The linked list gives me the order and the hash table gives me fast access. Now, the problem is that if I remove a pointer from one container, I need to remember to remove it from the other container. This makes the code hard to maintain. So the question is: are there other patterns or data structures to manage a situation like this? Would smart pointers help here?
If I understand correctly, you want to link your containers so that removing from one removes from all. I don't think this is directly possible. Possible solutions:
Redesign the whole object architecture, so that a pointer is not in many containers.
Use the Boost Multi-index Containers Library to achieve all the features you want in one container.
Use a map key instead of a direct pointer to track objects, and keep the pointer itself in one map.
Use std::weak_ptr so you can check whether an item has been deleted somewhere else, and turn it into a std::shared_ptr while it is used (you need one container to hold the "master" std::shared_ptr to keep the object around when not in use).
Create a function/method/class to delete objects, which knows all the containers, so you don't accidentally forget one when all deletion is in one place.
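The std::weak_ptr option from the list above could look roughly like this. It is a sketch, not a recommendation: `Item`, `Registry`, and the choice of the hash table as the owning ("master") container are assumptions for the example.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>

struct Item {
    int value;
};

struct Registry {
    std::unordered_map<std::string, std::shared_ptr<Item>> owner;  // "master" container
    std::list<std::weak_ptr<Item>> order;                          // non-owning, ordered view

    void add(const std::string& id, int value) {
        auto p = std::make_shared<Item>(Item{value});
        owner[id] = p;
        order.push_back(p);
    }
    // Removing from the owning container is enough; the list notices lazily.
    void remove(const std::string& id) { owner.erase(id); }

    // Sums live items in insertion order and prunes entries whose object is gone.
    int sum_in_order() {
        int sum = 0;
        for (auto it = order.begin(); it != order.end();) {
            if (auto p = it->lock()) {
                sum += p->value;
                ++it;
            } else {
                it = order.erase(it);
            }
        }
        return sum;
    }
};
```

The list never holds a dangling pointer, but stale entries linger until the next traversal, so this fits best when the ordered view is walked regularly.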
Why don't you create your own class that contains both a std::list and a std::unordered_map and provides access and removal functions? You could then access the elements linearly through the list and randomly through the unordered_map; deletion would remove from both containers and insertion would insert into both (a kind of wrapper class :P).
You can also consider using std::map with a comparison function that keeps your data ordered in the desired way; you can then also access elements randomly in O(log N) time.
As usual, try to isolate this logic to make things easier to support: a small class with a safe public interface (sorry, I didn't compile this, it is just pseudocode).
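#include <list>
#include <map>
#include <utility>
template<class Id, class Ptr>
class Store
{
public:
    void add(Id id, Ptr ptr)
    {
        m_ptrs.push_back(ptr);
        m_ptrById.insert(std::make_pair(id, ptr));
    }
    void remove(Ptr ptr)
    {
        // remove from both containers so they stay in sync
        m_ptrs.remove(ptr);
        for (auto it = m_ptrById.begin(); it != m_ptrById.end(); )
        {
            if (it->second == ptr) it = m_ptrById.erase(it);
            else ++it;
        }
    }
private:
    std::list<Ptr> m_ptrs;
    std::map<Id, Ptr> m_ptrById;
};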
Then use Store for keeping your pointers in sync.
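If removal needs to be fast, one possible variation is to keep list iterators in the hash table, since std::list iterators stay valid until their own element is erased. The sketch below assumes removal by id rather than by pointer; `IndexedStore` and its member names are invented for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>

template <class Id, class Ptr>
class IndexedStore {
public:
    // Appends ptr to the ordered list and indexes it; false if id already exists.
    bool add(const Id& id, Ptr ptr) {
        if (index_.count(id)) return false;
        order_.push_back(ptr);
        index_.emplace(id, std::prev(order_.end()));
        return true;
    }
    // Average O(1): one hash lookup plus a list erase through the saved iterator.
    bool remove(const Id& id) {
        auto it = index_.find(id);
        if (it == index_.end()) return false;
        order_.erase(it->second);
        index_.erase(it);
        return true;
    }
    // Returns a value-initialized Ptr (nullptr for raw pointers) if id is unknown.
    Ptr find(const Id& id) const {
        auto it = index_.find(id);
        return it == index_.end() ? Ptr{} : *it->second;
    }
    const std::list<Ptr>& in_order() const { return order_; }

private:
    std::list<Ptr> order_;
    std::unordered_map<Id, typename std::list<Ptr>::iterator> index_;
};
```

Because both containers are private and every mutation goes through `add` and `remove`, they cannot drift out of sync.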
If I understand your problem correctly, you are less concerned with memory management (the new/delete issue) and more concerned with the actual "book keeping" of which element is valid or not.
So, I was thinking of wrapping each point with a "reference counter"
template< class Point >
class BookKeeping {
public:
    enum { LIST_REF = 0x01,
           HASH_REF = 0x02 };
    BookKeeping( const Point& p ): m_p(p), m_refCount( 0x3 ) {} // assume object created in both containers
    bool isValid() const { return m_refCount == 0x3; } // not "freed" from any container
    void remove( unsigned int from ) { m_refCount &= ~from; } // clear the bit of the container it left
private:
    Point m_p;
    unsigned int m_refCount;
};
See the answer (the only one, by now) to this similar question. In that case a deque is proposed instead of a list, since the OP only wanted to insert/remove at the ends of the sequence.
Anyway, you might prefer to use the Boost Multi-index Containers Library.
I have a large class representing a graph. This class contains several containers (vector and set) of complex types.
During search I need to modify the graph to avoid loops in the results.
Since I have to run many searches, I need to restore the class to its original state very often.
Currently I am simply assigning the saved containers to the modified ones:
void Graph::restore(){
mEdges=mSafeEdges; //std::vector<Edge> Edge has no heap based data
mNodes=mSafeNodes; //std::vector<GraphNode> A Graph Node contains std::set<int>
}
As I said, edges and nodes are complex, with each node containing e.g. a set.
Each pair of containers has equal size. Profiling my code showed that the simple restore function is the major bottleneck of the program, taking approx. 6 ms at each run. The edge vector takes 1.5 ms to copy and the nodes 4.5 ms. Is there a better, faster way to copy containers of complex types, or at least to copy the edge vector?
If you only modify some of the objects on each search then you could see whether copy-on-write helps.
#include <memory>
template<typename T>
class Cow {
    std::shared_ptr<T> owned;
    const T* non_owned;
public:
    explicit Cow(const T& n) : non_owned(&n) { }
    const T& get() const { return *non_owned; }
    T& copy() {
        if (!owned) {
            owned = std::make_shared<T>(*non_owned);
            non_owned = owned.get();
        }
        return *owned;
    }
};
Then replace mEdges and mNodes with containers of Cow<Edge> and Cow<GraphNode> (or only do it for nodes, since that's the more expensive type to copy).
You'd have to modify the search logic to work with the wrapper type (or give it a conversion operator to const T&) and then explicitly add a call to copy() when you want a modifiable object, but you would avoid making copies of objects unless necessary.
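To show how restoring could become cheap with this approach, here is a self-contained variant of the wrapper with an added `restore()` that simply drops the private copy. `CowRef`, `mutate()`, `restore()`, and `modified()` are names assumed for this sketch, and std::unique_ptr is swapped in for the shared pointer, which makes the wrapper move-only.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <vector>

template <typename T>
class CowRef {
    const T* original_;        // pristine object, owned elsewhere and never modified
    std::unique_ptr<T> copy_;  // private copy, created on first write
public:
    explicit CowRef(const T& original) : original_(&original) {}
    const T& get() const { return copy_ ? *copy_ : *original_; }
    T& mutate() {
        if (!copy_) copy_ = std::make_unique<T>(*original_);
        return *copy_;
    }
    void restore() { copy_.reset(); }  // back to the pristine object, no copying
    bool modified() const { return copy_ != nullptr; }
};
```

With this shape, restoring costs one `reset()` per wrapper, and only the objects a search actually touched are ever copied. The pristine container must outlive the wrappers and must not be resized while they exist.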
So I have some legacy code which I would love to update to use more modern techniques. But I fear that given the way that things are designed, it is a non-option. The core issue is that often a node is in more than one list at a time. Something like this:
struct T {
T *next_1;
T *prev_1;
T *next_2;
T *prev_2;
int value;
};
This allows the core to have a single object of type T allocated and inserted into 2 doubly linked lists, nice and efficient.
Obviously I could just have 2 std::list<T*>'s and just insert the object into both...but there is one thing which would be way less efficient...removal.
Often the code needs to "destroy" an object of type T and this includes removing the element from all lists. This is nice because given a T* the code can remove that object from all lists it exists in. With something like a std::list I would need to search for the object to get an iterator, then remove that (I can't just pass around an iterator because it is in several lists).
Is there a nice c++-ish solution to this, or is the manually rolled way the best way? I have a feeling the manually rolled way is the answer, but I figured I'd ask.
As another possible solution, look at Boost Intrusive, which has an alternative list class with a lot of properties that may make it useful for your problem.
In this case, I think it'd look something like this:
#include <boost/intrusive/list.hpp>
using namespace boost::intrusive;
struct tag1; struct tag2;
typedef list_base_hook< tag<tag1> > base1;
typedef list_base_hook< tag<tag2> > base2;
class T: public base1, public base2
{
    int value;
};
list<T, base_hook<base1> > list1;
list<T, base_hook<base2> > list2;
// constant time to get the iterator of a T item (item is a T& already in both lists):
auto where_in_list1 = list1.iterator_to(item);
auto where_in_list2 = list2.iterator_to(item);
// once you have the iterators, you can remove in constant time, etc.
Instead of managing your own next/previous pointers, you could indeed use a std::list. To solve the removal performance problem, you could store in the object an iterator to itself (one member for each std::list the element can be stored in).
You can extend this to store a vector or array of iterators in the class (in case you don't know the number of lists the element is stored in).
I think the proper answer depends on how performance-critical this application is. Is it in an inner loop that could potentially cost the program a user-perceivable runtime difference?
There is a way to create this sort of functionality by creating your own classes derived from some of the STL containers, but it might not even be worth it to you. At the risk of sounding tiresome, I think this might be an example of premature optimization.
The question to answer is why this C struct exists in the first place. You can't re-implement the functionality in C++ until you know what that functionality is. Some questions to help you answer that are,
Why lists? Does the data need to be in sequence, i.e., in order? Does the order mean something? Does the application require ordered traversal?
Why two containers? Does membership in the container indicate some kind of property of the element?
Why a double-linked list specifically? Is O(1) insertion and deletion important? Is reverse-iteration important?
The answer to some or all of these may be, "no real reason, that's just how they implemented it". If so, you can replace that intrusive C-pointer mess with a non-intrusive C++ container solution, possibly containing shared_ptrs rather than ptrs.
What I'm getting at is, you may not need to re-implement anything. You may be able to discard the entire business, and store the values in proper C++ containers.
How's this?
struct T {
std::list<T*>::iterator entry1, entry2;
int value;
};
std::list<T*> list1, list2;
// init a T* item:
T* item = new T;
item->entry1 = list1.end();
item->entry2 = list2.end();
// add a T* item to list 1:
item->entry1 = list1.insert(<where>, item);
// remove a T* item from list1
if (item->entry1 != list1.end()) {
list1.erase(item->entry1); // this is O(1)
item->entry1 = list1.end();
}
// code for list2 management is similar
You could make T a class and use constructors and member functions to do most of this for you. If you have a variable number of lists, you can use a std::vector<std::list<T*>::iterator> to track the item's position in each list.
Note that if you use push_back or push_front to add to the list, you need to do item->entry1 = list1.end(); item->entry1--; or item->entry1 = list1.begin(); respectively to get the iterator pointed in the right place.
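Wrapped into a class, the bookkeeping above might look like the following sketch. The class name `Item`, the `link`/`unlink` members, and the fixed pair of lists are assumptions for this example; it uses std::prev to get the iterator after push_back, as described in the note.

```cpp
#include <cassert>
#include <iterator>
#include <list>

class Item {
public:
    explicit Item(int v) : value(v) {}
    ~Item() { unlink(); }  // destroying an item removes it from every list
    Item(const Item&) = delete;
    Item& operator=(const Item&) = delete;

    // Appends this item to both lists and remembers where it landed.
    void link(std::list<Item*>& l1, std::list<Item*>& l2) {
        unlink();
        l1.push_back(this);
        pos1_ = std::prev(l1.end());
        list1_ = &l1;
        l2.push_back(this);
        pos2_ = std::prev(l2.end());
        list2_ = &l2;
    }
    // O(1) per list: erase through the stored iterators, no searching.
    void unlink() {
        if (list1_) { list1_->erase(pos1_); list1_ = nullptr; }
        if (list2_) { list2_->erase(pos2_); list2_ = nullptr; }
    }

    int value;

private:
    std::list<Item*>* list1_ = nullptr;
    std::list<Item*>* list2_ = nullptr;
    std::list<Item*>::iterator pos1_, pos2_;
};
```

The lists must outlive the items linked into them. Copying is disabled because a copy would carry iterators that refer to the original's list entries.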
It sounds like you're talking about something that could be addressed by applying graph theory. As such the Boost Graph Library might offer some solutions.
list::remove is what you're after. It'll remove any and all objects in the list with the same value as what you passed into it.
So:
list<T> listOne, listTwo;
// Things get added to the lists.
T thingToRemove;
listOne.remove(thingToRemove);
listTwo.remove(thingToRemove);
I'd also suggest converting your list node into a class; that way C++ will take care of memory for you.
class MyClass {
public:
int value;
// Any other values associated with T
};
list<MyClass> listOne, listTwo; // can add and remove MyClass objects w/o worrying about destroying anything.
You might even encapsulate the two lists into their own class, with add/remove methods for them. Then you only have to call one method when you want to remove an object.
class TwoLists {
private:
list<MyClass> listOne, listTwo;
// ...
public:
void remove(const MyClass& thing) {
listOne.remove(thing);
listTwo.remove(thing);
}
};
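One detail the snippets above leave out is that std::list::remove compares elements with operator==, so the element type has to provide it. Here is a compilable sketch; `Thing`, `TwoListsSketch`, and the size accessors are names assumed for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Thing {
    int value;
};

// list::remove needs this to decide which elements match.
inline bool operator==(const Thing& a, const Thing& b) { return a.value == b.value; }

class TwoListsSketch {
    std::list<Thing> one_, two_;

public:
    void add(const Thing& t) {
        one_.push_back(t);
        two_.push_back(t);
    }
    // Removes every element equal to t from both lists; O(n) per list.
    void remove(const Thing& t) {
        one_.remove(t);
        two_.remove(t);
    }
    std::size_t size_one() const { return one_.size(); }
    std::size_t size_two() const { return two_.size(); }
};
```

Note that this stores copies and removes by value with a linear scan, so it does not give the O(1) removal of a known object that the original question asked for.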