Copy constructor for a generic doubly-linked list - c++

template <class T>
class list
{
public:
//stuff
list(const list &cctorList); //cctor
private:
struct node
{
node *next;
node *previous;
T *info;
}
node *firstNode; //pointer to the first node (NULL if none)
node *lastNode; //pointer to the last node (NULL if none)
}
I'm now trying to define list(const list &cctorList); //cctor but I'm running into trouble.
Here's what I have so far:
template <class T>
list<T>::list(const list &cctorList)
{
node *another = new node;
firstNode = another;
another->previous = NULL;
another->info = new T(*(cctorList->info));
// ...
}
Is everything up to this point correct? Is there a way for me to recursively assign another->next? Also, is there an easier way to accomplish this using iterators?

You should be using std::list. In fact, you should be using std::vector, because it is faster for most practical purposes (list is only faster if the objects are really large or really expensive to construct) and you don't need random access.
new T(*(cctorList->info)); won't compile, because cctorList (list&) does not have operator-> and it does not have info member either.
The copy constructor is best implemented in terms of the other, more primitive operations like push_back and iteration. So first do those and than the copy constructor becomes:
template <class T>
list<T>::list(const list &cctorList)
{
std::copy(begin(cctorList), end(cctorList), std::back_inserter(this));
}
In fact I'd just template that constructor:
template <class T, class Collection> list(const Collection &cctorList)
(body remains the same). That works as copy constructor, but also allows copying from any other collection of any type that can be implicitly converted to T.
The actual data should be held by value. I.e. the node should be defined as
struct node
{
node *next;
node *previous;
T info;
}
you are copying the value anyway, so you don't need to do two separate allocations for node and T when one will do.
Edit: You say you want to learn concepts. But the most important concept of modern C++ is composition of algorithms. Their definitions are often trivial. Basic implementation of std::copy is just:
template <typename InputIterator, typename OutputIterator>
OutputIterator copy(InputIterator begin, InputIterator end, OutputIterator out) {
for(;begin != end; ++out, ++begin) *out = *begin;
}
Now this does not appear to allocate anything. The trick lies in the back_insertion_iterator. Insertion iterator is a trick to make this work without preallocating the sequences. It defines operator* using push_back on the underlying collection and ignores operator++. That satisfies "output iterator" concept, because it only guarantees to work when these two calls are strictly interleaved and makes algorithms work on many things from plain old arrays to output streams.
The other part is that while the trivial definitions are correct, they are not the actual definitions used in the library. The actual definitions in the library are optimized. E.g. usually std::copy will check whether the input iterators know their distance and if the output is insert operator to sequence with reserve operation and call it to avoid some allocations. Those are optimizations and depend on implementation details of the standard library.
You can go and write down the basic implementations of things from standard library and test they work the same if you want to understand them. But you should follow the way standard library defines things by building them up from simple helper bits like std::copy, std::swap, insertion iterator adapters and such. If you look in the standard library, most functions there are one-liners!
Edit2: Also with all the genericity the standard library provides, there are still bits criticized for not being generic enough. E.g. GotW #84: Monoliths "Unstrung" discusses which methods of std::string could be converted to generic algorithms.

Related

Intrinsic benefit of using a LinkedList class to point to head Node vs. just using Node*

We can define a LinkedListNode as below:
template <typename T>
struct LinkedListNode {
T val;
LinkedListNode* next;
LinkedListNode() : val{}, next(nullptr) {}
LinkedListNode(T x) : val{ x }, next(nullptr) {}
LinkedListNode(T x, LinkedListNode* next) : val{ x }, next(next) {}
};
If we want to define a function that takes a "Linked List", we have two options. First, we could pass a LinkedListNode* to the function.
template <typename T>
int func(LinkedListNode<T>* node);
Second, we could define a LinkedList class that holds a pointer to the "head" node. Then we could define a function that takes a LinkedList.
template <typename T>
struct LinkedList {
LinkedListNode<T>* head;
// other member functions
};
template <typename T>
int func(LinkedList<T>& llist);
One reason the second appears preferable because it might allow better encapsulation of functions that modify a "Linked List". For example, a FindMax that takes a LinkedListNode* might better fit as a member function of LinkedList than as a member function of LinkedListNode.
What concrete reasons are there to prefer one over the other? I'm especially interested in reasons you might prefer to just use LinkedListNode*s.
I think before you even choose to use a singly linked list, you should have some reason to use it over plain std::vector. You need actual benchmarks that show that a singly linked list would improve performance in the particular application you have in mind; you'd be surprised how often it makes things worse, not better. Hint: theoretic computational complexity is orthogonal from memory access patterns, and on modern CPUs the memory access patterns determine performance - most computation is essentially free, in that it takes no extra time: it gets hidden under all the cache misses.
Then you should have a reason not to use std::forward_list. But maybe you need intrusive linked lists: then make a case for not using boost::intrusive::slist<T> or a similar existing and well tested library type.
If you're still going forward with your own implementation, then the very first step would be to use std::unique_ptr as the owning pointer for child nodes, instead of manual memory management - that way it'll be very easy to show that no memory is being leaked - the code becomes correct by construction and memory leaks require extra effort vs. happening by omission.
In other words: don't reinvent the wheel unless you have a well articulated reason for that. Of course, you can implement linked lists all you want as an exercise, but be aware that you're most likely implementing a container that you'll make the least use of - so I'd argue that you'd learn a lot more about how C++ works by implementing e.g. a vector/array container.
If you do use std::unique_ptr, or even manual memory management, you're likely to run into the destructor stack explosion pitfall. Consider
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = nullptr;
~LinkedListNode2() { delete next; }
};
In both cases, the destructor gets invoked recursively, and if the list is sufficiently long, you'll run out of stack. Recursion is also usually less efficient than a loop. To prevent that, you must be never deallocating nodes that have non-null next.
template <typename T>
struct LinkedListNode1 {
T val;
std::unique_ptr<LinkedListNode1> next;
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
assert(!next);
}
};
template <typename T>
struct LinkedListNode2 {
T val;
LinkedListNode2* next = {};
~LinkedListNode2() {
using std::swap;
LinkedListNode2* node = {};
swap(node, next);
while (node) {
LinkedListNode2* tmp = {};
swap(tmp, node);
assert(!node);
swap(node, tmp->next);
assert(!tmp->next);
delete tmp;
}
assert(!next);
}
};
Smart pointers make the code much simpler. I wrote the raw pointer version with swaps to make it easier to show that no memory is leaking: a swap used correctly never "loses" a value.
For example, a FindMax that takes a LinkedListNode*
That's again reinventing the wheel. In C++, the idiom for "finding a maximum element" is std::max_element from #include <algorithm>. You should leverage the algorithms that the standard library provides (and any others you may need, e.g. from Boost or header-only libraries).
To do that, you need an iterator for the list. It will be, by necessity, a LegacyForwardIterator. Here, is a has a strict technical meaning: it's a concise way of saying "your iterator will fulfill the concept of and abide by the contract of LegacyForwardIterator".
Such an iterator would look very roughly as follows:
template <typename T>
class LinkedListNode1 {
std::unique_ptr<LinkedListNode1> next;
template <typename V> class iterator_impl {
LinkedListNode1 *node = {};
using const_value_type = std::add_const_t<V>;
using non_const_value_type = std::remove_const_t<V>;
public:
using value_type = V;
using reference = V&;
using pointer = V*;
iterator_impl() = default;
template <typename VO>
iterator_impl(const iterator_impl<VO> &o) : node(o.operator->()) {}
explicit iterator_impl(LinkedListNode1 *node) : node(node) {}
auto *operator->() const { return node; }
pointer operator&() const { return &(node->val); }
reference operator*() const { return node->val; }
iterator_impl &operator++() { node = node->next.get(); return *this; }
iterator_impl operator++(int) {
auto retval = *this;
this->operator++();
return retval;
}
bool operator==(const iterator_impl &o) const { return node == o.node; }
bool operator!=(const iterator_impl &o) const { return node != o.node; }
};
public:
T val;
using iterator = iterator_impl<T>;
using const_iterator = iterator_impl<const T>;
The next pointer can be made private. Then, the basic functionality would include:
LinkedListNode1() = default;
LinkedListNode1(const T &val) : val(val) {}
~LinkedListNode1() {
auto node = std::move(next);
while (node)
node = std::move(node->next);
}
iterator begin() { return iterator(this); }
iterator end() { return {}; }
const_iterator begin() const { return const_iterator(this); }
const_iterator end() const { return {}; }
const_iterator cbegin() const { return const_iterator(this); }
const_iterator cend() const { return {}; }
iterator insert_after(const_iterator pos, const T& value) {
auto next = std::make_unique<LinkedListNode1>();
next->val = value;
auto retval = iterator(next.get());
pos->next = std::move(next);
return retval;
}
One would use insert_after to extend the list. Other such methods would need to be added, of course.
Then, we'd probably also want to support initializer lists:
LinkedListNode1(std::initializer_list<T> init) {
auto src = init.begin();
if (src == init.end()) return;
val = *src++;
for (auto dst = iterator(this); src != init.end(); ++src)
dst = insert_after(dst, *src);
}
};
Now you can pre-populate the list with an initializer list, iterate it using range-for, and use it with standard algorithms:
#include <iostream>
int main() {
LinkedListNode1<int> list{1, 3, 2};
for (auto const &val : list)
std::cout << val << '\n';
assert(*std::max_element(list.begin(), list.end()) == 3);
}
But now we come to the most important question:
What concrete reasons are there to prefer one over the other
The default - the starting point - is to provide a container, since that's the abstraction we deal with: the "thing" that you think of is a linked list, not a list node. The data structure you learn of is, again, a linked list. And for a good reason: The node type is an implementation detail, so you'd need to come up with application-specific reasons for exposing the node type, and any argument made must stand up to the scrutiny when faced with iterators. Do you really need to expose those nodes, or is what you actually want just a convenient way to iterate over the items stored in the collection, perhaps split the list, etc? Node access is not necessary for any of it. It's all a solved problem, as you'll learn by reading the documentation of std::forward_list.
You'd also want to consider allocator support. I'd not worry about the C++98 allocators, but the polymorphic allocators are (finally!) actually usable, so you'd want to implement those (c.f. std::pmr::polymorphic_allocator and the std::pmr namespace in general).
For full functionality, you'd pretty much need to add most of std::forward_list's methods and constructors. So it's a bit of work, and there are lots of details to make it work well no matter the value type. And thus we come full circle: real containers that are meant to be useful without worrying about low-level details are lots of work, but they are a joy to use - and they look nothing like most textbook "teaching" code.
A linked list is often used when teaching data structures - true. Yet most C++ books used in teaching are woefully inadequate in demonstrating what a modern, fully functional data structure/container entails - they can't even get that right for something as "simple" as a singly linked list.
The gap between a C-like singly linked list - exactly what you started with in the question - and a singly linked list C++ container is on the order of a couple thousand lines of code and tests. That's what they don't usually teach, and that's where the most important bits really are: they are the difference between toy code, and production code.
Even without tests, a fully functional singly linked list container is ~500 lines without polymorphic allocator support, and probably at least double that with such support, and tests would double the code size several times - although if you were clever about it, you could reuse a lot of the tests used by various STL implementations :)
And, by the way: a decent implementation of a linked list in C won't force you to manually deal with nodes either. The list itself - the container - will be an abstract data type with a bunch of functions that provide the functionality, and with some abstraction for iterators as well (even though they'll be just pointers in some type-safe disguise). This is again the difference between teaching code and easy-to-use-correctly and hard-to-use-incorrectly production code. One example I can think of right now are the stretchy buffers, as implemented in Bitwise ion project. This is a link to a video where those are implemented live, and they serve as a decent example of how abstractions work in C (and also how you definitely shouldn't be writing this in C++ - C and C++ are different languages!).
Defining an actual LinkedList type allows you to directly support operations that would be relatively difficult to support by just passing around a pointer to a node.
One comment has already mentioned storing the size of the linked list as a member, so you can have a function return the size of the linked list in constant time. That is a useful thing to do, but I think it only hints at the real point, which (in my opinion) is having things that apply to the linked list as a whole, rather than just operations on individual nodes.
In C++, one obvious possibility here is having a destructor that properly destroys a complete linked list when it goes out of scope.
int foo() {
LinkedList a;
// code that uses `a`
} // <-- here `a` goes out of scope, and should be destroyed
One of the big features of C++ as a whole is deterministic destruction, and its support for that is based on destructors that run when objects go out of scope.
With a linked list, you'd (at least normally) plan on all the nodes in the linked list being allocated dynamically. If you just use a pointer to node, it'll be up to you to keep track of when you no longer need/want a particular linked list, and manually destroy all the nodes in the linked list when it's no longer needed.
By creating a linked-list class, you get the benefit of deterministic destruction, so you no longer need to keep track of when a list is no longer needed--the compiler tracks that for you, and when it goes out of scope, it gets destroyed automatically.
I'd also expect a linked list to support copy construction, move construction, copy assignment, and move assignment--and probably also a few things like comparison (at least for in/equality, and possibly ordering). And all of these require a fair amount of manual intervention if you decide to implement your linked list as a pointer to a node, instead of having an actual linked list class.
As such, I'd say if you really want to use C++ (even close to how it's intended to work) creating a class to encapsulate your linked list is an absolute necessity. As long as you're just passing around pointers to nodes, what you're writing is fundamentally C (even if it may use some features specific to C++ so a C compiler won't accept it).

std::list, move item in list using iterators only

It seems to me given what I know about linked lists that this should be possible but I haven't found anywhere that has the answer so I'm asking here.
Given two iterators to items in the same list. I'd like to take the item pointed to by iterator "frm" and "insert" it into the list before the item pointed to by iterator "to".
It seems that all that is needed is to change the pointers on the items in the list pointing to "frm" (to remove "frm"), then changing the pointers on the item pointing at "to" so that it references "frm" then changing the pointers on "frm" node to point to "to".
I looked everywhere for this and couldn't find an answer.
NOTE that I cannot use splice as I do not have access to the list only the iterators to the items in the list.
template <typename T>
void move(typename std::list<T>::iterator frm, typename std::list<T>::iterator to) {
//remove the item from the list at frm
//insert the item at frm before the item at to
}
Iterators contain the minimal information required to point to a piece of data, what you are missing is the fact that linked lists have other bookkeeping that go along with it as well, so essentially the list class looks something like the following
template <typename Type>
class list {
int size; // for O(1) size()
Type* head;
Type* tail;
class Iterator {
Type* element;
// no back pointer to list<Type>*
};
...
};
And to remove an element from the list you would need to update those data members as well. And to do that an iterator must contain a back pointer to the list itself, which is not required as per the interface offered for most iterators. Notice also that the algorithms in the STL do not actually modify the bookkeeping for the containers the operate on, only maybe swap elements, and rearrange things.
I would encourage you took look into the <algorithm> header, as well as into facilities like std::back_inserter and std::move_iterator to get an idea of how iterators are wrapped to actually modify the container they represent.
The implementation of this is implementation defined but the c++ standard allows the use of iter_swap though it doesn't do this exactly. This maybe optimized to swap the pointers on the values held in the linked list similar to what I have described effectively reordering the items in the list without a full swap needed.
iter_swap() versus swap() -- what's the difference?

Questions about the standard library list

In the past I've implemented a linked list using nodes.
I am looking at some of the properties of the standard library list and it has iterators and the appropriate member functions.
What are the iterators in a list exactly? Are they the node pointers?
For a vector, basically you have pointers to the element type and the data structures is built on an underlying dynamic array of that given type.
For the lists, it seems it is just a sequence of nodes, array of nodes. So are the iterators the node pointers rather than a pointer to the node data type?
Basically what I am asking is if for a vector I had this iterator:
tyepdef T* iterator;
Would the iterator for the list be
typedef node* iterator;
where node is something like:
template <class T> struct node {
node() { next = 0; }
node(T i, node* n = 0) : data(i), next(n) {}
node* next;
T data;
}
If this is the case, it seems that operations like dereferencing will have to be overloaded.
The std::list<T>::iterator objects internally point to a node but have operators which appropriately follow the pointers to next or previoys node. That is, they are not pointers as incrementing a pointer just adds one rather than follow a link. You can inagine a list iterator looks somewhat like this:
template <typename T>
class list_iterator {
friend list<T>
Node* node;
list_iterator(T* node):node(node) {}
list_iterator& operator++() {
node = node->next;
return *this;
}
T& operator*() { return *node; }
// ...
};
The iterator on lists should behave similar to the iterators on other sequence containers like vector. I.e. it should behave like a pointer to the list::value_type as if it were in an array or similar (with ++ and -- doing the expected operation going to next and previous). The internals of the holding structure aren't really exposed through the iterator. The iterator abstraction generally frees the programmer from thinking about how the data is stored. In the future, you could theoretically swap your std::list for a std::vector without changing your code, so long as you're only using operations available to both.

How to iterate over a non-linear container

Consider a hierarchical tree structure, where an item may have sibling items (at the same level in the hierarhcy) and may also have children items (one level down in hierarchy).
Lets say the structure can be defined like:
// an item of a hierarchical data structure
struct Item {
int data; // keep it an int, rather than <T>, for simplicity
vector<Item> children;
};
I wanted to be able to use algorithms over this structure, like the algorithms for a std::map, std::vector, etc. So, I created a few algorithms, like:
template <class Function>
Function for_each_children_of_item( Item, Function f ); // deep (recursive) traversal
template <class Function>
Function for_each_direct_children_of_item( Item, Function f ); // shallow (1st level) traversal
template <class Function>
Function for_each_parent_of_item( Item, Function f ); // going up to the root item
One thing that troubled me is that there are 3 for_each() functions for the same structure. But they give a good description of how they iterate, so I decided to live with it.
Then, soon, the need for more algorithms emerged (like find_if, count_if, any_of, etc), which made me feel I'm not on the right track, design-wise.
One solution I can think of, that would reduce the workload, would be to simply write:
vector<Item> get_all_children_of_item( Item ); // recursive
vector<Item> get_all_direct_children_of_item( Item ); // 1st level items
vector<Item> get_all_parents_of_item( Item ); // up to the root item
and then I could use all the STL algorithms.
I am a bit wary of this solution, because it involves copying.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
Can anybody present a typical / idiomatic way to deal with such non-linear data structures ?
Can/should iterators be created for such a structure? how?
Use iterators.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
end() can be any designated special value for your iterator class as long as your increment operator produces it when stepping past the last element. And/or override operator ==/!= for your iterator.
If you want to be really robust, implement an iterator mode for each of the XPath axes.

Writing traversal methods for classes (TreeNode) that are designed to be inside (arbitrary) containers

I have a class that (when simplified) looks like this:
class TreeNode
{
ptrdiff_t sibling_offset, first_child_offset;
public:
long whatever;
// ...
};
The tree nodes must contain offsets instead of pointers, because it needs to be possible for them to be embedded in containers (like std::vector) that may reallocate their storage when necessary, without having to spend time re-linking all the nodes.
Now, if I have a suitably defined class TreeIterator<Iter> (perhaps defined as a friend of TreeNode) whose job it is to iterate over a TreeNode's children, then any STL-style client of my class should be able to use it to iterate over the children of a node in a standard, STL fashion:
typedef std::vector<TreeNode> Tree;
Tree tree = ...;
TreeIterator<Tree::iterator> const root = tree.begin();
for (TreeIterator<Tree::iterator> i = root->begin(); i != root->end(); ++i)
{
foo(i->whatever);
process_further(i);
}
The trouble is, root->begin() is impossible because the TreeNode doesn't know anything about the container it's in.
(And it shouldn't! The only thing it cares about is that the container has suitable iterators.)
And yet, I (the author of TreeNode) am the only one who could possibly how to iterate over its children.
How do I resolve this issue, without restricting the type of the container that a TreeNode may be stored in?
Obviously this is easy if I force the user to use std::vector, but he should be free to use any arbitrary STL-compliant container.
You just define functions begin() and end() in TreeNode. And then use them in your code.
class TreeNode {
...
std::vector<T>::iterator begin() {return vec.begin();}
std::vector<T>::iterator end() {return vec.end();}
...
private:
std::vector<T> vec;
}