We can define a LinkedListNode as below:
template <typename T>
struct LinkedListNode {
    T val;
    LinkedListNode* next;
    LinkedListNode() : val{}, next(nullptr) {}
    LinkedListNode(T x) : val{ x }, next(nullptr) {}
    LinkedListNode(T x, LinkedListNode* next) : val{ x }, next(next) {}
};
If we want to define a function that takes a "Linked List", we have two options. First, we could pass a LinkedListNode* to the function.
template <typename T>
int func(LinkedListNode<T>* node);
Second, we could define a LinkedList class that holds a pointer to the "head" node. Then we could define a function that takes a LinkedList.
template <typename T>
struct LinkedList {
    LinkedListNode<T>* head;
    // other member functions
};
template <typename T>
int func(LinkedList<T>& llist);
One reason the second appears preferable is that it might allow better encapsulation of functions that modify a "Linked List". For example, a FindMax that takes a LinkedListNode* might fit better as a member function of LinkedList than as a member function of LinkedListNode.
What concrete reasons are there to prefer one over the other? I'm especially interested in reasons you might prefer to just use LinkedListNode*s.
I think before you even choose to use a singly linked list, you should have some reason to use it over plain std::vector. You need actual benchmarks that show that a singly linked list would improve performance in the particular application you have in mind; you'd be surprised how often it makes things worse, not better. Hint: theoretical computational complexity is orthogonal to memory access patterns, and on modern CPUs the memory access patterns determine performance. Most computation is essentially free, in that it takes no extra time: it gets hidden under all the cache misses.
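As a rough starting point for such a measurement, here is a minimal sketch that fills a std::vector and a std::forward_list with the same values and times one pass over each. The names timed_sum, TraversalResult and compare_traversal are made up for this illustration. The timings are only indicative: a serious benchmark would repeat the runs, use realistic allocation patterns, and be built with the flags you ship with.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <numeric>
#include <vector>

// Sums a container and reports how long the single traversal took.
template <typename Container>
std::int64_t timed_sum(const Container& c, std::chrono::nanoseconds& elapsed) {
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t sum = std::accumulate(c.begin(), c.end(), std::int64_t{0});
    elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    return sum;
}

// Hypothetical result record for this sketch.
struct TraversalResult {
    std::int64_t vector_sum;
    std::int64_t list_sum;
    std::chrono::nanoseconds vector_time;
    std::chrono::nanoseconds list_time;
};

// Fills both containers with 0..n-1 and times one pass over each.
inline TraversalResult compare_traversal(int n) {
    assert(n >= 0);
    std::vector<int> vec(static_cast<std::size_t>(n));
    std::iota(vec.begin(), vec.end(), 0);
    std::forward_list<int> list(vec.begin(), vec.end());

    TraversalResult r{};
    r.vector_sum = timed_sum(vec, r.vector_time);
    r.list_sum = timed_sum(list, r.list_time);
    return r;
}
```

Compare vector_time and list_time for sizes that match your application. The sums are returned as well so the compiler cannot discard the traversals.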
Then you should have a reason not to use std::forward_list. But maybe you need intrusive linked lists: then make a case for not using boost::intrusive::slist<T> or a similar existing and well tested library type.
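For reference, a short sketch of what the standard container already offers. The function name forward_list_demo is invented for this example; the member functions used are all part of std::forward_list.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

// Builds a small list with std::forward_list and returns its contents.
inline std::vector<int> forward_list_demo() {
    std::forward_list<int> list{1, 3};
    list.push_front(0);                             // 0 1 3
    auto it = std::next(list.begin());              // points at 1
    list.insert_after(it, 2);                       // 0 1 2 3
    list.remove_if([](int v) { return v == 0; });   // 1 2 3
    return std::vector<int>(list.begin(), list.end());
}
```

No node type appears anywhere in the calling code; positions are expressed purely as iterators.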
If you're still going forward with your own implementation, then the very first step would be to use std::unique_ptr as the owning pointer for child nodes, instead of manual memory management - that way it'll be very easy to show that no memory is being leaked - the code becomes correct by construction and memory leaks require extra effort vs. happening by omission.
In other words: don't reinvent the wheel unless you have a well articulated reason for that. Of course, you can implement linked lists all you want as an exercise, but be aware that you're most likely implementing a container that you'll make the least use of - so I'd argue that you'd learn a lot more about how C++ works by implementing e.g. a vector/array container.
If you do use std::unique_ptr, or even manual memory management, you're likely to run into the destructor stack explosion pitfall. Consider
template <typename T>
struct LinkedListNode1 {
    T val;
    std::unique_ptr<LinkedListNode1> next;
};

template <typename T>
struct LinkedListNode2 {
    T val;
    LinkedListNode2* next = nullptr;
    ~LinkedListNode2() { delete next; }
};
In both cases, the destructor gets invoked recursively, and if the list is sufficiently long, you'll run out of stack. Recursion is also usually less efficient than a loop. To prevent that, you must never deallocate a node whose next is non-null: detach the rest of the list first, then free the nodes one at a time in a loop.
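#include <cassert>
#include <memory>
#include <utility>

template <typename T>
struct LinkedListNode1 {
    T val;
    std::unique_ptr<LinkedListNode1> next;
    ~LinkedListNode1() {
        // Detach the tail, then release it one node at a time. Each node
        // destroyed inside the loop already has a null next.
        auto node = std::move(next);
        while (node)
            node = std::move(node->next);
        assert(!next);
    }
};

template <typename T>
struct LinkedListNode2 {
    T val;
    LinkedListNode2* next = {};
    ~LinkedListNode2() {
        using std::swap;
        LinkedListNode2* node = {};
        swap(node, next);              // node now holds the tail
        while (node) {
            LinkedListNode2* tmp = {};
            swap(tmp, node);           // tmp is the node to delete
            assert(!node);
            swap(node, tmp->next);     // detach its successor first
            assert(!tmp->next);
            delete tmp;
        }
        assert(!next);
    }
};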
Smart pointers make the code much simpler. I wrote the raw pointer version with swaps to make it easier to show that no memory is leaking: a swap used correctly never "loses" a value.
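To see the effect, here is a small sketch that builds a long list and lets it go out of scope. CountedNode and build_and_destroy are hypothetical names, and the live counter exists only to observe destruction; the destructor is the same loop as in the unique_ptr version above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Node with an iterative destructor. `live` counts nodes currently alive.
struct CountedNode {
    static inline std::size_t live = 0;   // C++17 inline variable
    int val = 0;
    std::unique_ptr<CountedNode> next;

    CountedNode() { ++live; }
    CountedNode(const CountedNode&) = delete;
    CountedNode& operator=(const CountedNode&) = delete;
    ~CountedNode() {
        --live;
        auto node = std::move(next);
        while (node)
            node = std::move(node->next);   // the freed node has a null next
    }
};

// Builds a list of n nodes by prepending, destroys it, and returns the
// number of nodes still alive afterwards.
inline std::size_t build_and_destroy(std::size_t n) {
    const std::size_t before = CountedNode::live;
    {
        std::unique_ptr<CountedNode> head;
        for (std::size_t i = 0; i < n; ++i) {
            auto node = std::make_unique<CountedNode>();
            node->next = std::move(head);
            head = std::move(node);
        }
        assert(CountedNode::live == before + n);
    }   // head goes out of scope here; destruction depth stays constant
    return CountedNode::live;
}
```

With the defaulted (recursive) destructor instead, the same call with a large n would be expected to exhaust the stack on typical platforms.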
For example, a FindMax that takes a LinkedListNode*
That's again reinventing the wheel. In C++, the idiom for "finding a maximum element" is std::max_element from #include <algorithm>. You should leverage the algorithms that the standard library provides (and any others you may need, e.g. from Boost or header-only libraries).
To do that, you need an iterator for the list. It will be, by necessity, a LegacyForwardIterator. Here, "be a" has a strict technical meaning: it's a concise way of saying that your iterator will satisfy the requirements of, and abide by the contract of, LegacyForwardIterator.
Such an iterator would look very roughly as follows:
#include <cstddef>
#include <initializer_list>
#include <iterator>
#include <memory>
#include <type_traits>

template <typename T>
class LinkedListNode1 {
    std::unique_ptr<LinkedListNode1> next;

    template <typename V> class iterator_impl {
        LinkedListNode1 *node = {};
        using const_value_type = std::add_const_t<V>;
        using non_const_value_type = std::remove_const_t<V>;
    public:
        // Member types that std::iterator_traits and the algorithms look for.
        using iterator_category = std::forward_iterator_tag;
        using difference_type = std::ptrdiff_t;
        using value_type = non_const_value_type;
        using reference = V&;
        using pointer = V*;
        iterator_impl() = default;
        // Converting constructor, e.g. iterator to const_iterator.
        template <typename VO>
        iterator_impl(const iterator_impl<VO> &o) : node(o.operator->()) {}
        explicit iterator_impl(LinkedListNode1 *node) : node(node) {}
        auto *operator->() const { return node; }  // exposes the node, not the value
        pointer operator&() const { return &(node->val); }
        reference operator*() const { return node->val; }
        iterator_impl &operator++() { node = node->next.get(); return *this; }
        iterator_impl operator++(int) {
            auto retval = *this;
            this->operator++();
            return retval;
        }
        bool operator==(const iterator_impl &o) const { return node == o.node; }
        bool operator!=(const iterator_impl &o) const { return node != o.node; }
    };
public:
    T val{};
    using iterator = iterator_impl<T>;
    using const_iterator = iterator_impl<const T>;
The next pointer can be made private. Then, the basic functionality would include:
    LinkedListNode1() = default;
    LinkedListNode1(const T &val) : val(val) {}
    // Unlink the nodes one at a time so destruction never recurses.
    ~LinkedListNode1() {
        auto node = std::move(next);
        while (node)
            node = std::move(node->next);
    }
    iterator begin() { return iterator(this); }
    iterator end() { return {}; }
    // The iterator stores a non-const node pointer, hence the const_cast;
    // dereferencing a const_iterator still only yields const T&.
    const_iterator begin() const { return const_iterator(const_cast<LinkedListNode1 *>(this)); }
    const_iterator end() const { return {}; }
    const_iterator cbegin() const { return begin(); }
    const_iterator cend() const { return {}; }
    // Inserts value after pos and keeps whatever followed pos in the list.
    iterator insert_after(const_iterator pos, const T& value) {
        auto node = std::make_unique<LinkedListNode1>();
        node->val = value;
        node->next = std::move(pos->next);
        auto retval = iterator(node.get());
        pos->next = std::move(node);
        return retval;
    }
One would use insert_after to extend the list. Other such methods would need to be added, of course.
Then, we'd probably also want to support initializer lists:
    LinkedListNode1(std::initializer_list<T> init) {
        auto src = init.begin();
        if (src == init.end()) return;
        val = *src++;
        for (auto dst = iterator(this); src != init.end(); ++src)
            dst = insert_after(dst, *src);
    }
};
Now you can pre-populate the list with an initializer list, iterate it using range-for, and use it with standard algorithms:
#include <algorithm>
#include <cassert>
#include <iostream>

int main() {
    LinkedListNode1<int> list{1, 3, 2};
    for (auto const &val : list)
        std::cout << val << '\n';
    assert(*std::max_element(list.begin(), list.end()) == 3);
}
But now we come to the most important question:
What concrete reasons are there to prefer one over the other
The default - the starting point - is to provide a container, since that's the abstraction we deal with: the "thing" that you think of is a linked list, not a list node. The data structure you learn of is, again, a linked list. And for a good reason: The node type is an implementation detail, so you'd need to come up with application-specific reasons for exposing the node type, and any argument made must stand up to the scrutiny when faced with iterators. Do you really need to expose those nodes, or is what you actually want just a convenient way to iterate over the items stored in the collection, perhaps split the list, etc? Node access is not necessary for any of it. It's all a solved problem, as you'll learn by reading the documentation of std::forward_list.
You'd also want to consider allocator support. I'd not worry about the C++98 allocators, but the polymorphic allocators are (finally!) actually usable, so you'd want to implement those (cf. std::pmr::polymorphic_allocator and the std::pmr namespace in general).
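To give an idea of what polymorphic allocator support looks like from the user's side, here is a sketch using the standard std::pmr::forward_list (C++17). The function name pmr_sum_demo and the buffer size are arbitrary choices for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <memory_resource>
#include <numeric>

// Sums a list whose nodes are carved out of a buffer on the stack.
inline int pmr_sum_demo() {
    std::array<std::byte, 1024> buffer{};
    // If the buffer runs out, the resource falls back to its upstream
    // resource (the default resource unless specified otherwise).
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());

    std::pmr::forward_list<int> list(&pool);
    for (int i = 1; i <= 10; ++i)
        list.push_front(i);
    return std::accumulate(list.begin(), list.end(), 0);
}
```

A home-grown container would need to accept and propagate an allocator in the same way to be usable like this.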
For full functionality, you'd pretty much need to add most of std::forward_list's methods and constructors. So it's a bit of work, and there are lots of details to make it work well no matter the value type. And thus we come full circle: real containers that are meant to be useful without worrying about low-level details are lots of work, but they are a joy to use - and they look nothing like most textbook "teaching" code.
A linked list is often used when teaching data structures - true. Yet most C++ books used in teaching are woefully inadequate in demonstrating what a modern, fully functional data structure/container entails - they can't even get that right for something as "simple" as a singly linked list.
The gap between a C-like singly linked list - exactly what you started with in the question - and a singly linked list C++ container is on the order of a couple thousand lines of code and tests. That's what they don't usually teach, and that's where the most important bits really are: they are the difference between toy code, and production code.
Even without tests, a fully functional singly linked list container is ~500 lines without polymorphic allocator support, and probably at least double that with such support, and tests would double the code size several times - although if you were clever about it, you could reuse a lot of the tests used by various STL implementations :)
And, by the way: a decent implementation of a linked list in C won't force you to manually deal with nodes either. The list itself - the container - will be an abstract data type with a bunch of functions that provide the functionality, and with some abstraction for iterators as well (even though they'll be just pointers in some type-safe disguise). This is again the difference between teaching code and easy-to-use-correctly, hard-to-use-incorrectly production code. One example I can think of right now is the stretchy buffers, as implemented in the Bitwise ion project. There is a video where those are implemented live, and they serve as a decent example of how abstractions work in C (and also how you definitely shouldn't be writing this in C++ - C and C++ are different languages!).
Defining an actual LinkedList type allows you to directly support operations that would be relatively difficult to support by just passing around a pointer to a node.
One comment has already mentioned storing the size of the linked list as a member, so you can have a function return the size of the linked list in constant time. That is a useful thing to do, but I think it only hints at the real point, which (in my opinion) is having things that apply to the linked list as a whole, rather than just operations on individual nodes.
In C++, one obvious possibility here is having a destructor that properly destroys a complete linked list when it goes out of scope.
void foo() {
    LinkedList<int> a;
    // code that uses `a`
} // <-- here `a` goes out of scope, and should be destroyed
One of the big features of C++ as a whole is deterministic destruction, and its support for that is based on destructors that run when objects go out of scope.
With a linked list, you'd (at least normally) plan on all the nodes in the linked list being allocated dynamically. If you just use a pointer to node, it'll be up to you to keep track of when you no longer need/want a particular linked list, and manually destroy all the nodes in the linked list when it's no longer needed.
By creating a linked-list class, you get the benefit of deterministic destruction, so you no longer need to keep track of when a list is no longer needed--the compiler tracks that for you, and when it goes out of scope, it gets destroyed automatically.
I'd also expect a linked list to support copy construction, move construction, copy assignment, and move assignment--and probably also a few things like comparison (at least for in/equality, and possibly ordering). And all of these require a fair amount of manual intervention if you decide to implement your linked list as a pointer to a node, instead of having an actual linked list class.
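A minimal sketch of what such a class has to provide, assuming a unique_ptr-based node. OwningList is a hypothetical name and this is nowhere near a complete container (no iterators, no allocator support); it only illustrates destruction, copying, moving, size and equality living on the list rather than on the nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>

template <typename T>
class OwningList {
    struct Node {
        T val;
        std::unique_ptr<Node> next;
    };
    std::unique_ptr<Node> head_;
    std::size_t size_ = 0;

    // Appends a copy of v at the slot *tail and returns the next slot.
    std::unique_ptr<Node>* append(std::unique_ptr<Node>* tail, const T& v) {
        *tail = std::unique_ptr<Node>(new Node{v, nullptr});
        ++size_;
        return &(*tail)->next;
    }

public:
    OwningList() = default;
    OwningList(std::initializer_list<T> init) {
        auto* tail = &head_;
        for (const T& v : init) tail = append(tail, v);
    }
    OwningList(const OwningList& other) {              // deep copy
        auto* tail = &head_;
        for (const Node* n = other.head_.get(); n; n = n->next.get())
            tail = append(tail, n->val);
    }
    OwningList(OwningList&& other) noexcept
        : head_(std::move(other.head_)), size_(std::exchange(other.size_, 0)) {}
    // Taking the argument by value covers copy and move assignment.
    OwningList& operator=(OwningList other) noexcept {
        std::swap(head_, other.head_);
        std::swap(size_, other.size_);
        return *this;
    }
    ~OwningList() { clear(); }

    void clear() noexcept {
        while (head_)
            head_ = std::move(head_->next);   // one node per step, no recursion
        size_ = 0;
    }
    std::size_t size() const noexcept { return size_; }   // O(1)

    friend bool operator==(const OwningList& a, const OwningList& b) {
        if (a.size_ != b.size_) return false;
        const Node* x = a.head_.get();
        const Node* y = b.head_.get();
        for (; x; x = x->next.get(), y = y->next.get())
            if (!(x->val == y->val)) return false;
        return true;
    }
};
```

None of these operations has a natural home if all you pass around is a pointer to a node.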
As such, I'd say if you really want to use C++ (even close to how it's intended to work) creating a class to encapsulate your linked list is an absolute necessity. As long as you're just passing around pointers to nodes, what you're writing is fundamentally C (even if it may use some features specific to C++ so a C compiler won't accept it).
It seems to me given what I know about linked lists that this should be possible but I haven't found anywhere that has the answer so I'm asking here.
Given two iterators to items in the same list, I'd like to take the item pointed to by iterator "frm" and "insert" it into the list before the item pointed to by iterator "to".
It seems that all that is needed is to change the pointers on the items in the list pointing to "frm" (to remove "frm"), then changing the pointers on the item pointing at "to" so that it references "frm" then changing the pointers on "frm" node to point to "to".
I looked everywhere for this and couldn't find an answer.
NOTE that I cannot use splice as I do not have access to the list only the iterators to the items in the list.
template <typename T>
void move(typename std::list<T>::iterator frm, typename std::list<T>::iterator to) {
    //remove the item from the list at frm
    //insert the item at frm before the item at to
}
Iterators contain the minimal information required to point to a piece of data. What you are missing is that a linked list has other bookkeeping that goes along with it as well, so essentially the list class looks something like the following:
template <typename Type>
class list {
    int size; // for O(1) size()
    Type* head;
    Type* tail;
    class Iterator {
        Type* element;
        // no back pointer to list<Type>*
    };
    ...
};
To remove an element from the list you would need to update those data members as well. To do that, an iterator would have to contain a back pointer to the list itself, which the interface offered by most iterators does not require. Notice also that the algorithms in the STL do not actually modify the bookkeeping of the containers they operate on; at most they swap elements and rearrange things.
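For comparison, when the list itself is available the operation is a single call to std::list::splice. The wrapper move_before below is a made-up name for this sketch; note that it takes the list as a parameter, which is exactly what the question's signature lacks.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Moves the element at frm so that it sits directly before to.
// splice relinks the node; the element is neither copied nor moved,
// and iterators to it stay valid.
template <typename T>
void move_before(std::list<T>& lst,
                 typename std::list<T>::iterator frm,
                 typename std::list<T>::iterator to) {
    lst.splice(to, lst, frm);
}
```

For example, with a list holding 1 2 3 4, moving the last element before the second one yields 1 4 2 3.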
I would encourage you to look into the <algorithm> header, as well as into facilities like std::back_inserter and std::move_iterator, to get an idea of how iterators are wrapped to actually modify the container they represent.
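As a small illustration of that wrapping, std::back_inserter(dst) produces an iterator that holds a pointer to dst and calls dst.push_back on every assignment. The function name append_evens is invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Appends the even values of src to the end of dst.
// The output iterator can grow dst only because it knows the container.
inline void append_evens(const std::list<int>& src, std::vector<int>& dst) {
    std::copy_if(src.begin(), src.end(), std::back_inserter(dst),
                 [](int v) { return v % 2 == 0; });
}
```

A plain std::vector<int>::iterator could not do this, since it has no access to the vector's size or capacity.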
How this is done is implementation-defined, but the C++ standard does provide std::iter_swap, although it doesn't do exactly this. It may be optimized to swap the pointers to the values held in the linked list, similar to what I have described, effectively reordering the items in the list without a full swap.
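A short sketch of what std::iter_swap does with list iterators. It is specified in terms of swap(*a, *b), so the values change places while the iterators keep referring to the same positions. The function name swap_ends is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Exchanges the values at the first and last positions of a non-empty list.
inline void swap_ends(std::list<int>& lst) {
    assert(!lst.empty());
    std::iter_swap(lst.begin(), std::prev(lst.end()));
}
```

An iterator obtained from begin() before the call still refers to the first position afterwards, but now reads the value that used to be last.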
iter_swap() versus swap() -- what's the difference?
In the past I've implemented a linked list using nodes.
I am looking at some of the properties of the standard library list and it has iterators and the appropriate member functions.
What are the iterators in a list exactly? Are they the node pointers?
For a vector, basically you have pointers to the element type, and the data structure is built on an underlying dynamic array of that given type.
For the lists, it seems it is just a sequence of nodes, array of nodes. So are the iterators the node pointers rather than a pointer to the node data type?
Basically what I am asking is if for a vector I had this iterator:
typedef T* iterator;
Would the iterator for the list be
typedef node* iterator;
where node is something like:
template <class T> struct node {
    node() { next = 0; }
    node(T i, node* n = 0) : data(i), next(n) {}
    node* next;
    T data;
};
If this is the case, it seems that operations like dereferencing will have to be overloaded.
The std::list<T>::iterator objects internally point to a node but have operators which appropriately follow the pointers to the next or previous node. That is, they are not pointers, as incrementing a pointer just adds one rather than following a link. You can imagine a list iterator looks somewhat like this:
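template <typename T>
class list_iterator {
    friend class list<T>;
    Node* node;   // the list's internal node type, with next and data members
    list_iterator(Node* node) : node(node) {}
public:
    list_iterator& operator++() {
        node = node->next;   // follow the link instead of adding one
        return *this;
    }
    T& operator*() { return node->data; }
    // ...
};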
The iterators of a list should behave similarly to the iterators of other sequence containers like vector. That is, an iterator should behave like a pointer to list::value_type, as if the elements were in an array or similar (with ++ and -- doing the expected operation of going to the next and previous element). The internals of the holding structure aren't really exposed through the iterator. The iterator abstraction generally frees the programmer from thinking about how the data is stored. In the future, you could theoretically swap your std::list for a std::vector without changing your code, so long as you're only using operations available to both.
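A small sketch of that interchangeability: the function below is written only against begin(), end() and value_type, so it accepts either container. The name count_above is made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Counts the values greater than limit in any container that offers
// begin()/end() and a value_type.
template <typename Container>
auto count_above(const Container& c, typename Container::value_type limit) {
    return std::count_if(c.begin(), c.end(),
                         [&](const auto& v) { return v > limit; });
}
```

Calling it with a std::list<int> or a std::vector<int> holding the same values gives the same result, and the caller never sees nodes or array indices.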
Consider a hierarchical tree structure, where an item may have sibling items (at the same level in the hierarchy) and may also have child items (one level down in the hierarchy).
Let's say the structure can be defined like:
// an item of a hierarchical data structure
struct Item {
    int data; // keep it an int, rather than <T>, for simplicity
    vector<Item> children;
};
I wanted to be able to use algorithms over this structure, like the algorithms for a std::map, std::vector, etc. So, I created a few algorithms, like:
template <class Function>
Function for_each_children_of_item( Item, Function f ); // deep (recursive) traversal
template <class Function>
Function for_each_direct_children_of_item( Item, Function f ); // shallow (1st level) traversal
template <class Function>
Function for_each_parent_of_item( Item, Function f ); // going up to the root item
One thing that troubled me is that there are 3 for_each() functions for the same structure. But they give a good description of how they iterate, so I decided to live with it.
Then, soon, the need for more algorithms emerged (like find_if, count_if, any_of, etc), which made me feel I'm not on the right track, design-wise.
One solution I can think of, that would reduce the workload, would be to simply write:
vector<Item> get_all_children_of_item( Item ); // recursive
vector<Item> get_all_direct_children_of_item( Item ); // 1st level items
vector<Item> get_all_parents_of_item( Item ); // up to the root item
and then I could use all the STL algorithms.
I am a bit wary of this solution, because it involves copying.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
Can anybody present a typical / idiomatic way to deal with such non-linear data structures?
Can/should iterators be created for such a structure? How?
Use iterators.
I cannot think of a way to implement an iterator, as there is no obvious end() iterator in the recursive version of the traversal.
end() can be any designated special value for your iterator class, as long as your increment operator produces it when stepping past the last element. And/or overload operator ==/!= for your iterator.
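Here is one possible sketch of that idea for the deep (recursive) traversal, assuming the Item struct from the question. PreorderIterator and PreorderRange are hypothetical names. The iterator keeps a stack of nodes still to visit, and the designated end value is simply an iterator whose stack is empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct Item {
    int data;
    std::vector<Item> children;
};

// Pre-order iterator over an Item and all of its descendants.
class PreorderIterator {
    std::vector<const Item*> pending;   // back() is the current node
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = Item;
    using difference_type = std::ptrdiff_t;
    using pointer = const Item*;
    using reference = const Item&;

    PreorderIterator() = default;   // empty stack: the end() value
    explicit PreorderIterator(const Item& root) { pending.push_back(&root); }

    reference operator*() const { return *pending.back(); }
    pointer operator->() const { return pending.back(); }

    PreorderIterator& operator++() {
        const Item* current = pending.back();
        pending.pop_back();
        // Push children in reverse so that the first child is visited next.
        for (auto it = current->children.rbegin(); it != current->children.rend(); ++it)
            pending.push_back(&*it);
        return *this;
    }
    PreorderIterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const PreorderIterator& a, const PreorderIterator& b) {
        return a.pending == b.pending;
    }
    friend bool operator!=(const PreorderIterator& a, const PreorderIterator& b) {
        return !(a == b);
    }
};

// Adapter so that range-for and the standard algorithms can be used.
struct PreorderRange {
    const Item& root;
    PreorderIterator begin() const { return PreorderIterator(root); }
    PreorderIterator end() const { return {}; }
};
```

With this, for_each, find_if, count_if and the rest work on the tree without copying any Item. The design trades simplicity for cost: copying an iterator copies its stack, which is proportional to the depth of the tree.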
If you want to be really robust, implement an iterator mode for each of the XPath axes.
I have a class that (when simplified) looks like this:
class TreeNode
{
    ptrdiff_t sibling_offset, first_child_offset;
public:
    long whatever;
    // ...
};
The tree nodes must contain offsets instead of pointers, because it needs to be possible for them to be embedded in containers (like std::vector) that may reallocate their storage when necessary, without having to spend time re-linking all the nodes.
Now, if I have a suitably defined class TreeIterator<Iter> (perhaps defined as a friend of TreeNode) whose job it is to iterate over a TreeNode's children, then any STL-style client of my class should be able to use it to iterate over the children of a node in a standard, STL fashion:
typedef std::vector<TreeNode> Tree;
Tree tree = ...;
TreeIterator<Tree::iterator> const root = tree.begin();
for (TreeIterator<Tree::iterator> i = root->begin(); i != root->end(); ++i)
{
    foo(i->whatever);
    process_further(i);
}
The trouble is, root->begin() is impossible because the TreeNode doesn't know anything about the container it's in.
(And it shouldn't! The only thing it cares about is that the container has suitable iterators.)
And yet, I (the author of TreeNode) am the only one who could possibly know how to iterate over its children.
How do I resolve this issue, without restricting the type of the container that a TreeNode may be stored in?
Obviously this is easy if I force the user to use std::vector, but he should be free to use any arbitrary STL-compliant container.
You just define functions begin() and end() in TreeNode. And then use them in your code.
class TreeNode {
    ...
    std::vector<T>::iterator begin() {return vec.begin();}
    std::vector<T>::iterator end() {return vec.end();}
    ...
private:
    std::vector<T> vec;
};
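A compilable sketch of this suggestion, under a simplifying assumption that differs from the question: here each node owns its children in a std::vector rather than storing offsets into an outer container, so it does not address the requirement of working with arbitrary containers. The member add_child and the function sum_children are hypothetical helpers for the illustration.

```cpp
#include <cassert>
#include <vector>

class TreeNode {
    std::vector<TreeNode> children_;
public:
    long whatever = 0;

    explicit TreeNode(long w) : whatever(w) {}

    // Adds a child. The returned reference is invalidated by later
    // add_child calls on the same parent, since the vector may reallocate.
    TreeNode& add_child(long w) {
        children_.emplace_back(w);
        return children_.back();
    }

    // Forward to the vector; the return types are deduced as its iterators.
    auto begin() { return children_.begin(); }
    auto end() { return children_.end(); }
    auto begin() const { return children_.begin(); }
    auto end() const { return children_.end(); }
};

// Iterates over the direct children in the usual STL fashion.
inline long sum_children(const TreeNode& node) {
    long total = 0;
    for (const TreeNode& child : node)
        total += child.whatever;
    return total;
}
```

Because begin() and end() exist, range-for and the standard algorithms work on a node's direct children without the caller knowing how they are stored.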