Questions about the standard library list - c++

In the past I've implemented a linked list using nodes.
I am looking at some of the properties of the standard library list and it has iterators and the appropriate member functions.
What are the iterators in a list exactly? Are they the node pointers?
For a vector, the iterators are basically pointers to the element type, and the data structure is built on an underlying dynamic array of that type.
For a list, it seems it is just a sequence of linked nodes. So are the iterators node pointers rather than pointers to the data type stored in the node?
Basically what I am asking is if for a vector I had this iterator:
typedef T* iterator;
Would the iterator for the list be
typedef node* iterator;
where node is something like:
template <class T> struct node {
node() { next = 0; }
node(T i, node* n = 0) : data(i), next(n) {}
node* next;
T data;
};
If this is the case, it seems that operations like dereferencing will have to be overloaded.

The std::list<T>::iterator objects internally point to a node but have operators which appropriately follow the pointers to the next or previous node. That is, they are not plain pointers: incrementing a pointer just advances it to the adjacent object in memory rather than following a link. You can imagine a list iterator looking somewhat like this:
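template <typename T> class list;  // the owning container, declared elsewhere

template <typename T>
struct list_node {                 // hypothetical node type
    T data;
    list_node* prev;
    list_node* next;
};

template <typename T>
class list_iterator {
    friend class list<T>;          // only the list creates iterators from nodes
    list_node<T>* node;
    explicit list_iterator(list_node<T>* n) : node(n) {}
public:
    list_iterator& operator++() {
        node = node->next;         // follow the link instead of adding one
        return *this;
    }
    T& operator*() { return node->data; }  // yield the element, not the node
    // ...
};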

The iterator on lists should behave similarly to the iterators on other sequence containers like vector, i.e. it should behave like a pointer to list::value_type as if the elements were in an array or similar (with ++ and -- moving to the next and previous element). The internals of the holding structure aren't really exposed through the iterator: the iterator abstraction generally frees the programmer from thinking about how the data is stored. In the future, you could theoretically swap your std::list for a std::vector without changing your code, so long as you're only using operations available to both.
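To make that last point concrete, here is a small sketch (sum_of is a hypothetical helper, not a library function). It only relies on begin(), end(), ++ and *, so it compiles unchanged for std::list<int> and std::vector<int>, even though one walks linked nodes and the other walks a contiguous array:

```cpp
#include <cassert>
#include <list>
#include <vector>

// Generic code that never sees how the container stores its elements.
template <typename Container>
typename Container::value_type sum_of(const Container& c) {
    typename Container::value_type total{};
    for (auto it = c.begin(); it != c.end(); ++it)
        total += *it;  // *it yields the element, never the list's internal node
    return total;
}
```

For example, sum_of(std::list<int>{1, 2, 3}) and sum_of(std::vector<int>{1, 2, 3}) both return 6.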

Related

Decrementing iterator from end() iterator in binary search tree

In a binary search tree, the end() member function should simply return iterator(nullptr), right? But a null node pointer carries no information about left, right, or parent nodes. So how do we decrement an iterator from end()? The following is what I have so far (defined in the iterator<T> class). My iterator<T> class has node<T>* ptr as its only data member, and my node<T> class has members T value; node *left, *right, *parent;.
iterator& operator--() {
// if (!ptr) what to do???
if (ptr->left)
ptr = node<T>::max_node(ptr->left);
else {
node<T>* previous = ptr;
ptr = ptr->parent;
while (ptr && ptr->left == previous) {
previous = ptr;
ptr = ptr->parent;
}
}
return *this;
}
If you want your iterator to be a bidirectional iterator you’ll need to provide the necessary information to find the last node of the tree in the iterator returned by end(). To avoid dealing with a special case just to decrement the end() iterator you could use a “secret” node which uses its left pointer to point at the root of the tree. This way the end() iterator would simply be a pointer to this secret node: with this approach there is no need for any special treatment in increment, decrement, or comparison.
Since it is generally undesirable to store a node value in this secret node, the usual implementation splits the nodes into a NodeBase holding only the pointers needed for navigation and a NodeValue deriving from NodeBase and adding the node’s value. The tree holds the secret node as a NodeBase member, and the iterator type stores a NodeBase*: for navigation only the pointers are needed. Since only valid iterators may be dereferenced, a static_cast<NodeValue*>(n) can be used to obtain the NodeValue when accessing the node’s value.
Instead of end() being iterator(nullptr), make it iterator(&anchor), where anchor is the member of the containing tree that holds the root, first, and last node pointers. Then, to decrement, you simply step back to the preceding node (the opposite of increment).
It does require somewhat more work than simply holding a root node pointer but it also allows begin() and end() to be O(1) operations.
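A minimal sketch of the header-node ("secret node") approach, assuming an unbalanced tree and reusing the NodeBase/NodeValue split described above. Tree, min_node, max_node and insert are hypothetical names for this example, and copying, erasure and balancing are left out:

```cpp
#include <cassert>

struct NodeBase {                 // navigation pointers only
    NodeBase* left = nullptr;
    NodeBase* right = nullptr;
    NodeBase* parent = nullptr;
};

template <typename T>
struct NodeValue : NodeBase {     // adds the stored value
    T value;
    explicit NodeValue(const T& v) : value(v) {}
};

inline NodeBase* min_node(NodeBase* n) { while (n->left) n = n->left; return n; }
inline NodeBase* max_node(NodeBase* n) { while (n->right) n = n->right; return n; }

template <typename T>
class Tree {
    NodeBase header;  // header.left is the root, root->parent is &header

    static void destroy(NodeBase* n) {
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete static_cast<NodeValue<T>*>(n);  // every non-header node is a NodeValue
    }

public:
    class iterator {
        NodeBase* ptr;
    public:
        explicit iterator(NodeBase* p) : ptr(p) {}
        // Only valid (non-end) iterators may be dereferenced, so the cast is safe.
        T& operator*() const { return static_cast<NodeValue<T>*>(ptr)->value; }
        iterator& operator++() {
            if (ptr->right) { ptr = min_node(ptr->right); return *this; }
            NodeBase* prev = ptr;
            ptr = ptr->parent;
            // Climbing out of the root lands on the header (end()): header.right stays null.
            while (ptr->right == prev) { prev = ptr; ptr = ptr->parent; }
            return *this;
        }
        iterator& operator--() {
            // For end(), ptr is &header and header.left is the root,
            // so this branch yields the maximum node: no special case.
            if (ptr->left) { ptr = max_node(ptr->left); return *this; }
            NodeBase* prev = ptr;
            ptr = ptr->parent;
            while (ptr->left == prev) { prev = ptr; ptr = ptr->parent; }
            return *this;
        }
        bool operator==(const iterator& o) const { return ptr == o.ptr; }
        bool operator!=(const iterator& o) const { return ptr != o.ptr; }
    };

    Tree() = default;
    Tree(const Tree&) = delete;             // copying omitted in this sketch
    Tree& operator=(const Tree&) = delete;
    ~Tree() { destroy(header.left); }

    void insert(const T& v) {
        NodeBase* parent = &header;
        NodeBase** link = &header.left;     // the root hangs off header.left
        while (*link) {
            parent = *link;
            link = v < static_cast<NodeValue<T>*>(parent)->value ? &parent->left : &parent->right;
        }
        NodeBase* n = new NodeValue<T>(v);
        n->parent = parent;
        *link = n;
    }

    iterator begin() { return iterator(header.left ? min_node(header.left) : &header); }
    iterator end() { return iterator(&header); }
};
```

Decrementing end() takes the ptr->left branch because the header's left pointer is the root, and incrementing past the largest element climbs back up to the header, so neither direction needs a null check. Decrementing begin() remains undefined, as it is for the standard containers.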

STL-like List made of linked list with nodes

Is it possible to write a template for a List (like in the STL) that is made of a doubly-linked list of nodes connected to each other, and that provides iterators through functions like begin() and end()?
If I had nested class:
class Node{
T data;
Node *previous, *next;  // each declarator needs its own *
Node(const T &data, Node* next);
};
And my list would have begin() function:
template<class T>
class List {
Node *data; //first element
...
public:
T* begin() { return &data->data; } // pointer to the content of the first element
...
I assume that if I would like to use that list with for example std::copy function like
copy(l.begin(), l.end(), out);
then, since copy iterates through the list with begin++, it would increment a pointer that points to the "data" object inside a node, and would not get the data from the next node.
So is it possible to make that kind of a list?
First of all, there's std::list - which is probably what you want.
Secondly, your implementation of begin() does not fit the expectation for what that function returns for containers. You'll want to return something that at the very least models ForwardIterator (and since it's doubly-linked, BidirectionalIterator). Basically, this needs to work:
List<int> my_list = ...;
auto it = my_list.begin();
int& x = *it; // first value in the list
++it; // next element in the list
int& y = *it; // next value in the list
Right now, begin() yields a plain T* into the first Node. Dereferencing it does give the first value, but it is the wrong type and it leaks the abstraction: incrementing the pointer compiles, yet it will point to some arbitrary spot in memory rather than into the next node. There is no guarantee, after all, that the next node will be adjacent in memory (almost certainly it isn't!)
So you need to write your own iterator type which wraps your Node class, which will have to do those operations correctly based on the iterator concepts. Basically you're just mapping the iterator concept operations to what those look like for Node. To get you started as an example:
Node* underlying;
iterator& operator++() {
underlying = underlying->next;
return *this;
}
T& operator*() {
return underlying->data;
}
Also, check out Boost's Iterator Facade library (boost::iterator_facade), which is helpful for writing iterators correctly.
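Putting those pieces together, a minimal sketch of a complete List<T> whose iterator works with std::copy. The member names head, tail and owner, and the choice of storing a back-pointer to the list so that --end() can find the last node, are assumptions of this sketch rather than anything from the question:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

template <class T>
class List {
    struct Node {
        T data;
        Node* previous;
        Node* next;
    };
    Node* head = nullptr;
    Node* tail = nullptr;

public:
    class iterator {
        friend class List;
        Node* underlying = nullptr;
        const List* owner = nullptr;   // lets --end() find the last node
        iterator(Node* n, const List* l) : underlying(n), owner(l) {}

    public:
        // Member types that std::iterator_traits (and so std::copy) relies on.
        using iterator_category = std::bidirectional_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = T*;
        using reference = T&;

        iterator() = default;
        T& operator*() const { return underlying->data; }
        T* operator->() const { return &underlying->data; }
        iterator& operator++() { underlying = underlying->next; return *this; }
        iterator operator++(int) { iterator old = *this; ++*this; return old; }
        iterator& operator--() {
            // end() holds a null node, so step back to the tail explicitly.
            underlying = underlying ? underlying->previous : owner->tail;
            return *this;
        }
        iterator operator--(int) { iterator old = *this; --*this; return old; }
        bool operator==(const iterator& o) const { return underlying == o.underlying; }
        bool operator!=(const iterator& o) const { return underlying != o.underlying; }
    };

    List() = default;
    List(const List&) = delete;             // copying left out of this sketch
    List& operator=(const List&) = delete;
    ~List() {
        while (head) { Node* next = head->next; delete head; head = next; }
    }

    void push_back(const T& value) {
        Node* n = new Node{value, tail, nullptr};
        if (tail) tail->next = n; else head = n;
        tail = n;
    }

    iterator begin() { return iterator(head, this); }
    iterator end() { return iterator(nullptr, this); }  // one past the last node
};
```

With this, copy(l.begin(), l.end(), out) visits each node in turn, because ++ follows next instead of moving to the adjacent address.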

Writing traversal methods for classes (TreeNode) that are designed to be inside (arbitrary) containers

I have a class that (when simplified) looks like this:
class TreeNode
{
ptrdiff_t sibling_offset, first_child_offset;
public:
long whatever;
// ...
};
The tree nodes must contain offsets instead of pointers, because it needs to be possible for them to be embedded in containers (like std::vector) that may reallocate their storage when necessary, without having to spend time re-linking all the nodes.
Now, if I have a suitably defined class TreeIterator<Iter> (perhaps defined as a friend of TreeNode) whose job it is to iterate over a TreeNode's children, then any STL-style client of my class should be able to use it to iterate over the children of a node in a standard, STL fashion:
typedef std::vector<TreeNode> Tree;
Tree tree = ...;
TreeIterator<Tree::iterator> const root = tree.begin();
for (TreeIterator<Tree::iterator> i = root->begin(); i != root->end(); ++i)
{
foo(i->whatever);
process_further(i);
}
The trouble is, root->begin() is impossible because the TreeNode doesn't know anything about the container it's in.
(And it shouldn't! The only thing it cares about is that the container has suitable iterators.)
And yet I (the author of TreeNode) am the only one who could possibly know how to iterate over its children.
How do I resolve this issue, without restricting the type of the container that a TreeNode may be stored in?
Obviously this is easy if I force the user to use std::vector, but he should be free to use any arbitrary STL-compliant container.
You just define functions begin() and end() in TreeNode. And then use them in your code.
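class TreeNode {
...
    std::vector<TreeNode>::iterator begin() { return children.begin(); }
    std::vector<TreeNode>::iterator end() { return children.end(); }
...
private:
    std::vector<TreeNode> children;  // needs <vector>; a vector of an incomplete type is allowed since C++17
};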
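A small sketch of how client code might use those member functions for a recursive traversal. It assumes, as the answer does, that each TreeNode owns its children in a std::vector<TreeNode> (relying on C++17's support for a vector of an incomplete type), which is a different layout from the offset scheme in the question; add_child and sum_subtree are hypothetical helpers:

```cpp
#include <cassert>
#include <vector>

class TreeNode {
public:
    long whatever = 0;

    explicit TreeNode(long w) : whatever(w) {}
    // Returns the new child; the reference is invalidated by the next add_child.
    TreeNode& add_child(long w) { children.emplace_back(w); return children.back(); }

    std::vector<TreeNode>::iterator begin() { return children.begin(); }
    std::vector<TreeNode>::iterator end() { return children.end(); }

private:
    std::vector<TreeNode> children;
};

// Client code: sums `whatever` over a whole subtree without knowing
// how the children are stored.
long sum_subtree(TreeNode& node) {
    long total = node.whatever;
    for (TreeNode& child : node)   // range-for uses TreeNode::begin()/end()
        total += sum_subtree(child);
    return total;
}
```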

Copy constructor for a generic doubly-linked list

template <class T>
class list
{
public:
//stuff
list(const list &cctorList); //cctor
private:
struct node
{
node *next;
node *previous;
T *info;
};
node *firstNode; //pointer to the first node (NULL if none)
node *lastNode; //pointer to the last node (NULL if none)
};
I'm now trying to define list(const list &cctorList); //cctor but I'm running into trouble.
Here's what I have so far:
template <class T>
list<T>::list(const list &cctorList)
{
node *another = new node;
firstNode = another;
another->previous = NULL;
another->info = new T(*(cctorList->info));
// ...
}
Is everything up to this point correct? Is there a way for me to recursively assign another->next? Also, is there an easier way to accomplish this using iterators?
You should be using std::list. In fact, you should probably be using std::vector, because it is faster for most practical purposes even when you don't need random access (std::list is only faster if the objects are really large or really expensive to copy or move).
new T(*(cctorList->info)); won't compile, because cctorList (a const list&) has no operator-> and no info member either.
The copy constructor is best implemented in terms of the other, more primitive operations like push_back and iteration. So implement those first, and then the copy constructor becomes:
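template <class T>
list<T>::list(const list &cctorList)
    : firstNode(NULL), lastNode(NULL)   // start empty before appending
{
    // Needs <algorithm> and <iterator>. back_inserter takes the container by
    // reference and calls push_back, so list must define value_type and push_back.
    std::copy(cctorList.begin(), cctorList.end(), std::back_inserter(*this));
}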
In fact I'd just template that constructor:
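template <class Collection> list(const Collection &cctorList)
(declared inside the class, where T is already the class's parameter; the body remains the same). That allows copying from any other collection whose elements can be implicitly converted to T. Note, however, that a constructor template is never a copy constructor, so you still need to declare list(const list&) itself; otherwise the compiler generates its implicit member-wise copy constructor, which copies only the firstNode/lastNode pointers and makes both lists share the same nodes.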
The actual data should be held by value. I.e. the node should be defined as
struct node
{
node *next;
node *previous;
T info;
};
You are copying the value anyway, so you don't need two separate allocations for node and T when one will do.
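A self-contained sketch of that advice: the node holds info by value, push_back and a forward const_iterator are written first, and both the real copy constructor and the converting constructor template are then one-liners over std::copy and std::back_inserter. Copy assignment is deliberately deleted to keep it short, and the nested const_iterator is an assumption of this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

template <class T>
class list {
    struct node {
        node* next;
        node* previous;
        T info;                     // held by value: one allocation per element
    };
    node* firstNode = nullptr;      // null if the list is empty
    node* lastNode = nullptr;

public:
    using value_type = T;           // required by std::back_insert_iterator

    struct const_iterator {         // forward-only is enough for std::copy
        using iterator_category = std::forward_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = const T*;
        using reference = const T&;

        const node* n;
        const T& operator*() const { return n->info; }
        const_iterator& operator++() { n = n->next; return *this; }
        const_iterator operator++(int) { const_iterator old = *this; n = n->next; return old; }
        bool operator==(const const_iterator& o) const { return n == o.n; }
        bool operator!=(const const_iterator& o) const { return n != o.n; }
    };

    list() = default;

    // The real copy constructor: a constructor template never counts as one.
    list(const list& other) {
        std::copy(other.begin(), other.end(), std::back_inserter(*this));
    }

    // Construct from any collection whose elements convert to T.
    template <class Collection>
    explicit list(const Collection& other) {
        std::copy(other.begin(), other.end(), std::back_inserter(*this));
    }

    list& operator=(const list&) = delete;  // assignment omitted in this sketch

    ~list() {
        while (firstNode) { node* next = firstNode->next; delete firstNode; firstNode = next; }
    }

    void push_back(const T& value) {
        node* n = new node{nullptr, lastNode, value};
        if (lastNode) lastNode->next = n; else firstNode = n;
        lastNode = n;
    }

    const_iterator begin() const { return const_iterator{firstNode}; }
    const_iterator end() const { return const_iterator{nullptr}; }
};
```

When the source is another list<T>, overload resolution prefers the non-template copy constructor over the template, because both take a const list<T>&.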
Edit: You say you want to learn concepts. But the most important concept of modern C++ is composition of algorithms. Their definitions are often trivial. Basic implementation of std::copy is just:
template <typename InputIterator, typename OutputIterator>
OutputIterator copy(InputIterator begin, InputIterator end, OutputIterator out) {
    for (; begin != end; ++out, ++begin) *out = *begin;
    return out;  // copy returns the output position past the last element written
}
Now this does not appear to allocate anything. The trick lies in std::back_insert_iterator (the type std::back_inserter returns). An insertion iterator makes this work without preallocating the destination sequence: its operator* and operator++ do nothing but return the iterator itself, and assigning through it calls push_back on the underlying collection. That satisfies the "output iterator" concept, because that concept only guarantees to work when dereference-and-assign and increment are strictly interleaved, and it makes algorithms work on many things from plain old arrays to output streams.
The other part is that while the trivial definitions are correct, they are not the actual definitions used in the library. The actual definitions are optimized and depend on implementation details of the standard library; e.g. an implementation may dispatch on the iterator types and copy trivially copyable elements between contiguous ranges with a single memmove.
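To see that mechanism in isolation, here is a simplified stand-in for std::back_insert_iterator; the name simple_back_inserter is hypothetical and the real class has a few more members. Dereferencing and incrementing return the iterator itself, and only assignment does any work:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

template <class Container>
class simple_back_inserter {
    Container* c;
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit simple_back_inserter(Container& container) : c(&container) {}

    // "*out = value" ends up here: this is the only place anything is stored.
    simple_back_inserter& operator=(const typename Container::value_type& v) {
        c->push_back(v);
        return *this;
    }
    simple_back_inserter& operator*() { return *this; }     // no-op
    simple_back_inserter& operator++() { return *this; }    // no-op
    simple_back_inserter operator++(int) { return *this; }  // no-op
};
```

Passing simple_back_inserter<std::vector<int>>(dst) as the output of std::copy appends every source element to dst, which starts out empty.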
You can write down basic implementations of things from the standard library and test that they work the same if you want to understand them. But you should follow the way the standard library defines things, building them up from simple helper bits like std::copy, std::swap, insertion iterator adapters and such. If you look in the standard library, most functions there are one-liners!
Edit2: Also with all the genericity the standard library provides, there are still bits criticized for not being generic enough. E.g. GotW #84: Monoliths "Unstrung" discusses which methods of std::string could be converted to generic algorithms.

What does it mean for a data structure to be "intrusive"?

I've seen the term intrusive used to describe data structures like lists and stacks, but what does it mean?
Can you give a code example of an intrusive data structure, and how it differs from a non-intrusive one?
Also, why make it intrusive (or, non-intrusive)? What are the benefits? What are the disadvantages?
An intrusive data structure is one that requires help from the elements it intends to store in order to store them.
Let me reword that. When you put something into that data structure, that "something" becomes aware of the fact that it is in that data structure, in some way. Adding the element to the data structure changes the element.
For instance, you can build a non-intrusive binary tree, where each node has a reference to the left and right sub-trees, and a reference to the element value of that node.
Or, you can build an intrusive one where the references to those sub-trees are embedded into the value itself.
An example of an intrusive data structure would be an ordered list of elements that are mutable. If an element changes, the list needs to be reordered, so the list object has to intrude on the privacy of the elements to get their cooperation, i.e. the element has to know which list it is in and inform it of changes.
ORM-systems usually revolve around intrusive data structures, to minimize iteration over large lists of objects. For instance, if you retrieve a list of all the employees in the database, then change the name of one of them, and want to save it back to the database, the intrusive list of employees would be told when the employee object changed because that object knows which list it is in.
A non-intrusive list would not be told, and would have to figure out what changed and how it changed by itself.
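A rough sketch of the mutable ordered list described above, in which each element knows the list it belongs to and tells it when its sort key changes. Employee, EmployeeList and resort are hypothetical names, and the list simply re-sorts everything on each change to keep the example short:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class EmployeeList;

class Employee {
    friend class EmployeeList;
    std::string name_;
    EmployeeList* owner_ = nullptr;   // the element "knows" its container
public:
    explicit Employee(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void set_name(std::string n);     // defined once EmployeeList is complete
};

class EmployeeList {
    std::vector<Employee*> items_;    // kept sorted by name
public:
    void add(Employee& e) {
        e.owner_ = this;
        items_.push_back(&e);
        resort();
    }
    void resort() {
        std::sort(items_.begin(), items_.end(),
                  [](const Employee* a, const Employee* b) { return a->name() < b->name(); });
    }
    const Employee& at(std::size_t i) const { return *items_[i]; }
};

inline void Employee::set_name(std::string n) {
    name_ = std::move(n);
    if (owner_) owner_->resort();     // the element informs its list of the change
}
```

A non-intrusive version would have to rescan its elements to notice that a name had changed.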
In an intrusive container the data itself is responsible for storing the information the container needs. That means that on the one hand the data type needs to be specialized depending on how it will be stored, and on the other hand the data "knows" how it is stored and thus can be optimized slightly better.
Non-intrusive:
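#include <utility>  // std::move

template<typename T>
class LinkedList
{
    struct ListItem
    {
        T Value;
        ListItem* Prev;
        ListItem* Next;
    };
    ListItem* FirstItem = nullptr;
    ListItem* LastItem = nullptr;
    [...]
    ListItem* append(T&& val)
    {
        // The list allocates its own node and moves the value into it.
        ListItem* item = new ListItem{std::move(val), LastItem, nullptr};
        if (LastItem) LastItem->Next = item; else FirstItem = item;
        LastItem = item;
        return item;
    }
};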
LinkedList<int> IntList;
Intrusive:
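template<typename T>
class LinkedList
{
    T* FirstItem = nullptr;
    T* LastItem = nullptr;
    [...]
    T* append(T&& val)
    {
        // T itself must provide the Prev/Next links the list uses.
        T* newValue = new T(std::move(val));
        newValue->Next = nullptr;
        newValue->Prev = LastItem;
        if (LastItem) LastItem->Next = newValue; else FirstItem = newValue;
        LastItem = newValue;
        return newValue;
    }
};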
struct IntListItem
{
int Value;
IntListItem* Prev;
IntListItem* Next;
};
LinkedList<IntListItem> IntList;
Personally I prefer intrusive design for its transparency.
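One practical consequence of the intrusive layout, sketched below with hypothetical names (Task, IntrusiveList, unlink): the list never allocates, so it can link objects the caller already owns, even ones on the stack, and an element can be removed in O(1) because it already holds its neighbours. The flip side is that the element type must carry the links, and an object can sit in only as many such lists at once as it has Prev/Next pairs.

```cpp
#include <cassert>

struct Task {
    int id;
    Task* Prev = nullptr;   // links live inside the element itself
    Task* Next = nullptr;
    explicit Task(int i) : id(i) {}
};

struct IntrusiveList {
    Task* First = nullptr;
    Task* Last = nullptr;

    void push_back(Task& t) {          // links the caller's object; no new/delete
        t.Prev = Last;
        t.Next = nullptr;
        if (Last) Last->Next = &t; else First = &t;
        Last = &t;
    }
    void unlink(Task& t) {             // O(1): no search for t is needed
        if (t.Prev) t.Prev->Next = t.Next; else First = t.Next;
        if (t.Next) t.Next->Prev = t.Prev; else Last = t.Prev;
        t.Prev = t.Next = nullptr;
    }
};
```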