Linked lists example - Why are they used? - C++

I was looking at a code block on how to get interface information for Unix / iOS / Mac OS X (IP address, interface names, etc.), and wanted to understand more about why linked lists are used. I'm not a full-time programmer, but I can code and am always trying to learn. I understand basic C/C++ but have never had to use linked lists.
I'm trying to learn OS X and iOS development and was trying to get network interface information and came across this:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/getifaddrs.3.html
If I understand this correctly, it appears a linked list is used to link a bunch of structs together for each interface. Why is a linked list used in this situation? How come the structs aren't just created and stored in an array?
Thanks

Linked list algorithms are very nice when you don't know how many elements are going to be in the list when you get started, or if you may add or remove elements over time. Linked lists are especially powerful if you want to add or remove elements anywhere other than the end of the list. Linked lists are very common in Unix. Probably the best place to research is Wikipedia, which discusses the advantages, disadvantages, and other details. But the primary lesson is that linked lists are very good for dynamic data structures, while arrays tend to be better when things are static.
Network interfaces may feel very static if you think of them as "network cards," but they're used for many other things like VPN connections and can change quite often.
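For context, the list that getifaddrs() hands back is typically walked like this (a minimal sketch based on the man page linked above; error handling is mostly elided):

#include <ifaddrs.h>
#include <cstdio>

int main() {
    struct ifaddrs *ifap = nullptr;
    if (getifaddrs(&ifap) != 0)
        return 1;                 // could not enumerate interfaces

    // Walk the singly linked list; each node describes one address of one
    // interface, so the same interface name can appear more than once.
    for (struct ifaddrs *ifa = ifap; ifa != nullptr; ifa = ifa->ifa_next)
        std::printf("%s\n", ifa->ifa_name);

    freeifaddrs(ifap);            // one call frees the whole list
    return 0;
}

Note how the caller never needs to know the element count up front; the list simply ends when ifa_next is NULL.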

[...] and wanted to understand more about why linked lists are used. I'm not a full-time programmer, but I can code and am always trying to learn. I understand basic C/C++ but have never had to use linked lists.
Linked lists are actually an extremely simple data structure. They come in a few varieties but the overall concept is just to allocate nodes and link them together through indices or pointers, like so:
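For instance, a singly linked, pointer-based version might look like this (a minimal sketch; the node layout and helper names are illustrative):

struct Node {
    int   value;
    Node* next;   // link to the following node, nullptr at the tail
};

// Insert a new node right after 'pos' in O(1): only two links change,
// no matter how long the list is.
Node* insert_after(Node* pos, int value) {
    Node* n = new Node{value, pos->next};
    pos->next = n;
    return n;
}

// Unlink and destroy the node following 'pos', also in O(1).
// Assumes a node actually follows 'pos'.
void erase_after(Node* pos) {
    Node* victim = pos->next;
    pos->next = victim->next;
    delete victim;
}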
Why is a linked list used in this situation?
Linked lists have some interesting properties, one of which is shown above: constant-time removals and insertions in the middle.
How come the structs aren't just created and stored in an array?
They actually could be. The nodes can be stored directly in an array. The point of linking the nodes is to allow things like rapid insertions and removals. An array of elements doesn't offer that flexibility, but if you store an array of nodes which hold indices or pointers to the next (and possibly previous) elements, then you can rearrange the structure, remove things, and insert things into the middle, all in constant time, just by adjusting the links.
The most efficient uses of linked lists often store the nodes contiguously or partially contiguously (e.g., using a free list) and just link them together to allow rapid insertions and removals. You can store the nodes in one big array, like a std::vector, and then link and unlink them through indices. Another interesting property of linked lists is that you can rapidly transfer elements from the middle of one list to another just by changing a couple of pointers.
They also have a property that makes them very efficient to store contiguously when care is paid to allocation: every node is the same size. For example, it can be tricky to represent a bunch of variable-sized buckets efficiently if each one uses its own array-like container, since each would want to allocate a different amount of memory. But if each bucket just stores an index/pointer to a list node, all the nodes for all the buckets can easily live in one giant array.
That said, in C++, linked lists are often misused. In spite of their algorithmic benefits, a lot of that doesn't actually translate to superior performance if the nodes are not allocated in a way that provides spatial locality. Otherwise you can incur a cache miss, and possibly some page faults, accessing every single node.
Nevertheless, used with care about where the nodes go in memory, they can be tremendously useful. Here is one example usage:
In this case, we might have a particle simulation where every particle moves each frame, with collision detection that partitions the screen into grid cells. This avoids quadratic-complexity collision detection, since a particle only needs to check for collisions with other particles in the same cell. A real-world version might store 100x100 grid cells (10,000 grid cells).
However, if we used an array-based data structure like std::vector for all 10,000 grid cells, that would be explosive in memory. On top of that, transferring each particle from one cell to another would be a costly linear-time operation. By utilizing a linked list here (and one that just uses integers into an array for the links), we can just change a few indices (pointers) here and there to transfer a particle from one cell to another as it moves, while the memory usage is quite cheap: 10,000 grid cells means 10,000 32-bit integers, which is about 39 kilobytes, plus 4 bytes of overhead per particle for its link.
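A minimal sketch of that kind of scheme (the names here are hypothetical): each grid cell stores the index of its first particle, and each particle stores the index of the next particle in the same cell, with -1 meaning "none":

#include <vector>

struct Particles {
    std::vector<float> x, y;   // particle positions
    std::vector<int>   next;   // index of the next particle in the same cell, -1 = none
};

std::vector<int> cell_head(100 * 100, -1); // per-cell index of the first particle, -1 = empty

// Move particle p into cell c in O(1): push it onto the cell's list.
void insert(Particles& ps, std::vector<int>& head, int p, int c) {
    ps.next[p] = head[c];
    head[c]    = p;
}

// Remove particle p from cell c by redirecting the one index that points at it.
// Assumes p really is in cell c; the per-cell lists stay short in practice.
void remove(Particles& ps, std::vector<int>& head, int p, int c) {
    int* link = &head[c];
    while (*link != p)
        link = &ps.next[*link];
    *link = ps.next[p];
}

Transferring a particle between cells as it moves is then just remove() from the old cell and insert() into the new one, touching a handful of integers and never calling an allocator.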
Used carefully, linked lists are a tremendously useful structure. However, they can often be misused, since a naive implementation which allocates every single node separately against a general-purpose memory allocator tends to incur cache misses galore, as the nodes end up very fragmented in memory. The usefulness of linked lists tends to be a forgotten detail lately, especially in C++, since the std::list implementation, unless used with a custom allocator, is in that naive cache-misses-galore category. However, the way they're used in operating systems tends to be very efficient, reaping the algorithmic benefits mentioned above without losing locality of reference.

There are various ways to store data. In C++, the first choice is typically std::vector, but there are std::list and other containers; the choice depends on several factors, such as how often and where you want to insert/delete things (vector is great for deleting/adding at the end, but inserting in the middle is bad; a linked list is much cheaper for inserting in the middle, but worse to iterate over).
However, this API is classic C (rather than C++), so we have to have a "variable-length container", and of course we could implement something in C that resembles std::vector (a value that holds the number of elements and a pointer to the actual elements). I'm not sure why the designers DIDN'T do that in this case, but a linked list has the great advantage that it costs nearly nothing to extend it by one more element. If you don't know beforehand how many there will be, that is a real benefit. And my guess is that there aren't enough of these objects to worry about performance as such [the caller can always rearrange it into a more suitable form later].
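For what it's worth, the counted-array alternative alluded to above would be something like this (a sketch, not anything from the real API):

#include <cstddef>

struct ifaddrs;                   // as declared in <ifaddrs.h>

// A C-flavoured "variable-length container": a count plus a pointer to a
// contiguous block. Growing it may force a realloc that moves every element,
// whereas appending one node to a linked list never moves existing nodes.
struct ifaddrs_array {
    std::size_t     count;
    struct ifaddrs *items;
};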

Linked lists are well suited to storing large amounts of data when the number of elements is not known in advance. They are flexible data structures that expand and contract at run time. They also reduce memory waste, because they allocate memory dynamically: when you are finished with an element, it can be deleted and its memory released.

I agree with everyone here about the benefits of linked lists over arrays for dynamic data lengths, but I need to add something.
If the structures allocated by getifaddrs are identical in length, there is no advantage to using a linked list over an array, and in that case I would consider it a bad design.
But if not (and that may well be the case; notice "The ifaddrs structure contains at least the following entries"), an array is not a proper representation for variable-length structures.
Consider this example:
struct ifaddrs
{
    struct ifaddrs  *ifa_next;    /* Pointer to next struct */
    char            *ifa_name;    /* Interface name */
    u_int            ifa_flags;   /* Interface flags */
    struct sockaddr *ifa_addr;    /* Interface address */
    struct sockaddr *ifa_netmask; /* Interface netmask */
    struct sockaddr *ifa_dstaddr; /* P2P interface destination */
    void            *ifa_data;    /* Address specific data */
};
struct ifaddrs_ofothertype
{
    struct ifaddrs ifaddrs;       /* embed the original structure */
    char           blahblah[256]; /* some other data */
};
The function can then return a list of mixed structures, e.g. an ifaddrs_ofothertype* cast to ifaddrs* alongside a plain ifaddrs*, without worrying about the structure length of each element.

If you want to learn iOS, you have to learn pointers and memory management from the ground up. Objective-C is a descendant of the C programming language, but it differs a bit in syntax, especially in method calls and definitions. Before you get into iOS/Mac OS X development you should understand pointers and MVC, and also the core information about the iOS frameworks; then you can become a professional iOS developer.
For that, visit Ray Wenderlich's iOS tutorials.

Related

Which data structure is sorted by insertion and has fast "contains" check?

I am looking for a data structure that preserves the order in which the elements were inserted and offers a fast "contains" predicate. I also need iterators and random access. Performance during insertion or deletion is not relevant. I am also willing to accept overhead in terms of memory consumption.
Background: I need to store a list of objects. The objects are instances of a class called Neuron and stored in a Layer. The Layer object has the following public interface:
class Layer {
public:
    Neuron *neuronAt(const size_t &index) const;
    NeuronIterator begin();
    NeuronIterator end();
    bool contains(const Neuron *const &neuron) const;
    void addNeuron(Neuron *const &neuron);
};
The contains() method is called quite often when the software runs; I've verified that using callgrind. I tried to circumvent some of the calls to contains(), but it is still a hot spot. Now I hope to optimize exactly this method.
I thought of using std::set, using the template argument to provide my own comparator struct. But the Neuron class itself does not give away its position in the Layer. Additionally, I'd like *someNeuronIterator = anotherNeuron to work without screwing up the order.
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); at least I'd have to change it to point to the new copy I created to keep things linearly aligned. Right?
Another idea was to use two data structures in the Layer object: a vector/list for the order, and a map/hash for lookup. But that would contradict my wish for an iterator that allows operator* without a const reference, wouldn't it?
I hope somebody can suggest a data structure or concept that satisfies my needs, or at least give me an idea for an alternative. Thanks!
If this contains check is really where you need the fastest execution, and assuming you can be a little intrusive with the source code, the fastest way to check whether a Neuron belongs to a layer is to simply flag it when you insert it into a layer (e.g., a bit flag).
You have guaranteed O(1) checks at that point to see if a Neuron belongs in a layer and it's also fast at the micro-level.
If there can be numerous layer objects, this can get a little trickier, as you'll need a separate bit for each potential layer a neuron can belong to, unless a Neuron can only belong to a single layer at once. This is reasonably manageable, however, if the number of layers is relatively fixed.
In the latter case, where a Neuron can only belong to one layer at a time, all you need is a backpointer to the Layer. To see if a Neuron belongs to a layer, simply check whether that backpointer points to the layer object.
If a Neuron can belong to multiple layers at once, but not too many at one time, then you could store a little array of backpointers, like so:
struct Neuron
{
    ...
    Layer*  layers[4]; // inline storage; size for whatever usually fits the common case
    Layer** ptr;       // points at 'layers', or at a heap block if more are needed
    int     num_layers;
};
Initialize ptr to point to layers if there are 4 or fewer layers to which the Neuron belongs. If there are more, allocate the array on the free store. In the destructor, free that memory if ptr != layers. You can also optimize away num_layers if the common case is one layer, in which case a null-terminated solution might work better. To see if a Neuron belongs to a layer, simply do a linear search through ptr. That's practically constant time with respect to the number of Neurons, provided they don't belong to a massive number of layers at once.
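The membership test then reduces to a short linear scan over the backpointer array (a sketch based on the struct above; belongs_to is my name for it):

bool belongs_to(const Neuron& n, const Layer* layer) {
    // n.ptr points either at the inline 'layers' array or at a heap block.
    for (int i = 0; i < n.num_layers; ++i)
        if (n.ptr[i] == layer)
            return true;
    return false;
}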
You can also use a vector here, but that might cost you cache hits in the common-case scenarios, since a vector always puts its contents in a separate heap block, even if the Neuron only belongs to one or two layers.
This might be a bit different from what you were looking for with a general-purpose, non-intrusive data structure, but if your performance needs are really skewed towards these kinds of set operations, an intrusive solution is going to be the fastest in general. It's not quite as pretty and couples your element to the container, but hey, if you need max performance...
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); [...]
Yes, but it won't invalidate indices. While not as convenient to use as pointers, if you're working with mass data like the vertices of a mesh or the particles of an emitter, it's common to use indices to avoid the invalidation, and possibly to save an extra 32 bits per entry on 64-bit systems.
Update
Given that Neurons only exist in one Layer at a time, I'd go with the back pointer approach. Seeing if a neuron belongs to a layer becomes a simple matter of checking if the back pointer points to the same layer.
Since there's an API involved, I'd suggest (just because it sounds like you're pushing around a lot of data and have already profiled it) that you focus on an interface which revolves around aggregates (e.g., layers) rather than individual elements (neurons). It'll leave you a lot of room to swap out the underlying representation, as long as your clients aren't performing operations at the level of individual scalar elements.
With the O(1) contains implementation and the unordered requirement, I'd go with a simple contiguous structure like std::vector. However, you do expose yourself to potential invalidation on insertion.
Because of that, if you can, I'd suggest working with indices here. However, that becomes a little unwieldy, since it requires your clients to store both a pointer to the layer a neuron belongs to and its index within that layer (though if you do this, the backpointer becomes unnecessary, as the client is tracking where things belong).
One way to mitigate this is to simply use something like std::vector<Neuron*> or ptr_vector if available. However, that can expose you to cache misses and heap overhead, and if you want to optimize that, this is where a fixed allocator comes in handy. That's a bit of a pain with alignment issues and a bit of a research topic, though, and so far your main goal seems to be this contains check rather than optimizing insertion or sequential access, so I'd start with the std::vector<Neuron*>.
You can get an O(1) contains check and O(1) insert while preserving insertion order. If you are using Java, look at LinkedHashMap. If you are not using Java, look at LinkedHashMap anyway and work out a parallel data structure, or implement it yourself.
It's just a hash map combined with a doubly linked list: the linked list preserves order, and the hash map allows O(1) access. When you insert an element, the map entry points to the linked-list node where your data resides. To look something up, you go to the hash table to find the pointer directly to your linked-list node (not the head), and get the value in O(1). To access elements sequentially, you traverse the linked list.
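In C++, the same idea can be approximated with std::list plus std::unordered_map (a sketch assuming unique, hashable elements; all names are mine):

#include <iterator>
#include <list>
#include <unordered_map>

template <typename T>
class OrderedSet {
    std::list<T> order_;   // preserves insertion order
    std::unordered_map<T, typename std::list<T>::iterator> index_; // O(1) average lookup
public:
    void insert(const T& v) {
        if (index_.count(v)) return;                 // already present
        order_.push_back(v);
        index_.emplace(v, std::prev(order_.end()));
    }
    bool contains(const T& v) const { return index_.count(v) != 0; }
    void erase(const T& v) {
        auto it = index_.find(v);
        if (it == index_.end()) return;
        order_.erase(it->second);                    // O(1) unlink from the list
        index_.erase(it);
    }
    auto begin() const { return order_.begin(); }    // iterate in insertion order
    auto end()   const { return order_.end(); }
};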
A heap sounds like it could be useful to you. It's like a tree, but a new element is inserted at the bottom and then sifts up based on its value. Be aware, though, that a heap is organized for fast access to the minimum or maximum element, not for membership tests, so a contains check still has to search it.
Otherwise, you could store a hash table (a quick way to check whether a neuron is in the table), keyed by the neuron, with the neuron itself and the timestep at which it was inserted as the values (to recover its chronological insertion time).

Arrays vs. Doubly Linked Lists for Queue simulation

I am working on an assignment for school simulating a line of students and multiple open windows at the Registrar's Office.
I have the queue of students down, but someone suggested that I use an array for the windows, together with the queue class we made on our own.
I don't understand why an array would work when there are other variables I want to track about each window besides just the student's time decrementing.
I'm just looking for some direction, or a more in-depth explanation of how it's possible to use an array to store just the time each student spends at a window, as opposed to another doubly linked list.
The way I see it, you've got a variable number of students and a fixed number of windows (buildings don't usually change all that often). If I were to represent this in code, I would use a dynamically sized container (a list, vector, queue, etc.) to hold all the students, and a fixed-size array for the windows. This embodies the intent of the real situation in code, making it less likely that someone else using your code makes a mistake related to the size of the Registrar's Office. Choosing a container type is often all about its intended use!
Thus you can design a class to hold all the windows using a fixed-size array (or even nicer: a template-dictated size, seeing as you're using C++). Then you can write all your other Registrar-related functions against the given size argument and never go out of bounds in your Registrar array.
Lastly: an array holds whatever information you want it to hold. You can have it hold only numbers (like int), but you can also have it hold objects of any type! What I mean to say is: create a Registrar class that holds all the information you want to collect for every individual window. Then create an array that holds Registrar objects. Whenever you access an individual element in the array, you can access all the information of that individual Registrar through that single reference.
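Putting that together, a sketch might look like this (all the names here are illustrative, not from the assignment):

#include <array>
#include <cstddef>
#include <queue>

struct Student {
    int service_time;            // minutes this student needs at a window
};

struct Window {
    bool busy      = false;
    int  time_left = 0;          // time remaining for the current student
    // ...any other per-window bookkeeping you want to track
};

constexpr std::size_t kNumWindows = 5;     // fixed by the building

std::array<Window, kNumWindows> windows;   // fixed-size: windows don't come and go
std::queue<Student> line;                  // dynamic: students do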

Concatenate 2 STL vectors in constant O(1) time

I'll give some context as to why I'm trying to do this, but ultimately the context can be ignored as it is largely a classic Computer Science and C++ problem (which must surely have been asked before, but a couple of cursory searches didn't turn up anything...)
I'm working with (large) real time streaming point clouds, and have a case where I need to take 2/3/4 point clouds from multiple sensors and stick them together to create one big point cloud. I am in a situation where I do actually need all the data in one structure, whereas normally when people are just visualising point clouds they can get away with feeding them into the viewer separately.
I'm using Point Cloud Library 1.6, and on closer inspection its PointCloud class (under <pcl/point_cloud.h> if you're interested) stores all data points in an STL vector.
Now we're back in vanilla CS land...
PointCloud has a += operator for adding the contents of one point cloud to another. So far so good. But this method is pretty inefficient - if I understand it correctly, it 1) resizes the target vector, then 2) runs through all Points in the other vector, and copies them over.
This looks to me like a case of O(n) time complexity, which normally might not be too bad, but is bad news when dealing with at least 300K points per cloud in real time.
The vectors don't need to be sorted or analysed, they just need to be 'stuck together' at the memory level, so the program knows that once it hits the end of the first vector it just has to jump to the start of the second one. In other words, I'm looking for an O(1) vector merging method. Is there any way to do this in the STL? Or is it more the domain of something like std::list::splice?
Note: This class is a pretty fundamental part of PCL, so 'non-invasive surgery' is preferable. If changes need to be made to the class itself (e.g. changing from vector to list, or reserving memory), they have to be considered in terms of the knock on effects on the rest of PCL, which could be far reaching.
Update: I have filed an issue over at PCL's GitHub repo to get a discussion going with the library authors about the suggestions below. Once there's some kind of resolution on which approach to go with, I'll accept the relevant suggestion(s) as answers.
A vector is not a list: it represents a sequence, but with the additional requirement that the elements be stored in contiguous memory. You cannot just bundle two vectors (whose buffers won't be contiguous) into a single vector without moving objects around.
This problem has been solved many times before such as with String Rope classes.
The basic approach is to make a new container type that stores pointers to point clouds. This is like a std::deque except that yours will have chunks of variable size. Unless your clouds chunk into standard sizes?
With this new container, your iterators start in the first chunk, proceed to the end of it, then move into the next chunk. Random access in a container with variable-sized chunks requires a binary search. In fact, such a data structure could be written as a distorted form of a B+ tree.
There is no vector equivalent of splice; there can't be, specifically because of the contiguous-memory layout requirement, which is probably the reason vector was selected in the first place.
There's also no constant-time way to concatenate vectors.
I can think of one (fragile) way to concatenate raw arrays in constant time, but it depends on them being aligned on page boundaries at both the beginning and the end, and then re-mapping them to be adjacent. This is going to be pretty hard to generalise.
There's another way to make something that looks like a concatenated vector, and that's with a wrapper container which works like a deque, and provides a unified iterator and operator[] over them. I don't know if the point cloud library is flexible enough to work with this, though. (Jamin's suggestion is essentially to use something like this instead of the vector, and Zan's is roughly what I had in mind).
No, you can't concatenate two vectors by simply linking them; you actually have to copy them.
However! If you implement move semantics in your element type, you'd probably get significant speed gains, depending on what your elements contain. This won't help if your elements don't contain any non-trivial types.
Further, if you have your vector reserve the needed memory well in advance, that'd also help speed things up by not requiring a resize (which would cause an undesired huge new allocation, possibly having to defragment at that memory size, and then a huge memcpy).
Barring that, you might want to create some kind of mix between linked lists and vectors, with each 'element' of the list being a vector with 10k elements, so you only need to follow a list link once every 10k elements, but it allows you to grow dynamically much more easily, and makes your concatenation a breeze:
#include <deque>
#include <vector>

using element = float;                   // stand-in for a point type
std::deque<std::vector<element>> chunks; // illustration only; roll your own custom type
                                         // (std::list has no operator[], hence deque here)
std::size_t index      = 52403;
std::size_t chunkIndex = index / 10000;  // which 10k-element chunk
std::size_t elemIndex  = index % 10000;  // position within that chunk
// chunks[chunkIndex][elemIndex]  -> still fairly fast lookups
// chunks.push_back(blockOfPoints) -> much faster appending and removing of blocks of points
You will not get this scaling behaviour with a vector, because with a vector you cannot get around the copying, and you cannot copy an arbitrary amount of data in fixed time.
I do not know PointCloud, but if you can use other list types, e.g. a linked list, this behaviour is entirely possible. You might find a linked list implementation that works in your environment and can simply attach the second list to the end of the first, as you imagined.
Take a look at Boost range join at http://www.boost.org/doc/libs/1_54_0/libs/range/doc/html/range/reference/utilities/join.html
This will take two ranges and join them. Say you have vector1 and vector2; you should be able to write:
auto combined = join(vector1, vector2);
Then you can use combined with algorithms, etc., as needed.
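A minimal usage sketch (assuming Boost.Range is available):

#include <boost/range/join.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vector1{1, 2, 3};
    std::vector<int> vector2{4, 5, 6};

    // join() builds a lazy view over both ranges; no elements are copied.
    for (int x : boost::join(vector1, vector2))
        std::cout << x << ' ';   // prints: 1 2 3 4 5 6
}

Bear in mind the result is a view over the two original vectors, so they must stay alive (and not be resized) while the joined range is in use.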
There is no O(1) copy for vector, ever. But you should check two things:
Is the element type trivially copyable (i.e., memcpy-able)?
If so, is your vector implementation leveraging this fact, or is it looping over all 300k elements executing a trivial assignment (or worse, a copy-constructor call) for each element?
What I have seen is that, while both memcpy and an assignment for-loop have O(n) complexity, a solution leveraging memcpy can be much, much faster.
So the problem might be that the vector implementation is suboptimal for trivial types.

Caching a linked list - is it possible?

I know that arrays can fully exploit the caching mechanisms of an x86_64 architecture by fitting into cache lines and because of their sequential nature. A linked list is a series of structs/objects linked together by pointers; is it possible to take advantage of the caching system with such a structure, given that a linked list's objects may be allocated anywhere in memory?
It's true that linked list entries can be anywhere, but they don't have to be "just anywhere". For instance, you can allocate them out of a "zone": allocate a bunch of contiguous entries at one time, string them together into a list of free, contiguous entries, and then parcel them out. Allocate another zone-full as needed. With some not-very-clean tricks you can eventually re-linearize freed entries, and so on.
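A bare-bones zone along those lines might look like this (a sketch; the class and method names are invented, and n is assumed to be at least 1):

#include <cstddef>
#include <vector>

struct Node {
    int   data;
    Node* next;
};

// One contiguous slab of nodes. Freed nodes go onto an internal free list and
// are recycled, so live nodes tend to stay close together in memory.
class NodeZone {
    std::vector<Node> slab_;
    Node* free_ = nullptr;
public:
    explicit NodeZone(std::size_t n) : slab_(n) {
        for (std::size_t i = 0; i + 1 < n; ++i)
            slab_[i].next = &slab_[i + 1];  // string the slab into a free list
        slab_[n - 1].next = nullptr;
        free_ = &slab_[0];
    }
    Node* allocate() {                      // O(1); returns nullptr when the zone is full
        Node* n = free_;
        if (n) free_ = n->next;
        return n;
    }
    void deallocate(Node* n) {              // push back onto the free list
        n->next = free_;
        free_   = n;
    }
};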
Most of the time it's not actually worth going to all this effort, though.
You can have multiple entries per linked list element, i.e. a small array of entries in each element. This allows caching of a few entries whilst still maintaining the dynamic nature of the list.
This is an unrolled linked list, and it sort of gives you what you're after.
You can have one element of the linked list contain more than one data entry.
For example, consider the struct below:
struct myll {
    int data[16];           // up to 16 entries per node
    char valid[16 / 8];     // bitmask marking which entries are in use
    struct myll* next;
};
This way, you make the granularity 16 entries per node. You still have the option to add more than 16 entries by chaining on another node, and to delete entries using the "valid" bitmask. It's a bit painful to implement, but it depends on your requirements.
I believe a somewhat similar mechanism is used by some file systems.

Vector versus dynamic array, does it make a big difference in speed?

I am writing some code for solving vehicle routing problems. One important decision is how to encode the solutions. A solution contains several routes, one for each vehicle. Each route has a customer-visiting sequence, a load, and a length.
To perform modifications on a solution, I also need to be able to find certain information quickly.
For example,
Which route is a customer in?
What customers does a route have?
How many nodes are there in a route?
What nodes are in front of or behind a node?
Now, I am thinking to use the following structure to keep a solution.
struct Sol
{
    vector<short> nextNode;  // the next node of each node
    vector<short> preNode;   // the preceding node of each node
    vector<short> startNode;
    vector<short> rutNum;
    vector<short> rutLoad;
    vector<float> rutLength;
    vector<short> rutSize;
};
The common size of each vector is instance-dependent, between 200 and 2000.
I heard it is possible to use a dynamic array for this job, but a dynamic array seems more complicated to me: one has to allocate the memory and release it. My question here is twofold.
How would I use a dynamic array for the same purpose? How should the struct or class be defined so that memory allocation and release are easily taken care of?
Will using a dynamic array be faster than using a vector, assuming the solution structure needs to be accessed millions of times?
It is highly unlikely that you'll see an appreciable performance difference between a dynamic array and a vector since the latter is essentially a very thin wrapper around the former. Also bear in mind that using a vector would be significantly less error-prone.
It may, however, be the case that some information is better stored in a different type of container altogether, e.g. in an std::map. The following might be of interest: What are the complexity guarantees of the standard containers?
It is important to give some thought to the type of container that gets used. However, when it comes to micro-optimizations (such as vector vs dynamic array), the best policy is to profile the code first and only focus on parts of the code that prove to be real -- rather than assumed -- bottlenecks.
It's quite possible that vector's code is actually better and more performant than dynamic array code you would write yourself. Only if profiling shows significant time spent in vector would I consider writing my own error-prone replacement. See also Dynamically allocated arrays or std::vector
I'm using MSVC and the implementation looks to be as quick as it can be.
Accessing the array via operator [] is:
return (*(this->_Myfirst + _Pos));
Which is as quick as you are going to get with dynamic memory.
The only overhead you are going to get is in the memory use of a vector: it keeps a pointer to the start of the vector, the end of the vector, and the end of the current sequence. That is only two more pointers than you would need with a plain dynamic array, and you are only creating 200-2000 of these; I doubt memory is going to be that tight.
I am sure the other STL implementations are very similar. I would absorb the minor cost of vector storage and use vectors in your project.