I was looking at a code example for getting interface information on Unix / iOS / Mac OS X (IP addresses, interface names, etc.), and wanted to understand more about why linked lists are used. I'm not a full-time programmer, but I can code and am always trying to learn. I understand basic C/C++ but have never had any experience with, or had to use, linked lists.
I'm trying to learn OS X and iOS development and was trying to get network interface information and came across this:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/getifaddrs.3.html
If I understand this correctly, it appears a linked list is used to link a bunch of structs together for each interface. Why is a linked list used in this situation? How come the structs aren't just created and stored in an array?
Thanks
Linked list algorithms are very nice when you don't know how many elements are going to be in the list when you get started, or if you may add or remove elements over time. Linked lists are especially powerful if you want to add or remove elements anywhere other than the end of the list, and they are very common in Unix. Probably the best place to research them is Wikipedia, which discusses the advantages, disadvantages, and other details. But the primary lesson is that linked lists are very good for dynamic data structures, while arrays tend to be better when things are static.
Network interfaces may feel very static if you think of them as "network cards," but they're used for many other things like VPN connections and can change quite often.
[...] and wanted to understand more about why linked lists are used. I'm not a full-time programmer, but I can code and am always trying to learn. I understand basic C/C++ but have never had any experience with, or had to use, linked lists.
Linked lists are actually an extremely simple data structure. They come in a few varieties but the overall concept is just to allocate nodes and link them together through indices or pointers, like so:
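For example, a minimal sketch of a node in C++ (names are made up; real code would use a template or a concrete payload type):

// A node owns its payload and a link to the next node.
struct Node {
    int value;    // the payload (could be any struct)
    Node* next;   // pointer to the next node; nullptr marks the end
};

// The same idea with indices instead of pointers, with all nodes kept in one array:
struct IndexNode {
    int value;
    int next;     // index of the next node in the array, -1 means "none"
};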
Why is a linked list used in this situation?
Linked lists have some interesting properties, one of which is hinted at by the links in the sketch above: constant-time removals and insertions from/to the middle.
How come the structs aren't just created and stored in an array?
They actually could be. As in the sketch above, the nodes can be stored directly in an array. The point of linking the nodes is to allow things like rapid insertions and removals. A plain array of elements doesn't offer that flexibility, but if you store an array of nodes which hold indices or pointers to the next (and possibly previous) elements, then you can rearrange the structure, remove things, and insert things into the middle, all in constant time, just by playing with the links.
The most efficient uses of linked lists often store the nodes contiguously or partially contiguously (e.g., using a free list) and just link them together to allow rapid insertions and removals. You can store the nodes in one big array, like a std::vector, and then link and unlink them through indices. Another interesting property of linked lists is that you can rapidly transfer elements from the middle of one list to another by just changing a couple of pointers.
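For instance, std::list exposes exactly that kind of transfer as splice, which relinks nodes instead of copying elements. A minimal sketch:

#include <iostream>
#include <iterator>
#include <list>

int main() {
    std::list<int> a{1, 2, 3, 4};
    std::list<int> b{10, 20};

    // Move the single element "3" out of the middle of a into b: just relinks nodes.
    auto it = std::next(a.begin(), 2);
    b.splice(b.end(), a, it);

    // Move all of b onto the end of a in constant time.
    a.splice(a.end(), b);

    for (int x : a) std::cout << x << ' ';   // prints: 1 2 4 10 20 3
}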
They also have a property which makes them very efficient to store contiguously when care is paid to their allocation: every node is the same size. As an example, it can be tricky to represent a bunch of variable-sized buckets efficiently if each one uses its own array-like container, since each would want to allocate a different amount of memory. However, if each bucket just stores an index/pointer to a list node, all the nodes for all the buckets can easily be stored in one giant array.
That said, in C++, linked lists are often misused. In spite of their algorithmic benefits, a lot of that doesn't actually translate into superior performance if the nodes are not allocated in a way that provides spatial locality; otherwise you can incur a cache miss, and possibly some page faults, accessing every single node.
Nevertheless, used with care about where the nodes go in memory, they can be tremendously useful. Here is one example usage:
In this case, we might have a particle simulation where every single particle moves around each frame, with collision detection for which we partition the screen into grid cells. This lets us avoid quadratic-complexity collision detection, since a particle only needs to check for collisions with other particles in the same cell. A real-world version might store 100x100 grid cells (10,000 grid cells).
However, if we used an array-based data structure like std::vector for all 10,000 grid cells, that would be explosive in memory. On top of that, transferring each particle from one cell to another would be a costly linear-time operation. By utilizing a linked list here (one that just uses integer indices into an array for the links), we can change a few indices here and there to transfer a particle from one cell to another as it moves, while the memory usage stays cheap (10,000 grid cells means 10,000 32-bit integers, which translates to about 39 kilobytes, plus 4 bytes of overhead per particle for the link).
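A rough sketch of that layout (names are made up): each cell stores the index of its first particle, and each particle stores the index of the next particle in the same cell, so moving a particle between cells is just index rewiring.

#include <vector>

// Per-cell singly linked lists of particles, stored as 32-bit indices
// into one contiguous particle array.
struct Particle {
    float x, y;
    int next = -1;                // index of the next particle in the same cell, -1 = none
};

struct Grid {
    std::vector<int> head;        // one head index per cell, -1 = empty cell
    std::vector<Particle> parts;  // all particles, stored contiguously

    explicit Grid(int cells) : head(cells, -1) {}

    void insert(int cell, int p) {       // O(1): push particle p onto the cell's list
        parts[p].next = head[cell];
        head[cell] = p;
    }

    void remove(int cell, int p) {       // unlink p (walks the cell's short list)
        int* link = &head[cell];
        while (*link != p) link = &parts[*link].next;
        *link = parts[p].next;
    }

    void move(int from, int to, int p) { // transfer p between cells: just index rewiring
        remove(from, p);
        insert(to, p);
    }
};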
Used carefully, linked lists are a tremendously useful structure. However, they can often be misused, since a naive implementation which allocates every single node separately through a general-purpose memory allocator tends to incur cache misses galore, as the nodes end up very fragmented in memory. The usefulness of linked lists tends to be forgotten lately, especially in C++, since the std::list implementation, unless used with a custom allocator, is in that naive, cache-misses-galore category. However, the way they're used in operating systems tends to be very efficient, reaping the algorithmic benefits mentioned above without losing locality of reference.
There are various ways to store data. In C++, the first choice is typically std::vector, but there are std::list and other containers; the choice will depend on several factors, such as how often and where you want to insert/delete things (a vector is great for deleting/adding at the end, but inserting in the middle is bad; a linked list costs much less to insert in the middle, but is worse to iterate over).
However, the API for this function is classic C (rather than C++), so we have to have a "variable length container", and of course we could implement something in C that resembles std::vector (a structure holding the number of elements and a pointer to the actual elements). I'm not sure why the designers DIDN'T do that in this case, but a linked list has the great advantage that it is near zero cost to extend it with one more element. If you don't know beforehand how many there will be, this is a good benefit. And my guess is that there aren't enough of these objects to worry about performance as such [the caller can always rearrange the result into a more suitable form later].
Linked lists are good data structures for storing a large amount of data when the number of elements is not known in advance. They are very flexible, expanding and contracting at run time. They can also reduce wasted memory, because they allocate memory dynamically per element; when you are done with an element, you can delete it and free its memory.
I agree with everyone here about the benefits of a linked list over an array for dynamic-length data, but I need to add something.
If the allocated ifaddrs structures were identical in length, there would be no advantage to using a linked list over an array, and in that case I would consider it bad design.
But if they are not (and maybe this is the case; notice "The ifaddrs structure contains at least the following entries"), an array is not the proper representation for variable-length structures.
Consider this example:
struct ifaddrs
{
struct ifaddrs *ifa_next; /* Pointer to next struct */
char *ifa_name; /* Interface name */
u_int ifa_flags; /* Interface flags */
struct sockaddr *ifa_addr; /* Interface address */
struct sockaddr *ifa_netmask; /* Interface netmask */
struct sockaddr *ifa_dstaddr; /* P2P interface destination */
void *ifa_data; /* Address specific data */
};
struct ifaddrs_ofothertype
{
struct ifaddrs ifaddrs; /* embed the original structure */
char blahblah[256]; /* some other variable */
};
The mentioned function can then return a list of mixed structures, e.g. an ifaddrs_ofothertype* cast to ifaddrs* alongside plain ifaddrs*, without worrying about the structure length of each element.
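For what it's worth, a minimal sketch of how such a list is typically walked (this follows the getifaddrs man page; error handling kept minimal):

#include <ifaddrs.h>
#include <stdio.h>

int main(void) {
    struct ifaddrs *list = NULL;
    if (getifaddrs(&list) == -1)
        return 1;

    /* Walk the linked list through ifa_next; a node may really be part of a
       larger allocation, but only the common ifaddrs head is needed here. */
    for (struct ifaddrs *p = list; p != NULL; p = p->ifa_next)
        printf("%s\n", p->ifa_name);

    freeifaddrs(list);
    return 0;
}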
If you want to learn iOS, you have to learn pointers and memory management from the very base. Objective-C descends from the C programming language, but it differs a bit in syntax, especially in method calling and definition. Before you get into iOS/Mac OS X you should understand pointers, MVC, and the core information of the iOS frameworks; then you can become a professional iOS developer.
For that, visit the RayWenderLich iOS Tutorials.
I am designing a graph in C++ using a hash table for its elements. The hash table uses open addressing, and the graph has no more than 50,000 edges. I also implemented Prim's algorithm to find the minimum spanning tree of the graph. My implementation creates storage for the following data:
A table named Q, in which all the nodes are put at the beginning. In every loop a node is visited, and at the end of the loop it is deleted from Q.
A table named Key, with one entry per node. The key is changed when necessary (at least once per loop).
A table named Parent, with one entry per node. In each loop, a new element is inserted into this table.
A table named A. The program stores here the final edges of the minimum spanning tree; it is the table that is returned.
What would be the most efficient data structure to use for creating these tables, assuming the graph has 50,000 edges?
Can I use arrays?
I fear that the elements of every array would be way too many. I don't even consider using linked lists, of course, because accessing each element would take too much time. Could I use hash tables?
But again, the elements are way too many. My algorithm works well for graphs consisting of a few nodes (10 or 20), but I am sceptical about the situation where the graphs consist of 40,000 nodes. Any suggestion is much appreciated.
(Since comments were getting a bit long): The only part of the problem that seems to get ugly at very large sizes is that every node not yet selected has a cost, and you need to find the one with the lowest cost at each step, but executing each step reduces the cost of a few effectively random nodes.
A priority queue is perfect when you want to keep track of lowest cost. It is efficient for removing the lowest cost node (which you do at each step). It is efficient for adding a few newly reachable nodes, as you might on any step. But in the basic design, it does not handle reducing the cost of a few nodes that were already reachable at high cost.
So (having frequent need for a more functional priority queue), I typically create a heap of pointers to objects, and in each object keep an index of its heap position. The heap methods call back into the object to inform it whenever its index changes. The heap also exposes some methods that would normally be internal only, such as the one that is perfect for efficiently fixing the heap when an existing element has its cost reduced.
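A rough sketch of that shape (all names are made up; real versions add more features):

#include <cstddef>
#include <vector>

// A binary min-heap of pointers. Each item is told its current heap position,
// so that a decrease-key operation can fix the heap starting from the right place.
struct Item {
    double cost;
    int heapIndex = -1;   // maintained by the heap
};

class IndexedHeap {
    std::vector<Item*> h;

    void place(std::size_t i, Item* it) { h[i] = it; it->heapIndex = (int)i; }

    void siftUp(std::size_t i) {
        Item* it = h[i];
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (h[parent]->cost <= it->cost) break;
            place(i, h[parent]);
            i = parent;
        }
        place(i, it);
    }

    void siftDown(std::size_t i) {
        Item* it = h[i];
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= h.size()) break;
            if (child + 1 < h.size() && h[child + 1]->cost < h[child]->cost) ++child;
            if (it->cost <= h[child]->cost) break;
            place(i, h[child]);
            i = child;
        }
        place(i, it);
    }

public:
    bool empty() const { return h.empty(); }

    void push(Item* it) {
        h.push_back(it);
        place(h.size() - 1, it);
        siftUp(h.size() - 1);
    }

    Item* popMin() {                       // remove and return the lowest-cost item
        Item* top = h[0];
        Item* last = h.back();
        h.pop_back();
        if (!h.empty()) { place(0, last); siftDown(0); }
        top->heapIndex = -1;
        return top;
    }

    // The "normally internal" call made public: after reducing it->cost,
    // restore the heap property starting from the item's known position.
    void costDecreased(Item* it) { siftUp((std::size_t)it->heapIndex); }
};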
I just reviewed the documentation for the std one
http://en.cppreference.com/w/cpp/container/priority_queue
to see if the features I always want to add were there in some form I hadn't noticed before (or had been added in some recent C++ version). So far as I can tell, NO. Most real world uses of priority queue (certainly all of mine) need minor extra features that I have no clue how to tack onto the standard version. So I have needed to rewrite it from scratch including the extra features. But that isn't actually hard.
The method I use has been reinvented by many people (I was doing this in C in the 70's, and wasn't first). A quick Google search found one of many places where my approach is described in more detail than I have given here.
http://users.encs.concordia.ca/~chvatal/notes/pq.html#heap
I'll give some context as to why I'm trying to do this, but ultimately the context can be ignored as it is largely a classic Computer Science and C++ problem (which must surely have been asked before, but a couple of cursory searches didn't turn up anything...)
I'm working with (large) real time streaming point clouds, and have a case where I need to take 2/3/4 point clouds from multiple sensors and stick them together to create one big point cloud. I am in a situation where I do actually need all the data in one structure, whereas normally when people are just visualising point clouds they can get away with feeding them into the viewer separately.
I'm using Point Cloud Library 1.6, and on closer inspection its PointCloud class (under <pcl/point_cloud.h> if you're interested) stores all data points in an STL vector.
Now we're back in vanilla CS land...
PointCloud has a += operator for adding the contents of one point cloud to another. So far so good. But this method is pretty inefficient - if I understand it correctly, it 1) resizes the target vector, then 2) runs through all Points in the other vector, and copies them over.
This looks to me like a case of O(n) time complexity, which normally might not be too bad, but is bad news when dealing with at least 300K points per cloud in real time.
The vectors don't need to be sorted or analysed, they just need to be 'stuck together' at the memory level, so the program knows that once it hits the end of the first vector it just has to jump to the start location of the second one. In other words, I'm looking for an O(1) vector merging method. Is there any way to do this in the STL? Or is it more the domain of something like std::list::splice?
Note: This class is a pretty fundamental part of PCL, so 'non-invasive surgery' is preferable. If changes need to be made to the class itself (e.g. changing from vector to list, or reserving memory), they have to be considered in terms of the knock on effects on the rest of PCL, which could be far reaching.
Update: I have filed an issue over at PCL's GitHub repo to get a discussion going with the library authors about the suggestions below. Once there's some kind of resolution on which approach to go with, I'll accept the relevant suggestion(s) as answers.
A vector is not a list; it represents a sequence, but with the additional requirement that elements be stored in contiguous memory. You cannot just bundle two vectors (whose buffers won't be contiguous) into a single vector without moving objects around.
This problem has been solved many times before, for example with string rope classes.
The basic approach is to make a new container type that stores pointers to point clouds. This is like a std::deque except that yours will have chunks of variable size. Unless your clouds chunk into standard sizes?
With this new container, your iterators start in the first chunk, proceed to its end, then move into the next chunk. Doing random access in such a container with variable-sized chunks requires a binary search. In fact, such a data structure could be written as a distorted form of a B+ tree.
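A hedged sketch of such a wrapper (names are made up): it keeps pointers to the existing vectors, appends a new chunk without copying its elements, and does a binary search over cumulative sizes for random access.

#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// A "joined" view over several vectors: appending a chunk is O(1) in the
// number of elements, and operator[] costs O(log #chunks).
template <typename T>
class ChunkedView {
    std::vector<std::shared_ptr<std::vector<T>>> chunks;
    std::vector<std::size_t> ends;   // cumulative element counts, one per chunk

public:
    void append(std::shared_ptr<std::vector<T>> chunk) {
        std::size_t prev = ends.empty() ? 0 : ends.back();
        ends.push_back(prev + chunk->size());
        chunks.push_back(std::move(chunk));          // no element copies
    }

    std::size_t size() const { return ends.empty() ? 0 : ends.back(); }

    T& operator[](std::size_t i) {
        // Find the first chunk whose cumulative end exceeds i.
        std::size_t c = std::upper_bound(ends.begin(), ends.end(), i) - ends.begin();
        std::size_t offset = (c == 0) ? 0 : ends[c - 1];
        return (*chunks[c])[i - offset];
    }
};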
There is no vector equivalent of splice - there can't be, specifically because of the memory layout requirements, which are probably the reason it was selected in the first place.
There's also no constant-time way to concatenate vectors.
I can think of one (fragile) way to concatenate raw arrays in constant time, but it depends on them being aligned on page boundaries at both the beginning and the end, and then re-mapping them to be adjacent. This is going to be pretty hard to generalise.
There's another way to make something that looks like a concatenated vector, and that's with a wrapper container which works like a deque, and provides a unified iterator and operator[] over them. I don't know if the point cloud library is flexible enough to work with this, though. (Jamin's suggestion is essentially to use something like this instead of the vector, and Zan's is roughly what I had in mind).
No, you can't concatenate two vectors by a simple link, you actually have to copy them.
However! If you implement move-semantics in your element type, you'd probably get significant speed gains, depending on what your element contains. This won't help if your elements don't contain any non-trivial types.
Further, if you have your vector reserve the needed memory well in advance, that will also help speed things up by not requiring a resize (which would cause an undesired huge new allocation, possibly having to defragment at that memory size, and then a huge memcpy).
Barring that, you might want to create some kind of mix between linked lists and vectors, with each 'element' of the list being a vector of, say, 10k elements, so you only need to follow a list link once every 10k elements; this lets you grow dynamically much more easily and makes your concatenation a breeze.
std::list<std::vector<element>> forIllustrationOnly; // just roll your own custom type
const std::size_t chunkSize = 10000;                 // elements per inner vector
std::size_t index = 52403;
std::size_t listIndex = index / chunkSize;           // which inner vector
std::size_t vectorIndex = index % chunkSize;         // position inside that vector
// forIllustrationOnly[listIndex][vectorIndex]   -> still fairly fast lookups
// forIllustrationOnly.push_back(vector_of_points) -> much faster appending and removing of blocks of points
You will not get this scaling behaviour with a vector, because with a vector you do not get around the copying, and you cannot copy an arbitrary amount of data in fixed time.
I do not know PointCloud, but if you can use other list types, e.g. a linked list, this behaviour is quite possible. You might find a linked list implementation which works in your environment, and which can simply stick the second list onto the end of the first list, as you imagined.
Take a look at Boost range join at http://www.boost.org/doc/libs/1_54_0/libs/range/doc/html/range/reference/utilities/join.html
This will take 2 ranges and join them. Say you have vector1 and vector2.
You should be able to write
auto combined = boost::join(vector1, vector2);
Then you can use combined with algorithms, etc as needed.
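A minimal sketch of how that could look, assuming Boost.Range is available (boost::join returns a non-owning view over both ranges rather than a new vector):

#include <boost/range/join.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vector1{1, 2, 3};
    std::vector<int> vector2{4, 5, 6};

    // The joined range is a view: no elements are copied.
    auto combined = boost::join(vector1, vector2);

    for (int x : combined)
        std::cout << x << ' ';   // prints: 1 2 3 4 5 6
}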
There is no O(1) copy for a vector, ever, but you should check:
Is the element type trivially copyable (i.e., can it be copied with memcpy)?
If so, is my vector implementation leveraging this fact, or is it naively looping over all 300k elements, executing a trivial assignment (or worse, a copy-constructor call) for each element?
What I have seen is that, while both memcpy and an assignment loop have O(n) complexity, a solution leveraging memcpy can be much, much faster.
So, the problem might be that the vector implementation is suboptimal for trivial types.
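As a quick sanity check, you could assert the point type is trivially copyable and append with a single bulk insert, which a good implementation can lower to one memmove-like copy for such types. A sketch, assuming a simple made-up Point type:

#include <type_traits>
#include <vector>

struct Point { float x, y, z; };
static_assert(std::is_trivially_copyable<Point>::value,
              "bulk copies of Point can use memcpy/memmove");

void appendCloud(std::vector<Point>& dst, const std::vector<Point>& src) {
    dst.reserve(dst.size() + src.size());           // one allocation, no repeated growth
    dst.insert(dst.end(), src.begin(), src.end());  // still O(n), but a single bulk copy
}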
I know that arrays can fully exploit the caching mechanisms on an x86_64 architecture by fitting into cache lines and because of their sequential nature. A linked list is a series of structs/objects linked together by pointers; is it possible to take advantage of the caching system with such a structure? A linked list's objects may be allocated anywhere in memory.
It's true that linked list entries can be anywhere, but they don't have to be "just anywhere". For instance, you can allocate them out of a "zone". Allocate a bunch of contiguous entries at one time, string them together into a list of "free entries that are contiguous", and then parcel them out. Allocate another zone-full as needed. With some not-very-clean tricks you can eventually re-linearize freed entries, and so on.
Most of the time it's not actually worth going to all this effort, though.
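For what it's worth, a minimal sketch of that "zone" idea (names are made up): nodes come from contiguous blocks, and freed nodes are recycled through a free list.

#include <cstddef>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Hands out Node objects from contiguous blocks so that neighbouring
// allocations tend to sit close together in memory.
class NodeZone {
    static constexpr std::size_t kBlockSize = 1024;   // nodes per contiguous block
    std::vector<std::vector<Node>> blocks;            // the zones themselves
    Node* freeList = nullptr;                         // recycled nodes

public:
    Node* allocate() {
        if (freeList) {                                // reuse a freed node first
            Node* n = freeList;
            freeList = n->next;
            return n;
        }
        if (blocks.empty() || blocks.back().size() == kBlockSize) {
            blocks.emplace_back();
            blocks.back().reserve(kBlockSize);         // fixed capacity: node addresses stay stable
        }
        blocks.back().push_back(Node{});
        return &blocks.back().back();
    }

    void release(Node* n) {                            // push the node onto the free list
        n->next = freeList;
        freeList = n;
    }
};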
You can have multiple entries per linked list element, i.e. a small array of entries in each element. This allows caching of a few entries whilst still maintaining the dynamic nature of the list.
This is an unrolled list and sort of gives you what you're after.
You can have each element of the linked list contain more than one data entry.
For example, consider the struct below:
struct myll {
    int data[16];         /* 16 data entries per node */
    char valid[16/8];     /* bitmap: one valid bit per entry */
    struct myll* next;    /* next node in the list */
};
This way, the granularity is 16 entries per node. However, you still have the option of adding more than 16 entries by chaining another node, and of deleting entries using the "valid" flags. It's a bit painful to implement, but it depends on your requirements.
I guess a somewhat similar mechanism is used by some file systems.
Now I am writing some code for solving vehicle routing problems. To do so, one important decision is how to encode the solutions. A solution contains several routes, one for each vehicle. Each route has a customer-visiting sequence, the load of the route, and the length of the route.
To perform modifications on a solution, I also need to quickly find some information.
For example,
Which route is a customer in?
What customers does a route have?
How many nodes are there in a route?
What nodes are in front of or behind a node?
Now, I am thinking of using the following structure to keep a solution.
struct Sol
{
    vector<short> nextNode;   // the next node of each node
    vector<short> preNode;    // the preceding node of each node
    vector<short> startNode;  // the first node of each route
    vector<short> rutNum;     // the route each node belongs to
    vector<short> rutLoad;    // the load of each route
    vector<float> rutLength;  // the length of each route
    vector<short> rutSize;    // the number of nodes in each route
};
The common size of each vector is instance dependent, between 200-2000.
I heard it is possible to use dynamic arrays to do this job, but it seems to me that dynamic arrays are more complicated: one has to allocate the memory and release it. My question here is twofold.
How would I use dynamic arrays to realize the same purpose? How should the struct or class be defined so that memory allocation and release can easily be taken care of?
Will using dynamic arrays be faster than using vectors, assuming the solution structure needs to be accessed millions of times?
It is highly unlikely that you'll see an appreciable performance difference between a dynamic array and a vector since the latter is essentially a very thin wrapper around the former. Also bear in mind that using a vector would be significantly less error-prone.
It may, however, be the case that some information is better stored in a different type of container altogether, e.g. in an std::map. The following might be of interest: What are the complexity guarantees of the standard containers?
It is important to give some thought to the type of container that gets used. However, when it comes to micro-optimizations (such as vector vs dynamic array), the best policy is to profile the code first and only focus on parts of the code that prove to be real -- rather than assumed -- bottlenecks.
It's quite possible that vector's code is actually better and more performant than dynamic array code you would write yourself. Only if profiling shows significant time spent in vector would I consider writing my own error-prone replacement. See also Dynamically allocated arrays or std::vector
I'm using MSVC and the implementation looks to be as quick as it can be.
Accessing the array via operator [] is:
return (*(this->_Myfirst + _Pos));
Which is as quick as you are going to get with dynamic memory.
The only overhead you are going to get is in the memory use of a vector: it seems to keep a pointer to the start of the vector, the end of the allocated storage, and the end of the current sequence. This is only 2 more pointers than you would need if you were using a dynamic array. You are only creating 200-2000 of these; I doubt memory is going to be that tight.
I am sure the other STL implementations are very similar. I would absorb the minor cost of vector storage and use vectors in your project.