Caching a linked list - is it possible? - c++

I know that arrays can fully exploit the caching mechanisms of an x86_64 architecture by fitting into cache lines and because of their sequential nature. A linked list is a series of structs/objects linked together by pointers; is it possible to take advantage of the caching system with such a structure? A linked list's nodes may be allocated anywhere in memory.

It's true that linked list entries can be anywhere, but they don't have to be "just anywhere". For instance, you can allocate them out of a "zone". Allocate a bunch of contiguous entries at one time, string them together into a list of "free entries that are contiguous", and then parcel them out. Allocate another zone-full as needed. With some not-very-clean tricks you can eventually re-linearize freed entries, and so on.
Most of the time it's not actually worth going to all this effort, though.
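For concreteness, here is one minimal sketch of the zone idea, assuming a simple int payload; the names and the zone size are mine, not the answer's:

#include <cstddef>
#include <vector>

struct Node {
    int   value;
    Node* next;
};

class Zone {
public:
    // Parcel out one node, allocating a fresh contiguous zone when empty.
    Node* acquire() {
        if (!free_) grow();
        Node* n = free_;
        free_ = free_->next;
        return n;
    }

    // Return a node to the free list for reuse (no free() call).
    void release(Node* n) {
        n->next = free_;
        free_ = n;
    }

private:
    void grow() {
        const std::size_t kZoneSize = 256;  // nodes per contiguous zone
        zones_.emplace_back(kZoneSize);
        // string the fresh contiguous entries together as the new free list
        std::vector<Node>& z = zones_.back();
        for (std::size_t i = 0; i + 1 < z.size(); ++i)
            z[i].next = &z[i + 1];
        z.back().next = free_;
        free_ = &z[0];
    }

    Node* free_ = nullptr;
    std::vector<std::vector<Node>> zones_;  // zone buffers never move, so node addresses stay valid
};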

You can have multiple entries per linked list element, i.e. a small array of entries in each element. This allows caching of a few entries whilst still maintaining the dynamic nature of the list.
This is an unrolled list and sort of gives you what you're after.

You can have one linked-list element contain more than one data entry.
e.g. consider the struct below.
struct myll {
    int data[16];        /* 16 entries per node */
    char valid[16 / 8];  /* one valid bit per entry: 2 bytes = 16 bits */
    struct myll* next;
};
This way, the granularity is 16 entries per node. However, you still have the option to add more than 16 entries by chaining another node, and to delete entries using the "valid" flags. It's a bit painful to implement, but it depends on your requirements.
I guess a somewhat similar mechanism is used in some file systems.

Related

Best STL containers to avoid heap fragmentation

I have a program which analyzes 150,000 files. Valgrind reports no memory leak but the program slows over time.
Some of the problems were related to using std::string too often and mktime taking too much time. (see C++ slows over time reading 70,000 files)
But it still slows down over time. Lotharyx suggested that container usage is causing heap fragmentation.
I read the various flow charts on the pros and cons of the different STL containers but I didn't quite get it.
In the pseudo code below, I'm not sure I've made the right choices to avoid heap fragmentation.
fileList.clear();
scan all disks and build "fileList", a std::set of file paths matching a pattern
// file paths are named by date, e.g. 20160530.051000, so are intrinsically ordered

foreach (filePath in fileList)
{
    if (alreadyHaveFileDetails(filePath))
        continue;
    // otherwise
    collect file details into a fileInfoStruct; // like size, contents, mod time
    fileInfoMap[time_t date] = fileInfoStruct;
}

// fileInfoMap is an ordered collection of information structs about ~100,000 files
// traverse the map in order
foreach (fileInfo in fileInfoMap)
{
    if (meetsCondition(fileInfo))
    {
        TEventInfo event = makeEventInfo();
        eventList.push_back(event);
    }
}
And the above sequence repeats forever.
So for choice of containers, I've used (or need):
fileList -- list of unique strings containing 150,000 pathnames.
I chose std::set because it automatically handles duplicates and maintains sort order automatically.
No random access, only add the entries, sort them (manually or automatically), and iterate over them.
fileInfoMap -- a collection of structures keyed by a time_t timestamp corresponding to the date of the file.
I chose std::map. It too would have 150,000 entries so occupies a lot of memory.
No random access, only add the entries to one end. Must iterate over them and, if necessary, delete entries from the middle.
eventList -- a small list of "event" structures, say 50 items.
I chose std::vector. Not sure why really.
No random access, only add entries to one end and later iterate over the collection.
I'm fairly new to C++. Thanks for your consideration.
Regarding memory management, containers belong to two large families: those that allocate all elements together, and those that allocate each element separately.
vector and deque belong to the first family; list, set and map to the second.
Memory fragmentation arises when elements are continuously added to and removed from a container that does not support global relocation.
One way to avoid the problem is to use a vector, calling "reserve" to anticipate the memory need and reduce relocations, and keeping the data sorted upon insertion.
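A minimal sketch of that first approach, assuming the asker's fileList of path strings; the class name and capacity are illustrative, not from the question:

#include <algorithm>
#include <string>
#include <vector>

class SortedPathList {
public:
    SortedPathList() { paths_.reserve(150000); }  // anticipate the memory need up front

    void add(const std::string& path) {
        // binary search for the insertion point keeps the vector sorted
        auto it = std::lower_bound(paths_.begin(), paths_.end(), path);
        if (it == paths_.end() || *it != path)    // skip duplicates, like std::set
            paths_.insert(it, path);
    }

private:
    std::vector<std::string> paths_;  // one contiguous block, grown rarely
};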
Another way is to use a "linking-based container" (like list, set, etc.), providing it an allocator that allocates memory from larger chunks, recycling them instead of calling a raw malloc/free for every single element insert/remove.
Take a look at std::allocator.
You can write an allocator by implementing the allocate/deallocate functions with the required logic, and passing your allocator as the optional template parameter of the container you'd like to use.
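As a hedged illustration, here is one possible chunk-recycling allocator that could be plugged into std::list or std::set. All names are mine, and a production version would need more care (thread safety, allocator propagation across container swaps, over-aligned types):

#include <cstddef>
#include <list>
#include <memory>
#include <vector>

template <typename T>
class ChunkAllocator {
    template <typename U> friend class ChunkAllocator;
public:
    using value_type = T;

    ChunkAllocator() : pool_(std::make_shared<Pool>()) {}

    // Containers rebind the allocator to their internal node type; give the
    // rebound allocator a fresh pool sized for that type.
    template <typename U>
    ChunkAllocator(const ChunkAllocator<U>&) : pool_(std::make_shared<Pool>()) {}

    T* allocate(std::size_t n) {
        if (n != 1)  // bulk requests: just forward to the global allocator
            return static_cast<T*>(::operator new(n * sizeof(T)));
        if (pool_->free_slots.empty())
            grow();
        T* p = pool_->free_slots.back();
        pool_->free_slots.pop_back();
        return p;
    }

    void deallocate(T* p, std::size_t n) {
        if (n != 1) { ::operator delete(p); return; }
        pool_->free_slots.push_back(p);  // recycle the slot instead of freeing it
    }

    template <typename U>
    bool operator==(const ChunkAllocator<U>& rhs) const {
        return static_cast<const void*>(pool_.get()) ==
               static_cast<const void*>(rhs.pool_.get());
    }
    template <typename U>
    bool operator!=(const ChunkAllocator<U>& rhs) const { return !(*this == rhs); }

private:
    struct Pool {
        std::vector<T*>    free_slots;
        std::vector<void*> chunks;
        ~Pool() { for (void* c : chunks) ::operator delete(c); }
    };

    void grow() {
        const std::size_t kSlots = 1024;  // nodes carved from each large chunk
        void* raw = ::operator new(kSlots * sizeof(T));
        pool_->chunks.push_back(raw);
        T* first = static_cast<T*>(raw);
        for (std::size_t i = 0; i < kSlots; ++i)
            pool_->free_slots.push_back(first + i);
    }

    std::shared_ptr<Pool> pool_;
};

// Usage: node allocations now come from 1024-slot chunks.
// std::list<int, ChunkAllocator<int>> xs;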

Including the end node in a Linked List

I am implementing a linked list in C++ and I am trying to decide which is better: including the end node in the class (thus increasing its size) or not including it (thus decreasing its size but also increasing the time to add to the end and so on).
My thinking is that since a Node object is relatively small compared to the 4-8 GB of RAM that is standard nowadays, it is well worth the extra space to save time.
But I am wondering if anyone can tell me whether I am wrong, or describe cases where you decided one way or the other and why.
Which one is better depends entirely on your use case. There are use cases for not including end and only supporting iteration in a single direction (for which we have std::forward_list) and there are use cases for taking the extra memory and creating a doubly linked list (for which we have std::list).
There is no one true "better" container option. It depends on which operations your program needs to do more of. Each container has its own niche.
Except std::vector. It's just awesome.
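To make the trade-off concrete, here is a minimal sketch (my own code, not from the question): one extra pointer in the list object makes appends O(1) instead of O(n).

struct Node {
    int   value;
    Node* next = nullptr;
};

struct SinglyLinkedList {
    Node* head = nullptr;
    Node* tail = nullptr;  // the "end node" pointer under discussion

    void push_back(int v) {
        Node* n = new Node{v};
        if (tail) tail->next = n;  // O(1): jump straight to the end
        else      head = n;
        tail = n;
        // without `tail`, we would have to walk from head: O(n) per append
    }

    ~SinglyLinkedList() {
        for (Node* p = head; p;) {
            Node* nxt = p->next;
            delete p;
            p = nxt;
        }
    }
};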

Linked Lists example - Why are they used?

I was looking at a code block on how to get interface information on Unix / iOS / Mac OS X (IP address, interface names, etc.), and wanted to understand more about why linked lists are used. I'm not a full-time programmer, but I can code and am always trying to learn. I understand basic C/C++ but have never had occasion to use linked lists.
I'm trying to learn OS X and iOS development and was trying to get network interface information and came across this:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/getifaddrs.3.html
If I understand this correctly, it appears a linked list is used to link a bunch of structs together for each interface. Why is a linked list used in this situation? How come the structs aren't just created and stored in an array?
Thanks
Linked list algorithms are very nice when you don't know how many elements are going to be in the list when you get started, or if you may add or remove elements over time. Linked lists are especially powerful if you want to add or remove elements anywhere other than the end of the list. Linked lists are very common in Unix. Probably the best place to research is Wikipedia, which discusses the advantages, disadvantages, and other details. But the primary lesson is that linked lists are very good for dynamic data structures, while arrays tend to be better when things are static.
Network interfaces may feel very static if you think of them as "network cards," but they're used for many other things like VPN connections and can change quite often.
[...] and wanted to understand more of why linked lists are used. I'm not a full-time programmer, but I can code and always trying to learn. I do understand basic C/C++ but never had experience or had to use linked lists.
Linked lists are actually an extremely simple data structure. They come in a few varieties but the overall concept is just to allocate nodes and link them together through indices or pointers, like so:
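(The answer's original diagram isn't reproduced here; the following minimal sketch, with illustrative names, shows nodes stored contiguously in one array and linked through indices, where -1 marks "no next node":)

#include <cstdio>
#include <vector>

struct Node {
    int value;
    int next;  // index of the next node in the pool, or -1
};

int main() {
    std::vector<Node> pool = {
        {10, 1},   // node 0 -> node 1
        {20, 2},   // node 1 -> node 2
        {30, -1},  // node 2 -> end of list
    };
    int head = 0;

    // Remove node 1 from the chain in constant time: relink node 0 past it.
    pool[0].next = pool[1].next;

    // Walk the list: prints 10 then 30.
    for (int i = head; i != -1; i = pool[i].next)
        std::printf("%d\n", pool[i].value);
}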
Why is a linked list used in this situation?
Linked lists have some interesting properties, one of which is shown in the sketch above: constant-time removals and insertions from/to the middle.
How come the structs aren't just created and stored in an array?
They actually could be. In the sketch above, the nodes are stored directly in an array. The point of linking the nodes, then, is to allow things like rapid insertions and removals. An array of elements doesn't offer that flexibility, but if you store an array of nodes which hold indices or pointers to the next and possibly previous elements, then you can start rearranging the structure, removing things, and inserting things into the middle, all in constant time, just by playing with the links.
The most efficient uses of linked lists often store the nodes contiguously or partially contiguously (ex: using a free list) and just link them together to allow rapid insertions and removals. You can just store the nodes in a big array, like vector, and then link things up and unlink them through indices. Another interesting property of linked lists is that you can rapidly transfer the elements from the middle of one list to another by just changing a couple of pointers.
They also have a property that makes them very efficient to store contiguously, when care is paid to their allocation: every node is the same size. As an example, it can be tricky to represent a bunch of variable-sized buckets efficiently if each uses its own array-like container, since each one would want to allocate a different amount of memory. However, if each bucket just stores an index/pointer to a list node, all the nodes for all the buckets can easily live in one giant array.
That said, in C++, linked lists are often misused. In spite of their algorithmic benefits, a lot of that doesn't actually translate to superior performance if the nodes are not allocated in a way that provides spatial locality. Otherwise you can incur a cache miss, and possibly some page faults, accessing every single node.
Nevertheless, used with care about where the nodes go in memory, they can be tremendously useful. Here is one example usage, sketched in code after the description below.
In this case, we might have a particle simulation where every single particle is moving around each frame with collision detection where we partition the screen into grid cells. This allows us to avoid quadratic complexity collision detection, since a particle only needs to check for collision with other particles in the same cell. A real-world version might store 100x100 grid cells (10,000 grid cells).
However, if we used an array-based data structure like std::vector for all 10,000 grid cells, that would be explosive in memory. On top of that, transferring each particle from one cell to another would be a costly linear-time operation. By utilizing a linked list here (and one that just uses integers into an array for the links), we can just change a few indices (pointers) here and there to transfer a particle from one cell to another as it moves, while the memory usage is quite cheap (10,000 grid cells means 10,000 32-bit integers which translates to about 39 kilobytes with 4 bytes of overhead per particle for the link).
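A hedged sketch of that grid scheme (all names are illustrative): each cell stores just one 32-bit head index, and each particle stores the index of the next particle in its cell, so moving a particle between cells only touches a few integers.

#include <cstdint>
#include <vector>

struct Particles {
    std::vector<float>   x, y;   // positions
    std::vector<int32_t> next;   // next particle in the same cell, or -1
};

struct Grid {
    int width, height;
    std::vector<int32_t> head;   // first particle in each cell, or -1

    Grid(int w, int h) : width(w), height(h), head(w * h, -1) {}

    int cellOf(float px, float py) const {
        return int(py) * width + int(px);  // assumes 1x1 cells for brevity
    }

    // Linking a particle into its cell is a couple of index writes.
    void insert(Particles& ps, int p) {
        int c = cellOf(ps.x[p], ps.y[p]);
        ps.next[p] = head[c];
        head[c] = p;
    }
};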
Used carefully, linked lists are a tremendously useful structure. However, they can often be misused, since a naive implementation which allocates every single node separately against a general-purpose memory allocator tends to incur cache misses galore, as the nodes will be very fragmented in memory. The usefulness of linked lists tends to be a forgotten detail lately, especially in C++, since the std::list implementation, unless used with a custom allocator, is in that naive cache-misses-galore category. However, the way they're used in operating systems tends to be very efficient, reaping the algorithmic benefits mentioned above without losing locality of reference.
There are various ways to store data. In C++, the first choice is typically std::vector, but there are std::list and other containers; the choice will depend on several factors, such as how often and where you want to insert/delete things (vector is great for deleting/adding at the end, but inserting in the middle is bad; linked lists cost much less to insert in the middle, but are worse to iterate over).
However, the API for this function is classic C (rather than C++), so we have to have a "variable-length container", and of course we could implement something in C that resembles std::vector (a value that holds the number of elements and a pointer to the actual elements). I'm not sure why the designers DIDN'T do that in this case, but a linked list has the great advantage that it is near zero cost to extend it by one more element. If you don't know beforehand how many there will be, this is a good benefit. And my guess is that there aren't enough of these objects to worry about performance as such [the caller can always rearrange it into a more suitable form later].
Linked lists are well-suited to storing large amounts of data when the number of elements is not known in advance. They are flexible structures that expand and contract at run time. They also reduce wasted memory because they allocate storage dynamically: when you are finished with some data, that node and its memory can be released.
I agree with everyone here about the benefits of a linked list over an array for dynamic data length, but I need to add something:
if the allocated ifaddrs structures were identical in length... there would be no advantage of a linked list over an array, and if so I would consider it a "bad design";
but if not (and maybe this is the case... notice "The ifaddrs structure contains at least the following entries"), an array is not the proper representation for variable-length structures.
consider this example
struct ifaddrs
{
    struct ifaddrs  *ifa_next;     /* Pointer to next struct */
    char            *ifa_name;     /* Interface name */
    u_int            ifa_flags;    /* Interface flags */
    struct sockaddr *ifa_addr;     /* Interface address */
    struct sockaddr *ifa_netmask;  /* Interface netmask */
    struct sockaddr *ifa_dstaddr;  /* P2P interface destination */
    void            *ifa_data;     /* Address specific data */
};

struct ifaddrs_ofothertype
{
    struct ifaddrs ifaddrs;        /* embed the original structure */
    char           blahblah[256];  /* some other variable */
};
The mentioned function can return a list of mixed structures, like (ifaddrs_ofothertype* cast to ifaddrs*) and (ifaddrs*), without worrying about the structure length of each element.
If you want to learn iOS you have to learn pointers and memory allocation from the very base. Objective-C descends from the C programming language but differs a bit in syntax, especially in method calling and definition. Before you get into iOS/Mac OS X you should understand pointers, MVC, and the core iOS frameworks; then you can become a professional iOS developer.
For that, visit the Ray Wenderlich iOS tutorials.

Vector versus dynamic array, does it make a big difference in speed?

Now I am writing some code for solving vehicle routing problems. To do so, one important decision is to choose how to encode the solutions. A solution contains several routes, one for each vehicle. Each route has a customer visiting sequence, the load of route, the length of route.
To perform modifications on a solution, I also need to find some information quickly.
For example,
Which route is a customer in?
What customers does a route have?
How many nodes are there in a route?
What nodes are in front of or behind a node?
Now, I am thinking to use the following structure to keep a solution.
struct Sol
{
    vector<short> nextNode;   // the node that follows each node
    vector<short> preNode;    // the node that precedes each node
    vector<short> startNode;
    vector<short> rutNum;
    vector<short> rutLoad;
    vector<float> rutLength;
    vector<short> rutSize;
};
The common size of each vector is instance dependent, between 200-2000.
I heard it is possible to use a dynamic array to do this job. But it seems to me a dynamic array is more complicated: one has to allocate the memory and release it. Here my question is twofold.
How would I use a dynamic array for the same purpose? How should the struct or class be defined so that memory allocation and release are easily taken care of?
Will using a dynamic array be faster than using a vector? Assume the solution structure needs to be accessed millions of times.
It is highly unlikely that you'll see an appreciable performance difference between a dynamic array and a vector since the latter is essentially a very thin wrapper around the former. Also bear in mind that using a vector would be significantly less error-prone.
It may, however, be the case that some information is better stored in a different type of container altogether, e.g. in an std::map. The following might be of interest: What are the complexity guarantees of the standard containers?
It is important to give some thought to the type of container that gets used. However, when it comes to micro-optimizations (such as vector vs dynamic array), the best policy is to profile the code first and only focus on parts of the code that prove to be real -- rather than assumed -- bottlenecks.
It's quite possible that vector's code is actually better and more performant than dynamic array code you would write yourself. Only if profiling shows significant time spent in vector would I consider writing my own error-prone replacement. See also Dynamically allocated arrays or std::vector
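For what it's worth, here is a small sketch contrasting the two options the answers discuss; element access costs the same, but the vector manages its own memory:

#include <cstddef>
#include <vector>

void raw_array(std::size_t n) {
    short* nextNode = new short[n];  // manual allocation...
    nextNode[0] = 1;
    delete[] nextNode;               // ...and manual release (easy to get wrong)
}

void with_vector(std::size_t n) {
    std::vector<short> nextNode(n);  // allocation handled by the vector
    nextNode[0] = 1;
}                                    // freed automatically at scope exit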
I'm using MSVC and the implementation looks to be as quick as it can be.
Accessing the array via operator [] is:
return (*(this->_Myfirst + _Pos));
Which is as quick as you are going to get with dynamic memory.
The only overhead you are going to get is in the memory use of a vector, it seems to create a pointer to the start of the vector, the end of the vector, and the end of the current sequence. This is only 2 more pointers than you would need if you were using a dynamic array. You are only creating 200-2000 of these, I doubt memory is going to be that tight.
I am sure the other stl implementations are very similar. I would absorb the minor cost of vector storage and use them in your project.

Most efficient tree structure for what I'm trying to do

I'm wondering what the most generally efficient tree structure would be for a collection that has the following requirements:
The tree will hold anywhere between 0 and 2^32 - 1 items.
Each item will be a simple structure, containing one 32-bit unsigned integer (the item's unique ID, which will be used as the tree value) and two pointers.
Items will be inserted and removed from the tree very often; some items in the tree will remain there for the duration of the program, while others will only be in the tree very briefly before being removed.
Once an item is removed, its unique ID (that 32-bit unsigned integer) will be recycled and reused for a new item.
The tree structure needs to support efficient inserts and deletions, as well as quick lookups by the unique ID. Also, finding the first available unused unique ID needs to be a fast operation.
What sort of tree would be best-suited for these requirements?
EDIT: This tree is going to be held only in memory; at no point will it be persisted to disk. I don't need to worry about hitting the disk, or disk caching, or anything of the sort. This is also why I'm not looking into using something like SQLite.
Depending on how fast you need this to be you might just treat the whole thing as a single, in-memory table mmap-ed onto a file. Addressing is by direct computation. You can simply chain the free slots so you always know exactly where the next free one is. Most accesses will have a max of 1 or 2 disk accesses (depending on underlying filesystem requirements). Put a buttload of memory on the machine and you might not hit the disk at all.
I know this sounds pretty brute force, but you'd be amazed how fast it can be.
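A rough POSIX sketch of that idea (the layout and names are mine, and error checks are omitted): a fixed-record table mmap-ed onto a file, with free slots chained so the next free slot is always known.

#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct Slot {
    uint32_t next_free;  // chains free slots; meaningful only while the slot is free
    uint32_t data[2];    // payload stands in for the item's two pointers
};

int main() {
    const uint32_t kSlots = 1u << 20;  // 2^20 slots for the example
    int fd = open("table.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, sizeof(Slot) * size_t(kSlots));
    Slot* table = static_cast<Slot*>(
        mmap(nullptr, sizeof(Slot) * size_t(kSlots),
             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // Chain every slot into the free list once, on first initialization.
    for (uint32_t i = 0; i < kSlots; ++i)
        table[i].next_free = i + 1;    // kSlots acts as "none"
    uint32_t free_head = 0;

    // Allocate: pop the head of the free chain; O(1) direct computation, no search.
    uint32_t id = free_head;
    free_head = table[id].next_free;

    // Release: push the slot back onto the chain.
    table[id].next_free = free_head;
    free_head = id;

    munmap(table, sizeof(Slot) * size_t(kSlots));
    close(fd);
}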
Update in response to: "I'm not looking for a disk-persistable solution"
Well, if you truly are going to have as many as 2^32 items in this structure (times how big it is) then you either need enough memory on the machine to hold this puppy or the kernel will start to swap things in and out of memory for you. This still translates to hitting the disk. If you let it swap, don't forget to check the size of the swap area, there's a good chance you'll have to bump it. Using mmap (or something similar) is sort of like creating your own private swap area and it will probably have less impact on other processes running on the same system.
I'll note that once this thing exceeds your available physical memory (whether you are using swap space or mmap or B-trees or red-black trees or extensible hashing or whatever) it becomes critical to understand your access pattern. If you are hopscotching all over the place you're going to be hitting the disk a lot. One of the primary reasons for using a structure like a B-tree (or any one of several similar structures) is that the top level of the tree (containing the index) tends to stay in memory (because most paging algorithms use LRU) and you only eat a disk access when you touch a leaf page.
Bottom line: it either fits in memory or it doesn't. If it doesn't then your 10^-9 sec memory access turns into a 10^-3 disk access. I.e. 1 million times slower.
TANSTAAFL!
Have you considered something like a trie? Lookup is linear in key length, which in your case means essentially constant, and storage can be more compact due to nodes sharing common substrings.
Keep in mind, though, that if your data set is actually filling large amounts of your key space your bigger efficiency concern is likely to be caching and disk access, not lookups.
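As an illustration (not from the answer), here is a fixed-depth bitwise trie over the 32-bit IDs: lookup always walks 32/4 = 8 levels of 4-bit nibbles, hence the "essentially constant" claim.

#include <cstdint>
#include <memory>

struct TrieNode {
    std::unique_ptr<TrieNode> child[16];  // one child per 4-bit nibble
    bool present = false;                 // set on the node reached by a full key
};

void insert(TrieNode& root, uint32_t key) {
    TrieNode* n = &root;
    for (int shift = 28; shift >= 0; shift -= 4) {
        unsigned nibble = (key >> shift) & 0xF;
        if (!n->child[nibble])
            n->child[nibble] = std::make_unique<TrieNode>();
        n = n->child[nibble].get();
    }
    n->present = true;
}

bool contains(const TrieNode& root, uint32_t key) {
    const TrieNode* n = &root;
    for (int shift = 28; shift >= 0; shift -= 4) {
        unsigned nibble = (key >> shift) & 0xF;
        if (!n->child[nibble]) return false;
        n = n->child[nibble].get();
    }
    return n->present;
}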
I would go for a red-black tree, because it balances the tree on insertion to ensure optimal insertion/deletion/retrieval. An AVL tree is an option, but it's slightly slower for insertions because it's more rigid about balancing on insertions.
http://en.wikipedia.org/wiki/Red-black_tree
http://en.wikipedia.org/wiki/AVL_tree
My reflex would be to reach for a standard implementation, such as the one in the STL. But supposing you have reasons to implement your own, I would typically go for red-black trees, which perform well on all operations. Alternatively I would try splay trees, which can be really fast but have amortized complexity, i.e. some individual operations might take a little longer.
Stay away from AVL trees when you need to do a lot of updates. AVL trees are good when you have many lookups but few updates, as the updates can be fairly slow.
Do you expect your tree to really hold 2^32 - 1 entries? Even at half that, I would definitely try this with SQLite. You may be able to fit it all in memory, but if you page once, a database will be faster. Databases are meant to handle huge data sets efficiently, especially when the whole set won't fit in memory at once.
If you do intend to do this yourself, look at some database code and use a B-tree. A red-black tree will be faster with smaller datasets, but with that much data your bottleneck isn't going to be processor speed but memory and hard-drive speed.
All that said, I can't imagine a map of pointers that large being useful. You'll be pushing the limits of modern memory just storing the map. You won't have anything left over for the map to point to.
boost::unordered_map has amortized constant time insertions, deletions and lookups. It's the best data structure for what you described.
Its only downside is that it's, well, unordered, as the name says. Also, if you're REALLY unlucky, it could end up taking linear time if every single hash clashes. However, that can easily be avoided using boost's default boost::hash function. Additionally, hashing integers is trivial, so that worst-case scenario will not happen to you.
(Note: it's not a tree but a hash table, and you asked specifically for a "Tree".. Maybe you thought that the most efficient way was some sort of tree (it's not)?)
Why a tree at all?
To me it seems you need a database. If you expect a lower node count, a hash table could be enough.
I'm going to warn you about the memory. If you fill up the whole tree (2^32 items) you will need 16 GB for the 32-bit values themselves, plus several times that again for the pointers. Consider the database if this is likely.
Each item is represented by a 32-bit identity, which is its key, and two pointers. Are the pointers associated with the tree, or do they have to do with the identity?
If they're just part of implementing the tree, ditch them. You don't need them. Represent whether a number is there or not as a bit in a really big bitmap. Finding the lowest unused bit isn't fast, but I don't see how it could be. It's only about 512 MB of main memory, which isn't that bad.
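A hedged sketch of that bitmap idea (names are mine; __builtin_ctzll assumes GCC/Clang): one bit per possible 32-bit ID, about 512 MB total, with a linear word scan for the lowest unused ID.

#include <cstdint>
#include <vector>

struct IdBitmap {
    // 2^32 bits = 2^26 64-bit words = 512 MB, allocated up front.
    std::vector<uint64_t> words = std::vector<uint64_t>(uint64_t(1) << 26);

    bool contains(uint32_t id) const {
        return (words[id >> 6] >> (id & 63)) & 1u;
    }
    void insert(uint32_t id) { words[id >> 6] |=  (uint64_t(1) << (id & 63)); }
    void erase (uint32_t id) { words[id >> 6] &= ~(uint64_t(1) << (id & 63)); }

    // Lowest unused ID: scan for the first word that isn't all ones,
    // then locate its first zero bit. Linear in the worst case, as noted above.
    uint32_t lowestUnused() const {
        for (uint64_t i = 0; i < words.size(); ++i) {
            if (words[i] != ~uint64_t(0)) {
                uint64_t inverted = ~words[i];
                return uint32_t(i * 64 + __builtin_ctzll(inverted));
            }
        }
        return 0;  // bitmap full: all 2^32 IDs in use (caller must handle)
    }
};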
If the pointers are meaningful data, use an array. You're going to have to allocate space for four giganodes plus pointers to make up the map anyway, so allocate space for four giganodes plus one indicator each for whether the node is active or not. Use memset() to set the whole thing to zero, and keep a lowest-unused-node pointer. Use that to add a node. When you delete a node, mark it as unused, and use the pointers to maintain a two-way linked free list. You'll have to find the next lower unused node, and that might take a while, but again I don't see how to keep this fast. (If you just need an unused node, not the lowest one, just put the released node on the free list somewhere.)
This is likely to take about 64G or 96G of RAM, but that's less than a map solution.