Optimisation linked list

Optimisation linked list - c++

Silly question maybe - but, what would be faster:
-Deleting an item from a linked list every time an item is gone (whatever that item might be)
-Just marking the record as dead and overwriting it after x amount of time or conditions.
Would I not use less cpu time by avoiding all the removing and inserting rather than just overwriting.

Only way to find out is to use a profiler.
Keep in mind that there are other factors which are very important that you didn't specify in your question. For example, it makes a big difference if you know which element to delete, i.e. you have the pointer to it ready, or if you have to iterate over the linked list to find it. In that case, a vector could actually be several times faster than any linked list (depending on the number of elements) due to caching.

Overwriting is going to be faster. If deleting involves freeing allocated memory from the heap, then overwriting will definitely be faster. Memory allocation is relatively slow. Plus, over time your list memory would get fragmented which might cause cache misses. Even freeing memory that goes back into a memory manager of some sort would be slower since you have the overhead of managing the nodes of the list.

Related

Why is malloc() based on linked-list?

In the worst case, on a section (is this the right term?) of memory of size n, linked-list needs O(n) time to allocate a block of memory in suitable size.
However, if malloc is tree-based, say, an interval tree, only O(logn) time is needed. Furthermore, a tree can satisfied such requirements without extra time (in terms of time complexity) as "Find the smallest block of free memory whose size is larger themx" , "Always allocate on the borders of free memory" and "Free only a part of the allocated memory". A drawback maybe that freeing memory takes O(logn) time.
Thanks
ps. I've seen the question Data structures for traversable memory pool, but the author doesn't seem to have figured it out.

I don't KNOW the answer, but here's some thoughts:
There is absolutely no REQUIREMENT that malloc is implemented in a particular way. However, an unbalanced tree is just as bad as a linked list in the worst case. A balanced linked list is a lot more maintenance. A tree, where you have two links per node, also takes up more memory than a single-linked list. Removing nodes in a linked list is easier as well as inserting at the end is very easy.
And in most systems, there is (almost) exactly one free for every malloc - so if you make one faster by making the other slower, you gain very little.
It is also relatively common that the "next allocation is the same as the previous one", meaning that if the last allocation is first in list, it's an O(1) operation.
In real-time systems, buckets are often used for allocations, such that there are a number of fixed sizes, and each time something is allocated out of the main heap, the size is rounded to the nearest larger size, and when freed it goes into the bucket of that size (which is a linked list). If there is a free element of that size already, then that allocation is used. Besides the speed of allocation/free being O(1), this has the benefit of reducing fragmentation - it's not completely impossible to "rip all the heap into small pieces, and then not have any large lumps left", but it's at least not possible to take up most of the memory by simply allocating a byte more each time until you have half the heap size in one allocation.
(Also, in Linux's GLIBC, allocations over a certain size do not end up in the linked list at all - they are allocated directly via mmap and released back with munmap when free is called)
Finally, algorithmic complexity is not everything - in real life, it is the actual time spent on something that matters - even if the algorithm has O(n) but each operation is fast, it can beat O(logn). Similarly, particularly in C++, small allocations are highly dominant, which means that overhead of more memory per node is an important factor.

There is no specification which says that malloc needs to be based on a linked list. Between platforms, the implementation may change. On one platform, speed may be crucial and a tree can be implemented, on another platform, memory is more expensive and a linked list (or such) is used in order to save as much memory as possible.

Should I use boost fast pool allocator for following?

I have a server that throughout the course of 24 hours keeps adding new items to a set. Elements are not deleted over the 24 period, just new elements keep getting inserted.
Then at end of period the set is cleared, and new elements start getting added again for another 24 hours.
Do you think a fast pool allocator would be useful here as to reuse the memory and possibly help with fragmentation?
The set grows to around 1 million elements. Each element is about 1k.

It's highly unlikely …but you are of course free to test it in your program.
For a collection of that size and allocation pattern (more! more! more! + grow! grow! grow!), you should use an array of vectors. Just keep it in contiguous blocks and reserve() when they are created and you never need to reallocate/resize or waste space and bandwidth traversing lists. vector is going to be best for your memory layout with a collection that large. Not one big vector (which would take a long time to resize), but several vectors, each which represent chunks (ideal chunk size can vary by platform -- I'd start with 5MB each and measure from there). If you follow, you see there is no need to resize or reuse memory; just create an allocation every few minutes for the next N objects -- there is no need for high frequency/speed object allocation and recreation.
The thing about a pool allocator would suggest you want a lot of objects which have discontiguous allocations, lots of inserts and deletes like a list of big allocations -- this is bad for a few reasons. If you want to create an implementation which optimizes for contiguous allocation at this size, just aim for the blocks with vectors approach. Allocation and lookup will both be close to minimal. At that point, allocation times should be tiny (relative to the other work you do). Then you will also have nothing unusual or surprising about your allocation patterns. However, the fast pool allocator suggests you treat this collection as a list, which will have terrible performance for this problem.
Once you implement that block+vector approach, a better performance comparison (at that point) would be to compare boost's pool_allocator vs std::allocator. Of course, you could test all three, but memory fragmentation is likely going to be reduced far more by that block of vectors approach, if you implement it correctly. Reference:
If you are seriously concerned about performance, use fast_pool_allocator when dealing with containers such as std::list, and use pool_allocator when dealing with containers such as std::vector.

Linked list vs dynamic array for implementing a stack using vector class

I was reading up on the two different ways of implementing a stack: linked list and dynamic arrays. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized while a dynamic array had to be resized if too many elements were inserted hence wasting alot of time and memory.
That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?

It's difficult to compare the two, because the patterns of their memory usage are quite different.
Vector resizing
A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, the releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17% of the space to be "wasted" at any given time.
When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).
linked list allocations
Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.
Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.
As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.
Allocating and managing lots of blocks of memory is expensive.
cache usage
Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.
With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the date we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
Summary
A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.

Yes, what you say is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.
Basically, an std::deque is a linked list of arrays of sorts internally. This way, when it needs to resize, it just adds another array.

First, the performance trade-offs between linked-lists and dynamic arrays are a lot more subtle than that.
The vector class in C++ is, by requirement, implemented as a "dynamic array", meaning that it must have an amortized-constant cost for inserting elements into it. How this is done is usually by increasing the "capacity" of the array in a geometric manner, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current content into it) is only going to happen on a few occasions. In practice, this means that the overhead for the reallocations only shows up on performance graphs as little spikes at logarithmic intervals. This is what it means to have "amortized-constant" cost, because once you neglect those little spikes, the cost of the insert operations is essentially constant (and trivial, in this case).
In a linked-list implementation, you don't have the overhead of reallocations, however, you do have the overhead of allocating each new element on freestore (dynamic memory). So, the overhead is a bit more regular (not spiked, which can be needed sometimes), but could be more significant than using a dynamic array, especially if the elements are rather inexpensive to copy (small in size, and simple object). In my opinion, linked-lists are only recommended for objects that are really expensive to copy (or move). But at the end of the day, this is something you need to test in any given situation.
Finally, it is important to point out that locality of reference is often the determining factor for any application that makes extensive use and traversal of the elements. When using a dynamic array, the elements are packed together in memory one after the other and doing an in-order traversal is very efficient as the CPU can preemptively cache the memory ahead of the reading / writing operations. In a vanilla linked-list implementation, the jumps from one element to the next generally involves a rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behavior. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching when using a linked-list will be the dominant performance problem.
As you can guess, I rarely use a linked-list (std::list), as the number of advantageous applications are few and far between. Very often, for large and expensive-to-copy objects, it is often preferable to simply use a vector of pointers (you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage (for linking pointers) and you get random-access capabilities if you need it).
The main case that I can think of, where a linked-list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either ends). However, such situations usually arise when you are keeping a sorted (or ordered, in some way) set of elements, in which case, you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked-list. And often, such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).

Yes, it's true for C++ or any other language. Dynamic array is a concept. The fact that C++ has vector doesn't change the theory. The vector in C++ actually does the resizing internally so this task isn't the developers' responsibility. The actual cost doesn't magically disappear when using vector, it's simply offloaded to the standard library implementation.

std::vector is implemented using a dynamic array, whereas std::list is implemented as a linked list. There are trade-offs for using both data structures. Pick the one that best suits your needs.
As you indicated, a dynamic array can take a larger amount of time adding an item if it gets full, as it has to expand itself. However, it is faster to access since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.
Linked lists don't need to resize ever, but traversing them takes longer as the CPU must jump around in memory.

That got me wondering if this is true for c++ as there is a vector class which automatically resizes whenever new elements are inserted.
Yes, it still holds, because a vector resize is a potentially expensive operation. Internally, if the pre-allocated size for the vector is reached and you attempt to add new elements, a new allocation takes place and the old data is moved to the new memory location.

From the C++ documentation:
vector::push_back - Add element at the end
Adds a new element at the end of the vector, after its current last element. The content of val is copied (or moved) to the new element.
This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style
Skip to 44:40. You should prefer std::vector whenever possible over a std::list, as explained in the video, by Bjarne himself. Since std::vector stores all of it's elements next to each other, in memory, and because of that it will have the advantage of being cached in memory. And this is true for adding and removing elements from std::vector and also searching. He states that std::list is 50-100x slower than a std::vector.
If you really want a stack, you should really use std::stack instead of making your own.

std::list vs std::vector iteration

It is said that iterating through a vector (as in reading all it's element) is faster than iterating through a list, because of optimized cache.
Is there any ressource on the web that would quantify how much it impacts the performances ?
Also, would it be better to use a custom linked list, whom elements would be prealocated so that they are consecutive in memory?
The idea behind that is that I want to store elements in a certain order that won't change. I still need to be able to insert some at run time in the midle quickly, but most of them will still be consecutive, because the order won't change.
Does the fact that the elements are consecutive have an impact in the cache, or because I'll still call list_element->next instead of ++list_element it does not improve anything ?

The main difference between vector and lists is that in vector elements are constructed subsequently inside a preallocated buffer, while in a list elements are constructed one by one.
As a consequence, elements in a vector are granted to occupy a contiguous memory space, while list elements (unless some specific situations, like a custom allocator working that way) aren't granted to be so, and can be "sparse" around the memory.
Now, since the processor operates on a cache (that can be up to 1000 times faster than the main RAM) that remaps entire pages of the main memory, if elements are consecutive it is higly probable that they fits a same memory page and hence are moved all together in the cache when iteration begins. While proceeding, everything happens in the cache without further moving of data or further access to the slower RAM.
With list-s, since elements are sparse everywhere, "going to the next" means refer to an address that may not be in the same memory page of its previous, and hence, the cache needs to be updated upon every iteration step, accessing the slower RAM on each iteration.
The performance difference greatly depends on the processor and on the type of memory used for both the main RAM and the cache, and on the way the std::allocator (and ultimately operator new and malloc) are implemented, so a general number is impossible to be given.
(Note: great difference means bad RAM respect to to the cache, but may also means bad implementation on list-s)

The efficiency gains from cache coherency due to compact representation of data structures can be rather dramatic. In the case of vectors compared to lists, compact representation can be better not just for read but even for insertion (shifting in vectors) of elements up to the order of 500K elements for some particular architecture as demonstrated in Figure 3 of this article by Bjarne Stroustrup:
http://www2.research.att.com/~bs/Computer-Jan12.pdf
(Publisher site: http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2011.353)
I think that if this is a critical factor for your program, you should profile it on your architecture.

Not sure if I can explain it right but here's my view(i'm thinking along the lines of translated machine instruction below:),
Vector iterator(contiguous memory):
When you increment a vector iterator, the iterator value is simply added the size of the object(known at compile time) to point to the next object. In most CPUs this is anything from one to three instructions at most.
List iterator(linked list http://www.sgi.com/tech/stl/List.html):
When you increment a list iterator(the pointed object), the location of the forward link is located by adding some number to the base of the object pointed and then loaded up as the new value of the iterator. There is more than one memory access for this and is slower than the vector iteration operation.

Multithreading not taking advantage of multiple cores?

My computer is a dual core core2Duo. I have implemented multithreading in a slow area of my application but I still notice cpu usage never exceeds 50% and it still lags after many iterations. Is this normal? I was hopeing it would get my cpu up to 100% since im dividing it into 4 threads. Why could it still be capped at 50%?
Thanks
See What am I doing wrong? (multithreading)
for my implementation, except I fixed the issue that that code was having

Looking at your code, you are making a huge number of allocations in your tight loop--in each iteration you dynamically allocate two, two-element vectors and then push those back onto the result vector (thus making copies of both of those vectors); that last push back will occasionally cause a reallocation and a copy of the vector contents.
Heap allocation is relatively slow, even if your implementation uses a fast, fixed-size allocator for small blocks. In the worst case, the general-purpose allocator may even use a global lock; if so, it will obliterate any gains you might get from multithreading, since each thread will spend a lot of time waiting on heap allocation.
Of course, profiling would tell you whether heap allocation is constraining your performance or whether it's something else. I'd make two concrete suggestions to cut back your heap allocations:
Since every instance of the inner vector has two elements, you should consider using a std::array (or std::tr1::array or boost::array); the array "container" doesn't use heap allocation for its elements (they are stored like a C array).
Since you know roughly how many elements you are going to put into the result vector, you can reserve() sufficient space for those elements before inserting them.

From your description we have very little to go on, however, let me see if I can help:
You have implemented a lock-based system but you aren't judiciously using the resources of the second, third, or fourth threads because the entity that they require is constantly locked. (this is a very real and obvious area I'd look into first)
You're not actually using more than a single thread. Somehow, somewhere, those other threads aren't even fired up or initialized. (sounds stupid but I've done this before)
Look into those areas first.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js