Efficient memory allocation before iterative algorithm - C++

I'm trying to implement an algorithm (C++) which is basically structured in the following way:
1. Build data structures which include
   a) vector<double>
   b) vector<vector<double>> (representing a sparse matrix A; the inner vector sizes are not known a priori and are not all the same)
2. Run an iterative algorithm performing lots of calculations on those structures
To make it efficient I would like to make sure that before part two starts the memory is optimally allocated, so if possible the vectors stored in A should be next to each other in memory without any overhead. Is there a good way to do this?

If you want to make the creation of the vectors more efficient you could use vector::reserve, which reduces reallocations when you append data to the vectors.
Access to the vectors is always as efficient as possible. The data in a std::vector is stored contiguously. That means that for large amounts of data it is not relevant whether the inner vectors are allocated next to each other or not, because accessing an element needs the same number of dereferences either way.
In general I would say the STL is pretty efficient, but apart from functions like reserve there is not much you can do to optimize it.
If you need higher speed you have to use a data structure that is optimized for your use case.
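For illustration, a minimal sketch of the reserve approach (the row count and the per-row size estimate here are assumptions, not something the question specifies):

#include <cstddef>
#include <vector>

// Sketch: reserve the outer vector and each inner vector before filling them,
// so the fill loop does not trigger repeated reallocations.
std::vector<std::vector<double>> buildMatrix(std::size_t numRows)
{
    std::vector<std::vector<double>> A;
    A.reserve(numRows);                      // outer vector allocated once
    for (std::size_t i = 0; i < numRows; ++i) {
        std::vector<double> row;
        row.reserve(64);                     // assumed upper bound on the row length
        // ... fill row with row.push_back(value) ...
        A.push_back(std::move(row));         // move the inner vector, don't copy it
    }
    return A;
}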

Related

How to append a large number of elements to stxxl vector efficiently?

I need to append a large number of elements to a stxxl vector. What is the most efficient way of adding elements to a stxxl vector? Right now, I'm using push_back of the stxxl vector, but it doesn't seem very efficient. It's far from saturating the disk bandwidth. Is there a better way?
Thanks,
Da
Most of the things written about "Efficient Sequential Reading and Writing to Vectors" apply in your case.
Besides vector_bufwriter, which fills a vector using an imperative loop, there is also a variant of stxxl::stream::materialize() which does it in a functional programming style.
About knowing the vector's size in advance: this is not really necessary for external memory (EM), since one can allocate blocks on the fly. These will then generally not be in order, but so be it; there is no guarantee on that anyway.
I see someone (me) made vector_bufwriter automatically double the vector's size if the filling reaches the vector's end. At the moment, I don't think this is necessary, maybe one should change this behaviour.
According to the documentation:
If one needs only to sequentially write elements to the vector in n/B I/Os the currently fastest method is stxxl::generate.
Does not really answer why push_back should be I/O-inefficient, though.
One approach:
First reserve the number of elements you need. Resizing a vector with some element types can be very time-consuming, and appending many elements can result in several resizes as the vector grows.
Once reserved, append using emplace_back (or simply push_back if the type is trivial, e.g. int).
Also review the member functions. An implementation which suits your needs well may already exist.
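As a minimal sketch of that recipe, shown here with a plain std::vector rather than an stxxl vector (the Item type and element count are made up for illustration):

#include <vector>

struct Item {
    int id;
    double value;
    Item(int i, double v) : id(i), value(v) {}
};

int main()
{
    std::vector<Item> items;
    items.reserve(1000000);                  // allocate once up front
    for (int i = 0; i < 1000000; ++i)
        items.emplace_back(i, i * 0.5);      // construct in place, no temporary copy
    return 0;
}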

Choosing an STL Container for a very large list

I have a very large list of items (~2 million) that I want to optimize for access speed. I iterate through the items using an iterator (++it).
Right now the code is implemented using std::map<std::wstring, STRUCT>.
I wonder if it's worth changing std::map to std::deque<std::pair<std::wstring, STRUCT>>. I think I would gain the advantage of pointer arithmetic and minimize cache misses. Is it worth it?
I know that profiling is the answer but I need an opinion before implementing this ...
If you know the size in advance, then std::vector is clearly the way to go, if your objects aren't too big.
std::vector<Object> list;
list.reserve(2000000);
And then fill it as usual.
This is the fastest and least memory-consuming approach. However, you need to be able to allocate enough contiguous memory. But unless your objects are on the order of 1 KB each, that shouldn't be a problem.
With a deque, you would lose (or would have to re-implement) the advantage of keyed lookup. If that's not essential for your data, I would consider using a deque.
Generally, if you're only doing searches in this set (no insertions/deletions), you're probably better off using a sorted sequential container, like deque or vector. You can then use simple binary search to find the needed elements. The advantage of using a sequential container is that it is better in terms of memory usage, has a very simple implementation, and provides better locality of reference. I'd write one version of the code using vector, and another version using deque, then compare them in terms of performance to decide which one to use in the final version.
However, if your structure needs to be updated (new elements need to be inserted or old elements deleted frequently), map is the better choice. Or maybe you just have to drop STL containers altogether and use an in-memory database (see SQLite), but that highly depends on what problem you're solving.
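A sketch of the sorted-vector-plus-binary-search idea (STRUCT here is just a stand-in for the question's payload type):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct STRUCT { int payload; };

int main()
{
    std::vector<std::pair<std::wstring, STRUCT>> items;
    // ... fill items ...

    // Sort once by key, then answer lookups with binary search.
    std::sort(items.begin(), items.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });

    std::wstring key = L"some_user";
    auto it = std::lower_bound(items.begin(), items.end(), key,
        [](const auto& elem, const std::wstring& k) { return elem.first < k; });
    bool found = (it != items.end() && it->first == key);
    return found ? 0 : 1;
}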
The fastest container to iterate through is usually a vector, so if you want to optimize for iteration at the expense of everything else, use that.
Overall app performance of course will depend on how many times you iterate, and how you construct your data in the first place. For a simple test, once your map has been populated you can construct a vector from it as follows:
vector<pair<K,V> > myvec(mymap.begin(), mymap.end());
Where K and V are the key and value types of the map. Then just use the vector iterators in place of the map iterators and compare performance.
Of course, if you want to modify the map in future, then normally it would not be appropriate to optimize for iteration at the expense of everything else.

Best sorting algorithm for sorting structs with byte comparisons?

I have an array of 64 structs that hold a decent amount of data (each struct is around 128 bytes, so that's 8192 bytes that need to be rearranged). The array needs to be sorted based on a single unsigned byte in each struct. An interesting property of my data is that there are likely to be many duplicates of the sorted value - meaning that if you got rid of all duplicates, the array might only be 10 unique elements long, but this is not a given.
Once sorted, I need to create a stack that stores, for each unique byte-run, the value and the run length:
so if I ended up with sorted values:
4,4,4,9,9,9,9,9,14,14
the stack would be:
(4,3), (9,5), (14,2)
I figured that there would be some nice optimizations I could perform under these conditions. If I do a heapsort, I can create the stack while I'm sorting, but would this be faster than a qsort followed by building the stack afterwards? Would any sorting algorithm run slower because of the large structs I'm using? Are there any optimizations I can make because I'm only comparing bytes?
BTW: language is c++
Thanks.
I would imagine that the STL will do what you want nicely. Re-writing your own sort routines and containers is likely to be error-prone and slow. So only worry about it if you find it's a bottleneck.
In general with large objects it can be faster to sort an array of pointers/indices of the objects rather than the objects. Or sort an array of nodes, where each node contains a pointer/index of the object and the object's sort key (in this case the key is one byte). To do this in C++ you can just supply a suitable comparator to std::sort or std::stable_sort. Then if you need the original objects in order, as opposed to just needing to know the correct order, finally copy the objects into a new array.
Copying 128 bytes is almost certainly much slower than performing a byte comparison, even with an extra indirection. So for optimal performance it's the moves you need to look at, not the comparisons, and dealing in pointers is one way to avoid most of the moving.
You could build your run-length encoding as you perform the copy at the end.
Of course it might be possible to go even faster with some custom sorting algorithm which makes special use of the numbers in your case (64, "around 128", and 1). But even simple questions like "which is fastest - introsort, heap sort or merge sort" are generally impossible to answer without writing and running the code.
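A sketch of the sort-indices-then-copy approach described above, building the (value, run length) stack during the final copy (the struct layout follows the question's numbers but is otherwise made up):

#include <algorithm>
#include <array>
#include <cstdint>
#include <utility>
#include <vector>

struct Big {
    std::uint8_t key;      // the single byte the sort is based on
    char payload[127];     // roughly 128 bytes per struct, as in the question
};

void sortByKey(std::array<Big, 64>& data,
               std::vector<std::pair<std::uint8_t, int>>& runs)
{
    // Sort 64 small indices instead of moving the 128-byte structs around.
    std::array<int, 64> idx;
    for (int i = 0; i < 64; ++i) idx[i] = i;
    std::stable_sort(idx.begin(), idx.end(),
                     [&](int a, int b) { return data[a].key < data[b].key; });

    // Copy the structs into order once, building the run-length stack as we go.
    std::array<Big, 64> sorted;
    for (int i = 0; i < 64; ++i) {
        const Big& b = data[idx[i]];
        if (runs.empty() || runs.back().first != b.key)
            runs.emplace_back(b.key, 0);
        ++runs.back().second;
        sorted[i] = b;
    }
    data = sorted;
}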
The sort would not be slower, because you will be sorting pointers or references to the structs and not the actual structs in memory.
Given that your keys are integers and there really aren't a lot of them, odds are a bucket sort, with a bucket size of 1, would be very applicable.
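A sketch of what that looks like for the question's data (64 structs keyed on one byte); the struct itself is made up:

#include <array>
#include <cstdint>
#include <vector>

struct Big {
    std::uint8_t key;
    char payload[127];
};

// Bucket sort with one bucket per possible key value: a single pass to
// distribute, a single pass to collect, no comparisons at all.
void bucketSort(std::array<Big, 64>& data)
{
    std::array<std::vector<Big>, 256> buckets;
    for (const Big& b : data)
        buckets[b.key].push_back(b);

    int out = 0;
    for (const auto& bucket : buckets)
        for (const Big& b : bucket)
            data[out++] = b;
}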

How large does a collection have to be for std::map<k,v> to outpace a sorted std::vector<std::pair<k,v> >?

How large does a collection have to be for std::map<K,V> to outpace a sorted std::vector<std::pair<K,V>>?
I've got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I've heard somewhere that for small collections std::vector can be faster -- but I'm wondering where that line is....
EDIT: I'm talking about 5 items or fewer at a time in a given structure. I'm concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I'm looking for a "rule of thumb" to use.
Billy3
It's not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage). An insertion or deletion that maintains order, however, has a complexity of only O(lg N). An insertion or deletion that maintains order in a vector has a complexity of O(N) instead.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data, and make a relatively small number of changes at a time to it. In this case, you can load your data into memory into a sorted vector, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don't bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
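A minimal sketch of that hybrid (int keys and an arbitrary merge threshold are assumptions for illustration):

#include <algorithm>
#include <iterator>
#include <vector>

struct Hybrid {
    std::vector<int> main;      // large, kept sorted
    std::vector<int> updates;   // small, unsorted recent additions

    void insert(int x) {
        updates.push_back(x);
        if (updates.size() > 1024) {          // assumed merge threshold
            std::sort(updates.begin(), updates.end());
            std::vector<int> merged;
            merged.reserve(main.size() + updates.size());
            std::merge(main.begin(), main.end(),
                       updates.begin(), updates.end(),
                       std::back_inserter(merged));
            main.swap(merged);
            updates.clear();
        }
    }

    bool contains(int x) const {
        return std::binary_search(main.begin(), main.end(), x)
            || std::find(updates.begin(), updates.end(), x) != updates.end();
    }
};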
Edit2: (in response to edit in question). If you're talking about 5 items or fewer, you're probably best off ignoring all of the above. Just leave the data unsorted, and do a linear search. For a collection this small, there's effectively almost no difference between a linear search and a binary search. For a linear search you expect to scan half the items on average, giving ~2.5 comparisons. For a binary search you're talking about log2 N, which (if my math is working this time of the morning) works out to ~2.3 -- too small a difference to care about or notice (in fact, a binary search has enough overhead that it could very easily end up slower).
If by "outpace" you mean consuming more space (i.e. memory), then it's very likely that a vector will always be more efficient (the underlying implementation is a contiguous memory array with no other data, whereas a map is a tree, so every element implies using more space). This however depends on how much extra space the vector reserves for future inserts.
When it is about time (and not space), a vector will also always be more efficient (doing a dichotomic search). But it will be extremely bad for adding new elements (or removing them).
So: no simple answer! Look up the complexities and think about the uses you are going to make of the container. http://www.cplusplus.com/reference/stl/
The main issue with std::map is one of cache behaviour, as you pointed out.
The sorted vector is a well-known approach: Loki::AssocVector.
For very small datasets, the AssocVector should crush the map, despite the copying involved during insertion, simply because of cache locality. The AssocVector will also outperform the map for read-only usage. Binary search is more efficient there (fewer pointers to follow).
For all other uses, you'll need to profile...
There is however a hybrid alternative that you might wish to consider: using the Allocator parameter of the map to restrict the memory area where the items are allocated, thus minimizing the locality-of-reference issue (the root of cache misses).
There is also a paradigm shift that you might consider: do you need sorted items, or fast look-up ?
In C++, the only STL-compliant containers for fast lookup have been implemented in terms of Sorted Associative Containers for years. However, the upcoming C++0x features the long-awaited unordered_map, which could outperform all the above solutions!
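If fast lookup rather than sorted order is the real requirement, a hash table is the usual answer; a tiny sketch using unordered_map (standardized later as std::unordered_map, also available in earlier toolchains under tr1 or Boost):

#include <string>
#include <unordered_map>

int main()
{
    // Hash-based lookup: average O(1) per query, no tree traversal,
    // but the keys come back in no particular order.
    std::unordered_map<std::string, int> table;
    table["alice"] = 1;
    table["bob"] = 2;
    return table.count("alice") ? 0 : 1;
}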
EDIT: Seeing as you're talking about 5 items or fewer:
Sorting involves swapping items. When inserting into std::map, that will only involve pointer swaps. Whether a vector or map will be faster depends on how fast it is to swap two elements.
I suggest you profile your application to figure it out.
If you want a simple and general rule, then you're out of luck - you'll need to consider at least the following factors:
Time
How often do you insert new items compared to how often you look items up?
Can you batch inserts of new items?
How expensive is sorting your vector? Vectors of elements that are expensive to swap become very expensive to sort; vectors of pointers take far less.
Memory
How much overhead per allocation does the allocator you're using have? std::map will perform one allocation per item.
How big are your key/value pairs?
How big are your pointers? (32/64 bit)
How fast does your implementation of std::vector grow? (Popular growth factors are 1.5 and 2.)
Past a certain size of container and element, the overhead of allocation and tree pointers will become outweighed by the cost of the unused memory at the end of the vector - but by far the easiest way to find out if and when this occurs is by measuring.
It has to be in the millions of items. And even there...
I am thinking here more of memory usage and memory access patterns. Below hundreds of thousands of items, take whatever you want; there will be no noticeable difference. CPUs are really fast these days, and the bottleneck is memory latency.
But even with millions of items, if your map<> has been built by inserting elements in random order, then when you want to traverse the map (in sorted order) you'll end up jumping around randomly in memory, stalling the CPU while it waits for data, resulting in poor performance.
On the other hand, if your millions of items are in a vector, traversing it is really fast, taking advantage of the CPU's memory access prediction.
As others have written, it depends on your usage.
Edit: I would question the way you organize your thousands of associative containers more than the containers themselves, if they contain only 5 items.

Selection of data structure

I use C++. Say I want to store 40 usernames; I will simply use an array. However, if I want to store 40,000 usernames, is this still a good idea in terms of search speed? Which data structure should I use to improve this speed?
You need to specify what the insertion and removal requirements are. Do things need to be removed and inserted at random points in the sequence?
Also, why the requirement to search sequentially? Are you doing searches that aren't suitable for a hash table lookup?
At the moment I'd suggest a deque or a list. Often it's best to choose a container with the interface that makes for the simplest implementation for your algorithm and then only change the choice if the performance is inadequate and an alternative provides the necessary speedup.
A vector has two principal advantages: there is no per-object memory overhead (although vectors will over-allocate to prevent frequent copying), and objects are stored contiguously, so sequential access tends to be fast. These are also its disadvantages. Growing vectors require reallocation and copying, and insertion and removal anywhere other than at the end of the vector also require copying. Contiguous storage can be a problem for vectors of large numbers of objects or of large objects, as the contiguous storage requirement can be hard to satisfy even with only mild memory fragmentation.
A list doesn't require contiguous storage, but list nodes usually have a per-object overhead of two pointers (in most implementations). This can be significant in lists of very small objects (e.g. in a list of pointers, each node is 3x the size of the data item). Insertion and removal from the middle of a list is very cheap though, and list nodes never need to be moved in memory once created.
A deque uses chunked storage, so it has a low per-object overhead similar to a vector, but doesn't require contiguous storage over the whole container so doesn't have the same problem with fragmented memory spaces. It is often a very good choice for collections and is often overlooked.
As a rule of thumb, prefer vector to list or, deity forbid, a C-style array.
After the vector is filled, make sure it is properly ordered using the sort algorithm. You can then search for a particular record using either find, binary_search or lower_bound. (You don't need to sort to use find.)
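A minimal sketch of that sort-then-search pattern for the usernames case (the sample data is made up):

#include <algorithm>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> usernames = { "carol", "alice", "bob" };

    std::sort(usernames.begin(), usernames.end());           // order once
    bool present = std::binary_search(usernames.begin(), usernames.end(),
                                      std::string("alice")); // O(log n) lookup
    return present ? 0 : 1;
}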
Seriously, unless you are in a resource-constrained environment (embedded platform, phone, or similar), use a std::map; save yourself the effort of sorting or searching and let the container take care of everything. This will likely be a sorted tree structure, probably balanced (e.g. red-black), which means you will get good search performance. Unless the size of your data is close to the size of one or two pointers, the memory overhead of whatever data structure you pick is negligible. Your graphics card probably has more memory than you are going to use for the data you are thinking about.
As others said, there is very little good reason to use a vanilla array; if you don't want to use a map, use std::vector or std::list, depending on whether you need to insert/delete data (=> list) or not (=> vector).
Also consider whether you really need all that data in memory; how about putting it on disk via SQLite, or even using SQLite for in-memory access? It all depends on what you need to do with your data.
std::vector and std::list seem good for this task. You can use an array if you know the maximum number of records beforehand.
If you only need sequential search and storage, then a list is the proper container.
A vector wouldn't be a bad choice either.