Choosing specific objects satisfying conditions - C++

Let's say I have objects which look very roughly like this:
class object
{
public:
    // ctors etc.
    bool has_property_X() const { ... }
    std::size_t size() const { ... }
private:
    // a little something here, but not really much
};
I'm storing these objects inside a vector and the vector is rather small (say, at most around 1000 elements). Then, inside a performance critical algorithm, I would like to choose the object that both has the property X and has the least size (in case there are multiple such objects, choose any of them). I need to do this "choosing" multiple times, and both the holding of the property X and the size may vary in between the choices, so that the objects are in a way dynamic here. Both queries (property, size) can be made in constant time.
How would I best achieve this? Performance is profiled to be important here. My ideas at the moment:
1) Use std::min_element with a suitable predicate. This would probably also need boost::filter_iterator or something similar to iterate over objects satisfying property X?
2) Use some data structure, such as a priority queue. I would store pointers or reference_wrappers to the objects and so forth. This, at least to me, feels slow, and it's probably not even feasible because of the dynamic nature of the objects.
Any other suggestions or comments on these thoughts? Should I just go ahead, try either or both of these schemes, and profile?

Your last suggestion is always a good one. Our intuitions about how code will run are often wrong, so where possible, profiling critical code is always worthwhile.
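For what it's worth, option 1 does not strictly need boost::filter_iterator; over a vector of at most ~1000 elements, a plain hand-written pass (doing what min_element over a filtered range would do) is easy to write and to profile against the alternatives. A minimal sketch, assuming the object class from the question (the function name is illustrative):

object* choose_smallest_with_X(std::vector<object>& objects)
{
    object* best = 0;
    for (std::size_t i = 0; i < objects.size(); ++i)
    {
        if (!objects[i].has_property_X())
            continue;                            // skip objects without property X
        if (best == 0 || objects[i].size() < best->size())
            best = &objects[i];                  // smallest candidate seen so far
    }
    return best;                                 // 0 if no object has property X
}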

Related

Data structure for sparse insertion

I am asking this question mostly for confirmation, because I am not an expert in data structures, but I think the structure that suits my needs is a hashmap.
Here is my problem (which I guess is typical?):
We are looking at pairwise interactions between a large number of objects (say N=90k), so think about the storage as a sparse matrix;
There is a process, say (P), which randomly starts from one object, and computes a model which may lead to another object: I cannot predict the pairs in advance, so I need to be able to "create" entries dynamically (arguably the performance is not very critical here);
The process (P) may "revisit" existing pairs and update the corresponding element in the matrix: this happens a lot, and therefore I need to be able to find and update as fast as possible.
Finally, the process (P) is repeated millions of times, but it only requires write access to the data structure; it does not need to know about the latest "state" of that storage. This feels intuitively like a detail that might be exploited to improve performance, but I don't think hashmaps do that.
This last point is actually the main reason for my question here: is there a data structure which satisfies the first three points (I'm thinking hash map, correct?), and which would also exploit the last feature for improved performance (I'm thinking something like buffering operations and executing them in bulk asynchronously)?
EDIT: I am working with C++, and would prefer it if there was an existing library implementing that data structure. In addition, I am limited by the system requirements; I cannot use C++11 features.
I would use something like:
#include <boost/unordered_map.hpp>
class Data
{
    boost::unordered_map<std::pair<int,int>,double> map;
public:
    void update(int i, int j, double v)
    {
        map[std::pair<int,int>(i,j)] += v;
    }
    void output(); // Prints data somewhere.
};
That will get you going (you may need to declare a suitable hash function). You might be able to speed things up by making the key type a 64-bit integer, and using ((int64_t)i << 32) | j to build the index.
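For example, a sketch of the 64-bit-key variant (the make_key helper is illustrative; it assumes i and j are non-negative and fit in 32 bits, and uses boost/cstdint.hpp to stay off C++11):

#include <boost/unordered_map.hpp>
#include <boost/cstdint.hpp>

class Data
{
    typedef boost::int64_t key_type;
    boost::unordered_map<key_type, double> map;

    // Pack the pair (i, j) into a single 64-bit key.
    static key_type make_key(int i, int j)
    {
        return (static_cast<key_type>(i) << 32) | static_cast<boost::uint32_t>(j);
    }
public:
    void update(int i, int j, double v)
    {
        map[make_key(i, j)] += v;
    }
    void output(); // Prints data somewhere.
};

With an integral key, the default hashing just works, so no custom hash function is needed.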
If nearly all the updates go to a small fraction of the pairs, you could have two maps (small and large), and directly update the small map. Every time the size of small passes a threshold, you would merge it into large and clear small. You would need to do some careful testing to see whether this helps. The only reason I think it might help is by improving cache locality.
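A rough sketch of that two-tier idea, reusing the 64-bit key and the includes from the sketch above (the threshold is arbitrary; only measurement can tell whether this wins):

#include <cstddef>

class TieredData
{
    typedef boost::unordered_map<boost::int64_t, double> map_t;
    map_t small;               // recently touched pairs
    map_t large;               // everything else
    std::size_t threshold;     // arbitrary; tune by profiling
public:
    explicit TieredData(std::size_t t = 4096) : threshold(t) {}

    void update(boost::int64_t key, double v)
    {
        small[key] += v;
        if (small.size() > threshold)
        {
            // Merge the small map into the large one and start over.
            for (map_t::const_iterator it = small.begin(); it != small.end(); ++it)
                large[it->first] += it->second;
            small.clear();
        }
    }
};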
Even if you end up using a different data structure, you can keep this class interface, and the rest of the code will be undisturbed. In particular, dropping sparsehash into the same structure will be very easy.

C++ index-to-index map

I have a list of IDs (integers).
They are sorted in a really efficient way so that my application can easily handle them, for example
9382
297832
92
83723
173934
(this sort is really important in my application).
Now I am facing the problem of having to access certain values of an ID in another vector.
For example certain values for ID 9382 are located on someVectorB[30].
I have been using
const int UNITS_MAX_SIZE = 400000;
class clsUnitsUnitIDToArrayIndex : public CBaseStructure
{
private:
    int m_content[UNITS_MAX_SIZE];
    long m_size;
protected:
    void ProcessTxtLine(string line);
public:
    clsUnitsUnitIDToArrayIndex();
    int *Content();
    long Size();
};
But now that I raised UNITS_MAX_SIZE to 400,000, I get page stack errors, and that tells me that I am doing something wrong. I think the entire approach is not really good.
What should I use if I want to locate an ID in a different vector if the "position" is different?
ps: I am looking for something simple that can be easily read-in from a file and that can also easily be serialized to a file. That is why I have been using this brute-force approach before.
If you want a mapping from ints to ints and your index numbers are non-consecutive, you should consider a std::map. In this case you would define it as such:
std::map<int, int> m_idLocations;
A map represents a mapping between two types. The first type is the "key" and is used for looking up the second type, known as the "value". For each id you can insert its position with:
m_idLocations[id] = position;
// or
m_idLocations.insert(std::pair<int,int>(id, position));
And you can look them up using the following syntax:
m_idLocations[id];
Typically a std::map in the STL is implemented using red-black trees, which have a worst-case lookup cost of O(log n). This is slightly slower than the O(1) you'd be getting from the huge array, but it's a substantially better use of space, and you're unlikely to notice the difference in practice unless you're storing truly gigantic amounts of numbers or doing an enormous number of lookups.
Edit:
In response to some of the comments, I think it's important to point out that moving from O(1) to O(log n) can make a significant difference in the speed of your application, not to mention the practical cost of moving from a fixed block of memory to a tree-based structure. However, I think it's important to first represent what you're trying to say (an int-to-int mapping) and avoid the pitfall of premature optimization.
After you've represented the concept you should then use a profiler to determine if and where the speed issues are. If you find that the map is causing issues then you should look at replacing your mapping with something that you think will be quicker. Make sure to test that the optimization helped and don't forget to include a big comment about what you are representing and why it needed to be changed.
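Since the question also asks about reading the mapping from a file and serializing it back, that part stays simple with a map; a minimal sketch, using a hypothetical plain-text format of one "id position" pair per line:

#include <map>
#include <fstream>

void save(const std::map<int,int>& m, const char* path)
{
    std::ofstream out(path);
    for (std::map<int,int>::const_iterator it = m.begin(); it != m.end(); ++it)
        out << it->first << ' ' << it->second << '\n';   // "id position"
}

void load(std::map<int,int>& m, const char* path)
{
    std::ifstream in(path);
    int id, position;
    while (in >> id >> position)
        m[id] = position;
}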
If nothing else works, you can just allocate the array dynamically in the constructor. This will move the large array onto the heap and avoid your page stack error. You should also remember to release the resource when destroying your clsUnitsUnitIDToArrayIndex.
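A minimal sketch of that variant, keeping the original class shape (copying of the class would also need to be handled or disabled):

class clsUnitsUnitIDToArrayIndex : public CBaseStructure
{
private:
    int *m_content;    // heap-allocated instead of a huge in-object array
    long m_size;
public:
    clsUnitsUnitIDToArrayIndex() : m_content(new int[UNITS_MAX_SIZE]), m_size(0) {}
    ~clsUnitsUnitIDToArrayIndex() { delete[] m_content; }
    int *Content() { return m_content; }
    long Size() { return m_size; }
};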
But the recommended usage is as suggested by other members, use a std::vector or std::map
Probably you are getting a stack overflow error due to int m_content[UNITS_MAX_SIZE]. The array is allocated on the stack, and 400000 is a pretty big number for the stack. You can use std::vector instead: it is dynamically allocated, and you can return a reference to the vector member to avoid a copy operation:
std::vector<int> m_content;   // member; sized once in the constructor

clsUnitsUnitIDToArrayIndex::clsUnitsUnitIDToArrayIndex() : m_content(UNITS_MAX_SIZE) {}

const std::vector<int> &clsUnitsUnitIDToArrayIndex::Content() const
{
    return m_content;
}

stl map performance?

I am using map<MyStruct, I*> map1;. Apparently 9% of my total app time is spent in there. Specifically on one line of one of my major functions. The map isn't very big (<1k almost always, <20 is common).
Is there an alternative implementation I may want to use? I think I shouldn't write my own, but I could if I thought it was a good idea.
Additional info: I always check before adding an element. If a key exists I need to report a problem. Then after a point I will be using the map heavily for lookups and will not add any more elements.
First you need to understand what a map is and what the operations that you are doing represent. A std::map is a balanced binary tree, lookup will take O( log N ) operations, each of which is a comparison of the keys plus some extra that you can ignore in most cases (pointer management). Insertion takes roughly the same time to locate the point of insertion, plus allocation of the new node, the actual insertion into the tree and rebalancing. The complexity is again O( log N ) although the hidden constants are higher.
When you try to determine whether a key is in the map prior to insertion, you are incurring the cost of the lookup and, if it does not succeed, the same cost again to locate the point of insertion. You can avoid the extra cost by using std::map::insert, which returns a pair with an iterator and a bool telling you whether the insertion actually happened or the element was already there.
Beyond that, you need to understand how costly it is to compare your keys, which is something the question does not show (MyStruct could hold just one int or a thousand of them), and which you need to take into account.
Finally, it might be the case that a map is not the most efficient data structure for your needs, and you might want to consider using either a std::unordered_map (hash table), which has expected constant-time insertions (if the hash function is not horrible), or, for small data sets, even a plain ordered array (or std::vector) on which you can use binary search to locate the elements (this will reduce the number of allocations, at the cost of more expensive insertions, but if the held types are small enough it might be worth it).
As always with performance, measure and then try to understand where the time is being spent. Also note that 10% of the time spent in a particular function or data structure might be a lot or almost nothing at all, depending on what your application is. For example, if your application is just performing lookups and insertions into a data set, and that takes only 10% of the CPU, you have a lot to optimize everywhere else!
Probably it will be quicker to just do an insert and check whether pair.second is false, which tells you the key already exists:
like this
if ( myMap.insert( std::make_pair( key, value ) ).second == false )   // key is a MyStruct, value is an I*
{
    // key already present -- report error
}
else
{
    // inserted new value
}
... rather than doing a find call every time.
Instead of map you could try unordered_map which uses hash keys, instead of a tree, to find elements. This answer gives some hints when to prefer unordered_map over map.
It might be a long shot, but for small collections, sometimes the most critical factor is the cache performance.
Since std::map is implemented as a red-black tree, which is [AFAIK] not very cache-efficient, maybe implementing the map as a std::vector<std::pair<MyStruct,I*>> and using binary search there [instead of map look-ups] would be a good idea. At the very least it should be efficient once you reach the phase where you only look up [and stop inserting elements], since a std::vector is more likely to fit in cache than the map.
This factor [the CPU cache] is usually neglected and hidden as a constant in the big-O notation, but for large collections it might have a major effect.
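A sketch of that sorted-vector approach (it assumes MyStruct has, or is given, an operator< ordering the keys; the names are illustrative):

#include <vector>
#include <algorithm>
#include <utility>

typedef std::pair<MyStruct, I*> Entry;

struct KeyLess
{
    bool operator()(const Entry& a, const Entry& b) const { return a.first < b.first; }
};

std::vector<Entry> table;   // fill with push_back during the insertion phase, then:
// std::sort(table.begin(), table.end(), KeyLess());

I* lookup(const MyStruct& key)
{
    std::vector<Entry>::iterator it =
        std::lower_bound(table.begin(), table.end(),
                         Entry(key, static_cast<I*>(0)), KeyLess());
    if (it != table.end() && !(key < it->first))
        return it->second;    // found
    return 0;                 // not present
}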
The way you are using the map, you're doing lookups on the basis of a MyStruct instance and depending on your particular implementation, the required comparison may or may not be costly.
Is there an alternative implementation i may want to use? I think i shouldn't write my own but i could if i thought it was a good idea.
If you understand the problem well enough, you should detail how your implementation will be superior.
Is map the proper structure? If so, then your standard library's implementation will likely be of good quality (well optimized).
Can MyStruct comparison be simplified?
Where is the problem -- resizing? lookup?
Have you minimized copy and assign costs for your structures?
As stated in the comments, without actual code there are few universal answers to give you. However, if MyStruct is really huge, copying it around may be costly. Perhaps it makes sense to store pointers to MyStruct and implement your own comparison mechanism:
template <typename T> struct deref_cmp {
    bool operator()(std::shared_ptr<T> lhs, std::shared_ptr<T> rhs) const {
        return *lhs < *rhs;
    }
};

std::map<std::shared_ptr<MyStruct>, I*, deref_cmp<MyStruct>> mymap;
However, this is something you will have to profile. It might speed things up.
You would look up an element like this
template <typename T> struct NullDeleter {
    void operator()(T const*) const {}
};

// needle being a MyStruct
mymap.find(std::shared_ptr<MyStruct>(&needle, NullDeleter()));
Needless to say, there is more potential to optimise.

An integer hashing problem

I have a (C++) std::map<int, MyObject*> that contains a couple of million objects of type MyObject*. The maximum number of objects that I can have is around 100 million. The key is the object's id. During a certain process, these objects must be somehow marked (with a 0 or 1) as fast as possible. The marking cannot happen on the objects themselves (so I cannot introduce a member variable and use that for the marking process). Since I know the minimum and maximum id (1 to 100_000_000), the first thought that occurred to me was to use a std::bitset<100000000> and perform my marking there. This solves my problem and also makes it easier when marking processes run in parallel, since each uses its own bitset to mark things, but I was wondering what the solution could be if I had to use something else instead of a 0-1 marking, e.g. what could I use if I had to mark all objects with an integer number?
Is there some form of a data structure that can deal with this kind of problem in a compact (memory-wise) manner, and also be fast? The main queries of interest are whether an object is marked, and what it was marked with.
Thank you.
Note: std::map<int, MyObject*> cannot be changed. Whatever data structure I use, must not deal with the map itself.
How about making the value_type of your map a std::pair<bool, MyObject*> instead of MyObject*?
If you're not concerned with memory, then a std::vector<int> (or whatever suits your need in place of an int) should work.
If you don't like that, and you can't modify your map, then why not create a parallel map for the markers?
std::map<id,T> my_object_map;
std::map<id,int> my_marker_map;
If you cannot modify the objects directly, have you considered wrapping the objects before you place them in the map? e.g.:
struct T_wrapper
{
    int marker;
    T *p_x;
};
std::map<int,T_wrapper> my_map;
If you're going to need to do lookups anyway, then this will be no slower.
EDIT: As #tenfour suggests in his/her answer, a std::pair may be a cleaner solution here, as it saves the struct definition. Personally, I'm not a big fan of std::pairs, because you have to refer to everything as first and second, rather than by meaningful names. But that's just me...
The most important question to ask yourself is "How many of these 100,000,000 objects might be marked (or remain unmarked)?" If the answer is smaller than roughly 100,000,000/(2*sizeof(int)), then just use another std::set or std::tr1::unordered_set (hash_set previous to tr1) to track which ones are so marked (or remained unmarked).
Where does 2*sizeof(int) come from? It's an estimate of the amount of memory overhead to maintain a heap structure in a deque of the list of items that will be marked.
If it is larger, then use std::bitset as you had planned. Its overhead is effectively 0% for the scale of quantity you need. You'll need about 13 megabytes of contiguous RAM to hold the bitset.
If you need to store a marking as well as presence, then use std::tr1::unordered_map using the key of Object* and value of marker_type. And again, if the percentage of marked nodes is higher than the aforementioned comparison, then you'll want to use some sort of bitset to hold the number of bits needed, with suitable adjustments in size, at 12.5 megabytes per bit.
A purpose-built object holding the bitset might be your best choice, given the clarification of the requirements.
Edit: this assumes that you've done proper time-complexity computations for what are acceptable solutions to you, since changing the base std::map structure is no longer permitted.
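For the "mark with an integer" variant, a purpose-built wrapper can stay very compact; a rough sketch (it assumes ids run from 1 to a known maximum and that marks fit in a byte, with 0 meaning unmarked):

#include <vector>
#include <cstddef>

class MarkerStore
{
    std::vector<unsigned char> marks;   // ~100 MB for 100 million ids, one byte each
public:
    explicit MarkerStore(std::size_t max_id) : marks(max_id + 1, 0) {}

    void mark(int id, unsigned char value)  { marks[id] = value; }
    bool is_marked(int id) const            { return marks[id] != 0; }
    unsigned char marked_with(int id) const { return marks[id]; }
};

For a pure 0/1 marking, std::vector<bool> (or a heap-allocated bitset) packs the same information into roughly 12.5 megabytes, matching the numbers above.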
If you don't mind using hacks, take a look at the memory optimization used in Boost.MultiIndex. It can store one bit in the LSB of a stored pointer.

Choice of the most performant container (array)

This is my little big question about containers, in particular, arrays.
I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double coordinates each). I am looking for the best way (in terms of performance) to implement a class that will contain a container for these data and provide manipulation primitives for them (e.g. instantiation, operator[], etc.).
There are a few restrictions on how this set is used:
its size is read from a configuration file and won't change during execution
it can be viewed as a big two dimensional array of N (e.g. 1 000 000) lines and 6 columns (each one storing the coordinate in one dimension)
the array is manipulated in a big loop, each "particle / line" is accessed and computation takes place with its coordinates, and the results are stored back for this particle, and so on for each particle, and so on for each iteration of the big loop.
no new elements are added or deleted during the execution
First conclusion: as the elements are essentially accessed one by one with [], I think that I should use a normal dynamic array.
I have explored a few things, and I would like to have your opinion on the one that can give me the best performances.
As I understand it, there is no advantage to using a dynamically allocated array instead of a std::vector, so things like double** array2d = new ..., loops of new, etc. are ruled out.
So is it a good idea to use std::vector<double> ?
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > my_array that can be indexed like my_array[i][j], or is it a bad idea and would it be better to use std::vector<double> other_array and access it with other_array[6*i+j]?
Maybe this can give better performance, especially as the number of columns is fixed and known from the beginning.
If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?
Another option, the one that I am using so far is to use Blitz, in particular blitz::Array:
typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;
Where my elements are accessed like that: my_store(line, column);.
I think there is not much advantage to using Blitz in my case, because I am accessing each element one by one, and Blitz would be interesting if I were using operations directly on arrays (like matrix multiplication), which I am not.
Do you think that Blitz is OK, or is it useless in my case ?
These are the possibilities I have considered so far, but maybe the best one is still another one, so don't hesitate to suggest other things.
Thanks a lot for your help on this problem !
Edit:
From the very interesting answers and comments below, a good solution seems to be the following:
Use a structure particle (containing 6 doubles) or a static array of 6 doubles (this avoids the use of two-dimensional dynamic arrays)
Use a vector or a deque of this particle structure or array. It is then good to traverse them with iterators, and that will allow changing from one to the other later.
In addition I can also use a Blitz::TinyVector<double,6> instead of a structure.
So is it a good idea to use std::vector<double> ?
Usually, a std::vector should be the first choice of container. You could use either std::vector<>::reserve() or std::vector<>::resize() to avoid reallocations while populating the vector. Whether any other container is better can be found by measuring. And only by measuring. But first measure whether anything the container is involved in (populating, accessing elements) is worth optimizing at all.
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > [...]?
No. IIUC, you are accessing your data per particle, not per row. If that's the case, why not use a std::vector<particle>, where particle is a struct holding six values? And even if I understood incorrectly, you should rather write a two-dimensional wrapper around a one-dimensional container. Then align your data either in rows or columns - whatever is faster with your access patterns.
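A minimal sketch of such a wrapper (the class name is illustrative; with the column count fixed at compile time, the index arithmetic should inline away, so there is no function-call overhead in practice):

#include <vector>
#include <cstddef>

class ParticleArray
{
    static const std::size_t cols = 6;
    std::vector<double> data;           // one contiguous block, row-major
public:
    explicit ParticleArray(std::size_t rows) : data(rows * cols) {}

    double&       operator()(std::size_t i, std::size_t j)       { return data[i * cols + j]; }
    const double& operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};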
Do you think that Blitz is OK, or is it useless in my case?
I have no practical knowledge about blitz++ and the areas it is used in. But isn't blitz++ all about expression templates to unroll loop operations and optimizing away temporaries when doing matrix manipulations? ICBWT.
First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct:
struct Particle { /* coords */ };
Then we can make a simple one dimensional array of these Particles.
I would probably use a deque, because that's the default container, but you may wish to try a vector; it's just that 1,000,000 particles means a single contiguous chunk of roughly 48 MB (6 doubles each). It should hold, but it might strain your system if this ever grows, while the deque will allocate several smaller chunks.
WARNING:
As Alexandre C remarked, if you go the deque road, refrain from using operator[] and prefer to use iteration style. If you really need random access and it's performance sensitive, the vector should prove faster.
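As a concrete illustration of that iteration style (a sketch, assuming the Particle struct defined above):

#include <deque>

void big_loop(std::deque<Particle>& particles)
{
    for (std::deque<Particle>::iterator it = particles.begin(); it != particles.end(); ++it)
    {
        // read this particle's coordinates, compute, and store the results back;
        // the field names depend on how Particle is actually defined
    }
}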
The first rule when choosing from containers is to use std::vector. Then, only after your code is complete and you can actually measure performance, you can try other containers. But stick to vector first. (And use reserve() from the start)
Then, you shouldn't use a std::vector<std::vector<double> >. You know the size of your data: it's 6 doubles. No need for it to be dynamic. It is constant and fixed. You can define a struct to hold your particle members (the six doubles); note that a bare typedef double particle[6] will not work as a vector element type, because raw arrays are not copyable or assignable. Then, use a vector of particles: std::vector<particle>.
Furthermore, as your program uses the particle data contained in the vector sequentially, you will take advantage of the modern CPU cache read-ahead feature at its best performance.
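Putting that together, a minimal sketch (the struct and field names are illustrative):

#include <vector>

struct particle
{
    double coords[6];
};

std::vector<particle> particles;
// particles.reserve(n);   // n is the size read from the configuration file
// particles.resize(n);    // or size it up front and fill the elements in place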
You could go several ways. But in your case, don't declare a std::vector<std::vector<double> >. You'd be allocating a vector (and copying it around) for every 6 doubles. That's way too costly.
If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?
(other_array[i,j] won't work too well, as i,j employs the comma operator to evaluate the value of "i", then discards that and evaluates and returns "j", so it's equivalent to other_array[j]).
You will need to use one of:
other_array[i][j]
other_array(i, j) // if other_array implements operator()(int, int),
// but std::vector<> et al don't.
other_array[i].identifier // identifier is a member variable
other_array[i].identifier() // member function getting value
other_array[i].identifier(double) // member function setting value
You may or may not prefer to put get_ and set_ prefixes (or similar) on the last two functions, should you find them useful, but from your question I think you won't: functions are preferred in APIs between parts of large systems involving many developers, or when the data items may vary and you want the algorithms working on the data to be independent thereof.
So, a good test: if you find yourself writing code like other_array[i][3] where you've decided "3" is the double with the speed in it, and other_array[i][5] because "5" is the acceleration, then stop doing that and give them proper identifiers so you can say other_array[i].speed and .acceleration. Then other developers can read and understand it, and you're much less likely to make accidental mistakes. On the other hand, if you are iterating over those 6 elements doing exactly the same things to each, then you probably do want Particle to hold a double[6], or to provide an operator[](int). There's no problem doing both:
struct Particle
{
    double x[6];
    double& speed() { return x[3]; }
    double speed() const { return x[3]; }
    double& acceleration() { return x[5]; }
    ...
};
BTW / the reason that vector<vector<double> > may be too costly is that each set of 6 doubles will be allocated on the heap, and for fast allocation and deallocation many heap implementations use fixed-size buckets, so your small request will be rounded up to the next size: that may be a significant overhead. The outside vector will also need to record an extra pointer to that memory. Further, heap allocation and deallocation is relatively slow - in your case, you'd only be doing it at startup and shutdown, but there's no particular point in making your program slower for no reason. Even more importantly, the areas on the heap may be scattered around in memory, so your operator[] may cause cache misses, pulling in more distinct memory pages than necessary and slowing the entire program. Put another way, vectors store elements contiguously, but the pointed-to vectors may not be contiguous.