Which data structure for a list of objects with fast lookup - C++

I have a data structure and have to perform lookup on it, I would like to optimize things...
struct Data
{
std::string id_;
double data_;
};
I currently use a std::vector<Data> with the std::find algorithm, but I'm wondering whether another data structure would be more suitable:
hash table ?
map ?
boost multi index container ?
other things ?
EDIT:
Each time I receive a message from the network I have to look up an element in this vector (with id as the key) and update or retrieve some information. (The data structure has more fields than in my example.)
EDIT2:
I don't care about order.
I have to insert and erase elements in this data structure frequently.

It really depends on your requirements, but two possibilities are to sort your vector and do a binary search, or to use a map. Both can be implemented within about 15 minutes, so I suggest you try both of them.
Edit: Given your requirement that you want to add and remove things often, and the size of your data, I'd use an unordered_map (i.e. a hash table) as the first try. You can always change to another container later.
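To make the unordered_map suggestion concrete, here is a minimal sketch that keys the question's Data struct by id_. The names DataTable, upsert, on_message and usage_example are made up for illustration and are not from the question; it assumes C++11 and that ids are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

struct Data {
    std::string id_;
    double data_;
};

// Hypothetical wrapper: stores records in a hash table keyed by id_.
class DataTable {
public:
    // Inserts a new record or overwrites the one with the same id.
    void upsert(const Data& d) { table_[d.id_] = d; }

    // Returns a pointer to the record, or nullptr if the id is unknown.
    Data* find(const std::string& id) {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : &it->second;
    }

    // Returns true if a record was removed.
    bool erase(const std::string& id) { return table_.erase(id) > 0; }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, Data> table_;
};

// Handling one network message: update a known id, insert a new one.
inline void on_message(DataTable& table, const std::string& id, double value) {
    if (Data* d = table.find(id)) {
        d->data_ = value;
    } else {
        table.upsert(Data{id, value});
    }
}

inline void usage_example() {
    DataTable table;
    on_message(table, "abc", 1.0);
    on_message(table, "abc", 2.5);  // same id: updated in place
    assert(table.size() == 1);
    assert(table.find("abc")->data_ == 2.5);
    assert(table.erase("abc"));
    assert(table.find("abc") == nullptr);
}
```

Lookup, insertion and erasure are all constant time on average here, which matches the "insert/erase frequently, order does not matter" requirements.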

It depends on whether you care about the order of the elements in your container or not. If you do care, you can do no better than now. If you don't, a hashed container should provide the fastest lookup.
But it also depends on other factors. For instance, if you create the container once and never change it, then maybe an ordered vector, with binary search, will be best.

Related

Datastructure for quick access with more than one key or with key and priority

Thanks to std::map and similar data structures, it's easy to do quick insertion, access and deletion of data elements based on a key.
Thanks to std::make_heap and its colleagues, it's easy to maintain a priority queue based on a value.
But very often, the algorithm needs a combination of both. For example, one has the following struct:
struct entry{
int id;
char name[20];
double value;
};
The algorithm needs to quickly find and remove the entry with the highest value. That calls for a priority queue with std's heap functions. It also needs to quickly remove some elements based on name and/or id. That calls for a std::map.
When programming that kind of algorithm, I often end up using a data structure suited to the most frequent operation (for example, priority access) and then doing a linear search through it for the less frequent operation, for example removal by key.
But is it possible to implement that kind of algorithm maintaining quick access for priority and access over two keys?
One way is boost multi index.
Another is to create two data structures whose values are shared_ptr<const entry> and which use different orderings, plus a wrapping class that ensures adding and removing happen in both. To edit an entry you then have to remove it and reinsert it.
Boost's multi-index is more complex to set up, but claims faster performance as the two data structures are intertwined, causing better cache performance and less memory usage.
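The two-container approach could look roughly like the sketch below. It is only an illustration: EntryIndex, pop_max and usage_example are hypothetical names, the struct uses std::string instead of char[20] for brevity, and only the id key is indexed (a name index would follow the same pattern).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct Entry {
    int id;
    std::string name;
    double value;
};

// Hypothetical wrapper keeping two indices over the same shared entries.
class EntryIndex {
    using Ptr = std::shared_ptr<const Entry>;

public:
    void insert(Entry e) {
        erase(e.id);  // replace any existing entry with the same id
        auto p = std::make_shared<const Entry>(std::move(e));
        by_value_.emplace(p->value, p);
        by_id_.emplace(p->id, p);
    }

    // Removal by key; returns false if the id is unknown.
    bool erase(int id) {
        auto it = by_id_.find(id);
        if (it == by_id_.end()) return false;
        // Several entries may share a value, so scan only that equal range.
        auto range = by_value_.equal_range(it->second->value);
        for (auto v = range.first; v != range.second; ++v) {
            if (v->second == it->second) {
                by_value_.erase(v);
                break;
            }
        }
        by_id_.erase(it);
        return true;
    }

    // Removes and returns the entry with the highest value (nullptr if empty).
    Ptr pop_max() {
        if (by_value_.empty()) return nullptr;
        auto last = std::prev(by_value_.end());
        Ptr p = last->second;
        by_value_.erase(last);
        by_id_.erase(p->id);
        return p;
    }

    std::size_t size() const { return by_id_.size(); }

private:
    std::unordered_map<int, Ptr> by_id_;   // key access
    std::multimap<double, Ptr> by_value_;  // priority access, ascending
};

inline void usage_example() {
    EntryIndex idx;
    idx.insert({1, "a", 3.0});
    idx.insert({2, "b", 7.5});
    idx.insert({3, "c", 5.0});
    assert(idx.erase(3));            // removal by key
    assert(idx.pop_max()->id == 2);  // removal by priority
    assert(idx.size() == 1);
}
```

A std::multimap is used instead of the heap functions because a heap offers no cheap way to remove an arbitrary element, while the ordered container still gives the maximum in logarithmic time.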

Looking for a data structure that's fast to initialize and fast to lookup (O(1))

I need a data structure in which to store information about which instances I have already processed during an action. Because of limitations I can't store it in the instance itself (e.g. because I can execute the action in parallel).
What's specific is that the instances for which I want to store information have a unique number, so instead of a pointer to the instance, I could use that unique number to store information.
My first solution was to use an std::set<Instance *>. Every time I process an instance, I add it to the set so that I know that I already processed that instance.
Advantage: this is very fast to initialize
Disadvantage: lookups are not O(1), but O(logN)
My second solution was to use an std::vector<bool> (actually std::vector<byte>, because bool vectors have a specific specialization which makes them slower than a non-bool vector). The unique number of the instance can be used as an index into the vector, and the vector simply contains true or false to indicate whether we already processed the instance or not (luckily my unique numbers start counting from 1).
Advantage: lookups are O(1)
Disadvantage: initialization is relatively slow, since std::vector needs to initialize every element explicitly (and probably also independently)
I could also use a C-style array (on which I can use memset), but since the number of instances (or the number of unique numbers) is not known beforehand, I need to write my own code to extend the array, memset the rest of the array, ... (which is not very hard, but which is something I want to avoid).
Is there any other kind of data structure that is very fast to initialize, and has O(1) lookup time?
You may try boost::unordered_set or the new C++11 std::unordered_set. They are hash-based containers rather than trees like std::set.
Well, with such a simple identification method... I would use a hash table.
Can you not use boost::unordered_map or std::unordered_map ?
Of course, you might prefer more sophisticated implementations if you want guaranteed O(1) insertion instead of amortized O(1) insertion, but it should get you started.
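As a rough sketch of the hash-set suggestion, keyed on the unique number rather than on the pointer: ProcessedSet, mark and contains are invented names, and the sketch assumes one tracker object per action, so parallel actions do not share state.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical tracker: remembers the unique numbers of processed instances.
// Creating it is cheap because the set starts out empty.
class ProcessedSet {
public:
    // Marks the instance as processed; returns false if it already was.
    bool mark(int unique_number) { return seen_.insert(unique_number).second; }

    // Average O(1) membership test.
    bool contains(int unique_number) const {
        return seen_.count(unique_number) != 0;
    }

private:
    std::unordered_set<int> seen_;
};

inline void usage_example() {
    ProcessedSet processed;
    assert(processed.mark(42));   // first visit
    assert(!processed.mark(42));  // already processed
    assert(processed.contains(42));
    assert(!processed.contains(7));
}
```

Whether this beats the byte vector in practice depends on how many instances are touched per action, so it is worth measuring both.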

Which is the fastest STL container for find?

Alright as a preface I have a need to cache a relatively small subset of rarely modified data to avoid querying the database as frequently for performance reasons. This data is heavily used in a read-only sense as it is referenced often by a much larger set of data in other tables.
I've written a class which will have the ability to store basically the entirety of the two tables in question in memory while listening for commit changes in conjunction with a thread safe callback mechanism for updating the cached objects.
My current implementation has two std::vectors one for the elements of each table. The class provides both access to the entirety of each vector as well as convenience methods for searching for a specific element of table data via std::find, std::find_if, etc.
Does anyone know if using std::list, std::set, or std::map over std::vector for searching would be preferable? Most of the time that is what will be requested of these containers after populating once from the database when a new connection is made.
I'm also open to using C++0x features supported by VS2010 or Boost.
When searching for a particular value, std::set and std::map take O(log N) time, while the other two take O(N) time, so std::set or std::map are probably better. Since you have access to C++0x, you could also use std::unordered_set or std::unordered_map, which take constant time on average.
For find_if, there's little difference between them, because it takes an arbitrary predicate and containers cannot optimize arbitrarily, of course.
However if you will be calling find_if frequently with a certain predicate, you can optimize yourself: use a std::map or std::set with a custom comparator or special keys and use find instead.
A sorted vector using std::lower_bound can be just as fast as std::set if you're not updating very often; they're both O(log n). It's worth trying both to see which is better for your own situation.
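A sorted-vector lookup along those lines might look like this. Row, KeyLess, find_row and insert_row are placeholder names for illustration; the vector is assumed to be kept sorted by key at all times.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cached table row.
struct Row {
    int key;
    std::string payload;
};

// Compares a row against a bare key, as std::lower_bound expects.
struct KeyLess {
    bool operator()(const Row& r, int key) const { return r.key < key; }
};

// Binary search, O(log n). Precondition: rows is sorted by key.
inline const Row* find_row(const std::vector<Row>& rows, int key) {
    auto it = std::lower_bound(rows.begin(), rows.end(), key, KeyLess());
    return (it != rows.end() && it->key == key) ? &*it : nullptr;
}

// Inserts while keeping the vector sorted; O(n) because elements shift.
inline void insert_row(std::vector<Row>& rows, Row row) {
    auto it = std::lower_bound(rows.begin(), rows.end(), row.key, KeyLess());
    rows.insert(it, std::move(row));
}

inline void usage_example() {
    std::vector<Row> rows;
    insert_row(rows, Row{30, "c"});
    insert_row(rows, Row{10, "a"});
    insert_row(rows, Row{20, "b"});
    assert(rows.front().key == 10);
    assert(find_row(rows, 20)->payload == "b");
    assert(find_row(rows, 25) == nullptr);
}
```

The linear-time insert is the price of contiguous storage, which is why this tends to suit data that is populated once and then mostly read.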
Since from your (extended) requirements you need to search on multiple fields, I would point you to Boost.MultiIndex.
This Boost library lets you build one container (with only one copy of each element it contains) and index it over an arbitrary number of indices. It also lets you specify which kinds of indices to use.
To determine the kind of index to use, you'll need extensive benchmarks. 500 is a relatively low number of entries, so constant factors won't play nicely. Furthermore, there can be a noticeable difference between single-thread and multi-thread usage (most hash-table implementations can collapse on MT usage because they do not use linear-rehashing, and thus a single thread ends up rehashing the table, blocking all others).
I would recommend a sorted index (skip-list like, if possible) to accommodate range requests (all names beginning with Abc?) if the performance difference is either unnoticeable or simply does not matter.
If you only want to search for distinct values in one specific column of the table, then a hash table (std::unordered_map or std::unordered_set) is fastest.
If you want to be able to search using several different predicates, you will need some kind of index structure. It can be implemented by extending your current vector based approach with several hash tables or maps, one for each field to search for, where the value is either an index into the vector, or a direct pointer to the element in the vector.
Going further, if you want to be able to search for ranges, such as all occasions having a date in July, you need an ordered data structure from which you can extract a range.
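The "vector plus one hash table per searchable field" idea from this answer could be sketched as follows. Record, RecordCache and the field names are assumptions for illustration; ids are assumed unique and the cache is assumed append-only. Indices are stored rather than pointers because push_back can reallocate the vector and invalidate pointers into it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical table row with two searchable fields.
struct Record {
    int id;
    std::string name;
};

class RecordCache {
public:
    // Appends a row and registers it in both indices.
    void add(Record r) {
        const std::size_t pos = rows_.size();
        by_id_[r.id] = pos;
        by_name_.emplace(r.name, pos);
        rows_.push_back(std::move(r));
    }

    const Record* find_by_id(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &rows_[it->second];
    }

    // Names need not be unique, so every match is returned.
    std::vector<const Record*> find_by_name(const std::string& name) const {
        std::vector<const Record*> out;
        auto range = by_name_.equal_range(name);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(&rows_[it->second]);
        }
        return out;
    }

    // The whole table remains available for linear scans.
    const std::vector<Record>& all() const { return rows_; }

private:
    std::vector<Record> rows_;
    std::unordered_map<int, std::size_t> by_id_;
    std::unordered_multimap<std::string, std::size_t> by_name_;
};

inline void usage_example() {
    RecordCache cache;
    cache.add({1, "alice"});
    cache.add({2, "bob"});
    cache.add({3, "alice"});
    assert(cache.find_by_id(2)->name == "bob");
    assert(cache.find_by_name("alice").size() == 2);
    assert(cache.find_by_id(9) == nullptr);
}
```

For range queries such as the July example, swapping one of the hash tables for a std::multimap would allow lower_bound/upper_bound on that field.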
Not an answer per se, but be sure to use a typedef to refer to the container type you do use, something like typedef std::vector< itemtype > data_table_cache; Then use your typedef type everywhere.

Dynamic array with id?

I need some sort of dynamic array in C++ where each element has its own id, represented by an int.
The datatype needs these functions:
int Insert() - return ID
Delete(int ID)
Get(ID) - return Element
What datatype should I use? I've looked at vector and list, but can't seem to find any sort of ID. I've also looked at map and hash table, which may be useful. However, I'm not sure what to choose.
I would probably use a vector and free id list to handle deletions, then the index is the id. This is really fast to insert and get and fairly easy to manage (the only trick is the free list for deleted items).
Otherwise you probably want to use a map and just keep track of the lowest unused id and assign it upon insertion.
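Here is one way the vector-plus-free-list idea could be written. IdArray is an invented name; Insert, Delete and Get mirror the functions listed in the question, with Get returning a pointer (nullptr for an unknown or deleted id), which is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

template <typename T>
class IdArray {
public:
    // Stores the value and returns its id, reusing a freed id if one exists.
    int Insert(T value) {
        if (!free_ids_.empty()) {
            const int id = free_ids_.back();
            free_ids_.pop_back();
            slots_[id] = Slot{std::move(value), true};
            return id;
        }
        slots_.push_back(Slot{std::move(value), true});
        return static_cast<int>(slots_.size()) - 1;
    }

    // Marks the slot as free; unknown or already deleted ids are ignored.
    void Delete(int id) {
        if (!IsLive(id)) return;
        slots_[id].live = false;
        free_ids_.push_back(id);
    }

    // Returns the element, or nullptr if the id is not in use.
    T* Get(int id) { return IsLive(id) ? &slots_[id].value : nullptr; }

private:
    struct Slot {
        T value;
        bool live;
    };

    bool IsLive(int id) const {
        return id >= 0 && static_cast<std::size_t>(id) < slots_.size() &&
               slots_[id].live;
    }

    std::vector<Slot> slots_;
    std::vector<int> free_ids_;  // ids released by Delete, ready for reuse
};

inline void usage_example() {
    IdArray<std::string> items;
    const int a = items.Insert("first");
    const int b = items.Insert("second");
    items.Delete(a);
    assert(items.Get(a) == nullptr);
    const int c = items.Insert("third");  // reuses the freed id
    assert(c == a);
    assert(*items.Get(b) == "second");
}
```

Note that ids are recycled in this design, so a caller holding a stale id may later see a different element under it.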
A std::map could work for you, which allows to associate a key to a value. The key would be your ID, but you should provide it yourself when adding an element to the map.
An hash table is a sort of basic mechanism that can be used to implement an unordered map. It corresponds to std::unordered_map.
It seems that the best container to use is unordered_map.
It is hash-based. You can insert, delete, or search for elements in O(1) on average (O(n) in the worst case).
Before C++11, unordered_map was not part of the standard library; since C++11 it is available as std::unordered_map. If you cannot use it and want a standard container, use std::map.
It is tree-based. Inserts, deletes, and searches each take O(log(n)).
Still, the container choice depends a lot on how intensively it is used. For example, if you rarely search for elements, vector and list could be fine. These containers do not have a find method, but the <algorithm> library provides one.
A vector gives constant-time random access, the "id" can simply be the offset (index) into the vector. A deque is similar, but doesn't store all items contiguously.
Either of these would be appropriate if the ID values can start at 0 (or at a known offset from 0) and increment monotonically. Over time, if there are a large number of removals, either vector or deque can become sparsely populated, which may be detrimental.
std::map doesn't have the problem of becoming sparsely populated, but look ups move from constant time to logarithmic time, which could impact performance.
boost::unordered_map may be the best choice, since a hash table will likely have the best overall performance characteristics for the question as posed. It does require the Boost library, but there are also unordered container types in std::tr1 if your standard library implementation provides them.

c++ std::map question about iterator order

I am a C++ newbie trying to use a map so I can get constant time lookups for the find() method.
The problem is that when I use an iterator to go over the elements in the map, elements do not appear in the same order that they were placed in the map.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
Please let me know.
Thanks,
jbu
edit: thanks for letting me know map::find() isn't constant time.
Without maintaining another data structure, is there a way to achieve in order iteration while still retaining the constant time lookup ability?
No, that is not possible. In order to get an efficient lookup, the container will need to order the contents in a way that makes the efficient lookup possible. For std::map, that will be some type of sorted order; for std::unordered_map, that will be an order based on the hash of the key.
In either case, the order will be different from the order in which they were added.
First of all, std::map guarantees O(log n) lookup time. You might be thinking of std::tr1::unordered_map, but that by definition sacrifices any ordering to get the constant-time lookup.
You'd have to spend some time on it, but I think you can bash boost::multi_index_container to do what you want.
What about using a vector for the keys in the original order and a map for the fast access to the data?
Something like this:
vector<string> keys;
map<string, Data*> values;
// obtaining values
...
keys.push_back("key-01");
values["key-01"] = new Data(...);
keys.push_back("key-02");
values["key-02"] = new Data(...);
...
// iterating over values in original order
vector<string>::const_iterator it;
for (it = keys.begin(); it != keys.end(); ++it) {
    Data* value = values[*it];
}
I'm going to actually... go backward.
If you want to preserve the order in which elements were inserted, or in general to control the order, you need a sequence that you will control:
std::vector (yes there are others, but by default use this one)
You can use the std::find algorithm (from <algorithm>) to search for a particular value in the vector: std::find(vec.begin(), vec.end(), value);.
Oh yes, it has linear complexity O(N), but for small collections it should not matter.
Otherwise, you can start looking up at Boost.MultiIndex as already suggested, but for a beginner you'll probably struggle a bit.
So, shirk the complexity issue for the moment, and come up with something that works. You can worry about speed when you are more familiar with the language.
Items are ordered by operator< (by default) when applied to the key.
PS. std::map does not guarantee constant-time lookup.
It guarantees a maximum complexity of O(log(n)).
First off, std::map isn't constant-time lookup. It's O(log n). Just thought I should set that straight.
Anyway, you have to specify your own comparison function if you want to use a different ordering. There isn't a built-in comparison function that can order by insertion time, but if your object holds a timestamp field, you can set the timestamp at the time of insertion and use a by-timestamp comparison.
Map is not meant for placing elements in some order - use vector for that.
If you want to find something in a map, you should "search" by the key using operator[].
If you need both: iteration and search by key see this topic
Yes you can create such a data structure, but not using the standard library... the reason is that standard containers can be nested but cannot be mixed.
There is no problem implementing for example a map data structure where all the nodes are also in a doubly linked list in order of insertion, or for example a map where all nodes are in an array. It seems to me that one of these structures could be what you're looking for (depending which operation you prefer to be fast), but neither of them is trivial to build using standard containers because every standard container (vector, list, set, ...) wants to be the one and only way to access contained elements.
For example, I have found it useful in many cases to have nodes that were in multiple doubly-linked lists at the same time, but you cannot do that using std::list.
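For comparison, the "map whose nodes are also in a list in insertion order" idea can be approximated by composing two standard containers: a std::list holding the items and a std::unordered_map from key to list iterator. This is not the single intrusive structure described above, since it does maintain a second container, and InsertionOrderedMap is a name invented for this sketch. It relies on the fact that std::list iterators stay valid when other elements are inserted or erased.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

template <typename V>
class InsertionOrderedMap {
    using Item = std::pair<std::string, V>;
    using List = std::list<Item>;

public:
    // Inserts or overwrites; a new key goes to the end of the iteration order.
    void set(const std::string& key, V value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            return;
        }
        items_.emplace_back(key, std::move(value));
        index_[key] = std::prev(items_.end());
    }

    // Average O(1) lookup; returns nullptr if the key is absent.
    V* find(const std::string& key) {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &it->second->second;
    }

    bool erase(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        items_.erase(it->second);  // O(1) because we hold the list iterator
        index_.erase(it);
        return true;
    }

    // Iteration visits items in insertion order.
    typename List::const_iterator begin() const { return items_.begin(); }
    typename List::const_iterator end() const { return items_.end(); }

private:
    List items_;
    std::unordered_map<std::string, typename List::iterator> index_;
};

inline void usage_example() {
    InsertionOrderedMap<int> m;
    m.set("b", 2);
    m.set("a", 1);
    m.set("c", 3);
    m.erase("a");
    std::string order;
    for (const auto& kv : m) order += kv.first;
    assert(order == "bc");
    assert(*m.find("c") == 3);
}
```

The cost is one extra node allocation and one stored iterator per element, which is the overhead that an intrusive design or Boost.MultiIndex tries to avoid.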