What container to choose - c++

I thought about storing some objects ... and now I don't know what to choose.
So, now I have such code:
std::map<std::string, Object*> mObjects;
But, as I was told here before, it's slow because a std::string is allocated on every lookup, so the key should be an integer.
Why did I choose std::string as the key? Because it's very easy to access objects by their name, for example:
mObjects["SomeObj"];
So my first idea is:
std::map<int, Object*> mObjects;
and the key is a CRC of the object name:
mObjects[CRC32("SomeObject")];
But it's a bit unstable. And I know there are special hash maps for this.
And lastly, I have to sort the objects in the map using some Compare function.
Any ideas about container I can use?
So again, the main points:
Accessing objects by string, but the key should be an integer, not a string
Sorting objects in map by some function
p.s. boost usage is permissible.

I can't say for sure, but are you always accessing items in the map by a literal string? If so, then you should just use consecutive enumerated values with symbolic names, and an appropriately sized vector.
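A minimal sketch of the enum-plus-vector idea, assuming every name is known at compile time. The `Object` fields and the enumerator names here are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical object type; stands in for the real Object class.
struct Object {
    int value;
};

// Symbolic names replace the string keys. ObjectIdCount must stay last
// so that it always equals the number of real entries.
enum ObjectId { SomeObj, OtherObj, ObjectIdCount };

// Builds a table with exactly one slot per enumerator.
std::vector<Object> make_table() {
    std::vector<Object> objects(ObjectIdCount);
    objects[SomeObj].value = 10;
    objects[OtherObj].value = 20;
    return objects;
}
```

Access then becomes `objects[SomeObj]`, a plain array index with no hashing or string comparison at all.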
Assuming that you won't know the names until runtime: 1000 items in the map seems really small for searching to possibly be a bottleneck. Are you sure that the lookup is the performance problem? Have you profiled to make sure that is the case? In general, using the most intuitive container is going to result in better code (because you can grasp the algorithm more easily).
Does your comment about constructing strings imply that you are passing C-strings into the find function over and over? Try to avoid that by using std::string consistently in your application.
If you insist on using the two-part approach: I suggest storing all your items in a vector. Then you have one unordered_map from string to index and another vector that has all the indexes into the main container. Then you sort this second container of indexes to get the ordering you need. Finally, when you delete items from the master container you'll need to clean up both of the other two referencing containers.
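The two-part approach could look roughly like this. It is only a sketch: `ObjectStore`, the `priority` field and the method names are invented for the example, and deletion (which has to fix up both referencing containers) is left out:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical object type with just enough fields for the sketch.
struct Object {
    std::string name;
    int priority;
};

class ObjectStore {
public:
    // Appends the object and records its index under its name.
    // Returns false if the name is already taken.
    bool add(const Object& obj) {
        if (index_by_name_.count(obj.name) != 0) return false;
        index_by_name_[obj.name] = items_.size();
        order_.push_back(items_.size());
        items_.push_back(obj);
        return true;
    }

    // Hash lookup by name; returns nullptr when the name is absent.
    const Object* find(const std::string& name) const {
        auto it = index_by_name_.find(name);
        return it == index_by_name_.end() ? nullptr : &items_[it->second];
    }

    // Sorts only the index vector; the objects themselves never move.
    template <typename Compare>
    void sort(Compare comp) {
        std::sort(order_.begin(), order_.end(),
                  [&](std::size_t a, std::size_t b) {
                      return comp(items_[a], items_[b]);
                  });
    }

    // Object names in the current sorted order.
    std::vector<std::string> sorted_names() const {
        std::vector<std::string> out;
        for (std::size_t i : order_) out.push_back(items_[i].name);
        return out;
    }

private:
    std::vector<Object> items_;                                   // master container
    std::unordered_map<std::string, std::size_t> index_by_name_;  // name -> index
    std::vector<std::size_t> order_;                              // indexes, sorted
};
```

Because the map and the order vector hold indexes rather than pointers, they stay valid when the master vector reallocates as it grows.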

Related

Storing pointers inside C++ standard library containers

I am implementing a project in an environment where I am required to create hundreds of millions of std::string objects. I am storing these strings in multiple containers, so the number of copies of these objects is multiplied, which is a huge bottleneck for my program.
I am trying to come up with a solution, and my online research has taken me so far. Basically my idea is, given that the strings I construct are constant and unnecessarily copied, I would like to instead allocate my own c-type strings (char arrays), and share these pointers across containers (I need these containers because of different advantages of look-up, insertion, etc...).
The main containers that I use are std::vector, std::map, std::unordered_set. For the last two, I am looking for ways to make these containers compatible with the char* type. With the help of Stack Overflow, I created a custom hash function for std::unordered_set and a char* comparison (less than, "<") for use as std::map<_,_,less_than>.
To make my questions clear, I am going to list them.
Before going into technical issues, is this achievable, or standard usage, or worth striving for ?
The comparison function works and insertion is successful. However, given that two pointers can point to identical strings but act like different keys inside a std::unordered_set or std::map, will I also need something like an equality operator overload to be able to use methods like contains or erase? For example, if const char* p1 = "beta" and const char* p2 = "beta", std::map::erase(p1) should be able to delete the entry p2 if it is present inside the std::map (suppose the two "beta" strings are at different memory positions).
If I made myself clear, is there a better way ?
Thanks for your time.
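For what it's worth, here is a sketch of how content-based C-string keys can be set up. The functor names are made up, and the hash is a plain FNV-1a loop chosen only to keep the example self-contained. std::map needs nothing beyond the comparator, because it treats two keys as equivalent when neither compares less than the other; std::unordered_set needs both a hash and an equality predicate:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <unordered_set>

// Orders C strings by content, not by address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Content-based equality, required by the unordered containers.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// FNV-1a over the characters, so equal contents give equal hashes.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        std::uint64_t h = 1469598103934665603ULL;
        for (; *s != '\0'; ++s) {
            h ^= static_cast<unsigned char>(*s);
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

using CStrMap = std::map<const char*, int, CStrLess>;
using CStrSet = std::unordered_set<const char*, CStrHash, CStrEqual>;
```

With these types, erasing or looking up through a different pointer to the same text hits the same entry. The containers do not own the character arrays, so the strings must outlive every container that refers to them.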

Use 2 keys in std::map

I was trying to create a std::map with 2 keys. I can do it with std::pair or create a struct and use it as the key.
In my software, there's a search function that is being called a lot. This function searches by the 1st or the 2nd key.
If I had about 1000 items in the map, I'm guessing it will take some time if I wanted to search it. So I thought that if I make another std::map that holds the 2nd key and the value is 1st key, then I can take the value and search in the other map to get the real value.
But my guess is that this will take more memory. What is the best option in this scenario?
This is an engineering decision that you'll have to resolve.
Multiple maps only make sense (IMHO) if you know that the sets of key1 and key2 keys conflict. Otherwise, why not just insert both keys into the same map, each corresponding value referring to your object?
You don't want to duplicate your object, so you might put them in a vector, and put the vector index as the mapped value. Or use a map of key-to-pointer, etc.
1k items is not really that many, so I'm not concerned about the memory use here, but using a map instead of an unordered_map might be a concern (rb-tree vs. hash table).
Also, if you remove items from the map, you'll need to be able to remove both keys together, so be sure to account for that.
Brute force approach
You can store your items in std::vector and have two maps: the first with your first key and pointers (or indices) to vector items and the second with your second key and pointers (or indices) to vector items. The problem is to maintain all three containers when your set is modified.
Pointers vs. indices: Pointers are dangerous, as correctly pointed out in the comment, but simpler if you're going to delete items from the vector. Otherwise indices are safer.
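A rough sketch of the brute force approach using indices. `Item`, `TwoKeyTable` and the member names are hypothetical, removal is not shown, and adding an item whose key already exists simply repoints that key to the new item:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record with two independent keys.
struct Item {
    int id;            // first key
    std::string name;  // second key
    double value;
};

class TwoKeyTable {
public:
    // Stores the item once and indexes it under both keys.
    void add(const Item& item) {
        const std::size_t idx = items_.size();
        items_.push_back(item);
        by_id_[item.id] = idx;
        by_name_[item.name] = idx;
    }

    // Lookup by the first key; nullptr when absent.
    const Item* find_by_id(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &items_[it->second];
    }

    // Lookup by the second key; nullptr when absent.
    const Item* find_by_name(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &items_[it->second];
    }

private:
    std::vector<Item> items_;                     // single copy of each item
    std::map<int, std::size_t> by_id_;            // first key -> index
    std::map<std::string, std::size_t> by_name_;  // second key -> index
};
```

The stored indices survive reallocation of the vector, but a pointer returned by one of the find functions should not be kept across a later `add`.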
Smart approach
You can use Boost.MultiIndex container that was designed for cases exactly like yours.

Dynamic size of array in c++?

I am confused. I don't know what container I should use. Let me tell you what I need first. Basically I need a container that can store X objects (the number of objects is unknown; it could be 1 - 50k).
I read a lot. Over here, array vs list, it says that an array needs to be resized if the number of objects is unknown (I am not sure how to resize an array in C++), and it also states that with a linked list, searching for a certain item means looping (iterating) from first to last (or vice versa), while an array lets you specify "array object at index".
Then I looked at other solutions: map, vector, etc. Like this one: array vs vector. One responder says to never use arrays.
I am new to C++; I have only used array, vector, list and map before. Now, for my case, what kind of container would you recommend? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object
std::vector is what you need.
You have to consider 2 things when selecting an STL container.
Data you want to store
Operations you want to perform on the stored data
There was a good diagram in a question here on SO which depicts this. I cannot find the link to it, but I saved it a long time ago; here it is:
You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.
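As a small illustration (the `Object` fields and helper names are invented for the example), a std::vector grows on its own as objects are appended, and a linear search over it is a single std::find_if call:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical object type for the example.
struct Object {
    int id;
    std::string label;
};

// Appends n objects; the vector reallocates by itself as it grows,
// so the count does not have to be known up front.
std::vector<Object> make_objects(int n) {
    std::vector<Object> objects;
    for (int i = 0; i < n; ++i) {
        objects.push_back(Object{i, "obj" + std::to_string(i)});
    }
    return objects;
}

// Linear search for the first object with a matching id;
// returns nullptr when nothing matches.
const Object* find_by_id(const std::vector<Object>& objects, int id) {
    auto it = std::find_if(objects.begin(), objects.end(),
                           [id](const Object& o) { return o.id == id; });
    return it == objects.end() ? nullptr : &*it;
}
```

If the final count is roughly known, calling `reserve` before the loop avoids the intermediate reallocations, but it is an optimization, not a requirement.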
The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is iterator validity. Inserting into a std::vector may invalidate iterators; inserting into a std::list never does.
Do you need to loop through the container, or do you have a key or ID for your objects?
If you have a key or ID, you can use a map to quickly access the object by it; if the ID is a simple index, then you can use a vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.
You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.

Efficient C++ associative container with vector key

I've constructed a map which has a vector as its key: map<vector<KeyT>, T> which I'm trying to optimize now.
An experiment with manually nested maps map<vector<KeyT>, map<KeyT,T> > where the first key is the original vector minus the last element and the second key is the last element shows a reasonable speed-up.
Now I'm wondering whether there exists a semi-standard implementation (like boost or similar) of an associative container where vector keys are implemented as such a hierarchical structure of containers.
Ideally, this would create as many layers as there are elements in the key vector, while keeping a uniform syntax for vectors of different length.
Are you sure you need to optimise it? std::string is basically like a std::vector and we happily use std::string as an array key!
Have you profiled your code? std::map doesn't copy its key/value pairs unnecessarily -- what exactly are you afraid of?
Are your vector keys of a fixed-size? std::tuple might help in that case.
If not, it might help to partition your containers according to the length of the key, although the effectiveness of schemes such as this is highly domain-dependent.
My first hunch is that you want to improve map lookup time by reducing the volume of the key. This is what hash functions are for. C++ TR1 and Boost have hash maps by the name of unordered_map.
I'll try to devise a small sample in some time here
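One possible shape for such a sample, not the original poster's. It assumes the key elements are `int` and uses a hand-written hash functor (`VectorHash` is a made-up name) that mixes element hashes in the style of boost::hash_combine:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hashes a whole vector by folding in the hash of each element.
// The mixing step follows the well-known hash_combine pattern.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& key) const {
        std::size_t seed = key.size();
        for (int x : key) {
            seed ^= std::hash<int>()(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// Equality comes from std::vector's own operator==, so only the hash
// has to be supplied. The mapped type is std::string for illustration.
using VectorKeyMap =
    std::unordered_map<std::vector<int>, std::string, VectorHash>;
```

Keys of different lengths go into the same container with the same syntax. Whether this beats the nested-map experiment depends on the data, so it is worth measuring both.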

Hash map, string compares, and std::map?

First off, I would like to make a few points I believe to be true. Please can these be verified?
A hash map stores strings by converting them into an integer somehow.
std::map is not a hash map, and if I'm using strings, I should consider using a hash map for memory issues?
String compares are not good to rely on.
If std::map is not a hash map and I should not be relying on string compares (basically, I have a map with strings as keys...I was told to look up using hash maps instead?), is there a hash map in the C++ STL? If not, how about Boost?
Secondly, Is a hash map worth it for [originally] an std::map< std::string, non-POD GameState >?
I think my point is getting across...I plan to have a store of different game states I can look up and register into a factory.
If any more info is needed, please ask.
Thanks for your time.
I don't believe most of your points are correct.
there is no hash map in the current standard. C++0x (since standardized as C++11) introduces unordered_map, whose implementation will be a hash table, and your compiler probably already supports it.
std::map is implemented as a balanced tree, not a hash table. There are no "memory issues" when using either map type with strings, either as keys or data.
strings are not stored as numbers in either case - an unordered_map will use a hashing function to derive a numeric key from the string, but this is not stored.
my experience is that unordered_map is about twice the speed of map - they have basically the same interface, so you can try both with your own data - whenever you are interested in performance you should always perform tests yourself with your own real data, rather than depending on the experience of others. Both map types will be somewhat sensitive to the length of the string key.
Assuming you have some class A, that you want to access via a string key, the maps would be declared as:
map <string, A> amap;
unordered_map <string, A> umap;
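Since the two interfaces match for basic lookups, a small template like the following (a sketch; `lookup_or` and the `value` field of `A` are hypothetical) runs unchanged against either container, which makes it easy to swap one for the other when measuring:

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for class A.
struct A {
    int value;
};

// Works with std::map and std::unordered_map alike, because find()
// and end() have the same shape in both containers.
template <typename Map>
int lookup_or(const Map& m, const std::string& key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second.value;
}
```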
I made a benchmark that compares std::map with boost::unordered_map.
My conclusion was basically this: If you do not need map-specific things like equal_range, then always use boost::unordered_map.
The full benchmark can be found here
A hash map will typically have some integral representation of a string, yes.
std::map has a requirement to be sorted, so implementing it as a hash table is unlikely, and I've never seen it in practice.
Whether string comparisons are good or bad depends entirely on what you're doing, what data you're comparing, and how often. If the first letter differs then that's barely any different from an integer comparison, for example.
You want unordered_map (that's the Boost version - there is also a version in the TR1 standard library if your compiler has that).
Is it worth it for game states? Yes, but only because using an unordered_map is simple. You're prematurely worrying about optimisations at this stage. Save the worries over access patterns for things you're going to look up thousands of times a second (i.e. when your profiler tells you that it's a problem).