using multiple keyvalues for std::map C++ - c++

Hi I am trying to write a function which reads data from a file and then saves it in memory.
This memory will need a x and a y value to be identified. It may not be linear, there can be big jumps between the different x and y values and the amount of values are unknown which excludes the use of an multi-dimensional array.
I wanted to use std::map since it does what I need, but it does not support multiple key values. What else could I use to store the data or is there a way to merge the X and Y values so that it will be able to be used in a map container?

Make a pair of the x and y values, and use that as the key:
std::map<std::pair<int, int>, whatever>
Note that as it stands, this will treat the x values as more significant than the y values if you traverse the map in order. If you want the y values to be more significant, you'd want to put them first in the pair.

You should use an std::pair as the key to your map:
std::map<std::pair<int, int>, value_type> m;
You can insert into the map using:
m[std::make_pair(0, 0)] = some_value;
If you don't care about the order of your elements and would like faster retrieval and insertion, try a std::unordered_map instead.

Although you could use std::pair as suggested by the others, I would seriously consider making a simple class that contains data members for your key. It improves readability, and also makes it easier to extend it with a 3rd, 4th, ... member if needed.
If you start with std::pair, and then you want to add a 3rd element, you could be tempted to move to std::tuple, but this can result in unreadable code.
Simply make a key class, and give it a decent constructor (one argument per data member).

Related

performance of boost multi_index_container

I am interested to know the performance of multi_index_container for the following use case:
struct idx_1 {};
struct idx_2 {};
typedef multi_index_container<
Object,
indexed_by<
// Keyed by: idx1
hashed_unique<
tag<idx_1>,
unique_key >,
// Keyed by: (attribute1, attribute2 and attribute3)
ordered_non_unique<
tag<idx_2>,
composite_key<
Object,
attribute1,
attribute2,
attribute3 > >
>
> ObjectMap;
I need a map to save the object, and the number of objects should be more than 300,000. while each object has 1 unique key and 3 attributes. The details of the keys:
unique key is "unique" as the name
each attribute only has a few possible values, say there's only 16 combinations. So with 300,000 objects, each combination will have a list of 300,000/16 objects
attribute1 needs to be modified from one value to another value occasionally
object finding is always be done via the unique_key while the composite_key is used to iterating objects with one or several attributes
For such use case, multi_index_container is a very good fit as I don't need to maintain several map independently. For the unique key part, I believe hashed_unique is a good candidate instead of ordered_unique.
But I am extremely not comfortable about the "ordered_non_unique" part. I don't know how's implemented in boost. My guess it boost maintain a list of objects in a single list for each combination similar to the unordered_map(forgive me if it's too naive!). If that's the case, modify the attribute an existing object will be a big pain as it requires to 1) go through a long list of objects for a particular combination 2) execute the equal comparison 3) and move the destination combination.
the steps that I suspect with high latency:
ObjectMap objects_;
auto& by_idx1 = objects_.get<idx1>();
auto it = by_idx1.find(some_unique_key);
Object new_value;
by_idx1.modify(it, [&](const Object& object) {
object = new_value;
});
My concern is that whether the last "modify" function has some liner behavior as stated to go through some potential long list of objects under one combination...
As this is a very specific piece of code, I'd suggest you benchmark and profile it using a large amount of real-world data.
As Fabio comments, your best option is to profile the case and see the outcome. Anyway, an ordered_non_unique index is implemented exactly as a std::multimap, namely via a regular rb-tree with the provision that elements with equivalent keys are allowed to coexist in the container; no lists of equivalent elements or anything. As for modify (for your particular use case replace is a better fit), the following procedure is executed:
Check if the element is in place: O(1).
If not, rearrange: O(log n), which for 300,000 elements amounts to a maximum of 19 element comparisons (not 300,000/16=18,750 as you suggest): these comparisons are done lexicographically on the triple (attribute1, attribute2, attribute3). Is this fast enough or not? Well, that depends on your performance requirements, so only profiling can really tell.

Efficiency of std::tuple and std::map

I am currently building a program that relies on multiple vectors and maps to retain previously computed information. My vectors are of the standard format, and not very interesting. The maps are of the form
std::map<std::string, long double>
with the intent of the string being a parseable mapping of one vector onto another, such as
std::map<std::string, long double> rmMap("fvProperty --> fvTime", 3.0234);
where I can later on split the string and compare the substrings to the vector names to figure out which ones were involved in getting the number.
However, I have recently found that std::tuple is available, meaning I can skip the string entirely and use the vector positions instead with
std::tuple<unsigned int, unsigned int, long double>
This allows me (as far as I can tell) to use both the first and the second value as a key, which seems preferable to parsing a string for my indices.
My crux is that I am unaware of the efficiency here. There will be a lot of calls to these tuples/maps, and efficacy is of the essence, as the program is expected to run for weeks before producing an end result.
Thus, I would ask you if tuples are more or equally efficient (in terms of memory, cache, and cycles) than maps when it comes to large and computationally intense programs.
EDIT:
If tuples cannot be used in this way, would the map
std::map<std::pair<unsigned int, unsigned int>, long double>
be an effective substitute to using strings for identification?
The advantage of the map is to efficiently query data associated to a key.
It cannot be compared to tuples which just pack values together: you will have to travel tuples yourself to retrieve the correct value.
Using a map<pair<unsigned int, unsigned int>, long double> as Mike suggests is probably the way to go.
Tuples and Maps are used for very different purposes. So it should be primarily about what you want to use them for, not how efficient they are. You'll have some trouble to turn on your car with a screw driver, as you will have trying to fix a screw with your car keys. I'd advice against both.
A tuple is one little dataset, in your case e.g. the set {1,2,3.0234}. A map is for mapping multiple keys to their values. In fact, a map internally consists of multiple tuples (pairs), each one containing a key and the associated value. Inside the map, the pairs are arranged in a way that makes searching for the keys easy.
In your case, I'd prefer the map<pair<int, int>, double> as your edit suggests. The keys (i.e. the vector index pairings) are much easier to search for and "parse" than those strings.

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector(vector(int)) which means goes in the order of source(destination(value)). A subsequent process needs to check this container, to see if an element exists in for a particular source, and a particular destination - it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)
Is a std::map<int, int> better for this purpose, and how do I use one?
I need a container to store a value (int) according to two attributes,
source (int) and destination (int)
std::map<std::pair<int, int>, int>
A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.
std::map::find
http://www.cplusplus.com/reference/map/map/find/
The container has the possibility of being very sparse.
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed very closely sequentially as 0...N, will be doing a lot of look-ups, and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>> - usually implemented by a binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it
If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.

Address of map value

I have a settings which are stored in std::map. For example, there is WorldTime key with value which updates each main cycle iteration. I don't want to read it from map when I do need (it's also processed each frame), I think it's not fast at all. So, can I get pointer to the map's value and access it? The code is:
std::map<std::string, int> mSettings;
// Somewhere in cycle:
mSettings["WorldTime"] += 10; // ms
// Somewhere in another place, also called in cycle
DrawText(mSettings["WorldTime"]); // Is slow to call each frame
So the idea is something like:
int *time = &mSettings["WorldTime"];
// In cycle:
DrawText(&time);
How wrong is it? Should I do something like that?
Best use a reference:
int & time = mSettings["WorldTime"];
If the key doesn't already exist, the []-access will create the element (and value-initialize the mapped value, i.e. 0 for an int). Alternatively (if the key already exists):
int & time = *mSettings.find("WorldTime");
As an aside: if you have hundreds of thousands of string keys or use lookup by string key a lot, you might find that an std::unordered_map<std::string, int> gives better results (but always profile before deciding). The two maps have virtually identical interfaces for your purpose.
According to this answer on StackOverflow, it's perfectly OK to store a pointer to a map element as it will not be invalidated until you delete the element (see note 3).
If you're worried so much about performance then why are you using strings for keys? What if you had an enum? Like this:
enum Settings
{
WorldTime,
...
};
Then your map would be using ints for keys rather than strings. It has to do comparisons between the keys because I believe std::map is implemented as a balanced tree. Comparisons between ints are much faster than comparisons between strings.
Furthermore, if you're using an enum for keys, you can just use an array, because an enum IS essentially a map from some sort of symbol (ie. WorldTime) to an integer, starting at zero. So then do this:
enum Settings
{
WorldTime,
...
NumSettings
};
And then declare your mSettings as an array:
int mSettings[NumSettings];
Which has faster lookup time compared to a std::map. Reference like this then:
DrawText(mSettings[WorldTime]);
Since you're basically just accessing a value in an array rather than accessing a map this is going to be a lot faster and you don't have to worry about the pointer/reference hack you were trying to do in the first place.

How to associate to a number another number without using array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I know on which position does one element appear in the read document.
Note that I can not do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a will be too large (there's a memory restriction). (let's say I have only 3000 elements, but they're values are from 0 to 2^32)
So, in the example above, I would want to know all the positions 1241 is appearing on without iterating again through all the read elements.
In other words, how can I associate to the number "1241" the positions "1,6,7" so I can simply access them in O(1) (where 1 actually is the number of positions the element appears)
If there's no O(1) I want to know what's the optimal one ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style array, since they take care of memory management.
If you also need to look for an element in O(1) you should consider using some structures that will associate both an index to an item and an item to an index. I don't think STL provides any, but boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly refered to as a multimap. That is, it stores Key and multiple values. This is an O(log) look up time.
If you're working with Visual Studios they provide their own hash_multimap, else may I suggest using Boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So everytime you encounter a value while reading the file, you append it's position to the map. Say X is the value you read and Y is the position then you just do
MyMap[X].push_back( Y );
Instead of you array use
std::map<int, vector<int> > a;
You need an associative collection but you might want to associated with multiple values.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations but not on key or value alone.
If you need higher performance then you may wish to use a hash_map instead of map. For the inner collection though you will not get much performance in using a hash, as you will have very few duplicates and it is better to std::set.
There are many implementations of hash_map, and it is in the new standard. If you don't have the new standard, go for boost.
It seems you need a std::map<int,int>. You can store the mapping such as 1241->0 124515->1 etc. Then perform a look up on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdex::hash_map etc)
The value could be a vector of indices where the number occurs or a 3000 bit array (375 bytes) where the bit number for each respective index where the number (key) occurs is set.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;
for(unsigned long i = 0; i < sizeof(a)/sizeof(*a); ++i)
{
myMap[a[i]].push_back(i);
}
Instead of storing an array of integer, you could store an array of structure containing the integer value and all its positions in an array or vector.