set map implementation in C++ - c++

I find that both set and map are implemented as a tree. Is set a plain binary search tree and map a self-balancing binary search tree, such as a red-black tree? I am confused about the difference in implementation. The differences I can imagine are as follows:
1) An element in a set has only one value (the key); an element in a map has two values.
2) A set stores and fetches elements by the element itself; a map stores and fetches elements via a key.
What else is important?

Maps and sets have almost identical behavior and it's common for the implementation to use the exact same underlying technique.
The only important difference is map doesn't use the whole value_type to compare, just the key part of it.
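To make that difference concrete, here is a small sketch (the helper names are made up for illustration, and it says nothing about how a particular standard library implements the tree). In the set the whole element takes part in the comparison; in the map only the key does, so a second insert with the same key is rejected even when the mapped value differs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Returns true if inserting value added a new element to the set.
// In a set the whole element is the key.
inline bool insert_value(std::set<int>& s, int value) {
    return s.insert(value).second;
}

// Returns true if inserting {key, value} added a new element to the map.
// Only the key takes part in the comparison, never the mapped value.
inline bool insert_pair(std::map<std::string, int>& m, const std::string& key, int value) {
    return m.insert({key, value}).second;
}

inline void set_vs_map_example() {
    std::set<int> s;
    assert(insert_value(s, 1));
    assert(!insert_value(s, 1));      // same element: rejected

    std::map<std::string, int> m;
    assert(insert_pair(m, "a", 1));
    assert(!insert_pair(m, "a", 2));  // same key, different value: still rejected
    assert(m.at("a") == 1);           // the first value is kept
}
```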

Usually you'll know right away which you need: if you just have a bool for the "value" argument to the map, you probably want a set instead.
Set is a discrete mathematics concept that, in my experience, pops up again and again in programming. The STL set class is a relatively efficient way to keep track of sets where the most common operations are insert/remove/find.
Maps are used where objects have a unique identity that is small compared to their entire set of attributes. For example, a web page can be defined as a URL and a byte stream of contents. You could put that byte stream in a set, but the binary search process would be extremely slow (since the contents are much bigger than the URL) and you wouldn't be able to look up a web page if its contents change. The URL is the identity of the web page, so it is the key of the map.

A map is usually implemented much like a set< std::pair<> > whose comparison looks only at the first member of the pair (the key).
The set is used when you want an ordered list to quickly search for an item, basically, while a map is used when you want to retrieve a value given its key.
In both cases, the key (for map) or value (for set) must be unique. If you want to store multiple values that are the same, you would use multimap or multiset.
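A short sketch of the uniqueness rule, using only standard containers (the function name is just for illustration):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

inline void unique_vs_multi_example() {
    std::set<int> unique_values{3, 1, 3, 2};
    std::multiset<int> repeated_values{3, 1, 3, 2};
    assert(unique_values.size() == 3);     // the second 3 is dropped
    assert(repeated_values.size() == 4);   // both 3s are kept
    assert(repeated_values.count(3) == 2);

    std::multimap<std::string, int> scores;
    scores.insert({"alice", 10});
    scores.insert({"alice", 20});          // same key again: allowed in a multimap
    assert(scores.count("alice") == 2);
}
```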

Related

Can I reinterpret a memory mapped file of key-value pairs as a map in order to sort them?

I have a memory mapped file that contains key-value pairs. Both the key and value are uint32_t, and all the keys and values are stored in the file in binary, where a key immediately precedes its value. The file contains only these pairs, no delimiters.
I want to be able to sort all of these key-value pairs by increasing key.
The following just compiled in my code:
struct FileAsMap { map<uint32_t, uint32_t> keyValueMap; };
const FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
but I don't really know what to do from here, since by definition the map container keeps a strict weak ordering of the pairs by key. If I just reinterpret the mapped file as a map, how can I get the pairs to order?
This is not an answer, but the explanation doesn't fit within the comment limit.
The keys in a map are usually unique (at least in std::map they are). But maps differ from one another in how they organize the stored keys. For example, std::map is based on a balanced binary tree, so retrieving a given key takes O(log n), where n is the number of elements in the map. std::unordered_map, on the other hand, is a hash map internally with an average access time of O(1). That is, on average it finds a key in constant time regardless of the number of elements inside.
In any case, all these containers require a dedicated internal in-memory structure that practically never looks like a simple stream of key-value pairs. That's why I said in my first comment that it's almost impossible to reuse one of the standard maps as a convenient accessor for mmap-ed data without first reading and unpacking the data stream.
But you can create your own map-like class that iterates over the data in the mmap-ed area and checks in its operator[](size_t i) whether a stored key matches the requested one. I guess the simplest implementation would fit on a single screen of code.
But beware: a sequential scan is a relatively expensive operation, so if you have enough elements in the file, it could become unacceptably slow. In that case you'll need some optimized indexing. For example, all keys could be read at the beginning of processing to build an index array. But all these questions depend heavily on the details of the task, so it's better to stop the explanation here.
If you have any further questions, feel free to ask. Of course, a good question assumes that you have already studied the subject and have now encountered a particular problem that you can't solve yourself.
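A rough sketch of such a map-like class, assuming the mapped region is nothing but native-endian uint32_t key/value pairs. The name PairView and its find() member are made up here (find() is used instead of an operator[] so that "not found" can be reported), and each field is read with memcpy so alignment of the mapped pointer doesn't matter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical read-only view over raw bytes laid out as key,value,key,value,...
class PairView {
public:
    PairView(const void* data, std::size_t size_in_bytes)
        : bytes_(static_cast<const unsigned char*>(data)),
          count_(size_in_bytes / (2 * sizeof(std::uint32_t))) {}

    std::size_t size() const { return count_; }

    // Sequential scan, O(n) per lookup. Stores the first matching value in out.
    bool find(std::uint32_t key, std::uint32_t& out) const {
        for (std::size_t i = 0; i < count_; ++i) {
            if (read(2 * i) == key) {
                out = read(2 * i + 1);
                return true;
            }
        }
        return false;
    }

private:
    // Reads the index-th uint32_t of the region without assuming alignment.
    std::uint32_t read(std::size_t index) const {
        std::uint32_t v;
        std::memcpy(&v, bytes_ + index * sizeof v, sizeof v);
        return v;
    }

    const unsigned char* bytes_;
    std::size_t count_;
};

inline void pair_view_example() {
    const std::uint32_t raw[] = {5, 50, 2, 20};  // stands in for the mmap-ed bytes
    PairView view(raw, sizeof raw);
    std::uint32_t value = 0;
    assert(view.find(2, value) && value == 20);
    assert(!view.find(7, value));
}
```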
There are a lot of reasons why the answer is no. The two simplest are:
Maps are a structure that stores data in a form in which it's already sorted. Your data isn't already sorted, so it's simply not a map.
The map class has its own internal data structure that it uses to store maps. Unless your file replicates this internal structure perfectly (which it almost certainly can't since it likely includes pointers into memory) the map class will misunderstand the data in the file.
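If the goal is just to get the pairs in key order, one option (a sketch, not the only way) is to copy the raw records into a vector and sort that. KeyValue and sorted_pairs are names invented for this example, and the file is assumed to hold native-endian uint32_t pairs with no padding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct KeyValue {  // hypothetical name for one record in the file
    std::uint32_t key;
    std::uint32_t value;
};
static_assert(sizeof(KeyValue) == 2 * sizeof(std::uint32_t), "no padding expected");

// Copies the raw records out of the mapped region and sorts them by key.
// stable_sort keeps records with equal keys in their original file order.
inline std::vector<KeyValue> sorted_pairs(const void* data, std::size_t size_in_bytes) {
    std::vector<KeyValue> pairs(size_in_bytes / sizeof(KeyValue));
    if (!pairs.empty()) {
        std::memcpy(pairs.data(), data, pairs.size() * sizeof(KeyValue));
    }
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const KeyValue& a, const KeyValue& b) { return a.key < b.key; });
    return pairs;
}

inline void sorted_pairs_example() {
    const std::uint32_t raw[] = {5, 50, 2, 20, 9, 90};  // stands in for the mmap-ed bytes
    const std::vector<KeyValue> pairs = sorted_pairs(raw, sizeof raw);
    assert(pairs.size() == 3);
    assert(pairs[0].key == 2 && pairs[0].value == 20);
    assert(pairs[2].key == 9 && pairs[2].value == 90);
}
```

Once sorted, the vector can be searched with std::lower_bound, or the pairs can be inserted into a real std::map if map semantics are needed.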
How did you serialize the data to the file?
Assuming that you serialized a struct consisting of maps, you'd de-serialize as below:
FileAsMap* fileAsMap = reinterpret_cast<FileAsMap*>(mmappedData);
Gives access to entire structure (blob).
(*fileAsMap).keyValueMap gives access to map.

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector<vector<int>> indexed as container[source][destination], with the value stored in that slot. A subsequent process needs to check this container to see if an element exists for a particular source and a particular destination - it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)?
Is a std::map<int, int> better for this purpose, and how do I use one?
I need a container to store a value (int) according to two attributes,
source (int) and destination (int)
std::map<std::pair<int, int>, int>
A subsequent process needs to check this container, to see if an
element exists in for a particular source, and a particular
destination - it will need to differentiate between an empty 'space'
and a filled one.
std::map::find
http://www.cplusplus.com/reference/map/map/find/
The container has the possibility of being very sparse.
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
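A minimal sketch of that suggestion. The alias Traffic and the helper lookup are names invented for the example; the point is that find() tells "nothing stored" apart from "0 stored".

```cpp
#include <cassert>
#include <map>
#include <utility>

// (source, destination) -> value
using Traffic = std::map<std::pair<int, int>, int>;

// Returns a pointer to the stored value, or nullptr if nothing was recorded
// for this source/destination. A stored 0 still counts as present.
inline const int* lookup(const Traffic& t, int source, int destination) {
    auto it = t.find({source, destination});
    return it == t.end() ? nullptr : &it->second;
}

inline void traffic_example() {
    Traffic t;
    t[std::make_pair(3, 7)] = 0;        // source 3 sent the value 0 to destination 7
    assert(lookup(t, 3, 7) != nullptr);
    assert(*lookup(t, 3, 7) == 0);
    assert(lookup(t, 7, 3) == nullptr); // nothing stored in the other direction
}
```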
First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed closely and sequentially as 0...N, and you will be doing a lot of look-ups and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>> - usually implemented by a binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it
If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
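A sketch of the same idea with std::optional, which is the C++17 standard-library counterpart and is swapped in here for boost::optional so the example has no Boost dependency. Grid and make_grid are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Grid = std::vector<std::vector<std::optional<int>>>;

// Builds a sources x destinations grid in which every slot starts out empty.
inline Grid make_grid(std::size_t sources, std::size_t destinations) {
    return Grid(sources, std::vector<std::optional<int>>(destinations));
}

inline void grid_example() {
    Grid grid = make_grid(2, 3);
    grid[1][2] = 0;                      // a legitimately stored zero
    assert(grid[1][2].has_value());
    assert(*grid[1][2] == 0);
    assert(!grid[0][0].has_value());     // never written, so still empty
}
```

Note that this still allocates every slot, so for a very sparse container the map keyed on both IDs will use less memory.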
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.

What's the best way to search from several map<key,value>?

I have created a vector which contains several map<>.
vector<map<key,value>*> v;
v.push_back(&map1);
// ...
v.push_back(&map2);
// ...
v.push_back(&map3);
At any point in time, if a value has to be retrieved, I iterate through the vector and look for the key in every map element (i.e. v[0], v[1] etc.) until it's found. Is this the best way? I am open to any suggestion. This is just an idea; I have yet to implement it this way (please point out any mistakes).
Edit: It's not important, in which map the element is found. In multiple modules different maps are prepared. And they are added one by one as the code progresses. Whenever any key is searched, the result should be searched in all maps combined till that time.
Without more information on the purpose and use, it might be a little difficult to answer. For example, is it necessary to have multiple map objects? If not, then you could store all of the items in a single map and eliminate the vector altogether. This would be more efficient to do the lookups. If there are duplicate entries in the maps, then the key for each value could include the differentiating information that currently defines into which map the values are put.
If you need to know which submap the key was found in, try:
unordered_map<key, pair<mapid, value>>
This has much better complexity for searching.
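A sketch of that idea using std::unordered_map, since a lookup from key to (map id, value) needs a map type. The aliases SubMap and Index, the helper build_index, and the choice of std::string keys with int values are all assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using SubMap = std::map<std::string, int>;
// key -> (index of the submap it came from, value)
using Index = std::unordered_map<std::string, std::pair<std::size_t, int>>;

// Builds one combined index over all submaps. emplace does not overwrite, so
// when a key appears in several submaps the earliest one wins.
inline Index build_index(const std::vector<const SubMap*>& maps) {
    Index index;
    for (std::size_t id = 0; id < maps.size(); ++id) {
        for (const auto& kv : *maps[id]) {
            index.emplace(kv.first, std::make_pair(id, kv.second));
        }
    }
    return index;
}

inline void index_example() {
    SubMap first{{"x", 1}};
    SubMap second{{"y", 2}, {"x", 9}};
    const Index index = build_index({&first, &second});
    assert(index.at("y").first == 1 && index.at("y").second == 2);
    assert(index.at("x").first == 0 && index.at("x").second == 1);  // first map wins
}
```

The index has to be rebuilt or updated whenever one of the submaps changes, which is the price for the single average O(1) lookup.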
If the keys do not overlap, i.e., are unique throughout all maps, then I'd advise a set or unordered_set with a custom comparison functor, as this will help with the lookup. Or even extend the first map with the new maps, if profiling shows that is fast enough / faster.
If the keys are not unique, go with a multiset or unordered_multiset, again with a custom comparison functor.
You could also sort your vector manually and search it with a binary_search. In any case, I advise using a tree to store all maps.
It depends on how your maps are "independently created", but if it's an option, I'd make just one global map (or multimap) object and pass that to all your creators. If you have lots of small maps all over the place, you can just call insert on the global one to merge your maps into it.
That way you have only a single object in which to perform lookup, which is reasonably efficient (O(log n) for multimap, expected O(1) for unordered_multimap).
This also saves you from having to pass raw pointers to containers around and having to clean up!
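Merging the small maps into one global map can be as short as this (merge_into is a made-up helper name, and std::string keys with int values are just placeholders for your own types):

```cpp
#include <cassert>
#include <map>
#include <string>

// Merges src into dst. The range insert keeps an existing entry when the key
// is already present, so entries merged earlier take precedence.
inline void merge_into(std::map<std::string, int>& dst,
                       const std::map<std::string, int>& src) {
    dst.insert(src.begin(), src.end());
}

inline void merge_example() {
    std::map<std::string, int> global{{"a", 1}};
    const std::map<std::string, int> module_map{{"a", 100}, {"b", 2}};
    merge_into(global, module_map);
    assert(global.size() == 2);
    assert(global.at("a") == 1);  // existing entry was not overwritten
    assert(global.at("b") == 2);
}
```

With a std::multimap as the destination the same call would keep both "a" entries instead of dropping the second one.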

C++: multiple keyed map

I am searching for a (multi)map where the values are associated with keys of different types. Basically what was asked here for Java, but for C++. Is there something like this already, or do I have to implement it myself?
Another, simpler case (the above would already solve this, but there may be a simpler solution, especially for this case):
I want a multimap where my values are all unique and ordered (the keys are also ordered of course) and I want to be able to do a search in the map for a specific value in O(log n) time. So I can get the associated key to a value in O(log n) time. And I can get the associated value to a key also in O(log n) time.
If you want to be able to search both by key and by value use boost.bimap.
If you need multiple keys use boost.multi-index.
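If pulling in Boost is not an option, the two-way lookup can be approximated with two std::map objects kept in sync. This is a plain standard-library sketch, not Boost.Bimap itself; TwoWayMap and its member names are invented, and it assumes both keys and values are unique.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal two-way map. Both lookups are O(log n).
class TwoWayMap {
public:
    // Returns false (and changes nothing) if the key or the value is already present.
    bool insert(int key, const std::string& value) {
        if (by_key_.count(key) || by_value_.count(value)) return false;
        by_key_[key] = value;
        by_value_[value] = key;
        return true;
    }

    // Returns nullptr when the key is unknown.
    const std::string* value_of(int key) const {
        auto it = by_key_.find(key);
        return it == by_key_.end() ? nullptr : &it->second;
    }

    // Returns nullptr when the value is unknown.
    const int* key_of(const std::string& value) const {
        auto it = by_value_.find(value);
        return it == by_value_.end() ? nullptr : &it->second;
    }

private:
    std::map<int, std::string> by_key_;
    std::map<std::string, int> by_value_;
};

inline void two_way_example() {
    TwoWayMap m;
    assert(m.insert(1, "one"));
    assert(!m.insert(2, "one"));           // value already taken
    assert(*m.value_of(1) == "one");
    assert(*m.key_of("one") == 1);
    assert(m.key_of("two") == nullptr);
}
```

The cost is that every value is stored twice, which Boost.Bimap avoids.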
Boost Multi-Index.

How to implement an associative array/map/hash table data structure (in general and in C++)

Well I'm making a small phone book application and I've decided that using maps would be the best data structure to use but I don't know where to start. (Gotta implement the data structure from scratch - school work)
Tries are quite efficient for implementing maps where the keys are short strings. The Wikipedia article explains it pretty well.
To deal with duplicates, just make each node of the tree store a linked list of duplicate matches.
Here's a basic structure for a trie
struct Trie {
struct Trie* letter;
struct List *matches;
};
malloc(26*sizeof(struct Trie)) for letter and you have an array. If you want to support punctuation, add those characters at the end of the letter array.
matches can be a linked list of matches, implemented however you like, I won't define struct List for you.
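For a C++ take on the same structure, here is a sketch that assumes keys contain only lowercase 'a'..'z' in an ASCII-compatible encoding. It uses std::unique_ptr children and a std::vector for the matches instead of malloc and a hand-written linked list; TrieNode, trie_insert and trie_find are names made up for the example.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> letter;  // one child slot per letter
    std::vector<std::string> matches;                  // entries stored under this exact key
};

// Adds entry under name. Returns false if name has a character outside 'a'..'z'.
inline bool trie_insert(TrieNode& root, const std::string& name, const std::string& entry) {
    for (char c : name) {
        if (c < 'a' || c > 'z') return false;
    }
    TrieNode* node = &root;
    for (char c : name) {
        std::unique_ptr<TrieNode>& child = node->letter[c - 'a'];
        if (!child) child.reset(new TrieNode());
        node = child.get();
    }
    node->matches.push_back(entry);
    return true;
}

// Returns the entries stored under name, or nullptr if there are none.
inline const std::vector<std::string>* trie_find(const TrieNode& root, const std::string& name) {
    const TrieNode* node = &root;
    for (char c : name) {
        if (c < 'a' || c > 'z') return nullptr;
        node = node->letter[c - 'a'].get();
        if (!node) return nullptr;
    }
    return node->matches.empty() ? nullptr : &node->matches;
}

inline void trie_example() {
    TrieNode root;
    assert(trie_insert(root, "ann", "555-0100"));
    assert(trie_insert(root, "ann", "555-0199"));   // duplicate name, second entry
    assert(trie_find(root, "ann")->size() == 2);
    assert(trie_find(root, "an") == nullptr);       // prefix only, nothing stored there
}
```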
Simplest solution: use a vector which contains your address entries and loop over the vector to search.
A map is usually implemented either as a binary tree (look for red/black trees for balancing) or as a hash map. Neither is trivial: trees have some overhead for organisation, memory management and balancing, and hash maps need good hash functions, which are also not trivial. But both structures are fun and you'll gain a lot of insight by implementing one of them (or better, both :-)).
Also consider keeping the data in a vector and letting the map contain indices into the vector (or pointers to the entries): then you can easily have multiple indices, say one for the name and one for the phone number, so you can look up entries by both.
That said I just want to strongly recommend using the data structures provided by the standard library for real-world-tasks :-)
A simple approach to get you started would be to create a map class that uses two vectors - one for the key and one for the value. To add an item, you insert a key in one and a value in another. To find a value, you just loop over all the keys. Once you have this working, you can think about using a more complex data structure.
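That two-vector approach could look roughly like this for the phone book case. VectorMap, put and get are hypothetical names, and overwriting the number when a name already exists is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical phone book backed by two parallel vectors:
// keys_[i] is the name that belongs to values_[i].
class VectorMap {
public:
    // Inserts a new entry, or overwrites the number of an existing name.
    void put(const std::string& name, const std::string& number) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == name) {
                values_[i] = number;
                return;
            }
        }
        keys_.push_back(name);
        values_.push_back(number);
    }

    // Linear search over the keys. Returns nullptr if the name is unknown.
    const std::string* get(const std::string& name) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            if (keys_[i] == name) return &values_[i];
        }
        return nullptr;
    }

    std::size_t size() const { return keys_.size(); }

private:
    std::vector<std::string> keys_;
    std::vector<std::string> values_;
};

inline void vector_map_example() {
    VectorMap book;
    book.put("ann", "555-0100");
    book.put("ann", "555-0199");           // same name: number is replaced
    assert(book.size() == 1);
    assert(*book.get("ann") == "555-0199");
    assert(book.get("bob") == nullptr);
}
```

Every operation is O(n), which is fine for a small phone book and gives you a working baseline to compare a tree or hash table against later.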