Is std::map a good solution? - c++

All,
I have following task.
I have finite number of strings (categories). Then in each category there will be a set of team and the value pairs. The number of team is finite based on the user selection.
Both sizes are not more than 25.
Now the value will change based on the user input and when it change the team should be sorted based on the value.
I was hoping that STL has some kind of auto sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
struct Foo
{
std::string team;
double value;
operator<();
};
std::map<std::string,std::vector<Foo>> myContainer;
and just call std::sort() when the value will change.
Or is there more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams. The columns of this table are categories. The cells of this table are divided in half. Top half is the category value for a given team. This value is increasing with every player.
Now when the player is added to a team, the scoring categories of the player will be added to a team and the data in the columns will be sorted. So, for category "A" it may be team1, team2; and for category "B" it may be team2, team1.
Then based on the position of each team the score will be assigned for each team/category.
And that score I will need to display.
I hope this will clarify what I am trying to achieve and it become more clear of what I'm looking for.
[/EDIT]

It really depend how often you are going to modify the data in the map and how often you're just going to be searching for the std::string and grabbing the vector.
If your access pattern is add map entry then fill all entries in the vector then access the next, fill all entries in the vector, etc. Then randomly access the map for the vector afterwards then .. no map is probably not the best container. You'd be better off using a vector containing a standard pair of the string and the vector, then sort it once everything has been added.
In fact organising it as above is probably the most efficient way of setting it up (I admit this is not always possible however). Furthermore it would be highly advisable to use some sort of hash value in place of the std::string as a hash compare is many times faster than a string compare. You also have the string stored in Foo anyway.
map will, however, work but it really depends on exactly what you are trying to do.

Related

C++ - Sort map based on values, if values same sort based on key

I came across a problem where I needed to store two values, one id and other its influence, and id should be randomly accessible. Also it should be sorted based on influence and if both influence are same , sort based on id. With these things in mind, I used map,but is there a way to actually do it ?
I tried below comparator and map but it gives error
struct cmp
{
bool comparator()(const pair<int,int>a,const pair<int,int>b)
{
if(a.second==b.second) return a.first<b.first;
else return a.second<b.second;
}
};
unordered_map<int,int,cmp>m;
From what I understand, you want a collection sorted by one value but quickly indexable by another. These two points are in contradiction. Sorting a collection by a key value makes it quicker to index by that key value. There is no easy way to make a collection quickly indexable in two different ways at the same time. Note that I say "quickly indexable" instead of talking about random access. That's yet a different concept.
In short, you need two collections. You can have a main one that stores influence-ID pairs and is sorted by influences, and a secondary one that stores IDs and maps them to the main collection. There are many options for that; here is one:
std::set<std::pair<int,int>> influences;
std::unordered_map<int, decltype(influences)::iterator> ids;
When inserting an influence-ID pair, you can insert into influence first and then take the iterator to that new element and insert it with the ID.
Another solution is to keep a main collection of influence-ID pairs but this time sorted by IDs (and binary search it when you need to get the influence from an ID), and maintain a array of permutation indices that tells you at what index an element of the main collection would be if it were sorted by influence. Something like this:
std::vector<std::pair<int,int>> influences;
// insert all elements sorted by ID
std::vector<decltype(influences)::size_type> sorted_indices;
// now sort by influences but modifying `sorted_indices` instead.
Relevant question
If the IDs are all from 0 to N, you can even just have influences indexed by ID:
std::vector<int> influences; // influences[id] is the influence value corresponding to id
This gives you random access.
The "correct" choice depends on the many other possible constraints you may have.

Not sure which data structure to use

Assuming I have the following text:
today was a good day and today was a sunny day.
I break up this text into lines, seperated by white spaces, which is
Today
was
a
good
etc.
Now I use the vector data structure to simple count the number of words in a text via .size(). That's done.
However, I also want to check If a word comes up more than once, and if so, how many time. In my example "today" comes up 2 times.
I want to store that "today" and append a 2/x (depending how often it comes up in a large text). Now that's not just for "today" but for every word in the text. I want to look up how often a word appears, append an counter, and sort it (the word + counters) in descending order (that's another thing, but
not important right now).
I'm not sure which data structure to use here. Map perhaps? But I can't add counters to map.
Edit: This is what I've done so far: http://pastebin.com/JncR4kw9
You should use a map. Infact, you should use an unordered_map.
unordered_map<string,int> will give you a hash table which will use strings as keys, and you can augment the integer to keep count.
unordered_map has the advantage of O(1) lookup and insertion over the O(logn) lookup and insertion of a map. This is because the former uses an array as a container whereas the latter uses some implementation of trees (red black, I think).
The only disadvantage of an unordered_map is that as mentioned in its name, you can't iterate over all the elements in lexical order. This should be clear from the explanation of their structure above. However, you don't seem to need such a traversal, and hence it shouldn't be an issue.
unordered_map<string,int> mymap;
mymap[word]++; // will increment the counter associated with the count of a word.
Why not use two data structures? The vector you have now, and a map, using the string as the key, and an integer as data, which then will be the number of times the word was found in the text.
Sort the vector in alphabetical order.
Scan it and compare every word to those that follow, until you find a different one, and son on.
a, a, and, day, day, sunny, today, today, was, was
2 1 2 1 2 2
A better option to consider is Radix Tree, https://en.wikipedia.org/wiki/Radix_tree
Which is quite memory efficient, and in case of large text input, it will perform better than alternative data structures.
One can store the frequencies of a word in the nodes of tree. Also it will reap the benefits of "locality of reference[For any text document]" too.

Implementing a mutable ranking table in c++

In an event-driven simulator, I need to keep track of the popularity of a large number of content elements from a catalog. More specifically I am interested in knowing the rank of any given content, i.e. its position in a list sorted by descending number of requests. I know that the number of requests per content is only going to be increased by one each time, so there is no dramatic shift in the ranking. Furthermore, elements are inserted or deleted from the catalog only in rare occasions, while requests are much more numerous and frequent. What is the best data structure to implement this?
These are the options that I have considered:
a std::map<ContentElement, unsigned int> mapping contents to the number of requests they received. Not a good choice, as it requires me to dump everything to a list and sort it whenever I want to know the ranking of a content, which is very often.
a boost::multi_index_container with two indexes, a hashed_unique for the ContentElement and an ordered_not_unique for the number of requests. This allows me to quickly retrieve a content in order to update its number of requests and to keep the container sorted as I do this through a modify call, but my understanding of the ordered index is that it still forces me to iterate through all its element in order to figure the rank of a content - I could not figure a simple way of extracting the position in the ranking from the ordered iterator.
a boost::bimap between content elements and ranking position, supported by an external sorted vector storing the number of requests per content. Essentially the rank of a content would also represent the index of the vector element with its number of requests. This allows me to do everything I want to do (e.g., easily go from content to rank and viceversa) and sorting the vector after a new request should require at most two swaps in the bimap. However it feels clumsy and error-prone as I could easily loose sync between the map and the vector and then everything would fall apart.
My guts tell me there must be a much simpler and more elegant way of handling this, but I could not find it. Can anyone help?
There is no need to do a full sort. The key insight here is that a ranking can only change by +1 or -1 when it is accessed. I would do the following...
Keep the element in a container of your choice, e.g.
map< elementId, elementInstance >
Maintain a linked list of element rankings, something like this:
list< rankingInstance >
The rankingInstance has a pointer to an elementInstance and the value of the current rank and current number of accesses. On access, you:
access the element in the map
get its current rank, and access count
update the count
using the current rank, access the linked list
check the neighbors
swap position in list if necessary
if swapping occurred, go back and update the two elements whose rank changed
It may seem so simple, but my suggestion is to use Bubble Sort on your list. Since, Bubble Sort compares and switches only the adjacent elements, which is your case, simply one up or one down move in the ranking. Your vector may keep the 'Rank' as key, 'ContentHash' as value in a vector. A map containing 'Content' or 'Content Reference' will also needed. I hope this very simple approach gives some insights about your problem.

STL map like data structure to allow searching based on two keys

I want to make a map like structure to allow searching by two keys both will be strings, here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
i don't want a code solution, i just want some help to get started like the structures i can use to achieve what i want, help is appreciated guys, it's finals week.
Put your records into a vector (or list). Add a pointer to the record objects to two maps, one with one key and one with the other.
There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is implemented as red-black tree that provides increadibly quick self-balancing (rotations) and all of mentioned operations (lookup/find, insert, delete) with complexity of O(log(n)). Note that this suits the idea of single key.
With 2 keys you can not keep elements sorted because the order based on one key will be most likely different than order based on the other one. The most straightforward and natural approach would be storing records in one container and holding the keys used by this container in 2 different structures, one optimized for retrieving this key given id and the other one for retrieving it given name.
If there is a constraint of storing everything at one place while you'd like to optimize find operation that will support two different keys, then you could create a wrapper of std::map<std::string, Person> where each element would be contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n))
insertion & deletion with complexity O(2*log(2*n))
extremely simple implementation (using existing container)
you just need to remember than the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
Disadvantage is that it will consume 2x so much memory and there might even be a constraint that:
none of the names should be an id of some other person at the same time and vice versa (no id should be a name of some other person)

How to find largest values in C++ Map

My teacher in my Data Structures class gave us an assignment to read in a book and count how many words there are. Thats not all; we need to display the 100 most common words. My gut says to sort the map, but I only need 100 words from the map. After googling around, is there a "Textbook Answer" to sorting maps by the value and not the key?
I doubt there's a "Textbook Answer", and the answer is no: you can't sort maps by value.
You could always create another map using the values. However, this is not the most efficient solution. What I think would be better is for you to chuck the values into a priority_queue, and then pop the first 100 off.
Note that you don't need to store the words in the second data structure. You can store pointers or references to the word, or even a map::iterator.
Now, there's another approach you could consider. That is to maintain a running order of the top 100 candidates as you build your first map. That way there would be no need to do the second pass and build an extra structure which, as you pointed out, is wasteful.
To do this efficiently you would probably use a heap-like approach and do a bubble-up whenever you update a value. Since the word counts only ever increase, this would suit the heap very nicely. However, you would have a maintenance issue on your hands. That is: how you reference the position of a value in the heap, and keeping track of values that fall off the bottom.