Implementing a mutable ranking table in C++

In an event-driven simulator, I need to keep track of the popularity of a large number of content elements from a catalog. More specifically, I am interested in knowing the rank of any given content, i.e. its position in a list sorted by descending number of requests. I know that the number of requests per content only ever increases by one at a time, so there is no dramatic shift in the ranking. Furthermore, elements are inserted into or deleted from the catalog only on rare occasions, while requests are much more numerous and frequent. What is the best data structure to implement this?
These are the options that I have considered:
a std::map<ContentElement, unsigned int> mapping contents to the number of requests they received. Not a good choice, as it requires me to dump everything to a list and sort it whenever I want to know the ranking of a content, which is very often.
a boost::multi_index_container with two indexes, a hashed_unique for the ContentElement and an ordered_non_unique for the number of requests. This allows me to quickly retrieve a content in order to update its number of requests, and to keep the container sorted as I do this through a modify call. However, my understanding of the ordered index is that it still forces me to iterate through all its elements in order to figure out the rank of a content: I could not find a simple way of extracting the position in the ranking from the ordered iterator.
a boost::bimap between content elements and ranking position, supported by an external sorted vector storing the number of requests per content. Essentially, the rank of a content would also be the index of the vector element holding its number of requests. This allows me to do everything I want to do (e.g., easily go from content to rank and vice versa), and re-sorting the vector after a new request should require at most two swaps in the bimap. However, it feels clumsy and error-prone, as I could easily lose sync between the map and the vector, and then everything would fall apart.
My guts tell me there must be a much simpler and more elegant way of handling this, but I could not find it. Can anyone help?

There is no need to do a full sort. The key insight here is that a ranking can only change by +1 or -1 when it is accessed. I would do the following...
Keep the element in a container of your choice, e.g.
map< elementId, elementInstance >
Maintain a linked list of element rankings, something like this:
list< rankingInstance >
The rankingInstance has a pointer to an elementInstance and the value of the current rank and current number of accesses. On access, you:
access the element in the map
get its current rank, and access count
update the count
using the current rank, access the linked list
check the neighbors
swap position in list if necessary
if swapping occurred, go back and update the two elements whose rank changed
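The steps above can be sketched as follows. This is only a minimal illustration under a few assumptions that are not in the answer: it uses a vector instead of a linked list, so that an element's index is its 0-based rank, and it identifies elements by a plain int id. The names RankTable, add, hit, rank, idAt and count are hypothetical. One detail the steps do not spell out is ties: when several elements share the same count, the sketch swaps the updated element with the first element of its tie block (found by binary search) rather than with its immediate neighbor, which keeps the vector sorted after the increment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical ranking table: slots_ is kept sorted by descending request
// count, so an element's index in slots_ is its 0-based rank.
class RankTable {
public:
    // Rare operation: a new element starts at the bottom with zero requests.
    void add(int id) {
        assert(pos_.count(id) == 0);  // ids are assumed to be unique
        pos_[id] = slots_.size();
        slots_.push_back({id, 0});
    }

    // Frequent operation: record one request for id.
    void hit(int id) {
        const std::size_t i = pos_.at(id);
        const unsigned c = slots_[i].count;
        // First slot whose count equals c, i.e. the start of the tie block.
        auto first = std::lower_bound(
            slots_.begin(), slots_.end(), c,
            [](const Slot& s, unsigned value) { return s.count > value; });
        const std::size_t j = static_cast<std::size_t>(first - slots_.begin());
        // Move the element to the front of its tie block, then increment it.
        std::swap(slots_[i], slots_[j]);
        pos_[slots_[i].id] = i;
        pos_[slots_[j].id] = j;
        ++slots_[j].count;
    }

    std::size_t rank(int id) const { return pos_.at(id); }
    int idAt(std::size_t rank) const { return slots_.at(rank).id; }
    unsigned count(int id) const { return slots_[pos_.at(id)].count; }

private:
    struct Slot { int id; unsigned count; };
    std::vector<Slot> slots_;                   // sorted by descending count
    std::unordered_map<int, std::size_t> pos_;  // id -> index in slots_
};
```

With this layout a request costs one hash lookup plus a binary search, and both directions (content to rank, rank to content) are constant-time lookups. Removal is left out; since deletions are rare, shifting the tail of the vector and refreshing the affected indexes would probably be acceptable.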

It may seem too simple, but my suggestion is to use bubble sort on your list. Bubble sort only compares and swaps adjacent elements, which matches your case: a request moves an element at most one step up or down in the ranking. You can keep a vector whose index is the rank and whose value is the content hash. A map containing the content (or a reference to it) will also be needed. I hope this very simple approach gives you some insight into your problem.

Related

Is there any way to move the cursor position of a linked list in constant time?

I have a linked list like this:
Head->A->B->C->D->Tail.
There can be N (1<N<10^5) items in the list.
The current cursor position is, cursor->B which is 2 if we think like an array.
I have to perform the following operation on my list:
insert x characters in the list at the cursor position and update the cursor.
delete y (y < N) characters starting from the cursor position and update the cursor.
move the cursor to a specific position within the list.
I want all of these operations to run in constant time.
Can anyone kindly help by suggesting any data structure model?
There isn't. Searching / iterating is linear in complexity, O(n). If you want constant complexity, you need to use a different data structure. Since you are using C++, you should use one from the Containers library.
If the data can be sorted then by using "skip lists" a speed up can be achieved.
The principle is that extra pointers are used to skip ahead.
A skip list is a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, with each successive subsequence skipping over fewer elements than the previous one ... (Wikipedia, "Skip list")
With a single extra level of skip pointers and O(√n) extra space, the search time drops to O(√n); a full multi-level skip list brings the expected search time down to O(log n).
Of course it is not possible to use a plain linked list for that. As said before, reaching a position in a linked list has linear complexity.
You can try a more complex data structure, such as a hash table used as a lookup container for the items in your list, which gives O(1) lookups on average. Instead of storing only the item itself, each stored entry can contain a pointer / index referring to the next item. But keep in mind that deletion will still be expensive: when removing an item you also have to refresh the links pointing to it, so the item itself needs to know which other items point to it.

LRU sorted by score in C++, is there such container?

I need to implement a very efficient cache LRU with the following properties: entries are indices in a vector of cache entries, each cache hit updates an empirical score, computed from some values that can be kept in the container value, like number of hits, size of matched object etc.
I need to be able to quickly pick a victim for cache eviction from the bottom of such LRU, and be able to quickly iterate over some number of the best-performing entries from the top, so such container needs to be sorted.
So far, I have only been able to come up with a vector of structures that hold the values used for score calculation, plus bidirectional links, which I use to put an updated element back in place after its score is recalculated, by linear search from its current position with score comparison. This search may obviously go upwards (when the score is updated, since it always gets bigger) or downwards (when an element is evicted and its score resets to 0). Linear search may not be so bad: this runs for a long time, the scores of elements that survive grow large, and each increment is small, so an element does not have to move very far to reach its place, and in case of a reset I can start the search from the bottom.
I am aware of STL sorted containers, folly's cache LRU implementation, and Boost.Bimap (this last one seems to be an overkill for what I need).
Can I do better than a linear search here? Does anyone know of an implementation?
Thanks in advance!
Update: implemented a solution that involves a vector of iterators into a set that has index into the vector (for uniqueness) + necessary data to compute the score, with comparator sorting by the score.
Seems to work well, maybe there is a more elegant solution out there?
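For readers who want to see what the update describes, here is a rough sketch of that layout. It is a guess at the structure, not the poster's actual code: the names Entry, ByScore and ScoreIndex are hypothetical, the score is assumed to be a single double, and the vector index is used as a tie-breaker so that entries with equal scores stay distinct in the set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical entry: the score plus the index of the cache slot it describes.
struct Entry {
    double score;
    std::size_t index;
};

// Orders by ascending score; the index breaks ties so keys stay unique.
struct ByScore {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.score != b.score) return a.score < b.score;
        return a.index < b.index;
    }
};

class ScoreIndex {
public:
    // Starts with n cache slots, all at score 0.
    explicit ScoreIndex(std::size_t n) {
        where_.reserve(n);
        for (std::size_t i = 0; i < n; ++i)
            where_.push_back(order_.insert(Entry{0.0, i}).first);
    }

    // Re-positions one slot after its score changed: erase + insert, O(log n).
    void setScore(std::size_t index, double score) {
        assert(index < where_.size());
        order_.erase(where_[index]);
        where_[index] = order_.insert(Entry{score, index}).first;
    }

    // Lowest-scoring slot, i.e. the eviction candidate.
    std::size_t victim() const { return order_.begin()->index; }

    // Indexes of the k best-scoring slots, best first.
    std::vector<std::size_t> top(std::size_t k) const {
        std::vector<std::size_t> out;
        for (auto it = order_.rbegin(); it != order_.rend() && out.size() < k; ++it)
            out.push_back(it->index);
        return out;
    }

private:
    std::set<Entry, ByScore> order_;                          // sorted by score
    std::vector<std::set<Entry, ByScore>::iterator> where_;   // slot -> set node
};
```

This works because std::set iterators stay valid when other elements are inserted or erased, so the vector of iterators never goes stale. Compared with the linear search, each update costs O(log n) regardless of how far the element moves.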

C++ data structure to perform indexed list

I am looking for the most efficient data structure to maintain an indexed list. You can easily view it in terms of an STL map:
std::map<int,std::vector<int> > eff_ds;
I am using this as an example because I am currently using this setup. The operations that I would like to perform are :
Insert values based on key : similar to eff_ds[key].push_back(..);
Print the contents of the data structure in terms of each key.
I am also trying to use an unordered map and a forward list,
std::unordered_map<int,std::forward_list<int> > eff_ds;
Is this the best I could do in terms of time if I use C++ or are there other options ?
UPDATE:
I can do insertion either way - front/back as long as I do the same for all the keys. To make my problem more clear, consider the following:
At each iteration of my algorithm, I am going to have an external block give me a (key,value) - both of which are single integers - pair as an output. Of course, I will have to insert this value to the corresponding key. Also, at different iterations, the same key might be returned with different values. At the end my output data(written to a file) should look something like this:
k1: v1 v2 v3 v4
k2: v5 v6 v7
k3: v8
.
.
.
kn: vm
The number of these iterations is pretty large, ~1m.
There are two dimensions to your problem:
What is the best container to use where you want to be able to look up the items in the container using a numeric key, with a large number of keys, and the keys are sparse
A numeric key might lend itself to a vector for this, however if the keys are sparsely populated that would waste a lot of memory.
Assuming you do not want to iterate through the keys in order (which you did not state as a requirement), then an unordered_map might be the best bet.
What is the best container for a list of numbers, allowing for insertion at either end and the ability to retrieve the list of numbers in order (the value type of the outer map)
The answer to this will depend on how frequently you want to insert elements at the front. If that is commonly occurring then you might want to consider a forward_list. If you are mainly inserting on the end then a vector would be lower overhead.
Based on your updated question, since you can limit yourself to adding the values to the end of the lists, and since you are not concerned with duplicate entries in the lists, I would recommend using std::unordered_map<int, std::vector<int> >.
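A small sketch of that recommendation, in case it helps. The helper names Groups, addValue and writeGroups are made up for illustration, and sorting the keys before printing is an assumption on my part: an unordered_map iterates in an unspecified order, so if the output file should list keys in ascending order they have to be sorted once at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <unordered_map>
#include <vector>

using Groups = std::unordered_map<int, std::vector<int>>;

// One iteration of the algorithm: append value to the list stored under key.
void addValue(Groups& groups, int key, int value) {
    groups[key].push_back(value);  // creates the vector on first use of key
}

// Final output: one "key: v1 v2 ..." line per key, keys in ascending order.
void writeGroups(const Groups& groups, std::ostream& out) {
    std::vector<int> keys;
    keys.reserve(groups.size());
    for (const auto& kv : groups) keys.push_back(kv.first);
    std::sort(keys.begin(), keys.end());
    for (int key : keys) {
        out << key << ':';
        for (int value : groups.at(key)) out << ' ' << value;
        out << '\n';
    }
}
```

Passing a std::ofstream to writeGroups writes the file; a std::ostringstream is handy for checking the format. With about a million insertions, each one is an average O(1) hash lookup plus an amortized O(1) push_back.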

Is std::map a good solution?

All,
I have the following task.
I have a finite number of strings (categories). In each category there will be a set of (team, value) pairs. The number of teams is finite and based on the user's selection.
Neither size is more than 25.
The value will change based on user input, and when it changes the teams should be re-sorted by value.
I was hoping that the STL had some kind of auto-sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
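#include <map>
#include <string>
#include <vector>

struct Foo
{
    std::string team;
    double value;
    // Order teams by value so std::sort() can be used directly.
    bool operator<(const Foo& other) const { return value < other.value; }
};
std::map<std::string, std::vector<Foo>> myContainer;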
and just call std::sort() whenever a value changes.
Or is there a more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams. The columns of this table are categories. The cells of this table are divided in half. Top half is the category value for a given team. This value is increasing with every player.
Now when the player is added to a team, the scoring categories of the player will be added to a team and the data in the columns will be sorted. So, for category "A" it may be team1, team2; and for category "B" it may be team2, team1.
Then based on the position of each team the score will be assigned for each team/category.
And that score I will need to display.
I hope this clarifies what I am trying to achieve and makes it clearer what I'm looking for.
[/EDIT]
It really depends on how often you are going to modify the data in the map and how often you're just going to be searching for the std::string and grabbing the vector.
If your access pattern is to add a map entry, fill all the entries in its vector, then move on to the next entry and fill its vector, and so on, and only afterwards access the map randomly to get the vectors, then no, a map is probably not the best container. You'd be better off using a vector of std::pair holding the string and the vector, and sorting it once everything has been added.
In fact, organising it as above is probably the most efficient way of setting it up (I admit this is not always possible, however). Furthermore, it would be highly advisable to use some sort of hash value in place of the std::string, as a hash compare is many times faster than a string compare. You also have the string stored in Foo anyway.
A map will work, however; it really depends on exactly what you are trying to do.
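For sizes this small (at most 25 categories and 25 teams), simply re-sorting the affected category after each change is cheap. A minimal sketch of that, assuming the ranking is by descending value and that values only change through one update function; the names TeamValue, Table and addToTeam are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical row of one category column: a team and its running value.
struct TeamValue {
    std::string team;
    double value;
};

// category name -> teams, kept sorted by descending value
using Table = std::map<std::string, std::vector<TeamValue>>;

// Adds delta to the team's value in the given category (creating the row if
// needed), then re-sorts that category so position 0 is the leading team.
void addToTeam(Table& table, const std::string& category,
               const std::string& team, double delta) {
    std::vector<TeamValue>& rows = table[category];
    auto it = std::find_if(rows.begin(), rows.end(),
                           [&](const TeamValue& r) { return r.team == team; });
    if (it == rows.end()) {
        rows.push_back({team, delta});
    } else {
        it->value += delta;
    }
    // stable_sort keeps the previous relative order of teams with equal values.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const TeamValue& a, const TeamValue& b) {
                         return a.value > b.value;
                     });
}
```

The position of a team in table[category] then gives its standing in that category, from which the per-category score can be assigned.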

How to repeatedly insert elements into a sorted list fast

I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. I then process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. I think the STL provides std::priority_queue for this.
Insertion time is O(log N), removal is O(log N)
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
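As a rough illustration of the std::greater point, here is a minimal sketch of one simulation step. It assumes events are plain double timestamps and that processing an event yields a list of positive delays for the follow-up events; the names EventQueue and step are made up for the example.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// std::priority_queue is a max-heap by default; std::greater turns it into a
// min-heap, so top() is the earliest event time.
using EventQueue =
    std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Removes the earliest event and schedules one new event per delay, each at
// a strictly later time (delays are assumed to be positive).
// Returns the time of the event that was processed.
double step(EventQueue& queue, const std::vector<double>& delays) {
    assert(!queue.empty());
    const double now = queue.top();
    queue.pop();                                     // O(log N)
    for (double d : delays) queue.push(now + d);     // O(log N) each
    return now;
}
```

Calling step in a loop processes events in time order without ever keeping the whole list sorted. If events carry more than a timestamp, the same pattern works with a struct and a comparator that orders by time.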
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A balanced binary search tree (e.g. std::set or std::multiset) is always sorted and has faster access times than a linear list. Search, insert and delete times are O(log(n)).
But it depends whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you then would copy the items to an array or a list and sort it.