What is the most efficient way to rank the items of a list based on a user's preferences, by showing two items at a time?

For some context, I recommend watching this video by Tom Scott, in which he determines what the best "thing" is: https://youtu.be/ALy6e7GbDRQ. I think it will help explain my question.
Basically, I am trying to make an algorithm/program that lets one user rank all the items in a list by repeatedly choosing which of two items he likes more.
For example, take the most basic list with 3 items: A, B and C.
The order of operations would look like this:
1. The user is presented with the choice: A or C.
2. He prefers C.
3. The user is presented with the choice: B or C.
4. He prefers C.
5. The user is presented with the choice: A or B.
6. He prefers A.
So we know that the ranked order is [C > A > B] after 3 comparisons.
But what if, at step 4 (the choice between B and C), the user had chosen B instead? Since he already preferred C over A, we could deduce by transitivity that he prefers B over A before step 5. So we would know that the ranked list is [B > C > A] after only 2 comparisons, and it is just as accurate as the first case. Consequently, the number of steps depends on the user's choices.
So what is the right, or most efficient, way to rank all the items in the smallest number of comparisons? I thought about literally comparing every possible pair and keeping a counter for each item that goes up each time it is selected over the other. But that way the number of comparisons grows quadratically with the size of the list (n(n-1)/2 pairs), and, as the example above shows, it would not be the most efficient way. I also thought about using a simple sorting algorithm in which the user does the comparisons "manually", but I am not very familiar with sorting algorithms in general.
In Tom Scott's video, the website picks two items at random, and he uses such a big pool of users that the problem corrects itself through probability and statistics; he does not need to worry about every item being picked equally often for the result to be accurate. Since I want just one user to rank the items, I need to pick the right items to compare so as to be as efficient as possible (I think). Another difference is that I will not have over 7000 items as in Tom's version. I want to use this to rank, for example, "All Disney movies", "Certain music genres" or "All Zelda video games", so I am pretty sure I will have at most around 100 items.
My question is: am I on the right track? Should I use a simple sorting algorithm (if yes, which one?), or is comparing every pair the only mathematical way to do this effectively? Or am I missing something that could help me? Brute-forcing every pair is clearly the simplest and most accurate way to rank the items, but it is certainly not the most efficient. I suppose that the order of the comparisons, and which items we compare, really matter for efficiency, and this is where I am lost.
I know it is not really a conventional question to ask here, but thanks for helping me or pointing me in the right direction.

Assuming that the user will create a total ordering of items, the algorithm should maintain a fully ordered list of an increasing number of items. Take the next unordered item and use the two-item comparisons needed for a binary search to insert it into the ordered list.
Repeat until all items are ordered.
The binary searches should minimise the comparisons at each stage and present the user with more meaningful comparisons over time for each new item.
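A minimal C++ sketch of this binary insertion approach, assuming the user's answers are consistent (a strict total order). `rankByPreference` and the `Prefers` callback are hypothetical names; in a real program the callback would show the two items and return `true` if the user picks the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Callback that asks the user: returns true if they prefer `a` over `b`.
using Prefers = std::function<bool(const std::string& a, const std::string& b)>;

// Binary insertion sort driven by the user's pairwise choices.
// The result is ordered from most preferred to least preferred.
std::vector<std::string> rankByPreference(const std::vector<std::string>& items,
                                          const Prefers& prefers) {
    assert(prefers && "a comparison callback is required");
    std::vector<std::string> ranked;  // always fully ordered
    for (const std::string& item : items) {
        // Binary search for the insertion point in ranked[lo, hi).
        std::size_t lo = 0, hi = ranked.size();
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (prefers(item, ranked[mid])) {
                hi = mid;      // item ranks above ranked[mid]
            } else {
                lo = mid + 1;  // item ranks below ranked[mid]
            }
        }
        ranked.insert(ranked.begin() + static_cast<std::ptrdiff_t>(lo), item);
    }
    return ranked;
}
```

Inserting the k-th item asks the user at most ceil(log2 k) questions, so ranking 100 items takes at most 573 comparisons, compared with 4,950 for asking about every pair.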

Related

Data structure advice for C++

I am looking for a data structure in C++ and need some advice.
I have nodes; every node has a unique_id and a group_id:
1 1.1.1.1
2 1.1.1.2
3 1.1.1.3
4 1.1.2.1
5 1.1.2.2
6 1.1.2.3
7 2.1.1.1
8 2.1.1.2
I need a data structure that can answer these questions:
1. What is the group_id of node 4?
2. Give me a list (probably a vector) of the unique_ids that belong to group 1.1.1.
3. Give me a list (probably a vector) of the unique_ids that belong to group 1.1.
4. Give me a list (probably a vector) of the unique_ids that belong to group 1.
Is there a data structure that can answer these questions (and what is the time complexity of inserting and answering), or should I implement it myself?
I would appreciate an example.
EDIT:
At the beginning, I need to build this data structure. Most operations are reads by group id; insertions will happen, but less often than reads.
Time complexity is more important than memory space.
To me, hierarchical data like the group ID calls for a tree structure. (I assume that for 500 elements this is not really necessary, but it seems natural and scales well.)
Each element in the first two levels of the tree would just hold vectors (if the sub-IDs come ordered) or maps (if they come unordered) of sub-IDs.
The third level in the tree hierarchy would hold pointers to leaves, again in a vector or map, which contain the fourth group ID part and the unique ID.
Questions 2-4 are easily and quickly answered by navigating the tree.
For question 1 one needs an additional map from unique IDs to leaves in the tree; each element inserted into the tree also has a pointer to it inserted into the map.
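A simplified C++ sketch of this tree, assuming group IDs are dot-separated integers as in the question. Unlike the description above, every level here uses a `std::map` of children and the leaves store unique IDs directly; `GroupTree` and its members are hypothetical names. A `std::unordered_map` plays the role of the extra unique-ID index for question 1.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Each component of the dotted group_id is one level of the tree.
class GroupTree {
public:
    void insert(int uniqueId, const std::string& groupId) {
        assert(!groupId.empty());
        Node* node = &root_;
        for (int part : split(groupId)) {
            std::unique_ptr<Node>& child = node->children[part];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->ids.push_back(uniqueId);  // leaf stores the unique_id
        groupOf_[uniqueId] = groupId;   // extra index for question 1
    }

    // Question 1: average O(1) lookup ("" if unknown).
    std::string groupOf(int uniqueId) const {
        auto it = groupOf_.find(uniqueId);
        return it == groupOf_.end() ? std::string() : it->second;
    }

    // Questions 2-4: walk down the prefix, then collect every id below it.
    std::vector<int> idsInGroup(const std::string& prefix) const {
        const Node* node = &root_;
        for (int part : split(prefix)) {
            auto it = node->children.find(part);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        std::vector<int> out;
        collect(*node, out);
        return out;
    }

private:
    struct Node {
        std::map<int, std::unique_ptr<Node>> children;  // next group_id component
        std::vector<int> ids;                           // unique_ids stored at this leaf
    };

    static std::vector<int> split(const std::string& groupId) {
        std::vector<int> parts;
        std::istringstream in(groupId);
        std::string token;
        while (std::getline(in, token, '.')) parts.push_back(std::stoi(token));
        return parts;
    }

    static void collect(const Node& node, std::vector<int>& out) {
        out.insert(out.end(), node.ids.begin(), node.ids.end());
        for (const auto& entry : node.children) collect(*entry.second, out);
    }

    Node root_;
    std::unordered_map<int, std::string> groupOf_;
};
```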
First of all, if you are going to have only a small number of nodes then it would probably make sense not to mess with advanced data structuring. Simple linear search could be sufficient.
Next, it looks like a good job for SQL, so maybe it's a good idea to incorporate the SQLite library into your app. But even if you really want to do it without SQL, it's still a good hint: what you need are two index trees to support quick searching through your array. The complexity (if using balanced trees) will be logarithmic for all operations.
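A C++ sketch of the two-index idea using the standard library's balanced-tree containers (`std::map` and `std::multimap`) instead of SQLite; `TwoIndexes` is a hypothetical name, and it assumes each unique_id is inserted only once. It relies on the fact that, in sorted string order, all keys sharing a prefix are contiguous, so a group query is one `lower_bound` plus a scan over the k matches, O(log n + k).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class TwoIndexes {
public:
    // Both inserts are O(log n). Assumes each unique_id is inserted only once.
    void insert(int uniqueId, const std::string& groupId) {
        assert(!groupId.empty());
        idToGroup_[uniqueId] = groupId;
        groupToIds_.emplace(groupId, uniqueId);
    }

    // Question 1: O(log n) lookup by unique_id ("" if unknown).
    std::string groupOf(int uniqueId) const {
        auto it = idToGroup_.find(uniqueId);
        return it == idToGroup_.end() ? std::string() : it->second;
    }

    // Questions 2-4: the group itself plus every group_id starting with "group.".
    std::vector<int> idsInGroup(const std::string& group) const {
        std::vector<int> out;
        auto exact = groupToIds_.equal_range(group);
        for (auto it = exact.first; it != exact.second; ++it) out.push_back(it->second);
        // The trailing '.' keeps "1.1" from matching "1.10.x.x".
        const std::string prefix = group + ".";
        for (auto it = groupToIds_.lower_bound(prefix);
             it != groupToIds_.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

private:
    std::map<int, std::string> idToGroup_;        // index 1: unique_id -> group_id
    std::multimap<std::string, int> groupToIds_;  // index 2: group_id -> unique_id(s)
};
```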
Depends...
How often do you insert? Or do you mostly read?
How often do you access by Id or GroupId?
With a max of 500 nodes, I would put them in a simple vector where the Id is the offset into the array (if the Ids are indeed as shown). The group search can then be implemented by iterating over the array and comparing the partial group-ids.
If this is too expensive and you really access the structure a lot and need very high performance, or you do a lot of inserts, I would implement a tree with a hash map for the Ids.
If the data is stored in a database, you may use SELECT / CONNECT BY if your system supports it and query the information directly from the DB.
Sorry for not providing a clear answer, but the solution depends on too many factors ;-)
Sounds like you need a container with two separate indexes on unique_id and group_id. Question 1 will be handled by the first index, Questions 2-4 will be handled by the second.
Maybe take a look at the Boost Multi-index Containers Library.
I am not sure what the perfect data structure for this is, but I would make use of a map.
Keyed by unique_id, it gives you O(log n) lookup for question 1 (average O(1) with std::unordered_map) and O(log n) insertion and deletion. The issue comes with questions 2, 3 and 4, where your efficiency will be O(n), where n is the number of nodes.

Is std::map a good solution?

All,
I have the following task.
I have a finite number of strings (categories). In each category there will be a set of team/value pairs. The number of teams is finite and depends on the user's selection.
Neither size is more than 25.
The values will change based on user input, and when one changes, the teams should be re-sorted by value.
I was hoping that the STL had some kind of auto-sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
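#include <map>
#include <string>
#include <vector>

struct Foo
{
    std::string team;
    double value;
    // Compare teams by value so std::sort can order them.
    bool operator<(const Foo& other) const { return value < other.value; }
};
// category name -> teams in that category
std::map<std::string, std::vector<Foo>> myContainer;
and just call std::sort() whenever a value changes.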
Or is there more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams. The columns are categories. Each cell is divided in half. The top half is the category value for a given team; this value increases with every player.
When a player is added to a team, the player's scoring categories are added to the team's values and the data in each column is sorted. So for category "A" the order may be team1, team2, and for category "B" it may be team2, team1.
Then, based on the position of each team, a score is assigned for each team/category.
That score is what I need to display.
I hope this clarifies what I am trying to achieve and what I'm looking for.
[/EDIT]
It really depends on how often you are going to modify the data in the map and how often you're just going to be searching for the std::string and grabbing the vector.
If your access pattern is: add a map entry, fill all entries in its vector, then add the next one, fill all entries in its vector, and so on, and afterwards you only randomly access the map for the vectors, then no, a map is probably not the best container. You'd be better off using a vector of std::pairs of the string and the vector, and sorting it once everything has been added.
In fact, organising it as above is probably the most efficient way of setting it up (I admit this is not always possible, however). Furthermore, it would be highly advisable to use some sort of hash value in place of the std::string, as comparing hashes is usually much faster than comparing strings. You also have the string stored in Foo anyway.
A map will, however, work, but it really depends on exactly what you are trying to do.
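A C++ sketch of the sorted-vector-of-pairs layout suggested in this answer, assuming all categories are added first and only looked up afterwards; `Category`, `finalize`, `findCategory` and `rankTeams` are hypothetical names. Lookups use `std::lower_bound`, so they are O(log n) like `std::map`, but over contiguous memory.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Foo {
    std::string team;
    double value;
};

// One category: its name and its teams. A sorted vector of these replaces the std::map.
using Category = std::pair<std::string, std::vector<Foo>>;

bool byName(const Category& a, const Category& b) { return a.first < b.first; }

// Sort once, after every category has been added.
void finalize(std::vector<Category>& categories) {
    std::sort(categories.begin(), categories.end(), byName);
}

// O(log n) binary-search lookup; returns nullptr if the category does not exist.
std::vector<Foo>* findCategory(std::vector<Category>& categories, const std::string& name) {
    // Debug-only sanity check that finalize() was called.
    assert(std::is_sorted(categories.begin(), categories.end(), byName));
    auto it = std::lower_bound(categories.begin(), categories.end(), name,
                               [](const Category& c, const std::string& key) { return c.first < key; });
    if (it == categories.end() || it->first != name) return nullptr;
    return &it->second;
}

// Re-rank one category's teams after a value changes, highest value first.
void rankTeams(std::vector<Foo>& teams) {
    std::sort(teams.begin(), teams.end(),
              [](const Foo& a, const Foo& b) { return a.value > b.value; });
}
```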

C++ Inventory & Item Crafting System - turning "stuff" into hash_tag "garblygook" and reversing the hash_tags to get real "stuff"

I'm creating an item crafting system for a game and need to be able to take any items the player selects and transform them into a hash_tag, which can then be compared against the hash_tags of all possible item mixes to search for a correct match. This should be the simplest and fastest way to get the result I'm looking for, but of all the ways of doing this sort of thing (I've experience with just about all of them), hash tags are the one thing I've never even slightly touched. I've no idea where to even begin, and could use a lot of help with this.
Basically, it needs to let the player select anything he or she has, combine the selected things into a hash_tag, and check the hash tag board for that number. Whether that number results in a "valid combination" or a "this is not a valid combination" doesn't matter, as long as all possible mixes are available on the hash tag board.
On the side there'll obviously be some code for picking things and removing them if there's a valid match and adding in the new item instead, but that's not what I need help with.
(Although anyone with suggestions on this I'll be glad to hear them!)
From what I have gathered so far, you have an ordered list of inputs (the items being crafted) and are looking for a function that returns a hash (probably for easy comparisons and storage) and is also reversible.
Such a thing cannot exist in the general case: as long as your hash has fewer bits than your input data, hashing will produce collisions, and with collisions the backward transformation is impossible.
A good start would be to just choose a unique identifier for each item and use a list of those identifiers as the hash (sorted, if order is irrelevant to the crafting). Comparisons will still be reasonably fast.
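A C++ sketch of that suggestion, assuming each item type already has a unique integer ID and that ingredient order does not matter. The key is just the sorted ID list, so it never collides and is trivially reversible; `ItemId`, `RecipeBook` and `makeKey` are hypothetical names. `std::map` works here because `std::vector` provides `operator<`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <vector>

using ItemId = int;  // hypothetical: one unique integer per item type
using RecipeKey = std::vector<ItemId>;

// The "hash" is just the sorted list of ingredient IDs: collision-free
// and reversible, because the key *is* the ingredient list.
RecipeKey makeKey(std::vector<ItemId> selected) {
    std::sort(selected.begin(), selected.end());  // selection order does not matter
    return selected;
}

class RecipeBook {
public:
    void add(const std::vector<ItemId>& ingredients, ItemId result) {
        assert(!ingredients.empty());
        recipes_[makeKey(ingredients)] = result;
    }

    // The crafted item, or std::nullopt for "not a valid combination".
    std::optional<ItemId> craft(const std::vector<ItemId>& selected) const {
        auto it = recipes_.find(makeKey(selected));
        if (it == recipes_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<RecipeKey, ItemId> recipes_;  // std::vector has operator<, so it can be a map key
};
```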

Hard sorting problem - what type of algorithm should I be using?

The problem:
N nodes are related to each other by a 'closeness' factor ranging from 0 to 1, where a factor of 1 means that the two nodes have nothing in common and 0 means the two nodes are exactly alike.
If two nodes are both close to another node (i.e. they have a factor close to 0) then this doesn't mean that they will be close together, although probabilistically they do have a much higher chance of being close together.
-
The question:
If another node is placed in the set, find the node that it is closest to in the shortest possible amount of time.
This isn't a homework question; this is a real-world problem that I need to solve, but I've never taken any algorithms courses, so I don't have a clue what sort of algorithm I should be researching.
I can index all of the nodes before another one is added and gather closeness data between each node, but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution. Any ideas or help would be much appreciated :)
Because your 'closeness' metric obeys the triangle inequality, you should be able to use a variant of BK-trees to organize your elements. Adapting them to real numbers should simply be a matter of choosing an interval to quantize your number on, and otherwise using the standard BK-tree procedure. Some experimentation may be required; you might want to increase the resolution of the quantization as you progress down the tree, for instance.
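A rough C++ sketch of a BK-tree over a real-valued distance, quantized into a fixed number of buckets as this answer suggests. `BKTree`, the bucket count, and the pruning bound are illustrative assumptions; the search is only exact if the distance lies in [0, 1] and really satisfies the triangle inequality.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// BK-tree for a distance in [0, 1] that obeys the triangle inequality.
// Each child edge is labelled with the quantized bucket floor(d * buckets).
template <typename T>
class BKTree {
public:
    using Distance = std::function<double(const T&, const T&)>;
    BKTree(Distance dist, int buckets) : dist_(std::move(dist)), buckets_(buckets) {
        assert(buckets_ > 0);
    }

    void insert(const T& value) {
        if (!root_) { root_ = std::make_unique<Node>(value); return; }
        Node* node = root_.get();
        while (true) {
            auto& child = node->children[bucket(dist_(node->value, value))];
            if (!child) { child = std::make_unique<Node>(value); return; }
            node = child.get();
        }
    }

    // Exact nearest neighbour, skipping subtrees the triangle inequality rules out.
    T nearest(const T& query) const {
        assert(root_ && "tree must not be empty");
        const Node* best = nullptr;
        double bestDist = std::numeric_limits<double>::infinity();
        std::vector<const Node*> pending{root_.get()};
        while (!pending.empty()) {
            const Node* node = pending.back();
            pending.pop_back();
            double d = dist_(node->value, query);
            if (d < bestDist) { bestDist = d; best = node; }
            for (const auto& [key, child] : node->children) {
                // Every x in this subtree has dist(node, x) in [lo, hi].
                double lo = static_cast<double>(key) / buckets_;
                double hi = static_cast<double>(key + 1) / buckets_;
                // Hence dist(query, x) >= |d - dist(node, x)| >= bound.
                double bound = std::max({0.0, lo - d, d - hi});
                if (bound < bestDist) pending.push_back(child.get());
            }
        }
        return best->value;
    }

private:
    struct Node {
        explicit Node(const T& v) : value(v) {}
        T value;
        std::map<int, std::unique_ptr<Node>> children;
    };
    int bucket(double d) const {
        int b = static_cast<int>(std::floor(d * buckets_));
        return std::clamp(b, 0, buckets_ - 1);
    }
    Distance dist_;
    int buckets_;
    std::unique_ptr<Node> root_;
};
```

Finer buckets prune more aggressively but create more children per node, which is the resolution trade-off the answer mentions.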
"but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution"
Without any other information about the relationships between nodes, this is the only way you can do it, since you have to compute the closeness factor between the new node and each existing node. An O(n) algorithm can be a perfectly decent solution.
One addition you might consider - keep in mind we have no idea what data structure you are using for your objects - is to organize all present nodes into a graph, where nodes with factors below a certain threshold can be considered connected, so you can first check nodes that are more likely to be similar/related.
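A minimal C++ sketch of the O(n) baseline from this answer: a single pass with exactly one closeness evaluation per existing node. `closestIndex` and the `closeness` callable are hypothetical; lower values mean closer, as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Brute-force nearest node: exactly one closeness evaluation per existing node.
// closeness(a, b) follows the question: 0 = identical, 1 = nothing in common.
template <typename T, typename Closeness>
std::size_t closestIndex(const std::vector<T>& nodes, const T& newNode, Closeness closeness) {
    assert(!nodes.empty());
    std::size_t best = 0;
    double bestScore = closeness(nodes[0], newNode);
    for (std::size_t i = 1; i < nodes.size(); ++i) {
        double score = closeness(nodes[i], newNode);
        if (score < bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```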
If you want the optimal algorithm in terms of speed, but O(n^2) space, then for each node create a sorted list of other nodes (ordered by closeness).
When you get a new node, you have to add it to the indexed list of all the other nodes, and all the other nodes need to be added to its list.
To find the node closest to a given node, just take the first entry on that node's list (for a newly added node, its own freshly built list).
Since you already need O(n^2) space (to store all the closeness information you basically need an NxN matrix where A[i,j] represents the closeness between i and j), you might as well sort it and get O(1) retrieval.
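A C++ sketch of this O(n^2)-space index, assuming a symmetric closeness function where lower means closer; `NeighborIndex` and its members are hypothetical names. Adding a node still costs one closeness evaluation per existing node (plus a sorted insertion into each existing list), but afterwards the closest node to any stored node is simply the front of that node's list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// For each node, all other nodes sorted by closeness (O(n^2) space in total).
template <typename T>
class NeighborIndex {
public:
    using Closeness = std::function<double(const T&, const T&)>;
    explicit NeighborIndex(Closeness closeness) : closeness_(std::move(closeness)) {}

    // Adds a node and returns its index: one closeness evaluation per existing
    // node, plus one sorted insertion into each existing node's list.
    std::size_t add(const T& value) {
        const std::size_t id = nodes_.size();
        std::vector<Entry> own;
        for (std::size_t i = 0; i < id; ++i) {
            const Entry forI{closeness_(nodes_[i], value), id};
            auto& list = lists_[i];
            list.insert(std::upper_bound(list.begin(), list.end(), forI), forI);
            own.emplace_back(forI.first, i);
        }
        std::sort(own.begin(), own.end());
        nodes_.push_back(value);
        lists_.push_back(std::move(own));
        return id;
    }

    // O(1): the closest node to `id` is the first entry of its own list.
    std::size_t closestTo(std::size_t id) const {
        assert(id < lists_.size() && !lists_[id].empty());
        return lists_[id].front().second;
    }

private:
    using Entry = std::pair<double, std::size_t>;  // (closeness, other node index)
    Closeness closeness_;
    std::vector<T> nodes_;
    std::vector<std::vector<Entry>> lists_;
};
```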
If this closeness forms a linear spectrum (such that closeness to something implies closeness to other things that are close to it, and not being close implies not being close to those close), then you can simply do a binary or interpolation search on insertion, handling one extra complexity: at each probe you have to check whether closeness increases or decreases below and above that point.
For example, if we consider letters (A is close to B but far from Z), then the pre-existing elements can be kept sorted, say: A, B, E, G, K, M, Q, Z. To insert, say, 'F', you start by comparing it with the middle element, [3] G, and the one following it, [4] K. You establish that F is closer to G than to K, so the best match is either at G or to its left, and you move halfway into the unexplored region to the left: 3/2 = [1] B, followed by E. You find that E is closer to F, so the match is either at E or to its right. Halving the space between the earlier checks at [3] and [1], you test at [2] and find that F is equally distant from E and G, so you insert it in between.
EDIT: it may work better in probabilistic situations, and require fewer comparisons, to start at the ends of the spectrum and work your way in (e.g. compare F to A and Z, decide it's closer to A, then see whether it's closer to A or to the halfway point [3] G). Also, it might be good to finish with a comparison to the closest few points on either side of where the binary/interpolation search led you.
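The comparison scheme in the letter example can be sketched in C++ as a binary search that only ever asks "is the query closer to element mid or to element mid + 1?". This assumes, as the answer does, that the elements are already sorted along the spectrum, so the distance to the query falls and then rises; `nearestOnSpectrum` is a hypothetical name, and ties go to the left-hand element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the element closest to `query`, using O(log n) distance comparisons.
// Assumes `sorted` is ordered along the spectrum.
template <typename T, typename Distance>
std::size_t nearestOnSpectrum(const std::vector<T>& sorted, const T& query, Distance dist) {
    assert(!sorted.empty());
    std::size_t lo = 0, hi = sorted.size() - 1;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (dist(query, sorted[mid]) <= dist(query, sorted[mid + 1])) {
            hi = mid;      // best match is at mid or to its left
        } else {
            lo = mid + 1;  // best match is to the right of mid
        }
    }
    return lo;
}
```

For 'F' in A, B, E, G, K, M, Q, Z this compares G with K, then B with E, then E with G (the same probes as the walkthrough) and returns index 2, i.e. E, since F is equidistant from E and G.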
ACM Computing Surveys, September 2001, carried two papers that might be relevant, at least for background: "Searching in Metric Spaces" (lead author Chávez) and "Searching in High Dimensional Spaces - Index Structures for Improving the Performance of Multimedia Databases" (lead author Böhm). From memory, if all you have is the triangle inequality, you can use it to some effect, but if you can trim your data down to a sensible number of dimensions, you can do better by using a search structure that knows about this dimensional structure.
Facebook has this thing where it puts you and all of your friends in a graph, then slowly moves everyone around until people are grouped together based on mutual friends and so on.
It looked to me like they just made anything <0.5 an attractive force, anything >0.5 a repulsive force, and moved people with every iteration based on the net force. After a couple hundred iterations, it was looking pretty darn good.
Note: this is not an algorithm; it is a heuristic. In the Facebook implementation I saw, two people were not able to reach equilibrium and kept dancing around each other. It turned out they were actually the same person with two different accounts.
Also, it took about 15 minutes on a decent computer with ~100 nodes. YMMV.
It looks suspiciously like a Nearest Neighbor Search problem (also called a similarity search).

Finding all permutations that match a set of rules

I am given N numbers and M rules about their order. Each rule is a pair of indexes (A, B), meaning that the number with index A (the A-th number) must come AFTER the B-th number, though not necessarily right next to it.
Ex: N = 4
1 2 3 4
M = 2
3 2
3 1
Output: 1234, 4213, 4123, 2134, 2143, 2413, 1423, 1243 (8 valid permutations in total)
The algorithm should give me all the permutations that don't break the rules; in the example, 3 must always come after both 2 and 1.
I tried brute force, but it didn't work (although brute force should work here, since N is in the range (1, 8)).
Any ideas ?
Just as a hint.
You can treat your set of rules as a graph. Each index is a vertex, each rule is a directed edge.
Any proper ordering of the numbers (i.e. a permutation that satisfies the rules) corresponds to a so-called topological ordering of the graph above. To generate all valid orderings of your numbers, you need to generate all possible topological orderings of that graph.
P.S. The first algorithm for topological ordering given at the linked Wikipedia page already allows for a fairly straightforward solution that would enumerate all valid permutations. It will take some effort and some care to implement, but it is not rocket science.
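A C++ sketch of that idea: a backtracking variant of Kahn's in-degree algorithm that, at each step, tries every number whose prerequisites have all been placed. `allTopologicalOrders` is a hypothetical name; a pair (A, B) follows the question's convention that A must come after B.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// All topological orderings of the numbers 1..n, by backtracking over Kahn's algorithm:
// at every step, try each unplaced number whose remaining in-degree is zero.
// A rule (A, B) means "A must come after B", i.e. a directed edge B -> A.
std::vector<std::vector<int>> allTopologicalOrders(int n, const std::vector<std::pair<int, int>>& rules) {
    std::vector<std::vector<int>> mustFollow(n + 1);  // mustFollow[b] = numbers that come after b
    std::vector<int> indegree(n + 1, 0);              // unplaced prerequisites per number
    for (const auto& [a, b] : rules) {
        assert(1 <= a && a <= n && 1 <= b && b <= n);
        mustFollow[b].push_back(a);
        ++indegree[a];
    }
    std::vector<std::vector<int>> results;
    std::vector<int> current;
    std::vector<bool> placed(n + 1, false);
    std::function<void()> extend = [&]() {
        if (static_cast<int>(current.size()) == n) {
            results.push_back(current);
            return;
        }
        for (int v = 1; v <= n; ++v) {
            if (placed[v] || indegree[v] != 0) continue;
            placed[v] = true;
            current.push_back(v);
            for (int w : mustFollow[v]) --indegree[w];
            extend();
            for (int w : mustFollow[v]) ++indegree[w];  // undo before trying the next choice
            current.pop_back();
            placed[v] = false;
        }
    };
    extend();
    return results;  // empty if the rules contain a cycle
}
```

For the question's example (N = 4, rules (3, 2) and (3, 1)) this produces 8 orderings, starting with 1 2 3 4.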
Brute forcing would mean going through every permutation, which is O(N!), and for each permutation simply looping through every rule to confirm that it applies, which is O(M). This ends up as O(N! * M), which is kind of ridiculous, but it shouldn't "not work" for such a small set.
Honestly, your best bet is to go back and get the brute force solution working. Once that is done (and if you still have time, etc) you can look for a better algorithm.
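A minimal C++ brute-force sketch along these lines, using `std::next_permutation` to visit all N! orderings; `bruteForceOrders` is a hypothetical name. This version costs O(N! * (N + M)) because it also records each number's position, which is still trivial for N <= 8 (8! = 40,320).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Brute force: visit all N! permutations and keep those satisfying every rule.
// A rule (A, B) means "A must appear somewhere after B".
std::vector<std::vector<int>> bruteForceOrders(int n, const std::vector<std::pair<int, int>>& rules) {
    assert(n >= 1);
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 1);  // start sorted so next_permutation visits all N!
    std::vector<int> pos(n + 1);             // pos[x] = index of number x in perm
    std::vector<std::vector<int>> results;
    do {
        for (int i = 0; i < n; ++i) pos[perm[i]] = i;
        bool ok = std::all_of(rules.begin(), rules.end(),
                              [&](const std::pair<int, int>& r) { return pos[r.first] > pos[r.second]; });
        if (ok) results.push_back(perm);
    } while (std::next_permutation(perm.begin(), perm.end()));
    return results;
}
```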
EDIT, for the downvoter: the student is (or should be) trying to get his homework done on time. By the sounds of it, his homework is a programming exercise where a brute-force solution would be adequate. Helping him figure out an efficient algorithm does not address his REAL problem.
In this case he has tried the simple brute-force approach (which everyone agrees ought to work for small N values) and given up on it prematurely to try something that is probably more difficult. Any experienced developer will tell you that this is a bad idea. The student needs and deserves to be told so, and if he is sensible he will pay attention. But obviously, it's his choice ...