Is boost::graph the right tool for this task? - c++

I have the following simplified situation:
I need to create a tree like the following one:
I start at one node with a score of 3. From this node I calculate all possible next nodes, which have the scores 4, 6, 5 and 7. In the next step I only want to consider the two nodes with the highest score, in our case 6 and 7. From these two nodes I again calculate all possible next nodes. The highest scores among all those next nodes are 12 and 13, so in the next step I only want to consider these two nodes as my next starting points. This means that all nodes descending from the previous score-6 node are ignored from now on.
And so on...
I have no idea about graph theory right now (but I guess I have to do some research).
I looked for some libraries which might help me implement this in C++. I came across boost::graph, which looks promising at first glance. The downside is that it also looks quite complex.
My question is:
Do you think boost::graph can be easily used to implement a tree like this? Is it worth spending some days trying to learn boost::graph, or is it not the right library for me?
Is there a better library? I quite like boost and have used it a lot, but not the graph library of it.
I think my requirements are:
calculation of my nodes is quite expensive, therefore I need a tree/graph which stores the "score" of each node and can easily be extended later.
it should be possible to simply say "from now on let's ignore those nodes" and focus on the two best ones.
I need easy access to the "end of my tree" at each step, so that I know which nodes are the current ones from which I have to calculate the next possible nodes.
to calculate the score of my next nodes I need to be able to easily follow the tree back to its root (I do not only need the previous score but more information; my implementation would require that each node is an object with a score, and I need access to the members of those objects as well as to the score). This means I need to be able to reconstruct my way through the tree.
Note:
I guess it should be a kind of standard problem in graph theory. But right now I do not really know where to start my research. If you have any good literature, papers, descriptions, or keywords to google for this problem, please tell me.
Edit:
Many people have pointed out that an array might be sufficient. First of all: you might be right, and thank you for your comments. My example was highly simplified. At each step I would need to calculate up to ten thousand new possible nodes, and I need to make roughly 1000 iterations. The idea of only following a couple of nodes (in my example 2, but in my application most likely ~100, or something like "the best x percent") was simply to reduce the number of possibilities. I think it is obvious that the problem would otherwise explode quite quickly.
Edit2:
It seems like there is some confusion about the numbers:
Right now the code runs sequentially (no graph, just an array). For every possible new node I calculate a score based on some metrics. At the end I select the node which has the highest score and go on with the next iteration.
The idea how to implement it with a graph is the following:
Again, for every possible node I calculate the score (this is the number on the edges connecting two nodes), but the thing we are interested in is the sum of all these scores along a path through the graph, which is the number I wrote inside the nodes. This means I am interested in the path through the graph which leads to the highest sum of scores.
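To make the idea concrete, here is a rough sketch (not real code; Node, expand() and BEAM_WIDTH are just made-up names) of what one iteration of this "keep only the best k nodes" search could look like in plain C++, without any graph library:

```cpp
#include <algorithm>
#include <iostream>
#include <memory>
#include <vector>

// Sketch of the "keep only the k best nodes per level" idea (essentially a
// beam search). Node, expand() and BEAM_WIDTH are illustrative names only.
struct Node {
    double score;                          // accumulated score along the path
    std::shared_ptr<Node> parent;          // walk this chain back to the root
};

// Placeholder for the expensive step that generates all possible successors.
std::vector<std::shared_ptr<Node>> expand(const std::shared_ptr<Node>& n)
{
    std::vector<std::shared_ptr<Node>> children;
    for (double delta : {1.0, 2.0, 3.0, 4.0})          // toy successor scores
        children.push_back(std::make_shared<Node>(Node{n->score + delta, n}));
    return children;
}

int main()
{
    const std::size_t BEAM_WIDTH = 2;
    std::vector<std::shared_ptr<Node>> frontier = {
        std::make_shared<Node>(Node{3.0, nullptr})     // root with score 3
    };

    for (int iteration = 0; iteration < 1000; ++iteration) {
        std::vector<std::shared_ptr<Node>> candidates;
        for (const auto& n : frontier)
            for (auto& c : expand(n))
                candidates.push_back(c);

        // Keep only the BEAM_WIDTH best-scoring nodes; everything else
        // (and its subtree) is ignored from now on.
        std::sort(candidates.begin(), candidates.end(),
                  [](const auto& a, const auto& b) { return a->score > b->score; });
        if (candidates.size() > BEAM_WIDTH)
            candidates.resize(BEAM_WIDTH);
        frontier = std::move(candidates);
    }

    // Reconstruct the best path by following the parent pointers to the root.
    for (auto n = frontier.front(); n; n = n->parent)
        std::cout << n->score << '\n';
}
```

The parent pointers are what let me follow the tree back to its root and reconstruct the path, and the current frontier is the "end of my tree" at each step.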

Related

Genetic Algorithm: Grouping students without knowing exact number of groups

I have students with defined levels. Some of the students are in groups from the previous week, some of them are new. Students from the previous week should be kept in their groups.
A group has a level, which is calculated as the average of the contained students' levels. A new student can be added to a group if the difference between the student's level and the group's level is less than a defined limit (for example 3). There are also minimum and maximum group sizes. If there is not enough space in a group, we should create a new one.
I have tried to solve this with clustering algorithms (hierarchical and non-hierarchical), but none of them works for my case.
I need to create the minimum number of groups.
I would like to know whether a genetic algorithm will work. The genes of a chromosome would represent a single student and their assignment to a class. The fitness function would use all constraints (max group size, min group size).
As I understand it, to apply a genetic algorithm I need to know the number of groups, which is not clear in my case. Any ideas?
Yes, a genetic algorithm can work. I'm not sure where you got the idea that you have to know the number of groups. All a genetic algorithm needs is a generator for making children, a fitness function to judge which children are the best, and a few quantity parameters (how many to keep as parents for the next generation, how many children to produce, ... things that are in the generator).
I suggest that your individuals ("chromosomes") be a list of the groups for the new generation. To save time, your generator should yield only viable children: those that fulfill the group-size requirements. Any child that does not satisfy those should be skipped and replaced.
The main work in this scenario is setting up a generator that knows how to split groups: when you find that a new student requires a new group, then you have to draw min_group_size-1 students from other groups. If you have the entire population of new students at once, then you can make global decisions, instead.
Is this enough to move you in a useful direction?
Update per user comment:
You cannot guarantee finding the optimal answer with a genetic algorithm.
The number of chromosomes depends on what works best for you. You need to handle a variety of possible group assignments, as well as new groups. Here is where you have to experiment; welcome to machine learning.
I would start investigating with a "comfortable" number of chromosomes, perhaps the quantity of groups times sqrt(quantity of new students). Depending on time constraints, I'd think that somewhere from 20 to 200 chromosomes would be good for you. Your critical measures of success are how often it finds a great solution, and how much time you spend finding it.
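To illustrate the representation only (a sketch; Student, Group, the size limits and the fitness are made-up names and values, not a prescription):

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch of one individual ("chromosome") as a list of groups,
// as suggested above. All names and constants are assumptions for the example.
struct Student { double level; bool fixed; };   // fixed = kept from last week

using Group      = std::vector<int>;            // indices into the student list
using Chromosome = std::vector<Group>;          // one complete group assignment

const std::size_t MIN_GROUP = 3, MAX_GROUP = 6;
const double LEVEL_LIMIT = 3.0;

double group_level(const Group& g, const std::vector<Student>& students)
{
    double sum = 0.0;
    for (int i : g) sum += students[i].level;
    return g.empty() ? 0.0 : sum / g.size();
}

// A viable child respects the size limits and the level-difference limit;
// the generator should only yield chromosomes that pass this check.
bool viable(const Chromosome& c, const std::vector<Student>& students)
{
    for (const Group& g : c) {
        if (g.size() < MIN_GROUP || g.size() > MAX_GROUP) return false;
        double level = group_level(g, students);
        for (int i : g)
            if (std::fabs(students[i].level - level) > LEVEL_LIMIT) return false;
    }
    return true;
}

// Fitness: fewer groups is better, so penalize the group count.
double fitness(const Chromosome& c)
{
    return -static_cast<double>(c.size());
}
```

The generator would then mutate and recombine such chromosomes (moving new students between groups, splitting groups) and discard anything for which viable() returns false.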
Yes, forming student groups can be done with the help of optimization. The Genetic Algorithm (GA) is not the only optimization algorithm that has been applied to this specific problem; Particle Swarm Optimization (PSO) has been used as well. In a recent study, a PSO was implemented to classify students into an unknown number of groups. PSO showed improved capabilities compared to GA. I think that this specific research is all you need.
The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction
You can find the paper here: https://doi.org/10.1002/cae.22191
Perhaps the researchers could guide you through ResearchGate:
https://www.researchgate.net/publication/338078753
As I can see, the researchers used the characteristics of each cluster as a solution vector (chromosome), combined with a threshold number determining the number of groups (very interesting - I think this is exactly what you need).
I hope I have helped you.

How to implement TSP with dynamic programming in C++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was pointed to use Dijkstra with a priority queue to make a connectivity matrix for the 15 cities I need to visit and then do TSP on that matrix with DP. I had previously only used Dijkstra with O(n^2). After trying to figure out how to implement Dijkstra with a priority queue, I finally did it (enough to optimize from 240 seconds to 0.6 for 40,000 cities). But now I am stuck at the TSP part.
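For reference, the priority-queue variant of Dijkstra looks roughly like this (a simplified sketch of the general technique, not my actual code):

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, long long>;          // (neighbour, edge weight)

std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& adj, int src)
{
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    // min-heap ordered by (distance, node)
    std::priority_queue<std::pair<long long, int>,
                        std::vector<std::pair<long long, int>>,
                        std::greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d != dist[u]) continue;              // stale entry, already improved
        for (auto [v, w] : adj[u])
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```

Running this once from each of the 15 relevant cities gives the 15x15 distance matrix for the TSP step.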
Here are the materials I used for learning TSP:
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), but I am having trouble implementing it. Before this I have done dynamic programming with arrays that would be dp[int] or dp[int][int]. But now that my dp matrix has to be dp[subset][int], I don't have any idea how I should do this.
My questions are :
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it). Keep in mind that I must get the exact value, not approximate.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once? If so, what are those changes? Otherwise, what should I try?
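For completeness, here is a sketch of that bitmask DP (Held-Karp), assuming dist is the k x k shortest-path matrix built with Dijkstra (k <= 15) and that the tour starts and ends at city 0:

```cpp
#include <algorithm>
#include <vector>

// Sketch of the Held-Karp bitmask DP mentioned in the edit. `dist` is assumed
// to be the k x k matrix of shortest paths between the relevant cities
// (k <= 15), built beforehand with Dijkstra.
long long tsp(const std::vector<std::vector<long long>>& dist)
{
    const int n = static_cast<int>(dist.size());
    const long long INF = 1e18;
    // dp[mask][i] = cheapest cost of starting at city 0, visiting exactly
    // the cities in `mask`, and ending at city i (i must be in mask).
    std::vector<std::vector<long long>> dp(1 << n, std::vector<long long>(n, INF));
    dp[1][0] = 0;                                    // only city 0 visited

    for (int mask = 1; mask < (1 << n); ++mask)
        for (int i = 0; i < n; ++i) {
            if (!(mask & (1 << i)) || dp[mask][i] == INF) continue;
            for (int j = 0; j < n; ++j) {
                if (mask & (1 << j)) continue;       // j already visited
                int next = mask | (1 << j);
                dp[next][j] = std::min(dp[next][j], dp[mask][i] + dist[i][j]);
            }
        }

    long long best = INF;                            // close the tour back to 0
    for (int i = 1; i < n; ++i)
        best = std::min(best, dp[(1 << n) - 1][i] + dist[i][0]);
    return best;
}
```

The runtime is O(2^n * n^2), which is perfectly fine for n = 15.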
I think you can use the dynamic programming solution and add, for each pair of nodes, a second edge with the shortest path between them. See also this question: Variation of TSP which visits multiple cities.
Here is a TSP implementation; you will find the link to the implemented problem in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna's answer was good.
Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't be relevant to your algorithm in the second step.
As for whether to use a different algorithm: I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can compute an upper and a lower bound for the remaining cities, and use that to pick promising solutions and to discard hopeless ones (aka pruning). There has also been a bunch of research done on this topic; just search the web. For example, in cases where the map is actually Cartesian (i.e. the travelling costs are the distance between two points on a plane), you can exploit this information to improve the algorithms a bit.
Lastly, if you really intend to increase the number of visited cities, you will find that the computation time increases vastly, so you will have to abandon your requirement for an exact solution.

dependency sort with detection of cyclic dependencies

Before you start throwing links to wikipedia and blogs in my face, please hear me out.
I'm trying to find the optimal algorithm/function to do a dependency sort on... stuff. Each item has a list of its dependencies.
I would like to have something iterator-based, but that's not very important.
What is important is that the algorithm points out exactly which items are part of the dependency cycle. I'd like to give detailed error information in this case.
Practically, I'm thinking of subclassing my items from a "dependency node" class, which has the necessary booleans/functions to get the job done. Cool (but descriptive) names are welcome :)
It's normally called a topological sort. Most books/papers/whatever that cover topological sorting will also cover cycle detection as a matter of course.
I don't quite get why it is so hard to find the dependency cycle, if there is one. You just have to check whether there is any node you have already passed over while applying a BFS to find all the dependencies. If there is one, you roll back the way you came down, revisiting nodes all the way up, and mark all the nodes until you reach the first visit of that node. All the nodes on that path are marked as a cycle. (Just leave a comment and I'll give you code to do that if you need it.)
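For reference, a sketch of one common way to do it (a DFS with three-state marking rather than BFS; the adjacency-list representation and the names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Sketch of a DFS-based topological sort that also reports the nodes on a
// dependency cycle, as requested above. The graph representation is assumed
// to be an adjacency list: adj[i] lists the dependencies of item i.
enum class Mark { Unvisited, InProgress, Done };

static bool visit(int u, const std::vector<std::vector<int>>& adj,
                  std::vector<Mark>& mark, std::vector<int>& order,
                  std::vector<int>& stack, std::vector<int>& cycle)
{
    mark[u] = Mark::InProgress;
    stack.push_back(u);
    for (int v : adj[u]) {
        if (mark[v] == Mark::Done) continue;
        if (mark[v] == Mark::InProgress) {
            // Found a back edge: everything on the stack from v onward
            // is part of the cycle -- exactly the detailed error info wanted.
            auto it = std::find(stack.begin(), stack.end(), v);
            cycle.assign(it, stack.end());
            return false;
        }
        if (!visit(v, adj, mark, order, stack, cycle)) return false;
    }
    stack.pop_back();
    mark[u] = Mark::Done;
    order.push_back(u);            // dependencies end up before their dependents
    return true;
}

// Returns true and fills `order` on success; returns false and fills `cycle`
// with the offending items if a cyclic dependency exists.
bool topo_sort(const std::vector<std::vector<int>>& adj,
               std::vector<int>& order, std::vector<int>& cycle)
{
    std::vector<Mark> mark(adj.size(), Mark::Unvisited);
    std::vector<int> stack;
    for (std::size_t u = 0; u < adj.size(); ++u)
        if (mark[u] == Mark::Unvisited &&
            !visit(static_cast<int>(u), adj, mark, order, stack, cycle))
            return false;
    return true;
}
```

On failure, cycle contains exactly the items that form the cyclic dependency, which is the detailed error information asked for.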

Graph - strongly connected components

Is there any fast way to determine the size of the largest strongly connected component in a graph?
I mean, like, the obvious approach would mean determining every SCC (could be done using two DFS calls, I suppose) and then looping through them and taking the maximum.
I'm pretty sure there has to be some better approach if I only need to have the size of that component and only the largest one, but I can't think of a good solution. Any ideas?
Thanks.
Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?
Firstly, you could use Tarjan's algorithm, which needs only one DFS pass instead of two. If you understand the algorithm well, you'll see that the SCCs form a DAG and that this algorithm finds them in reverse topological sort order. So if you have a sense of the graph (like a visual representation) and you know that the relatively big SCCs occur at the end of the DAG, you could stop the algorithm once the first few SCCs are found.
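For reference, a sketch of Tarjan's algorithm that keeps only the size of the largest SCC (adjacency-list representation assumed; for very large graphs the recursion would need to be made iterative):

```cpp
#include <algorithm>
#include <vector>

// Sketch of Tarjan's algorithm tracking only the size of the largest SCC.
// adj is an adjacency list; all names here are illustrative.
struct TarjanSCC {
    const std::vector<std::vector<int>>& adj;
    std::vector<int> index, low;
    std::vector<bool> on_stack;
    std::vector<int> stack;
    int counter = 0, largest = 0;

    explicit TarjanSCC(const std::vector<std::vector<int>>& g)
        : adj(g), index(g.size(), -1), low(g.size(), 0), on_stack(g.size(), false) {}

    void dfs(int u) {
        index[u] = low[u] = counter++;
        stack.push_back(u);
        on_stack[u] = true;
        for (int v : adj[u]) {
            if (index[v] == -1) { dfs(v); low[u] = std::min(low[u], low[v]); }
            else if (on_stack[v]) low[u] = std::min(low[u], index[v]);
        }
        if (low[u] == index[u]) {            // u is the root of an SCC
            int size = 0, w;
            do {
                w = stack.back(); stack.pop_back();
                on_stack[w] = false;
                ++size;
            } while (w != u);
            largest = std::max(largest, size);
        }
    }

    int largest_scc() {
        for (std::size_t u = 0; u < adj.size(); ++u)
            if (index[u] == -1) dfs(static_cast<int>(u));
        return largest;
    }
};
```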

Mahjong-solitaire solver algorithm, which needs a speed-up

I'm developing a Mahjong-solitaire solver and so far I'm doing pretty well. However, it is not as fast as I would like it to be, so I'm asking for any additional optimization techniques you guys might know of.
All the tiles are known from the layouts, but the solution isn't. At the moment, I have a few rules which guarantee safe removal of certain pairs of identical tiles (which cannot be an obstacle to a possible solution).
For clarity: a tile is free when it can be picked at any time, and a tile is loose when it doesn't block any other tiles at all.
If there are four free tiles available, remove them immediately.
If there are three tiles that can be picked up and at least one of them is a loose tile, remove the non-loose ones.
If there are three tiles that can be picked up and only one free tile (two loose ones), remove the free one and one random loose one.
If there are three loose tiles available, remove two of them (it doesn't matter which ones).
Since there are exactly four copies of each tile, if only two of them are left, remove them, since they're the only ones left.
My algorithm searches for a solution recursively in multiple threads. Once a branch is finished (it reaches a position where there are no more moves) and it didn't lead to a solution, it puts the position into a vector containing the bad ones. Now, every time a new branch is launched, it iterates over the bad positions to check whether that particular position has already been examined.
This process continues until a solution is found or all possible positions have been checked.
This works nicely on a layout which contains, say, 36 or 72 tiles. But when there are more, this algorithm becomes pretty much useless due to the huge number of positions to search through.
So I ask you once more: if any of you have good ideas on how to implement more rules for safe tile removal, or any other particular speed-ups for the algorithm, please share them.
Very best regards,
nhaa123
I don't completely understand how your solver works. When you have a choice of moves, how do you decide which possibility to explore first?
If you pick an arbitrary one, it's not good enough - it's basically just a brute-force search. You might need to explore the "better" branches first. To determine which branches are "better", you need a heuristic function that evaluates a position. Then you can use one of the popular heuristic search algorithms. Check these:
A* search
beam search
(Google is your friend)
Some years ago, I wrote a program that solves Solitaire Mahjongg boards by peeking. I used it to examine one million turtles (took a day or something on half a computer: it had two cores) and it appeared that about 2.96 percent of them cannot be solved.
http://www.math.ru.nl/~debondt/mjsolver.html
OK, that was not what you asked, but you might have a look at the code to find some pruning heuristics in it that haven't crossed your mind thus far. The program does not use more than a few megabytes of memory.
Instead of a vector containing the "bad" positions, use a hash set (e.g. std::unordered_set), which has constant average lookup time instead of a linear one.
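For example (a sketch; how you encode a position into a key is up to you, a plain string is used here only for illustration):

```cpp
#include <string>
#include <unordered_set>

// Sketch of the suggestion above: keep already-refuted positions in a hash
// set so the lookup is O(1) on average instead of a linear scan.
// A string of remaining tile indices is used as the key purely as an example.
std::unordered_set<std::string> bad_positions;

bool already_refuted(const std::string& position_key)
{
    return bad_positions.find(position_key) != bad_positions.end();
}

void mark_refuted(const std::string& position_key)
{
    bad_positions.insert(position_key);
}
```

A Zobrist-style hash of the board would give a more compact key if the strings get large.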
If four tiles of a kind are visible but cannot be picked up, the tiles around them have to be removed first. The path should use your rules to remove a minimum number of tiles on the way towards those tiles, to open them up.
If tiles are hidden by other tiles, the problem does not have full information to find a path, and a probability for the remaining tiles needs to be calculated.
Very nice problem!