I am solving this question on LeetCode.com. A statement in the question says:
Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]
My aim is to come up with a graphical representation. My question is, as per the above statement, should I create a graph from:
a. 0 -> 1; or
b. 1 -> 0?
The reason I am confused is, if I come up with the former, I would actually do the opposite of what is required - I would visit 0 before I do the prerequisite 1. On the other hand, if I do it the latter way, what if there's a scenario wherein to take course 0, I have to take multiple prerequisite courses, say, 1 and 2? Using the latter representation, I would end up completing course 0 from 1 (thanks to the edge), without first doing course 2.
How should I create the directional edge?
It doesn't matter. If you reverse all your edges and topological sort, the result that you'll get will be the reverse of some topological ordering of the original graph. Do it in whichever way makes the most sense to you.
Related
First of all, I should say I am not familiar with the Graph theory and also my mathematics knowledge is very poor. Anyhow I am using graph concepts for my analysis.
Basically, I am decomposing an undirected graph (say G) into cycles (closed graph). The specialty of my cycle is that they are the shortest cycles that one can traverse between two vertices (as they are cycle, starting and ending are same though). According to my example graph, my cycles are (1,4,5,1)(1,2,3,4,1)(7,9,8,7) (I neglect the cycles whose length is less than 3).
Edit: I use depth first search to get the cycles and then got the smallest cycles.
Later, I am further braking those cycles into directed paths. In here, I broke the cycles through the edges (through red lines in figure), so that I inserted starting and ending nodes for my new path graphs. So for the cycle (7,9,8,7)=> new directed paths are (a,9,c)(d,8,7,b)
Edit: the further breaking is done only for selected cycles. It is just inserting a new vector and updating the elements. Any graph theory related algorithms doesn't involve here.
Then I do some analysis with my data.
I did all above things. So, my problem is how to describe the entire
things with mathematical notations (without example like I said). This is very hard for me as I do not have even basics.
I was trying and googling but still cannot find a way to describe what I did. I guess, the thing what I did is clear for you.
So, Could you please help me, How to describe
decomposing a undirected graph into cycles (shortest cycles)
Cycle breaking via edges and make directed path graphs (as shown in figure)
with mathematical notation (according to graph theory)
I have seen many authors use different notations and symbols to define graphs and their sub graphs, but for me, I can not define such things as my basic are too poor. So, Please help me to say these things in a formal, mathematical way. Thanks in advance.
I have inserted sample figures to get idea also.
Note: I have add c++ tag as many computer scientists use graph theories and would like to have a response.
The first problem you might encounter in an attempt to put your operations in a mathematical description is your definition of the "shortest cycles" as cycles are typically defined as a sequence of vertices connected by edges in which the first one is also the last one.
Math crashcourse
In math a graph is typcally described by two sets V (like vertices) and E (like edges)
The set E consisting of sets with two elements each of them being a vertex.
Such as
V = { v1, v2, ...., vn }
E = { ..., {vi, vk}, ... }
Every set in E correspends to one edge in your graph.
As such a (connected) path is typcially defined as:
A sequence of vertices v1, ...., vn with the property that for every two consecutive vertexes in the sequence vi and vi+1 the set { vi, vi+1 } is an element of the set E.
(practically speaking: there is an edge from vertex vi to vertex vi+i)
A cycle is typically defined as a path with the property: v1 = vn (thus the first vertex is also the last one)
Whith this definition an your example already the sequence: 1, 4, 1 forms a cycle (in the mathematical sense)
As such every edge in your graph would count as a "shortest" cycle, while the examples given are definately longer!
You told that you
... neglect the cycles whose length is less than 3
this doesn't look to bad as a starting point for your description. Unfortunately I didn't completely understand the next steps you want to perform.
Advice
My advice, or the least the way I would approach the problem is to convert the rather long description to some kind of shorter algorithmic description while refining on exactly how you try to perform the task. This way getting to your final description shouldn't be too hard to accomplish. Especially don't forget to tell what exactly the input to your algorithm is. Even that doesn't seem to be too clear from your description.
are you starting with a already known set of "shortest" cycles?
or are you just given a graph as input and have to determine the "shortest" cycles yourself?
if you detect them yourself how exactly is this done?
Especially don't forget to tell about this part of the story if it applies as it seems to be one of the most crucial ones to your problem.
I want to generate all the Hamiltonian Cycles of a complete undirected graph (permutations of a set where loops and reverses count as duplicates, and are left out).
For example, permutations of {1,2,3} are
Standard Permutations:
1,2,3
1,3,2
2,1,3
2,3,1
3,1,2
3,2,1
What I want the program/algorithm to print for me:
1,2,3
Since 321 is just 123 backward, 312 is just 123 rotated one place, etc.
I see a lot of discussion on the number of these cycles a given set has, and algorithms to find if a graph has a Hamiltonian cycle or not, but nothing on how to enumerate them in a complete, undirected graph (i.e. a set of numbers that can be preceded or succeeded by any other number in the set).
I would really like an algorithm or C++ code to accomplish this task, or if you could direct me to where there is material on the topic. Thanks!
You can place some restrictions on the output to eliminate the unwanted permutations. Lets say we want to permute the numbers 1, ..., N. To avoid some special cases assume that N > 2.
To eliminate simple rotations we can require that the first place is 1. This is true, because an arbitrary permutation can always be rotated into this form.
To eliminate reverses we can require that the number at the second place must be smaller than the number at the last place. This is true, because from the two permutations starting with 1 that are reverses of each other, exactly one has this property.
So a very simple algorithm could enumerate all permutations and leave out the invalid ones. Of course there are optimisations possible. For example permutations that do not start with 1 can easily be avoided during the generation step.
An uber-lazy way to check if a path is the same one that starts in a different point in the cycle (IE, the same loop or the reverse of the same loop is this:
1: Decide that by convention all cycles will start from the lowest vertex number and continue in the direction of the lower of the two adjacent ordinals.
Hence, all of the above paths would be described in the same way.
The second, other useful bit of information here:
If you would like to check that two paths are the same you can concatenate one with its self and check if it contains either the second path or the reverse of the second path.
That is,
1 2 3 1 2 3
contains all of the above paths or their reverses. Since the process of finding all hamiltonian cycles seems much, much slower than the slight inefficiency this algorithm has, I felt I could throw it in :)
Question: Circulation problems allow you to have both a lower and an upper bound on the flow through a particular arc. The upper bound I understand (like pipes, there's only so much stuff that can go through). However, I'm having a difficult time understanding the lower bound idea. What does it mean? Will an algorithm for solving the problem...
try to make sure every arc with a lower bound will get at least that much flow, failing completely if it can't find a way?
simply disregard the arc if the lower bound can't be met? This would make more sense to me, but would mean there could be arcs with a flow of 0 in the resulting graph, i.e.
Context: I'm trying to find a way to quickly schedule a set of events, which each have a length and a set of possible times they can be scheduled at. I'm trying to reduce this problem to a circulation problem, for which efficient algorithms exist.
I put every event in a directed graph as a node, and supply it with the amount of time slots it should fill. Then I add all the possible times as nodes as well, and finally all the time slots, like this (all arcs point to the right):
The first two events have a single possible time and a length of 1, and the last event has a length of 4 and two possible times.
Does this graph make sense? More specifically, will the amount of time slots that get 'filled' be 2 (only the 'easy' ones) or six, like in the picture?
(I'm using a push-relabel algorithm from the LEMON library if that makes any difference.)
Regarding the general circulation problem:
I agree with #Helen; even though it may not be as intuitive to conceive of a practical use of a lower bound, it is a constraint that must be met. I don't believe you would be able to disregard this constraint, even when that flow is zero.
The flow = 0 case appeals to the more intuitive max flow problem (as pointed out by #KillianDS). In that case, if the flow between a pair of nodes is zero, then they cannot affect the "conservation of flow sum":
When no lower bound is given then (assuming flows are non-negative) a zero flow cannot influence the result, because
It cannot introduce a violation to the constraints
It cannot influence the sum (because it adds a zero term).
A practical example of a minimum flow could exist because of some external constraint (an associated problem requires at least X water go through a certain pipe, as pointed out by #Helen). Lower bound constraints could also arise from an equivalent dual problem, which minimizes the flow such that certain edges have lower bound (and finds an optimum equivalent to a maximization problem with an upper bound).
For your specific problem:
It seems like you're trying to get as many events done in a fixed set of time slots (where no two events can overlap in a time slot).
Consider the sets of time slots that could be assigned to a given event:
E1 -- { 9:10 }
E2 -- { 9:00 }
E3 -- { 9:20, 9:30, 9:40, 9:50 }
E3 -- { 9:00, 9:10, 9:20, 9:30 }
So you want to maximize the number of task assignments (i.e. events incident to edges that are turned "on") s.t. the resulting sets are pairwise disjoint (i.e. none of the assigned time slots overlap).
I believe this is NP-Hard because if you could solve this, you could use it to solve the maximal set packing problem (i.e. maximal set packing reduces to this). Your problem can be solved with integer linear programming, but in practice these problems can also be solved very well with greedy methods / branch and bound.
For instance, in your example problem. event E1 "conflicts" with E3 and E2 conflicts with E3. If E1 is assigned (there is only one option), then there is only one remaining possible assignment of E3 (the later assignment). If this assignment is taken for E3, then there is only one remaining assignment for E2. Furthermore, disjoint subgraphs (sets of events that cannot possibly conflict over resources) can be solved separately.
If it were me, I would start with a very simple greedy solution (assign tasks with fewer possible "slots" first), and then use that as the seed for a branch and bound solver (if the greedy solution found 4 task assignments, then bound if you recursive subtree of assignments cannot exceed 3). You could even squeeze out some extra performance by creating the graph of pairwise intersections between the sets and only informing the adjacent sets when an assignment is made. You can also update your best number of assignments as you continue the branch and bound (I think this is normal), so if you get lucky early, you converge quickly.
I've used this same idea to find the smallest set of proteins that would explain a set of identified peptides (protein pieces), and found it to be more than enough for practical problems. It's a very similar problem.
If you need bleeding edge performance:
When rephrased, integer linear programming can do nearly any variant of this problem that you'd like. Of course, in very bad cases it may be slow (in practice, it's probably going to work for you, especially if your graph is not very densely connected). If it doesn't, regular linear programming relaxations approximate the solution to the ILP and are generally quite good for this sort of problem.
Hope this helps.
The lower bound on the flow of an arc is a hard constraint. If the constraints can't be met, then the algorithm fails. In your case, they definitely can't be met.
Your problem can not be modeled with a pure network-flow model even with lower bounds. You are trying to get constraint that a flow is either 0 or at least some lower bound. That requires integer variables. However, the LEMON package does have an interface where you can add integer constraints. The flow out of each of the first layer of arcs must be either 0 or n where n is the number of required time-slots or you could say that at most one arc out of each "event" has nonzero flow.
Your "disjunction" constraint,
can be modeled as
f >= y * lower
f <= y * upper
with y restricted to being 0 or 1. If y is 0, then f can only be 0. If y is 1, the f can be any value between lower and upper. The mixed-integer programming algorithms will orders of magnitude slower than the network-flow algorithms, but they will model your problem.
The problem:
N nodes are related to each other by a 'closeness' factor ranging from 0 to 1, where a factor of 1 means that the two nodes have nothing in common and 0 means the two nodes are exactly alike.
If two nodes are both close to another node (i.e. they have a factor close to 0) then this doesn't mean that they will be close together, although probabilistically they do have a much higher chance of being close together.
-
The question:
If another node is placed in the set, find the node that it is closest to in the shortest possible amount of time.
This isn't a homework question, this is a real world problem that I need to solve - but I've never taken any algorithm courses etc so I don't have a clue what sort of algorithm I should be researching.
I can index all of the nodes before another one is added and gather closeness data between each node, but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution. Any ideas or help would be much appreciated :)
Because your 'closeness' metric obeys the triangle inequality, you should be able to use a variant of BK-Trees to organize your elements. Adapting them to real numbers should simply be a matter of choosing an interval to quantize your number on, and otherwise using the standard Bk-Tree procedure. Some experimentation may be required - you might want to increase the resolution of the quantization as you progress down the tree, for instance.
but short of comparing all nodes to
the new node I haven't been able to
come up with an efficient solution
Without any other information about the relationships between nodes, this is the only way you can do it since you have to figure out the closeness factor between the new node and each existing node. A O(n) algorithm can be a perfectly decent solution.
One addition you might consider - keep in mind we have no idea what data structure you are using for your objects - is to organize all present nodes into a graph, where nodes with factors below a certain threshold can be considered connected, so you can first check nodes that are more likely to be similar/related.
If you want the optimal algorithm in terms of speed, but O(n^2) space, then for each node create a sorted list of other nodes (ordered by closeness).
When you get a new node, you have to add it to the indexed list of all the other nodes, and all the other nodes need to be added to its list.
To find the closest node, just find the first node on any node's list.
Since you already need O(n^2) space (in order to store all the closeness information you need basically an NxN matrix where A[i,j] represents the closeness between i and j) you might as well sort it and get O(1) retrieval.
If this closeness forms a linear spectrum (such that closeness to something implies closeness to other things that are close to it, and not being close implies not being close to those close), then you can simply do a binary or interpolation sort on insertion for closeness, handling one extra complexity: at each point you have to see if closeness increases or decreases below or above.
For example, if we consider letters - A is close to B but far from Z - then the pre-existing elements can be kept sorted, say: A, B, E, G, K, M, Q, Z. To insert say 'F', you start by comparing with the middle element, [3] G, and the one following that: [4] K. You establish that F is closer to G than K, so the best match is either at G or to the left, and we move halfway into the unexplored region to the left... 3/2=[1] B, followed by E, and we find E's closer to F, so the match is either at E or to its right. Halving the space between our earlier checks at [3] and [1], we test at [2] and find it equally-distant, so insert it in between.
EDIT: it may work better in probabilistic situations, and require less comparisons, to start at the ends of the spectrum and work your way in (e.g. compare F to A and Z, decide it's closer to A, see if A's closer or the halfway point [3] G). Also, it might be good to finish with a comparison to the closest few points either side of where the binary/interpolation led you.
ACM Surveys September 2001 carried two papers that might be relevant, at least for background. "Searching in Metric Spaces", lead author Chavez, and "Searching in High Dimensional Spaces - Index Structures for Improving the Performance of Multimedia Databases", lead author Bohm. From memory, if all you have is the triangle inequality, you can use it to some effect, but if you can trim your data down to a sensible number of dimensions, you can do better by using a search structure that knows about this dimensional structure.
Facebook has this thing where it puts you and all of your friends in a graph, then slowly moves everyone around until people are grouped together based on mutual friends and so on.
It looked to me like they just made anything <0.5 an attractive force, anything >0.5 a repulsive force, and moved people with every iteration based on the net force. After a couple hundred iterations, it was looking pretty darn good.
Note: this is not an algorithm it is a heuristic. In the facebook implementation I saw, two people were not able to reach equilibrium and kept dancing around each other. It turns out they were actually the same person with two different accounts.
Also, it took about 15 minutes on a decent computer and ~100 nodes. YMMV.
It looks suspiciously like a Nearest Neighbor Search problem (also called a similarity search)
I am given N numbers and for them apply M rules about their order. The rules are represented in a pairs of indexes and every pair (A, B) is telling that the number with index A (A-th number) must be AFTER the B-th number - it doesn't have to be next to him.
Ex: N = 4
1 2 3 4
M = 2
3 2
3 1
Output: 1234, 4213, 4123, 2134, 2143, 2413, 1423 ...Maybe there are even more:)
The algorithm should give me all the permutations available that don't break the rules, like in the example - 3 must always be after 2 and after 1.
I tried bruteforcing but it didn't work (although bruteforce should work in here, N is in the range (1,8). )
Any ideas ?
Just as a hint.
You can treat your set of rules as a graph. Each index is a vertex, each rule is a directed edge.
Any proper ordering of the numbers (i.e. a permutation that satisfies the rules) corresponds to so called topological ordering of the above graph. In order to generate all valid orderings of your numbers you need to generate all possible topological orderings of that graph.
P.S. The first algorithm for topological ordering given at the linked Wikipedia page already allows for a fairly straightforward solution that would enumerate all valid permutations. It will take some effort and some care to implement, but it is not rocket science.
Brute forcing would be going through every permutation, which is O(N!), and for each permutation simply looping through every rule to confirm that they aplpy, which is O(M). This ends up O(N!M) which is kind of ridiculous, but it shouldn't "not work" for such a small set.
Honestly, your best bet is to go back and get the brute force solution working. Once that is done (and if you still have time, etc) you can look for a better algorithm.
EDIT to the down voter. The student is (should be) trying to get his homework done on time. By the sounds of it, his homework is a programming exercise where a brute-force solution would be adequate. Helping him to figure out an efficient algorithm is not addressing his REAL problem.
In this case he has tried the simple brute-force approach (which everyone agrees ought to work for small N values) and given up on it prematurely to try something that is probably more difficult. Any experienced developer will tell you that this is a bad idea. The student needs and deserves to be told so, and if he is sensible he will pay attention. But obviously, it his choice ...