DAG and Top Sort - directed-acyclic-graphs

"Arranging the vertices of a DAG according to increasing pre-number results in a topological sort." is not a true statement apparently, but I'm not seeing why it isn't. If the graph is directed and doesn't have cycles, then shouldn't the order in which we visit the vertices necessarily be the correct order in which we sort it topologically?

Arranging by increasing pre-number does not guarantee a valid topological sort. Consider this graph:
    A
    ↓
B → C → D
The two valid topological orders of this graph are:
A, B, C, D
B, A, C, D
If you were to visit the nodes beginning with C, one possible pre-number order would be:
C, D, A, B
That is not a valid topological order. An even simpler example is this graph:
B → A
There is clearly one valid topological order, but if we were to visit A first and sort by pre-number, the resulting order would be backwards.
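To see the contrast concretely, here is a small Python sketch (purely illustrative) that computes both orders for the B → A example: increasing pre-number gives the backwards order, while decreasing post-number (finish order) gives a valid topological sort.

def dfs_orders(graph):
    # Record pre-order (discovery) and post-order (finish) sequences,
    # restarting from every vertex that has not been visited yet.
    visited, pre, post = set(), [], []
    def explore(v):
        visited.add(v)
        pre.append(v)
        for w in graph[v]:
            if w not in visited:
                explore(w)
        post.append(v)
    for v in graph:
        if v not in visited:
            explore(v)
    return pre, post

# Two-vertex graph B -> A, with A visited first:
pre, post = dfs_orders({"A": [], "B": ["A"]})
print(pre)                    # ['A', 'B']  -- increasing pre-number: backwards
print(list(reversed(post)))   # ['B', 'A']  -- decreasing post-number: valid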

Related

Is it possible to reconstruct a DAG from all of its possible topological sorts?

Is it possible to reconstruct the original DAG when given all of its possible topological sorts? And what if only some number n of topological sorts is given (fewer than the total number possible): can a DAG be constructed that satisfies those n topological orderings?
Given the vertices of your DAG, construct a new relation a->b defined by a->b if and only if a appears before b in all the given topological sorts.
This new graph is acyclic, transitive, and compatible with the given topological sorts.
Even given all topological orderings, it's not possible to uniquely determine the DAG, since the following two three-vertex DAGs have the same topological orderings.
a -> b -> c

a -> b -> c   (this second DAG also has the edge a -> c)
\_________^
However, the above procedure produces the transitive closure of the original DAG if you're given all possible topological orderings. If there's a path from a to b in the original DAG, then a necessarily appears before b in every topological ordering. Conversely, if a appears before b in every topological ordering, then there must be a path from a to b in the original DAG: for if not, add the edge b -> a to the original DAG (this can't create a cycle, since there's no path from a to b) and topologically sort the result. This gives a valid topological sort of the original DAG in which b appears before a, a contradiction.
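A short Python sketch of that construction (the function name and encoding are just illustrative):

from itertools import combinations

def common_order_edges(orders):
    # a -> b holds iff a appears before b in every given topological order.
    # When every topological order of the DAG is supplied, the result is
    # the transitive closure of that DAG.
    positions = [{v: i for i, v in enumerate(order)} for order in orders]
    edges = set()
    for a, b in combinations(orders[0], 2):
        if all(pos[a] < pos[b] for pos in positions):
            edges.add((a, b))
        elif all(pos[b] < pos[a] for pos in positions):
            edges.add((b, a))
    return edges

# The chain a -> b -> c has exactly one topological order, so the result
# is its transitive closure: (a, b), (a, c), (b, c).
print(common_order_edges([["a", "b", "c"]]))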

Correct DFS traversal of graph

[Question](http://imgur.com/KHBuDcf)
[Attempted Answer](http://imgur.com/aO0lblA)
Since it's a DFS traversal of a graph, we use a stack. I visited A, as given in the question, then went to B (it's a directed graph), then to C. C doesn't lead anywhere else, so I had to back up the stack to B, and from there I went to D. D leads either to C (already visited) or back, so I backtracked to B again; B was exhausted, so I went back to A. A then leads me to F, and that edge is reversible. G and H don't even have a link to the rest, so is it correct to ignore them, or should I visit them as well? What should the correct DFS traversal answer be?
The concept behind DFS/BFS is that you only visit the nodes connected to the start while traversing, not the disconnected ones, so the set of visited nodes is indeed correct and so is the order in which you visited them. But you should really try to make the stack representation a little clearer, since it's tough to make out from the image how you followed the stack in your attempt.
The best way to represent a DFS traversal is by building the corresponding spanning tree.
The graph of the figure in your question can be represented using adjacency lists that way:
A: B, C, F
B: C, D
C:
D: A, C
E: C, G
F: A, C
G: E
A DFS starting from A will only visit A, B, C, D and F and yield the following spanning tree:
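Assuming successors are explored in the order they are listed above, that spanning tree is (sketched here in text form):

A
├── B
│   ├── C
│   └── D
└── F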
You can add to this tree the unused edges (those ignored because they lead to already visited vertices) and even give their classification (forward edges, backward edges and cross edges). But the spanning tree is probably enough.
As a more general consideration: a DFS is a recursive procedure (which can be simulated using a stack) that can be described this way:
def DFS(g, cur, mark):
    # Mark the current vertex as visited.
    mark[cur] = True
    # Recurse on every successor of cur in g that has not been visited yet.
    for s in g[cur]:
        if not mark[s]:
            DFS(g, s, mark)
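For instance, running it on the adjacency lists above (a quick illustrative check, using a dict-of-lists encoding):

g = {
    "A": ["B", "C", "F"],
    "B": ["C", "D"],
    "C": [],
    "D": ["A", "C"],
    "E": ["C", "G"],
    "F": ["A", "C"],
    "G": ["E"],
}
mark = {v: False for v in g}
DFS(g, "A", mark)
print([v for v in g if mark[v]])   # ['A', 'B', 'C', 'D', 'F']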
But you should already know ...
EDIT: here is a simple implementation in Python that produces the dot used to build the figure:
https://gist.github.com/slashvar/d4954d04352fc38356f774198e3aa86b

Efficient trick for maximal bipartite matching in bigger graph

Problem: We are given two arrays A & B of integers. In each step we are allowed to remove a pair of non-coprime integers, one from each array. We have to find the maximal number of pairs that can be removed by these steps.
Bounds:
length of A, B <= 10^5
every integer <= 10^9
Dinic's algorithm - O(V^2 E)
Edmonds-Karp algorithm - O(V E^2)
Hopcroft–Karp algorithm - O(E sqrt(V))
My approach up till now: This can be modeled as bipartite matching problem with two sets A and B and edges can be created between every non co-prime pair of integers from the corresponding set.
But the problem is that there can be O(V^2) edges in the graph, and most bipartite matching and max-flow algorithms will be super slow on such large graphs.
I am looking for some problem-specific or mathematical optimization that can solve the problem in reasonable time. To pass the test cases I need at most an O(V log V) or O(V sqrt(V)) algorithm.
Thanks in advance.
You could try making a graph with vertices for:
A source
Every element in A
Every prime present in any number in A
Every element in B
A destination
Add directed edges with capacity 1 from source to elements in A, and from elements in B to destination.
Add directed edges with capacity 1 from each element x in A to every distinct prime in the prime factorisation of x.
Add directed edges with capacity 1 from each prime p to every element x in B where p divides x
Then solve for max flow from source to destination.
The numbers will have a small number of distinct prime factors (at most 9, because 2·3·5·7·11·13·17·19·23·29 is bigger than 10^9), so you will have at most 1,800,000 edges in the middle.
This is much fewer than the 10,000,000,000 edges you could have had before (e.g. if all 100,000 entries in A and B were all even) so perhaps your max flow algorithm has a chance of meeting the time limit.
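A sketch of that construction in Python, using networkx's maximum_flow just to show the modelling (for the actual time limit you would likely need a hand-rolled Dinic's on this unit-capacity network; the helper names here are illustrative):

import networkx as nx

def prime_factors(x):
    # Distinct prime factors by trial division; fine as a sketch, though for
    # 10^5 numbers up to 10^9 you would want something faster (e.g. a sieve).
    ps, d = set(), 2
    while d * d <= x:
        if x % d == 0:
            ps.add(d)
            while x % d == 0:
                x //= d
        d += 1
    if x > 1:
        ps.add(x)
    return ps

def max_removable_pairs(A, B):
    G = nx.DiGraph()
    for i, a in enumerate(A):
        G.add_edge("src", ("A", i), capacity=1)
        for p in prime_factors(a):
            G.add_edge(("A", i), ("p", p), capacity=1)
    for j, b in enumerate(B):
        G.add_edge(("B", j), "dst", capacity=1)
        for p in prime_factors(b):
            if ("p", p) in G:   # only primes that occur on the A side matter
                G.add_edge(("p", p), ("B", j), capacity=1)
    flow_value, _ = nx.maximum_flow(G, "src", "dst")
    return flow_value

print(max_removable_pairs([6, 5, 9], [10, 3]))   # 2 (e.g. pair 6 with 10 and 9 with 3)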

How can I make adding a link to a doubly-linked list perform in N/2 rather than N steps, with (int index, element a) as parameters?

It seems that the only Big-Oh behavior possible for adding something to a linked list would be O(N) since you must traverse the entire list. However from what I hear the overall number of operations should be no more than N/2. Can someone please explain how this is possible, as I see it if you traverse from both ends of the linked list the overall behavior will still be O(N). What am I missing?
If you are asking why inserting an element at an arbitrary position in a doubly linked list of length N takes at most N/2 steps: if you maintain a separate pointer/reference to the middle element, plus a count of the total number of elements, you only ever need to traverse at most half the list to reach a given insertion position.
For example, say you have the list [B, C, D, E, F, G, H], a pointer to the E element, and a count of 7 items. If you call insert(0, A) to insert element A at position 0, you know that traversing 3 links backwards takes you from position 3 to position 0 (remember the first element has index zero, so you go E#3 -> D#2 -> C#1 -> B#0). From there you can insert element A before your 'current' element (B).
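A related variant of the same idea, and closer to the "traverse from both ends" phrasing in the question, is to walk from whichever end (head or tail) is nearer to the target index, so at most N/2 links are followed. A minimal Python sketch (class and method names are just illustrative):

class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def insert(self, index, value):
        # Splice a new node in at `index`, walking from the nearer end.
        if not 0 <= index <= self.size:
            raise IndexError(index)
        node = Node(value)
        if self.size == 0:
            self.head = self.tail = node
        elif index == 0:
            node.next = self.head
            self.head.prev = node
            self.head = node
        elif index == self.size:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        else:
            if index <= self.size // 2:
                cur = self.head
                for _ in range(index):                  # walk forward from the head
                    cur = cur.next
            else:
                cur = self.tail
                for _ in range(self.size - 1 - index):  # walk backward from the tail
                    cur = cur.prev
            # `cur` is the node currently at `index`; insert the new node before it.
            node.prev, node.next = cur.prev, cur
            cur.prev.next = node
            cur.prev = node
        self.size += 1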
Note that people normally leave constant terms out of big-O analysis; O(n/2) and O(n) have the same performance characteristics as n increases.
If you don't care about ordering, adding something to a linked list can be O(1) -- always add the new item at one end.
At a guess, the N/2 is based on an assumption that the list is sorted, so on average you only traverse half the list to find the correct insertion point.
If you want random access to the items, the real answer is simple: use something other than a linked list.

Split up a collection, for each subset respecting probabilities for properties of its items

For a small game (for which I am a bit forced to use C++, so STL-based solutions can be interesting here), I encountered the following neat problem. I was wondering if there is any literature on the subject that I could read, or clever implementations.
Collection S of unique items {E1, E2, E3}, each item E having a set of properties, {P1, P2, P3...}
This collection should be split up into S1, S2, S3, S4. The exact required sizes of S1..S4 are given. We can assume the collection can be correctly split up into those sizes for the remainder of the problem.
Now, for S1, a number of constraints can appear, {C1, C2..}, which specify that for instance, no items with the property P1 may appear in it. Another constraint could be that it should favour the items with property P2 with a factor of 0.8 (we can assume these types of constraints are normalized for all of the subsets per property).
The "weighting" is not that hard to implement. I simply fill some array with candidate numbers, the ones with higher weight are represented more in this array. I then select a random element of the array. the size of the array determines accuracy/granularity (in my case, a small array suffices).
The problem is forbidding some items to appear. It can easily lead to a situation where one item in S needs to be placed in one of the subsets S1, S2, S3, or S4, but this can no longer happen because the subsets are either all full, or the ones that are not full have a specific constraint that this item cannot appear in the set. So you have to backtrack the placement. Doing so too often may violate the weighted probability too much.
What is this problem called, or does it easily map to another (probably NP-hard) problem?
EDIT: Example:
S = {A, B, C, D, E, F, G, H, I, J, K, L, M }
S1 = [ 0.8 probability of having VOWEL, CANNOT HAVE I or K, SIZE = 6 ]
S2 = [ 0.2 probability of having VOWEL, CANNOT HAVE M, B, E, SIZE = 7 ]
Now, suppose we start filling by FOR(LETTER IN S):
LETTER A, create a fill array based on property constraints (0.8 vs 0.2):
[ 1, 1, 1, 1, 1, 1, 1, 2, 2].
Pick a random element from that array: 1.
Now, put A in S1.
For letter I, for instance, the only candidate would be 2, since S1 has a constraint that I cannot appear in it.
Keep doing this, eventually you might end up with:
C = { M } // one more letter to distribute
S1 = A, B, D, E, F, G
S2 = C, H, I, J, K, L
Now, where to place M? It cannot be placed in S1, since that one is full, and it cannot be placed in S2 because S2 has a constraint that M cannot appear in it.
The only way out is to backtrack some placement, but then we might mess with the weighted distribution too much (e.g., giving S2 one of S1's vowels, which flips the intended distribution around).
Note that this becomes slightly more complex (in the sense that more backtracking would be needed) when more subsets are in play, instead of just 2.
This has a resemblance to a constraint satisfaction problem (CSP) with hard and soft constraints. There are a couple of standard algorithms for that, but you have to check whether they apply to your particular problem instance.
Check Wikipedia for starters.
How about this heuristic:
1 Taking into consideration the limitations due to constraints and full sets, locate any elements that only meet the criteria for a single set and place them there. If at any point one of these insertions causes a set to become full, re-evaluate the remaining elements for meeting the criteria for only a single set.
2 Now look only at elements that could fit in exactly two sets. For each element, compute the difference in the required probabilities for each set if you added that element vs. if you did not. Insert the element into the set where the insert gives the best short-term result (first fit / greedy). If an insert fills up a set, re-evaluate the remaining elements for meeting the criteria for only two sets.
3 Continue for elements that fit in 3 sets, 4 sets ... n sets.
At this point all elements will be placed into sets meeting all the constraints, but the probabilities are probably not optimal. You could continue by swapping elements between the sets (only allowing swaps that don't violate constraints), using gradient descent or a random-restart hill climbing algorithm on a function describing how closely all the probabilities are met. This will tend to converge towards the optimal solution but is not guaranteed to reach it. Continue until you meet your requirements to within an acceptable amount, until a fixed time limit is reached, or until the possible improvement drops below a set threshold.
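A rough Python sketch of steps 1-3 (greedy, most-constrained element first, combined with the weighted pick described in the question; all names and data structures are just illustrative, and the swap/hill-climbing refinement is omitted):

import random

def greedy_place(items, sets):
    # items: dict name -> set of properties, e.g. {"A": {"VOWEL"}, "B": set()}
    # sets:  list of dicts with keys "size", "forbidden" (item names) and
    #        "weight" (property -> preference weight)
    placement = {i: [] for i in range(len(sets))}

    def feasible(item, i):
        return item not in sets[i]["forbidden"] and len(placement[i]) < sets[i]["size"]

    remaining = set(items)
    while remaining:
        # Most constrained first: the element with the fewest feasible sets left.
        item = min(remaining, key=lambda it: sum(feasible(it, i) for i in range(len(sets))))
        choices = [i for i in range(len(sets)) if feasible(item, i)]
        if not choices:
            raise ValueError(f"no feasible set left for {item}: backtracking needed")
        # Weighted pick among the feasible sets, as with the fill array above.
        weights = []
        for i in choices:
            w = sum(sets[i]["weight"].get(p, 0) for p in items[item])
            weights.append(w if w > 0 else 0.01)   # small floor so every feasible set stays possible
        target = random.choices(choices, weights=weights)[0]
        placement[target].append(item)
        remaining.remove(item)
    return placement

letters = "ABCDEFGHIJKLM"
items = {c: ({"VOWEL"} if c in "AEIOU" else set()) for c in letters}
sets = [
    {"size": 6, "forbidden": {"I", "K"}, "weight": {"VOWEL": 0.8}},
    {"size": 7, "forbidden": {"M", "B", "E"}, "weight": {"VOWEL": 0.2}},
]
print(greedy_place(items, sets))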