How to convert a graph that is NOT DAG to a graph that is DAG? - directed-acyclic-graphs

Hello there everyone,
Are there any general algorithms that take a non-DAG (a directed graph containing cycles) as input and output a directed acyclic graph?
Currently, I am not sure which data structures I will be using to represent my graph, but I am searching just for the algorithm at this point.
Hope you can inform me on the matter.
Cheers~
I want to go from this:
To this:
(New graph: added only for the solution BeyelerStudios provided.)

You can always simply flip some edges to get an acyclic graph from a cyclic one (graph G with vertices V and edges E):
input: G(V,E), n = |V|
visited <- empty set
queue <- empty queue
for each node in V
    // skip visited nodes
    if visited.find(node)
        continue
    // push a dummy edge (node is unvisited)
    queue.push(edge(inf, node))
    while !queue.empty()
        edge <- queue.pop()
        if visited.find(edge.stop)
            // potential cycle detected
            if edge.start != edge.stop
                // eliminate loops, if any
                E.flip(edge)
        else
            visited.insert(edge.stop)
            for each outgoing edge e at edge.stop
                queue.push(e)
Depending on the queue you use you get different behaviour:
a stack (LIFO queue) results in depth-first traversal
a FIFO queue results in breadth-first traversal
a priority queue results in a DAG containing its spanning tree(s)
There's a caveat in the above code: potential cycle detected. Imagine a graph with vertices A, B, C and edges A->B, A->C, C->B. The above snippet detects a potential cycle when processing C->B last. If you want to disambiguate valid edges from edges that introduce cycles at that point, you need to show that there is no path from B to C yet. This is a much harder task, and there are some good answers (and hints) in this answer here: basically, you'd need to perform another graph traversal to detect (or exclude) such a conflicting path.
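A minimal C++ sketch of a safe variant of this idea: flip only the back edges found by depth-first search (edges pointing to a vertex still on the DFS stack). Every reversed back edge then runs from an earlier to a later finish time, the same direction as tree, forward, and cross edges, so no cycle can survive. Self-loops are skipped, since flipping cannot fix them. All names here are my own, not from the question:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Flip every DFS back edge so the resulting edge set is acyclic.
// Vertices are 0..n-1; edges is a list of (from, to) pairs.
std::vector<std::pair<int,int>> make_acyclic(int n, std::vector<std::pair<int,int>> edges) {
    std::vector<std::vector<int>> adj(n);   // outgoing edge indices per vertex
    for (int i = 0; i < (int)edges.size(); ++i)
        adj[edges[i].first].push_back(i);

    enum { White, Gray, Black };            // unvisited / on stack / finished
    std::vector<int> color(n, White);

    std::function<void(int)> dfs = [&](int u) {
        color[u] = Gray;
        for (int i : adj[u]) {
            int v = edges[i].second;
            if (color[v] == Gray && v != u)           // back edge: would close a cycle
                std::swap(edges[i].first, edges[i].second);
            else if (color[v] == White)
                dfs(v);
            // Black targets are forward/cross edges and are always safe.
        }
        color[u] = Black;
    };
    for (int u = 0; u < n; ++u)
        if (color[u] == White) dfs(u);
    return edges;
}
```

For the cycle 0->1->2->0, for example, the sketch flips 2->0 into 0->2; the already-acyclic A->B, A->C, C->B example is left untouched.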

Related

What does a “successor” stand for in llvm?

When reading the LLVM documentation, especially the BasicBlock part, the concept of a “successor” appears a lot. What is it?
Both terms, basic block and successor, come from the field of Control Flow Analysis (or CFA).
In CFA, a program is represented using a Control Flow Graph (or CFG).
Each vertex (or node) in a CFG is a basic block. Since a CFG is a directed graph, a basic block may have incoming and outgoing edges. E.g.: A -> B -> C. Incoming edges come from predecessors, and outgoing edges lead to successors.
The set of successors/predecessors for the mentioned example (A -> B -> C):
pred(A) = {}
succ(A) = {B}
pred(B) = {A}
succ(B) = {C}
pred(C) = {B}
succ(C) = {}
A successor is the target of the branch that ends a basic block.
For example, if a basic block ends with a five-way switch, then that block has six successors (five explicit and the switch's default). A basic block that ends with a return has no successors.
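The successor/predecessor relation can be modeled in a few lines of C++. This is only a toy stand-in for the concept, not the LLVM API (in LLVM itself you would iterate a BasicBlock's successors, e.g. via the helpers in llvm/IR/CFG.h):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Tiny stand-in for a CFG: block name -> list of successor block names.
using CFG = std::map<std::string, std::vector<std::string>>;

std::set<std::string> successors(const CFG& g, const std::string& b) {
    auto it = g.find(b);
    return it == g.end() ? std::set<std::string>{}
                         : std::set<std::string>(it->second.begin(), it->second.end());
}

std::set<std::string> predecessors(const CFG& g, const std::string& b) {
    std::set<std::string> preds;
    for (const auto& [name, succs] : g)     // b's predecessors are every block
        for (const auto& s : succs)         // that lists b as a successor
            if (s == b) preds.insert(name);
    return preds;
}
```

Running this on the A -> B -> C example reproduces the pred/succ sets listed above.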

TBB flow graph conditional execution AND multiple in - and outputs

I've read TBB flow graph conditional execution and have a slightly different problem.
Is it possible to create a node with multiple inputs and multiple outputs AND to control the execution by a conditional variable? Maybe without ugly casts.
I've attached a simple example of how I would like to design the graph. What is the best way to get it running with TBB flow graph?
start_node sends a start_msg to some_node
if start_msg is empty, some_node sends a continue_msg to end_node; else some_node sends a continue_msg to itself AND a data_msg to end_node
if a continue_msg is received by some_node, the previous start_msg is checked again: if it's empty, a continue_msg is sent to end_node, else a data_msg is sent.
                   +--continue_msg--+
                   |                |
                   +-----+    +-----+
                         |    |
                         v    |     +----data_msg----+
                              |    /                  \
start_node --start_msg--> some_node                  end_node
                                \                     /
                                 +----continue_msg---+
One problem I'm dealing with: I can't say how many good elements are inside of start_msg, even if the size is known (let's say start_msg holds a tbb::concurrent_vector<T>). If some_node finds a bad element, it is ignored and some_node sends a continue_msg to itself.
It looks like the source_node can be used in your algorithm. A source_node can generate as many messages as you need. So the algorithm can be reworked a bit:
source_node -> ... -> end_node
Why do you need a continue_msg to be sent to the end_node? To mark the last message? Perhaps, you can use a std::pair<T,bool> where the first element is data and the second one is an indication of the last message.
The Body of the source_node finds the valid element in tbb::concurrent_vector<T>, creates a new message make_pair(Data, false) and returns true for each Body invocation. When the last element is extracted from the container it creates make_pair(Data, true) and returns false as an indication of the last element.
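Here is a TBB-free sketch of that Body logic, using int data where zero marks a bad element (both assumptions of mine, just to make the idea concrete). In real code this would be the call operator of a tbb::flow::source_node emitting std::pair<T,bool>:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Generator ("Body") sketch: pull valid (nonzero) elements from a vector
// one call at a time, tagging the last valid one with `true`.
struct Body {
    const std::vector<int>& data;
    std::size_t pos = 0;

    // Returns true if a message was produced in `msg`, false when exhausted.
    bool operator()(std::pair<int,bool>& msg) {
        while (pos < data.size() && data[pos] == 0)   // skip bad elements
            ++pos;
        if (pos == data.size()) return false;         // no more messages
        int value = data[pos++];
        // Look ahead: is there another valid element after this one?
        std::size_t next = pos;
        while (next < data.size() && data[next] == 0) ++next;
        msg = {value, next == data.size()};           // true == last message
        return true;
    }
};
```

The look-ahead is what lets the Body mark the last message even though the number of good elements is unknown up front.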
Unfortunately, I do not know the original algorithm and my suggestion can be inappropriate. Could you provide more details if it does not suit your needs, please?

Remove 100,000+ nodes from a Boost graph

I have a graph ( adjacency_list (listS, vecS, bidirectionalS, VertexVal) ) in which I need to delete 100,000+ nodes. Each node contains a structure of two 64-bit integers plus another 64-bit integer. The guid check in the code below checks the first integer of the structure.
On my laptop ( i7 2.7GHz, 16GB RAM ) it takes about 88 seconds according to VTune.
Following is how I delete the nodes:
vertex_iterator vi, vi_end;
boost::tie(vi, vi_end) = boost::vertices(m_graph);
while (vi != vi_end) {
    if (m_graph[*vi].guid.part1 == 0) {
        boost::remove_vertex(*vi, m_graph);
        boost::tie(vi, vi_end) = boost::vertices(m_graph);
    } else
        ++vi;
}
Vtune shows that the boost::remove_vertex() call takes 88.145 seconds. Is there a more efficient way to delete these vertices?
In your removal branch you re-tie() the iterators:
boost::tie(vi, vi_end) = boost::vertices(m_graph);
This causes the traversal to restart from the beginning every time you remove a vertex. This is exactly Schlemiel the Painter's algorithm.
You'd have to find out whether you can trust remove_vertex not to trigger a reallocation. If so, it's easily fixed. Otherwise, you'd want an index-based loop instead of an iterator-based one. Or you might be able to work on the raw container (it's a private member, though, as I remember).
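For comparison, the non-Schlemiel pattern in plain STL terms (a generic sketch, not Boost code): with a container whose iterators stay valid across erase, such as std::list (the analogue of the listS selector here), erase returns the next iterator and the scan never restarts:

```cpp
#include <iterator>
#include <list>

// One-pass removal: erase() returns the next valid iterator,
// so the scan never restarts from the beginning.
template <class Pred>
void erase_if_onepass(std::list<int>& xs, Pred bad) {
    for (auto it = xs.begin(); it != xs.end(); )
        it = bad(*it) ? xs.erase(it) : std::next(it);
}
```

Each element is visited exactly once, so removal is linear instead of quadratic.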
Update: Using vecS as the container for vertices is going to cause bad performance here:
If the VertexList template parameter of the adjacency_list was vecS, then all vertex descriptors, edge descriptors, and iterators for the graph are invalidated by this operation. <...> If you need to make frequent use of the remove_vertex() function the listS selector is a much better choice for the VertexList template parameter.
This small benchmark test.cpp compares:
with -DSTABLE_IT (listS)
$ ./stable
Generated 100000 vertices and 5000 edges in 14954ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges
without -DSTABLE_IT (vecS)
$ ./unstable
Generated 100000 vertices and 5000 edges in 76ms
The graph has a cycle? false
starting selective removal...
Done in 396ms
After: 99032 vertices and 4916 edges
using filtered_graph (thanks @cv_and_he in the comments)
Generated 100000 vertices and 5000 edges in 15ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges
Done in 13ms
You can clearly see that removal is much faster for listS but generating is much slower.
I was able to successfully serialize the graph into a string using Boost serialization routines, parse the string to remove the nodes I didn't need, and de-serialize the modified string. For 200,000 total nodes in the graph with 100,000 that need to be deleted, the operation finishes in less than 2 seconds.
For my particular use-case each vertex has three 64-bit integers. When a vertex needs to be deleted, I mark two of those integers as 0s; a valid vertex would never have a 0. When the time comes to clean up the graph and delete the "deleted" vertices, I follow the above logic.
In the code below, removeDeletedNodes() does the string parsing, removes the vertices, and remaps the edge numbers.
It would be interesting to see more of the Vtune data.
My experience has been that the default Microsoft allocator can be a big bottleneck when deleting tens of thousands of small objects. Does your Vtune graph show a lot of time in delete or free?
If so, consider switching to a third-party allocator. Nedmalloc is said to be good: http://www.nedprod.com/programs/portable/nedmalloc/
Google has one, tcmalloc, which is very well regarded and much faster than the built-in allocators on almost every platform. https://code.google.com/p/gperftools/ tcmalloc is not a drop-in for Windows.

How can I use priority queue in SWI-Prolog?

I am trying to implement the A* algorithm in SWI-Prolog. I have a graph in which each state consists of the values (Cost_So_Far, Heuristic, "Doesn't Matter", "Doesn't Matter", "Doesn't Matter"), and I want to insert states into a priority queue ordered by Heuristic, which is an integer. How can I do this?
You can use a "heap" library, which is an implementation of the concept of "priority queue". There's a heap Prolog implementation by Richard O'Keefe floating around. SWI-Prolog also comes with a heap implementation in its "heaps" library by Lars Buitinck. Logtalk (which runs on several Prolog systems, including SWI-Prolog) also includes max- and min-heaps derived from Richard's original implementation. Using the heuristic value as the key as Boris suggested, a heap should be more efficient than a list that you would have to re-sort every time you add a new pair.
Some useful links:
SWI-Prolog heaps library
Logtalk heap protocol
Logtalk min-heap and max-heap implementations
One easy way is to use a Key-Value pair list, which has the form:
[1-state(Cost_so_far, ...), 2-state(...), 3-state(...)]
Your integer value from the heuristic would be the key; a compound term with the state functor (of whatever arity you need) would be the value. Note that this is the conventional way of keeping a list of pairs. You can use matching to get them out; for example, the state at the head of the queue would be:
[Heuristic-state(A, B, C)|QueueRest]
You should probably use the built-in keysort/2 for sorting it (very efficiently) every time you have added new states at the top of the queue.
This is a basic implementation of priority queues. I just started learning Prolog, so if there is a better way to implement priority queues, please comment down below.
takeout(X, [X|R], R).
takeout(X, [F|Fs], [F|S]) :- takeout(X, Fs, S).

delete_elem([H|T]) :-
    min_p([H|T], 99, MinP),
    search_minp(MinP, [H|T], Num),
    takeout((Num,MinP), [H|T], Z),
    write(Z).

ins([], L, L).
ins([(Num,Priority)|T], L, [(Num,Priority)|Z]) :- ins(T, L, Z).
/* ins appends (Num,Priority) pairs to an existing queue.
   If [(99,2), (90,1), (96,3)] is the priority queue and we want to add (93,4):
   ins([(93,4)], [(99,2), (90,1), (96,3)], Z).
   Z = [(99,2), (90,1), (96,3), (93,4)]
*/

min_p([], Min, Min).
min_p([(_Num,Priority)|Z], X, Y) :- Priority < X, min_p(Z, Priority, Y).
min_p([(_Num,Priority)|Z], X, Y) :- Priority >= X, min_p(Z, X, Y).
/* min_p finds the minimum priority in the given priority queue.
   X should start larger than any priority in the queue (INT_MAX).
   Note: the second clause must use >=, not >, or elements whose
   priority equals the current minimum make the whole call fail.
   min_p([(99,2), (90,1), (96,3), (93,4)], 999, Y).
   Y = 1.
*/

search_minp(Priority, [(Num,P)|_T], Num) :- Priority =:= P.
search_minp(Priority, [(_Num,_P)|T], X) :- search_minp(Priority, T, X).
/* search_minp finds the element corresponding to the minimum priority.
   If the priority queue is [(99,2), (90,1), (96,3), (93,4)]:
   search_minp(1, [(99,2), (90,1), (96,3), (93,4)], Z).
   Z = 90.
*/

Joining fragments of messages together

I need to write a function that can receive fragments of different messages, and then piece them together. The fragments are in the form of a class msg, which holds information of
int message_id
int no_of_fragments
int fragment_id
string msg_fragment
The function needs to do the following
Check received message - if no_of_fragments == 1 then the message has not been fragmented and function can stop here
If no_of_fragments > 1 then message is fragmented
get message_id and fragment_id
collect all fragments e.g. for message_id=111 with no_of_fragments=6, the system should ensure that fragments_id 1-6 have been collected
piece fragments together
What is the best way for doing this? I thought a map might be useful (with the message_id serving as key, pointing to a container that would hold the fragments) but would appreciate any suggestions.
Thank you!
I would use a map of vectors. Each time you receive a new message ID, use that as your map key. Then allocate a vector to hold the fragments based on the number of fragments specified in the first fragment received (doesn't have to be in order). You'll also need to hold the count, so it's easy to know when you've received the last fragment, so probably a map of message_id to a struct of count and the vector of fragments.
My c++ is rusty:
struct message_parts {
    int fragments_expected;       // init to no_of_fragments
    int fragments_received;       // init to 0 (bump it as soon as you add a fragment to the vector)
    vector<fragment *> fragments; // initialize size to no_of_fragments
};

std::map<int, message_parts> partial_messages;
When you insert a fragment, put it directly into the fragments vector at index fragment_id - 1 (fragment IDs start at 1, vector indices at 0). This way you'll always have them in the right order, no matter the order they arrive in.
After you add a fragment, check whether fragments_received == fragments_expected; if so, you can piece the message together and deal with the data.
This gives constant time first-fragment detection and allocation, constant time fragment insertion, constant time complete-message-received detection, and linear time message reconstruction (can't do any better than this).
This solution requires no special casing for non-fragmented data.
Don't forget to delete the fragments once you've reassembled them into the complete message.
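Putting the pieces above together, here is a sketch of the whole reassembly path. The class and field names are my own, following the question's description (fragments stored by value as strings for simplicity); real code would also guard against duplicate fragments:

```cpp
#include <map>
#include <string>
#include <vector>

// Fields as described in the question; fragment_id is 1-based.
struct msg {
    int message_id;
    int no_of_fragments;
    int fragment_id;
    std::string msg_fragment;
};

struct message_parts {
    int fragments_expected = 0;
    int fragments_received = 0;
    std::vector<std::string> fragments;
};

class Reassembler {
    std::map<int, message_parts> partial_;
public:
    // Returns true and fills `out` when `m` completes its message.
    bool receive(const msg& m, std::string& out) {
        if (m.no_of_fragments == 1) {         // unfragmented: done immediately
            out = m.msg_fragment;
            return true;
        }
        auto& parts = partial_[m.message_id];
        if (parts.fragments.empty()) {        // first fragment seen for this id
            parts.fragments_expected = m.no_of_fragments;
            parts.fragments.resize(m.no_of_fragments);
        }
        parts.fragments[m.fragment_id - 1] = m.msg_fragment;  // IDs start at 1
        if (++parts.fragments_received < parts.fragments_expected)
            return false;                     // still waiting for fragments
        out.clear();
        for (const auto& f : parts.fragments) out += f;
        partial_.erase(m.message_id);         // forget the reassembled message
        return true;
    }
};
```

receive returns true exactly when a message is complete, so unfragmented messages need no special casing, and erasing the map entry releases the fragments once reassembled.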