I need to write a program on breadth-first search. I have a good idea about the algorithm and can implement it. I have a small problem: in my homework I have been asked to generate random connectivity among the nodes. I thought of generating a random number between 0 and the total number of possible edges, which would represent the number of edges, and then, for each of those edges, randomly selecting two nodes. But this doesn't sound good. Need help.
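One way to sketch the approach you describe (all identifiers here are just placeholders): pick a random edge count, then keep picking random endpoint pairs, skipping self-loops and duplicates.

```cpp
// Sketch of the approach described above (identifiers are illustrative):
// pick a random edge count, then keep picking random endpoint pairs,
// skipping self-loops and duplicates. Seed with std::srand() once elsewhere.
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> random_edges(int num_nodes) {
    int max_edges = num_nodes * (num_nodes - 1) / 2;  // undirected, no self-loops
    int target = std::rand() % (max_edges + 1);       // how many edges to create

    std::set<std::pair<int, int>> chosen;             // set filters out duplicate edges
    while ((int)chosen.size() < target) {
        int u = std::rand() % num_nodes;
        int v = std::rand() % num_nodes;
        if (u == v) continue;                         // no self-loops
        if (u > v) std::swap(u, v);                   // canonical order so {u,v} == {v,u}
        chosen.insert(std::make_pair(u, v));
    }
    return std::vector<std::pair<int, int>>(chosen.begin(), chosen.end());
}
```

One thing to watch: this doesn't guarantee the graph is connected, which matters if every BFS is supposed to reach every node. A common workaround is to first link the nodes into a random spanning tree and then add extra random edges on top of it.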
I want to induce a small graph on a large graph. Both graphs are shown below. Vertices with the same color are equivalent.
small graph (the one to be induced)
large graph (the one the smaller graph is induced on)
The problem is that none of the red vertices in the large graph have 4 neighbors as in the small graph, so boost::vf2_subgraph_iso fails to induce the small graph on the large one. But if the inducing algorithm were more tolerant (a maximal matching instead of matching all vertices), it might yield a closest match.
On the other hand, boost::mcgregor_common_subgraphs takes exponentially long to complete. The other consequence is that applying a maximum common subgraph algorithm to two absolutely incompatible graphs wastes a huge amount of CPU time. McGregor's algorithm is based on finding cliques, so it starts with a 1-vertex subgraph and progresses towards larger subgraphs. For a subgraph having more than 10 vertices, it takes an unacceptable amount of time to reach the obvious solution.
So what is the solution in this scenario? Is there a more tolerant algorithm for inducing a subgraph?
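For reference, this is roughly how the exact matcher is invoked (a minimal sketch modeled on the Boost examples; the two graphs here are tiny placeholders, and equivalence predicates for the colored vertices would be supplied as additional named parameters):

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/vf2_sub_graph_iso.hpp>

int main() {
    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS> graph_t;

    // Tiny placeholder graphs: a triangle, and a triangle with a pendant vertex.
    graph_t small_g(3), large_g(4);
    boost::add_edge(0, 1, small_g);
    boost::add_edge(1, 2, small_g);
    boost::add_edge(2, 0, small_g);

    boost::add_edge(0, 1, large_g);
    boost::add_edge(1, 2, large_g);
    boost::add_edge(2, 0, large_g);
    boost::add_edge(2, 3, large_g);

    // Prints every mapping of small_g onto an induced subgraph of large_g;
    // if no exact mapping exists, nothing is reported (no "closest match").
    boost::vf2_print_callback<graph_t, graph_t> callback(small_g, large_g);
    boost::vf2_subgraph_iso(small_g, large_g, callback);
    return 0;
}
```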
I'm writing a C++ application that uses the Boost Graph Library.
I use the kruskal_minimum_spanning_tree function in BGL.
Is there any way to check how many times this function iterates, because the application hangs at this function?
Update: I have a progress bar in the application that indicates the progress of the algorithm. When I pass 100 vertices to BGL, it computes the minimum spanning tree in less than a second. But when I pass more than 10,000 vertices, BGL can take 10 or even 20 minutes to calculate the tree. So I would like to see how many edges of the tree have been computed.
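kruskal_minimum_spanning_tree doesn't report progress itself, but you can wrap its output iterator so that every tree edge it emits updates a counter (or your progress bar); a connected graph's spanning tree has num_vertices - 1 edges in total. A minimal sketch, assuming an edge-weighted adjacency_list (the tiny graph below is just a placeholder):

```cpp
#include <iostream>
#include <vector>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/kruskal_min_spanning_tree.hpp>
#include <boost/function_output_iterator.hpp>

typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
        boost::no_property,
        boost::property<boost::edge_weight_t, int> > Graph;
typedef boost::graph_traits<Graph>::edge_descriptor Edge;

int main() {
    Graph g(4);
    boost::add_edge(0, 1, 1, g);
    boost::add_edge(1, 2, 2, g);
    boost::add_edge(2, 3, 1, g);
    boost::add_edge(3, 0, 3, g);

    std::vector<Edge> tree;
    std::size_t found = 0;

    // Each time Kruskal outputs a tree edge, count it (this is where a
    // progress-bar update could go); expect |V| - 1 edges for a connected graph.
    boost::kruskal_minimum_spanning_tree(
        g,
        boost::make_function_output_iterator([&](const Edge& e) {
            tree.push_back(e);
            ++found;
        }));

    std::cout << "tree edges found: " << found << "\n";
    return 0;
}
```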
I'm given a series of cities, and each one produces an amount of electricity and needs an amount of electricity. Each city has up to 8 adjacent cities, and I am trying to minimize the number of transfers.
If A->B transfers 10 energy, the total transfer cost is 10.
If A->B->C transfers 10 energy (A to C through B), the total transfer cost is 20.
I thought about using Dijkstra's on each point that needs energy, ending the search for that point when enough energy has been found, but I thought of several pitfalls.
I was wondering what else I could consider that could potentially work?
I also considered looking into the Floyd-Warshall algorithm as well as Hagerup's (I read a bit about them on Wikipedia and they seemed potentially viable).
Thanks
Your problem is easily reduced to a well-known minimum-cost flow problem:
The minimum-cost flow problem (MCFP) is to find the cheapest possible
way of sending a certain amount of flow through a flow network.
This reduction can be done the following way. Add dummy "source" and "sink" vertices to your graph, add a directed edge from the source to each original vertex with capacity equal to the production rate at that vertex, and add a directed edge from each original vertex to the sink with capacity equal to the consumption rate at that vertex. Set the capacities and costs on your original edges as you need them, and solve the max-flow min-cost problem on the resulting network.
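A sketch of just this reduction, using plain placeholder structures (the min-cost flow solver itself is left out; any off-the-shelf implementation can consume the resulting edge list). It assumes each transfer between adjacent cities costs 1 per unit, matching the cost examples in the question:

```cpp
#include <utility>
#include <vector>

// One directed edge of the flow network: from -> to, with a capacity and a
// per-unit cost. (Hypothetical structure, purely to illustrate the reduction.)
struct FlowEdge {
    int from, to;
    int capacity;
    int cost;
};

// production[v]/consumption[v]: how much city v produces/consumes.
// adjacency: pairs of adjacent cities.
std::vector<FlowEdge> build_network(const std::vector<int>& production,
                                    const std::vector<int>& consumption,
                                    const std::vector<std::pair<int,int> >& adjacency) {
    const int n = (int)production.size();
    const int source = n;        // dummy source vertex
    const int sink = n + 1;      // dummy sink vertex
    const int INF = 1 << 29;

    std::vector<FlowEdge> edges;
    for (int v = 0; v < n; ++v) {
        if (production[v] > 0)   // source -> city, capacity = production, cost 0
            edges.push_back({source, v, production[v], 0});
        if (consumption[v] > 0)  // city -> sink, capacity = consumption, cost 0
            edges.push_back({v, sink, consumption[v], 0});
    }
    for (std::size_t i = 0; i < adjacency.size(); ++i) {
        int a = adjacency[i].first, b = adjacency[i].second;
        // transfer between neighbors: unlimited capacity, cost 1 per unit, both ways
        edges.push_back({a, b, INF, 1});
        edges.push_back({b, a, INF, 1});
    }
    return edges;  // feed this to any min-cost max-flow solver
}
```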
I also doubt that Dijkstra's algorithm, or any shortest-path algorithm, will be of any use, as they are concerned with the path of only one unit of electricity from a particular city and do not take into account "interference" effects from electricity produced in different cities. For example, if you have two cities (A and B) each producing 1 unit of energy, one more city (C) close to both A and B consuming 1 unit of energy, and one more city (D) far away consuming 1 unit of energy, then you will have to route energy from either A or B to D, but no shortest-path algorithm will offer you this.
Ending the search as soon as you have enough energy isn't guaranteed to find the shortest path, but letting Dijkstra run completely for each point that's a power consumer will, and is probably still reasonable to do computationally depending on the size of the network.
Look up the A* algorithm; it improves on Dijkstra with heuristics, which might remove some of the pitfalls.
I can't really think of any other algorithm.
Actually I think A* should be fine.
I have an implementation of Dijkstra's Algorithm, based on the code on this website. Basically, I have a number of nodes (say 10000), and each node can have 1 to 3 connections to other nodes.
The nodes are generated randomly within a 3D space. The connections are also randomly generated; however, each node always tries to find connections with its closest neighbors first and slowly increases the search radius. Each connection is given a distance of one. (I doubt any of this matters, but it's just background.)
In this case, then, the algorithm is just being used to find the shortest number of hops from the starting point to all the other nodes. And it works well for 10,000 nodes. The problem I have is that, as the number of nodes increases, say towards 2 million, I use up all of my computer's memory when trying to build the graph.
Does anyone know of an alternative way of implementing the algorithm to reduce the memory footprint, or is there another algorithm out there that uses less memory?
According to your comment above, you are representing the edges of the graph with a distance matrix long dist[GRAPHSIZE][GRAPHSIZE]. This will take O(n^2) memory, which is too much for large values of n. It is also not a good representation in terms of execution time when you only have a small number of edges: it will cause Dijkstra's algorithm to take O(n^2) time (where n is the number of nodes) when it could potentially be faster, depending on the data structures used.
Since in your case you said each node is only connected to up to 3 other nodes, you shouldn't use this matrix: Instead, for each node you should store a list of the nodes it is connected to. Then when you want to go over the neighbors of a node, you just need to iterate over this list.
In some specific cases you don't even need to store this list because it can be calculated for each node when needed. For example, when the graph is a grid and each node is connected to the adjacent grid nodes, it's easy to find a node's neighbors on the fly.
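A minimal sketch of the adjacency-list version (names are illustrative): memory is O(V + E) rather than O(V^2), and since every connection in the question has distance 1, a plain BFS would work just as well as the priority-queue Dijkstra shown here.

```cpp
#include <cstdint>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Adjacency-list Dijkstra: memory is O(V + E) instead of an O(V^2) matrix.
// neighbors[u] holds only the (1 to 3) nodes actually connected to u.
std::vector<uint32_t> shortest_hops(const std::vector<std::vector<int> >& neighbors, int start) {
    const uint32_t INF = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> dist(neighbors.size(), INF);
    dist[start] = 0;

    typedef std::pair<uint32_t, int> Item;          // (distance, node), min-heap
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > pq;
    pq.push(std::make_pair(0u, start));

    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        uint32_t d = top.first;
        int u = top.second;
        if (d != dist[u]) continue;                 // stale queue entry, skip
        for (std::size_t i = 0; i < neighbors[u].size(); ++i) {
            int v = neighbors[u][i];
            if (d + 1 < dist[v]) {                  // every edge has weight 1
                dist[v] = d + 1;
                pq.push(std::make_pair(dist[v], v));
            }
        }
    }
    return dist;
}
```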
If you really cannot afford the memory, even after minimizing your graph representation, you may develop a variation of Dijkstra's algorithm based on a divide-and-conquer approach.
The idea is to split the data into smaller chunks, so you'll be able to run Dijkstra's algorithm in each chunk, for each of the points within it.
For each solution generated from these smaller chunks, treat it as a single node in another data chunk, from which you'll start another run of Dijkstra's algorithm.
For example, consider the points below:
.B .C
.E
.A .D
.F .G
You can select the points closest to a given node, say within two hops, and then use the solution as part of the extended graph, treating those points as a single set, with a distance equal to the distance produced by that Dijkstra run.
Say you start from D:
select the closest points to D within a given number of hops;
run Dijkstra's algorithm on the selected entries, starting from D;
use the solution as a graph with the central node D and the last nodes in the shortest paths as nodes directly linked to D;
extend the graph, repeating the algorithm until all the nodes have been considered.
Although there's costly extra processing here, you'd be able to get past the memory limitation, and, if you have some other machines, you can even distribute the processing.
Please note this is just the idea of the process; what I've described is not necessarily the best way to do it. You may find something interesting by looking for distributed Dijkstra's algorithms.
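As a rough illustration of the first step above (selecting the points within a given number of hops of the start node), assuming unit-weight edges and an adjacency-list graph; Dijkstra would then be run only over the returned set, and the chunk's boundary nodes become the links to the next chunk:

```cpp
#include <set>
#include <utility>
#include <vector>

// Select the nodes within `max_hops` hops of `start` (a plain BFS over the
// adjacency lists). All names are illustrative, not a definitive implementation.
std::set<int> k_hop_neighborhood(const std::vector<std::vector<int> >& adj,
                                 int start, int max_hops) {
    std::set<int> selected;
    selected.insert(start);
    std::vector<int> frontier(1, start);
    for (int hop = 0; hop < max_hops; ++hop) {
        std::vector<int> next;
        for (std::size_t i = 0; i < frontier.size(); ++i) {
            int u = frontier[i];
            for (std::size_t j = 0; j < adj[u].size(); ++j) {
                int v = adj[u][j];
                if (selected.insert(v).second)   // newly reached node
                    next.push_back(v);
            }
        }
        frontier.swap(next);
    }
    return selected;
}
```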
I like boost::graph a lot. Its memory consumption is very decent (I've used it on road networks with 10 million nodes and 2 GB of RAM).
It has a Dijkstra implementation, but if the goal is to implement and understand it by yourself, you can still use their graph representation (I suggest adjacency list) and compare your result with theirs to be sure your result is correct.
Some people mentioned other algorithms. I don't think this will play a big role in memory usage, but more likely in speed. With 2M nodes, if the topology is close to a street network, the running time will be less than a second from one node to all others.
http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/index.html
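For comparison purposes, a minimal sketch of that BGL setup (adjacency_list plus the built-in dijkstra_shortest_paths); the tiny graph here is a placeholder, with every edge given weight 1 as in the question:

```cpp
#include <iostream>
#include <vector>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>

int main() {
    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
            boost::no_property,
            boost::property<boost::edge_weight_t, int> > Graph;
    typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;

    // Tiny placeholder graph; every connection has weight 1.
    Graph g(5);
    boost::add_edge(0, 1, 1, g);
    boost::add_edge(1, 2, 1, g);
    boost::add_edge(2, 3, 1, g);
    boost::add_edge(3, 4, 1, g);
    boost::add_edge(0, 4, 1, g);

    std::vector<int> dist(boost::num_vertices(g));
    Vertex start = 0;

    // BGL's Dijkstra; dist[v] ends up as the hop count from `start` to v,
    // which you can compare against your own implementation's output.
    boost::dijkstra_shortest_paths(g, start,
        boost::distance_map(boost::make_iterator_property_map(
            dist.begin(), boost::get(boost::vertex_index, g))));

    for (std::size_t v = 0; v < dist.size(); ++v)
        std::cout << "node " << v << ": " << dist[v] << " hops\n";
    return 0;
}
```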
Question: Which data structure is more efficient when calculating the n most frequent words in a text file: a hash table or a priority queue?
I've previously asked a question related to this subject; however, after the creative responses I got confused, and I've decided on two data structures that I can actually implement easily: a hash table vs. a priority queue.
Priority queue confusion: To be honest, I've listened to a lecture on YouTube about priority queues and understood every component of them, but when it comes to applicability, I get confused. Using a binary heap I can easily implement a priority queue; my challenge is to match its components' usage to the frequency problem.
My hash table idea: Since deciding on the hash table's size was a bit uncertain, I've decided to go with what makes more sense to me: 26, for the number of letters in the alphabet. In addition, with a good hash function it would be efficient. However, reaching out again and again to linked lists (using separate chaining for collisions) and incrementing their integer values by 1 wouldn't, in my opinion, be efficient.
Sorry for the long post, but as fellow programmers, which one would you recommend? If the priority queue, can you give me ideas to relate it to my question? If the hash table, could anything be done to make it even more efficient?
A hash table would be the faster of the two choices offered, besides making more sense. Rather than choosing the size 26, if you have an estimate of the total number of unique words (most people's vocabularies outside of specialized technical terms are not much bigger than 10,000; 20,000 is really big, and 30,000 is for people who make a hobby of collecting words), make the size big enough that you don't expect to ever fill it, so the probability of a collision is low, not more than 25%. If you want to be more conservative, implement a function to rehash the contents of the table into a table of twice the original size (and make the size a prime, so only approximately twice the original size).
Now since this is tagged C++, you might ask yourself why you aren't just using a multiset straight out of the standard template library. It will keep a count of how many of each word you enter into it.
In either case you'll need to make a separate pass to find which of the words are the n most frequent, as you only have the frequencies, not the rank order of the frequencies.
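A sketch of how the two passes can fit together: an unordered_map (a hash table whose bucket count grows as needed) for the counting pass, then a second pass that orders just the top n entries by frequency. Names are illustrative:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// First pass: hash table from word to frequency.
// Second pass: partial_sort to pull out the n most frequent words.
std::vector<std::pair<std::string, int> >
top_n_words(const std::vector<std::string>& words, std::size_t n) {
    std::unordered_map<std::string, int> freq;   // buckets grow as needed
    for (const std::string& w : words)
        ++freq[w];

    std::vector<std::pair<std::string, int> > ranked(freq.begin(), freq.end());
    n = std::min(n, ranked.size());
    // Order only the first n entries by descending count: roughly O(m log n)
    // comparisons, where m is the number of distinct words.
    std::partial_sort(ranked.begin(), ranked.begin() + n, ranked.end(),
                      [](const std::pair<std::string, int>& a,
                         const std::pair<std::string, int>& b) {
                          return a.second > b.second;
                      });
    ranked.resize(n);
    return ranked;
}

int main() {
    std::vector<std::string> words = {"the", "cat", "sat", "on", "the", "mat", "the", "cat"};
    std::vector<std::pair<std::string, int> > top = top_n_words(words, 2);
    for (std::size_t i = 0; i < top.size(); ++i)
        std::cout << top[i].first << ": " << top[i].second << "\n";
    return 0;
}
```

A bounded priority queue of size n would handle the second pass in comparable time; the hash table is what makes the counting pass fast.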
Why don't you use a generic/universal string hashing function? After all, you don't want to count by the first letter; you want to count over all possible words. I'd keep the bucket count dynamic; if not, you will need to do insane amounts of linked-list traversal.