Dijkstra's algorithm: memory consumption - c++

I have an implementation of Dijkstra's Algorithm, based on the code on this website. Basically, I have a number of nodes (say 10000), and each node can have 1 to 3 connections to other nodes.
The nodes are generated randomly within a 3D space. The connections are also randomly generated, although each node always tries to find connections with its closest neighbors first, slowly increasing the search radius. Each connection is given a distance of one. (I doubt any of this matters, but it's just background.)
In this case, then, the algorithm is just being used to find the smallest number of hops from the starting point to all the other nodes. And it works well for 10,000 nodes. The problem I have is that, as the number of nodes increases, say towards 2 million, I use up all of my computer's memory when trying to build the graph.
Does anyone know of an alternative way of implementing the algorithm to reduce the memory footprint, or is there another algorithm out there that uses less memory?

According to your comment above, you are representing the edges of the graph with a distance matrix long dist[GRAPHSIZE][GRAPHSIZE]. This will take O(n^2) memory, which is too much for large values of n. It is also not a good representation in terms of execution time when you only have a small number of edges: it will cause Dijkstra's algorithm to take O(n^2) time (where n is the number of nodes) when it could potentially be faster, depending on the data structures used.
Since in your case you said each node is only connected to up to 3 other nodes, you shouldn't use this matrix: instead, for each node, store a list of the nodes it is connected to. Then when you want to go over the neighbors of a node, you just iterate over this list.
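For illustration, here is a minimal sketch of Dijkstra over such an adjacency list, assuming unit edge weights as in your setup (the names, like adj, are mine, not from your code). Memory is O(n + m) rather than the O(n^2) of the matrix:

    #include <cstdint>
    #include <functional>
    #include <limits>
    #include <queue>
    #include <utility>
    #include <vector>

    // adj[u] lists the nodes u is connected to; every edge has distance 1.
    std::vector<std::uint32_t> dijkstra(const std::vector<std::vector<int>>& adj,
                                        int start) {
        const std::uint32_t INF = std::numeric_limits<std::uint32_t>::max();
        std::vector<std::uint32_t> dist(adj.size(), INF);
        using Item = std::pair<std::uint32_t, int>;   // (distance, node)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        dist[start] = 0;
        pq.push({0, start});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;           // stale queue entry, skip it
            for (int v : adj[u]) {
                if (dist[u] + 1 < dist[v]) {     // unit weight per the question
                    dist[v] = dist[u] + 1;
                    pq.push({dist[v], v});
                }
            }
        }
        return dist;
    }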
In some specific cases you don't even need to store this list because it can be calculated for each node when needed. For example, when the graph is a grid and each node is connected to the adjacent grid nodes, it's easy to find a node's neighbors on the fly.
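For example, in a hypothetical width x height grid where nodes are numbered row by row, the neighbor list can be computed on demand instead of stored:

    #include <vector>

    // Node id = y * width + x; neighbors are the 4 orthogonally adjacent cells.
    std::vector<int> gridNeighbors(int id, int width, int height) {
        std::vector<int> out;
        int x = id % width, y = id / width;
        if (x > 0)          out.push_back(id - 1);
        if (x + 1 < width)  out.push_back(id + 1);
        if (y > 0)          out.push_back(id - width);
        if (y + 1 < height) out.push_back(id + width);
        return out;
    }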

If you really cannot afford the memory, even after minimizing your graph representation, you can develop a variation of Dijkstra's algorithm based on a divide-and-conquer approach.
The idea is to split the data into smaller chunks, so you can run Dijkstra's algorithm within each chunk, for each of the points inside it.
Then treat the solution generated for each of these chunks as a single node in another chunk of data, from which you start another execution of Dijkstra.
For example, consider the points below:
.B .C
.E
.A .D
.F .G
You can select the points closest to a given node, say, within two hops, and then use the solution as part of an extended graph, collapsing the former points into a single set whose distance equals the distance computed by the Dijkstra run.
Say you start from D:
select the closest points to D within a given number of hops;
use Dijkstra's algorithm upon the selected entries, commencing from D;
use the solution as a graph with the central node D and the last nodes in the shortest paths as nodes directly linked to D;
extend the graph, repeating the algorithm until all the nodes have been considered.
Although this adds costly extra processing, it lets you get past the memory limitation, and, if you have some other machines available, you can even distribute the work.
Please note this is just the general idea; the process I've described is not necessarily the best way to do it. You may find something interesting by searching for distributed Dijkstra's algorithms.
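As a rough sketch of the first two steps above (names are illustrative): since every edge costs 1 here, the Dijkstra run over the selected entries reduces to a hop-bounded breadth-first expansion, and the returned distances are the local solution you would collapse into a super-node:

    #include <queue>
    #include <unordered_map>
    #include <vector>

    // Distances from `start` to every node reachable within `maxHops` hops.
    std::unordered_map<int, int> boundedDijkstra(
            const std::vector<std::vector<int>>& adj, int start, int maxHops) {
        std::unordered_map<int, int> dist;   // node -> hops from start
        std::queue<int> frontier;
        dist[start] = 0;
        frontier.push(start);
        while (!frontier.empty()) {
            int u = frontier.front();
            frontier.pop();
            if (dist[u] == maxHops) continue;   // do not expand past the radius
            for (int v : adj[u]) {
                if (!dist.count(v)) {
                    dist[v] = dist[u] + 1;
                    frontier.push(v);
                }
            }
        }
        return dist;
    }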

I like boost::graph a lot. Its memory consumption is very decent (I've used it on road networks with 10 million nodes and 2 GB of RAM).
It has a Dijkstra implementation, but if the goal is to implement and understand it yourself, you can still use their graph representation (I suggest the adjacency list) and compare your results with theirs to be sure your result is correct.
Some people have mentioned other algorithms. I don't think this will play a big role in memory usage, but more likely in speed. With 2M nodes, if the topology is close to a street network, the running time from one node to all others will be less than a second.
http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/index.html
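For reference, a minimal use of the BGL adjacency list plus its Dijkstra, mirroring the library's documented example (the graph contents here are made up), looks roughly like this:

    #include <iostream>
    #include <vector>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>

    int main() {
        using Graph = boost::adjacency_list<
            boost::vecS, boost::vecS, boost::undirectedS,
            boost::no_property, boost::property<boost::edge_weight_t, int>>;
        using Vertex = boost::graph_traits<Graph>::vertex_descriptor;

        Graph g(5);                   // 5 nodes, illustrative size
        boost::add_edge(0, 1, 1, g);  // every edge has weight 1
        boost::add_edge(1, 2, 1, g);
        boost::add_edge(0, 3, 1, g);
        boost::add_edge(3, 4, 1, g);

        std::vector<int> dist(boost::num_vertices(g));
        std::vector<Vertex> pred(boost::num_vertices(g));
        boost::dijkstra_shortest_paths(
            g, 0, boost::predecessor_map(&pred[0]).distance_map(&dist[0]));

        for (std::size_t v = 0; v < dist.size(); ++v)
            std::cout << "hops to " << v << ": " << dist[v] << '\n';
    }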

Related

Clustering or Filtering points in WGS84 Coordinates

So I'm trying to solve a problem. I have a point, which can be a player, and several objects around it, some farther away and some nearer. I want to exclude all points that are farther away and include the nearer ones, using distance for example. How would one cluster or filter the objects? I'm thinking about spatial partitioning. The objects are in geographic coordinates, and there can be around 10,000 of them.
If every single point is allowed to move, updates might get expensive for kd-trees or similar adaptive structures. I guess I would go for a static partitioning approach, e.g., divide the space into a set of cells (square or rectangular) and, for each cell, store references to the contained points along with the minimum and maximum coordinates of that set. When points are moving, you can trivially compute the cell they are currently in. When it comes to distance calculation, you just determine the relevant cells and then compute the distances to their contained points in linear time.
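A minimal sketch of such a static grid might look like this (names are illustrative, and the per-cell min/max bookkeeping described above is omitted for brevity); note that re-bucketing a moving point is O(1) with no restructuring:

    #include <cmath>
    #include <unordered_map>
    #include <vector>

    struct Point { double x, y; };

    struct Grid {
        double cellSize;
        // Cell coordinates packed into one 64-bit key -> ids of contained points.
        std::unordered_map<unsigned long long, std::vector<int>> cells;

        static unsigned long long pack(long long cx, long long cy) {
            return ((unsigned long long)(cx & 0xffffffffLL) << 32) |
                   (unsigned long long)(cy & 0xffffffffLL);
        }
        long long cellOf(double v) const {
            return (long long)std::floor(v / cellSize);
        }
        void insert(int id, const Point& p) {
            cells[pack(cellOf(p.x), cellOf(p.y))].push_back(id);
        }
        // Ids of all points in cells overlapping a square of half-width `radius`.
        std::vector<int> nearby(const Point& p, double radius) const {
            std::vector<int> out;
            for (long long cx = cellOf(p.x - radius); cx <= cellOf(p.x + radius); ++cx)
                for (long long cy = cellOf(p.y - radius); cy <= cellOf(p.y + radius); ++cy) {
                    auto it = cells.find(pack(cx, cy));
                    if (it != cells.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
            return out;
        }
    };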
I see three basic advantages with this approach:
By looking at the current min and max coordinates stored for each cell, you can easily determine whether or not it's empty and, if not, whether the whole set of contained points is relevant for your player's current position.
You can organize the static cells in a tree structure (e.g. a Quadtree) with perfect balancing. For each inner node of the tree you store and update the combined min and max coordinates of their child nodes. Note that updates are quite inexpensive because the tree's structure is not affected at all.
You don't need to sort your points (as it would be necessary for other structures or specific implementations). This could save you a lot of performance if objects are moving rapidly.
Building and maintaining the data structure is simple. You don't have to rack your brain over exotic test cases and complicated structure updates.
There are, of course, some drawbacks in choosing a non-adaptive data structure because it's, well, non-adaptive. For example, you highly depend on the grid cells' size. If you choose it too small (worst case: one point per cell), the tree's depth bloats up and traversing gets expensive. On the other hand, if you choose it too large (worst case: at some point, all points are in the same cell), you will perform many unneeded and potentially expensive distance calculations.
All in all, it really depends on the kind of data you have. The proposal I gave should give reasonably good results, but there probably are more efficient ways to do it. If you have enough time, implement both an adaptive and a static partitioning approach, come up with some representative tests, and compare them against each other.
Hope this helps ;)

Best graph algorithm for least transfer in an electric grid

I'm given a series of cities, and each one produces an amount of electricity and needs an amount of electricity. Each city has up to 8 adjacent cities, and I am trying to minimize the number of transfers.
If A->B transfers 10 energy, the total cost of transfer is 10.
If A->B->C transfers 10 energy (A to C through B), the total cost of transfer is 20.
I thought about using Dijkstra's algorithm on each point that needs energy, ending the search for that point once enough energy has been found, but I thought of several pitfalls.
I was wondering what else I could consider that could potentially work?
I also considered looking into the Floyd-Warshall algorithm as well as Hagerup's algorithm (I read a bit about them on Wikipedia and they seemed potentially viable).
Thanks
Your problem is easily reduced to a well-known minimum-cost flow problem:
The minimum-cost flow problem (MCFP) is to find the cheapest possible
way of sending a certain amount of flow through a flow network.
This reduction can be done the following way. Add dummy "source" and "sink" vertices to your graph, add a directed edge from the source to each original vertex with capacity equal to the production rate at that vertex, and add a directed edge from each original vertex to the sink with capacity equal to the consumption rate at that vertex. Set capacities and costs on your original edges as you need them, and solve the max-flow min-cost problem on the resulting network.
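As a sketch of just the reduction (names are illustrative; the resulting network would be fed to any standard min-cost max-flow solver, which is not shown here):

    #include <utility>
    #include <vector>

    struct Edge { int to, cap, cost, rev; };

    struct FlowNetwork {
        std::vector<std::vector<Edge>> g;
        explicit FlowNetwork(int n) : g(n) {}
        void addEdge(int u, int v, int cap, int cost) {
            g[u].push_back({v, cap, cost, (int)g[v].size()});
            g[v].push_back({u, 0, -cost, (int)g[u].size() - 1});  // residual edge
        }
    };

    // production[i] / consumption[i] per city; `lines` holds adjacent city pairs.
    FlowNetwork buildNetwork(const std::vector<int>& production,
                             const std::vector<int>& consumption,
                             const std::vector<std::pair<int,int>>& lines) {
        int n = (int)production.size();
        int source = n, sink = n + 1;
        FlowNetwork net(n + 2);
        const int INF = 1 << 30;
        for (int i = 0; i < n; ++i) {
            if (production[i] > 0)  net.addEdge(source, i, production[i], 0);
            if (consumption[i] > 0) net.addEdge(i, sink, consumption[i], 0);
        }
        for (auto [u, v] : lines) {       // each unit moved one hop costs 1
            net.addEdge(u, v, INF, 1);
            net.addEdge(v, u, INF, 1);
        }
        return net;
    }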
I also doubt that Dijkstra's algorithm, or any shortest-path algorithm, will be of much use, as they are concerned with the path of only one unit of electricity from a particular city and do not take into account "interference" effects from electricity produced in different cities. For example, if you have two cities (A and B) each producing 1 unit of energy, one more city (C) close to both A and B consuming 1 unit, and one more city (D) far away consuming 1 unit, then you will have to route energy from either A or B to D, but no shortest-path algorithm will offer you this.
Ending the search as soon as you have enough energy isn't guaranteed to find the shortest path, but letting Dijkstra run completely for each point that's a power consumer will, and is probably still reasonable to do computationally depending on the size of the network.
Look up the A* algorithm; it improves on Dijkstra with heuristics, which might remove some of the pitfalls.
I can't really think of any other algorithm.
Actually I think A* should be fine.

Generate random connectivity between nodes in a tree

I need to write a program using breadth-first search. I have a good idea about the algorithm and can implement it, but I have a small problem. In my homework I have been asked to generate random connectivity among the nodes. My thought was to generate a random number between 0 and the maximum possible number of edges to represent the total number of edges, and then, for each of those edges, randomly select two nodes. But this doesn't sound good. Need help.
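For what it's worth, a minimal sketch of the approach described above could look like this (names are illustrative; note it rejects self-loops and duplicates but does not guarantee the resulting graph is connected):

    #include <algorithm>
    #include <random>
    #include <set>
    #include <utility>
    #include <vector>

    std::vector<std::pair<int,int>> randomEdges(int n, std::mt19937& rng) {
        int maxEdges = n * (n - 1) / 2;                 // all possible edges
        std::uniform_int_distribution<int> edgeCount(0, maxEdges);
        std::uniform_int_distribution<int> node(0, n - 1);
        std::set<std::pair<int,int>> chosen;
        int target = edgeCount(rng);
        while ((int)chosen.size() < target) {
            int u = node(rng), v = node(rng);
            if (u == v) continue;                       // no self-loops
            chosen.insert({std::min(u, v), std::max(u, v)});  // no duplicates
        }
        return {chosen.begin(), chosen.end()};
    }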

Least expensive equality search for small unsorted arrays

What is the most efficient method, in terms of execution time, to search a small array of about 4 to 16 elements for an element equal to the one you're searching for, in C++? The element being searched for is a pointer in this case, so it's relatively small.
(My purpose is to prevent points in a point cloud from creating edges with points that already share an edge with them. The edge array is small for each point, but there can be a massive number of points. Also, I'm just curious too!)
Your best bet is to profile your specific application with a variety of mechanisms and see which performs best.
I suspect that, given it's unsorted, a straight linear search will be best for you. If you're able to pre-sort the array once, and it updates infrequently or never, you could pre-sort and then use a binary search.
Try a linear search; try starting with one or more binary chop stages. The former involves more comparisons on average; the latter has more scope for cache misses and branch mispredictions, and requires that the arrays are pre-sorted.
Only by measuring can you tell which is faster, and then only on the platform you measured on.
If you have to do this search more than once, and the array doesn't change often/at all, sort it and then use a binary search.
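The two candidates discussed above might be sketched like this for benchmarking (illustrative signatures; the pointers are stored as integers so the ordering the binary search needs is well defined). Only profiling on your platform can say which wins at 4 to 16 elements:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // O(n), no preconditions; usually hard to beat on tiny arrays.
    bool containsLinear(const std::vector<std::uintptr_t>& a, std::uintptr_t p) {
        return std::find(a.begin(), a.end(), p) != a.end();
    }

    // O(log n), but requires `a` to be kept sorted between queries.
    bool containsBinary(const std::vector<std::uintptr_t>& a, std::uintptr_t p) {
        return std::binary_search(a.begin(), a.end(), p);
    }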

breadth first or depth first search

I know how these algorithms work, but I can't decide when to use which.
Are there some guidelines on where one performs better than the other, or any other considerations?
Thanks very much.
If you want to find a solution with the fewest steps, or if your tree has infinite (or very large) height, you should use breadth-first.
If you have a finite tree and want to traverse all possible solutions using the smallest amount of memory then you should use depth first.
If you are searching for the best chess move to play, you could use iterative deepening, which is a combination of both.
IDDFS combines depth-first search's space-efficiency and breadth-first search's completeness (when the branching factor is finite).
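A compact sketch of IDDFS: a depth-limited DFS re-run with increasing limits (names are illustrative; adj is an adjacency list). The depth limit bounds the recursion, which is where the space-efficiency comes from:

    #include <vector>

    bool depthLimited(const std::vector<std::vector<int>>& adj,
                      int node, int goal, int limit) {
        if (node == goal) return true;
        if (limit == 0) return false;          // out of depth budget
        for (int next : adj[node])
            if (depthLimited(adj, next, goal, limit - 1))
                return true;
        return false;
    }

    bool iddfs(const std::vector<std::vector<int>>& adj,
               int start, int goal, int maxDepth) {
        // Re-explore from scratch with a growing limit, like BFS layer by layer.
        for (int limit = 0; limit <= maxDepth; ++limit)
            if (depthLimited(adj, start, goal, limit))
                return true;
        return false;
    }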
BFS is generally useful in cases where the graph has some meaningful "natural layering" (e.g., closer nodes represent "closer" results) and your goal result is likely to be located closer to the starting point or the starting points are "cheaper to search".
When you want to find the shortest path, BFS is a natural choice.
If your graph is infinite or programmatically generated, you would probably want to search closer layers before venturing further afield, as the cost of exploring remote nodes before getting to the closer ones would be prohibitive.
If accessing more remote nodes would be more expensive due to memory/disk/locality issues, BFS may again be better.
Which method to use usually depends on the application (i.e., the reason you have to search the graph): for example, topological sorting requires depth-first search, whereas the Ford-Fulkerson algorithm for finding maximum flow requires breadth-first search.
If you are traversing a tree, depth-first will use memory proportional to its depth. If the tree is reasonably balanced (or has some other limit on its depth), it may be convenient to use recursive depth-first traversal.
However, don't do this for traversing a general graph; it will likely cause a stack overflow. For unbounded trees or general graphs, you will need some kind of auxiliary storage that can expand to a size proportional to the number of input nodes. In this case, breadth-first traversal is simple and convenient.
If your problem provides a reason to prefer one node over another, you might consider using a priority queue instead of a stack (for depth-first) or a FIFO queue (for breadth-first). A priority queue will take O(log K) time (where K is the number of elements currently in the queue) to find the best node at each step, but the optimization may be worth it.
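A sketch of that container swap (names are illustrative): the same traversal loop becomes DFS with a stack, BFS with a queue, or best-first with a priority queue. Here lower priority() means "explore first", and priority() is a placeholder for your problem-specific scoring:

    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    int priority(int v) { return v; }   // illustrative stand-in

    void bestFirst(const std::vector<std::vector<int>>& adj, int start) {
        using Item = std::pair<int, int>;   // (priority, node)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
        std::vector<bool> seen(adj.size(), false);
        open.push({priority(start), start});
        seen[start] = true;
        while (!open.empty()) {
            int u = open.top().second;
            open.pop();
            // ... process u here ...
            for (int v : adj[u])
                if (!seen[v]) {
                    seen[v] = true;
                    open.push({priority(v), v});
                }
        }
    }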