How can I optimize this Dijkstra structure code? - C++

This is the Dijkstra structure I am using. However, MAXV (the maximum number of vertices) can be at most 500: every time I try to change it to something larger, the program generates an error when running.
I want to use this representation for a graph with 10000 vertices. Does anyone know how to optimize it?
#include<iostream>
#include<stdio.h>
#include<stdlib.h>
using namespace std;
#define MAXV 500
#define MAXINT 99999
typedef struct{
    int next;    // target vertex
    int weight;
}edge;
typedef struct{
    edge edges[MAXV][MAXV];   // edges[v][i] = i-th edge out of v; always MAXV*MAXV entries
    int nedges;
    int nvertices;
    int ndegree[MAXV];        // number of edges out of each vertex
}graph;
And this is my Dijkstra code:
int dijkstra(graph *g,int start,int end){
    int distance[MAXV];
    bool intree[MAXV];
    for(int i=0;i<MAXV;++i){   // valid indices are 0..MAXV-1 (was i<=MAXV, one past the end)
        intree[i]=false;
        distance[i]=MAXINT;
    }
    int v=start;
    distance[v]=0;
    while(intree[v]==false){
        intree[v]=true;
        // relax every edge leaving v
        for(int i=0;i<g->ndegree[v];++i){
            int cand=g->edges[v][i].next;
            int weight=g->edges[v][i].weight;
            if(distance[cand] > distance[v]+weight){
                distance[cand] = distance[v]+weight;
            }
        }
        // pick the closest vertex not yet in the tree (O(n) scan)
        int dist=MAXINT;
        for(int i=0;i<g->nvertices;++i){
            if((intree[i]==false) && (dist > distance[i])){
                dist=distance[i];
                v=i;
            }
        }
    }
    return distance[end];
}

Use adjacency lists for storing the graph. Right now you're using an adjacency matrix, which means that you allocate MAXV*MAXV*sizeof(edge) bytes just for that. With MAXV = 10 000 and an 8-byte edge (two 4-byte ints), that is about 800 MB, so you're probably getting a segmentation fault. Switching to adjacency lists will get rid of the error, because memory then grows with the number of edges actually present.
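A minimal sketch of what the switch to adjacency lists could look like, assuming non-negative integer weights. Edge, Graph, addEdge and dijkstraNaive are hypothetical names; the selection loop is the same O(n^2) scan as in the question, but the edges live in std::vector storage on the heap, so memory grows with V + E instead of being a fixed MAXV*MAXV array.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// One outgoing edge: target vertex and non-negative weight.
struct Edge {
    int next;
    int weight;
};

// adj[v] holds only the edges that actually leave v, so memory is O(V + E).
using Graph = std::vector<std::vector<Edge>>;

// Hypothetical helper: adds an undirected edge u-v with weight w.
void addEdge(Graph& g, int u, int v, int w) {
    g[u].push_back({v, w});
    g[v].push_back({u, w});
}

// Same O(V^2) scheme as the question: relax v's edges, then scan for the
// closest vertex not yet in the tree. Returns INT_MAX if end is unreachable.
int dijkstraNaive(const Graph& g, int start, int end) {
    const int n = static_cast<int>(g.size());
    std::vector<int> distance(n, INT_MAX);
    std::vector<bool> intree(n, false);
    distance[start] = 0;
    int v = start;
    while (v != -1) {
        intree[v] = true;
        for (const Edge& e : g[v]) {
            if (distance[v] + e.weight < distance[e.next])
                distance[e.next] = distance[v] + e.weight;
        }
        v = -1;  // pick the closest unvisited, reachable vertex
        for (int i = 0; i < n; ++i) {
            if (!intree[i] && distance[i] != INT_MAX &&
                (v == -1 || distance[i] < distance[v]))
                v = i;
        }
    }
    return distance[end];
}
```

A 10 000-vertex graph is then just Graph g(10000); with edges appended as they are read.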
However, even with adjacency lists, the Dijkstra algorithm you have right now is O(n^2), where n is the number of nodes, because each step scans all nodes to find the closest one. That's still a lot for 10 000 nodes. Consider implementing Dijkstra with a heap (priority queue) if you have to support this many nodes.
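One common way to do this, sketched under the assumption of non-negative integer weights: a binary min-heap via std::priority_queue with "lazy deletion", where an improved distance is pushed as a new entry and outdated entries are skipped when popped, instead of decreasing a key in place. AdjList and dijkstraHeap are hypothetical names. This runs in O((V + E) log V).

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] = list of (neighbor, weight) pairs; weights must be non-negative.
using AdjList = std::vector<std::vector<std::pair<int, int>>>;

// Dijkstra with a binary min-heap and lazy deletion. Unreachable vertices
// keep the value std::numeric_limits<long long>::max().
std::vector<long long> dijkstraHeap(const AdjList& adj, int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    typedef std::pair<long long, int> Item;  // (distance, vertex)
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.push(Item(0, source));
    while (!pq.empty()) {
        long long d = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;  // stale entry: a shorter path was found later
        for (const auto& e : adj[u]) {
            int v = e.first;
            if (d + e.second < dist[v]) {
                dist[v] = d + e.second;
                pq.push(Item(dist[v], v));  // new entry instead of decrease-key
            }
        }
    }
    return dist;
}
```

Lazy deletion trades a few extra heap entries (at most one per successful relaxation) for not needing a decrease-key operation, which std::priority_queue does not provide.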


Prim's Algorithm using adjacency list in C++

I am implementing a simple version of Prim's algorithm with an adjacency list, using a basic graph implementation. Here is my approach:
1. Pick an index.
2. Inside the prims function, mark the index as visited.
3. If an adjacent vertex is not visited and the cost of that vertex is less than mi (which is initialized to INF at the start of the function), then mi stores the cost and parent stores the index.
4. After the last adjacent vertex, mi holds the smallest cost and parent holds the index of that smallest-cost vertex.
5. Call prims again for the parent vertex.
6. Finally, return the total cost.
But the problem with my approach is that comparing cost[v] with mi gives me an error, because I am comparing an int with a vector. I have tried making mi a vector, but then I get an error in the result section. My code is given below:
#include<bits/stdc++.h>
using namespace std;
#define mx 10005
#define INF 1000000007
vector<int>graph[mx],cost[mx];
int visited[mx]={0};
int result=0,parent;
int n,e,x,y,w,mi;
int prims(int u)
{
    mi=INF;
    visited[u]=1;
    for(int i=0;i<graph[u].size();++i)
    {
        int v=graph[u][i];
        if(visited[v]==0 and cost[v]<mi)
        {
            mi=cost[v];
            parent=v;
        }
        if(i==graph[u].size()-1)
        {
            result+=mi;
            prims(parent);
        }
    }
    return result;
}
int main()
{
    cin>>n>>e;
    for(int i=1;i<=e;++i)
    {
        cin>>x>>y>>w;
        graph[x].push_back(y);
        graph[y].push_back(x);
        cost[x].push_back(w);
        cost[y].push_back(w);
    }
    int src;cin>>src;
    int p=prims(src);
    cout<<p<<endl;
    return 0;
}
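The compile error comes from cost[v] being a whole vector<int>: with the question's layout, the weight of the edge from u to graph[u][i] is cost[u][i]. Separately, choosing the cheapest edge only among the neighbours of the vertex just added is not Prim's algorithm; Prim's takes the cheapest edge leaving the whole tree built so far. A sketch of the common "lazy" priority-queue variant, assuming each adjacency entry stores a (neighbor, weight) pair; WAdj and primMST are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[u] = (neighbor, weight) pairs, so each weight sits next to its
// neighbor instead of in a parallel cost[] vector.
using WAdj = std::vector<std::vector<std::pair<int, int>>>;

// Lazy Prim's algorithm: the min-heap holds edges leaving the current tree;
// the cheapest one that reaches an unvisited vertex is added next.
// Returns the total weight of the minimum spanning tree of src's component.
long long primMST(const WAdj& adj, int src) {
    std::vector<bool> visited(adj.size(), false);
    typedef std::pair<int, int> Item;  // (edge weight, vertex it leads to)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    long long total = 0;
    pq.push(Item(0, src));  // zero-cost "edge" into the start vertex
    while (!pq.empty()) {
        int w = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (visited[u]) continue;  // edge leads back into the tree
        visited[u] = true;
        total += w;
        for (const auto& e : adj[u])
            if (!visited[e.first]) pq.push(Item(e.second, e.first));
    }
    return total;
}
```

With the question's input loop, each edge would be stored as adj[x].push_back({y, w}); adj[y].push_back({x, w}); instead of filling graph[] and cost[] separately.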

Fast element-wise access in Eigen::SparseMatrix in Latent Dirichlet Allocation

I am implementing Latent Dirichlet Allocation (LDA) in Rcpp. In LDA, we need to deal with a huge sparse matrix (e.g. 50 x 3000).
I decided to use SparseMatrix in Eigen. However, since I need access to each cell, the computationally expensive .coeffRef slows down my function a lot.
Is there any way to use SparseMatrix while keeping the speed?
What I want to do has four steps:
1. I know which cell (i,j) I want to access.
2. I want to know whether the cell (i,j) is 0 or not.
3. If the cell (i,j) is not 0, I want to know its value.
4. After doing some analysis with the values from steps 2 and 3, I want to update the cell (i,j). In this step, I might need to update a cell (i,j) that was originally 0.
#include <iostream>
#include <vector>
#include <Eigen/Dense>
#include <Eigen/Sparse>
using namespace std;
using namespace Eigen;
typedef Eigen::Triplet<double> T;
int main(){
    Eigen::SparseMatrix<double> spmat;
    // Insert in spmat
    vector<T> tripletList;
    int value = 0;  // initialized, since it is also read when (i,j) is 0
    tripletList.push_back(T(0,1,1));
    tripletList.push_back(T(0,3,2));
    tripletList.push_back(T(1,5,3));
    tripletList.push_back(T(2,4,4));
    tripletList.push_back(T(4,1,5));
    tripletList.push_back(T(4,5,6));
    spmat.resize(5,7); // define size
    spmat.setFromTriplets(tripletList.begin(), tripletList.end());
    for(int i=0; i<5; i++){ // I am accessing all cells just to clarify I need to access cell
        for(int j=0; j<7; j++){
            // Check if (i,j) is 0 (note: coeffRef inserts (i,j) if it is not stored yet)
            if(spmat.coeffRef(i,j) != 0){
                // Some analysis
                value = spmat.coeffRef(i,j)*2; // just an example, more complex in the model
            }
            spmat.coeffRef(i,j) += value; // update (i,j)
        }
    }
    cout << spmat << endl;
    return 0;
}
Since the number of rows is much smaller than the number of columns, I considered accessing a column and then checking the row values, but I couldn't figure out how to use SparseMatrix<double>::InnerIterator it(spmat, colid).

Finding nearest neighbor(s) in a KD Tree

Warning: Fairly long question, perhaps too long. If so, I apologize.
I'm working on a program involving a nearest neighbor(s) search of a kd tree (in this example, it is an 11-dimensional tree with 3961 individual points). We've only just learned about them, and while I have a good grasp of what the tree does, I get very confused when it comes to the nearest neighbor search.
I've set up a 2D array of points, each containing a quality and a location, which looks like this:
struct point{
    double quality;
    double location;
};
// in main
point **parray;
// later points to an array of [3961][11] points
I then translated the data so it has zero mean, and rescaled it for unit variance. I won't post the code as it's not important to my questions. Afterwards, I built the points into the tree in random order like this:
struct Node {
    point *key;
    Node *left;
    Node *right;
    Node (point *k) { key = k; left = right = NULL; }
};
Node *kd = NULL;
// Build the data into a kd-tree
random_shuffle(parray, &parray[n]);
for(int val=0; val<n; val++) {
    for(int dim=1; dim<D+1; dim++) {
        kd = insert(kd, &parray[val][dim], dim);
    }
}
Pretty standard stuff. If I've used random_shuffle() incorrectly, or if anything is inherently wrong about the structure of my tree, please let me know. It should shuffle the first dimension of the parray, while leaving the 11 dimensions of each in order and untouched.
Now I'm on to the neighbor() function, and here's where I've gotten confused.
The neighbor() function (last half is pseudocode, where I frankly have no idea where to start):
Node *neighbor (Node *root, point *pn, int d,
                Node *best, double bestdist) {
    double dist = 0;
    // Recursively move down tree, ignore the node we are comparing to
    if(!root || root->key == pn) return NULL;
    // Dist = SQRT of the SUMS of SQUARED DIFFERENCES of qualities
    for(int dim=1; dim<D+1; dim++)
        dist += pow(pn[d].quality - root->key->quality, 2);
    dist = sqrt(dist);
    // If T is better than current best, current best = T
    if(!best || dist<bestdist) {
        bestdist = dist;
        best = root;
    }
    // If the dist doesn't reach a plane, prune search, walk back up tree
    // Else traverse down that tree
    // Process root node, return
}
Here's the call to neighbor in main(), mostly incomplete. I'm not sure what should be in main() and what should be in the neighbor() function:
// Nearest neighbor(s) search
double avgdist = 0.0;
// For each neighbor
for(int i=0; i<n; i++) {
    // Should this be an array/tree of x best neighbors to keep track of them?
    Node *best;
    double bestdist = 1000000000;
    // Find nearest neighbor(s)?
    for(int i=0; i<nbrs; i++) {
        neighbor(kd, parray[n], 1, best, &bestdist);
    }
    // Determine "distance" between the two?
    // Add to total dist?
    avgdist += bestdist;
}
// Average the total dist
// avgdist /= n;
As you can see, I'm stuck on these last two sections of code. I've been wracking my brain over this for a few days now, and I'm still stuck. It's due very soon, so of course any and all help is appreciated. Thanks in advance.
The kd-tree does not involve shuffling.
In fact, you will want to use sorting (or better, quickselect) to build the tree.
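A sketch of the sorting/quickselect build, assuming each point is a std::vector<double> holding all D coordinates, so one node stores one whole point rather than one node per coordinate. std::nth_element places the median along the current axis in O(n) on average without fully sorting the range; KdNode and build are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// A point with all of its D coordinates.
using Point = std::vector<double>;

struct KdNode {
    Point p;
    int axis;  // coordinate used to split at this node
    std::unique_ptr<KdNode> left, right;
};

// Build a balanced kd-tree on pts[lo, hi): move the median along `axis` to
// position mid with std::nth_element (quickselect), make it the node, and
// recurse on both halves with the next axis, cycling through all D axes.
std::unique_ptr<KdNode> build(std::vector<Point>& pts, int lo, int hi, int axis) {
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    std::unique_ptr<KdNode> node(new KdNode{pts[mid], axis, nullptr, nullptr});
    int next = (axis + 1) % static_cast<int>(pts[mid].size());
    node->left = build(pts, lo, mid, next);       // coordinates <= median
    node->right = build(pts, mid + 1, hi, next);  // coordinates >= median
    return node;
}
```

Usage: build(points, 0, (int)points.size(), 0) on the rescaled data; the tree depth is about log2(n), independent of insertion order, which is why no shuffling is needed.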
First solve it for the nearest neighbor (1NN). It should be fairly clear how to find the kNN once you have this part working, by keeping a heap of the top candidates, and using the kth point for pruning.
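A sketch of that search, assuming points stored as std::vector<double> with one node per point and a median-split build (repeated here so the block compiles on its own). Candidates are kept in a max-heap of size k whose top is the current k-th best, and the far subtree is visited only if the splitting plane is closer than that distance; with k = 1 this is the plain 1NN search. Squared distances are compared, which avoids sqrt without changing the ordering. Node, build, knn and the skip parameter (an index to exclude, playing the role of the question's root->key == pn check) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

using Point = std::vector<double>;

struct Node {
    int idx;   // index of this node's point in pts
    int axis;  // coordinate this node splits on
    std::unique_ptr<Node> left, right;
};

// Median-split build over an index array, using std::nth_element.
std::unique_ptr<Node> build(const std::vector<Point>& pts, std::vector<int>& ids,
                            int lo, int hi, int axis) {
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    std::nth_element(ids.begin() + lo, ids.begin() + mid, ids.begin() + hi,
                     [&](int a, int b) { return pts[a][axis] < pts[b][axis]; });
    std::unique_ptr<Node> node(new Node{ids[mid], axis, nullptr, nullptr});
    int next = (axis + 1) % static_cast<int>(pts[0].size());
    node->left = build(pts, ids, lo, mid, next);
    node->right = build(pts, ids, mid + 1, hi, next);
    return node;
}

double sqDist(const Point& a, const Point& b) {
    double s = 0;
    for (std::size_t d = 0; d < a.size(); ++d) s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

// Max-heap of (squared distance, point index); top() is the current k-th best.
typedef std::priority_queue<std::pair<double, int>> Heap;

// Collect the k nearest points to q in `best`, ignoring the point with index skip.
void knn(const Node* node, const std::vector<Point>& pts, const Point& q,
         std::size_t k, int skip, Heap& best) {
    if (!node) return;
    if (node->idx != skip) {
        double d = sqDist(pts[node->idx], q);
        if (best.size() < k) {
            best.push(std::make_pair(d, node->idx));
        } else if (d < best.top().first) {
            best.pop();  // drop the current k-th best
            best.push(std::make_pair(d, node->idx));
        }
    }
    double diff = q[node->axis] - pts[node->idx][node->axis];
    const Node* nearSide = diff < 0 ? node->left.get() : node->right.get();
    const Node* farSide = diff < 0 ? node->right.get() : node->left.get();
    knn(nearSide, pts, q, k, skip, best);
    // Prune: the far side can only hold a better point if the splitting
    // plane is closer than the k-th best distance (or we have < k candidates).
    if (best.size() < k || diff * diff < best.top().first)
        knn(farSide, pts, q, k, skip, best);
}
```

For the question's average-distance loop, one would call knn(root.get(), pts, pts[i], nbrs, i, heap) for each i and read the distances out of the heap.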

Creating a random undirected graph in C++

The issue is that I need to create a random undirected graph to benchmark Dijkstra's algorithm using an array and a heap to store the vertices. AFAIK a heap implementation should be faster than an array on sparse and average graphs; however, on dense graphs the heap should become less efficient than an array.
I tried to write code that produces a graph based on the input: the number of vertices and the total number of edges (the maximum number of edges in an undirected graph is n(n-1)/2).
On input, I divide the total number of edges by the number of vertices so that I have a constant number of edges coming out of every vertex. The graph is represented by an adjacency list. Here is what I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <list>
#include <set>
#define MAX 1000
#define MIN 1
class Vertex
{
public:
    int Number;
    int Distance;
    Vertex(void);
    Vertex(int, int);
    ~Vertex(void);
};
Vertex::Vertex(void)
{
    Number = 0;
    Distance = 0;
}
Vertex::Vertex(int C, int D)
{
    Number = C;
    Distance = D;
}
Vertex::~Vertex(void)
{
}
int main()
{
    int VertexNumber, EdgeNumber;
    while(scanf("%d %d", &VertexNumber, &EdgeNumber) > 0)
    {
        int EdgesFromVertex = (EdgeNumber/VertexNumber);
        std::list<Vertex>* Graph = new std::list<Vertex> [VertexNumber];
        srand(time(NULL));
        int Distance, Neighbour;
        bool Exist, First;
        std::set<std::pair<int, int>> Added;
        for(int i = 0; i < VertexNumber; i++)
        {
            for(int j = 0; j < EdgesFromVertex; j++)
            {
                First = true;
                Exist = true;
                while(First || Exist)
                {
                    Neighbour = rand() % (VertexNumber - 1) + 0;
                    if(!Added.count(std::pair<int, int>(i, Neighbour)))
                    {
                        Added.insert(std::pair<int, int>(i, Neighbour));
                        Exist = false;
                    }
                    First = false;
                }
            }
            First = true;
            std::set<std::pair<int, int>>::iterator next = Added.begin();
            for(std::set<std::pair<int, int>>::iterator it = Added.begin(); it != Added.end();)
            {
                if(!First)
                    Added.erase(next);
                Distance = rand() % MAX + MIN;
                Graph[it->first].push_back(Vertex(it->second, Distance));
                Graph[it->second].push_back(Vertex(it->first, Distance));
                std::set<std::pair<int, int>>::iterator next = it;
                First = false;
            }
        }
        // Dijkstra's implementation
    }
    return 0;
}
I get the error "set iterator not dereferencable" when trying to create the graph from the set data.
I know it has something to do with erasing set elements on the fly; however, I need to erase them ASAP to reduce memory usage.
Maybe there's a better way to create an undirected graph? Mine is pretty raw, but it's the best I came up with. I was thinking about making a directed graph, which is an easier task, but it doesn't ensure that every two vertices will be connected.
I would be grateful for any tips and solutions!
Piotry had basically the same idea I did, but he left off a step.
Only read half of the matrix, and ignore the diagonal when writing values. If you always want a node to have an edge to itself, put a one on the diagonal. If you never want a node to have an edge to itself, leave it as zero.
You can read the other half of your matrix as a second graph for testing your implementation.
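A sketch of the half-matrix idea without materialising the matrix: loop only over pairs (i, j) with i < j (the upper triangle), so each unordered pair is tried exactly once, duplicate edges are impossible, and the diagonal (self-loops) is never touched. The function name, the edge probability p, and the seed parameter are assumptions for illustration; each edge is kept independently with probability p (a G(n, p) random graph), which gives about p·n(n-1)/2 edges on average rather than an exact count.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// adj[u] = (neighbor, weight) pairs; every edge is stored in both directions.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Try each unordered pair once via the upper triangle (i < j). Each edge is
// kept with probability p and gets a uniform weight in [1, maxWeight].
Graph randomUndirectedGraph(int n, double p, int maxWeight, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution keep(p);
    std::uniform_int_distribution<int> weight(1, maxWeight);
    Graph g(n);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (keep(rng)) {
                int w = weight(rng);
                g[i].push_back(std::make_pair(j, w));
                g[j].push_back(std::make_pair(i, w));  // same edge, other direction
            }
    return g;
}
```

Varying p from small values towards 1 sweeps from sparse to dense graphs, which is what the array-vs-heap benchmark needs.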
Look at the description of std::set::erase:
Iterator validity: Iterators, pointers and references referring to elements removed by the function are invalidated. All other iterators, pointers and references keep their validity.
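In your code, if next is equal to it and you erase the element of the std::set through next, then it is invalidated and you can't use it. In that case you must (at least) move it to another element first, and only then keep using it. Note that in your loop next in fact always equals it: it is never incremented, and the next declared at the end of the loop body is a new variable that shadows the outer one, so the outer next stays at Added.begin().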
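A sketch of the usual pattern for erasing while iterating, assuming C++11: std::set::erase(iterator) returns an iterator to the element after the erased one, so assigning that back to it keeps it valid while each pair is freed right after use. drainIntoAdjacency is a hypothetical name, and edge weights are omitted to keep it short.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Move every pair out of `added` into an adjacency list, freeing each set
// node as soon as it has been used.
std::vector<std::vector<int>> drainIntoAdjacency(std::set<std::pair<int, int>>& added, int n) {
    std::vector<std::vector<int>> adj(n);
    for (auto it = added.begin(); it != added.end(); ) {
        adj[it->first].push_back(it->second);
        adj[it->second].push_back(it->first);
        it = added.erase(it);  // erase, then continue from the returned successor
    }
    return adj;
}
```

Before C++11, the same effect is obtained with added.erase(it++), which advances it before the old element is removed.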

Optimizing a dijkstra implementation

Question edited: now I only want to know whether a priority queue can be used to improve the algorithm.
I found this implementation of a min cost max flow algorithm, which uses Dijkstra: http://www.stanford.edu/~liszt90/acm/notebook.html#file2
I'm pasting it here in case it gets lost in the internet void:
// Implementation of min cost max flow algorithm using adjacency
// matrix (Edmonds and Karp 1972). This implementation keeps track of
// forward and reverse edges separately (so you can set cap[i][j] !=
// cap[j][i]). For a regular max flow, set all edge costs to 0.
//
// Running time, O(|V|^2) cost per augmentation
// max flow: O(|V|^3) augmentations
// min cost max flow: O(|V|^4 * MAX_EDGE_COST) augmentations
//
// INPUT:
// - graph, constructed using AddEdge()
// - source
// - sink
//
// OUTPUT:
// - (maximum flow value, minimum cost value)
// - To obtain the actual flow, look at positive values only.
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>
#include <iostream>
using namespace std;
typedef vector<int> VI;
typedef vector<VI> VVI;
typedef long long L;
typedef vector<L> VL;
typedef vector<VL> VVL;
typedef pair<int, int> PII;
typedef vector<PII> VPII;
const L INF = numeric_limits<L>::max() / 4;
struct MinCostMaxFlow {
    int N;
    VVL cap, flow, cost;
    VI found;
    VL dist, pi, width;
    VPII dad;
    MinCostMaxFlow(int N) :
        N(N), cap(N, VL(N)), flow(N, VL(N)), cost(N, VL(N)),
        found(N), dist(N), pi(N), width(N), dad(N) {}
    void AddEdge(int from, int to, L cap, L cost) {
        this->cap[from][to] = cap;
        this->cost[from][to] = cost;
    }
    void Relax(int s, int k, L cap, L cost, int dir) {
        L val = dist[s] + pi[s] - pi[k] + cost;
        if (cap && val < dist[k]) {
            dist[k] = val;
            dad[k] = make_pair(s, dir);
            width[k] = min(cap, width[s]);
        }
    }
    L Dijkstra(int s, int t) {
        fill(found.begin(), found.end(), false);
        fill(dist.begin(), dist.end(), INF);
        fill(width.begin(), width.end(), 0);
        dist[s] = 0;
        width[s] = INF;
        while (s != -1) {
            int best = -1;
            found[s] = true;
            for (int k = 0; k < N; k++) {
                if (found[k]) continue;
                Relax(s, k, cap[s][k] - flow[s][k], cost[s][k], 1);
                Relax(s, k, flow[k][s], -cost[k][s], -1);
                if (best == -1 || dist[k] < dist[best]) best = k;
            }
            s = best;
        }
        for (int k = 0; k < N; k++)
            pi[k] = min(pi[k] + dist[k], INF);
        return width[t];
    }
    pair<L, L> GetMaxFlow(int s, int t) {
        L totflow = 0, totcost = 0;
        while (L amt = Dijkstra(s, t)) {
            totflow += amt;
            for (int x = t; x != s; x = dad[x].first) {
                if (dad[x].second == 1) {
                    flow[dad[x].first][x] += amt;
                    totcost += amt * cost[dad[x].first][x];
                } else {
                    flow[x][dad[x].first] -= amt;
                    totcost -= amt * cost[x][dad[x].first];
                }
            }
        }
        return make_pair(totflow, totcost);
    }
};
My question is whether it can be improved by using a priority queue inside Dijkstra(). I tried, but I couldn't get it to work properly.
Actually, I suspect that Dijkstra should be looping over adjacent nodes, not all nodes...
Thanks a lot.
Surely Dijkstra's algorithm can be improved by using a min-heap. After we put a vertex into the shortest-path tree and process (i.e. label) all of its adjacent vertices, our next step is to select the vertex with the smallest label that is not yet in the tree.
This is where a min-heap comes to mind. Rather than sequentially scanning through all vertices, we extract the min element from the heap and restore the heap property, which takes O(log n) time vs. O(n). Note that the heap only needs to keep vertices that are not yet in the shortest-path tree. However, we need a way to modify vertices in the heap when we update their labels (a decrease-key operation).
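One way to get decrease-key with only the standard library is to use a std::set of (label, vertex) pairs as the "heap": begin() is the minimum, and a label update is an erase of the old pair plus an insert of the new one, each O(log V). This is a swapped-in technique, not what the Stanford code does; it assumes adjacency lists with non-negative weights, and AdjList and dijkstraSet are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <set>
#include <utility>
#include <vector>

// adj[u] = (neighbor, weight) pairs with non-negative weights.
using AdjList = std::vector<std::vector<std::pair<int, long long>>>;

// The set holds only vertices not yet in the shortest-path tree, ordered by
// their current label; unreachable vertices end with LLONG_MAX.
std::vector<long long> dijkstraSet(const AdjList& adj, int source) {
    std::vector<long long> dist(adj.size(), LLONG_MAX);
    std::set<std::pair<long long, int>> frontier;
    dist[source] = 0;
    frontier.insert(std::make_pair(0LL, source));
    while (!frontier.empty()) {
        int u = frontier.begin()->second;  // extract-min
        frontier.erase(frontier.begin());
        for (const auto& e : adj[u]) {
            int v = e.first;
            long long nd = dist[u] + e.second;
            if (nd < dist[v]) {
                if (dist[v] != LLONG_MAX)
                    frontier.erase(std::make_pair(dist[v], v));  // remove old label
                dist[v] = nd;
                frontier.insert(std::make_pair(nd, v));  // decreased label
            }
        }
    }
    return dist;
}
```

Each vertex appears in the set at most once, so the set never grows beyond V entries.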
I am not so sure using a priority queue to implement Dijkstra's algorithm will actually improve the running time. While a priority queue decreases the time needed to find the vertex with minimum distance from the source (O(log V) with a priority queue vs. O(V) in the naive implementation), it also increases the time needed to relax each edge (O(log V) with a priority queue vs. O(1) in the naive implementation).
Thus, for the naive implementation, the running time is O(V^2+E).
However, for the priority queue implementation, the running time is O(V log V+E log V).
For very dense graphs, E could be O(V^2), which means the naive implementation would have running time O(V^2+V^2)=O(V^2) while the priority queue implementation would have running time O(V log V+V^2 log V)=O(V^2 log V). Thus, as you can see, the priority queue implementation actually has a worse worst-case run time in the case of dense graphs.
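To put rough numbers on this, here are operation counts for V = 10 000, ignoring constant factors; E = 4V is only an illustrative sparse density:

```latex
V = 10^4,\quad \log_2 V \approx 13.3 \\
\text{dense } (E = V^2 = 10^8):\quad V^2 + E = 2\times 10^8,\qquad (V+E)\log_2 V \approx 1.3\times 10^9 \\
\text{sparse } (E = 4V = 4\times 10^4):\quad V^2 + E \approx 10^8,\qquad (V+E)\log_2 V \approx 6.6\times 10^5
```

In this count the heap version does roughly 6.6 times more work on the dense graph but over 100 times less on the sparse one.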
Since the authors of the above implementation stored the edges in an adjacency matrix rather than adjacency lists, they were apparently expecting dense graphs with O(V^2) edges, so it makes sense that they used the naive implementation rather than a priority queue here.
For more information about the running time of Dijkstra's algorithm, see the Wikipedia article on Dijkstra's algorithm.