create pairs of vertices using adjacency list in linear time - c++

I have n vertices numbered 1...n and want to pair every vertex with every other vertex, which gives n*(n-1)/2 edges. Each vertex has a strength; the weight of an edge is the absolute difference between the strengths of its two endpoints. I need the total weight over all edges. With two nested loops I can do this in O(n^2) time, but I want to reduce that. I could use an adjacency list and build a graph of n*(n-1)/2 edges, but how would I create the adjacency list without two loops? The input is only the number of vertices and the strength of each vertex.
for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
    {
        int w = abs(strength[i] - strength[j]);
        sum += w;
    }
This is what I did earlier; I need a better way to do it.

If there are O(N*N) edges, then you can't list them all in linear time.
However, if all you really need is the sum, here's a solution in O(N*log(N)). You can improve it further by using an O(N) sorting algorithm instead, such as radix sort.
#include <algorithm>
#include <cstdint>
// ...
std::sort(strength, strength + n);            // ascending order
uint64_t sum = 0;
int64_t runSum = strength[0];                 // running sum of strength[0..i-1]
for (int i = 1; i < n; i++) {
    sum += int64_t(i) * strength[i] - runSum; // the i edge weights ending at vertex i
    runSum += strength[i];
}
// Now "sum" contains the sum of weights over all edges
To explain the algorithm:
The idea is to avoid summing over all edges explicitly (which requires O(N*N) work) and instead add the weights of many edges at once. Consider the last vertex n-1 and the average A[n-1] = (strength[0] + strength[1] + ... + strength[n-2])/(n-1). We could add (strength[n-1] - A[n-1]) * (n-1), i.e. n-1 edge weights in one step, if strength[n-1] were larger (or smaller) than all the other strengths. Because of the abs operation, however, the contribution depends on whether the other vertex's strength is larger or smaller than the current vertex's strength. The fix is to sort the strengths first, which guarantees that each strength is greater than or equal to all the ones before it.
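As a quick sanity check of the formula, here is a small self-contained program (my own sketch, not part of the original answer) that compares the O(n^2) double loop with the sorted prefix-sum formula on an example array of strengths:
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <iostream>

int main() {
    int strength[] = {4, 1, 7, 3};   // example strengths
    const int n = 4;

    // Brute force: O(n^2) sum of |strength[i] - strength[j]| over all pairs
    uint64_t brute = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            brute += std::abs(strength[i] - strength[j]);

    // Sorted prefix-sum formula: O(n log n)
    std::sort(strength, strength + n);
    uint64_t fast = 0;
    int64_t runSum = strength[0];    // sum of strengths seen so far
    for (int i = 1; i < n; i++) {
        fast += int64_t(i) * strength[i] - runSum;  // the i edge weights ending at vertex i
        runSum += strength[i];
    }

    std::cout << brute << " == " << fast << '\n';   // both print 19
    return 0;
}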

Related

Hashing pairs of integers to array indices?

I have a set of sequential, 0-based integers representing vertex indices in a mesh.
Every vertex is connected to at least 2 other vertices, to form edges of the mesh.
Edges are represented by pairs of vertices. So, for example, (0, 2) might be one edge between vertex 0 and 2.
Currently, in order to quickly lookup edges in my mesh, I store my Edge class in a std::unordered_map, and generate hashes as follows:
// sorted so (0, 2) and (2, 0) will return the same hash
__int64 GetEdgeHash(int vertex1, int vertex2)
{
    return (__int64)min(vertex1, vertex2) * INT_MAX + max(vertex1, vertex2);
}
However, an unordered_map has enough overhead during creation and lookup that it has a noticeable performance impact elsewhere in my code. I'm wondering if there's a way to hash pairs of integers such that each pair corresponds to some index in an array whose size is <= numVertices * 2 (since the number of edges in a mesh could never exceed that value). If that were possible, I could just use a normal std::vector to store my edges, and processing them would be much faster.
Obviously that's not currently possible since my hash function will return values anywhere from 0 to 4611686016279904256.
A naive approach like:
int GetEdgeHash(int vertex1, int vertex2)
{
    return vertex1 + vertex2;
}
would satisfy the array size limitation, but obviously results in many collisions.
Is there another way to achieve the same goal?
A very simple solution could be based on your initial approach, however not using INT_MAX, but the number of existing vertices:
uint64_t numberOfVertices;

uint64_t index(uint32_t vertex1, uint32_t vertex2)
{
    return vertex1 * numberOfVertices + vertex2;
}
This scheme is collision free, but it requires a vector whose size is the square of numberOfVertices; as written, it is only applicable if you have a fixed (or at least a maximum) number of vertices.
If the number of vertices might grow beyond the maximum, you could e.g. double the maximum each time it is exceeded. That, however, requires re-"hashing" all the nodes in the vector, i.e. it is an expensive operation and should occur as rarely as possible (doubling the maximum might already make it rare enough).
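One aside, offered as a sketch rather than as part of the answer above: the question's original GetEdgeHash normalized the pair with min/max so that (0, 2) and (2, 0) map to the same value. If that property is still wanted with the numberOfVertices-based index, the endpoints can be ordered before indexing, for example:
#include <algorithm>
#include <cstdint>

uint64_t numberOfVertices;   // fixed (or maximum) vertex count, as above

// Order-independent variant: (0, 2) and (2, 0) land in the same slot.
uint64_t symmetricIndex(uint32_t vertex1, uint32_t vertex2)
{
    uint64_t lo = std::min(vertex1, vertex2);
    uint64_t hi = std::max(vertex1, vertex2);
    return lo * numberOfVertices + hi;
}
The backing vector still needs numberOfVertices * numberOfVertices slots, exactly as stated above; the only change is that both orderings of an edge share one index.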

Merging K Sorted Arrays/Vectors Complexity

While looking into the problem of merging k sorted contiguous arrays/vectors, and how it differs in implementation from merging k sorted linked lists, I found two relatively easy naive solutions for merging k contiguous arrays and a nice optimized method based on pairwise merging that mimics how mergeSort() works. The two naive solutions I implemented seem to have the same complexity, but in a big randomized test I ran it seems one is far less efficient than the other.
Naive merging
My naive merging method works as follows. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes two vectors and returns one is asymptotically linear in both space and time in the number of elements of both vectors, the total complexity will be O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we're adding 1n + 2n + 3n + ... + kn, I believe the total complexity is O(n*k^2). Consider the following code:
vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
    vector<int> finalList = multiList[0];
    for (int j = 1; j < multiList.size(); ++j) {
        finalList = mergeLists(multiList[j], finalList);
    }
    return finalList;
}
Naive selection
My second naive solution works as follows:
/**
* The logic behind this algorithm is fairly simple and inefficient.
* Basically we want to start with the first values of each of the k
* vectors, pick the smallest value and push it to our finalList vector.
* We then need to be looking at the next value of the vector we took the
* value from so we don't keep taking the same value. A vector of vector
* iterators is used to hold our position in each vector. While all iterators
* are not at the .end() of their corresponding vector, we maintain a minValue
* variable initialized to INT_MAX, and a minValueIndex variable and iterate over
* each of the k vector iterators and if the current iterator is not an end position
* we check to see if it is smaller than our minValue. If it is, we update our minValue
* and set our minValue index (this is so we later know which iterator to increment after
* we iterate through all of them). We do a check after our iteration to see if minValue
* still equals INT_MAX. If it does, all iterators are at the .end() position, we have
* exhausted every vector, and we can stop iterating over all k of them. Regarding the complexity
* of this method, we are iterating over `k` vectors so long as at least one value has not been
* accounted for. Since there are `nk` values where `n` is the average number of elements in each
* list, the time complexity = O(nk^2) like our other naive method.
*/
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
    vector<int> finalList;
    vector<vector<int>::const_iterator> iterators(multiList.size());
    // Set all iterators to the beginning of their corresponding vectors in multiList
    for (int i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();
    int minValue, minValueIndex;
    while (1) {
        minValue = INT_MAX;
        for (int i = 0; i < iterators.size(); ++i) {
            if (iterators[i] == multiList[i].end()) continue;
            if (*iterators[i] < minValue) {
                minValue = *iterators[i];
                minValueIndex = i;
            }
        }
        if (minValue == INT_MAX) break;   // every vector is exhausted
        iterators[minValueIndex]++;
        finalList.push_back(minValue);
    }
    return finalList;
}
Random simulation
Long story short, I built a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts with 2 vectors each of size 2 and ends up with 600 vectors each of size 600. Each vector is sorted, and the sizes of the outer container and of each child vector increase by two every iteration. I time each algorithm like this:
clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();
clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
I then built the following plot:
My calculations say the two naive solutions (merging and selecting) have the same time complexity, but the plot shows them as very different. At first I rationalized this by saying there may be more overhead in one than in the other, but then realized that overhead should be a constant factor and should not produce a plot like this. What is the explanation? Is my complexity analysis wrong?
Even if two algorithms have the same complexity (O(nk^2) in your case) they may end up having enormously different running times depending upon your size of input and the 'constant' factors involved.
For example, if an algorithm runs in n/1000 time and another algorithm runs in 1000n time, they both have the same asymptotic complexity but they shall have very different running times for 'reasonable' choices of n.
Moreover, there are effects caused by caching, compiler optimizations etc that may change the running time significantly.
In your case, although your calculation of the complexities seems correct, in the first case the actual running time will be about (nk^2 + nk)/2, whereas in the second case it will be about nk^2. Note that the division by 2 can be significant, because as k increases the nk term becomes negligible.
For a third algorithm, you can modify the naive selection by maintaining a min-heap of k elements containing the current front element of each of the k vectors. Each selection step then takes O(log k) time, so the complexity drops to O(nk log k).
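A sketch of that heap-based variant (not the question's code; it assumes the same vector<vector<int>> input as above), using a std::priority_queue keyed on the current front element of each vector:
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using std::vector;

// O(nk log k) k-way merge: the heap holds at most one (value, (vector, position)) entry per vector.
vector<int> mergeWithHeap(const vector<vector<int> >& multiList) {
    typedef std::pair<int, std::pair<int, int> > Entry;  // (value, (vector index, element index))
    std::priority_queue<Entry, vector<Entry>, std::greater<Entry> > heap;
    for (int i = 0; i < (int)multiList.size(); ++i)
        if (!multiList[i].empty())
            heap.push(Entry(multiList[i][0], std::make_pair(i, 0)));

    vector<int> finalList;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        finalList.push_back(top.first);
        int vec = top.second.first, pos = top.second.second + 1;
        if (pos < (int)multiList[vec].size())
            heap.push(Entry(multiList[vec][pos], std::make_pair(vec, pos)));
    }
    return finalList;
}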

unsorted matrix search algorithm

Is there a suitable algorithm that lets a program search an unsorted matrix for the biggest prime number it contains? The matrix is of size m*n and may be populated with prime numbers and non-primes. The search must find the biggest prime.
I have studied the divide and conquer algorithms, and binary trees, and step-wise searches, but all of these deal with sorted matrices.
First of all, it doesn't matter whether you use an m * n matrix or a vector with m * n elements. Generally speaking, you will have to visit each matrix element at least once, since it is not sorted. There are a few hints to make the process faster.
If it is a big matrix, visit the elements row by row (not column by column): the matrix is stored that way in memory, so elements from the same row are likely to be in the cache once you access one of them.
Testing a number for primality is the most costly part of your task, so if the numbers in the matrix are not too big, you can use the sieve of Eratosthenes to precompute a lookup table of primes in advance. https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
If you don't use the sieve, it may be beneficial to sort the numbers first so that you can test them from the greatest to the smallest; the algorithm can then stop as soon as the first prime is found. Without sorting you have to test all the numbers, which is probably the slowest approach.
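A sketch of the sieve approach (my own code, assuming the matrix values are non-negative ints and their maximum is known): build the primality table once, then a single row-by-row scan keeps the largest prime seen.
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes up to maxValue, then scan the matrix row by row.
// Returns -1 if the matrix contains no prime. Assumes maxValue >= 2.
int largestPrime(const std::vector<std::vector<int> >& matrix, int maxValue) {
    std::vector<bool> isPrime(maxValue + 1, true);
    isPrime[0] = false;
    isPrime[1] = false;
    for (int p = 2; (long long)p * p <= maxValue; ++p)
        if (isPrime[p])
            for (long long q = (long long)p * p; q <= maxValue; q += p)
                isPrime[(std::size_t)q] = false;

    int best = -1;
    for (std::size_t i = 0; i < matrix.size(); ++i)          // row by row, cache friendly
        for (std::size_t j = 0; j < matrix[i].size(); ++j) {
            int v = matrix[i][j];
            if (v > best && v >= 2 && isPrime[(std::size_t)v])
                best = v;
        }
    return best;
}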
You could do this:
// assuming an isPrime() helper, e.g. trial division:
bool isPrime(int x)
{
    if (x < 2) return false;
    for (int d = 2; d * d <= x; d++)
        if (x % d == 0) return false;
    return true;
}

for (int i = 0; i < m; i++)
{
    for (int j = 0; j < n; j++)
    {
        if (isPrime(array[i][j])
            && (array[i][j] > biggestPrime))
        {
            biggestPrime = array[i][j];
        }
    }
}

Graph traversal of n steps

Given a simple undirected graph like this:
Starting in D, A, B or C (V_start), I have to calculate how many possible paths of n steps there are that start at V_start and end back at V_start, where each edge and vertex can be visited an unlimited number of times.
I was thinking of doing a depth-first search, stopping when steps > n || (steps == n && vertex != V_start); however, this becomes rather expensive if, for instance, n = 1000000. My next thought was to combine DFS with dynamic programming, but this is where I'm stuck.
(This is not homework, just me getting stuck playing around with graphs and algorithms for the sake of learning.)
How would I go about solving this in a reasonable time with a large n?
This task is solved by matrix multiplication.
Create an n x n matrix of 0s and 1s, where mat[i][j] is 1 if there is an edge from i to j. Raise this matrix to the kth power (consider using fast matrix exponentiation). Cell mat_pow_k[i][j] then holds the number of paths of length k starting at i and ending at j.
NOTE: Fast matrix exponentiation is basically the same as fast exponentiation of numbers, except that instead of multiplying numbers you multiply matrices.
NOTE2: Let's assume n is the number of vertices in the graph. Then the algorithm proposed here runs in O(log k * n^3) time and uses O(n^2) memory. You can improve it a bit by using an optimized matrix multiplication such as Strassen's algorithm; the time complexity then becomes O(log k * n^log2(7)), roughly O(log k * n^2.807).
EDIT: As requested by Antoine, here is an explanation of why this algorithm actually works:
I will prove the algorithm by induction. The base case is obvious: initially the matrix holds the number of paths of length 1.
Assume that when the matrix is raised to the kth power, mat_pow_k[i][j] holds the number of paths of length k between i and j.
Now let's consider the next step, k + 1. Every path of length k + 1 consists of a prefix of length k plus one more edge. This means the number of paths of length k + 1 can be calculated by (here mat_pow_k denotes the matrix raised to the kth power):
num_paths(x, y, k + 1) = sum_{i=0}^{n-1} mat_pow_k[x][i] * mat[i][y]
Again, n is the number of vertices in the graph. This might take a while to digest, but the initial matrix has a 1 in cell mat[i][y] only if there is a direct edge between i and y, and we count all possible length-k prefixes of such an edge to form a path of length k + 1.
But that sum is exactly how the (k + 1)st power of mat is computed, which proves the induction step and the statement.
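For completeness, a sketch of the fast exponentiation itself in C++ (my own code, not from the answer; counts overflow quickly in practice, so you would normally reduce them modulo something):
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::vector<uint64_t> > Matrix;

// Plain O(n^3) matrix product.
Matrix multiply(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<uint64_t>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t l = 0; l < n; ++l)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][l] * b[l][j];
    return c;
}

// Raise the adjacency matrix to the k-th power by repeated squaring: O(n^3 log k) total.
Matrix matrixPower(Matrix base, uint64_t k) {
    std::size_t n = base.size();
    Matrix result(n, std::vector<uint64_t>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;   // identity matrix
    while (k > 0) {
        if (k & 1) result = multiply(result, base);
        base = multiply(base, base);
        k >>= 1;
    }
    return result;
}

// matrixPower(adjacency, k)[v][v] is the number of closed walks of length k starting and ending at v.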
It's quite like a dynamic programming problem:
define f[p][m] to be the number of paths from the starting point to point p in m steps
for every point p and every point q adjacent to it, you have the recurrence: f[q][m+1] += f[p][m]
in the initialization, all f[p][m] are 0, except f[starting_point][0] = 1
from this you can calculate the final result
pseudo code:
memset(f, 0, sizeof(f));
f[starting_point][0] = 1;
for (int step = 0; step < n; ++step) {
    for (int point = 0; point < point_num; ++point) {
        for (int next_point = 0; next_point < point_num; ++next_point) {
            if (adjacent[point][next_point]) {
                f[next_point][step + 1] += f[point][step];
            }
        }
    }
}
return f[starting_point][n];

Fast weighted random selection from very large set of values

I'm currently working on a problem that requires the random selection of an element from a set. Each of the elements has a weight(selection probability) associated with it.
My problem is that for sets with a small number of elements, say 5-10, the running time of the solution I was using is acceptable; however, as the number of elements increases, say to 1K or 10K, the running time becomes unacceptable.
My current strategy is:
Select random value X with range [0,1)
Iterate elements summing their weights until the sum is greater than X
The element which caused the sum to exceed X is chosen and returned
For large sets and a large number of selections this process exhibits quadratic behavior. In short, is there a faster way, a better algorithm perhaps?
You want to use the Walker algorithm. With N elements, there's a set-up cost of O(N); however, the sampling cost is O(1). See:
A. J. Walker, "An Efficient Method for Generating Discrete Random Variables and General Distributions", ACM TOMS 3, 253-256 (1977).
Knuth, TAOCP, Vol. 2, Sec. 3.4.1.A.
The RandomSelect class of RandomLib implements this algorithm.
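A sketch of the alias-table idea behind Walker's method (my own code, not RandomLib's actual RandomSelect interface): construction is O(N), and each sample costs one uniform index plus one biased coin flip.
#include <cstddef>
#include <random>
#include <vector>

// Alias method: prob[i] is the chance of keeping column i, alias[i] is the fallback outcome.
struct AliasSampler {
    std::vector<double> prob;
    std::vector<int> alias;

    explicit AliasSampler(const std::vector<double>& w) : prob(w.size()), alias(w.size()) {
        const std::size_t n = w.size();
        double total = 0;
        for (std::size_t i = 0; i < n; ++i) total += w[i];
        std::vector<double> scaled(n);
        std::vector<int> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            scaled[i] = w[i] * n / total;
            (scaled[i] < 1.0 ? small : large).push_back((int)i);
        }
        while (!small.empty() && !large.empty()) {
            int s = small.back(); small.pop_back();
            int l = large.back(); large.pop_back();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] -= 1.0 - scaled[s];               // l donates mass to fill column s
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        for (std::size_t i = 0; i < small.size(); ++i) prob[small[i]] = 1.0;
        for (std::size_t i = 0; i < large.size(); ++i) prob[large[i]] = 1.0;
    }

    template <class RNG>
    int operator()(RNG& rng) const {
        std::uniform_int_distribution<int> column(0, (int)prob.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        int i = column(rng);
        return coin(rng) < prob[i] ? i : alias[i];      // O(1) per sample
    }
};

// usage: std::mt19937 rng(42); AliasSampler pick(weights); int chosen = pick(rng);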
Assuming that the element weights are fixed, you can work with precomputed sums. This is like working with the cumulative probability function directly, rather than the density function.
The lookup can then be implemented as a binary search, and hence takes O(log N) time in the number of elements.
A binary search obviously requires random access to the container of weights.
Alternatively, use a std::map<> and the upper_bound() method.
#include <iostream>
#include <map>
#include <cstdlib>

int main()
{
    std::map<double, char> cumulative;    // key = cumulative weight, value = element
    cumulative[.20] = 'a';
    cumulative[.30] = 'b';
    cumulative[.40] = 'c';
    cumulative[.80] = 'd';
    cumulative[1.00] = 'e';

    const int numTests = 10;
    for (int i = 0; i != numTests; ++i)
    {
        // draw in [0, 1) so upper_bound() never returns end()
        double linear = rand() / (RAND_MAX + 1.0);
        std::cout << linear << "\t" << cumulative.upper_bound(linear)->second << std::endl;
    }
    return 0;
}
If you have a quick enough way to sample an element uniformly, you can use rejection sampling; all you need to know is the maximum weight. It works as follows: suppose the maximum weight is M. Repeatedly pick a candidate element uniformly at random together with a fresh number X uniform in [0,1), and accept the candidate as soon as its weight is at least M*X.
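A sketch of that rejection loop (my own code; the weights vector and RNG are assumptions): the expected number of trials is N*M divided by the total weight, so it stays fast as long as no single weight dominates.
#include <random>
#include <vector>

// Rejection sampling: propose a uniform index, accept it with probability weight/maxWeight.
int sampleByWeight(const std::vector<double>& weights, double maxWeight, std::mt19937& rng) {
    std::uniform_int_distribution<int> pickIndex(0, (int)weights.size() - 1);
    std::uniform_real_distribution<double> pickX(0.0, 1.0);
    while (true) {
        int candidate = pickIndex(rng);
        double x = pickX(rng);                          // fresh X for every trial
        if (weights[candidate] >= maxWeight * x)
            return candidate;
    }
}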
Or, an approximate solution: pick 100 elements uniformly at random, then choose one within this subset with probability proportional to its weight.