Really slow when getting the maximum distance between m nodes (backtracking, C++)

I've been asked to solve an exercise using backtracking, or backtracking plus branch and bound, where the input data are n, m and an n x n matrix. Here n represents a number of people and m is how many of those n people must be chosen. The matrix holds the distances between them, and the distance from i to j can differ from the distance from j to i.
I am trying to get the maximum total distance obtainable from m nodes, where that total is the sum of the distances between all of them. For example, if I choose nodes 1, 2 and 4, the result is the sum distance(1, 2) + distance(2, 1) + distance(2, 4) + distance(4, 2) + distance(1, 4) + distance(4, 1).
I have used backtracking with branch and bound (iterative, not recursive), storing the nodes (structs where I keep the current value and the nodes used so far) that may lead me to a solution. Each of these nodes stores a lower and an upper bound, that is, the smallest and the largest solution I can still obtain if I keep expanding this node and its children. From a node x, I generate all the possible child nodes (nodes that are not yet used), and I check whether each child can still lead to a solution; if not, it is discarded and deleted.
The code I have written for this works, but it is really slow. With low values of n and m it is quick, but with higher values it takes far too long.
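For reference, here is a hedged brute-force sketch of the objective described above; the helper name, the use of std::vector and the 0-based node indices are illustrative and not part of my program:
#include <vector>

// Score one candidate subset: sum matrix[i][j] + matrix[j][i] over every unordered pair.
int subsetValue(int **matrix, const std::vector<int> &chosen) {
    int total = 0;
    for (size_t a = 0; a < chosen.size(); ++a)
        for (size_t b = a + 1; b < chosen.size(); ++b)
            total += matrix[chosen[a]][chosen[b]] + matrix[chosen[b]][chosen[a]];
    return total;
}
On the 4 x 4 example near the end of this question, subsetValue(matrix, {1, 3}) (nodes 2 and 4 in 1-based numbering) returns 5 + 3 = 8, which matches the expected result.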
Here are the main function and the other functions it uses:
void backtracking(int **matrix, int n, int m){
/////////////////////////////////////////////////////
/*
Part where I try to get the minimum/maximum that I can get from the beginning of the problem
*/
// Lists where I store the values from the matrix, sort from the minimum to the maximum, and from
// the maximum to the minimum. The values are the distances, I mean, the sum of matrix[i][j] and
// matrix[j][i].
list<int> listMinSums; // list of minimum sums
list<int> listMaxSums; // list of maximum sums
int nMinimumSums = floor((m*m - m)/2); // rounding down
int nMaximumSums = ceil((m*m - m)/2); // rounding up
/*
* m*m - m = Given m nodes, there are m*m - m sums.
*
* I count matrix[i][j] + matrix[j][i] as one, so there
* are (m*m - m)/2 sums.
*/
for (int i = 0; i < n; i++){
for (int j = 0; j < i; j++){
int x = (matrix[i][j] + matrix[j][i]);
// to differentiate from the minimum and maximum sums, I use false and true
aLista(listMinSums, x, nMinimumSums, false);
aLista(listMaxSums, x, nMaximumSums, true);
}
}
int min = 0;
int max = 0;
int contador = 0; // counter so we never take more than nMinimumSums/nMaximumSums entries
list<int>::iterator it = listMinSums.begin();
while (it != listMinSums.end() && contador < nMinimumSums){
min += *it;
it++;
contador++;
}
contador = 0;
list<int>::iterator it2 = listMaxSums.begin();
while (it2 != listMaxSums.end() && contador < nMaximumSums){
max += *it2;
it2++;
contador++;
}
//////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////
// LLV = List of Live Nodes. Where I store the nodes that are going to
// guide me to the solution
list<nodo*> llv;
// I do not store the root node, i store the first n nodes, of the first level.
for (int i = 0; i < n; i++){
nodo *nod = new nodo(n);
nod ->level = 0;
//lower bound. It's the lower solution i can get from this node
nod ->ci = min;
// upper bound. The higher solution i can get from this node.
nod ->cs = max;
// estimated benefit. When choosing between nodes, the one with the higher value is expanded first
nod ->be = (min+max)/2;
// The node i is used
nod ->used[i] = true;
// Inserting this node to the list of live nodes.
insert(llv, nod);
}
int solution = 0; // Initial solution
int c = min; // c = threshold used to prune a node and its children
while (!empty(llv)){
nodo *x = erase(llv, n); // erasing the node with the highest estimated benefit from the llv.
if (x ->cs > c){
for (int i = 0; i < n; i++){ // Creating every son of the x node...
if (!(x ->used[i])){ // ... that has not being used yet
nodo *y = new nodo(n);
y ->level = x ->level + 1;
for (int j = 0; j < n; j++){
y ->used[j] = x ->used[j];
}
y ->used[i] = true;
// Adding the values. For example, if node 1 and 2 were inserted, and this is the node 4,
// adding matrix[1][4]+matrix[4][1]+matrix[2][4] + matrix[4][2]
int acum = 0;
for (int k = 0; k < n; k++){
if (k != i && consult(x ->used, k))
acum += matrix[i][k] + matrix[k][i];
}
y ->bact = x ->bact + acum;
// Getting the lower and upper bound of this node y.
cotas(y, x, i, y ->level, n, matrix);
y ->be = (y ->ci + y ->cs)/2;
// Node where i can get the solution
if (y ->level == (m-1) && y ->bact > solution){
solution = y ->bact;
if (y ->bact > c)
c = y ->bact;
}
// Checking if i can update c
else if (y ->level != (m-1) && y ->cs > c){
insert(llv, y);
if (y ->ci > c)
c = y ->ci;
}
else{
// i cannot use this node anymore, so i delete it.
delete y;
}
}
}
}
}
cout << solution << endl;
liberacionMemoria(matrix, n); // freeing the memory used in the matrix
}
void liberacionMemoria(int **matriz, int n){
for (int i = 0; i < n; i++)
delete[] matriz[i];
delete[] matriz;
}
void insert(list<nodo*> &t, nodo *x){
list<nodo*>::iterator it= t.begin();
t.insert(it, x);
}
/*
* Getting the node with the highest estimated benefit from the list of live nodes
* */
nodo* erase (list<nodo*> &t, int n){
nodo* erased = new nodo(n);
erased ->level = -1;
erased ->be = -1;
list<nodo*>::iterator it= t.begin();
list<nodo*>::iterator it2;
while (it != t.end()){
nodo* aux = *it;
if (aux ->be > erased ->be){
it2 = it;
erased = aux;
}
else if (aux ->be == erased ->be && aux ->level > erased ->level){
it2 = it;
erased = aux;
}
it++;
}
t.erase(it2);
return erased;
}
/*
* Checking whether the node at position x is marked as used in the used-nodes array
* */
bool consult(bool *nodesUsed, int x){
if (nodesUsed[x])
return true;
return false;
}
bool empty(list<nodo*> &t){
list<nodo*>::iterator it= t.begin();
return (it==t.end());
}
bool aLista(list<int> &t, int x, int m, bool MayorAMenor){
list<int>::iterator it = t.begin();
int contador = 0;
while (it != t.end() && contador < m){
if (!MayorAMenor){ // lower to upper
if (*it > x){
t.insert(it, x);
return true;
}
}
else{
if (*it < x){
t.insert(it, x);
return true;
}
}
contador++;
it++;
}
if (it == t.end() && contador < m){
t.insert(it, x);
return true;
}
return false;
}
void cotas(nodo *sonNode, nodo *fatherNode, int elegido, int m, int n, int **matriz){
int max = 0;
int min = 999;
// Getting the sums from the chosen node with the already used
for (int i = 0; i < n; i++){
if (consult(sonNode ->used, i)){
if (elegido != i){
int x = matriz[i][elegido] + matriz[elegido][i];
if (x > max)
max = x;
if (x < min)
min = x;
}
}
}
min *= (m-1);
max *= (m-1);
min += fatherNode -> bact;
max += fatherNode -> bact;
sonNode -> ci = fatherNode ->ci - min;
sonNode -> cs = fatherNode ->cs - max;
}
I think the reason it becomes so slow when n and m grow is that the upper and lower bounds of the nodes are not tight, but I don't know how to improve them.
I've spent many days thinking about it and trying different things, but nothing works.
Here are some examples:
Given n = 4 and m = 2 and the following matrix:
0 3 2 4
2 0 4 5
2 1 0 4
2 3 2 0
the result is 8. This case works and it is quick.
But with n = 40 and m = 10 it never ends...
I hope someone may help me. Thanks.
EDIT
I may not have explained it well. My doubt is: from a node x, how can I know the minimum and the maximum value I can still reach?
The number of nodes in a solution depends on m, but the value changes depending on which nodes I pick, and I don't know how to compute the minimum and maximum reachable from a node accurately enough to prune the branches that cannot lead to a solution.

Related

Finding the heaviest path (biggest sum of weights) of an undirected weighted graph? Bellman-Ford

There's a matrix in which each cell contains an integer value (positive or negative). You're given an initial position in the matrix, and you have to find a path such that the sum of all the cells you've crossed is as large as possible. You can go up, down, right and left, and can only cross a cell once.
My solution uses the Bellman-Ford algorithm: replace every value by its negation, giving a new matrix. Then I create an undirected graph from the new matrix, where each cell is a node and stepping on a cell costs that cell's value (that is the weight). So I just need to find the shortest path in the graph using Bellman-Ford, and that path will be the longest path of our initial matrix.
Well, there's a problem: the graph contains negative cycles and also has too many nodes and edges, so the result isn't correct.
This is my code:
Note that xd and yd are the initial coordinates of the robot.
void MatrixToEdgelist()
{
int k = 0;
for (int i=1;i<=n;i++)
for (int j=1;j<=n;j++)
{
int x = (i - 1) * n + j;
int y = x + 1;
int z = x + n;
if (j<n)
{
edges.push_back(make_tuple(x, y, a[i][j+1]));
}
if (i<n)
{
edges.push_back(make_tuple(x, z, a[i+1][j]));
}
}
}
void BellmanFord(Robot r){
int x = r.getXd();
int y = r.getYd();
int z = (x-1)*n + y;
int l = n*n;
int distance[100];
int previous[100]{};
int trace[100];
trace[1] = z;
for (int i = 1; i <= l; i++) {
distance[i] = INF;
}
distance[z] = a[x][y];
for (int i = 1; i <= l-1; i++) {
for (auto e : edges) {
int a, b, w;
tie(a, b, w) = e;
//distance[b] = min(distance[b], distance[a]+w);
if (distance[b] < distance[a] + w)// && previous[a] != b)
{
distance[b] = distance[a] + w;
previous[b] = a;
}
}
}
//print result
int Max=INF;
int node;
for (int i=2;i<=l;i++)
{
if (Max < distance[i])
{
Max = distance[i];
node = i;
}
}
if (Max<0)cout << Max << "\n";
else cout << Max << "\n";
vector<int> ans;
int i = node;
ans.push_back(i);
while (i != z)
{
i = previous[i];
ans.push_back(i);
}
for (int i=ans.size()-1;i>=0;i--)
{
int x, y;
if (ans[i] % n == 0)
{
x = ans[i] / n;
y = n;
}
else{
x = ans[i] / n + 1;
y = ans[i] - (( x - 1 ) * n);
}
cout << x << " " << y << "\n";
}
}
(Example matrix and result are shown as images in the original post.)
Clearly the distance should have continued to update, but it doesn't; it stops at the final node.
"Let's replace all the values by their opposite number"
Not sure what you mean by an opposite number. Anyway, that is incorrect.
If you have negative weights, then the usual solution is to add the absolute value of the most negative weight to EVERY weight.
Why Bellman-Ford? Dijkstra should be sufficient for this problem. (By default Dijkstra finds the cheapest path. You find the most expensive one by assigning the absolute value of (the original weight minus the greatest) to every link.)
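A hedged sketch of that reweighting on an assumed edge-list representation (the struct and function names are mine, and this only shows the transformation itself, not a full Dijkstra run):
#include <algorithm>
#include <climits>
#include <vector>

struct Edge { int u, v, w; };

// Replace every weight by (greatest weight - original weight); the result is
// non-negative, so Dijkstra's cheapest-path search can be run on it.
void reweightForDijkstra(std::vector<Edge> &edges) {
    int maxW = INT_MIN;
    for (const Edge &e : edges) maxW = std::max(maxW, e.w);
    for (Edge &e : edges) e.w = maxW - e.w;   // == abs(original weight - greatest)
}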

Find a path between 2 points in a maze with minimum turns

The problem:
Given a 2D matrix consisting of 0s and 1s, you can only step on locations containing 1. Starting at point (x, y), we can move to 4 adjacent points: up, down, left, right; which are (x+1, y), (x-1, y), (x, y+1), (x, y-1).
Find a path from point (x, y) to point (s, t) so that it has the least number of turns.
My question:
I tried to solve this problem using Dijkstra; it got most of the cases right, but in some cases it didn't give the optimal answer.
Here's my code:
pair<int,int> go[4] = {{-1,0}, {0,1}, {1,0}, {0,-1}};
bool minimize(int &x, const int &y){
if(x > y){
x = y;
return true;
}return false;
}
struct Node{
pair<int,int> point;
int turn, direc;
Node(pii _point, int _turn, int _direc){
point = _point;
turn = _turn;
direc = _direc;
}
bool operator < (const Node &x) const{
return turn > x.turn;
}
};
void dijkstra(){
memset(turns, 0x3f, sizeof turns);
turns[xHome][yHome] = -1;
priority_queue<Node> pq;
pq.push(Node({xHome, yHome}, -1, -1));
while(!pq.empty()){
while(!pq.empty() &&
pq.top().turn > turns[pq.top().point.first][pq.top().point.second])pq.pop();
if(pq.empty())break;
pii point = pq.top().point;
int direc = pq.top().direc;
pq.pop();
for(int i = 0; i < 4; i++){
int x = point.first + go[i].first ;
int y = point.second + go[i].second;
if(!x || x > row || !y || y > col)continue;
if(matrix[x][y])
if(minimize(turns[x][y], turns[point.first ][point.second] + (i != direc)))
pq.push(Node({x, y}, turns[x][y], i));
}
}
}
P.S.: The main logic is in void dijkstra(); the rest is just there to give some more context in case you need it.
I have found a way to solve this problem, storing directions and using a BFS (a 0-1 BFS on a deque) to reduce the time complexity:
struct Node{
short row, col;
char dir;
Node(int _row = 0, int _col = 0, int _dir = 0){
row = _row; col = _col; dir = _dir;
}
};
void BFS(){
memset(turns, 0x3f, sizeof turns);
deque<pair<int, Node> > dq;
for(int i = 0; i < 4; i++){
Node s(xHome + dx[i], yHome + dy[i], i);
if(!matrix[s.row][s.col])continue;
turns[s.row][s.col][s.dir] = 0;
dq.push_back({0, s});
}
while(!dq.empty()){
int d = dq.front().fi;
Node u = dq.front().se;
dq.pop_front();
if(d != turns[u.row][u.col][u.dir])continue;
for(int i = 0; i < 4; i++){
Node v(u.row + dx[i], u.col + dy[i], i);
if(!matrix[v.row][v.col])continue;
if(minimize(turns[v.row][v.col][v.dir], turns[u.row][u.col][u.dir] + (i != u.dir))){
if(i == u.dir)dq.push_front({turns[v.row][v.col][v.dir], v});
else dq.push_back({turns[v.row][v.col][v.dir], v});
trace[v.row][v.col][v.dir] = u;
}
}
}
}
An obvious error in your algorithm is that, to evaluate the path start->x->y correctly, you should store all the directions into x that can form a shortest path from start to x.
For example, suppose start = (0,0), x = (1,1), y = (1,2) and there are two paths from start to x: start->(0,1)->x and start->(1,0)->x, both of the shortest length. However, start->(0,1)->x->y has two turns while start->(1,0)->x->y has only one turn. So you need to store all the directions for each node (in this case, you should store both the directions (0,1)->x and (1,0)->x at x).

Find the sum of weights of edges between every pair of nodes in a weighted tree

I need to find an efficient way to find the sum of values of all simple paths in a weighted tree. The value of a simple path is defined as the sum of weights of all edges in the given simple path.
This is my attempt, but it is not working. Please tell me the correct approach.
#include <iostream>
#include <vector>
using namespace std;
typedef long long ll;
typedef pair<int, ll> pil;
const int MAXN = 1e5;
int n, color[MAXN + 2];
vector<pil> adj[MAXN + 2];
ll sum1, cnt1[MAXN + 2], cnt[MAXN + 2], res;
void visit(int u, int p)
{
cnt[u] = 1;
cnt1[u] = color[u];
for (int i = 0; i < (int) adj[u].size(); ++i)
{
int v = adj[u][i].first;
ll w = adj[u][i].second;
if (v == p)
continue;
visit(v, u);
ll tmp = cnt1[v] * (n - sum1 - cnt[v] + cnt1[v]);
tmp += (cnt[v] - cnt1[v]) * (sum1 - cnt1[v]);
res += tmp * w;
cnt[u] += cnt[v];
cnt1[u] += cnt1[v];
}
}
int main()
{
scanf("%d", &n);
for (int i = 1; i <= n; ++i)
{
scanf("%d", color + i);
sum1 += color[i];
}
for (int i = 1, u, v; i < n; ++i)
{
scanf("%d %d %lld", &u, &v, &res);
adj[u].push_back(pil(v, res));
adj[v].push_back(pil(u, res));
}
res = 0;
visit(1, -1);
printf("%lld\n", res);
return 0;
}
Below is a simple implementation of what Arjun Singh explained.
int64_t ans = 0;
int dfs(int node, int parent) {
int cur_subtree_size = 1;
for(int child : adj[node]) {
if(parent != child) {
int child_subtree_size = dfs(child, node);
int64_t contribution_of_cur_edge = child_subtree_size * (N - child_subtree_size) * weight[{node, child}];
ans += contribution_of_cur_edge;
cur_subtree_size += child_subtree_size;
}
}
return cur_subtree_size;
}
You can calculate the contribution of each edge in the final answer. Let's say an edge connects two components component1 ---- component2. Then the contribution of this edge in final answer will be -
Vertices(component1) * Vertices(component2)*edge_weight.
The number of vertices in each component can be found easily by running a DFS and calculating the number of vertices in the subtree of each vertex. Let an edge connect vertices u and v, where u is the parent of v. Then,
Vertices(v) = Number of vertices in the subtree of v = Vertices(component1)
Vertices(component2) = n - Vertices(v)
You can precalculate this subtree array. So final time complexity will be O(n).
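Since the snippet above leaves N, adj and weight undeclared, here is a hedged, self-contained way it could be wired up; the input format (n, then n-1 lines "u v w" with 1-based vertices) and the container choices are assumptions made for illustration.
#include <cstdio>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>
using namespace std;

int N;
vector<vector<int>> adj;                 // adjacency list, vertices 1..N
map<pair<int, int>, int64_t> weight;     // weight of each undirected edge
int64_t ans = 0;

int dfs(int node, int parent) {
    int cur_subtree_size = 1;
    for (int child : adj[node]) {
        if (parent != child) {
            int child_subtree_size = dfs(child, node);
            // the edge (node, child) separates child_subtree_size vertices from the other N - child_subtree_size
            ans += (int64_t)child_subtree_size * (N - child_subtree_size) * weight[{node, child}];
            cur_subtree_size += child_subtree_size;
        }
    }
    return cur_subtree_size;
}

int main() {
    scanf("%d", &N);
    adj.resize(N + 1);
    for (int i = 1, u, v; i < N; ++i) {
        long long w;
        scanf("%d %d %lld", &u, &v, &w);
        adj[u].push_back(v);
        adj[v].push_back(u);
        weight[{u, v}] = weight[{v, u}] = w;
    }
    dfs(1, 0);
    printf("%lld\n", (long long)ans);
    return 0;
}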

From recursive algorithm to an iterative one

I have to convert this recursive algorithm into an iterative one:
int alg(int A[], int x, int y, int k){
int val = 0;
if (x <= y){
int z = (x+y)/2;
if(A[z] == k){
val = 1;
}
int a = alg(A,x,z-1,k);
int b;
if(a > val){
b = alg(A,z+1,y,k);
}else{
b = a + val;
}
val += a + b;
}
return val;
}
I tried with a while loop, but I can't figure out how to calculate the "a" and "b" variables, so I did this:
int algIterative(int A[], int x, int y, int k){
int val = 0;
while(x <= y){
int z = (x+y)/2;
if(A[z] == k){
val = 1;
}
y = z-1;
}
}
But actually I couldn't figure out what this algorithm does.
My questions are:
What does this algorithm do?
How can I convert it to iterative?
Do I need to use stacks?
Any help will be appreciated.
I am not sure that alg computes anything useful.
It processes the part of the array A between the indexes x and y, and computes a kind of counter.
If the interval is empty, the returned value (val) is 0. Otherwise, if the middle element of this subarray equals k, val is set to 1. Then the values for the left and right subarrays are added and the total is returned. So in a way, it counts the number of k's in the array.
But if the count on the left side turns out to be not larger than val (i.e. 0 when val = 0, or 0 or 1 when val = 1), the right side is not explored recursively; its value is simply taken to be the value on the left plus val.
Derecursivation might be possible without a stack. If you look at the sequence of subintervals that are traversed, you can reconstruct it from the binary representation of N. Then the result of the function is the accumulation of partial results collected along a postorder process.
If the postorder can be turned into an inorder traversal, this reduces to a single linear pass over A. This is a little technical.
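Before attempting that stack-free derecursivation, a plain mechanical conversion with an explicit stack already gives a working iterative version. Here is a hedged sketch; the frame layout and names are illustrative, not taken from the question:
#include <stack>

int algIterative(int A[], int x, int y, int k) {
    struct Frame { int x, y, val, a, stage; };
    std::stack<Frame> st;
    int ret = 0;                              // value "returned" by the last finished frame
    st.push({x, y, 0, 0, 0});
    while (!st.empty()) {
        Frame &f = st.top();
        if (f.stage == 0) {                   // entering the frame
            if (f.x > f.y) { ret = 0; st.pop(); continue; }
            int z = (f.x + f.y) / 2;
            f.val = (A[z] == k) ? 1 : 0;
            f.stage = 1;
            st.push({f.x, z - 1, 0, 0, 0});   // a = alg(A, x, z-1, k)
        } else if (f.stage == 1) {            // back from the first recursive call
            f.a = ret;
            if (f.a > f.val) {
                int z = (f.x + f.y) / 2;
                f.stage = 2;
                st.push({z + 1, f.y, 0, 0, 0});    // b = alg(A, z+1, y, k)
            } else {
                ret = f.val + f.a + (f.a + f.val); // b = a + val; val += a + b
                st.pop();
            }
        } else {                              // back from the second recursive call
            ret = f.val + f.a + ret;          // val += a + b
            st.pop();
        }
    }
    return ret;
}
It does not simplify anything; it just reproduces the recursion's behaviour with the same complexity, but it answers the "do I need to use stacks?" part directly.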
A simple way could be something like this, with the aid of a two-dimensional array:
int n = A.length;
int[][]dp = new int[n][n];
for(int i = n - 1;i >= 0; i--){
for(int j = i; j < n; j++){
// This part is almost similar to the recursive part.
int z = (i+j)/2;
int val = 0;
if(A[z] == k){
val = 1;
}
int a = z > i ? dp[i][z - 1] : 0;
int b;
if(a > val){
b = (z + 1 <= j) ? dp[z + 1][j] : 0;
}else{
b = a + val;
}
val += a + b;
dp[i][j] = val;
}
}
return dp[0][n - 1];
Explanation:
Notice that i is decreasing and j is increasing, so when dp[i][j] is calculated, dp[i][z - 1] (with z - 1 < j) and dp[z + 1][j] (with z + 1 > i) have already been populated.

What is the fastest search method for a sorted array?

While answering another question, I wrote the program below to compare different search methods on a sorted array. Basically I compared two implementations of interpolation search and one of binary search, measuring performance by counting the cycles spent (on the same data set) by the different variants.
However, I'm sure there are ways to optimize these functions to make them even faster. Does anyone have ideas on how I can make this search faster? A solution in C or C++ is acceptable, but it needs to handle an array with 100000 elements.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <assert.h>
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long int x;
__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
return x;
}
int interpolationSearch(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int64_t low = 0;
int64_t high = len - 1;
int64_t mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + (int64_t)((int64_t)(high - low)*(int64_t)(toFind - l))/((int64_t)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int interpolationSearch2(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int binarySearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int order(const void *p1, const void *p2) { return *(int*)p1-*(int*)p2; }
int main(void) {
int i = 0, j = 0, size = 100000, trials = 10000;
int searched[trials];
srand(-time(0));
for (j=0; j<trials; j++) { searched[j] = rand()%size; }
while (size > 10){
int arr[size];
for (i=0; i<size; i++) { arr[i] = rand()%size; }
qsort(arr,size,sizeof(int),order);
unsigned long long totalcycles_bs = 0;
unsigned long long totalcycles_is_64 = 0;
unsigned long long totalcycles_is_float = 0;
unsigned long long totalcycles_new = 0;
int res_bs, res_is_64, res_is_float, res_new;
for (j=0; j<trials; j++) {
unsigned long long tmp, cycles = rdtsc();
res_bs = binarySearch(arr,searched[j],size);
tmp = rdtsc(); totalcycles_bs += tmp - cycles; cycles = tmp;
res_is_64 = interpolationSearch(arr,searched[j],size);
assert(res_is_64 == res_bs || arr[res_is_64] == searched[j]);
tmp = rdtsc(); totalcycles_is_64 += tmp - cycles; cycles = tmp;
res_is_float = interpolationSearch2(arr,searched[j],size);
assert(res_is_float == res_bs || arr[res_is_float] == searched[j]);
tmp = rdtsc(); totalcycles_is_float += tmp - cycles; cycles = tmp;
}
printf("----------------- size = %10d\n", size);
printf("binary search = %10llu\n", totalcycles_bs);
printf("interpolation uint64_t = %10llu\n", totalcycles_is_64);
printf("interpolation float = %10llu\n", totalcycles_is_float);
printf("new = %10llu\n", totalcycles_new);
printf("\n");
size >>= 1;
}
}
If you have some control over the in-memory layout of the data, you might want to look at Judy arrays.
Or to put a simpler idea out there: a binary search always cuts the search space in half. An optimal cut point can be found with interpolation (the cut point should NOT be the place where the key is expected to be, but the point which minimizes the statistical expectation of the search space for the next step). This minimizes the number of steps but... not all steps have equal cost. Hierarchical memories allow executing a number of tests in the same time as a single test, if locality can be maintained. Since a binary search's first M steps only touch a maximum of 2**M unique elements, storing these together can yield a much better reduction of search space per-cacheline fetch (not per comparison), which is higher performance in the real world.
n-ary trees work on that basis, and then Judy arrays add a few less important optimizations.
Bottom line: even "Random Access Memory" (RAM) is faster when accessed sequentially than randomly. A search algorithm should use that fact to its advantage.
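As one concrete, hedged illustration of that idea (the simplest member of the family, not a full n-ary tree or Judy array, and the names are illustrative): laying the sorted array out in BFS (heap) order keeps the first few probes of every search within a few cache lines.
#include <vector>

// Fill b[1..n] (heap-indexed: children of k are 2k and 2k+1) from the sorted array a,
// via an in-order traversal of the implicit tree, so b is a valid search tree.
size_t buildEytzinger(const std::vector<int> &a, std::vector<int> &b,
                      size_t i = 0, size_t k = 1) {
    if (k <= a.size()) {
        i = buildEytzinger(a, b, i, 2 * k);       // left subtree first
        b[k] = a[i++];                            // then this node
        i = buildEytzinger(a, b, i, 2 * k + 1);   // then the right subtree
    }
    return i;
}

// Descend the implicit tree; returns true if toFind is present.
bool searchEytzinger(const std::vector<int> &b, int toFind) {
    size_t k = 1;
    while (k < b.size()) {
        if (b[k] == toFind) return true;
        k = 2 * k + (b[k] < toFind);              // go right if the current value is smaller
    }
    return false;
}
To try it, allocate std::vector<int> b(a.size() + 1), call buildEytzinger(a, b) once, and then search with searchEytzinger(b, key); the comparison count matches a binary search, only the memory layout changes.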
Benchmarked on Win32 Core2 Quad Q6600, gcc v4.3 msys. Compiling with g++ -O3, nothing fancy.
Observation - the asserts, timing and loop overhead is about 40%, so any gains listed below should be divided by 0.6 to get the actual improvement in the algorithms under test.
Simple answers:
On my machine replacing the int64_t with int for "low", "high" and "mid" in interpolationSearch gives a 20% to 40% speed up. This is the fastest easy method I could find. It is taking about 150 cycles per look-up on my machine (for the array size of 100000). That's roughly the same number of cycles as a cache miss. So in real applications, looking after your cache is probably going to be the biggest factor.
Replacing binarySearch's "/2" with a ">>1" gives a 4% speed up.
Using the STL's binary_search algorithm on a vector containing the same data as "arr" is about the same speed as the hand-coded binarySearch, although at the smaller sizes the STL is much slower, by around 40%.
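For reference, a hedged sketch of that STL variant (the wrapper name is mine), returning an index with the same contract as the hand-written searches:
#include <algorithm>
#include <vector>

int stlSearch(const std::vector<int> &sortedArray, int toFind) {
    auto it = std::lower_bound(sortedArray.begin(), sortedArray.end(), toFind);
    if (it != sortedArray.end() && *it == toFind)
        return (int)(it - sortedArray.begin());
    return -1; // Not found
}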
I have an excessively complicated solution, which requires a specialized sorting function. The sort is slightly slower than a good quicksort, but all of my tests show that the search function is much faster than a binary or interpolation search. I called it a regression sort before I found out that the name was already taken, but didn't bother to think of a new name (ideas?).
There are three files to demonstrate.
The regression sort/search code:
#include <sstream>
#include <math.h>
#include <ctime>
#include "limits.h"
void insertionSort(int array[], int length) {
int key, j;
for(int i = 1; i < length; i++) {
key = array[i];
j = i - 1;
while (j >= 0 && array[j] > key) {
array[j + 1] = array[j];
--j;
}
array[j + 1] = key;
}
}
class RegressionTable {
public:
RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs);
RegressionTable(int arr[], int s);
void sort(void);
int find(int key);
void printTable(void);
void showSize(void);
private:
void createTable(void);
inline unsigned int resolve(int n);
int * array;
int * table;
int * tableSize;
int size;
int lowerBound;
int upperBound;
int divisions;
int divisionSize;
int newSize;
double multiplier;
};
RegressionTable::RegressionTable(int arr[], int s) {
array = arr;
size = s;
multiplier = 1.35;
divisions = sqrt(size);
upperBound = INT_MIN;
lowerBound = INT_MAX;
for (int i = 0; i < size; ++i) {
if (array[i] > upperBound)
upperBound = array[i];
if (array[i] < lowerBound)
lowerBound = array[i];
}
createTable();
}
RegressionTable::RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs) {
array = arr;
size = s;
lowerBound = lower;
upperBound = upper;
multiplier = mult;
divisions = divs;
createTable();
}
void RegressionTable::showSize(void) {
int bytes = sizeof(*this);
bytes = bytes + sizeof(int) * 2 * (divisions + 1);
}
void RegressionTable::createTable(void) {
divisionSize = size / divisions;
newSize = multiplier * double(size);
table = new int[divisions + 1];
tableSize = new int[divisions + 1];
for (int i = 0; i < divisions; ++i) {
table[i] = 0;
tableSize[i] = 0;
}
for (int i = 0; i < size; ++i) {
++table[((array[i] - lowerBound) / divisionSize) + 1];
}
for (int i = 1; i <= divisions; ++i) {
table[i] += table[i - 1];
}
table[0] = 0;
for (int i = 0; i < divisions; ++i) {
tableSize[i] = table[i + 1] - table[i];
}
}
int RegressionTable::find(int key) {
double temp = multiplier;
multiplier = 1;
int minIndex = table[(key - lowerBound) / divisionSize];
int maxIndex = minIndex + tableSize[key / divisionSize];
int guess = resolve(key);
double t;
while (array[guess] != key) {
// uncomment this line if you want to see where it is searching.
//cout << "Regression Guessing " << guess << ", not there." << endl;
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
}
multiplier = temp;
return guess;
}
inline unsigned int RegressionTable::resolve(int n) {
float temp;
int subDomain = (n - lowerBound) / divisionSize;
temp = n % divisionSize;
temp /= divisionSize;
temp *= tableSize[subDomain];
temp += table[subDomain];
temp *= multiplier;
return (unsigned int)temp;
}
void RegressionTable::sort(void) {
int * out = new int[int(size * multiplier)];
bool * used = new bool[int(size * multiplier)];
int higher, lower;
bool placed;
for (int i = 0; i < size; ++i) {
/* Figure out where to put the darn thing */
higher = resolve(array[i]);
lower = higher - 1;
if (higher > newSize) {
higher = size;
lower = size - 1;
} else if (lower < 0) {
higher = 0;
lower = 0;
}
placed = false;
while (!placed) {
if (higher < size && !used[higher]) {
out[higher] = array[i];
used[higher] = true;
placed = true;
} else if (lower >= 0 && !used[lower]) {
out[lower] = array[i];
used[lower] = true;
placed = true;
}
--lower;
++higher;
}
}
int index = 0;
for (int i = 0; i < size * multiplier; ++i) {
if (used[i]) {
array[index] = out[i];
++index;
}
}
insertionSort(array, size);
}
And then there is the regular search functions:
#include <iostream>
using namespace std;
int binarySearch(int array[], int start, int end, int key) {
// Determine the search point.
int searchPos = (start + end) / 2;
// If we crossed over our bounds or met in the middle, then it is not here.
if (start >= end)
return -1;
// Search the bottom half of the array if the query is smaller.
if (array[searchPos] > key)
return binarySearch (array, start, searchPos - 1, key);
// Search the top half of the array if the query is larger.
if (array[searchPos] < key)
return binarySearch (array, searchPos + 1, end, key);
// If we found it then we are done.
if (array[searchPos] == key)
return searchPos;
}
int binarySearch(int array[], int size, int key) {
return binarySearch(array, 0, size - 1, key);
}
int interpolationSearch(int array[], int size, int key) {
int guess = 0;
double t;
int minIndex = 0;
int maxIndex = size - 1;
while (array[guess] != key) {
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
}
return guess;
}
And then I wrote a simple main to test out the different sorts.
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include "regression.h"
#include "search.h"
using namespace std;
void randomizeArray(int array[], int size) {
for (int i = 0; i < size; ++i) {
array[i] = rand() % size;
}
}
int main(int argc, char * argv[]) {
int size = 100000;
string arg;
if (argc > 1) {
arg = argv[1];
size = atoi(arg.c_str());
}
srand(time(NULL));
int * array;
cout << "Creating Array Of Size " << size << "...\n";
array = new int[size];
randomizeArray(array, size);
cout << "Sorting Array...\n";
RegressionTable t(array, size, 0, size*2.5, 1.5, size);
//RegressionTable t(array, size);
t.sort();
int trials = 10000000;
int start;
cout << "Binary Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
binarySearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Interpolation Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
interpolationSearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Regression Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
t.find(i % size);
}
cout << clock() - start << endl;
return 0;
}
Give it a try and tell me if it's faster for you. It's super complicated, so it's really easy to break it if you don't know what you are doing. Be careful about modifying it.
I compiled the main with g++ on ubuntu.
Unless your data is known to have special properties, pure interpolation search has the risk of taking linear time. If you expect interpolation to help with most data but don't want it to hurt in the case of pathological data, I would use a (possibly weighted) average of the interpolated guess and the midpoint, ensuring a logarithmic bound on the run time.
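A hedged sketch of that guarded probe (the function name and the equal weighting are arbitrary illustrative choices): averaging the interpolated guess with the plain midpoint means every probe discards at least roughly a quarter of the remaining range, so the worst case stays logarithmic.
int guardedSearch(int sortedArray[], int toFind, int len) {
    int low = 0, high = len - 1;
    while (low <= high) {
        int l = sortedArray[low], h = sortedArray[high];
        if (toFind < l || toFind > h) return -1;  // outside the remaining range
        long long interp = (h > l)
            ? low + (long long)(high - low) * (toFind - l) / (h - l)
            : low;
        long long midpoint = low + (long long)(high - low) / 2;
        int mid = (int)((interp + midpoint) / 2); // blend of interpolation and bisection
        int m = sortedArray[mid];
        if (m < toFind)      low = mid + 1;
        else if (m > toFind) high = mid - 1;
        else                 return mid;
    }
    return -1; // Not found
}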
One way of approaching this is to use a space versus time trade-off. There are any number of ways that could be done. The extreme way would be to simply make an array with the max size being the max value of the sorted array. Initialize each position with the index into sortedArray. Then the search would simply be O(1).
The following version, however, might be a little more realistic and possibly be useful in the real world. It uses a "helper" structure that is initialized on the first call. It maps the search space down to a smaller space by dividing by a number that I pulled out of the air without much testing. It stores the index of the lower bound for a group of values in sortedArray into the helper map. The actual search divides the toFind number by the chosen divisor and extracts the narrowed bounds of sortedArray for a normal binary search.
For example, if the sorted values range from 1 to 1000 and the divisor is 100, then the lookup array might contain 10 "sections". To search for value 250, it would divide it by 100 to yield integer index position 250/100=2. map[2] would contain the sortedArray index for values 200 and larger. map[3] would have the index position of values 300 and larger thus providing a smaller bounding position for a normal binary search. The rest of the function is then an exact copy of your binary search function.
The initialization of the helper map might be more efficient by using a binary search to fill in the positions rather than a simple scan, but it is a one time cost so I didn't bother testing that. This mechanism works well for the given test numbers which are evenly distributed. As written, it would not be as good if the distribution was not even. I think this method could be used with floating point search values too. However, extrapolating it to generic search keys might be harder. For example, I am unsure what the method would be for character data keys. It would need some kind of O(1) lookup/hash that mapped to a specific array position to find the index bounds. It's unclear to me at the moment what that function would be or if it exists.
I kludged the setup of the helper map in the following implementation pretty quickly. It is not pretty and I'm not 100% sure it is correct in all cases but it does show the idea. I ran it with a debug test to compare the results against your existing binarySearch function to be somewhat sure it works correctly.
The following are example numbers:
100000 * 10000 : cycles binary search = 10197811
100000 * 10000 : cycles interpolation uint64_t = 9007939
100000 * 10000 : cycles interpolation float = 8386879
100000 * 10000 : cycles binary w/helper = 6462534
Here is the quick-and-dirty implementation:
#define REDUCTION 100 // pulled out of the air
typedef struct {
int init; // have we initialized it?
int numSections;
int *map;
int divisor;
} binhelp;
int binarySearchHelp( binhelp *phelp, int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low;
int high;
int mid;
if ( !phelp->init && len > REDUCTION ) {
int i;
int numSections = len / REDUCTION;
int divisor = (( sortedArray[len-1] - 1 ) / numSections ) + 1;
int threshold;
int arrayPos;
phelp->init = 1;
phelp->divisor = divisor;
phelp->numSections = numSections;
phelp->map = (int*)malloc((numSections+2) * sizeof(int));
phelp->map[0] = 0;
phelp->map[numSections+1] = len-1;
arrayPos = 0;
// Scan through the array and set up the mapping positions. Simple linear
// scan but it is a one-time cost.
for ( i = 1; i <= numSections; i++ ) {
threshold = i * divisor;
while ( arrayPos < len && sortedArray[arrayPos] < threshold )
arrayPos++;
if ( arrayPos < len )
phelp->map[i] = arrayPos;
else
// kludge to take care of aliasing
phelp->map[i] = len - 1;
}
}
if ( phelp->init ) {
int section = toFind / phelp->divisor;
if ( section > phelp->numSections )
// it is bigger than all values
return -1;
low = phelp->map[section];
if ( section == phelp->numSections )
high = len - 1;
else
high = phelp->map[section+1];
} else {
// use normal start points
low = 0;
high = len - 1;
}
// the following is a direct copy of the Kriss' binarySearch
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
The helper structure needs to be initialized (and memory freed):
help.init = 0;
unsigned long long totalcycles4 = 0;
... make the calls same as for the other ones but pass the structure ...
binarySearchHelp(&help, arr,searched[j],length);
if ( help.init )
free( help.map );
help.init = 0;
Look first at the data and at whether a big gain can be had from a data-specific method over a general method.
For large static sorted datasets, you can create an additional index to provide partial pigeonholing, based on the amount of memory you're willing to use. For example, say we create a 256x256 two-dimensional array of ranges, which we populate with the start and end positions in the search array of elements with the corresponding high-order bytes. When we come to search, we then use the high-order bytes of the key to find the range (subset) of the array we need to search. If we had ~20 comparisons in our binary search of 100,000 elements, O(log2(n)), we're now down to ~4 comparisons over 16 elements, or O(log2(n/15)). The memory cost here is about 512k.
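A hedged sketch of that kind of index, assuming non-negative int keys; the struct name and the 16-bit bucket width are illustrative choices, not taken from the question.
#include <vector>

struct TopBitsIndex {
    std::vector<int> start;   // start[b] = first position whose value >> 16 is >= b
    const int *arr;

    TopBitsIndex(const int sortedArray[], int n) : start(65537, n), arr(sortedArray) {
        for (int i = n - 1; i >= 0; --i)
            start[sortedArray[i] >> 16] = i;                      // first index of each occupied bucket
        for (int b = 65535; b >= 0; --b)
            if (start[b] > start[b + 1]) start[b] = start[b + 1]; // empty buckets borrow the next start
    }

    int find(int toFind) const {
        if (toFind < 0) return -1;                   // this sketch assumes non-negative keys
        int b = toFind >> 16;
        int low = start[b], high = start[b + 1] - 1; // the only slice that can contain toFind
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (arr[mid] < toFind)      low = mid + 1;
            else if (arr[mid] > toFind) high = mid - 1;
            else                        return mid;
        }
        return -1; // Not found
    }
};
A lookup then only binary searches the handful of elements that share the key's high-order bits.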
Another method, again suited to data that doesn't change much, is to divide the data into arrays of commonly sought items and rarely sought items. For example, if you leave your existing search in place running a wide number of real world cases over a protracted testing period, and log the details of the item being sought, you may well find that the distribution is very uneven, i.e. some values are sought far more regularly than others. If this is the case, break your array into a much smaller array of commonly sought values and a larger remaining array, and search the smaller array first. If the data is right (big if!), you can often achieve broadly similar improvements to the first solution without the memory cost.
There are many other data specific optimizations which score far better than trying to improve on tried, tested and far more widely used general solutions.
Posting my current version before the question is closed (hopefully I will thus be able to enhance it later). For now it is worse than every other version (if someone understands why my changes to the end of the loop have this effect, comments are welcome).
int newSearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l < toFind && h > toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (l == toFind)
return low;
else if (h == toFind)
return high;
else
return -1; // Not found
}
The implementation of the binary search that was used for the comparisons can be improved. The key idea is to "normalize" the range initially so that the target is always > a minimum and < a maximum after the first step. This increases the termination delta size. It also has the effect of special-casing targets that are less than the first element of the sorted array or greater than the last element of the sorted array. Expect approximately a 15% improvement in search time. Here is what the code might look like in C++.
int binarySearch(int * &array, int target, int min, int max)
{ // binarySearch
// normalize min and max so that we know the target is > min and < max
if (target <= array[min]) // if min not normalized
{ // target <= array[min]
if (target == array[min]) return min;
return -1;
} // end target <= array[min]
// min is now normalized
if (target >= array[max]) // if max not normalized
{ // target >= array[max]
if (target == array[max]) return max;
return -1;
} // end target >= array[max]
// max is now normalized
while (min + 1 < max)
{ // delta >=2
int tempi = min + ((max - min) >> 1); // point to index approximately in the middle between min and max
int atempi = array[tempi]; // just in case the compiler does not optimize this
if (atempi > target)max = tempi; // if the target is smaller, we can decrease max and it is still normalized
else if (atempi < target)min = tempi; // the target is bigger, so we can increase min and it is still normalized
else return tempi; // if we found the target, return with the index
// Note that it is important that this test for equality is last because it rarely occurs.
} // end delta >=2
return -1; // nothing in between normalized min and max
} // end binarySearch