What am I doing wrong in this C++ mergesort? - c++

The program compiles and runs. It reads a list of integers from an input file, but the output prints those numbers unchanged; I expect them sorted from least to greatest. For reference, I am trying to implement a version similar to the example on Wikipedia.
// arrA contains items to sort; arrB is an array to work in
void mergesort(int *arrA, int *arrB, int first, int last) {
// a 1 element array is already sorted
// make increasingly longer sorted lists
for (int width = 1; width < last; width = 2 * width) {
// arrA is made up of 1 or more sorted lists of size width
for (int i = 0; i < last; i += 2 * width) {
// merge two sorted lists
// or copy arrA to arrB if arrA is full
merge(arrA, i, min(i + width, last), min(i + 2 * width, last), arrB);
} // end for
// now arrB is full of sorted lists of size 2* width
// copy arrB into arrA for next iteration
copy(arrB, arrA, last);
} // end for
} // end mergesort
void merge(int *arrA, int iLeft, int iRight, int iEnd, int *arrB) {
int i0 = iLeft, i1 = iRight;
// while either list contains integers
for (int j = iLeft; j < iEnd; j++) {
// if 1st integer in left list is <= 1st integer of right list
if (i0 < iRight && (i1 >= iEnd || arrA[i0] <= arrA[i1])) {
arrB[j] = arrA[i0];
i0 += 1;
} // end if
else { // right head > left head
arrB[j] = arrA[i0];
i0 += 1;
} // end else
} // end for
} // end merge
void copy(int *origin, int *destination, int size) {
for (int i = 0; i < size; i++) {
destination[i] = origin[i];
} // end for
} // end copy
int main() {
int size = 0, first = 0, *arrA, *arrB;
// input data
read(&arrA, &arrB, &size);
// sorting
mergesort(arrA, arrB, first, size);
// output
write(arrA, first, size);
// cleanup
delete [] arrA;
delete [] arrB;
}
input
33 9 -2
output
33 9 -2

I haven't looked very deeply at your code, but this if-statement seems a bit off to me:
if (i0 < iRight && (i1 >= iEnd || arrA[i0] <= arrA[i1])) {
arrB[j] = arrA[i0];
i0 += 1;
} // end if
else { // right head > left head
arrB[j] = arrA[i0];
i0 += 1;
} // end else
Surely, the whole point of a pair of if/else clauses is that you do different things in the if vs. the else part. As far as I can tell, it's identical here.
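Presumably the else branch was meant to pull from the right-hand list instead; a minimal sketch of the corrected branch, keeping your variable names:
else { // right head > left head
arrB[j] = arrA[i1]; // take from the right list ...
i1 += 1; // ... and advance the right index
} // end else
With that change, the merge actually interleaves the two runs instead of copying the left run twice.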

Related

Why is this implementation of deleting an element from heap wrong?

Here is my implementation of deleting an element from Min Heap if the position of the element to be deleted is known:
void MinHeap::deleteKey(int i)
{
if(heap_size>0 && i<heap_size && i>=0)
{
if(heap_size==1)
heap_size--;
else
{
harr[i] = harr[heap_size-1];
heap_size--;
if(i<heap_size)
MinHeapify(i);
}
}
return ;
}
The function MinHeapify() is as follows:
void MinHeap::MinHeapify(int i)
{
int l = left(i);
int r = right(i);
int smallest = i;
if (l < heap_size && harr[l] < harr[i]) smallest = l;
if (r < heap_size && harr[r] < harr[smallest]) smallest = r;
if (smallest != i) {
swap(harr[i], harr[smallest]);
MinHeapify(smallest);
}
}
The structure of MinHeap is as follows:
struct MinHeap
{
int *harr;
int capacity, heap_size;
MinHeap(int cap) {heap_size = 0; capacity = cap; harr = new int[cap];}
int extractMin();
void deleteKey(int i);
void insertKey(int k);
int parent(int i);
int left(int i);
int right(int i);
};
This implementation of delete follows the logic of swapping the element to be deleted with the last element (I've just overwritten the last element onto the element to be deleted, since the deleted element isn't needed), and then decreasing the size of the heap array. Finally, we MinHeapify the heap from the position of the deleted element (which is now occupied by what was the last element).
This implementation is working for some but not all test cases.
What is the error with this approach?
Consider the following min heap:
0
/ \
4 1
/ \ / \
5 6 2 3
If you were to delete the node 5, your current algorithm would simply replace it with 3:
0
/ \
4 1
/ \ /
3 6 2
And since it has no children, nothing else is done. But this is not a min heap anymore, since 3 < 4, but 4 is a parent of 3.
To implement this you first need to sift-up the node, then sift-down (what you've called MinHeapify):
// Swap with parent until parent is less. Returns new index
int MinHeap::siftUp(int i)
{
while (i > 0)
{
int i_parent = parent(i);
if (harr[i_parent] < harr[i]) break;
swap(harr[i_parent], harr[i]);
i = i_parent;
}
return i;
}
// Swap with smallest child until it is smaller than both children. Returns new index
int MinHeap::siftDown(int i) {
while (true)
{
int l = left(i);
int r = right(i);
int smallest = i;
if (l < heap_size && harr[l] < harr[i]) smallest = l;
if (r < heap_size && harr[r] < harr[smallest]) smallest = r;
if (smallest == i) break;
swap(harr[i], harr[smallest]);
i = smallest;
}
return i;
}
void MinHeap::deleteKey(int i)
{
if (i<heap_size && i>=0)
{
if (i == heap_size-1)
heap_size--;
else
{
harr[i] = harr[heap_size-1];
heap_size--;
i = siftUp(i);
siftDown(i);
}
}
}
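Note that siftUp and siftDown also need to be declared inside struct MinHeap for this to compile. A quick way to reproduce the failing case from the diagram, assuming the usual (i-1)/2, 2i+1, 2i+2 index math in parent()/left()/right() (the members are public, so harr can be filled directly):
MinHeap h(7);
int vals[7] = {0, 4, 1, 5, 6, 2, 3};
for (int k = 0; k < 7; k++) h.harr[k] = vals[k];
h.heap_size = 7;
h.deleteKey(3); // deletes the 5; the 3 that replaces it sifts up past the 4
// expected level order afterwards: 0 3 1 4 6 2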

Really slow when getting maximum distance between m nodes (Backtracking) (c++)

I've been asked to do an exercise using backtracking, or backtracking plus branch and bound, where the input data are n, m and an n x n matrix. n represents a number of people, and m is how many of those n to pick. The matrix holds the distances between people, and the distance from i to j can differ from the distance from j to i.
I am trying to get the maximum total distance I can obtain from m nodes, where that total is the sum of the distances between all chosen pairs. For example, if I choose nodes 1, 2 and 4, the result is: distance(1, 2) + distance(2, 1) + distance(2, 4) + distance(4, 2) + distance(1, 4) + distance(4, 1).
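For clarity, here is a small sketch of the objective being maximized; subsetScore is my name for it, not something from the code below:
#include <vector>
#include <cstddef>

// Score of a chosen subset: the sum of matrix[i][j] + matrix[j][i]
// over all unordered pairs of chosen nodes.
int subsetScore(int **matrix, const std::vector<int> &chosen) {
    int total = 0;
    for (std::size_t a = 0; a < chosen.size(); a++)
        for (std::size_t b = a + 1; b < chosen.size(); b++)
            total += matrix[chosen[a]][chosen[b]] + matrix[chosen[b]][chosen[a]];
    return total;
}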
I have used backtracking with branch and bound (iterative, not recursive), storing the nodes (structs where I store the current value and the nodes used) that may lead me to a solution. Each node stores its lower and upper bound, that is, the lowest and highest solution I can obtain by continuing to expand this node and its children. From a node x, I generate all the possible child nodes (nodes that are not yet used), and I check whether each may lead to a solution; if not, that node is discarded and deleted.
The code I have implemented works, but it is really slow. With low values of n and m it is quick, but with higher values it is really slow.
This is the main function and the others functions that are used:
void backtracking(int **matrix, int n, int m){
/////////////////////////////////////////////////////
/*
Part where I try to get the minimum/maximum that I can get from the beginning of the problem
*/
// Lists where I store the values from the matrix, sort from the minimum to the maximum, and from
// the maximum to the minimum. The values are the distances, I mean, the sum of matrix[i][j] and
// matrix[j][i].
list<int> listMinSums; // list of minimum sums
list<int> listMaxSums; // list of maximum sums
int nMinimumSums = floor((m*m - m)/2); // rounding down
int nMaximumSums = ceil((m*m - m)/2); // rounding up
/*
* m*m - m = Given m nodes, there are m*m - m sums.
*
* I count matrix[i][j] + matrix[j][i] as one, so there
* are (m*m - m)/2 sums.
*/
for (int i = 0; i < n; i++){
for (int j = 0; j < i; j++){
int x = (matrix[i][j] + matrix[j][i]);
// to differentiate from the minimum and maximum sums, I use false and true
aLista(listMinSums, x, nMinimumSums, false);
aLista(listMaxSums, x, nMaximumSums, true);
}
}
int min = 0;
int max = 0;
int contador = 0; // counter so we do not go past the number of minimum/maximum sums
list<int>::iterator it = listMinSums.begin();
while (it != listMinSums.end() && contador < nMinimumSums){
min += *it;
it++;
contador++;
}
contador = 0;
list<int>::iterator it2 = listMaxSums.begin();
while (it2 != listMaxSums.end() && contador < nMaximumSums){
max += *it2;
it2++;
contador++;
}
//////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////
// LLV = List of Live Nodes. Where I store the nodes that are going to
// guide me to the solution
list<nodo*> llv;
// I do not store the root node; I store the first n nodes of the first level.
for (int i = 0; i < n; i++){
nodo *nod = new nodo(n);
nod ->level = 0;
//lower bound. It's the lower solution i can get from this node
nod ->ci = min;
// upper bound. The higher solution i can get from this node.
nod ->cs = max;
// estimated benefit; when choosing between two nodes, I pick the one with the higher value
nod ->be = (min+max)/2;
// The node i is used
nod ->used[i] = true;
// Inserting this node to the list of live nodes.
insert(llv, nod);
}
int solution = 0; // Initial solution
int c = min; // c = pruning threshold: a node and its "sons" are abandoned once their bound falls below it
while (!empty(llv)){
nodo *x = erase(llv, n); // erasing the node with the highest estimated benefit from the llv.
if (x ->cs > c){
for (int i = 0; i < n; i++){ // Creating every son of the x node...
if (!(x ->used[i])){ // ... that has not being used yet
nodo *y = new nodo(n);
y ->level = x ->level + 1;
for (int j = 0; j < n; j++){
y ->used[j] = x ->used[j];
}
y ->used[i] = true;
// Adding the values. For example, if node 1 and 2 were inserted, and this is the node 4,
// adding matrix[1][4]+matrix[4][1]+matrix[2][4] + matrix[4][2]
int acum = 0;
for (int k = 0; k < n; k++){
if (k != i && consult(x ->used, k))
acum += matrix[i][k] + matrix[k][i];
}
y ->bact = x ->bact + acum;
// Getting the lower and upper bound of this node y.
cotas(y, x, i, y ->level, n, matrix);
y ->be = (y ->ci + y ->cs)/2;
// Node where i can get the solution
if (y ->level == (m-1) && y ->bact > solution){
solution = y ->bact;
if (y ->bact > c)
c = y ->bact;
}
// Checking if i can update c
else if (y ->level != (m-1) && y ->cs > c){
insert(llv, y);
if (y ->ci > c)
c = y ->ci;
}
else{
// I cannot use this node anymore, so I delete it.
delete y;
}
}
}
}
}
cout << solution << endl;
liberacionMemoria(matrix, n); // freeing the memory used in the matrix
}
void liberacionMemoria(int **matriz, int n){
for (int i = 0; i < n; i++)
delete[] matriz[i];
delete[] matriz;
}
void insert(list<nodo*> &t, nodo *x){
list<nodo*>::iterator it= t.begin();
t.insert(it, x);
}
/*
* Getting the node with the highest estimated benefit from the list of live nodes
* */
nodo* erase (list<nodo*> &t, int n){
nodo* erased = new nodo(n);
erased ->level = -1;
erased ->be = -1;
list<nodo*>::iterator it= t.begin();
list<nodo*>::iterator it2;
while (it != t.end()){
nodo* aux = *it;
if (aux ->be > erased ->be){
it2 = it;
erased = aux;
}
else if (aux ->be == erased ->be && aux ->level > erased ->level){
it2 = it;
erased = aux;
}
it++;
}
t.erase(it2);
return erased;
}
/*
* Checking whether, in the array of used nodes, the node at position x is used
* */
bool consult(bool *nodesUsed, int x){
if (nodesUsed[x])
return true;
return false;
}
bool empty(list<nodo*> &t){
list<nodo*>::iterator it= t.begin();
return (it==t.end());
}
bool aLista(list<int> &t, int x, int m, bool MayorAMenor){
list<int>::iterator it = t.begin();
int contador = 0;
while (it != t.end() && contador < m){
if (!MayorAMenor){ // lower to upper
if (*it > x){
t.insert(it, x);
return true;
}
}
else{
if (*it < x){
t.insert(it, x);
return true;
}
}
contador++;
it++;
}
if (it == t.end() && contador < m){
t.insert(it, x);
return true;
}
return false;
}
void cotas(nodo *sonNode, nodo *fatherNode, int elegido, int m, int n, int **matriz){
int max = 0;
int min = 999;
// Getting the sums from the chosen node with the already used
for (int i = 0; i < n; i++){
if (consult(sonNode ->used, i)){
if (elegido != i){
int x = matriz[i][elegido] + matriz[elegido][i];
if (x > max)
max = x;
if (x < min)
min = x;
}
}
}
min *= (m-1);
max *= (m-1);
min += fatherNode -> bact;
max += fatherNode -> bact;
sonNode -> ci = fatherNode ->ci - min;
sonNode -> cs = fatherNode ->cs - max;
}
I think the reason it gets really slow when n and m are a bit high is that the upper and lower bounds of the nodes are not accurate, but I don't know how to make them better.
I've spent many days thinking about how to do it and trying, but nothing works.
Here there are some examples:
Given an n = 4 and m = 2 and the following matrix:
0 3 2 4
2 0 4 5
2 1 0 4
2 3 2 0
the result is 8. This works and it is quick.
But with n = 40 and m = 10 it never ends...
I hope someone may help me. Thanks.
****EDIT******
I may not have explained it well. My question is: from a node x, how can I know the least and the most that I can still obtain?
The length of the solution depends on m, but the result changes depending on which nodes I choose, and I don't know how to obtain an accurate minimum and maximum for a node, so that I can prune the branches that will not lead me to a solution.
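For what it's worth, one standard way to get a bound that is never an underestimate (a sketch under my own assumptions, not a fix for your exact code): a node that has already chosen k of the m people still has to add m*(m-1)/2 - k*(k-1)/2 pair sums, so adding the largest that many pair sums in the whole matrix to bact gives an upper bound that no completion can beat. Restricting the candidate pair sums to pairs involving at least one unused node makes it tighter.
// pairSumsDesc: every matrix[i][j] + matrix[j][i] (i < j), sorted descending.
// pairsLeft: m*(m-1)/2 - k*(k-1)/2 for a node that has chosen k people.
int upperBound(const std::vector<int> &pairSumsDesc, int bact, int pairsLeft) {
    int bound = bact;
    for (int p = 0; p < pairsLeft && p < (int)pairSumsDesc.size(); p++)
        bound += pairSumsDesc[p]; // no completion can add more than these
    return bound;
}
The analogous lower bound uses the smallest pair sums; the tighter these get, the earlier c can prune.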

Merge sort variant: using link array

I was looking for the variants of Merge Sort. So my textbook says,
A variant of the function Merge, in which no records need to be moved at all, can be implemented by use of an auxiliary array of links.
Firstly, I would like to state the code.
Algo MergeSort(low,high)
{
// a is the array to be sorted using auxiliary array link.
if(high-low<15)
return InsertionSort1(a,link,low,high);
else
{
mid = (low+high)/2;
q=MergeSort(low,mid);
r=MergeSort(mid+1,high);
return Merge1(q,r);
}
}
// The function Merge1 is as defined:
function Merge1(q,r)
{
// q and r are pointers to list contained in the global array link[0:n], the lists pointed at by q //and r are merged and a pointer to the beginning of the merged list is returned.
i=q;
j=r;
k=0;
while(i!=0 && j!=0)
{
if(a[i]<=a[j])
{
link[k]=i;
k=i;
i=link[i];
}
else
{
link[k]=j;
k=j;
j=link[j];
}
} // end of while
if(i==0)
link[k]=j;
else
link[k]=i;
return link[0];
}
Okay so what I understood of the algorithm is:
If the number of elements are less than 15, apply insertion sort and sort those elements.
This way, we will get many lists that will be sorted by themselves but the entire array will not be sorted as such.
To sort the entire array, the function Merge is used.
My question is,
How does the function Merge1 combine the different sorted lists into one sorted list? I don't have any idea of the concept of the link array.
I'm sorry, but I have tried very hard to understand and I still don't get how the output array ends up "sorted".
Any kind of example will be of utmost help.
Thank You.
I cleaned up the code, and I also added a bottom-up version that uses an array of starting indexes (see below). I changed high in MergeSort() to end, so the call is now MergeSort(0, SIZE). i = MergeSort() returns the index of the smallest value in a[]; then i = link[i] is the 2nd element, i = link[i] again is the 3rd element, and so on until i == -1. Instead of using insertion sort, MergeSort() directly sorts groups of size 1 or 2 and initializes link[].
MergeLists() uses head for the start of a list (the old code uses link[0]), and -1 for the end of a list (the old code uses 0). This allows sorting of a[0] to a[n-1] (the old code was sorting a[1] to a[n], with a[0] unused).
If a[] ={5,4,8,7}, then MergeSort() returns a 1, and link[] = {3,0,-1,2}, link[1] = 0, link[0] = 3, link[3] = 2, link[2] = -1, so the order is a[1], a[0], a[3], a[2].
#include <stdio.h>  /* printf */
#include <stddef.h> /* size_t */
#define SIZE 4
static unsigned int a[SIZE] = {5,4,8,7}; /* matches the worked example above */
static size_t link[SIZE]; /* index to next element; (size_t)-1 marks end of list */
size_t MergeLists(size_t i, size_t j)
{
size_t head;
size_t *pprev = &head; /* ptr: head or link[] */
while((i != -1) && (j != -1)){ /* while not end lists */
if(a[i] <= a[j]){ /* if i < j */
*pprev = i; /* link to i */
pprev = &link[i]; /* advance pprev */
i=*pprev; /* advance i */
} else { /* else */
*pprev = j; /* link to j */
pprev = &link[j]; /* advance pprev */
j=*pprev; /* advance j */
}
}
if(i == -1) /* if end of i list */
*pprev=j; /* link to rest of j */
else /* else */
*pprev=i; /* link to rest of i */
return head;
}
size_t MergeSort(size_t low, size_t end)
{
size_t mid, i, j;
if((end - low) == 0){ /* if size == 0 */
return low; /* (only on first call) */
}
if((end - low) == 1){ /* if size == 1 */
link[low] = -1; /* initialize link[] */
return low; /* return index */
}
if((end - low) == 2){ /* if size == 2 */
if(a[low] <= a[end-1]){ /* if in order */
link[low] = end-1; /* initialize link[] */
link[end-1] = -1;
return low; /* return index */
} else { /* else */
link[end-1] = low; /* initialize link[] */
link[low] = -1;
return end-1; /* return index */
}
}
mid = (low+end)/2; /* size > 2, recursively */
i = MergeSort(low, mid); /* split lists until */
j = MergeSort(mid, end); /* size <= 2 */
return MergeLists(i, j); /* merge a pair of lists */
}
int main(void)
{
size_t i;
i = MergeSort(0, SIZE);
do{
printf("%3d", a[i]);
i = link[i];
}while(i != -1);
return 0;
}
This is a non-recursive example. It uses an array of starting indexes S[]. N[] is the same as link[] above, and MergeLists() is the same as before. S[0] points to lists of size 1, S[1] to lists of size 2, S[2] to lists of size 4, ..., S[i] to lists of size 2^i (2 to the power i). S[31] points to a list of unlimited size. Elements are merged into the array one at a time, then the array's lists are merged to form a single list.
#define NUMIDX (32) // number of indexes in array
// A[] is array to be sorted
// N[] is array of indexes to next index
// l is index of N[] to left list
// r is index of N[] to right list
// returns starting index (l or r) for merged list
size_t MergeLists(int A[], size_t N[], size_t l, size_t r)
{
size_t head;
size_t *pprev = &head; // ptr: head or N[]
while((l != -1) && (r != -1)){ // while not end lists
if(A[l] <= A[r]){ // if l <= r
*pprev = l; // link to l
pprev = &N[l]; // advance pprev
l=*pprev; // advance l
} else { // else
*pprev = r; // link to r
pprev = &N[r]; // advance pprev
r=*pprev; // advance r
}
}
if(l == -1) // if end of l list
*pprev=r; // link to rest of r
else // else
*pprev=l; // link to rest of l
return head;
}
// A[] is array to be sorted
// N[] is set to array of indexes to next index (-1 = end list)
// low is starting index of A[]
// end is ending index of A[] (1 past last)
// returns starting index of N[] for merged list
// S[] is array of starting indexes in N[]
// S[i] is starting index of list of size pow(2,i)
size_t MergeSort(int A[], size_t N[], size_t low, size_t end)
{
size_t S[NUMIDX]; // array of starting indexes
size_t i,j;
if((end - low) == 0){ // if size == 0
return low; // (only on first call)
}
for(i = 0; i < (end-low); i++) // init N[]
N[i] = -1;
for(i = 0; i < NUMIDX; i++) // init S[]
S[i] = -1;
for(j = low; j < end; j++){ // merge index lists into S[], N[]
low = j;
for(i = 0; (i < NUMIDX) && (S[i] != -1); i++){
low = MergeLists(A, N, S[i], low);
S[i] = -1;
}
if(i == NUMIDX)
i--;
S[i] = low;
}
low = -1; // merge S[] lists to one list in N[]
for(i = 0; i < NUMIDX; i++)
low = MergeLists(A, N, S[i], low);
return low;
}
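A hypothetical driver for this version, following the same conventions as above ((size_t)-1 as the end-of-list marker, N[] the same size as A[]):
#include <stdio.h>

int main(void)
{
    int A[4] = {5, 4, 8, 7};
    size_t N[4];
    size_t i = MergeSort(A, N, 0, 4);
    while (i != (size_t)-1) {   /* walk the linked result */
        printf("%3d", A[i]);    /* prints: 4 5 7 8 */
        i = N[i];
    }
    return 0;
}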

Having trouble getting Merge sort to be O(n log n)

In full disclosure, I'm a student and having trouble with merge sort. It is obviously supposed to be O(n log n), but mine behaves more like O(n^2). I think the problem lies in the tempList, as you'll see in the code, but the program description says to use static int tempList[LIST_SIZE] to avoid degradation.
Here's what I have; the measured runtime is around 16000, which is obviously way too long for merge sort.
void mergeSort(int randomNum[], int lowIdx, int highIdx)
{
int midIdx;
if (lowIdx < highIdx)
{
midIdx = (highIdx + lowIdx) / 2;
mergeSort(randomNum, lowIdx, midIdx);
mergeSort(randomNum, midIdx + 1, highIdx);
merge(randomNum, lowIdx, midIdx, highIdx);
}
}
Here is the second portion of the sort
void merge(int randomNum[], int lowIdx, int midIdx, int highIdx)
{
static int tempList[MAX_SORT];
for (int count = 0; count <= highIdx; count++)
tempList[count] = randomNum[count];
int leftIdx = lowIdx,
rightIdx = midIdx + 1,
tempPos = lowIdx;
while (leftIdx <= midIdx && (rightIdx <= highIdx))
{
if (tempList[leftIdx] <= tempList[rightIdx])
{
randomNum[tempPos] = tempList[leftIdx];
leftIdx++;
}
else
{
randomNum[tempPos] = tempList[rightIdx];
rightIdx++;
}
tempPos++;
}
while (leftIdx <= midIdx)
{
randomNum[tempPos] = tempList[leftIdx];
tempPos++;
leftIdx++;
}
while (rightIdx <= highIdx)
{
randomNum[tempPos] = tempList[rightIdx];
tempPos++;
rightIdx++;
}
}
The details of the program are that we have an array with 100000 random numbers and sort it using various sorting algorithms. The other sorts are working as expected, but this one seems to be off by a lot in comparison to what the big-O is supposed to be.
Can someone please help?
Not sure if this is all of your problem, but this is one issue:
You are copying randomNum to tempList from 0 to highIdx, but you only ever access tempList from lowIdx to highIdx.
That means that all the items you copied from 0 to lowIdx are wasted copies.
Solution: Only copy what you need.
for (int count = lowIdx; count <= highIdx; count++)
You might want to consider a bottom-up merge sort; example template code follows. a[] is the array to be sorted, b[] is a temp array with the same size as a[]. The sorted data may end up in either a[] or b[]. This can be modified to always end up with the data in a[] by doing a pass-count check at the start and optionally skipping the initial swap-in-place pass if there will be an even number of passes.
#include <cstddef>   // size_t
#include <algorithm> // std::swap

// forward declarations so the calls inside BottomUpMergeSort resolve
template <typename T>
void BottomUpCopy(T a[], T b[], size_t ll, size_t rr);
template <typename T>
void BottomUpMerge(T a[], T b[], size_t ll, size_t rr, size_t ee);

template <typename T>
T * BottomUpMergeSort(T a[], T b[], size_t n)
{
for(size_t s = 1; s < n; s += 2) // swap in place for 1st pass
if(a[s] < a[s-1])
std::swap(a[s], a[s-1]);
for(size_t s = 2; s < n; s <<= 1){ // s = run size
size_t ee = 0; // init end index
while(ee < n){ // merge pairs of runs
size_t ll = ee; // ll = start of left run
size_t rr = ll+s; // rr = start of right run
if(rr >= n){ // if only left run
rr = n;
BottomUpCopy(a, b, ll, rr); // copy left run
break; // end of pass
}
ee = rr+s; // ee = end of right run
if(ee > n)
ee = n;
BottomUpMerge(a, b, ll, rr, ee);
}
std::swap(a, b); // swap a and b
}
return a; // return sorted array
}
template <typename T>
void BottomUpCopy(T a[], T b[], size_t ll, size_t rr)
{
while(ll < rr){ // copy left run
b[ll] = a[ll];
ll++;
}
}
template <typename T>
void BottomUpMerge(T a[], T b[], size_t ll, size_t rr, size_t ee)
{
size_t o = ll; // b[] index
size_t l = ll; // a[] left index
size_t r = rr; // a[] right index
while(1){ // merge data
if(a[l] <= a[r]){ // if a[l] <= a[r]
b[o++] = a[l++]; // copy a[l]
if(l < rr) // if not end of left run
continue; // continue (back to while)
while(r < ee){ // else copy rest of right run
b[o++] = a[r++];
}
break; // and return
} else { // else a[l] > a[r]
b[o++] = a[r++]; // copy a[r]
if(r < ee) // if not end of right run
continue; // continue (back to while)
while(l < rr){ // else copy rest of left run
b[o++] = a[l++];
}
break; // and return
}
}
}
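A minimal usage sketch (my code, untested): the important detail is to keep the returned pointer, since the sorted run may end up in b[] rather than a[].
#include <iostream>

int main() {
    int a[5] = {5, 1, 4, 2, 3};
    int b[5];
    int *sorted = BottomUpMergeSort(a, b, 5); // may point at a or b
    for (int i = 0; i < 5; i++)
        std::cout << sorted[i] << ' ';        // prints: 1 2 3 4 5
}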

What is the fastest search method for a sorted array?

Answering another question, I wrote the program below to compare different search methods on a sorted array. Basically I compared two implementations of interpolation search and one of binary search. I compared performance by counting the cycles spent (on the same set of data) by the different variants.
However, I'm sure there are ways to optimize these functions to make them even faster. Does anyone have any ideas on how I can make this search function faster? A solution in C or C++ is acceptable, but I need it to handle an array with 100000 elements.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <assert.h>
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long int x;
__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
return x;
}
int interpolationSearch(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int64_t low = 0;
int64_t high = len - 1;
int64_t mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + (int64_t)((int64_t)(high - low)*(int64_t)(toFind - l))/((int64_t)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int interpolationSearch2(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int binarySearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int order(const void *p1, const void *p2) { return *(int*)p1-*(int*)p2; }
int main(void) {
int i = 0, j = 0, size = 100000, trials = 10000;
int searched[trials];
srand(-time(0));
for (j=0; j<trials; j++) { searched[j] = rand()%size; }
while (size > 10){
int arr[size];
for (i=0; i<size; i++) { arr[i] = rand()%size; }
qsort(arr,size,sizeof(int),order);
unsigned long long totalcycles_bs = 0;
unsigned long long totalcycles_is_64 = 0;
unsigned long long totalcycles_is_float = 0;
unsigned long long totalcycles_new = 0;
int res_bs, res_is_64, res_is_float, res_new;
for (j=0; j<trials; j++) {
unsigned long long tmp, cycles = rdtsc();
res_bs = binarySearch(arr,searched[j],size);
tmp = rdtsc(); totalcycles_bs += tmp - cycles; cycles = tmp;
res_is_64 = interpolationSearch(arr,searched[j],size);
assert(res_is_64 == res_bs || arr[res_is_64] == searched[j]);
tmp = rdtsc(); totalcycles_is_64 += tmp - cycles; cycles = tmp;
res_is_float = interpolationSearch2(arr,searched[j],size);
assert(res_is_float == res_bs || arr[res_is_float] == searched[j]);
tmp = rdtsc(); totalcycles_is_float += tmp - cycles; cycles = tmp;
}
printf("----------------- size = %10d\n", size);
printf("binary search = %10llu\n", totalcycles_bs);
printf("interpolation uint64_t = %10llu\n", totalcycles_is_64);
printf("interpolation float = %10llu\n", totalcycles_is_float);
printf("new = %10llu\n", totalcycles_new);
printf("\n");
size >>= 1;
}
}
If you have some control over the in-memory layout of the data, you might want to look at Judy arrays.
Or to put a simpler idea out there: a binary search always cuts the search space in half. An optimal cut point can be found with interpolation (the cut point should NOT be the place where the key is expected to be, but the point which minimizes the statistical expectation of the search space for the next step). This minimizes the number of steps but... not all steps have equal cost. Hierarchical memories allow executing a number of tests in the same time as a single test, if locality can be maintained. Since a binary search's first M steps only touch a maximum of 2**M unique elements, storing these together can yield a much better reduction of search space per-cacheline fetch (not per comparison), which is higher performance in the real world.
n-ary trees work on that basis, and then Judy arrays add a few less important optimizations.
Bottom line: even "Random Access Memory" (RAM) is faster when accessed sequentially than randomly. A search algorithm should use that fact to its advantage.
Benchmarked on Win32 Core2 Quad Q6600, gcc v4.3 msys. Compiling with g++ -O3, nothing fancy.
Observation - the asserts, timing and loop overhead are about 40%, so any gains listed below should be divided by 0.6 to get the actual improvement in the algorithms under test.
Simple answers:
On my machine replacing the int64_t with int for "low", "high" and "mid" in interpolationSearch gives a 20% to 40% speed up. This is the fastest easy method I could find. It is taking about 150 cycles per look-up on my machine (for the array size of 100000). That's roughly the same number of cycles as a cache miss. So in real applications, looking after your cache is probably going to be the biggest factor.
Replacing binarySearch's "/2" with a ">>1" gives a 4% speed up.
Using STL's binary_search algorithm, on a vector containing the same data as "arr", is about the same speed as the hand-coded binarySearch. Although at smaller sizes, STL is much slower - around 40%.
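For reference, the STL comparison was presumably along these lines (std::binary_search only reports presence, so std::lower_bound is the call that recovers an index):
#include <algorithm>
#include <vector>

// Returns the index of toFind in the sorted vector v, or -1 if absent.
int stlSearch(const std::vector<int> &v, int toFind) {
    std::vector<int>::const_iterator it =
        std::lower_bound(v.begin(), v.end(), toFind);
    return (it != v.end() && *it == toFind) ? int(it - v.begin()) : -1;
}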
I have an excessively complicated solution, which requires a specialized sorting function. The sort is slightly slower than a good quicksort, but all of my tests show that the search function is much faster than a binary or interpolation search. I called it a regression sort before I found out that the name was already taken, but didn't bother to think of a new name (ideas?).
There are three files to demonstrate.
The regression sort/search code:
#include <sstream>
#include <math.h>
#include <ctime>
#include "limits.h"
void insertionSort(int array[], int length) {
int key, j;
for(int i = 1; i < length; i++) {
key = array[i];
j = i - 1;
while (j >= 0 && array[j] > key) {
array[j + 1] = array[j];
--j;
}
array[j + 1] = key;
}
}
class RegressionTable {
public:
RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs);
RegressionTable(int arr[], int s);
void sort(void);
int find(int key);
void printTable(void);
void showSize(void);
private:
void createTable(void);
inline unsigned int resolve(int n);
int * array;
int * table;
int * tableSize;
int size;
int lowerBound;
int upperBound;
int divisions;
int divisionSize;
int newSize;
double multiplier;
};
RegressionTable::RegressionTable(int arr[], int s) {
array = arr;
size = s;
multiplier = 1.35;
divisions = sqrt(size);
upperBound = INT_MIN;
lowerBound = INT_MAX;
for (int i = 0; i < size; ++i) {
if (array[i] > upperBound)
upperBound = array[i];
if (array[i] < lowerBound)
lowerBound = array[i];
}
createTable();
}
RegressionTable::RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs) {
array = arr;
size = s;
lowerBound = lower;
upperBound = upper;
multiplier = mult;
divisions = divs;
createTable();
}
void RegressionTable::showSize(void) {
int bytes = sizeof(*this);
bytes = bytes + sizeof(int) * 2 * (divisions + 1);
}
void RegressionTable::createTable(void) {
divisionSize = size / divisions;
newSize = multiplier * double(size);
table = new int[divisions + 1];
tableSize = new int[divisions + 1];
for (int i = 0; i < divisions; ++i) {
table[i] = 0;
tableSize[i] = 0;
}
for (int i = 0; i < size; ++i) {
++table[((array[i] - lowerBound) / divisionSize) + 1];
}
for (int i = 1; i <= divisions; ++i) {
table[i] += table[i - 1];
}
table[0] = 0;
for (int i = 0; i < divisions; ++i) {
tableSize[i] = table[i + 1] - table[i];
}
}
int RegressionTable::find(int key) {
double temp = multiplier;
multiplier = 1;
int minIndex = table[(key - lowerBound) / divisionSize];
int maxIndex = minIndex + tableSize[(key - lowerBound) / divisionSize];
int guess = resolve(key);
double t;
while (array[guess] != key) {
// uncomment this line if you want to see where it is searching.
//cout << "Regression Guessing " << guess << ", not there." << endl;
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
}
multiplier = temp;
return guess;
}
inline unsigned int RegressionTable::resolve(int n) {
float temp;
int subDomain = (n - lowerBound) / divisionSize;
temp = n % divisionSize;
temp /= divisionSize;
temp *= tableSize[subDomain];
temp += table[subDomain];
temp *= multiplier;
return (unsigned int)temp;
}
void RegressionTable::sort(void) {
int * out = new int[int(size * multiplier)];
bool * used = new bool[int(size * multiplier)](); // value-initialized to false
int higher, lower;
bool placed;
for (int i = 0; i < size; ++i) {
/* Figure out where to put the darn thing */
higher = resolve(array[i]);
lower = higher - 1;
if (higher > newSize) {
higher = size;
lower = size - 1;
} else if (lower < 0) {
higher = 0;
lower = 0;
}
placed = false;
while (!placed) {
if (higher < size && !used[higher]) {
out[higher] = array[i];
used[higher] = true;
placed = true;
} else if (lower >= 0 && !used[lower]) {
out[lower] = array[i];
used[lower] = true;
placed = true;
}
--lower;
++higher;
}
}
int index = 0;
for (int i = 0; i < size * multiplier; ++i) {
if (used[i]) {
array[index] = out[i];
++index;
}
}
insertionSort(array, size);
}
And then there is the regular search functions:
#include <iostream>
using namespace std;
int binarySearch(int array[], int start, int end, int key) {
// If the bounds have crossed, the key is not here.
if (start > end)
return -1;
// Determine the search point.
int searchPos = (start + end) / 2;
// Search the bottom half of the array if the query is smaller.
if (array[searchPos] > key)
return binarySearch (array, start, searchPos - 1, key);
// Search the top half of the array if the query is larger.
if (array[searchPos] < key)
return binarySearch (array, searchPos + 1, end, key);
// If we found it then we are done.
return searchPos;
}
int binarySearch(int array[], int size, int key) {
return binarySearch(array, 0, size - 1, key);
}
int interpolationSearch(int array[], int size, int key) {
int guess = 0;
double t;
int minIndex = 0;
int maxIndex = size - 1;
while (array[guess] != key) {
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
}
return guess;
}
And then I wrote a simple main to test out the different sorts.
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include "regression.h"
#include "search.h"
using namespace std;
void randomizeArray(int array[], int size) {
for (int i = 0; i < size; ++i) {
array[i] = rand() % size;
}
}
int main(int argc, char * argv[]) {
int size = 100000;
string arg;
if (argc > 1) {
arg = argv[1];
size = atoi(arg.c_str());
}
srand(time(NULL));
int * array;
cout << "Creating Array Of Size " << size << "...\n";
array = new int[size];
randomizeArray(array, size);
cout << "Sorting Array...\n";
RegressionTable t(array, size, 0, size*2.5, 1.5, size);
//RegressionTable t(array, size);
t.sort();
int trials = 10000000;
int start;
cout << "Binary Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
binarySearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Interpolation Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
interpolationSearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Regression Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
t.find(i % size);
}
cout << clock() - start << endl;
return 0;
}
Give it a try and tell me if it's faster for you. It's super complicated, so it's really easy to break it if you don't know what you are doing. Be careful about modifying it.
I compiled the main with g++ on ubuntu.
Unless your data is known to have special properties, pure interpolation search has the risk of taking linear time. If you expect interpolation to help with most data but don't want it to hurt in the case of pathological data, I would use a (possibly weighted) average of the interpolated guess and the midpoint, ensuring a logarithmic bound on the run time.
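A minimal sketch of that guarded variant (my code, not run against the benchmark above): blend the interpolated guess with the binary midpoint, so every step still discards a fixed fraction of the range even on pathological data.
/* Returns index of toFind in sortedArray, or -1 if not found. */
int guardedSearch(int sortedArray[], int toFind, int len) {
    int low = 0, high = len - 1;
    while (low <= high) {
        int l = sortedArray[low], h = sortedArray[high];
        if (toFind < l || toFind > h)
            return -1; /* out of range of the remaining slice */
        int interp = (h > l)
            ? low + (int)(((float)(high - low) * (toFind - l)) / (h - l))
            : low;
        int mid = (interp + (low + high) / 2) / 2; /* blend the two guesses */
        int m = sortedArray[mid];
        if (m < toFind) low = mid + 1;
        else if (m > toFind) high = mid - 1;
        else return mid;
    }
    return -1; /* not found */
}
A weighted blend (e.g. 3 parts interpolation, 1 part midpoint) shifts the trade-off toward interpolation while keeping the logarithmic guarantee.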
One way of approaching this is to use a space versus time trade-off. There are any number of ways that could be done. The extreme way would be to simply make an array with the max size being the max value of the sorted array. Initialize each position with the index into sortedArray. Then the search would simply be O(1).
The following version, however, might be a little more realistic and possibly be useful in the real world. It uses a "helper" structure that is initialized on the first call. It maps the search space down to a smaller space by dividing by a number that I pulled out of the air without much testing. It stores the index of the lower bound for a group of values in sortedArray into the helper map. The actual search divides the toFind number by the chosen divisor and extracts the narrowed bounds of sortedArray for a normal binary search.
For example, if the sorted values range from 1 to 1000 and the divisor is 100, then the lookup array might contain 10 "sections". To search for value 250, it would divide it by 100 to yield integer index position 250/100=2. map[2] would contain the sortedArray index for values 200 and larger. map[3] would have the index position of values 300 and larger thus providing a smaller bounding position for a normal binary search. The rest of the function is then an exact copy of your binary search function.
The initialization of the helper map might be more efficient by using a binary search to fill in the positions rather than a simple scan, but it is a one time cost so I didn't bother testing that. This mechanism works well for the given test numbers which are evenly distributed. As written, it would not be as good if the distribution was not even. I think this method could be used with floating point search values too. However, extrapolating it to generic search keys might be harder. For example, I am unsure what the method would be for character data keys. It would need some kind of O(1) lookup/hash that mapped to a specific array position to find the index bounds. It's unclear to me at the moment what that function would be or if it exists.
I kludged the setup of the helper map in the following implementation pretty quickly. It is not pretty and I'm not 100% sure it is correct in all cases but it does show the idea. I ran it with a debug test to compare the results against your existing binarySearch function to be somewhat sure it works correctly.
The following are example numbers:
100000 * 10000 : cycles binary search = 10197811
100000 * 10000 : cycles interpolation uint64_t = 9007939
100000 * 10000 : cycles interpolation float = 8386879
100000 * 10000 : cycles binary w/helper = 6462534
Here is the quick-and-dirty implementation:
#define REDUCTION 100 // pulled out of the air
typedef struct {
int init; // have we initialized it?
int numSections;
int *map;
int divisor;
} binhelp;
int binarySearchHelp( binhelp *phelp, int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low;
int high;
int mid;
if ( !phelp->init && len > REDUCTION ) {
int i;
int numSections = len / REDUCTION;
int divisor = (( sortedArray[len-1] - 1 ) / numSections ) + 1;
int threshold;
int arrayPos;
phelp->init = 1;
phelp->divisor = divisor;
phelp->numSections = numSections;
phelp->map = (int*)malloc((numSections+2) * sizeof(int));
phelp->map[0] = 0;
phelp->map[numSections+1] = len-1;
arrayPos = 0;
// Scan through the array and set up the mapping positions. Simple linear
// scan but it is a one-time cost.
for ( i = 1; i <= numSections; i++ ) {
threshold = i * divisor;
while ( arrayPos < len && sortedArray[arrayPos] < threshold )
arrayPos++;
if ( arrayPos < len )
phelp->map[i] = arrayPos;
else
// kludge to take care of aliasing
phelp->map[i] = len - 1;
}
}
if ( phelp->init ) {
int section = toFind / phelp->divisor;
if ( section > phelp->numSections )
// it is bigger than all values
return -1;
low = phelp->map[section];
if ( section == phelp->numSections )
high = len - 1;
else
high = phelp->map[section+1];
} else {
// use normal start points
low = 0;
high = len - 1;
}
// the following is a direct copy of the Kriss' binarySearch
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
The helper structure needs to be initialized (and memory freed):
help.init = 0;
unsigned long long totalcycles4 = 0;
... make the calls same as for the other ones but pass the structure ...
binarySearchHelp(&help, arr,searched[j],length);
if ( help.init )
free( help.map );
help.init = 0;
Look first at the data, and at whether a big gain can be had from a data-specific method over a general method.
For large static sorted datasets, you can create an additional index to provide partial pigeonholing, based on the amount of memory you're willing to use. E.g., say we create a 256x256 two-dimensional array of ranges, which we populate with the start and end positions in the search array of elements with corresponding high-order bytes. When we come to search, we then use the high-order bytes of the key to find the range / subset of the array we need to search. If we had ~20 comparisons on our binary search of 100,000 elements, O(log2(n)), we're now down to ~4 comparisons for 16 elements, or O(log2(n/15)). The memory cost here is about 512k.
Another method, again suited to data that doesn't change much, is to divide the data into arrays of commonly sought items and rarely sought items. For example, if you leave your existing search in place running a wide number of real world cases over a protracted testing period, and log the details of the item being sought, you may well find that the distribution is very uneven, i.e. some values are sought far more regularly than others. If this is the case, break your array into a much smaller array of commonly sought values and a larger remaining array, and search the smaller array first. If the data is right (big if!), you can often achieve broadly similar improvements to the first solution without the memory cost.
There are many other data specific optimizations which score far better than trying to improve on tried, tested and far more widely used general solutions.
Posting my current version before the question is closed (hopefully I will thus be able to enhance it later). For now it is worse than all the other versions (if someone understands why my changes to the end of the loop have this effect, comments are welcome).
int newSearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l < toFind && h > toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (l == toFind)
return low;
else if (h == toFind)
return high;
else
return -1; // Not found
}
The implementation of the binary search that was used for the comparisons can be improved. The key idea is to "normalize" the range initially so that the target is always > the minimum and < the maximum after the first step. This increases the termination delta size. It also has the effect of special-casing targets that are less than the first element of the sorted array or greater than the last element. Expect approximately a 15% improvement in search time. Here is what the code might look like in C++.
int binarySearch(int * &array, int target, int min, int max)
{ // binarySearch
// normalize min and max so that we know the target is > min and < max
if (target <= array[min]) // if min not normalized
{ // target <= array[min]
if (target == array[min]) return min;
return -1;
} // end target <= array[min]
// min is now normalized
if (target >= array[max]) // if max not normalized
{ // target >= array[max]
if (target == array[max]) return max;
return -1;
} // end target >= array[max]
// max is now normalized
while (min + 1 < max)
{ // delta >=2
int tempi = min + ((max - min) >> 1); // point to index approximately in the middle between min and max
int atempi = array[tempi]; // just in case the compiler does not optimize this
if (atempi > target)max = tempi; // if the target is smaller, we can decrease max and it is still normalized
else if (atempi < target)min = tempi; // the target is bigger, so we can increase min and it is still normalized
else return tempi; // if we found the target, return with the index
// Note that it is important that this test for equality is last because it rarely occurs.
} // end delta >=2
return -1; // nothing in between normalized min and max
} // end binarySearch
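One wrinkle when calling it: the int * &array parameter is a reference to a pointer, so an array name won't bind to it directly; a hypothetical call would look like:
int *p = sortedArray; // int*& cannot bind to the array itself
int idx = binarySearch(p, toFind, 0, len - 1); // inclusive [min, max]; -1 if not found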