C++ Binary Search Algorithm not working

So I have a vector of ints named bList that has information already in it. I have it sorted before running the binary search.
//I have already inserted random ints into the vector
//Sort it
bubbleSort();
//Empty Line for formatting
cout << "\n";
//Print out sorted array.
print();
cout << "It will now search for a value using binary search\n";
int val = binSearch(54354);
cout<<val;
My bubble sort algorithm does work.
I have it returning an int which is the location of the searched value in the list.
//Its one argument is the value you are searching for.
int binSearch(int isbn) {
int lower = 0;
int upper = 19;//Vector size is 20.
int middle = (lower + upper) / 2;
while (lower < upper) {
middle = (lower + upper) / 2;
int midVal = bList[middle];
if (midVal == isbn) {
return middle;
break;
} else if (isbn > midVal) {
lower = midVal + 1;
} else if (isbn < midVal) {
upper - midVal - 1;
}
}
}
But for some reason, when I run it, it just keeps running and doesn't return anything.

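Here is the bug: the branches use the value midVal where the index middle is needed, and the statement upper - midVal - 1; computes a value and throws it away (a - where = was intended), so upper never changes: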
// ...
} else if (isbn > midVal) {
lower = midVal + 1;
} else if (isbn < midVal) {
upper - midVal - 1;
}
You may want
lower = middle + 1;
and
upper = middle - 1;
instead.
You also need to explicitly return something when the required number cannot be found; flowing off the end of a function that returns int is undefined behavior.

You still have a slight logic problem with your while condition:
#include <vector>

int binary_search(int i, const std::vector<int>& vec) // you really should pass in the vector; otherwise, convert it to use iterators
{
int result = -1; // default return value if not found
int lower = 0;
int upper = vec.size() - 1;
while (lower <= upper) // this will let the search run when lower == upper (meaning the result is one of the ends)
{
int middle = (lower + upper) / 2;
int val = vec[middle];
if (val == i)
{
result = middle;
break;
}
else if (i > val)
{
lower = middle + 1; // you were setting it to the value instead of the index
}
else if (i < val)
{
upper = middle - 1; // same here
}
}
return result; // moved your return down here to always return something (avoids a compiler error)
}
Alternatively, you could switch it to use iterators instead:
template<class RandomIterator>
RandomIterator binary_search(int i, RandomIterator start, RandomIterator end)
{
    RandomIterator notFound = end; // returned when i is not in the range
    while (start < end) // half-open range [start, end): end itself is never dereferenced
    {
        RandomIterator middle = start + ((end - start) / 2);
        if (*middle == i)
        {
            return middle;
        }
        else if (i > *middle)
        {
            start = middle + 1;
        }
        else
        {
            end = middle; // end is exclusive, so this already excludes middle
        }
    }
    return notFound;
}
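For comparison, here is a minimal sketch that leaves the searching to the standard library's std::lower_bound (from <algorithm>) instead of a hand-written loop. The function name findIndex and the -1 "not found" value are my own choices for illustration, and it assumes the vector is already sorted in ascending order, as bList is after bubbleSort().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the index of `value` in the sorted vector `vec`, or -1 if absent.
// std::lower_bound returns the first position whose element is not less than
// `value`, so the value is present only if that position is valid and equal.
int findIndex(const std::vector<int> &vec, int value)
{
    assert(std::is_sorted(vec.begin(), vec.end())); // precondition, O(n) debug check
    auto it = std::lower_bound(vec.begin(), vec.end(), value);
    if (it != vec.end() && *it == value)
        return static_cast<int>(it - vec.begin());
    return -1;
}
```

With bList it would be called as findIndex(bList, 54354); because the size comes from the vector itself, there is no hard-coded upper = 19.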


Optimizing the function for finding a Hamiltonian cycle in a grid graph?

I've made a working algorithm for finding a Hamiltonian cycle in a grid graph. However, the approach I implemented includes recursively checking all the possible combinations until I find the right one. This is fine on small graphs (like 6*6), but becomes way too slow on bigger ones, the ones that I need to find the cycle for (30 * 30).
In main I initialize a 2D vector of ints representing our graph (board) and set every cell to -1. -1 means that the cell hasn't been 'filled' yet, while values of 0 and above give the cell's place in the cycle (0 - first cell, 1 - second cell, etc.). I also initialize a Vector2f (SFML's vector type, similar to std::pair in the standard library), which I use to step through all the possible states.
I also initialize the path integer, which will help us later. Lastly, I call the Hamiltonian cycle finding algorithm (hamCycle()).
int main(){
int path = 0;
int bx = 8;
std::vector<std::vector<int>> board{ 8 };
Vector2f pos = { 4 , 4 };
for (int i = 0; i < bx; i++) {
board[i].resize(bx);
for (int j = 0; j < bx; j++) board[i][j] = -1;
}
hamCycle(board, pos, path, bx);
};
Then in hamCycle() I check whether pos lies outside the grid (or on an already filled cell), and if so return false. Otherwise I give this cell the value of path, which is then incremented. I check whether the algorithm is done, and whether the result is a cycle or just a path; if it's only a path, it returns false. Then I recursively check the cells around it and repeat the process.
bool hamCycle(std::vector<std::vector<int>> &board,Vector2f pos, int &path, int bx) {
//check if it's outside the box and if it's already occupied
if (pos.x >= bx || pos.x < 0 || pos.y >= bx || pos.y < 0) return false;
if (board[pos.x][pos.y] != -1) return false;
board[pos.x][pos.y] = path;
path++;
//check if the cycle is completed
bool isDone = true;
if (path != (bx * bx)) isDone = false;
//check if this cell is adjacent to the beggining and if so it's done
if (isDone) {
if (pos.x != 0 && pos.x != (bx - 1) && pos.y != 0 && pos.y != (bx - 1)) {
if ((board[pos.x + 1][pos.y] == 0) || (board[pos.x - 1][pos.y] == 0) || (board[pos.x][pos.y + 1] == 0)
|| (board[pos.x][pos.y - 1] == 0)) {
return true;
}
path--;
board[pos.x][pos.y] = -1;
return false;
}
else {
path--;
board[pos.x][pos.y] = -1;
return false;
};
}
//recursion time
if (hamCycle(board, Vector2f(pos.x + 1, pos.y), path, bx)) return true;
if (hamCycle(board, Vector2f(pos.x - 1, pos.y), path, bx)) return true;
if (hamCycle(board, Vector2f(pos.x, pos.y + 1), path, bx)) return true;
if (hamCycle(board, Vector2f(pos.x, pos.y - 1), path, bx)) return true;
path--;
board[pos.x][pos.y] = -1;
return false;
}
Right now it spends a lot of time checking all possible paths even after it has already blocked an exit, which is inefficient. How can I improve this so that checking big grids is feasible? For example, by not continuing once an exit is blocked; but if you know any other methods for improvement, please let me know.
You could try divide and conquer: take your board, divide it into small pieces (let's say 4), and find the right path for each of those pieces. The hard part is defining what the right path is: you need a path coming from the previous piece and going into the next one, passing through each cell. To do that, you can divide those pieces into smaller ones, and so on, until you have pieces of only one cell.
Note that this approach doesn't give you all the cycles possible, but almost always the same ones.
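For the narrower question in the post, not continuing once an exit has been blocked, one common pruning rule (my suggestion, not part of either answer) is a degree check. On a Hamiltonian cycle every cell has exactly two neighbours on the cycle, so an unfilled cell with fewer than two neighbours it could still connect to (unfilled cells, the start cell, or the current end of the path) makes the branch hopeless. The helper below is hypothetical and assumes the question's board layout (board[x][y], -1 for unfilled, 0 for the start, path - 1 for the most recently filled cell). It is only a necessary condition, so it trims branches early but does not by itself guarantee that a 30*30 search becomes fast.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the question's code: returns true if some
// unfilled cell (-1) can no longer lie on a Hamiltonian cycle, i.e. it has fewer
// than two neighbours that are unfilled, the start (0) or the path end (path - 1).
bool hasDeadEnd(const std::vector<std::vector<int>> &board, int bx, int path)
{
    assert(static_cast<int>(board.size()) == bx);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int x = 0; x < bx; ++x) {
        for (int y = 0; y < bx; ++y) {
            if (board[x][y] != -1) continue;      // only unfilled cells are checked
            int usable = 0;
            for (int d = 0; d < 4; ++d) {
                int nx = x + dx[d], ny = y + dy[d];
                if (nx < 0 || ny < 0 || nx >= bx || ny >= bx) continue;
                int v = board[nx][ny];
                if (v == -1 || v == 0 || v == path - 1) ++usable;
            }
            if (usable < 2) return true;          // this cell can never get two cycle edges
        }
    }
    return false;
}
```

One place to call it would be in hamCycle() right after path++, undoing the placement and returning false when it reports a dead end. Scanning the whole board costs O(bx*bx) per call.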
Finding one Hamiltonian cycle on a grid graph is really not that hard. I implemented it below. I used a std::array for the board because I wanted to practice writing constexpr functions. For the theoretical explanation, see here.
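#include <iostream>
#include <array>
#include <optional>
#include <algorithm>
#include <iterator>
#include <type_traits>
// Allows iterating over a two-dimensional array in the cross direction (down a column).
template<typename Iter>
struct cross_iterator {
// std::iterator_traits also works when Iter is a raw pointer (std::array iterators can be)
using difference_type = typename std::iterator_traits<Iter>::difference_type;
using value_type = typename std::iterator_traits<Iter>::value_type;
using pointer = typename std::iterator_traits<Iter>::pointer;
using reference = typename std::iterator_traits<Iter>::reference;
using iterator_category = typename std::iterator_traits<Iter>::iterator_category;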
constexpr cross_iterator(Iter it, size_t pos) : _it(it), _pos(pos)
{}
constexpr auto& operator*() {
return (*_it)[_pos];
}
constexpr auto& operator++() {
++_it;
return *this;
}
constexpr auto& operator++(int) {
_it++;
return *this;
}
constexpr auto& operator--() {
--_it;
return *this;
}
constexpr auto& operator--(int) {
_it--;
return *this;
}
constexpr bool operator==(const cross_iterator<Iter> &other) const {
return _pos == other._pos && _it == other._it;
}
constexpr bool operator!=(const cross_iterator<Iter> &other) const {
return !(*this == other);
}
constexpr auto& operator+=(difference_type n) {
_it += n;
return *this;
}
Iter _it;
const size_t _pos;
};
template<typename Iter>
cross_iterator(Iter it, size_t pos) -> cross_iterator<std::decay_t<decltype(it)>>;
template<size_t N, size_t M = N>
using board = std::array<std::array<int, N>, M>;
template<size_t N, size_t M = N>
constexpr std::optional<board<N, M>> get_ham_cycle() {
if constexpr ( N%2 == 1 && M%2 == 1 ) {
if constexpr( N == 1 && M == 1 ) {
return {{{{0}}}};
}
else {
// There is no Hamiltonian Cycle on odd side grid graphs with side lengths > 1
return {};
}
} else
{
std::optional<board<N,M>> ret {std::in_place};
auto &arr = *ret;
int count = 0;
arr[0][0] = count++;
if constexpr ( M%2 == 0 ) {
// Even number of rows (outer array): fill the rows in alternating directions, skipping column 0
for(auto i = 0ul; i < M; ++i) {
if ( i%2 == 0 ) {
std::generate(std::next(begin(arr[i])), end(arr[i]), [&count] () { return count++; });
} else {
std::generate(rbegin(arr[i]), std::prev(rend(arr[i])), [&count] () { return count++; });
}
}
// Close the cycle by walking up column 0 from the last row to row 1
std::generate(cross_iterator(rbegin(arr), 0), std::prev(cross_iterator(rend(arr), 0)), [&count] () { return count++; });
} else {
// Otherwise the row length N is even: fill the columns in alternating directions, skipping row 0
for(auto j = 0ul; j < N; ++j) {
if ( j%2 == 0 ) {
std::generate(std::next(cross_iterator(begin(arr), j)), cross_iterator(end(arr), j), [&count] () { return count++; });
} else {
std::generate(cross_iterator(rbegin(arr), j), std::prev(cross_iterator(rend(arr), j)), [&count] () { return count++; });
}
}
// Close the cycle by walking left along row 0 back to column 1
std::generate(rbegin(arr[0]), std::prev(rend(arr[0])), [&count] () { return count++; });
}
return ret;
}
}
int main() {
auto arr = *get_ham_cycle<30>();
for(auto i = 0ul; i < 30; ++i) {
for(auto j = 0ul; j < 30; ++j) {
std::cout << arr[i][j] << '\t';
}
std::cout << '\n';
}
return 0;
}
In a grid graph with both sides at least 2, there is a Hamilton cycle if and only if the width or the height is even (or both). Assume the width is even. Start in the top left corner and go all the way down, then go up and down repeatedly while leaving the top row free. When you have reached the right-hand column, go all the way up and then left along the top row back to the start.
4*5:
S<<<
v>v^
v^v^
v^v^
>^>^
4*4:
S<<<
v>v^
v^v^
>^>^
For odd width (which means the height must be even), just rotate it by 90 degrees.
This runs in O(width*height).
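The construction above is simple enough to code directly. The sketch below is my own illustration (the name snakeCycle and the row/column layout are assumptions, not from either answer): it returns, for every cell, that cell's position in the cycle, assuming an even width and a height of at least 2; for an odd width you would rotate the grid as described.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Builds a Hamiltonian cycle on a grid with `height` rows and `width` columns:
// down column 0, then up/down through the other columns without touching row 0,
// then back left along row 0. order[r][c] is cell (r, c)'s position in the cycle.
std::vector<std::vector<int>> snakeCycle(int height, int width)
{
    assert(width % 2 == 0 && width >= 2 && height >= 2);
    std::vector<std::vector<int>> order(height, std::vector<int>(width, -1));
    int step = 0;
    for (int r = 0; r < height; ++r) order[r][0] = step++;          // column 0, downwards
    for (int c = 1; c < width; ++c) {
        if (c % 2 == 1)
            for (int r = height - 1; r >= 1; --r) order[r][c] = step++; // up, stop at row 1
        else
            for (int r = 1; r < height; ++r) order[r][c] = step++;      // down from row 1
    }
    for (int c = width - 1; c >= 1; --c) order[0][c] = step++;      // back along row 0
    return order;
}
```

For the 4*5 example above, snakeCycle(5, 4) numbers the cells in the order traced by the arrows, with 0 at S; the last cell, order[0][1], is next to S, which closes the cycle.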
PS: I'm currently looking for a way to find Hamilton Cycles in a grid graph with restrictions (for implementing a perfect snake player)

How to get the lexical rank of a string? [duplicate]

I'm posting this although much has already been posted about this question. I didn't want to post as an answer since it's not working. The answer to this post (Finding the rank of the Given string in list of all possible permutations with Duplicates) did not work for me.
So I tried this (which is a compilation of code I've plagiarized and my attempt to deal with repetitions). The non-repeating cases work fine. BOOKKEEPER generates 83863, not the desired 10743.
(The factorial function and letter counter array 'repeats' are working correctly. I didn't post to save space.)
while (pointer != length)
{
if (sortedWordChars[pointer] != wordArray[pointer])
{
// Swap the current character with the one after that
char temp = sortedWordChars[pointer];
sortedWordChars[pointer] = sortedWordChars[next];
sortedWordChars[next] = temp;
next++;
//For each position check how many characters left have duplicates,
//and use the logic that if you need to permute n things and if 'a' things
//are similar the number of permutations is n!/a!
int ct = repeats[(sortedWordChars[pointer]-64)];
// Increment the rank
if (ct>1) { //repeats?
System.out.println("repeating " + (sortedWordChars[pointer]-64));
//In case of repetition of any character use: (n-1)!/(times)!
//e.g. if there is 1 character which is repeating twice,
//x* (n-1)!/2!
int dividend = getFactorialIter(length - pointer - 1);
int divisor = getFactorialIter(ct);
int quo = dividend/divisor;
rank += quo;
} else {
rank += getFactorialIter(length - pointer - 1);
}
} else
{
pointer++;
next = pointer + 1;
}
}
Note: this answer is for 1-based rankings, as specified implicitly by example. Here's some Python that works at least for the two examples provided. The key fact is that suffixperms * ctr[y] // ctr[x] is the number of permutations whose first letter is y of the length-(i + 1) suffix of perm.
from collections import Counter

def rankperm(perm):
    rank = 1
    suffixperms = 1
    ctr = Counter()
    for i in range(len(perm)):
        x = perm[((len(perm) - 1) - i)]
        ctr[x] += 1
        for y in ctr:
            if (y < x):
                rank += ((suffixperms * ctr[y]) // ctr[x])
        suffixperms = ((suffixperms * (i + 1)) // ctr[x])
    return rank

print(rankperm('QUESTION'))
print(rankperm('BOOKKEEPER'))
Java version:
public static long rankPerm(String perm) {
long rank = 1;
long suffixPermCount = 1;
java.util.Map<Character, Integer> charCounts =
new java.util.HashMap<Character, Integer>();
for (int i = perm.length() - 1; i > -1; i--) {
char x = perm.charAt(i);
int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
charCounts.put(x, xCount);
for (java.util.Map.Entry<Character, Integer> e : charCounts.entrySet()) {
if (e.getKey() < x) {
rank += suffixPermCount * e.getValue() / xCount;
}
}
suffixPermCount *= perm.length() - i;
suffixPermCount /= xCount;
}
return rank;
}
Unranking permutations:
from collections import Counter

def unrankperm(letters, rank):
    ctr = Counter()
    permcount = 1
    for i in range(len(letters)):
        x = letters[i]
        ctr[x] += 1
        permcount = (permcount * (i + 1)) // ctr[x]
    # ctr is the histogram of letters
    # permcount is the number of distinct perms of letters
    perm = []
    for i in range(len(letters)):
        for x in sorted(ctr.keys()):
            # suffixcount is the number of distinct perms that begin with x
            suffixcount = permcount * ctr[x] // (len(letters) - i)
            if rank <= suffixcount:
                perm.append(x)
                permcount = suffixcount
                ctr[x] -= 1
                if ctr[x] == 0:
                    del ctr[x]
                break
            rank -= suffixcount
    return ''.join(perm)
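For readers working in C++, this is a sketch of a direct port of unrankperm. std::map keeps the letters sorted, which takes the place of sorted(ctr.keys()); the name unrankPerm is mine, and it assumes 1 <= rank <= the number of distinct permutations and that the counts fit in a long long.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Returns the permutation of `letters` with the given 1-based lexicographic
// rank among all distinct permutations (port of the Python unrankperm).
std::string unrankPerm(const std::string &letters, long long rank)
{
    std::map<char, long long> ctr;  // histogram, iterated in sorted key order
    long long permcount = 1;        // distinct perms of the letters counted so far
    for (std::size_t i = 0; i < letters.size(); ++i) {
        long long c = ++ctr[letters[i]];
        permcount = permcount * static_cast<long long>(i + 1) / c;
    }
    assert(rank >= 1 && rank <= permcount);
    std::string perm;
    for (std::size_t i = 0; i < letters.size(); ++i) {
        for (auto it = ctr.begin(); it != ctr.end(); ++it) {
            // distinct perms of the remaining letters that begin with it->first
            long long suffixcount =
                permcount * it->second / static_cast<long long>(letters.size() - i);
            if (rank <= suffixcount) {
                perm += it->first;
                permcount = suffixcount;
                if (--it->second == 0) ctr.erase(it);
                break;  // `it` is not used again after a possible erase
            }
            rank -= suffixcount;
        }
    }
    return perm;
}
```

Feeding it the letters of BOOKKEEPER with rank 10743 (the value the question expects for that word) gives BOOKKEEPER back, which is a quick way to cross-check a rank function.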
If we use mathematics, the complexity comes down and the rank can be found much more quickly, which is particularly helpful for long strings (more details can be found here).
I suggest implementing the approach shown there programmatically.
I would say David's post (the accepted answer) is super cool. However, I would like to improve it further for speed. The inner loop looks for inversions (smaller characters that occur later in the string), and each one contributes to the rank. If we use an ordered map (a binary search tree, BST) there, we can simply do an in-order traversal from the first (leftmost) node until we reach the current character, rather than traversing the whole map. In C++, std::map, which is typically implemented as a balanced BST, is a good fit. The following code reduces the number of loop iterations and removes the if check.
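#include <map>
#include <string>

long long rankofword(const std::string &s)
{
    long long rank = 1;
    long long suffixPermCount = 1;
    std::map<char, int> m; // ordered by character, so iteration visits smaller letters first
    int size = static_cast<int>(s.size());
    for (int i = size - 1; i > -1; i--)
    {
        char x = s[i];
        int xCount = ++m[x];
        auto stop = m.find(x); // look x up once instead of on every loop test
        for (auto it = m.begin(); it != stop; ++it)
            rank += suffixPermCount * it->second / xCount;
        suffixPermCount *= (size - i);
        suffixPermCount /= xCount;
    }
    return rank;
}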
@David Eisenstat, this was really helpful. It took me a WHILE to figure out what you were doing, as I am still learning my first language (C#). I translated it into C# and figured that I'd give that solution as well, since this listing helped me so much!
Thanks!
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace CsharpVersion
{
class Program
{
//Takes in the word and checks to make sure that the word
//is between 1 and 25 characters inclusive and only
//letters are used
static string readWord(string prompt, int high)
{
Regex rgx = new Regex("^[a-zA-Z]+$");
string word;
string result;
do
{
Console.WriteLine(prompt);
word = Console.ReadLine();
} while (word == "" | word.Length > high | rgx.IsMatch(word) == false);
result = word.ToUpper();
return result;
}
//Creates a sorted dictionary containing distinct letters
//initialized with 0 frequency
static SortedDictionary<char,int> Counter(string word)
{
SortedDictionary<char,int> count = new SortedDictionary<char,int>();
foreach(char c in word)
{
if(!count.ContainsKey(c))
{
count.Add(c, 0);
}
}
return count;
}
//Creates a factorial function
static long Factorial(int n) // long (Int64) holds factorials up to 20!; int overflows past 12!
{
if (n <= 1)
{
return 1;
}
else
{
return n * Factorial(n - 1);
}
}
//Ranks the word input if there are no repeated characters
//in the word
static Int64 rankWord(char[] wordArray)
{
int n = wordArray.Length;
Int64 rank = 1;
//loops through the array of letters
for (int i = 0; i < n-1; i++)
{
int x=0;
//loops all letters after i and compares them for factorial calculation
for (int j = i+1; j<n ; j++)
{
if (wordArray[i] > wordArray[j])
{
x++;
}
}
rank = rank + x * (Factorial(n - i - 1));
}
return rank;
}
//Ranks the word input if there are repeated characters
//in the word
static Int64 rankPerm(String word)
{
Int64 rank = 1;
Int64 suffixPermCount = 1;
SortedDictionary<char, int> counter = Counter(word);
for (int i = word.Length - 1; i > -1; i--)
{
char x = Convert.ToChar(word.Substring(i,1));
int xCount;
if(counter[x] != 0)
{
xCount = counter[x] + 1;
}
else
{
xCount = 1;
}
counter[x] = xCount;
foreach (KeyValuePair<char,int> e in counter)
{
if (e.Key < x)
{
rank += suffixPermCount * e.Value / xCount;
}
}
suffixPermCount *= word.Length - i;
suffixPermCount /= xCount;
}
return rank;
}
static void Main(string[] args)
{
Console.WriteLine("Type Exit to end the program.");
string prompt = "Please enter a word using only letters:";
const int MAX_VALUE = 25;
Int64 rank = new Int64();
string theWord;
do
{
theWord = readWord(prompt, MAX_VALUE);
char[] wordLetters = theWord.ToCharArray();
Array.Sort(wordLetters);
bool duplicate = false;
for(int i = 0; i< theWord.Length - 1; i++)
{
if(wordLetters[i] == wordLetters[i+1]) // equal neighbours in the sorted copy mean a repeated letter
{
duplicate = true;
}
}
if(duplicate)
{
SortedDictionary<char, int> counter = Counter(theWord);
rank = rankPerm(theWord);
Console.WriteLine("\n" + theWord + " = " + rank);
}
else
{
char[] letters = theWord.ToCharArray();
rank = rankWord(letters);
Console.WriteLine("\n" + theWord + " = " + rank);
}
} while (theWord != "EXIT");
Console.WriteLine("\nPress enter to escape..");
Console.Read();
}
}
}
If there are k distinct characters, the i-th character repeated n_i times, then the total number of distinct permutations is given by
$$\frac{(n_1 + n_2 + \cdots + n_k)!}{n_1!\, n_2! \cdots n_k!}$$
which is the multinomial coefficient.
Now we can use this to compute the rank of a given permutation as follows:
Consider the first (leftmost) character, and say it is the r-th one in the sorted order of the distinct characters.
If you replace the first character by any of the 1st, 2nd, ..., (r-1)-th characters and consider all possible arrangements of the remaining letters, each of these permutations precedes the given permutation. Their total number can be computed with the formula above, applied to the letters left after the replacement.
Once you have computed this number for the first character, fix the first character and repeat the same with the second character, and so on. The rank is 1 plus the sum of all these counts.
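As a worked example of this procedure, take BOOKKEEPER from the question (letters B, E×3, K×2, O×2, P, R). Only positions where a smaller letter is still available contribute, and each term counts the arrangements of the remaining letters once a smaller candidate has been placed at that position:

$$
\begin{aligned}
\text{2nd letter O (E or K is smaller):}\quad & \frac{8!}{2!\,2!\,2!} + \frac{8!}{3!\,2!} = 5040 + 3360 = 8400\\
\text{3rd letter O (E or K):}\quad & \frac{7!}{2!\,2!} + \frac{7!}{3!} = 1260 + 840 = 2100\\
\text{4th letter K (E):}\quad & \frac{6!}{2!\,2!} = 180\\
\text{5th letter K (E):}\quad & \frac{5!}{2!} = 60\\
\text{8th letter P (E):}\quad & 2! = 2\\
\text{rank}\quad & = 1 + 8400 + 2100 + 180 + 60 + 2 = 10743
\end{aligned}
$$

The first letter B and the letters at positions 6, 7, 9 and 10 contribute nothing, because no smaller letter is still available there. The result matches the 10743 the question expects.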
Here's a C++ implementation of this approach:
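#include <cctype>
#include <iostream>
#include <string>
using namespace std;

long long fact(int f) {
    long long r = 1;
    for (int k = 2; k <= f; k++) r *= k;
    return r;
}

// Assumes s contains letters only; they are upper-cased first.
long long solve(string s, int n) {
    long long ans = 1;
    int arr[26] = {0}; // letter counts for the not-yet-processed suffix
    int len = n - 1;
    for (int i = 0; i < n; i++) {
        s[i] = toupper(static_cast<unsigned char>(s[i]));
        arr[s[i] - 'A']++;
    }
    for (int i = 0; i < n; i++) {
        long long temp = 0; // remaining letters smaller than s[i]
        long long x = 1;    // product of factorials of the remaining counts
        char c = s[i];
        for (int j = 0; j < c - 'A'; j++) temp += arr[j];
        for (int j = 0; j < 26; j++) x = x * fact(arr[j]);
        arr[c - 'A']--;
        // multiply before dividing: fact(len) / x alone is not always an integer (e.g. "BBAA")
        ans = ans + temp * fact(len) / x;
        len--;
    }
    return ans;
}

int main() {
    string s;
    cin >> s;
    cout << solve(s, static_cast<int>(s.size()));
    return 0;
}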
Java version of unrank for a String:
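import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public static String unrankperm(String letters, int rank) {
Map<Character, Integer> charCounts = new HashMap<>();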
int permcount = 1;
for(int i = 0; i < letters.length(); i++) {
char x = letters.charAt(i);
int xCount = charCounts.containsKey(x) ? charCounts.get(x) + 1 : 1;
charCounts.put(x, xCount);
permcount = (permcount * (i + 1)) / xCount;
}
// charCounts is the histogram of letters
// permcount is the number of distinct perms of letters
StringBuilder perm = new StringBuilder();
for(int i = 0; i < letters.length(); i++) {
List<Character> sorted = new ArrayList<>(charCounts.keySet());
Collections.sort(sorted);
for(Character x : sorted) {
// suffixcount is the number of distinct perms that begin with x
Integer frequency = charCounts.get(x);
int suffixcount = permcount * frequency / (letters.length() - i);
if (rank <= suffixcount) {
perm.append(x);
permcount = suffixcount;
if(frequency == 1) {
charCounts.remove(x);
} else {
charCounts.put(x, frequency - 1);
}
break;
}
rank -= suffixcount;
}
}
return perm.toString();
}
See also n-th-permutation-algorithm-for-use-in-brute-force-bin-packaging-parallelization.

Binary Search C++, STATUS_ACCESS_VIOLATION

The thing is that we are trying to write a binary search function for a vector class for the first time, but I don't really know why it is not working. Does anyone know what could be wrong? (We get STATUS_ACCESS_VIOLATION when the number is not in the vector's array.)
// size_ is the number of used elements in the array
int Vector::bsearch(int value) const
{
int first = 0;
int last = size_ - 1;
int Substraction = size_ /2;
while(last >= first)
{
Substraction = first + (last - first) / 2;
if(array_[Substraction] > value)
last = Substraction;
else if(array_[Substraction] < value)
first = Substraction;
else if(array_[Substraction] == value)
return Substraction;
}
return CS170::Vector::NO_INDEX;
}
//SOLVED
int Vector::bsearch(int value) const
{
unsigned first = 0;
unsigned last = size_ - 1;
unsigned int mid;
if(size_ == 0 || value < array_[0] || value > array_[size_ - 1]) // also keeps the unsigned mid - 1 from wrapping below 0
return CS170::Vector::NO_INDEX;
while(last >= first)
{
mid = first + (last - first) / 2;
if(value < array_[mid])
last = mid - 1;
else if(array_[mid] < value)
first = mid + 1;
else
return mid;
}
return CS170::Vector::NO_INDEX;
}
You're not excluding the element you just tested before the next iteration, so when the value is absent the bounds can stop moving (for example, once first == last). And your choice of variable name, Substraction, is dreadful: it's a midpoint.
int first = 0;
int last = size_-1;
int mid = 0;
while(last >= first)
{
mid = first + (last - first) / 2;
if(value < array_[mid])
last = mid-1; // don't include element just tested
else if(array_[mid] < value)
first = mid+1; // don't include element just tested
else return mid;
}
return CS170::Vector::NO_INDEX;
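If you want unsigned indices, as in the SOLVED version above, a half-open range [first, last) avoids ever computing mid - 1, so nothing can wrap below zero and an empty array needs no special check. This is a sketch written as a free function over a plain array, since the Vector class itself isn't shown; bsearchIndex and the NO_INDEX stand-in are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CS170::Vector::NO_INDEX, whose real value is not shown.
const int NO_INDEX = -1;

// Binary search over the half-open range [first, last). Because `last` is
// exclusive it is set to `mid`, never `mid - 1`, so the unsigned indices
// cannot wrap around, and size == 0 simply skips the loop.
int bsearchIndex(const int *array, std::size_t size, int value)
{
    assert(array != nullptr || size == 0);
    std::size_t first = 0;
    std::size_t last = size;
    while (first < last) {
        std::size_t mid = first + (last - first) / 2;
        if (value < array[mid])
            last = mid;          // excludes mid, since last is exclusive
        else if (array[mid] < value)
            first = mid + 1;     // excludes mid
        else
            return static_cast<int>(mid);
    }
    return NO_INDEX;
}
```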
Your initialization is wrong -
int first = size_ - 1;
int last = first = 0;
should be -
int last = size_ - 1;
int first = 0;
Also, in while you should use the following condition -
while (last>first)

Binary search to find the range in which the number lies

I have an array
index:  0   1   2   3   4
value: 12  20  32  40  52
on which I have to perform binary search to find the index of the range in which the number lies. For example:
Given the number -> 19 (It lies between index 0 and 1), return 0
Given the number -> 22 (It lies between index 1 and 2), return 1
Given the number -> 40 (It lies between index 3 and 4), return 3
I implemented the binary search as follows. It is correct for cases 1 and 3, but incorrect for case 2 or if we search for 52, 55, 32, etc.
#include <iostream>
using namespace std;
int findIndex(int values[], int number, unsigned first, unsigned last)
{
unsigned midPoint;
while(first<last)
{
unsigned midPoint = (first+last)/2;
if (number <= values[midPoint])
last = midPoint -1;
else if (number > values[midPoint])
first = midPoint + 1;
}
return midPoint;
}
int main()
{
int a[] = {12, 20, 32, 40, 52};
unsigned i = findIndex(a, 55, 0, 4);
cout << i;
}
Use of additional variables such as bool found is not allowed.
A range in C or C++ is normally given as a pointer directly to the lower bound and a pointer one past the upper bound. Unless you're feeling extremely masochistic, you probably want to stick to that convention in your search as well.
Assuming you're going to follow that, your last = midpoint-1; is incorrect. Rather, you want to set last to one past the end of the range you're going to actually use, so it should be last = midpoint;
You also only really need one comparison, not two. In a binary search as long as the two bounds aren't equal, you're going to set either the lower or the upper bound to the center point, so you only need to do one comparison to decide which.
At least by convention, in C++, you do all your comparisons using < instead of <=, >, etc. Any of the above can work, but following the convention of using only < keeps from imposing extra (unnecessary) requirements on contained types.
Though most interviewers probably don't care, there's also a potential overflow when you do midpoint = (left + right)/2;. I'd generally prefer midpoint = left + (right - left)/2;
Taking those into account, code might look something like this:
template <class T>
T *lower_bound(T *left, T *right, T val) {
while (left < right) {
T *middle = left + (right - left) / 2;
if (*middle < val)
left = middle + 1;
else
right = middle;
}
return left;
}
template <class T>
T *upper_bound(T *left, T *right, T val) {
while (left < right) {
T *middle = left + (right - left) / 2;
if (val < *middle)
right = middle;
else
left = middle + 1;
}
return left;
}
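To connect these templates to the question's contract, here is a sketch that runs the upper-bound loop above and subtracts one: the first element greater than the number sits just after the start of its range. rangeIndex is a hypothetical name, and returning -1 for numbers below the first element or at/above the last one (52) is an assumption, since the question doesn't say what those cases should return.

```cpp
#include <cassert>
#include <cstddef>

// Index i such that values[i] <= number < values[i + 1], found with the
// upper-bound search (first element greater than number) over [left, right).
// Returns -1 outside [values[0], values[n - 1]); that sentinel is an assumption.
int rangeIndex(const int *values, std::size_t n, int number)
{
    assert(values != nullptr || n == 0);
    const int *left = values;
    const int *right = values + n;
    while (left < right) {
        const int *middle = left + (right - left) / 2;
        if (number < *middle)
            right = middle;
        else
            left = middle + 1;
    }
    // left now points at the first element greater than number
    std::size_t upper = static_cast<std::size_t>(left - values);
    if (upper == 0 || upper == n) return -1;
    return static_cast<int>(upper - 1);
}
```

With the array from the question, 19, 22 and 40 give 0, 1 and 3, as requested.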
Why not use the standard library functions?
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
int main() {
for (int input = 10; input < 55; input++) {
cout << input << ": ";
// Your desire:
vector<int> v = { 12, 20, 32, 40, 52 };
if (input < v.front() || input > v.back()) {
cout << "Not found" << endl;
} else {
auto it = upper_bound(v.begin(), v.end(), input);
cout << it - v.begin() - 1 << endl;
}
}
}
Note: a pretty-cool site - http://en.cppreference.com/w/cpp/algorithm
This will work under the condition that min(A[i]) <= key <=max(A[i])
int binary_search(int A[],int key,int left, int right)
{
while (left <= right) {
int middle = left + (right - left) / 2;
if (A[middle] < key)
left = middle+1;
else if(A[middle] > key)
right = middle-1;
else
return middle;
}
return (left - 1);
}
For the input (array size, array elements, value to search for)
4
1 3 8 10
4
the output is
3 (the smaller of 3 and 8, the two values surrounding 4)
#include <stdio.h>
int main()
{
int c, first, last, middle, n, search, array[100];
scanf("%d",&n);
for (c = 0; c < n; c++)
scanf("%d",&array[c]);
scanf("%d", &search);
first = 0;
last = n - 1;
middle = (first+last)/2;
while (first <= last) {
if (array[middle] < search)
{
first = middle + 1; }
else if (array[middle] == search) {
break;
}
else
{
last = middle - 1;
}
middle = (first + last)/2;
}
printf("%d\n",array[middle]);
return 0;
}
A regular binary search returns the index of the key on success. When it fails to find the key, it stops at the index of the lowest key greater than the key we are searching for. I guess the following modified binary search algorithm will work:
Given sorted array A
Find a key using binary search and get an index.
If A[index] == key
return index;
else
while(index > 1 && A[index] == A[index -1]) index = index -1;
return index;
binsrch(array, num, low, high) {
if (num > array[high])
return high;
while(1) {
if (low == high-1)
return low;
if(low >= high)
return low-1;
mid = (low+high)/2;
if (num < array[mid])
high = mid;
else
low = mid+1;
}
}
Here is a more specific answer:
int findIndex(int values[],int key,int first, int last)
{
if(values[first]<=key && values[first+1]>=key)// stopping condition
{
return first;
}
int imid=first+(last-first)/2;
if(first==last || imid==first)
{
return -1;
}
if(values[imid]>key)
{
return findIndex(values,key,first,imid);
}
else // here values[imid] <= key, so every path returns a value
{
return findIndex(values,key,imid,last);
}
}
I feel this is more in line with what you were looking for, and it won't choke on the last value.
/* binary_range.c (c) 2016 adolfo#di-mare.com */
/* http://stackoverflow.com/questions/10935635 */
/* This code is written to be easily translated to Fortran */
#include <stdio.h> /* printf() */
#include <assert.h> /* assert() */
/** Find the smallest index 'i' such that '*nSEED <= nVEC[i]'.
- nVEC[0..N-1] is a strictly ascending array.
- Returns an index in [0..N].
- Returns 'N' when '*nSEED>nVEC[N-1]'.
- Uses binary search to find the range for '*nSEED'.
*/
int binary_range( int *nSEED, int nVEC[] , int N ) {
int lo,hi, mid,plus;
if ( *nSEED > nVEC[N-1] ) {
return N;
}
for (;;) { /* lo = binary_range_search() */
lo = 0;
hi = N-1;
for (;;) {
plus = (hi-lo)>>1; /* mid = (hi+lo)/2; */
if ( plus == 0 ) { assert( hi-lo==1 );
if (*nSEED <= nVEC[lo]) {
hi = lo;
}
else {
lo = hi;
}
}
mid = lo + plus; /* mid = lo + (hi-lo)/2; */
if (*nSEED <= nVEC[mid]) {
hi = mid;
}
else {
lo = mid;
}
if (lo>=hi) { break; }
}
break;
} /* 'lo' is the index */
/* This implementation does not use division. */
/* ========================================= */
assert( *nSEED <= nVEC[lo] );
return lo;
}
/** Find the smallest index 'i' such that '*nSEED <= nVEC[i]'.
- nVEC[0..N-1] is a strictly ascending array.
- Returns an index in [0..N].
- Returns 'N' when '*nSEED>nVEC[N-1]'.
- Uses sequential search to find the range for '*nSEED'.
*/
int sequential_range( int* nSEED, int nVEC[] , int N ) {
int i;
if ( *nSEED > nVEC[N-1] ) {
return N;
}
i=0;
while ( i<N ) {
if ( *nSEED <= nVEC[i] ) { break; }
++i;
}
return i;
}
/** test->stackoverflow.10935635(). */
void test_10935635() {
{{ /* test.stackoverflow.10935635() */
/* http://stackoverflow.com/questions/10935635 */
/* binary_range search to find the range in which the number lies */
/* 0 1 2 3 4 */
int nVEC[] = { 12,20,32,40,52 }; int val;
int N = sizeof(nVEC)/sizeof(nVEC[0]); /* N = DIM(nVEC[]) */
val=19; val = binary_range( &val,nVEC,N );
/* 19 -> [12 < (19) <= 20] -> return 1 */
val=19; assert( binary_range( &val,nVEC,N ) == 1 );
/* 22 -> [20 < (22) <= 32] -> return 2 */
val=22; assert( binary_range( &val,nVEC,N ) == 2 );
/* 40 -> [32 < (40) <= 40] -> return 3 */
val=40; assert( binary_range( &val,nVEC,N ) == 3 );
/* Everything over 52 returns N */
val=53; assert( binary_range( &val,nVEC,N ) == N );
}}
}
/** Test program. */
int main() {
if (1) {
printf( "\ntest_10935635()" );
test_10935635();
}
printf( "\nEND" );
return 0;
}
/* Compiler: gcc.exe (tdm-1) 4.9.2 */
/* IDE: Code::Blocks 16.01 */
/* Language: C && C++ */
/* EOF: binary_range.c */
I know this is an old thread, but since I had to solve a similar problem I thought I would share it. Given a set of non-overlapping ranges of integers, I need to test whether a given value lies in any of those ranges. The following Java code uses a modified binary search to test whether a value lies within the sorted (lowest to highest) set of integer ranges.
// Range.java
/**
* Very basic Range representation for long values
*
*/
public class Range {
private long low;
private long high;
public Range(long low, long high) {
this.low = low;
this.high = high;
}
public boolean isInRange(long val) {
return val >= low && val <= high;
}
public long getLow() {
return low;
}
public void setLow(long low) {
this.low = low;
}
public long getHigh() {
return high;
}
public void setHigh(long high) {
this.high = high;
}
@Override
public String toString() {
return "Range [low=" + low + ", high=" + high + "]";
}
}
// BinaryRangeSearch.java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
//Java implementation of iterative Binary Search over Ranges
class BinaryRangeSearch {
// Returns index of x if it is present in the list of Range,
// else return -1
int binarySearch(List<Range> ranges, int x)
{
Range[] arr = new Range[ranges.size()];
arr = ranges.toArray(arr);
int low = 0, high = arr.length - 1;
int iters = 0;
while (low <= high) {
int mid = low + (high - low) / 2; // find mid point
// Check if x is present at mid (x equals the low end of that range)
if (arr[mid].getLow() == x) {
System.out.println(iters + " iterations");
return mid;
}
// If x greater, ignore left half
if (x > arr[mid].getHigh()) {
low = mid + 1;
}
else if (x >= arr[mid].getLow()) {
System.out.println(iters + " iterations");
return mid;
}
// If x is smaller, ignore right half of remaining Ranges
else
high = mid - 1;
iters++;
}
return -1; // not in any of the given Ranges
}
// Driver method to test above
public static void main(String args[])
{
BinaryRangeSearch ob = new BinaryRangeSearch();
// make a test list of long Range
int multiplier = 1;
List<Range> ranges = new ArrayList<>();
int high = 0;
for(int i = 0; i <7; i++) {
int low = i + high;
high = (i+10) * multiplier;
Range r = new Range(low, high);
multiplier *= 10;
ranges.add(r);
}
System.out.println(Arrays.toString(ranges.toArray()));
int result = ob.binarySearch(ranges, 11);
if (result == -1)
System.out.println("Element not present");
else
System.out.println("Element found at "
+ "index " + result);
}
}
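For comparison, here is a minimal C++ sketch of the same lookup that uses std::upper_bound instead of a hand-written loop. It assumes, like the Java code, that the ranges are sorted by their low end and do not overlap; the Range struct and the findRange name are illustrative, not part of the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inclusive range [low, high]. findRange assumes the vector is sorted by low
// and that no two ranges overlap.
struct Range {
    long low;
    long high;
};

// Returns the index of the range that contains x, or -1 if x falls in a gap
// or outside all ranges.
int findRange(const std::vector<Range>& ranges, long x) {
    // First range whose low end is strictly greater than x.
    auto it = std::upper_bound(ranges.begin(), ranges.end(), x,
                               [](long value, const Range& r) { return value < r.low; });
    if (it == ranges.begin()) return -1;  // x is below every range
    --it;                                 // last range with low <= x
    return x <= it->high ? static_cast<int>(it - ranges.begin()) : -1;
}
```

Because the ranges do not overlap, only the last range starting at or below x can contain it, so a single bounds check after the search is enough.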
My Python implementation, which returns the first and last index of target in a sorted array ([-1, -1] if target is absent):
Time complexity: O(log(n))
Space complexity: O(log(n)) (the recursion depth)
def searchForRange(array, target):
    result = [-1, -1]
    alteredBinarySearch(array, target, 0, len(array) - 1, result, True)
    alteredBinarySearch(array, target, 0, len(array) - 1, result, False)
    return result

def alteredBinarySearch(array, target, left, right, result, goLeft):
    if left > right:
        return
    middle = (left + right) // 2
    if array[middle] > target:
        alteredBinarySearch(array, target, left, middle - 1, result, goLeft)
    elif array[middle] < target:
        alteredBinarySearch(array, target, middle + 1, right, result, goLeft)
    else:
        if goLeft:
            # Record middle only if it is the leftmost occurrence of target.
            if middle == 0 or array[middle - 1] != target:
                result[0] = middle
            else:
                alteredBinarySearch(array, target, left, middle - 1, result, goLeft)
        else:
            # Record middle only if it is the rightmost occurrence of target.
            if middle == len(array) - 1 or array[middle + 1] != target:
                result[1] = middle
            else:
                alteredBinarySearch(array, target, middle + 1, right, result, goLeft)
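For comparison, a minimal C++ sketch of the same first/last-index search, assuming a sorted std::vector<int>; the name searchForRange simply mirrors the Python function. std::lower_bound and std::upper_bound are iterative, so this version needs O(1) extra space instead of a recursion stack.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {first, last} indices of target in a sorted vector,
// or {-1, -1} if target is absent.
std::pair<int, int> searchForRange(const std::vector<int>& array, int target) {
    auto first = std::lower_bound(array.begin(), array.end(), target);
    if (first == array.end() || *first != target) return {-1, -1};
    auto last = std::upper_bound(first, array.end(), target);  // one past the final match
    return {static_cast<int>(first - array.begin()),
            static_cast<int>(last - array.begin()) - 1};
}
```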

What is the fastest search method for a sorted array?

While answering another question, I wrote the program below to compare different search methods on a sorted array. I compared two implementations of interpolation search and one of binary search, measuring performance by counting the cycles each variant spent on the same data set.
However, I'm sure there are ways to optimize these functions to make them even faster. Does anyone have ideas on how I can make this search function faster? A solution in C or C++ is acceptable, but it needs to process an array of 100000 elements.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <assert.h>
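// Reads the x86 time-stamp counter. The low and high halves are read
// separately because the "=A" constraint only means EDX:EAX on 32-bit x86.
static __inline__ unsigned long long rdtsc(void)
{
unsigned int lo, hi;
__asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
return ((unsigned long long)hi << 32) | lo;
}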
int interpolationSearch(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int64_t low = 0;
int64_t high = len - 1;
int64_t mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + (int64_t)((int64_t)(high - low)*(int64_t)(toFind - l))/((int64_t)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int interpolationSearch2(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int binarySearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int order(const void *p1, const void *p2) { return *(int*)p1-*(int*)p2; }
int main(void) {
int i = 0, j = 0, size = 100000, trials = 10000;
int searched[trials];
srand(-time(0));
for (j=0; j<trials; j++) { searched[j] = rand()%size; }
while (size > 10){
int arr[size];
for (i=0; i<size; i++) { arr[i] = rand()%size; }
qsort(arr,size,sizeof(int),order);
unsigned long long totalcycles_bs = 0;
unsigned long long totalcycles_is_64 = 0;
unsigned long long totalcycles_is_float = 0;
unsigned long long totalcycles_new = 0;
int res_bs, res_is_64, res_is_float, res_new;
for (j=0; j<trials; j++) {
unsigned long long tmp, cycles = rdtsc();
res_bs = binarySearch(arr,searched[j],size);
tmp = rdtsc(); totalcycles_bs += tmp - cycles; cycles = tmp;
res_is_64 = interpolationSearch(arr,searched[j],size);
assert(res_is_64 == res_bs || arr[res_is_64] == searched[j]);
tmp = rdtsc(); totalcycles_is_64 += tmp - cycles; cycles = tmp;
res_is_float = interpolationSearch2(arr,searched[j],size);
assert(res_is_float == res_bs || arr[res_is_float] == searched[j]);
tmp = rdtsc(); totalcycles_is_float += tmp - cycles; cycles = tmp;
}
printf("----------------- size = %10d\n", size);
printf("binary search = %10llu\n", totalcycles_bs);
printf("interpolation uint64_t = %10llu\n", totalcycles_is_64);
printf("interpolation float = %10llu\n", totalcycles_is_float);
printf("new = %10llu\n", totalcycles_new);
printf("\n");
size >>= 1;
}
}
If you have some control over the in-memory layout of the data, you might want to look at Judy arrays.
Or to put a simpler idea out there: a binary search always cuts the search space in half. An optimal cut point can be found with interpolation (the cut point should NOT be the place where the key is expected to be, but the point which minimizes the statistical expectation of the search space for the next step). This minimizes the number of steps but... not all steps have equal cost. Hierarchical memories allow executing a number of tests in the same time as a single test, if locality can be maintained. Since a binary search's first M steps only touch a maximum of 2**M unique elements, storing these together can yield a much better reduction of search space per-cacheline fetch (not per comparison), which is higher performance in the real world.
n-ary trees work on that basis, and then Judy arrays add a few less important optimizations.
Bottom line: even "Random Access Memory" (RAM) is faster when accessed sequentially than randomly. A search algorithm should use that fact to its advantage.
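One concrete way to keep the first levels of a binary search together in memory is the Eytzinger (breadth-first) layout, sketched below in C++. This is a swapped-in technique, not Judy arrays or an n-ary tree; the EytzingerIndex class is hypothetical, and whether it beats a plain binary search on a given machine has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Eytzinger (BFS-order) layout: slot k's children are slots 2k and 2k+1,
// slot 0 is unused. The top levels of the implicit tree, which every search
// visits, end up packed together at the front of the array.
class EytzingerIndex {
public:
    explicit EytzingerIndex(const std::vector<int>& sorted)
        : tree_(sorted.size() + 1), pos_(sorted.size() + 1) {
        std::size_t next = 0;
        build(sorted, next, 1);
        assert(next == sorted.size());  // every element was placed once
    }

    // Returns the index in the original sorted array of the first occurrence
    // of key, or -1 if key is absent.
    int find(int key) const {
        std::size_t k = 1;
        std::size_t best = 0;  // slot of the smallest value >= key seen so far
        while (k < tree_.size()) {
            if (tree_[k] >= key) {
                best = k;
                k = 2 * k;      // go left
            } else {
                k = 2 * k + 1;  // go right
            }
        }
        return (best != 0 && tree_[best] == key) ? static_cast<int>(pos_[best]) : -1;
    }

private:
    // An in-order walk of the implicit tree visits slots in sorted order,
    // so it assigns the sorted elements to their BFS-order slots.
    void build(const std::vector<int>& sorted, std::size_t& next, std::size_t k) {
        if (k >= tree_.size()) return;
        build(sorted, next, 2 * k);
        tree_[k] = sorted[next];
        pos_[k] = next;
        ++next;
        build(sorted, next, 2 * k + 1);
    }

    std::vector<int> tree_;
    std::vector<std::size_t> pos_;
};
```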
Benchmarked on Win32 Core2 Quad Q6600, gcc v4.3 msys. Compiling with g++ -O3, nothing fancy.
Observation: the asserts, timing and loop overhead account for about 40% of the measured time, so any gains listed below should be divided by 0.6 to get the actual improvement in the algorithms under test.
Simple answers:
On my machine replacing the int64_t with int for "low", "high" and "mid" in interpolationSearch gives a 20% to 40% speed up. This is the fastest easy method I could find. It is taking about 150 cycles per look-up on my machine (for the array size of 100000). That's roughly the same number of cycles as a cache miss. So in real applications, looking after your cache is probably going to be the biggest factor.
Replacing binarySearch's "/2" with a ">>1" gives a 4% speed up.
Using STL's binary_search algorithm on a vector containing the same data as "arr" is about the same speed as the hand-coded binarySearch, although for the smaller sizes STL is much slower, by around 40%.
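Note that std::binary_search only returns a bool. To get an index comparable to the hand-coded binarySearch, std::lower_bound can be used instead; a minimal sketch (stlSearch is a hypothetical name) assuming the data is in a sorted std::vector<int>:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// std::lower_bound returns an iterator to the first element >= key;
// the index follows from its distance to begin().
int stlSearch(const std::vector<int>& v, int key) {
    auto it = std::lower_bound(v.begin(), v.end(), key);
    return (it != v.end() && *it == key) ? static_cast<int>(it - v.begin()) : -1;
}
```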
I have an excessively complicated solution, which requires a specialized sorting function. The sort is slightly slower than a good quicksort, but all of my tests show that the search function is much faster than a binary or interpolation search. I called it a regression sort before I found out that the name was already taken, but didn't bother to think of a new name (ideas?).
There are three files to demonstrate it.
The regression sort/search code (regression.h):
#include <sstream>
#include <math.h>
#include <ctime>
#include <climits>
void insertionSort(int array[], int length) {
int key, j;
for(int i = 1; i < length; i++) {
key = array[i];
j = i - 1;
while (j >= 0 && array[j] > key) {
array[j + 1] = array[j];
--j;
}
array[j + 1] = key;
}
}
class RegressionTable {
public:
RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs);
RegressionTable(int arr[], int s);
void sort(void);
int find(int key);
void printTable(void);
void showSize(void);
private:
void createTable(void);
inline unsigned int resolve(int n);
int * array;
int * table;
int * tableSize;
int size;
int lowerBound;
int upperBound;
int divisions;
int divisionSize;
int newSize;
double multiplier;
};
RegressionTable::RegressionTable(int arr[], int s) {
array = arr;
size = s;
multiplier = 1.35;
divisions = sqrt(size);
upperBound = INT_MIN;
lowerBound = INT_MAX;
for (int i = 0; i < size; ++i) {
if (array[i] > upperBound)
upperBound = array[i];
if (array[i] < lowerBound)
lowerBound = array[i];
}
createTable();
}
RegressionTable::RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs) {
array = arr;
size = s;
lowerBound = lower;
upperBound = upper;
multiplier = mult;
divisions = divs;
createTable();
}
void RegressionTable::showSize(void) {
int bytes = sizeof(*this);
bytes = bytes + sizeof(int) * 2 * (divisions + 1);
}
void RegressionTable::createTable(void) {
divisionSize = size / divisions;
newSize = multiplier * double(size);
table = new int[divisions + 1];
tableSize = new int[divisions + 1];
// Zero all divisions + 1 slots: the counting loop below can increment table[divisions].
for (int i = 0; i <= divisions; ++i) {
table[i] = 0;
tableSize[i] = 0;
}
for (int i = 0; i < size; ++i) {
++table[((array[i] - lowerBound) / divisionSize) + 1];
}
for (int i = 1; i <= divisions; ++i) {
table[i] += table[i - 1];
}
table[0] = 0;
for (int i = 0; i < divisions; ++i) {
tableSize[i] = table[i + 1] - table[i];
}
}
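int RegressionTable::find(int key) {
if (key < lowerBound)
return -1;
int section = (key - lowerBound) / divisionSize;
if (section >= divisions)
return -1;
// Restrict the search to the elements of key's section.
int minIndex = table[section];
int maxIndex = minIndex + tableSize[section] - 1;
if (minIndex > maxIndex) // empty section: key is not present
return -1;
// resolve() scales by multiplier for the spread-out buffer used by sort();
// use a factor of 1 while searching the packed array, then restore it.
double temp = multiplier;
multiplier = 1;
int guess = resolve(key);
multiplier = temp;
if (guess < minIndex || guess > maxIndex)
guess = minIndex;
double t;
while (array[guess] != key) {
// uncomment this line if you want to see where it is searching.
//cout << "Regression Guessing " << guess << ", not there." << endl;
if (array[guess] < key) {
minIndex = guess + 1;
} else {
maxIndex = guess - 1;
}
if (minIndex > maxIndex || array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
if (array[minIndex] == array[maxIndex]) // equal ends must both be key here
return minIndex;
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
}
return guess;
}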
inline unsigned int RegressionTable::resolve(int n) {
float temp;
int subDomain = (n - lowerBound) / divisionSize;
temp = n % divisionSize;
temp /= divisionSize;
temp *= tableSize[subDomain];
temp += table[subDomain];
temp *= multiplier;
return (unsigned int)temp;
}
void RegressionTable::sort(void) {
int * out = new int[int(size * multiplier)];
bool * used = new bool[int(size * multiplier)](); // () zero-initializes, so every slot starts unused
int higher, lower;
bool placed;
for (int i = 0; i < size; ++i) {
/* Figure out where to put the darn thing */
higher = resolve(array[i]);
lower = higher - 1;
if (higher > newSize) {
higher = size;
lower = size - 1;
} else if (lower < 0) {
higher = 0;
lower = 0;
}
placed = false;
while (!placed) {
if (higher < size && !used[higher]) {
out[higher] = array[i];
used[higher] = true;
placed = true;
} else if (lower >= 0 && !used[lower]) {
out[lower] = array[i];
used[lower] = true;
placed = true;
}
--lower;
++higher;
}
}
int index = 0;
for (int i = 0; i < int(size * multiplier); ++i) { // same bound as the allocations above
if (used[i]) {
array[index] = out[i];
++index;
}
}
delete[] out;
delete[] used;
insertionSort(array, size);
}
And then there are the regular search functions (search.h):
#include <iostream>
using namespace std;
int binarySearch(int array[], int start, int end, int key) {
// Determine the search point.
int searchPos = (start + end) / 2;
// If the bounds have crossed, then it is not here.
if (start > end)
return -1;
// Search the bottom half of the array if the query is smaller.
if (array[searchPos] > key)
return binarySearch (array, start, searchPos - 1, key);
// Search the top half of the array if the query is larger.
if (array[searchPos] < key)
return binarySearch (array, searchPos + 1, end, key);
// Otherwise array[searchPos] == key, so we are done.
return searchPos;
}
int binarySearch(int array[], int size, int key) {
return binarySearch(array, 0, size - 1, key);
}
int interpolationSearch(int array[], int size, int key) {
int minIndex = 0;
int maxIndex = size - 1;
// Only interpolate while key lies between the current end values;
// otherwise the guess could fall outside [minIndex, maxIndex].
while (minIndex <= maxIndex && array[minIndex] <= key && array[maxIndex] >= key) {
if (array[minIndex] == array[maxIndex]) // both ends equal key; avoids dividing by zero
return minIndex;
double t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
int guess = minIndex + t * (maxIndex - minIndex);
if (array[guess] < key) {
minIndex = guess + 1;
} else if (array[guess] > key) {
maxIndex = guess - 1;
} else {
return guess;
}
}
return -1;
}
And then I wrote a simple main to test out the different sorts.
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <string>
#include "regression.h"
#include "search.h"
using namespace std;
void randomizeArray(int array[], int size) {
for (int i = 0; i < size; ++i) {
array[i] = rand() % size;
}
}
int main(int argc, char * argv[]) {
int size = 100000;
string arg;
if (argc > 1) {
arg = argv[1];
size = atoi(arg.c_str());
}
srand(time(NULL));
int * array;
cout << "Creating Array Of Size " << size << "...\n";
array = new int[size];
randomizeArray(array, size);
cout << "Sorting Array...\n";
RegressionTable t(array, size, 0, size*2.5, 1.5, size);
//RegressionTable t(array, size);
t.sort();
int trials = 10000000;
clock_t start;
cout << "Binary Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
binarySearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Interpolation Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
interpolationSearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Regression Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
t.find(i % size);
}
cout << clock() - start << endl;
return 0;
}
Give it a try and tell me if it's faster for you. It's super complicated, so it's really easy to break it if you don't know what you are doing. Be careful about modifying it.
I compiled the main with g++ on ubuntu.
Unless your data is known to have special properties, pure interpolation search has the risk of taking linear time. If you expect interpolation to help with most data but don't want it to hurt in the case of pathological data, I would use a (possibly weighted) average of the interpolated guess and the midpoint, ensuring a logarithmic bound on the run time.
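A minimal C++ sketch of that idea, assuming a sorted std::vector<int> and equal weights for the interpolated guess and the midpoint (hybridSearch and the 1/2 weighting are illustrative choices, not a tuned implementation):

```cpp
#include <cassert>
#include <vector>

// Interpolation search that probes halfway between the interpolated guess
// and the midpoint. Because the probe is never farther than about a quarter
// of the range from the middle, each step discards at least roughly a quarter
// of the remaining elements, which bounds the worst case at O(log n) probes.
int hybridSearch(const std::vector<int>& a, int key) {
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high && a[low] <= key && key <= a[high]) {
        if (a[low] == a[high])  // both ends equal key; also avoids dividing by zero
            return low;
        double t = (static_cast<double>(key) - a[low]) /
                   (static_cast<double>(a[high]) - a[low]);   // 0 <= t <= 1
        int guess = low + static_cast<int>(t * (high - low));  // interpolated probe
        int mid = low + (high - low) / 2;                      // binary-search probe
        int probe = guess + (mid - guess) / 2;                 // weight 1/2 each
        if (a[probe] < key)
            low = probe + 1;
        else if (a[probe] > key)
            high = probe - 1;
        else
            return probe;
    }
    return -1;
}
```

Shifting more weight toward the interpolated guess keeps the bound logarithmic as long as the midpoint keeps a nonzero weight, but the guaranteed fraction discarded per step shrinks accordingly.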
One way of approaching this is to use a space-versus-time trade-off. There are any number of ways that could be done. The extreme way would be to make an array whose size is the maximum value in the sorted array plus one, and initialize each position with the index into sortedArray of that value (or -1 if the value is absent). Then the search would simply be O(1).
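A minimal C++ sketch of that extreme version, assuming all values are non-negative and no larger than a known maxValue (buildDirectIndex and directSearch are hypothetical names). The table costs one int per possible value, so it only pays off when the value range is small:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct-address table: slot v holds the index of the first occurrence of
// value v in sortedArray, or -1. Assumes every value is in [0, maxValue].
std::vector<int> buildDirectIndex(const std::vector<int>& sortedArray, int maxValue) {
    std::vector<int> table(static_cast<std::size_t>(maxValue) + 1, -1);
    for (int i = static_cast<int>(sortedArray.size()) - 1; i >= 0; --i) {
        assert(sortedArray[i] >= 0 && sortedArray[i] <= maxValue);
        table[sortedArray[i]] = i;  // walking backwards leaves the first occurrence
    }
    return table;
}

// O(1) search once the table is built.
int directSearch(const std::vector<int>& table, int toFind) {
    if (toFind < 0 || toFind >= static_cast<int>(table.size())) return -1;
    return table[toFind];
}
```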
The following version, however, might be a little more realistic and possibly be useful in the real world. It uses a "helper" structure that is initialized on the first call. It maps the search space down to a smaller space by dividing by a number that I pulled out of the air without much testing. It stores the index of the lower bound for a group of values in sortedArray into the helper map. The actual search divides the toFind number by the chosen divisor and extracts the narrowed bounds of sortedArray for a normal binary search.
For example, if the sorted values range from 1 to 1000 and the divisor is 100, then the lookup array might contain 10 "sections". To search for value 250, it would divide it by 100 to yield integer index position 250/100=2. map[2] would contain the sortedArray index for values 200 and larger. map[3] would have the index position of values 300 and larger thus providing a smaller bounding position for a normal binary search. The rest of the function is then an exact copy of your binary search function.
The initialization of the helper map might be more efficient by using a binary search to fill in the positions rather than a simple scan, but it is a one time cost so I didn't bother testing that. This mechanism works well for the given test numbers which are evenly distributed. As written, it would not be as good if the distribution was not even. I think this method could be used with floating point search values too. However, extrapolating it to generic search keys might be harder. For example, I am unsure what the method would be for character data keys. It would need some kind of O(1) lookup/hash that mapped to a specific array position to find the index bounds. It's unclear to me at the moment what that function would be or if it exists.
I kludged the setup of the helper map in the following implementation pretty quickly. It is not pretty and I'm not 100% sure it is correct in all cases but it does show the idea. I ran it with a debug test to compare the results against your existing binarySearch function to be somewhat sure it works correctly.
The following are example numbers:
100000 * 10000 : cycles binary search = 10197811
100000 * 10000 : cycles interpolation uint64_t = 9007939
100000 * 10000 : cycles interpolation float = 8386879
100000 * 10000 : cycles binary w/helper = 6462534
Here is the quick-and-dirty implementation:
#define REDUCTION 100 // pulled out of the air
typedef struct {
int init; // have we initialized it?
int numSections;
int *map;
int divisor;
} binhelp;
int binarySearchHelp( binhelp *phelp, int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low;
int high;
int mid;
if ( !phelp->init && len > REDUCTION ) {
int i;
int numSections = len / REDUCTION;
int divisor = (( sortedArray[len-1] - 1 ) / numSections ) + 1;
int threshold;
int arrayPos;
phelp->init = 1;
phelp->divisor = divisor;
phelp->numSections = numSections;
phelp->map = (int*)malloc((numSections+2) * sizeof(int));
phelp->map[0] = 0;
phelp->map[numSections+1] = len-1;
arrayPos = 0;
// Scan through the array and set up the mapping positions. Simple linear
// scan but it is a one-time cost.
for ( i = 1; i <= numSections; i++ ) {
threshold = i * divisor;
while ( arrayPos < len && sortedArray[arrayPos] < threshold )
arrayPos++;
if ( arrayPos < len )
phelp->map[i] = arrayPos;
else
// kludge to take care of aliasing
phelp->map[i] = len - 1;
}
}
if ( phelp->init ) {
int section = toFind / phelp->divisor;
if ( section > phelp->numSections )
// it is bigger than all values
return -1;
low = phelp->map[section];
if ( section == phelp->numSections )
high = len - 1;
else
high = phelp->map[section+1];
} else {
// use normal start points
low = 0;
high = len - 1;
}
// the following is a direct copy of the Kriss' binarySearch
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
The helper structure needs to be initialized (and its memory freed afterwards):
binhelp help;
help.init = 0;
unsigned long long totalcycles4 = 0;
/* ... make the calls the same way as for the other ones, but pass the structure: */
binarySearchHelp(&help, arr,searched[j],length);
/* ... then, once this array is no longer searched: */
if ( help.init )
free( help.map );
help.init = 0;
Look first at the data, and at whether a data-specific method can give a big gain over a general method.
For large static sorted datasets, you can create an additional index that provides partial pigeonholing, based on the amount of memory you're willing to use. For example, say we create a 256x256 two-dimensional array of ranges, which we populate with the start and end positions in the search array of the elements with the corresponding high-order bytes. When we come to search, we use the high-order bytes of the key to find the range (subset) of the array we need to search. A plain binary search of 100,000 elements takes about 17 comparisons (log2 100000 ≈ 16.6); if the index narrows the search to a bucket of around 16 elements, only about 4 comparisons remain (log2 16 = 4). The memory cost here is about 512 KB (65536 ranges of two 4-byte ints each).
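A minimal C++ sketch of that index, assuming unsigned 32-bit keys and the top 16 bits as the bucket key (PrefixIndex is a hypothetical name). Since each bucket's range ends where the next one starts, it stores only the start positions, which halves the memory to about 256 KB; how many elements land in each bucket depends entirely on the key distribution:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Partial pigeonholing on the high-order 16 bits (two bytes) of 32-bit keys.
// start_[p] is the index of the first element whose top 16 bits are >= p, so
// the elements with prefix p occupy [start_[p], start_[p + 1]).
class PrefixIndex {
public:
    explicit PrefixIndex(std::vector<std::uint32_t> sorted)
        : sorted_(std::move(sorted)), start_(kBuckets + 1) {
        std::size_t i = 0;
        for (std::size_t p = 0; p <= kBuckets; ++p) {
            while (i < sorted_.size() && (sorted_[i] >> 16) < p) ++i;
            start_[p] = static_cast<std::uint32_t>(i);
        }
    }

    // Returns the index of key in the sorted array, or -1 if absent.
    int find(std::uint32_t key) const {
        std::size_t p = key >> 16;
        auto first = sorted_.begin() + start_[p];
        auto last = sorted_.begin() + start_[p + 1];
        auto it = std::lower_bound(first, last, key);  // binary search within one bucket
        return (it != last && *it == key) ? static_cast<int>(it - sorted_.begin()) : -1;
    }

private:
    static constexpr std::size_t kBuckets = std::size_t{1} << 16;
    std::vector<std::uint32_t> sorted_;
    std::vector<std::uint32_t> start_;  // 65537 entries, about 256 KB
};
```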
Another method, again suited to data that doesn't change much, is to divide the data into arrays of commonly sought items and rarely sought items. For example, if you leave your existing search in place running a wide number of real world cases over a protracted testing period, and log the details of the item being sought, you may well find that the distribution is very uneven, i.e. some values are sought far more regularly than others. If this is the case, break your array into a much smaller array of commonly sought values and a larger remaining array, and search the smaller array first. If the data is right (big if!), you can often achieve broadly similar improvements to the first solution without the memory cost.
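A minimal C++ sketch of that two-tier lookup, assuming the hot values have already been identified from the logs (HotColdSearch is a hypothetical name). As a simplification it keeps the hot values in the full array as well, so every returned index refers to the original sorted array:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Two-tier lookup: a small sorted table of frequently sought values, each
// paired with its index in the full sorted array, is checked first; the full
// array is binary-searched only on a miss.
class HotColdSearch {
public:
    HotColdSearch(std::vector<int> sortedFull, const std::vector<int>& hotValues)
        : full_(std::move(sortedFull)) {
        for (int v : hotValues) {
            int i = searchFull(v);
            if (i >= 0) hot_.emplace_back(v, i);
        }
        std::sort(hot_.begin(), hot_.end());
    }

    int find(int key) const {
        auto h = std::lower_bound(hot_.begin(), hot_.end(), key,
                                  [](const std::pair<int, int>& e, int k) { return e.first < k; });
        if (h != hot_.end() && h->first == key) return h->second;  // hot hit
        return searchFull(key);                                     // cold path
    }

private:
    int searchFull(int key) const {
        auto it = std::lower_bound(full_.begin(), full_.end(), key);
        return (it != full_.end() && *it == key) ? static_cast<int>(it - full_.begin()) : -1;
    }

    std::vector<int> full_;
    std::vector<std::pair<int, int>> hot_;  // (value, index in full_), sorted by value
};
```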
There are many other data-specific optimizations that score far better than trying to improve on tried, tested and far more widely used general solutions.
Posting my current version before the question is closed (hopefully I will thus be able to enhance it later). For now it is worse than every other version (if someone understands why my changes to the end of the loop have this effect, comments are welcome).
int newSearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l < toFind && h > toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (l == toFind)
return low;
else if (h == toFind)
return high;
else
return -1; // Not found
}
The implementation of binary search used for the comparisons can be improved. The key idea is to "normalize" the range initially so that, after the first step, the target is always > array[min] and < array[max]. This increases the termination delta size: the loop can stop as soon as min and max are adjacent. It also special-cases targets that are less than the first element or greater than the last element of the sorted array. Expect approximately a 15% improvement in search time. Here is what the code might look like in C++.
int binarySearch(const int *array, int target, int min, int max) // min and max are inclusive indices
{ // binarySearch
// normalize min and max so that we know the target is > min and < max
if (target <= array[min]) // if min not normalized
{ // target <= array[min]
if (target == array[min]) return min;
return -1;
} // end target <= array[min]
// min is now normalized
if (target >= array[max]) // if max not normalized
{ // target >= array[max]
if (target == array[max]) return max;
return -1;
} // end target >= array[max]
// max is now normalized
while (min + 1 < max)
{ // delta >=2
int tempi = min + ((max - min) >> 1); // point to index approximately in the middle between min and max
int atempi = array[tempi]; // just in case the compiler does not optimize this
if (atempi > target)max = tempi; // if the target is smaller, we can decrease max and it is still normalized
else if (atempi < target)min = tempi; // the target is bigger, so we can increase min and it is still normalized
else return tempi; // if we found the target, return with the index
// Note that it is important that this test for equality is last because it rarely occurs.
} // end delta >=2
return -1; // nothing in between normalized min and max
} // end binarySearch