Segments sum algorithm - C++

I am trying to solve the following task:
1) Given an array A of size N.
2) Given a set of range update queries, i.e. (L, R, val), each of which should do A[i] += val for L <= i <= R.
3) Given a set of range sum queries, i.e. (L, R), each of which should return sum(A[i]) for L <= i <= R.
Constraints:
1) Size of A, segments and queries sets N, N1, N2 <= 2^24.
2) 0 <= L <= 2^24, 0 <= R <= 2^24, 0 <= val <= 2^24.
The problem is to calculate the sum of all range sum query results (S) modulo 2^32.
It seems one could build a segment tree and get the desired sum in O(N log N) time, but we actually don't need that data structure. Instead, we can somehow calculate S in O(N) time using just 2 or 3 arrays. What is the general idea here?
I recently wrote an algorithm in C++ for this problem, but it is not optimal. Pseudocode:
1. Create two arrays Add[0..N-1] and Subtract[0..N-1].
2. Iterate over the set of range updates and do Add[L] += val and Subtract[R] += val.
3. Create array Partial_sum[0..N].
4. Partial_sum[0] = 0, what_to_add = 0.
5. For i in [1..N]:
5.1. Partial_sum[i] = Partial_sum[i - 1] + Add[i - 1] + what_to_add
5.2. what_to_add = what_to_add + Add[i - 1] - Subtract[i - 1]
We get the Partial_sum array and can easily calculate any segment sum (L, R) in O(1) time as Partial_sum[R+1] - Partial_sum[L].
But the problem is that step 2 is too slow. Also, the loop in step 5 is hard to understand. This is an O(N) solution, but the constant is too high. I know there should be a way to improve step 5, but I don't understand how to do it.
Could someone give some ideas or even suggest their own algorithm to solve this problem?
Thank you.
My algorithm implementation:
#include <cstring>
#include <iostream>
#include <stdio.h>
typedef unsigned int UINT;
typedef unsigned long long ULL;
//MOD and size of A
const ULL MOD = 4294967296LL; // 2^32
const size_t N = 16777216; // 2^24
//params for next_rand()
UINT seed = 0;
UINT a;
UINT b;
//get random segment
UINT next_rand()
{
seed = seed * a + b;
return seed >> 8;
}
int main()
{
UINT N1, N2;
std::cin >> N1 >> N2;
std::cin >> a >> b;
UINT* add = new UINT[N]; //Add array
UINT* subs = new UINT[N]; //Subtraction array
UINT* part_sum = new UINT[N + 1]; //Partial sums array
memset(add, 0, sizeof(UINT) * N);
memset(subs, 0, sizeof(UINT) * N);
memset(part_sum, 0, sizeof(UINT) * (N + 1)); //Initialize arrays
//step 2
for (size_t i = 0; i < N1; ++i)
{
UINT val = next_rand();
UINT l = next_rand();
UINT r = next_rand();
if (l > r)
{
std::swap(l, r);
}
add[l] = (add[l] + val);
subs[r] = (subs[r] + val);
}
part_sum[0] = 0;
UINT curr_add = 0;
//step 5
for (size_t i = 1; i <= N; ++i)
{
part_sum[i] = (part_sum[i - 1] + curr_add + add[i - 1]);
curr_add = (curr_add + add[i - 1] - subs[i - 1]);
}
UINT res_sum = 0;
//Get any segment sum in O(1)
for (size_t i = 0; i < N2; ++i)
{
UINT l = next_rand();
UINT r = next_rand();
if (l > r)
{
std::swap(l, r);
}
res_sum = (res_sum + part_sum[r + 1] - part_sum[l]);
}
std::cout << res_sum;
delete []add;
delete []subs;
delete []part_sum;
return 0;
}

I've implemented the described algorithm in a different way, and it should be faster than the previous version at the maximum update and sum query set sizes.
#include <iostream>
#include <stdio.h>
#include <vector>
typedef unsigned int UINT;
typedef unsigned long long ULL;
const ULL MOD = 4294967296LL; // 2^32
const size_t N = 16777216; // 2^24
UINT seed = 0;
UINT a;
UINT b;
UINT next_rand()
{
seed = seed * a + b;
return seed >> 8;
}
std::vector <std::pair<UINT, UINT> > add;
int main()
{
UINT upd_query_count;
UINT sum_query_count;
// freopen("fastadd.in", "r", stdin);
// freopen("fastadd.out", "w", stdout);
scanf("%u", &upd_query_count);
scanf("%u", &sum_query_count);
scanf("%u", &a);
scanf("%u", &b);
add.resize(N + 1); // resize, not reserve: the code below indexes the elements directly
for (size_t i = 0; i < upd_query_count; ++i)
{
UINT val = next_rand();
UINT l = next_rand();
UINT r = next_rand();
if (l > r)
{
add[r].first += val;
add[l + 1].first -= val;
}
else
{
add[l].first += val;
add[r + 1].first -= val;
}
}
for (size_t i = 0; i < sum_query_count; ++i)
{
UINT l = next_rand();
UINT r = next_rand();
if (l > r)
{
++add[r].second;
--add[l + 1].second;
}
else
{
++add[l].second;
--add[r + 1].second;
}
}
UINT curr_add = 0;
UINT res_sum = 0;
UINT times = 0;
for (size_t i = 0; i < N; ++i )
{
curr_add += add[i].first;
times += add[i].second;
res_sum += curr_add * times;
}
printf("%u\n", res_sum);
return 0;
}

So add and subs are very large arrays.
The first place you should look for a speed up here is in memory access. As N1 becomes large you will end up with a tremendous number of cache misses. This is probably somewhat beyond the scope to explain so I'll link: http://en.wikipedia.org/wiki/CPU_cache
As for a way to speed this up: let's try to improve spatial locality by ordering our accesses.
std::vector<std::pair<UINT, UINT>> l{N1};
std::vector<std::pair<UINT, UINT>> r{N1};
for(size_t i = 0; i < N1; ++i){
const UINT val = next_rand();
const UINT first = next_rand();
const UINT second = next_rand();
if(first > second){
l[i] = std::make_pair(second, val);
r[i] = std::make_pair(first, val);
}else{
l[i] = std::make_pair(first, val);
r[i] = std::make_pair(second, val);
}
}
std::sort(l.begin(), l.end());
std::sort(r.begin(), r.end());
for(size_t i = 0; i < N1; ++i){
add[l[i].first] += l[i].second;
subs[r[i].first] += r[i].second;
}
Keep a couple of things in mind: std::pair's operator< compares the first element and, if those are equal, compares the second. That's how I'm able to use std::sort without writing any more code. However, if first is equal for two elements, the one with the highest val will always be the second one added. It doesn't seem like that would be a problem in your current code, but if it becomes one you can solve it by writing your own sort loop rather than relying on std::sort.
Also, depending on how sparse the accesses to each cache block are, it may be faster to do your additions in separate loops.
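For example, the split could look roughly like this (a sketch reusing the names from the snippet above, wrapped in a hypothetical helper so it stands on its own):
#include <cstddef>
#include <utility>
#include <vector>
typedef unsigned int UINT;
// Apply all the sorted Add updates in one pass and all the sorted Subtract updates in
// another, so each pass streams over a single destination array.
void apply_sorted_updates(const std::vector<std::pair<UINT, UINT> >& l,
                          const std::vector<std::pair<UINT, UINT> >& r,
                          std::vector<UINT>& add, std::vector<UINT>& subs)
{
    for (std::size_t i = 0; i < l.size(); ++i)
        add[l[i].first] += l[i].second;   // writes touch only 'add'
    for (std::size_t i = 0; i < r.size(); ++i)
        subs[r[i].first] += r[i].second;  // then writes touch only 'subs'
}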
As always, the only way you can really improve performance is by working with actual numbers, so be sure to do your own benchmarking as you're comparing methods.

Related

How to get the minimum XOR of a given value and the value from a query of range for a given array

Given an array A of n integers and given queries in the form of range [l , r] and a value x, find the minimum of A[i] XOR x where l <= i <= r and x will be different for different queries.
I tried solving this problem using segment trees but I am not sure what type of information I should store in them as x will be different for different queries.
0 < number of queries <= 1e4
0 < n <= 1e4
To solve this I used a std::vector as basis (not an array, or std::array), just for flexibility.
#include <algorithm>
#include <stdexcept>
#include <vector>
int get_xored_min(const std::vector<int>& values, const size_t l, const size_t r, const int xor_value)
{
// check bounds of l and r
if ((l >= values.size()) || (r >= values.size()))
{
throw std::invalid_argument("index out of bounds");
}
// todo check l < r
// create left & right iterators to create a smaller vector
// only containing the subset we're interested in.
auto left = values.begin() + l;
auto right = values.begin() + r + 1;
std::vector<int> range{ left, right };
// xor all the values in the subset
for (auto& v : range)
{
v ^= xor_value;
}
// use the standard library function for finding the iterator to the minimum
// then use the * to dereference the iterator and get the value
auto min_value = *std::min_element(range.begin(), range.end());
return min_value;
}
int main()
{
std::vector<int> values{ 1,3,5,4,2,4,7,9 };
auto min_value = get_xored_min(values, 0u, 7u, 3);
return 0;
}
Approach - Trie + Offline Processing
Time Complexity - O(N * 32)
Space Complexity - O(N * 32)
Edit:
This approach will fail. I guess we have to use square root decomposition instead of the two pointers approach.
I solved this problem using a Trie for finding the minimum XOR in a range [l, r]. I handled the queries offline by sorting them.
Input format:
the first line has n (no. of elements) and q (no. of queries). the second line has all n elements of the array. each subsequent line has a query and each query has 3 inputs l, r and x.
Example -
Input -
3 3
2 1 2
1 2 3
1 3 2
2 3 5
First, convert all 3 queries into queries sorted by l and r.
converted queries -
1 2 3
1 3 2
2 3 5
Key here is processing over sorted queries using two pointers approach.
#include <bits/stdc++.h>
using namespace std;
const int N = (int)2e4 + 77;
int n, q, l, r, x;
int a[N], ans[N];
vector<pair<pair<int, int>, pair<int, int>>> queries;
// Trie Implementation starts
struct node
{
int nxt[2], cnt;
void newnode()
{
memset(nxt, 0, sizeof(nxt));
cnt = 0;
}
} trie[N * 32];
int tot = 1;
void update(int x, int v)
{
int p = 1;
for (int i = 31; i >= 0; i--)
{
int id = x >> i & 1;
if (!trie[p].nxt[id])
{
trie[++tot].newnode();
trie[p].nxt[id] = tot;
}
p = trie[p].nxt[id];
trie[p].cnt += v;
}
}
int minXor(int x)
{
int res = 0, p = 1;
for (int i = 31; i >= 0; i--)
{
int id = x >> i & 1;
if (trie[p].nxt[id] and trie[trie[p].nxt[id]].cnt)
p = trie[p].nxt[id];
else
{
p = trie[p].nxt[id ^ 1];
res |= 1 << i;
}
}
return res;
}
// Trie Implementation ends
int main()
{
cin >> n >> q;
for (int i = 1; i <= n; i += 1)
{
cin >> a[i];
}
for (int i = 1; i <= q; i += 1)
{
cin >> l >> r >> x;
queries.push_back({{l, r}, {x, i}});
}
sort(queries.begin(), queries.end());
int left = 1, right = 1;
for (int i = 0; i < q; i += 1)
{
int l = queries[i].first.first;
int r = queries[i].first.second;
int x = queries[i].second.first;
int index = queries[i].second.second;
while (left < l)
{
update(a[left], -1);
left += 1;
}
while (right <= r)
{
update(a[right], 1);
right += 1;
}
ans[index] = minXor(x);
}
for (int i = 1; i <= q; i += 1)
{
cout << ans[i] << " \n";
}
return 0;
}
Edit: with O(number of bits) code
Use a binary tree to store the values of A, look here : Minimum XOR for queries
What you need to change is adding to each node the range of indexes for A corresponding to the values in the leafs.
# minimal xor in a range
nbits=16 # Number of bits for numbers
asize=5000 # Array size
ntest=50 # Number of random test
from random import randrange
# Insert element a at index i in the tree (increasing i only)
def tinsert(a,i,T):
for b in range(nbits-1,-1,-1):
v=((a>>b)&1)
T[v+2].append(i)
if T[v]==[]:T[v]=[[],[],[],[]]
T=T[v]
# Buildtree : builds a tree based on array V
def build(V):
T=[[],[],[],[]] # Init tree
for i,a in enumerate(V): tinsert(a,i,T)
return(T)
# Binary search : is the intersection of T and [a,b] non-empty ?
def binfind(T,a,b):
s,e,om=0,len(T)-1,-1
while True:
m=(s+e)>>1
v=T[m]
if v<a:
s=m
if m==om: return(a<=T[e]<=b)
elif v>b:
e=m
if m==om: return(a<=T[s]<=b)
else: return(True) # a<=T(m)<=b
om=m
# Look for the min xor in a given index range
def minx(x,s,e,T):
if s<0 or s>=(len(T[2])+len(T[3])) or e<s: return
r=0
for b in range(nbits-1,-1,-1):
v=((x>>b)&1)
if T[v+2]==[] or not binfind(T[v+2],s,e): # not nr with b set to v ?
v=1-v
T=T[v]
r=(r<<1)|v
return(r)
# Tests the code on random arrays
max=(1<<nbits)-1
for i in range(ntest):
A=[randrange(0,max) for i in range(asize)]
T=build(A)
x,s=randrange(0,max),randrange(0,asize-1)
e=randrange(s,asize)
if min(v^x for v in A[s:e+1])!=x^minx(x,s,e,T):
print('error')
I was able to solve this using a segment tree and tries, as suggested by @David Eisenstat.
Below is an implementation in C++.
I constructed a trie for each segment in the segment tree, and finding the minimum XOR is just traversing and matching the corresponding trie using each bit of the query value.
#include <bits/stdc++.h>
#define rep(i, a, b) for (int i = a; i < b; i++)
using namespace std;
const int bits = 7;
struct trie {
trie *children[2];
bool end;
};
trie *getNode(void)
{
trie *node = new trie();
node->end = false;
node->children[0] = NULL;
node->children[1] = NULL;
return node;
}
trie *merge(trie *l, trie *r)
{
trie *node = getNode();
// Binary 0:
if (l->children[0] && r->children[0])
node->children[0] = merge(l->children[0], r->children[0]);
else if (!r->children[0])
node->children[0] = l->children[0];
else if (!l->children[0])
node->children[0] = r->children[0];
// Binary 1:
if (l->children[1] && r->children[1])
node->children[1] = merge(l->children[1], r->children[1]);
else if (!r->children[1])
node->children[1] = l->children[1];
else if (!l->children[1])
node->children[1] = r->children[1];
return node;
}
void insert(trie *root, int num)
{
int mask = 1 << bits;
int bin;
rep(i, 0, bits + 1)
{
bin = ((num & mask) >> (bits - i));
if (!root->children[bin]) root->children[bin] = getNode();
root = root->children[bin];
mask = mask >> 1;
}
root->end = true;
}
struct _segTree {
int n, height, size;
vector<trie *> tree;
_segTree(int _n)
{
n = _n;
height = (int)ceil(log2(n));
size = (int)(2 * pow(2, height) - 1);
tree.resize(size);
}
trie *construct(vector<int> A, int start, int end, int idx)
{
if (start == end) {
tree[idx] = getNode();
insert(tree[idx], A[start]);
return tree[idx];
}
int mid = start + (end - start) / 2;
tree[idx] = merge(construct(A, start, mid, 2 * idx + 1),
construct(A, mid + 1, end, 2 * idx + 2));
return tree[idx];
}
int findMin(int num, trie *root)
{
int mask = 1 << bits;
int bin;
int rnum = 0;
int res = 0;
rep(i, 0, bits + 1)
{
bin = ((num & mask) >> (bits - i));
if (!root->children[bin]) {
bin = 1 - bin;
if (!root->children[bin]) return res ^ num;
}
rnum |= (bin << (bits - i));
root = root->children[bin];
if (root->end) res = rnum;
mask = mask >> 1;
}
return res ^ num;
}
int Query(int X, int start, int end, int qstart, int qend, int idx)
{
if (qstart <= start && qend >= end) return findMin(X, tree[idx]);
if (qstart > end || qend < start) return INT_MAX;
int mid = start + (end - start) / 2;
return min(Query(X, start, mid, qstart, qend, 2 * idx + 1),
Query(X, mid + 1, end, qstart, qend, 2 * idx + 2));
}
};
int main()
{
int n, q;
vector<int> A;
vector<int> L;
vector<int> R;
vector<int> X;
cin >> n;
A.resize(n, 0);
rep(i, 0, n) cin >> A[i];
cin >> q;
L.resize(q);
R.resize(q);
X.resize(q);
rep(i, 0, q) cin >> L[i] >> R[i] >> X[i];
//---------------------code--------------------//
_segTree segTree(n);
segTree.construct(A, 0, n - 1, 0);
rep(i, 0, q)
{
cout << segTree.Query(X[i], 0, n - 1, L[i], R[i], 0) << " ";
}
return 0;
}
Time complexity : O((2n - 1) * k + q * k * log n)
Space complexity : O((2n - 1) * 2k)
k -> number of bits

How to read binary files properly?

I have a problem with the NIST/Diehard Binary Matrix test. It's about dividing a binary sequence into 32x32 matrices and calculating their ranks. After calculating the ranks I need to compute a xi^2 value and then calculate the p-value (which must be between 0 and 1). I'm getting an extremely small p-value even for a random sequence.
I've tried hardcoding some small examples and got the p-value right, so I think my problem is in reading the binary sequence file and getting bits from it.
This is reading from a file and converting to bits sequence.
ifstream fin("seq1.bin", ios::binary);
fin.seekg(0, ios::end);
int n = fin.tellg();
unsigned int start, end;
char *buf = new char[n];
fin.seekg(0, ios::beg);
fin.read(buf, n);
n *= 8;
bool *s = new bool[n];
for (int i = 0; i < n / 8; i++) {
for (int j = 7; j >= 0; j--) {
s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
}
}
Then I form my matrix and calculate its rank
int *ranks = new int[N];
for (int i = 0; i < N; i++) {
bool *arr = new bool[m*q];
copy(s + i * m*q, s +(i * m*q) + (m * q), arr);
ranks[i] = binary_rank(arr, m, q);
}
Checking occurrences in ranks
int count_occurrences(int arr[], int n, int x){
int result = 0;
for (int i = 0; i < n; i++)
if (x == arr[i])
result++;
return result;
}
Calculating xi^2 and p-value
double calculate_xi(int fm, int fm_1, int remaining, int N) {
double N1 = 0.2888*N;
double N2 = 0.5776*N;
double N3 = 0.1336*N;
double x1 = (fm - N1)*(fm - N1) / N1;
double x2 = (fm_1 - N2)*(fm_1 - N2) / N2;
double x3 = (remaining - N3)*(remaining - N3) / N3;
return x1 + x2 + x3;
}
double calculate_pvalue(double xi2) {
return exp(-(xi2 / 2));
}
I expect a p-value between 0 and 1, but I'm getting 0 every time. It's because of the extremely big xi^2 value, and I can't find what I've done wrong. Could you please help me get things right?
For this part:
for (int i = 0; i < n / 8; i++) {
for (int j = 7; j >= 0; j--) {
s[(i) * 8 + 7 - j] = (bool)((buf[i] >> j) & 1);
}
}
When you add elements to the s array, it looks like you swap the bit positions inside each character: the last (highest) bit of a character in buf goes into the first position of the corresponding bits in the s array, because the shift initially is 7, so you take the first bit from buf[] but store it at offset 0 in s[], resulting in the swap. It is easy to verify with a debugger, though, as it is not so obvious from the code. Thanks.
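If the sequence file is meant to be consumed least-significant-bit first within each byte (which is an assumption about how your file was produced, not a certainty), the extraction would look roughly like this; the helper name is mine:
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>
// Read the whole file and emit bit 0 of each byte first, bit 7 last.
std::vector<bool> read_bits_lsb_first(const char* path)
{
    std::ifstream fin(path, std::ios::binary);
    std::vector<char> buf((std::istreambuf_iterator<char>(fin)),
                          std::istreambuf_iterator<char>());
    std::vector<bool> s;
    s.reserve(buf.size() * 8);
    for (std::size_t i = 0; i < buf.size(); ++i)
        for (int j = 0; j < 8; ++j)          // j = 0 is the lowest bit of the byte
            s.push_back((buf[i] >> j) & 1);
    return s;
}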

Codility MinAbsSum

I tried this Codility test: MinAbsSum.
https://codility.com/programmers/lessons/17-dynamic_programming/min_abs_sum/
I solved the problem by searching the whole tree of possibilities. The results were OK; however, my solution failed due to timeout for large input. In other words, the time complexity was not as good as expected: searching the whole tree of sign choices is exponential in n, not something like O(n log n). But this coding test was in the section "Dynamic Programming", and there must be some way to improve it. I tried summing the whole set first and then using this information, but there is always something missing in my solution. Does anybody have an idea of how to improve my solution using DP?
#include <algorithm>
#include <cstdlib>
#include <vector>
using namespace std;
int sum(vector<int>& A, size_t i, int s)
{
if (i == A.size())
return s;
int tmpl = s + A[i];
int tmpr = s - A[i];
return min (abs(sum(A, i+1, tmpl)), abs(sum(A, i+1, tmpr)));
}
int solution(vector<int> &A) {
return sum(A, 0, 0);
}
I could not solve it. But here's the official answer.
Quoting it:
Notice that the range of numbers is quite small (maximum 100). Hence,
there must be a lot of duplicated numbers. Let count[i] denote the
number of occurrences of the value i. We can process all occurrences
of the same value at once. First we calculate values count[i] Then we
create array dp such that:
dp[j] = −1 if we cannot get the sum j,
dp[j] >= 0 if we can get sum j.
Initially, dp[j] = -1 for all of j (except dp[0] = 0). Then we scan
through all the values a appearing in A; we consider all a such
that count[a]>0. For every such a we update dp that dp[j] denotes
how many values a remain (maximally) after achieving sum j. Note
that if the previous value at dp[j] >= 0 then we can set dp[j] =
count[a] as no value a is needed to obtain the sum j. Otherwise we
must obtain sum j-a first and then use a number a to get sum j. In
such a situation dp[j] = dp[j-a]-1. Using this algorithm, we can
mark all the sum values and choose the best one (closest to half of S,
the sum of abs of A).
def MinAbsSum(A):
N = len(A)
M = 0
for i in range(N):
A[i] = abs(A[i])
M = max(A[i], M)
S = sum(A)
count = [0] * (M + 1)
for i in range(N):
count[A[i]] += 1
dp = [-1] * (S + 1)
dp[0] = 0
for a in range(1, M + 1):
if count[a] > 0:
for j in range(S):
if dp[j] >= 0:
dp[j] = count[a]
elif (j >= a and dp[j - a] > 0):
dp[j] = dp[j - a] - 1
result = S
for i in range(S // 2 + 1):
if dp[i] >= 0:
result = min(result, S - 2 * i)
return result
(note that since the final iteration only considers sums up until S // 2 + 1, we can save some space and time by only creating a DP Cache up until that value as well)
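For illustration, here is roughly what that space saving could look like in C++ (a sketch that mirrors the count-based dp quoted above; the function name is mine, not Codility's):
#include <algorithm>
#include <cstdlib>
#include <vector>
using namespace std;
int minAbsSumHalfTable(vector<int> A)
{
    int S = 0, M = 0;
    for (int& x : A) { x = abs(x); S += x; M = max(M, x); }
    vector<int> counts(M + 1, 0);
    for (int x : A) ++counts[x];
    const int half = S / 2;
    vector<int> dp(half + 1, -1);       // only sums 0..S/2 are ever inspected
    dp[0] = 0;
    for (int a = 1; a <= M; ++a) {
        if (counts[a] == 0) continue;
        for (int j = 0; j <= half; ++j) {
            if (dp[j] >= 0) dp[j] = counts[a];                        // j reachable without using a
            else if (j >= a && dp[j - a] > 0) dp[j] = dp[j - a] - 1;  // spend one copy of a
        }
    }
    int result = S;
    for (int i = 0; i <= half; ++i)
        if (dp[i] >= 0) result = min(result, S - 2 * i);
    return result;
}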
The Java answer provided by fladam returns wrong result for input [2, 3, 2, 2, 3], although it gets 100% score.
Java Solution
import java.util.Arrays;
public class MinAbsSum{
static int[] dp;
public static void main(String args[]) {
int[] array = {1, 5, 2, -2};
System.out.println(findMinAbsSum(array));
}
public static int findMinAbsSum(int[] A) {
int arrayLength = A.length;
int M = 0;
for (int i = 0; i < arrayLength; i++) {
A[i] = Math.abs(A[i]);
M = Math.max(A[i], M);
}
int S = sum(A);
dp = new int[S + 1];
int[] count = new int[M + 1];
for (int i = 0; i < arrayLength; i++) {
count[A[i]] += 1;
}
Arrays.fill(dp, -1);
dp[0] = 0;
for (int i = 1; i < M + 1; i++) {
if (count[i] > 0) {
for(int j = 0; j < S; j++) {
if (dp[j] >= 0) {
dp[j] = count[i];
} else if (j >= i && dp[j - i] > 0) {
dp[j] = dp[j - i] - 1;
}
}
}
}
int result = S;
for (int i = 0; i < Math.floor(S / 2) + 1; i++) {
if (dp[i] >= 0) {
result = Math.min(result, S - 2 * i);
}
}
return result;
}
public static int sum(int[] array) {
int sum = 0;
for(int i : array) {
sum += i;
}
return sum;
}
}
I invented another solution, better than the previous one. I do not use recursion any more.
This solution works OK (all logical tests passed), and also passed some of the performance tests, but not all. How else can I improve it?
#include <cstdlib>
#include <vector>
#include <set>
using namespace std;
int solution(vector<int> &A) {
if (A.size() == 0) return 0;
set<int> sums, tmpSums;
sums.insert(abs(A[0]));
for (auto it = begin(A) + 1; it != end(A); ++it)
{
for (auto s : sums)
{
tmpSums.insert(abs(s + abs(*it)));
tmpSums.insert(abs(s - abs(*it)));
}
sums = tmpSums;
tmpSums.clear();
}
return *sums.begin();
}
This solution (in Java) scored 100% for both (correctness and performance)
public int solution(int[] a){
if (a.length == 0) return 0;
if (a.length == 1) return a[0];
int sum = 0;
for (int i=0;i<a.length;i++){
sum += Math.abs(a[i]);
}
int[] indices = new int[a.length];
indices[0] = 0;
int half = sum/2;
int localSum = Math.abs(a[0]);
int minLocalSum = Integer.MAX_VALUE;
int placeIndex = 1;
for (int i=1;i<a.length;i++){
if (localSum<half){
if (Math.abs(2*minLocalSum-sum) > Math.abs(2*localSum - sum))
minLocalSum = localSum;
localSum += Math.abs(a[i]);
indices[placeIndex++] = i;
}else{
if (localSum == half)
return Math.abs(2*half - sum);
if (Math.abs(2*minLocalSum-sum) > Math.abs(2*localSum - sum))
minLocalSum = localSum;
if (placeIndex > 1) {
localSum -= Math.abs(a[indices[placeIndex--]]);
i = indices[placeIndex];
}
}
}
return (Math.abs(2*minLocalSum - sum));
}
This solution treats all elements as if they were positive numbers, and it tries to get as close as it can to half of the sum of all elements (in that case we know the sum of all the other elements will be the same delta away from the half too, so the abs sum will be the minimum possible).
It does so by starting with the first element and successively adding others to the "local" sum (and recording the indices of the elements in the sum) until it reaches a sum x >= sumAll/2. If that x is equal to sumAll/2 we have an optimal solution. If not, we go a step back in the indices array and continue picking other elements from where the last iteration at that position ended. The result will be a "local" sum having abs((sumAll - sum) - sum) closest to 0.
fixed solution:
public static int solution(int[] a){
if (a.length == 0) return 0;
if (a.length == 1) return a[0];
int sum = 0;
for (int i=0;i<a.length;i++) {
a[i] = Math.abs(a[i]);
sum += a[i];
}
Arrays.sort(a);
int[] arr = a;
int[] arrRev = new int[arr.length];
int minRes = Integer.MAX_VALUE;
for (int t=0;t<=4;t++) {
arr = fold(arr);
int res1 = findSum(arr, sum);
if (res1 < minRes) minRes = res1;
rev(arr, arrRev);
int res2 = findSum(arrRev, sum);
if (res2 < minRes) minRes = res2;
arrRev = fold(arrRev);
int res3 = findSum(arrRev, sum);
if (res3 < minRes) minRes = res3;
}
return minRes;
}
private static void rev(int[] arr, int[] arrRev){
for (int i = 0; i < arrRev.length; i++) {
arrRev[i] = arr[arr.length - 1 - i];
}
}
private static int[] fold(int[] a){
int[] arr = new int[a.length];
for (int i=0;a.length/2+i/2 < a.length && a.length/2-i/2-1 >= 0;i+=2){
arr[i] = a[a.length/2+i/2];
arr[i+1] = a[a.length/2-i/2-1];
}
if (a.length % 2 > 0) arr[a.length-1] = a[a.length-1];
else{
arr[a.length-2] = a[0];
arr[a.length-1] = a[a.length-1];
}
return arr;
}
private static int findSum(int[] arr, int sum){
int[] indices = new int[arr.length];
indices[0] = 0;
double half = Double.valueOf(sum)/2;
int localSum = Math.abs(arr[0]);
int minLocalSum = Integer.MAX_VALUE;
int placeIndex = 1;
for (int i=1;i<arr.length;i++){
if (localSum == half)
return 2*localSum - sum;
if (Math.abs(2*minLocalSum-sum) > Math.abs(2*localSum - sum))
minLocalSum = localSum;
if (localSum<half){
localSum += Math.abs(arr[i]);
indices[placeIndex++] = i;
}else{
if (placeIndex > 1) {
localSum -= Math.abs(arr[indices[--placeIndex]]);
i = indices[placeIndex];
}
}
}
return Math.abs(2*minLocalSum - sum);
}
The following is a rendering of the official answer in C++ (scoring 100% in task, correctness, and performance):
#include <cmath>
#include <algorithm>
#include <numeric>
using namespace std;
int solution(vector<int> &A) {
// write your code in C++14 (g++ 6.2.0)
const int N = A.size();
int M = 0;
for (int i=0; i<N; i++) {
A[i] = abs(A[i]);
M = max(M, A[i]);
}
int S = accumulate(A.begin(), A.end(), 0);
vector<int> counts(M+1, 0);
for (int i=0; i<N; i++) {
counts[A[i]]++;
}
vector<int> dp(S+1, -1);
dp[0] = 0;
for (int a=1; a<M+1; a++) {
if (counts[a] > 0) {
for (int j=0; j<S; j++) {
if (dp[j] >= 0) {
dp[j] = counts[a];
} else if ((j >= a) && (dp[j-a] > 0)) {
dp[j] = dp[j-a]-1;
}
}
}
}
int result = S;
for (int i =0; i<(S/2+1); i++) {
if (dp[i] >= 0) {
result = min(result, S-2*i);
}
}
return result;
}
You are almost 90% of the way to the actual solution. It seems you understand recursion very well. Now you should apply dynamic programming to your program.
Dynamic programming is nothing but memoization added to the recursion, so that we do not calculate the same subproblems again and again. If the same subproblem is encountered, we return the previously calculated and memoized value. Memoization can be done with the help of a 2D array, say dp[][], where the first state represents the current index of the array and the second state represents the summation.
Specifically for this problem, instead of making calls to both states from each state, you can sometimes greedily decide to skip one call.
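As a rough sketch of that 2D memoization (my own illustration, not the official solution; offsetting the running sum by the total absolute sum is just one way to turn it into a non-negative array index):
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>
using namespace std;
int total;                      // sum of |A[i]|, filled in by solution()
vector<vector<int>> memo;       // memo[i][s + total]; -1 means "not computed yet"
int best(const vector<int>& A, size_t i, int s)
{
    if (i == A.size()) return abs(s);    // nothing left to place: the answer is |s|
    int& m = memo[i][s + total];
    if (m != -1) return m;
    // try A[i] with a plus sign and with a minus sign, keep the smaller absolute sum
    return m = min(best(A, i + 1, s + A[i]), best(A, i + 1, s - A[i]));
}
int solution(vector<int>& A)
{
    total = 0;
    for (size_t i = 0; i < A.size(); ++i) { A[i] = abs(A[i]); total += A[i]; }
    if (A.empty()) return 0;
    memo.assign(A.size(), vector<int>(2 * total + 1, -1));
    return best(A, 0, 0);
}
This has O(n * S) states (S being the total absolute sum), which is the same order of memory as the official dp table.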
I would like to provide the algorithm and then my implementation in C++. Idea is more or less the same as the official codility solution with some constant optimisation added.
Calculate the maximum absolute element of the inputs.
Calculate the absolute sum of the inputs.
Count the number of occurrence of each number in the inputs. Store the results in a vector hash.
Go through each input.
For each input, go through all possible sums of any number of inputs. It is a slight constant optimisation to go only up to half of the possible sums.
For each sum that has been made before, set the occurrence count of the current input.
Check for each potential sum equal to or greater than the current input whether this input has already been used before. Update the values at the current sum accordingly. We do not need to check for potential sums less than the current input in this iteration, since it is evident that it has not been used before.
The above nested loop will fill in each possible sum with a value greater than -1.
Go through this possible sum hash again to look for the closest sum to half that is possible to make. Eventually, the min abs sum will be the difference of this from the half multiplied by two as the difference will be added up in both groups as the difference from the median.
The runtime complexity of this algorithm is O(N * max(abs(A)) ^ 2), or simply O(N * M ^ 2). That is because the outer loop is iterating M times and the inner loop is iterating sum times. The sum is basically N * M in worst case. Therefore, it is O(M * N * M).
The space complexity of this solution is O(N * M) because we allocate a hash of N items for the counts and a hash of S items for the sums. S is N * M again.
int solution(vector<int> &A)
{
int M = 0, S = 0;
for (const int e : A) { M = max(abs(e), M); S += abs(e); }
vector<int> counts(M + 1, 0);
for (const int e : A) { ++counts[abs(e)]; }
vector<int> sums(S + 1, -1);
sums[0] = 0;
for (int ci = 1; ci < counts.size(); ++ci) {
if (!counts[ci]) continue;
for (int si = 0; si < S / 2 + 1; ++si) {
if (sums[si] >= 0) sums[si] = counts[ci];
else if (si >= ci and sums[si - ci] > 0) sums[si] = sums[si - ci] - 1;
}
}
int min_abs_sum = S;
for (int i = S / 2; i >= 0; --i) if (sums[i] >= 0) return S - 2 * i;
return min_abs_sum;
}
Let me add my 50 cents on how to come up with the 100%-score solution.
For me it was hard to understand the ultimate solution proposed earlier in this thread.
So I started with a warm-up solution scoring 63%, because it is O(N x N x M), and because it doesn't use the fact that M is quite a small value and that there are many duplicates in big arrays.
Here the key part is to understand how the array isSumPossible is filled and interpreted.
How to fill the array isSumPossible using the numbers in the input array:
If isSumPossible[sum] >= 0, i.e. sum is already possible even without the current number, then let's set its value to 1 (one copy of the current number is left unused for this sum); it goes into our "reserve", so we can use it later for greater sums.
if (isSumPossible[sum] >= 0) {
isSumPossible[sum] = 1;
}
If isSumPossible[sum] < 0, i.e. sum is considered not yet possible with all the input numbers considered previously, then let's check whether the smaller sum sum - number is already considered possible and we have our current number in "reserve" (isSumPossible[sum - number] == 1); if so, do the following:
else if (sum >= number && isSumPossible[sum - number] == 1) {
isSumPossible[sum] = 0;
}
Here isSumPossible[sum] = 0 means that we have used number in composing sum, so the sum is now considered possible (>= 0), but we have no copy of number left in "reserve", because we've just used it (= 0).
How to interpret the filled array isSumPossible after considering all the numbers in the input array:
if isSumPossible[sum] >= 0 then the sum is possible, i.e. it can be reached by summing some numbers of the given array;
if isSumPossible[sum] < 0 then the sum can't be reached by summing any numbers of the given array.
The simpler thing here is to understand why we search for sums only in the interval [0, maxSum/2]:
because we want to find a possible sum that is as close as possible to maxSum/2.
The ideal case is if we've found a possible sum = maxSum/2;
if so, then it's obvious that we can somehow use the rest of the numbers in the input array to make another maxSum/2, but now with a negative sign, so as a result of the annihilation we'll get solution = 0, because maxSum/2 + (-1)*maxSum/2 = 0.
But 0 is the best-case solution and is not always reachable.
We should nevertheless seek the minimal delta = ((maxSum - sum) - sum),
i.e. we seek delta -> 0; that's why we have this:
int result = Integer.MAX_VALUE;
for (int sum = 0; sum < maxSum / 2 + 1; sum++) {
if (isSumPossible[sum] >= 0) {
result = Math.min(result, (maxSum - sum) - sum);
}
}
warm-up solution
public int solution(int[] A) {
if (A == null || A.length == 0) {
return 0;
}
if (A.length == 1) {
return A[0];
}
int maxSum = 0;
for (int i = 0; i < A.length; i++) {
A[i] = Math.abs(A[i]);
maxSum += A[i];
}
int[] isSumPossible = new int[maxSum + 1];
Arrays.fill(isSumPossible, -1);
isSumPossible[0] = 0;
for (int number : A) {
for (int sum = 0; sum < maxSum / 2 + 1; sum++) {
if (isSumPossible[sum] >= 0) {
isSumPossible[sum] = 1;
} else if (sum >= number && isSumPossible[sum - number] == 1) {
isSumPossible[sum] = 0;
}
}
}
int result = Integer.MAX_VALUE;
for (int sum = 0; sum < maxSum / 2 + 1; sum++) {
if (isSumPossible[sum] >= 0) {
result = Math.min(result, maxSum - 2 * sum);
}
}
return result;
}
And after this we can optimize it, using the fact that there are many duplicate numbers in big arrays, and we come up with the solution with a 100% score. It's O(M x (N x M)), because maxSum = N x M in the worst case.
public int solution(int[] A) {
if (A == null || A.length == 0) {
return 0;
}
if (A.length == 1) {
return A[0];
}
int maxNumber = 0;
int maxSum = 0;
for (int i = 0; i < A.length; i++) {
A[i] = Math.abs(A[i]);
maxNumber = Math.max(maxNumber, A[i]);
maxSum += A[i];
}
int[] count = new int[maxNumber + 1];
for (int i = 0; i < A.length; i++) {
count[A[i]]++;
}
int[] isSumPossible = new int[maxSum + 1];
Arrays.fill(isSumPossible, -1);
isSumPossible[0] = 0;
for (int number = 0; number < maxNumber + 1; number++) {
if (count[number] > 0) {
for (int sum = 0; sum < maxSum / 2 + 1; sum++) {
if (isSumPossible[sum] >= 0) {
isSumPossible[sum] = count[number];
} else if (sum >= number && isSumPossible[sum - number] > 0) {
isSumPossible[sum] = isSumPossible[sum - number] - 1;
}
}
}
}
int result = Integer.MAX_VALUE;
for (int sum = 0; sum < maxSum / 2 + 1; sum++) {
if (isSumPossible[sum] >= 0) {
result = Math.min(result, maxSum - 2 * sum);
}
}
return result;
}
I hope I've made it at least a little clear
Kotlin solution
Time complexity: O(N * max(abs(A))**2)
Score: 100%
import kotlin.math.*
fun solution(A: IntArray): Int {
val N = A.size
var M = 0
for (i in 0 until N) {
A[i] = abs(A[i])
M = max(M, A[i])
}
val S = A.sum()
val counts = MutableList(M + 1) { 0 }
for (i in 0 until N) {
counts[A[i]]++
}
val dp = MutableList(S + 1) { -1 }
dp[0] = 0
for (a in 1 until M + 1) {
if (counts[a] > 0) {
for (j in 0 until S) {
if (dp[j] >= 0) {
dp[j] = counts[a]
} else if (j >= a && dp[j - a] > 0) {
dp[j] = dp[j - a] - 1
}
}
}
}
var result = S
for (i in 0 until (S / 2 + 1)) {
if (dp[i] >= 0) {
result = minOf(result, S - 2 * i)
}
}
return result
}

Convert this recursive function to iterative

How can I convert this recursive function to an iterative function?
#include <cmath>
int M(int H, int T){
if (H == 0) return T;
if (H + 1 >= T) return pow(2, T) - 1;
return M(H - 1, T - 1) + M(H, T - 1) + 1;
}
Well, it's a 3-line function, but it's very hard for me to convert it to an iterative one, because it has 2 variables and I don't know anything about stacks, so I couldn't convert it.
My purpose for doing this is the speed of the function; it is too slow. I wanted to use a map to make it faster, but the function involves M, H and T, so I couldn't work out how to use a map for it.
You could use dynamic programming: start from the bottom up, when H == 0 and T == 0, calculate M and iterate upwards. Here is a link explaining how to do this for Fibonacci numbers, which are quite similar to your problem.
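A minimal bottom-up sketch of what the linked article does for Fibonacci; the same build-upwards pattern then applies to M(H, T) with a two-dimensional table, as the next answer shows:
#include <cstdint>
std::uint64_t fib(int n)
{
    if (n < 2) return n;
    std::uint64_t prev = 0, curr = 1;        // fib(0) and fib(1)
    for (int i = 2; i <= n; ++i) {
        std::uint64_t next = prev + curr;    // fib(i) from the two already-known values
        prev = curr;
        curr = next;
    }
    return curr;
}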
Check this; the recursive and non-recursive versions gave equal results for all inputs I have tried so far. The idea is to keep intermediate results in a matrix, where H is the row index, T is the column index, and the value is M(H,T). By the way, you can calculate it once and later just obtain the result from the matrix, so lookups are O(1).
#include <cmath>
#include <cstdio>
int array[10][10]={{0}};
int MNR(int H, int T)
{
if(array[H][T])
return array[H][T];
for(int i =0; i<= H;++i)
{
for(int j = 0; j<= T;++j)
{
if(i == 0)
array[i][j] = j;
else if( i+1 > j)
array[i][j] = pow(2,j) -1;
else
array[i][j] = array[i-1][j-1] + array[i][j-1] + 1;
}
}
return array[H][T];
}
int M(int H, int T)
{
if (H == 0) return T;
if (H + 1 >= T) return pow(2, T) - 1;
return M(H - 1, T - 1) + M(H, T - 1) + 1;
}
int main()
{
printf("%d\n", M(6,3));
printf("%d\n", MNR(6,3));
}
Unless you know the formula for n-th (in your case, (m,n)-th) element of the sequence, the easiest way is to simulate the recursion using a stack.
The code should look like the following:
#include <cmath>
#include <stack>
struct Data
{
public:
Data(int newH, int newT)
: T(newT), H(newH)
{
}
int H;
int T;
};
int M(int H, int T)
{
std::stack<Data> st;
st.push(Data(H, T));
int sum = 0;
while (st.size() > 0)
{
Data top = st.top();
st.pop();
if (top.H == 0)
sum += top.T;
else if (top.H + 1 >= top.T)
sum += pow(2, top.T) - 1;
else
{
st.push(Data(top.H - 1, top.T - 1));
st.push(Data(top.H, top.T - 1));
sum += 1;
}
}
return sum;
}
The main reason why this function is slow is that it has exponential complexity, and it keeps recalculating the same members again and again. One possible cure is the memoize pattern (handily explained with examples in C++ here). The idea is to store every result in a structure with quick access (e.g. an array) and, every time you need it again, retrieve the already precomputed result. Of course, this approach is limited by the size of your memory, so it won't work for extremely big numbers...
In your case, we could do something like that (keeping the recursion but memoizing the results):
#include <cmath>
#include <map>
#include <utility>
std::map<std::pair<int,int>,int> MM;
int M(int H, int T){
std::pair<int,int> key = std::make_pair(H,T);
std::map<std::pair<int,int>,int>::iterator found = MM.find(key);
if (found!=MM.end()) return found->second; // skip the calculations if we can
int result = 0;
if (H == 0) result = T;
else if (H + 1 >= T) result = pow(2, T) - 1;
else result = M(H - 1, T - 1) + M(H, T - 1) + 1;
MM[key] = result;
return result;
}
Regarding time complexity, C++ maps are tree maps, so a lookup there is of the order of log(N), where N is the size of the map (the number of results that have already been computed). There are also hash maps for C++ (std::unordered_map, part of the standard library since C++11), as was already mentioned on SO. A hash map promises constant search time (the value of the constant is not specified though :) ), so you might also give them a try.
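For illustration, a hedged sketch of that hash-map variant (packing H and T into one 64-bit key is just one convenient way to get a hashable key; it is not from the answer above):
#include <cmath>
#include <cstdint>
#include <unordered_map>
std::unordered_map<std::uint64_t, int> MMh;
int M(int H, int T)
{
    // std::unordered_map has no default hash for std::pair, so pack the two ints
    // (assumed to fit in 32 bits each) into a single 64-bit key instead.
    std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint32_t>(H)) << 32)
                      | static_cast<std::uint32_t>(T);
    std::unordered_map<std::uint64_t, int>::iterator found = MMh.find(key);
    if (found != MMh.end()) return found->second;   // skip the calculations if we can
    int result = 0;
    if (H == 0) result = T;
    else if (H + 1 >= T) result = static_cast<int>(std::pow(2, T)) - 1;
    else result = M(H - 1, T - 1) + M(H, T - 1) + 1;
    MMh[key] = result;
    return result;
}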
You may calculate it using a one-dimensional array. A little theory:
Let F(a,b) == M(H,T)
1. F(0,b) = b
2. F(a,b) = 2^b - 1, when a+1 >= b
3. F(a,b) = F(a-1,b-1) + F(a,b-1) + 1
Let G(x,y) = F(y,x) ,then
1. G(x,0) = x // RULE (1)
2. G(x,y) = 2^x - 1, when y+1 >= x // RULE (2)
3. G(x,y) = G(x-1,y-1) + G(x-1,y) + 1 // RULE(3) --> this is useful,
// because for G(x,y) need only G(x-1,?), i.e if G - is two deminsions array, then
// for calculating G[x][?] need only previous row G[x-1][?],
// so we need only last two rows of array.
// Here some values of G(x,y)
4. G(0,y) = 2^0 - 1 = 0 from (2) rule.
5. G(1,0) = 1 from (1) rule.
6. G(1,y) = 2^1 - 1 = 1, when y > 0, from (2) rule.
G(0,0) = 0, G(0,1) = 0, G(0,2) = 0, G(0,3) = 0 ...
G(1,0) = 1, G(1,1) = 1, G(1,2) = 1, G(1,3) = 1 ...
7. G(2,0) = 2 from (1) rule
8. G(2,1) = 2^2 - 1 = 3 from (2) rule
9. G(2,y) = 2^2 - 1 = 3 when y > 0, from (2) rule.
G(2,0) = 2, G(2,1) = 3, G(2,2) = 3, G(2,3) = 3, ....
10. G(3,0) = 3 from (1) rule
11. G(3,1) = G(2,0) + G(2,1) + 1 = 2 + 3 + 1 = 6 from (3) rule
12. G(3,2) = 2^3 - 1 = 7, from (2) rule
Now, how to calculate this G(x,y)
int G(int x, int y); // forward declaration, definition below
int M(int H, int T ) { return G(T,H); }
int G(int x, int y)
{
const int MAX_Y = 100; // or something else
int arr[2][MAX_Y] = {0} ;
int icurr = 0, inext = 1;
for(int xi = 1; xi <= x; ++xi) // build rows 1..x: row xi is derived from row xi-1
{
for( int yi = 0; yi <= y ;++yi)
{
if ( yi == 0 )
arr[inext][yi] = xi; // rule (1);
else if ( yi + 1 >= xi )
arr[inext][yi] = (1 << xi) - 1; // rule ( 2 )
else arr[inext][yi] =
arr[icurr][yi-1] + arr[icurr][yi] + 1; // rule (3)
}
icurr ^= 1; inext ^= 1; //swap(i1,i2);
}
return arr[icurr][y];
}
// Or some optimizing
int G(int x, int y)
{
const int MAX_Y = 100;
int arr[2][MAX_Y] = {0};
int icurr = 0, inext = 1;
for(int ix = 1; ix <= x; ++ix) // build rows 1..x, as in the version above
{
arr[inext][0] = ix; // rule (1)
for(int iy = 1; iy < ix - 1; ++ iy)
arr[inext][iy] = arr[icurr][iy-1] + arr[icurr][iy] + 1; // rule (3)
for(int iy = max(0,ix-1); iy <= y; ++iy)
arr[inext][iy] = (1 << ix ) - 1; // rule(2)
icurr ^= 1 ; inext ^= 1;
}
return arr[icurr][y];
}

What is the fastest search method for a sorted array?

Answering another question, I wrote the program below to compare different search methods on a sorted array. Basically I compared two implementations of interpolation search and one of binary search. I compared performance by counting the cycles spent (on the same set of data) by the different variants.
However, I'm sure there are ways to optimize these functions to make them even faster. Does anyone have any ideas on how I can make this search function faster? A solution in C or C++ is acceptable, but I need it to process an array with 100000 elements.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <assert.h>
static __inline__ unsigned long long rdtsc(void)
{
unsigned long long int x;
__asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
return x;
}
int interpolationSearch(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int64_t low = 0;
int64_t high = len - 1;
int64_t mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + (int64_t)((int64_t)(high - low)*(int64_t)(toFind - l))/((int64_t)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int interpolationSearch2(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int binarySearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int order(const void *p1, const void *p2) { return *(int*)p1-*(int*)p2; }
int main(void) {
int i = 0, j = 0, size = 100000, trials = 10000;
int searched[trials];
srand(-time(0));
for (j=0; j<trials; j++) { searched[j] = rand()%size; }
while (size > 10){
int arr[size];
for (i=0; i<size; i++) { arr[i] = rand()%size; }
qsort(arr,size,sizeof(int),order);
unsigned long long totalcycles_bs = 0;
unsigned long long totalcycles_is_64 = 0;
unsigned long long totalcycles_is_float = 0;
unsigned long long totalcycles_new = 0;
int res_bs, res_is_64, res_is_float, res_new;
for (j=0; j<trials; j++) {
unsigned long long tmp, cycles = rdtsc();
res_bs = binarySearch(arr,searched[j],size);
tmp = rdtsc(); totalcycles_bs += tmp - cycles; cycles = tmp;
res_is_64 = interpolationSearch(arr,searched[j],size);
assert(res_is_64 == res_bs || arr[res_is_64] == searched[j]);
tmp = rdtsc(); totalcycles_is_64 += tmp - cycles; cycles = tmp;
res_is_float = interpolationSearch2(arr,searched[j],size);
assert(res_is_float == res_bs || arr[res_is_float] == searched[j]);
tmp = rdtsc(); totalcycles_is_float += tmp - cycles; cycles = tmp;
}
printf("----------------- size = %10d\n", size);
printf("binary search = %10llu\n", totalcycles_bs);
printf("interpolation uint64_t = %10llu\n", totalcycles_is_64);
printf("interpolation float = %10llu\n", totalcycles_is_float);
printf("new = %10llu\n", totalcycles_new);
printf("\n");
size >>= 1;
}
}
If you have some control over the in-memory layout of the data, you might want to look at Judy arrays.
Or to put a simpler idea out there: a binary search always cuts the search space in half. An optimal cut point can be found with interpolation (the cut point should NOT be the place where the key is expected to be, but the point which minimizes the statistical expectation of the search space for the next step). This minimizes the number of steps but... not all steps have equal cost. Hierarchical memories allow executing a number of tests in the same time as a single test, if locality can be maintained. Since a binary search's first M steps only touch a maximum of 2**M unique elements, storing these together can yield a much better reduction of search space per-cacheline fetch (not per comparison), which is higher performance in the real world.
n-ary trees work on that basis, and then Judy arrays add a few less important optimizations.
Bottom line: even "Random Access Memory" (RAM) is faster when accessed sequentially than randomly. A search algorithm should use that fact to its advantage.
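As one concrete illustration of keeping the first levels of the search together (my own sketch, not something from this answer): an Eytzinger (BFS-order) layout stores the nodes visited by the first few steps of a binary search contiguously, so they tend to share cache lines:
#include <cstddef>
#include <vector>
struct EytzingerSearch {
    std::vector<int> b;                       // 1-based BFS layout of the sorted input
    explicit EytzingerSearch(const std::vector<int>& sorted) : b(sorted.size() + 1) {
        std::size_t i = 0;
        build(sorted, i, 1);
    }
    void build(const std::vector<int>& sorted, std::size_t& i, std::size_t k) {
        if (k >= b.size()) return;
        build(sorted, i, 2 * k);              // left subtree receives the smaller values
        b[k] = sorted[i++];
        build(sorted, i, 2 * k + 1);
    }
    bool contains(int x) const {
        std::size_t k = 1, candidate = 0;
        while (k < b.size()) {
            if (b[k] >= x) candidate = k;     // remember the smallest value >= x on the path
            k = 2 * k + (b[k] < x);           // descend; the children of k sit at 2k and 2k+1
        }
        return candidate != 0 && b[candidate] == x;
    }
};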
Benchmarked on Win32 Core2 Quad Q6600, gcc v4.3 msys. Compiling with g++ -O3, nothing fancy.
Observation - the asserts, timing and loop overhead is about 40%, so any gains listed below should be divided by 0.6 to get the actual improvement in the algorithms under test.
Simple answers:
On my machine replacing the int64_t with int for "low", "high" and "mid" in interpolationSearch gives a 20% to 40% speed up. This is the fastest easy method I could find. It is taking about 150 cycles per look-up on my machine (for the array size of 100000). That's roughly the same number of cycles as a cache miss. So in real applications, looking after your cache is probably going to be the biggest factor.
Replacing binarySearch's "/2" with a ">>1" gives a 4% speed up.
Using STL's binary_search algorithm, on a vector containing the same data as "arr", is about the same speed as the hand coded binarySearch. Although on the smaller "size"s STL is much slower - around 40%.
I have an excessively complicated solution, which requires a specialized sorting function. The sort is slightly slower than a good quicksort, but all of my tests show that the search function is much faster than a binary or interpolation search. I called it a regression sort before I found out that the name was already taken, but didn't bother to think of a new name (ideas?).
There are three files to demonstrate.
The regression sort/search code:
#include <sstream>
#include <math.h>
#include <ctime>
#include "limits.h"
void insertionSort(int array[], int length) {
int key, j;
for(int i = 1; i < length; i++) {
key = array[i];
j = i - 1;
while (j >= 0 && array[j] > key) {
array[j + 1] = array[j];
--j;
}
array[j + 1] = key;
}
}
class RegressionTable {
public:
RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs);
RegressionTable(int arr[], int s);
void sort(void);
int find(int key);
void printTable(void);
void showSize(void);
private:
void createTable(void);
inline unsigned int resolve(int n);
int * array;
int * table;
int * tableSize;
int size;
int lowerBound;
int upperBound;
int divisions;
int divisionSize;
int newSize;
double multiplier;
};
RegressionTable::RegressionTable(int arr[], int s) {
array = arr;
size = s;
multiplier = 1.35;
divisions = sqrt(size);
upperBound = INT_MIN;
lowerBound = INT_MAX;
for (int i = 0; i < size; ++i) {
if (array[i] > upperBound)
upperBound = array[i];
if (array[i] < lowerBound)
lowerBound = array[i];
}
createTable();
}
RegressionTable::RegressionTable(int arr[], int s, int lower, int upper, double mult, int divs) {
array = arr;
size = s;
lowerBound = lower;
upperBound = upper;
multiplier = mult;
divisions = divs;
createTable();
}
void RegressionTable::showSize(void) {
int bytes = sizeof(*this);
bytes = bytes + sizeof(int) * 2 * (divisions + 1);
}
void RegressionTable::createTable(void) {
divisionSize = size / divisions;
newSize = multiplier * double(size);
table = new int[divisions + 1];
tableSize = new int[divisions + 1];
for (int i = 0; i < divisions; ++i) {
table[i] = 0;
tableSize[i] = 0;
}
for (int i = 0; i < size; ++i) {
++table[((array[i] - lowerBound) / divisionSize) + 1];
}
for (int i = 1; i <= divisions; ++i) {
table[i] += table[i - 1];
}
table[0] = 0;
for (int i = 0; i < divisions; ++i) {
tableSize[i] = table[i + 1] - table[i];
}
}
int RegressionTable::find(int key) {
double temp = multiplier;
multiplier = 1;
int minIndex = table[(key - lowerBound) / divisionSize];
int maxIndex = minIndex + tableSize[(key - lowerBound) / divisionSize];
int guess = resolve(key);
double t;
while (array[guess] != key) {
// uncomment this line if you want to see where it is searching.
//cout << "Regression Guessing " << guess << ", not there." << endl;
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
}
multiplier = temp;
return guess;
}
inline unsigned int RegressionTable::resolve(int n) {
float temp;
int subDomain = (n - lowerBound) / divisionSize;
temp = (n - lowerBound) % divisionSize;
temp /= divisionSize;
temp *= tableSize[subDomain];
temp += table[subDomain];
temp *= multiplier;
return (unsigned int)temp;
}
void RegressionTable::sort(void) {
int * out = new int[int(size * multiplier)];
bool * used = new bool[int(size * multiplier)](); // value-initialize to false; entries are read before being set below
int higher, lower;
bool placed;
for (int i = 0; i < size; ++i) {
/* Figure out where to put the darn thing */
higher = resolve(array[i]);
lower = higher - 1;
if (higher > newSize) {
higher = size;
lower = size - 1;
} else if (lower < 0) {
higher = 0;
lower = 0;
}
placed = false;
while (!placed) {
if (higher < size && !used[higher]) {
out[higher] = array[i];
used[higher] = true;
placed = true;
} else if (lower >= 0 && !used[lower]) {
out[lower] = array[i];
used[lower] = true;
placed = true;
}
--lower;
++higher;
}
}
int index = 0;
for (int i = 0; i < size * multiplier; ++i) {
if (used[i]) {
array[index] = out[i];
++index;
}
}
insertionSort(array, size);
}
And then there is the regular search functions:
#include <iostream>
using namespace std;
int binarySearch(int array[], int start, int end, int key) {
// Determine the search point.
int searchPos = (start + end) / 2;
// If we crossed over our bounds, then it is not here.
if (start > end)
return -1;
// Search the bottom half of the array if the query is smaller.
if (array[searchPos] > key)
return binarySearch (array, start, searchPos - 1, key);
// Search the top half of the array if the query is larger.
if (array[searchPos] < key)
return binarySearch (array, searchPos + 1, end, key);
// If we found it then we are done.
if (array[searchPos] == key)
return searchPos;
}
int binarySearch(int array[], int size, int key) {
return binarySearch(array, 0, size - 1, key);
}
int interpolationSearch(int array[], int size, int key) {
int guess = 0;
double t;
int minIndex = 0;
int maxIndex = size - 1;
while (array[guess] != key) {
t = ((double)key - array[minIndex]) / ((double)array[maxIndex] - array[minIndex]);
guess = minIndex + t * (maxIndex - minIndex);
if (array[guess] < key) {
minIndex = guess + 1;
}
if (array[guess] > key) {
maxIndex = guess - 1;
}
if (array[minIndex] > key || array[maxIndex] < key) {
return -1;
}
}
return guess;
}
And then I wrote a simple main to test out the different sorts.
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include "regression.h"
#include "search.h"
using namespace std;
void randomizeArray(int array[], int size) {
for (int i = 0; i < size; ++i) {
array[i] = rand() % size;
}
}
int main(int argc, char * argv[]) {
int size = 100000;
string arg;
if (argc > 1) {
arg = argv[1];
size = atoi(arg.c_str());
}
srand(time(NULL));
int * array;
cout << "Creating Array Of Size " << size << "...\n";
array = new int[size];
randomizeArray(array, size);
cout << "Sorting Array...\n";
RegressionTable t(array, size, 0, size*2.5, 1.5, size);
//RegressionTable t(array, size);
t.sort();
int trials = 10000000;
int start;
cout << "Binary Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
binarySearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Interpolation Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
interpolationSearch(array, size, i % size);
}
cout << clock() - start << endl;
cout << "Regression Search...\n";
start = clock();
for (int i = 0; i < trials; ++i) {
t.find(i % size);
}
cout << clock() - start << endl;
return 0;
}
Give it a try and tell me if it's faster for you. It's super complicated, so it's really easy to break it if you don't know what you are doing. Be careful about modifying it.
I compiled the main with g++ on ubuntu.
Unless your data is known to have special properties, pure interpolation search has the risk of taking linear time. If you expect interpolation to help with most data but don't want it to hurt in the case of pathological data, I would use a (possibly weighted) average of the interpolated guess and the midpoint, ensuring a logarithmic bound on the run time.
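A rough sketch of that guarded interpolation (the 50/50 blend of the midpoint and the interpolated guess is an arbitrary weighting; any fixed mix keeps every step removing a constant fraction of the range):
// Returns the index of toFind in the sorted array a, or -1 if it is not present.
int guardedSearch(const int a[], int toFind, int len)
{
    int low = 0, high = len - 1;
    while (low <= high && toFind >= a[low] && toFind <= a[high]) {
        int mid = low + (high - low) / 2;
        int interp = (a[high] > a[low])
            ? low + (int)((long long)(high - low) * (toFind - a[low]) / (a[high] - a[low]))
            : mid;
        int probe = (mid + interp) / 2;      // blend of the two guesses
        if (a[probe] < toFind) low = probe + 1;
        else if (a[probe] > toFind) high = probe - 1;
        else return probe;
    }
    return -1; // not found
}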
One way of approaching this is to use a space versus time trade-off. There are any number of ways that could be done. The extreme way would be to simply make an array with the max size being the max value of the sorted array. Initialize each position with the index into sortedArray. Then the search would simply be O(1).
The following version, however, might be a little more realistic and possibly be useful in the real world. It uses a "helper" structure that is initialized on the first call. It maps the search space down to a smaller space by dividing by a number that I pulled out of the air without much testing. It stores the index of the lower bound for a group of values in sortedArray into the helper map. The actual search divides the toFind number by the chosen divisor and extracts the narrowed bounds of sortedArray for a normal binary search.
For example, if the sorted values range from 1 to 1000 and the divisor is 100, then the lookup array might contain 10 "sections". To search for value 250, it would divide it by 100 to yield integer index position 250/100=2. map[2] would contain the sortedArray index for values 200 and larger. map[3] would have the index position of values 300 and larger thus providing a smaller bounding position for a normal binary search. The rest of the function is then an exact copy of your binary search function.
The initialization of the helper map might be more efficient by using a binary search to fill in the positions rather than a simple scan, but it is a one time cost so I didn't bother testing that. This mechanism works well for the given test numbers which are evenly distributed. As written, it would not be as good if the distribution was not even. I think this method could be used with floating point search values too. However, extrapolating it to generic search keys might be harder. For example, I am unsure what the method would be for character data keys. It would need some kind of O(1) lookup/hash that mapped to a specific array position to find the index bounds. It's unclear to me at the moment what that function would be or if it exists.
I kludged the setup of the helper map in the following implementation pretty quickly. It is not pretty and I'm not 100% sure it is correct in all cases but it does show the idea. I ran it with a debug test to compare the results against your existing binarySearch function to be somewhat sure it works correctly.
The following are example numbers:
100000 * 10000 : cycles binary search = 10197811
100000 * 10000 : cycles interpolation uint64_t = 9007939
100000 * 10000 : cycles interpolation float = 8386879
100000 * 10000 : cycles binary w/helper = 6462534
Here is the quick-and-dirty implementation:
#define REDUCTION 100 // pulled out of the air
typedef struct {
int init; // have we initialized it?
int numSections;
int *map;
int divisor;
} binhelp;
int binarySearchHelp( binhelp *phelp, int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low;
int high;
int mid;
if ( !phelp->init && len > REDUCTION ) {
int i;
int numSections = len / REDUCTION;
int divisor = (( sortedArray[len-1] - 1 ) / numSections ) + 1;
int threshold;
int arrayPos;
phelp->init = 1;
phelp->divisor = divisor;
phelp->numSections = numSections;
phelp->map = (int*)malloc((numSections+2) * sizeof(int));
phelp->map[0] = 0;
phelp->map[numSections+1] = len-1;
arrayPos = 0;
// Scan through the array and set up the mapping positions. Simple linear
// scan but it is a one-time cost.
for ( i = 1; i <= numSections; i++ ) {
threshold = i * divisor;
while ( arrayPos < len && sortedArray[arrayPos] < threshold )
arrayPos++;
if ( arrayPos < len )
phelp->map[i] = arrayPos;
else
// kludge to take care of aliasing
phelp->map[i] = len - 1;
}
}
if ( phelp->init ) {
int section = toFind / phelp->divisor;
if ( section > phelp->numSections )
// it is bigger than all values
return -1;
low = phelp->map[section];
if ( section == phelp->numSections )
high = len - 1;
else
high = phelp->map[section+1];
} else {
// use normal start points
low = 0;
high = len - 1;
}
// the following is a direct copy of the Kriss' binarySearch
int l = sortedArray[low];
int h = sortedArray[high];
while (l <= toFind && h >= toFind) {
mid = (low + high)/2;
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
The helper structure needs to be initialized (and memory freed):
help.init = 0;
unsigned long long totalcycles4 = 0;
... make the calls same as for the other ones but pass the structure ...
binarySearchHelp(&help, arr,searched[j],length);
if ( help.init )
free( help.map );
help.init = 0;
Look first at the data and whether a big gain can be got by data specific method over a general method.
For large static sorted datasets, you can create an additional index to provide partial pigeonholing, based on the amount of memory you're willing to use. E.g. say we create a 256x256 two-dimensional array of ranges, which we populate with the start and end positions in the search array of elements with the corresponding high-order bytes. When we come to search, we then use the high-order bytes of the key to find the range / subset of the array we need to search. If we had ~20 comparisons on our binary search of 100,000 elements, O(log2(n)), we're now down to ~4 comparisons for 16 elements, or O(log2(n/15)). The memory cost here is about 512k.
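A one-level sketch of that idea (256 buckets taken from the top bits of the key, assuming non-negative int keys; all the names here are illustrative, not from this answer):
#include <algorithm>
#include <vector>
struct PigeonholeIndex {
    std::vector<int> start;   // start[b] = first position in the sorted array belonging to bucket b or later
    unsigned shift;
    explicit PigeonholeIndex(const std::vector<int>& sorted) : start(257), shift(0) {
        unsigned maxKey = sorted.empty() ? 0u : static_cast<unsigned>(sorted.back());
        while ((maxKey >> shift) > 255u) ++shift;             // map every key into a bucket 0..255
        for (unsigned b = 0; b < 256; ++b) {
            int threshold = static_cast<int>(b << shift);     // smallest key that falls in bucket b
            start[b] = static_cast<int>(
                std::lower_bound(sorted.begin(), sorted.end(), threshold) - sorted.begin());
        }
        start[256] = static_cast<int>(sorted.size());
    }
    // Binary search restricted to the one bucket that could contain the key.
    int find(const std::vector<int>& sorted, int key) const {
        unsigned b = static_cast<unsigned>(key) >> shift;
        if (b > 255u) return -1;                              // larger than anything indexed
        std::vector<int>::const_iterator first = sorted.begin() + start[b];
        std::vector<int>::const_iterator last  = sorted.begin() + start[b + 1];
        std::vector<int>::const_iterator it = std::lower_bound(first, last, key);
        return (it != last && *it == key) ? static_cast<int>(it - sorted.begin()) : -1;
    }
};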
Another method, again suited to data that doesn't change much, is to divide the data into arrays of commonly sought items and rarely sought items. For example, if you leave your existing search in place running a wide number of real world cases over a protracted testing period, and log the details of the item being sought, you may well find that the distribution is very uneven, i.e. some values are sought far more regularly than others. If this is the case, break your array into a much smaller array of commonly sought values and a larger remaining array, and search the smaller array first. If the data is right (big if!), you can often achieve broadly similar improvements to the first solution without the memory cost.
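In code that split can be as simple as probing the small array first (a sketch; which values count as "hot" would come from the kind of logging described above):
#include <algorithm>
#include <vector>
bool containsHotCold(const std::vector<int>& hotSorted,
                     const std::vector<int>& coldSorted, int key)
{
    if (std::binary_search(hotSorted.begin(), hotSorted.end(), key))
        return true;   // hit in the small, cache-friendly array of popular keys
    return std::binary_search(coldSorted.begin(), coldSorted.end(), key);
}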
There are many other data specific optimizations which score far better than trying to improve on tried, tested and far more widely used general solutions.
Posting my current version before the question is closed (hopefully I will thus be able to enhance it later). For now it is worse than every other version (if someone understands why my changes to the end of the loop have this effect, comments are welcome).
int newSearch(int sortedArray[], int toFind, int len)
{
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
int l = sortedArray[low];
int h = sortedArray[high];
while (l < toFind && h > toFind) {
mid = low + ((float)(high - low)*(float)(toFind - l))/(1+(float)(h-l));
int m = sortedArray[mid];
if (m < toFind) {
l = sortedArray[low = mid + 1];
} else if (m > toFind) {
h = sortedArray[high = mid - 1];
} else {
return mid;
}
}
if (l == toFind)
return low;
else if (h == toFind)
return high;
else
return -1; // Not found
}
The implementation of the binary search that was used for comparisons can be improved. The key idea is to "normalize" the range initially so that the target is always > a minimum and < than a maximum after the first step. This increases the termination delta size. It also has the effect of special casing targets that are less than the first element of the sorted array or greater than the last element of the sorted array. Expect approximately a 15% improvement in search time. Here is what the code might look like in C++.
int binarySearch(int * &array, int target, int min, int max)
{ // binarySearch
// normalize min and max so that we know the target is > min and < max
if (target <= array[min]) // if min not normalized
{ // target <= array[min]
if (target == array[min]) return min;
return -1;
} // end target <= array[min]
// min is now normalized
if (target >= array[max]) // if max not normalized
{ // target >= array[max]
if (target == array[max]) return max;
return -1;
} // end target >= array[max]
// max is now normalized
while (min + 1 < max)
{ // delta >=2
int tempi = min + ((max - min) >> 1); // point to index approximately in the middle between min and max
int atempi = array[tempi]; // just in case the compiler does not optimize this
if (atempi > target)max = tempi; // if the target is smaller, we can decrease max and it is still normalized
else if (atempi < target)min = tempi; // the target is bigger, so we can increase min and it is still normalized
else return tempi; // if we found the target, return with the index
// Note that it is important that this test for equality is last because it rarely occurs.
} // end delta >=2
return -1; // nothing in between normalized min and max
} // end binarySearch