Create ranking for vector of double - c++

I have a vector of doubles that I want to rank (actually it is a vector of objects with a double member called costs). If all values are unique, or if I ignore ties, there is no problem. However, I want tied values to receive their average rank. I have found some questions on SO about ranks, but they ignore non-unique values.
For example, for (1, 5, 4, 5, 5) the corresponding ranks should be (1, 4, 2, 4, 4). When ties are ignored, the ranks are (1, 3, 2, 4, 5).
When ignoring the nonunique values I used the following:
void Population::create_ranks_costs(vector<Solution> &pop)
{
size_t const n = pop.size();
// Create an index vector
vector<size_t> index(n);
iota(begin(index), end(index), 0);
sort(begin(index), end(index),
[&pop] (size_t idx, size_t idy) {
return pop[idx].costs() < pop[idy].costs();
});
// Store the result in the corresponding solutions
for (size_t idx = 0; idx < n; ++idx)
pop[index[idx]].set_rank_costs(idx + 1);
}
Does anyone know how to take the non-unique values into account? I prefer using the standard <algorithm> header, since in my opinion this leads to clean code.

Here is a routine for vectors as the title of the question suggests:
template<typename Vector>
std::vector<double> rank(const Vector& v)
{
std::vector<std::size_t> w(v.size());
std::iota(begin(w), end(w), 0);
std::sort(begin(w), end(w),
[&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });
std::vector<double> r(w.size());
for (std::size_t n, i = 0; i < w.size(); i += n)
{
n = 1;
while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
for (std::size_t k = 0; k < n; ++k)
{
r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
// r[w[i+k]] = i + 1; // min
// r[w[i+k]] = i + n; // max
// r[w[i+k]] = i + k + 1; // random order
}
}
return r;
}
See IDEone for a working example.
For ranks with tied (equal) values there are varying conventions (min, max, averaged rank, or random order). Choose one of these in the innermost for loop (averaged rank is common in statistics, min rank in sports).
Please keep in mind that averaged ranks can be non-integral (n+0.5).
I don't know whether rounding down to the integral rank n is a problem for your application.
The algorithm could easily be generalized to user-defined orderings like pop[i].costs(), with std::less<> as the default.
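A minimal sketch of that generalization, assuming the ordering is supplied as a key function (the name rank_by and the Key parameter are my own, not part of the routine above). Two elements count as tied when neither key is less than the other:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Average ranks of the elements of v, ordered by key(element).
// `key` is a hypothetical projection, e.g. [](const Solution& s) { return s.costs(); }.
template <typename Vector, typename Key>
std::vector<double> rank_by(const Vector& v, Key key)
{
    // w holds the indices of v, sorted by key
    std::vector<std::size_t> w(v.size());
    std::iota(w.begin(), w.end(), 0);
    std::sort(w.begin(), w.end(), [&](std::size_t i, std::size_t j) {
        return key(v[i]) < key(v[j]);
    });

    std::vector<double> r(w.size());
    for (std::size_t n, i = 0; i < w.size(); i += n) {
        // n = length of the run of tied keys starting at sorted position i;
        // in sorted order "not less" means "equal"
        n = 1;
        while (i + n < w.size() && !(key(v[w[i]]) < key(v[w[i + n]]))) ++n;
        for (std::size_t k = 0; k < n; ++k)
            r[w[i + k]] = i + (n + 1) / 2.0;  // average of ranks i+1 .. i+n
    }
    return r;
}
```

For the vector of solutions from the question, the call would look something like rank_by(pop, [](const Solution& s) { return s.costs(); }), with the returned ranks written back through set_rank_costs.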

One way to do so would be using a multimap.
Place the items in a multimap that maps your objects to size_t values (the initial values are unimportant). You can do this in one line (use the constructor that takes iterators).
Loop (either plainly or using whatever from algorithm) and assign 0, 1, ... as the values.
Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from algorithm for this).
The overall complexity should be Theta(n log(n)), where n is the length of the vector.

Something along these lines:
size_t run_start = 0;
double run_cost = pop[index[0]].costs();
for (size_t idx = 1; idx <= n; ++idx) {
double new_cost = idx < n ? pop[index[idx]].costs() : 0;
if (idx == n || new_cost != run_cost) {
double avg_rank = (run_start + 1 + idx) / 2.0;
for (size_t j = run_start; j < idx; ++j) {
pop[index[j]].set_rank_costs(avg_rank);
}
run_start = idx;
run_cost = new_cost;
}
}
Basically, you iterate over the sorted sequence and identify runs of equal values (possibly runs of length 1). For each such run, you calculate its average rank, and set it for all elements in the run.

Related

Find pairs in an array such that a+b%10 = k

There is a ordered list like
A=[7, 9, 10, 11, 12, 13, 20]
and I have to find pairs (a, b) with (a+b)%10=k, where 0<=k<=9
For example, k = 0
Pairs: (7, 13), (9, 11), (10, 20)
How can I find the number of pairs in O(n) time?
I tried converting the whole list by taking each element mod 10:
for (auto i : A) {
if (i <= k) {
B.push_back(i);
}
else {
B.push_back(i % 10);
}
}
After that I tried to define the sums that give k via an unordered_map:
unordered_map<int, int> sumList;
int j = k;
for (int i = 0; i < 10; i++) {
sumList[i] = j;
if (j==0) j=9;
j--;
}
But I can't figure out how to count the number of pairs in O(n). What can I do now?
Let’s begin with a simple example. Assume that k = 0. That means that we want to find the number of pairs that sum up to a multiple of 10. What would those pairs look like? Well, they could be formed by
adding up a number whose last digit is 1 with a number whose last digit is 9,
adding up a number whose last digit is 2 with a number whose last digit is 8,
adding up a number whose last digit is 3 with a number whose last digit is 7,
adding up a number whose last digit is 4 with a number whose last digit is 6, or
adding up two numbers whose last digit is 5, or
adding up two numbers whose last digit is 0.
So suppose you have a frequency table A where A[i] is the number of numbers with last digit i. Then the number of pairs of numbers whose last digits are i and j, respectively, is given by
A[i] * A[j] if i ≠ j, and
A[i] * (A[i] - 1) / 2 if i = j.
Based on this, if you wanted to count the number of pairs summing to k mod 10, you could
fill in the A array, then
iterate over all possible pairs that sum to k, using the above formula to count up the number of pairs without explicitly listing all of them.
That last step takes time O(1), since there are only ten buckets and iterating over the pairs you need therefore requires at most a constant amount of work.
I’ll leave the rest of the details to you.
Hope this helps!
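For reference, the counting step could be sketched like this. The function name and the use of std::array for the frequency table are my own choices, and negative inputs are folded into the range 0..9 as an assumption about what "last digit" should mean for them:

```cpp
#include <array>
#include <cassert>
#include <vector>

// Number of unordered pairs (a, b) taken from nums with (a + b) % 10 == k.
long long count_pairs_mod10(const std::vector<int>& nums, int k)
{
    assert(0 <= k && k <= 9);

    // freq[d] = how many numbers have last digit d
    std::array<long long, 10> freq{};
    for (int x : nums)
        ++freq[((x % 10) + 10) % 10];  // keeps the bucket in 0..9 for negative x

    long long pairs = 0;
    for (int i = 0; i < 10; ++i) {
        int j = (k - i + 10) % 10;     // the digit that complements i
        if (i < j)
            pairs += freq[i] * freq[j];            // two different buckets
        else if (i == j)
            pairs += freq[i] * (freq[i] - 1) / 2;  // both numbers from one bucket
        // i > j was already counted when the loop visited j
    }
    return pairs;
}
```

Filling the table is O(n) and the second loop touches ten buckets, so the whole thing is O(n). For A = [7, 9, 10, 11, 12, 13, 20] and k = 0 it finds the three pairs listed in the question.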
You can modify counting sort for this.
Below is an untested, unoptimized and only illustrative version:
int mods[10];
void count_mods(int nums[], int n) {
for (int i = 0; i < n; i++)
mods[nums[i]%10]++;
}
int count_pairs(int k) {
// TODO: there's definitely a better way to do this, but it's O(1) anyway..
int count = 0;
for (int i = 0; i < 10; i++)
for (int j = i; j < 10; j++) // ten buckets; start at i so a bucket can pair with itself
if ((i + j) % 10 == k) {
int pairs = mods[i] > mods[j] ? mods[j] : mods[i];
if (i == j)
pairs /= 2;
count += pairs;
}
return count;
}
EDIT:
With a smaller constant.
int mods[10];
void count_mods(int nums[], int n) {
for (int i = 0; i < n; i++)
mods[nums[i]%10]++;
}
int count_pairs(int k) {
int count = 0;
for (int i = 0; i < 10; i++) {
int j = k - i;
if (j < 0)
j += 10;
count += min(mods[i], mods[j]);
// When 2*i = k (mod 10), bucket i pairs with itself and is visited only once.
// Drop the odd leftover element so that the final halving yields mods[i] / 2 pairs.
if (i == j)
count -= mods[i] % 2;
}
// We counted everything twice.
return count / 2;
}

Make laplace-expansion more efficient

I created a little program that is able to calculate the determinant of a matrix in C++. I used laplace-expansion, although I know that there are more efficient ways to do it:
double getDeterminantLaplace(const std::vector<std::vector<double>> vect) {
int dimension = vect.size();
if(dimension == 0) {
return 1;
}
if(dimension == 1) {
return vect[0][0];
}
//Formula for 2x2-matrix
if(dimension == 2) {
return vect[0][0] * vect[1][1] - vect[0][1] * vect[1][0];
}
double result = 0;
int sign = 1;
for(int i = 0; i < dimension; i++) {
//Submatrix
std::vector<std::vector<double>> subVect(dimension - 1, std::vector<double> (dimension - 1));
for(int m = 1; m < dimension; m++) {
int z = 0;
for(int n = 0; n < dimension; n++) {
if(n != i) {
subVect[m-1][z] = vect[m][n];
z++;
}
}
}
//recursive call
result = result + sign * vect[0][i] * getDeterminantLaplace(subVect);
sign = -sign;
}
return result;
}
My question now is: How can this algorithm be made more efficient?
One of my ideas is to not create the "submatrices" and just work with the original matrix, but I don't really know how to do it. What do you think about this idea? How can I do this in C++?
Do you have any more ideas?
A first, trivial optimization is not to recurse when the current element is zero. This will give you an instant speed-up on sparse matrices.
The next optimization is what you already suggested: do not create all the submatrices. You can do that by creating an index vector. For example, if your original matrix has 4×4 elements, you recurse with the following index vectors:
0: {1, 2, 3}
1: {0, 2, 3}
2: {0, 1, 3}
3: {0, 1, 2}
You don't need to create the index vector from scratch each time: Start with the subvector that is the current vector without its front, then overwrite the i-th place with the i-th entry of the current vector.
When you access the element s[r][c] of the submatrix, access element a[r + top][col[c]] of the original matrix. You can determine the index of the top row from the dimensions of the current column vector and the original matrix.
You never create submatrices, only sub-column vectors. Split your function in two: One public function as front-end, which calls the recursive worker function.
This will speed up the calculation somewhat, but unfortunately, this improvement will not buy you much when your matrices grow. Let's look at the 4×4 matrix again. In the first recursion step, you will consider these 3×3 submatrices:
{1, 2, 3}   {0, 2, 3}   {0, 1, 3}   {0, 1, 2}
From there, you will calculate these 2×2 submatrices (one column per 3×3 submatrix):
{2, 3}   {2, 3}   {1, 3}   {1, 2}
{1, 3}   {0, 3}   {0, 3}   {0, 2}
{1, 2}   {0, 2}   {0, 1}   {0, 1}
Notice that these 12 index sets are really just 6 different pairs. You'll calculate each of them twice. This gets worse the bigger your original matrix is. A solution to this is memoizing: Once you have calculated the determinant of a certain submatrix, store the value in an associative array. Before calculating a submatrix, check whether you have already done that and if so, just return the value you calculated earlier.
This will speed up your function, but it comes at a price: It will create many entries in the associative array.
Anyway, here's the code that implements all optimizations I've described:
#include <vector>
#include <map>
#include <iostream>
double subdet(const std::vector<std::vector<double> > &a,
const std::vector<int> &col,
std::map<std::vector<int>, double> &memo)
{
int dim = col.size();
int top = a.size() - dim;
if (memo.find(col) != memo.end()) {
return memo[col];
}
if (dim == 2) return a[top + 0][col[0]] * a[top + 1][col[1]]
- a[top + 0][col[1]] * a[top + 1][col[0]];
double result = 0.0;
int sign = 1;
std::vector<int> ncol(col.begin() + 1, col.end()); // col without its front; avoids indexing one past the end
for (int i = 0; i < dim; i++) {
if (a[top][col[i]]) {
double d = subdet(a, ncol, memo);
result = result + sign * a[top][col[i]] * d;
}
sign = -sign;
if (i + 1 < dim) ncol[i] = col[i];
}
memo[col] = result;
return result;
}
double det(const std::vector<std::vector<double> > &a) // by reference: no copy of the matrix
{
int dim = a.size();
if (dim == 0) return 1.0;
if (dim == 1) return a[0][0];
std::vector<int> col(dim);
std::map<std::vector<int>, double> memo;
for (unsigned i = 0; i < a.size(); i++) col[i] = i;
return subdet(a, col, memo);
}
Notes: The map (a binary tree with O(log n) lookup) should really be an unordered map (a hash table with O(1) average lookup), but I couldn't get it to work, because I'm bad at C++. Sorry about that.
There's probably room for optimization of the lookup key, too: One can enumerate the possible index vectors or use a bit mask, perhaps, thereby saving memory in the hash map. It's no good storing references to the column-index vector, because it's short-lived and we're swapping around in it a lot, so it's not constant.
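To illustrate the bit-mask idea, here is a rough sketch, not a drop-in replacement for the code above. The names Matrix, subdet_mask and det_mask are made up for this example, and it assumes fewer than 32 columns so that the set of remaining columns fits into a 32-bit key. Since the current top row always equals the number of columns already removed, the mask alone identifies the submatrix:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Determinant of the submatrix made of rows top..n-1 and the columns whose bits are set in mask.
static double subdet_mask(const Matrix& a, std::uint32_t mask, std::size_t top,
                          std::unordered_map<std::uint32_t, double>& memo)
{
    if (mask == 0) return 1.0;  // empty submatrix
    auto it = memo.find(mask);
    if (it != memo.end()) return it->second;

    double result = 0.0;
    int sign = 1;
    for (std::size_t c = 0; c < a.size(); ++c) {
        const std::uint32_t bit = std::uint32_t{1} << c;
        if (!(mask & bit)) continue;  // column already removed
        if (a[top][c] != 0.0)         // skip the recursion for zero entries
            result += sign * a[top][c] * subdet_mask(a, mask & ~bit, top + 1, memo);
        sign = -sign;                 // sign alternates over the remaining columns
    }
    memo[mask] = result;
    return result;
}

double det_mask(const Matrix& a)
{
    assert(a.size() < 32);  // assumption: the column set must fit into the 32-bit key
    if (a.empty()) return 1.0;
    std::unordered_map<std::uint32_t, double> memo;
    const std::uint32_t all_columns = (std::uint32_t{1} << a.size()) - 1;
    return subdet_mask(a, all_columns, 0, memo);
}
```

The memo can still grow to one entry per subset of columns, so this only helps with the cost per lookup, not with the exponential number of entries.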
Of course, other algorithms are better suited for finding the determinant of large matrices. My answer focuses on improving the existing method.
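For comparison, one such algorithm is Gaussian elimination with partial pivoting, which needs O(n^3) operations. This is a different technique from the Laplace expansion discussed above, shown only as a minimal sketch; the names Matrix and det_gauss are made up for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Determinant via Gaussian elimination with partial pivoting.
// The matrix is taken by value because the elimination overwrites it.
double det_gauss(Matrix a)
{
    const std::size_t n = a.size();
    for (const auto& row : a) assert(row.size() == n);  // must be square

    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        // pick the row with the largest absolute entry in this column
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;

        if (a[pivot][col] == 0.0) return 0.0;  // whole column is zero: singular
        if (pivot != col) {
            std::swap(a[pivot], a[col]);       // a row swap flips the sign
            det = -det;
        }
        det *= a[col][col];

        // eliminate the entries below the pivot
        for (std::size_t r = col + 1; r < n; ++r) {
            const double factor = a[r][col] / a[col][col];
            for (std::size_t c = col + 1; c < n; ++c)
                a[r][c] -= factor * a[col][c];
        }
    }
    return det;
}
```

Unlike the expansion, the result is subject to floating-point rounding even for integer-valued matrices, so compare it against expected values with a tolerance.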

String decode: looking for a better approach

I have worked out an O(n^2) solution to the problem below and was wondering whether there is a better one. (This is not a homework/interview problem but something I do out of my own interest, hence sharing it here.)
If a=1, b=2, c=3,….z=26. Given a string, find all possible codes that string
can generate. example: "1123" shall give:
aabc //a = 1, a = 1, b = 2, c = 3
kbc // since k is 11, b = 2, c= 3
alc // a = 1, l = 12, c = 3
aaw // a= 1, a =1, w= 23
kw // k = 11, w = 23
Here is my code to the problem:
void alpha(int* a, int sz, vector<vector<int>>& strings) {
for (int i = sz - 1; i >= 0; i--) {
if (i == sz - 1) {
vector<int> t;
t.push_back(a[i]);
strings.push_back(t);
} else {
int k = strings.size();
for (int j = 0; j < k; j++) {
vector<int> t = strings[j];
strings[j].insert(strings[j].begin(), a[i]);
if (t[0] < 10) {
int n = a[i] * 10 + t[0];
if (n <= 26) {
t[0] = n;
strings.push_back(t);
}
}
}
}
}
}
Essentially, the vector strings will hold the sets of numbers.
This would run in O(n^2). I am trying to get my head around an O(n log n) solution at least.
Intuitively a tree should help here, but I am not getting anywhere beyond that.
Generally, your problem complexity is more like 2^n, not n^2, since your k can increase with every iteration.
This is an alternative recursive solution (note: recursion is bad for very long codes). I didn't focus on optimization, since I'm not up to date with C++X, but I think the recursive solution could be optimized with some moves.
Recursion also makes the complexity a bit more obvious compared to the iterative solution.
// Add the front element to each trailing code sequence. Create a new sequence if none exists
void update_helper(int front, std::vector<std::deque<int>>& intermediate)
{
if (intermediate.empty())
{
intermediate.push_back(std::deque<int>());
}
for (size_t i = 0; i < intermediate.size(); i++)
{
intermediate[i].push_front(front);
}
}
std::vector<std::deque<int>> decode(int digits[], int count)
{
if (count <= 0)
{
return std::vector<std::deque<int>>();
}
std::vector<std::deque<int>> result1 = decode(digits + 1, count - 1);
update_helper(*digits, result1);
if (count > 1 && (digits[0] * 10 + digits[1]) <= 26)
{
std::vector<std::deque<int>> result2 = decode(digits + 2, count - 2);
update_helper(digits[0] * 10 + digits[1], result2);
result1.insert(result1.end(), result2.begin(), result2.end());
}
return result1;
}
Call:
std::vector<std::deque<int>> strings = decode(codes, size);
Edit:
Regarding the complexity of the original code, I'll try to show what would happen in the worst case scenario, where the code sequence consists only of 1 and 2 values.
void alpha(int* a, int sz, vector<vector<int>>& strings)
{
for (int i = sz - 1;
i >= 0;
i--)
{
if (i == sz - 1)
{
vector<int> t;
t.push_back(a[i]);
strings.push_back(t); // strings.size+1
} // if summary: O(1), ignoring capacity change, strings.size+1
else
{
int k = strings.size();
for (int j = 0; j < k; j++)
{
vector<int> t = strings[j]; // O(strings[j].size) vector copy operation
strings[j].insert(strings[j].begin(), a[i]); // strings[j].size+1
// note: strings[j].insert treated as O(1) because other containers could do better than vector
if (t[0] < 10)
{
int n = a[i] * 10 + t[0];
if (n <= 26)
{
t[0] = n;
strings.push_back(t); // strings.size+1
// O(1), ignoring capacity change and copy operation
} // if summary: O(1), strings.size+1
} // if summary: O(1), ignoring capacity change, strings.size+1
} // for summary: O(k * strings[j].size), strings.size+k, strings[j].size+1
} // else summary: O(k * strings[j].size), strings.size+k, strings[j].size+1
} // for summary: O(sum[i from 1 to sz] of (k * strings[j].size))
// k (same as string.size) doubles each iteration => k ends near 2^sz
// string[j].size increases by 1 each iteration
// k * strings[j].size increases by ?? each iteration (its getting huge)
}
Maybe I made a mistake somewhere, and to be generous we could treat a vector copy as O(1) instead of O(n) to reduce the complexity. The hard fact remains that in the worst case the inner loop doubles the size of the outer vector in each iteration (or at least every second iteration, given the exact structure of the if conditions), and the inner loop itself depends on that growing size, which makes the whole thing at least O(2^n).
Edit2:
I figured out the result complexity (the best hypothetical algorithm still needs to create every element of the result, so the result size is a lower bound on what any algorithm can achieve).
It actually follows the Fibonacci numbers:
For worst case input (like only 1s) of size N+2 you have:
size N has k(N) elements
size N+1 has k(N+1) elements
size N+2 is the combination of codes starting with a followed by the combinations from size N+1 (a takes one element of the source) and the codes starting with k, followed by the combinations from size N (k takes two elements of the source)
size N+2 has k(N) + k(N+1) elements
Starting with size 1 => 1 (a) and size 2 => 2 (aa or k)
Result: still exponential growth ;)
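If only the number of codes is needed rather than the codes themselves, the recurrence above can be evaluated directly in O(n). A small sketch, using the original a=1 ... z=26 mapping and assuming the input is a string of digit characters (the function name is made up):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Number of ways to split a digit string into codes 1..26 (a=1, ..., z=26).
// Follows k(N+2) = k(N+1) + k(N) whenever both the one-digit and two-digit step are valid.
std::uint64_t count_decodings(const std::string& digits)
{
    std::uint64_t prev2 = 1;  // count for the prefix two characters shorter
    std::uint64_t prev1 = 1;  // count for the prefix one character shorter
    for (std::size_t i = 0; i < digits.size(); ++i) {
        assert(digits[i] >= '0' && digits[i] <= '9');
        std::uint64_t cur = 0;
        if (digits[i] != '0')
            cur += prev1;  // last code is the single digit
        if (i > 0) {
            const int two = (digits[i - 1] - '0') * 10 + (digits[i] - '0');
            if (two >= 10 && two <= 26)
                cur += prev2;  // last code is the two-digit number
        }
        prev2 = prev1;
        prev1 = cur;
    }
    return prev1;
}
```

For "1123" this gives 5, matching the five codes listed in the question. The count itself still grows like the Fibonacci numbers, so a 64-bit counter overflows for worst-case inputs of roughly a hundred digits.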
Edit3:
Worked out a dynamic programming solution, somewhat similar to your approach with reverse iteration over the code array and somewhat optimized in its vector usage, based on the properties explained in Edit2.
The inner loop (update_helper) is still dominated by the count of results (worst case Fibonacci) and a few outer loop iterations will have a decent count of sub-results, but at least each sub-result is reduced to a pointer to some intermediate node, so copying should be pretty efficient. As a little bonus, I switched the result from numbers to characters.
Another edit: updated code with range 0 - 25 as 'a' - 'z', fixed some errors that led to wrong results.
#include <iostream>
#include <set>
#include <vector>
struct const_node
{
const_node(char content, const_node* next)
: next(next), content(content)
{
}
const_node* const next;
const char content;
};
// put front in front of each existing sub-result
void update_helper(int front, std::vector<const_node*>& intermediate)
{
for (size_t i = 0; i < intermediate.size(); i++)
{
intermediate[i] = new const_node(front + 'a', intermediate[i]);
}
if (intermediate.empty())
{
intermediate.push_back(new const_node(front + 'a', NULL));
}
}
std::vector<const_node*> decode_it(int digits[9], size_t count)
{
int current = 0;
std::vector<const_node*> intermediates[3];
for (size_t i = 0; i < count; i++)
{
current = (current + 1) % 3;
int prev = (current + 2) % 3; // -1
int prevprev = (current + 1) % 3; // -2
size_t index = count - i - 1; // invert direction
// copy from prev
intermediates[current] = intermediates[prev];
// update current (part 1)
update_helper(digits[index], intermediates[current]);
if (index + 1 < count && digits[index] &&
digits[index] * 10 + digits[index + 1] < 26)
{
// update prevprev
update_helper(digits[index] * 10 + digits[index + 1], intermediates[prevprev]);
// add to current (part 2)
intermediates[current].insert(intermediates[current].end(), intermediates[prevprev].begin(), intermediates[prevprev].end());
}
}
return intermediates[current];
}
void cleanupDelete(std::vector<const_node*>& nodes);
int main()
{
int code[] = { 1, 2, 3, 1, 2, 3, 1, 2, 3 };
int size = sizeof(code) / sizeof(int);
std::vector<const_node*> result = decode_it(code, size);
// output
for (size_t i = 0; i < result.size(); i++)
{
std::cout.width(3);
std::cout.flags(std::ios::right);
std::cout << i << ": ";
const_node* item = result[i];
while (item)
{
std::cout << item->content;
item = item->next;
}
std::cout << std::endl;
}
cleanupDelete(result);
}
void fillCleanup(const_node* n, std::set<const_node*>& all_nodes)
{
if (n)
{
all_nodes.insert(n);
fillCleanup(n->next, all_nodes);
}
}
void cleanupDelete(std::vector<const_node*>& nodes)
{
// this is like multiple inverse trees, hard to delete correctly, since multiple next pointers refer to the same target
std::set<const_node*> all_nodes;
for (auto var : nodes) // standard range-based for instead of the MSVC-only "for each"
{
fillCleanup(var, all_nodes);
}
nodes.clear();
for (auto var : all_nodes) // standard range-based for instead of the MSVC-only "for each"
{
delete var;
}
all_nodes.clear();
}
A drawback of the dynamically reused structure is the cleanup, since you have to be careful to delete each node only once.

How to get the equilibrium index of an array in O(n)?

I took a C++ test that asked for a function returning one of the indices that split the input vector into two parts with equal element sums. For example, for vec = {1, 2, 3, 5, 4, -1, 1, 1, 2, -1} it may return 3, because 1+2+3 = 6 = 4-1+1+1+2-1. I wrote this function, which returns the correct answer:
int func(const std::vector< int >& vecIn)
{
for (std::size_t p = 0; p < vecIn.size(); p++)
{
if (std::accumulate(vecIn.begin(), vecIn.begin() + p, 0) ==
std::accumulate(vecIn.begin() + p + 1, vecIn.end(), 0))
return p;
}
return -1;
}
My problem was that when the input was a very long vector containing only 1 (or -1), the function was slow. So I thought of starting the search for the index from the middle and then going left and right. But I suppose the best approach is to visit the indices in merge-sort order, that is: n/2, n/4, 3n/4, n/8, 3n/8, 5n/8, 7n/8, ..., where n is the size of the vector. Is there a way to write this order as a formula, so I can apply it in my function?
Thanks
EDIT
After some comments I should mention that I took the test a few days ago, so I forgot to mention the no-solution case: the function should return -1. I have also updated the question title.
Specifically for this problem, I would use the following algorithm:
Compute the total sum of the vector. This gives two sums (empty vector, and full vector).
For each element in order, move one element from full to empty, which means subtracting the value of the next element from sum(full) and adding it to sum(empty). When the two sums are equal, you have found your index.
This gives an O(n) algorithm instead of O(n^2).
You can solve the problem much faster without calling std::accumulate at each step:
int func(const std::vector< int >& vecIn)
{
int s1 = 0;
int s2 = std::accumulate(vecIn.begin(), vecIn.end(), 0);
for (std::size_t p = 0; p < vecIn.size(); p++)
{
s2 -= vecIn[p]; // s2 is now the sum of the elements after index p
if (s1 == s2)
return p;
s1 += vecIn[p]; // s1 is now the sum of the elements up to and including index p
}
return -1; // no such index
}
This is O(n). At each comparison, s1 contains the sum of the first p elements, and s2 the sum of the elements after index p, as in your original function. You can update both of them with an addition and a subtraction when moving to the next element.
Since std::accumulate needs to iterate over the range you give it, your algorithm was O(n^2), which is why it was so slow for many elements.
To answer the actual question: Your sequence n/2, n/4, 3n/4, n/8, 3n/8 can be rewritten as
1*n/2
1*n/4 3*n/4
1*n/8 3*n/8 5*n/8 7*n/8
...
that is to say, the denominator i runs from i=2 up in powers of 2, and the numerator runs from j=1 to i-1 in steps of 2. However, this is not what you need for your actual problem, because the example you give has n=10. Clearly you don't want n/4 there - your indices have to be integers.
The best solution here is to recurse. Given a range [b,e], pick the middle value m = (b+e)/2 and set the new ranges to [b, m-1] and [m+1, e]. Of course, special-case ranges of length 1 or 2.
Considering MSalters comments, I'm afraid another solution would be better. If you want to use less memory, maybe the selected answer is good enough, but to find the possibly multiple solutions you could use the following code:
static const int arr[] = {5,-10,10,-10,10,1,1,1,1,1};
std::vector<int> vec (arr, arr + sizeof(arr) / sizeof(arr[0]) );
// compute cumulative sum
std::vector<int> cumulative_sum( vec.size() );
cumulative_sum[0] = vec[0];
for ( size_t i = 1; i < vec.size(); i++ )
{ cumulative_sum[i] = cumulative_sum[i-1] + vec[i]; }
const int complete_sum = cumulative_sum.back();
// find multiple solutions, if there are any
const int complete_sum_half = complete_sum / 2; // suggesting this is valid...
std::vector<int>::iterator it = cumulative_sum.begin();
std::vector<int> mid_indices;
do {
it = std::find( it, cumulative_sum.end(), complete_sum_half );
if ( it != cumulative_sum.end() )
{ mid_indices.push_back( it - cumulative_sum.begin() ); ++it; }
} while( it != cumulative_sum.end() );
for ( size_t i = 0; i < mid_indices.size(); i++ )
{ std::cout << mid_indices[i] << std::endl; }
std::cout << "Split behind these indices to obtain two equal halves." << std::endl;
This way, you get all the possible solutions. If there is no way to split the vector into two equal halves, mid_indices will be left empty.
Again, you have to sum up each value only once.
My proposal is this:
static const int arr[] = {1,2,3,5,4,-1,1,1,2,-1};
std::vector<int> vec (arr, arr + sizeof(arr) / sizeof(arr[0]) );
int idx1(0), idx2(vec.size()-1);
int sum1(0), sum2(0);
int idxMid = -1;
do {
// fast access without using the index each time.
const int& val1 = vec[idx1];
const int& val2 = vec[idx2];
// Precompute the next (possible) sum values.
const int nSum1 = sum1 + val1;
const int nSum2 = sum2 + val2;
// move the index considering the balance between the
// left and right sum.
if ( sum1 - nSum2 < sum2 - nSum1 )
{ sum1 = nSum1; idx1++; }
else
{ sum2 = nSum2; idx2--; }
if ( idx1 >= idx2 ){ idxMid = idx2; }
} while( idxMid < 0 && idx2 >= 0 && idx1 < vec.size() );
std::cout << idxMid << std::endl;
It adds every value only once, no matter how many values there are, so its complexity is only O(n) and not O(n^2).
The code simply runs from the left and the right simultaneously and advances the index on whichever side has the lower sum.
You want nth term of the series you mentioned. Then it would be:
numerator: (n - 2^((int)(log2 n)) ) *2 + 1
denominator: 2^((int)(log2 n) + 1)
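As a quick sketch of that formula (the function name is made up, and terms are numbered from n = 1), the power 2^((int)(log2 n)) can be found with integer arithmetic instead of a floating-point logarithm:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// n-th term (n >= 1) of the series 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, ...
// returned as {numerator, denominator}.
std::pair<std::uint64_t, std::uint64_t> nth_split_fraction(std::uint64_t n)
{
    assert(n >= 1);
    std::uint64_t pow2 = 1;  // largest power of two <= n, i.e. 2^floor(log2 n)
    while (pow2 <= n / 2)
        pow2 *= 2;
    return {(n - pow2) * 2 + 1, pow2 * 2};
}
```

To turn a term into an index for a vector of a given size, you would still have to multiply by the size and round, so several terms can land on the same index.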
I came across the same question in Codility tests. There is a similar looking answer above (didn't pass some of the unit tests), but below code segment was successful in tests.
#include <vector>
#include <numeric>
#include <iostream>
using namespace std;
// Returns -1 if equilibrium point is not found
// use long long to support bigger ranges
int FindEquilibriumPoint(vector<long> &values) {
long long lower = 0;
long long upper = std::accumulate(values.begin(), values.end(), 0LL); // 0LL so the sum is accumulated as long long, not int
for (std::size_t i = 0; i < values.size(); i++) {
upper -= values[i];
if (lower == upper) {
return i;
}
lower += values[i];
}
return -1;
}
int main() {
vector<long> v = {-1, 3, -4, 5, 1, -6, 2, 1};
cout << "Equilibrium Point:" << FindEquilibriumPoint(v) << endl;
return 0;
}
Output
Equilibrium Point:1
Here it is the algorithm in Javascript:
function equi(arr){
var N = arr.length;
if (N == 0){ return -1};
var suma = 0;
for (var i=0; i<N; i++){
suma += arr[i];
}
var suma_iz = 0;
for(i=0; i<N; i++){
var suma_de = suma - suma_iz - arr[i];
if (suma_iz == suma_de){
return i};
suma_iz += arr[i];
}
return -1;
}
As you can see, this code runs in O(n).

What is the fastest way to find longest 'consecutive numbers' streak in vector ?

I have a sorted std::vector<int> and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4 and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?
No, you can do this in one pass through the vector, storing only the start point and length of the longest streak found so far. You also need far fewer than 'N' comparisons. *
Hint: if you already have, say, a match of length 4 ending at the 5th position (=6), which position do you have to check next?
[*] left as exercise to the reader to work out what's the likely O( ) complexity ;-)
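One possible reading of the hint, as a sketch only (the function name and return type are my own): once a streak of length best is known, a longer streak starting at i must reach position i + best, so the check starts there and walks backwards. A break found on the way rules out every start position before it.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns {length, first value} of the longest run of consecutive integers
// in a sorted vector; {0, 0} for an empty vector. The first longest run wins.
std::pair<std::size_t, int> longest_streak(const std::vector<int>& v)
{
    if (v.empty()) return {0, 0};
    const std::size_t n = v.size();
    std::size_t best = 1;  // length of the longest streak so far
    int base = v[0];       // its smallest number
    std::size_t i = 0;     // candidate start of a longer streak

    while (i + best < n) {
        std::size_t j = i + best;  // a longer streak from i must reach j
        std::size_t k = j;
        while (k > i && v[k] == v[k - 1] + 1) --k;  // walk back towards i
        if (k == i) {
            // v[i..j] is consecutive: extend it as far as possible
            while (j + 1 < n && v[j + 1] == v[j] + 1) ++j;
            best = j - i + 1;
            base = v[i];
            i = j + 1;
        } else {
            // break between k-1 and k: no start in [i, k-1] can beat best
            i = k;
        }
    }
    return {best, base};
}
```

For 1 3 4 5 6 8 9 this returns length 4 and base 3, and after finding that streak it never looks at 8 and 9 because no longer streak could fit in the remaining elements.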
It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j] in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
Input of recursive step is the range of elements [i, j]. Initially it is [0, n-1] - the whole array.
Apply the above criterion to range [i, j]. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
Otherwise (if the range is not consecutive), divide it into two equal parts [i, m] and [m+1, j].
Recursively invoke the algorithm on the lower part ([i, m]) and then on the upper part ([m+1, j]).
The above algorithm will perform binary partition of the array and recursive descent of the partition tree using the left-first approach. This means that this algorithm will find adjacent subranges with consecutive elements in left-to-right order. All you need to do is to join the adjacent subranges together. When you receive a subrange [i, j] that was "sent to output" at step 2, you have to concatenate it with previously received subranges, if they are indeed consecutive. Or you have to start a new range, if they are not consecutive. All the while you have to keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, its worst-case performance (if there are no consecutive subranges at all) is still O(n). In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
#include <cassert>
#include <utility>
#include <vector>
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}
I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < vec.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
You can't solve this problem in less than O(N) time. Imagine your list is the first N-1 even numbers, plus a single odd number (chosen from among the first N-1 odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.
Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) sizeof(x) / sizeof(x[0])
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}