Algorithm: Longest Approximate interval - c++

I am trying to solve this question:
When Xellos was doing a practice course in university, he once had to
measure the intensity of an effect that slowly approached equilibrium.
A good way to determine the equilibrium intensity would be choosing a
sufficiently large number of consecutive data points that seems as
constant as possible and taking their average. Of course, with the
usual sizes of data, it's nothing challenging — but why not make a
similar programming contest problem while we're at it?
You're given a sequence of n data points a1, ..., an. There aren't any
big jumps between consecutive data points — for each 1 ≤ i < n, it's
guaranteed that |a_{i+1} - a_i| ≤ 1.
A range [l, r] of data points is said to be almost constant if the
difference between the largest and the smallest value in that range is
at most 1. Formally, let M be the maximum and m the minimum value of
ai for l ≤ i ≤ r; the range [l, r] is almost constant if M - m ≤ 1.
Find the length of the longest almost constant range.
Input The first line of the input contains a single integer n
(2 ≤ n ≤ 100 000) — the number of data points.
The second line contains n integers a1, a2, ..., an
(1 ≤ ai ≤ 100 000).
Output Print a single number — the maximum length of an almost
constant range of the given sequence.
I see a solution here but I don't understand the algorithm - specifically the body of the loop. I am familiar with C++ syntax and I understand what's happening. I just don't understand why the algorithm works.
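#include <algorithm>
#include <cstdio>
using namespace std;
int a[1000005]; // a[v] = last position (1-based) where value v was seen, 0 if never
int main()
{
    int n,ans = 2,x;
    scanf("%d",&n);
    for(int i = 1; i <= n; i++)
    {
        scanf("%d",&x); x++; // shift values by one so that a[x-2] never indexes below 0
        a[x] = i;
        if(a[x-1] > a[x+1])
            ans = max(ans,i-max(a[x+1],a[x-2]));
        else
            ans = max(ans,i-max(a[x+2],a[x-1]));
    }
    printf("%d\n",ans);
    return 0;
}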
Could someone explain it to me?

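Array a stores, for each value v, the last position at which v appeared.
Now let's compute the left bound of the longest almost constant range ending at each position i (with value x). Because consecutive values differ by at most 1, that range contains x together with either x - 1 or x + 1, whichever appeared more recently. If a[x - 1] is closer to i than a[x + 1], the range consists of the values x and x - 1, so it is broken at a[x + 1] (because there is an x - 1 in between, and x + 1 and x - 1 differ by 2) or at a[x - 2] (because x - 2 and x differ by 2). The range therefore starts right after max(a[x + 1], a[x - 2]), and its length is i - max(a[x + 1], a[x - 2]).
Vice versa: otherwise the range consists of x and x + 1, and it starts right after max(a[x + 2], a[x - 1]).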
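To make the case analysis concrete, here is a minimal sketch that wraps the same last-position idea in a function and pairs it with a quadratic brute force for cross-checking. The names longestAlmostConstant and bruteForce are illustrative and not taken from the linked solution, and the shift of every value by one is my own assumption to keep last[x - 2] in bounds when x = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Last-position approach, O(n). Assumes a.size() >= 2, all values >= 1 and
// |a[i+1] - a[i]| <= 1, as guaranteed by the problem statement.
int longestAlmostConstant(const std::vector<int>& a) {
    assert(a.size() >= 2);
    const int maxValue = *std::max_element(a.begin(), a.end());
    // last[v + 1] = last 1-based position of value v (0 = not seen yet).
    std::vector<int> last(maxValue + 4, 0);
    int ans = 2;  // any two neighbours differ by at most 1
    for (int i = 1; i <= static_cast<int>(a.size()); ++i) {
        const int x = a[i - 1] + 1;  // shifted value, so x - 2 >= 0
        last[x] = i;
        if (last[x - 1] > last[x + 1]) {
            // the range holds {x-1, x}: it is cut by the last x+1 or x-2
            ans = std::max(ans, i - std::max(last[x + 1], last[x - 2]));
        } else {
            // the range holds {x, x+1}: it is cut by the last x+2 or x-1
            ans = std::max(ans, i - std::max(last[x + 2], last[x - 1]));
        }
    }
    return ans;
}

// Reference answer in O(n^2): try every left end and extend while max - min <= 1.
int bruteForce(const std::vector<int>& a) {
    int best = 0;
    for (std::size_t l = 0; l < a.size(); ++l) {
        int lo = a[l], hi = a[l];
        for (std::size_t r = l; r < a.size(); ++r) {
            lo = std::min(lo, a[r]);
            hi = std::max(hi, a[r]);
            if (hi - lo > 1) break;
            best = std::max(best, static_cast<int>(r - l + 1));
        }
    }
    return best;
}
```

Calling both functions on the same random inputs (generated so that neighbours differ by at most 1) is a cheap way to convince yourself that the two branches of the if are right.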


A problem of taking combination for set theory

Given an array A with size N. Value of a subset of Array A is defined as product of all numbers in that subset. We have to return the product of values of all possible non-empty subsets of array A %(10^9+7).
E.g. for the array A = {3,5}:
Value{3} = 3,
Value{5} = 5,
Value{3,5} = 5*3 = 15,
answer = 3*5*15 % (10^9+7).
Can someone explain the mathematics behind the problem? I am thinking of using combinations to solve it efficiently.
I have tried brute force; it gives the correct answer but is way too slow.
My next approach uses combinations. I think that if we take all the subsets and multiply together all the numbers in them, we get the correct answer. Thus I have to find out how many times each number enters the calculation of the answer. In the example, 5 and 3 both appear 2 times. If we look closely, each number in A appears the same number of times.
You're heading in the right direction.
Let x be an element of the given array A. In our final answer, x appears p times, where p equals the number of subsets of A that include x.
How do we calculate p? Once we have decided that x is definitely in our subset, we have two choices for each of the remaining N-1 elements: either include it in the set or not. So we conclude p = 2^(N-1).
So each element of A appears exactly 2^(N-1) times in the final product. All that remains is to calculate the answer: (a1 * a2 * ... * an)^p. Since the exponent is very large, you can use binary exponentiation for fast calculation.
As Matt Timmermans suggested in comments below, we can obtain our answer without actually calculating p = 2^(N-1). We first calculate the product a1 * a2 * ... * an. Then, we simply square this product n-1 times.
The corresponding code in C++:
#include <vector>
using std::vector;

int func(vector<int> &a) {
    int n = a.size();
    int m = 1e9+7;
    if(n==0) return 0;
    if(n==1) return (m + a[0]%m)%m;
    long long ans = 1;
    //first calculate ans = (a1*a2*...*an)%m
    for(int x:a){
        //negative sign does not matter since we're squaring
        if(x<0) x *= -1;
        x %= m;
        ans *= x;
        ans %= m;
    }
    //now calculate ans = [ ans^(2^(n-1)) ]%m
    //we do this by squaring ans n-1 times
    for(int i=1; i<n; i++){
        ans = ans*ans;
        ans %= m;
    }
    return (int)ans;
}
Let A = {a,b,c}. All possible subsets of A are {{},{a},{b},{c},{a,b},{b,c},{c,a},{a,b,c}}.
Here each element occurs 4 times.
If A = {a,b,c,d}, each element occurs 2^3 times.
So if the size of A is n, each element occurs 2^(n-1) times.
So the final result is a1^p * a2^p * a3^p * ... * an^p,
where p is 2^(n-1).
We need to compute x^(2^(n-1)) % mod.
By Euler's theorem, when x is not divisible by mod we can write x^(2^(n-1)) % mod as x^(2^(n-1) % phi(mod)) % mod.
As mod is prime, phi(mod) = mod-1.
So first find p = 2^(n-1) % (mod-1).
Then compute Ai^p % mod for each number and multiply it into the final result.
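A minimal sketch of this exponent-reduction approach, assuming mod = 10^9+7 as in the question. The names modPow and productOfSubsetValues are illustrative. Elements divisible by mod are handled separately, because the reduction of the exponent modulo mod-1 is only valid for bases that are not multiples of mod.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const std::int64_t MOD = 1000000007;

// Binary exponentiation: (base^exp) % mod. Requires mod < 2^31 so that
// every intermediate product fits into 64 bits.
std::int64_t modPow(std::int64_t base, std::int64_t exp, std::int64_t mod) {
    assert(exp >= 0 && mod > 0);
    std::int64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Product of the values of all non-empty subsets of a, modulo MOD.
std::int64_t productOfSubsetValues(const std::vector<int>& a) {
    if (a.empty()) return 0;  // assumption: same convention as the n == 0 case of func above
    const std::int64_t n = static_cast<std::int64_t>(a.size());
    // p = 2^(n-1) reduced modulo phi(MOD) = MOD - 1
    const std::int64_t p = modPow(2, n - 1, MOD - 1);
    std::int64_t answer = 1;
    for (int x : a) {
        const std::int64_t r = ((x % MOD) + MOD) % MOD;  // residue in [0, MOD)
        if (r == 0) return 0;  // a multiple of MOD makes the whole product 0
        answer = answer * modPow(r, p, MOD) % MOD;
    }
    return answer;
}
```

For the example from the question, productOfSubsetValues applied to {3, 5} gives 3*5*15 = 225.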
I read the previous answers and worked through the process of making the sets. Here I try to put it as simply as possible, so that people can apply it to similar problems.
Let i be an element of array A. Following the approach given in the question, i appears p times in the final answer.
Now, how do we make the different sets? We take sets containing only one element, then sets of two elements, then sets of 3, ..., up to sets of n elements.
For each size, say sets of 3 elements, we want to know how many of these sets contain i.
There are n elements, so the number of 3-element sets that contain i is (n-1)C(3-1), because we choose the other 3-1 elements from the remaining n-1.
If we do this for every size x and add up the counts, p = sum of (n-1)C(x-1), with x going from 1 to n. Thus, p = 2^(n-1).
The same holds for every element i, so p is the same for all of them. Thus we get
final answer = A[0]^p * A[1]^p * ... * A[n-1]^p
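The step from the sum of binomial coefficients to 2^(n-1) is the binomial theorem evaluated at 1 + 1. Written out, with k denoting the subset size:

```latex
p \;=\; \sum_{k=1}^{n} \binom{n-1}{k-1}
  \;=\; \sum_{j=0}^{n-1} \binom{n-1}{j}
  \;=\; (1+1)^{n-1}
  \;=\; 2^{\,n-1}
```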

Generate N random numbers within a range with a constant sum

I want to generate N random numbers drawn from a specific distribution (e.g. uniform random) in [a,b] which sum to a constant C. I have tried a couple of solutions I could think of myself, and some proposed in similar threads, but most of them either work only for a limited form of the problem or I can't prove that the outcome still follows the desired distribution.
What I have tried:
Generate N random numbers, divide all of them by their sum and multiply by the desired constant. This seems to work, but the result does not respect the rule that the numbers should be within [a:b].
Generate N-1 random numbers, add 0 and the desired constant C, and sort them. Then take the differences between each two consecutive numbers as the result. This again sums to C but has the same problem as the previous method (the values can fall outside [a:b]).
I also tried to generate random numbers and always keep track of min and max in a way that the desired sum and range are kept and come up with this code:
#include <functional>
#include <vector>
using std::function;

bool generate(function<int(int,int)> randomGenerator,int min,int max,int len,int sum,std::vector<int> &output){
    // Not possible to produce such a sequence
    if(min*len > sum)
        return false;
    if(max*len < sum)
        return false;
    int curSum = 0;
    int left = sum - curSum;
    int leftIndexes = len-1;
    int curMax = left - leftIndexes*min;
    int curMin = left - leftIndexes*max;
    for(int i=0;i<len;i++){
        int num = randomGenerator((curMin< min)?min:curMin,(curMax>max)?max:curMax);
        output.push_back(num);
        curSum += num;
        left = sum - curSum;
        leftIndexes--;
        curMax = left - leftIndexes*min;
        curMin = left - leftIndexes*max;
    }
    return true;
}
This seems to work but the results are sometimes very skewed and I don't think it's following the original distribution (e.g. uniform). E.g:
//10 numbers within [1:10] which sum to 50:
2,7,2,5,2,10,5,8,4,5 => sum=50
//This looks reasonable for uniform, but let's change to
//10 numbers within [1:25] which sum to 50:
24,12,6,2,1,1,1,1,1,1 => sum= 50
Notice how many ones exist in the output. This might sound reasonable because the range is larger. But they really don't look like a uniform distribution.
I am not sure even if it is possible to achieve what I want, maybe the constraints are making the problem not solvable.
If you want the sample to follow a uniform distribution, the problem reduces to generating N random numbers with sum = 1. This, in turn, is a special case of the Dirichlet distribution, but it can also be computed more easily using the exponential distribution. Here is how:
1. Take a uniform sample v1 … vN with all vi between 0 and 1.
2. For all i, 1<=i<=N, define ui := -ln vi (notice that ui > 0).
3. Normalize the ui as pi := ui/s where s is the sum u1+...+uN.
4. The p1..pN are uniformly distributed (in the simplex of dim N-1) and their sum is 1.
You can now multiply these pi by the constant C you want and translate them by adding some other constant A, like this:
qi := A + pi*C.
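The steps above can be sketched in C++ as follows. This is only an illustration: sampleSimplex is a made-up name, std::mt19937 is an arbitrary choice of generator, and exact zeros are simply redrawn here (an assumption that keeps the logarithm defined).

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Returns p1..pN with pi >= 0 and p1 + ... + pN = 1, obtained by
// normalizing negative logarithms of uniform samples.
std::vector<double> sampleSimplex(std::size_t n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> p(n);
    double s = 0.0;
    for (double& u : p) {
        double v;
        do { v = unif(rng); } while (v <= 0.0);  // assumption: redraw an exact 0
        u = -std::log(v);                        // ui = -ln vi > 0
        s += u;
    }
    for (double& u : p) u /= s;                  // pi = ui / s
    return p;
}
```

The final values are then obtained as qi = A + pi*C for whatever constants A and C are chosen.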
In order to address some issues raised in the comments, let me add the following:
To ensure that the final random sequence falls in the interval [a,b] choose the constants A and C above as A := a and C := b-a, i.e., take qi = a + pi*(b-a). Since pi is in the range (0,1) all qi will be in the range [a,b].
One cannot take the (negative) logarithm -ln(vi) if vi happens to be 0, because ln() is not defined at 0. The probability of such an event is extremely low. However, to ensure that no error is signaled, the generation of v1 ... vN in item 1 above must treat any occurrence of 0 in a special way: consider -ln(0) as +infinity (remember: ln(x) -> -infinity when x->0). Then the sum s = +infinity, which means that the corresponding pi = 1 and all other pj = 0. Without this convention the sequence (0...1...0) would never be generated (many thanks to @Severin Pappadeux for this interesting remark).
As explained in the 4th comment attached to the question by @Neil Slater, it is logically impossible to fulfill all the requirements of the original framing. Therefore any solution must relax the constraints to a proper subset of the original ones. Other comments by @Behrooz seem to confirm that this would suffice in this case.
One more issue has been raised in the comments:
Why does rescaling a uniform sample not suffice?
In other words, why should I bother to take negative logarithms?
The reason is that if we just rescale, the resulting sample won't be distributed uniformly across the segment (0,1) (or [a,b] for the final sample).
To visualize this let's think 2D, i.e., let's consider the case N=2. A uniform sample (v1,v2) corresponds to a random point in the square with origin (0,0) and corner (1,1). Now, when we normalize such a point dividing it by the sum s=v1+v2 what we are doing is projecting the point onto the diagonal as shown in the picture (keep in mind that the diagonal is the line x + y = 1):
But given that green lines, which are closer to the principal diagonal from (0,0) to (1,1), are longer than orange ones, which are closer to the axes x and y, the projections tend to accumulate more around the center of the projection line (in blue), where the scaled sample lives. This shows that a simple scaling won't produce a uniform sample on the depicted diagonal. On the other hand, it can be proven mathematically that the negative logarithms do produce the desired uniformity. So, instead of copypasting a mathematical proof I would invite everyone to implement both algorithms and check that the resulting plots behave as this answer describes.
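For anyone who wants to run that comparison for N=2, here is a small sketch that estimates the variance of the first normalized coordinate under both methods. A uniform distribution on (0,1) has variance 1/12 ≈ 0.083, so the negative-log estimate should land near that value, while the plain rescaling should come out visibly smaller (roughly 0.057 if you integrate its density) because the points pile up around 1/2. The function and struct names are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct FirstCoordinateVariance {
    double rescaled;  // variance of v1 / (v1 + v2)
    double negLog;    // variance of u1 / (u1 + u2), with ui = -ln vi
};

FirstCoordinateVariance compareMethods(int samples, unsigned seed) {
    assert(samples > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    // Redraw exact zeros so that the logarithm is always defined (assumption).
    auto draw = [&]() {
        double v;
        do { v = unif(rng); } while (v <= 0.0);
        return v;
    };
    double sumR = 0, sumR2 = 0, sumL = 0, sumL2 = 0;
    for (int k = 0; k < samples; ++k) {
        const double v1 = draw();
        const double v2 = draw();
        const double r = v1 / (v1 + v2);           // plain rescaling
        const double u1 = -std::log(v1);
        const double u2 = -std::log(v2);
        const double l = u1 / (u1 + u2);           // negative-log method
        sumR += r;  sumR2 += r * r;
        sumL += l;  sumL2 += l * l;
    }
    const double meanR = sumR / samples;
    const double meanL = sumL / samples;
    return {sumR2 / samples - meanR * meanR, sumL2 / samples - meanL * meanL};
}
```

Plotting histograms of the two coordinates instead of comparing variances shows the same effect more directly.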
(Note: here is a blog post on this interesting subject with an application to the Oil & Gas industry)
Let's try to simplify the problem.
By subtracting the lower bound, we can reduce it to finding N numbers in [0,b-a] such that their sum is C-Na.
Renaming the parameters, we can look for N numbers in [0,m] whose sum is S.
Now the problem is akin to partitioning a segment of length S into N sub-segments, each with a length in [0,m].
I think the problem is simply not solvable.
If S=1, N=1000 and m is anything above 0, the only possible partition is one 1 and 999 zeroes, which is nothing like a random spread.
There is a correlation between N, m and S, and even picking random values will not make it disappear.
For the most uniform partition, the lengths of the sub-segments will follow a Gaussian curve with a mean value of S/N.
If you tweak your random numbers differently, you will end up with some bias or other, but in the end you will never have both a uniform [a,b] distribution and a total of C, unless the upper bound b of your interval happens to be 2C/N-a, i.e. unless the mean (a+b)/2 of the uniform distribution equals C/N.
For my answer I'll assume that we have a uniform distribution.
Since we have a uniform distribution, every tuple with sum C has the same probability of occurring. For example, for a = 2, b = 4, C = 12, N = 5 we have 15 possible tuples. Of these, 10 start with 2, 4 start with 3 and 1 starts with 4. This gives the idea of selecting a random number from 1 to 15 in order to choose the first element: from 1 to 10 we select 2, from 11 to 14 we select 3, and for 15 we select 4. Then we continue recursively.
#include <time.h>
#include <random>
std::default_random_engine generator(time(0));
int a = 2, b = 4, n = 5, c = 12, numbers[5];
// Calculate how many combinations of n numbers have sum c
int calc_combinations(int n, int c) {
    if (n == 1) return (c >= a) && (c <= b);
    int sum = 0;
    for (int i = a; i <= b; i++) sum += calc_combinations(n - 1, c - i);
    return sum;
}
// Chooses a random array of n elements having sum c
void choose(int n, int c, int *numbers) {
    if (n == 1) { numbers[0] = c; return; }
    int combinations = calc_combinations(n, c);
    std::uniform_int_distribution<int> distribution(0, combinations - 1);
    int s = distribution(generator);
    int sum = 0;
    for (int i = a; i <= b; i++) {
        if ((sum += calc_combinations(n - 1, c - i)) > s) {
            numbers[0] = i;
            choose(n - 1, c - i, numbers + 1);
            return;
        }
    }
}
int main() { choose(n, c, numbers); }
Possible outcome:
This algorithm won't scale well for large N because of overflows in the calculation of combinations (unless we use a big integer library), the time needed for this calculation and the need for arbitrarily large random numbers.
Well, for n=10000, can't we have one number in there that is not random?
Maybe generate the sequence until the sum exceeds C-max, and then just add one final number that makes the sum come out right.
1 in 10000 is more like very small noise in the system.
Although this is an old topic, I think I have an idea. Say we want N random numbers whose sum is C, each between a and b. To solve the problem, we create N holes and prepare C balls. In each round we ask every hole in turn: "Do you want another ball?" If not, we pass to the next hole; otherwise we put a ball into the hole. Each hole has a cap value: b-a. If a hole has reached its cap value, we always pass to the next hole.
Example: 3 random numbers between 0 and 2 whose sum is 5.
simulation result:
1st run: -+-
2nd run: ++-
3rd run: ---
4th run: +*+
-:refuse ball
+:accept ball
*:full pass
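A rough sketch of this holes-and-balls simulation. Two things here are my assumptions rather than part of the description: each hole accepts a ball with probability 1/2, and every hole starts at the lower bound a, so only C - N*a balls are handed out (which is what a cap of b-a per hole implies). The name fillHoles is made up.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Returns n integers in [a, b] with sum c, built by offering balls to the
// holes round after round until all balls are placed.
std::vector<int> fillHoles(int n, int a, int b, int c, std::mt19937& rng) {
    // A solution exists only if n*a <= c <= n*b.
    assert(n > 0 && a <= b && n * a <= c && c <= n * b);
    std::vector<int> holes(n, a);              // every hole starts at the lower bound
    int balls = c - n * a;                     // balls still to be placed
    std::bernoulli_distribution accept(0.5);   // assumption: fair coin per question
    while (balls > 0) {
        for (int i = 0; i < n && balls > 0; ++i) {
            if (holes[i] == b) continue;       // full: always pass to the next hole
            if (accept(rng)) {                 // the hole accepts the ball
                ++holes[i];
                --balls;
            }
        }
    }
    return holes;
}
```

The result always has the right sum and range, but nothing here shows that it is uniform over all valid tuples, so treat it as a heuristic.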

How can I find number of consecutive sequences of various lengths satisfy a particular property?

I am given an array A[] with N elements, all of which are positive integers. I have to find the number of sequences of lengths 1,2,3,..,N that satisfy a particular property.
I have built an interval tree in O(n log n). Now I want to count the number of sequences that satisfy a certain property.
All the properties required for the problem are related to the sum of the sequences.
Note that an array has N*(N+1)/2 sequences. How can I iterate over all of them in O(n log n) or O(n)?
Let k be the moving left index, from 0 to N-1. For each k we run an algorithm that looks for the minimum R satisfying the condition (let's call it I); every range with L = k and R >= I then satisfies it as well (this is your short circuit). After you find I, simply output all pairs (L=k, R>=I). This of course assumes that all values in your set are >= 0.
To find I for a given k, begin at element k + (N-k)/2. Check whether the range (L=k, R=k+(N-k)/2) satisfies your condition. If it does, decrement R until your condition is NOT met; then R+1 is your minimum (you could choose to print the results as you go, but in this case they would essentially be printed backwards). If (L=k, R=k+(N-k)/2) does not satisfy your condition, INCREMENT R until it does, and that becomes your minimum for L=k. This cuts the search space for each L=k by a factor of 2. As k increases and approaches N, the search space keeps shrinking.
// N must be a compile-time constant or a macro defined above.
// TESTVALS(vals, L, R) stands for the property test on the range [L, R].
unsigned int myVals[N];
int I; // smallest R that satisfies the condition for the current k
for(int k = 0; k < N; k++){
    int start = k + (N - k) / 2; // first R to test for this k
    if(TRUE == TESTVALS(myVals, k, start)){ // It Passes
        for(I = start; I > k; I--){ // walk left while the shorter range still passes
            if(FALSE == TESTVALS(myVals, k, I - 1)){
                break;
            }
        }
    }else{ // It Didnt Pass
        for(I = start + 1; I < N; I++){ // walk right until the range passes
            if(TRUE == TESTVALS(myVals, k, I)){
                break;
            }
        }
    }
    // PRINT ALL PAIRS from L=k, from R=I to R=N-1 (none if I == N)
} // END --> for(int k = 0; k < N; k++)
The complexity of the algorithm above is O(N^2): for each of the N values of k, up to N/2 values of R need testing. Big O notation is not concerned with the factor 1/2, nor with the fact that the range to search gets smaller as k grows; it only captures the order of magnitude. So it counts as N tests for each of N values, hence O(N^2).
There is an alternative approach which would be FASTER: whenever you move within the inner for loops, move half the remaining distance instead of a single step (a binary search). This gets you to O(n log n) steps. For each k (all of which have to be tested), you run this half-distance approach to find your minimum R in log N time. As an example, let's say you have a 1000-element array. When k = 0, we begin the search for the minimum R at index 500. If the test passes, instead of moving linearly downward from 500 to 0, we test 250. Let's say the actual minimum R for k = 0 is 300. Then the tests to find the minimum R would look as follows:
While this is oversimplified, you will most likely have to optimize, and test 301 as well as 299 to make sure you're in the sweet spot. Another note: be careful when dividing by 2 when you have to move in the same direction more than once in a row.
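Here is a minimal sketch of that half-distance idea for one concrete property. Since the question only says the properties are related to sums, I assume the property "sum of A[L..R] >= target" purely for illustration; with non-negative elements it is monotone in R, which is what the binary search needs. Prefix sums make each test O(1), and a difference array records "all lengths from the shortest one up to N-L qualify" in O(1) per L, giving O(N log N) overall. The name countByLength is made up.

```cpp
#include <cassert>
#include <vector>

// count[len] = number of contiguous ranges of length len whose sum is >= target.
// Assumes all elements are non-negative.
std::vector<long long> countByLength(const std::vector<int>& a, long long target) {
    const int n = static_cast<int>(a.size());
    std::vector<long long> prefix(n + 1, 0);
    for (int i = 0; i < n; ++i) {
        assert(a[i] >= 0);
        prefix[i + 1] = prefix[i] + a[i];
    }
    std::vector<long long> diff(n + 2, 0);
    for (int L = 0; L < n; ++L) {
        // Binary search for the minimum R with sum(A[L..R]) >= target.
        int lo = L, hi = n;  // hi == n means no R works for this L
        while (lo < hi) {
            const int mid = lo + (hi - lo) / 2;
            if (prefix[mid + 1] - prefix[L] >= target) hi = mid;
            else lo = mid + 1;
        }
        if (lo < n) {
            diff[lo - L + 1] += 1;  // shortest qualifying length for this L
            diff[n - L + 1] -= 1;   // one past the longest possible length
        }
    }
    std::vector<long long> count(n + 1, 0);
    for (int len = 1; len <= n; ++len) count[len] = count[len - 1] + diff[len];
    return count;
}
```

For A = {1, 2, 3} and target 3 this gives 1 range of length 1, 2 of length 2 and 1 of length 3.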

Given an array of N numbers, find the number of sequences of all lengths having the range of R?

This is a follow-up question to "Given a sequence of N numbers, extract number of sequences of length K having range less than R?"
I basically need a vector V of size N as the answer, such that V[i] denotes the number of sequences of length i which have range <= R.
Traditionally, in recursive solutions, you would compute the solution for K = 0, K = 1, and then find some kind of recurrence relation between subsequent elements to avoid recomputing the solution from scratch each time.
However here I believe that maybe attacking the problem from the other side would be interesting, because of the property of the spread:
Given a sequence of spread R (or less), any subsequence has a spread of at most R as well
Therefore, I would first establish a list of the longest subsequences of spread R beginning at each index. Let's call this list M, and have M[i] = j where j is the highest index in S (the original sequence) for which S[j] - S[i] <= R. This is going to be O(N).
Now, for any i, the number of sequences of length K starting at i is either 0 or 1, and this depends on whether K is greater than M[i] - i or not. A simple linear pass over M (from 0 to N-K) gives us the answer. This is once again O(N).
So, if we call V the resulting vector, with V[k] denoting the number of subsequences of length k in S with spread at most R, then we can do it in a single iteration over M:
for i in [0, len(M)]:
  for k in [0, M[i] - i]:
    ++V[k]
The algorithm is simple, however the number of updates can be rather daunting. In the worst case, supposing that M[i] - i equals N - i, it is O(N*N) complexity. You would need a better data structure (probably an adaptation of a Fenwick Tree) to use this algorithm and lower the cost of computing those numbers.
If you are looking for contiguous sequences, try doing it recursively : The K-length subsequences set having a range inferior than R are included in the (K-1)-length subsequences set.
At K=0, you have N solutions.
Each time you increase K, you append (resp. prepend) the next (resp. previous) element, check whether the range is still at most R, and either store the sequence in a set (watch out for duplicates!) or discard it, depending on the result.
I think the complexity of this algorithm is O(n*n) in the worst-case scenario, though it may be better on average.
I think Matthieu has the right answer when looking for all sequences with spread R.
As you are only looking for sequences of length K, you can do a little better.
Instead of looking at the maximum sequence starting at i, just look at the sequence of length K starting at i, and see if it has range R or not. Do this for every i, and you have all sequences of length K with spread R.
You don't need to go through the whole list, as the latest start point for a sequence of length K is n-K+1. So the complexity is something like (n-K+1)*K = n*K - K*K + K. For K=1 this is n, and for K=n it is n. For K=n/2 it is n*n/2 - n*n/4 + n/2 = n*n/4 + n/2, which I think is the maximum. So while this is still O(n*n), for most values of K you get a little better.
Start with a simpler problem: find the maximal length of a sequence starting at each index and having a range of at most R.
To do this, let the first pointer point to the first element of the array. Increase the second pointer (also starting from the first element of the array) while the sequence between the pointers has a range less than or equal to R. Push every array element passed by the second pointer to a min-max-queue, made of a pair of min-max-stacks, described in this answer. When the difference between the max and min values reported by the min-max-queue exceeds R, stop increasing the second pointer, increment V[ptr2-ptr1], increment the first pointer (removing the element it pointed to from the min-max-queue), and continue increasing the second pointer (keeping the range under control).
When the second pointer leaves the bounds of the array, increment V[N-ptr1] for all remaining ptr1 (the corresponding ranges may be less than or equal to R). To account for all the shorter sequences contained in these maximal ones, compute the cumulative sum of array V[], starting from its end.
Both time and space complexities are O(N).
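p2 = 0;
// assumes min_max_queue.range() returns 0 for an empty queue
for (p1 = 0; p1 < N; ++p1) {
  // extend the window [p1, p2) until its range exceeds R or the array ends
  while (p2 < N && min_max_queue.range() <= R) {
    min_max_queue.push(a[p2]);
    ++p2;
  }
  // if the last pushed element broke the range, it is not part of the sequence
  if (min_max_queue.range() > R) ++v[p2 - p1 - 1];
  else ++v[p2 - p1];
  min_max_queue.pop(); // remove a[p1] before moving the first pointer
}
sum = 0;
for (j = N; j > 0; --j) {
  value = v[j];
  v[j] += sum;
  sum += value;
}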
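For a version that compiles as is, here is a sketch of the same two-pointer idea in C++. Instead of the pair of min-max-stacks it uses two monotonic deques of indices, which is a swapped-in technique with the same O(N) total cost; the name countByRange is made up. It also checks the range before pushing, so the window never contains an offending element.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// v[len] = number of contiguous sequences of length len with max - min <= R.
std::vector<long long> countByRange(const std::vector<int>& a, int R) {
    assert(R >= 0);
    const int n = static_cast<int>(a.size());
    std::vector<long long> v(n + 2, 0);
    // Indices of the current window; values are decreasing in maxq and
    // increasing in minq, so the fronts hold the window's max and min.
    std::deque<int> maxq, minq;
    int p2 = 0;
    for (int p1 = 0; p1 < n; ++p1) {
        while (p2 < n) {
            const long long hi = maxq.empty() ? a[p2] : std::max(a[maxq.front()], a[p2]);
            const long long lo = minq.empty() ? a[p2] : std::min(a[minq.front()], a[p2]);
            if (hi - lo > R) break;  // a[p2] would break the range
            while (!maxq.empty() && a[maxq.back()] <= a[p2]) maxq.pop_back();
            while (!minq.empty() && a[minq.back()] >= a[p2]) minq.pop_back();
            maxq.push_back(p2);
            minq.push_back(p2);
            ++p2;
        }
        ++v[p2 - p1];  // longest valid sequence starting at p1
        if (!maxq.empty() && maxq.front() == p1) maxq.pop_front();
        if (!minq.empty() && minq.front() == p1) minq.pop_front();
    }
    // Every maximal sequence also contains all shorter ones with the same
    // start, so accumulate from the end.
    for (int len = n - 1; len >= 1; --len) v[len] += v[len + 1];
    v.resize(n + 1);
    return v;
}
```

For a = {1, 3, 2, 5} and R = 1 this reports 4 sequences of length 1, 1 of length 2 and none longer.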

Finding the maximum weight subsequence of an array of positive integers?

I'm trying to find the maximum weight subsequence of an array of positive integers - the catch is that no adjacent members are allowed in the final subsequence.
The exact same question was asked here, and a recursive solution was given by MarkusQ thus:
function Max_route(A)
    if A's length = 1
        A[0]
    else
        maximum of
            A[0] + Max_route(A[2...])
            Max_route(A[1...])
He provides an explanation, but can anyone help me understand how he has expanded the function? Specifically what does he mean by
f []      :- [],0
f [x]     :- [x],x
f [a,b]   :- if a > b then [a],a else [b],b
f [a,b,t] :-
    ft = f t
    fbt = f [b|t]
    if a + ft.sum > fbt.sum
        [a|ft.path],a+ft.sum
    else
        fbt
Why does he expand f[] to [],0? Also how does his solution take into consideration non-adjacent members?
I have some C++ code that is based on this algorithm, which I can post if anyone wants to see it, but I just can't for the life of me fathom why it works.
==========For anyone who's interested - the C++ code ==============
I should add, that the array of integers is to be treated as a circular list, so any sequence containing the first element cannot contain the last.
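#include <algorithm>
#include <cstring>
#include <vector>
using namespace std;
int a[55];
int memo[55][55];
// Best sum of non-adjacent elements taken from a[s..e].
int solve(int s, int e)
{
    if( s>e ) return 0;
    int &ret=memo[s][e];
    if( ret!=-1 ) return ret; // already computed
    ret=max(solve(s+1,e), solve(s+2,e)+a[s]);
    return ret;
}
class Sequence
{
public:
    int maxSequence(vector <int> s)
    {
        memset(memo, -1, sizeof(memo));
        int n = s.size();
        for(int i=0; i<n; i++) a[i]=s[i];
        return max(solve(0,n-2),solve(1,n-1));
    }
};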
I don't really understand that pseudocode, so post the C++ code if this isn't helpful and I'll try to improve it.
I'm trying to find the maximum weight subsequence of an array of positive integers - the catch is that no adjacent members are allowed in the final subsequence.
Let a be your array of positive ints. Let f[i] = value of the maximum weight subsequence of the sequence a[0..i].
We have:
f[0] = a[0] because if there's only one element, we have to take it.
f[1] = max(a[0], a[1]) because you have the no adjacent elements restriction, so if you have two elements, you can only take one of them. It makes sense to take the largest one.
Now, generally you have:
f[i > 1] = max(
    f[i - 2] + a[i]  <= add a[i] to the largest subsequence of the sequence a[0..i - 2]. We cannot take a[0..i - 1] because otherwise we risk adding an adjacent element.
    f[i - 1]         <= don't add the current element to the maximum of a[0..i - 2], instead take the maximum of a[0..i - 1], to which we cannot add a[i].
)
I think this way is easier to understand than what you have there. The approaches are equivalent, I just find this clearer for this particular problem, since recursion makes things harder in this case and the pseudocode could be clearer either way.
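The recurrence translates almost line by line into a bottom-up loop. This is just a sketch (maxWeight is a made-up name) for the non-circular case, using long long so that large sums do not overflow:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// f[i] = value of the maximum weight subsequence of a[0..i] with no two
// adjacent elements. Assumes a is non-empty and holds positive integers.
long long maxWeight(const std::vector<int>& a) {
    assert(!a.empty());
    const std::size_t n = a.size();
    std::vector<long long> f(n);
    f[0] = a[0];                              // one element: take it
    if (n > 1) f[1] = std::max(a[0], a[1]);   // two elements: take the larger
    for (std::size_t i = 2; i < n; ++i) {
        // either take a[i] on top of f[i - 2], or skip it and keep f[i - 1]
        f[i] = std::max(f[i - 2] + a[i], f[i - 1]);
    }
    return f[n - 1];
}
```

For {2, 7, 9, 3, 1} the table f comes out as 2, 7, 11, 11, 12, so the answer is 12.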
But what do you NOT understand? It seems quite clear for me:
we will build the maximal subsequence for every prefix of our given sequence
to calculate the maximal subsequence for prefix of length i, we consider two possibilities: Either the last element is, or isn't in the maximal subsequence (clearly there are no other possibilities).
if it is there, we take the value of the last element plus the value of the maximal subsequence of the prefix two elements shorter (because in this case we know that the element before the last cannot be present in the maximal subsequence, due to the adjacent elements rule)
if it isn't, we take the value of the maximal sum of the prefix one element shorter (if the last element of the prefix is not in the maximal subsequence, the maximal subsequence has to be the same for this and the previous prefix)
we compare and take the maximum of the two
Plus: you need to remember actual subsequences; you need to avoid superfluous function invocations, hence the memoization.
Why does he expand f[] to [],0?
Because the first element of the returned pair is the current maximal subsequence, and the second is its value. The maximal subsequence of an empty sequence is empty and has value zero.
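To illustrate the pair of (subsequence, value), here is a sketch that returns both, with the empty sequence naturally giving ([], 0). It fills a table of suffix sums first and then walks forward applying the same "a + ft.sum > fbt.sum" test as the pseudocode; bestNonAdjacent is a made-up name and the non-circular case is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {chosen elements, their sum} for the best subsequence of a with
// no two adjacent elements. Assumes positive integers.
std::pair<std::vector<int>, long long> bestNonAdjacent(const std::vector<int>& a) {
    const std::size_t n = a.size();
    // best[i] = best sum obtainable from a[i..n); best[n] = best[n+1] = 0
    // plays the role of f [] :- [],0.
    std::vector<long long> best(n + 2, 0);
    for (std::size_t i = n; i-- > 0;) {
        assert(a[i] > 0);
        best[i] = std::max(best[i + 1], a[i] + best[i + 2]);
    }
    std::vector<int> path;
    for (std::size_t i = 0; i < n;) {
        if (a[i] + best[i + 2] > best[i + 1]) {  // taking a[i] is strictly better
            path.push_back(a[i]);
            i += 2;                               // the neighbour must be skipped
        } else {
            ++i;                                  // skip a[i]
        }
    }
    return {path, best[0]};
}
```

For {2, 7, 9, 3, 1} this returns the elements 2, 9, 1 with sum 12.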