Maximum contiguous sub-array (with the most elements) - C++

Given an array of natural numbers and another natural number T, how do I find the contiguous subarray whose sum is less than or equal to T and whose number of elements is maximized?
For example, if the given array is:
{3, 1, 2, 1, 1} and T = 5, then the maximum contiguous subarray is {1, 2, 1, 1}, because it contains 4 elements and its sum is equal to 5.
Another example: {10, 1, 1, 1, 1, 3, 6, 7} with T = 8. Then the maximum contiguous subarray is {1, 1, 1, 1, 3}.
I can do it with O(n^2) operations. However, I am looking for a linear-time solution to this problem. Any ideas?

It ought to be possible to do this in O(n). I've not tested this, but it looks OK:
int start = 0, end = 0;
int beststart = 0, bestend = 0;
int sum = array[0];
while (end + 1 < arraysize) {
    if (array[end + 1] + sum <= T)
        sum += array[++end];    // grow the window to the right
    else
        sum -= array[start++];  // shrink the window from the left
    if ((end - start) > (bestend - beststart)) {
        beststart = start;
        bestend = end;
    }
}
So, basically, it moves a sliding window along the array and records the point at which end - start is the greatest.
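For reference, here is a runnable sketch of the same sliding-window idea wrapped in a small function. The function name and the positive-elements assumption are mine, not from the question:

#include <algorithm>
#include <iostream>
#include <vector>

// Length of the longest contiguous subarray with sum <= T.
// Assumes all elements are positive (natural numbers), as in the question.
int longestSubarrayAtMostT(const std::vector<int>& a, int T) {
    int start = 0, best = 0;
    long long sum = 0;
    for (int end = 0; end < (int)a.size(); ++end) {
        sum += a[end];                          // grow the window to the right
        while (sum > T) sum -= a[start++];      // shrink from the left until valid
        best = std::max(best, end - start + 1); // record the best window length seen so far
    }
    return best;
}

int main() {
    std::vector<int> a{3, 1, 2, 1, 1};
    std::cout << longestSubarrayAtMostT(a, 5) << '\n'; // prints 4, i.e. {1, 2, 1, 1}
}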

It seems to be a capped version of the Maximum subarray problem: http://en.wikipedia.org/wiki/Maximum_subarray_problem
I guess you can find inspiration in existing algorithms.

Related

Finding the number of sub arrays that have a sum of K

I am trying to find the number of sub arrays that have a sum equal to k:
int subarraySum(vector<int>& nums, int k)
{
    int start, end, curr_sum = 0, count = 0;
    start = 0, end = 0;
    while (end < (int)nums.size())
    {
        curr_sum = curr_sum + nums[end];
        end++;
        while (start < end && curr_sum >= k)
        {
            if (curr_sum == k)
                count++;
            curr_sum = curr_sum - nums[start];
            start++;
        }
    }
    return count;
}
The above code I have written works for most cases, but fails for the following:
array = {-1, -1, 1} with k = 0
I have tried to add another while loop to iterate from the start and go up the array until it reaches the end:
int subarraySum(vector<int>& nums, int k)
{
    int start, end, curr_sum = 0, count = 0;
    start = 0, end = 0;
    while (end < (int)nums.size())
    {
        curr_sum = curr_sum + nums[end];
        end++;
        while (start < end && curr_sum >= k)
        {
            if (curr_sum == k)
                count++;
            curr_sum = curr_sum - nums[start];
            start++;
        }
    }
    while (start < end)
    {
        if (curr_sum == k)
            count++;
        curr_sum = curr_sum - nums[start];
        start++;
    }
    return count;
}
Why is this not working? I am sliding the window until the last element is reached, so it should have found a sum equal to k. How can I solve this issue?
Unfortunately, you did not implement the sliding window correctly, and a sliding window is not really a solution for this problem anyway. One of your main issues is that you do not move the start of the window based on the proper conditions: you always sum up and wait until the sum is greater than or equal to the search value.
This will not really work, especially for your example -1, -1, 1. The running sum here is -1, -2, -1, and you never see the 0, although it is there. You may get the idea to write while (start < end && curr_sum != k), but this will also not work, because the start pointer is not handled correctly.
Your approach leads to the brute-force solution, which typically takes something like N*N loop operations, where N is the size of the array, because it needs a doubly nested loop.
That will of course always work, but it may be very time-consuming and, in the end, too slow.
Anyway, let us implement that. We will start from each value in the std::vector and try out all subarrays beginning at that start value. We must evaluate all following values in the std::vector, because, for example, the last value could be a big negative number that brings the sum back down to the search value.
We could implement this for example like the following:
#include <iostream>
#include <vector>
using namespace std;

int subarraySum(vector<int>& numbers, int searchSumValue) {
    // Here we will store the result
    int resultingCount{};

    // Iterate over all values in the array. So, use all different start values
    for (std::size_t i{}; i < numbers.size(); ++i) {
        // Here we store the running sum of the elements in the vector
        int sum{ numbers[i] };

        // Check for the trivial case. A one-element subarray may already match the search value
        if (sum == searchSumValue) ++resultingCount;

        // Now we build all subarrays beginning with the start value
        for (std::size_t k{ i + 1 }; k < numbers.size(); ++k) {
            sum += numbers[k];
            if (sum == searchSumValue) ++resultingCount;
        }
    }
    return resultingCount;
}

int main() {
    vector v{ -1, -1, 1 };
    std::cout << subarraySum(v, 0);
}
But, as said, the above is often too slow for big vectors, and there is indeed a better solution available, based on a dynamic-programming (DP) idea.
It uses so-called prefix sums (running sums): at each position we compare the current running sum with the running sums seen before the currently evaluated value.
Let's look at an example. We use a std::vector with 5 values {1,2,3,4,5}, and we want to look for subarrays with a sum of 9.
We can "guess" that there are 2 subarrays with a sum of 9: {2,3,4} and {4,5}.
Let us investigate further.
Index          0   1   2   3   4
Value          1   2   3   4   5

We can now add a running sum and look at the delta between the running sum at the currently evaluated element and the running sum at its left neighbor, its over-next left neighbor, and so on. If such a delta equals our search value, then there must be a subarray producing this sum.

Running sum                      1   3   6  10  15
Delta to next left                   2   3   4   5
Delta to next-next left                  5   7   9
Delta to next-next-next left                 9  12
Example {2,3,4}: if we evaluate the 4, whose running sum is 10, and subtract the search value 9, we get 1, which is a previously seen running sum (1 + 9 = 10).
Example {4,5}: if we evaluate the 5, whose running sum is 15, and subtract the search value 9, we get 6, which is again a previously seen running sum (6 + 9 = 15).
We can find all solutions using the same approach.
So, the only thing we need to do is subtract the search value from the current running sum and check whether that difference has already occurred as a running sum before.
Like: "search value" + "previously calculated running sum" = "current running sum".
Or: "current running sum" − "search value" = "previously calculated running sum".
Again, for each element we do the subtraction and check whether we have already calculated such a running sum previously.
So we need to store all previously calculated running sums. And, because such a sum may appear more than once, we need to count how often each running sum has occurred.
This is a bit hard to digest, and you may need to think about it for a while to understand it.
With the above in mind, you can draft the following potential solution.
#include <iostream>
#include <vector>
#include <unordered_map>

int subarraySum(std::vector<int>& numbers, int searchSumValue) {
    // Here we will store the result
    int resultingSubarrayCount{};

    // Here we will store all running sums and how often each value appeared
    std::unordered_map<int, int> countOfRunningSums;

    // Continuously calculated running sum
    int runningSum{};

    // And initialize the first value (the empty prefix has sum 0)
    countOfRunningSums[runningSum] = 1;

    // Now iterate over all values in the vector
    for (const int n : numbers) {
        // Calculate the running sum
        runningSum += n;

        // Check if the needed previous running sum is already available,
        // and add its number of occurrences to our resulting number of subarrays
        resultingSubarrayCount += countOfRunningSums[runningSum - searchSumValue];

        // Store the new running sum; respectively, add 1 to its counter if it already existed
        countOfRunningSums[runningSum]++;
    }
    return resultingSubarrayCount;
}

int main() {
    std::vector v{ 1, 2, 3, 4, 5 };
    std::cout << subarraySum(v, 9);
}

Fastest sorting method for k digits and N elements, k <<< N

Question: There are n balls, labeled 0, 1, or 2, in chaotic order, and I want to sort them from smallest to largest. Balls:
1, 2, 0, 1, 1, 2, 2, 0, 1, 2, ...
We must use the fastest way to solve this and cannot use the sort() function. I thought of many approaches, like bubble sort, insertion sort, etc., but they are not fast. Is there an algorithm with time complexity O(log n) or O(n)?
Given the ball list A[] and its length n:
void sortBalls(int A[], int n)
{
    // code here
}
Given the very limited number of item types (0, 1, and 2), you just count the number of occurrences of each. Then to print the "sorted" array, you repeatedly print each label the number of times it occurred. Running time is O(N).
int balls[N] = {...};    // array of balls: initialized to whatever
int sorted_balls[N];     // sorted array of balls (to be set below)
int counts[3] = {};      // count of each label, zero-initialized array

// enumerate over the input array and count each label's occurrences
for (int i = 0; i < N; i++)
{
    counts[balls[i]]++;
}

// sort the items by just printing each label the number of times it was counted above
int k = 0;
for (int j = 0; j < 3; j++)
{
    for (int x = 0; x < counts[j]; x++)
    {
        cout << j << ", ";    // print
        sorted_balls[k] = j;  // store into the final sorted array
        k++;
    }
}
If you have a small number of possible values known in advance, and the value is everything you need to know about the ball (they carry no other attributes), "sorting" becomes equivalent to "counting how many of each value there are". So you generate a histogram - an array from 0 to 2, in your case - go through your values and increase the corresponding count. Then you generate an array of n_0 balls with number 0, n_1 balls with number 1 and n_2 with number 2, and voila, they're sorted.
It's trivially obvious that you cannot go below O(n) - at the very least, you have to look at each value once to count it, and for n values, that's n operations right away.
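For completeness, here is a minimal counting-sort sketch of the histogram idea described above, filling in the sortBalls stub from the question (my own arrangement of the same steps):

// Counting sort for ball labels restricted to {0, 1, 2}; runs in O(n).
void sortBalls(int A[], int n)
{
    int counts[3] = {0, 0, 0};
    for (int i = 0; i < n; ++i)
        counts[A[i]]++;          // build the histogram
    int k = 0;
    for (int label = 0; label < 3; ++label)
        for (int c = 0; c < counts[label]; ++c)
            A[k++] = label;      // rewrite the array in sorted order
}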

Intuition behind storing the remainders?

I am trying to solve a question on LeetCode.com:
Given a list of non-negative numbers and a target integer k, write a function to check if the array has a continuous subarray of size at least 2 that sums up to a multiple of k, that is, sums up to n*k where n is also an integer. For example, if [23, 2, 4, 6, 7] and k=6, then the output should be True, since [2, 4] is a continuous subarray of size 2 that sums up to 6.
I am trying to understand the following solution:
class Solution {
public:
    bool checkSubarraySum(vector<int>& nums, int k) {
        int n = nums.size(), sum = 0, pre = 0;
        unordered_set<int> modk;
        for (int i = 0; i < n; ++i) {
            sum += nums[i];
            int mod = k == 0 ? sum : sum % k;
            if (modk.count(mod)) return true;
            modk.insert(pre);
            pre = mod;
        }
        return false;
    }
};
I understand that we are trying to store 0, a%k, (a+b)%k, (a+b+c)%k, etc. into the hash set (where k != 0), and that we insert each value one iteration late since we want the subarray size to be at least 2.
But how does this guarantee that we get a subarray whose elements sum up to a multiple of k? What mathematical property guarantees this?
The set modk is gradually populated with all sums (considered modulo k) of contiguous sub-arrays starting at the beginning of the array.
The key observation is that:
a-b = n*k for some natural n iff
a-b ≡ 0 mod k iff
a ≡ b mod k
so if a contiguous sub-array nums[i_0 + 1]..nums[i_1] sums up to 0 modulo k, then the two prefixes nums[0]..nums[i_0] and nums[0]..nums[i_1] have the same sum modulo k.
Thus it's enough to find two distinct prefixes (sub-arrays starting at the beginning of the array) that have the same sum modulo k.
Luckily, there are only k such values, so you only need to use a set of size k.
Some nitpicks:
if n > k, you're going to have an appropriate sub-array anyway (the pigeon-hole principle), so the loop will actually never iterate more than k+1 times.
There should not be any sort of class involved here; that makes no sense.
contiguous, not continuous. Arrays and sub-arrays are discrete and can't be continuous...
The modulus base k of a sum is equal to the modulus base k of the sum of the individual moduli:
(a+b)%k = (a%k + b%k) % k
(23 + 2) % 6 = 1
( (23%6) + (2%6) ) % 6 = (5 + 2) % 6 = 1
modk stores all remainders that you calculate iteratively. If at iteration i you get a remainder that was already obtained at iteration i−m, that means you added a subsequence of m elements whose sum is a multiple of k.
i = 0   nums[0] = 23   sum = 23   sum % 6 = 5   modk = [5]
i = 1   nums[1] = 2    sum = 25   sum % 6 = 1   modk = [5, 1]
i = 2   nums[2] = 4    sum = 29   sum % 6 = 5   5 already exists in modk: (4 + 2) % 6 = 0
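The same idea can also be written with a map from remainder to the index of its first occurrence, which makes the "length at least 2" condition explicit. This is a sketch of the approach described above, not the original answer's code:

#include <unordered_map>
#include <vector>

bool checkSubarraySum(const std::vector<int>& nums, int k) {
    // Remainder of the running sum -> earliest index at which it was seen.
    std::unordered_map<int, int> firstIndex{{0, -1}}; // the empty prefix has remainder 0
    int sum = 0;
    for (int i = 0; i < (int)nums.size(); ++i) {
        sum += nums[i];
        int mod = (k == 0) ? sum : sum % k;
        auto it = firstIndex.find(mod);
        if (it != firstIndex.end()) {
            if (i - it->second >= 2) return true; // repeated remainder, subarray length >= 2
        } else {
            firstIndex[mod] = i;                  // remember only the first occurrence
        }
    }
    return false;
}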

How to calculate the minimum cost to convert all n numbers in an array to m?

I have been given the following assignment:
Given N integers in the form of A(i) where 1≤i≤N, make each number
A(i) in the N numbers equal to M. To convert a number A(i) to M, it
will cost |M−Ai| units. Find out the minimum cost to convert all the N
numbers to M, so you should choose the best M to get the minimum cost.
Given:
1 <= N <= 10^5
1 <= A(i) <= 10^9
My approach was to calculate the sum of all numbers, find avg = sum / n, and then add up the differences between each number and avg to get the cost.
But this fails in many test cases. How can I find the optimal solution for this?
You should take the median of the numbers (or either of the two numbers nearest the middle if the list has even length), not the mean.
An example where the mean fails to minimize is: [1, 2, 3, 4, 100]. The mean is 110 / 5 = 22, and the total cost is 21 + 20 + 19 + 18 + 78 = 156. Choosing the median (3) gives total cost: 2 + 1 + 0 + 1 + 97 = 101.
An example where the median lies between two items in the list is [1, 2, 3, 4, 5, 100]. Here the median is 3.5, and it's ok to either use M=3 or M=4. For M=3, the total cost is 2 + 1 + 0 + 1 + 2 + 97 = 103. For M=4, the total cost is 3 + 2 + 1 + 0 + 1 + 96 = 103.
A formal proof of correctness can be found on Mathematics SE, although you can convince yourself of the result by noting what happens if you nudge M a small amount delta in one direction (but not past one of the data points), say in the positive direction: the total cost changes by delta times the number of points to the left of M minus delta times the number of points to the right of M. So the cost is minimized when the numbers of points on either side of M are equal; otherwise you could move M a small amount one way or the other to decrease the total cost.
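A short sketch of the resulting computation (the function name is mine; std::nth_element finds a median in O(N) on average):

#include <algorithm>
#include <iostream>
#include <vector>

// Minimum total cost of making all elements equal: choose M as a median
// and sum the absolute differences.
long long minConversionCost(std::vector<long long> a) {
    std::nth_element(a.begin(), a.begin() + a.size() / 2, a.end());
    long long m = a[a.size() / 2]; // a median element
    long long cost = 0;
    for (long long x : a)
        cost += (x > m) ? x - m : m - x;
    return cost;
}

int main() {
    std::cout << minConversionCost({1, 2, 3, 4, 100}) << '\n'; // prints 101
}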
@PaulHankin already provided a perfect answer. Anyway, when thinking about the problem, I didn't think of the median being the solution. But even if you don't know about the median, you can come up with a programming solution.
I made similar observations as @PaulHankin in the last paragraph of his answer. This made me realize that I have to eliminate outliers iteratively in order to find m. So I wrote a program that first sorts the input array (vector) A and then analyzes the minimum and maximum values.
The idea is to move the minimum values towards the second-smallest value and the maximum values towards the second-largest value. You always move either the minimum or the maximum values, depending on whether you have fewer minimum values than maximum values or not. If all array items end up being the same value, then you have found your m:
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;

int getMinCount(vector<int>& A);
int getMaxCount(vector<int>& A);

int main()
{
    // Example as given by @PaulHankin
    vector<int> A;
    A.push_back(1);
    A.push_back(2);
    A.push_back(3);
    A.push_back(4);
    A.push_back(100);

    sort(A.begin(), A.end());
    int minCount = getMinCount(A);
    int maxCount = getMaxCount(A);

    while (minCount != A.size() && maxCount != A.size())
    {
        if (minCount <= maxCount)
        {
            for (int i = 0; i < minCount; i++)
                A[i] = A[minCount];

            // Recalculate the count of the minimum value, because we changed the minimum.
            minCount = getMinCount(A);
        }
        else
        {
            for (int i = 0; i < maxCount; i++)
                A[A.size() - 1 - i] = A[A.size() - 1 - maxCount];

            // Recalculate the count of the maximum value, because we changed the maximum.
            maxCount = getMaxCount(A);
        }
    }

    // Print out the one and only remaining value, which is m.
    cout << A[0] << endl;
    return 0;
}

int getMinCount(vector<int>& A)
{
    // Count how often the minimum value exists.
    int minCount = 1;
    int pos = 1;
    while (pos < A.size() && A[pos++] == A[0])
        minCount++;
    return minCount;
}

int getMaxCount(vector<int>& A)
{
    // Count how often the maximum value exists.
    int maxCount = 1;
    int pos = A.size() - 2;
    while (pos >= 0 && A[pos--] == A[A.size() - 1])
        maxCount++;
    return maxCount;
}
If you think about the algorithm, you will come to the conclusion that it actually calculates the median of the values in the array A. As example input I took the first example given by @PaulHankin. As expected, the code provides the correct result (3) for it.
I hope my approach helps you to understand how to tackle such kinds of problems even if you don't know the correct solution up front. This is especially helpful when you are in an interview, for example.

C++: function creation using array

Write a function which has:
input: an array of N pairs (unique id and weight), with K <= N
output: K random unique ids (from the input array)
Note: when called many times, the frequency with which an id appears in the output should be greater the more weight it has.
Example: an id with a weight of 5 should appear in the output 5 times more often than an id with a weight of 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
Currently I can't understand how the weight of a pair affects the frequency of its appearance in the output; can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
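A minimal sketch of a single weighted pick along those lines (the helper name and the use of rand() are my own choices; it assumes positive integer weights):

#include <cstdlib>
#include <utility>

// One weighted pick from an array of (id, weight) pairs.
int pickWeighted(const std::pair<int, int>* pairs, int n) {
    int totalWeight = 0;
    for (int i = 0; i < n; ++i)
        totalWeight += pairs[i].second;        // sum the weights

    int selection = std::rand() % totalWeight; // random value in [0, totalWeight)

    int running = 0;
    for (int i = 0; i < n; ++i) {
        running += pairs[i].second;
        if (running > selection)               // first prefix sum exceeding the selection
            return pairs[i].first;
    }
    return pairs[n - 1].first;                 // not reached for valid input
}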
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially, I created a temporary array, but this can also be done without materializing the list: you can calculate its size by summing up all the weights = X; in this example X = 17.
Pick a random number in [0, X-1] and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say I have the random number 8:
(3, 7) total = 7 which is < 8
(1, 2) total = 9 which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you and work with that. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: Note that you create the hashmap only once, at the start! Your algorithm then works on it instead of looking through the array. I did not put this at the top, to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store the K random pickings (K <= N) and a copy of the original array, so the maximum space requirement at runtime is O(2*N).
Asymptotic runtime is:
O(n) : create a copy of the original array into a hashtable
+ (
      O(n) : calculate the sum of weights +
      O(1) : calculate a random number in the range +
      O(n) : cumulative totals
  ) * K random pickings
= O(n*k) overall
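Putting the pieces together, a rough sketch of the K-unique-ids loop described above (names are mine; this ignores the question's compile-time memory constraint, just as the hashtable suggestion does):

#include <cstdlib>
#include <unordered_map>
#include <utility>
#include <vector>

// Draw K distinct ids with probability proportional to their weights.
std::vector<int> pickKUnique(const std::vector<std::pair<int, int>>& pairs, int K) {
    std::unordered_map<int, int> pool(pairs.begin(), pairs.end()); // id -> weight
    std::vector<int> out;
    for (int k = 0; k < K && !pool.empty(); ++k) {
        int total = 0;
        for (const auto& p : pool) total += p.second;  // O(n): sum of remaining weights
        int r = std::rand() % total;                   // O(1): random in [0, total)
        int running = 0;
        for (const auto& p : pool) {
            running += p.second;                       // O(n): cumulative totals
            if (running > r) {
                int id = p.first;
                out.push_back(id);                     // this id is picked
                pool.erase(id);                        // remove it so it cannot repeat
                break;
            }
        }
    }
    return out;
}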
This is a good question :)
This solution works with non-integer weights and uses constant space (i.e., space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call that index i.
Add input[i].id to output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
subtract 1 from n
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call that index i.
Add input[i].id to output.
Fix back all of the weights from n-1 down to 1 by subtracting the preceding input's weight
Time complexity is O(K*log(N)).
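A sketch of this simpler, non-unique variant using prefix sums and std::upper_bound for the binary search (the struct and function names are mine; it assumes a non-empty input with positive weights):

#include <algorithm>
#include <random>
#include <vector>

struct Entry { int id; double weight; };

// Draw K ids (repetition allowed) with probability proportional to weight.
std::vector<int> sampleWithReplacement(std::vector<Entry> input, int K) {
    // Turn the weights into running (prefix) sums, as described above.
    for (std::size_t i = 1; i < input.size(); ++i)
        input[i].weight += input[i - 1].weight;
    const double sumWeights = input.back().weight;

    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> dist(0.0, sumWeights);

    std::vector<int> out;
    for (int k = 0; k < K; ++k) {
        const double r = dist(gen);
        // Binary search for the first slot whose summed weight exceeds r.
        auto it = std::upper_bound(input.begin(), input.end(), r,
            [](double value, const Entry& e) { return value < e.weight; });
        out.push_back(it->id);
    }
    return out;
}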
My short answer: there is no way.
Simply because the problem definition is self-contradictory. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is small relative to N, the calculated frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate the sum of the weights (sumOfWeights)
Generate a random number in the range [1; sumOfWeights]
Find the first array element where the sum of the weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
    }

    const unsigned random = rand() % sumOfWeights + 1;

    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Durstenfeld-Fisher-Yates algorithm may be used to generate unique random numbers. See this great explanation.
It requires O(N) additional space, so if N is defined at compile time, we are able to allocate the necessary space at compile time.
Now we have to combine these two algorithms. We just need to use our own Random() function instead of the standard rand() in the unique-number generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N];
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }

    unsigned max = N - 1;

    for (unsigned i = 0; i < K; ++i)
    {
        const unsigned index = Random(i_set, deck, max + 1);

        std::swap(deck[max], deck[index]);
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));

    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K

    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};

    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0};       // frequency counters

    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }

    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }

    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total     // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights  // expected frequency
                  << std::endl;
    }

    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes this problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(N^2) time, using O(N) memory (The input array itself plus constant memory).
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to be A.length.
2) Calculate the sum of all weights, W.
3) Loop K times:
   3.1) r = rand(0, W)
   3.2) Loop over A and find the first index i such that A[1].w + ... + A[i-1].w <= r < A[1].w + ... + A[i].w
   3.3) Add A[i].id to the output.
   3.4) W = W - A[i].w
   3.5) A[i] = A[N-1] (or swap if the array contents should be preserved)
   3.6) N = N - 1