I am solving a problem where I have a large array, and for two given indices, I need to find the sums of all contiguous sub-arrays between them.
All I could think of is this O(n²) code:
for (i = min; i <= max; ++i)
{
    sum = 0;
    for (j = i; j <= max; ++j)
    {
        sum += a[j];
        printf("%lld\n", sum);
    }
}
Can anyone please help me optimise this code?
Using dynamic programming you can get an O(n) answer. The basic idea is to precompute prefix sums over all elements.
Let A(i) be the sum of the elements from 0 to i. This can be calculated easily in O(n) by:
// let your array be Src[MAX]
int A[MAX];
A[0] = Src[0];
for (int i = 1; i < MAX; i++) A[i] = A[i-1] + Src[i];
Then for any indices i and j you can calculate sum(i, j) as A[j] - A[i-1] (adjust for the i = 0 boundary depending on input requirements).
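One way to handle the boundary is to keep a prefix array with a leading zero, so no special case is needed for i = 0. A minimal sketch (the names P and range_sum are mine, not from the answer above):

#include <cstdio>
#include <vector>

int main()
{
    std::vector<long long> Src = {3, 1, 4, 1, 5};    // example data

    // P[k] = sum of Src[0..k-1], so P[0] = 0
    std::vector<long long> P(Src.size() + 1, 0);
    for (size_t k = 0; k < Src.size(); ++k)
        P[k + 1] = P[k] + Src[k];

    int i = 1, j = 3;                                // inclusive range [i, j]
    long long range_sum = P[j + 1] - P[i];           // = Src[1] + Src[2] + Src[3]
    std::printf("%lld\n", range_sum);                // prints 6
    return 0;
}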
When max - min + 1 is n, there are n(n+1)/2 sums that you need to print. That's O(n²) values. Any algorithm that produces O(n²) values needs at least O(n²) time, so your solution is already optimal.
There is no faster solution.
Since your output size is O(n²), no algorithm can be faster.
For extremely large computations, or things like real-time data analysis, since the contents of the array do not change, you may do the calculation in parallel threads.
For general cases, just loop over them and let the compiler unroll the loops and use vectorized instructions.
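If you do try the threaded route, a minimal sketch (assuming OpenMP is available; the function name all_subarray_sums and the output layout are my own choices) could give each starting index its own slice of a preallocated output buffer, so no synchronization is needed:

#include <vector>

// Computes every contiguous subarray sum of a[]; there are n*(n+1)/2 of them.
std::vector<long long> all_subarray_sums(const std::vector<long long>& a)
{
    const long long n = (long long)a.size();
    std::vector<long long> out(n * (n + 1) / 2);

    #pragma omp parallel for schedule(dynamic)   // harmless no-op if OpenMP is disabled
    for (long long i = 0; i < n; ++i) {
        // Sums starting at i occupy out[offset .. offset + (n - i) - 1].
        const long long offset = i * n - i * (i - 1) / 2;
        long long sum = 0;
        for (long long j = i; j < n; ++j) {
            sum += a[j];
            out[offset + (j - i)] = sum;
        }
    }
    return out;
}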
I am doing this problem called 4sum. Link to problem.
Given an array of n numbers and an integer (the target element), calculate the total number of quads (collections of 4 distinct numbers chosen from these n numbers) for which the sum of the quad's elements adds up to the target element.
I wrote this code for the brute-force approach. By my reasoning the big-O time complexity comes out to be n^4 log(n^4); I have given the reasoning below. However, the complexity is supposed to be n^4 only. Please help me understand what I am missing.
set<vector<int>> s;
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        for (int k = j + 1; k < n; k++) {
            for (int l = k + 1; l < n; l++) {
                if (a[i]+a[j]+a[k]+a[l]==target) {
                    s.insert({a[i], a[j], a[k], a[l]});
                }
            }
        }
    }
}
The logic is to generate all possible quads (with distinct elements), then for each quad check whether the sum of its elements equals the target. If yes, insert the quad into the set.
Now, we cannot know how many quads will match this condition because this solely depends on input. But to get the upper bound we assume that every quad that we check satisfies the condition. Hence, there are a total of N^4 insertions in the set.
For N^4 insertions --- complexity is N^4 log(N^4).
if (a[i]+a[j]+a[k]+a[l]==target) {
    s.insert({a[i], a[j], a[k], a[l]});
}
This gets executed O(N^4) times, indeed.
to get the upper bound we assume that every quad that we check satisfies the condition.
Correct.
For N^4 insertions --- complexity is N^4 log(N^4).
Not so, because N^4 insertions do not necessarily result in a set with N^4 elements.
The cost of an insertion is O(log(s.size())). But s.size() is upper-bounded by the number of distinct ways K in which target can be expressed as a sum of 4 integers in a given range, so the worst-case cost is O(log(K)). While K can be a large number, it does not depend on N, so as far as the complexity in N is concerned, this counts as constant time O(1), and therefore the overall complexity is still O(N^4)·O(1) = O(N^4).
[ EDIT ] Regarding #MysteriousUser's suggestion to use std::unordered_set instead of std::set, that would change the O(1) constant of the loop body to a better one, indeed, but would not change the overall complexity, which would still be O(N^4).
The other option, which would in fact increase the complexity to O(N^4 log(N)) as computed by the OP, would be std::multiset, since in that case each insertion results in a size increase of the multiset.
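As a quick sanity check of the difference (this snippet is mine, not part of the original answer): repeatedly inserting the same quad leaves a std::set at size 1, while a std::multiset keeps growing, which is why only the multiset variant pushes the per-insertion cost toward O(log N^4):

#include <cstdio>
#include <set>
#include <vector>

int main()
{
    std::set<std::vector<int>> s;
    std::multiset<std::vector<int>> ms;

    // Simulate many "successful" quads that all happen to have the same values.
    for (int rep = 0; rep < 1000; ++rep) {
        s.insert({1, 2, 3, 4});   // duplicate keys are rejected: size stays at 1
        ms.insert({1, 2, 3, 4});  // duplicate keys are kept: size keeps growing
    }
    std::printf("set size: %zu, multiset size: %zu\n", s.size(), ms.size());
    return 0;
}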
What is the best way to sort a section-wise sorted array as depicted in the second image?
The problem is performing a quicksort using the Message Passing Interface (MPI). The solution performs quicksort on array sections obtained with MPI_Scatter() and then joins the sorted pieces using MPI_Gather().
The problem is that the array as a whole is unsorted, but its sections are sorted.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays are already within a single array, other sorting algorithms may prove better.
The inputs for a sort function would be the array, its length and the number of equally sized, individually sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter value is greater than 0, a multiple of sections and smaller than 2^32.
This is what I am currently using:
int* merge(int* input, int length, int sections)
{
    int* sub_sections_indices = new int[sections];
    int* result = new int[length];
    int section_size = length / sections;

    for (int i = 0; i < sections; i++) // initialisation
    {
        sub_sections_indices[i] = 0;
    }

    int min, min_index, current_index;
    for (int i = 0; i < length; i++) // merging
    {
        min_index = 0;
        min = INT_MAX;
        for (int j = 0; j < sections; j++)
        {
            if (sub_sections_indices[j] < section_size)
            {
                current_index = j * section_size + sub_sections_indices[j];
                if (input[current_index] < min)
                {
                    min = input[current_index];
                    min_index = j;
                }
            }
        }
        sub_sections_indices[min_index]++;
        result[i] = min;
    }

    delete[] sub_sections_indices; // free the per-section cursors
    return result;
}
Optimizing for performance
I think this answer that maintains a min-heap of the smallest item of each sub-array is the best way to handle arbitrary input. However, for small values of k, think somewhere between 10 and 100, it might be faster to implement the more naive solutions given in the question you linked to; while maintaining the min-heap is only O(log k) per step, it might have a higher overhead for small values of k than the simple linear scan of the naive solutions.
All these solutions create a copy of the input, and they maintain O(k) state.
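For reference, a minimal sketch of that min-heap version, keeping the same layout and signature style as the code in the question (the names heap_merge, entry and next are mine):

#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merge of `sections` equally sized sorted sub-sections of input[].
int* heap_merge(int* input, int length, int sections)
{
    int* result = new int[length];
    const int section_size = length / sections;

    // Min-heap of (current value, section index) pairs, one entry per section.
    using entry = std::pair<int, int>;
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    std::vector<int> next(sections, 0);   // next unread offset within each section

    for (int j = 0; j < sections; ++j)
        if (section_size > 0)
            heap.push({input[j * section_size], j});

    for (int i = 0; i < length; ++i) {
        const entry top = heap.top();     // smallest remaining value overall
        heap.pop();
        result[i] = top.first;
        const int j = top.second;
        if (++next[j] < section_size)
            heap.push({input[j * section_size + next[j]], j});
    }
    return result;
}

With sections capped at 25, as in your constraints, the linear scan you already have may well be competitive, so it is worth benchmarking both.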
Optimizing for space
The only way to save space I see is to sort in-place. This will be a problem for the algorithms mentioned above. An in-place algorithm will have to swap elements, but any swap will likely destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped into, which results in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.
Given that the input will be N numbers in the range 0 to N (with duplicates), how can I optimize the code below for both small and big arrays:
void countingsort(int* input, int array_size)
{
    int max_element = array_size; // because no number will be > N
    int *CountArr = new int[max_element+1]();
    for (int i = 0; i < array_size; i++)
        CountArr[input[i]]++;
    for (int j = 0, outputindex = 0; j <= max_element; j++)
        while (CountArr[j]--)
            input[outputindex++] = j;
    delete []CountArr;
}
Having a stable sort is not a requirement.
edit: In case it's not clear, I am talking about optimizing the algorithm.
IMHO there's nothing wrong here. I highly recommend this approach when max_element is small and the numbers being sorted are non-sparse (i.e. consecutive, with no gaps) and greater than or equal to zero.
As a small tweak, I'd replace new/delete and just declare a fixed-size array on the stack, e.g. 256 for max_element.
int CountArr[256] = { }; // Declare and initialize with zeroes
As you bend these rules (sparse data, negative numbers), you'd start struggling with this approach. You would need to find a suitable hashing function to remap the numbers to an efficient array index. The more complex the hashing becomes, the more the benefit of this over well-established sorting algorithms diminishes.
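For example, if the only rule being bent is that values may be negative but still lie in a known range [lo, hi], the "hash" can simply be an offset by the minimum. A sketch with made-up bounds, not from the answer above:

#include <vector>

// Counting sort for values known to lie in [lo, hi]; the remap is just v - lo.
void countingsort_range(std::vector<int>& v, int lo, int hi)
{
    std::vector<int> count(hi - lo + 1, 0);
    for (int x : v)
        ++count[x - lo];              // map the value to a non-negative index
    int out = 0;
    for (int j = 0; j <= hi - lo; ++j)
        while (count[j]--)
            v[out++] = j + lo;        // map the index back to the original value
}

// usage: countingsort_range(data, -50, 50);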
In terms of complexity this cannot be beaten. It's O(N) and beats standard O(N log N) sorting by exploiting the extra knowledge that 0 ≤ x ≤ N. You cannot go below O(N) because you need to sweep through the input array at least once.
The full problem statement is here. Suppose we have a double-ended queue of known values. Each turn, we take a value from one end or the other, and a value taken on turn t contributes value*t, so the values still in the queue become worth more as the turns go by. The goal is to find the maximum possible total value.
My first approach was to use straightforward top-down DP with memoization. Let i, j denote the starting and ending indexes of a "subarray" of the array of values A[].
f(i,j,age) = A[i]*age                                                    if i == j
f(i,j,age) = max(f(i+1,j,age+1) + A[i]*age, f(i,j-1,age+1) + A[j]*age)   otherwise
This works; however, it proves to be too slow because of the overhead of the recursive calls. An iterative bottom-up version should be faster.
Let m[i][j] be the maximum reachable value of the "subarray" of A[] with begin/end indexes i,j. Because i <= j, we care only about the lower triangular part.
This matrix can be built iteratively using the fact that m[i][j] = max(m[i-1][j] + A[i]*age, m[i][j-1] + A[j]*age), where age is maximal on the diagonal (where it equals the size of A[]) and decreases linearly as A.size() - (i - j).
My attempt at an implementation fails with a bus error.
Is the described algorithm correct? What is the cause of the bus error?
Here is the only part of the code where the bus error might occur:
for(T j = 0; j < num_of_treats; j++) {
    max_profit[j][j] = treats[j]*num_of_treats;
    for(T i = j+1; i < num_of_treats; i++)
        max_profit[i][j] = max( max_profit[i-1][j] + treats[i]*(num_of_treats-i+j),
                                max_profit[i][j-1] + treats[j]*(num_of_treats-i+j));
}
for(T j = 0; j < num_of_treats; j++) {
Inside this loop, j is clearly a valid index into the array max_profit. But you're not using just j.
The bus error is caused by trying to access the array via a negative index when j=0 and i=1, as I should have noticed during debugging. The algorithm is wrong as well. First, the relationship used to construct the max_profit[][] array should be
max_profit[i][j] = max( max_profit[i+1][j] + treats[i]*(num_of_treats-i+j),
max_profit[i][j-1] + treats[j]*(num_of_treats-i+j));
Second, the array must be filled diagonally, so that max_profit[i+1][j] and max_profit[i][j-1] are already computed, with the exception of the main diagonal.
Third, the data structure chosen is extremely inefficient. I am using only half of the space allocated for max_profit[][]. Plus, at each iteration, I only need the last computed diagonal. An array of size num_of_treats should suffice.
Here is working code using this improved algorithm. I really like it. I even used bit operators for the first time.
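Since that code isn't reproduced here, the following is just a minimal sketch (mine, not the linked solution) of the single-diagonal, O(n)-space formulation described above, using names modeled on treats and num_of_treats from the question:

#include <algorithm>
#include <cstdio>
#include <vector>

// dp[i] holds the best value for the current-length subarray starting at i.
long long max_total_value(const std::vector<long long>& treats)
{
    const int num_of_treats = (int)treats.size();
    if (num_of_treats == 0) return 0;
    std::vector<long long> dp(num_of_treats);

    // Length-1 subarrays: the single remaining treat is taken on the last turn.
    for (int i = 0; i < num_of_treats; ++i)
        dp[i] = treats[i] * num_of_treats;

    // Grow the subarray length; a subarray of length len has its first
    // element taken on turn age = num_of_treats - len + 1.
    for (int len = 2; len <= num_of_treats; ++len) {
        const long long age = num_of_treats - len + 1;
        for (int i = 0; i + len <= num_of_treats; ++i) {
            const int j = i + len - 1;
            dp[i] = std::max(dp[i + 1] + treats[i] * age,   // take treats[i] now
                             dp[i]     + treats[j] * age);  // take treats[j] now
        }
    }
    return dp[0];   // best total for the whole array
}

int main()
{
    std::vector<long long> treats = {1, 3, 1, 5, 2};
    std::printf("%lld\n", max_total_value(treats));
    return 0;
}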
Given a big array which has numbers in the range from 1 to 100, what's the best approach to sort it?
The interviewer was emphasizing the word range, i.e. the maximum number present in the array is 100.
try this:
long result[100] = {0};
for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it)
{
    result[*it - 1]++;
}
So you move linearly over your vector and count every number that occurs. As a result you get how many 1s you had, how many 2s you had, and so on, i.e. the counts effectively give you the sorted order.
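If you also need the sorted sequence itself rather than just the counts, a short write-back pass over result finishes the job (a sketch continuing the snippet above; remember result[] is indexed by value - 1):

// Write the values back into vec in sorted order, using the counts.
size_t pos = 0;
for (int value = 1; value <= 100; ++value)
    for (long c = result[value - 1]; c > 0; --c)
        vec[pos++] = value;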
UPD: as KillianDS wrote, I mean counting sort. It's the fast one.
Well, since the answer was basically given, here is example code. There's no need to copy data from the original array; it can be regenerated from the histogram, as described in the variants section of the Wikipedia counting sort article:
std::vector<size_t> hist(101, 0); // using index 1 to 100 inclusive
size_t i, j, n;
for (i = 0; i < vec.size(); i++)
    hist[vec[i]]++;
i = 0;
for (j = 1; j <= 100; j++)
    for (n = hist[j]; n; n--)
        vec[i++] = j;
Maybe they wanted to hear about radix sort.
It seems counting sort is the most suitable algorithm for this problem: it's O(n), stable, and easy to implement. http://en.wikipedia.org/wiki/Counting_sort