Largest rectangle in histogram - C++

I'm working on the below algorithm puzzle and here is the detailed problem statement.
Find the largest rectangle of the histogram; for example, given histogram = [2,1,5,6,2,3], the algorithm should return 10.
I am working on the version of the code below. My question is: I think i-nextTop-1 could be replaced by i-top, but in some test cases (e.g. [2,1,2]) they produce different results (i-nextTop-1 always produces the correct result). I think logically they should be the same, and I am wondering in what situations i-nextTop-1 is not equal to i-top.
class Solution {
public:
    int largestRectangleArea(vector<int>& height) {
        height.push_back(0);            // sentinel bar of height 0 flushes the stack at the end
        int result = 0;
        stack<int> indexStack;          // indices of bars with non-decreasing heights
        for (int i = 0; i < height.size(); i++) {
            while (!indexStack.empty() && height[i] < height[indexStack.top()]) {
                int top = indexStack.top();
                indexStack.pop();
                int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
                result = max((i - nextTop - 1) * height[top], result);
            }
            indexStack.push(i);
        }
        return result;
    }
};

The situations where i-nextTop-1 != i-top occur are when the following is true:
nextTop != top-1
This can be seen by simply rearranging terms in the inequality i-nextTop-1 != i-top.
The key to understanding when this occurs lies in the following line within your code, in which you define the value of nextTop:
int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
Here, you are saying that if indexStack is empty (following the pop() on the previous line of code), then set nextTop to -1; otherwise set nextTop to the current indexStack.top().
So the only times when nextTop == top-1 are when
indexStack is empty and top == 0, or
indexStack.top() == top - 1.
In those cases, the two methods will always agree. In all other situations, they will not agree, and will produce different results.
You can see what is happening by printing the values of i, nextTop, (i - top), (i - nextTop - 1), and result for each iteration at the bottom of the while loop. The vector {5, 4, 3, 2, 1} works fine, but { 1, 2, 3, 4, 5} does not, when replacing i-nextTop-1 with i-top.
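As a rough sketch (assuming the usual LeetCode-style includes and using namespace std), that instrumentation could be a single print placed right after the result update inside the while loop:

    cout << "i=" << i
         << " top=" << top
         << " nextTop=" << nextTop
         << " i-top=" << (i - top)
         << " i-nextTop-1=" << (i - nextTop - 1)
         << " result=" << result << endl;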
Theory of the Algorithm
The outer for loop iterates through the histogram elements one at a time. Elements are pushed onto the stack from left to right, and upon entry to the while loop the top of stack contains the element just prior to (or just to the left of) the current element. (This is because the current element is pushed onto the stack at the bottom of the for loop, right before looping back to the top.)
An element is popped off the stack within the while loop, when the algorithm has determined that the best possible solution that includes that element has already been considered.
The inner while loop will keep iterating as long as height[i] < height[indexStack.top()], that is, as long as the height of the current element is less than the height of the element on the top of the stack.
At the start of each iteration of the while loop, the elements on the stack represent all of the contiguous elements to the immediate left of the current element, that are larger than the current element.
This allows the algorithm to calculate the area of the largest rectangle to the left of and including the current element. This calculation is done in the following two lines of code:
int nextTop = indexStack.size() == 0 ? -1 : indexStack.top();
result = max((i - nextTop - 1) * height[top], result);
Variable i is the index of the current histogram element; the rectangle currently being calculated ends just before it (its rightmost bar is at index i - 1).
Variable nextTop is the index of the bar just to the left of the rectangle, so the rectangle's leftmost bar is at index nextTop + 1 (with nextTop = -1 when the rectangle extends to the left end of the histogram).
The expression (i - nextTop - 1) represents the horizontal width of the rectangle. height[top] is the vertical height of the rectangle, so the result is the product of these two terms.
Each new result is the larger of the new calculation and the previous value for result.
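To make the difference concrete, here is a hand trace of the questioner's failing case {2, 1, 2} (with the appended sentinel 0 at index 3); the numbers below are worked out by hand from the code above:

    i=1 pops top=0 (height 2): nextTop=-1, so i-nextTop-1 = 1 and i-top = 1  -> both give area 2
    i=3 pops top=2 (height 2): nextTop=1,  so i-nextTop-1 = 1 and i-top = 1  -> both give area 2
    i=3 pops top=1 (height 1): nextTop=-1, so i-nextTop-1 = 3 but i-top = 2  -> areas 3 vs 2

The correct answer for {2, 1, 2} is 3 (a rectangle of height 1 spanning all three bars). i-top misses it because index 0 was popped earlier, so the bar at top-1 is not the one bounding the rectangle on the left; nextTop is.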


Intuition behind incrementing the iteration variable?

I am solving a question on LeetCode.com:
Given an array with n objects colored red, white or blue, sort them in-place so that objects of the same color are adjacent, with the colors in the order red, white and blue. Here, they use the integers 0, 1, and 2 to represent the color red, white, and blue respectively. [The trivial counting sort cannot be used].
For the input: [2,0,2,1,1,0]; the output expected is: [0,0,1,1,2,2].
One of the highly upvoted solutions goes like this:
void sortColors(vector<int>& A) {
    if (A.empty() || A.size() < 2) return;
    int low = 0;
    int high = A.size() - 1;
    for (int i = low; i <= high;) {
        if (A[i] == 0) {
            // swap A[i] and A[low]; i and low both ++
            int temp = A[i];
            A[i] = A[low];
            A[low] = temp;
            i++; low++;
        } else if (A[i] == 2) {
            // swap A[i] and A[high] and high--
            int temp = A[i];
            A[i] = A[high];
            A[high] = temp;
            high--;
        } else {
            i++;
        }
    }
}
My question is, why is i incremented when A[i]==0 and A[i]==1 and not when A[i]==2? Using pen and paper, the algorithm just works to give me the answer; but could you please provide some intuition?
Thanks!
This steps through the array and maintains the constraint that the elements 0..i are sorted, and all either 0 or 1. (The 2's that were there get swapped to the end of the array.)
When A[i]==0, you're swapping the element at i (which we just said was 0) with the element at low, which is the first 1-element (if any) in the range 0..i. Hence, after the swap, A[i]==1 which is OK (the constraint is still valid). We can safely move forward in the array now. The same is true if A[i]==1 originally, in which case no swap is performed.
When A[i]==2, you're essentially moving element i (which we just said was 2) to the end of the array. But you're also moving something from the end of the array into element i's place, and we don't know what that element is (because we haven't processed it before, unlike the A[i]==0 case). Hence, we cannot safely move i forward, because the new element at A[i] might not be in the right place yet. We need another iteration to process the new A[i].
That is because, for 0s and 1s, only items to the left of the current item are handled, and those have already been reviewed and sorted. Only for 2s are items from the right end of the array handled, and those haven't been looked at yet.
To be more specific, only three different states are handled here:
the current item being reviewed equals 0: in this case the algorithm puts this item at the end of all the zeros that have already been sorted (i.e., at A[low]). The item which was at A[low] before can only be a 0 or a 1 (since that prefix has already been sorted), which means you can just swap it with the current item without breaking the sequence. Up until now, every item from A[0] through A[i] has already been sorted, so the next item to be reviewed is A[i + 1], hence the i++.
the current item equals 1: in this case, no swapping has to be done, since all 0s and 1s have already been put in A[0] to A[i - 1] and all 2s have already been put at the end of the array. That means the next item to be reviewed is A[i + 1], hence the i++.
the current item equals 2: in this case, the current item is put at the end of the array, next to (i.e., to the left of) all the other already-sorted 2s (at A[high]). The item which is swapped from A[high] to A[i] has not been sorted yet and therefore has to be reviewed in the next step, hence i stays the same (a hand trace follows below).
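A hand trace of the example input [2,0,2,1,1,0], worked out by hand from the code above, shows these three cases in action:

    i=0, low=0, high=5: A[0]=2 -> swap with A[5] -> [0,0,2,1,1,2], high=4
    i=0, low=0, high=4: A[0]=0 -> swap with itself,                i=1, low=1
    i=1, low=1, high=4: A[1]=0 -> swap with itself,                i=2, low=2
    i=2, low=2, high=4: A[2]=2 -> swap with A[4] -> [0,0,1,1,2,2], high=3
    i=2, low=2, high=3: A[2]=1 ->                                  i=3
    i=3, low=2, high=3: A[3]=1 ->                                  i=4 > high, stop

Note that i stands still only on the steps where a 2 is swapped in from the unexamined right end.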

Finding maximum values of the rest of the array

For example:
array[] = {3, 9, 10, 12, 1, 4, 7, 2, 6, 5}
First, I need the maximum value, 12. Then I need the maximum value among the rest of the array (1, 4, 7, 2, 6, 5), which is 7; then the maximum of the rest of the array, 6; then 5. In the end I need the series of these values, which gives back (12, 7, 6, 5).
How to get these numbers?
I have tried the following code, but it seems to loop indefinitely.
I think I'll need a recursive function, but how can I do this?
max = 0; max2 = 0; ...
for (i = 0; i < array_length; i++) {
    if (matrix[i] >= max)
        max = matrix[i];
    else {
        for (j = i; j < array_length; j++) {
            if (matrix[j] >= max2)
                max2 = matrix[j];
            else {
                ...
                ...for if else for if else
                ...??
            }
        }
    }
}
This is how you would do that in C++11 by using the std::max_element() standard algorithm:
#include <vector>
#include <algorithm>
#include <iostream>

int main()
{
    int arr[] = {3, 5, 4, 12, 1, 4, 7, 2, 6, 5};
    auto m = std::begin(arr);
    while (m != std::end(arr))
    {
        m = std::max_element(m, std::end(arr));
        std::cout << *(m++) << std::endl;
    }
}
This is an excellent spot to use the Cartesian tree data structure. A Cartesian tree is a data structure built out of a sequence of elements with these properties:
The Cartesian tree is a binary tree.
The Cartesian tree obeys the heap property: every node in the Cartesian tree is greater than or equal to all its descendants.
An inorder traversal of a Cartesian tree gives back the original sequence.
For example, given the sequence
4 1 0 3 2
The Cartesian tree would be
4
 \
   3
  / \
 1   2
  \
   0
Notice that this obeys the heap property, and an inorder walk gives back the sequence 4 1 0 3 2, which was the original sequence.
But here's the key observation: notice that if you start at the root of this Cartesian tree and start walking down to the right, you get back the number 4 (the biggest element in the sequence), then 3 (the biggest element in what comes after that 4), and the number 2 (the biggest element in what comes after the 3). More generally, if you create a Cartesian tree for the sequence, then start at the root and keep walking to the right, you'll get back the sequence of elements that you're looking for!
The beauty of this is that a Cartesian tree can be constructed in time Θ(n), which is very fast, and walking down the spine takes time only O(n). Therefore, the total amount of time required to find the sequence you're looking for is Θ(n). Note that the approach of "find the largest element, then find the largest element in the subarray that appears after that, etc." would run in time Θ(n²) in the worst case if the input was sorted in descending order, so this solution is much faster.
Hope this helps!
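The answer above stays at the level of ideas, so here is a small C++ sketch of the relevant part (my own illustration, with the question's array hard-coded). The usual stack-based Θ(n) Cartesian-tree construction maintains exactly the right spine of the tree built so far, and for this problem the spine is all we need, so left children are never recorded:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a = {3, 9, 10, 12, 1, 4, 7, 2, 6, 5};

    // rightSpine holds the values on the right spine of the (max-)Cartesian
    // tree of the elements seen so far, from the root downwards.  Appending
    // a new element pops every spine value smaller than it (those nodes
    // become left descendants of the new node) and then pushes the element.
    std::vector<int> rightSpine;
    for (int x : a) {
        while (!rightSpine.empty() && rightSpine.back() < x)
            rightSpine.pop_back();
        rightSpine.push_back(x);
    }

    // Walking down the spine prints 12 7 6 5 for this input.
    for (int x : rightSpine)
        std::cout << x << ' ';
    std::cout << '\n';
}

Incidentally, this is the same monotonic-stack idea that the std::find_if answer further down implements with a vector.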
If you can modify the array, your code becomes simpler. Whenever you find a max, output it and change its value inside the original array to some very small number, for example -MAXINT. Once you have output as many values as there are elements in the array, you can stop your iterations.
std::vector<int> output;
for (auto i : array)          // 'array' is the input sequence from the question
{
    auto pos = std::find_if(output.rbegin(), output.rend(),
                            [i](int n) { return n > i; }).base();
    output.erase(pos, output.end());
    output.push_back(i);
}
Hopefully you can understand that code. I'm much better at writing algorithms in C++ than describing them in English, but here's an attempt.
Before we start scanning, output is empty. This is the correct state for an empty input.
We start by looking at the first unlooked at element I of the input array. We scan backwards through the output until we find an element G which is greater than I. Then we erase starting at the position after G. If we find none, that means that I is the greatest element so far of the elements we've searched, so we erase the entire output. Otherwise, we erase every element after G, because I is the greatest starting from G through what we've searched so far. Then we append I to output. Repeat until the input array is exhausted.
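For completeness, a self-contained version might look like this; the includes, main(), and the sample array are my additions, while the loop body is the answer's code unchanged:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> array = {3, 9, 10, 12, 1, 4, 7, 2, 6, 5};

    std::vector<int> output;
    for (auto i : array) {
        // Scan output backwards for the first element greater than i,
        // erase everything after it, then append i.
        auto pos = std::find_if(output.rbegin(), output.rend(),
                                [i](int n) { return n > i; }).base();
        output.erase(pos, output.end());
        output.push_back(i);
    }

    for (int v : output)
        std::cout << v << ' ';   // prints: 12 7 6 5
    std::cout << '\n';
}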

Fastest way to find median in dynamically growing range

Can anyone suggest any methods, or link to implementations, of fast median finding for dynamic ranges in C++? For example, suppose that on each iteration of my program the range grows, and I want to find the median at each run.
Range
4
3,4
8,3,4
2,8,3,4
7,2,8,3,4
So the above sequence would ultimately produce 5 median values, one for each line.
The best you can get without also keeping track of a sorted copy of your array is to re-use the old median and update it with a linear-time search for the next-biggest value. This might sound simple; however, there is a problem we have to solve.
Consider the following list (sorted for easier understanding, but you keep them in an arbitrary order):
1, 2, 3, 3, 3, 4, 5
//          *  (the median)
So here, the median is 3 (the middle element, since the list is sorted). Now if you add a number which is greater than the median, this potentially "moves" the median to the right by half an index. I see two problems: How can we advance by half an index? (With an even count, the median is by definition the mean of the two middle values.) And how do we know at which 3 the median was, when we only know that the median was 3?
This can be solved by storing not only the current median but also the position of the median within the numbers of same value, here it has an "index offset" of 1, since it's the second 3. Adding a number greater than or equal to 3 to the list changes the index offset to 1.5. Adding a number less than 3 changes it to 0.5.
When this offset becomes less than zero, the median value changes. It also has to change if the offset goes beyond the count of equal numbers minus 1 (in this case 2), meaning the new median is larger than the last equal number. In both cases, you have to search for the next smaller / next greater number and update the median value. To always know the upper limit for the index offset (in this case 2), you also have to keep track of the count of equal numbers.
This should give you a rough idea of how to implement median updating in linear time.
I think you can use a min-max-median heap. Each time the array is updated, you need only O(log n) time to find the new median value. In a min-max-median heap, the root is the median value, the left subtree is a min-max heap, and the right subtree is a max-min heap. Please refer to the paper "Min-Max Heaps and Generalized Priority Queues" for the details.
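The min-max-median heap from the paper is one way to get O(log n) updates; if you just want something quick to write with the standard library, the common two-heap variant gives the same bounds. Here is a minimal sketch of that alternative (my own, not taken from the paper), using a max-heap for the lower half and a min-heap for the upper half:

#include <iostream>
#include <queue>
#include <vector>

class RunningMedian {
    std::priority_queue<int> lower;                                       // max-heap: smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> upper;  // min-heap: larger half
public:
    void insert(int x) {                     // O(log n)
        if (lower.empty() || x <= lower.top()) lower.push(x);
        else upper.push(x);
        // Rebalance so the halves differ in size by at most one (lower may be the larger).
        if (lower.size() > upper.size() + 1) { upper.push(lower.top()); lower.pop(); }
        else if (upper.size() > lower.size()) { lower.push(upper.top()); upper.pop(); }
    }
    double median() const {                  // O(1); assumes at least one insert
        if (lower.size() > upper.size()) return lower.top();
        return (lower.top() + upper.top()) / 2.0;
    }
};

int main() {
    RunningMedian rm;
    for (int x : {4, 3, 8, 2, 7}) {          // the growing range from the question
        rm.insert(x);
        std::cout << rm.median() << '\n';    // prints 4, 3.5, 4, 3.5, 4
    }
}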
Find some code below; I have reworked this to give the output you need.
private void button1_Click(object sender, EventArgs e)
{
    string range = "7,2,8,3,4";
    decimal median = FindMedian(range);
    MessageBox.Show(median.ToString());
}

public decimal FindMedian(string source)
{
    // Create a copy of the input, and sort the copy
    int[] temp = source.Split(',').Select(m => Convert.ToInt32(m)).ToArray();
    Array.Sort(temp);

    int count = temp.Length;
    if (count == 0) {
        throw new InvalidOperationException("Empty collection");
    }
    else if (count % 2 == 0) {
        // count is even, average two middle elements
        int a = temp[count / 2 - 1];
        int b = temp[count / 2];
        return (a + b) / 2m;
    }
    else {
        // count is odd, return the middle element
        return temp[count / 2];
    }
}

Find dominant mode of an unsorted array

Note, this is a homework assignment.
I need to find the mode of an array (positive values) and, secondarily, return that value if it occurs more than sizeof(array)/2 times, i.e. it is the dominant value. Some arrays will have neither.
That is simple enough, but there is a constraint that the array must NOT be sorted prior to the determination; additionally, the complexity must be on the order of O(n log n).
Using this second constraint, and the master theorem we can determine that the time complexity 'T(n) = A*T(n/B) + n^D' where A=B and log_B(A)=D for O(nlogn) to be true. Thus, A=B=D=2. This is also convenient since the dominant value must be dominant in the 1st, 2nd, or both halves of an array.
Using 'T(n) = A*T(n/B) + n^D' we know that the search function will call itself twice at each level (A), divide the problem set by 2 at each level (B). I'm stuck figuring out how to make my algorithm take into account the n^2 at each level.
To make some code of this:
int search(a, b) {
    search(a, a + (b - a) / 2);
    search(a + (b - a) / 2 + 1, b);
}
The "glue" I'm missing here is how to combine these divided functions and I think that will implement the n^2 complexity. There is some trick here where the dominant must be the dominant in the 1st or 2nd half or both, not quite sure how that helps me right now with the complexity constraint.
I've written down some examples of small arrays and I've drawn out ways it would divide. I can't seem to go in the correct direction of finding one, single method that will always return the dominant value.
At level 0, the function needs to call itself to search the first half and second half of the array. That needs to recurse, and call itself. Then at each level, it needs to perform n^2 operations. So in an array [2,0,2,0,2] it would split that into a search on [2,0] and a search on [2,0,2] AND perform 25 operations. A search on [2,0] would call a search on [2] and a search on [0] AND perform 4 operations. I'm assuming these would need to be a search of the array space itself. I was planning to use C++ and use something from STL to iterate and count the values. I could create a large array and just update counts by their index.
If some number occurs more than half the time, it can be found in O(n) time and O(1) space as follows:
int num = a[0], occ = 1;      // current candidate and its vote count
for (int i = 1; i < n; i++) {
    if (a[i] == num) occ++;
    else {
        occ--;
        if (occ < 0) {        // votes exhausted: switch to the new element
            num = a[i];
            occ = 1;
        }
    }
}
Since you are not sure whether such a number occurs, all you need to do is apply the above algorithm to get a candidate first, then iterate over the whole array a second time to count the occurrences of that number and check whether the count is greater than half.
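Putting the two passes together in C++ might look like this (a sketch based on the code above; the function name and the -1 "not found" return value are my own conventions, which is safe here because the assignment says the values are positive):

#include <vector>

// Returns the dominant value of 'a' (the element occurring more than
// a.size()/2 times), or -1 if there is none.
int dominantValue(const std::vector<int>& a) {
    if (a.empty()) return -1;

    // Pass 1: the voting scheme above picks a candidate.
    int num = a[0], occ = 1;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] == num) {
            ++occ;
        } else {
            --occ;
            if (occ < 0) { num = a[i]; occ = 1; }
        }
    }

    // Pass 2: verify that the candidate really occurs more than half the time.
    std::size_t count = 0;
    for (int x : a)
        if (x == num) ++count;
    return (count > a.size() / 2) ? num : -1;
}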
If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:
def DominantMode(array):
    # if there is only one element, that's the dominant mode
    if len(array) == 1: return array[0]
    # otherwise, find the dominant mode of the left and right halves
    left = DominantMode(array[0 : len(array) // 2])
    right = DominantMode(array[len(array) // 2 : len(array)])
    # if both sides have the same dominant mode, the whole array has that mode
    if left == right: return left
    # otherwise, we have to scan the whole array to determine which one wins
    leftCount = sum(element == left for element in array)
    rightCount = sum(element == right for element in array)
    if leftCount > len(array) // 2: return left
    if rightCount > len(array) // 2: return right
    # if neither wins, just return None
    return None
The above algorithm is O(nlogn) time but only O(logn) space.
If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the histogram in a hash table that maps each element value to its frequency.
Once the histogram has been computed, you can iterate over it (visiting each element at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).
Since both steps are O(n) time, the resulting algorithmic complexity is O(n) time.
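A non-recursive sketch of that histogram approach in C++, using std::unordered_map as the hash table (the names are my own; it assumes a non-empty array):

#include <cstddef>
#include <unordered_map>
#include <vector>

// Returns the mode of 'a'.  If 'dominant' is non-null, *dominant is set to
// true when that mode occurs more than a.size()/2 times.
int mode(const std::vector<int>& a, bool* dominant = nullptr) {
    std::unordered_map<int, std::size_t> freq;   // value -> frequency (the histogram), built in O(n)
    for (int x : a) ++freq[x];

    int best = a.front();
    std::size_t bestCount = 0;
    for (const auto& kv : freq)                  // at most n entries, so O(n)
        if (kv.second > bestCount) { best = kv.first; bestCount = kv.second; }

    if (dominant) *dominant = (bestCount > a.size() / 2);
    return best;
}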

Finding the smallest window

Given two arrays A[n] and B[m], how can I find the smallest window in A that contains all the elements of B?
I am trying to solve this problem in O(n) time but I am having trouble doing it. Is there any well-known algorithm or procedure for solving it?
If m > n, A cannot contain all the elements of B (and hence we have an O(1) solution).
Otherwise:
Create a hash table mapping elements of B to the sequence {0..m-1} (this is O(n) since m <= n).
Create an array C[m] to count occurrences of the members of B in the current window (initialise to 0).
Create a variable z to count the number of zero elements of C (initialise to m).
Create variables s and e to denote the start and end of the current window.
while e < n:
If z is nonzero, increment e and update C and z. O(1)
else consider this window as a possible solution (i.e. if it's the min so far, store it), then increment s and update C and z. O(1)
The while loop can be shown to have no more than 2n iterations. So the whole thing is O(n), I think.
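A C++ sketch of those steps (the variables C, z, s and e follow the description above; the sample arrays are made up for illustration, and B is assumed to contain distinct values, as the mapping to {0..m-1} implies):

#include <climits>
#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> A = {1, 2, 3, 2, 1, 4, 3};   // hypothetical sample data
    std::vector<int> B = {2, 3, 4};

    int n = A.size(), m = B.size();
    if (m > n) { std::cout << "no window\n"; return 0; }

    std::unordered_map<int, int> id;              // element of B -> index 0..m-1
    for (int j = 0; j < m; ++j) id[B[j]] = j;

    std::vector<int> C(m, 0);                     // occurrence counts inside the current window
    int z = m;                                    // how many entries of C are still zero
    int s = 0, e = 0;                             // current window is A[s..e-1]
    int bestLen = INT_MAX, bestStart = -1;

    while (true) {
        if (z > 0) {                              // window is missing something: grow it
            if (e == n) break;
            auto it = id.find(A[e]);
            if (it != id.end() && C[it->second]++ == 0) --z;
            ++e;
        } else {                                  // window covers B: record it, then shrink
            if (e - s < bestLen) { bestLen = e - s; bestStart = s; }
            auto it = id.find(A[s]);
            if (it != id.end() && --C[it->second] == 0) ++z;
            ++s;
        }
    }

    if (bestStart < 0) std::cout << "no window\n";
    else std::cout << "smallest window: A[" << bestStart << ".."
                   << bestStart + bestLen - 1 << "], length " << bestLen << '\n';
}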
Let's call a window 'minimal' if it can't be reduced; i.e., after increasing its left border or decreasing its right border it's no longer a valid window (it doesn't contain all elements from B). There are three in your example: [0, 2], [2, 6], [6, 7].
Let's say that you have already found the leftmost minimal window [left, right] ([0, 2] in your example). Now we'll just slide it to the right.
// count[x] tells how many times number 'x'
// happens between 'left' and 'right' in 'A'
while (right < N - 1) {
    // move right border by 1
    ++right;
    if (A[right] is in B) {
        ++count[A[right]];
    }
    // check if we can move left border now
    while (count[A[left]] > 1) {
        --count[A[left]];
        ++left;
    }
    // check if current window is optimal
    if (right - left + 1 < currentMin) {
        currentMin = right - left + 1;
    }
}
This sliding works because different 'minimal' windows can't contain one another.