Treats for the cows - bottom-up dynamic programming - C++

The full problem statement is here. Suppose we have a double-ended queue of known values. Each turn, we take a value out of one end or the other, and the values still in the queue appreciate: a value taken on turn t contributes value*t to the total. The goal is to find the maximum possible total value.
My first approach was to use straightforward top-down DP with memoization. Let i, j denote the starting and ending indexes of a "subarray" of the array of values A[].
f(i, j, age) = A[i]*age                                        if i == j
f(i, j, age) = max( f(i+1, j, age+1) + A[i]*age,
                    f(i, j-1, age+1) + A[j]*age )              otherwise
This works, but it proves to be too slow because of the superfluous recursive calls; an iterative bottom-up version should be faster.
Let m[i][j] be the maximum reachable value of the "subarray" of A[] with begin/end indexes i,j. Because i <= j, we care only about the lower triangular part.
This matrix can be built iteratively using the fact that m[i][j] = max(m[i-1][j] + A[i]*age, m[i][j-1] + A[j]*age), where age is maximal on the diagonal (equal to the size of A[]) and decreases linearly as A.size() - (i - j).
My attempt at an implementation fails with a bus error.
Is the described algorithm correct? What is the cause of the bus error?
Here is the only part of the code where the bus error might occur:
for(T j = 0; j < num_of_treats; j++) {
    max_profit[j][j] = treats[j]*num_of_treats;
    for(T i = j+1; i < num_of_treats; i++)
        max_profit[i][j] = max( max_profit[i-1][j] + treats[i]*(num_of_treats-i+j),
                                max_profit[i][j-1] + treats[j]*(num_of_treats-i+j));
}

for(T j = 0; j < num_of_treats; j++) {
Inside this loop, j is clearly a valid index into the array max_profit. But you're not using just j.

The bus error is caused by trying to access the array via a negative index when j=0 and i=1, as I should have noticed during debugging. The algorithm is wrong as well. First, the recurrence used to construct the max_profit[][] array should be
max_profit[i][j] = max( max_profit[i+1][j] + treats[i]*(num_of_treats-i+j),
max_profit[i][j-1] + treats[j]*(num_of_treats-i+j));
Second, the array must be filled diagonally, so that max_profit[i+1][j] and max_profit[i][j-1] are already computed, with the exception of the main diagonal.
Third, the data structure chosen is extremely inefficient: I am using only half of the space allocated for max_profit[][], and at each iteration I only need the last computed diagonal. An array of size num_of_treats should suffice.
Here is a working code using this improved algorithm. I really like it. I even used bit operators for the first time.
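Purely as an illustration of the single-array, diagonal approach described above (a sketch with invented names, not the code the author refers to), the whole computation can be written as:

#include <algorithm>
#include <vector>

// Sketch only: bottom-up DP over a single array, filled one diagonal
// (subarray length) at a time. dp[i] holds the best value obtainable from
// the subarray of the current length that starts at index i.
long long maxTreatValue(const std::vector<long long>& treats) {
    const int n = static_cast<int>(treats.size());
    if (n == 0) return 0;
    std::vector<long long> dp(n);
    // Length-1 subarrays: the single remaining treat is sold last, at age n.
    for (int i = 0; i < n; ++i) dp[i] = treats[i] * (long long)n;
    for (int len = 2; len <= n; ++len) {
        const long long age = n - len + 1;   // age at which an end of a length-len subarray is sold
        for (int i = 0; i + len <= n; ++i)
            // Take the left end (rest starts at i+1) or the right end (rest
            // still starts at i); dp[] currently holds the length len-1 values.
            dp[i] = std::max(dp[i + 1] + treats[i] * age,
                             dp[i] + treats[i + len - 1] * age);
    }
    return dp[0];
}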

Related

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slowly (even when compiled with GCC using -O3 and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster or more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
    //calculate size of each array
    int len1 = vec1.size();
    int len2 = vec2.size();
    int len3 = vec3.size();
    int i = 0; int j = 0; int k = 0; int res_i, res_j, res_k;
    int diff = INT_MAX;
    int iter = 0; int iter_bound = min(min(len1,len2),len3);
    while(iter < iter_bound){
        while(i < len1 && j < len2 && k < len3){
            int minimum = min(min(vec1[i], vec2[j]), vec3[k]);
            int maximum = max(max(vec1[i], vec2[j]), vec3[k]);
            //if new difference less than previous difference, update difference,
            //store resultants
            if(fabs(maximum - minimum) < diff){ diff = maximum-minimum; res_i = i; res_j = j; res_k = k;}
            //increment minimum value
            if(vec1[i] == minimum) ++i;
            else if(vec2[j] == minimum) ++j;
            else ++k;
        }
        //"remove" triplet
        vec1.erase(vec1.begin() + res_i);
        vec2.erase(vec2.begin() + res_j);
        vec3.erase(vec3.begin() + res_k);
        --len1; --len2; --len3;
        ++iter_bound;
    }
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue of possible triples, ordered by max - min and then by how close the median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element into the priority queue
While the priority queue is not empty:
    Pull a triple out
    If none of the triple's three elements has been used:
        Add the triple to the output
        Mark the triple used
    else:
        If you can construct reasonable triplets for the unused elements:
            Add them to the queue
Now for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that at first is easy, just keep an array of bools where you mark off the indexes of the used values. But once a lot have been taken off, your search gets long.
The trick for that is to have a vector of bools for individual elements, a second for whether both elements in a pair have been used, a third for whether all four in a quadruple have been used, and so on. When you use an element, mark its individual bool, then go up the hierarchy, marking off the next level if the element you're paired with is already marked off, and stopping otherwise. This additional data structure of size 2n requires marking an average of 2 bools per element used, but lets you find the next unused index in either direction in at most O(log(n)) steps.
The resulting algorithm will be O(n log(n)).
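As a rough sketch of that hierarchy (my own illustration with invented names, not code from this answer, and assuming n >= 1), the bookkeeping might look like this:

#include <cstddef>
#include <vector>

// Level 0 has one flag per element, level 1 one per pair, level 2 one per
// quadruple, and so on. A flag at a higher level is set once everything
// beneath it has been used.
class UsedTracker {
public:
    explicit UsedTracker(std::size_t n) {
        for (std::size_t sz = n; sz > 0; sz = (sz + 1) / 2) {
            levels_.emplace_back(sz, false);
            if (sz == 1) break;
        }
    }

    bool used(std::size_t i) const { return levels_[0][i]; }

    void markUsed(std::size_t i) {
        levels_[0][i] = true;
        for (std::size_t lvl = 0; lvl + 1 < levels_.size(); ++lvl) {
            std::size_t buddy = i ^ 1;                  // the index we are paired with
            if (buddy < levels_[lvl].size() && !levels_[lvl][buddy])
                return;                                 // pair not yet complete, stop climbing
            i /= 2;
            levels_[lvl + 1][i] = true;                 // the pair/quadruple/... is now exhausted
        }
    }

    // Smallest unused index >= i, or the element count if there is none.
    // Jumps over maximal fully-used aligned blocks, taking O(log n) steps.
    std::size_t nextUnused(std::size_t i) const {
        const std::size_t n = levels_[0].size();
        while (i < n && levels_[0][i]) {
            std::size_t lvl = 0, idx = i;
            // climb while the enclosing block is also completely used
            while (lvl + 1 < levels_.size() && levels_[lvl + 1][idx / 2]) {
                idx /= 2;
                ++lvl;
            }
            i = (idx + 1) << lvl;                       // first index after that block
        }
        return i < n ? i : n;
    }

private:
    std::vector<std::vector<bool>> levels_;             // levels_[k][b]: block b of size 2^k fully used
};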

Time complexity of the Mandelbrot set in terms of big-O notation

I'm trying to find the time complexity of a simple implementation of the Mandelbrot set, with the following code:
int main(){
    int rows, columns, iterations;
    rows = 22;
    columns = 72;
    iterations = 28;
    char matrix[max_rows][max_columns];
    for(int r = 0; r < rows; ++r){
        for(int c = 0; c < columns; ++c){
            complex<float> z;
            int itr = 0;
            while(abs(z) < 2 && ++itr < iterations)
                z = pow(z, 2) + decltype(z)((float)c * 2 / columns - 1.5,
                                            (float)r * 2 / rows - 1);
            matrix[r][c] = (itr == iterations ? '*' : '.');
        }
    }
Looking at the above code, I made some estimates of the time complexity in terms of big-O notation and want to know whether they are correct.
We are creating a 2D array, traversing it through nested loops, and at each element we perform an operation and set that element's value. If we take n as the input size, we can say that the greater the input, the greater the complexity, so the time complexity for rows x columns would be O(r*c). Then we traverse it again for the printout, so what would the time complexity be? Is it O(r*c) + O(r*c)? Does the computation itself have some effect on the time complexity when we are doing multiplication and subtraction on rows and columns? If yes, then how?
Almost. Given r rows, c columns and i iterations, the running time is O(r*c*i). This would be trivial to see if abs(z) < 2 were not there, but with this extra condition it is not clear how many times the inner while loop runs in total. It will run fewer than r*c*i times, so O(r*c*i) is still an upper bound, but perhaps we might do better. Given that for any r, c you compute the Mandelbrot set over the same domain with varying resolution, the while loop will run k*r*c*i times for some constant k that lies somewhere between the area of the Mandelbrot set divided by the area of the domain, and 1. So the running time of the code is Θ(r*c*i), and the O(r*c*i) bound cannot be improved.
Had you computed the set over the [-c,c]x[-r,r] domain with fixed resolution, then for any |z| > 2 the abs(z) < 2 condition would break the loop after the first iteration. In that case O(r*c*i) would not be a tight bound, and this condition (like all loop conditions) should be taken into account if you want an accurate estimate.
Please don't use malloc; std::vector is safer.
In big-O notation, O(r*c) + O(r*c) collapses to O(r*c).
Since the maximal iteration count is also an input variable, it influences the complexity as well. In particular, the inner loop runs at most n iterations; therefore, your complexity is O(r*c*n).
All other operations are constant, in particular the multiplication and addition of complex<float>. These operations by themselves are always O(1), so they do not contribute to the overall complexity.
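As an aside on the std::vector suggestion, a sketch of the same computation with run-time-sized storage might look like the following (my own illustration, not code from the question or answers; the printout pass is assumed from the question's mention of printing, and the iteration is written as z*z + point instead of pow()):

#include <complex>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    const int rows = 22, columns = 72, iterations = 28;   // values taken from the question
    // run-time sized storage instead of char matrix[max_rows][max_columns]
    vector<vector<char>> matrix(rows, vector<char>(columns, '.'));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < columns; ++c) {
            complex<float> z;                              // starts at 0 + 0i
            const complex<float> point((float)c * 2 / columns - 1.5f,
                                       (float)r * 2 / rows - 1.0f);
            int itr = 0;
            while (abs(z) < 2 && ++itr < iterations)
                z = z * z + point;
            matrix[r][c] = (itr == iterations ? '*' : '.');
        }
    }
    for (int r = 0; r < rows; ++r)                         // the O(r*c) printout pass
        cout << string(matrix[r].begin(), matrix[r].end()) << '\n';
}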

Merging K Sorted Arrays/Vectors Complexity

While looking into the problem of merging k sorted contiguous arrays/vectors, and how it differs in implementation from merging k sorted linked lists, I found two relatively easy naive solutions for merging k contiguous arrays and a nice optimized method based on pairwise merging that simulates how mergeSort() works. The two naive solutions I implemented seem to have the same complexity, but in a big randomized test I ran, one turned out to be far less efficient than the other.
Naive merging
My naive merging method works as follows. We create an output vector<int> and set it to the first of the k vectors we are given. We then merge in the second vector, then the third, and so on. Since a typical merge() method that takes two vectors and returns one is asymptotically linear in both space and time in the number of elements of the two vectors, the total complexity will be O(n + 2n + 3n + ... + kn), where n is the average number of elements in each list. Since we're adding n + 2n + 3n + ... + kn, I believe the total complexity is O(n*k^2). Consider the following code:
vector<int> mergeInefficient(const vector<vector<int> >& multiList) {
    vector<int> finalList = multiList[0];
    for (int j = 1; j < multiList.size(); ++j) {
        finalList = mergeLists(multiList[j], finalList);
    }
    return finalList;
}
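(mergeLists() itself is not shown in the question; the typical linear-time two-way merge it presumably stands for might look like this sketch, which is an assumption rather than the poster's code.)

#include <vector>
using namespace std;

// Stand-in for the mergeLists() referenced above: a standard linear-time
// merge of two sorted vectors into a new sorted vector.
vector<int> mergeLists(const vector<int>& a, const vector<int>& b) {
    vector<int> merged;
    merged.reserve(a.size() + b.size());
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        merged.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    while (i < a.size()) merged.push_back(a[i++]);
    while (j < b.size()) merged.push_back(b[j++]);
    return merged;
}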
Naive selection
My second naive solution works as follows:
/**
* The logic behind this algorithm is fairly simple and inefficient.
* Basically we want to start with the first values of each of the k
* vectors, pick the smallest value and push it to our finalList vector.
* We then need to be looking at the next value of the vector we took the
* value from so we don't keep taking the same value. A vector of vector
* iterators is used to hold our position in each vector. While all iterators
* are not at the .end() of their corresponding vector, we maintain a minValue
* variable initialized to INT_MAX, and a minValueIndex variable and iterate over
* each of the k vector iterators and if the current iterator is not an end position
* we check to see if it is smaller than our minValue. If it is, we update our minValue
* and set our minValue index (this is so we later know which iterator to increment after
* we iterate through all of them). We do a check after our iteration to see if minValue
* still equals INT_MAX. If it has, all iterators are at the .end() position, and we have
* exhausted every vector and can stop iterative over all k of them. Regarding the complexity
* of this method, we are iterating over `k` vectors so long as at least one value has not been
* accounted for. Since there are `nk` values where `n` is the average number of elements in each
* list, the time complexity = O(nk^2) like our other naive method.
*/
vector<int> mergeInefficientV2(const vector<vector<int> >& multiList) {
    vector<int> finalList;
    vector<vector<int>::const_iterator> iterators(multiList.size());
    // Set all iterators to the beginning of their corresponding vectors in multiList
    for (int i = 0; i < multiList.size(); ++i) iterators[i] = multiList[i].begin();
    int k = 0, minValue, minValueIndex;
    while (1) {
        minValue = INT_MAX;
        for (int i = 0; i < iterators.size(); ++i){
            if (iterators[i] == multiList[i].end()) continue;
            if (*iterators[i] < minValue) {
                minValue = *iterators[i];
                minValueIndex = i;
            }
        }
        if (minValue == INT_MAX) break;   // break before incrementing, so a stale/end iterator is never advanced
        iterators[minValueIndex]++;
        finalList.push_back(minValue);
    }
    return finalList;
}
Random simulation
Long story short, I built a simple randomized simulation that builds a multidimensional vector<vector<int>>. The multidimensional vector starts with 2 vectors each of size 2, and ends up with 600 vectors each of size 600. Each vector is sorted, and the sizes of the larger container and each child vector increase by two elements every iteration. I time how long it takes for each algorithm to perform like this:
clock_t clock_a_start = clock();
finalList = mergeInefficient(multiList);
clock_t clock_a_stop = clock();
clock_t clock_b_start = clock();
finalList = mergeInefficientV2(multiList);
clock_t clock_b_stop = clock();
I then built the following plot:
My calculations say the two naive solutions (merging and selecting) both have the same time complexity, but the above plot shows them as very different. At first I rationalized this by saying there may be more overhead in one than in the other, but then realized that the overhead should be a constant factor and should not produce a plot like this. What is the explanation for this? I assume my complexity analysis is wrong?
Even if two algorithms have the same complexity (O(nk^2) in your case) they may end up having enormously different running times depending upon your size of input and the 'constant' factors involved.
For example, if an algorithm runs in n/1000 time and another algorithm runs in 1000n time, they both have the same asymptotic complexity but they shall have very different running times for 'reasonable' choices of n.
Moreover, there are effects caused by caching, compiler optimizations etc that may change the running time significantly.
For your case, although your calculation of the complexities seems to be correct, in the first case the actual running time will be (nk^2 + nk)/2, whereas in the second case the running time will be nk^2. Notice that the division by 2 may be significant, because as k increases the nk term becomes negligible.
For a third algorithm, you can modify the naive selection by maintaining a heap of k elements containing the first elements of all the k vectors. Then your selection step takes O(log k) time, and hence the complexity reduces to O(nk log k).
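A sketch of that heap-based k-way merge might look like this (function and variable names are illustrative):

#include <functional>
#include <queue>
#include <tuple>
#include <vector>
using namespace std;

// The priority queue holds (value, which list, index within that list) and
// always pops the smallest current head, so each of the n*k elements costs
// O(log k) and the whole merge is O(nk log k).
vector<int> mergeWithHeap(const vector<vector<int>>& multiList) {
    using Entry = tuple<int, size_t, size_t>;
    priority_queue<Entry, vector<Entry>, greater<Entry>> heads;   // min-heap
    for (size_t i = 0; i < multiList.size(); ++i)
        if (!multiList[i].empty()) heads.emplace(multiList[i][0], i, 0);

    vector<int> finalList;
    while (!heads.empty()) {
        auto [value, list, pos] = heads.top();
        heads.pop();
        finalList.push_back(value);
        if (pos + 1 < multiList[list].size())                     // push the next element of that list
            heads.emplace(multiList[list][pos + 1], list, pos + 1);
    }
    return finalList;
}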

Algorithm analysis: Am I analyzing these algorithms correctly? How to approach problems like these [closed]

1)
x = 25;
for (int i = 0; i < myArray.length; i++)
{
    if (myArray[i] == x)
        System.out.println("found!");
}
I think this one is O(n).
2)
for (int r = 0; r < 10000; r++)
    for (int c = 0; c < 10000; c++)
        if (c % r == 0)
            System.out.println("blah!");
I think this one is O(1), because for any input n, it will run 10000 * 10000 times. Not sure if this is right.
3)
a = 0
for (int i = 0; i < k; i++)
{
    for (int j = 0; j < i; j++)
        a++;
}
I think this one is O(i * k). I don't really know how to approach problems like this where the inner loop is affected by variables being incremented in the outer loop. Some key insights here would be much appreciated. The outer loop runs k times, and the inner loop runs 1 + 2 + 3 + ... + k times. So that sum should be (k/2) * (k+1), which would be order of k^2. So would it actually be O(k^3)? That seems too large. Again, don't know how to approach this.
4)
int key = 0; //key may be any value
int first = 0;
int last = intArray.length - 1;
int mid = 0;
boolean found = false;
while( (!found) && (first <= last) )
{
    mid = (first + last) / 2;
    if(key == intArray[mid])
        found = true;
    if(key < intArray[mid])
        last = mid - 1;
    if(key > intArray[mid])
        first = mid + 1;
}
This one, I think is O(log n). But, I came to this conclusion because I believe it is a binary search and I know from reading that the runtime is O(log n). I think it's because you divide the input size by 2 for each iteration of the loop. But, I don't know if this is the correct reasoning or how to approach similar algorithms that I haven't seen and be able to deduce that they run in logarithmic time in a more verifiable or formal way.
5)
int currentMinIndex = 0;
for (int front = 0; front < intArray.length; front++)
{
    currentMinIndex = front;
    for (int i = front; i < intArray.length; i++)
    {
        if (intArray[i] < intArray[currentMinIndex])
        {
            currentMinIndex = i;
        }
    }
    int tmp = intArray[front];
    intArray[front] = intArray[currentMinIndex];
    intArray[currentMinIndex] = tmp;
}
I am confused about this one. The outer loop runs n times. And the inner for loop runs
n + (n-1) + (n-2) + ... (n - k) + 1 times? So is that O(n^3) ??
More or less, yes.
1 is correct - it seems you are searching for a specific element in what I assume is an un-sorted collection. If so, the worst case is that the element is at the very end of the list, hence O(n).
2 is correct, though a bit strange. It is O(1) assuming r and c are constants and the bounds are not variables. If they are constant, then yes O(1) because there is nothing to input.
3 I believe is still considered O(n^2). There would be some constant factor on the n^2 term; drop the constant and you get O(n^2).
4 looks a lot like a binary search algorithm for a sorted collection. O(logn) is correct. It is log because at each iteration you are essentially halving the # of possible choices in which the element you are looking for could be in.
5 is looking like a selection sort, O(n^2), for similar reasons to 3.
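For item 5 in particular, the total number of inner-loop iterations can be summed directly, which is where O(n^2), rather than the O(n^3) the question worried about, comes from:

$$\sum_{\text{front}=0}^{n-1} (n - \text{front}) \;=\; n + (n-1) + \dots + 1 \;=\; \frac{n(n+1)}{2} \;=\; O(n^2).$$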
O() doesn't mean anything in itself: you need to specify whether you are counting the worst-case O or the average-case O. For some sorting algorithms, for example, the average case is O(n log n) but the worst case is O(n^2).
Basically you need to count the overall number of iterations of the innermost loop, and take the biggest component of the result without any constant (for example, if you have k*(k+1)/2 = 1/2 k^2 + 1/2 k, the biggest component is 1/2 k^2, therefore you are O(k^2)).
For example, your item 4) is in O(log(n)) because, if you work on an array of size n, then you will run one iteration on this array, and the next one will be on an array of size n/2, then n/4, ..., until this size reaches 1. So it is log(n) iterations.
Your question is mostly about the definition of O().
When someone say this algorithm is O(log(n)), you have to read:
When the input parameter n becomes very big, the number of operations performed by the algorithm grows at most like log(n)
Now, this means two things:
You have to have at least one input parameter n. There is no point in talking about O() without one (as in your case 2).
You need to define the operations that you are counting. These can be additions, comparison between two elements, number of allocated bytes, number of function calls, but you have to decide. Usually you take the operation that's most costly to you, or the one that will become costly if done too many times.
So keeping this in mind, back to your problems:
1) n is myArray.length, and the operation you're counting is ==. In that case the answer is exactly n, which is O(n).
2) You can't specify an n.
3) The n can only be k, and the operation you count is ++. You have exactly k*(k+1)/2, which is O(k^2), as you say.
4) This time n is the length of your array again, and the operation you count is ==. In this case the number of operations depends on the data; usually we talk about the 'worst case scenario', meaning that of all the possible outcomes we look at the one that takes the most time. At best, the algorithm takes one comparison. For the worst case, let's take an example. If the array is [1,2,3,4,5,6,7,8,9] and you are looking for 4, intArray[mid] will successively be 5, 2, 3 and then 4, so you will have done the comparison 4 times. In fact, for an array whose size is 2^k + 1, the maximum number of comparisons is about k (you can check). So n = 2^k + 1 => k = ln(n-1)/ln(2). You can extend this result to the case where n is not 2^k + 1, and you get complexity = O(ln(n)).
In any case, I think you are confused because you don't exactly know what O(n) means. I hope this is a start.

Long array performance issue

I have an array of char pointers of length 175,000. Each pointer points to a c-string array of length 100, each character is either 1 or 0. I need to compare the difference between the strings.
char* arr[175000];
So far, I have two for loops where I compare every string with every other string. The comparison function basically takes two C-strings and returns an integer which is the number of positions where the arrays differ.
This is taking a really long time on my 4-core machine; last time I left it to run for 45 minutes and it never finished executing. Please advise on a faster solution or some optimizations.
Example:
000010
000001
have a difference of 2 since the last two bits do not match.
After I calculate the difference, I store the value in another array:
int holder;
for(int x = 0; x < UsedTableSpace; x++){
    int min = 10000000;
    for(int y = 0; y < UsedTableSpace; y++){
        if(x != y){
            //compr calculates difference between two c-string arrays
            int tempDiff = compr(similarity[x]->matrix, similarity[y]->matrix);
            if(tempDiff < min){
                min = tempDiff;
                holder = y;
            }
        }
    }
    similarity[holder]->inbound++;
}
With more information, we could probably give you better advice, but based on what I understand of the question, here are some ideas:
Since you're using each character to represent a 1 or a 0, you're using several times more memory than you need to use, which creates a big performance impact when it comes to caching and such. Instead, represent your data using numeric values that you can think of in terms of a series of bits.
Once you've implemented #1, you can grab an entire integer or long at a time and do a bitwise XOR operation to end up with a number that has a 1 in every place where the two numbers didn't have the same values. Then you can use some of the tricks mentioned here to count these bits speedily.
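A minimal sketch of the first two suggestions together (the Packed struct and the function names are illustrative, not from the answer) might look like:

#include <bitset>
#include <cstdint>

// Pack each 100-character '0'/'1' string into two 64-bit words, then count
// differing positions with XOR plus popcount.
struct Packed {
    uint64_t lo = 0, hi = 0;   // bits 0..63 and 64..99
};

Packed pack(const char* s) {   // s must point at 100 characters, each '0' or '1'
    Packed p;
    for (int i = 0; i < 64; ++i)   p.lo |= uint64_t(s[i] - '0') << i;
    for (int i = 64; i < 100; ++i) p.hi |= uint64_t(s[i] - '0') << (i - 64);
    return p;
}

int diff(const Packed& a, const Packed& b) {
    // every 1 bit in the XOR marks a position where the two strings differ
    return static_cast<int>(std::bitset<64>(a.lo ^ b.lo).count() +
                            std::bitset<64>(a.hi ^ b.hi).count());
}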
Work on "unrolling" your loops somewhat to avoid the number of jumps necessary. For example, the following code:
total = total + array[i];
total = total + array[i + 1];
total = total + array[i + 2];
... will work faster than just looping over total = total + array[i] three times. Jumps are expensive, and interfere with the processor's pipelining. Update: I should mention that your compiler may be doing some of this for you already--you can check the compiled code to see.
Break your overall data set into chunks that will allow you to take full advantage of caching. Think of your problem as a "square" with the i index on one axis and the j index on the other. If you start with one i and iterate across all 175000 j values, the first j values you visit will be gone from the cache by the time you get to the end of the line. On the other hand, if you take the top left corner and go from j=0 to 256, most of the values on the j axis will still be in a low-level cache as you loop around to compare them with i=0, 1, 2, etc.
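A generic sketch of this blocking idea (illustrative names; dist() plays the role of the question's compr()) might look like:

#include <climits>
#include <vector>
using namespace std;

// Count differing positions between two rows; stands in for compr().
static int dist(const vector<char>& a, const vector<char>& b) {
    int d = 0;
    for (size_t i = 0; i < a.size(); ++i) d += (a[i] != b[i]);
    return d;
}

// Walk the y axis one BLOCK at a time so the rows of the current block stay
// cached while every x is compared against them.
vector<int> nearestNeighbours(const vector<vector<char>>& rows) {
    const int n = (int)rows.size();
    const int BLOCK = 256;                      // tile size, tuned to the cache
    vector<int> best(n, INT_MAX), nearest(n, -1);
    for (int yb = 0; yb < n; yb += BLOCK)
        for (int x = 0; x < n; ++x)
            for (int y = yb; y < yb + BLOCK && y < n; ++y)
                if (x != y) {
                    const int d = dist(rows[x], rows[y]);
                    if (d < best[x]) { best[x] = d; nearest[x] = y; }
                }
    return nearest;
}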
Lastly, although this should go without saying, I guess it's worth mentioning: Make sure your compiler is set to optimize!
One simple optimization is to compare each pair of strings only once. If the difference between A and B is 12, the difference between B and A is also 12. Your running time will drop by almost half.
In code:
int compr(const char* a, const char* b) {
    int d = 0, i;
    for (i = 0; i < 100; ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

void main_function(...) {
    for(int x = 0; x < UsedTableSpace; x++){
        int min = 10000000;
        for(int y = x + 1; y < UsedTableSpace; y++){
            //compr calculates difference between two c-string arrays
            int tempDiff = compr(similarity[x]->matrix, similarity[y]->matrix);
            if(tempDiff < min){
                min = tempDiff;
                holder = y;
            }
        }
        similarity[holder]->inbound++;
    }
}
Notice the second for loop: I've changed the start index.
Another optimization is to run the comparisons on separate threads to take advantage of your 4 cores.
What is your goal, i.e. what do you want to do with the Hamming Distances (which is what they are) after you've got them? For example, if you are looking for the closest pair, or most distant pair, you probably can get an O(n ln n) algorithm instead of the O(n^2) methods suggested so far. (At n=175000, n^2 is 15000 times larger than n ln n.)
For example, you could characterize each 100-bit number m by 8 4-bit numbers, being the number of bits set in 8 segments of m, and sort the resulting 32-bit signatures into ascending order. Signatures of the closest pair are likely to be nearby in the sorted list. It is easy to lower-bound the distance between two numbers if their signatures differ, giving an effective branch-and-bound process as less-distant numbers are found.
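For instance, a sketch of that signature computation, assuming each 100-bit value is held in a std::bitset<100> (names are illustrative), might be:

#include <bitset>
#include <cstdint>

// Split the 100 bits into 8 segments, take the popcount of each segment
// (at most 13, so it fits in 4 bits) and pack the eight counts into one
// 32-bit signature. Sorting by signature brings numbers with similar bit
// distributions close together.
uint32_t signature(const std::bitset<100>& m) {
    uint32_t sig = 0;
    for (int seg = 0; seg < 8; ++seg) {
        const int begin = seg * 100 / 8, end = (seg + 1) * 100 / 8;
        int count = 0;
        for (int bit = begin; bit < end; ++bit) count += m[bit];
        sig = (sig << 4) | (uint32_t)count;    // 4 bits per segment count
    }
    return sig;
}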