Divide and Conquer Algorithms - divide-and-conquer

We are starting to work with Divide and conquer algorithms in my Data structures class and I am having a lot of trouble completely understanding what I am supposed to do. Below is the which is basically asking for me to write a program that sums an array, but it has to divide and conquer it down until the base is 4 which I assume means to to add the array together in chunks of 4 and then add all the chunks together. I dont even know where to begin. I just need a little explanation and understanding. The teacher wasnt much help. The array contains a line of numbers in the amount of a power of 2 less than a 1000
Problem
Write a divide-and-conquer algorithm for summing an array of n in-
tegers. The base case for this algorithm will be when the size of the
sub-problems are smaller or equal to 4 in which case you will use an
iterative loop to sum the integers of the sub-problems. You need to do
the following:

I would guess you're intended to write a recursive method that splits array A into two (or more) subarrays of half the array each, then passes the resulting arrays to itself in order to continue splitting until you have an array of size 4; then you can perform your sum and return the sum of those 4 units.

Let's not think about a programming language for a moment and think about what the approach would be in abstract.
Imagine if you were in a room with twenty stacks of paper, each with a single number on them. You're willing to do the work of bringing all of the numbers' totals together, but you realize that going one by one is going to take forever. So, you call in a friend and you each get ten stacks to each other to work on. The amount of work you've had to do by calling your friend went down by half.
You both realize that you're not going to get anywhere with ten stacks of paper, so each of you call another friend to lend a hand, leaving all four of you with five stacks apiece. Reasonable, but still overwhelming. So everyone, once again, calls another friend over and halves their stack with the other friends - leaving everyone with 2.5 stacks of papers with numbers on them.
You all agree that this is a reasonable amount of work to be done by one person, so you get to work adding the numbers together. Working from the last set of people that came in up to yourself, everyone returns the total sum of the numbers they had up until you have everyone else's sum, and you have your own. You sum these two together and arrive at your result.
This is the principle of divide and conquer: your work stack is divided into some fraction of work that you then make calls to other methods to do.
Here's a pseudoexample for you in Python.
def sum(*args):
if len(args) == 0: # Nothing in my list, so I'm done
return 0
elif len(args) == 1: # One thing in my list, return that
return args[0]
else: # Too much for me to handle; call in the Calvary!
return args[0] + sum(args[1:])

Related

Time Complexity of fibonacci series in Bottom Up approach(DP)

Algorithm in Bottom up approach
a[0]=0,a[1]=1
integer fibo(n)
if a[n]== null
a[n] = fibo(n-1) + fibo(n-2)
return a[n]
How this algorithm has the time limit of O(N)
For 5 it calls 8 times.
pass of fibnacci series in Bottom Up approach
fibo(5) calling 8 times to go top to down and also calling 8 times to return top from bottom. so total call is 8+8=16 of my view. So how the time complexity is O(N) it's unclear to me.
I found many similar questions answered here but all of those isn't related with my
interest.
Some of these are:
Time Complexity of Fibonacci Series
Time Complexity of Fibonacci Algorithm
Anyone help would be appreciated.Thanks
There are a couple of quick things to mention before answering your question about the time complexity. The reason for this is that the time complexity at least partially depends on these answers.
First, there seems to be a bug in your program as you have an array 'a' for the base conditions (Fibbinacci numbers 0 and 1) and some array 'm' which is set in the fibo function, but never used again. More importantly, when you reach n=1 or n=0, you return the value of m[n] which is entirely unknown. So, I'm going to assume the algorithm is rewritten as follows:
a[0]=0,a[1]=1
integer fibo(n)
if a[n]== null
a[n] = fibo(n-1) + fibo(n-2)
return a[n]
Okay, second problem. Let's assume that that a is always defined as at least n+1 integers. There needs to be enough room for the incoming data. This is important because c++ will let you overwrite values at the n+1th index. It's out of bounds and wrong, but c++ doesn't give those sorts of protections. It is up to you as the programmer to verify boundary conditions like that. (I'm assuming c++ because this is tagged with c++. The code looks more like python, which has its wrap-around indices which are problematic on their own.)
Third, let's assume that you don't start with a new array 'a' for each run of the algorithm. This is important because if a stores already-calculated values then you will save time on calculation by not having to re-evaluate those values. That time savings is a great thing even if it won't affect how I calculate time complexity.
Great. Let's get started with your question. Let's use the image below to answer it. When you start the algorithm at n you are going to make two recursive calls for fibo(n-1) and fibo(n-2) BUT they do not happen simultaneously. Instead the first call for fibo(n-1) takes place and must be 100% complete before the second call for fibo(n-2) begins. That call is represented by the green line from n-1 on the nth line to the n-1th line.
Now, those green lines apply to each recursion down the line until you reach the fibo(1) call. That call terminates early because a[n] is NOT null. Finally the second call for fibo(0) is executed and it also terminates early because a[n] is not null. Okay, so much for the first set of recursive calls.
As each recursive call returns, the second call (represented by the orange broken line) is made, but a[n] is no longer null, so that call terminates early and the call returns up to the next layer.
So, let's count the number of calls. From n to 1 is n-1 recursive calls. At the end there is one additional call to fibo(0) so that is n recursive calls. Then on the way up there are n-2 additional calls which terminate early. So, altogether we have 2n-2 calls which is O(n).
Of course, if you call fibo(k) and then fibo(k+x) you will only need to do the first 2x calls because everything from fibo(k) down is already known. It is a considerable savings after the initial investment. Any questions?
Regarding O(2n)=O(n), that is a good follow up. Big-O complexity rules say that we are interested in the order-of-magnitude when you compare efficiency. So, suppose that you were looking at a n=1000. O(n)=1000, O(2n)=2000, but O(n2)=1,000,000. O(n) is more or less the same as O(2n), but if you compare them with O(n2), that is a huge difference. Similarly, if you have O(n+1)=1001 that isn't much different from O(n). So, in general we say that the leading term, the most important value in the equation is what is important. We aren't really interested in extra terms. We aren't really interested in specific coefficients because they don't really affect the outcome.
If you still have questions, see this site for some additional information.
https://justin.abrah.ms/computer-science/big-o-notation-explained.html

What's the most efficient way to store a subset of column indices of big matrix and in C++?

I am working with a very big matrix X (say, 1,000-by-1,000,000). My algorithm goes like following:
Scan the columns of X one by one, based on some filtering rules, to identify only a subset of columns that are needed. Denote the subset of indices of columns be S. Its size depends on the filter, so is unknown before computation and will change if the filtering rules are different.
Loop over S, do some computation with a column x_i if i is in S. This step needs to be parallelized with openMP.
Repeat 1 and 2 for 100 times with changed filtering rules, defined by a parameter.
I am wondering what the best way is to implement this procedure in C++. Here are two ways I can think of:
(a) Use a 0-1 array (with length 1,000,000) to indicate needed columns for Step 1 above; then in Step 2 loop over 1 to 1,000,000, use if-else to check indicator and do computation if indicator is 1 for that column;
(b) Use std::vector for S and push_back the column index if identified as needed; then only loop over S, each time extract column index from S and then do computation. (I thought about using this way, but it's said push_back is expensive if just storing integers.)
Since my algorithm is very time-consuming, I assume a little time saving in the basic step would mean a lot overall. So my question is, should I try (a) or (b) or other even better way for better performance (and for working with openMP)?
Any suggestions/comments for achieving better speedup are very appreciated. Thank you very much!
To me, it seems that "step #1 really does not matter much." (At the end of the day, you're going to wind up with: "a set of columns, however represented.")
To me, what's really going to matter is: "just what's gonna happen when you unleash ("parallelized ...") step #2.
"An array of 'ones and zeros,'" however large, should be fairly simple for parallelization, while a more-'advanced' data structure might well, in this case, "just get in the way."
"One thousand mega-bits, these days?" Sure. Done. No problem. ("And if not, a simple array of bit-sets.") However-many simultaneously executing entities should be able to navigate such a data structure, in parallel, with a minimum of conflict . . . Therefore, to my gut, "big bit-sets win."
I think you will find std::vector easier to use. Regarding push_back, the cost is when the vector reallocates (and maybe copies) the data. To avoid that (if it matters), you could set vector::capacity to 1,000,000. Your vector is then 8 MB, insignificant compared to your problem size. It's only 1 order magnitude bigger than a bitmap would be, and a lot simpler to deal with: If we call your vector S and the nth interesting column i, then your column access is just x[S[i]].
(Based on my gut feeling) I'd probably go for pushing back into a vector, but the answer is quite simple: Measure both methods (they are both trivial to implement). Most likely you won't see a noticeable difference.

Comparison of two common comparison algorithms and their Big O help please

Today my professor gave us 2 take home questions as practice for upcoming array unit in C and I am wondering what exactly the sorting algorithm these 2 problems resemble and what their Big O is. Now, I am not coming here just expecting answers and I have ALREADY solved them, but I am not confident in my answers so I will post them udner each question and if I am wrong, please correct me and explain my error in thinking.
Question 1:
If we decide to go through an array's(box) element(folders) one at a time. Starting at the first element and comparing it with the next. Then if they are the same the comparison ends, however if both are not equal then it moves on to comparing the next two ELEMENTS [2] and [3]. This process is repeated and will stop once last two elements are compared and note that the array IS already sorted by last name and we are looking for same first name! Example: [ Harper Steven, Hawking John, Ingleton Steven]
My believed answer:
I beleive it is O(n) because it's just going over the elements of an array comparing array[0] to array[1] and then array[2] to array[3] ect ect. This process is linear and continues until the last two are compared. Definitely not logn because we aren't multiplying or diving by 2.
Final Question:
Suppose we have a box of folders each containing info on one person. If we were to want to look for people with same first name, we could first start by placing a sticker on the first folder in the box and then going through the folders after it in an orderly fashion until we find person with same name. If we find a folder with same name, we move that folder next to the folder with a sticker. Once we find ONE case where two people have same name, we stop and go to sleep because we're lazy. If the first search fails however, we simply remove sticker and place it on next folder and then continue as we did earlier. We repeat this process until sticker is on last folder in a scenario where we have no two people with same name.
This array is NOT sorted and compares the first folder with sticker folder[0] with the next i folder[i] elements.
My answer:
I feel like this can't be O(n), but maybe O(n^2) where it kinda feels like we have an array and then we keep repeating the process where n is proportional to the square of the input(folders). I could be wrong here through >.>
You're right on both questions… but it would help to explain things a bit more rigorously. I don't know what the standards of your class are; you probably don't need an actual proof, but showing more detailed reasoning than "we aren't multiplying or dividing by two" never hurts. So…
In the first question, there's clearly nothing happening here but comparisons, so that's what we have to count.
And the worst case is obviously that you have to go through the whole array.
So, in that case, you have to compare a[0] == a[1], then a[1] == a[2], …, a[N-1] == a[N]. For each of N-1 elements, there's 1 comparison. That's N-1 steps, which is obviously O(N).
The fact that the array is sorted turns out to be irrelevant here. (Of course since they're not sorted by your search key—that is, they're sorted by last name, but you're comparing by first name—that was already pretty obvious.)
In the second question, there are two things happening here: comparisons, and then moves.
For the comparisons, the worst case is that you have to do all N searches because there are no matches. As you say, we start with a[0] vs. a[1], …, a[N]; then a[1] vs. a[2], …, a[N], etc. So, N-1 comparisons, then N-2, and so on down to 0. So the total number of comparisons is sum(0…N-1), which is N*(N-1)/2, or N^2/2 - N/2, which is O(N^2).
For the moves, the worst case is that you find a match between a[0] and a[N]. In that case, you have to swap a[N] with a[N-1], then a[N-1] with a[N-2], and so on until you've swapped a[2] with a[1]. So, that's N-1 swaps, which is O(N), which you can ignore because you've already got an O(N^2) term.
As a side note, I'm not sure from your description whether you're talking about an array from a[0…N], or an array of length N, so a[0…N-1], so there could be an off-by-one error in both of the above. But it should be pretty easy to prove to yourself that it doesn't make a difference.
Scenario 2, a method of finding two matching items of arbitrary value, is indeed “quadratic”. Each pass looking for a match of one candidate against all the rest of the elements is O(n). But you repeat that n times. The value of n drops as you go so a detailed number of comparisons would be closer to n+(n-1)+(n-2)+ … 1 which is (n+1)×(n/2) or ½(n²+n) but all we care about is the overall shape of the curve so don't worry about the lower order terms or the coefficients. It's O(n²).

Fastest way to copy into an array random elements from another array C++

The title almost tells everything,but I will exemplify this: suppose that you have an array a of chars, and another array b also of chars. Is there a better way to put in a only the char located at prime positions in b? Suppose that we have an array with prime positions.
For now my naive code looks like this.
for(i = 0; i < n; i++)
a[i] = b[j + prime[i]];
Here prime[i] stores the prime positions of b and b is much larger than a,j is an arbitrary position in b(there will not be an out of bound problem because j+prime[i] does not exceed border of b).
What is better? One way is: If the prime[] locations are known at compile time, then we could add a prefetch to get the cache lines in ahead of time.
This is making the memory access time better.
You can either do this when you read (or copy) values into the array, using a prime function that tells you if a number is prime or not.
A way I sketched quickly is to generate prime numbers until they reach your array capacity and simply iterate through them and copy the desired elements from your a array. I can think of several ways of optimizing this, such as having a "preprocess" function that generates prime numbers in your program so you can reuse the list.
The prime number list will get cached and it will take a lot less time to be accessed(it s unlikely that you have an extremely huge prime number list)
Let's look at this from an algorithmic perspective.
You want to perform a hash function on each of the entries in array A. Assuming that you know nothing about the state of the items in array A, then that places the lower bound of run time for the algorithm at O(n), linear time. You must iterate through every member because you don't have any more information that could assist you in "skipping" some elements or optimizing the process.
That said, the challenge then becomes keeping the algorithm down at O(n). The code you demonstrate does do this, assuming you then follow up with copying the non-prime numbers in the same manner. So for the copying step, no there is not a way to make this any faster from an algorithm point of view. That doesn't mean that how you perform the hashing step won't affect the speed, though.

Fast adding random variables in C++

Short version: how to most efficiently represent and add two random variables given by lists of their realizations?
Mildly longer version:
for a workproject, I need to add several random variables each of which is given by a list of values. For example, the realizations of rand. var. A are {1,2,3} and the realizations of B are {5,6,7}. Hence, what I need is the distribution of A+B, i.e. {1+5,1+6,1+7,2+5,2+6,2+7,3+5,3+6,3+7}. And I need to do this kind of adding several times (let's denote this number of additions as COUNT, where COUNT might reach 720) for different random variables (C, D, ...).
The problem: if I use this stupid algorithm of summing each realization of A with each realization of B, the complexity is exponential in COUNT. Hence, for the case where each r.v. is given by three values, the amount of calculations for COUNT=720 is 3^720 ~ 3.36xe^343 which will last till the end of our days to calculate:) Not to mention that in real life, the lenght of each r.v. is gonna be 5000+.
Solutions:
1/ The first solution is to use the fact that I am OK with rounding, i.e. having integer values of realizations. Like this, I can represent each r.v. as a vector and for at the index corresponding to a realization I have a value of 1 (when the r.v. has this realization once). So for a r.v. A and a vector of realizations indexed from 0 to 10, the vector representing A would be [0,1,1,1,0,0,0...] and the representation for B would be [0,0,0,0,0,1,1,1,0,0,10]. Now I create A+B by going through these vectors and do the same thing as above (sum each realization of A with each realization of B and codify it into the same vector structure, quadratic complexity in vector length). The upside of this approach is that the complexity is bound. The problem of this approach is that in real applications, the realizations of A will be in the interval [-50000,50000] with a granularity of 1. Hence, after adding two random variables, the span of A+B gets to -100K, 100K.. and after 720 additions, the span of SUM(A, B, ...) gets to [-36M, 36M] and even quadratic complexity (compared to exponential complexity) on arrays this large will take forever.
2/ To have shorter arrays, one could possibly use a hashmap, which would most likely reduce the number of operations (array accesses) involved in A+B as the assumption is that some non-trivial portion of the theoreical span [-50K, 50K] will never be a realization. However, with continuing summing of more and more random variables, the number of realizations increases exponentially while the span increases only linearly, hence the density of numbers in the span increases over time. And this would kill the hashmap's benefits.
So the question is: how can I do this problem efficiently? The solution is needed for calculating a VaR in electricity trading where all distributions are given empirically and are like no ordinary distributions, hence formulas are of no use, we can only simulate.
Using math was considered as the first option as half of our dept. are mathematicians. However, the distributions that we're going to add are badly behaved and the COUNT=720 is an extreme. More likely, we are going to use COUNT=24 for a daily VaR. Taking into account the bad behaviour of distributions to add, for COUNT=24 the central limit theorem would not hold too closely (the distro of SUM(A1, A2, ..., A24) would not be close to normal). As we're calculating possible risks, we'd like to get a number as precise as possible.
The intended use is this: you have hourly casflows from some operation. The distribution of cashflows for one hour is the r.v. A. For the next hour, it's r.v. B, etc. And your question is: what is the largest loss in 99 percent of cases? So you model the cashflows for each of those 24 hours and add these cashflows as random variables so as to get a distribution of the total casfhlow over the whole day. Then you take the 0.01 quantile.
Try to reduce the number of passes required to make the whole addition, possibly reducing it to a single pass for every list, including the final one.
I don't think you can cut down on the total number of additions.
In addition, you should look into parallel algorithms and multithreading, if applicable.
At this point, most processors are able to perform additions in parallel, given proper instrucions (SSE), which will make the additions many times faster(still not a cure for the complexity problem).
As you said in your question, you're going to need an awful lot of computation to get the exact answer. So it's not going to happen.
However, as you're dealing with random values, it would be possible to apply some mathmatics to the problem. Wouldn't the result of all these additions result in something that approaches the normal distribution? For example, consider rolling a single dice. Each number has equal probability so the realisations don't follow a normal distribution (actually, they probably do, there was a program on BBC4 last week about it and it showed that lottery balls had a normal distribution to their appearance). However, if you roll two dice and sum them, then the realisations do follow a normal distribution. So I think the result of your computation is going to approximate a normal distribution so it becomes a problem of finding the average value and the sigma value for a given set of inputs. You can workout the upper and lower bounds for each input as well as their averages and I'm sure a bit of Googling will provide methods for applying functions to normal distributions.
I guess there is a corollary question and that is what the results are used for? Knowing how the results are used will inform the decision on how the results are created.
Ignoring the programmatic solutions, you can cut down the total number of additions quite significantly as your data set grows.
If we define four groups W, X, Y and Z, each with three elements, by your own maths this leads to a large number of operations:
W + X => 9 operations
(W + X) + Y => 27 operations
(W + X + Y) + Z => 81 operations
TOTAL: 117 operations
However, if we assume a strictly-ordered definition of your "add" operation so that two sets {a,b} and {c,d} always result in {a+c,a+d,b+c,b+d} then your operation is associative. That means that you can do this:
W + X => 9 operations
Y + Z => 9 operations
(W + X) + (Y + Z) => 81 operations
TOTAL: 99 operations
This is a saving of 18 operations, for a simple case. If you extend the above to 6 groups of 3 members, the total number of operations can be dropped from 1089 to 837 - almost 20% saving. This improvement is more pronounced the more data you have (more sets or more elements will give more savings).
Further, this opens the problem to better parallelisation: if you have 200 groups to process, you can start by combining the 100 pairs in parallel, then the 50 pairs or results, then 25, etc. This will allow a large degree of parallelism that should give you much better performance. (For example, 720 sets would be added in ~10 parallel operations as each parallel add will allow increasing COUNT by a factor of 2.)
I'm absolutely no expert on this, but it would seem an ideal problem for using the parallel procesing capability of a typical GPU - my understanding is that something like CUDA would make short work of processing all these calculations in parallel.
EDIT: If your real question is "what's your largest loss" then this is a much easier problem. Given that every value in the ultimate set is the sum of one value from each "component" set, your biggest loss will generally be found by combining the lowest value from each component set. Finding these lower values (one value per set) is a much simpler job, and you then only need sum together that limited set of values.
There are basically two methods. An approximative one and an exact one...
Approximative method models the sum of random variables by a lot of samplings. Basically, having random variables A, B we randomly sample from each r.v. 50K times, add the sampled values (here SSE can help a lot) and we have a distribution of A+B. This is how mathematicians would do this in Mathematica.
Exact method utilizes something Dan Puzey proposed, namely summing only some small portion of each r.v.'s density. Let's say we have random variables with the following "densities" (where each value is of the same likelihood for simplicity sake)
A = {-5,-3,-2}
B = {+0,+1,+2}
C = {+7,+8,+9}
The sum of A+B+C is going to be
{2,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,8,8,8,9}
and if I want to know the whole distribution precisely, I have no other choice than summing each elem of A with each elem of B and then each elem of this sum with each elem of C. However, if I only want the 99% VaR of this sum, i.e. 1% percentile of this sum, I only have to sum the smallest elements of A,B,C.
More precisely, I will take nA,nB,nC smallest elements from each distribution. To determine nA,nB,nC let's set these to 1 first. Then, increase nA by one if A[nA] = min( A[nA], B[nB], C[nC]) (counting on that A,B,C are sorted). This way, I can get the nA, nB, nC smallest elements of A,B,C which I will have to sum together (each with each other) and take the X-th smallest sum (where X is 1% multiplied by total combination count of sums, i.e. 3*3*3 for A,B,C). This also tells when to stop increasing nA,nB,nC - stop when nA*nB*nC > X.
However, like this I am doing the same redundancy again, i.e. I am calculating the whole distribution of A+B+C left of the 1% percentile. Even this will be MUCH shorter than calculating the whole distro of A+B+C, however. But I believe there should be a simple iterative algo to tell exaclty the the given VaR number in O(a*b) where a is the number of added r.v.s and b is the max number of elements in the density of each r.v.
I will be glad for any comments on whether I am correct.