I have a sympy sum:
import sympy
x = sympy.IndexedBase('x')
n = sympy.symbols('n')
i = sympy.symbols('i', cls=sympy.Idx)
my_sum = sympy.summation(x[i], (i,1,n))
In my various calculations and whatnot, I sometimes end up with nested sums. Sometimes, these sums have "free variables", and sometimes not. For example, I might end up with the following:
my_double_sum = sympy.summation(my_sum, (i,1,n))
Now, since my_sum doesn't have a "free" i from the perspective of the outer sum, this should simplify to:
n*Sum(x[i], (i, 1, n))
However, sympy.simplify(my_double_sum) gives:
Sum(x[i], (i, 1, n), (i, 1, n))
How can I make sympy simplify summations intelligently with respect to the free summand indices?
This behavior does look a bit weird. Here is why it happens (and see the bottom for what is a bug and what is not).
First, summation is just syntactic sugar for creating a Sum and running .doit(). Sum's doit uses, among other things, eval_sum, which handles the case where the limit variable is not a free variable of the function (by pulling the function out of the sum and multiplying it by the number of terms), and a simple check shows that this should indeed hold here (and yet, as you showed, it doesn't):
>>> i in my_sum.free_symbols
False
so I did a little digging into the summation module.
Now, Sum has a parent class called AddWithLimits. Its constructor uses the _common_new function, which denests the function it receives.
This turns Sum(Sum(x[i], (i, 1, n)), (i, 1, n)) into Sum(x[i], (i, 1, n), (i, 1, n)) well before doit is invoked, so the internal function is x[i] rather than the Sum object you defined in my_sum (which is not obvious at first sight), and the limit variable therefore really is a free variable of the function.
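A quick check at the prompt (reusing the definitions from the question) makes the mechanism visible; the denesting happens in the constructor, before any evaluation:
>>> s = sympy.Sum(sympy.Sum(x[i], (i, 1, n)), (i, 1, n))
>>> s
Sum(x[i], (i, 1, n), (i, 1, n))
>>> s.function
x[i]
>>> i in s.function.free_symbols
True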
I tried to manually cancel the denesting by commenting out the three lines under the comment # denest any nested calls, and indeed, I received
n*Sum(x[i], (i, 1, n))
Of course, simply changing that would probably break other parts of the code, since the denesting is assumed in many other ExprWithLimits functions. Whether this is intended behavior is arguable, but if you think this case should be covered, it probably has to be handled explicitly inside eval_sum as a special case.
However, I would expect a summation over a different variable to be simplified normally, like
summation(x[i], (i, 1, n), (j, 1, n))
which it is not. This, I suspect, is closer to a bug: the first pass of eval_sum returns None, so the expansion over the j symbol is skipped.
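For completeness, the two-variable case can be reproduced like this (j is introduced here only for the check):
>>> j = sympy.symbols('j', cls=sympy.Idx)
>>> sympy.summation(x[i], (i, 1, n), (j, 1, n))
Sum(x[i], (i, 1, n), (j, 1, n))
rather than the expected n*Sum(x[i], (i, 1, n)).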
I am new to using OpenMP. I am trying to parallelize a nested loop, and so far I have something of this form...
#pragma omp parallel for
for (j = 0; j < m; j++) {
    /* some work */
    for (i = 0; i < n; i++) {
        p = b[i];
        if (p < 0 && k < m) {
            a[k] = c[i];
            k++;
        } else {
            x = c[i];
        }
    }
    /* some work */
}
The outer loop is in parallel, and the inner loop updates k. The current value of k is needed for the other threads to update a[k] correctly. The problem is that all of the threads are updating a[k], but the proper order of k is not kept.
Some threads will update k and a[k], and some will not. How do I communicate the latest k between threads to update a[k] properly, since c[i] will have different values for each thread?
For example, when it runs serially, the program might set the first seven values of a to {1,3,5,7,3,9,13} and terminate with k equal to 7, but when done parallel, produces different results, or results in a different (therefore wrong) order.
How do I keep the same order and ensure parallelism at the same time?
Note: this answer was completely rewritten in light of OP clarifications.
How do I keep the same order and ensure parallelism at the same time?
Order dependency is antithetical to parallelism, as running operations in parallel inherently entails relaxing the relative order in which they are performed. Not all computations can be effectively parallelized.
Your case is not an exception. The second and each subsequent iteration of your outer loop needs to use the final value of k (among other things) computed by the previous iteration. How can it get that? Only by performing the previous iteration first. What room does that leave for concurrent operation? None. Concurrency is not the same thing as parallelism, but it is one of the main motivations for parallelism, because that's how parallelism yields improvements in elapsed time.
With no scope for concurrency, parallelism is actively counterproductive for you. Suppose you made the whole body of the outer loop a critical section, so that there was no concurrency in fact (as your present code requires) and no data races involving k. Then you would still pay the overhead for parallelism, get no speedup in return, and probably still get the wrong results because of evaluations of the outer-loop body being performed in the wrong order.
It may be that the whole thing can be rewritten to reduce or remove the data dependencies that prevent effective parallelization of the computation, or it may not. We don't have enough information to tell, as it depends in part on the details of "some work" and on the significance of the data. You would probably need an altogether different algorithm to produce the desired results.
> Instead of giving a[n]={0,1,2,3,.......n} , it gives me garbage values for a when I use the reduction clause. I need the total sum of K, hence the reduction clause.
There is a closed-form equation for the sum of consecutive integers, and it has an especially simple form when the first integer in the list is 0 or 1. In particular, the sum of the integers from 0 to n, inclusive, is n * (n + 1) / 2. You do not need a reduction for this.
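For instance, if the final count is all you need, a minimal sketch (the value of n is illustrative) computes it without any loop or reduction:
#include <stdio.h>

int main(void)
{
    int n = 9;                     /* illustrative upper limit */
    int k = n * (n + 1) / 2;       /* sum of 0 + 1 + ... + n, computed in O(1) */
    printf("k = %d\n", k);         /* prints k = 45 */
    return 0;
}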
If you want to use a reduction anyway, then you need to understand that it doesn't work the way you seem to think it does. You get a separate, private copy of the reduction variable for each thread executing the parallel construct, with the per-thread (not per-iteration) final values of those independent variables combined according to the reduction operator. Thus, if you really want to do the computation via an OpenMP reduction, you need to restructure the loop something like this:
#pragma omp parallel for reduction (+:k)
for (i = 0; i < 10; i++) {
a[i] = i;
k += i;
}
That assumes that the value of k is 0 immediately prior to the loop, as you indeed seem to be doing. If that were not a safe assumption then you would need something like
type_of_k k0 = k;
k = 0;
#pragma omp parallel for reduction (+:k)
for (i = 0; i < 10; i++) {
a[k0 + i] = i;
k += k0 + i;
}
Note that in either case, not only does that set up the reduction correctly, but it also breaks the data dependency between loop iterations that was previously carried by the expression k++.
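For reference, here is a minimal, compilable version of the first sketch above (the array size of 10 and the final printf are just illustrative additions). Compile with -fopenmp (GCC/Clang); without it the pragma is ignored and the loop simply runs serially:
#include <stdio.h>

int main(void)
{
    int a[10];
    int k = 0;
    int i;

    #pragma omp parallel for reduction(+:k)
    for (i = 0; i < 10; i++) {
        a[i] = i;    /* a becomes {0, 1, ..., 9} no matter how iterations are split */
        k += i;      /* each thread sums into its private k; copies are combined at the end */
    }

    printf("k = %d\n", k);   /* prints k = 45, the same value the serial loop gives */
    return 0;
}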
It sounds like you're essentially filling in a with a filter of entries from c, and want to preserve their order. If this is the only use k has, some other methods spring to mind:
Always write a[i], but use a marker indicating unused values where the predicate (p < 0) wasn't satisfied. This preserves order, but requires a larger a that you can compact in a second pass.
Write an a_i array storing which index each entry belonged to. This still requires an atomic capture of k (#pragma omp atomic capture applied to k_local = k++), and a second sort to restore order. And you'd need both a and a_i to be the full size again, or you might miss entries, so all in all a terrible workaround.
Even with some sequential dependencies you can still optimize: e.g. a scan to calculate what k would be for each i can be done in O(log n) parallel steps rather than O(n) serial ones (see parallel prefix sum and the OpenMP discussions of it on Stack Overflow). This sort of thing is what OpenMP's ordered depend is for, I believe. Anyhow, this leads to the third solution:
Generate a k array holding the value k will have at each iteration, so that the threads that do write store to the correct places (see the sketch after this list). This requires scanning the predicate.
It is useful to have higher level constructs like map, scan and reduce when planning out algorithms.
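Here is a sketch of that third approach, assuming the predicate from the question is p < 0 and ignoring the k < m bound for brevity; the names keep, pos and filter_scan are illustrative, not from the original code:
#include <stdlib.h>

size_t filter_scan(const int *b, const int *c, int *a, size_t n)
{
    int *keep = malloc(n * sizeof *keep);
    size_t *pos = malloc(n * sizeof *pos);
    size_t k = 0;

    /* 1. Evaluate the predicate independently for each i - fully parallel. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        keep[i] = (b[i] < 0);

    /* 2. Exclusive prefix sum over keep[]: pos[i] is the value k would have
       at iteration i of the serial loop.  Shown serially for clarity; this
       step can itself be replaced by a parallel scan. */
    for (size_t i = 0; i < n; i++) {
        pos[i] = k;
        k += keep[i];
    }

    /* 3. Scatter each kept element to its precomputed slot - fully parallel,
       and the original order of the kept elements is preserved. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        if (keep[i])
            a[pos[i]] = c[i];

    free(keep);
    free(pos);
    return k;   /* number of elements written, matching the serial final k */
}
Steps 1 and 3 are embarrassingly parallel; only the scan in step 2 carries the dependency, and it can itself be parallelized if it ever becomes the bottleneck.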
I am a bit confused on the topic of calculating complexity.
I know about Big O and also how to calculate the complexity of loops (nested also).
Suppose I have a program with 3 loops running from 1 to n
for (int i = 0; i < n; i++)
{
    cout << i;
}
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Will the CPP compiler run all the 3 loops at the same time or will do it one after another?
I am very confused on this topic. Please help!
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Yes, assuming that the time of each loop iteration is the same, but in Big O notation O(3*n) == O(n), so the complexity is still linear.
Will the CPP compiler run all the 3 loops at the same time or will do it one after another?
Implicit parallelization requires the compiler to be 100% sure that parallelizing the code will not change the outcome. It can be (and sometimes is) done for simple operations, but cout << i is unlikely to be parallelized. It can be optimized in other ways, however: e.g. if n is known at compile time, the compiler could generate the whole string in one go and turn the loop into cout << "123456...";.
Also, time complexity and concurrency are rather unrelated topics. Code executed on 20 threads will have the same complexity as code executed on one thread, it will just be faster (or not).
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Run a thousand loops and it would still be O(n), since constant factors are neglected when calculating the upper-bound time complexity of a function. So O(n*m) will always be O(n) if m doesn't depend on the input size.
Also, the compiler won't run them at the same time, but sequentially, one after the other (unless you use multi-threading, of course). But even then, 3, 10 or 1000 loops one after another will still be O(n) as per the definition, as long as the number of times you loop does not depend on the input size.
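A tiny experiment makes the 3*n count concrete (the value of n is arbitrary):
#include <iostream>

int main()
{
    int n = 1000;
    long iterations = 0;
    for (int i = 0; i < n; i++) ++iterations;   // first loop:  n iterations
    for (int i = 0; i < n; i++) ++iterations;   // second loop: n iterations
    for (int i = 0; i < n; i++) ++iterations;   // third loop:  n iterations
    std::cout << iterations << "\n";            // prints 3000, i.e. 3 * n -- a constant multiple of n
}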
How to calculate complexity if a code contains multiple n complexity loops?
To understand Big-O notation and asymptotic complexity, it can be useful to resort at least to semi-formal notation.
Consider the problem of finding an upper bound on the asymptotic time complexity of a function f(n) based on the growth of n.
To help us, let's loosely define a function or algorithm f being in O(g(n)) (to be picky, O(g(n)) is a set of functions, hence f ∈ O(...), rather than the commonly misused f(n) ∈ O(...)) as follows:
If a function f is in O(g(n)), then c · g(n) is an upper bound on f(n), for some non-negative constant c such that f(n) ≤ c · g(n) holds, for sufficiently large n (i.e., n ≥ n0 for some constant n0).
Hence, to show that f ∈ O(g(n)), we need to find a set of (non-negative) constants (c, n0) that fulfils
f(n) ≤ c · g(n), for all n ≥ n0, (+)
Let's consider your actual problem
void foo(int n) {
for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
}
and for analyzing the asymptotic behaviour of foo based on the growth of n, consider std::cout << i << "\n"; as our basic operation. Based on this definition, foo performs 3 * n basic operations, and we may describe foo mathematically as
f(n) = 3 * n.
Now, we need to find a g(n) and some set of constants c and n0 such that (+) holds. For this particular analysis this is nearly trivial; insert f(n) as above in (+) and let g(n) = n:
3 * n ≤ c · g(n), for all n ≥ n0, [let g(n) = n]
3 * n ≤ c · n, for all n ≥ n0, [choose c = 3]
3 * n ≤ 3 · n, for all n ≥ n0.
The latter holds for any valid n, and we may arbitrarily choose n0 = 0. Thus, as per our definition above of a function f being in O(g(n)), we have shown that f is in O(n).
It is apparent that even if we repeat the loop in foo a number of times, as long as that number is a constant (and not dependent on n itself), we can always find constants c and n0 that fulfil (+) for g(n) = n, thus showing that the function f describing the number of basic operations in foo in terms of n is upper-bounded by linear growth.
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
However, it is essential to understand that Big-O notation describes an upper bound on the asymptotic behaviour of a mathematically described algorithm, or e.g. of a programmatically implemented function that, given a definition of a basic operation, can be described in that way. It does not, however, give an accurate description of the runtime you may expect from different variations of how to implement a function. Cache locality, parallelism/vectorization, compiler optimizations and hardware intrinsics, and inaccuracy in describing the basic operation are just a few of the many factors that make asymptotic complexity disjoint from actual runtime. The linked list data structure is a good example of one where asymptotic analysis is not likely to give a good view of runtime performance (as loss of cache locality, the actual size of the lists and so on will likely have a larger effect).
For the actual runtime of your algorithms, in case you are hitting a bottleneck, measuring on the target hardware with a production-representative compiler and optimization flags is key.
I'm trying to get a vector of polynomials, but within the vector have each polynomial defined by a function in Pari.
For example, I want to be able to output a vector of this form:
[f(x) = x-1 , f(x) = x^2 - 1, f(x) = x^3 - 1, f(x) = x^4 - 1, f(x) = x^5 - 1]
A simple vector construction of vector( 5, n, f(x) = x^n-1) doesn't work, outputting [(x)->my(i=1);x^i-1, (x)->my(i=2);x^i-1, (x)->my(i=3);x^i-1, (x)->my(i=4);x^i-1, (x)->my(i=5);x^i-1].
Is there a way of doing this quite neatly?
Update:
I have a function which takes a polynomial in two variables (say x and y), replaces one of those variables (say y) with exp(I*t), and then integrates this between t=0 and t=1, giving a single variable polynomial in x: int(T)=intnum(t=0,1,T(x,exp(I*t)))
Because of the way this is defined, I have to explicitly define a polynomial T(x,y)=..., and then calculate int(T). Simply putting in a polynomial directly, say int(x*y-1), returns:
*** at top-level: int(x*y-1)
*** ^----------
*** in function int: intnum(t=0,1,T(x,exp(I*t)))
*** ^--------------
*** not a function in function call
*** Break loop: type 'break' to go back to GP prompt
I want to be able to do this for many polynomials, without having to manually type T(x,y)=... for every single one. My plan is to try and do this using the apply feature (so, putting all the polynomials in a vector - for a simple example, vector(5, n, x^n*y-1)). However, because of the way I've defined int, I would need to have each entry in the vector defined as T(x,y)=..., which is where my original question spawned from.
Defining T(x,y)=vector(5, n, x^n*y-1) doesn't seem to help with what I want to calculate. And because of how int is defined, I can't think of any other way to go about trying to tackle this.
Any ideas?
The PARI inbuilt intnum function takes as its third argument an expression rather than a function. This expression can make use of the variable t. (Several inbuilt functions behave like this - they are not real functions).
Your int function can be defined as follows:
int(p)=intnum(t=0, 1, subst(p, y, exp(I*t)))
It takes as an argument a polynomial p and then it substitutes for y when required to do so.
You can then use int(x*y), which returns (0.84147098480789650665250232163029899962 + 0.45969769413186028259906339255702339627*I)*x.
Similarly you can use apply with a vector of polynomials. For example:
apply(int, vector(5, n, x^n*y-1))
Coming back to your original proposal - it's not technically wrong and will work. I just wouldn't recommend it over the subst method, though it could be useful if you wanted to perform numerical integration over a class of functions that are not representable as polynomials. Let's suppose int is defined as:
int(T)=intnum(t=0,1,T(x,exp(I*t)))
You can invoke it using the syntax int((x,y) -> x*y). The arrow is the PARI syntax for creating an anonymous function. (This is the difference between an expression and a function - you cannot create your own functions that behave like PARI's inbuilt functions.)
You may even use it with a vector of functions:
apply(int, vector(5, n, (x,y)->x^n*y-1))
I am using the syntax (x,y)->x^n*y-1 here, which is preferable to the f(x,y)=x^n*y-1 you had in your question, but they are essentially the same. (The latter form also defines f as a side effect, which is not wanted, so it is better to use anonymous functions.)
I need to understand how this recursion works. I understand simple recursion examples, but more advanced ones are hard. Even though there are just two lines of code, I have a problem with them... the return statement itself. I just draw a blank on how this works, especially the and/or operator. Any insight is very welcome.
bool subsetSumExists(Set<int> & set, int target) {
if (set.isEmpty()) {
return target == 0;
} else {
int element = set.first();
Set<int> rest = set - element;
return subsetSumExists(rest, target)
|| subsetSumExists(rest, target - element);
}
}
Recursive code is normally coupled with the concept of reduction. In general, reduction is a means to reduce an unknown problem to a known one via some transformation.
Let's take a look at your code. You need to find whether a given target sum can be constructed from elements of the input data set.
If the data set is empty, there is nothing to do besides comparing the target sum to 0.
Otherwise, let's apply the reduction. If we choose a number from the set, there can actually be 2 possibilities - the chosen number participates in the sum you're seeking or it doesn't. No other possibilities here (it's very important to cover the full spectrum of possibilities!). In fact, it doesn't really matter which data element is chosen as long as you can cover all the possibilities for the remaining data.
First case: the number doesn't participate in the sum. We can reduce the problem to a smaller one, with data set without the inspected element and the same target sum.
Second case: the number participates in the sum. We can reduce the problem to a smaller one, with data set without the inspected element and the requested sum decreased by the value of the number.
Note, you don't know at this point whether any of these cases is true. You just continue reducing them until you get to the trivial empty case where you can know for sure the answer.
The answer to the original question would be true if it's true for any of these 2 cases. That's exactly what operator || does - it will yield true if any of its operands (the outcome of the 2 cases) are true.
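If it helps to experiment, here is a self-contained sketch of the same idea using a std::vector and an index in place of the Set<int> from the question (that substitution is only so the example compiles on its own):
#include <iostream>
#include <vector>

// pos marks how much of the vector has already been "removed";
// set[pos..end) plays the role of the remaining set.
bool subsetSumExists(const std::vector<int>& set, std::size_t pos, int target) {
    if (pos == set.size()) {            // empty set: only the zero target succeeds
        return target == 0;
    }
    int element = set[pos];
    return subsetSumExists(set, pos + 1, target)             // case 1: element not in the sum
        || subsetSumExists(set, pos + 1, target - element);  // case 2: element in the sum
}

int main() {
    std::vector<int> data {3, 5};
    std::cout << std::boolalpha
              << subsetSumExists(data, 0, 8) << "\n";        // true: 3 + 5 == 8
}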
|| is logical OR. It's evaluated left-to-right and short-circuited.
This means that in an expression A || B, A is evaluated first. If it's true, the entire expression is true and no further evaluation is done. If A is false, B is evaluated and the expression gets the value of B.
In your example, A is "try getting the same sum without using the 1st element from the set". B is "use the 1st element from the set, which decreases the total left to sum, and try to get that with the rest of the elements".
Let's first look at the algorithm.
The base case (i.e. the case in which the recursion terminates) is when the set is empty.
Otherwise the program takes the first element and removes it from the set.
Now it will call subsetSumExists(rest, target) and check whether it is true; if it is, it will return true, otherwise it will call subsetSumExists(rest, target - element) and return whatever that returns.
In simple terms, it only makes the second call, subsetSumExists(rest, target - element), if the first one, subsetSumExists(rest, target), returns false.
Now let's try to dry-run this code with a small sample set of {3,5} and a target sum of 8. I'll call the function sSE from now on.
sSE({3,5}, 8) => sSE({5}, 8) || sSE({5}, (8-3))
sSE({5}, 8) => sSE({}, 8) || sSE({}, (8-5))
sSE({}, 8) => false.. now it will call sSE({}, (8-5))
sSE({}, 3) => false.. now it will call sSE({5}, (8-3))
sSE({5}, 5) => sSE({}, 5) || sSE({}, (5-5))
sSE({}, 5) => false.. now it will call sSE({}, (5-5))
sSE({}, 0) => true.. ends here and returns true
To understand recursion, you need to understand recursion.
To do that, you need to think recursively.
In this particular case:
0. For any subsetSum(set, target):
1. If set is empty, then subsetSum exists exactly when target is 0.
2. Otherwise, remove the first element of the set; check whether subsetSum(set, target) exists OR subsetSum(set, target - removed_element) exists (using step 0).
The set subtraction looks like strange syntax, but I will assume it produces the set with that element removed.
It "works" by trying every possible combination, which makes it exponential.
In the || expression, the LHS is the sum excluding the current element and the RHS is the sum including it. So you will get, down the exponential tree, every combination with each element either switched on or off.
Exponential, by the way, means that if you have 30 elements it will produce 2 to the power of 30, i.e. 0x40000000, just over a billion combinations.
Of course, with that many combinations the running time quickly becomes impractical.
If it finds the solution it might not run through all 2^N cases. If there is no solution it will always visit them all.
Speaking for myself, the difficulty in understanding the problem stems from the || operator. Let's look at the return statement at the bottom of the same code, written another way:
if (subsetSumExists(rest, target))
    return true;
if (subsetSumExists(rest, target - element))
    return true;
return false;
What would the big O notation of the function foo be?
int foo(char *s1, char *s2)
{
    int c = 0, s, p, found;
    for (s = 0; s1[s] != '\0'; s++)
    {
        for (p = 0, found = 0; s2[p] != '\0'; p++)
        {
            if (s2[p] == s1[s])
            {
                found = 1;
                break;
            }
        }
        if (!found) c++;
    }
    return c;
}
What is the efficiency of the function foo?
a) O(n!)
b) O(n^2)
c) O(n lg(base2) n )
d) O(n)
I would have said O(MN)...?
It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:
f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N
By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":
"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."
There is actually a formal definition for Big O with multiple variables however this requires extra constraints beyond single variable Big O be met, and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables to a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:
Method 1
Let x1 = length(s1)
Let x2 = length(s2)
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n). We use the following simplification rule (which can be found here) to simplify to O(n²):
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
Conclusion
For n = x1 (our limiting variable), with x1 >= x2, the number of iterations x1 · x2 is at most x1 · x1 = n²; the worst case is reached when x1 = x2.
Therefore: f(x1) ∈ O(n²)
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)
Then the question is probably better off being posted to https://math.stackexchange.com/
In big-O notation, we always have to define what the occurring variables mean. O(n) doesn't mean anything unless we define what n is. Often, we can omit this information because it is clear from context. For example, if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to state this every time.
Another important thing about big-O notation is that it only gives an upper bound -- every algorithm in O(n) is also in O(n^2). The notation is often used as meaning "the algorithm has exactly the asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".
In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(m n). If we define n to be the length of the longer of the two strings though, we can also write this as O(n^2) -- this is also an upper limit for the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
Think of it this way:
There are two inputs. If the function simply returned, then its performance would be unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of s2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more; it follows a curve. This is O(N^2).
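If you want to see the quadratic growth directly, here is a small instrumented sketch of foo's worst case (the particular strings are only an illustration):
#include <stdio.h>

int main(void)
{
    const char *s1 = "aaaa";     /* length 4 */
    const char *s2 = "bbbbbb";   /* length 6: no character of s1 occurs in s2 */
    long comparisons = 0;

    for (int s = 0; s1[s] != '\0'; s++)
        for (int p = 0; s2[p] != '\0'; p++) {
            comparisons++;
            if (s2[p] == s1[s])
                break;           /* never taken in this worst case */
        }

    printf("%ld comparisons\n", comparisons);   /* prints 24 = 4 * 6 = length(s1) * length(s2) */
    return 0;
}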