I have the following 'issue' with sympy at the moment:
I have a symbolic expression like M = sp.Matrix([pi*a, sin(1)*b]) which I want to lambdify and pass to a numerical optimizer. The issue is that the optimizer needs the function to take and return numpy arrays of shape (n,), and specifically NOT (n, 1).
Now I have been able to achieve this with the following code (MWE):
import numpy as np
import sympy as sp
a, b = sp.symbols('a, b')
M = sp.Matrix([2*a, b])
f_tmp = sp.lambdify([[a,b]], M, 'numpy')
fun = lambda x: np.reshape( f_tmp(x), (2,))
Now, this is of course extremely ugly, since the reshape needs to be applied every time fun is evaluated (which might be LOTS of times). Is there a way to avoid this problem? The Matrix class is by definition always two-dimensional. I tried using sympy's MutableDenseNDimArray class, but it doesn't work in conjunction with lambdify (symbolic variables don't get recognized).
One way is to convert a matrix to a nested list and take the first row:
fun = sp.lambdify([[a, b]], M.T.tolist()[0], 'numpy')
Now fun([2, 3]) is [4, 3]. This is a Python list, not a NumPy array, but optimizers (at least those in SciPy) should be okay with that.
One can also do
fun = sp.lambdify([[a, b]], np.squeeze(M), 'numpy')
which also returns a list.
In my test the above were equally fast, and faster than the version with a wrapping function (be it np.squeeze or np.reshape): about 6 µs vs 9 µs. It seems the gain is in eliminating one function call.
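To make the shape claim concrete, here is a minimal end-to-end sketch (same symbols as the MWE above) verifying that the symbolically flattened version returns a flat result:
import numpy as np
import sympy as sp

a, b = sp.symbols('a b')
M = sp.Matrix([2*a, b])

# flattening symbolically, before lambdify, removes the per-call reshape
fun = sp.lambdify([[a, b]], M.T.tolist()[0], 'numpy')

print(fun([2, 3]))                    # [4, 3] -- a plain list
print(np.asarray(fun([2, 3])).shape)  # (2,), as the optimizer expects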
Related
I am trying to write the hypergeometric function 1F1 symbolically with sympy, but I don't really understand which numbers should go in the brackets to designate this particular hypergeometric function (1F1)?
import sympy as sp
y = sp.functions.special.hyper.hyper([...,...],[...],x)
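For reference, in the pFq convention the first list holds the p upper (numerator) parameters and the second list the q lower (denominator) parameters, so 1F1(a; b; x) takes exactly one entry in each. A minimal sketch:
import sympy as sp

a, b, x = sp.symbols('a b x')
y = sp.hyper([a], [b], x)  # 1F1(a; b; x), Kummer's confluent hypergeometric function

# sanity check on a known special case: 1F1(1; 2; x) = (exp(x) - 1)/x
print(sp.hyperexpand(sp.hyper([1], [2], x)))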
I am trying to understand sympy's symbolic functions:
import sympy
from sympy.abc import x,y,z
with sympy.evaluate(False):
    print(sympy.sympify("diff(x,x)").func)
    print(sympy.parse_expr("diff(x, x)", local_dict={'diff': sympy.Derivative}).func)
    print(sympy.sympify("Derivative(x,x)").func)
This outputs:
Piecewise
<class 'sympy.core.function.Derivative'>
<class 'sympy.core.function.Derivative'>
This example should illustrate that diff is not a symbolic function yet Derivative is.
sympy.sympify("diff(x,x)").func results in Piecewise.
What exactly makes a function in sympy 'symbolic'?
Why don't both of the functions belong to <class 'sympy.core.function.Derivative'>?
I tried to test on a few examples whether a function is symbolic using:
from sympy import (sin, cos, tan, sec, csc, cot, sinh, cosh, tanh, sech, csch, coth, asin, acos, atan, asec, acsc, acot, asinh, acosh, atanh, asech, acsch, acoth, log, exp, Sum, Product, Piecewise, jacobi)
list_of_funcs = [sin, cos, tan, sec, csc, cot, sinh, cosh, tanh, sech, csch, coth, asin, acos, atan, asec, acsc, acot, asinh, acosh, atanh, asech, acsch, acoth, log, exp, Sum, Product, Piecewise, jacobi]
with sympy.evaluate(False):
    for f in list_of_funcs:
        if issubclass(f, sympy.Basic):
            print(f'{f}: True')
It returned True for all of them, yet as far as I understood, Piecewise is not symbolic.
Could you help me finding a way to test if a function is symbolic?
Answering this question without going too deep into coding concepts is not easy, but I can give it a try.
SymPy exposes many functions:
cos, sin, exp, Derivative, Integral... we can think of them as symbolic functions. Let's say you provide one or more arguments, then:
the function might evaluate to some symbolic numerical value. For example, cos(0) will return the symbolic number 1.
the function might return an "unevaluated" object. For example, cos(x) returns cos(x): this is a symbolic expression of type cos (as you have seen by running the func attribute). Similarly, you can create a derivative object Derivative(expr, x): this is a symbolic expression that represents a derivative, it doesn't actually compute the derivative!
Unevaluated functions, for example Function("f")(x), which will render as f(x): this is a symbolic expression of type f.
diff, integrate, series, limit, ... : those are ordinary python functions (for example, created with def diff(...)) that are going to apply some operation to a symbolic expression. When you call diff(expr, x) you are asking SymPy to compute the derivative of expr with respect to x. What if you wanted to represent the derivative without actually computing it? You write Derivative(expr, x).
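A minimal sketch of that noun/verb distinction:
import sympy as sp

x = sp.Symbol('x')

print(sp.diff(sp.sin(x), x))   # cos(x) -- the "verb" computes immediately
deriv = sp.Derivative(sp.sin(x), x)
print(deriv)                   # Derivative(sin(x), x) -- the "noun" only represents it
print(deriv.func)              # <class 'sympy.core.function.Derivative'>
print(deriv.doit())            # cos(x) -- evaluated on demand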
So, going back to your example:
sympy.parse_expr("diff(x, x)", local_dict={'diff':sympy.Derivative})
this can be easily simplified to:
sympy.parse_expr("Derivative(x, x)")
A few more "relationships":
diff and Derivative
integrate and Integral
limit and Limit
Think of "symbolic" functions as nouns and "non symbolic" as verbs.
noun-like    verb-like
----------   ---------
Integral     integrate
Derivative   diff
Sum          summation
Product      product
Those that are Python functions (I think what you mean by non-symbolic) emit some evaluated form of something that could also exist in an unevaluated form, e.g. product(x, (x, 1, 5)) -> 120; Product(x, (x, 1, 5)) -> no change.
It would be better to refer to them in some other way. What I think you mean is "evaluated" vs "unevaluated" (since almost all of SymPy's functions will return SymPy objects with "symbols", even if numeric).
If type(thing) is function, it is not a class at all; it is a plain Python function. If you check the type of cos or Sum you will not get function as the result.
Whereas some of the evaluated and unevaluated forms have different names (Sum vs summation), others, like Add, are evaluative by default, but you can use the evaluate=False argument to stop that: Add(x, x) gives 2*x, while Add(x, x, evaluate=False) stays as x + x. Conversely, forms which are unevaluated by default can be made evaluative by passing evaluate=True, e.g. Derivative(x, x, evaluate=True) -> 1.
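A short sketch of the type check and the evaluate flag described above:
import sympy as sp

x = sp.Symbol('x')

print(type(sp.diff))                       # <class 'function'> -- a plain Python function
print(type(sp.cos))                        # a SymPy metaclass, not 'function'

print(sp.Add(x, x))                        # 2*x (evaluates by default)
print(sp.Add(x, x, evaluate=False))        # x + x (evaluation suppressed)
print(sp.Derivative(x, x))                 # Derivative(x, x) (unevaluated by default)
print(sp.Derivative(x, x, evaluate=True))  # 1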
In a dynamic system my base values are all functions of time, d(t). I create the variable d using d = Function('d')(t) where t = S('t')
Obviously it's very common to have derivatives of d (rates of change like velocity etc.). However, the default printing of d.diff(t) gives:
Derivative(d(t), t)
and using pretty printing in IPython (for example) gives a better-looking version of:
d/dt (d(t))
The functions which include the derivatives of d(t) are fairly long in my problems, however, and I'd like the printed representation to be something like d'(t) or \dot{d}(t) (LaTeX).
Is this possible in sympy? I can probably workaround this using subs but would prefer a generic sympy_print function or something I could tweak.
I do this by substitution. It is horribly stupid, but it works like a charm:
q = Function('q')(t)
q_d = Function('\\dot{q}')(t)
and then substitute with
alias = {q.diff(t):q_d, } # and higher derivatives etc..
hd = q.diff(t).subs(alias)
And the output hd has a pretty dot over its head!
As I said: this is a work-around and it works, but you have to be careful to substitute correctly (also for q_d.diff(t), which must map to q_d2, and so on!). You can keep one big dictionary with all the replacements for printing and just apply it after the relevant mathematical steps.
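Putting the work-around together in one runnable sketch (q_d2 is the second-derivative placeholder mentioned above):
import sympy as sp

t = sp.Symbol('t')
q = sp.Function('q')(t)
q_d = sp.Function('\\dot{q}')(t)
q_d2 = sp.Function('\\ddot{q}')(t)

# one big dictionary with all the replacements, applied only for printing
alias = {q.diff(t, 2): q_d2, q.diff(t): q_d}

expr = q.diff(t)**2 + q.diff(t, 2)
print(sp.latex(expr.subs(alias)))  # \dot{q} and \ddot{q} now appear in the output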
The vector printing module that you already found is the only place where such printing is implemented in SymPy.
from sympy.physics.vector import dynamicsymbols
from sympy.physics.vector.printing import vpprint, vlatex
d = dynamicsymbols('d')
vpprint(d.diff()) # ḋ
vlatex(d.diff()) # '\\dot{d}'
The regular printers (pretty, LaTeX, etc) do not support either prime or dot notation for derivatives. Their _print_Derivative methods are written so that they also work for multivariable expressions, where one has to specify a variable by using some sort of d/dx notation.
It would be nice to have an option for shorter derivative notation in general.
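If you want dot notation from the regular LaTeX printer anyway, one option is to subclass it and override _print_Derivative yourself. This is a minimal sketch, not a full solution: it only handles first and second derivatives of an applied undefined function with respect to a single symbol t, and defers to the default printer for everything else:
from sympy import Function, Symbol
from sympy.core.function import AppliedUndef
from sympy.printing.latex import LatexPrinter

t = Symbol('t')
d = Function('d')(t)

class DotLatexPrinter(LatexPrinter):
    def _print_Derivative(self, expr):
        # dot notation only for d/dt and d^2/dt^2 of an undefined function
        if (isinstance(expr.expr, AppliedUndef)
                and all(v == t for v in expr.variables)
                and len(expr.variables) in (1, 2)):
            accent = r'\dot' if len(expr.variables) == 1 else r'\ddot'
            return r'%s{%s}' % (accent, expr.expr.func.__name__)
        return super()._print_Derivative(expr)

print(DotLatexPrinter().doprint(d.diff(t)))     # \dot{d}
print(DotLatexPrinter().doprint(d.diff(t, 2)))  # \ddot{d}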
I know SO is not rent-a-coder, but I have a really simple python example that I need help translating to C++
grey_image_as_array = numpy.asarray( cv.GetMat( grey_image ) )
non_black_coords_array = numpy.where( grey_image_as_array > 3 )
# Convert from numpy.where()'s two separate lists to one list of (x, y) tuples:
non_black_coords_array = zip( non_black_coords_array[1], non_black_coords_array[0] )
First one is rather simple I guess - a linear indexable array is created with what bytes are returned from cv.GetMat, right?
What would be the equivalent of Python's where and especially the zip function?
I don't know about OpenCV, so I can't tell you what cv.GetMat() does. Apparently, it returns something that can be used as or converted to a two-dimensional array. The C or C++ interface to OpenCV that you are using will probably have a similarly named function.
The following lines create an array of index pairs of the entries in grey_image_as_array that are bigger than 3. Each entry in non_black_coords_array is a zero-based x-y coordinate pair into grey_image_as_array. Given such a coordinate pair x, y, you can access the corresponding entry in the two-dimensional C++ array grey_image_as_array with grey_image_as_array[y][x].
The Python code has to avoid explicit loops over the image to achieve good performance, so it needs to make do with the vectorised functions NumPy offers. The expression grey_image_as_array > 3 is a vectorised comparison and results in a Boolean array of the same shape as grey_image_as_array. Next, numpy.where() extracts the indices of the True entries in this Boolean array, but the result is not in the format described above, so we need zip() to restructure it.
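To make the Python side concrete, here is what those two lines produce on a tiny made-up example:
import numpy as np

grey = np.array([[0, 5],
                 [9, 1]])

rows, cols = np.where(grey > 3)  # two parallel index arrays: rows [0 1], cols [1 0]
print(list(zip(cols, rows)))     # [(1, 0), (0, 1)] -- (x, y) tuples, as in the question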
In C++, there's no need to avoid explicit loops, and an equivalent of numpy.where() would be rather pointless -- you just write the loops and store the result in the format of your choice.
I want to check if any of the items in one list are present in another list. I can do it simply with the code below, but I suspect there might be a library function to do this. If not, is there a more pythonic method of achieving the same result.
In [78]: a = [1, 2, 3, 4, 5]
In [79]: b = [8, 7, 6]
In [80]: c = [8, 7, 6, 5]
In [81]: def lists_overlap(a, b):
....: for i in a:
....: if i in b:
....: return True
....: return False
....:
In [82]: lists_overlap(a, b)
Out[82]: False
In [83]: lists_overlap(a, c)
Out[83]: True
In [84]: def lists_overlap2(a, b):
....: return len(set(a).intersection(set(b))) > 0
....:
Short answer: use not set(a).isdisjoint(b), it's generally the fastest.
There are four common ways to test if two lists a and b share any items. The first option is to convert both to sets and check their intersection, as such:
bool(set(a) & set(b))
Because sets are stored using a hash table in Python, searching them is O(1) (see here for more information about complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b. But
it must first create sets out of the lists, which can take a non-negligible amount of time, and
it supposes that hashing collisions are sparse among your data.
The second way to do it is to use a generator expression that iterates over one list and tests membership in the other, such as:
any(i in a for i in b)
This allows searching in place, so no new memory is allocated for intermediate variables. It also bails out on the first match. But the in operator is always O(n) on lists (see here).
Another proposed option is a hybrid: iterate through one of the lists, convert the other one to a set, and test for membership on that set, like so:
a = set(a); any(i in a for i in b)
A fourth approach is to take advantage of the isdisjoint() method of the (frozen)sets (see here), for example:
not set(a).isdisjoint(b)
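For reference, here are the four variants side by side as functions (a minimal sketch; the names are made up):
def overlap_intersection(a, b):
    return bool(set(a) & set(b))     # builds two sets, then intersects

def overlap_generator(a, b):
    return any(i in a for i in b)    # no extra memory, short-circuits, O(n) lookups

def overlap_hybrid(a, b):
    sa = set(a)
    return any(i in sa for i in b)   # one set, O(1) average lookups, short-circuits

def overlap_isdisjoint(a, b):
    return not set(a).isdisjoint(b)  # generally the fastest of the four

assert all(f([1, 2, 3], [3, 4]) and not f([1, 2], [3, 4])
           for f in (overlap_intersection, overlap_generator,
                     overlap_hybrid, overlap_isdisjoint))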
If the elements you search are near the beginning of an array (e.g. it is sorted), the generator expression is favored, as the set intersection method has to allocate new memory for the intermediate variables:
>>> from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974
Here's a graph of the execution time for this example as a function of list size:
Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the isdisjoint() method is better for very small list sizes, whereas the generator expression is better for larger list sizes.
On the other hand, as the search starts at the beginning for the hybrid and the generator expression, if the shared elements are systematically at the end of the array (or both lists share no values), the disjoint and set-intersection approaches are then way faster than the generator expression and the hybrid approach.
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)
0.08102107048034668
It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well the case where no elements are shared, and is the best case for the disjoint and set-intersection approaches.
Here are two analyses using random numbers (instead of rigging the setup to favor one technique or another):
High chance of sharing: elements are randomly taken from [1, 2*len(a)]. Low chance of sharing: elements are randomly taken from [1, 1000*len(a)].
Up to now, this analysis supposed both lists are of the same size. In the case of two lists of different sizes, for example when a is much smaller, isdisjoint() is always faster:
Make sure that a is the smaller list, otherwise performance decreases. In this experiment, the a list size was set constant to 5.
In summary:
If the lists are very small (< 10 elements), not set(a).isdisjoint(b) is always the fastest.
If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression any(i in a for i in b) is the fastest on large list sizes;
Otherwise, test the set intersection with not set(a).isdisjoint(b), which is always faster than bool(set(a) & set(b)).
The hybrid "iterate through list, test on set" a = set(a); any(i in a for i in b) is generally slower than other methods.
The generator expression and the hybrid are much slower than the two other approaches when the lists share no elements.
In most cases, using the isdisjoint() method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.
def lists_overlap3(a, b):
return bool(set(a) & set(b))
Note: the above assumes that you want a boolean as the answer. If all you need is an expression to use in an if statement, just use if set(a) & set(b):
def lists_overlap(a, b):
sb = set(b)
return any(el in sb for el in a)
This is asymptotically optimal (worst case O(n + m)), and might be better than the intersection approach due to any's short-circuiting.
E.g.:
lists_overlap([3,4,5], [1,2,3])
will return True as soon as it checks 3, the first element of a, against sb
EDIT: Another variation (with thanks to Dave Kirby):
def lists_overlap(a, b):
sb = set(b)
return any(itertools.imap(sb.__contains__, a))
This relies on imap's iterator, which is implemented in C, rather than a generator comprehension. It also uses sb.__contains__ as the mapping function. I don't know how much performance difference this makes. It will still short-circuit. (In Python 3, the built-in map is lazy, so it can be used the same way without itertools.)
You could also use any with a list comprehension:
any([item in a for item in b])
(Note that the brackets build the full list before any sees it, so this version does not short-circuit; dropping them gives a generator expression that does.)
In Python 2.6 or later you can do:
return not frozenset(a).isdisjoint(frozenset(b))
(The second frozenset is not strictly needed; isdisjoint accepts any iterable.)
You can use the any built-in function with a generator expression:
def list_overlap(a,b):
return any(i for i in a if i in b)
As John and Lie have pointed out, this gives incorrect results when every i shared by the two lists has bool(i) == False. It should be:
return any(i in b for i in a)
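A quick demonstration of the falsy-element pitfall:
a, b = [0, 1], [0, 2]
print(any(i for i in a if i in b))  # False -- 0 is shared, but bool(0) is False
print(any(i in b for i in a))       # True -- tests membership, not truthiness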
This question is pretty old, but I noticed that while people were arguing sets vs. lists, that no one thought of using them together. Following Soravux's example,
Worst case for lists:
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
100.91506409645081
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
19.746716022491455
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
0.092626094818115234
And the best case for lists:
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
154.69790101051331
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
0.082653045654296875
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000)
0.08434605598449707
So even faster than iterating through two lists is iterating through one list while checking membership in a set, which makes sense: checking whether a number is in a set takes constant time, while checking by iterating through a list takes time proportional to the length of the list.
Thus, my conclusion: iterate through one list, and check whether each item is in a set built from the other.
if you don't care what the overlapping element might be, you can simply compare the len of the combined list with the len of the lists combined as a set. If there are overlapping elements (or duplicates within a single list), the set will be shorter:
len(set(a+b+c)) == len(a+b+c) returns True if there is no overlap (assuming each list itself contains no duplicates).
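A tiny demonstration (note the caveat about duplicates within a single list):
a, b, c = [1, 2], [3, 4], [4, 5]
combined = a + b + c
print(len(set(combined)) == len(combined))  # False: 4 appears in both b and c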
I'll throw another one in with a functional programming style:
any(map(lambda x: x in a, b))
Explanation:
map(lambda x: x in a, b)
returns an iterator (a list in Python 2) of booleans indicating which elements of b are found in a. That result is then passed to any, which simply returns True if any element is True.