Substitute numerical constants with symbols in sympy - sympy

I have a question similar to this one: How to substitute multiple symbols in an expression in sympy? but in reverse.
I have a sympy expression with numerical values and symbols alike. I would like to substitute all numerical values with symbolic constants. I appreciate that such a query is uncommon for sympy. What can I try next?
For example, I have:
-0.5967695*sin(0.15280747*x0 + 0.89256966) + 0.5967695*sin(sin(0.004289882*x0 - 1.5390939)) and would like to replace all numbers with a, b, c etc. ideally in a batch type of way.
The goal is to then apply trig identities to simplify the expression.

I'm not sure if there is already such a function. If there is not, it's quite easy to build one. For example:
import string
from sympy import Number, Wild, sin, symbols

def num2symbols(expr):
    # wild symbol to select all numbers
    w = Wild("w", properties=[lambda t: isinstance(t, Number)])
    # extract the numbers from the expression (find returns a set)
    n = expr.find(w)
    # get a lowercase alphabet
    alphabet = list(string.ascii_lowercase)
    # create a symbol for each number
    s = symbols(" ".join(alphabet[:len(n)]))
    # create a dictionary mapping a number to a symbol
    d = {k: v for k, v in zip(n, s)}
    return d, expr.subs(d)

x0 = symbols("x0")
expr = -0.5967695*sin(0.15280747*x0 + 0.89256966) + 0.5967695*sin(sin(0.004289882*x0 - 1.5390939))
d, new_expr = num2symbols(expr)
print(new_expr)
# out: b*sin(c + d*x0) - b*sin(sin(a + f*x0))
print(d)
# {-1.53909390000000: a, -0.596769500000000: b, 0.892569660000000: c, 0.152807470000000: d, 0.596769500000000: e, 0.00428988200000000: f}

I feel like dict.setdefault was made for this purpose in Python :-)
>>> from sympy import Dummy, numbered_symbols, sign
>>> c = numbered_symbols('c',cls=Dummy)
>>> d = {}
>>> econ = expr.replace(lambda x:x.is_Float, lambda x: sign(x)*d.setdefault(abs(x),next(c)))
>>> undo = {v:k for k,v in d.items()}
Do what you want with econ and when done (after saving results to econ)
>>> econ.xreplace(undo) == expr
True
(But if you change econ the exact equivalence may no longer hold.) This uses abs to store symbols so if the expression has constants that differ by a sign they will appear in econ with +/-ci instead of ci and cj.
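The setdefault idea above can be wrapped in a small helper. This is only a sketch: the name floats_to_symbols and the prefix "c" are made up for illustration, and it assumes every constant of interest is a nonzero Float. Unlike the one-liner, it draws a new symbol only when a new magnitude is seen, so the symbols are numbered without gaps.

```python
from sympy import Dummy, Float, numbered_symbols, sign, sin, symbols

def floats_to_symbols(expr, prefix="c"):
    """Replace every Float in expr by +/- a Dummy symbol (hypothetical helper).

    Returns the rewritten expression and a dict mapping each symbol
    back to the positive number it stands for.
    """
    names = numbered_symbols(prefix, cls=Dummy)
    seen = {}  # abs(value) -> symbol

    def to_symbol(value):
        key = abs(value)
        if key not in seen:          # draw a new symbol only when needed
            seen[key] = next(names)
        return sign(value) * seen[key]

    new_expr = expr.replace(lambda t: t.is_Float, to_symbol)
    undo = {sym: value for value, sym in seen.items()}
    return new_expr, undo

x0 = symbols("x0")
expr = (-0.5967695*sin(0.15280747*x0 + 0.89256966)
        + 0.5967695*sin(sin(0.004289882*x0 - 1.5390939)))
econ, undo = floats_to_symbols(expr)
# econ contains only symbols; econ.xreplace(undo) restores the numbers
```

Because the symbols are Dummy instances, keep the undo dictionary around: a symbol created later with the same name will not match.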

Related

Sympy - Simplify expression within domain

Can Sympy automatically simplify an expression that includes terms like this one:
cos(x)/(cos(x)**2)**(1/2)
which can be simplified to 1 in the domain that I am interested in 0 <= x <= pi/2 ?
(Examples of other terms that could be simplified in that domain: acos(cos(x)); sqrt(sin(x)**2); sqrt(cos(2*x) + 1); etc.)
If you know the functions that are in your expression (such as sin, cos and tan), you can do the following according to this stack overflow question:
from sympy import *
x = symbols("x", positive=True)
ex = cos(x)/(cos(x)**2)**(S(1)/2)
ex = refine(ex, Q.positive(sin(x)))
ex = refine(ex, Q.positive(cos(x)))
ex = refine(ex, Q.positive(tan(x)))
print(ex)
Note that Q.positive(x*(pi/2-x)) did not help in the process of simplification for trig functions even though this is exactly what you want in general.
But what if you might have crazy functions like polygamma? The following works for some arbitrary choices for ex according to my understanding.
It wouldn't be a problem if the expression was already generated before by SymPy, but if you are inputting the expression manually, I suggest using S(1)/2 or Rational(1, 2) to describe one half.
from sympy import *
# define everything as it would have come from previous code
# also define another variable y to be positive
x, y = symbols("x y", positive=True)
ex = cos(x)/(cos(x)**2)**(S(1)/2)
# If you can, always try to use S(1) or Rational(1, 2)
# if you are defining fractions.
# If it's already a pre-calculated variable in sympy,
# it will already understand it as a half, and you
# wouldn't have any problems.
# ex = cos(x)/(cos(x)**2)**(S(1)/2)
# if x = arctan(y) and both are positive,
# then we have implicitly that 0 < x < pi/2
ex = simplify(ex.replace(x, atan(y)))
# revert back to old variable x if x is still present
ex = simplify(ex.replace(y, tan(x)))
print(ex)
This trick can also be used to define other ranges. For example, if you wanted 1 < x, then you could have x = exp(y) where y = Symbol("y", positive=True).
I think subs() will also work instead of replace() but I just like to be forceful with substitutions, since SymPy can sometimes ignore the subs() command for some variable types like lists and stuff.
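As a rough illustration of the range trick for 1 < x, the sketch below substitutes x = exp(y) with y positive. The test expression sqrt(log(x)**2) is an arbitrary choice and not taken from the question.

```python
from sympy import Symbol, exp, log, sqrt

x = Symbol("x")                    # intended domain: x > 1
y = Symbol("y", positive=True)     # auxiliary variable with x = exp(y)

ex = sqrt(log(x)**2)               # not simplified for a generic x
in_y = ex.subs(x, exp(y))          # log(exp(y)) collapses to y, sqrt(y**2) to y
result = in_y.subs(y, log(x))      # back to the original variable
print(result)                      # log(x)
```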
You can substitute for a symbol that has the assumptions you want:
In [27]: e = cos(x)/(cos(x)**2)**(S(1)/2) + cos(x)
In [28]: e
Out[28]:
            cos(x)
cos(x) + ────────────
            _________
           ╱    2
         ╲╱  cos (x)
In [29]: cosx = Dummy('cosx', positive=True)
In [30]: e.subs(cos(x), cosx).subs(cosx, cos(x))
Out[30]: cos(x) + 1
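The same substitution can be packaged as a function. This is a sketch under the assumption that the subexpression really is positive on the domain of interest; the helper name assume_positive is hypothetical.

```python
from sympy import Dummy, S, cos, symbols

def assume_positive(expr, sub):
    """Rebuild expr while treating the subexpression sub as positive (hypothetical helper)."""
    placeholder = Dummy("p", positive=True)
    # swap in a positive placeholder so automatic simplification fires,
    # then put the original subexpression back
    return expr.subs(sub, placeholder).subs(placeholder, sub)

x = symbols("x")
e = cos(x)/(cos(x)**2)**(S(1)/2) + cos(x)
result = assume_positive(e, cos(x))
print(result)
```

Only the automatic simplifications triggered by the placeholder are applied; nothing checks that sub is actually positive.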

How to simplify lengthy symbolic expressions in SymPy

I have been working on some integrations and even though the system is working, it takes much more time to work than it should.
The problem is that the expressions are many pages, and even though they are 3 variables only, sy.simplify just crashes the Kernel after 4 hours or so.
Is there a way to make such lengthy expressions more compact?
EDIT:
Trying to recreate a test expression using cse, I can't substitute the symbols back to obtain a final expression equal to the first one:
import sympy as sy
sy.var('a:c x')
testexp = sy.log(x)+a*(0.5*x)**2+(b*(0.5*x)**2+b+sy.log(x))/c
r, e = sy.cse(testexp)
FinalFunction = sy.lambdify(r[0:][0]+(a,b,c,x),e[0])
Points = sy.lambdify((a,b,c,x),r[0:][1])
FinalFunction(Points(1,1,1,1),1,1,1,1)
>>>NameError: name 'x1' is not defined
cse(expr) is sometimes a way to get a more compact representation since repeated subexpressions can be replaced with a single symbol. cse returns a list of repeated expressions and a list of expressions (a singleton if you only passed a single expression):
>>> from sympy import Eq, Lambda, cse, pprint, solve, var
>>> var('a:c x');solve(a*x**2+b*x+c, x)
(a, b, c, x)
[(-b + sqrt(-4*a*c + b**2))/(2*a), -(b + sqrt(-4*a*c + b**2))/(2*a)]
>>> r, e = cse(_)
>>> for i in r: pprint(Eq(*i))
...
       _____________
      ╱           2
x₀ = ╲╱  -4⋅a⋅c + b
      1
x₁ = ───
     2⋅a
>>> for i in e: pprint(i)
...
x₁⋅(-b + x₀)
-x₁⋅(b + x₀)
You are still going to have long expressions but they will be represented more compactly (and more efficiently for computation) if cse is able to identify repeated subexpressions.
To use this in SymPy you can create two Lambdas: one to translate the variables into the replacement values and the other to use those values:
>>> v = (a,b,c,x)
>>> Pts = Lambda(v, tuple([i[1] for i in r]+list(v)))
>>> Pts(1,2,3,4)
(2*sqrt(2)*I, 1/2, 1, 2, 3, 4)
>>> Func = Lambda(tuple([i[0] for i in r]+list(v)), tuple(e))
>>> Func(*Pts(1,2,3,4))
(-1 + sqrt(2)*I, -1 - sqrt(2)*I)
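For the test expression from the question, the same two-step evaluation can be sketched with plain subs. The helper evaluate_cse is hypothetical, and x/2 is used instead of 0.5*x so the arithmetic stays exact.

```python
import sympy as sy

a, b, c, x = sy.symbols("a b c x")
expr = sy.log(x) + a*(x/2)**2 + (b*(x/2)**2 + b + sy.log(x))/c

replacements, reduced = sy.cse(expr)

def evaluate_cse(replacements, reduced, values):
    """Evaluate cse output for given {symbol: number} values (hypothetical helper)."""
    env = dict(values)
    for sym, sub in replacements:
        env[sym] = sub.subs(env)     # each replacement may use earlier ones
    return [e.subs(env) for e in reduced]

point = {a: 1, b: 1, c: 1, x: 1}
value = evaluate_cse(replacements, reduced, point)[0]
print(value)                         # 3/2
```

The replacements have to be processed in the order cse returns them, because later ones may refer to earlier ones.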

Initialize a variable number of sympy symbols

Whenever you want to work with the python package sympy, a package for symbolic calculation, you need to initialize the variables as
x, y, z = symbols('x y z')
For my application, the number of symbols I need is not fixed. I only know that I have to calculate with, e.g., 4 variables.
Is there a smart way to initialize, e.g.,
a,b,c = symbols('a b c')
when I need three variables and
a,b,c,d,e = symbols('a b c d e')
when I need five variables?
In case I need more variables than there are letters in the alphabet,
the function should continue with
aa, ab, ac, ... .
You can use slice notation in symbols to create numbered symbols like
In [16]: symbols('a1:100')
Out[16]:
(a₁, a₂, a₃, a₄, a₅, a₆, a₇, a₈, a₉, a₁₀, a₁₁, a₁₂, a₁₃, a₁₄, a₁₅, a₁₆, a₁₇, a₁₈, a₁₉, a₂₀, a₂₁, a₂₂, a₂₃, a₂₄, a₂₅, a₂₆, a₂₇,
a₂₈, a₂₉, a₃₀, a₃₁, a₃₂, a₃₃, a₃₄, a₃₅, a₃₆, a₃₇, a₃₈, a₃₉, a₄₀, a₄₁, a₄₂, a₄₃, a₄₄, a₄₅, a₄₆, a₄₇, a₄₈, a₄₉, a₅₀, a₅₁, a₅₂, a₅₃,
a₅₄, a₅₅, a₅₆, a₅₇, a₅₈, a₅₉, a₆₀, a₆₁, a₆₂, a₆₃, a₆₄, a₆₅, a₆₆, a₆₇, a₆₈, a₆₉, a₇₀, a₇₁, a₇₂, a₇₃, a₇₄, a₇₅, a₇₆, a₇₇, a₇₈,
a₇₉, a₈₀, a₈₁, a₈₂, a₈₃, a₈₄, a₈₅, a₈₆, a₈₇, a₈₈, a₈₉, a₉₀, a₉₁, a₉₂, a₉₃, a₉₄, a₉₅, a₉₆, a₉₇, a₉₈, a₉₉)
Then if you want n symbols, where n is an int, keep in mind that the end of the range is exclusive (the output above stops at a₉₉), so you can do
syms = symbols('a1:%d' % (n + 1))
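If letter names (a, b, ..., z, aa, ab, ...) are preferred over numbered ones, as the question asks, a generator can produce them. This is a sketch with hypothetical helper names; it builds each Symbol directly so that no name is interpreted as range syntax.

```python
import itertools
import string

from sympy import Symbol

def letter_names():
    """Yield a, b, ..., z, aa, ab, ... indefinitely (hypothetical helper)."""
    for size in itertools.count(1):
        for combo in itertools.product(string.ascii_lowercase, repeat=size):
            yield "".join(combo)

def letter_symbols(n):
    """Return a tuple of the first n letter-named symbols."""
    return tuple(Symbol(name) for name in itertools.islice(letter_names(), n))

syms = letter_symbols(28)
print(syms[:3], syms[-2:])
```

Some generated names, such as pi, coincide with names of SymPy constants; here they are plain symbols with no special meaning.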

Converting Dummy symbols to Symbols in Sympy

How can I convert Dummy variables (sympy.core.symbol.Dummy) to regular symbols in Sympy?
For example, say we want to find all vectors (x_1,x_2) in the kernel of a matrix that satisfy some equation f(x_1,x_2) = 0.
I would break it into two steps. First:
from sympy import Matrix
M = Matrix([[1, 0],
            [0, 0]])
zeros = Matrix([[0],
                [0]])
sol = M.gauss_jordan_solve(zeros)[0]
Second: solve f(sol) = 0.
But I don't know how to tell Sympy to treat the entries of sol as symbols. Any ideas?
sol.free_symbols returns a set of all symbols in sol ("free" is a detail that does not matter on this occasion). If you'd like to replace them with some other symbols of your choice, then create new symbols (as many as needed) and replace using subs, as shown below.
from sympy import symbols
dummies = list(sol.free_symbols)
my_syms = symbols("x0:{}".format(len(dummies))) # Example: symbols("x0:3") creates x0, x1, x2
sol = sol.subs(dict(zip(dummies, my_syms)))
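Since free_symbols is a set, the order of dummies is not guaranteed. gauss_jordan_solve also returns the free parameters as its second result, which gives a fixed order. A sketch, where the names t0, t1, ... are an arbitrary choice:

```python
from sympy import Matrix, symbols

M = Matrix([[1, 0],
            [0, 0]])
rhs = Matrix([0, 0])

# the second return value is a column matrix of the Dummy parameters
sol, params = M.gauss_jordan_solve(rhs)

new_syms = symbols("t0:%d" % len(params))   # t0, t1, ... (names are an assumption)
sol = sol.subs(dict(zip(params, new_syms)))
print(sol)
```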

Regular expression puzzle

This is not homework, but an old exam question. I am curious to see the answer.
We are given an alphabet S={0,1,2,3,4,5,6,7,8,9,+}. Define the language L as the set of strings w from this alphabet such that w is in L if:
a) w is a number such as 42 or w is the (finite) sum of numbers such as 34 + 16 or 34 + 2 + 10
and
b) The number represented by w is divisible by 3.
Write a regular expression (and a DFA) for L.
This should work:
^(?:0|(?:(?:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\
+)*[369]0*)*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:
\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[
258](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0
\+)*[147])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+)(?:\+(?:0|(?:(?
:[369]|[147](?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)
*\+?(?:0\+)*[258])*(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]|0*(?:\+?(?:0\+)*
[369]0*)*\+?(?:0\+)*[147]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])|[258](?:0*(?
:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147])*
(?:0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[147]|0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)
*[258]0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*[258]))0*)+))*$
It works by having three states representing the sum of the digits so far modulo 3. It disallows leading zeros on numbers, and plus signs at the start and end of the string, as well as two consecutive plus signs.
Generation of regular expression and test bed:
a = r'0*(?:\+?(?:0\+)*[369]0*)*\+?(?:0\+)*'
b = r'a[147]'
c = r'a[258]'
r1 = '[369]|[147](?:bc)*(?:c|bb)|[258](?:cb)*(?:b|cc)'
r2 = '(?:0|(?:(?:' + r1 + ')0*)+)'
r3 = '^' + r2 + r'(?:\+' + r2 + ')*$'
# expand the placeholders b and c first, then a
r = r3.replace('b', b).replace('c', c).replace('a', a)
print(r)
# Test on 10000 examples.
import random, re
random.seed(1)
r = re.compile(r)
for _ in range(10000):
    x = ''.join(random.choice('0123456789+') for j in range(random.randint(1,50)))
    # malformed input: leading or doubled plus, leading zero, trailing plus
    if re.search(r'(?:\+|^)(?:\+|0[0-9])|\+$', x):
        valid = False
    else:
        valid = eval(x) % 3 == 0
    result = re.match(r, x) is not None
    if result != valid:
        print('Failed for ' + x)
Note that my memory of DFA syntax is woefully out of date, so my answer is undoubtedly a little broken. Hopefully this gives you a general idea. I've chosen to ignore + completely. As AmirW states, abc+def and abcdef are the same for divisibility purposes.
Accept state is C.
A=1,4,7,BB,AC,CA
B=2,5,8,AA,BC,CB
C=0,3,6,9,AB,BA,CC
Notice that the above language uses all 9 possible ABC pairings. It will always end at either A,B,or C, and the fact that every variable use is paired means that each iteration of processing will shorten the string of variables.
Example:
1490 = AACC = BCC = BC = B (Fail)
1491 = AACA = BCA = BA = C (Success)
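The three states can be sketched as a running digit sum modulo 3, with remainders 1, 2 and 0 playing the roles of A, B and C. Like the answer above, this sketch ignores + entirely and does not check that the input is well formed; the function name is made up.

```python
def digit_sum_divisible_by_3(w):
    """Simulate the three-state DFA: the state is the digit sum mod 3."""
    state = 0                      # 0 corresponds to C, 1 to A, 2 to B
    for ch in w:
        if ch == "+":
            continue               # plus signs do not change the state
        state = (state + int(ch)) % 3
    return state == 0              # accept only in state C

print(digit_sum_divisible_by_3("1490"))   # False
print(digit_sum_divisible_by_3("1491"))   # True
```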
Not a full solution, just an idea:
(B) alone: The "plus" signs don't matter here. abc + def is the same as abcdef for the sake of divisibility by 3. For the latter case, there is a regexp here: http://blog.vkistudios.com/index.cfm/2008/12/30/Regular-Expression-to-determine-if-a-base-10-number-is-divisible-by-3
To combine this with requirement (A), we can take the solution of (B) and modify it:
First read character must be in 0..9 (not a plus)
Input must not end with a plus, so: Duplicate each state (will use S for the original state and S' for the duplicate to distinguish between them). If we're in state S and we read a plus we'll move to S'.
When reading a number we'll go to the new state as if we were in S. S' states cannot accept (another) plus.
Also, S' is not "accept state" even if S is. (because input must not end with a plus).
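The duplicated-state construction can be sketched as follows. The pair (remainder, need_digit) stands for the state: need_digit is True in the S' copies and in the start state. This sketch follows the idea above and, as an assumption, does not reject leading zeros.

```python
def accepts(w):
    """DFA sketch: digit sum mod 3 plus a flag for 'a digit must come next'."""
    remainder = 0
    need_digit = True              # start state: first character must be a digit
    for ch in w:
        if ch == "+":
            if need_digit:         # leading plus or two consecutive plus signs
                return False
            need_digit = True      # move from S to S'
        elif ch in "0123456789":
            remainder = (remainder + int(ch)) % 3
            need_digit = False     # back to an ordinary state S
        else:
            return False           # character outside the alphabet
    # S' states never accept, so the input cannot end with a plus
    return not need_digit and remainder == 0

for w in ["42", "34+2", "34+16", "12+", "+12", "1++2"]:
    print(w, accepts(w))
```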