Pretty print expression as entered - sympy

I would like to pretty print an expression to double check that it's what I want, without any manipulations or simplifications. Here's a simple example:
from sympy import *
import abc
init_session()
sigma_1, sigma_0, mu_1, mu_0,x = symbols("sigma_1 sigma_0 mu_1 mu_0 x")
diff = log(1/(sqrt(2*pi*sigma_1**2)) * exp(-(x-mu_1)**2/(2*sigma_1**2))) - log(1/(sqrt(2*pi*sigma_0**2)) * exp(-(x-mu_0)**2/(2*sigma_0**2)))
diff
This has manipulated the expression a bit, but I'd like to see it pretty printed just in the order I entered it, so I can check it easily against the formulas I've got written down.
Is there a way to do that?

You can avoid some simplifications by using
sympify("log(1/(sqrt(2*pi*sigma_1**2)) * exp(-(x-mu_1)**2/(2*sigma_1**2))) - log(1/(sqrt(2*pi*sigma_0**2)) * exp(-(x-mu_0)**2/(2*sigma_0**2)))", evaluate=False)
However, some simplifications can't be avoided. For example, there's no way to keep terms in the same order, and some expressions, like 1/x and x**-1 are internally represented in the same way. With that being said, there are definitely places where sympify(evaluate=False) could be improved.
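A minimal sketch of both points, assuming a recent SymPy. The input "x + x" is a toy example rather than the question's expression. With evaluate=False the parser keeps the two identical terms. By contrast, 1/x and x**-1 produce exactly the same internal tree, so no flag can preserve the difference between them.

```python
from sympy import symbols, sympify, srepr, pprint

x = symbols("x")

# Default parsing evaluates: like terms are combined immediately (2*x).
evaluated = sympify("x + x")

# evaluate=False keeps the structure closer to what was typed (x + x).
unevaluated = sympify("x + x", evaluate=False)

pprint(evaluated)
pprint(unevaluated)

# 1/x and x**-1 are built as the very same Pow node, so they cannot be told apart.
print(srepr(1/x))
print(srepr(x**-1))
```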


sympy function compose - bizarre results

I'm trying to compose two functions and I get a bizarre result:
#!/usr/bin/python
from sympy import *
init_printing(use_unicode=True)
x= symbols('x')
f = x/(x+1)
g = x/(x+2)
print(compose(f,g))
This shows: x/((x + 1)*(x + 2))
It should be x/(2*x + 2).
I don't get it. Does anyone have an idea?
Thanks
Despite being available in the top-level sympy namespace under the plain name compose, sympy.compose doesn't actually do general function composition.
sympy.compose is actually sympy.polys.polytools.compose, a function for polynomial composition. When you try to compose x/(x+1) and x/(x+2), it ends up interpreting these inputs as multivariate polynomials in three variables, x, 1/(x+1), and 1/(x+2), and the results are total nonsense.
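For general (non-polynomial) functions, one way to get f(g(x)) is plain substitution. Here is a minimal sketch using the f and g from the question. f.subs(x, g) replaces every x in f by g, and simplify should reduce the result to the expected x/(2*x + 2). The compose call at the end uses the polynomial example from the SymPy documentation, the kind of input compose is designed for.

```python
from sympy import symbols, simplify, compose

x = symbols("x")
f = x/(x + 1)
g = x/(x + 2)

# f(g(x)): substitute g for every occurrence of x in f, then tidy up.
h = simplify(f.subs(x, g))
print(h)

# compose() is meant for polynomials, e.g. (x - 1)**2 + (x - 1).
print(compose(x**2 + x, x - 1))
```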

Sympy: Simplify small compound fraction with squares and roots

I have got the following situation (in Sympy 1.8):
from sympy import *
u = symbols('u') # not necessarily positive
term = sqrt(1/u**2)/sqrt(u**2)
The term renders as $\frac{\sqrt{\frac{1}{u^{2}}}}{\sqrt{u^{2}}}$.
How can I simplify this to 1/u**2, i.e. $\frac{1}{u^{2}}$?
I have tried many functions from https://docs.sympy.org/latest/tutorial/simplification.html, and some arguments listed in https://docs.sympy.org/latest/modules/simplify/simplify.html, but could not get it to work.
The variable needs to be declared as a real number:
u=symbols('u', real=True)
Then the term is auto-simplified.
(I suggested a corresponding Sympy documentation change.)
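A short sketch of why the assumption matters. For a real u, sqrt(u**2) becomes Abs(u), and Abs(u)**2 becomes u**2, so the term collapses. For a complex u the identity simply does not hold. At u = I, the original term evaluates to 1, while 1/u**2 is -1, so SymPy is right not to simplify it without real=True.

```python
from sympy import symbols, sqrt, I

# Without assumptions, u may be complex, so SymPy leaves the term alone.
u_any = symbols("u")
term_any = sqrt(1/u_any**2)/sqrt(u_any**2)

# Declared real: sqrt(u**2) -> Abs(u), and Abs(u)**2 -> u**2.
u = symbols("u", real=True)
term = sqrt(1/u**2)/sqrt(u**2)
print(term)

# Counterexample for complex u: the original term and 1/u**2 differ at u = I.
print(term_any.subs(u_any, I), (1/u_any**2).subs(u_any, I))
```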

sympy separate fractions from variables

Using sympy, how do I keep fractions separate from variables? For example, I want to go from
Mul(Fraction(3,5), Pow(K, Integer(2)))
which pretty-prints as $\frac{3 K^{2}}{5}$,
to $\frac{3}{5} K^{2}$.
I know this simplified version is not too bad, but when I have really big equations, it gets messy.
I'm not very familiar with pretty printing or LaTeX printing, but I managed to come up with something: wrap each of the arguments of Mul in UnevaluatedExpr:
from sympy import *
from fractions import Fraction
K = symbols("K")
expr1 = Mul(UnevaluatedExpr(Fraction(3,5)), UnevaluatedExpr(Pow(K, Integer(2))))
expr2 = Mul(UnevaluatedExpr(pi/5), UnevaluatedExpr(Pow(K, Integer(2))))
expr3 = UnevaluatedExpr(S(1)*3123456789/512345679) * UnevaluatedExpr(Pow(K, Integer(2)))
pprint(expr1)
pprint(expr2)
pprint(expr3)
Produces:
$3/5 \cdot K^{2}$
$\frac{\pi}{5} \cdot K^{2}$
$\frac{1041152263}{170781893} \cdot K^{2}$
I couldn't find a way to make it print a stacked fraction for the slashed fraction 3/5, although longer fractions seem to work. If you are printing in LaTeX, however, whether short fractions are written as p/q or as \frac{p}{q} is controlled by the fold_short_frac option of latex(), e.g. latex(expr1, fold_short_frac=False).
Too bad I couldn't find an elegant solution like putting init_printing(div_stack_symbols=False) at the top of the document.
To elaborate on Maelstrom's answer, you need to do two things to make this work the way you want:
1. Create the separate fraction you want as its own expression.
2. Prevent the numerator or denominator from being modified when the expression is combined with other expressions.
What Maelstrom showed will work, but it's much more complicated than what's actually needed. Here's a much cleaner solution:
from sympy import *
K = symbols("K")
# Step 1: make the fraction
# This seems to be a weird workaround to prevent fractions from being broken
# apart. See the note after this code block.
lh_frac = UnevaluatedExpr(3) / 5
# Step 2: prevent the fraction from being modified
# Creating a new multiplication expression will normally modify the involved
# expressions as sympy sees fit. Setting evaluate to False prevents that.
expr = Mul(lh_frac, Pow(K, 2), evaluate=False)
pprint(expr)
gives:
$\frac{3}{5} \cdot K^{2}$
Important Note:
Doing lh_frac = UnevaluatedExpr(3) / 5 is not how fractions involving 2 literal numbers should typically be created. Normally, you would do:
lh_frac = Rational(3, 5)
as shown in the sympy docs. However, that gives undesirable output for our use case right now:
$\frac{3 K^{2}}{5}$
This outcome is surprising to me; setting evaluate to False inside Mul should be sufficient to do what we want. I have an open question about this.
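The UnevaluatedExpr wrapper is mainly useful for display. A reasonable pattern, offered here as a sketch rather than something the answers above prescribe, is to keep the wrapped form only for printing. Call doit() before doing further algebra, because doit() strips the UnevaluatedExpr and returns an ordinary evaluated expression.

```python
from sympy import symbols, UnevaluatedExpr, Mul, Pow, Rational, pprint

K = symbols("K")

# Display form: the fraction 3/5 stays separate from K**2 when pretty-printed.
lh_frac = UnevaluatedExpr(3) / 5
expr = Mul(lh_frac, Pow(K, 2), evaluate=False)
pprint(expr)

# Computation form: doit() removes the wrapper and evaluates the product.
evaluated = expr.doit()
print(evaluated)
```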

Why does the output of model.wv.similarity() in Word2Vec differ from model.wv.most_similar()?

I have trained a Word2Vec model and I am trying to use it.
When I ask for the words most similar to '动力' ("power"), I get output like this:
动力系统 0.6429724097251892
驱动力 0.5936785936355591
动能 0.5788494348526001
动力车 0.5579575300216675
引擎 0.5339343547821045
推动力 0.5152761936187744
扭力 0.501279354095459
新动力 0.5010953545570374
支撑力 0.48610919713974
精神力量 0.47970670461654663
But the problem is that if I call model.wv.similarity('动力','动力系统'), I get 0.0, which is not equal to 0.6429724097251892.
What confused me even more is that the similarity between '动力' and '驱动力' came out as 3.689349e+19.
So why? Have I misunderstood what similarity computes? I really need someone to explain this!
And the code is:
res = model.wv.most_similar('动力')
for r in res:
    print(r[0], r[1])
print(model.wv.similarity('动力', '动力系统'))
print(model.wv.similarity('动力', '驱动力'))
print(model.wv.similarity('动力', '动能'))
output:
动力系统 0.6429724097251892
驱动力 0.5936785936355591
动能 0.5788494348526001
动力车 0.5579575300216675
引擎 0.5339343547821045
推动力 0.5152761936187744
扭力 0.501279354095459
新动力 0.5010953545570374
支撑力 0.48610919713974
精神力量 0.47970670461654663
0.0
3.689349e+19
2.0
I have written a function to replace the model.wv.similarity method.
import numpy as np

def Similarity(w1, w2, model):
    A, B = model.wv[w1], model.wv[w2]
    return np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
where w1 and w2 are the two words you want to compare and model is the Word2Vec model you have trained.
Calling the similarity method directly on the model is deprecated. It has a bit of extra logic in it that performs vector normalization before evaluating the result.
You should be using wv directly because, as stated in the documentation, it does not matter how the word vectors were trained, so they should be treated as an independent structure; the model is just the means of obtaining them.
Here is a short discussion that should give you starting points if you want to investigate further.
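Whatever normalization is applied, a cosine similarity always lies in [-1, 1]. Values like 3.689349e+19 or 2.0 therefore cannot be valid similarities and point to something broken rather than to a subtle difference between the two methods. The small NumPy sketch below illustrates this. Random vectors stand in for real word vectors, and nothing here calls gensim. It shows that normalizing first and then taking a dot product gives the same number as the plain cosine formula.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
a = rng.normal(size=100).astype(np.float32)  # stand-in for one word vector
b = rng.normal(size=100).astype(np.float32)  # stand-in for another word vector

# Normalizing to unit length first turns cosine similarity into a plain dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(cosine(a, b), float(np.dot(a_unit, b_unit)))
```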
It may be an encoding issue, where you are not actually comparing the same tokens.
Try the following to see if it gives results closer to what you expect.
res = model.wv.most_similar('动力')
for r in res:
    print(r[0], r[1])
print(model.wv.similarity('动力', res[0][0]))
print(model.wv.similarity('动力', res[1][0]))
print(model.wv.similarity('动力', res[2][0]))
If it does, you could look further into why the model might be reporting strings that print as 动力系统 (etc.) but don't match the string literals typed into your code, like '动力系统' (etc.). For example:
print(res[0][0]=='动力系统')
print(type(res[0][0]))
print(type('动力系统'))
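If the types match but the comparison is still False, listing the code points usually reveals the difference, because repr() escapes invisible characters. A stdlib-only sketch follows. The trailing zero-width space is a hypothetical example of such a mismatch, not something known to be in this model's vocabulary.

```python
import unicodedata

def code_points(s):
    # Describe each character by its code point and official Unicode name.
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]

typed = "动力"
from_model = "动力\u200b"  # hypothetical token with an invisible trailing character

print(typed == from_model)  # the two look identical when printed, yet differ
print(repr(from_model))     # repr() makes the invisible character visible
for line in code_points(from_model):
    print(line)
```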

Nonlinear least squares in Stata, how to model summation over variables/sets?

I would like to estimate the following function by nonlinear least squares using Stata:
$\ln(\text{wage}_i) = \alpha_0 + \alpha_1 \ln\Big(\sum_j \text{S\_over\_H}_{ij}^{\,\alpha_2}\, \exp(\alpha_3\, \text{distance}_{ij})\Big)$
I am testing the results of another paper and would like to use Stata, since it is the same software/solver used in the paper I am replicating and it should be easier than using, for example, GAMS.
My problem is that I cannot find any way to write out the sum part of the equation above. In my data, each i is a single observation, with the values for the j's in separate variables. I could write out the whole expression in the following manner (here with three j's):
nl (ln_wage = {alpha0} + {alpha1}*log( ((S_over_H_1)^{alpha2})*exp({alpha3}*distance_1) + ((S_over_H_2)^{alpha2})*exp({alpha3}*distance_2) + ((S_over_H_3)^{alpha2})*exp({alpha3}*distance_3) ))
Is there a simple way to tell Stata to sum over an expression/variables for a given set of numbers, like in GAMS where you can write:
lnwage(i) = alpha0 + alpha1*ln(sum((j), power(S_over_H(i,j),alpha2) * exp(alpha3 * distance(i,j))))
There is no direct equivalent in Stata of the GAMS notation you cite, but you could do this
local call 0
forval j = 1/3 {
    local call `call' + S_over_H_`j'^({alpha2}) * exp({alpha3} * distance_`j')
}
nl (ln_wage = {alpha0} + {alpha1} * ln(`call'))
P.S. Please explain what GAMS is.
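Outside Stata, the GAMS-style sum over j becomes a single array reduction once the j-indexed variables are stored as an n x J array. The sketch below uses a different tool: Python with SciPy's least_squares, not Stata's nl. It runs on synthetic data with made-up true parameter values. Column j of S_over_H and distance plays the role of the question's S_over_H_j and distance_j variables.

```python
import numpy as np
from scipy.optimize import least_squares

def predict(params, S_over_H, distance):
    a0, a1, a2, a3 = params
    # Sum over j (axis=1) for every observation i, like sum((j), ...) in GAMS.
    inner = np.sum(S_over_H**a2 * np.exp(a3 * distance), axis=1)
    return a0 + a1 * np.log(inner)

# Synthetic, noise-free data: n observations, J = 3 columns per variable.
rng = np.random.default_rng(42)
n, J = 200, 3
S_over_H = rng.uniform(0.2, 5.0, size=(n, J))
distance = rng.uniform(0.0, 5.0, size=(n, J))
true_params = np.array([1.0, 0.5, 0.8, -0.3])  # made-up values for the demo
ln_wage = predict(true_params, S_over_H, distance)

def residuals(params):
    # Nonlinear least squares minimizes the sum of these squared residuals.
    return predict(params, S_over_H, distance) - ln_wage

fit = least_squares(residuals, x0=[0.8, 0.7, 0.6, -0.2])
print(fit.x)
```

Increasing J only changes the array width; the model code stays the same, which is the closest analogue here to the GAMS sum notation.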