Parsing a series of formulas from a string using sympy

I have a pandas df that contains many string formulas that I would like to parse and eventually solve. I came across parse_expr, which initially seemed like it would work for my problem, but now I'm not so sure.
An example string formula might look like this:
A = B + C; D = A*.2;
parse_expr would seem to work well if I had a system of equations, but I may not be using it correctly. As it stands, parse_expr throws an "invalid syntax" error, I believe because of the equals sign. Can anyone tell me whether it's possible to solve this problem using parse_expr, or whether there is another approach I should try?

SymPy cannot parse a bunch of semicolon-separated formulas at once, so the string needs to be split first. Each piece then needs to be split again at =, assuming all formulas contain =. After parsing each side of the =, you can combine the two halves with Eq, which is SymPy's equation object, or use them in some other way.
from sympy import S, Eq
s = "A = B + C; D = A*.2;"
result = [Eq(*map(S, f.split("="))) for f in s.split(";")[:-1]]
The result is [Eq(A, B + C), Eq(D, 0.2*A)]
I used S, shorthand for sympify; parse_expr could be used similarly, and it has a few options that are not needed here.
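Since the stated goal is to eventually solve the parsed formulas, the resulting Eq objects can be fed straight to solve. A minimal sketch (treating B and C as the known quantities is an assumption made for illustration):

```python
from sympy import Eq, S, simplify, solve, symbols

s = "A = B + C; D = A*.2;"
# Parse each "lhs = rhs" fragment into an Eq, skipping empty pieces
eqs = [Eq(*map(S, f.split("="))) for f in s.split(";") if f.strip()]

# Treat B and C as known; solve the system for A and D
A, B, C, D = symbols("A B C D")
solution = solve(eqs, [A, D], dict=True)[0]
print(solution)
```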

parse_expr is based on the Python tokenizer, but it supports several extensions. These extensions take the form of functions that take a list of tokens, a locals dictionary, and a globals dictionary, and return a modified list of tokens. They are passed to parse_expr as a tuple, like parse_expr(expression, transformations=(transformation1, transformation2, ...)).
It's probably easiest to just take a look at the source of the sympy.parsing.sympy_parser submodule to see the existing transformations and how they work. Some of the transformations there will probably be useful to you. In this case, you would want a transformation that turns the = token into something else (in fact there is already a transformation function, convert_equals_sign, in the sympy_parser submodule that does exactly this). You presumably also want to handle *. somehow.
I've also written a guide on Python tokenization which may be helpful here: https://www.asmeurer.com/brown-water-python
If your syntax strays too far from Python's, it will be challenging to use parse_expr, since it only works with Python's tokenizer. In that case, you'd need to write your own grammar and parser (e.g., using ANTLR) for your DSL and parse it into something that can then be transformed into a SymPy expression.

Related

How to translate multiline string in Django models

I use ugettext_lazy as _, and in a models file my string is represented this way:
s = _("firstline"
"secondline"
"thirdline")
But after running makemessages I found that in the .po file only "firstline" is marked for translation; the rest are absent. I don't want to give up multiline strings, so is there any way to make translation work with them?
UPD:
I should complement my question: I need my multiline strings to be processed by Django's makemessages.
The best solution I can imagine so far is
s = (str(_("firstline")) +
     str(_("secondline")) +
     str(_("thirdline")))
Edit: Goodguy mentions that makemessages won't do Python parsing, and hence won't properly collect those kinds of "multiline" strings.
The first part is actually true and I stand corrected on this (my bad). BUT xgettext performs the same adjacent-string concatenation as Python, as mentioned here:
Some internationalization tools -- notably xgettext -- have already
been special-cased for implicit concatenation,
and here:
Note also that long strings can be split across lines, into multiple
adjacent string tokens. Automatic string concatenation is performed at
compile time according to ISO C and ISO C++; xgettext also supports
this syntax.
and as a matter of fact, half a dozen co-workers and I have been using this very pattern for years on dozens of projects.
s = _("firstline" "secondline" "thirdline")
When parsing Python, xgettext will automatically concatenate literal strings separated only by whitespace (spaces, newlines, etc.), so this is the exact equivalent of
s = _("firstlinesecondlinethirdline")
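The claim is easy to check in plain Python, where adjacent string literals are concatenated at compile time:

```python
# Adjacent string literals are joined by the compiler before
# any function (such as gettext's _) ever sees them.
s = ("firstline"
     "secondline"
     "thirdline")
print(s)  # firstlinesecondlinethirdline
```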
If you only get the first of those strings in your .po file, then the problem is elsewhere: either your snippet is not what you actually have in your code, or your .po file is not correctly updated, or something else entirely (a broken xgettext version, maybe?).
NB: this:
s = (str(_("firstline")) +
     str(_("secondline")) +
     str(_("thirdline")))
is about the worst possible solution from the translator's point of view (and can even make your message outright impossible to translate in some languages).
I had a similar issue and solved it using a standard Python multi-line but single-string format. For example, for your string:
s = _("firstline\
secondline\
thirdline")
Update: The actual problem is that makemessages does not do Python (and JS, etc.) parsing, so it will not concatenate multiline strings as expected. The solution below will not work either (makemessages won't see computed values).
Unfortunately, you have to find another way to format your message, preferably by splitting it into single-line parts.
Previous answer:
ugettext_lazy can only accept a single argument, so it's up to you how you want your translations split.
If you are fine with "firstline" "secondline" "thirdline" being exported for translation as a single sentence you can do something like this:
s = _(' '.join(["firstline", "secondline", "thirdline"]))
If you want to keep them as separate translation sentences, then something like this may also work:
s = ' '.join(_(line) for line in ["firstline", "secondline", "thirdline"])
Or just call _ on every line and concatenate the results.

Regular expression MATLAB

I have tried to solve this problem by reading old questions and with Google's help.
I'm writing a short script in MATLAB where the user types in an equation, which is then plotted using eval.
But I want to check that the equation is valid and uses the right variables, and so on.
I have three variables, X, Y, Z, in upper case; so for example 'X+Y-Z-7.5' is a valid equation, but 'XB-Z' isn't. Just 'X' is also a valid "equation"...
How can I write the regular expression? Here is what I have:
regexp(test,'(X|Y|Z)$|(X|Y|Z|\d)&&(+|-|*|/|)')
My next plan is to do something like
if regexp(test,'(X|Y|Z)$|(X|Y|Z|\d)&&(+|-|*|/|)') == 1
disp ('Correct')
end
So I want regexp to return whether the string matches the whole expression, not just the start index. I'm having trouble fixing that too.
Please help, I'm stuck.
One potential solution (if you have the Symbolic Math Toolbox) is to simply rely on that to determine whether the equations are valid.
You can use symvar to extract all symbols used in the equation and compare these to the variables you allow.
allowed = {'X', 'Y', 'Z'};
vars = symvar(userinput);
tf = ismember(vars, allowed);
if ~all(tf)
    disp('Invalid Variables Used');
end
This is likely going to be much more robust than attempting to create regular expressions as it relies on MATLAB's internal parser.

How to rewrite `sin(x)^2` to `cos(2*x)` form in SymPy

Such a rewrite is easy to obtain in other CASes, like Mathematica.
TrigReduce[Sin[x]^2]
(*1/2 (1 - Cos[2 x])*)
However, in SymPy, trigsimp returns sin(x)**2 with all the methods I tested:
trigsimp(sin(x)*sin(x), method='fu')
While dealing with a similar issue (reducing the order of sin(x)**6), I noticed that SymPy can reduce the order of sin(x)**n for n = 2, 3, 4, 5, ... by using rewrite, expand, then rewrite again, followed by simplify, as shown here:
from sympy import exp, sin, symbols
x = symbols('x')
expr = sin(x)**6
expr.rewrite(sin, exp).expand().rewrite(exp, sin).simplify()
this returns:
-15*cos(2*x)/32 + 3*cos(4*x)/16 - cos(6*x)/32 + 5/16
That works for every power, similarly to what Mathematica does.
On the other hand, if you want to reduce sin(x)**2*cos(x), a similar strategy works. In that case you have to rewrite both cos and sin to exp, then, as before, expand, rewrite, and simplify again:
(sin(x)**2*cos(x)).rewrite(sin, exp).rewrite(cos, exp).expand().rewrite(exp, sin).simplify()
that returns:
cos(x)/4 - cos(3*x)/4
The full "fu" method tries many different combinations of transformations to find "the best" result.
The individual transforms used in the Fu-routines can be used to do targeted transformations. You will have to read the documentation to learn what the different functions do, but just running through the functions of the FU dictionary identifies TR8 as your workhorse here:
>>> from sympy.simplify.fu import FU
>>> for f in FU.keys():
...     print("{}: {}".format(f, FU[f](sin(var('x'))**2)))
...
8<---
TR8: -cos(2*x)/2 + 1/2
TR1: sin(x)**2
8<---
Here is a silly way to get this job done.
trigsimp((sin(x)**2).rewrite(tan))
returns:
-cos(2*x)/2 + 1/2
It also works for
trigsimp((sin(x)**3).rewrite(tan))
which returns
3*sin(x)/4 - sin(3*x)/4
but it does not work for
trigsimp((sin(x)**2*cos(x)).rewrite(tan))
which returns
4*(-tan(x/2)**2 + 1)*cos(x/2)**6*tan(x/2)**2

custom regular expression parser

I would like to do regular expression matching on custom alphabets, using custom commands. The purpose is to investigate equations and expressions that appear in meteorology.
So, for example, my alphabet can be [p, rho, u, v, w, x, y, z, g, f, phi, t, T, +, -, /]. NOTE: rho and phi are multiple characters that should be treated as single characters.
I would also like to use custom commands, such as \v for variable, i.e. anything that is not an arithmetic operator.
I would like to use other commands such as (\v). Note the dot should match dx/dt, where x is a variable. Similarly, given p = p(x,y,z), p' would match dp/dx, dp/dy, and dp/dz, but not dp/df (somewhere it would be given that p = p(x,y,z)).
I would also like to be able to backtrack.
Now, I have investigated PCRE and Ragel with D, and I see that the first two problems are solvable, with multiple-character objects defined as fixed objects rather than as a character class.
However, how do I address the third?
I don't see either PCRE or Ragel admitting a way to use custom commands.
Moreover, since I would like to backtrack, I am not sure Ragel is the correct option, as this would need a stack, which means I would be using a CFG.
Is there perhaps a domain-specific language for building such regex/CFG machines (for 64-bit Linux, if that matters)?
Nothing is impossible here. Just write a new class with a regex inside, in your programming language, and define a new syntax. It will be your personal regular expression syntax. For example:
result = latex_string.match("p'(x,y,z)", "full"); // match dp/dx, dp/dy, dp/dz
result = latex_string_array.match("p'(x,y,z)", "partial"); // match ∂p/∂x, ∂p/∂y, ∂p/∂z
. . .
The match method will interpret the new pseudo-regular expression inside your class and return the result in the desired form. You can simply accept the input definition in string and/or array form. Actually, if some function has to be matched by all its derivatives, you should simplify the search notation to .match("p'").
One simple notice: the same rendered equation, dy = (dy/dt) dt, can have the source \mathrm{d}y=\frac{\mathrm{d}y}{\mathrm{d}t}\mathrm{d}t, or dy=\frac{dy}{dt}dt, or simply dy=(dy/dt)dt.
The problem with generalizing the meaning of LaTeX equations via regular expressions is the human input factor. LaTeX is just a notation, and an author can select various manners of input.
The best and most precise way is to analyze the formula content and build a computation tree. In that case, you would search not just for notations of differentials or derivatives, but for instructions to calculate differentials and derivatives; either way, it involves detailed analysis of the formula string, with multiple writing manners to handle.
One more thing, and good news for you! It's not necessary to define a magic regex-latex multibyte Greek alphabet. UTF-8 has ρ (GREEK SMALL LETTER RHO), which you can use in the UI, but in the search method treat it as \rho and simply use a regex like /\\frac{d\\rho}{dx}/.
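A sketch of that idea in Python's re module (the equation string and the single-letter-variable assumption are illustrative):

```python
import re

# LaTeX source that might sit behind a rendered rho in the UI
equation = r"\frac{d\rho}{dx} + u\frac{d\rho}{dy} - \frac{df}{dz}"

# Match derivatives of rho with respect to any single-letter variable
pattern = re.compile(r"\\frac\{d\\rho\}\{d[a-z]\}")
print(pattern.findall(equation))
```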
One more example:
// search string
equation = "dU= \left(\frac{\partial U}{\partial S}\right)_{V,\{N_i\}}dS+ \left(\frac{\partial U}{\partial V}\right)_{S,\{N_i\}}dV+ \sum_i\left(\frac{\partial U}{\partial N_i}\right)_{S,V,\{N_{j \ne i}\}}dN_i";
. . .
// user input by UI
. . .
// call method
equation.equation_match("U'");// example notation for all types of derivatives for all variables
. . .
// inside the 'equation_match' method you will use native regex methods
matches1 = equation.match(/dU/); // dU
matches2 = equation.match(/\\partial U/); // ∂U
etc.
return(matches);// combination of matches
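A minimal Python rendering of the pseudocode above (the helper name equation_match and the returned list shape are illustrative):

```python
import re

equation = (r"dU= \left(\frac{\partial U}{\partial S}\right)_{V,\{N_i\}}dS+ "
            r"\left(\frac{\partial U}{\partial V}\right)_{S,\{N_i\}}dV")

def equation_match(eq, symbol):
    """Collect total (dU) and partial (\\partial U) derivative notations."""
    total = re.findall(r"d" + re.escape(symbol), eq)
    partial = re.findall(r"\\partial " + re.escape(symbol), eq)
    return total + partial

print(equation_match(equation, "U"))
```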

Extracting values from comma separated lists

When given a list of comma-separated values like 3, asdf, *#, 1212.3, I would like to extract each of these values, not including the commas, so I would have a value list like [3, "asdf", "*#", 1212.3] (not as a textual representation like that, but as an array of 'hits'). How would I do this?
I see you're using the D programming language. Here is a link to a CSV parser for D.
First off, if you are dealing with CSV files, don't use regex or your own parser. Basically, when you think things are simple they really aren't: Stop Rolling Your Own CSV Parser.
Next up, you say that you would like an array ([3, "asdf", "*#", 1212.3]). This mixes types and cannot be done in a statically typed language, and it is ultimately very inefficient even using std.variant. For each parsed value you'd have code like:
try {
    auto data = to!double(parsedValue);
    auto data2 = to!int(data);
    if (data == data2)
        returnThis = Variant(data2);
    else
        returnThis = Variant(data);
} catch (ConvException ce) { }
Now, if your data is truly separated by some defined set of characters, and isn't broken into records with newlines, then you can use split(", ") from std.algorithm. Otherwise use a CSV parser. If you don't want to follow the standard, wrap the parser so the data comes out as you desire. In your example you have spaces, which are not to be ignored by the CSV format, so call strip() on the output.
The article I linked mentions that what commonly happens is that people write a parser in its simplest form and don't handle the more complicated cases. So when you look for a CSV parser you'll find many that just don't cut it. This is where writing your own parser comes up, which I say is fine; just handle all valid CSV files.
Luckily you don't need to write your own, as I recently made a CSV parser for D. Error checking isn't done currently; I don't know the best way to report issues so that parsing can be corrected and continue. Usage examples are found in the unittest blocks. You can parse into a struct too:
struct MyData {
    int a;
    string b;
    string c;
    double d;
}

foreach (data; csv.csv!MyData(str)) { // I think I'll need to change the module/function name
    // ...
}
In Perl you could do something like:
my @anArray = split(',', "A,B,C,D,E,F,G");
(?:,|\s+)?([^ ,]+) should do.
It skips a comma or space, then captures anything but a comma or space. Modify to taste.
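In Python's re module, the same pattern used with findall returns just the captured groups, which gives the list of hits directly:

```python
import re

s = "3, asdf, *#, 1212.3"
# findall returns the contents of the capture group for each match
values = re.findall(r"(?:,|\s+)?([^ ,]+)", s)
print(values)  # ['3', 'asdf', '*#', '1212.3']
```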