How can you structure a script to identify like algebraic terms? - python-2.7

I'm trying to write a script that in some way represents algebraic expressions, and I'm trying to make it as general as possible so that it can eventually accommodate things like multivariable expressions, e.g. xy^2 = z, and other things like trig functions. However, I need my script to be able to simplify expressions, e.g. simplifying x^2 + 2x^2 to 3x^2, and in order to do that I need it to recognize like terms. To recognize like terms, it has to be able to tell me when two expressions are identical, even if they don't look the same. So for instance I need == to be defined in such a way that the computer will know that (x^2)^2 is x^4.
Now so far, the only way that I can see to make a computer know when two algebraic expressions are identical like this, is to try to create some kind of a "normal form" for all expressions, and then compare the normal forms. So for instance, if I distribute all exponents over multiplication, multiply powers of sums, distribute multiplication over addition, and calculate all simple expressions of just numbers, then this might be at least close to something like a normal form. So for example the normal form of (x^2)^2 would be x^4 and the normal form of x^4 would be x^4. Since they have the same normal form, the computer can tell me they're equivalent expressions. It would say the normal form of (2x)^2+x^2 is 4x^2+x^2 and so wouldn't recognize that this normal form is the same as the normal form of 5x^2, though.
I'm thinking, at this stage I could try to define some "weak" notion of equality, that of equality of normal-form-components. Use this notion of equality, group like terms in the normal form, and this would get me a more universally correct normal form.
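For plain polynomials, the normal-form idea can be made concrete by storing an expression as a mapping from monomials to coefficients, so that like terms collapse automatically and == is just dictionary equality. The following is a minimal sketch under that assumption (non-negative integer exponents only, no trig functions); the helper names const, var, add, mul and power are made up for illustration, and this is not a description of how sympy or Wolfram|Alpha work internally.

```python
from collections import defaultdict

# A polynomial is a dict: {monomial: coefficient}.
# A monomial is a sorted tuple of (variable_name, exponent) pairs,
# so x*y^2 and y^2*x end up with the same key.

def const(c):
    """Polynomial for a plain number."""
    return {(): c} if c != 0 else {}

def var(name):
    """Polynomial for a single variable."""
    return {((name, 1),): 1}

def add(p, q):
    """Sum of two polynomials; like terms are merged by key."""
    result = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            result[mono] += coeff
    return {m: c for m, c in result.items() if c != 0}

def mul(p, q):
    """Product of two polynomials (distributes over addition)."""
    result = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            exps = defaultdict(int)
            for name, e in m1 + m2:
                exps[name] += e          # x^a * x^b = x^(a+b)
            mono = tuple(sorted(exps.items()))
            result[mono] += c1 * c2
    return {m: c for m, c in result.items() if c != 0}

def power(p, n):
    """p raised to a non-negative integer n, by repeated multiplication."""
    result = const(1)
    for _ in range(n):
        result = mul(result, p)
    return result

x = var("x")
print(power(power(x, 2), 2) == power(x, 4))
print(add(power(mul(const(2), x), 2), power(x, 2)) == mul(const(5), power(x, 2)))
```

Because every operation returns a value already in normal form, the "weak" and "strong" notions of equality coincide here; the hard part of a general system is everything this sketch leaves out (fractional exponents, functions, division).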
But all of this sounds like an absolute ton of work. So far I've defined classes for Expressions, which have subclasses of Variables, Sums, Products, powers, and so on, and right now I'm about 1/4 of the way through defining the function that would produce the normal form of a power object--I haven't even begun on the normal form for a Sum or Product class--and already the code is many pages long, and I'm still not sure that it'll ultimately work the way I want it to.
So my question is, how do you accomplish this goal? Will my current method work? Does anyone know how software like Wolfram|Alpha or the sympy package accomplishes this?

Related

What are some alternative forms of if > then relationships?

Traditional if > then relationship in pseudo code:
if (x>y) {
then print "x is greater than y."
}
There are also relational databases.
Or just visual if>then tables. A visual table representation.
There are also tree or hierarchical structure if>then programming aids.
I'm looking for any and all alternatives and flavors of if>then constructs, but preferably practical ones. Since most humans are better at using and remembering visual constructs (tables vs raw code) than symbolic constructs, I'm looking for the most intuitive way to theoretically construct an if>then rule engine, graphically.
Note: I'm not trying to implement this, I'm just trying to get an idea of what could theoretically be done.
I hope I've interpreted the question correctly.
Everything eventually boils down to comparisons; it's just a matter of breaking these comparisons up into manageable chunks for humans. There are many techniques to reduce if-thens, or at least transform them into something easier to understand.
One example would be polymorphism. This frees the programmer from one instance of if/then (basically a switch statement). Another example is maps. The implementation of a map uses if/thens, but one might pre-populate the map with all the data and use one logical piece of code instead of using if/then to differentiate. This moves to a data-driven approach. Another example is SQL; it is just a language, a higher level construct, that enables us to express conditions and constraints differently. How you choose to express these conditions is dependent on the problem domain. Some problems work well with traditional procedural programming, some with logic programming, declarative programming etc. If there are many levels of nested if-thens, a state machine approach might work well. Aspect-oriented programming tries to solve the problem of duplicated code in modules that doesn't belong specifically to any one module; a concern that "cross-cuts".
I would do some reading on Programming Paradigms. Do lots of research and if you run into a recurring problem, see if another approach allows you to reduce the amount of if-thens. Most times someone else has run into the same problem and come up with a solution.
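To illustrate the map-based, data-driven approach mentioned above, here is a small sketch that replaces an if/elif chain with a pre-populated dictionary. The function names and the choice of arithmetic operators as the example domain are assumptions made purely for illustration.

```python
import operator

def apply_with_if(symbol, a, b):
    """Traditional version: one branch per case."""
    if symbol == "+":
        return a + b
    elif symbol == "-":
        return a - b
    elif symbol == "*":
        return a * b
    raise KeyError(symbol)

# Data-driven version: the cases live in a table, the logic is one line.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def apply_with_map(symbol, a, b):
    """Look the behaviour up instead of branching on it."""
    return OPERATIONS[symbol](a, b)

print(apply_with_map("*", 6, 7))
```

Adding a new case now means adding one entry to the table rather than another branch to the code.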
Your question is a bit broad and we could ramble from logical gates to mathematical functions. I'm going to focus on this particular bit:
"I'm looking for the most intuitive way to theoretically construct an if>then rule engine, graphically".
First, two caveats:
The best representation depends on the number of possible rules. What works for 3-4 rules probably won't work for 30-40.
I'm going to pretend that else conditions don't exist.
An "if X then Y" boils down to one condition and one instruction whose execution depends on that condition. Let's pretend X -> Y means "if X is true then Y is executed". Let's create two sets: C, which contains all the possible conditions, and I, which contains all the possible instructions.
With this in mind, X ∈ C and Y ∈ I. In your specific case, can Y ∈ C (can Y be a condition)? If so, you have nested ifs.
Nested ifs can be represented as chains of conditions joined by and operators:
if (x > 3) {
    if (y > 5) {
        # do something
    }
}
Can be written as:
if (x > 3 and y > 5) {
    # do something
}
If you're only thinking about code then the latter can become problematic when you have many nested conditions, but when you go graphical, nesting (probably using tree-like structures) can look cluttered while chaining usually looks like a sequence of instructions (which I think is better).
If you don't consider nesting (chaining) in your rules, then connecting elements (boxes, circles, etc.) from X -> Y is a trivial way to work. The representation of this depends on how graphical you want to get (see the links below for some examples).
If you're considering nesting then three random ideas come to my mind:
Venn Diagrams: Visually attractive, useless for more than 3-4 conditions. They have a good fit with database representations. See: http://share.mheroin.com/image/3i3l1y0S2F39
Flowcharts: Highly functional and easy to read, not too cumbersome to create. Can get out of hand with 10+ elements. See: http://share.mheroin.com/image/2g071j3U1u29
Tables: As you mentioned, tables are a decent way to represent conditionals as long as you can restrain the set of applicable rules. This is an example taken from iTunes: http://share.mheroin.com/image/390y2G18123q. The "Match [all/any] of the following rules" works as a replacement for if/else.
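The table idea, including a "match all/any" switch, can be sketched as rules stored as plain data and evaluated by one generic function. This is only an illustration under assumed names (COMPARATORS, rule_holds, matches) and an assumed (field, comparison, value) rule layout; it is not taken from any particular rule engine.

```python
import operator

# Each rule is a row of the table: (field, comparison, value).
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}

def rule_holds(rule, record):
    """Evaluate a single table row against a record."""
    field, symbol, value = rule
    return COMPARATORS[symbol](record[field], value)

def matches(rules, record, mode="all"):
    """mode='all' chains the rules with 'and'; mode='any' with 'or'."""
    combine = all if mode == "all" else any
    return combine(rule_holds(rule, record) for rule in rules)

rules = [("x", ">", 3), ("y", ">", 5)]
print(matches(rules, {"x": 4, "y": 6}))
print(matches(rules, {"x": 4, "y": 1}, mode="any"))
```

With mode="all" the two rows behave like the chained condition x > 3 and y > 5 shown earlier.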

Using QScriptEngine to compute calculations

I'm creating a diagram modeling tool that connects Items to Tasks. Items have Properties (simple name/value relationships) and Tasks have Formulas. I intend to produce a UI where users write a formula in a QLineEdit using C++ syntax (i.e., (property1 * property2)/property3), and the tool then outputs the result. Of course, the formula would have to be parsed and computed somehow to produce the result.
My concern is whether QScriptEngine is appropriate for this. I've seen that it can be used to perform calculations using evaluate(). Besides the 4 "regular" operations (+, -, * and /), I only anticipate that sqrt() and pow() might be required, but apparently Math is also usable inside the evaluation string.
Also, I need to store and recover these formulas, so I was considering handling them as QStrings for that purpose, as I will need to write/read them to/from files.
Do you think this is a good approach? What would you suggest as good reading for this type of objective?
Yes, this approach is good. I've used it for a similar task. Note that QScriptEngine uses JavaScript syntax, not C++ syntax. But JavaScript syntax is powerful and fulfills usual needs of user-defined formulas. It supports regular operators, math functions, brackets, local variables, etc.
You can store a formula in QString. If you need to execute the same formula multiple times, you should use QScriptProgram to compile a formula before executing.

Equivalence of boolean expressions

I have a problem that consists of comparing boolean expressions (OR is +, AND is *). To be more precise, here is an example:
I have the expression "A+B+C" and I want to compare it with "B+A+C". Comparing them as strings is not a solution: it will tell me that the expressions don't match, which is of course false. Any ideas on how to compare those expressions?
I accept any kind of suggestion, but (as a note) the final code in my application will be written in C++ (C is accepted, of course).
A normal expression could also contain parentheses:
(A * B * C) + D or A+B*(C+D)+X*Y
Thanks in advance,
Iulian
I think the competing approach to exhaustive (and possibly exhausting) creation of truth tables would be to reduce all your expressions to a canonical form and compare those. For example, rewrite everything into conjunctive normal form with some rule about the ordering of symbols (e.g. alphabetical order within terms) and terms (e.g. alphabetical by first symbol in term). This, of course, requires that symbol A in one expression is the same as symbol A in another.
How easy it is to write (or grab from the net) a C or C++ function for rewriting your expressions into CNF I don't know. However, there's been a lot of AI work done in C and C++ so you'll probably find something when you Google.
I'm also a little unsure about the comparative computational complexity of this approach and the truth-table approach. I strongly suspect that it's the same.
Whether you use truth tables or a canonical representation you can of course keep down the work to be done by splitting your input forms into groups based on the number of different symbols that they contain.
EDIT: On reading the other answers, in particular the suggestion to generate all truth tables and compare them, I think that @Iulian has severely underestimated the number of possible truth tables.
Suppose that we settle on RPN to write the expressions (this avoids having to deal with brackets), and that there are 10 symbols, which means 9 (binary) operators. There will be 10! different orderings of the symbols, and 2^9 different orderings of the operators. There will therefore be 10! x 2^9 == 1,857,945,600 rows in the truth table for this expression. This does include some duplicates: any expression containing only 'and' and 'or', for instance, will be the same regardless of the order of symbols. But I'm not sure I can figure this any further ...
Or am I making a big mistake?
You can calculate the truth table for each expression over all possible inputs then compare the truth tables.
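The truth-table comparison can be sketched as follows. For n distinct symbols the loop checks 2^n assignments of 0/1, so it is only practical for a modest number of symbols. This sketch is in Python rather than the C++ the question asks for, and it leans on Python's own eval (after a character whitelist) as a shortcut: with values restricted to 0 and 1, + is nonzero exactly when OR is true and * is nonzero exactly when AND is true. The function names are assumptions for illustration.

```python
import itertools
import re

def variables(expr):
    """Distinct symbol names appearing in the expression."""
    return sorted(set(re.findall(r"[A-Za-z_]\w*", expr)))

def truth_value(expr, env):
    """Evaluate expr with + as OR and * as AND over 0/1 values."""
    # Whitelist characters so eval only ever sees names, + * and parentheses.
    if "**" in expr or not re.fullmatch(r"[A-Za-z0-9_+*()\s]+", expr):
        raise ValueError("unsupported character in expression")
    return bool(eval(expr, {"__builtins__": {}}, env))

def equivalent(expr1, expr2):
    """True if both expressions agree on every row of the truth table."""
    names = sorted(set(variables(expr1)) | set(variables(expr2)))
    for values in itertools.product((0, 1), repeat=len(names)):
        env = dict(zip(names, values))
        if truth_value(expr1, env) != truth_value(expr2, env):
            return False
    return True

print(equivalent("A+B+C", "B+A+C"))
print(equivalent("A*(B+C)", "A*B+A*C"))
print(equivalent("A+B", "A*B"))
```

A C or C++ version would need its own small expression parser in place of eval, but the enumeration over all 0/1 assignments stays the same.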

How do I handle combinations of behaviours?

I am considering the problem of validating real numbers of various formats, because this is very similar to a problem I am facing in design.
Real numbers may come in different combinations of formats, for example:
1. with/without sign at the front
2. with/without a decimal point (if no decimal point, then perhaps number of decimals can be agreed beforehand)
3. base 10 or base 16
We need to allow for each combination, so there are 2x2x2=8 combinations. You can see that the complexity increases exponentially with each new condition imposed.
In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function. However, with each new condition, you have to double the number of classes required and it soon becomes a nightmare.
In procedural programming, you use 3 flags (i.e. has_sign, has_decimal_point and number_base) to identify the property of the real number you are validating. You have a single function for validation. In there, you would use the flags to control its behaviour.
// This is part of the validation function
if (has_sign)
    check_sign();
for (int i = 0; i < len; i++)
{
    if (has_decimal_point)
    {
        // Check if number[i] is '.' and do something if it is. If not, continue
    }
    if (number_base == BASE10)   // comparison (==), not assignment (=)
    {
        // number[i] must be between 0-9
    }
    else if (number_base == BASE16)
    {
        // number[i] must be between 0-9, A-F
    }
}
Again, the complexity soon gets out of hand as the function becomes cluttered with if statements and flags.
I am sure that you have come across design problems of this nature before: a number of independent differences which result in differences in behaviour. I would be very interested to hear how you have been able to implement a solution without making the code completely unmaintainable.
Would something like the bridge pattern have helped?
"In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function."
No no no no no. At most, you'd have a type for representing Numeric Input (in case String doesn't make it); another one for Real Number (in most languages you'd pick a built-in type, but anyway); and a Parser class, which has the knowledge to take a Numeric Input and transform it into a Real Number.
To be more general, one difference of behaviour in and by itself doesn't automatically map to one class. It can just be a property inside a class. Most importantly, behaviours should be treated orthogonally.
If (imagining that you write your own parser) you may have a sign or not, a decimal point or not, and hex or not, you have three independent sources of complexity and it would be ok to find three pieces of code, somewhere, that treat one of these issues each; but it would not be ok to find, anywhere, 2^3 = 8 different pieces of code that treat the different combinations in an explicit way.
Imagine that you add a new choice: suddenly, you remember that numbers might have an "e" (such as 2.34e10) and want to be able to support that. With the orthogonal strategy, you'll have one more independent source of complexity, the fourth one. With your strategy, the 8 cases would suddenly become 16! Clearly a no-no.
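The orthogonal strategy can be sketched like this: each independent option contributes exactly one piece of code, and the pieces are combined into a single regular expression. This is an illustration only; the function name build_validator is made up, and it assumes that a format "with sign" requires the sign and a format "with decimal point" requires the point.

```python
import re

def build_validator(has_sign=False, has_decimal_point=False, base=10):
    """Return a function that validates strings for one number format."""
    # Concern 1: which characters count as digits.
    if base == 10:
        digit = "[0-9]"
    elif base == 16:
        digit = "[0-9A-Fa-f]"
    else:
        raise ValueError("only base 10 and base 16 are supported")

    pattern = ""
    # Concern 2: the sign (assumed mandatory when the format has one).
    if has_sign:
        pattern += "[+-]"
    pattern += digit + "+"
    # Concern 3: the decimal point (assumed mandatory when the format has one).
    if has_decimal_point:
        pattern += r"\." + digit + "+"

    compiled = re.compile(pattern)
    return lambda text: compiled.fullmatch(text) is not None

signed_decimal = build_validator(has_sign=True, has_decimal_point=True)
plain_hex = build_validator(base=16)
print(signed_decimal("-12.5"), plain_hex("1F"))
```

Three options give three small pieces of code covering all 8 combinations; a fourth option would add one more piece rather than doubling anything.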
I don't know why you think that the OO solution would involve a class for each number pattern. My OO solution would be to use a regular expression class. And if I was being procedural, I would probably use the standard library strtod() function.
You're asking for a parser, use one:
http://www.pcre.org/
http://www.complang.org/ragel/
sscanf
boost::lexical_cast
and plenty of other alternatives...
Also: http://en.wikipedia.org/wiki/Parser_generator
Now how do I handle complexity for this kind of problem? Well, if I can, I reformulate.
In your case, using a parser generator (or regular expression) means using a DSL (Domain Specific Language), that is, a language more suited to the problem you're dealing with.
Design patterns and OOP are useful, but definitely not the best solution to each and every problem.
Sorry, but since I use VB, what I do is write a base function and then combine it with an evaluator function. I'll sketch it in fake code the way I have done it:
function getrealnumber(number as int){ return getrealnumber(number.tostring) }
function getrealnumber(number as float){ return getrealnumber(number.tostring) }
function getrealnumber(number as double){ return getrealnumber(number.tostring) }
function getrealnumber(number as string){
if ishex(){ return evaluation()}
if issigned(){ return evaluation()}
if isdecimal(){ return evaluation()}
}
And so forth; it's up to you to figure out how to do binary and octal.
You don't kill a fly with a hammer.
I really feel like using an object-oriented solution for your problem is EXTREME overkill. Just because you can design an object-oriented solution doesn't mean you have to force one onto every problem you have.
From my experience, almost every time there is a difficulty in finding an OOD solution to a problem, it probably means that OOD is not appropriate. OOD is just a tool, it's not god itself. It should be used to solve large-scale problems, not problems such as the one you presented.
So to give you an actual answer (as someone mentioned above): use a regular expression. Every solution beyond that is just overkill.
If you insist on using an OOD solution... Well, since all the formats you presented are orthogonal to each other, I don't see any need to create a class for every possible combination. I would create a class for each format and pass my input through each; in that case the complexity will grow linearly.

calculating user defined formulas (with c++)

We would like to have user-defined formulas in our C++ program.
E.g. the value v = x + ( y - (z - 2)) / 2. Later in the program the user would define x, y and z, and the program should return the result of the calculation. At some later point the formula may be changed, so the next time the program should parse the new formula and plug in the new values. Any ideas or hints on how to do something like this? So far the only solution I have come up with is to write a parser to calculate these formulas. Any ideas about that?
If it will be used frequently and if it will be extended in the future, I would almost recommend adding either Python or Lua into your code. Lua is a very lightweight scripting language which you can hook into and provide new functions, operators etc. If you want to do more robust and complicated things, use Python instead.
You can represent your formula as a tree of operations and sub-expressions. You may want to define types or constants for Operation types and Variables.
You can then easily enough write a method that recurses through the tree, applying the appropriate operations to whatever values you pass in.
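As an illustration of the tree representation and the recursive evaluation described above, here is a minimal sketch. It is written in Python for brevity rather than C++, and the class names Num, Var and BinOp are assumptions chosen for this example; the tree for the question's formula is built by hand instead of being parsed.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

class Num(object):
    """Leaf node: a numeric constant."""
    def __init__(self, value):
        self.value = value
    def evaluate(self, env):
        return self.value

class Var(object):
    """Leaf node: a variable looked up at evaluation time."""
    def __init__(self, name):
        self.name = name
    def evaluate(self, env):
        return env[self.name]

class BinOp(object):
    """Inner node: an operation applied to two sub-expressions."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def evaluate(self, env):
        # Recurse into both children, then apply this node's operation.
        return OPS[self.op](self.left.evaluate(env), self.right.evaluate(env))

# v = x + (y - (z - 2)) / 2, built by hand
formula = BinOp("+", Var("x"),
                BinOp("/",
                      BinOp("-", Var("y"), BinOp("-", Var("z"), Num(2.0))),
                      Num(2.0)))

print(formula.evaluate({"x": 1.0, "y": 4.0, "z": 2.0}))
```

The same tree can be evaluated again with different values, so the formula only needs to be parsed when it changes.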
Building your own parser for this should be a straight-forward operation:
1) convert the equation from infix to postfix notation (a typical compsci assignment) (I'd use a stack)
2) wait to get the values you want
3) pop the stack of postfix items, dropping the value for the variable in where needed
4) display results
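The steps above can be sketched as follows, using the shunting-yard algorithm for the infix-to-postfix conversion and a value stack for the evaluation. This is a minimal Python illustration, not C++ production code: it assumes only the binary operators + - * / and parentheses (no unary minus, no functions), and the names tokenize, to_postfix and eval_postfix are made up for the example.

```python
import operator
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()])")

def tokenize(formula):
    """Split the formula into numbers, names, operators and parentheses."""
    formula = formula.rstrip()
    tokens, pos = [], 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def to_postfix(tokens):
    """Shunting-yard: reorder infix tokens into postfix (RPN)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence (left-associative).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # number or variable name
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output

def eval_postfix(postfix, values):
    """Evaluate postfix tokens, substituting variable values as they appear."""
    stack = []
    for tok in postfix:
        if tok in APPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[tok](left, right))
        elif tok[0].isdigit():
            stack.append(float(tok))
        else:
            stack.append(float(values[tok]))
    return stack.pop()

postfix = to_postfix(tokenize("x + ( y - (z - 2)) / 2"))
print(postfix)
print(eval_postfix(postfix, {"x": 1, "y": 4, "z": 2}))
```

The postfix list can be kept around and re-evaluated whenever the user supplies new values for x, y and z.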
Using Spirit (for example) to parse (and the 'semantic actions' it provides to construct an expression tree that you can then manipulate, e.g., evaluate) seems like quite a simple solution. You can find a grammar for arithmetic expressions there for example, if needed... (it's quite simple to come up with your own).
Note: Spirit is very simple to learn, and quite adapted for such tasks.
There are generally two ways of doing it, with three possible implementations:
as you've touched on yourself, a library to evaluate formulas
compiling the formula into code
The second option here is usually done either by compiling something that can be loaded in as a kind of plugin, or it can be compiled into a separate program that is then invoked and produces the necessary output.
For C++ I would guess that a library for evaluation would probably exist somewhere so that's where I would start.
If you want to write your own, search for "formal automata" and/or "finite state machine grammar"
In general what you will do is parse the string, pushing characters on a stack as you go. Then start popping the characters off and perform tasks based on what is popped. It's easier to code if you force equations to reverse-polish notation.
To make your life easier, I think getting this kind of input is best done through a GUI where users are restricted in what they can type in.
If you plan on doing it from the command line (that is the impression I get from your post), then you should probably define a strict set of allowable inputs (e.g. only single letter variables, no whitespace, and only certain mathematical symbols: ()+-*/ etc.).
Then, you will need to:
Read in the input char array
Parse it in order to build up a list of variables and actions
Carry out those actions - in BOMDAS order
With ANTLR you can create a parser/compiler that will interpret the user input, then execute the calculations using the Visitor pattern. A good example is here, but it is in C#. You should be able to adapt it quickly to your needs and remain using C++ as your development platform.