split a mathematical expression into components - c++

I am writing a command-line calculator which, if I don't fudge it, will have fancy things like natural expression evaluation. I want an efficient method of splitting the input expressions into their components for easy evaluation, regardless of whether the tokens are separated by spaces or not.
Even expressions like this should be processable:
2x^5 + 6d - h
For example,
2x^5+2y^4-62
would be split into
2, *, x, ^, 5, +, 2, *, y, ^, 4, -, 62
and then evaluated. I made an attempt at this, but it is very messy and ultimately doesn't work. Please give me a few hints on how to tokenize my input with the STL.
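One way to do this with only the standard library is a hand-written scanner that walks the string once, using the <cctype> classification functions. What follows is a sketch under stated assumptions, not a finished lexer:

- `tokenize` is a hypothetical name.
- Every letter is treated as a one-letter variable.
- Implicit multiplication is handled by emitting an explicit "*" token, which is what turns 2x^5+2y^4-62 into the token list shown above.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Splits an expression such as "2x^5+2y^4-62" into tokens. Assumptions:
//  - a number is a run of digits and '.' (validating it is left to the evaluator),
//  - every letter is a single-letter variable,
//  - the operators are + - * / ^ and the parentheses ( ).
// Whitespace is skipped. An explicit "*" token is inserted when a variable or '('
// directly follows a number, a variable or ')', so "2x" becomes 2 * x.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;

    // True if the last token can be the left operand of an implicit multiplication.
    auto lastIsOperand = [&tokens]() {
        if (tokens.empty()) return false;
        const unsigned char first = static_cast<unsigned char>(tokens.back().front());
        return std::isalnum(first) != 0 || first == '.' || first == ')';
    };

    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c) || c == '.') {
            std::size_t j = i;
            while (j < input.size() &&
                   (std::isdigit(static_cast<unsigned char>(input[j])) || input[j] == '.')) {
                ++j;
            }
            tokens.push_back(input.substr(i, j - i));
            i = j;
        } else if (std::isalpha(c) || c == '(') {
            if (lastIsOperand()) tokens.push_back("*");  // implicit multiplication
            tokens.push_back(std::string(1, input[i]));
            ++i;
        } else if (std::string("+-*/^)").find(input[i]) != std::string::npos) {
            tokens.push_back(std::string(1, input[i]));
            ++i;
        } else {
            throw std::invalid_argument(std::string("unexpected character: ") + input[i]);
        }
    }
    return tokens;
}
```

Because the result is a flat list of strings, it can be handed to whatever evaluator you choose, for example a recursive-descent parser or the shunting-yard algorithm.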

Related

C++ parsing expressions, breaking down the order of evaluation

I'm trying to write an expression parser. One part I'm stuck on is breaking down an expression into blocks via its appropriate order of precedence.
I found the order of precedence for C++ operators here. But where exactly do I split the expression based on this?
I have to assume the worst of the user. Here's a deliberately messy, exaggerated test example:
if (test(s[4]) < 4 && b + 3 < r && a!=b && ((c | e) == (g | e)) ||
r % 7 < 4 * givemeanobj(a & c & e, b, hello(c)).method())
Perhaps it doesn't even evaluate, and if it doesn't I still need to break it down to determine that.
It should break down into blocks of single operands and pairs connected by operators. Essentially, it breaks down into a tree structure where the branches are the groupings and each node has two branches.
Following the order of precedence, the first thing to do would be to evaluate givemeanobj(); however, that's an easy one to see. The next would be the multiplication sign. Does that split everything before the * into a separate block, or just the 4? 4 * givemeanobj comes before the <, right? So that's the first grouping?
Is there a straightforward rule to follow for this?
Yes, use a parser generator such as ANTLR. You write your language specification formally, and it will generate code which parses all valid expressions (and no invalid ones). ANTLR is nice in that it can give you an abstract syntax tree which you can easily traverse and evaluate.
Or, if the language you are parsing is actually C++, use Clang, which is a proper compiler and happens to be usable as a library as well.
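For binary operators, a straightforward rule does exist if you hand-roll the parser. The operator with the lowest precedence at the top level becomes the root of the tree; for left-associative operators, that is the rightmost such operator. Its two sides are then split recursively.

Precedence climbing is a common way to implement this rule without a parser generator. It is a swapped-in technique, not what the answer above proposes. Below is a minimal sketch under these assumptions:

- Tokens are separated by whitespace.
- Only a subset of C++'s binary operators is covered, all treated as left-associative.
- Function calls, unary operators and member access are not handled.
- `kPrecedence` and `Parenthesizer` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Binding strength of a subset of C++'s binary operators (higher binds tighter),
// in the same relative order as C++. All are treated as left-associative here.
const std::map<std::string, int> kPrecedence = {
    {"||", 1}, {"&&", 2}, {"|", 3}, {"^", 4}, {"&", 5},
    {"==", 6}, {"!=", 6}, {"<", 7}, {">", 7}, {"<=", 7}, {">=", 7},
    {"+", 8}, {"-", 8}, {"*", 9}, {"/", 9}, {"%", 9}};

// Precedence climbing over whitespace-separated tokens. The result is the input with
// every binary operation wrapped in parentheses, which makes the tree visible.
class Parenthesizer {
public:
    explicit Parenthesizer(const std::string& text) {
        std::istringstream in(text);
        for (std::string t; in >> t;) tokens_.push_back(t);
    }

    std::string run() {
        std::string result = parse(1);
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected token: " + tokens_[pos_]);
        return result;
    }

private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;

    // An operand: a parenthesized sub-expression or a single name/literal token.
    std::string primary() {
        if (pos_ >= tokens_.size()) throw std::runtime_error("expected an operand");
        const std::string tok = tokens_[pos_++];
        if (tok == "(") {
            std::string inner = parse(1);
            if (pos_ >= tokens_.size() || tokens_[pos_] != ")") throw std::runtime_error("missing )");
            ++pos_;
            return inner;
        }
        if (tok == ")" || kPrecedence.count(tok) != 0) {
            throw std::runtime_error("expected an operand, got " + tok);
        }
        return tok;
    }

    // Parses a sub-expression containing only operators with precedence >= minPrec.
    std::string parse(int minPrec) {
        std::string lhs = primary();
        while (pos_ < tokens_.size()) {
            const auto it = kPrecedence.find(tokens_[pos_]);
            if (it == kPrecedence.end() || it->second < minPrec) break;
            ++pos_;
            // Left associativity: the right operand may only hold tighter-binding operators.
            const std::string rhs = parse(it->second + 1);
            lhs = "(" + lhs + " " + it->first + " " + rhs + ")";
        }
        return lhs;
    }
};
```

For example, `Parenthesizer("r % 7 < 4 * x").run()` yields `((r % 7) < (4 * x))`. This shows that `4 * x` is grouped before the `<` is considered, which answers the grouping question above for this subset of operators.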

Haskell check if the regular expression r made up of the single symbol alphabet Σ = {a} defines language L(r) = a*

I have to write an algorithm in Haskell. The program takes a regular expression r over the unary alphabet Σ = {a} and checks whether r defines the language L(r) = a* (Kleene star). I am looking for any kind of tip. I know that I can translate any regular expression to the corresponding NFA, then to a DFA, and finally minimize the DFA and compare, but is there any other way to achieve my goal? I am asking because the problem explicitly says the alphabet is unary, so I suppose I am meant to use that information somehow to make this exercise much easier.
This is what my regular expression data type looks like:
data Reg = Epsilon        -- epsilon regex
         | Literal Char   -- a
         | Or Reg Reg     -- (a|a)
         | Then Reg Reg   -- (aa)
         | Star Reg       -- (a)*
         deriving Eq
Yes, there is another way. Every complete DFA for a regular language over a single-letter alphabet, restricted to the states reachable from the start, is a "lollipop"[1]: an initial chain of nodes, each pointing to the next (some marked final and some not), followed by a loop of nodes (again, some marked final and some not). So instead of doing a full compilation pass, you can go directly to a DFA, where you simply store two [Bool] saying which nodes in the lead-in and in the loop are marked final (or perhaps two [Integer] giving the indices and two Integer giving the lengths may be easier, depending on your implementation plans). You don't need to ensure the compiled version is minimal; it's easy enough to check that all the Bools are True. The base cases for Epsilon and Literal are pretty straightforward, and with a bit of work and thought you should be able to work out how to implement the combining functions for "or", "then", and "star" (hint: think about gcds and stuff).
[1] You should try to prove this before you begin implementing, so you can be sure you believe me.
Edit 1: Hm, while on my afternoon walk today, I realized the idea I had in mind for "then" (and therefore "star") doesn't work. I'm not giving up on this idea (and deleting this answer) yet, but those operations may be trickier than I gave them credit for at first. This approach definitely isn't for the faint of heart!
Edit 2: Okay, now that I have access to pencil and paper, I believe I've worked out how to do concatenation and iteration. Iteration is actually easier than concatenation. I'll give a hint for each, though I have no idea whether the hint is a good one or not!
Suppose your two lollipops have a length m lead-in and a length n loop for the first one, and m'/n' for the second one. Then:
For iteration of the first lollipop, there's a fairly mechanical/simple way to produce a lollipop with a 2*m + 2*n-long lead-in and n-long loop.
For concatenation, you can produce a lollipop with m + n + m' + lcm(n, n')-long lead-in and n-long loop (yes, that short!).
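The lollipop representation and the parts the answer calls straightforward can be sketched as follows. It is written in C++ to match the rest of this page, even though the question asks for Haskell; the two Bool lists map directly onto std::vector<bool>. `Lollipop`, `unionOf`, `epsilon`, `literal` and `isAStar` are hypothetical names.

Only "or" is implemented, because concatenation and iteration are exactly the parts the answer leaves as hints. Union works because beyond the longer lead-in both operands are periodic, so a loop of length lcm(n, n') covers both.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// A unary DFA in "lollipop" shape: lead[k] says whether a^k is accepted for
// k < lead.size(); after that, acceptance repeats with period loop.size().
struct Lollipop {
    std::vector<bool> lead;
    std::vector<bool> loop;  // assumed non-empty

    bool accepts(std::size_t k) const {
        if (k < lead.size()) return lead[k];
        return loop[(k - lead.size()) % loop.size()];
    }

    // L(r) = a* exactly when every state is final.
    bool isAStar() const {
        for (bool b : lead) if (!b) return false;
        for (bool b : loop) if (!b) return false;
        return true;
    }
};

Lollipop epsilon() { return {{true}, {false}}; }         // accepts only a^0
Lollipop literal() { return {{false, true}, {false}}; }  // accepts only a^1

// "Or": lead-in of length max(m, m') and loop of length lcm(n, n').
Lollipop unionOf(const Lollipop& x, const Lollipop& y) {
    const std::size_t m = std::max(x.lead.size(), y.lead.size());
    const std::size_t n = std::lcm(x.loop.size(), y.loop.size());
    Lollipop r;
    for (std::size_t k = 0; k < m; ++k) r.lead.push_back(x.accepts(k) || y.accepts(k));
    for (std::size_t k = m; k < m + n; ++k) r.loop.push_back(x.accepts(k) || y.accepts(k));
    return r;
}
```

For instance, the union of "even number of a's" (loop {true, false}) and "odd number of a's" (loop {false, true}) gets a loop of length 2 in which both entries are true, so isAStar() reports a*.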

Evaluating a mathematical expression in C++

In a coding problem I've been working on for some time, I've reached a step where I have to evaluate a mathematical expression that looks like this:
3 * 2 ^ 3 ^ 2 * 5
and should be evaluated like this:
3 * 2 ^ 3 ^ 2 * 5 = 3 * 2^(3 * 2) * 5 = 3 * 64 * 5 = 960.
In the current form of my implementation, I have two vectors, one contains the operands as integers, while the other one contains the operators as chars.
For the current case, they would be : vector<int> operands = { 3, 2, 3, 2, 5 } and vector<char> operators = { '*', '^', '^', '*' }.
This is just a sample case; the order of operations may differ, in the sense that multiplication will not always be the first or last operation to be performed.
I've been stuck at this particular step for a while now, namely evaluating the expression encapsulated by the two vector containers to an integer. I've looked at some mathematical parsers I could find on the web, but I still don't see how to implement a proper evaluation.
A solution would be very much appreciated.
Simply compute the value as you parse the expression, maintaining one variable for the final product and one for the current multiplicand (i.e. the current group of exponents with the corresponding base). Apply each exponential operand sequentially as you see it, thus performing left-associative exponentiation.
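A minimal sketch of that approach, working directly on the two vectors from the question. The assumptions are:

- `evaluate` and `ipow` are hypothetical names.
- Only * and ^ are supported.
- ^ is applied left to right, as the expected result 960 requires.
- Overflow of long long is not checked.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Integer power by repeated squaring; assumes a non-negative exponent and no overflow.
long long ipow(long long base, long long exp) {
    if (exp < 0) throw std::invalid_argument("negative exponent");
    long long result = 1;
    while (exp > 0) {
        if (exp & 1) result *= base;
        base *= base;
        exp >>= 1;
    }
    return result;
}

// Evaluates operands[0] op[0] operands[1] op[1] ... with '^' binding tighter than '*'
// and '^' applied left to right, e.g. 2 ^ 3 ^ 2 = (2 ^ 3) ^ 2.
long long evaluate(const std::vector<int>& operands, const std::vector<char>& operators) {
    if (operands.empty() || operators.size() + 1 != operands.size())
        throw std::invalid_argument("need exactly one more operand than operators");
    long long product = 1;            // product of all finished multiplicands
    long long current = operands[0];  // current multiplicand: base with exponents applied so far
    for (std::size_t i = 0; i < operators.size(); ++i) {
        const long long next = operands[i + 1];
        switch (operators[i]) {
            case '^': current = ipow(current, next); break;  // extend the current group
            case '*': product *= current; current = next; break;  // close it, start a new one
            default: throw std::invalid_argument("unsupported operator");
        }
    }
    return product * current;
}
```

With operands { 3, 2, 3, 2, 5 } and operators { '*', '^', '^', '*' }, the product goes 3, then 3 * 64 = 192, and the final result is 192 * 5 = 960.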
As an aside, I wouldn't bother storing the entire expression in some kind of vectorized format; I see no useful reason for doing so.
What you would like is possible with expression templates. They let you evaluate expressions in a non-standard order and/or with non-standard behavior; with them you can also give the same operator multiple meanings within an expression.

Math: Giving regular expression for a language:

I am going over and learning regular expressions and languages. I was working through some questions about giving a regular expression to represent a specified language. The question I was a little stuck on is this:
Come up with a regular expression that expresses the following
language. The alphabet of the language is {a,b}.
The language of all strings with two consecutive a's, but no three
consecutive a's (i.e., "aa", "aabaa", "babaa" are in the language,
while "abab" and "aaaab" are not).
My answer for this so far is:
(b*(e+a+aa)bb*)* (aa) (bb*(e+a+aa)b*)*
where 'e' is the empty string and '+' functions essentially as an 'or'.
What I am wondering is whether my answer is correct (I believe it is), and whether it can be simplified at all.
Thanks guys.
I believe that your regular expression is correct. It ensures that an aa exists in the string, and makes sure that aaa cannot exist. As for being simplest (simplest being subjective here), I would say the following is simpler:
(b + ab + aab)* aa (b + ba + baa)*
Note that you could actually derive the above from the regular expression that you have. Taking just the part before the aa in your regular expression, we have:
(b*(e+a+aa)bb*)*
= (b*bb* + b*abb* + b*aabb*)*
= (b + ab + aab)*
That last step is a bit of a jump. It relies on noticing that all those b*'s are redundant: the * on the whole expression already allows repetition, and the plain b term inside the parentheses can produce any run of b's.
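Whether a regular expression describes this language can also be spot-checked by brute force. Enumerate every string over {a,b} up to some length, and compare a regex match against the definition "contains aa but not aaa". This is a sanity check, not a proof. The helper names `inLanguage` and `agreesUpTo` are hypothetical, and the pattern uses std::regex's ECMAScript syntax, where | plays the role of + above.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Reference definition of the language: at least one "aa" but no "aaa".
bool inLanguage(const std::string& s) {
    return s.find("aa") != std::string::npos && s.find("aaa") == std::string::npos;
}

// Compares a candidate pattern with the definition on every string over {a,b}
// of length 0..maxLen. Returns false at the first disagreement.
bool agreesUpTo(const std::string& pattern, std::size_t maxLen) {
    const std::regex re(pattern);
    for (std::size_t len = 0; len <= maxLen; ++len) {
        for (std::size_t bits = 0; bits < (std::size_t{1} << len); ++bits) {
            std::string s;
            for (std::size_t i = 0; i < len; ++i) s += ((bits >> i) & 1) ? 'a' : 'b';
            if (std::regex_match(s, re) != inLanguage(s)) return false;
        }
    }
    return true;
}
```

The simplified expression above corresponds to the pattern "(b|ab|aab)*aa(b|ba|baa)*". The asker's expression can be checked the same way after writing e+a+aa as (a|aa)?.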
I think this regex matches your language as well:
^((ab|b)*aa(ba|b)*)*$

Calculator which can take words as input

I want to write a calculator which can take words as input.
For example, "two plus five multiply with 7" should give 37 as output.
I won't lie: this is homework. So before starting, I thought I'd ask whether anyone can point me to something useful for this kind of thing that I'm not aware of.
Also, an approach for how to do this would be OK too, I guess. It has to be written in C++; no other language will be accepted.
Thanks.
[Edit] -- Thanks for the answers. This is an introductory course. So keeping things as simple as possible would be appreciated. I should have mentioned this earlier.
[Edit 2] -- Reached a stage where I can give input in numbers and get correct output with precedence and all. Just want to see how to convert word to number now. Thanks to everybody who is trying to help. This is why SO rocks.
As long as the acceptable input is strict enough, writing a recursive descent parser should be easy. The grammar for this shouldn't differ much from a grammar for a simple calculator that only accepts digits and symbols.
With a std::istream and the extraction operator (>>), you can easily break the input into separate words.
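Here is a minimal sketch of such a recursive-descent evaluator, using std::istringstream and >> to split the sentence into words. The assumptions are:

- The grammar and the operator phrases ("plus", "minus", "times", "multiply with", "divided by") are illustrative choices.
- `WordCalc` is a hypothetical name.
- Numbers are limited to the single words zero to nine or digit literals, so multi-word numbers would need a separate word-to-number step.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Recursive-descent evaluator for sentences like "two plus five multiply with 7".
// Assumed grammar (for illustration only):
//   expr   := term (("plus" | "minus") term)*
//   term   := number (("times" | "multiply with" | "divided by") number)*
//   number := "zero" .. "nine", or a decimal literal such as "7"
class WordCalc {
public:
    explicit WordCalc(const std::string& sentence) {
        std::istringstream in(sentence);
        for (std::string w; in >> w;) words_.push_back(w);  // >> splits on whitespace
    }

    long long evaluate() {
        const long long value = expr();
        if (pos_ != words_.size()) throw std::runtime_error("unexpected word: " + words_[pos_]);
        return value;
    }

private:
    std::vector<std::string> words_;
    std::size_t pos_ = 0;

    // Consumes `first` (and `second`, if given) when they come next; otherwise consumes nothing.
    bool accept(const std::string& first, const std::string& second = "") {
        if (pos_ >= words_.size() || words_[pos_] != first) return false;
        if (second.empty()) {
            ++pos_;
            return true;
        }
        if (pos_ + 1 >= words_.size() || words_[pos_ + 1] != second) return false;
        pos_ += 2;
        return true;
    }

    long long expr() {  // lowest precedence: plus / minus, left to right
        long long value = term();
        while (true) {
            if (accept("plus")) value += term();
            else if (accept("minus")) value -= term();
            else return value;
        }
    }

    long long term() {  // higher precedence: times / divided by, left to right
        long long value = number();
        while (true) {
            if (accept("times") || accept("multiply", "with")) {
                value *= number();
            } else if (accept("divided", "by")) {
                const long long divisor = number();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }

    long long number() {
        static const std::map<std::string, long long> names = {
            {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4},
            {"five", 5}, {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9}};
        if (pos_ >= words_.size()) throw std::runtime_error("expected a number");
        const std::string& w = words_[pos_++];
        const auto it = names.find(w);
        if (it != names.end()) return it->second;
        std::size_t used = 0;
        const long long v = std::stoll(w, &used);  // throws std::invalid_argument for non-numbers
        if (used != w.size()) throw std::runtime_error("not a number: " + w);
        return v;
    }
};
```

Because term() is called from expr(), multiplication is grouped before addition. `WordCalc("two plus five multiply with 7").evaluate()` therefore computes 2 + (5 * 7) = 37.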
Here's a lecture that shows how to do this for a regular calculator with digits and symbols.
The main difference from that in your case is parsing the numbers. Instead of the symbols '0-9', your input will have words like "zero", "one", "nine", "hundreds", "thousands", etc. To parse such a stream of words you can do something like this:
break the words into groups headed by the "multipliers", like "thousands", "hundreds", "billions", etc.; these groups can nest. Tens ("ten", "twenty", "thirty"), teens ("eleven", "twelve", "thirteen", ...), and units ("one", "two", "three", ...) don't nest anything. You turn something like "one hundred and three thousands two hundred and ninety seven" into something like this:
+-- thousand
|   +-- hundred
|   |   +-- one
|   +-- three
+-- hundred
|   +-- two
+-- ninety
+-- seven
for each group, recursively sum all its components, and then multiply it by its "multiplier":
103297
+-- (* 1000)
|   +-- (* 100)
|   |   +-- 1
|   +-- 3
+-- (* 100)
|   +-- 2
+-- 90
+-- 7
i.e. (1 * 100 + 3) * 1000 + 2 * 100 + 90 + 7 = 103297
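The grouping above can also be computed in one left-to-right pass instead of building the tree explicitly. This is a simplification that only works for well-formed input:

- A running group collects units and tens.
- "hundred" multiplies the group.
- A larger multiplier such as "thousand" closes the group into a running total.

`wordsToNumber` and its word tables are assumptions made for illustration.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Converts English number words to a value in one left-to-right pass.
// `group` plays the role of the innermost subtree (the part below one thousand);
// "thousand"/"million"/"billion" close the group, multiply it, and add it to the total.
long long wordsToNumber(const std::string& text) {
    static const std::map<std::string, long long> units = {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4},
        {"five", 5}, {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9},
        {"ten", 10}, {"eleven", 11}, {"twelve", 12}, {"thirteen", 13},
        {"fourteen", 14}, {"fifteen", 15}, {"sixteen", 16}, {"seventeen", 17},
        {"eighteen", 18}, {"nineteen", 19}, {"twenty", 20}, {"thirty", 30},
        {"forty", 40}, {"fifty", 50}, {"sixty", 60}, {"seventy", 70},
        {"eighty", 80}, {"ninety", 90}};
    static const std::map<std::string, long long> multipliers = {
        {"thousand", 1000LL}, {"thousands", 1000LL},
        {"million", 1000000LL}, {"millions", 1000000LL},
        {"billion", 1000000000LL}, {"billions", 1000000000LL}};

    std::istringstream in(text);
    long long total = 0;  // finished groups, already multiplied
    long long group = 0;  // group currently being built
    for (std::string word; in >> word;) {
        if (word == "and") continue;  // "one hundred and three"
        const auto unit = units.find(word);
        const auto mult = multipliers.find(word);
        if (unit != units.end()) {
            group += unit->second;
        } else if (word == "hundred" || word == "hundreds") {
            group *= 100;
        } else if (mult != multipliers.end()) {
            total += group * mult->second;
            group = 0;
        } else {
            throw std::invalid_argument("unknown number word: " + word);
        }
    }
    return total + group;
}
```

Tracing the example: "one hundred and three" builds a group of 103, "thousands" moves 103000 into the total, and "two hundred and ninety seven" leaves 297 in the group. The result is 103297, matching the tree.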
This was some of my favorite stuff in school.
The right way to do this is to implement it as a parser -- write a grammar, tokenize your input, and parse away. Then you can evaluate it recursively.
If none of that sounds familiar, though (i.e., this is an introductory class), then you can hack it together in a much less robust way: just do it the way you'd do it naturally. Convert the sentence to numbers and operations, and then just do the operations.
NOTE: I'm assuming this is an introductory-level course, not a course in compiler theory. If you're being taught specific things relevant to this (e.g. algorithms other than the ones I mention), you will almost certainly be expected to apply those concepts.
First, you'll have to understand the individual words. For this purpose, you can likely just hack it together - read one word at a time and try to understand that. Gradually build up a function which can read the set of numbers you need to be able to work with.
If the input is simple enough (only expressions of the basic form you provide, no parentheses or anything), you can simply alternate between reading a number and an operator until the input is fully read (but if this is supposed to be robust, you need to stop and display an error if the last thing you read is an operator), so you can write separate methods for numbers and operators.
To understand how the expression needs to be calculated, use the shunting yard algorithm to parse the expression with proper operator precedence. You can likely combine the initial parsing of the words with this by simply using that to supply the tokens to the algorithm.
To actually calculate the result, you'll need to evaluate the parsed expression. To do that, you can simply use a stack for the output: when you would normally push an operator to the output, you instead pop 2 values, apply the operator to them, and push the result. After the shunting yard algorithm is complete, there will be one value on the stack, containing the final result.
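Below is a sketch of the shunting-yard algorithm, with an evaluation stack in place of the output queue as described. The assumptions are:

- The words have already been converted into tokens (numbers and the characters + - * /).
- Only left-associative binary operators are supported, with no parentheses.
- `precedence`, `applyOperator` and `evaluateTokens` are hypothetical names.

```cpp
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Binding strength of the supported operators; all are left-associative.
int precedence(char op) { return (op == '+' || op == '-') ? 1 : 2; }

// Where shunting yard would append `op` to the output, apply it to the top two values instead.
void applyOperator(std::stack<long long>& values, char op) {
    if (values.size() < 2) throw std::runtime_error("missing operand");
    const long long rhs = values.top();
    values.pop();
    const long long lhs = values.top();
    values.pop();
    switch (op) {
        case '+': values.push(lhs + rhs); break;
        case '-': values.push(lhs - rhs); break;
        case '*': values.push(lhs * rhs); break;
        case '/':
            if (rhs == 0) throw std::runtime_error("division by zero");
            values.push(lhs / rhs);
            break;
        default: throw std::runtime_error("unknown operator");
    }
}

// Tokens are numbers or one of + - * / (e.g. produced from words such as "plus").
long long evaluateTokens(const std::vector<std::string>& tokens) {
    std::stack<long long> values;  // replaces the usual output queue
    std::stack<char> ops;          // pending operators
    for (const std::string& tok : tokens) {
        if (tok.size() == 1 && std::string("+-*/").find(tok[0]) != std::string::npos) {
            // Left-associative: first apply pending operators that bind at least as tightly.
            while (!ops.empty() && precedence(ops.top()) >= precedence(tok[0])) {
                applyOperator(values, ops.top());
                ops.pop();
            }
            ops.push(tok[0]);
        } else {
            values.push(std::stoll(tok));  // throws std::invalid_argument on a non-number
        }
    }
    while (!ops.empty()) {
        applyOperator(values, ops.top());
        ops.pop();
    }
    if (values.size() != 1) throw std::runtime_error("malformed expression");
    return values.top();
}
```

A trailing operator, such as the tokens {"2", "+"}, leaves only one value when the final operator is applied, so applyOperator throws. This is the "stop and display an error" case mentioned above.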