Is there any function in C++ (included in some library or header file) that would count an expression from a string?
Let's say we have a string which equals 2 + 3 * 8 - 5 (but it's taken from the user's keyboard, so we don't know while writing the code exactly what expression it will be), and we want this function to count it, but of course in the correct order of operations (1. powers / roots, 2. multiplication / division, 3. addition / subtraction).
I've tried to take all the numbers into an array of ints and the operators into an array of chars (okay, actually vectors, because I don't know how many numbers and operators it's going to contain), but I'm not sure what to do next.
Note that I'm asking if there's any function already written for that; if not, I'm going to just try it again.
By count, I'm taking that to mean "evaluate the expression".
You'll have to use a parser generator like boost::spirit to do this properly. If you try hand-writing this I guarantee pain and misery for all involved.
Try looking here for calculator apps; there are several:
http://boost-spirit.com/repository/applications/show_contents.php
There are also some simple calculator-style grammars in the boost::spirit examples.
Quite surprisingly, nobody before me has redirected you to this particular algorithm, which is easy to implement and converts your string into this special thing called Reverse Polish Notation (RPN). Computing expressions in RPN is easy; the hard part is implementing the shunting-yard algorithm itself, but it has been done many times before and you can find many tutorials on the subject.
Quick overview of the algorithms:
RPN: Reverse Polish Notation is a way to write expressions that eliminates the need for parentheses, which makes them easier to compute. Practically speaking, to compute such an expression you walk the string from left to right while keeping a stack of operands. Whenever you meet an operand, you push it onto the stack. Whenever you meet an operation token, you apply it to the last two operands on the stack (if the operation is binary, of course; to the last one only if it is unary) and push the result back. Rinse and repeat until the end of the string. A small sketch of this follows at the end of this answer.
Shunting yard really is much harder to summarize briefly, and if I tried, this answer would end up looking much like the Wikipedia article I linked above, so I'll save both of us the trouble.
TL;DR: Go read the links in the first sentence.
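For the RPN half, here is a minimal sketch of evaluating a space-separated postfix token stream; it assumes well-formed input and binary operators only, and the function name is illustrative.

// Evaluates a space-separated RPN expression such as "2 3 8 * + 5 -"
// (which is 2 + 3 * 8 - 5). Assumes well-formed input, binary operators only.
#include <iostream>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

double eval_rpn(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<double> operands;
    std::string token;
    while (in >> token) {
        if (token == "+" || token == "-" || token == "*" || token == "/") {
            // Pop the two most recent operands; note the order for - and /.
            double rhs = operands.top(); operands.pop();
            double lhs = operands.top(); operands.pop();
            if      (token == "+") operands.push(lhs + rhs);
            else if (token == "-") operands.push(lhs - rhs);
            else if (token == "*") operands.push(lhs * rhs);
            else                   operands.push(lhs / rhs);
        } else {
            operands.push(std::stod(token));   // anything else is treated as a number
        }
    }
    if (operands.size() != 1) throw std::runtime_error("malformed RPN expression");
    return operands.top();
}

int main() {
    std::cout << eval_rpn("2 3 8 * + 5 -") << '\n';   // prints 21
}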
You will have to do this yourself, as there is no standard library that handles infix expression evaluation.
If you do, please include the code for what you are trying and we can help you.
Related
I want to write a calculator which can take words as input.
For example, "two plus five multiply with 7" should give 37 as output.
I won't lie, this is homework, so before doing it I thought I'd ask whether I can be pointed to something which might be useful for this kind of thing and which I am not aware of.
Also, an approach on how to do this would be OK too, I guess. It has to be written in C++. No other language would be accepted.
Thanks.
[Edit] -- Thanks for the answers. This is an introductory course, so keeping things as simple as possible would be appreciated. I should have mentioned this earlier.
[Edit 2] -- I've reached a stage where I can give input in numbers and get correct output with precedence and all. I just want to see how to convert a word to a number now. Thanks to everybody who is trying to help. This is why SO rocks.
As long as the acceptable input is strict enough, writing a recursive descent parser should be easy. The grammar for this shouldn't differ much from a grammar for a simple calculator that only accepts digits and symbols.
With an std::istream class and the extraction operator (>>) you can easily break the input into separate words.
Here's a lecture that shows how to do this for a regular calculator with digits and symbols.
The main difference from that in your case is parsing the numbers. Instead of the symbols '0-9', your input will have words like "zero", "one", "nine", "hundreds", "thousands", etc. To parse such a stream of words you can do something like this:
break the words into groups, nested by the "multipliers", like "thousands", "hundreds", "billions", etc; these groups can nest. Tens ("ten", "twenty", "thirty"), teens ("eleven", "twelve", "thirteen", ...), and units ("one", "two", "three", ...) don't nest anything. You turn something like "one hundred and three thousands two hundred and ninety seven" into something like this:
     +-------------+----------+---------+
     /             |          |         \
  thousand      hundred     ninety    seven
    /     \        |
hundred  three    two
   |
  one
for each group, recursively sum all its components, and then multiply it by its "multiplier":
              103297
   +------------+--------+------+
   /            |        |      \
(* 1000)  +  (* 100)  +  90  +  7
  /      \      |
(* 100) + 3     2
   |
   1
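Since Edit 2 in the question asks specifically about turning words into numbers, here is a partial sketch for the easy cases (units, a few teens, tens, and a flat "hundred" multiplier), using std::istringstream and >> to split the input into words. It deliberately skips the nested grouping shown above, and all names are illustrative.

// Converts simple number words to an integer. Handles units, a few teens,
// tens, and a flat "hundred" multiplier only; the full nested grouping
// described above is left out.
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int words_to_number(const std::string& text) {
    static const std::map<std::string, int> word_values = {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4},
        {"five", 5}, {"six", 6}, {"seven", 7}, {"eight", 8}, {"nine", 9},
        {"ten", 10}, {"eleven", 11}, {"twelve", 12}, {"thirteen", 13},
        {"twenty", 20}, {"thirty", 30}, {"forty", 40}, {"fifty", 50},
        {"sixty", 60}, {"seventy", 70}, {"eighty", 80}, {"ninety", 90},
    };

    std::istringstream in(text);
    std::string word;
    int value = 0;
    while (in >> word) {
        if (word == "and") continue;            // "one hundred and three"
        if (word == "hundred") {
            value *= 100;                       // multiply what has been read so far
        } else {
            auto it = word_values.find(word);
            if (it != word_values.end()) value += it->second;
        }
    }
    return value;
}

int main() {
    std::cout << words_to_number("two") << '\n';                    // 2
    std::cout << words_to_number("ninety seven") << '\n';           // 97
    std::cout << words_to_number("one hundred and three") << '\n';  // 103
}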
This was some of my favorite stuff in school.
The right way to do this is to implement it as a parser -- write a grammar, tokenize your input, and parse away. Then you can evaluate it recursively.
If none of that sounds familiar, though (i.e. this is an introductory class) then you can hack it together in a much less robust way -- just do it how you'd do it naturally. Convert the sentence to numbers and operations, and then just do the operations.
NOTE: I'm assuming this is an introductory level course, and not a course in compiler theory. If you're being taught about specific things relevant to this (e.g. algorithms other than what I mention), you will almost certainly be expected to apply those concepts.
First, you'll have to understand the individual words. For this purpose, you can likely just hack it together - read one word at a time and try to understand that. Gradually build up a function which can read the set of numbers you need to be able to work with.
If the input is simple enough (only expressions of the basic form you provide, no parentheses or anything), you can simply alternate between reading a number and an operator until the input is fully read (though if this is supposed to be robust, you need to stop and display an error if the last thing you read is an operator). You can then write separate methods for reading numbers and operators.
To understand how the expression needs to be calculated, use the shunting yard algorithm to parse the expression with proper operator precedence. You can likely combine the initial parsing of the words with this by simply using that to supply the tokens to the algorithm.
To actually calculate the result, you'll need to evaluate the parsed expression. To do that, you can simply use a stack for the output: when you would normally push an operator, you instead pop 2 values, apply the operator to them, and push the result. After the shunting yard algorithm is complete, there will be one value on the stack, containing the final result.
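Here is a minimal sketch of that combination: tokens are read with >>, the shunting-yard precedence rule decides when to reduce, and the output queue is replaced by a value stack as described above. It assumes well-formed, space-separated input with the four basic operators and no parentheses; names are illustrative, and in the word-based calculator the number branch would call your word-to-number routine instead of std::stod.

// Shunting yard with immediate evaluation: instead of emitting RPN, operators
// are applied to a value stack as they are popped. Assumes well-formed input,
// the four basic operators, and no parentheses.
#include <iostream>
#include <sstream>
#include <stack>
#include <string>

int precedence(char op) { return (op == '+' || op == '-') ? 1 : 2; }

void apply_top(std::stack<double>& values, std::stack<char>& ops) {
    char op = ops.top(); ops.pop();
    double rhs = values.top(); values.pop();
    double lhs = values.top(); values.pop();
    switch (op) {
        case '+': values.push(lhs + rhs); break;
        case '-': values.push(lhs - rhs); break;
        case '*': values.push(lhs * rhs); break;
        case '/': values.push(lhs / rhs); break;
    }
}

double evaluate(const std::string& expr) {
    std::istringstream in(expr);
    std::stack<double> values;
    std::stack<char> ops;
    std::string token;
    while (in >> token) {
        if (token.size() == 1 && std::string("+-*/").find(token[0]) != std::string::npos) {
            // Reduce while the operator on the stack has at least this precedence.
            while (!ops.empty() && precedence(ops.top()) >= precedence(token[0]))
                apply_top(values, ops);
            ops.push(token[0]);
        } else {
            values.push(std::stod(token));   // in the word calculator, convert the word here
        }
    }
    while (!ops.empty()) apply_top(values, ops);
    return values.top();
}

int main() {
    std::cout << evaluate("2 + 3 * 8 - 5") << '\n';   // 21
}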
AFAIK no one has implemented an algorithm that takes a set of strings and substrings and gives back one or more regular expressions that would match the given substrings inside the strings. So, for instance, if I gave my algorithm these two samples:
string1 = "fwef 1234 asdfd"
substring1 = "1234"
string2 = "asdf456fsdf"
substring2 = "456"
The algorithm would give me the regular expression "[0-9]*" back. I know it could give back more than one regex, or even no possible regex, and you might find 1000 reasons why such an algorithm would be close to impossible to implement to perfection. But what's the closest thing?
I don't really care about the regex itself either. Basically what I want is an algorithm that takes samples like the ones above and then finds a pattern in them that can be used to easily find the "kind" of text I want in a string, without having to write any regex or code manually.
I don't have proof, but I suspect no such discrete algorithm with a finite output could exist, since you are asking for the creation of a regular language which could be "large" with respect to the input size.
With that said, I suggest you peek at txt2re, which can break down sample texts one by one and help you build regexes.
Flash Fill, a new feature of MS Excel 2013, does exactly the task you want, but it does not give you the regular expression. It's an NP-complete problem and an open question for practical purposes. If you're interested in how to synthesise string manipulation from multiple examples, go to the official Flash Fill website and read a few papers. They have pseudo-code and demo movies as well.
There are many such algorithms, in fact. This is a research area called "grammatical inference".
I know RPNI, for example (you could also look at the probabilistic branch: ALERGIA, MDI, DEES). These algorithms generate DFAs (deterministic finite automata). In fact, you absolutely don't need to enter the full strings from your example, only the substrings.
There are also some algorithms that generate non-deterministic automata directly.
Of course, getting a regular expression from a non-deterministic automaton is easy.
The main ideas are simple:
Generate a PTSA (prefix tree state automaton) from your samples.
Then try to "merge" some states. From these merges, loops will emerge (i.e. * in the regular expression). All the difficulty lies in choosing the right rule for merging.
Here you go:
re = '|'.join(substrings)
If you want anything more general, your algorithm is going to have to make educated guesses about what type of strings are acceptable as matches, and it's trivial to demonstrate that no procedure can account for all possible sets of possible inputs without simply enumerating them all. For instance, consider some of these scenarios:
Match all prime numbers
Match hexadecimal strings, but no strings containing 'f' are in the sample set
Match the same string repeated twice
Match any even-length string
The root problem is that your question is incompletely specified. If you have a more specific requirement, that might be solvable, depending on what it is.
I need to search incoming not-very-long pieces of text for occurrences of given strings. The strings are constant for the whole session and are not many (~10). Additional simplification is that none of the strings is contained in any other.
I am currently using boost regex matching with str1 | str2 | .... The performance of this task is important, so I wonder if I can improve it. Not that I can program better than the boost guys, but perhaps a dedicated implementation is more efficient than a general one.
As the strings stay constant over long time, I can afford building a data structure, like a state transition table, upfront.
e.g., if the strings are abcx, bcy and cz, and I've read abc so far, I should be in a combined state that means I'm either 3 chars into string 1, 2 chars into string 2, or 1 char into string 3. Then reading x next will move me to the "string 1 matched" state, etc., and any character other than x, y, or z will move me back to the initial state, and I will not need to retract back to b.
Any ideas or references are appreciated.
Check out the Aho–Corasick string matching algorithm!
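To connect it to the combined-state idea in the question, here is a minimal sketch of an Aho-Corasick automaton; it assumes the needles and the text use only lowercase ASCII letters, and all names are illustrative.

// Aho-Corasick: a trie of the needles plus failure links, giving exactly the
// kind of combined transition table described in the question.
// Assumes lowercase ASCII letters only.
#include <algorithm>
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    static constexpr int ALPHA = 26;                 // 'a'..'z' only, by assumption
    struct Node {
        int next[ALPHA];
        int fail = 0;
        std::vector<int> out;                        // patterns ending in this state
        Node() { std::fill(std::begin(next), std::end(next), -1); }
    };
    std::vector<Node> nodes{Node{}};                 // node 0 is the root
    std::vector<std::string> patterns;

    void add(const std::string& p) {                 // build the trie of needles
        int v = 0;
        for (char ch : p) {
            int c = ch - 'a';
            if (nodes[v].next[c] == -1) { nodes[v].next[c] = (int)nodes.size(); nodes.emplace_back(); }
            v = nodes[v].next[c];
        }
        nodes[v].out.push_back((int)patterns.size());
        patterns.push_back(p);
    }

    void build() {                                   // BFS to fill failure links and missing edges
        std::queue<int> q;
        for (int c = 0; c < ALPHA; ++c) {
            int u = nodes[0].next[c];
            if (u == -1) nodes[0].next[c] = 0;
            else { nodes[u].fail = 0; q.push(u); }
        }
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int c = 0; c < ALPHA; ++c) {
                int u = nodes[v].next[c];
                if (u == -1) { nodes[v].next[c] = nodes[nodes[v].fail].next[c]; continue; }
                nodes[u].fail = nodes[nodes[v].fail].next[c];
                // a match ending at the fail state also ends here
                for (int id : nodes[nodes[u].fail].out) nodes[u].out.push_back(id);
                q.push(u);
            }
        }
    }

    void search(const std::string& text) const {     // one table lookup per input character
        int v = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            v = nodes[v].next[text[i] - 'a'];
            for (int id : nodes[v].out)
                std::cout << patterns[id] << " ends at index " << i << '\n';
        }
    }
};

int main() {
    AhoCorasick ac;
    for (const char* s : {"abcx", "bcy", "cz"}) ac.add(s);   // the needles from the question
    ac.build();
    ac.search("zzabcxybcyzcz");
}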
Take a look at Suffix Tree.
Look at this: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/configuration/algorithm.html
The existence of a recursive/non-recursive distinction is a pretty strong suggestion that Boost is not necessarily a linear-time discrete finite-state machine. Therefore, there's a good chance you can do better for your particular problem.
The best answer depends quite a bit on how many haystacks you have and the minimum size of a needle. If the smallest needle is longer than a few characters, you may be able to do a little bit better than a generalized regex library.
Basically all string searches work by testing for a match at the current position (cursor), and if none is found, then trying again with the cursor slid farther to the right.
Knuth-Morris-Pratt builds a DFSM out of the string for which you are searching, so that the test and the cursor motion are combined in a single operation. However, it was originally designed for a single needle, so you would need to support backtracking if one match could ever be a proper prefix of another. (Remember that for when you want to reuse your code.)
Another tactic is to slide the cursor more than one character to the right whenever possible. Boyer-Moore does this; it's normally built for a single needle. Construct a table of all characters and the rightmost position at which each appears in the needle (if at all). Now position the cursor at len(needle)-1. The table entry tells you (a) at what leftward offset from the cursor the needle might be found, or (b) that you can move the cursor len(needle) farther to the right.
When you have more than one needle, the construction and use of your table grows more complicated, but it still may possibly save you an order of magnitude on probes. You still might want to make a DFSM but instead of calling a general search method, you call does_this_DFSM_match_at_this_offset().
Another tactic is to test more than 8 bits at a time. There's a spam-killer tool that looks at 32-bit machine words at a time. It then computes a simple hash code to fit the result into 12 bits and looks in a table to see if there's a hit. You have four entries for each pattern (offsets of 0, 1, 2, and 3 from the start of the pattern), and this way, despite thousands of patterns in the table, only one or two probes are made per 32-bit word in the subject line.
So in general, yes, you can go faster than regexes WHEN THE NEEDLES ARE CONSTANT.
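As a reference point for the single-needle case, here is a minimal sketch of the bad-character table described above, in its Boyer-Moore-Horspool form; the function name is illustrative and the example needle is borrowed from the question.

// Boyer-Moore-Horspool: compare right-to-left inside a window, and use the
// last character of the window to decide how far the window may slide.
#include <array>
#include <cstddef>
#include <iostream>
#include <string>

// Returns the index of the first occurrence of `needle` in `haystack`,
// or std::string::npos if there is none.
std::size_t horspool_find(const std::string& haystack, const std::string& needle) {
    if (needle.empty() || needle.size() > haystack.size()) return std::string::npos;

    // For every byte, how far to shift when that byte ends the current window.
    std::array<std::size_t, 256> shift;
    shift.fill(needle.size());
    for (std::size_t i = 0; i + 1 < needle.size(); ++i)
        shift[static_cast<unsigned char>(needle[i])] = needle.size() - 1 - i;

    std::size_t pos = 0;
    while (pos + needle.size() <= haystack.size()) {
        std::size_t j = needle.size();
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) --j;
        if (j == 0) return pos;                                   // full match
        pos += shift[static_cast<unsigned char>(haystack[pos + needle.size() - 1])];
    }
    return std::string::npos;
}

int main() {
    std::cout << horspool_find("some incoming text with bcy inside", "bcy") << '\n';
}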
I've been looking at the answers, but none seem quite explicit... and they mostly boil down to a couple of links.
What intrigues me here is the uniqueness of your problem: the solutions presented so far do not capitalize at all on the fact that we are looking for several needles at once in the haystack.
I would take a look at KMP / Boyer-Moore, for sure, but I would not apply them blindly (at least if you have some time on your hands), because they are tailored for a single needle, and I am pretty convinced we could capitalize on the fact that we have several strings and check all of them at once, with a custom state machine (or custom tables for BM).
Of course, it's unlikely to improve the big O (Boyer Moore runs in 3n for each string, so it'll be linear anyway), but you could probably gain on the constant factor.
Regex engine initialization is expected to have some overhead, so if there are no real regular expressions involved, plain C memcmp() should do fine.
If you can tell us the file sizes and give some specific use cases, we could build a benchmark (I consider this very interesting).
Interesting: memcmp explorations and timing differences
Regards
rbo
There is always Boyer-Moore.
Besides the Rabin-Karp algorithm and the Knuth-Morris-Pratt algorithm, my algorithms book suggests a finite state machine for string matching. For every search string you need to build such a finite state machine.
You can do it with the very popular Lex & Yacc tools, or their counterparts Flex and Bison.
You can use Lex to get tokens from the string.
Compare your pre-defined strings with the tokens returned by the lexer.
When a match is found, perform the desired action.
There are many sites which describe about Lex and Yacc.
One such site is http://epaperpress.com/lexandyacc/
I'm writing an Excel-like C++ console app for homework.
My app should be able to accept formulas for its cells; for example, it should evaluate something like this:
Sum(tablename\fieldname[recordnumber], fieldname[recordnumber], ...)
tablename\fieldname[recordnumber] points to a cell in another table,
fieldname[recordnumber] points to a cell in the current table
or
Sin(fieldname[recordnumber])
or
anotherfieldname[recordnumber]
or
"10" // (simply a number)
something like that.
The functions are Sum, Ave, Sin, Cos, Tan, Cot, Mul, Div, Pow, Log (base 10), Ln, Mod.
It's pathetic, I know, but it's my homework :'(
So does anyone know a trick to evaluate something like this?
OK, nice homework question, by the way.
It really depends on how heavy you want this to be. You can create a full expression parser (which is fun but also time consuming).
In order to do that, you need to describe the full grammar and write a frontend (have a look at lex and yacc, or flex and bison).
But as I see your question, you can limit yourself to three subcases:
a simple value
a lookup (possibly to an other table)
a function whose inputs are lookups
I think a little OO design can help you out here (a small sketch follows below).
I'm not sure if you have to deal with real-time refresh and circular dependency checks. If you do, they can be tricky too.
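To illustrate the three subcases, here is a minimal sketch of the kind of OO design meant above; all class names are illustrative, a std::map stands in for a real table type, and parsing is left out entirely.

// The three subcases as a small class hierarchy: a literal value, a lookup
// into a table, and a function applied to sub-expressions.
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Expression {
    virtual ~Expression() = default;
    virtual double evaluate() const = 0;
};

struct Literal : Expression {                       // case 1: a simple value
    double value;
    explicit Literal(double v) : value(v) {}
    double evaluate() const override { return value; }
};

struct CellRef : Expression {                       // case 2: a lookup (possibly in another table)
    const std::map<std::string, double>* table;
    std::string field;
    CellRef(const std::map<std::string, double>* t, std::string f)
        : table(t), field(std::move(f)) {}
    double evaluate() const override { return table->at(field); }
};

struct FunctionCall : Expression {                  // case 3: a function over lookups/values
    std::function<double(const std::vector<double>&)> fn;
    std::vector<std::unique_ptr<Expression>> args;
    double evaluate() const override {
        std::vector<double> vals;
        for (const auto& a : args) vals.push_back(a->evaluate());
        return fn(vals);
    }
};

int main() {
    std::map<std::string, double> table{{"salary[1]", 1000.0}, {"salary[2]", 2500.0}};

    FunctionCall sum;                               // Sum(salary[1], salary[2], 10)
    sum.fn = [](const std::vector<double>& v) {
        double s = 0;
        for (double x : v) s += x;
        return s;
    };
    sum.args.push_back(std::make_unique<CellRef>(&table, "salary[1]"));
    sum.args.push_back(std::make_unique<CellRef>(&table, "salary[2]"));
    sum.args.push_back(std::make_unique<Literal>(10.0));

    std::cout << "Sum = " << sum.evaluate() << '\n';   // 3510
}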
For the parsing, I'd look at Recursive descent parsing. Then have a table that maps all possible function names to function pointers:
struct FunctionTableEntry {
    std::string name;
    double (*f)(double);
};
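Here is a minimal sketch of how such a table could be filled and queried for the single-argument functions; it repeats the struct so the snippet is self-contained, and the helper names are illustrative.

// Filling and querying a name-to-function table for the one-argument functions.
#include <cmath>
#include <iostream>
#include <string>

using MathFn = double (*)(double);

struct FunctionTableEntry {
    std::string name;
    double (*f)(double);
};

static const FunctionTableEntry kFunctions[] = {
    {"Sin", [](double x) { return std::sin(x); }},
    {"Cos", [](double x) { return std::cos(x); }},
    {"Log", [](double x) { return std::log10(x); }},
    {"Ln",  [](double x) { return std::log(x); }},
};

// Returns nullptr when the name is unknown.
MathFn find_function(const std::string& name) {
    for (const auto& e : kFunctions)
        if (e.name == name) return e.f;
    return nullptr;
}

int main() {
    if (MathFn f = find_function("Sin"))
        std::cout << f(1.5707963) << '\n';   // roughly 1
}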
You should write a parser. The parser should take the expression, i.e. each line, identify the command, and construct the parse tree. This is the first phase. In the second phase you can evaluate the tree by substituting the data for each element of the command.
Previous responders have hit it on the head: you need to parse the cell contents and interpret them.
StackOverflow already has a whole slew of questions on building compilers and interpreters where you can find pointers to resources. Some of them are:
Learning to write a compiler (#1669 people!)
Learning Resources on Parsers, Interpreters, and Compilers
What are good resources on compilation?
References Needed for Implementing an Interpreter in C/C++
...
and so on.
Aside: I never have the energy to link them all together, or even try to build a comprehensive list.
I guess you cannot use yacc/lex (or the like), so you have to parse "manually":
Iterate over the string and divide it into its parts. What a part is depends on your grammar (syntax). That way you can find the function names and the parameters. The difficulty of this depends on the complexity of your syntax.
Maybe you should read a bit about lexical analysis.
We would like to have user-defined formulas in our C++ program.
E.g. the value v = x + (y - (z - 2)) / 2. Later in the program the user would define x, y and z, and the program should return the result of the calculation. Some time later the formula may get changed, so the program should parse the formula again and plug in the new values. Any ideas / hints on how to do something like this? So far the only solution I've come up with is to write a parser to calculate these formulas - maybe any ideas about that?
If it will be used frequently and if it will be extended in the future, I would almost recommend adding either Python or Lua into your code. Lua is a very lightweight scripting language which you can hook into and provide new functions, operators etc. If you want to do more robust and complicated things, use Python instead.
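For the Lua route, a minimal sketch might look like the following; it assumes Lua 5.x and its C headers are available (link with -llua), uses the formula and variables from the question, and keeps error handling to a minimum.

// Embedding Lua to evaluate a user-supplied formula string.
#include <iostream>

extern "C" {
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
}

int main() {
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);

    // Expose the user-defined variables to the script.
    lua_pushnumber(L, 4.0); lua_setglobal(L, "x");
    lua_pushnumber(L, 8.0); lua_setglobal(L, "y");
    lua_pushnumber(L, 2.0); lua_setglobal(L, "z");

    // The formula string can come from the user and be re-run whenever it changes.
    if (luaL_dostring(L, "v = x + (y - (z - 2)) / 2") != 0) {
        std::cerr << "error: " << lua_tostring(L, -1) << '\n';
        lua_close(L);
        return 1;
    }

    lua_getglobal(L, "v");
    std::cout << "v = " << lua_tonumber(L, -1) << '\n';   // 4 + (8 - 0) / 2 = 8
    lua_close(L);
}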
You can represent your formula as a tree of operations and sub-expressions. You may want to define types or constants for Operation types and Variables.
You can then easily enough write a method that recurses through the tree, applying the appropriate operations to whatever values you pass in.
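Here is a minimal sketch of that idea, using the formula from the question; the class names are illustrative, and the tree is built by hand here rather than produced by a parser.

// A formula held as a tree of sub-expressions; evaluation recurses through
// the tree, looking up variable values in a map that the user can change.
#include <iostream>
#include <map>
#include <memory>
#include <string>

using Vars = std::map<std::string, double>;

struct Node {
    virtual ~Node() = default;
    virtual double eval(const Vars& vars) const = 0;
};
using NodePtr = std::shared_ptr<Node>;

struct Const : Node {
    double value;
    explicit Const(double v) : value(v) {}
    double eval(const Vars&) const override { return value; }
};

struct Var : Node {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    double eval(const Vars& vars) const override { return vars.at(name); }
};

struct BinOp : Node {
    char op;
    NodePtr lhs, rhs;
    BinOp(char o, NodePtr l, NodePtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double eval(const Vars& vars) const override {
        double a = lhs->eval(vars), b = rhs->eval(vars);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default : return a / b;   // '/'
        }
    }
};

int main() {
    // x + (y - (z - 2)) / 2, built by hand; a parser would produce this tree.
    NodePtr formula = std::make_shared<BinOp>('+',
        std::make_shared<Var>("x"),
        std::make_shared<BinOp>('/',
            std::make_shared<BinOp>('-',
                std::make_shared<Var>("y"),
                std::make_shared<BinOp>('-', std::make_shared<Var>("z"), std::make_shared<Const>(2))),
            std::make_shared<Const>(2)));

    Vars vars{{"x", 4}, {"y", 8}, {"z", 2}};
    std::cout << formula->eval(vars) << '\n';   // 8; change vars and re-evaluate for new values
}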
Building your own parser for this should be a straightforward operation:
1) Convert the equation from infix to postfix notation (a typical compsci assignment); I'd use a stack.
2) Wait to get the values you want.
3) Pop the stack of postfix items, dropping the value for each variable in where needed.
4) Display the results.
Using Spirit (for example) to parse, and the 'semantic actions' it provides to construct an expression tree that you can then manipulate (e.g., evaluate), seems like quite a simple solution. You can find a grammar for arithmetic expressions there, for example, if needed... (it's quite simple to come up with your own).
Note: Spirit is very simple to learn, and quite adapted for such tasks.
There are generally two ways of doing it, with three possible implementations:
as you've touched on yourself, a library to evaluate formulas
compiling the formula into code
The second option here is usually done either by compiling something that can be loaded in as a kind of plugin, or it can be compiled into a separate program that is then invoked and produces the necessary output.
For C++ I would guess that a library for evaluation would probably exist somewhere so that's where I would start.
If you want to write your own, search for "formal automata" and/or "finite state machine grammar".
In general, what you will do is parse the string, pushing characters onto a stack as you go. Then start popping the characters off and perform tasks based on what is popped. It's easier to code if you force equations into reverse Polish notation.
To make your life easier, I think getting this kind of input is best done through a GUI where users are restricted in what they can type in.
If you plan on doing it from the command line (that is the impression I get from your post), then you should probably define a strict set of allowable inputs (e.g. only single letter variables, no whitespace, and only certain mathematical symbols: ()+-*/ etc.).
Then, you will need to:
Read in the input char array
Parse it in order to build up a list of variables and actions
Carry out those actions - in BOMDAS order
With ANTLR you can create a parser/compiler that will interpret the user input, then execute the calculations using the Visitor pattern. A good example is here, but it is in C#. You should be able to adapt it quickly to your needs and keep using C++ as your development platform.