Extract items in an arithmetic expression in C/C++

If I have an arithmetic expression like x+y-12 / z in a string (C-style or otherwise) in C or C++, how can I extract one item at a time (including the operators)? There may or may not be spaces in the expression, and constants may have multiple digits.

If your input is simple you can start with something like this:
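#include <ctype.h>
#include <stdlib.h>

typedef struct token {
    int type;
    int ival;
    char sval[256];
    int ssize;
} Token;

/* TK_CONST, TK_VAR, my_isopchar() and my_get_op() are assumed to be defined elsewhere. */
char *get_next_tok(char *buffer, Token *token) {
    char *p = buffer;
    while (isspace((unsigned char)*p)) p++; // skip leading whitespace
    if (my_isopchar(*p)) // checks -+*...
        p = my_get_op(p, token); // a function to handle multi-char ops
    else if (isdigit((unsigned char)*p)) {
        token->ival = strtol(p, &p, 10); // strtol advances p past the digits
        token->type = TK_CONST;
    }
    else if (isalpha((unsigned char)*p)) {
        token->ssize = 0; // start a fresh name for every token
        while (isalpha((unsigned char)*p) && token->ssize < (int)sizeof token->sval - 1) {
            token->sval[token->ssize++] = *p; p++;
        }
        token->sval[token->ssize] = '\0'; // keep sval a valid C string
        token->type = TK_VAR;
    }
    return p; // points just past the token that was read
}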

Easy way: strtok
Hard way: Flex+Bison

Look into parsing. What you describe can, in fact, be quite easily implemented using regular expressions, or hand-written parsing. Think of what makes up your expression's individual tokens, and how code to extract the next token would look.
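As a rough sketch of the hand-written route, here is one way the "extract the next token" idea could look in C++. It assumes only identifiers, unsigned integer constants and the single-character operators + - * / ( ). The names TokKind, Tok and tokenize are made up for this illustration and do not come from the question.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token type for this sketch (not from the question).
enum class TokKind { Number, Identifier, Operator };

struct Tok {
    TokKind kind;
    std::string text;  // the exact characters that make up the token
};

// Splits an expression such as "x+y-12 / z" into tokens.
// Whitespace is optional and is skipped wherever it appears.
std::vector<Tok> tokenize(const std::string& s) {
    std::vector<Tok> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isdigit(c)) {
            // A constant: consume every consecutive digit.
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
            out.push_back({TokKind::Number, s.substr(start, i - start)});
        } else if (std::isalpha(c) || c == '_') {
            // An identifier: letters, digits and underscores.
            while (i < s.size() &&
                   (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_')) ++i;
            out.push_back({TokKind::Identifier, s.substr(start, i - start)});
        } else if (std::string("+-*/()").find(s[i]) != std::string::npos) {
            // Assumption: every operator is exactly one character long.
            ++i;
            out.push_back({TokKind::Operator, s.substr(start, 1)});
        } else {
            throw std::runtime_error("unexpected character: " + std::string(1, s[i]));
        }
    }
    return out;
}
```

Calling tokenize("x+y-12 / z") should give the seven tokens x, +, y, -, 12, / and z in that order. Multi-character operators such as <= would need an extra branch.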

There was a very nice tutorial on Flipcode on implementing scripting engines. You can read a few of the first chapters.
Basically you need to implement a lexical analyzer which breaks the string into tokens (identifier / constant / operator). From the tokens you can build a parse tree or reverse Polish notation, e.g. by recursive descent or with an LL parser, which is rather elegant if you are only interested in parsing arithmetic expressions.
The reverse Polish notation is then evaluated using a stack-based interpreter, or the parse tree is evaluated using a recursive algorithm.
I have written a small expression evaluation class in C++ which supports simple expressions with variables.
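To make the recursive descent idea concrete, here is a minimal sketch (it is not the class mentioned above). It assumes unsigned integer constants only, no variables, and it evaluates while parsing instead of producing a tree or reverse Polish notation first. The class name Evaluator and its members are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical recursive-descent evaluator for this sketch.
// Grammar (one function per rule):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    double evaluate() {
        double v = expr();
        skipSpaces();
        if (pos_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    void skipSpaces() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consumes c if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    double expr() {
        double v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
    double term() {
        double v = factor();
        for (;;) {
            if (accept('*')) v *= factor();
            else if (accept('/')) v /= factor();
            else return v;
        }
    }
    double factor() {
        if (accept('-')) return -factor();
        if (accept('(')) {
            double v = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        // accept() has already skipped any spaces before the number.
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(s_.substr(start, pos_ - start));
    }

    std::string s_;    // private copy of the input
    std::size_t pos_;  // index of the next unread character
};
```

Because term() is called from expr(), multiplication and division bind tighter than addition and subtraction without any explicit precedence table. For example, Evaluator("2+3*4").evaluate() should yield 14 and Evaluator("(2+3)*4").evaluate() should yield 20.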

Related

Why does // in a string not start a comment in C++?

I am printing a line like this
cout<<"Hello //stackoverflow";
And this produces the following output
Hello //stackoverflow
I want to know why this does not give me an error. Since // comments out half of the statement, I expected a
missing terminating " character
error.
The grammar of C++ (like that of most programming languages) is context-sensitive. Put simply, // does not start a comment if it is within a string literal.
For an in depth analysis of this, you'd have to refer to the language grammar, and the string literal production rules in particular.
Informally speaking, the fact that // appears in the quoted string literal means that it does not denote a comment block. The same applies to /* and */.
The converse applies to other constructs, where maximal munch requires parsing into the token denoting the start of a comment block; a space is needed before the pointer dereference operator in
#include <iostream>
using namespace std;
int main() {
int n = 1;
int* p = &n;
cout << 1 / *p; // Removing the space between / and * turns them into /*, which starts a comment, so compilation fails.
}
In simple terms, this is because everything inside the quotes is recognized as a string, so the compiler does not treat // as the start of a comment.
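A small sketch to illustrate the point: the comment markers below are ordinary characters because they sit inside string literals, and only the text after the closing quote is a real comment. The function names greeting and blockMarkers are made up for this example.

```cpp
#include <string>

// Inside a string literal, "//" is just two slash characters.
std::string greeting() {
    return "Hello //stackoverflow";  // only this trailing part is a comment
}

// The same holds for the block comment markers.
std::string blockMarkers() {
    return "/* not a comment */";
}
```

Both functions return their literal unchanged, slashes and asterisks included.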

Bison, interfacing with flex in c++

I am trying to write a compiler, using flex/bison for the scanning and parsing.
My question is about how these two communicate, so that lex passes a token type and (if needed) a semantic value.
The problem is that I find different (conflicting?) documentation.
For example, here they say to use yylval subfields for the semantic value and to return the token type (probably an integer).
[0-9]+ {
yylval->build<int> () = text_to_int (yytext);
return yy::parser::token::INTEGER;
}
[a-z]+ {
yylval->build<std::string> () = yytext;
return yy::parser::token::IDENTIFIER;
}
But then, I see (also in the official docs) this:
"-" return yy::calcxx_parser::make_MINUS (loc);
"+" return yy::calcxx_parser::make_PLUS (loc);
"*" return yy::calcxx_parser::make_STAR (loc);
"/" return yy::calcxx_parser::make_SLASH (loc);
"(" return yy::calcxx_parser::make_LPAREN (loc);
")" return yy::calcxx_parser::make_RPAREN (loc);
":=" return yy::calcxx_parser::make_ASSIGN (loc);
{int} {
errno = 0;
long n = strtol (yytext, NULL, 10);
if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
driver.error (loc, "integer is out of range");
return yy::calcxx_parser::make_NUMBER (n, loc);
}
{id} return yy::calcxx_parser::make_IDENTIFIER (yytext, loc);
. driver.error (loc, "invalid character");
<<EOF>> return yy::calcxx_parser::make_END (loc);
Here, yylval is not mentioned at all, and what we return comes from some strange make_??? functions. I fail to understand where they are defined, what parameters they accept and what they return.
Can somebody clarify the difference between those two approaches and, if I should use the second, give a short explanation of those mysterious make_??? methods?
Thanks in advance!
The documentation section you link to is the first of two sections which describe alternative APIs. It would be better to start reading at the beginning, where it is explained that:
The actual interface with yylex depends whether you use unions, or variants.
The example you cite uses variants, and therefore uses the complete symbols interface, where the make_* methods are defined. (These are not standard library or Boost variants; they are a simple discriminated union class defined by the bison framework.)
Which of the APIs you use is entirely up to you; they both have advantages and disadvantages.
There is also a third alternative: build both the parser and the lexer using C interfaces. That doesn't stop you from using C++ datatypes, but you cannot put them directly into the parser stack; you need to use pointers and that makes memory management more manual. (Actually, there are two different C APIs as well: the traditional one, in which the parser automatically calls the scanner when it needs a token, and the "push" interface, where the scanner calls the parser with each token.)

Creating an expression using numbers and a character operator

I am trying to create an expression from two or more numbers and a character operator. The exact scenario is that I have two numbers, e.g.
float a = 10.1, b = 10.2;
and a character operator
char ch = '+';
Now I have to create an expression that would look like
float c = 10.1 '+' 10.2;
i.e. I want to apply the operator stored in the char variable "ch" between the two float numbers I have. In this case the character is '+', so I want to create an expression that adds the two float values; if it is '-', then subtraction, and so on. All the values will actually be supplied by the user, so I want to create an expression and then perform the operation.
Now one solution I thought of is to have switch case for different operators and that would do the trick. Another one is below:
float a = 10.1, b = 20.3;
char ch = '+';
string result = "";
ostringstream os;
os << a;
result += os.str();
os.str("");
os << b;
result += ch + os.str();
I wrote the above snippet so that I can create the expression based on user input and then return that expression so that it can be evaluated in another procedure.
I am not sure if that's possible. The switch case solution seems fine, where I evaluate the expression right here and return the output value, but I wanted to know whether there is a way to return the expression to another function and evaluate it there.
In the Tcl scripting language we have a command "expr" which does the same job, so I was wondering whether we have any such ability in C++. Any help would be appreciated.
I think the key to your question is in considering the expression as an object. You're using C++, which some consider an object-oriented programming language, right? :) Consider writing a class Expression that follows the Composite Pattern. An Expression might be just a simple value:
Expression(10.1)
It could also represent the addition of two subordinate Expressions:
Expression(Expression(10.1) + Expression(20.3))
Or to give you a further hint:
Expression('+', Expression(10.1), Expression(20.3))
Make Expression hold the operators and operands of the expression without actually evaluating it. Then you are free to construct it in one place in your program and pass it to another place to actually evaluate it.
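One possible shape for such a class, as a sketch only: the member names, the use of double rather than float, and the shared_ptr storage are my assumptions and not part of the hint above.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical Expression class for this sketch: either a constant
// or an operator applied to two sub-expressions (Composite Pattern).
class Expression {
public:
    // Leaf: a plain value.
    explicit Expression(double value) : op_(0), value_(value) {}

    // Composite: op applied to two operands. Nothing is computed here.
    Expression(char op, Expression lhs, Expression rhs)
        : op_(op), value_(0.0),
          lhs_(std::make_shared<Expression>(std::move(lhs))),
          rhs_(std::make_shared<Expression>(std::move(rhs))) {}

    // Evaluation happens only on request, possibly far away from
    // the place where the expression was constructed.
    double evaluate() const {
        if (!lhs_) return value_;  // leaf node
        double a = lhs_->evaluate();
        double b = rhs_->evaluate();
        switch (op_) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default: throw std::invalid_argument("unsupported operator");
        }
    }

private:
    char op_;       // 0 for a leaf
    double value_;  // only meaningful for a leaf
    std::shared_ptr<const Expression> lhs_;  // null for a leaf
    std::shared_ptr<const Expression> rhs_;  // null for a leaf
};
```

With this, Expression e('+', Expression(10.1), Expression(20.3)); can be built where the user input is read, passed around or returned by value, and e.evaluate() called later in another function.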
C++ has a wealth of expression parsing libraries. While I haven’t used any of them myself I have heard good things about muParser.
Assuming that this is another assignment/homework and you are not after a fully featured expression parser, here is a solution that is as simple as it can be:
#include <iostream>
#include <sstream>
using std::stringstream;
using std::cout;
using std::endl;
float compute(float a, float b, char op) {
switch(op) {
case '+':
return a + b;
case '-':
return a - b;
// You may add more operations in the similar way.
default:
cout << "Operation is not supported." << endl;
}
return 0;
}
int main() {
// These guys are here to simulate user input.
float input_a = 10.1;
float input_b = 20.3;
char input_op = '+';
stringstream ss;
ss << input_a << input_op << input_b;
// If you really make it interactive, then the program actually starts here.
float a;
float b;
char op;
// You simply read operands and operator from some input stream,
// which in case of interactive program could be `std::cin`.
ss >> a;
ss >> op;
ss >> b;
// Print the result of computation.
cout << compute(a, b, op) << endl;
}
If you want to handle more complex situations, like evaluation of nested expressions, possibly including parentheses, then I'd suggest that you read the first 4 chapters of the classic Dragon Book. It took me around 1-2 weeks to be able to write an LR parser for ANSI C, which is much more complicated than your problem.
Your task is very simple and can be described with a toy context-free grammar which doesn't even require an LL parser to handle. Anyway, to understand why and how, I encourage you to read this book.

HEX assignment in C

I have generated a long sequence of bytes which looks as follows:
0x401DA1815EB560399FE365DA23AAC0757F1D61EC10839D9B5521F.....
Now, I would like to assign it to a static unsigned char x[].
Obviously, I get the warning "hex escape sequence out of range" when I do this:
static unsigned char x[] = "\x401DA1815EB56039.....";
The format it needs is
static unsigned char x[] = "\x40\x1D\xA1\x81\x5E\xB5\x60\x39.....";
So I am wondering if in C there is a way for this assignment without me adding the
hex escape sequence after each byte (could take quite a while)
I don't think there's a way to make a literal out of it.
You can parse the string at runtime and store it in another array.
You can use sed or something to rewrite the sequence:
echo 401DA1815EB560399FE365DA23AAC0757F1D61EC10839D9B5521F | sed -e 's/../\\x&/g'
\x40\x1D\xA1\x81\x5E\xB5\x60\x39\x9F\xE3\x65\xDA\x23\xAA\xC0\x75\x7F\x1D\x61\xEC\x10\x83\x9D\x9B\x55\x21F
AFAIK, No.
But you can use the regex s/(..)/\\x$1/g to convert your sequence to the last format.
No, there is no way to do that in C or C++. The obvious solution is to write a program to insert the '\x' sequences at the correct points in the string. This would be a suitable task for a scripting language like Perl, but you can also easily do it in C or C++.
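A sketch of such a converter in C++ could look like the following. The function name toEscapedLiteral is made up, and rejecting an odd number of digits is my assumption about how a dangling half byte should be handled.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns "0x401DA1" (or "401DA1") into the text
// \x40\x1D\xA1, ready to be pasted between quotes in a source file.
std::string toEscapedLiteral(const std::string& hex) {
    std::size_t start = 0;
    // Skip an optional leading "0x" or "0X".
    if (hex.size() >= 2 && hex[0] == '0' && (hex[1] == 'x' || hex[1] == 'X')) start = 2;
    // Assumption: a trailing half byte is an error rather than padded.
    if ((hex.size() - start) % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");

    std::string out;
    for (std::size_t i = start; i < hex.size(); i += 2) {
        if (!std::isxdigit(static_cast<unsigned char>(hex[i])) ||
            !std::isxdigit(static_cast<unsigned char>(hex[i + 1])))
            throw std::invalid_argument("not a hex digit");
        out += "\\x";     // a literal backslash followed by x
        out += hex[i];
        out += hex[i + 1];
    }
    return out;
}
```

Printing the result of toEscapedLiteral on the generated sequence gives text that can be copied into the initializer of x[].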
If the sequence is fixed, I suggest following the regexp-in-editor suggestion.
If the sequence changes dynamically, you can relatively easily convert it at runtime.
int AsciiAsHex(char in) // value of one hex digit, 0 for anything else
{
    if(in>='0' && in<='9') return in-'0';
    if(in>='A' && in<='F') return in+10-'A';
    if(in>='a' && in<='f') return in+10-'a';
    return 0;
}
...
char in[]="0x401DA1815EB560399FE365DA23AAC0757F1D61EC10839D9B5521F..."; //or whatever, loaded from a file or such.
char out[MAX_LEN]; //or malloc() as l/2 or whatever...
int l = strlen(in);
int i; //declared outside the loop because it is used after it
for(i=2;i+1<l;i+=2) //start at 2 to skip the "0x" prefix
{
    out[i/2-1]=16*AsciiAsHex(in[i])+AsciiAsHex(in[i+1]); //high nibble, then low nibble
}
out[i/2-1]='\0';

std::stringstream to read int and strings, from a string

I am programming in C++ and I'm not sure how to achieve the following:
I am copying a file stream to memory (because I was asked to; I'd prefer reading from the stream), and then trying to access its values to store them in string and int variables.
This is to create an interpreter. The code I will try to interpret is, for example:
10 PRINT A
20 GOTO 10
This is just a quick example code. Now the values will be stored in a "map" structure at first and accessed later when everything will be "interpreted".
The values to be stored are:
int lnum // line number
string cmd // command (PRINT and GOTO)
string exp // expression (A and 10 in this case but could hold expressions like (a*b)-c )
The question is: given the following code, how do I access those values and store them in memory?
Also the exp string is of variable size (can be just a variable or an expression) so I am not sure how to read that and store it in the string.
code:
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
#include <cstring>
#include <map>
#include <sstream>
using namespace std;
#include "main.hh"
int main ()
{
int lenght;
char *buffer;
// get file directory
string dir;
cout << "Please drag and drop here the file to interpret: ";
getline (cin,dir);
cout << "Thank you.\n";
cout << "Please wait while your file is being interpreted.\n \n";
// Open File
ifstream p_prog;
p_prog.open (dir.c_str());
// Get file size
p_prog.seekg (0, ios::end);
lenght = p_prog.tellg();
p_prog.seekg(0, ios::beg);
// Create buffer and copy stream to it
buffer = new char[lenght];
p_prog.read (buffer,lenght);
p_prog.close();
// Define map<int, string>
map<int, string> program;
map<int, string>::iterator iter;
/***** Read File *****/
int lnum; // line number
string cmd; // store command (goto, let, etc...)
string exp; // to be subst with expr. type inst.
// this is what I had in mind but not sure how to use it properly
// std::stringstream buffer;
// buffer >> lnum >> cmd >> exp;
program [lnum] = cmd; // store values in map
// free memory from buffer, out of scope
delete[] buffer;
return 0;
}
I hope this is clear.
Thank you for your help.
Valerio
You can use a std::stringstream to pull tokens, assuming that you already know the type.
For an interpreter, I'd highly recommend using an actual parser rather than writing your own. Boost's XPressive library or ANTLR work quite well. You can build your interpreter primitives using semantic actions as you parse the grammar or simply build an AST.
Another option would be Flex & Bison. Basically, these are all tools for parsing pre-defined grammars. You can build your own, but prepare for frustration. Recursively balancing parentheses or enforcing order of operations (multiply before add, for example) isn't trivial.
The raw C++ parsing method follows:
#include <sstream>
#include <string>
// ... //
std::istringstream iss(buffer);
int a, b;
std::string c, d;
iss >> a;
iss >> b;
iss >> c;
iss >> d;
The way something like this can be done (especially the arithmetic expression part that you alluded to) is:
Write some code that determines where a token ends and begins. For example 5 or + would be called a token. You might scan the text for these, or common separators such as whitespace.
Write up the grammar of the language you're parsing. For example you might write:
expression -> value
expression -> expression + expression
expression -> expression * expression
expression -> function ( expression )
expression -> ( expression )
Then based on this grammar you would write something that parses tokens of expressions into trees.
So you might have a tree that looks like this (pardon the ASCII art)
+
/ \
5 *
/ \
x 3
Where this represents the expression 5 + (x * 3). By having this in a tree structure it is really easy to evaluate expressions in your code: you can recursively descend the tree, performing the operations with the child nodes as arguments.
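To show what "recursively descend the tree" can look like, here is a small sketch of a tree node and an evaluator for the example above. The names Node, constant, variable, binary and eval are hypothetical, and only + and * are supported to match the grammar lines shown earlier.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical tree node for this sketch.
struct Node {
    char op = 0;         // '+' or '*', or 0 for a leaf
    double value = 0.0;  // used by constant leaves
    std::string name;    // used by variable leaves (empty for constants)
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> constant(double v) {
    std::unique_ptr<Node> n(new Node());
    n->value = v;
    return n;
}

std::unique_ptr<Node> variable(const std::string& name) {
    std::unique_ptr<Node> n(new Node());
    n->name = name;
    return n;
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    std::unique_ptr<Node> n(new Node());
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Evaluates the children first, then applies this node's operator.
double eval(const Node& n, const std::map<std::string, double>& vars) {
    if (n.op == 0) {
        if (n.name.empty()) return n.value;
        std::map<std::string, double>::const_iterator it = vars.find(n.name);
        if (it == vars.end()) throw std::runtime_error("unknown variable: " + n.name);
        return it->second;
    }
    double a = eval(*n.left, vars);
    double b = eval(*n.right, vars);
    switch (n.op) {
        case '+': return a + b;
        case '*': return a * b;
        default: throw std::runtime_error("unsupported operator");
    }
}
```

The tree drawn above would be built as binary('+', constant(5), binary('*', variable("x"), constant(3))). With x set to 2 in the map, eval should return 11.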
See the following Wikipedia articles:
Parser
Top-down parsing (your needs are probably simple enough to use this)
Recursive-descent parser (a simple way to convert a grammar into code)
Or consult your local computer science department. :-)
There are also tools that will generate these parsers for you based on a grammar. You can do a search for "parser generator".
Don't allocate the buffer dynamically by hand; use a vector.
This makes the memory management implicit.
// Create buffer and copy stream to it
std::vector<char> buffer(lenght);
p_prog.read (&buffer[0],lenght);
p_prog.close();
Personally I don't explicitly use close() (unless I want to catch an exception). Just open a file in a scope that will cause the destructor to close the file when it goes out of scope.
This might be of help:
http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
Especially section 7.3.
You might be better off reading the lines in with >> (or getline) rather than going the seek-and-char-buffer route.
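A sketch of that line-by-line approach, using the lnum / cmd / exp fields from the question: the Statement struct and the readProgram function are made-up names, and storing both cmd and exp per line number is my assumption about what the map should hold.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical record for one program line (field names follow the question).
struct Statement {
    std::string cmd;  // command, e.g. PRINT or GOTO
    std::string exp;  // rest of the line; may contain spaces
};

// Reads lines of the form "<lnum> <cmd> <exp...>" from any input stream.
std::map<int, Statement> readProgram(std::istream& in) {
    std::map<int, Statement> program;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        int lnum;
        Statement st;
        if (!(iss >> lnum >> st.cmd)) continue;  // skip blank or malformed lines
        // Skip the whitespace after the command, then take everything
        // up to the end of the line as the expression.
        std::getline(iss >> std::ws, st.exp);
        program[lnum] = st;
    }
    return program;
}
```

The function accepts a std::ifstream as well as a std::istringstream built from an in-memory buffer, so it works whether or not the file has to be copied to memory first. Because the expression is read with getline, its length does not matter.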