I am trying to write a compiler, using flex/bison for the scanning and parsing.
My question is about how the two communicate, so that lex passes a token type and (if needed) a semantic value.
The problem is that I find different (conflicting?) documentation.
For example, here they say to use yylval subfields for the semantic value, and to return the token type (probably an integer).
[0-9]+ {
yylval->build<int> () = text_to_int (yytext);
return yy::parser::token::INTEGER;
}
[a-z]+ {
yylval->build<std::string> () = yytext;
return yy::parser::token::IDENTIFIER;
}
But then, I see (also in the official docs) this:
"-" return yy::calcxx_parser::make_MINUS (loc);
"+" return yy::calcxx_parser::make_PLUS (loc);
"*" return yy::calcxx_parser::make_STAR (loc);
"/" return yy::calcxx_parser::make_SLASH (loc);
"(" return yy::calcxx_parser::make_LPAREN (loc);
")" return yy::calcxx_parser::make_RPAREN (loc);
":=" return yy::calcxx_parser::make_ASSIGN (loc);
{int} {
errno = 0;
long n = strtol (yytext, NULL, 10);
if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
driver.error (loc, "integer is out of range");
return yy::calcxx_parser::make_NUMBER (n, loc);
}
{id} return yy::calcxx_parser::make_IDENTIFIER (yytext, loc);
. driver.error (loc, "invalid character");
<<EOF>> return yy::calcxx_parser::make_END (loc);
Here, yylval is not mentioned at all, and what we return is the result of some strange make_??? functions; I fail to understand where they are defined, what parameters they accept and what they return.
Can somebody clarify the difference between those two approaches and, if I should use the second, give a short explanation of those mysterious make_??? methods?
Thanks in advance!
The documentation section you link to is the first of two sections which describe alternative APIs. It would be better to start reading at the beginning, where it is explained that:
The actual interface with yylex depends whether you use unions, or variants.
The example you cite uses variants, and therefore uses the complete symbols interface, where the make_* methods are defined. (These are not standard library or Boost variants; they are a simple discriminated union class defined by the bison framework.)
Which of the APIs you use is entirely up to you; they both have advantages and disadvantages.
There is also a third alternative: build both the parser and the lexer using C interfaces. That doesn't stop you from using C++ datatypes, but you cannot put them directly into the parser stack; you need to use pointers and that makes memory management more manual. (Actually, there are two different C APIs as well: the traditional one, in which the parser automatically calls the scanner when it needs a token, and the "push" interface, where the scanner calls the parser with each token.)
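To make the idea behind the make_* functions more concrete, here is a hand-rolled analogue of a "complete symbol". This is only an illustrative sketch and not the code bison generates: the names TokenKind, Location and Symbol are assumptions, and std::variant stands in for bison's own variant class. It shows why the scanner no longer touches yylval: the token kind, the semantic value and the location are bundled into one returned object, and each factory's parameter list fixes the value type for that token.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for illustration; bison generates its own types.
enum class TokenKind { Number, Identifier, Minus, End };

struct Location { int line = 1; int column = 1; };

// A "complete symbol": kind, semantic value and location travel together.
struct Symbol {
    TokenKind kind;
    std::variant<std::monostate, int, std::string> value;
    Location loc;
};

// One factory per token kind; tokens without a value take only a location.
inline Symbol make_NUMBER(int v, Location loc) {
    return {TokenKind::Number, v, loc};
}
inline Symbol make_IDENTIFIER(std::string v, Location loc) {
    return {TokenKind::Identifier, std::move(v), loc};
}
inline Symbol make_MINUS(Location loc) {
    return {TokenKind::Minus, std::monostate{}, loc};
}
inline Symbol make_END(Location loc) {
    return {TokenKind::End, std::monostate{}, loc};
}
```

Because the value type is part of each factory's signature, passing a string where a number is expected fails at compile time, which is the main safety argument for this style of interface.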
I'm reading the documentation of std::experimental::optional and I have a good idea about what it does, but I don't understand when I should use it or how I should use it. The site doesn't contain any examples yet, which makes it harder for me to grasp the true concept of this object. When is std::optional a good choice to use, and how does it compensate for what was not found in the previous Standard (C++11)?
The simplest example I can think of:
std::optional<int> try_parse_int(std::string s)
{
//try to parse an int from the given string,
//and return "nothing" if you fail
}
The same thing might be accomplished with a reference argument instead (as in the following signature), but using std::optional makes the signature and usage nicer.
bool try_parse_int(std::string s, int& i);
Another way that this could be done is especially bad:
int* try_parse_int(std::string s); //return nullptr if fail
This requires dynamic memory allocation, worrying about ownership, etc. - always prefer one of the other two signatures above.
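One possible way to fill in the body of try_parse_int is sketched below. It assumes C++17 and uses std::from_chars; taking the string by const reference and rejecting trailing characters are choices made for this sketch, not something the signature above requires.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Returns the parsed int, or std::nullopt if the whole string is not an int.
std::optional<int> try_parse_int(const std::string& s)
{
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    // Fail on parse errors, on overflow, and on trailing garbage such as "12abc".
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return value;
}
```

A caller can then write `if (auto n = try_parse_int(text)) use(*n);` without any out-parameter or sentinel value.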
Another example:
class Contact
{
std::optional<std::string> home_phone;
std::optional<std::string> work_phone;
std::optional<std::string> mobile_phone;
};
This is far preferable to having something like a std::unique_ptr<std::string> for each phone number! std::optional gives you data locality, which is great for performance.
Another example:
template<typename Key, typename Value>
class Lookup
{
std::optional<Value> get(Key key);
};
If the lookup doesn't have a certain key in it, then we can simply return "no value."
I can use it like this:
Lookup<std::string, std::string> location_lookup;
std::string location = location_lookup.get("waldo").value_or("unknown");
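A minimal sketch of how such a Lookup could be implemented, assuming a std::unordered_map as the backing store. The set member and the data_ field are hypothetical additions for the sake of a complete example.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

template<typename Key, typename Value>
class Lookup
{
public:
    // Hypothetical helper so the example has something to look up.
    void set(Key key, Value value)
    {
        data_.insert_or_assign(std::move(key), std::move(value));
    }

    // Returns a copy of the stored value, or "no value" if the key is absent.
    std::optional<Value> get(const Key& key) const
    {
        auto it = data_.find(key);
        if (it == data_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<Key, Value> data_;
};
```

With this in place, `get("waldo").value_or("unknown")` yields the fallback string whenever the key was never stored.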
Another example:
std::vector<std::pair<std::string, double>> search(
std::string query,
std::optional<int> max_count,
std::optional<double> min_match_score);
This makes a lot more sense than, say, having four function overloads that take every possible combination of max_count (or not) and min_match_score (or not)!
It also eliminates the accursed "Pass -1 for max_count if you don't want a limit" or "Pass std::numeric_limits<double>::min() for min_match_score if you don't want a minimum score"!
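As a rough illustration of how the optional parameters might be consumed inside such a function, the sketch below applies them to a list of already-scored candidates. The helper filter_matches and the Match alias are assumptions made for this example; the real search would also have to produce the candidates.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Match = std::pair<std::string, double>;

// Hypothetical helper: applies the optional limits to scored candidates.
std::vector<Match> filter_matches(const std::vector<Match>& candidates,
                                  std::optional<int> max_count,
                                  std::optional<double> min_match_score)
{
    // No max_count means "no limit"; a negative count is treated as zero.
    const std::size_t limit = max_count
        ? static_cast<std::size_t>(std::max(*max_count, 0))
        : candidates.size();

    std::vector<Match> out;
    for (const Match& m : candidates) {
        if (out.size() >= limit)
            break;
        // No min_match_score means every candidate is acceptable.
        if (min_match_score && m.second < *min_match_score)
            continue;
        out.push_back(m);
    }
    return out;
}
```

Callers pass std::nullopt for "don't care" and a plain value otherwise, so no magic sentinel ever appears at the call site.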
Another example:
std::optional<int> find_in_string(std::string s, std::string query);
If the query string isn't in s, I want "no int" -- not whatever special value someone decided to use for this purpose (-1?).
For additional examples, you could look at the boost::optional documentation. boost::optional and std::optional will basically be identical in terms of behavior and usage.
An example quoted from the newly adopted paper N3672, std::optional:
optional<int> str2int(string); // converts string to int if possible
int get_int_from_user()
{
string s;
for (;;) {
cin >> s;
optional<int> o = str2int(s); // 'o' may or may not contain an int
if (o) { // does optional contain a value?
return *o; // use the value
}
}
}
but I don't understand when I should use it or how I should use it.
Consider when you are writing an API and you want to express that "not having a return value" is not an error. For example, you need to read data from a socket, and when a data block is complete, you parse it and return it:
class YourBlock { /* block header, format, whatever else */ };
std::optional<YourBlock> cache_and_get_block(
const char* raw_data); // takes the bytes just read, as in the client code below
If the appended data completed a parsable block, you can process it; otherwise, keep reading and appending data:
void your_client_code(some_socket_object& socket)
{
char raw_data[1024]; // max 1024 bytes of raw data (for example)
while(socket.read(raw_data, 1024))
{
if(auto block = cache_and_get_block(raw_data))
{
// process *block here
// then return or break
}
// else [ no error; just keep reading and appending ]
}
}
Edit: regarding the rest of your questions:
When is std::optional a good choice to use
When you compute a value and need to return it, it makes for better semantics to return by value than to take a reference to an output value (that may not be generated).
When you want to ensure that client code has to check the output value (whoever writes the client code may not check for errors: if you attempt to use an uninitialized pointer you typically get a core dump; if you call value() on an empty std::optional, you get a catchable std::bad_optional_access exception, although dereferencing an empty optional with operator* is undefined behavior).
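A small sketch of the checked access path: value() throws std::bad_optional_access on an empty optional, whereas operator* performs no check at all. The function name describe is made up for this example.

```cpp
#include <optional>
#include <string>

// Reports what happened when reading `o` through the checked accessor.
std::string describe(const std::optional<int>& o)
{
    try {
        // value() verifies that the optional is engaged before returning.
        return "value " + std::to_string(o.value());
    } catch (const std::bad_optional_access&) {
        // Reached only when `o` is empty; *o would have been undefined behavior.
        return "empty";
    }
}
```

In everyday code an explicit `if (o)` or `o.value_or(fallback)` is usually preferable to catching the exception; the point here is only that a forgotten check fails loudly when value() is used.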
[...] and how does it compensate for what was not found in the previous Standard (C++11).
Before std::optional (that is, in C++11 and earlier), you had to use a different interface for "functions that may not return a value": either return by pointer and check for NULL, or accept an output parameter and return an error/result code for "not available".
Both impose extra effort and attention from the client implementer to get it right and both are a source of confusion (the first pushing the client implementer to think of an operation as an allocation and requiring client code to implement pointer-handling logic and the second allowing client code to get away with using invalid/uninitialized values).
std::optional nicely takes care of the problems arising with previous solutions.
I often use optionals to represent optional data pulled from configuration files, that is, data (such as an expected yet not required element within an XML document) that may or may not be provided, so that I can show explicitly and clearly whether it was actually present in the XML document. This matters especially when the data can have a "not set" state, as distinct from an "empty" and a "set" state (three-state logic). With an optional, set and not set are clear, and empty is also clear as a value of 0 or null.
This shows how "not set" is not equivalent to "empty". In concept, a pointer to an int (int* p) can show this, where a null pointer (p == nullptr) is not set, a value of 0 (*p == 0) is set and empty, and any other value (*p != 0) is set to a value.
For a practical example, I have a piece of geometry pulled from an XML document that has a value called render flags, where the geometry can either override the render flags (set), disable the render flags (set to 0), or simply not affect the render flags (not set). An optional is a clear way to represent this.
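The three states can be sketched in a few lines. The function effective_render_flags and its parameter names are hypothetical; the only assumption is that "not set" means the geometry inherits whatever flags are already in effect.

```cpp
#include <optional>

// Hypothetical: resolves the flags to use for one piece of geometry.
unsigned effective_render_flags(unsigned inherited,
                                std::optional<unsigned> override_flags)
{
    // not set           -> keep the inherited flags
    // set to 0          -> rendering flags disabled
    // set to other bits -> those bits replace the inherited flags
    return override_flags.value_or(inherited);
}
```

Note that 0 is a perfectly ordinary value here, so no bit pattern has to be sacrificed as a "not set" marker.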
Clearly a pointer to an int can accomplish the goal in this example, or better, a shared pointer, as it offers a cleaner implementation. However, I would argue it's about code clarity in this case. Is a null always a "not set"? With a pointer it is not clear, as null literally means not allocated or created; it could mean "not set", but does not necessarily do so. It is worth pointing out that a raw pointer must be released, and in good practice set to null, whereas an optional, like a shared pointer, doesn't require explicit cleanup, so there is no risk of mixing up the cleanup with the optional not having been set.
I believe it's about code clarity. Clarity reduces the cost of code maintenance, and development. A clear understanding of code intention is incredibly valuable.
Use of a pointer to represent this would require overloading the concept of the pointer. To represent "null" as "not set", typically you might see one or more comments through code to explain this intention. That's not a bad solution instead of an optional, however, I always opt for implicit implementation rather than explicit comments, as comments are not enforceable (such as by compilation). Examples of these implicit items for development (those articles in development that are provided purely to enforce intention) include the various C++ style casts, "const" (especially on member functions), and the "bool" type, to name a few. Arguably you don't really need these code features, so long as everyone obeys intentions or comments.
I am extracting a string from a .txt file and saving it in a variable:
std::string line = "The king's name is getKingName()";
Let's assume that getKingName() is a function that returns a King class's name data member.
How can I make a call to getKingName() when the string variable looks like that?
As far as I know, C++ does not provide any functionality to interpolate function calls inside a string. All you can do is implement your own logic to do that.
Like,
1) define all the valid methods like this,
string getKingName(){
return "Some name";
}
string otherMethods(){
return "other values";
}
2) One helper method for mapping of function call
string whomToCall(const string& methodName){
    // switch cannot be used on strings in C++, so compare explicitly
    if (methodName == "getKingName()")
        return getKingName();
    if (methodName == "otherMethods()")
        return otherMethods();
    return "No such method exists";
}
3) Break the line into tokens (words), read them one by one, and check whether each token starts with an alphabetical character and ends with the "()" substring
istringstream ss(line);
string token;
// Test the extraction itself so the loop does not run once more after the last word
while (ss >> token) {
    if (isMethod(token))
        cout << whomToCall(token) << " ";
    else
        cout << token << " ";
}
4) isMethod() to check if token's value can be a valid method name
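bool isMethod(const string& token){
    size_t n = token.length();
    // need at least one letter plus "()", otherwise n-2 would index out of range
    return n >= 3 && isalpha(static_cast<unsigned char>(token[0]))
        && token[n-2] == '(' && token[n-1] == ')';
}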
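The steps above can also be sketched in a table-driven form, where a map from token text to a callable replaces the chain of comparisons. This is just one possible arrangement: expand_calls is a made-up name, the placeholder body of getKingName is taken from step 1, and tokens are assumed to be separated by whitespace.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>

// Placeholder implementation, as in step 1 above.
std::string getKingName() { return "Some name"; }

// Replaces every whitespace-separated token that names a registered
// function with that function's return value.
std::string expand_calls(const std::string& line)
{
    static const std::unordered_map<std::string,
                                    std::function<std::string()>> table = {
        {"getKingName()", getKingName},
    };

    std::istringstream in(line);
    std::string token, out;
    while (in >> token) {
        if (!out.empty())
            out += ' ';
        auto it = table.find(token);
        out += (it != table.end()) ? it->second() : token;
    }
    return out;
}
```

Adding another callable then only requires a new entry in the table, with no change to the parsing loop.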
This would be the easiest solution, but I think your problem consists of several such calls?
std::string line = "The king's name is getKingName()";
if (line.find("getKingName()") != std::string::npos) {
    std::string name = getKingName(); // getKingName() returns the name itself
}
Amended
This answer is a little off subject. I will leave it up, because others might find it relevant, but I agree with other answers, a simple map->function will work better for your case.
This is not supported by C++. C++ is not an interpreted language. If you want to do things like this, why not use an interpreted language, which does this sort of thing by default? Languages like Lua are designed to call C/C++ functions from an interpreted language, with a small overhead.
However, if you really need to do this, it is possible, depending on your operating system. For example,
On Windows, start with dbghelp. You will need to build a pdb (i.e. build with symbols).
On Linux, you will also need to build with symbols (-g), and use something like dlsym.
That said, there are lots of gotchas doing it this way. Optimizations can get in the way (best to disable them). It is also best to avoid dynamic linking (prefer static). You will also need to cope with C++ name mangling (the name of the function in the binary is not the name of your function in C++). See https://blog.oakbits.com/how-to-mangle-and-demangle-a-c-method-name.html.
I am just messing around in C++ with some things I recently learned and I wanted to know how to correctly compare two strings to each other. I looked at a previous thread for help, but I am not sure I am getting the variables right and there was a repeating error. (P.S. This is executed to the command prompt.)
string Users = "Username1";
//Set an empty string.
string UserChoice;
//Print out a line that warns the user to type a user.
std::cout << "Username: ";
std::cin >> UserChoice;
//If the user types out whatever "Users" is, run the code below.
if (strcmp(Users, UserChoice) == 0){
//Do Stuff
}
You want:
if (Users == UserChoice) {
The std::string class (well, really std::basic_string) overloads the == operator (and many others) to do what you want. You should not be using C functions like strcmp in C++ code, and in any case they cannot be directly applied to C++ std::strings.
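For completeness, a minimal sketch of the comparison wrapped in a function. The name same_user is invented for this example, and the comparison is case-sensitive, exactly as operator== on std::string is.

```cpp
#include <string>

// True when the typed name matches the expected one exactly (case-sensitive).
bool same_user(const std::string& expected, const std::string& typed)
{
    return expected == typed; // compares contents, not addresses
}

// compare() gives ordering as well: <0, 0 or >0, much like strcmp would.
int order_of(const std::string& a, const std::string& b)
{
    return a.compare(b);
}
```

A std::string can also be compared directly against a string literal, as in `UserChoice == "Username1"`, because operator== is overloaded for that combination too.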
Comparing strings is the same as comparing int values, char values, etc. You should use the following method:
string a;
string b;
if (a == b)
{
    // Do something
}
In your case, 'a' and 'b' would be replaced by 'Users' and 'UserChoice'. The basic format of comparing two variables of the same type stays the same for most types; one notable exception is C-style strings (char arrays or char pointers), where == compares addresses rather than contents.
It is also recommended, just as #latedeveloper mentioned, not to use c-language functions in a c++ program. The 2 languages are NOT interchangeable!
** Helpful tip: Always strive to keep your code as simple as possible. With some exceptions, the more complicated you make your code, the harder it is for others to understand. To connect it to your case, why use a function like strcmp() when you can keep it simple by using the == operator? This is just my two bits based on personal experience.
c style:
string a;
string b;
if(strcmp(a.c_str(), b.c_str()) == 0)
I'm working on an old network engine and the type of package sent over the network is made up of 2 bytes.
This is more or less human readable form, for example "LO" stands for Login.
In the part that reads the data there is an enormous switch, like this:
short sh=(((int)ad.cData[p])<<8)+((int)ad.cData[p+1]);
switch(sh)
{
case CMD('M','D'):
..some code here
break
where CMD is a define:
#define CMD(a,b) ((a<<8)+b)
I know there are better ways but just to clean up a bit and also to be able to search for the tag (say "LO") more easily (and not search for different types of "'L','O'" or "'L' , 'O'" or the occasional "'L', 'O'" <- spaces make it hard to search) I tried to make a MACRO for the switch so I could use "LO" instead of the define but I just can't get it to compile.
So here is the question: how do you change the #define to a macro that I can use like this instead:
case CMD("MD"):
..some code here
break
It started out as a little subtask to make life a little bit easier but now I can't get it out of my head, thanks for any help!
Cheers!
[edit] The code works, it's the world that's wrong! I.e., Visual Studio 2010 has a bug concerning this. No wonder I cut my teeth on it.
Macro-based solution
A string-literal is really an instance of char const[N] where N is the length of the string, including the terminating null-byte. With this in mind you can easily access any character within the string-literal by using string-literal[idx] to specify that you'd like to read the character stored at offset idx.
#define CMD(str) (((str)[0]<<8)+(str)[1])
CMD("LO") => ((("LO")[0]<<8)+("LO")[1]) => (('L'<<8)+'O')
You should however keep in mind that nothing prevents you from using the above macro with a string shorter than two characters, meaning that you can run into undefined behavior if you read an offset that is not actually valid. Longer strings are also accepted silently, with everything after the second character ignored.
RECOMMENDED: C++11, use a constexpr function
You could create a function usable in constant-expressions (and with that, in case-labels), with a parameter of reference to const char[3], which is the "real" type of your string-literal "LO".
constexpr short cmd (char const(&ref)[3]) {
return (ref[0]<<8) + ref[1];
}
int main () {
short data = ...;
switch (data) {
case cmd("LO"):
...
}
}
C++11 and user-defined literals
In C++11 we were granted the possibility to define user-defined literals. This will make your code far easier to maintain and interpret, as well as having it be safer to use:
#include <cstddef>
#include <stdexcept>
// The length parameter of a string literal operator must be std::size_t
constexpr short operator""_cmd (char const * s, std::size_t len) {
    return len != 2 ? throw std::invalid_argument ("") : ((s[0]<<8)+s[1]);
}
int main () {
short data = ...;
switch (data) {
case "LO"_cmd:
...
}
}
The value associated with a case-label must be yielded by a constant-expression. It might look as though the above could throw an exception at runtime, but since a case-label requires a constant-expression, the compiler must be able to evaluate "LO"_cmd during translation.
If this is not possible, as in "FOO"_cmd, the compiler will issue a diagnostic saying that the code is ill-formed.
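Putting the pieces together, the sketch below decodes the two tag bytes from a buffer and dispatches on them with the literal operator. The function describe_packet and the returned descriptions are invented for illustration (only "LO" meaning login comes from the question); the static_assert checks that the literal yields the same value as the original CMD('L','O').

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

constexpr short operator""_cmd(const char* s, std::size_t len)
{
    return len != 2 ? throw std::invalid_argument("tag must be 2 characters")
                    : static_cast<short>((s[0] << 8) + s[1]);
}

// Same value as the old two-character macro would have produced.
static_assert("LO"_cmd == (('L' << 8) + 'O'), "matches CMD('L','O')");

// Hypothetical dispatcher: reads the 2-byte tag at the start of `data`.
std::string describe_packet(const unsigned char* data)
{
    const short tag = static_cast<short>((data[0] << 8) + data[1]);
    switch (tag) {
    case "LO"_cmd: return "login";
    case "MD"_cmd: return "MD command";
    default:       return "unknown";
    }
}
```

Searching the code base for a tag now only requires looking for the text "LO"_cmd, which was the original motivation.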
If I have an arithmetic expression like x+y-12 / z in a string (c-style or otherwise) in c or c++, how can I extract one item at a time (including the operator)? There may or may not be a space in the expression and multiple digits are allowed for constants.
If your input is simple you can start with something like this:
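#include <ctype.h>
#include <stdlib.h>

typedef struct token {
    int type;
    int ival;
    char sval[256];
    int ssize;
} Token;

/* TK_CONST, TK_VAR, my_isopchar() and my_get_op() are assumed to be defined elsewhere. */
char *get_next_tok(char *buffer, Token *token) {
    char *p = buffer;
    while (isspace((unsigned char)*p)) p++; // skip leading whitespace
    if (my_isopchar(*p)) // checks -+*...
        p = my_get_op(p, token); // a function to handle multi-char ops
    else if (isdigit((unsigned char)*p)) {
        token->ival = strtol(p, &p, 10);
        token->type = TK_CONST;
    }
    else if (isalpha((unsigned char)*p)) {
        token->ssize = 0; // start a fresh identifier
        while (isalpha((unsigned char)*p) && token->ssize < 255) {
            token->sval[token->ssize++] = *p; p++;
        }
        token->sval[token->ssize] = '\0'; // keep sval a valid C string
        token->type = TK_VAR;
    }
    return p;
}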
Easy way: strtok
Hard way: Flex+Bison
Look into parsing. What you describe can, in fact, be quite easily implemented using regular expressions, or hand-written parsing. Think of what makes up your expression's individual tokens, and how code to extract the next token would look.
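As a sketch of what such hand-written token extraction might look like in C++, the function below splits an expression into numbers, identifiers and single-character operators. The name tokenize is an assumption, and treating every other non-space character as a one-character operator is a simplification chosen for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits e.g. "x+y-12 / z" into "x", "+", "y", "-", "12", "/", "z".
std::vector<std::string> tokenize(const std::string& expr)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < expr.size()) {
        const unsigned char c = static_cast<unsigned char>(expr[i]);
        if (std::isspace(c)) { ++i; continue; }   // whitespace separates nothing
        const std::size_t start = i;
        if (std::isdigit(c)) {
            // multi-digit constant
            while (i < expr.size() &&
                   std::isdigit(static_cast<unsigned char>(expr[i]))) ++i;
        } else if (std::isalpha(c)) {
            // identifier: a letter followed by letters or digits
            while (i < expr.size() &&
                   std::isalnum(static_cast<unsigned char>(expr[i]))) ++i;
        } else {
            ++i;                                   // single-character operator
        }
        tokens.push_back(expr.substr(start, i - start));
    }
    return tokens;
}
```

Multi-character operators such as ":=" or "<=" would need an extra branch that peeks at the following character.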
There was a very nice tutorial on Flipcode on implementing scripting engines. You can read a few of the first chapters.
Basically you need to implement a lexical analyzer which breaks the string into tokens (identifier / constant / operator). From the tokens you can create a parse tree or reverse Polish notation, e.g. by recursive descent or using an LL parser, which is rather elegant if you are only interested in parsing arithmetic expressions.
The reverse Polish notation is then evaluated using a stack-based interpreter, or the parse tree is evaluated using a recursive algorithm.
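The stack-based evaluation step can be sketched as follows. This assumes the input is already in reverse Polish notation with space-separated tokens and contains only numeric constants; the function name eval_rpn is made up, and variables would need an additional lookup table.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Evaluates space-separated reverse Polish notation, e.g. "3 4 + 2 *".
double eval_rpn(const std::string& rpn)
{
    std::istringstream in(rpn);
    std::stack<double> st;
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (st.size() < 2)
                throw std::runtime_error("missing operand");
            // The right-hand operand was pushed last, so it is popped first.
            const double rhs = st.top(); st.pop();
            const double lhs = st.top(); st.pop();
            switch (tok[0]) {
            case '+': st.push(lhs + rhs); break;
            case '-': st.push(lhs - rhs); break;
            case '*': st.push(lhs * rhs); break;
            default:  st.push(lhs / rhs); break;
            }
        } else {
            st.push(std::stod(tok)); // throws std::invalid_argument on non-numbers
        }
    }
    if (st.size() != 1)
        throw std::runtime_error("malformed expression");
    return st.top();
}
```

For instance, the infix expression 1+2-12/4 corresponds to the RPN string "1 2 + 12 4 / -".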
I have written a small expression evaluation class in C++ which supports simple expressions with variables.