Change yylex in C++ Flex

I want to change yylex to alpha_yylex, that also takes in a vector as an argument.
.
.
#define YY_DECL int yyFlexLexer::alpha_yylex(std::vector<alpha_token_t> tokens)
%}
.
.
. in main()
std::vector<alpha_token_t> tokens;
while(lexer->alpha_yylex(tokens) != 0) ;
I think I know why this fails: FlexLexer.h obviously has no alpha_yylex. But I don't know how to achieve what I want...
How can I make my own alpha_yylex() or modify the existing one?

It's true that you cannot edit the definition of yyFlexLexer, since FlexLexer.h is effectively a system-wide header file. But you can certainly subclass it, which will provide most of what you need.
Subclassing yyFlexLexer
Flex allows you to use %option yyclass (or the --yyclass command-line option) to specify the name of a subclass, which will be used instead of yyFlexLexer to define yylex. Subclassing yyFlexLexer allows you to include your own header which defines your subclass' members and maybe even additional functions, as well as its constructors; in short, if your intention was simply to fill in a std::vector<alpha_token_t> with the successive tokens, you could easily do that by defining AlphaLexer as a subclass of yyFlexLexer, with an instance member called tokens (or, perhaps, with accessor functions).
You can also add additional member functions to your new class, which might provide what you need those additional arguments for.
The thing which is not quite so straight-forward, although it could easily be accomplished using the YY_DECL macro in the C interface, is to change the name and prototype of the scanning function generated by flex. It can be done (see below) but it is not clear that it is actually supported. In any case, it is possibly less important in the case of C++.
Aside from a small wrinkle created by the curious organization of Flex's C++ classes [Note 1], subclassing the lexer class is simple. You need to derive your class from yyFlexLexer [Note 2], which is declared in FlexLexer.h, and you need to tell Flex what the name of your class is, either by using %option yyclass in your Flex file, or by specifying the name on the command line with --yyclass.
yyFlexLexer includes the various methods for manipulating input buffers, as well as all the mutable state for the lexical scanner used by the standard skeleton. (Much of this is actually derived from the base class FlexLexer.) It also includes a virtual yylex method with prototype
virtual int yylex();
When you subclass yyFlexLexer, yyFlexLexer::yylex() is defined to signal an error by calling yyFlexLexer::LexerError(const char*) and the generated scanner is defined as the override in the class defined as yyclass. (If you don't subclass, the generated scanner is yyFlexLexer::yylex().)
The one wrinkle is the way you need to declare your subclass. Normally, you would do that in a header file like this:
File: myscanner.h (Don't use this version)
#pragma once
// DON'T DO THIS; IT WON'T WORK (flex 2.6)
#include <FlexLexer.h>
class MyScanner : public yyFlexLexer {
// whatever
};
You would then #include "myscanner.h" in any file which needed to use the scanner, including the generated scanner itself.
Unfortunately, that won't work because it will result in FlexLexer.h being included twice in the generated scanner; FlexLexer.h does not have an include guard in the normal sense of the word because it is designed to be included multiple times in order to support the prefix option. So you need to define two header files:
File: myscanner-internal.h
#pragma once
// This file depends on FlexLexer.h having already been included
// in the translation unit. Don't use it other than in the scanner
// definition.
class MyScanner : public yyFlexLexer {
// whatever
};
File: myscanner.h
#pragma once
#include <FlexLexer.h>
#include "myscanner-internal.h"
Then you use #include "myscanner.h" in every file which needs to know about the scanner except the scanner definition itself. In your myscanner.ll file, you will #include "myscanner-internal.h", which works because Flex has already included FlexLexer.h before it inserts the prologue C++ code from your scanner definition.
Changing the yylex prototype
You can't really change the prototype (or name) of yylex, because it is declared in FlexLexer.h and, as mentioned above, defined to signal an error. You can, however, redefine YY_DECL to create a new scanner interface. To do so, you must first #undef the existing YY_DECL definition, at least in your scanner definition, because a scanner with %option yyclass="MyScanner" contains #define YY_DECL int MyScanner::yylex(). That would make your myscanner-internal.h file look like this:
#pragma once
// This file depends on FlexLexer.h having already been included
// in the translation unit. Don't use it other than in the scanner
// definition.
#undef YY_DECL
#define YY_DECL int MyScanner::alpha_yylex(std::vector<alpha_token_t>& tokens)
#include <vector>
#include "alpha_token.h"
class MyScanner : public yyFlexLexer {
public:
int alpha_yylex(std::vector<alpha_token_t>& tokens);
// whatever else you need
};
The fact that the MyScanner object still has a (not very functional) yylex method might not be a problem. There are some undocumented interfaces in FlexLexer which call yylex(), but those don't matter if you don't use them. (They're not all that useful, anyway.) But you should at least be aware that the interface exists.
In any case, I don't see the point of renaming yylex (but perhaps you have a different aesthetic sense). It's already effectively namespaced by being a member of a specific class (MyScanner, above), so yylex doesn't really create any confusion.
In the particular case of the std::vector<alpha_token_t>& argument, it seems to me that a cleaner solution would be to put the reference as a member variable in the MyScanner class and set it with the constructor or with an accessor method. Unless you actually use different vectors at different points in the lexical analysis -- not evident in the example code in your question -- there's no point burdening every call site with the need to pass the address of the vector into the yylex call. Since lexer actions are compiled inside yylex, which is a member function of MyScanner, instance variables -- even private instance variables -- are usable in the lexer actions. Of course, that's not the only use case for extra yylex arguments, but it's a pretty common one.
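A minimal sketch of that member-variable approach follows. So that it compiles without Flex installed, StandInLexer is a hypothetical stand-in for yyFlexLexer and the body of yylex() is hand-written; in a real project the base class would come from FlexLexer.h and Flex would generate MyScanner::yylex() from the rules when %option yyclass="MyScanner" is set. The layout of alpha_token_t and the helper scan_all are also assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed token layout; the question does not show alpha_token_t.
struct alpha_token_t {
    std::string text;
};

// Hypothetical stand-in for yyFlexLexer so the sketch builds without Flex.
class StandInLexer {
public:
    virtual ~StandInLexer() = default;
    virtual int yylex() = 0;
};

class MyScanner : public StandInLexer {
public:
    // The vector is bound once, so call sites never pass it to yylex().
    MyScanner(std::string input, std::vector<alpha_token_t>& tokens)
        : input_(std::move(input)), tokens_(tokens) {}

    // Hand-written here; with Flex this body is generated from the rules,
    // and rule actions can use tokens_ directly because they are compiled
    // inside this member function.
    int yylex() override {
        while (pos_ < input_.size() &&
               std::isspace(static_cast<unsigned char>(input_[pos_]))) {
            ++pos_;
        }
        if (pos_ == input_.size()) return 0;  // 0 signals end of input
        std::size_t start = pos_;
        while (pos_ < input_.size() &&
               !std::isspace(static_cast<unsigned char>(input_[pos_]))) {
            ++pos_;
        }
        tokens_.push_back({input_.substr(start, pos_ - start)});
        return 1;  // any non-zero value means a token was produced
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
    std::vector<alpha_token_t>& tokens_;
};

// Mirrors the loop in the question, minus the extra argument.
std::vector<alpha_token_t> scan_all(const std::string& input) {
    std::vector<alpha_token_t> tokens;
    MyScanner lexer(input, tokens);
    while (lexer.yylex() != 0) {
    }
    return tokens;
}
```

Because the reference is a member, the scanning loop stays identical to the standard yylex() loop, and the vector outlives the scanner as long as the caller owns it.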
Notes
"The C++ interface is a mess," according to a comment in the generated code.
Using %option prefix, you can change yy to something else if you want to. This is a feature which is supposedly intended to allow you to include multiple lexical scanners in the same project. However, if you're planning on subclassing, the base classes for all these lexical scanners will be identical (other than their names). Thus, there is little or no point having different base classes. Renaming the scanner class using %option prefix is less flexible and no more efficient than subclassing, and it creates an additional header complication. (See this older answer for details.) So I'd recommend sticking with subclassing.

Related

C++ Including header file multiple times

I was just wondering where is it necessary/right to include a specific header file according to the example below. Let's assume I have a definition of an exception class:
//exc.hpp
#ifndef EXC_H
#define EXC_H
#include <exception>
class MyException : public std::exception {
};
#endif /* EXC_H */
Then I have another class definition throwing such exception:
//a.cpp
void SomeClass::someMethod(void) {
throw MyException(...);
}
And having another file handling that exception, e.g.:
//main.cpp
#include "a.hpp"
int main() {
...
catch(MyException & e) { ... }
}
So my question is, where should I place #include "exc.hpp"? Just to a.hpp, or both a.hpp and main.cpp?
And when it comes to makefile... How should be the targets specified within such file?

Every translation unit that throws or catches exceptions of that type will need to be able to see its definition.
Loosely speaking, that means every .cpp file containing a throw or catch relating to that exception must include your .hpp file.
That's even if you only catch by reference and never inspect the exception object, which is not the case in other areas of C++ (where a forward declaration will do).
Makefiles are unrelated.
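As a sketch of that rule, the single file below stands in for three: the class definition is what exc.hpp would provide, and both the throwing function and the catching function need it in view. The base class std::runtime_error and the names some_method and handle are assumptions made for illustration.

```cpp
#include <stdexcept>
#include <string>

// --- what exc.hpp would provide ---
class MyException : public std::runtime_error {
public:
    explicit MyException(const std::string& what_arg)
        : std::runtime_error(what_arg) {}
};

// --- a.cpp: throwing requires the complete type ---
void some_method() {
    throw MyException("boom");
}

// --- main.cpp: so does catching, even by reference ---
std::string handle() {
    try {
        some_method();
    } catch (const MyException& e) {
        return e.what();
    }
    return "no exception";
}
```

In a real project the first part would be reached through #include "exc.hpp" in both a.cpp and main.cpp, either directly or through a.hpp.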

Both the file with the implementation (a.cpp) and all files that use the class (main.cpp) should have the #include.
Without the include, a.cpp will not compile at all.

You need to include the .hpp file in both main.cpp and a.cpp. The purpose of the #ifndef sequence is to prevent accidental multiple inclusion (through indirect #includes). The non-standard but widely supported #pragma once directive does the same thing.
The compiler figures out what .h/.hpp files to read based on the #includes in the .cpp files; the make file is not involved.

Remember that the compiler processes each source file independently, and doesn't remember anything from the source file once it's done processing. Even if you list several source files on a single compiler command-line.
You have a header-file that defines a type. Naturally, you must #include the header file in every source file where you need that type to be defined. (The compiler will not remember having seen the types when processing earlier source files.)
It might be tempting to #include headers within other headers, just so you don't have to #include so many things within the .c or .cpp files, but this should be avoided to the degree possible. It produces what is known as "header coupling", and it makes code that is hard to re-use later on other projects.
There's also a fine point hiding in what I said above: "where you need that type to be defined". There are two very specific concepts in C and C++ related to types:
declaration -- when you make the compiler aware that a type exists, and
definition -- where you tell the compiler the details of the type.
You need to #include your header wherever you need the definition. I.e., when you intend to instantiate an object of the type, define members of the type in another struct or class, or call one of its methods (assuming C++). If instead, you only want to store a reference to objects of the type without creating or using them, a declaration will suffice, and you can just forward-declare the class. I.e.,
class MyException;
void setFileNotFoundExceptionObject(const MyException *exc) { ... }
Many APIs are designed specifically around only using pointers or references to objects so that the API headers only need forward declarations of the types, and not the complete definitions. (This keeps the internal members of the objects hidden to prevent developers from abusing them.)
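A small sketch of the difference, using hypothetical names (Widget, Registry, remember, current_name): a forward declaration is enough to hold or pass a pointer, but the full definition must be visible before a member is accessed.

```cpp
#include <string>
#include <utility>

// Forward declaration: enough for pointers and references.
class Widget;

// This struct only stores a pointer, so Widget may stay incomplete here.
struct Registry {
    const Widget* current = nullptr;
};

// Copying a pointer does not need the definition either.
void remember(Registry& registry, const Widget* widget) {
    registry.current = widget;
}

// Full definition: required from here on to create objects or call members.
class Widget {
public:
    explicit Widget(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Calling a member function needs the complete type, so this comes last.
std::string current_name(const Registry& registry) {
    return registry.current ? registry.current->name() : "";
}
```

In a multi-file layout, Registry and remember could live in a header that never includes the Widget header at all.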

How to use flex with my own parser?

I want to leave the lexical analysis to lex but develop the parser on my own.
I made a token.h header which has the enums for token types and a simple class hierarchy,
For the lex rule:
[0-9]+ {yylval = new NumToken(std::stoi(yytext));return NUM;}
How do I get the NumToken pointer from the parser code?
Suppose I just want to print out the tokens..
while(true)
{
auto t = yylex();
//std::cout <<yylval.data<<std::endl; // What goes here ?
}
I can do this with yacc/bison, but can not find any documentation or example about how to do this manually.

In a traditional bison/flex parser, yylval is a global variable defined in the parser generated by bison, and declared in the header file generated by bison (which should be #include'd into the generated scanner). So a simple solution would be just to replicate that: declare yylval (as a global) in token.h and define it somewhere in your parser.
But modern programming style has shifted away from the use of globals (for good reason), and indeed even flex will generate scanners which do not depend on global state, if requested. To request such a scanner, specify
%option reentrant
in your scanner definition. By default, this changes the prototype of yylex to:
int yylex(yyscan_t yyscanner);
where yyscan_t is an opaque pointer. (This is C, so that means it's a void*.) You can read about the details in the Flex manual; the most important takeaway is that you can ask flex to also generate a header file (with %option header-file), so that other translation units can refer to the various functions for creating, destroying and manipulating a yyscan_t, and that you need to minimally create one so that yylex has somewhere to store its state. (Ideally, you would also destroy it.) [Note 1].
The expected way to use a reentrant scanner from bison is to enable %option bison-bridge (and %option bison-locations if your lexer generates source location information for each token). This will add an additional parameter to the yylex prototype:
int yylex(YYSTYPE *yylval_param, yyscan_t scanner);
With %option bison-locations, two parameters are added:
int yylex(YYSTYPE *yylval_param,
YYLTYPE *yylloc_param,
yyscan_t scanner);
The semantic type YYSTYPE and the location type YYLTYPE are not declared by the flex-generated code. They must appear in the token.h header you #include into your scanner.
The intention of the bison-bridge parameters is to provide a mechanism to return the semantic value yylval to the caller (i.e. the parser). Since yylval is effectively the same as the parameter yylval_param [Note 2], it will be a pointer to the actual semantic value, so you need to write (for example) yylval->data = ... in your flex actions.
So that's one way to do it.
A possibly simpler alternative to bison-bridge is just to provide your own yylex prototype, which you can do with the macro YY_DECL. For example, you could do something like this (if YYSTYPE were something simple):
#define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
Then a rule could just return the pair:
[0-9]+ {return std::make_pair(NUM, new NumToken(std::stoi(yytext)));}
Obviously, there are many variants on this theme.
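To connect this back to the loop in the question, here is a hedged sketch of a hand-written driver that consumes such pairs. fake_yylex is a hypothetical stand-in for the Flex-generated function (it reads whitespace-separated integers from a stream instead of taking a yyscan_t), the token hierarchy is reduced to a single NumToken, and YYSTYPE is assumed to be a raw pointer.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum TokenType { END = 0, NUM = 1 };

struct NumToken {
    int data;
    explicit NumToken(int value) : data(value) {}
};

using YYSTYPE = NumToken*;  // assumption: the semantic value is a raw pointer

// Hypothetical stand-in for a scanner declared with
//   #define YY_DECL std::pair<int, YYSTYPE> yylex(yyscan_t yyscanner)
// It reads whitespace-separated integers and returns {END, nullptr} at EOF.
std::pair<int, YYSTYPE> fake_yylex(std::istream& in) {
    int value = 0;
    if (in >> value) {
        return {NUM, new NumToken(value)};
    }
    return {END, nullptr};
}

// The "parser" side: pull pairs until the end token and use the value.
std::vector<int> collect_numbers(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> numbers;
    for (;;) {
        std::pair<int, YYSTYPE> token = fake_yylex(in);
        if (token.first == END) break;
        std::unique_ptr<NumToken> owned(token.second);  // take ownership
        numbers.push_back(owned->data);
    }
    return numbers;
}
```

The token type arrives in first and the semantic value in second, so no global yylval is needed; wrapping the pointer in std::unique_ptr is one way to avoid leaking the tokens the scanner allocates.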
Notes
Unfortunately, the generated header includes quite a lot of unnecessary baggage, including a bunch of macro definitions for the standard "globals" which won't work because in a reentrant scanner these variables can only be used in a flex action.
The scanner generated with bison-bridge defines yylval as a macro which refers to a field in the opaque state structure, and stores yylval_param into this field. yyget_lval and yyset_lval functions are provided in order to get or set this field from outside of yylex. I don't know why; it seems somewhere between unnecessary and dangerous, since the state will contain the pointer to the value, as supplied in the call to yylex, which may well be a dangling pointer once the call returns.

C++ header cannot be included without LNK2005 error

I have a large project which is designed to control and test hardware.
There are 4 device control classes (for interferometers, a piezo-motor, a PXI system, and a nano-positioning controller).
I created a "master" class called MainIO which stores an instance of each of the above classes, in order to perform operations across the range of IO (i.e. move motor and check interferometers). The MainIO header file includes the 4 control classes headers.
I then have a separate "global" hpp/cpp which contains global variables, conversions, ini file operations and so on. This is laid out with namespaces for the types of operation rather than creating a class, i.e. GCONV::someFunction(); and GMAIN::controllerModel;
I need all 4 control classes to have access to conversion and other global operations. I had them all including global.hpp at one point, but I've changed something (I can't think what it could be!) and now it seems that I cannot include global.hpp in ANY of my control class hpp's or cpp's without getting a linker error -
global.obj:-1: error: LNK2005: "class QString GMAIN::controllerModel" (?controllerModel@GMAIN@@3VQString@@A) already defined in controllers.obj
I'm absolutely certain that I've done something stupid and the solution is staring me in the face, but it's got to the stage where I'm getting so frustrated with it that I cannot see the wood for the trees.

I have discovered what I was doing wrong, and although it is frustratingly simple, it took me a while to find the relevant documentation to discover my error, and so I will answer my own question in the hope of giving someone else an easier time.
It turns out that in global.hpp I was declaring variables within a namespace like this:
namespace GMAIN {
QString controllerModel;
}
Essentially this means that every file that includes global.hpp will include its own definition of QString controllerModel thereby throwing the linker error. Each control class would have its own definition of the same named variable, violating the one definition rule.
To fix this, QString controllerModel needs to be extern'ed. The extern keyword allows a variable to be declared in multiple locations while only having a single definition (and hence not breaking the rule).
So the working code is now:
//in global.hpp
namespace GMAIN {
extern QString controllerModel; //declaration - seen by every file that includes global.hpp
}
//in global.cpp
namespace GMAIN {
QString controllerModel; //definition - appears only once, because global.cpp is never included
}
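The same layout can be tried in a single file. In this sketch std::string replaces QString so that it builds without Qt, and the comments mark which part would live in which file; the function name model_length and the initial value are made up.

```cpp
#include <cstddef>
#include <string>

// --- global.hpp: declaration only, safe to see any number of times ---
namespace GMAIN {
extern std::string controllerModel;
}

// A second inclusion is harmless because a declaration may be repeated.
namespace GMAIN {
extern std::string controllerModel;
}

// --- global.cpp: the one and only definition ---
namespace GMAIN {
std::string controllerModel = "unset";
}

// --- any control class .cpp: uses the variable through the declaration ---
std::size_t model_length() {
    return GMAIN::controllerModel.size();
}
```

Removing extern from either declaration turns it into a second definition, which is the single-file equivalent of the LNK2005 error.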

Are you defining controllerModel where you should only be declaring it?
http://www.cprogramming.com/declare_vs_define.html

You should export your dll.
Use __declspec(dllexport). You can include __declspec(dllexport) as a macro in your header file and put the macro in the beginning of each and every member function.
For example:
In your Header.h file include
#define MYMACRO __declspec(dllexport)
and in your class
class classname
{
public:
MYMACRO void MYFUNCTION();
MYMACRO void MYFUNCTION2();
};

Why must a subclass' using statements be duplicated?

Relatively new to cpp. When you subclass a baseclass, the #include cascade to the subclass, why don't the class-wide usings in the cpp file also cover the scope of the subclass? Is this a historical or pragmatic reason? Err.. What is the reason?
//available to subclass
#include <cinder/app/appBasic.h>
// have to duplicate in subclass
using namespace ci;
using namespace ci::app;
using namespace std;

Using directives and using declarations are only valid for the current translation unit. You can put them in the header file, but that is not good practice.

The main reason is that the compiler only looks at one .cpp file plus whatever it #includes. It has no idea what using statements there might be in the .cpp file for your base class. For that matter, it has no idea whether you've even written the .cpp file for the base class yet when you compile the .cpp file for the derived class. And it's not going to go rooting around the filesystem to find out, unlike for example javac.
Furthermore, I guess you're writing one .cpp file per class, and giving the file a name that has something to do with the class name. But C++ doesn't require that. You can have more than one class in a file, or for that matter you could split a class across multiple files if you want to.
So, you know that this .cpp file is the file for your derived class, and that the other .cpp file is the file for the base class, and therefore you think it might be convenient if some stuff from the other .cpp file was lifted into this .cpp file. But the compiler knows no such thing. It wouldn't make sense to the C++ compiler to talk about what's in the file for the base class.
Finally, a reason that is principled rather than pragmatic: just because it is convenient for the implementer of the base class to bring the names from certain namespaces into global scope doesn't mean the same will be true of the implementer of the derived class. The derived class might not use ci::app at all, let alone use it so much that the person writing the derived class is sick of typing it. So even if C++ could require the compiler to fetch those using statements (which, given the compilation model, it can't), I'm pretty sure the language's designers wouldn't want it to.
"the #include cascade to the subclass"
No they don't. Any #include in the base.h will be included into derived.cpp if (for example) derived.cpp includes derived.h which includes base.h. But any includes in base.cpp have no effect on derived.cpp.
This is all assuming that you follow the usual naming convention. There's nothing other than convention to stop you including a .cpp file from derived.cpp, in which case (1) any includes or using statements would apply in derived.cpp, but unfortunately (2) you'll probably break your build system, because most likely you won't be able to link base.o and derived.o together any more on account of them containing duplicate definitions for code entities subject to the One Definition Rule. That is, functions.
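One way to live with this, sketched below with a made-up namespace lib::app standing in for ci::app: write fully qualified names in anything header-like, and keep using directives inside the source file or function that wants them, so they cannot leak into other code.

```cpp
#include <string>

// --- header-like part: fully qualified names, no using directives ---
namespace lib {
namespace app {
inline std::string title() { return "demo"; }
}  // namespace app
}  // namespace lib

// --- derived.cpp-like part: the directive is local to this function ---
std::string decorated_title() {
    using namespace lib::app;  // visible only until the closing brace
    return "[" + title() + "]";
}

// Outside that function the short name is not in scope, so qualify it.
std::string plain_title() {
    return lib::app::title();
}
```

Each source file then states its own using directives, which is the duplication the question asks about.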

Limiting Scope of #include Directives

Let's say I have a header file with a class that uses std::string.
#include <string>
class Foo
{
std::string Bar;
public:
// ...
};
The user of this header file might not want std::string to be included in his/her project. So, how do I limit the inclusion to just the header file?

The user of your class must include <string>, otherwise their compiler will not know how big a Foo object is (and if Foo's constructors/destructors are defined inline, then the compiler also won't know what constructor/destructor to call for the string member).
This is indeed an irritating side-effect of the C++ compilation model (basically inherited intact from C). If you want to avoid this sort of thing entirely, you probably want to take a look at the PIMPL idiom.
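A minimal sketch of the PIMPL idiom applied to Foo, shown as one file with comments marking the header/source split. The accessor bar() and the constructor argument are assumptions; note that the header part needs <memory> instead of <string>, so the dependency is moved rather than eliminated.

```cpp
// --- foo.hpp: no <string> here, only an opaque pointer ---
#include <memory>

class Foo {
public:
    explicit Foo(const char* bar);
    ~Foo();  // defined in foo.cpp, where Impl is complete
    const char* bar() const;

private:
    struct Impl;  // defined only in foo.cpp
    std::unique_ptr<Impl> impl_;
};

// --- foo.cpp: the std::string dependency lives here ---
#include <string>

struct Foo::Impl {
    std::string bar;
};

Foo::Foo(const char* bar) : impl_(new Impl{bar}) {}
Foo::~Foo() = default;
const char* Foo::bar() const { return impl_->bar.c_str(); }
```

Code that includes only the header part can create and use Foo without ever seeing std::string, at the cost of one heap allocation and one pointer indirection per object.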

Basically, you don't. Once you've included a file, all of the entities from that file are available for the remainder of the translation unit.
The idiomatic way to hide this sort of dependency is to rely on the pimpl idiom.
That said, why would code using Foo care that <string> was included? All of its entities are in the std namespace (well, except that <string> might include some of the C Standard Library headers, but generally you should code with the expectation that the C Standard Library headers might be included by any of the C++ Standard Library headers).

I don't see how this can be done, or if it's possible in c++. The reason being: when the compiler sees the member of type "std::string", it must know what is the type in order to know its size. This information can only be obtained by looking into the class definition in a .h file.
One way the users can use a different string class in their source, is by using "using" construct:
//This is how users can use types with the same name from different namespaces.
//Each using declaration needs its own scope, otherwise the two names conflict.
{
using std::string;
string bar = "there";
}
{
using my_own_lib::string;
string bar1 = "here";
}

You can't do that. If the user doesn't want to include std::string, then he or she should not use that class at all. std::string will have to be included in the project in order to link your class correctly.
