Flex++ Bisonc++ parser - c++

I'm trying to use flex and bison in my project to generate parser code for a file structure. The main programming language is C++, and the project has an OO design and mostly runs in parallel.
I've heard that flex- and bison-generated parsers are C code and are not reentrant. Googling around, I found flex++ and bisonc++. Unfortunately there is no simple tutorial to get started; most examples are based on bison/flex, and the people who have somehow integrated bison/flex parsers into their C++ code describe it as "tricky"...
The documentation for flex++ and bisonc++ doesn't help me either, and the tutorials and examples all read input from stdin and print messages to stdout.
I need these features in my parser:
The parser should be a C++ class, defined in the normal manner (a header and a cpp file)
The parser receives its data from an std::string, an std::stringstream, or a null-terminated char*.
I feel so confused. Should I use flex++/bisonc++ or flex/bison? And how do I do that while satisfying the above conditions?

There are flex/bison, flex++/bison++ and flexc++/bisonc++. I think it's best to pick one of these three pairs, instead of mixing/matching flex++ and bisonc++.
Here are the user guides for Flexc++ and Bisonc++.
From the Flexc++ website:
Flexc++, contrary to flex and flex++, generates code that is
explicitly intended for use by C++ programs. The well-known flex(1)
program generates C source-code and flex++(1) merely offers a C++-like
shell around the yylex function generated by flex(1) and hardly
supports present-day ideas about C++ software development.
Contrary to this, flexc++ creates a C++ class offering a predefined
member function lex matching input against regular expressions and
possibly executing C++ code once regular expressions were matched. The
code generated by flexc++ is pure C++, allowing its users to apply all
of the features offered by that language.
From the Bisonc++ website:
Bisonc++ is a general-purpose parser generator that converts a grammar
description for an LALR(1) context-free grammar into a C++ class to
parse that grammar. Once you are proficient with bisonc++, you may use
it to develop a wide range of language parsers, from those used in
simple desk calculators to complex programming languages. Bisonc++ is
highly comparable to the program bison++, written by Alain Coetmeur:
all properly-written bison++ grammars ought to be convertible to
bisonc++ grammars after very little or no change. Anyone familiar with
bison++ or its precursor, bison, should be able to use bisonc++ with
little trouble. You need to be fluent in using the C++ programming in
order to use bisonc++ or to understand this manual.
So flexc++/bisonc++ are more than just wrappers around the old flex/bison utilities. They generate complete C++ classes to be used for re-entrant scanning / parsing.
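To get a feel for the workflow, here is a minimal sketch of driving a Flexc++-generated scanner from an in-memory string. The class name Scanner, its (istream, ostream) constructor and the lex() member reflect Flexc++'s defaults as far as I know; the header name is an assumption, so check the files your flexc++ run actually produces.

// Hypothetical driver for a flexc++-generated scanner class.
// Assumes flexc++ produced Scanner.h declaring a class Scanner whose
// constructor takes an input stream and whose lex() returns the next
// token (0 at end of input).
#include <sstream>
#include <iostream>
#include "Scanner.h"   // generated by flexc++; name is an assumption

int main() {
    std::istringstream input("some text to tokenize");
    Scanner scanner(input, std::cout);     // scan from the string stream
    while (int token = scanner.lex())
        std::cout << "matched token " << token << '\n';
}

Bisonc++ similarly generates a Parser class with a parse() member; the user guides linked above describe how the two classes are combined.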

Flex can generate a reentrant C scanner. See Section 19 Reentrant C scanners in the Flex manual.
Similarly, Bison can generate a reentrant C parser. See Section 3.8.11 A Pure (Reentrant) Parser in the Bison manual for details.
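If you stay with plain flex, the reentrant scanner can still be fed from a std::string through flex's in-memory buffer functions, called from C++. A rough sketch, assuming the scanner was generated with %option reentrant and that a header was emitted (the name lexer.h is an assumption):

// Feeding a std::string to a reentrant flex scanner from C++.
#include <string>
#include "lexer.h"   // generated via %option header-file="lexer.h" (assumed name)

void scan(const std::string &input) {
    yyscan_t scanner;
    yylex_init(&scanner);                       // per-scanner state, no globals
    YY_BUFFER_STATE buf = yy_scan_string(input.c_str(), scanner);
    while (yylex(scanner) != 0) {
        // handle the returned token here
    }
    yy_delete_buffer(buf, scanner);
    yylex_destroy(scanner);
}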
Do you absolutely need to have a C++ parser and std::string/stringstream based parser data?
Have you looked at Boost.Spirit as an alternative?
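For comparison, a small Boost.Spirit (Qi) example that parses whitespace-separated integers directly out of a std::string, which covers the "parse from a string in pure C++" requirement without any code-generation step:

// Minimal Boost.Spirit.Qi example: parse integers from a std::string.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    std::string input = "10 20 30 40";
    std::vector<int> values;

    auto first = input.begin(), last = input.end();
    bool ok = qi::phrase_parse(first, last, *qi::int_, ascii::space, values);

    if (ok && first == last)
        std::cout << "parsed " << values.size() << " integers\n";
    else
        std::cout << "parse failed\n";
}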

The LRSTAR product (an LR(k) parser and DFA lexer generator) is C++ based. It runs on Windows and ships with six Visual Studio projects. The code also compiles with "gcc" and other compilers. There are classes for the lexer and parser, symbol table, and AST. Complete source code is available. It gets good reviews. I should know. I am the author.

Related

What are the reasons for using Ragel to parse strings in a C++ codebase?

I inherited a C++ project which uses Ragel for string parsing.
This is the first time I have seen this being done and I would like to understand why someone would use Ragel instead of C++ to parse a string?
Parser generators (improperly called "compiler-compilers") are very handy to use and produce reliable and efficient C++ or C code (notably because parsing theory is well understood).
In general, using source code generators can be a wise thing to do. Sometimes, notably in large projects, it is sensible to write your own (read about metaprogramming, notably SICP and even J. Pitrat's blog). Good build automation tools like GNU make or ninja can easily be configured to run C or C++ code generators and use their output at build time.
Read Ragel intro. Look also into flex, bison, ANTLR, rpcgen, Qt moc, swig, gperf as common examples of C or C++ generators.
In some programs, you could even use some JIT compilation library (such as libgccjit or LLVM) to dynamically generate code at run time and use it. On POSIX systems you could also generate at runtime a temporary C or C++ file, compile it as a plugin, and load that temporary plugin using dlopen & dlsym. Having a good culture about compilers and interpreters (e.g. through the Dragon Book) is then worthwhile.
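To make the plugin idea concrete, here is a minimal POSIX-only sketch; the compiler command line and the exported function name compute are illustrative assumptions.

// Generate C++ at run time, compile it as a shared object, load it with
// dlopen and call the symbol obtained through dlsym. Link with -ldl.
#include <cstdlib>
#include <dlfcn.h>
#include <fstream>
#include <iostream>

int main() {
    // 1. Emit a tiny translation unit.
    std::ofstream("gen.cpp") << "extern \"C\" int compute(int x) { return 2 * x; }\n";

    // 2. Compile it as a plugin (command is an assumption; adjust for your toolchain).
    if (std::system("c++ -shared -fPIC gen.cpp -o gen.so") != 0) return 1;

    // 3. Load the plugin and resolve the generated function.
    void *handle = dlopen("./gen.so", RTLD_NOW);
    if (!handle) return 1;
    auto compute = reinterpret_cast<int (*)(int)>(dlsym(handle, "compute"));
    if (compute) std::cout << compute(21) << '\n';   // prints 42
    dlclose(handle);
}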
Embedding some interpreter (like lua or guile) in your application is also an interesting approach. But it is a strong architectural decision.
In many cases, generating source code is easier than hand writing it. Of course that is not always possible.
PS. I never heard of Ragel before reading your question!

C++ BNF grammar with parsing/matching examples

I'm developing a C++ parser (for an IDE), so now trying to understand C++ grammar in details.
While I've found an excellent grammar source at http://www.nongnu.org/hcb/, I'm having trouble understanding some parts of it - and, especially, which "real" language constructs correspond to various productions.
So I'm looking for a C/C++ BNF grammar guide with examples of code that match various productions/rules. Are there any?
A hyperlinked (purported) grammar is not necessarily one on which you can build a parser easily. That is determined by the nature of your parsing engine, and which real dialect of C and C++ you care about (ANSI? GNU? C99? C++11? MS?).
Building a working C++ parser is really hard. See my answer to Why C++ cannot be parsed with a LR(1) parser? for some of the reasons. If you want a "good" parser, I suggest you use one of the existing ones. One worth looking at might be Elsa, since it is open source.

searching for a BNF (for yacc) grammar of C++

I found something similar here: Where can I find standard BNF or YACC grammar for C++ language?
But the download links don't work anymore, and I want to ask if somebody know where I can download it now?
C++ is not a context-free language, so it cannot be parsed accurately from a plain BNF grammar fed to a tool like yacc. However, it is possible to parse a superset of the language with those tools and then apply additional contextual processing to the parsed structure.
Looking here: http://www.parashift.com/c++-faq-lite/compiler-dependencies.html#faq-38.11, I found this: http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
Depending on your task, you might want to use an existing C++ frontend instead.
The EDG Compiler Frontend and the CLang Frontend have both been designed so as to be used independently from "pure compilation".
CLang notably features accurate location of tokens and for example includes "rewrite" tools that can be used to modify existing code.

What is the process of creating an interpreted language? [duplicate]

This question already has answers here:
Learning to write a compiler [closed]
(38 answers)
Closed 9 years ago.
I want to create a very simple experimental programming language. What are some resources I can check out to get an overview of the process of creating an interpreted language? I will be using C++ to build and compile the interpreter.
You need to implement both a parser and an interpreter.
There is a great free textbook called "Programming Languages: Application and Interpretation" that uses Scheme to build increasingly complex interpreters. It also serves as a great introduction to programming language features.
Check it out here: http://www.cs.brown.edu/~sk/Publications/Books/ProgLangs/
Even if Scheme isn't your cup of tea, it may be worth looking into.
A few steps:
First, build the lexer and parser. This is really easy to do with common tools such as lex and yacc, or with a more modern framework such as ANTLR (which is what I recommend). These tools will generate source code in your target language that you can then compile and include in your project.
The lexer and parser will build the internal representation of the source file. There are a few different ways of approaching this:
In the bytecode model, the source file is compiled into a low-level internal language, for which you write a bytecode interpreter that directly executes the operations. This is the way that Perl and the .NET languages work, for example.
In the object tree model, the source file is compiled into an object tree where every object knows how to execute itself. Once parsing is completed, you just call Exec() on the root object (which in turn calls Exec() on its children, etc.). This is basically the method that I use for my interpreted domain-specific language Phonix.
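A bare-bones sketch of the object tree model in C++; the names Node, Number, Add and Exec are illustrative, not taken from Phonix or any other project.

// Every node of the tree knows how to execute itself.
#include <iostream>
#include <memory>

struct Node {
    virtual ~Node() = default;
    virtual double Exec() const = 0;
};

struct Number : Node {
    double value;
    explicit Number(double v) : value(v) {}
    double Exec() const override { return value; }
};

struct Add : Node {
    std::unique_ptr<Node> lhs, rhs;
    Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    double Exec() const override { return lhs->Exec() + rhs->Exec(); }
};

int main() {
    // The tree a parser might build for "1 + (2 + 3)".
    auto tree = std::make_unique<Add>(
        std::make_unique<Number>(1),
        std::make_unique<Add>(std::make_unique<Number>(2),
                              std::make_unique<Number>(3)));
    std::cout << tree->Exec() << '\n';   // prints 6
}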
To create an interpreted language, you need to create two things:
A formal definition of the language's grammar
A parser that can read and interpret the language
Once you have defined the language itself, there are several tools available to assist in creating a language parser. The classic tools are lex and yacc, and their open-source versions flex and bison.
Take a look at the boost library "spirit" LL parser.

How to turn type-labeled tokens into a parse-tree?

So I'm writing a programming language in C++. I've written pretty much all of it except for one little bit where I need to turn my tokens into a parse tree.
The tokens are already type-labeled and ready to go, but I don't want to go through the effort of writing my own parse tree generator. I've been looking around for tools to do this but keep running into very complicated or overzealous ones, and all I want is to turn a list of token types into a parse tree, nothing more, nothing less. Thanks in advance!
The simplest parser generator is yacc (or bison).
Bison is just a hairy yacc (i.e. it has more options).
One of these is to generate a C++ parser object (rather than a C function).
Just add the following to the yacc file:
%skeleton "lalr1.cc"
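With that skeleton, bison emits a C++ parser class (yy::parser by default) instead of a plain yyparse() function. A hedged sketch of the driver side, assuming default naming, no %parse-param, and a generated header called parser.tab.hh:

// Driving a bison-generated C++ parser (%skeleton "lalr1.cc").
// A yylex implementation must still be provided, typically by the scanner.
#include "parser.tab.hh"   // generated by bison; the name is an assumption

int main() {
    yy::parser parser;        // class generated by the lalr1.cc skeleton
    return parser.parse();    // 0 on success, non-zero on a syntax error
}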
The canonical parser generator is called yacc. There's a GNU version of it called bison. These are both C-based tools, so they should integrate nicely with your C++ code. There is a tool for Java called ANTLR which I've heard very good things about (i.e. it's easy to use and powerful). Keep in mind that with yacc or bison you will have to write a grammar in their language. This is most certainly doable, but not always easy. It's important to have a theoretical background in LR(k) parsing so you can understand what it means when it tells you to fix your ambiguous grammar.
Depending on what exactly your requirements are, Boost.Spirit might be an alternative. It's modular, so you should be able to use only components of it as well.