Good ways of parsing in C++ - c++

I'm writing a program in C++ that will use the same input files as an existing Prolog program.
The files look like this:
expr1(t,[f,g]).
expr1(q,[]).
expr1(r,[e]).
expr2(a).
expr2(b).
expr2(e).
expr2(a,r).
expr2(b,d).
expr2(e,z).
What are some ways of parsing such files? I've read about Boost Spirit; does anyone have thoughts on it? Or is there a way of doing it using the standard C/C++ libraries? Ideas would be great.
Thank you.

That looks like a perfect job for a hand-written recursive descent parser. No extra dependencies, easy to write, and straightforward for future maintainers.
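To make the recursive descent suggestion concrete, here is a minimal sketch for the sample facts in the question. The grammar (a fact is an atom, a parenthesised argument list, and a closing dot; an argument is an atom or a bracketed list) is inferred from the nine sample lines only, and the names Term, Fact and FactParser are made up for illustration. Real Prolog syntax (quoted atoms, variables, comments, operators) is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A term is either an atom (e.g. t) or a list of terms (e.g. [f,g]).
struct Term {
    bool is_list = false;
    std::string atom;         // used when is_list is false
    std::vector<Term> items;  // used when is_list is true
};

// One fact such as expr1(t,[f,g]).
struct Fact {
    std::string name;
    std::vector<Term> args;
};

// Hypothetical parser class: one member function per grammar rule.
class FactParser {
public:
    explicit FactParser(std::string text) : src_(std::move(text)) {}

    // facts := fact*
    std::vector<Fact> parse_all() {
        std::vector<Fact> facts;
        skip_ws();
        while (pos_ < src_.size()) {
            facts.push_back(parse_fact());
            skip_ws();
        }
        return facts;
    }

private:
    // fact := atom '(' term (',' term)* ')' '.'
    Fact parse_fact() {
        Fact f;
        f.name = parse_atom();
        expect('(');
        f.args.push_back(parse_term());
        while (accept(',')) f.args.push_back(parse_term());
        expect(')');
        expect('.');
        return f;
    }

    // term := atom | '[' [ term (',' term)* ] ']'
    Term parse_term() {
        Term t;
        if (accept('[')) {
            t.is_list = true;
            if (!accept(']')) {
                t.items.push_back(parse_term());
                while (accept(',')) t.items.push_back(parse_term());
                expect(']');
            }
        } else {
            t.atom = parse_atom();
        }
        return t;
    }

    // atom := one or more letters, digits or underscores
    std::string parse_atom() {
        skip_ws();
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '_'))
            ++pos_;
        if (pos_ == start) fail("expected an atom");
        return src_.substr(start, pos_ - start);
    }

    // Consume c if it is the next non-space character.
    bool accept(char c) {
        skip_ws();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void expect(char c) {
        if (!accept(c)) fail(std::string("expected '") + c + "'");
    }

    void skip_ws() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }

    [[noreturn]] void fail(const std::string& msg) const {
        throw std::runtime_error(msg + " at offset " + std::to_string(pos_));
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Reading the whole file into a std::string and calling FactParser(text).parse_all() yields one Fact per line of the sample; malformed input raises std::runtime_error with the byte offset where parsing stopped.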

What is wrong with Flex and Bison? They have the benefit that the generated code is independent of libraries which you may or may not have. They are used for things as simple as parsing config files and as complex as a JavaScript parser for WebKit. You might even find a Prolog grammar that you can use.

I would definitely not suggest Boost Spirit unless the task is really a lot more complicated than it looks. There is nothing wrong with Boost Spirit: it is really powerful and would do the job just fine, but it also requires a lot of learning and might massively increase compilation time.
Although I agree with Jörgen that a hand-written recursive descent parser would be a good option, it doesn't look like you are going to need a context-free parser, so I think a regular expression parser might be enough. If that is the case, I suggest you take a look at the regex library introduced in the new C++0x standard (since published as C++11).
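A minimal sketch of the regular-expression route using std::regex, assuming every line has one of the three shapes in the question: name(atom), name(atom,atom) or name(atom,[atom,...]). The FlatFact struct and parse_line function are hypothetical names for illustration. Nested lists are deliberately rejected, which is exactly where a regex stops being enough.

```cpp
#include <regex>
#include <string>
#include <vector>

// Flat view of one line; sufficient only for the three sample shapes.
struct FlatFact {
    std::string name;
    std::string first;              // first argument, an atom in every sample line
    bool has_list = false;          // true when the second argument is a [..] list
    std::vector<std::string> rest;  // list elements, or the single second atom
};

// Returns true and fills `out` when `line` matches one of the assumed shapes.
bool parse_line(const std::string& line, FlatFact& out) {
    // Groups: 1 = name, 2 = first arg, 3 = list body, 4 = second atom.
    static const std::regex fact_re(
        R"(\s*(\w+)\(\s*(\w+)\s*(?:,\s*(?:\[([\w\s,]*)\]|(\w+))\s*)?\)\.\s*)");
    static const std::regex atom_re(R"(\w+)");

    std::smatch m;
    if (!std::regex_match(line, m, fact_re)) return false;

    out = FlatFact{};
    out.name = m[1].str();
    out.first = m[2].str();
    if (m[3].matched) {
        out.has_list = true;
        const std::string inner = m[3].str();
        // Pull each atom out of the comma-separated list body.
        for (std::sregex_iterator it(inner.begin(), inner.end(), atom_re), end;
             it != end; ++it)
            out.rest.push_back(it->str());
    } else if (m[4].matched) {
        out.rest.push_back(m[4].str());
    }
    return true;
}
```

Calling parse_line on each line read with std::getline is enough for the sample file. A line such as expr2(a,[b,[c]]). returns false, so nested terms would need the recursive descent approach instead.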

Related

Does anyone know C++ XML parser/writer similar to .Net's XML Document?

I'm writing a native C++ project and I need a simple XML parser/writer. I already know XmlDocument in C#, so something similar could be quite good, but if there isn't, does anyone know a quick-to-use XML parser/writer?
I'm trying to keep my "learning the library" time to a minimum.
Thanks!
There are quite a few you can consider:
Xerces
TinyXML
libxml++
Expat Xml
XmlLite
I've used TinyXML, and it's worked reasonably well for what I've needed. My needs weren't terribly demanding though. At least when I used it, it didn't deal with DTD/XSD at all, so if you need to handle those it's probably not an option.
I ended up using RapidXML (http://rapidxml.sourceforge.net/).
For me, it's intuitive, and it was really easy to learn.
Nobody recommended it, so I added my own answer :-\ ...

Thread-safe C++ wrapper around a lex/yacc parser

I am trying to write a JSON parser (instead of using one of the freely available ones, because of certain project constraints) and have written a lex+yacc based version with a simple wrapper C++ class. I have redefined the YY_INPUT macro for lex to read from a memory buffer. Now I need to ensure that the parser is thread-safe, and I am not sure how easy that is. There are two concerns:
Ultimately YY_INPUT is reading from a global object. I could not think of another way of doing this.
I have no idea how many globals the generated lex/yacc code ends up using.
Would be great if folks can share their experience of doing something similar.
Cheers.
PS. We don't use STL/string or any templates for that matter. We use our own variant-based containers. We use lex+yacc rather than flex+bison, on four major Unices.
I don't have much experience working directly with yacc, but I know that bison supports reentrant parsers that are thread-safe. It also looks like lex supports a reentrant lexer as well, and I'd guess that if you put the two together it should work out just fine.

Are you aware of any lexical analyzer or lexer in Qt?

Are you aware of any lexical analyzer or lexer in Qt? I need it for parsing text files.
It is kinda interesting how Qt has evolved into an all-encompassing framework that makes the programmer who uses it believe that anything useful has to start with the letter Q. Very dot-netty. Qt is just a class library that runs on top of the language; it doesn't preclude using everyday libraries that get a job done. Especially when that's a library that has little to do with presenting a user interface, the job that Qt does so well.
There are many libraries that get lexical analysis and parsing done well. Start with Lex and Yacc, then Flex and Bison, and so on. The only part you need to Qt-enable is error reporting, which they readily support.
QXmlReader allows you to define a lexical handler; for plain text you can use QRegExp. If you want a full-blown lexical analyzer, take a look at Quex (not Qt specific, but it generates C++ code based on your input).
If you can use it... (it's quite complex if you ask me!) there is the Spirit library from boost.
This can be used "dynamically" in the sense that it does not generate other files that you have to then compile to run your parser.
http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/lex.html
But it's complex (from my point of view): even the #include directives don't always work right if you include them in the wrong order, or perhaps the documentation doesn't match the tutorial; I'm not too sure. Yet I see many people using it!

How to turn type-labeled tokens into a parse-tree?

So I'm writing a programming language in C++. I've written pretty much all of it except for one little bit where I need to turn my tokens into a parse tree.
The tokens are already type labeled and ready to go, but I don't want to go through the effort of making my own parse tree generator. I've been looking around for apps to do this but always run into very complicated or overzealous apps, and all I want is to turn a list of token types into a parse tree, nothing more, nothing less. Thanks in advance!
The simplest parser generator is yacc (or bison).
Bison is just a hairy yacc (i.e. it has more options).
One of these is to generate a C++ parser object (rather than a C function).
Just add the following to the yacc file:
%skeleton "lalr1.cc"
The canonical parser generator is called yacc. There's a GNU version of it called bison. These are both C based tools, so they should integrate nicely with your C++ code. There is a tool for Java called ANTLR which I've heard very good things about (i.e. it's easy to use and powerful). Keep in mind that with yacc or bison you will have to write a grammar in their language. This is most certainly doable, but not always easy. It's important to have a theoretical background in LR(k) parsing so you can understand what it means when it tells you to fix your ambiguous grammar.
Depending on what exactly your requirements are, Boost.Spirit might be an alternative. It's modular, so you should be able to use only components of it as well.
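For comparison with the generator-based answers, this is roughly what the token-list-to-tree step looks like when written by hand. It is only a sketch under invented assumptions: the token types, the tiny arithmetic grammar, and the names TokType, Token, Node, TreeBuilder and show are all hypothetical, since the question does not show its language. It needs C++14 for std::make_unique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token labels; a real language would have many more.
enum class TokType { Number, Plus, Star, LParen, RParen, End };
struct Token { TokType type; std::string text; };

// Parse tree node: leaves hold a Number, inner nodes hold an operator.
struct Node {
    Token tok;
    std::vector<std::unique_ptr<Node>> kids;
};

class TreeBuilder {
public:
    explicit TreeBuilder(std::vector<Token> toks) : toks_(std::move(toks)) {
        toks_.push_back({TokType::End, ""});  // sentinel so peek() is always valid
    }

    std::unique_ptr<Node> parse() {
        auto root = expr();
        if (peek() != TokType::End) throw std::runtime_error("trailing tokens");
        return root;
    }

private:
    TokType peek() const { assert(pos_ < toks_.size()); return toks_[pos_].type; }
    Token take() { return toks_[pos_++]; }

    static std::unique_ptr<Node> join(Token op, std::unique_ptr<Node> l,
                                      std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->tok = std::move(op);
        n->kids.push_back(std::move(l));
        n->kids.push_back(std::move(r));
        return n;
    }

    // expr := term ('+' term)*
    std::unique_ptr<Node> expr() {
        auto left = term();
        while (peek() == TokType::Plus) {
            Token op = take();
            auto right = term();
            left = join(std::move(op), std::move(left), std::move(right));
        }
        return left;
    }

    // term := factor ('*' factor)*
    std::unique_ptr<Node> term() {
        auto left = factor();
        while (peek() == TokType::Star) {
            Token op = take();
            auto right = factor();
            left = join(std::move(op), std::move(left), std::move(right));
        }
        return left;
    }

    // factor := Number | '(' expr ')'
    std::unique_ptr<Node> factor() {
        if (peek() == TokType::Number) {
            auto n = std::make_unique<Node>();
            n->tok = take();
            return n;
        }
        if (peek() == TokType::LParen) {
            take();
            auto n = expr();
            if (peek() != TokType::RParen) throw std::runtime_error("expected ')'");
            take();
            return n;
        }
        throw std::runtime_error("unexpected token");
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Render a tree with explicit parentheses, mainly for debugging.
std::string show(const Node& n) {
    if (n.kids.empty()) return n.tok.text;
    return "(" + show(*n.kids[0]) + " " + n.tok.text + " " + show(*n.kids[1]) + ")";
}
```

Each grammar rule becomes one member function, so operator precedence falls out of which function calls which. A yacc or bison grammar expresses the same rules declaratively and scales better once the language grows.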

Are there any free tools to help with automatic code generation?

A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in Visual Studio. It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things so I can decide if I should continue to use my own or switch to a better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser generator, clumsy and difficult to learn but very powerful. Nowadays, you'd use GNU Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost::Serialize or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies, you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've used the NVelocity engine professionally, combined with C#, as a step before coding, with very good results.