Thread-safe C++ wrapper around a lex/yacc parser - c++

I am trying to write a JSON parser (instead of using one of the freely available ones, because of certain project constraints) and have written a lex+yacc based version with a simple wrapper C++ class. I have redefined the YY_INPUT macro for lex to read from a memory buffer. Now the task is to ensure that the parser is thread-safe, and I am not sure how easy that is. There are two concerns:
Ultimately YY_INPUT is reading from a global object. I could not think of another way of doing this.
I have no idea how many globals the generated lex/yacc code ends up using.
Would be great if folks can share their experience of doing something similar.
Cheers.
PS. We don't use STL/string or any templates for that matter. We use our own variant-based containers. We use lex+yacc rather than flex+bison, on four major Unices.

I don't have much experience working directly with yacc, but I know that bison supports reentrant parsers that are thread-safe. It also looks like lex supports a reentrant lexer as well, and I'd guess that if you put the two together it should work out just fine.
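If the lex/yacc in use offers no reentrant mode, one fallback is to let only one thread at a time into the generated code. The sketch below is a minimal illustration of that idea, not the asker's actual wrapper: `fake_yyparse`, `g_input` and the `JsonParser` class are hypothetical stand-ins for the generated `yyparse()` and the global that a redefined YY_INPUT reads from. It uses `std::mutex` for brevity; a codebase that avoids the standard library could use a pthread mutex the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Stand-ins for what a non-reentrant lex/yacc build would provide.
// g_input / g_input_len / g_input_pos play the role of the global that
// a redefined YY_INPUT reads from (hypothetical names).
static const char* g_input = nullptr;
static std::size_t g_input_len = 0;
static std::size_t g_input_pos = 0;

// Hypothetical placeholder for the generated yyparse(): returns 0 (success)
// when the braces and brackets in the buffer are balanced.
static int fake_yyparse() {
    int depth = 0;
    while (g_input_pos < g_input_len) {
        char c = g_input[g_input_pos++];
        if (c == '{' || c == '[') ++depth;
        if (c == '}' || c == ']') {
            if (--depth < 0) return 1;
        }
    }
    return depth == 0 ? 0 : 1;
}

class JsonParser {
public:
    // Parses [text, text + len). Every call takes the same process-wide
    // lock, so the globals above are never touched by two threads at once.
    bool parse(const char* text, std::size_t len) {
        std::lock_guard<std::mutex> guard(lock());
        g_input = text;
        g_input_len = len;
        g_input_pos = 0;
        int rc = fake_yyparse();
        g_input = nullptr;  // do not leave a dangling pointer behind
        g_input_len = 0;
        return rc == 0;
    }

private:
    // Function-local static: one mutex shared by all JsonParser objects.
    static std::mutex& lock() {
        static std::mutex m;
        return m;
    }
};
```

This makes the wrapper safe to call from several threads but serializes all parsing; it does not make the generated code itself reentrant.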

What are Clojure's AOT limitations?

It looks like exposing a Clojure library API to Java code requires the use of AOT compilation (at least when the exposed API is OO). If I'm wrong about that, I am happy to be corrected. In the past, there have been multiple issues with AOT, so relying on it may have seemed a bit reckless or unstable.
What is the current state of that?
Is it a safe practice?
Would you use that as a means to expose an OO API to Java applications?
You can use AOT to produce bytecode that will pass muster as a Java library. But it's not terribly pleasant, for you or for the Java programmers, who will rightfully ask: Where is the JavaDoc? Why are you using Object instead of generics? And other similarly awkward questions.
Instead, my preferred approach is to not make the Clojure code Java-friendly at all. Expose it as ordinary Clojure vars, and that's it. Then, write some Java code yourself, in the same library, that consumes the Clojure-based API and repackages it in terms of whatever Java constructs you want.
For one example, see my thrift-gen library. Using it from Clojure, you get a function that takes a map as input and produces a sequence; using it from Java, you get a builder pattern for configuration instead of the map, and you get a List<? extends T> as output. No JavaDoc, because I felt the usage documentation in the readme was sufficient, but if I were serious about using this there is a real .java source file to easily add JavaDoc to.

replace c++ with go + swig

I recently asked this question https://softwareengineering.stackexchange.com/questions/129076/go-instead-of-c-c-with-cgo and got some very interesting input. However there's a mistake in my question: I assumed cgo could also be used to access c++ code but that's not possible. Instead you need to use SWIG.
The Go FAQ says "The cgo program provides the mechanism for a “foreign function interface” to allow safe calling of C libraries from Go code. SWIG extends this capability to C++ libraries."
My question:
Is it possible to access high-level C++ frameworks such as Qt with SWIG + Go and get productive? I'd like to use Go as a "scripting language" to utilize C++ libraries.
Have you any experience with go and swig? Are there pitfalls I have to be aware of?
Update/Answer: I've asked this over IRC too and I think the question is solved:
SWIG is a rather clean way of interfacing C++ code from other languages. Sadly, matching the types of C++ to something like Go can be very complex, and in most cases you have to specify the mapping yourself. That means that SWIG is a good way to leverage an existing codebase to reuse already written algorithms. However, mapping a library like Qt to Go will take you ages. Mind you, it's surely possible, but you don't want to do it.
Those of you that came here for GUI programming with Go might want to try go-gtk or the Go version of wxWidgets.
Is it possible? Yes.
Can it be done in a reasonably short period of time? No.
If you go back and look at other projects that have taken large frameworks and tried to put an abstraction layer on it, you'll find most are "incomplete". You can probably make a fairly good start and get some initial wrappers in place, but generally even the work to get the simple cases solved takes time when there is a lot of underlying code to wrap, even with automated tools (which help, but are never a complete solution). And then... you get to the nasty remaining 10% that will take you forever (ok, a really really long time at least). And then think about how it's a changing target in the first place. Qt, for example, is about to release the next major rewrite.
Generally, it's safest to stick to the language that the framework was designed for, though many frameworks have language extensions within the project itself. For example, for Qt you should check out QML, which provides (among many other things) a JavaScript binding to Qt. Sort of. But it might meet your "scripting" requirement.
A relevant update on this issue: it is now possible to interact with C++ using cgo with this CL, which is merged for Go 1.2. It is limited, however, to C-like function calls; classes, methods and other C++ goodies are not supported (yet, I hope).
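To illustrate what "C-like function calls" means for a C++ library, here is a minimal sketch of the usual workaround: hide the class behind an opaque handle and a few `extern "C"` functions. `Counter`, `CounterHandle` and the `counter_*` functions are hypothetical names invented for this example, and the Go side is deliberately omitted.

```cpp
#include <cassert>

// Hypothetical C++ class standing in for a real library type.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    void add(int n) { value_ += n; }
    int value() const { return value_; }

private:
    int value_;
};

// C-callable facade: an opaque handle plus free functions with C linkage.
// Only these names would be visible to a C-level foreign function interface.
extern "C" {
    typedef struct CounterHandle CounterHandle;

    CounterHandle* counter_new(int start) {
        return reinterpret_cast<CounterHandle*>(new Counter(start));
    }

    void counter_add(CounterHandle* h, int n) {
        reinterpret_cast<Counter*>(h)->add(n);
    }

    int counter_value(const CounterHandle* h) {
        return reinterpret_cast<const Counter*>(h)->value();
    }

    void counter_free(CounterHandle* h) {
        delete reinterpret_cast<Counter*>(h);
    }
}
```

Every class and method needs a facade like this, which is part of why wrapping something the size of Qt by hand takes so long. A real facade would also have to keep C++ exceptions from escaping across the C boundary.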

Lisp/Scheme DSEL in C++

I came across the following post on the boost mailing lists (emphasis mine):
hello all,
does anybody know of an existing spirit/lisp implimentation, and is there any interest in developing such a project in open source?
None yet, AFAIK. I'll be writing an example for Spirit2 to complement the tiny-C virtual machine in there. What's equally interesting though is that scheme (or at least a subset of it) can be implemented in pure c++. No parsing, just pure DSEL in C++. Now, imagine a parser that targets this DSEL (through C++) -- a source to source translator. Essentially, your scheme code will be compiled into highly efficient C++.
Has anyone actually done this? I would be very interested in such a DSEL.
I wrote a Lisp-like language called Funky using Spirit in C++. An Open Source version is available at http://funky.vlinder.ca. It wouldn't take too much to turn that into a Lisp-like to C++ translator.
Actually, what it would take is a run-time support library to provide generic closure types and suchlike: if you want to turn the Lisp code into efficient C++, you will basically need C++ classes (functors, etc.) to do the heavy lifting once you get to run-time, so your Lisp-to-C++ translator would need to:
parse the Lisp
create an AST from the Lisp
transform the AST to optimize it, if possible (optimizations in Lisp are different from optimizations in C++, so if you want really fast C++, you have to optimize the Lisp and let your C++ compiler optimize the generated C++)
generate the C++, for which you'd rely on your run-time support library for things like built-in functions, functor types, etc.
If you were to start from Funky, you'd already have the parser and the AST (though Funky doesn't optimize the AST), so you could go from there and create the run-time and generate the C++...
It wouldn't be overly complicated to write one from scratch either: Lisp grammar isn't that difficult, so most of the work would go into the AST and the run-time support.
If I weren't writing an object-oriented DSL right now, I might try my hand at this.
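As a rough illustration of the parse / AST / generate steps listed above (the optimization step is skipped), here is a minimal sketch that is unrelated to Funky's actual code. It turns one S-expression into a C++ expression string. The `rt::` namespace stands for the hypothetical run-time support library; none of its functions exist here, and symbols that are not valid C++ identifiers are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// AST: a node is either an atom (symbol or number) or a list of nodes.
struct Node {
    bool is_atom = false;
    std::string atom;
    std::vector<std::unique_ptr<Node>> items;
};

// Step 1: split the source into "(", ")" and atom tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (c == '(' || c == ')') { out.push_back(std::string(1, c)); ++i; continue; }
        std::size_t start = i;
        while (i < src.size() && src[i] != '(' && src[i] != ')' &&
               !std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;
        }
        out.push_back(src.substr(start, i - start));
    }
    return out;
}

// Step 2: build the AST by recursive descent over the token list.
std::unique_ptr<Node> parse(const std::vector<std::string>& toks, std::size_t& pos) {
    if (pos >= toks.size()) throw std::runtime_error("unexpected end of input");
    std::unique_ptr<Node> node(new Node);
    const std::string& t = toks[pos++];
    if (t == ")") throw std::runtime_error("unexpected )");
    if (t != "(") { node->is_atom = true; node->atom = t; return node; }
    while (pos < toks.size() && toks[pos] != ")") {
        node->items.push_back(parse(toks, pos));
    }
    if (pos >= toks.size()) throw std::runtime_error("missing )");
    ++pos;  // consume ")"
    return node;
}

// Step 4: emit a C++ expression. Arithmetic operators become infix
// expressions; anything else becomes a call into the hypothetical
// run-time library, e.g. (car xs) -> rt::car(xs).
std::string emit(const Node& n) {
    if (n.is_atom) return n.atom;
    if (n.items.empty() || !n.items[0]->is_atom) {
        throw std::runtime_error("expected (operator args...)");
    }
    const std::string& op = n.items[0]->atom;
    const bool infix = (op == "+" || op == "-" || op == "*" || op == "/");
    if (infix && n.items.size() < 2) throw std::runtime_error("operator needs operands");
    std::string out = infix ? std::string("(") : "rt::" + op + "(";
    for (std::size_t i = 1; i < n.items.size(); ++i) {
        if (i > 1) out += infix ? " " + op + " " : std::string(", ");
        out += emit(*n.items[i]);
    }
    return out + ")";
}

// Glue: source text in, C++ expression text out.
std::string translate(const std::string& src) {
    std::vector<std::string> toks = tokenize(src);
    std::size_t pos = 0;
    std::unique_ptr<Node> ast = parse(toks, pos);
    return emit(*ast);
}
```

For example, `translate("(+ 1 (* x 2))")` yields `(1 + (x * 2))`. Most of the real work, as the answer says, would be in the run-time library that the `rt::` calls assume.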
scheme to (readable) c++
http://www.suri.cs.okayama-u.ac.jp/servlets/APPLICATION.rkt
How about this
Not sure if this is what you want, but:
http://howtowriteaprogram.blogspot.com/2010/11/lisp-interpreter-in-90-lines-of-c.html
It looks like a start, at least.

Are you aware of any lexical analyzer or lexer in Qt?

Are you aware of any lexical analyzer or lexer in Qt? I need it for parsing text files.
It is kinda interesting how Qt has evolved into an all-encompassing framework that makes the programmer that uses it believe that anything that is useful has to start with the letter Q. Very dot-netty. Qt is just a class library that runs on top of the language; it doesn't preclude using everyday libraries that get a job done. Especially when that's a library that has little to do with presenting a user interface, the job that Qt does so well.
There are many libraries that get lexical analysis and parsing done well. That starts with Lex and Yacc, then Flex and Bison, et cetera. You only have to Qt-enable them for error messages, and they readily support that.
QXmlReader allows you to define a lexical handler; for plain text you can use QRegExp. If you want a full-blown lexical analyzer, take a look at Quex (not Qt-specific, but it generates C++ code based on your input).
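For simple plain-text input, a regular-expression tokenizer can be enough. The following is a minimal sketch using `std::regex` as a stand-in for QRegExp so that it has no Qt dependency; the `Token` struct, the `lex` function and the three token kinds are assumptions made up for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

struct Token {
    std::string kind;  // "number", "ident", or "punct"
    std::string text;
};

// Splits the input into tokens. The pattern has one alternative per
// token kind; whitespace is matched too, but then skipped.
std::vector<Token> lex(const std::string& input) {
    static const std::regex pattern(R"((\s+)|(\d+)|([A-Za-z_]\w*)|(.))");
    std::vector<Token> tokens;
    auto it = std::sregex_iterator(input.begin(), input.end(), pattern);
    for (; it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) continue;  // whitespace
        if (m[2].matched) {
            tokens.push_back({"number", m.str(2)});
        } else if (m[3].matched) {
            tokens.push_back({"ident", m.str(3)});
        } else {
            tokens.push_back({"punct", m.str(4)});
        }
    }
    return tokens;
}
```

Anything beyond this (string literals, comments, error positions) is where a generated lexer starts to pay off.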
If you can use it... (it's quite complex if you ask me!) there is the Spirit library from boost.
This can be used "dynamically" in the sense that it does not generate other files that you have to then compile to run your parser.
http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/lex.html
But it's complex (from my point of view), since even the #includes don't always work right (perhaps if you include them in the wrong order, or perhaps the documentation doesn't match the tutorial; I'm not too sure). Yet I see many people using it!

Are there any free tools to help with automatic code generation?

A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in Visual Studio. It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things so I can decide if I should continue to use my own or switch to a better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
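To make the idea concrete, here is a minimal sketch of the kind of reader/writer pair such a prebuild step might emit for one struct. It is written by hand for illustration and is not the output of any of the tools named above; `Player`, `write_Player` and `read_Player` are hypothetical names, and the one-pair-per-line text format is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hand-written input: the kind of struct the prebuild step would scan.
struct Player {
    std::string name;
    int level;
    double health;
};

// --- Below: what a generator might emit for Player (hypothetical). ---

// Writes one "field value" pair per line.
void write_Player(std::ostream& out, const Player& p) {
    out << "name " << p.name << '\n'
        << "level " << p.level << '\n'
        << "health " << p.health << '\n';
}

// Reads the same format back. Returns false on a missing or misnamed
// field. Values containing whitespace are not supported in this sketch.
bool read_Player(std::istream& in, Player& p) {
    std::string key;
    if (!(in >> key >> p.name) || key != "name") return false;
    if (!(in >> key >> p.level) || key != "level") return false;
    if (!(in >> key >> p.health) || key != "health") return false;
    return true;
}
```

A library such as Boost.Serialization replaces this generated boilerplate with code you write once per type, which is the trade-off to weigh against keeping a custom generator.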
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser generator, clumsy and difficult to learn but very powerful. Nowadays, you'd use GNU Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost::Serialize or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies, you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've used the NVelocity engine professionally, combined with C#, as a step before coding, with very good results.