Related
I've started long ago to work on a dynamic graph visualizer, editor and algorithm testing platform (graphs with nodes and arcs, not the other kinds).
For the algorithm testing platform i need to let the user write a script or call a script from a file, which will interact with the graph currently loaded. The visualizer would do things like light up nodes while they're being visited by the script algorithm, adding some artificial delay, in order to visualize the algorithm navigating and doing stuff.
Scripts would also be secondly used to add third party features that i could either make available as pre-existing scripts in the program folder OR just integrate inside the program in c++ once they're tested and working.
All my searches for an interpreter to embed in my program sent me to lua;
then i started handwriting my own recursive descent parser for my own C-like syntax scripting language (which i planned to use a subset of C++ grammar so that any code written in my scripting language can be copy-pasted in any C++ code.
It was an interesting crazy idea which i don't regret at all, I have scopes, functions, cycles, gotos, typesafe variables, expressions.
But now that i'm approaching the addition of classes, class methods, inheritance (some default classes would be necessary to interface scripts to the program), i realized it's going to take A LOT of time and effort. A bit too much for a personal project of an ungraduated student with exams to study for… but still i whish to complete this project.
The self-imposed requirement of the scripts being 100% compatible with C++ was all but necessary, it would have been just a little nice extra thing, which i can do without.
Now the question is, is there an alternative to lua with a c-like syntax that supports all i've already done plus classes and inheritance? (being able to add custom "classes" that interface scripts to the program is mandatory)
(i can't assume the user to have a full c++ compiler installed so i cant just compile their "script" at runtime as a dll to load and call it, although i whish i could)
Just-in-time compilation of C++
Parsing C++ is hard. Heck, parsing C is hard. It's difficult to get it right, and there are a lot of edge cases. Thankfully, there are a few libraries out there which can take code and even compile it for you.
libclang
libclang provides a lot of facilities for parsing c++. It's a good, clean library, and it'll parse anything the clang compiler itself will parse. This article here is a good starter
libclang provides a JIT compilation tool that allows you to write and compile C++ at runtime. See this blog post here for a overview of what it does and how to use it. It's very general, very powerful, and user-written code should be fast.
GCC also provides a library called libgccjit for just-in-time compilation during the runtime of a program. libgccjit is a C library, but there's also a C++ wrapper provided by the library maintainers. It can compile abstract syntax trees and link them at runtime, although it's still in Alpha mode.
cppast
If you don't want to use libclang, there's also a library under development called cppast, which is a C++ parser which will give you an abstract syntax tree representation of your c++ code. Unfortunately, it won't parse function bodies.
Other tools
If anyone knows any other libraries for compiling or interpreting C++ at runtime, I encourage them to update this post, or comment them so I can update it!
Here is something that lets you embed a C-like scripting language in your application (and a bunch of other cool things):
http://chaiscript.com/
There is lots of documentation:
https://codedocs.xyz/ChaiScript/ChaiScript/
I need a good, stable and, maybe, easy to use C++ parser library with C/C++ interface (C is preferred).
I hear that cint is good c++ interpreter. Can I use it (or some part of it) for this purpose?
Any suggestions?
See: http://clang.llvm.org/
It has both a C++ and a C interface (libclang).
C++ parsing is famously hard. AFAIK there are only three parsers that are acceptable by todays standards: EDG (widely used as a frontend in popular C++ compilers), GCC's and Microsoft's. And apparently, Microsoft has started using EDG's parser in VS2010, for Intellisense.
When you're looking at the free options, you're pretty much stuck at GCC. It can produce XML, though, so the easy part is there. (Easy by C++ parsing standards, that is)
Clang is the most up-to-date and mature option, with a decent C++ API (but no plain C). Elsa is a bit out of date and unmaintained, but still a usable choice. Both could be used as libraries as well as standalone XML frontends.
If you want to parse C or C++ code, there are some options:
http://bellard.org/tcc/
http://students.ceid.upatras.gr/~sxanth/ncc/
If you want to create a parser using C/C++, you can try:
http://boost-spirit.com/home/
http://dinosaur.compilertools.net/ Lex and Yacc
http://www.codeguru.com/csharp/.net/net_general/patterns/article.php/c12805 Flex and Bison
Our C++ Front End is able to parse a variety of C++ dialects (ANSI, GCC, MSVS), automatically builds ASTs whose nodes are marked with precise source positions and are decorated with any nearby comment text, and builds a full symbol table. (EDIT Jan 2013: the C++ front end has been able to handle C++11 for quite awhile now).
The C++ front end is built on top of our DMS Software Reengineering Toolkit, generalized compiler technology for program analysis and transformation, designed to support custom tool building. The C++ front end includes a preprocessor, in which the preprocessor directives can be expanded or not collectively or individually as appropriate for the task. It also includes full symbol construction with all the nasty Koenig lookup stuff.
DMS accepts explicit language definitions (that's how it understands C++; there are also fron ends for C, C#, Java, COBOL, and variety of other languages). DMS provides general parsing, symbol table building, flow analysis machinery, procedural APIs for tree navigation/inspection/modification, source-to-source transformation, and AST-to-source text regeneration including the original comments, number radices, etc. All of these capabilities are available for use by the C++ Front End.
DMS is also designed to handle the scale required for serious tasks. Often you need not just one compilation unit (which is what GCC will give you at best) but access to an entire set. DMS has been used to analyze/transform thousands of C++ compilation units, and literally tens of thousands of C compilation units (on a 25 million line application).
"Easy to use library" is an oxymoron when it comes to program manipulation tools. The langauges themselves are complex (C++ being one of the most difficult and getting worse with C++0X) and that induces complexity in the nature of the questions you can ask and what the answers look like (e.g. "are there any template instantions that can modify local variable X in method Y in class C in any namespace N?"). The questions themselves are hard.
What you want is a library with the necessary complexity to let you carry off your task. DMS has been under continuous development for the last 15 years, to provide that necessary complexity. If you want to do serious program processing, I claim you will need that information.
As proof, DMS has been used to carry out massive automated reengineering of C++-based mission avionics software for Boeing. I don't believe there are any other tools that can do this. (Clang looks to be trying, but only for C++. YMMV).
I don't know for cint, but I heard people use gcc-xml for this.
I have been looking for a good stand-alone library too, but haven't found any.
If you're feeling brave the links in the answer to "is there a yacc-able C++ grammar?" might be helpful. Gcc-xml and clang have already been suggested and Swig also has an XML output which depending on what you're trying to achieve might be relevant.
I did not try it, but I think that best choice will be getting modules for parsing from some popular open source compiler like gcc for C++;
Maybe you'll find something interesting here http://www.nobugs.org/developer/parsingcpp/
I'm trying to create an app to search my company's ColdFusion codebase. I'd like to be able to do intelligent searches, for example: find where a function is defined (and not hit everywhere the function is called). In order to do this, I'd need to parse the ColdFusion code to identify things like function declarations, function calls, database queries, etc.
I've looked into using lex and yacc, but I've never used them before and the learning curve seems very steep. I'm hoping there is something already out there that I could use. My other option is a mess of difficult-to-maintain regex-spaghetti code, which I want to avoid.
I used the source to CFEclipse, since it is open source and has a parser. Not sure about the legality of this if we were selling/redistributing it, but we're only using it for an internal tool.
Writing parsers for real langauges is usually difficult because they contain constructs that Lex and Yacc often don't handle well, e.g., the langauge isn't LALR(1). ColdFusion might be easier than some because of its XML-like style.
If you want to build a sophisticated parser quickly, you might consider using our
DMS Software Reengineering Toolkit which has GLR parsing support.
If you want to avoid writing your own or hacking all those Regexps, you could consider our Source Code Search Engine. It has language-sensitive parsers and can search across very large source code bases very quickly. One of its "language sensitive" parsers is AdhocText, which is designed to handle "generic" programming languages such as those you might find in a random programming book; it even understands XML-like tags such as ColdFusion has. You can download a evaluation version from the link provided to try it.
EDIT 4/3/2010: A recent feature added to the SCSE is the ability to tag definitions and uses separately. That would address the OP's desire to find the function definition rather than all the calls.
None existed. Since ColdFusion is more like scripts than code, I'd imagine it'll be hard to write a parser for it.
ColdFusion Builder can parse CFM/CFC to an outline in Eclipse. Maybe you can do some research on whether a CF Builder plugin can do what you want to do.
A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in visual studio. It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things so I can decide if I should continue to use my own or switch to a some better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser that was used to generate parser output, clumsy and difficult to learn but very powerful. Nowadays, you'd use Gnu Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost::Serialize or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies you you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've already used professionally the Nvelocity engine combined with C# as a prevoius step to coding, with very good results.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
What are some good tools for getting a quick start for parsing and analyzing C/C++ code?
In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.
Here's what I've found so far, but haven't looked at them in detail (thoughts?):
CScope - Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
GCC - Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but hasn't been updated since GCC 4.1 (2006).
PUMA - The PUre MAnipulator. (from the page: "The intention of this project is to
provide a library of classes for the analysis and manipulation of C/C++ sources. For this
purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++
sources."). This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
Any others?
I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.
Thanks!
-Matt
(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42", but remain attached to the name "foo". This, unfortunately, excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree)
Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:
Outstandingly complicated grammar
"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
You can look at clang that uses llvm for parsing.
Support C++ fully now link
The ANTLR parser generator has a grammar for C/C++ as well as the preprocessor. I've never used it so I can't say how complete its parsing of C++ is going to be. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.
Depending on your problem GCCXML might be your answer.
Basically it parses the source using GCC and then gives you easily digestible XML of parse tree.
With GCCXML you are done once and for all.
pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST backend, so it's being used as a basis for any kind of language processing you might need.
Doesn't support C++, though. Granted, it's much harder than C.
Update (2012): at this time the answer, without any doubt, would be Clang - it's modular, supports the full C++ (with many C++-11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (i.e. for Python).
Have a look at how doxygen works, full source code is available and it's flex-based.
A misleading candidate is GOLD which is a free Windows-based parser toolkit explicitly for creating translators. Their list of supported languages refers to the languages in which one can implement parsers, not the list of supported parse grammars.
They only have grammars for C and C#, no C++.
Parsing C++ is a very complex challenge.
There's the Boost/Spirit framework, and a couple of years ago they did play with the idea of implementing a C++ parser, but it's far from complete.
Fully and properly parsing ISO C++ is far from trivial, and there were in fact many related efforts. But it is an inherently complex job that isn't easily accomplished, without rewriting a full compiler frontend understanding all of C++ and the preprocessor. A pre-processor implementation called "wave" is available from the Spirit folks.
That said, you might want to have a look at pork/oink (elsa-based), which is a C++ parser toolkit specifically meant to be used for source code transformation purposes, it is being used by the Mozilla project to do large-scale static source code analysis and automated code rewriting, the most interesting part is that it not only supports most of C++, but also the preprocessor itself!
On the other hand there's indeed one single proprietary solution available: the EDG frontend, which can be used for pretty much all C++ related efforts.
Personally, I would check out the elsa-based pork/oink suite which is used at Mozilla, apart from that, the FSF has now approved work on gcc plugins using the runtime library license, thus I'd assume that things are going to change rapidly, once people can easily leverage the gcc-based C++ parser for such purposes using binary plugins.
So, in a nutshell: if you the bucks: EDG, if you need something free/open source now: else/oink are fairly promising, if you have some time, you might want to use gcc for your project.
Another option just for C code is cscout.
The grammar for C++ is sort of notoriously hairy. There's a good thread at Lambda about it, but the gist is that C++ grammar can require arbitrarily much lookahead.
For the kind of thing I imagine you might be doing, I'd think about hacking either Gnu CC, or Splint. Gnu CC in particular does separate out the language generation part pretty thoroughly, so you might be best off building a new g++ backend.
Actually, PUMA and AspectC++ are still both actively maintained and updated. I was looking into using AspectC++ and was wondering about the lack of updates myself. I e-mailed the author who said that both AspectC++ and PUMA are still being developed. You can get to source code through SVN https://svn.aspectc.org/repos/ or you can get regular binary builds at http://akut.aspectc.org. As with a lot of excellent c++ projects these days, the author doesn't have time to keep up with web page maintenance. Makes sense if you've got a full time job and a life.
how about something easier to comprehend like tiny-C or Small C
Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There's a module that prints out C++, so that may be a good starting point for your toy project.
See our C++ Front End
for a full-featured C++ parser: builds ASTs, symbol tables, does name
and type resolution. You can even parse and retain the preprocessor
directives. The C++ front end is built on top of our DMS Software Reengineering
Toolkit, which allows you to use that information to carry out arbitrary
source code changes using source-to-source transformations.
DMS is the ideal engine for implementing such a translator.
Having said that, I don't see much point in your imagined task; I don't
see much value in trying to replace C++, and you'll find building
a complete translator an enormous amount of work, especially if your
target is a "toy" language. And there is likely little point in
parsing C++ using a robust parser, if its only purpose is to produce
an isomorphic version of C++ that is easier to parse (wait, we postulated
a robust C++ already!).
EDIT May 2012: DMS's C++ front end now handles GCC3/GCC4/C++11,Microsoft VisualC 2005/2010. Robustly.
EDIT Feb 2015: Now handles C++14 in GCC and MS dialects.
EDIT August 2015: Now parses and captures both the code and the preprocessor directives in a unified tree.
EDIT May 2020: Has been doing C++17 for the past few years. C++20 in process.
A while back I attempted to write a tool that will automatically generate unit tests for c files.
For preprosessing I put the files thru GCC. The output is ugly but you can easily trace where in the original code from the preprocessed file. But for your needs you might need somthing else.
I used Metre as the base for a C parser. It is open source and uses lex and yacc. This made it easy to get up and running in a short time without fully understanding lex & yacc.
I also wrote a C app since the lex & yacc solution could not help me trace functionality across functions and parse the structure of the entire function in one pass. It became unmaintainable in a short time and was abandoned.
What about using a tool like GNU's CFlow, that can analyse the code and produce charts of call-graphs, here's what the opengroup(man page) has to say about cflow. The GNU version of cflow comes with source, and open source also ...
Hope this helps,
Best regards,
Tom.