C++ Code Analyzer for displaying Class objects in an App - c++

I want to work on an app that will open up a Visual Studio Project and display all of the classes in the project. It will only read the header files to find classes.
It's taking me forever to parse each data member and each method for display properly.
So I was wondering if there is some sort of API or library that I can use to parse all of the details of a C++ Header file so I can display them.
EDIT:
This is what my app currently looks like. I currently have issues getting User Defined types, which is why you see several unnamed Int32 types.
App Preview

Everybody hopes parsing source code is easy, even C++. It is not.
If you want to accurately parse C++ (header) files, you need a full C++ parser. In fact, parsing header files, especially those from the vendor (e.g., Microsoft's and even GNU's) is particularly nasty because they tend to contain undocumented constructs specific to the compiler.
You have only 4 good choices here:
The GNU compiler. It obviously can read GCC header files. I doubt it can read MS header files because of the vendor-specific extenstions. GCC really, really, wants to be a compiler and will resist your attempts to bend it to other tasks. Melt is a GCC extension that tries to make this easier; I've looked at it, and it doesn't seem that much better, but I'm biased.
Clang. It has a full C++ parser, specializing in GCC-style source files. I don't know what it can do about MS specific constructs let alone MS headers. Clang is at least organized to let you use it for custom tasks. (Apparantly VS2015 includes a copy of Clang to support Intellisense, but you can't get at the information it collects).
EDG. This is a commercial front end. It has a full parser, and is designed to let you build tools around it. I don't know what it does about MS or GNU headers. AFAIK, it doesn't provide anything other than the front end. (That's a lot).
(Our) commercial DMS Software Reengineering Toolkit with its C++ front end. (I obviously know a lot about this). It has a full C++14 parser, and handles both GCC and MS header files. Our front ends are the only ones that make any attempt to preserve preprocessor directives, if that matters to you. DMS is designed to let you build tools around it. DMS provides lots of support for pattern matching and code transformation above and beyond "just parsing". After parsing, information about each class is available in C++'s symbol table; it would be pretty easy to enumerate them and their members, and their relationships to other classes.
No matter what C++ parsing technology you use, "its very complex", don't expect that dealing with C++ is going to be easy. And, expect a high learning curve to understand any of the above frameworks. If you make the investment, and follow through with building a real tool, you'll learn a lot and be ready to build a next, more sophisticated tool with a lot less effort.
If you don't care about accuracy, you could scan files using Perl and regexes to hunt for class declarations. This will probably lead to a useless tool.

Related

Is there a C-like syntax scripting language interpreter for C++?

I've started long ago to work on a dynamic graph visualizer, editor and algorithm testing platform (graphs with nodes and arcs, not the other kinds).
For the algorithm testing platform i need to let the user write a script or call a script from a file, which will interact with the graph currently loaded. The visualizer would do things like light up nodes while they're being visited by the script algorithm, adding some artificial delay, in order to visualize the algorithm navigating and doing stuff.
Scripts would also be secondly used to add third party features that i could either make available as pre-existing scripts in the program folder OR just integrate inside the program in c++ once they're tested and working.
All my searches for an interpreter to embed in my program sent me to lua;
then i started handwriting my own recursive descent parser for my own C-like syntax scripting language (which i planned to use a subset of C++ grammar so that any code written in my scripting language can be copy-pasted in any C++ code.
It was an interesting crazy idea which i don't regret at all, I have scopes, functions, cycles, gotos, typesafe variables, expressions.
But now that i'm approaching the addition of classes, class methods, inheritance (some default classes would be necessary to interface scripts to the program), i realized it's going to take A LOT of time and effort. A bit too much for a personal project of an ungraduated student with exams to study for… but still i whish to complete this project.
The self-imposed requirement of the scripts being 100% compatible with C++ was all but necessary, it would have been just a little nice extra thing, which i can do without.
Now the question is, is there an alternative to lua with a c-like syntax that supports all i've already done plus classes and inheritance? (being able to add custom "classes" that interface scripts to the program is mandatory)
(i can't assume the user to have a full c++ compiler installed so i cant just compile their "script" at runtime as a dll to load and call it, although i whish i could)
Just-in-time compilation of C++
Parsing C++ is hard. Heck, parsing C is hard. It's difficult to get it right, and there are a lot of edge cases. Thankfully, there are a few libraries out there which can take code and even compile it for you.
libclang
libclang provides a lot of facilities for parsing c++. It's a good, clean library, and it'll parse anything the clang compiler itself will parse. This article here is a good starter
libclang provides a JIT compilation tool that allows you to write and compile C++ at runtime. See this blog post here for a overview of what it does and how to use it. It's very general, very powerful, and user-written code should be fast.
GCC also provides a library called libgccjit for just-in-time compilation during the runtime of a program. libgccjit is a C library, but there's also a C++ wrapper provided by the library maintainers. It can compile abstract syntax trees and link them at runtime, although it's still in Alpha mode.
cppast
If you don't want to use libclang, there's also a library under development called cppast, which is a C++ parser which will give you an abstract syntax tree representation of your c++ code. Unfortunately, it won't parse function bodies.
Other tools
If anyone knows any other libraries for compiling or interpreting C++ at runtime, I encourage them to update this post, or comment them so I can update it!
Here is something that lets you embed a C-like scripting language in your application (and a bunch of other cool things):
http://chaiscript.com/
There is lots of documentation:
https://codedocs.xyz/ChaiScript/ChaiScript/

Current state of the art for c++ parsers?

I understand that it's a very hard thing to do, what with #ifdef, #define, and templates, but what is the state of the art of c++ parsers (be it open source, or proprietary?).
I mean, for a university project I'm thinking of creating a tool for analysing c++ code bases, but it seems very hard to find a good parser for it.
Should I give up and settle for java parsers? Similarly, what's the state of the art for java parsers? What about c#?
Also, would ripping the parser part of g++ apart from it ever work for the purposes of code analysis, or is it too much effort trying to do so?
You're in luck! Clang just started being able to parse most c++ programs within the last few months: http://clang.llvm.org/ It's one of the few open source parsers actually able to parse most of C++. (Mostly just GCC and CLANG, I hear Oink(?) Can get pretty good sometimes) And it's built to be used as a library by IDEs and the like, even has architecture built to support code rewriting.
There are some proprietary parsers that get the job done, But none of them are really usable without source access.
Regarding ripping apart gcc, That's not very practical for code analysis depending on what you are trying to do, you could use the new plugin architecture to get some usable information out of it, however at a very early level in parsing, it does something called term folding, where the parser itself will optimize out things like "x = x" (A simplistic example) And other aspects of the compiler expects this to happen, so it's not trivial to remove. Thus making gcc nearly useless for anything resembling source rewriting.
For C++ you can use GCC with -fdump-translation-unit & friends option to get AST from it.
See: http://www.manpagez.com/man/1/g++/
If you can compile something by g++ then you can get tree from it.
The industry stardard C++ parser, widely used in compilers, in EDG's C++ front end. I have no experience with this; but I understand it handles a huge variety of C++ dialects. I understand you can get it free for research purposes.
The open source standard is the GCC compiler. I hear is it difficult to understand and modify.
There's CLANG as mentioned in other answers. I have no experience here. My understanding is that it is fairly sophisticated especially in terms of supporting analysis.
Our proprietary DMS Software Reengineering Toolkit has full C++ parser with full name and type resolution, preprocessor expansion (or retention, which the other tools will not do). The C++ front end handles several dialects of C++: ANSI, GCC, MS Visual Studio. As you might guess, I have a lot of experience with this one.
DMS/CppFrontEnd has been used to carry out program analyses as well as massive program source-to-source transformations on C++ code, enabled by DMS's pattern parser, which will parse any fragment of C++ code. I believe the other C++ front ends don't provide source-to-source transformations. With those you can likely hack at the ASTs procedurally, but this is pretty inconvenient because you have to know the precise AST structure and for C++ this is pretty complicated.
DMS also has full C, Java and COBOL front ends with name and type resolution as well as control and data flow analysis. It has parsers (but not name and type analysis) for many other langauges, including C#.
AFAIK, the other "C++ parsers" can't do this, sort of by definition. One can apply source-to-source transformations on any of these, or any mixture of these.
clang is worth looking into. it's fast and they provide apis to hook into their backend.
Xcode 4 uses clang for tasks such as parsing, error reporting/detection in some cases, auto-completion, and fix-its.

Need C++ parser

I need a good, stable and, maybe, easy to use C++ parser library with C/C++ interface (C is preferred).
I hear that cint is good c++ interpreter. Can I use it (or some part of it) for this purpose?
Any suggestions?
See: http://clang.llvm.org/
It has both a C++ and a C interface (libclang).
C++ parsing is famously hard. AFAIK there are only three parsers that are acceptable by todays standards: EDG (widely used as a frontend in popular C++ compilers), GCC's and Microsoft's. And apparently, Microsoft has started using EDG's parser in VS2010, for Intellisense.
When you're looking at the free options, you're pretty much stuck at GCC. It can produce XML, though, so the easy part is there. (Easy by C++ parsing standards, that is)
Clang is the most up-to-date and mature option, with a decent C++ API (but no plain C). Elsa is a bit out of date and unmaintained, but still a usable choice. Both could be used as libraries as well as standalone XML frontends.
If you want to parse C or C++ code, there are some options:
http://bellard.org/tcc/
http://students.ceid.upatras.gr/~sxanth/ncc/
If you want to create a parser using C/C++, you can try:
http://boost-spirit.com/home/
http://dinosaur.compilertools.net/ Lex and Yacc
http://www.codeguru.com/csharp/.net/net_general/patterns/article.php/c12805 Flex and Bison
Our C++ Front End is able to parse a variety of C++ dialects (ANSI, GCC, MSVS), automatically builds ASTs whose nodes are marked with precise source positions and are decorated with any nearby comment text, and builds a full symbol table. (EDIT Jan 2013: the C++ front end has been able to handle C++11 for quite awhile now).
The C++ front end is built on top of our DMS Software Reengineering Toolkit, generalized compiler technology for program analysis and transformation, designed to support custom tool building. The C++ front end includes a preprocessor, in which the preprocessor directives can be expanded or not collectively or individually as appropriate for the task. It also includes full symbol construction with all the nasty Koenig lookup stuff.
DMS accepts explicit language definitions (that's how it understands C++; there are also fron ends for C, C#, Java, COBOL, and variety of other languages). DMS provides general parsing, symbol table building, flow analysis machinery, procedural APIs for tree navigation/inspection/modification, source-to-source transformation, and AST-to-source text regeneration including the original comments, number radices, etc. All of these capabilities are available for use by the C++ Front End.
DMS is also designed to handle the scale required for serious tasks. Often you need not just one compilation unit (which is what GCC will give you at best) but access to an entire set. DMS has been used to analyze/transform thousands of C++ compilation units, and literally tens of thousands of C compilation units (on a 25 million line application).
"Easy to use library" is an oxymoron when it comes to program manipulation tools. The langauges themselves are complex (C++ being one of the most difficult and getting worse with C++0X) and that induces complexity in the nature of the questions you can ask and what the answers look like (e.g. "are there any template instantions that can modify local variable X in method Y in class C in any namespace N?"). The questions themselves are hard.
What you want is a library with the necessary complexity to let you carry off your task. DMS has been under continuous development for the last 15 years, to provide that necessary complexity. If you want to do serious program processing, I claim you will need that information.
As proof, DMS has been used to carry out massive automated reengineering of C++-based mission avionics software for Boeing. I don't believe there are any other tools that can do this. (Clang looks to be trying, but only for C++. YMMV).
I don't know for cint, but I heard people use gcc-xml for this.
I have been looking for a good stand-alone library too, but haven't found any.
If you're feeling brave the links in the answer to "is there a yacc-able C++ grammar?" might be helpful. Gcc-xml and clang have already been suggested and Swig also has an XML output which depending on what you're trying to achieve might be relevant.
I did not try it, but I think that best choice will be getting modules for parsing from some popular open source compiler like gcc for C++;
Maybe you'll find something interesting here http://www.nobugs.org/developer/parsingcpp/

How to parse a collection of c++ header files?

I am working in a project and I want to do reflection in C++ so after research I found that the best way is to parse header files to get abstract syntax tree in XML format and use it in reflection. I tried many tools but none of them compatible with visual c++ 2008 or visual c++ 2010 like coco, cint, gccxml. please replay soon
Visual Studio already parses all code in your project (IntelliSense feature). You can use Visual C++ Code Model for access.
Our C++ front end is capable of parsing many dialects of C++, including GNU and MS. It builds compiler data structures for ASTs and symbol tables with the kind of information needed to "do reflection" for C++. It is rather trivial to export the parse tree as an XML document. The symbol table information could be exported as XML by walking the symbol structure.
People always seem to want the AST and symbol table data in XML format, I guess under the assumption that they can read it into a DOM structure or manipulate it with XSLT. There are two serious flaws to this idea: 1) the sheer volume of the XML data is enormous, and generating/rereading it simply adds a lot of time 2) that having these structures available will make "easy" to do ...something....
What we think people really want to do is to analyze the code, and/or transform the code (typically based on an analysis). That requires that the tool, whatever it is, provide access to the program structure in a way that makes is "easier" to analyze and, well, transform. For instance, if you decide to modify the AST how will you regenerate the source text?
We have built the DMS Software Reengineering Toolkit to provide exactly the kind of general purpose support to parse, analyze, transform, prettyprint ("regenerate source"). DMS has front ends for a wide variety of languages (C++, C, Java, COBOL, Python, ...) and provides the set of standard services useful to build custom analyzers/transformations on code. At the risk of being bold, we have spent a long time thinking about implementing useful mechanisms to cover this set of tasks, in the same way that MS has spent a long time determining what should be in Windows. You can try to replicate this mechanism but expect it to be a huge cost (we have been working on DMS for 15 years), or you can close your eyes and pretend you can hack together enough to do what you imagine you need to do (mostly what you'll discover is that it isn't enough in practice).
Because of this general need for "program manipulation services", our C++ front end is hosted on top of DMS.
DMS with the C++ front end have been used to build a variety of standard software engineering tools (test coverage, profilers) as well as carry out massive changes to code (there's a paper at the webiste on how DMS was used to massively rearchitect aircraft mission software).
EDIT 7/8/2014: Our Front end now handles full C++11, and parts of C++14, including control and dataflow for functions/procedures/methods.

Are there any free tools to help with automatic code generation?

A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in visual studio. It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things so I can decide if I should continue to use my own or switch to a some better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser that was used to generate parser output, clumsy and difficult to learn but very powerful. Nowadays, you'd use Gnu Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost::Serialize or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies you you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've already used professionally the Nvelocity engine combined with C# as a prevoius step to coding, with very good results.