We all have the good habit of documenting our code, right?
Nowadays, in-code documentation itself has a syntax. It's almost like a programming language unto itself. The questions are:
What (How many) documentation syntax specifications exist?
Is there a standard documentation syntax?
Who is defining this standard? Is there an official committee or body (like there is one for defining C++ standards)?
Or has "doxygen" become the de-facto standard?
It's difficult not to have heard about doxygen. It is mentioned in every open source software project I have taken part in. Yet, looking at the official doxygen web site, it is far from obvious that doxygen is defining any kind of specification! The impression I get when I read "the ways it can help me", is that doxygen is simply a software to extract in-code documentation and present it in beautiful HTML pages. Looking at the doxygen front page, I could even think that doxygen could use any documentation syntax defined in 3rd party specifications and extract it and output it as HTML.
Also, it is interesting to note that the doxygen web site does not capitalize the word doxygen, as if it were not the brand of their software but a common noun (well, is it?)
What is doxygen really?
a parsing engine?
an HTML rendering engine?
a library that can be used by 3rd party software to render their own docs?
a documentation syntax (de facto) specification?
all of the above?
I am particularly confused as to the relationship between doxygen and other code parsers like ANTLR, boost-spirit, Ragel...
For example, what is it that doxygen can do that ANTLR cannot, and that ANTLR can that doxygen cannot?
Also, looking at the Drupal project. They have:
their own API module which is "an implementation of a subset of the Doxygen documentation generator specification".
their own grammar parser module (to add to the list above, alongside doxygen itself, ANTLR, et al.).
their own API web site running the two aforementioned modules, presenting the Drupal in-code "doxygen" documentation.
So, to take a C++ analogy, it seems that the word "doxygen" is overloaded and means different things in different contexts.
Within the Drupal project, "doxygen" does not refer to a software, but simply a (standard?) specification for documentation syntax even though, as I said above, the front page of the doxygen web site itself does not claim to be such a thing!
So, my parting question is:
Is there any other documentation syntax specification?
What (How many) documentation syntax specifications exist?
Almost every medium-sized software development organization seems to have its own. Often they are included under the umbrella of "coding style guidelines".
Is there a standard documentation syntax?
There are a few standards that I am aware of which have some widespread use. This is surely not a comprehensive list:
JavaDoc
The C# XML documentation format (ECMA-334)
QDoc (sometimes mistaken for Doxygen's own format)
RubyDoc
Plain Old Documentation (POD)
Who is defining this standard?
There is no standard.
Is there an official committee or body (like there is one for defining C++ standards)?
Not really, though the C# XML documentation format is managed by ECMA, which is a standards organization.
Or has "doxygen" become the de-facto standard?
Doxygen is not a standard. It recognizes a number of standards. See http://www.doxygen.nl/manual/features.html.
Most people use doxygen to generate docs they wrote while loosely following either the QDoc standard or the JavaDoc standard. When people talk of "the" doxygen standard, more often than not they mean the QDoc documentation style, plus some arbitrary usage of doxygen extensions. My experience is that most organizations using doxygen aren't really following any particular convention very rigidly, simply because doxygen doesn't enforce one.
...it is far from obvious that doxygen is defining any kind of specification!
It isn't.
doxygen is simply a software to extract in-code documentation and present it in beautiful HTML pages.
Yes, exactly. It also supports XML, LaTeX, RTF, and UNIX "man" page outputs.
Looking at the doxygen front page, I could even think that doxygen could use any documentation syntax defined in 3rd party specifications and extract it and output it as HTML.
Not any, but many.
Also, it is interesting to note that the doxygen web site does not capitalize the word doxygen, as if it were not the brand of their software but a common noun (well, is it?)
It's not a commercial product; Dimitri (van Heesch, doxygen's author) doesn't care much about branding.
What is doxygen really?
A documentation generation tool.
I am particularly confused as to the relationship between doxygen and other code parsers like ANTLR, boost-spirit, Ragel...
Those are parsing libraries.
For example, what is it that doxygen can do that ANTLR cannot, and that ANTLR can that doxygen cannot?
Libraries like ANTLR are used to build software, while doxygen is a specialized tool for generating documentation. So while you could use ANTLR to write a documentation generator, you wouldn't want to use doxygen to build a compiler (I don't say can't, because surely you could, I have seen stranger things).
Is there any other documentation syntax specification?
Already answered above.
Hope this helps.
There is no standard.
Doxygen style is almost a standard (GCC's standard library, libstdc++, uses it).
http://en.wikipedia.org/wiki/Comparison_of_documentation_generators
You are right: Doxygen is more of a documentation extraction application than a "commenting standard" per se. It supports many different documentation styles: JavaDoc (with '@' introducing a command), a Doxygen variant (with '\' introducing the same commands), XML documentation comments, and many variations on the comment block format. It can also use the formatting of comments to infer what the content is (e.g. brief descriptions need not be tagged as such, and can be taken from the first sentence or paragraph of the text).
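As a rough illustration of the two command prefixes, here is a minimal sketch of the same function documented twice: once with a JavaDoc-style block and `@` commands, once with a Qt-style block and `\` commands. The function names are made up for this example. Whether the first sentence is picked up as the brief description automatically depends on the project's Doxygen configuration, so the sketch tags it explicitly.

```cpp
#include <cassert>

/**
 * @brief Clamp a value to the inclusive range [lo, hi].
 * @param value The value to clamp.
 * @param lo    Lower bound.
 * @param hi    Upper bound; assumed to be >= lo.
 * @return lo if value < lo, hi if value > hi, otherwise value.
 */
int clampJavadocStyle(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/*!
 * \brief Clamp a value to the inclusive range [lo, hi].
 * \param value The value to clamp.
 * \param lo    Lower bound.
 * \param hi    Upper bound; assumed to be >= lo.
 * \return lo if value < lo, hi if value > hi, otherwise value.
 */
int clampQtStyle(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

// Hypothetical usage check: both variants behave identically,
// only the comment markup differs.
inline void clampUsageExample() {
    assert(clampJavadocStyle(5, 0, 3) == 3);
    assert(clampQtStyle(-1, 0, 3) == 0);
}
```

Both comment blocks carry the same information; picking one form and sticking to it is what keeps a code base readable.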
As such, it is highly configurable but allows almost every programmer to have their own style which leads to a nonstandard mess from one project to another, and often between different comments within a single project - even when they are written by a single programmer! The plus side is that as long as the comment stays within the basic style, Doxygen will correctly extract the docs for you and format them all into a consistent external document. The minus side is that although many programmers "use doxygen comments" (which sounds standardised), their comment formats can often be totally dissimilar.
One solution (for Visual Studio) that can at least help with this disparity of styles within your own project/team/company is an addin I've written, AtomineerUtils. This helps you to author and update Doxygen, JavaDoc and XML documentation format comments - it auto-generates documentation to save lots of time, and updates comments to keep them in sync with changes to the code. During this process it can reformat the comment to achieve a very consistent and readable style (order the entries in a standard format, enforce blank lines between comments and code and between entries, word-wrap the text in entries, etc). The user can set up templates that control exactly how all of this works, so it's easy to achieve precisely the style you want, but make it consistent across all your projects. This improves consistency a lot when you have more than one programmer working on a body of code.
If you are documenting in Visual Studio, I would recommend the XML documentation format. It's not as human-readable as Doxygen/JavaDoc styles can be, but it's used by the IDE to provide live IntelliSense data on code as you type, and is exported to XML files that any application can easily process, which gives you a lot more flexibility. Doxygen can build docs from this format, so you can still use the Doxygen tools with XML source comments too.
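For comparison, a minimal sketch of what an XML documentation comment can look like on a C++ function. The function `largerOf` is invented for this example, and whether the IDE or compiler actually consumes these comments for native C++ depends on the project settings.

```cpp
#include <cassert>

/// <summary>Returns the larger of two integers.</summary>
/// <param name="a">First value.</param>
/// <param name="b">Second value.</param>
/// <returns>Whichever of a and b is greater; b when they are equal.</returns>
int largerOf(int a, int b) {
    return (a > b) ? a : b;
}

// Hypothetical usage check.
inline void largerOfUsageExample() {
    assert(largerOf(2, 7) == 7);
}
```

The tags are more verbose than `@param`-style commands, which is the readability trade-off mentioned above.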
Is there any other documentation syntax specification?
Yes, of course. For example, there's JavaDoc (or however that's spelled). And Microsoft's XML stuff (however that's called).
It seems doxygen is pretty much the de facto standard in the open source C++ arena, though. When I originally heard about doxygen (~10 years ago), there used to be others around, but it seems they've vanished.
Possible Duplicate:
A free tool to check C/C++ source code against a set of coding standards?
I am starting a C++ project involving several people I have no direct access to. We agreed on a coding style guide, which e.g. defines the casing for class members depending on their accessibility (privates in Pascal case, publics and protecteds in camel case). Please don't start discussions about the style guide; I have had enough. Thank you.
What I want to do now is to generate some reporting of style guide violations. I don't want to enforce the style guide, e.g. at commit, but I want to provide a tool which each developer can use to see where his/her code violates the style guide (if he/she wants to check it).
Do you know a tool which can do the job?
(It needs to be able to understand some C++, e.g. to detect the accessibility of class members.)
Well, you could run your code through AStyle or Uncrustify on commit, which would at least reformat bad code to some standard. I find that's the main problem with code commits and standards: if you reformat after commit, it shows up as a lot of delta changes that are entirely trivial.
Otherwise, check the other SO answer.
Style guides tend to be company-specific, and one has to write company-specific checks to enforce them.
My company offers customizable C++ style checkers, in which one can check for deprecated idioms by syntax, check that variables and types have certain properties, or verify that certain commands occur in certain orders locally. These checkers use C++ dialect precise parsers on the source code. The customization isn't easy; you need the underlying engine and some knowledge of parsing C++ programs.
It is possible to write rules that check for layout, but it is a lot of unrewarding work, and resolving such complaints isn't a productive use of programmer resource IMHO. And if you aren't going to enforce your style, why are you annoying the programmer with complaints at all? It seems easier (as another poster noted) to simply run a layout-formatter that produces the right result at no cost to the programmer.
One of the issues with generic formatters is that being language-imprecise, they may misinterpret the source code and sometimes break it as they format, leading to compilation errors, debugging and wasted time. We also offer C++ Formatters to accomplish the formatting using the same language precise parsers as the style checker; they can't break your code during reformatting.
I've been successfully using the vera++ tool to do this for our projects. I've written a number of rules (in Tcl) to adapt it to our company style guidelines. It was a bit painful until I had worked around all the false positives reported by my checks. At least it's working well now, and I have integrated the reports into the Jenkins build analysis.
The reports can also easily be adapted for custom error analysis in the Eclipse IDE.
We want to parse our huge C++ source tree to gain enough info to feed to another tool to make diagrams of class and object relations, discern the overall organization of things etc.
My best try so far is a Python script that scans all .cpp and .h files, runs regex searches to try to detect class declarations, methods, etc. We don't need a full-blown analyzer to capture every detail, or some heavy UML diagram generator - there's a lot of detail we'd like to ignore and we're inventing new types of diagrams. The script sorta works, but by gosh it's true: C++ is hard to parse!
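To make the regex approach concrete, here is a minimal sketch (written in C++ rather than Python, and not the script described above) that pulls class and struct names out of source text. The function name `findClassNames` and the pattern are assumptions made for illustration; the pattern deliberately ignores templates, macros, comments and string literals.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the names of classes/structs that are *defined* (followed by '{')
// in the given source text. Forward declarations such as "class Fwd;" are
// skipped because they have no opening brace.
std::vector<std::string> findClassNames(const std::string& source) {
    static const std::regex decl(
        R"(\b(?:class|struct)\s+([A-Za-z_]\w*)\s*(?:final\s*)?(?::[^;{]*)?\{)");

    std::vector<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), decl), end;
         it != end; ++it) {
        names.push_back((*it)[1].str());  // capture group 1 is the name
    }
    return names;
}

// Hypothetical usage check. Note the false positive: "enum class Color {"
// also matches, which is exactly the kind of trap a regex scan falls into.
inline void findClassNamesUsageExample() {
    assert(findClassNames("struct Point { int x; };").size() == 1);
    assert(findClassNames("enum class Color { Red };").size() == 1);
}
```

A scan like this is quick to write but has no idea about scopes, preprocessor conditionals or comments, which is why the answers below point towards real parsers.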
So I wonder what tools exist for extracting the info we want from our sources? I'm not a language expert, and don't want something with a steep learning curve. Something we low-brow blue-collar programmer grunts can use :P
Python is preferred as one of the standard languages here, but it's not essential.
I'll simply recommend Clang.
It's a C++ library-based compiler designed with ease of reuse in mind. It notably means that you can use it solely for parsing and generating an Abstract Syntax Tree. It takes care of all the tedious operator overloading resolution, template instantiation and so on.
Clang exports a C-based interface, which is extended with Python bindings. The interface is normally quite rich, but I haven't used it. Anyway, contributions are welcome if you wish to help extend it.
You could check out GccXML and OpenC++, as well as doxygen.
Can you run a preprocessing step? Doxygen parses most C++ syntax and creates XML with all the relationships. Compilers also create debug databases (typically DWARF format from gcc and CodeView format from MSC).
From what you say of your requirements, Tony's answer of GccXML will probably be the best option. If that doesn't work, you could try to generate an outline of your program with cscope or ctags, and then work your way to the info you want from its output.
You asked for tools that can extract information from C++.
Our DMS Software Reengineering Toolkit is configurable compiler technology for building custom analyzers. It has a full C++ Front End with a preprocessor, full C++ parsing with AST construction (including capture of comments), and full symbol table. These could be used to extract such structural information, and export it to whatever you want to process it.
EDIT: One of the comments is that there are only 3 full C++ parsers in the world. I suspect more; surely IBM has one that works. DMS's C++ front end has been used in anger on large applications in both MS Visual Studio and on GNU C++ source codes, so it might reasonably qualify, too :-}
I've had good experience with PLY:
http://www.dabeaz.com/ply/
But this requires some experience with lex and yacc
If you can bring yourself to run this analysis using a Windows-platform application, save yourself a lot of time and trouble, and spend $200 on Enterprise Architect by Sparx Systems (I have no affiliation with this company, just a satisfied customer). (Note: this should not be confused with Microsoft's own "Enterprise Architect" bundle for Visual Studio.)
EA can reverse-engineer a number of languages, including C++, C, Java, and Python, generating some very nice UML class diagrams. (EA comes in a number of different packages, Desktop is the cheapest but you have to buy Professional, the 2nd cheapest, to get the code engineering feature included.) I also like the integration between the generated class diagrams and sequence diagramming, where you can drag a line between object lifelines and a menu of defined methods is presented to you based on the class definition of the target object. At my former consulting business, we used this tool quite a bit to develop system architectural proposals which we then included as part of our project bid (just copy/paste the diagram into a Word doc). It won't take long to make back your $200.
Could anyone please suggest a proper format of comments I should use in a C++ project? I think that there is some analog of javadoc format, but I can't find which one. Is there any de-facto standard of that kind? Thanks!
Doxygen is used rather frequently for C++ projects (Doxygen supports other languages as well). You can use Doxygen-style comments as a starting point (here are some examples). Just be consistent.
It's quite controversial as a subject in every language.
While I appreciate the Doxygen answers, since it's effectively kind of a de facto standard, Doxygen itself supports multiple formats.
C++ supports 2 styles of comments:
/* */ multi-line comment
// until the end of line comment
I would advise against the former style. The issue is that it does not nest, which is quite annoying when, for testing purposes, you wish to comment out a whole block of code.
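A small sketch of why this matters, using a made-up function `scaled`: when the code itself only uses line comments, a whole region can be wrapped in a block comment without anything terminating early. Had the disabled line carried a block comment of its own, the outer comment would have ended at that inner closing delimiter and the rest would no longer compile.

```cpp
#include <cassert>

int scaled(int x) {
    int factor = 2;  // default factor

    /* Temporarily disabled for testing. The line comment inside this
       block is plain text to the compiler, so nothing ends early.
    factor = 3;      // experimental factor
    */

    return x * factor;
}

// Hypothetical usage check: the disabled assignment has no effect.
inline void scaledUsageExample() {
    assert(scaled(4) == 8);
}
```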
My 2 cents, as the saying goes.
Your mention of javadoc suggests you might be thinking of doxygen. This is a kit which supports structured comments in C++ and other languages, and can produce output in HTML and (if I recall correctly) PDF. It supports a variety of comment styles, including one which looks very much like javadoc.
It works well, and is broadly used.
We are slowly moving towards better-standardized commenting in a large C++ project, introducing Doxygen. I personally find it a pain typing in comments, especially since Java IDEs are so good at automating this.
So I wondered what tools there might be? A search turned up DoxyComment, which looks quite nice. Is this the best/standard tool, or are there others worth a look too?
Atomineer is a tool that I and a few others have been using for documenting unmanaged C++ code with Doxygen markup. It's not free, but it is cheap, and may be worth a try:
http://www.atomineerutils.com/products.php
If typing the meta-comments which are instructions to doxygen is a significant part of your comment-writing effort, you're doing it wrong.
Comments should not include things which can be automatically determined by a tool; any programmer will determine just as much (or more) information from e.g. parameter names as any tool.
Another way to look at this is that doxygen already does an excellent job of presenting what can be automatically determined. You don't need to write: "B::B constructs a B object", since doxygen is going to sort it into the constructors section of the documentation automatically.
Focus on what's non-obvious, and take time to think about what you're writing.
Normally many functions and variables will have no need for an individual comment, since either the name is descriptive enough, or they are better explained in a class-level comment describing how multiple members interact.
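As one possible reading of this advice, here is a small sketch with an invented `RingBuffer` class: the class-level comment explains how the members interact, most members carry no comment at all, and the only per-member comment states a precondition that the name alone does not convey.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/**
 * Fixed-capacity ring buffer of ints.
 *
 * push() appends until the buffer is full; after that each push()
 * overwrites the oldest element, so size() never exceeds capacity().
 * The capacity must be greater than zero.
 */
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : data_(capacity) {}

    std::size_t capacity() const { return data_.size(); }
    std::size_t size() const { return size_; }

    void push(int value) {
        // When full, (head_ + size_) wraps around onto the oldest slot.
        data_[(head_ + size_) % data_.size()] = value;
        if (size_ < data_.size()) {
            ++size_;
        } else {
            head_ = (head_ + 1) % data_.size();
        }
    }

    /// Precondition: size() > 0.
    int oldest() const { return data_[head_]; }

private:
    std::vector<int> data_;
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};

// Hypothetical usage check: the third push evicts the first value.
inline void ringBufferUsageExample() {
    RingBuffer rb(2);
    rb.push(1);
    rb.push(2);
    rb.push(3);
    assert(rb.oldest() == 2);
}
```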
What are some good tools for getting a quick start for parsing and analyzing C/C++ code?
In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.
Here's what I've found so far, but haven't looked at them in detail (thoughts?):
CScope - Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
GCC - Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but hasn't been updated since GCC 4.1 (2006).
PUMA - The PUre MAnipulator. (from the page: "The intention of this project is to provide a library of classes for the analysis and manipulation of C/C++ sources. For this purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++ sources."). This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
Any others?
I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.
Thanks!
-Matt
(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42", but remain attached to the name "foo". This, unfortunately, excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree)
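To illustrate what "remaining attached to the name" could mean in practice, here is a deliberately naive sketch that records object-like macros from the raw source text before any preprocessing happens. The helper `collectDefines` is hypothetical; it handles neither function-like macros, line continuations, `# define` with a space after the hash, nor conditional compilation.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Map macro name -> replacement text for lines of the form
// "#define NAME VALUE" found in the unpreprocessed source.
std::map<std::string, std::string> collectDefines(const std::string& source) {
    std::map<std::string, std::string> macros;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string directive, name;
        if (!(tokens >> directive >> name) || directive != "#define") {
            continue;  // not a #define line
        }
        if (name.find('(') != std::string::npos) {
            continue;  // function-like macro: out of scope for this sketch
        }
        std::string value;
        std::getline(tokens >> std::ws, value);  // rest of the line, may be empty
        macros[name] = value;
    }
    return macros;
}

// Hypothetical usage check: "foo" keeps its association with "42".
inline void collectDefinesUsageExample() {
    const auto macros = collectDefines("#define foo 42\nint x = foo;\n");
    assert(macros.at("foo") == "42");
}
```

A real tool would have to do this inside the parser so that each use of `foo` in the tree can point back to its definition, which is why solutions that only see preprocessed output are excluded.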
Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:
Outstandingly complicated grammar
"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
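The quoted `AA BB(CC);` example can be made concrete with a short sketch. The two namespaces below are invented for illustration and contain the exact same statement; what it means depends entirely on what `CC` was declared as earlier.

```cpp
#include <cassert>

namespace as_function {
    struct AA { int v; };
    struct CC {};

    AA BB(CC);  // CC names a type: declares a function BB taking a CC

    AA BB(CC) { return AA{1}; }  // definition of that function
}

namespace as_object {
    struct AA {
        int v;
        explicit AA(int x) : v(x) {}
    };
    int CC = 7;

    AA BB(CC);  // CC names a variable: defines an object BB built from CC
}

// Hypothetical usage check: one BB is called, the other is read.
inline void ambiguityUsageExample() {
    assert(as_function::BB(as_function::CC{}).v == 1);
    assert(as_object::BB.v == 7);
}
```

A parser therefore cannot classify the statement without tracking every declaration that came before it.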
You can look at Clang, the C-family front end of the LLVM project.
It supports C++ fully now.
The ANTLR parser generator has a grammar for C/C++ as well as the preprocessor. I've never used it so I can't say how complete its parsing of C++ is going to be. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.
Depending on your problem GCCXML might be your answer.
Basically it parses the source using GCC and then gives you easily digestible XML of parse tree.
With GCCXML you are done once and for all.
pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST backend, so it can be used as a basis for any kind of language processing you might need.
Doesn't support C++, though. Granted, it's much harder than C.
Update (2012): at this time the answer, without any doubt, would be Clang: it's modular, supports the full C++ (with many C++11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (e.g. for Python).
Have a look at how doxygen works, full source code is available and it's flex-based.
A misleading candidate is GOLD which is a free Windows-based parser toolkit explicitly for creating translators. Their list of supported languages refers to the languages in which one can implement parsers, not the list of supported parse grammars.
They only have grammars for C and C#, no C++.
Parsing C++ is a very complex challenge.
There's the Boost/Spirit framework, and a couple of years ago they did play with the idea of implementing a C++ parser, but it's far from complete.
Fully and properly parsing ISO C++ is far from trivial, and there were in fact many related efforts. But it is an inherently complex job that isn't easily accomplished, without rewriting a full compiler frontend understanding all of C++ and the preprocessor. A pre-processor implementation called "wave" is available from the Spirit folks.
That said, you might want to have a look at pork/oink (Elsa-based), a C++ parser toolkit specifically meant for source code transformation. It is being used by the Mozilla project for large-scale static source code analysis and automated code rewriting. The most interesting part is that it supports not only most of C++, but also the preprocessor itself!
On the other hand there's indeed one single proprietary solution available: the EDG frontend, which can be used for pretty much all C++ related efforts.
Personally, I would check out the Elsa-based pork/oink suite which is used at Mozilla. Apart from that, the FSF has now approved work on gcc plugins using the runtime library license, so I'd assume that things are going to change rapidly once people can easily leverage the gcc-based C++ parser for such purposes using binary plugins.
So, in a nutshell: if you have the bucks, EDG; if you need something free/open source now, elsa/oink are fairly promising; if you have some time, you might want to use gcc for your project.
Another option just for C code is cscout.
The grammar for C++ is sort of notoriously hairy. There's a good thread at Lambda about it, but the gist is that C++ grammar can require arbitrarily much lookahead.
For the kind of thing I imagine you might be doing, I'd think about hacking either Gnu CC, or Splint. Gnu CC in particular does separate out the language generation part pretty thoroughly, so you might be best off building a new g++ backend.
Actually, PUMA and AspectC++ are still both actively maintained and updated. I was looking into using AspectC++ and was wondering about the lack of updates myself. I e-mailed the author who said that both AspectC++ and PUMA are still being developed. You can get to source code through SVN https://svn.aspectc.org/repos/ or you can get regular binary builds at http://akut.aspectc.org. As with a lot of excellent c++ projects these days, the author doesn't have time to keep up with web page maintenance. Makes sense if you've got a full time job and a life.
How about something easier to comprehend, like tiny-C or Small C?
Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There's a module that prints out C++, so that may be a good starting point for your toy project.
See our C++ Front End for a full-featured C++ parser: builds ASTs, symbol tables, does name and type resolution. You can even parse and retain the preprocessor directives. The C++ front end is built on top of our DMS Software Reengineering Toolkit, which allows you to use that information to carry out arbitrary source code changes using source-to-source transformations. DMS is the ideal engine for implementing such a translator.
Having said that, I don't see much point in your imagined task; I don't see much value in trying to replace C++, and you'll find building a complete translator an enormous amount of work, especially if your target is a "toy" language. And there is likely little point in parsing C++ using a robust parser, if its only purpose is to produce an isomorphic version of C++ that is easier to parse (wait, we postulated a robust C++ already!).
EDIT May 2012: DMS's C++ front end now handles GCC3/GCC4/C++11,Microsoft VisualC 2005/2010. Robustly.
EDIT Feb 2015: Now handles C++14 in GCC and MS dialects.
EDIT August 2015: Now parses and captures both the code and the preprocessor directives in a unified tree.
EDIT May 2020: Has been doing C++17 for the past few years. C++20 in process.
A while back I attempted to write a tool that will automatically generate unit tests for c files.
For preprocessing I put the files through GCC. The output is ugly, but you can easily trace back from the preprocessed file to the original code. But for your needs you might need something else.
I used Metre as the base for a C parser. It is open source and uses lex and yacc. This made it easy to get up and running in a short time without fully understanding lex & yacc.
I also wrote a C app since the lex & yacc solution could not help me trace functionality across functions and parse the structure of the entire function in one pass. It became unmaintainable in a short time and was abandoned.
What about using a tool like GNU cflow, which can analyse the code and produce call-graph charts? Here's what the Open Group man page has to say about cflow. The GNU version of cflow comes with source and is open source as well.
Hope this helps,
Best regards,
Tom.