Extracting Basic Blocks/CFG from LLVM/clang on the Backend

I've recently started working with LLVM, and I'm interested to know whether there is a programmatic way to extract the control flow graph and/or basic blocks from LLVM/clang in order to do some analysis on them. Is there a way to hook into the toolchain and pull out this information instead of doing a straight compilation? If not, what are the alternatives?

LLVM supports plugin passes. It would be straightforward to write a pass to emit whatever data you want in whatever format you want.
However, LLVM already has a large suite of analysis and transform passes; for example, opt's -dot-cfg pass writes each function's CFG to a Graphviz .dot file. You may be able to use the existing LLVM framework to extract the data you want after running the analysis passes you need.
Take a look at the docs, the code, and then ask more specific questions on the LLVMdev list to get the best answers.
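As a rough illustration, here is a minimal sketch of such a pass that just prints each function's basic blocks and CFG edges. The pass name, header paths, and registration boilerplate are assumptions that vary between LLVM versions (newer releases use llvm/IR/... paths), so treat this as an outline rather than a drop-in file:

    // Minimal sketch, assuming the legacy pass manager.
    #include "llvm/Pass.h"
    #include "llvm/Function.h"
    #include "llvm/Support/CFG.h"          // succ_begin/succ_end
    #include "llvm/Support/raw_ostream.h"

    using namespace llvm;

    namespace {
      struct CFGSketch : public FunctionPass {
        static char ID;
        CFGSketch() : FunctionPass(ID) {}

        virtual bool runOnFunction(Function &F) {
          errs() << "function " << F.getName() << "\n";
          for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) {
            // Unnamed blocks print empty names; run with -instnamer if needed.
            errs() << "  block " << BB->getName() << " ->";
            for (succ_iterator SI = succ_begin(&*BB), SE = succ_end(&*BB);
                 SI != SE; ++SI)
              errs() << " " << (*SI)->getName();
            errs() << "\n";
          }
          return false;  // analysis only; the IR is not modified
        }
      };
    }

    char CFGSketch::ID = 0;
    static RegisterPass<CFGSketch> X("cfg-sketch",
                                     "Print basic blocks and CFG edges");

Built as a shared library, something along these lines could then be run with opt -load ./CFGSketch.so -cfg-sketch on a bitcode file (the exact invocation depends on your LLVM version).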

The source-level CFG (control flow graph) is purely part of Clang.
The CFG supports visitors (see CFG.h), but you might want to ask on the Clang dev list whether a code sample is available.
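As a rough, hedged sketch (the exact buildCFG signature has changed across Clang releases, so check CFG.h for your version), building and walking a function's source-level CFG looks roughly like this:

    // Illustrative only: dump the source-level CFG of one function.
    // Assumes a Clang where buildCFG returns a std::unique_ptr.
    #include "clang/AST/Decl.h"
    #include "clang/Analysis/CFG.h"
    #include "llvm/Support/raw_ostream.h"

    using namespace clang;

    void dumpFunctionCFG(const FunctionDecl *FD, ASTContext &Ctx) {
      if (!FD->hasBody())
        return;
      CFG::BuildOptions Opts;
      std::unique_ptr<CFG> Cfg = CFG::buildCFG(FD, FD->getBody(), &Ctx, Opts);
      if (!Cfg)
        return;
      for (CFG::const_iterator I = Cfg->begin(), E = Cfg->end(); I != E; ++I) {
        const CFGBlock *B = *I;
        llvm::errs() << "B" << B->getBlockID() << " ->";
        for (CFGBlock::const_succ_iterator SI = B->succ_begin(),
                                           SE = B->succ_end();
             SI != SE; ++SI)
          if (const CFGBlock *Succ = *SI)  // successors can be null
            llvm::errs() << " B" << Succ->getBlockID();
        llvm::errs() << "\n";
      }
    }

You would typically call something like this from an ASTConsumer or a RecursiveASTVisitor that hands you each FunctionDecl.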

Related

Generate metrics of project with Doxygen?

I currently use Doxygen to generate the documentation of my C++ projects. As Doxygen is great and generates a lot of information, I was wondering if there was a way to integrate metrics of the project into the generated documentation.
When I talk of metrics, I mean lines of code, number of classes, number of functions, cyclomatic complexity, etc.
Is there something that does that?
If it's not possible directly, is there a way to create a little plugin for Doxygen to add more information to the generated documentation?
I'd look into the XML output generated by Doxygen, which might have the information you need, although you may need to run Doxygen again.
You can add a preprocessor script prior to running Doxygen that will generate the metrics for you and create a set of pages to display this information (look into the INPUT_FILTER option in the Doxyfile).
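For instance, a hypothetical Doxyfile setup that pipes every source file through a metrics-generating script before Doxygen parses it (the script name is made up; INPUT_FILTER and FILTER_SOURCE_FILES are real options):

    INPUT_FILTER         = "python gen_metrics.py"
    FILTER_SOURCE_FILES  = YES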
I'd also post this question to doxygen-users@lists.sourceforge.net if you haven't already done so.

Extensible lightweight markup language

Lightweight markup languages offer a fixed set of features. This feature set is growing, but every time I write a more complex article, I realize something is missing. Examples include proper image captions, a table of figures, file includes, cross-references, etc. So I end up creating a toolchain around it, with a Makefile and tricky sed commands.
I typically want to insert ad-hoc markers into my text and process them later. They can be one-liners, or more complex -- and this is where the whole regex approach fails. Here is a snippet of an imaginary markup:
I can generate an image from an external dot file [.myDot diag.dot The process],
and it will be included with a caption.
Or the dot source is right here [.myDotHere
foo->bar->Done;
]
I'm looking for a markup tool which can be easily extended to suit my ad-hoc needs. The options I have found so far:
Makefile, pre- and postprocessing with sed/perl scripts
Built in regex pre-processing in txt2tags
Pandoc parses markdown into an internal AST which can be transformed with Haskell scripts
So what I'm looking for is:
a language designed with customization and extensibility in mind
lightweight; no TeX/LaTeX, please
not a tool that merely handles my current specific issues without being extensible
My output is usually just HTML, so it doesn't have to support many targets.
I created Glyph with extensibility in mind. You can create your own macros either using Glyph itself or Ruby.
Glyph aims to make publishing easier while giving the writer as much control as possible: it can manage book metadata, ToC, internal links, snippets, etc.
For documentation on all its features check out the Glyph book, which was created using Glyph itself.
Your "toolchain" approach is a good one - You won't IMO find a single project that will handle your specific needs, best to follow the *nix philosophy and use the best tool for the job that plugs into your open toolchain.
If macro inclusion is an issue, don't worry about solving that by your choice of markup syntax - find the right tool for that specific job and use it upstream.
The choice of markup should, IMO, be based on the availability of transformation tools to your desired output. Pandoc is by far the most actively developed project in this space, and very flexible, especially with its scripting facility. Note it's also very well supported on Google Groups -- John will likely respond directly and quickly to any issues you may have.
Note that Pandoc's flexibility also means your master source text isn't as "locked in": you can easily convert, for example, from its extended markdown syntax to reST if you wanted to take advantage of Sphinx's or DocBook's capabilities. (BTW, also check out AsciiDoc, which the latest Pandoc outputs; apparently a reader is also in the works.)
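Such a conversion is a one-liner on the command line (file names here are illustrative):

    pandoc -f markdown -t rst article.md -o article.rst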
Check out Pandoc's "extras" wiki page, I've been particularly excited by the ConTeXt filter script; I'm not sure if it'll be a good fit for you, but it includes some macro include capabilities, and IMO nothing will give you better typographical control.

Tool for generating a call flow graph (C, C++, Solaris, Linux)

I'm quite fond of IDA, but I'm working in Solaris on this project. I do have a linux machine, and if nothing is in the same league as IDA then I'll convince management to purchase a license for it.
Barring that, I'm looking for alternative suggestions. Some of the other features in IDA would be handy, but the main thing I need at the moment is a call flow graph generator not based on source code. If it needs extra output from the build step, that's fine, but some of the libraries I need to look at I don't have source for.
So far, it looks like my best choices are Valgrind's Callgrind, lida, and gprof. Any further suggestions are welcome.
Re: gprof, the GNU compiler set provided to us by Wind River is missing some libraries that would normally be supplied with a GNU compiler to provide (among other things) facilities for profiling. It's a good solution to the more general problem, but for now I'm opting to try other solutions first.
Edit: some of the Rational tools (Purify, Quantify, etc.) might also work well for this. I'm in the same boat with those as with IDA, but I figure someone googling might find the suggestion helpful.
Edit 2: Valgrind hasn't been ported to Solaris/SPARC. ;p
Take a look at the ERESI Project. It's a reverse-engineering framework, and it has a tool called ELFsh with the capability of generating a CFG from machine code. It doesn't have a stable/final release yet, but it's worth a shot.
If you want to try it:
download and install (apt-get on Ubuntu)
run elfsh32. You'll enter a shell.
load your binary: load /bin/bash
analyse it: analyse
generate the graph: graph
You'll get a graph in .dot format and a rendered PNG (this one was too large to post here).
You can generate a call graph with gprof. It can be visualized with KProf.
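The usual gprof workflow, assuming gcc (the program name is illustrative):

    gcc -pg -o myprog myprog.c    # compile with profiling instrumentation
    ./myprog                      # running it writes gmon.out
    gprof myprog gmon.out         # print the flat profile and call graph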
Very late answer, but it can still be useful: on Solaris you can use collect.
collect your_program your_args...
It will generate a directory like test.1.er.
You can then visualize the call graph on the console with er_print -calltree test.1.er
Or under X with analyzer.

Xerces-C++ DOM node line/column number location

I'm writing a custom XML validator using Xerces-C++. My current approach loads the document into a DOM, and then checks are performed on it. What I need is a way to access the line/column number of a node in the DOM. I've been reading the API docs and googling, but I'm coming up short. Is it possible to somehow retrieve this kind of information about the nodes?
Implementing the XMLValidator interface looks like it would provide me with that kind of information, but it would require completely rewriting my intended validation architecture. Frankly, the XMLValidator approach seems ugly and monolithic; I have a different and much simpler validation system in mind (one that is also easily parallelizable), and everything works; all I need is the line/column number info of the nodes. The Qt DOM implementation that I've used before (and which I can't use now) provides this information up front, so I can't see why Xerces is making things difficult.
A possible solution can be found here.
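In case the link goes stale, one commonly suggested workaround: the Xerces-C++ DOM itself does not expose source positions, but the SAX2 API reports them through a Locator. You can parse once with SAX2, record the line/column of each start tag in document order, and then walk the DOM in the same order to pair positions with element nodes. A rough sketch, assuming Xerces-C++ 3.x (class and member names are illustrative):

    #include <vector>
    #include <xercesc/sax2/Attributes.hpp>
    #include <xercesc/sax2/DefaultHandler.hpp>
    #include <xercesc/sax/Locator.hpp>

    using namespace xercesc;

    struct ElementPos { XMLFileLoc line, column; };

    class PositionRecorder : public DefaultHandler {
    public:
      // One entry per element start tag, in document order.
      std::vector<ElementPos> positions;

      virtual void setDocumentLocator(const Locator* const locator) {
        fLocator = locator;  // only valid for the duration of the parse
      }

      virtual void startElement(const XMLCh* const /*uri*/,
                                const XMLCh* const /*localname*/,
                                const XMLCh* const /*qname*/,
                                const Attributes& /*attrs*/) {
        ElementPos pos = { fLocator->getLineNumber(),
                           fLocator->getColumnNumber() };
        positions.push_back(pos);
      }

    private:
      const Locator* fLocator;
    };

A DOM tree visited in document order yields element nodes in the same sequence as the SAX start-tag events, so the i-th element corresponds to positions[i]; alternatively, DOMNode::setUserData can be used to attach the positions directly to nodes.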

Can you easily configure MediaWiki to accept full HTML/CSS or even JS content?

I'd like to create a technical wiki site, and it requires the full use of HTML/CSS and maybe JavaScript when editing a page. Is this something I can easily configure in MediaWiki? If not, is there any other wiki software that you'd recommend?
Thanks!
You can enable raw HTML support by setting $wgRawHtml = true; in your LocalSettings.php:
http://www.mediawiki.org/wiki/Manual:$wgRawHtml
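Once enabled, raw markup on a page goes inside html tags; a trivial made-up example:

    <html>
      <div style="color:#900">Any <b>raw HTML</b> is rendered verbatim here.</div>
    </html>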
However, as noted above this is rather insecure for a public site. (If locked down to registered usage only by known folks it's ok -- but you need to trust your users.)
There are some links on that manual page to extensions organized around letting you put specific known bits of HTML/JS in your output code as well, which may or may not fit your needs better.
Well, while MediaWiki itself does not support this, there are some extensions that allow at least HTML in a page. See for example this extension list. SecureHTML might do what you are looking for.
That said, I'd like to point out that allowing raw HTML rather defeats the purpose of a wiki:
it can and will mess up formatting and create weird problems (clashes between generated and user-provided HTML)
it makes it hard/impossible to convert the wiki to other formats (such as to print it)
it makes searching harder
it makes any kind of security impossible (think XSS)
This is doubly true for allowing JavaScript.
So I'd like to ask why you need this. If you need special formatting that MediaWiki does not offer, consider using (or writing) an extension for this.
If you really need arbitrary HTML, a Wiki might not be the best tool for you. You should consider a CMS, or just put HTML files into Subversion.
So what are you trying to do?
Use nowiki tags. Docs can be found here: https://www.mediawiki.org/wiki/Help:Formatting