Use Doxygen to document and compare includes and their purpose - C++

I want to use Doxygen to document my code in high detail. One thing I want to document is the purpose of each individual include of another .c/.h file, and, if at all possible, to find out whether the documented usage of an include matches reality.
Let's say I have an include, #include <some_lib.h>, that offers a big set of functions of which I only want (or am only allowed) to use one or two; I want to document which ones I actually use. I would like to write something like:
/**
* Used elements:
* MY_MACRO()
* myFunction()
* myGlobal_Var()
*/
#include <some_lib.h>
Then I want Doxygen to produce some sort of documentation for that file using this information, preferably a list containing every included file and the promised items that are used from it.
It would be even better if there were a way to get Doxygen to produce a list of the actually used elements, but there are surely tools out there for that job.
A reviewer or a script comparing those two results could then easily check whether items other than the promised ones are used in the implementation (and whether the used ones are allowed). At the same time, Doxygen would deliver the documentation of the items.
Do you know of a way to achieve this? And/or have I failed to recognize a Doxygen built-in feature that would bring me close to that goal?

Related

Reverse offsetof / Get name of element by offset

Given the offset of an element in a C++ struct, how can I find its name/type without manual counting? This would be especially useful when decoding ASM code where such offsets are regularly used. Ideally the tool would parse a C(++) header file and then give the answer from that. Thanks for any pointers :)
One such tool might be the compiler itself (using the same ABI-relevant flags as used to generate the code). Create a small program which includes the header file, then prints the result of offsetof applied to each struct's members. You'll then have a suitable look-up table which you could refer to manually, or use as input to another tool you might write.
It may be possible (depending on the complexity of the headers) to auto-generate the program above (you'll probably want to run the header through the C preprocessor first, to expand macros and select the correct branch of conditionals).
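A minimal sketch of what such a generated program could look like (the header name and the struct are invented for illustration; build it with the same compiler and ABI-relevant flags as the code you are decoding):

// offset_table.cpp - minimal sketch; "my_structs.h" and struct Packet are
// placeholders for the real header and types.
#include <cstddef>   // offsetof
#include <cstdio>

// Stand-in for the real header you would include:
// #include "my_structs.h"
struct Packet {
    unsigned int  id;
    unsigned char flags;
    double        payload[4];
};

int main() {
    // One line per member: struct name, member name, byte offset.
    std::printf("%-8s %-8s %zu\n", "Packet", "id",      offsetof(Packet, id));
    std::printf("%-8s %-8s %zu\n", "Packet", "flags",   offsetof(Packet, flags));
    std::printf("%-8s %-8s %zu\n", "Packet", "payload", offsetof(Packet, payload));
    return 0;
}

The printed table is the look-up table mentioned above; a generator would simply emit one printf line per member it finds in the preprocessed header.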

How do I associate changed lines with functions in a git repository of C code?

I'm attempting to construct a “heatmap” from a multi-year history stored in a git repository where the unit of granularity is individual functions. Functions should grow hotter as they change more times, more frequently, and with more non-blank lines changed.
As a start, I examined the output of
git log --patch -M --find-renames --find-copies-harder --function-context -- *.c
I looked at using Language.C from Hackage, but it seems to want a complete translation unit—expanded headers and all—rather than being able to cope with a source fragment.
The --function-context option is new as of version 1.7.8. The foundation of the implementation in v1.7.9.4 is a regex:
PATTERNS("cpp",
/* Jump targets or access declarations */
"!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
/* C/++ functions/methods at top level */
"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
/* compound type at top level */
"^((struct|class|enum)[^;]*)$",
/* -- */
"[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
This seems to recognize boundaries reasonably well but doesn’t always leave the function as the first line of the diff hunk, e.g., with #include directives at the top or with a hunk that contains multiple function definitions. An option to tell diff to emit separate hunks for each function changed would be really useful.
This isn’t safety-critical, so I can tolerate some misses. Does that mean I likely have Zawinski’s “two problems”?
I realise this suggestion is a bit tangential, but it may help to clarify and rank the requirements. This would work for C or C++ ...
Instead of trying to find text blocks which are functions and comparing them, use the compiler to make binary blocks. Specifically, for every C/C++ source file in a change set, compile it to an object. Then use the object code as a basis for comparisons.
This might not be feasible for you, but IIRC there is an option on gcc to compile so that each function ends up as an 'independent chunk' within the generated object code file. The linker can pull each 'chunk' into a program. (It is getting pretty late here, so I will look this up in the morning, if you are interested in the idea.)
So, assuming we can do this, you'll have lots of functions defined by chunks of binary code, so a simple 'heat' comparison is 'how much longer or shorter is the code between versions for any function?'
I am also thinking it might be practical to use objdump to reconstitute the assembler for the functions. I might use some regular expressions at this stage to trim off the register names, so that changes to register allocation don't cause too many false positives.
I might even try to sort the assembler instructions in the function bodies, and diff them to get a pattern of "removed" vs "added" between two function implementations. This would give a measure of change which is pretty much independent of layout, and even somewhat independent of the order of some of the source.
So it might be interesting to see if two alternative implementations of the same function (i.e. from different change sets) are the same instructions :-)
This approach should also work for C++ because all names have been appropriately mangled, which should guarantee the same functions are being compared.
So, the regular expressions might be kept very simple :-)
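A rough sketch of that normalize-sort-diff step (the address/register patterns and the toy input below are only illustrative; real objdump output is tab-separated and target-specific, so the regexes would need adjusting):

// normalize_diff.cpp - sketch of "normalize, sort, diff" for one function,
// given its disassembly as one instruction per line.
#include <algorithm>
#include <iostream>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Strip addresses, raw bytes and register names so that register
// re-allocation between builds does not show up as a change.
static std::string normalize(std::string line) {
    static const std::regex addr("^\\s*[0-9a-f]+:\\s*([0-9a-f]{2} )+\\s*");
    static const std::regex reg("%[a-z0-9]+");
    line = std::regex_replace(line, addr, "");
    line = std::regex_replace(line, reg, "REG");
    return line;
}

// Count instructions removed/added between two versions of a function.
static std::pair<size_t, size_t> heat(std::vector<std::string> a,
                                      std::vector<std::string> b) {
    for (auto& s : a) s = normalize(s);
    for (auto& s : b) s = normalize(s);
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<std::string> removed, added;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(removed));
    std::set_difference(b.begin(), b.end(), a.begin(), a.end(),
                        std::back_inserter(added));
    return {removed.size(), added.size()};
}

int main() {
    // Toy example: old vs. new body of the same (mangled) function.
    std::vector<std::string> oldv = {"  401000: 55        push %rbp",
                                     "  401001: 89 d0     mov %edx,%eax"};
    std::vector<std::string> newv = {"  401100: 55        push %rbp",
                                     "  401101: 89 c8     mov %ecx,%eax",
                                     "  401104: 83 c0 01  add $0x1,%eax"};
    auto [removed, added] = heat(oldv, newv);
    std::cout << "removed=" << removed << " added=" << added << "\n";
}

The removed/added counts per (mangled) function name could then feed directly into the heat score.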
Assuming all of this is straightforward, what might this approach fail to give you?
Side Note: This basic strategy could work for any language which targets machine code, as well as VM instruction sets like the Java VM Bytecode, .NET CLR code, etc too.
It might be worth considering building a simple parser, using one of the common tools, rather than just using regular expressions. Clearly it is better to choose something you are familiar with, or which your organisation already uses.
For this problem, a parser doesn't actually need to validate the code (I assume it is valid when it is checked in), and it doesn't need to understand the code, so it might be quite dumb.
It might throw away comments (retaining the newlines), ignore the contents of text strings, and treat program text in a very simple way. It mainly needs to keep track of balanced '{' '}' and balanced '(' ')'; all the other valid program text is just individual tokens which can be passed 'straight through'.
Its output might be a separate file per function, to make tracking easier.
If the language is C or C++, and the developers are reasonably disciplined, they might never use 'non-syntactic macros'. If that is the case, then the files don't need to be preprocessed.
Then a parser is mostly just looking for the function name (an identifier) at file scope followed by ( parameter-list ) { ... code ... }
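To show how dumb such a parser can afford to be, here is a rough sketch (the sample input and handling are only illustrative; it ignores the preprocessor, K&R definitions, templates and plenty of other realities):

// func_split.cpp - a deliberately dumb sketch of the non-validating parser
// idea: skip comments and string/char literals, track braces, and treat
//     identifier ( ... ) {
// at file scope as the start of a function definition.
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

struct Func { std::string name; size_t begin, end; };  // body byte range

std::vector<Func> split_functions(const std::string& src) {
    std::vector<Func> out;
    std::string last_ident;     // most recent identifier seen at file scope
    bool after_params = false;  // a top-level ( ... ) group was just closed
    int brace = 0, paren = 0;
    Func cur{};                 // function body currently being collected
    for (size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {      // line comment
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {      // block comment
            size_t e = src.find("*/", i + 2);
            i = (e == std::string::npos) ? src.size() : e + 1;
            continue;
        }
        if (c == '"' || c == '\'') {                                    // string/char literal
            for (++i; i < src.size() && src[i] != c; ++i)
                if (src[i] == '\\') ++i;                                // skip escapes
            continue;
        }
        if (brace == 0) {                                               // at file scope
            if (paren == 0 && (std::isalpha((unsigned char)c) || c == '_')) {
                size_t j = i;                                           // collect identifier
                while (j < src.size() && (std::isalnum((unsigned char)src[j]) || src[j] == '_')) ++j;
                last_ident = src.substr(i, j - i);
                i = j - 1;
                after_params = false;
                continue;
            }
            if (c == '(') { ++paren; continue; }
            if (c == ')') { if (--paren == 0) after_params = true; continue; }
            if (c == ';') { after_params = false; continue; }           // declaration only
            if (c == '{' && after_params) cur = Func{last_ident, i, 0}; // function body starts
        }
        if (c == '{') ++brace;
        else if (c == '}' && --brace == 0 && !cur.name.empty()) {       // body finished
            cur.end = i + 1;
            out.push_back(cur);
            cur = Func{};
            after_params = false;
        }
    }
    return out;
}

int main() {
    std::string src = "static int add(int a, int b) { return a + b; }\n"
                      "int g; /* not code: f() { } */\n"
                      "void log_it(const char* s) { s = \"{\"; }\n";
    for (const auto& f : split_functions(src))
        std::cout << f.name << " body at [" << f.begin << "," << f.end << ")\n";
}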
I'd SWAG it would be a few days' work using yacc & lex / flex & bison, and it might be so simple that there is no need for a parser generator at all.
If the code is Java, then ANTLR is a possibility, and I think there was a simple Java parser example.
If Haskell is your focus, there may be student projects published which have made a reasonable stab at a parser.

Parse an XML in standard C/C++ without additional libraries

I have an XML (assuming it is valid) and I must parse it and store it in a tree.
What is the best approach to parse it, without using other libraries, just basic manipulation of strings?
Keep in mind that I don't have to validate it, just parse it and store it in a tree.
The basic structure of XML is quite simple:
<tagname [attribute[="value"] ...]>content</tagname>
where the content may contain both normal text and more XML structures, or the special form
<tagname [attribute[="value"] ...]/>
which is equivalent to
<tagname [attribute[="value"] ...]></tagname>
that is, empty content.
So if you don't need to interpret a DTD or do other fancy things, you can do the following:
1. Check that the first non-whitespace character is <. If not, you don't have XML and can just give an error and exit.
2. Now follows the tag name, up to the first whitespace, / or > character. Store it.
3. If the next non-whitespace character is /, check that it is followed by >. If so, you've finished parsing and can return your result. Otherwise, you've got malformed XML and can exit with an error.
4. If the character is >, you've found the end of the begin tag, and the content follows. Continue at step 6.
5. Otherwise, what follows is an attribute. Parse it, store the result, and continue at step 3.
6. Read the content until you find a < character.
7. If that character is followed by /, it's the end tag. Check that it is followed by the tag name and >; if so, return the result. Otherwise, throw an error.
8. If you get here, you've found the beginning of a nested XML element. Parse it with this algorithm, and then continue at step 6.
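Put together, a compressed sketch of those steps might look like this (it handles only the simplified form above — no comments, processing instructions, CDATA, entities, encodings or DTD — and the class name and sample input are invented for illustration):

// tiny_xml.cpp - sketch of the algorithm above for the simple subset it
// describes; real XML needs far more, as the answer below points out.
#include <cctype>
#include <iostream>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::map<std::string, std::string> attrs;
    std::string text;                                // concatenated character data
    std::vector<std::unique_ptr<Node>> children;
};

class Parser {
public:
    explicit Parser(std::string s) : src_(std::move(s)) {}
    std::unique_ptr<Node> parse() { skip_ws(); return element(); }
private:
    std::string src_; size_t pos_ = 0;
    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    char get() { return pos_ < src_.size() ? src_[pos_++] : '\0'; }
    void skip_ws() { while (std::isspace((unsigned char)peek())) get(); }
    void expect(char c) { if (get() != c) throw std::runtime_error("malformed XML"); }
    std::string name() {
        std::string n;
        while (std::isalnum((unsigned char)peek()) || peek() == '_' || peek() == '-' || peek() == ':') n += get();
        if (n.empty()) throw std::runtime_error("expected a name");
        return n;
    }
    std::unique_ptr<Node> element() {                // step 1: must start with '<'
        expect('<');
        auto node = std::make_unique<Node>();
        node->name = name();                         // step 2: tag name
        for (;;) {                                   // steps 3-5: attributes / end of tag
            skip_ws();
            if (peek() == '/') { get(); expect('>'); return node; }   // <tag ... />
            if (peek() == '>') { get(); break; }
            std::string key = name();                // attribute="value"
            skip_ws(); expect('='); skip_ws(); expect('"');
            std::string val;
            while (peek() != '"' && peek() != '\0') val += get();
            expect('"');
            node->attrs[key] = val;
        }
        for (;;) {                                   // steps 6-8: content
            while (peek() != '<' && peek() != '\0') node->text += get();
            if (peek() == '\0') throw std::runtime_error("unexpected end of input");
            if (src_.compare(pos_, 2, "</") == 0) {  // end tag: must match the name
                pos_ += 2;
                if (name() != node->name) throw std::runtime_error("mismatched end tag");
                skip_ws(); expect('>');
                return node;
            }
            node->children.push_back(element());     // nested element, recurse
        }
    }
};

int main() {
    Parser p("<doc version=\"1\"> hello <item id=\"a\"/> world </doc>");
    auto root = p.parse();
    std::cout << root->name << ": \"" << root->text << "\" with "
              << root->children.size() << " child(ren)\n";
}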
Reading XML looks simple but doing it correctly involves a few complexities you don't really want to deal with. Indeed, writing a simple XML parser effectively amounts to creating yet another XML library. I have done it and an incomplete version of this is sitting somewhere on my disk. Even if you don't need to validate your XML structure:
whether you validate or not, you need to deal with entity references like &lt; and with the variety of numeric character references like &#65;
the plain body of an XML document is relatively simple, but the header is a major pain to deal with, in particular the DTD: there are two versions thereof which are slightly different, and you probably need to process the inline DTD
even the body isn't entirely trivial because of the annoying CDATA sections
even without validation you may need to support external entity references
the characters to be accepted and/or rejected for various parts of XML are also somewhat interesting
note that XML is defined in terms of Unicode and proper handling of this isn't entirely trivial either: just using char or wchar_t doesn't cut it.
The first version I implemented was a nice little iterator intended to pop out all the elements encountered. This allowed for the nice feature of easily stopping and continuing the parsing at the choice of the iterator's user. Unfortunately, I didn't get it to fly when trying to cope with the various entity references. It would parse simple XML files nicely and fast, but there were some quirks in the specification I just didn't get right.
What worked best for me was creating a simple recursive descent parser combined with a suitable stack of buffers to somewhat transparently deal with entity references. However, to finish this completely I still needed to deal with some encoding issues, and in the end I just had higher-priority projects to work on (in my spare time, that is).
In summary: it can be done, obviously, as others did. It is probably a somewhat pointless exercise unless you have a really bright idea which makes your implementation uniquely better suited than the alternatives.
The best and only approach is to re-implement such a library from scratch without using any other libraries...
You're welcome to use existing libraries like pugixml, for example. Its installation is as simple as adding its files to your project and starting to use it. It's lightweight compared to validating parsers such as Xerces.
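For instance, a minimal usage sketch (the file name and the node/attribute names are placeholders):

// pugi_demo.cpp - tiny pugixml usage sketch; "tree.xml" and the "id"
// attribute are made up for illustration.
#include <iostream>
#include "pugixml.hpp"

int main() {
    pugi::xml_document doc;
    pugi::xml_parse_result ok = doc.load_file("tree.xml");
    if (!ok) {
        std::cerr << "parse error: " << ok.description() << "\n";
        return 1;
    }
    // Walk the children of the root element and print name + one attribute.
    for (pugi::xml_node child : doc.document_element().children())
        std::cout << child.name() << " id=" << child.attribute("id").value() << "\n";
}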

Best practice for implementing a search function via prepared statements

I'm trying to implement a search function using C++ and libpqxx.
But I've got the following problem:
The user is able to specify 4 different search criteria:
from date
till date
document type
document id
Each of them is optional. So if I want to use prepared statements I would need 2^4 = 16 different prepared statements. Well, it's possible, but I want to avoid this.
Here is an example of what a prepared statement in libpqxx looks like:
_connection->prepare("ExampleStmnt", "SELECT * FROM foo WHERE title=$1 AND id=$2 AND date=$3")
("text", pqxx::prepare::treat_string)
("smallint", pqxx::prepare::treat_direct)
("timestamp", pqxx::prepare::treat_direct);
Therefore I also have no idea how I would piece such a prepared statement together.
Is there any other 'nice' way that I didn't think of?
The best you can do is to have four different ->prepare clauses, depending on how many search criteria are actually used, concatenate the criteria into your statement string, and then branch to one of the four prepare code blocks. (That will probably spook your style checker into thinking you are creating an injection vulnerability, but of course you aren't, as long as you insert only elements of the closed set of column names.)
Note that this isn't a very nice solution, but even Stephane Faroult (in The Art of SQL) says it's the best one possible, so who am I to argue?
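For illustration, a rough sketch of that dynamic assembly (the table and column names are invented; only the closed set of column-name fragments is concatenated, while the values stay behind $n placeholders):

// search_stmt.cpp - sketch of building the statement text dynamically from a
// closed set of fragments; the values to bind are collected separately.
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct SearchFilter {
    std::optional<std::string> from_date, till_date, doc_type, doc_id;
};

struct BuiltQuery {
    std::string sql;                 // e.g. "... WHERE date >= $1 AND type = $2"
    std::vector<std::string> params; // values to bind, in placeholder order
};

BuiltQuery build_search(const SearchFilter& f) {
    BuiltQuery q{"SELECT * FROM documents", {}};
    auto add = [&](const char* condition, const std::string& value) {
        q.sql += q.params.empty() ? " WHERE " : " AND ";
        q.params.push_back(value);
        q.sql += condition + std::string("$") + std::to_string(q.params.size());
    };
    if (f.from_date) add("date >= ", *f.from_date);
    if (f.till_date) add("date <= ", *f.till_date);
    if (f.doc_type)  add("type = ",  *f.doc_type);
    if (f.doc_id)    add("id = ",    *f.doc_id);
    return q;
}

int main() {
    SearchFilter f;
    f.from_date = "2012-01-01";
    f.doc_type  = "invoice";
    BuiltQuery q = build_search(f);
    std::cout << q.sql << "\n";      // SELECT * FROM documents WHERE date >= $1 AND type = $2
    for (const auto& p : q.params) std::cout << "param: " << p << "\n";
    // The resulting text could then be handed to connection::prepare() (or
    // executed directly) together with the collected parameter values.
}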

Scheme: Detecting duplicate elements in a list

Does R6RS or Chez Scheme v7.9.4 have a library function to check if a list contains duplicate elements?
Alternatively, does either have any built-in functionality for sets (which disallow duplicate elements)? So far, I've only been able to find an example here.
The problem with that is that it doesn't appear to actually be part of the Chez Scheme library. Although I could write my own version of this, I'd much rather use a well known, tested, and maintained library function - especially given how basic an operation this is.
So a simple "use these built-in functions" or a "no built-in library implements this" will suffice. Thanks!
SRFI 1 on list processing has a delete-duplicates function (so you could use that and check the length afterward) and may well have other functions you might find useful.
Kyle,
A while back I needed to use a few SRFIs with Chez Scheme. A few that I converted for use with Chez Scheme (including SRFI-1) are at:
http://github.com/dharmatech/chez-srfi
After you add the path to 'chez-srfi' to your CHEZSCHEMELIBDIRS, you can import SRFI-1:
(import (srfi :1))
Ed