Best place to document our C++ code - c++

After reading a bit about Doxygen, I'm confused about where to document my variables, functions, etc. Should the documentation go in the implementation (source) file or in the interface (header) file?
What is the best practice here?

Place documentation in your headers. And one very important thing to look out for is to not overdocument. Don't start writing a comment for every single variable and function, especially if all you do is state the obvious. Examples...
This comment below is obvious and unhelpful. All the comment says is perfectly clear just by looking at the function.
/**
This function does stuff with a prime number. */
void do_stuff(int prime);
You should instead document how the function behaves in extreme situations. For example, what does it do if the parameters are wrong? If it returns a pointer, whose responsibility is it to delete the pointer? What other things should programmers keep in mind when using this function? etc.
/**
This function does stuff with a prime number.
\param prime A prime number. The function must receive only primes; it
does not check whether the integer it receives is prime.
*/
void do_stuff(int prime);
Also, I would advise you to document only the interface in the header files: don't talk about how the function does something, tell only what it does. If you want to explain the actual implementation, I'd put some relevant (normal) comments in the source file.
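As a rough illustration of that split, here is a minimal sketch. The function count_primes_below and the file names primes.h and primes.cpp are invented for this example, not taken from the question. The Doxygen comment states the contract for callers, while the plain comment next to the definition explains the mechanism:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- primes.h (hypothetical header): the interface, documented for callers ----

/**
 * Counts the primes strictly below \p limit.
 * \param limit Upper bound (exclusive). Values below 2 are allowed.
 * \return The number of primes p with p < limit; 0 if limit <= 2.
 */
int count_primes_below(int limit);

// ---- primes.cpp (hypothetical source): plain comments about the "how" ----

int count_primes_below(int limit) {
    if (limit <= 2) return 0;
    // Sieve of Eratosthenes: cross out the multiples of each prime found.
    std::vector<bool> composite(static_cast<std::size_t>(limit), false);
    int count = 0;
    for (int i = 2; i < limit; ++i) {
        if (composite[i]) continue;
        ++count;
        for (long long j = 1LL * i * i; j < limit; j += i) composite[j] = true;
    }
    return count;
}
```

If the sieve were later swapped for another algorithm, only the comment in the source file would need to change; the header comment, and everything that includes the header, would stay untouched.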

You should aim to document only your header files, although at times it may prove difficult.

I generally recommend putting the documentation in the header file and writing it from the user's perspective.
In rare situations it may be beneficial to put the comments in the source file (or even in a separate file), for instance if
the cost of changing a header (in terms of build impact) is huge, and
you expect (frequent) changes to the documentation, without changing the syntax of the interface; for instance you regularly improve the documentation based on user feedback, or you have a different team of professional writers that write the documentation after the interface is delivered.
There can be other, less compelling reasons: some people like comments in the source file because it keeps the header file small and tidy. Others expect the documentation to be easier to keep up to date if it is close to the actual implementation (with the risk that they document what the function does instead of how to use it).

Related

Insert text into C++ code between functions

I have the following requirements:
Add text at the entry and exit point of any function.
Do not alter the source code, besides the insertions mentioned above (so no pre-processor or anything).
For example:
void fn(param-list)
{
ENTRY_TEXT (param-list)
//some code
EXIT_TEXT
}
It must work not only in such a simple case, but also when pre-processor directives are involved!
Example:
void fn(param-list)
#ifdef __WIN__
{
ENTRY_TEXT (param-list)
//some windows code
EXIT_TEXT
}
#else
{
ENTRY_TEXT (param-list)
//some any-os code
if (condition)
{
return; //should become EXIT_TEXT
}
EXIT_TEXT
}
#endif
So my question is: is there a proper way of doing this?
I already tried working with the parsers used by compilers, but since they all rely on running a pre-processor before parsing, they are useless to me.
Some of the token-generating parsers that do not need a pre-processor are also of little use, because they build an in-memory representation of the tokens from which completely new source code is generated, instead of just inserting the text.
One thing I am working on is trying it with FLEX (or JFlex). If this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call hierarchy, I need to place a macro at the entry point and at the exit point of each function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not really suitable, since we have to use many different compilers, and many of them are probably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform-dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: small functions that are inlined into their callers do not appear in the stack trace, since no function call is generated at assembly level. On the other hand, it is widely available, does not require any surgery on the code and does not affect performance.
A second solution would be to introduce something only on entry and use RAII to do the exit work. While much better than your scheme, as it automatically deals with multiple returns and exceptions, it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level and modify the AST to introduce your little gem. You could do it with Clang (look up the C++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you do not leave logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance sensitive parts, you can retain only a "summary" of big arguments and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process: insert a small object into the code that is constructed on entry to the function scope, and rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
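A minimal sketch of such an object, assuming the trace is simply collected in memory. ScopeTrace, trace_log, inner and outer are names made up for this illustration; a real tool would probably write to a log instead:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sink for the trace; a real tool might write to a log file.
std::vector<std::string>& trace_log() {
    static std::vector<std::string> log;
    return log;
}

// Records entry in the constructor and exit in the destructor, so the exit
// record is written on every path out of the scope: normal return, early
// return, or stack unwinding caused by an exception.
class ScopeTrace {
public:
    explicit ScopeTrace(const char* name) : name_(name) {
        trace_log().push_back("enter " + name_);
    }
    ~ScopeTrace() { trace_log().push_back("exit " + name_); }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;
private:
    std::string name_;
};

void inner(bool fail) {
    ScopeTrace trace("inner");
    if (fail) throw std::runtime_error("boom");  // exit is still recorded
}

void outer(bool fail) {
    ScopeTrace trace("outer");
    inner(fail);
}
```

Calling outer(true) inside a try block records "enter outer", "enter inner", "exit inner", "exit outer" in that order, even though inner never reaches its closing brace. Only one line per function has to be inserted, which is the simplification referred to above.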
This does not really answer your question; however, for your initial need, you may use the backtrace() function from execinfo.h (it is provided by glibc, so it is available if you are using GCC on Linux, for example).
How to generate a stacktrace when my gcc C++ app crashes

File is getting really big; I need to separate data into another file but also need to use private variables. How can I achieve this correctly?

So I have a huge (legacy) file, call it HUGE.cxx. I'm adding a new feature, but the file is getting even bigger. I tried to create different classes for different jobs, but for some tasks I need to access the private variables. Here is a rough draft of what is going on:
//HUGE.h
class Huge {
    NewFeature object; // maps an id to a callback
    //...more stuff
};
//HUGE.cxx
Huge::Huge() {
    // imagine object keeps track of id -> callback
    object.on("uniqueID1", boost::bind(&Huge::onID1Clicked, this));
}
bool Huge::onID1Clicked() const { return satisfiesSomeCondition(); }
// called internally when the user right-clicks
void Huge::createPopup() const {
    for (auto itr = object.begin(); itr != object.end(); ++itr) {
        const auto& callback = itr->second;
        // if the condition is satisfied, add the id to the popup menu
        if (callback()) addToPopupMenu(itr->first);
    }
}
// event handler (the id type is assumed to be a string in this draft)
void Huge::event(const std::string& id) {
    // oh, uniqueID1 was clicked as a menu item in the right-click menu
    if (id == "uniqueID1") doSpecificFunctionality();
}
So you see, I have some dependencies going on there, but the file is so big, and so are my changes. Do you have any advice on further separating things out into more files? I know I can add a friend declaration to Huge and add another class, but I wanted to avoid that option if possible.
Sounds like you actually need a major refactor, separating concerns into their proper places.
But, to solve your immediate problem, there's no particular reason why all of Huge needs to be defined in Huge.cxx. You can split the function definitions into separate files, as long as every function is defined somewhere.
You might end up with:
Huge.h
Huge-private.cxx
Huge-public.cxx
Or however it makes sense to split your code.
As long as all the .cxx files include HUGE.h, and all the used functions are declared there (which should be the case), you can split up the implementation in as many .cxx files as you want. You could even put each function into its own file.
To call a function, the compiler only needs to see the prototype from HUGE.h. Later, when all the compiled files are linked together, the linker will combine the code from the different object files as appropriate.
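A minimal sketch of that layout, shown as one listing with comments marking where each file would begin. The members publicApi, privateHelper and secret_ are invented for the illustration:

```cpp
#include <cassert>

// ---- Huge.h: the single class definition that every .cxx file includes ----
class Huge {
public:
    int publicApi() const;        // defined in Huge-public.cxx
private:
    int privateHelper() const;    // defined in Huge-private.cxx
    int secret_ = 21;
};

// ---- Huge-private.cxx (would start with #include "Huge.h") ----
int Huge::privateHelper() const { return secret_ * 2; }

// ---- Huge-public.cxx (would start with #include "Huge.h") ----
int Huge::publicApi() const { return privateHelper(); }
```

Both definitions are members of Huge, so both can use private members regardless of which file they live in. No friend declaration is involved.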
Serious advice: Learn about refactoring (http://refactoring.com) and design patterns.
Without seeing the whole thing, it is hard or impossible to tell you something really specific. You probably need an arsenal of refactoring ammunition. For some parts, extracting methods and merging common functionality is the right thing; for other parts, dependency inversion may be the tool of choice.
Beyond some critical mass of mud, a (clean) rewrite might be the sanest and most profitable thing to do: begin by defining what the input and the expected output are (and write tests while doing so).

Looking for a C++ implementation of the C4.5 algorithm

I've been looking for a C++ implementation of the C4.5 algorithm, but I haven't been able to find one yet. I found Quinlan's C4.5 Release 8, but it's written in C... has anybody seen any open source C++ implementations of the C4.5 algorithm?
I'm thinking about porting the J48 source code (or simply writing a wrapper around the C version) if I can't find an open source C++ implementation out there, but I hope I don't have to do that! Please let me know if you have come across a C++ implementation of the algorithm.
Update
I've been considering the option of writing a thin C++ wrapper around the C implementation of the C5.0 algorithm (C5.0 is the improved version of C4.5). I downloaded and compiled the C implementation of the C5.0 algorithm, but it doesn't look like it's easily portable to C++. The C implementation uses a lot of global variables and simply writing a thin C++ wrapper around the C functions will not result in an object oriented design because each class instance will be modifying the same global parameters. In other words: I will have no encapsulation and that's a pretty basic thing that I need.
In order to get encapsulation I will need to make a full blown port of the C code into C++, which is about the same as porting the Java version (J48) into C++.
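To make the problem concrete, here is a minimal sketch. All names in it are invented for the illustration; it is not the C5.0 source. A thin wrapper over code that keeps its state in a global lets every instance overwrite the same value, while a ported class owns its own copy:

```cpp
#include <cassert>

// Stand-in for the C code: state lives in a global, as described above.
// (g_max_depth and the c_* functions are invented for this illustration.)
static int g_max_depth = 0;
void c_set_max_depth(int d) { g_max_depth = d; }
int c_get_max_depth() { return g_max_depth; }

// Thin wrapper: looks like an object, but holds no state of its own.
class ThinWrapper {
public:
    void setMaxDepth(int d) { c_set_max_depth(d); }
    int maxDepth() const { return c_get_max_depth(); }
};

// Ported class: each instance owns its parameters.
class PortedClassifier {
public:
    void setMaxDepth(int d) { max_depth_ = d; }
    int maxDepth() const { return max_depth_; }
private:
    int max_depth_ = 0;
};
```

With two ThinWrapper objects, setting the depth on the second silently changes what the first reports. Two PortedClassifier objects stay independent, which is also what concurrent training would need.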
Update 2.0
Here are some specific requirements:
Each classifier instance must encapsulate its own data (i.e. no global variables aside from constant ones).
Support the concurrent training of classifiers and the concurrent evaluation of the classifiers.
Here is a good scenario: suppose I'm doing 10-fold cross-validation, I would like to concurrently train 10 decision trees with their respective slice of the training set. If I just run the C program for each slice, I would have to run 10 processes, which is not horrible. However, if I need to classify thousands of data samples in real time, then I would have to start a new process for each sample I want to classify and that's not very efficient.
A C++ implementation for C4.5 called YaDT is available here, in the "Decision Trees" section:
http://www.di.unipi.it/~ruggieri/software.html
This is the source code for the last version:
http://www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.5.zip
From the paper where the tool is described:
[...] In this paper, we describe a new from-scratch C++ implementation of a decision tree induction algorithm, which yields entropy-based decision trees in the style of C4.5. The implementation is called YaDT, an acronym for Yet another Decision Tree builder. The intended contribution of this paper is to present the design principles of the implementation that allowed for obtaining a highly efficient system. We discuss our choices on memory representation and modelling of data and metadata, on the algorithmic optimizations and their effect on memory and time performances, and on the trade-off between efficiency and accuracy of pruning heuristics. [...]
The paper is available here.
I may have found a possible C++ "implementation" of C5.0 (See5.0), but I haven't been able to dig into the source code enough to determine if it really works as advertised.
To reiterate my original concerns, the author of the port states the following about the C5.0 algorithm:
Another drawback with See5Sam [C5.0] is the impossibility to have more than
one application tree at the same time. An application is read from
files each time the executable is run and is stored in global
variables here and there.
I will update my answer as soon as I get some time to look into the source code.
Update
It's looking pretty good, here is the C++ interface:
class CMee5
{
public:
/**
Create a See 5 engine from tree/rules files.
\param pcFileStem The stem of the See 5 file system. The engine
initialisation will look for the following files:
- pcFileStem.names Vanilla See 5 names file (mandatory)
- pcFileStem.tree or pcFileStem.rules Vanilla See 5 tree or rules
file (mandatory)
- pcFileStem.costs Vanilla See 5 costs file (mandatory)
*/
inline CMee5(const char* pcFileStem, bool bUseRules);
/**
Release allocated memory for this engine.
*/
inline ~CMee5();
/**
General classification routine accepting a data record.
*/
inline unsigned int classifyDataRec(DataRec Case, float* pOutConfidence);
/**
Show rules that were used to classify the last case.
Classify() will have set RulesUsed[] to
number of active rules for trial 0,
first active rule, second active rule, ..., last active rule,
number of active rules for trial 1,
first active rule, second active rule, ..., last active rule,
and so on.
*/
inline void showRules(int Spaces);
/**
Open file with given extension for read/write with the actual file stem.
*/
inline FILE* GetFile(String Extension, String RW);
/**
Read a raw case from file Df.
For each attribute, read the attribute value from the file.
If it is a discrete valued attribute, find the associated no.
of this attribute value (if the value is unknown this is 0).
Returns the array of attribute values.
*/
inline DataRec GetDataRec(FILE *Df, Boolean Train);
inline DataRec GetDataRecFromVec(float* pfVals, Boolean Train);
inline float TranslateStringField(int Att, const char* Name);
inline void Error(int ErrNo, String S1, String S2);
inline int getMaxClass() const;
inline int getClassAtt() const;
inline int getLabelAtt() const;
inline int getCWtAtt() const;
inline unsigned int getMaxAtt() const;
inline const char* getClassName(int nClassNo) const;
inline char* getIgnoredVals();
inline void FreeLastCase(void* DVec);
};
I would say that this is the best alternative I've found so far.
If I'm reading this correctly...it appears not to be organized as a C API, but as a C program. A data set is fed in, then it runs an algorithm and gives you back some rule descriptions.
I'd think the path you should take depends on whether you:
(1) merely want a C++ interface for supplying data and retrieving rules from the existing engine, or...
(2) want a C++ implementation that you can tinker with in order to tweak the algorithm to your own ends
If what you want is (1) then you could really just spawn the program as a process, feed it input as strings, and take the output as strings. That would probably be the easiest and most future-proof way of developing a "wrapper", and then you'd only have to develop C++ classes to represent the inputs and model the rule results (or match existing classes to these abstractions).
But if what you want is (2)...then I'd suggest trying whatever hacks you have in mind on top of the existing code in either C or Java--whichever you are most comfortable. You'll get to know the code that way, and if you have any improvements you may be able to feed them upstream to the author. If you build a relationship over the longer term then maybe you could collaborate and bring the C codebase slowly forward to C++, one aspect at a time, as the language was designed for.
Guess I just think the "When in Rome" philosophy usually works better than Port-In-One-Go, especially at the outset.
RESPONSE TO UPDATE: Process isolation takes care of your global variable issue. As for performance and data set size, you only have as many cores/CPUs and memory as you have. Whether you're using processes or threads usually isn't the issue when you're talking about matters of scale at that level. The overhead only becomes a problem if the marshalling is too expensive.
Prove the marshalling is the bottleneck, and to what extent... and you can build a case for why a process is a problem over a thread. But, there may be small tweaks to existing code to make marshalling cheaper which don't require a rewrite.

Parsing C++ to make some changes in the code

I would like to write a small tool that takes a C++ program (a single .cpp file), finds the "main" function and adds 2 function calls to it, one in the beginning and one in the end.
How can this be done? Can I use g++'s parsing mechanism (or any other parser)?
If you want to make it solid, use clang's libraries.
As suggested by some commenters, let me put forward my idea as an answer:
So basically, the idea is:
... original .cpp file ...
#include <yourHeader>
namespace {
SpecialClass specialClassInstance;
}
Where SpecialClass is something like:
class SpecialClass {
public:
SpecialClass() {
firstFunction();
}
~SpecialClass() {
secondFunction();
}
};
This way, you don't need to parse the C++ file. Since you are declaring a global, its constructor will run before main starts and its destructor will run after main returns.
The downside is that you don't get to know the relative order of when your global is constructed compared to others. So if you need to guarantee that firstFunction is called before any other constructor elsewhere in the entire program, you're out of luck.
I've heard the GCC parser is both hard to use and even harder to get at without invoking the whole toolchain. I would try the clang C/C++ parser (libparse), and the tutorials linked in this question.
Adding a function call at the beginning of main() and another at the end of main() is a bad idea. What if someone returns in the middle?
A better idea is to instantiate a class at the beginning of main() and let that class's destructor call the function you want called at the end. This ensures that the function always gets called.
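A small sketch of that idea, reusing the firstFunction and secondFunction names from the earlier answer. MainGuard, realMain and the events string are invented here, and realMain only stands in for main() so the behaviour can be checked:

```cpp
#include <cassert>
#include <string>

std::string events;  // hypothetical record of what ran, for illustration

void firstFunction()  { events += "first;"; }
void secondFunction() { events += "second;"; }

// Calls firstFunction on construction and secondFunction on destruction.
struct MainGuard {
    MainGuard()  { firstFunction(); }
    ~MainGuard() { secondFunction(); }
    MainGuard(const MainGuard&) = delete;
    MainGuard& operator=(const MainGuard&) = delete;
};

// Stands in for main(); the guard is the first statement in the body.
int realMain(bool leaveEarly) {
    MainGuard guard;
    if (leaveEarly) return 1;   // secondFunction still runs here
    events += "work;";
    return 0;
}
```

One caveat: a call to std::exit() does not unwind the stack, so the destructor of an automatic object such as the guard would not run on that path.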
If you have control of your main program, you can hack a script to do this, and that's by far the easiest way. Simply make sure the insertion points are obvious (odd comments, required placement of tokens, you choose) and unique (including outlawing general coding practices if you have to, to ensure the uniqueness you need is real). Then a dumb string-hacking tool that reads the source, finds the unique markers, and inserts your desired calls will work fine.
If the source of the main program comes from other sources, and you don't have control, then to do this well you need a full C++ program transformation engine. You don't want to build this yourself, as just the C++ parser is an enormous effort to get right. Others here have mentioned Clang and GCC as answers.
An alternative is our DMS Software Reengineering Toolkit with its C++ front end. DMS, using its C++ front end, can parse code (for a variety of C++ dialects), build ASTs, and carry out full name/type resolution to determine the meaning/definition/use of all symbols. It provides procedural and source-to-source transformations to enable changes to the AST, and can regenerate compilable source code complete with original comments.

Make a variable unavailable in a portion of code

From time to time, I want, as a safety check, to check that a variable v is not used in some portion of code, or in the remainder of some function, even though it is still visible in the scope of this function/portion of code. For instance:
int x;
// do something with x
DEACTIVATE(x);
// a portion of code which should not use x
ACTIVATE(x);
// do something else with x
Is there a good way to perform that type of verification at compile time?
NOTE: I know that one should always use a scope that is as small as possible for each variable, but there are cases where pushing this practice to an extreme can become cumbersome, and such a tool would be useful.
Thanks!
The best way to achieve this is to actually have small scopes in your code, i.e. use short, focused methods which do one thing only. This way you tend to have few local variables per each individual method, and they go out of scope automatically once you don't need them.
If you have long legacy methods which make you worry about this problem, the best long-term solution is to refactor them by extracting smaller chunks of functionality into separate methods. Most modern IDEs have automated refactoring support which lowers the risk of introducing bugs with such changes - although the best is of course to have a proper set of unit tests to ensure you aren't breaking anything.
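Applied to the example from the question, a minimal sketch of that refactoring might look like this. The names compute and middlePortion are made up, and the arithmetic is only a placeholder:

```cpp
#include <cassert>

// The middle portion, extracted: x is simply not in scope here, so any
// accidental use of it is a compile error.
int middlePortion(int input) {
    // return input + x;   // would not compile: x is not declared here
    return input * 2;
}

int compute(int input) {
    int x = input + 1;                // do something with x
    int mid = middlePortion(input);   // portion that must not use x
    return x + mid;                   // do something else with x
}
```

The compiler performs the check for free, with no DEACTIVATE or ACTIVATE macros needed; the cost is that everything the middle portion does need has to be passed in explicitly.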
Recommended reading is Clean Code.
Use
#define v #
..
#undef v
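This should do it: a stray # is not valid in ordinary code, so any use of v between the #define and the #undef should fail to compile, and # is very unlikely to conflict with any other variable name, keyword or operator.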
As far as I know, there is no such compile-time verification. Maybe you can verify it yourself using grep. I think the best way is to split your function into two functions: one uses the variable, and the other cannot see it. That's one of the reasons why we need functions.