Managing command line arguments - c++

I am updating some old code whose original authors decided that all the command-line argument variables should be globals. This obviously makes things more challenging from a testing and development standpoint.
My question is how best to manage command-line arguments that all classes need to use (for example, a trace or debugging flag). A coworker suggested at the very least wrapping the variables in a namespace, but that doesn't seem sufficient. I thought about a singleton or static class that just provides getters, but that doesn't seem very elegant. On the other hand, it seems better than having to pass 5 configuration options to every class that needs to know whether debugging and a handful of other options are set.

The biggest problem with global variables is that changing them from within a function tends to become an unexpected side effect which introduces bugs. In the case of command line arguments, however, they are essentially constants as far as the running process is concerned. The only thing preventing you from declaring them const is that they need to be assigned when you are parsing the command line in the beginning.
I would suggest creating some mechanism that allows you to initialize the arguments in the beginning, but then prevents any part of the program from ever changing them. That will effectively avoid any disadvantage that global variables would normally introduce.
One way might be a ProgramArguments class/struct with const members that are initialized in the constructor, by parsing the command line. You could then have something like:
#include <iostream>
#include <memory>

// ProgramArguments is assumed to parse argc/argv in its constructor.
std::unique_ptr<ProgramArguments const> g_program_arguments;

int main(int argc, char* argv[])
{
    g_program_arguments.reset(new ProgramArguments(argc, argv));
    if (g_program_arguments->verbose)
        std::cout << "verbose!" << std::endl;
    // ...
    return 0;
}
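That wouldn't prevent you from changing the pointer to point to a different ProgramArguments instance, however. Another way might be to temporarily cast away the constness for initialization purposes. Be aware that writing to an object that was itself defined const is undefined behavior in C++, even through a const_cast, so treat this as an illustration of the idea rather than something to rely on: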
#include <iostream>

struct ProgramArguments {
    ProgramArguments() {}
    bool verbose;
};

ProgramArguments const g_program_arguments;

void init_program_arguments(int argc, char* argv[])
{
    // Cast away constness once, during start-up only.
    ProgramArguments& program_arguments = const_cast<ProgramArguments&>(g_program_arguments);
    program_arguments.verbose = true;
}

int main(int argc, char* argv[])
{
    init_program_arguments(argc, argv);
    if (g_program_arguments.verbose)
        std::cout << "verbose!" << std::endl;
    return 0;
}
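A variant of the same idea that never writes to a const object: keep the mutable instance private to one translation unit and hand out only a const reference. This is a minimal sketch, not the answerer's code; the "--verbose" flag, the program_arguments() accessor and the use of std::optional (C++17) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <optional>

struct ProgramArguments {
    bool verbose = false;
};

namespace {
// Mutable storage with internal linkage: only this file can write to it.
std::optional<ProgramArguments> g_storage;
}

// Parse the command line once, at start-up (hypothetical "--verbose" flag).
void init_program_arguments(int argc, const char* const argv[])
{
    assert(!g_storage && "arguments must be initialized only once");
    ProgramArguments parsed;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--verbose") == 0)
            parsed.verbose = true;
    }
    g_storage = parsed;
}

// Read-only access for the rest of the program.
const ProgramArguments& program_arguments()
{
    assert(g_storage && "init_program_arguments() has not been called");
    return *g_storage;
}
```

main() would call init_program_arguments(argc, argv) first; everything else only ever sees the const reference returned by program_arguments().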

This will depend on the number of globals we are talking about. Personally, I think it is fine to have a few globals for things like debug flags and, say, a singleton Log Manager.
If you really want to respect OOP principles by the book, then you would have to pass everything a function or object needs as parameters and never access global state. As you have mentioned, passing a lot of common parameters to every function gets tedious pretty quickly, so one pattern that might help you alleviate this is the Context Object.
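A Context Object could look roughly like the sketch below. The Context and Parser names, and the fields they carry, are hypothetical; the point is only that one reference replaces a handful of separate parameters.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical context: bundles the options that many classes need.
struct Context {
    bool debug = false;
    bool trace = false;
    std::ostream* log = nullptr; // where diagnostics go, if anywhere
};

class Parser {
public:
    explicit Parser(const Context& ctx) : ctx_(ctx) {}

    // Returns the number of bytes "parsed"; logs only when debugging is on.
    int parse(const std::string& text) const
    {
        if (ctx_.debug && ctx_.log)
            *ctx_.log << "parsing " << text.size() << " bytes\n";
        return static_cast<int>(text.size());
    }

private:
    const Context& ctx_; // one reference instead of five parameters
};
```

Because the context is injected, a test can build its own Context (for example with a std::ostringstream as the log) without touching any global state.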

Writing a Clang-Tidy check, how do I find if a pointer is initialized before it is called?

I am trying to port some static checks from an old in-house C++ static checker to clang-tidy.
Since I am really new to this tool, I am having a really hard time doing it and I am starting to think that it's not the proper tool to do what I want.
So basically what I am currently trying to implement, is a check on pointer initialization. I want to verify that a local pointer is properly initialized before being used.
For example if I take this sample of code:
void method(const char *);
int main(int argc, char **argv){
    const char * ptNotInit;
    const char * ptInit = "hello";
    method(ptNotInit);
    method(ptInit);
    return 0;
}
I want to get an error on method(ptNotInit) because I am passing an uninitialized pointer to method.
At first I tried a very simple matcher:
Finder->addMatcher(varDecl(hasType(pointerType())).bind("pointerDeclaration"),this);
// and
const auto *MatchedPtDecl = Result.Nodes.getNodeAs<VarDecl>("pointerDeclaration");
if ( MatchedPtDecl->hasInit() == false )
// Do an error
So I got an error on ptNotInit and argv, so I added MatchedPtDecl->isLocalVarDecl() and all seemed fine.
Except that when I add the following to my code sample:
ptNotInit = "Hello again";
method(ptNotInit);
I still get an error on ptNotInit when I obviously initialized it just before the call to method.
I suppose that the VarDecl method hasInit() only applies to the declaration of the variable, which would explain why it returns false?
So my question is, how can I know when calling method(ptNotInit) if ptNotInit was initialized?
Clang-tidy seems powerful to find something, but I don't know how to find the lack of something, if you see what I mean...
I tried to write a more complex matcher to find initializations, like this one:
Finder->addMatcher(binaryOperator(hasOperatorName("="),hasLHS(declRefExpr(hasType(pointerType()),hasDeclaration(varDecl().bind("pointerDeclaration"))).bind("affectation"))), this);
If my pointer is on the left of an = operator, that should be an initialization... OK, why not, but in the end I want to know that there is no initialization; I don't want to match initialization syntax... Maybe I am approaching the problem backwards.
Any tips would help, or if you can point me to an already implemented checker doing something similar, that would be a great help!

C++: How to pass user input through the system without using global variables?

I have the problem that my application can take a lot of user input which determines how the application will run. The application is an in-memory database system, and the user could, for example, invoke the program with options like '--pagesize 16384' (sets the memory page size to use), '--alignment 4096' (sets the memory alignment to use) or '--measure' (sets a flag to measure certain routines).
Currently I save all the user input in global variables which are defined as extern in a header file:
//#file common.hh
extern size_t PAGE_SIZE_GLOBAL;
extern size_t ALIGNMENT_GLOBAL;
extern size_t MEMCHUNK_SIZE_GLOBAL;
extern size_t RUNS_GLOBAL;
extern size_t VECTORIZE_SIZE_GLOBAL;
extern bool MEASURE_GLOBAL;
extern bool PRINT_GLOBAL;
extern const char* PATH_GLOBAL;
and in main source file:
#include "modes.hh"
size_t PAGE_SIZE_GLOBAL;
size_t ALIGNMENT_GLOBAL;
size_t MEMCHUNK_SIZE_GLOBAL;
size_t RUNS_GLOBAL;
size_t VECTORIZE_SIZE_GLOBAL;
bool MEASURE_GLOBAL;
bool PRINT_GLOBAL;
const char* PATH_GLOBAL;
int main(const int argc, const char* argv[]){
    ...
    //Initialize the globals with user input
    PAGE_SIZE_GLOBAL = lArgs.pageSize();
    ALIGNMENT_GLOBAL = lArgs.alignment();
    MEMCHUNK_SIZE_GLOBAL = lArgs.chunkSize();
    RUNS_GLOBAL = lArgs.runs();
    VECTORIZE_SIZE_GLOBAL = lArgs.vectorized();
    MEASURE_GLOBAL = lArgs.measure();
    PRINT_GLOBAL = lArgs.print();
    std::string tmp = lArgs.path() + storageModel + "/";
    PATH_GLOBAL = tmp.c_str();
    ...
}
I then include the header file common.hh in each file, where a global variable is needed (which can be very deep down in the system).
I have already read a dozen times that global variables should be avoided, so this is obviously bad style. In the book 'Code Complete 2' by Steve McConnell, the chapter about global variables also says to avoid global variables and use access routines instead. In the section 'How to Use Access Routines' he writes
"Hide data in a class. Declare that data by using the static keyword
(...) to ensure only a single instance of the data exists. Write
routines that let you look at the data and change it."
First of all, the global data won't change (maybe this is changed later but at least not in the near future). But I don't get how these access routines are any better? I will also have a class I need to include at every file where the data is needed. The only difference is the global data are static members accessed through getter functions.
(Edited) I also thought about using a global data Singleton class. But an object with ALL the global data sounds overkill since only a few global variables of the object are needed at its different destinations.
My Question: Should I just stick to the global variables? Are there better solutions, what am I missing? What are the best practices?
Edit:
If I identified a few classes where the user input is needed the most, I could change the global data to member variables. What would be the best practice to pass the user input to these classes? Passing the data as parameters through the whole system down to the lowest layers sounds wrong. Is there a design pattern (thinking about something like a factory) which would be suited here?
How to pass user input through the system without using global variables.
It is easy. Surprise, I created a class.
For a while, I called this class a travel case, because I considered it analogous to the needs of a suitcase during a trip. The TC_t is a non-standard container which held useful things for what is going on at your destination, and there is only one created, with references passed to any other objects that could use the information. Not global, in the strictest sense.
This TC_t is created in main() thread, while studying the command line options.
I recently wrote yet-another-game-of-life. User inputs included a) destination of output (i.e. a tty num), b) initial fill-pattern choices, c) 'overrides' for game board dimensions, d) test modes, including max speed, and vector vs. array options for cell behaviours.
The GOLUtil_t (Game Of Life Utility) (previously TC_t) includes methods that are useful in more than one effort.
For your question, the two typical globals I avoided are the a) gameBoard, and b) ansi terminal access.
std::cout << "accessing '" << aTermPFN << "' with std::ofstream "
          << std::endl;
std::ofstream* ansiTerm = new std::ofstream(aTermPFN);
if (!ansiTerm->is_open())
{
    dtbAssert(nullptr != ansiTerm)(aTermPFN);
    std::cerr << "Can not access '" << aTermPFN << "'" << std::endl;
    assert(0); // abort
}
// create game-board - with a vector of cell*
CellVec_t gameBoard;
gameBoard.reserve (aMaxRow * aMaxCol);
GOLUtil_t gBrd(aMaxRow, aMaxCol, gameBoard, *ansiTerm);
This last line invoked the ctor of GOLUtil_t.
The instance "gBrd" is then passed (by reference) to the ctor of the game, and from there, to any aggregate objects it contained.
std::string retVal;
{
    // initialize display, initialize pattern
    GameOfLife_t GOL(gBrd, timeOfDay, fillPatternChoiceLetter, useArray);
    std::string retValS = GOL.exec2(testMode);
    retVal = gBrd.clearGameBoard(retValS); // delete all cells
}
// force GameOfLife_t dtor before close ansiTerm
ansiTerm->close();
Summary - No globals.
Every instance of any class that needed this info (where to output? what are dimensions?) has access to the GOLUtil_t for their entire lifetime. And GOLUtil_t has methods to lighten the coding load.
Note: because there is a single output terminal, I used a single thread (main).
Your first refactor effort might be to:
a) remove the global classes,
b) and instead instantiate these in main() (for lifetime control)
c) and then pass-by-reference these formerly global instances to those non-global objects that make use of them. I recommend in the ctor(s).
d) remember to clean up (delete if new'd)
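Applied to the question's globals, steps a) to c) might look like the following sketch. Settings, BufferManager and Database are hypothetical names, and the default values are placeholders; only pageSize, alignment, measure and path echo the options from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical settings bundle replacing PAGE_SIZE_GLOBAL and friends.
struct Settings {
    std::size_t pageSize = 4096;
    std::size_t alignment = 64;
    bool measure = false;
    std::string path;
};

// A low-level component keeps a const reference: it can read, not modify.
class BufferManager {
public:
    explicit BufferManager(const Settings& s) : settings_(s) {}

    // Number of whole pages needed to hold `bytes` (rounded up).
    std::size_t pagesFor(std::size_t bytes) const
    {
        return (bytes + settings_.pageSize - 1) / settings_.pageSize;
    }

private:
    const Settings& settings_;
};

// A higher layer forwards the same reference to the objects it owns.
class Database {
public:
    explicit Database(const Settings& s) : settings_(s), buffers_(s) {}

    std::size_t pagesFor(std::size_t bytes) const { return buffers_.pagesFor(bytes); }
    bool measuring() const { return settings_.measure; }

private:
    const Settings& settings_;
    BufferManager buffers_;
};
```

The Settings instance would live in main(), filled from the parsed command line, so it outlives every object that holds a reference to it.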
my environment: Ubuntu 15.10, 64 bit, g++ V5

Bad practice to call static function from external file via function pointer?

Consider the following code:
file_1.hpp:
typedef void (*func_ptr)(void);
func_ptr file1_get_function(void);
file_1.cpp:
// file_1.cpp
#include "file_1.hpp"
static void some_func(void)
{
    do_stuff();
}
func_ptr file1_get_function(void)
{
    return some_func;
}
file_2.cpp:
#include "file_1.hpp"
void file2_func(void)
{
    func_ptr function_pointer_to_file1 = file1_get_function();
    function_pointer_to_file1();
}
While I believe the above example is technically possible - to call a function with internal linkage only via a function pointer, is it bad practice to do so? Could there be some funky compiler optimizations that take place (auto inline, for instance) that would make this situation problematic?
There's no problem, this is fine. In fact, IMHO, it is a good practice which lets your function be called without polluting the space of externally visible symbols.
It would also be appropriate to use this technique in the context of a function lookup table, e.g. a calculator which passes in a string representing an operator name, and expects back a function pointer to the function for doing that operation.
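A minimal sketch of such a lookup table, assuming integer operands and the made-up names add, sub, mul, kOps and lookup_operator. Only lookup_operator has external linkage; the operations themselves stay static.

```cpp
#include <cassert>
#include <cstring>

typedef int (*binary_op)(int, int);

// Internal linkage: these names are not visible to other translation units.
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

struct OpEntry {
    const char* name;
    binary_op fn;
};

static const OpEntry kOps[] = {
    {"add", add},
    {"sub", sub},
    {"mul", mul},
};

// The only externally visible symbol: maps an operator name to its function.
// Returns nullptr for an unknown name.
binary_op lookup_operator(const char* name)
{
    for (const OpEntry& e : kOps) {
        if (std::strcmp(e.name, name) == 0)
            return e.fn;
    }
    return nullptr;
}
```

A caller in another file would declare lookup_operator in a header and check the result for nullptr before calling through it.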
The compiler/linker isn't allowed to make optimizations which break correct code and this is correct code.
Historical note: back in C89, externally visible symbols had to be unique on the first 6 characters; this was relaxed in C99 and also commonly by compiler extension.
In order for this to work, you have to expose some portion of it as external and that's the clue most compilers will need.
Is there a chance that there's a broken compiler out there that will make mincemeat of this strange practice because they didn't foresee someone doing it? I can't answer that.
I can only think of false reasons to want to do this though: Finger print hiding, which fails because you have to expose it in the function pointer decl, unless you are planning to cast your way around things, in which case the question is "how badly is this going to hurt".
The other reason would be facading callbacks - you have some super-sensitive static local function in module m and you now want to expose the functionality in another module for callback purposes, but you want to audit that so you want a facade:
static void voodoo_function() {
}
fnptr get_voodoo_function(const char* file, int line) {
    // you tagged the question as C++, so C++ io it is.
    std::cout << "requested voodoo function from " << file << ":" << line << "\n";
    return voodoo_function;
}
...
// question tagged as c++, so I'm using c++ syntax
auto* fn = get_voodoo_function(__FILE__, __LINE__);
but that's not really helping much, you really want a wrapper around execution of the function.
At the end of the day, there is a much simpler way to expose a function pointer. Provide an accessor function.
static void voodoo_function() {}
void do_voodoo_function() {
    // provide external access to voodoo
    voodoo_function();
}
Because here you provide the compiler with an optimization opportunity - when you link, if you specify whole program optimization, it can detect that this is a facade that it can eliminate, because you let it worry about function pointers.
But is there a really compelling reason not just to remove the static from in front of voodoo_function, other than not exposing the internal name for it? And if so, why is the internal name so precious that you would go to these lengths to hide it?
static void ban_account_if_user_is_ugly() {
    ...;
}
void do_that_thing() {
    ban_account_if_user_is_ugly();
}
vs
void do_that_thing() { // ban account if user is ugly
    ...
}
--- EDIT ---
Conversion. Your function pointer is int(*)(int) but your static function is unsigned int(*)(unsigned int) and you don't want to have to cast it.
Again: Just providing a facade function would solve the problem, and it will transform into a function pointer later. Converting it to a function pointer by hand can only be a stumbling block for the compiler's whole program optimization.
But if you're casting, let's consider this:
// v1: cast the function pointer
fnptr get_fn_ptr() {
    // brute force cast because otherwise it's 'hassle'
    return (fnptr)(static_fn);
}
// v2: facade function
int facade_fn(int i) {
    auto ui = static_cast<unsigned int>(i);
    auto result = static_fn(ui);
    return static_cast<int>(result);
}
Ok, unsigned to signed, not a big deal. And then someone comes along and changes what fnptr needs to be to void(int, float);. The cast (v1) becomes a weird runtime crash and the facade (v2) becomes a compile error.

Several specific methods or one generic method?

This is my first question after a long time reading this marvelous website.
My question is probably a little silly, but I want to know others' opinions about this. What is better: to create several specific methods or, on the other hand, only one generic method? Here is an example...
unsigned char *Method1(CommandTypeEnum command, ParamsCommand1Struct *params)
{
    if(params == NULL) return NULL;
    // Construct a string (command) with those specific params (params->element1, ...)
    return buffer; // buffer is a member of the class
}
unsigned char *Method2(CommandTypeEnum command, ParamsCommand2Struct *params)
{
    ...
}
unsigned char *Method3(CommandTypeEnum command, ParamsCommand3Struct *params)
{
    ...
}
unsigned char *Method4(CommandTypeEnum command, ParamsCommand4Struct *params)
{
    ...
}
or
unsigned char *Method(CommandTypeEnum command, void *params)
{
    switch(command)
    {
        case CMD_1:
        {
            if(params == NULL) return NULL;
            ParamsCommand1Struct *value = (ParamsCommand1Struct *) params;
            // Construct a string (command) with those specific params (params->element1, ...)
            return buffer;
        }
        break;
        // ...
        default:
        break;
    }
    return NULL; // no command matched
}
The main thing I do not really like of the latter option is this,
ParamsCommand1Struct *value = (ParamsCommand1Struct *) params;
because "params" might not be a pointer to "ParamsCommand1Struct" but a pointer to "ParamsCommand2Struct" or something else.
I really appreciate your opinions!
General Answer
In Writing Solid Code, Steve Maguire's advice is to prefer distinct functions (methods) for specific situations. The reason is that you can assert conditions that are relevant to the specific case, and you can more easily debug because you have more context.
An interesting example is the standard C run-time's functions for dynamic memory allocation. Most of it is redundant, as realloc can actually do (almost) everything you need. If you have realloc, you don't need malloc or free. But when you have such a general function, used for several different types of operations, it's hard to add useful assertions, it's harder to write unit tests, and it's harder to see what's happening when debugging. Maguire takes it a step further and suggests that, not only should realloc just do reallocation, but it should probably be two distinct functions: one for growing a block and one for shrinking a block.
While I generally agree with his logic, sometimes there are practical advantages to having one general-purpose method (often when the operations are highly data-driven). So I usually decide on a case-by-case basis, with a bias toward creating very specific methods rather than overly general-purpose ones.
Specific Answer
In your case, I think you need to find a way to factor out the common code from the specifics. The switch is often a signal that you should be using a small class hierarchy with virtual functions.
If you like the single method approach, then it probably should be just a dispatcher to the more specific methods. In other words, each of those cases in the switch statement simply calls the appropriate Method1, Method2, etc. If you want the user to see only the general-purpose method, then you can make the specific implementations private methods.
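As a rough sketch of that dispatcher shape: the class name CommandDispatcher, the struct fields and the std::string return type (instead of the question's unsigned char* buffer) are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

enum CommandTypeEnum { CMD_1, CMD_2 };
struct ParamsCommand1Struct { int element1; };
struct ParamsCommand2Struct { int element1; int element2; };

class CommandDispatcher {
public:
    // General-purpose entry point: it only dispatches.
    // The caller must pass the struct type that matches `command`.
    std::string Method(CommandTypeEnum command, const void* params) const
    {
        if (params == nullptr)
            return std::string();
        switch (command) {
        case CMD_1:
            return Method1(*static_cast<const ParamsCommand1Struct*>(params));
        case CMD_2:
            return Method2(*static_cast<const ParamsCommand2Struct*>(params));
        }
        return std::string(); // unknown command value
    }

private:
    // Specific implementations: each can validate its own parameters.
    std::string Method1(const ParamsCommand1Struct& p) const
    {
        return "CMD1 " + std::to_string(p.element1);
    }
    std::string Method2(const ParamsCommand2Struct& p) const
    {
        return "CMD2 " + std::to_string(p.element1) + " " + std::to_string(p.element2);
    }
};
```

The cast is still there, but it is confined to one place, and all real work happens in typed functions that can be asserted on and tested individually.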
Generally, it's better to offer separate functions, because their prototype names and arguments communicate directly and visibly to the user what is available; this also leads to more straightforward documentation.
The one time I use a multi-purpose function is for something like a query() function, where a number of minor query functions, rather than leading to a proliferation of functions, are bundled into one, with a generic input and output void pointer.
In general, think about what you're trying to communicate to the API user by the API prototypes themselves; a clear sense of what the API can do. He doesn't need excessive minutiae; he does need to know the core functions which are the entire point of having the API in the first place.
First off, you need to decide which language you are using. Tagging the question with both C and C++ here makes no sense. I am assuming C++.
If you can create a generic function then of course that is preferable (why would you prefer multiple, redundant functions?). The question is: can you? However, you seem to be unaware of templates. We need to see what you have omitted here to tell you whether templates are suitable, however:
// Construct a string (command) with those specific params (params->element1, ...)
In the general case, assuming templates are appropriate, all of that turns into:
template <typename T>
unsigned char *Method(CommandTypeEnum command, T *params) {
// more here
}
On a side note, how is buffer declared? Are you returning a pointer to dynamically allocated memory? Prefer RAII type objects and avoid dynamically allocating memory like that if so.
If you are using C++ then I would avoid using void* as you don't really need to. There is nothing wrong with having multiple methods. Note that you don't actually have to rename the function in your first set of examples - you can just overload a function using different parameters so that there is a separate function signature for each type. Ultimately, this kind of question is very subjective and there are a number of ways of doing things. Looking at your functions of the first type, you would perhaps be well served by looking into the use of templated functions.
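The overloading suggestion could be sketched as follows. CommandBuilder, the struct fields and the std::string return type are assumptions for illustration; the question's code returns a pointer into a member buffer instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical parameter structs, one per command.
struct ParamsCommand1Struct { int element1; };
struct ParamsCommand2Struct { std::string name; int count; };

class CommandBuilder {
public:
    // Same name, different parameter types: the compiler picks the overload,
    // so neither a void* nor a cast is needed.
    std::string Method(const ParamsCommand1Struct& p) const
    {
        return "CMD1 " + std::to_string(p.element1);
    }

    std::string Method(const ParamsCommand2Struct& p) const
    {
        return "CMD2 " + p.name + " " + std::to_string(p.count);
    }
};
```

Passing the wrong struct type for a command is no longer possible here: a mismatch is rejected at compile time instead of being reinterpreted at run time.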
You could create a struct. That's what I use to handle console commands.
typedef int (* pFunPrintf)(const char*,...);
typedef void (CommandClass::*pKeyFunc)(char *,pFunPrintf);
struct KeyCommand
{
    const char * cmd;
    unsigned char cmdLen;
    pKeyFunc pfun;
    const char * Note;
    long ID;
};
#define CMD_FORMAT(a) a,(sizeof(a)-1)
static KeyCommand Commands[]=
{
    {CMD_FORMAT("one"), &CommandClass::CommandOne, "String Parameter",0},
    {CMD_FORMAT("two"), &CommandClass::CommandTwo, "String Parameter",1},
    {CMD_FORMAT("three"), &CommandClass::CommandThree, "String Parameter",2},
    {CMD_FORMAT("four"), &CommandClass::CommandFour, "String Parameter",3},
};
#define AllCommands (sizeof(Commands)/sizeof(KeyCommand))
And the Parser function
void CommandClass::ParseCmd( char* Argcommand )
{
    unsigned int x;
    for ( x=0;x<AllCommands;x++)
    {
        if(!memcmp(Commands[x].cmd,Argcommand,Commands[x].cmdLen ))
        {
            (this->*Commands[x].pfun)(&Argcommand[Commands[x].cmdLen],&::printf);
            break;
        }
    }
    if(x==AllCommands)
    {
        // Unknown command
    }
}
I use a thread safe printf pPrintf, so ignore it.
I don't really know what you want to do, but in C++ you probably should derive multiple classes from a Formatter Base class like this:
class Formatter
{
public:
    virtual ~Formatter() {}
    virtual void Format(unsigned char* buffer, Command command) const = 0;
};
class YourClass
{
public:
    void Method(Command command, const Formatter& formatter)
    {
        formatter.Format(buffer_, command);
    }
private:
    unsigned char* buffer_;
};
int main()
{
    // Params1Formatter is assumed to derive from Formatter.
    Params1Formatter formatter(/*...*/);
    YourClass yourObject;
    yourObject.Method(CommandA, formatter);
    // ...
}
This removes the responsibility for handling all that params stuff from your class and makes it closed for changes. If there are new commands or parameters during further development, you don't have to modify (and possibly break) existing code; you just add new classes that implement the new stuff.
While not a full answer, this should guide you in the right direction: ONE FUNCTION, ONE RESPONSIBILITY. Prefer code that is responsible for one thing only and does it well. Code with a huge switch statement (which is not bad by itself) where you need to cast void * to some other type is a smell.
By the way, I hope you do realise that according to the standard you can cast from void * to <type> * only when the original cast was exactly from <type> * to void *.

Calling Function Overwrites Value

I have several configuration flags that I am implementing as structs. I create an object. I call a method of the object with a flag, which eventually triggers a comparison between two flags. However, by this time, one of the flags has been overwritten somehow.
To clarify, here's a VERY simplified version of the code that should illustrate what I'm seeing:
class flag_type { unsigned int flag; /*more stuff*/ };
flag_type FLAG1;
flag_type FLAG2;
class MyObject {
public:
    void method1(const flag_type& flag_arg) {
        //conditionals, and then:
        const flag_type flag_args[2] = {flag_arg,flag_arg};
        method2(flag_args);
    }
    void method2(const flag_type flag_args[2]) {
        //conditionals, and then:
        method3(flag_args[0]);
    }
    void method3(const flag_type& flag_arg) { //Actually in a superclass
        //stuff
        if (flag_arg==FLAG1) { /*stuff*/ }
        //stuff
    }
};
int main(int argc, const char* argv[]) {
    //In some functions called by main:
    MyObject* obj = new MyObject();
    //Later in some other functions:
    obj->method1(FLAG1);
}
With a debugger and print statements, I can confirm that both FLAG1 and flag_arg/flag_args are fine in both "method1" and "method2". However, when I get to method3, "FLAG1.flag" has been corrupted, so the comparison fails.
Now, although I'm usually stellar about not doing it, and it passes MSVC's static code analysis on strictest settings, this to me looks like the behavior of a buffer overrun.
I haven't found any such error by looking, but of course one usually doesn't. My question is:
A: Am I screwing up somewhere else? I realize I'm not sharing any real code, but am I missing something already? This scheme worked before I rewrote a large portion of the code.
B: Is there an easier way than picking through the code more carefully until I find it? The code is cross-platform, so I'm already setting it up to check with Valgrind on an Ubuntu box.
Thanks to those who tried to help. It should be noted, though, that the code was for clarification purposes only; I typed it from scratch to show generally what was happening, not to compile. In retrospect, I realize it wasn't fair to ask people to solve it on so little information--though my actual question "Is there an easier way than picking through the code more carefully" didn't really concern actually solving the problem--just how to approach it.
As to this question, on Ubuntu Linux, I got "stack smashing" which told me more or less where the problem occurred. Interestingly, the traceback for stack smashing was the most helpful. Long story short, it was an embarrassingly basic error; strcpy was overflowing (in the operators for ~, | and &, the flags have a debug string set this way). At least it wasn't me who wrote that code. Always use strncpy, people :P
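One caveat with strncpy: it does not write a terminating null when the source is at least as long as the limit, so the terminator has to be added by hand. A minimal sketch of a bounded copy, assuming a hypothetical flag_type with an 8-byte debug string (the real layout is not shown in the post):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical flag type with a fixed-size debug string.
struct flag_type {
    unsigned int flag;
    char debug_name[8];
};

// Copies at most sizeof(debug_name) - 1 characters and always terminates,
// so an over-long name is truncated instead of overwriting adjacent memory.
void set_debug_name(flag_type& f, const char* name)
{
    std::strncpy(f.debug_name, name, sizeof(f.debug_name) - 1);
    f.debug_name[sizeof(f.debug_name) - 1] = '\0';
}
```

In new C++ code a std::string member would sidestep the fixed-size buffer entirely.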