I'm trying to do some refactoring and wish to figure out the best path forward.
I have
myonce{
static int i //for operation 1
switch(commandid) {
case 1: operation 1
i = 1;
...
where myonce is a function that is called in a loop. This is not my code; I'm trying to make it better. Operation 1 (like each case) is a series of commands, and I want to put each one in its own translation unit (one function per file).
Since myonce runs in a loop, the original author has many static variables that he uses to keep state, some of these state sets are used across multiple operations. Note that these are not static file scope, they are static block scope.
To keep things simple, as a proof of concept, I want to know if the following is possible.
Consider 1 operation with 1 set of static vars.
main.cpp
myonce {
static int i //for op 1
switch(commandid) {
case 1: operation1();
operation1.cpp
extern int i;
void operation1() {
i = 1;
}
In the case of multiple operations using the same sets of state, I would make a header to declare them all extern.
Currently, compiling this file takes minutes, and my first goal is to break it up into smaller compilation units so that the author can work more freely. This refactoring will take a long time, but I mention it to explain my motivation for this approach.
I understand that a static file-scope variable is not accessible to other translation units (via extern in other files), so I want to make clear that this is not the case I'm handling. What I don't know at the moment is how I should declare operation1() in main. Should it be
static int i
extern void operation1();
So that int is declared as visible to the function?
I would appreciate any pointers in this regard. Thanks.
Put the state variables into a struct. Pass this struct to each function.
Example.
// foo.h
struct TheState
{
int x;
char *y;
// ...
};
void func1(TheState &);
void func2(TheState &);
// main.cc
#include "foo.h"
void main_loop()
{
TheState the_state; // initialize this however you want
for (;;)
{
if ( blah) func1(the_state);
else func2(the_state);
}
}
// func1.cc
#include "foo.h"
void func1(TheState &the_state)
{
++the_state.x;
}
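No, you can't do that. Static objects aren't visible in other source files, ever. A block-scope static has no linkage, so extern int i; in operation1.cpp would refer to a different, undefined variable and fail at link time.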
How large is your switch anyway? And what is the reason for modifying it?
Perhaps the original programmer had good reasons for the local, static variables? You say it is called in a loop, and some of the static variables are used to keep state from one iteration to the next, shared among branches of the switch. It is certainly a weird way to structure the code. I can think of doing something like this to run some sort of finite automaton, but in that case I'd write the automaton as a string of snippets of code for each state, and transfer among them by straight gotos. I'd make certain somewhere very near there is a description of the automaton in a more readable form.
But I might be totally off-base. Can you share a bit more about what this code does?
First, switches often are avoidable by creating better data structures with their functions (e.g. classes with a virtual member function command whose implementations do the right thing).
On a less ambitious level you could just pass pointers to the statics which are needed in that particular case to the function so that it can read and modify the state of those variables.
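A minimal sketch of the pointer-passing idea, assuming an operation that only needs the counter i. The names myonce, operation1 and commandid follow the question; the signatures, the file split and the return value of myonce are assumptions made for illustration.

```cpp
// operation1.h (hypothetical): the prototype is all main.cpp needs to see.
void operation1(int* i);

// operation1.cpp (hypothetical): the function never names the static itself,
// it only works through the pointer it is handed.
void operation1(int* i)
{
    *i = 1;
}

// main.cpp: the static stays block-scoped inside myonce.
int myonce(int commandid)
{
    static int i = 0;    // state kept between calls, as in the original
    switch (commandid) {
    case 1:
        operation1(&i);  // lend the state to the extracted function
        break;
    default:
        break;
    }
    return i;            // returned only so the sketch can be observed
}
```

The static keeps its block scope and lifetime; only its address travels, so no extern declaration is needed anywhere.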
Depending on what the functions do, one could also pass state information as value parameters (copies), let the functions do their work depending on that state, receive the results and THEN change the state in the main switch according to the result. The state change then is clearly visible (i.e. no side effects in the functions) and the noisy, distracting details are banished to another file.
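The value-parameter variant could look roughly like this. Op1Result and the "increment by one" behaviour are invented for the sketch; the point is only that the static is written in exactly one place.

```cpp
// Hypothetical result type: what operation 1 wants the state to become.
struct Op1Result {
    int new_i;
    bool ok;
};

// operation1.cpp (hypothetical): works on a copy, has no side effects.
Op1Result operation1(int i)
{
    return Op1Result{i + 1, true};
}

// main.cpp: the only place where the static is modified.
int myonce(int commandid)
{
    static int i = 0;
    switch (commandid) {
    case 1: {
        const Op1Result r = operation1(i);
        if (r.ok) {
            i = r.new_i;  // the state change is visible right here
        }
        break;
    }
    default:
        break;
    }
    return i;             // returned only so the sketch can be observed
}
```

Because operation1 is a pure function of its argument, it can also be tested on its own without running the loop.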
If each case tends to use many of the static variables then you could put them all in a struct; that change should be doable with a text editor (replace variable name x with mystruct.x etc.). Then each function just gets a pointer to that struct. EDIT: As I said in a comment: Perhaps the commands naturally form groups which are concerned with only parts of the state (e.g. there are commands which only read, others which only write data etc.). Then the global state could be split in corresponding groups of data. Each function only gets to see the data group which concerns it, which limits potential side effects.
But generally speaking, the function as it is now seems badly designed or grown over time; working on a large set of static variables means having "side effects" all over the code -- it's not easy to see what any given portion of code does and how it interacts with others. The information flow is not explicit. Analyzing clusters of data which belong together, organizing them in classes and separating them into files would be one task here, even without any virtual member functions.
As to your last question: The "case functions" you create (operation1(); etc.) need only be known in the file which calls them. If they are in one or several separate files you should create a header containing the prototypes.
I'm implementing a helper class which has a number of useful functions which will be used in a large number of classes. However, a few of them are not designed to be called from within certain sections of code (from interrupt functions, this is an embedded project).
However, for users of this class the reasons why some functions are allowed while others are prohibited from being called from interrupt functions might not be immediately obvious, and in many cases the prohibited functions might work but can cause very subtle and hard to find bugs later on.
The best solution for me would be to cause a compiler error if the offending function is called from a code section it shouldn't be called from.
I've also considered a few non-technical solutions, but a technical one would be preferred.
1. Indicate it in the documentation with a warning. Might be easily missed, especially when the function seems obvious, like read_byte(), why would anyone study the documentation whether the function is reentrant or not?
2. Indicate it in the function's name. Ugly. Who likes function names like read_byte_DO_NOT_CALL_FROM_INTERRUPT() ?
3. Have a global variable in a common header, included in each and every file, which is set to true at the beginning of each interrupt, set to false at the end, and the offending functions check it at their beginning, and exit if it's set. Problem: interrupts might interrupt each other. Also, it doesn't cause compile-time warnings or errors.
4. Similar to #3, have a global handler with a stack, so that nested interrupts can be handled. Still has the problem of only working at runtime and it also adds a lot of overhead. Interrupts should not waste more than a clock cycle or two for this feature, if at all.
5. Abusing the preprocessor. Unfortunately, the naive way of a #define at the beginning and an #undef at the end of each interrupt, with an #ifdef at the beginning of the offending function doesn't work, because the preprocessor doesn't care about scope.
6. As interrupts are always classless functions, I could make the offending functions protected, and declare them as friends in all classes which use them. This way, it would be impossible to use them directly from within interrupts. As main() is classless, I'll have to place most of it into a class method. I don't like this too much, as it can become needlessly complicated, and the error it generates is not obvious (so users of this function might encapsulate them to "solve" the problem, without realizing what the real problem was). A compiler or linker error message like "ERROR: function_name() is not to be used from within an interrupt" would be much more preferable.
7. Checking the interrupt registers within the function has several issues. In a large microcontroller there are a lot of registers to check. Also, there is a very small but dangerous chance of a false positive when an interrupt flag is being set exactly one clock cycle before, so my function would fail because it thinks it was called from an interrupt, while the interrupt would be called in the next cycle. Also, in nested interrupts, the interrupt flags are cleared, causing a false negative. And finally, this is yet another runtime solution.
I did play with some very basic template metaprogramming a while ago, but I'm not that experienced with it to find a very simple and elegant solution. I would rather try other ways before committing myself to try to implement a template metaprogramming bloatware.
A solution working with only features available in C would also be acceptable, even preferable.
Some comments below. As a warning, they won't be fun reading, but I won't do you a service by not pointing out what's wrong here.
If you are calling external functions from inside an ISR, no amount of documentation or coding will help you, since in most cases it is bad practice to do so. The programmer must know what they are doing, or no amount of documentation or coding mechanisms will save the program.
Programmers do not design library functions specifically for the purpose of getting called from inside an ISR. Rather, programmers design ISRs with all the special restrictions that come with an ISR in mind: make sure interrupt flags are cleared correctly, keep the code short, do not call external functions, do not block the MCU longer than necessary, consider re-entrancy, consider dangerous compiler optimizations (use volatile). A person who does not know this is not competent enough to write ISRs.
If you actually have a function int read_byte(int address) then this suggests that the program design is bad to begin with. This function could do one of two things:
Either it can read a byte from some peripheral hardware, in which case the function name is very bad and should be changed.
Or it could read any generic byte from an address, in which case the function is 100% useless "bloatware". You can safely assume that a somewhat competent C programmer can read a byte from a memory address without some bloatware holding their hand.
In either case, int is not a byte. It is a word of 16 or 32 bits. The function should be returning uint8_t. Similarly, if the parameter passed is used to describe a memory-mapped address of an MCU, it should have type void*, uint8_t* or uintptr_t. Everything else is wrong.
Notably, if you are using int rather than stdint.h for embedded systems programming, then this whole discussion is the least of your problems, as you haven't even gotten the fundamental basics right. Your programs will be filled to the brim with undefined behavior and implicit promotion bugs.
Overall, all the solutions you suggest are simply not acceptable. The root of the problem here appears to be the program design. Deal with that instead of inventing ways to defend the broken design with horrible meta programming.
I would suggest option 8 & 9.
Peer reviews & assertions.
You state in the comments that your interrupt functions are short. If that's really the case, then reviewing them will be trivial. Adding comments in the header will make it so that anyone can see what's going on. As for adding an assert: while you make it possible that debug builds will return the wrong result in error, it will also ensure that you catch any such calls, and give you a fighting chance of catching the problem during testing.
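The assertion idea could be sketched as follows. Everything here is an assumption for illustration: g_isr_depth, IsrScope and in_interrupt() are hypothetical names, and the read_byte signature is invented. It is a runtime check, so it does not give the compile-time error the question asks for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical nesting counter: incremented on ISR entry, decremented on exit.
// volatile because it is shared between ISRs and main-line code.
volatile std::uint8_t g_isr_depth = 0;

// Hypothetical guard object placed at the top of every interrupt handler.
struct IsrScope {
    IsrScope()  { g_isr_depth = g_isr_depth + 1; }
    ~IsrScope() { g_isr_depth = g_isr_depth - 1; }
};

inline bool in_interrupt() { return g_isr_depth != 0; }

// A helper that must not be called from an ISR.
std::uint8_t read_byte(const volatile std::uint8_t* address)
{
    assert(!in_interrupt() && "read_byte() must not be called from an interrupt");
    return *address;
}
```

With NDEBUG defined the assert compiles to nothing, so the check itself costs nothing in release builds; the counter updates in the handlers remain unless they are also made conditional.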
Ultimately, the macro processing just won't work since the best you can do is catch if a header has been included, but if the callstack goes via another wrapper (that doesn't have comments) then you just can't catch that.
Alternatively you could make your helper a template, but then that would mean every wrapper around your helper would also have to be a template so that it can know if you're in an interrupt routine... which will ultimately be your entire code base.
If you have one file for all interrupt routines, then this might be helpful:
Define a macro in the class header, say FORBID_INTERRUPT_ROUTINE_ACCESS,
and in the interrupt handler file check whether that macro is defined:
#ifdef FORBID_INTERRUPT_ROUTINE_ACCESS
#error : cannot access function from interrupt handlers.
#endif
If someone includes the header file for that class in order to use the class in an interrupt handler, the build will fail with this error.
Note: you have to build the target with warnings treated as errors.
Here is the C++ template functions suggestion.
I don't think this is metaprogramming or bloatware.
First make 2 classes which will define the context which the user will be using the functions in:
class In_Interrupt_Handler;
class In_Non_Interrupt_Handler;
If you have some common implementations between the 2 contexts, a base class can be added:
class Handy_Base
{
protected:
static int Handy_protected() { return 0; }
public:
static int Handy_public() { return 0; }
};
The primary template declaration, without any implementation. The implementations will be provided by the specialization classes:
template< class Is_Interrupt_Handler >
class Handy_functions;
And the specializations.
// Functions can be used when inside an interrupt handler
template<>
struct Handy_functions< In_Interrupt_Handler >
: Handy_Base
{
static int Handy1() { return 1; }
static int Handy2() { return 2; }
};
// Functions can be used when inside any function
template<>
struct Handy_functions< In_Non_Interrupt_Handler >
: Handy_Base
{
static int Handy1() { return 4; }
static int Handy2() { return 8; }
};
In this way, if the user of the API wants to access the functions, the only way is by specifying what type of functions are needed.
Example of usage:
#include <iostream>
int main()
{
using IH_funcs = Handy_functions<In_Interrupt_Handler>;
std::cout << IH_funcs::Handy1() << '\n';
std::cout << IH_funcs::Handy2() << '\n';
using Non_IH_funcs = Handy_functions<In_Non_Interrupt_Handler>;
std::cout << Non_IH_funcs::Handy1() << '\n';
std::cout << Non_IH_funcs::Handy2() << '\n';
}
In the end I think the problem boils down to the developer using your framework, and how much boilerplate your framework requires the developer to write.
The above does not stop the developer calling the Non Interrupt Handler functions from inside an Interrupt Handler.
I think that type of analysis would require some type of static analysis checking system.
I’ve used global variables without having any noticeable problems but would like to know if there are potential problems or drawbacks with my use of globals.
In the first scenario, I include const globals into a globals.h file, I then include the header into various implementation files where I need access to any one of the globals:
globals.h
const int MAX_URL_LEN = 100;
const int MAX_EMAIL_LEN = 50;
…
In the second scenario, I declare and initialize the globals in an implementation file when the application executes. These globals are never modified again. When I need access to these globals from a different implementation file, I use the extern keyword:
main.cpp
char application_path[128];
char data_path[128];
// assign data to globals
strcpy(application_path, get_dll_path().c_str());
…
do_something.cpp
extern char application_path[]; // global is now accessible in do_something.cpp
Regarding the first scenario above, I’ve considered removing all of the different “include globals.h” and using extern where access to those globals is needed but have not done so since just including the globals.h is so convenient.
I am concerned that I will have different versions of the variables for each implementation file that includes globals.h.
Should I use extern instead of including the globals.h everywhere access is needed?
Please advise, and thank you.
Global mutable variables
provide invisible lines of influence across all of the code, and
you cannot rely on their values, or whether they've been initialized.
That is, global mutable variables do for data flow what the global goto once did for execution flow, creating a spaghetti mess, wasting everyone's time.
Constant global variables are more OK, but even for those you run into
the initialization order fiasco.
I remember how angry I got when I realized that all my troubles in wrapping a well known GUI framework, was due to it needlessly using global variables and provoking the initialization order fiasco. First the anger was directed at the author, then at myself for being so stupid, not realizing what was going on (or rather, was not going on). Anyway.
A sensible solution to all this is Meyers' singletons, like
inline
auto pi_decimal_digits()
-> const string&
{
static const string the_value = compute_pi_digits();
return the_value;
}
For the case of a global that's dynamically initialized from some place that knows the value, “one programmer's constant is another programmer's variable”, there is no good solution, but one practical solution is to accept the possibility of a run time error and at least detect it:
namespace detail {
inline
auto mutable_pi_digits()
-> string&
{
static string the_value;
return the_value;
}
} // namespace detail
inline
void set_pi_digits( const string& value )
{
string& digits = detail::mutable_pi_digits();
assert( digits.length() == 0 );
digits = value;
}
inline
auto pi_digits()
-> const string&
{ return detail::mutable_pi_digits(); }
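A compact usage sketch of the set-once pattern, restated so it compiles on its own. The init_constants function and the literal value are hypothetical; the rest mirrors the code above with explicit std:: qualification.

```cpp
#include <cassert>
#include <string>

namespace detail {
    inline std::string& mutable_pi_digits()
    {
        static std::string the_value;   // constructed on first use
        return the_value;
    }
} // namespace detail

inline void set_pi_digits(const std::string& value)
{
    std::string& digits = detail::mutable_pi_digits();
    assert(digits.empty());             // a second call is a detected bug
    digits = value;
}

inline const std::string& pi_digits() { return detail::mutable_pi_digits(); }

// Hypothetical start-up routine: the one place that knows the value.
void init_constants()
{
    set_pi_digits("3.14159");
}
```

Everything that runs after init_constants() only sees the read-only accessor pi_digits(); a reader that runs before it gets an empty string rather than undefined behaviour.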
Your implementation is fine for now. Globals become a problem when:
1. Your program grows and so does your number of globals.
2. New people join the team that don't know what you were thinking.
Number 1 becomes particularly troublesome when your program becomes multi-threaded. Then you have a number of threads using the same data and you may require protection, which is difficult with just a list of globals.
By grouping data in separate files according to some criteria such as purpose or subject matter your code becomes more maintainable as it grows and you leave breadcrumbs for new programmers on the project to figure out how the software works.
One issue with globals is that when you go to include 3rd party libraries in your code, sometimes they've used globals with the same names as yours. There are definitely times when a global makes sense, but if possible you should also take care to do something like put it into a namespace.
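For the constants from the question, the namespace suggestion might look like the sketch below. The namespace name myapp and the helper url_fits are made up; constexpr assumes C++11 or later.

```cpp
// globals.h (hypothetical layout): constants wrapped in a project namespace.
#ifndef GLOBALS_H
#define GLOBALS_H

namespace myapp {
    // At namespace scope, const and constexpr objects have internal linkage
    // by default, so each file that includes this header gets its own copy
    // and no "multiple definition" link error occurs.
    constexpr int MAX_URL_LEN = 100;
    constexpr int MAX_EMAIL_LEN = 50;
}

#endif // GLOBALS_H

// Some implementation file: the constant is always spelled with its
// namespace, so a library variable also called MAX_URL_LEN does not clash.
inline bool url_fits(int length)
{
    return length <= myapp::MAX_URL_LEN;
}
```

The per-file copies are compile-time constants with identical values, so in practice they behave like a single constant.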
I'm working on a project where we have several executables that share several object files. We want to add logging to all of the executables, and have a library for doing so.
However, it seems clumsy to go to the main() function of every executable and add the same boilerplate function call to start the logging. It means we write the same thing over and over, and lose out on maintainability and DRY ("don't repeat yourself"). It would be nice if we could systematically ensure that logging is started before the main function gets called.
It occurred to me there are functions in libc++ that make the call to main, and it may be possible to override them. However, I don't know what they are and imagine this could break things if we're not careful. Does anyone know how this would be done? Or, if that's too over-the-top, any other suggestions on how to proceed?
We're using C++11 with g++ 4.8 if it makes any difference.
You do not need to do this by modifying main().
You should instead create a class at global scope in a shared object library. The constructor of this class will perform the "initialisation" you want to do, before main() runs, and its destructor will run after main().
The issue you need to deal with is that the order of this initialisation and destruction is not guaranteed to be deterministic with regards to any other global-scope objects. All of this could go in one .cpp compilation unit.
class LoggingManager // you can make this a singleton but not necessary
{
public:
LoggingManager();
~LoggingManager();
};
LoggingManager::LoggingManager()
{
// your initialisation code goes here
}
LoggingManager::~LoggingManager()
{
// your clean-up code goes here. It should not throw
}
LoggingManager loggingManagerStaticInstance;
Note that there is a small danger of the "static initialization" issue which means in reality your loggingManagerStaticInstance might not be loaded until your compilation unit is first accessed.
In reality it doesn't matter if this is after main() as long as the initialisation happens before it is first needed (a bit like a singleton) but it means your compilation unit might need to contain something that is guaranteed to get pulled in.
If you are willing to "stick" to GNU or similar compilers, they provide __attribute__((constructor)), which might resolve it. An easier way, though, is to implement a dummy extern int, or a dummy function returning an int, that gets referenced from whatever header you actually use for logging.
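A minimal sketch of the attribute route, assuming GCC or Clang. The flag and function names are hypothetical stand-ins for the real logging start-up call.

```cpp
// Hypothetical flag standing in for "logging has been started".
// It is constant-initialized, so it already holds false before any
// start-up code runs.
static bool g_logging_started = false;

// GCC/Clang extension: a function marked this way runs before main().
__attribute__((constructor))
static void start_logging()
{
    g_logging_started = true;   // real code would call the logging library here
}

bool logging_started() { return g_logging_started; }
```

The sketch deliberately does not rely on any ordering relative to constructors of other global objects, and the object file still has to end up in the executable for the function to run.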
I am working on a C++ project in Xcode, and one of my .cpp files instantiates some variables. Another .cpp file in the application uses these variables to instantiate another object and needs them to be instantiated in order not to hit a null-pointer error. My solution so far was simply to drag and drop (Xcode simplicity) the first file above the second one in the build-phase order. It works fine now, but I have a feeling that it is not the optimal solution, and that there is something fundamentally wrong with my code if I need to organise the compile order manually for the application to run properly.
Should I never instantiate something outside of functions, or what is the golden rule? Thanks.
EDIT: An example as requested.
The problem lies in a Observer/Event system.
In a source-file I do this:
Trigger* mainMenu_init = new Trigger(std::vector<Event*> {
// Event(s):
event_gameInit,
}, [](Event* e) {
// Action(s):
std::cout << "Hello World" << std::endl;
});
In the trigger's constructor the Event is asked to add it as an observer:
for(Event* event : events)
event->addObserver(this);
BUT, the events are just external pointers, so if they are not initialised (which they are in another source file) this initialisation will fail. So what I found was that if I do not organise the compilation phase myself, random triggers will not work while others will, depending on whether they are built before or after the Event.cpp file.
I assume you are talking about non-trivial initialization of global variables (or of static variables), such as (at the top level of a file):
MyObject *myPtrObject = new MyObject(42, "blah");
MyObject myOtherObject;
("trivial" initialization is, roughly speaking, when there is no constructor involved and everything just involves constants; so if you initialize a pointer to zero, it will be zero before any code is actually invoked)
The order of initialization between different source files is NOT GUARANTEED in C++. It happens to depend on the order of the files with Apple's current system, but THAT MIGHT CHANGE.
So yes, there is something fundamentally wrong.
Golden Rules
IMPORTANT: In the initialization of a global object, don't use any other global objects from different source files.
Don't overuse global variables. They have numerous disadvantages from a software design point of view.
Keep initialization of global objects simple. That will make it easier to stick to the first rule.
Not knowing anything about your program, it's of course hard to give more concrete design advice.
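One common way to follow the first rule is the construct-on-first-use idiom: replace the global Event pointer with an accessor function. The sketch below uses stripped-down stand-ins for Event and Trigger (the real classes from the question are assumed to be richer), and turning event_gameInit into a function is an assumption, not the asker's code.

```cpp
#include <vector>

struct Trigger;

// Minimal stand-in for the question's Event class.
struct Event {
    std::vector<Trigger*> observers;
    void addObserver(Trigger* t) { observers.push_back(t); }
};

// Event.cpp (hypothetical): an accessor instead of a global pointer.
// The function-local static is constructed the first time the function is
// called, so it is ready no matter which source file asks first.
Event& event_gameInit()
{
    static Event instance;
    return instance;
}

// Minimal stand-in for the question's Trigger class.
struct Trigger {
    explicit Trigger(const std::vector<Event*>& events)
    {
        for (Event* event : events)
            event->addObserver(this);
    }
};

// Another source file: safe even though this is a global initializer.
Trigger mainMenu_init(std::vector<Event*>{ &event_gameInit() });
```

With this arrangement the link or build order of the source files no longer decides whether the event exists when the trigger registers itself.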
I have programmed in both Java and C, and now I am trying to get my hands dirty with C++.
Given this code:
class Booth {
private :
int tickets_sold;
public :
int get_tickets_sold();
void set_tickets_sold();
};
In Java, wherever I needed the value of tickets_sold, I would call the getter repeatedly.
For example:
if (obj.get_tickets_sold() > 50 && obj.get_tickets_sold() < 75){
//do something
}
In C I would just get the value of the particular variable in the structure:
if( obj_t->tickets_sold > 50 && obj_t->tickets_sold < 75){
//do something
}
So while using structures in C, I save the two calls that I would otherwise make in Java (the two getters, that is). I am not even sure whether those are actual calls or whether Java somehow inlines them.
My point is if I use the same technique that I used in Java in C++ as well, will those two calls to getter member functions cost me, or will the compiler somehow know to inline the code? (thus reducing the overhead of function call altogether?)
Alternatively, am I better off using:
int num_tickets = 0;
if ( (num_tickets = obj.get_tickets_sold()) > 50 && num_tickets < 75){
//do something
}
I want to write tight code and avoid unnecessary function calls, I would care about this in Java, because, well, we all know why. But, I want my code to be readable and to use the private and public keywords to correctly reflect what is to be done.
Unless your program is too slow, it doesn't really matter. In 99.9999% of code, the overhead of a function call is insignificant. Write the clearest, easiest to maintain, easiest to understand code that you can and only start tweaking for performance after you know where your performance hot spots are, if you have any at all.
That said, modern C++ compilers (and some linkers) can and will inline functions, especially simple functions like this one.
If you're just learning the language, you really shouldn't worry about this. Consider it fast enough until proven otherwise. That said, there are a lot of misleading or incomplete answers here, so for the record I'll flesh out a few of the subtler implications. Consider your class:
class Booth
{
public:
int get_tickets_sold();
void set_tickets_sold();
private:
int tickets_sold;
};
The implementation (known as a definition) of the get and set functions is not yet specified. If you'd specified function bodies inside the class declaration then the compiler would consider you to have implicitly requested they be inlined (but may ignore that if they're excessively large). If you specify them later using the inline keyword, that has exactly the same effect. Summarily...
class Booth
{
public:
int get_tickets_sold() { return tickets_sold; }
...
...and...
class Booth
{
public:
int get_tickets_sold();
...
};
inline int Booth::get_tickets_sold() { return tickets_sold; }
...are equivalent (at least in terms of what the Standard encourages us to expect, but individual compiler heuristics may vary - inlining is a request that the compiler's free to ignore).
If the function bodies are specified later without the inline keyword, then the compiler is under no obligation to inline them, but may still choose to do so. It's much more likely to do so if they appear in the same translation unit (i.e. in the .cc/.cpp/.c++/etc. "implementation" file you're compiling or some header directly or indirectly included by it). If the implementation is only available at link time then the functions may not be inlined at all, but it depends on the way your particular compiler and linker interact and cooperate. It is not simply a matter of enabling optimisation and expecting magic. To prove this, consider the following code:
// inline.h:
void f();
// inline.cc:
#include <cstdio>
void f() { printf("f()\n"); }
// inline_app.cc:
#include "inline.h"
int main() { f(); }
Building this:
g++ -O4 -c inline.cc
g++ -O4 -o inline_app inline_app.cc inline.o
Investigating the inlining:
$ gdb inline_app
...
(gdb) break main
Breakpoint 1 at 0x80483f3
(gdb) break f
Breakpoint 2 at 0x8048416
(gdb) run
Starting program: /home/delroton/dev/inline_app
Breakpoint 1, 0x080483f3 in main ()
(gdb) next
Single stepping until exit from function main,
which has no line number information.
Breakpoint 2, 0x08048416 in f ()
(gdb) step
Single stepping until exit from function _Z1fv,
which has no line number information.
f()
0x080483fb in main ()
(gdb)
Notice the execution went from 0x080483f3 in main() to 0x08048416 in f() then back to 0x080483fb in main()... clearly not inlined. This illustrates that inlining can't be expected just because a function's implementation is trivial.
Notice that this example is with static linking of object files. Clearly, if you use library files you may actually want to avoid inlining of the functions specifically so that you can update the library without having to recompile the client code. It's even more useful for shared libraries where the linking is done implicitly at load time anyway.
Very often, classes providing trivial functions use the two forms of expected-inlined function definitions (i.e. inside class or with inline keyword) if those functions can be expected to be called inside any performance-critical loops, but the countering consideration is that by inlining a function you force client code to be recompiled (relatively slow, possibly no automated trigger) and relinked (fast, for shared libraries happens on next execution), rather than just relinked, in order to pick up changes to the function implementation.
These kinds of considerations are annoying, but deliberate management of these tradeoffs is what allows enterprise use of C and C++ to scale to tens and hundreds of millions of lines and thousands of individual projects, all sharing various libraries over decades.
One other small detail: as a ballpark figure, an out-of-line get/set function is typically about an order of magnitude (10x) slower than the equivalent inlined code. That will obviously vary with CPU, compiler, optimisation level, variable type, cache hits/misses etc..
No, repetitive calls to member functions will not hurt.
If it's just a getter function, it will almost certainly be inlined by the C++ compiler (at least with release/optimized builds) and the Java Virtual Machine may "figure out" that a certain function is being called frequently and optimize for that. So there's pretty much no performance penalty for using functions in general.
You should always code for readability first. Of course, that's not to say that you should completely ignore performance outright, but if performance is unacceptable then you can always profile your code and see where the slowest parts are.
Also, by restricting access to the tickets_sold variable behind getter functions, you can pretty much guarantee that the only code that can modify the tickets_sold variable is the member functions of Booth. This allows you to enforce invariants in program behavior.
For example, tickets_sold is obviously not going to be a negative value. That is an invariant of the structure. You can enforce that invariant by making tickets_sold private and making sure your member functions do not violate that invariant. The Booth class makes tickets_sold available as a "read-only data member" via a getter function to everyone else and still preserves the invariant.
Making it a public variable means that anybody can go and trample over the data in tickets_sold, which basically completely destroys your ability to enforce any invariants on tickets_sold. Which makes it possible for someone to write a negative number into tickets_sold, which is of course nonsensical.
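A small sketch of enforcing that invariant. The question's set_tickets_sold() takes no argument, so the int parameter and the choice to throw std::invalid_argument are assumptions made for illustration.

```cpp
#include <stdexcept>

class Booth {
public:
    int get_tickets_sold() const { return tickets_sold; }

    // Hypothetical setter signature: rejects values that break the invariant.
    void set_tickets_sold(int count)
    {
        if (count < 0)
            throw std::invalid_argument("tickets_sold cannot be negative");
        tickets_sold = count;
    }

private:
    int tickets_sold = 0;   // only member functions can write this
};
```

Since every write goes through set_tickets_sold, a negative count is rejected in one place and the previous valid value is left untouched.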
The compiler is very likely to inline function calls like this.
class Booth {
public:
int get_tickets_sold() const { return tickets_sold; }
private:
int tickets_sold;
};
Your compiler should inline get_tickets_sold, I would be very surprised if it didn't. If not, you either need to use a new compiler or turn on optimizations.
Any compiler worth its salt will easily optimize the getters into direct member access. The only times that won't happen are when you have optimization explicitly disabled (e.g. for a debug build) or if you're using a brain-dead compiler (in which case, you should seriously consider ditching it for a real compiler).
The compiler will very likely do the work for you, but in general, for things like this I would approach it more from the C perspective rather than the Java perspective unless you want to make the member access a const reference. However, when dealing with integers, there's usually little value in using a const reference over a copy (at least in 32 bit environments since both are 4 bytes), so your example isn't really a good one here... Perhaps this may illustrate why you would use a getter/setter in C++:
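#include <string>
class StringHolder
{
public:
const std::string& get_string() const { return my_string; }
// Ignore empty values so that my_string never becomes empty through the setter
void set_string(const std::string& val) { if(!val.empty()) { my_string = val; } }
private:
std::string my_string;
};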
That prevents modification except through the setter which would then allow you to perform extra logic. However, in a simple class such as this, the value of this model is nil, you've just made the coder who is calling it type more and haven't really added any value. For such a class, I wouldn't have a getter/setter model.