Breaking down function logic into sub-functions - C++

Can splitting functions into smaller sub-functions affect the efficiency of the program?
While reducing the cyclomatic complexity of my functions, I have broken them down into smaller parts, using helper functions and inline functions.
int functionParent(arguments)
{
    initialCheckFunction(arguments);
    functionOne();
    functionTwo();
    functionThree();
    functionFour();
    return STATUS;
}
void functionOne()
{
    /* follows the "unary" principle: does one thing only */
}
My concern is the stack pointer: does frequently adjusting the SP reduce the program's efficiency drastically, or is the effect negligible?
The functionOne, functionTwo, ... above each contain a single ("unary") piece of logic.
Kindly reply in both contexts, C as well as C++.

You should split off logic into its own function whenever you think that it would aid readability: the cost of a function call itself is negligible.
Although it is generally true that calling a function consumes some space and CPU cycles, you shouldn't be worrying about it at all: the instructions involved are optimized beyond belief, and the compiler can inline your code when it sees fit.
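A minimal sketch of such a split, in C++ (the helper names, the summing logic and the FAIL value are hypothetical, not from the question). The helpers sit in an anonymous namespace, which gives them internal linkage, so an optimizing compiler is free to inline them into process; with optimization enabled the split usually costs little or nothing:

```cpp
#include <cassert>
#include <vector>

// Hypothetical status code, standing in for the question's FAIL/STATUS.
enum Status { FAIL = -1 };

// Helpers in an anonymous namespace have internal linkage: no other
// translation unit can call them, so the optimizer may inline them freely.
namespace {
bool is_valid(const std::vector<int>& values) { return !values.empty(); }

int sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

int clamp_to_limit(int value, int limit) { return value > limit ? limit : value; }
}  // namespace

// The "parent" function now reads as a sequence of named steps.
int process(const std::vector<int>& values, int limit) {
    if (!is_valid(values)) return FAIL;
    return clamp_to_limit(sum(values), limit);
}
```

Each helper can be tested and read on its own, while the parent stays a short list of steps.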
EDIT (in response to comment by Potatoswatter)
One thing you need to be careful about is passing parameters, especially in C++, where user code can participate in the process of copying parameters being passed to the function. Passing large structs by value can take more than a few cycles in C, too, so you should pass them by pointer (or, in C++, by const reference) whenever you can.
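To illustrate how user code takes part in copying, here is a small sketch with a hypothetical Big struct whose copy constructor counts how often it runs. Passing by value invokes that constructor (and copies 2 KB of data); passing by const reference does not:

```cpp
#include <array>
#include <cassert>

// Hypothetical large struct; the user-defined copy constructor runs
// (and is counted) every time a Big is copied.
struct Big {
    static int copies;
    std::array<double, 256> data{};

    Big() = default;
    Big(const Big& other) : data(other.data) { ++copies; }
};
int Big::copies = 0;

// By value: the argument is copied into the parameter (256 doubles = 2048 bytes).
double first_by_value(Big b) { return b.data[0]; }

// By const reference: no copy, only the object's address is passed.
double first_by_ref(const Big& b) { return b.data[0]; }
```

Calling first_by_value(big) bumps Big::copies by one; calling first_by_ref(big) leaves it unchanged.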

Generally I prefer to break a function up into smaller functions so that they can be reused elsewhere; this is often required during refactoring. If you are really concerned about the call overhead and the function is very small, you can mark it inline. However, I don't think having many functions makes a big difference to the performance of your program.

If you are using C++ you can declare inline functions.
Then, if the compiler does substitute the function body at each call site, there is no function-call overhead.
inline bool check(args){
    return some_condition(args);   // returns a value on every path
}
inline void functionOne(){ /* ... */ }
inline void functionTwo(){ /* ... */ }
inline void functionThree(){ /* ... */ }
inline void functionFour(){ /* ... */ }
int functionParent(arguments){
    if(!check(arguments))
        return FAIL;
    functionOne();
    functionTwo();
    functionThree();
    functionFour();
    return STATUS;
}
Current processor architectures execute more than one instruction at a time, let's say 5. At a given moment they may be at intermediate stages of completion, let's say 90%, 80%, 70%, 60%, 50%. If the first instruction is a function call and the processor mispredicts where execution goes next, all the effort made to evaluate the next 4 instructions is wasted, which can noticeably reduce execution speed. (Direct calls and their matching returns are normally predicted correctly, so in practice this penalty mostly applies to mispredicted or hard-to-predict indirect calls.)
It's not necessary to care too much about these details unless you are writing a performance-critical application. The compiler is usually smart enough to inline the functions that need it when optimization flags are enabled.

Related

MSVC optimizer saves and restores XMM SIMD registers on an early-out path through a function. Why? [duplicate]

In C, if I have a function call that looks like
// main.c
...
do_work_on_object(object, arg1, arg2);
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
if(object == NULL)
{
return;
}
// do lots of work
}
then the compiler will generate a lot of stuff in main.o to save state, pass parameters (hopefully in registers in this case), and restore state.
However, at link time it can be observed that arg1 and arg2 are not used in the quick-return path, so the clean-up and state restoration can be short-circuited. Do linkers tend to do this kind of thing automatically, or would one need to turn on link-time optimization (LTO) to get that kind of thing to work?
(Yes, I could inspect the disassembled code, but I'm interested in the behaviours of compilers and linkers in general, and on multiple architectures, so hoping to learn from others' experience.)
Assuming that profiling shows this function call is worth optimizing, should we expect the following code to be noticeably faster (e.g. without the need to use LTO)?
// main.c
...
if(object != NULL)
{
do_work_on_object(object, arg1, arg2);
}
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
assert(object != NULL); // generates no code in release builds (when NDEBUG is defined)
// do lots of work
}
Some compilers (like GCC and clang) are able to do "shrink-wrap" optimization to delay saving call-preserved regs until after a possible early-out, if they're able to spot the pattern. But some don't, e.g. apparently MSVC 16.11 still doesn't.
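I don't think any of them do partial inlining of just the early-out check into a caller in another translation unit, to avoid even the overhead of arg-passing and the call/ret itself. (GCC's -fpartial-inlining can split off such a check, but only when the callee's body is visible, i.e. in the same translation unit or with LTO.)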
Since compiler/linker support for this is not universal and not always successful even for shrink-wrapping, you can write your code in a way that gets much of the benefit, at the cost of splitting the logic of your function into two places.
If you have a fast-path that takes hardly any code, but happens often enough to matter, put that part in a header so it gets inlined, with a fallback to calling the rest of the function (which you make private, so it can assume that any checks in the inlined part are already done).
e.g. par2's routine that processes a block of data has a fast-path for when the galois16 factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is a multiply in Galois16, and += is a GF16 add (i.e. a bitwise XOR)).
Note how the commit in question renames the old function to InternalProcess, and adds a new template<class g> inline bool ReedSolomon<g>::Process that checks for the fast-path, and otherwise calls InternalProcess. (as well as making a bunch of unrelated whitespace changes, and some ifdefs... It was originally a 2006 CVS commit.)
The comment in the commit claims an overall 8% speed gain for repairing.
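A minimal sketch of the fast-path-in-header split described above, keeping the question's do_work_on_object as the public name; do_work_on_object_impl and the stand-in "work" are hypothetical. In a real project the struct and the inline wrapper would go in the header and the _impl definition in the .c/.cpp file; here everything is in one block so it compiles on its own:

```cpp
#include <cassert>

// ---- would live in object.h ----
struct object_t { int value; };

// Out-of-line slow path. It may assume object != nullptr because the
// only caller is the inline wrapper below.
void do_work_on_object_impl(object_t* object, int arg1, int arg2);

// Inline fast path: the NULL check is inlined into every caller, so the
// early-out costs no call and no argument passing.
inline void do_work_on_object(object_t* object, int arg1, int arg2) {
    if (object == nullptr) return;
    do_work_on_object_impl(object, arg1, arg2);
}

// ---- would live in object.cpp ----
void do_work_on_object_impl(object_t* object, int arg1, int arg2) {
    assert(object != nullptr);      // guaranteed by the wrapper; no code when NDEBUG is defined
    object->value += arg1 * arg2;   // stand-in for "do lots of work"
}
```

The design choice is the same one the par2 commit makes: callers keep a single function name, while the cheap check is duplicated at each call site by the inliner.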
Neither the setup nor the cleanup code can be short-circuited, because the resulting compiled code is static and doesn't know what will happen when the program gets executed. So the compiler will always have to set up the whole parameter stack.
Think of two situations: in one, object is NULL; in the other, it is not. How would the assembly code know whether to put the rest of the arguments on the stack? Especially as the caller is the one responsible for placing the arguments at their proper location (stack or register).

Lookup table to Function Pointer Array C++ performance

I have the following code to emulate a basic system on my PC (x86):
typedef void (*op_fn) ();
extern const op_fn opcodes[256]; // declared here so the handlers below can use it
void add()
{
//add Opcode
//fetch next opcode
opcodes[opcode]();
}
void nop()
{
//NOP opcode
//fetch next opcode
opcodes[opcode]();
}
const op_fn opcodes[256] =
{
add,
nop,
// etc...
};
and I call this "table" via opcodes[opcode]().
I am trying to improve performance of my interpreter.
What about inlining every function, like
inline void add()
inline void nop()
Are there any benefits to doing that?
Is there any way to make it go faster?
Thanks
Just because you flag a method as inline it doesn't require the compiler to do so - it's more of a hint than an order.
Given that you are storing the opcode handlers in an array the compiler will need to place the address of the function into the array, therefore it can't inline it.
There's actually nothing wrong with your approach. If you really think you've got performance issues, then get some metrics; otherwise don't worry (at this point!). The concept of a table of pointers to functions is nothing new: it's actually how C++ compilers typically implement virtual functions (i.e. the vtable).
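For reference, a self-contained sketch of the table approach (the three opcodes and the Cpu struct are hypothetical). One change from the question's code is worth noting: there, each handler calls the next one, so the call stack grows with every executed instruction unless the compiler happens to turn those calls into tail jumps. Driving the table from a loop avoids that:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical machine state shared by all handlers.
struct Cpu {
    int acc = 0;
    bool halted = false;
};

using op_fn = void (*)(Cpu&);

void add(Cpu& cpu) { cpu.acc += 1; }
void nop(Cpu&) {}
void halt(Cpu& cpu) { cpu.halted = true; }

// The table stores each handler's address, so every dispatch is an
// indirect call that the compiler normally cannot inline.
const std::array<op_fn, 3> opcodes = {add, nop, halt};

int run_table(const std::vector<std::uint8_t>& program) {
    Cpu cpu;
    // Fetch/dispatch loop: handlers return here instead of calling each other.
    for (std::size_t pc = 0; pc < program.size() && !cpu.halted; ++pc) {
        opcodes.at(program[pc])(cpu);   // at() throws std::out_of_range on an unknown opcode
    }
    return cpu.acc;
}
```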
"Inline" means "don't emit a function call; instead, substitute the function body at compile time."
Calling through a function pointer means "do a function call, the details of which won't be known until runtime."
The two features are fundamentally opposed. (The best you could hope for is that a sufficiently advanced compiler could statically determine which function is being called through a function pointer in very limited circumstances and inline those.)
switch blocks are typically implemented as jump tables, which could have less overhead than function calls, so replacing your function pointer array with a switch block and using inline might make a difference.
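A minimal sketch of the switch-based alternative, assuming a hypothetical three-opcode machine. The handlers are called directly from each case rather than through a pointer, so the compiler is free to inline them; whether the switch actually becomes a jump table is up to the compiler:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a tiny machine.
enum : std::uint8_t { OP_ADD = 0, OP_NOP = 1, OP_HALT = 2 };

// Small handlers called directly (not through a pointer), so the
// compiler may inline them into the switch.
inline void do_add(int& acc) { acc += 1; }
inline void do_nop(int&) {}

int run_switch(const std::vector<std::uint8_t>& program) {
    int acc = 0;
    for (std::size_t pc = 0; pc < program.size(); ++pc) {
        // Dense case values like these are commonly compiled to a jump table.
        switch (program[pc]) {
            case OP_ADD:  do_add(acc); break;
            case OP_NOP:  do_nop(acc); break;
            case OP_HALT: return acc;
            default:      return -1;   // unknown opcode
        }
    }
    return acc;
}
```

As with any of these micro-optimizations, only a measurement on your own workload will tell whether the switch beats the pointer table.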
inline is just a hint to your compiler; it does not guarantee that any inlining is done. You should read up on inlining (maybe in the ISO C++ FAQ), as too much inlining can actually make your code slower (through code bloat and the associated virtual memory thrashing).

Which approach is better for supplying compile-time constants to a function? Function argument vs. template parameter

I have a logging function that is called at several places throughout the code. For every log call, I have to supply 2 compile-time constants. There are 2 approaches to accomplish this:
(1) Function argument:
template<typename T>
void log (const T &obj, const int LINE, const int COUNT)
{
// T is used for some purpose
if(debug)
logging(obj.out(), LINE, COUNT);
}
call it as,
log(str, __LINE__, __COUNTER__);
(2) Template parameter:
template<int LINE, int COUNT, typename T>
void log (const T &obj)
{
// T is used for some purpose; listed last so it is deduced from obj
if(debug)
logging(obj.out(), LINE, COUNT);
}
call it as,
log<__LINE__, __COUNTER__>(str);
I am not able to decide: the 1st approach offers simplicity, but it passes values known at compile time as ordinary runtime arguments. The 2nd approach is perfect, but compilation time would probably increase. This task is tedious, and I haven't implemented either of them yet, so I don't have any benchmarks.
It will be a great help if someone can answer this from their experience/knowledge.
Since the choice between these two makes a difference to the calling code, I would recommend logging via a macro. Then you don't have to worry now about which of these is better, because it's easy to switch between them.
Once you have your real application written, you can mess with the macro definition to compare the two. Or not, if there are more productive areas to optimize. If it turns out to make a big difference, you can even leave it open to the build config to decide whether to use -DLOGGING_COMPILES_QUICKLY or -DLOGGING_RUNS_QUICKLY.
Another potential benefit of a macro: you could arrange that the first argument is evaluated if and only if debug is true. I don't know what the interface of str is, or where those objects come from, but if it costs anything to produce the right value to pass to log, and then log doesn't use it in the non-debug case, then that's a potential waste of runtime.
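A minimal sketch of the macro idea. The names LOG, log_args, log_tmpl, Message and the stub logging function are hypothetical, and __COUNTER__ is a compiler extension (supported by GCC, Clang and MSVC) rather than standard C++. In this sketch, building with -DLOGGING_RUNS_QUICKLY switches every call site to the template-parameter version; otherwise the function-argument version is used. Either way the `if (debug)` sits inside the macro, so the argument expression is only evaluated when logging is on:

```cpp
#include <cassert>
#include <string>

// Stand-ins (hypothetical) for the question's `debug` flag and `logging` sink.
bool debug = false;
int log_calls = 0;
void logging(const std::string& text, int line, int count) {
    (void)text; (void)line; (void)count;
    ++log_calls;
}

// Option (1): the constants are ordinary function arguments.
template <typename T>
void log_args(const T& obj, int line, int count) { logging(obj.out(), line, count); }

// Option (2): the constants are template parameters; T is last so it is deduced.
template <int LINE, int COUNT, typename T>
void log_tmpl(const T& obj) { logging(obj.out(), LINE, COUNT); }

// One call-site syntax for both; the choice is made at build time.
#ifdef LOGGING_RUNS_QUICKLY
#define LOG(obj) do { if (debug) log_tmpl<__LINE__, __COUNTER__>((obj)); } while (0)
#else
#define LOG(obj) do { if (debug) log_args((obj), __LINE__, __COUNTER__); } while (0)
#endif

// A hypothetical loggable type, plus a counter showing when one gets built.
struct Message {
    std::string text;
    std::string out() const { return text; }
};
int messages_built = 0;
Message make_message() { ++messages_built; return Message{"hello"}; }
```

With debug false, LOG(make_message()) never calls make_message(); with debug true it builds the message once and logs it.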
I would go with the first option. The performance impact of passing two integers is negligible. The optimizer will also probably inline the function call in which case there would be no difference between the two. The second option I think is a bad idea, since you will be creating a lot of versions of the same function, for no reason.

Do repetitive calls to member functions hurt?

I have programmed in both Java and C, and now I am trying to get my hands dirty with C++.
Given this code:
class Booth {
private:
    int tickets_sold;
public:
    int get_tickets_sold();
    void set_tickets_sold(int tickets);
};
In Java, wherever I needed the value of tickets_sold, I would call the getter repeatedly.
For example:
if (obj.get_tickets_sold() > 50 && obj.get_tickets_sold() < 75){
//do something
}
In C I would just get the value of the particular variable in the structure:
if( obj_t->tickets_sold > 50 && obj_t->tickets_sold < 75){
//do something
}
So when using structures in C, I save the two calls that I would otherwise make in Java (the two getters), although I am not even sure whether those are actual calls or whether Java somehow inlines them.
My point is: if I use the same technique in C++ that I used in Java, will those two calls to getter member functions cost me, or will the compiler know to inline the code (thus removing the function-call overhead altogether)?
Alternatively, am I better off using:
int num_tickets = 0;
if ( (num_tickets = obj.get_tickets_sold()) > 50 && num_tickets < 75){
//do something
}
I want to write tight code and avoid unnecessary function calls; I would care about this in Java too, because, well, we all know why. But I also want my code to be readable and to use the private and public keywords to correctly reflect what is to be done.
Unless your program is too slow, it doesn't really matter. In 99.9999% of code, the overhead of a function call is insignificant. Write the clearest, easiest to maintain, easiest to understand code that you can and only start tweaking for performance after you know where your performance hot spots are, if you have any at all.
That said, modern C++ compilers (and some linkers) can and will inline functions, especially simple functions like this one.
If you're just learning the language, you really shouldn't worry about this. Consider it fast enough until proven otherwise. That said, there are a lot of misleading or incomplete answers here, so for the record I'll flesh out a few of the subtler implications. Consider your class:
class Booth
{
public:
    int get_tickets_sold();
    void set_tickets_sold(int tickets);
private:
    int tickets_sold;
};
The implementation (known as a definition) of the get and set functions is not yet specified. If you'd specified function bodies inside the class declaration, then the compiler would consider you to have implicitly requested they be inlined (but may ignore that if they're excessively large). If you specify them later using the inline keyword, that has exactly the same effect. In summary...
class Booth
{
public:
int get_tickets_sold() { return tickets_sold; }
...
...and...
class Booth
{
public:
int get_tickets_sold();
...
};
inline int Booth::get_tickets_sold() { return tickets_sold; }
...are equivalent (at least in terms of what the Standard encourages us to expect, but individual compiler heuristics may vary - inlining is a request that the compiler's free to ignore).
If the function bodies are specified later without the inline keyword, then the compiler is under no obligation to inline them, but may still choose to do so. It's much more likely to do so if they appear in the same translation unit (i.e. in the .cc/.cpp/.c++/etc. "implementation" file you're compiling or some header directly or indirectly included by it). If the implementation is only available at link time then the functions may not be inlined at all, but it depends on the way your particular compiler and linker interact and cooperate. It is not simply a matter of enabling optimisation and expecting magic. To prove this, consider the following code:
// inline.h:
void f();
// inline.cc:
#include <cstdio>
void f() { printf("f()\n"); }
// inline_app.cc:
#include "inline.h"
int main() { f(); }
Building this:
g++ -O4 -c inline.cc
g++ -O4 -o inline_app inline_app.cc inline.o
Investigating the inlining:
$ gdb inline_app
...
(gdb) break main
Breakpoint 1 at 0x80483f3
(gdb) break f
Breakpoint 2 at 0x8048416
(gdb) run
Starting program: /home/delroton/dev/inline_app
Breakpoint 1, 0x080483f3 in main ()
(gdb) next
Single stepping until exit from function main,
which has no line number information.
Breakpoint 2, 0x08048416 in f ()
(gdb) step
Single stepping until exit from function _Z1fv,
which has no line number information.
f()
0x080483fb in main ()
(gdb)
Notice the execution went from 0x080483f3 in main() to 0x08048416 in f() then back to 0x080483fb in main()... clearly not inlined. This illustrates that inlining can't be expected just because a function's implementation is trivial.
Notice that this example is with static linking of object files. Clearly, if you use library files you may actually want to avoid inlining of the functions specifically so that you can update the library without having to recompile the client code. It's even more useful for shared libraries where the linking is done implicitly at load time anyway.
Very often, classes providing trivial functions use the two forms of expected-inlined function definitions (i.e. inside class or with inline keyword) if those functions can be expected to be called inside any performance-critical loops, but the countering consideration is that by inlining a function you force client code to be recompiled (relatively slow, possibly no automated trigger) and relinked (fast, for shared libraries happens on next execution), rather than just relinked, in order to pick up changes to the function implementation.
These kind of considerations are annoying, but deliberate management of these tradeoffs is what allows enterprise use of C and C++ to scale to tens and hundreds of millions of lines and thousands of individual projects, all sharing various libraries over decades.
One other small detail: as a ballpark figure, an out-of-line get/set function is typically about an order of magnitude (10x) slower than the equivalent inlined code. That will obviously vary with CPU, compiler, optimisation level, variable type, cache hits/misses etc.
No, repetitive calls to member functions will not hurt.
If it's just a getter function, it will almost certainly be inlined by the C++ compiler (at least with release/optimized builds) and the Java Virtual Machine may "figure out" that a certain function is being called frequently and optimize for that. So there's pretty much no performance penalty for using functions in general.
You should always code for readability first. Of course, that's not to say that you should completely ignore performance outright, but if performance is unacceptable then you can always profile your code and see where the slowest parts are.
Also, by restricting access to the tickets_sold variable behind getter functions, you can pretty much guarantee that the only code that can modify tickets_sold is the member functions of Booth. This allows you to enforce invariants in program behavior.
For example, tickets_sold is obviously not going to be a negative value. That is an invariant of the structure. You can enforce that invariant by making tickets_sold private and making sure your member functions do not violate that invariant. The Booth class makes tickets_sold available as a "read-only data member" via a getter function to everyone else and still preserves the invariant.
Making it a public variable means that anybody can trample over the data in tickets_sold, which completely destroys your ability to enforce any invariants on it. For example, someone could write a negative number into tickets_sold, which is of course nonsensical.
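A minimal sketch of that idea. The sell member function and the choice to throw std::invalid_argument are illustrative assumptions; the point is that only Booth's own members can change tickets_sold, so they can reject anything that would make it negative, while the in-class getter remains a trivially inlinable read:

```cpp
#include <cassert>
#include <stdexcept>

class Booth {
public:
    // Defined in the class body, so implicitly inline; a trivial read.
    int get_tickets_sold() const { return tickets_sold; }

    // Hypothetical mutator: the only way to change tickets_sold.
    void sell(int count) {
        if (count < 0) {
            throw std::invalid_argument("cannot sell a negative number of tickets");
        }
        tickets_sold += count;
    }

private:
    int tickets_sold = 0;  // invariant: never negative
};
```

A rejected call leaves the count untouched, so every reader of get_tickets_sold() can rely on a non-negative value.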
The compiler is very likely to inline function calls like this.
class Booth {
public:
int get_tickets_sold() const { return tickets_sold; }
private:
int tickets_sold;
};
Your compiler should inline get_tickets_sold, I would be very surprised if it didn't. If not, you either need to use a new compiler or turn on optimizations.
Any compiler worth its salt will easily optimize the getters into direct member access. The only times that won't happen are when you have optimization explicitly disabled (e.g. for a debug build) or if you're using a brain-dead compiler (in which case, you should seriously consider ditching it for a real compiler).
The compiler will very likely do the work for you, but in general, for things like this I would approach it more from the C perspective than the Java perspective, unless you want the getter to return a const reference to the member. However, when dealing with integers, there's usually little value in returning a const reference rather than a copy (a reference is typically passed as a pointer, which is at least as large as an int), so your example isn't really a good one here... Perhaps this may illustrate why you would use a getter/setter in C++:
#include <string>
class StringHolder
{
public:
    const std::string& get_string() const { return my_string; }
    void set_string(const std::string& val) { if(!val.empty()) { my_string = val; } }
private:
    std::string my_string;
};
That prevents modification except through the setter which would then allow you to perform extra logic. However, in a simple class such as this, the value of this model is nil, you've just made the coder who is calling it type more and haven't really added any value. For such a class, I wouldn't have a getter/setter model.

Execution time differences, are there any?

Consider this piece of code:
class A {
public:
    void methodX() {
        // snip (1 liner function)
    }
};
class B {
public:
    void methodX() {
        // same-code
    }
};
The other way I can go: I have a class (AppManager), most of whose members are static (it's legacy code, so please don't suggest a singleton ;))
class AppManager {
public:
    static void methodX(){
        // same-code
    }
};
Which one should be preferred?
As both are inlined, there shouldn't be a runtime difference, right?
Which form is cleaner?
Now first of all, this is a concern so minuscule that you would never have to worry about it unless the functions are called thousands of times per frame (and you're doing something where "frames" matter).
Second, IF they are inlined, the code will be (hopefully) optimized so much that there is no sign whatsoever of the function being non-static. It would be identical.
Even if they were not inlined, the difference would be minor. The ABI would put the "this" pointer into a register (or the stack), which it wouldn't do in a static function, but again, the net result would be almost not measurable.
Bottom line - write your code in the cleanest possible way. Performance is not a concern at this point.
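To make the point about the "this" pointer concrete, here is a sketch with a hypothetical one-liner (x * 2) standing in for the question's snipped body. The non-static member receives the object's address as a hidden extra argument; methodX_with_self is only an illustration of roughly what such a call looks like at the ABI level, not what any particular compiler emits:

```cpp
#include <cassert>

class A {
public:
    // Non-static: called on an object, receives a hidden `this` pointer.
    int methodX(int x) const { return x * 2; }
};

class AppManager {
public:
    // Static: no object and no `this`; called as AppManager::methodX(...).
    static int methodX(int x) { return x * 2; }
};

// Illustration only: a non-static call behaves roughly like a free function
// that takes the object's address as an extra first parameter.
int methodX_with_self(const A* self, int x) {
    (void)self;  // unused here, just as methodX never touches `this`
    return x * 2;
}
```

Once inlined, all three compute the same expression, and the unused object address disappears entirely.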
In my opinion the inline way would be faster, because inline functions are substituted into the code at compile time, so there is no need to save registers, make a function call and then return again. When you call a static function, it's just a function call, and it has more overhead than the inline one.
I think this is the most common optimisation mistake: when first writing code, you try every single trick that might help the compiler, so that if the compiler cannot optimise the code well, you already have. This is wrong. What you should be looking for in the first stage, while writing the code, is clean and understandable code, design and structure. That produces far better code than "optimising" by hand.
Rule is:
If you do not have the resources to benchmark the code, rewrite it, and spend a lot of time on optimisation, then you do not need optimised code. In most cases, if your code is well structured, it is hard to gain any speed boost with any kind of optimisation.