Pass parameters as an array? - C++

Is it better (performance, …) to pass parameters as an array
template<typename Number>
static int solveQuadraticFunction(Number* dst, const Number* src)
{
    Number a=src[0], b=src[1], c=src[2];
    // …
}
or the "standard" way
template<typename Number>
static int solveQuadraticFunction(Number* dst, Number a, Number b, Number c)
{
    // …
}

All questions regarding performance here wind up having the same answer: implement both solutions and time them. Having said that, I can't see any reason the array version would be significantly faster or slower, but it sure is less convenient and less readable.

Probably not.
It depends on the arrangement of the arguments before the call. In the former case, you're requiring that the arguments be arranged into an array before the call, which might or might not already be the case. If it is already the case, and there is a large number of arguments, then it may be more optimal simply because it doesn't require the values to be assembled on the stack. However, it might result in the values simply being copied from the array onto the stack inside the called function instead of outside it, depending on how you then access the arguments (the specific example you give looks problematic: you define local variables and assign them from array elements; local variables typically live on the stack, though the compiler may be able to optimize them away).
Of course, if the arguments are not already arranged in an array before the call, then there is no gain (and there is probably at least a slight penalty), because you have to find somewhere to store the arguments as an array - which might involve memory allocation/deallocation - and then the arguments must be accessed indirectly via a pointer, which also has a slight cost.
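For illustration, a minimal hypothetical sketch of the two call sites, assuming both signatures from the question exist as overloads:
void caller()
{
    double roots[2];

    // Array version: the caller must assemble the coefficients first.
    double coeffs[3] = {1.0, -3.0, 2.0};
    solveQuadraticFunction(roots, coeffs);

    // "Standard" version: the arguments go straight into registers/stack slots.
    solveQuadraticFunction(roots, 1.0, -3.0, 2.0);
}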

I doubt you'll see any difference in performance; given appropriate optimization settings and inlining, the compiler should be able to decide which one to use.
The second option should be preferred when there's a small number of arguments, since it allows the compiler to check the number of items passed in.
(But concerning your example, be sure to use references: Number const &a=src[0], &b=src[1], &c=src[2]. These will be optimized away by any half-decent compiler.)

You will be pushing 2 more variables onto the stack when you call the second method than when you call the first.
But it probably makes very little difference unless this was running in a very tight loop.

This depends on many things and is of course hardware/platform-dependent. In general, for many parameters (> 4) the array method is probably more efficient. If the parameter type doesn't fit into a CPU register, passing as an array should always be more efficient.

Of course it's better to pass the address of the first element rather than pushing all the elements. You can either pass a pointer or pass the array by reference, as below:
template<typename Number, unsigned int SIZE>
static int solveQuadraticFunction(Number* dst, Number (&src)[SIZE])
{
    // … src is passed by reference
}
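A hypothetical call site, for illustration - SIZE is deduced from the array's declared length, so passing an array of the wrong size fails to compile:
void caller()
{
    double dst[2];
    double src[3] = {1.0, -3.0, 2.0};
    solveQuadraticFunction(dst, src); // deduces Number = double, SIZE = 3
}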

Related

Is there any performance difference between accessing a hardcode array and a run time initialization array?

For example, I want to create a square root table using an array SQRT[i] to optimize a game, but I don't know if there is a performance difference between the following initializations when accessing the value of SQRT[i]:
Hardcode array
int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255};
Generate value at run time
#include <cmath>

int SQRT[65536];
int main(){
    for(int i=0;i<65536;i++){
        SQRT[i]=(int)std::sqrt((double)i);
    }
    //other code
    return 0;
}
Some example of accessing them:
if(SQRT[a*a+b*b]>something)
...
I'm not clear whether the program stores or accesses a hard-coded array in a different way, and I also don't know whether the compiler would optimize the hard-coded array to speed up the access time. Is there a performance difference between them when accessing the array?
First, you should do the hard-coded array right:
static const int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255};
(also, using uint8_t instead of int is probably better, to reduce the size of the array and make it more cache-friendly)
Doing this has one big advantage over the alternative: the compiler can easily check that the contents of the array cannot be changed.
Without this, the compiler has to be paranoid -- every function call might change the contents of SQRT, and every pointer could potentially be pointing into SQRT and thus any write through an int* or char* might be modifying the array. If the compiler cannot prove this doesn't happen, then that puts limits on the kinds of optimizations it can do, which in some cases could show a performance impact.
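For instance, a small hypothetical function showing that paranoia in action:
int SQRT[65536]; // mutable: any int* might legally point into it

int f(int* p, int i)
{
    int a = SQRT[i];
    *p = 0;             // the compiler must assume this may modify SQRT[i]
    return a + SQRT[i]; // so SQRT[i] has to be loaded a second time
}
With static const int SQRT[] = {...}; instead, both reads could be folded into a single load (or into a constant, when i is known at compile time).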
Another potential advantage is the ability to resolve things involving constants at compile-time.
If needed, you may be able to help the compiler figure things out with clever use of __restrict__.
In modern C++ you can get the best of both worlds; it should be possible (and in a reasonable way) to write code that runs at compile time to initialize SQRT as a constexpr. That would best be asked as a new question, though.
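A minimal sketch of that constexpr approach, assuming C++17 (where loops and std::array element writes are allowed in constant expressions; compilers may need their constexpr step limits raised for a table this size):
#include <array>
#include <cstdint>

constexpr std::array<std::uint8_t, 65536> make_sqrt_table()
{
    std::array<std::uint8_t, 65536> t{};
    std::uint32_t r = 0;
    for (std::uint32_t i = 0; i < 65536; ++i) {
        if ((r + 1) * (r + 1) <= i) ++r; // integer sqrt grows by at most 1 per step
        t[i] = static_cast<std::uint8_t>(r);
    }
    return t;
}

constexpr auto SQRT = make_sqrt_table(); // built entirely at compile time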
As people said in comments:
if(SQRT[a*a+b*b]>something)
is a horrible example use-case. If that's all you need SQRT for, just square something and compare a*a+b*b against it.
As long as you can tell the compiler that SQRT doesn't alias anything, a run-time loop will make your executable smaller and only add a tiny amount of CPU overhead during startup. Definitely use uint8_t, not int. Loading a 32-bit temporary from an 8-bit memory location is no slower than from a zero-padded 32-bit memory location. (The extra movzx instruction on x86, instead of using a memory operand, will more than pay for itself in reduced cache pollution. RISC machines typically don't allow memory operands anyway, so you always need an instruction to load the value into a register.)
Also, sqrt has a 10-21 cycle latency on Sandy Bridge. If you don't need it often, the int->double, sqrt, double->int chain is not much worse than an L2 cache hit, and it's better than going to L3 or main memory. If you need a lot of sqrt, then sure, make a LUT. The throughput will be much better, even if you're bouncing around in the table and causing L1 misses.
You could optimize the initialization by squaring instead of taking square roots, with something like
#include <stdint.h>
#include <string.h>

uint8_t sqrt_lookup[65536];

void init_sqrt(void)
{
    int idx = 0;
    for (int i = 0; i < 256; i++) {
        // entries in [i*i, (i+1)*(i+1)) all have floor(sqrt) == i,
        // and the final pass ends with idx == 256*256 == 65536
        int iplus1_sqr = (i+1)*(i+1);
        memset(sqrt_lookup + idx, i, iplus1_sqr - idx);
        idx = iplus1_sqr;
    }
}
You can still get the benefits of having sqrt_lookup be const (the compiler knows it can't alias). Either use restrict, or lie to the compiler, so that users of the table see a const array while you actually write to it.
This might involve lying to the compiler by having the table declared extern const in most places, but not in the file that initializes it. You'd have to make sure this actually works and doesn't create code referring to two different symbols. If you just cast away the const in the function that initializes it, you might get a segfault if the compiler placed the table in rodata (or in read-only bss if it's uninitialized, if that's possible on some platforms?).
Maybe we can avoid lying to the compiler, with:
uint8_t restrict private_sqrt_table[65536]; // not sure about this use of restrict, maybe that will do it?
const uint8_t *const sqrt_lookup = private_sqrt_table;
Actually, that's just a const pointer to const data, not a guarantee that what it's pointing to can't be changed by another reference.
The access time will be the same. When you hard-code an array, the C library routines called before main will initialize it (on an embedded system, the start-up code copies the read-write data, i.e. the hard-coded values, from ROM to the RAM address where the array is located; if the array is constant, it is accessed directly from ROM).
If the for loop is used to initialize, then there is the overhead of calling the sqrt function.

Does const call by reference improve performance when applied to primitive types?

Concerning objects (especially strings), call-by-reference is faster than call-by-value because the function call does not need to create a copy of the original object. Using const, one can also ensure that the reference is not abused.
My question is whether const call-by-reference is also faster when using primitive types, like bool, int or double.
void doSomething(const string & strInput, unsigned int iMode);
void doSomething(const string & strInput, const unsigned int & iMode);
My suspicion is that it is advantageous to use call-by-reference as soon as the primitive type's size in bytes exceeds the size of the address value. Even if the difference is small, I'd like to take the advantage because I call some of these functions quite often.
Additional question: Does inlining have an influence on the answer to my question?
My suspicion is that it is advantageous to use call-by-reference as soon as the primitive type's size in bytes exceeds the size of the address value. Even if the difference is small, I'd like to take the advantage because I call some of these functions quite often.
Performance tweaking based on hunches works about 0% of the time in C++ (that's a gut feeling I have about statistics; it usually works...).
It is correct that const T& will be smaller than T if sizeof(T) > sizeof(void*), where a pointer is usually 32 or 64 bits, depending on the system.
Now ask yourself:
1) How many built-in types are bigger than 64 bits?
2) Is not copying 32 bits worth making the code less clear? If your function becomes significantly faster because you didn't copy a 32-bit value to it, maybe it doesn't do much?
3) Are you really that clever? (spoiler alert: no.) See this great answer for the reason why it is almost always a bad idea:
https://stackoverflow.com/a/4705871/1098041
Ultimately just pass by value. If after (thorough) profiling you identify that some function is a bottleneck, and all of the other optimizations that you tried weren't enough (and you should try most of them before this), pass-by-const-reference.
Then see that it doesn't change anything, roll over, and cry.
In addition to the other answers, I would like to note that when you pass a reference and use (i.e. dereference) it a lot in your function, it could be slower than making a copy.
This is because local variables to a function (usually) get loaded into the cache together, but when one of them is a pointer/reference and the function uses that, it could result in a cache miss. Meaning it needs to go to the (slower) main memory to get the pointed to variable, which could be slower than making the copy which is loaded in cache together with the function.
So even for 'small objects' it could be potentially faster to just pass by value.
(I read this in the very good book Computer Systems: A Programmer's Perspective.)
Some more interesting discussion on the whole cache hit/miss topic: How does one write code that best utilizes the CPU cache to improve performance?
On a 64-bit architecture, there is no primitive type---at least not in C++11---which is larger than a pointer/reference. You should test this, but intuitively, there should be the same amount of data shuffled around for a const T& as for an int64_t, and less for any primitive where sizeof(T) < sizeof(int64_t). Therefore, insofar as you can measure any difference, passing primitives by value should be faster if your compiler is doing the obvious thing---which is why I stress that if you need certainty here, you should write a test case.
Another consideration is that primitive function parameters can end up in CPU registers, which makes accessing them as fast as a memory access can be. You might find there are more instructions being generated for your const T& parameters than for your T parameters when T is a primitive. You could test this by checking the assembler output by your compiler.
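For instance, a minimal pair (hypothetical names) you could feed to the compiler and compare with g++ -O2 -S, or on godbolt.org:
int by_value(int x)       { return x + 1; }
int by_cref(const int& x) { return x + 1; }
On typical x86-64 ABIs, by_value receives x directly in a register, while by_cref receives an address and must load through it (unless the call is inlined away).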
I was taught:
Pass by value when an argument variable is one of the fundamental built-in types, such as bool, int, or float. Objects of these types are so small that passing by reference doesn't result in any gain in efficiency. Also if you want to make a copy of a variable.
Pass a constant reference when you want to efficiently pass a value that you don't need to change.
Pass a reference only when you want to alter the value of the argument variable. But try to avoid changing argument variables whenever possible.
const is a keyword which is evaluated at compile time. It does not have any impact on runtime performance. You can read some more about this here: https://isocpp.org/wiki/faq/const-correctness

About the order of input parameters

For a function/method that takes many input parameters, does it make a difference if they are passed in different orders? If it does, in what aspects (readability, efficiency, ...)? I am more curious about what I should do for my own functions/methods.
It seems to me that:
Parameters passed by reference/pointer often come before parameters passed by value. For example:
void* memset( void* dest, int ch, std::size_t count );
Destination parameters often come before source parameters. For example:
void* memcpy( void* dest, const void* src, std::size_t count );
Except for some hard constraints, i.e., parameters with default values must come last. For example:
size_type find( const basic_string& str, size_type pos = 0 ) const;
They are functionally equivalent (achieve the same goal) no matter what order they are passed in.
There are a few reasons it can matter - listed below. The C++ Standard itself doesn't mandate any particular behaviours in this space, so there's no portable way to reason about performance impact, and even if something's demonstrably (slightly) faster in one executable, a change anywhere in the program, or to the compiler options or version, might remove or even reverse the earlier benefit. In practice it's extremely rare to hear people talk about parameter ordering being of any significance in their performance tuning. If you really care you'd best examine your own compiler's output and/or benchmark resultant code.
Exceptions
The order of evaluation of expressions passed to function parameters is unspecified, and it's quite possible that it could be affected by changes to the order they appear in the source code, with some combinations working better in the CPU execution pipeline, or raising an exception earlier that short-circuits some other parameter preparation. This could be a significant performance factor if some of the parameters are temporary objects (e.g. results of expressions) that are expensive to allocate/construct and destruct/deallocate. Again, any change to the program could remove or reverse a benefit or penalty observed earlier, so if you care about this you should create a named temporary for parameters you want evaluated first before making the function call.
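A sketch of the named-temporary workaround mentioned above, with hypothetical types and functions:
#include <string>

std::string make_heavy();                  // hypothetical: expensive to construct
void consume(const std::string& s, int n); // hypothetical consumer

void caller(int cheap)
{
    // The evaluation order of function arguments is unspecified, so hoist
    // the expensive one into a named temporary to pin down the order:
    std::string heavy = make_heavy();
    consume(heavy, cheap);
}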
Registers vs cache (stack memory)
Some parameters may be passed in registers, while others are pushed on to the stack - which effectively means entering at least the fastest of the CPU caches, and implies their handling may be slower.
If the function ends up accessing all the parameters anyway, and the choice is between putting parameter X in a register and Y on the stack or vice versa, it doesn't matter much how they're passed, but given the function may have conditions affecting which variables are actually used (if statements, switches, loops that may or may not be entered, early returns or breaks etc.), it's potentially faster if a variable that's not actually needed was on the stack while one that was needed was in a register.
See http://en.wikipedia.org/wiki/X86_calling_conventions for some background and information on calling conventions.
Alignment and padding
Performance could theoretically be affected by the minutiae of parameter passing conventions: the parameters may need particular alignment for any - or perhaps just full-speed - access on the stack, and the compiler might choose to pad rather than reorder the values it pushes. It's hard to imagine that being significant unless the data for the parameters was on the scale of cache page sizes.
Non-performance factors
Some of the other factors you mention can be quite important - for example, I tend to put any non-const pointers and references first, and name the function load_xxx, so I have a consistent expectation of which parameters may be modified and which order to pass them. There's no particularly dominant convention though.
Strictly speaking it doesn't matter - parameters are pushed onto the stack and the function accesses them by taking them from the stack in some way.
However, most C/C++ compilers allow you to specify alternative calling conventions. For example, Visual C++ supports the __fastcall convention, which stores the first 2 parameters in the ECX and EDX registers, which (in theory) should give you a performance improvement in the right circumstances.
There's also __thiscall which stores the this pointer in the ECX register. If you're doing C++ then this may be useful.
There are some answers here mentioning calling conventions. They have nothing to do with your question: no matter what calling convention you use, the order in which you declare the parameters doesn't matter. It doesn't matter which parameters are passed in registers and which are passed on the stack, as long as the same number of parameters are passed in registers and the same number are passed on the stack. Please note that parameters larger than the native architecture size (4 bytes for 32-bit and 8 bytes for 64-bit) are passed by address, so they are passed with the same speed as smaller data.
Let's take an example:
You have a function with 6 parameters, and a calling convention, let's call it CA, that passes one parameter in a register and the rest (5 in this case) on the stack, and a second calling convention, let's call it CB, that passes 4 parameters in registers and the rest (in this case 2) on the stack.
Now, of course CA will be faster than CB, but it has nothing to do with the order the parameters are declared in. For CA, it will be as fast no matter which parameter you declare first (in the register) and which you declare 2nd, 3rd...6th (stack), and for CB it will be as fast no matter which 4 arguments you declare for registers and which you declare as the last 2 parameters.
Now, regarding your question:
The only rule that is mandatory is that optional parameters must be declared last. No non-optional parameter can follow an optional parameter.
Other than that, you can use whatever order you want, and the only strong advice I can give you is be consistent. Choose a model and stick to it.
Some guidelines you could consider:
destination comes before source. This is to be close to destination = source.
the size of the buffer comes after the buffer: f(char * s, unsigned size)
input parameters first, output parameters last (this conflicts with the first one I gave you)
But there is no "wrong" or "right" or even a universal accepted guideline for the order of the parameters. Choose something and be consistent.
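For example, a hypothetical signature that follows the first two guidelines (destination first, size after the buffer), mirroring the memcpy ordering shown in the question:
void copy_samples(float* dest, const float* src, unsigned count);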
Edit
I thought of a "wrong" way to order your parameters: in alphabetical order :).
Edit 2
For example, both for CA, if I pass a vector(100) and an int, it will be better if vector(100) comes first, i.e. use registers to load the larger data type. Right?
No. As I have mentioned, the data size doesn't matter. Let's talk about a 32-bit architecture (the same discussion is valid for any architecture: 16-bit, 64-bit etc.). Let's analyze the 3 cases we can have regarding the size of the parameters in relation to the native size of the architecture.
Same size: 4-byte parameters. Nothing to talk about here.
Smaller size: a 4-byte register will be used or 4 bytes will be allocated on the stack. So nothing interesting here either.
Larger size (e.g. a struct with many fields, or a static array): no matter which method is chosen for passing this argument, the data resides in memory, and what is passed is a pointer (size 4 bytes) to that data. Again we have a 4-byte register or 4 bytes on the stack.
The size of the parameters doesn't matter.
Edit 3
As @TonyD explained, the order does matter if you don't access all the parameters. See his answer.
I have somehow found a few related pages.
https://softwareengineering.stackexchange.com/questions/101346/what-is-best-practice-on-ordering-parameters-in-a-function
https://google.github.io/styleguide/cppguide.html#Function_Parameter_Ordering
So first, Google's C++ style guide does not really answer the question, since it does not address the actual order within the input parameters or within the output parameters.
The other page basically suggests ordering parameters in a way that makes them easy to understand and use.
For the sake of readability, I personally prefer to order parameters alphabetically. But you can also work out a naming strategy so that the parameters are nicely ordered and still easy to understand and use.

Passing scalar types by value or reference: does it matter?

Granted, micro-optimization is stupid and probably the cause of many mistakes in practice. Be that as it may, I have seen many people do the following:
void function( const double& x ) {}
instead of:
void function( double x ) {}
because it was supposedly "more efficient". Say that function is called ridiculously often in a program, millions of times; does this sort of "optimisation" matter at all?
Long story short: no, and particularly not on most modern platforms, where scalar and even floating-point types are passed via registers. The general rule of thumb I've seen bandied about is 128 bytes as the dividing line between when you should just pass by value and when you should pass by reference.
Given that the data is already stored in a register, you're actually slowing things down by requiring the processor to go out to cache/memory to get the data. That could be a huge hit depending on whether the cache line the data is in is invalid.
At the end of the day it really depends on what the platform ABI and calling convention is. Most modern compilers will even use registers to pass data structures if they will fit (e.g. a struct of two shorts etc.) when optimization is turned up.
Passing by reference in this case is certainly not more efficient by itself. Note that qualifying that reference with const does not mean that the referenced object cannot change. Moreover, it does not mean that the function itself cannot change it (if the referent is not constant, then the function can legally use const_cast to get rid of that const). Taking that into account, it is clear that passing by reference forces the compiler to take into account possible aliasing issues, which in the general case will lead to generation of [significantly] less efficient code in the pass-by-reference case.
In order to take possible aliasing out of the picture, one would have to begin the by-reference version with
void function( const double& x ) {
    double non_aliased_x = x;
    // ... and use `non_aliased_x` from now on
    ...
}
but that would defeat the proposed reasoning for passing by reference in the first place.
Another way to deal with aliasing would be to use some sort of C99-style restrict qualifier
void function( const double& restrict x ) {
but again, even in this case the cons of passing by reference will probably outweigh the pros, as explained in other answers.
When passing by reference, you save 4 bytes of copying to the stack during the function call: it takes 8 bytes to store a double and only 4 bytes to store a pointer (in a 32-bit environment; in a 64-bit environment a pointer takes 64 bits = 8 bytes, so you save nothing) or a reference, which is nothing more than a pointer with a bit of compiler support.
Unless the function is inlined, and depending on the calling convention (the following assumes stack-based parameter passing, which in modern calling conventions is only used when the function has too many arguments*), there are two differences in how the argument is passed and used:
double: The (probably) 8-byte value is written onto the stack and read by the function as is.
double & or double *: The value lies somewhere in memory (it might be "near" the current stack pointer, e.g. if it's a local variable, but it might also be somewhere far away). A (probably) 4- or 8-byte pointer address (32-bit or 64-bit system, respectively) is stored on the stack, and the function needs to dereference the address to read the value. This also requires the value to be in addressable memory, which registers aren't.
This means the stack space required to pass the argument might be a little smaller when using references. This not only decreases the memory requirement but also improves the cache efficiency of the topmost bytes of the stack. On the other hand, when using references, dereferencing adds some extra work.
To summarize: use references for large types (let's say when sizeof(T) > 32, or maybe even more); when stack size and cache hotness play a very important role, maybe already when sizeof(T) > sizeof(T*).
*) See the comments on this and SOReader's answer for what's happening if this is not the case.

Performance of vector::size() : is it as fast as reading a variable?

I have to do an extensive calculation on a big vector of integers. The vector size is not changed during the calculation. The size of the vector is frequently accessed by the code. What is faster in general: using the vector::size() function, or using a helper constant vectorSize storing the size of the vector?
I know that compilers are usually able to inline the size() function when the proper compiler flags are set; however, making a function inline is something that a compiler may do but cannot be forced to do.
Interesting question.
So, what's going to happen? Well, if you debug with gdb you'll see something like 3 member variables (names are not accurate):
_M_begin: pointer to the first element of the dynamic array
_M_end: pointer one past the last element of the dynamic array
_M_capacity: pointer one past the last element that could be stored in the dynamic array
The implementation of vector<T,Alloc>::size() is thus usually reduced to:
return _M_end - _M_begin; // Note: _Mylast - _Myfirst in VC 2008
Now, there are 2 things to consider regarding the actual optimizations possible:
will this function be inlined? Probably: I am no compiler writer, but it's a good bet, since the overhead of a function call would dwarf the actual work here, and since it's templated we have all the code available in the translation unit
will the result be cached (i.e. sort of like having an unnamed local variable)? It could well be, but you won't know unless you disassemble the generated code
In other words:
If you store the size yourself, there is a good chance it will be as fast as the compiler could get it.
If you do not, it will depend on whether the compiler can establish that nothing else is modifying the vector; if not, it cannot cache the variable, and will need to perform memory reads (L1) every time.
It's a micro-optimization. In general, it will be unnoticeable, either because the performance does not matter or because the compiler will perform it regardless. In a critical loop where the compiler does not apply the optimization, it can be a significant improvement.
As I understand the 1998 C++ specification, vector<T>::size() takes constant time, not linear time. So this question likely boils down to whether it's faster to read a local variable than to call a function that does very little work.
I'd therefore claim that storing your vector's size() in a local variable will speed up your program by a small amount, since you'll only call that function (and therefore the small constant amount of time it takes to execute) once instead of many times.
Performance of vector::size(): is it as fast as reading a variable?
Probably not.
Does it matter?
Probably not.
Unless the work you're doing per iteration is tiny (like one or two integer operations) the overhead will be insignificant.
In every implementation I've seen, vector::size() performs a subtraction of end() and begin(), i.e. it's not quite as fast as reading a variable.
When implementing a vector, the implementer has to choose which shall be faster, end() or size(), i.e. whether to store the number of initialized elements or the pointer/iterator to the element after the last initialized one.
In other words: iterate by using iterators.
If you are worried of the size() performance, write your index based for loop like this;
for (size_t i = 0, i_end = container.size(); i < i_end; ++i){
    // do something performance critical
}
I always save vector.size() in a local variable (if the size doesn't change inside the loop!).
Saving it in a local variable instead of calling it on each iteration can be faster.
At least, that's what I experienced.
I can't give you any real numbers, as I tested this a very long time ago. However, from what I can recall, it made a noticeable difference (though potentially only in debug mode), especially with nested loops.
And to all the people complaining about micro-optimization:
It's a single additional line of code that introduces no downsides.
You could write yourself a functor for your loop body and call it via std::for_each. It does the iteration for you, and then your question becomes moot. However, you're introducing a function call (that may or may not get inlined) for every loop iteration, so you'd best profile it if you're not getting the performance you expect.
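A minimal sketch of that approach (using a lambda as the functor; the element work is hypothetical):
#include <algorithm>
#include <vector>

void process(std::vector<int>& v)
{
    // for_each handles the iteration itself, so size() is never
    // consulted in user code:
    std::for_each(v.begin(), v.end(), [](int& x) {
        x *= 2; // hypothetical per-element work
    });
}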
Always get a profile of your application before looking at this sort of micro optimization. Remember that even if it performs a subtraction, the compiler could still easily optimize it in many ways that would negate any performance loss.