About the order of input parameters - c++

For a function/method that takes many input parameters, does it make a difference if they are passed in different orders? If it does, in what aspects (readability, efficiency, ...)? I am more curious about how I should handle this for my own functions/methods.
It seems to me that:
Parameters passed by reference/pointer often come before parameters passed by value. For example:
void* memset( void* dest, int ch, std::size_t count );
Destination parameters often come before source parameters. For example:
void* memcpy( void* dest, const void* src, std::size_t count );
Except for some hard constraints, i.e., parameters with default values must come last. For example:
size_type find( const basic_string& str, size_type pos = 0 ) const;
They are functionally equivalent (they achieve the same goal) no matter what order they are passed in.

There are a few reasons it can matter - listed below. The C++ Standard itself doesn't mandate any particular behaviours in this space, so there's no portable way to reason about performance impact, and even if something's demonstrably (slightly) faster in one executable, a change anywhere in the program, or to the compiler options or version, might remove or even reverse the earlier benefit. In practice it's extremely rare to hear people talk about parameter ordering being of any significance in their performance tuning. If you really care you'd best examine your own compiler's output and/or benchmark resultant code.
Exceptions
The order of evaluation of expressions passed to function parameters is unspecified, and it's quite possible that it could be affected by changes to the order they appear in the source code, with some combinations working better in the CPU execution pipeline, or raising an exception earlier that short-circuits some other parameter preparation. This could be a significant performance factor if some of the parameters are temporary objects (e.g. results of expressions) that are expensive to allocate/construct and destruct/deallocate. Again, any change to the program could remove or reverse a benefit or penalty observed earlier, so if you care about this you should create a named temporary for parameters you want evaluated first before making the function call.
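For instance, here is a minimal sketch of that suggestion - naming a temporary so it is definitely evaluated before the call, rather than relying on the unspecified evaluation order of the arguments themselves (the helper functions are invented purely for illustration):
#include <string>

// Hypothetical helpers, for illustration only.
std::string expensive_to_build() { return std::string(1000, 'x'); }
int cheap_lookup() { return 42; }
void consume(const std::string&, int) {}

void caller()
{
    // The evaluation order of consume()'s two arguments is unspecified, so if
    // it matters that the expensive temporary is built first, name it:
    std::string first = expensive_to_build();
    consume(first, cheap_lookup());
}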
Registers vs cache (stack memory)
Some parameters may be passed in registers, while others are pushed on to the stack - which effectively means entering at least the fastest of the CPU caches, and implies their handling may be slower.
If the function ends up accessing all the parameters anyway, and the choice is between putting parameter X in a register and Y on the stack or vice versa, it doesn't matter much how they're passed, but given the function may have conditions affecting which variables are actually used (if statements, switches, loops that may or may not be entered, early returns or breaks etc.), it's potentially faster if a variable that's not actually needed was on the stack while one that was needed was in a register.
See http://en.wikipedia.org/wiki/X86_calling_conventions for some background and information on calling conventions.
Alignment and padding
Performance could theoretically be affected by the minutiae of parameter passing conventions: the parameters may need particular alignment for any - or perhaps just full-speed - access on the stack, and the compiler might choose to pad rather than reorder the values it pushes. It's hard to imagine that being significant unless the data for the parameters was on the scale of cache page sizes.
Non-performance factors
Some of the other factors you mention can be quite important - for example, I tend to put any non-const pointers and references first, and name the function load_xxx, so I have a consistent expectation of which parameters may be modified and which order to pass them. There's no particularly dominant convention though.
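For illustration, a hedged sketch of what that convention might look like (the function name and parameters are invented, not a recommendation of a particular API):
#include <string>
#include <vector>

// The non-const (output) references come first, and the load_* name signals
// that they will be modified; the const input comes last.
bool load_config(std::vector<std::string>& lines_out,  // filled by the function
                 std::string& error_out,               // filled on failure
                 const std::string& path)              // input only
{
    (void)path;          // a real implementation would open and parse the file
    lines_out.clear();
    error_out.clear();
    return true;
}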

Strictly speaking it doesn't matter - parameters are pushed onto the stack and the function accesses them by taking them from the stack in some way.
However, most C/C++ compilers allow you to specify alternative calling conventions. For example, Visual C++ supports the __fastcall convention, which stores the first 2 parameters in the ECX and EDX registers, which (in theory) should give you a performance improvement in the right circumstances.
There's also __thiscall which stores the this pointer in the ECX register. If you're doing C++ then this may be useful.
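A small sketch of what using __fastcall looks like in practice - note this is MSVC-specific and only meaningful on 32-bit x86, so it is guarded accordingly (the function itself is invented for illustration):
#if defined(_MSC_VER) && defined(_M_IX86)
int __fastcall add_fast(int a, int b)   // a -> ECX, b -> EDX under this convention
{
    return a + b;
}
#else
int add_fast(int a, int b)              // portable fallback, default convention
{
    return a + b;
}
#endif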

There are some answers here mentioning calling conventions. They have nothing to do with your question: no matter what calling convention you use, the order in which you declare the parameters doesn't matter. It doesn't matter which parameters are passed in registers and which are passed on the stack, as long as the same number of parameters are passed in registers and the same number of parameters are passed on the stack. Please note that parameters that are larger than the native architecture size (4 bytes for 32-bit and 8 bytes for 64-bit) are passed by address, so they are passed with the same speed as smaller data.
Let's take an example:
You have a function with 6 parameters. And you have a calling convention, let's call it CA, that passes one parameter in a register and the rest (5 in this case) on the stack, and a second calling convention, let's call it CB, that passes 4 parameters in registers and the rest (in this case 2) on the stack.
Now, of course CB will be faster than CA, but it has nothing to do with the order in which the parameters are declared. For CA, it will be just as fast no matter which parameter you declare first (in the register) and which you declare 2nd, 3rd...6th (on the stack), and for CB it will be just as fast no matter which 4 arguments you declare for registers and which you declare as the last 2 parameters.
Now, regarding your question:
The only rule that is mandatory is that optional parameters must be declared last. No non-optional parameter can follow an optional parameter.
Other than that, you can use whatever order you want, and the only strong advice I can give you is be consistent. Choose a model and stick to it.
Some guidelines you could consider:
destination comes before source. This is to be close to destination = source.
the size of the buffer comes after the buffer: f(char * s, unsigned size)
input parameters first, output parameters last (this conflicts with the first one I gave you)
But there is no "wrong" or "right", or even a universally accepted guideline, for the order of the parameters. Choose something and be consistent.
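For illustration only, a few invented signatures that follow those guidelines (plain declarations, not a recommended API):
#include <cstddef>

void copy_buffer(char* dest, const char* src, std::size_t count);  // destination before source
void read_line(char* buffer, unsigned size);                       // the buffer, then its size
int  parse_number(const char* input, int* value_out);              // inputs first, output last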
Edit
I thought of a "wrong" way to order your parameters: by alphabetical order :).
Edit 2
For example, for CA, if I pass a vector(100) and an int, it will be better if the vector(100) comes first, i.e. use the register to load the larger data type. Right?
No. As I have mentioned, the data size doesn't matter. Let's talk about a 32-bit architecture (the same discussion is valid for any architecture: 16-bit, 64-bit, etc.). Let's analyze the 3 cases we can have regarding the size of the parameters in relation to the native size of the architecture.
Same size: 4-byte parameters. Nothing to talk about here.
Smaller size: a 4-byte register will be used, or 4 bytes will be allocated on the stack. So nothing interesting here either.
Larger size (e.g. a struct with many fields, or a static array): no matter which method is chosen for passing this argument, the data resides in memory, and what is passed is a pointer (size 4 bytes) to that data. Again we have a 4-byte register or 4 bytes on the stack.
The size of the parameters doesn't matter.
Edit 3
As @TonyD explained, the order matters if you don't access all the parameters. See his answer.

I have found a few related pages:
https://softwareengineering.stackexchange.com/questions/101346/what-is-best-practice-on-ordering-parameters-in-a-function
https://google.github.io/styleguide/cppguide.html#Function_Parameter_Ordering
First, Google's C++ style guide does not really answer the question, since it does not address the actual order within the input parameters or within the output parameters.
The other page basically suggests ordering parameters in a way that makes them easy to understand and use.
For the sake of readability, I personally prefer to order parameters alphabetically. But you can also work out a strategy for naming the parameters so that they come out nicely ordered and are still easy to understand and use.

Related

Performance cost of passing by value vs. by reference or by pointer?

Let's consider an object foo (which may be an int, a double, a custom struct, a class, whatever). My understanding is that passing foo by reference to a function (or just passing a pointer to foo) leads to higher performance since we avoid making a local copy (which could be expensive if foo is large).
However, from the answer here it seems that pointers on a 64-bit system can be expected in practice to have a size of 8 bytes, regardless of what's being pointed to. On my system, a float is 4 bytes. Does that mean that if foo is of type float, then it is more efficient to just pass foo by value rather than give a pointer to it (assuming no other constraints that would make using one more efficient than the other inside the function)?
It depends on what you mean by "cost", and properties of the host system (hardware, operating system) with respect to operations.
If your cost measure is memory usage, then the calculation of cost is obvious - add up the sizes of whatever is being copied.
If your measure is execution speed (or "efficiency") then the game is different. Hardware (and operating systems and compiler) tend to be optimised for performance of operations on copying things of particular sizes, by virtue of dedicated circuits (machine registers, and how they are used).
It is common, for example, for a machine to have an architecture (machine registers, memory architecture, etc.) which results in a "sweet spot" - copying variables of some size is most "efficient", but copying larger OR SMALLER variables is less so. Larger variables will cost more to copy, because there may be a need to do multiple copies of smaller chunks. Smaller ones may also cost more, because the compiler needs to copy the smaller value into a larger variable (or register), do the operations on it, then copy the value back.
Examples with floating point include some Cray supercomputers, which natively support double precision floating point (aka double in C++), where all operations on single precision (aka float in C++) are emulated in software. Some older 32-bit x86 CPUs also worked internally with 32-bit integers, and operations on 16-bit integers required more clock cycles due to translation to/from 32 bits (this is not true of more modern 32-bit or 64-bit x86 processors, which allow copying 16-bit integers to/from 32-bit registers, and operating on them, with fewer such penalties).
It is a bit of a no-brainer that copying a very large structure by value will be less efficient than creating and copying its address. But, because of factors like the above, the cross-over point between "best to copy something of that size by value" and "best to pass its address" is less clear.
Pointers and references tend to be implemented in a similar manner (e.g. pass by reference can be implemented in the same way as passing a pointer) but that is not guaranteed.
The only way to be sure is to measure it. And realise that the measurements will vary between systems.
There is one thing nobody mentioned.
There is a certain GCC optimization called IPA SRA, that replaces "pass by reference" with "pass by value" automatically: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html (-fipa-sra)
This is most likely done for scalar types (e.g. int, double, etc.) that do not have non-default copy semantics and can fit into CPU registers.
This makes
void f(const int& x)
probably as fast (and as space-optimized) as
void f(int x)
So with this optimization enabled, using references for small types should be as fast as passing them by value.
On the other hand, passing (for example) std::string by value could not be optimized to by-reference speed, as custom copy semantics are involved.
From what I understand, using pass by reference for everything should never be slower than manually picking what to pass by value and what to pass by reference.
This is extremely useful especially for templates:
template<class T>
void f(const T&)
{
    // Something
}
is always optimal
You must test any given scenario where performance is absolutely critical, but be very careful about trying to force the compiler to generate code in a specific way.
The compiler's optimizer is allowed to re-write your code in any way it chooses as long as the final result is the provably same, which can lead to some very nice optimizations.
Consider that passing a float by value requires making a copy of the float, but under the right conditions, passing a float by reference could allow storing the original float in a CPU floating-point register, and treat that register as the "reference" parameter to the function. By contrast, if you pass a copy, the compiler has to find a place to store the copy in order to preserve the contents of the register, or even worse, it may not be able to use a register at all because of the need for preserving the original (this is especially true in recursive functions!).
This difference is also important if you are passing the reference to a function that could be inlined, where the reference may reduce the cost of inlining since the compiler doesn't have to guarantee that a copied parameter cannot modify the original.
The more a language allows you to focus on describing what you want done rather than how you want it done, the more the compiler is able to find creative ways of doing the hard work for you. In C++ especially, it is generally best not to worry about performance, and instead focus on describing what you want as clearly and simply as possible. By trying to describe how you want the work done, you will just as often prevent the compiler from doing its job of optimizing your code for you.
Does that mean that if foo is of type float, then it is more efficient to just pass foo by value?
Passing a float by value could be more efficient. I would expect it to be more efficient - partly because of what you said: A float is smaller than a pointer on a system that you describe. But in addition, when you copy the pointer, you still need to dereference the pointer to get the value within the function. The indirection added by the pointer could have a significant effect on the performance.
The efficiency difference could be negligible. In particular, if the function can be inlined and optimization is enabled, there is likely not going to be any difference.
You can find out if there is any performance gain from passing the float by value in your case by measuring. You can measure the efficiency with a profiling tool.
You may substitute pointer with reference and the answer will still apply equally well.
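If you do want to measure it, something along the lines of the following rough micro-benchmark is one possible starting point - everything here is invented for illustration, and an optimizer can easily distort or remove such loops, so treat the numbers with suspicion and prefer a real profiler:
#include <chrono>
#include <cstdio>

float by_value(float f)          { return f * 1.0001f; }
float by_pointer(const float* f) { return *f * 1.0001f; }

int main()
{
    constexpr int N = 100'000'000;
    float acc = 0.0f;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) acc = by_value(acc + 1.0f);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) { float tmp = acc + 1.0f; acc = by_pointer(&tmp); }
    auto t2 = std::chrono::steady_clock::now();

    std::printf("by value:   %lld ms\n", (long long)
        std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    std::printf("by pointer: %lld ms\n", (long long)
        std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count());

    // Use acc so the loops aren't trivially discarded as dead code.
    return (acc > 0.0f) ? 0 : 1;
}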
Is there some sort of overhead in using a reference, the way that there is when a pointer must be dereferenced?
Yes. It is likely that a reference has exactly the same performance characteristics as a pointer does. If it is possible to write a semantically equivalent program using either references or pointers, both are probably going to generate identical assembly.
If passing a small object by pointer would be faster than copying it, then surely it would be true for an object of the same size, wouldn't you agree? How about a pointer to a pointer - that's about the size of a pointer, right? (It's exactly the same size.) Oh, but pointers are objects too. So, if passing an object (such as a pointer) by pointer is faster than copying the object (the pointer), then passing a pointer to a pointer to a pointer to a pointer ... to a pointer would be faster than the program with fewer pointers, which is still faster than the one that didn't use pointers at all... Perhaps we've found an infinite source of efficiency here :)
Always prefer passing by reference over pointers if you want optimized execution time and to avoid random access. As for pass by reference vs. by value, GCC optimizes your code such that small variables that do not need to be changed are passed by value.
Can't believe that no one brought up the correct answer yet.
On a 64-bit system, passing 8 bytes or 4 bytes has exactly the same cost. The reason for this is that the data bus is 64 bits wide (which is 8 bytes), and thus even if you pass only 4 bytes it doesn't make a difference to the machine: the data bus is 8 bytes wide.
The cost only increases if you want to move more than 64 bit. Everything equal or below 64 bits comes at the same number of clock cycles.

Does const call by reference improve performance when applied to primitive types?

Concerning objects (especially strings), call by reference is faster than call by value because the function call does not need to create a copy of the original object. Using const, one can also ensure that the reference is not abused.
My question is whether const call-by-reference is also faster if using primitive types, like bool, int or double.
void doSomething(const string & strInput, unsigned int iMode);
void doSomething(const string & strInput, const unsigned int & iMode);
My suspicion is that it is advantageous to use call-by-reference as soon as the primitive type's size in bytes exceeds the size of the address value. Even if the difference is small, I'd like to take the advantage because I call some of these functions quite often.
Additional question: Does inlining have an influence on the answer to my question?
My suspicion is that it is advantageous to use call-by-reference as soon as the primitive type's size in bytes exceeds the size of the address value. Even if the difference is small, I'd like to take the advantage because I call some of these functions quite often.
Performance tweaking based on hunches works about 0% of the time in C++ (that's a gut feeling I have about statistics; it usually works...)
It is correct that a const T& will be smaller than a T if sizeof(T) > sizeof(ptr), so usually 32 bits, or 64, depending on the system.
Now ask yourself:
1) How many built-in types are bigger than 64 bits?
2) Is not copying 32 bits worth making the code less clear? If your function becomes significantly faster because you didn't copy a 32-bit value to it, maybe it doesn't do much?
3) Are you really that clever? (spoiler alert: no.) See this great answer for the reason why it is almost always a bad idea:
https://stackoverflow.com/a/4705871/1098041
Ultimately, just pass by value. If after (thorough) profiling you identify that some function is a bottleneck, and all of the other optimizations that you tried weren't enough (and you should try most of them before this), pass by const reference.
Then see that it doesn't change anything, roll over, and cry.
In addition to the other answers, I would like to note that when you pass a reference and use (i.e. dereference) it a lot in your function, it could be slower than making a copy.
This is because local variables of a function (usually) get loaded into the cache together, but when one of them is a pointer/reference and the function uses it, it could result in a cache miss. That means it needs to go to the (slower) main memory to get the pointed-to variable, which could be slower than making the copy that is loaded into the cache together with the function's other locals.
So even for 'small objects' it could be potentially faster to just pass by value.
(I read this in the very good book: Computer Systems: A Programmer's Perspective.)
Some more interesting discussion on the whole cache hit/miss topic: How does one write code that best utilizes the CPU cache to improve performance?
On a 64-bit architecture, there is no primitive type---at least not in C++11---which is larger than a pointer/reference. You should test this, but intuitively, there should be the same amount of data shuffled around for a const T& as for an int64_t, and less for any primitive where sizeof(T) < sizeof(int64_t). Therefore, insofar as you can measure any difference, passing primitives by value should be faster if your compiler is doing the obvious thing---which is why I stress that if you need certainty here, you should write a test case.
Another consideration is that primitive function parameters can end up in CPU registers, which makes accessing them as fast as a memory access can be. You might find there are more instructions being generated for your const T& parameters than for your T parameters when T is a primitive. You could test this by checking the assembler output by your compiler.
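One way to do that is to compile a tiny pair of functions like the following (names invented) with something like g++ -O2 -S, or paste them into an online compiler explorer, and compare the generated instructions:
int take_by_value(int x)      { return x + 1; }
int take_by_ref(const int& x) { return x + 1; }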
I was taught:
Pass by value when an argument variable is one of the fundamental built-in types, such as bool, int, or float. Objects of these types are so small that passing by reference doesn't result in any gain in efficiency. Also if you want to make a copy of a variable.
Pass a constant reference when you want to efficiently pass a value that you don't need to change.
Pass a reference only when you want to alter the value of the argument variable. But try to avoid changing argument variables whenever possible.
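A minimal sketch of those three conventions (types and names are invented for illustration):
#include <string>

void log_value(int n);                       // small built-in type: pass by value
void print_report(const std::string& text);  // larger object, read only: pass by const reference
void normalize(std::string& text);           // the function changes the argument: pass by reference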
const is a keyword which is evaluated at compile time. It does not have any impact on runtime performance. You can read some more about this here: https://isocpp.org/wiki/faq/const-correctness

How many arguments can theoretically be passed as parameters in c++ functions?

I was wondering if there was a limit on the number of parameters you can pass to a function.
I'm just wondering because I have to maintain functions with 5+ arguments here at my job.
And is there a critical threshold in nbArguments, talking about performance, or is it linear?
Neither the C nor C++ standard places an absolute requirement on the number of arguments/parameters you must be able to pass when calling a function, but the C standard suggests that an implementation should support at least 127 parameters/arguments (§5.2.4.1/1), and the C++ standard suggests that it should support at least 256 parameters/arguments (§B/2).
The precise wording from the C standard is:
The implementation shall be able to translate and execute at least one program that
contains at least one instance of every one of the following limits.
So, at least one such function must be successfully translated, but there's no guarantee that if your code attempts to do so, compilation will succeed (though it probably will, in a modern implementation).
The C++ standard doesn't even go that far, only going so far as to say that:
The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.
As far as what's advisable: it depends. A few functions (especially those using variadic parameters/variadic templates) accept an arbitrary number of arguments of (more or less) arbitrary types. In this case, passing a relatively large number of parameters can make sense because each is more or less independent from the others (e.g., printing a list of items).
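As a sketch of that case, here is a small C++17 variadic template that prints however many independent items it is handed (the name is invented):
#include <iostream>

template <typename... Args>
void print_all(const Args&... args)
{
    ((std::cout << args << ' '), ...);   // C++17 fold expression over the arguments
    std::cout << '\n';
}

int main()
{
    print_all(1, 2.5, "three");   // any number and mix of printable arguments
}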
When the parameters are more... interdependent, so you're not just passing a list or something on that order, I agree that the number should be considerably more limited. In C, I've seen a few go as high as 10 or so without being terribly unwieldy, but that's definitely starting to push the limit even at best. In C++, it's generally so much easier (and more common) to aggregate related items into a struct or class that I can't quite imagine that many parameters unless it was in a C-compatibility layer or something on that order, where a more structured approach might force even more work on the user.
In the end, it comes down to this: you're going to either have to pass a smaller number of items that are individually larger, or else break the function call up into multiple calls, passing a smaller number of parameters to each.
The latter can tend to lead toward a stateful interface, that basically forces a number of calls in a more or less fixed order. You've reduced the complexity of a single call, but may easily have done little or nothing to reduce the overall complexity of the code.
In the other direction, a large number of parameters may well mean that you've really defined the function to carry out a large number of related tasks instead of one clearly defined task. In this case, finding more specific tasks for individual functions to carry out, and passing a smaller set of parameters needed by each may well reduce the overall complexity of the code.
It seems like you're veering into subjective territory, considering that C varargs are (usually) passed mechanically the same way as other arguments.
The first few arguments are placed in CPU registers, under most ABIs. How many depends on the number of architectural registers; it may vary from two to ten. In C++, empty classes (such as overload dispatch tags) are usually omitted entirely. Loading data into registers is usually "cheap as free."
After registers, arguments are copied onto the stack. You could say this takes linear time, but such operations are not all created equal. If you are going to be calling a series of functions on the same arguments, you might consider packaging them together as a struct and passing that by reference.
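A hedged sketch of that idea, with all the names invented: related arguments that travel together are bundled into a struct, and the bundle is passed by const reference to each call in the series.
struct RenderSettings {
    int    width;
    int    height;
    double gamma;
    bool   vsync;
};

void draw_frame(const RenderSettings&) { /* ... */ }
void save_frame(const RenderSettings&) { /* ... */ }

void run(const RenderSettings& s)
{
    draw_frame(s);   // the same bundle is handed to each call
    save_frame(s);
}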
To literally answer your question, the maximum number of arguments is an implementation-defined quantity, meaning that the ISO standard requires your compiler manual to document it. The C++ standard also recommends (Annex B) that no implementation balk at less than 256 arguments, which should be Enough For Anyone™. C requires (§5.2.4.1) support for at least 127 arguments, although that requirement is normatively qualified such as to weaken it to only a recommendation.
It is not really dirty; sometimes you can't avoid using 4+ arguments while maintaining stability and efficiency. If possible, the number should be minimized for the sake of clarity (perhaps by use of structs), especially if you think that some function is becoming a god construct (a function that runs most of the program; such functions should be avoided for the sake of stability). If that is the case, functions that take large numbers of arguments are pretty good indicators of such constructs.

Passing scalar types by value or reference: does it matter?

Granted, micro-optimization is stupid and probably the cause of many mistakes in practice. Be that as it may, I have seen many people do the following:
void function( const double& x ) {}
instead of:
void function( double x ) {}
because it was supposedly "more efficient". Say that function is called ridiculously often in a program, millions of times; does this sort of "optimisation" matter at all?
Long story short: no, and particularly not on most modern platforms, where scalar and even floating-point types are passed via register. The general rule of thumb I've seen bandied about is 128 bytes as the dividing line between when you should just pass by value and when you should pass by reference.
Given the fact that the data is already stored in a register you're actually slowing things down by requiring the processor to go out to cache/memory to get the data. That could be a huge hit depending on if the cache line the data is in is invalid.
At the end of the day it really depends on what the platform ABI and calling convention is. Most modern compilers will even use registers to pass data structures if they will fit (e.g. a struct of two shorts etc.) when optimization is turned up.
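As a sketch of that last point (names invented), a tiny aggregate like this typically fits in a single register under common 64-bit ABIs, so passing it by value usually costs no more than passing an int:
struct Pair16 {
    short a;
    short b;
};

int sum(Pair16 p) { return p.a + p.b; }   // often passed entirely in a register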
Passing by reference in this case is certainly not more efficient by itself. Note that qualifying that reference with a const does not mean that the referenced object cannot change. Moreover, it does not mean that the function itself cannot change it (if the referee is not constant, then the function can legally use const_cast to get rid of that const). Taking that into account, it is clear that passing by reference forces the compiler to take into account possible aliasing issues, which in the general case will lead to the generation of [significantly] less efficient code in the pass-by-reference case.
In order to take possible aliasing out of the picture, one would have to begin the reference version with
void function( const double& x ) {
    double non_aliased_x = x;
    // ... and use `non_aliased_x` from now on
    ...
}
but that would defeat the proposed reasoning for passing by reference in the first place.
Another way to deal with aliasing would be to use some sort of C99-style restrict qualifier
void function( const double& restrict x ) {
but again, even in this case the cons of passing by reference will probably outweigh the pros, as explained in other answers.
With the reference version you save 4B from being copied to the stack during the function call: it takes 8B to store a double and only 4B to store a pointer (in a 32-bit environment; in a 64-bit environment a pointer takes 64 bits = 8B, so you don't save anything) or a reference, which is nothing more than a pointer with a bit of compiler support.
Unless the function is inlined, and depending on the calling convention (the following assumes stack-based parameter passing, which in modern calling conventions is only used when the function has too many arguments*), there are two differences in how the argument is passed and used:
double: The (probably) 8 byte large value is written onto the stack and read by the function as is.
double & or double *: The value lies somewhere in the memory (might be "near" the current stack pointer, e.g. if it's a local variable, but might also be somewhere far away). A (probably) 4 or 8 byte large pointer address (32 bit or 64 bit system respectively) is stored on the stack and the function needs to dereference the address to read the value. This also requires the value to be in addressable memory, which registers aren't.
This means the stack space required to pass the argument might be a little bit less when using references. This not only decreases the memory requirement but also improves cache efficiency for the topmost bytes of the stack. When using references, dereferencing adds some extra work.
To summarize: use references for large types (let's say when sizeof(T) > 32, or maybe even more); when stack size and hotness play a very important role, maybe already when sizeof(T) > sizeof(T*).
*) See the comments on this and SOReader's answer for what's happening if this is not the case.

Pass parameters as array?

Is it better (performance …) to pass parameters as an array
template<typename Number>
static int solveQuadraticFunction(Number* dst, const Number* src)
{
    Number a = src[0], b = src[1], c = src[2];
    // …
}
or the "standard" way
template<typename Number>
static int solveQuadraticFunction(Number* dst, Number a, Number b, Number c)
{
// …
}
All questions regarding performance here wind up having the same answer - implement both solutions and time them. Having said that, I can't see any reason the array version would be significantly faster or slower, but it sure is less convenient and less readable.
Probably not.
It depends on the arrangement of the arguments before the call. In the former case, you're requiring that the arguments be arranged into an array before the call, which might already be the case, or it might not; if it is already the case, and there are a large number of arguments, then it may be more optimal simply because it doesn't require the values to be assembled on the stack. However, it might result in the values simply being copied from the array onto the stack inside the called function instead of outside it, depending on how you then access the arguments (the specific example you give looks problematic: you define local variables and assign them from array elements; local variables typically live on the stack, though the compiler may be able to optimize them away).
Of course if the argument are not already arranged in an array before the call, then there is no gain (and there is probably at least a slight penalty) because you have to find somewhere to store the arguments as an array - which might involve memory allocation/deallocation - and then the arguments must be accessed indirectly via a pointer, which also has a slight cost.
I doubt you'll see any difference in performance; given appropriate optimization settings and inlining, the compiler should be able to decide which one to use.
The second option should be preferred when there's a small number of arguments, since it allows the compiler to check the number of items passed in.
(But concerning your example, be sure to use references: Number const &a=src[0], &b=src[1], &c=src[2]. These will be optimized away by any half-decent compiler.)
You will be pushing 2 more variables onto the stack when you call the second method than the first.
But it probably makes very little difference unless this was running in a very tight loop.
This depends on many things and is of course hardware/platform-dependent. In general, for many parameters (> 4) the array method is probably more efficient. If the parameter type doesn't fit into a CPU register, passing as an array should always be more efficient.
Of course it's better to pass the address of the first element rather than pushing all the elements. You can either pass the pointer or pass the array by reference, as below:
template<typename Number, unsigned int SIZE>
static int solveQuadraticFunction(Number* dst, Number (&src)[SIZE])
{
    // … src is passed by reference
}
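A quick usage sketch of the reference-to-array overload above, assuming it is given a real body - the array size is deduced, so passing a wrongly sized array fails to compile (the coefficient values are invented):
double roots[2];
double coeffs[3] = {1.0, -3.0, 2.0};             // a, b, c of a*x^2 + b*x + c
int n = solveQuadraticFunction(roots, coeffs);   // SIZE is deduced as 3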