Google's style guide about input/output parameters as pointers - c++

The Google C++ Style Guide draws a clear distinction (strictly enforced by cpplint.py) between input parameters (→ const ref, value) and input-output or output parameters (→ non-const pointers):
Parameters to C/C++ functions are either input to the function, output
from the function, or both. Input parameters are usually values or
const references, while output and input/output parameters will be
non-const pointers.
And further :
In fact it is a very strong convention in Google code that input arguments are values or const references while output arguments are pointers.
But I can't figure out why input/output arguments (I leave output arguments aside) should not be passed by reference. On Stack Overflow there are plenty of topics related to this question: e.g. here, the accepted answer clearly says that
it's mostly about style
but that if
you want to be able to pass null, you must use a pointer
So, what's the point to always demand a pointer if I want to avoid the pointer to be null ? Why only use references for input arguments ?

The reason for demanding that output parameters are passed as pointers is simple:
It makes it clear at the call site that the argument is potentially going to be mutated:
foo(x, y); // x and y won't be mutated
bar(x, &y); // y may be mutated
When a code base evolves and undergoes incremental changes that are reviewed by people who may not know the entire context all the time, it is important to be able to understand the context and impact of a change as quickly as possible. So with this style rule, it is immediately clear whether a change introduces a mutation.
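As a rough illustration of the call-site argument, here is a minimal sketch. AppendGreeting and AppendGreetingRef are hypothetical names made up for this example; both do the same work, but only the pointer version shows the possible mutation at the call site.

```cpp
#include <string>

// Hypothetical helper: input by const reference, output by non-const pointer
// (the convention the style guide asks for).
void AppendGreeting(const std::string& name, std::string* out) {
    *out += "Hello, " + name;
}

// Same operation with a reference output parameter.
void AppendGreetingRef(const std::string& name, std::string& out) {
    out += "Hello, " + name;
}
```

A call reads AppendGreeting(name, &text) in the first case and AppendGreetingRef(name, text) in the second; only the former hints to a reviewer that text may change.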

The point they are making (which I disagree with) is that say I have some function
void foo(int a, Bar* b);
If the b argument is optional, or it is unnecessary sometimes, you can call the function like so
foo(5, nullptr);
If the function was declared as
void foo(int a, Bar& b);
Then there is no way to not pass in a Bar.
This point (emphasis mine) is completely opinion-based and up to the developer's discretion.
In fact it is a very strong convention in Google code that input arguments are values or const references while output arguments are pointers.
If I intend for b to be an output parameter, either of the following are perfectly valid and reasonable.
void foo(int a, Bar* b); // The version Google suggests
void foo(int a, Bar& b); // Reference version, also perfectly fine.

Your first question: "So, what's the point to always demand a pointer if I want to avoid the pointer to be null?"
Using a pointer announces to the caller that their variable may be modified. If I am calling foo(bar), is bar going to be modified? If I am calling foo(&bar) it's clear that the value of bar may be modified.
There are many examples of functions which take in a null indicating an optional output parameter (off the top of my head time is a good example.)
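In the same spirit as time, an optional output parameter might look like the sketch below. Divide is a hypothetical function invented for this example, and it assumes the caller never passes a zero denominator.

```cpp
// Hypothetical function: returns the quotient and, if the caller asks for it,
// also stores the remainder. Assumes den != 0.
int Divide(int num, int den, int* remainder) {
    if (remainder != nullptr) {
        *remainder = num % den;  // only written when the caller wants it
    }
    return num / den;
}
```

Callers who do not care about the remainder pass nullptr, which a reference parameter could not express.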
Your second question: "Why only use references for input arguments?"
Working with a reference parameter is easier than working with a pointer argument.
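int foo(const int* input){
    int result = *input;
    // assumes *input > 0; input is const, so only result changes
    while(result < 100){
        result += *input;
    }
    return result;
}
This code rewritten with a reference looks like:
int foo(const int& input){
    int result = input;
    // assumes input > 0; input is const, so only result changes
    while(result < 100){
        result += input;
    }
    return result;
}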
You can see that using a const int& input simplifies the code.

They likely use it for consistency because they use output parameters both as references to existing memory (they're modifying previously initialized variables) and as actual outputs (the output arguments are assumed to be assigned by the function itself). For consistency, they use it as a way to more clearly indicate inputs vs. outputs.
If you never need a function/method to assign the memory of the output parameter, like returning a pointer from a lookup or allocating memory itself and returning it through a pointer, use references. If you need to do that but don't care about using pointers to act as an indication of whether a parameter is input or output, use references for output parameters when appropriate. There's no absolute requirement to use pointers in all cases unless the function/method itself requires it.

Related

In C++ Output Parameter vs Return Value

What is/are the difference(s) between output parameters and return values? I've looked everywhere and I can't seem to find a simple definition for either.
The return value is something well defined in C++; for instance, x in return x; is the return value, whereas int in the function declaration int myfunc() is the return type.
The concept of output parameters is not defined in C++. However, one could interpret it as either of the following:
function parameters passed by non-const reference, as long as you actually modify it in the function, otherwise why would you call it output? An example is x in the following function declaration: void myfunc(int& x);
function parameters passed by (not necessarily const) pointer to non-const, like x in both the following function declarations: void fun1(int * x) and void fun2(int * const x);
concerning this latter case, it allows "encoding" a missing parameter in as a nullptr default value, as in void fun3(int * x = nullptr).
A first difference is aesthetic: which one you like the most?
Another one is that the former concept and syntax help you convey the message that the function is giving back a value, something which is clear at the call site too (whereas at the call site you might not know if a parameter corresponding to an argument is passed by reference or not).
There are several differences between these two ways a function can have "consequences" at the call site. For instance, you can only have one return value, whereas you can have as many parameters as you like.
Performance-wise, compiler optimizations can minimize the difference, and maybe you should not worry (yet) about it.
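To make the comparison concrete, here is a small sketch of the same computation written both ways. MinMaxOut and MinMax are hypothetical names; the second version works around the "one return value" limit by bundling two results into a std::pair.

```cpp
#include <utility>

// Hypothetical example: two results delivered through output parameters.
void MinMaxOut(int a, int b, int& lo, int& hi) {
    lo = (a < b) ? a : b;
    hi = (a < b) ? b : a;
}

// The same two results bundled into a single return value.
std::pair<int, int> MinMax(int a, int b) {
    return (a < b) ? std::make_pair(a, b) : std::make_pair(b, a);
}
```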

C++ - Reference, Pointers in Arguments

There are many questions about "when do I use reference and when pointers?". They confused me a little bit. I thought a reference wouldn't take any memory because it's just the address.
Now I made a simple Date class and showed it to the Code Review community. They told me not to use the reference in the following example. But why?
Someone told me that it'll allocate the same memory a pointer would allocate. That's the opposite of what I learned.
class A{
int a;
public:
void setA(const int& b) { a = b; } /* Bad! - But why?*/
};
class B{
int b;
public:
void setB(int c) { b = c; } /* They told me to do this */
};
So when do I use references or pointers in arguments and when just a simple copy? Without the reference in my example, is the constant unnecessary?
It is not guaranteed to be bad. But it is unnecessary in this specific case.
In many (or most) contexts, references are implemented as pointers in disguise. Your example happens to be one of those cases. Assuming that the function does not get inlined, parameter b will be implemented "under the hood" as a pointer. So, what you really pass into setA in the first version is a pointer to int, i.e. something that provides indirect access to your argument value. In the second version you pass an immediate int, i.e. something that provides direct access to your argument value.
Which is better and which is worse? Well, a pointer in many cases has greater size than an int, meaning that the first variant might pass a larger amount of data. This might be considered "bad", but since both data types will typically fit into the hardware word size, it will probably make no appreciable difference, especially if parameters are passed in CPU registers.
Also, in order to read b inside the function you have to dereference that disguised pointer. This is also "bad" from the performance point of view.
These are the formal reasons one would prefer to pass by value any parameters of small size (smaller than or equal to pointer size). For parameters of bigger size, passing by const reference becomes a better idea (assuming you don't explicitly require a copy).
However, in most cases a function that simple will probably be inlined, which will completely eliminate the difference between the two variants, regardless of which parameter type you use.
The matter of const being unnecessary in the second variant is a different story. In the first variant that const serves two important purposes:
1) It prevents you from modifying the parameter value, and thus protects the actual argument from modification. If the reference weren't const, you would be able to modify the reference parameter and thus modify the argument.
2) It allows you to use rvalues as arguments, e.g. call some_obj.setA(5). Without that const such calls would be impossible.
In the second version neither of these is an issue. There's no need to protect the actual argument from modification, since the parameter is a local copy of that argument. Regardless of what you do to the parameter, the actual argument will remain unchanged. And you can already use rvalues as arguments to setB regardless of whether the parameter is declared const or not.
For this reason people don't normally use top-level const qualifiers on parameters passed by value. But if you do declare it const, it will simply prevent you from modifying the local b inside the function. Some people actually like that, since it enforces the moderately popular "don't modify original parameter values" convention, for which reason you might sometimes see top-level const qualifiers being used in parameter declarations.
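A top-level const on a by-value parameter could be sketched as follows, reusing class B from the question. The getB accessor and the default initializer are additions for this example only.

```cpp
class B {
    int b = 0;
public:
    // Top-level const: c is still a local copy of the argument; it just
    // cannot be modified inside the function body.
    void setB(const int c) { b = c; }

    // Hypothetical accessor, added so the effect can be observed.
    int getB() const { return b; }
};
```

Callers see no difference: setB(5) and setB(someInt) both work, and the caller's variable is never touched.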
If you have a light-weight type like an int or long you should pass it by value, because there won't be the additional cost of working with references. But when you are passing heavy types, you should use references.
I agree with the reviewer. And here's why:
A (const or non-const) reference to a small simple type, such as int, will be more complex (in terms of number of instructions). This is because the calling code will have to pass the address of the argument into setA, and then inside setA the value has to be dereferenced from the address stored in b. In the case where b is a plain int, it just copies the value itself. So there is at least one memory access saved. This may not make much of a difference over the runtime of a large program, but if you keep adding one extra cycle everywhere you do this, it soon adds up to noticeably slower code.
I had a look at a piece of code that went something like this:
class X
{
std::vector<int> v;
public:
...
bool find(int& index, int b);
....
};
bool X::find(int &index, int b)
{
while(v[index] != b)
{
if (index == v.size()-1)
{
return false;
}
index++;
}
return true;
}
Rewriting this code to:
bool X::find(int &index, int b)
{
int i = index;
while(v[i] != b)
{
if (i == v.size()-1)
{
index = i;
return false;
}
i++;
}
index = i;
return true;
}
meant that this function went from about 30% of the total execution of some code that called find quite a bit, to about 5% of the execution time of the same test. Because the compiler put i in a register, and only updated the reference value when it finished searching.
References are implemented as pointers (that's not a requirement, but it's universally true, I believe).
So in your first one, since you're just passing an "int", passing the pointer to that int will take about the same amount of space to pass (same or more registers, or same or more stack space, depending on your architecture), so there's no savings there. Plus now you have to dereference that pointer, which is an extra operation (and will almost surely cause you to go to memory, which you might not have to do with the second one, again, depending on your architecture).
Now, if what you're passing is much larger than an int, then the first one could be better because you're only passing a pointer. [NB: there are cases where it still might make sense to pass by value even for a very large object. Those cases are usually when you plan to create your own copy anyway. In that case, it's better to let the compiler do the copy, because the overall approach may improve its ability to optimize. Those cases are very complex, and my opinion is that if you're asking this question, you should study C++ more before you try to tackle them. Although they do make for interesting reading.]
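The "you plan to create your own copy anyway" case is often written as a by-value parameter that is then moved into place. This is a minimal sketch with a hypothetical Person class, assuming C++11 or later for std::move.

```cpp
#include <string>
#include <utility>

// Hypothetical class that stores its own copy of the name.
class Person {
    std::string name_;
public:
    // Taken by value: the caller's argument is copied (or moved) into name,
    // and name is then moved into the member without a second copy.
    explicit Person(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }
};
```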
Passing primitives as const-reference does not save you anything. A pointer takes at least as much memory as an int on typical platforms. If you pass a const-reference, the machine will have to set up a pointer and copy the address, which costs at least as much as copying an integer. If your Date class uses a single 64-bit integer (or double) to store the date, then you don't need to use const-reference. However, if your Date class becomes more complex and stores additional fields, then passing the Date object by const-reference should have a lower cost than passing it by value.
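The rule of thumb could look like this in practice. The Date layout below is an assumption made for the sketch, not the asker's actual class.

```cpp
#include <string>

// Hypothetical Date layout, assumed only for this sketch.
struct Date {
    int year;
    int month;
    int day;
    std::string label;  // an extra field that makes copies more expensive
};

// A single small value: pass by value.
bool IsLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// A larger object that is only read: pass by const reference.
bool IsLeapDay(const Date& d) {
    return d.month == 2 && d.day == 29 && IsLeapYear(d.year);
}
```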

Any efficiency benefit to passing primitive types by reference instead of returning by value?

In C++, is there an efficiency benefit in passing primitive types by reference instead of returning by value?
[...] is there an efficiency benefit to passing primitive types by reference instead of returning by value?
Unlikely. First of all, unless you have data from your profiler that give you a reason for doing otherwise, you should not worry about performance issues when designing your program. Choose the simplest design, and the design that best communicates your intent.
Moreover, primitive types are usually cheap to copy, so this is unlikely to be the bottleneck in your application. And since it is the simplest option and the one that makes the interface of the function clearest, you should pass by value.
Just looking at the signature, it is clear that a function such as:
void foo(int);
Will not store a reference to the argument (and consequently, won't run into issues such as dangling references or pointers), will not alter the argument in a way that is visible to the caller, and so on and so on.
None of the above can be deduced from a function signature like:
void f(int&); // May modify the argument! Will it? Who knows...
Or even:
void f(int const&); // May store a reference! Will it? Who knows...
Besides, passing by value may even improve performance by allowing the compiler to perform optimizations that potential aliasing would prevent.
Of course, all of this is under the assumption that you do not actually need to modify the argument inside the function in a way that side-effects on that argument will be visible to the caller after the function returns - or store a reference to that argument.
If that is the case, then you should of course pass by reference and use the appropriate const qualification.
For a broader discussion, also see this Q&A on StackOverflow.
In general there won't be any performance benefit and there may well be a performance cost. Consider this code:
void foo(const int& a, const int& b, int& res) {
res = a + b;
res *= a;
}
int a = 1, b = 2;
foo(a, b, a);
When a compiler encounters a function like foo() it must assume that a and res may alias, as in the example call. So without global optimizations it has to generate code that loads a, loads b, stores the result of a + b to res, then loads a again and performs the multiply before storing the result back to res.
If instead you'd written your function like this:
int foo(int a, int b) {
int res = a + b;
res *= a;
return res;
}
int a = 1, b = 2;
int c = foo(a, b);
Then the compiler can load a and b into registers (or even pass them directly in registers), do the add and multiply in registers and then return the result (which in many calling conventions can be returned directly in the register it was generated in).
In most cases you actually want the semantics in the pass / return by value version of foo and the aliasing semantics possible in the pass / return by reference version do not really need to be supported. You can end up paying a real performance penalty by using the pass / return by reference version.
Chandler Carruth gave a good talk that touched on this at C++ Now.
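To see that the aliasing is not just a theoretical concern, the two versions from this answer can be put side by side. FooRef and FooVal are renamed copies of foo so that both fit in one file.

```cpp
// Reference version from the answer: res may alias a or b.
void FooRef(const int& a, const int& b, int& res) {
    res = a + b;
    res *= a;  // a must be re-read: the write to res may have changed it
}

// Value version: the parameters are private copies, so nothing can alias.
int FooVal(int a, int b) {
    int res = a + b;
    res *= a;
    return res;
}
```

With a = 1 and b = 2, the call FooRef(a, b, a) leaves a equal to 9, while FooVal(1, 2) returns 3.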
There may be some obscure architecture where this is the case, but I'm not aware of any where returning builtin types is less performant than passing an out parameter by reference. You can always examine the relevant assembly to compare if you want.

When is the right time to use *, & or const in C++?

I was studying pointers references and came across different ways to feed in parameters. Can someone explain what each one actually means?
I think the first one is simple, it's that x is a copy of the parameter fed in so another variable is created on the stack.
As for the others I'm clueless.
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
void doSomething3(int const &x){
//code
}
I also see stuff like this when variables are declared. I don't understand the differences between them. I know that the first one will put 100 into the variable y on the stack. It won't create a new address or anything.
//example 1
int y = 100;
//example 2
int *y = 100;
//Example 3: epic confusion!
int *y = &z;
Question 1: How do I use these methods? When is it most appropriate?
Question 2: When do I declare variables in that way?
Examples would be great.
P.S. this is one the main reasons I didn't learn C++ as Java just has garbage collection. But now I have to get into C++.
I think the problem for most students is that in C++ both & and * have different meanings, depending on the context in which they are used.
If either of them appears after a type within an object declaration (T* or T&), they are type modifiers and change the type from plain T to a reference to a T (T&) or a pointer to a T (T*).
If they appear in front of an object (&obj or *obj), they are unary prefix operators invoked on the object. The prefix & returns the address of the object it is invoked for, * dereferences a pointer, iterator etc., yielding the value it references.
It doesn't help against confusion that the type modifiers apply to the object being declared, not the type. That is, T* a, b; defines a T* named a and a plain T named b, which is why many people prefer to write T *a, b; instead (note the placement of the type-modifying * adjacent the object being defined, instead of the type modified).
Also unhelpful is that the term "reference" is overloaded. For one thing it means a syntactic construct, as in T&. But there's also the broader meaning of a "reference" being something that refers to something else. In this sense, both a pointer T* and a reference (other meaning T&) are references, in that they reference some object. That comes into play when someone says that "a pointer references some object" or that a pointer is "dereferenced".
So in your specific cases, #1 defines a plain int, #2 defines a pointer to an int and initializes it with the address 100 (whatever lives there is probably best left untouched; strictly, C++ rejects this initialization without an explicit cast such as reinterpret_cast<int*>(100)), and #3 defines another pointer and initializes it with the address of an object z (necessarily an int, too).
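The different roles of & and * can be seen together in a few lines. PointerAndReferenceDemo is a hypothetical function written for this illustration.

```cpp
// Hypothetical demo function; returns the final value of z.
int PointerAndReferenceDemo() {
    int z = 5;     // plain int
    int* p = &z;   // * is a type modifier here, & is the address-of operator
    int& r = z;    // & is a type modifier here: r is another name for z
    *p = 7;        // * dereferences p here, so z becomes 7
    r += 1;        // modifies z through the reference, so z becomes 8
    return z;
}
```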
As for how to pass objects to functions in C++, here is an old answer from me to that.
From Scott Meyers, More Effective C++, Item 1:
First, recognize that there is no such thing as a null reference. A reference must always refer to some object. Because a reference must refer to an object, C++ requires that references be initialized.
Pointers are subject to no such restriction. The fact that there is no such thing as a null reference implies that it can be more efficient to use references than to use pointers. That's because there's no need to test the validity of a reference before using it.
Another important difference between pointers and references is that pointers may be reassigned to refer to different objects. A reference, however, always refers to the object with which it is initialized.
In general, you should use a pointer whenever you need to take into account the possibility that there's nothing to refer to (in which case you can set the pointer to null) or whenever you need to be able to refer to different things at different times (in which case you can change where the pointer points). You should use a reference whenever you know there will always be an object to refer to and you also know that once you're referring to that object, you'll never want to refer to anything else.
References, then, are the feature of choice when you know you have something to refer to, when you'll never want to refer to anything else, and when implementing operators whose syntactic requirements make the use of pointers undesirable. In all other cases, stick with pointers.
Read S. Lippman's C++ Primer or any other good C++ book.
As for passing the parameters, generally when copying is cheap we pass by value. For mandatory out parameters we use references, for optional out parameters - pointers, for input parameters where copying is costly, we pass by const references
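The four conventions listed above might be sketched like this. All function names are hypothetical and chosen only to show one signature per convention.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Cheap to copy: pass by value.
int Square(int x) { return x * x; }

// Costly to copy and only read: pass by const reference.
std::size_t TotalLength(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& w : words) total += w.size();
    return total;
}

// Mandatory in/out parameter: pass by reference.
void Clamp(int& value, int lo, int hi) {
    if (value < lo) value = lo;
    if (value > hi) value = hi;
}

// Optional out parameter: pass by pointer, nullptr means "not wanted".
bool Contains(const std::vector<int>& v, int x, std::size_t* index) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == x) {
            if (index != nullptr) *index = i;
            return true;
        }
    }
    return false;
}
```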
That's a really complicated topic. Please read here: http://www.goingware.com/tips/parameters/.
Also Scott Meyers' "Effective C++" is a top book on such things.
void doSomething1(int x){
//code
}
This one passes the variable by value: whatever happens inside the function, the original variable doesn't change.
void doSomething2(int *x){
//code
}
Here you pass a variable of type pointer to integer. So when accessing the number you should use *x for the value or x for the address
void doSomething3(int &x){
//code
}
This looks like the first one at the call site, but here x is a reference: whatever happens to x inside the function happens to the original variable as well.
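A quick way to convince yourself of the three behaviours is to have each function assign to its parameter. ByValue, ByPointer and ByReference are hypothetical stand-ins for doSomething1 to doSomething3.

```cpp
// Pass by value: x is a copy, the caller's variable is untouched.
void ByValue(int x) { x = 42; }

// Pass a pointer: *x writes to the caller's variable.
void ByPointer(int* x) { *x = 42; }

// Pass by reference: x is the caller's variable under another name.
void ByReference(int& x) { x = 42; }
```

Starting from int a = 1, ByValue(a) leaves a at 1, while ByPointer(&a) and ByReference(a) both set it to 42.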
int y = 100;
normal integer
//example 2
int *y = 100;
pointer to address 100
//Example 3: epic confusion!
int *y = &z;
pointer to the address of z
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
And i am really getting confused between them?
The first is using pass-by-value and the argument to the function will retain its original value after the call.
The latter two are using pass-by-reference. Essentially they are two ways of achieving the same thing. The argument is not guaranteed to retain its original value after the call.
Most programmers prefer to pass large objects by const reference to improve the performance of their code and provide a constraint that the value will not change. This ensures the copy constructor is not called.
Your confusion might be due to the '&' operator having two meanings. The one you seem to be familiar with is the 'reference operator'. It is also used as the 'address operator'. In the example you give you are taking the address of z.
A good book to check out that covers all of this in detail is 'Accelerated C++' by Andrew Koenig.
The best time to use those methods is when it's more efficient to pass around references as opposed to entire objects. Sometimes, some data structure operations are also faster using references (inserting into a linked list for example). The best way to understand pointers is to read about them and then write programs to use them (and compare them to their pass-by-value counterparts).
And for the record, knowledge of pointers makes you considerably more valuable in the workplace. (all too often, C++ programmers are the "mystics" of the office, with knowledge of how those magical boxes under the desks process code /semi-sarcasm)

Passing integers as constant references versus copying

This might be a stupid question, but I notice that in a good number of APIs, a lot of method signatures that take integer parameters that aren't intended to be modified look like:
void method(int x);
rather than:
void method(const int &x);
To me, it looks like both of these would function exactly the same. (EDIT: apparently not in some cases, see answer by R Samuel Klatchko) In the former, the value is copied and thus can't change the original. In the latter, a constant reference is passed, so the original can't be changed.
What I want to know is why one over the other - is it because the performance is basically the same or even better with the former? e.g. passing a 16-bit value or 32-bit value rather than a 32-bit or 64-bit address? This was the only logical reason I could think of, I just want to know if this is correct, and if not, why and when one should prefer int x over const int &x and vice versa.
It's not just the cost of passing a pointer (that's essentially what a reference is), but also the de-referencing in the called method's body to retrieve the underlying value.
That's why passing an int by value will be virtually guaranteed to be faster (Also, the compiler can optimize and simply pass the int via processor registers, eliminating the need to push it onto the stack).
To me, it looks like both of these would function exactly the same.
It depends on exactly what the reference is to. Here is an admittedly made up example that would change based on whether you pass a reference or a value:
static int global_value = 0;
int doit(int x)
{
++global_value;
return x + 1;
}
int main()
{
return doit(global_value);
}
This code will behave differently depending on whether you have int doit(int) or int doit(const int &)
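Writing both variants out makes the difference visible. DoitByValue and DoitByRef are renamed copies of doit so that they can coexist in one file.

```cpp
int global_value = 0;

int DoitByValue(int x) {
    ++global_value;  // x is a copy, so it is unaffected
    return x + 1;
}

int DoitByRef(const int& x) {
    ++global_value;  // if x refers to global_value, x changes too
    return x + 1;
}
```

Called with global_value starting at 0, DoitByValue(global_value) returns 1, whereas DoitByRef(global_value) returns 2 because x observes the increment.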
Integers are usually the size of the processor's native word and can pass easily into a register. From this perspective, there is no difference between passing by value or passing by constant reference.
When in doubt, print the assembly language listing for your functions to find out how the compiler is passing the argument. Print out for both pass by value and pass by constant reference.
Also, when passing by value, the function can modify the copy. When passing by constant reference, the function cannot modify the variable (it's marked as const).
There will probably be a very, very small de-optimization for passing by reference, since at the very least one dereference will need to occur to get the actual value (unless the call is inlined, the compiler cannot simply pass the value due to the fact that the call site and function might be separately compiled, and it's valid and well-defined to cast away the const for a passed parameter that isn't actually const itself - see What are the benefits to passing integral types by const ref). Note, however, that the 'de-optimization' is likely to be so small as to be difficult to measure.
Most people seem to dislike pass-by-const-ref for built-ins because of this (some very much). However, I think that it might be preferable in some cases if you want the compiler to assist you in ensuring that the value isn't accidentally changed within the function. It's not a big thing, but sometimes it might help.
Depending on the underlying instruction set, an integer parameter can be passed as register or on the stack. Register is definitely faster than memory access, which would always be required in case of const refs (considering early cache-less architectures)
You cannot pass an int literal as a non-const int&; a const int& does accept one, because it can bind to a temporary.
Explicit casts (such as const_cast) let you turn a const int& into an int&, or a const int* into an int*, opening the possibility of changing the value of the referenced object.
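The cast in question can be sketched as below. Sneaky is a hypothetical function, and the write is only well defined when the object the reference refers to was not declared const.

```cpp
// Hypothetical function that uses a cast to write through a const reference.
// Only well defined when the referenced object is not itself const.
void Sneaky(const int& x) {
    const_cast<int&>(x) = 99;
}
```

After int v = 1; Sneaky(v); the caller's v holds 99, even though the signature promised a const reference.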