Is there a way to declare a variable is non-aliased in clang to allow for more optimizations where the variable is used?
I understand restrict can be used to declare pointers as non-aliasing.
However, I'm also wondering about variables which can be pointed into. I guess (perhaps wrongly) that the compiler has to be careful about making assumptions that would otherwise allow it to cache a variable's value instead of re-fetching it each time.
Example:
class Data
{
public:
    void updateVal() {
        // Updates m_val with some value each time it's called
        // (value may differ across different calls)
        ...
    }

    int complicatedCalculation() const {
        return 3 * m_val + 2;
    }

    int m_val;
};
class User
{
    User(Data& data) : m_data{data} {}

    void f()
    {
        m_data.updateVal();
        for (int i = 0; i < 1000; ++i)
            g();
    }

    void g()
    {
        // Will the optimizer be able to cache calc's value for use
        // in all the calls to g() from f()?
        int calc = m_data.complicatedCalculation();
        // Do more work
        ...
    }

    Data& m_data;
};
Even if the answer to the question in the sample code is "yes", might it not change to "no" if the code were more complicated (e.g. real work under // Do more work)? There is then the possibility of a write through some pointer that might point into m_data.m_val. Or is this something the compiler assumes never happens, unless it sees the address of m_val being taken somewhere in the code?
If it doesn't assume that, or even if it does but the address of m_val does get taken somewhere (though we know its contents won't be modified), then it would be nice to be able to mark m_val as "safe" from aliasing concerns, so its value can be assumed not to be changed by pointer access.
The compiler will allocate a register to store calc in g, unless it determines that there are other, hotter variables that would be better stored in registers.
Now even if calc is stored in a register, this may still require a function call to complicatedCalculation and a memory access to m_val. The compiler may inline complicatedCalculation and eliminate the function call but it cannot eliminate the memory access unless it can determine that m_val is effectively a constant all the time.
What you really want is to eliminate the unnecessary memory accesses to m_val in the loop in f, rather than in g. For this to happen, the compiler has to deem g eligible for inlining in f. Only when it's inlined can the compiler eliminate the unnecessary memory accesses. Even if g directly modifies m_val, the compiler can still allocate calc in a register and modify it accordingly. The only caveat is when g may throw an exception: if an exception is ever thrown, the in-memory version of m_val has to be updated to the latest value before the exception is allowed to propagate, and the compiler has to emit code to ensure this. Without such code, it has to update the in-memory version of m_val in every iteration. I don't know which version of clang uses which approach; you have to examine the generated assembly code.
If the address of m_val is taken anywhere in the code, the compiler may not be able to eliminate any memory accesses to it. In this case, using restrict may help. m_val must not then be modified through any other pointer, because that violates the standard and results in undefined behavior; it's your responsibility to ensure that.
I hope that you care about this either because you have experimentally determined that this is a performance bottleneck in the code or you are just curious, rather than for any other reason.
A local variable (say an int) can be stored in a processor register, at least as long as its address is not needed anywhere. Consider a function computing something, say, a complicated hash:
int foo(int const* buffer, int size)
{
    int a; // local variable
    // perform heavy computations involving frequent reads and writes to a
    return a;
}
Now assume that the buffer does not fit into memory. We write a class for computing the hash from chunks of data, calling foo multiple times:
struct A
{
    void foo(int const* buffer, int size)
    {
        // perform heavy computations involving frequent reads and writes to a
    }
    int a;
};
A object;
while (...more data...)
{
    object.foo(buffer, size);
}
// do something with object.a
The example may be a bit contrived. The important difference here is that a was a local variable in the free function and now is a member variable of the object, so the state is preserved across multiple calls.
Now the question: would it be legal for the compiler to load a at the beginning of the foo method into a register and store it back at the end? In effect this would mean that a second thread monitoring the object could never observe an intermediate value of a (synchronization and undefined behavior aside). Provided that speed is a major design goal of C++, this seems to be reasonable behavior. Is there anything in the standard that would keep a compiler from doing this? If no, do compilers actually do this? In other words, can we expect a (possibly small) performance penalty for using a member variable, aside from loading and storing it once at the beginning and the end of the function?
As far as I know, the C++ language itself does not even specify what a register is. However, I think the question is clear anyway. Wherever this matters, I appreciate answers for a standard x86 or x64 architecture.
The compiler can do that if (and only if) it can prove that nothing else will access a during foo's execution.
That's a non-trivial problem in general; I don't think any compiler attempts to solve it.
Consider the (even more contrived) example
struct B
{
    B(int& y) : x(y) {}
    void bar() { x = 23; }
    int& x;
};
struct A
{
    int a;
    void foo(B& b)
    {
        a = 12;
        b.bar();
    }
};
Looks innocent enough, but then we say
A baz;
B b(baz.a);
baz.foo(b);
"Optimising" this would leave 12 in baz.a, not 23, and that is clearly wrong.
Short answer to "Can a member variable (attribute) reside in a register?": yes.
When iterating through a buffer and writing the temporary result to any sort of primitive, wherever it resides, keeping the temporary result in a register would be a good optimization. This is done frequently in compilers. However, it is implementation based, even influenced by passed flags, so to know the result, you should check the generated assembly.
I need a once-and-for-all clarification on passing by value/pointer/reference.
If I have a variable such as
int SomeInt = 10;
And I want to pass it to a function like
void DoSomething(int Integer)
{
    Integer = 1;
}
In my current scenario, when passing SomeInt to DoSomething() I want SomeInt's value to be updated based on whatever we do to it inside DoSomething(), while also being as efficient as possible with memory and performance, so I'm not copying the variable around. That being said, which of the following prototypes would accomplish this task?
void DoSomething(int* Integer);
void DoSomething(int& Integer);
How would I actually pass the variable into the function? What is the difference between the previous two prototypes?
Finally if using a function within a class
class SomeClass
{
    int MyInteger;
public:
    void ChangeValue(int& NewValue)
    {
        MyInteger = NewValue;
    }
};
If I pass an integer into ChangeValue, when the integer I passed in gets deleted, will that mean that when I try to use MyInteger from within the class it will no longer be usable?
Thank you all for your time, I know this is kind of a basic question but the explanations I keep running into confuse me further.
Functionally, all three of these work:
pass an int and change the return type to int so you can return the new value, usage: x = f(x);
(when you plan to set the value without needing to read the initial value, it's much better to use a function like int DoSomething(); so the caller can just write int x = f(); without having to create x on an earlier line and wonder whether it needs to be initialised to anything before the call)
pass an int& and set it inside the function, usage: int x; x = ? /* if an input */; f(x);
pass an int* and set the pointed-to int inside the function, usage: int x; x = ?; f(&x);
most efficient on memory and performance so I'm not copying the variable around
Given that the C++ Standard doesn't dictate how references should be implemented by the compiler, it's a bit dubious trying to reason about their characteristics. If you care, compile your code to assembly or machine code and see how it works out on your particular compiler (with your specific command-line options etc.). If you need a rule of thumb, assume that references have identical performance characteristics to pointers unless profiling or generated-code inspection suggests otherwise.
For an int you can expect the first version above to be no slower than the pointer version, and possibly be faster, because the int parameter can be passed and returned in a register without ever needing a memory address.
If/when/where the by-pointer version is inlined, there's more chance that the potentially slow "needing a memory address so we can pass a pointer" / "having to dereference a pointer to access/update the value" aspects of the pass-by-pointer version can be optimised out (if you've asked the compiler to try), leaving both versions with identical performance.
Still, if you need to ask a question like this I can't imagine you're writing code where these are the important optimisation choices, so a better aim is to do whatever gives the cleanest, most intuitive and robust usage for the client code. Whether that's x = f(x); (where you might forget the leading x =), f(x) (where you might not realise x could be modified), or f(&x) (where some caller might think they can pass nullptr) is a reasonable question in its own right, but separate from your performance concerns. FWIW, the C++ FAQ Lite recommends references over pointers for this kind of situation, but I personally reject its reasoning and conclusions: it all boils down to familiarity with either convention, and how often you need to pass const pointer values, or pointer values where nullptr is a valid sentinel, that could be confused with the you-may-modify-me implication hoped for in your scenario. That depends a lot on your coding style, the libraries you use, your problem domain etc.
Both of your examples
void DoSomething(int* Integer);
void DoSomething(int& Integer);
will accomplish the task. In the first case - with pointer - you need to call the function with DoSomething(&SomeInt);, in the second case - with reference - simpler as DoSomething(SomeInt);
The recommended way is to use references whenever they are sufficient, and pointers only if they are necessary.
You can use either. Function call for first prototype would be
DoSomething(&SomeInt);
and for second prototype
DoSomething(SomeInt);
As was already said before, you can use both. The advantage of the
void DoSomething(int* Integer)
{
    *Integer = 0xDEADBEEF;
}

DoSomething(&myvariable);
pattern is that it becomes obvious from the call that myvariable is subject to change.
The advantage of the
void DoSomething(int& Integer)
{
    Integer = 0xDEADBEEF;
}

DoSomething(myvariable);
pattern is that the code in DoSomething is a bit cleaner, DoSomething has a harder time messing with memory in bad ways, and you might get better code out of it. The disadvantage is that it isn't immediately obvious from reading the call that myvariable might get changed.
Do built-in types which are not allocated dynamically always stay in the same piece of memory for the duration of the program?
If it's something I should understand how do I go about and check it?
i.e.
int j = 0;
double k = 2.2;
double* p = &k;
Does the system architecture or compiler move around all these objects if a C/C++ program is, say, highly memory intensive?
Note: I'm not talking about containers such as std::vector<T>. These can obviously reallocate in certain situations, but again, that is dynamic.
side question:
The following scenario will obviously raise a few eyebrows. Just as an example, will this pointer always be valid during the duration of the program?
This side-question is obsolete, thanks to my ignorance!
#include <memory>

struct null_deleter
{
    void operator() (void const *) const {}
};

int main()
{
    // define object
    double b = 0;
    // define shared pointer
    std::shared_ptr<double> ptr_store;
    ptr_store.reset(&b, null_deleter()); // this works and behaves how you would expect
}
In the abstract machine, an object's address does not change during that object's lifetime.
(The word "object" here does not refer to "object-oriented" anything; an "object" is merely a region of storage.)
That really means that a program must behave as if an object's address never changes. A compiler can generate code that plays whatever games it likes, including moving objects around or not storing them anywhere at all, as long as such games don't affect the visible behavior in a way that violates the standard.
For example, this:
int n;
int *addr1 = &n;
int *addr2 = &n;
if (addr1 == addr2) {
    std::cout << "Equal\n";
}
must print "Equal" -- but a clever optimizing compiler could legally eliminate everything but the output statement.
The ISO C standard states this explicitly, in section 6.2.4:
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address, and retains its last-stored value throughout
its lifetime.
with a (non-normative) footnote:
The term "constant address" means that two pointers to the object
constructed at possibly different times will compare equal. The
address may be different during two different executions of the same
program.
I haven't found a similar explicit statement in the C++ standard; either I'm missing it, or the authors considered it too obvious to bother stating.
The compiler is free to do whatever it wants, so long as it doesn't affect the observable program behaviour.
Firstly, consider that local variables might not even get put in memory (they might get stored in registers only, or optimized away entirely).
So even in your example where you take the address of a local variable, that doesn't mean that it has to live in a fixed location in memory. It depends what you go on to do with it, and whether the compiler is smart enough to optimize it. For example, this:
double k = 2.2;
double *p = &k;
*p = 3.3;
is probably equivalent to this:
double k = 3.3;
Yes and no.
Global variables will stay in the same place.
Stack variables (inside a function) will get allocated and deallocated each time the function is called and returns. For example:
void k(int);

void f() {
    int x;
    k(x);
}

void g() {
    f();
}

int main() {
    f();
    g();
}
Here, the second time f() is called (via g()), its x will be in a different location.
There are several answers to this question, depending on factors you haven't mentioned.
If a data object's address is never taken, then a conforming C program cannot tell whether or not it even has an address. It might exist only in registers, or be optimized completely out; if it does exist in memory, it need not have a fixed address.
Data objects with "automatic" storage duration (to first approximation, function-local variables not declared with static) are created each time their containing function is invoked and destroyed when it exits; there may be multiple copies of them at any given time, and there's no guarantee that a new instance of one has the same address as an old one.
We speak of the & operator as "taking the address" of a data object, but technically speaking that's not what it does. It constructs a pointer to that data object. Pointers are opaque entities in the C standard. If you inspect the bits (by converting to integer) the result is implementation-defined. And if you inspect the bits twice in a row there is no guarantee that you get the same number! A hypothetical garbage-collected C implementation could track all pointers to each datum and update them as necessary when it moved the heap around. (People have actually tried this. It tends to break programs that don't stick to the letter of the rules.)
In C++ if I get and return the address of a variable and the caller then immediately dereferences it, will the compiler reliably optimize out the two operations?
The reason I ask is I have a data structure where I'm using an interface similar to std::map where find() returns a pointer (iterator) to a value, and returns NULL (there is no trivial .end() equivalent) to indicate that the value has not been found.
I happen to know that the variables being stored are pointers, so returning NULL works fine even if I returned the value directly, but it seems that returning a pointer to the value is more general. Otherwise if someone tried to store an int there that was actually 0 the data structure would claim it isn't there.
However, I'm wondering if there's even any loss in efficiency here, seeing as the compiler should optimize away actions that just undo the effect of each other. The problem is that the two are separated by a function return so maybe it wouldn't be able to detect that they just undo each other.
Lastly, what about having one private member function that just returns the value and an inline public member function that just takes the address of the value. Then at least the address/dereference operations would take place together and have a better chance of being optimized out, while the whole body of the find() function is not inlined.
private:
    V _find(key) {
        ... // a few dozen lines...
    }

public:
    inline V* find(key) {
        return &_find(key);
    }

std::cout << *find(a_key);
This would return a pointer to a temporary, which I didn't think about. The only thing that can be done similar to this is to do a lot of processing in the _find() and do the last step and the return of the pointer in find() to minimize the amount of inlined code.
private:
    W* _find(key) {
        ... // a few dozen lines...
    }

public:
    inline V* find(key) {
        return some_func(_find(key)); // last steps on W to get V*
    }

std::cout << *find(a_key);
Or as yet another responder mentioned, we could return a reference to V in the original version (again, not sure why we're all blind to the trivial stuff at first glance... see discussion.)
private:
    V& _find(key) {
        ... // a few dozen lines...
    }

public:
    inline V* find(key) {
        return &_find(key);
    }

std::cout << *find(a_key);
_find returns a temporary object of type V. find then attempts to take the address of the temporary and return it. Temporary objects don't last very long, hence the name. So the temporary returned by _find will be destroyed after getting its address. And therefore find will return a pointer to a previously destroyed object, which is bad.
I've seen it go either way. It really depends on the compiler and the level optimization. Even when it does get inlined, I've seen cases where the compiler will not optimize this out.
The only way to see if it does get optimized out it is to actually look at the disassembly.
What you should probably do is to make a version where you manually inline them. Then benchmark it to see if you actually get a noticeable performance gain. If not, then this whole question is moot.
Your code (even in its second incarnation) is broken. _find returns a V, which find destroys immediately before returning its address.
If _find returned a V& to an object that outlives the call (thus producing a correct program), then the dereference would be a no-op, since a reference is no different to a pointer at the machine code level.
Consider the sample application below. It demonstrates what I would call a flawed class design.
#include <iostream>

using namespace std;

struct B
{
    B() : m_value(1) {}
    long m_value;
};

struct A
{
    const B& GetB() const { return m_B; }

    void Foo(const B &b)
    {
        // assert(this != &b);
        m_B.m_value += b.m_value;
        m_B.m_value += b.m_value;
    }

protected:
    B m_B;
};

int main(int argc, char* argv[])
{
    A a;
    cout << "Original value: " << a.GetB().m_value << endl;
    cout << "Expected value: 3" << endl;
    a.Foo(a.GetB());
    cout << "Actual value: " << a.GetB().m_value << endl;
    return 0;
}
Output:
Original value: 1
Expected value: 3
Actual value: 4
Obviously, the programmer is fooled by the constness of b. By mistake, b refers to this object's own m_B, which yields the undesired behavior.
My question: What const-rules should you follow when designing getters/setters?
My suggestion: Never return a reference to a member variable if it can be set by reference through a member function. Hence, either return by value or pass parameters by value. (Modern compilers will optimize away the extra copy anyway.)
Obviously, the programmer is fooled by the constness of b
As someone once said, You keep using that word. I do not think it means what you think it means.
Const means that you cannot change the value. It does not mean that the value cannot change.
If the programmer is fooled by the fact that some other code else can change something that they cannot, they need a better grounding in aliasing.
If the programmer is fooled by the fact that the token 'const' sounds a bit like 'constant' but means 'read only', they need a better grounding in the semantics of the programming language they are using.
So if you have a getter which returns a const reference, then it is an alias for an object you don't have the permission to change. That says nothing about whether its value is immutable.
Ultimately, this comes down to a lack of encapsulation, and not applying the Law of Demeter. In general, don't mutate the state of other objects. Send them a message to ask them to perform an operation, which may (depending on their own implementation details) mutate their state.
If you make B.m_value private, then you can't write the Foo you have. You either make Foo into:
void Foo(const B &b)
{
    m_B.increment_by(b);
    m_B.increment_by(b);
}

void B::increment_by(const B& b)
{
    // assert ( this != &b ) if you like
    m_value += b.m_value;
}
or, if you want to ensure that the value is constant, use a temporary
void Foo(B b)
{
    m_B.increment_by(b);
    m_B.increment_by(b);
}
Now, incrementing a value by itself may or may not be reasonable, and is easily tested for within B::increment_by. You could also test whether &m_B == &b in A::Foo, though once you have a couple of levels of objects, and objects with references to other objects rather than values (so &a1.b.c == &a2.b.c does not imply that &a1.b == &a2.b or &a1 == &a2), you really have to just be aware that any operation is potentially aliased.
Aliasing means that incrementing by an expression twice is not the same as incrementing by the value of the expression the first time you evaluated it; there's no real way around it, and in most systems the cost of copying the data isn't worth the risk of avoiding the alias.
Passing in arguments which have the least structure also works well. If Foo() took a long rather than an object which it has to get a long from, then it would not suffer aliasing, and you wouldn't need to write a different Foo() to increment m_b by the value of a C.
I propose a slightly different solution that has several advantages (especially in an increasingly multi-threaded world). It's a simple idea to follow: "commit" your changes last.
To explain via your example you would simply change the 'A' class to:
struct A
{
    const B& GetB() const { return m_B; }

    void Foo(const B &b)
    {
        // copy out what we are going to change
        long itm_value = m_B.m_value;
        // perform operations on the copy, not our internal value
        itm_value += b.m_value;
        itm_value += b.m_value;
        // copy over the final result
        m_B.m_value = itm_value;
    }

protected:
    B m_B;
};
The idea is to place all assignments to memory visible outside the current function at the end, where they pretty much can't fail. That way, if an error is thrown in the middle of the operation (say there was a divide between those two additions and the divisor happened to be 0), we aren't left with half-baked data.
Furthermore, in a multi-threaded situation, you can do all of the operation and then, just before your "commit", check whether anything has changed (an optimistic approach, which will usually pass and usually yields much better results than locking the structure for the entire operation); if something has changed, you simply discard the values and try again (or return a value saying it failed, if there is something else to do instead).
On top of this, the compiler can usually optimise this better, because it is no longer required to write the variables being modified to memory (we force only one read of the value to be changed and one write). The compiler then has the option of just keeping the relevant data in a register, saving L1 cache accesses if not cache misses. Otherwise the compiler would probably write to memory each time, since it doesn't know what aliasing might be taking place (so it can't ensure those values stay the same); if the values are all local, it knows there can be no aliasing, because the current function is the only one that knows about them.
There are a lot of different things that can happen with the original code posted. I wouldn't be surprised if some compilers (with optimizations enabled) actually produce code that gives the "expected" result, whereas others won't. All of this is simply because the point at which non-volatile variables are actually written to and read from memory isn't well defined by the C++ standard.
The real problem here is atomicity. The precondition of the Foo function is that its argument doesn't change while in use.
If, e.g., Foo had been specified with a value argument instead of a reference argument, no problem would have shown up.
Frankly, A::Foo() rubs me the wrong way more than your original problem. Any way I look at it, it should be B::Foo(). And inside B::Foo(), a check against this wouldn't be that outlandish.
Otherwise I do not see how one can specify a generic rule to cover that case. And keep teammates sane.
From past experience, I would treat that as a plain bug and differentiate two cases: (1) B is small and (2) B is large. If B is small, then simply make A::GetB() return a copy. If B is large, then you have no choice but to handle the case that objects of B might appear as both rvalue and lvalue in the same expression.
If you have such problems constantly, I'd say the simpler rule is to always return a copy of an object instead of a reference, because quite often, if an object is large, you have to handle it differently from the rest anyway.
My stupid answer, I leave it here just in case someone else comes up with the same bad idea:
The problem, I think, is that the object referred to is not const (B const & vs const B &); only the reference is const in your code.