I need help understanding two different versions of a functor dispatcher; see here:
#include <cmath>
#include <complex>
double* psi;
double dx = 0.1;
int range;
struct A
{
    double operator()(int x) const
    {
        return dx * (double)x * x;
    }
};
template <typename T>
void dispatchA()
{
    constexpr T op{};
    for (int i = 0; i < range; i++)
        psi[i] += op(i);
}

template <typename T>
void dispatchB(T op)
{
    for (int i = 0; i < range; i++)
        psi[i] += op(i);
}
int main(int argc, char** argv)
{
    range = argc;
    psi = new double[range];
    dispatchA<A>();
    // dispatchB<A>(A{});
}
Live at https://godbolt.org/z/93h5T46oq
The dispatcher will be called many times in a big loop, so I need to make sure that I'm doing it right.
Both versions seem to me unnecessarily complex, since the type of the functor is known at compile time.
dispatchA, because it unnecessarily creates a (constexpr) object.
dispatchB, because it passes the object over and over.
Of course, these could be solved by a) making the call operator a static function in the functor, but static functions are bad practice, right? Or b) making a static instance of the functor inside the dispatcher, but then the lifetime of the object grows to the lifetime of the program.
That being said, I don't know enough assembly to meaningfully compare the two approaches.
Is there a more elegant/efficient approach?
This likely isn't the answer you are looking for, but the general advice you are going to get from almost any seasoned developer is to just write the code in a natural/understandable way, and only optimize if you need to.
This may sound like a non-answer, but it's actually good advice.
The majority of the time, the cost you may (if at all) incur due to small decisions like this will be inconsequential overall. Generally, you'll see more gains from optimizing an algorithm than from optimizing a few instructions. There are, indeed, exceptions to this rule -- but generally such optimizations are part of a tight loop -- and this is the type of thing you can retroactively look at by profiling and benchmarking.
It's better to write code in a way that can be maintained in the future, and only really optimize it if this proves to be an issue down the line.
For the code in question, both snippets produce identical assembly when optimized -- meaning that both approaches should perform equally well in practice (provided the calling characteristics are the same). But even then, benchmarking would be the only real way to verify this.
Since the dispatchers are function template definitions, they are implicitly inline, and their definition will always be visible at the point of invocation. Often, this is enough for an optimizer to both introspect and inline such code (if it deems this better than not).
... static functions are bad practice, right?
No; static functions are not bad practice. Like any utility in C++, they can surely be misused -- but there is nothing inherently bad about them.
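If you did want option (a) from the question, a minimal sketch might look like the following (apply and dispatchC are hypothetical names; dx, psi and range are the globals from the question; note that since C++23 the call operator itself may be declared static):

struct A
{
    static double apply(int x) { return dx * (double)x * x; }
};

template <typename T>
void dispatchC()
{
    for (int i = 0; i < range; i++)
        psi[i] += T::apply(i); // no object is ever created
}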
dispatchA, ... unnecessarily creates a (constexpr) object
constexpr objects are constructed at compile-time -- and so you would not see any real cost to this other than perhaps a bit more space on the stack being reserved. This cost would really be minimal.
You could also make this static constexpr instead if you really wanted to avoid this. Although logically the "lifetime of the object grows to the lifetime of the program" as you mentioned, constexpr objects cannot have exit-time behavior in C++, so the cost is virtually nonexistent.
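A sketch of that static constexpr variant, reusing the globals from the question:

template <typename T>
void dispatchA()
{
    static constexpr T op{}; // constant-initialized; no runtime construction and no exit-time destructor
    for (int i = 0; i < range; i++)
        psi[i] += op(i);
}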
Assuming A is stateless, as it is in your example, and has no non-static data members, they are identical. The compiler is smart enough to see that construction of the object is a no-op and omits it. Let's clear up your code a bit to get clean assembly we can easily reason about:
struct A {
    double operator()(int) const noexcept;
};

void useDouble(double);
int genInt();

void dispatchA() {
    constexpr A op{};
    auto const range = genInt();
    for (int i = 0; i < range; i++) useDouble(op(genInt()));
}

void dispatchB(A op) {
    auto const range = genInt();
    for (int i = 0; i < range; i++) useDouble(op(genInt()));
}
Here, where input comes from and where the output goes is abstracted away. Generated assembly can only differ because of how the op object is created. Compiling it with GCC 11.1, I get identical assembly generation. No creation or initialization of A takes place.
A local variable (say an int) can be stored in a processor register, at least as long as its address is not needed anywhere. Consider a function computing something, say, a complicated hash:
int foo(int const* buffer, int size)
{
    int a; // local variable
    // perform heavy computations involving frequent reads and writes to a
    return a;
}
Now assume that the data does not fit into memory all at once. We write a class that computes the hash from chunks of data, calling foo multiple times:
struct A
{
    void foo(int const* buffer, int size)
    {
        // perform heavy computations involving frequent reads and writes to a
    }
    int a;
};

A object;
while (...more data...)
{
    object.foo(buffer, size);
}
// do something with object.a
The example may be a bit contrived. The important difference here is that a was a local variable in the free function and now is a member variable of the object, so the state is preserved across multiple calls.
Now the question: would it be legal for the compiler to load a at the beginning of the foo method into a register and store it back at the end? In effect this would mean that a second thread monitoring the object could never observe an intermediate value of a (synchronization and undefined behavior aside). Provided that speed is a major design goal of C++, this seems to be reasonable behavior. Is there anything in the standard that would keep a compiler from doing this? If no, do compilers actually do this? In other words, can we expect a (possibly small) performance penalty for using a member variable, aside from loading and storing it once at the beginning and the end of the function?
As far as I know, the C++ language itself does not even specify what a register is. However, I think the question is clear anyway. Wherever this matters, I would appreciate answers for a standard x86 or x64 architecture.
The compiler can do that if (and only if) it can prove that nothing else will access a during foo's execution.
That's a non-trivial problem in general; I don't think any compiler attempts to solve it.
Consider the (even more contrived) example
struct B
{
    B(int& y) : x(y) {}
    void bar() { x = 23; }
    int& x;
};

struct A
{
    int a;
    void foo(B& b)
    {
        a = 12;
        b.bar();
    }
};
Looks innocent enough, but then we say
A baz;
B b(baz.a);
baz.foo(b);
"Optimising" this would leave 12 in baz.a, not 23, and that is clearly wrong.
Short answer to "Can a member variable (attribute) reside in a register?": yes.
When iterating through a buffer and writing the temporary result to any sort of primitive, wherever it resides, keeping the temporary result in a register would be a good optimization. This is done frequently in compilers. However, it is implementation-dependent, and even influenced by the compiler flags passed, so to know the result you should check the generated assembly.
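If you would rather not rely on the optimizer, you can make the idiom explicit: copy the member into a local at the start of the method and write it back once at the end. A sketch (the hash step is a made-up placeholder):

struct A
{
    void foo(int const* buffer, int size)
    {
        int local = a;                      // one load; a local is easy to keep in a register
        for (int i = 0; i < size; ++i)
            local = local * 31 + buffer[i]; // placeholder for the heavy computation
        a = local;                          // one store at the end
    }
    int a = 0;
};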
Within a class method, I'm accessing private attributes, or attributes of a member object. Moreover, I'm looping over these attributes.
I was wondering what is the most efficient way, in terms of time (and memory), between:
copying the attributes and accessing the copy within the loop
accessing the attributes directly within the loop
or using an iterator over the attribute
I feel my question is related to: Efficiency of accessing a value through a pointer vs storing as temporary value. But in my case, I just need to access a value, not change it.
Example
Given two classes
class ClassA
{
public:
    vector<double> GetAVector() { return m_AVector; }
private:
    vector<double> m_AVector;
};
and
class ClassB
{
public:
    void MyFunction();
private:
    vector<double> m_Vector;
    ClassA m_A;
};
I. Should I do:
1.
void ClassB::MyFunction()
{
    vector<double> foo;
    for (int i = 0; i < ...; i++)
    {
        foo.push_back(SomeFunction(m_Vector[i]));
    }
    // do something ...
}
2.
void ClassB::MyFunction()
{
    vector<double> foo;
    vector<double> VectorCopy = m_Vector;
    for (int i = 0; i < ...; i++)
    {
        foo.push_back(SomeFunction(VectorCopy[i]));
    }
    // do something ...
}
3.
void ClassB::MyFunction()
{
    vector<double> foo;
    for (vector<double>::iterator it = m_Vector.begin(); it != m_Vector.end(); it++)
    {
        foo.push_back(SomeFunction(*it));
    }
    // do something ...
}
II. What if I'm not looping over m_Vector but over m_A.GetAVector()?
P.S.: I understood from going through other posts that it's not useful to micro-optimize up front, but my question is more about what really happens and what should be done, as a matter of standards and coding style.
You're in luck: you can actually figure out the answer all by yourself, by trying each approach with your compiler and on your operating system, and timing each approach to see how long it takes.
There is no universal answer here that applies to every imaginable C++ compiler and operating system that exists on the third planet from the sun. Every compiler and every hardware platform is different and has different runtime characteristics. Even different versions of the same compiler will often produce different runtime behavior that might affect performance, not to mention various compilation and optimization options. And since you didn't even specify your compiler and operating system, there's literally no authoritative answer that can be given here.
Although it's true that for some questions of this type it's possible to arrive at the best implementation with a high degree of certainty, for most use cases, this isn't one of them. The only way you can get the answer is to figure it out yourself, by trying each alternative yourself, profiling, and comparing the results.
I can categorically say that 2. is less efficient than 1. Copying to a local copy, and then accessing it like you would the original would only be of potential benefit if accessing a stack variable is quicker than accessing a member one, and it's not, so it's not (if you see what I mean).
Option 3. is trickier, since it depends on the implementation of the begin() method (and end(), which may be called once per iteration) versus the implementation of the operator[] method. I could irritate some C++ die-hards and say there's an option 4: ask the vector for a pointer to its array and use a pointer or array index on that directly. That might just be faster than either! (A sketch follows below.)
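For what it's worth, that option 4 might look like the following sketch, using vector::data() (and assuming SomeFunction doesn't modify m_Vector):

void ClassB::MyFunction()
{
    vector<double> foo;
    foo.reserve(m_Vector.size());
    const double* p = m_Vector.data(); // raw pointer into the vector's storage
    const size_t n = m_Vector.size();  // hoisted once, like caching end()
    for (size_t i = 0; i < n; i++)
    {
        foo.push_back(SomeFunction(p[i]));
    }
    // do something ...
}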
And as for II, there is a double-indirection there. A good compiler should spot that and cache the result for repeated use - but otherwise it would only be marginally slower than not doing so: again, depending on your compiler.
Without optimizations, option 2 would be slower on every imaginable platform, because it incurs a copy of the vector, while the access time would be identical for a local variable and a class member.
With optimization, depending on SomeFunction, performance might be the same or worse for option 2. Performance would be the same if SomeFunction is either visible to the compiler and provably does not modify its argument, or its signature guarantees that the argument will not be modified; in either case the compiler can optimize away the copy altogether. Otherwise, the copy will remain.
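To illustrate the signature point, two hypothetical alternative declarations for SomeFunction:

double SomeFunction(const double&); // guarantees the element is not modified: the vector copy is removable
double SomeFunction(double&);       // may modify the element: copying m_Vector changes behaviour, so the copy must stay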
I'm trying to figure out when const should be used when writing C++ code. Are these all examples of pessimization or is it beneficial to write code this way?:
Example 1:
int findVal(const int OTHER_VAL) const
{
    switch (OTHER_VAL)
    {
    case 1:
        return 2;
    default:
        return 3;
    }
}
Example 2:
enum class MobType
{
    COW, CHICKEN, DOG, PIG
};

class BaseMob
{
protected:
    BaseMob(const MobType TYPE) : TYPE(TYPE) { }
    const MobType TYPE;
};
Example 3:
void showWorld(const World& world)
{
    auto data = world.getData();
    for (auto& i : data)
        i.print();
}
No, they aren't.
const on local variables with automatic storage (including function args) is purely syntactic sugar to help human programmers set rules for their code. It doesn't help the optimizer at all. Optimizing compilers extract the necessary data movement from the C source and optimize that. They generally don't care if you reuse the same tmp variable for many different things, or declare ten different const temporaries like const int tmp1 = a + 10; in the same function.
And yes, this applies to function args passed by value; they are local variables with automatic storage, passed in registers or on the stack. And no, this doesn't mean the caller can assume that a function didn't modify the stack memory used for arg-passing, so it doesn't help the optimizer much either. (Making a 2nd function call with the same arguments still requires re-writing the args to the stack (if not all args fit in registers), because the const on an arg doesn't change the fact that the called function "owns" that stack space and can use it as scratch space however it wants.)
const on static/global/reference variables does help. static const int foo = 10; can be inlined as an immediate constant instead of loaded from memory. (e.g. add eax, 10 instead of add eax, [foo]).
Using const to mark a class method as not changing any class members can also help the compiler avoid re-loading class members after a function call. (i.e. keep them live in registers). This mostly only applies if the compiler can't see the function definition, otherwise a good optimizing compiler can just look at what the called function does and optimize accordingly. (As long as it's not in a Unix library, where symbol interposition means that it can't assume the called function it sees at compile time will be the one called after dynamic linking.)
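A sketch of that last scenario (Gadget and lookup are hypothetical names, and whether a given compiler actually exploits the const qualifier here is implementation-dependent):

struct Gadget
{
    int cached;
    int lookup() const; // defined in another translation unit; const says it won't modify cached
};

int use(Gadget& g)
{
    int before = g.cached; // may be kept live in a register...
    int x = g.lookup();    // ...across this call, if the compiler trusts the const qualifier
    return before + g.cached + x;
}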
Whenever you logically do not alter a value or an object you should make it const. By logically I do not mean every time you are technically allowed to, but every time it is logical in the context of your functions, classes and code.
A simple example could be a simple "get" function as seen in example 1, these functions should not modify the state of the class, and should therefore be marked constant, as this will help document your intent to the user, besides helping you ensure the invariance of the class.
There are situations where it makes sense to make an immutable object, as seen in example 2. It is not that often we see these in C++, but many other languages use them frequently. If it does not add any value to be able to change a certain member during an object's lifetime, you might as well make it const.
Passing const reference parameters gives you the performance benefits of the reference, but at the same time ensures that the source object is kept unmodified, which is both great documentation for the user and allows some optimizations to happen.
Having mentioned all these reasons, there is one more reason to use const, briefly mentioned in the last paragraph: optimizations. When the compiler knows that something is constant and is not being altered, it can enable some pretty clever optimizations; don't use const purely for performance reasons, though.
This is also why working around constness, for instance with const_cast (which can cast away const), can lead to some undesired behaviour. As an example, check out the following:
#include <stdio.h>

static const int foo = 10;

int constsum(void) {
    return foo + 5;
}

int main(int argc, char* argv[]) {
    int a = constsum();
    int* newFoo = const_cast<int*>(&foo);
    *newFoo = 20;
    int b = constsum();
    printf("%d\n", a + b);
    return 0;
}
As can be seen from this example, the code might not produce the desired result: it prints 30, not the perhaps expected 40 (modifying a const object through a cast is undefined behaviour).
Examining the produced assembly shows why:
constsum():
    mov eax, 15
    ret
main:
    mov eax, 30
    ret
The compiler simply inlines the values, since it can see that they are constant; it takes no special notice of the const_cast being used.
So const correctness and the use of const are valuable tools that can benefit the performance and stability of your code, and, not to forget, help document your code.
I need a once-and-for-all clarification on passing by value/pointer/reference.
If I have a variable such as
int SomeInt = 10;
And I want to pass it to a function like
void DoSomething(int Integer)
{
    Integer = 1;
}
In my current scenario, when passing SomeInt to DoSomething() I want SomeInt's value to be updated based on whatever we do to it inside DoSomething(), while being as efficient as possible with memory and performance, i.e. not copying the variable around. That being said, which of the following prototypes would accomplish this task?
void DoSomething(int* Integer);
void DoSomething(int& Integer);
How would I actually pass the variable into the function? What is the difference between the previous two prototypes?
Finally if using a function within a class
class SomeClass
{
    int MyInteger;
public:
    void ChangeValue(int& NewValue)
    {
        MyInteger = NewValue;
    }
};
If I pass an integer into ChangeValue, when the integer I passed in gets deleted, will that mean that when I try to use MyInteger from within the class it will no longer be usable?
Thank you all for your time, I know this is kind of a basic question but the explanations I keep running into confuse me further.
Functionally, all three of these work (a consolidated sketch follows the list):
pass an int and change the return type to int so you can return the new value, usage: x = f(x);
when you plan to set the value without needing to read the initial value, it's much better to use a function like int DoSomething(); so the caller can just say int x = f(); without having to create x on an earlier line and wondering/worrying whether it needs to be initialised to anything before the call.
pass an int& and set it inside the function, usage: int x; x = ? /* if an input */; f(x);
pass an int* and set the pointed-to int inside the function, usage: int x; x = ?; f(&x);
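Side by side, a minimal sketch of the three shapes (the names are made up for illustration):

int  byValue(int i)      { return i + 1; } // usage: x = byValue(x);
void byReference(int& i) { i = i + 1; }    // usage: byReference(x);
void byPointer(int* i)   { *i = *i + 1; }  // usage: byPointer(&x);

int main()
{
    int x = 0;
    x = byValue(x); // x == 1
    byReference(x); // x == 2
    byPointer(&x);  // x == 3
}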
most efficient on memory and performance so I'm not copying the variable around
Given that the C++ Standard doesn't dictate how references should be implemented by the compiler, it's a bit dubious trying to reason about their characteristics. If you care, compile your code to assembly or machine code and see how it works out on your particular compiler (for specific compiler command-line options etc.). If you need a rule of thumb, assume that references have identical performance characteristics to pointers unless profiling or generated-code inspection suggests otherwise.
For an int you can expect the first version above to be no slower than the pointer version, and possibly be faster, because the int parameter can be passed and returned in a register without ever needing a memory address.
If/when/where the by-pointer version is inlined there's more chance that the potentially slow "needing a memory address so we can pass a pointer" / "having to dereference a pointer to access/update the value" aspect of the pass-by-pointer version can be optimised out (if you've asked the compiler to try), leaving both versions with identical performance....
Still, if you need to ask a question like this, I can't imagine you're writing code where these are the important optimisation choices. A better aim is to do whatever gives the cleanest, most intuitive and robust usage for the client code: whether that's x = f(x); (where you might forget the leading x =), or f(x) (where you might not realise x could be modified), or f(&x) (where some caller might think they can pass nullptr) is a reasonable question in its own right, but separate from your performance concerns. FWIW, the C++ FAQ Lite recommends references over pointers for this kind of situation, but I personally reject its reasoning and conclusions. It all boils down to familiarity with either convention, and to how often you need to pass const pointer values, or pointer values where nullptr is a valid sentinel, that could be confused with the you-may-modify-me implication hoped for in your scenario; that depends a lot on your coding style, the libraries you use, your problem domain, etc.
Both of your examples
void DoSomething(int* Integer);
void DoSomething(int& Integer);
will accomplish the task. In the first case (with a pointer) you need to call the function as DoSomething(&SomeInt); in the second case (with a reference) the call is simpler: DoSomething(SomeInt);
The recommended way is to use references whenever they are sufficient, and pointers only if they are necessary.
You can use either. Function call for first prototype would be
DoSomething(&SomeInt);
and for second prototype
DoSomething(SomeInt);
As was already said before, you can use both. The advantage of the
void DoSomething(int* Integer)
{
    *Integer = 0xDEADBEEF;
}

DoSomething(&myvariable);
pattern is that it becomes obvious from the call that myvariable is subject to change.
The advantage of the
void DoSomething(int& Integer)
{
    Integer = 0xDEADBEEF;
}

DoSomething(myvariable);
pattern is that the code in DoSomething is a bit cleaner, DoSomething has a harder time messing with memory in bad ways, and you might get better code out of it. The disadvantage is that it isn't immediately obvious from reading the call that myvariable might get changed.
I was recently told in a code review (by an older and wiser C++ developer) to rewrite a class I'd written, turning it into a set of static methods instead. He justified this by saying that although my object did contain a very small amount of internal state, it could be derived at runtime anyway, and if I changed to static methods I'd avoid the cost of instantiating objects all over the place.
I have now made this change, but it got me thinking: what is the cost of instantiation in C++? I'm aware that in managed languages there's all the cost of garbage collecting the object, which would be significant. However, my C++ object was simply on the stack; it didn't contain any virtual methods, so there would be no runtime function lookup cost. I'd used the new C++11 delete mechanism to delete the default copy/assignment operators, so there was no copying involved. It was just a simple object with a constructor that did a small amount of work (required anyway with static methods) and a destructor which did nothing. Can anyone tell me what these instantiation costs would be? (The reviewer is a bit intimidating and I don't want to look stupid by asking him!) ;-)
Short answer - inherently object allocation is cheap but can get expensive in certain cases.
Long Answer
In C++ the cost of instantiating an object is the same as instantiating a struct in C. All an object is, is a block of memory big enough to store the v-table pointer (if it has one) and all the data attributes. Methods consume no further memory per object; the v-table itself exists once per class.
A non-virtual method is a simple function with an implicit this as its first parameter. Calling a virtual function is a bit more complicated, since it must do a v-table lookup in order to know which function of which class to call.
This means that instantiating an object on the stack involves a simple decrement of the stack pointer (for a full descending stack).
When an object is instantiated on the heap, the cost can go up substantially. But this is inherent in any heap allocation: when allocating memory on the heap, the allocator needs to find a free block big enough to hold your object, and finding such a block is a non-constant-time operation that can be expensive.
C++ constructors may allocate more memory for certain pointer data attributes, and those allocations are normally on the heap. This is further compounded if those data members perform heap allocations themselves, which can add up to a substantial number of instructions.
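A sketch of the kind of object being described, where even a stack instance triggers heap allocations in its constructor (the class is made up for illustration):

#include <string>
#include <vector>

struct Expensive
{
    std::string name;      // may heap-allocate for strings past the small-string buffer
    std::vector<int> data; // heap-allocates its element buffer here
    Expensive() : name("a name long enough to defeat small-string optimization"), data(1024) {}
};
// An Expensive on the stack still pays for the vector's (and possibly the
// string's) heap allocation, so "on the stack" is not the same as "cheap".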
So the bottom line is that the cost depends on what the object you are instantiating is and how it is constructed.
If your object type must invoke a non-trivial constructor and destructor during its lifetime, then the cost is going to be at least the minimum cost of creating any C++ object with a non-trivial constructor and destructor; making the rest of your methods static will not reduce that cost. The "price" in space will be at least 1 byte, since your class is not the base class of a derived class, and the only cost saving in static class method calls is the omission of the implicit this pointer passed as the hidden first argument of the call, something that is required for non-static class methods.
If the methods your reviewer is asking you to redesignate as static never touch the non-static data members of your class type, then passing the implicit this pointer is a wasted resource and the reviewer has a good point. Otherwise, you would have to add an argument to the static methods taking the class type by reference or pointer, nullifying the performance gained from omitting the implicit this pointer.
Probably not a lot, and I'd be amazed if it were any sort of bottleneck. But there's the principle of the thing if nothing else.
However, you should ask the guy; never be afraid to do that, and it's not entirely clear here that losing the stored state and instead deriving it each time (if that's what you're doing instead) is not going to make things worse. And, if it's not, you'd think a namespace would be better than static methods.
A testcase/example would make this easier to answer categorically, further than "you should ask him".
It depends on what your application does. Is it a real time system on a device with limited memory? If not, most of the time object instantiation won't be an issue, unless you are instantiating millions of these and keeping them around or some weird design like that.
Most systems will have a lot more bottlenecks such as:
user input
network calls
database access
computation intensive algos
thread switching costs
system calls
I think in most cases encapsulation into a class for design trumps small costs of instantiation. Of course there can be those 1% of cases where this doesn't hold but is yours one of those?
As a general rule, if a function can be made static it probably should be. It is cheaper. How much cheaper? That depends on what the object does in its constructor, but the base cost of constructing a C++ object is not that high (dynamic memory allocation of course is more expensive).
The point is not to pay for that which you do not need. If a function can be static, why make it a member function? It makes no sense to be a member function in that case. Will the penalty of creating an object kill the performance of your application? Probably not, but again, why pay for what you don't need?
As others have suggested, talk to your colleague and ask him to explain his reasoning. If practical, investigate the performance of the two versions with a small test program. Doing both of these will help you grow as a programmer.
In general I agree with the advice to make a member function static if practical. Not because of performance reasons but because it reduces the amount of context you need to remember to understand the behaviour of the function.
It is worth noting that there is one case where using a function object (whose operator() is a member function) will result in faster code than a plain function. That case is when the compiler can perform inlining. This is kind of an advanced topic, but it is stuff like this that makes it hard to write categorical rules about programming.
#include <algorithm>
#include <iostream>
#include <vector>
#include <stdlib.h>
#include <time.h>

bool int_lt(int a, int b)
{
    return a < b;
}

int main()
{
    size_t const N = 50000000;
    std::vector<int> c1;
    c1.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        c1.push_back(r);
    }
    std::vector<int> c2 = c1;
    std::vector<int> c3 = c1;

    clock_t t1 = clock();
    std::sort(c2.begin(), c2.end(), std::less<int>());
    clock_t t2 = clock();
    std::sort(c3.begin(), c3.end(), int_lt);
    clock_t t3 = clock();

    std::cerr << (t2 - t1) / double(CLOCKS_PER_SEC) << '\n';
    std::cerr << (t3 - t2) / double(CLOCKS_PER_SEC) << '\n';
    return 0;
}
On my i7 Linux box, because g++ can't inline the function int_lt but can inline std::less<int>::operator(), the free-function version is about 50% slower.
> g++-4.5 -O2 p3.cc
> ./a.out
3.85
5.88
To understand why there is such a big difference, you need to consider what type the compiler deduces for the comparator. In the case of int_lt it deduces the function-pointer type bool (*)(int, int), whereas with std::less<int>() it deduces std::less<int>. With a function pointer, the function to be called is only known at run time, which means it is impossible for the compiler to inline its definition at compile time. In contrast, with std::less the compiler has access to the type and its definition at compile time, so it can inline std::less<int>::operator(). That makes a significant difference to performance in this case.
Is this behaviour only related to templates? No, it relates to a loss of information when passing functions as objects. A function pointer does not carry as much information as a function object type for the compiler to make use of. Here is a similar example using no templates (well, aside from std::vector for convenience).
#include <iostream>
#include <time.h>
#include <vector>
#include <stdlib.h>

typedef long (*fp_t)(long, long);

inline long add(long a, long b)
{
    return a + b;
}

struct add_fn {
    long operator()(long a, long b) const
    {
        return a + b;
    }
};

long f(std::vector<long> const& x, fp_t const add, long init)
{
    for (size_t i = 0, sz = x.size(); i < sz; ++i)
        init = add(init, x[i]);
    return init;
}

long g(std::vector<long> const& x, add_fn const add, long init)
{
    for (size_t i = 0, sz = x.size(); i < sz; ++i)
        init = add(init, x[i]);
    return init;
}

int main()
{
    size_t const N = 5000000;
    size_t const M = 100;
    std::vector<long> c1;
    c1.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        long r = rand();
        c1.push_back(r);
    }
    std::vector<long> c2 = c1;
    std::vector<long> c3 = c1;

    clock_t t1 = clock();
    for (size_t i = 0; i < M; ++i)
        long s2 = f(c2, add, 0);
    clock_t t2 = clock();
    for (size_t i = 0; i < M; ++i)
        long s3 = g(c3, add_fn(), 0);
    clock_t t3 = clock();

    std::cerr << (t2 - t1) / double(CLOCKS_PER_SEC) << '\n';
    std::cerr << (t3 - t2) / double(CLOCKS_PER_SEC) << '\n';
    return 0;
}
Cursory testing indicates that the function-pointer version is more than twice as slow as the function-object version.
> g++ -O2 p5.cc
> ./a.out
0.87
0.32
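For completeness, a hedged sketch of one way to recover the lost performance while still using a plain function: pass it as a non-type template parameter, so the exact callee is fixed at compile time and can be inlined (h is a made-up name, reusing fp_t and add from the code above):

template <fp_t Fn>
long h(std::vector<long> const& x, long init)
{
    for (size_t i = 0, sz = x.size(); i < sz; ++i)
        init = Fn(init, x[i]); // Fn is a compile-time constant, so the call can be inlined
    return init;
}

// usage: long s = h<add>(c2, 0);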
Bjarne Stroustrup provided an excellent lecture recently on C++11 which touches on this. You can watch it at the link below.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style